issue_comments
9 rows where author_association = "MEMBER" and issue = 245624267 ("lazily load dask arrays to dask data frames by calling to_dask_dataframe"), sorted by updated_at descending.
Each entry below: author (association) · timestamp (created_at and updated_at are identical for all nine rows) · link to the comment, followed by the comment body and reactions.
**shoyer** (MEMBER) · 2017-10-28T00:21:48Z · [#issuecomment-340125534](https://github.com/pydata/xarray/pull/1489#issuecomment-340125534)

@jmunroe Thanks for your help here! I'm going to merge this now and take care of my remaining clean-up requests in a follow-on PR.

Reactions: none.
**shoyer** (MEMBER) · 2017-10-27T07:28:02Z · [#issuecomment-339894999](https://github.com/pydata/xarray/pull/1489#issuecomment-339894999)

Just pushed a couple of commits, which should resolve the failures on Windows. It was typical int32 vs int64 NumPy on Windows nonsense.

Reactions: none.
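The Windows failure mode mentioned above is easy to reproduce: NumPy's default integer dtype follows the platform's C long, so code that implicitly assumes int64 breaks on Windows. A minimal illustration (not code from the PR itself):

```python
import numpy as np

# NumPy's default integer dtype follows the platform C long:
# 32-bit on Windows, 64-bit on most Linux/macOS builds.
x = np.arange(5)
print(x.dtype)  # int64 on Linux/macOS, int32 on Windows

# pinning the dtype explicitly avoids the cross-platform mismatch
y = np.arange(5, dtype=np.int64)
print(y.dtype)
```

Pinning dtypes in test expectations is the usual fix for this class of failure.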
**shoyer** (MEMBER) · 2017-10-21T18:49:57Z · [#issuecomment-338424196](https://github.com/pydata/xarray/pull/1489#issuecomment-338424196)

@mrocklin are you saying that it's easier to properly rechunk data on the xarray side (as arrays) before converting to dask dataframes? That does make sense -- we have some nice structure (as multi-dimensional arrays) that is lost once the data gets put in a DataFrame. In this case, I suppose we really should add a keyword argument like

Initially, I was concerned about the resulting dask graphs when flattening out arrays in the wrong order. Although that would have bad performance implications if you need to stream the data from disk, I see now the total number of chunks no longer blows up, thanks to @pitrou's impressive rewrite of

Reactions: none.
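The chunk-count concern above can be checked directly: reshaping a 2-d dask array to 1-d triggers an automatic internal rechunk, and the number of output chunks stays modest rather than blowing up. A small sketch (the shape and chunk sizes are invented for illustration):

```python
import dask.array as da

# a 2-d array chunked along both axes
x = da.ones((4, 4), chunks=(2, 2))

# flattening in C order forces dask to rechunk internally; the
# result still has only a handful of chunks, not a combinatorial
# explosion of tiny ones
flat = x.reshape(16)
print(flat.shape, flat.npartitions)
```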
**mrocklin** (MEMBER) · 2017-10-21T12:47:34Z · [#issuecomment-338392039](https://github.com/pydata/xarray/pull/1489#issuecomment-338392039)

I think that you would want to rechunk the dask.array so that its chunks align with the output divisions of the dask.dataframe. For example, if you have a 2d array and are partitioning along the x-axis, then you will want to align the array so that there is no chunking along the y-axis. In this case

Reactions: none.
**shoyer** (MEMBER) · 2017-10-21T06:33:27Z · [#issuecomment-338368158](https://github.com/pydata/xarray/pull/1489#issuecomment-338368158)

@jcrist @mrocklin @jhamman do any of you have opinions on my latest design question above about the order of elements in dask dataframes? Is it as important as I suspect to keep chunking/divisions consistent when converting from arrays to dataframes?

Reactions: none.
**jhamman** (MEMBER) · 2017-10-09T22:29:45Z · [#issuecomment-335307599](https://github.com/pydata/xarray/pull/1489#issuecomment-335307599)

@jmunroe - can we help move this forward? I'd like to see this get into v0.10 if possible.

Reactions: none.
**jhamman** (MEMBER) · 2017-09-05T20:55:25Z · [#issuecomment-327300551](https://github.com/pydata/xarray/pull/1489#issuecomment-327300551)

@jmunroe - I added the PR checklist back to the top of this issue. The most pressing to-do item is getting some documentation written for this.

Reactions: none.
**shoyer** (MEMBER) · 2017-08-10T06:22:02Z · [#issuecomment-321461998](https://github.com/pydata/xarray/pull/1489#issuecomment-321461998)

@jmunroe This is great functionality -- thanks for your work on this!

One concern: if possible, I would like to avoid adding explicit dask graph building code in xarray. It looks like the canonical way to transform from a list of dask/numpy arrays to a dask dataframe is to make use of

```
In [35]: import dask.dataframe as dd

In [36]: import dask.array as da

In [37]: x = da.from_array(np.arange(5), 2)

In [38]: y = da.from_array(np.linspace(-np.pi, np.pi, 5), 2)

# notice that dtype is preserved properly
In [39]: dd.concat([dd.from_array(x), dd.from_array(y)], axis=1)
Out[39]:
Dask DataFrame Structure:
                   0        1
npartitions=2
0              int64  float64
2                ...      ...
4                ...      ...
Dask Name: concat-indexed, 26 tasks
```

Can you look into refactoring your code to make use of these?

Reactions: none.
**shoyer** (MEMBER) · 2017-07-27T02:58:35Z · [#issuecomment-318244646](https://github.com/pydata/xarray/pull/1489#issuecomment-318244646)

Given that dask dataframes don't support MultiIndexes (among many other features), I have a hard time seeing them as a drop-in replacement for

We could also use a new method as an opportunity to slightly change the API, by not setting an index automatically. This lets us handle N-dimensional data while side-stepping the issue of MultiIndex support -- I don't think this would be very useful when limited to 1D arrays, and dask MultiIndex support seems to be a ways away (https://github.com/dask/dask/issues/1493). Also,

Reactions: none.
```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```