html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/1489#issuecomment-340127569,https://api.github.com/repos/pydata/xarray/issues/1489,340127569,MDEyOklzc3VlQ29tbWVudDM0MDEyNzU2OQ==,6181563,2017-10-28T00:46:58Z,2017-10-28T00:46:58Z,CONTRIBUTOR,@shoyer Sounds good. Thanks.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,245624267
https://github.com/pydata/xarray/pull/1489#issuecomment-340125534,https://api.github.com/repos/pydata/xarray/issues/1489,340125534,MDEyOklzc3VlQ29tbWVudDM0MDEyNTUzNA==,1217238,2017-10-28T00:21:48Z,2017-10-28T00:21:48Z,MEMBER,@jmunroe Thanks for your help here! I'm going to merge this now and take care of my remaining clean-up requests in a follow-on PR.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,245624267
https://github.com/pydata/xarray/pull/1489#issuecomment-339894999,https://api.github.com/repos/pydata/xarray/issues/1489,339894999,MDEyOklzc3VlQ29tbWVudDMzOTg5NDk5OQ==,1217238,2017-10-27T07:28:02Z,2017-10-27T07:28:02Z,MEMBER,"Just pushed a couple of commits, which should resolve the failures on Windows. It was typical int32 vs int64 NumPy on Windows nonsense.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,245624267
https://github.com/pydata/xarray/pull/1489#issuecomment-338424196,https://api.github.com/repos/pydata/xarray/issues/1489,338424196,MDEyOklzc3VlQ29tbWVudDMzODQyNDE5Ng==,1217238,2017-10-21T18:49:57Z,2017-10-21T18:49:57Z,MEMBER,"@mrocklin are you saying that it's easier to properly rechunk data on the xarray side (as arrays) before converting to dask dataframes? That does make sense -- we have some nice structure (as multi-dimensional arrays) that is lost once the data gets put in a DataFrame.
In this case, I suppose we really should add a keyword argument like `dims_order` to `to_dask_dataframe()` that lets the user choose how they want to order dimensions on the result.
Initially, I was concerned about the resulting dask graphs when flattening out arrays in the wrong order. Although that *would* have bad performance implications if you need to stream the data from disk, I see now the total number of chunks no longer blows up, thanks to @pitrou's impressive rewrite of `dask.array.reshape()`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,245624267
https://github.com/pydata/xarray/pull/1489#issuecomment-338392039,https://api.github.com/repos/pydata/xarray/issues/1489,338392039,MDEyOklzc3VlQ29tbWVudDMzODM5MjAzOQ==,306380,2017-10-21T12:47:34Z,2017-10-21T12:47:34Z,MEMBER,"I think that you would want to rechunk the dask.array so that its chunks align with the output divisions of the dask.dataframe. For example, if you have a 2d array and are partitioning along the x-axis, then you will want to align the array so that there is no chunking along the y-axis. In this case `set_index` will also be free because your data is already aligned and you already know (I think) the division values.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,245624267
https://github.com/pydata/xarray/pull/1489#issuecomment-338368158,https://api.github.com/repos/pydata/xarray/issues/1489,338368158,MDEyOklzc3VlQ29tbWVudDMzODM2ODE1OA==,1217238,2017-10-21T06:33:27Z,2017-10-21T06:33:27Z,MEMBER,@jcrist @mrocklin @jhamman do any of you have opinions on my latest design [question above](https://github.com/pydata/xarray/pull/1489#pullrequestreview-70750344) about the order of elements in dask dataframes? Is it as important as I suspect to keep chunking/divisions consistent when converting from arrays to dataframes?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,245624267
https://github.com/pydata/xarray/pull/1489#issuecomment-338118973,https://api.github.com/repos/pydata/xarray/issues/1489,338118973,MDEyOklzc3VlQ29tbWVudDMzODExODk3Mw==,6181563,2017-10-20T06:36:43Z,2017-10-20T06:36:43Z,CONTRIBUTOR,I don't understand how one test (TestDataArrayAndDataset::test_to_dask_dataframe_2D) can pass on TravisCI yet fail on Appveyor.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,245624267
https://github.com/pydata/xarray/pull/1489#issuecomment-335719811,https://api.github.com/repos/pydata/xarray/issues/1489,335719811,MDEyOklzc3VlQ29tbWVudDMzNTcxOTgxMQ==,6181563,2017-10-11T07:55:44Z,2017-10-11T07:55:44Z,CONTRIBUTOR,"Hi @shoyer and @jhamman. Thanks for your patience. Please let me know if anything still needs to be done on this PR.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,245624267
https://github.com/pydata/xarray/pull/1489#issuecomment-335316753,https://api.github.com/repos/pydata/xarray/issues/1489,335316753,MDEyOklzc3VlQ29tbWVudDMzNTMxNjc1Mw==,6181563,2017-10-09T23:28:45Z,2017-10-09T23:28:45Z,CONTRIBUTOR,Hi @jhamman. Thanks for the nudge. I'll look at this again today and either a) just get it done or b) ask for help where needed.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,245624267
https://github.com/pydata/xarray/pull/1489#issuecomment-335307599,https://api.github.com/repos/pydata/xarray/issues/1489,335307599,MDEyOklzc3VlQ29tbWVudDMzNTMwNzU5OQ==,2443309,2017-10-09T22:29:45Z,2017-10-09T22:29:45Z,MEMBER,@jmunroe - can we help move this forward? I'd like to see this get into v0.10 if possible.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,245624267
https://github.com/pydata/xarray/pull/1489#issuecomment-327300551,https://api.github.com/repos/pydata/xarray/issues/1489,327300551,MDEyOklzc3VlQ29tbWVudDMyNzMwMDU1MQ==,2443309,2017-09-05T20:55:25Z,2017-09-05T20:55:25Z,MEMBER,"@jmunroe -
I added the PR checklist back to the top of this issue. The most pressing to-do item is getting some documentation written for this.
- The method will need to be added to `api.rst`
- We need a note briefly describing this feature in `whats-new.rst`
- We'll want to show an example of how this method can be used (either in the working with pandas or the dask doc sections)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,245624267
https://github.com/pydata/xarray/pull/1489#issuecomment-327073701,https://api.github.com/repos/pydata/xarray/issues/1489,327073701,MDEyOklzc3VlQ29tbWVudDMyNzA3MzcwMQ==,6181563,2017-09-05T05:19:08Z,2017-09-05T05:19:08Z,CONTRIBUTOR,"Sorry for the delay. I think this task is now complete.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,245624267
https://github.com/pydata/xarray/pull/1489#issuecomment-322022140,https://api.github.com/repos/pydata/xarray/issues/1489,322022140,MDEyOklzc3VlQ29tbWVudDMyMjAyMjE0MA==,6181563,2017-08-13T05:04:11Z,2017-08-13T05:04:11Z,CONTRIBUTOR,"I agree that using dask.dataframe.from_array and dask.dataframe.concat should work. Sorry I haven't had a chance to get back to this recently. I'll try to make the change early next week.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,245624267
https://github.com/pydata/xarray/pull/1489#issuecomment-321461998,https://api.github.com/repos/pydata/xarray/issues/1489,321461998,MDEyOklzc3VlQ29tbWVudDMyMTQ2MTk5OA==,1217238,2017-08-10T06:22:02Z,2017-08-10T06:22:02Z,MEMBER,"@jmunroe This is great functionality -- thanks for your work on this!
One concern: if possible, I would like to avoid adding explicit dask graph building code in xarray. It looks like the canonical way to transform from a list of dask/numpy arrays to a dask dataframe is to make use of `dask.dataframe.from_array` along with `dask.dataframe.concat`:
```
In [34]: import numpy as np
In [35]: import dask.dataframe as dd
In [36]: import dask.array as da
In [37]: x = da.from_array(np.arange(5), 2)
In [38]: y = da.from_array(np.linspace(-np.pi, np.pi, 5), 2)
# notice that dtype is preserved properly
In [39]: dd.concat([dd.from_array(x), dd.from_array(y)], axis=1)
Out[39]:
Dask DataFrame Structure:
                   0        1
npartitions=2
0              int64  float64
2                ...      ...
4                ...      ...
Dask Name: concat-indexed, 26 tasks
```
Can you look into refactoring your code to make use of these?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,245624267
https://github.com/pydata/xarray/pull/1489#issuecomment-318246117,https://api.github.com/repos/pydata/xarray/issues/1489,318246117,MDEyOklzc3VlQ29tbWVudDMxODI0NjExNw==,6181563,2017-07-27T03:08:35Z,2017-07-27T03:08:35Z,CONTRIBUTOR,"After working on this for a little while, I agree that this really should be a to_dask_dataframe() method. I'll make that change.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,245624267
https://github.com/pydata/xarray/pull/1489#issuecomment-318244646,https://api.github.com/repos/pydata/xarray/issues/1489,318244646,MDEyOklzc3VlQ29tbWVudDMxODI0NDY0Ng==,1217238,2017-07-27T02:58:35Z,2017-07-27T02:58:35Z,MEMBER,"Given that dask dataframes don't support MultiIndexes (among many other features), I have a hard time seeing them as a drop-in replacement for `pandas.DataFrame`. So maybe it would make sense to make this a separate method, e.g., `to_dask_dataframe()`?
We could also use a new method as an opportunity to slightly change the API, by not setting an index automatically. This lets us handle N-dimensional data while side-stepping the issue of MultiIndex support -- I don't think this would be very useful when limited to 1D arrays, and dask MultiIndex support seems to be a ways away (https://github.com/dask/dask/issues/1493). Also, `set_index()` in dask shuffles data, so it can be somewhat expensive.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,245624267