issue_comments: 405740643

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/2289#issuecomment-405740643	https://api.github.com/repos/pydata/xarray/issues/2289	405740643	MDEyOklzc3VlQ29tbWVudDQwNTc0MDY0Mw==	1217238	2018-07-17T21:53:37Z	2018-07-17T21:53:37Z	MEMBER	I assume you mean `report.to_dataset('columns').to_dask_dataframe().to_csv(...)`? Yes, something like this :). it doesn't support a MultiIndex on the first dimension, which I need. It could be worked around but only at the cost of a lot of ugly hacking. By default (if `set_index=False`), xarray will put variables in separate columns rather than a MultiIndex when converting into a dask dataframe. So this should work fine for exporting to CSV. I'm pretty sure you don't actually need a MultiIndex on each CSV chunk, since you could just pass `index=False` in `to_csv()` instead. We could also potentially add a dask equivalent to the `DataArray.to_pandas()` method, which would preserve the dimensionality of the argument (e.g., 2D DataArray directly to a 2D dask DataFrame). it doesn't support writing to a single file, which means I'd need to manually reassemble the file afterwards, which translates to both more code and either I/O ops or RAM sacrificed to /dev/shm. from my benchmarks, it's 12 to 20 times slower than my implementation. I did not analyse it and I'm completely unfamiliar with dask.dataframe, so I'm not sure where the bottleneck is, but the fact that it doesn't fork into subprocesses (while pandas.DataFrame.to_csv() does not release the GIL) makes me suspicious. Both of these look like improvements that would be welcome in dask.dataframe, and benefit far more users there than downstream in xarray. I have been intentionally trying to push more complex code related to distributed computing (e.g., queues and subprocesses) upstream to dask. So far, we have avoided all uses of explicit task graphs in xarray, and have only used dask.delayed in a few places.	
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		341355638