html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2289#issuecomment-405743746,https://api.github.com/repos/pydata/xarray/issues/2289,405743746,MDEyOklzc3VlQ29tbWVudDQwNTc0Mzc0Ng==,6213168,2018-07-17T22:05:29Z,2018-07-17T22:05:29Z,MEMBER,"The thing is, I don't know if dask.dataframe's performance is fixable without drastically changing its design. Also, while I think dask.array is an amazing building block for xarray, dask.dataframe does feel quite redundant to me...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,341355638
https://github.com/pydata/xarray/issues/2289#issuecomment-405402029,https://api.github.com/repos/pydata/xarray/issues/2289,405402029,MDEyOklzc3VlQ29tbWVudDQwNTQwMjAyOQ==,6213168,2018-07-16T22:33:34Z,2018-07-16T22:33:34Z,MEMBER,"I assume you mean ``report.to_dataset('columns').to_dask_dataframe().to_csv(...)``?
There are several problems with that:
1. it doesn't support a MultiIndex on the first dimension, which I need. It *could* be worked around, but only at the cost of a lot of ugly hacking.
2. it doesn't support writing to a single file, so I'd need to manually reassemble the output afterwards, which means both more code and either extra I/O ops or RAM sacrificed to /dev/shm.
3. from my benchmarks, it's *12 to 20 times slower* than my implementation. I did not analyse it and I'm completely unfamiliar with ``dask.dataframe``, so I'm not sure where the bottleneck is, but the fact that it doesn't fork into subprocesses (and pandas.DataFrame.to_csv() does not release the GIL, so threads can't run it in parallel) makes me suspicious.
benchmarks: https://gist.github.com/crusaderky/89819258ff960d06136d45526f7d05db","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,341355638
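A minimal sketch of the ``to_dataset('columns').to_dask_dataframe().to_csv(...)`` route discussed in the comment above, assuming ``report`` is a 2-D dask-backed ``xarray.DataArray`` whose second dimension is named ``columns`` (the shape, chunking, and output path here are illustrative, not taken from the linked benchmarks):

```python
import dask.array as da
import xarray as xr

# Illustrative stand-in for the `report` array from the comment above:
# 2-D, dask-backed, chunked along the first dimension, with a 'columns'
# dimension whose coordinate values become the column names.
report = xr.DataArray(
    da.random.random((100_000, 4), chunks=(10_000, 4)),
    dims=("rows", "columns"),
    coords={"columns": ["a", "b", "c", "d"]},
)

# Split the 'columns' dimension into one Dataset variable per column,
# then convert to a dask.dataframe partitioned along 'rows'.
ddf = report.to_dataset("columns").to_dask_dataframe()

# Writes one CSV file per partition (out/0.csv, out/1.csv, ...), which is
# why the comment notes the output has to be reassembled into a single
# file by hand.
ddf.to_csv("out/*.csv")
```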