
issue_comments: 405402029


html_url: https://github.com/pydata/xarray/issues/2289#issuecomment-405402029
issue_url: https://api.github.com/repos/pydata/xarray/issues/2289
id: 405402029
node_id: MDEyOklzc3VlQ29tbWVudDQwNTQwMjAyOQ==
user: 6213168
created_at: 2018-07-16T22:33:34Z
updated_at: 2018-07-16T22:33:34Z
author_association: MEMBER

I assume you mean `report.to_dataset('columns').to_dask_dataframe().to_csv(...)`?
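
For concreteness, that round trip would look roughly like the sketch below. The `report` array, its dimension names, and the chunk size are illustrative stand-ins, not taken from the issue.

```python
import numpy as np
import xarray as xr

# Toy stand-in for `report`: a 2D DataArray laid out as 'index' x 'columns'.
# Names and shapes are illustrative only.
report = xr.DataArray(
    np.random.rand(1000, 3),
    dims=['index', 'columns'],
    coords={'index': np.arange(1000), 'columns': ['a', 'b', 'c']},
).chunk({'index': 250})

# The round trip being discussed: one data variable per column,
# then a dask DataFrame, then CSV.
ddf = report.to_dataset('columns').to_dask_dataframe()
ddf.to_csv('report-*.csv')  # dask writes one file per partition, not a single file
```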

There are several problems with that:

1. It doesn't support a MultiIndex on the first dimension, which I need. That could be worked around, but only at the cost of a lot of ugly hacking.
2. It doesn't support writing to a single file, which means I'd have to manually reassemble the output afterwards; that translates into more code and either extra I/O ops or RAM sacrificed to /dev/shm.
3. In my benchmarks it's 12 to 20 times slower than my implementation. I haven't analysed it and I'm completely unfamiliar with dask.dataframe, so I'm not sure where the bottleneck is, but the fact that it doesn't fork into subprocesses (while pandas.DataFrame.to_csv() does not release the GIL) makes me suspicious; a rough sketch of the subprocess idea follows after the benchmarks link below.

benchmarks: https://gist.github.com/crusaderky/89819258ff960d06136d45526f7d05db
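
To illustrate point 3, here is a minimal sketch of the subprocess idea, not the implementation benchmarked in the gist: split the frame into blocks, serialize each block with pandas in a worker process (sidestepping the GIL), and concatenate the text pieces into a single file. Function names, the block size, and the in-memory assumption are all mine.

```python
import concurrent.futures

import numpy as np
import pandas as pd


def _block_to_csv(block: pd.DataFrame, header: bool) -> str:
    # pandas.DataFrame.to_csv returns a str when no path is given
    return block.to_csv(header=header)


def to_csv_multiprocess(df: pd.DataFrame, path: str, blocksize: int = 100_000) -> None:
    """Serialize blocks of df in worker processes, then concatenate into one file."""
    blocks = [df.iloc[i:i + blocksize] for i in range(0, len(df), blocksize)]
    headers = [True] + [False] * (len(blocks) - 1)  # header only on the first block
    with concurrent.futures.ProcessPoolExecutor() as pool:
        pieces = pool.map(_block_to_csv, blocks, headers)
        with open(path, 'w') as fh:
            for piece in pieces:
                fh.write(piece)


if __name__ == '__main__':
    df = pd.DataFrame(np.random.rand(1_000_000, 3), columns=['a', 'b', 'c'])
    to_csv_multiprocess(df, 'report.csv')
```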

reactions: none
issue: 341355638