issue_comments
6 comments on issue 341355638 (DataArray.to_csv()), sorted by updated_at descending
405755212 · shoyer (MEMBER) · 2018-07-17T23:01:10Z
https://github.com/pydata/xarray/issues/2289#issuecomment-405755212

I would also be very happy to reference xarray_extras specifically (even including an example) for parallel CSV export in the relevant section of our docs, which could be renamed "CSV and other tabular formats".

Reactions: none
405750184 · shoyer (MEMBER) · 2018-07-17T22:34:48Z
https://github.com/pydata/xarray/issues/2289#issuecomment-405750184

> I suppose we could at least ask?

I agree somewhat, but I hope you also understand my reluctance to grow CSV export and distributed computing logic directly in xarray :). Distributed CSV writing is very clearly in scope for dask.dataframe. If we can push this core logic into dask somewhere, I would welcome a thin […]

Reactions: +1 × 1
405743746 · crusaderky (MEMBER) · 2018-07-17T22:05:29Z
https://github.com/pydata/xarray/issues/2289#issuecomment-405743746

Thing is, I don't know if performance on dask.dataframe is fixable without drastically changing its design. Also, while I think dask.array is an amazing building block of xarray, dask.dataframe does feel quite redundant to me...

Reactions: none
405740643 · shoyer (MEMBER) · 2018-07-17T21:53:37Z
https://github.com/pydata/xarray/issues/2289#issuecomment-405740643

Yes, something like this :).

By default (if […]) […]. We could also potentially add a dask equivalent to the […].

Both of these look like improvements that would be welcome in dask.dataframe, and would benefit far more users there than downstream in xarray. I have been intentionally trying to push more complex code related to distributed computing (e.g., queues and subprocesses) upstream to dask. So far, we have avoided all uses of explicit task graphs in xarray, and have only used dask.delayed in a few places.

Reactions: none
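As an illustration of the dask.delayed pattern mentioned above, the following is a minimal sketch of parallel per-partition CSV writing followed by hand-rolled single-file assembly. The `write_part` helper, file names, and partitioning are all hypothetical, not anything dask or xarray provides:

```python
import dask
import numpy as np
import pandas as pd

def write_part(df: pd.DataFrame, path: str, header: bool) -> str:
    # Serialize one partition; only the first partition writes the header.
    df.to_csv(path, header=header, index=False)
    return path

# Hypothetical partitions of a larger table
parts = [pd.DataFrame({"v": np.arange(i * 3, i * 3 + 3)}) for i in range(3)]

# Write partitions in parallel via dask.delayed, ...
delayed_paths = [
    dask.delayed(write_part)(df, f"part-{i}.csv", header=(i == 0))
    for i, df in enumerate(parts)
]
paths = dask.compute(*delayed_paths)

# ... then reassemble them serially into a single CSV.
with open("single.csv", "wb") as out:
    for p in paths:
        with open(p, "rb") as f:
            out.write(f.read())
```

The serial reassembly step is exactly the extra I/O cost crusaderky objects to below when no single-file mode is available.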
405402029 · crusaderky (MEMBER) · 2018-07-16T22:33:34Z
https://github.com/pydata/xarray/issues/2289#issuecomment-405402029

I assume you mean […]. There are several problems with that:

1. it doesn't support a MultiIndex on the first dimension, which I need. It could be worked around, but only at the cost of a lot of ugly hacking.
2. it doesn't support writing to a single file, which means I'd need to manually reassemble the file afterwards; that translates to both more code and either extra I/O ops or RAM sacrificed to /dev/shm.
3. from my benchmarks, it's 12 to 20 times slower than my implementation. I did not analyse it further; benchmarks: https://gist.github.com/crusaderky/89819258ff960d06136d45526f7d05db

Reactions: none
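For reference, problem 1 above concerns output like the following, which plain pandas produces directly from a MultiIndex (illustrative data, not crusaderky's actual schema):

```python
import pandas as pd

# A MultiIndex on the row dimension, e.g. (simulation, timestep)
idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["sim", "t"])
df = pd.DataFrame({"value": [0.1, 0.2, 0.3, 0.4]}, index=idx)

# pandas writes the index levels as leading columns
df.to_csv("multi.csv")
```

dask.dataframe has no equivalent, since it does not support MultiIndex indexes on its partitions.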
405146697 · shoyer (MEMBER) · 2018-07-16T04:28:31Z
https://github.com/pydata/xarray/issues/2289#issuecomment-405146697

Interesting. Would it be equivalent to export to a dask dataframe and write that to CSVs, e.g., […]?

Reactions: none
CREATE TABLE [issue_comments] (
    [html_url] TEXT,
    [issue_url] TEXT,
    [id] INTEGER PRIMARY KEY,
    [node_id] TEXT,
    [user] INTEGER REFERENCES [users]([id]),
    [created_at] TEXT,
    [updated_at] TEXT,
    [author_association] TEXT,
    [body] TEXT,
    [reactions] TEXT,
    [performed_via_github_app] TEXT,
    [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);