issues: 341355638
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
341355638 | MDU6SXNzdWUzNDEzNTU2Mzg= | 2289 | DataArray.to_csv() | 6213168 | closed | 0 |  |  | 6 | 2018-07-15T21:56:20Z | 2019-03-12T15:01:18Z | 2019-03-12T15:01:18Z | MEMBER |  |  |  |

body:

I'm using xarray to aggregate 38 GB worth of NetCDF data into a bunch of CSV reports. I have two problems:
To solve both problems, I wrote a new function: http://xarray-extras.readthedocs.io/en/latest/api/csv.html

And now my high-level wrapper code looks like this:

```
# Dataset from 200 .nc files, with a total of 500000 points on the 'row' dimension
nc = xarray.open_mfdataset('inputs.*.nc')

reports = [
    # DataArrays with shape (500000, 2000), with the rows split in 200 chunks
    gen_report0(nc),
    gen_report1(nc),
    ...
    gen_report39(nc),
]

futures = [
    # dask.delayed objects
    to_csv(reports[0], 'report0.csv.gz', compression='gzip'),
    to_csv(reports[1], 'report1.csv.gz', compression='gzip'),
    ...
    to_csv(reports[39], 'report39.csv.gz', compression='gzip'),
]

dask.compute(futures)
```

The function is currently production quality in xarray-extras, and it would be very easy to refactor it as a method of xarray.DataArray in the main library. Opinions?
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2289/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
state_reason: completed
repo: 13221727
type: issue
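For readers who want to reproduce the pattern above without xarray-extras, here is a minimal sketch of a delayed CSV writer built only from public dask, pandas, and xarray APIs. The helper name `to_csv_delayed` is an illustrative assumption, not part of any library, and unlike the xarray-extras function described in the issue (which works across the 200 dask chunks), this sketch materializes each whole DataArray inside a single dask task.

```python
# Minimal sketch, NOT the xarray-extras implementation: each report is
# converted to pandas and written in one task, so parallelism happens
# across reports rather than across the chunks of one report.
import dask
import xarray


@dask.delayed
def to_csv_delayed(arr: xarray.DataArray, path: str, **kwargs) -> str:
    """Write a 2D DataArray to CSV via pandas; return the output path."""
    # .to_pandas() pulls the (possibly dask-backed) data into memory here,
    # inside the dask task, so many reports can still be written in parallel.
    arr.to_pandas().to_csv(path, **kwargs)
    return path


if __name__ == "__main__":
    da = xarray.DataArray(
        [[1.0, 2.0], [3.0, 4.0]],
        dims=("row", "col"),
        coords={"row": [0, 1], "col": ["a", "b"]},
    )
    # compression='gzip' is forwarded to pandas.DataFrame.to_csv
    future = to_csv_delayed(da, "report0.csv.gz", compression="gzip")
    dask.compute(future)
```

Because each call returns a `dask.delayed` object, a list of such futures can be handed to a single `dask.compute(...)` exactly as in the wrapper code quoted in the issue body.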