issues: 504497403
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
504497403 | MDU6SXNzdWU1MDQ0OTc0MDM= | 3386 | add option to open_mfdataset for not using dask | 42270910 | closed | 0 |  |  | 6 | 2019-10-09T08:33:53Z | 2022-04-09T01:16:21Z | 2022-04-09T01:16:21Z | NONE |  |  |  | open_mfdataset only works with dask, whereas with open_dataset one can choose whether or not to use dask. It would be nice to have an option (e.g. use_dask=False) to not use dask. My specific use case is the following: I use netCDF data as input for a TensorFlow/Keras application, with parallel preprocessing threads in Keras. When using dask arrays, things get complicated because both dask and TensorFlow work with threads. I do not need any of the processing capabilities of dask/xarray; I only need a lazily loaded array that I can slice, where each slice is loaded at the moment it is accessed. My application works nicely with open_dataset (without defining chunks, and thus without dask: the data is accessed slice by slice, so it is never loaded into memory as a whole). It would be nice to have the same behaviour with open_mfdataset. Right now my workaround is to use netCDF4.MFDataset. (Another obvious workaround would be to concatenate my files into one and use open_dataset.) Opening each file separately with open_dataset and then concatenating them with xr.concat does not work, as this loads the data into memory. | { "url": "https://api.github.com/repos/pydata/xarray/issues/3386/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |  | completed | 13221727 | issue |
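The behaviour requested in this issue (a lazily loaded array spanning several files, read slice by slice without dask) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not xarray's or netCDF4's implementation: `LazyConcatArray`, `make_loader` and the loader callables are hypothetical names, each loader stands in for reading rows from one file's variable, and only contiguous slices along the first (concatenation) axis are supported. The idea is similar in spirit to the netCDF4.MFDataset workaround mentioned above: a requested slice is mapped onto per-file row ranges, and only the files it overlaps are read.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Callable, List, Sequence

# Hypothetical loader type: reads rows [start, stop) of one file and returns them.
# In a real application this might wrap a netCDF4 variable; here it is any callable.
Loader = Callable[[int, int], List[int]]


class LazyConcatArray:
    """Concatenate per-file arrays along the first axis without reading them up front."""

    def __init__(self, lengths: Sequence[int], loaders: Sequence[Loader]):
        if len(lengths) != len(loaders):
            raise ValueError("need exactly one loader per file")
        self._loaders = list(loaders)
        # _offsets[i] is the global index of the first row of file i;
        # the last entry is the total length.
        self._offsets = [0] + list(accumulate(lengths))

    def __len__(self) -> int:
        return self._offsets[-1]

    def __getitem__(self, key: slice) -> List[int]:
        if not isinstance(key, slice):
            raise TypeError("only slices are supported in this sketch")
        start, stop, step = key.indices(len(self))
        if step != 1:
            raise ValueError("only contiguous slices are supported in this sketch")
        out: List[int] = []
        if start >= stop:
            return out
        # Index of the (last non-empty) file whose range contains `start`.
        i = bisect_right(self._offsets, start) - 1
        while i < len(self._loaders) and self._offsets[i] < stop:
            # Translate the global range into this file's local row range.
            lo = max(start, self._offsets[i]) - self._offsets[i]
            hi = min(stop, self._offsets[i + 1]) - self._offsets[i]
            if lo < hi:
                out.extend(self._loaders[i](lo, hi))  # the only place data is read
            i += 1
        return out


# Usage: three fake "files" of 3, 4 and 3 rows; record which reads happen.
reads = []


def make_loader(file_id: int, data: List[int]) -> Loader:
    def load(start: int, stop: int) -> List[int]:
        reads.append((file_id, start, stop))
        return data[start:stop]
    return load


files = [list(range(0, 3)), list(range(3, 7)), list(range(7, 10))]
arr = LazyConcatArray([len(f) for f in files],
                      [make_loader(i, f) for i, f in enumerate(files)])
print(arr[2:5])  # rows 2..4 come from the first two files only
print(reads)     # the third file is never touched
```

Because this sketch schedules no work and uses no thread pool, each read happens synchronously in whichever thread calls `__getitem__`, which is the property the issue asks for when Keras preprocessing threads do the slicing.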