issues: 304624171
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | reactions | performed_via_github_app | state_reason | repo | type
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
304624171 | MDU6SXNzdWUzMDQ2MjQxNzE= | 1985 | Load a small subset of data from a big dataset takes forever | 22245117 | closed | 0 | | | 8 | 2018-03-13T04:27:58Z | 2019-01-13T01:46:08Z | 2019-01-13T01:46:08Z | CONTRIBUTOR | | | | { "url": "https://api.github.com/repos/pydata/xarray/issues/1985/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | | completed | 13221727 | issue

**body**

**Code Sample**

```python
def cut_dataset(ds2cut,
                varList=['Temp', 'S', 'Eta', 'U', 'V', 'W'],
                lonRange=[-180, 180],
                latRange=[-90, 90],
                depthRange=[0, float("inf")],
                timeRange=['2007-09-01T00', '2008-08-31T18'],
                timeFreq='1D',
                sampMethod='mean',
                interpC=True,
                saveNetCDF=False):
    """Cut the dataset"""
```

```python
# 3D test
ds_cut, grid_cut = cut_dataset(ds, varList=['Eta'],
                               latRange=[65, 65.5],
                               depthRange=[0, 2],
                               timeRange=['2007-11-15T00', '2007-11-16T00'],
                               timeFreq='1D', sampMethod='mean',
                               interpC=False, saveNetCDF='3Dvariable.nc')

# 4D test
ds_cut, grid_cut = cut_dataset(ds, varList=['Temp'],
                               lonRange=[-30, -29.5],
                               latRange=[65, 65.5],
                               depthRange=[0, 2],
                               timeRange=['2007-11-15T00', '2007-11-16T00'],
                               timeFreq='1D', sampMethod='mean',
                               interpC=False, saveNetCDF='4Dvariable.nc')
```

**Problem description**

I'm working with a big dataset, but most of the time I only need a small subset of it. My idea was to open and concatenate everything with `open_mfdataset`, and then extract subsets of data using the indexing routines. This approach works very well when I extract 3D variables (just lon, lat, and time), but it fails when I try to extract 4D variables (lon, lat, time, and depth). It doesn't actually fail, but `to_netcdf` takes forever. When I open a smaller dataset from the very beginning (say, just November), I'm also able to extract 4D variables. When I load the sub-dataset after using the indexing routines, does xarray need to read the whole original 4D variable? If yes, I should probably change my approach and open subsets of data from the very beginning. If not, am I doing something wrong?

Output of
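The core question in the issue body — whether loading an indexed subset forces a read of the entire original variable — comes down to lazy versus eager indexing. The following is a purely illustrative, self-contained sketch (a toy class, not xarray's or dask's actual implementation; all names here are invented): a lazily indexed array touches only the elements the subset requests, which is why slicing before loading can be cheap while an eager read of the full variable is not.

```python
# Toy model of lazy indexing: each element access goes through a "reader"
# that simulates a disk read, so we can count how much data a subset touches.
class LazyArray:
    def __init__(self, shape, reader):
        self.shape = shape
        self.reader = reader  # callable index -> value, simulating disk access

    def __getitem__(self, idx):
        # Materialize only the requested slice; the rest is never read.
        start, stop, step = idx.indices(self.shape[0])
        return [self.reader(i) for i in range(start, stop, step)]

reads = []

def read_from_disk(i):
    reads.append(i)  # record each simulated disk access
    return i * 2

arr = LazyArray((1000,), read_from_disk)
subset = arr[10:13]   # lazy indexing: only 3 elements are "read from disk"
print(subset)         # [20, 22, 24]
print(len(reads))     # 3, not 1000
```

If xarray's indexing were fully lazy for this workload, the 4D case should behave like this toy example and only read the selected depth levels; the reported slowdown in `to_netcdf` suggests the whole variable is being pulled in at write time instead.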