pydata/xarray issue #2004: Slicing DataArray can take longer than not slicing

Opened 2018-03-21 by a contributor · 14 comments · closed 2020-12-03

Code Sample, a copy-pastable example if possible

```ipython
In [1]: import xarray as xr

In [2]: radmax_ds = xr.open_dataset('tests/radmax_baseline.nc')

In [3]: radmax_ds
Out[3]:
<xarray.Dataset>
Dimensions:   (latitude: 5650, longitude: 12050, time: 3)
Coordinates:
  * latitude   (latitude) float32 13.505002 13.515002 13.525002 13.535002 ...
  * longitude  (longitude) float32 -170.495 -170.485 -170.475 -170.465 ...
  * time       (time) datetime64[ns] 2017-03-07T01:00:00 2017-03-07T02:00:00 ...
Data variables:
    RadarMax   (time, latitude, longitude) float32 ...
Attributes:
    start_date:   03/07/2017 01:00
    end_date:     03/07/2017 01:55
    elapsed:      60
    data_rights:  Respond (TM) Confidential Data. (c) Insurance Services Offi...

In [4]: %timeit foo = radmax_ds.RadarMax.load()
The slowest run took 35509.20 times longer than the fastest. This could mean
that an intermediate result is being cached.
1 loop, best of 3: 216 µs per loop

In [5]: 216 * 35509.2
Out[5]: 7669987.199999999
```

So, without any slicing, it takes approximately 7.5 seconds for me to load this complete file into memory. Now let's see what happens when I slice the DataArray and load it:

```ipython
In [1]: import xarray as xr

In [2]: radmax_ds = xr.open_dataset('tests/radmax_baseline.nc')

In [3]: %timeit foo = radmax_ds.RadarMax[::1, ::1, ::1].load()
1 loop, best of 3: 7.56 s per loop

In [4]: radmax_ds.close()

In [5]: radmax_ds = xr.open_dataset('tests/radmax_baseline.nc')

In [6]: %timeit foo = radmax_ds.RadarMax[::1, ::10, ::10].load()
```

I killed this session after 17 minutes. `top` did not report any unusual I/O wait, and memory usage was not out of control. I am using v0.10.2 of xarray.

My suspicion is that something in the indexing system is causing xarray to read the data in a bad order. Notice that if I slice all of the data (`[::1, ::1, ::1]`), the timing works out the same as reading it all in straight-up. Not shown here is a run where, if I slice every 100th latitude and longitude, the timing is shorter again, but still not as fast as reading it all in at once.
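One plausible mechanism (my speculation, not something confirmed in this issue): a stride-1 slice leaves the selection contiguous, while a strided slice produces a non-contiguous selection, which a file backend may have to service with many small reads instead of one big sequential one. A minimal NumPy sketch of the contiguity difference:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

full = a[::1, ::1]  # stride-1 slice: still a C-contiguous view, cheap
sub = a[::2, ::2]   # strided slice: a non-contiguous view

# A backend reading `full` can do one sequential pass; reading the
# equivalent of `sub` from disk may require scattered small reads.
print(full.flags["C_CONTIGUOUS"])  # True
print(sub.flags["C_CONTIGUOUS"])   # False
```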

Let me know if you want a copy of the file. It is a compressed netcdf4, taking up only 1.7MB.

I wonder if this is related to #1985?
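A workaround sketch (not from this issue thread; the data below is synthetic and the names are only meant to mirror the report): load the whole variable into memory first, then apply the strided slice to the resulting NumPy-backed array, so the file is read in a single contiguous pass rather than through lazy strided indexing:

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the RadarMax variable (smaller dims for speed).
radar = xr.DataArray(
    np.zeros((3, 565, 1205), dtype="float32"),
    dims=("time", "latitude", "longitude"),
    name="RadarMax",
)

# Load fully first, then slice the in-memory array.
subset = radar.load()[::1, ::10, ::10]
print(subset.shape)  # (3, 57, 121)
```

This trades peak memory (the full variable is materialized) for avoiding the pathological strided-read pattern.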

