id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2243685081,I_kwDOAMm_X86Fu-rZ,8945,netCDF4 indexing: `reindex_like` is very slow if dataset not loaded into memory,11130776,closed,0,,,4,2024-04-15T13:26:08Z,2024-04-23T21:49:28Z,2024-04-23T15:33:36Z,NONE,,,,"### What is your issue?

Reindexing a dataset without loading it into memory seems to be very slow (about 1000x slower than reindexing after loading into memory). Here is a minimal working example:

```
import numpy as np
import pandas as pd
import xarray as xr

times = 100
nlat = 200
nlon = 300

# Regular integer lat/lon grid with hourly time steps.
fp = xr.Dataset(
    {""fp"": ([""time"", ""lat"", ""lon""], np.arange(times * nlat * nlon).reshape(times, nlat, nlon))},
    coords={""time"": pd.date_range(start=""2019-01-01T02:00:00"", periods=times, freq=""1H""),
            ""lat"": np.arange(nlat), ""lon"": np.arange(nlon)},
)
# Single time step on a slightly jittered lat/lon grid.
flux = xr.Dataset(
    {""flux"": ([""time"", ""lat"", ""lon""], np.arange(nlat * nlon).reshape(1, nlat, nlon))},
    coords={""time"": [pd.to_datetime(""2019-01-01"")],
            ""lat"": np.arange(nlat) + np.random.normal(0.0, 0.01, nlat),
            ""lon"": np.arange(nlon) + np.random.normal(0.0, 0.01, nlon)},
)

# Write both datasets to disk, then reopen them without loading into memory.
fp.to_netcdf(""combine_datasets_tests/fp.nc"")
flux.to_netcdf(""combine_datasets_tests/flux.nc"")

fp1 = xr.open_dataset(""combine_datasets_tests/fp.nc"")
flux1 = xr.open_dataset(""combine_datasets_tests/flux.nc"")
```

Then

```
flux1 = flux1.reindex_like(fp1, method=""ffill"", tolerance=None)
```

takes over a minute, while

```
flux1 = flux1.load().reindex_like(fp1, method=""ffill"", tolerance=None)
```

is almost instantaneous (timeit says 91ms, including opening the dataset... I'm not sure if caching is influencing this).
Profiling the ""reindex without load"" cell:

```
         804936 function calls (804622 primitive calls) in 93.285 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   92.211   92.211   93.191   93.191 {built-in method _operator.getitem}
        1    0.289    0.289    0.980    0.980 utils.py:81(_StartCountStride)
        6    0.239    0.040    0.613    0.102 shape_base.py:267(apply_along_axis)
    72656    0.109    0.000    0.109    0.000 utils.py:429()
    72656    0.085    0.000    0.136    0.000 utils.py:430()
    72661    0.051    0.000    0.051    0.000 {built-in method numpy.arange}
   145318    0.048    0.000    0.115    0.000 shape_base.py:370()
        2    0.045    0.023    0.046    0.023 indexing.py:1334(__getitem__)
        6    0.044    0.007    0.044    0.007 numeric.py:136(ones)
   145318    0.044    0.000    0.067    0.000 index_tricks.py:690(__next__)
       14    0.033    0.002    0.033    0.002 {built-in method numpy.empty}
145333/145325    0.023    0.000    0.023    0.000 {built-in method builtins.next}
        1    0.020    0.020   93.275   93.275 duck_array_ops.py:317(where)
       21    0.018    0.001    0.018    0.001 {method 'astype' of 'numpy.ndarray' objects}
   145330    0.013    0.000    0.013    0.000 {built-in method numpy.asanyarray}
        1    0.002    0.002    0.002    0.002 {built-in method _functools.reduce}
        1    0.002    0.002   93.279   93.279 variable.py:821(_getitem_with_mask)
       18    0.001    0.000    0.001    0.000 {built-in method numpy.zeros}
        1    0.000    0.000    0.000    0.000 file_manager.py:226(close)
```

The `getitem` call at the top is from `xarray.backends.netCDF4_.py`, line 114.

Because of the jittered coordinates in `flux`, I'm assuming that the index passed to netCDF4 is not consecutive/strictly monotonic integers (0, 1, 2, 3, ...). In the past, this has caused issues: https://github.com/Unidata/netcdf4-python/issues/680.

In my venv, netCDF4 was installed from a wheel with the following versions:

```
netcdf4-python version: 1.6.5
HDF5 lib version:       1.12.2
netcdf lib version:     4.9.3-development
```

This is with xarray version 2023.12.0, numpy 1.26, and pandas 1.5.3.

I will try to investigate more and hopefully simplify the example.
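The assumption about the integer index can be checked without touching any file. The sketch below is only a stand-in, not xarray's actual code path: `ffill_indexer` is a hypothetical helper that assumes a forward-fill lookup resolves each target label to the position of the last source label less than or equal to it, and -1 when there is none. It uses the standard library only and a fixed seed, so the jitter values differ from the ones in the example above.

```python
import bisect
import random


def ffill_indexer(source, target):
    # Hypothetical helper (not an xarray or pandas API): for each target
    # label, return the position of the last source label <= target,
    # or -1 if every source label is larger. `source` must be sorted.
    return [bisect.bisect_right(source, t) - 1 for t in target]


nlat = 200
rng = random.Random(0)  # fixed seed so the sketch is repeatable
jitter = [rng.gauss(0.0, 0.01) for _ in range(nlat)]

target = [float(i) for i in range(nlat)]       # like the lat of fp: 0, 1, 2, ...
source = [i + jitter[i] for i in range(nlat)]  # like the jittered lat of flux

indexer = ffill_indexer(source, target)

# Summarise how far the result is from a plain 0, 1, 2, ... selection.
n_missing = indexer.count(-1)
n_repeated = len(indexer) - len(set(indexer))
is_contiguous = indexer == list(range(nlat))
print(n_missing, n_repeated, is_contiguous)
```

Wherever the jitter is positive, the source label sits just above the target label, so the lookup falls back to the previous position. That produces repeated positions, and a -1 at the start if the first jitter is positive. If the real indexer looks like this, the backend would be asked for a non-contiguous selection, which would be consistent with the slow `getitem`; calling `.load()` first keeps that indexing in NumPy.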
(Can't quite justify spending more time on it at work because this is just to tag a version that was used in some experiments before we switch to zarr as a backend, so hopefully it won't be relevant at that point.)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8945/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue