html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7363#issuecomment-1340952624,https://api.github.com/repos/pydata/xarray/issues/7363,1340952624,IC_kwDOAMm_X85P7VAw,43316012,2022-12-07T13:15:38Z,2022-12-07T13:15:38Z,COLLABORATOR,No worries :),"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1340951101,https://api.github.com/repos/pydata/xarray/issues/7363,1340951101,IC_kwDOAMm_X85P7Uo9,8382834,2022-12-07T13:14:56Z,2022-12-07T13:14:56Z,CONTRIBUTOR,"(really feeling bad about missing your nice suggestion @headtr1ck; I must find a better way to jump between computer / smartphone / tablet without missing comments :see_no_evil: . Again, thanks for all the help.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1340947035,https://api.github.com/repos/pydata/xarray/issues/7363,1340947035,IC_kwDOAMm_X85P7Tpb,8382834,2022-12-07T13:12:28Z,2022-12-07T13:12:28Z,CONTRIBUTOR,"Oooh, I am so sorry @headtr1ck, apologies. I rely a lot on the notification emails to follow things, and your message and the one from @keewis arrived at the same time, so I missed yours. Really sorry, and many thanks for pointing to this first, my bad.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1340942358,https://api.github.com/repos/pydata/xarray/issues/7363,1340942358,IC_kwDOAMm_X85P7SgW,43316012,2022-12-07T13:08:54Z,2022-12-07T13:08:54Z,COLLABORATOR,"You all totally ignored my comment, haha.
But glad that this solution works :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1340939982,https://api.github.com/repos/pydata/xarray/issues/7363,1340939982,IC_kwDOAMm_X85P7R7O,8382834,2022-12-07T13:07:10Z,2022-12-07T13:07:10Z,CONTRIBUTOR,"Following the pointer from @keewis, I just did:
```
extended_observations = previous_observations.pad(pad_width={""time"": (0, needed_padding)}, mode=""constant"", constant_values=-999)
```
This runs nearly instantaneously and does exactly what I need. Many thanks to all for your help, and sorry for missing that there was a pad function. I am closing this for now (the only remaining question is why the call to reindex is so costly on my machine; I wonder whether an old version of some underlying software is at stake).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1340816132,https://api.github.com/repos/pydata/xarray/issues/7363,1340816132,IC_kwDOAMm_X85P6zsE,8382834,2022-12-07T11:14:03Z,2022-12-07T11:14:03Z,CONTRIBUTOR,"Aaah, you are right @keewis, pad should do exactly what I need :) . Many thanks. Interesting: I did spend a bit of time looking for this but somehow could not find it; it is always hard to find the correct function when you do not know in advance what name to look for :) .
Then I will check the use of `pad` this afternoon, and I think it will fit my need. Still not sure why reindex was so problematic on my machine.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1340809916,https://api.github.com/repos/pydata/xarray/issues/7363,1340809916,IC_kwDOAMm_X85P6yK8,14808389,2022-12-07T11:09:03Z,2022-12-07T11:09:03Z,MEMBER,"> implementing a ""grow_coordinate"" function to grow / reallocate larger arrays copying the previous chunk along a coordinate
this sounds a lot like `pad` with `mode=""constant""`?
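A minimal sketch of that suggestion on a toy dataset (the variable name `obs`, the sizes and the fill value -999 are illustrative only). Assuming a reasonably recent xarray, `Dataset.pad` keeps the existing values and appends `needed_padding` slots filled with `constant_values`:

```python
import numpy as np
import xarray as xr

# Toy stand-in for the real dataset: 3 stations x 5 time steps of made-up values.
ds = xr.Dataset(
    {'obs': (('station', 'time'), np.arange(15, dtype='float64').reshape(3, 5))}
)

needed_padding = 2  # number of new time slots to append at the end

# Existing values are copied unchanged; the appended slots hold constant_values.
extended = ds.pad(
    pad_width={'time': (0, needed_padding)},
    mode='constant',
    constant_values=-999,
)

print(extended.sizes['time'])  # 7
```

As far as I know, only data variables receive `constant_values`; coordinates along the padded dimension get a missing value (NaN/NaT) instead, so they may need to be filled separately.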
> is it possible that xarray makes no assumptions of this kind
`xarray` uses `pandas` indexes for alignment and indexing (if you have a recent version of `xarray` you should see the ""Indexes"" section in the HTML repr), so yes, it will always make sure to use a search that is more efficient than the linear search, as long as the data is sorted. This was also the reason why you had to use `swap_dims` / `set_index` to create an index along the coordinate you wanted to `reindex`.","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1340806519,https://api.github.com/repos/pydata/xarray/issues/7363,1340806519,IC_kwDOAMm_X85P6xV3,8382834,2022-12-07T11:06:21Z,2022-12-07T11:06:21Z,CONTRIBUTOR,"Yes, this is representative of my dataset :) .
OK, interesting. I run this on my machine (Ubuntu 20.04, 16GB of RAM, 15.3GB reported by the system as the maximum available).
- I start at around 6GB used, i.e. 9.3GB available
- I run the script in ipython3; after a few seconds my machine exhausts its RAM and freezes, then the process gets killed:
```
[ins] In [1]: import numpy as np
...: import xarray as xr
...: import datetime
...:
...: # create two timeseries', second is for reindex
...: itime = np.arange(0, 3208464).astype(""
```
340_097_184. Assuming float64 (8 bytes per value), this leads to 2_720_777_472 bytes, roughly 2.7GB, which should fit in most setups. I'm not really sure, but there is a good chance that reindex creates a completely new Dataset, which means the computer has to hold the original as well as the new Dataset (roughly 3.2GB). This adds up to almost 6GB of RAM. Depending on your machine and other tasks, this may run into RAM issues, but the xarray devs will know better.
@keewis's suggestion of creating a new, file-backed array with predefined values and concatenating it could resolve the issues you are currently facing.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1339973248,https://api.github.com/repos/pydata/xarray/issues/7363,1339973248,IC_kwDOAMm_X85P3l6A,8382834,2022-12-06T20:33:38Z,2022-12-06T20:33:38Z,CONTRIBUTOR,(and I guess this pattern of appending at the end of the time dimension is quite common),"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1339972500,https://api.github.com/repos/pydata/xarray/issues/7363,1339972500,IC_kwDOAMm_X85P3luU,8382834,2022-12-06T20:32:49Z,2022-12-06T20:32:49Z,CONTRIBUTOR,"@keewis I will be back at my computer tomorrow, but the dataset is big: roughly going from 3 million time points before growing to 3.5 million time points after growing, and there are 100 'stations' with this number of time points each. So if reindexing does a search without bisection (binary search) for each station and each time point, that may take some time. What is specific here is that the first 3 million time points are unchanged and the new 500k are just empty by default, but I guess reindex has no way to know that if it is written to be general?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1339717048,https://api.github.com/repos/pydata/xarray/issues/7363,1339717048,IC_kwDOAMm_X85P2nW4,43316012,2022-12-06T17:21:20Z,2022-12-06T17:21:20Z,COLLABORATOR,"Maybe pad could also work?
Something like `da.pad(x=(0, 1), constant_values=np.nan)` and then overwriting the NaNs with the new values, e.g. `da[{""x"": -1}] = 1`.
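A small sketch of that pad-then-overwrite idea, using made-up values; float data is used so that NaN is a valid placeholder:

```python
import numpy as np
import xarray as xr

# Made-up 1-D series along a dimension named 'x'.
da = xr.DataArray(np.array([10.0, 20.0, 30.0]), dims='x')

# Step 1: grow 'x' by one slot at the end; the new slot starts as NaN.
grown = da.pad(x=(0, 1), constant_values=np.nan)

# Step 2: overwrite the placeholder with the incoming observation.
grown[{'x': -1}] = 40.0

print(grown.values.tolist())  # [10.0, 20.0, 30.0, 40.0]
```

`pad` returns a new object, so `da` itself keeps its original length.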
I'm actually not sure whether you can use the new values directly in the `constant_values` argument; I cannot try it right now, though.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1339675640,https://api.github.com/repos/pydata/xarray/issues/7363,1339675640,IC_kwDOAMm_X85P2dP4,14808389,2022-12-06T16:55:40Z,2022-12-06T16:55:40Z,MEMBER,"I'm a bit surprised. Could you post a `repr` of `timestamps_extended_basis`? That might help figuring out what exactly happened.
If everything fails, you might also create a new `xarray` object with just the new values, and then use `xr.concat` to combine both?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1339607893,https://api.github.com/repos/pydata/xarray/issues/7363,1339607893,IC_kwDOAMm_X85P2MtV,8382834,2022-12-06T16:07:17Z,2022-12-06T16:07:17Z,CONTRIBUTOR,"The call to reindex is eating up my RAM and has not finished after 15 minutes, so I am killing it. I will apply the ""allocate larger np arrays, block copy preexisting data, create new dataset"" approach instead; it could be useful to have a turnkey function for doing so :) .","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1339595819,https://api.github.com/repos/pydata/xarray/issues/7363,1339595819,IC_kwDOAMm_X85P2Jwr,8382834,2022-12-06T15:59:19Z,2022-12-06T16:00:51Z,CONTRIBUTOR,"This has been running for 10 minutes now. If there is a ""stupid"", ""non-searchsorted"" lookup for every entry (which would make sense, since there is no reason to assume anything about what the index looks like), reindex may take a reeeeeally long time. I think I will drop this in a few minutes and instead i) create extended numpy arrays, ii) extract the xarray data as numpy arrays, iii) block copy the data that is not modified, and iv) block fill the data that are modified.
So this discussion may still be relevant for adding a new way of extending a dimension: re-allocate with more memory at its end, copy the previously existing data up to the previous size, and fill the newly created entries with a user-provided value. This will be much faster than using reindex with a lookup for every entry.
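A minimal NumPy sketch of that allocate / block-copy / fill step. `grow_along_axis` is a hypothetical helper name, not an existing NumPy or xarray function (for xarray objects, the `pad` call with `mode='constant'` mentioned elsewhere in this thread covers the same need):

```python
import numpy as np

def grow_along_axis(arr, new_size, axis=0, fill_value=np.nan):
    # Hypothetical helper: allocate a larger array, block-copy the old data,
    # and leave the new tail filled with fill_value.
    if new_size < arr.shape[axis]:
        raise ValueError('new_size must not be smaller than the current size')
    new_shape = list(arr.shape)
    new_shape[axis] = new_size
    # Allocate the larger buffer, pre-filled everywhere with fill_value.
    out = np.full(new_shape, fill_value, dtype=np.result_type(arr, fill_value))
    # Block copy: the first arr.shape[axis] entries along axis get the old data.
    index = [slice(None)] * arr.ndim
    index[axis] = slice(0, arr.shape[axis])
    out[tuple(index)] = arr
    return out

# Usage: 2 stations x 3 time steps of made-up data, grown to 5 time steps.
old = np.arange(6, dtype='float64').reshape(2, 3)
new = grow_along_axis(old, 5, axis=1, fill_value=-999.0)
print(new.shape)  # (2, 5)
```

Both arrays coexist during the copy, so peak memory is roughly the size of the old array plus the new one.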
I think this is quite a typical workflow in the geosciences when adding new observations to an aggregated dataset, so this may be useful for many people :) .","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1339588658,https://api.github.com/repos/pydata/xarray/issues/7363,1339588658,IC_kwDOAMm_X85P2IAy,8382834,2022-12-06T15:53:53Z,2022-12-06T15:53:53Z,CONTRIBUTOR,"You are right, many thanks, applying the first solution works fine :) .
New ""issue"": the call to reindex seems to take a lot of time (I guess because there is a lookup for every single entry), while extending a numpy array would, from my point of view, be close to instantaneous.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1339568566,https://api.github.com/repos/pydata/xarray/issues/7363,1339568566,IC_kwDOAMm_X85P2DG2,14808389,2022-12-06T15:39:20Z,2022-12-06T15:39:20Z,MEMBER,"I *think* this is because you don't have an index along the dimension. Try any of
```python
previous_observations.set_coords([""timestamps""]).swap_dims({""time"": ""timestamps""}).reindex(...)
previous_observations.set_index({""time"": ""timestamps""}).reindex(...)
```
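A small self-contained check of the first variant, with made-up values (`obs`, the integer timestamps and the fill value -999 are illustrative). Before the swap, `time` has no index, which is consistent with the size-mismatch error reported in this thread. Once the swap has created an index, reindexing onto a longer axis fills the new labels with `fill_value`:

```python
import numpy as np
import xarray as xr

# Made-up dataset mirroring the thread: 'timestamps' is a plain data variable
# along 'time', so the 'time' dimension has no index.
ds = xr.Dataset(
    {
        'obs': ('time', np.array([1.0, 2.0, 3.0])),
        'timestamps': ('time', np.array([0, 10, 20])),
    }
)
print('time' in ds.indexes)  # False

# First variant: make 'timestamps' a coordinate and swap it in as the dimension.
swapped = ds.set_coords(['timestamps']).swap_dims({'time': 'timestamps'})
print('timestamps' in swapped.indexes)  # True

# With an index in place, reindex onto a longer axis; new labels get fill_value.
extended = swapped.reindex(timestamps=[0, 10, 20, 30, 40], fill_value=-999.0)
print(extended['obs'].values.tolist())  # [1.0, 2.0, 3.0, -999.0, -999.0]
```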
(the only difference is the name of the dimension / coordinate you end up with)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1339549627,https://api.github.com/repos/pydata/xarray/issues/7363,1339549627,IC_kwDOAMm_X85P1-e7,8382834,2022-12-06T15:25:33Z,2022-12-06T15:25:33Z,CONTRIBUTOR,"A bit of context (sorry in advance for the screenshots rather than snippets; I could generate snippets if needed, it would just be a bit of extra work): my dataset initially looks like this (loaded from a netCDF file):

I add a coord so that it fits the documentation above:

However, the reindex then fails (whether I use time or timestamps):

If you have an idea why (I googled the error message but could not find much, though I may have missed something), that would be great :) .","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1339540315,https://api.github.com/repos/pydata/xarray/issues/7363,1339540315,IC_kwDOAMm_X85P18Nb,8382834,2022-12-06T15:18:26Z,2022-12-06T15:18:26Z,CONTRIBUTOR,"Sorry, it actually does seem to work when following the example from the documentation above, which adds a station... So I need to understand why it does not work in my case.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1339533763,https://api.github.com/repos/pydata/xarray/issues/7363,1339533763,IC_kwDOAMm_X85P16nD,8382834,2022-12-06T15:13:48Z,2022-12-06T15:13:48Z,CONTRIBUTOR,"Ahh, actually it seems like reindexing only works if the size remains the same (?). Getting:
```
ValueError: cannot reindex or align along dimension 'time' without an index because its size 3208464 is different from the size of the new index 3304800
```
so the reindex solution would not work.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1339499858,https://api.github.com/repos/pydata/xarray/issues/7363,1339499858,IC_kwDOAMm_X85P1yVS,8382834,2022-12-06T14:50:08Z,2022-12-06T14:50:08Z,CONTRIBUTOR,"I will do the following: use only int types in these ""critical"" dimensions (I should do so anyway); this way there will be no round-off issues when testing numerical equality. I am keeping this open so that maintainers can see it, but feel free to close it if you feel the initial suggestion is too close to reindex :) .","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1339455353,https://api.github.com/repos/pydata/xarray/issues/7363,1339455353,IC_kwDOAMm_X85P1nd5,8382834,2022-12-06T14:16:36Z,2022-12-06T14:16:36Z,CONTRIBUTOR,"Yes, this is exactly what I plan on doing; I will find my way around this by myself, no worries :) . I just wonder whether there may be float rounding issues, for example when ""matching"" values, that could lead to silent errors. I am just saying that ""re-allocated with a bunch more memory and default initializing new entries with a given value"" feels a bit safer to me, but of course I may just be a bit paranoid :) .","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1339450779,https://api.github.com/repos/pydata/xarray/issues/7363,1339450779,IC_kwDOAMm_X85P1mWb,5821660,2022-12-06T14:13:30Z,2022-12-06T14:13:30Z,MEMBER,"You could take the exact times you have and just add the additional times. You might even create the additional ones by giving a time interval and the number . I'd need to look up , but I'm currently only on my phone.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1339440638,https://api.github.com/repos/pydata/xarray/issues/7363,1339440638,IC_kwDOAMm_X85P1j3-,8382834,2022-12-06T14:06:02Z,2022-12-06T14:12:04Z,CONTRIBUTOR,"@kmuehlbauer many thanks for your answer :) . I think this is a very good fit indeed. The only drawback I see is the need to create the time array in advance (i.e. I have to tell xarray ""use this time array instead"" and trust that the right ""matching"" is done on the existing data, rather than just saying ""keep the existing arrays as they are, just extend their size and fill them with a default value""), but I agree this is otherwise equivalent to what I am asking for :) .
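A NumPy sketch of building that extended time array from a sampling interval and a count of new points, as suggested in this thread. The hourly step and the dates are made up, and a regular sampling interval is assumed:

```python
import numpy as np

# Made-up existing time axis: three hourly timestamps.
existing = np.array(
    ['2022-12-01T00:00', '2022-12-01T01:00', '2022-12-01T02:00'],
    dtype='datetime64[s]',
)

step = np.timedelta64(1, 'h')  # assumed regular sampling interval
n_new = 2                      # number of time points to append

# New labels continue from the last existing one: last + 1*step, last + 2*step, ...
new_times = existing[-1] + step * np.arange(1, n_new + 1)

# The existing labels are reused verbatim, followed by the new ones.
extended_times = np.concatenate([existing, new_times])
```

The first part of `extended_times` is the existing axis itself. When this array is passed to `reindex` (given an index on the time dimension), the existing labels therefore match exactly, and only the appended labels receive the fill value.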
I will update the SO thread with your suggestion, pointing to this issue and giving credit to you of course, if this is OK :) .
---
edit: re-reading the SO answer, I think it is not exactly what we are discussing here; I will wait for now.
edit: with your help, I was able to search SO better, and this is well described at https://stackoverflow.com/questions/70370667/how-do-i-expand-a-data-variable-along-a-time-dimension-using-xarray :) . Keeping this open just so that maintainers can decide whether the ""size extension"" is so close to ""reindex"" that they just want to direct users to ""reindex"", or whether they want to add a ""growdim"" or similar.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713
https://github.com/pydata/xarray/issues/7363#issuecomment-1339403307,https://api.github.com/repos/pydata/xarray/issues/7363,1339403307,IC_kwDOAMm_X85P1awr,5821660,2022-12-06T13:39:06Z,2022-12-06T13:39:33Z,MEMBER,"Would [xarray.Dataset.reindex](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.reindex.html) do what you want?
You would need to extend your time array/coordinate appropriately and feed it to reindex. Maybe you also need to provide the `fill_value` keyword to get the new portions filled with the correct fill value.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1479121713