id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
730792268,MDU6SXNzdWU3MzA3OTIyNjg=,4544,Writing subset of array with to_zarr,58827984,closed,0,,,2,2020-10-27T20:31:18Z,2022-04-11T13:31:13Z,2022-04-11T13:31:13Z,NONE,,,,"Related to #4035, just using 'region' might be the answer for me. Within my system, I reprocess a subset of an already written dataset (written to zarr) that I would then like to write back to zarr, overwriting the stored data. It seems like the only way to do that currently is to load the zarr array into memory, replace the changed bit, and then write the full array back with mode='w'. I have a hacky way of doing this (outside of to_zarr) that roughly mirrors how append_dim works with to_zarr: I specify an overwrite_dim, match the incoming data to the slice of the stored zarr array whose overwrite_dim coordinate has the same values, and use numpy-style indexing to overwrite just that part of the array.
```python
# initialize the zarr array
ds = xr.Dataset({'arr': (('time', 'data_dim'), np.ones((10, 3)))},
                coords={'time': np.arange(100, 110, 1), 'data_dim': np.arange(3)})
ans = ds.to_zarr(r'C:\collab\dasktest\data_dir\test', mode='w')
ans.get_variables()
# this would be the result of the first write
Frozen({'arr': array([[1., 1., 1.],
                      [1., 1., 1.],
                      [1., 1., 1.],
                      [1., 1., 1.],
                      [1., 1., 1.],
                      [1., 1., 1.],
                      [1., 1., 1.],
                      [1., 1., 1.],
                      [1., 1., 1.],
                      [1., 1., 1.]])})

# what I'd like to_zarr to do: overwrite part of the first write, leaving the rest intact
import zarr

overwrite_dim = 'time'
rg = zarr.open(r'C:\collab\dasktest\data_dir\test')
overwrite_subset_ds = xr.Dataset({'arr': (('time', 'data_dim'), np.full((2, 3), 2))},
                                 coords={'time': np.array([103, 104]), 'data_dim': np.arange(3)})
# positions in the stored array whose overwrite_dim values match the incoming data
overwrite_index = np.isin(rg[overwrite_dim], overwrite_subset_ds[overwrite_dim].values)
if overwrite_index.any():
    for darray in overwrite_subset_ds:
        if overwrite_dim in overwrite_subset_ds[darray].dims:
            d_dims = list(overwrite_subset_ds[darray].dims)
            # mask out the rows to be replaced and write the flattened incoming values
            msk = np.zeros_like(rg[darray], dtype=bool)
            msk[overwrite_index] = True
            rg[darray].set_mask_selection(msk, overwrite_subset_ds[darray].stack({'zwrite': d_dims}).values)
rg[darray][:]
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [2., 2., 2.],
       [2., 2., 2.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
```
So the old zarr array remains intact except for the portion I wanted changed. It would be nice if I could just do:
```python
overwrite_subset_ds.to_zarr(r'C:\collab\dasktest\data_dir\test', mode='w', overwrite_dim='time')
```
and it would just do it. Would this be in scope for to_zarr? It seems almost necessary if you are going to use xarray/zarr as a working system, with data changing as it is processed. Or maybe this is already possible and I need to RTFM?
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4544/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
928533488,MDU6SXNzdWU5Mjg1MzM0ODg=,5521,Memory inefficiency when using sortby,58827984,open,0,,,0,2021-06-23T18:31:18Z,2021-06-23T18:31:18Z,,NONE,,,,"**What happened**: High memory usage when sorting after loading from disk. Loading from disk used about 150 MB, whereas after the sort I saw usage of about 1.5 GB. I believe this is due to the reindexing, which requires the data to be loaded into memory during the sort. So I am not surprised, but I wanted to submit this as a possible issue to make sure my reasoning is sound. For my use case, I will have to abandon sortby and ensure data is sorted prior to writing to disk. I am afraid my MVCE relies on data on disk that only I have. If this is an actual issue that needs more looking into, I can provide an example that anyone can run. Otherwise I can close.

**Minimal Complete Verifiable Example**:

```python
import xarray as xr
from psutil import virtual_memory

startmem = virtual_memory().used
data = xr.open_zarr(r""D:\falkor\FK181005_processed\em302_105_10_06_2018\attitude.zarr"",
                    synchronizer=None, mask_and_scale=False, decode_coords=False,
                    decode_times=False, decode_cf=False, concat_characters=False)
afterload_mem = virtual_memory().used - startmem
ans = data.sortby('time')
aftersort_mem = virtual_memory().used - startmem
print('Without sort: {}'.format(afterload_mem))
print('With sort: {}'.format(aftersort_mem))

Out: Without sort: 149241856
Out: With sort: 1657593856
```

**Environment**:
Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 15:50:08) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: None
LOCALE: English_United States.1252
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.17.0
pandas: 1.2.3
numpy: 1.20.3
scipy: 1.6.0
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.6.1
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.1
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.03.0
distributed: 2021.03.0
matplotlib: 3.3.4
cartopy: 0.18.0
seaborn: 0.11.1
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: 6.2.2
IPython: 7.21.0
sphinx: 3.5.2
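
As a sketch of the workaround mentioned above (my own untested assumption, using synthetic data in place of the zarr store): the index itself is small, so checking whether it is already monotonic and computing its argsort are cheap, and a positional take with `isel` is what `sortby` reduces to; for dask-backed data the take stays lazy, though computing or writing the result will still materialize the reordered array:
```python
import numpy as np
import xarray as xr

# small numpy-backed stand-in for the on-disk dataset in the report
rng = np.random.default_rng(0)
ds = xr.Dataset({'v': (('time',), rng.random(1000))},
                coords={'time': rng.permutation(1000)})

# cheap check before paying for a sort: only the index is examined
already_sorted = ds.indexes['time'].is_monotonic_increasing

if not already_sorted:
    # argsort only loads the (small) index; the data take itself remains
    # lazy when the variables are dask arrays
    order = np.argsort(ds['time'].values)
    ds = ds.isel(time=order)
```
This does not reduce the peak memory of actually realizing the sorted result, which is why sorting before the initial write remains the practical fix here.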
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5521/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue