id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1056247970,I_kwDOAMm_X84-9RCi,5995,High memory usage of xarray vs netCDF4 function,49512274,closed,0,,,3,2021-11-17T15:13:19Z,2023-09-12T15:44:19Z,2023-09-12T15:44:18Z,NONE,,,,"Hi,
I would like to open a netCDF file, change some variable attributes, apply zlib compression, and sometimes change global attributes.
I used to do this with netCDF4, and it worked.
Recently, I tried using xarray to perform the same job. The results are the same, but xarray always loads the entire file into memory instead of writing variable by variable.
**Minimal example here**
Creation of the example file:
```python
import numpy as np
import xarray as xr

ds = xr.Dataset()
obs = 4835680
n = 20
basic_encoding = dict(zlib=True, shuffle=True, complevel=1)

# some variables with a scale factor
for i in range(3):
    vname = f""scale{i:02d}""
    ds[vname] = ([""obs""], np.random.rand(obs).astype(np.float32) / 1e3)
    ds[vname].encoding.update(basic_encoding)
    ds[vname].encoding.update({""dtype"": np.uint16, ""scale_factor"": 0.0001, ""add_offset"": 0, ""chunksizes"": (1611894,)})

# some variables without a scale factor
for i in range(3):
    vname = f""float{i:02d}""
    ds[vname] = ([""obs""], np.random.rand(obs).astype(np.float32))
    ds[vname].encoding.update(basic_encoding)
    ds[vname].encoding.update({""chunksizes"": (967136,)})

# some 2-dimensional variables, which use more memory
for i in range(3):
    vname = f""matrix{i:02d}""
    ds[vname] = ([""obs"", ""n""], np.random.rand(obs, n).astype(np.float32) * 10)
    ds[vname].encoding.update(basic_encoding)
    ds[vname].encoding.update({""dtype"": np.int16, ""scale_factor"": 0.01, ""add_offset"": 0, ""chunksizes"": (20000, 20)})

ds.to_netcdf(""/tmp/test_original.nc"")
```
Here is my old function to copy/rewrite the netCDF file, and the new function.
(I removed irrelevant changes from both functions to keep only the important parts.)
```python
import netCDF4
def old_copy(f_in, f_out):
    with netCDF4.Dataset(f_out, 'w') as h_out:
        with netCDF4.Dataset(f_in, 'r') as h_in:
            for dimension, size in h_in.dimensions.items():
                h_out.createDimension(dimension, len(size))
            for varname, var_in in h_in.variables.items():
                var_out = h_out.createVariable(
                    varname, var_in.dtype, var_in.dimensions,
                    zlib=True, complevel=2
                )
                for key in var_in.ncattrs():
                    if key != '_FillValue':
                        setattr(var_out, key, getattr(var_in, key))
                var_in.set_auto_maskandscale(False)
                var_out.set_auto_maskandscale(False)
                var_out[:] = var_in[:]
            for attr in h_in.ncattrs():
                setattr(h_out, attr, getattr(h_in, attr))

def new_copy(f_in, f_out):
    with xr.open_dataset(f_in) as d_in:
        d_in.to_netcdf(f_out)
```
Here I compare both functions in terms of memory usage:
```python
import holoviews as hv
from dask.diagnostics import ResourceProfiler, visualize

hv.extension(""bokeh"")
F_IN = ""/tmp/test_original.nc""
F_OUT = ""/tmp/test.nc""

!rm -rfv {F_OUT}
with ResourceProfiler(dt=0.1) as rprof_old:
    old_copy(F_IN, F_OUT)

!rm -rfv {F_OUT}
with ResourceProfiler(dt=0.1) as rprof_new:
    new_copy(F_IN, F_OUT)

visualize([rprof_old, rprof_new])
```

**What happened**:
xarray seems to load the entire file into memory before writing it back out.
**What you expected to happen**:
How can I tell xarray to read/write variable by variable without loading the entire file?
Thank you.
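In the meantime, here is one possible workaround (a sketch only, assuming dask is installed; the chunk size is arbitrary): open the file lazily with `chunks` so `to_netcdf` can stream the data chunk by chunk instead of loading everything at once.

```python
import numpy as np
import xarray as xr

def chunked_copy(f_in, f_out):
    # open lazily with dask so to_netcdf() writes chunk by chunk
    # instead of loading every variable into memory at once
    with xr.open_dataset(f_in, chunks={'obs': 1_000_000}) as d_in:
        d_in.to_netcdf(f_out)

# small self-contained demo (paths are hypothetical)
demo = xr.Dataset({'a': (('obs',), np.arange(100, dtype=np.float32))})
demo.to_netcdf('/tmp/demo_in.nc')
chunked_copy('/tmp/demo_in.nc', '/tmp/demo_out.nc')
```

I have not verified whether this keeps peak memory bounded for the file above; it depends on the backend writing dask-backed variables incrementally.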
**Environment**:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.6 (default, Jul 30 2021, 16:35:19)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-142-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.19.0
pandas: 1.3.2
numpy: 1.20.3
scipy: 1.6.2
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.8.1
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.07.2
distributed: 2021.07.2
matplotlib: 3.4.3
cartopy: 0.19.0
seaborn: None
numbagg: None
pint: 0.17
setuptools: 52.0.0.post20210125
pip: 21.2.2
conda: 4.10.3
pytest: 6.2.5
IPython: 7.26.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5995/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
673504545,MDU6SXNzdWU2NzM1MDQ1NDU=,4311,"uint32 variable in zarr, but float64 when loading with xarray",49512274,closed,0,,,1,2020-08-05T12:34:35Z,2021-04-19T08:59:51Z,2021-04-19T08:59:51Z,NONE,,,,"Hi all,
I started playing with xarray and zarr and came across something curious:
I create a zarr store with a uint32 variable. When I load this dataset with xarray, it loads as float64. Is this expected?
```python
import numpy as np
import zarr

fichier1 = ""/tmp/test.zarr""
zh = zarr.open(fichier1, ""w"")
example = np.zeros(10, dtype=np.uint32)
myvar = zh.create_dataset(
    ""myvar"",
    shape=example.shape,
    dtype=example.dtype,
)
myvar.attrs[""_ARRAY_DIMENSIONS""] = [""obs""]  # <- without this, the zarr dataset is not readable by xarray
myvar[:] = example

# dtype is uint32
zh.myvar.dtype
```
```python
>>> dtype('uint32')
```
When reloading with zarr:
```python
# dtype is still uint32
zh = zarr.open(fichier1, 'r')
zh.myvar.dtype
```
```python
>>> dtype('uint32')
```
But when loading with xarray:
```python
# dtype is float64
ds = xr.open_zarr(fichier1)
ds.myvar.dtype
```
```python
>>> dtype('float64')
```
Is this expected? Am I missing something?
link to the notebook created : [bad_dtype_zarr_xarray](https://github.com/ludwigVonKoopa/problems/blob/master/bad_dtype_zarr_xarray.ipynb)
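For reference, here is a sketch of what I suspect is happening (an assumption, not confirmed: zarr writes `fill_value=0` by default, xarray interprets it as a CF `_FillValue` and masks with NaN, which forces a promotion to float64). The same mechanism can be reproduced with `xr.decode_cf` alone:

```python
import numpy as np
import xarray as xr

# build an undecoded dataset that carries a _FillValue attribute,
# mimicking what the zarr backend hands to xarray's CF decoder
raw = xr.Dataset({'myvar': (('obs',), np.arange(1, 11, dtype=np.uint32))})
raw['myvar'].attrs['_FillValue'] = 0

decoded = xr.decode_cf(raw)
print(decoded.myvar.dtype)  # promoted to float64 by mask-and-scale decoding

# possible workaround when opening the store:
# ds = xr.open_zarr(fichier1, mask_and_scale=False)
```

If this is the cause, `mask_and_scale=False` should preserve the on-disk uint32, at the price of not masking fill values.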
**Environment**:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 (default, Jan 8 2020, 19:59:22)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-106-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.15.1
pandas: 1.0.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.3.2
cftime: 1.1.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.13.0
distributed: 2.13.0
matplotlib: 3.1.3
cartopy: None
seaborn: None
numbagg: None
setuptools: 46.1.1.post20200323
pip: 20.0.2
conda: None
pytest: 5.4.1
IPython: 7.13.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4311/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
778221436,MDU6SXNzdWU3NzgyMjE0MzY=,4763,Keep attributes across operations,49512274,closed,0,,,1,2021-01-04T16:45:45Z,2021-01-04T16:52:15Z,2021-01-04T16:52:15Z,NONE,,,,"Hi,
I came across [issue #2582](https://github.com/pydata/xarray/issues/2582) about arithmetic operations not keeping attributes on a DataArray.
Has a fix for this not been merged yet?
I just installed a fresh conda env with python3.8 & xarray 0.16.2 and the problem still persists:
```python
import numpy as np
import xarray as xr

ds = xr.Dataset({""a"": ((""x"",), np.array([1, 2, 3]))})
ds[""a""].attrs[""units""] = ""m""
ds.a
Out[1]:
<xarray.DataArray 'a' (x: 3)>
array([1, 2, 3])
Dimensions without coordinates: x
Attributes:
    units:    m
```
```python
ds[""b""] = ds.a * 2
ds.b
Out[2]:
array([2, 4, 6])
Dimensions without coordinates: x
```
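For reference, a sketch of the opt-in workaround (assuming the global `keep_attrs` option in `xr.set_options` is available in this version):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'a': (('x',), np.array([1, 2, 3]))})
ds['a'].attrs['units'] = 'm'

# opt in (globally or via a context manager) to keeping attributes
# through arithmetic operations
with xr.set_options(keep_attrs=True):
    ds['b'] = ds.a * 2

print(ds.b.attrs)  # expected: {'units': 'm'}
```

Attribute propagation stays opt-in because attributes like `units` can become wrong after arithmetic (e.g. `m * m` is not `m`), so xarray drops them by default.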
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-128-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.16.2
pandas: 1.1.5
numpy: 1.19.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 51.0.0.post20201207
pip: 20.3.3
conda: None
pytest: None
IPython: 7.19.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4763/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue