id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1056247970,I_kwDOAMm_X84-9RCi,5995,High memory usage of xarray vs netCDF4 function,49512274,closed,0,,,3,2021-11-17T15:13:19Z,2023-09-12T15:44:19Z,2023-09-12T15:44:18Z,NONE,,,,"Hi,
I would like to open a netCDF file, change some variable attributes, apply zlib compression, and sometimes change global attributes.
I used to do this with netCDF4, and it worked.
Recently, I tried using xarray to perform the same job. The results are the same, but xarray always loads the entire file into memory instead of writing variable by variable.
**Minimal example here**
Creation of the example file:
```python
import numpy as np
import xarray as xr

ds = xr.Dataset()
obs = 4835680
n = 20
basic_encoding = dict(zlib=True, shuffle=True, complevel=1)

# some variables with a scale factor
for i in range(3):
    vname = f""scale{i:02d}""
    ds[vname] = ([""obs""], np.random.rand(obs).astype(np.float32) / 1e3)
    ds[vname].encoding.update(basic_encoding)
    ds[vname].encoding.update({""dtype"": np.uint16, ""scale_factor"": 0.0001, ""add_offset"": 0, ""chunksizes"": (1611894,)})

# some variables without a scale factor
for i in range(3):
    vname = f""float{i:02d}""
    ds[vname] = ([""obs""], np.random.rand(obs).astype(np.float32))
    ds[vname].encoding.update(basic_encoding)
    ds[vname].encoding.update({""chunksizes"": (967136,)})

# some 2-dimensional variables, which use more memory
for i in range(3):
    vname = f""matrix{i:02d}""
    ds[vname] = ([""obs"", ""n""], np.random.rand(obs, n).astype(np.float32) * 10)
    ds[vname].encoding.update(basic_encoding)
    ds[vname].encoding.update({""dtype"": np.int16, ""scale_factor"": 0.01, ""add_offset"": 0, ""chunksizes"": (20000, 20)})

ds.to_netcdf(""/tmp/test_original.nc"")
```
Here is my old function to copy/rewrite the netCDF file, and the new function.
(I removed irrelevant changes from both functions to keep only the important parts.)
```python
import netCDF4
def old_copy(f_in, f_out):
    with netCDF4.Dataset(f_out, 'w') as h_out:
        with netCDF4.Dataset(f_in, 'r') as h_in:
            for dimension, size in h_in.dimensions.items():
                h_out.createDimension(dimension, len(size))
            for varname, var_in in h_in.variables.items():
                var_out = h_out.createVariable(
                    varname, var_in.dtype, var_in.dimensions,
                    zlib=True, complevel=2
                )
                for key in var_in.ncattrs():
                    if key != '_FillValue':
                        setattr(var_out, key, getattr(var_in, key))
                var_in.set_auto_maskandscale(False)
                var_out.set_auto_maskandscale(False)
                var_out[:] = var_in[:]
            for attr in h_in.ncattrs():
                setattr(h_out, attr, getattr(h_in, attr))

def new_copy(f_in, f_out):
    with xr.open_dataset(f_in) as d_in:
        d_in.to_netcdf(f_out)
```
Here I compare both functions in terms of memory usage:
```python
import holoviews as hv
from dask.diagnostics import ResourceProfiler, visualize

hv.extension(""bokeh"")
F_IN = ""/tmp/test_original.nc""
F_OUT = ""/tmp/test.nc""

!rm -rfv {F_OUT}
with ResourceProfiler(dt=0.1) as rprof_old:
    old_copy(F_IN, F_OUT)

!rm -rfv {F_OUT}
with ResourceProfiler(dt=0.1) as rprof_new:
    new_copy(F_IN, F_OUT)

visualize([rprof_old, rprof_new])
```

**What happened**:
xarray seems to load the entire file into memory before writing it back out.
**What you expected to happen**:
How can I tell xarray to read/write variable by variable without loading the entire file?
Thank you.
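In the meantime, here is one possible workaround (a sketch only, assuming dask is installed; the chunk size is arbitrary): open the file lazily with `chunks` so `to_netcdf` can stream the data chunk by chunk instead of loading everything at once.

```python
import numpy as np
import xarray as xr

def chunked_copy(f_in, f_out):
    # open lazily with dask so to_netcdf() writes chunk by chunk
    # instead of loading every variable into memory at once
    with xr.open_dataset(f_in, chunks={'obs': 1_000_000}) as d_in:
        d_in.to_netcdf(f_out)

# small self-contained demo (paths are hypothetical)
demo = xr.Dataset({'a': (('obs',), np.arange(100, dtype=np.float32))})
demo.to_netcdf('/tmp/demo_in.nc')
chunked_copy('/tmp/demo_in.nc', '/tmp/demo_out.nc')
```

I have not verified whether this keeps peak memory bounded for the file above; it depends on the backend writing dask-backed variables incrementally.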
**Environment**:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.6 (default, Jul 30 2021, 16:35:19)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-142-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.19.0
pandas: 1.3.2
numpy: 1.20.3
scipy: 1.6.2
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.8.1
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.07.2
distributed: 2021.07.2
matplotlib: 3.4.3
cartopy: 0.19.0
seaborn: None
numbagg: None
pint: 0.17
setuptools: 52.0.0.post20210125
pip: 21.2.2
conda: 4.10.3
pytest: 6.2.5
IPython: 7.26.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5995/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
673504545,MDU6SXNzdWU2NzM1MDQ1NDU=,4311,"uint32 variable in zarr, but float64 when loading with xarray",49512274,closed,0,,,1,2020-08-05T12:34:35Z,2021-04-19T08:59:51Z,2021-04-19T08:59:51Z,NONE,,,,"Hi all,
I started playing with xarray and zarr and came across something curious:
I create a zarr store with a uint32 variable. When I load this dataset with xarray, it loads as float64. Is this expected?
```python
import numpy as np
import zarr

fichier1 = ""/tmp/test.zarr""
zh = zarr.open(fichier1, ""w"")
example = np.zeros(10, dtype=np.uint32)
myvar = zh.create_dataset(
    ""myvar"",
    shape=example.shape,
    dtype=example.dtype,
)
myvar.attrs[""_ARRAY_DIMENSIONS""] = [""obs""]  # <- without this, the zarr dataset is not readable by xarray
myvar[:] = example

# dtype is uint32
zh.myvar.dtype
```
```python
>>> dtype('uint32')
```
When reloading with zarr:
```python
# dtype is still uint32
zh = zarr.open(fichier1, 'r')
zh.myvar.dtype
```
```python
>>> dtype('uint32')
```
But when loading with xarray:
```python
# dtype is float64
ds = xr.open_zarr(fichier1)
ds.myvar.dtype
```
```python
>>> dtype('float64')
```
Is this expected? Am I missing something?
link to the notebook created : [bad_dtype_zarr_xarray](https://github.com/ludwigVonKoopa/problems/blob/master/bad_dtype_zarr_xarray.ipynb)
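For reference, here is a sketch of what I suspect is happening (an assumption, not confirmed: zarr writes `fill_value=0` by default, xarray interprets it as a CF `_FillValue` and masks with NaN, which forces a promotion to float64). The same mechanism can be reproduced with `xr.decode_cf` alone:

```python
import numpy as np
import xarray as xr

# build an undecoded dataset that carries a _FillValue attribute,
# mimicking what the zarr backend hands to xarray's CF decoder
raw = xr.Dataset({'myvar': (('obs',), np.arange(1, 11, dtype=np.uint32))})
raw['myvar'].attrs['_FillValue'] = 0

decoded = xr.decode_cf(raw)
print(decoded.myvar.dtype)  # promoted to float64 by mask-and-scale decoding

# possible workaround when opening the store:
# ds = xr.open_zarr(fichier1, mask_and_scale=False)
```

If this is the cause, `mask_and_scale=False` should preserve the on-disk uint32, at the price of not masking fill values.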
**Environment**:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 (default, Jan 8 2020, 19:59:22)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-106-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.15.1
pandas: 1.0.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.3.2
cftime: 1.1.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.13.0
distributed: 2.13.0
matplotlib: 3.1.3
cartopy: None
seaborn: None
numbagg: None
setuptools: 46.1.1.post20200323
pip: 20.0.2
conda: None
pytest: 5.4.1
IPython: 7.13.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4311/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
778221436,MDU6SXNzdWU3NzgyMjE0MzY=,4763,Keep attributes across operations,49512274,closed,0,,,1,2021-01-04T16:45:45Z,2021-01-04T16:52:15Z,2021-01-04T16:52:15Z,NONE,,,,"Hi,
I came across [issue #2582](https://github.com/pydata/xarray/issues/2582) about arithmetic operations not keeping attributes on a DataArray.
Has a fix for this not been merged yet?
I just installed a fresh conda env with python3.8 & xarray 0.16.2 and the problem still persists:
```python
import numpy as np
import xarray as xr

ds = xr.Dataset({""a"": ((""x"",), np.array([1, 2, 3]))})
ds[""a""].attrs[""units""] = ""m""
ds.a
Out[1]:
<xarray.DataArray 'a' (x: 3)>
array([1, 2, 3])
Dimensions without coordinates: x
Attributes:
    units:    m
```
```python
ds[""b""] = ds.a * 2
ds.b
Out[2]:
array([2, 4, 6])
Dimensions without coordinates: x
```
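For reference, a sketch of the opt-in workaround (assuming the global `keep_attrs` option in `xr.set_options` is available in this version):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'a': (('x',), np.array([1, 2, 3]))})
ds['a'].attrs['units'] = 'm'

# opt in (globally or via a context manager) to keeping attributes
# through arithmetic operations
with xr.set_options(keep_attrs=True):
    ds['b'] = ds.a * 2

print(ds.b.attrs)  # expected: {'units': 'm'}
```

Attribute propagation stays opt-in because attributes like `units` can become wrong after arithmetic (e.g. `m * m` is not `m`), so xarray drops them by default.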
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-128-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.16.2
pandas: 1.1.5
numpy: 1.19.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 51.0.0.post20201207
pip: 20.3.3
conda: None
pytest: None
IPython: 7.19.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4763/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue