id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1778486450,PR_kwDOAMm_X85UHfL4,7948,Implement preferred_chunks for netcdf 4 backends,167802,closed,0,,,10,2023-06-28T08:43:30Z,2023-09-12T09:01:03Z,2023-09-11T23:05:49Z,CONTRIBUTOR,,0,pydata/xarray/pulls/7948,"According to the `open_dataset` documentation, using `chunks=""auto""` or `chunks={}` should yield datasets whose variables are chunked according to the preferred chunks of the backend. However, neither the netcdf4 nor the h5netcdf backend implements the `preferred_chunks` encoding attribute needed for this to work.
This PR adds that attribute to the encoding when the data is read. As a result, `chunks=""auto""` in `open_dataset` returns variables whose chunk sizes are multiples of the chunks in the nc file, and `chunks={}` returns variables with exactly the nc chunk sizes.
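For illustration, a minimal sketch of the resulting behaviour (the file name and its on-disk chunk sizes are hypothetical):
```python
import xarray as xr

# 'chunked.nc' stands in for any netCDF file written with on-disk chunking
ds_exact = xr.open_dataset('chunked.nc', chunks={})     # dask chunks == file chunks
ds_auto = xr.open_dataset('chunked.nc', chunks='auto')  # dask chunks are multiples of file chunks
```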
- [x] Closes #1440
- [x] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7948/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
277441150,MDU6SXNzdWUyNzc0NDExNTA=,1743,Assigning data to vector-indexed data doesn't seem to work,167802,closed,0,,,4,2017-11-28T16:06:56Z,2022-02-23T12:23:42Z,2017-12-09T03:29:35Z,CONTRIBUTOR,,,,"#### Code Sample
```python
import xarray as xr
import numpy as np
import dask.array as da
arr = np.arange(25).reshape((5, 5))
l_indices = xr.DataArray(np.array(((0, 1), (2, 3))), dims=['lines', 'cols'])
c_indices = xr.DataArray(np.array(((1, 3), (0, 2))), dims=['lines', 'cols'])
xarr = xr.DataArray(da.from_array(arr, chunks=10), dims=['y', 'x'])
print(xarr[l_indices, c_indices])
xarr[l_indices, c_indices] = 2
```
#### Problem description
This crashes on the last line with an
```
IndexError: Unlabeled multi-dimensional array cannot be used for indexing: [[0 1]
[2 3]]
```
I expect to be able to assign values through vectorized (multi-dimensional) indexers this way, but it doesn't work.
#### Expected Output
The expected output is the modified array with 2s at the indicated positions.
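For comparison, a sketch of the same assignment with plain NumPy advanced indexing, which works as expected:
```python
import numpy as np

arr = np.arange(25).reshape((5, 5))
lines = np.array(((0, 1), (2, 3)))
cols = np.array(((1, 3), (0, 2)))
arr[lines, cols] = 2  # sets positions (0, 1), (1, 3), (2, 0), (3, 2) to 2
```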
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.2.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: None.None
xarray: 0.10.0
pandas: 0.21.0
numpy: 1.13.3
scipy: 0.18.1
netCDF4: 1.1.8
h5netcdf: 0.4.2
Nio: None
bottleneck: None
cyordereddict: None
dask: 0.15.4
matplotlib: 1.2.0
cartopy: None
seaborn: None
setuptools: 36.2.1
pip: 9.0.1
conda: None
pytest: 3.1.3
IPython: 5.1.0
sphinx: 1.3.6
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1743/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
510892578,MDU6SXNzdWU1MTA4OTI1Nzg=,3433,Attributes are dropped after `clip` even if `keep_attrs` is True,167802,closed,0,,,5,2019-10-22T20:32:44Z,2020-10-14T16:29:52Z,2020-10-14T16:29:52Z,CONTRIBUTOR,,,,"#### MCVE Code Sample
```python
import xarray as xr
import numpy as np
arr = xr.DataArray(np.ones((5, 5)), attrs={'units': 'K'})
xr.set_options(keep_attrs=True)
arr
# <xarray.DataArray (dim_0: 5, dim_1: 5)>
# array([[1., 1., 1., 1., 1.],
# [1., 1., 1., 1., 1.],
# [1., 1., 1., 1., 1.],
# [1., 1., 1., 1., 1.],
# [1., 1., 1., 1., 1.]])
# Dimensions without coordinates: dim_0, dim_1
# Attributes:
# units: K
arr.clip(0, 1)
# <xarray.DataArray (dim_0: 5, dim_1: 5)>
# array([[1., 1., 1., 1., 1.],
# [1., 1., 1., 1., 1.],
# [1., 1., 1., 1., 1.],
# [1., 1., 1., 1., 1.],
# [1., 1., 1., 1., 1.]])
# Dimensions without coordinates: dim_0, dim_1
```
#### Expected Output
I would expect the attributes to be kept, since `keep_attrs=True` is set globally.
#### Problem Description
`keep_attrs` set to `True` doesn't seem to be respected with the `DataArray.clip` method.
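A manual workaround sketch, carrying the attributes over after the call:
```python
clipped = arr.clip(0, 1)
clipped.attrs = arr.attrs  # re-attach the attributes dropped by clip
```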
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1062.1.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.2
xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.0
scipy: 1.3.0
netCDF4: 1.5.1.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.28
cfgrib: None
iris: None
bottleneck: None
dask: 2.6.0
distributed: 2.6.0
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 41.4.0
pip: 19.3
conda: None
pytest: 5.0.1
IPython: 7.8.0
sphinx: 2.2.0
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3433/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
559645981,MDU6SXNzdWU1NTk2NDU5ODE=,3746,DataArray arithmetic restores removed coordinates in xarray 0.15,167802,closed,0,,,5,2020-02-04T11:06:40Z,2020-03-21T19:03:51Z,2020-03-21T19:03:51Z,CONTRIBUTOR,,,,"#### MCVE Code Sample
```python
import xarray as xr
import numpy as np
arr2 = xr.DataArray(np.ones((2, 2)), dims=['y', 'x'])
arr1 = xr.DataArray(np.ones((2, 2)), dims=['y', 'x'], coords={'y': [0, 1], 'x': [0, 1]})
del arr1.coords['y']
del arr1.coords['x']
# shows arr1 without coordinates
arr1
# shows coordinates in xarray 0.15
arr1 * arr2
```
#### Expected Output
```python
<xarray.DataArray (y: 2, x: 2)>
array([[1., 1.],
[1., 1.]])
Dimensions without coordinates: y, x
```
#### Problem Description
In xarray 0.15, the coordinates are restored when doing the multiplication:
```python
<xarray.DataArray (y: 2, x: 2)>
array([[1., 1.],
[1., 1.]])
Coordinates:
* y (y) int64 0 1
* x (x) int64 0 1
```
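A workaround sketch for now is to delete the coordinates from the result, as was done for `arr1`:
```python
result = arr1 * arr2
del result.coords['y']
del result.coords['x']
```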
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.1 | packaged by conda-forge | (default, Jan 29 2020, 14:55:04)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-147.0.3.el8_1.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.0
pandas: 1.0.0
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: 2.3.2
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: 1.3.1
dask: 2.10.1
distributed: 2.10.0
matplotlib: 3.1.3
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 45.1.0.post20200119
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: 2.3.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3746/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
495198361,MDU6SXNzdWU0OTUxOTgzNjE=,3317,Can't create weakrefs on DataArrays since xarray 0.13.0,167802,closed,0,6213168,,8,2019-09-18T12:36:46Z,2019-10-14T21:38:09Z,2019-09-18T15:53:51Z,CONTRIBUTOR,,,,"#### MCVE Code Sample
```python
import xarray as xr
from weakref import ref
arr = xr.DataArray([1, 2, 3])
ref(arr)
```
#### Expected Output
I expect the weak reference to be created as in former versions
#### Problem Description
The above code raises the following exception:
`TypeError: cannot create weak reference to 'DataArray' object`
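This is presumably because `DataArray` now defines `__slots__` without a `__weakref__` slot; a class that does this cannot be weakly referenced, as this pure-Python sketch shows:
```python
from weakref import ref

class NoWeakref:
    __slots__ = ('data',)  # no '__weakref__' slot

class WithWeakref:
    __slots__ = ('data', '__weakref__')

ref(WithWeakref())  # works
ref(NoWeakref())    # TypeError: cannot create weak reference to 'NoWeakref' object
```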
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1062.1.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.13.0
pandas: 0.25.1
numpy: 1.17.0
scipy: 1.3.0
netCDF4: 1.5.1.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.22
cfgrib: None
iris: None
bottleneck: None
dask: 2.3.0
distributed: 2.4.0
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: 5.0.1
IPython: 7.8.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3317/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
387732534,MDExOlB1bGxSZXF1ZXN0MjM2MTU0NTUz,2591,Fix h5netcdf saving scalars with filters or chunks,167802,closed,0,,,8,2018-12-05T12:22:40Z,2018-12-11T07:27:27Z,2018-12-11T07:24:36Z,CONTRIBUTOR,,0,pydata/xarray/pulls/2591," - [x] Closes #2563
- [x] Tests added (for all bug fixes or enhancements)
- [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2591/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
383667887,MDU6SXNzdWUzODM2Njc4ODc=,2563,Scalars from netcdf dataset can't be written with h5netcdf,167802,closed,0,,,1,2018-11-22T22:44:48Z,2018-12-11T07:24:36Z,2018-12-11T07:24:36Z,CONTRIBUTOR,,,,"#### Code Sample, a copy-pastable example if possible
A ""Minimal, Complete and Verifiable Example"" will make it much easier for maintainers to help you:
http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
```python
import xarray as xr
from netCDF4 import Dataset
def write_netcdf(filename, zlib, least_significant_digit, data, dtype='f4',
                 shuffle=False, contiguous=False, chunksizes=None,
                 complevel=6, fletcher32=False):
    nc_file = Dataset(filename, 'w')
    nc_file.createDimension('n', 1)
    foo = nc_file.createVariable('data', dtype, ('n',), zlib=zlib,
                                 least_significant_digit=least_significant_digit,
                                 shuffle=shuffle, contiguous=contiguous,
                                 complevel=complevel, fletcher32=fletcher32,
                                 chunksizes=chunksizes)
    foo[:] = data
    nc_file.close()

write_netcdf('mydatafile.nc', True, None, 0.0, shuffle=True, chunksizes=(1,))
data = xr.open_dataset('mydatafile.nc')
arr = data['data']
arr[0].to_netcdf('mytestfile.nc', mode='w', engine='h5netcdf')
```
#### Problem description
The above example crashes with a TypeError since xarray 0.10.4 (it worked with earlier versions, hence reporting the error here and not in e.g. h5netcdf):
`TypeError: Scalar datasets don't support chunk/filter options`
The problem is that it is no longer possible to write out a scalar taken from an array that comes from a netCDF file that was compressed or filtered.
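A workaround sketch is to drop the chunk/filter encoding inherited from the source file before writing (these encoding keys are set by the netCDF4 backend on read):
```python
scalar = arr[0]
for key in ('chunksizes', 'zlib', 'complevel', 'shuffle', 'fletcher32', 'contiguous'):
    scalar.encoding.pop(key, None)
scalar.to_netcdf('mytestfile.nc', mode='w', engine='h5netcdf')
```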
#### Expected Output
The expected behaviour is that writing the trimmed netCDF file succeeds.
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
xarray: 0.11.0
pandas: 0.23.4
numpy: 1.15.4
scipy: 1.1.0
netCDF4: 1.3.1
h5netcdf: 0.6.2
h5py: 2.8.0
Nio: None
zarr: None
cftime: None
PseudonetCDF: None
rasterio: 1.0.2
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.20.2
distributed: None
matplotlib: 3.0.0
cartopy: 0.16.0
seaborn: None
setuptools: 40.5.0
pip: 9.0.3
conda: None
pytest: None
IPython: 6.2.1
sphinx: 1.8.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2563/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
255989233,MDU6SXNzdWUyNTU5ODkyMzM=,1560,DataArray.unstack taking unreasonable amounts of memory,167802,closed,0,,,11,2017-09-07T16:01:50Z,2018-08-15T00:18:28Z,2018-08-15T00:18:28Z,CONTRIBUTOR,,,,"Hi,
While trying to support DataArrays in pyresample, I stumbled upon what seems to me to be a bug: unstacking a dimension takes an unreasonable amount of memory. For example:
```python
from xarray import DataArray
import numpy as np
arr = DataArray(np.empty([1, 8996, 9223])).stack(flat_dim=['dim_1', 'dim_2'])
print(arr)
arr.unstack('flat_dim')
```
peaks at about 8 GB of memory (as reported by top), while the array itself shouldn't take much more than roughly 635 MB. I know my measuring method is not very accurate, but should it be this way?
As a side note, the unstacking also takes a very long time. What is going on under the hood?
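For a stacked dimension that is still complete and in its original order, a far cheaper workaround sketch (using the imports above) is to reshape the underlying array directly rather than going through the unstack machinery:
```python
unstacked = DataArray(arr.values.reshape((1, 8996, 9223)),
                      dims=['dim_0', 'dim_1', 'dim_2'])
```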
Martin","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1560/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
296673404,MDU6SXNzdWUyOTY2NzM0MDQ=,1906,Coordinate attributes as DataArray type doesn't export to netcdf,167802,closed,0,,,5,2018-02-13T09:48:53Z,2018-02-26T09:34:24Z,2018-02-26T09:34:24Z,CONTRIBUTOR,,,,"#### Code Sample, a copy-pastable example if possible
```python
import numpy as np
import xarray as xr
arr = xr.DataArray([[1, 2, 3]], dims=['time', 'x'])
arr['time'] = np.array([1])
time_bnds = xr.DataArray([0, 1], dims='time_bounds')
arr['time'].attrs['bounds'] = time_bnds
dataset = xr.Dataset({'arr': arr,
'time_bnds': time_bnds})
dataset.to_netcdf('time_bnd.nc')
```
#### Problem description
This code produces a TypeError
```
Traceback (most recent call last):
File ""test_time_bounds.py"", line 12, in
dataset.to_netcdf('time_bnd.nc')
File ""/home/a001673/.local/lib/python2.7/site-packages/xarray/core/dataset.py"", line 1132, in to_netcdf
unlimited_dims=unlimited_dims)
File ""/home/a001673/.local/lib/python2.7/site-packages/xarray/backends/api.py"", line 598, in to_netcdf
_validate_attrs(dataset)
File ""/home/a001673/.local/lib/python2.7/site-packages/xarray/backends/api.py"", line 121, in _validate_attrs
check_attr(k, v)
File ""/home/a001673/.local/lib/python2.7/site-packages/xarray/backends/api.py"", line 112, in check_attr
'files'.format(value))
TypeError: Invalid value for attr: <xarray.DataArray (time_bounds: 2)>
array([0, 1])
Dimensions without coordinates: time_bounds must be a number string, ndarray or a list/tuple of numbers/strings for serialization to netCDF files
```
This is a problem for me because we need to provide attributes to the coordinate variables and save them to netCDF in order to be CF compliant. There are workarounds (like saving `time_bnds` as a regular variable and putting its name as an attribute of the `time` variable, as sketched below), but the provided code seems to be the most intuitive way to do it.
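A sketch of that workaround, storing the bounds variable under its own name and referencing it by name (as CF expects):
```python
arr['time'].attrs['bounds'] = 'time_bnds'  # reference by name instead of by DataArray
dataset = xr.Dataset({'arr': arr, 'time_bnds': time_bnds})
dataset.to_netcdf('time_bnd.nc')
```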
#### Expected output
I would expect an output like this (ncdump -h):
```
netcdf time_bnd {
dimensions:
time = 1 ;
time_bounds = 2 ;
x = 3 ;
variables:
int64 time(time) ;
time:bounds = ""time_bnds"" ;
int64 time_bnds(time_bounds) ;
int64 arr(time, x) ;
```
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.11.6.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: None.None
xarray: 0.10.0
pandas: 0.21.0
numpy: 1.13.3
scipy: 0.18.1
netCDF4: 1.1.8
h5netcdf: 0.4.2
Nio: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.1
matplotlib: 2.1.0
cartopy: None
seaborn: None
setuptools: 38.4.0
pip: 9.0.1
conda: None
pytest: 3.1.3
IPython: 5.5.0
sphinx: 1.6.6
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1906/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
289972054,MDU6SXNzdWUyODk5NzIwNTQ=,1842,DataArray read from netcdf with unexpected type,167802,closed,0,,,1,2018-01-19T13:15:11Z,2018-01-23T20:15:29Z,2018-01-23T20:15:29Z,CONTRIBUTOR,,,,"#### Code Sample, a copy-pastable example if possible
```python
import numpy as np
import h5netcdf
filename = ""mask_and_scale_float32.nc""
with h5netcdf.File(filename, 'w') as f:
f.dimensions = {'x': 5}
v = f.create_variable('hello', ('x',), dtype=np.uint16)
v[:] = np.ones(5, dtype=np.uint16)
v[0] = np.uint16(65535)
v.attrs['_FillValue'] = np.uint16(65535)
v.attrs['scale_factor'] = np.float32(2)
v.attrs['add_offset'] = np.float32(0.5)
import xarray as xr
v = xr.open_dataset(filename, mask_and_scale=True)['hello']
print(v.dtype)
```
#### Problem description
Since `scale_factor` and `add_offset` are float32, I would expect the loaded result to be a float32 array; instead, we get a float64 array. For a very large dataset, a float32 array is better for memory use and computation speed.
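A workaround sketch is to cast back after loading:
```python
v = xr.open_dataset(filename, mask_and_scale=True)['hello'].astype(np.float32)
```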
#### Expected Output
float32
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.11.6.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: None.None
xarray: 0.10.0
pandas: 0.21.0
numpy: 1.13.3
scipy: 0.18.1
netCDF4: 1.1.8
h5netcdf: 0.4.2
Nio: None
bottleneck: None
cyordereddict: None
dask: 0.16.0+37.g1fef002
matplotlib: 2.1.0
cartopy: None
seaborn: None
setuptools: 38.2.4
pip: 9.0.1
conda: None
pytest: 3.1.3
IPython: 5.5.0
sphinx: 1.3.6
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1842/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue