id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 1778486450,PR_kwDOAMm_X85UHfL4,7948,Implement preferred_chunks for netcdf 4 backends,167802,closed,0,,,10,2023-06-28T08:43:30Z,2023-09-12T09:01:03Z,2023-09-11T23:05:49Z,CONTRIBUTOR,,0,pydata/xarray/pulls/7948,"According to the `open_dataset` documentation, using `chunks=""auto""` or `chunks={}` should yield datasets with variables chunked depending on the preferred chunks of the backend. However neither the netcdf4 nor the h5netcdf backend seem to implement the `preferred_chunks` encoding attribute needed for this to work. This PR adds this attribute to the encoding upon data reading. This results in `chunks=""auto""` in `open_dataset` returning variables with chunk sizes multiples of the chunks in the nc file, and for `chunks={}`, returning the variables with the exact nc chunk sizes. 
- [x] Closes #1440 - [x] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7948/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 277441150,MDU6SXNzdWUyNzc0NDExNTA=,1743,Assigning data to vector-indexed data doesn't seem to work,167802,closed,0,,,4,2017-11-28T16:06:56Z,2022-02-23T12:23:42Z,2017-12-09T03:29:35Z,CONTRIBUTOR,,,,"#### Code Sample ```python import xarray as xr import numpy as np import dask.array as da arr = np.arange(25).reshape((5, 5)) l_indices = xr.DataArray(np.array(((0, 1), (2, 3))), dims=['lines', 'cols']) c_indices = xr.DataArray(np.array(((1, 3), (0, 2))), dims=['lines', 'cols']) xarr = xr.DataArray(da.from_array(arr, chunks=10), dims=['y', 'x']) print(xarr[l_indices, c_indices]) xarr[l_indices, c_indices] = 2 ``` #### Problem description This crashes on the last line with a ``` IndexError: Unlabeled multi-dimensional array cannot be used for indexing: [[0 1] [2 3]] ``` I'm expecting to be able to do assignment this way, and it doesn't work. #### Expected Output Expected output is the modified array with 2's in the indicated positions #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 2.7.5.final.0 python-bits: 64 OS: Linux OS-release: 3.10.0-693.2.2.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: None.None xarray: 0.10.0 pandas: 0.21.0 numpy: 1.13.3 scipy: 0.18.1 netCDF4: 1.1.8 h5netcdf: 0.4.2 Nio: None bottleneck: None cyordereddict: None dask: 0.15.4 matplotlib: 1.2.0 cartopy: None seaborn: None setuptools: 36.2.1 pip: 9.0.1 conda: None pytest: 3.1.3 IPython: 5.1.0 sphinx: 1.3.6
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1743/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 510892578,MDU6SXNzdWU1MTA4OTI1Nzg=,3433,Attributes are dropped after `clip` even if `keep_attrs` is True,167802,closed,0,,,5,2019-10-22T20:32:44Z,2020-10-14T16:29:52Z,2020-10-14T16:29:52Z,CONTRIBUTOR,,,,"#### MCVE Code Sample ```python import xarray as xr import numpy as np arr = xr.DataArray(np.ones((5, 5)), attrs={'units': 'K'}) xr.set_options(keep_attrs=True) arr # # array([[1., 1., 1., 1., 1.], # [1., 1., 1., 1., 1.], # [1., 1., 1., 1., 1.], # [1., 1., 1., 1., 1.], # [1., 1., 1., 1., 1.]]) # Dimensions without coordinates: dim_0, dim_1 # Attributes: # units: K arr.clip(0, 1) # # array([[1., 1., 1., 1., 1.], # [1., 1., 1., 1., 1.], # [1., 1., 1., 1., 1.], # [1., 1., 1., 1., 1.], # [1., 1., 1., 1., 1.]]) # Dimensions without coordinates: dim_0, dim_1 ``` #### Expected Output I would expect the attributes to be kept #### Problem Description `keep_attrs` set to `True` doesn't seem to be respected with the `DataArray.clip` method. #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1062.1.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.6.2 xarray: 0.14.0 pandas: 0.25.1 numpy: 1.17.0 scipy: 1.3.0 netCDF4: 1.5.1.2 pydap: None h5netcdf: 0.7.4 h5py: 2.10.0 Nio: None zarr: 2.3.2 cftime: 1.0.3.4 nc_time_axis: None PseudoNetCDF: None rasterio: 1.0.28 cfgrib: None iris: None bottleneck: None dask: 2.6.0 distributed: 2.6.0 matplotlib: 3.1.1 cartopy: 0.17.0 seaborn: None numbagg: None setuptools: 41.4.0 pip: 19.3 conda: None pytest: 5.0.1 IPython: 7.8.0 sphinx: 2.2.0
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3433/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 559645981,MDU6SXNzdWU1NTk2NDU5ODE=,3746,dataarray arithmetics restore removed coordinates in xarray 0.15,167802,closed,0,,,5,2020-02-04T11:06:40Z,2020-03-21T19:03:51Z,2020-03-21T19:03:51Z,CONTRIBUTOR,,,,"#### MCVE Code Sample ```python import xarray as xr import numpy as np arr2 = xr.DataArray(np.ones((2, 2)), dims=['y', 'x']) arr1 = xr.DataArray(np.ones((2, 2)), dims=['y', 'x'], coords={'y': [0, 1], 'x': [0, 1]}) del arr1.coords['y'] del arr1.coords['x'] # shows arr1 without coordinates arr1 # shows coordinates in xarray 0.15 arr1 * arr2 ``` #### Expected Output ```python array([[1., 1.], [1., 1.]]) Dimensions without coordinates: y, x ``` #### Problem Description In xarray 0.15, the coordinates are restored when doing the multiplication: ```python array([[1., 1.], [1., 1.]]) Coordinates: * y (y) int64 0 1 * x (x) int64 0 1 ``` #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.8.1 | packaged by conda-forge | (default, Jan 29 2020, 14:55:04) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.18.0-147.0.3.el8_1.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.3 xarray: 0.15.0 pandas: 1.0.0 numpy: 1.18.1 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: 0.7.4 h5py: 2.10.0 Nio: None zarr: 2.3.2 cftime: 1.0.4.2 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.2 cfgrib: None iris: None bottleneck: 1.3.1 dask: 2.10.1 distributed: 2.10.0 matplotlib: 3.1.3 cartopy: 0.17.0 seaborn: None numbagg: None setuptools: 45.1.0.post20200119 pip: 20.0.2 conda: None pytest: 5.3.5 IPython: 7.12.0 sphinx: 2.3.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3746/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 495198361,MDU6SXNzdWU0OTUxOTgzNjE=,3317,Can't create weakrefs on DataArrays since xarray 0.13.0,167802,closed,0,6213168,,8,2019-09-18T12:36:46Z,2019-10-14T21:38:09Z,2019-09-18T15:53:51Z,CONTRIBUTOR,,,,"#### MCVE Code Sample ```python import xarray as xr from weakref import ref arr = xr.DataArray([1, 2, 3]) ref(arr) ``` #### Expected Output I expect the weak reference to be created as in former versions #### Problem Description The above code raises the following exception: `TypeError: cannot create weak reference to 'DataArray' object` #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1062.1.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.13.0 pandas: 0.25.1 numpy: 1.17.0 scipy: 1.3.0 netCDF4: 1.5.1.2 pydap: None h5netcdf: 0.7.4 h5py: 2.9.0 Nio: None zarr: None cftime: 1.0.3.4 nc_time_axis: None PseudoNetCDF: None rasterio: 1.0.22 cfgrib: None iris: None bottleneck: None dask: 2.3.0 distributed: 2.4.0 matplotlib: 3.1.1 cartopy: 0.17.0 seaborn: None numbagg: None setuptools: 41.2.0 pip: 19.2.3 conda: None pytest: 5.0.1 IPython: 7.8.0 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3317/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 387732534,MDExOlB1bGxSZXF1ZXN0MjM2MTU0NTUz,2591,Fix h5netcdf saving scalars with filters or chunks,167802,closed,0,,,8,2018-12-05T12:22:40Z,2018-12-11T07:27:27Z,2018-12-11T07:24:36Z,CONTRIBUTOR,,0,pydata/xarray/pulls/2591," - [x] Closes #2563 - [x] Tests added (for all bug fixes or enhancements) - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later) ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2591/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 383667887,MDU6SXNzdWUzODM2Njc4ODc=,2563,Scalars from netcdf dataset can't be written with h5netcdf,167802,closed,0,,,1,2018-11-22T22:44:48Z,2018-12-11T07:24:36Z,2018-12-11T07:24:36Z,CONTRIBUTOR,,,,"#### Code Sample, a copy-pastable example if possible A ""Minimal, Complete and Verifiable Example"" will make it much easier for maintainers to help you: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports ```python import xarray as xr from netCDF4 import Dataset def write_netcdf(filename,zlib,least_significant_digit,data,dtype='f4',shuffle=False,contiguous=False,\ chunksizes=None,complevel=6,fletcher32=False): file = Dataset(filename,'w') file.createDimension('n', 1) foo = file.createVariable('data',\ dtype,('n'),zlib=zlib,least_significant_digit=least_significant_digit,\ shuffle=shuffle,contiguous=contiguous,complevel=complevel,fletcher32=fletcher32,chunksizes=chunksizes) foo[:] = data file.close() 
write_netcdf(""mydatafile.nc"",True,None,0.0,shuffle=True, chunksizes=(1,)) data = xr.open_dataset('mydatafile.nc') arr = data['data'] arr[0].to_netcdf('mytestfile.nc', mode='w', engine='h5netcdf') ``` #### Problem description The above example crashes with a TypeError since xarray 0.10.4 (works before, hence reporting the error here and not in eg. h5netcdf): `TypeError: Scalar datasets don't support chunk/filter options` The problem here is that it is not anymore possible to squeeze an array that comes from a netcdf file that was compressed or filtered. #### Expected Output The expected output is that the creation of the trimmed netcdf file works. #### Output of ``xr.show_versions()``
>>> xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.6.6.final.0 python-bits: 64 OS: Linux OS-release: 3.10.0-957.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 xarray: 0.11.0 pandas: 0.23.4 numpy: 1.15.4 scipy: 1.1.0 netCDF4: 1.3.1 h5netcdf: 0.6.2 h5py: 2.8.0 Nio: None zarr: None cftime: None PseudonetCDF: None rasterio: 1.0.2 iris: None bottleneck: 1.2.1 cyordereddict: None dask: 0.20.2 distributed: None matplotlib: 3.0.0 cartopy: 0.16.0 seaborn: None setuptools: 40.5.0 pip: 9.0.3 conda: None pytest: None IPython: 6.2.1 sphinx: 1.8.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2563/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 255989233,MDU6SXNzdWUyNTU5ODkyMzM=,1560,DataArray.unstack taking unreasonable amounts of memory,167802,closed,0,,,11,2017-09-07T16:01:50Z,2018-08-15T00:18:28Z,2018-08-15T00:18:28Z,CONTRIBUTOR,,,,"Hi, While trying to support DataArrays in pyresample, I stumble upon what seems to me to be a bug. It looks like unstacking a dimension takes unreasonable amounts of memory. For example: ```python from xarray import DataArray import numpy as np arr = DataArray(np.empty([1, 8996, 9223])).stack(flat_dim=['dim_1', 'dim_2']) print(arr) arr.unstack('flat_dim') ``` peaks at about 8GB of my memory (in top), while the array in itself isn't supposed to take more than 635MB approximately. I know my measuring method is not very accurate, but should it be this way ? As a side note, the unstacking also takes a very long time. What is going on under the hood ? 
Martin","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1560/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 296673404,MDU6SXNzdWUyOTY2NzM0MDQ=,1906,Coordinate attributes as DataArray type doesn't export to netcdf,167802,closed,0,,,5,2018-02-13T09:48:53Z,2018-02-26T09:34:24Z,2018-02-26T09:34:24Z,CONTRIBUTOR,,,,"#### Code Sample, a copy-pastable example if possible ```python import numpy as np import xarray as xr arr = xr.DataArray([[1, 2, 3]], dims=['time', 'x']) arr['time'] = np.array([1]) time_bnds = xr.DataArray([0, 1], dims='time_bounds') arr['time'].attrs['bounds'] = time_bnds dataset = xr.Dataset({'arr': arr, 'time_bnds': time_bnds}) dataset.to_netcdf('time_bnd.nc') ``` #### Problem description This code produces a TypeError ``` Traceback (most recent call last): File ""test_time_bounds.py"", line 12, in dataset.to_netcdf('time_bnd.nc') File ""/home/a001673/.local/lib/python2.7/site-packages/xarray/core/dataset.py"", line 1132, in to_netcdf unlimited_dims=unlimited_dims) File ""/home/a001673/.local/lib/python2.7/site-packages/xarray/backends/api.py"", line 598, in to_netcdf _validate_attrs(dataset) File ""/home/a001673/.local/lib/python2.7/site-packages/xarray/backends/api.py"", line 121, in _validate_attrs check_attr(k, v) File ""/home/a001673/.local/lib/python2.7/site-packages/xarray/backends/api.py"", line 112, in check_attr 'files'.format(value)) TypeError: Invalid value for attr: array([0, 1]) Dimensions without coordinates: time_bounds must be a number string, ndarray or a list/tuple of numbers/strings for serialization to netCDF files ``` This is a problem for me because we need to provide attributes to the coordinate variables and save the to netcdf in order to be CF compliant. 
There are workarounds (like saving the `time_bnds` as a regular variable and putting its name as an attribute of the `time` variable) , but the provided code seems to be the most intuitive way to do it. #### Expected output I would expect an output like this (ncdump -h): ``` netcdf time_bnd { dimensions: time = 1 ; time_bounds = 2 ; x = 3 ; variables: int64 time(time) ; time:bounds = ""time_bnds"" ; int64 time_bnds(time_bounds) ; int64 arr(time, x) ; ``` #### Output of ``xr.show_versions()``
In [2]: xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 2.7.5.final.0 python-bits: 64 OS: Linux OS-release: 3.10.0-693.11.6.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: None.None xarray: 0.10.0 pandas: 0.21.0 numpy: 1.13.3 scipy: 0.18.1 netCDF4: 1.1.8 h5netcdf: 0.4.2 Nio: None bottleneck: 1.2.1 cyordereddict: None dask: 0.16.1 matplotlib: 2.1.0 cartopy: None seaborn: None setuptools: 38.4.0 pip: 9.0.1 conda: None pytest: 3.1.3 IPython: 5.5.0 sphinx: 1.6.6
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1906/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 289972054,MDU6SXNzdWUyODk5NzIwNTQ=,1842,DataArray read from netcdf with unexpected type,167802,closed,0,,,1,2018-01-19T13:15:11Z,2018-01-23T20:15:29Z,2018-01-23T20:15:29Z,CONTRIBUTOR,,,,"#### Code Sample, a copy-pastable example if possible ```python import numpy as np import h5netcdf filename = ""mask_and_scale_float32.nc"" with h5netcdf.File(filename, 'w') as f: f.dimensions = {'x': 5} v = f.create_variable('hello', ('x',), dtype=np.uint16) v[:] = np.ones(5, dtype=np.uint16) v[0] = np.uint16(65535) v.attrs['_FillValue'] = np.uint16(65535) v.attrs['scale_factor'] = np.float32(2) v.attrs['add_offset'] = np.float32(0.5) import xarray as xr v = xr.open_dataset(filename, mask_and_scale=True)['hello'] print(v.dtype) ``` #### Problem description The `scale_factor` and `add_offset` being float32, I would expect the result from loading to be a float32 array. However, we get a float64 array instead. A float32 array for a very large dataset is better for faster computations. #### Expected Output float32 #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 2.7.5.final.0 python-bits: 64 OS: Linux OS-release: 3.10.0-693.11.6.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: None.None xarray: 0.10.0 pandas: 0.21.0 numpy: 1.13.3 scipy: 0.18.1 netCDF4: 1.1.8 h5netcdf: 0.4.2 Nio: None bottleneck: None cyordereddict: None dask: 0.16.0+37.g1fef002 matplotlib: 2.1.0 cartopy: None seaborn: None setuptools: 38.2.4 pip: 9.0.1 conda: None pytest: 3.1.3 IPython: 5.5.0 sphinx: 1.3.6
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1842/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue