id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
718716799,MDU6SXNzdWU3MTg3MTY3OTk=,4501,Scalar non-dimension coords forget their heritage,1053153,open,0,,,7,2020-10-10T22:44:25Z,2023-09-22T04:50:34Z,,CONTRIBUTOR,,,,"I'm not sure if this is a bug or a feature, to be honest. I noticed that expanded scalar coordinates retain their scalar values, and so thought that non-dimension coordinates might keep their references to each other and do the same.

**What happened**:

When a dimension is squeezed or selected with a scalar, the associated non-dimension coordinates are dissociated from the dimension. When the dimension is expanded later, the previously associated non-dimension coordinates are not expanded.

**What you expected to happen**:

The previously associated non-dimension coordinates should be expanded as well.

**Minimal Complete Verifiable Example**:

```python
import numpy as np
import xarray as xr

arr1 = xr.DataArray(np.zeros((1, 5)), dims=['y', 'x'], coords={'e': ('y', [10])})
arr2 = arr1.squeeze('y').expand_dims('y')
xr.testing.assert_identical(arr1, arr2)
```

Error:

```
AssertionError: Left and right DataArray objects are not identical

Differing coordinates:
L   e        (y) int64 10
R   e        int64 10
```

Taken another way, I would like these statements to be possible:

```
xr.DataArray(np.zeros(5), dims=['x'], coords={'y': 0, 'a': ('y', 1)})
xr.DataArray(np.zeros((0, 5)), dims=['y', 'x'], coords={'e': ('y', 10)})
```

**Environment**:
Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Jul 28 2020, 12:59:40) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-48-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.16.1
pandas: 1.1.3
numpy: 1.19.2
scipy: 1.5.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.5.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.30.0
distributed: 2.30.0
matplotlib: 3.3.2
cartopy: None
seaborn: 0.11.0
numbagg: None
pint: None
setuptools: 50.3.0
pip: 20.2.3
conda: None
pytest: 6.1.1
IPython: 7.18.1
sphinx: 3.2.1
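A workaround that restores the association by hand (a sketch, not part of the original report; it assumes the scalar value itself is all that needs to be preserved):

```python
import numpy as np
import xarray as xr

arr1 = xr.DataArray(np.zeros((1, 5)), dims=['y', 'x'], coords={'e': ('y', [10])})
arr2 = arr1.squeeze('y').expand_dims('y')
# 'e' survived the round trip as a scalar coordinate; rebuild it along 'y' manually.
arr2 = arr2.assign_coords(e=('y', [arr2['e'].item()]))
xr.testing.assert_identical(arr1, arr2)  # now passes
```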
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4501/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 651945063,MDU6SXNzdWU2NTE5NDUwNjM=,4205,Chunking causes unrelated non-dimension coordinate to become a dask array,1053153,open,0,,,2,2020-07-07T02:35:15Z,2022-07-12T15:25:05Z,,CONTRIBUTOR,,,,"**What happened**: Rechunking along an independent dimension causes unrelated non-dimension coordinates to become dask arrays. The dimension coordinates do not seem affected. I can stick in a synchronous compute on the coordinate to recover, but wanted to be sure this was the expected behavior. **What you expected to happen**: Chunking along an unrelated dimension should not affect unrelated non-dimension coordinates. **Minimal Complete Verifiable Example**: ```python import xarray as xr import dask.array as da def print_coords(a, title): print() print(title) for dim in ['x', 'y', 'b']: if dim in a.dims or dim in a.coords: print('dim:', dim, 'type:', type(a.coords[dim].data)) arr = xr.DataArray(da.zeros((20, 20), chunks=10), dims=('x', 'y'), coords={'b': ('y', range(100,120)), 'x': range(20), 'y': range(20)}) print_coords(arr, 'Original') # The following line rechunks independently of b or y. # Removing this line allows the code to succeed. arr = arr.chunk({'x': 5}) print_coords(arr, 'After chunking') arr = arr.sel(y=2) print_coords(arr, 'After selection') print() print('Scalar values:') print('y=', arr.coords['y'].item()) print('b=', arr.coords['b'].item()) # Sad Panda ``` ``` Original dim: x type: dim: y type: dim: b type: After chunking dim: x type: dim: y type: dim: b type: After selection dim: x type: dim: y type: dim: b type: Scalar values: y= 2 NotImplementedError: 'item' is not yet a valid method on dask arrays ``` **Environment**:
Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.19.112+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.15.1
pandas: 1.0.5
numpy: 1.18.5
scipy: 1.4.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.19.0
distributed: 2.19.0
matplotlib: 3.2.2
cartopy: None
seaborn: None
numbagg: None
setuptools: 49.1.0.post20200704
pip: 20.1.1
conda: 4.8.3
pytest: 5.4.3
IPython: 7.16.1
sphinx: None
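For reference, the synchronous compute mentioned above can be spelled like this (a sketch; it assumes paying for the small eager compute of 'b' is acceptable):

```python
# After the .sel(y=2) above, 'b' is a scalar coordinate backed by a dask array.
# Materialize it so .item() works again.
arr = arr.assign_coords(b=arr.coords['b'].compute())
print('b=', arr.coords['b'].item())  # prints 102 instead of raising
```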
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4205/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 688640232,MDU6SXNzdWU2ODg2NDAyMzI=,4389,Stack: avoid re-chunking (dask) and insert new coordinates arbitrarily,1053153,open,0,,,3,2020-08-30T02:35:48Z,2022-04-28T01:39:06Z,,CONTRIBUTOR,,,,"The behavior of stack was not quite intuitive to me, and I'd like to understand if this was an explicit technical decision or if it can be changed. First, with regard to chunking: ``` arr = xr.DataArray(da.zeros((2, 3, 4), dtype=np.int, chunks=(1 ,1, 1)), dims=['z', 'y' ,'x']) stacked = arr.stack(v=('y', 'x')) print(stacked) -- xarray.DataArray 'zeros-6eb2edd0fca7ec97141e1310bd303988' (z: 2, v: 12)> dask.array Coordinates: * v (v) MultiIndex - y (v) int64 0 0 0 0 1 1 1 1 2 2 2 2 - x (v) int64 0 1 2 3 0 1 2 3 0 1 2 3 Dimensions without coordinates: z ``` Why did the number of chunks change in this case? Couldn't the chunksize be (1,1)? Next, why is it necessary to put the new dimension at the end? It seems there are often more natural (perhaps just to my naive thought process) placements. One example would be that same array above, but stacked on the first two dimensions. I would want the new dimension to be the first dimension (again without the rechunking above). To accomplish this, I do: ``` arr = xr.DataArray(da.zeros((2, 3, 4), dtype=np.int, chunks=(1 ,1, 1)), dims=['z', 'y' ,'x']) stacked = arr.stack(v=('z', 'y')).transpose('v', ...).chunk({'v': 1}) print(stacked) -- dask.array Coordinates: * v (v) MultiIndex - z (v) int64 0 0 0 1 1 1 - y (v) int64 0 1 2 0 1 2 Dimensions without coordinates: x ``` The dask graph for this last bit insert a rechunk and two transposes, but my intent was not to have any of the underlying chunks change at all. Here is 1 of 8 pieces of the graph (with optimization off -- optimization combines operations, but doesn't change the topology or the operations): ![out](https://user-images.githubusercontent.com/1053153/91649954-2518d100-ea2e-11ea-83ac-1e1f52d2e11b.png) Is it technically feasible for stack to avoid rechunking, and for the user to determine where the new dimensions should go? ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4389/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 859577556,MDU6SXNzdWU4NTk1Nzc1NTY=,5168,multiple arrays with common nan-shaped dimension,1053153,open,0,,,6,2021-04-16T08:07:41Z,2021-04-24T02:44:34Z,,CONTRIBUTOR,,,,"**What happened**: When creating a dataset from two variables with a common dimension, there is a TypeError thrown when that dimension has shape nan. **What you expected to happen**: A dataset should be created. I believe dask has an [`allow_unknown_chunksizes`](https://github.com/dask/dask/blob/a199302b0c69df82a09c8ab166de882368e6e8c8/dask/array/core.py#L4625) parameter for cases like this -- would that be something that could work here? (Assuming I'm not making a mistake myself.) 
**Minimal Complete Verifiable Example**:

```python
import dask
import dask.array as da
import xarray as xr
import numpy as np

def foo():
    return np.zeros(3)

arr0 = da.from_delayed(dask.delayed(foo)(), shape=(np.nan,), dtype=float)
arr0_xr = xr.DataArray(arr0, dims=('z',))

arr1 = da.from_delayed(dask.delayed(foo)(), shape=(np.nan,), dtype=float)
arr1_xr = xr.DataArray(arr1, dims=('z',))

ds = xr.Dataset({'arr0': arr0_xr, 'arr1': arr1_xr})
```
stack trace

```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/kitchen_sync/xarray/xarray/core/dataarray.py in _getitem_coord(self, key)
    692         try:
--> 693             var = self._coords[key]
    694         except KeyError:

KeyError: 'z'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-...> in <module>
      8 arr1_xr = xr.DataArray(arr1, dims=('z',))
      9
---> 10 ds = xr.Dataset({'arr0': arr0_xr, 'arr1': arr1_xr})

~/kitchen_sync/xarray/xarray/core/dataset.py in __init__(self, data_vars, coords, attrs)
    739             coords = coords.variables
    740
--> 741         variables, coord_names, dims, indexes, _ = merge_data_and_coords(
    742             data_vars, coords, compat=""broadcast_equals""
    743         )

~/kitchen_sync/xarray/xarray/core/merge.py in merge_data_and_coords(data, coords, compat, join)
    465     explicit_coords = coords.keys()
    466     indexes = dict(_extract_indexes_from_coords(coords))
--> 467     return merge_core(
    468         objects, compat, join, explicit_coords=explicit_coords, indexes=indexes
    469     )

~/kitchen_sync/xarray/xarray/core/merge.py in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value)
    608
    609     coerced = coerce_pandas_values(objects)
--> 610     aligned = deep_align(
    611         coerced, join=join, copy=False, indexes=indexes, fill_value=fill_value
    612     )

~/kitchen_sync/xarray/xarray/core/alignment.py in deep_align(objects, join, copy, indexes, exclude, raise_on_invalid, fill_value)
    422             out.append(variables)
    423
--> 424     aligned = align(
    425         *targets,
    426         join=join,

~/kitchen_sync/xarray/xarray/core/alignment.py in align(join, copy, indexes, exclude, fill_value, *objects)
    283         for dim in obj.dims:
    284             if dim not in exclude:
--> 285                 all_coords[dim].append(obj.coords[dim])
    286                 try:
    287                     index = obj.indexes[dim]

~/kitchen_sync/xarray/xarray/core/coordinates.py in __getitem__(self, key)
    326
    327     def __getitem__(self, key: Hashable) -> ""DataArray"":
--> 328         return self._data._getitem_coord(key)
    329
    330     def _update_coords(

~/kitchen_sync/xarray/xarray/core/dataarray.py in _getitem_coord(self, key)
    694         except KeyError:
    695             dim_sizes = dict(zip(self.dims, self.shape))
--> 696             _, key, var = _get_virtual_variable(
    697                 self._coords, key, self._level_coords, dim_sizes
    698             )

~/kitchen_sync/xarray/xarray/core/dataset.py in _get_virtual_variable(variables, key, level_vars, dim_sizes)
    146
    147     if key in dim_sizes:
--> 148         data = pd.Index(range(dim_sizes[key]), name=key)
    149         variable = IndexVariable((key,), data)
    150         return key, key, variable

TypeError: 'float' object cannot be interpreted as an integer
```
**Anything else we need to know?**:

**Environment**:
Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:12:38) [Clang 11.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 20.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.17.1.dev66+g18ed29e4
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.6.2
netCDF4: 1.5.6
pydap: installed
h5netcdf: 0.10.0
h5py: 3.1.0
Nio: None
zarr: 2.7.0
cftime: 1.4.1
nc_time_axis: 1.2.0
PseudoNetCDF: installed
rasterio: None
cfgrib: 0.9.9.0
iris: 2.4.0
bottleneck: 1.3.2
dask: 2021.04.0
distributed: 2021.04.0
matplotlib: 3.4.1
cartopy: 0.18.0
seaborn: 0.11.1
numbagg: installed
pint: 0.17
setuptools: 49.6.0.post20210108
pip: 20.2.4
conda: None
pytest: 6.2.3
IPython: 7.22.0
sphinx: None
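An interim workaround (a sketch; note that `compute_chunk_sizes()` eagerly evaluates the delayed pieces, which may defeat the purpose of the nan shapes):

```python
# Resolve the unknown chunk sizes before constructing the Dataset.
arr0_xr = xr.DataArray(arr0.compute_chunk_sizes(), dims=('z',))
arr1_xr = xr.DataArray(arr1.compute_chunk_sizes(), dims=('z',))
ds = xr.Dataset({'arr0': arr0_xr, 'arr1': arr1_xr})
```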
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5168/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 859611240,MDU6SXNzdWU4NTk2MTEyNDA=,5169,nan length coordinates,1053153,open,0,,,2,2021-04-16T08:50:50Z,2021-04-24T02:43:35Z,,CONTRIBUTOR,,,,"**Is your feature request related to a problem? Please describe.** When using arrays with a nan shape, I'd like to provide a coordinate specification from a delayed object, which is my responsibility to make sure has the right chunks and length. **Describe the solution you'd like** Below are three examples that I think should work, where I give a coordinate via a dask array, a dask series, and a dask index. Currently, all three examples error out by computing the coordinate length (triggering an unwanted dask computation!), and then indicating that its different than the array's corresponding length, which is nan. ```python import dask import dask.array as da import dask.dataframe as dd import numpy as np import pandas as pd import xarray as xr def foo(): return np.arange(4) arr = da.from_delayed(dask.delayed(foo)(), shape=(np.nan,), dtype=int) idx = da.from_delayed(dask.delayed(foo)(), shape=(np.nan,), dtype=int) ddf = dd.from_pandas(pd.DataFrame({'y': np.arange(4)}), npartitions=1) arr0 = xr.DataArray(arr, coords=[('z', idx)]) arr1 = xr.DataArray(arr, coords=[('z', ddf['y'])]) arr2 = xr.DataArray(arr, coords=[('z', ddf.index)]) ``` Error: ``` ValueError: conflicting sizes for dimension 'z': length nan on the data but length 4 on coordinate 'z' ``` **Describe alternatives you've considered** After computations to complete add the missing coordinate. This requires carrying around the delayed index with the delayed array. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5169/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue