
issues


9 rows where user = 1053153 sorted by updated_at descending


type 2

  • issue 7
  • pull 2

state 2

  • open 5
  • closed 4

repo 1

  • xarray 9
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
718716799 MDU6SXNzdWU3MTg3MTY3OTk= 4501 Scalar non-dimension coords forget their heritage chrisroat 1053153 open 0     7 2020-10-10T22:44:25Z 2023-09-22T04:50:34Z   CONTRIBUTOR      

I'm not sure if this is a bug or a feature, to be honest. I noted that expanded scalar coordinates would retain their scalar values, and so thought that non-dimension coordinates might keep their references to each other and do the same.

What happened:

When a dimension is squeezed or selected with a scalar, the associated non-dimension coordinates are unassociated from the dimension. When the dimension is expanded later, the previously associated non-dimension coordinates are not expanded.

What you expected to happen:

The previously associated non-dimension coordinates should be expanded.

Minimal Complete Verifiable Example:

```python
import numpy as np
import xarray as xr

arr1 = xr.DataArray(np.zeros((1, 5)), dims=['y', 'x'], coords={'e': ('y', [10])})
arr2 = arr1.squeeze('y').expand_dims('y')

xr.testing.assert_identical(arr1, arr2)
```

Error:
```
AssertionError: Left and right DataArray objects are not identical

Differing coordinates:
L   e        (y) int64 10
R   e        int64 10
```

Put another way, I would like these statements to be possible:

    xr.DataArray(np.zeros(5), dims=['x'], coords={'y': 0, 'a': ('y', 1)})
    xr.DataArray(np.zeros((0, 5)), dims=['y', 'x'], coords={'e': ('y', 10)})
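Until something like this is supported, one possible workaround is to rebuild the coordinate by hand after expanding the dimension. The sketch below is only an illustration under that assumption; `restored` is a name introduced here, and it relies on `assign_coords` accepting a `(dims, values)` tuple.

```python
import numpy as np
import xarray as xr

arr1 = xr.DataArray(np.zeros((1, 5)), dims=["y", "x"], coords={"e": ("y", [10])})

# Round trip from the example above: after squeeze, `e` is a scalar
# coordinate, and expand_dims does not attach it to `y` again.
arr2 = arr1.squeeze("y").expand_dims("y")

# Workaround sketch (an assumption, not an xarray feature): rebuild `e`
# along `y` from its single value.
restored = arr2.assign_coords(e=("y", [arr2.coords["e"].item()]))

# With `e` reattached, the round trip matches the original array.
xr.testing.assert_identical(arr1, restored)
```

This only works when the caller still knows which dimension the coordinate belonged to, which is exactly the information the issue says is lost.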

Environment:

Output of <tt>xr.show_versions()</tt>

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.8.5 (default, Jul 28 2020, 12:59:40) [GCC 9.3.0]
    python-bits: 64
    OS: Linux
    OS-release: 5.4.0-48-generic
    machine: x86_64
    processor: x86_64
    byteorder: little
    LC_ALL: None
    LANG: en_US.UTF-8
    LOCALE: en_US.UTF-8
    libhdf5: 1.10.4
    libnetcdf: None
    xarray: 0.16.1
    pandas: 1.1.3
    numpy: 1.19.2
    scipy: 1.5.2
    netCDF4: None
    pydap: None
    h5netcdf: None
    h5py: 2.10.0
    Nio: None
    zarr: 2.5.0
    cftime: None
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: None
    cfgrib: None
    iris: None
    bottleneck: None
    dask: 2.30.0
    distributed: 2.30.0
    matplotlib: 3.3.2
    cartopy: None
    seaborn: 0.11.0
    numbagg: None
    pint: None
    setuptools: 50.3.0
    pip: 20.2.3
    conda: None
    pytest: 6.1.1
    IPython: 7.18.1
    sphinx: 3.2.1
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4501/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
651945063 MDU6SXNzdWU2NTE5NDUwNjM= 4205 Chunking causes unrelated non-dimension coordinate to become a dask array chrisroat 1053153 open 0     2 2020-07-07T02:35:15Z 2022-07-12T15:25:05Z   CONTRIBUTOR      

What happened:

Rechunking along an independent dimension causes unrelated non-dimension coordinates to become dask arrays. The dimension coordinates do not seem affected.

I can stick in a synchronous compute on the coordinate to recover, but wanted to be sure this was the expected behavior.

What you expected to happen:

Chunking along an unrelated dimension should not affect unrelated non-dimension coordinates.

Minimal Complete Verifiable Example:

```python
import xarray as xr
import dask.array as da

def print_coords(a, title):
    print()
    print(title)
    for dim in ['x', 'y', 'b']:
        if dim in a.dims or dim in a.coords:
            print('dim:', dim, 'type:', type(a.coords[dim].data))

arr = xr.DataArray(da.zeros((20, 20), chunks=10), dims=('x', 'y'),
                   coords={'b': ('y', range(100, 120)), 'x': range(20), 'y': range(20)})

print_coords(arr, 'Original')

# The following line rechunks independently of b or y.
# Removing this line allows the code to succeed.
arr = arr.chunk({'x': 5})

print_coords(arr, 'After chunking')

arr = arr.sel(y=2)

print_coords(arr, 'After selection')

print()
print('Scalar values:')
print('y=', arr.coords['y'].item())
print('b=', arr.coords['b'].item())  # Sad Panda

# Original
# dim: x type: <class 'numpy.ndarray'>
# dim: y type: <class 'numpy.ndarray'>
# dim: b type: <class 'numpy.ndarray'>
#
# After chunking
# dim: x type: <class 'numpy.ndarray'>
# dim: y type: <class 'numpy.ndarray'>
# dim: b type: <class 'dask.array.core.Array'>
#
# After selection
# dim: x type: <class 'numpy.ndarray'>
# dim: y type: <class 'numpy.ndarray'>
# dim: b type: <class 'dask.array.core.Array'>
#
# Scalar values:
# y= 2
#
# <stack trace elided>
# NotImplementedError: 'item' is not yet a valid method on dask arrays
```
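The "synchronous compute on the coordinate" mentioned above could look roughly like the following. This is a sketch of one way to do it, not a recommended fix: it reads `.values` (which loads the data into memory) and reassigns the coordinate, and the variable names `selected` and `b_value` are introduced here for illustration.

```python
import dask.array as da
import numpy as np
import xarray as xr

arr = xr.DataArray(
    da.zeros((20, 20), chunks=10),
    dims=("x", "y"),
    coords={"b": ("y", range(100, 120)), "x": range(20), "y": range(20)},
)
arr = arr.chunk({"x": 5})

# Workaround sketch: force `b` back to an in-memory NumPy array.
# `.values` computes the coordinate if it is dask-backed and is a no-op otherwise.
arr = arr.assign_coords(b=("y", arr.coords["b"].values))

selected = arr.sel(y=2)
b_value = selected.coords["b"].item()  # no longer goes through a dask array
```

The cost is an eager computation of the coordinate, which is cheap here but may not be for large or expensive coordinates.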

Environment:

Output of <tt>xr.show_versions()</tt>

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0]
    python-bits: 64
    OS: Linux
    OS-release: 4.19.112+
    machine: x86_64
    processor: x86_64
    byteorder: little
    LC_ALL: en_US.UTF-8
    LANG: en_US.UTF-8
    LOCALE: en_US.UTF-8
    libhdf5: 1.10.4
    libnetcdf: None
    xarray: 0.15.1
    pandas: 1.0.5
    numpy: 1.18.5
    scipy: 1.4.1
    netCDF4: None
    pydap: None
    h5netcdf: None
    h5py: 2.10.0
    Nio: None
    zarr: 2.4.0
    cftime: None
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: None
    cfgrib: None
    iris: None
    bottleneck: None
    dask: 2.19.0
    distributed: 2.19.0
    matplotlib: 3.2.2
    cartopy: None
    seaborn: None
    numbagg: None
    setuptools: 49.1.0.post20200704
    pip: 20.1.1
    conda: 4.8.3
    pytest: 5.4.3
    IPython: 7.16.1
    sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4205/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
688640232 MDU6SXNzdWU2ODg2NDAyMzI= 4389 Stack: avoid re-chunking (dask) and insert new coordinates arbitrarily chrisroat 1053153 open 0     3 2020-08-30T02:35:48Z 2022-04-28T01:39:06Z   CONTRIBUTOR      

The behavior of stack was not quite intuitive to me, and I'd like to understand if this was an explicit technical decision or if it can be changed.

First, with regard to chunking:

    import dask.array as da
    import xarray as xr

    arr = xr.DataArray(da.zeros((2, 3, 4), dtype=int, chunks=(1, 1, 1)), dims=['z', 'y', 'x'])
    stacked = arr.stack(v=('y', 'x'))
    print(stacked)
    --
    <xarray.DataArray 'zeros-6eb2edd0fca7ec97141e1310bd303988' (z: 2, v: 12)>
    dask.array<reshape, shape=(2, 12), dtype=int64, chunksize=(1, 4), chunktype=numpy.ndarray>
    Coordinates:
      * v        (v) MultiIndex
      - y        (v) int64 0 0 0 0 1 1 1 1 2 2 2 2
      - x        (v) int64 0 1 2 3 0 1 2 3 0 1 2 3
    Dimensions without coordinates: z

Why did the number of chunks change in this case? Couldn't the chunksize be (1,1)?

Next, why is it necessary to put the new dimension at the end? It seems there are often more natural (perhaps just to my naive thought process) placements. One example would be that same array above, but stacked on the first two dimensions. I would want the new dimension to be the first dimension (again without the rechunking above). To accomplish this, I do:

    arr = xr.DataArray(da.zeros((2, 3, 4), dtype=int, chunks=(1, 1, 1)), dims=['z', 'y', 'x'])
    stacked = arr.stack(v=('z', 'y')).transpose('v', ...).chunk({'v': 1})
    print(stacked)
    --
    <xarray.DataArray 'zeros-6eb2edd0fca7ec97141e1310bd303988' (v: 6, x: 4)>
    dask.array<rechunk-merge, shape=(6, 4), dtype=int64, chunksize=(1, 1), chunktype=numpy.ndarray>
    Coordinates:
      * v        (v) MultiIndex
      - z        (v) int64 0 0 0 1 1 1
      - y        (v) int64 0 1 2 0 1 2
    Dimensions without coordinates: x

The dask graph for this last bit inserts a rechunk and two transposes, but my intent was not to have any of the underlying chunks change at all. Here is 1 of 8 pieces of the graph (with optimization off -- optimization combines operations, but doesn't change the topology or the operations):

Is it technically feasible for stack to avoid rechunking, and for the user to determine where the new dimensions should go?
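To make the placement question concrete, here is a small NumPy-only sketch of the two layouts for the same (2, 3, 4) shape. It says nothing about dask chunking; it only illustrates, under the assumption that the stacked values are ordered z-major as in the MultiIndex shown above, that putting the new dimension first amounts to a plain reshape of the two leading axes. The names `stacked_last` and `stacked_front` are introduced here.

```python
import numpy as np

# Toy array with dims (z, y, x) = (2, 3, 4), matching the shapes used above.
data = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Layout with the new dimension last, as stack(v=('z', 'y')) produces:
# move x to the front, then merge z and y into v.
stacked_last = np.moveaxis(data, -1, 0).reshape(4, 6)  # dims (x, v)

# Layout asked for in the text: new dimension first.
# For the two leading axes this is a plain reshape.
stacked_front = data.reshape(6, 4)  # dims (v, x)

# Both layouts hold the same values and differ only by a transpose.
same_values = np.array_equal(stacked_last.T, stacked_front)
```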

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4389/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
929518413 MDExOlB1bGxSZXF1ZXN0Njc3MzU2NDE5 5526 Handle empty containers in zarr chunk checks chrisroat 1053153 closed 0     6 2021-06-24T18:52:50Z 2022-01-30T08:05:40Z 2022-01-27T21:46:58Z CONTRIBUTOR   0 pydata/xarray/pulls/5526
  • [x] Closes #4084
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • ~User visible changes (including notable bug fixes) are documented in whats-new.rst~
  • ~New functions/methods are listed in api.rst~

Continuation of https://github.com/pydata/xarray/pull/5019, which was closed during the master/main switch.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5526/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
621451930 MDU6SXNzdWU2MjE0NTE5MzA= 4084 write/read to zarr subtly changes array with non-dim coord chrisroat 1053153 closed 0     1 2020-05-20T04:34:42Z 2022-01-27T21:46:58Z 2022-01-27T21:46:58Z CONTRIBUTOR      

With an array containing a non-dimension coordinate, I do "where+squeeze" selection and can write the array to zarr without problem. However, if I write out and read back in the array prior to doing the selection, the array no longer writes properly.

The problem may stem from the non-dimension coordinate being read back as a dask array (which I can see by printing the coordinate).

The problem goes away if this line is altered to check for empty tuples in addition to None: `if (var_chunks is None or not len(var_chunks)) and (enc_chunks is None or not len(enc_chunks)):`

However, I'm not sure if the subtle change in the array will cause other issues, so I don't know if the above modification is a band-aid or a real solution.
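The altered condition treats an empty tuple the same way as None. A minimal standalone sketch of that predicate follows; the helper names `is_unset` and `both_unset` are hypothetical and are not part of xarray.

```python
from typing import Optional, Sequence


def is_unset(chunks: Optional[Sequence]) -> bool:
    """Return True when `chunks` is None or an empty container such as ()."""
    return chunks is None or len(chunks) == 0


def both_unset(var_chunks: Optional[Sequence], enc_chunks: Optional[Sequence]) -> bool:
    """Mirror the altered condition: neither the variable nor the encoding has chunks."""
    return is_unset(var_chunks) and is_unset(enc_chunks)


# A 0-d dask-backed variable reports chunks as an empty tuple rather than None.
zero_dim_case = both_unset((), None)
chunked_case = both_unset(((2,), (2,)), None)
```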

MCVE Code Sample

```python
# Your code here
import xarray as xr
import numpy as np
import dask.array as da

def create():
    image = da.zeros((2, 2))
    return xr.DataArray(image, dims=['y', 'x'],
                        coords={'x': [0, 1], 'y': [0, 1], 'xname': ('x', ['apple', 'banana'])})

def select_and_write(arr, fname):
    arr = arr.where(arr.coords['xname'] == 'apple', drop=True)
    arr = arr.squeeze('x')
    arr.to_dataset(name='foo').to_zarr(fname, mode='w')

def ok():
    print('ok')
    arr = create()
    select_and_write(arr, '/tmp/ok.zarr')

def error():
    print('error')
    arr = create()
    arr.to_dataset(name='foo').to_zarr('/tmp/error_intermediate.zarr', mode='w')
    arr2 = xr.open_zarr('/tmp/error_intermediate.zarr')['foo']
    select_and_write(arr2, '/tmp/error.zarr')

ok()
error()
```

Expected Output

No stacktrace.

Problem Description

Stacktrace
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-0db8ba1ef466> in <module>
     27
     28 ok()
---> 29 error()

<ipython-input-1-0db8ba1ef466> in error()
     24 arr.to_dataset(name='foo').to_zarr('/tmp/error_intermediate.zarr', mode='w')
     25 arr2 = xr.open_zarr('/tmp/error_intermediate.zarr')['foo']
---> 26 select_and_write(arr2, '/tmp/error.zarr')
     27
     28 ok()

<ipython-input-1-0db8ba1ef466> in select_and_write(arr, fname)
     11 arr = arr.where(arr.coords['xname'] == 'apple', drop=True)
     12 arr = arr.squeeze('x')
---> 13 arr.to_dataset(name='foo').to_zarr(fname, mode='w')
     14
     15

~/.local/share/virtualenvs/starmap2-kOR7I2hi/lib/python3.7/site-packages/xarray/core/dataset.py in to_zarr(self, store, mode, synchronizer, group, encoding, compute, consolidated, append_dim)
   1632 compute=compute,
   1633 consolidated=consolidated,
-> 1634 append_dim=append_dim,
   1635 )
   1636

~/.local/share/virtualenvs/starmap2-kOR7I2hi/lib/python3.7/site-packages/xarray/backends/api.py in to_zarr(dataset, store, mode, synchronizer, group, encoding, compute, consolidated, append_dim)
   1341 writer = ArrayWriter()
   1342 # TODO: figure out how to properly handle unlimited_dims
-> 1343 dump_to_store(dataset, zstore, writer, encoding=encoding)
   1344 writes = writer.sync(compute=compute)
   1345

~/.local/share/virtualenvs/starmap2-kOR7I2hi/lib/python3.7/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1133 variables, attrs = encoder(variables, attrs)
   1134
-> 1135 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1136
   1137

~/.local/share/virtualenvs/starmap2-kOR7I2hi/lib/python3.7/site-packages/xarray/backends/zarr.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    385 self.set_dimensions(variables_encoded, unlimited_dims=unlimited_dims)
    386 self.set_variables(
--> 387 variables_encoded, check_encoding_set, writer, unlimited_dims=unlimited_dims
    388 )
    389

~/.local/share/virtualenvs/starmap2-kOR7I2hi/lib/python3.7/site-packages/xarray/backends/zarr.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    434 else:
    435 # new variable
--> 436 encoding = _extract_zarr_variable_encoding(v, raise_on_invalid=check)
    437 encoded_attrs = {}
    438 # the magic for storing the hidden dimension data

~/.local/share/virtualenvs/starmap2-kOR7I2hi/lib/python3.7/site-packages/xarray/backends/zarr.py in _extract_zarr_variable_encoding(variable, raise_on_invalid)
    188
    189 chunks = _determine_zarr_chunks(
--> 190 encoding.get("chunks"), variable.chunks, variable.ndim
    191 )
    192 encoding["chunks"] = chunks

~/.local/share/virtualenvs/starmap2-kOR7I2hi/lib/python3.7/site-packages/xarray/backends/zarr.py in _determine_zarr_chunks(enc_chunks, var_chunks, ndim)
    108 if len(enc_chunks_tuple) != ndim:
    109 # throw away encoding chunks, start over
--> 110 return _determine_zarr_chunks(None, var_chunks, ndim)
    111
    112 for x in enc_chunks_tuple:

~/.local/share/virtualenvs/starmap2-kOR7I2hi/lib/python3.7/site-packages/xarray/backends/zarr.py in _determine_zarr_chunks(enc_chunks, var_chunks, ndim)
    104 enc_chunks_tuple = ndim * (enc_chunks,)
    105 else:
--> 106 enc_chunks_tuple = tuple(enc_chunks)
    107
    108 if len(enc_chunks_tuple) != ndim:

TypeError: 'NoneType' object is not iterable
```

Versions

Output of <tt>xr.show_versions()</tt>

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.7.5 (default, Nov 7 2019, 10:50:52) [GCC 8.3.0]
    python-bits: 64
    OS: Linux
    OS-release: 5.3.0-51-generic
    machine: x86_64
    processor: x86_64
    byteorder: little
    LC_ALL: None
    LANG: en_US.UTF-8
    LOCALE: en_US.UTF-8
    libhdf5: 1.10.4
    libnetcdf: None
    xarray: 0.15.1
    pandas: 1.0.3
    numpy: 1.18.4
    scipy: 1.4.1
    netCDF4: None
    pydap: None
    h5netcdf: None
    h5py: 2.10.0
    Nio: None
    zarr: 2.4.0
    cftime: None
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: None
    cfgrib: None
    iris: None
    bottleneck: None
    dask: 2.15.0
    distributed: 2.15.2
    matplotlib: 3.2.1
    cartopy: None
    seaborn: 0.10.1
    numbagg: None
    setuptools: 46.1.3
    pip: 20.0.2
    conda: None
    pytest: 5.4.1
    IPython: 7.14.0
    sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4084/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
827233565 MDExOlB1bGxSZXF1ZXN0NTg5MTY2NTc5 5019 Handle empty containers in zarr chunk checks chrisroat 1053153 closed 0     9 2021-03-10T06:55:31Z 2021-06-24T16:52:29Z 2021-06-23T16:14:29Z CONTRIBUTOR   0 pydata/xarray/pulls/5019
  • [x] Closes #4084
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5019/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
859577556 MDU6SXNzdWU4NTk1Nzc1NTY= 5168 multiple arrays with common nan-shaped dimension chrisroat 1053153 open 0     6 2021-04-16T08:07:41Z 2021-04-24T02:44:34Z   CONTRIBUTOR      

What happened:

When creating a dataset from two variables with a common dimension, there is a TypeError thrown when that dimension has shape nan.

What you expected to happen:

A dataset should be created. I believe dask has an allow_unknown_chunksizes parameter for cases like this -- would that be something that could work here? (Assuming I'm not making a mistake myself.)

Minimal Complete Verifiable Example:

```python
import dask
import dask.array as da
import xarray as xr
import numpy as np

def foo():
    return np.zeros(3)

arr0 = da.from_delayed(dask.delayed(foo)(), shape=(np.nan,), dtype=float)
arr0_xr = xr.DataArray(arr0, dims=('z',))

arr1 = da.from_delayed(dask.delayed(foo)(), shape=(np.nan,), dtype=float)
arr1_xr = xr.DataArray(arr1, dims=('z',))

ds = xr.Dataset({'arr0': arr0_xr, 'arr1': arr0_xr})
```

stack trace
```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/kitchen_sync/xarray/xarray/core/dataarray.py in _getitem_coord(self, key)
    692 try:
--> 693 var = self._coords[key]
    694 except KeyError:

KeyError: 'z'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-06b01b94eab3> in <module>
      8 arr1_xr = xr.DataArray(arr1, dims=('z',))
      9
---> 10 ds = xr.Dataset({'arr0': arr0_xr, 'arr1': arr0_xr})

~/kitchen_sync/xarray/xarray/core/dataset.py in __init__(self, data_vars, coords, attrs)
    739 coords = coords.variables
    740
--> 741 variables, coord_names, dims, indexes, _ = merge_data_and_coords(
    742 data_vars, coords, compat="broadcast_equals"
    743 )

~/kitchen_sync/xarray/xarray/core/merge.py in merge_data_and_coords(data, coords, compat, join)
    465 explicit_coords = coords.keys()
    466 indexes = dict(_extract_indexes_from_coords(coords))
--> 467 return merge_core(
    468 objects, compat, join, explicit_coords=explicit_coords, indexes=indexes
    469 )

~/kitchen_sync/xarray/xarray/core/merge.py in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value)
    608
    609 coerced = coerce_pandas_values(objects)
--> 610 aligned = deep_align(
    611 coerced, join=join, copy=False, indexes=indexes, fill_value=fill_value
    612 )

~/kitchen_sync/xarray/xarray/core/alignment.py in deep_align(objects, join, copy, indexes, exclude, raise_on_invalid, fill_value)
    422 out.append(variables)
    423
--> 424 aligned = align(
    425 *targets,
    426 join=join,

~/kitchen_sync/xarray/xarray/core/alignment.py in align(join, copy, indexes, exclude, fill_value, *objects)
    283 for dim in obj.dims:
    284 if dim not in exclude:
--> 285 all_coords[dim].append(obj.coords[dim])
    286 try:
    287 index = obj.indexes[dim]

~/kitchen_sync/xarray/xarray/core/coordinates.py in __getitem__(self, key)
    326
    327 def __getitem__(self, key: Hashable) -> "DataArray":
--> 328 return self._data._getitem_coord(key)
    329
    330 def _update_coords(

~/kitchen_sync/xarray/xarray/core/dataarray.py in _getitem_coord(self, key)
    694 except KeyError:
    695 dim_sizes = dict(zip(self.dims, self.shape))
--> 696 _, key, var = _get_virtual_variable(
    697 self._coords, key, self._level_coords, dim_sizes
    698 )

~/kitchen_sync/xarray/xarray/core/dataset.py in _get_virtual_variable(variables, key, level_vars, dim_sizes)
    146
    147 if key in dim_sizes:
--> 148 data = pd.Index(range(dim_sizes[key]), name=key)
    149 variable = IndexVariable((key,), data)
    150 return key, key, variable

TypeError: 'float' object cannot be interpreted as an integer
```
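The final TypeError in the trace comes from passing the dimension size to `range()`, and a nan size is a float. The failure can be reproduced in plain Python without xarray or dask; `default_index_values` below is a hypothetical stand-in for that `range(dim_sizes[key])` call, not an xarray function.

```python
import math


def default_index_values(dim_size):
    """Hypothetical stand-in for building a default index from a dimension size."""
    return list(range(dim_size))


# A known integer size works as expected.
known = default_index_values(3)

# An unknown (nan) size is a float, which range() rejects.
try:
    default_index_values(math.nan)
    outcome = "no error"
except TypeError as exc:
    outcome = str(exc)
```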

Anything else we need to know?:

Environment:

Output of <tt>xr.show_versions()</tt>

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:12:38) [Clang 11.0.1 ]
    python-bits: 64
    OS: Darwin
    OS-release: 20.3.0
    machine: x86_64
    processor: i386
    byteorder: little
    LC_ALL: None
    LANG: None
    LOCALE: None.UTF-8
    libhdf5: 1.10.6
    libnetcdf: 4.7.4
    xarray: 0.17.1.dev66+g18ed29e4
    pandas: 1.2.4
    numpy: 1.20.2
    scipy: 1.6.2
    netCDF4: 1.5.6
    pydap: installed
    h5netcdf: 0.10.0
    h5py: 3.1.0
    Nio: None
    zarr: 2.7.0
    cftime: 1.4.1
    nc_time_axis: 1.2.0
    PseudoNetCDF: installed
    rasterio: None
    cfgrib: 0.9.9.0
    iris: 2.4.0
    bottleneck: 1.3.2
    dask: 2021.04.0
    distributed: 2021.04.0
    matplotlib: 3.4.1
    cartopy: 0.18.0
    seaborn: 0.11.1
    numbagg: installed
    pint: 0.17
    setuptools: 49.6.0.post20210108
    pip: 20.2.4
    conda: None
    pytest: 6.2.3
    IPython: 7.22.0
    sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5168/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
859611240 MDU6SXNzdWU4NTk2MTEyNDA= 5169 nan length coordinates chrisroat 1053153 open 0     2 2021-04-16T08:50:50Z 2021-04-24T02:43:35Z   CONTRIBUTOR      

Is your feature request related to a problem? Please describe.

When using arrays with a nan shape, I'd like to provide a coordinate specification from a delayed object, which is my responsibility to make sure has the right chunks and length.

Describe the solution you'd like

Below are three examples that I think should work, where I give a coordinate via a dask array, a dask series, and a dask index. Currently, all three examples error out by computing the coordinate length (triggering an unwanted dask computation!), and then indicating that it's different from the array's corresponding length, which is nan.

```python
import dask
import dask.array as da
import dask.dataframe as dd
import numpy as np
import pandas as pd
import xarray as xr

def foo():
    return np.arange(4)

arr = da.from_delayed(dask.delayed(foo)(), shape=(np.nan,), dtype=int)

idx = da.from_delayed(dask.delayed(foo)(), shape=(np.nan,), dtype=int)

ddf = dd.from_pandas(pd.DataFrame({'y': np.arange(4)}), npartitions=1)

arr0 = xr.DataArray(arr, coords=[('z', idx)])
arr1 = xr.DataArray(arr, coords=[('z', ddf['y'])])
arr2 = xr.DataArray(arr, coords=[('z', ddf.index)])
```

Error:

ValueError: conflicting sizes for dimension 'z': length nan on the data but length 4 on coordinate 'z'

Describe alternatives you've considered

After the computations complete, add the missing coordinate. This requires carrying around the delayed index with the delayed array.
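A rough sketch of that alternative, assuming it is acceptable to compute the data and the index together before building the DataArray; the names `data`, `index`, and `result` are introduced here for illustration.

```python
import dask
import dask.array as da
import numpy as np
import xarray as xr


def foo():
    return np.arange(4)


# Delayed data and delayed index, both with unknown (nan) length.
arr = da.from_delayed(dask.delayed(foo)(), shape=(np.nan,), dtype=int)
idx = da.from_delayed(dask.delayed(foo)(), shape=(np.nan,), dtype=int)

# Compute both in a single call so they are evaluated together, then attach
# the now-concrete index as the coordinate.
data, index = dask.compute(arr, idx)
result = xr.DataArray(data, coords=[("z", index)])
```

The drawback is the one stated above: the delayed index has to travel alongside the delayed array until both are computed.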

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5169/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
490476815 MDU6SXNzdWU0OTA0NzY4MTU= 3287 GroupBy of stacked dim with strings renames underlying dims chrisroat 1053153 closed 0     7 2019-09-06T18:59:47Z 2020-03-31T16:10:10Z 2020-03-31T16:10:10Z CONTRIBUTOR      

Names for dimensions are lost (renamed) when they are stacked and grouped, if one of the dimensions has string coordinates.

```python
import numpy as np
import xarray as xr

data = np.zeros((2, 1, 1))
dims = ['c', 'y', 'x']

d1 = xr.DataArray(data, dims=dims)
g1 = d1.stack(f=['c', 'x']).groupby('f').first()
print('Expected dim names:')
print(g1.coords)
print()

d2 = xr.DataArray(data, dims=dims, coords={'c': ['R', 'G']})
g2 = d2.stack(f=['c', 'x']).groupby('f').first()
print('Unexpected dim names:')
print(g2.coords)
```

Output

In the second part below, 'f_level_0' and 'f_level_1' are expected to be 'c' and 'x', respectively.
```
Expected dim names:
Coordinates:
  * f        (f) MultiIndex
  - c        (f) int64 0 1
  - x        (f) int64 0 0

Unexpected dim names:
Coordinates:
  * f          (f) MultiIndex
  - f_level_0  (f) object 'G' 'R'
  - f_level_1  (f) int64 0 0
```

Output of xr.show_versions()

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.7.4 (default, Jul 9 2019, 18:13:23) [Clang 10.0.1 (clang-1001.0.46.4)]
    python-bits: 64
    OS: Darwin
    OS-release: 18.7.0
    machine: x86_64
    processor: i386
    byteorder: little
    LC_ALL: None
    LANG: en_US.UTF-8
    LOCALE: en_US.UTF-8
    libhdf5: 1.10.2
    libnetcdf: 4.6.3
    xarray: 0.12.3
    pandas: 0.25.1
    numpy: 1.17.1
    scipy: 1.3.1
    netCDF4: 1.5.2
    pydap: None
    h5netcdf: None
    h5py: 2.9.0
    Nio: None
    zarr: None
    cftime: 1.0.3.4
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: None
    cfgrib: None
    iris: None
    bottleneck: None
    dask: None
    distributed: None
    matplotlib: 3.1.1
    cartopy: None
    seaborn: None
    numbagg: None
    setuptools: 41.2.0
    pip: 19.2.3
    conda: None
    pytest: None
    IPython: 7.8.0
    sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3287/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
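
The filter described at the top of this page (rows where user = 1053153, sorted by updated_at descending) can be tried against a local copy of this schema. The sketch below uses Python's built-in sqlite3 module with an in-memory database and only a subset of the columns; the three issue rows are taken from this page, and the fourth row (user 42) is a made-up placeholder added to show the filter working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Subset of the [issues] schema above; foreign keys are omitted because the
# referenced tables are not created in this sketch.
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [state] TEXT,
       [updated_at] TEXT
    )
    """
)
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

rows = [
    (490476815, 3287, "GroupBy of stacked dim with strings renames underlying dims",
     1053153, "closed", "2020-03-31T16:10:10Z"),
    (718716799, 4501, "Scalar non-dimension coords forget their heritage",
     1053153, "open", "2023-09-22T04:50:34Z"),
    (651945063, 4205, "Chunking causes unrelated non-dimension coordinate to become a dask array",
     1053153, "open", "2022-07-12T15:25:05Z"),
    # Hypothetical row from another user, included only to exercise the filter.
    (1, 1, "placeholder", 42, "open", "2024-01-01T00:00:00Z"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
result = conn.execute(
    "SELECT [number], [state] FROM [issues] "
    "WHERE [user] = ? ORDER BY [updated_at] DESC",
    (1053153,),
).fetchall()
```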