
issues: 1402168223


id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1402168223 I_kwDOAMm_X85Tk2Of 7148 Concatenate using Multiindex cannot be unstacked anymore 14276158 open 0     3 2022-10-09T06:23:06Z 2022-10-10T08:16:38Z   CONTRIBUTOR      

What happened?

When I concatenate datasets along a pandas MultiIndex and then unstack the result to get two independent dimensions (e.g. for varying different parameters in a simulation), the unstack call raises an error. The error depends on the data: the minimal example fails with `ValueError: IndexVariable objects must be 1-dimensional`, while my real data fails with `ValueError: cannot re-index or align objects with conflicting indexes found for the following dimensions: 'concat_dim' (2 conflicting indexes)`.

One hint at the bug might be that `conc._indexes` shows more indexes than `display(conc)`.
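A quick way to check that hint is to compare the names in the private `_indexes` mapping with the coordinate names. This is only a diagnostic sketch: the helper name `index_names_missing_from_coords` is made up for illustration, and `_indexes` is a private attribute that may change between xarray versions.

```python
import xarray as xr


def index_names_missing_from_coords(obj):
    """Return names that have an index entry but no matching coordinate.

    Hypothetical helper: it reads the private ``_indexes`` mapping,
    which is not part of xarray's public API.
    """
    return sorted(set(obj._indexes) - set(obj.coords))


# A plain dataset is expected to be consistent, so nothing is reported.
plain = xr.Dataset(coords={"dim1": [0, 1, 2, 3]})
print(index_names_missing_from_coords(plain))
```

Running the same helper on `conc` from the example would show whether the extra index entries lack matching coordinates on a given xarray version.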

What did you expect to happen?

Previously (I think in v2022.3.0), the result unstacked cleanly into the two levels of the MultiIndex as separate dimensions.
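To make the expected result concrete, here is a small sketch of the same kind of unstacking on an array that was stacked with `stack` rather than built with `concat`. The array and its values are invented for illustration; only the level names `first` and `second` are taken from this report.

```python
import numpy as np
import xarray as xr

# Illustrative 3 x 2 array with the same level names as in the report.
da = xr.DataArray(
    np.arange(6).reshape(3, 2),
    dims=("first", "second"),
    coords={"first": ["bar", "baz", "foo"], "second": ["one", "two"]},
)

# Stacking produces a single dimension backed by a MultiIndex ...
stacked = da.stack(concat_dim=("first", "second"))

# ... and unstacking splits it back into one dimension per level.
roundtrip = stacked.unstack("concat_dim")
print(roundtrip.dims)
```

This is the shape of result I would expect from `conc.unstack("concat_dim")` as well: `first` and `second` as separate dimensions next to `dim1` and `dim2`.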

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np
import pandas as pd

ds = xr.Dataset(
    data_vars={"a": (("dim1", "dim2"), np.arange(16).reshape(4, 4))},
    coords={"dim1": list(range(4)), "dim2": list(range(2, 6))},
)
dslist = [ds for i in range(6)]

arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo"],
    ["one", "two", "one", "two", "one", "two"],
]
mindex = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["first", "second"])

conc = xr.concat(dslist, dim=mindex)
conc.unstack("concat_dim")  # this errors

conc = xr.concat(dslist, dim='concat_dim')
conc = conc.assign_coords(dict(concat_dim=mindex)).unstack("concat_dim")  # this does not
```
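The working variant can be wrapped in a small helper for reuse. This is a sketch under assumptions, not a fix for the underlying bug: the function name `concat_then_unstack` is hypothetical, and the branch on `xr.Coordinates` assumes that newer xarray releases provide `Coordinates.from_pandas_multiindex`, falling back to assigning the MultiIndex directly on older ones.

```python
import numpy as np
import pandas as pd
import xarray as xr


def concat_then_unstack(datasets, mindex, dim="concat_dim"):
    """Concatenate along a plain dimension, then attach the MultiIndex and unstack.

    Hypothetical helper that avoids passing the MultiIndex to ``xr.concat``.
    """
    combined = xr.concat(datasets, dim=dim)
    if hasattr(xr, "Coordinates"):
        # Assumed to be available in newer xarray releases.
        coords = xr.Coordinates.from_pandas_multiindex(mindex, dim)
    else:
        # Older releases: assign the pandas MultiIndex directly.
        coords = {dim: mindex}
    return combined.assign_coords(coords).unstack(dim)


ds = xr.Dataset(
    data_vars={"a": (("dim1", "dim2"), np.arange(16).reshape(4, 4))},
    coords={"dim1": list(range(4)), "dim2": list(range(2, 6))},
)
mindex = pd.MultiIndex.from_product(
    [["bar", "baz", "foo"], ["one", "two"]], names=["first", "second"]
)

result = concat_then_unstack([ds] * len(mindex), mindex)
print(dict(result.sizes))
```

The helper keeps the concatenation itself free of MultiIndex handling, so any index problem would surface in the `assign_coords` or `unstack` step rather than inside `xr.concat`.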

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

    conc = xr.concat(dslist, dim=mindex)
    conc.unstack("concat_dim")

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Cell In [24], line 15
         12 mindex = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["first", "second"])
         14 conc = xr.concat(dslist, dim=mindex)
    ---> 15 conc.unstack("concat_dim")

    File ~/.conda/envs/xwrf-dev/lib/python3.10/site-packages/xarray/core/dataset.py:4870, in Dataset.unstack(self, dim, fill_value, sparse)
       4866     result = result._unstack_full_reindex(
       4867         d, stacked_indexes[d], fill_value, sparse
       4868     )
       4869 else:
    -> 4870     result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse)
       4871 return result

    File ~/.conda/envs/xwrf-dev/lib/python3.10/site-packages/xarray/core/dataset.py:4706, in Dataset._unstack_once(self, dim, index_and_vars, fill_value, sparse)
       4703 else:
       4704     fill_value_ = fill_value
    -> 4706 variables[name] = var._unstack_once(
       4707     index=clean_index,
       4708     dim=dim,
       4709     fill_value=fill_value_,
       4710     sparse=sparse,
       4711 )
       4712 else:
       4713     variables[name] = var

    File ~/.conda/envs/xwrf-dev/lib/python3.10/site-packages/xarray/core/variable.py:1764, in Variable._unstack_once(self, index, dim, fill_value, sparse)
       1759 # Indexer is a list of lists of locations. Each list is the locations
       1760 # on the new dimension. This is robust to the data being sparse; in that
       1761 # case the destinations will be NaN / zero.
       1762 data[(..., *indexer)] = reordered
    -> 1764 return self._replace(dims=new_dims, data=data)

    File ~/.conda/envs/xwrf-dev/lib/python3.10/site-packages/xarray/core/variable.py:1017, in Variable._replace(self, dims, data, attrs, encoding)
       1015 if encoding is _default:
       1016     encoding = copy.copy(self._encoding)
    -> 1017 return type(self)(dims, data, attrs, encoding, fastpath=True)

    File ~/.conda/envs/xwrf-dev/lib/python3.10/site-packages/xarray/core/variable.py:2776, in IndexVariable.__init__(self, dims, data, attrs, encoding, fastpath)
       2774 super().__init__(dims, data, attrs, encoding, fastpath)
       2775 if self.ndim != 1:
    -> 2776     raise ValueError(f"{type(self).__name__} objects must be 1-dimensional")
       2778 # Unlike in Variable, always eagerly load values into memory
       2779 if not isinstance(self._data, PandasIndexingAdapter):

    ValueError: IndexVariable objects must be 1-dimensional

    conc = xr.concat(dslist, dim='concat_dim')
    conc = conc.assign_coords(dict(concat_dim=index)).unstack("concat_dim")
    conc

    xarray.Dataset
    Dimensions: first: 3, second: 2, dim1: 4, dim2: 4
    Coordinates:
        first    (first) object 'bar' 'baz' 'foo'
        second   (second) object 'one' 'two'
        dim1     (dim1) int64 0 1 2 3
        dim2     (dim2) int64 2 3 4 5
    Data variables:
        a        (dim1, dim2, first, second) int64 0 0 0 0 0 0 1 ... 15 15 15 15 15 15
    Attributes: (0)

    xr.show_versions()

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]
    python-bits: 64
    OS: Linux
    OS-release: 4.18.0-305.25.1.el8_4.x86_64
    machine: x86_64
    processor: x86_64
    byteorder: little
    LC_ALL: en_US.UTF-8
    LANG: en_US.UTF-8
    LOCALE: ('en_US', 'UTF-8')
    libhdf5: 1.12.2
    libnetcdf: 4.8.1

    xarray: 2022.9.0
    pandas: 1.5.0
    numpy: 1.23.3
    scipy: 1.9.1
    netCDF4: 1.6.1
    pydap: None
    h5netcdf: 1.0.2
    h5py: 3.7.0
    Nio: None
    zarr: None
    cftime: 1.6.2
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: None
    cfgrib: None
    iris: None
    bottleneck: None
    dask: 2022.9.2
    distributed: 2022.9.2
    matplotlib: 3.6.0
    cartopy: 0.21.0
    seaborn: None
    numbagg: None
    fsspec: 2022.8.2
    cupy: None
    pint: 0.19.2
    sparse: None
    flox: None
    numpy_groupies: None
    setuptools: 65.4.1
    pip: 22.2.2
    conda: None
    pytest: None
    IPython: 8.5.0
    sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7148/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    13221727 issue
