issues: 869792877
This data as json
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
869792877 | MDU6SXNzdWU4Njk3OTI4Nzc= | 5229 | Index level naming bug with `concat` | 6130352 | closed | 0 | 2 | 2021-04-28T10:29:34Z | 2021-04-28T19:38:26Z | 2021-04-28T19:38:26Z | NONE | There is an inconsistency with how indexes are generated in a concat operation: ```python def transform(df): return ( df.to_xarray() .set_index(index=['id1', 'id2']) .pipe(lambda ds: xr.concat([ ds.isel(index=ds.year == v) for v in ds.year.to_series().unique() ], dim='dates')) ) df1 = pd.DataFrame(dict( id1=[1,2,1,2], id2=[1,2,1,2], data=[1,2,3,4], year=[2019, 2019, 2020, 2020] )) transform(df1) <xarray.Dataset> Dimensions: (dates: 2, index: 2) Coordinates: * index (index) MultiIndex - id1 (index) int64 1 2 - id2 (index) int64 1 2 Dimensions without coordinates: dates Data variables: data (dates, index) int64 1 2 3 4 year (dates, index) int64 2019 2019 2020 2020 df2 = pd.DataFrame(dict( id1=[1,2,1,2], id2=[1,2,1,3], # These don't quite align now data=[1,2,3,4], year=[2019, 2019, 2020, 2020] )) transform(df2) <xarray.Dataset> Dimensions: (dates: 2, index: 3) Coordinates: * index (index) MultiIndex - index_level_0 (index) int64 1 2 2 # These names are now different from id1, id2 - index_level_1 (index) int64 1 2 3 Dimensions without coordinates: dates Data variables: data (dates, index) float64 1.0 2.0 nan 3.0 nan 4.0 year (dates, index) float64 2.019e+03 2.019e+03 ... nan 2.02e+03 ``` It only appears to happen when values in a multiindex for the datasets being concatenated differ. Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 4.19.0-16-cloud-amd64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.8.0 xarray: 0.17.0 pandas: 1.1.1 numpy: 1.20.2 scipy: 1.6.2 netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.4.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.30.0 distributed: 2.20.0 matplotlib: 3.3.3 cartopy: None seaborn: 0.11.1 numbagg: None pint: None setuptools: 49.6.0.post20210108 pip: 21.0.1 conda: None pytest: 6.2.3 IPython: 7.22.0 sphinx: None |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5229/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |