
issues: 869792877


Title: Index level naming bug with `concat`
Number: 5229 · State: closed · Locked: 0 · Comments: 2
id: 869792877 · node_id: MDU6SXNzdWU4Njk3OTI4Nzc= · user: 6130352 · author_association: NONE
created_at: 2021-04-28T10:29:34Z · updated_at: 2021-04-28T19:38:26Z · closed_at: 2021-04-28T19:38:26Z

There is an inconsistency with how indexes are generated in a concat operation:

```python
import pandas as pd
import xarray as xr


def transform(df):
    # Build a MultiIndex on (id1, id2), then stack one slice per year along 'dates'
    return (
        df.to_xarray()
        .set_index(index=['id1', 'id2'])
        .pipe(lambda ds: xr.concat([
            ds.isel(index=ds.year == v)
            for v in ds.year.to_series().unique()
        ], dim='dates'))
    )


df1 = pd.DataFrame(dict(
    id1=[1, 2, 1, 2],
    id2=[1, 2, 1, 2],
    data=[1, 2, 3, 4],
    year=[2019, 2019, 2020, 2020]
))
transform(df1)
# <xarray.Dataset>
# Dimensions:  (dates: 2, index: 2)
# Coordinates:
#   * index    (index) MultiIndex
#   - id1      (index) int64 1 2
#   - id2      (index) int64 1 2
# Dimensions without coordinates: dates
# Data variables:
#     data     (dates, index) int64 1 2 3 4
#     year     (dates, index) int64 2019 2019 2020 2020

df2 = pd.DataFrame(dict(
    id1=[1, 2, 1, 2],
    id2=[1, 2, 1, 3],  # These don't quite align now
    data=[1, 2, 3, 4],
    year=[2019, 2019, 2020, 2020]
))
transform(df2)
# <xarray.Dataset>
# Dimensions:        (dates: 2, index: 3)
# Coordinates:
#   * index          (index) MultiIndex
#   - index_level_0  (index) int64 1 2 2  # These names are now different from id1, id2
#   - index_level_1  (index) int64 1 2 3
# Dimensions without coordinates: dates
# Data variables:
#     data           (dates, index) float64 1.0 2.0 nan 3.0 nan 4.0
#     year           (dates, index) float64 2.019e+03 2.019e+03 ... nan 2.02e+03
```

This appears to happen only when the MultiIndex values of the datasets being concatenated differ.
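To make the difference between the two cases concrete, here is a minimal pure-Python sketch of an outer-join alignment over `(id1, id2)` pairs. It is not xarray's implementation; the helper name `outer_align` and the dict-based representation are assumptions made for illustration only. It reproduces the shape of the `df2` result above (three index entries, `nan` fill, integers promoted to floats), which suggests that the differing-values case goes through an index union that the identical-values case does not need.

```python
import math


def outer_align(slices):
    """Align dicts keyed by (id1, id2) onto the sorted union of their keys.

    Hypothetical helper: mimics the effect of an outer join, filling
    missing entries with NaN (which forces float values).
    """
    union = sorted(set().union(*(s.keys() for s in slices)))
    aligned = [
        [float(s[key]) if key in s else math.nan for key in union]
        for s in slices
    ]
    return union, aligned


# The per-year slices of df2 from the report, keyed by (id1, id2)
year_2019 = {(1, 1): 1, (2, 2): 2}
year_2020 = {(1, 1): 3, (2, 3): 4}

index, data = outer_align([year_2019, year_2020])
print(index)  # [(1, 1), (2, 2), (2, 3)]
print(data)   # [[1.0, 2.0, nan], [3.0, nan, 4.0]]
```

With the `df1` slices, both dicts share the keys `(1, 1)` and `(2, 2)`, so the union equals each input index and nothing is filled. Whether the level names are actually lost during this union step in xarray 0.17.0 is not confirmed by the report.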

Environment:

Output of `xr.show_versions()`:

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0]
    python-bits: 64
    OS: Linux
    OS-release: 4.19.0-16-cloud-amd64
    machine: x86_64
    processor:
    byteorder: little
    LC_ALL: None
    LANG: C.UTF-8
    LOCALE: en_US.UTF-8
    libhdf5: 1.10.6
    libnetcdf: 4.8.0

    xarray: 0.17.0
    pandas: 1.1.1
    numpy: 1.20.2
    scipy: 1.6.2
    netCDF4: 1.5.6
    pydap: None
    h5netcdf: None
    h5py: None
    Nio: None
    zarr: None
    cftime: 1.4.1
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: None
    cfgrib: None
    iris: None
    bottleneck: None
    dask: 2.30.0
    distributed: 2.20.0
    matplotlib: 3.3.3
    cartopy: None
    seaborn: 0.11.1
    numbagg: None
    pint: None
    setuptools: 49.6.0.post20210108
    pip: 21.0.1
    conda: None
    pytest: 6.2.3
    IPython: 7.22.0
    sphinx: None

Reactions: 0 (https://api.github.com/repos/pydata/xarray/issues/5229/reactions)
State reason: completed · Repo: 13221727 · Type: issue
