issue_comments

4 rows where issue = 620134014 sorted by updated_at descending

Issue: Error in DataArray.from_dict(data_array.to_dict()) when using pd.MultiIndex (620134014)

Comment by phockett (CONTRIBUTOR) · created 2022-06-22T15:51:51Z, updated 2022-06-22T16:12:34Z · https://github.com/pydata/xarray/issues/4073#issuecomment-1163292454

I also ran into this when trying to serialize to dict for general file-writing routines (esp. for HDF5 writing with h5py), but the issue was in my non-dimensional coordinates! I thought I was being careful by already using array.unstack() in my IO routine, but I also needed .reset_index() or .drop() for the non-dimensional coordinates.

Some notes below in case they're useful for anyone else trying to do this. Also, this is all quite ugly, and I may have missed some existing core functionality, so I'd be very happy to hear if there is a better way to handle this.


Following the above, a minimal example:

```python
import pandas as pd
import xarray as xr

idx = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=('one', 'two'))
array = xr.DataArray([0, 1], dims='idx', coords={'idx': idx})

# Stacked multidim coords > dict > recreate array - Fails
xr.DataArray.from_dict(array.to_dict())

# Unstack multidim coords > dict > recreate array - OK
xr.DataArray.from_dict(array.unstack().to_dict())

# Set non-dimensional coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A', 'B'])        # Add non-dim coord
array2 = array2.swap_dims({'idx': 'Labels'})  # Swap dims

# Non-dim coord case - also need to reset and drop non-dim coords

# This will fail
array2_dict = array2.unstack().reset_index('idx').to_dict()
xr.DataArray.from_dict(array2_dict)

# This is OK
array2_dict = array2.unstack().reset_index('idx', drop=True).to_dict()
xr.DataArray.from_dict(array2_dict)

# This is also OK
array2_dict = array2.unstack().drop('idx').to_dict()
xr.DataArray.from_dict(array2_dict)
```

In all cases the reconstructed array is flat and missing the non-dim coords. My work-around so far is to pull the various mappings manually, dump everything to .attrs, and then rebuild from those if required, e.g.

```python
def mapDims(data):
    # Get dims from Xarray
    dims = data.dims                # Set dim list - this excludes stacked dims
    dimsUS = data.unstack().dims    # Set unstacked (full) dim list

    # List stacked dims and map
    # Could also do this by type checking vs. 'pandas.core.indexes.multi.MultiIndex'?
    stackedDims = list(set(dims) - set(dimsUS))
    stackedDimsMap = {k: list(data.indexes[k].names) for k in stackedDims}

    # Get non-dimensional coords
    # These may be stacked, are not listed in self.dims, and are not addressed by .unstack()
    idxKeys = list(data.indexes.keys())
    coordsKeys = list(data.coords.keys())
    nonDimCoords = list(set(coordsKeys) - set(idxKeys))
    # nonDimCoords = list(set(dims) - set(idxKeys))

    # Get non-dim indexes
    # nddimIndexes = {k: data.coords[k].to_index() for k, v in data.coords.items() if k in nonDimCoords}  # Note this returns Pandas Indexes, so may fail on file IO.
    nddimMap = {k: list(data.coords[k].to_index().names) for k, v in data.coords.items() if k in nonDimCoords}

    # Get dict maps - to_dict per non-dim coord
    # nddimDicts = {k: data.coords[k].reset_index(k).to_dict() for k, v in data.coords.items() if k in nonDimCoords}
    # Use Pandas - this allows direct dump of PD multiindex to dicts
    nddimDicts = {k: data.coords[k].to_index().to_frame().to_dict() for k, v in data.coords.items() if k in nonDimCoords}
    # Get coords correlated to non-dim coords, need these to recreate original links & stacking (?)
    nddimDims = {k: data.coords[k].dims for k, v in data.coords.items() if k in nonDimCoords}

    return {k: v for k, v in locals().items() if k != 'data'}


def deconstructDims(data):
    xrDecon = data.copy()

    # Map dims
    xrDecon.attrs['dimMaps'] = mapDims(data)

    # Unstack all coords
    xrDecon = xrDecon.unstack()

    # Remove non-dim coords
    for nddim in xrDecon.attrs['dimMaps']['nonDimCoords']:
        xrDecon = xrDecon.drop(nddim)

    return xrDecon


def reconstructDims(data):
    xrRecon = data.copy()

    # Restack coords (use the per-dim level map, not the full stacked-dim list)
    for stacked in xrRecon.attrs['dimMaps']['stackedDims']:
        xrRecon = xrRecon.stack({stacked: xrRecon.attrs['dimMaps']['stackedDimsMap'][stacked]})

    # General non-dim coord rebuild
    for nddim in xrRecon.attrs['dimMaps']['nonDimCoords']:
        # Add nddim back into main XR array
        xrRecon.coords[nddim] = (
            xrRecon.attrs['dimMaps']['nddimDims'][nddim],
            pd.MultiIndex.from_frame(pd.DataFrame.from_dict(xrRecon.attrs['dimMaps']['nddimDicts'][nddim])),
        )  # OK

    return xrRecon
```

The dict round-trip is then OK, and the dictionary can also be pushed to standard file types (it contains only Python native types plus NumPy arrays).

```python
# IO with funcs

# With additional tuple coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A', 'B'])        # Add non-dim coord
array2 = array2.swap_dims({'idx': 'Labels'})  # Swap dims

# Decon to dict
safeDict = deconstructDims(array2).to_dict()

# Rebuild
xrFromDict = reconstructDims(xr.DataArray.from_dict(safeDict))

# Same as array2 (aside from added attrs)
array2.attrs = xrFromDict.attrs
array2.identical(xrFromDict)  # True
```
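
As an illustration of the "pushed to standard file types" step, here is a minimal sketch (an editorial addition, not from the original comment) of dumping `safeDict` to HDF5 with h5py. The recursive `dictToHDF5` helper and the output filename are assumptions, and the type handling is only illustrative.

```python
import h5py
import numpy as np

def dictToHDF5(d, group):
    # Hypothetical helper: mirror a nested dict as HDF5 groups/datasets.
    for key, val in d.items():
        if isinstance(val, dict):
            dictToHDF5(val, group.create_group(str(key)))
        elif val is None:
            group.attrs[str(key)] = 'None'   # HDF5 has no null type; store a marker
        else:
            arr = np.asarray(val)
            if arr.dtype.kind == 'U':        # h5py can't store numpy unicode arrays
                arr = arr.astype(bytes)      # fall back to fixed-width byte strings
            group.create_dataset(str(key), data=arr)

# Assumed output filename
with h5py.File('array2_safe.h5', 'w') as f:
    dictToHDF5(safeDict, f)
```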

Again, there is likely some cleaner/more obvious approach I'm missing, but I'm not very familiar with Xarray or Pandas internals - this is just where I ended up when trying to convert to HDF5-compatible data structures in a semi-general way.

(As a side note, I ran into similar issues with xr.DataArray.to_netcdf() and multi-index coords, or at least did the last time I tried it - but I didn't look into this further since I prefer using h5py for other reasons.)
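
For that to_netcdf() side note, one possible workaround (an assumed sketch continuing the minimal example above, not something tested in this thread) is to demote the MultiIndex to plain coordinates before writing and rebuild it after reading:

```python
# Assumed sketch: drop the MultiIndex down to plain coords before writing to netCDF,
# then rebuild it with set_index after reading the file back.
array.reset_index('idx').to_netcdf('array.nc')

restored = xr.open_dataarray('array.nc').set_index(idx=['one', 'two'])
restored.identical(array)  # expected True, assuming the coords survive the round trip
```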

Comment by max-sixty (MEMBER) · created 2022-04-19T05:48:31Z · https://github.com/pydata/xarray/issues/4073#issuecomment-1102116177

This is indeed not ideal. I'm not sure we have great round-trip support for dict, but I'd hope it's possible to make this work.

I'll mark as a bug. Any PRs working towards this would be greatly appreciated.

(and apologies your excellent issue didn't get picked up, @genric)

Comment by stale[bot] (NONE) · created 2022-04-19T05:43:50Z · https://github.com/pydata/xarray/issues/4073#issuecomment-1102108752

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

Comment by genric (NONE) · created 2020-06-10T07:21:56Z · https://github.com/pydata/xarray/issues/4073#issuecomment-641786386

Didn't know it was such a fundamental problem. Managed to get around it with set/reset index.
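
For reference, a sketch of what that set/reset index workaround might look like on the minimal example from this thread (assumed usage on my part, not taken from the comment):

```python
import pandas as pd
import xarray as xr

idx = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=('one', 'two'))
array = xr.DataArray([0, 1], dims='idx', coords={'idx': idx})

# Demote the MultiIndex to plain coords, round-trip through dict, then rebuild it.
flat_dict = array.reset_index('idx').to_dict()
restored = xr.DataArray.from_dict(flat_dict).set_index(idx=['one', 'two'])
restored.identical(array)  # expected True on versions where from_dict keeps the plain coords
```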
