html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4073#issuecomment-1163292454,https://api.github.com/repos/pydata/xarray/issues/4073,1163292454,IC_kwDOAMm_X85FVm8m,4447466,2022-06-22T15:51:51Z,2022-06-22T16:12:34Z,CONTRIBUTOR,"I also ran into this when trying to serialize to `dict` for general file writing routines (esp. for HDF5 writing with [h5py](https://docs.h5py.org/en/stable/index.html)), but the issue was in my non-dimensional coordinates! I thought I was being careful by already using `array.unstack()` in my IO routine, but I also needed `.reset_index()` or `.drop()` for the non-dimensional coordinates.
Some notes below in case they are useful for anyone else trying to do this. Also - this is all quite ugly, and I may have missed some existing core functionality, so I'll be very happy to hear if there is a better way to handle this.
---
Following the above, a minimal example:
```python
import pandas as pd
import xarray as xr
idx = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=('one', 'two'))
array = xr.DataArray([0, 1], dims='idx', coords={'idx': idx})
# Stacked multidim coords > dict > recreate array - Fails
xr.DataArray.from_dict(array.to_dict())
# Unstack multidim coords > dict > recreate array - OK
xr.DataArray.from_dict(array.unstack().to_dict())
# Set non-dimensional coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A','B']) # Add non-dim coord
array2 = array2.swap_dims({'idx':'Labels'}) # Swap dims
# Non-dim coord case - also need to reset and drop non-dim coords
# This will fail
array2_dict = array2.unstack().reset_index('idx').to_dict()
xr.DataArray.from_dict(array2_dict)
# This is OK
array2_dict = array2.unstack().reset_index('idx', drop=True).to_dict()
xr.DataArray.from_dict(array2_dict)
# This is also OK
array2_dict = array2.unstack().drop('idx').to_dict()
xr.DataArray.from_dict(array2_dict)
```
In all of the working cases the reconstructed array is flat (unstacked) and is missing the non-dim coords. My work-around for this so far is to pull the various mappings manually and dump everything to `.attrs`, then rebuild from those if required, e.g.
```python
def mapDims(data):
    # Get dims from Xarray
    dims = data.dims  # Set dim list - this excludes stacked dims
    dimsUS = data.unstack().dims  # Set unstacked (full) dim list
    # List stacked dims and map
    # Could also do this by type checking vs. 'pandas.core.indexes.multi.MultiIndex'?
    stackedDims = list(set(dims) - set(dimsUS))
    stackedDimsMap = {k: list(data.indexes[k].names) for k in stackedDims}
    # Get non-dimensional coords
    # These may be stacked, are not listed in self.dims, and are not addressed by .unstack()
    idxKeys = list(data.indexes.keys())
    coordsKeys = list(data.coords.keys())
    nonDimCoords = list(set(coordsKeys) - set(idxKeys))
    # nonDimCoords = list(set(dims) - set(idxKeys))
    # Get non-dim indexes
    # nddimIndexes = {k:data.coords[k].to_index() for k,v in data.coords.items() if k in nonDimCoords} # Note this returns Pandas Indexes, so may fail on file IO.
    nddimMap = {k:list(data.coords[k].to_index().names) for k,v in data.coords.items() if k in nonDimCoords}
    # Get dict maps - to_dict per non-dim coord
    # nddimDicts = {k:data.coords[k].reset_index(k).to_dict() for k,v in data.coords.items() if k in nonDimCoords}
    # Use Pandas - this allows direct dump of PD multiindex to dicts
    nddimDicts = {k:data.coords[k].to_index().to_frame().to_dict() for k,v in data.coords.items() if k in nonDimCoords}
    # Get coords correlated to non-dim coords, need these to recreate original links & stacking (?)
    nddimDims = {k:data.coords[k].dims for k,v in data.coords.items() if k in nonDimCoords}
    # Return every local mapping built above (everything except the input array)
    return {k:v for k,v in locals().items() if k !='data'}

def deconstructDims(data):
    xrDecon = data.copy()
    # Map dims
    xrDecon.attrs['dimMaps'] = mapDims(data)
    # Unstack all coords
    xrDecon = xrDecon.unstack()
    # Remove non-dim coords
    for nddim in xrDecon.attrs['dimMaps']['nonDimCoords']:
        xrDecon = xrDecon.drop(nddim)
    return xrDecon

def reconstructDims(data):
    xrRecon = data.copy()
    dimMaps = xrRecon.attrs['dimMaps']
    # Restack coords, using the level names recorded for each stacked dim
    for stacked in dimMaps['stackedDims']:
        xrRecon = xrRecon.stack({stacked: dimMaps['stackedDimsMap'][stacked]})
    # General non-dim coord rebuild
    for nddim in dimMaps['nonDimCoords']:
        # Rebuild the index from its dict form, then add nddim back into main XR array
        nddimIndex = pd.MultiIndex.from_frame(pd.DataFrame.from_dict(dimMaps['nddimDicts'][nddim]))
        xrRecon.coords[nddim] = (dimMaps['nddimDims'][nddim], nddimIndex)  # OK
    return xrRecon
```
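On the type-checking idea noted in the comments of `mapDims`: a minimal sketch of what that might look like is below. The helper name `stacked_dims_by_type` is hypothetical (not part of Xarray), and the lookup is restricted to `data.dims` on the assumption that some Xarray versions also list the MultiIndex level names in `.indexes`. I have not checked this against every Xarray release, so treat it as a starting point only.

```python
import pandas as pd
import xarray as xr


def stacked_dims_by_type(data):
    # Hypothetical helper: map each dim backed by a pandas MultiIndex
    # to the list of its level names, found by type checking the index.
    return {
        name: list(data.indexes[name].names)
        for name in data.dims
        if isinstance(data.indexes.get(name), pd.MultiIndex)
    }


# Same stacked array as in the minimal example
idx = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=('one', 'two'))
array = xr.DataArray([0, 1], dims='idx', coords={'idx': idx})
stacked_map = stacked_dims_by_type(array)

# An array with a plain (non-stacked) index gives an empty map
plain = xr.DataArray([0, 1], dims='x', coords={'x': [10, 20]})
plain_map = stacked_dims_by_type(plain)
```

This avoids the extra `.unstack()` call used to find stacked dims by set difference, but it only covers dimension coordinates; non-dimensional coords would still need separate handling.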
With `deconstructDims` and `reconstructDims`, the dict round-trip is then OK, and the dictionary can also be pushed to standard file types (it contains only Python native types + numpy arrays).
```python
# IO with funcs
# With additional tuple coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A','B']) # Add non-dim coord
array2 = array2.swap_dims({'idx':'Labels'}) # Swap dims
# Decon to dict
safeDict = deconstructDims(array2).to_dict()
# Rebuild
xrFromDict = reconstructDims(xr.DataArray.from_dict(safeDict))
# Same as array2 (aside from added attrs)
array2.attrs = xrFromDict.attrs
array2.identical(xrFromDict) # True
```
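The MultiIndex-to-dict step that `mapDims` and `reconstructDims` rely on can also be checked in isolation with Pandas only. The sketch below reuses the index from the minimal example; the variable names are just for illustration. It also shows a flatter alternative (`to_frame(index=False)` with `orient='list'`), which is my assumption of something that may be easier to write to file, since the default nested dict is keyed by tuples.

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=('one', 'two'))

# MultiIndex > DataFrame > nested dict, as done for nddimDicts in mapDims
# Inner dict keys are the index tuples, e.g. (1, 3)
as_dict = idx.to_frame().to_dict()

# Nested dict > DataFrame > MultiIndex, as done in reconstructDims
rebuilt = pd.MultiIndex.from_frame(pd.DataFrame.from_dict(as_dict))

# Flatter alternative: one list per level, no tuple keys
flat = idx.to_frame(index=False).to_dict(orient='list')
rebuilt_flat = pd.MultiIndex.from_frame(pd.DataFrame.from_dict(flat))
```

Both versions rebuild an index equal to the original here. The tuple keys in the first form may need converting before writing to formats that only accept string keys.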
Again, there is likely some cleaner/more obvious thing I'm missing, as I'm not very familiar with Xarray or Pandas internals - this is just where I ended up when trying to convert to HDF5-compatible data structures in a semi-general way.
(As a side-note, I ran into similar issues with `xr.DataArray.to_netcdf()` and multi-index coords, or at least did last time I tried it - but I didn't look into this further since I prefer using h5py for other reasons.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620134014