html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4073#issuecomment-1163292454,https://api.github.com/repos/pydata/xarray/issues/4073,1163292454,IC_kwDOAMm_X85FVm8m,4447466,2022-06-22T15:51:51Z,2022-06-22T16:12:34Z,CONTRIBUTOR,"I also ran into this when trying to serialize to `dict` for general file writing routines (esp. for HDF5 writing with [h5py](https://docs.h5py.org/en/stable/index.html)), but the issue was in my non-dimensional coordinates! I thought I was being careful by already using `array.unstack()` in my IO routine, but I also needed `.reset_index()` or `.drop()` for the non-dimensional coordinates.

Some notes below in case they are useful for anyone else trying to do this. Also - this is all quite ugly, and I may have missed some existing core functionality, so I'll be very happy to hear if there is a better way to handle this.

---

Following the above, a minimal example:

```python
import pandas as pd
import xarray as xr

idx = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=('one', 'two'))
array = xr.DataArray([0, 1], dims='idx', coords={'idx': idx})

# Stacked multidim coords > dict > recreate array - Fails
xr.DataArray.from_dict(array.to_dict())

# Unstack multidim coords > dict > recreate array - OK
xr.DataArray.from_dict(array.unstack().to_dict())

# Set non-dimensional coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A','B'])  # Add non-dim coord
array2 = array2.swap_dims({'idx':'Labels'})  # Swap dims

# Non-dim coord case - also need to reset and drop non-dim coords
# This will fail
array2_dict = array2.unstack().reset_index('idx').to_dict()
xr.DataArray.from_dict(array2_dict)

# This is OK
array2_dict = array2.unstack().reset_index('idx', drop=True).to_dict()
xr.DataArray.from_dict(array2_dict)

# This is also OK
array2_dict = array2.unstack().drop('idx').to_dict()
xr.DataArray.from_dict(array2_dict)
```

In all cases the reconstructed array
is flat, and missing non-dim coords.

My work-around for this so far is to pull the various mappings manually and dump everything to `.attrs`, then rebuild from those if required, e.g.

```python
def mapDims(data):
    # Get dims from Xarray
    dims = data.dims  # Set dim list - this excludes stacked dims
    dimsUS = data.unstack().dims  # Set unstacked (full) dim list

    # List stacked dims and map
    # Could also do this by type checking vs. 'pandas.core.indexes.multi.MultiIndex'?
    stackedDims = list(set(dims) - set(dimsUS))
    stackedDimsMap = {k: list(data.indexes[k].names) for k in stackedDims}

    # Get non-dimensional coords
    # These may be stacked, are not listed in data.dims, and are not addressed by .unstack()
    idxKeys = list(data.indexes.keys())
    coordsKeys = list(data.coords.keys())
    nonDimCoords = list(set(coordsKeys) - set(idxKeys))
    # nonDimCoords = list(set(dims) - set(idxKeys))

    # Get non-dim indexes
    # nddimIndexes = {k:data.coords[k].to_index() for k,v in data.coords.items() if k in nonDimCoords}  # Note this returns Pandas Indexes, so may fail on file IO.
    nddimMap = {k:list(data.coords[k].to_index().names) for k,v in data.coords.items() if k in nonDimCoords}

    # Get dict maps - to_dict per non-dim coord
    # nddimDicts = {k:data.coords[k].reset_index(k).to_dict() for k,v in data.coords.items() if k in nonDimCoords}
    # Use Pandas - this allows direct dump of PD multiindex to dicts
    nddimDicts = {k:data.coords[k].to_index().to_frame().to_dict() for k,v in data.coords.items() if k in nonDimCoords}

    # Get coords correlated to non-dim coords, need these to recreate original links & stacking (?)
    nddimDims = {k:data.coords[k].dims for k,v in data.coords.items() if k in nonDimCoords}

    return {k:v for k,v in locals().items() if k !='data'}


def deconstructDims(data):
    xrDecon = data.copy()

    # Map dims
    xrDecon.attrs['dimMaps'] = mapDims(data)

    # Unstack all coords
    xrDecon = xrDecon.unstack()

    # Remove non-dim coords
    for nddim in xrDecon.attrs['dimMaps']['nonDimCoords']:
        xrDecon = xrDecon.drop(nddim)

    return xrDecon


def reconstructDims(data):
    xrRecon = data.copy()

    # Restack coords, using the level names recorded for each stacked dim
    for stacked in xrRecon.attrs['dimMaps']['stackedDims']:
        xrRecon = xrRecon.stack({stacked: xrRecon.attrs['dimMaps']['stackedDimsMap'][stacked]})

    # General non-dim coord rebuild
    for nddim in xrRecon.attrs['dimMaps']['nonDimCoords']:
        # Add nddim back into main XR array
        xrRecon.coords[nddim] = (xrRecon.attrs['dimMaps']['nddimDims'][nddim], pd.MultiIndex.from_frame(pd.DataFrame.from_dict(xrRecon.attrs['dimMaps']['nddimDicts'][nddim])))  # OK

    return xrRecon
```

Dict round-trip is then OK, and the dictionary can also be pushed to standard file types (it contains only Python native types + NumPy arrays).

```python
# IO with funcs
# With additional tuple coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A','B'])  # Add non-dim coord
array2 = array2.swap_dims({'idx':'Labels'})  # Swap dims

# Decon to dict
safeDict = deconstructDims(array2).to_dict()

# Rebuild
xrFromDict = reconstructDims(xr.DataArray.from_dict(safeDict))

# Same as array2 (aside from added attrs)
array2.attrs = xrFromDict.attrs
array2.identical(xrFromDict)  # True
```

Again, there is likely some cleaner/more obvious thing I'm missing here, but I'm not very familiar with Xarray or Pandas internals - this is just where I ended up when trying to convert to HDF5-compatible data structures in a semi-general way.

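
For anyone who wants to check the MultiIndex > dict > MultiIndex step behind `nddimDicts` on its own, here is a minimal pandas-only sketch. The helper names `index_to_dict` and `dict_to_index` are hypothetical (they are not part of pandas or Xarray). It uses `to_frame(index=False)` with `orient='list'`, which is a variation on the calls above, not the same code, and it assumes the level values are plain scalars.

```python
import pandas as pd


def index_to_dict(index):
    # Hypothetical helper: dump a MultiIndex to {level_name: [values]}.
    # index=False keeps tuple row labels out of the resulting dict keys.
    return index.to_frame(index=False).to_dict(orient='list')


def dict_to_index(level_dict):
    # Hypothetical helper: rebuild the MultiIndex from the dict.
    # Column order in the DataFrame gives the level order.
    return pd.MultiIndex.from_frame(pd.DataFrame(level_dict))


idx = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=('one', 'two'))
as_dict = index_to_dict(idx)
rebuilt = dict_to_index(as_dict)

# Level names and values should survive the round trip
assert rebuilt.equals(idx)
assert list(rebuilt.names) == ['one', 'two']
```

A dict of plain lists like this is probably easier to push to h5py or JSON than one keyed by tuples, but I have not checked that against every file format.
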
(As a side-note, I ran into similar issues with `xr.DataArray.to_netcdf()` and multi-index coords, or at least did last time I tried it - but I didn't look into this further since I prefer using h5py for other reasons.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620134014