html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2835#issuecomment-1256553736,https://api.github.com/repos/pydata/xarray/issues/2835,1256553736,IC_kwDOAMm_X85K5X0I,4447466,2022-09-23T18:47:58Z,2022-09-27T15:00:51Z,CONTRIBUTOR,"For the record, I just ran into this for the specific case of *nested* dictionary attrs in DataArray.attrs.
It's definitely an issue in 2022.3.0 and 2022.6.0. Here's a minimal test example in case anyone else runs into this too...
```python
# MINIMAL EXAMPLE
import xarray as xr
import numpy as np
data = xr.DataArray(np.random.randn(2, 3), dims=(""x"", ""y""), coords={""x"": [10, 20]})
data.attrs['flat'] = '0'
data.attrs['nested'] = {'level1': '1'}
data2 = data.copy(deep=True)
data2.attrs['flat'] = '2'  # OK
# data2.attrs['nested'] = {'level1': '2'}  # OK - rebinds a new dict
# data2.attrs['nested']['level1'] = '2'  # Fails - also modifies data.attrs
data2.attrs['nested'].update({'level1': '2'})  # Fails - also modifies data.attrs
print(data.attrs)
print(data2.attrs)
```
In xarray 2022.3.0 and 2022.6.0 this gives the incorrect output:
```
{'flat': '0', 'nested': {'level1': '2'}}
{'flat': '2', 'nested': {'level1': '2'}}
```
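This looks like the usual shallow-copy behaviour on the attrs dict: a shallow copy shares any nested dict objects, so mutating them through either copy affects both (I haven't traced whether that is exactly what happens internally). A plain-Python illustration, independent of xarray:
```python
# Plain-Python illustration of shallow vs. deep copies of a nested dict (not xarray code)
import copy

attrs = {'flat': '0', 'nested': {'level1': '1'}}
shallow = copy.copy(attrs)         # same as dict(attrs) or attrs.copy()
deep = copy.deepcopy(attrs)

shallow['nested']['level1'] = '2'  # mutates the nested dict shared with attrs
print(attrs['nested'])             # {'level1': '2'} - original affected
print(deep['nested'])              # {'level1': '1'} - deep copy unaffected
```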
As a workaround, explicitly deep-copying the attrs works:
```python
import copy

data2 = data.copy(deep=True)
data2.attrs = copy.deepcopy(data.attrs)
```
This gives the correct results after modification:
```
{'flat': '0', 'nested': {'level1': '1'}}
{'flat': '2', 'nested': {'level1': '2'}}
```
EDIT 26th Sept: retested in 2022.6.0 and found it was, in fact, failing there too. Updated comment to reflect this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,423742774
https://github.com/pydata/xarray/issues/2835#issuecomment-1258611190,https://api.github.com/repos/pydata/xarray/issues/2835,1258611190,IC_kwDOAMm_X85LBOH2,4447466,2022-09-26T20:43:01Z,2022-09-26T20:43:01Z,CONTRIBUTOR,"> Ok, I thought that copying attrs was fixed. Seems like it did not...
Sorry for the mix-up there - think I initially tested in 2022.6 with the extra `data2.attrs = copy.deepcopy(data.attrs)` implemented. Caught it with the new test routine 😄 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,423742774
https://github.com/pydata/xarray/issues/2835#issuecomment-1258538050,https://api.github.com/repos/pydata/xarray/issues/2835,1258538050,IC_kwDOAMm_X85LA8RC,4447466,2022-09-26T19:48:39Z,2022-09-26T19:52:14Z,CONTRIBUTOR,"OK, new test now pushed as #7086. (Hopefully added in the right place and style!)
A couple of additional notes:
- Revision to my comment above: this actually fails in 2022.3 *and* 2022.6 for nested attrs.
- I took a look at the source code in `dataarray.py`, but couldn't see an obvious way to fix this - I don't fully understand the attrs copying process in general.
- I tested the equivalent case for Dataset attrs too (see below), and this seems fine, as per your previous comments above. So I think https://github.com/pydata/xarray/pull/2839 (which includes a dataset-level test) still applies to `ds.attrs`; however, the issue *does* still affect the individual arrays within the dataset (as expected).
```python
import xarray as xr
ds = xr.Dataset({""a"": ([""x""], [1, 2, 3])}, attrs={""t"": 1, ""nested"":{""t2"": 1}})
ds.a.attrs = {""t"": 'a1', ""nested"":{""t2"": 'a1'}}
ds2 = ds.copy(deep=True)
ds.attrs[""t""] = 5
ds.attrs[""nested""][""t2""] = 10
ds2.a.attrs[""t""] = 'a2'
ds2.a.attrs[""nested""][""t2""] = 'a2'
print(ds.attrs)
print(ds.a.attrs)
print(ds2.attrs)
print(ds2.a.attrs)
```
Results in:
```
{'t': 5, 'nested': {'t2': 10}}
{'t': 'a1', 'nested': {'t2': 'a2'}}
{'t': 1, 'nested': {'t2': 1}}
{'t': 'a2', 'nested': {'t2': 'a2'}}
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,423742774
https://github.com/pydata/xarray/issues/2835#issuecomment-1258231700,https://api.github.com/repos/pydata/xarray/issues/2835,1258231700,IC_kwDOAMm_X85K_xeU,4447466,2022-09-26T15:38:18Z,2022-09-26T15:38:18Z,CONTRIBUTOR,"Absolutely @headtr1ck, glad that it was useful - I'm a bit green re: tests and PRs to large projects, but will make a stab at it. I'm just consulting the Contributing guide now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,423742774
https://github.com/pydata/xarray/issues/4073#issuecomment-1163292454,https://api.github.com/repos/pydata/xarray/issues/4073,1163292454,IC_kwDOAMm_X85FVm8m,4447466,2022-06-22T15:51:51Z,2022-06-22T16:12:34Z,CONTRIBUTOR,"I also ran into this when trying to serialize to `dict` for general file writing routines (esp. for HDF5 writing with [h5py](https://docs.h5py.org/en/stable/index.html)), but the issue was in my non-dimensional coordinates! I thought I was being careful by already using `array.unwrap()` in my IO routine, but it also required `.reset_index()` or `.drop()` for non-dimensional coordinates.
Some notes below in case they are useful for anyone else trying to do this. Also - this is all quite ugly, and I may have missed some existing core functionality, so I'll be very happy to hear if there is a better way to handle this.
---
Following the above, a minimal example:
```python
import pandas as pd
import xarray as xr
idx = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=('one', 'two'))
array = xr.DataArray([0, 1], dims='idx', coords={'idx': idx})
# Stacked multidim coords > dict > recreate array - Fails
xr.DataArray.from_dict(array.to_dict())
# Unstack multidim coords > dict > recreate array - OK
xr.DataArray.from_dict(array.unstack().to_dict())
# Set non-dimensional coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A','B']) # Add non-dim coord
array2 = array2.swap_dims({'idx':'Labels'}) # Swap dims
# Non-dim coord case - also need to reset and drop non-dim coords
# This will fail
array2_dict = array2.unstack().reset_index('idx').to_dict()
xr.DataArray.from_dict(array2_dict)
# This is OK
array2_dict = array2.unstack().reset_index('idx', drop=True).to_dict()
xr.DataArray.from_dict(array2_dict)
# This is also OK
array2_dict = array2.unstack().drop('idx').to_dict()
xr.DataArray.from_dict(array2_dict)
```
In all of the working cases above, the reconstructed array comes back flat (unstacked) and missing its non-dim coords. My workaround so far is to pull the various mappings out manually, dump everything to `.attrs`, and then rebuild from those if required, e.g.
```python
def mapDims(data):
    # Get dims from Xarray
    dims = data.dims  # Set dim list - this excludes stacked dims
    dimsUS = data.unstack().dims  # Set unstacked (full) dim list

    # List stacked dims and map
    # Could also do this by type checking vs. 'pandas.core.indexes.multi.MultiIndex'?
    stackedDims = list(set(dims) - set(dimsUS))
    stackedDimsMap = {k: list(data.indexes[k].names) for k in stackedDims}

    # Get non-dimensional coords
    # These may be stacked, are not listed in self.dims, and are not addressed by .unstack()
    idxKeys = list(data.indexes.keys())
    coordsKeys = list(data.coords.keys())
    nonDimCoords = list(set(coordsKeys) - set(idxKeys))
    # nonDimCoords = list(set(dims) - set(idxKeys))

    # Get non-dim indexes
    # nddimIndexes = {k: data.coords[k].to_index() for k, v in data.coords.items() if k in nonDimCoords}  # Note this returns Pandas Indexes, so may fail on file IO.
    nddimMap = {k: list(data.coords[k].to_index().names) for k, v in data.coords.items() if k in nonDimCoords}

    # Get dict maps - to_dict per non-dim coord
    # nddimDicts = {k: data.coords[k].reset_index(k).to_dict() for k, v in data.coords.items() if k in nonDimCoords}
    # Use Pandas - this allows direct dump of PD multiindex to dicts
    nddimDicts = {k: data.coords[k].to_index().to_frame().to_dict() for k, v in data.coords.items() if k in nonDimCoords}

    # Get coords correlated to non-dim coords, need these to recreate original links & stacking (?)
    nddimDims = {k: data.coords[k].dims for k, v in data.coords.items() if k in nonDimCoords}

    return {k: v for k, v in locals().items() if k != 'data'}


def deconstructDims(data):
    xrDecon = data.copy()
    # Map dims
    xrDecon.attrs['dimMaps'] = mapDims(data)
    # Unstack all coords
    xrDecon = xrDecon.unstack()
    # Remove non-dim coords
    for nddim in xrDecon.attrs['dimMaps']['nonDimCoords']:
        xrDecon = xrDecon.drop(nddim)
    return xrDecon


def reconstructDims(data):
    xrRecon = data.copy()
    # Restack coords (each stacked dim from its original level names)
    for stacked in xrRecon.attrs['dimMaps']['stackedDims']:
        xrRecon = xrRecon.stack({stacked: xrRecon.attrs['dimMaps']['stackedDimsMap'][stacked]})
    # General non-dim coord rebuild
    for nddim in xrRecon.attrs['dimMaps']['nonDimCoords']:
        # Add nddim back into main XR array
        xrRecon.coords[nddim] = (xrRecon.attrs['dimMaps']['nddimDims'][nddim], pd.MultiIndex.from_frame(pd.DataFrame.from_dict(xrRecon.attrs['dimMaps']['nddimDicts'][nddim])))  # OK
    return xrRecon
```
The dict round-trip is then OK, and the dictionary can also be pushed to standard file types (it contains only Python native types + numpy arrays).
```python
# IO with funcs
# With additional tuple coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A','B']) # Add non-dim coord
array2 = array2.swap_dims({'idx':'Labels'}) # Swap dims
# Decon to dict
safeDict = deconstructDims(array2).to_dict()
# Rebuild
xrFromDict = reconstructDims(xr.DataArray.from_dict(safeDict))
# Same as array2 (aside from added attrs)
array2.attrs = xrFromDict.attrs
array2.identical(xrFromDict) # True
```
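To close the loop on the HDF5 use-case mentioned at the top: below is a rough, write-only sketch of how the `safeDict` from above might be pushed into an HDF5 file with h5py. The `dictToHDF5` helper and the filename are purely illustrative (not an xarray or h5py API): nested dicts become groups, strings and None become attributes, and everything else becomes a dataset. Tuple dict keys just get stringified, so reading this back losslessly would need more work.
```python
# Rough write-only sketch (illustrative helper, untested against all edge cases)
import h5py
import numpy as np

def dictToHDF5(d, group):
    # Recursively write a nested dict to an h5py group
    for key, value in d.items():
        name = str(key)  # HDF5 names must be strings (tuple keys are stringified - lossy)
        if isinstance(value, dict):
            dictToHDF5(value, group.create_group(name))
        elif isinstance(value, str):
            group.attrs[name] = value
        elif value is None:
            group.attrs[name] = 'None'  # placeholder - HDF5 has no null type
        else:
            arr = np.asarray(value)
            if arr.dtype.kind == 'U':
                arr = arr.astype('S')  # HDF5 has no native fixed-width unicode type
            group.create_dataset(name, data=arr)

with h5py.File('array2_dict.h5', 'w') as f:
    dictToHDF5(safeDict, f)
```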
Again, there is likely some cleaner/more obvious approach I'm missing, but I'm not very familiar with xarray or pandas internals - this is just where I ended up when trying to convert to HDF5-compatible data structures in a semi-general way.
(As a side-note, I ran into similar issues with `xr.DataArray.to_netcdf()` and multi-index coords, or at least did last time I tried it - but I didn't look into this further since I prefer using h5py for other reasons.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620134014