html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2835#issuecomment-1256553736,https://api.github.com/repos/pydata/xarray/issues/2835,1256553736,IC_kwDOAMm_X85K5X0I,4447466,2022-09-23T18:47:58Z,2022-09-27T15:00:51Z,CONTRIBUTOR,"For the record, I just ran into this for the specific case of *nested* dictionary attrs in DataArray.attrs.
It's definitely an issue in 2022.3.0 and 2022.6.0. Here's a minimal test example in case anyone else runs into this too...
```python
# MINIMAL EXAMPLE
import xarray as xr
import numpy as np
data = xr.DataArray(np.random.randn(2, 3), dims=(""x"", ""y""), coords={""x"": [10, 20]})
data.attrs['flat'] = '0'
data.attrs['nested'] = {'level1': '1'}
data2 = data.copy(deep=True)
data2.attrs['flat'] = '2'  # OK
# data2.attrs['nested'] = {'level1': '2'}  # OK - rebinds a new dict
# data2.attrs['nested']['level1'] = '2'  # Fails - also modifies data.attrs
data2.attrs['nested'].update({'level1': '2'})  # Fails - also modifies data.attrs
print(data.attrs)
print(data2.attrs)
```
In xarray 2022.3.0 and 2022.6.0 this gives the incorrect output:
```
{'flat': '0', 'nested': {'level1': '2'}}
{'flat': '2', 'nested': {'level1': '2'}}
```
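This looks like the usual shallow-copy behaviour on the attrs dict: a shallow copy shares any nested dict objects, so mutating them through either copy affects both (I haven't traced whether that is exactly what happens internally). A plain-Python illustration, independent of xarray:
```python
# Plain-Python illustration of shallow vs. deep copies of a nested dict (not xarray code)
import copy

attrs = {'flat': '0', 'nested': {'level1': '1'}}
shallow = copy.copy(attrs)         # same as dict(attrs) or attrs.copy()
deep = copy.deepcopy(attrs)

shallow['nested']['level1'] = '2'  # mutates the nested dict shared with attrs
print(attrs['nested'])             # {'level1': '2'} - original affected
print(deep['nested'])              # {'level1': '1'} - deep copy unaffected
```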
As a workaround, explicitly deep-copying the attrs works:
```python
import copy

data2 = data.copy(deep=True)
data2.attrs = copy.deepcopy(data.attrs)
```
This gives the correct results after modification:
```
{'flat': '0', 'nested': {'level1': '1'}}
{'flat': '2', 'nested': {'level1': '2'}}
```
EDIT 26th Sept: retested in 2022.6.0 and found it was, in fact, failing there too. Updated comment to reflect this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,423742774
https://github.com/pydata/xarray/issues/2835#issuecomment-1258611190,https://api.github.com/repos/pydata/xarray/issues/2835,1258611190,IC_kwDOAMm_X85LBOH2,4447466,2022-09-26T20:43:01Z,2022-09-26T20:43:01Z,CONTRIBUTOR,"> Ok, I thought that copying attrs was fixed. Seems like it did not...
Sorry for the mix-up there - think I initially tested in 2022.6 with the extra `data2.attrs = copy.deepcopy(data.attrs)` implemented. Caught it with the new test routine 😄 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,423742774
https://github.com/pydata/xarray/issues/2835#issuecomment-1258538050,https://api.github.com/repos/pydata/xarray/issues/2835,1258538050,IC_kwDOAMm_X85LA8RC,4447466,2022-09-26T19:48:39Z,2022-09-26T19:52:14Z,CONTRIBUTOR,"OK, new test now pushed as #7086. (Hopefully added in the right place and style!)
A couple of additional notes:
- Revision to my comment above: this actually fails in 2022.3 *and* 2022.6 for nested attrs.
- I took a look at the source code in `dataarray.py`, but couldn't see an obvious way to fix this - I don't fully understand the attrs copying process in general.
- I tested the equivalent case for Dataset attrs too (see below), and this seems fine, as per your previous comments above. So I think https://github.com/pydata/xarray/pull/2839 (which includes a dataset-level test) still applies to `ds.attrs`; however, the issue *does* still affect the individual arrays within the dataset (as expected).
```python
import xarray as xr
ds = xr.Dataset({""a"": ([""x""], [1, 2, 3])}, attrs={""t"": 1, ""nested"":{""t2"": 1}})
ds.a.attrs = {""t"": 'a1', ""nested"":{""t2"": 'a1'}}
ds2 = ds.copy(deep=True)
ds.attrs[""t""] = 5
ds.attrs[""nested""][""t2""] = 10
ds2.a.attrs[""t""] = 'a2'
ds2.a.attrs[""nested""][""t2""] = 'a2'
print(ds.attrs)
print(ds.a.attrs)
print(ds2.attrs)
print(ds2.a.attrs)
```
Results in:
```
{'t': 5, 'nested': {'t2': 10}}
{'t': 'a1', 'nested': {'t2': 'a2'}}
{'t': 1, 'nested': {'t2': 1}}
{'t': 'a2', 'nested': {'t2': 'a2'}}
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,423742774
https://github.com/pydata/xarray/issues/2835#issuecomment-1258231700,https://api.github.com/repos/pydata/xarray/issues/2835,1258231700,IC_kwDOAMm_X85K_xeU,4447466,2022-09-26T15:38:18Z,2022-09-26T15:38:18Z,CONTRIBUTOR,"Absolutely @headtr1ck, glad that it was useful - I'm a bit green re: tests and PRs to large projects, but will make a stab at it. I'm just consulting the Contributing guide now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,423742774
https://github.com/pydata/xarray/issues/4073#issuecomment-1163292454,https://api.github.com/repos/pydata/xarray/issues/4073,1163292454,IC_kwDOAMm_X85FVm8m,4447466,2022-06-22T15:51:51Z,2022-06-22T16:12:34Z,CONTRIBUTOR,"I also ran into this when trying to serialize to `dict` for general file writing routines (esp. for HDF5 writing with [h5py](https://docs.h5py.org/en/stable/index.html)), but the issue was in my non-dimensional coordinates! I thought I was being careful by already using `array.unwrap()` in my IO routine, but it also required `.reset_index()` or `.drop()` for non-dimensional coordinates.
Some notes below in case they are useful for anyone else trying to do this. Also - this is all quite ugly, and I may have missed some existing core functionality, so I'll be very happy to hear if there is a better way to handle this.
---
Following the above, a minimal example:
```python
import pandas as pd
import xarray as xr
idx = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=('one', 'two'))
array = xr.DataArray([0, 1], dims='idx', coords={'idx': idx})
# Stacked multidim coords > dict > recreate array - Fails
xr.DataArray.from_dict(array.to_dict())
# Unstack multidim coords > dict > recreate array - OK
xr.DataArray.from_dict(array.unstack().to_dict())
# Set non-dimensional coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A','B']) # Add non-dim coord
array2 = array2.swap_dims({'idx':'Labels'}) # Swap dims
# Non-dim coord case - also need to reset and drop non-dim coords
# This will fail
array2_dict = array2.unstack().reset_index('idx').to_dict()
xr.DataArray.from_dict(array2_dict)
# This is OK
array2_dict = array2.unstack().reset_index('idx', drop=True).to_dict()
xr.DataArray.from_dict(array2_dict)
# This is also OK
array2_dict = array2.unstack().drop('idx').to_dict()
xr.DataArray.from_dict(array2_dict)
```
In all of the working cases above, the reconstructed array comes back flat (unstacked) and missing its non-dim coords. My workaround so far is to pull the various mappings out manually, dump everything to `.attrs`, and then rebuild from those if required, e.g.
```python
def mapDims(data):
    # Get dims from Xarray
    dims = data.dims  # Set dim list - this excludes stacked dims
    dimsUS = data.unstack().dims  # Set unstacked (full) dim list

    # List stacked dims and map
    # Could also do this by type checking vs. 'pandas.core.indexes.multi.MultiIndex'?
    stackedDims = list(set(dims) - set(dimsUS))
    stackedDimsMap = {k: list(data.indexes[k].names) for k in stackedDims}

    # Get non-dimensional coords
    # These may be stacked, are not listed in self.dims, and are not addressed by .unstack()
    idxKeys = list(data.indexes.keys())
    coordsKeys = list(data.coords.keys())
    nonDimCoords = list(set(coordsKeys) - set(idxKeys))
    # nonDimCoords = list(set(dims) - set(idxKeys))

    # Get non-dim indexes
    # nddimIndexes = {k: data.coords[k].to_index() for k, v in data.coords.items() if k in nonDimCoords}  # Note this returns Pandas Indexes, so may fail on file IO.
    nddimMap = {k: list(data.coords[k].to_index().names) for k, v in data.coords.items() if k in nonDimCoords}

    # Get dict maps - to_dict per non-dim coord
    # nddimDicts = {k: data.coords[k].reset_index(k).to_dict() for k, v in data.coords.items() if k in nonDimCoords}
    # Use Pandas - this allows direct dump of PD multiindex to dicts
    nddimDicts = {k: data.coords[k].to_index().to_frame().to_dict() for k, v in data.coords.items() if k in nonDimCoords}

    # Get coords correlated to non-dim coords, need these to recreate original links & stacking (?)
    nddimDims = {k: data.coords[k].dims for k, v in data.coords.items() if k in nonDimCoords}

    return {k: v for k, v in locals().items() if k != 'data'}


def deconstructDims(data):
    xrDecon = data.copy()
    # Map dims
    xrDecon.attrs['dimMaps'] = mapDims(data)
    # Unstack all coords
    xrDecon = xrDecon.unstack()
    # Remove non-dim coords
    for nddim in xrDecon.attrs['dimMaps']['nonDimCoords']:
        xrDecon = xrDecon.drop(nddim)
    return xrDecon


def reconstructDims(data):
    xrRecon = data.copy()
    # Restack coords (each stacked dim from its original level names)
    for stacked in xrRecon.attrs['dimMaps']['stackedDims']:
        xrRecon = xrRecon.stack({stacked: xrRecon.attrs['dimMaps']['stackedDimsMap'][stacked]})
    # General non-dim coord rebuild
    for nddim in xrRecon.attrs['dimMaps']['nonDimCoords']:
        # Add nddim back into main XR array
        xrRecon.coords[nddim] = (xrRecon.attrs['dimMaps']['nddimDims'][nddim], pd.MultiIndex.from_frame(pd.DataFrame.from_dict(xrRecon.attrs['dimMaps']['nddimDicts'][nddim])))  # OK
    return xrRecon
```
The dict round-trip is then OK, and the dictionary can also be pushed to standard file types (it contains only Python native types + numpy arrays).
```python
# IO with funcs
# With additional tuple coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A','B']) # Add non-dim coord
array2 = array2.swap_dims({'idx':'Labels'}) # Swap dims
# Decon to dict
safeDict = deconstructDims(array2).to_dict()
# Rebuild
xrFromDict = reconstructDims(xr.DataArray.from_dict(safeDict))
# Same as array2 (aside from added attrs)
array2.attrs = xrFromDict.attrs
array2.identical(xrFromDict) # True
```
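To close the loop on the HDF5 use-case mentioned at the top: below is a rough, write-only sketch of how the `safeDict` from above might be pushed into an HDF5 file with h5py. The `dictToHDF5` helper and the filename are purely illustrative (not an xarray or h5py API): nested dicts become groups, strings and None become attributes, and everything else becomes a dataset. Tuple dict keys just get stringified, so reading this back losslessly would need more work.
```python
# Rough write-only sketch (illustrative helper, untested against all edge cases)
import h5py
import numpy as np

def dictToHDF5(d, group):
    # Recursively write a nested dict to an h5py group
    for key, value in d.items():
        name = str(key)  # HDF5 names must be strings (tuple keys are stringified - lossy)
        if isinstance(value, dict):
            dictToHDF5(value, group.create_group(name))
        elif isinstance(value, str):
            group.attrs[name] = value
        elif value is None:
            group.attrs[name] = 'None'  # placeholder - HDF5 has no null type
        else:
            arr = np.asarray(value)
            if arr.dtype.kind == 'U':
                arr = arr.astype('S')  # HDF5 has no native fixed-width unicode type
            group.create_dataset(name, data=arr)

with h5py.File('array2_dict.h5', 'w') as f:
    dictToHDF5(safeDict, f)
```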
Again, there is likely some cleaner/more obvious approach I'm missing, but I'm not very familiar with xarray or pandas internals - this is just where I ended up when trying to convert to HDF5-compatible data structures in a semi-general way.
(As a side-note, I ran into similar issues with `xr.DataArray.to_netcdf()` and multi-index coords, or at least did last time I tried it - but I didn't look into this further since I prefer using h5py for other reasons.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620134014