html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2835#issuecomment-1256553736,https://api.github.com/repos/pydata/xarray/issues/2835,1256553736,IC_kwDOAMm_X85K5X0I,4447466,2022-09-23T18:47:58Z,2022-09-27T15:00:51Z,CONTRIBUTOR,"Just for the record, I just ran into this for the specific case of *nested* dictionary attrs in DataArray.attrs. It's definitely an issue in 2022.3.0 and 2022.6.0. Here's a minimal test example in case anyone else runs into this too...

```python
# MINIMAL EXAMPLE
import numpy as np
import xarray as xr

data = xr.DataArray(np.random.randn(2, 3), dims=(""x"", ""y""), coords={""x"": [10, 20]})
data.attrs['flat'] = '0'
data.attrs['nested'] = {'level1': '1'}

data2 = data.copy(deep=True)
data2.attrs['flat'] = '2'  # OK
# data2.attrs['nested'] = {'level1': '2'}  # OK
# data2.attrs['nested']['level1'] = '2'  # Fails - overwrites data
data2.attrs['nested'].update({'level1': '2'})  # Fails - overwrites data

print(data.attrs)
print(data2.attrs)
```

In xarray 2022.3.0 and 2022.6.0 this gives the (incorrect) output:

```
{'flat': '0', 'nested': {'level1': '2'}}
{'flat': '2', 'nested': {'level1': '2'}}
```

As a workaround, an explicit deep copy of the attrs is safe:

```python
import copy

data2 = data.copy(deep=True)
data2.attrs = copy.deepcopy(data.attrs)
```

With correct results after modification:

```
{'flat': '0', 'nested': {'level1': '1'}}
{'flat': '2', 'nested': {'level1': '2'}}
```

EDIT 26th Sept: retested in 2022.6.0 and found it was, in fact, failing there too. Updated comment to reflect this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,423742774
https://github.com/pydata/xarray/issues/2835#issuecomment-1258611190,https://api.github.com/repos/pydata/xarray/issues/2835,1258611190,IC_kwDOAMm_X85LBOH2,4447466,2022-09-26T20:43:01Z,2022-09-26T20:43:01Z,CONTRIBUTOR,"> Ok, I thought that copying attrs was fixed. Seems like it did not...

Sorry for the mix-up there - I think I initially tested in 2022.6 with the extra `data2.attrs = copy.deepcopy(data.attrs)` implemented. Caught it with the new test routine 😄","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,423742774
https://github.com/pydata/xarray/issues/2835#issuecomment-1258538050,https://api.github.com/repos/pydata/xarray/issues/2835,1258538050,IC_kwDOAMm_X85LA8RC,4447466,2022-09-26T19:48:39Z,2022-09-26T19:52:14Z,CONTRIBUTOR,"OK, new test now pushed as #7086. (Hopefully added in the right place and style!)

A couple of additional notes:

- Revision to my comment above: this actually fails in 2022.3 *and* 2022.6 for nested attrs.
- I took a look at the source code in `dataarray.py`, but couldn't see an obvious way to fix this and/or didn't understand the attrs copying process generally.
- I tested the equivalent case for `Dataset` attrs too (see below), and this seems fine as per your previous comments above, so I think https://github.com/pydata/xarray/pull/2839 (which includes a ds-level test) still applies to `ds.attrs`; however, the issue *does* still affect the individual arrays within the dataset (as expected).
```python
import xarray as xr

ds = xr.Dataset({""a"": ([""x""], [1, 2, 3])}, attrs={""t"": 1, ""nested"": {""t2"": 1}})
ds.a.attrs = {""t"": 'a1', ""nested"": {""t2"": 'a1'}}

ds2 = ds.copy(deep=True)

ds.attrs[""t""] = 5
ds.attrs[""nested""][""t2""] = 10
ds2.a.attrs[""t""] = 'a2'
ds2.a.attrs[""nested""][""t2""] = 'a2'

print(ds.attrs)
print(ds.a.attrs)
print(ds2.attrs)
print(ds2.a.attrs)
```

Results in:

```
{'t': 5, 'nested': {'t2': 10}}
{'t': 'a1', 'nested': {'t2': 'a2'}}
{'t': 1, 'nested': {'t2': 1}}
{'t': 'a2', 'nested': {'t2': 'a2'}}
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,423742774
https://github.com/pydata/xarray/issues/2835#issuecomment-1258231700,https://api.github.com/repos/pydata/xarray/issues/2835,1258231700,IC_kwDOAMm_X85K_xeU,4447466,2022-09-26T15:38:18Z,2022-09-26T15:38:18Z,CONTRIBUTOR,"Absolutely @headtr1ck, glad that it was useful - I'm a bit green re: tests and PRs to large projects, but will take a stab at it. I'm just consulting the Contributing guide now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,423742774
https://github.com/pydata/xarray/issues/4073#issuecomment-1163292454,https://api.github.com/repos/pydata/xarray/issues/4073,1163292454,IC_kwDOAMm_X85FVm8m,4447466,2022-06-22T15:51:51Z,2022-06-22T16:12:34Z,CONTRIBUTOR,"I also ran into this when trying to serialize to `dict` for general file writing routines (esp. for HDF5 writing with [h5py](https://docs.h5py.org/en/stable/index.html)), but the issue was with my non-dimensional coordinates! I thought I was being careful by already using `array.unstack()` in my IO routine, but non-dimensional coordinates also required `.reset_index()` or `.drop()`. Some notes below in case they are useful for anyone else trying to do this.

Also - this is all quite ugly, and I may have missed some existing core functionality, so I'll be very happy to hear if there is a better way to handle this.

---

Following the above, a minimal example:

```python
import pandas as pd
import xarray as xr

idx = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=('one', 'two'))
array = xr.DataArray([0, 1], dims='idx', coords={'idx': idx})

# Stacked multidim coords > dict > recreate array - Fails
xr.DataArray.from_dict(array.to_dict())

# Unstack multidim coords > dict > recreate array - OK
xr.DataArray.from_dict(array.unstack().to_dict())

# Set non-dimensional coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A', 'B'])        # Add non-dim coord
array2 = array2.swap_dims({'idx': 'Labels'})  # Swap dims

# Non-dim coord case - also need to reset and drop non-dim coords
# This will fail
array2_dict = array2.unstack().reset_index('idx').to_dict()
xr.DataArray.from_dict(array2_dict)

# This is OK
array2_dict = array2.unstack().reset_index('idx', drop=True).to_dict()
xr.DataArray.from_dict(array2_dict)

# This is also OK
array2_dict = array2.unstack().drop('idx').to_dict()
xr.DataArray.from_dict(array2_dict)
```

In all cases the reconstructed array is flat and missing the non-dim coords.

My workaround so far is to pull the various mappings out manually, dump everything to `.attrs`, and then rebuild from those if required, e.g.

```python
def mapDims(data):
    # Get dims from Xarray
    dims = data.dims                  # Dim list - this excludes stacked dims
    dimsUS = data.unstack().dims      # Unstacked (full) dim list

    # List stacked dims and map
    # Could also do this by type checking vs. 'pandas.core.indexes.multi.MultiIndex'?
    stackedDims = list(set(dims) - set(dimsUS))
    stackedDimsMap = {k: list(data.indexes[k].names) for k in stackedDims}

    # Get non-dimensional coords
    # These may be stacked, are not listed in self.dims, and are not addressed by .unstack()
    idxKeys = list(data.indexes.keys())
    coordsKeys = list(data.coords.keys())
    nonDimCoords = list(set(coordsKeys) - set(idxKeys))
    # nonDimCoords = list(set(dims) - set(idxKeys))

    # Get non-dim indexes
    # nddimIndexes = {k: data.coords[k].to_index() for k, v in data.coords.items() if k in nonDimCoords}
    # Note this returns Pandas Indexes, so may fail on file IO.
    nddimMap = {k: list(data.coords[k].to_index().names) for k, v in data.coords.items() if k in nonDimCoords}

    # Get dict maps - to_dict per non-dim coord
    # nddimDicts = {k: data.coords[k].reset_index(k).to_dict() for k, v in data.coords.items() if k in nonDimCoords}
    # Use Pandas - this allows direct dump of a PD MultiIndex to dicts
    nddimDicts = {k: data.coords[k].to_index().to_frame().to_dict() for k, v in data.coords.items() if k in nonDimCoords}

    # Get dims correlated to non-dim coords, need these to recreate original links & stacking (?)
    nddimDims = {k: data.coords[k].dims for k, v in data.coords.items() if k in nonDimCoords}

    return {k: v for k, v in locals().items() if k != 'data'}


def deconstructDims(data):
    xrDecon = data.copy()

    # Map dims
    xrDecon.attrs['dimMaps'] = mapDims(data)

    # Unstack all coords
    xrDecon = xrDecon.unstack()

    # Remove non-dim coords
    for nddim in xrDecon.attrs['dimMaps']['nonDimCoords']:
        xrDecon = xrDecon.drop(nddim)

    return xrDecon


def reconstructDims(data):
    xrRecon = data.copy()

    # Restack coords, using the stacked-dim -> component-dims map
    for stacked in xrRecon.attrs['dimMaps']['stackedDims']:
        xrRecon = xrRecon.stack({stacked: xrRecon.attrs['dimMaps']['stackedDimsMap'][stacked]})

    # General non-dim coord rebuild
    for nddim in xrRecon.attrs['dimMaps']['nonDimCoords']:
        # Add nddim back into the main XR array
        xrRecon.coords[nddim] = (xrRecon.attrs['dimMaps']['nddimDims'][nddim],
                                 pd.MultiIndex.from_frame(pd.DataFrame.from_dict(xrRecon.attrs['dimMaps']['nddimDicts'][nddim])))  # OK

    return xrRecon
```

Dict round-trip is then OK, and the dictionary can also be pushed to standard file types (it contains only Python native types + numpy arrays).

```python
# IO with funcs
# With additional tuple coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A', 'B'])        # Add non-dim coord
array2 = array2.swap_dims({'idx': 'Labels'})  # Swap dims

# Decon to dict
safeDict = deconstructDims(array2).to_dict()

# Rebuild
xrFromDict = reconstructDims(xr.DataArray.from_dict(safeDict))

# Same as array2 (aside from added attrs)
array2.attrs = xrFromDict.attrs
array2.identical(xrFromDict)  # True
```

Again, there is likely some cleaner/more obvious thing I'm missing here, but I'm not very familiar with Xarray or Pandas internals - this is just where I ended up when trying to convert to HDF5-compatible data structures in a semi-general way.

(As a side note, I ran into similar issues with `xr.DataArray.to_netcdf()` and multi-index coords, or at least did last time I tried it - but I didn't look into this further since I prefer using h5py for other reasons.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,620134014
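For completeness, below is a rough sketch (not from the original comments) of the kind of h5py writer the dict round-trip above is aimed at. It assumes the dict - e.g. the `safeDict` produced by `deconstructDims(...).to_dict()` - contains only dicts, sequences, strings and numbers; the helper name `dict_to_hdf5` is made up for illustration and is not part of the h5py API.

```python
# Hypothetical sketch: recursively write a nested dict of Python natives /
# numpy arrays to an HDF5 file with h5py. Assumptions: dicts map to groups,
# sequences/arrays map to datasets, scalars and strings map to attributes.
import h5py
import numpy as np


def dict_to_hdf5(group, d):
    for key, value in d.items():
        name = str(key)  # HDF5 names must be strings
        if isinstance(value, dict):
            dict_to_hdf5(group.create_group(name), value)
        elif isinstance(value, (list, tuple, np.ndarray)):
            arr = np.asarray(value)
            if arr.dtype.kind in ('U', 'S', 'O'):
                # store string-like data as variable-length UTF-8 strings
                group.create_dataset(name, data=arr.astype(object),
                                     dtype=h5py.string_dtype())
            else:
                group.create_dataset(name, data=arr)
        elif value is None:
            group.attrs[name] = 'None'  # HDF5 attributes cannot hold None
        else:
            group.attrs[name] = value   # numbers, strings, bools


# Usage (assuming safeDict from the round-trip example above):
# with h5py.File('array2.h5', 'w') as f:
#     dict_to_hdf5(f, safeDict)
```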