github: issue_comments: 1 row where issue = 620134014 and user = 4447466 sorted by updated

1 row where issue = 620134014 and user = 4447466 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	performed_via_github_app	issue
1163292454	https://github.com/pydata/xarray/issues/4073#issuecomment-1163292454	https://api.github.com/repos/pydata/xarray/issues/4073	IC_kwDOAMm_X85FVm8m	phockett 4447466	2022-06-22T15:51:51Z	2022-06-22T16:12:34Z	CONTRIBUTOR	I also ran into this when trying to serialize to `dict` for general file-writing routines (esp. for HDF5 writing with h5py), but the issue was in my non-dimensional coordinates! I thought I was being careful by already using `array.unstack()` in my IO routine, but I also needed `.reset_index()` or `.drop()` for non-dimensional coordinates. Some notes below in case they are useful for anyone else trying to do this. Also, this is all quite ugly, and I may have missed some existing core functionality, so I'll be very happy to hear if there is a better way to handle this.

Following the above, a minimal example:

```python
import pandas as pd
import xarray as xr

idx = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=('one', 'two'))
array = xr.DataArray([0, 1], dims='idx', coords={'idx': idx})

# Stacked multidim coords > dict > recreate array - Fails
xr.DataArray.from_dict(array.to_dict())

# Unstack multidim coords > dict > recreate array - OK
xr.DataArray.from_dict(array.unstack().to_dict())

# Set non-dimensional coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A','B'])  # Add non-dim coord
array2 = array2.swap_dims({'idx':'Labels'})  # Swap dims

# Non-dim coord case - also need to reset and drop non-dim coords
# This will fail
array2_dict = array2.unstack().reset_index('idx').to_dict()
xr.DataArray.from_dict(array2_dict)

# This is OK
array2_dict = array2.unstack().reset_index('idx', drop=True).to_dict()
xr.DataArray.from_dict(array2_dict)

# This is also OK
array2_dict = array2.unstack().drop('idx').to_dict()
xr.DataArray.from_dict(array2_dict)
```

In all cases the reconstructed array is flat, and missing non-dim coords.

My workaround for this so far is to pull various mappings manually, dump everything to `.attrs`, and then rebuild from those if required, e.g.

```python
def mapDims(data):
    # Get dims from Xarray
    dims = data.dims  # Set dim list - this excludes stacked dims
    dimsUS = data.unstack().dims  # Set unstacked (full) dim list

    # List stacked dims and map
    # Could also do this by type checking vs. 'pandas.core.indexes.multi.MultiIndex'?
    stackedDims = list(set(dims) - set(dimsUS))
    stackedDimsMap = {k: list(data.indexes[k].names) for k in stackedDims}

    # Get non-dimensional coords
    # These may be stacked, are not listed in self.dims, and are not addressed by .unstack()
    idxKeys = list(data.indexes.keys())
    coordsKeys = list(data.coords.keys())
    nonDimCoords = list(set(coordsKeys) - set(idxKeys))
    # nonDimCoords = list(set(dims) - set(idxKeys))

    # Get non-dim indexes
    # nddimIndexes = {k:data.coords[k].to_index() for k,v in data.coords.items() if k in nonDimCoords}
    # Note this returns Pandas Indexes, so may fail on file IO.
    nddimMap = {k:list(data.coords[k].to_index().names) for k,v in data.coords.items() if k in nonDimCoords}

    # Get dict maps - to_dict per non-dim coord
    # nddimDicts = {k:data.coords[k].reset_index(k).to_dict() for k,v in data.coords.items() if k in nonDimCoords}
    # Use Pandas - this allows direct dump of PD multiindex to dicts
    nddimDicts = {k:data.coords[k].to_index().to_frame().to_dict() for k,v in data.coords.items() if k in nonDimCoords}

    # Get coords correlated to non-dim coords, need these to recreate original links & stacking (?)
    nddimDims = {k:data.coords[k].dims for k,v in data.coords.items() if k in nonDimCoords}

    # Return every local mapping except the input array itself
    return {k:v for k,v in locals().items() if k !='data'}


def deconstructDims(data):
    xrDecon = data.copy()

    # Map dims
    xrDecon.attrs['dimMaps'] = mapDims(data)

    # Unstack all coords
    xrDecon = xrDecon.unstack()

    # Remove non-dim coords
    for nddim in xrDecon.attrs['dimMaps']['nonDimCoords']:
        xrDecon = xrDecon.drop(nddim)

    return xrDecon


def reconstructDims(data):
    xrRecon = data.copy()

    # Restack coords, using the level names recorded for each stacked dim
    for stacked in xrRecon.attrs['dimMaps']['stackedDims']:
        xrRecon = xrRecon.stack({stacked:xrRecon.attrs['dimMaps']['stackedDimsMap'][stacked]})

    # General non-dim coord rebuild
    for nddim in xrRecon.attrs['dimMaps']['nonDimCoords']:
        # Add nddim back into main XR array
        xrRecon.coords[nddim] = (xrRecon.attrs['dimMaps']['nddimDims'][nddim],
                                 pd.MultiIndex.from_frame(pd.DataFrame.from_dict(xrRecon.attrs['dimMaps']['nddimDicts'][nddim])))  # OK

    return xrRecon
```

The dict round-trip is then OK, and the dictionary can also be pushed to standard file types (it contains only Python native types and numpy arrays).

```python
# IO with funcs
# With additional tuple coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A','B'])  # Add non-dim coord
array2 = array2.swap_dims({'idx':'Labels'})  # Swap dims

# Decon to dict
safeDict = deconstructDims(array2).to_dict()

# Rebuild
xrFromDict = reconstructDims(xr.DataArray.from_dict(safeDict))

# Same as array2 (aside from added attrs)
array2.attrs = xrFromDict.attrs
array2.identical(xrFromDict)  # True
```

Again, there is likely some cleaner/more obvious thing I'm missing, but I'm not very familiar with Xarray or Pandas internals - this is just where I ended up when trying to convert to HDF5-compatible data structures in a semi-general way.

(As a side note, I ran into similar issues with `xr.DataArray.to_netcdf()` and multi-index coords, or at least did last time I tried it - but I didn't look into this further since I prefer using h5py for other reasons.)	
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		Error in DataArray.from_dict(data_array.to_dict()) when using pd.MultiIndex 620134014
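
The comment above wonders whether stacked dims could be found by type-checking against `pandas.MultiIndex` instead of comparing `dims` before and after `.unstack()`. A minimal sketch of that idea follows. The helper name `stacked_dims_map` and the small test array are hypothetical, not part of the original comment, and the `name in data.dims` filter is an assumption added so that MultiIndex level names are not reported as dimensions on xarray versions that expose them in `.indexes`.

```python
import pandas as pd
import xarray as xr


def stacked_dims_map(data):
    """Map each MultiIndex-backed dimension to its list of level names.

    Hypothetical helper: sketches the isinstance-based alternative
    to the set difference used in mapDims.
    """
    return {
        name: list(index.names)
        for name, index in data.indexes.items()
        # Keep only real dimensions whose index is a pandas MultiIndex
        if name in data.dims and isinstance(index, pd.MultiIndex)
    }


# Build a stacked array via .stack() rather than passing a MultiIndex directly
plain = xr.DataArray(
    [[0, 1], [2, 3]],
    dims=("one", "two"),
    coords={"one": [1, 2], "two": [3, 4]},
)
stacked = plain.stack(idx=("one", "two"))

print(stacked_dims_map(stacked))
print(stacked_dims_map(plain))
```

This only covers stacked dimensions; non-dimensional coordinates that carry a MultiIndex (the `array2` case in the comment) would still need separate handling.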

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
