issue_comments

4 rows where issue = 620134014 sorted by updated_at descending

Issue: Error in DataArray.from_dict(data_array.to_dict()) when using pd.MultiIndex (620134014)

Comment by phockett (CONTRIBUTOR) · created 2022-06-22T15:51:51Z, updated 2022-06-22T16:12:34Z · https://github.com/pydata/xarray/issues/4073#issuecomment-1163292454

I also ran into this when trying to serialize to dict for general file-writing routines (esp. for HDF5 writing with h5py), but the issue was in my non-dimensional coordinates! I thought I was being careful by already using array.unstack() in my IO routine, but I also needed .reset_index() or .drop() for the non-dimensional coordinates.

Some notes below in case they're useful for anyone else trying to do this. Also, this is all quite ugly, and I may have missed some existing core functionality, so I'd be very happy to hear if there is a better way to handle this.


Following the above, a minimal example:

```python
import pandas as pd
import xarray as xr

idx = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=('one', 'two'))
array = xr.DataArray([0, 1], dims='idx', coords={'idx': idx})

# Stacked multidim coords > dict > recreate array - Fails
xr.DataArray.from_dict(array.to_dict())

# Unstack multidim coords > dict > recreate array - OK
xr.DataArray.from_dict(array.unstack().to_dict())

# Set non-dimensional coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A', 'B'])        # Add non-dim coord
array2 = array2.swap_dims({'idx': 'Labels'})  # Swap dims

# Non-dim coord case - also need to reset and drop non-dim coords

# This will fail
array2_dict = array2.unstack().reset_index('idx').to_dict()
xr.DataArray.from_dict(array2_dict)

# This is OK
array2_dict = array2.unstack().reset_index('idx', drop=True).to_dict()
xr.DataArray.from_dict(array2_dict)

# This is also OK
array2_dict = array2.unstack().drop('idx').to_dict()
xr.DataArray.from_dict(array2_dict)
```

In all cases the reconstructed array is flat and missing the non-dim coords. My work-around so far is to pull the various mappings manually, dump everything to .attrs, and then rebuild from those if required, e.g.

```python
def mapDims(data):
    # Get dims from Xarray
    dims = data.dims                # Set dim list - this excludes stacked dims
    dimsUS = data.unstack().dims    # Set unstacked (full) dim list

    # List stacked dims and map
    # Could also do this by type checking vs. 'pandas.core.indexes.multi.MultiIndex'?
    stackedDims = list(set(dims) - set(dimsUS))
    stackedDimsMap = {k: list(data.indexes[k].names) for k in stackedDims}

    # Get non-dimensional coords
    # These may be stacked, are not listed in self.dims, and are not addressed by .unstack()
    idxKeys = list(data.indexes.keys())
    coordsKeys = list(data.coords.keys())
    nonDimCoords = list(set(coordsKeys) - set(idxKeys))
    # nonDimCoords = list(set(dims) - set(idxKeys))

    # Get non-dim indexes
    # nddimIndexes = {k: data.coords[k].to_index() for k, v in data.coords.items() if k in nonDimCoords}  # Note this returns Pandas Indexes, so may fail on file IO.
    nddimMap = {k: list(data.coords[k].to_index().names) for k, v in data.coords.items() if k in nonDimCoords}

    # Get dict maps - to_dict per non-dim coord
    # nddimDicts = {k: data.coords[k].reset_index(k).to_dict() for k, v in data.coords.items() if k in nonDimCoords}
    # Use Pandas - this allows direct dump of PD multiindex to dicts
    nddimDicts = {k: data.coords[k].to_index().to_frame().to_dict() for k, v in data.coords.items() if k in nonDimCoords}
    # Get coords correlated to non-dim coords, need these to recreate original links & stacking (?)
    nddimDims = {k: data.coords[k].dims for k, v in data.coords.items() if k in nonDimCoords}

    return {k: v for k, v in locals().items() if k != 'data'}


def deconstructDims(data):
    xrDecon = data.copy()

    # Map dims
    xrDecon.attrs['dimMaps'] = mapDims(data)

    # Unstack all coords
    xrDecon = xrDecon.unstack()

    # Remove non-dim coords
    for nddim in xrDecon.attrs['dimMaps']['nonDimCoords']:
        xrDecon = xrDecon.drop(nddim)

    return xrDecon


def reconstructDims(data):
    xrRecon = data.copy()

    # Restack coords (use the per-dim level map, not the full stacked-dim list)
    for stacked in xrRecon.attrs['dimMaps']['stackedDims']:
        xrRecon = xrRecon.stack({stacked: xrRecon.attrs['dimMaps']['stackedDimsMap'][stacked]})

    # General non-dim coord rebuild
    for nddim in xrRecon.attrs['dimMaps']['nonDimCoords']:
        # Add nddim back into main XR array
        xrRecon.coords[nddim] = (
            xrRecon.attrs['dimMaps']['nddimDims'][nddim],
            pd.MultiIndex.from_frame(pd.DataFrame.from_dict(xrRecon.attrs['dimMaps']['nddimDicts'][nddim])),
        )  # OK

    return xrRecon
```

The dict round-trip is then OK, and the dictionary can also be pushed to standard file types (it contains only Python native types plus NumPy arrays).

```python
# IO with funcs

# With additional tuple coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A', 'B'])        # Add non-dim coord
array2 = array2.swap_dims({'idx': 'Labels'})  # Swap dims

# Decon to dict
safeDict = deconstructDims(array2).to_dict()

# Rebuild
xrFromDict = reconstructDims(xr.DataArray.from_dict(safeDict))

# Same as array2 (aside from added attrs)
array2.attrs = xrFromDict.attrs
array2.identical(xrFromDict)  # True
```
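
As an illustration of the "pushed to standard file types" step, here is a minimal sketch (an editorial addition, not from the original comment) of dumping `safeDict` to HDF5 with h5py. The recursive `dictToHDF5` helper and the output filename are assumptions, and the type handling is only illustrative.

```python
import h5py
import numpy as np

def dictToHDF5(d, group):
    # Hypothetical helper: mirror a nested dict as HDF5 groups/datasets.
    for key, val in d.items():
        if isinstance(val, dict):
            dictToHDF5(val, group.create_group(str(key)))
        elif val is None:
            group.attrs[str(key)] = 'None'   # HDF5 has no null type; store a marker
        else:
            arr = np.asarray(val)
            if arr.dtype.kind == 'U':        # h5py can't store numpy unicode arrays
                arr = arr.astype(bytes)      # fall back to fixed-width byte strings
            group.create_dataset(str(key), data=arr)

# Assumed output filename
with h5py.File('array2_safe.h5', 'w') as f:
    dictToHDF5(safeDict, f)
```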

Again, there is likely some cleaner/more obvious approach I'm missing, but I'm not very familiar with Xarray or Pandas internals - this is just where I ended up when trying to convert to HDF5-compatible data structures in a semi-general way.

(As a side note, I ran into similar issues with xr.DataArray.to_netcdf() and multi-index coords, or at least did the last time I tried it - but I didn't look into this further since I prefer using h5py for other reasons.)
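
For that to_netcdf() side note, one possible workaround (an assumed sketch continuing the minimal example above, not something tested in this thread) is to demote the MultiIndex to plain coordinates before writing and rebuild it after reading:

```python
# Assumed sketch: drop the MultiIndex down to plain coords before writing to netCDF,
# then rebuild it with set_index after reading the file back.
array.reset_index('idx').to_netcdf('array.nc')

restored = xr.open_dataarray('array.nc').set_index(idx=['one', 'two'])
restored.identical(array)  # expected True, assuming the coords survive the round trip
```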

Comment by max-sixty (MEMBER) · created 2022-04-19T05:48:31Z · https://github.com/pydata/xarray/issues/4073#issuecomment-1102116177

This is indeed not ideal. I'm not sure we have great round-trip support for dict, but I'd hope it's possible to make this work.

I'll mark as a bug. Any PRs working towards this would be greatly appreciated.

(and apologies your excellent issue didn't get picked up, @genric)

Comment by stale[bot] (NONE) · created 2022-04-19T05:43:50Z · https://github.com/pydata/xarray/issues/4073#issuecomment-1102108752

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

Comment by genric (NONE) · created 2020-06-10T07:21:56Z · https://github.com/pydata/xarray/issues/4073#issuecomment-641786386

Didn't know it was such a fundamental problem. Managed to get around it with set/reset index.
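
For reference, a sketch of what that set/reset index workaround might look like on the minimal example from this thread (assumed usage on my part, not taken from the comment):

```python
import pandas as pd
import xarray as xr

idx = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=('one', 'two'))
array = xr.DataArray([0, 1], dims='idx', coords={'idx': idx})

# Demote the MultiIndex to plain coords, round-trip through dict, then rebuild it.
flat_dict = array.reset_index('idx').to_dict()
restored = xr.DataArray.from_dict(flat_dict).set_index(idx=['one', 'two'])
restored.identical(array)  # expected True on versions where from_dict keeps the plain coords
```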
