

issue_comments


5 rows where author_association = "CONTRIBUTOR" and user = 4447466 sorted by updated_at descending




id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1256553736 https://github.com/pydata/xarray/issues/2835#issuecomment-1256553736 https://api.github.com/repos/pydata/xarray/issues/2835 IC_kwDOAMm_X85K5X0I phockett 4447466 2022-09-23T18:47:58Z 2022-09-27T15:00:51Z CONTRIBUTOR

Just for the record, I just ran into this for the specific case of nested dictionary attrs in DataArray.attrs.

It's definitely an issue in 2022.3.0 and 2022.6.0. Here's a minimal test example in case anyone else runs into this too:

```python
# Minimal example
import xarray as xr
import numpy as np

data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
data.attrs['flat'] = '0'
data.attrs['nested'] = {'level1': '1'}

data2 = data.copy(deep=True)
data2.attrs['flat'] = '2'  # OK

data2.attrs['nested'] = {'level1': '2'}  # OK

data2.attrs['nested']['level1'] = '2'  # Fails - overwrites data

data2.attrs['nested'].update({'level1': '2'})  # Fails - overwrites data

print(data.attrs)
print(data2.attrs)
```

Outputs

In xarray 2022.3.0 and 2022.6.0 this gives (incorrect):

```
{'flat': '0', 'nested': {'level1': '2'}}
{'flat': '2', 'nested': {'level1': '2'}}
```

As a work-around, a safe attrs copy with deepcopy works:

```python
import copy

data2 = data.copy(deep=True)
data2.attrs = copy.deepcopy(data.attrs)
```

With correct results after modification:

```
{'flat': '0', 'nested': {'level1': '1'}}
{'flat': '2', 'nested': {'level1': '2'}}
```

EDIT 26th Sept: retested in 2022.6.0 and found it was, in fact, failing there too. Updated comment to reflect this.
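The underlying Python behaviour can be seen without xarray at all: a shallow copy of an attrs-style dict shares any nested dicts, while `copy.deepcopy` duplicates them. A minimal stdlib-only sketch:

```python
import copy

attrs = {'flat': '0', 'nested': {'level1': '1'}}

shallow = dict(attrs)         # shallow copy: the nested dict object is shared
deep = copy.deepcopy(attrs)   # deep copy: the nested dict is duplicated

shallow['nested']['level1'] = '2'   # mutates the shared nested dict
print(attrs['nested'])              # {'level1': '2'} - original affected

deep['nested']['level1'] = '3'
print(attrs['nested'])              # still {'level1': '2'} - original unaffected
```

This is exactly the flat-vs-nested asymmetry in the example above: replacing a top-level key rebinds it in one dict only, while mutating a shared nested dict is visible through both.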

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset.copy(deep=True) does not deepcopy .attrs 423742774
1258611190 https://github.com/pydata/xarray/issues/2835#issuecomment-1258611190 https://api.github.com/repos/pydata/xarray/issues/2835 IC_kwDOAMm_X85LBOH2 phockett 4447466 2022-09-26T20:43:01Z 2022-09-26T20:43:01Z CONTRIBUTOR

Ok, I thought that copying attrs was fixed, but it seems it was not...

Sorry for the mix-up there - I think I initially tested in 2022.6 with the extra `data2.attrs = copy.deepcopy(data.attrs)` work-around in place. Caught it with the new test routine 😄

  Dataset.copy(deep=True) does not deepcopy .attrs 423742774
1258538050 https://github.com/pydata/xarray/issues/2835#issuecomment-1258538050 https://api.github.com/repos/pydata/xarray/issues/2835 IC_kwDOAMm_X85LA8RC phockett 4447466 2022-09-26T19:48:39Z 2022-09-26T19:52:14Z CONTRIBUTOR

OK, new test now pushed as #7086. (Hopefully added in the right place and style!)

A couple of additional notes:

  • Revision to my comment above: this actually fails in both 2022.3 and 2022.6 for nested attrs.
  • I took a look at the source code in dataarray.py, but couldn't see an obvious way to fix this and/or didn't fully understand the attrs copying process.
  • I tested the equivalent case for Dataset attrs too (see below), and this seems fine as per your previous comments above, so I think https://github.com/pydata/xarray/pull/2839 (which includes a ds-level test) still applies to ds.attrs; however, the issue does still affect the individual arrays within the dataset (as expected).

```python
import xarray as xr

ds = xr.Dataset({"a": (["x"], [1, 2, 3])}, attrs={"t": 1, "nested": {"t2": 1}})
ds.a.attrs = {"t": 'a1', "nested": {"t2": 'a1'}}

ds2 = ds.copy(deep=True)
ds.attrs["t"] = 5
ds.attrs["nested"]["t2"] = 10

ds2.a.attrs["t"] = 'a2'
ds2.a.attrs["nested"]["t2"] = 'a2'

print(ds.attrs)
print(ds.a.attrs)
print(ds2.attrs)
print(ds2.a.attrs)
```

Results in:

```
{'t': 5, 'nested': {'t2': 10}}
{'t': 'a1', 'nested': {'t2': 'a2'}}
{'t': 1, 'nested': {'t2': 1}}
{'t': 'a2', 'nested': {'t2': 'a2'}}
```
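For reference, a regression test for the DataArray case could be shaped roughly like this (a hedged sketch only - the name and structure are hypothetical, not the actual test added in #7086):

```python
import numpy as np
import xarray as xr


def test_deepcopy_nested_attrs():
    # Hypothetical regression test: mutating a nested attrs dict on a
    # deep copy must not touch the original DataArray's attrs.
    da = xr.DataArray(np.ones(3), dims="x", attrs={"nested": {"key": "a"}})
    copied = da.copy(deep=True)
    copied.attrs["nested"]["key"] = "b"
    assert da.attrs["nested"]["key"] == "a"
```

On the affected releases (2022.3.0 and 2022.6.0) this assertion fails; on a fixed xarray it passes.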

  Dataset.copy(deep=True) does not deepcopy .attrs 423742774
1258231700 https://github.com/pydata/xarray/issues/2835#issuecomment-1258231700 https://api.github.com/repos/pydata/xarray/issues/2835 IC_kwDOAMm_X85K_xeU phockett 4447466 2022-09-26T15:38:18Z 2022-09-26T15:38:18Z CONTRIBUTOR

Absolutely @headtr1ck, glad that it was useful - I'm a bit green re: tests and PRs to large projects, but will take a stab at it. I'm just consulting the Contributing guide now.

  Dataset.copy(deep=True) does not deepcopy .attrs 423742774
1163292454 https://github.com/pydata/xarray/issues/4073#issuecomment-1163292454 https://api.github.com/repos/pydata/xarray/issues/4073 IC_kwDOAMm_X85FVm8m phockett 4447466 2022-06-22T15:51:51Z 2022-06-22T16:12:34Z CONTRIBUTOR

I also ran into this when trying to serialize to dict for general file-writing routines (esp. HDF5 writing with h5py), but the issue was in my non-dimensional coordinates! I thought I was being careful by already using array.unstack() in my IO routine, but non-dimensional coordinates also required .reset_index() or .drop().

Some notes below in case they're useful for anyone else trying to do this. Also - this is all quite ugly, and I may have missed some existing core functionality, so I'll be very happy to hear if there is a better way to handle this.


Following the above, a minimal example:

```python
import pandas as pd
import xarray as xr

idx = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=('one', 'two'))
array = xr.DataArray([0, 1], dims='idx', coords={'idx': idx})

# Stacked multidim coords > dict > recreate array - Fails
xr.DataArray.from_dict(array.to_dict())

# Unstack multidim coords > dict > recreate array - OK
xr.DataArray.from_dict(array.unstack().to_dict())

# Set non-dimensional coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A', 'B'])  # Add non-dim coord
array2 = array2.swap_dims({'idx': 'Labels'})  # Swap dims

# Non-dim coord case - also need to reset and drop non-dim coords

# This will fail
array2_dict = array2.unstack().reset_index('idx').to_dict()
xr.DataArray.from_dict(array2_dict)

# This is OK
array2_dict = array2.unstack().reset_index('idx', drop=True).to_dict()
xr.DataArray.from_dict(array2_dict)

# This is also OK
array2_dict = array2.unstack().drop('idx').to_dict()
xr.DataArray.from_dict(array2_dict)
```

In all cases the reconstructed array is flat, and missing non-dim coords. My work-around for this so far is to pull various mappings manually, and dump everything to .attrs, then rebuild from those if required, e.g.

```python
def mapDims(data):
    # Get dims from Xarray
    dims = data.dims  # Dim list - this excludes stacked dims
    dimsUS = data.unstack().dims  # Unstacked (full) dim list

    # List stacked dims and map
    # Could also do this by type checking vs. 'pandas.core.indexes.multi.MultiIndex'?
    stackedDims = list(set(dims) - set(dimsUS))
    stackedDimsMap = {k: list(data.indexes[k].names) for k in stackedDims}

    # Get non-dimensional coords
    # These may be stacked, are not listed in self.dims, and are not addressed by .unstack()
    idxKeys = list(data.indexes.keys())
    coordsKeys = list(data.coords.keys())
    nonDimCoords = list(set(coordsKeys) - set(idxKeys))
    # nonDimCoords = list(set(dims) - set(idxKeys))

    # Get non-dim indexes
    # nddimIndexes = {k: data.coords[k].to_index() for k, v in data.coords.items() if k in nonDimCoords}  # Note this returns Pandas Indexes, so may fail on file IO.
    nddimMap = {k: list(data.coords[k].to_index().names) for k, v in data.coords.items() if k in nonDimCoords}

    # Get dict maps - to_dict per non-dim coord
    # nddimDicts = {k: data.coords[k].reset_index(k).to_dict() for k, v in data.coords.items() if k in nonDimCoords}
    # Use Pandas - this allows direct dump of a PD MultiIndex to dicts
    nddimDicts = {k: data.coords[k].to_index().to_frame().to_dict() for k, v in data.coords.items() if k in nonDimCoords}
    # Get dims correlated to non-dim coords, need these to recreate original links & stacking (?)
    nddimDims = {k: data.coords[k].dims for k, v in data.coords.items() if k in nonDimCoords}

    return {k: v for k, v in locals().items() if k != 'data'}


def deconstructDims(data):
    xrDecon = data.copy()

    # Map dims
    xrDecon.attrs['dimMaps'] = mapDims(data)

    # Unstack all coords
    xrDecon = xrDecon.unstack()

    # Remove non-dim coords
    for nddim in xrDecon.attrs['dimMaps']['nonDimCoords']:
        xrDecon = xrDecon.drop(nddim)

    return xrDecon


def reconstructDims(data):
    xrRecon = data.copy()

    # Restack coords
    for stacked in xrRecon.attrs['dimMaps']['stackedDims']:
        xrRecon = xrRecon.stack({stacked: xrRecon.attrs['dimMaps']['stackedDimsMap'][stacked]})

    # General non-dim coord rebuild
    for nddim in xrRecon.attrs['dimMaps']['nonDimCoords']:
        # Add nddim back into the main XR array
        xrRecon.coords[nddim] = (
            xrRecon.attrs['dimMaps']['nddimDims'][nddim],
            pd.MultiIndex.from_frame(pd.DataFrame.from_dict(xrRecon.attrs['dimMaps']['nddimDicts'][nddim])),
        )

    return xrRecon
```

Dict round-trip is then OK, and the dictionary can also be pushed to standard file types (contains only python native types + numpy array).

```python
# IO with funcs

# With additional tuple coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A', 'B'])  # Add non-dim coord
array2 = array2.swap_dims({'idx': 'Labels'})  # Swap dims

# Decon to dict
safeDict = deconstructDims(array2).to_dict()

# Rebuild
xrFromDict = reconstructDims(xr.DataArray.from_dict(safeDict))

# Same as array2 (aside from added attrs)
array2.attrs = xrFromDict.attrs
array2.identical(xrFromDict)  # True
```
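As a sketch of that last step - dumping the deconstructed dict to a standard file format - JSON works once NumPy arrays are converted to lists. The `to_jsonable` helper and the `safeDict` contents below are illustrative stand-ins, not part of the routines above:

```python
import json

import numpy as np


def to_jsonable(obj):
    # Hypothetical helper: json.dumps calls this for any object it
    # cannot serialize natively; convert NumPy arrays to plain lists.
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj)} is not JSON serializable")


# Stand-in for the dict produced by deconstructDims(...).to_dict()
safeDict = {"data": np.arange(3), "attrs": {"flat": "0", "dimMaps": {"stackedDims": []}}}

text = json.dumps(safeDict, default=to_jsonable)
roundtrip = json.loads(text)
print(roundtrip["data"])   # [0, 1, 2]
```

The same dict-of-native-types structure is what makes the HDF5 route (e.g. via h5py) straightforward as well.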

Again, there is likely some cleaner/more obvious thing I'm missing here, but I'm not very familiar with Xarray or Pandas internals here - this is just where I ended up when trying to convert to HDF5 compatible datastructures in a semi-general way.

(As a side-note, I ran into similar issues with xr.DataArray.to_netcdf() and multi-index coords, or at least did last time I tried it - but I didn't look into this further since I prefer using h5py for other reasons.)

  Error in DataArray.from_dict(data_array.to_dict()) when using pd.MultiIndex 620134014
