

issue_comments


5 rows where author_association = "CONTRIBUTOR" and user = 4447466 sorted by updated_at descending




id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1256553736 https://github.com/pydata/xarray/issues/2835#issuecomment-1256553736 https://api.github.com/repos/pydata/xarray/issues/2835 IC_kwDOAMm_X85K5X0I phockett 4447466 2022-09-23T18:47:58Z 2022-09-27T15:00:51Z CONTRIBUTOR

Just for the record, I just ran into this for the specific case of nested dictionary attrs in DataArray.attrs.

It's definitely an issue in 2022.3.0 and 2022.6.0. Here's a minimal test example in case anyone else runs into this too:

```python
# Minimal example
import xarray as xr
import numpy as np

data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
data.attrs['flat'] = '0'
data.attrs['nested'] = {'level1': '1'}

data2 = data.copy(deep=True)
data2.attrs['flat'] = '2'  # OK

data2.attrs['nested'] = {'level1': '2'}  # OK

data2.attrs['nested']['level1'] = '2'  # Fails - overwrites data

data2.attrs['nested'].update({'level1': '2'})  # Fails - overwrites data

print(data.attrs)
print(data2.attrs)
```

Outputs

In xarray 2022.3.0 and 2022.6.0 this gives (incorrect):

```
{'flat': '0', 'nested': {'level1': '2'}}
{'flat': '2', 'nested': {'level1': '2'}}
```

As a work-around, a safe attrs copy with deepcopy works:

```python
import copy

data2 = data.copy(deep=True)
data2.attrs = copy.deepcopy(data.attrs)
```

With correct results after modification:

```
{'flat': '0', 'nested': {'level1': '1'}}
{'flat': '2', 'nested': {'level1': '2'}}
```

EDIT 26th Sept: retested in 2022.6.0 and found it was, in fact, failing there too. Updated comment to reflect this.
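The underlying Python behaviour can be seen without xarray at all: a shallow copy of an attrs-style dict shares any nested dicts, while `copy.deepcopy` duplicates them. A minimal stdlib-only sketch:

```python
import copy

attrs = {'flat': '0', 'nested': {'level1': '1'}}

shallow = dict(attrs)         # shallow copy: the nested dict object is shared
deep = copy.deepcopy(attrs)   # deep copy: the nested dict is duplicated

shallow['nested']['level1'] = '2'   # mutates the shared nested dict
print(attrs['nested'])              # {'level1': '2'} - original affected

deep['nested']['level1'] = '3'
print(attrs['nested'])              # still {'level1': '2'} - original unaffected
```

This is exactly the flat-vs-nested asymmetry in the example above: replacing a top-level key rebinds it in one dict only, while mutating a shared nested dict is visible through both.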

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset.copy(deep=True) does not deepcopy .attrs 423742774
1258611190 https://github.com/pydata/xarray/issues/2835#issuecomment-1258611190 https://api.github.com/repos/pydata/xarray/issues/2835 IC_kwDOAMm_X85LBOH2 phockett 4447466 2022-09-26T20:43:01Z 2022-09-26T20:43:01Z CONTRIBUTOR

Ok, I thought that copying attrs was fixed, but it seems it was not...

Sorry for the mix-up there - I think I initially tested in 2022.6 with the extra `data2.attrs = copy.deepcopy(data.attrs)` work-around in place. Caught it with the new test routine 😄

  Dataset.copy(deep=True) does not deepcopy .attrs 423742774
1258538050 https://github.com/pydata/xarray/issues/2835#issuecomment-1258538050 https://api.github.com/repos/pydata/xarray/issues/2835 IC_kwDOAMm_X85LA8RC phockett 4447466 2022-09-26T19:48:39Z 2022-09-26T19:52:14Z CONTRIBUTOR

OK, new test now pushed as #7086. (Hopefully added in the right place and style!)

A couple of additional notes:

  • Revision to my comment above: this actually fails in both 2022.3 and 2022.6 for nested attrs.
  • I took a look at the source code in dataarray.py, but couldn't see an obvious way to fix this and/or didn't fully understand the attrs copying process.
  • I tested the equivalent case for Dataset attrs too (see below), and this seems fine as per your previous comments above, so I think https://github.com/pydata/xarray/pull/2839 (which includes a ds-level test) still applies to ds.attrs; however, the issue does still affect the individual arrays within the dataset (as expected).

```python
import xarray as xr

ds = xr.Dataset({"a": (["x"], [1, 2, 3])}, attrs={"t": 1, "nested": {"t2": 1}})
ds.a.attrs = {"t": 'a1', "nested": {"t2": 'a1'}}

ds2 = ds.copy(deep=True)
ds.attrs["t"] = 5
ds.attrs["nested"]["t2"] = 10

ds2.a.attrs["t"] = 'a2'
ds2.a.attrs["nested"]["t2"] = 'a2'

print(ds.attrs)
print(ds.a.attrs)
print(ds2.attrs)
print(ds2.a.attrs)
```

Results in:

```
{'t': 5, 'nested': {'t2': 10}}
{'t': 'a1', 'nested': {'t2': 'a2'}}
{'t': 1, 'nested': {'t2': 1}}
{'t': 'a2', 'nested': {'t2': 'a2'}}
```
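For reference, a regression test for the DataArray case could be shaped roughly like this (a hedged sketch only - the name and structure are hypothetical, not the actual test added in #7086):

```python
import numpy as np
import xarray as xr


def test_deepcopy_nested_attrs():
    # Hypothetical regression test: mutating a nested attrs dict on a
    # deep copy must not touch the original DataArray's attrs.
    da = xr.DataArray(np.ones(3), dims="x", attrs={"nested": {"key": "a"}})
    copied = da.copy(deep=True)
    copied.attrs["nested"]["key"] = "b"
    assert da.attrs["nested"]["key"] == "a"
```

On the affected releases (2022.3.0 and 2022.6.0) this assertion fails; on a fixed xarray it passes.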

  Dataset.copy(deep=True) does not deepcopy .attrs 423742774
1258231700 https://github.com/pydata/xarray/issues/2835#issuecomment-1258231700 https://api.github.com/repos/pydata/xarray/issues/2835 IC_kwDOAMm_X85K_xeU phockett 4447466 2022-09-26T15:38:18Z 2022-09-26T15:38:18Z CONTRIBUTOR

Absolutely @headtr1ck, glad that it was useful - I'm a bit green re: tests and PRs to large projects, but will take a stab at it. I'm just consulting the Contributing guide now.

  Dataset.copy(deep=True) does not deepcopy .attrs 423742774
1163292454 https://github.com/pydata/xarray/issues/4073#issuecomment-1163292454 https://api.github.com/repos/pydata/xarray/issues/4073 IC_kwDOAMm_X85FVm8m phockett 4447466 2022-06-22T15:51:51Z 2022-06-22T16:12:34Z CONTRIBUTOR

I also ran into this when trying to serialize to dict for general file-writing routines (esp. HDF5 writing with h5py), but the issue was in my non-dimensional coordinates! I thought I was being careful by already using array.unstack() in my IO routine, but non-dimensional coordinates also required .reset_index() or .drop().

Some notes below in case they're useful for anyone else trying to do this. Also - this is all quite ugly, and I may have missed some existing core functionality, so I'll be very happy to hear if there is a better way to handle this.


Following the above, a minimal example:

```python
import pandas as pd
import xarray as xr

idx = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=('one', 'two'))
array = xr.DataArray([0, 1], dims='idx', coords={'idx': idx})

# Stacked multidim coords > dict > recreate array - Fails
xr.DataArray.from_dict(array.to_dict())

# Unstack multidim coords > dict > recreate array - OK
xr.DataArray.from_dict(array.unstack().to_dict())

# Set non-dimensional coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A', 'B'])  # Add non-dim coord
array2 = array2.swap_dims({'idx': 'Labels'})  # Swap dims

# Non-dim coord case - also need to reset and drop non-dim coords

# This will fail
array2_dict = array2.unstack().reset_index('idx').to_dict()
xr.DataArray.from_dict(array2_dict)

# This is OK
array2_dict = array2.unstack().reset_index('idx', drop=True).to_dict()
xr.DataArray.from_dict(array2_dict)

# This is also OK
array2_dict = array2.unstack().drop('idx').to_dict()
xr.DataArray.from_dict(array2_dict)
```

In all cases the reconstructed array is flat, and missing non-dim coords. My work-around for this so far is to pull various mappings manually, and dump everything to .attrs, then rebuild from those if required, e.g.

```python
def mapDims(data):
    # Get dims from Xarray
    dims = data.dims  # Dim list - this excludes stacked dims
    dimsUS = data.unstack().dims  # Unstacked (full) dim list

    # List stacked dims and map
    # Could also do this by type checking vs. 'pandas.core.indexes.multi.MultiIndex'?
    stackedDims = list(set(dims) - set(dimsUS))
    stackedDimsMap = {k: list(data.indexes[k].names) for k in stackedDims}

    # Get non-dimensional coords
    # These may be stacked, are not listed in self.dims, and are not addressed by .unstack()
    idxKeys = list(data.indexes.keys())
    coordsKeys = list(data.coords.keys())
    nonDimCoords = list(set(coordsKeys) - set(idxKeys))
    # nonDimCoords = list(set(dims) - set(idxKeys))

    # Get non-dim indexes
    # nddimIndexes = {k: data.coords[k].to_index() for k, v in data.coords.items() if k in nonDimCoords}  # Note this returns Pandas Indexes, so may fail on file IO.
    nddimMap = {k: list(data.coords[k].to_index().names) for k, v in data.coords.items() if k in nonDimCoords}

    # Get dict maps - to_dict per non-dim coord
    # nddimDicts = {k: data.coords[k].reset_index(k).to_dict() for k, v in data.coords.items() if k in nonDimCoords}
    # Use Pandas - this allows direct dump of a PD MultiIndex to dicts
    nddimDicts = {k: data.coords[k].to_index().to_frame().to_dict() for k, v in data.coords.items() if k in nonDimCoords}
    # Get dims correlated to non-dim coords, need these to recreate original links & stacking (?)
    nddimDims = {k: data.coords[k].dims for k, v in data.coords.items() if k in nonDimCoords}

    return {k: v for k, v in locals().items() if k != 'data'}


def deconstructDims(data):
    xrDecon = data.copy()

    # Map dims
    xrDecon.attrs['dimMaps'] = mapDims(data)

    # Unstack all coords
    xrDecon = xrDecon.unstack()

    # Remove non-dim coords
    for nddim in xrDecon.attrs['dimMaps']['nonDimCoords']:
        xrDecon = xrDecon.drop(nddim)

    return xrDecon


def reconstructDims(data):
    xrRecon = data.copy()

    # Restack coords
    for stacked in xrRecon.attrs['dimMaps']['stackedDims']:
        xrRecon = xrRecon.stack({stacked: xrRecon.attrs['dimMaps']['stackedDimsMap'][stacked]})

    # General non-dim coord rebuild
    for nddim in xrRecon.attrs['dimMaps']['nonDimCoords']:
        # Add nddim back into the main XR array
        xrRecon.coords[nddim] = (
            xrRecon.attrs['dimMaps']['nddimDims'][nddim],
            pd.MultiIndex.from_frame(pd.DataFrame.from_dict(xrRecon.attrs['dimMaps']['nddimDicts'][nddim])),
        )

    return xrRecon
```

Dict round-trip is then OK, and the dictionary can also be pushed to standard file types (contains only python native types + numpy array).

```python
# IO with funcs

# With additional tuple coord
array2 = array.copy()
array2['Labels'] = ('idx', ['A', 'B'])  # Add non-dim coord
array2 = array2.swap_dims({'idx': 'Labels'})  # Swap dims

# Decon to dict
safeDict = deconstructDims(array2).to_dict()

# Rebuild
xrFromDict = reconstructDims(xr.DataArray.from_dict(safeDict))

# Same as array2 (aside from added attrs)
array2.attrs = xrFromDict.attrs
array2.identical(xrFromDict)  # True
```
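As a sketch of that last step - dumping the deconstructed dict to a standard file format - JSON works once NumPy arrays are converted to lists. The `to_jsonable` helper and the `safeDict` contents below are illustrative stand-ins, not part of the routines above:

```python
import json

import numpy as np


def to_jsonable(obj):
    # Hypothetical helper: json.dumps calls this for any object it
    # cannot serialize natively; convert NumPy arrays to plain lists.
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj)} is not JSON serializable")


# Stand-in for the dict produced by deconstructDims(...).to_dict()
safeDict = {"data": np.arange(3), "attrs": {"flat": "0", "dimMaps": {"stackedDims": []}}}

text = json.dumps(safeDict, default=to_jsonable)
roundtrip = json.loads(text)
print(roundtrip["data"])   # [0, 1, 2]
```

The same dict-of-native-types structure is what makes the HDF5 route (e.g. via h5py) straightforward as well.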

Again, there is likely some cleaner/more obvious thing I'm missing here, but I'm not very familiar with Xarray or Pandas internals here - this is just where I ended up when trying to convert to HDF5 compatible datastructures in a semi-general way.

(As a side-note, I ran into similar issues with xr.DataArray.to_netcdf() and multi-index coords, or at least did last time I tried it - but I didn't look into this further since I prefer using h5py for other reasons.)

  Error in DataArray.from_dict(data_array.to_dict()) when using pd.MultiIndex 620134014
