issue_comments


23 rows where issue = 1392878100 sorted by updated_at descending


user 7

  • djhoese 11
  • headtr1ck 6
  • rhkleijn 2
  • mraspaud 1
  • dcherian 1
  • max-sixty 1
  • TomNicholas 1

author_association 3

  • CONTRIBUTOR 14
  • COLLABORATOR 6
  • MEMBER 3

issue 1

  • New deep copy behavior in 2022.9.0 causes maximum recursion error · 23
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1267531839 https://github.com/pydata/xarray/issues/7111#issuecomment-1267531839 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LjQA_ djhoese 1828519 2022-10-04T20:20:19Z 2022-10-04T20:20:19Z CONTRIBUTOR

We talked about this today in our pytroll/satpy meeting. We're not sure we agree with cf-xarray putting ancillary variables as coordinates or that it will work for us, so we think we could eventually remove any "automatic" ancillary variable loading and require that the user explicitly request any ancillary variables they want from Satpy's readers.

That said, this will take a lot of work to change. Since it seems like #7112 fixes a majority of our issues I'm hoping that that can still be merged. I'd hope that the memo logic when deepcopying will still protect against other recursive objects (possibly optimize?) even if they can't be directly serialized to NetCDF.

Side note: I feel like there is a difference between the NetCDF model and serializing/saving to a NetCDF file.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1266889779 https://github.com/pydata/xarray/issues/7111#issuecomment-1266889779 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LgzQz djhoese 1828519 2022-10-04T12:07:12Z 2022-10-04T12:07:37Z CONTRIBUTOR

@mraspaud See the cf-xarray link from Deepak. We could make them coordinates. Or we could reference them by name:

```python
ds = xr.open_dataset(...)
anc_name = ds["my_var"].attrs["ancillary_variables"][0]
anc_var = ds[anc_name]
```

Edit: Let's talk more in the pytroll meeting today.
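
For context, a self-contained sketch of that name-based lookup (the dataset contents here are invented; in a CF file, ancillary_variables is a whitespace-separated string of variable names):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "my_var": ("x", np.ones(4), {"ancillary_variables": "quality_flags"}),
        "quality_flags": ("x", np.zeros(4, dtype="uint8")),
    }
)

# CF encodes the link as a whitespace-separated string of variable names,
# so resolve the reference by name instead of embedding the DataArray.
anc_name = ds["my_var"].attrs["ancillary_variables"].split()[0]
anc_var = ds[anc_name]
```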

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1266649409 https://github.com/pydata/xarray/issues/7111#issuecomment-1266649409 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85Lf4lB headtr1ck 43316012 2022-10-04T09:18:25Z 2022-10-04T09:18:25Z COLLABORATOR

I think the behavior of deepcopy in #7112 is correct. If you really want to prevent the ancillary_variables attr from being deep-copied as well, you can try adding it to the memo dict in deepcopy, e.g.:

```python
memo = {id(da.attrs["ancillary_variables"]): da.attrs["ancillary_variables"]}
da_new = deepcopy(da, memo)
```

(untested!)
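
Expanded into a runnable sketch (names invented; note the assert only holds once xarray forwards memo to its component deepcopies, i.e. with the fix in #7112):

```python
import copy

import numpy as np
import xarray as xr

anc = xr.DataArray(np.zeros(3), dims=("x",), name="quality_flags")
da = xr.DataArray(np.ones(3), dims=("x",), attrs={"ancillary_variables": [anc]})

# Pre-seeding memo with {id(obj): obj} makes deepcopy return the original
# object whenever it reaches that list, instead of descending into it.
memo = {id(da.attrs["ancillary_variables"]): da.attrs["ancillary_variables"]}
da_new = copy.deepcopy(da, memo)

# Holds only if xarray passes memo down to attrs (the #7112 behavior):
assert da_new.attrs["ancillary_variables"] is da.attrs["ancillary_variables"]
```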

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1266619173 https://github.com/pydata/xarray/issues/7111#issuecomment-1266619173 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LfxMl mraspaud 167802 2022-10-04T08:53:46Z 2022-10-04T08:54:12Z CONTRIBUTOR

Thanks for pinging me. Regarding the ancillary variables, this comes from the CF conventions, which allow "linking" two or more arrays together. For example, we might have a radiance array, with quality_flags as an ancillary variable array that characterises the quality of each radiance pixel. Now, in netcdf/CF, the ancillary variables are just references (variable names), but the logical way to do this in xarray is to use an ancillary_variables attribute on a DataArray. I'm not sure how we could do it another way.
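
A minimal sketch of that pattern (variable names illustrative):

```python
import numpy as np
import xarray as xr

quality_flags = xr.DataArray(
    np.zeros((2, 2), dtype="uint8"), dims=("y", "x"), name="quality_flags"
)
radiance = xr.DataArray(
    np.random.rand(2, 2),
    dims=("y", "x"),
    name="radiance",
    # In a NetCDF/CF file this attribute is just the string "quality_flags";
    # storing the DataArray itself is the in-memory shortcut discussed here.
    attrs={"ancillary_variables": [quality_flags]},
)

# A back-reference like this is what creates the cycle that broke deepcopy:
quality_flags.attrs["ancillary_variables"] = [radiance]
```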

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1265798416 https://github.com/pydata/xarray/issues/7111#issuecomment-1265798416 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85Lco0Q dcherian 2448579 2022-10-03T17:32:45Z 2022-10-03T19:29:28Z MEMBER

The ancillary variables stuff doesn't really fit the DataArray data model, so you have to do something.

Here's an example with Dataset and cf_xarray using the ancillary_variables attribute: https://cf-xarray.readthedocs.io/en/latest/selecting.html#associated-variables

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1265910157 https://github.com/pydata/xarray/issues/7111#issuecomment-1265910157 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LdEGN djhoese 1828519 2022-10-03T19:13:58Z 2022-10-03T19:13:58Z CONTRIBUTOR

@dcherian Thanks for the feedback. When these decisions were made in Satpy, xarray was not able to contain dask arrays as coordinates, and we depend heavily on dask for our use cases. Putting some of these datasets in as coordinates, as cf-xarray does, may have caused extra unnecessary loading/computation. I'm not sure that would be the case with modern xarray.

Note that ancillary_variables are not the only case of "embedded" DataArrays in our code. We also needed something for CRS + bounds or other geolocation information. As you know I'm very much interested in CRS and geolocation handling in xarray, but for backwards compatibility we also have pyresample AreaDefinition and SwathDefinition objects in our DataArray .attrs["area"] attributes. A SwathDefinition is able to contain two DataArray objects for longitude and latitude. These also get copied with this new deep copy behavior.

We have a monthly Pytroll/Satpy meeting tomorrow so if you have any other suggestions or points for or against our usage please comment here and we'll see what we can do.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1265792923 https://github.com/pydata/xarray/issues/7111#issuecomment-1265792923 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85Lcneb djhoese 1828519 2022-10-03T17:27:55Z 2022-10-03T17:27:55Z CONTRIBUTOR

@TomNicholas Do you mean the "name" of the sub-DataArray? Or the numpy/dask array of the sub-DataArray?

This is what I was trying to describe in https://github.com/pydata/xarray/issues/7111#issuecomment-1264386173. In Satpy we have our own Dataset-like/DataTree-like object where the user explicitly says "I want to load X from input files". As a convenience we put any ancillary variables (ex. data quality flags) in the DataArray .attrs for easier access. In Satpy there is no other direct connection between one DataArray and another. They are overall independent objects on a processing level so there may not be access to this higher-level Dataset-like container object in order to get ancillary variables by name.

@mraspaud was one of the original people who proposed our current design so maybe he can provide more context.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1265735072 https://github.com/pydata/xarray/issues/7111#issuecomment-1265735072 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LcZWg TomNicholas 35968931 2022-10-03T16:41:37Z 2022-10-03T16:41:37Z MEMBER

> I was never a huge fan of putting a DataArray in the attrs of another DataArray, but nothing seemed to disallow it so I ultimately lost that argument.

Out of curiosity, why do you need to store a DataArray object as opposed to merely the values in one?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1264515251 https://github.com/pydata/xarray/issues/7111#issuecomment-1264515251 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LXviz djhoese 1828519 2022-10-02T00:25:36Z 2022-10-02T00:41:00Z CONTRIBUTOR

Sorry, false alarm. I was running with an old environment. With this new PR it seems the ancillary_variables tests that were failing now pass, but the dask .copy() related ones still fail...which is expected so I'm ok with that.

Edit: I hacked variable.py so it had this:

```python
if deep:
    if is_duck_dask_array(ndata):
        ndata = ndata
    else:
        ndata = copy.deepcopy(ndata, memo)
```

and that fixed a lot of my dask related tests, but also seems to have introduced two new failures from what I can tell. So :man_shrugging:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1264437857 https://github.com/pydata/xarray/issues/7111#issuecomment-1264437857 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LXcph djhoese 1828519 2022-10-01T18:01:16Z 2022-10-01T18:01:16Z CONTRIBUTOR

It looks like that PR fixes all of my Satpy unit tests. I'm not sure how that is possible if it doesn't also change when dask arrays are copied.

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 2,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1264398329 https://github.com/pydata/xarray/issues/7111#issuecomment-1264398329 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LXS_5 headtr1ck 43316012 2022-10-01T15:27:37Z 2022-10-01T15:27:37Z COLLABORATOR

I added a PR that fixes the broken reprs and deepcopys. The other issues are not addressed yet.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1264388047 https://github.com/pydata/xarray/issues/7111#issuecomment-1264388047 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LXQfP djhoese 1828519 2022-10-01T14:56:45Z 2022-10-01T14:56:45Z CONTRIBUTOR

Also note the other important change in this new behavior: dask arrays are now copied (.copy()) when they weren't before. This is causing some equality issues for us in Satpy, but I agree with the change on xarray's side (xarray should be able to call .copy() on whatever array it has).

https://github.com/dask/dask/issues/9533

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1264386173 https://github.com/pydata/xarray/issues/7111#issuecomment-1264386173 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LXQB9 djhoese 1828519 2022-10-01T14:47:51Z 2022-10-01T14:47:51Z CONTRIBUTOR

I'm a little torn on this. Obviously I'm not an xarray maintainer so I'm not the one who would have to maintain it or answer support questions about it. We actually had the user-side of this discussion in the Satpy library group a while ago which is leading to this whole problem for us now. In Satpy we don't typically use or deal with xarray Datasets (the new DataTree library is likely what we'll move to) so when we have relationships between DataArrays we'll use something like ancillary variables to connect them. For example, a data quality flag that is used by the other variables in a file. Our users don't usually care about the DQF but we don't want to stop them from being able to easily access it. I was never a huge fan of putting a DataArray in the attrs of another DataArray, but nothing seemed to disallow it so I ultimately lost that argument.

So on one hand I agree it seems like there shouldn't be a need in most cases to have a DataArray inside a DataArray, especially a circular dependency. On the other hand, I'm not looking forward to the updates I'll need to make to Satpy to fix this. Note, we don't do this everywhere in Satpy, just something we use for a few formats we read.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1264335676 https://github.com/pydata/xarray/issues/7111#issuecomment-1264335676 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LXDs8 headtr1ck 43316012 2022-10-01T11:28:53Z 2022-10-01T11:28:53Z COLLABORATOR

Ok, even xarray.testing.assert_identical fails with recursive definitions. Are we sure that it is a good idea to support this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1264335114 https://github.com/pydata/xarray/issues/7111#issuecomment-1264335114 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LXDkK headtr1ck 43316012 2022-10-01T11:25:42Z 2022-10-01T11:25:42Z COLLABORATOR

I will set up a PR for that. Another issue has arisen: the repr is also broken for recursive data. With your example, python should also raise a RecursionError when looking at this data, shouldn't it?
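
For comparison, the builtin containers guard their repr against self-reference; a quick sketch:

```python
a = [1]
a.append(a)     # a list that contains itself
print(repr(a))  # prints [1, [...]] -- the cycle is detected, not recursed into
```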

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1264322604 https://github.com/pydata/xarray/issues/7111#issuecomment-1264322604 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LXAgs rhkleijn 32801740 2022-10-01T10:40:48Z 2022-10-01T10:40:48Z CONTRIBUTOR

To avoid code duplication you may consider moving all logic from the copy methods to new _copy methods, extending those with an optional memo argument, and having the copy, __copy__ and __deepcopy__ methods be thin wrappers around it.
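
A sketch of that layout on a toy class (not xarray's actual code; attribute names invented):

```python
import copy


class Variable:
    """Toy stand-in illustrating the thin-wrapper layout."""

    def __init__(self, data, attrs):
        self._data = data
        self._attrs = attrs

    def _copy(self, deep=True, memo=None):
        # Single implementation; the memo dict flows into the component
        # deepcopies so the copy module can short-circuit shared objects.
        if deep:
            data = copy.deepcopy(self._data, memo)
            attrs = copy.deepcopy(self._attrs, memo)
        else:
            data, attrs = self._data, copy.copy(self._attrs)
        return type(self)(data, attrs)

    def copy(self, deep=True):
        return self._copy(deep=deep)

    def __copy__(self):
        return self._copy(deep=False)

    def __deepcopy__(self, memo=None):
        return self._copy(deep=True, memo=memo)
```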

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1264272446 https://github.com/pydata/xarray/issues/7111#issuecomment-1264272446 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LW0Q- headtr1ck 43316012 2022-10-01T07:08:53Z 2022-10-01T08:35:24Z COLLABORATOR

I think our implementations of copy(deep=True) and __deepcopy__ are inverted: the first should call the latter, not the other way around, so that the memo dict can be passed through.

This will lead to a bit of duplicate code between __copy__ and __deepcopy__ but would be the correct way.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1264014472 https://github.com/pydata/xarray/issues/7111#issuecomment-1264014472 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LV1SI rhkleijn 32801740 2022-09-30T20:52:28Z 2022-09-30T20:54:25Z CONTRIBUTOR

> Is there some feature that python uses to check whether a data structure is recursive when it's copying, which we're not taking advantage of? I can look more later.

Yes, def __deepcopy__(self, memo=None) has the memo argument exactly for the purpose of dealing with recursion; see https://docs.python.org/3/library/copy.html. Currently, xarray's __deepcopy__ methods do not pass on the memo argument when deepcopying their components.
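
A tiny demonstration of what the memo dict buys, independent of xarray:

```python
import copy

a = {"name": "a"}
b = {"name": "b", "sibling": a}
a["sibling"] = b  # a and b now reference each other

# dict's deepcopy registers each new copy in memo before copying values,
# so the cycle is reproduced in the copy instead of recursed into.
c = copy.deepcopy(a)
assert c["sibling"]["sibling"] is c
```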

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1263967757 https://github.com/pydata/xarray/issues/7111#issuecomment-1263967757 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LVp4N max-sixty 5635139 2022-09-30T19:59:05Z 2022-09-30T19:59:05Z MEMBER

Hmmm, python seems to deal with this reasonably for its builtins:

```python
In [1]: a = [1]

In [2]: b = [a]

In [3]: a.append(b)

In [4]: import copy

In [5]: copy.deepcopy(a)
Out[5]: [1, [[...]]]
```

I doubt this is getting hit that much given it requires a recursive data structure, but it does seem like a gnarly error.

Is there some feature that python uses to check whether a data structure is recursive when it's copying, which we're not taking advantage of? I can look more later.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1263967009 https://github.com/pydata/xarray/issues/7111#issuecomment-1263967009 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LVpsh djhoese 1828519 2022-09-30T19:58:11Z 2022-09-30T19:58:11Z CONTRIBUTOR

I'd have to check, but I think this structure was originally produced by xarray reading a CF-compliant NetCDF file. That is my memory at least. It could be that our library (Satpy) is doing this as a convenience, replacing the name of an ancillary variable with the DataArray of that ancillary variable.

My other new issue seems to be related to .copy() doing a .copy() on dask arrays which then makes them not equivalent anymore. Working on an MVCE now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1263956728 https://github.com/pydata/xarray/issues/7111#issuecomment-1263956728 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LVnL4 headtr1ck 43316012 2022-09-30T19:45:57Z 2022-09-30T19:45:57Z COLLABORATOR

I basically copied the behavior of Dataset.copy which should already show this problem. In principle we are doing a new_attrs = copy.deepcopy(attrs).

I would claim that the new behavior is correct, but maybe other devs can confirm this.

Coming from netCDF, it does not really make sense to put complex objects in attrs, but I guess for in-memory only it works.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1263952252 https://github.com/pydata/xarray/issues/7111#issuecomment-1263952252 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LVmF8 djhoese 1828519 2022-09-30T19:41:30Z 2022-09-30T19:41:30Z CONTRIBUTOR

I get a similar error for different structures and if I do something like data_arr.where(data_arr > 5, drop=True). In this case I have dask-array-based DataArrays, and dask ends up trying to hash the object and gets stuck in a loop, with dask asking xarray to hash the DataArray (or something like that) and xarray in turn trying to hash the DataArrays inside .attrs.

```python
In [9]: import dask.array as da

In [15]: a = xr.DataArray(da.zeros(5.0), attrs={}, dims=("a_dim",))

In [16]: b = xr.DataArray(da.zeros(8.0), attrs={}, dims=("b_dim",))

In [20]: a.attrs["other"] = b

In [24]: lons = xr.DataArray(da.random.random(8), attrs={"ancillary_variables": [b]})

In [25]: lats = xr.DataArray(da.random.random(8), attrs={"ancillary_variables": [b]})

In [26]: b.attrs["some_attr"] = [lons, lats]

In [27]: cond = a > 5

In [28]: c = a.where(cond, drop=True)
...
File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/dask/utils.py:1982, in _HashIdWrapper.__hash__(self)
   1981 def __hash__(self):
-> 1982     return id(self.wrapped)

RecursionError: maximum recursion depth exceeded while calling a Python object
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1263927640 https://github.com/pydata/xarray/issues/7111#issuecomment-1263927640 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LVgFY djhoese 1828519 2022-09-30T19:14:48Z 2022-09-30T19:14:48Z CONTRIBUTOR

CC @headtr1ck, any idea if this is supposed to work with your new #7089?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100


```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```