html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/7111#issuecomment-1267531839,https://api.github.com/repos/pydata/xarray/issues/7111,1267531839,IC_kwDOAMm_X85LjQA_,1828519,2022-10-04T20:20:19Z,2022-10-04T20:20:19Z,CONTRIBUTOR,"We talked about this today in our pytroll/satpy meeting. We're not sure we agree with cf-xarray putting ancillary variables as coordinates or that it will work for us, so we think we could eventually remove any ""automatic"" ancillary variable loading and require that the user explicitly request any ancillary variables they want from Satpy's readers. That said, this will take a lot of work to change. Since it seems like #7112 fixes a majority of our issues I'm hoping that that can still be merged. I'd hope that the `memo` logic when deepcopying will still protect against other recursive objects (possibly optimize?) even if they can't be directly serialized to NetCDF. Side note: I feel like there is a difference between the NetCDF model and serializing/saving to a NetCDF file.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1266889779,https://api.github.com/repos/pydata/xarray/issues/7111,1266889779,IC_kwDOAMm_X85LgzQz,1828519,2022-10-04T12:07:12Z,2022-10-04T12:07:37Z,CONTRIBUTOR,"@mraspaud See the cf-xarray link from Deepak. We could make them coordinates. Or we could reference them by name: ``` ds = xr.open_dataset(...) 
anc_name = ds[""my_var""].attrs[""ancillary_variables""][0] anc_var = ds[anc_name] ``` Edit: Let's talk more in the pytroll meeting today.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1266649409,https://api.github.com/repos/pydata/xarray/issues/7111,1266649409,IC_kwDOAMm_X85Lf4lB,43316012,2022-10-04T09:18:25Z,2022-10-04T09:18:25Z,COLLABORATOR,"I think the behavior of deepcopy in #7112 is correct. If you really want to prevent the `ancillary_variables` attrs from being deep-copied as well, you can try to add it to the memo dict in deepcopy, e.g.: ```python memo = {id(da.attrs[""ancillary_variables""]): da.attrs[""ancillary_variables""]} da_new = deepcopy(da, memo) ``` (untested!)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1266619173,https://api.github.com/repos/pydata/xarray/issues/7111,1266619173,IC_kwDOAMm_X85LfxMl,167802,2022-10-04T08:53:46Z,2022-10-04T08:54:12Z,CONTRIBUTOR,"Thanks for pinging me. Regarding the ancillary variables, this comes from the CF conventions, which allow two or more arrays to be ""linked"" together. For example, we might have a `radiance` array, with `quality_flags` as an ancillary variable array, that characterises the quality of each radiance pixel. Now, in netcdf/CF, the ancillary variables are just references, but the logical way to do this in xarray is to use an `ancillary_variables` attribute to a `DataArray`. 
I'm not sure how we could do it in another way.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1265798416,https://api.github.com/repos/pydata/xarray/issues/7111,1265798416,IC_kwDOAMm_X85Lco0Q,2448579,2022-10-03T17:32:45Z,2022-10-03T19:29:28Z,MEMBER,"The ancillary variables stuff doesn't really fit the DataArray data model, so you have to do something. Here's an example with `Dataset` and `cf_xarray` using the `ancillary_variables` attribute https://cf-xarray.readthedocs.io/en/latest/selecting.html#associated-variables","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1265910157,https://api.github.com/repos/pydata/xarray/issues/7111,1265910157,IC_kwDOAMm_X85LdEGN,1828519,2022-10-03T19:13:58Z,2022-10-03T19:13:58Z,CONTRIBUTOR,"@dcherian Thanks for the feedback. When these decisions were made in Satpy xarray was not able to contain dask arrays as coordinates and we depend heavily on dask for our use cases. Putting some of these datasets as `coordinates` as cf xarray does may have caused extra unnecessary loading/computation. I'm not sure that would be the case with modern xarray. Note that ancillary_variables are not the only case of ""embedded"" DataArrays in our code. We also needed something for CRS + bounds or other geolocation information. As you know I'm very much interested in CRS and geolocation handling in xarray, but for backwards compatibility we also have pyresample AreaDefinition and SwathDefinition objects in our DataArray `.attrs[""area""]` attributes. A `SwathDefinition` is able to contain two `DataArray` objects for longitude and latitude. These also get copied with this new deep copy behavior. 
We have a monthly Pytroll/Satpy meeting tomorrow so if you have any other suggestions or points for or against our usage please comment here and we'll see what we can do.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1265792923,https://api.github.com/repos/pydata/xarray/issues/7111,1265792923,IC_kwDOAMm_X85Lcneb,1828519,2022-10-03T17:27:55Z,2022-10-03T17:27:55Z,CONTRIBUTOR,"@TomNicholas Do you mean the ""name"" of the sub-DataArray? Or the numpy/dask array of the sub-DataArray? This is what I was trying to describe in https://github.com/pydata/xarray/issues/7111#issuecomment-1264386173. In Satpy we have our own Dataset-like/DataTree-like object where the user explicitly says ""I want to load X from input files"". As a convenience we put any ancillary variables (ex. data quality flags) in the DataArray `.attrs` for easier access. In Satpy there is no other direct connection between one DataArray and another. They are overall independent objects on a processing level so there may not be access to this higher-level Dataset-like container object in order to get ancillary variables by name. @mraspaud was one of the original people who proposed our current design so maybe he can provide more context.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1265735072,https://api.github.com/repos/pydata/xarray/issues/7111,1265735072,IC_kwDOAMm_X85LcZWg,35968931,2022-10-03T16:41:37Z,2022-10-03T16:41:37Z,MEMBER,"> I was never a huge fan of putting a DataArray in the attrs of another DataArray, but nothing seemed to disallow it so I ultimately lost that argument. 
Out of curiosity, why do you need to store a DataArray object as opposed to merely the values in one?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1264515251,https://api.github.com/repos/pydata/xarray/issues/7111,1264515251,IC_kwDOAMm_X85LXviz,1828519,2022-10-02T00:25:36Z,2022-10-02T00:41:00Z,CONTRIBUTOR,"Sorry, false alarm. I was running with an old environment. With this new PR it seems the `ancillary_variables` tests that were failing now pass, but the dask `.copy()` related ones still fail...which is expected so I'm ok with that. Edit: I hacked `variable.py` so it had this: ``` if deep: if is_duck_dask_array(ndata): ndata = ndata else: ndata = copy.deepcopy(ndata, memo) ``` and that fixed a lot of my dask related tests, but also seems to have introduced two new failures from what I can tell. So :man_shrugging: ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1264437857,https://api.github.com/repos/pydata/xarray/issues/7111,1264437857,IC_kwDOAMm_X85LXcph,1828519,2022-10-01T18:01:16Z,2022-10-01T18:01:16Z,CONTRIBUTOR,It looks like that PR fixes all of my Satpy unit tests. I'm not sure how that is possible if it doesn't also change when dask arrays are copied.,"{""total_count"": 2, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 2, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1264398329,https://api.github.com/repos/pydata/xarray/issues/7111,1264398329,IC_kwDOAMm_X85LXS_5,43316012,2022-10-01T15:27:37Z,2022-10-01T15:27:37Z,COLLABORATOR,"I added a PR that fixes the broken reprs and deepcopys. 
The other issues are not addressed yet.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1264388047,https://api.github.com/repos/pydata/xarray/issues/7111,1264388047,IC_kwDOAMm_X85LXQfP,1828519,2022-10-01T14:56:45Z,2022-10-01T14:56:45Z,CONTRIBUTOR,"Also note the other important change in this new behavior which is that dask arrays are now copied (`.copy()`) when they weren't before. This is causing some equality issues for us in Satpy, but I agree with the change on xarray's side (xarray should be able to call `.copy()` on whatever array it has). https://github.com/dask/dask/issues/9533","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1264386173,https://api.github.com/repos/pydata/xarray/issues/7111,1264386173,IC_kwDOAMm_X85LXQB9,1828519,2022-10-01T14:47:51Z,2022-10-01T14:47:51Z,CONTRIBUTOR,"I'm a little torn on this. Obviously I'm not an xarray maintainer so I'm not the one who would have to maintain it or answer support questions about it. We actually had the user-side of this discussion in the [Satpy library](https://github.com/pytroll/satpy) group a while ago which is leading to this whole problem for us now. In Satpy we don't typically use or deal with xarray Datasets (the new DataTree library is likely what we'll move to) so when we have relationships between DataArrays we'll use something like ancillary variables to connect them. For example, a data quality flag that is used by the other variables in a file. Our users don't usually care about the DQF but we don't want to stop them from being able to easily access it. 
I was never a huge fan of putting a DataArray in the attrs of another DataArray, but nothing seemed to disallow it so I ultimately lost that argument. So on one hand I agree it seems like there shouldn't be a need in most cases to have a DataArray inside a DataArray, especially a circular dependency. On the other hand, I'm not looking forward to the updates I'll need to make to Satpy to fix this. Note, we don't do this *everywhere* in Satpy, just something we use for a few formats we read.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1264335676,https://api.github.com/repos/pydata/xarray/issues/7111,1264335676,IC_kwDOAMm_X85LXDs8,43316012,2022-10-01T11:28:53Z,2022-10-01T11:28:53Z,COLLABORATOR,"Ok, even `xarray.testing.assert_identical` fails with recursive definitions. Are we sure that it is a good idea to support this?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1264335114,https://api.github.com/repos/pydata/xarray/issues/7111,1264335114,IC_kwDOAMm_X85LXDkK,43316012,2022-10-01T11:25:42Z,2022-10-01T11:25:42Z,COLLABORATOR,"I will set up a PR for that. Another issue has arisen: the repr is also broken for recursive data. 
With your example python should also raise a RecursionError when looking at this data?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1264322604,https://api.github.com/repos/pydata/xarray/issues/7111,1264322604,IC_kwDOAMm_X85LXAgs,32801740,2022-10-01T10:40:48Z,2022-10-01T10:40:48Z,CONTRIBUTOR,"To avoid code duplication you may consider moving all logic from the `copy` methods to new `_copy` methods and extending that with an optional `memo` argument and have the `copy`, `__copy__` and `__deepcopy__` methods as thin wrappers around it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1264272446,https://api.github.com/repos/pydata/xarray/issues/7111,1264272446,IC_kwDOAMm_X85LW0Q-,43316012,2022-10-01T07:08:53Z,2022-10-01T08:35:24Z,COLLABORATOR,"I think our implementations of `copy(deep=True)` and `__deepcopy__` are reversed: the first should call the latter, not the other way around, so that the memo dict can be passed. This will lead to a bit of duplicate code between `__copy__` and `__deepcopy__` but would be the correct way.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1264014472,https://api.github.com/repos/pydata/xarray/issues/7111,1264014472,IC_kwDOAMm_X85LV1SI,32801740,2022-09-30T20:52:28Z,2022-09-30T20:54:25Z,CONTRIBUTOR,"> Is there some feature that python uses to check whether a data structure is recursive when it's copying, which we're not taking advantage of? I can look more later. 
Yes, `def __deepcopy__(self, memo=None)` has the `memo` argument exactly for the purpose of dealing with recursion, see https://docs.python.org/3/library/copy.html. Currently, xarray's `__deepcopy__` methods do not pass on the memo argument when deepcopying their components.","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1263967757,https://api.github.com/repos/pydata/xarray/issues/7111,1263967757,IC_kwDOAMm_X85LVp4N,5635139,2022-09-30T19:59:05Z,2022-09-30T19:59:05Z,MEMBER,"Hmmm, python seems to deal with this reasonably for its builtins: ```python In [1]: a = [1] In [2]: b = [a] In [3]: a.append(b) In [4]: import copy In [5]: copy.deepcopy(a) Out[5]: [1, [[...]]] ``` I doubt this is getting hit _that_ much given it requires a recursive data structure, but it does seem like a gnarly error. Is there some feature that python uses to check whether a data structure is recursive when it's copying, which we're not taking advantage of? I can look more later.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1263967009,https://api.github.com/repos/pydata/xarray/issues/7111,1263967009,IC_kwDOAMm_X85LVpsh,1828519,2022-09-30T19:58:11Z,2022-09-30T19:58:11Z,CONTRIBUTOR,"I'd have to check, but this structure I *think* was originally produced by xarray reading a CF compliant NetCDF file. That is my memory at least. It could be that our library (satpy) is doing this as a convenience, replacing the name of an ancillary variable with the DataArray of that ancillary variable. My other new issue seems to be related to `.copy()` doing a `.copy()` on dask arrays which then makes them not equivalent anymore. 
Working on an MVCE now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1263956728,https://api.github.com/repos/pydata/xarray/issues/7111,1263956728,IC_kwDOAMm_X85LVnL4,43316012,2022-09-30T19:45:57Z,2022-09-30T19:45:57Z,COLLABORATOR,"I basically copied the behavior of `Dataset.copy` which should already show this problem. In principle we are doing a `new_attrs = copy.deepcopy(attrs)`. I would claim that the new behavior is correct, but maybe other devs can confirm this. Coming from netCDF, it does not really make sense to put complex objects in attrs, but I guess for in-memory only it works.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1263952252,https://api.github.com/repos/pydata/xarray/issues/7111,1263952252,IC_kwDOAMm_X85LVmF8,1828519,2022-09-30T19:41:30Z,2022-09-30T19:41:30Z,CONTRIBUTOR,"I get a similar error for different structures and if I do something like `data_arr.where(data_arr > 5, drop=True)`. In this case I have dask array based DataArrays and dask ends up trying to hash the object and it ends up in a loop trying to get xarray to hash the DataArray or something and xarray trying to hash the DataArrays inside `.attrs`. ``` In [9]: import dask.array as da In [15]: a = xr.DataArray(da.zeros(5.0), attrs={}, dims=(""a_dim"",)) In [16]: b = xr.DataArray(da.zeros(8.0), attrs={}, dims=(""b_dim"",)) In [20]: a.attrs[""other""] = b In [24]: lons = xr.DataArray(da.random.random(8), attrs={""ancillary_variables"": [b]}) In [25]: lats = xr.DataArray(da.random.random(8), attrs={""ancillary_variables"": [b]}) In [26]: b.attrs[""some_attr""] = [lons, lats] In [27]: cond = a > 5 In [28]: c = a.where(cond, drop=True) ... 
File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/dask/utils.py:1982, in _HashIdWrapper.__hash__(self) 1981 def __hash__(self): -> 1982 return id(self.wrapped) RecursionError: maximum recursion depth exceeded while calling a Python object ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100 https://github.com/pydata/xarray/issues/7111#issuecomment-1263927640,https://api.github.com/repos/pydata/xarray/issues/7111,1263927640,IC_kwDOAMm_X85LVgFY,1828519,2022-09-30T19:14:48Z,2022-09-30T19:14:48Z,CONTRIBUTOR,CC @headtr1ck any idea if this is supposed to work with your new #7089?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1392878100