
issue_comments


7 rows where user = 21131639 (lukasbindreiter), sorted by updated_at descending

Comment 1276290679 · lukasbindreiter (21131639) · 2022-10-12T14:38:22Z · CONTRIBUTOR
Issue: Update open_dataset backend to ensure compatibility with new explicit index model (1403144601)
https://github.com/pydata/xarray/pull/7150#issuecomment-1276290679

> Could you also update whats_new.rst with your contribution please?

Done 👍

Comment 1274605144 · lukasbindreiter (21131639) · 2022-10-11T12:25:57Z · CONTRIBUTOR
Issue: Update open_dataset backend to ensure compatibility with new explicit index model (1403144601)
https://github.com/pydata/xarray/pull/7150#issuecomment-1274605144

> Would it be possible to add a test with a multi-index?

I have added such a test now; it fails with the old version of _protect_dataset_variables_inplace but passes with the updated version from this PR.
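
For reference, a minimal sketch of what such a test can look like (an assumption for illustration; the actual test lives in the PR, and the backend class here is hypothetical):

```python
import pandas as pd
import xarray as xr
from xarray.backends import BackendEntrypoint


class MultiindexBackend(BackendEntrypoint):
    """Hypothetical backend returning a dataset that contains a MultiIndex."""

    def open_dataset(self, filename_or_obj, drop_variables=None, **kwargs):
        midx = pd.MultiIndex.from_product(
            [["a", "b"], [0, 1]], names=("one", "two")
        )
        return xr.Dataset({"data": ("x", [1, 2, 3, 4])}, coords={"x": midx})


def test_open_dataset_multiindex() -> None:
    # Crashed in _protect_dataset_variables_inplace before the fix, because
    # .data was reassigned on the IndexVariable backing the MultiIndex.
    ds = xr.open_dataset("dummy", engine=MultiindexBackend)
    assert isinstance(ds.indexes["x"], pd.MultiIndex)
```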

Comment 1273284963 · lukasbindreiter (21131639) · 2022-10-10T13:04:30Z · CONTRIBUTOR
Issue: xarray.open_dataset has issues if the dataset returned by the backend contains a multiindex (1400949778)
https://github.com/pydata/xarray/issues/7139#issuecomment-1273284963

Based on your suggestion above I tried a single-line fix, which resolved my issue: https://github.com/pydata/xarray/pull/7150

However, I'm not sure whether this is the correct approach, since I'm not all that deeply familiar with the indexing model.
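
For context, a hedged sketch of the idea behind the one-line change (my paraphrase, not the exact diff; see the PR above for the real code):

```python
# Sketch of xarray's internal _protect_dataset_variables_inplace with the
# guard changed so that variables backing an index are skipped entirely,
# meaning .data is never reassigned on an IndexVariable.
from xarray.core import indexing


def _protect_dataset_variables_inplace(dataset, cache):
    for name, variable in dataset.variables.items():
        if name not in dataset._indexes:  # previously: name not in variable.dims
            # Only wrap plain data variables and non-index coordinates.
            data = indexing.CopyOnWriteArray(variable._data)
            if cache:
                data = indexing.MemoryCachedArray(data)
            variable.data = data
```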

Comment 1272922855 · lukasbindreiter (21131639) · 2022-10-10T07:59:01Z · CONTRIBUTOR
Issue: xarray.open_dataset has issues if the dataset returned by the backend contains a multiindex (1400949778)
https://github.com/pydata/xarray/issues/7139#issuecomment-1272922855

Here is the full stacktrace:

```python
ValueError                                Traceback (most recent call last)
Cell In [12], line 7
----> 7 loaded = xr.open_dataset("multiindex.nc", engine="netcdf4-multiindex", handle_multiindex=True)
      8 print(loaded)

File ~/.local/share/virtualenvs/test-oePfdNug/lib/python3.8/site-packages/xarray/backends/api.py:537, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, backend_kwargs, **kwargs)
    530 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
    531 backend_ds = backend.open_dataset(
    532     filename_or_obj,
    533     drop_variables=drop_variables,
    534     **decoders,
    535     **kwargs,
    536 )
--> 537 ds = _dataset_from_backend_dataset(
    538     backend_ds,
    539     filename_or_obj,
    540     engine,
    541     chunks,
    542     cache,
    543     overwrite_encoded_chunks,
    544     inline_array,
    545     drop_variables=drop_variables,
    546     **decoders,
    547     **kwargs,
    548 )
    549 return ds

File ~/.local/share/virtualenvs/test-oePfdNug/lib/python3.8/site-packages/xarray/backends/api.py:345, in _dataset_from_backend_dataset(backend_ds, filename_or_obj, engine, chunks, cache, overwrite_encoded_chunks, inline_array, **extra_tokens)
    340 if not isinstance(chunks, (int, dict)) and chunks not in {None, "auto"}:
    341     raise ValueError(
    342         f"chunks must be an int, dict, 'auto', or None. Instead found {chunks}."
    343     )
--> 345 _protect_dataset_variables_inplace(backend_ds, cache)
    346 if chunks is None:
    347     ds = backend_ds

File ~/.local/share/virtualenvs/test-oePfdNug/lib/python3.8/site-packages/xarray/backends/api.py:239, in _protect_dataset_variables_inplace(dataset, cache)
    237 if cache:
    238     data = indexing.MemoryCachedArray(data)
--> 239 variable.data = data

File ~/.local/share/virtualenvs/test-oePfdNug/lib/python3.8/site-packages/xarray/core/variable.py:2795, in IndexVariable.data(self, data)
   2793 @Variable.data.setter  # type: ignore[attr-defined]
   2794 def data(self, data):
-> 2795     raise ValueError(
   2796         f"Cannot assign to the .data attribute of dimension coordinate a.k.a IndexVariable {self.name!r}. "
   2797         f"Please use DataArray.assign_coords, Dataset.assign_coords or Dataset.assign as appropriate."
   2798     )

ValueError: Cannot assign to the .data attribute of dimension coordinate a.k.a IndexVariable 'measurement'. Please use DataArray.assign_coords, Dataset.assign_coords or Dataset.assign as appropriate.
```

Comment 1236917784 · lukasbindreiter (21131639) · 2022-09-05T12:07:13Z · CONTRIBUTOR
Issue: MultiIndex listed multiple times in Dataset.indexes property (1293460108)
https://github.com/pydata/xarray/issues/6752#issuecomment-1236917784

Thanks for the suggestions, I'll look into Indexes.group_by_index and see whether it can resolve our issue.

As for the (de)serialization: I haven't yet investigated how the index changes in 2022.6 affect our original use case; an approach like the suggested custom backend might even be a better solution for us. Ideally, though, we would probably need a way to override NetCDF4BackendEntrypoint with another entrypoint as the default for .nc files.

As for the original issue discussed here: it can probably be closed, since the behaviour was an intentional change. But finding information about those changes was not easy. Is there some resource where I can read up on the changes to indexes and the functions related to them? (For example, I was unaware of the existence of xindexes or Indexes.group_by_index; a short sketch of these accessors follows below.)
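
For later readers, a minimal sketch of the two accessors mentioned above (the dataset construction is illustrative, not the one from this issue):

```python
import pandas as pd
import xarray as xr

# Illustrative dataset with one MultiIndex over dimension "x".
midx = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("one", "two"))
ds = xr.Dataset({"data": ("x", [1, 2, 3, 4])}, coords={"x": midx})

# .xindexes maps each coordinate name ("x", "one", "two") to its xarray
# index object; get_unique() collapses the shared MultiIndex to one entry.
print(ds.xindexes.get_unique())

# group_by_index() pairs each unique index with its coordinate variables.
for index, coord_vars in ds.xindexes.group_by_index():
    print(type(index).__name__, list(coord_vars))
```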

Comment 1236717180 · lukasbindreiter (21131639) · 2022-09-05T08:51:24Z · CONTRIBUTOR
Issue: MultiIndex listed multiple times in Dataset.indexes property (1293460108)
https://github.com/pydata/xarray/issues/6752#issuecomment-1236717180

@benbovy I also just tested the get_unique() method that you mentioned, and I noticed what may be a related issue; I'm not sure whether it is intended/expected.

Taking the above dataset ds, calling this method via .indexes results in an error:

```python
>>> ds.indexes.get_unique()
TypeError: unhashable type: 'MultiIndex'
```

However, for xindexes it works:

```python
>>> ds.xindexes.get_unique()
[<xarray.core.indexes.PandasMultiIndex at 0x7f105bf1df20>]
```
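
My reading of the difference (an assumption, not confirmed in this thread): .indexes returns pandas Index objects, and a pandas MultiIndex is deliberately not hashable, so any hash-based deduplication inside get_unique() fails, while the xarray PandasMultiIndex wrappers returned by .xindexes are ordinary hashable objects. A tiny demonstration of the failure mode:

```python
import pandas as pd

# Putting a pandas MultiIndex in a set requires hashing it, which pandas
# forbids; this reproduces the error message shown above.
midx = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("one", "two"))
try:
    {midx}
except TypeError as err:
    print(err)  # unhashable type: 'MultiIndex'
```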

Comment 1236714136 · lukasbindreiter (21131639) · 2022-09-05T08:48:23Z · CONTRIBUTOR
Issue: MultiIndex listed multiple times in Dataset.indexes property (1293460108)
https://github.com/pydata/xarray/issues/6752#issuecomment-1236714136

We used the .indexes property to implement a workaround for serializing Datasets containing multi-indices to netCDF (inspired by and related to this issue: https://github.com/pydata/xarray/issues/1077). The implementation basically looked like this (a rough sketch follows the steps below):

Saving the dataset as netCDF:

1. Loop over dataset.indexes
2. Check whether an index is a multi-index
3. If so, encode it somehow and save it as an attribute in the dataset
4. Reset (remove) the index
5. Save this "patched" dataset as netCDF

And then loading it again:

1. Load the dataset
2. Check whether the special attribute exists
3. If so, decode the multi-index and set it as an index on the dataset
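
A hedged sketch of this encode/decode round trip (function names and the attribute encoding are illustrative, not our production code):

```python
import json

import xarray as xr


def encode_multiindex(ds: xr.Dataset, dim: str) -> xr.Dataset:
    # Saving steps 1-4: record the MultiIndex level names in an attribute,
    # then reset (remove) the index so the dataset is netCDF-safe; the
    # caller then does ds.to_netcdf(...) as step 5.
    levels = list(ds.indexes[dim].names)
    ds = ds.reset_index(dim)
    return ds.assign_attrs({f"multiindex_{dim}": json.dumps(levels)})


def decode_multiindex(ds: xr.Dataset, dim: str) -> xr.Dataset:
    # Loading steps 2-3: if the marker attribute exists, rebuild the index
    # from the stored level names.
    encoded = ds.attrs.get(f"multiindex_{dim}")
    if encoded is not None:
        ds = ds.set_index({dim: json.loads(encoded)})
    return ds
```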

When testing the pre-release version I noticed some of our tests failing, which is why I raised this issue in the first place, in case those changes were unintended. I was not aware that you were actively working on multi-index changes and therefore expecting API changes here. With that in mind, I should be able to adapt our code to the new API of indexes and xindexes.


Table schema:
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
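For completeness, a hypothetical Python reconstruction of the query behind this page ("7 rows where user = 21131639 sorted by updated_at descending"); the database filename is an assumption:

```python
import sqlite3

# Connect to the SQLite database behind this Datasette instance
# (filename assumed for illustration).
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, issue_url, created_at, updated_at, author_association, body
    FROM issue_comments
    WHERE [user] = ?
    ORDER BY updated_at DESC
    """,
    (21131639,),
).fetchall()
print(len(rows))  # 7
```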