html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7211#issuecomment-1295785782,https://api.github.com/repos/pydata/xarray/issues/7211,1295785782,IC_kwDOAMm_X85NPB82,64480652,2022-10-29T09:39:35Z,2022-10-29T09:39:35Z,NONE,"I see, so the user would be responsible for managing what they are doing.
I think this answer closes this issue, then, as this behaviour is indeed expected.
Thanks for your help. Nice work!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1422460071
https://github.com/pydata/xarray/issues/7211#issuecomment-1295779740,https://api.github.com/repos/pydata/xarray/issues/7211,1295779740,IC_kwDOAMm_X85NPAec,43316012,2022-10-29T09:14:31Z,2022-10-29T09:14:31Z,COLLABORATOR,"I think we wanted to explicitly support any hashable (e.g. tuples or, as in this case, frozenlists) as variable or dimension ""names"".","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1422460071
https://github.com/pydata/xarray/issues/7211#issuecomment-1295372811,https://api.github.com/repos/pydata/xarray/issues/7211,1295372811,IC_kwDOAMm_X85NNdIL,64480652,2022-10-28T19:26:45Z,2022-10-28T19:41:36Z,NONE,"Hey @TomNicholas, thanks for replying!
The problem is that they chose to make a list hashable (which it is not by default) by converting it to a tuple in its `__hash__` method:
```python
class frozenlist(list):
    def __readonly__(self, *args, **kwargs):
        raise RuntimeError(""Cannot modify ReadOnlyList"")

    # https://docs.python.org/3/library/pickle.html#object.__reduce__
    #
    # Like frozendict, implement __reduce__ and __setstate__ to handle pickling.
    # Otherwise, __setstate__ will be called to restore the frozenlist, causing
    # a RuntimeError because frozenlist is not mutable.
    def __reduce__(self):
        return (frozenlist, (), list(self))

    def __setstate__(self, state):
        self.__init__(state)

    __setitem__ = __readonly__  # type: ignore[assignment]
    __delitem__ = __readonly__
    append = __readonly__
    clear = __readonly__
    extend = __readonly__
    insert = __readonly__
    pop = __readonly__
    remove = __readonly__
    reverse = __readonly__
    sort = __readonly__  # type: ignore[assignment]

    def __hash__(self):
        return hash(tuple(self))
```
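Because `__hash__` is defined, hashing a `frozenlist` succeeds, which is exactly what trips up a try-hash style check. A minimal illustration (assuming the `frozenlist` class above is in scope):
```python
fl = frozenlist(['a'])

# __hash__ delegates to tuple(self), so hashing succeeds...
print(hash(fl) == hash(('a',)))   # True

# ...while a plain list raises TypeError and would be rejected by any
# try-hash style check
try:
    hash(['a'])
except TypeError as err:
    print(err)
```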
The `Dataset` class defines `__getitem__` as follows:
```python
def __getitem__(
    self: T_Dataset, key: Mapping[Any, Any] | Hashable | Iterable[Hashable]
) -> T_Dataset | DataArray:
    """"""Access variables or coordinates of this dataset as a
    :py:class:`~xarray.DataArray` or a subset of variables or a indexed dataset.

    Indexing with a list of names will return a new ``Dataset`` object.
    """"""
    if utils.is_dict_like(key):
        return self.isel(**key)
    if utils.hashable(key):
        return self._construct_dataarray(key)
    if utils.iterable_of_hashable(key):
        return self._copy_listed(key)
    raise ValueError(f""Unsupported key-type {type(key)}"")
```
It tries to hash the key (and this `frozenlist` does so without any error), so the hashable branch wins, as the sketch below illustrates.
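A rough standalone sketch of that dispatch order (my simplification using `collections.abc`, not the actual xarray helpers; it assumes the `frozenlist` class above is in scope):
```python
from collections.abc import Hashable, Iterable

def classify_key(key):
    # mirrors the order of the checks in Dataset.__getitem__ above:
    # the hashable branch is tested before the iterable-of-hashables branch
    if isinstance(key, Hashable):
        return 'single variable (_construct_dataarray)'
    if isinstance(key, Iterable) and all(isinstance(k, Hashable) for k in key):
        return 'list of variables (_copy_listed)'
    raise ValueError(f'Unsupported key-type {type(key)}')

print(classify_key('a'))                 # single variable
print(classify_key(['a', 'b']))          # list of variables
print(classify_key(frozenlist(['a'])))   # single variable -- the surprising case
```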
I've checked how the pandas library deals with the same task. It has a much more complex structure for this problem (I haven't read through all of the code below in detail):
```python
def __getitem__(self, key):
    check_deprecated_indexers(key)
    key = lib.item_from_zerodim(key)
    key = com.apply_if_callable(key, self)

    if is_hashable(key) and not is_iterator(key):
        # is_iterator to exclude generator e.g. test_getitem_listlike
        # shortcut if the key is in columns
        is_mi = isinstance(self.columns, MultiIndex)
        # GH#45316 Return view if key is not duplicated
        # Only use drop_duplicates with duplicates for performance
        if not is_mi and (
            self.columns.is_unique
            and key in self.columns
            or key in self.columns.drop_duplicates(keep=False)
        ):
            return self._get_item_cache(key)

        elif is_mi and self.columns.is_unique and key in self.columns:
            return self._getitem_multilevel(key)

    # Do we have a slicer (on rows)?
    indexer = convert_to_index_sliceable(self, key)
    if indexer is not None:
        if isinstance(indexer, np.ndarray):
            indexer = lib.maybe_indices_to_slice(
                indexer.astype(np.intp, copy=False), len(self)
            )
            if isinstance(indexer, np.ndarray):
                # GH#43223 If we can not convert, use take
                return self.take(indexer, axis=0)
        # either we have a slice or we have a string that can be converted
        # to a slice for partial-string date indexing
        return self._slice(indexer, axis=0)

    # Do we have a (boolean) DataFrame?
    if isinstance(key, DataFrame):
        return self.where(key)

    # Do we have a (boolean) 1d indexer?
    if com.is_bool_indexer(key):
        return self._getitem_bool_array(key)

    # We are left with two options: a single key, and a collection of keys,
    # We interpret tuples as collections only for non-MultiIndex
    is_single_key = isinstance(key, tuple) or not is_list_like(key)

    if is_single_key:
        if self.columns.nlevels > 1:
            return self._getitem_multilevel(key)
        indexer = self.columns.get_loc(key)
        if is_integer(indexer):
            indexer = [indexer]
    else:
        if is_iterator(key):
            key = list(key)
        indexer = self.columns._get_indexer_strict(key, ""columns"")[1]

    # take() does not accept boolean indexers
    if getattr(indexer, ""dtype"", None) == bool:
        indexer = np.where(indexer)[0]

    data = self._take_with_is_copy(indexer, axis=1)

    if is_single_key:
        # What does looking for a single key in a non-unique index return?
        # The behavior is inconsistent. It returns a Series, except when
        # - the key itself is repeated (test on data.shape, #9519), or
        # - we have a MultiIndex on columns (test on self.columns, #21309)
        if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):
            # GH#26490 using data[key] can cause RecursionError
            return data._get_item_cache(key)

    return data
```
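As far as I can tell from the snippet above (my reading, not verified any further), the crucial difference is the `key in self.columns` check: the hashable shortcut is only taken when the key really is a column label, and a hashable `frozenlist` is not, so it falls through to the list-of-labels path. A quick check (assuming the `frozenlist` class above is in scope):
```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2]})

# hashable, yet not equal to any column label, so the scalar shortcut is skipped
print(frozenlist(['a']) in df.columns)   # False
```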
The following similar code, by the way, runs correctly:
```python
import pandas as pd
from dagster._utils import frozenlist  # same frozenlist class as shown above

data = pd.DataFrame({'a': [1], 'b': [2]})
variables = frozenlist(['a'])
data[variables]
```
Now it is a matter of how xarray wants to treat this issue.
If you do want to change the current behaviour, I could try coding it and opening a PR that follows the pandas approach, something along the lines of the sketch below.
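A rough, untested sketch as a standalone helper (hypothetical names, not actual xarray code or an agreed design):
```python
import xarray as xr

def is_hashable(obj) -> bool:
    try:
        hash(obj)
    except TypeError:
        return False
    return True

def getitem_pandas_style(ds: xr.Dataset, key):
    # Only treat a hashable key as a single name when it really is one of the
    # dataset's names (or a string, to keep virtual variables like 'time.season'
    # working); otherwise coerce list-likes such as frozenlist to a plain list.
    if is_hashable(key) and (isinstance(key, str) or key in ds.variables):
        return ds[key]
    return ds[list(key)]
```
With a rule like this, a `frozenlist` key would end up in the list branch instead of being looked up as a single, non-existent variable name.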
Anyway, I will report this issue back to Dagster so they can check this discussion too; maybe they will decide their side is worth changing.
Let me know what your view is!
Links:
https://github.com/pydata/xarray/blob/e1936a98059ae29da2861f58a7aff4a56302aac1/xarray/core/dataset.py#L1419
https://github.com/pandas-dev/pandas/blob/v1.5.1/pandas/core/frame.py#L473-L11983","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1422460071
https://github.com/pydata/xarray/issues/7211#issuecomment-1293975333,https://api.github.com/repos/pydata/xarray/issues/7211,1293975333,IC_kwDOAMm_X85NIH8l,35968931,2022-10-27T19:33:55Z,2022-10-27T19:40:19Z,MEMBER,"Hi @airton-neto - here is a much shorter example that reproduces the same error
```python
import xarray as xr

url = 'https://rda.ucar.edu/thredds/dodsC/files/g/ds084.1/2022/20220201/gfs.0p25.2022020100.f000.grib2'
dataset = xr.open_dataset(url, cache=True, engine=""netcdf4"")
```
```python
from dagster._utils import frozenlist
# variables = list(['u-component_of_wind_height_above_ground']) # <--- This way it runs
variables = frozenlist(['u-component_of_wind_height_above_ground'])
dataset[variables]
```
```python
--------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/miniconda3/envs/py39/lib/python3.9/site-packages/xarray/core/dataset.py:1317, in Dataset._construct_dataarray(self, name)
1316 try:
-> 1317 variable = self._variables[name]
1318 except KeyError:
KeyError: ['u-component_of_wind_height_above_ground']
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
Input In [43], in ()
----> 1 dataset[variables]
File ~/miniconda3/envs/py39/lib/python3.9/site-packages/xarray/core/dataset.py:1410, in Dataset.__getitem__(self, key)
1408 return self.isel(**key)
1409 if utils.hashable(key):
-> 1410 return self._construct_dataarray(key)
1411 if utils.iterable_of_hashable(key):
1412 return self._copy_listed(key)
File ~/miniconda3/envs/py39/lib/python3.9/site-packages/xarray/core/dataset.py:1319, in Dataset._construct_dataarray(self, name)
1317 variable = self._variables[name]
1318 except KeyError:
-> 1319 _, name, variable = _get_virtual_variable(self._variables, name, self.dims)
1321 needed_dims = set(variable.dims)
1323 coords: dict[Hashable, Variable] = {}
File ~/miniconda3/envs/py39/lib/python3.9/site-packages/xarray/core/dataset.py:171, in _get_virtual_variable(variables, key, dim_sizes)
168 return key, key, variable
170 if not isinstance(key, str):
--> 171 raise KeyError(key)
173 split_key = key.split(""."", 1)
174 if len(split_key) != 2:
KeyError: ['u-component_of_wind_height_above_ground']
```
I'm not immediately sure why the elements of the list are not properly extracted, but the `frozenlist` class is a [dagster private internal](https://github.com/dagster-io/dagster/blob/688a5d895ff265fbfb90dfb1e2876caa81051ccc/python_modules/dagster/dagster/_utils/__init__.py#L223), and the problem is specific to how that object interacts with the current xarray code. As a downstream user of xarray, can dagster not just change its code to pass the type xarray expects (i.e. a normal `list`)?
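For example, a minimal user-side workaround (assuming `variables` is the `frozenlist` from the snippet above) is to coerce the key before indexing:
```python
dataset[list(variables)]
```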
Having said that, if you want to submit a PR which fixes this bug (and doesn't require special-casing dagster somehow), then that would be welcome too!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1422460071