issue_comments

2 rows where issue = 1422460071 and user = 64480652 sorted by updated_at descending


Comment 1295785782 · airton-neto (user 64480652) · created 2022-10-29T09:39:35Z · updated 2022-10-29T09:39:35Z · author_association: NONE
https://github.com/pydata/xarray/issues/7211#issuecomment-1295785782

I see; the user would be responsible for managing what they are doing.

I think this answer closes this issue, then, as this behaviour is indeed expected.

Thanks for your help. Nice work!

Reactions: none · Issue: Incorrect handle to Dagster frozenlists in Dataset object (1422460071)

Comment 1295372811 · airton-neto (user 64480652) · created 2022-10-28T19:26:45Z · updated 2022-10-28T19:41:36Z · author_association: NONE
https://github.com/pydata/xarray/issues/7211#issuecomment-1295372811

Hey @TomNicholas, thanks for replying!

The problem is that they chose to make a list hashable (which it normally is not) by hashing it as a tuple:

```python
class frozenlist(list):
    def __readonly__(self, *args, **kwargs):
        raise RuntimeError("Cannot modify ReadOnlyList")

    # https://docs.python.org/3/library/pickle.html#object.__reduce__
    #
    # Like frozendict, implement __reduce__ and __setstate__ to handle pickling.
    # Otherwise, __setstate__ will be called to restore the frozenlist, causing
    # a RuntimeError because frozenlist is not mutable.
    def __reduce__(self):
        return (frozenlist, (), list(self))

    def __setstate__(self, state):
        self.__init__(state)

    __setitem__ = __readonly__  # type: ignore[assignment]
    __delitem__ = __readonly__
    append = __readonly__
    clear = __readonly__
    extend = __readonly__
    insert = __readonly__
    pop = __readonly__
    remove = __readonly__
    reverse = __readonly__
    sort = __readonly__  # type: ignore[assignment]

    def __hash__(self):
        return hash(tuple(self))
```

The `Dataset` class defines `__getitem__` as:

```python
def __getitem__(
    self: T_Dataset, key: Mapping[Any, Any] | Hashable | Iterable[Hashable]
) -> T_Dataset | DataArray:
    """Access variables or coordinates of this dataset as a
    :py:class:`~xarray.DataArray` or a subset of variables or an indexed dataset.

    Indexing with a list of names will return a new ``Dataset`` object.
    """
    if utils.is_dict_like(key):
        return self.isel(**key)
    if utils.hashable(key):
        return self._construct_dataarray(key)
    if utils.iterable_of_hashable(key):
        return self._copy_listed(key)
    raise ValueError(f"Unsupported key-type {type(key)}")
```

It tries to hash the key, and this frozenlist hashes without any error, so the single-variable branch is taken instead of the list branch.
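
The failure mode can be demonstrated in isolation. This is a minimal sketch, assuming only the `__hash__` part of Dagster's frozenlist matters (the rest of the class is omitted):

```python
from collections.abc import Hashable

# Standalone reimplementation of only the relevant part of Dagster's
# frozenlist: it hashes like the equivalent tuple.
class frozenlist(list):
    def __hash__(self):
        return hash(tuple(self))

fl = frozenlist(["a"])

# hash() succeeds, matching the tuple's hash...
print(hash(fl) == hash(("a",)))     # True

# ...so any "is this key hashable?" check now classifies the list
# as a single hashable key, unlike a plain list.
print(isinstance(fl, Hashable))     # True
print(isinstance(["a"], Hashable))  # False
```

This is why a check along the lines of `utils.hashable(key)` sends a frozenlist down the single-variable path.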

I've checked how the pandas library deals with this. They have a much more complex structure for this problem (I haven't read all of the code below in full):

```python
def __getitem__(self, key):
    check_deprecated_indexers(key)
    key = lib.item_from_zerodim(key)
    key = com.apply_if_callable(key, self)

    if is_hashable(key) and not is_iterator(key):
        # is_iterator to exclude generator e.g. test_getitem_listlike
        # shortcut if the key is in columns
        is_mi = isinstance(self.columns, MultiIndex)
        # GH#45316 Return view if key is not duplicated
        # Only use drop_duplicates with duplicates for performance
        if not is_mi and (
            self.columns.is_unique
            and key in self.columns
            or key in self.columns.drop_duplicates(keep=False)
        ):
            return self._get_item_cache(key)

        elif is_mi and self.columns.is_unique and key in self.columns:
            return self._getitem_multilevel(key)
    # Do we have a slicer (on rows)?
    indexer = convert_to_index_sliceable(self, key)
    if indexer is not None:
        if isinstance(indexer, np.ndarray):
            indexer = lib.maybe_indices_to_slice(
                indexer.astype(np.intp, copy=False), len(self)
            )
            if isinstance(indexer, np.ndarray):
                # GH#43223 If we can not convert, use take
                return self.take(indexer, axis=0)
        # either we have a slice or we have a string that can be converted
        #  to a slice for partial-string date indexing
        return self._slice(indexer, axis=0)

    # Do we have a (boolean) DataFrame?
    if isinstance(key, DataFrame):
        return self.where(key)

    # Do we have a (boolean) 1d indexer?
    if com.is_bool_indexer(key):
        return self._getitem_bool_array(key)

    # We are left with two options: a single key, and a collection of keys,
    # We interpret tuples as collections only for non-MultiIndex
    is_single_key = isinstance(key, tuple) or not is_list_like(key)

    if is_single_key:
        if self.columns.nlevels > 1:
            return self._getitem_multilevel(key)
        indexer = self.columns.get_loc(key)
        if is_integer(indexer):
            indexer = [indexer]
    else:
        if is_iterator(key):
            key = list(key)
        indexer = self.columns._get_indexer_strict(key, "columns")[1]

    # take() does not accept boolean indexers
    if getattr(indexer, "dtype", None) == bool:
        indexer = np.where(indexer)[0]

    data = self._take_with_is_copy(indexer, axis=1)

    if is_single_key:
        # What does looking for a single key in a non-unique index return?
        # The behavior is inconsistent. It returns a Series, except when
        # - the key itself is repeated (test on data.shape, #9519), or
        # - we have a MultiIndex on columns (test on self.columns, #21309)
        if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):
            # GH#26490 using data[key] can cause RecursionError
            return data._get_item_cache(key)

    return data

```
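The key difference, as far as I can tell, is that pandas does not dispatch on hashability alone: a hashable key must also actually be present among the column labels before it is treated as a single column. A simplified standalone sketch of that ordering (the function names here are illustrative, not pandas internals):

```python
class frozenlist(list):
    def __hash__(self):
        return hash(tuple(self))

def classify(key, columns):
    """Toy version of the pandas-style dispatch order."""
    def is_hashable(obj):
        try:
            hash(obj)
        except TypeError:
            return False
        return True

    # pandas-style: hashable AND actually one of the column labels
    if is_hashable(key) and key in columns:
        return "single column"
    # otherwise list-likes select a subset of columns
    if isinstance(key, list):
        return "subset of columns"
    return "other"

print(classify("a", ["a", "b"]))                  # "single column"
print(classify(frozenlist(["a"]), ["a", "b"]))    # "subset of columns"
print(classify(["a"], ["a", "b"]))                # "subset of columns"
```

A hashable frozenlist is never equal to any column label, so the membership test fails and it falls through to the list path, which is why the pandas snippet above handles it gracefully.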

This similar snippet, by the way, runs correctly:

```python
import pandas as pd

data = pd.DataFrame({'a': [1], 'b': [2]})
variables = frozenlist(['a'])
data[variables]
```

Now it is a matter of how xarray wants to treat this issue. If you do want to change this behaviour, I could try implementing the pandas approach and open a PR.
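
To make the suggestion concrete, here is a hypothetical sketch of what such a change could look like; this is not actual xarray code, and `resolve_key`/`variables` are invented names for illustration only:

```python
class frozenlist(list):
    def __hash__(self):
        return hash(tuple(self))

# Hypothetical dispatch: check for list subclasses before hashability,
# so a hashable list still selects a subset of variables.
def resolve_key(key, variables):
    if isinstance(key, list):
        # list subclasses such as frozenlist land here even though
        # they define __hash__
        return {k: variables[k] for k in key}
    try:
        hash(key)
    except TypeError:
        raise TypeError(f"Unsupported key-type {type(key)}")
    return variables[key]

variables = {"a": 1, "b": 2}
print(resolve_key(frozenlist(["a"]), variables))  # {'a': 1}
print(resolve_key("a", variables))                # 1
```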

Anyway, I will report this issue back to Dagster so they can review this discussion too; maybe they will decide a change is worthwhile on their side.

Let me know your view on this!

Links:
- https://github.com/pydata/xarray/blob/e1936a98059ae29da2861f58a7aff4a56302aac1/xarray/core/dataset.py#L1419
- https://github.com/pandas-dev/pandas/blob/v1.5.1/pandas/core/frame.py#L473-L11983

Reactions: none · Issue: Incorrect handle to Dagster frozenlists in Dataset object (1422460071)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
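The row selection described at the top of the page ("2 rows where issue = 1422460071 and user = 64480652 sorted by updated_at descending") corresponds to a simple query against this schema. A minimal sketch using Python's built-in sqlite3, with the schema reduced to the columns the query touches:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [id] INTEGER PRIMARY KEY,
   [user] INTEGER,
   [issue] INTEGER,
   [updated_at] TEXT
);
""")
# The two comment rows shown on this page
conn.executemany(
    "INSERT INTO issue_comments VALUES (?, ?, ?, ?)",
    [
        (1295785782, 64480652, 1422460071, "2022-10-29T09:39:35Z"),
        (1295372811, 64480652, 1422460071, "2022-10-28T19:41:36Z"),
    ],
)
rows = conn.execute(
    "SELECT id FROM issue_comments "
    "WHERE issue = ? AND user = ? ORDER BY updated_at DESC",
    (1422460071, 64480652),
).fetchall()
print(rows)  # [(1295785782,), (1295372811,)]
```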
Powered by Datasette · About: xarray-datasette