issue_comments

4 rows where author_association = "CONTRIBUTOR" and issue = 341643235 sorted by updated_at descending

user 2

  • ttung 2
  • gimperiale 2

issue 1

  • Support non-string dimension/variable names · 4 ✖

author_association 1

  • CONTRIBUTOR · 4 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
490824656 https://github.com/pydata/xarray/issues/2292#issuecomment-490824656 https://api.github.com/repos/pydata/xarray/issues/2292 MDEyOklzc3VlQ29tbWVudDQ5MDgyNDY1Ng== gimperiale 47244312 2019-05-09T09:13:22Z 2019-05-09T09:13:22Z CONTRIBUTOR

A possible way out would be to open a PEP for "and" and "not" operators in the typing module. That way we could define a "variable-name-like" type and use it throughout the module:

xarray.utils:

    # AllOf and NoneOf are the proposed operators; they do not exist in typing today
    from typing import AllOf, Hashable, NoneOf

    VarName = AllOf[Hashable, NoneOf[None, tuple]]

Elsewhere:

    from typing import Sequence, Union
    from .utils import VarName

    def f(x: Union[VarName, Sequence[VarName], None]):
        if x is None:
            x = [DEFAULT]
        elif isinstance(x, VarName):
            x = [x]
        elif not isinstance(x, Sequence):
            raise TypeError('x: expected hashable or sequence of hashables')
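Until such operators exist, the same intent can only be checked at runtime. The following is a minimal sketch of that idea, not xarray code: `is_var_name`, `normalize`, and the `DEFAULT` value are hypothetical names chosen for illustration.

```python
from collections.abc import Hashable, Sequence
from typing import Any, List

DEFAULT = "default_dim"  # hypothetical placeholder for a function's default name


def is_var_name(x: Any) -> bool:
    """Runtime stand-in for the proposed VarName: hashable, but neither None nor a tuple."""
    return x is not None and not isinstance(x, tuple) and isinstance(x, Hashable)


def normalize(x: Any) -> List[Any]:
    """Turn None, a single name, or a sequence of names into a list of names."""
    if x is None:
        return [DEFAULT]
    if is_var_name(x):
        return [x]
    if isinstance(x, Sequence):
        # Tuples and lists land here; strings were already handled as single names.
        return list(x)
    raise TypeError("x: expected hashable or sequence of hashables")
```

A static checker cannot see through `is_var_name`, which is exactly the gap the proposed `AllOf`/`NoneOf` operators would close.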

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support non-string dimension/variable names 341643235
490821558 https://github.com/pydata/xarray/issues/2292#issuecomment-490821558 https://api.github.com/repos/pydata/xarray/issues/2292 MDEyOklzc3VlQ29tbWVudDQ5MDgyMTU1OA== gimperiale 47244312 2019-05-09T09:04:21Z 2019-05-09T09:05:48Z CONTRIBUTOR

There are problems with typing. I already mentioned them in #2929 but I'll summarize here.

The vast majority of xarray functions/methods allow for "string or sequence of strings, optional". When you move to "hashable or sequence of hashables, optional", however, you want to specifically avoid tuples, which are both Sequence and Hashable instances.

Most functions currently look like this:

    if isinstance(x, str):
        x = [x]
    elif x is None:
        x = [DEFAULT]
    for xi in x:
        ...

After the change they would become:

    if x is None:
        x = [DEFAULT]
    elif isinstance(x, Hashable) and not isinstance(x, tuple):
        x = [x]
    for xi in x:
        ...

Or:

    if x is None:
        x = [DEFAULT]
    elif isinstance(x, str) or not isinstance(x, Sequence):
        x = [x]
    for xi in x:
        ...

Note how I moved the test for None above. This matters, because isinstance(None, Hashable) returns True. This pattern is error-prone and expensive to maintain, and beginner contributors will very easily introduce bugs with it. Every test that currently runs three use cases (one for None, one for str, and one for a sequence of str) will have to be expanded to SIX test cases:

  1. str
  2. tuple (hashable sequence) of str
  3. list (non-hashable sequence) of str
  4. enum (non-str, non-sequence hashable)
  5. sequence of non-sortable hashables
  6. None
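The overlaps that drive these six cases can be checked directly with the standard library. This is a small illustrative sketch; the `Color` enum is a hypothetical stand-in for a non-str name.

```python
from collections.abc import Hashable, Sequence
from enum import Enum


class Color(Enum):  # hypothetical enum used as a non-str hashable name
    RED = 1


checks = {
    "tuple is Hashable": isinstance(("a", "b"), Hashable),
    "tuple is Sequence": isinstance(("a", "b"), Sequence),
    "list is Hashable": isinstance(["a", "b"], Hashable),
    "str is Sequence": isinstance("ab", Sequence),
    "None is Hashable": isinstance(None, Hashable),
    "enum is Hashable": isinstance(Color.RED, Hashable),
    "enum is Sequence": isinstance(Color.RED, Sequence),
}
for label, result in checks.items():
    print(f"{label}: {result}")

# Mixed hashables define no ordering between them, so sorting fails.
try:
    sorted([Color.RED, "a"])
    sortable = True
except TypeError:
    sortable = False
print(f"mixed names are sortable: {sortable}")
```

Only the tuple satisfies both `Hashable` and `Sequence`, which is why it needs special handling, and the failed `sorted` call is the reason case 5 exists.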

One way to mitigate this would be to have a helper function, invoked everywhere in the codebase, and then religiously make sure that the helper function is always used.

    from typing import Hashable, Sequence, Union

    # Sentinel: distinguishes "no default given" from an explicit default
    _no_default = [object()]

    def ensure_sequence(name: str, x: Union[Hashable, Sequence[Hashable]],
                        default: Sequence[Hashable] = _no_default) -> Sequence[Hashable]:
        if x is None:
            if default is _no_default:
                raise ValueError(name + ' must be explicitly defined')
            return default
        if isinstance(x, Sequence) and not isinstance(x, str):
            return x
        if isinstance(x, Hashable):
            return [x]
        raise TypeError(name + ' must be a Hashable or Sequence of Hashable')

You would still be forced to implement the test for non-sortable hashables, though.
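As a rough check, the helper can be run against the six cases listed earlier. The sketch below restates `ensure_sequence` (without annotations) so that it runs on its own; the `Dim` enum and the `"x"` default are hypothetical values picked for the example.

```python
from collections.abc import Hashable, Sequence
from enum import Enum

_no_default = [object()]


def ensure_sequence(name, x, default=_no_default):
    # Restated from the comment above so that this block is self-contained.
    if x is None:
        if default is _no_default:
            raise ValueError(name + ' must be explicitly defined')
        return default
    if isinstance(x, Sequence) and not isinstance(x, str):
        return x
    if isinstance(x, Hashable):
        return [x]
    raise TypeError(name + ' must be a Hashable or Sequence of Hashable')


class Dim(Enum):  # hypothetical enum standing in for a non-str name
    X = 1
    Y = 2


cases = {
    "str": "x",
    "tuple of str": ("x", "y"),
    "list of str": ["x", "y"],
    "enum": Dim.X,
    "non-sortable hashables": [Dim.X, "y"],
    "None": None,
}
results = {
    label: ensure_sequence("dim", value, default=["x"])
    for label, value in cases.items()
}
for label, value in results.items():
    print(f"{label}: {value!r}")
```

Because the `Sequence` test runs before the `Hashable` test, a tuple is always treated as a sequence of names and never as a single name.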


A completely separate problem with typing is that I expect a huge number of xarray users to just assume variable names and dims are always strings. They'll have things like

    for k, v in ds.data_vars.items():
        if k.startswith('foo'):
            ...

or

    [dim for dim in da.dims if "foo" in dim]

The above will fill the mypy output with errors as soon as xarray becomes integrated in mypy (#2929), and the user will have to go through a lot of explicitly forcing dims and variable names to str, even if in their project all dims and variable names are always str.
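To illustrate what "explicitly forcing to str" could look like in user code, here is a sketch that assumes dims are annotated as a tuple of `Hashable`. The function `dims_containing` and the sample tuple are hypothetical; no xarray API is used.

```python
from collections.abc import Hashable
from typing import List, Tuple


def dims_containing(dims: Tuple[Hashable, ...], fragment: str) -> List[str]:
    # isinstance() narrows each name from Hashable to str, so the
    # substring test is valid for a type checker and safe at runtime.
    return [dim for dim in dims if isinstance(dim, str) and fragment in dim]


dims: Tuple[Hashable, ...] = ("foo_x", "bar", 0)
matching = dims_containing(dims, "foo")
print(matching)
```

Every such call site gains an `isinstance` guard or a cast, which is the overhead described above for projects that only ever use string names.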


The final problem is that integers are Hashables, and there's a wealth of cases in xarray where there is special logic that dynamically treats ints as positional indices.
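A toy resolver makes the ambiguity concrete. This is an assumption-laden sketch, not xarray's actual lookup logic: `resolve_axis` and its `prefer_names` flag are hypothetical.

```python
from collections.abc import Hashable
from typing import Tuple


def resolve_axis(dims: Tuple[Hashable, ...], key: Hashable, prefer_names: bool) -> int:
    """Map a key to an axis number, treating ints as names or as positions."""
    if prefer_names and key in dims:
        return dims.index(key)
    if isinstance(key, int):
        # Legacy-style behaviour: an int is a positional index.
        return key
    return dims.index(key)


dims = ("x", 0)  # the second dimension is literally named 0
as_name = resolve_axis(dims, 0, prefer_names=True)
as_position = resolve_axis(dims, 0, prefer_names=False)
print(as_name, as_position)
```

The same call yields two different axes depending on the rule chosen, so every place with int-specific logic would need an explicit decision.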

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support non-string dimension/variable names 341643235
410008899 https://github.com/pydata/xarray/issues/2292#issuecomment-410008899 https://api.github.com/repos/pydata/xarray/issues/2292 MDEyOklzc3VlQ29tbWVudDQxMDAwODg5OQ== ttung 280924 2018-08-02T17:38:52Z 2018-08-02T17:38:52Z CONTRIBUTOR

The problem with generic scalar types is that they wouldn't work after serialization and deserialization (which would presumably go to strings). My suggestion has the advantage that the base class can define an __eq__ method that matches either the object itself or its string equivalent, so that one could use the scalar type even after ser/deser. I disagree that base classes aren't very pythonic.
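A minimal sketch of that `__eq__` idea follows. It is an illustration only and is not the code in the proof-of-concept commit: this `DimensionBase`, its `_key` helper, and the `Axis` enum are assumptions made for the example.

```python
from enum import Enum


class DimensionBase:
    """Hypothetical base class: a name compares equal to its string form."""

    def _key(self) -> str:
        # Assumes subclasses expose a `value` attribute, as Enum members do.
        return str(self.value)

    def __eq__(self, other):
        if isinstance(other, DimensionBase):
            return self._key() == other._key()
        if isinstance(other, str):
            return self._key() == other
        return NotImplemented

    def __hash__(self):
        # Must match hash(str) so that dict lookups work with either form.
        return hash(self._key())


class Axis(DimensionBase, Enum):
    X = "abc"
    Y = "def"


sizes = {Axis.X: 4, Axis.Y: 3}
restored = "abc"  # what Axis.X might become after a round trip through strings
print(restored == Axis.X, sizes[restored])
```

Keeping `__hash__` aligned with the string's hash is what lets the deserialized string find the entry stored under the enum member.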

However, I think (1)/(2) are both reasonable solutions (in fact, they seem to be identical except for when you call str). It has its warts, as even a mutable sequence would pass muster. :)

If that's the direction you'd like to see the project go towards, I'd be happy to take a stab at it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support non-string dimension/variable names 341643235
409822333 https://github.com/pydata/xarray/issues/2292#issuecomment-409822333 https://api.github.com/repos/pydata/xarray/issues/2292 MDEyOklzc3VlQ29tbWVudDQwOTgyMjMzMw== ttung 280924 2018-08-02T06:38:32Z 2018-08-02T06:38:32Z CONTRIBUTOR

We're using xarray in a project that is encouraging use of Python typing, and we too would like to use enums as data dimension names. How do you feel about using a base class that data dimension classes need to subclass?

Here's a really simple proof-of-concept (though not very thorough, as it would certainly fail serialization): https://github.com/ttung/xarray/commit/8e623ebebc8f5c1e5615e6d07a82451c0dbe763d

```
In [1]: import xarray as xr

In [2]: import numpy as np

In [5]: from enum import Enum

In [6]: class A(xr.core.dataarray.DimensionBase, Enum):
   ...:     X = "abc"
   ...:     Y = "def"
   ...:     Z = "ghi"
   ...:

In [7]: a = xr.DataArray(np.random.randint(0, 255, size=(4, 3, 5)), dims=[A.X, A.Y, A.Z])

In [8]: a[A.X]
Out[8]:
<xarray.DataArray <A.X: 'abc'> (A.X: 4)>
array([0, 1, 2, 3])
Dimensions without coordinates: A.X

In [9]: a.max(A.X)
Out[9]:
<xarray.DataArray (A.Y: 3, A.Z: 5)>
array([[254, 226, 181, 191, 233],
       [139, 195, 212, 167, 169],
       [191, 241, 199, 174, 208]])
Dimensions without coordinates: A.Y, A.Z
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support non-string dimension/variable names 341643235

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
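
The filter and sort shown at the top of this page can be reproduced against this schema. The sketch below uses Python's `sqlite3` with an in-memory database and a reduced copy of the table (only the columns the query touches, which is an assumption made for brevity); the ids and timestamps are the four rows listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced copy of the schema: only the columns used by the filter and sort.
conn.execute(
    """
    CREATE TABLE [issue_comments] (
       [id] INTEGER PRIMARY KEY,
       [updated_at] TEXT,
       [author_association] TEXT,
       [issue] INTEGER
    )
    """
)
conn.execute(
    "CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue])"
)

rows = [
    (409822333, "2018-08-02T06:38:32Z", "CONTRIBUTOR", 341643235),
    (410008899, "2018-08-02T17:38:52Z", "CONTRIBUTOR", 341643235),
    (490821558, "2019-05-09T09:05:48Z", "CONTRIBUTOR", 341643235),
    (490824656, "2019-05-09T09:13:22Z", "CONTRIBUTOR", 341643235),
]
conn.executemany("INSERT INTO issue_comments VALUES (?, ?, ?, ?)", rows)

query = """
    SELECT id, updated_at
    FROM issue_comments
    WHERE author_association = :assoc AND issue = :issue
    ORDER BY updated_at DESC
"""
result = conn.execute(
    query, {"assoc": "CONTRIBUTOR", "issue": 341643235}
).fetchall()
for comment_id, updated_at in result:
    print(comment_id, updated_at)
```

Sorting the TEXT column works here because ISO 8601 timestamps in the same timezone order lexicographically.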
Powered by Datasette · Queries took 230.592ms · About: xarray-datasette