home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

9 rows where author_association = "MEMBER" and issue = 341643235 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 2

  • shoyer 8
  • crusaderky 1

issue 1

  • Support non-string dimension/variable names · 9 ✖

author_association 1

  • MEMBER · 9 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
496435570 https://github.com/pydata/xarray/issues/2292#issuecomment-496435570 https://api.github.com/repos/pydata/xarray/issues/2292 MDEyOklzc3VlQ29tbWVudDQ5NjQzNTU3MA== crusaderky 6213168 2019-05-28T09:18:52Z 2019-05-28T09:18:52Z MEMBER

@shoyer the biggest problem I see with your suggestion is that, for DataArrays, you'd likewise need to write xarray.DataArray[str, np.ndarray], except that str in this case refers to the coords alone, which I think novice users may find confusing as they won't mentally associate a DataArray to a dict-like - even if you can write da[coord name].

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support non-string dimension/variable names 341643235
496064041 https://github.com/pydata/xarray/issues/2292#issuecomment-496064041 https://api.github.com/repos/pydata/xarray/issues/2292 MDEyOklzc3VlQ29tbWVudDQ5NjA2NDA0MQ== shoyer 1217238 2019-05-27T03:15:53Z 2019-05-27T03:15:53Z MEMBER

From a typing perspective, what if xarray.Dataset was a generic type? Then you could write something like xarray.Dataset[str, np.ndarray] to get a Dataset specialized to string keys and numpy arrays in the .data attribute of its constituent data.

I don't think we need to change the signature of xarray functions to support "hashable or sequence of hashable". It's OK if convenience features (like support for passing in only a single argument as a string) don't work in all cases. I agree that it would be a good idea to use a centralized helper function for this, though.

It is unfortunate that there doesn't seem to be a good way to distinguish between "string" and "non-string sequence of strings" in Python's typing system. But I don't know a good way to solve this problem. Certainly the folks who work on typing in Python are aware of this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support non-string dimension/variable names 341643235
410121888 https://github.com/pydata/xarray/issues/2292#issuecomment-410121888 https://api.github.com/repos/pydata/xarray/issues/2292 MDEyOklzc3VlQ29tbWVudDQxMDEyMTg4OA== shoyer 1217238 2018-08-03T02:04:33Z 2018-08-03T02:04:33Z MEMBER

I disagree that base classes aren't very pythonic.

I should have said that required base classes don't feel very Pythonic. I'm not opposed to base classes in principle, and I'm definitely sympathetic to a desire to use static typing. See also https://github.com/pydata/xarray/issues/1900 for related discussion.

One consideration is what the advantages are of using enums over "dummy enums" like: class A: X = 'X' Y = 'Y' Z = 'Z' (i.e., constants in a namespace)

You can still refer to these programmatically like A.X, but I guess the string repr is different. On the plus side, "dummy enums" will serialize/deserialize perfectly to strings (because they are indeed strings).

I don't love the sound of names that deserialize to different types than their inputs. That seems very error prone, even if you do your best to overload all the special methods like __eq__.

What does seems like potentially a better idea to me would be a library with dedicated loader functions that "destringify" names by turning them back into enum objects.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support non-string dimension/variable names 341643235
409829431 https://github.com/pydata/xarray/issues/2292#issuecomment-409829431 https://api.github.com/repos/pydata/xarray/issues/2292 MDEyOklzc3VlQ29tbWVudDQwOTgyOTQzMQ== shoyer 1217238 2018-08-02T07:12:17Z 2018-08-02T07:12:17Z MEMBER

Most of the places in the code where we do the isinstance(obj, string) checks are where we allow passing in a single string as a convenient shortcut to a list of names. So I'm not sure it's really essential to allow flexible types there.

That said, checking explicitly for strings wasn't take a careful API choice. I could see a case for replacing all these "as sequence" casts with something more generic, e.g., based on checking explicitly for more generic scalar types. Certainly enums should be scalars.

If possible, I would rather stick to duck typing for any requirements we put on names. Base classes don't feel terribly Pythonic.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support non-string dimension/variable names 341643235
406639686 https://github.com/pydata/xarray/issues/2292#issuecomment-406639686 https://api.github.com/repos/pydata/xarray/issues/2292 MDEyOklzc3VlQ29tbWVudDQwNjYzOTY4Ng== shoyer 1217238 2018-07-20T15:41:45Z 2018-07-20T15:41:45Z MEMBER

Oops, I was reading a little too quickly. I did indeed mean (1) above. The one thing I would emphasize is that we don't actually want to check for something like hasattr(dim, '__str__') if possible, but rather just call str(dim). (Though I guess Python's object type defines a default __str__ method, so pretty much everything will pass that test.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support non-string dimension/variable names 341643235
406371878 https://github.com/pydata/xarray/issues/2292#issuecomment-406371878 https://api.github.com/repos/pydata/xarray/issues/2292 MDEyOklzc3VlQ29tbWVudDQwNjM3MTg3OA== shoyer 1217238 2018-07-19T18:29:11Z 2018-07-20T15:39:10Z MEMBER

~~(2)~~ (1) seems like a pretty decent option to me. It's compatible with Python's duck-typing philosophy, and we don't really need string variable/dimension names for anything other than various serialization formats like netCDF. So the full requirement for names would be "Hashable, can be coerced with str() and not None" (we use None as a sentinel value to indicate the lack of a name in xarray).

CC @pydata/xarray in case anyone else has opinions.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support non-string dimension/variable names 341643235
406345790 https://github.com/pydata/xarray/issues/2292#issuecomment-406345790 https://api.github.com/repos/pydata/xarray/issues/2292 MDEyOklzc3VlQ29tbWVudDQwNjM0NTc5MA== shoyer 1217238 2018-07-19T16:58:41Z 2018-07-19T16:58:41Z MEMBER

Another choice would be to intentionally simplify xarray's data model and not allow anything other than strings for variable/dimension names.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support non-string dimension/variable names 341643235
405396751 https://github.com/pydata/xarray/issues/2292#issuecomment-405396751 https://api.github.com/repos/pydata/xarray/issues/2292 MDEyOklzc3VlQ29tbWVudDQwNTM5Njc1MQ== shoyer 1217238 2018-07-16T22:07:18Z 2018-07-16T22:07:18Z MEMBER

It would be an improvement even to clearly document the requirements for dimension/variable names.

I suspect we don't actually need them to be sortable, though we do using sorting as part of the current repr() for some xarray objecs. This is mostly to ensure reproducible displays across multiple loads/runs of a file, but it's increasingly less relevant now than Python's dict is ordered by default (since Python 3.6).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support non-string dimension/variable names 341643235
405371789 https://github.com/pydata/xarray/issues/2292#issuecomment-405371789 https://api.github.com/repos/pydata/xarray/issues/2292 MDEyOklzc3VlQ29tbWVudDQwNTM3MTc4OQ== shoyer 1217238 2018-07-16T20:30:16Z 2018-07-16T20:30:16Z MEMBER

Hi @joshburkart -- thanks for raising this concern.

I agree, it would be nice to support enums (really any hashable value) as dimension names. Our current checks for strings are somewhat inconsistent, e.g., you can actually use these in an xarray.Dataset if you use the fully explicit API for constructing a dataset: python ds = xr.Dataset( data_vars={'foo': ((CoordId.LAT, CoordId.LON), np.arange(3 * 2).reshape(3, 2))}, coords={CoordId.LAT: ((CoordId.LAT,), [1, 2, 3]), CoordId.LON: ((CoordId.LON,), [7, 8])}, )

But now if you try to print the dataset, you get an error about sorting: ```python-traceback


TypeError Traceback (most recent call last) /usr/local/lib/python3.6/dist-packages/IPython/core/formatters.py in call(self, obj) 697 type_pprinters=self.type_printers, 698 deferred_pprinters=self.deferred_printers) --> 699 printer.pretty(obj) 700 printer.flush() 701 return stream.getvalue()

/usr/local/lib/python3.6/dist-packages/IPython/lib/pretty.py in pretty(self, obj) 396 if callable(meth): 397 return meth(obj, self, cycle) --> 398 return _default_pprint(obj, self, cycle) 399 finally: 400 self.end_group()

/usr/local/lib/python3.6/dist-packages/IPython/lib/pretty.py in default_pprint(obj, p, cycle) 516 if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs: 517 # A user-provided repr. Find newlines and replace them with p.break() --> 518 _repr_pprint(obj, p, cycle) 519 return 520 p.begin_group(1, '<')

/usr/local/lib/python3.6/dist-packages/IPython/lib/pretty.py in repr_pprint(obj, p, cycle) 707 """A pprint that just redirects to the normal repr function.""" 708 # Find newlines and replace them with p.break() --> 709 output = repr(obj) 710 for idx,output_line in enumerate(output.splitlines()): 711 if idx:

/usr/local/lib/python3.6/dist-packages/xarray/core/formatting.py in repr(self) 62 63 def repr(self): ---> 64 return ensure_valid_repr(self.unicode()) 65 66

/usr/local/lib/python3.6/dist-packages/xarray/core/dataset.py in unicode(self) 1188 1189 def unicode(self): -> 1190 return formatting.dataset_repr(self) 1191 1192 def info(self, buf=None):

/usr/local/lib/python3.6/dist-packages/xarray/core/formatting.py in dataset_repr(ds) 415 416 dims_start = pretty_print(u'Dimensions:', col_width) --> 417 summary.append(u'%s(%s)' % (dims_start, dim_summary(ds))) 418 419 if ds.coords:

/usr/local/lib/python3.6/dist-packages/xarray/core/formatting.py in dim_summary(obj) 324 325 def dim_summary(obj): --> 326 elements = [u'%s: %s' % (k, v) for k, v in obj.sizes.items()] 327 return u', '.join(elements) 328

/usr/local/lib/python3.6/dist-packages/xarray/core/formatting.py in <listcomp>(.0) 324 325 def dim_summary(obj): --> 326 elements = [u'%s: %s' % (k, v) for k, v in obj.sizes.items()] 327 return u', '.join(elements) 328

/usr/lib/python3.6/_collections_abc.py in iter(self) 741 742 def iter(self): --> 743 for key in self._mapping: 744 yield (key, self._mapping[key]) 745

/usr/local/lib/python3.6/dist-packages/xarray/core/utils.py in iter(self) 311 312 def iter(self): --> 313 return iter(self.mapping) 314 315 def len(self):

/usr/local/lib/python3.6/dist-packages/xarray/core/utils.py in iter(self) 347 348 def iter(self): --> 349 return iter(sorted(self.mapping)) 350 351 def len(self):

TypeError: '<' not supported between instances of 'CoordId' and 'CoordId' ```

I would be open to PRs to improve the situation.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support non-string dimension/variable names 341643235

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 558.961ms · About: xarray-datasette