issue_comments

7 rows where author_association = "MEMBER", issue = 115210260 and user = 5635139 sorted by updated_at descending

user 1

  • max-sixty · 7 ✖

issue 1

  • Display of PeriodIndex · 7 ✖

author_association 1

  • MEMBER · 7 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
164093345 https://github.com/pydata/xarray/issues/645#issuecomment-164093345 https://api.github.com/repos/pydata/xarray/issues/645 MDEyOklzc3VlQ29tbWVudDE2NDA5MzM0NQ== max-sixty 5635139 2015-12-12T01:21:12Z 2015-12-12T06:03:10Z MEMBER

@shoyer Coming back to this:

> The main subtlety is that currently we don't actually create the pandas.Index object until we load the entire index array into memory or need to do a lookup operation on the index. I'm not entirely sure this laziness is necessary, but it might be helpful -- it lets us defer loading some data from disk (or remote data sources) until absolutely necessary.

How expensive / inconvenient would it be to force only the coordinate dims to be eagerly loaded? There are some real benefits of having the coordinate dims be pandas Indexes, such as string coercion on dates, full PeriodIndex support, and tz support (and would it make some slicing & MultiIndexes easier too?). If the adaptors are there only to enable lazy loading, is that a worthwhile tradeoff?
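To make the tradeoff concrete, here is a minimal sketch of the lazy pattern being discussed. It is not xray's adapter: `LazyIndexAdapter`, `read_labels_from_disk` and `load_count` are hypothetical names, and a plain dict stands in for the pandas.Index lookup table.

```python
class LazyIndexAdapter:
    """Hypothetical stand-in for an adapter that defers building an index.

    `loader` is any zero-argument callable returning the coordinate labels.
    A plain dict plays the role of the pandas.Index lookup table.
    """

    def __init__(self, loader):
        self._loader = loader
        self._lookup = None   # built on first use
        self.load_count = 0   # for demonstration only

    def _materialize(self):
        # Build the lookup table once, the first time it is needed.
        if self._lookup is None:
            labels = list(self._loader())
            self.load_count += 1
            self._lookup = {label: pos for pos, label in enumerate(labels)}
        return self._lookup

    def get_loc(self, label):
        """Return the integer position of `label`, loading labels if needed."""
        return self._materialize()[label]


def read_labels_from_disk():
    # Placeholder for an expensive read from disk or a remote source.
    return ["2000-01-01", "2000-01-02", "2000-01-03"]


lazy = LazyIndexAdapter(read_labels_from_disk)
loads_before_lookup = lazy.load_count     # nothing has been loaded yet
position = lazy.get_loc("2000-01-02")     # first lookup triggers the load
loads_after_lookup = lazy.load_count
```

Eager loading would simply call the loader in `__init__`; the question above is whether the deferred read is worth the extra adapter layer.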

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Display of PeriodIndex 115210260
154212825 https://github.com/pydata/xarray/issues/645#issuecomment-154212825 https://api.github.com/repos/pydata/xarray/issues/645 MDEyOklzc3VlQ29tbWVudDE1NDIxMjgyNQ== max-sixty 5635139 2015-11-05T22:17:31Z 2015-11-05T22:17:31Z MEMBER

On reflection I wonder how difficult it would be to have a mapping of numpy dtypes to pandas indexes (there are five or so), and then a mapping of pandas indexes to dtypes. The full list is here: http://pandas.pydata.org/pandas-docs/stable/basics.html#selecting-columns-based-on-dtype. Then coords could (almost?) completely delegate to Pandas Index.
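A rough sketch of what such a pair of mappings might look like, using numpy dtype "kind" codes and index class names as plain strings. The class names are the ones pandas had around the time of this thread (later releases may differ), and the helper names are hypothetical.

```python
# numpy dtype "kind" code -> pandas index class name (illustrative assumption,
# based on the index classes available when this thread was written).
KIND_TO_INDEX = {
    "i": "Int64Index",
    "f": "Float64Index",
    "M": "DatetimeIndex",
    "m": "TimedeltaIndex",
    "O": "Index",
}

# Reverse direction: index class name -> numpy dtype string.
INDEX_TO_DTYPE = {
    "Int64Index": "int64",
    "Float64Index": "float64",
    "DatetimeIndex": "datetime64[ns]",
    "TimedeltaIndex": "timedelta64[ns]",
    "Index": "object",
}


def index_name_for_kind(kind):
    """Pick an index class name for a dtype kind, falling back to plain Index."""
    return KIND_TO_INDEX.get(kind, "Index")


def dtype_for_index_name(name):
    """Return the dtype string, or None if the index has no entry here."""
    return INDEX_TO_DTYPE.get(name)


float_index = index_name_for_kind("f")
period_dtype = dtype_for_index_name("PeriodIndex")  # no entry in this sketch
```

PeriodIndex has no entry in the reverse mapping, which is where the "(almost?)" comes from.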

Regardless let me finish up that PR and the wider issue can marinate.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Display of PeriodIndex 115210260
154193942 https://github.com/pydata/xarray/issues/645#issuecomment-154193942 https://api.github.com/repos/pydata/xarray/issues/645 MDEyOklzc3VlQ29tbWVudDE1NDE5Mzk0Mg== max-sixty 5635139 2015-11-05T21:10:42Z 2015-11-05T21:10:42Z MEMBER

OK, because we need the dtype before we've loaded the Index?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Display of PeriodIndex 115210260
154184592 https://github.com/pydata/xarray/issues/645#issuecomment-154184592 https://api.github.com/repos/pydata/xarray/issues/645 MDEyOklzc3VlQ29tbWVudDE1NDE4NDU5Mg== max-sixty 5635139 2015-11-05T20:42:57Z 2015-11-05T20:42:57Z MEMBER

> When I originally wrote that code, pandas didn't have Float64Index and would use dtype=object. Now, the need for this sort of thing is definitely less pressing.

What are your thoughts on making that change instead, then? Or too big a blast radius without more reflection? Currently only one test fails - setting a float32 dtype.

> Sorry, I still don't understand exactly what you're referring to! This does sound pretty bizarre, though -- possibly a bug.

Ha - maybe we'll never get there. One more push: in the comment above, can you see the differences between the two cases? One succeeds and one fails. The only difference is the length of the other coord. That's at least weird if not a bug?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Display of PeriodIndex 115210260
154179449 https://github.com/pydata/xarray/issues/645#issuecomment-154179449 https://api.github.com/repos/pydata/xarray/issues/645 MDEyOklzc3VlQ29tbWVudDE1NDE3OTQ0OQ== max-sixty 5635139 2015-11-05T20:21:38Z 2015-11-05T20:21:38Z MEMBER

PR in for the pressing issue. It won't repr nicely, but it'll work.

Re the main issue: I think that makes sense. So we need to support dtypes that pd.Index doesn't support? If we didn't, this could all be much simpler; for example this could be the __getitem__ method (and maybe we could just have Index as a type, including MultiIndex etc):

``` python
def __getitem__(self, key):
    if isinstance(key, tuple) and len(key) == 1:
        # unpack key so it can index a pandas.Index object (pandas.Index
        # objects don't like tuples)
        key, = key

    return self.array[key]
```

... but maybe there are data interfaces that need float32 compatibility or similar.
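For anyone who wants to try the unpacking behaviour without pandas, here is a small self-contained sketch. `SequenceAdapter` is a hypothetical wrapper, and a Python list stands in for pandas.Index (lists also reject tuple keys).

```python
class SequenceAdapter:
    """Hypothetical wrapper; a list stands in for pandas.Index here."""

    def __init__(self, array):
        self.array = array

    def __getitem__(self, key):
        if isinstance(key, tuple) and len(key) == 1:
            # unpack a 1-tuple so the wrapped object sees a plain key
            key, = key
        return self.array[key]


adapter = SequenceAdapter([10, 20, 30])
first = adapter[(0,)]                # 1-tuple key is unpacked to 0
tail = adapter[(slice(1, None),)]    # 1-tuple holding a slice

# The bare list refuses the same tuple key.
try:
    [10, 20, 30][(0,)]
    list_accepts_tuple = True
except TypeError:
    list_accepts_tuple = False
```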

Re:

> I'm not entirely sure what you're referring to here -- which line(s) of code is surprising you?

The unanswered question is why the code accesses the items from this coord when it's repr-ing differently, depending on the length of the other coord.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Display of PeriodIndex 115210260
153985189 https://github.com/pydata/xarray/issues/645#issuecomment-153985189 https://api.github.com/repos/pydata/xarray/issues/645 MDEyOklzc3VlQ29tbWVudDE1Mzk4NTE4OQ== max-sixty 5635139 2015-11-05T08:22:28Z 2015-11-05T08:46:35Z MEMBER

Yes, .values is a real headache on PeriodIndex... A Period dtype would be great, although unlikely to happen soon, I'd guess. In the meantime, I used is_period_arraylike rather than dtype to identify the type, IIRC.

Happy to have a go at this - at least to ensure it doesn't break while printing - could you give me an initial 'leg up'? Specifically:

- Do you know why it's trying to pull a value from the index when it prints? Its dependence on n seems particularly odd, since changing n doesn't actually change what's attempted to be displayed from that coord (shown below with Timestamps to demonstrate what's displayed in either case).
- Do you know why this line https://github.com/xray/xray/blob/master/xray/core/indexing.py#L400 isn't just value? If we need value in a container, I think .shallow_copy([value]) will work. But this leaves the question above unanswered.

``` python
In [149]:

n=100
m=3
xray.Dataset(
    variables = {
        'a': (['x', 'y'], np.random.rand(m,n)),
        'b': (['x', 'y'], np.random.rand(m,n))
    },
    coords = {
        'x': pd.date_range(start='2000', periods=m),
        'y': range(n),
    }
)
Out[149]:
<xray.Dataset>
Dimensions:  (x: 3, y: 100)
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * x        (x) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
Data variables:
    a        (x, y) float64 0.927 0.9906 0.1317 0.7665 0.4558 0.9502 0.1435 ...
    b        (x, y) float64 0.9084 0.5827 0.8724 0.1391 0.4529 0.6794 0.555 ...

In [150]:

n=10
m=3
xray.Dataset(
    variables = {
        'a': (['x', 'y'], np.random.rand(m,n)),
        'b': (['x', 'y'], np.random.rand(m,n))
    },
    coords = {
        'x': pd.date_range(start='2000', periods=m),
        'y': range(n),
    }
)
Out[150]:
<xray.Dataset>
Dimensions:  (x: 3, y: 10)
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9
  * x        (x) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
Data variables:
    a        (x, y) float64 0.09265 0.4552 0.6755 0.5913 0.5198 0.2473 ...
    b        (x, y) float64 0.5253 0.04162 0.8621 0.2462 0.2081 0.4814 ...
```

Am excited for IntervalIndex!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Display of PeriodIndex 115210260
153967536 https://github.com/pydata/xarray/issues/645#issuecomment-153967536 https://api.github.com/repos/pydata/xarray/issues/645 MDEyOklzc3VlQ29tbWVudDE1Mzk2NzUzNg== max-sixty 5635139 2015-11-05T06:22:03Z 2015-11-05T06:22:03Z MEMBER

This error is graver. Is there a way to work with PeriodIndexes in the meantime?

``` python
In [142]:

n=10
xray.Dataset(
    variables = {
        'a': (['x', 'y'], np.random.rand(3,n)),
        'b': (['x', 'y'], np.random.rand(3,n))
    },
    coords = {
        'x': pd.period_range(start='2000', periods=3),
        'y': range(n),
    }
)
Out[142]:
<xray.Dataset>
Dimensions:  (x: 3, y: 10)
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9
  * x        (x) int64 10957 10958 10959
Data variables:
    a        (x, y) float64 0.9978 0.5963 0.3108 0.9992 0.4629 0.8929 0.9299 ...
    b        (x, y) float64 0.9923 0.8678 0.4767 0.2957 0.4157 0.8527 0.269 ...
```
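Note that in the output above the x coordinate is shown as int64 values (10957, 10958, 10959) rather than as periods. Those numbers look like daily period ordinals, i.e. days since 1970-01-01; that reading is an assumption, but a quick standard-library check is consistent with it (`day_ordinal` is a hypothetical helper):

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def day_ordinal(d):
    """Number of days elapsed between 1970-01-01 and `d`."""
    return (d - EPOCH).days


# Three consecutive days starting at 2000-01-01, matching periods=3 above.
ordinals = [day_ordinal(date(2000, 1, 1) + timedelta(days=i)) for i in range(3)]
```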

Change n to 100, leaving everything else identical:

``` python
In [143]:

n=100
xray.Dataset(
    variables = {
        'a': (['x', 'y'], np.random.rand(3,n)),
        'b': (['x', 'y'], np.random.rand(3,n))
    },
    coords = {
        'x': pd.period_range(start='2000', periods=3),
        'y': range(n),
    }
)

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python2.7/dist-packages/IPython/core/formatters.pyc in __call__(self, obj)
    695                 type_pprinters=self.type_printers,
    696                 deferred_pprinters=self.deferred_printers)
--> 697             printer.pretty(obj)
    698             printer.flush()
    699             return stream.getvalue()

/usr/local/lib/python2.7/dist-packages/IPython/lib/pretty.pyc in pretty(self, obj)
    381             if callable(meth):
    382                 return meth(obj, self, cycle)
--> 383             return _default_pprint(obj, self, cycle)
    384         finally:
    385             self.end_group()

/usr/local/lib/python2.7/dist-packages/IPython/lib/pretty.pyc in _default_pprint(obj, p, cycle)
    501     if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
    502         # A user-provided repr. Find newlines and replace them with p.break()
--> 503         _repr_pprint(obj, p, cycle)
    504         return
    505     p.begin_group(1, '<')

/usr/local/lib/python2.7/dist-packages/IPython/lib/pretty.pyc in _repr_pprint(obj, p, cycle)
    683     """A pprint that just redirects to the normal repr function."""
    684     # Find newlines and replace them with p.break()
--> 685     output = repr(obj)
    686     for idx,output_line in enumerate(output.splitlines()):
    687         if idx:

/usr/local/lib/python2.7/dist-packages/xray/core/dataset.pyc in __repr__(self)
    885
    886     def __repr__(self):
--> 887         return formatting.dataset_repr(self)
    888
    889     @property

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in dataset_repr(ds)
    271
    272     summary.append(coords_repr(ds.coords, col_width=col_width))
--> 273     summary.append(vars_repr(ds.data_vars, col_width=col_width))
    274     if ds.attrs:
    275         summary.append(attrs_repr(ds.attrs))

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in _mapping_repr(mapping, title, summarizer, col_width)
    208     summary = ['%s:' % title]
    209     if mapping:
--> 210         summary += [summarizer(k, v, col_width) for k, v in mapping.items()]
    211     else:
    212         summary += [EMPTY_REPR]

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in summarize_var(name, var, col_width)
    172 def summarize_var(name, var, col_width):
    173     show_values = _not_remote(var)
--> 174     return _summarize_var_or_coord(name, var, col_width, show_values)
    175
    176

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in _summarize_var_or_coord(name, var, col_width, show_values, marker, max_width)
    154     front_str = first_col + dims_str + ('%s ' % var.dtype)
    155     if show_values:
--> 156         values_str = format_array_flat(var, max_width - len(front_str))
    157     else:
    158         values_str = '...'

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in format_array_flat(items_ndarray, max_width)
    130     # print at least one item
    131     max_possibly_relevant = max(int(np.ceil(max_width / 2.0)), 1)
--> 132     relevant_items = first_n_items(items_ndarray, max_possibly_relevant)
    133     pprint_items = format_items(relevant_items)
    134

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in first_n_items(x, n_desired)
     53     if n_desired < x.size:
     54         indexer = _get_indexer_at_least_n_items(x.shape, n_desired)
---> 55         x = x[indexer]
     56     return np.asarray(x).flat[:n_desired]
     57

/usr/local/lib/python2.7/dist-packages/xray/core/dataarray.pyc in __getitem__(self, key)
    370         else:
    371             # orthogonal array indexing
--> 372             return self.isel(**self._item_key_to_dict(key))
    373
    374     def __setitem__(self, key, value):

/usr/local/lib/python2.7/dist-packages/xray/core/dataarray.pyc in isel(self, indexers)
    537         DataArray.sel
    538         """
--> 539         ds = self._dataset.isel(indexers)
    540         return self._with_replaced_dataset(ds)
    541

/usr/local/lib/python2.7/dist-packages/xray/core/dataset.pyc in isel(self, indexers)
   1008         for name, var in iteritems(self._variables):
   1009             var_indexers = dict((k, v) for k, v in indexers if k in var.dims)
-> 1010             variables[name] = var.isel(var_indexers)
   1011         return self._replace_vars_and_dims(variables)
   1012

/usr/local/lib/python2.7/dist-packages/xray/core/variable.pyc in isel(self, *indexers)
    494             if dim in indexers:
    495                 key[i] = indexers[dim]
--> 496         return self[tuple(key)]
    497
    498     def transpose(self, dims):

/usr/local/lib/python2.7/dist-packages/xray/core/variable.pyc in __getitem__(self, key)
    830     def __getitem__(self, key):
    831         key = self._item_key_to_tuple(key)
--> 832         values = self._indexable_data[key]
    833         if not hasattr(values, 'ndim') or values.ndim == 0:
    834             return Variable((), values, self._attrs, self._encoding)

/usr/local/lib/python2.7/dist-packages/xray/core/indexing.pyc in __getitem__(self, key)
    398                 value = np.timedelta64(getattr(value, 'value', value), 'ns')
    399             else:
--> 400                 value = np.asarray(value, dtype=self.dtype)
    401         else:
    402             value = PandasIndexAdapter(self.array[key], dtype=self.dtype)

/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
    472
    473     """
--> 474     return array(a, dtype, copy=False, order=order)
    475
    476 def asanyarray(a, dtype=None, order=None):

TypeError: long() argument must be a string or a number, not 'pandas._period.Period'
```
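One plausible reading of the dependence on n, not confirmed anywhere in this thread: the traceback goes through first_n_items, which only indexes when n_desired < x.size and asks _get_indexer_at_least_n_items for an indexer. If that indexer picks the smallest leading block with enough items, a wide array would get a scalar index on x (and hit the scalar branch at indexing.py line 400), while a narrow one would get a slice. The sketch below is a guess at that logic; `sketch_indexer` is hypothetical and is not the xray implementation, and the value 20 for n_desired is arbitrary.

```python
from math import ceil


def sketch_indexer(shape, n_desired):
    """Hypothetical: smallest leading block holding at least n_desired items.

    Not the xray implementation; it only illustrates how a scalar index on a
    leading dimension could arise for wide arrays.
    """
    indexer = []
    found = False
    for axis in range(len(shape)):
        if found:
            # All dimensions after the sliced one are taken in full.
            indexer.append(slice(None))
            continue
        # Number of items in a single step along this axis.
        block = 1
        for size in shape[axis + 1:]:
            block *= size
        if block >= n_desired:
            # One slab is already enough: scalar index on this axis.
            indexer.append(0)
        else:
            indexer.append(slice(ceil(n_desired / block)))
            found = True
    return tuple(indexer)


wide = sketch_indexer((3, 100), 20)    # x gets a scalar index
narrow = sketch_indexer((3, 10), 20)   # x gets a slice
```

Under this assumption, only the n=100 case would pull a single Period out of the x coordinate, which would match one case failing and the other succeeding.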

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Display of PeriodIndex 115210260

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
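For reference, the filter described at the top of this page (author_association = "MEMBER", issue = 115210260, user = 5635139, sorted by updated_at descending) can be reproduced against this schema with the standard-library sqlite3 module. This is a sketch on an in-memory database: two rows reuse ids and timestamps from the comments above, and the third row is a hypothetical non-matching one added only to show the filter.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

rows_to_insert = [
    (164093345, 5635139, "2015-12-12T06:03:10Z", "MEMBER", 115210260),
    (154212825, 5635139, "2015-11-05T22:17:31Z", "MEMBER", 115210260),
    # Hypothetical row that the filter should exclude.
    (1, 5635139, "2016-01-01T00:00:00Z", "NONE", 115210260),
]
conn.executemany(
    "INSERT INTO issue_comments (id, [user], updated_at, author_association, issue) "
    "VALUES (?, ?, ?, ?, ?)",
    rows_to_insert,
)

matching = conn.execute(
    """
    SELECT id, updated_at
    FROM issue_comments
    WHERE author_association = ? AND issue = ? AND [user] = ?
    ORDER BY updated_at DESC
    """,
    ("MEMBER", 115210260, 5635139),
).fetchall()
```

Because the timestamps are ISO 8601 strings, sorting the TEXT column lexicographically gives chronological order.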
Powered by Datasette · Queries took 197.376ms · About: xarray-datasette