html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/645#issuecomment-164093345,https://api.github.com/repos/pydata/xarray/issues/645,164093345,MDEyOklzc3VlQ29tbWVudDE2NDA5MzM0NQ==,5635139,2015-12-12T01:21:12Z,2015-12-12T06:03:10Z,MEMBER,"@shoyer Coming back to this:
> The main subtlety is that currently we don't actually create the pandas.Index object until we load the entire index array into memory or need to do a lookup operation on the index. I'm not entirely sure this laziness is necessary, but it might be helpful -- it lets us defer loading some data from disk (or remote data sources) until absolutely necessary.
How expensive / inconvenient would it be to force only the coordinate `dim`s to be eagerly loaded? There are some real benefits of having the coordinate `dim`s be pandas Indexes, such as string coercion on dates, full PeriodIndex support, and tz support (and would it make some slicing & MultiIndexes easier too?). If the adaptors are there only to enable lazy loading, is that a worthwhile tradeoff?
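For concreteness, here is a minimal sketch of what those `pandas.Index` features look like, using pandas alone rather than xray; the variable names are only illustrative:

```python
import pandas as pd

# 60 daily, tz-aware timestamps: 31 days of January plus 29 of February 2000.
dates = pd.date_range('2000-01-01', periods=60, freq='D', tz='UTC')

# String coercion: the label is parsed into a Timestamp for the lookup.
pos = dates.get_loc('2000-01-03')

# Partial-string selection picks out a whole month.
series = pd.Series(range(60), index=dates)
feb = series.loc['2000-02']

print(pos, len(feb), dates.tz)
```

None of this is available on the coord while it is held as a plain integer or object array.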
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-154212825,https://api.github.com/repos/pydata/xarray/issues/645,154212825,MDEyOklzc3VlQ29tbWVudDE1NDIxMjgyNQ==,5635139,2015-11-05T22:17:31Z,2015-11-05T22:17:31Z,MEMBER,"On reflection I wonder how difficult it would be to have a mapping of numpy dtypes to pandas indexes (there are five or so), and then a mapping of pandas indexes to dtypes. The full list is here: http://pandas.pydata.org/pandas-docs/stable/basics.html#selecting-columns-based-on-dtype.
Then coords could (almost?) completely delegate to the pandas `Index`.
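A rough sketch of the forward half of that mapping, assuming we simply let the `pd.Index` constructor choose the index type from the NumPy dtype (the `index_for` helper is hypothetical, not existing xray code):

```python
import numpy as np
import pandas as pd

def index_for(values):
    # Hypothetical helper: pandas infers the Index type from the array dtype.
    return pd.Index(np.asarray(values))

dt = index_for(np.array(['2000-01-01', '2000-01-02'], dtype='datetime64[ns]'))
td = index_for(np.array([1, 2], dtype='timedelta64[ns]'))
ints = index_for([1, 2, 3])

for idx in (dt, td, ints):
    # The reverse mapping is just the dtype pandas reports for the index.
    print(type(idx).__name__, idx.dtype)
```

Which class the integer case lands in depends on the pandas version, so a real mapping would probably key on `dtype.kind` rather than on class names.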
Regardless let me finish up that PR and the wider issue can marinate.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-154193942,https://api.github.com/repos/pydata/xarray/issues/645,154193942,MDEyOklzc3VlQ29tbWVudDE1NDE5Mzk0Mg==,5635139,2015-11-05T21:10:42Z,2015-11-05T21:10:42Z,MEMBER,"OK, because we need the `dtype` before we've loaded the `Index`?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-154184592,https://api.github.com/repos/pydata/xarray/issues/645,154184592,MDEyOklzc3VlQ29tbWVudDE1NDE4NDU5Mg==,5635139,2015-11-05T20:42:57Z,2015-11-05T20:42:57Z,MEMBER,"> When I originally wrote that code, pandas didn't have Float64Index and would use dtype=object. Now, the need for this sort of thing is definitely less pressing.
What are your thoughts on making that change instead, then? Or too big a blast radius without more reflection? Currently only one test fails - setting a `float32` dtype.
> Sorry, I still don't understand exactly what you're referring to! This does sound pretty bizarre, though -- possibly a bug.
Ha - maybe we'll never get there. One more push: in the comment [above](https://github.com/xray/xray/issues/645#issuecomment-153967536), can you see the differences between the two cases? One succeeds and one fails. The only difference is the length of the _other_ coord. That's at least weird if not a bug?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-154179449,https://api.github.com/repos/pydata/xarray/issues/645,154179449,MDEyOklzc3VlQ29tbWVudDE1NDE3OTQ0OQ==,5635139,2015-11-05T20:21:38Z,2015-11-05T20:21:38Z,MEMBER,"PR in for the pressing issue. It won't `repr` nicely, but it'll work.
Re the main issue: I think that makes sense. So we need to support `dtype`s that `pd.Index` doesn't support? If we didn't, this could all be much simpler; for example this could be the `__getitem__` method (and maybe we could just have `Index` as a type, including `MultiIndex` etc):
``` python
def __getitem__(self, key):
    if isinstance(key, tuple) and len(key) == 1:
        # unpack key so it can index a pandas.Index object (pandas.Index
        # objects don't like tuples)
        key, = key
    return self.array[key]
```
... but maybe there are data interfaces that need `float32` compatibility or similar.
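To illustrate, a minimal sketch of an adapter that delegates straight to `pd.Index` with the `__getitem__` above; `IndexAdapter` is a made-up name, not the existing xray class:

```python
import pandas as pd

class IndexAdapter(object):
    # Hypothetical wrapper: holds a pandas Index and forwards indexing to it.
    def __init__(self, array):
        self.array = pd.Index(array)

    def __getitem__(self, key):
        if isinstance(key, tuple) and len(key) == 1:
            # unpack 1-tuples before handing the key to the pandas Index
            key, = key
        return self.array[key]

adapter = IndexAdapter(pd.period_range(start='2000', periods=3))
first = adapter[(0,)]              # scalar lookup returns a pandas Period
tail = adapter[(slice(1, None),)]  # slicing keeps the PeriodIndex type
print(type(first).__name__, type(tail).__name__)
```

With no `dtype` coercion in the way, the scalar comes back as a `Period` instead of being forced into an integer.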
Re:
> I'm not entirely sure what you're referring to here -- which line(s) of code is surprising you?
The unanswered question is why, when repr-ing, the code accesses items from this coord differently depending on the length of the _other_ coord.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-153985189,https://api.github.com/repos/pydata/xarray/issues/645,153985189,MDEyOklzc3VlQ29tbWVudDE1Mzk4NTE4OQ==,5635139,2015-11-05T08:22:28Z,2015-11-05T08:46:35Z,MEMBER,"Yes, `.values` is a real headache on `PeriodIndex`... A `Period` `dtype` would be great, although unlikely to happen soon I'd guess. In the meantime, I used `is_period_arraylike` rather than `dtype` to identify the type, IIRC.
Happy to have a go at this - at least to ensure it doesn't break while printing - could you give me an initial 'leg up'? Specifically:
- Do you know why it's trying to pull a value from the index when it prints? Its dependence on `n` seems particularly odd, since changing `n` doesn't change what gets displayed for that coord (shown below with `Timestamp`s to demonstrate what's displayed in either case)
- Do you know why this line https://github.com/xray/xray/blob/master/xray/core/indexing.py#L400 isn't just `value`? If we need `value` in a container, I think `.shallow_copy([value])` will work. But this leaves the question above unanswered.
``` python
In [149]:
n=100
m=3
xray.Dataset(
    variables = {
        'a': (['x', 'y'], np.random.rand(m,n)),
        'b': (['x', 'y'], np.random.rand(m,n))
    },
    coords = {
        'x': pd.date_range(start='2000', periods=m),
        'y': range(n),
    }
)
Out[149]:
Dimensions: (x: 3, y: 100)
Coordinates:
* y (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
* x (x) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
Data variables:
a (x, y) float64 0.927 0.9906 0.1317 0.7665 0.4558 0.9502 0.1435 ...
b (x, y) float64 0.9084 0.5827 0.8724 0.1391 0.4529 0.6794 0.555 ...
In [150]:
n=10
m=3
xray.Dataset(
    variables = {
        'a': (['x', 'y'], np.random.rand(m,n)),
        'b': (['x', 'y'], np.random.rand(m,n))
    },
    coords = {
        'x': pd.date_range(start='2000', periods=m),
        'y': range(n),
    }
)
Out[150]:
Dimensions: (x: 3, y: 10)
Coordinates:
* y (y) int64 0 1 2 3 4 5 6 7 8 9
* x (x) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
Data variables:
a (x, y) float64 0.09265 0.4552 0.6755 0.5913 0.5198 0.2473 ...
b (x, y) float64 0.5253 0.04162 0.8621 0.2462 0.2081 0.4814 ...
```
Am excited for `IntervalIndex`!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-153967536,https://api.github.com/repos/pydata/xarray/issues/645,153967536,MDEyOklzc3VlQ29tbWVudDE1Mzk2NzUzNg==,5635139,2015-11-05T06:22:03Z,2015-11-05T06:22:03Z,MEMBER,"This error is graver. Is there a way to work with `PeriodIndex`es in the meantime?
``` python
In [142]:
n=10
xray.Dataset(
    variables = {
        'a': (['x', 'y'], np.random.rand(3,n)),
        'b': (['x', 'y'], np.random.rand(3,n))
    },
    coords = {
        'x': pd.period_range(start='2000', periods=3),
        'y': range(n),
    }
)
Out[142]:
Dimensions: (x: 3, y: 10)
Coordinates:
* y (y) int64 0 1 2 3 4 5 6 7 8 9
* x (x) int64 10957 10958 10959
Data variables:
a (x, y) float64 0.9978 0.5963 0.3108 0.9992 0.4629 0.8929 0.9299 ...
b (x, y) float64 0.9923 0.8678 0.4767 0.2957 0.4157 0.8527 0.269 ...
```
Change `n` to 100, leaving everything else identical:
``` python
In [143]:
n=100
xray.Dataset(
    variables = {
        'a': (['x', 'y'], np.random.rand(3,n)),
        'b': (['x', 'y'], np.random.rand(3,n))
    },
    coords = {
        'x': pd.period_range(start='2000', periods=3),
        'y': range(n),
    }
)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/usr/local/lib/python2.7/dist-packages/IPython/core/formatters.pyc in __call__(self, obj)
695 type_pprinters=self.type_printers,
696 deferred_pprinters=self.deferred_printers)
--> 697 printer.pretty(obj)
698 printer.flush()
699 return stream.getvalue()
/usr/local/lib/python2.7/dist-packages/IPython/lib/pretty.pyc in pretty(self, obj)
381 if callable(meth):
382 return meth(obj, self, cycle)
--> 383 return _default_pprint(obj, self, cycle)
384 finally:
385 self.end_group()
/usr/local/lib/python2.7/dist-packages/IPython/lib/pretty.pyc in _default_pprint(obj, p, cycle)
501 if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
502 # A user-provided repr. Find newlines and replace them with p.break_()
--> 503 _repr_pprint(obj, p, cycle)
504 return
505 p.begin_group(1, '<')
/usr/local/lib/python2.7/dist-packages/IPython/lib/pretty.pyc in _repr_pprint(obj, p, cycle)
683 """"""A pprint that just redirects to the normal repr function.""""""
684 # Find newlines and replace them with p.break_()
--> 685 output = repr(obj)
686 for idx,output_line in enumerate(output.splitlines()):
687 if idx:
/usr/local/lib/python2.7/dist-packages/xray/core/dataset.pyc in __repr__(self)
885
886 def __repr__(self):
--> 887 return formatting.dataset_repr(self)
888
889 @property
/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in dataset_repr(ds)
271
272 summary.append(coords_repr(ds.coords, col_width=col_width))
--> 273 summary.append(vars_repr(ds.data_vars, col_width=col_width))
274 if ds.attrs:
275 summary.append(attrs_repr(ds.attrs))
/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in _mapping_repr(mapping, title, summarizer, col_width)
208 summary = ['%s:' % title]
209 if mapping:
--> 210 summary += [summarizer(k, v, col_width) for k, v in mapping.items()]
211 else:
212 summary += [EMPTY_REPR]
/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in summarize_var(name, var, col_width)
172 def summarize_var(name, var, col_width):
173 show_values = _not_remote(var)
--> 174 return _summarize_var_or_coord(name, var, col_width, show_values)
175
176
/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in _summarize_var_or_coord(name, var, col_width, show_values, marker, max_width)
154 front_str = first_col + dims_str + ('%s ' % var.dtype)
155 if show_values:
--> 156 values_str = format_array_flat(var, max_width - len(front_str))
157 else:
158 values_str = '...'
/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in format_array_flat(items_ndarray, max_width)
130 # print at least one item
131 max_possibly_relevant = max(int(np.ceil(max_width / 2.0)), 1)
--> 132 relevant_items = first_n_items(items_ndarray, max_possibly_relevant)
133 pprint_items = format_items(relevant_items)
134
/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in first_n_items(x, n_desired)
53 if n_desired < x.size:
54 indexer = _get_indexer_at_least_n_items(x.shape, n_desired)
---> 55 x = x[indexer]
56 return np.asarray(x).flat[:n_desired]
57
/usr/local/lib/python2.7/dist-packages/xray/core/dataarray.pyc in __getitem__(self, key)
370 else:
371 # orthogonal array indexing
--> 372 return self.isel(**self._item_key_to_dict(key))
373
374 def __setitem__(self, key, value):
/usr/local/lib/python2.7/dist-packages/xray/core/dataarray.pyc in isel(self, **indexers)
537 DataArray.sel
538 """"""
--> 539 ds = self._dataset.isel(**indexers)
540 return self._with_replaced_dataset(ds)
541
/usr/local/lib/python2.7/dist-packages/xray/core/dataset.pyc in isel(self, **indexers)
1008 for name, var in iteritems(self._variables):
1009 var_indexers = dict((k, v) for k, v in indexers if k in var.dims)
-> 1010 variables[name] = var.isel(**var_indexers)
1011 return self._replace_vars_and_dims(variables)
1012
/usr/local/lib/python2.7/dist-packages/xray/core/variable.pyc in isel(self, **indexers)
494 if dim in indexers:
495 key[i] = indexers[dim]
--> 496 return self[tuple(key)]
497
498 def transpose(self, *dims):
/usr/local/lib/python2.7/dist-packages/xray/core/variable.pyc in __getitem__(self, key)
830 def __getitem__(self, key):
831 key = self._item_key_to_tuple(key)
--> 832 values = self._indexable_data[key]
833 if not hasattr(values, 'ndim') or values.ndim == 0:
834 return Variable((), values, self._attrs, self._encoding)
/usr/local/lib/python2.7/dist-packages/xray/core/indexing.pyc in __getitem__(self, key)
398 value = np.timedelta64(getattr(value, 'value', value), 'ns')
399 else:
--> 400 value = np.asarray(value, dtype=self.dtype)
401 else:
402 value = PandasIndexAdapter(self.array[key], dtype=self.dtype)
/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
472
473 """"""
--> 474 return array(a, dtype, copy=False, order=order)
475
476 def asanyarray(a, dtype=None, order=None):
TypeError: long() argument must be a string or a number, not 'pandas._period.Period'
```
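The last frame can be reproduced without xray, which may help narrow things down. This is a sketch that assumes the coord's `dtype` is an integer type, as the `int64` in the working repr suggests; the `ordinal` lookup at the end is only one possible stopgap:

```python
import numpy as np
import pandas as pd

periods = pd.period_range(start='2000', periods=3)
scalar = periods[0]  # integer indexing yields a pandas Period scalar

try:
    # Mirrors np.asarray(value, dtype=self.dtype) from the traceback.
    np.asarray(scalar, dtype='int64')
    failed = False
except (TypeError, ValueError):
    failed = True

# The integer shown in the working repr is the Period's ordinal.
ordinal = scalar.ordinal
print(failed, ordinal)
```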
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260