html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/645#issuecomment-164093345,https://api.github.com/repos/pydata/xarray/issues/645,164093345,MDEyOklzc3VlQ29tbWVudDE2NDA5MzM0NQ==,5635139,2015-12-12T01:21:12Z,2015-12-12T06:03:10Z,MEMBER,"@shoyer Coming back to this:

> The main subtlety is that currently we don't actually create the pandas.Index object until we load the entire index array into memory or need to do a lookup operation on the index. I'm not entirely sure this laziness is necessary, but it might be helpful -- it lets us defer loading some data from disk (or remote data sources) until absolutely necessary.

How expensive / inconvenient would it be to force only the coordinate `dim`s to be eagerly loaded? There are some real benefits of having the coordinate `dim`s be pandas Indexes, such as string coercion on dates, full PeriodIndex support, and tz support (and would it make some slicing & MultiIndexes easier too?).

If the adaptors are there only to enable lazy loading, is that a worthwhile tradeoff?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-154212825,https://api.github.com/repos/pydata/xarray/issues/645,154212825,MDEyOklzc3VlQ29tbWVudDE1NDIxMjgyNQ==,5635139,2015-11-05T22:17:31Z,2015-11-05T22:17:31Z,MEMBER,"On reflection I wonder how difficult it would be to have a mapping of numpy dtypes to pandas indexes (there are five or so), and then a mapping of pandas indexes to dtypes. The full list is here: http://pandas.pydata.org/pandas-docs/stable/basics.html#selecting-columns-based-on-dtype. Then coords could (almost?) completely delegate to Pandas Index.

Regardless let me finish up that PR and the wider issue can marinate.
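
A rough sketch of the two mappings, assuming pandas is left to pick the Index subclass from the numpy dtype. `index_from_values` and `dtype_from_index` are made-up names for illustration only; neither exists in xray or pandas:

```python
import numpy as np
import pandas as pd


def index_from_values(values):
    # Hypothetical helper: pandas infers the Index subclass from the
    # numpy dtype of the values (e.g. datetime64[ns] -> DatetimeIndex).
    return pd.Index(np.asarray(values))


def dtype_from_index(index):
    # Hypothetical helper: every pandas Index exposes a dtype, so the
    # reverse mapping is just an attribute lookup.
    return index.dtype


dates = np.array(['2000-01-01', '2000-01-02'], dtype='datetime64[ns]')
date_index = index_from_values(dates)
print(type(date_index).__name__, dtype_from_index(date_index))

# A PeriodIndex is built directly here; what dtype it reports depends on
# the pandas version, which is part of the difficulty discussed above.
period_index = pd.period_range(start='2000', periods=3, freq='D')
print(type(period_index).__name__, dtype_from_index(period_index))
```

The printed class names and dtypes vary with the pandas version, so this only shows the shape of the idea.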
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260 https://github.com/pydata/xarray/issues/645#issuecomment-154193942,https://api.github.com/repos/pydata/xarray/issues/645,154193942,MDEyOklzc3VlQ29tbWVudDE1NDE5Mzk0Mg==,5635139,2015-11-05T21:10:42Z,2015-11-05T21:10:42Z,MEMBER,"OK, because we need the `dtype` before we've loaded the `Index`? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260 https://github.com/pydata/xarray/issues/645#issuecomment-154184592,https://api.github.com/repos/pydata/xarray/issues/645,154184592,MDEyOklzc3VlQ29tbWVudDE1NDE4NDU5Mg==,5635139,2015-11-05T20:42:57Z,2015-11-05T20:42:57Z,MEMBER,"> When I originally wrote that code, pandas didn't have Float64Index and would use dtype=object. Now, the need for this sort of thing is definitely less pressing. What are your thoughts on making that change instead, then? Or too big a blast radius without more reflection? Currently only one test fails - setting a `float32` dtype. > Sorry, I still don't understand exactly what you're referring to! This does sound pretty bizarre, though -- possibly a bug. Ha - maybe we'll never get there. One more push: in the comment [above](https://github.com/xray/xray/issues/645#issuecomment-153967536), can you see the differences between the two cases? One succeeds and one fails. The only difference is the length of the _other_ coord. That's at least weird if not a bug? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260 https://github.com/pydata/xarray/issues/645#issuecomment-154179449,https://api.github.com/repos/pydata/xarray/issues/645,154179449,MDEyOklzc3VlQ29tbWVudDE1NDE3OTQ0OQ==,5635139,2015-11-05T20:21:38Z,2015-11-05T20:21:38Z,MEMBER,"PR in for the pressing issue. 
It won't `repr` nicely, but it'll work.

Re the main issue: I think that makes sense. So we need to support `dtype`s that `pd.Index` doesn't support? If we didn't, this could all be much simpler; for example this could be the `__getitem__` method (and maybe we could just have `Index` as a type, including `MultiIndex` etc):

``` python
def __getitem__(self, key):
    if isinstance(key, tuple) and len(key) == 1:
        # unpack key so it can index a pandas.Index object (pandas.Index
        # objects don't like tuples)
        key, = key
    return self.array[key]
```

... but maybe there are data interfaces that need `float32` compatibility or similar.

Re:

> I'm not entirely sure what you're referring to here -- which line(s) of code is surprising you?

The unanswered question is why, when repr-ing, the code accesses items from this coord differently depending on the length of the _other_ coord.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-153985189,https://api.github.com/repos/pydata/xarray/issues/645,153985189,MDEyOklzc3VlQ29tbWVudDE1Mzk4NTE4OQ==,5635139,2015-11-05T08:22:28Z,2015-11-05T08:46:35Z,MEMBER,"Yes `.values` is a real headache on `PeriodIndex`... A `Period` `dtype` would be great although unlikely to happen soon I'd guess. In the meantime, I used `is_period_arraylike` rather than `dtype` to identify type IIRC.

Happy to have a go at this - at least to ensure it doesn't break while printing - could you give me an initial 'leg up'? Specifically:

- Do you know why it's trying to pull a value from the index when it prints?
Its dependence on `n` seems particularly odd, since changing `n` doesn't actually change what it tries to display from that coord (shown below with `Timestamp`s to demonstrate what's displayed in either case)
- Do you know why this line https://github.com/xray/xray/blob/master/xray/core/indexing.py#L400 isn't just `value`? If we need `value` in a container, I think `.shallow_copy([value])` will work. But this leaves the question above unanswered.

``` python
In [149]:
n=100
m=3
xray.Dataset(
    variables = {
        'a': (['x', 'y'], np.random.rand(m,n)),
        'b': (['x', 'y'], np.random.rand(m,n))
    },
    coords = {
        'x': pd.date_range(start='2000', periods=m),
        'y': range(n),
    }
)

Out[149]:
Dimensions: (x: 3, y: 100)
Coordinates:
  * y (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * x (x) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
Data variables:
    a (x, y) float64 0.927 0.9906 0.1317 0.7665 0.4558 0.9502 0.1435 ...
    b (x, y) float64 0.9084 0.5827 0.8724 0.1391 0.4529 0.6794 0.555 ...

In [150]:
n=10
m=3
xray.Dataset(
    variables = {
        'a': (['x', 'y'], np.random.rand(m,n)),
        'b': (['x', 'y'], np.random.rand(m,n))
    },
    coords = {
        'x': pd.date_range(start='2000', periods=m),
        'y': range(n),
    }
)

Out[150]:
Dimensions: (x: 3, y: 10)
Coordinates:
  * y (y) int64 0 1 2 3 4 5 6 7 8 9
  * x (x) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
Data variables:
    a (x, y) float64 0.09265 0.4552 0.6755 0.5913 0.5198 0.2473 ...
    b (x, y) float64 0.5253 0.04162 0.8621 0.2462 0.2081 0.4814 ...
```

Am excited for `IntervalIndex`!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-153967536,https://api.github.com/repos/pydata/xarray/issues/645,153967536,MDEyOklzc3VlQ29tbWVudDE1Mzk2NzUzNg==,5635139,2015-11-05T06:22:03Z,2015-11-05T06:22:03Z,MEMBER,"This error is more serious. Is there a way to work with `PeriodIndex`es in the meantime?

``` python
In [142]:
n=10
xray.Dataset(
    variables = {
        'a': (['x', 'y'], np.random.rand(3,n)),
        'b': (['x', 'y'], np.random.rand(3,n))
    },
    coords = {
        'x': pd.period_range(start='2000', periods=3),
        'y': range(n),
    }
)

Out[142]:
Dimensions: (x: 3, y: 10)
Coordinates:
  * y (y) int64 0 1 2 3 4 5 6 7 8 9
  * x (x) int64 10957 10958 10959
Data variables:
    a (x, y) float64 0.9978 0.5963 0.3108 0.9992 0.4629 0.8929 0.9299 ...
    b (x, y) float64 0.9923 0.8678 0.4767 0.2957 0.4157 0.8527 0.269 ...
```

Change `n` to 100, leaving everything else identical:

``` python
In [143]:
0
n=100
xray.Dataset(
    variables = {
        'a': (['x', 'y'], np.random.rand(3,n)),
        'b': (['x', 'y'], np.random.rand(3,n))
    },
    coords = {
        'x': pd.period_range(start='2000', periods=3),
        'y': range(n),
    }
)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/usr/local/lib/python2.7/dist-packages/IPython/core/formatters.pyc in __call__(self, obj)
    695 type_pprinters=self.type_printers,
    696 deferred_pprinters=self.deferred_printers)
--> 697 printer.pretty(obj)
    698 printer.flush()
    699 return stream.getvalue()

/usr/local/lib/python2.7/dist-packages/IPython/lib/pretty.pyc in pretty(self, obj)
    381 if callable(meth):
    382 return meth(obj, self, cycle)
--> 383 return _default_pprint(obj, self, cycle)
    384 finally:
    385 self.end_group()

/usr/local/lib/python2.7/dist-packages/IPython/lib/pretty.pyc in _default_pprint(obj, p, cycle)
    501 if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
    502 # A user-provided repr. Find newlines and replace them with p.break_()
--> 503 _repr_pprint(obj, p, cycle)
    504 return
    505 p.begin_group(1, '<')

/usr/local/lib/python2.7/dist-packages/IPython/lib/pretty.pyc in _repr_pprint(obj, p, cycle)
    683 """"""A pprint that just redirects to the normal repr function.""""""
    684 # Find newlines and replace them with p.break_()
--> 685 output = repr(obj)
    686 for idx,output_line in enumerate(output.splitlines()):
    687 if idx:

/usr/local/lib/python2.7/dist-packages/xray/core/dataset.pyc in __repr__(self)
    885
    886 def __repr__(self):
--> 887 return formatting.dataset_repr(self)
    888
    889 @property

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in dataset_repr(ds)
    271
    272 summary.append(coords_repr(ds.coords, col_width=col_width))
--> 273 summary.append(vars_repr(ds.data_vars, col_width=col_width))
    274 if ds.attrs:
    275 summary.append(attrs_repr(ds.attrs))

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in _mapping_repr(mapping, title, summarizer, col_width)
    208 summary = ['%s:' % title]
    209 if mapping:
--> 210 summary += [summarizer(k, v, col_width) for k, v in mapping.items()]
    211 else:
    212 summary += [EMPTY_REPR]

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in summarize_var(name, var, col_width)
    172 def summarize_var(name, var, col_width):
    173 show_values = _not_remote(var)
--> 174 return _summarize_var_or_coord(name, var, col_width, show_values)
    175
    176

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in _summarize_var_or_coord(name, var, col_width, show_values, marker, max_width)
    154 front_str = first_col + dims_str + ('%s ' % var.dtype)
    155 if show_values:
--> 156 values_str = format_array_flat(var, max_width - len(front_str))
    157 else:
    158 values_str = '...'

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in format_array_flat(items_ndarray, max_width)
    130 # print at least one item
    131 max_possibly_relevant = max(int(np.ceil(max_width / 2.0)), 1)
--> 132 relevant_items = first_n_items(items_ndarray, max_possibly_relevant)
    133 pprint_items = format_items(relevant_items)
    134

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in first_n_items(x, n_desired)
     53 if n_desired < x.size:
     54 indexer = _get_indexer_at_least_n_items(x.shape, n_desired)
---> 55 x = x[indexer]
     56 return np.asarray(x).flat[:n_desired]
     57

/usr/local/lib/python2.7/dist-packages/xray/core/dataarray.pyc in __getitem__(self, key)
    370 else:
    371 # orthogonal array indexing
--> 372 return self.isel(**self._item_key_to_dict(key))
    373
    374 def __setitem__(self, key, value):

/usr/local/lib/python2.7/dist-packages/xray/core/dataarray.pyc in isel(self, **indexers)
    537 DataArray.sel
    538 """"""
--> 539 ds = self._dataset.isel(**indexers)
    540 return self._with_replaced_dataset(ds)
    541

/usr/local/lib/python2.7/dist-packages/xray/core/dataset.pyc in isel(self, **indexers)
   1008 for name, var in iteritems(self._variables):
   1009 var_indexers = dict((k, v) for k, v in indexers if k in var.dims)
-> 1010 variables[name] = var.isel(**var_indexers)
   1011 return self._replace_vars_and_dims(variables)
   1012

/usr/local/lib/python2.7/dist-packages/xray/core/variable.pyc in isel(self, **indexers)
    494 if dim in indexers:
    495 key[i] = indexers[dim]
--> 496 return self[tuple(key)]
    497
    498 def transpose(self, *dims):

/usr/local/lib/python2.7/dist-packages/xray/core/variable.pyc in __getitem__(self, key)
    830 def __getitem__(self, key):
    831 key = self._item_key_to_tuple(key)
--> 832 values = self._indexable_data[key]
    833 if not hasattr(values, 'ndim') or values.ndim == 0:
    834 return Variable((), values, self._attrs, self._encoding)

/usr/local/lib/python2.7/dist-packages/xray/core/indexing.pyc in __getitem__(self, key)
    398 value = np.timedelta64(getattr(value, 'value', value), 'ns')
    399 else:
--> 400 value = np.asarray(value, dtype=self.dtype)
    401 else:
    402 value = PandasIndexAdapter(self.array[key], dtype=self.dtype)

/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
    472
    473 """"""
--> 474 return array(a, dtype, copy=False, order=order)
    475
    476 def asanyarray(a, dtype=None, order=None):

TypeError: long() argument must be a string or a number, not 'pandas._period.Period'
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260