html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/645#issuecomment-164635815,https://api.github.com/repos/pydata/xarray/issues/645,164635815,MDEyOklzc3VlQ29tbWVudDE2NDYzNTgxNQ==,1217238,2015-12-15T03:39:49Z,2015-12-15T03:39:49Z,MEMBER,"Lazy loading, even of indices, can be pretty important -- sometimes calculating indices requires downloading a significant amount of data over the wire. I am reluctant to change it.

However, another possible way to fix the printing issue is to guarantee that index data always gets cast to a `pandas.Index` before accessing it, even if the next step is simply pulling out `.values` (a numpy array).
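
A minimal sketch of that idea, assuming a hypothetical helper `index_values` (not part of xray): route whatever backs the coordinate through `pandas.Index` first, then take `.values`. This forces the data into memory at the point of the call, so it only settles the type of what comes back; it does not change when loading happens.

```python
import numpy as np
import pandas as pd


def index_values(data):
    # Hypothetical helper: always build a pandas.Index first, so that
    # pandas-specific types (e.g. PeriodIndex) are handled by pandas itself,
    # and only then pull out the underlying values.
    return pd.Index(data).values


period_vals = index_values(pd.period_range('2000', periods=3, freq='D'))
int_vals = index_values(np.arange(3))
```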
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-164093345,https://api.github.com/repos/pydata/xarray/issues/645,164093345,MDEyOklzc3VlQ29tbWVudDE2NDA5MzM0NQ==,5635139,2015-12-12T01:21:12Z,2015-12-12T06:03:10Z,MEMBER,"@shoyer Coming back to this: 

> The main subtlety is that currently we don't actually create the pandas.Index object until we load the entire index array into memory or need to do a lookup operation on the index. I'm not entirely sure this laziness is necessary, but it might be helpful -- it lets us defer loading some data from disk (or remote data sources) until absolutely necessary.

How expensive / inconvenient would it be to force only the coordinate `dim`s to be eagerly loaded? There are some real benefits of having the coordinate `dim`s be pandas Indexes, such as string coercion on dates, full PeriodIndex support, and tz support (and would it make some slicing & MultiIndexes easier too?). If the adaptors are there only to enable lazy loading, is that a worthwhile tradeoff?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-154212825,https://api.github.com/repos/pydata/xarray/issues/645,154212825,MDEyOklzc3VlQ29tbWVudDE1NDIxMjgyNQ==,5635139,2015-11-05T22:17:31Z,2015-11-05T22:17:31Z,MEMBER,"On reflection I wonder how difficult it would be to have a mapping of numpy dtypes to pandas indexes (there are five or so), and then a mapping of pandas indexes to dtypes. The full list is here: http://pandas.pydata.org/pandas-docs/stable/basics.html#selecting-columns-based-on-dtype.
Then coords could (almost?) completely delegate to the pandas `Index`.
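
A rough sketch of what such a mapping might look like, keyed on the numpy dtype `kind`. The names `KIND_TO_INDEX` and `expected_index_type` are made up for illustration, and the table only covers the kinds I'm sure have a dedicated index class; everything else falls back to the generic `pd.Index`.

```python
import numpy as np
import pandas as pd

# Illustrative only: numpy dtype kinds that get a dedicated pandas index class.
KIND_TO_INDEX = {'M': pd.DatetimeIndex, 'm': pd.TimedeltaIndex}


def expected_index_type(dtype):
    # Look up the index class from the dtype alone, without touching any data.
    # Unknown kinds fall back to the generic pandas.Index.
    return KIND_TO_INDEX.get(np.dtype(dtype).kind, pd.Index)


dates = pd.Index(np.array(['2000-01-01', '2000-01-02'], dtype='datetime64[ns]'))
```

Because the lookup only needs the dtype, it could in principle run before any index data is loaded.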

Regardless, let me finish up that PR and the wider issue can marinate.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-154195290,https://api.github.com/repos/pydata/xarray/issues/645,154195290,MDEyOklzc3VlQ29tbWVudDE1NDE5NTI5MA==,1217238,2015-11-05T21:16:32Z,2015-11-05T21:16:32Z,MEMBER,"yes, exactly

On Thu, Nov 5, 2015 at 1:10 PM, Maximilian Roos notifications@github.com
wrote:

> OK, because we need the dtype before we've loaded the Index?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-154193942,https://api.github.com/repos/pydata/xarray/issues/645,154193942,MDEyOklzc3VlQ29tbWVudDE1NDE5Mzk0Mg==,5635139,2015-11-05T21:10:42Z,2015-11-05T21:10:42Z,MEMBER,"OK, because we need the `dtype` before we've loaded the `Index`?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-154189059,https://api.github.com/repos/pydata/xarray/issues/645,154189059,MDEyOklzc3VlQ29tbWVudDE1NDE4OTA1OQ==,1217238,2015-11-05T20:59:08Z,2015-11-05T20:59:08Z,MEMBER,"> Ha - maybe we'll never get there. One more push: in the comment above, can you see the differences between the two cases? One succeeds and one fails. The only difference is the length of the other coord. That's at least weird if not a bug?

Oh -- yes, I agree that is very strange. I have no idea why that is!

> What are your thoughts on making that change instead, then? Or too big a blast radius without more reflection? Currently only one test fails - setting a float32 dtype.

I think this will be a little tricky to change. The main subtlety is that currently we don't actually create the `pandas.Index` object until we load the entire index array into memory or need to do a lookup operation on the index. I'm not entirely sure this laziness is necessary, but it _might_ be helpful -- it lets us defer loading some data from disk (or remote data sources) until absolutely necessary. The challenge then is ensuring that dtypes are preserved whether or not the array is cached -- we would need to figure out what the corresponding pandas type is even before we load the data necessary to create the Index object.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-154184592,https://api.github.com/repos/pydata/xarray/issues/645,154184592,MDEyOklzc3VlQ29tbWVudDE1NDE4NDU5Mg==,5635139,2015-11-05T20:42:57Z,2015-11-05T20:42:57Z,MEMBER,"> When I originally wrote that code, pandas didn't have Float64Index and would use dtype=object. Now, the need for this sort of thing is definitely less pressing.

What are your thoughts on making that change instead, then? Or too big a blast radius without more reflection? Currently only one test fails - setting a `float32` dtype.

> Sorry, I still don't understand exactly what you're referring to! This does sound pretty bizarre, though -- possibly a bug.

Ha - maybe we'll never get there. One more push: in the comment [above](https://github.com/xray/xray/issues/645#issuecomment-153967536), can you see the differences between the two cases? One succeeds and one fails. The only difference is the length of the _other_ coord. That's at least weird if not a bug?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-154182414,https://api.github.com/repos/pydata/xarray/issues/645,154182414,MDEyOklzc3VlQ29tbWVudDE1NDE4MjQxNA==,1217238,2015-11-05T20:34:53Z,2015-11-05T20:34:53Z,MEMBER,"When I originally wrote that code, pandas didn't have `Float64Index` and would use `dtype=object`. Now, the need for this sort of thing is definitely less pressing.

> The unanswered question is why the code accesses the items from this coord when it's repr-ing differently, depending on the length of the other coord.

Sorry, I still don't understand exactly what you're referring to! This does sound pretty bizarre, though -- possibly a bug.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-154179449,https://api.github.com/repos/pydata/xarray/issues/645,154179449,MDEyOklzc3VlQ29tbWVudDE1NDE3OTQ0OQ==,5635139,2015-11-05T20:21:38Z,2015-11-05T20:21:38Z,MEMBER,"PR in for the pressing issue. It won't `repr` nicely, but it'll work. 

Re the main issue: I think that makes sense. So we need to support `dtype`s that `pd.Index` doesn't support? If we didn't, this could all be much simpler; for example this could be the `__getitem__` method (and maybe we could just have `Index` as a type, including `MultiIndex` etc):

``` python
    def __getitem__(self, key):
        if isinstance(key, tuple) and len(key) == 1:
            # unpack key so it can index a pandas.Index object (pandas.Index
            # objects don't like tuples)
            key, = key

        return self.array[key]
```

... but maybe there are data interfaces that need `float32` compatibility or similar.

Re:

> I'm not entirely sure what you're referring to here -- which line(s) of code is surprising you?

The unanswered question is why the code accesses the items from this coord when it's repr-ing differently, depending on the length of the _other_ coord.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-154149878,https://api.github.com/repos/pydata/xarray/issues/645,154149878,MDEyOklzc3VlQ29tbWVudDE1NDE0OTg3OA==,1217238,2015-11-05T18:46:32Z,2015-11-05T18:46:32Z,MEMBER,"> Do you know why this line https://github.com/xray/xray/blob/master/xray/core/indexing.py#L400 isn't just value?

This line is basically there to work around cases where pandas stores an array in an index with a different dtype. For example, consider this dataset with an int32 coordinate:

```
In [10]: xray.Dataset({'x': np.arange(3, dtype='int32')}).x.dtype
Out[10]: dtype('int32')
```

Under the covers, there's an int64 index (pandas doesn't have `Int32Index`):

```
In [11]: xray.Dataset({'x': np.arange(3, dtype='int32')}).indexes['x']
Out[11]: Int64Index([0, 1, 2], dtype='int64', name=u'x')
```

This line ensures that we cast back to the original dtype when we get `.values` from the data.

In this case, I think a simple fix for `PandasIndexAdapter` would be to update its `dtype` so it reports `object` instead of `int64` if it's holding a `PeriodIndex`. Then the casting should work properly.
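
Roughly, and only as a sketch: the class below is a hypothetical stand-in called `DtypePreservingIndex`, not the actual `PandasIndexAdapter` code. It remembers the dtype the user supplied, casts back to it in `.values`, and reports `object` whenever it holds a `PeriodIndex`.

```python
import numpy as np
import pandas as pd


class DtypePreservingIndex:
    # Hypothetical stand-in for an index adapter; names are illustrative.

    def __init__(self, array, dtype=None):
        self.array = pd.Index(array)
        if isinstance(self.array, pd.PeriodIndex):
            # Period scalars are Python objects, so report object dtype
            # regardless of what the index itself claims.
            dtype = np.dtype(object)
        elif dtype is None:
            dtype = self.array.dtype
        self.dtype = np.dtype(dtype)

    @property
    def values(self):
        # Cast back to the remembered dtype rather than whatever dtype
        # pandas chose to store the index with.
        return np.asarray(self.array, dtype=self.dtype)


ints = DtypePreservingIndex(np.arange(3, dtype='int32'), dtype='int32')
periods = DtypePreservingIndex(pd.period_range('2000', periods=3, freq='D'))
```

With this, `ints.values` comes back as `int32` even if pandas stores the index with a wider integer type, and `periods.values` is an object array of `Period` scalars instead of a failed integer cast.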

> Do you know why it's trying to pull a value from the index when it prints?

I'm not entirely sure what you're referring to here -- which line(s) of code is surprising you?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-153985189,https://api.github.com/repos/pydata/xarray/issues/645,153985189,MDEyOklzc3VlQ29tbWVudDE1Mzk4NTE4OQ==,5635139,2015-11-05T08:22:28Z,2015-11-05T08:46:35Z,MEMBER,"Yes, `.values` is a real headache on `PeriodIndex`... A `Period` `dtype` would be great, although unlikely to happen soon I'd guess. In the meantime, I used `is_period_arraylike` rather than `dtype` to identify the type, IIRC.

Happy to have a go at this - at least to ensure it doesn't break while printing - could you give me an initial 'leg up'? Specifically:
- Do you know why it's trying to pull a value from the index when it prints? Its dependence on `n` seems particularly odd, since changing `n` doesn't actually change what's attempted to be displayed from that coord (shown below with `Timestamp`s to demonstrate what's displayed in either case)
- Do you know why this line https://github.com/xray/xray/blob/master/xray/core/indexing.py#L400 isn't just `value`? If we need `value` in a container, I think `.shallow_copy([value])` will work. But this leaves the question above unanswered.

``` python
In [149]:

n=100
m=3
xray.Dataset(
    variables = {
        'a': (['x', 'y'], np.random.rand(m,n)),
        'b': (['x', 'y'], np.random.rand(m,n))
        },
    coords = {
        'x': pd.date_range(start='2000', periods=m),
        'y': range(n),
    }

)
Out[149]:
<xray.Dataset>
Dimensions:  (x: 3, y: 100)
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * x        (x) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
Data variables:
    a        (x, y) float64 0.927 0.9906 0.1317 0.7665 0.4558 0.9502 0.1435 ...
    b        (x, y) float64 0.9084 0.5827 0.8724 0.1391 0.4529 0.6794 0.555 ... 
In [150]:

n=10
m=3
xray.Dataset(
    variables = {
        'a': (['x', 'y'], np.random.rand(m,n)),
        'b': (['x', 'y'], np.random.rand(m,n))
        },
    coords = {
        'x': pd.date_range(start='2000', periods=m),
        'y': range(n),
    }

)
Out[150]:
<xray.Dataset>
Dimensions:  (x: 3, y: 10)
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9
  * x        (x) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
Data variables:
    a        (x, y) float64 0.09265 0.4552 0.6755 0.5913 0.5198 0.2473 ...
    b        (x, y) float64 0.5253 0.04162 0.8621 0.2462 0.2081 0.4814 ...
```

Am excited for `IntervalIndex`!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-153980946,https://api.github.com/repos/pydata/xarray/issues/645,153980946,MDEyOklzc3VlQ29tbWVudDE1Mzk4MDk0Ng==,1217238,2015-11-05T07:55:27Z,2015-11-05T07:55:27Z,MEMBER,"I have not tried using xray with pandas's `PeriodIndex` before. On the whole, I'm not really a big fan of `PeriodIndex` -- `IntervalIndex` (https://github.com/pydata/pandas/pull/8707) will be a more general solution that allows for arbitrary interval bounds.

The broken thing about PeriodIndex is that it lies and claims to have `int64` dtype even though it consists of `Period` scalars:

```
In [3]: pd.period_range('2000', freq='Y', periods=3).dtype
Out[3]: dtype('int64')
```

I suppose pandas is unlikely to fix this in the immediate future (though I would argue that it really should). In the meantime, do you have any interest in working on a fix for this? I suspect this would be relatively straightforward -- you'll simply need a workaround or two to explicitly handle `PeriodIndex`.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260
https://github.com/pydata/xarray/issues/645#issuecomment-153967536,https://api.github.com/repos/pydata/xarray/issues/645,153967536,MDEyOklzc3VlQ29tbWVudDE1Mzk2NzUzNg==,5635139,2015-11-05T06:22:03Z,2015-11-05T06:22:03Z,MEMBER,"This error is graver. Is there a way to work with `PeriodIndex`es in the meantime?

``` python
In [142]:

n=10
xray.Dataset(
    variables = {
        'a': (['x', 'y'], np.random.rand(3,n)),
        'b': (['x', 'y'], np.random.rand(3,n))
        },
    coords = {
        'x': pd.period_range(start='2000', periods=3),
        'y': range(n),
    }

)
Out[142]:
<xray.Dataset>
Dimensions:  (x: 3, y: 10)
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9
  * x        (x) int64 10957 10958 10959
Data variables:
    a        (x, y) float64 0.9978 0.5963 0.3108 0.9992 0.4629 0.8929 0.9299 ...
    b        (x, y) float64 0.9923 0.8678 0.4767 0.2957 0.4157 0.8527 0.269 ...
```

Change `n` to 100, leaving everything else identical:

``` python
In [143]:

n=100
xray.Dataset(
    variables = {
        'a': (['x', 'y'], np.random.rand(3,n)),
        'b': (['x', 'y'], np.random.rand(3,n))
        },
    coords = {
        'x': pd.period_range(start='2000', periods=3),
        'y': range(n),
    }

)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/usr/local/lib/python2.7/dist-packages/IPython/core/formatters.pyc in __call__(self, obj)
    695                 type_pprinters=self.type_printers,
    696                 deferred_pprinters=self.deferred_printers)
--> 697             printer.pretty(obj)
    698             printer.flush()
    699             return stream.getvalue()

/usr/local/lib/python2.7/dist-packages/IPython/lib/pretty.pyc in pretty(self, obj)
    381                             if callable(meth):
    382                                 return meth(obj, self, cycle)
--> 383             return _default_pprint(obj, self, cycle)
    384         finally:
    385             self.end_group()

/usr/local/lib/python2.7/dist-packages/IPython/lib/pretty.pyc in _default_pprint(obj, p, cycle)
    501     if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
    502         # A user-provided repr. Find newlines and replace them with p.break_()
--> 503         _repr_pprint(obj, p, cycle)
    504         return
    505     p.begin_group(1, '<')

/usr/local/lib/python2.7/dist-packages/IPython/lib/pretty.pyc in _repr_pprint(obj, p, cycle)
    683     """"""A pprint that just redirects to the normal repr function.""""""
    684     # Find newlines and replace them with p.break_()
--> 685     output = repr(obj)
    686     for idx,output_line in enumerate(output.splitlines()):
    687         if idx:

/usr/local/lib/python2.7/dist-packages/xray/core/dataset.pyc in __repr__(self)
    885 
    886     def __repr__(self):
--> 887         return formatting.dataset_repr(self)
    888 
    889     @property

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in dataset_repr(ds)
    271 
    272     summary.append(coords_repr(ds.coords, col_width=col_width))
--> 273     summary.append(vars_repr(ds.data_vars, col_width=col_width))
    274     if ds.attrs:
    275         summary.append(attrs_repr(ds.attrs))

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in _mapping_repr(mapping, title, summarizer, col_width)
    208     summary = ['%s:' % title]
    209     if mapping:
--> 210         summary += [summarizer(k, v, col_width) for k, v in mapping.items()]
    211     else:
    212         summary += [EMPTY_REPR]

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in summarize_var(name, var, col_width)
    172 def summarize_var(name, var, col_width):
    173     show_values = _not_remote(var)
--> 174     return _summarize_var_or_coord(name, var, col_width, show_values)
    175 
    176 

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in _summarize_var_or_coord(name, var, col_width, show_values, marker, max_width)
    154     front_str = first_col + dims_str + ('%s ' % var.dtype)
    155     if show_values:
--> 156         values_str = format_array_flat(var, max_width - len(front_str))
    157     else:
    158         values_str = '...'

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in format_array_flat(items_ndarray, max_width)
    130     # print at least one item
    131     max_possibly_relevant = max(int(np.ceil(max_width / 2.0)), 1)
--> 132     relevant_items = first_n_items(items_ndarray, max_possibly_relevant)
    133     pprint_items = format_items(relevant_items)
    134 

/usr/local/lib/python2.7/dist-packages/xray/core/formatting.pyc in first_n_items(x, n_desired)
     53     if n_desired < x.size:
     54         indexer = _get_indexer_at_least_n_items(x.shape, n_desired)
---> 55         x = x[indexer]
     56     return np.asarray(x).flat[:n_desired]
     57 

/usr/local/lib/python2.7/dist-packages/xray/core/dataarray.pyc in __getitem__(self, key)
    370         else:
    371             # orthogonal array indexing
--> 372             return self.isel(**self._item_key_to_dict(key))
    373 
    374     def __setitem__(self, key, value):

/usr/local/lib/python2.7/dist-packages/xray/core/dataarray.pyc in isel(self, **indexers)
    537         DataArray.sel
    538         """"""
--> 539         ds = self._dataset.isel(**indexers)
    540         return self._with_replaced_dataset(ds)
    541 

/usr/local/lib/python2.7/dist-packages/xray/core/dataset.pyc in isel(self, **indexers)
   1008         for name, var in iteritems(self._variables):
   1009             var_indexers = dict((k, v) for k, v in indexers if k in var.dims)
-> 1010             variables[name] = var.isel(**var_indexers)
   1011         return self._replace_vars_and_dims(variables)
   1012 

/usr/local/lib/python2.7/dist-packages/xray/core/variable.pyc in isel(self, **indexers)
    494             if dim in indexers:
    495                 key[i] = indexers[dim]
--> 496         return self[tuple(key)]
    497 
    498     def transpose(self, *dims):

/usr/local/lib/python2.7/dist-packages/xray/core/variable.pyc in __getitem__(self, key)
    830     def __getitem__(self, key):
    831         key = self._item_key_to_tuple(key)
--> 832         values = self._indexable_data[key]
    833         if not hasattr(values, 'ndim') or values.ndim == 0:
    834             return Variable((), values, self._attrs, self._encoding)

/usr/local/lib/python2.7/dist-packages/xray/core/indexing.pyc in __getitem__(self, key)
    398                 value = np.timedelta64(getattr(value, 'value', value), 'ns')
    399             else:
--> 400                 value = np.asarray(value, dtype=self.dtype)
    401         else:
    402             value = PandasIndexAdapter(self.array[key], dtype=self.dtype)

/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
    472 
    473     """"""
--> 474     return array(a, dtype, copy=False, order=order)
    475 
    476 def asanyarray(a, dtype=None, order=None):

TypeError: long() argument must be a string or a number, not 'pandas._period.Period'
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,115210260