html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1132#issuecomment-262557502,https://api.github.com/repos/pydata/xarray/issues/1132,262557502,MDEyOklzc3VlQ29tbWVudDI2MjU1NzUwMg==,3404817,2016-11-23T16:06:46Z,2016-11-23T16:06:46Z,CONTRIBUTOR,"Great, `safe_cast_to_index` works nicely (it passes my test). I added the change to the existing PR. Do we need to add more groupby tests to make sure the solution is safe for other cases (e.g. other data types)?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,190683531
https://github.com/pydata/xarray/issues/1132#issuecomment-262233644,https://api.github.com/repos/pydata/xarray/issues/1132,262233644,MDEyOklzc3VlQ29tbWVudDI2MjIzMzY0NA==,3404817,2016-11-22T12:56:03Z,2016-11-22T13:21:04Z,CONTRIBUTOR,"OK, here is the minimal example:

```python
import xarray as xr
import pandas as pd


def test_groupby_da_datetime():
    """"""groupby with a DataArray of dtype datetime""""""
    # create test data
    times = pd.date_range('2000-01-01', periods=4)
    foo = xr.DataArray([1, 2, 3, 4], coords=dict(time=times), dims='time')

    # create test index: label each time step with a reference date
    # (index times directly; DatetimeIndex.to_datetime() is deprecated)
    reference_dates = [times[0], times[2]]
    labels = reference_dates[0:1] * 2 + reference_dates[1:2] * 2
    ind = xr.DataArray(labels, coords=dict(time=times), dims='time',
                       name='reference_date')

    # group foo by ind
    g = foo.groupby(ind)

    # check result
    actual = g.sum(dim='time')
    expected = xr.DataArray([3, 7],
                            coords=dict(reference_date=reference_dates),
                            dims='reference_date')
    assert actual.to_dataset(name='foo').equals(expected.to_dataset(name='foo'))
```

While writing it, I found out that the problem only occurs when the DataArray used with `groupby` has **`dtype=datetime64[ns]`**.
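
For comparison, here is a minimal sketch, assuming only NumPy and pandas (no xarray), showing that the same `datetime64[ns]` labels factorize without trouble when passed to `pd.factorize` as a plain NumPy array. The per-group sum via `np.bincount` is a hypothetical stand-in for what `groupby(...).sum()` computes, not xarray's actual implementation:

```python
import numpy as np
import pandas as pd

# Same test data as in the example: four daily values, two reference dates
times = pd.date_range('2000-01-01', periods=4)
data = np.array([1, 2, 3, 4])

# Plain datetime64[ns] NumPy array of labels (not a DataArray)
labels = times.values[[0, 0, 2, 2]]

# pd.factorize maps each label to an integer group code
codes, uniques = pd.factorize(labels)

# Hypothetical stand-in for groupby-sum: add up data per group code
sums = np.bincount(codes, weights=data)
print(codes, sums)  # [0 0 1 1] [3. 7.]
```
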
The problem is that we effectively feed the DataArray to [`pd.factorize`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.factorize.html), which works for most data types. For our `datetime64[ns]` labels, however, Pandas first uses the function [`needs_i8_conversion`](https://github.com/pandas-dev/pandas/blob/v0.19.1/pandas/types/common.py#L248-L251) to check whether the values must be converted to `int64` before factorizing, and it decides YES. Then [in `pd.factorize`](https://github.com/pandas-dev/pandas/blob/v0.19.1/pandas/core/algorithms.py#L295-L307) it fails, because it tries to access `DataArray.view` to perform that conversion to `int64`, and DataArrays have no `.view`.

So as I see it, there are three possible solutions:

1. Make Pandas' `pd.factorize` handle our `datetime` DataArrays better,
2. Add a `.view` method to DataArrays, or
3. Use the solution in the above PR, which means feeding only the NumPy `.values` to `pd.factorize`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,190683531