html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2191#issuecomment-465294992,https://api.github.com/repos/pydata/xarray/issues/2191,465294992,MDEyOklzc3VlQ29tbWVudDQ2NTI5NDk5Mg==,23510121,2019-02-19T20:22:28Z,2019-02-19T20:22:28Z,NONE,"@spencerkclark 
Very helpful!!! Thanks a million! :) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-465289567,https://api.github.com/repos/pydata/xarray/issues/2191,465289567,MDEyOklzc3VlQ29tbWVudDQ2NTI4OTU2Nw==,6628425,2019-02-19T20:06:15Z,2019-02-19T20:06:15Z,MEMBER,"@zzheng93 sure thing!

> I hope NCAR will support the next release of xarray.

I know you didn't ask for help with this, but I can't resist :) -- I recommend you set up your own Python environment on Cheyenne.  This is nice because it gives you full control over the packages you install (so you don't need to wait until someone else installs them for you).  A good place to start on how to do this is the [""Getting started with Pangeo on HPC""](http://pangeo.io/setup_guides/hpc.html#getting-started-with-pangeo-on-hpc) page on the Pangeo website.

> A follow-up question: when we use xarray to manipulate a large dataset such as <xarray.DataArray (time: 14600, lat: 192, lon: 288)> and want to save the results for later machine learning applications (e.g., using sklearn or XGBoost, or even deep learning), what would be a **good format** for storing the data on a server or local machine so that sklearn or XGBoost can read it easily?

I think with some more specific details regarding what you are looking to do, this could potentially be a good question to ask in the (relatively new) [pangeo-data/ml-workflow-examples](https://github.com/pangeo-data/ml-workflow-examples) repo, where they are discussing machine learning workflows connected to xarray.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-464953041,https://api.github.com/repos/pydata/xarray/issues/2191,464953041,MDEyOklzc3VlQ29tbWVudDQ2NDk1MzA0MQ==,23510121,2019-02-19T02:22:22Z,2019-02-19T02:22:58Z,NONE,"@spencerkclark Thank you very much for your help! I will install the development version on my local machine.
Currently I am using NCAR Cheyenne to manipulate the climate data. As a workaround on Cheyenne, I convert the time index to a pandas `DatetimeIndex` and then resample (here `ds` stands for my dataset):
```python
ds = ds.assign_coords(time=ds.indexes['time'].to_datetimeindex())
ds = ds.resample(time=""D"").mean(""time"")
```
I hope NCAR will support the next release of xarray.
A follow-up question: when we use xarray to manipulate a large dataset such as <xarray.DataArray (time: 14600, lat: 192, lon: 288)> and want to save the results for later machine learning applications (e.g., using sklearn or XGBoost, or even deep learning), what would be a **good format** for storing the data on a server or local machine so that sklearn or XGBoost can read it easily?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-464949490,https://api.github.com/repos/pydata/xarray/issues/2191,464949490,MDEyOklzc3VlQ29tbWVudDQ2NDk0OTQ5MA==,6628425,2019-02-19T02:04:39Z,2019-02-19T02:04:39Z,MEMBER,"@zzheng93 welcome!  One way to install the development version is to clone this repo, and do an editable install:
```
$ git clone https://github.com/pydata/xarray.git
$ cd xarray
$ pip install -e .
```
Then using resample with a daily frequency would look something like:
```
In [1]: import xarray as xr

In [2]: times = xr.cftime_range('2000', periods=4, freq='12H')

In [3]: times
Out[3]:
CFTimeIndex([2000-01-01 00:00:00, 2000-01-01 12:00:00, 2000-01-02 00:00:00,
             2000-01-02 12:00:00],
            dtype='object')

In [4]: da = xr.DataArray(range(4), [('time', times)])

In [5]: da.resample(time='D').mean()
Out[5]:
<xarray.DataArray (time: 2)>
array([0.5, 2.5])
Coordinates:
  * time     (time) object 2000-01-01 00:00:00 2000-01-02 00:00:00
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-464923777,https://api.github.com/repos/pydata/xarray/issues/2191,464923777,MDEyOklzc3VlQ29tbWVudDQ2NDkyMzc3Nw==,23510121,2019-02-18T23:46:46Z,2019-02-18T23:46:59Z,NONE,"> @zzheng93 this will be possible in the next release of xarray, so not quite yet, but soon. If you're in a hurry you could install the development version.


@spencerkclark Thank you very much :)
I am new to the xarray community. Are there instructions for installing the latest development version, and an example of how to do the daily resampling?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-464890837,https://api.github.com/repos/pydata/xarray/issues/2191,464890837,MDEyOklzc3VlQ29tbWVudDQ2NDg5MDgzNw==,6628425,2019-02-18T21:43:34Z,2019-02-18T21:43:34Z,MEMBER,"@zzheng93 this will be possible in the next release of xarray, so not quite yet, but soon.  If you're in a hurry you could install the development version.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-464875401,https://api.github.com/repos/pydata/xarray/issues/2191,464875401,MDEyOklzc3VlQ29tbWVudDQ2NDg3NTQwMQ==,23510121,2019-02-18T20:56:02Z,2019-02-18T20:56:02Z,NONE,"Hi folks,
I have some data like
2000-01-01 00:00:00, 2000-01-01 12:00:00,
2000-01-02 00:00:00, 2000-01-02 12:00:00.
The time index is a cftime index,
and I want to average the values that fall on the same date and save the results.
I am wondering if it is possible to resample them at a daily level (e.g., the results will be 2000-01-01 00:00:00 and 2000-01-02 00:00:00)?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-460046479,https://api.github.com/repos/pydata/xarray/issues/2191,460046479,MDEyOklzc3VlQ29tbWVudDQ2MDA0NjQ3OQ==,6628425,2019-02-03T12:16:21Z,2019-02-03T12:16:21Z,MEMBER,This has been implemented in #2593 🎉.,"{""total_count"": 2, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 2, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-395082238,https://api.github.com/repos/pydata/xarray/issues/2191,395082238,MDEyOklzc3VlQ29tbWVudDM5NTA4MjIzOA==,6628425,2018-06-06T14:09:56Z,2018-10-19T19:38:56Z,MEMBER,"When the time coordinate contains `np.datetime64` objects I recommend using resample directly, because the underlying index will be a pandas `DatetimeIndex` (so you just need some logic to detect if that's the case).  
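
As a minimal sketch of that check (the sample data and the names `times`, `da`, `is_datetime`, and `monthly` are only for illustration), one could inspect the index type and then call resample directly:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Sixty made-up daily values starting on 2000-01-01 (np.datetime64 times).
times = pd.date_range('2000-01-01', periods=60, freq='D')
da = xr.DataArray(np.arange(60.0), coords=[times], dims=['time'])

# np.datetime64 time coordinates are backed by a pandas DatetimeIndex.
index = da.indexes['time']
is_datetime = isinstance(index, pd.DatetimeIndex)

# With a DatetimeIndex, resampling to month-start frequency works directly.
monthly = da.resample(time='MS').mean()
```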

I think the most general workaround for right now would probably look something like the example below.  This has the property that it preserves the underlying calendar type of the time index.
```python
import pandas as pd
import xarray as xr

def resample_ms_freq(ds, dim='time'):
    """"""Resample the dataset to 'MS' frequency regardless of the
    calendar used.
    
    Parameters
    ----------
    ds : Dataset
        Dataset to be resampled
    dim : str
        Dimension name associated with the time index
        
    Returns
    -------
    Dataset
    """"""
    index = ds.indexes[dim]
    if isinstance(index, pd.DatetimeIndex):
        return ds.resample(**{dim: 'MS'}).mean(dim)
    elif isinstance(index, xr.CFTimeIndex):
        date_type = index.date_type
        month_start = [date_type(date.year, date.month, 1) for date in ds[dim].values]
        ms = xr.DataArray(month_start, coords=ds[dim].coords)
        ds = ds.assign_coords(MS=ms)
        return ds.groupby('MS').mean(dim).rename({'MS': dim})
    else:
        raise TypeError(
            'Resampling to month start frequency requires using a time index of either '
            'type pd.DatetimeIndex or xr.CFTimeIndex.')

with xr.set_options(enable_cftimeindex=True):
    ds = xr.open_mfdataset(files)
resampled = resample_ms_freq(ds)
```","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-426334003,https://api.github.com/repos/pydata/xarray/issues/2191,426334003,MDEyOklzc3VlQ29tbWVudDQyNjMzNDAwMw==,6628425,2018-10-02T16:10:51Z,2018-10-02T16:10:51Z,MEMBER,"Thanks @shoyer for getting things started!  @huard your help would be very much appreciated in implementing this.  As mentioned in https://github.com/pydata/xarray/issues/2437#issuecomment-424395224, this is one of the biggest remaining gaps in functionality between xarray objects indexed by a CFTimeIndex and xarray objects indexed by a DatetimeIndex.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-426324533,https://api.github.com/repos/pydata/xarray/issues/2191,426324533,MDEyOklzc3VlQ29tbWVudDQyNjMyNDUzMw==,1217238,2018-10-02T15:45:08Z,2018-10-02T15:45:08Z,MEMBER,Take a look at https://github.com/pydata/xarray/pull/2458 for a very basic version of this.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-426035957,https://api.github.com/repos/pydata/xarray/issues/2191,426035957,MDEyOklzc3VlQ29tbWVudDQyNjAzNTk1Nw==,81219,2018-10-01T19:38:44Z,2018-10-01T19:38:44Z,CONTRIBUTOR,I'm trying to wrap my head around what is needed to get the resample method to work but I must say I'm confused. Would it be possible/practical to create a branch with stubs in the code for the methods that need to be written (with a #2191 comment) so newbies can help fill in the gaps? ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-399337976,https://api.github.com/repos/pydata/xarray/issues/2191,399337976,MDEyOklzc3VlQ29tbWVudDM5OTMzNzk3Ng==,1217238,2018-06-22T06:42:03Z,2018-06-22T06:42:03Z,MEMBER,"Yes, that would probably be a good idea.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-399320016,https://api.github.com/repos/pydata/xarray/issues/2191,399320016,MDEyOklzc3VlQ29tbWVudDM5OTMyMDAxNg==,6063709,2018-06-22T04:51:16Z,2018-06-22T04:51:16Z,CONTRIBUTOR,"Does this need its own issue then, so it doesn't get lost?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-399315302,https://api.github.com/repos/pydata/xarray/issues/2191,399315302,MDEyOklzc3VlQ29tbWVudDM5OTMxNTMwMg==,6063709,2018-06-22T04:12:11Z,2018-06-22T04:45:03Z,CONTRIBUTOR,"I'm not sure if my issue belongs in here, but I didn't want to create a new Issue (there are already 455 open ones).

I am experimenting with the new `CFTimeIndex` functionality (thanks heaps BTW! That was a mammoth effort if the PR thread is anything to go by).

I am trying to `shift` a time index as I need to align datasets to a common start point. So using the example code above,

```python
da.time.get_index('time').shift(1,'D')
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-71-db48b2fbb340> in <module>()
----> 1 da.time.get_index('time').shift(1,'D')

/g/data3/hh5/public/apps/miniconda3/envs/analysis27-18.04/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in shift(self, periods, freq)
   2627         """"""
   2628         raise NotImplementedError(""Not supported for type %s"" %
-> 2629                                   type(self).__name__)
   2630 
   2631     def argsort(self, *args, **kwargs):

NotImplementedError: Not supported for type CFTimeIndex
```
Is this not implemented because it might require resampling?

I ask because this works:
```python
times[0] + pd.Timedelta('365 days')
cftime.DatetimeNoLeap(2, 1, 1, 0, 0, 0, 0, -1, 1)
```

I guess I am asking: if I want to shift a time index, is the best (only?) way currently to loop over all the individual elements of the index and add a time offset to each?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-399316316,https://api.github.com/repos/pydata/xarray/issues/2191,399316316,MDEyOklzc3VlQ29tbWVudDM5OTMxNjMxNg==,1217238,2018-06-22T04:20:48Z,2018-06-22T04:20:48Z,MEMBER,"shift() is different from resampling, but indeed it looks like we’ll need
to add it manually to CFTimeIndex.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-395067197,https://api.github.com/repos/pydata/xarray/issues/2191,395067197,MDEyOklzc3VlQ29tbWVudDM5NTA2NzE5Nw==,31460695,2018-06-06T13:25:11Z,2018-06-06T13:25:11Z,NONE,"Yes, when open_mfdataset decides to convert to CFTime this is much faster. When time is in datetime64, I get:
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-72-a96fa0263d3e> in <module>()
      9     dss = xr.open_mfdataset(files,decode_times=True,autoclose=True)
     10     #month_start = [DatetimeNoLeap(date.dt.year, date.dt.month, 1) for date in dss.time]
---> 11     month_start = [DatetimeNoLeap(date.year, date.month, 1) for date in dss.time.values]
     12     #month_start = [DatetimeNoLeap(yr, mon, 1) for yr,mon in zip(dss.time.dt.year,dss.time.dt.month)]
     13     #break

<ipython-input-72-a96fa0263d3e> in <listcomp>(.0)
      9     dss = xr.open_mfdataset(files,decode_times=True,autoclose=True)
     10     #month_start = [DatetimeNoLeap(date.dt.year, date.dt.month, 1) for date in dss.time]
---> 11     month_start = [DatetimeNoLeap(date.year, date.month, 1) for date in dss.time.values]
     12     #month_start = [DatetimeNoLeap(yr, mon, 1) for yr,mon in zip(dss.time.dt.year,dss.time.dt.month)]
     13     #break

AttributeError: 'numpy.datetime64' object has no attribute 'year'
```
You can see I made a feeble attempt to fix it to work for all the CMIP5 calendars, but it is just as slow. Any suggestions?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-394898828,https://api.github.com/repos/pydata/xarray/issues/2191,394898828,MDEyOklzc3VlQ29tbWVudDM5NDg5ODgyOA==,6628425,2018-06-06T00:07:10Z,2018-06-06T00:07:10Z,MEMBER,"Indeed what I had above is quite slow!

```python
In [6]: %%timeit
   ...: month_start = [DatetimeNoLeap(date.dt.year, date.dt.month, 1) for date in da.time]
   ...:
1 loop, best of 3: 588 ms per loop
```

Iterating over the contents of `da.time` generates DataArray instances encapsulating single dates.  We can iterate over the dates themselves directly, which is much (over 1000x) faster:

```python
In [7]: %%timeit
   ...: month_start = [DatetimeNoLeap(date.year, date.month, 1) for date in da.time.values]
   ...:
1000 loops, best of 3: 302 µs per loop
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-394890878,https://api.github.com/repos/pydata/xarray/issues/2191,394890878,MDEyOklzc3VlQ29tbWVudDM5NDg5MDg3OA==,31460695,2018-06-05T23:20:00Z,2018-06-05T23:20:00Z,NONE,"@spencerkclark thanks!  I hadn't figured out that particular workaround, but it works, albeit quite slowly. For now it will get me to the next step, but just changing the dates to the first of the month takes longer than regridding all models to a common grid!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-394839627,https://api.github.com/repos/pydata/xarray/issues/2191,394839627,MDEyOklzc3VlQ29tbWVudDM5NDgzOTYyNw==,6628425,2018-06-05T19:56:30Z,2018-06-05T19:56:30Z,MEMBER,"@naomi-henderson thanks!  In the meantime here's a possible workaround, in case you haven't figured one out already:
```python
import numpy as np
import xarray as xr

from cftime import num2date, DatetimeNoLeap


times = num2date(np.arange(730), calendar='noleap', units='days since 0001-01-01')
da = xr.DataArray(np.arange(730), coords=[times], dims=['time'])

month_start = [DatetimeNoLeap(date.dt.year, date.dt.month, 1) for date in da.time]
da['MS'] = xr.DataArray(month_start, coords=da.time.coords)
resampled = da.groupby('MS').mean('time').rename({'MS': 'time'})
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-394827475,https://api.github.com/repos/pydata/xarray/issues/2191,394827475,MDEyOklzc3VlQ29tbWVudDM5NDgyNzQ3NQ==,31460695,2018-06-05T19:15:09Z,2018-06-05T19:15:09Z,NONE,"I am trying to combine the monthly CMIP5 rcp85 ts datasets (which go past 2064 AD) with their myriad calendars, so I love the new CFTimeIndex! But I need `resample(time='MS')` in order to force them all to start on the first of each month.
Thanks!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-392589537,https://api.github.com/repos/pydata/xarray/issues/2191,392589537,MDEyOklzc3VlQ29tbWVudDM5MjU4OTUzNw==,1217238,2018-05-28T19:16:24Z,2018-05-28T19:16:24Z,MEMBER,"Yes, I think so. The main thing we need is a function to map from datetime -> datetime at start of frequency.","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588