html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2191#issuecomment-465289567,https://api.github.com/repos/pydata/xarray/issues/2191,465289567,MDEyOklzc3VlQ29tbWVudDQ2NTI4OTU2Nw==,6628425,2019-02-19T20:06:15Z,2019-02-19T20:06:15Z,MEMBER,"@zzheng93 sure thing!

> I hope NCAR will support the next release of xarray.

I know you didn't ask for help with this, but I can't resist :) -- I recommend you set up your own Python environment on Cheyenne. This is nice because it gives you full control over the packages you install (so you don't need to wait until someone else installs them for you). A good place to start on how to do this is the [""Getting started with Pangeo on HPC""](http://pangeo.io/setup_guides/hpc.html#getting-started-with-pangeo-on-hpc) page on the Pangeo website.

> A follow-up question is that when we using xarray to manipulate the large dataset such as <xarray.DataArray (time: 14600, lat: 192, lon: 288)> and want to save the results for further machine learning applications (e.g., using sklearn or XGBoost, even deep learning), what will be a **good format** to store the data on server or local machine that will be easily used by sklearn or XGBoost?

I think with some more specific details regarding what you are looking to do, this could potentially be a good question to ask in the (relatively new) [pangeo-data/ml-workflow-examples](https://github.com/pangeo-data/ml-workflow-examples) repo, where they are discussing machine learning workflows connected to xarray.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-464949490,https://api.github.com/repos/pydata/xarray/issues/2191,464949490,MDEyOklzc3VlQ29tbWVudDQ2NDk0OTQ5MA==,6628425,2019-02-19T02:04:39Z,2019-02-19T02:04:39Z,MEMBER,"@zzheng93 welcome! One way to install the development version is to clone this repo, and do an editable install:

```
$ git clone https://github.com/pydata/xarray.git
$ cd xarray
$ pip install -e .
```

Then using resample with a daily frequency would look something like:

```
In [1]: import xarray as xr

In [2]: times = xr.cftime_range('2000', periods=4, freq='12H')

In [3]: times
Out[3]:
CFTimeIndex([2000-01-01 00:00:00, 2000-01-01 12:00:00, 2000-01-02 00:00:00, 2000-01-02 12:00:00], dtype='object')

In [4]: da = xr.DataArray(range(4), [('time', times)])

In [5]: da.resample(time='D').mean()
Out[5]:
<xarray.DataArray (time: 2)>
array([0.5, 2.5])
Coordinates:
  * time (time) object 2000-01-01 00:00:00 2000-01-02 00:00:00
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-464890837,https://api.github.com/repos/pydata/xarray/issues/2191,464890837,MDEyOklzc3VlQ29tbWVudDQ2NDg5MDgzNw==,6628425,2019-02-18T21:43:34Z,2019-02-18T21:43:34Z,MEMBER,"@zzheng93 this will be possible in the next release of xarray, so not quite yet, but soon.
If you're in a hurry you could install the development version.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-460046479,https://api.github.com/repos/pydata/xarray/issues/2191,460046479,MDEyOklzc3VlQ29tbWVudDQ2MDA0NjQ3OQ==,6628425,2019-02-03T12:16:21Z,2019-02-03T12:16:21Z,MEMBER,This has been implemented in #2593 🎉.,"{""total_count"": 2, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 2, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-395082238,https://api.github.com/repos/pydata/xarray/issues/2191,395082238,MDEyOklzc3VlQ29tbWVudDM5NTA4MjIzOA==,6628425,2018-06-06T14:09:56Z,2018-10-19T19:38:56Z,MEMBER,"When the time coordinate contains `np.datetime64` objects I recommend using resample directly, because the underlying index will be a pandas `DatetimeIndex` (so you just need some logic to detect if that's the case). I think the most general workaround for right now would probably look something like the example below. This has the property that it preserves the underlying calendar type of the time index.

```python
import pandas as pd
import xarray as xr


def resample_ms_freq(ds, dim='time'):
    """"""Resample the dataset to 'MS' frequency regardless of the calendar used.

    Parameters
    ----------
    ds : Dataset
        Dataset to be resampled
    dim : str
        Dimension name associated with the time index

    Returns
    -------
    Dataset
    """"""
    index = ds.indexes[dim]
    if isinstance(index, pd.DatetimeIndex):
        return ds.resample(**{dim: 'MS'}).mean(dim)
    elif isinstance(index, xr.CFTimeIndex):
        date_type = index.date_type
        month_start = [date_type(date.year, date.month, 1) for date in ds[dim].values]
        ms = xr.DataArray(month_start, coords=ds[dim].coords)
        ds = ds.assign_coords(MS=ms)
        return ds.groupby('MS').mean(dim).rename({'MS': dim})
    else:
        raise TypeError(
            'Resampling to month start frequency requires using a time index of either '
            'type pd.DatetimeIndex or xr.CFTimeIndex.')


with xr.set_options(enable_cftimeindex=True):
    ds = xr.open_mfdataset(files)
resampled = resample_ms_freq(ds)
```","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-426334003,https://api.github.com/repos/pydata/xarray/issues/2191,426334003,MDEyOklzc3VlQ29tbWVudDQyNjMzNDAwMw==,6628425,2018-10-02T16:10:51Z,2018-10-02T16:10:51Z,MEMBER,"Thanks @shoyer for getting things started! @huard your help would be very much appreciated in implementing this. As mentioned in https://github.com/pydata/xarray/issues/2437#issuecomment-424395224, this is one of the biggest remaining gaps in functionality between xarray objects indexed by a CFTimeIndex and xarray objects indexed by a DatetimeIndex.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-394898828,https://api.github.com/repos/pydata/xarray/issues/2191,394898828,MDEyOklzc3VlQ29tbWVudDM5NDg5ODgyOA==,6628425,2018-06-06T00:07:10Z,2018-06-06T00:07:10Z,MEMBER,"Indeed what I had above is quite slow!
```python
In [6]: %%timeit
   ...: month_start = [DatetimeNoLeap(date.dt.year, date.dt.month, 1) for date in da.time]
   ...:
1 loop, best of 3: 588 ms per loop
```

Iterating over the contents of `da.time` generates DataArray instances encapsulating single dates. We can iterate over the dates themselves directly, which is much (over 1000x) faster:

```python
In [7]: %%timeit
   ...: month_start = [DatetimeNoLeap(date.year, date.month, 1) for date in da.time.values]
   ...:
1000 loops, best of 3: 302 µs per loop
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588
https://github.com/pydata/xarray/issues/2191#issuecomment-394839627,https://api.github.com/repos/pydata/xarray/issues/2191,394839627,MDEyOklzc3VlQ29tbWVudDM5NDgzOTYyNw==,6628425,2018-06-05T19:56:30Z,2018-06-05T19:56:30Z,MEMBER,"@naomi-henderson thanks! In the meantime here's a possible workaround, in case you haven't figured one out already:

```python
import numpy as np
import xarray as xr
from cftime import num2date, DatetimeNoLeap

times = num2date(np.arange(730), calendar='noleap', units='days since 0001-01-01')
da = xr.DataArray(np.arange(730), coords=[times], dims=['time'])
month_start = [DatetimeNoLeap(date.dt.year, date.dt.month, 1) for date in da.time]
da['MS'] = xr.DataArray(month_start, coords=da.time.coords)
resampled = da.groupby('MS').mean('time').rename({'MS': 'time'})
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,327089588