
issue_comments


80 rows where author_association = "NONE" and user = 4992424 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
666422864 https://github.com/pydata/xarray/issues/2314#issuecomment-666422864 https://api.github.com/repos/pydata/xarray/issues/2314 MDEyOklzc3VlQ29tbWVudDY2NjQyMjg2NA== darothen 4992424 2020-07-30T14:52:50Z 2020-07-30T14:52:50Z NONE

Hi @shaprann, I haven't revisited this exact workflow recently, but one really good option (if you can manage the intermediate storage cost) would be to use new tools like http://github.com/pangeo-data/rechunker to pre-process and prepare your data archive prior to analysis.
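
For reference, a minimal sketch of what that rechunker workflow might look like; the store paths, variable name, and chunk sizes here are hypothetical, not from your setup:

```python
import zarr
from rechunker import rechunk

# hypothetical source array whose on-disk chunks are awkward for the analysis
source = zarr.open("input.zarr/precip")

plan = rechunk(
    source,
    target_chunks=(8760, 128, 128),   # e.g. make chunks contiguous in time
    max_mem="2GB",                    # memory budget per worker
    target_store="rechunked.zarr",
    temp_store="rechunk-tmp.zarr",
)
plan.execute()                        # runs the copy via dask
```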

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Chunked processing across multiple raster (geoTIF) files 344621749
661953980 https://github.com/pydata/xarray/issues/1086#issuecomment-661953980 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDY2MTk1Mzk4MA== darothen 4992424 2020-07-21T16:09:25Z 2020-07-21T16:09:52Z NONE

Hi @andreall, I'll leave @dcherian or another maintainer to comment on internals of xarray that might be pertinent for optimization here. However, just to throw it out there, for workflows like this, it can sometimes be a bit easier to process each NetCDF file (subsetting your locations and whatnot) and convert it to CSV individually, then merge/concatenate those CSV files together at the end. This sort of workflow can be parallelized a few different ways, but is nice because you can parallelize across the number of files you need to process. A simple example based on your MRE:

```python
import xarray as xr
import pandas as pd
from pathlib import Path
from joblib import delayed, Parallel

dir_input = Path('.')
fns = list(sorted(dir_input.glob('*/' + 'WW3_EUR-11_CCCma-CanESM2_r1i1p1_CLMcom-CCLM4-8-17_v1_6hr_.nc')))

# Helper function to convert NetCDF to CSV with our processing
def _nc_to_csv(fn):
    data_ww3 = xr.open_dataset(fn)
    data_ww3 = data_ww3.isel(latitude=74, longitude=18)
    df_ww3 = data_ww3[['hs', 't02', 't0m1', 't01', 'fp', 'dir', 'spr', 'dp']].to_dataframe()

    out_fn = fn.with_suffix(".csv")
    df_ww3.to_csv(out_fn)

    return out_fn

# Use joblib.Parallel to distribute the work across whatever resources I have
out_fns = Parallel(n_jobs=-1)(  # Use all cores available here
    delayed(_nc_to_csv)(fn) for fn in fns
)

# Read the CSV files and merge them
dfs = [pd.read_csv(fn) for fn in out_fns]
df_ww3_all = pd.concat(dfs, ignore_index=True)
```

YMMV but this pattern often works for many types of processing applications.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
536079602 https://github.com/pydata/xarray/issues/3349#issuecomment-536079602 https://api.github.com/repos/pydata/xarray/issues/3349 MDEyOklzc3VlQ29tbWVudDUzNjA3OTYwMg== darothen 4992424 2019-09-27T20:07:13Z 2019-09-27T20:07:13Z NONE

I second @TomNicholas' point... functionality like this would be wonderful to have but where would be the best place for it to live?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement polyfit? 499477363
524104485 https://github.com/pydata/xarray/issues/3213#issuecomment-524104485 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDUyNDEwNDQ4NQ== darothen 4992424 2019-08-22T22:39:21Z 2019-08-22T22:39:21Z NONE

Tagging @jeliashi for visibility/collaboration

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
485272748 https://github.com/pydata/xarray/issues/2911#issuecomment-485272748 https://api.github.com/repos/pydata/xarray/issues/2911 MDEyOklzc3VlQ29tbWVudDQ4NTI3Mjc0OA== darothen 4992424 2019-04-21T18:32:56Z 2019-04-21T18:32:56Z NONE

Hi @tomchor, it's not too difficult to take the readers that you already have and wrap them in such a way that you can interact with them via xarray; you can check out the packages xgcm or xbpch for examples of how this can work in practice. I'm not sure if a more generic reader is within or beyond the scope of the core xarray project, but example implementations and writeups would make a great contribution to the community!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support from reading unformatted Fortran files 435532136
417175383 https://github.com/pydata/xarray/issues/2314#issuecomment-417175383 https://api.github.com/repos/pydata/xarray/issues/2314 MDEyOklzc3VlQ29tbWVudDQxNzE3NTM4Mw== darothen 4992424 2018-08-30T03:09:41Z 2018-08-30T03:09:41Z NONE

Can you provide a gdalinfo of one of the GeoTiffs? I'm still working on some documentation for use-cases with cloud-optimized GeoTiffs to supplement @scottyhq's fantastic example notebook. One of the wrinkles I'm tracking down and trying to document is when exactly the GDAL->rasterio->dask->xarray pipeline eagerly loads the entire file versus when it defers reading or reads subsets of files. So far, it seems that if the GeoTiff is appropriately chunked ahead of time (when it's written to disk), things basically work "automagically."
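
For what it's worth, a minimal sketch of the lazy-reading pattern I've been testing (the filename and chunk sizes are hypothetical):

```python
import xarray as xr

# chunks= hands the reads off to dask; nothing is loaded eagerly here
da = xr.open_rasterio("example_cog.tif", chunks={"band": 1, "x": 512, "y": 512})

print(da.chunks)  # each chunk is read from disk only when computed
```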

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Chunked processing across multiple raster (geoTIF) files 344621749
372475210 https://github.com/pydata/xarray/issues/1970#issuecomment-372475210 https://api.github.com/repos/pydata/xarray/issues/1970 MDEyOklzc3VlQ29tbWVudDM3MjQ3NTIxMA== darothen 4992424 2018-03-12T21:52:22Z 2018-03-12T21:52:22Z NONE

@jhamman What do you think would be involved in fleshing out the integration between xarray and rasterio in order to output cloud-optimized GeoTiffs?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  API Design for Xarray Backends 302806158
336634555 https://github.com/pydata/xarray/issues/1631#issuecomment-336634555 https://api.github.com/repos/pydata/xarray/issues/1631 MDEyOklzc3VlQ29tbWVudDMzNjYzNDU1NQ== darothen 4992424 2017-10-14T13:19:58Z 2017-10-14T13:19:58Z NONE

Thanks for documenting this @jhamman. I think all the logic needed to build out true interpolation (or really, imputation/infilling) is already in .resample(...).interpolate(). I can jump in if there's any confusion in the code.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Resample / upsample behavior diverges from pandas  265056503
336001921 https://github.com/pydata/xarray/issues/1627#issuecomment-336001921 https://api.github.com/repos/pydata/xarray/issues/1627 MDEyOklzc3VlQ29tbWVudDMzNjAwMTkyMQ== darothen 4992424 2017-10-12T02:26:05Z 2017-10-12T02:26:05Z NONE

Wow, great job @benbovy!

With the upcoming move towards Jupyter Lab and a better infrastructure for custom plugins, could this serve as the basis for a "NetCDF Extension" for Jupyter Lab? It would be great if double clicking on a NetCDF file in the JLab file explorer could open up this sort of information, or even a quick and dirty ncview-like plotter.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  html repr of xarray object (for the notebook) 264747372
334526971 https://github.com/pydata/xarray/pull/1608#issuecomment-334526971 https://api.github.com/repos/pydata/xarray/issues/1608 MDEyOklzc3VlQ29tbWVudDMzNDUyNjk3MQ== darothen 4992424 2017-10-05T16:57:03Z 2017-10-05T16:57:03Z NONE

I'm a bit slow on the uptake here, but big 👍 from me. Thanks for catching this bug!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix resample/interpolate for non-upsampling case 262874270
334453965 https://github.com/pydata/xarray/pull/1608#issuecomment-334453965 https://api.github.com/repos/pydata/xarray/issues/1608 MDEyOklzc3VlQ29tbWVudDMzNDQ1Mzk2NQ== darothen 4992424 2017-10-05T12:46:54Z 2017-10-05T12:46:54Z NONE

Great catch; do you need any input from me @jhamman ?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix resample/interpolate for non-upsampling case 262874270
334224596 https://github.com/pydata/xarray/issues/1605#issuecomment-334224596 https://api.github.com/repos/pydata/xarray/issues/1605 MDEyOklzc3VlQ29tbWVudDMzNDIyNDU5Ng== darothen 4992424 2017-10-04T17:10:02Z 2017-10-04T17:10:02Z NONE

(sorry, originally commented from my work account)

The tutorial dataset is ~6-hourly, so your operation is a downsampling operation. We don't actually support interpolation on downsampling operations - just aggregations/reductions. Upsampling supports interpolation because moving to a higher temporal frequency leaves gaps with no data, and the only way to fill them is to estimate values from the lower-frequency data. If you just want to estimate a given field at 15-day intervals, at 00Z on those days, then I think you should use ds.reindex(), but at the moment I do not think it will work with timeseries. That would be a critical feature to implement.
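
For illustration, a rough sketch of what the reindex approach would look like once it works with timeseries; the date range here is just assumed to cover the tutorial record:

```python
import pandas as pd
import xarray as xr

ds = xr.tutorial.load_dataset("air_temperature")  # the ~6-hourly tutorial data

# 00Z timestamps every 15 days; reindex keeps exact matches and inserts
# NaN everywhere else - no interpolation is performed
target_times = pd.date_range("2013-01-01", "2014-12-31", freq="15D")
ds_15d = ds.reindex(time=target_times)
```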

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Resample interpolate failing on tutorial dataset  262847801
332619692 https://github.com/pydata/xarray/issues/1596#issuecomment-332619692 https://api.github.com/repos/pydata/xarray/issues/1596 MDEyOklzc3VlQ29tbWVudDMzMjYxOTY5Mg== darothen 4992424 2017-09-27T18:49:34Z 2017-09-27T18:49:34Z NONE

@willirath Never hurts!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Equivalent of numpy.insert for DataSet / DataArray? 260912521
332519089 https://github.com/pydata/xarray/issues/1596#issuecomment-332519089 https://api.github.com/repos/pydata/xarray/issues/1596 MDEyOklzc3VlQ29tbWVudDMzMjUxOTA4OQ== darothen 4992424 2017-09-27T13:23:38Z 2017-09-27T13:23:38Z NONE

@willirath is your time data equally spaced? If so, you should be able to use the new version of DataArray.resample() available on the master (and scheduled for the 0.10.0 release) which supports upsampling/infilling.

Should work something like this, assuming each timestep is a daily value on the time axis:

```python
import xarray as xr

ds = xr.open_mfdataset("paths/to/my/data.nc")

ds_infilled = ds.resample(time='1D').asfreq()
```

That should get you nans wherever your data is missing.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Equivalent of numpy.insert for DataSet / DataArray? 260912521
331281120 https://github.com/pydata/xarray/pull/1272#issuecomment-331281120 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDMzMTI4MTEyMA== darothen 4992424 2017-09-21T21:02:39Z 2017-09-21T21:10:51Z NONE

@jhamman Ohhh, I totally misunderstood the last readout from travis-ci. Dealing with the scipy dependency is easy enough. ~However, another test fails because it uses np.flip() which wasn't added to numpy until v1.12.0. Do we want to bump the numpy version in the dependencies? Or is there another approach to take here?~

Nevermind, easy solution is just to use other axis-reversal methods :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
330910590 https://github.com/pydata/xarray/pull/1272#issuecomment-330910590 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDMzMDkxMDU5MA== darothen 4992424 2017-09-20T16:41:01Z 2017-09-20T16:41:01Z NONE

@jhamman done - caught me right while I was compiling GEOS-Chem, and the merge conflicts were very simple.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
330840457 https://github.com/pydata/xarray/pull/1272#issuecomment-330840457 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDMzMDg0MDQ1Nw== darothen 4992424 2017-09-20T12:47:08Z 2017-09-20T12:47:08Z NONE

@jhamman Think we're good. I deferred 4 small pep8 issues because they're in parts of the codebase which I don't think I ever touched, and I'm worried they're going to screw up the merge.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
330530760 https://github.com/pydata/xarray/pull/1272#issuecomment-330530760 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDMzMDUzMDc2MA== darothen 4992424 2017-09-19T12:58:34Z 2017-09-19T12:58:34Z NONE

@jhamman Gotcha, I'll clean everything up by the end of the week. If that's going to block 0.10.0, let me know and I'll shuffle some things around to prioritize this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
329227114 https://github.com/pydata/xarray/pull/1272#issuecomment-329227114 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDMyOTIyNzExNA== darothen 4992424 2017-09-13T16:43:32Z 2017-09-13T16:43:32Z NONE

@shoyer fixed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
329162517 https://github.com/pydata/xarray/pull/1272#issuecomment-329162517 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDMyOTE2MjUxNw== darothen 4992424 2017-09-13T13:10:04Z 2017-09-13T13:10:04Z NONE

Hmmm. Something is really screwy with my feature branch and making the task of cleaning up the merge difficult. I'll work on fixing this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
329039697 https://github.com/pydata/xarray/pull/1272#issuecomment-329039697 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDMyOTAzOTY5Nw== darothen 4992424 2017-09-13T02:34:21Z 2017-09-13T02:34:21Z NONE

Try refreshing? Latest commit is 7a767d8 and has all these changes plus some more tweaks.

> On Tue, Sep 12, 2017, Stephan Hoyer (@shoyer) commented on this pull request in xarray/core/resample.py (https://github.com/pydata/xarray/pull/1272#discussion_r138390791): did you push those commits yet? I'm not seeing it yet

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
328724595 https://github.com/pydata/xarray/issues/1279#issuecomment-328724595 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDMyODcyNDU5NQ== darothen 4992424 2017-09-12T03:29:29Z 2017-09-12T03:29:29Z NONE

@shoyer - This output is usually provided as a sequence of daily netCDF files, each on a ~2 degree global grid with 24 timesteps per file (so shape 24 x 96 x 144). For convenience, I usually concatenate these files into yearly datasets, so they'll have a shape (8736 x 96 x 144). I haven't played too much with how to chunk the data, but it's not uncommon for me to load 20-50 of these files simultaneously (each holding a year's worth of data) and treat each year as an "ensemble member" dimension, so my data has shape (50 x 8736 x 96 x 144). Yes, keeping everything in dask array land is preferable, I suppose.

@jhamman - Wow, that worked pretty much perfectly! There's a handful of typos (you switch from "a" to "x" halfway through), and there's a lot of room for optimization by chunksize. But it just works, which is absolutely ridiculous. I just pushed a ~200 GB dataset on my cluster with ~50 cores and it screamed through the calculation.

Is there any way this could be pushed before 0.10.0? It's a killer enhancement.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Rolling window operation does not work with dask arrays 208903781
328314676 https://github.com/pydata/xarray/issues/1279#issuecomment-328314676 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDMyODMxNDY3Ng== darothen 4992424 2017-09-10T02:04:33Z 2017-09-10T02:04:33Z NONE

In light of #1489 is there a way to move forward here with rolling on dask-backed data structures?

In soliciting the atmospheric chemistry community for a few illustrative examples for gcpy, it's become apparent that indices computed from re-sampled timeseries would be killer, attention-grabbing functionality. For instance, the EPA air quality standard we use for ozone involves taking hourly data, computing 8-hour rolling means for each day of your dataset, and then picking the maximum of those means for each day ("MDA8 ozone"). Similar metrics exist for other pollutants.

With traditional xarray data-structures, it's trivial to compute this quantity (assuming we have hourly data and using the new resample API from #1272):

python ds = xr.open_dataset("hourly_ozone_data.nc") mda8_o3 = ( ds['O3'] .rolling(time=8, min_periods=6) .mean('time') .resample(time='D').max() ) There's one quirk relating to timestamp the rolling data (by default rolling uses the last timestamp in a dataset, where in my application I want to label data with the first one) which makes that chained method a bit impractical, but it only adds like one line of code and it is totally dask-friendly.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Rolling window operation does not work with dask arrays 208903781
326689304 https://github.com/pydata/xarray/pull/1272#issuecomment-326689304 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDMyNjY4OTMwNA== darothen 4992424 2017-09-01T21:38:18Z 2017-09-01T21:38:18Z NONE

Resolved to drop auxiliary coordinates which are defined along the dimension to be re-sampled. This makes sense; if someone wants them to be interpolated or manipulated in some way, then they should promote them from coordinates to variables before doing the resampling.

In response to #1328, count() works just fine if you call it from a Resample object. Works for both resampling and up-sampling, but it will preserve the shape of the non-resampled dimensions. I think that's fine, because count() treats NaN as missing by default, so you can immediately know in which grid cells you have missing data :)

Final review, @shoyer, before merging in anticipation of 0.10.0?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
325974604 https://github.com/pydata/xarray/issues/486#issuecomment-325974604 https://api.github.com/repos/pydata/xarray/issues/486 MDEyOklzc3VlQ29tbWVudDMyNTk3NDYwNA== darothen 4992424 2017-08-30T12:26:07Z 2017-08-30T12:26:07Z NONE

@ocefpaf Awesome, good to know that hurdle has already been leaped :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  API for multi-dimensional resampling/regridding 96211612
325969302 https://github.com/pydata/xarray/issues/486#issuecomment-325969302 https://api.github.com/repos/pydata/xarray/issues/486 MDEyOklzc3VlQ29tbWVudDMyNTk2OTMwMg== darothen 4992424 2017-08-30T12:01:29Z 2017-08-30T12:01:29Z NONE

If ESMF is the way to go, then some effort needs to be made to build conda recipes and other infrastructure for distributing and building the platform. It's a heavy dependency to haul around.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  API for multi-dimensional resampling/regridding 96211612
325777712 https://github.com/pydata/xarray/issues/1534#issuecomment-325777712 https://api.github.com/repos/pydata/xarray/issues/1534 MDEyOklzc3VlQ29tbWVudDMyNTc3NzcxMg== darothen 4992424 2017-08-29T19:42:24Z 2017-08-29T19:42:24Z NONE

@mmartini-usgs, an entire netCDF file (as long as it only has 1 group, which it most likely does if we're talking about standard atmospheric/oceanic data) would be the equivalent of an xarray.Dataset. Each variable could be represented as a pandas.DataFrame, but with a MultiIndex - an index with multiple levels, which is consistent across all of the variables.

To start with, you should read in your data using the chunks keyword to open_dataset(); this turns all of the data you read into dask arrays. Then, you use xarray Dataset and DataArray operations to manipulate them. So you can start, instead, by opening your data:

```python
ds = xr.open_dataset('hugefile.nc', chunks={<fill me in>})
ds_lp = ds.resample('H', 'time', how='mean')
```

You'd have to choose chunks based on the dimensions of your data. Like @rabernat previously mentioned, it's very likely you can perform your entire workflow within xarray without ever having to drop down to pandas; let us know if you can share more details.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_dataframe (pandas) usage question 253407851
325494110 https://github.com/pydata/xarray/issues/1535#issuecomment-325494110 https://api.github.com/repos/pydata/xarray/issues/1535 MDEyOklzc3VlQ29tbWVudDMyNTQ5NDExMA== darothen 4992424 2017-08-28T21:52:54Z 2017-08-28T21:52:54Z NONE

Great; there's only a single action item left on #1272, so I'll try to get to that later this week.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  v0.10 Release 253463226
323539716 https://github.com/pydata/xarray/pull/1272#issuecomment-323539716 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDMyMzUzOTcxNg== darothen 4992424 2017-08-19T18:24:29Z 2017-08-19T18:24:29Z NONE

All set except for my one question to @shoyer above. I've opted not to include a chart outlining the various upsampling options... couldn't really think of a nice and clean way to do so, because adding it to the time series doc page ends up being really ugly and there isn't quite enough substance for its own worked example page.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
320297159 https://github.com/pydata/xarray/pull/1272#issuecomment-320297159 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDMyMDI5NzE1OQ== darothen 4992424 2017-08-04T16:45:56Z 2017-08-19T18:23:06Z NONE

Okay, it was a bit of effort but I implemented upsampling. For the padding methods I just re-index the Dataset or DataArray using the re-sampled time frequencies. I also added interpolation, but that was a bit tricky; we have to sort of break the split-apply-combine idiom to do that, so I created a Resampler mix-in which could contain the logic for the up-sampling. The DatasetResampler and DataArrayResampler each then implement similar logic for doing the interpolation. The up-sampling is designed to work with n-dimensional data.

The padding methods work 100% with dask arrays - since we're just calling xarray methods which themselves work with dask arrays! There are some eager computations (just the calculation of the up-sampled time frequencies) but I don't think that's a major issue; the actual re-indexing/padding is deferred. Interpolation works with dask arrays too, but eagerly does the computations.
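
To make the new behavior concrete, here is a small synthetic sketch of the two upsampling paths described above (daily data upsampled to 6-hourly):

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01", periods=10, freq="D")
da = xr.DataArray(np.arange(10.0), coords=[("time", times)])

padded = da.resample(time="6H").pad()                    # deferred re-indexing/padding
interped = da.resample(time="6H").interpolate("linear")  # computed eagerly
```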

Could use a review from @shoyer or @jhamman.

New TODO list:

  • [ ] Add example chart to the timeseries doc page comparing the different upsampling options
  • [x] Additional up-sampling test cases for both DataArrays and Datasets
  • [x] Code clean-up
  • [x] What's new
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
323105006 https://github.com/pydata/xarray/issues/1509#issuecomment-323105006 https://api.github.com/repos/pydata/xarray/issues/1509 MDEyOklzc3VlQ29tbWVudDMyMzEwNTAwNg== darothen 4992424 2017-08-17T15:20:22Z 2017-08-17T15:20:22Z NONE

@betaplane a re-factoring of the resample API to match pandas' is currently being wrapped up and slated for 0.10.0; see #1272

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unexpected behavior with DataArray.resample(how='sum') in presence of NaNs 250751931
321245721 https://github.com/pydata/xarray/issues/1505#issuecomment-321245721 https://api.github.com/repos/pydata/xarray/issues/1505 MDEyOklzc3VlQ29tbWVudDMyMTI0NTcyMQ== darothen 4992424 2017-08-09T12:50:40Z 2017-08-09T12:50:40Z NONE

How exactly is your WRF output split? It's not clear exactly what you want to do... is it split along different tiles such that indices [1, ..., m] are in ds_col_0, [m+1, ..., p] are in ds_col_1, and [p+1, ..., n] are in ds_col_2? Or is each dataset a different vertical level? Or a different timestep?

I'm not sure that xr.concat will even work if you pass dim a list of dimensions. It's only designed to concatenate along one dimension at a time; if you pass a pandas Index or a DataArray as the argument for dim, then it will create a new dimension in the dataset and use the values in that argument as the coordinates - so its length has to exactly match the number of Datasets or DataArrays in the first argument.
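
For illustration, a small synthetic sketch of concatenating along a new dimension with explicit coordinates; the datasets here are stand-ins for your WRF tiles:

```python
import numpy as np
import pandas as pd
import xarray as xr

# stand-ins for the three tiled datasets
ds_col_0, ds_col_1, ds_col_2 = (
    xr.Dataset({"t2m": ("x", np.random.rand(4))}) for _ in range(3)
)

# one coordinate value per input dataset; "tile" becomes a new dimension
tiles = pd.Index([0, 1, 2], name="tile")
ds_all = xr.concat([ds_col_0, ds_col_1, ds_col_2], dim=tiles)
```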

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  A problem about xarray.concat 248942085
316404161 https://github.com/pydata/xarray/pull/1272#issuecomment-316404161 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDMxNjQwNDE2MQ== darothen 4992424 2017-07-19T14:24:38Z 2017-08-04T16:39:53Z NONE

TODO

  • [x] ensure that count() works on Data{Array,set}Resample objects
  • [x] refactor Data{Array,set}Resample objects into a stand-alone file core/resample.py alongside core/groupby.py
  • [x] wrap pytest.warns around tests targeting old API
  • [x] move old API tests into stand-alone
  • [x] Crude up-sampling. Copy/pasting Stephan's earlier comment from Feb 20:

I think we need to fix this before merging this PR, since it suggests the existing functionality would only exist in deprecated form. Pandas does this with a method called .asfreq, though this is basically pure sugar since in practice I think it works exactly the same as .first (or .mean if only doing pure upsampling).


Alright @jhamman, here's the complete list of work left here. I'll tackle some of it during my commutes this week.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
319988645 https://github.com/pydata/xarray/pull/1272#issuecomment-319988645 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDMxOTk4ODY0NQ== darothen 4992424 2017-08-03T14:39:04Z 2017-08-03T14:39:04Z NONE

Finished off everything except upsampling. In pandas, all upsampling works by constructing a new time index (which we already do) and then filling in the NaNs that result in the dataset with one of a few different rules. Not sure how involved this will be, but I anticipate this can all be implemented in core/resample.py

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
318079611 https://github.com/pydata/xarray/issues/1490#issuecomment-318079611 https://api.github.com/repos/pydata/xarray/issues/1490 MDEyOklzc3VlQ29tbWVudDMxODA3OTYxMQ== darothen 4992424 2017-07-26T14:57:58Z 2017-07-26T14:57:58Z NONE

Did some digging.

Note here that the dtypes of time1 and time2 are different; the first is a datetime64[ns] but the second is a datetime64[ns, UTC]. For the sake of illustration, I'm going to change the timezone to EST. If we print time2, we get something that looks like this:

```python
>>> time2
DatetimeIndex(['2000-01-01 00:00:00-05:00', '2000-01-01 01:00:00-05:00',
               '2000-01-01 02:00:00-05:00', '2000-01-01 03:00:00-05:00',
               '2000-01-01 04:00:00-05:00', '2000-01-01 05:00:00-05:00',
               '2000-01-01 06:00:00-05:00', '2000-01-01 07:00:00-05:00',
               '2000-01-01 08:00:00-05:00', '2000-01-01 09:00:00-05:00',
               ...
               '2000-12-30 14:00:00-05:00', '2000-12-30 15:00:00-05:00',
               '2000-12-30 16:00:00-05:00', '2000-12-30 17:00:00-05:00',
               '2000-12-30 18:00:00-05:00', '2000-12-30 19:00:00-05:00',
               '2000-12-30 20:00:00-05:00', '2000-12-30 21:00:00-05:00',
               '2000-12-30 22:00:00-05:00', '2000-12-30 23:00:00-05:00'],
              dtype='datetime64[ns, EST]', length=8760, freq='H')
```

But, if we directly print its values, we get something slightly different:

```python
>>> time2.values
array(['2000-01-01T05:00:00.000000000', '2000-01-01T06:00:00.000000000',
       '2000-01-01T07:00:00.000000000', ...,
       '2000-12-31T02:00:00.000000000', '2000-12-31T03:00:00.000000000',
       '2000-12-31T04:00:00.000000000'], dtype='datetime64[ns]')
```

The difference is that the timezone offset has been automatically added, in hours, to each value in time2. This brings up something to note: if you construct your Dataset using time1.values and time2.values, there is no problem:

```python
import numpy as np
import pandas as pd
import xarray as xr

time1 = pd.date_range('2000-01-01', freq='H', periods=365 * 24)            # timezone naïve
time2 = pd.date_range('2000-01-01', freq='H', periods=365 * 24, tz='UTC')  # timezone aware

ds1 = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time1.values})
ds2 = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time2.values})

ds1.resample('3H', 'time', how='mean')  # works fine
ds2.resample('3H', 'time', how='mean')  # works fine
```

Both time1 and time2 are instances of pd.DatetimeIndex which are subclasses of pd.Index. When xarray tries to turn them into Variables, it ultimately uses a PandasIndexAdapter to decode the contents of time1 and time2, and this is where the trouble happens. The PandasIndexAdapter tries to safely cast the dtype of the array it is passed, which works just fine for time1. But for some weird reason, numpy doesn't recognize its own datetime dtypes when they have timezone information. That is, this will work:

```python
>>> np.dtype('datetime64[ns]')
dtype('<M8[ns]')
```

But this won't:

```python
>>> np.dtype('datetime64[ns, UTC]')
TypeError: Invalid datetime unit in metadata string "[ns, UTC]"
```

But also, the type of time2.dtype is a pandas.types.dtypes.DatetimeTZDtype, which NumPy doesn't know what to do with (it doesn't know how to map that type to its own datetime64).

So what happens is that the resulting Variable which defines the time coordinate on your ds2 has an array with the correct values, but is explicitly told to have the dtype object. When the array is decoded, then, bad things happen.

One solution would be to catch this potential glitch in either is_valid_numpy_dtype() or the PandasIndexAdapter constructor. Alternatively, we could eagerly coerce arrays with type pandas.types.dtypes.DatetimeTZDtype into numpy-compliant types at some earlier point.
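
In the meantime, a hedged sketch of a user-side workaround: coerce the tz-aware index to naive UTC values before constructing the Dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

time2 = pd.date_range('2000-01-01', freq='H', periods=365 * 24, tz='EST')

# convert to UTC, then drop the timezone so numpy sees a plain datetime64[ns]
time2_naive = time2.tz_convert('UTC').tz_localize(None)

ds2 = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time2_naive})
```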

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Resample not working when time coordinate is timezone aware 245649333
316398830 https://github.com/pydata/xarray/pull/1272#issuecomment-316398830 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDMxNjM5ODgzMA== darothen 4992424 2017-07-19T14:07:00Z 2017-07-19T14:07:00Z NONE

I did my best to re-base everything to master... plan on spending an hour or so figuring out what's broken and at least restoring the status quo.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
316377854 https://github.com/pydata/xarray/issues/1483#issuecomment-316377854 https://api.github.com/repos/pydata/xarray/issues/1483 MDEyOklzc3VlQ29tbWVudDMxNjM3Nzg1NA== darothen 4992424 2017-07-19T12:59:04Z 2017-07-19T12:59:04Z NONE

Instead of computing the mean over your non-stacked dimension by

```python
dsg = dst.groupby('allpoints').mean()
```

why not just instead call

```python
dsg = dst.mean('time', keep_attrs=True)
```

so that you just collapse the time dimension and preserve the attributes on your data? Then you can unstack() and everything should still be there. The idiom of stacking/applying/unstacking is really useful to fit your data to the interface of a numpy or scipy function that will do all the heavy lifting with a vectorized routine for you - isn't using groupby in this way really slow?
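
For contrast, a small synthetic sketch of where stack/apply/unstack does earn its keep - handing a 2-D view of the data to a vectorized scipy routine:

```python
import numpy as np
import scipy.signal
import xarray as xr

da = xr.DataArray(np.random.rand(100, 4, 5), dims=("time", "lat", "lon"))

# stack the spatial dims so scipy sees a plain (time, points) array
stacked = da.stack(allpoints=("lat", "lon"))
detrended = scipy.signal.detrend(stacked.values, axis=0)

# re-wrap and restore the original grid
result = xr.DataArray(
    detrended, coords=stacked.coords, dims=stacked.dims
).unstack("allpoints")
```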

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Loss of coordinate information from groupby.apply() on a stacked object 244016361
316376598 https://github.com/pydata/xarray/issues/1482#issuecomment-316376598 https://api.github.com/repos/pydata/xarray/issues/1482 MDEyOklzc3VlQ29tbWVudDMxNjM3NjU5OA== darothen 4992424 2017-07-19T12:54:30Z 2017-07-19T12:54:30Z NONE

@mitar it depends on your data/application, right? But that information would also be helpful in figuring out alternative pathways. If you're always going to process the images individually or sequentially, then what advantage is there (aside from convenience) of dumping them in some giant array with forced dimensions/shape per slice?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for jagged array 243964948
316371416 https://github.com/pydata/xarray/issues/1482#issuecomment-316371416 https://api.github.com/repos/pydata/xarray/issues/1482 MDEyOklzc3VlQ29tbWVudDMxNjM3MTQxNg== darothen 4992424 2017-07-19T12:34:32Z 2017-07-19T12:34:32Z NONE

The problem is that these sorts of arrays break the common data model on top of which xarray (and NetCDF) is built.

> If I understand correctly, I could batch all images of the same size into its own dimension? That might be also acceptable.

Yes, if you can pre-process all the images and align them on some common set of dimensions (maybe just xi and yi, denoting integer index in the x and y directions), and pad unused space for each image with NaNs, then you could concatenate everything into a Dataset.
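
As a rough sketch of that padding approach, with tiny synthetic "images":

```python
import numpy as np
import xarray as xr

images = [np.random.rand(3, 4), np.random.rand(2, 5)]
ny = max(im.shape[0] for im in images)
nx = max(im.shape[1] for im in images)

padded = []
for im in images:
    canvas = np.full((ny, nx), np.nan)          # unused space stays NaN
    canvas[: im.shape[0], : im.shape[1]] = im
    padded.append(xr.DataArray(canvas, dims=("yi", "xi")))

stack = xr.concat(padded, dim="image")          # a common (image, yi, xi) cube
```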

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for jagged array 243964948
315355743 https://github.com/pydata/xarray/pull/1272#issuecomment-315355743 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDMxNTM1NTc0Mw== darothen 4992424 2017-07-14T13:10:22Z 2017-07-14T13:10:22Z NONE

I think a pull against the new releases is critical to see what breaks. Beyond that, just code clean up and testing. I can try to bump this higher on my priority list.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
313106392 https://github.com/pydata/xarray/issues/1354#issuecomment-313106392 https://api.github.com/repos/pydata/xarray/issues/1354 MDEyOklzc3VlQ29tbWVudDMxMzEwNjM5Mg== darothen 4992424 2017-07-05T13:41:56Z 2017-07-05T13:41:56Z NONE

@wqshen, a workaround until a more complete modification to align is available would be to explicitly copy/set the coordinate values on your arrays before using xr.concat(). Alternatively, if it's as simple as stacking along a new tailing axis, you could stack via dask/numpy and then construct a new DataArray passing the coordinates explicitly.
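
A minimal sketch of the copy/set-coordinates workaround, with synthetic arrays standing in for the nearly-aligned inputs:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.random.rand(3), dims="x", coords={"x": [0.0, 1.0, 2.0]})
b = xr.DataArray(np.random.rand(3), dims="x", coords={"x": [0.0, 1.0, 2.000001]})

b = b.assign_coords(x=a.x.values)        # force identical coordinate values
stacked = xr.concat([a, b], dim="run")   # no spurious outer-join of "x"
```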

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  concat automagically outer-joins coordinates 219692578
308914123 https://github.com/pydata/xarray/issues/1447#issuecomment-308914123 https://api.github.com/repos/pydata/xarray/issues/1447 MDEyOklzc3VlQ29tbWVudDMwODkxNDEyMw== darothen 4992424 2017-06-16T02:14:31Z 2017-06-16T02:14:31Z NONE

For xbpch I followed a similar naming convention based on @rabernat's xmitgcm. Brewing on the horizon is an xarray-powered toolkit for GEOS-Chem and while it'll be a stand-alone library, I imagine it'll belong to this confederation of toolkits and provide an accessor or two for computing model grid geometries and related things on-the-fly. I'd also +1 for an xarray prefix (so, xbpch -> xarray-bpch and xmitgcm -> xarray-mitgcm?)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Package naming "conventions" for xarray extensions 234658224
305178905 https://github.com/pydata/xarray/issues/1192#issuecomment-305178905 https://api.github.com/repos/pydata/xarray/issues/1192 MDEyOklzc3VlQ29tbWVudDMwNTE3ODkwNQ== darothen 4992424 2017-05-31T12:59:52Z 2017-05-31T12:59:52Z NONE

Not to hijack the thread, but @PeterDSteinberg - this is the first I've heard of earthio, and I think there would be a lot of interest from the broader atmospheric/oceanic sciences community in hearing what your team's plans are. Could your team do a blog post on Continuum sometime outlining the goals of the project?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implementing dask.array.coarsen in xarrays 198742089
304107683 https://github.com/pydata/xarray/issues/470#issuecomment-304107683 https://api.github.com/repos/pydata/xarray/issues/470 MDEyOklzc3VlQ29tbWVudDMwNDEwNzY4Mw== darothen 4992424 2017-05-25T19:57:22Z 2017-05-25T19:57:22Z NONE

This certainly could be useful, but since this is essentially plotting a vector of data, why not just drop into pandas?

```python
df = da.to_dataframe()

# Could reset coordinates if you really wanted
df = df.reset_index()

df.plot.scatter('longitude', 'latitude', c=da.name)
```

Patching this rough functionality into the plotting module should be really straightforward; maybe @jhamman has some tips?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add scatter plot method to dataset 94787306
301489242 https://github.com/pydata/xarray/issues/1279#issuecomment-301489242 https://api.github.com/repos/pydata/xarray/issues/1279 MDEyOklzc3VlQ29tbWVudDMwMTQ4OTI0Mg== darothen 4992424 2017-05-15T14:18:55Z 2017-05-15T14:18:55Z NONE

Dask dataframes have recently been updated so that rolling operations work (dask/dask#2198). Does this open a pathway to enable rolling on dask arrays within xarray?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Rolling window operation does not work with dask arrays 208903781
300462962 https://github.com/pydata/xarray/issues/1391#issuecomment-300462962 https://api.github.com/repos/pydata/xarray/issues/1391 MDEyOklzc3VlQ29tbWVudDMwMDQ2Mjk2Mg== darothen 4992424 2017-05-10T12:11:56Z 2017-05-10T12:11:56Z NONE

@klapo! Great to see you here!

Happy to iterate with you on documenting this functionality. For reference, I wrote a package for my dissertation work to help automate the task of constructing multi-dimensional Datasets which include dimensions corresponding to experimental/ensemble factors. One of my on-going projects is to actually fully abstract this (I have a not-yet-uploaded branch of the project which tries to build the notion of an "EnsembleDataset", which has the same relationship to a Dataset that a pandas Panel used to have to a DataFrame).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adding Example/Tutorial of importing data to Xarray (Merge/conact/etc) 225536793
299194997 https://github.com/pydata/xarray/issues/1397#issuecomment-299194997 https://api.github.com/repos/pydata/xarray/issues/1397 MDEyOklzc3VlQ29tbWVudDI5OTE5NDk5Nw== darothen 4992424 2017-05-04T14:05:48Z 2017-05-04T14:05:48Z NONE

Cool; please keep me in the loop if you don't mind, because I also have an application which I'd really like to just be able to use the built-in faceting for, rather than building my plot grids manually.

A good comparison case is to perform the same plots (with the same set aspect/size/ratio at both the figure and subplot level) but just don't use the Cartopy transformations. In these cases, I have all the control that I would expect. There are also important differences between pcoloring and imshowing which would be useful to understand. At a minimum, we should deliver back to xarray some improved documentation discussing handling subplot geometry during faceting.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Changing projections under plot() 225846258
299191499 https://github.com/pydata/xarray/issues/1397#issuecomment-299191499 https://api.github.com/repos/pydata/xarray/issues/1397 MDEyOklzc3VlQ29tbWVudDI5OTE5MTQ5OQ== darothen 4992424 2017-05-04T13:53:09Z 2017-05-04T13:53:09Z NONE

@fmaussion What happens if you add aspect="auto" to subplot_kws?

I'm tempted to have us move this discussion to StackOverflow (for heightened visibility), but I suspect there might actually be a bug somewhere in the finalization of the faceting that undoes the specifications you pass to the initial subplot constructor.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Changing projections under plot() 225846258
299056235 https://github.com/pydata/xarray/issues/1397#issuecomment-299056235 https://api.github.com/repos/pydata/xarray/issues/1397 MDEyOklzc3VlQ29tbWVudDI5OTA1NjIzNQ== darothen 4992424 2017-05-03T22:43:55Z 2017-05-03T22:43:55Z NONE

> The biggest trouble I have is with tightening the space between the map and the colorbar at the bottom, but this looks like a cartopy/mpl question, not an xarray question, so I should quit pestering you guys.

You just need to pass the "pad" argument to cbar_kwargs.

The trickier problem is that sometimes cartopy can be a bit unpredictable in controlling the size and aspect ratio of axes after you've plotted maps on them. You can force a plot to respect the aspect ratio you use when you construct an axis by using the keyword aspect="auto", but it can be a bit difficult to get this to work in xarray sometimes. But at the end of the day, it's not a big deal to hand-craft a publication-quality figure once you know the rough gist of what you want to go on it - and xarray's tools are already great for that.
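
For example, a minimal sketch of the pad trick, using the tutorial dataset:

```python
import matplotlib.pyplot as plt
import xarray as xr

da = xr.tutorial.load_dataset("air_temperature")["air"].isel(time=0)
da.plot(cbar_kwargs={"orientation": "horizontal", "pad": 0.05})
plt.show()
```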

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Changing projections under plot() 225846258
294829429 https://github.com/pydata/xarray/pull/1356#issuecomment-294829429 https://api.github.com/repos/pydata/xarray/issues/1356 MDEyOklzc3VlQ29tbWVudDI5NDgyOTQyOQ== darothen 4992424 2017-04-18T12:53:01Z 2017-04-18T12:53:01Z NONE

Alrighty, patched and ready for a final look-over! I appreciate the help and patience, @shoyer!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add DatetimeAccessor for accessing datetime fields via `.dt` attribute 220011864
294628295 https://github.com/pydata/xarray/pull/1356#issuecomment-294628295 https://api.github.com/repos/pydata/xarray/issues/1356 MDEyOklzc3VlQ29tbWVudDI5NDYyODI5NQ== darothen 4992424 2017-04-17T23:44:08Z 2017-04-17T23:44:08Z NONE

Turns out it was easy enough to add an accessor for ds['time.time']; that's already provided via pandas.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add DatetimeAccessor for accessing datetime fields via `.dt` attribute 220011864
294520064 https://github.com/pydata/xarray/pull/1356#issuecomment-294520064 https://api.github.com/repos/pydata/xarray/issues/1356 MDEyOklzc3VlQ29tbWVudDI5NDUyMDA2NA== darothen 4992424 2017-04-17T16:21:19Z 2017-04-17T16:21:19Z NONE

There's a test-case relating to #367 (test_virtual_variable_same_name) which is causing me a bit of grief as I re-factor the virtual variable logic. Should we really be able to access variables like ds['time.time']? This seems to break the logic of what a virtual variable does, and was implemented to help out with time GroupBys and resampling (something I'll eventually get around to finishing up a refactor for - #1272).

Two options for fixing:

  1. add a "time" field to DateTimeAccessor.
  2. add some additional if-then-else logic to _get_virtual_variable.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add DatetimeAccessor for accessing datetime fields via `.dt` attribute 220011864
293881177 https://github.com/pydata/xarray/pull/1356#issuecomment-293881177 https://api.github.com/repos/pydata/xarray/issues/1356 MDEyOklzc3VlQ29tbWVudDI5Mzg4MTE3Nw== darothen 4992424 2017-04-13T12:26:24Z 2017-04-13T12:26:24Z NONE

Finished clean-up, added some documentation, etc. I mangled resolving a merge conflict with my update to whats-new.rst (5ae4e08) in terms of the commit text, but other than that I think we're getting closer to finishing this.

With respect to the virtual variables, I think some more thinking is necessary so we can come up with a plan of approach. Do we want to deprecate this feature entirely? Do we just want to map the datetime-component virtual variables to the .dt accessor if they're datetime-like? We could very easily do the latter for 0.9.3, but maybe we should target a future major release to deprecate the virtual variables and instead encourage adding a few specialized (but commonly-used) accessors to xarray?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add DatetimeAccessor for accessing datetime fields via `.dt` attribute 220011864
293280073 https://github.com/pydata/xarray/pull/1356#issuecomment-293280073 https://api.github.com/repos/pydata/xarray/issues/1356 MDEyOklzc3VlQ29tbWVudDI5MzI4MDA3Mw== darothen 4992424 2017-04-11T14:25:27Z 2017-04-11T14:25:27Z NONE

Updated with support for multi-dimensional time data stored as dask arrays.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add DatetimeAccessor for accessing datetime fields via `.dt` attribute 220011864
292930254 https://github.com/pydata/xarray/issues/1352#issuecomment-292930254 https://api.github.com/repos/pydata/xarray/issues/1352 MDEyOklzc3VlQ29tbWVudDI5MjkzMDI1NA== darothen 4992424 2017-04-10T12:06:52Z 2017-04-10T12:07:03Z NONE

Yeah, I tend to agree, there should be some sort of auto-magic happening. But, I can think of at least two options:

  1. Coerce to array-like, like you do manually in your first comment here. That makes sense if the dimension is important, i.e. it carries useful metadata or encodes something important.

  2. Coerce to an attribute on the Dataset.

I use workflows where I concatenate things like multiple ensemble members into a single file, and I wind up with this pattern all the time. I usually just drop() the offending coordinate, and save it as part of the output filename. This is because tools like cdo really, really don't like non lat-lon-time dimensions, so that can interrupt my workflow sometimes. Saving as an attribute bypasses this issue, but then you lose the ability to retain any metadata that was associated with that coordinate.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Saving to netCDF with 0D dimension doesn't work 219321876
292926691 https://github.com/pydata/xarray/issues/1352#issuecomment-292926691 https://api.github.com/repos/pydata/xarray/issues/1352 MDEyOklzc3VlQ29tbWVudDI5MjkyNjY5MQ== darothen 4992424 2017-04-10T11:48:37Z 2017-04-10T11:48:37Z NONE

@andreas-h you can drop the 0D dimensions:

```python
d_ = d_.drop(['category', 'species'])
d_.to_netcdf(...)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Saving to netCDF with 0D dimension doesn't work 219321876
292569100 https://github.com/pydata/xarray/pull/1356#issuecomment-292569100 https://api.github.com/repos/pydata/xarray/issues/1356 MDEyOklzc3VlQ29tbWVudDI5MjU2OTEwMA== darothen 4992424 2017-04-07T15:30:43Z 2017-04-07T15:30:43Z NONE

@shoyer I corrected things based on your comments. The last commit is an attempt to refactor things to match the way that methods like rolling/groupby functions are injected into the class; this might be totally superfluous here, but I thought it was worth trying.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add DatetimeAccessor for accessing datetime fields via `.dt` attribute 220011864
291568964 https://github.com/pydata/xarray/issues/358#issuecomment-291568964 https://api.github.com/repos/pydata/xarray/issues/358 MDEyOklzc3VlQ29tbWVudDI5MTU2ODk2NA== darothen 4992424 2017-04-04T17:14:18Z 2017-04-04T17:14:18Z NONE

Proof of concept, borrowing liberally from pandas. I think this will be pretty straightforward to hook up into xarray. I wonder, is there any way to register such an accessor with DataArrays that have a specific dtype? Ideally we'd only want to expose this accessor if a DataArray was a numpy.datetime64 type under the hood.
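
As a very rough sketch of the dtype-gating idea using the public accessor-registration hook (the accessor name here is hypothetical, and only the 1-D case is handled):

```python
import numpy as np
import pandas as pd
import xarray as xr

@xr.register_dataarray_accessor("dt_poc")
class DatetimePOC:
    def __init__(self, da):
        # refuse to attach to non-datetime data
        if not np.issubdtype(da.dtype, np.datetime64):
            raise AttributeError("'.dt_poc' only works on datetime64 arrays")
        self._da = da

    @property
    def year(self):
        # defer to pandas for the actual field extraction
        return pd.DatetimeIndex(self._da.values).year
```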

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add .dt and .str accessors to DataArray (like pandas.Series) 59720901
291228898 https://github.com/pydata/xarray/issues/358#issuecomment-291228898 https://api.github.com/repos/pydata/xarray/issues/358 MDEyOklzc3VlQ29tbWVudDI5MTIyODg5OA== darothen 4992424 2017-04-03T18:20:32Z 2017-04-03T18:20:32Z NONE

Working on a project today which would greatly benefit from having the .dt accessors. Given that this issue is nearly two years old, any thoughts on what it would take to resolve in the present codebase? Still as straightforward as wrappers on the pandas time series methods?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add .dt and .str accessors to DataArray (like pandas.Series) 59720901
290133148 https://github.com/pydata/xarray/issues/1092#issuecomment-290133148 https://api.github.com/repos/pydata/xarray/issues/1092 MDEyOklzc3VlQ29tbWVudDI5MDEzMzE0OA== darothen 4992424 2017-03-29T15:47:57Z 2017-03-29T15:48:17Z NONE

Ah, thanks for the heads-up @benbovy! I see the difference now, and I agree both approaches could co-exist. I may play around with building some of your proposed DatasetNode functionality into my Experiment tool.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset groups 187859705
290106782 https://github.com/pydata/xarray/issues/1092#issuecomment-290106782 https://api.github.com/repos/pydata/xarray/issues/1092 MDEyOklzc3VlQ29tbWVudDI5MDEwNjc4Mg== darothen 4992424 2017-03-29T14:26:15Z 2017-03-29T14:26:15Z NONE

Would the domain for this just be to simulate the tree-like structure that NetCDF permits, or could it extend to multiple datasets on disk? One of the ideas that we had during the aospy hackathon involved some sort of idiom based on xarray for packing multiple, similar datasets together. For instance, it's very common in climate science to re-run a model multiple times nearly identically, but changing a parameter or boundary condition. So you end up with large archives of data on disk which are identical in shape and metadata, and you want to be able to quickly analyze across them.

As an example, I built a helper tool during my dissertation to automate much of this, letting you dump your processed output into a directory structure with a consistent naming scheme and then easily ingest what you need for a given analysis. It's actually working great for a much larger, Monte Carlo set of model simulations right now (3 factors with 3-5 levels each, for a total of 1500 years of simulation). My tool works by concatenating each experimental factor as a new dimension, which lets you use xarray's selection tools to perform analyses across the ensemble. You can pre-process things before concatenating too, if the data ends up being too big to fit in memory (e.g. for every simulation in the experiment, compute time-zonal averages before concatenation).
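
The concatenation idiom boils down to something like this (file names and factor levels are made up for illustration):

```python
import xarray as xr

# One output file per value of an experimental factor (paths are made up)
cases = {"low": "aero_low.nc", "mid": "aero_mid.nc", "high": "aero_high.nc"}

runs = []
for level, path in sorted(cases.items()):
    # Tag each run with its factor level as a scalar coordinate
    runs.append(xr.open_dataset(path).assign_coords(aerosol=level))

# Concatenating along the scalar coordinate promotes it to a dimension,
# so ordinary selection/aggregation works across the whole ensemble
ens = xr.concat(runs, dim="aerosol")
anomaly = ens.sel(aerosol="high") - ens.mean("aerosol")
```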

Going back to @shoyer's comment, it still seems as though there is room for some sort of collection of Datasets, in the same way that a Dataset is a collection of DataArrays. Maybe this is different from @lamorton's grouping example, but it would be really, really cool if you could use the same syntactic sugar to select across multiple Datasets with like dimensions, just as you could slice into groups inside a Dataset as proposed here. It would certainly make things much more manageable than concatenating huge combinations of Datasets in memory!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset groups 187859705
289106737 https://github.com/pydata/xarray/issues/1327#issuecomment-289106737 https://api.github.com/repos/pydata/xarray/issues/1327 MDEyOklzc3VlQ29tbWVudDI4OTEwNjczNw== darothen 4992424 2017-03-24T18:25:40Z 2017-03-24T18:25:40Z NONE

I saw your PR #1328 on this, but just a heads-up that there is an open issue #1269 and pull request #1272 to refactor the resampling API to match the GroupBy-like API used by pandas. count() works without any issues on my feature branch.

I've been extremely busy but can try to carve out some more time in the near future to settle some remaining issues on that PR, which would resolve this issue too.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add 'count' as option for how in dataset resample 216833414
283448614 https://github.com/pydata/xarray/pull/1272#issuecomment-283448614 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDI4MzQ0ODYxNA== darothen 4992424 2017-03-01T19:46:46Z 2017-03-01T19:46:46Z NONE

Should .apply() really work on non-aggregation functions? Based on the pandas documentation, it seems like "resample" is truly just a synonym for a transformation of the time dimension. I can't really find many examples of people using it as a substitute for time group-bys... it seems that's what pd.TimeGrouper is for, in conjunction with a normal .groupby().

As written, non-aggregation ("transformation"?) doesn't work because the call in _combine() to _maybe_reorder() messes things up (it drops all of the data along the resampled dimension). It shouldn't be too hard to fix, although I'm leaning more and more toward making stand-alone Data{Array,set}Resample classes in a separate file which only loosely inherit from their Data{Array,set}GroupBy cousins, since they need to re-write some really critical parts of the underlying machinery.
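
To illustrate the distinction in pandas terms (pd.TimeGrouper was the spelling at the time; later pandas calls it pd.Grouper):

```python
import numpy as np
import pandas as pd

times = pd.date_range("2017-01-01", periods=96, freq="H")
s = pd.Series(np.arange(96.0), index=times)

# resample: a transformation of the time dimension via aggregation
daily_mean = s.resample("24H").mean()

# time group-by with a non-aggregating function: the result keeps the
# original hourly index instead of collapsing it
anomaly = s.groupby(pd.Grouper(freq="24H")).apply(lambda g: g - g.mean())
```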

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
281208031 https://github.com/pydata/xarray/pull/1272#issuecomment-281208031 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDI4MTIwODAzMQ== darothen 4992424 2017-02-20T23:51:01Z 2017-02-20T23:51:01Z NONE

Thanks for the feedback, @shoyer! Will circle back around to continue working on this in a few days when I have some free time.

  • Daniel
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
281186680 https://github.com/pydata/xarray/pull/1272#issuecomment-281186680 https://api.github.com/repos/pydata/xarray/issues/1272 MDEyOklzc3VlQ29tbWVudDI4MTE4NjY4MA== darothen 4992424 2017-02-20T21:36:06Z 2017-02-20T21:36:06Z NONE

Smoothed out most of the problems and missing details from earlier. Still not sure if it's wise to refactor most of the resampling logic into a new resample.py, as was done with rolling; it still makes some sense to keep things in groupby.py because we're just subclassing existing machinery from there.

The only issue now is the signature for __init__() in Data{set,Array}Resample, where we have to add in two keyword arguments. Python 2.x doesn't like named arguments after *args. There are a few options here, mostly just playing with **kwargs as in this StackOverflow thread.
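
The **kwargs workaround would look roughly like this (class and argument names are illustrative stand-ins, not the real xarray internals):

```python
class GroupByStandIn(object):  # stand-in for Data{set,Array}GroupBy
    def __init__(self, obj, group=None):
        self._obj, self._group = obj, group

class ResampleStandIn(GroupByStandIn):
    def __init__(self, *args, **kwargs):
        # Python 2 has no keyword-only arguments, so pop the extra
        # keywords from **kwargs instead of naming them after *args
        self._dim = kwargs.pop("dim", None)
        self._resample_dim = kwargs.pop("resample_dim", None)
        super(ResampleStandIn, self).__init__(*args, **kwargs)
```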

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Groupby-like API for resampling 208215185
280663975 https://github.com/pydata/xarray/issues/1273#issuecomment-280663975 https://api.github.com/repos/pydata/xarray/issues/1273 MDEyOklzc3VlQ29tbWVudDI4MDY2Mzk3NQ== darothen 4992424 2017-02-17T14:28:21Z 2017-02-17T14:28:21Z NONE

+1 from me; adding this as a method on Dataset and DataArray would be great.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  replace a dim with a coordinate from another dataset 208312826
280104546 https://github.com/pydata/xarray/issues/1269#issuecomment-280104546 https://api.github.com/repos/pydata/xarray/issues/1269 MDEyOklzc3VlQ29tbWVudDI4MDEwNDU0Ng== darothen 4992424 2017-02-15T18:59:17Z 2017-02-15T18:59:17Z NONE

@MaximilianR Oh, the interface is easy enough to do, even maintaining backwards compatibility (I already have that working). I was considering going the route taken with GroupBy and the classes that compose it, like DatasetGroupBy... basically, we just record the desired resampling dimension and inject the grouping/resampling operations we want. That also adds the ability to specialize methods like .first() and .last(), as is done in the current implementation.

But.... if there's a simpler way, that might be preferable!

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  GroupBy like API for resample 207587161
279845588 https://github.com/pydata/xarray/issues/1269#issuecomment-279845588 https://api.github.com/repos/pydata/xarray/issues/1269 MDEyOklzc3VlQ29tbWVudDI3OTg0NTU4OA== darothen 4992424 2017-02-14T21:44:11Z 2017-02-14T21:44:11Z NONE

Assuming we want to stick with pd.TimeGrouper under the hood, the only sticking point I've come across so far is how to have the resulting Data{Array,set}GroupBy object "remember" the resampling dimension. E.g., if you have multi-dimensional data and want to compute time means, you have to call

```python
ds.resample(time='24H').mean('time')
```

or else mean will operate across all dimensions. Any thoughts, @shoyer?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  GroupBy like API for resample 207587161
279810604 https://github.com/pydata/xarray/issues/1269#issuecomment-279810604 https://api.github.com/repos/pydata/xarray/issues/1269 MDEyOklzc3VlQ29tbWVudDI3OTgxMDYwNA== darothen 4992424 2017-02-14T19:32:01Z 2017-02-14T19:32:01Z NONE

Let me dig into this a bit right now. My analysis project for this afternoon was already going to require exploring pandas' resampling in more depth anyway.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  GroupBy like API for resample 207587161
243124532 https://github.com/pydata/xarray/issues/988#issuecomment-243124532 https://api.github.com/repos/pydata/xarray/issues/988 MDEyOklzc3VlQ29tbWVudDI0MzEyNDUzMg== darothen 4992424 2016-08-29T13:32:11Z 2016-08-29T13:32:11Z NONE

I definitely see the logic with regards to encouraging users to use a context manager, and from the perspective of someone building a third-party library on top of xarray it would be fine. However, from the perspective of an end-user (for example, a scientist) crunching numbers and analyzing data with xarray simply as a convenience library, this produces much too obfuscated code: a standard-library import (contextlib, which isn't something many scientific coders regularly use or necessarily know about) and a lot of boilerplate "enabling" the extra features they want in their calculation.

I think your earlier proposal of an xarray.set_options is a cleaner and simpler way forward, even if it does have thorns. Do you have any estimate of the performance penalty that checking hooks on all xarray objects would incur?
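
As a minimal illustration of the set_options style (using the keep_attrs option that modern xarray provides, which postdates this comment):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(3.0), dims="x", attrs={"units": "K"})

# One global switch, also usable as a scoped context manager
with xr.set_options(keep_attrs=True):
    print(da.mean().attrs)  # {'units': 'K'}
```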

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hooks for custom attribute handling in xarray operations 173612265
242912131 https://github.com/pydata/xarray/issues/987#issuecomment-242912131 https://api.github.com/repos/pydata/xarray/issues/987 MDEyOklzc3VlQ29tbWVudDI0MjkxMjEzMQ== darothen 4992424 2016-08-27T11:34:28Z 2016-08-27T11:34:28Z NONE

@joonro, I think there's a strong case to be made for returning a DataArray with some metadata appended. Referring to the latest draft of the CF Metadata Conventions, there is a clear way to indicate when operations such as mean, max, or min have been applied to a variable: the cell_methods attribute.

It might be more prudent to add this attribute whenever we apply these operations to a DataArray (or perhaps variable-wise when applied to a Dataset). That way, there is a clear reason not to return a scalar: the documentation of what operations were applied to produce the final result.

I can whip up a working example/pull request if people think this is a direction worth going. I'd probably build a decorator which inspects the operator name and arguments and uses them to add the cell_methods attribute; that way, people can add the same functionality to homegrown methods/operators.
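
A rough sketch of that decorator (names are made up, and this is far from production-ready):

```python
import functools

def records_cell_methods(op_name):
    """Append a CF cell_methods entry after a reduction over `dim`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(da, dim, *args, **kwargs):
            result = func(da, dim, *args, **kwargs)
            # Extend any pre-existing cell_methods rather than overwrite it
            prior = da.attrs.get("cell_methods", "")
            entry = "%s: %s" % (dim, op_name)
            result.attrs["cell_methods"] = (prior + " " + entry).strip()
            return result
        return wrapper
    return decorator

@records_cell_methods("mean")
def time_mean(da, dim):
    return da.mean(dim)

# time_mean(da, "time").attrs["cell_methods"] -> "time: mean"
```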

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Return a scalar instead of DataArray when the return value is a scalar 173494017
224049602 https://github.com/pydata/xarray/issues/463#issuecomment-224049602 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDIyNDA0OTYwMg== darothen 4992424 2016-06-06T18:42:06Z 2016-06-06T18:42:06Z NONE

@mangecoeur, although it's not an xarray-based solution, I've found that by far the best solution to this problem is to transform your dataset from the "timeslice" format (which is convenient for models to write out - all the data at a given point in time, often in separate files for each time step) to "timeseries" format - a continuous format where all the data for a single variable lives in a single file (or a much smaller collection of files).

NCAR published a great utility for converting batches of NetCDF output from timeslice to timeseries format here; it's significantly faster than any shell-script/CDO/NCO solution I've ever encountered, and it parallelizes extremely easily.

Adding a simple post-processing step to convert my simulation output to timeseries format dramatically reduced my overall work time. Before, I had a separate handler which re-implemented open_mfdataset(), performed an intermediate reduction (usually extracting a variable), and then concatenated within xarray. This could get around the open file limit, but it wasn't fast. My pre-processed data is often still big - barely fitting within memory - but it's far easier to handle, and you can throw dask at it no problem to get huge speedups in analysis.
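
That handler pattern, stripped to its core, looked something like this (paths and variable name are made up):

```python
import glob
import xarray as xr

# Reduce each timeslice file to the one variable you need, then concatenate;
# only one file is open at a time, so the open-file limit never bites
pieces = []
for path in sorted(glob.glob("output/slice_*.nc")):
    with xr.open_dataset(path) as ds:
        pieces.append(ds["T"].load())  # load before the file closes

combined = xr.concat(pieces, dim="time")
```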

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
220334426 https://github.com/pydata/xarray/issues/851#issuecomment-220334426 https://api.github.com/repos/pydata/xarray/issues/851 MDEyOklzc3VlQ29tbWVudDIyMDMzNDQyNg== darothen 4992424 2016-05-19T14:05:34Z 2016-05-19T14:05:34Z NONE

@byersiiasa, what happens if you just concatenate them using the NCO command ncrcat?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.concat and xr.to_netcdf new filesize 155741762
192357422 https://github.com/pydata/xarray/issues/784#issuecomment-192357422 https://api.github.com/repos/pydata/xarray/issues/784 MDEyOklzc3VlQ29tbWVudDE5MjM1NzQyMg== darothen 4992424 2016-03-04T16:58:59Z 2016-03-04T16:58:59Z NONE

The reindex_like() approach works super well in my case. Since only my latitudes are screwed up (and they're spaced by a tad more than a degree), a low tolerance (1e-2 to 1e-3) worked perfectly.
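
In sketch form (with a made-up dataset standing in for my slightly-off grid):

```python
import numpy as np
import xarray as xr

ref = xr.Dataset(coords={"lat": np.linspace(-90, 90, 181)})
ds = xr.Dataset(
    {"t2m": ("lat", np.random.rand(181))},
    coords={"lat": np.linspace(-90, 90, 181) + 1e-4},  # almost-equal grid
)

# Snap onto the reference latitudes; the tolerance keeps genuinely
# different points from being matched by accident
fixed = ds.reindex_like(ref, method="nearest", tolerance=1e-3)
```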

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  almost-equal grids 138443211
192332830 https://github.com/pydata/xarray/issues/784#issuecomment-192332830 https://api.github.com/repos/pydata/xarray/issues/784 MDEyOklzc3VlQ29tbWVudDE5MjMzMjgzMA== darothen 4992424 2016-03-04T15:56:58Z 2016-03-04T15:56:58Z NONE

Hi @mathause, I actually just ran into a very similar problem to your second bullet point. I had some limited success manually re-building the re-gridded dataset onto the CESM coordinate system, swapping out the not-exactly-but-actually-close-enough coordinates for the CESM reference data's coordinates. In my case, I was re-gridding with CDO, but even when I explicitly pulled out the CESM grid definition, it wouldn't match precisely.

Since there was a lot of boilerplate code to do this in xarray (although I had a lot of success defining a callback to pass in with open_dataset), it was far easier just to use NCO to copy the correct coordinate variables into the re-gridded data.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  almost-equal grids 138443211
187245860 https://github.com/pydata/xarray/issues/768#issuecomment-187245860 https://api.github.com/repos/pydata/xarray/issues/768 MDEyOklzc3VlQ29tbWVudDE4NzI0NTg2MA== darothen 4992424 2016-02-22T16:04:39Z 2016-02-22T16:04:39Z NONE

Hi @jonathanstrong,

Just thought it would be useful to point out that the organization that maintains NetCDF is Unidata, a branch of the University Corporation for Atmospheric Research. In fact, netCDF-4 is essentially built on top of HDF5 - a much more widely-known file format with first-class support in many tools, including an I/O layer in pandas. While it would certainly be great to "sell" netCDF as a format in the documentation, those of us who still have to write netCDF-based I/O modules for our Fortran models might have to throw up a little in our mouths when we do so...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  save/load DataArray to numpy npz functions 134376872
169057010 https://github.com/pydata/xarray/issues/704#issuecomment-169057010 https://api.github.com/repos/pydata/xarray/issues/704 MDEyOklzc3VlQ29tbWVudDE2OTA1NzAxMA== darothen 4992424 2016-01-05T16:44:41Z 2016-01-05T16:44:41Z NONE

I also like import xarray as xr.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Complete renaming xray -> xarray 124867009
148376642 https://github.com/pydata/xarray/issues/624#issuecomment-148376642 https://api.github.com/repos/pydata/xarray/issues/624 MDEyOklzc3VlQ29tbWVudDE0ODM3NjY0Mg== darothen 4992424 2015-10-15T12:57:04Z 2015-10-15T12:57:04Z NONE

Is there another easy way to add a constant offset to all the values of a dimension's coordinate (e.g. add 10 meters to every value)? I don't typically use operations like that, but I can see where they might be useful.

If not, then rolling in integer space is the way to go.
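
For reference, shifting a coordinate by a constant is already a one-liner with assign_coords (the dataset here is made up), which suggests roll() can stay purely positional:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={"z": np.arange(0.0, 100.0, 10.0)})

# Add a constant offset to the coordinate values themselves
ds = ds.assign_coords(z=ds.z + 10.0)
```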

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  roll method 111471076
148206569 https://github.com/pydata/xarray/issues/624#issuecomment-148206569 https://api.github.com/repos/pydata/xarray/issues/624 MDEyOklzc3VlQ29tbWVudDE0ODIwNjU2OQ== darothen 4992424 2015-10-14T21:24:35Z 2015-10-14T21:24:35Z NONE

Using an API like ds.roll(time=100) would be more consistent with other aggregation/manipulation routines, and there's nothing in @rabernat's code that forbids that call signature.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  roll method 111471076
131214583 https://github.com/pydata/xarray/issues/531#issuecomment-131214583 https://api.github.com/repos/pydata/xarray/issues/531 MDEyOklzc3VlQ29tbWVudDEzMTIxNDU4Mw== darothen 4992424 2015-08-14T19:26:18Z 2015-08-14T19:26:18Z NONE

Hi @jsbj,

The fancy indexing notation you're trying to use only works when xray successfully decodes the time dimension. As discussed in the documentation here, that only works when the years of record fall between 1678 and 2262. Since you have years 2262-2300 in your dataset, this is a feature - xray is failing gracefully.

There are a few current open discussions on this behavior, which is an issue higher up the Python chain with numpy:

1. time decoding error with "days since"
2. Fix datetime decoding when time units are 'days since 0000-01-01 00:00:00'
3. ocefpaf - Loading non-standard dates with cf_units
4. numpy - Non-standard Calendar Support

For now, a very simple hack would be to re-compute your time units so that they're re-based, say, to 'days since 1700-01-01 00:00:00'; that way all of them would fall within the permissible range for xray's built-in decoding routine. You could pass decode_cf=False when you open the dataset, modify the non-decoded time array and units, then run xray.decode_cf() on the modified dataset.
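
A sketch of that workflow (the file name and original epoch are hypothetical):

```python
import xray  # the package's name at the time of this comment

# Raw units are assumed to be "days since 2100-01-01", which would put
# some decoded dates past numpy's 2262 cutoff
ds = xray.open_dataset("cmip5_run.nc", decode_cf=False)

# Relabel the epoch 400 years earlier; every decoded date shifts back
# with it, landing inside the 1678-2262 window
ds["time"].attrs["units"] = "days since 1700-01-01 00:00:00"
ds = xray.decode_cf(ds)
```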

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Having trouble with time dim of CMIP5 dataset 100980878

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);