
issues


64 rows where state = "closed", type = "issue" and user = 2443309 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
33637243 MDU6SXNzdWUzMzYzNzI0Mw== 131 Dataset summary methods jhamman 2443309 closed 0   0.2 650893 10 2014-05-16T00:17:56Z 2023-09-28T12:42:34Z 2014-05-21T21:47:29Z MEMBER      

Add summary methods to the Dataset object. For example, it would be great if you could summarize an entire dataset in a single line.

(1) Mean of all variables in dataset.

```python
mean_ds = ds.mean()
```

(2) Mean of all variables in dataset along a dimension:

```python
time_mean_ds = ds.mean(dim='time')
```

In the case where a dimension is specified and there are variables that don't use that dimension, I'd imagine you would just pass that variable through unchanged.
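
For illustration, a small sketch of the desired pass-through behavior (the Dataset here is made up):

```python
import numpy as np
import xarray as xr

# 'temp' varies along time; 'elevation' does not.
ds = xr.Dataset({
    'temp': (('time', 'x'), np.random.rand(3, 4)),
    'elevation': ('x', np.random.rand(4)),
})

time_mean_ds = ds.mean(dim='time')
# desired: 'temp' is averaged over 'time', while 'elevation'
# passes through unchanged since it has no 'time' dimension
```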

Related to #122.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/131/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1644429340 I_kwDOAMm_X85iBAAc 7692 Feature proposal: DataArray.to_zarr() jhamman 2443309 closed 0     5 2023-03-28T18:00:24Z 2023-04-03T15:53:37Z 2023-04-03T15:53:37Z MEMBER      

Is your feature request related to a problem?

It would be nice to mimic the behavior of DataArray.to_netcdf for the Zarr backend.

Describe the solution you'd like

This should be possible:

```python
xr.open_dataarray('file.nc').to_zarr('store.zarr')
```
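
One way this could work (a rough sketch modeled on how DataArray.to_netcdf wraps the Dataset method; the function name and the fallback variable name are illustrative assumptions):

```python
import xarray as xr

# Hypothetical sketch: convert to a temporary single-variable Dataset,
# mirroring DataArray.to_netcdf, then reuse Dataset.to_zarr.
def dataarray_to_zarr(da: xr.DataArray, store, **kwargs):
    name = da.name if da.name is not None else '__xarray_dataarray_variable__'
    return da.to_dataset(name=name).to_zarr(store, **kwargs)
```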

Describe alternatives you've considered

None.

Additional context

xref DataArray.to_netcdf issue/PR: #915 / #990

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7692/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1642635191 I_kwDOAMm_X85h6J-3 7686 Add reset_encoding to Dataset and DataArray objects jhamman 2443309 closed 0     2 2023-03-27T18:51:39Z 2023-03-30T21:09:17Z 2023-03-30T21:09:17Z MEMBER      

Is your feature request related to a problem?

Xarray maintains the encoding of datasets read from most of its supported backend formats (e.g. NetCDF, Zarr, etc.). This is very useful when you want a perfect roundtrip, but it often gets in the way, causing conflicts when writing a modified dataset or when appending to another dataset. Most of the time, the solution is to simply remove the encoding from the dataset and move on. The following code sample appears in a number of issues that reference this problem.

```python
for v in list(ds.coords.keys()):
    if ds.coords[v].dtype == object:
        ds[v].encoding.clear()

for v in list(ds.variables.keys()):
    if ds[v].dtype == object:
        ds[v].encoding.clear()
```

A sample of issues that show variants of this problem:

  • https://github.com/pydata/xarray/issues/3476
  • https://github.com/pydata/xarray/issues/3739
  • https://github.com/pydata/xarray/issues/4380
  • https://github.com/pydata/xarray/issues/5219
  • https://github.com/pydata/xarray/issues/5969
  • https://github.com/pydata/xarray/issues/6329
  • https://github.com/pydata/xarray/issues/6352

Describe the solution you'd like

In many cases, the solution to these problems is to leave the original dataset's encoding behind and either use Xarray's default encoding (or the backend's default) or specify one's own encoding options. Both cases would benefit from a convenience method to reset the original encoding. Something like the following would serve this purpose:

```python
ds = xr.open_dataset(...).reset_encoding()
```
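
A minimal sketch of what such a method might do under the hood (illustrative only, not the proposed implementation):

```python
import xarray as xr

def reset_encoding(ds: xr.Dataset) -> xr.Dataset:
    # Hypothetical sketch: return a shallow copy with all encoding
    # cleared, on the dataset itself and on every variable/coordinate.
    ds = ds.copy()
    ds.encoding = {}
    for var in ds.variables.values():
        var.encoding = {}
    return ds
```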

Describe alternatives you've considered

Variations on the API above could also be considered:

```python
xr.open_dataset(..., keep_encoding=False)
```

or even:

```python
with xr.set_options(keep_encoding=False):
    ds = xr.open_dataset(...)
```

We can/should also do a better job of surfacing inconsistent encoding in our backends (e.g. to_netcdf).

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7686/reactions",
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1558497871 I_kwDOAMm_X85c5MpP 7479 Use NumPy's SupportsDType jhamman 2443309 closed 0     0 2023-01-26T17:21:32Z 2023-02-28T23:23:47Z 2023-02-28T23:23:47Z MEMBER      

What is your issue?

Now that we've bumped our minimum NumPy version to 1.21, we can address this comment:

https://github.com/pydata/xarray/blob/b21f62ee37eea3650a58e9ffa3a7c9f4ae83006b/xarray/core/types.py#L57-L62

I decided not to tackle this as part of #7461 but we may be able to do something like this:

```python
from numpy.typing._dtype_like import _DTypeLikeNested, _ShapeLike, _SupportsDType
```
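
For context, a rough sketch of the kind of alias this would enable (these names are private NumPy internals, so treat them as assumptions that may change between releases):

```python
from typing import Union

import numpy as np
from numpy.typing._dtype_like import _SupportsDType

# Illustrative alias: a plain dtype, or any object exposing a `.dtype`
# attribute (the generic _SupportsDType protocol captures the latter).
DTypeLikeSave = Union[np.dtype, _SupportsDType[np.dtype]]
```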

xref: #6834 cc @headtr1ck

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7479/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1247014308 I_kwDOAMm_X85KU-2k 6634 Optionally include encoding in Dataset to_dict jhamman 2443309 closed 0     0 2022-05-24T19:10:01Z 2022-05-26T19:17:35Z 2022-05-26T19:17:35Z MEMBER      

Is your feature request related to a problem?

When using Xarray's to_dict methods to record a Dataset's schema, it would be useful to (optionally) include encoding in the output.

Describe the solution you'd like

The feature request could be resolved by simply adding an encoding keyword argument, which might look like this:

```python
ds = xr.Dataset(...)
ds.to_dict(data=False, encoding=True)
```

Describe alternatives you've considered

It is currently possible to manually extract encoding attributes, but this is a less desirable solution; a sketch follows below.
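
A minimal sketch of that manual extraction (the top-level 'encoding' key is our own convention here, not part of to_dict):

```python
import xarray as xr

# Hypothetical helper: collect per-variable encoding by hand and
# attach it next to the schema produced by to_dict(data=False).
def schema_with_encoding(ds: xr.Dataset) -> dict:
    schema = ds.to_dict(data=False)
    schema['encoding'] = {name: dict(var.encoding)
                          for name, var in ds.variables.items()}
    return schema
```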

xref: https://github.com/pangeo-forge/pangeo-forge-recipes/issues/256

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6634/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
636449225 MDU6SXNzdWU2MzY0NDkyMjU= 4139 [Feature request] Support file-like objects in open_rasterio jhamman 2443309 closed 0     2 2020-06-10T18:11:26Z 2022-04-19T17:15:21Z 2022-04-19T17:15:20Z MEMBER      

With some acrobatics, it is possible to open file-like objects with rasterio. It would be useful if xarray supported this workflow directly, particularly for working with cloud-optimized GeoTIFFs and fsspec.

MCVE Code Sample

```python
with open('my_data.tif', 'rb') as f:
    da = xr.open_rasterio(f)
```

Expected Output

A DataArray equivalent to xr.open_rasterio('my_data.tif').

Problem Description

We currently only allow str, rasterio.DatasetReader, or rasterio.WarpedVRT as inputs to open_rasterio.
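
For reference, the "acrobatics" mentioned above can look something like the sketch below (assuming rasterio's MemoryFile; the eager .load() is there to guard against lazy reads after the file is closed):

```python
import rasterio
import xarray as xr

with open('my_data.tif', 'rb') as f:
    with rasterio.io.MemoryFile(f.read()) as memfile:
        with memfile.open() as src:  # src is a rasterio DatasetReader
            da = xr.open_rasterio(src).load()
```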

Versions

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: 2a288f6ed4286910fcf3ab9895e1e9cbd44d30b4
python: 3.8.2 | packaged by conda-forge | (default, Apr 24 2020, 07:56:27) [Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.15.2.dev68+gb896a68f
pandas: 1.0.4
numpy: 1.18.5
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: None
iris: None
bottleneck: None
dask: 2.18.1
distributed: 2.18.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 46.1.3.post20200325
pip: 20.1
conda: None
pytest: 5.4.3
IPython: 7.13.0
sphinx: 3.0.3
```

xref: https://github.com/pangeo-data/pangeo-datastore/issues/109

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4139/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1108564253 I_kwDOAMm_X85CE1kd 6176 Xarray versioning to switch to CalVer jhamman 2443309 closed 0     10 2022-01-19T21:09:45Z 2022-03-03T04:32:10Z 2022-01-31T18:35:27Z MEMBER      

Xarray is planning to switch to Calendar versioning (calver). This issue serves as a general announcement.

The idea has come up in multiple developer meetings (#4001) and is part of a larger effort to increase our release cadence (#5927). Today's developer meeting included unanimous consent for the change. Other projects in Xarray's ecosystem have also made this change recently (e.g. https://github.com/dask/community/issues/100). While it is likely we will make this change in the next release or two, users and developers should feel free to voice objections here.

The proposed calver implementation follows the same schema as the Dask project, that is, YYYY.MM.X (4-digit year, two-digit month, one-digit zero-indexed micro version). For example, the code block below compares the current and proposed version tags:

```python
In [1]: import xarray as xr

# current
In [2]: xr.__version__
Out[2]: '0.19.1'

# proposed
In [2]: xr.__version__
Out[2]: '2022.01.0'
```

cc @pydata/xarray

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6176/reactions",
    "total_count": 6,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
139064764 MDU6SXNzdWUxMzkwNjQ3NjQ= 787 Add Groupby and Rolling methods to docs jhamman 2443309 closed 0     2 2016-03-07T19:10:26Z 2021-11-08T19:51:00Z 2021-11-08T19:51:00Z MEMBER      

The injected apply/reduce methods for the Groupby and Rolling objects are not shown in the api documentation page. While there is obviously a fair bit of overlap with the similar DataArray/Dataset methods, it would help users to know what methods are available on the Groupby and Rolling objects if we explicitly listed them in the documentation. Suggestions on the best format to show these methods (e.g. Rolling.mean) are welcome.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/787/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
663968779 MDU6SXNzdWU2NjM5Njg3Nzk= 4253 [community] Backends refactor meeting jhamman 2443309 closed 0     13 2020-07-22T18:39:19Z 2021-03-11T20:42:33Z 2021-03-11T20:42:33Z MEMBER      

In today's dev call, we opted to schedule a separate meeting to discuss the backends refactor that BOpen (@alexamici and his team) is beginning to work on. This issue is meant to coordinate the scheduling of this meeting. To that end, I've created the following Doodle Poll to help choose a time: https://doodle.com/poll/4mtzxncka7gee4mq

Anyone from @pydata/xarray should feel free to join if there is interest. At a minimum, I'm hoping to have @alexamici, @aurghs, @shoyer, and @rabernat there.

Please respond to the poll by COB tomorrow so I can quickly get the meeting on the books. Thanks!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4253/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
287223508 MDU6SXNzdWUyODcyMjM1MDg= 1815 apply_ufunc(dask='parallelized') with multiple outputs jhamman 2443309 closed 0     17 2018-01-09T20:40:52Z 2020-08-19T06:57:55Z 2020-08-19T06:57:55Z MEMBER      

I have an application where I'd like to use apply_ufunc with dask on a function that requires multiple inputs and outputs. This was left as a TODO item in #1517. However, it's not clear to me, looking at the code, how this can be done given the current form of dask's atop. I'm hoping @shoyer has already thought of a clever solution here...

Code Sample, a copy-pastable example if possible

```python
def func(foo, bar):
    assert foo.shape == bar.shape
    spam = np.zeros_like(bar)
    spam2 = np.full_like(bar, 2)

    return spam, spam2

foo = xr.DataArray(np.zeros((10, 10))).chunk()
bar = xr.DataArray(np.zeros((10, 10))).chunk() + 5

xrfunc = xr.apply_ufunc(func, foo, bar,
                        output_core_dims=[[], []],
                        dask='parallelized')
```

Problem description

This currently raises a NotImplementedError.

Expected Output

Multiple dask arrays. In my example above, two dask arrays.
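
One workaround to consider while this is unimplemented (a sketch, not a recommendation): pack the outputs along a new core dimension so apply_ufunc sees a single output, then split afterwards. The 'out' dimension name is our own invention here.

```python
# Hypothetical workaround: stack both outputs into one array along a
# trailing 'out' core dimension, which dask='parallelized' can handle.
def func_stacked(foo, bar):
    spam, spam2 = func(foo, bar)
    return np.stack([spam, spam2], axis=-1)

stacked = xr.apply_ufunc(func_stacked, foo, bar,
                         output_core_dims=[['out']],
                         dask='parallelized',
                         output_dtypes=[float],
                         output_sizes={'out': 2})
spam, spam2 = stacked.isel(out=0), stacked.isel(out=1)
```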

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.86+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.0+dev.c92020a
pandas: 0.22.0
numpy: 1.13.3
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: 0.5.0
Nio: None
zarr: 2.2.0a2.dev176
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.0
distributed: 1.20.2+36.g7387410
matplotlib: 2.1.1
cartopy: None
seaborn: None
setuptools: 38.4.0
pip: 9.0.1
conda: 4.3.29
pytest: 3.3.2
IPython: 6.2.1
sphinx: None
```

cc @mrocklin, @arbennett

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1815/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
318988669 MDU6SXNzdWUzMTg5ODg2Njk= 2094 Drop win-32 platform CI from appveyor matrix? jhamman 2443309 closed 0     3 2018-04-30T18:29:17Z 2020-03-30T20:30:58Z 2020-03-24T03:41:24Z MEMBER      

Conda-forge has dropped support for 32-bit windows builds (https://github.com/conda-forge/cftime-feedstock/issues/2#issuecomment-385485144). Do we want to continue testing against this environment? The point becomes moot after #1876 gets wrapped up in ~7 months.

xref: https://github.com/pydata/xarray/pull/1252

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2094/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
578017585 MDU6SXNzdWU1NzgwMTc1ODU= 3851 Exposing Zarr backend internals as semi-public API jhamman 2443309 closed 0     3 2020-03-09T16:04:49Z 2020-03-27T22:37:26Z 2020-03-27T22:37:26Z MEMBER      

We recently built a prototype REST API for serving xarray datasets via a Fast-API application (see #3850 for more details). In the process of doing this, we needed to use a few internal functions in Xarray's Zarr backend:

```python
from xarray.backends.zarr import (
    _DIMENSION_KEY,
    _encode_zarr_attr_value,
    _extract_zarr_variable_encoding,
    encode_zarr_variable,
)
from xarray.core.pycompat import dask_array_type
from xarray.util.print_versions import get_sys_info, netcdf_and_hdf5_versions
```

Obviously, none of these imports are really meant for use outside of Xarray's backends so I'd like to discuss how we may go about exposing these functions (or variables) as semi-public (advanced use) API features. Thoughts?

cc @rabernat

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3851/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
197920258 MDU6SXNzdWUxOTc5MjAyNTg= 1188 Should we deprecate the compat and encoding constructor arguments? jhamman 2443309 closed 0     5 2016-12-28T21:41:26Z 2020-03-24T14:34:37Z 2020-03-24T14:34:37Z MEMBER      

In https://github.com/pydata/xarray/pull/1170#discussion_r94078121, @shoyer writes:

...I would consider deprecating the encoding argument to DataArray instead. It would also make sense to get rid of the compat argument to Dataset.

These extra arguments are not part of the fundamental xarray data model and thus are a little distracting, especially to new users.

@pydata/xarray and others, what do we think about deprecating the compat argument to the Dataset constructor and the encoding argument to the DataArray (and Dataset via #1170) constructors?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1188/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
508743579 MDU6SXNzdWU1MDg3NDM1Nzk= 3413 Can apply_ufunc be used on arrays with different dimension sizes jhamman 2443309 closed 0     2 2019-10-17T22:04:00Z 2019-12-11T22:32:23Z 2019-12-11T22:32:23Z MEMBER      

We have an application where we want to use apply_ufunc to apply a function that takes two 1-D arrays and returns a scalar value (basically a reduction over the only axis). We start with two DataArrays that share all the same dimensions - except for the lengths of the dimension we'll be reducing along (t in this case):

```python
def diff_mean(X, y):
    '''a function that only works on 1d arrays that are different lengths'''
    assert X.ndim == 1, X.ndim
    assert y.ndim == 1, y.ndim
    assert len(X) != len(y), X
    return X.mean() - y.mean()

X = np.random.random((10, 4, 5))
y = np.random.random((6, 4, 5))

Xda = xr.DataArray(X, dims=('t', 'x', 'y')).chunk({'t': -1, 'x': 2, 'y': 2})
yda = xr.DataArray(y, dims=('t', 'x', 'y')).chunk({'t': -1, 'x': 2, 'y': 2})
```

Then, we'd like to use apply_ufunc to apply our function (e.g. diff_mean):

```python
out = xr.apply_ufunc(
    diff_mean, Xda, yda,
    vectorize=True,
    dask="parallelized",
    output_dtypes=[np.float],
    input_core_dims=[['t'], ['t']],
)
```

This fails with an error when aligning the t dimensions:

```python-traceback
ValueError                                Traceback (most recent call last)
<ipython-input-4-e90cf6fba482> in <module>
      9     dask="parallelized",
     10     output_dtypes=[np.float],
---> 11     input_core_dims=[['t'], ['t']],
     12 )

~/miniconda3/envs/xarray-ml/lib/python3.7/site-packages/xarray/core/computation.py in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, *args)
   1042             join=join,
   1043             exclude_dims=exclude_dims,
-> 1044             keep_attrs=keep_attrs
   1045         )
   1046     elif any(isinstance(a, Variable) for a in args):

~/miniconda3/envs/xarray-ml/lib/python3.7/site-packages/xarray/core/computation.py in apply_dataarray_vfunc(func, signature, join, exclude_dims, keep_attrs, *args)
    222     if len(args) > 1:
    223         args = deep_align(
--> 224             args, join=join, copy=False, exclude=exclude_dims, raise_on_invalid=False
    225         )
    226

~/miniconda3/envs/xarray-ml/lib/python3.7/site-packages/xarray/core/alignment.py in deep_align(objects, join, copy, indexes, exclude, raise_on_invalid, fill_value)
    403         indexes=indexes,
    404         exclude=exclude,
--> 405         fill_value=fill_value
    406     )
    407

~/miniconda3/envs/xarray-ml/lib/python3.7/site-packages/xarray/core/alignment.py in align(join, copy, indexes, exclude, fill_value, *objects)
    321             "arguments without labels along dimension %r cannot be "
    322             "aligned because they have different dimension sizes: %r"
--> 323             % (dim, sizes)
    324         )
    325

ValueError: arguments without labels along dimension 't' cannot be aligned because they have different dimension sizes: {10, 6}
```

https://nbviewer.jupyter.org/gist/jhamman/0e52d9bb29f679e26b0878c58bb813d2

I'm curious if this can be made to work with apply_ufunc or if we should pursue other options here. Advice and suggestions appreciated.
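
One possibility worth checking (a sketch; exclude_dims tells apply_ufunc to skip alignment and broadcasting for the named core dims, which is exactly the step that raises here):

```python
out = xr.apply_ufunc(
    diff_mean, Xda, yda,
    vectorize=True,
    dask="parallelized",
    output_dtypes=[np.float],
    input_core_dims=[['t'], ['t']],
    exclude_dims={'t'},  # don't try to align 't' across arguments
)
```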

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 14:38:56) [Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.1
scipy: 1.3.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.3.2
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.3.0
distributed: 2.3.2
matplotlib: 3.1.1
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: 5.0.1
IPython: 7.8.0
sphinx: 2.2.0
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3413/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
503700649 MDU6SXNzdWU1MDM3MDA2NDk= 3380 [Release] 0.14 jhamman 2443309 closed 0     19 2019-10-07T21:28:28Z 2019-10-15T01:08:11Z 2019-10-14T21:26:59Z MEMBER      

#3358 is going to make some fairly major changes to the minimum supported versions of required and optional dependencies. We also have a few bug fixes that have landed since releasing 0.13 that would be good to get out.

From what I can tell, the following pending PRs are close enough to get into this release:

  • [ ] ~tests for arrays with units #3238~
  • [x] map_blocks #3276
  • [x] Rolling minimum dependency versions policy #3358
  • [x] Remove all OrderedDict's (#3389)
  • [x] Speed up isel and __getitem__ #3375
  • [x] Fix concat bug when concatenating unlabeled dimensions. #3362
  • [ ] ~Add hypothesis test for netCDF4 roundtrip #3283~
  • [x] Fix groupby reduce for dataarray #3338
  • [x] Need a fix for https://github.com/pydata/xarray/issues/3377

Am I missing anything else that needs to get in?

I think we should aim to wrap this release up soon (this week). I can volunteer to go through the release steps once we're ready.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3380/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
297227247 MDU6SXNzdWUyOTcyMjcyNDc= 1910 Pynio tests are being skipped on TravisCI jhamman 2443309 closed 0     3 2018-02-14T20:03:31Z 2019-02-07T00:08:17Z 2019-02-07T00:08:17Z MEMBER      

Problem description

Currently on Travis, the Pynio tests are being skipped. The py27-cdat+iris+pynio build is supposed to run tests for each of these, but it is not.

https://travis-ci.org/pydata/xarray/jobs/341426116#L2429-L2518

I can't look at this right now in depth but I'm wondering if this is related to #1531.

reported by @WeatherGod

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1910/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
302930480 MDU6SXNzdWUzMDI5MzA0ODA= 1971 Should we be testing against multiple dask schedulers? jhamman 2443309 closed 0     5 2018-03-07T01:25:37Z 2019-01-13T20:58:21Z 2019-01-13T20:58:20Z MEMBER      

Almost all of our unit tests run against dask's default scheduler (usually dask.threaded). While the beauty of dask is that one can separate the scheduler from the logical implementation, there are a few idiosyncrasies to consider, particularly in xarray's backends. To that end, we have a few tests covering the integration of the distributed scheduler with xarray's backends, but the test coverage is not particularly complete.

If nothing more, I think it is worth considering tests that use the threaded, multiprocessing, and distributed schedulers for a larger subset of the backends tests (those that use dask); see the sketch below.
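
A minimal sketch of what that could look like for the single-machine schedulers (illustrative test and fixture names, using dask's config mechanism; the distributed scheduler would need its own fixture):

```python
import dask
import pytest

@pytest.mark.parametrize('scheduler', ['threads', 'processes', 'synchronous'])
def test_backend_roundtrip(scheduler, tmp_path):
    # Hypothetical test body: run the same backend roundtrip
    # under each single-machine scheduler.
    with dask.config.set(scheduler=scheduler):
        ...
```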

Note, I'm bringing this up because I'm seeing some failing tests in #1793 that are unrelated to my code change but do appear to be related to dask and possibly a different default scheduler (example failure).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1971/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
293414745 MDU6SXNzdWUyOTM0MTQ3NDU= 1876 DEP: drop Python 2.7 support jhamman 2443309 closed 0     2 2018-02-01T06:11:07Z 2019-01-02T04:52:04Z 2019-01-02T04:52:04Z MEMBER      

The timeline for dropping Python 2.7 support for new Xarray releases is the end of 2018.

This issue can be used to track the necessary documentation and code changes to make that happen.

xref: #1830

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1876/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
323765896 MDU6SXNzdWUzMjM3NjU4OTY= 2142 add CFTimeIndex enabled date_range function jhamman 2443309 closed 0     1 2018-05-16T20:02:08Z 2018-09-19T20:24:40Z 2018-09-19T20:24:40Z MEMBER      

Pandas' date_range function is a fast and flexible way to create DatetimeIndex objects. Now that we have a functioning CFTimeIndex, it would be great to add a version of the date_range function that supports other calendars and dates out of range for Pandas.

Code Sample and expected output

```python
In [1]: import xarray as xr

In [2]: xr.date_range('2000-02-26', '2000-03-02')
Out[2]: DatetimeIndex(['2000-02-26', '2000-02-27', '2000-02-28', '2000-02-29',
                       '2000-03-01', '2000-03-02'],
                      dtype='datetime64[ns]', freq='D')

In [3]: xr.date_range('2000-02-26', '2000-03-02', calendar='noleap')
Out[3]: CFTimeIndex(['2000-02-26', '2000-02-27', '2000-02-28',
                     '2000-03-01', '2000-03-02'],
                    dtype='cftime.datetime', freq='D')
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2142/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
288465429 MDU6SXNzdWUyODg0NjU0Mjk= 1829 Drop support for Python 3.4 jhamman 2443309 closed 0   0.11 2856429 13 2018-01-15T02:38:19Z 2018-07-08T00:55:32Z 2018-07-08T00:55:32Z MEMBER      

Python 3.7-final is due out in June (PEP 537). When do we want to deprecate 3.4, and when should we drop support altogether? @maxim-lian brought this up in a PR he's working on: https://github.com/pydata/xarray/pull/1828#issuecomment-357562144.

For reference, we dropped Python 3.3 in #1175 (12/20/2016).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1829/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
327893262 MDU6SXNzdWUzMjc4OTMyNjI= 2203 Update minimum version of dask jhamman 2443309 closed 0     6 2018-05-30T20:47:57Z 2018-07-08T00:55:32Z 2018-07-08T00:55:32Z MEMBER      

Xarray currently states that it supports dask version 0.9 and later. However, 1) I don't think this is true and my quick test shows that some of our tests fail using dask 0.9, and 2) we have a growing number of tests that are being skipped for older dask versions:

```
$ grep -irn "dask.__version__" xarray/tests/*py
xarray/tests/__init__.py:90:    if LooseVersion(dask.__version__) < '0.18':
xarray/tests/test_computation.py:755:    if LooseVersion(dask.__version__) < LooseVersion('0.17.3'):
xarray/tests/test_computation.py:841:    if not use_dask or LooseVersion(dask.__version__) > LooseVersion('0.17.4'):
xarray/tests/test_dask.py:211:    @pytest.mark.skipif(LooseVersion(dask.__version__) <= '0.15.4',
xarray/tests/test_dask.py:223:    @pytest.mark.skipif(LooseVersion(dask.__version__) <= '0.15.4',
xarray/tests/test_dask.py:284:    @pytest.mark.skipif(LooseVersion(dask.__version__) <= '0.15.4',
xarray/tests/test_dask.py:296:    @pytest.mark.skipif(LooseVersion(dask.__version__) <= '0.15.4',
xarray/tests/test_dask.py:387:    if LooseVersion(dask.__version__) == LooseVersion('0.15.3'):
xarray/tests/test_dask.py:784:    pytest.mark.skipif(LooseVersion(dask.__version__) <= '0.15.4',
xarray/tests/test_dask.py:802:    pytest.mark.skipif(LooseVersion(dask.__version__) <= '0.15.4',
xarray/tests/test_dask.py:818:@pytest.mark.skipif(LooseVersion(dask.__version__) <= '0.15.4',
xarray/tests/test_variable.py:1664:    if LooseVersion(dask.__version__) <= LooseVersion('0.15.1'):
xarray/tests/test_variable.py:1670:    if LooseVersion(dask.__version__) <= LooseVersion('0.15.1'):
```

I'd like to see xarray bump the minimum version number of dask to something around 0.15.4 (Oct. 2017) or 0.16 (Nov. 2017).

cc @mrocklin, @pydata/xarray

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2203/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
327875183 MDU6SXNzdWUzMjc4NzUxODM= 2200 DEPS: drop numpy < 1.12 jhamman 2443309 closed 0     0 2018-05-30T19:52:40Z 2018-07-08T00:55:31Z 2018-07-08T00:55:31Z MEMBER      

Pandas is dropping Numpy 1.11 and earlier in their 0.24 release. It is probably easiest to follow suit with xarray.

xref: https://github.com/pandas-dev/pandas/issues/21242

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2200/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
331415995 MDU6SXNzdWUzMzE0MTU5OTU= 2225 Zarr Backend: check for non-uniform chunks is too strict jhamman 2443309 closed 0     3 2018-06-12T02:36:05Z 2018-06-13T05:51:36Z 2018-06-13T05:51:36Z MEMBER      

I think the following block of code is more strict than either dask or zarr requires:

https://github.com/pydata/xarray/blob/6c3abedf906482111b06207b9016ea8493c42713/xarray/backends/zarr.py#L80-L89

It should be possible to have uneven chunks in the last position of multiple dimensions in a zarr dataset.

Code Sample, a copy-pastable example if possible

```python
In [1]: import xarray as xr

In [2]: import dask.array as dsa

In [3]: da = xr.DataArray(dsa.random.random((8, 7, 11), chunks=(3, 3, 3)),
   ...:                   dims=('x', 'y', 't'))

In [4]: da
Out[4]:
<xarray.DataArray 'da.random.random_sample-1aed3ea2f9dd784ec947cb119459fa56' (x: 8, y: 7, t: 11)>
dask.array<shape=(8, 7, 11), dtype=float64, chunksize=(3, 3, 3)>
Dimensions without coordinates: x, y, t

In [5]: da.data.chunks
Out[5]: ((3, 3, 2), (3, 3, 1), (3, 3, 3, 2))

In [6]: da.to_dataset('varname').to_zarr('/Users/jhamman/workdir/test_chunks.zarr')
/Users/jhamman/anaconda/bin/ipython:1: FutureWarning: the order of the arguments on DataArray.to_dataset has changed; you now need to supply name as a keyword argument
  #!/Users/jhamman/anaconda/bin/python

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-32fa9a7d0276> in <module>()
----> 1 da.to_dataset('varname').to_zarr('/Users/jhamman/workdir/test_chunks.zarr')

~/anaconda/lib/python3.6/site-packages/xarray/core/dataset.py in to_zarr(self, store, mode, synchronizer, group, encoding, compute)
   1185         from ..backends.api import to_zarr
   1186         return to_zarr(self, store=store, mode=mode, synchronizer=synchronizer,
-> 1187                        group=group, encoding=encoding, compute=compute)
   1188
   1189     def __unicode__(self):

~/anaconda/lib/python3.6/site-packages/xarray/backends/api.py in to_zarr(dataset, store, mode, synchronizer, group, encoding, compute)
    856     # I think zarr stores should always be sync'd immediately
    857     # TODO: figure out how to properly handle unlimited_dims
--> 858     dataset.dump_to_store(store, sync=True, encoding=encoding, compute=compute)
    859
    860     if not compute:

~/anaconda/lib/python3.6/site-packages/xarray/core/dataset.py in dump_to_store(self, store, encoder, sync, encoding, unlimited_dims, compute)
   1073
   1074         store.store(variables, attrs, check_encoding,
-> 1075                     unlimited_dims=unlimited_dims)
   1076         if sync:
   1077             store.sync(compute=compute)

~/anaconda/lib/python3.6/site-packages/xarray/backends/zarr.py in store(self, variables, attributes, *args, **kwargs)
    341     def store(self, variables, attributes, *args, **kwargs):
    342         AbstractWritableDataStore.store(self, variables, attributes,
--> 343                                         *args, **kwargs)
    344
    345     def sync(self, compute=True):

~/anaconda/lib/python3.6/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, unlimited_dims)
    366         self.set_dimensions(variables, unlimited_dims=unlimited_dims)
    367         self.set_variables(variables, check_encoding_set,
--> 368                            unlimited_dims=unlimited_dims)
    369
    370     def set_attributes(self, attributes):

~/anaconda/lib/python3.6/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, unlimited_dims)
    403             check = vn in check_encoding_set
    404             target, source = self.prepare_variable(
--> 405                 name, v, check, unlimited_dims=unlimited_dims)
    406
    407             self.writer.add(source, target)

~/anaconda/lib/python3.6/site-packages/xarray/backends/zarr.py in prepare_variable(self, name, variable, check_encoding, unlimited_dims)
    325
    326         encoding = _extract_zarr_variable_encoding(
--> 327             variable, raise_on_invalid=check_encoding)
    328
    329         encoded_attrs = OrderedDict()

~/anaconda/lib/python3.6/site-packages/xarray/backends/zarr.py in _extract_zarr_variable_encoding(variable, raise_on_invalid)
    181
    182     chunks = _determine_zarr_chunks(encoding.get('chunks'), variable.chunks,
--> 183                                     variable.ndim)
    184     encoding['chunks'] = chunks
    185     return encoding

~/anaconda/lib/python3.6/site-packages/xarray/backends/zarr.py in _determine_zarr_chunks(enc_chunks, var_chunks, ndim)
     87             "Zarr requires uniform chunk sizes excpet for final chunk."
     88             " Variable %r has incompatible chunks. Consider "
---> 89             "rechunking using chunk()." % (var_chunks,))
     90     # last chunk is allowed to be smaller
     91     last_var_chunk = all_var_chunks[-1]

ValueError: Zarr requires uniform chunk sizes excpet for final chunk. Variable ((3, 3, 2), (3, 3, 1), (3, 3, 3, 2)) has incompatible chunks. Consider rechunking using chunk().
```


Expected Output

IIUC, Zarr allows multiple dims to have uneven chunks, so long as they are all in the last position:

```python
In [9]: import zarr

In [10]: z = zarr.zeros((8, 7, 11), chunks=(3, 3, 3), dtype='i4')

In [11]: z.chunks
Out[11]: (3, 3, 3)
```

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.7
pandas: 0.22.0
numpy: 1.14.3
scipy: 1.1.0
netCDF4: 1.3.1
h5netcdf: 0.5.1
h5py: 2.7.1
Nio: None
zarr: 2.2.0
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.2
distributed: 1.21.6
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 39.0.1
pip: 9.0.3
conda: 4.5.4
pytest: 3.5.1
IPython: 6.3.1
sphinx: 1.7.4
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2225/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
322445312 MDU6SXNzdWUzMjI0NDUzMTI= 2121 rasterio backend should use DataStorePickleMixin (or something similar) jhamman 2443309 closed 0     2 2018-05-11T21:51:59Z 2018-06-07T18:02:56Z 2018-06-07T18:02:56Z MEMBER      

Code Sample, a copy-pastable example if possible

```python
In [1]: import xarray as xr

In [2]: ds = xr.open_rasterio('RGB.byte.tif')

In [3]: ds
Out[3]:
<xarray.DataArray (band: 3, y: 718, x: 791)>
[1703814 values with dtype=uint8]
Coordinates:
  * band     (band) int64 1 2 3
  * y        (y) float64 2.827e+06 2.826e+06 2.826e+06 2.826e+06 2.826e+06 ...
  * x        (x) float64 1.021e+05 1.024e+05 1.027e+05 1.03e+05 1.033e+05 ...
Attributes:
    transform:   (101985.0, 300.0379266750948, 0.0, 2826915.0, 0.0, -300.0417...
    crs:         +init=epsg:32618
    res:         (300.0379266750948, 300.041782729805)
    is_tiled:    0
    nodatavals:  (0.0, 0.0, 0.0)

In [4]: import pickle

In [5]: pickle.dumps(ds)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-a165c2473431> in <module>()
----> 1 pickle.dumps(ds)

TypeError: can't pickle rasterio._io.RasterReader objects
```

Problem description

Originally reported by @rsignell-usgs in https://github.com/pangeo-data/pangeo/issues/249#issuecomment-388445370, the rasterio backend is not pickle-able. This obviously causes problems when using dask-distributed. We probably need to use DataStorePickleMixin or something similar on rasterio datasets to allow multiple readers of the same dataset.
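
Roughly the shape of the fix (a hypothetical sketch of the DataStorePickleMixin-style approach: drop the unpicklable handle on pickling and reopen it on unpickling; the class and attribute names are illustrative):

```python
import rasterio

class PickleableRasterio:
    def __init__(self, filename):
        self.filename = filename
        self.riods = rasterio.open(filename, mode='r')

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['riods']  # the rasterio reader can't be pickled
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.riods = rasterio.open(self.filename, mode='r')
```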

Expected Output

```python
pickle.dumps(ds)
```

returns a pickled dataset.

Output of xr.show_versions()

```
/Users/jhamman/anaconda/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.3
pandas: 0.22.0
numpy: 1.14.2
scipy: 1.0.1
netCDF4: 1.3.1
h5netcdf: 0.5.1
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.2
distributed: 1.21.6
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 39.0.1
pip: 9.0.3
conda: 4.5.1
pytest: 3.5.1
IPython: 6.3.1
sphinx: 1.7.4
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2121/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
304201107 MDU6SXNzdWUzMDQyMDExMDc= 1981 use dask to open datasets in parallel jhamman 2443309 closed 0     5 2018-03-11T22:33:52Z 2018-04-20T12:04:23Z 2018-04-20T12:04:23Z MEMBER      

Code Sample, a copy-pastable example if possible

```python
xr.open_mfdataset('path/to/many/files*.nc', method='parallel')
```

Problem description

We have many issues describing the less than stellar performance of open_mfdataset (e.g. #511, #893, #1385, #1788, #1823). The problem can be broken into three pieces: 1) open each file, 2) decode/preprocess each dataset, and 3) merge/combine/concat the collection of datasets. We can perform (1) and (2) in parallel (performance improvements to (3) would be a separate task). Lately, I'm finding that for large numbers of files, it can take many seconds to many minutes just to open all the files in a multi-file dataset of mine.

I'm proposing that we use something like dask.bag to parallelize steps (1) and (2). I've played around with this a bit and it "works" almost right out of the box, provided you are using the "autoclose=True" option. A concrete example:

We could change the line:

```python
datasets = [open_dataset(p, **open_kwargs) for p in paths]
```

to:

```python
import dask.bag as db

paths_bag = db.from_sequence(paths)
datasets = paths_bag.map(open_dataset, **open_kwargs).compute()
```

I'm curious what others think of this idea and what the potential downfalls may be.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1981/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
295621576 MDU6SXNzdWUyOTU2MjE1NzY= 1897 Vectorized indexing with cache=False jhamman 2443309 closed 0     5 2018-02-08T18:38:18Z 2018-03-06T22:00:57Z 2018-03-06T22:00:57Z MEMBER      

Code Sample, a copy-pastable example if possible

```python
import numpy as np
import xarray as xr

n_times = 4; n_lats = 10; n_lons = 15
n_points = 4

ds = xr.Dataset({'test_var': (['time', 'latitude', 'longitude'],
                              np.random.random((n_times, n_lats, n_lons)))})
ds.to_netcdf('test.nc')

rand_lons = xr.Variable('points', np.random.randint(0, high=n_lons, size=n_points))
rand_lats = xr.Variable('points', np.random.randint(0, high=n_lats, size=n_points))

ds = xr.open_dataset('test.nc', cache=False)
points = ds['test_var'][:, rand_lats, rand_lons]
```

yields:

```python-traceback
NotImplementedError                       Traceback (most recent call last)
<ipython-input-7-f16e4cae9456> in <module>()
     12
     13 ds = xr.open_dataset('test.nc', cache=False)
---> 14 points = ds['test_var'][:, rand_lats, rand_lons]

~/anaconda/envs/pangeo/lib/python3.6/site-packages/xarray/core/dataarray.py in __getitem__(self, key)
    478         else:
    479             # xarray-style array indexing
--> 480             return self.isel(**self._item_key_to_dict(key))
    481
    482     def __setitem__(self, key, value):

~/anaconda/envs/pangeo/lib/python3.6/site-packages/xarray/core/dataarray.py in isel(self, drop, **indexers)
    759         DataArray.sel
    760         """
--> 761         ds = self._to_temp_dataset().isel(drop=drop, **indexers)
    762         return self._from_temp_dataset(ds)
    763

~/anaconda/envs/pangeo/lib/python3.6/site-packages/xarray/core/dataset.py in isel(self, drop, **indexers)
   1390         for name, var in iteritems(self._variables):
   1391             var_indexers = {k: v for k, v in indexers_list if k in var.dims}
-> 1392             new_var = var.isel(**var_indexers)
   1393             if not (drop and name in var_indexers):
   1394                 variables[name] = new_var

~/anaconda/envs/pangeo/lib/python3.6/site-packages/xarray/core/variable.py in isel(self, **indexers)
    851             if dim in indexers:
    852                 key[i] = indexers[dim]
--> 853         return self[tuple(key)]
    854
    855     def squeeze(self, dim=None):

~/anaconda/envs/pangeo/lib/python3.6/site-packages/xarray/core/variable.py in __getitem__(self, key)
    620         """
    621         dims, indexer, new_order = self._broadcast_indexes(key)
--> 622         data = as_indexable(self._data)[indexer]
    623         if new_order:
    624             data = np.moveaxis(data, range(len(new_order)), new_order)

~/anaconda/envs/pangeo/lib/python3.6/site-packages/xarray/core/indexing.py in __getitem__(self, key)
    554
    555     def __getitem__(self, key):
--> 556         return type(self)(_wrap_numpy_scalars(self.array[key]))
    557
    558     def __setitem__(self, key, value):

~/anaconda/envs/pangeo/lib/python3.6/site-packages/xarray/core/indexing.py in __getitem__(self, indexer)
    521
    522     def __getitem__(self, indexer):
--> 523         return type(self)(self.array, self._updated_key(indexer))
    524
    525     def __setitem__(self, key, value):

~/anaconda/envs/pangeo/lib/python3.6/site-packages/xarray/core/indexing.py in _updated_key(self, new_key)
    491                 'Vectorized indexing for {} is not implemented. Load your '
    492                 'data first with .load() or .compute(), or disable caching by '
--> 493                 'setting cache=False in open_dataset.'.format(type(self)))
    494
    495         iter_new_key = iter(expanded_indexer(new_key.tuple, self.ndim))

NotImplementedError: Vectorized indexing for <class 'xarray.core.indexing.LazilyIndexedArray'> is not implemented. Load your data first with .load() or .compute(), or disable caching by setting cache=False in open_dataset.
```

Problem description

Raising a NotImplementedError here is fine, but it instructs the user to "disable caching by setting cache=False in open_dataset", which I've already done. So my questions are: 1) should we expect this to work, and 2) if not

Expected Output

Ideally, we can get the same behavior as:

```python
ds = xr.open_dataset('test2.nc', cache=False).load()
points = ds['test_var'][:, rand_lats, rand_lons]

<xarray.DataArray 'test_var' (time: 4, points: 4)>
array([[0.939469, 0.406885, 0.939469, 0.759075],
       [0.470116, 0.585546, 0.470116, 0.37833 ],
       [0.274321, 0.648218, 0.274321, 0.383391],
       [0.754121, 0.078878, 0.754121, 0.903788]])
Dimensions without coordinates: time, points
```

without needing to use .load()

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.5.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.0+dev55.g1d32399
pandas: 0.22.0
numpy: 1.14.0
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: 0.5.0
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.1
distributed: 1.20.2
matplotlib: 2.1.2
cartopy: 0.15.1
seaborn: 0.8.1
setuptools: 38.4.0
pip: 9.0.1
conda: None
pytest: 3.4.0
IPython: 6.2.1
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1897/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
287852184 MDU6SXNzdWUyODc4NTIxODQ= 1821 v0.10.1 Release jhamman 2443309 closed 0   0.10.3 3008859 11 2018-01-11T16:56:08Z 2018-02-26T23:20:45Z 2018-02-26T01:48:32Z MEMBER      

We're close to a minor/bug-fix release (0.10.1). What do we need to get done before that can happen?

  • [x] #1800 Performance improvements to Zarr (@jhamman)
  • [ ] #1793 Fix for to_netcdf writes with dask-distributed (@jhamman, could use help)
  • [x] #1819 Normalisation for RGB imshow

Help wanted / bugs that no-one is working on:

  • [ ] #1792 Comparison to masked numpy arrays
  • [ ] #1764 groupby_bins fails for empty bins

What else?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1821/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
113497063 MDU6SXNzdWUxMTM0OTcwNjM= 640 Use pytest to simplify unit tests jhamman 2443309 closed 0     2 2015-10-27T03:06:48Z 2018-02-05T21:00:02Z 2018-02-05T21:00:02Z MEMBER      

xray's unit testing system uses Python's standard unittest framework. pytest offers a more flexible framework requiring less boilerplate code. I recently (#638) introduced pytest into xray's CI builds. This issue proposes incrementally migrating and simplifying xray's unit testing framework to pytest.
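
As a small illustration of the boilerplate reduction (a generic example, not taken from the xray test suite):

```python
# unittest style: a TestCase subclass with assert* methods
import unittest

class TestMath(unittest.TestCase):
    def test_add(self):
        self.assertEqual(1 + 1, 2)

# pytest style: a plain function and a plain assert
def test_add():
    assert 1 + 1 == 2
```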

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/640/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
288466108 MDU6SXNzdWUyODg0NjYxMDg= 1830 Drop support for Python 2 jhamman 2443309 closed 0     7 2018-01-15T02:44:15Z 2018-02-01T06:04:08Z 2018-02-01T06:04:08Z MEMBER      

When do we want to drop Python 2 support for Xarray? For reference, Pandas has a stated drop date for Python 2 of the end of 2018 (this year), and Numpy's is slightly later, with an incremental deprecation that is final on Jan. 1, 2020.

We may also consider signing this pledge to help make it clear when/why we're dropping Python 2 support: http://www.python3statement.org/

xref: https://github.com/pandas-dev/pandas/issues/18894, https://github.com/numpy/numpy/pull/10006, https://github.com/python3statement/python3statement.github.io/issues/11

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1830/reactions",
    "total_count": 5,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
287186057 MDU6SXNzdWUyODcxODYwNTc= 1813 Test Failure: test_datetime_line_plot jhamman 2443309 closed 0     3 2018-01-09T18:29:35Z 2018-01-10T07:13:53Z 2018-01-10T07:13:53Z MEMBER      

We're getting a single test failure in the plot tests on master (link to travis failure). I haven't been able to reproduce this locally yet, so I'm just going to post here to see if anyone has any ideas.

Code Sample

```
_________________ TestDatetimePlot.test_datetime_line_plot _________________

self = <xarray.tests.test_plot.TestDatetimePlot testMethod=test_datetime_line_plot>

    def test_datetime_line_plot(self):
        # test if line plot raises no Exception
>       self.darray.plot.line()

xarray/tests/test_plot.py:1333:
xarray/plot/plot.py:328: in line
    return line(self._da, *args, **kwargs)
xarray/plot/plot.py:223: in line
    _ensure_plottable(x)

args = (<xarray.DataArray 'time' (time: 12)>
array([datetime.datetime(2017, 1, 1, 0, 0), datetime.datetime(2017, 2, 1,
       ... 12, 1, 0, 0)], dtype=object)
Coordinates:
  * time     (time) object 2017-01-01 2017-02-01 2017-03-01 2017-04-01 ...,)
numpy_types = [<class 'numpy.floating'>, <class 'numpy.integer'>, <class 'numpy.timedelta64'>, <class 'numpy.datetime64'>]
other_types = [<class 'datetime.datetime'>]
x = <xarray.DataArray 'time' (time: 12)>
array([datetime.datetime(2017, 1, 1, 0, 0), datetime.datetime(2017, 2, 1, ...7, 12, 1, 0, 0)], dtype=object)
Coordinates:
  * time     (time) object 2017-01-01 2017-02-01 2017-03-01 2017-04-01 ...

    def _ensure_plottable(*args):
        """
        Raise exception if there is anything in args that can't be plotted on
        an axis.
        """
        numpy_types = [np.floating, np.integer, np.timedelta64, np.datetime64]
        other_types = [datetime]

        for x in args:
            if not (_valid_numpy_subdtype(np.array(x), numpy_types)
                    or _valid_other_type(np.array(x), other_types)):
>               raise TypeError('Plotting requires coordinates to be numeric '
                                'or dates.')
E               TypeError: Plotting requires coordinates to be numeric or dates.

xarray/plot/plot.py:57: TypeError
```

Expected Output

This test was previously passing.

Output of xr.show_versions()

https://travis-ci.org/pydata/xarray/jobs/326640013#L1262

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1813/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
265056503 MDU6SXNzdWUyNjUwNTY1MDM= 1631 Resample / upsample behavior diverges from pandas jhamman 2443309 closed 0     5 2017-10-12T19:22:44Z 2017-12-30T06:21:42Z 2017-12-30T06:21:42Z MEMBER      

I've found a few issues where xarray's new resample / upsample functionality diverges from Pandas. I think they mostly surround how NaNs are treated. Thoughts from @shoyer, @darothen, and others are welcome.

Gist with all the juicy details: https://gist.github.com/jhamman/354f0e5ff32a39550ffd25800e7214fc#file-xarray_resample-ipynb

xref: #1608, #1272

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1631/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
283984555 MDU6SXNzdWUyODM5ODQ1NTU= 1798 BUG: set_variables in backends.commons loads target dataset jhamman 2443309 closed 0     1 2017-12-21T19:43:05Z 2017-12-28T05:40:17Z 2017-12-28T05:40:17Z MEMBER      

Problem description

In #1609 we (I) implemented a fix for appending to datasets with existing variables. In doing so, it looks like I added a regression wherein the variables property on the AbstractWritableDataStore is repeatedly queried. This property calls .load() on the underlying dataset.

This was discovered while diagnosing some problems with the zarr backend (#1770, https://github.com/pangeo-data/pangeo/issues/48#issuecomment-353223737).

I have a potential fix for this that I will post once the tests pass.

cc @rabernat, @mrocklin

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: 20f957db105a9348b0f7d2dac076c17c31cbccee
python: 3.6.0.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.0+dev18.g4a9c1e3
pandas: 0.21.0
numpy: 1.13.3
scipy: 0.19.1
netCDF4: 1.3.0
h5netcdf: 0.5.0
Nio: None
zarr: 2.1.4
bottleneck: 1.2.1
cyordereddict: None
dask: 0.15.4
distributed: 1.19.3
matplotlib: 2.0.2
cartopy: 0.15.1
seaborn: 0.8.1
setuptools: 33.1.0.post20170122
pip: 9.0.1
conda: None
pytest: 3.2.3
IPython: 5.2.2
sphinx: 1.6.3
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1798/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
279958650 MDU6SXNzdWUyNzk5NTg2NTA= 1766 Pandas has deprecated the TimeGrouper jhamman 2443309 closed 0     0 2017-12-07T00:40:11Z 2017-12-07T01:33:29Z 2017-12-07T01:33:29Z MEMBER      

Code Sample, a copy-pastable example if possible

```python
da.resample(time='MS').sum('time')
```

Problem description

Pandas has deprecated the TimeGrouper class (https://github.com/pandas-dev/pandas/issues/16747) and that warning has started popping out during xarray resample operations. We can make this go away quite easily. (I'll submit a PR shortly).

Output of xr.show_versions()

```
In [2]: xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.9.6-75-g246c352
pandas: 0.21.0
numpy: 1.13.3
scipy: 0.19.1
netCDF4: 1.3.0
h5netcdf: 0.5.0
Nio: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.15.4
matplotlib: 2.0.2
cartopy: 0.15.1
seaborn: 0.8.1
setuptools: 33.1.0.post20170122
pip: 9.0.1
conda: None
pytest: 3.2.3
IPython: 5.2.2
sphinx: 1.6.3
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1766/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
253463226 MDU6SXNzdWUyNTM0NjMyMjY= 1535 v0.10 Release jhamman 2443309 closed 0     18 2017-08-28T21:31:43Z 2017-11-20T20:13:52Z 2017-11-20T17:27:24Z MEMBER      

I'd like to issue the v0.10 release within the next few weeks, after merging the following PRs:

Features

  • [x] #1272 Groupby-like API for resampling (@darothen)
  • [x] #1473 Indexing with broadcasting (@fujiisoup, @shoyer)
  • [x] #1489 to_dask_dataframe() (@jmunroe)
  • [x] #1508 Support using opened netCDF4.Dataset (@dopplershift)
  • [x] #1514 Add pathlib.Path support to open_(mf)dataset (@willirath)
  • [x] #1543 pass dask compute/persist args through from load/compute/perist (@jhamman)

Bug Fixes

  • [x] #1532 Avoid computing dask variables on __repr__ and __getattr__ (@crusaderky)
  • [x] #1542 Pandas dev test failures (@shoyer)
  • [x] #1538 Disallow improper DataArray construction (@jhamman)

Misc

  • [x] #1485 xr.show_versions() (@jhamman)
  • [x] #1530 Deprecate old pandas support (@fujiisoup)
  • [x] #1539 Remove support for dataset construction w/o dims. (@jhamman)

TODO

  • [x] #1333 Deprecate indexing with non-aligned DataArray objects

Let me know if there's anything else critical to get in.

CC @pydata/xarray

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1535/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
267354113 MDU6SXNzdWUyNjczNTQxMTM= 1644 Formalize contract between XArray and the dask.distributed scheduler jhamman 2443309 closed 0     1 2017-10-21T06:09:22Z 2017-11-14T23:40:06Z 2017-11-14T23:40:06Z MEMBER      

From @mrocklin in https://github.com/pangeo-data/pangeo/issues/5#issue-255329911:

XArray was designed long before the dask.distributed task scheduler. As a result newer ways of doing things, like asynchronous computing, persist, etc. either don't function well, or were hacked on in a less-than-optimal-way. We should improve this relationship so that XArray can take advantage of newer dask.distributed features today and also adhere to contracts so that it benefits from changes in the future.

There is conversation towards the end of dask/dask#1068 about what such a contract might look like. I think that @jcrist is planning to work on this on the Dask side some time in the next week or two.

There is a new "Dask Collection Interface" implemented in https://github.com/dask/dask/pull/2748 (and documented in the dask docs).

I'm creating this issue here (in addition to https://github.com/pangeo-data/pangeo/issues/5) to track design considerations on the xarray side and to get input from the @pydata/xarray team.

cc @mrocklin, @shoyer, @jcrist, @rabernat

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1644/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
270808895 MDU6SXNzdWUyNzA4MDg4OTU= 1684 Dask arrays and DataArray coords that share name with dimensions jhamman 2443309 closed 0     3 2017-11-02T21:11:58Z 2017-11-05T01:29:45Z 2017-11-05T01:29:45Z MEMBER      

First reported by @mrocklin in here.

```python
In [1]: import xarray

In [2]: import dask.array as da

In [3]: coord = da.arange(8, chunks=(4,))
   ...: data = da.random.random((8, 8), chunks=(4, 4)) + 1
   ...: array = xarray.DataArray(data,
   ...:                          coords={'x': coord, 'y': coord},
   ...:                          dims=['x', 'y'])

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-b90a33ebf436> in <module>()
      3 array = xarray.DataArray(data,
      4                          coords={'x': coord, 'y': coord},
----> 5                          dims=['x', 'y'])

/home/mrocklin/workspace/xarray/xarray/core/dataarray.py in __init__(self, data, coords, dims, name, attrs, encoding, fastpath)
    227
    228             data = as_compatible_data(data)
--> 229             coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
    230             variable = Variable(dims, data, attrs, encoding, fastpath=True)
    231

/home/mrocklin/workspace/xarray/xarray/core/dataarray.py in _infer_coords_and_dims(shape, coords, dims)
     68     if utils.is_dict_like(coords):
     69         for k, v in coords.items():
---> 70             new_coords[k] = as_variable(v, name=k)
     71     elif coords is not None:
     72         for dim, coord in zip(dims, coords):

/home/mrocklin/workspace/xarray/xarray/core/variable.py in as_variable(obj, name)
     94                             '{}'.format(obj))
     95     elif utils.is_scalar(obj):
---> 96         obj = Variable([], obj)
     97     elif getattr(obj, 'name', None) is not None:
     98         obj = Variable(obj.name, obj)

/home/mrocklin/workspace/xarray/xarray/core/variable.py in __init__(self, dims, data, attrs, encoding, fastpath)
    275         """
    276         self._data = as_compatible_data(data, fastpath=fastpath)
--> 277         self._dims = self._parse_dimensions(dims)
    278         self._attrs = None
    279         self._encoding = None

/home/mrocklin/workspace/xarray/xarray/core/variable.py in _parse_dimensions(self, dims)
    439             raise ValueError('dimensions %s must have the same length as the '
    440                              'number of data dimensions, ndim=%s'
--> 441                              % (dims, self.ndim))
    442         return dims
    443

ValueError: dimensions () must have the same length as the number of data dimensions, ndim=1
```

or a similar setup that computes the coordinates immediately

```python
In [18]: x = xr.Variable('x', da.arange(8, chunks=(4,)))
    ...: y = xr.Variable('y', da.arange(8, chunks=(4,)) * 2)
    ...: data = da.random.random((8, 8), chunks=(4, 4)) + 1
    ...: array = xr.DataArray(data, dims=['x', 'y'])
    ...: array.coords['x'] = x
    ...: array.coords['y'] = y

In [19]: array
Out[19]:
<xarray.DataArray 'add-7d8ed340e5dd8fe107ea681573c72e87' (x: 8, y: 8)>
dask.array<shape=(8, 8), dtype=float64, chunksize=(4, 4)>
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7
  * y        (y) int64 0 2 4 6 8 10 12 14
```

Problem description

I think we have two, possibly related, problems with using dask arrays as DataArray coordinates.

  1. As the first snippet shows, the constructor fails when coordinates are specified as raw dask arrays. This does not occur when coord is a numpy array.
  2. When coordinates are specified as dask arrays via the coords attribute, they are computed immediately.
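For what it's worth, a workaround sketch for problem 1: wrapping each coordinate in an explicit 1-d Variable sidesteps the scalar misclassification in the constructor (though, per problem 2, the values may still end up computed eagerly):

```python
import dask.array as da
import xarray

coord = da.arange(8, chunks=(4,))
data = da.random.random((8, 8), chunks=(4, 4)) + 1

# explicit dims on each coordinate keep as_variable from treating the
# raw dask array as a scalar
array = xarray.DataArray(
    data,
    coords={'x': xarray.Variable('x', coord),
            'y': xarray.Variable('y', coord)},
    dims=['x', 'y'],
)
```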

Expected Output

Output of xr.show_versions()

In [23]: xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.0rc1
pandas: 0.20.3
numpy: 1.13.1
scipy: 0.19.1
netCDF4: None
h5netcdf: 0.3.1
Nio: None
bottleneck: 1.2.0
cyordereddict: None
dask: 0.15.4
matplotlib: 2.0.2
cartopy: 0.15.1
seaborn: 0.8.1
setuptools: 36.6.0
pip: 9.0.1
conda: 4.3.29
pytest: 3.0.5
IPython: 5.1.0
sphinx: 1.5.1
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1684/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
265827204 MDU6SXNzdWUyNjU4MjcyMDQ= 1633 seaborn.apionly module is deprecated jhamman 2443309 closed 0     1 2017-10-16T16:11:29Z 2017-10-23T15:58:09Z 2017-10-23T15:58:09Z MEMBER      

Xarray is using the apionly module from seaborn which is now raising this warning:

```python
...python3.6/site-packages/seaborn/apionly.py:6: UserWarning: As seaborn
no longer sets a default style on import, the seaborn.apionly module is
deprecated. It will be removed in a future version.
  warnings.warn(msg, UserWarning)
```

I think the only places we use seaborn are here:

https://github.com/pydata/xarray/blob/2949558b75a65404a500a237ec54834fd6946d07/xarray/plot/utils.py#L76-L87

This shouldn't be a difficult fix if/when we decide to change it.
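A sketch of the likely change, assuming nothing in our plotting utilities relies on apionly-specific behavior:

```python
try:
    # seaborn >= 0.8 no longer sets a global style on import, so the
    # plain module can replace the deprecated apionly entry point
    import seaborn as sns
except ImportError:
    sns = None
```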

xref: https://github.com/mwaskom/seaborn/pull/1216

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1633/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
266250898 MDU6SXNzdWUyNjYyNTA4OTg= 1636 support writing unlimited dimensions with h5netcdf jhamman 2443309 closed 0     0 2017-10-17T19:33:11Z 2017-10-18T19:56:43Z 2017-10-18T19:56:43Z MEMBER      

h5netcdf v0.5 (just released) added support for unlimited dimensions. This may (should) allow us to enable writing unlimited dimensions with the h5netcdf backend.

xref: https://github.com/shoyer/h5netcdf/pull/33
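If this lands, usage should presumably mirror the netCDF4 backend — a sketch, assuming to_netcdf's existing unlimited_dims argument is simply honored for engine='h5netcdf':

```python
import xarray as xr

ds = xr.Dataset({'t2m': ('time', [280.0, 281.5])})
# 'time' becomes an unlimited (appendable) dimension in the output file
ds.to_netcdf('out.nc', engine='h5netcdf', unlimited_dims=['time'])
```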

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1636/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
262847801 MDU6SXNzdWUyNjI4NDc4MDE= 1605 Resample interpolate failing on tutorial dataset jhamman 2443309 closed 0     3 2017-10-04T16:17:56Z 2017-10-05T16:34:14Z 2017-10-05T16:34:14Z MEMBER      

I'm getting some unexpected behavior/errors from the new resample/interpolate methods.

@darothen - any idea what's going on here?

```python
In [1]: import xarray as xr

In [2]: ds = xr.tutorial.load_dataset('air_temperature')

In [3]: ds.resample(time='15d').interpolate(kind='linear')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-ef931d7ebbda> in <module>()
----> 1 ds.resample(time='15d').interpolate(kind='linear')

/glade/p/work/jhamman/storylines/src/xarray/xarray/core/resample.py in interpolate(self, kind)
    110
    111         """
--> 112         return self._interpolate(kind=kind)
    113
    114     def _interpolate(self, kind='linear'):

/glade/p/work/jhamman/storylines/src/xarray/xarray/core/resample.py in _interpolate(self, kind)
    312
    313         old_times = self._obj[self._dim].astype(float)
--> 314         new_times = self._full_index.values.astype(float)
    315
    316         data_vars = OrderedDict()

AttributeError: 'NoneType' object has no attribute 'values'
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1605/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
262858955 MDU6SXNzdWUyNjI4NTg5NTU= 1606 BUG: _extract_nc4_variable_encoding raises when shuffle argument is set jhamman 2443309 closed 0     0 2017-10-04T16:55:59Z 2017-10-05T00:12:38Z 2017-10-05T00:12:38Z MEMBER      

I think we're missing the shuffle key from the valid encodings list below:

https://github.com/pydata/xarray/blob/24643ecee2eab04d0f84c41715d753e829f448e6/xarray/backends/netCDF4_.py#L155-L156

```python
var = xr.Variable(('x',), [1, 2, 3], {}, {'chunking': (2, 1)})
encoding = _extract_nc4_variable_encoding(var, raise_on_invalid=True)
```

```
variable = <xarray.Variable (x: 3)>
array([1, 2, 3]), raise_on_invalid = True, lsd_okay = True, backend = 'netCDF4'

def _extract_nc4_variable_encoding(variable, raise_on_invalid=False,
                                   lsd_okay=True, backend='netCDF4'):
    encoding = variable.encoding.copy()

    safe_to_drop = set(['source', 'original_shape'])
    valid_encodings = set(['zlib', 'complevel', 'fletcher32', 'contiguous',
                           'chunksizes'])
    if lsd_okay:
        valid_encodings.add('least_significant_digit')

    if (encoding.get('chunksizes') is not None and
            (encoding.get('original_shape', variable.shape) !=
                variable.shape) and not raise_on_invalid):
        del encoding['chunksizes']

    for k in safe_to_drop:
        if k in encoding:
            del encoding[k]

    if raise_on_invalid:
        invalid = [k for k in encoding if k not in valid_encodings]
        if invalid:
            raise ValueError('unexpected encoding parameters for %r backend: '
                           ' %r' % (backend, invalid))

E ValueError: unexpected encoding parameters for 'netCDF4' backend: ['shuffle']

xarray/backends/netCDF4_.py:173: ValueError
```
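The fix is presumably just adding the missing key to valid_encodings — a sketch:

```python
# add 'shuffle' to the encodings the netCDF4 backend accepts
valid_encodings = set(['zlib', 'complevel', 'fletcher32', 'contiguous',
                       'chunksizes', 'shuffle'])
```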

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1606/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
245893358 MDU6SXNzdWUyNDU4OTMzNTg= 1493 ENH: points coord from isel/sel_points should be a MultiIndex jhamman 2443309 closed 0     1 2017-07-27T00:33:42Z 2017-09-07T15:25:40Z 2017-09-07T15:25:40Z MEMBER      

We implemented the pointwise indexing methods (isel_points and sel_points) before we had MultiIndex support. Would it make sense to update these methods to return objects with coordinates defined as a MultiIndex?

Current behavior:

```python
print('original --> \n', ds)

lons = [-88, -85.9]
lats = [34.2, 31.9]

subset = ds.sel_points(lon=lons, lat=lats, method='nearest')
print('subset --> \n', subset)
```

yields:

```
original -->
<xarray.Dataset>
Dimensions:  (lat: 224, lon: 464, time: 19709)
Coordinates:
  * lat      (lat) float64 25.06 25.19 25.31 25.44 25.56 25.69 25.81 25.94 ...
  * lon      (lon) float64 -124.9 -124.8 -124.7 -124.6 -124.4 -124.3 -124.2 ...
  * time     (time) float64 5.548e+04 5.548e+04 5.548e+04 5.548e+04 ...
Data variables:
    pcp      (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
subset -->
<xarray.Dataset>
Dimensions:  (points: 2, time: 19709)
Coordinates:
    lat      (points) float64 34.19 31.94
    lon      (points) float64 -87.94 -85.94
  * time     (time) float64 5.548e+04 5.548e+04 5.548e+04 5.548e+04 ...
Dimensions without coordinates: points
Data variables:
    pcp      (points, time) float64 0.0 5.698 0.0 0.0 14.66 0.0 0.0 0.0 0.0 ...
```

Maybe it makes sense to return an object with a MultiIndex like:

```python
new = pd.MultiIndex.from_arrays([subset.lon.to_index(), subset.lat.to_index()],
                                names=['lon', 'lat'])
print(new)
MultiIndex(levels=[[-87.9375, -85.9375], [31.9375, 34.1875]],
           labels=[[0, 1], [1, 0]],
           names=['lon', 'lat'])
```
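Roughly the same result could presumably also be spelled with set_index — a sketch, assuming lon and lat are coordinates along the points dimension:

```python
# promote the two point coordinates into a single MultiIndex
indexed = subset.set_index(points=['lon', 'lat'])
```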

xref: #214, #475, #507

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1493/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
254430377 MDU6SXNzdWUyNTQ0MzAzNzc= 1542 Testing: Failing tests on py36-pandas-dev jhamman 2443309 closed 0   0.10 2415632 4 2017-08-31T18:40:47Z 2017-09-05T22:22:32Z 2017-09-05T22:22:32Z MEMBER      

We currently have 7 failing tests when run against the pandas development code (travis).

Question for @shoyer - can you take a look at these and see if we should try to get a fix in place prior to v0.10.0? It looks like pandas 0.21 is slated for release on Sept. 30.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1542/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
254217141 MDU6SXNzdWUyNTQyMTcxNDE= 1540 BUG: Dask distributed integration tests failing on Travis jhamman 2443309 closed 0     10 2017-08-31T05:41:50Z 2017-09-05T09:18:01Z 2017-09-01T01:09:11Z MEMBER      

Recent builds on travis are failing for the integration tests for dask distributed (example). Those tests are:

  • test_dask_distributed_integration_test[h5netcdf]
  • test_dask_distributed_integration_test[netcdf4]

The traceback includes this detail:

```
_______________ test_dask_distributed_integration_test[netcdf4] _______________
loop = <tornado.platform.epoll.EPollIOLoop object at 0x7fe36dc9e250>
engine = 'netcdf4'

    @pytest.mark.parametrize('engine', ENGINES)
    def test_dask_distributed_integration_test(loop, engine):
        with cluster() as (s, ):
            with distributed.Client(s['address'], loop=loop):
                original = create_test_data()
                with create_tmp_file(allow_cleanup_failure=ON_WINDOWS) as filename:
                    original.to_netcdf(filename, engine=engine)
                    with xr.open_dataset(filename, chunks=3, engine=engine) as restored:
                        assert isinstance(restored.var1.data, da.Array)
>                       computed = restored.compute()

xarray/tests/test_distributed.py:33:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
xarray/core/dataset.py:487: in compute
    return new.load()
xarray/core/dataset.py:464: in load
    evaluated_data = da.compute(*lazy_data.values())
../../../miniconda/envs/test_env/lib/python2.7/site-packages/dask/base.py:206: in compute
    results = get(dsk, keys, **kwargs)
../../../miniconda/envs/test_env/lib/python2.7/site-packages/distributed/client.py:1923: in get
    results = self.gather(packed, asynchronous=asynchronous)
../../../miniconda/envs/test_env/lib/python2.7/site-packages/distributed/client.py:1368: in gather
    asynchronous=asynchronous)
../../../miniconda/envs/test_env/lib/python2.7/site-packages/distributed/client.py:540: in sync
    return sync(self.loop, func, *args, **kwargs)
../../../miniconda/envs/test_env/lib/python2.7/site-packages/distributed/utils.py:239: in sync
    six.reraise(*error[0])
../../../miniconda/envs/test_env/lib/python2.7/site-packages/distributed/utils.py:227: in f
    result[0] = yield make_coro()
../../../miniconda/envs/test_env/lib/python2.7/site-packages/tornado/gen.py:1055: in run
    value = future.result()
../../../miniconda/envs/test_env/lib/python2.7/site-packages/tornado/concurrent.py:238: in result
    raise_exc_info(self._exc_info)
../../../miniconda/envs/test_env/lib/python2.7/site-packages/tornado/gen.py:1063: in run
    yielded = self.gen.throw(*exc_info)
../../../miniconda/envs/test_env/lib/python2.7/site-packages/distributed/client.py:1246: in _gather
    traceback)

>       c = a[b]
E       TypeError: string indices must be integers
```

Distributed v1.18.1 was released 5 days ago, so there must have been a breaking change that has been passed down to us.

cc @shoyer, @mrocklin

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1540/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
140063713 MDU6SXNzdWUxNDAwNjM3MTM= 790 ENH: Optional Read-Only RasterIO backend jhamman 2443309 closed 0     15 2016-03-11T02:00:32Z 2017-06-06T10:25:22Z 2017-06-06T10:25:22Z MEMBER      

RasterIO is a GDAL-based library that provides fast and direct raster I/O for use with NumPy and SciPy. I've only used it a bit but have been generally impressed with its support for a range of ASCII and binary raster formats. It might be a nice addition to the suite of backends already available in xarray.

I'm envisioning functionality akin to what we provide in the PyNIO backend, which is to say, read-only support for whichever file types RasterIO supports.
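For a flavor of what such a backend would wrap, a minimal sketch (the GeoTIFF path is hypothetical):

```python
import rasterio
import xarray as xr

with rasterio.open('example.tif') as src:
    data = src.read()  # numpy array with shape (band, y, x)
    da = xr.DataArray(data, dims=('band', 'y', 'x'),
                      attrs={'crs': str(src.crs)})
```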

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/790/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
124569898 MDU6SXNzdWUxMjQ1Njk4OTg= 696 Doc updates jhamman 2443309 closed 0     1 2016-01-02T01:37:58Z 2016-12-29T02:36:56Z 2016-12-29T02:36:56Z MEMBER      

Now that ReadTheDocs supports using conda, we can:

  • use cartopy to plot the maps at build time
  • standardize on Python 3

xref: #695

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/696/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
138045063 MDU6SXNzdWUxMzgwNDUwNjM= 781 Don't infer x/y coordinates interval breaks for cartopy plot axes jhamman 2443309 closed 0     9 2016-03-03T01:22:19Z 2016-11-10T22:55:05Z 2016-11-10T22:55:05Z MEMBER      

The DataArray.plot.pcolormesh() method modifies the x/y coordinates of its plots. I'm finding that, at least for custom cartopy projections, the offset applied here causes some real issues downstream.

@clarkfitzg - Do you see any problem with treating the x/y offset in the same way as the axis limits?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/781/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
109589162 MDU6SXNzdWUxMDk1ODkxNjI= 605 Support Two-Dimensional Coordinate Variables jhamman 2443309 closed 0   1.0 741199 11 2015-10-02T23:27:18Z 2016-07-31T23:02:46Z 2016-07-31T23:02:46Z MEMBER      

The CF Conventions supports the notion of a 2d coordinate variable in the case of irregularly spaced data. An example of this sort of dataset is below. The CF Convention is to add a "coordinates" attribute with a string describing the 2d coordinates.

```
dimensions:
  xc = 128 ;
  yc = 64 ;
  lev = 18 ;
variables:
  float T(lev,yc,xc) ;
    T:long_name = "temperature" ;
    T:units = "K" ;
    T:coordinates = "lon lat" ;
  float xc(xc) ;
    xc:axis = "X" ;
    xc:long_name = "x-coordinate in Cartesian system" ;
    xc:units = "m" ;
  float yc(yc) ;
    yc:axis = "Y" ;
    yc:long_name = "y-coordinate in Cartesian system" ;
    yc:units = "m" ;
  float lev(lev) ;
    lev:long_name = "pressure level" ;
    lev:units = "hPa" ;
  float lon(yc,xc) ;
    lon:long_name = "longitude" ;
    lon:units = "degrees_east" ;
  float lat(yc,xc) ;
    lat:long_name = "latitude" ;
    lat:units = "degrees_north" ;
```

I'd like to discuss how we could support this in xray. The motivating application for this is in plotting operations, but it may also have application in other grouping and remapping operations (e.g. #324, #475, #486).

One option would be to simply honor the "coordinates" attr in plotting and use the specified coordinates as the x/y values.
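A sketch of that option, assuming lon and lat have been promoted to coordinates and passed as the plot axes:

```python
# honor the CF "coordinates" attribute by using the 2d lon/lat
# variables as the plot axes
ds = ds.set_coords(['lon', 'lat'])
ds['T'].isel(lev=0).plot.pcolormesh(x='lon', y='lat')
```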

ref: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html#idp5559280

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/605/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
122776511 MDU6SXNzdWUxMjI3NzY1MTE= 681 to_netcdf on Python 3: "string" qualifier on attributes jhamman 2443309 closed 0     8 2015-12-17T16:56:59Z 2016-06-16T08:27:33Z 2016-03-01T21:49:36Z MEMBER      

I've had a number of people ask me about this and I think we can figure out a way to fix this. In Python 3, variable attributes in files written with Dataset.to_netcdf end up with the "string" type qualifier shown below. This causes all sorts of problems with other netCDF programs. Is this related to https://github.com/Unidata/netcdf4-python/issues/485?

```bash
PRISM$ ncdump -h prism_historical_conus4k.189501-201510.nc
netcdf prism_historical_conus4k.189501-201510 {
dimensions:
        latitude = 621 ;
        longitude = 1405 ;
        time = 1450 ;
variables:
        double latitude(latitude) ;
        double longitude(longitude) ;
        int64 time(time) ;
                string time:units = "days since 1895-01-01 00:00:00" ;
                string time:calendar = "proleptic_gregorian" ;
        float prcp(time, latitude, longitude) ;
                string prcp:units = "mm" ;
                string prcp:description = "precipitation " ;
                string prcp:long_name = "precipitation" ;

// global attributes:
                string :title = "PRISM: Parameter-elevation Regressions on Independent Slopes Model" ;
}
```
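One possible workaround sketch (an untested assumption, not a fix): store attribute values as bytes, which netCDF4-python should write as classic NC_CHAR attributes rather than the variable-length string type:

```python
# encode str attribute values to bytes before writing (assumption:
# netCDF4-python writes bytes attrs without the "string" qualifier)
for var in ds.variables.values():
    var.attrs = {k: v.encode() if isinstance(v, str) else v
                 for k, v in var.attrs.items()}
ds.to_netcdf('prism_fixed.nc')
```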

cc @lizaclark

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/681/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
156186767 MDU6SXNzdWUxNTYxODY3Njc= 855 drop support for Python 2.6 jhamman 2443309 closed 0     0 2016-05-23T01:53:15Z 2016-05-23T19:38:07Z 2016-05-23T19:38:07Z MEMBER      

@shoyer polled the xarray users list about dropping Python 2.6 from the supported versions of Python for xarray. There were no complaints so it looks like we are moving forward on this at the next major release (0.8).

xref: https://groups.google.com/forum/#!searchin/xarray/2.6/xarray/JVIUiIhEW_8/qBjxmestCQAJ

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/855/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
113499493 MDU6SXNzdWUxMTM0OTk0OTM= 641 add rolling_apply method or function jhamman 2443309 closed 0     13 2015-10-27T03:30:11Z 2016-02-20T02:32:33Z 2016-02-20T02:32:33Z MEMBER      

Pandas has a generic rolling_apply function. It would be nice to support a similar API on xray objects. The API I have in mind is something like:

```python
DataArray.rolling_apply(window, func, min_periods=None, freq=None,
                        center=False, args=(), kwargs={})

da.rolling_apply(7, np.mean)
```
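For comparison, a sketch of the rolling API xarray ultimately adopted — a rolling-window constructor plus generic reduce — assuming the array has a 'time' dimension:

```python
da.rolling(time=7, center=False).reduce(np.mean)
da.rolling(time=7).mean()  # common reductions also get named methods
```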

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/641/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
108769226 MDU6SXNzdWUxMDg3NjkyMjY= 593 Bug when accessing sorted dataset before loading jhamman 2443309 closed 0     6 2015-09-28T23:58:29Z 2016-01-04T23:11:55Z 2015-10-02T21:41:11Z MEMBER      

I ran into this bug this afternoon. If I sort a Dataset using isel before loading the data, I end up with an error in the netCDF4 backend. If I call Dataset.load() before sorting the Dataset, I get the expected behavior.

First some info on my environment (everything should be fresh):

Python version  : 3.4.3 |Anaconda 2.3.0 (x86_64)| (default, Mar  6 2015, 12:07:41)
                  [GCC 4.2.1 (Apple Inc. build 5577)]
xray version    : 0.6.0
numpy version   : 1.9.3
netCDF4 version : 1.1.9

Now for a simplified example that reproduces the bug:

```python
In [1]: import xray
        import numpy as np
        import netCDF4

In [2]: random_data = np.random.random(size=(4, 6))
        dim0 = [0, 1, 2, 3]
        dim1 = [0, 2, 1, 3, 5, 4]  # We will sort this in a later step
        da = xray.DataArray(data=random_data, dims=('dim0', 'dim1'),
                            coords={'dim0': dim0, 'dim1': dim1},
                            name='randovar')
        ds = da.to_dataset()
        ds.to_netcdf('rando.nc')

In [3]: ds2 = xray.open_dataset('rando.nc')
        # ds2.load()  # work around to prevent IndexError
        inds = np.argsort(ds2.dim1.values)
        ds2 = ds2.isel(dim1=inds)
        print(ds2.randovar)

Out[3]:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-9b4ab63c0fd2> in <module>()
      2 inds = np.argsort(ds2.dim1.values)
      3 ds2 = ds2.isel(dim1=inds)
----> 4 print(ds2.randovar)

...

/Users/jhamman/anaconda/lib/python3.4/site-packages/xray/backends/netCDF4_.py in __getitem__(self, key)
     43         else:
     44             getitem = operator.getitem
---> 45         data = getitem(self.array, key)
     46         if self.ndim == 0:
     47             # work around for netCDF4-python's broken handling of 0-d

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__getitem__ (netCDF4/_netCDF4.c:30994)()

/Users/jhamman/anaconda/lib/python3.4/site-packages/netCDF4/utils.py in _StartCountStride(elem, shape, dimensions, grp, datashape, put)
    220         # duplicate indices in the sequence)
    221         msg = "integer sequences in slices must be sorted and cannot have duplicates"
--> 222         raise IndexError(msg)
    223     # convert to boolean array.
    224     # if unlim, let boolean array be longer than current dimension

IndexError: integer sequences in slices must be sorted and cannot have duplicates
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/593/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
110102454 MDU6SXNzdWUxMTAxMDI0NTQ= 611 facet grid axis labels are None jhamman 2443309 closed 0   0.6.1 1307323 4 2015-10-06T21:12:50Z 2016-01-04T23:11:55Z 2015-10-09T14:25:57Z MEMBER      

The dim names on this plot are not showing up (e.g. None is not right, it should be x and y):

```python
data = (np.random.random(size=(20, 25, 12)) + np.linspace(-3, 3, 12))  # range is ~ -3 to 4
da = xray.DataArray(data, dims=['x', 'y', 'time'], name='data')
fg = da.plot.pcolormesh(col='time', col_wrap=4)
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/611/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
109434899 MDU6SXNzdWUxMDk0MzQ4OTk= 602 latest docs are broken jhamman 2443309 closed 0 shoyer 1217238 0.7.0 1368762 4 2015-10-02T05:48:21Z 2016-01-02T01:31:17Z 2016-01-02T01:31:17Z MEMBER      

Looking at the doc build from tonight, something happened and netCDF4 isn't getting picked up. All the docs depending on the netCDF4 package are broken (e.g. plotting, IO, etc.).

@shoyer - You may be able to just resubmit the doc build, or maybe we need to fix something.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/602/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
110801359 MDU6SXNzdWUxMTA4MDEzNTk= 617 travis builds are broken jhamman 2443309 closed 0     2 2015-10-10T15:39:51Z 2015-10-23T22:26:43Z 2015-10-23T22:26:43Z MEMBER      

Tests are failing on Python 2.7 and 3.4. We just started getting pandas 0.17 and numpy 1.10 so that is probably the source of the issue.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/617/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
110040239 MDU6SXNzdWUxMTAwNDAyMzk= 610 don't throw away variable specific coordinates information jhamman 2443309 closed 0     0 2015-10-06T15:50:41Z 2015-10-08T18:03:19Z 2015-10-08T18:03:19Z MEMBER      

Currently, we decode the coordinates attribute, when present, but it doesn't end up in the DataArray's encoding attribute (https://github.com/xray/xray/blob/master/xray/conventions.py#L822-L832). This should be changed so the user can reference the coordinates attribute after decoding.

xref: #605

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/610/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
109553434 MDU6SXNzdWUxMDk1NTM0MzQ= 603 support using Cartopy with facet grids jhamman 2443309 closed 0     1 2015-10-02T19:06:33Z 2015-10-06T15:10:01Z 2015-10-06T15:10:01Z MEMBER      

Currently, I don't think it is possible to specify a Cartopy projection for facet grid plots.

It would be nice to be able to specify either the subplots array including Cartopy projections (e.g. ax=axes) or a projection keyword argument (e.g. subplots_kw=dict(projection=...)) directly when using xray's facet grid.
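A usage sketch of the second option, assuming a subplot_kws-style argument that forwards a cartopy projection to every facet axis (with a transform for the data coordinates):

```python
import cartopy.crs as ccrs

fg = da.plot.pcolormesh(col='time', col_wrap=4,
                        subplot_kws={'projection': ccrs.PlateCarree()},
                        transform=ccrs.PlateCarree())
```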

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/603/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
101061611 MDU6SXNzdWUxMDEwNjE2MTE= 533 DataArray.name should always be a string jhamman 2443309 closed 0     2 2015-08-14T17:36:02Z 2015-09-18T17:35:26Z 2015-09-18T17:35:26Z MEMBER      

Consider the following example:

```python
import numpy as np
import xray

da = xray.DataArray(np.random.random((4, 5)))
ds = da.to_dataset(name=0)  # or name=True, or name=(4)
ds.to_netcdf('test.nc')
```

raises this error:

```python
/Users/jhamman/anaconda/lib/python3.4/site-packages/xray/backends/netCDF4_.py in prepare_variable(self, name, variable)
    228                 endian='native',
    229                 least_significant_digit=encoding.get('least_significant_digit'),
--> 230                 fill_value=fill_value)
    231             nc4_var.set_auto_maskandscale(False)
    232

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.createVariable (netCDF4/_netCDF4.c:13217)()

/Users/jhamman/anaconda/lib/python3.4/posixpath.py in normpath(path)
    330     if path == empty:
    331         return dot
--> 332     initial_slashes = path.startswith(sep)
    333     # POSIX allows one or two initial slashes, but treats three or more
    334     # as single slash.

AttributeError: 'int' object has no attribute 'startswith'
```

I think one way to solve this is to cast the name attribute to a string at the time of assignment. Another way is just to raise an error if a non-string variable name is used.
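A sketch of the second option — validating up front instead of failing deep inside the netCDF4 backend (the helper name is hypothetical):

```python
def _validate_name(name):
    # hypothetical check at assignment time
    if name is not None and not isinstance(name, str):
        raise TypeError('DataArray.name must be a string or None, '
                        'got %r' % (name,))
```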

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/533/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
97861940 MDU6SXNzdWU5Nzg2MTk0MA== 500 discrete colormap option for imshow and pcolormesh jhamman 2443309 closed 0     9 2015-07-29T05:07:18Z 2015-08-06T16:06:33Z 2015-08-06T16:06:33Z MEMBER      

It may be nice to include an option for a discrete colormap/colorbar for the imshow and pcolormesh methods. I would suggest that the default behavior remains a continuous colormap. Perhaps adding an argument such as cmap_intervals would allow for easy discretization of the colormap.

The logic in #499 takes care of most of the details for this issue.
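In the meantime, matplotlib's BoundaryNorm already produces a discrete colorbar through the existing norm/cmap passthrough — a sketch:

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import BoundaryNorm

levels = np.linspace(0, 1, 11)  # 10 discrete bins
norm = BoundaryNorm(levels, ncolors=plt.get_cmap('viridis').N)
da.plot.pcolormesh(norm=norm, cmap='viridis')
```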

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/500/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
83000406 MDU6SXNzdWU4MzAwMDQwNg== 411 unexpected positional indexing behavior with Dataset and DataArray "isel" jhamman 2443309 closed 0     5 2015-05-31T04:48:10Z 2015-06-01T05:03:38Z 2015-06-01T05:03:29Z MEMBER      

I may be missing something here, but I think the indexing behavior in isel is surprisingly different from that of numpy and is incongruent with the xray documentation. Either this is a bug or a feature that I don't understand.

From the xray docs on positional indexing:

Indexing a DataArray directly works (mostly) just like it does for numpy arrays, except that the returned object is always another DataArray

My example below uses two 1d numpy arrays to select from a 3d numpy array. When using pure numpy, I get a 2d array back. In my view, this is the expected behavior. When using the xray.Dataset or xray.DataArray, I get an oddly shaped 3d array back with a duplicate dimension.

```python
import numpy as np
import xray
import sys

print('python version:', sys.version)
print('numpy version:', np.version.full_version)
print('xray version:', xray.version.version)
```

```
python version: 3.4.3 |Anaconda 2.2.0 (x86_64)| (default, Mar  6 2015, 12:07:41)
[GCC 4.2.1 (Apple Inc. build 5577)]
numpy version: 1.9.2
xray version: 0.4.1
```

```python
# A few numpy arrays
time = np.arange(100)
lons = np.arange(40, 60)
lats = np.arange(25, 70)
np_data = np.random.random(size=(len(time), len(lats), len(lons)))

# pick some random points to select
ys, xs = np.nonzero(np_data[0] > 0.8)
print(len(ys))
```

```
176
```

```python
# create a xray.DataArray and xray.Dataset
xr_data = xray.DataArray(np_data, [('time', time), ('y', lats), ('x', lons)])  # DataArray
xr_ds = xr_data.to_dataset(name='data')  # Dataset

# numpy indexing
print('numpy: ', np_data[:, ys, xs].shape)

# xray positional indexing
print('xray1: ', xr_data.isel(y=ys, x=xs).shape)
print('xray2: ', xr_data[:, ys, xs].shape)
print('xray3: ', xr_ds.isel(y=ys, x=xs))
```

```
numpy:  (100, 176)
xray1:  (100, 176, 176)
xray2:  (100, 176, 176)
xray3:  <xray.Dataset>
Dimensions: (time: 100, x: 176, y: 176)
Coordinates:
  * x     (x) int64 46 47 57 45 48 50 51 54 57 59 48 52 49 50 52 53 55 57 43 46 47 48 53 ...
  * time  (time) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 ...
  * y     (y) int64 25 25 25 26 26 26 26 26 26 26 27 27 28 28 28 28 28 28 29 29 29 29 29 ...
Data variables:
    data  (time, y, x) float64 0.9343 0.8311 0.8842 0.3188 0.02052 0.4506 0.04177 ...
```
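For reference, a sketch of how the pointwise behavior can be requested explicitly — DataArray-valued indexers sharing a new points dimension, the spelling later formalized as vectorized indexing:

```python
points_y = xray.DataArray(ys, dims='points')
points_x = xray.DataArray(xs, dims='points')
# indexers that share a dimension are selected pointwise, matching numpy
print(xr_data.isel(y=points_y, x=points_x).shape)  # (100, 176)
```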

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/411/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
33273199 MDU6SXNzdWUzMzI3MzE5OQ== 122 Dataset.groupby summary methods jhamman 2443309 closed 0     3 2014-05-11T23:28:18Z 2014-06-23T07:25:08Z 2014-06-23T07:25:08Z MEMBER      

This may just be a documentation issue but the summary apply and combine methods for the Dataset.GroupBy object seem to be missing.

```python
In [146]: foo_values = np.random.RandomState(0).rand(3, 4)
          times = pd.date_range('2000-01-01', periods=3)
          ds = xray.Dataset({'time': ('time', times),
                             'foo': (['time', 'space'], foo_values)})

          ds.groupby('time').mean()  # replace time with time.month after #121 is addressed
          ds.groupby('time').apply(np.mean)  # also errors here

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-146-eec1e73cff23> in <module>()
      3 ds = xray.Dataset({'time': ('time', times),
      4                    'foo': (['time', 'space'], foo_values)})
----> 5 ds.groupby('time').mean()
      6 ds.groupby('time').apply(np.mean)

AttributeError: 'DatasetGroupBy' object has no attribute 'mean'
```

Adding this functionality, if not already present, seems like a really nice addition to the package.
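For context, a one-line sketch of the requested API once DatasetGroupBy grows reduction methods (grouping here by the time.month virtual variable mentioned in the comment above):

```python
monthly_means = ds.groupby('time.month').mean()
```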

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/122/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
33942756 MDU6SXNzdWUzMzk0Mjc1Ng== 138 keep attrs when reducing xray objects jhamman 2443309 closed 0     4 2014-05-21T00:40:19Z 2014-05-22T00:29:22Z 2014-05-22T00:29:22Z MEMBER      

Reduction operations currently drop all Variable and Dataset attrs. I'm proposing adding a keyword to these methods to allow for copying of the original Variable or Dataset attrs.

The default value of the keep_attrs keyword would be False.

For example:

```python
new = ds.mean(keep_attrs=True)
```

returns new with all of the Variable and Dataset attrs that ds contained.

Some previous discussion in #131 and #137.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/138/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
33272937 MDU6SXNzdWUzMzI3MjkzNw== 121 virtual variables not available when using open_dataset jhamman 2443309 closed 0     5 2014-05-11T23:11:21Z 2014-05-16T00:37:39Z 2014-05-16T00:37:39Z MEMBER      

The tutorial provides an example of how to use xray's virtual_variables. The same functionality is not available from a Dataset object created by open_dataset.

Tutorial:

```python
In [135]: foo_values = np.random.RandomState(0).rand(3, 4)
          times = pd.date_range('2000-01-01', periods=3)
          ds = xray.Dataset({'time': ('time', times),
                             'foo': (['time', 'space'], foo_values)})
          ds['time.dayofyear']

Out[135]:
<xray.DataArray 'time.dayofyear' (time: 3)>
array([1, 2, 3], dtype=int32)
Attributes:
    Empty
```

However, reading a time coordinate/variable from a netCDF4 file and applying the same logic raises an error:

```python
In [136]: ds = xray.open_dataset('sample_for_xray.nc')
          ds['time']

Out[136]:
<xray.DataArray 'time' (time: 4)>
array([1979-09-16 12:00:00, 1979-10-17 00:00:00, 1979-11-16 12:00:00,
       1979-12-17 00:00:00], dtype=object)
Attributes:
    dimensions: 1
    long_name: time
    type_preferred: int

In [137]: ds['time.dayofyear']
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-137-bfe1ae778782> in <module>()
----> 1 ds['time.dayofyear']

/Users/jhamman/anaconda/lib/python2.7/site-packages/xray-0.2.0.dev_cc5e1b2-py2.7.egg/xray/dataset.pyc in __getitem__(self, key)
    408         """Access the given variable name in this dataset as a DataArray.
    409         """
--> 410         return data_array.DataArray._constructor(self, key)
    411
    412     def __setitem__(self, key, value):

/Users/jhamman/anaconda/lib/python2.7/site-packages/xray-0.2.0.dev_cc5e1b2-py2.7.egg/xray/data_array.pyc in _constructor(cls, dataset, name)
     95         if name not in dataset and name not in dataset.virtual_variables:
     96             raise ValueError('name %r must be a variable in dataset %r'
---> 97                              % (name, dataset))
     98         obj._dataset = dataset
     99         obj._name = name

ValueError: name 'time.dayofyear' must be a variable in dataset <xray.Dataset>
Dimensions:     (time: 4, x: 275, y: 205)
Coordinates:
    time            X
    x                        X
    y                                 X
Noncoordinates:
    Wind            0        2        1
Attributes:
    sample data for xray from RASM project
```

Is there a reason that the virtual time variables are only available if the dataset is created from a pandas date_range? Lastly, this could be related to #118.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/121/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
33112594 MDU6SXNzdWUzMzExMjU5NA== 118 Problems parsing time variable using open_dataset jhamman 2443309 closed 0     4 2014-05-08T18:57:31Z 2014-05-16T00:37:28Z 2014-05-16T00:37:28Z MEMBER      

I'm noticing a problem parsing the time variable for at least the noleap calendar for a properly formatted time dimension. Any thoughts on why this is?

```bash
ncdump -c -t sample_for_xray.nc
netcdf sample_for_xray {
dimensions:
        time = UNLIMITED ; // (4 currently)
        y = 205 ;
        x = 275 ;
variables:
        double Wind(time, y, x) ;
                Wind:units = "m/s" ;
                Wind:long_name = "Wind speed" ;
                Wind:coordinates = "latitude longitude" ;
                Wind:dimensions = "2" ;
                Wind:type_preferred = "double" ;
                Wind:time_rep = "instantaneous" ;
                Wind:_FillValue = 9.96920996838687e+36 ;
        double time(time) ;
                time:calendar = "noleap" ;
                time:dimensions = "1" ;
                time:long_name = "time" ;
                time:type_preferred = "int" ;
                time:units = "days since 0001-01-01 0:0:0" ;

// global attributes:
...
data:

 time = "1979-09-16 12", "1979-10-17", "1979-11-16 12", "1979-12-17" ;
```

```python
ds = xray.open_dataset('sample_for_xray.nc')
print ds['time']
```

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-46-65c280e7a283> in <module>()
      1 ds = xray.open_dataset('sample_for_xray.nc')
----> 2 print ds['time']

/home/jhamman/anaconda/lib/python2.7/site-packages/xray/common.pyc in __repr__(self)
     40
     41     def __repr__(self):
---> 42         return array_repr(self)
     43
     44     def __iter__(self):

/home/jhamman/anaconda/lib/python2.7/site-packages/xray/common.pyc in array_repr(arr)
    122     summary = ['<xray.%s %s(%s)>' % (type(arr).__name__, name_str, dim_summary)]
    123     if arr.size < 1e5 or arr._in_memory():
--> 124         summary.append(repr(arr.values))
    125     else:
    126         summary.append('[%s values with dtype=%s]' % (arr.size, arr.dtype))

/home/jhamman/anaconda/lib/python2.7/site-packages/xray/data_array.pyc in values(self)
    147     def values(self):
    148         """The variables's data as a numpy.ndarray"""
--> 149         return self.variable.values
    150
    151     @values.setter

/home/jhamman/anaconda/lib/python2.7/site-packages/xray/variable.pyc in values(self)
    217     def values(self):
    218         """The variable's data as a numpy.ndarray"""
--> 219         return utils.as_array_or_item(self._data_cached())
    220
    221     @values.setter

/home/jhamman/anaconda/lib/python2.7/site-packages/xray/utils.pyc in as_array_or_item(values, dtype)
     56         # converted into an integer instead :(
     57         return values
---> 58     values = as_safe_array(values, dtype=dtype)
     59     if values.ndim == 0 and values.dtype.kind == 'O':
     60         # unpack 0d object arrays to be consistent with numpy

/home/jhamman/anaconda/lib/python2.7/site-packages/xray/utils.pyc in as_safe_array(values, dtype)
     40     """Like np.asarray, but convert all datetime64 arrays to ns precision
     41     """
---> 42     values = np.asarray(values, dtype=dtype)
     43     if values.dtype.kind == 'M':
     44         # np.datetime64

/home/jhamman/anaconda/lib/python2.7/site-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
    458
    459     """
--> 460     return array(a, dtype, copy=False, order=order)
    461
    462 def asanyarray(a, dtype=None, order=None):

/home/jhamman/anaconda/lib/python2.7/site-packages/xray/variable.pyc in __array__(self, dtype)
    121         if dtype is None:
    122             dtype = self.dtype
--> 123         return self.array.values.astype(dtype)
    124
    125     def __getitem__(self, key):

TypeError: Cannot cast datetime.date object from metadata [D] to [ns] according to the rule 'same_kind'
```

This file is available here: ftp://ftp.hydro.washington.edu/pub/jhamman/sample_for_xray.nc

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/118/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
33273376 MDU6SXNzdWUzMzI3MzM3Ng== 123 Selective variable reads in open_dataset jhamman 2443309 closed 0     2 2014-05-11T23:39:12Z 2014-05-12T02:25:10Z 2014-05-12T02:25:10Z MEMBER      

One of the beautiful things about the netCDF data model is that the variables can be read individually. I'm suggesting adding a variables keyword (or something along those lines) to the open_dataset function to support selecting one or more or all variables in a file. This will allow for faster reads and smaller memory usage when the full set of variables is not needed.
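For comparison, open_dataset later gained the inverse of this via the drop_variables keyword — a usage sketch (file name hypothetical):

```python
import xarray as xr

# read everything except the large 'Wind' variable
ds = xr.open_dataset('sample_for_xray.nc', drop_variables=['Wind'])
```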

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/123/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);