
issues


20 rows where repo = 13221727, type = "issue" and user = 6628425 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1970241789 I_kwDOAMm_X851b4D9 8394 Update cftime frequency strings in line with recent updates in pandas spencerkclark 6628425 closed 0     1 2023-10-31T11:24:15Z 2023-11-16T15:19:42Z 2023-11-16T15:19:42Z MEMBER      

What is your issue?

Pandas has introduced some deprecations in how frequency strings are specified:

  • Deprecating "A", "A-JAN", etc. in favor of "Y", "Y-JAN", etc. (https://github.com/pandas-dev/pandas/pull/55252)
  • Deprecating "AS", "AS-JAN", etc. in favor of "YS", "YS-JAN", etc. (https://github.com/pandas-dev/pandas/pull/55479)
  • Deprecating "Q", "Q-JAN", etc. in favor of "QE", "QE-JAN", etc. (https://github.com/pandas-dev/pandas/pull/55553)
  • Deprecating "M" in favor of "ME" (https://github.com/pandas-dev/pandas/pull/54061)
  • Deprecating "H" in favor of "h" (https://github.com/pandas-dev/pandas/pull/54939)
  • Deprecating "T", "S", "L", and "U" in favor of "min", "s", "ms", and "us" (https://github.com/pandas-dev/pandas/pull/54061).

It would be good to carry out these deprecations for cftime frequency strings as well, so that xarray remains consistent with pandas.
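
For reference, a minimal sketch of the new-style aliases on the pandas side; it assumes a pandas version in which the deprecations above have landed (the old spellings then emit a FutureWarning):

```python
import pandas as pd

# New-style frequency aliases corresponding to the deprecations listed above.
pd.date_range("2000", periods=3, freq="YS")   # year start (was "AS")
pd.date_range("2000", periods=3, freq="ME")   # month end (was "M")
pd.date_range("2000", periods=3, freq="h")    # hourly (was "H")
pd.date_range("2000", periods=3, freq="min")  # minutely (was "T")
```

Once carried over, the same spellings would be the recommended ones for xr.cftime_range and cftime-based resampling.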

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8394/reactions",
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 1,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1413075015 I_kwDOAMm_X85UOdBH 7184 Potentially add option to encode times using `longdouble` values spencerkclark 6628425 open 0     0 2022-10-18T11:46:30Z 2022-10-18T11:47:00Z   MEMBER      

By default xarray will exactly roundtrip times saved to disk by encoding them using int64 values. However, if a user specifies time encoding units that prevent this, float64 values will be used, and this has the potential to cause roundtripping differences due to roundoff error. Recently, cftime added the ability to encode times using longdouble values (https://github.com/Unidata/cftime/pull/284). On some platforms this offers greater precision than float64 values (though typically not full quad precision). Nevertheless some users might be interested in encoding their times using such values.

The main thing that longdouble values have going for them is that they enable greater precision when using arbitrary units to encode the dates (with int64 we are constrained to using units that allow for time intervals to be expressed with integers). That said, the more I think about this, the more I feel it may not be the best idea:

  • Since the meaning of longdouble can vary from platform to platform, I wonder what happens if you encode times using longdouble values on one machine and decode them on another?
  • longdouble values cannot be stored with all backends; for example, zarr supports them, but netCDF does not.
  • We already provide a robust way to exactly roundtrip any dates--i.e. encode them with int64 values--so adding a less robust (if slightly more flexible in terms of units) option might just cause confusion.

It's perhaps still worth opening this issue for discussion in case others have thoughts that might allay those concerns.
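
For concreteness, a minimal sketch of the cftime side of this, assuming a cftime release that includes Unidata/cftime#284 and its longdouble keyword:

```python
import cftime

# Encode with longdouble values; any precision beyond float64 is
# platform-dependent (typically 80-bit extended, not full quad precision).
times = [cftime.DatetimeGregorian(2000, 1, 1, microsecond=1)]
encoded = cftime.date2num(times, "days since 2000-01-01", longdouble=True)
print(encoded.dtype)  # np.longdouble on most platforms
```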

cc: @jswhit @dcherian

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7184/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1401909544 I_kwDOAMm_X85Tj3Eo 7145 Time decoding error message does not include the problematic variable's name spencerkclark 6628425 closed 0     5 2022-10-08T10:59:17Z 2022-10-13T23:21:55Z 2022-10-12T15:25:42Z MEMBER      

What is your issue?

If any variable in a Dataset has times that cannot be represented as cftime.datetime objects, an error is raised. However, the error message does not indicate the problematic variable's name. It would be nice if it did, because that would make it easier for users to determine the source of the error.

cc: @durack1 xref: Unidata/cftime#295

Example

This is a minimal example of the issue. The error message gives no indication that "invalid_times" is the problem:

```
import xarray as xr

TIME_ATTRS = {"units": "days since 0001-01-01", "calendar": "noleap"}
valid_times = xr.DataArray([0, 1], dims=["time"], attrs=TIME_ATTRS, name="valid_times")
invalid_times = xr.DataArray([1e36, 2e36], dims=["time"], attrs=TIME_ATTRS, name="invalid_times")
ds = xr.merge([valid_times, invalid_times])
xr.decode_cf(ds)

Traceback (most recent call last):
  File "/Users/spencer/software/xarray/xarray/coding/times.py", line 275, in decode_cf_datetime
    dates = _decode_datetime_with_pandas(flat_num_dates, units, calendar)
  File "/Users/spencer/software/xarray/xarray/coding/times.py", line 210, in _decode_datetime_with_pandas
    raise OutOfBoundsDatetime(
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Cannot decode times from a non-standard calendar, 'noleap', using pandas.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/spencer/software/xarray/xarray/coding/times.py", line 180, in _decode_cf_datetime_dtype
    result = decode_cf_datetime(example_value, units, calendar, use_cftime)
  File "/Users/spencer/software/xarray/xarray/coding/times.py", line 277, in decode_cf_datetime
    dates = _decode_datetime_with_cftime(
  File "/Users/spencer/software/xarray/xarray/coding/times.py", line 202, in _decode_datetime_with_cftime
    cftime.num2date(num_dates, units, calendar, only_use_cftime_datetimes=True)
  File "src/cftime/_cftime.pyx", line 605, in cftime._cftime.num2date
  File "src/cftime/_cftime.pyx", line 404, in cftime._cftime.cast_to_int
OverflowError: time values outside range of 64 bit signed integers

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/spencer/software/xarray/xarray/conventions.py", line 655, in decode_cf
    vars, attrs, coord_names = decode_cf_variables(
  File "/Users/spencer/software/xarray/xarray/conventions.py", line 521, in decode_cf_variables
    new_vars[k] = decode_cf_variable(
  File "/Users/spencer/software/xarray/xarray/conventions.py", line 369, in decode_cf_variable
    var = times.CFDatetimeCoder(use_cftime=use_cftime).decode(var, name=name)
  File "/Users/spencer/software/xarray/xarray/coding/times.py", line 687, in decode
    dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
  File "/Users/spencer/software/xarray/xarray/coding/times.py", line 190, in _decode_cf_datetime_dtype
    raise ValueError(msg)
ValueError: unable to decode time units 'days since 0001-01-01' with "calendar 'noleap'". Try opening your dataset with decode_times=False or installing cftime if it is not installed.
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7145/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1117563249 I_kwDOAMm_X85CnKlx 6204 [Bug]: cannot chunk a DataArray that originated as a coordinate spencerkclark 6628425 open 0     1 2022-01-28T15:56:44Z 2022-03-16T04:18:46Z   MEMBER      

What happened?

If I construct the following DataArray and try to chunk its "x" coordinate, I get back a NumPy-backed DataArray:

```
In [2]: a = xr.DataArray([1, 2, 3], dims=["x"], coords=[[4, 5, 6]])

In [3]: a.x.chunk()
Out[3]:
<xarray.DataArray 'x' (x: 3)>
array([4, 5, 6])
Coordinates:
  * x        (x) int64 4 5 6
```

If I construct a copy of the `"x"` coordinate, things work as I would expect:

```
In [4]: x = xr.DataArray(a.x, dims=a.x.dims, coords=a.x.coords, name="x")

In [5]: x.chunk()
Out[5]:
<xarray.DataArray 'x' (x: 3)>
dask.array<xarray-<this-array>, shape=(3,), dtype=int64, chunksize=(3,), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) int64 4 5 6
```

What did you expect to happen?

I would expect the following to happen:

```
In [2]: a = xr.DataArray([1, 2, 3], dims=["x"], coords=[[4, 5, 6]])

In [3]: a.x.chunk()
Out[3]:
<xarray.DataArray 'x' (x: 3)>
dask.array<xarray-<this-array>, shape=(3,), dtype=int64, chunksize=(3,), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) int64 4 5 6
```

Minimal Complete Verifiable Example

No response

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None python: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 15:59:12) [Clang 11.0.1 ] python-bits: 64 OS: Darwin OS-release: 21.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.5 libnetcdf: 4.6.3

xarray: 0.20.1 pandas: 1.3.5 numpy: 1.19.4 scipy: 1.5.4 netCDF4: 1.5.5 pydap: None h5netcdf: 0.8.1 h5py: 2.10.0 Nio: None zarr: 2.7.0 cftime: 1.2.1 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.22.0 distributed: None matplotlib: 3.2.2 cartopy: 0.19.0.post1 seaborn: None numbagg: None fsspec: 2021.06.0 cupy: None pint: 0.15 sparse: None setuptools: 49.6.0.post20210108 pip: 20.2.4 conda: 4.10.1 pytest: 6.0.1 IPython: 7.27.0 sphinx: 3.2.1

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6204/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
849771808 MDU6SXNzdWU4NDk3NzE4MDg= 5107 Converting `cftime.datetime` objects to `np.datetime64` values through `astype` spencerkclark 6628425 open 0     0 2021-04-04T01:02:55Z 2021-10-05T00:00:36Z   MEMBER      

The discussion of the use of the indexes property in #5102 got me thinking about this StackOverflow answer. For a while I have thought that my answer there isn't very satisfying, not only because it relies on this somewhat obscure indexes property, but also because it only works on dimension coordinates -- i.e. something that would be backed by an index.

Describe the solution you'd like

It would be better if we could do this conversion with astype, e.g. da.astype("datetime64[ns]"). This would allow conversion to datetime64 values for all cftime.datetime DataArrays -- dask-backed or NumPy-backed, 1D or ND -- through a fairly standard and well-known method. To my surprise, while you do not get the nice calendar-switching warning that CFTimeIndex.to_datetimeindex provides, this actually already kind of seems to work (?!):

```
In [1]: import xarray as xr

In [2]: times = xr.cftime_range("2000", periods=6, calendar="noleap")

In [3]: da = xr.DataArray(times.values.reshape((2, 3)), dims=["a", "b"])

In [4]: da.astype("datetime64[ns]")
Out[4]:
<xarray.DataArray (a: 2, b: 3)>
array([['2000-01-01T00:00:00.000000000', '2000-01-02T00:00:00.000000000',
        '2000-01-03T00:00:00.000000000'],
       ['2000-01-04T00:00:00.000000000', '2000-01-05T00:00:00.000000000',
        '2000-01-06T00:00:00.000000000']], dtype='datetime64[ns]')
Dimensions without coordinates: a, b
```

NumPy obviously does not officially support this -- nor would I expect it to -- so I would be wary of simply documenting this behavior as is. Would it be reasonable for us to modify xarray.core.duck_array_ops.astype to explicitly implement this conversion ourselves for cftime.datetime arrays? This way we could ensure this was always supported, and we could include appropriate errors for out-of-bounds times (the NumPy method currently overflows in that case) and warnings for switching from non-standard calendars.
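
To make the proposal concrete, here is a hypothetical helper of the kind duck_array_ops.astype could dispatch to; the name and the conversion strategy are assumptions for illustration, not xarray's actual implementation:

```python
import cftime
import numpy as np

def cftime_to_datetime64ns(values):
    # Convert via ISO strings at microsecond resolution (cftime's native
    # resolution); datetime64[us] covers roughly +/- 2.9e5 years.
    times = np.asarray(values)
    out = np.array(
        [np.datetime64(t.isoformat(), "us") for t in times.ravel()]
    ).reshape(times.shape)
    # A real implementation would raise for dates outside the
    # datetime64[ns] range here (plain .astype silently overflows) and
    # warn when switching from a non-standard calendar.
    return out.astype("datetime64[ns]")

converted = cftime_to_datetime64ns(
    [cftime.DatetimeNoLeap(2000, 1, 1), cftime.DatetimeNoLeap(2000, 1, 2)]
)
```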

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5107/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
802734042 MDU6SXNzdWU4MDI3MzQwNDI= 4870 Time encoding error associated with cftime > 1.4.0 spencerkclark 6628425 closed 0     0 2021-02-06T16:15:20Z 2021-02-07T23:12:30Z 2021-02-07T23:12:30Z MEMBER      

As of cftime > 1.4.0, the return type of cftime.date2num can be either integer or float: an integer dtype is used if the times can all be encoded exactly with the provided units; otherwise a float dtype is used. This causes problems in our current encoding pipeline, because we call cftime.date2num on dates one at a time through np.vectorize, and np.vectorize infers the dtype of the full returned array from the result of the first function evaluation. If the first result is an integer, the full array is assumed to have an integer dtype, and any values that should be floats are cast to integers.

What happened:

```
In [1]: import cftime; import numpy as np; import xarray as xr

In [2]: times = np.array([cftime.DatetimeGregorian(2000, 1, 1), cftime.DatetimeGregorian(2000, 1, 1, 1)])

In [3]: xr.coding.times._encode_datetime_with_cftime(times, "days since 2000-01-01", calendar="gregorian")
Out[3]: array([0, 0])
```

What you expected to happen:

In [3]: xr.coding.times._encode_datetime_with_cftime(times, "days since 2000-01-01", calendar="gregorian")
Out[3]: array([0.        , 0.04166667])

A solution here would be to encode the times with a list comprehension instead, and cast the final result to an array, in which case NumPy infers the dtype in a more sensible way.
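
A sketch of that fix (illustrative; the real function is xr.coding.times._encode_datetime_with_cftime, and the name here is simplified):

```python
import cftime
import numpy as np

def encode_datetime_with_cftime(dates, units, calendar):
    # Encoding element by element in a list comprehension lets NumPy infer
    # a common dtype for the final array, rather than np.vectorize locking
    # in the dtype of the first result.
    encoded = [
        cftime.date2num(date, units, calendar=calendar)
        for date in np.ravel(dates)
    ]
    return np.array(encoded).reshape(np.shape(dates))
```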

Environment:

Output of xr.show_versions() ``` INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 14:38:56) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 20.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.6.2 xarray: 0.16.2.dev175+g8cc34cb4.d20210201 pandas: 1.1.3 numpy: 1.19.1 scipy: 1.2.1 netCDF4: 1.5.1.2 pydap: installed h5netcdf: 0.7.4 h5py: 2.9.0 Nio: None zarr: 2.3.2 cftime: 1.4.1 nc_time_axis: 1.1.1.dev5+g531dd0d PseudoNetCDF: None rasterio: 1.0.25 cfgrib: 0.9.7.1 iris: None bottleneck: 1.2.1 dask: 2.11.0 distributed: 2.11.0 matplotlib: 3.3.2 cartopy: None seaborn: 0.9.0 numbagg: installed pint: None setuptools: 51.0.0.post20201207 pip: 19.2.2 conda: None pytest: 5.0.1 IPython: 7.10.1 sphinx: 3.0.4 ```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4870/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
360420464 MDU6SXNzdWUzNjA0MjA0NjQ= 2416 Indicate calendar type in CFTimeIndex repr spencerkclark 6628425 closed 0     5 2018-09-14T19:07:04Z 2020-11-20T01:00:41Z 2020-07-23T10:42:29Z MEMBER      

Currently CFTimeIndex uses the default repr it inherits from pandas.Index. This just displays a potentially-truncated version of the values in the index, along with the index's data type and length, e.g.:

CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00, 2000-01-03 00:00:00,
             2000-01-04 00:00:00, 2000-01-05 00:00:00, 2000-01-06 00:00:00,
             2000-01-07 00:00:00, 2000-01-08 00:00:00, 2000-01-09 00:00:00,
             2000-01-10 00:00:00,
             ...
             2000-12-22 00:00:00, 2000-12-23 00:00:00, 2000-12-24 00:00:00,
             2000-12-25 00:00:00, 2000-12-26 00:00:00, 2000-12-27 00:00:00,
             2000-12-28 00:00:00, 2000-12-29 00:00:00, 2000-12-30 00:00:00,
             2000-12-31 00:00:00],
            dtype='object', length=366)

It would be nice if the repr also included an indication of the calendar type of the index, since different indexes could have different calendar types. For example:

CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00, 2000-01-03 00:00:00,
             2000-01-04 00:00:00, 2000-01-05 00:00:00, 2000-01-06 00:00:00,
             2000-01-07 00:00:00, 2000-01-08 00:00:00, 2000-01-09 00:00:00,
             2000-01-10 00:00:00,
             ...
             2000-12-22 00:00:00, 2000-12-23 00:00:00, 2000-12-24 00:00:00,
             2000-12-25 00:00:00, 2000-12-26 00:00:00, 2000-12-27 00:00:00,
             2000-12-28 00:00:00, 2000-12-29 00:00:00, 2000-12-30 00:00:00,
             2000-12-31 00:00:00],
            dtype='object', length=366, calendar='proleptic_gregorian')

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2416/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
431970156 MDU6SXNzdWU0MzE5NzAxNTY= 2886 Expose use_cftime option in open_zarr spencerkclark 6628425 closed 0     7 2019-04-11T11:24:48Z 2020-09-02T15:19:32Z 2020-09-02T15:19:32Z MEMBER      

use_cftime was recently added as an option to decode_cf and open_dataset to give users more control over how times are decoded (#2759). It would be good if it were also available in open_zarr. This is perhaps less important there, because open_zarr only works on a single data store, so there is no risk of decoding times to different types across files (as there was for open_mfdataset, #1263); however, it would still be nice to be able to silence serialization warnings that result from decoding times to cftime objects in some instances, e.g. #2754.
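
The proposed usage would simply mirror open_dataset; a sketch (the store path is hypothetical, and whether open_zarr accepts this keyword depends on the xarray version):

```python
import xarray as xr

# Decode all times to cftime objects regardless of calendar or range,
# mirroring the existing open_dataset/decode_cf keyword.
ds = xr.open_zarr("example-store.zarr", use_cftime=True)
```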

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2886/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
539648897 MDU6SXNzdWU1Mzk2NDg4OTc= 3641 interp with long cftime coordinates raises an error spencerkclark 6628425 closed 0     8 2019-12-18T12:23:16Z 2020-01-26T14:10:37Z 2020-01-26T14:10:37Z MEMBER      

MCVE Code Sample

```
In [1]: import xarray as xr

In [2]: times = xr.cftime_range('0001', periods=3, freq='500Y')

In [3]: da = xr.DataArray(range(3), dims=['time'], coords=[times])

In [4]: da.interp(time=['0002-05-01'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-f781cb4d500e> in <module>
----> 1 da.interp(time=['0002-05-01'])

~/Software/miniconda3/envs/xarray-tests/lib/python3.7/site-packages/xarray/core/dataarray.py in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
   1353             kwargs=kwargs,
   1354             assume_sorted=assume_sorted,
-> 1355             **coords_kwargs,
   1356         )
   1357         return self._from_temp_dataset(ds)

~/Software/miniconda3/envs/xarray-tests/lib/python3.7/site-packages/xarray/core/dataset.py in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
   2565                 if k in var.dims
   2566             }
-> 2567             variables[name] = missing.interp(var, var_indexers, method, **kwargs)
   2568         elif all(d not in indexers for d in var.dims):
   2569             # keep unrelated object array

~/Software/miniconda3/envs/xarray-tests/lib/python3.7/site-packages/xarray/core/missing.py in interp(var, indexes_coords, method, **kwargs)
    607     new_dims = broadcast_dims + list(destination[0].dims)
    608     interped = interp_func(
--> 609         var.transpose(*original_dims).data, x, destination, method, kwargs
    610     )
    611

~/Software/miniconda3/envs/xarray-tests/lib/python3.7/site-packages/xarray/core/missing.py in interp_func(var, x, new_x, method, kwargs)
    683     )
    684
--> 685     return _interpnd(var, x, new_x, func, kwargs)
    686
    687

~/Software/miniconda3/envs/xarray-tests/lib/python3.7/site-packages/xarray/core/missing.py in _interpnd(var, x, new_x, func, kwargs)
    698
    699 def _interpnd(var, x, new_x, func, kwargs):
--> 700     x, new_x = _floatize_x(x, new_x)
    701
    702     if len(x) == 1:

~/Software/miniconda3/envs/xarray-tests/lib/python3.7/site-packages/xarray/core/missing.py in _floatize_x(x, new_x)
    556     # represented by float.
    557     xmin = x[i].values.min()
--> 558     x[i] = x[i]._to_numeric(offset=xmin, dtype=np.float64)
    559     new_x[i] = new_x[i]._to_numeric(offset=xmin, dtype=np.float64)
    560     return x, new_x

~/Software/miniconda3/envs/xarray-tests/lib/python3.7/site-packages/xarray/core/variable.py in _to_numeric(self, offset, datetime_unit, dtype)
   2001         """
   2002         numeric_array = duck_array_ops.datetime_to_numeric(
-> 2003             self.data, offset, datetime_unit, dtype
   2004         )
   2005         return type(self)(self.dims, numeric_array, self._attrs)

~/Software/miniconda3/envs/xarray-tests/lib/python3.7/site-packages/xarray/core/duck_array_ops.py in datetime_to_numeric(array, offset, datetime_unit, dtype)
    410     if array.dtype.kind in "mM":
    411         return np.where(isnull(array), np.nan, array.astype(dtype))
--> 412     return array.astype(dtype)
    413
    414

TypeError: float() argument must be a string or a number, not 'datetime.timedelta'
```

Problem Description

In principle we should be able to get this to work. The issue stems from the following logic in datetime_to_numeric: https://github.com/pydata/xarray/blob/45fd0e63f43cf313b022a33aeec7f0f982e1908b/xarray/core/duck_array_ops.py#L402-L404

Here we are relying on pandas to convert an array of datetime.timedelta objects to an array with dtype timedelta64[ns]. If the array of datetime.timedelta objects cannot be safely converted to timedelta64[ns] (e.g. due to an integer overflow) then this line is silently a no-op, which leads to the error downstream at the dtype conversion step. This is my fault originally for suggesting this approach, https://github.com/pydata/xarray/pull/2668#discussion_r247271576.

~~To solve this I think we'll need to write our own logic to convert datetime.timedelta objects to numeric values instead of relying on pandas/NumPy.~~ (as @huard notes we should be able to use NumPy directly here for the conversion). We should not consider ourselves beholden to using nanosecond resolution for a couple of reasons: 1. datetime.timedelta objects do not natively support nanosecond resolution; they have microsecond resolution natively, which corresponds with a NumPy timedelta range of +/- 2.9e5 years. 2. One motivation/use-case for cftime dates is that they can represent long time periods that cannot be represented using a standard DatetimeIndex. We should do everything we can to support this with a CFTimeIndex.
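
A sketch of the NumPy-only conversion (hedged: the helper name is illustrative, and a real version would also need to handle NaT and configurable units):

```python
import datetime

import numpy as np

def py_timedelta_to_float(deltas, dtype=np.float64):
    # Dividing datetime.timedelta objects gives an exact float number of
    # microseconds, avoiding the lossy cast through timedelta64[ns].
    microseconds = [
        delta / datetime.timedelta(microseconds=1) for delta in np.ravel(deltas)
    ]
    return np.array(microseconds, dtype=dtype).reshape(np.shape(deltas))
```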

@huard @dcherian this is an important issue we'll need to solve to be able to use a fixed offset for cftime dates for an application like polyfit/polyval.

xref: #3349 and #3631.

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 14:38:56) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 19.0.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: None xarray: 0.14.1 pandas: 0.25.0 numpy: 1.17.0 scipy: 1.3.1 netCDF4: None pydap: installed h5netcdf: 0.7.4 h5py: 2.9.0 Nio: None zarr: 2.3.2 cftime: 1.0.4.2 nc_time_axis: None PseudoNetCDF: None rasterio: 1.0.25 cfgrib: 0.9.7.1 iris: None bottleneck: 1.2.1 dask: 2.9.0+2.gd0daa5bc distributed: 2.9.0 matplotlib: 3.1.1 cartopy: None seaborn: 0.9.0 numbagg: installed setuptools: 42.0.2.post20191201 pip: 19.2.2 conda: None pytest: 5.0.1 IPython: 7.10.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3641/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
538068264 MDU6SXNzdWU1MzgwNjgyNjQ= 3624 Issue serializing arrays of times with certain dtype and _FillValue encodings spencerkclark 6628425 closed 0     0 2019-12-15T15:44:08Z 2020-01-15T15:22:30Z 2020-01-15T15:22:30Z MEMBER      

MCVE Code Sample

```
In [1]: import numpy as np; import pandas as pd; import xarray as xr

In [2]: times = pd.date_range('2000', periods=3)

In [3]: da = xr.DataArray(times, dims=['a'], coords=[[1, 2, 3]], name='foo')

In [4]: da.encoding['_FillValue'] = 1.0e20

In [5]: da.encoding['dtype'] = np.dtype('float64')

In [6]: da.to_dataset().to_netcdf('test.nc')
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-6-cbc6b2cfdf9a> in <module>
----> 1 da.to_dataset().to_netcdf('test.nc')

~/Software/xarray/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1548             unlimited_dims=unlimited_dims,
   1549             compute=compute,
-> 1550             invalid_netcdf=invalid_netcdf,
   1551         )
   1552

~/Software/xarray/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1071     # to be parallelized with dask
   1072     dump_to_store(
-> 1073         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1074     )
   1075     if autoclose:

~/Software/xarray/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1117     variables, attrs = encoder(variables, attrs)
   1118
-> 1119     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1120
   1121

~/Software/xarray/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    291         writer = ArrayWriter()
    292
--> 293     variables, attributes = self.encode(variables, attributes)
    294
    295     self.set_attributes(attributes)

~/Software/xarray/xarray/backends/common.py in encode(self, variables, attributes)
    380     # All NetCDF files get CF encoded by default, without this attempting
    381     # to write times, for example, would fail.
--> 382     variables, attributes = cf_encoder(variables, attributes)
    383     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    384     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}

~/Software/xarray/xarray/conventions.py in cf_encoder(variables, attributes)
    758     _update_bounds_encoding(variables)
    759
--> 760     new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    761
    762     # Remove attrs from bounds variables (issue #2921)

~/Software/xarray/xarray/conventions.py in <dictcomp>(.0)
    758     _update_bounds_encoding(variables)
    759
--> 760     new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    761
    762     # Remove attrs from bounds variables (issue #2921)

~/Software/xarray/xarray/conventions.py in encode_cf_variable(var, needs_copy, name)
    248         variables.UnsignedIntegerCoder(),
    249     ]:
--> 250         var = coder.encode(var, name=name)
    251
    252     # TODO(shoyer): convert all of these to use coders, too:

~/Software/xarray/xarray/coding/variables.py in encode(self, variable, name)
    163         if fv is not None:
    164             # Ensure _FillValue is cast to same dtype as data's
--> 165             encoding["_FillValue"] = data.dtype.type(fv)
    166         fill_value = pop_to(encoding, attrs, "_FillValue", name=name)
    167         if not pd.isnull(fill_value):

OverflowError: Python int too large to convert to C long
```

Expected Output

I think this should succeed in writing to a netCDF file (it worked in xarray 0.14.0 and earlier).

Problem Description

I think this (admittedly very subtle) issue was introduced in https://github.com/pydata/xarray/pull/3502. Essentially, at the time data enters CFMaskCoder.encode it does not necessarily have the dtype it will ultimately be encoded with. In this example, data has dtype int64 in memory, but it will be stored in the netCDF file as a double-precision float.

A possible solution here might be to rely on encoding['dtype'] (if it exists) to determine the type to cast the encoding values for '_FillValue' and 'missing_value' to, instead of relying solely on data.dtype (maybe use that as a fallback).
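
A sketch of that idea (illustrative only, not the actual patch):

```python
import numpy as np

def fill_value_dtype(data, encoding):
    # Prefer the on-disk dtype from the encoding, falling back to the
    # in-memory dtype, so that _FillValue is cast consistently with how
    # the variable will actually be stored.
    return np.dtype(encoding.get("dtype", data.dtype))

# e.g. in CFMaskCoder.encode:
#     encoding["_FillValue"] = fill_value_dtype(data, encoding).type(fv)
```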

cc: @spencerahill

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 | packaged by conda-forge | (default, Dec 6 2019, 08:36:57) [Clang 9.0.0 (tags/RELEASE_900/final)] python-bits: 64 OS: Darwin OS-release: 19.0.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.1 xarray: master pandas: 0.25.3 numpy: 1.17.3 scipy: 1.3.2 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.4.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.9.0 distributed: 2.9.0 matplotlib: 3.1.2 cartopy: None seaborn: None numbagg: None setuptools: 42.0.2.post20191203 pip: 19.3.1 conda: None pytest: 5.3.2 IPython: 7.10.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3624/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
534404865 MDU6SXNzdWU1MzQ0MDQ4NjU= 3603 Test failure with dask master spencerkclark 6628425 closed 0     2 2019-12-07T13:55:54Z 2019-12-30T17:46:44Z 2019-12-30T17:46:44Z MEMBER      

It looks like https://github.com/dask/dask/pull/5684, which adds nanmedian to dask (nice!), changed the error message raised when one tries to reduce an array over all axes via median (it no longer contains 'dask', because xarray now dispatches to the newly added dask function instead of failing before trying it).

@dcherian do you have thoughts on how to best address this? Should we just remove that check in test_reduce?

```
=================================== FAILURES ===================================
__________________________ TestVariable.test_reduce ___________________________

error = <class 'NotImplementedError'>, pattern = 'dask'

@contextmanager
def raises_regex(error, pattern):
    __tracebackhide__ = True
    with pytest.raises(error) as excinfo:
      yield

xarray/tests/__init__.py:104:


self = <xarray.tests.test_dask.TestVariable object at 0x7fd14f8e9c88>

def test_reduce(self):
    u = self.eager_var
    v = self.lazy_var
    self.assertLazyAndAllClose(u.mean(), v.mean())
    self.assertLazyAndAllClose(u.std(), v.std())
    with raise_if_dask_computes():
        actual = v.argmax(dim="x")
    self.assertLazyAndAllClose(u.argmax(dim="x"), actual)
    with raise_if_dask_computes():
        actual = v.argmin(dim="x")
    self.assertLazyAndAllClose(u.argmin(dim="x"), actual)
    self.assertLazyAndAllClose((u > 1).any(), (v > 1).any())
    self.assertLazyAndAllClose((u < 1).all("x"), (v < 1).all("x"))
    with raises_regex(NotImplementedError, "dask"):
      v.median()

xarray/tests/test_dask.py:220:


self = <xarray.Variable (x: 4, y: 6)> dask.array<array, shape=(4, 6), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray> dim = None, axis = None, skipna = None, kwargs = {}

def wrapped_func(self, dim=None, axis=None, skipna=None, **kwargs):
  return self.reduce(func, dim, axis, skipna=skipna, **kwargs)

xarray/core/common.py:46:


self = <xarray.Variable (x: 4, y: 6)> dask.array<array, shape=(4, 6), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray> func = <function _create_nan_agg_method.<locals>.f at 0x7fd16c228378> dim = None, axis = None, keep_attrs = None, keepdims = False, allow_lazy = True kwargs = {'skipna': None} input_data = dask.array<array, shape=(4, 6), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray>

def reduce(
    self,
    func,
    dim=None,
    axis=None,
    keep_attrs=None,
    keepdims=False,
    allow_lazy=None,
    **kwargs,
):
    """Reduce this array by applying `func` along some dimension(s).

    Parameters
    ----------
    func : function
        Function which can be called in the form
        `func(x, axis=axis, **kwargs)` to return the result of reducing an
        np.ndarray over an integer valued axis.
    dim : str or sequence of str, optional
        Dimension(s) over which to apply `func`.
    axis : int or sequence of int, optional
        Axis(es) over which to apply `func`. Only one of the 'dim'
        and 'axis' arguments can be supplied. If neither are supplied, then
        the reduction is calculated over the flattened array (by calling
        `func(x)` without an axis argument).
    keep_attrs : bool, optional
        If True, the variable's attributes (`attrs`) will be copied from
        the original object to the new one.  If False (default), the new
        object will be returned without attributes.
    keepdims : bool, default False
        If True, the dimensions which are reduced are left in the result
        as dimensions of size one
    **kwargs : dict
        Additional keyword arguments passed on to `func`.

    Returns
    -------
    reduced : Array
        Array with summarized data and the indicated dimension(s)
        removed.
    """
    if dim == ...:
        dim = None
    if dim is not None and axis is not None:
        raise ValueError("cannot supply both 'axis' and 'dim' arguments")

    if dim is not None:
        axis = self.get_axis_num(dim)

    if allow_lazy is not None:
        warnings.warn(
            "allow_lazy is deprecated and will be removed in version 0.16.0. It is now True by default.",
            DeprecationWarning,
        )
    else:
        allow_lazy = True

    input_data = self.data if allow_lazy else self.values

    if axis is not None:
        data = func(input_data, axis=axis, **kwargs)
    else:
      data = func(input_data, **kwargs)

xarray/core/variable.py:1534:


values = dask.array<array, shape=(4, 6), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray> axis = None, skipna = None, kwargs = {} func = <function nanmedian at 0x7fd16c226bf8>, nanname = 'nanmedian'

def f(values, axis=None, skipna=None, **kwargs):
    if kwargs.pop("out", None) is not None:
        raise TypeError(f"`out` is not valid for {name}")

    values = asarray(values)

    if coerce_strings and values.dtype.kind in "SU":
        values = values.astype(object)

    func = None
    if skipna or (skipna is None and values.dtype.kind in "cfO"):
        nanname = "nan" + name
        func = getattr(nanops, nanname)
    else:
        func = _dask_or_eager_func(name)

    try:
      return func(values, axis=axis, **kwargs)

xarray/core/duck_array_ops.py:307:


a = dask.array<array, shape=(4, 6), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray> axis = None, out = None

def nanmedian(a, axis=None, out=None):
  return _dask_or_eager_func("nanmedian", eager_module=nputils)(a, axis=axis)

xarray/core/nanops.py:144:


args = (dask.array<array, shape=(4, 6), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray>,) kwargs = {'axis': None} dispatch_args = (dask.array<array, shape=(4, 6), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray>,) wrapped = <function nanmedian at 0x7fd1737bcea0>

def f(*args, **kwargs):
    if list_of_args:
        dispatch_args = args[0]
    else:
        dispatch_args = args[array_args]
    if any(isinstance(a, dask_array.Array) for a in dispatch_args):
        try:
            wrapped = getattr(dask_module, name)
        except AttributeError as e:
            raise AttributeError(f"{e}: requires dask >={requires_dask}")
    else:
        wrapped = getattr(eager_module, name)
  return wrapped(*args, **kwargs)

xarray/core/duck_array_ops.py:47:


a = dask.array<array, shape=(4, 6), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray> axis = None, keepdims = False, out = None

@derived_from(np)
def nanmedian(a, axis=None, keepdims=False, out=None):
    """
    This works by automatically chunking the reduced axes to a single chunk
    and then calling ``numpy.nanmedian`` function across the remaining dimensions
    """
    if axis is None:
        raise NotImplementedError(
          "The da.nanmedian function only works along an axis or a subset of axes.  "
            "The full algorithm is difficult to do in parallel"
        )

E NotImplementedError: The da.nanmedian function only works along an axis or a subset of axes. The full algorithm is difficult to do in parallel

/usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/dask/array/reductions.py:1299: NotImplementedError

During handling of the above exception, another exception occurred:

self = <xarray.tests.test_dask.TestVariable object at 0x7fd14f8e9c88>

def test_reduce(self):
    u = self.eager_var
    v = self.lazy_var
    self.assertLazyAndAllClose(u.mean(), v.mean())
    self.assertLazyAndAllClose(u.std(), v.std())
    with raise_if_dask_computes():
        actual = v.argmax(dim="x")
    self.assertLazyAndAllClose(u.argmax(dim="x"), actual)
    with raise_if_dask_computes():
        actual = v.argmin(dim="x")
    self.assertLazyAndAllClose(u.argmin(dim="x"), actual)
    self.assertLazyAndAllClose((u > 1).any(), (v > 1).any())
    self.assertLazyAndAllClose((u < 1).all("x"), (v < 1).all("x"))
    with raises_regex(NotImplementedError, "dask"):
      v.median()

xarray/tests/test_dask.py:220:


self = <contextlib._GeneratorContextManager object at 0x7fd14f8bcc50> type = <class 'NotImplementedError'> value = NotImplementedError('The da.nanmedian function only works along an axis or a subset of axes. The full algorithm is difficult to do in parallel') traceback = <traceback object at 0x7fd154597bc8>

def __exit__(self, type, value, traceback):
    if type is None:
        try:
            next(self.gen)
        except StopIteration:
            return False
        else:
            raise RuntimeError("generator didn't stop")
    else:
        if value is None:
            # Need to force instantiation so we can reliably
            # tell if we get the same exception back
            value = type()
        try:
          self.gen.throw(type, value, traceback)

E AssertionError: exception NotImplementedError('The da.nanmedian function only works along an axis or a subset of axes. The full algorithm is difficult to do in parallel') did not match pattern 'dask'
```
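
One minimal way to update the test would be to match on the new dask error message instead of the old 'dask' pattern; a sketch (the message text is taken from the log above):

```python
import pytest

# v is the lazy dask-backed Variable from the test fixture.
with pytest.raises(NotImplementedError, match="only works along an axis"):
    v.median()
```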

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3603/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
427398236 MDU6SXNzdWU0MjczOTgyMzY= 2856 Roundtripping between a dimension coordinate and scalar coordinate on a Dataset spencerkclark 6628425 closed 0     4 2019-03-31T13:42:39Z 2019-04-04T21:58:24Z 2019-04-04T21:58:24Z MEMBER      

Code Sample, a copy-pastable example if possible

In xarray 0.12.0 the following example produces a Dataset with no indexes:

```
In [1]: import xarray as xr

In [2]: da = xr.DataArray([1], [('x', [0])], name='a')

In [3]: da.to_dataset().isel(x=0).expand_dims('x').indexes
Out[3]:
```

Expected Output

In xarray 0.11.3 the roundtrip sequence above properly recovers the initial index along the 'x' dimension:

```
In [1]: import xarray as xr

In [2]: da = xr.DataArray([1], [('x', [0])], name='a')

In [3]: da.to_dataset().isel(x=0).expand_dims('x').indexes
Out[3]: x: Int64Index([0], dtype='int64', name='x')
```

Output of xr.show_versions()

``` INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 02:16:08) [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 18.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.1 libnetcdf: 4.6.1 xarray: 0.12.0 pandas: 0.24.2 numpy: 1.13.1 scipy: 0.19.1 netCDF4: 1.4.0 pydap: None h5netcdf: 0.5.1 h5py: 2.8.0 Nio: None zarr: None cftime: 1.0.0 nc_time_axis: None PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.2.0 dask: 0.17.5 distributed: 1.21.8 matplotlib: 2.0.2 cartopy: None seaborn: None setuptools: 40.5.0 pip: 9.0.1 conda: None pytest: 3.10.0 IPython: 6.4.0 sphinx: 1.7.4 ```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2856/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
408772665 MDU6SXNzdWU0MDg3NzI2NjU= 2761 'standard' calendar refers to 'proleptic_gregorian' in cftime_range rather than 'gregorian' spencerkclark 6628425 closed 0     2 2019-02-11T13:06:05Z 2019-02-15T21:58:16Z 2019-02-15T21:58:16Z MEMBER      

Code Sample, a copy-pastable example if possible

```python
In [1]: import xarray

In [2]: xarray.cftime_range('2000', periods=3, calendar='standard').values
Out[2]:
array([cftime.DatetimeProlepticGregorian(2000, 1, 1, 0, 0, 0, 0, -1, 1),
       cftime.DatetimeProlepticGregorian(2000, 1, 2, 0, 0, 0, 0, -1, 1),
       cftime.DatetimeProlepticGregorian(2000, 1, 3, 0, 0, 0, 0, -1, 1)], dtype=object)
```

Problem description

When writing cftime_range I used dates from a proleptic Gregorian calendar when the calendar type was specified as 'standard'. While this is consistent with Python's built-in datetime.datetime (which uses a proleptic Gregorian calendar), this differs from the behavior in cftime.num2date and ultimately the CF conventions, which state that 'standard' should refer to the true Gregorian calendar. My inclination is that considering "cf" is in the name of cftime_range, we should adhere to those conventions as closely as possible (and hence the way I initially coded things was a mistake).
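
For comparison, a hedged illustration of cftime's own behavior ('standard' maps to the mixed Julian/Gregorian calendar; the exact return type depends on the cftime version and num2date options):

```python
import cftime

date = cftime.num2date(
    0,
    "days since 2000-01-01",
    calendar="standard",
    only_use_cftime_datetimes=True,
)
print(type(date))  # a mixed Julian/Gregorian ("standard") calendar date
```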

Expected Output

In [2]: xarray.cftime_range('2000', periods=3, calendar='standard').values
Out[2]:
array([cftime.DatetimeGregorian(2000, 1, 1, 0, 0, 0, 0, -1, 1),
       cftime.DatetimeGregorian(2000, 1, 2, 0, 0, 0, 0, -1, 1),
       cftime.DatetimeGregorian(2000, 1, 3, 0, 0, 0, 0, -1, 1)], dtype=object)

Do others agree that we should fix this? If we were to make this change, would it be appropriate to consider it a bug and simply make the breaking change immediately, or might we need a deprecation cycle?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2761/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
400504690 MDU6SXNzdWU0MDA1MDQ2OTA= 2688 dropna() for a Series indexed by a CFTimeIndex spencerkclark 6628425 closed 0     3 2019-01-17T23:15:29Z 2019-02-02T06:56:12Z 2019-02-02T06:56:12Z MEMBER      

Code Sample, a copy-pastable example if possible

Currently something like the following raises an error:

```
In [1]: import xarray as xr

In [2]: import pandas as pd

In [3]: import numpy as np

In [4]: times = xr.cftime_range('2000', periods=3)

In [5]: series = pd.Series(np.array([0., np.nan, 1.]), index=times)

In [6]: series
Out[6]:
2000-01-01 00:00:00    0.0
2000-01-02 00:00:00    NaN
2000-01-03 00:00:00    1.0
dtype: float64

In [7]: series.dropna()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-45eb0c023203> in <module>
----> 1 series.dropna()

~/pandas/pandas/core/series.py in dropna(self, axis, inplace, **kwargs)
   4169
   4170         if self._can_hold_na:
-> 4171             result = remove_na_arraylike(self)
   4172             if inplace:
   4173                 self._update_inplace(result)

~/pandas/pandas/core/dtypes/missing.py in remove_na_arraylike(arr)
    539         return arr[notna(arr)]
    540     else:
--> 541         return arr[notna(lib.values_from_object(arr))]

~/pandas/pandas/core/series.py in __getitem__(self, key)
    801         key = com.apply_if_callable(key, self)
    802         try:
--> 803             result = self.index.get_value(self, key)
    804
    805         if not is_scalar(result):

~/xarray-dev/xarray/xarray/coding/cftimeindex.py in get_value(self, series, key)
    321         """Adapted from pandas.tseries.index.DatetimeIndex.get_value"""
    322         if not isinstance(key, slice):
--> 323             return series.iloc[self.get_loc(key)]
    324         else:
    325             return series.iloc[self.slice_indexer(

~/xarray-dev/xarray/xarray/coding/cftimeindex.py in get_loc(self, key, method, tolerance)
    300         else:
    301             return pd.Index.get_loc(self, key, method=method,
--> 302                                     tolerance=tolerance)
    303
    304     def _maybe_cast_slice_bound(self, label, side, kind):

~/pandas/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2595                              'backfill or nearest lookups')
   2596         try:
-> 2597             return self._engine.get_loc(key)
   2598         except KeyError:
   2599             return self._engine.get_loc(self._maybe_cast_indexer(key))

~/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

~/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

TypeError: '[ True False True]' is an invalid key
```

Problem description

We currently rely on this in the resampling logic within xarray for a Series indexed by a DatetimeIndex: https://github.com/pydata/xarray/blob/dc87dea52351835af472d131f70a7f7603b3100e/xarray/core/groupby.py#L268

It would be nice if we could do the same with a Series indexed by a CFTimeIndex, e.g. in #2593.

Expected Output

In [7]: series.dropna()
Out[7]:
2000-01-01 00:00:00    0.0
2000-01-03 00:00:00    1.0
dtype: float64

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.1 | packaged by conda-forge | (default, Nov 13 2018, 09:50:42) [Clang 9.0.0 (clang-900.0.37)] python-bits: 64 OS: Darwin OS-release: 18.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.10.9+117.g80914e0.dirty pandas: 0.24.0.dev0+1332.g5d134ec numpy: 1.15.4 scipy: 1.1.0 netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.3.4 PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None cyordereddict: None dask: 1.0.0 distributed: 1.25.2 matplotlib: 3.0.2 cartopy: None seaborn: 0.9.0 setuptools: 40.6.3 pip: 18.1 conda: None pytest: 3.10.1 IPython: 7.2.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2688/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
398918281 MDU6SXNzdWUzOTg5MTgyODE= 2671 Enable subtracting a scalar cftime.datetime object from a CFTimeIndex spencerkclark 6628425 closed 0     0 2019-01-14T14:42:12Z 2019-01-30T16:45:10Z 2019-01-30T16:45:10Z MEMBER      

Code Sample, a copy-pastable example if possible

```
In [1]: import xarray

In [2]: times = xarray.cftime_range('2000', periods=3)

In [3]: times - times[0]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-97cbca76a8af> in <module>
----> 1 times - times[0]

~/xarray-dev/xarray/xarray/coding/cftimeindex.py in __sub__(self, other)
    417             return CFTimeIndex(np.array(self) - other.to_pytimedelta())
    418         else:
--> 419             return CFTimeIndex(np.array(self) - other)
    420
    421     def _add_delta(self, deltas):

~/xarray-dev/xarray/xarray/coding/cftimeindex.py in __new__(cls, data, name)
    238         result = object.__new__(cls)
    239         result._data = np.array(data, dtype='O')
--> 240         assert_all_valid_date_type(result._data)
    241         result.name = name
    242         return result

~/xarray-dev/xarray/xarray/coding/cftimeindex.py in assert_all_valid_date_type(data)
    194         raise TypeError(
    195             'CFTimeIndex requires cftime.datetime '
--> 196             'objects. Got object of {}.'.format(date_type))
    197     if not all(isinstance(value, date_type) for value in data):
    198         raise TypeError(

TypeError: CFTimeIndex requires cftime.datetime objects. Got object of <class 'datetime.timedelta'>.
```

Problem description

This should result in a pandas.TimedeltaIndex, as is the case for a pandas.DatetimeIndex:

```
In [4]: import pandas

In [5]: times = pandas.date_range('2000', periods=3)

In [6]: times - times[0]
Out[6]: TimedeltaIndex(['0 days', '1 days', '2 days'], dtype='timedelta64[ns]', freq=None)
```
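
An illustrative sketch (not the actual patch) of the branch CFTimeIndex.__sub__ would need, handing off to pandas when the element-wise result is an array of timedeltas:

```python
import datetime

import numpy as np
import pandas as pd
from xarray.coding.cftimeindex import CFTimeIndex

def cftimeindex_sub(index, other):
    result = np.array(index) - other
    if len(result) and isinstance(result[0], datetime.timedelta):
        return pd.TimedeltaIndex(result)
    return CFTimeIndex(result)
```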

Expected Output

```
In [1]: import xarray

In [2]: times = xarray.cftime_range('2000', periods=3)

In [3]: times - times[0]
Out[3]: TimedeltaIndex(['0 days', '1 days', '2 days'], dtype='timedelta64[ns]', freq=None)
```

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.6 | packaged by conda-forge | (default, Jul 26 2018, 09:55:02) [GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] python-bits: 64 OS: Darwin OS-release: 18.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.10.9+127.ga7129d1 pandas: 0.24.0.dev0+1332.g5d134ec numpy: 1.15.4 scipy: 1.1.0 netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.3.4 PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None cyordereddict: None dask: 1.0.0 distributed: 1.25.1 matplotlib: 3.0.2 cartopy: None seaborn: 0.9.0 setuptools: 40.6.3 pip: 18.1 conda: None pytest: 3.10.1 IPython: 7.2.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2671/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
369751771 MDU6SXNzdWUzNjk3NTE3NzE= 2484 Enable add/sub operations involving a CFTimeIndex and a TimedeltaIndex spencerkclark 6628425 closed 0     1 2018-10-13T01:00:28Z 2018-10-17T04:00:57Z 2018-10-17T04:00:57Z MEMBER      

```
In [1]: import xarray as xr

In [2]: start_dates = xr.cftime_range('1999-12', periods=12, freq='M')

In [3]: end_dates = start_dates.shift(1, 'M')

In [4]: end_dates - start_dates

TypeError Traceback (most recent call last) <ipython-input-4-43c24409020b> in <module>() ----> 1 end_dates - start_dates

/Users/spencerclark/xarray-dev/xarray/xarray/coding/cftimeindex.pyc in __sub__(self, other)
    365
    366     def __sub__(self, other):
--> 367         return CFTimeIndex(np.array(self) - other)
    368
    369

TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'CFTimeIndex'

```

Problem description

Subtracting one DatetimeIndex from another produces a TimedeltaIndex:

```
In [5]: import pandas as pd

In [6]: start_dates = pd.date_range('1999-12', periods=12, freq='M')

In [7]: end_dates = start_dates.shift(1, 'M')

In [8]: end_dates - start_dates
Out[8]:
TimedeltaIndex(['31 days', '29 days', '31 days', '30 days', '31 days',
                '30 days', '31 days', '31 days', '30 days', '31 days',
                '30 days', '31 days'],
               dtype='timedelta64[ns]', freq=None)
```

This should also be straightforward to enable for CFTimeIndex objects and would be useful, for example, in the problem described in https://github.com/pydata/xarray/issues/2481#issue-369639339.

Expected Output

```
In [1]: import xarray as xr

In [2]: start_dates = xr.cftime_range('1999-12', periods=12, freq='M')

In [3]: end_dates = start_dates.shift(1, 'M')

In [4]: end_dates - start_dates
Out[4]:
TimedeltaIndex(['31 days', '29 days', '31 days', '30 days', '31 days',
                '30 days', '31 days', '31 days', '30 days', '31 days',
                '30 days', '31 days'],
               dtype='timedelta64[ns]', freq=None)
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2484/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
324758225 MDU6SXNzdWUzMjQ3NTgyMjU= 2165 CFTimeIndex improperly handles string slice for length-1 indexes spencerkclark 6628425 closed 0     0 2018-05-21T00:51:55Z 2018-05-21T08:02:35Z 2018-05-21T08:02:35Z MEMBER      

Code Sample, a copy-pastable example if possible

```
In [1]: import xarray as xr

In [2]: import cftime

In [3]: index = xr.CFTimeIndex([cftime.DatetimeNoLeap(1, 1, 1)])

In [4]: da = xr.DataArray([1], coords=[index], dims=['time'])

In [5]: da.sel(time=slice('0001', '0001'))
Out[5]:
<xarray.DataArray (time: 0)>
array([], dtype=int64)
Coordinates:
  * time     (time) object
```

Problem description

When a CFTimeIndex is created with a single element, slicing using strings does not work; the example above should behave analogously to how it does when using a DatetimeIndex:

```
In [9]: import pandas as pd

In [10]: index = pd.DatetimeIndex(['2000-01-01'])

In [11]: da = xr.DataArray([1], coords=[index], dims=['time'])

In [12]: da.sel(time=slice('2000', '2000'))
Out[12]:
<xarray.DataArray (time: 1)>
array([1])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01
```

I have a fix for this, which I will push shortly.

Expected Output

In [5]: da.sel(time=slice('0001', '0001'))
Out[5]:
<xarray.DataArray (time: 1)>
array([1])
Coordinates:
  * time     (time) object 0001-01-01 00:00:00

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.1.final.0 python-bits: 64 OS: Darwin OS-release: 17.4.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.10.4 pandas: 0.20.2 numpy: 1.13.1 scipy: 0.19.1 netCDF4: 1.4.0 h5netcdf: 0.5.1 h5py: 2.8.0 Nio: None zarr: None bottleneck: 1.2.0 cyordereddict: None dask: 0.15.0 distributed: 1.17.1 matplotlib: 2.0.2 cartopy: None seaborn: None setuptools: 33.1.1.post20170320 pip: 9.0.1 conda: None pytest: 3.1.2 IPython: 6.1.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2165/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
322591813 MDU6SXNzdWUzMjI1OTE4MTM= 2127 cftime.datetime serialization example failing in latest doc build spencerkclark 6628425 closed 0     9 2018-05-13T12:58:15Z 2018-05-14T19:17:37Z 2018-05-14T19:17:37Z MEMBER      

Code Sample, a copy-pastable example if possible

```
In [1]: from itertools import product

In [2]: import numpy as np

In [3]: import xarray as xr

In [4]: from cftime import DatetimeNoLeap

In [5]: dates = [DatetimeNoLeap(year, month, 1)
   ...:          for year, month in product(range(1, 3), range(1, 13))]

In [6]: with xr.set_options(enable_cftimeindex=True):
   ...:     da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'], name='foo')
   ...:

In [7]: da.to_netcdf('test.nc')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-306dbf0ba669> in <module>()
----> 1 da.to_netcdf('test.nc')

/Users/spencerclark/xarray-dev/xarray/xarray/core/dataarray.pyc in to_netcdf(self, *args, **kwargs)
   1514         dataset = self.to_dataset()
   1515
-> 1516         return dataset.to_netcdf(*args, **kwargs)
   1517
   1518     def to_dict(self):

/Users/spencerclark/xarray-dev/xarray/xarray/core/dataset.pyc in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims)
   1143         return to_netcdf(self, path, mode, format=format, group=group,
   1144                          engine=engine, encoding=encoding,
-> 1145                          unlimited_dims=unlimited_dims)
   1146
   1147     def to_zarr(self, store=None, mode='w-', synchronizer=None, group=None,

/Users/spencerclark/xarray-dev/xarray/xarray/backends/api.pyc in to_netcdf(dataset, path_or_file, mode, format, group, engine, writer, encoding, unlimited_dims)
    681         try:
    682             dataset.dump_to_store(store, sync=sync, encoding=encoding,
--> 683                                   unlimited_dims=unlimited_dims)
    684         if path_or_file is None:
    685             return target.getvalue()

/Users/spencerclark/xarray-dev/xarray/xarray/core/dataset.pyc in dump_to_store(self, store, encoder, sync, encoding, unlimited_dims)
   1073
   1074         store.store(variables, attrs, check_encoding,
-> 1075                     unlimited_dims=unlimited_dims)
   1076         if sync:
   1077             store.sync()

/Users/spencerclark/xarray-dev/xarray/xarray/backends/common.pyc in store(self, variables, attributes, check_encoding_set, unlimited_dims)
    356         """
    357
--> 358         variables, attributes = self.encode(variables, attributes)
    359
    360         self.set_attributes(attributes)

/Users/spencerclark/xarray-dev/xarray/xarray/backends/common.pyc in encode(self, variables, attributes)
    441         # All NetCDF files get CF encoded by default, without this attempting
    442         # to write times, for example, would fail.
--> 443         variables, attributes = cf_encoder(variables, attributes)
    444         variables = OrderedDict([(k, self.encode_variable(v))
    445                                  for k, v in variables.items()])

/Users/spencerclark/xarray-dev/xarray/xarray/conventions.pyc in cf_encoder(variables, attributes)
    575     """
    576     new_vars = OrderedDict((k, encode_cf_variable(v, name=k))
--> 577                            for k, v in iteritems(variables))
    578     return new_vars, attributes

python2/cyordereddict/_cyordereddict.pyx in cyordereddict._cyordereddict.OrderedDict.__init__ (python2/cyordereddict/_cyordereddict.c:1225)()

//anaconda/envs/xarray-dev/lib/python2.7/_abcoll.pyc in update(*args, **kwds)
    569                 self[key] = other[key]
    570             else:
--> 571                 for key, value in other:
    572                     self[key] = value
    573         for key, value in kwds.items():

/Users/spencerclark/xarray-dev/xarray/xarray/conventions.pyc in <genexpr>((k, v))
    575     """
    576     new_vars = OrderedDict((k, encode_cf_variable(v, name=k))
--> 577                            for k, v in iteritems(variables))
    578     return new_vars, attributes

/Users/spencerclark/xarray-dev/xarray/xarray/conventions.pyc in encode_cf_variable(var, needs_copy, name)
    232                   variables.CFMaskCoder(),
    233                   variables.UnsignedIntegerCoder()]:
--> 234         var = coder.encode(var, name=name)
    235
    236     # TODO(shoyer): convert all of these to use coders, too:

/Users/spencerclark/xarray-dev/xarray/xarray/coding/times.pyc in encode(self, variable, name)
    384             data,
    385             encoding.pop('units', None),
--> 386             encoding.pop('calendar', None))
    387         safe_setitem(attrs, 'units', units, name=name)
    388         safe_setitem(attrs, 'calendar', calendar, name=name)

/Users/spencerclark/xarray-dev/xarray/xarray/coding/times.pyc in encode_cf_datetime(dates, units, calendar)
    338
    339     if units is None:
--> 340         units = infer_datetime_units(dates)
    341     else:
    342         units = _cleanup_netcdf_time_units(units)

/Users/spencerclark/xarray-dev/xarray/xarray/coding/times.pyc in infer_datetime_units(dates)
    254     reference_date = dates[0] if len(dates) > 0 else '1970-01-01'
    255     reference_date = format_cftime_datetime(reference_date)
--> 256     unique_timedeltas = np.unique(np.diff(dates)).astype('timedelta64[ns]')
    257     units = _infer_time_units_from_diff(unique_timedeltas)
    258     return '%s since %s' % (units, reference_date)

TypeError: Cannot cast datetime.timedelta object from metadata [Y] to [ns] according to the rule 'same_kind'
```

Problem description

This seems to be an edge case that was not covered in the tests I added in #1252. Strangely, if I convert the result of `np.unique(np.diff(dates))` to an array before casting to `'timedelta64[ns]'`, things work:

```python
In [9]: np.unique(np.diff(dates)).astype('timedelta64[ns]')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-5d53452b676f> in <module>()
----> 1 np.unique(np.diff(dates)).astype('timedelta64[ns]')

TypeError: Cannot cast datetime.timedelta object from metadata [Y] to [ns] according to the rule 'same_kind'

In [10]: np.array(np.unique(np.diff(dates))).astype('timedelta64[ns]')
Out[10]:
array([2419200000000000, 2592000000000000, 2678400000000000],
      dtype='timedelta64[ns]')
```

Might anyone have any ideas as to what the underlying issue is? The fix could be as simple as wrapping the result in `np.array`, but I don't understand why that makes a difference.
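To make the comparison concrete, here is a self-contained sketch of that one-line workaround (the helper name `unique_timedeltas_ns` is hypothetical, and this is only the change suggested above, not necessarily the eventual upstream fix):

```python
from itertools import product

import numpy as np
from cftime import DatetimeNoLeap

def unique_timedeltas_ns(dates):
    # Hypothetical helper mirroring the failing line in infer_datetime_units:
    # round-tripping the np.unique result through np.array() yields a plain
    # object array, which then casts to 'timedelta64[ns]' without tripping
    # the 'same_kind' casting rule.
    return np.array(np.unique(np.diff(dates))).astype('timedelta64[ns]')

dates = [DatetimeNoLeap(year, month, 1)
         for year, month in product(range(1, 3), range(1, 13))]
print(unique_timedeltas_ns(dates))
```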

Expected Output

`da.to_netcdf('test.nc')` should succeed without an error.

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

xarray: 0.8.2+dev641.g7302d7e
pandas: 0.22.0
numpy: 1.13.1
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: None
h5py: 2.7.1
Nio: None
zarr: 2.2.0
bottleneck: None
cyordereddict: 1.0.0
dask: 0.17.1
distributed: 1.21.3
matplotlib: 2.2.2
cartopy: None
seaborn: 0.8.1
setuptools: 38.4.0
pip: 9.0.1
conda: None
pytest: 3.3.2
IPython: 5.5.0
sphinx: 1.7.1
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2127/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
238284894 MDU6SXNzdWUyMzgyODQ4OTQ= 1464 Writing directly to a netCDF file while using distributed spencerkclark 6628425 closed 0     7 2017-06-24T01:28:00Z 2018-03-10T15:43:18Z 2018-03-10T15:43:18Z MEMBER      

I've been experimenting with distributed recently and have run into an issue when saving a result directly to a file using the netcdf4 engine. I've found that if I compute things before saving to a file (thus loading the result into memory before calling to_netcdf), things work OK. A minimal working example is attached below.

Can others reproduce this? Part of me thinks there must be something wrong with my setup, because I'm somewhat surprised something like this wouldn't have come up already (apologies in advance if that's the case).

```python
In [1]: import dask

In [2]: import distributed

In [3]: import netCDF4

In [4]: import xarray as xr

In [5]: dask.__version__
Out[5]: '0.15.0'

In [6]: distributed.__version__
Out[6]: '1.17.1'

In [7]: netCDF4.__version__
Out[7]: '1.2.9'

In [8]: xr.__version__
Out[8]: '0.9.6'

In [9]: da = xr.DataArray([1., 2., 3.])

In [10]: da.to_netcdf('no-dask.nc')

In [11]: da.chunk().to_netcdf('dask.nc')  # Not using distributed yet

In [12]: c = distributed.Client()  # Launch a LocalCluster (now using distributed)

In [13]: c
Out[13]: <Client: scheduler='tcp://127.0.0.1:44576' processes=16 cores=16>

In [14]: da.chunk().to_netcdf('dask-distributed-netcdf4.nc', engine='netcdf4')
---------------------------------------------------------------------------
EOFError                                  Traceback (most recent call last)
<ipython-input-14-98490239a35f> in <module>()
----> 1 da.chunk().to_netcdf('dask-distributed-netcdf4.nc', engine='netcdf4')

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/xarray/core/dataarray.py in to_netcdf(self, *args, **kwargs)
   1349         dataset = self.to_dataset()
   1350
-> 1351         dataset.to_netcdf(*args, **kwargs)
   1352
   1353     def to_dict(self):

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims)
    975         return to_netcdf(self, path, mode, format=format, group=group,
    976                          engine=engine, encoding=encoding,
--> 977                          unlimited_dims=unlimited_dims)
    978
    979     def __unicode__(self):

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, writer, encoding, unlimited_dims)
    571     try:
    572         dataset.dump_to_store(store, sync=sync, encoding=encoding,
--> 573                               unlimited_dims=unlimited_dims)
    574         if path_or_file is None:
    575             return target.getvalue()

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/xarray/core/dataset.py in dump_to_store(self, store, encoder, sync, encoding, unlimited_dims)
    916                             unlimited_dims=unlimited_dims)
    917         if sync:
--> 918             store.sync()
    919
    920     def to_netcdf(self, path=None, mode='w', format=None, group=None,

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in sync(self)
    334     def sync(self):
    335         with self.ensure_open(autoclose=True):
--> 336             super(NetCDF4DataStore, self).sync()
    337             self.ds.sync()
    338

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/xarray/backends/common.py in sync(self)
    200
    201     def sync(self):
--> 202         self.writer.sync()
    203
    204     def store_dataset(self, dataset):

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/xarray/backends/common.py in sync(self)
    177             import dask
    178             if LooseVersion(dask.__version__) > LooseVersion('0.8.1'):
--> 179                 da.store(self.sources, self.targets, lock=GLOBAL_LOCK)
    180             else:
    181                 da.store(self.sources, self.targets)

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/dask/array/core.py in store(sources, targets, lock, regions, compute, **kwargs)
    922     dsk = sharedict.merge((name, updates), *[src.dask for src in sources])
    923     if compute:
--> 924         Array._get(dsk, keys, **kwargs)
    925     else:
    926         from ..delayed import Delayed

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/dask/base.py in _get(cls, dsk, keys, get, **kwargs)
    102         get = get or _globals['get'] or cls._default_get
    103         dsk2 = optimization_function(cls)(ensure_dict(dsk), keys, **kwargs)
--> 104         return get(dsk2, keys, **kwargs)
    105
    106     @classmethod

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, **kwargs)
   1762         if sync:
   1763             try:
-> 1764                 results = self.gather(packed)
   1765             finally:
   1766                 for f in futures.values():

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/distributed/client.py in gather(self, futures, errors, maxsize, direct)
   1261         else:
   1262             return self.sync(self._gather, futures, errors=errors,
-> 1263                              direct=direct)
   1264
   1265     @gen.coroutine

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/distributed/client.py in sync(self, func, *args, **kwargs)
    487             return future
    488         else:
--> 489             return sync(self.loop, func, *args, **kwargs)
    490
    491     def __str__(self):

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, *args, **kwargs)
    232         e.wait(1000000)
    233     if error[0]:
--> 234         six.reraise(*error[0])
    235     else:
    236         return result[0]

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    684         if value.__traceback__ is not tb:
    685             raise value.with_traceback(tb)
--> 686         raise value
    687
    688     else:

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/distributed/utils.py in f()
    221                 raise RuntimeError("sync() called from thread of running loop")
    222             yield gen.moment
--> 223             result[0] = yield make_coro()
    224         except Exception as exc:
    225             logger.exception(exc)

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/tornado/gen.py in run(self)
   1013
   1014                     try:
-> 1015                         value = future.result()
   1016                     except Exception:
   1017                         self.had_exception = True

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/tornado/concurrent.py in result(self, timeout)
    235             return self._result
    236         if self._exc_info is not None:
--> 237             raise_exc_info(self._exc_info)
    238         self._check_done()
    239         return self._result

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/tornado/util.py in raise_exc_info(exc_info)

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/tornado/gen.py in run(self)
   1019
   1020                     if exc_info is not None:
-> 1021                         yielded = self.gen.throw(*exc_info)
   1022                         exc_info = None
   1023                     else:

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/distributed/client.py in _gather(self, futures, errors, direct)
   1154                             six.reraise(type(exception),
   1155                                         exception,
-> 1156                                         traceback)
   1157                         if errors == 'skip':
   1158                             bad_keys.add(key)

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    683             value = tp()
    684         if value.__traceback__ is not tb:
--> 685             raise value.with_traceback(tb)
    686         raise value
    687

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/distributed/protocol/pickle.py in loads(x)
     57 def loads(x):
     58     try:
---> 59         return pickle.loads(x)
     60     except Exception:
     61         logger.info("Failed to deserialize %s", x[:10000], exc_info=True)

EOFError: Ran out of input
```

If I load the data into memory first by invoking `compute()`, things work OK:

In [15]: da.chunk().compute().to_netcdf('dask-distributed-netcdf4.nc', engine='netcdf4')
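Packaged as a helper, the same workaround looks like the following sketch (the function name `write_via_memory` is hypothetical; it simply wraps the compute-then-write pattern from `In [15]`):

```python
import distributed
import xarray as xr

def write_via_memory(da, path):
    # Hypothetical helper: compute() gathers the dask-backed result into
    # client memory first, so to_netcdf() writes plain numpy data locally
    # instead of running the write through the distributed scheduler.
    da.compute().to_netcdf(path, engine='netcdf4')

client = distributed.Client()  # LocalCluster, as in the example above
da = xr.DataArray([1., 2., 3.]).chunk()
write_via_memory(da, 'dask-distributed-netcdf4.nc')
```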

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1464/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
133136274 MDU6SXNzdWUxMzMxMzYyNzQ= 759 Trouble applying argmin when using xr.open_mfdataset() spencerkclark 6628425 closed 0     1 2016-02-12T01:34:41Z 2016-02-12T16:13:18Z 2016-02-12T16:13:18Z MEMBER      

I recently tried to apply the `argmin` function to a dataset that I opened using `xr.open_mfdataset` and encountered an unexpected error. Applying `argmin` to the same dataset opened using `xr.open_dataset` works fine. Below is an example with some toy data. Could this be a bug, or is there something I'm doing wrong? I appreciate your help.

```python
In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: xr.DataArray(np.random.rand(2, 3, 4),
   ...:              coords=[np.arange(2), np.arange(3), np.arange(4)],
   ...:              dims=['x', 'y', 'z']).to_dataset(name='test').to_netcdf('test_mfdataset.nc')

In [4]: xr.open_dataset('test_mfdataset.nc').test.argmin('x').values
Out[4]:
array([[1, 1, 1, 1],
       [1, 0, 1, 0],
       [1, 1, 0, 1]])

In [5]: xr.open_mfdataset('test_mfdataset.nc').test.argmin('x').values
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-8-ccac9ca40874> in <module>()
----> 1 xr.open_mfdataset('test_mfdataset.nc').test.argmin('x').values

//anaconda/lib/python2.7/site-packages/xarray/core/dataarray.py in values(self)
    353     def values(self):
    354         """The array's data as a numpy.ndarray"""
--> 355         return self.variable.values
    356
    357     @values.setter

//anaconda/lib/python2.7/site-packages/xarray/core/variable.py in values(self)
    286     def values(self):
    287         """The variable's data as a numpy.ndarray"""
--> 288         return _as_array_or_item(self._data_cached())
    289
    290     @values.setter

//anaconda/lib/python2.7/site-packages/xarray/core/variable.py in _data_cached(self)
    252     def _data_cached(self):
    253         if not isinstance(self._data, (np.ndarray, PandasIndexAdapter)):
--> 254             self._data = np.asarray(self._data)
    255         return self._data
    256

//anaconda/lib/python2.7/site-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
    472
    473     """
--> 474     return array(a, dtype, copy=False, order=order)
    475
    476 def asanyarray(a, dtype=None, order=None):

//anaconda/lib/python2.7/site-packages/dask/array/core.py in __array__(self, dtype, **kwargs)
    852
    853     def __array__(self, dtype=None, **kwargs):
--> 854         x = self.compute()
    855         if dtype and x.dtype != dtype:
    856             x = x.astype(dtype)

//anaconda/lib/python2.7/site-packages/dask/base.py in compute(self, **kwargs)
     35
     36     def compute(self, **kwargs):
---> 37         return compute(self, **kwargs)[0]
     38
     39     @classmethod

//anaconda/lib/python2.7/site-packages/dask/base.py in compute(*args, **kwargs)
    108                     for opt, val in groups.items()])
    109     keys = [var._keys() for var in variables]
--> 110     results = get(dsk, keys, **kwargs)
    111
    112     results_iter = iter(results)

//anaconda/lib/python2.7/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, **kwargs)
     55     results = get_async(pool.apply_async, len(pool._pool), dsk, result,
     56                         cache=cache, queue=queue, get_id=_thread_get_id,
---> 57                         **kwargs)
     58
     59     return results

//anaconda/lib/python2.7/site-packages/dask/async.py in get_async(apply_async, num_workers, dsk, result, cache, queue, get_id, raise_on_exception, rerun_exceptions_locally, callbacks, **kwargs)
    479                     _execute_task(task, data)  # Re-execute locally
    480                 else:
--> 481                     raise(remote_exception(res, tb))
    482             state['cache'][key] = res
    483             finish_task(dsk, key, state, results, keyorder.get)

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,4) (4,1) (1,3)

Traceback
---------
  File "//anaconda/lib/python2.7/site-packages/dask/async.py", line 264, in execute_task
    result = _execute_task(task, data)
  File "//anaconda/lib/python2.7/site-packages/dask/async.py", line 246, in _execute_task
    return func(*args2)
  File "//anaconda/lib/python2.7/site-packages/toolz/functoolz.py", line 381, in __call__
    ret = f(ret)
  File "//anaconda/lib/python2.7/site-packages/dask/array/reductions.py", line 450, in arg_agg
    return _arg_combine(data, axis, argfunc)[0]
  File "//anaconda/lib/python2.7/site-packages/dask/array/reductions.py", line 416, in _arg_combine
    arg = (arg + offsets)[tuple(inds)]
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/759/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);