
issues


20 rows where repo = 13221727, type = "issue" and user = 6628425 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1970241789 I_kwDOAMm_X851b4D9 8394 Update cftime frequency strings in line with recent updates in pandas spencerkclark 6628425 closed 0     1 2023-10-31T11:24:15Z 2023-11-16T15:19:42Z 2023-11-16T15:19:42Z MEMBER      

What is your issue?

Pandas has introduced some deprecations in how frequency strings are specified:

  • Deprecating "A", "A-JAN", etc. in favor of "Y", "Y-JAN", etc. (https://github.com/pandas-dev/pandas/pull/55252)
  • Deprecating "AS", "AS-JAN", etc. in favor of "YS", "YS-JAN", etc. (https://github.com/pandas-dev/pandas/pull/55479)
  • Deprecating "Q", "Q-JAN", etc. in favor of "QE", "QE-JAN", etc. (https://github.com/pandas-dev/pandas/pull/55553)
  • Deprecating "M" in favor of "ME" (https://github.com/pandas-dev/pandas/pull/54061)
  • Deprecating "H" in favor of "h" (https://github.com/pandas-dev/pandas/pull/54939)
  • Deprecating "T", "S", "L", and "U" in favor of "min", "s", "ms", and "us" (https://github.com/pandas-dev/pandas/pull/54061).

It would be good to carry out these deprecations for cftime frequency strings as well, so that xarray remains consistent with pandas.
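
For reference, a minimal sketch of the new-style aliases on the pandas side; it assumes a pandas version in which the deprecations above have landed (the old spellings then emit a FutureWarning):

```python
import pandas as pd

# New-style frequency aliases corresponding to the deprecations listed above.
pd.date_range("2000", periods=3, freq="YS")   # year start (was "AS")
pd.date_range("2000", periods=3, freq="ME")   # month end (was "M")
pd.date_range("2000", periods=3, freq="h")    # hourly (was "H")
pd.date_range("2000", periods=3, freq="min")  # minutely (was "T")
```

Once carried over, the same spellings would be the recommended ones for xr.cftime_range and cftime-based resampling.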

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8394/reactions",
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 1,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1413075015 I_kwDOAMm_X85UOdBH 7184 Potentially add option to encode times using `longdouble` values spencerkclark 6628425 open 0     0 2022-10-18T11:46:30Z 2022-10-18T11:47:00Z   MEMBER      

By default xarray will exactly roundtrip times saved to disk by encoding them using int64 values. However, if a user specifies time encoding units that prevent this, float64 values will be used, and this has the potential to cause roundtripping differences due to roundoff error. Recently, cftime added the ability to encode times using longdouble values (https://github.com/Unidata/cftime/pull/284). On some platforms this offers greater precision than float64 values (though typically not full quad precision). Nevertheless some users might be interested in encoding their times using such values.

The main thing that longdouble values have going for them is that they enable greater precision when using arbitrary units to encode the dates (with int64 we are constrained to using units that allow for time intervals to be expressed with integers). That said, the more I think about this, the more I feel it may not be the best idea:

  • Since the meaning of longdouble can vary from platform to platform, I wonder what happens if you encode times using longdouble values on one machine and decode them on another?
  • longdouble values cannot be stored with all backends; for example, zarr supports them, but netCDF does not.
  • We already provide a robust way to exactly roundtrip any dates--i.e. encode them with int64 values--so adding a less robust (if slightly more flexible in terms of units) option might just cause confusion.

It's perhaps still worth opening this issue for discussion in case others have thoughts that might allay those concerns.
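
For concreteness, a minimal sketch of the cftime side of this, assuming a cftime release that includes Unidata/cftime#284 and its longdouble keyword:

```python
import cftime

# Encode with longdouble values; any precision beyond float64 is
# platform-dependent (typically 80-bit extended, not full quad precision).
times = [cftime.DatetimeGregorian(2000, 1, 1, microsecond=1)]
encoded = cftime.date2num(times, "days since 2000-01-01", longdouble=True)
print(encoded.dtype)  # np.longdouble on most platforms
```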

cc: @jswhit @dcherian

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7184/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1401909544 I_kwDOAMm_X85Tj3Eo 7145 Time decoding error message does not include the problematic variable's name spencerkclark 6628425 closed 0     5 2022-10-08T10:59:17Z 2022-10-13T23:21:55Z 2022-10-12T15:25:42Z MEMBER      

What is your issue?

If any variable in a Dataset has times that cannot be represented as cftime.datetime objects, an error is raised. However, the error message does not indicate the problematic variable's name. It would be nice if it did, because that would make it easier for users to determine the source of the error.

cc: @durack1 xref: Unidata/cftime#295

Example

This is a minimal example of the issue. The error message gives no indication that "invalid_times" is the problem:

```
import xarray as xr

TIME_ATTRS = {"units": "days since 0001-01-01", "calendar": "noleap"}
valid_times = xr.DataArray([0, 1], dims=["time"], attrs=TIME_ATTRS, name="valid_times")
invalid_times = xr.DataArray([1e36, 2e36], dims=["time"], attrs=TIME_ATTRS, name="invalid_times")
ds = xr.merge([valid_times, invalid_times])
xr.decode_cf(ds)

Traceback (most recent call last):
  File "/Users/spencer/software/xarray/xarray/coding/times.py", line 275, in decode_cf_datetime
    dates = _decode_datetime_with_pandas(flat_num_dates, units, calendar)
  File "/Users/spencer/software/xarray/xarray/coding/times.py", line 210, in _decode_datetime_with_pandas
    raise OutOfBoundsDatetime(
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Cannot decode times from a non-standard calendar, 'noleap', using pandas.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/spencer/software/xarray/xarray/coding/times.py", line 180, in _decode_cf_datetime_dtype
    result = decode_cf_datetime(example_value, units, calendar, use_cftime)
  File "/Users/spencer/software/xarray/xarray/coding/times.py", line 277, in decode_cf_datetime
    dates = _decode_datetime_with_cftime(
  File "/Users/spencer/software/xarray/xarray/coding/times.py", line 202, in _decode_datetime_with_cftime
    cftime.num2date(num_dates, units, calendar, only_use_cftime_datetimes=True)
  File "src/cftime/_cftime.pyx", line 605, in cftime._cftime.num2date
  File "src/cftime/_cftime.pyx", line 404, in cftime._cftime.cast_to_int
OverflowError: time values outside range of 64 bit signed integers

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/spencer/software/xarray/xarray/conventions.py", line 655, in decode_cf
    vars, attrs, coord_names = decode_cf_variables(
  File "/Users/spencer/software/xarray/xarray/conventions.py", line 521, in decode_cf_variables
    new_vars[k] = decode_cf_variable(
  File "/Users/spencer/software/xarray/xarray/conventions.py", line 369, in decode_cf_variable
    var = times.CFDatetimeCoder(use_cftime=use_cftime).decode(var, name=name)
  File "/Users/spencer/software/xarray/xarray/coding/times.py", line 687, in decode
    dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
  File "/Users/spencer/software/xarray/xarray/coding/times.py", line 190, in _decode_cf_datetime_dtype
    raise ValueError(msg)
ValueError: unable to decode time units 'days since 0001-01-01' with "calendar 'noleap'". Try opening your dataset with decode_times=False or installing cftime if it is not installed.
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7145/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1117563249 I_kwDOAMm_X85CnKlx 6204 [Bug]: cannot chunk a DataArray that originated as a coordinate spencerkclark 6628425 open 0     1 2022-01-28T15:56:44Z 2022-03-16T04:18:46Z   MEMBER      

What happened?

If I construct the following DataArray and try to chunk its "x" coordinate, I get back a NumPy-backed DataArray:

```
In [2]: a = xr.DataArray([1, 2, 3], dims=["x"], coords=[[4, 5, 6]])

In [3]: a.x.chunk()
Out[3]:
<xarray.DataArray 'x' (x: 3)>
array([4, 5, 6])
Coordinates:
  * x        (x) int64 4 5 6
```

If I construct a copy of the `"x"` coordinate, things work as I would expect:

```
In [4]: x = xr.DataArray(a.x, dims=a.x.dims, coords=a.x.coords, name="x")

In [5]: x.chunk()
Out[5]:
<xarray.DataArray 'x' (x: 3)>
dask.array<xarray-<this-array>, shape=(3,), dtype=int64, chunksize=(3,), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) int64 4 5 6
```

What did you expect to happen?

I would expect the following to happen:

```
In [2]: a = xr.DataArray([1, 2, 3], dims=["x"], coords=[[4, 5, 6]])

In [3]: a.x.chunk()
Out[3]:
<xarray.DataArray 'x' (x: 3)>
dask.array<xarray-<this-array>, shape=(3,), dtype=int64, chunksize=(3,), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) int64 4 5 6
```

Minimal Complete Verifiable Example

No response

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None python: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 15:59:12) [Clang 11.0.1 ] python-bits: 64 OS: Darwin OS-release: 21.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.5 libnetcdf: 4.6.3

xarray: 0.20.1 pandas: 1.3.5 numpy: 1.19.4 scipy: 1.5.4 netCDF4: 1.5.5 pydap: None h5netcdf: 0.8.1 h5py: 2.10.0 Nio: None zarr: 2.7.0 cftime: 1.2.1 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.22.0 distributed: None matplotlib: 3.2.2 cartopy: 0.19.0.post1 seaborn: None numbagg: None fsspec: 2021.06.0 cupy: None pint: 0.15 sparse: None setuptools: 49.6.0.post20210108 pip: 20.2.4 conda: 4.10.1 pytest: 6.0.1 IPython: 7.27.0 sphinx: 3.2.1

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6204/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
849771808 MDU6SXNzdWU4NDk3NzE4MDg= 5107 Converting `cftime.datetime` objects to `np.datetime64` values through `astype` spencerkclark 6628425 open 0     0 2021-04-04T01:02:55Z 2021-10-05T00:00:36Z   MEMBER      

The discussion of the use of the indexes property in #5102 got me thinking about this StackOverflow answer. For a while I have thought that my answer there isn't very satisfying, not only because it relies on this somewhat obscure indexes property, but also because it only works on dimension coordinates -- i.e. something that would be backed by an index.

Describe the solution you'd like

It would be better if we could do this conversion with astype, e.g. da.astype("datetime64[ns]"). This would allow conversion to datetime64 values for all cftime.datetime DataArrays -- dask-backed or NumPy-backed, 1D or ND -- through a fairly standard and well-known method. To my surprise, while you do not get the nice calendar-switching warning that CFTimeIndex.to_datetimeindex provides, this actually already kind of seems to work (?!):

```
In [1]: import xarray as xr

In [2]: times = xr.cftime_range("2000", periods=6, calendar="noleap")

In [3]: da = xr.DataArray(times.values.reshape((2, 3)), dims=["a", "b"])

In [4]: da.astype("datetime64[ns]")
Out[4]:
<xarray.DataArray (a: 2, b: 3)>
array([['2000-01-01T00:00:00.000000000', '2000-01-02T00:00:00.000000000',
        '2000-01-03T00:00:00.000000000'],
       ['2000-01-04T00:00:00.000000000', '2000-01-05T00:00:00.000000000',
        '2000-01-06T00:00:00.000000000']], dtype='datetime64[ns]')
Dimensions without coordinates: a, b
```

NumPy obviously does not officially support this -- nor would I expect it to -- so I would be wary of simply documenting this behavior as is. Would it be reasonable for us to modify xarray.core.duck_array_ops.astype to explicitly implement this conversion ourselves for cftime.datetime arrays? This way we could ensure this was always supported, and we could include appropriate errors for out-of-bounds times (the NumPy method currently overflows in that case) and warnings for switching from non-standard calendars.
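
To make the proposal concrete, here is a hypothetical helper of the kind duck_array_ops.astype could dispatch to; the name and the conversion strategy are assumptions for illustration, not xarray's actual implementation:

```python
import cftime
import numpy as np

def cftime_to_datetime64ns(values):
    # Convert via ISO strings at microsecond resolution (cftime's native
    # resolution); datetime64[us] covers roughly +/- 2.9e5 years.
    times = np.asarray(values)
    out = np.array(
        [np.datetime64(t.isoformat(), "us") for t in times.ravel()]
    ).reshape(times.shape)
    # A real implementation would raise for dates outside the
    # datetime64[ns] range here (plain .astype silently overflows) and
    # warn when switching from a non-standard calendar.
    return out.astype("datetime64[ns]")

converted = cftime_to_datetime64ns(
    [cftime.DatetimeNoLeap(2000, 1, 1), cftime.DatetimeNoLeap(2000, 1, 2)]
)
```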

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5107/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
802734042 MDU6SXNzdWU4MDI3MzQwNDI= 4870 Time encoding error associated with cftime > 1.4.0 spencerkclark 6628425 closed 0     0 2021-02-06T16:15:20Z 2021-02-07T23:12:30Z 2021-02-07T23:12:30Z MEMBER      

As of cftime > 1.4.0, the return type of cftime.date2num can be either integer or float: an integer dtype is used if the times can all be encoded exactly with the provided units; otherwise a float dtype is used. This causes problems in our current encoding pipeline, because we call cftime.date2num on dates one at a time through np.vectorize, and np.vectorize infers the dtype of the full returned array from the result of the first function evaluation. If the first result is an integer, the full array is assumed to have an integer dtype, and any values that should be floats are cast to integers.

What happened:

```
In [1]: import cftime; import numpy as np; import xarray as xr

In [2]: times = np.array([cftime.DatetimeGregorian(2000, 1, 1), cftime.DatetimeGregorian(2000, 1, 1, 1)])

In [3]: xr.coding.times._encode_datetime_with_cftime(times, "days since 2000-01-01", calendar="gregorian")
Out[3]: array([0, 0])
```

What you expected to happen:

In [3]: xr.coding.times._encode_datetime_with_cftime(times, "days since 2000-01-01", calendar="gregorian")
Out[3]: array([0.        , 0.04166667])

A solution here would be to encode the times with a list comprehension instead, and cast the final result to an array, in which case NumPy infers the dtype in a more sensible way.
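
A sketch of that fix (illustrative; the real function is xr.coding.times._encode_datetime_with_cftime, and the name here is simplified):

```python
import cftime
import numpy as np

def encode_datetime_with_cftime(dates, units, calendar):
    # Encoding element by element in a list comprehension lets NumPy infer
    # a common dtype for the final array, rather than np.vectorize locking
    # in the dtype of the first result.
    encoded = [
        cftime.date2num(date, units, calendar=calendar)
        for date in np.ravel(dates)
    ]
    return np.array(encoded).reshape(np.shape(dates))
```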

Environment:

Output of xr.show_versions() ``` INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 14:38:56) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 20.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.6.2 xarray: 0.16.2.dev175+g8cc34cb4.d20210201 pandas: 1.1.3 numpy: 1.19.1 scipy: 1.2.1 netCDF4: 1.5.1.2 pydap: installed h5netcdf: 0.7.4 h5py: 2.9.0 Nio: None zarr: 2.3.2 cftime: 1.4.1 nc_time_axis: 1.1.1.dev5+g531dd0d PseudoNetCDF: None rasterio: 1.0.25 cfgrib: 0.9.7.1 iris: None bottleneck: 1.2.1 dask: 2.11.0 distributed: 2.11.0 matplotlib: 3.3.2 cartopy: None seaborn: 0.9.0 numbagg: installed pint: None setuptools: 51.0.0.post20201207 pip: 19.2.2 conda: None pytest: 5.0.1 IPython: 7.10.1 sphinx: 3.0.4 ```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4870/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
360420464 MDU6SXNzdWUzNjA0MjA0NjQ= 2416 Indicate calendar type in CFTimeIndex repr spencerkclark 6628425 closed 0     5 2018-09-14T19:07:04Z 2020-11-20T01:00:41Z 2020-07-23T10:42:29Z MEMBER      

Currently CFTimeIndex uses the default repr it inherits from pandas.Index. This just displays a potentially-truncated version of the values in the index, along with the index's data type and length, e.g.:

CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00, 2000-01-03 00:00:00,
             2000-01-04 00:00:00, 2000-01-05 00:00:00, 2000-01-06 00:00:00,
             2000-01-07 00:00:00, 2000-01-08 00:00:00, 2000-01-09 00:00:00,
             2000-01-10 00:00:00,
             ...
             2000-12-22 00:00:00, 2000-12-23 00:00:00, 2000-12-24 00:00:00,
             2000-12-25 00:00:00, 2000-12-26 00:00:00, 2000-12-27 00:00:00,
             2000-12-28 00:00:00, 2000-12-29 00:00:00, 2000-12-30 00:00:00,
             2000-12-31 00:00:00],
            dtype='object', length=366)

It would be nice if the repr also included an indication of the calendar type of the index, since different indexes could have different calendar types. For example:

CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00, 2000-01-03 00:00:00,
             2000-01-04 00:00:00, 2000-01-05 00:00:00, 2000-01-06 00:00:00,
             2000-01-07 00:00:00, 2000-01-08 00:00:00, 2000-01-09 00:00:00,
             2000-01-10 00:00:00,
             ...
             2000-12-22 00:00:00, 2000-12-23 00:00:00, 2000-12-24 00:00:00,
             2000-12-25 00:00:00, 2000-12-26 00:00:00, 2000-12-27 00:00:00,
             2000-12-28 00:00:00, 2000-12-29 00:00:00, 2000-12-30 00:00:00,
             2000-12-31 00:00:00],
            dtype='object', length=366, calendar='proleptic_gregorian')

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2416/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
431970156 MDU6SXNzdWU0MzE5NzAxNTY= 2886 Expose use_cftime option in open_zarr spencerkclark 6628425 closed 0     7 2019-04-11T11:24:48Z 2020-09-02T15:19:32Z 2020-09-02T15:19:32Z MEMBER      

use_cftime was recently added as an option to decode_cf and open_dataset to give users more control over how times are decoded (#2759). It would be good if it were also available in open_zarr. This is perhaps less important there, because open_zarr only works on a single data store, so there is no risk of decoding times to different types across files (as there was for open_mfdataset, #1263); however, it would still be nice to be able to silence serialization warnings that result from decoding times to cftime objects in some instances, e.g. #2754.
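
The proposed usage would simply mirror open_dataset; a sketch (the store path is hypothetical, and whether open_zarr accepts this keyword depends on the xarray version):

```python
import xarray as xr

# Decode all times to cftime objects regardless of calendar or range,
# mirroring the existing open_dataset/decode_cf keyword.
ds = xr.open_zarr("example-store.zarr", use_cftime=True)
```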

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2886/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
539648897 MDU6SXNzdWU1Mzk2NDg4OTc= 3641 interp with long cftime coordinates raises an error spencerkclark 6628425 closed 0     8 2019-12-18T12:23:16Z 2020-01-26T14:10:37Z 2020-01-26T14:10:37Z MEMBER      

MCVE Code Sample

```
In [1]: import xarray as xr

In [2]: times = xr.cftime_range('0001', periods=3, freq='500Y')

In [3]: da = xr.DataArray(range(3), dims=['time'], coords=[times])

In [4]: da.interp(time=['0002-05-01'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-f781cb4d500e> in <module>
----> 1 da.interp(time=['0002-05-01'])

~/Software/miniconda3/envs/xarray-tests/lib/python3.7/site-packages/xarray/core/dataarray.py in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
   1353             kwargs=kwargs,
   1354             assume_sorted=assume_sorted,
-> 1355             **coords_kwargs,
   1356         )
   1357         return self._from_temp_dataset(ds)

~/Software/miniconda3/envs/xarray-tests/lib/python3.7/site-packages/xarray/core/dataset.py in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
   2565                 if k in var.dims
   2566             }
-> 2567             variables[name] = missing.interp(var, var_indexers, method, **kwargs)
   2568         elif all(d not in indexers for d in var.dims):
   2569             # keep unrelated object array

~/Software/miniconda3/envs/xarray-tests/lib/python3.7/site-packages/xarray/core/missing.py in interp(var, indexes_coords, method, **kwargs)
    607     new_dims = broadcast_dims + list(destination[0].dims)
    608     interped = interp_func(
--> 609         var.transpose(*original_dims).data, x, destination, method, kwargs
    610     )
    611

~/Software/miniconda3/envs/xarray-tests/lib/python3.7/site-packages/xarray/core/missing.py in interp_func(var, x, new_x, method, kwargs)
    683     )
    684
--> 685     return _interpnd(var, x, new_x, func, kwargs)
    686
    687

~/Software/miniconda3/envs/xarray-tests/lib/python3.7/site-packages/xarray/core/missing.py in _interpnd(var, x, new_x, func, kwargs)
    698
    699 def _interpnd(var, x, new_x, func, kwargs):
--> 700     x, new_x = _floatize_x(x, new_x)
    701
    702     if len(x) == 1:

~/Software/miniconda3/envs/xarray-tests/lib/python3.7/site-packages/xarray/core/missing.py in _floatize_x(x, new_x)
    556     # represented by float.
    557     xmin = x[i].values.min()
--> 558     x[i] = x[i]._to_numeric(offset=xmin, dtype=np.float64)
    559     new_x[i] = new_x[i]._to_numeric(offset=xmin, dtype=np.float64)
    560     return x, new_x

~/Software/miniconda3/envs/xarray-tests/lib/python3.7/site-packages/xarray/core/variable.py in _to_numeric(self, offset, datetime_unit, dtype)
   2001         """
   2002         numeric_array = duck_array_ops.datetime_to_numeric(
-> 2003             self.data, offset, datetime_unit, dtype
   2004         )
   2005         return type(self)(self.dims, numeric_array, self._attrs)

~/Software/miniconda3/envs/xarray-tests/lib/python3.7/site-packages/xarray/core/duck_array_ops.py in datetime_to_numeric(array, offset, datetime_unit, dtype)
    410     if array.dtype.kind in "mM":
    411         return np.where(isnull(array), np.nan, array.astype(dtype))
--> 412     return array.astype(dtype)
    413
    414

TypeError: float() argument must be a string or a number, not 'datetime.timedelta'
```

Problem Description

In principle we should be able to get this to work. The issue stems from the following logic in datetime_to_numeric: https://github.com/pydata/xarray/blob/45fd0e63f43cf313b022a33aeec7f0f982e1908b/xarray/core/duck_array_ops.py#L402-L404

Here we are relying on pandas to convert an array of datetime.timedelta objects to an array with dtype timedelta64[ns]. If the array of datetime.timedelta objects cannot be safely converted to timedelta64[ns] (e.g. due to an integer overflow) then this line is silently a no-op, which leads to the error downstream at the dtype conversion step. This is my fault originally for suggesting this approach, https://github.com/pydata/xarray/pull/2668#discussion_r247271576.

~~To solve this I think we'll need to write our own logic to convert datetime.timedelta objects to numeric values instead of relying on pandas/NumPy.~~ (as @huard notes we should be able to use NumPy directly here for the conversion). We should not consider ourselves beholden to using nanosecond resolution for a couple of reasons: 1. datetime.timedelta objects do not natively support nanosecond resolution; they have microsecond resolution natively, which corresponds with a NumPy timedelta range of +/- 2.9e5 years. 2. One motivation/use-case for cftime dates is that they can represent long time periods that cannot be represented using a standard DatetimeIndex. We should do everything we can to support this with a CFTimeIndex.
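
A sketch of the NumPy-only conversion (hedged: the helper name is illustrative, and a real version would also need to handle NaT and configurable units):

```python
import datetime

import numpy as np

def py_timedelta_to_float(deltas, dtype=np.float64):
    # Dividing datetime.timedelta objects gives an exact float number of
    # microseconds, avoiding the lossy cast through timedelta64[ns].
    microseconds = [
        delta / datetime.timedelta(microseconds=1) for delta in np.ravel(deltas)
    ]
    return np.array(microseconds, dtype=dtype).reshape(np.shape(deltas))
```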

@huard @dcherian this is an important issue we'll need to solve to be able to use a fixed offset for cftime dates for an application like polyfit/polyval.

xref: #3349 and #3631.

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 14:38:56) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 19.0.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: None xarray: 0.14.1 pandas: 0.25.0 numpy: 1.17.0 scipy: 1.3.1 netCDF4: None pydap: installed h5netcdf: 0.7.4 h5py: 2.9.0 Nio: None zarr: 2.3.2 cftime: 1.0.4.2 nc_time_axis: None PseudoNetCDF: None rasterio: 1.0.25 cfgrib: 0.9.7.1 iris: None bottleneck: 1.2.1 dask: 2.9.0+2.gd0daa5bc distributed: 2.9.0 matplotlib: 3.1.1 cartopy: None seaborn: 0.9.0 numbagg: installed setuptools: 42.0.2.post20191201 pip: 19.2.2 conda: None pytest: 5.0.1 IPython: 7.10.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3641/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
538068264 MDU6SXNzdWU1MzgwNjgyNjQ= 3624 Issue serializing arrays of times with certain dtype and _FillValue encodings spencerkclark 6628425 closed 0     0 2019-12-15T15:44:08Z 2020-01-15T15:22:30Z 2020-01-15T15:22:30Z MEMBER      

MCVE Code Sample

```
In [1]: import numpy as np; import pandas as pd; import xarray as xr

In [2]: times = pd.date_range('2000', periods=3)

In [3]: da = xr.DataArray(times, dims=['a'], coords=[[1, 2, 3]], name='foo')

In [4]: da.encoding['_FillValue'] = 1.0e20

In [5]: da.encoding['dtype'] = np.dtype('float64')

In [6]: da.to_dataset().to_netcdf('test.nc')
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-6-cbc6b2cfdf9a> in <module>
----> 1 da.to_dataset().to_netcdf('test.nc')

~/Software/xarray/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1548             unlimited_dims=unlimited_dims,
   1549             compute=compute,
-> 1550             invalid_netcdf=invalid_netcdf,
   1551         )
   1552

~/Software/xarray/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1071     # to be parallelized with dask
   1072     dump_to_store(
-> 1073         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1074     )
   1075     if autoclose:

~/Software/xarray/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1117     variables, attrs = encoder(variables, attrs)
   1118
-> 1119     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1120
   1121

~/Software/xarray/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    291         writer = ArrayWriter()
    292
--> 293     variables, attributes = self.encode(variables, attributes)
    294
    295     self.set_attributes(attributes)

~/Software/xarray/xarray/backends/common.py in encode(self, variables, attributes)
    380     # All NetCDF files get CF encoded by default, without this attempting
    381     # to write times, for example, would fail.
--> 382     variables, attributes = cf_encoder(variables, attributes)
    383     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    384     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}

~/Software/xarray/xarray/conventions.py in cf_encoder(variables, attributes)
    758     _update_bounds_encoding(variables)
    759
--> 760     new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    761
    762     # Remove attrs from bounds variables (issue #2921)

~/Software/xarray/xarray/conventions.py in <dictcomp>(.0)
    758     _update_bounds_encoding(variables)
    759
--> 760     new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    761
    762     # Remove attrs from bounds variables (issue #2921)

~/Software/xarray/xarray/conventions.py in encode_cf_variable(var, needs_copy, name)
    248         variables.UnsignedIntegerCoder(),
    249     ]:
--> 250         var = coder.encode(var, name=name)
    251
    252     # TODO(shoyer): convert all of these to use coders, too:

~/Software/xarray/xarray/coding/variables.py in encode(self, variable, name)
    163         if fv is not None:
    164             # Ensure _FillValue is cast to same dtype as data's
--> 165             encoding["_FillValue"] = data.dtype.type(fv)
    166         fill_value = pop_to(encoding, attrs, "_FillValue", name=name)
    167         if not pd.isnull(fill_value):

OverflowError: Python int too large to convert to C long
```

Expected Output

I think this should succeed in writing to a netCDF file (it worked in xarray 0.14.0 and earlier).

Problem Description

I think this (admittedly very subtle) issue was introduced in https://github.com/pydata/xarray/pull/3502. Essentially, at the time data enters CFMaskCoder.encode it does not necessarily have the dtype it will ultimately be encoded with. In this example, data has dtype int64 in memory, but it will be stored in the netCDF file as a double-precision float.

A possible solution here might be to rely on encoding['dtype'] (if it exists) to determine the type to cast the encoding values for '_FillValue' and 'missing_value' to, instead of relying solely on data.dtype (maybe use that as a fallback).
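
A sketch of that idea (illustrative only, not the actual patch):

```python
import numpy as np

def fill_value_dtype(data, encoding):
    # Prefer the on-disk dtype from the encoding, falling back to the
    # in-memory dtype, so that _FillValue is cast consistently with how
    # the variable will actually be stored.
    return np.dtype(encoding.get("dtype", data.dtype))

# e.g. in CFMaskCoder.encode:
#     encoding["_FillValue"] = fill_value_dtype(data, encoding).type(fv)
```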

cc: @spencerahill

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 | packaged by conda-forge | (default, Dec 6 2019, 08:36:57) [Clang 9.0.0 (tags/RELEASE_900/final)] python-bits: 64 OS: Darwin OS-release: 19.0.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.1 xarray: master pandas: 0.25.3 numpy: 1.17.3 scipy: 1.3.2 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.4.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.9.0 distributed: 2.9.0 matplotlib: 3.1.2 cartopy: None seaborn: None numbagg: None setuptools: 42.0.2.post20191203 pip: 19.3.1 conda: None pytest: 5.3.2 IPython: 7.10.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3624/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
534404865 MDU6SXNzdWU1MzQ0MDQ4NjU= 3603 Test failure with dask master spencerkclark 6628425 closed 0     2 2019-12-07T13:55:54Z 2019-12-30T17:46:44Z 2019-12-30T17:46:44Z MEMBER      

It looks like https://github.com/dask/dask/pull/5684, which adds nanmedian to dask (nice!), changed the error message raised when one tries to reduce an array over all axes via median (it no longer contains 'dask', because xarray now dispatches to the newly added dask function instead of failing before trying it).

@dcherian do you have thoughts on how to best address this? Should we just remove that check in test_reduce?

```
=================================== FAILURES ===================================
__________________________ TestVariable.test_reduce ___________________________

error = <class 'NotImplementedError'>, pattern = 'dask'

@contextmanager
def raises_regex(error, pattern):
    __tracebackhide__ = True
    with pytest.raises(error) as excinfo:
      yield

xarray/tests/__init__.py:104:


self = <xarray.tests.test_dask.TestVariable object at 0x7fd14f8e9c88>

def test_reduce(self):
    u = self.eager_var
    v = self.lazy_var
    self.assertLazyAndAllClose(u.mean(), v.mean())
    self.assertLazyAndAllClose(u.std(), v.std())
    with raise_if_dask_computes():
        actual = v.argmax(dim="x")
    self.assertLazyAndAllClose(u.argmax(dim="x"), actual)
    with raise_if_dask_computes():
        actual = v.argmin(dim="x")
    self.assertLazyAndAllClose(u.argmin(dim="x"), actual)
    self.assertLazyAndAllClose((u > 1).any(), (v > 1).any())
    self.assertLazyAndAllClose((u < 1).all("x"), (v < 1).all("x"))
    with raises_regex(NotImplementedError, "dask"):
      v.median()

xarray/tests/test_dask.py:220:


self = <xarray.Variable (x: 4, y: 6)> dask.array<array, shape=(4, 6), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray> dim = None, axis = None, skipna = None, kwargs = {}

def wrapped_func(self, dim=None, axis=None, skipna=None, **kwargs):
  return self.reduce(func, dim, axis, skipna=skipna, **kwargs)

xarray/core/common.py:46:


self = <xarray.Variable (x: 4, y: 6)> dask.array<array, shape=(4, 6), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray> func = <function _create_nan_agg_method.<locals>.f at 0x7fd16c228378> dim = None, axis = None, keep_attrs = None, keepdims = False, allow_lazy = True kwargs = {'skipna': None} input_data = dask.array<array, shape=(4, 6), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray>

def reduce(
    self,
    func,
    dim=None,
    axis=None,
    keep_attrs=None,
    keepdims=False,
    allow_lazy=None,
    **kwargs,
):
    """Reduce this array by applying `func` along some dimension(s).

    Parameters
    ----------
    func : function
        Function which can be called in the form
        `func(x, axis=axis, **kwargs)` to return the result of reducing an
        np.ndarray over an integer valued axis.
    dim : str or sequence of str, optional
        Dimension(s) over which to apply `func`.
    axis : int or sequence of int, optional
        Axis(es) over which to apply `func`. Only one of the 'dim'
        and 'axis' arguments can be supplied. If neither are supplied, then
        the reduction is calculated over the flattened array (by calling
        `func(x)` without an axis argument).
    keep_attrs : bool, optional
        If True, the variable's attributes (`attrs`) will be copied from
        the original object to the new one.  If False (default), the new
        object will be returned without attributes.
    keepdims : bool, default False
        If True, the dimensions which are reduced are left in the result
        as dimensions of size one
    **kwargs : dict
        Additional keyword arguments passed on to `func`.

    Returns
    -------
    reduced : Array
        Array with summarized data and the indicated dimension(s)
        removed.
    """
    if dim == ...:
        dim = None
    if dim is not None and axis is not None:
        raise ValueError("cannot supply both 'axis' and 'dim' arguments")

    if dim is not None:
        axis = self.get_axis_num(dim)

    if allow_lazy is not None:
        warnings.warn(
            "allow_lazy is deprecated and will be removed in version 0.16.0. It is now True by default.",
            DeprecationWarning,
        )
    else:
        allow_lazy = True

    input_data = self.data if allow_lazy else self.values

    if axis is not None:
        data = func(input_data, axis=axis, **kwargs)
    else:
      data = func(input_data, **kwargs)

xarray/core/variable.py:1534:


values = dask.array<array, shape=(4, 6), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray> axis = None, skipna = None, kwargs = {} func = <function nanmedian at 0x7fd16c226bf8>, nanname = 'nanmedian'

def f(values, axis=None, skipna=None, **kwargs):
    if kwargs.pop("out", None) is not None:
        raise TypeError(f"`out` is not valid for {name}")

    values = asarray(values)

    if coerce_strings and values.dtype.kind in "SU":
        values = values.astype(object)

    func = None
    if skipna or (skipna is None and values.dtype.kind in "cfO"):
        nanname = "nan" + name
        func = getattr(nanops, nanname)
    else:
        func = _dask_or_eager_func(name)

    try:
      return func(values, axis=axis, **kwargs)

xarray/core/duck_array_ops.py:307:


a = dask.array<array, shape=(4, 6), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray> axis = None, out = None

def nanmedian(a, axis=None, out=None):
  return _dask_or_eager_func("nanmedian", eager_module=nputils)(a, axis=axis)

xarray/core/nanops.py:144:


args = (dask.array<array, shape=(4, 6), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray>,) kwargs = {'axis': None} dispatch_args = (dask.array<array, shape=(4, 6), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray>,) wrapped = <function nanmedian at 0x7fd1737bcea0>

def f(*args, **kwargs):
    if list_of_args:
        dispatch_args = args[0]
    else:
        dispatch_args = args[array_args]
    if any(isinstance(a, dask_array.Array) for a in dispatch_args):
        try:
            wrapped = getattr(dask_module, name)
        except AttributeError as e:
            raise AttributeError(f"{e}: requires dask >={requires_dask}")
    else:
        wrapped = getattr(eager_module, name)
  return wrapped(*args, **kwargs)

xarray/core/duck_array_ops.py:47:


a = dask.array<array, shape=(4, 6), dtype=float64, chunksize=(2, 2), chunktype=numpy.ndarray> axis = None, keepdims = False, out = None

@derived_from(np)
def nanmedian(a, axis=None, keepdims=False, out=None):
    """
    This works by automatically chunking the reduced axes to a single chunk
    and then calling ``numpy.nanmedian`` function across the remaining dimensions
    """
    if axis is None:
        raise NotImplementedError(
          "The da.nanmedian function only works along an axis or a subset of axes.  "
            "The full algorithm is difficult to do in parallel"
        )

E NotImplementedError: The da.nanmedian function only works along an axis or a subset of axes. The full algorithm is difficult to do in parallel

/usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/dask/array/reductions.py:1299: NotImplementedError

During handling of the above exception, another exception occurred:

self = <xarray.tests.test_dask.TestVariable object at 0x7fd14f8e9c88>

def test_reduce(self):
    u = self.eager_var
    v = self.lazy_var
    self.assertLazyAndAllClose(u.mean(), v.mean())
    self.assertLazyAndAllClose(u.std(), v.std())
    with raise_if_dask_computes():
        actual = v.argmax(dim="x")
    self.assertLazyAndAllClose(u.argmax(dim="x"), actual)
    with raise_if_dask_computes():
        actual = v.argmin(dim="x")
    self.assertLazyAndAllClose(u.argmin(dim="x"), actual)
    self.assertLazyAndAllClose((u > 1).any(), (v > 1).any())
    self.assertLazyAndAllClose((u < 1).all("x"), (v < 1).all("x"))
    with raises_regex(NotImplementedError, "dask"):
      v.median()

xarray/tests/test_dask.py:220:


self = <contextlib._GeneratorContextManager object at 0x7fd14f8bcc50> type = <class 'NotImplementedError'> value = NotImplementedError('The da.nanmedian function only works along an axis or a subset of axes. The full algorithm is difficult to do in parallel') traceback = <traceback object at 0x7fd154597bc8>

def __exit__(self, type, value, traceback):
    if type is None:
        try:
            next(self.gen)
        except StopIteration:
            return False
        else:
            raise RuntimeError("generator didn't stop")
    else:
        if value is None:
            # Need to force instantiation so we can reliably
            # tell if we get the same exception back
            value = type()
        try:
          self.gen.throw(type, value, traceback)

E AssertionError: exception NotImplementedError('The da.nanmedian function only works along an axis or a subset of axes. The full algorithm is difficult to do in parallel') did not match pattern 'dask'
```
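
One minimal way to update the test would be to match on the new dask error message instead of the old 'dask' pattern; a sketch (the message text is taken from the log above):

```python
import pytest

# v is the lazy dask-backed Variable from the test fixture.
with pytest.raises(NotImplementedError, match="only works along an axis"):
    v.median()
```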

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3603/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
427398236 MDU6SXNzdWU0MjczOTgyMzY= 2856 Roundtripping between a dimension coordinate and scalar coordinate on a Dataset spencerkclark 6628425 closed 0     4 2019-03-31T13:42:39Z 2019-04-04T21:58:24Z 2019-04-04T21:58:24Z MEMBER      

Code Sample, a copy-pastable example if possible

In xarray 0.12.0 the following example produces a Dataset with no indexes:

```
In [1]: import xarray as xr

In [2]: da = xr.DataArray([1], [('x', [0])], name='a')

In [3]: da.to_dataset().isel(x=0).expand_dims('x').indexes
Out[3]:
```

Expected Output

In xarray 0.11.3 the roundtrip sequence above properly recovers the initial index along the 'x' dimension:

```
In [1]: import xarray as xr

In [2]: da = xr.DataArray([1], [('x', [0])], name='a')

In [3]: da.to_dataset().isel(x=0).expand_dims('x').indexes
Out[3]: x: Int64Index([0], dtype='int64', name='x')
```

Output of xr.show_versions()

``` INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 02:16:08) [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 18.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.1 libnetcdf: 4.6.1 xarray: 0.12.0 pandas: 0.24.2 numpy: 1.13.1 scipy: 0.19.1 netCDF4: 1.4.0 pydap: None h5netcdf: 0.5.1 h5py: 2.8.0 Nio: None zarr: None cftime: 1.0.0 nc_time_axis: None PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.2.0 dask: 0.17.5 distributed: 1.21.8 matplotlib: 2.0.2 cartopy: None seaborn: None setuptools: 40.5.0 pip: 9.0.1 conda: None pytest: 3.10.0 IPython: 6.4.0 sphinx: 1.7.4 ```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2856/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
408772665 MDU6SXNzdWU0MDg3NzI2NjU= 2761 'standard' calendar refers to 'proleptic_gregorian' in cftime_range rather than 'gregorian' spencerkclark 6628425 closed 0     2 2019-02-11T13:06:05Z 2019-02-15T21:58:16Z 2019-02-15T21:58:16Z MEMBER      

Code Sample, a copy-pastable example if possible

```python
In [1]: import xarray

In [2]: xarray.cftime_range('2000', periods=3, calendar='standard').values
Out[2]:
array([cftime.DatetimeProlepticGregorian(2000, 1, 1, 0, 0, 0, 0, -1, 1),
       cftime.DatetimeProlepticGregorian(2000, 1, 2, 0, 0, 0, 0, -1, 1),
       cftime.DatetimeProlepticGregorian(2000, 1, 3, 0, 0, 0, 0, -1, 1)], dtype=object)
```

Problem description

When writing cftime_range I used dates from a proleptic Gregorian calendar when the calendar type was specified as 'standard'. While this is consistent with Python's built-in datetime.datetime (which uses a proleptic Gregorian calendar), this differs from the behavior in cftime.num2date and ultimately the CF conventions, which state that 'standard' should refer to the true Gregorian calendar. My inclination is that considering "cf" is in the name of cftime_range, we should adhere to those conventions as closely as possible (and hence the way I initially coded things was a mistake).
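
For comparison, a hedged illustration of cftime's own behavior ('standard' maps to the mixed Julian/Gregorian calendar; the exact return type depends on the cftime version and num2date options):

```python
import cftime

date = cftime.num2date(
    0,
    "days since 2000-01-01",
    calendar="standard",
    only_use_cftime_datetimes=True,
)
print(type(date))  # a mixed Julian/Gregorian ("standard") calendar date
```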

Expected Output

In [2]: xarray.cftime_range('2000', periods=3, calendar='standard').values
Out[2]:
array([cftime.DatetimeGregorian(2000, 1, 1, 0, 0, 0, 0, -1, 1),
       cftime.DatetimeGregorian(2000, 1, 2, 0, 0, 0, 0, -1, 1),
       cftime.DatetimeGregorian(2000, 1, 3, 0, 0, 0, 0, -1, 1)], dtype=object)

Do others agree that we should fix this? If we were to make this change, would it be appropriate to consider it a bug and simply make the breaking change immediately, or might we need a deprecation cycle?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2761/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
400504690 MDU6SXNzdWU0MDA1MDQ2OTA= 2688 dropna() for a Series indexed by a CFTimeIndex spencerkclark 6628425 closed 0     3 2019-01-17T23:15:29Z 2019-02-02T06:56:12Z 2019-02-02T06:56:12Z MEMBER      

Code Sample, a copy-pastable example if possible

Currently something like the following raises an error:

```
In [1]: import xarray as xr

In [2]: import pandas as pd

In [3]: import numpy as np

In [4]: times = xr.cftime_range('2000', periods=3)

In [5]: series = pd.Series(np.array([0., np.nan, 1.]), index=times)

In [6]: series
Out[6]:
2000-01-01 00:00:00    0.0
2000-01-02 00:00:00    NaN
2000-01-03 00:00:00    1.0
dtype: float64

In [7]: series.dropna()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-45eb0c023203> in <module>
----> 1 series.dropna()

~/pandas/pandas/core/series.py in dropna(self, axis, inplace, **kwargs)
   4169
   4170         if self._can_hold_na:
-> 4171             result = remove_na_arraylike(self)
   4172             if inplace:
   4173                 self._update_inplace(result)

~/pandas/pandas/core/dtypes/missing.py in remove_na_arraylike(arr)
    539         return arr[notna(arr)]
    540     else:
--> 541         return arr[notna(lib.values_from_object(arr))]

~/pandas/pandas/core/series.py in __getitem__(self, key)
    801         key = com.apply_if_callable(key, self)
    802         try:
--> 803             result = self.index.get_value(self, key)
    804
    805         if not is_scalar(result):

~/xarray-dev/xarray/xarray/coding/cftimeindex.py in get_value(self, series, key)
    321         """Adapted from pandas.tseries.index.DatetimeIndex.get_value"""
    322         if not isinstance(key, slice):
--> 323             return series.iloc[self.get_loc(key)]
    324         else:
    325             return series.iloc[self.slice_indexer(

~/xarray-dev/xarray/xarray/coding/cftimeindex.py in get_loc(self, key, method, tolerance)
    300         else:
    301             return pd.Index.get_loc(self, key, method=method,
--> 302                                     tolerance=tolerance)
    303
    304     def _maybe_cast_slice_bound(self, label, side, kind):

~/pandas/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2595                              'backfill or nearest lookups')
   2596         try:
-> 2597             return self._engine.get_loc(key)
   2598         except KeyError:
   2599             return self._engine.get_loc(self._maybe_cast_indexer(key))

~/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

~/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

TypeError: '[ True False True]' is an invalid key
```

Problem description

We currently rely on this in the resampling logic within xarray for a Series indexed by a DatetimeIndex: https://github.com/pydata/xarray/blob/dc87dea52351835af472d131f70a7f7603b3100e/xarray/core/groupby.py#L268

It would be nice if we could do the same with a Series indexed by a CFTimeIndex, e.g. in #2593.

Expected Output

In [7]: series.dropna()
Out[7]:
2000-01-01 00:00:00    0.0
2000-01-03 00:00:00    1.0
dtype: float64

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.1 | packaged by conda-forge | (default, Nov 13 2018, 09:50:42) [Clang 9.0.0 (clang-900.0.37)] python-bits: 64 OS: Darwin OS-release: 18.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.10.9+117.g80914e0.dirty pandas: 0.24.0.dev0+1332.g5d134ec numpy: 1.15.4 scipy: 1.1.0 netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.3.4 PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None cyordereddict: None dask: 1.0.0 distributed: 1.25.2 matplotlib: 3.0.2 cartopy: None seaborn: 0.9.0 setuptools: 40.6.3 pip: 18.1 conda: None pytest: 3.10.1 IPython: 7.2.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2688/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
398918281 MDU6SXNzdWUzOTg5MTgyODE= 2671 Enable subtracting a scalar cftime.datetime object from a CFTimeIndex spencerkclark 6628425 closed 0     0 2019-01-14T14:42:12Z 2019-01-30T16:45:10Z 2019-01-30T16:45:10Z MEMBER      

Code Sample, a copy-pastable example if possible

```
In [1]: import xarray

In [2]: times = xarray.cftime_range('2000', periods=3)

In [3]: times - times[0]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-97cbca76a8af> in <module>
----> 1 times - times[0]

~/xarray-dev/xarray/xarray/coding/cftimeindex.py in __sub__(self, other)
    417             return CFTimeIndex(np.array(self) - other.to_pytimedelta())
    418         else:
--> 419             return CFTimeIndex(np.array(self) - other)
    420
    421     def _add_delta(self, deltas):

~/xarray-dev/xarray/xarray/coding/cftimeindex.py in __new__(cls, data, name)
    238         result = object.__new__(cls)
    239         result._data = np.array(data, dtype='O')
--> 240         assert_all_valid_date_type(result._data)
    241         result.name = name
    242         return result

~/xarray-dev/xarray/xarray/coding/cftimeindex.py in assert_all_valid_date_type(data)
    194         raise TypeError(
    195             'CFTimeIndex requires cftime.datetime '
--> 196             'objects. Got object of {}.'.format(date_type))
    197     if not all(isinstance(value, date_type) for value in data):
    198         raise TypeError(

TypeError: CFTimeIndex requires cftime.datetime objects. Got object of <class 'datetime.timedelta'>.
```

Problem description

This should result in a pandas.TimedeltaIndex, as is the case for a pandas.DatetimeIndex:

```
In [4]: import pandas

In [5]: times = pandas.date_range('2000', periods=3)

In [6]: times - times[0]
Out[6]: TimedeltaIndex(['0 days', '1 days', '2 days'], dtype='timedelta64[ns]', freq=None)
```
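
An illustrative sketch (not the actual patch) of the branch CFTimeIndex.__sub__ would need, handing off to pandas when the element-wise result is an array of timedeltas:

```python
import datetime

import numpy as np
import pandas as pd
from xarray.coding.cftimeindex import CFTimeIndex

def cftimeindex_sub(index, other):
    result = np.array(index) - other
    if len(result) and isinstance(result[0], datetime.timedelta):
        return pd.TimedeltaIndex(result)
    return CFTimeIndex(result)
```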

Expected Output

```
In [1]: import xarray

In [2]: times = xarray.cftime_range('2000', periods=3)

In [3]: times - times[0]
Out[3]: TimedeltaIndex(['0 days', '1 days', '2 days'], dtype='timedelta64[ns]', freq=None)
```

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.6 | packaged by conda-forge | (default, Jul 26 2018, 09:55:02) [GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] python-bits: 64 OS: Darwin OS-release: 18.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.10.9+127.ga7129d1 pandas: 0.24.0.dev0+1332.g5d134ec numpy: 1.15.4 scipy: 1.1.0 netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.3.4 PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None cyordereddict: None dask: 1.0.0 distributed: 1.25.1 matplotlib: 3.0.2 cartopy: None seaborn: 0.9.0 setuptools: 40.6.3 pip: 18.1 conda: None pytest: 3.10.1 IPython: 7.2.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2671/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
369751771 MDU6SXNzdWUzNjk3NTE3NzE= 2484 Enable add/sub operations involving a CFTimeIndex and a TimedeltaIndex spencerkclark 6628425 closed 0     1 2018-10-13T01:00:28Z 2018-10-17T04:00:57Z 2018-10-17T04:00:57Z MEMBER      

```
In [1]: import xarray as xr

In [2]: start_dates = xr.cftime_range('1999-12', periods=12, freq='M')

In [3]: end_dates = start_dates.shift(1, 'M')

In [4]: end_dates - start_dates

TypeError Traceback (most recent call last) <ipython-input-4-43c24409020b> in <module>() ----> 1 end_dates - start_dates

/Users/spencerclark/xarray-dev/xarray/xarray/coding/cftimeindex.pyc in __sub__(self, other)
    365
    366     def __sub__(self, other):
--> 367         return CFTimeIndex(np.array(self) - other)
    368
    369

TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'CFTimeIndex'

```

Problem description

Subtracting one DatetimeIndex from another produces a TimedeltaIndex:

```
In [5]: import pandas as pd

In [6]: start_dates = pd.date_range('1999-12', periods=12, freq='M')

In [7]: end_dates = start_dates.shift(1, 'M')

In [8]: end_dates - start_dates
Out[8]:
TimedeltaIndex(['31 days', '29 days', '31 days', '30 days', '31 days',
                '30 days', '31 days', '31 days', '30 days', '31 days',
                '30 days', '31 days'],
               dtype='timedelta64[ns]', freq=None)
```

This should also be straightforward to enable for CFTimeIndex objects and would be useful, for example, in the problem described in https://github.com/pydata/xarray/issues/2481#issue-369639339.

Expected Output

```
In [1]: import xarray as xr

In [2]: start_dates = xr.cftime_range('1999-12', periods=12, freq='M')

In [3]: end_dates = start_dates.shift(1, 'M')

In [4]: end_dates - start_dates
Out[4]:
TimedeltaIndex(['31 days', '29 days', '31 days', '30 days', '31 days',
                '30 days', '31 days', '31 days', '30 days', '31 days',
                '30 days', '31 days'],
               dtype='timedelta64[ns]', freq=None)
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2484/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
324758225 MDU6SXNzdWUzMjQ3NTgyMjU= 2165 CFTimeIndex improperly handles string slice for length-1 indexes spencerkclark 6628425 closed 0     0 2018-05-21T00:51:55Z 2018-05-21T08:02:35Z 2018-05-21T08:02:35Z MEMBER      

Code Sample, a copy-pastable example if possible

```
In [1]: import xarray as xr

In [2]: import cftime

In [3]: index = xr.CFTimeIndex([cftime.DatetimeNoLeap(1, 1, 1)])

In [4]: da = xr.DataArray([1], coords=[index], dims=['time'])

In [5]: da.sel(time=slice('0001', '0001'))
Out[5]:
<xarray.DataArray (time: 0)>
array([], dtype=int64)
Coordinates:
  * time     (time) object
```

Problem description

When a CFTimeIndex is created with a single element, slicing using strings does not work; the example above should behave analogously to how it does when using a DatetimeIndex:

```
In [9]: import pandas as pd

In [10]: index = pd.DatetimeIndex(['2000-01-01'])

In [11]: da = xr.DataArray([1], coords=[index], dims=['time'])

In [12]: da.sel(time=slice('2000', '2000'))
Out[12]:
<xarray.DataArray (time: 1)>
array([1])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01
```

I have a fix for this, which I will push shortly.

Expected Output

In [5]: da.sel(time=slice('0001', '0001'))
Out[5]:
<xarray.DataArray (time: 1)>
array([1])
Coordinates:
  * time     (time) object 0001-01-01 00:00:00

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.1.final.0 python-bits: 64 OS: Darwin OS-release: 17.4.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.10.4 pandas: 0.20.2 numpy: 1.13.1 scipy: 0.19.1 netCDF4: 1.4.0 h5netcdf: 0.5.1 h5py: 2.8.0 Nio: None zarr: None bottleneck: 1.2.0 cyordereddict: None dask: 0.15.0 distributed: 1.17.1 matplotlib: 2.0.2 cartopy: None seaborn: None setuptools: 33.1.1.post20170320 pip: 9.0.1 conda: None pytest: 3.1.2 IPython: 6.1.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2165/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
322591813 MDU6SXNzdWUzMjI1OTE4MTM= 2127 cftime.datetime serialization example failing in latest doc build spencerkclark 6628425 closed 0     9 2018-05-13T12:58:15Z 2018-05-14T19:17:37Z 2018-05-14T19:17:37Z MEMBER      

Code Sample, a copy-pastable example if possible

```
In [1]: from itertools import product

In [2]: import numpy as np

In [3]: import xarray as xr

In [4]: from cftime import DatetimeNoLeap

In [5]: dates = [DatetimeNoLeap(year, month, 1)
   ...:          for year, month in product(range(1, 3), range(1, 13))]

In [6]: with xr.set_options(enable_cftimeindex=True):
   ...:     da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'], name='foo')
   ...:

In [7]: da.to_netcdf('test.nc')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-306dbf0ba669> in <module>()
----> 1 da.to_netcdf('test.nc')

/Users/spencerclark/xarray-dev/xarray/xarray/core/dataarray.pyc in to_netcdf(self, *args, **kwargs)
   1514         dataset = self.to_dataset()
   1515
-> 1516         return dataset.to_netcdf(*args, **kwargs)
   1517
   1518     def to_dict(self):

/Users/spencerclark/xarray-dev/xarray/xarray/core/dataset.pyc in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims)
   1143         return to_netcdf(self, path, mode, format=format, group=group,
   1144                          engine=engine, encoding=encoding,
-> 1145                          unlimited_dims=unlimited_dims)
   1146
   1147     def to_zarr(self, store=None, mode='w-', synchronizer=None, group=None,

/Users/spencerclark/xarray-dev/xarray/xarray/backends/api.pyc in to_netcdf(dataset, path_or_file, mode, format, group, engine, writer, encoding, unlimited_dims)
    681         try:
    682             dataset.dump_to_store(store, sync=sync, encoding=encoding,
--> 683                                   unlimited_dims=unlimited_dims)
    684         if path_or_file is None:
    685             return target.getvalue()

/Users/spencerclark/xarray-dev/xarray/xarray/core/dataset.pyc in dump_to_store(self, store, encoder, sync, encoding, unlimited_dims)
   1073
   1074         store.store(variables, attrs, check_encoding,
-> 1075                     unlimited_dims=unlimited_dims)
   1076         if sync:
   1077             store.sync()

/Users/spencerclark/xarray-dev/xarray/xarray/backends/common.pyc in store(self, variables, attributes, check_encoding_set, unlimited_dims)
    356         """
    357
--> 358         variables, attributes = self.encode(variables, attributes)
    359
    360         self.set_attributes(attributes)

/Users/spencerclark/xarray-dev/xarray/xarray/backends/common.pyc in encode(self, variables, attributes)
    441         # All NetCDF files get CF encoded by default, without this attempting
    442         # to write times, for example, would fail.
--> 443         variables, attributes = cf_encoder(variables, attributes)
    444         variables = OrderedDict([(k, self.encode_variable(v))
    445                                  for k, v in variables.items()])

/Users/spencerclark/xarray-dev/xarray/xarray/conventions.pyc in cf_encoder(variables, attributes)
    575     """
    576     new_vars = OrderedDict((k, encode_cf_variable(v, name=k))
--> 577                            for k, v in iteritems(variables))
    578     return new_vars, attributes

python2/cyordereddict/_cyordereddict.pyx in cyordereddict._cyordereddict.OrderedDict.__init__ (python2/cyordereddict/_cyordereddict.c:1225)()

//anaconda/envs/xarray-dev/lib/python2.7/_abcoll.pyc in update(*args, **kwds)
    569                 self[key] = other[key]
    570             else:
--> 571                 for key, value in other:
    572                     self[key] = value
    573         for key, value in kwds.items():

/Users/spencerclark/xarray-dev/xarray/xarray/conventions.pyc in <genexpr>((k, v))
    575     """
    576     new_vars = OrderedDict((k, encode_cf_variable(v, name=k))
--> 577                            for k, v in iteritems(variables))
    578     return new_vars, attributes

/Users/spencerclark/xarray-dev/xarray/xarray/conventions.pyc in encode_cf_variable(var, needs_copy, name)
    232                   variables.CFMaskCoder(),
    233                   variables.UnsignedIntegerCoder()]:
--> 234         var = coder.encode(var, name=name)
    235
    236     # TODO(shoyer): convert all of these to use coders, too:

/Users/spencerclark/xarray-dev/xarray/xarray/coding/times.pyc in encode(self, variable, name)
    384             data,
    385             encoding.pop('units', None),
--> 386             encoding.pop('calendar', None))
    387         safe_setitem(attrs, 'units', units, name=name)
    388         safe_setitem(attrs, 'calendar', calendar, name=name)

/Users/spencerclark/xarray-dev/xarray/xarray/coding/times.pyc in encode_cf_datetime(dates, units, calendar)
    338
    339     if units is None:
--> 340         units = infer_datetime_units(dates)
    341     else:
    342         units = _cleanup_netcdf_time_units(units)

/Users/spencerclark/xarray-dev/xarray/xarray/coding/times.pyc in infer_datetime_units(dates)
    254     reference_date = dates[0] if len(dates) > 0 else '1970-01-01'
    255     reference_date = format_cftime_datetime(reference_date)
--> 256     unique_timedeltas = np.unique(np.diff(dates)).astype('timedelta64[ns]')
    257     units = _infer_time_units_from_diff(unique_timedeltas)
    258     return '%s since %s' % (units, reference_date)

TypeError: Cannot cast datetime.timedelta object from metadata [Y] to [ns] according to the rule 'same_kind'
```

Problem description

This seems to be an edge case that was not covered in the tests I added in #1252. Strangely, if I convert the result of `np.unique(np.diff(dates))` to an array before casting to `'timedelta64[ns]'`, things work:

```python
In [9]: np.unique(np.diff(dates)).astype('timedelta64[ns]')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-5d53452b676f> in <module>()
----> 1 np.unique(np.diff(dates)).astype('timedelta64[ns]')

TypeError: Cannot cast datetime.timedelta object from metadata [Y] to [ns] according to the rule 'same_kind'

In [10]: np.array(np.unique(np.diff(dates))).astype('timedelta64[ns]')
Out[10]:
array([2419200000000000, 2592000000000000, 2678400000000000],
      dtype='timedelta64[ns]')
```

Might anyone have any ideas as to what the underlying issue is? The fix could be as simple as wrapping the result in `np.array`, but I don't understand why that makes a difference.
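To make the comparison concrete, here is a self-contained sketch of that one-line workaround (the helper name `unique_timedeltas_ns` is hypothetical, and this is only the change suggested above, not necessarily the eventual upstream fix):

```python
from itertools import product

import numpy as np
from cftime import DatetimeNoLeap

def unique_timedeltas_ns(dates):
    # Hypothetical helper mirroring the failing line in infer_datetime_units:
    # round-tripping the np.unique result through np.array() yields a plain
    # object array, which then casts to 'timedelta64[ns]' without tripping
    # the 'same_kind' casting rule.
    return np.array(np.unique(np.diff(dates))).astype('timedelta64[ns]')

dates = [DatetimeNoLeap(year, month, 1)
         for year, month in product(range(1, 3), range(1, 13))]
print(unique_timedeltas_ns(dates))
```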

Expected Output

`da.to_netcdf('test.nc')` should succeed without an error.

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

xarray: 0.8.2+dev641.g7302d7e
pandas: 0.22.0
numpy: 1.13.1
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: None
h5py: 2.7.1
Nio: None
zarr: 2.2.0
bottleneck: None
cyordereddict: 1.0.0
dask: 0.17.1
distributed: 1.21.3
matplotlib: 2.2.2
cartopy: None
seaborn: 0.8.1
setuptools: 38.4.0
pip: 9.0.1
conda: None
pytest: 3.3.2
IPython: 5.5.0
sphinx: 1.7.1
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2127/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
238284894 MDU6SXNzdWUyMzgyODQ4OTQ= 1464 Writing directly to a netCDF file while using distributed spencerkclark 6628425 closed 0     7 2017-06-24T01:28:00Z 2018-03-10T15:43:18Z 2018-03-10T15:43:18Z MEMBER      

I've been experimenting with distributed recently and have run into an issue when saving a result directly to a file using the netcdf4 engine. I've found that if I compute things before saving to a file (thus loading the result into memory before calling to_netcdf), things work OK. A minimal working example is attached below.

Can others reproduce this? Part of me thinks there must be something wrong with my setup, because I'm somewhat surprised something like this wouldn't have come up already (apologies in advance if that's the case).

```python
In [1]: import dask

In [2]: import distributed

In [3]: import netCDF4

In [4]: import xarray as xr

In [5]: dask.__version__
Out[5]: '0.15.0'

In [6]: distributed.__version__
Out[6]: '1.17.1'

In [7]: netCDF4.__version__
Out[7]: '1.2.9'

In [8]: xr.__version__
Out[8]: '0.9.6'

In [9]: da = xr.DataArray([1., 2., 3.])

In [10]: da.to_netcdf('no-dask.nc')

In [11]: da.chunk().to_netcdf('dask.nc')  # Not using distributed yet

In [12]: c = distributed.Client()  # Launch a LocalCluster (now using distributed)

In [13]: c
Out[13]: <Client: scheduler='tcp://127.0.0.1:44576' processes=16 cores=16>

In [14]: da.chunk().to_netcdf('dask-distributed-netcdf4.nc', engine='netcdf4')
---------------------------------------------------------------------------
EOFError                                  Traceback (most recent call last)
<ipython-input-14-98490239a35f> in <module>()
----> 1 da.chunk().to_netcdf('dask-distributed-netcdf4.nc', engine='netcdf4')

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/xarray/core/dataarray.py in to_netcdf(self, *args, **kwargs)
   1349         dataset = self.to_dataset()
   1350
-> 1351         dataset.to_netcdf(*args, **kwargs)
   1352
   1353     def to_dict(self):

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims)
    975         return to_netcdf(self, path, mode, format=format, group=group,
    976                          engine=engine, encoding=encoding,
--> 977                          unlimited_dims=unlimited_dims)
    978
    979     def __unicode__(self):

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, writer, encoding, unlimited_dims)
    571     try:
    572         dataset.dump_to_store(store, sync=sync, encoding=encoding,
--> 573                               unlimited_dims=unlimited_dims)
    574         if path_or_file is None:
    575             return target.getvalue()

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/xarray/core/dataset.py in dump_to_store(self, store, encoder, sync, encoding, unlimited_dims)
    916                             unlimited_dims=unlimited_dims)
    917         if sync:
--> 918             store.sync()
    919
    920     def to_netcdf(self, path=None, mode='w', format=None, group=None,

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in sync(self)
    334     def sync(self):
    335         with self.ensure_open(autoclose=True):
--> 336             super(NetCDF4DataStore, self).sync()
    337             self.ds.sync()
    338

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/xarray/backends/common.py in sync(self)
    200
    201     def sync(self):
--> 202         self.writer.sync()
    203
    204     def store_dataset(self, dataset):

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/xarray/backends/common.py in sync(self)
    177             import dask
    178             if LooseVersion(dask.__version__) > LooseVersion('0.8.1'):
--> 179                 da.store(self.sources, self.targets, lock=GLOBAL_LOCK)
    180             else:
    181                 da.store(self.sources, self.targets)

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/dask/array/core.py in store(sources, targets, lock, regions, compute, **kwargs)
    922     dsk = sharedict.merge((name, updates), *[src.dask for src in sources])
    923     if compute:
--> 924         Array._get(dsk, keys, **kwargs)
    925     else:
    926         from ..delayed import Delayed

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/dask/base.py in _get(cls, dsk, keys, get, **kwargs)
    102         get = get or _globals['get'] or cls._default_get
    103         dsk2 = optimization_function(cls)(ensure_dict(dsk), keys, **kwargs)
--> 104         return get(dsk2, keys, **kwargs)
    105
    106     @classmethod

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, **kwargs)
   1762         if sync:
   1763             try:
-> 1764                 results = self.gather(packed)
   1765             finally:
   1766                 for f in futures.values():

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/distributed/client.py in gather(self, futures, errors, maxsize, direct)
   1261         else:
   1262             return self.sync(self._gather, futures, errors=errors,
-> 1263                              direct=direct)
   1264
   1265     @gen.coroutine

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/distributed/client.py in sync(self, func, *args, **kwargs)
    487             return future
    488         else:
--> 489             return sync(self.loop, func, *args, **kwargs)
    490
    491     def __str__(self):

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, *args, **kwargs)
    232         e.wait(1000000)
    233     if error[0]:
--> 234         six.reraise(*error[0])
    235     else:
    236         return result[0]

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    684         if value.__traceback__ is not tb:
    685             raise value.with_traceback(tb)
--> 686         raise value
    687
    688     else:

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/distributed/utils.py in f()
    221                 raise RuntimeError("sync() called from thread of running loop")
    222             yield gen.moment
--> 223             result[0] = yield make_coro()
    224         except Exception as exc:
    225             logger.exception(exc)

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/tornado/gen.py in run(self)
   1013
   1014                     try:
-> 1015                         value = future.result()
   1016                     except Exception:
   1017                         self.had_exception = True

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/tornado/concurrent.py in result(self, timeout)
    235             return self._result
    236         if self._exc_info is not None:
--> 237             raise_exc_info(self._exc_info)
    238         self._check_done()
    239         return self._result

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/tornado/util.py in raise_exc_info(exc_info)

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/tornado/gen.py in run(self)
   1019
   1020                     if exc_info is not None:
-> 1021                         yielded = self.gen.throw(*exc_info)
   1022                         exc_info = None
   1023                     else:

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/distributed/client.py in _gather(self, futures, errors, direct)
   1154                             six.reraise(type(exception),
   1155                                         exception,
-> 1156                                         traceback)
   1157                         if errors == 'skip':
   1158                             bad_keys.add(key)

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    683             value = tp()
    684         if value.__traceback__ is not tb:
--> 685             raise value.with_traceback(tb)
    686         raise value
    687

/nbhome/skc/miniconda3/envs/research/lib/python3.6/site-packages/distributed/protocol/pickle.py in loads(x)
     57 def loads(x):
     58     try:
---> 59         return pickle.loads(x)
     60     except Exception:
     61         logger.info("Failed to deserialize %s", x[:10000], exc_info=True)

EOFError: Ran out of input
```

If I load the data into memory first by invoking `compute()`, things work OK:

In [15]: da.chunk().compute().to_netcdf('dask-distributed-netcdf4.nc', engine='netcdf4')
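Packaged as a helper, the same workaround looks like the following sketch (the function name `write_via_memory` is hypothetical; it simply wraps the compute-then-write pattern from `In [15]`):

```python
import distributed
import xarray as xr

def write_via_memory(da, path):
    # Hypothetical helper: compute() gathers the dask-backed result into
    # client memory first, so to_netcdf() writes plain numpy data locally
    # instead of running the write through the distributed scheduler.
    da.compute().to_netcdf(path, engine='netcdf4')

client = distributed.Client()  # LocalCluster, as in the example above
da = xr.DataArray([1., 2., 3.]).chunk()
write_via_memory(da, 'dask-distributed-netcdf4.nc')
```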

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1464/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
133136274 MDU6SXNzdWUxMzMxMzYyNzQ= 759 Trouble applying argmin when using xr.open_mfdataset() spencerkclark 6628425 closed 0     1 2016-02-12T01:34:41Z 2016-02-12T16:13:18Z 2016-02-12T16:13:18Z MEMBER      

I recently tried to apply the `argmin` function to a dataset that I opened using `xr.open_mfdataset` and encountered an unexpected error. Applying `argmin` to the same dataset opened using `xr.open_dataset` works fine. Below is an example with some toy data. Could this be a bug, or is there something I'm doing wrong? I appreciate your help.

```python
In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: xr.DataArray(np.random.rand(2, 3, 4),
   ...:              coords=[np.arange(2), np.arange(3), np.arange(4)],
   ...:              dims=['x', 'y', 'z']).to_dataset(name='test').to_netcdf('test_mfdataset.nc')

In [4]: xr.open_dataset('test_mfdataset.nc').test.argmin('x').values
Out[4]:
array([[1, 1, 1, 1],
       [1, 0, 1, 0],
       [1, 1, 0, 1]])

In [5]: xr.open_mfdataset('test_mfdataset.nc').test.argmin('x').values
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-8-ccac9ca40874> in <module>()
----> 1 xr.open_mfdataset('test_mfdataset.nc').test.argmin('x').values

//anaconda/lib/python2.7/site-packages/xarray/core/dataarray.py in values(self)
    353     def values(self):
    354         """The array's data as a numpy.ndarray"""
--> 355         return self.variable.values
    356
    357     @values.setter

//anaconda/lib/python2.7/site-packages/xarray/core/variable.py in values(self)
    286     def values(self):
    287         """The variable's data as a numpy.ndarray"""
--> 288         return _as_array_or_item(self._data_cached())
    289
    290     @values.setter

//anaconda/lib/python2.7/site-packages/xarray/core/variable.py in _data_cached(self)
    252     def _data_cached(self):
    253         if not isinstance(self._data, (np.ndarray, PandasIndexAdapter)):
--> 254             self._data = np.asarray(self._data)
    255         return self._data
    256

//anaconda/lib/python2.7/site-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
    472
    473     """
--> 474     return array(a, dtype, copy=False, order=order)
    475
    476 def asanyarray(a, dtype=None, order=None):

//anaconda/lib/python2.7/site-packages/dask/array/core.py in __array__(self, dtype, **kwargs)
    852
    853     def __array__(self, dtype=None, **kwargs):
--> 854         x = self.compute()
    855         if dtype and x.dtype != dtype:
    856             x = x.astype(dtype)

//anaconda/lib/python2.7/site-packages/dask/base.py in compute(self, **kwargs)
     35
     36     def compute(self, **kwargs):
---> 37         return compute(self, **kwargs)[0]
     38
     39     @classmethod

//anaconda/lib/python2.7/site-packages/dask/base.py in compute(*args, **kwargs)
    108                     for opt, val in groups.items()])
    109     keys = [var._keys() for var in variables]
--> 110     results = get(dsk, keys, **kwargs)
    111
    112     results_iter = iter(results)

//anaconda/lib/python2.7/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, **kwargs)
     55     results = get_async(pool.apply_async, len(pool._pool), dsk, result,
     56                         cache=cache, queue=queue, get_id=_thread_get_id,
---> 57                         **kwargs)
     58
     59     return results

//anaconda/lib/python2.7/site-packages/dask/async.py in get_async(apply_async, num_workers, dsk, result, cache, queue, get_id, raise_on_exception, rerun_exceptions_locally, callbacks, **kwargs)
    479                     _execute_task(task, data)  # Re-execute locally
    480                 else:
--> 481                     raise(remote_exception(res, tb))
    482             state['cache'][key] = res
    483             finish_task(dsk, key, state, results, keyorder.get)

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,4) (4,1) (1,3)

Traceback
---------
  File "//anaconda/lib/python2.7/site-packages/dask/async.py", line 264, in execute_task
    result = _execute_task(task, data)
  File "//anaconda/lib/python2.7/site-packages/dask/async.py", line 246, in _execute_task
    return func(*args2)
  File "//anaconda/lib/python2.7/site-packages/toolz/functoolz.py", line 381, in __call__
    ret = f(ret)
  File "//anaconda/lib/python2.7/site-packages/dask/array/reductions.py", line 450, in arg_agg
    return _arg_combine(data, axis, argfunc)[0]
  File "//anaconda/lib/python2.7/site-packages/dask/array/reductions.py", line 416, in _arg_combine
    arg = (arg + offsets)[tuple(inds)]
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/759/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);