issues


8,178 rows sorted by updated_at descending


type (2 values)

  • issue 4,144
  • pull 4,034

state (2 values)

  • closed 7,034
  • open 1,144

repo (1 value)

  • xarray 8,178
Columns: id, node_id, number, title, user, state, locked, assignee, milestone, comments, created_at, updated_at ▲, closed_at, author_association, active_lock_reason, draft, pull_request, body, reactions, performed_via_github_app, state_reason, repo, type
2278499376 PR_kwDOAMm_X85uhFke 8997 Zarr: Optimize `region="auto"` detection dcherian 2448579 open 0     1 2024-05-03T22:13:18Z 2024-05-04T21:47:39Z   MEMBER   0 pydata/xarray/pulls/8997
  1. This moves the region detection code into ZarrStore so we only open the store once.
  2. Instead of opening the store as a dataset, construct a pd.Index directly to "auto"-infer the region.

The diff is large mostly because a bunch of code moved from backends/api.py to backends/zarr.py
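For reference, a minimal sketch of the user-facing `region="auto"` write that this detection code serves (store path and data are illustrative, not from this PR):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": ("x", np.arange(10))}, coords={"x": np.arange(10)})
ds.to_zarr("store.zarr", mode="w")

# overwrite only x=4..7; region="auto" infers the slice from the "x" coordinate
ds.isel(x=slice(4, 8)).to_zarr("store.zarr", mode="r+", region="auto")
```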

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8997/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2273401533 PR_kwDOAMm_X85uPt_n 8991 Add argument check_dims to assert_allclose to allow transposed inputs (#5733) ignamv 408363 open 0     1 2024-05-01T12:05:40Z 2024-05-04T21:42:26Z   FIRST_TIME_CONTRIBUTOR   0 pydata/xarray/pulls/8991
  • [x] Closes #5733
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8991/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2034500760 PR_kwDOAMm_X85hnplA 8536 Speed up localize Illviljan 14371165 open 0     2 2023-12-10T19:24:40Z 2024-05-04T20:20:01Z   MEMBER   1 pydata/xarray/pulls/8536
  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8536/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2279042264 PR_kwDOAMm_X85ui13E 8999 Port negative frequency fix for `pandas.date_range` to `cftime_range` spencerkclark 6628425 open 0     0 2024-05-04T14:48:08Z 2024-05-04T14:51:26Z   MEMBER   0 pydata/xarray/pulls/8999

Like pandas.date_range, cftime_range would previously return dates outside the range of the specified start and end dates if provided a negative frequency:

```
>>> start = cftime.DatetimeGregorian(2023, 10, 31)
>>> end = cftime.DatetimeGregorian(2021, 11, 1)
>>> xr.cftime_range(start, end, freq="-1YE")
CFTimeIndex([2023-12-31 00:00:00, 2022-12-31 00:00:00, 2021-12-31 00:00:00], dtype='object', length=3, calendar='standard', freq='-1YE-DEC')
```

This PR ports a bug fix from pandas (https://github.com/pandas-dev/pandas/issues/56147) to prevent this from happening. The above example now produces:

```
>>> start = cftime.DatetimeGregorian(2023, 10, 31)
>>> end = cftime.DatetimeGregorian(2021, 11, 1)
>>> xr.cftime_range(start, end, freq="-1YE")
CFTimeIndex([2022-12-31 00:00:00, 2021-12-31 00:00:00], dtype='object', length=2, calendar='standard', freq=None)
```

Since this is a bug fix, we do not make any attempt to preserve the old behavior if an earlier version of pandas is installed. In the testing context this means we skip some tests for pandas versions less than 3.0.

  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8999/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2270269009 I_kwDOAMm_X86HUY5R 8984 Nightly Hypothesis tests failed github-actions[bot] 41898282 open 0     0 2024-04-30T00:32:21Z 2024-05-04T00:28:46Z   CONTRIBUTOR      

Workflow Run URL

Python 3.12 Test Summary

```
properties/test_index_manipulation.py::DatasetTest::runTest:
DeprecationWarning: Deleting a single level of a MultiIndex is deprecated. Previously, this deleted all levels of a MultiIndex. Please also drop the following variables: {'3'} to avoid an error in the future.
Falsifying example:
state = DatasetStateMachine()
state.init_ds(var=Variable(
    data=array(['1969-12-31T23:59:59.999978227', '1970-01-01T00:00:00.000000006'], dtype='datetime64[ns]'),
    dims=['É'], attrs={},
))
state.assert_invariants()
Draw 1: ['É']
> drop_dims: ['É']
state.drop_dims(data=data(...))
state.assert_invariants()
adding dimension coordinate 0
state.add_dim_coord(var=Variable(data=array([4, 0], dtype='timedelta64[ns]'), dims=['0'], attrs={}))
state.assert_invariants()
adding dimension coordinate 6
state.add_dim_coord(var=Variable(
    data=array([ 65536, 281474976776448], dtype=uint64),
    dims=['6'], attrs={},
))
state.assert_invariants()
Draw 2: '0'
> renaming 0 to ž
state.rename_vars(data=data(...), newname='ž')
state.assert_invariants()
Draw 3: '6'
> resetting 6
state.reset_index(data=data(...))
state.assert_invariants()
Draw 4: ['0', '6']
> drop_dims: ['0', '6']
state.drop_dims(data=data(...))
state.assert_invariants()
adding dimension coordinate 1
state.add_dim_coord(var=Variable(
    data=array([ 72113669181473025, 1024, 18446744073709551614], dtype=uint64),
    dims=['1'], attrs={},
))
state.assert_invariants()
Draw 5: ['1']
> drop_dims: ['1']
state.drop_dims(data=data(...))
state.assert_invariants()
adding dimension coordinate Ż
state.add_dim_coord(var=Variable(
    data=array([0]),
    dims=['Ż'],
    attrs={'': array([65539, 65539], dtype=uint64), '2': array(['331191', '331191'], dtype='>M8[Y]'), 'Ż': ''},
))
state.assert_invariants()
Draw 6: ['Ż']
> stacking ['Ż'] as 3
state.stack(create_index=True, data=data(...), newname='3')
state.assert_invariants()
Draw 7: '3'
> renaming 3 to 2
state.rename_vars(data=data(...), newname='2')
state.assert_invariants()
Draw 8: ['3']
> drop_dims: ['3']
state.drop_dims(data=data(...))
state.teardown()
Explanation:
    These lines were always and only run by failing examples:
        /home/runner/work/xarray/xarray/properties/test_index_manipulation.py:174
        /home/runner/work/xarray/xarray/xarray/core/dataset.py:5928
        /home/runner/work/xarray/xarray/xarray/core/dataset.py:5929
        /home/runner/work/xarray/xarray/xarray/core/dataset.py:5932
        /home/runner/work/xarray/xarray/xarray/core/indexes.py:1326
        /home/runner/work/xarray/xarray/xarray/core/indexes.py:1327
You can reproduce this example by temporarily adding
@reproduce_failure('6.100.2', b'AXicVY7dEcIwDIMlO0l5YRqOztEXFugi3RyQnZIG3+XH+hJLgHMler3uO37ViA0B3KJ1BwoHRYGgX/2FSP5pTE1HO6+LVv5sJQYVHQI0Jrd4YiE9ZEo+h+GBT683z9E2jKZkac+I7ZxSZSetEpMM1FYzcc66BctsCtklbV/auAr5')
as a decorator on your test case
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8984/reactions",
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 1
}
    xarray 13221727 issue
2189649243 I_kwDOAMm_X86Cg2Vb 8844 ⚠️ Nightly upstream-dev CI failed ⚠️ github-actions[bot] 41898282 open 0     15 2024-03-16T00:20:55Z 2024-05-04T00:20:57Z   CONTRIBUTOR      

Workflow Run URL

Python 3.12 Test Summary

```
xarray/tests/test_array_api.py::test_arithmetic: AttributeError: '_DType' object has no attribute 'type'
xarray/tests/test_array_api.py::test_aggregation: AttributeError: '_DType' object has no attribute 'kind'
xarray/tests/test_array_api.py::test_aggregation_skipna: AttributeError: '_DType' object has no attribute 'type'
xarray/tests/test_array_api.py::test_astype: AttributeError: type object 'numpy.int64' has no attribute '_np_dtype'
xarray/tests/test_array_api.py::test_broadcast: AttributeError: '_DType' object has no attribute 'type'
xarray/tests/test_array_api.py::test_broadcast_during_arithmetic: AttributeError: '_DType' object has no attribute 'type'
xarray/tests/test_array_api.py::test_concat: TypeError: Cannot interpret 'Array([[ 1., 2., 3.], [ 4., 5., nan]], dtype=array_api_strict.float64)' as a data type
xarray/tests/test_array_api.py::test_indexing: AttributeError: '_DType' object has no attribute 'type'
xarray/tests/test_array_api.py::test_properties: AttributeError: 'DataArray' object has no attribute 'nbytes'
xarray/tests/test_array_api.py::test_reorganizing_operation: AttributeError: '_DType' object has no attribute 'type'
xarray/tests/test_array_api.py::test_stack: AttributeError: '_DType' object has no attribute 'type'
xarray/tests/test_array_api.py::test_unstack: AttributeError: '_DType' object has no attribute 'type'
xarray/tests/test_array_api.py::test_where: TypeError: Cannot interpret 'Array(1, dtype=array_api_strict.int64)' as a data type
xarray/tests/test_cftime_offsets.py::test_date_range_like[2020-02-01--1YE-FEB-noleap-gregorian-True-2020-02-29-True]: AssertionError: assert 11 == 12 + where 11 = len(DatetimeIndex(['2019-02-28', '2018-02-28', '2017-02-28', '2016-02-29',\n '2015-02-28', '2014-02-28', '201...2-29',\n '2011-02-28', '2010-02-28', '2009-02-28'],\n dtype='datetime64[ns]', freq='-1YE-FEB'))
xarray/tests/test_cftime_offsets.py::test_cftime_range_same_as_pandas[-1ME-2000-2000]: AssertionError: Arrays are not equal (shapes (1,), (0,) mismatch) ACTUAL: array(['2000-01-31T00:00:00.000000000'], dtype='datetime64[ns]') DESIRED: array([], dtype='datetime64[ns]')
xarray/tests/test_cftime_offsets.py::test_cftime_range_same_as_pandas[-1ME-2000-2001]: AssertionError: Arrays are not equal (shapes (13,), (12,) mismatch) ACTUAL: array(['2001-01-31T00:00:00.000000000', '2000-12-31T00:00:00.000000000', '2000-11-30T00:00:00.000000000', '2000-10-31T00:00:00.000000000', '2000-09-30T00:00:00.000000000', '2000-08-31T00:00:00.000000000',... DESIRED: array(['2000-12-31T00:00:00.000000000', '2000-11-30T00:00:00.000000000', '2000-10-31T00:00:00.000000000', '2000-09-30T00:00:00.000000000', '2000-08-31T00:00:00.000000000', '2000-07-31T00:00:00.000000000',...
xarray/tests/test_cftime_offsets.py::test_cftime_range_same_as_pandas[-1ME-2001-2001]: AssertionError: Arrays are not equal (shapes (1,), (0,) mismatch) ACTUAL: array(['2001-01-31T00:00:00.000000000'], dtype='datetime64[ns]') DESIRED: array([], dtype='datetime64[ns]')
xarray/tests/test_cftime_offsets.py::test_cftime_range_same_as_pandas[-1YE-2000-2000]: AssertionError: Arrays are not equal (shapes (1,), (0,) mismatch) ACTUAL: array(['2000-12-31T00:00:00.000000000'], dtype='datetime64[ns]') DESIRED: array([], dtype='datetime64[ns]')
xarray/tests/test_cftime_offsets.py::test_cftime_range_same_as_pandas[-1YE-2000-2001]: AssertionError: Arrays are not equal (shapes (2,), (1,) mismatch) ACTUAL: array(['2001-12-31T00:00:00.000000000', '2000-12-31T00:00:00.000000000'], dtype='datetime64[ns]') DESIRED: array(['2000-12-31T00:00:00.000000000'], dtype='datetime64[ns]')
xarray/tests/test_cftime_offsets.py::test_cftime_range_same_as_pandas[-1YE-2001-2001]: AssertionError: Arrays are not equal (shapes (1,), (0,) mismatch) ACTUAL: array(['2001-12-31T00:00:00.000000000'], dtype='datetime64[ns]') DESIRED: array([], dtype='datetime64[ns]')
xarray/tests/test_dtypes.py::test_maybe_promote[a-expected0]: SystemError: <class 'numpy.dtype'> returned a result with an exception set
xarray/tests/test_duck_array_ops.py::TestOps::test_where_type_promotion: AssertionError: assert dtype('float64') == <class 'numpy.float32'> + where dtype('float64') = array([ 1., nan]).dtype + and <class 'numpy.float32'> = np.float32
xarray/tests/test_duck_array_ops.py::TestDaskOps::test_where_type_promotion: AssertionError: assert dtype('float64') == <class 'numpy.float32'> + where dtype('float64') = array([ 1., nan]).dtype + and <class 'numpy.float32'> = np.float32
xarray/tests/test_namedarray.py::TestNamedArray::test_real_and_imag: ModuleNotFoundError: No module named 'numpy.array_api'
xarray/tests/test_namedarray.py::TestNamedArray::test_duck_array_class: ModuleNotFoundError: No module named 'numpy.array_api'
xarray/tests/test_namedarray.py::TestNamedArray::test_expand_dims[None-3-expected_shape0-expected_dims0]: ModuleNotFoundError: No module named 'numpy.array_api'
xarray/tests/test_namedarray.py::TestNamedArray::test_expand_dims[Default.token-3-expected_shape1-expected_dims1]: ModuleNotFoundError: No module named 'numpy.array_api'
xarray/tests/test_namedarray.py::TestNamedArray::test_expand_dims[z-3-expected_shape2-expected_dims2]: ModuleNotFoundError: No module named 'numpy.array_api'
xarray/tests/test_namedarray.py::TestNamedArray::test_permute_dims[dims0-expected_sizes0]: ModuleNotFoundError: No module named 'numpy.array_api'
xarray/tests/test_namedarray.py::TestNamedArray::test_permute_dims[dims1-expected_sizes1]: ModuleNotFoundError: No module named 'numpy.array_api'
xarray/tests/test_namedarray.py::TestNamedArray::test_permute_dims[dims2-expected_sizes2]: ModuleNotFoundError: No module named 'numpy.array_api'
xarray/tests/test_namedarray.py::TestNamedArray::test_permute_dims_errors: ModuleNotFoundError: No module named 'numpy.array_api'
xarray/tests/test_namedarray.py::TestNamedArray::test_broadcast_to[broadcast_dims1-3]: ModuleNotFoundError: No module named 'numpy.array_api'
xarray/tests/test_namedarray.py::TestNamedArray::test_broadcast_to[broadcast_dims2-3]: ModuleNotFoundError: No module named 'numpy.array_api'
xarray/tests/test_rolling.py::TestDataArrayRolling::test_rolling_dask_dtype[float32]: AssertionError: assert dtype('float64') == dtype('float32') + where dtype('float64') = <xarray.DataArray (x: 3)> Size: 24B\ndask.array<truediv, shape=(3,), dtype=float64, chunksize=(3,), chunktype=numpy.ndarray>\nCoordinates:\n * x (x) int64 24B 1 2 3.dtype + and dtype('float32') = <xarray.DataArray (x: 3)> Size: 12B\narray([1. , 1.5, 2. ], dtype=float32)\nCoordinates:\n * x (x) int64 24B 1 2 3.dtype
xarray/tests/test_strategies.py::TestVariablesStrategy::test_make_strategies_namespace: ImportError: cannot import name 'array_api' from 'numpy' (/home/runner/micromamba/envs/xarray-tests/lib/python3.12/site-packages/numpy/__init__.py). Did you mean: 'array_repr'? Falsifying example: test_make_strategies_namespace( self=<xarray.tests.test_strategies.TestVariablesStrategy object at 0x7f47dc1f39b0>, data=data(...), ) You can reproduce this example by temporarily adding @reproduce_failure('6.100.2', b'AA==') as a decorator on your test case
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8844/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
    xarray 13221727 issue
2194953062 PR_kwDOAMm_X85qFqp1 8854 array api-related upstream-dev failures keewis 14808389 open 0     15 2024-03-19T13:17:09Z 2024-05-03T22:46:41Z   MEMBER   0 pydata/xarray/pulls/8854
  • [x] towards #8844

This "fixes" the upstream-dev failures related to the removal of numpy.array_api. There are a couple of open questions, though: - array-api-strict is not installed by default, so namedarray would get a new dependency. Not sure how to deal with that – as far as I can tell, numpy.array_api was not supposed to be used that way, so maybe we need to use array-api-compat instead? What do you think, @andersy005, @Illviljan? - array-api-strict does not define Array.nbytes (causing a funny exception that wrongly claims DataArray does not define nbytes) - array-api-strict has a different DType class, which makes it tricky to work with both numpy dtypes and said dtype class in the same code. In particular, if I understand correctly we're supposed to check dtypes using isdtype, but numpy.isdtype will only exist in numpy>=2, array-api-strict's version does not define datetime / string / object dtypes, and numpy.issubdtype does not work with the non-numpy dtype class). So maybe we need to use array-api-compat internally?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8854/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2278510478 PR_kwDOAMm_X85uhIGP 8998 Zarr: Optimize appending dcherian 2448579 open 0     0 2024-05-03T22:21:44Z 2024-05-03T22:23:34Z   MEMBER   1 pydata/xarray/pulls/8998

Builds on #8997

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8998/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2276408691 I_kwDOAMm_X86Hrz1z 8995 Why does xr.apply_ufunc support numpy/dask.arrays? TomNicholas 35968931 open 0     0 2024-05-02T20:18:41Z 2024-05-03T22:03:43Z   MEMBER      

What is your issue?

@keewis pointed out that it's weird that xarray.apply_ufunc supports passing numpy/dask arrays directly, and I'm inclined to agree. I don't understand why we do, and think we should consider removing that feature.

Two arguments in favour of removing it:

1) It exposes users to transposition errors

Consider this example:

```python
In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: arr = np.arange(12).reshape(3, 4)

In [4]: def mean(obj, dim):
   ...:     # note: apply always moves core dimensions to the end
   ...:     return xr.apply_ufunc(
   ...:         np.mean, obj, input_core_dims=[[dim]], kwargs={"axis": -1}
   ...:     )
   ...:

In [5]: mean(arr, dim='time')
Out[5]: array([1.5, 5.5, 9.5])

In [6]: mean(arr.T, dim='time')
Out[6]: array([4., 5., 6., 7.])
```

Transposing the input leads to a different result, with the value of the dim kwarg effectively ignored. This kind of error is what xarray code is supposed to prevent by design.

2) There is an alternative input pattern that doesn't require accepting bare arrays

Instead, any numpy/dask array can just be wrapped up into an xarray Variable/NamedArray before passing it to apply_ufunc.

```python
In [7]: from xarray.core.variable import Variable

In [8]: var = Variable(data=arr, dims=['time', 'space'])

In [9]: mean(var, dim='time')
Out[9]:
<xarray.Variable (space: 4)> Size: 32B
array([4., 5., 6., 7.])

In [10]: mean(var.T, dim='time')
Out[10]:
<xarray.Variable (space: 4)> Size: 32B
array([4., 5., 6., 7.])
```

This now guards against the transposition error, and puts the onus on the user to be clear about which axes of their array correspond to which dimension.

With Variable/NamedArray as public API, this latter pattern can handle every case that passing bare arrays in could.

I suggest we deprecate accepting bare arrays in favour of having users wrap them in Variable/NamedArray/DataArray objects instead.

(Note 1: We also accept raw scalars, but this doesn't expose anyone to transposition errors.)

(Note 2: In a quick scan of the apply_ufunc docstring, the docs on it in computation.rst, and the extensive guide that @dcherian wrote in the xarray tutorial repository, I can't see any examples that actually pass bare arrays to apply_ufunc.)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8995/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
503583044 MDU6SXNzdWU1MDM1ODMwNDQ= 3379 `ds.to_zarr(mode="a", append_dim="time")` not capturing any time steps under Hours jminsk-cc 48155582 closed 0     3 2019-10-07T17:17:06Z 2024-05-03T18:34:50Z 2024-05-03T18:34:50Z NONE      

MCVE Code Sample

```python
import datetime

import xarray as xr

date = datetime.datetime(2019, 1, 1, 1, 10)

# Reading in 2 min time stepped MRMS data
ds = xr.open_rasterio(dir_path)
ds.name = "mrms"
ds["time"] = date
ds = ds.expand_dims("time")
ds = ds.to_dataset()

ds.to_zarr("fin_zarr", compute=False, mode="w-")

date = datetime.datetime(2019, 1, 1, 1, 12)

# Reading in 2 min time stepped MRMS data
# This can be the same file since we are adding time manually
ds = xr.open_rasterio(dir_path)
ds.name = "mrms"
ds["time"] = date
ds = ds.expand_dims("time")
ds = ds.to_dataset()

ds.to_zarr("fin_zarr", compute=False, mode="a", append_dim="time")
```

Expected Output

```
<xarray.Dataset>
Dimensions:  (band: 1, time: 1, x: 7000, y: 3500)
Coordinates:
  * band     (band) int64 1
  * y        (y) float64 55.0 54.99 54.98 54.97 ... 20.04 20.03 20.02 20.01
  * x        (x) float64 -130.0 -130.0 -130.0 -130.0 ... -60.03 -60.02 -60.01
  * time     (time) datetime64[ns] 2019-01-01T01:10:00
Data variables:
    mrms     (time, band, y, x) uint8 255 255 255 255 255 ... 255 255 255 255
```

appended by this in a ds.to_zarr()

```
<xarray.Dataset>
Dimensions:  (band: 1, time: 1, x: 7000, y: 3500)
Coordinates:
  * band     (band) int64 1
  * y        (y) float64 55.0 54.99 54.98 54.97 ... 20.04 20.03 20.02 20.01
  * x        (x) float64 -130.0 -130.0 -130.0 -130.0 ... -60.03 -60.02 -60.01
  * time     (time) datetime64[ns] 2019-01-01T01:12:00
Data variables:
    mrms     (time, band, y, x) uint8 255 255 255 255 255 ... 255 255 255 255
```

should look like below

```
<xarray.Dataset>
Dimensions:  (band: 1, time: 2, x: 7000, y: 3500)
Coordinates:
  * band     (band) int64 1
  * time     (time) datetime64[ns] 2019-01-01T01:10:00 2019-01-01T01:12:00
  * x        (x) float64 -130.0 -130.0 -130.0 -130.0 ... -60.03 -60.02 -60.01
  * y        (y) float64 55.0 54.99 54.98 54.97 ... 20.04 20.03 20.02 20.01
Data variables:
    mrms     (time, band, y, x) uint8 dask.array<shape=(2, 1, 3500, 7000), chunksize=(1, 1, 438, 1750)>
```

Problem Description

The output looks like this:

```
<xarray.Dataset>
Dimensions:  (band: 1, time: 2, x: 7000, y: 3500)
Coordinates:
  * band     (band) int64 1
  * time     (time) datetime64[ns] 2019-01-01T01:10:00 2019-01-01T01:10:00
  * x        (x) float64 -130.0 -130.0 -130.0 -130.0 ... -60.03 -60.02 -60.01
  * y        (y) float64 55.0 54.99 54.98 54.97 ... 20.04 20.03 20.02 20.01
Data variables:
    mrms     (time, band, y, x) uint8 dask.array<shape=(2, 1, 3500, 7000), chunksize=(1, 1, 438, 1750)>
```

Where the minutes are repeated for the whole hour until a new hour is appended. It seems to not be handling minutes correctly.

Output of xr.show_versions()

# Paste the output here xr.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 (default, Mar 27 2019, 16:54:48) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 18.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.1 xarray: 0.12.3 pandas: 0.24.2 numpy: 1.16.4 scipy: 1.3.0 netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: 2.9.0 Nio: None zarr: 2.3.2 cftime: 1.0.3.4 nc_time_axis: None PseudoNetCDF: None rasterio: 1.0.21 cfgrib: None iris: None bottleneck: 1.2.1 dask: 2.1.0 distributed: 2.1.0 matplotlib: 3.1.0 cartopy: 0.17.0 seaborn: 0.9.0 numbagg: None setuptools: 41.0.1 pip: 19.1.1 conda: 4.7.12 pytest: 5.0.1 IPython: 7.6.1 sphinx: 2.1.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3379/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2269295936 PR_kwDOAMm_X85uBwtv 8983 fixes for the `pint` tests keewis 14808389 open 0     0 2024-04-29T15:09:28Z 2024-05-03T18:30:06Z   MEMBER   0 pydata/xarray/pulls/8983

This removes the use of the deprecated numpy.core._exceptions.UFuncError (and multiplication as a way to attach units), and makes sure we run the pint tests in the upstream-dev CI again.
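For context, a minimal sketch of the two ways of attaching units mentioned above (assumes pint is installed; names are illustrative):

```python
import numpy as np
import pint

ureg = pint.UnitRegistry()
data = np.arange(3.0)

q_implicit = data * ureg.m             # attach units by multiplication (the pattern being removed)
q_explicit = ureg.Quantity(data, "m")  # construct the quantity explicitly instead

assert (q_implicit == q_explicit).all()
```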

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8983/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1050082137 I_kwDOAMm_X84-lvtZ 5969 `to_zarr(append_dim="time")` appends incorrect datetimes JackKelly 460756 closed 0     3 2021-11-10T17:00:53Z 2024-05-03T17:09:31Z 2024-05-03T17:09:30Z NONE      

Description

If you create a Zarr with a single timestep and then append to the time dimension of that Zarr in subsequent writes then the appended timestamps are likely to be wrong. This only seems to happen if the time dimension is datetime64.

Minimal Complete Verifiable Example

Create a really simple Dataset:

```python
times = pd.date_range("2000-01-01 00:35", periods=8, freq="6H")
da = xr.DataArray(coords=[times], dims=["time"])
ds = da.to_dataset(name="foo")
```

Write just the first timestep to a new Zarr store:

```python
ZARR_PATH = "test.zarr"
ds.isel(time=[0]).to_zarr(ZARR_PATH, mode="w")
```

So far, so good!

Now things get weird... let's append the remainder of ds to the Zarr store:

```python
ds.isel(time=slice(1, None)).to_zarr(ZARR_PATH, append_dim="time")
```

This throws a warning, which is probably relevant:

```
/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/dataset.py:2037: SerializationWarning: saving variable None with floating point data as an integer dtype without any _FillValue to use for NaNs
  return to_zarr(
```

What happened

Let's load the Zarr and print the contents on the time coord:

```python
ds_loaded = xr.open_dataset(ZARR_PATH, engine="zarr")
print(ds_loaded.time)
```

```
<xarray.DataArray 'time' (time: 8)>
array(['2000-01-01T00:35', '2000-01-01T00:35', '2000-01-01T00:35',
       '2000-01-02T00:35', '2000-01-02T00:35', '2000-01-02T00:35',
       '2000-01-03T00:35', '2000-01-03T00:35'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01T00:35:00 ... 2000-01-03T00:35:00
```

(I've removed the seconds and milliseconds to make it a bit easier to read)

The first and fifth time coords (2000-01-01T00:35 and 2000-01-02T00:35) are correct. None of the others are correct!

The encoding is not appropriate (see #3942)... notice that the units is days since..., which clearly can't represent sub-day resolution:

```python
print(ds_loaded.time.encoding)
```

```
{'chunks': (1,), 'preferred_chunks': {'time': 1}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, 'units': 'days since 2000-01-01 00:35:00', 'calendar': 'proleptic_gregorian', 'dtype': dtype('int64')}
```

What you expected to happen

The correct time coords are:

```python
print(ds.time)
```

```
<xarray.DataArray 'time' (time: 8)>
array(['2000-01-01T00:35', '2000-01-01T06:35', '2000-01-01T12:35',
       '2000-01-01T18:35', '2000-01-02T00:35', '2000-01-02T06:35',
       '2000-01-02T12:35', '2000-01-02T18:35'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01T00:35:00 ... 2000-01-02T18:35:00
```

Anything else we need to know?

There are three workarounds that I'm aware of:

  1. When first creating the Zarr, write two or more timesteps into the Zarr. Then you can append any number of timesteps to the Zarr and everything works fine.
  2. Convert the time coords to Unix epoch, represented as ints.
  3. Manually set the encoding before the first write (as suggested in https://github.com/pydata/xarray/issues/3942#issuecomment-610444090). For example:

```python
ds.isel(time=[0]).to_zarr(
    ZARR_PATH,
    mode="w",
    encoding={'time': {'units': 'seconds since 1970-01-01'}},
)
```

Related issues

It's possible that the root cause of this issue is #3942.

And I think #3379 is another symptom of this issue.

Environment

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 5.13.0-21-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: ('en_GB', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 0.20.1 pandas: 1.3.4 numpy: 1.21.4 scipy: 1.7.2 netCDF4: 1.5.8 pydap: None h5netcdf: 0.11.0 h5py: 3.4.0 Nio: None zarr: 2.10.1 cftime: 1.5.1.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.2.8 cfgrib: 0.9.9.1 iris: None bottleneck: 1.3.2 dask: 2021.10.0 distributed: None matplotlib: 3.4.3 cartopy: None seaborn: None numbagg: None fsspec: 2021.11.0 cupy: None pint: None sparse: None setuptools: 58.5.3 pip: 21.3.1 conda: None pytest: 6.2.5 IPython: 7.29.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5969/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2196272235 PR_kwDOAMm_X85qKODl 8856 Migrate indexing and broadcasting logic to `xarray.namedarray` (Part 1) andersy005 13301940 open 0     0 2024-03-19T23:51:46Z 2024-05-03T17:08:11Z   MEMBER   1 pydata/xarray/pulls/8856

This pull request is the first part of migrating the indexing and broadcasting logic from xarray.core.variable to xarray.namedarray. I intend to open follow-up pull requests to address additional changes related to this refactoring, as outlined in the proposal for decoupling lazy indexing functionality from NamedArray.

  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8856/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1933712083 I_kwDOAMm_X85zQhrT 8289 segfault with a particular netcdf4 file hmaarrfk 90008 open 0     11 2023-10-09T20:07:17Z 2024-05-03T16:54:18Z   CONTRIBUTOR      

What happened?

The following code yields a segfault on my machine (and many other machines with a similar environment)

```python
import xarray

filename = 'tiny.nc.txt'
engine = "netcdf4"

dataset = xarray.open_dataset(filename, engine=engine)

i = 0
for i in range(60):
    xarray.open_dataset(filename, engine=engine)
```

tiny.nc.txt mrc.nc.txt

What did you expect to happen?

Not to segfault.

Minimal Complete Verifiable Example

  1. Generate some netcdf4 with my application.
  2. Trim the netcdf4 file down (load it, and drop all the vars I can while still reproducing this bug)
  3. Try to read it.

```python
import xarray
from tqdm import tqdm

filename = 'mrc.nc.txt'
engine = "h5netcdf"
dataset = xarray.open_dataset(filename, engine=engine)

for i in tqdm(range(60), desc=f"filename={filename}, enine={engine}"):
    xarray.open_dataset(filename, engine=engine)

engine = "netcdf4"

dataset = xarray.open_dataset(filename, engine=engine)
for i in tqdm(range(60), desc=f"filename={filename}, enine={engine}"):
    xarray.open_dataset(filename, engine=engine)

filename = 'tiny.nc.txt'

engine = "h5netcdf"
dataset = xarray.open_dataset(filename, engine=engine)
for i in tqdm(range(60), desc=f"filename={filename}, enine={engine}"):
    xarray.open_dataset(filename, engine=engine)

engine = "netcdf4"

dataset = xarray.open_dataset(filename, engine=engine)
for i in tqdm(range(60), desc=f"filename={filename}, enine={engine}"):
    xarray.open_dataset(filename, engine=engine)
```

hand crafting the file from start to finish seems to not segfault:

```python
import xarray
import numpy as np

engine = 'netcdf4'

dataset = xarray.Dataset()

coords = {}
coords['image_x'] = np.arange(1, dtype='int')
dataset = dataset.assign_coords(coords)

dataset['image'] = xarray.DataArray(
    np.zeros((1,), dtype='uint8'),
    dims=('image_x',)
)

# %%

dataset.to_netcdf('mrc.nc.txt')

# %%

dataset = xarray.open_dataset('mrc.nc.txt', engine=engine)

for i in range(10):
    xarray.open_dataset('mrc.nc.txt', engine=engine)
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```
i=0 passes
i=1 mostly segfaults, but sometimes it can take more than 1 iteration
```

Anything else we need to know?

At first I thought it was deep in hdf5, but I am less convinced now

xref: https://github.com/HDFGroup/hdf5/issues/3649

Environment

``` INSTALLED VERSIONS ------------------ commit: None python: 3.10.12 | packaged by Ramona Optics | (main, Jun 27 2023, 02:59:09) [GCC 12.3.0] python-bits: 64 OS: Linux OS-release: 6.5.1-060501-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.14.2 libnetcdf: 4.9.2 xarray: 2023.9.1.dev25+g46643bb1.d20231009 pandas: 2.1.1 numpy: 1.24.4 scipy: 1.11.3 netCDF4: 1.6.4 pydap: None h5netcdf: 1.2.0 h5py: 3.9.0 Nio: None zarr: 2.16.1 cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: None dask: 2023.3.0 distributed: 2023.3.0 matplotlib: 3.8.0 cartopy: None seaborn: None numbagg: None fsspec: 2023.9.2 cupy: None pint: 0.22 sparse: None flox: None numpy_groupies: None setuptools: 68.2.2 pip: 23.2.1 conda: 23.7.4 pytest: 7.4.2 mypy: None IPython: 8.16.1 sphinx: 7.2.6 ```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8289/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2275404926 PR_kwDOAMm_X85uWjVP 8993 call `np.cross` with 3D vectors only keewis 14808389 closed 0     1 2024-05-02T12:21:30Z 2024-05-03T15:56:49Z 2024-05-03T15:22:26Z MEMBER   0 pydata/xarray/pulls/8993
  • [x] towards #8844

In the tests, we've been calling np.cross with vectors of 2 or 3 components; numpy>=2 deprecates 2-component vectors (plus, we're now raising on warnings). Thus, we 0-pad the inputs before generating the expected result (which generally should not change the outcome of the tests).

For a later PR: add tests to check if xr.cross works if more than a single dimension is present, and pre-compute the expected result. Also, for property-based testing: the cross-product of two vectors is perpendicular to both input vectors (use the dot product to check that), and its length (l2-norm) is the product of the lengths of the input vectors.
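A minimal sketch of the 0-padding idea described above (the pad_to_3d helper is illustrative, not the actual test code):

```python
import numpy as np

def pad_to_3d(v):
    """Zero-pad a 2-component vector to 3 components."""
    v = np.asarray(v, dtype=float)
    if v.shape[-1] == 2:
        v = np.concatenate([v, np.zeros(v.shape[:-1] + (1,))], axis=-1)
    return v

a, b = pad_to_3d([1.0, 2.0]), pad_to_3d([3.0, 4.0])
print(np.cross(a, b))  # [ 0.  0. -2.]; the z-component equals the 2D cross product
```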

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8993/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2203689075 PR_kwDOAMm_X85qjXJq 8870 Enable explicit use of key tuples (instead of *Indexer objects) in indexing adapters and explicitly indexed arrays andersy005 13301940 closed 0     1 2024-03-23T04:34:18Z 2024-05-03T15:27:38Z 2024-05-03T15:27:22Z MEMBER   0 pydata/xarray/pulls/8870
  • [ ] Towards #8856
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8870/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2276352251 I_kwDOAMm_X86HrmD7 8994 Improving performance of open_datatree TomNicholas 35968931 open 0     4 2024-05-02T19:43:17Z 2024-05-03T15:25:33Z   MEMBER      

What is your issue?

The implementation of open_datatree works, but is inefficient, because it calls open_dataset once for every group in the file. We should refactor this to improve the performance, which would fix issues like https://github.com/xarray-contrib/datatree/issues/330.

We discussed this in the datatree meeting, and my understanding is that concretely we need to:

  • [ ] Create an asv benchmark for open_datatree, probably involving first writing then benchmarking the opening of a special netCDF file that has no data but lots of groups.
  • [ ] Refactor the NetCDFDatastore class to only create one CachingFileManager object per file, not one per group, see https://github.com/pydata/xarray/blob/748bb3a328a65416022ec44ced8d461f143081b5/xarray/backends/netCDF4_.py#L406.
  • [ ] Refactor NetCDF4BackendEntrypoint.open_datatree to use an implementation that goes through NetCDFDatastore without calling the top-level xr.open_dataset again.
  • [ ] Check the performance of calling xr.open_datatree on a netCDF file has actually improved.

It would be great to get this done soon as part of the datatree integration project. @kmuehlbauer I know you were interested - are you willing / do you have time to take this task on?
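Relating to the first checklist item above, a minimal sketch of how such a "no data, lots of groups" netCDF file could be generated for the asv benchmark (uses netCDF4 directly; file name and group count are arbitrary):

```python
import netCDF4

# build a file containing many empty groups and no variables
with netCDF4.Dataset("many_groups.nc", mode="w") as nc:
    for i in range(100):
        nc.createGroup(f"group_{i:03d}")

# the benchmark would then time opening this file with open_datatree
```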

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8994/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2276732187 PR_kwDOAMm_X85ubH0P 8996 Mark `test_use_cftime_false_standard_calendar_in_range` as an expected failure spencerkclark 6628425 closed 0     0 2024-05-03T01:05:21Z 2024-05-03T15:21:48Z 2024-05-03T15:21:48Z MEMBER   0 pydata/xarray/pulls/8996

Per https://github.com/pydata/xarray/issues/8844#issuecomment-2089427222, for the time being this marks test_use_cftime_false_standard_calendar_in_range as an expected failure under NumPy 2. Hopefully we'll be able to fix the upstream issue in pandas eventually.
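A minimal sketch of an expected-failure marker of the kind described (the condition and reason strings are illustrative, not copied from the PR):

```python
import numpy as np
import pytest

@pytest.mark.xfail(
    np.__version__.startswith("2"),
    reason="upstream pandas issue under NumPy 2; see #8844",
)
def test_use_cftime_false_standard_calendar_in_range():
    ...
```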

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8996/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2216068694 I_kwDOAMm_X86EFoZW 8895 droping variables when accessing remote datasets via pydap Mikejmnez 8241481 open 0     1 2024-03-29T22:55:45Z 2024-05-03T15:15:09Z   CONTRIBUTOR      

Is your feature request related to a problem?

I ran into the following issue when trying to access a remote dataset. Here is the concrete example that reproduces the error.

```python
from pydap.client import open_url
from pydap.cas.urs import setup_session
import xarray as xr
import numpy as np

username = "UsernameHere"
password = "PasswordHere"
filename = 'Daymet_Daily_V4R1.daymet_v4_daily_na_tmax_2010.nc'
hyrax_url = 'https://opendap.earthdata.nasa.gov/collections/C2532426483-ORNL_CLOUD/granules/'
url1 = hyrax_url + filename
session = setup_session(username, password, check_url=hyrax_url)

ds = xr.open_dataset(url1, engine="pydap", session=session)
```

The last line returns an error:

```python
ValueError: dimensions ('time',) must have the same length as the number of data dimensions, ndim=2
```

The issue involves the variable time_bnds. I know that because this works:

```python
DS = []
for var in [var for var in tmax_ds.keys() if var not in ['time_bnds']]:
    DS.append(xr.open_dataset(url1 + '?' + var, engine='pydap', session=session))
ds = xr.merge(DS)
```

I also tried passing decode_times=False but continue having the error. The above for loop works but is, I think, unnecessarily slow (~30 secs).

I tried all this with the newer versions of xarray.__version__ = [2024.2, 2024.3].

Describe the solution you'd like

I think it would be nice to be able to drop the variable I know I don't want. So something like this:

```python
ds = xr.open_dataset(url1, drop_variables='time_bnds', engine="pydap", session=session)
```

and only create an xarray.Dataset with the variables I want. However, when I do that **I continue to have the same error as before**, which means that drop_variables is being applied after creating the xarray.Dataset.

Describe alternatives you've considered

This is potentially a backend issue with pydap, which does not take a drop_variables option, but since dropping a variable is a one-liner in pydap and takes less than a millisecond, it makes it a desirable feature.

For example I can easily open the dataset and drop the variable with pydap as described below

```python
$ dataset = open_url(url1, session=session)  # this works
$ dataset[tuple([var for var in dataset.keys() if var not in ['time_bnds']])]  # this takes < 1ms

<DatasetType with children 'y', 'lon', 'lat', 'time', 'x', 'tmax', 'lambert_conformal_conic', 'yearday'>
```

It looks like it would be a easy implementation on the backend, but at the same time I took a look at pydap_.py

https://github.com/pydata/xarray/blob/b80260781ee19bddee01ef09ac0da31ec12c5152/xarray/backends/pydap_.py#L129-L130

and I feel like it could also be implemented at the xarray level by allowing drop_variables which is already an argument in xarray.open_dataset, to be passed to the PydapDataStore (I guess in both scenarios drop_variables would be passed).

Any thoughts or suggestions? I can certainly lead on this effort as I already will be working on enabling the dap4 implementation within pydap.
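A minimal sketch of a user-side workaround along the lines discussed above: prune the variable in pydap first, then hand the result to xarray through PydapDataStore (reusing url1 and session from the snippet above; constructor usage assumed, and this is not the proposed backend change itself):

```python
from pydap.client import open_url
import xarray as xr

pydap_ds = open_url(url1, session=session)
pruned = pydap_ds[tuple(v for v in pydap_ds.keys() if v != 'time_bnds')]
store = xr.backends.PydapDataStore(pruned)
ds = xr.open_dataset(store)
```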

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8895/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2163608564 I_kwDOAMm_X86A9gv0 8802 Error when using `apply_ufunc` with `datetime64` as output dtype gcaria 44147817 open 0     4 2024-03-01T15:09:57Z 2024-05-03T12:19:14Z   CONTRIBUTOR      

What happened?

When using apply_ufunc with datetime64[ns] as output dtype, code throws error about converting from specific units to generic datetime units.

What did you expect to happen?

No response

Minimal Complete Verifiable Example

```python
import xarray as xr
import numpy as np

def _fn(arr: np.ndarray, time: np.ndarray) -> np.ndarray:
    return time[:10]

def fn(da: xr.DataArray) -> xr.DataArray:
    dim_out = "time_cp"

    return xr.apply_ufunc(
        _fn,
        da,
        da.time,
        input_core_dims=[["time"], ["time"]],
        output_core_dims=[[dim_out]],
        vectorize=True,
        dask="parallelized",
        output_dtypes=["datetime64[ns]"],
        dask_gufunc_kwargs={"allow_rechunk": True,
                            "output_sizes": {dim_out: 10}},
        exclude_dims=set(("time",)),
    )

da_fake = xr.DataArray(
    np.random.rand(5, 5, 5),
    coords=dict(
        x=range(5),
        y=range(5),
        time=np.array(
            ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'],
            dtype='datetime64[ns]',
        ),
    ),
).chunk(dict(x=2, y=2))

fn(da_fake.compute()).compute()  # ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas

fn(da_fake).compute()  # same errors as above
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python

ValueError Traceback (most recent call last) Cell In[211], line 1 ----> 1 fn(da_fake).compute()

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataarray.py:1163, in DataArray.compute(self, kwargs) 1144 """Manually trigger loading of this array's data from disk or a 1145 remote source into memory and return a new array. The original is 1146 left unaltered. (...) 1160 dask.compute 1161 """ 1162 new = self.copy(deep=False) -> 1163 return new.load(kwargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataarray.py:1137, in DataArray.load(self, kwargs) 1119 def load(self, kwargs) -> Self: 1120 """Manually trigger loading of this array's data from disk or a 1121 remote source into memory and return this array. 1122 (...) 1135 dask.compute 1136 """ -> 1137 ds = self._to_temp_dataset().load(**kwargs) 1138 new = self._from_temp_dataset(ds) 1139 self._variable = new._variable

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataset.py:853, in Dataset.load(self, kwargs) 850 chunkmanager = get_chunked_array_type(lazy_data.values()) 852 # evaluate all the chunked arrays simultaneously --> 853 evaluated_data = chunkmanager.compute(lazy_data.values(), kwargs) 855 for k, data in zip(lazy_data, evaluated_data): 856 self.variables[k].data = data

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/daskmanager.py:70, in DaskManager.compute(self, data, kwargs) 67 def compute(self, data: DaskArray, kwargs) -> tuple[np.ndarray, ...]: 68 from dask.array import compute ---> 70 return compute(*data, kwargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/dask/base.py:628, in compute(traverse, optimize_graph, scheduler, get, args, kwargs) 625 postcomputes.append(x.dask_postcompute()) 627 with shorten_traceback(): --> 628 results = schedule(dsk, keys, kwargs) 630 return repack([f(r, a) for r, (f, a) in zip(results, postcomputes)])

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2372, in vectorize.call(self, args, kwargs) 2369 self._init_stage_2(args, kwargs) 2370 return self -> 2372 return self._call_as_normal(*args, kwargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2365, in vectorize._call_as_normal(self, args, *kwargs) 2362 vargs = [args[_i] for _i in inds] 2363 vargs.extend([kwargs[_n] for _n in names]) -> 2365 return self._vectorize_call(func=func, args=vargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2446, in vectorize._vectorize_call(self, func, args) 2444 """Vectorized call to func over positional args.""" 2445 if self.signature is not None: -> 2446 res = self._vectorize_call_with_signature(func, args) 2447 elif not args: 2448 res = func()

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2506, in vectorize._vectorize_call_with_signature(self, func, args) 2502 outputs = _create_arrays(broadcast_shape, dim_sizes, 2503 output_core_dims, otypes, results) 2505 for output, result in zip(outputs, results): -> 2506 output[index] = result 2508 if outputs is None: 2509 # did not call the function even once 2510 if otypes is None:

ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas ```

Anything else we need to know?

No response

Environment

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8802/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2266442492 PR_kwDOAMm_X85t4NhR 8976 Migration of datatree/ops.py -> datatree_ops.py flamingbear 479480 closed 0     4 2024-04-26T20:14:11Z 2024-05-02T19:49:39Z 2024-05-02T19:49:39Z CONTRIBUTOR   0 pydata/xarray/pulls/8976

I considered wedging this into core/ops.py, but it didn't look like it fit there.

This is a basic lift and shift from datatree_/ops.py to core/datatree_ops.py

I did fix the document addendum injection and added a couple of tests.

  • [x] Contributes to migration step for miscellaneous modules in Track merging datatree into xarray #8572
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8976/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2054280736 I_kwDOAMm_X856cdYg 8572 Track merging datatree into xarray TomNicholas 35968931 open 0     27 2023-12-22T17:37:20Z 2024-05-02T19:44:29Z   MEMBER      

What is your issue?

Master issue to track progress of merging xarray-datatree into xarray main. Would close https://github.com/pydata/xarray/issues/4118 (and many similar issues), as well as one of the goals of our development roadmap.

Also see the project board for DataTree integration.


On calls in the last few dev meetings, we decided to forget about a temporary cross-repo from xarray import datatree (so this issue supercedes #7418), and just begin merging datatree into xarray main directly.

Weekly meeting

See https://github.com/pydata/xarray/issues/8747

Task list:

To happen in order:

  • [x] open_datatree in xarray. This doesn't need to be performant initially, and ~~it would initially return a datatree.DataTree object.~~ EDIT: We decided it should return an xarray.DataTree object, or even xarray.core.datatree.DataTree object. So we can start by just copying the basic version in datatree/io.py right now which just calls open_dataset many times. #8697
  • [x] Triage and fix issues: figure out which of the issues on xarray-contrib/datatree need to be fixed before the merge (if any).
  • [ ] Merge in code for DataTree class. I suggest we do this by making one PR for each module, and ideally discussing and merging each before opening a PR for the next module. (Open to other workflow suggestions though.) The main aim here being lowering the bus factor on the code, confirming high-level design decisions, and improving details of the implementation as it goes in.

    Suggested order of modules to merge:
      • [x] datatree/treenode.py - defines the tree structure, without any dimensions/data attached, #8757
      • [x] datatree/datatree.py - adds data to the tree structure, #8789
      • [x] datatree/iterators.py - iterates over a single tree in various ways, currently copied from anytree, #8879
      • [x] datatree/mapping.py - implements map_over_subtree by iterating over N trees at once https://github.com/pydata/xarray/pull/8948,
      • [ ] datatree/ops.py - uses map_over_subtree to map methods like .mean over whole trees (https://github.com/pydata/xarray/pull/8976),
      • [x] datatree/formatting_html.py - HTML repr, works but could do with some optimization https://github.com/pydata/xarray/pull/8930,
      • [x] datatree/{extensions/common}.py - miscellaneous other features e.g. attribute-like access (#8967).

  • [ ] Expose datatree API publicly. Actually expose open_datatree and DataTree in xarray's public API as top-level imports. The full list of things to expose is:

  • [ ] open_datatree
  • [ ] DataTree
  • [ ] map_over_subtree
  • [ ] assert_isomorphic
  • [ ] register_datatree_accessor

  • [ ] Refactor class inheritance - Dataset/DataArray share some mixin classes (e.g. DataWithCoords), and we could probably refactor DataTree to use these too. This is low-priority but would reduce code duplication.

Can happen basically at any time or maybe in parallel with other efforts:

  • [ ] Generalize backends to support groups. Once a basic version of xr.open_datatree exists, we can start refactoring xarray's backend classes to support a general Backend.open_datatree method for any backend that can open multiple groups. Then we can make sure this is more performant than the naive implementation, i.e. only opening the file once. See also #8994.
  • [ ] Support backends other than netCDF and Zarr. - e.g. grib, see https://github.com/pydata/xarray/pull/7437,
  • [ ] Support dask properly - Issue https://github.com/xarray-contrib/datatree/pull/97 and the (stale) PR https://github.com/xarray-contrib/datatree/pull/196 are about dask parallelization over separate nodes in the tree.
  • [ ] Add other new high-level API methods - Things like .reorder_nodes and ideas we've only discussed like https://github.com/xarray-contrib/datatree/issues/79 and https://github.com/xarray-contrib/datatree/issues/254 (cc @dcherian who has had useful ideas here)
  • [ ] Copy xarray-contrib/datatree issues over to xarray's main repository. I think this is quite important and worth doing as a record of why decisions were made. (@jhamman and @TomNicholas)
  • [ ] Copy over any recent bug fixes from original datatree repository
  • [x] Look into merging commit history of xarray-contrib/datatree. I think this would be cool but is less important than keeping the issues. (@jhamman suggested we could do this using some git wizardry that I hadn't heard of before)
  • [ ] xarray.tutorial.open_datatree - I've been meaning to make a tutorial datatree object for ages. There's an issue about it, but actually now I think something close to the CMIP6 ensemble data that @jbusecke and I used in our pangeo blog post would already be pretty good. Once we have this it becomes much easier to write docs about some advanced features.
  • [ ] Merge Docs - I've tried to write these pages so that they should slot neatly into xarray's existing docs structure. Careful reading, additions and improvements would be great though. Summary of what docs exist on this issue https://github.com/xarray-contrib/datatree/issues/61
  • [ ] Write a blog post on the xarray blog highlighting xarray's new functionality, and explicitly thanking the NASA team for their work. Doesn't have to be long, it can just point to the documentation.

Anyone is welcome to help with any of this, including but not limited to @owenlittlejohns , @eni-awowale, @flamingbear (@etienneschalk maybe?).

cc also @shoyer @keewis for any thoughts as to the process.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8572/reactions",
    "total_count": 7,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
    xarray 13221727 issue
2275107296 I_kwDOAMm_X86Hm2Hg 8992 (i)loc slicer specialization for convenient slicing by dimension label as `.loc('dim_name')[:n]` smartass101 941907 open 0     0 2024-05-02T10:04:11Z 2024-05-02T14:47:09Z   NONE      

Is your feature request related to a problem?

Until PEP 472, I'm sure we would all love to be able to do indexing with labeled dimension names inside brackets. Here I'm proposing a slightly modified syntax which is possible to implement and would be quite convenient IMHO.

Describe the solution you'd like

This is inspired by the Pandas .loc(axis=n) specialization. Essentially the .(i)loc accessors would become callable like in Pandas, which would make it possible to specify the desired order of dimensions in the subsequent slicing brackets. Schematically,

```python
darr.loc('dim name 1', 'dim name 2')[x1:x2, y1:y2]
```

is equivalent to first returning an augmented `_LocIndexer` which now associates positional indexes according to the provided dim order:

```python
loc_idx_spec = darr.loc('dim name 1', 'dim name 2')
loc_idx_spec[x1:x2, y1:y2]
```

The first part is essentially similar to `.transpose('dim name 1', 'dim name 2')` and in the case of a `DataArray` it could be used instead. But this syntax could also work for Dataset. Additionally, it does not require an actual transpose operation.

This accessor becomes especially convenient when you quickly want to index just one dimension such as

```python
darr.loc('dim name')[:x2]
```

Describe alternatives you've considered

The equivalent darr.sel({'dim name 1': slice(x1, x2), 'dim name 2': slice(y1,y2)}) is admittedly not that much worse, but for me writing slice feels cumbersome especially in situations when you have a lot of None specifications such as slice(None,None,2).

Additional context

This .loc(axis=n) API is (not so obviously) documented for Pandas here.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8992/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2204914380 PR_kwDOAMm_X85qnPSf 8872 Avoid auto creation of indexes in concat TomNicholas 35968931 open 0     15 2024-03-25T05:16:33Z 2024-05-01T19:07:01Z   MEMBER   0 pydata/xarray/pulls/8872

If we create a Coordinates object using the concatenated result_indexes, and pass that to the Dataset constructor, we can explicitly set the correct indexes from the start, instead of auto-creating the wrong ones and then trying to overwrite them with the correct indexes later (which is what the current implementation does).
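A minimal sketch of the construction pattern described above, using the public xr.Coordinates API (the variables are illustrative, not the ones used inside concat):

```python
import numpy as np
import xarray as xr

x = xr.Variable("x", np.arange(4))
coords = xr.Coordinates(coords={"x": x}, indexes={})  # explicitly: no index for "x"
ds = xr.Dataset({"a": ("x", np.ones(4))}, coords=coords)

assert "x" not in ds.xindexes  # no index was auto-created for "x"
```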

  • [x] Possible fix for #8871
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8872/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2241526039 PR_kwDOAMm_X85skMs0 8939 avoid a couple of warnings in `polyfit` keewis 14808389 closed 0     14 2024-04-13T11:49:13Z 2024-05-01T16:42:06Z 2024-05-01T15:34:20Z MEMBER   0 pydata/xarray/pulls/8939

- [x] towards #8844

  • replace numpy.core.finfo with numpy.finfo
  • add dtype and copy parameters to all definitions of __array__
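
As a short illustration of the second bullet (a sketch only; the wrapper class below is illustrative, not one of xarray's internal classes): NumPy 2 passes `dtype` and `copy` through to `__array__`, so definitions need to accept both.

```python
import numpy as np


class WrappedArray:
    """Toy duck array whose __array__ accepts the NumPy 2 keyword arguments."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # honour the requested dtype/copy semantics introduced in NumPy 2
        return np.array(self._data, dtype=dtype, copy=copy)


print(np.asarray(WrappedArray([1, 2, 3]), dtype=float))  # [1. 2. 3.]
```
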
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8939/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2267803218 PR_kwDOAMm_X85t8pSN 8980 Complete deprecation of Dataset.dims returning dict TomNicholas 35968931 open 0     6 2024-04-28T20:32:29Z 2024-05-01T15:40:44Z   MEMBER   0 pydata/xarray/pulls/8980
  • [x] Completes deprecation cycle described in #8496, and started in #8500
  • [ ] ~~Tests added~~
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8980/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
977544678 MDU6SXNzdWU5Nzc1NDQ2Nzg= 5733 Shoudn't `assert_allclose` transpose datasets? jbusecke 14314623 open 0     16 2021-08-23T22:55:12Z 2024-05-01T15:27:21Z   CONTRIBUTOR      

I am trying to compare two datasets, one of which has possibly transposed dimensions on a data variable.

```python
import xarray as xr
import numpy as np

data = np.random.rand(4, 6)
da = xr.DataArray(data, dims=['x', 'y'])
ds1 = xr.Dataset({'data': da})
ds2 = xr.Dataset({'data': da}).transpose('y', 'x')
```

What happened: In my mind this should pass

```python
xr.testing.assert_allclose(ds1, ds2)
```

but instead it fails

```
AssertionError                            Traceback (most recent call last)
<ipython-input-7-58cd53174a1e> in <module>
----> 1 xr.testing.assert_allclose(ds1, ds2)

[... skipping hidden 1 frame]

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/testing.py in assert_allclose(a, b, rtol, atol, decode_bytes)
    169             a.variables, b.variables, compat=compat_variable
    170         )
--> 171         assert allclose, formatting.diff_dataset_repr(a, b, compat=equiv)
    172     else:
    173         raise TypeError("{} not supported by assertion comparison".format(type(a)))

AssertionError: Left and right Dataset objects are not close

Differing data variables:
L   data     (x, y) float64 0.8589 0.09264 0.0264 ... 0.1039 0.3685 0.3983
R   data     (y, x) float64 0.8589 0.8792 0.8433 0.6952 ... 0.3664 0.2214 0.3983
```

Simply transposing ds2 to the same dimensions of ds1 fixes this (since the data is the same after all)

```python
xr.testing.assert_allclose(ds1, ds2.transpose('x', 'y'))
```

Since most of the other xarray operations are 'transpose-safe' (`(ds1 + ds2)` equals `(ds1 + ds2.transpose('x', 'y'))`), shouldn't this one be too?

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.4.109+ machine: x86_64 processor: x86_64 byteorder: little LC_ALL: C.UTF-8 LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.0 pandas: 1.2.4 numpy: 1.20.2 scipy: 1.6.2 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.11.0 h5py: 3.2.1 Nio: None zarr: 2.7.1 cftime: 1.4.1 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: 1.2.2 cfgrib: 0.9.9.0 iris: None bottleneck: 1.3.2 dask: 2021.04.1 distributed: 2021.04.1 matplotlib: 3.4.1 cartopy: 0.19.0 seaborn: None numbagg: None pint: 0.17 setuptools: 49.6.0.post20210108 pip: 20.3.4 conda: None pytest: None IPython: 7.22.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5733/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2270984193 PR_kwDOAMm_X85uHk70 8986 clean up the upstream-dev setup script keewis 14808389 closed 0     1 2024-04-30T09:34:04Z 2024-04-30T23:26:13Z 2024-04-30T20:59:56Z MEMBER   0 pydata/xarray/pulls/8986

In trying to install packages that are compatible with numpy>=2 I added several projects that are built in CI without build isolation (so that they will be built with the nightly version of numpy). That was a temporary workaround, so we should start thinking about cleaning this up.

Since numcodecs now seems to be compatible (or uses less of numpy in compiled code, not sure), this is an attempt to see if CI works when we use the version from conda-forge.

bottleneck and cftime now build against numpy>=2.0.0rc1, so we can stop building them without build isolation.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8986/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2266174558 I_kwDOAMm_X86HExRe 8975 Xarray sponsorship guidelines shoyer 1217238 open 0     3 2024-04-26T17:05:01Z 2024-04-30T20:52:33Z   MEMBER      

At what level of support should Xarray acknowledge sponsors on our website?

I would like to surface this for open discussion because there are potential sponsoring organizations with conflicts of interest with members of Xarray's leadership team (e.g., Earthmover, which employs @jhamman, @rabernat and @dcherian).

My suggestion is to use NumPy's guidelines, with an adjustment down to 1/3 of the thresholds to account for the smaller size of the project:

  • $10,000/yr for unrestricted financial contributions (e.g., donations)
  • $20,000/yr for financial contributions for a particular purpose (e.g., grants)
  • $30,000/yr for in-kind contributions (e.g., time for employees to contribute)
  • 2 person-months/yr of paid work time for one or more Xarray maintainers or regular contributors to any Xarray team or activity

The NumPy guidelines also include a grace period of a minimum of 6 months for acknowledging support. I would suggest increasing this to a minimum of 1 year for Xarray.

I would greatly appreciate any feedback from members of the community, either in this issue or at the next team meeting.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8975/reactions",
    "total_count": 6,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2270275688 I_kwDOAMm_X86HUaho 8985 update `to_netcdf` docstring to list support for explicit CDF5 writes JulioTBacmeister 9221710 open 0     4 2024-04-30T00:41:13Z 2024-04-30T20:48:46Z   NONE      

Is your feature request related to a problem?

I cannot get to_netcdf() to write files in CDF5 format, as identified by the 'ncdump -k' command.

Describe the solution you'd like

When I write a netcdf file using:

D.to_netcdf( filename )

then ask ncdump to tell me the kind of file I have,

ncdump -k filename

it returns 'netCDF-4'. Unfortunately, this file won't work in the Community Atmosphere Model (CAM), for example as an initial condition. CAM will bomb when it tries to read it. After converting the file with this command:

nccopy -k cdf5 filename cdf5_filename

the file now works in CAM. Also, the command

ncdump -k cdf5_filename

returns 'cdf5'.

I confess I don't know what the nccopy command is doing, but it seems to be needed for the file to be readable by CAM. I am looking for an option in the to_netcdf method that will explicitly write 'cdf5' files without needing to resort to the nccopy command.

Describe alternatives you've considered

Writing netcdf-4 files from xarray and converting via nccopy -k cdf5 filename cdf5_filename
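
A small sketch of that workaround (assuming the nccopy utility is on PATH; file names are placeholders):

```python
import subprocess

import xarray as xr

ds = xr.Dataset({"t": ("x", [1.0, 2.0, 3.0])})
ds.to_netcdf("filename.nc")  # written as netCDF-4 by default
# convert to CDF5 outside of xarray
subprocess.run(["nccopy", "-k", "cdf5", "filename.nc", "cdf5_filename.nc"], check=True)
```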

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8985/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2272299822 PR_kwDOAMm_X85uL82a 8989 Skip flaky `test_open_mfdataset_manyfiles` test max-sixty 5635139 closed 0     0 2024-04-30T19:24:41Z 2024-04-30T20:27:04Z 2024-04-30T19:46:34Z MEMBER   0 pydata/xarray/pulls/8989

Skip it rather than just xfailing, and not only on Windows, since the test can crash the worker.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8989/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2271670475 PR_kwDOAMm_X85uJ5Er 8988 Remove `.drop` warning allow max-sixty 5635139 closed 0     0 2024-04-30T14:39:35Z 2024-04-30T19:26:17Z 2024-04-30T19:26:16Z MEMBER   0 pydata/xarray/pulls/8988  
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8988/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2271652603 PR_kwDOAMm_X85uJ122 8987 Add notes on when to add ignores to warnings max-sixty 5635139 closed 0     0 2024-04-30T14:34:52Z 2024-04-30T14:56:47Z 2024-04-30T14:56:46Z MEMBER   0 pydata/xarray/pulls/8987  
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8987/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2262468762 PR_kwDOAMm_X85tqnJm 8973 Docstring and documentation improvement for the Dataset class noahbenson 2005723 closed 0     7 2024-04-25T01:39:02Z 2024-04-30T14:40:32Z 2024-04-30T14:40:14Z CONTRIBUTOR   0 pydata/xarray/pulls/8973

Prior to this commit, the example in the doc-string of the Dataset class uses an example array whose size is 2 x 2 x 3, with the first two dimensions labeled "x" and "y" and the final dimension labeled "time". This was confusing because "x" and "y" are just arbitrary names for these axes and no reason is given for the data to be organized in a 2x2x3 array instead of a 2x2 matrix. This commit clarifies the example.

Additionally, this PR contains updates to the documentation, specifically the user-guide/data-structures.rst file; the updates bring the documentation examples into alignment with the doc-string change. Unfortunately, I wasn't able to build the documentation, so this will need to be checked. (I followed the instructions here, but despite cfgrib working fine, I got an error about how it wasn't a valid engine.)

See issue #8970 for more information.

  • [X] Closes #8970
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8973/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2261858401 I_kwDOAMm_X86G0Thh 8970 Example code in the documentation for `Dataset` is not clear noahbenson 2005723 closed 0     13 2024-04-24T17:50:46Z 2024-04-30T14:40:15Z 2024-04-30T14:40:15Z CONTRIBUTOR      

What is your issue?

The example code in the documentation for the Dataset class (e.g., here) is probably clear to those who study Earth and Atmospheric Sciences, but it makes no sense to me. Here is the code:

```python
np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 2, 3)
precipitation = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
time = pd.date_range("2014-09-06", periods=3)
reference_time = pd.Timestamp("2014-09-05")

ds = xr.Dataset(
    data_vars=dict(
        temperature=(["x", "y", "time"], temperature),
        precipitation=(["x", "y", "time"], precipitation),
    ),
    coords=dict(
        lon=(["x", "y"], lon),
        lat=(["x", "y"], lat),
        time=time,
        reference_time=reference_time,
    ),
    attrs=dict(description="Weather related data."),
)
```

To be clear, I understand each individual line of code, but I don't understand why there is both a latitude/longitude and an x/y in this example or how they are supposed to be related to each other (and there do not appear to be any additional details about this dataset's intended structure). Probably due to this lack of clarity I'm having a hard time wrapping my head around what the x/y coordinates and the lat/lon coordinates are supposed to demonstrate about xarray here, or how the x/y and lat/lon values are represented in the data structure. Are the x and y coordinates in a map projection of some kind? I have worked successfully with Datasets in the past, but as someone who doesn't work with geospatial data, I find myself more confused about Datasets after reading this example than before.

I suspect that all that is needed is a clear description of what these data are supposed to represent, how they are intended to be used, and how x/y and lat/lon are related. If someone can explain this to me, I'd be happy to submit a PR for the docs.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8970/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1915997507 I_kwDOAMm_X85yM81D 8238 NamedArray tracking issue dcherian 2448579 open 0     12 2023-09-27T17:07:58Z 2024-04-30T12:49:17Z   MEMBER      

@andersy005 I think it would be good to keep a running list of NamedArray tasks. I'll start with a rough sketch, please update/edit as you like.

  • [x] Refactor out NamedArray base class (#8075)
  • [x] publicize design doc: Scientific Python | Pangeo | NumPy Mailist
  • [ ] Migrate VariableArithmetic to NamedArrayArithmetic (#8244)
  • [ ] Migrate ExplicitlyIndexed array classes to array protocols
  • [x] Migrate from *Indexer objects to .oindex and .vindex on ExplicitlyIndexed array classes
  • [ ] https://github.com/pydata/xarray/pull/8870
  • [ ] Migrate unary ops
  • [ ] Migrate binary ops
  • [ ] Migrate nanops.py
  • [x] Avoid "injecting" reduce methods potentially by using generate_reductions.py? (#8304)
  • [ ] reprs and formatting.py
  • [x] parallelcompat.py
  • [ ] pycompat.py (#8244)
  • [ ] https://github.com/pydata/xarray/pull/8276
  • [ ] have test_variable.py test both NamedArray and Variable
  • [x] Arrays with unknown shape #8291
  • [ ] https://github.com/pydata/xarray/issues/8306
  • [ ] https://github.com/pydata/xarray/issues/8310
  • [ ] https://github.com/pydata/xarray/issues/8333
  • [ ] Try to preserve imports from xarray.core/* by importing namedarray functionality into xarray.core/*

xref #3981

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8238/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2212435865 PR_kwDOAMm_X85rAwYu 8885 add `.oindex` and `.vindex` to `BackendArray` andersy005 13301940 closed 0     8 2024-03-28T06:14:43Z 2024-04-30T12:12:50Z 2024-04-17T01:53:23Z MEMBER   0 pydata/xarray/pulls/8885

this PR builds towards

  • https://github.com/pydata/xarray/pull/8870
  • https://github.com/pydata/xarray/pull/8856

the primary objective is to partially address

  1. Implement fallback .oindex and .vindex properties on the BackendArray base class. These will simply rewrap the key tuple with the appropriate *Indexer object and pass it on to __getitem__ or __setitem__. These methods will also raise DeprecationWarning so that external backends know to migrate to .oindex and .vindex over the next year.
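
An illustrative sketch only of what such a fallback could look like (the class and key-wrapper names below are hypothetical stand-ins, not xarray's actual internals):

```python
import warnings


class OuterIndexerStub(tuple):
    """Stand-in for xarray's internal OuterIndexer key wrapper."""


class IndexCallable:
    def __init__(self, getter):
        self._getter = getter

    def __getitem__(self, key):
        return self._getter(key)


class BackendArraySketch:
    def __getitem__(self, key):
        return f"__getitem__ called with {key!r}"

    @property
    def oindex(self):
        def _getter(key):
            warnings.warn(
                "outer indexing through __getitem__ is deprecated; "
                "implement .oindex on your BackendArray subclass",
                DeprecationWarning,
                stacklevel=3,
            )
            # rewrap the key tuple and defer to the existing __getitem__
            return self[OuterIndexerStub(key if isinstance(key, tuple) else (key,))]

        return IndexCallable(_getter)


print(BackendArraySketch().oindex[0:2, 1])
```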

  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8885/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2231711080 PR_kwDOAMm_X85sCbN- 8921 Revert `.oindex` and `.vindex` additions in `_ElementwiseFunctionArray`, `NativeEndiannessArray`, and `BoolTypeArray` classes andersy005 13301940 open 0     9 2024-04-08T17:11:08Z 2024-04-30T06:49:46Z   MEMBER   0 pydata/xarray/pulls/8921

As noted in https://github.com/pydata/xarray/issues/8909, the use of .oindex and .vindex properties in coding/* appears to have broken some backends (e.g. scipy). This PR reverts those changes. We plan to bundle these changes into a separate backends feature branch (see this comment), which will be merged once we are confident about its impact on downstream dependencies.

  • [ ] Closes #8909
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8921/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2240600694 PR_kwDOAMm_X85shCWz 8933 Use array_api compliant dtype Illviljan 14371165 open 0     0 2024-04-12T17:30:51Z 2024-04-30T03:57:57Z   MEMBER   1 pydata/xarray/pulls/8933
  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

Notes:

  • For duck arrays, use _dtype[_generic]
  • For actual np.ndarrays, use np.dtype[np.generic]
  • np.dtype is too specific in general and it's probably not needed in most of their array_api functions.
  • _DTypeBase-class in np?
  • Mixing dtypes from np and xp is discouraged: https://github.com/data-apis/array-api/issues/582
  • Using asarray seems to be the recommended way.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8933/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1389295853 I_kwDOAMm_X85Szvjt 7099 Pass arbitrary options to sel() benbovy 4160723 open 0     4 2022-09-28T12:44:52Z 2024-04-30T00:44:18Z   MEMBER      

Is your feature request related to a problem?

Currently .sel() accepts two options, method and tolerance. These are relevant for default (pandas) indexes but not necessarily for other, custom indexes.

It would be also useful for custom indexes to expose their own selection options, e.g.,

  • index query optimization like the dualtree flag of sklearn.neighbors.KDTree.query
  • k-nearest neighbors selection with the creation of a new "k" dimension (+ coordinate / index) with user-defined name and size.

From #3223, it would be nice if we could also pass distinct options values per index.

What would be a good API for that?

Describe the solution you'd like

Some ideas:

A. Allow passing a tuple (labels, options_dict) as indexer value

```python
ds.sel(x=([0, 2], {"method": "nearest"}), y=3)
```

B. Expose an options kwarg that would accept a nested dict

```python
ds.sel(x=[0, 2], y=3, options={"x": {"method": "nearest"}})
```

Option A does not look very readable. Option B is slightly better, although the nested dictionary is not great.

Any other ideas? Some sort of context manager? Some Index specific API?

Describe alternatives you've considered

The API proposed in #3223 would look great if method and tolerance were the only accepted options, but less so for arbitrary options.

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7099/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
481761508 MDU6SXNzdWU0ODE3NjE1MDg= 3223 Feature request for multiple tolerance values when using nearest method and sel() NicWayand 1117224 open 0     4 2019-08-16T19:53:31Z 2024-04-29T23:21:04Z   NONE      

```python
import xarray as xr
import numpy as np
import pandas as pd

# Create test data
ds = xr.Dataset()
ds.coords['lon'] = np.arange(-120, -60)
ds.coords['lat'] = np.arange(30, 50)
ds.coords['time'] = pd.date_range('2018-01-01', '2018-01-30')
ds['AirTemp'] = xr.DataArray(
    np.ones((ds.lat.size, ds.lon.size, ds.time.size)),
    dims=['lat', 'lon', 'time'],
)

target_lat = [36.83]
target_lon = [-110]
target_time = [np.datetime64('2019-06-01')]

# Nearest pulls a date too far away
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest')

# Adding tolerance for lat/lon, but it is also applied to time
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest', tolerance=0.5)

# Ideally tolerance could accept a dictionary but currently fails
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest',
       tolerance={'lat': 0.5, 'lon': 0.5, 'time': np.timedelta64(1, 'D')})
```

Expected Output

A dataset with nearest values to tolerances on each dim.

Problem Description

I would like to add the ability of tolerance to accept a dictionary for multiple tolerance values for different dimensions. Before I try implementing it, I wanted to 1) check it doesn't already exist or someone isn't working on it, and 2) get suggestions for how to proceed.

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 | packaged by conda-forge | (default, Feb 20 2019, 02:51:38) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.9.184-0.1.ac.235.83.329.metal1.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.11.3 pandas: 0.24.1 numpy: 1.15.4 scipy: 1.2.1 netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: 2.9.0 Nio: 1.5.5 zarr: 2.2.0 cftime: 1.0.3.4 PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None cyordereddict: None dask: 1.1.2 distributed: 1.26.0 matplotlib: 3.0.3 cartopy: 0.17.0 seaborn: 0.9.0 setuptools: 40.8.0 pip: 19.0.3 conda: None pytest: None IPython: 7.3.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3223/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1951543761 I_kwDOAMm_X850UjHR 8335 ```DataArray.sel``` can silently pick up the nearest point, even if it is far away and the query is out of bounds jerabaul29 8382834 open 0     13 2023-10-19T08:02:44Z 2024-04-29T23:02:31Z   CONTRIBUTOR      

What is your issue?

Credit to @paulina-t, who found a bug caused by the behavior we report here in a codebase where it was badly messing things up.

See the example notebook at https://github.com/jerabaul29/public_bug_reports/blob/main/xarray/2023_10_18/interp.ipynb .


Problem

It is always a bit risky to interpolate / find the nearest neighbor to a query or similar, as bad things can happen if querying a value for a point that is outside of the area that is represented. Fortunately, xarray returns NaN if performing interp outside of the bounds of a dataset:

```python
import xarray as xr
import numpy as np

xr.__version__
# '2023.9.0'

data = np.array([[1, 2, 3], [4, 5, 6]])
lat = [10, 20]
lon = [120, 130, 140]

data_xr = xr.DataArray(data, coords={'lat': lat, 'lon': lon}, dims=['lat', 'lon'])

data_xr
# <xarray.DataArray (lat: 2, lon: 3)>
# array([[1, 2, 3],
#        [4, 5, 6]])
# Coordinates:
#   * lat      (lat) int64 10 20
#   * lon      (lon) int64 120 130 140

# interp is civilized: rather than wildly extrapolating, it returns NaN
data_xr.interp(lat=15, lon=125)
# <xarray.DataArray ()>
# array(3.)
# Coordinates:
#     lat      int64 15
#     lon      int64 125

data_xr.interp(lat=5, lon=125)
# <xarray.DataArray ()>
# array(nan)
# Coordinates:
#     lat      int64 5
#     lon      int64 125
```

Unfortunately, .sel will happily find the nearest neighbor of a point, even if the input point is outside of the dataset range:

```python
# sel is not as civilized: it happily finds the nearest neighbor, even if the
# query point lies off to one side of the example data
data_xr.sel(lat=5, lon=125, method='nearest')
# <xarray.DataArray ()>
# array(2)
# Coordinates:
#     lat      int64 10
#     lon      int64 130
```

This can easily cause tricky bugs.


Discussion

Would it be possible for .sel to have a behavior that makes the user aware of such issues? I.e. either:

  • print a warning on stderr
  • return NaN
  • raise an exception

when performing a .sel query that is outside of the dataset range / not in between two dataset points?

I understand that finding the nearest neighbor may still be useful / wanted in some cases even outside of the bounds of the dataset, but the fact that this happens silently by default has been causing bugs for us. Could this default behavior either be changed, or be controlled with a flag (allow_extrapolate=False by default, for example, so users consciously opt in)?
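
One partial guard that exists today, reusing `data_xr` from the example above (assuming the documented behaviour of `tolerance` together with `method='nearest'`): out-of-range queries then raise a KeyError instead of silently matching.

```python
data_xr.sel(lat=12, lon=128, method='nearest', tolerance=4)  # ok: within 4 of lat=10, lon=130
data_xr.sel(lat=5, lon=125, method='nearest', tolerance=4)   # raises KeyError: lat=5 is 5 away
```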

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8335/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2259316341 I_kwDOAMm_X86Gqm51 8965 Support concurrent loading of variables dcherian 2448579 open 0     4 2024-04-23T16:41:24Z 2024-04-29T22:21:51Z   MEMBER      

Is your feature request related to a problem?

Today, users who want to concurrently load multiple variables in a DataArray or Dataset have to use dask.

It struck me that it'd be pretty easy for .load to gain an executor kwarg that accepts anything that follows the concurrent.futures executor interface, and parallelize this loop.

https://github.com/pydata/xarray/blob/b0036749542145794244dee4c4869f3750ff2dee/xarray/core/dataset.py#L853-L857
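
A rough sketch of what such a keyword could do internally, and of what a user can already approximate by hand (the helper below is hypothetical, not an xarray API):

```python
from concurrent.futures import ThreadPoolExecutor

import xarray as xr


def load_concurrently(ds: xr.Dataset, max_workers: int = 4) -> xr.Dataset:
    """Load all variables of a lazily-opened Dataset using a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        list(executor.map(lambda var: var.load(), ds.variables.values()))
    return ds
```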

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8965/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2267780811 PR_kwDOAMm_X85t8kgX 8979 Warn on automatic coercion to coordinate variables in Dataset constructor TomNicholas 35968931 open 0     2 2024-04-28T19:44:20Z 2024-04-29T21:13:00Z   MEMBER   0 pydata/xarray/pulls/8979
  • [x] Starts the deprecation cycle for #8959
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
  • [ ] Change existing code + examples so as not to emit this new warning everywhere.
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8979/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2136832627 I_kwDOAMm_X85_XXpz 8755 to_zarr removes global attributes in destination dataset pnorton-usgs 8998112 open 0     12 2024-02-15T15:32:10Z 2024-04-29T19:22:41Z   NONE      

What happened?

Adding new variables to a zarr dataset with to_zarr() always removes the existing global attributes. New global attributes in the source dataset are not always added to the destination dataset depending on how to_zarr() is called.

What did you expect to happen?

I would expect that existing global attributes would always be preserved. If there are new global attributes I would expect them to be added to the existing global attributes instead of replacing all existing global attributes.

Minimal Complete Verifiable Example

```Python
import xarray as xr
from pyproj import CRS

local_zarr = 'sample.zarr'

ds_sample = xr.tutorial.load_dataset("air_temperature")

# Make a local copy
ds_sample.to_zarr(local_zarr, mode='w')
ds_sample = xr.open_dataset(local_zarr, engine='zarr',
                            backend_kwargs={'consolidated': True},
                            chunks={}, decode_coords=True)

# Create CRS metadata
crs_meta = CRS.from_epsg(4326).to_cf()

ds_new = xr.Dataset(data_vars={"crs": ([], 1, crs_meta)})
ds_new.attrs['note'] = 'please add this'

# Add all variables from ds_new to the zarr
# NOTE: This adds the new global attribute but also removes
# all existing global attributes
ds_new.to_zarr(local_zarr, mode='a')

# Add selected variable(s) to zarr dataset
# NOTE: This does not copy new global attributes
# and removes all existing global attributes
ds_new['crs'].to_zarr(local_zarr, mode='a')

# Re-open local zarr store
ds_sample = xr.open_dataset(local_zarr, engine='zarr',
                            backend_kwargs={'consolidated': True},
                            chunks={}, decode_coords=True)

ds_sample
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:26:40) [Clang 14.0.6 ] python-bits: 64 OS: Darwin OS-release: 22.6.0 machine: arm64 processor: arm byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1 xarray: 2024.1.1 pandas: 2.2.0 numpy: 1.26.4 scipy: 1.12.0 netCDF4: 1.6.0 pydap: installed h5netcdf: 1.3.0 h5py: 3.8.0 Nio: None zarr: 2.17.0 cftime: 1.6.3 nc_time_axis: None iris: None bottleneck: 1.3.7 dask: 2024.2.0 distributed: 2024.2.0 matplotlib: 3.8.2 cartopy: 0.22.0 seaborn: None numbagg: None fsspec: 2023.12.2 cupy: None pint: 0.23 sparse: None flox: None numpy_groupies: None setuptools: 69.0.3 pip: 24.0 conda: None pytest: 8.0.0 mypy: None IPython: 8.21.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8755/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1250939008 I_kwDOAMm_X85Kj9CA 6646 `dim` vs `dims` max-sixty 5635139 closed 0     4 2022-05-27T16:15:02Z 2024-04-29T18:24:56Z 2024-04-29T18:24:56Z MEMBER      

What is your issue?

I've recently been hit with this when experimenting with xr.dot and xr.corr — xr.dot takes dims, and xr.cov takes dim. Because they each take multiple arrays as positional args, kwargs are more conventional.

Should we standardize on one of these?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6646/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2268058661 PR_kwDOAMm_X85t9f5f 8982 Switch all methods to `dim` max-sixty 5635139 closed 0     0 2024-04-29T03:42:34Z 2024-04-29T18:24:56Z 2024-04-29T18:24:55Z MEMBER   0 pydata/xarray/pulls/8982

I think this is the final set of methods

  • [x] Closes #6646
  • [ ] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8982/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2267810980 PR_kwDOAMm_X85t8q4s 8981 Enable ffill for datetimes max-sixty 5635139 closed 0     5 2024-04-28T20:53:18Z 2024-04-29T18:09:48Z 2024-04-28T23:02:11Z MEMBER   0 pydata/xarray/pulls/8981

Notes inline. Would fix #4587

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8981/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2019566184 I_kwDOAMm_X854YCJo 8494 Filter expected warnings in the test suite TomNicholas 35968931 closed 0     1 2023-11-30T21:50:15Z 2024-04-29T16:57:07Z 2024-04-29T16:56:16Z MEMBER      

FWIW one thing I'd be keen to do generally — though maybe this isn't the place to start it — is handle warnings in the test suite when we add a new warning — i.e. filter them out where we expect them.

In this case, that would be the loading the netCDF files that have duplicate dims.

Otherwise warnings become a huge block of text without much salience. I mostly see the 350 lines of them and think "meh mostly units & cftime", but then something breaks on a new upstream release that was buried in there, or we have a supported code path that is raising warnings internally.

(I'm not sure whether it's possible to generally enforce that — maybe we could raise on any warnings coming from within xarray? Would be a non-trivial project to get us there though...)
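
A minimal sketch of that idea, assuming a conftest.py autouse fixture (the filter patterns here are illustrative, not the test suite's actual configuration): escalate warnings that originate inside xarray to errors, while explicitly filtering out the ones we expect.

```python
import warnings

import pytest


@pytest.fixture(autouse=True)
def error_on_xarray_warnings():
    with warnings.catch_warnings():
        # raise on warnings triggered from within xarray modules
        warnings.filterwarnings("error", module=r"xarray(\..*)?")
        # expected warnings can be filtered back out explicitly, e.g.:
        warnings.filterwarnings(
            "ignore", message=".*duplicate dimension.*", category=UserWarning
        )
        yield
```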

Originally posted by @max-sixty in https://github.com/pydata/xarray/issues/8491#issuecomment-1834615826

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8494/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1949782787 PR_kwDOAMm_X85dJLvY 8332 Add invert option to DataArray/Dataset.stack() carschandler 92899389 open 0     9 2023-10-18T13:38:38Z 2024-04-29T16:29:40Z   FIRST_TIME_CONTRIBUTOR   0 pydata/xarray/pulls/8332

This brings in the option to stack all dimensions except for one or more dimensions listed. I find this very useful for quickly iterating over all the combinations of dimensions except for one (i.e. you have a number of input parameters that are parameterized and one time dimension, and you want to calculate some time response for all the combinations of these input parameters and store them in the time-row corresponding to the appropriate combination of inputs).

I played around with implementing to_stacked_array() for DataArray, but this made less sense in the end since that method was really designed for Datasets.
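
For comparison, this is roughly what the new option would shorten; with the current API, stacking everything except one dimension needs an explicit list (a sketch with made-up dimension names):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros((2, 3, 4, 100)), dims=("a", "b", "c", "time"))

# today: stack every dimension except "time"
stacked = da.stack(sample=[d for d in da.dims if d != "time"])
print(stacked.dims)  # ('time', 'sample')
```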

  • [x] Addresses #8278
  • [ ] Tests added (added one for DataArray, just now realizing I should have one for Dataset)
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8332/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2261855627 PR_kwDOAMm_X85togwQ 8969 CI: python 3.12 by default. dcherian 2448579 closed 0     2 2024-04-24T17:49:25Z 2024-04-29T16:21:20Z 2024-04-29T16:21:08Z MEMBER   0 pydata/xarray/pulls/8969
  1. Now that numba supports 3.12.
  2. Disabled pint on the main environment since it doesn't work. Pint is still installed in the all-but-dask env, which still runs python 3.11 for this reason.
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8969/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2118308210 I_kwDOAMm_X85-QtFy 8707 Weird interaction between aggregation and multiprocessing on DaskArrays saschahofmann 24508496 closed 0     10 2024-02-05T11:35:28Z 2024-04-29T16:20:45Z 2024-04-29T16:20:44Z CONTRIBUTOR      

What happened?

When I try to run a modified version of the example from the dropna documentation (see below), it creates a never-terminating process. To reproduce it, I added a rolling operation before dropping NaNs and then ran 4 processes using the standard library multiprocessing Pool class on DaskArrays. Running the rolling + dropna in a for loop finishes as expected in no time.

What did you expect to happen?

There is no obvious reason to me why this wouldn't just work, unless there is a weird interaction between the Dask threads and the different processes. Using Xarray+Dask+Multiprocessing seems to work for me with other functions; it seems to be this particular combination that is problematic.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np
from multiprocessing import Pool

datasets = [
    xr.Dataset(
        {
            "temperature": (
                ["time", "location"],
                [[23.4, 24.1],
                 [np.nan if i > 1 else 23.4, 22.1 if i < 2 else np.nan],
                 [21.8 if i < 3 else np.nan, 24.2],
                 [20.5, 25.3]],
            )
        },
        coords={"time": [1, 2, 3, 4], "location": ["A", "B"]},
    ).chunk(time=2)
    for i in range(4)
]


def process(dataset):
    return dataset.rolling(dim={'time': 2}).sum().dropna(dim="time", how="all").compute()


# This works as expected
dropped = []
for dataset in datasets:
    dropped.append(process(dataset))

# This seems to never finish
with Pool(4) as p:
    dropped = p.map(process, datasets)
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

I am still running 2023.08.0; see below for more details about the environment.

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.6 (main, Jan 25 2024, 20:42:03) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 5.4.0-124-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.3-development xarray: 2023.8.0 pandas: 2.1.4 numpy: 1.26.3 scipy: 1.12.0 netCDF4: 1.6.5 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.16.1 cftime: 1.6.3 nc_time_axis: 1.4.1 PseudoNetCDF: None iris: None bottleneck: 1.3.7 dask: 2024.1.1 distributed: 2024.1.1 matplotlib: 3.8.2 cartopy: 0.22.0 seaborn: None numbagg: None fsspec: 2023.12.2 cupy: None pint: 0.23 sparse: None flox: 0.9.0 numpy_groupies: 0.10.2 setuptools: 69.0.3 pip: 23.2.1 conda: None pytest: 8.0.0 mypy: None IPython: 8.18.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8707/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2267711587 PR_kwDOAMm_X85t8VWy 8978 more engine environment tricks in preparation for `numpy>=2` keewis 14808389 closed 0     7 2024-04-28T17:54:38Z 2024-04-29T14:56:22Z 2024-04-29T14:56:21Z MEMBER   0 pydata/xarray/pulls/8978

Turns out pydap also needs to build with numpy>=2. Until it does, we should remove it from the upstream-dev environment. Also, numcodecs build-depends on setuptools-scm.

And finally, the h5py nightlies might support numpy>=2 (h5py>=3.11 supposedly is numpy>=2 compatible), so once again I'll try and see if CI passes.

  • [x] towards #8844
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8978/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2244518111 PR_kwDOAMm_X85suNEO 8946 Fix upcasting with python builtin numbers and numpy 2 djhoese 1828519 open 0     18 2024-04-15T20:07:42Z 2024-04-29T12:38:55Z   CONTRIBUTOR   0 pydata/xarray/pulls/8946

See #8402 for more discussion. The bottom line is that numpy 2 changes the rules for casting between two inputs. Due to this and xarray's preference for promoting python scalars to 0d arrays (scalar arrays), xarray objects are being upcast to higher-precision dtypes where they previously were not.
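
A small illustration of that casting change under NumPy 2 / NEP 50 semantics (this shows the NumPy behavior only, not this PR's implementation): a bare Python scalar is "weak" and preserves float32, while the same value wrapped in a 0-d array forces promotion to float64, which is what happens once the scalar gets wrapped.

```python
import numpy as np

x = np.float32(1.0)
print((x + 1.0).dtype)            # float32: the Python float is "weak" under NEP 50
print((x + np.array(1.0)).dtype)  # float64: the 0-d array carries a strong dtype
```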

I'm mainly opening this PR for further and more detailed discussion.

CC @dcherian

  • [ ] Closes #8402
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8946/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2262478932 PR_kwDOAMm_X85tqpUi 8974 Raise errors on new warnings from within xarray max-sixty 5635139 closed 0     2 2024-04-25T01:50:48Z 2024-04-29T12:18:42Z 2024-04-29T02:50:21Z MEMBER   0 pydata/xarray/pulls/8974

Notes are inline.

  • [x] Closes https://github.com/pydata/xarray/issues/8494
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst

Done with some help from an LLM — quite good for doing tedious tasks that we otherwise wouldn't want to do — can paste in all the warnings output and get a decent start on rules for exclusions

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8974/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1997537503 PR_kwDOAMm_X85fqp3A 8459 Check for aligned chunks when writing to existing variables max-sixty 5635139 closed 0     5 2023-11-16T18:56:06Z 2024-04-29T03:05:36Z 2024-03-29T14:35:50Z MEMBER   0 pydata/xarray/pulls/8459

While I don't feel super confident that this is designed to protect against any bugs, it does solve the immediate problem in #8371, by hoisting the encoding check above the code that runs only for new variables. The encoding check is somewhat implicit, so this was an easy thing to miss before.

  • [x] Closes #8371,
  • [x] Closes #8882
  • [x] Closes #8876
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8459/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1574694462 I_kwDOAMm_X85d2-4- 7513 intermittent failures with h5netcdf, h5py on macos dcherian 2448579 closed 0     5 2023-02-07T16:58:43Z 2024-04-28T23:35:21Z 2024-04-28T23:35:21Z MEMBER      

What is your issue?

cc @hmaarrfk @kmuehlbauer

Passed: https://github.com/pydata/xarray/actions/runs/4115923717/jobs/7105298426 Failed: https://github.com/pydata/xarray/actions/runs/4115946392/jobs/7105345290

Versions:
  h5netcdf 1.1.0 pyhd8ed1ab_0 conda-forge
  h5py 3.8.0 nompi_py310h5555e59_100 conda-forge
  hdf4 4.2.15 h7aa5921_5 conda-forge
  hdf5 1.12.2 nompi_h48135f9_101 conda-forge

```
=================================== FAILURES ===================================
___ test_open_mfdataset_manyfiles[h5netcdf-20-True-5-5] ______
[gw1] darwin -- Python 3.10.9 /Users/runner/micromamba-root/envs/xarray-tests/bin/python

readengine = 'h5netcdf', nfiles = 20, parallel = True, chunks = 5 file_cache_maxsize = 5

@requires_dask
@pytest.mark.filterwarnings("ignore:use make_scale(name) instead")
def test_open_mfdataset_manyfiles(
    readengine, nfiles, parallel, chunks, file_cache_maxsize
):
    # skip certain combinations
    skip_if_not_engine(readengine)

    if ON_WINDOWS:
        pytest.skip("Skipping on Windows")

    randdata = np.random.randn(nfiles)
    original = Dataset({"foo": ("x", randdata)})
    # test standard open_mfdataset approach with too many files
    with create_tmp_files(nfiles) as tmpfiles:
        writeengine = readengine if readengine != "pynio" else "netcdf4"
        # split into multiple sets of temp files
        for ii in original.x.values:
            subds = original.isel(x=slice(ii, ii + 1))
            if writeengine != "zarr":
                subds.to_netcdf(tmpfiles[ii], engine=writeengine)
            else:  # if writeengine == "zarr":
                subds.to_zarr(store=tmpfiles[ii])

        # check that calculation on opened datasets works properly
      with open_mfdataset(
            tmpfiles,
            combine="nested",
            concat_dim="x",
            engine=readengine,
            parallel=parallel,
            chunks=chunks if (not chunks and readengine != "zarr") else "auto",
        ) as actual:

/Users/runner/work/xarray/xarray/xarray/tests/test_backends.py:3267:


/Users/runner/work/xarray/xarray/xarray/backends/api.py:991: in open_mfdataset
    datasets, closers = dask.compute(datasets, closers)
/Users/runner/micromamba-root/envs/xarray-tests/lib/python3.10/site-packages/dask/base.py:599: in compute
    results = schedule(dsk, keys, kwargs)
/Users/runner/micromamba-root/envs/xarray-tests/lib/python3.10/site-packages/dask/threaded.py:89: in get
    results = get_async(
/Users/runner/micromamba-root/envs/xarray-tests/lib/python3.10/site-packages/dask/local.py:511: in get_async
    raise_exception(exc, tb)
/Users/runner/micromamba-root/envs/xarray-tests/lib/python3.10/site-packages/dask/local.py:319: in reraise
    raise exc
/Users/runner/micromamba-root/envs/xarray-tests/lib/python3.10/site-packages/dask/local.py:224: in execute_task
    result = _execute_task(task, data)
/Users/runner/micromamba-root/envs/xarray-tests/lib/python3.10/site-packages/dask/core.py:119: in _execute_task
    return func((_execute_task(a, cache) for a in args))
/Users/runner/micromamba-root/envs/xarray-tests/lib/python3.10/site-packages/dask/utils.py:72: in apply
    return func(args, kwargs)
/Users/runner/work/xarray/xarray/xarray/backends/api.py:526: in open_dataset
    backend_ds = backend.open_dataset(
/Users/runner/work/xarray/xarray/xarray/backends/h5netcdf_.py:417: in open_dataset
    ds = store_entrypoint.open_dataset(
/Users/runner/work/xarray/xarray/xarray/backends/store.py:32: in open_dataset
    vars, attrs = store.load()
/Users/runner/work/xarray/xarray/xarray/backends/common.py:129: in load
    (decode_variable_name(k), v) for k, v in self.get_variables().items()
/Users/runner/work/xarray/xarray/xarray/backends/h5netcdf.py:220: in get_variables
    return FrozenDict(
/Users/runner/work/xarray/xarray/xarray/core/utils.py:471: in FrozenDict
    return Frozen(dict(args, *kwargs))
/Users/runner/work/xarray/xarray/xarray/backends/h5netcdf_.py:221: in <genexpr>
    (k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items()
/Users/runner/work/xarray/xarray/xarray/backends/h5netcdf_.py:200: in open_store_variable
    elif var.compression is not None:
/Users/runner/micromamba-root/envs/xarray-tests/lib/python3.10/site-packages/h5netcdf/core.py:394: in compression
    return self._h5ds.compression


self = <[AttributeError("'NoneType' object has no attribute '_root'") raised in repr()] Variable object at 0x151378970>

@property
def _h5ds(self):
    # Always refer to the root file and store not h5py object
    # subclasses:
  return self._root._h5file[self._h5path]

E AttributeError: 'NoneType' object has no attribute '_h5file'

```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7513/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1579956621 I_kwDOAMm_X85eLDmN 7519 Selecting variables from Dataset with view on dict keys is of type DataArray derhintze 25172489 closed 0     7 2023-02-10T16:02:19Z 2024-04-28T21:01:28Z 2024-04-28T21:01:27Z NONE      

What happened?

When selecting variables from a Dataset using a view on dict keys, the type returned is a DataArray, whereas the same using a list is a Dataset.

What did you expect to happen?

The type returned should be a Dataset.

Minimal Complete Verifiable Example

```Python
import xarray as xr

d = {"a": ("dim", range(1, 4)), "b": ("dim", range(2, 5))}

data = xr.Dataset(d)
select_dict = data[d.keys()]
select_list = data[list(d)]

reveal_type(select_dict)
reveal_type(select_list)
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```
$ mypy test.py
test.py:9: note: Revealed type is "xarray.core.dataarray.DataArray"
test.py:10: note: Revealed type is "xarray.core.dataset.Dataset"
Success: no issues found in 1 source file
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.9.10 (main, Mar 15 2022, 15:56:56) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.49.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.0 xarray: 2022.12.0 pandas: 1.5.2 numpy: 1.23.5 scipy: 1.10.0 netCDF4: 1.6.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.6.3 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 58.1.0 pip: 23.0 conda: None pytest: 7.2.1 mypy: 0.991 IPython: 8.8.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7519/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1024011835 I_kwDOAMm_X849CS47 5857 Incorrect results when using xarray.ufuncs.angle(..., deg=True) cvr 1119116 closed 0     4 2021-10-12T16:24:11Z 2024-04-28T20:58:55Z 2024-04-28T20:58:54Z NONE      

What happened:

The xarray.ufuncs.angle function is broken. According to the help docstring, one may use the option deg=True to get the result in degrees instead of radians (consistent with the numpy.angle function). Yet the results show that this is not the case. Moreover, specifying deg=True or deg=False leads to the same result, with the values in radians.

What you expected to happen:

To have the result of xarray.ufuncs.angle converted to degrees when option deg=True is specified.

Minimal Complete Verifiable Example:

```python
# Put your MCVE code here
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={'wd': ('wd', np.arange(0, 360, 30, dtype=float))})

Z = xr.ufuncs.exp(1j * xr.ufuncs.radians(ds.wd))

D = xr.ufuncs.angle(Z, deg=True)  # YIELDS INCORRECT RESULTS
if not np.allclose(ds.wd, (D % 360)):
    print(f"Issue with angle operation: {D.values % 360} instead of {ds.wd.values}"
          + f"\n\tERROR xr.ufuncs.angle(Z, deg=True) gives incorrect results !!!")

D = xr.ufuncs.degrees(xr.ufuncs.angle(Z))  # Works OK
if not np.allclose(ds.wd, (D % 360)):
    print(f"Issue with angle operation: {D % 360} instead of {ds.wd}"
          + f"\n\tERROR xr.ufuncs.degrees(xr.ufuncs.angle(Z)) gives incorrect results!!!")

D = xr.apply_ufunc(np.angle, Z, kwargs={'deg': True})  # Works OK
if not np.allclose(ds.wd, (D % 360)):
    print(f"Issue with angle operation: {D % 360} instead of {ds.wd}"
          + f"\n\tERROR xr.apply_ufunc(np.angle, Z, kwargs={{'deg': True}}) gives incorrect results!!!")
```

Anything else we need to know?:

Though xarray.ufuncs emits a deprecation warning stating that the numpy equivalent may be used, this is not true for numpy.angle. Example:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={'wd': ('wd', np.arange(0, 360, 30, dtype=float))})

Z = np.exp(1j * np.radians(ds.wd))
print(Z)
print(f"Is Z an XArray? {isinstance(Z, xr.DataArray)}")

D = np.angle(ds.wd, deg=True)
print(D)
print(f"Is D an XArray? {isinstance(D, xr.DataArray)}")
```

If this code is run, the result of `numpy.angle(xarray.DataArray)` is not a DataArray object, contrary to other numpy operations (for all versions of xarray I've used). Hence `xarray.ufuncs.angle` would be a great option, if it were not for the current problem.

Environment:

No issues with xarray versions 0.16.2 and 0.17.0. This error happens from 0.18.0 onwards, up to 0.19.0 (the most recent at the time of writing).

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.19.0-18-amd64 machine: x86_64 processor: byteorder: little LC_ALL: en_US.utf8 LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None xarray: 0.19.0 pandas: 1.2.3 numpy: 1.20.2 scipy: 1.5.3 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 58.2.0 pip: 21.3 conda: 4.10.3 pytest: None IPython: None sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5857/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1518812301 I_kwDOAMm_X85ahzyN 7414 Error using xarray.interp - function signature does not match with scipy.interpn Florian1209 20089326 closed 0     2 2023-01-04T11:30:48Z 2024-04-28T20:55:33Z 2024-04-28T20:55:33Z NONE      

What happened?

I am experiencing an error when using xarray's interp function. The error message indicates that the function signature does not match scipy's interpn.

It's linked to the scipy 1.10.0 update (2023/01/03).

What did you expect to happen?

I expected to interpolate 2D data of numpy float64: two arrays of latitudes and longitudes, each a `<xarray.DataArray (row: 32, col: 32)>`. `da` is an xarray dataset:

```
<xarray.Dataset>
Dimensions:  (lat: 721, lon: 1441)
Coordinates:
  * lat      (lat) float64 90.0 89.75 89.5 89.25 ... -89.25 -89.5 -89.75 -90.0
  * lon      (lon) float64 0.0 0.25 0.5 0.75 1.0 ... 359.2 359.5 359.8 360.0
Data variables:
    hgt      (lat, lon) >f4 13.61 13.61 13.61 13.61 ... -29.53 -29.53 -29.53
Attributes:
```

Minimal Complete Verifiable Example

```Python
interpolated_da = da.interp(
    {
        "x": xr.DataArray(x, dims=("x", "y")),
        "y": xr.DataArray(y, dims=("x", "y")),
    }
)
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
interpolated_da = da.interp(
venv/lib/python3.8/site-packages/xarray/core/dataset.py:3378: in interp
    variables[name] = missing.interp(var, var_indexers, method, kwargs)
venv/lib/python3.8/site-packages/xarray/core/missing.py:639: in interp
    interped = interp_func(
venv/lib/python3.8/site-packages/xarray/core/missing.py:764: in interp_func
    return _interpnd(var, x, new_x, func, kwargs)
venv/lib/python3.8/site-packages/xarray/core/missing.py:788: in _interpnd
    rslt = func(x, var, xi, kwargs)
venv/lib/python3.8/site-packages/scipy/interpolate/_rgi.py:654: in interpn
    return interp(xi)
venv/lib/python3.8/site-packages/scipy/interpolate/_rgi.py:336: in call
    result = evaluate_linear_2d(self.values,

???
E   TypeError: No matching signature found

_rgi_cython.pyx:19: TypeError
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None python: 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 5.4.0-135-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.0

xarray: 2022.12.0 pandas: 1.5.2 numpy: 1.22.4 scipy: 1.10.0 netCDF4: 1.6.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None rasterio: 1.3.4 cfgrib: None iris: None bottleneck: None dask: 2022.12.1 distributed: 2022.12.1 matplotlib: 3.6.2 cartopy: None seaborn: None numbagg: None fsspec: 2022.11.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 65.6.3 pip: 22.3.1 conda: None pytest: 7.2.0 mypy: None IPython: 8.7.0 sphinx: 5.3.0 None

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7414/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1039113959 I_kwDOAMm_X849757n 5913 Invalid characters in OpenDAP URL pmartineauGit 57886986 closed 0     5 2021-10-29T02:54:14Z 2024-04-28T20:55:17Z 2024-04-28T20:55:17Z NONE      

Hello,

I have successfully opened an OpenDAP URL with `ds = xarray.open_dataset(url)`. However, after selecting a subset with `ds = ds.isel(time=0)` and attempting to load the data with `ds.load()`, I get the following error:

HTTP Status 400 – Bad Request: Invalid character found in the request target. The valid characters are defined in RFC 7230 and RFC 3986

I suspect the reason is that square brackets are passed in the URL when attempting to load: ...zg_6hrPlevPt_MIROC6_historical_r1i1p1f1_gn_185001010600-185101010000.nc.dods?zg.zg[0][0:6][0:127][0:255]] because of the index selection with .isel()

In fact, some servers do forbid square brackets: https://www.unidata.ucar.edu/mailing_lists/archives/thredds/2020/msg00056.html

Would it be possible to provide an option to encode URLs? ( [ becomes %5B, and ] becomes %5D )

Or, instead of loading directly with ds.load(), is there a way for me to retrieve the URL with offending brackets that is generated automatically by xarray, encode it myself, and then use ds2 = xarray.load_dataset(encoded_url) to load?
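As an illustration of the requested encoding (this is not an existing xarray option; the URL below is abbreviated and hypothetical), the brackets could be percent-encoded manually before the request is sent:

```python
# Sketch only: percent-encode the square brackets in a DAP constraint expression.
from urllib.parse import quote

url = "https://server/thredds/dodsC/file.nc.dods?zg.zg[0][0:6][0:127][0:255]"  # hypothetical
encoded_url = url.replace("[", "%5B").replace("]", "%5D")

# Equivalently, quote() encodes everything except characters listed as safe:
base, _, query = url.partition("?")
encoded_url_2 = base + "?" + quote(query, safe=":/?&=.,")
print(encoded_url)
print(encoded_url_2)
```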

Thank you for your help!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5913/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  not_planned xarray 13221727 issue
1244977848 I_kwDOAMm_X85KNNq4 6629 `plot.imshow` with datetime coordinate fails shaharkadmiel 6872529 closed 0     5 2022-05-23T10:56:46Z 2024-04-28T20:16:44Z 2024-04-28T20:16:44Z NONE      

What happened?

When trying to plot a 2d DataArray that has one of the 2 coordinates as datetime with da.plot.imshow, the following error is returned:

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I know that I can use pcolormesh instead, but on large arrays imshow is much faster. It also behaves more nicely with transparency and interpolation, so for regularly sampled data I find imshow a better choice.

Here is a minimal working example:

```python
import numpy as np
from xarray import DataArray
from pandas import date_range

time = date_range('2020-01-01', periods=7, freq='D')
y = np.linspace(0, 10, 11)
da = DataArray(
    np.random.rand(time.size, y.size),
    coords=dict(time=time, y=y),
    dims=('time', 'y'),
)

da.plot.imshow(x='time', y='y')
```

What did you expect to happen?

I suggest the following solution which can be added after https://github.com/pydata/xarray/blob/4da7fdbd85bb82e338ad65a532dd7a9707e18ce0/xarray/plot/plot.py#L1366

```python
left, right = map(date2num, (left, right))
```

and then adding:

```python
ax.xaxis_date()
plt.setp(ax.get_xticklabels(), rotation=30, ha='right')
```

Minimal Complete Verifiable Example

```Python
import numpy as np
from xarray import DataArray
from pandas import date_range

# creating the data
time = date_range('2020-01-01', periods=7, freq='D')
y = np.linspace(0, 10, 11)
da = DataArray(
    np.random.rand(time.size, y.size),
    coords=dict(time=time, y=y),
    dims=('time', 'y'),
)

import matplotlib.pyplot as plt
from matplotlib.dates import date2num, AutoDateFormatter

# from https://github.com/pydata/xarray/blob/4da7fdbd85bb82e338ad65a532dd7a9707e18ce0/xarray/plot/plot.py#L1348
def _center_pixels(x):
    """Center the pixels on the coordinates."""
    if np.issubdtype(x.dtype, str):
        # When using strings as inputs imshow converts it to
        # integers. Choose extent values which puts the indices in
        # in the center of the pixels:
        return 0 - 0.5, len(x) - 0.5

    try:
        # Center the pixels assuming uniform spacing:
        xstep = 0.5 * (x[1] - x[0])
    except IndexError:
        # Arbitrary default value, similar to matplotlib behaviour:
        xstep = 0.1

    return x[0] - xstep, x[-1] + xstep

# Center the pixels:
left, right = _center_pixels(da.time)
top, bottom = _center_pixels(da.y)

# the magical step
# vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
left, right = map(date2num, (left, right))
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

# plotting
fig, ax = plt.subplots()
ax.imshow(
    da.T,
    extent=(left, right, top, bottom),
    origin='lower',
    aspect='auto',
)

ax.xaxis_date()
plt.setp(ax.get_xticklabels(), rotation=30, ha='right')
```

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python

TypeError Traceback (most recent call last) /var/folders/bj/czjbfh496258q1lc3p01lyz00000gn/T/ipykernel_59425/1460104966.py in <module> ----> 1 da.plot.imshow(x='time', y='y')

~/miniconda3/lib/python3.8/site-packages/xarray/plot/plot.py in plotmethod(_PlotMethods_obj, x, y, figsize, size, aspect, ax, row, col, col_wrap, xincrease, yincrease, add_colorbar, add_labels, vmin, vmax, cmap, colors, center, robust, extend, levels, infer_intervals, subplot_kws, cbar_ax, cbar_kwargs, xscale, yscale, xticks, yticks, xlim, ylim, norm, kwargs) 1306 for arg in ["_PlotMethods_obj", "newplotfunc", "kwargs"]: 1307 del allargs[arg] -> 1308 return newplotfunc(allargs) 1309 1310 # Add to class _PlotMethods

~/miniconda3/lib/python3.8/site-packages/xarray/plot/plot.py in newplotfunc(darray, x, y, figsize, size, aspect, ax, row, col, col_wrap, xincrease, yincrease, add_colorbar, add_labels, vmin, vmax, cmap, center, robust, extend, levels, infer_intervals, colors, subplot_kws, cbar_ax, cbar_kwargs, xscale, yscale, xticks, yticks, xlim, ylim, norm, kwargs) 1208 ax = get_axis(figsize, size, aspect, ax, subplot_kws) 1209 -> 1210 primitive = plotfunc( 1211 xplt, 1212 yplt,

~/miniconda3/lib/python3.8/site-packages/xarray/plot/plot.py in imshow(x, y, z, ax, kwargs) 1394 z[np.any(z.mask, axis=-1), -1] = 0 1395 -> 1396 primitive = ax.imshow(z, defaults) 1397 1398 # If x or y are strings the ticklabels have been replaced with

~/miniconda3/lib/python3.8/site-packages/matplotlib/_api/deprecation.py in wrapper(args, kwargs) 454 "parameter will become keyword-only %(removal)s.", 455 name=name, obj_type=f"parameter of {func.name}()") --> 456 return func(args, kwargs) 457 458 # Don't modify func's signature, as boilerplate.py needs it.

~/miniconda3/lib/python3.8/site-packages/matplotlib/init.py in inner(ax, data, args, kwargs) 1410 def inner(ax, args, data=None, kwargs): 1411 if data is None: -> 1412 return func(ax, *map(sanitize_sequence, args), kwargs) 1413 1414 bound = new_sig.bind(ax, args, *kwargs)

~/miniconda3/lib/python3.8/site-packages/matplotlib/axes/_axes.py in imshow(self, X, cmap, norm, aspect, interpolation, alpha, vmin, vmax, origin, extent, interpolation_stage, filternorm, filterrad, resample, url, **kwargs) 5450 # update ax.dataLim, and, if autoscaling, set viewLim 5451 # to tightly fit the image, regardless of dataLim. -> 5452 im.set_extent(im.get_extent()) 5453 5454 self.add_image(im)

~/miniconda3/lib/python3.8/site-packages/matplotlib/image.py in set_extent(self, extent) 980 self._extent = xmin, xmax, ymin, ymax = extent 981 corners = (xmin, ymin), (xmax, ymax) --> 982 self.axes.update_datalim(corners) 983 self.sticky_edges.x[:] = [xmin, xmax] 984 self.sticky_edges.y[:] = [ymin, ymax]

~/miniconda3/lib/python3.8/site-packages/matplotlib/axes/_base.py in update_datalim(self, xys, updatex, updatey) 2474 """ 2475 xys = np.asarray(xys) -> 2476 if not np.any(np.isfinite(xys)): 2477 return 2478 self.dataLim.update_from_data_xy(xys, self.ignore_existing_data_limits,

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' ```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:21:17) [Clang 11.1.0 ] python-bits: 64 OS: Darwin OS-release: 21.4.0 machine: arm64 processor: arm byteorder: little LC_ALL: None LANG: None LOCALE: (None, 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 2022.3.0 pandas: 1.3.4 numpy: 1.21.4 scipy: 1.7.3 netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.10.3 cftime: 1.6.0 nc_time_axis: 1.4.0 PseudoNetCDF: None rasterio: None cfgrib: 0.9.8.5 iris: None bottleneck: 1.3.2 dask: 2022.04.0 distributed: 2022.4.0 matplotlib: 3.5.0 cartopy: 0.20.2 seaborn: 0.11.2 numbagg: None fsspec: 2022.5.0 cupy: None pint: 0.18 sparse: None setuptools: 62.3.2 pip: 22.1.1 conda: 4.12.0 pytest: None IPython: 7.30.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6629/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
803075280 MDU6SXNzdWU4MDMwNzUyODA= 4880 Datetime as coordinaets does not convert back to datetime (returns int) feefladder 33122845 closed 0     6 2021-02-07T22:20:11Z 2024-04-28T20:13:33Z 2024-04-28T20:13:32Z CONTRIBUTOR      

What happened: the datetime was in np.datetime64 format. When converted to datetime.datetime format it returned an int. What you expected to happen: to get a datetime returned. Minimal Complete Verifiable Example:

```python
# Put your MCVE code here
import xarray as xr
import numpy as np
import pandas as pd  # needed for pd.date_range below (missing from the original snippet)
import datetime

date_frame = xr.DataArray(
    dims='time',
    coords={'time': pd.date_range('2000-01-01', periods=365)},
    data=np.zeros(365),
)
print('pandas date range (datetime): ', pd.date_range('2000-01-01', periods=365)[0])
print('dataframe datetime converted to datetime (int): ',
      date_frame.coords['time'].data[0].astype(datetime.datetime))
print("normal numpy datetime64 converted to datetime (datetime): ",
      np.datetime64(datetime.datetime(2000, 1, 1)).astype(datetime.datetime))

# output:
# pandas date range (datetime):  2000-01-01 00:00:00
# dataframe datetime converted to datetime (int):  946684800000000000
# normal numpy datetime64 converted to datetime (datetime):  2000-01-01 00:00:00
```

If converted to int, it also gives different lengths of int: 946684800000000000 for date_frame vs 946684800000000 for a normal datetime64. Anything else we need to know?:

It is also mentioned in this SO thread; it appears to be a problem in the datetime64 conversion....

numpy version 1.20.0 pandas version 1.2.1
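For reference, a small sketch of conversions that do round-trip to datetime.datetime (the key point being that nanosecond-precision datetime64 values cannot be represented as datetime.datetime, so .astype(datetime.datetime) falls back to an integer):

```python
import datetime
import numpy as np
import pandas as pd
import xarray as xr

date_frame = xr.DataArray(
    dims='time',
    coords={'time': pd.date_range('2000-01-01', periods=365)},
    data=np.zeros(365),
)
t0 = date_frame.coords['time'].data[0]  # numpy.datetime64 with nanosecond precision

# Casting down to microsecond (or coarser) precision first gives a datetime.datetime:
print(t0.astype('datetime64[us]').astype(datetime.datetime))  # 2000-01-01 00:00:00

# pandas handles the conversion directly:
print(pd.Timestamp(t0).to_pydatetime())  # 2000-01-01 00:00:00
```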

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.7.9 | packaged by conda-forge | (default, Dec 9 2020, 21:08:20) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.4.0-59-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: None libnetcdf: None xarray: 0.16.2 pandas: 1.2.1 numpy: 1.20.0 scipy: None netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.6.1 cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.01.1 distributed: 2021.01.1 matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20210108 pip: 21.0.1 conda: None pytest: None IPython: 7.20.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4880/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1402002645 I_kwDOAMm_X85TkNzV 7146 Segfault writing large netcdf files to s3fs d1mach 11075246 closed 0     17 2022-10-08T16:56:31Z 2024-04-28T20:11:59Z 2024-04-28T20:11:59Z NONE      

What happened?

It seems netCDF4 does not currently work well with s3fs, the FUSE filesystem layer over S3-compatible storage, with either the default netcdf4 engine or with h5netcdf.

Here is an example:

```python
import numpy as np
import xarray as xr
from datetime import datetime, timedelta

NTIMES = 48
start = datetime(2022, 10, 6, 0, 0)
time_vals = [start + timedelta(minutes=20 * t) for t in range(NTIMES)]
times = xr.DataArray(data=[t.strftime('%Y%m%d%H%M%S').encode() for t in time_vals], dims=['Time'])
v1 = xr.DataArray(data=np.zeros((len(times), 201, 201)), dims=['Time', 'x', 'y'])
ds = xr.Dataset(data_vars=dict(times=times, v1=v1))
ds.to_netcdf(path='/my_s3_fs/test_netcdf.nc', format='NETCDF4', mode='w')
```

On my system this code crashes with NTIMES=48, but completes without an error with NTIMES=24.

The output with NTIMES=48 is

``` There are 1 HDF5 objects open!

Report: open objects on 72057594037927936 Segmentation fault (core dumped) ```

I have tried the other engine that handles NETCDF4 in xarray with engine='h5netcdf' and also got a segfault.

A quick workaround seems to be to use the local filesystem to write the NetCDF file and then move the complete file to S3.

```python
import shutil  # needed for the move below

ds.to_netcdf(path='/tmp/test_netcdf.nc', format='NETCDF4', mode='w')
shutil.move('/tmp/test_netcdf.nc', '/my_s3_fs/test_netcdf.nc')
```

There are several pieces of software involved here: the xarray package (0.16.1), netcdf4 (1.5.4), HDF5 (1.10.6), and s3fs (1.79). If this is not a bug in my code but in the underlying libraries, it is most likely not an xarray bug, but since it fails with both NetCDF4 engines, I decided to report it here.
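The same workaround can also be sketched without going through the FUSE mount, by writing locally and uploading with s3fs directly (bucket and key below are hypothetical; `ds` is the dataset constructed in the example above):

```python
import s3fs

# `ds` is the Dataset built in the example above.
ds.to_netcdf(path='/tmp/test_netcdf.nc', format='NETCDF4', mode='w')

fs = s3fs.S3FileSystem()
fs.put('/tmp/test_netcdf.nc', 'my-bucket/test_netcdf.nc')  # hypothetical bucket/key
```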

What did you expect to happen?

With NTIMES=24 I am getting a file /my_s3_fs/test_netcdf.nc of about 7.8 MBytes. With NTIMES=36 I get an empty file. I would expect this code to run without a segfault and produce a non-empty file.

Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr
from datetime import datetime, timedelta

NTIMES = 48
start = datetime(2022, 10, 6, 0, 0)
time_vals = [start + timedelta(minutes=20 * t) for t in range(NTIMES)]
times = xr.DataArray(data=[t.strftime('%Y%m%d%H%M%S').encode() for t in time_vals], dims=['Time'])
v1 = xr.DataArray(data=np.zeros((len(times), 201, 201)), dims=['Time', 'x', 'y'])
ds = xr.Dataset(data_vars=dict(times=times, v1=v1))
ds.to_netcdf(path='/my_s3_fs/test_netcdf.nc', format='NETCDF4', mode='w')
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python There are 1 HDF5 objects open!

Report: open objects on 72057594037927936 Segmentation fault (core dumped) ```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.8.3 | packaged by conda-forge | (default, Jun 1 2020, 17:43:00) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 5.4.0-26-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: None LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.1 pandas: 1.1.3 numpy: 1.19.1 scipy: 1.5.2 netCDF4: 1.5.4 pydap: None h5netcdf: 1.0.2 h5py: 3.1.0 Nio: None zarr: None cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.30.0 distributed: None matplotlib: 3.3.1 cartopy: None seaborn: None numbagg: None pint: None setuptools: 50.3.0.post20201006 pip: 20.2.3 conda: 22.9.0 pytest: 6.1.1 IPython: 7.18.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7146/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1437481995 I_kwDOAMm_X85VrjwL 7259 🐛 NetCDF4 RuntimeWarning if xarray is imported before netCDF4 s-weigand 9513634 open 0     18 2022-11-06T17:38:35Z 2024-04-28T20:11:35Z   CONTRIBUTOR      

What happened?

Yesterday we got a dependabot update PR to upgrade xarray from 2022.10.0 to 2022.11.0 and a test where we check for our own deprecation warnings failed because there was an additional unexpected warning. After some debugging we found that the warning was caused by calling xarray.Dataset.to_netcdf for the first time in our test suite, but did not trigger when calling it again.

After a lot of head-scratching and confusion, we found that it is an import order problem that can be solved by importing netCDF4 before importing xarray (we didn't import netCDF4 at all in our code).

What did you expect to happen?

No RuntimeWarning from netCDF4

Minimal Complete Verifiable Example

```Python
import xarray
import warnings
warnings.filterwarnings('error')
import netCDF4
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

Python Traceback (most recent call last): File "d:\git\pyglotaran\glotaran\builtin\io\folder\test\test_folder_plugin.py", line 86, in <module> save_result(result_path="foo", format_name="folder", result=result) File "D:\git\pyglotaran\glotaran\plugin_system\io_plugin_utils.py", line 87, in wrapper return func(*args, **kwargs) File "D:\git\pyglotaran\glotaran\plugin_system\project_io_registration.py", line 473, in save_result paths = io.save_result( # type: ignore[call-arg] File "D:\git\pyglotaran\glotaran\builtin\io\folder\folder_plugin.py", line 192, in save_result save_dataset( File "D:\git\pyglotaran\glotaran\plugin_system\io_plugin_utils.py", line 87, in wrapper return func(*args, **kwargs) File "D:\git\pyglotaran\glotaran\plugin_system\data_io_registration.py", line 242, in save_dataset io.save_dataset( # type: ignore[call-arg] File "D:\git\pyglotaran\glotaran\builtin\io\netCDF\netCDF.py", line 24, in save_dataset data_to_save.to_netcdf(file_name, mode="w") File "C:\Anaconda3\envs\pyglotaran310\lib\site-packages\xarray\core\dataset.py", line 1903, in to_netcdf return to_netcdf( # type: ignore # mypy cannot resolve the overloads:( File "C:\Anaconda3\envs\pyglotaran310\lib\site-packages\xarray\backends\api.py", line 1176, in to_netcdf engine = _get_default_engine(path_or_file) File "C:\Anaconda3\envs\pyglotaran310\lib\site-packages\xarray\backends\api.py", line 140, in _get_default_engine return _get_default_engine_netcdf() File "C:\Anaconda3\envs\pyglotaran310\lib\site-packages\xarray\backends\api.py", line 118, in _get_default_engine_netcdf import netCDF4 # noqa: F401 File "C:\Anaconda3\envs\pyglotaran310\lib\site-packages\netCDF4\__init__.py", line 3, in <module> from ._netCDF4 import * File "src\netCDF4\_netCDF4.pyx", line 1, in init netCDF4._netCDF4 RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 16 from C header, got 96 from PyObject

Anything else we need to know?

The problem can be reproduced by running

```console
python -c "import xarray;import warnings;warnings.filterwarnings('error');import netCDF4"
```

which throws the error python Traceback (most recent call last): File "<string>", line 1, in <module> File "C:\Anaconda3\envs\xarray\lib\site-packages\netCDF4\__init__.py", line 3, in <module> from ._netCDF4 import * File "src\netCDF4\_netCDF4.pyx", line 1, in init netCDF4._netCDF4 RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 16 from C header, got 96 from PyObject

When importing netCDF4 first, all runs as expected:

```console
python -c "import netCDF4;import xarray;import warnings;warnings.filterwarnings('error');import netCDF4"
```
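In application or test code, the workaround described above amounts to a sketch like the following (importing netCDF4 eagerly so that xarray's lazy import inside to_netcdf does not trigger the RuntimeWarning under an already-active strict warnings filter):

```python
# Sketch of the import-order workaround; the noqa comment marks an intentionally unused import.
import netCDF4  # noqa: F401
import xarray as xr
```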

A git bisect shows that the first commit with this problem was https://github.com/pydata/xarray/commit/f32d354e295c05fb5c5ece7862f77f19d82d5894

```console $ git bisect start status: waiting for both good and bad commits (xarray) /d/git/xarray (main|BISECTING) $ git bisect good v2022.10.0 status: waiting for bad commit, 1 good commit known (xarray) /d/git/xarray (main|BISECTING) $ git bisect bad v2022.11.0 Bisecting: 23 revisions left to test after this (roughly 5 steps) [4944b9eb1483c1fbd0e86fd12f3fb894b325fb8d] Fix binning when labels are provided. (#7205) (xarray) /d/git/xarray ((4944b9eb...)|BISECTING) $ git bisect run python -c "import xarray;import warnings;warnings.filterwarnings('error');import netCDF4" running 'python' '-c' 'import xarray;import warnings;warnings.filterwarnings('\''error'\'');import netCDF4' Bisecting: 11 revisions left to test after this (roughly 4 steps) [f32d354e295c05fb5c5ece7862f77f19d82d5894] Lazy Imports (#7179) running 'python' '-c' 'import xarray;import warnings;warnings.filterwarnings('\''error'\'');import netCDF4' Traceback (most recent call last): File "<string>", line 1, in <module> File "C:\Anaconda3\envs\xarray\lib\site-packages\netCDF4\__init__.py", line 3, in <module> from ._netCDF4 import * File "src\netCDF4\_netCDF4.pyx", line 1, in init netCDF4._netCDF4 RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 16 from C header, got 96 from PyObject Bisecting: 5 revisions left to test after this (roughly 3 steps) [b9aedd0155548ed0f34506ecc255b1688f07ffaa] set_coords docs: see also Dataset.assign_coords (#7230) running 'python' '-c' 'import xarray;import warnings;warnings.filterwarnings('\''error'\'');import netCDF4' Bisecting: 2 revisions left to test after this (roughly 2 steps) [65bfa4d10a529f00a9f9b145d1cea402bdae83d0] Actually make the fast code path return early for Aligner.align (#7222) running 'python' '-c' 'import xarray;import warnings;warnings.filterwarnings('\''error'\'');import netCDF4' Bisecting: 0 revisions left to test after this (roughly 1 step) [fc9026b59d38146a21769cc2d3026a12d58af059] Avoid loading any data for reprs (#7203) running 'python' '-c' 'import xarray;import warnings;warnings.filterwarnings('\''error'\'');import netCDF4' f32d354e295c05fb5c5ece7862f77f19d82d5894 is the first bad commit commit f32d354e295c05fb5c5ece7862f77f19d82d5894 Author: Mick <mick.niklas@gmail.com> Date: Fri Oct 28 18:25:39 2022 +0200 Lazy Imports (#7179) * fix typing of BackendEntrypoint * make backends lazy * make matplotlib lazy and add tests for lazy modules * make flox lazy * fix generated docs on windows... 
* try fixing test * make pycompat lazy * make dask.array lazy * add import xarray without numpy or pandas benchmark * improve error reporting in test * fix import benchmark * add lazy import to whats-new * fix lazy import test * fix typos * fix windows stuff again asv_bench/benchmarks/import.py | 12 +- doc/whats-new.rst | 2 + xarray/backends/cfgrib_.py | 27 ++-- xarray/backends/common.py | 15 ++- xarray/backends/h5netcdf_.py | 19 ++- xarray/backends/netCDF4_.py | 16 +-- xarray/backends/pseudonetcdf_.py | 13 +- xarray/backends/pydap_.py | 24 ++-- xarray/backends/pynio_.py | 13 +- xarray/backends/scipy_.py | 12 +- xarray/backends/zarr.py | 15 +-- xarray/convert.py | 3 +- xarray/core/_aggregations.py | 247 ++++++++++++++++++++++++++++------- xarray/core/dataset.py | 3 +- xarray/core/duck_array_ops.py | 31 +++-- xarray/core/formatting.py | 36 ++--- xarray/core/indexing.py | 6 +- xarray/core/missing.py | 4 +- xarray/core/parallel.py | 20 +-- xarray/core/pycompat.py | 20 ++- xarray/core/utils.py | 19 +++ xarray/core/variable.py | 15 +-- xarray/plot/utils.py | 9 +- xarray/tests/test_backends.py | 4 +- xarray/tests/test_computation.py | 4 +- xarray/tests/test_dask.py | 3 +- xarray/tests/test_dataset.py | 4 +- xarray/tests/test_duck_array_ops.py | 4 +- xarray/tests/test_missing.py | 4 +- xarray/tests/test_plugins.py | 61 ++++++++- xarray/tests/test_sparse.py | 4 +- xarray/tests/test_variable.py | 4 +- xarray/util/generate_aggregations.py | 13 +- 33 files changed, 445 insertions(+), 241 deletions(-) bisect found first bad commit ```

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.6 | packaged by conda-forge | (main, Oct 7 2022, 20:14:50) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: AMD64 Family 23 Model 8 Stepping 2, AuthenticAMD byteorder: little LC_ALL: None LANG: de_DE.UTF-8 LOCALE: ('de_DE', 'cp1252') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 2022.11.0 pandas: 1.5.1 numpy: 1.23.4 scipy: 1.9.3 netCDF4: 1.6.1 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.6.1 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 59.8.0 pip: 22.2.2 conda: None pytest: 7.1.3 IPython: 8.5.0 sphinx: 5.2.3 None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7259/reactions",
    "total_count": 12,
    "+1": 8,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 4
}
    xarray 13221727 issue
2132500634 I_kwDOAMm_X85_G2Ca 8742 Y-axis is reversed when using to_zarr() alistaireverett 7837535 closed 0     3 2024-02-13T14:48:30Z 2024-04-28T20:08:13Z 2024-04-28T20:08:13Z NONE      

What happened?

When I export a dataset to NetCDF and Zarr, the y axis of the Zarr output appears to have been reversed according to gdalinfo. I also cannot build a VRT file from the Zarr file since it complains about a positive NS axis, but this works fine with the NetCDF file.

Example NetCDF file as input: in.nc.zip

gdalinfo on output NetCDF file: $ gdalinfo NETCDF:out.nc:air_temperature_2m Driver: netCDF/Network Common Data Format Files: out.nc out.nc.aux.xml Size is 949, 1069 Coordinate System is: PROJCRS["unnamed", BASEGEOGCRS["unknown", DATUM["unnamed", ELLIPSOID["Sphere",6371000,0, LENGTHUNIT["metre",1, ID["EPSG",9001]]]], PRIMEM["Greenwich",0, ANGLEUNIT["degree",0.0174532925199433, ID["EPSG",9122]]]], CONVERSION["unnamed", METHOD["Lambert Conic Conformal (2SP)", ID["EPSG",9802]], PARAMETER["Latitude of false origin",63.3, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8821]], PARAMETER["Longitude of false origin",15, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8822]], PARAMETER["Latitude of 1st standard parallel",63.3, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8823]], PARAMETER["Latitude of 2nd standard parallel",63.3, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8824]], PARAMETER["Easting at false origin",0, LENGTHUNIT["metre",1], ID["EPSG",8826]], PARAMETER["Northing at false origin",0, LENGTHUNIT["metre",1], ID["EPSG",8827]]], CS[Cartesian,2], AXIS["easting",east, ORDER[1], LENGTHUNIT["metre",1, ID["EPSG",9001]]], AXIS["northing",north, ORDER[2], LENGTHUNIT["metre",1, ID["EPSG",9001]]]] Data axis to CRS axis mapping: 1,2 Origin = (-1061334.000000000000000,1338732.125000000000000) Pixel Size = (2500.000000000000000,-2500.000000000000000) Metadata: air_temperature_2m#coordinates=longitude latitude air_temperature_2m#grid_mapping=projection_lambert air_temperature_2m#long_name=Screen level temperature (T2M) air_temperature_2m#standard_name=air_temperature air_temperature_2m#units=K air_temperature_2m#_FillValue=9.96921e+36 height1#description=height above ground height1#long_name=height height1#positive=up height1#units=m height1#_FillValue=nan NC_GLOBAL#coordinates=projection_lambert time NETCDF_DIM_EXTRA={height1} NETCDF_DIM_height1_DEF={1,5} NETCDF_DIM_height1_VALUES=2 projection_lambert#earth_radius=6371000 projection_lambert#grid_mapping_name=lambert_conformal_conic projection_lambert#latitude_of_projection_origin=63.3 projection_lambert#longitude_of_central_meridian=15 projection_lambert#standard_parallel={63.3,63.3} x#long_name=x-coordinate in Cartesian system x#standard_name=projection_x_coordinate x#units=m x#_FillValue=nan y#long_name=y-coordinate in Cartesian system y#standard_name=projection_y_coordinate y#units=m y#_FillValue=nan Geolocation: LINE_OFFSET=0 LINE_STEP=1 PIXEL_OFFSET=0 PIXEL_STEP=1 SRS=GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]] X_BAND=1 X_DATASET=NETCDF:"out.nc":longitude Y_BAND=1 Y_DATASET=NETCDF:"out.nc":latitude Corner Coordinates: Upper Left (-1061334.000, 1338732.125) ( 18d10'24.02"W, 72d45'59.56"N) Lower Left (-1061334.000,-1333767.875) ( 0d15'55.60"E, 50d18'23.10"N) Upper Right ( 1311166.000, 1338732.125) ( 54d17'24.85"E, 71d34'43.38"N) Lower Right ( 1311166.000,-1333767.875) ( 33d 2'20.10"E, 49d45' 6.51"N) Center ( 124916.000, 2482.125) ( 17d30' 3.21"E, 63d18' 1.50"N) Band 1 Block=949x1069 Type=Float32, ColorInterp=Undefined Min=236.480 Max=284.937 Minimum=236.480, Maximum=284.937, Mean=269.816, StdDev=9.033 NoData Value=9.96920996838686905e+36 Unit Type: K Metadata: coordinates=longitude latitude grid_mapping=projection_lambert long_name=Screen level temperature (T2M) 
NETCDF_DIM_height1=2 NETCDF_VARNAME=air_temperature_2m standard_name=air_temperature STATISTICS_MAXIMUM=284.93682861328 STATISTICS_MEAN=269.81614967971 STATISTICS_MINIMUM=236.47978210449 STATISTICS_STDDEV=9.0332172122638 units=K _FillValue=9.96921e+36

gdalinfo on output Zarr file: $ gdalinfo ZARR:out.zarr:/air_temperature_2m:0 Driver: Zarr/Zarr Files: none associated Size is 949, 1069 Origin = (-1061334.000000000000000,-1333767.875000000000000) Pixel Size = (2500.000000000000000,2500.000000000000000) Metadata: coordinates=longitude latitude grid_mapping=projection_lambert long_name=Screen level temperature (T2M) standard_name=air_temperature Corner Coordinates: Upper Left (-1061334.000,-1333767.875) Lower Left (-1061334.000, 1338732.125) Upper Right ( 1311166.000,-1333767.875) Lower Right ( 1311166.000, 1338732.125) Center ( 124916.000, 2482.125) Band 1 Block=475x268 Type=Float32, ColorInterp=Undefined NoData Value=9.96920996838686905e+36 Unit Type: K

The main issue is that the origin and y-axis direction are reversed, as you can see from the origin and pixel size. I have tried taking the CRS from the NetCDF and adding it to the Zarr file as a _CRS attribute manually, but this doesn't make any difference to the origin or pixel size.

What did you expect to happen?

Origin, pixel size and corner coords should match those in the netcdf file.

$ gdalinfo ZARR:out.zarr:/air_temperature_2m:0 Driver: Zarr/Zarr Files: none associated Size is 949, 1069 Origin = (-1061334.000000000000000,1338732.125000000000000) Pixel Size = (2500.000000000000000,-2500.000000000000000) Metadata: coordinates=longitude latitude grid_mapping=projection_lambert long_name=Screen level temperature (T2M) standard_name=air_temperature Corner Coordinates: Corner Coordinates: Upper Left (-1061334.000, 1338732.125) ( 18d10'24.02"W, 72d45'59.56"N) Lower Left (-1061334.000,-1333767.875) ( 0d15'55.60"E, 50d18'23.10"N) Upper Right ( 1311166.000, 1338732.125) ( 54d17'24.85"E, 71d34'43.38"N) Lower Right ( 1311166.000,-1333767.875) ( 33d 2'20.10"E, 49d45' 6.51"N) Center ( 124916.000, 2482.125) ( 17d30' 3.21"E, 63d18' 1.50"N) Band 1 Block=475x268 Type=Float32, ColorInterp=Undefined NoData Value=9.96920996838686905e+36 Unit Type: K

Minimal Complete Verifiable Example

```Python
import xarray as xr
from pyproj import CRS

ds = xr.open_dataset("in.nc")

# Optionally copy the CRS to Zarr (produces an error, but does work)
crs_wkt = CRS.from_cf(ds["projection_lambert"].attrs).to_wkt()
ds["air_temperature_2m"] = ds["air_temperature_2m"].assign_attrs(_CRS={"wkt": crs_wkt})

ds.to_zarr("out.zarr")

ds.to_netcdf("out.nc")
```
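A small diagnostic sketch (using the files written above): checking whether xarray itself preserves the y coordinate order on the Zarr round-trip, which would isolate the discrepancy to how GDAL interprets the Zarr metadata rather than to the stored values:

```Python
import numpy as np
import xarray as xr

ds_nc = xr.open_dataset("out.nc")
ds_zarr = xr.open_zarr("out.zarr")

# True would mean the coordinate values (and their order) survive the round-trip unchanged.
print(np.array_equal(ds_nc["y"].values, ds_zarr["y"].values))
```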

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] python-bits: 64 OS: Linux OS-release: 6.5.0-15-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.3-development xarray: 2024.1.1 pandas: 2.2.0 numpy: 1.26.4 scipy: None netCDF4: 1.6.5 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.16.1 cftime: 1.6.3 nc_time_axis: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 68.2.2 pip: 23.3.1 conda: None pytest: None mypy: None IPython: None sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8742/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1680031454 I_kwDOAMm_X85kIz7e 7780 mypy does not understand output of binary operations Illviljan 14371165 open 0     8 2023-04-23T13:38:55Z 2024-04-28T20:07:04Z   MEMBER      

What happened?

When doing operations on numpy arrays and xarray variables, mypy does not understand that the output is always an xarray Variable regardless of the operand order. See the example.

What did you expect to happen?

mypy to pass for the example code.

Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr

x = np.array([1, 2, 4])
v = xr.Variable(["x"], x)

# numpy first:
xv = x * v
xv.values  # error: "ndarray[Any, dtype[bool_]]" has no attribute "values"  [attr-defined]
if isinstance(xv, xr.Variable):
    xv.values

# variable first:
vx = v * x
vx.values
if isinstance(vx, xr.Variable):
    vx.values
```
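Until the overloads are fixed, a common workaround sketch is an explicit cast (or the isinstance narrowing shown above), which keeps mypy happy without changing runtime behaviour:

```Python
from typing import cast

import numpy as np
import xarray as xr

x = np.array([1, 2, 4])
v = xr.Variable(["x"], x)

xv = cast(xr.Variable, x * v)  # tell mypy what the runtime type actually is
xv.values
```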

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

Seen in #7741

Environment

xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.9.16 (main, Mar 8 2023, 10:39:24) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel byteorder: little LC_ALL: None LANG: en libhdf5: 1.10.6 libnetcdf: None xarray: 2023.4.2 pandas: 2.0.0 numpy: 1.23.5 scipy: 1.10.1 netCDF4: None pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: None dask: 2023.4.0 distributed: 2023.4.0 matplotlib: 3.5.3 cartopy: None seaborn: 0.12.2 numbagg: None fsspec: 2023.4.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 67.7.1 pip: 23.1.1 conda: 23.3.1 pytest: 7.3.1 mypy: 1.2.0 IPython: 8.12.0 sphinx: 6.1.3
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7780/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2188557281 I_kwDOAMm_X86Ccrvh 8842 Opening zarr dataset with poor connection leads to NaN chunks renaudjester 38732257 open 0     21 2024-03-15T13:47:18Z 2024-04-28T20:05:15Z   NONE      

Problem

I am using xarray to open zarr datasets located in an s3 bucket. However, it can happen that the result doesn't include all the chunks and we get NaNs instead. It is usually linked with a low-bandwidth internet connection and requesting a lot of chunks.

More details

In our case (see code below), we started tracking the HTTP calls (with HTTP tracking software) to understand the problem better. Three cases are possible for the response when getting a chunk:
- 200: we get the chunk with the data
- 403: missing data; this is normal since I am dealing with ocean data, so the chunks associated with the continents don't exist
- no response: there isn't even a response, so the GET request "fails" and we don't have the data.

The latter is a big problem as we get randomly empty chunks! As a user it is also very annoying to detect.

We also noticed that when using xarray.open_dataset the calls seem to be made all at the same time, which increases the probability of NaN chunks. That's why we tried using xarray.open_mfdataset, since each worker makes the GET requests, i.e. fetches the chunks, one by one.

Questions

  • Why does xarray.open_dataset send all the requests concurrently? Is it possible to control the number of requests and do some kind of rolling batch gather?
  • Is there a way to raise an exception when there is no response from the server? That way, as users, we at least don't have to check the data manually.
  • Any ideas on how to solve this problem?
  • Maybe this is linked to the zarr library?

To reproduce

This bug is difficult to reproduce. The only way I managed to reproduce it is with a computer connected to a phone on 3G. With that setup it happens every time; with a good connection on my computer it never happens. We have had several reports of this problem from others, though.

See the two scripts: one with open_dataset

```
import xarray as xr
import matplotlib.pyplot as plt
import time
import sys

import logging
logging.basicConfig(
    stream=sys.stdout,
    format="%(asctime)s | %(name)14s | %(levelname)7s | %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S",
    encoding="utf-8",
    level=logging.ERROR,
)
logging.getLogger("timeloop").setLevel(logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.DEBUG)
logging.getLogger("botocore").setLevel(logging.DEBUG)
logging.getLogger("s3fs").setLevel(logging.DEBUG)
logging.getLogger("fsspec").setLevel(logging.DEBUG)
logging.getLogger("asyncio").setLevel(logging.DEBUG)

logging.getLogger("numba").setLevel(logging.ERROR)

logging.getLogger("s3transfer").setLevel(logging.DEBUG)

start_time = time.time()

print("Starting...")
data = xr.open_dataset("https://s3.waw3-1.cloudferro.com/mdl-arco-geo-012/arco/GLOBAL_ANALYSISFORECAST_PHY_001_024/cmems_mod_glo_phy-thetao_anfc_0.083deg_P1D-m_202211/geoChunked.zarr", engine="zarr")

print("Dataset opened...")
bla = data.thetao.sel(
    longitude=slice(-170.037309004901026, -70.037309004901026),
    latitude=slice(-80.27257431850789, -40.27257431850789),
    time=slice("2023-03-20T00:00:00", "2023-03-20T00:00:00"),
).sel(elevation=0, method="nearest")

print("Plotting... ")
map = bla.isel(time=0).plot()

# map = data.isel(time=0).plot()

print("Saving image...")
plt.savefig("./bla_fast.png")

print("Total processing time:", (time.time() - start_time))
```

and the other one with `open_mfdataset`:

```
import xarray as xr
import matplotlib.pyplot as plt
import time
import sys
import dask

import logging
logging.basicConfig(
    stream=sys.stdout,
    format="%(asctime)s | %(name)14s | %(levelname)7s | %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S",
    encoding="utf-8",
    level=logging.ERROR,
)
logging.getLogger("timeloop").setLevel(logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.DEBUG)
logging.getLogger("botocore").setLevel(logging.DEBUG)
logging.getLogger("s3fs").setLevel(logging.DEBUG)
logging.getLogger("fsspec").setLevel(logging.DEBUG)
logging.getLogger("asyncio").setLevel(logging.DEBUG)

logging.getLogger("numba").setLevel(logging.ERROR)

logging.getLogger("s3transfer").setLevel(logging.DEBUG)

start_time = time.time()

with dask.config.set(num_workers=2):

print("Starting...")
data = xr.open_mfdataset(["https://s3.waw3-1.cloudferro.com/mdl-arco-geo-012/arco/GLOBAL_ANALYSISFORECAST_PHY_001_024/cmems_mod_glo_phy-thetao_anfc_0.083deg_P1D-m_202211/geoChunked.zarr"], 
engine = "zarr")#.thetao.sel(longitude = slice(-170.037309004901026,-70.037309004901026),
            #latitude=slice(-80.27257431850789,-40.27257431850789),
            #time=slice("2023-03-20T00:00:00","2023-03-20T00:00:00")).sel(elevation =0,     #method="nearest")

print("Dataset opened...")
bla = data.thetao.sel(longitude = slice(-170.037309004901026,-70.037309004901026),
            latitude=slice(-80.27257431850789,-40.27257431850789),
            time=slice("2023-03-20T00:00:00","2023-03-20T00:00:00")).sel(elevation =0,
method="nearest")


print("Plotting... ")
map = bla.isel(time=0).plot()

#map = data.isel(time=0).plot()

print("Saving image...")
plt.savefig("./bla_long.png")

print("Total processing time:", (time.time() - start_time)) ```
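As a stop-gap for the silent-NaN problem described above, a hedged detection sketch (reusing `data` and the selection pattern from the scripts above; the check is only meaningful for a region that should contain ocean data):

```
import numpy as np

# `data` is the dataset opened in the scripts above.
subset = data.thetao.sel(
    longitude=slice(-170.0, -70.0),
    latitude=slice(-80.0, -40.0),
    time=slice("2023-03-20", "2023-03-20"),
).sel(elevation=0, method="nearest").load()

# An entirely-NaN result over an ocean region suggests chunks were silently dropped.
if np.isnan(subset.values).all():
    raise RuntimeError("All values are NaN -- some chunks may not have been fetched")
```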

Expected result

or failed run

Obtained result

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8842/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2021386895 PR_kwDOAMm_X85g7QZD 8500 Deprecate ds.dims returning dict TomNicholas 35968931 closed 0     1 2023-12-01T18:29:28Z 2024-04-28T20:04:00Z 2023-12-06T17:52:24Z MEMBER   0 pydata/xarray/pulls/8500
  • [x] Closes first step of #8496, would require another PR later to actually change the return type. Also really resolves the second half of #921.
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8500/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2247914876 I_kwDOAMm_X86F_HV8 8950 ENH: Make `_to_dataframe` faster for extension array columns after `pandas` fix ilan-gold 43999641 open 0     0 2024-04-17T10:10:37Z 2024-04-28T20:03:23Z   CONTRIBUTOR      

What is your issue?

Once https://github.com/pandas-dev/pandas/issues/57676 is completed, we should be able to do the joins in the _to_dataframe method faster (we need to be able to handle the singleton case, which is the issue with pandas): https://github.com/pydata/xarray/blob/239309f881ba0d7e02280147bc443e6e286e6a63/xarray/core/dataset.py#L7170-L7177

see discussion here

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8950/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2115555965 I_kwDOAMm_X85-GNJ9 8695 Return a 3D object alongside 1D object in apply_ufunc ahuang11 15331990 closed 0     7 2024-02-02T18:47:14Z 2024-04-28T19:59:31Z 2024-04-28T19:59:31Z CONTRIBUTOR      

Is your feature request related to a problem?

Currently, I have something similar to this, where the input_lat is transformed to new_lat (here, +0.25, but in the real use case it's not deterministic).

Since apply_ufunc doesn't return a dataset with actual coordinate values, I had to return a second output to retain new_lat and properly update the coordinate values, but this second output is shaped (time, lat, lon), so I have to do ds["lat"] = new_lat.isel(lon=0, time=0).values, which I think is inefficient; I simply need it to be shaped (lat,).

Any ideas on how I can modify this to make it more efficient?

```python
import xarray as xr
import numpy as np

air = xr.tutorial.open_dataset("air_temperature")["air"]
input_lat = np.arange(20, 45)

def interp1d_np(data, base_lat, input_lat):
    new_lat = input_lat + 0.25
    return np.interp(new_lat, base_lat, data), new_lat

ds, new_lat = xr.apply_ufunc(
    interp1d_np,  # first the function
    air,
    air.lat,  # as above
    input_lat,  # as above
    input_core_dims=[["lat"], ["lat"], ["lat"]],  # list with one entry per arg
    output_core_dims=[["lat"], ["lat"]],  # returned data has one dimension
    exclude_dims=set(("lat",)),  # dimensions allowed to change size. Must be a set!
    vectorize=True,  # loop over non-core dims
)
new_lat = new_lat.isel(lon=0, time=0).values
ds["lat"] = new_lat
```

Describe the solution you'd like

Either be able to automatically assign the new_lat to the returned xarray object, or allow a 1D dataset to be returned

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8695/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
576337745 MDU6SXNzdWU1NzYzMzc3NDU= 3831 Errors using to_zarr for an s3 store JarrodBWong 15351025 closed 0     15 2020-03-05T15:30:40Z 2024-04-28T19:59:02Z 2024-04-28T19:59:02Z NONE      

Hello, I have been trying to write zarr files from xarray directly into an s3 store but keep getting errors about missing arrays. It looks like the structure of the zarr archive is created in my s3 bucket: I can see .zarray and .zattrs files, but it's missing the 0.0.0, 0.0.1, etc. files. I have been able to write the same arrays directly to my disk, so I don't think it's an issue with the dataset itself.

MCVE Code Sample

```python
import s3fs  # implied by the snippet below

s3 = s3fs.S3FileSystem(anon=False)
store = s3fs.S3Map(root=f's3://my-bucket/data.zarr', s3=s3, check=False)

# ds is the dataset being written (defined elsewhere)
ds.to_zarr(store=store, consolidated=True, mode='w')
```
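A small diagnostic sketch (bucket name as in the example above): listing what actually landed in the store after a failed write, to confirm that the metadata objects exist while the chunk objects are missing:

```python
# Uses the same `s3` filesystem object as above; the bucket/prefix is the example's placeholder.
print(s3.ls('my-bucket/data.zarr'))
```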

Output

The variable name of the array that it reports as missing changes from run to run; it's not always the same one.

logs -------------------------------------------------------------------------- NoSuchKey Traceback (most recent call last) /opt/conda/lib/python3.7/site-packages/s3fs/core.py in _fetch_range(client, bucket, key, version_id, start, end, max_attempts, req_kw) 1196 Range='bytes=%i-%i' % (start, end - 1), -> 1197 **kwargs) 1198 return resp['Body'].read() ~/.local/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs) 315 # The "self" in this scope is referring to the BaseClient. --> 316 return self._make_api_call(operation_name, kwargs) 317 ~/.local/lib/python3.7/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params) 625 error_class = self.exceptions.from_code(error_code) --> 626 raise error_class(parsed_response, operation_name) 627 else: NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist. During handling of the above exception, another exception occurred: FileNotFoundError Traceback (most recent call last) /opt/conda/lib/python3.7/site-packages/fsspec/mapping.py in __getitem__(self, key, default) 75 try: ---> 76 result = self.fs.cat(key) 77 except: # noqa: E722 /opt/conda/lib/python3.7/site-packages/fsspec/spec.py in cat(self, path) 545 """ Get the content of a file """ --> 546 return self.open(path, "rb").read() 547 /opt/conda/lib/python3.7/site-packages/fsspec/spec.py in read(self, length) 1129 return b"" -> 1130 out = self.cache._fetch(self.loc, self.loc + length) 1131 self.loc += len(out) /opt/conda/lib/python3.7/site-packages/fsspec/caching.py in _fetch(self, start, end) 338 # First read, or extending both before and after --> 339 self.cache = self.fetcher(start, bend) 340 self.start = start /opt/conda/lib/python3.7/site-packages/s3fs/core.py in _fetch_range(self, start, end) 1059 def _fetch_range(self, start, end): -> 1060 return _fetch_range(self.fs.s3, self.bucket, self.key, self.version_id, start, end, req_kw=self.req_kw) 1061 /opt/conda/lib/python3.7/site-packages/s3fs/core.py in _fetch_range(client, bucket, key, version_id, start, end, max_attempts, req_kw) 1212 return b'' -> 1213 raise translate_boto_error(e) 1214 except Exception as e: FileNotFoundError: The specified key does not exist. 
During handling of the above exception, another exception occurred: KeyError Traceback (most recent call last) /opt/conda/lib/python3.7/site-packages/zarr/core.py in _load_metadata_nosync(self) 149 mkey = self._key_prefix + array_meta_key --> 150 meta_bytes = self._store[mkey] 151 except KeyError: /opt/conda/lib/python3.7/site-packages/fsspec/mapping.py in __getitem__(self, key, default) 79 return default ---> 80 raise KeyError(key) 81 return result KeyError: 'my-bucket/data.zarr/lv_HTGL7_l1/.zarray' During handling of the above exception, another exception occurred: ValueError Traceback (most recent call last) <ipython-input-7-c21938cc83d3> in <module> 7 ds.to_zarr(store=s3_store_dest, 8 consolidated=True, ----> 9 mode='w') /opt/conda/lib/python3.7/site-packages/xarray/core/dataset.py in to_zarr(self, store, mode, synchronizer, group, encoding, compute, consolidated, append_dim) 1623 compute=compute, 1624 consolidated=consolidated, -> 1625 append_dim=append_dim, 1626 ) 1627 /opt/conda/lib/python3.7/site-packages/xarray/backends/api.py in to_zarr(dataset, store, mode, synchronizer, group, encoding, compute, consolidated, append_dim) 1341 writer = ArrayWriter() 1342 # TODO: figure out how to properly handle unlimited_dims -> 1343 dump_to_store(dataset, zstore, writer, encoding=encoding) 1344 writes = writer.sync(compute=compute) 1345 /opt/conda/lib/python3.7/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims) 1133 variables, attrs = encoder(variables, attrs) 1134 -> 1135 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) 1136 1137 /opt/conda/lib/python3.7/site-packages/xarray/backends/zarr.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims) 385 self.set_dimensions(variables_encoded, unlimited_dims=unlimited_dims) 386 self.set_variables( --> 387 variables_encoded, check_encoding_set, writer, unlimited_dims=unlimited_dims 388 ) 389 /opt/conda/lib/python3.7/site-packages/xarray/backends/zarr.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims) 444 dtype = str 445 zarr_array = self.ds.create( --> 446 name, shape=shape, dtype=dtype, fill_value=fill_value, **encoding 447 ) 448 zarr_array.attrs.put(encoded_attrs) /opt/conda/lib/python3.7/site-packages/zarr/hierarchy.py in create(self, name, **kwargs) 877 """Create an array. 
Keyword arguments as per 878 :func:`zarr.creation.create`.""" --> 879 return self._write_op(self._create_nosync, name, **kwargs) 880 881 def _create_nosync(self, name, **kwargs): /opt/conda/lib/python3.7/site-packages/zarr/hierarchy.py in _write_op(self, f, *args, **kwargs) 656 657 with lock: --> 658 return f(*args, **kwargs) 659 660 def create_group(self, name, overwrite=False): /opt/conda/lib/python3.7/site-packages/zarr/hierarchy.py in _create_nosync(self, name, **kwargs) 884 kwargs.setdefault('cache_attrs', self.attrs.cache) 885 return create(store=self._store, path=path, chunk_store=self._chunk_store, --> 886 **kwargs) 887 888 def empty(self, name, **kwargs): /opt/conda/lib/python3.7/site-packages/zarr/creation.py in create(shape, chunks, dtype, compressor, fill_value, order, store, synchronizer, overwrite, path, chunk_store, filters, cache_metadata, cache_attrs, read_only, object_codec, **kwargs) 123 # instantiate array 124 z = Array(store, path=path, chunk_store=chunk_store, synchronizer=synchronizer, --> 125 cache_metadata=cache_metadata, cache_attrs=cache_attrs, read_only=read_only) 126 127 return z /opt/conda/lib/python3.7/site-packages/zarr/core.py in __init__(self, store, path, read_only, chunk_store, synchronizer, cache_metadata, cache_attrs) 122 123 # initialize metadata --> 124 self._load_metadata() 125 126 # initialize attributes /opt/conda/lib/python3.7/site-packages/zarr/core.py in _load_metadata(self) 139 """(Re)load metadata from store.""" 140 if self._synchronizer is None: --> 141 self._load_metadata_nosync() 142 else: 143 mkey = self._key_prefix + array_meta_key /opt/conda/lib/python3.7/site-packages/zarr/core.py in _load_metadata_nosync(self) 150 meta_bytes = self._store[mkey] 151 except KeyError: --> 152 err_array_not_found(self._path) 153 else: 154 /opt/conda/lib/python3.7/site-packages/zarr/errors.py in err_array_not_found(path) 19 20 def err_array_not_found(path): ---> 21 raise ValueError('array not found at path %r' % path) 22 23 ValueError: array not found at path 'lv_HTGL7_l1'

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.14.165-133.209.amzn2.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: None libnetcdf: None xarray: 0.15.0 pandas: 1.0.1 numpy: 1.18.1 scipy: 1.4.1 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: 1.5.5 zarr: 2.4.0 cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.3 cfgrib: None iris: None bottleneck: None dask: 2.11.0 distributed: 2.11.0 matplotlib: 3.1.3 cartopy: None seaborn: 0.10.0 numbagg: None setuptools: 45.2.0.post20200209 pip: 20.0.2 conda: 4.7.12 pytest: None IPython: 7.12.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3831/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2253567622 I_kwDOAMm_X86GUraG 8959 Dataset constructor always coerces 1D data variables with same name as dim to coordinates TomNicholas 35968931 open 0     10 2024-04-19T17:54:28Z 2024-04-28T19:57:31Z   MEMBER      

What is your issue?

Whilst xarray's data model appears to allow 1D data variables that have the same name as their dimension, it seems to be impossible to actually create this using the Dataset constructor, as they will always be converted to coordinate variables instead.

We can create a 1D data variable with the same name as its dimension like this:
```python
In [9]: ds = xr.Dataset({'x': 0})

In [10]: ds
Out[10]:
<xarray.Dataset> Size: 8B
Dimensions:  ()
Data variables:
    x        int64 8B 0

In [11]: ds.expand_dims('x')
Out[11]:
<xarray.Dataset> Size: 8B
Dimensions:  (x: 1)
Dimensions without coordinates: x
Data variables:
    x        (x) int64 8B 0
```
so it seems to be a valid part of the data model.

But I can't get to that situation from the Dataset constructor. This should create the same dataset:

```python
In [15]: ds = xr.Dataset(data_vars={'x': ('x', [0])})

In [16]: ds
Out[16]:
<xarray.Dataset> Size: 8B
Dimensions:  (x: 1)
Coordinates:
  * x        (x) int64 8B 0
Data variables:
    *empty*
```
But actually it makes `x` a coordinate variable (and implicitly creates a pandas Index for it). This means that in this case there is no difference between using the `data_vars` and `coords` kwargs to the constructor:

```python
ds = xr.Dataset(coords={'x': ('x', [0])})

In [18]: ds
Out[18]:
<xarray.Dataset> Size: 8B
Dimensions:  (x: 1)
Coordinates:
  * x        (x) int64 8B 0
Data variables:
    *empty*
```

This all seems weird to me. I would have thought that if a 1D data variable is allowed, we shouldn't coerce it into a coordinate variable in the constructor. If anything, that's actively misleading.

Note that whilst this came up in the context of trying to avoid auto-creation of 1D indexes for coordinate variables, this issue is actually separate. (xref https://github.com/pydata/xarray/pull/8872#issuecomment-2027571714)

cc @benbovy who probably has thoughts

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8959/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2224036575 I_kwDOAMm_X86EkBrf 8905 Variable doesn't have an .expand_dims method TomNicholas 35968931 closed 0     4 2024-04-03T22:19:10Z 2024-04-28T19:54:08Z 2024-04-28T19:54:08Z MEMBER      

Is your feature request related to a problem?

DataArray and Dataset have an .expand_dims method, but it looks like Variable doesn't.

Describe the solution you'd like

Variable should also have this method, the only difference being that it wouldn't create any coordinates or indexes.
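For reference, a rough sketch of what is already possible with Variable.set_dims today, which covers part of what an expand_dims method would do (illustrative only, not a proposed API):

```python
import xarray as xr

var = xr.Variable(dims=("x",), data=[1, 2, 3])

# set_dims inserts missing dimensions with length 1; no coordinates or
# indexes are involved, since Variable has neither
expanded = var.set_dims(("t", "x"))
print(expanded.dims)   # ('t', 'x')
print(expanded.shape)  # (1, 3)
```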

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8905/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2204768593 I_kwDOAMm_X86DahlR 8871 Concatenation automatically creates indexes where none existed TomNicholas 35968931 open 0     1 2024-03-25T02:43:31Z 2024-04-27T16:50:56Z   MEMBER      

What happened?

Currently concatenation will automatically create indexes for any dimension coordinates in the output, even if there were no indexes on the input.

What did you expect to happen?

Indexes not to be created for variables which did not already have them.

Minimal Complete Verifiable Example

```Python
import numpy as np
from xarray import Coordinates, DataArray, concat

# TODO: once passing indexes={} directly to the DataArray constructor is allowed,
# there will be no need to create the coords object separately first
coords = Coordinates({"x": np.array([1, 2, 3])}, indexes={})
arrays = [
    DataArray(
        np.zeros((3, 3)),
        dims=["x", "y"],
        coords=coords,
    )
    for _ in range(2)
]

combined = concat(arrays, dim="x")
assert combined.shape == (6, 3)
assert combined.dims == ("x", "y")

# should not have auto-created any indexes
assert combined.indexes == {}  # this fails

combined = concat(arrays, dim="z")
assert combined.shape == (2, 3, 3)
assert combined.dims == ("z", "x", "y")

# should not have auto-created any indexes
assert combined.indexes == {}  # this also fails
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python
        # nor have auto-created any indexes
        assert combined.indexes == {}
E       AssertionError: assert Indexes:\n    x        Index([1, 2, 3, 1, 2, 3], dtype='int64', name='x') == {}
E         Full diff:
E         - {
E         - ,
E         - }
E         + Indexes:
E         +     x        Index([1, 2, 3, 1, 2, 3], dtype='int64', name='x',
E         + )
```

Anything else we need to know?

The culprit is the call to core.indexes.create_default_index_implicit inside merge.py. If I comment out this call my concat test passes, but basic tests in test_merge.py start failing.

I would like to know how to avoid the internal call to create_default_index_implicit. I tried passing compat='override' but that made no difference, so I think we would have to change merge.collect_variables_and_indexes somehow.

Conceptually, I would have thought we should be examining what indexes exist on the objects to be concatenated, and not creating new indexes for any variable that doesn't already have one. Presumably we should therefore be making use of the indexes argument to merge.collect_variables_and_indexes, but currently that just seems to be empty.

Environment

I've been experimenting with running this test on a branch that includes both #8711 and #8714, but this example actually fails in the same way on main.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8871/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2254350395 PR_kwDOAMm_X85tPTua 8960 Option to not auto-create index during expand_dims TomNicholas 35968931 closed 0     2 2024-04-20T03:27:23Z 2024-04-27T16:48:30Z 2024-04-27T16:48:24Z MEMBER   0 pydata/xarray/pulls/8960
  • [x] Solves part of #8871 by pulling out part of https://github.com/pydata/xarray/pull/8872#issuecomment-2027571714
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~

TODO:
- [x] Add new kwarg to DataArray.expand_dims
- [ ] Add examples to docstrings?
- [x] Check it actually solves the problem in #8872

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8960/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2261844699 PR_kwDOAMm_X85toeXT 8968 Bump dependencies incl `pandas>=2` dcherian 2448579 closed 0     0 2024-04-24T17:42:19Z 2024-04-27T14:17:16Z 2024-04-27T14:17:16Z MEMBER   0 pydata/xarray/pulls/8968
  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8968/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2266443797 PR_kwDOAMm_X85t4Nzs 8977 preliminary pr to examine the DataTree injected docs. flamingbear 479480 open 0     5 2024-04-26T20:15:22Z 2024-04-26T22:36:00Z   CONTRIBUTOR   1 pydata/xarray/pulls/8977

This PR should never be merged; it is opened only to run the docs build with the changes from #8976.

I just wanted to make sure I could point to what the final doc pages will look like when datatree is released.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8977/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1307112340 I_kwDOAMm_X85N6POU 6799 `interp` performance with chunked dimensions slevang 39069044 open 0     9 2022-07-17T14:25:17Z 2024-04-26T21:41:31Z   CONTRIBUTOR      

What is your issue?

I'm trying to perform 2D interpolation on a large 3D array that is heavily chunked along the interpolation dimensions and not the third dimension. The application could be extracting a timeseries from a reanalysis dataset chunked in space but not time, to compare to observed station data with more precise coordinates.

I use the advanced interpolation method as described in the documentation, with the interpolation coordinates specified by DataArray's with a shared dimension like so:

```python
%load_ext memory_profiler
import numpy as np
import dask.array as da
import xarray as xr

# Synthetic dataset chunked in the two interpolation dimensions
nt = 40000
nx = 200
ny = 200
ds = xr.Dataset(
    data_vars={
        'foo': (('t', 'x', 'y'), da.random.random(size=(nt, nx, ny), chunks=(-1, 10, 10))),
    },
    coords={
        't': np.linspace(0, 1, nt),
        'x': np.linspace(0, 1, nx),
        'y': np.linspace(0, 1, ny),
    },
)

# Interpolate to some random 2D locations
ni = 10
xx = xr.DataArray(np.random.random(ni), dims='z', name='x')
yy = xr.DataArray(np.random.random(ni), dims='z', name='y')
interpolated = ds.foo.interp(x=xx, y=yy)
%memit interpolated.compute()
```

With just 10 interpolation points, this example calculation uses about 1.5 * ds.nbytes of memory, and saturates around 2 * ds.nbytes by about 100 interpolation points.

This could definitely work better, as each interpolated point usually only requires a single chunk of the input dataset, and at most 4 if it is right on the corner of a chunk. For example we can instead do it in a loop and get very reasonable memory usage, but this isn't very scalable:

```python
interpolated = []
for n in range(ni):
    interpolated.append(ds.foo.interp(x=xx.isel(z=n), y=yy.isel(z=n)))
interpolated = xr.concat(interpolated, dim='z')
%memit interpolated.compute()
```

I tried adding a `.chunk({'z': 1})` to the interpolation coordinates but this doesn't help. We can also do `.sel(x=xx, y=yy, method='nearest')` with very good performance.
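For completeness, here is a sketch of the nearest-neighbour alternative mentioned above (same ds, xx, yy as in the example; it avoids the memory blow-up but returns nearest values rather than interpolated ones):

```python
# nearest-neighbour selection instead of interpolation
nearest = ds.foo.sel(x=xx, y=yy, method='nearest')
%memit nearest.compute()
```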

Any tips to make this calculation work better with existing options, or otherwise ways we might improve the interp method to handle this case? Given the performance behaviour, I'm guessing we may be doing sequential interpolation over the dimensions: basically an interp1d call for all the xx points, and from there another to the yy points, which even for a small number of points would require nearly all chunks to be loaded in. But I haven't explored the code enough yet to understand the details.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6799/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2260280862 PR_kwDOAMm_X85tjH8m 8967 Migrate datatreee assertions/extensions/formatting owenlittlejohns 7788154 closed 0     0 2024-04-24T04:23:03Z 2024-04-26T17:38:59Z 2024-04-26T17:29:18Z CONTRIBUTOR   0 pydata/xarray/pulls/8967

This PR continues the overall work of migrating DataTree into xarray.

  • xarray/core/datatree_render.py is the renamed version of xarray/datatree_/datatree/render.py.
  • xarray/core/extensions.py now contains functionality from xarray/datatree_/datatree/extensions.py.
  • xarray/core/formatting.py now contains functionality from xarray/datatree_/datatree/formatting.py.
  • xarray/tests/test_datatree.py now contains tests from xarray/datatree_/datatree/tests/test_dataset_api.py.
  • xarray/testing/assertions.py now contains functionality from /xarray/datatree_/datatree/testing.py.

I had also meant to get to common.py and what's left of io.py, but I've got a hefty couple of days of meetings ahead, so I wanted to get this progress into a PR before that happens. @flamingbear or I can follow up with the remaining pieces in a separate PR. (Also, this PR is already getting a little big, so maybe it already has enough in it.)

  • [x] Contributes to migration step for miscellaneous modules in #8572
  • [ ] ~~Tests added~~
  • [ ] ~~User visible changes (including notable bug fixes) are documented in whats-new.rst~~
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8967/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
590630281 MDU6SXNzdWU1OTA2MzAyODE= 3921 issues discovered by the all-but-dask CI keewis 14808389 closed 0     4 2020-03-30T22:08:46Z 2024-04-25T14:48:15Z 2024-02-10T02:57:34Z MEMBER      

After adding the py38-all-but-dask CI in #3919, it discovered a few backend issues:

- zarr:
  - [x] open_zarr with chunks="auto" always tries to chunk, even if dask is not available (fixed in #3919)
  - [x] ZarrArrayWrapper.__getitem__ incorrectly passes the indexer's tuple attribute to _arrayize_vectorized_indexer (this only happens if dask is not available) (fixed in #3919)
  - [x] slice indexers with negative steps get transformed incorrectly if dask is not available https://github.com/pydata/xarray/pull/8674
- rasterio:
  - ~~calling pickle.dumps on a Dataset object returned by open_rasterio fails because a non-serializable lock was used (if dask is installed, a serializable lock is used instead)~~

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3921/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2261917442 PR_kwDOAMm_X85touYl 8971 Delete pynio backend. dcherian 2448579 closed 0     2 2024-04-24T18:25:26Z 2024-04-25T14:38:23Z 2024-04-25T14:23:59Z MEMBER   0 pydata/xarray/pulls/8971
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8971/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
602256880 MDU6SXNzdWU2MDIyNTY4ODA= 3981 [Proposal] Expose Variable without Pandas dependency jhamman 2443309 open 0     23 2020-04-17T22:00:10Z 2024-04-24T17:19:55Z   MEMBER      

This issue proposes exposing Xarray's Variable class as a stand-alone array class with named axes (dims) and arbitrary metadata (attrs) but without coordinates (indexes). Yes, this already exists, but the Variable class is currently inseparable from our Pandas dependency, despite not utilizing any of its functionality. What would this entail?

The biggest change would be in making Pandas an optional dependency and isolating any imports. This change could be confined to the Variable object or could be propagated further as the Explicit Indexes work proceeds (#1603).

Why?

Within Xarray, the Variable class is a vital building block for many of our internal data structures. Recently, the utility of a simple array with named dimensions has been highlighted by a few potential user communities:

  • Scikit-learn: https://github.com/scikit-learn/enhancement_proposals/pull/18
  • PyTorch: (https://pytorch.org/tutorials/intermediate/named_tensor_tutorial.html, http://nlp.seas.harvard.edu/NamedTensor)

An example from the above linked SLEP as to why users may not want Pandas as a dependency in Xarray:

@amueller: ...If we go this route, I think we need to make xarray, and therefore pandas, a mandatory dependency... ... @adrinjalali: ...And we still do have the option of making a NamedArray. xarray uses the pandas' index classes for the indexing and stuff, which is something we really don't need...

Since we already have a class developed that meets these applications' use cases, it seems only prudent to evaluate the feasibility of exposing Variable as a low-level API object.
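To illustrate the kind of standalone usage this would enable, here is a sketch using today's Variable (names are illustrative):

```python
import numpy as np
import xarray as xr

# a named-axis array with attrs, but no coordinates or indexes
v = xr.Variable(dims=("time", "space"), data=np.zeros((4, 3)), attrs={"units": "m"})

print(v.dims, v.shape, v.attrs)
print(v.mean(dim="time").dims)  # ('space',)
```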

In conclusion, I'm not sure this is currently worth the effort, but it's probably worth exploring at this point.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3981/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2259850888 I_kwDOAMm_X86GspaI 8966 HTML repr for chunked variables with high dimensionality TomNicholas 35968931 open 0     1 2024-04-23T22:00:40Z 2024-04-24T13:27:05Z   MEMBER      

What is your issue?

The graphical representation of dask arrays with many dimensions can end up off the page in the HTML repr.

Ideally dask would worry about this for us, and we just use their _inline_repr, as mentioned here https://github.com/pydata/xarray/issues/4376#issuecomment-680296332

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8966/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2243685081 I_kwDOAMm_X86Fu-rZ 8945 netCDF4 indexing: `reindex_like` is very slow if dataset not loaded into memory brendan-m-murphy 11130776 closed 0     4 2024-04-15T13:26:08Z 2024-04-23T21:49:28Z 2024-04-23T15:33:36Z NONE      

What is your issue?

Reindexing a dataset without loading it into memory seems to be very slow (about 1000x slower than reindexing after loading into memory).

Here is a minimum working example:

```python
import numpy as np
import pandas as pd
import xarray as xr

times = 100
nlat = 200
nlon = 300

fp = xr.Dataset(
    {"fp": (["time", "lat", "lon"], np.arange(times * nlat * nlon).reshape(times, nlat, nlon))},
    coords={
        "time": pd.date_range(start="2019-01-01T02:00:00", periods=times, freq="1H"),
        "lat": np.arange(nlat),
        "lon": np.arange(nlon),
    },
)

flux = xr.Dataset(
    {"flux": (["time", "lat", "lon"], np.arange(nlat * nlon).reshape(1, nlat, nlon))},
    coords={
        "time": [pd.to_datetime("2019-01-01")],
        "lat": np.arange(nlat) + np.random.normal(0.0, 0.01, nlat),
        "lon": np.arange(nlon) + np.random.normal(0.0, 0.01, nlon),
    },
)

fp.to_netcdf("combine_datasets_tests/fp.nc")
flux.to_netcdf("combine_datasets_tests/flux.nc")

fp1 = xr.open_dataset("combine_datasets_tests/fp.nc")
flux1 = xr.open_dataset("combine_datasets_tests/flux.nc")
```

Then `flux1 = flux1.reindex_like(fp1, method="ffill", tolerance=None)` takes over a minute, while `flux1 = flux1.load().reindex_like(fp1, method="ffill", tolerance=None)` is almost instantaneous (timeit says 91 ms, including opening the dataset... I'm not sure if caching is influencing this).

Profiling the "reindex without load" cell: ``` 804936 function calls (804622 primitive calls) in 93.285 seconds

Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function) 1 92.211 92.211 93.191 93.191 {built-in method _operator.getitem} 1 0.289 0.289 0.980 0.980 utils.py:81(_StartCountStride) 6 0.239 0.040 0.613 0.102 shape_base.py:267(apply_along_axis) 72656 0.109 0.000 0.109 0.000 utils.py:429(<lambda>) 72656 0.085 0.000 0.136 0.000 utils.py:430(<lambda>) 72661 0.051 0.000 0.051 0.000 {built-in method numpy.arange} 145318 0.048 0.000 0.115 0.000 shape_base.py:370(<genexpr>) 2 0.045 0.023 0.046 0.023 indexing.py:1334(getitem) 6 0.044 0.007 0.044 0.007 numeric.py:136(ones) 145318 0.044 0.000 0.067 0.000 index_tricks.py:690(next) 14 0.033 0.002 0.033 0.002 {built-in method numpy.empty} 145333/145325 0.023 0.000 0.023 0.000 {built-in method builtins.next} 1 0.020 0.020 93.275 93.275 duck_array_ops.py:317(where) 21 0.018 0.001 0.018 0.001 {method 'astype' of 'numpy.ndarray' objects} 145330 0.013 0.000 0.013 0.000 {built-in method numpy.asanyarray} 1 0.002 0.002 0.002 0.002 {built-in method _functools.reduce} 1 0.002 0.002 93.279 93.279 variable.py:821(_getitem_with_mask) 18 0.001 0.000 0.001 0.000 {built-in method numpy.zeros} 1 0.000 0.000 0.000 0.000 file_manager.py:226(close) ```

The getitem call at the top is from xarray.backends.netCDF4_.py, line 114. Because of the jittered coordinates in flux, I'm assuming that the index passed to netCDF4 is not consecutive/strictly monotonic integers (0, 1, 2, 3, ...). In the past, this has caused issues: https://github.com/Unidata/netcdf4-python/issues/680.

In my venv, netCDF4 was installed from a wheel with the following versions: netcdf4-python version: 1.6.5 HDF5 lib version: 1.12.2 netcdf lib version: 4.9.3-development

This is with xarray version 2023.12.0, numpy 1.26, and pandas 1.5.3.

I will try to investigate more and hopefully simplify the example. (Can't quite justify spending more time on it at work because this is just to tag a version that was used in some experiments before we switch to zarr as a backend, so hopefully it won't be relevant at that point.)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8945/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2141447815 I_kwDOAMm_X85_o-aH 8768 `xarray/datatree_` missing in 2024.2.0 sdist mgorny 110765 closed 0     15 2024-02-19T03:57:31Z 2024-04-23T18:11:58Z 2024-04-23T15:35:21Z CONTRIBUTOR      

What happened?

Apparently xarray-2024.2.0 requires the xarray.datatree_ module, but this module isn't included in the sdist tarball.

What did you expect to happen?

No response

Minimal Complete Verifiable Example

```
$ tar -tf /tmp/dist/xarray-2024.2.0.tar.gz | grep datatree_
(empty)
```

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

n/a

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8768/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1692909704 PR_kwDOAMm_X85PnMF6 7811 Generalize delayed TomNicholas 35968931 open 0     0 2023-05-02T18:34:26Z 2024-04-23T17:41:55Z   MEMBER   0 pydata/xarray/pulls/7811

A small follow-on to #7019 to allow using non-dask implementations of delayed.

(Builds off of #7019)

  • [x] Closes #7810
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7811/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1692904446 I_kwDOAMm_X85k56v- 7810 Generalize dask.delayed calls to go through ChunkManager TomNicholas 35968931 open 0     0 2023-05-02T18:30:32Z 2024-04-23T17:38:58Z   MEMBER      

[Deepak: Should we add chunked_array_type and from_array_kwargs to open_mfdataset?

I actually don't think we need to - from_array_kwargs is only going to get directly passed down to open_dataset, and hence could be considered part of **kwargs.

This should actually just work, except in the case of parallel=True. For that we could add delayed to the ChunkManager ABC, so that if cubed does implement cubed.delayed it could be added, else a NotImplementedError would be raised. I think all of this wouldn't be necessary if we had lazy concatenation in xarray though (xref https://github.com/pydata/xarray/issues/4628). That suggestion would mean we should also replace other instances of dask.delayed in other parts of the codebase though... I think I will split this into a separate issue in the interests of getting this one merged.

Originally posted by @TomNicholas in https://github.com/pydata/xarray/pull/7019#discussion_r1182904134

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7810/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2234142680 PR_kwDOAMm_X85sK0g8 8923 `"source"` encoding for datasets opened from `fsspec` objects keewis 14808389 open 0     5 2024-04-09T19:12:45Z 2024-04-23T16:54:09Z   MEMBER   0 pydata/xarray/pulls/8923

When opening files from path-like objects (str, pathlib.Path), the backend machinery (_dataset_from_backend_dataset) sets the "source" encoding. This is useful if we need the original path for additional processing, like writing to a similarly named file, or to extract additional metadata. This would be useful as well when using fsspec to open remote files.

In this PR, I'm extracting the path attribute that most fsspec objects have to set that value. I've considered using isinstance checks instead of the getattr-with-default, but the list of potential classes is too big to be practical (at least 4 classes just within fsspec itself).
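A rough sketch of the getattr-with-default idea described above (the helper name _infer_source is illustrative, not the actual code in this PR):

```python
import os

def _infer_source(filename_or_obj):
    """Best-effort "source" value for path-likes and fsspec file objects."""
    if isinstance(filename_or_obj, (str, os.PathLike)):
        return os.fspath(filename_or_obj)
    # most fsspec file objects carry the original path in a `path` attribute
    return getattr(filename_or_obj, "path", None)
```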

If this sounds like a good idea, I'll update the documentation of the "source" encoding to mention this feature.

  • [x] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8923/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2248692681 PR_kwDOAMm_X85s8dDt 8953 stop pruning datatree_ directory from distribution flamingbear 479480 closed 0     0 2024-04-17T16:14:13Z 2024-04-23T15:39:06Z 2024-04-23T15:35:20Z CONTRIBUTOR   0 pydata/xarray/pulls/8953

This PR removes the directive that strips out the datatree_ directory from the xarray distribution.

It also cleans a few typing errors and removes exceptions for the datatree_ directory for mypy.

It does NOT remove the exception for pre-commit config.

  • [X] Closes #8768
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8953/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2257054431 I_kwDOAMm_X86Gh-rf 8963 datatree ops.py migration cleanup flamingbear 479480 open 0     0 2024-04-22T17:06:36Z 2024-04-22T18:23:31Z   CONTRIBUTOR      

What is your issue?

During the 3/26/2024 design discussion meeting (#8747), we discussed the monkey patching of methods that was required to wrap datatree nodes with the desired Xarray API function. This was done primarily out of necessity, because the datatree code did not live in the xarray repository. A better implementation could be to add the wrapping to xarray's generate_aggregations.py module. The ultimate decision made by Stephan and Tom was to add this as a future issue (this one) and migrate the file as-is for now.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8963/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2255339414 PR_kwDOAMm_X85tSYdD 8962 Raise exception when using `diff` with a non-existent dimension nathanredmond 89410512 open 0     1 2024-04-22T00:04:58Z 2024-04-22T17:20:20Z   FIRST_TIMER   0 pydata/xarray/pulls/8962
  • [X] Closes #7748
  • [X] Tests added - test_dataset_diff_dim_nonexist, test_dataarray_diff_dim_nonexist
  • [X] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8962/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1664193419 I_kwDOAMm_X85jMZOL 7748 diff('non existing dimension') does not raise exception LunarLanding 4441338 open 0     4 2023-04-12T09:29:58Z 2024-04-21T22:31:37Z   NONE      

What happened?

Calling xr.DataArray.diff with a non-existing dimension does not raise an exception.

What did you expect to happen?

An exception to be raised.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

xr.DataArray(np.arange(10), dims=('a',)).diff('b')
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0] python-bits: 64 OS: Linux OS-release: 5.10.0-21-amd64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1 xarray: 2023.3.0 pandas: 1.5.3 numpy: 1.23.5 scipy: 1.10.1 netCDF4: 1.6.2 pydap: None h5netcdf: 1.1.0 h5py: 3.8.0 Nio: None zarr: 2.14.2 cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2023.3.1 distributed: 2023.3.1 matplotlib: 3.7.1 cartopy: None seaborn: 0.12.2 numbagg: None fsspec: 2023.3.0 cupy: None pint: None sparse: 0.14.0 flox: 0.6.9 numpy_groupies: 0.9.20 setuptools: 67.6.0 pip: 23.0.1 conda: 23.1.0 pytest: 7.2.2 mypy: 1.1.1 IPython: 8.11.0 sphinx: 6.1.3
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7748/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2255271332 PR_kwDOAMm_X85tSKJs 8961 use `nan` instead of `NaN` keewis 14808389 closed 0     0 2024-04-21T21:26:18Z 2024-04-21T22:01:04Z 2024-04-21T22:01:03Z MEMBER   0 pydata/xarray/pulls/8961

FYI @aulemahal, numpy.NaN will be removed in the upcoming numpy=2.0 release.

  • [x] follow-up to #8603
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8961/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2187743087 PR_kwDOAMm_X85ptH1f 8840 Grouper, Resampler as public api dcherian 2448579 open 0     0 2024-03-15T05:16:05Z 2024-04-21T16:21:34Z   MEMBER   1 pydata/xarray/pulls/8840

Expose Grouper and Resampler as public API

TODO:
- [ ] Consider avoiding IndexVariable


  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8840/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2100707586 PR_kwDOAMm_X85lFQn3 8669 Fix automatic broadcasting when wrapping array api class TomNicholas 35968931 closed 0     0 2024-01-25T16:05:19Z 2024-04-20T05:58:05Z 2024-01-26T16:41:30Z MEMBER   0 pydata/xarray/pulls/8669
  • [x] Closes #8665
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8669/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2252965835 I_kwDOAMm_X86GSYfL 8958 DataArray .rolling() unclear behaviour when center=False juliencarponcy 41296546 open 0     3 2024-04-19T13:12:28Z 2024-04-19T15:15:52Z   NONE      

What is your issue?

Hi,

I am using the rolling().construct() method which I found very convenient and efficient.

I had timeseries with 2 dimensions: time and channel. I used the construct to produce small overlapping windows of samples on all channels:

```python
xr_data['full_windowed_eeg'] = xr_data['resampled_eeg'] \
    .rolling(resampled_time=window_size, min_periods=None) \
    .construct("window_tvec", stride=1, keep_attrs=True) \
    .dropna('resampled_time') \
    .rename({'resampled_time': 'window_time'}).copy()
```

However, after not obtaining the result I expected, I found out that the new coord window_time corresponds to the original time coord at the end/right of each window, and not to the first time coord of the window as I expected.

Apparently there is no argument to specify this; in its current state, it only allows taking the "center coord" or the "right coord" (if center=False).

I expect that the labelling I wanted is not so uncommon, so implementing that possibility would be great. More urgently, it would be extremely useful, and would avoid puzzling debugging, if this behaviour were clearly explained in the API reference.
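For what it's worth, a workaround sketch for labelling each window by its first (left) sample instead of its last one, assuming an integer or otherwise regular time coordinate (names are illustrative):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(10.0), dims="time", coords={"time": np.arange(10)})
window_size = 3

windows = da.rolling(time=window_size).construct("window_tvec").dropna("time")
# rolling labels each window by its right edge; shift the labels back so each
# window is labelled by its left (first) sample instead
windows = windows.assign_coords(time=windows["time"] - (window_size - 1))
```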

But maybe I am missing something here?

Thanks for your great work !

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8958/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2250654663 I_kwDOAMm_X86GJkPH 8957 netCDF encoding and decoding issues. Thomas-Z 1492047 open 0     6 2024-04-18T13:06:49Z 2024-04-19T13:12:04Z   CONTRIBUTOR      

What happened?

Reading or writing netCDF variables containing scale_factor and/or fill_value might raise the following error:

```python
UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
```

This problem might be related to the following changes: #7654.

What did you expect to happen?

I'm expecting it to work like it did before xarray 2024.03.0!

Minimal Complete Verifiable Example

```Python
# Example 1, decoding problem.

import netCDF4 as nc
import numpy as np
import xarray as xr

with nc.Dataset("test1.nc", mode="w") as ncds:
    ncds.createDimension(dimname="d")
    ncx = ncds.createVariable(
        varname="x",
        datatype=np.int64,
        dimensions=("d",),
        fill_value=-1,
    )

    ncx.scale_factor = 1e-3
    ncx.units = "seconds"

    ncx[:] = np.array([0.001, 0.002, 0.003])

# This will raise the error
xr.load_dataset("test1.nc")

# Example 2, encoding problem.

import netCDF4 as nc
import numpy as np
import xarray as xr

with nc.Dataset("test2.nc", mode="w") as ncds:
    ncds.createDimension(dimname="d")
    ncx = ncds.createVariable(varname="x", datatype=np.int8, dimensions=("d",))

    ncx.scale_factor = 1000

    ncx[:] = np.array([1000, 2000, 3000])

# Reading it does work
data = xr.load_dataset("test2.nc")

# Writing the read data does not work
data.to_netcdf("text2x.nc")
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python

Example 1 error


UFuncTypeError Traceback (most recent call last) Cell In[38], line 1 ----> 1 xr.load_dataset("test2.nc")

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/backends/api.py:280, in load_dataset(filename_or_obj, kwargs) 277 raise TypeError("cache has no effect in this context") 279 with open_dataset(filename_or_obj, kwargs) as ds: --> 280 return ds.load()

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/core/dataset.py:855, in Dataset.load(self, **kwargs) 853 for k, v in self.variables.items(): 854 if k not in lazy_data: --> 855 v.load() 857 return self

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/core/variable.py:961, in Variable.load(self, kwargs) 944 def load(self, kwargs): 945 """Manually trigger loading of this variable's data from disk or a 946 remote source into memory and return this variable. 947 (...) 959 dask.array.compute 960 """ --> 961 self._data = to_duck_array(self._data, **kwargs) 962 return self

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/namedarray/pycompat.py:134, in to_duck_array(data, **kwargs) 131 return loaded_data 133 if isinstance(data, ExplicitlyIndexed): --> 134 return data.get_duck_array() # type: ignore[no-untyped-call, no-any-return] 135 elif is_duck_array(data): 136 return data

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/core/indexing.py:809, in MemoryCachedArray.get_duck_array(self) 808 def get_duck_array(self): --> 809 self._ensure_cached() 810 return self.array.get_duck_array()

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/core/indexing.py:803, in MemoryCachedArray._ensure_cached(self) 802 def _ensure_cached(self): --> 803 self.array = as_indexable(self.array.get_duck_array())

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/core/indexing.py:760, in CopyOnWriteArray.get_duck_array(self) 759 def get_duck_array(self): --> 760 return self.array.get_duck_array()

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/core/indexing.py:630, in LazilyIndexedArray.get_duck_array(self) 625 # self.array[self.key] is now a numpy array when 626 # self.array is a BackendArray subclass 627 # and self.key is BasicIndexer((slice(None, None, None),)) 628 # so we need the explicit check for ExplicitlyIndexed 629 if isinstance(array, ExplicitlyIndexed): --> 630 array = array.get_duck_array() 631 return _wrap_numpy_scalars(array)

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/coding/variables.py:81, in _ElementwiseFunctionArray.get_duck_array(self) 80 def get_duck_array(self): ---> 81 return self.func(self.array.get_duck_array())

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/coding/variables.py:81, in _ElementwiseFunctionArray.get_duck_array(self) 80 def get_duck_array(self): ---> 81 return self.func(self.array.get_duck_array())

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/coding/variables.py:399, in _scale_offset_decoding(data, scale_factor, add_offset, dtype) 397 data = data.astype(dtype=dtype, copy=True) 398 if scale_factor is not None: --> 399 data *= scale_factor 400 if add_offset is not None: 401 data += add_offset

UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'

Example 2 error


UFuncTypeError Traceback (most recent call last) Cell In[42], line 1 ----> 1 data.to_netcdf("text1x.nc")

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/core/dataset.py:2298, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 2295 encoding = {} 2296 from xarray.backends.api import to_netcdf -> 2298 return to_netcdf( # type: ignore # mypy cannot resolve the overloads:( 2299 self, 2300 path, 2301 mode=mode, 2302 format=format, 2303 group=group, 2304 engine=engine, 2305 encoding=encoding, 2306 unlimited_dims=unlimited_dims, 2307 compute=compute, 2308 multifile=False, 2309 invalid_netcdf=invalid_netcdf, 2310 )

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/backends/api.py:1339, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1334 # TODO: figure out how to refactor this logic (here and in save_mfdataset) 1335 # to avoid this mess of conditionals 1336 try: 1337 # TODO: allow this work (setting up the file for writing array data) 1338 # to be parallelized with dask -> 1339 dump_to_store( 1340 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims 1341 ) 1342 if autoclose: 1343 store.close()

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/backends/api.py:1386, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims) 1383 if encoder: 1384 variables, attrs = encoder(variables, attrs) -> 1386 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/backends/common.py:393, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims) 390 if writer is None: 391 writer = ArrayWriter() --> 393 variables, attributes = self.encode(variables, attributes) 395 self.set_attributes(attributes) 396 self.set_dimensions(variables, unlimited_dims=unlimited_dims)

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/backends/common.py:482, in WritableCFDataStore.encode(self, variables, attributes) 479 def encode(self, variables, attributes): 480 # All NetCDF files get CF encoded by default, without this attempting 481 # to write times, for example, would fail. --> 482 variables, attributes = cf_encoder(variables, attributes) 483 variables = {k: self.encode_variable(v) for k, v in variables.items()} 484 attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/conventions.py:795, in cf_encoder(variables, attributes) 792 # add encoding for time bounds variables if present. 793 _update_bounds_encoding(variables) --> 795 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()} 797 # Remove attrs from bounds variables (issue #2921) 798 for var in new_vars.values():

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/conventions.py:196, in encode_cf_variable(var, needs_copy, name) 183 ensure_not_multiindex(var, name=name) 185 for coder in [ 186 times.CFDatetimeCoder(), 187 times.CFTimedeltaCoder(), (...) 194 variables.BooleanCoder(), 195 ]: --> 196 var = coder.encode(var, name=name) 198 # TODO(kmuehlbauer): check if ensure_dtype_not_object can be moved to backends: 199 var = ensure_dtype_not_object(var, name=name)

File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/coding/variables.py:476, in CFScaleOffsetCoder.encode(self, variable, name) 474 data -= pop_to(encoding, attrs, "add_offset", name=name) 475 if "scale_factor" in encoding: --> 476 data /= pop_to(encoding, attrs, "scale_factor", name=name) 478 return Variable(dims, data, attrs, encoding, fastpath=True)

UFuncTypeError: Cannot cast ufunc 'divide' output from dtype('float64') to dtype('int64') with casting rule 'same_kind' ```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0] python-bits: 64 OS: Linux OS-release: 5.15.0-92-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: fr_FR.UTF-8 LOCALE: ('fr_FR', 'UTF-8') libhdf5: 1.14.3 libnetcdf: 4.9.2 xarray: 2024.3.0 pandas: 2.2.2 numpy: 1.26.4 scipy: 1.13.0 netCDF4: 1.6.5 pydap: None h5netcdf: 1.3.0 h5py: 3.11.0 Nio: None zarr: 2.17.2 cftime: 1.6.3 nc_time_axis: None iris: None bottleneck: None dask: 2024.4.1 distributed: 2024.4.1 matplotlib: 3.8.4 cartopy: 0.23.0 seaborn: None numbagg: None fsspec: 2024.3.1 cupy: None pint: 0.23 sparse: None flox: None numpy_groupies: None setuptools: 69.5.1 pip: 24.0 conda: 24.3.0 pytest: 8.1.1 mypy: 1.9.0 IPython: 8.22.2 sphinx: 7.3.5
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8957/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2004250796 I_kwDOAMm_X853dnCs 8473 Regular (linspace) Coordinates/Index JulienBrn 35689176 open 0     9 2023-11-21T13:08:08Z 2024-04-18T22:11:39Z   NONE      

Is your feature request related to a problem?

Most of my dimension coordinates fall into three categories:

- Categorical coordinates
- Pandas multiindex
- Regular coordinates, that is of the form start + np.arange(n)/fs for some start, fs

I feel the way the latter is currently handled in xarray is suboptimal (unless I'm misusing this great library) as it has the following drawbacks:

- Visually: it is not obvious that the coordinate is a linear space; when printing the dataset/array we only see some of the values.
- Computation usage: applying scipy functions that require regular sampling (for example scipy's spectrogram) is very annoying, as one has to extract the fs and check that the coordinate is indeed regularly sampled. I currently use step=np.diff(a)[0], assert (np.abs(np.diff(a)-step) < epsilon).all(), fs=1/step (see the sketch after this list).
- Rounding errors: sometimes one gets rounding errors in the values of the coordinate.
- Memory/disk performance: when storing a dataset with few arrays, storing the coordinate values takes up non-negligible space (I have an example where one of my raw data arrays is a one-dimensional time array of 3 GB, and I like adding a coordinate system as soon as possible, thus doubling its size).
- Speed: I would expect joins/alignment/rolling/... to be very fast on such coordinates.
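A cleaned-up sketch of the regular-sampling check mentioned in the second bullet above (epsilon is an illustrative tolerance):

```python
import numpy as np

def sampling_frequency(coord, epsilon=1e-9):
    """Return 1/step for a regularly sampled coordinate, raising otherwise."""
    values = np.asarray(coord)
    step = np.diff(values)[0]
    if not (np.abs(np.diff(values) - step) < epsilon).all():
        raise ValueError("coordinate is not regularly sampled")
    return 1.0 / step
```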

Note: It is not obvious for me from the documentation whether this is more of a "coordinate" enhancement or an "index" enhancement (index being to my knowledge discussed only in this part of the documentation ).

Describe the solution you'd like

A new type of index/coordinate where only the "start" and "fs" are stored. The _repr_inline may look like "RegularIndex(start, end, step=1/fs)". Perhaps another, more generic possibility would be a type of coordinate system expressed as a transform from np.arange(s, e) by a bijective function f (with the inverse of f also provided). RegularIndex(start, end, fs) would then be an instance with f = lambda x: x/fs, inv(f) = lambda y: y*fs, s = round(start*fs), e = round(end*fs) + 1. The advantage of this approach is that joins/alignment/selection/... could be handled generically on the np.arange(s, e), and it would also work on non-linear spaces (for example log spaces).

Describe alternatives you've considered

I have tried writing an Index subclass, but I struggle with the create_variables method. If I do not return a coordinate for the current dimension, then a.set_xindex(["t"], RegularIndex) keeps the previous coordinates; and if I do, then I need to provide a Variable built from the np.array that I do not want to create (for memory efficiency). I have tried to drop the coordinate after setting my custom index, but that seems to remove the index as well...

There may be many other problems, as I have only experimented quickly. Should this be a viable approach, I may be open to writing a version myself and posting it for review. However, I am relatively new to xarray and I would appreciate first knowing whether I am on the right track.

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8473/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2134951079 I_kwDOAMm_X85_QMSn 8747 Datatree design discussions - weekly meeting TomNicholas 35968931 open 0     10 2024-02-14T18:39:16Z 2024-04-18T22:09:16Z   MEMBER      

What is your issue?

In the bi-weekly dev meeting today we agreed that deliberate higher-level discussions of datatree's design would be useful. (i.e. we're not worried about our ability to write high-quality code, so let's focus review time more explicitly on the high-level design questions.)

This could take the form of me just talking through what I did in a certain part of the code and why, or a targeted discussion on specific design questions that I was never quite sure about. Some examples of the latter, as food for thought:

- [ ] Inheritance of dimension coordinates from parent nodes? https://github.com/xarray-contrib/datatree/issues/297
- [x] ~~Symbolic links? https://github.com/xarray-contrib/datatree/issues/5~~ (we decided this was overkill)
- [ ] Is dt.ds ugly? See also the difference between dt.ds and dt.to_dataset() https://github.com/xarray-contrib/datatree/issues/303#issuecomment-1917798769
- [ ] Which methods should map over the subtree and which shouldn't? (can't find the issue for this one)
- [ ] Ignore missing dims when mapping over subtree? https://github.com/xarray-contrib/datatree/issues/67
- [ ] API for sub-tree selection https://github.com/xarray-contrib/datatree/issues/254
- [ ] API for merging leaves https://github.com/xarray-contrib/datatree/issues/192
- [ ] Dict-like interface ambiguities https://github.com/xarray-contrib/datatree/issues/240
- [ ] The tree broadcasting rabbit hole https://github.com/xarray-contrib/datatree/issues/199
- [ ] Relationship between datatree and catalogs https://github.com/xarray-contrib/datatree/issues/134
- [ ] Should xr.concat/xr.merge accept DataTree objects? (and map over them by default?) Would help with https://github.com/TomNicholas/VirtualiZarr/issues/84#issuecomment-2065410549

There was also this design doc I wrote at one point

@flamingbear are you free at 11:30am EST on Tuesday each week? @shoyer, @keewis and I are all free then. Others also welcome (e.g. @owenlittlejohns , @eni-awowale, @etienneschalk), but not required :)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8747/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);