

issues


7,034 rows where state = "closed" sorted by updated_at descending


id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
503583044 MDU6SXNzdWU1MDM1ODMwNDQ= 3379 `ds.to_zarr(mode="a", append_dim="time")` not capturing any time steps under Hours jminsk-cc 48155582 closed 0     3 2019-10-07T17:17:06Z 2024-05-03T18:34:50Z 2024-05-03T18:34:50Z NONE      

MCVE Code Sample

```python
import datetime

import xarray as xr

date = datetime.datetime(2019, 1, 1, 1, 10)

# Reading in 2 min time stepped MRMS data
ds = xr.open_rasterio(dir_path)
ds.name = "mrms"
ds["time"] = date
ds = ds.expand_dims("time")
ds = ds.to_dataset()

ds.to_zarr("fin_zarr", compute=False, mode="w-")

date = datetime.datetime(2019, 1, 1, 1, 12)

# Reading in 2 min time stepped MRMS data
# This can be the same file since we are adding time manually
ds = xr.open_rasterio(dir_path)
ds.name = "mrms"
ds["time"] = date
ds = ds.expand_dims("time")
ds = ds.to_dataset()

ds.to_zarr("fin_zarr", compute=False, mode="a", append_dim="time")
```

Expected Output

<xarray.Dataset>
Dimensions:  (band: 1, time: 1, x: 7000, y: 3500)
Coordinates:
  * band     (band) int64 1
  * y        (y) float64 55.0 54.99 54.98 54.97 ... 20.04 20.03 20.02 20.01
  * x        (x) float64 -130.0 -130.0 -130.0 -130.0 ... -60.03 -60.02 -60.01
  * time     (time) datetime64[ns] 2019-01-01T01:10:00
Data variables:
    mrms     (time, band, y, x) uint8 255 255 255 255 255 ... 255 255 255 255

appended by this in a ds.to_zarr()

<xarray.Dataset>
Dimensions:  (band: 1, time: 1, x: 7000, y: 3500)
Coordinates:
  * band     (band) int64 1
  * y        (y) float64 55.0 54.99 54.98 54.97 ... 20.04 20.03 20.02 20.01
  * x        (x) float64 -130.0 -130.0 -130.0 -130.0 ... -60.03 -60.02 -60.01
  * time     (time) datetime64[ns] 2019-01-01T01:12:00
Data variables:
    mrms     (time, band, y, x) uint8 255 255 255 255 255 ... 255 255 255 255

should look like below

<xarray.Dataset>
Dimensions:  (band: 1, time: 2, x: 7000, y: 3500)
Coordinates:
  * band     (band) int64 1
  * time     (time) datetime64[ns] 2019-01-01T01:10:00 2019-01-01T01:12:00
  * x        (x) float64 -130.0 -130.0 -130.0 -130.0 ... -60.03 -60.02 -60.01
  * y        (y) float64 55.0 54.99 54.98 54.97 ... 20.04 20.03 20.02 20.01
Data variables:
    mrms     (time, band, y, x) uint8 dask.array<shape=(2, 1, 3500, 7000), chunksize=(1, 1, 438, 1750)>

Problem Description

The output looks like this:

<xarray.Dataset>
Dimensions:  (band: 1, time: 2, x: 7000, y: 3500)
Coordinates:
  * band     (band) int64 1
  * time     (time) datetime64[ns] 2019-01-01T01:10:00 2019-01-01T01:10:00
  * x        (x) float64 -130.0 -130.0 -130.0 -130.0 ... -60.03 -60.02 -60.01
  * y        (y) float64 55.0 54.99 54.98 54.97 ... 20.04 20.03 20.02 20.01
Data variables:
    mrms     (time, band, y, x) uint8 dask.array<shape=(2, 1, 3500, 7000), chunksize=(1, 1, 438, 1750)>

The minutes are repeated for the whole hour until a new hour is appended; it seems the minutes are not being handled correctly.
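A hedged diagnostic sketch (reusing the fin_zarr store from the example above, not a fix): check whether the time units baked in by the first write are too coarse to represent the 2-minute step, which would explain the repeated timestamps.

```python
import xarray as xr

# Inspect the on-disk time coordinate and the encoding xarray inferred on
# the first write. If the inferred units are coarser than 2 minutes
# (e.g. hour resolution), appended timestamps get truncated back onto the
# previous value.
ds_check = xr.open_zarr("fin_zarr")
print(ds_check.time.values)
print(ds_check.time.encoding.get("units"))
```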

Output of xr.show_versions()

# Paste the output here xr.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 (default, Mar 27 2019, 16:54:48) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 18.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.1 xarray: 0.12.3 pandas: 0.24.2 numpy: 1.16.4 scipy: 1.3.0 netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: 2.9.0 Nio: None zarr: 2.3.2 cftime: 1.0.3.4 nc_time_axis: None PseudoNetCDF: None rasterio: 1.0.21 cfgrib: None iris: None bottleneck: 1.2.1 dask: 2.1.0 distributed: 2.1.0 matplotlib: 3.1.0 cartopy: 0.17.0 seaborn: 0.9.0 numbagg: None setuptools: 41.0.1 pip: 19.1.1 conda: 4.7.12 pytest: 5.0.1 IPython: 7.6.1 sphinx: 2.1.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3379/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1050082137 I_kwDOAMm_X84-lvtZ 5969 `to_zarr(append_dim="time")` appends incorrect datetimes JackKelly 460756 closed 0     3 2021-11-10T17:00:53Z 2024-05-03T17:09:31Z 2024-05-03T17:09:30Z NONE      

Description

If you create a Zarr with a single timestep and then append to the time dimension of that Zarr in subsequent writes then the appended timestamps are likely to be wrong. This only seems to happen if the time dimension is datetime64.

Minimal Complete Verifiable Example

Create a really simple Dataset:

```python
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01 00:35", periods=8, freq="6H")
da = xr.DataArray(coords=[times], dims=["time"])
ds = da.to_dataset(name="foo")
```

Write just the first timestep to a new Zarr store:

```python
ZARR_PATH = "test.zarr"

ds.isel(time=[0]).to_zarr(ZARR_PATH, mode="w")
```

So far, so good!

Now things get weird... let's append the remainder of ds to the Zarr store:

```python
ds.isel(time=slice(1, None)).to_zarr(ZARR_PATH, append_dim="time")
```

This throws a warning, which is probably relevant:

/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/dataset.py:2037: SerializationWarning: saving variable None with floating point data as an integer dtype without any _FillValue to use for NaNs
  return to_zarr(

What happened

Let's load the Zarr and print the contents on the time coord:

```python
ds_loaded = xr.open_dataset(ZARR_PATH, engine="zarr")
print(ds_loaded.time)

<xarray.DataArray 'time' (time: 8)>
array(['2000-01-01T00:35', '2000-01-01T00:35', '2000-01-01T00:35',
       '2000-01-02T00:35', '2000-01-02T00:35', '2000-01-02T00:35',
       '2000-01-03T00:35', '2000-01-03T00:35'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01T00:35:00 ... 2000-01-03T00:35:00
```

(I've removed the seconds and milliseconds to make it a bit easier to read)

The first and fifth time coords (2000-01-01T00:35 and 2000-01-02T00:35) are correct. None of the others are correct!

The encoding is not appropriate (see #3942)... notice that the units is days since..., which clearly can't represent sub-day resolution:

```python
print(ds_loaded.time.encoding)

{'chunks': (1,), 'preferred_chunks': {'time': 1}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, 'units': 'days since 2000-01-01 00:35:00', 'calendar': 'proleptic_gregorian', 'dtype': dtype('int64')}
```
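As a back-of-the-envelope illustration (not xarray's actual encoding code), casting the 6-hourly offsets to whole days shows how the sub-day information is lost:

```python
import numpy as np
import pandas as pd

# With units of "days since 2000-01-01 00:35:00" and an int64 dtype, the
# 6-hourly offsets cannot be represented exactly.
times = pd.date_range("2000-01-01 00:35", periods=8, freq="6H")
offsets_days = np.asarray((times - times[0]) / np.timedelta64(1, "D"))
print(offsets_days)                            # [0.   0.25 0.5  0.75 1.   1.25 1.5  1.75]
print(np.round(offsets_days).astype("int64"))  # [0 0 0 1 1 1 2 2]
# Decoding those integers gives only 00:35 on Jan 1, Jan 2 and Jan 3,
# which matches the repeated timestamps reported above.
```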

What you expected to happen

The correct time coords are:

```python
print(ds.time)

<xarray.DataArray 'time' (time: 8)>
array(['2000-01-01T00:35', '2000-01-01T06:35', '2000-01-01T12:35',
       '2000-01-01T18:35', '2000-01-02T00:35', '2000-01-02T06:35',
       '2000-01-02T12:35', '2000-01-02T18:35'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01T00:35:00 ... 2000-01-02T18:35:00
```

Anything else we need to know?

There are three workarounds that I'm aware of:

1) When first creating the Zarr, write two or more timesteps into the Zarr. Then you can append any number of timesteps to the Zarr and everything works fine.
2) Convert the time coords to Unix epoch, represented as ints (a sketch of this follows after the example below).
3) Manually set the encoding before the first write (as suggested in https://github.com/pydata/xarray/issues/3942#issuecomment-610444090). For example:

```python
ds.isel(time=[0]).to_zarr(
    ZARR_PATH,
    mode="w",
    encoding={'time': {'units': 'seconds since 1970-01-01'}},
)
```
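And a hedged sketch of workaround 2, rebuilding the example above but storing the time coordinate as integer seconds since the Unix epoch (the store name here is arbitrary):

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01 00:35", periods=8, freq="6H")
ds = xr.DataArray(coords=[times], dims=["time"]).to_dataset(name="foo")

# Replace the datetime64 coordinate with plain int64 epoch seconds.
epoch = np.datetime64("1970-01-01T00:00:00", "ns")
ds_int = ds.assign_coords(
    time=((ds.time - epoch) / np.timedelta64(1, "s")).astype("int64")
)

ds_int.isel(time=[0]).to_zarr("test_int_time.zarr", mode="w")
ds_int.isel(time=slice(1, None)).to_zarr("test_int_time.zarr", append_dim="time")
```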

Related issues

It's possible that the root cause of this issue is #3942.

And I think #3379 is another symptom of this issue.

Environment

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 5.13.0-21-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: ('en_GB', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 0.20.1 pandas: 1.3.4 numpy: 1.21.4 scipy: 1.7.2 netCDF4: 1.5.8 pydap: None h5netcdf: 0.11.0 h5py: 3.4.0 Nio: None zarr: 2.10.1 cftime: 1.5.1.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.2.8 cfgrib: 0.9.9.1 iris: None bottleneck: 1.3.2 dask: 2021.10.0 distributed: None matplotlib: 3.4.3 cartopy: None seaborn: None numbagg: None fsspec: 2021.11.0 cupy: None pint: None sparse: None setuptools: 58.5.3 pip: 21.3.1 conda: None pytest: 6.2.5 IPython: 7.29.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5969/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2275404926 PR_kwDOAMm_X85uWjVP 8993 call `np.cross` with 3D vectors only keewis 14808389 closed 0     1 2024-05-02T12:21:30Z 2024-05-03T15:56:49Z 2024-05-03T15:22:26Z MEMBER   0 pydata/xarray/pulls/8993
  • [x] towards #8844

In the tests, we've been calling np.cross with vectors of 2 or 3 dimensions; numpy>=2 deprecates 2D vectors (plus, we're now raising on warnings). Thus, we 0-pad the inputs before generating the expected result (which generally should not change the outcome of the tests).
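For illustration, a minimal sketch of the 0-padding idea (the array names are made up; this is not the test-suite code itself):

```python
import numpy as np

# Pad 2D vectors with a trailing zero so np.cross always receives 3D inputs.
a2 = np.array([1.0, 2.0])
b2 = np.array([3.0, 4.0])

a3 = np.pad(a2, (0, 1))  # [1., 2., 0.]
b3 = np.pad(b2, (0, 1))  # [3., 4., 0.]

# The z-component of the 3D result equals the scalar that np.cross used to
# return for 2D inputs:
print(np.cross(a3, b3))  # [ 0.  0. -2.]
```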

For a later PR: add tests to check if xr.cross works if more than a single dimension is present, and pre-compute the expected result. Also, for property-based testing: the cross-product of two vectors is perpendicular to both input vectors (use the dot product to check that), and its length (l2-norm) is the product of the lengths of the input vectors.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8993/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2203689075 PR_kwDOAMm_X85qjXJq 8870 Enable explicit use of key tuples (instead of *Indexer objects) in indexing adapters and explicitly indexed arrays andersy005 13301940 closed 0     1 2024-03-23T04:34:18Z 2024-05-03T15:27:38Z 2024-05-03T15:27:22Z MEMBER   0 pydata/xarray/pulls/8870
  • [ ] Towards #8856
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8870/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2276732187 PR_kwDOAMm_X85ubH0P 8996 Mark `test_use_cftime_false_standard_calendar_in_range` as an expected failure spencerkclark 6628425 closed 0     0 2024-05-03T01:05:21Z 2024-05-03T15:21:48Z 2024-05-03T15:21:48Z MEMBER   0 pydata/xarray/pulls/8996

Per https://github.com/pydata/xarray/issues/8844#issuecomment-2089427222, for the time being this marks test_use_cftime_false_standard_calendar_in_range as an expected failure under NumPy 2. Hopefully we'll be able to fix the upstream issue in pandas eventually.
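For context, a hedged sketch of how such a conditional expected failure can be expressed with pytest (illustrative only; the actual marker and reason text in the PR may differ):

```python
import numpy as np
import pytest
from packaging.version import Version

# Mark the test as an expected failure only when running under NumPy 2.
@pytest.mark.xfail(
    Version(np.__version__) >= Version("2.0.0.dev0"),
    reason="upstream pandas issue with use_cftime=False under NumPy 2",
)
def test_use_cftime_false_standard_calendar_in_range():
    ...
```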

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8996/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2266442492 PR_kwDOAMm_X85t4NhR 8976 Migration of datatree/ops.py -> datatree_ops.py flamingbear 479480 closed 0     4 2024-04-26T20:14:11Z 2024-05-02T19:49:39Z 2024-05-02T19:49:39Z CONTRIBUTOR   0 pydata/xarray/pulls/8976

I considered wedging this into core/ops.py, but it didn't look like it fit there.

This is a basic lift and shift from datatree_/ops.py to core/datatree_ops.py

I did fix the document addendum injection and added a couple of tests.

  • [x] Contributes to migration step for miscellaneous modules in Track merging datatree into xarray #8572
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8976/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2241526039 PR_kwDOAMm_X85skMs0 8939 avoid a couple of warnings in `polyfit` keewis 14808389 closed 0     14 2024-04-13T11:49:13Z 2024-05-01T16:42:06Z 2024-05-01T15:34:20Z MEMBER   0 pydata/xarray/pulls/8939

- [x] towards #8844

  • replace numpy.core.finfo with numpy.finfo
  • add dtype and copy parameters to all definitions of __array__
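A hedged sketch of the NumPy-2-style `__array__` signature the second bullet refers to (the wrapper class here is made up for illustration):

```python
import numpy as np

class WrappedArray:
    """Illustrative array-like; not xarray's actual class."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # NumPy 2 may pass dtype and copy explicitly; honour them here.
        if copy:
            return np.array(self._data, dtype=dtype, copy=True)
        return np.asarray(self._data, dtype=dtype)


print(np.asarray(WrappedArray([1, 2, 3]), dtype=float))  # [1. 2. 3.]
```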
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8939/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2270984193 PR_kwDOAMm_X85uHk70 8986 clean up the upstream-dev setup script keewis 14808389 closed 0     1 2024-04-30T09:34:04Z 2024-04-30T23:26:13Z 2024-04-30T20:59:56Z MEMBER   0 pydata/xarray/pulls/8986

In trying to install packages that are compatible with numpy>=2 I added several projects that are built in CI without build isolation (so that they will be built with the nightly version of numpy). That was a temporary workaround, so we should start thinking about cleaning this up.

As it seems numcodecs is now compatible (or uses less of numpy in compiled code, not sure), this is an attempt to see if CI works if we use the version from conda-forge.

bottleneck and cftime now build against numpy>=2.0.0rc1, so we can stop building them without build isolation.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8986/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2272299822 PR_kwDOAMm_X85uL82a 8989 Skip flaky `test_open_mfdataset_manyfiles` test max-sixty 5635139 closed 0     0 2024-04-30T19:24:41Z 2024-04-30T20:27:04Z 2024-04-30T19:46:34Z MEMBER   0 pydata/xarray/pulls/8989

Don't just xfail it, and don't restrict this to Windows, since the test can crash the worker.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8989/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2271670475 PR_kwDOAMm_X85uJ5Er 8988 Remove `.drop` warning allow max-sixty 5635139 closed 0     0 2024-04-30T14:39:35Z 2024-04-30T19:26:17Z 2024-04-30T19:26:16Z MEMBER   0 pydata/xarray/pulls/8988  
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8988/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2271652603 PR_kwDOAMm_X85uJ122 8987 Add notes on when to add ignores to warnings max-sixty 5635139 closed 0     0 2024-04-30T14:34:52Z 2024-04-30T14:56:47Z 2024-04-30T14:56:46Z MEMBER   0 pydata/xarray/pulls/8987  
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8987/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2262468762 PR_kwDOAMm_X85tqnJm 8973 Docstring and documentation improvement for the Dataset class noahbenson 2005723 closed 0     7 2024-04-25T01:39:02Z 2024-04-30T14:40:32Z 2024-04-30T14:40:14Z CONTRIBUTOR   0 pydata/xarray/pulls/8973

The example in the doc-string of the Dataset class prior to this commit uses an example array whose size is 2 x 2 x 3 with the first two dimensions labeled "x" and "y" and the final dimension labeled "time". This was confusing due to the fact that "x" and "y" are just arbitrary names for these axes and that no reason is given for the data to be organized in a 2x2x3 array instead of a 2x2 matrix. This commit clarifies the example.

Additionally, this PR contains updates to the documentation, specifically the user-guide/data-structures.rst file; the updates bring the documentation examples into alignment with the doc-string change. Unfortunately, I wasn't able to build the documentation, so this will need to be checked. (I followed the instructions here, but despite cfgrib working fine, I got an error about how it wasn't a valid engine.)

See issue #8970 for more information.

  • [X] Closes #8970
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8973/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2261858401 I_kwDOAMm_X86G0Thh 8970 Example code in the documentation for `Dataset` is not clear noahbenson 2005723 closed 0     13 2024-04-24T17:50:46Z 2024-04-30T14:40:15Z 2024-04-30T14:40:15Z CONTRIBUTOR      

What is your issue?

The example code in the documentation for the Dataset class (e.g., here) is probably clear to those who study Earth and Atmospheric Sciences, but it makes no sense to me. Here is the code:

```python
np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 2, 3)
precipitation = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
time = pd.date_range("2014-09-06", periods=3)
reference_time = pd.Timestamp("2014-09-05")

ds = xr.Dataset(
    data_vars=dict(
        temperature=(["x", "y", "time"], temperature),
        precipitation=(["x", "y", "time"], precipitation),
    ),
    coords=dict(
        lon=(["x", "y"], lon),
        lat=(["x", "y"], lat),
        time=time,
        reference_time=reference_time,
    ),
    attrs=dict(description="Weather related data."),
)
```

To be clear, I understand each individual line of code, but I don't understand why there is both a latitude/longitude and an x/y in this example or how they are supposed to be related to each other (and there do not appear to be any additional details about this dataset's intended structure). Probably due to this lack of clarity I'm having a hard time wrapping my head around what the x/y coordinates and the lat/lon coordinates are supposed to demonstrate about xarray here, or how the x/y and lat/lon values are represented in the data structure. Are the x and y coordinates in a map projection of some kind? I have worked successfully with Datasets in the past, but as someone who doesn't work with geospatial data, I find myself more confused about Datasets after reading this example than before.

I suspect that all that is needed is a clear description of what these data are supposed to represent, how they are intended to be used, and how x/y and lat/lon are related. If someone can explain this to me, I'd be happy to submit a PR for the docs.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8970/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2212435865 PR_kwDOAMm_X85rAwYu 8885 add `.oindex` and `.vindex` to `BackendArray` andersy005 13301940 closed 0     8 2024-03-28T06:14:43Z 2024-04-30T12:12:50Z 2024-04-17T01:53:23Z MEMBER   0 pydata/xarray/pulls/8885

this PR builds towards

  • https://github.com/pydata/xarray/pull/8870
  • https://github.com/pydata/xarray/pull/8856

the primary objective is to partially address

  1. Implement fall back .oindex, .vindex properties on BackendArray base class. These will simply rewrap the key tuple with the appropriate *Indexer object, and pass it on to __getitem__ or __setitem__. These methods will also raise DeprecationWarning so that external backends will know to migrate to .oindex, and .vindex over the next year.
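A hedged sketch of that fallback (illustrative only: `OuterIndexer` and `IndexCallable` below are local stand-ins, not the actual xarray internals or the PR's code):

```python
import warnings


class OuterIndexer(tuple):
    """Stand-in for the real xarray.core.indexing.OuterIndexer."""
    def __new__(cls, key):
        return super().__new__(cls, key)


class IndexCallable:
    """Make `array.oindex[key]` delegate to a getter function."""
    def __init__(self, getter):
        self.getter = getter

    def __getitem__(self, key):
        return self.getter(key)


class BackendArraySketch:
    """Sketch of the fallback described above, not the PR's actual code."""

    def __getitem__(self, indexer):
        raise NotImplementedError  # provided by concrete backend arrays

    @property
    def oindex(self):
        def _oindex_get(key):
            warnings.warn(
                "Falling back to __getitem__; implement .oindex in the backend.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Rewrap the raw key tuple and pass it on to __getitem__:
            return self[OuterIndexer(key)]

        return IndexCallable(_oindex_get)
```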

  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8885/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1250939008 I_kwDOAMm_X85Kj9CA 6646 `dim` vs `dims` max-sixty 5635139 closed 0     4 2022-05-27T16:15:02Z 2024-04-29T18:24:56Z 2024-04-29T18:24:56Z MEMBER      

What is your issue?

I've recently been hit with this when experimenting with xr.dot and xr.corr — xr.dot takes dims, and xr.cov takes dim. Because they each take multiple arrays as positional args, kwargs are more conventional.

Should we standardize on one of these?
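For reference, a tiny illustration of the mismatch (keyword names as they were at the time of this issue; later releases may have since standardized on `dim`):

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.random.rand(3, 4), dims=("x", "y"))
b = xr.DataArray(np.random.rand(3, 4), dims=("x", "y"))

xr.dot(a, b, dims="x")  # reduction argument spelled `dims` here...
xr.cov(a, b, dim="x")   # ...but `dim` here
```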

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6646/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2268058661 PR_kwDOAMm_X85t9f5f 8982 Switch all methods to `dim` max-sixty 5635139 closed 0     0 2024-04-29T03:42:34Z 2024-04-29T18:24:56Z 2024-04-29T18:24:55Z MEMBER   0 pydata/xarray/pulls/8982

I think this is the final set of methods

  • [x] Closes #6646
  • [ ] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8982/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2267810980 PR_kwDOAMm_X85t8q4s 8981 Enable ffill for datetimes max-sixty 5635139 closed 0     5 2024-04-28T20:53:18Z 2024-04-29T18:09:48Z 2024-04-28T23:02:11Z MEMBER   0 pydata/xarray/pulls/8981

Notes inline. Would fix #4587

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8981/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2019566184 I_kwDOAMm_X854YCJo 8494 Filter expected warnings in the test suite TomNicholas 35968931 closed 0     1 2023-11-30T21:50:15Z 2024-04-29T16:57:07Z 2024-04-29T16:56:16Z MEMBER      

FWIW one thing I'd be keen for to do generally — though maybe this isn't the place to start it — is handle warnings in the test suite when we add a new warning — i.e. filter them out where we expect them.

In this case, that would be the loading of netCDF files that have duplicate dims.

Otherwise warnings become a huge block of text without much salience. I mostly see the 350 lines of them and think "meh mostly units & cftime", but then something breaks on a new upstream release that was buried in there, or we have a supported code path that is raising warnings internally.

(I'm not sure whether it's possible to generally enforce that — maybe we could raise on any warnings coming from within xarray? Would be a non-trivial project to get us there though...)
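As a hedged illustration of the per-test filtering described above (the warning-message pattern and test name are made up, not xarray's actual text):

```python
import pytest


# Silence one expected warning for one test only.
@pytest.mark.filterwarnings("ignore:duplicate dimension names:UserWarning")
def test_open_netcdf_with_duplicate_dims():
    ...
```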

Originally posted by @max-sixty in https://github.com/pydata/xarray/issues/8491#issuecomment-1834615826

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8494/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2261855627 PR_kwDOAMm_X85togwQ 8969 CI: python 3.12 by default. dcherian 2448579 closed 0     2 2024-04-24T17:49:25Z 2024-04-29T16:21:20Z 2024-04-29T16:21:08Z MEMBER   0 pydata/xarray/pulls/8969
  1. Now that numba supports 3.12.
  2. Disabled pint on the main environment since it doesn't work. Pint is still installed in the all-but-dask env, which still runs Python 3.11 for this reason.
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8969/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2118308210 I_kwDOAMm_X85-QtFy 8707 Weird interaction between aggregation and multiprocessing on DaskArrays saschahofmann 24508496 closed 0     10 2024-02-05T11:35:28Z 2024-04-29T16:20:45Z 2024-04-29T16:20:44Z CONTRIBUTOR      

What happened?

When I try to run a modified version of the example from the dropna documentation (see below), it creates a never-terminating process. To reproduce it, I added a rolling operation before dropping NaNs and then ran 4 processes using the standard library multiprocessing Pool class on DaskArrays. Running the rolling + dropna in a for loop finishes as expected in no time.

What did you expect to happen?

There is nothing obvious to me why this wouldn't just work unless there is a weird interaction between the Dask threads and the different processes. Using Xarray+Dask+Multiprocessing seems to work for me on other functions, it seems to be this particular combination that is problematic.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np
from multiprocessing import Pool

datasets = [xr.Dataset(
    {
        "temperature": (
            ["time", "location"],
            [[23.4, 24.1],
             [np.nan if i > 1 else 23.4, 22.1 if i < 2 else np.nan],
             [21.8 if i < 3 else np.nan, 24.2],
             [20.5, 25.3]],
        )
    },
    coords={"time": [1, 2, 3, 4], "location": ["A", "B"]},
).chunk(time=2) for i in range(4)]

def process(dataset):
    return dataset.rolling(dim={'time': 2}).sum().dropna(dim="time", how="all").compute()

# This works as expected
dropped = []
for dataset in datasets:
    dropped.append(process(dataset))

# This seems to never finish
with Pool(4) as p:
    dropped = p.map(process, datasets)
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

I am still running 2023.08.0; see below for more details about the environment.

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.6 (main, Jan 25 2024, 20:42:03) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 5.4.0-124-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.3-development xarray: 2023.8.0 pandas: 2.1.4 numpy: 1.26.3 scipy: 1.12.0 netCDF4: 1.6.5 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.16.1 cftime: 1.6.3 nc_time_axis: 1.4.1 PseudoNetCDF: None iris: None bottleneck: 1.3.7 dask: 2024.1.1 distributed: 2024.1.1 matplotlib: 3.8.2 cartopy: 0.22.0 seaborn: None numbagg: None fsspec: 2023.12.2 cupy: None pint: 0.23 sparse: None flox: 0.9.0 numpy_groupies: 0.10.2 setuptools: 69.0.3 pip: 23.2.1 conda: None pytest: 8.0.0 mypy: None IPython: 8.18.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8707/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2267711587 PR_kwDOAMm_X85t8VWy 8978 more engine environment tricks in preparation for `numpy>=2` keewis 14808389 closed 0     7 2024-04-28T17:54:38Z 2024-04-29T14:56:22Z 2024-04-29T14:56:21Z MEMBER   0 pydata/xarray/pulls/8978

Turns out pydap also needs to build with numpy>=2. Until it does, we should remove it from the upstream-dev environment. Also, numcodecs build-depends on setuptools-scm.

And finally, the h5py nightlies might support numpy>=2 (h5py>=3.11 supposedly is numpy>=2 compatible), so once again I'll try and see if CI passes.

  • [x] towards #8844
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8978/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2262478932 PR_kwDOAMm_X85tqpUi 8974 Raise errors on new warnings from within xarray max-sixty 5635139 closed 0     2 2024-04-25T01:50:48Z 2024-04-29T12:18:42Z 2024-04-29T02:50:21Z MEMBER   0 pydata/xarray/pulls/8974

Notes are inline.

  • [x] Closes https://github.com/pydata/xarray/issues/8494
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst

Done with some help from an LLM — quite good for doing tedious tasks that we otherwise wouldn't want to do — can paste in all the warnings output and get a decent start on rules for exclusions

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8974/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1997537503 PR_kwDOAMm_X85fqp3A 8459 Check for aligned chunks when writing to existing variables max-sixty 5635139 closed 0     5 2023-11-16T18:56:06Z 2024-04-29T03:05:36Z 2024-03-29T14:35:50Z MEMBER   0 pydata/xarray/pulls/8459

While I don't feel super confident that this is designed to protect against any bugs, it does solve the immediate problem in #8371, by hoisting the encoding check above the code that runs for only new variables. The encoding check is somewhat implicit, so this was an easy thing to miss prior.

  • [x] Closes #8371,
  • [x] Closes #8882
  • [x] Closes #8876
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8459/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1574694462 I_kwDOAMm_X85d2-4- 7513 intermittent failures with h5netcdf, h5py on macos dcherian 2448579 closed 0     5 2023-02-07T16:58:43Z 2024-04-28T23:35:21Z 2024-04-28T23:35:21Z MEMBER      

What is your issue?

cc @hmaarrfk @kmuehlbauer

Passed: https://github.com/pydata/xarray/actions/runs/4115923717/jobs/7105298426 Failed: https://github.com/pydata/xarray/actions/runs/4115946392/jobs/7105345290

Versions:
h5netcdf 1.1.0 pyhd8ed1ab_0 conda-forge
h5py 3.8.0 nompi_py310h5555e59_100 conda-forge
hdf4 4.2.15 h7aa5921_5 conda-forge
hdf5 1.12.2 nompi_h48135f9_101 conda-forge

```
=================================== FAILURES ===================================
___ test_open_mfdataset_manyfiles[h5netcdf-20-True-5-5] ______
[gw1] darwin -- Python 3.10.9 /Users/runner/micromamba-root/envs/xarray-tests/bin/python

readengine = 'h5netcdf', nfiles = 20, parallel = True, chunks = 5 file_cache_maxsize = 5

@requires_dask
@pytest.mark.filterwarnings("ignore:use make_scale(name) instead")
def test_open_mfdataset_manyfiles(
    readengine, nfiles, parallel, chunks, file_cache_maxsize
):
    # skip certain combinations
    skip_if_not_engine(readengine)

    if ON_WINDOWS:
        pytest.skip("Skipping on Windows")

    randdata = np.random.randn(nfiles)
    original = Dataset({"foo": ("x", randdata)})
    # test standard open_mfdataset approach with too many files
    with create_tmp_files(nfiles) as tmpfiles:
        writeengine = readengine if readengine != "pynio" else "netcdf4"
        # split into multiple sets of temp files
        for ii in original.x.values:
            subds = original.isel(x=slice(ii, ii + 1))
            if writeengine != "zarr":
                subds.to_netcdf(tmpfiles[ii], engine=writeengine)
            else:  # if writeengine == "zarr":
                subds.to_zarr(store=tmpfiles[ii])

        # check that calculation on opened datasets works properly
      with open_mfdataset(
            tmpfiles,
            combine="nested",
            concat_dim="x",
            engine=readengine,
            parallel=parallel,
            chunks=chunks if (not chunks and readengine != "zarr") else "auto",
        ) as actual:

/Users/runner/work/xarray/xarray/xarray/tests/test_backends.py:3267:


/Users/runner/work/xarray/xarray/xarray/backends/api.py:991: in open_mfdataset
    datasets, closers = dask.compute(datasets, closers)
/Users/runner/micromamba-root/envs/xarray-tests/lib/python3.10/site-packages/dask/base.py:599: in compute
    results = schedule(dsk, keys, kwargs)
/Users/runner/micromamba-root/envs/xarray-tests/lib/python3.10/site-packages/dask/threaded.py:89: in get
    results = get_async(
/Users/runner/micromamba-root/envs/xarray-tests/lib/python3.10/site-packages/dask/local.py:511: in get_async
    raise_exception(exc, tb)
/Users/runner/micromamba-root/envs/xarray-tests/lib/python3.10/site-packages/dask/local.py:319: in reraise
    raise exc
/Users/runner/micromamba-root/envs/xarray-tests/lib/python3.10/site-packages/dask/local.py:224: in execute_task
    result = _execute_task(task, data)
/Users/runner/micromamba-root/envs/xarray-tests/lib/python3.10/site-packages/dask/core.py:119: in _execute_task
    return func((_execute_task(a, cache) for a in args))
/Users/runner/micromamba-root/envs/xarray-tests/lib/python3.10/site-packages/dask/utils.py:72: in apply
    return func(args, kwargs)
/Users/runner/work/xarray/xarray/xarray/backends/api.py:526: in open_dataset
    backend_ds = backend.open_dataset(
/Users/runner/work/xarray/xarray/xarray/backends/h5netcdf_.py:417: in open_dataset
    ds = store_entrypoint.open_dataset(
/Users/runner/work/xarray/xarray/xarray/backends/store.py:32: in open_dataset
    vars, attrs = store.load()
/Users/runner/work/xarray/xarray/xarray/backends/common.py:129: in load
    (decode_variable_name(k), v) for k, v in self.get_variables().items()
/Users/runner/work/xarray/xarray/xarray/backends/h5netcdf.py:220: in get_variables
    return FrozenDict(
/Users/runner/work/xarray/xarray/xarray/core/utils.py:471: in FrozenDict
    return Frozen(dict(args, *kwargs))
/Users/runner/work/xarray/xarray/xarray/backends/h5netcdf_.py:221: in <genexpr>
    (k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items()
/Users/runner/work/xarray/xarray/xarray/backends/h5netcdf_.py:200: in open_store_variable
    elif var.compression is not None:
/Users/runner/micromamba-root/envs/xarray-tests/lib/python3.10/site-packages/h5netcdf/core.py:394: in compression
    return self._h5ds.compression


self = <[AttributeError("'NoneType' object has no attribute '_root'") raised in repr()] Variable object at 0x151378970>

@property
def _h5ds(self):
    # Always refer to the root file and store not h5py object
    # subclasses:
  return self._root._h5file[self._h5path]

E AttributeError: 'NoneType' object has no attribute '_h5file'

```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7513/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1579956621 I_kwDOAMm_X85eLDmN 7519 Selecting variables from Dataset with view on dict keys is of type DataArray derhintze 25172489 closed 0     7 2023-02-10T16:02:19Z 2024-04-28T21:01:28Z 2024-04-28T21:01:27Z NONE      

What happened?

When selecting variables from a Dataset using a view on dict keys, the type returned is a DataArray, whereas the same using a list is a Dataset.

What did you expect to happen?

The type returned should be a Dataset.

Minimal Complete Verifiable Example

```Python
import xarray as xr

d = {"a": ("dim", range(1, 4)), "b": ("dim", range(2, 5))}

data = xr.Dataset(d)
select_dict = data[d.keys()]
select_list = data[list(d)]

reveal_type(select_dict)
reveal_type(select_list)
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
$ mypy test.py
test.py:9: note: Revealed type is "xarray.core.dataarray.DataArray"
test.py:10: note: Revealed type is "xarray.core.dataset.Dataset"
Success: no issues found in 1 source file
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.9.10 (main, Mar 15 2022, 15:56:56) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.49.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.0 xarray: 2022.12.0 pandas: 1.5.2 numpy: 1.23.5 scipy: 1.10.0 netCDF4: 1.6.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.6.3 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 58.1.0 pip: 23.0 conda: None pytest: 7.2.1 mypy: 0.991 IPython: 8.8.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7519/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1024011835 I_kwDOAMm_X849CS47 5857 Incorrect results when using xarray.ufuncs.angle(..., deg=True) cvr 1119116 closed 0     4 2021-10-12T16:24:11Z 2024-04-28T20:58:55Z 2024-04-28T20:58:54Z NONE      

What happened:

The xarray.ufuncs.angle function is broken. From the help docstring, one may use the option deg=True to have the result in degrees instead of radians (which is consistent with the numpy.angle function). Yet results show that this is not the case. Moreover, specifying deg=True or deg=False leads to the same result, with the values in radians.

What you expected to happen:

To have the result of xarray.ufuncs.angle converted to degrees when option deg=True is specified.

Minimal Complete Verifiable Example:

```python
# Put your MCVE code here
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={'wd': ('wd', np.arange(0, 360, 30, dtype=float))})

Z = xr.ufuncs.exp(1j * xr.ufuncs.radians(ds.wd))
D = xr.ufuncs.angle(Z, deg=True)  # YIELDS INCORRECT RESULTS
if not np.allclose(ds.wd, (D % 360)):
    print(f"Issue with angle operation: {D.values%360} instead of {ds.wd.values}"
          + f"\n\tERROR xr.ufuncs.angle(Z, deg=True) gives incorrect results !!!")

D = xr.ufuncs.degrees(xr.ufuncs.angle(Z))  # Works OK
if not np.allclose(ds.wd, (D % 360)):
    print(f"Issue with angle operation: {D%360} instead of {ds.wd}"
          + f"\n\tERROR xr.ufuncs.degrees(xr.ufuncs.angle(Z)) gives incorrect results!!!")

D = xr.apply_ufunc(np.angle, Z, kwargs={'deg': True})  # Works OK
if not np.allclose(ds.wd, (D % 360)):
    print(f"Issue with angle operation: {D%360} instead of {ds.wd}"
          + f"\n\tERROR xr.apply_ufunc(np.angle, Z, kwargs={{'deg': True}}) gives incorrect results!!!")
```

Anything else we need to know?:

Though xarray.ufuncs has a deprecation warning stating that the numpy equivalent may be used, this is not true for numpy.angle. Example:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={'wd': ('wd', np.arange(0, 360, 30, dtype=float))})

Z = np.exp(1j * np.radians(ds.wd))
print(Z)
print(f"Is Z an XArray? {isinstance(Z, xr.DataArray)}")

D = np.angle(ds.wd, deg=True)
print(D)
print(f"Is D an XArray? {isinstance(D, xr.DataArray)}")
```

If this code is run, the result of `numpy.angle(xarray.DataArray)` is not a DataArray object, contrary to other numpy operations (for all versions of xarray I've used). Hence `xarray.ufuncs.angle` is a great option, if it was not for the current problem.

Environment:

No issues with xarray versions 0.16.2 and 0.17.0. This error happens from 0.18.0 onwards, up to 0.19.0 (the most recent).

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.19.0-18-amd64 machine: x86_64 processor: byteorder: little LC_ALL: en_US.utf8 LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None xarray: 0.19.0 pandas: 1.2.3 numpy: 1.20.2 scipy: 1.5.3 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 58.2.0 pip: 21.3 conda: 4.10.3 pytest: None IPython: None sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5857/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1518812301 I_kwDOAMm_X85ahzyN 7414 Error using xarray.interp - function signature does not match with scipy.interpn Florian1209 20089326 closed 0     2 2023-01-04T11:30:48Z 2024-04-28T20:55:33Z 2024-04-28T20:55:33Z NONE      

What happened?

I am experiencing an error when using the xarray interp function. The error message indicates that the function signature does not match with scipy interpn.

It's linked to the scipy 1.10.0 update (2023/01/03).

What did you expect to happen?

I would interpolate 2D data of numpy float64: two data arrays of latitudes and longitudes, each following <xarray.DataArray (row: 32, col: 32)>. da is an xarray dataset:

<xarray.Dataset>
Dimensions:  (lat: 721, lon: 1441)
Coordinates:
  * lat      (lat) float64 90.0 89.75 89.5 89.25 ... -89.25 -89.5 -89.75 -90.0
  * lon      (lon) float64 0.0 0.25 0.5 0.75 1.0 ... 359.2 359.5 359.8 360.0
Data variables:
    hgt      (lat, lon) >f4 13.61 13.61 13.61 13.61 ... -29.53 -29.53 -29.53
Attributes:

Minimal Complete Verifiable Example

```Python
interpolated_da = da.interp(
    {
        "x": xr.DataArray(x, dims=("x", "y")),
        "y": xr.DataArray(y, dims=("x", "y")),
    }
)
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
interpolated_da = da.interp(
venv/lib/python3.8/site-packages/xarray/core/dataset.py:3378: in interp
    variables[name] = missing.interp(var, var_indexers, method, kwargs)
venv/lib/python3.8/site-packages/xarray/core/missing.py:639: in interp
    interped = interp_func(
venv/lib/python3.8/site-packages/xarray/core/missing.py:764: in interp_func
    return _interpnd(var, x, new_x, func, kwargs)
venv/lib/python3.8/site-packages/xarray/core/missing.py:788: in _interpnd
    rslt = func(x, var, xi, kwargs)
venv/lib/python3.8/site-packages/scipy/interpolate/_rgi.py:654: in interpn
    return interp(xi)
venv/lib/python3.8/site-packages/scipy/interpolate/_rgi.py:336: in call
    result = evaluate_linear_2d(self.values,

???
E   TypeError: No matching signature found

_rgi_cython.pyx:19: TypeError
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None python: 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 5.4.0-135-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.0

xarray: 2022.12.0 pandas: 1.5.2 numpy: 1.22.4 scipy: 1.10.0 netCDF4: 1.6.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None rasterio: 1.3.4 cfgrib: None iris: None bottleneck: None dask: 2022.12.1 distributed: 2022.12.1 matplotlib: 3.6.2 cartopy: None seaborn: None numbagg: None fsspec: 2022.11.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 65.6.3 pip: 22.3.1 conda: None pytest: 7.2.0 mypy: None IPython: 8.7.0 sphinx: 5.3.0 None

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7414/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1039113959 I_kwDOAMm_X849757n 5913 Invalid characters in OpenDAP URL pmartineauGit 57886986 closed 0     5 2021-10-29T02:54:14Z 2024-04-28T20:55:17Z 2024-04-28T20:55:17Z NONE      

Hello,

I have successfully opened an OpenDAP URL with ds = xarray.open_dataset(url) However, after selecting a subset with ds = ds.isel(time=0) and attempting to load the data with ds.load(), I get the following error:

HTTP Status 400 – Bad Request: Invalid character found in the request

target. The valid characters are defined in RFC 7230 and RFC 3986

I suspect the reason is that square brackets are passed in the URL when attempting to load: ...zg_6hrPlevPt_MIROC6_historical_r1i1p1f1_gn_185001010600-185101010000.nc.dods?zg.zg[0][0:6][0:127][0:255]] because of the index selection with .isel()

In fact, some servers do forbid square brackets: https://www.unidata.ucar.edu/mailing_lists/archives/thredds/2020/msg00056.html

Would it be possible to provide an option to encode URLs? ( [ becomes %5B, and ] becomes %5D )

Or, instead of loading directly with ds.load(), is there a way for me to retrieve the URL with offending brackets that is generated automatically by xarray, encode it myself, and then use ds2 = xarray.load_dataset(encoded_url) to load?
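A hedged sketch of the manual percent-encoding workaround asked about above (the truncated URL is copied from this report, and the `safe` set passed to `quote` is a guess at which separators to preserve):

```python
from urllib.parse import quote

# Percent-encode the bracketed DAP constraint expression by hand;
# everything outside the `safe` set (including [ and ]) is escaped.
url = "...185001010600-185101010000.nc.dods?zg.zg[0][0:6][0:127][0:255]"
encoded = quote(url, safe=":/?=&.,")
print(encoded)  # the square brackets become %5B and %5D
```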

Thank you for your help!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5913/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  not_planned xarray 13221727 issue
1244977848 I_kwDOAMm_X85KNNq4 6629 `plot.imshow` with datetime coordinate fails shaharkadmiel 6872529 closed 0     5 2022-05-23T10:56:46Z 2024-04-28T20:16:44Z 2024-04-28T20:16:44Z NONE      

What happened?

When trying to plot a 2d DataArray that has one of the 2 coordinates as datetime with da.plot.imshow, the following error is returned:

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I know that I can use pcolormesh instead, but on large arrays imshow is much faster. It also behaves more nicely with transparency and interpolation, so for regularly sampled data I find imshow a better choice.

Here is a minimal working example:

```python
import numpy as np
from xarray import DataArray
from pandas import date_range

time = date_range('2020-01-01', periods=7, freq='D')
y = np.linspace(0, 10, 11)
da = DataArray(
    np.random.rand(time.size, y.size),
    coords=dict(time=time, y=y),
    dims=('time', 'y')
)

da.plot.imshow(x='time', y='y')
```

What did you expect to happen?

I suggest the following solution which can be added after https://github.com/pydata/xarray/blob/4da7fdbd85bb82e338ad65a532dd7a9707e18ce0/xarray/plot/plot.py#L1366

```python
left, right = map(date2num, (left, right))
```

and then adding:

```python
ax.xaxis_date()
plt.setp(ax.get_xticklabels(), rotation=30, ha='right')
```

Minimal Complete Verifiable Example

```Python
import numpy as np
from xarray import DataArray
from pandas import date_range

# creating the data
time = date_range('2020-01-01', periods=7, freq='D')
y = np.linspace(0, 10, 11)
da = DataArray(
    np.random.rand(time.size, y.size),
    coords=dict(time=time, y=y),
    dims=('time', 'y')
)

import matplotlib.pyplot as plt
from matplotlib.dates import date2num, AutoDateFormatter

# from https://github.com/pydata/xarray/blob/4da7fdbd85bb82e338ad65a532dd7a9707e18ce0/xarray/plot/plot.py#L1348
def _center_pixels(x):
    """Center the pixels on the coordinates."""
    if np.issubdtype(x.dtype, str):
        # When using strings as inputs imshow converts it to
        # integers. Choose extent values which puts the indices in
        # the center of the pixels:
        return 0 - 0.5, len(x) - 0.5

    try:
        # Center the pixels assuming uniform spacing:
        xstep = 0.5 * (x[1] - x[0])
    except IndexError:
        # Arbitrary default value, similar to matplotlib behaviour:
        xstep = 0.1

    return x[0] - xstep, x[-1] + xstep

# Center the pixels:
left, right = _center_pixels(da.time)
top, bottom = _center_pixels(da.y)

# the magical step
# vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
left, right = map(date2num, (left, right))
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

# plotting
fig, ax = plt.subplots()
ax.imshow(
    da.T,
    extent=(left, right, top, bottom),
    origin='lower',
    aspect='auto'
)

ax.xaxis_date()
plt.setp(ax.get_xticklabels(), rotation=30, ha='right')
```

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python

TypeError Traceback (most recent call last) /var/folders/bj/czjbfh496258q1lc3p01lyz00000gn/T/ipykernel_59425/1460104966.py in <module> ----> 1 da.plot.imshow(x='time', y='y')

~/miniconda3/lib/python3.8/site-packages/xarray/plot/plot.py in plotmethod(_PlotMethods_obj, x, y, figsize, size, aspect, ax, row, col, col_wrap, xincrease, yincrease, add_colorbar, add_labels, vmin, vmax, cmap, colors, center, robust, extend, levels, infer_intervals, subplot_kws, cbar_ax, cbar_kwargs, xscale, yscale, xticks, yticks, xlim, ylim, norm, kwargs) 1306 for arg in ["_PlotMethods_obj", "newplotfunc", "kwargs"]: 1307 del allargs[arg] -> 1308 return newplotfunc(allargs) 1309 1310 # Add to class _PlotMethods

~/miniconda3/lib/python3.8/site-packages/xarray/plot/plot.py in newplotfunc(darray, x, y, figsize, size, aspect, ax, row, col, col_wrap, xincrease, yincrease, add_colorbar, add_labels, vmin, vmax, cmap, center, robust, extend, levels, infer_intervals, colors, subplot_kws, cbar_ax, cbar_kwargs, xscale, yscale, xticks, yticks, xlim, ylim, norm, kwargs) 1208 ax = get_axis(figsize, size, aspect, ax, subplot_kws) 1209 -> 1210 primitive = plotfunc( 1211 xplt, 1212 yplt,

~/miniconda3/lib/python3.8/site-packages/xarray/plot/plot.py in imshow(x, y, z, ax, kwargs) 1394 z[np.any(z.mask, axis=-1), -1] = 0 1395 -> 1396 primitive = ax.imshow(z, defaults) 1397 1398 # If x or y are strings the ticklabels have been replaced with

~/miniconda3/lib/python3.8/site-packages/matplotlib/_api/deprecation.py in wrapper(args, kwargs) 454 "parameter will become keyword-only %(removal)s.", 455 name=name, obj_type=f"parameter of {func.name}()") --> 456 return func(args, kwargs) 457 458 # Don't modify func's signature, as boilerplate.py needs it.

~/miniconda3/lib/python3.8/site-packages/matplotlib/init.py in inner(ax, data, args, kwargs) 1410 def inner(ax, args, data=None, kwargs): 1411 if data is None: -> 1412 return func(ax, *map(sanitize_sequence, args), kwargs) 1413 1414 bound = new_sig.bind(ax, args, *kwargs)

~/miniconda3/lib/python3.8/site-packages/matplotlib/axes/_axes.py in imshow(self, X, cmap, norm, aspect, interpolation, alpha, vmin, vmax, origin, extent, interpolation_stage, filternorm, filterrad, resample, url, **kwargs) 5450 # update ax.dataLim, and, if autoscaling, set viewLim 5451 # to tightly fit the image, regardless of dataLim. -> 5452 im.set_extent(im.get_extent()) 5453 5454 self.add_image(im)

~/miniconda3/lib/python3.8/site-packages/matplotlib/image.py in set_extent(self, extent) 980 self._extent = xmin, xmax, ymin, ymax = extent 981 corners = (xmin, ymin), (xmax, ymax) --> 982 self.axes.update_datalim(corners) 983 self.sticky_edges.x[:] = [xmin, xmax] 984 self.sticky_edges.y[:] = [ymin, ymax]

~/miniconda3/lib/python3.8/site-packages/matplotlib/axes/_base.py in update_datalim(self, xys, updatex, updatey) 2474 """ 2475 xys = np.asarray(xys) -> 2476 if not np.any(np.isfinite(xys)): 2477 return 2478 self.dataLim.update_from_data_xy(xys, self.ignore_existing_data_limits,

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' ```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:21:17) [Clang 11.1.0 ] python-bits: 64 OS: Darwin OS-release: 21.4.0 machine: arm64 processor: arm byteorder: little LC_ALL: None LANG: None LOCALE: (None, 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 2022.3.0 pandas: 1.3.4 numpy: 1.21.4 scipy: 1.7.3 netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.10.3 cftime: 1.6.0 nc_time_axis: 1.4.0 PseudoNetCDF: None rasterio: None cfgrib: 0.9.8.5 iris: None bottleneck: 1.3.2 dask: 2022.04.0 distributed: 2022.4.0 matplotlib: 3.5.0 cartopy: 0.20.2 seaborn: 0.11.2 numbagg: None fsspec: 2022.5.0 cupy: None pint: 0.18 sparse: None setuptools: 62.3.2 pip: 22.1.1 conda: 4.12.0 pytest: None IPython: 7.30.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6629/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
803075280 MDU6SXNzdWU4MDMwNzUyODA= 4880 Datetime as coordinaets does not convert back to datetime (returns int) feefladder 33122845 closed 0     6 2021-02-07T22:20:11Z 2024-04-28T20:13:33Z 2024-04-28T20:13:32Z CONTRIBUTOR      

What happened: the datetime was in np.datetime64 format. When converted to datetime.datetime format, it returned an int.

What you expected to happen: to get a datetime returned.

Minimal Complete Verifiable Example:

```python
# Put your MCVE code here
import datetime

import numpy as np
import pandas as pd
import xarray as xr

date_frame = xr.DataArray(
    dims='time',
    coords={'time': pd.date_range('2000-01-01', periods=365)},
    data=np.zeros(365),
)
print('pandas date range (datetime): ', pd.date_range('2000-01-01', periods=365)[0])
print('dataframe datetime converted to datetime (int): ',
      date_frame.coords['time'].data[0].astype(datetime.datetime))
print("normal numpy datetime64 converted to datetime (datetime): ",
      np.datetime64(datetime.datetime(2000, 1, 1)).astype(datetime.datetime))

# output:
# pandas date range (datetime):  2000-01-01 00:00:00
# dataframe datetime converted to datetime (int):  946684800000000000
# normal numpy datetime64 converted to datetime (datetime):  2000-01-01 00:00:00
```

If converted to int, it also gives different lengths of int: the date_frame value gives 946684800000000000, while the normal datetime64 gives 946684800000000.

Anything else we need to know?:

It is also mentioned in this SO thread; it appears to be a problem in datetime64.
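For what it's worth, a small illustration of the behaviour (and two ways around it) using plain numpy/pandas; this is a sketch, not an xarray change:

```python
import datetime

import numpy as np
import pandas as pd

# Nanosecond-precision datetime64 values fall back to an integer when cast
# to datetime.datetime, so reduce the precision (or go through pandas) first.
t_ns = np.datetime64("2000-01-01", "ns")
print(t_ns.astype(datetime.datetime))                            # 946684800000000000 (int)
print(t_ns.astype("datetime64[us]").astype(datetime.datetime))   # 2000-01-01 00:00:00
print(pd.Timestamp(t_ns).to_pydatetime())                        # 2000-01-01 00:00:00
```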

numpy version 1.20.0 pandas version 1.2.1

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.7.9 | packaged by conda-forge | (default, Dec 9 2020, 21:08:20) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.4.0-59-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: None libnetcdf: None xarray: 0.16.2 pandas: 1.2.1 numpy: 1.20.0 scipy: None netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.6.1 cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.01.1 distributed: 2021.01.1 matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20210108 pip: 21.0.1 conda: None pytest: None IPython: 7.20.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4880/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1402002645 I_kwDOAMm_X85TkNzV 7146 Segfault writing large netcdf files to s3fs d1mach 11075246 closed 0     17 2022-10-08T16:56:31Z 2024-04-28T20:11:59Z 2024-04-28T20:11:59Z NONE      

What happened?

It seems writing netCDF files to s3fs (the FUSE filesystem layer over S3-compatible storage) does not currently work well, with either the default netcdf4 engine or with h5netcdf.

Here is an example:

```python
import numpy as np
import xarray as xr
from datetime import datetime, timedelta

NTIMES = 48
start = datetime(2022, 10, 6, 0, 0)
time_vals = [start + timedelta(minutes=20 * t) for t in range(NTIMES)]
times = xr.DataArray(data=[t.strftime('%Y%m%d%H%M%S').encode() for t in time_vals], dims=['Time'])
v1 = xr.DataArray(data=np.zeros((len(times), 201, 201)), dims=['Time', 'x', 'y'])
ds = xr.Dataset(data_vars=dict(times=times, v1=v1))
ds.to_netcdf(path='/my_s3_fs/test_netcdf.nc', format='NETCDF4', mode='w')
```

On my system this code crashes with NTIMES=48, but completes without an error with NTIMES=24.

The output with NTIMES=48 is

``` There are 1 HDF5 objects open!

Report: open objects on 72057594037927936 Segmentation fault (core dumped) ```

I have tried the other engine that handles NETCDF4 in xarray with engine='h5netcdf' and also got a segfault.

A quick workaround seems to be to use the local filesystem to write the NetCDF file and then move the complete file to S3.

```python
import shutil

ds.to_netcdf(path='/tmp/test_netcdf.nc', format='NETCDF4', mode='w')
shutil.move('/tmp/test_netcdf.nc', '/my_s3_fs/test_netcdf.nc')
```

There are several pieces of software involved here: the xarray package (0.16.1), netcdf4 (1.5.4), HDF5 (1.10.6), and s3fs (1.79). If this is not a bug in my code but in the underlying libraries, it is most likely not an xarray bug, but since it fails with both NETCDF4 engines, I decided to report it here.
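A related sketch, assuming the bucket can also be reached directly with the s3fs Python package rather than through the FUSE mount; the bucket name below is a placeholder.

```python
import s3fs

# write locally, then upload directly with s3fs (no FUSE layer involved);
# "my-bucket" and the credential handling are placeholders
ds.to_netcdf(path='/tmp/test_netcdf.nc', format='NETCDF4', mode='w')
fs = s3fs.S3FileSystem(anon=False)
fs.put('/tmp/test_netcdf.nc', 'my-bucket/test_netcdf.nc')
```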

What did you expect to happen?

With NTIMES=24 I am getting a file /my_s3_fs/test_netcdf.nc of about 7.8 MBytes. With NTIMES=36 I get an empty file. I would expect this code to run without a segfault and produce a nonempty file.

Minimal Complete Verifiable Example

```python
import numpy as np
import xarray as xr
from datetime import datetime, timedelta

NTIMES = 48
start = datetime(2022, 10, 6, 0, 0)
time_vals = [start + timedelta(minutes=20 * t) for t in range(NTIMES)]
times = xr.DataArray(data=[t.strftime('%Y%m%d%H%M%S').encode() for t in time_vals], dims=['Time'])
v1 = xr.DataArray(data=np.zeros((len(times), 201, 201)), dims=['Time', 'x', 'y'])
ds = xr.Dataset(data_vars=dict(times=times, v1=v1))
ds.to_netcdf(path='/my_s3_fs/test_netcdf.nc', format='NETCDF4', mode='w')
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python There are 1 HDF5 objects open!

Report: open objects on 72057594037927936 Segmentation fault (core dumped) ```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.8.3 | packaged by conda-forge | (default, Jun 1 2020, 17:43:00) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 5.4.0-26-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: None LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.1 pandas: 1.1.3 numpy: 1.19.1 scipy: 1.5.2 netCDF4: 1.5.4 pydap: None h5netcdf: 1.0.2 h5py: 3.1.0 Nio: None zarr: None cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.30.0 distributed: None matplotlib: 3.3.1 cartopy: None seaborn: None numbagg: None pint: None setuptools: 50.3.0.post20201006 pip: 20.2.3 conda: 22.9.0 pytest: 6.1.1 IPython: 7.18.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7146/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2132500634 I_kwDOAMm_X85_G2Ca 8742 Y-axis is reversed when using to_zarr() alistaireverett 7837535 closed 0     3 2024-02-13T14:48:30Z 2024-04-28T20:08:13Z 2024-04-28T20:08:13Z NONE      

What happened?

When I export a dataset to both NetCDF and Zarr, the y axis of the Zarr output appears to have been reversed when inspected with gdalinfo. I also cannot build a VRT file from the Zarr file, since it complains about a positive NS axis, but this works fine with the NetCDF file.

Example NetCDF file as input: in.nc.zip

gdalinfo on output NetCDF file: $ gdalinfo NETCDF:out.nc:air_temperature_2m Driver: netCDF/Network Common Data Format Files: out.nc out.nc.aux.xml Size is 949, 1069 Coordinate System is: PROJCRS["unnamed", BASEGEOGCRS["unknown", DATUM["unnamed", ELLIPSOID["Sphere",6371000,0, LENGTHUNIT["metre",1, ID["EPSG",9001]]]], PRIMEM["Greenwich",0, ANGLEUNIT["degree",0.0174532925199433, ID["EPSG",9122]]]], CONVERSION["unnamed", METHOD["Lambert Conic Conformal (2SP)", ID["EPSG",9802]], PARAMETER["Latitude of false origin",63.3, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8821]], PARAMETER["Longitude of false origin",15, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8822]], PARAMETER["Latitude of 1st standard parallel",63.3, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8823]], PARAMETER["Latitude of 2nd standard parallel",63.3, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8824]], PARAMETER["Easting at false origin",0, LENGTHUNIT["metre",1], ID["EPSG",8826]], PARAMETER["Northing at false origin",0, LENGTHUNIT["metre",1], ID["EPSG",8827]]], CS[Cartesian,2], AXIS["easting",east, ORDER[1], LENGTHUNIT["metre",1, ID["EPSG",9001]]], AXIS["northing",north, ORDER[2], LENGTHUNIT["metre",1, ID["EPSG",9001]]]] Data axis to CRS axis mapping: 1,2 Origin = (-1061334.000000000000000,1338732.125000000000000) Pixel Size = (2500.000000000000000,-2500.000000000000000) Metadata: air_temperature_2m#coordinates=longitude latitude air_temperature_2m#grid_mapping=projection_lambert air_temperature_2m#long_name=Screen level temperature (T2M) air_temperature_2m#standard_name=air_temperature air_temperature_2m#units=K air_temperature_2m#_FillValue=9.96921e+36 height1#description=height above ground height1#long_name=height height1#positive=up height1#units=m height1#_FillValue=nan NC_GLOBAL#coordinates=projection_lambert time NETCDF_DIM_EXTRA={height1} NETCDF_DIM_height1_DEF={1,5} NETCDF_DIM_height1_VALUES=2 projection_lambert#earth_radius=6371000 projection_lambert#grid_mapping_name=lambert_conformal_conic projection_lambert#latitude_of_projection_origin=63.3 projection_lambert#longitude_of_central_meridian=15 projection_lambert#standard_parallel={63.3,63.3} x#long_name=x-coordinate in Cartesian system x#standard_name=projection_x_coordinate x#units=m x#_FillValue=nan y#long_name=y-coordinate in Cartesian system y#standard_name=projection_y_coordinate y#units=m y#_FillValue=nan Geolocation: LINE_OFFSET=0 LINE_STEP=1 PIXEL_OFFSET=0 PIXEL_STEP=1 SRS=GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]] X_BAND=1 X_DATASET=NETCDF:"out.nc":longitude Y_BAND=1 Y_DATASET=NETCDF:"out.nc":latitude Corner Coordinates: Upper Left (-1061334.000, 1338732.125) ( 18d10'24.02"W, 72d45'59.56"N) Lower Left (-1061334.000,-1333767.875) ( 0d15'55.60"E, 50d18'23.10"N) Upper Right ( 1311166.000, 1338732.125) ( 54d17'24.85"E, 71d34'43.38"N) Lower Right ( 1311166.000,-1333767.875) ( 33d 2'20.10"E, 49d45' 6.51"N) Center ( 124916.000, 2482.125) ( 17d30' 3.21"E, 63d18' 1.50"N) Band 1 Block=949x1069 Type=Float32, ColorInterp=Undefined Min=236.480 Max=284.937 Minimum=236.480, Maximum=284.937, Mean=269.816, StdDev=9.033 NoData Value=9.96920996838686905e+36 Unit Type: K Metadata: coordinates=longitude latitude grid_mapping=projection_lambert long_name=Screen level temperature (T2M) 
NETCDF_DIM_height1=2 NETCDF_VARNAME=air_temperature_2m standard_name=air_temperature STATISTICS_MAXIMUM=284.93682861328 STATISTICS_MEAN=269.81614967971 STATISTICS_MINIMUM=236.47978210449 STATISTICS_STDDEV=9.0332172122638 units=K _FillValue=9.96921e+36

gdalinfo on output Zarr file: $ gdalinfo ZARR:out.zarr:/air_temperature_2m:0 Driver: Zarr/Zarr Files: none associated Size is 949, 1069 Origin = (-1061334.000000000000000,-1333767.875000000000000) Pixel Size = (2500.000000000000000,2500.000000000000000) Metadata: coordinates=longitude latitude grid_mapping=projection_lambert long_name=Screen level temperature (T2M) standard_name=air_temperature Corner Coordinates: Upper Left (-1061334.000,-1333767.875) Lower Left (-1061334.000, 1338732.125) Upper Right ( 1311166.000,-1333767.875) Lower Right ( 1311166.000, 1338732.125) Center ( 124916.000, 2482.125) Band 1 Block=475x268 Type=Float32, ColorInterp=Undefined NoData Value=9.96920996838686905e+36 Unit Type: K

The main issue is that the origin and the y-axis direction are reversed, as you can see from the origin and pixel size. I have tried taking the CRS from the NetCDF file and adding it to the Zarr file as a _CRS attribute manually, but this doesn't make any difference to the origin or pixel size.
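A possible check, assuming the discrepancy comes from the y coordinate being stored in ascending order and the GDAL Zarr driver not consulting the coordinate values: forcing a decreasing (north-up) y before writing might be worth trying; the output name below is a placeholder.

```python
# sketch only: write a Zarr copy with y strictly decreasing and compare gdalinfo output
ds_northup = ds.sortby("y", ascending=False)
ds_northup.to_zarr("out_northup.zarr")
```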

What did you expect to happen?

Origin, pixel size and corner coords should match those in the netcdf file.

$ gdalinfo ZARR:out.zarr:/air_temperature_2m:0 Driver: Zarr/Zarr Files: none associated Size is 949, 1069 Origin = (-1061334.000000000000000,1338732.125000000000000) Pixel Size = (2500.000000000000000,-2500.000000000000000) Metadata: coordinates=longitude latitude grid_mapping=projection_lambert long_name=Screen level temperature (T2M) standard_name=air_temperature Corner Coordinates: Corner Coordinates: Upper Left (-1061334.000, 1338732.125) ( 18d10'24.02"W, 72d45'59.56"N) Lower Left (-1061334.000,-1333767.875) ( 0d15'55.60"E, 50d18'23.10"N) Upper Right ( 1311166.000, 1338732.125) ( 54d17'24.85"E, 71d34'43.38"N) Lower Right ( 1311166.000,-1333767.875) ( 33d 2'20.10"E, 49d45' 6.51"N) Center ( 124916.000, 2482.125) ( 17d30' 3.21"E, 63d18' 1.50"N) Band 1 Block=475x268 Type=Float32, ColorInterp=Undefined NoData Value=9.96920996838686905e+36 Unit Type: K

Minimal Complete Verifiable Example

```python
import xarray as xr
from pyproj import CRS

ds = xr.open_dataset("in.nc")

# Optionally copy the CRS to the Zarr output (produces an error, but does work)
crs_wkt = CRS.from_cf(ds["projection_lambert"].attrs).to_wkt()
ds["air_temperature_2m"] = ds["air_temperature_2m"].assign_attrs(_CRS={"wkt": crs_wkt})

ds.to_zarr("out.zarr")

ds.to_netcdf("out.nc")
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] python-bits: 64 OS: Linux OS-release: 6.5.0-15-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.3-development xarray: 2024.1.1 pandas: 2.2.0 numpy: 1.26.4 scipy: None netCDF4: 1.6.5 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.16.1 cftime: 1.6.3 nc_time_axis: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 68.2.2 pip: 23.3.1 conda: None pytest: None mypy: None IPython: None sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8742/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2021386895 PR_kwDOAMm_X85g7QZD 8500 Deprecate ds.dims returning dict TomNicholas 35968931 closed 0     1 2023-12-01T18:29:28Z 2024-04-28T20:04:00Z 2023-12-06T17:52:24Z MEMBER   0 pydata/xarray/pulls/8500
  • [x] Closes first step of #8496, would require another PR later to actually change the return type. Also really resolves the second half of #921.
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8500/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2115555965 I_kwDOAMm_X85-GNJ9 8695 Return a 3D object alongside 1D object in apply_ufunc ahuang11 15331990 closed 0     7 2024-02-02T18:47:14Z 2024-04-28T19:59:31Z 2024-04-28T19:59:31Z CONTRIBUTOR      

Is your feature request related to a problem?

Currently, I have something similar to this, where the input_lat is transformed to new_lat (here, +0.25, but in the real use case it's non-deterministic).

Since apply_ufunc doesn't return a dataset with actual coordinate values, I had to return a second output to retain new_lat and properly update the coordinate values, but this second output is shaped (time, lat, lon), so I have to do ds["lat"] = new_lat.isel(lon=0, time=0).values, which I think is inefficient; I simply need it to be shaped (lat,).

Any ideas on how I can modify this to make it more efficient?

```python
import xarray as xr
import numpy as np

air = xr.tutorial.open_dataset("air_temperature")["air"]
input_lat = np.arange(20, 45)


def interp1d_np(data, base_lat, input_lat):
    new_lat = input_lat + 0.25
    return np.interp(new_lat, base_lat, data), new_lat


ds, new_lat = xr.apply_ufunc(
    interp1d_np,  # first the function
    air,
    air.lat,  # as above
    input_lat,  # as above
    input_core_dims=[["lat"], ["lat"], ["lat"]],  # list with one entry per arg
    output_core_dims=[["lat"], ["lat"]],  # returned data has one dimension
    exclude_dims=set(("lat",)),  # dimensions allowed to change size. Must be a set!
    vectorize=True,  # loop over non-core dims
)
new_lat = new_lat.isel(lon=0, time=0).values
ds["lat"] = new_lat
```
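One possible rearrangement, sketched under the assumption that new_lat does not depend on the particular data slice (true in the example above, though maybe not in the real use case): compute new_lat once outside apply_ufunc, have the ufunc return only the data, and assign the coordinate afterwards. interp1d_np_data_only is a hypothetical helper, not part of the original code.

```python
def interp1d_np_data_only(data, base_lat, input_lat):
    # hypothetical variant of interp1d_np that returns only the interpolated values
    return interp1d_np(data, base_lat, input_lat)[0]


# compute new_lat once from a single 1-D slice (valid only if it is data-independent)
_, new_lat = interp1d_np(air.isel(time=0, lon=0).values, air.lat.values, input_lat)

ds = xr.apply_ufunc(
    interp1d_np_data_only,
    air,
    air.lat,
    input_lat,
    input_core_dims=[["lat"], ["lat"], ["lat"]],
    output_core_dims=[["lat"]],
    exclude_dims=set(("lat",)),
    vectorize=True,
)
ds = ds.assign_coords(lat=new_lat)
```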

Describe the solution you'd like

Either be able to automatically assign the new_lat to the returned xarray object, or allow a 1D dataset to be returned

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8695/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
576337745 MDU6SXNzdWU1NzYzMzc3NDU= 3831 Errors using to_zarr for an s3 store JarrodBWong 15351025 closed 0     15 2020-03-05T15:30:40Z 2024-04-28T19:59:02Z 2024-04-28T19:59:02Z NONE      

Hello, I have been trying to write zarr files from xarray directly into an s3 store but keep getting errors for missing arrays. It looks like the structure of the zarr archive is created in my s3 bucket, I can see .zarray and .zattrs files but it's missing the 0.0.0, 0.0.1, etc files. I have been able to write the same arrays directly to my disk so don't think it's an issue with the dataset itself.

MCVE Code Sample

```python
import s3fs

s3 = s3fs.S3FileSystem(anon=False)
store = s3fs.S3Map(root=f's3://my-bucket/data.zarr', s3=s3, check=False)

ds.to_zarr(store=store, consolidated=True, mode='w')
```

Output

The variable name of the array changes from run to run; it's not always the same one that it says is missing.

logs -------------------------------------------------------------------------- NoSuchKey Traceback (most recent call last) /opt/conda/lib/python3.7/site-packages/s3fs/core.py in _fetch_range(client, bucket, key, version_id, start, end, max_attempts, req_kw) 1196 Range='bytes=%i-%i' % (start, end - 1), -> 1197 **kwargs) 1198 return resp['Body'].read() ~/.local/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs) 315 # The "self" in this scope is referring to the BaseClient. --> 316 return self._make_api_call(operation_name, kwargs) 317 ~/.local/lib/python3.7/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params) 625 error_class = self.exceptions.from_code(error_code) --> 626 raise error_class(parsed_response, operation_name) 627 else: NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist. During handling of the above exception, another exception occurred: FileNotFoundError Traceback (most recent call last) /opt/conda/lib/python3.7/site-packages/fsspec/mapping.py in __getitem__(self, key, default) 75 try: ---> 76 result = self.fs.cat(key) 77 except: # noqa: E722 /opt/conda/lib/python3.7/site-packages/fsspec/spec.py in cat(self, path) 545 """ Get the content of a file """ --> 546 return self.open(path, "rb").read() 547 /opt/conda/lib/python3.7/site-packages/fsspec/spec.py in read(self, length) 1129 return b"" -> 1130 out = self.cache._fetch(self.loc, self.loc + length) 1131 self.loc += len(out) /opt/conda/lib/python3.7/site-packages/fsspec/caching.py in _fetch(self, start, end) 338 # First read, or extending both before and after --> 339 self.cache = self.fetcher(start, bend) 340 self.start = start /opt/conda/lib/python3.7/site-packages/s3fs/core.py in _fetch_range(self, start, end) 1059 def _fetch_range(self, start, end): -> 1060 return _fetch_range(self.fs.s3, self.bucket, self.key, self.version_id, start, end, req_kw=self.req_kw) 1061 /opt/conda/lib/python3.7/site-packages/s3fs/core.py in _fetch_range(client, bucket, key, version_id, start, end, max_attempts, req_kw) 1212 return b'' -> 1213 raise translate_boto_error(e) 1214 except Exception as e: FileNotFoundError: The specified key does not exist. 
During handling of the above exception, another exception occurred: KeyError Traceback (most recent call last) /opt/conda/lib/python3.7/site-packages/zarr/core.py in _load_metadata_nosync(self) 149 mkey = self._key_prefix + array_meta_key --> 150 meta_bytes = self._store[mkey] 151 except KeyError: /opt/conda/lib/python3.7/site-packages/fsspec/mapping.py in __getitem__(self, key, default) 79 return default ---> 80 raise KeyError(key) 81 return result KeyError: 'my-bucket/data.zarr/lv_HTGL7_l1/.zarray' During handling of the above exception, another exception occurred: ValueError Traceback (most recent call last) <ipython-input-7-c21938cc83d3> in <module> 7 ds.to_zarr(store=s3_store_dest, 8 consolidated=True, ----> 9 mode='w') /opt/conda/lib/python3.7/site-packages/xarray/core/dataset.py in to_zarr(self, store, mode, synchronizer, group, encoding, compute, consolidated, append_dim) 1623 compute=compute, 1624 consolidated=consolidated, -> 1625 append_dim=append_dim, 1626 ) 1627 /opt/conda/lib/python3.7/site-packages/xarray/backends/api.py in to_zarr(dataset, store, mode, synchronizer, group, encoding, compute, consolidated, append_dim) 1341 writer = ArrayWriter() 1342 # TODO: figure out how to properly handle unlimited_dims -> 1343 dump_to_store(dataset, zstore, writer, encoding=encoding) 1344 writes = writer.sync(compute=compute) 1345 /opt/conda/lib/python3.7/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims) 1133 variables, attrs = encoder(variables, attrs) 1134 -> 1135 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) 1136 1137 /opt/conda/lib/python3.7/site-packages/xarray/backends/zarr.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims) 385 self.set_dimensions(variables_encoded, unlimited_dims=unlimited_dims) 386 self.set_variables( --> 387 variables_encoded, check_encoding_set, writer, unlimited_dims=unlimited_dims 388 ) 389 /opt/conda/lib/python3.7/site-packages/xarray/backends/zarr.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims) 444 dtype = str 445 zarr_array = self.ds.create( --> 446 name, shape=shape, dtype=dtype, fill_value=fill_value, **encoding 447 ) 448 zarr_array.attrs.put(encoded_attrs) /opt/conda/lib/python3.7/site-packages/zarr/hierarchy.py in create(self, name, **kwargs) 877 """Create an array. 
Keyword arguments as per 878 :func:`zarr.creation.create`.""" --> 879 return self._write_op(self._create_nosync, name, **kwargs) 880 881 def _create_nosync(self, name, **kwargs): /opt/conda/lib/python3.7/site-packages/zarr/hierarchy.py in _write_op(self, f, *args, **kwargs) 656 657 with lock: --> 658 return f(*args, **kwargs) 659 660 def create_group(self, name, overwrite=False): /opt/conda/lib/python3.7/site-packages/zarr/hierarchy.py in _create_nosync(self, name, **kwargs) 884 kwargs.setdefault('cache_attrs', self.attrs.cache) 885 return create(store=self._store, path=path, chunk_store=self._chunk_store, --> 886 **kwargs) 887 888 def empty(self, name, **kwargs): /opt/conda/lib/python3.7/site-packages/zarr/creation.py in create(shape, chunks, dtype, compressor, fill_value, order, store, synchronizer, overwrite, path, chunk_store, filters, cache_metadata, cache_attrs, read_only, object_codec, **kwargs) 123 # instantiate array 124 z = Array(store, path=path, chunk_store=chunk_store, synchronizer=synchronizer, --> 125 cache_metadata=cache_metadata, cache_attrs=cache_attrs, read_only=read_only) 126 127 return z /opt/conda/lib/python3.7/site-packages/zarr/core.py in __init__(self, store, path, read_only, chunk_store, synchronizer, cache_metadata, cache_attrs) 122 123 # initialize metadata --> 124 self._load_metadata() 125 126 # initialize attributes /opt/conda/lib/python3.7/site-packages/zarr/core.py in _load_metadata(self) 139 """(Re)load metadata from store.""" 140 if self._synchronizer is None: --> 141 self._load_metadata_nosync() 142 else: 143 mkey = self._key_prefix + array_meta_key /opt/conda/lib/python3.7/site-packages/zarr/core.py in _load_metadata_nosync(self) 150 meta_bytes = self._store[mkey] 151 except KeyError: --> 152 err_array_not_found(self._path) 153 else: 154 /opt/conda/lib/python3.7/site-packages/zarr/errors.py in err_array_not_found(path) 19 20 def err_array_not_found(path): ---> 21 raise ValueError('array not found at path %r' % path) 22 23 ValueError: array not found at path 'lv_HTGL7_l1'
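As an aside, a minimal sketch for more recent xarray releases, which accept a storage_options argument on to_zarr and forward it to fsspec, so the mapper does not have to be built by hand (whether this changes the behaviour reported here is untested; the bucket name is the same placeholder as above):

```python
ds.to_zarr(
    "s3://my-bucket/data.zarr",
    mode="w",
    consolidated=True,
    storage_options={"anon": False},
)
```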

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.14.165-133.209.amzn2.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: None libnetcdf: None xarray: 0.15.0 pandas: 1.0.1 numpy: 1.18.1 scipy: 1.4.1 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: 1.5.5 zarr: 2.4.0 cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.3 cfgrib: None iris: None bottleneck: None dask: 2.11.0 distributed: 2.11.0 matplotlib: 3.1.3 cartopy: None seaborn: 0.10.0 numbagg: None setuptools: 45.2.0.post20200209 pip: 20.0.2 conda: 4.7.12 pytest: None IPython: 7.12.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3831/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2224036575 I_kwDOAMm_X86EkBrf 8905 Variable doesn't have an .expand_dims method TomNicholas 35968931 closed 0     4 2024-04-03T22:19:10Z 2024-04-28T19:54:08Z 2024-04-28T19:54:08Z MEMBER      

Is your feature request related to a problem?

DataArray and Dataset have an .expand_dims method, but it looks like Variable doesn't.

Describe the solution you'd like

Variable should also have this method, the only difference being that it wouldn't create any coordinates or indexes.
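For context, a small sketch of the closest existing behaviour: Variable.set_dims already inserts new dimensions of length 1 without creating any coordinates or indexes, which is roughly the semantics an expand_dims on Variable would need.

```python
import numpy as np
import xarray as xr

var = xr.Variable(dims=("x",), data=np.arange(3))

# insert a new leading dimension of length 1; no coordinates or indexes are created
expanded = var.set_dims(("time", "x"))
print(expanded.dims)  # ('time', 'x')
```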

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8905/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2254350395 PR_kwDOAMm_X85tPTua 8960 Option to not auto-create index during expand_dims TomNicholas 35968931 closed 0     2 2024-04-20T03:27:23Z 2024-04-27T16:48:30Z 2024-04-27T16:48:24Z MEMBER   0 pydata/xarray/pulls/8960
  • [x] Solves part of #8871 by pulling out part of https://github.com/pydata/xarray/pull/8872#issuecomment-2027571714
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~

TODO: - [x] Add new kwarg to DataArray.expand_dims - [ ] Add examples to docstrings? - [x] Check it actually solves the problem in #8872

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8960/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2261844699 PR_kwDOAMm_X85toeXT 8968 Bump dependencies incl `pandas>=2` dcherian 2448579 closed 0     0 2024-04-24T17:42:19Z 2024-04-27T14:17:16Z 2024-04-27T14:17:16Z MEMBER   0 pydata/xarray/pulls/8968
  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8968/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2260280862 PR_kwDOAMm_X85tjH8m 8967 Migrate datatreee assertions/extensions/formatting owenlittlejohns 7788154 closed 0     0 2024-04-24T04:23:03Z 2024-04-26T17:38:59Z 2024-04-26T17:29:18Z CONTRIBUTOR   0 pydata/xarray/pulls/8967

This PR continues the overall work of migrating DataTree into xarray.

  • xarray/core/datatree_render.py is the renamed version of xarray/datatree_/datatree/render.py.
  • xarray/core/extensions.py now contains functionality from xarray/datatree_/datatree/extensions.py.
  • xarray/core/formatting.py now contains functionality from xarray/datatree_/datatree/formatting.py.
  • xarray/tests/test_datatree.py now contains tests from xarray/datatree_/datatree/tests/test_dataset_api.py.
  • xarray/testing/assertions.py now contains functionality from /xarray/datatree_/datatree/testing.py.

I had also meant to get to common.py and what's left of io.py, but I've got a hefty couple of days of meetings ahead, so I wanted to get this progress into PR before that happens. @flamingbear or I can follow up with the remaining things in a separate PR. (Also this PR is already getting a little big, so maybe it's already got enough in it)

  • [x] Contributes to migration step for miscellaneous modules in #8572
  • [ ] ~~Tests added~~
  • [ ] ~~User visible changes (including notable bug fixes) are documented in whats-new.rst~~
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8967/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
590630281 MDU6SXNzdWU1OTA2MzAyODE= 3921 issues discovered by the all-but-dask CI keewis 14808389 closed 0     4 2020-03-30T22:08:46Z 2024-04-25T14:48:15Z 2024-02-10T02:57:34Z MEMBER      

After adding the py38-all-but-dask CI in #3919, it discovered a few backend issues:

- zarr:
  - [x] open_zarr with chunks="auto" always tries to chunk, even if dask is not available (fixed in #3919)
  - [x] ZarrArrayWrapper.__getitem__ incorrectly passes the indexer's tuple attribute to _arrayize_vectorized_indexer (this only happens if dask is not available) (fixed in #3919)
  - [x] slice indexers with negative steps get transformed incorrectly if dask is not available https://github.com/pydata/xarray/pull/8674
- rasterio:
  - ~calling pickle.dumps on a Dataset object returned by open_rasterio fails because a non-serializable lock was used (if dask is installed, a serializable lock is used instead)~

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3921/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2261917442 PR_kwDOAMm_X85touYl 8971 Delete pynio backend. dcherian 2448579 closed 0     2 2024-04-24T18:25:26Z 2024-04-25T14:38:23Z 2024-04-25T14:23:59Z MEMBER   0 pydata/xarray/pulls/8971
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8971/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2243685081 I_kwDOAMm_X86Fu-rZ 8945 netCDF4 indexing: `reindex_like` is very slow if dataset not loaded into memory brendan-m-murphy 11130776 closed 0     4 2024-04-15T13:26:08Z 2024-04-23T21:49:28Z 2024-04-23T15:33:36Z NONE      

What is your issue?

Reindexing a dataset without loading it into memory seems to be very slow (about 1000x slower than reindexing after loading into memory).

Here is a minimum working example:

```python
import numpy as np
import pandas as pd
import xarray as xr

times = 100
nlat = 200
nlon = 300

fp = xr.Dataset(
    {"fp": (["time", "lat", "lon"], np.arange(times * nlat * nlon).reshape(times, nlat, nlon))},
    coords={
        "time": pd.date_range(start="2019-01-01T02:00:00", periods=times, freq="1H"),
        "lat": np.arange(nlat),
        "lon": np.arange(nlon),
    },
)

flux = xr.Dataset(
    {"flux": (["time", "lat", "lon"], np.arange(nlat * nlon).reshape(1, nlat, nlon))},
    coords={
        "time": [pd.to_datetime("2019-01-01")],
        "lat": np.arange(nlat) + np.random.normal(0.0, 0.01, nlat),
        "lon": np.arange(nlon) + np.random.normal(0.0, 0.01, nlon),
    },
)

fp.to_netcdf("combine_datasets_tests/fp.nc")
flux.to_netcdf("combine_datasets_tests/flux.nc")

fp1 = xr.open_dataset("combine_datasets_tests/fp.nc")
flux1 = xr.open_dataset("combine_datasets_tests/flux.nc")
```

Then flux1 = flux1.reindex_like(fp1, method="ffill", tolerance=None) takes over a minute, while flux1 = flux1.load().reindex_like(fp1, method="ffill", tolerance=None) is almost instantaneous (timeit says 91ms, including opening the dataset... I'm not sure if caching is influencing this).

Profiling the "reindex without load" cell: ``` 804936 function calls (804622 primitive calls) in 93.285 seconds

Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function) 1 92.211 92.211 93.191 93.191 {built-in method _operator.getitem} 1 0.289 0.289 0.980 0.980 utils.py:81(_StartCountStride) 6 0.239 0.040 0.613 0.102 shape_base.py:267(apply_along_axis) 72656 0.109 0.000 0.109 0.000 utils.py:429(<lambda>) 72656 0.085 0.000 0.136 0.000 utils.py:430(<lambda>) 72661 0.051 0.000 0.051 0.000 {built-in method numpy.arange} 145318 0.048 0.000 0.115 0.000 shape_base.py:370(<genexpr>) 2 0.045 0.023 0.046 0.023 indexing.py:1334(getitem) 6 0.044 0.007 0.044 0.007 numeric.py:136(ones) 145318 0.044 0.000 0.067 0.000 index_tricks.py:690(next) 14 0.033 0.002 0.033 0.002 {built-in method numpy.empty} 145333/145325 0.023 0.000 0.023 0.000 {built-in method builtins.next} 1 0.020 0.020 93.275 93.275 duck_array_ops.py:317(where) 21 0.018 0.001 0.018 0.001 {method 'astype' of 'numpy.ndarray' objects} 145330 0.013 0.000 0.013 0.000 {built-in method numpy.asanyarray} 1 0.002 0.002 0.002 0.002 {built-in method _functools.reduce} 1 0.002 0.002 93.279 93.279 variable.py:821(_getitem_with_mask) 18 0.001 0.000 0.001 0.000 {built-in method numpy.zeros} 1 0.000 0.000 0.000 0.000 file_manager.py:226(close) ```

The getitem call at the top is from xarray.backends.netCDF4_.py, line 114. Because of the jittered coordinates in flux, I'm assuming that the index passed to netCDF4 is not consecutive/strictly monotonic integers (0, 1, 2, 3, ...). In the past, this has caused issues: https://github.com/Unidata/netcdf4-python/issues/680.
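A sketch of the workaround implied above, plus one untested alternative, assuming the small dataset comfortably fits in memory:

```python
import xarray as xr

# load the small dataset before reindexing to avoid netCDF4's slow fancy indexing
flux1 = xr.open_dataset("combine_datasets_tests/flux.nc").load()
flux1 = flux1.reindex_like(fp1, method="ffill", tolerance=None)

# possibly also worth trying (untested): keep things lazy via dask instead of the
# netCDF4 backend's direct indexing
# flux1 = xr.open_dataset("combine_datasets_tests/flux.nc", chunks={})
```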

In my venv, netCDF4 was installed from a wheel with the following versions: netcdf4-python version: 1.6.5 HDF5 lib version: 1.12.2 netcdf lib version: 4.9.3-development

This is with xarray version 2023.12.0, numpy 1.26, and pandas 1.5.3.

I will try to investigate more and hopefully simplify the example. (Can't quite justify spending more time on it at work because this is just to tag a version that was used in some experiments before we switch to zarr as a backend, so hopefully it won't be relevant at that point.)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8945/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2141447815 I_kwDOAMm_X85_o-aH 8768 `xarray/datatree_` missing in 2024.2.0 sdist mgorny 110765 closed 0     15 2024-02-19T03:57:31Z 2024-04-23T18:11:58Z 2024-04-23T15:35:21Z CONTRIBUTOR      

What happened?

Apparently xarray-2024.2.0 requires the xarray.datatree_ module, but this module isn't included in the sdist tarball.

What did you expect to happen?

No response

Minimal Complete Verifiable Example

Python $ tar -tf /tmp/dist/xarray-2024.2.0.tar.gz | grep datatree_ (empty)

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

n/a

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8768/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2248692681 PR_kwDOAMm_X85s8dDt 8953 stop pruning datatree_ directory from distribution flamingbear 479480 closed 0     0 2024-04-17T16:14:13Z 2024-04-23T15:39:06Z 2024-04-23T15:35:20Z CONTRIBUTOR   0 pydata/xarray/pulls/8953

This PR removes the directive that strips out the datatree_ directory from the xarray distribution.

It also cleans a few typing errors and removes exceptions for the datatree_ directory for mypy.

It does NOT remove the exception for pre-commit config.

  • [X] Closes #8768
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8953/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2255271332 PR_kwDOAMm_X85tSKJs 8961 use `nan` instead of `NaN` keewis 14808389 closed 0     0 2024-04-21T21:26:18Z 2024-04-21T22:01:04Z 2024-04-21T22:01:03Z MEMBER   0 pydata/xarray/pulls/8961

FYI @aulemahal, numpy.NaN will be removed in the upcoming numpy=2.0 release.

  • [x] follow-up to #8603
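A minimal sketch of the rename this PR applies, assuming code that still uses the old alias:

```python
import numpy as np

# np.NaN is an alias that numpy 2.0 removes; np.nan keeps working
x = np.nan  # instead of np.NaN
```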
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8961/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2100707586 PR_kwDOAMm_X85lFQn3 8669 Fix automatic broadcasting when wrapping array api class TomNicholas 35968931 closed 0     0 2024-01-25T16:05:19Z 2024-04-20T05:58:05Z 2024-01-26T16:41:30Z MEMBER   0 pydata/xarray/pulls/8669
  • [x] Closes #8665
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8669/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2238099300 PR_kwDOAMm_X85sYXC0 8930 Migrate formatting_html.py into xarray core eni-awowale 51421921 closed 0     7 2024-04-11T16:15:28Z 2024-04-18T21:59:47Z 2024-04-18T21:59:44Z CONTRIBUTOR   0 pydata/xarray/pulls/8930

This PR migrates the formatting_html.py module into xarray/core/formatting_html.py as part of the on-going effort to merge xarray-datatree into xarray.

One thing of note is that the import and the setting of OPTIONS to "default" in datatree/formatting_html.py were moved into xarray/core/options.py (at #L23 and #L49), so I did not add them back to xarray/core/formatting_html.py.

  • [x] Completes migration step for datatree/formating_htmls.py https://github.com/pydata/xarray/issues/8572
  • [x] Tests added
  • [ ] ~~User visible changes (including notable bug fixes) are documented in whats-new.rst~~
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8930/reactions",
    "total_count": 3,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 1,
    "eyes": 0
}
    xarray 13221727 pull
2125478394 PR_kwDOAMm_X85mZIzr 8723 (feat): Support for `pandas` `ExtensionArray` ilan-gold 43999641 closed 0     23 2024-02-08T15:38:18Z 2024-04-18T12:52:06Z 2024-04-18T12:52:03Z CONTRIBUTOR   0 pydata/xarray/pulls/8723

Some outstanding points/decisions brought up by this PR:

- [ ] Confirm type promotion rules and write them out. As it stands now, if everything is of the same extension array type, it is passed onwards and otherwise is converted to numpy. (related: https://github.com/pydata/xarray/pull/8714)
- [ ] ~Acceptance of plum as a dispatch method. Without it, the behavior should be fallen back on from before (cast to numpy types). I am a big fan of dispatching and think it could serve as a model going forward for making support of other data types/arrays more feasible. The other option, I think, would be to just use the underlying array of the ExtensionDuckArray class to decide and then have some central registry that serves as the basis for a decorator (like the api for accessors via _CachedAccessor). That being said, the current defaults are quite good so this is a marginal feature, in all likelihood.~
- [ ] Do we allow just pandas ExtensionArray directly or can we also allow Series?

Possibly missing something else! Let me know!

Checklist:

- [x] Closes #8463 and Closes #5287
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
- [ ] New functions/methods are listed in api.rst

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8723/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
884649380 MDU6SXNzdWU4ODQ2NDkzODA= 5287 Support for pandas Extension Arrays Hoeze 1200058 closed 0     8 2021-05-10T17:00:17Z 2024-04-18T12:52:04Z 2024-04-18T12:52:04Z NONE      

Is your feature request related to a problem? Please describe. I started writing an ExtensionArray which is basically a Tuple[Array[str], Array[int], Array[int], Array[str], Array[str]]. Its scalar type is a Tuple[str, int, int, str, str].

This is working great in Pandas, I can read and write Parquet as well as csv with it. However, as soon as I'm using any .to_xarray() method, it gets converted to a NumPy array of objects. Also, converting back to Pandas keeps a Series of objects instead of my extension type.

Describe the solution you'd like Would it be possible to support Pandas Extension Types on coordinates? It's not necessary to compute anything on them, I'd just like to use them for dimensions.

Describe alternatives you've considered I was thinking over implementing a NumPy duck array, but I have never tried this and it looks quite complicated compared to the Pandas Extension types.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5287/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1999657332 I_kwDOAMm_X853MFl0 8463 Categorical Array ilan-gold 43999641 closed 0     19 2023-11-17T17:57:12Z 2024-04-18T12:52:04Z 2024-04-18T12:52:04Z CONTRIBUTOR      

Is your feature request related to a problem?

We are looking to improve compatibility between AnnData and xarray (see https://github.com/scverse/anndata/issues/744), and so categoricals are naturally on our roadmap. Thus, I think some sort of standard-use categoricals array would be desirable. It seems something similar has come up with netCDF, although my knowledge is limited so this issue may be more distinct than I am aware. So what comes of this issue may solve two birds with one stone, or it may work towards some common solution that can at least help both use-cases (AnnData and netCDF ENUM).

Describe the solution you'd like

The goal would be a standard-use categorical data type xarray container of some sort. I'm not sure what form this can take.

We have something functional here that inherits from ExplicitlyIndexedNDArrayMixin and returns pandas.CategoricalDtype. So let's say this implementation would be at least a conceptual starting point to work from (it also seems not dissimilar to what is done here for new CF types).

Some issues:

1. I have no idea what a standard "return type" for an xarray categorical array should be (i.e., numpy with the categories applied, pandas, something custom etc.). So I'm not sure if using the pandas.CategoricalDtype type is acceptable as I do in the linked implementation. Relatedly....
2. I don't think using pandas.CategoricalDtype really helps with the already existing CF Enum need if you want to have the return type be some sort of numpy array (although again, not sure about the return type). As I understand it, though, the whole point of categoricals is to use integers as the base type and then only show "strings" outwardly i.e., printing, the API for equality operations, accessors etc., while the internals are based on integers. So I'm not really sure numpy is even an option here. Maybe we roll our own solution?
3. I am not sure this is the right level at which to implement this (maybe it should be a Variable? I don't think so, but I am just a beginner here 😄 )

It seems you may want, in addition to the array container, some sort of i/o functionality for this feature (so maybe some on-disk specification?).

Describe alternatives you've considered

I think there is some route via VariableCoder as hinted here i.e., using encode/decode. This would probably be more general purpose as we could encode directly to other data types if using pandas is not desirable. Maybe this would be a way to support both netCDF and returning a pandas.CategoricalDtype (again, not sure what the netCDF return type should be for ENUM).

Additional context

So just for reference, the current behavior of to_xarray with pandas.CategoricalDtype is object dtype from numpy:

```python
import pandas as pd

df = pd.DataFrame({'cat': ['a', 'b', 'a', 'b', 'c']})
df['cat'] = df['cat'].astype('category')
df.to_xarray()['cat']
```

```
<xarray.DataArray 'cat' (index: 5)>
array(['a', 'b', 'a', 'b', 'c'], dtype=object)
Coordinates:
  * index    (index) int64 0 1 2 3 4
```

And as stated in the netCDF issue, for that use-case, the information about ENUM is lost (from what I can read).

Apologies if I'm missing something here! Feedback welcome! Sorry if this is a bit chaotic, just trying to cover my bases.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8463/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2246986030 PR_kwDOAMm_X85s2plY 8948 Migrate datatree mapping.py owenlittlejohns 7788154 closed 0     1 2024-04-16T22:36:48Z 2024-04-17T20:44:29Z 2024-04-17T19:59:34Z CONTRIBUTOR   0 pydata/xarray/pulls/8948

This PR continues the overall work of migrating DataTree into xarray.

datatree_mapping.py is the renamed version of mapping.py from the datatree repository.

  • [x] Closes migration step for mapping.py #8572
  • [ ] ~~Tests added~~
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8948/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2244681150 PR_kwDOAMm_X85suxIl 8947 Add mypy to dev dependencies max-sixty 5635139 closed 0     0 2024-04-15T21:39:19Z 2024-04-17T16:39:23Z 2024-04-17T16:39:22Z MEMBER   0 pydata/xarray/pulls/8947  
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8947/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2075019328 PR_kwDOAMm_X85juCQ- 8603 Convert 360_day calendars by choosing random dates to drop or add aulemahal 20629530 closed 0     3 2024-01-10T19:13:31Z 2024-04-16T14:53:42Z 2024-04-16T14:53:42Z CONTRIBUTOR   0 pydata/xarray/pulls/8603
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

Small PR to add a new "method" to convert to and from 360_day calendars. The current two methods (chosen with the align_on keyword) will always remove or add the same day-of-year for all years of the same length.

This new option will randomly choose the days, one for each fifth of the year (72-day period). It emulates the method of the LOCA datasets (see web page and article). February 29th is always removed/added when the source/target is a leap year.

I copied the implementation from xclim (which I wrote), see code here .
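A minimal usage sketch, assuming the new method is selected through the existing align_on keyword (spelled here as align_on="random", following this PR's description) on a dataset ds with a standard-calendar time axis:

```python
# convert to a 360_day calendar and back, dropping/adding randomly chosen days
ds_360 = ds.convert_calendar("360_day", align_on="random")
roundtrip = ds_360.convert_calendar("standard", align_on="random")
```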

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8603/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2243268327 I_kwDOAMm_X86FtY7n 8944 When opening a zipped Dataset stored under Zarr on a s3 bucket, `botocore.exceptions.NoCredentialsError: Unable to locate credentials` eschalkargans 119882363 closed 0     2 2024-04-15T10:13:58Z 2024-04-15T19:51:44Z 2024-04-15T19:51:43Z NONE      

What happened?

A zipped Zarr store is available on an s3 bucket that requires authentication.

When using xr.open_dataset, the following exception occurs:

NoCredentialsError: Unable to locate credentials

What did you expect to happen?

I expected the dataset to be openable.

Minimal Complete Verifiable Example

It is difficult for me to describe a MCVE as it requires a remote file on an s3 bucket requiring authentication.

To reproduce fully, one must have access to a zipped zarr on an s3 bucket requiring authentication.

```python
import xarray as xr

credentials_key = "key"
credentials_secret = "secret"
credentials_endpoint_url = "endpoint_url"
credentials_region_name = "region"

storage_options = dict(
    key=credentials_key,
    secret=credentials_secret,
    client_kwargs=dict(
        endpoint_url=credentials_endpoint_url,
        region_name=credentials_region_name,
    ),
)

zip_s3_zarr_path = "zip::s3://path/to/my/dataset.zarr.zip"

xds = xr.open_dataset(
    zip_s3_zarr_path,
    backend_kwargs={"storage_options": storage_options},
    engine="zarr",
    group="/",
    consolidated=True,
)
```

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python --------------------------------------------------------------------------- NoCredentialsError Traceback (most recent call last) Cell In[4], line 1 ----> 1 xds = xr.open_dataset( 2 zip_s3_zarr_path, 3 backend_kwargs={"storage_options": storage_options}, 4 engine="zarr", 5 group="/", 6 consolidated=True, 7 ) File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/xarray/backends/api.py:573, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs) 561 decoders = _resolve_decoders_kwargs( 562 decode_cf, 563 open_backend_dataset_parameters=backend.open_dataset_parameters, (...) 569 decode_coords=decode_coords, 570 ) 572 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None) --> 573 backend_ds = backend.open_dataset( 574 filename_or_obj, 575 drop_variables=drop_variables, 576 **decoders, 577 **kwargs, 578 ) 579 ds = _dataset_from_backend_dataset( 580 backend_ds, 581 filename_or_obj, (...) 591 **kwargs, 592 ) 593 return ds File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/xarray/backends/zarr.py:967, in ZarrBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel, zarr_version) 946 def open_dataset( # type: ignore[override] # allow LSP violation, not supporting **kwargs 947 self, 948 filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore, (...) 964 zarr_version=None, 965 ) -> Dataset: 966 filename_or_obj = _normalize_path(filename_or_obj) --> 967 store = ZarrStore.open_group( 968 filename_or_obj, 969 group=group, 970 mode=mode, 971 synchronizer=synchronizer, 972 consolidated=consolidated, 973 consolidate_on_close=False, 974 chunk_store=chunk_store, 975 storage_options=storage_options, 976 stacklevel=stacklevel + 1, 977 zarr_version=zarr_version, 978 ) 980 store_entrypoint = StoreBackendEntrypoint() 981 with close_on_error(store): File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/xarray/backends/zarr.py:454, in ZarrStore.open_group(cls, store, mode, synchronizer, group, consolidated, consolidate_on_close, chunk_store, storage_options, append_dim, write_region, safe_chunks, stacklevel, zarr_version, write_empty) 451 raise FileNotFoundError(f"No such file or directory: '{store}'") 452 elif consolidated: 453 # TODO: an option to pass the metadata_key keyword --> 454 zarr_group = zarr.open_consolidated(store, **open_kwargs) 455 else: 456 zarr_group = zarr.open_group(store, **open_kwargs) File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/zarr/convenience.py:1334, in open_consolidated(store, metadata_key, mode, **kwargs) 1332 # normalize parameters 1333 zarr_version = kwargs.get("zarr_version") -> 1334 store = normalize_store_arg( 1335 store, storage_options=kwargs.get("storage_options"), mode=mode, zarr_version=zarr_version 1336 ) 1337 if mode not in {"r", "r+"}: 1338 raise ValueError("invalid mode, expected either 'r' or 'r+'; found {!r}".format(mode)) File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/zarr/storage.py:197, in normalize_store_arg(store, storage_options, mode, zarr_version) 195 else: 196 raise ValueError("zarr_version must be either 2 or 3") --> 197 
return normalize_store(store, storage_options, mode) File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/zarr/storage.py:167, in _normalize_store_arg_v2(store, storage_options, mode) 165 if isinstance(store, str): 166 if "://" in store or "::" in store: --> 167 return FSStore(store, mode=mode, **(storage_options or {})) 168 elif storage_options: 169 raise ValueError("storage_options passed with non-fsspec path") File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/zarr/storage.py:1377, in FSStore.__init__(self, url, normalize_keys, key_separator, mode, exceptions, dimension_separator, fs, check, create, missing_exceptions, **storage_options) 1375 if protocol in (None, "file") and not storage_options.get("auto_mkdir"): 1376 storage_options["auto_mkdir"] = True -> 1377 self.map = fsspec.get_mapper(url, **{**mapper_options, **storage_options}) 1378 self.fs = self.map.fs # for direct operations 1379 self.path = self.fs._strip_protocol(url) File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/fsspec/mapping.py:245, in get_mapper(url, check, create, missing_exceptions, alternate_root, **kwargs) 214 """Create key-value interface for given URL and options 215 216 The URL will be of the form "protocol://location" and point to the root (...) 242 ``FSMap`` instance, the dict-like key-value store. 243 """ 244 # Removing protocol here - could defer to each open() on the backend --> 245 fs, urlpath = url_to_fs(url, **kwargs) 246 root = alternate_root if alternate_root is not None else urlpath 247 return FSMap(root, fs, check, create, missing_exceptions=missing_exceptions) File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/fsspec/core.py:388, in url_to_fs(url, **kwargs) 386 inkwargs["fo"] = urls 387 urlpath, protocol, _ = chain[0] --> 388 fs = filesystem(protocol, **inkwargs) 389 return fs, urlpath File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/fsspec/registry.py:290, in filesystem(protocol, **storage_options) 283 warnings.warn( 284 "The 'arrow_hdfs' protocol has been deprecated and will be " 285 "removed in the future. Specify it as 'hdfs'.", 286 DeprecationWarning, 287 ) 289 cls = get_filesystem_class(protocol) --> 290 return cls(**storage_options) File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/fsspec/spec.py:79, in _Cached.__call__(cls, *args, **kwargs) 77 return cls._cache[token] 78 else: ---> 79 obj = super().__call__(*args, **kwargs) 80 # Setting _fs_token here causes some static linters to complain. 81 obj._fs_token_ = token File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/fsspec/implementations/zip.py:56, in ZipFileSystem.__init__(self, fo, mode, target_protocol, target_options, compression, allowZip64, compresslevel, **kwargs) 52 fo = fsspec.open( 53 fo, mode=mode + "b", protocol=target_protocol, **(target_options or {}), # **kwargs 54 ) 55 self.of = fo ---> 56 self.fo = fo.__enter__() # the whole instance is a context 57 self.zip = zipfile.ZipFile( 58 self.fo, 59 mode=mode, (...) 
62 compresslevel=compresslevel, 63 ) 64 self.dir_cache = None File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/fsspec/core.py:100, in OpenFile.__enter__(self) 97 def __enter__(self): 98 mode = self.mode.replace("t", "").replace("b", "") + "b" --> 100 f = self.fs.open(self.path, mode=mode) 102 self.fobjects = [f] 104 if self.compression is not None: File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/fsspec/spec.py:1307, in AbstractFileSystem.open(self, path, mode, block_size, cache_options, compression, **kwargs) 1305 else: 1306 ac = kwargs.pop("autocommit", not self._intrans) -> 1307 f = self._open( 1308 path, 1309 mode=mode, 1310 block_size=block_size, 1311 autocommit=ac, 1312 cache_options=cache_options, 1313 **kwargs, 1314 ) 1315 if compression is not None: 1316 from fsspec.compression import compr File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/s3fs/core.py:671, in S3FileSystem._open(self, path, mode, block_size, acl, version_id, fill_cache, cache_type, autocommit, size, requester_pays, cache_options, **kwargs) 668 if cache_type is None: 669 cache_type = self.default_cache_type --> 671 return S3File( 672 self, 673 path, 674 mode, 675 block_size=block_size, 676 acl=acl, 677 version_id=version_id, 678 fill_cache=fill_cache, 679 s3_additional_kwargs=kw, 680 cache_type=cache_type, 681 autocommit=autocommit, 682 requester_pays=requester_pays, 683 cache_options=cache_options, 684 size=size, 685 ) File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/s3fs/core.py:2099, in S3File.__init__(self, s3, path, mode, block_size, acl, version_id, fill_cache, s3_additional_kwargs, autocommit, cache_type, requester_pays, cache_options, size) 2097 self.details = s3.info(path) 2098 self.version_id = self.details.get("VersionId") -> 2099 super().__init__( 2100 s3, 2101 path, 2102 mode, 2103 block_size, 2104 autocommit=autocommit, 2105 cache_type=cache_type, 2106 cache_options=cache_options, 2107 size=size, 2108 ) 2109 self.s3 = self.fs # compatibility 2111 # when not using autocommit we want to have transactional state to manage File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/fsspec/spec.py:1663, in AbstractBufferedFile.__init__(self, fs, path, mode, block_size, autocommit, cache_type, cache_options, size, **kwargs) 1661 self.size = size 1662 else: -> 1663 self.size = self.details["size"] 1664 self.cache = caches[cache_type]( 1665 self.blocksize, self._fetch_range, self.size, **cache_options 1666 ) 1667 else: File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/fsspec/spec.py:1676, in AbstractBufferedFile.details(self) 1673 @property 1674 def details(self): 1675 if self._details is None: -> 1676 self._details = self.fs.info(self.path) 1677 return self._details File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/fsspec/asyn.py:118, in sync_wrapper.<locals>.wrapper(*args, **kwargs) 115 @functools.wraps(func) 116 def wrapper(*args, **kwargs): 117 self = obj or args[0] --> 118 return sync(self.loop, func, *args, **kwargs) File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/fsspec/asyn.py:103, in sync(loop, func, timeout, *args, **kwargs) 101 raise FSTimeoutError from return_result 102 elif isinstance(return_result, BaseException): --> 103 raise return_result 104 else: 105 return return_result File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/fsspec/asyn.py:56, in _runner(event, coro, result, timeout) 54 coro = 
asyncio.wait_for(coro, timeout=timeout) 55 try: ---> 56 result[0] = await coro 57 except Exception as ex: 58 result[0] = ex File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/s3fs/core.py:1302, in S3FileSystem._info(self, path, bucket, key, refresh, version_id) 1300 if key: 1301 try: -> 1302 out = await self._call_s3( 1303 "head_object", 1304 self.kwargs, 1305 Bucket=bucket, 1306 Key=key, 1307 **version_id_kw(version_id), 1308 **self.req_kw, 1309 ) 1310 return { 1311 "ETag": out.get("ETag", ""), 1312 "LastModified": out["LastModified"], (...) 1318 "ContentType": out.get("ContentType"), 1319 } 1320 except FileNotFoundError: File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/s3fs/core.py:348, in S3FileSystem._call_s3(self, method, *akwarglist, **kwargs) 346 logger.debug("CALL: %s - %s - %s", method.__name__, akwarglist, kw2) 347 additional_kwargs = self._get_s3_method_kwargs(method, *akwarglist, **kwargs) --> 348 return await _error_wrapper( 349 method, kwargs=additional_kwargs, retries=self.retries 350 ) File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/s3fs/core.py:140, in _error_wrapper(func, args, kwargs, retries) 138 err = e 139 err = translate_boto_error(err) --> 140 raise err File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/s3fs/core.py:113, in _error_wrapper(func, args, kwargs, retries) 111 for i in range(retries): 112 try: --> 113 return await func(*args, **kwargs) 114 except S3_RETRYABLE_ERRORS as e: 115 err = e File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/aiobotocore/client.py:366, in AioBaseClient._make_api_call(self, operation_name, api_params) 362 maybe_compress_request( 363 self.meta.config, request_dict, operation_model 364 ) 365 apply_request_checksum(request_dict) --> 366 http, parsed_response = await self._make_request( 367 operation_model, request_dict, request_context 368 ) 370 await self.meta.events.emit( 371 'after-call.{service_id}.{operation_name}'.format( 372 service_id=service_id, operation_name=operation_name (...) 377 context=request_context, 378 ) 380 if http.status_code >= 300: File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/aiobotocore/client.py:391, in AioBaseClient._make_request(self, operation_model, request_dict, request_context) 387 async def _make_request( 388 self, operation_model, request_dict, request_context 389 ): 390 try: --> 391 return await self._endpoint.make_request( 392 operation_model, request_dict 393 ) 394 except Exception as e: 395 await self.meta.events.emit( 396 'after-call-error.{service_id}.{operation_name}'.format( 397 service_id=self._service_model.service_id.hyphenize(), (...) 401 context=request_context, 402 ) File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/aiobotocore/endpoint.py:96, in AioEndpoint._send_request(self, request_dict, operation_model) 94 context = request_dict['context'] 95 self._update_retries_context(context, attempts) ---> 96 request = await self.create_request(request_dict, operation_model) 97 success_response, exception = await self._get_response( 98 request, operation_model, context 99 ) 100 while await self._needs_retry( 101 attempts, 102 operation_model, (...) 
105 exception, 106 ): File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/aiobotocore/endpoint.py:84, in AioEndpoint.create_request(self, params, operation_model) 80 service_id = operation_model.service_model.service_id.hyphenize() 81 event_name = 'request-created.{service_id}.{op_name}'.format( 82 service_id=service_id, op_name=operation_model.name 83 ) ---> 84 await self._event_emitter.emit( 85 event_name, 86 request=request, 87 operation_name=operation_model.name, 88 ) 89 prepared_request = self.prepare_request(request) 90 return prepared_request File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/aiobotocore/hooks.py:66, in AioHierarchicalEmitter._emit(self, event_name, kwargs, stop_on_response) 63 logger.debug('Event %s: calling handler %s', event_name, handler) 65 # Await the handler if its a coroutine. ---> 66 response = await resolve_awaitable(handler(**kwargs)) 67 responses.append((handler, response)) 68 if stop_on_response and response is not None: File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/aiobotocore/_helpers.py:15, in resolve_awaitable(obj) 13 async def resolve_awaitable(obj): 14 if inspect.isawaitable(obj): ---> 15 return await obj 17 return obj File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/aiobotocore/signers.py:24, in AioRequestSigner.handler(self, operation_name, request, **kwargs) 19 async def handler(self, operation_name=None, request=None, **kwargs): 20 # This is typically hooked up to the "request-created" event 21 # from a client's event emitter. When a new request is created 22 # this method is invoked to sign the request. 23 # Don't call this method directly. ---> 24 return await self.sign(operation_name, request) File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/aiobotocore/signers.py:82, in AioRequestSigner.sign(self, operation_name, request, region_name, signing_type, expires_in, signing_name) 79 else: 80 raise e ---> 82 auth.add_auth(request) File ~/.pyenv/versions/3.11.6/envs/work-env/lib/python3.11/site-packages/botocore/auth.py:418, in SigV4Auth.add_auth(self, request) 416 def add_auth(self, request): 417 if self.credentials is None: --> 418 raise NoCredentialsError() 419 datetime_now = datetime.datetime.utcnow() 420 request.context['timestamp'] = datetime_now.strftime(SIGV4_TIMESTAMP) NoCredentialsError: Unable to locate credentials ``` ### Anything else we need to know? #### Summary When debugging, I found a bugfix, to be made in the `fsspec` library. I still wanted to create the issue in the xarray repo as the bug happened to me while using xarray, and another xarray users might have similar issues, so creating the issue here serves as a potential bridge for future users #### Details Bug in `fsspec: 2023.10.0`: it forgets to pass the `kwargs` to the `open` method in `ZipFileSystem.__init__`. Current: ```python fo = fsspec.open( fo, mode=mode + "b", protocol=target_protocol, **(target_options or {}) ) ``` Bugfix: (passing the kwargs) ```python fo = fsspec.open( fo, mode=mode + "b", protocol=target_protocol, **(target_options or {}), **kwargs ) ```

Note: the missing kwargs passing is still present in the latest main branch at the time of writing this issue: https://github.com/fsspec/filesystem_spec/blob/37c1bc63b9c5a5b2b9a0d5161e89b4233f888b29/fsspec/implementations/zip.py#L56

Tested on my local environment by editing fsspec itself. The Zip Zarr store on the s3 bucket can then be opened successfully.
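
For context, a minimal sketch of the kind of call that exercises this code path (the bucket and key are hypothetical, not the exact command from this report):

```python
import xarray as xr

# The chained "zip::s3://" URL is what routes zarr's FSStore through
# fsspec.get_mapper -> ZipFileSystem -> S3FileSystem, as in the traceback above.
# The per-protocol layout of storage_options is an assumption for illustration.
ds = xr.open_dataset(
    "zip::s3://my-bucket/my-store.zarr.zip",
    engine="zarr",
    backend_kwargs={"storage_options": {"s3": {"anon": False}}},
)
```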

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.6 (main, Jan 10 2024, 20:45:04) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 5.15.0-102-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.3-development xarray: 2023.10.1 pandas: 2.1.4 numpy: 1.26.2 scipy: 1.11.3 netCDF4: 1.6.5 pydap: None h5netcdf: None h5py: 3.10.0 Nio: None zarr: 2.16.1 cftime: 1.6.3 nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: None dask: 2023.11.0 distributed: 2023.11.0 matplotlib: 3.7.1 cartopy: 0.22.0 seaborn: None numbagg: None fsspec: 2023.10.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 68.2.2 pip: 23.2.1 conda: None pytest: 7.4.3 mypy: 1.7.0 IPython: 8.20.0 sphinx: 6.2.1
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8944/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2239835092 PR_kwDOAMm_X85seXWW 8932 FIX: use str dtype without size information kmuehlbauer 5821660 closed 0     11 2024-04-12T10:59:45Z 2024-04-15T19:43:22Z 2024-04-13T12:25:48Z MEMBER   0 pydata/xarray/pulls/8932

Aims to resolve parts of #8844.

```python
xarray/tests/test_accessor_str.py::test_case_str: AssertionError: assert dtype('<U26') == dtype('<U30')
```

I'm not sure this is the right location for the fix, but at least it fixes those errors. AFAICT the string dtype size is being kept somewhere inside apply_ufunc, so this fix removes the size information from the dtype (actually recreating it).
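
For illustration, a small sketch of the idea (not the actual patch): recreating a fixed-width string dtype from its kind drops the length, so results are no longer pinned to '<U26' versus '<U30'.

```python
import numpy as np

sized = np.dtype("<U26")
unsized = np.dtype(sized.kind)  # "U" -> a string dtype without a fixed length

# Comparing kinds instead of full dtypes sidesteps the size mismatch above.
assert unsized.kind == np.dtype("<U30").kind == "U"
```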

  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8932/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2242781767 PR_kwDOAMm_X85soOln 8943 Bump codecov/codecov-action from 4.2.0 to 4.3.0 in the actions group dependabot[bot] 49699333 closed 0     0 2024-04-15T06:04:28Z 2024-04-15T19:16:38Z 2024-04-15T19:16:38Z CONTRIBUTOR   0 pydata/xarray/pulls/8943

Bumps the actions group with 1 update: codecov/codecov-action.

Updates codecov/codecov-action from 4.2.0 to 4.3.0

Release notes

Sourced from codecov/codecov-action's releases.

v4.3.0

What's Changed

  • fix: automatically detect if using GitHub enterprise by @​thomasrockhu-codecov in codecov/codecov-action#1356
  • build(deps-dev): bump typescript from 5.4.3 to 5.4.4 by @​dependabot in codecov/codecov-action#1355
  • build(deps): bump github/codeql-action from 3.24.9 to 3.24.10 by @​dependabot in codecov/codecov-action#1360
  • build(deps-dev): bump @​typescript-eslint/eslint-plugin from 7.5.0 to 7.6.0 by @​dependabot in codecov/codecov-action#1364
  • build(deps-dev): bump @​typescript-eslint/parser from 7.5.0 to 7.6.0 by @​dependabot in codecov/codecov-action#1363
  • feat: add network params by @​thomasrockhu-codecov in codecov/codecov-action#1365
  • build(deps): bump undici from 5.28.3 to 5.28.4 by @​dependabot in codecov/codecov-action#1361
  • chore(release): v4.3.0 by @​thomasrockhu-codecov in codecov/codecov-action#1366

Full Changelog: https://github.com/codecov/codecov-action/compare/v4.2.0...v4.3.0

Commits
  • 8450866 chore(release): v4.3.0 (#1366)
  • e841909 build(deps): bump undici from 5.28.3 to 5.28.4 (#1361)
  • 363a65a feat: add network params (#1365)
  • 640b86a build(deps-dev): bump @​typescript-eslint/parser from 7.5.0 to 7.6.0 (#1363)
  • 375c033 build(deps-dev): bump @​typescript-eslint/eslint-plugin from 7.5.0 to 7.6.0 (#...
  • d701256 build(deps): bump github/codeql-action from 3.24.9 to 3.24.10 (#1360)
  • 0bb547a build(deps-dev): bump typescript from 5.4.3 to 5.4.4 (#1355)
  • 55e8381 fix: automatically detect if using GitHub enterprise (#1356)
  • See full diff in compare view


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore <dependency name> major version` will close this group update PR and stop Dependabot creating any more for the specific dependency's major version (unless you unignore this specific dependency's major version or upgrade to it yourself) - `@dependabot ignore <dependency name> minor version` will close this group update PR and stop Dependabot creating any more for the specific dependency's minor version (unless you unignore this specific dependency's minor version or upgrade to it yourself) - `@dependabot ignore <dependency name>` will close this group update PR and stop Dependabot creating any more for the specific dependency (unless you unignore this specific dependency or upgrade to it yourself) - `@dependabot unignore <dependency name>` will remove all of the ignore conditions of the specified dependency - `@dependabot unignore <dependency name> <ignore condition>` will remove the ignore condition of the specified dependency and ignore conditions
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8943/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2208982027 PR_kwDOAMm_X85q1Lns 8879 Migrate iterators.py for datatree. owenlittlejohns 7788154 closed 0     2 2024-03-26T18:14:53Z 2024-04-15T16:23:56Z 2024-04-11T15:28:25Z CONTRIBUTOR   0 pydata/xarray/pulls/8879

This PR continues the overall work of migrating DataTree into xarray.

iterators.py does not have direct tests. In discussions with @TomNicholas and @flamingbear, we concurred that other unit tests utilise this functionality.

  • [x] Closes migration step for iterators.py #8572
  • [ ] ~~Tests added~~
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8879/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2240895281 PR_kwDOAMm_X85siDno 8934 Correct save_mfdataset docstring TomNicholas 35968931 closed 0     0 2024-04-12T20:51:35Z 2024-04-14T19:58:46Z 2024-04-14T11:14:42Z MEMBER   0 pydata/xarray/pulls/8934

Noticed the `**kwargs` part of the docstring was mangled - see here

  • [ ] ~~Closes #xxxx~~
  • [ ] ~~Tests added~~
  • [ ] ~~User visible changes (including notable bug fixes) are documented in whats-new.rst~~
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8934/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2236408438 PR_kwDOAMm_X85sSjdN 8926 no untyped tests Illviljan 14371165 closed 0     2 2024-04-10T20:52:29Z 2024-04-14T16:15:45Z 2024-04-14T16:15:45Z MEMBER   1 pydata/xarray/pulls/8926
  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8926/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2241528898 PR_kwDOAMm_X85skNON 8940 adapt more tests to the copy-on-write behavior of pandas keewis 14808389 closed 0     1 2024-04-13T11:57:10Z 2024-04-13T19:36:30Z 2024-04-13T14:44:50Z MEMBER   0 pydata/xarray/pulls/8940
  • [x] follow-up to #8846
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8940/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2241499231 PR_kwDOAMm_X85skHW9 8938 use `pd.to_timedelta` instead of `TimedeltaIndex` keewis 14808389 closed 0     0 2024-04-13T10:38:12Z 2024-04-13T12:32:14Z 2024-04-13T12:32:13Z MEMBER   0 pydata/xarray/pulls/8938

pandas recently removed the deprecated unit kwarg to TimedeltaIndex.
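
For illustration, a sketch of the replacement pattern (not the exact diff):

```python
import pandas as pd

# Previously: pd.TimedeltaIndex([1, 2, 3], unit="D") -- the unit kwarg is gone.
# pd.to_timedelta builds the equivalent TimedeltaIndex without it:
td = pd.to_timedelta([1, 2, 3], unit="D")
print(td)
```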

  • [x] towards #8844
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8938/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2241228882 PR_kwDOAMm_X85sjOh9 8936 MAINT: use sphinxext-rediraffe conda install raybellwaves 17162724 closed 0     1 2024-04-13T02:11:07Z 2024-04-13T02:53:53Z 2024-04-13T02:53:48Z CONTRIBUTOR   0 pydata/xarray/pulls/8936
  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8936/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2066807588 I_kwDOAMm_X857MPsk 8590 ValueError: conflicting sizes for dimension in xr.open_dataset("reference://"...) VS. no error in xr.open_dataset(direct_file_path) for h5 ksharonin 90292403 closed 0     3 2024-01-05T06:34:34Z 2024-04-11T06:54:44Z 2024-04-11T06:54:44Z NONE      

What is your issue?

Hi all, on a project I am attempting a dataset read using the xarray JSON reference system. Metadata for this file (an ATL03 h5 file) can be found here: https://nsidc.org/sites/default/files/icesat2_atl03_data_dict_v005.pdf

  1. Reading a group with variables that have 2 dimensions or less produces no issues. E.g Group "gt1l/heights" (documented as /gtx/heights in the PDF) ds = xr.open_dataset("reference://", engine="zarr", backend_kwargs={ "consolidated": False, "storage_options": {"fo": JSON_PATH}, "group": "gt1l/heights" })
  2. Reading a group with a variable that has 3+ dimensions causes the following error. The group "ancillary_data/calibrations/dead_time_radiometric_signal_loss/gt1l" contains variable rad_corr which contains 3 dimensions. ds = xr.open_dataset("reference://", engine="zarr", backend_kwargs={ "consolidated": False, "storage_options": {"fo": JSON_PATH}, "group": "ancillary_data/calibrations/dead_time_radiometric_signal_loss/gt1l" }) ``` { "name": "ValueError", "message": "conflicting sizes for dimension 'phony_dim_1': length 498 on 'width' and length 160 on {'phony_dim_0': 'dead_time', 'phony_dim_1': 'rad_corr', 'phony_dim_2': 'rad_corr'}", "stack": "--------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[2], line 1 ----> 1 ds = xr.open_dataset(\"reference://\", engine=\"zarr\", backend_kwargs={ 2 \"consolidated\": False, 3 \"storage_options\": {\"fo\": JSON_PATH}, 4 \"group\": group_path 5 })

File ~/opt/anaconda3/envs/kerchunkc/lib/python3.8/site-packages/xarray/backends/api.py:539, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, backend_kwargs, kwargs) 527 decoders = _resolve_decoders_kwargs( 528 decode_cf, 529 open_backend_dataset_parameters=backend.open_dataset_parameters, (...) 535 decode_coords=decode_coords, 536 ) 538 overwrite_encoded_chunks = kwargs.pop(\"overwrite_encoded_chunks\", None) --> 539 backend_ds = backend.open_dataset( 540 filename_or_obj, 541 drop_variables=drop_variables, 542 decoders, 543 kwargs, 544 ) 545 ds = _dataset_from_backend_dataset( 546 backend_ds, 547 filename_or_obj, (...) 555 kwargs, 556 ) 557 return ds

File ~/opt/anaconda3/envs/kerchunkc/lib/python3.8/site-packages/xarray/backends/zarr.py:862, in ZarrBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel) 860 store_entrypoint = StoreBackendEntrypoint() 861 with close_on_error(store): --> 862 ds = store_entrypoint.open_dataset( 863 store, 864 mask_and_scale=mask_and_scale, 865 decode_times=decode_times, 866 concat_characters=concat_characters, 867 decode_coords=decode_coords, 868 drop_variables=drop_variables, 869 use_cftime=use_cftime, 870 decode_timedelta=decode_timedelta, 871 ) 872 return ds

File ~/opt/anaconda3/envs/kerchunkc/lib/python3.8/site-packages/xarray/backends/store.py:43, in StoreBackendEntrypoint.open_dataset(self, store, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta) 29 encoding = store.get_encoding() 31 vars, attrs, coord_names = conventions.decode_cf_variables( 32 vars, 33 attrs, (...) 40 decode_timedelta=decode_timedelta, 41 ) ---> 43 ds = Dataset(vars, attrs=attrs) 44 ds = ds.set_coords(coord_names.intersection(vars)) 45 ds.set_close(store.close)

File ~/opt/anaconda3/envs/kerchunkc/lib/python3.8/site-packages/xarray/core/dataset.py:604, in Dataset.init(self, data_vars, coords, attrs) 601 if isinstance(coords, Dataset): 602 coords = coords.variables --> 604 variables, coord_names, dims, indexes, _ = merge_data_and_coords( 605 data_vars, coords, compat=\"broadcast_equals\" 606 ) 608 self._attrs = dict(attrs) if attrs is not None else None 609 self._close = None

File ~/opt/anaconda3/envs/kerchunkc/lib/python3.8/site-packages/xarray/core/merge.py:575, in merge_data_and_coords(data_vars, coords, compat, join) 573 objects = [data_vars, coords] 574 explicit_coords = coords.keys() --> 575 return merge_core( 576 objects, 577 compat, 578 join, 579 explicit_coords=explicit_coords, 580 indexes=Indexes(indexes, coords), 581 )

File ~/opt/anaconda3/envs/kerchunkc/lib/python3.8/site-packages/xarray/core/merge.py:761, in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value) 756 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat) 757 variables, out_indexes = merge_collected( 758 collected, prioritized, compat=compat, combine_attrs=combine_attrs 759 ) --> 761 dims = calculate_dimensions(variables) 763 coord_names, noncoord_names = determine_coords(coerced) 764 if explicit_coords is not None:

File ~/opt/anaconda3/envs/kerchunkc/lib/python3.8/site-packages/xarray/core/variable.py:3208, in calculate_dimensions(variables) 3206 last_used[dim] = k 3207 elif dims[dim] != size: -> 3208 raise ValueError( 3209 f\"conflicting sizes for dimension {dim!r}: \" 3210 f\"length {size} on {k!r} and length {dims[dim]} on {last_used!r}\" 3211 ) 3212 return dims

ValueError: conflicting sizes for dimension 'phony_dim_1': length 498 on 'width' and length 160 on {'phony_dim_0': 'dead_time', 'phony_dim_1': 'rad_corr', 'phony_dim_2': 'rad_corr'}" }

```

  3. Now, contrast with reading the same group with the 3+ dimension variables, but using the direct h5 file path. This does not produce an error: ds = xr.open_dataset("/Users/katrinasharonin/Downloads/ATL03_20230816235231_08822014_006_01.h5", group="ancillary_data/calibrations/dead_time_radiometric_signal_loss/gt1l")

The JSON reference file has been attached for reference ATL03_REF_NONUTM.json
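
As a diagnostic sketch (not part of the original report), the raw shapes behind the conflict can be inspected by opening the reference through zarr directly, bypassing xarray's dimension merging; the file and group names follow the snippets above:

```python
import fsspec
import zarr

JSON_PATH = "ATL03_REF_NONUTM.json"  # the reference file attached above

# Read the group without xarray so each array's shape and its
# _ARRAY_DIMENSIONS attribute can be checked individually.
mapper = fsspec.get_mapper("reference://", fo=JSON_PATH)
group = zarr.open_group(mapper, mode="r")[
    "ancillary_data/calibrations/dead_time_radiometric_signal_loss/gt1l"
]
for name, arr in group.arrays():
    print(name, arr.shape, arr.attrs.get("_ARRAY_DIMENSIONS"))
```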

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8590/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2117245042 I_kwDOAMm_X85-Mphy 8703 calling to_zarr inside map_blocks function results in missing values martinspetlik 23472459 closed 0     8 2024-02-04T18:21:40Z 2024-04-11T06:53:45Z 2024-04-11T06:53:45Z NONE      

What happened?

I want to work with a huge dataset stored in hdf5 loaded in chunks. Each chunk contains part of my data that should be saved to a specific region of zarr files. I need to follow the original order of chunks. I found it a convenient way to use a map_blocks function for this purpose. However, I have missing values in the final zarr file. Some chunks or parts of chunks are not stored.

I used a simplified scenario for code documenting this behavior. The initial zarr file of zeros is filled with ones. There are always some parts where there are still zeros.

What did you expect to happen?

No response

Minimal Complete Verifiable Example

```Python
import os
import shutil
import xarray as xr
import numpy as np
import dask.array as da

xr.show_versions()

zarr_file = "file.zarr"
if os.path.exists(zarr_file):
    shutil.rmtree(zarr_file)

chunk_size = 5
shape = (50, 32, 1000)
ones_dataset = xr.Dataset({"data": xr.ones_like(xr.DataArray(np.empty(shape)))})
ones_dataset = ones_dataset.chunk({'dim_0': chunk_size})
chunk_indices = np.arange(len(ones_dataset.chunks['dim_0']))
chunk_ids = np.repeat(np.arange(ones_dataset.sizes["dim_0"] // chunk_size), chunk_size)
chunk_ids_dask_array = da.from_array(chunk_ids, chunks=(chunk_size,))

# Append the chunk IDs Dask array as a new variable to the existing dataset
ones_dataset['chunk_id'] = (('dim_0',), chunk_ids_dask_array)

# Create a new dataset filled with zeros
zeros_dataset = xr.Dataset({"data": xr.zeros_like(xr.DataArray(np.empty(shape)))})
zeros_dataset.to_zarr(zarr_file, compute=False)


def process_chunk(chunk_dataset):
    chunk_id = int(chunk_dataset["chunk_id"][0])
    chunk_dataset_to_store = chunk_dataset.drop_vars("chunk_id")

    start_index = chunk_id * chunk_size
    end_index = chunk_id * chunk_size + chunk_size

    chunk_dataset_to_store.to_zarr(zarr_file, region={'dim_0': slice(start_index, end_index)})
    return chunk_dataset


ones_dataset.map_blocks(process_chunk, template=ones_dataset).compute()

# Load data stored in zarr
zarr_data = xr.open_zarr(zarr_file, chunks={'dim_0': chunk_size})

# Find differences
for var_name in zarr_data.variables:
    try:
        xr.testing.assert_equal(zarr_data[var_name], ones_dataset[var_name])
    except AssertionError:
        print(f"Differences in {var_name}:")
        print(zarr_data[var_name].values)
        print(ones_dataset[var_name].values)
```
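
One hedged guess at the failure mode, not confirmed in this thread: zeros_dataset is written without explicit chunking, so zarr chooses its own chunk sizes, and region writes issued concurrently from different dask tasks can then touch the same zarr chunk and overwrite each other. A sketch of chunking the template so that the regions written later never share a zarr chunk:

```python
# Variables follow the example above; the only change is chunking the target
# store along dim_0 with the same size as the regions written from map_blocks.
zeros_dataset.chunk({"dim_0": chunk_size}).to_zarr(zarr_file, compute=False)
```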

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] python-bits: 64 OS: Linux OS-release: 6.5.0-15-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.3-development xarray: 2024.1.1 pandas: 2.1.4 numpy: 1.26.3 scipy: 1.11.4 netCDF4: 1.6.5 pydap: None h5netcdf: 1.3.0 h5py: 3.10.0 Nio: None zarr: 2.16.1 cftime: 1.6.3 nc_time_axis: 1.4.1 iris: None bottleneck: 1.3.7 dask: 2024.1.1 distributed: 2024.1.0 matplotlib: 3.8.2 cartopy: 0.22.0 seaborn: 0.13.1 numbagg: 0.6.8 fsspec: 2023.12.2 cupy: None pint: None sparse: None flox: 0.8.9 numpy_groupies: 0.10.2 setuptools: 69.0.2 pip: 23.3.1 conda: None pytest: 7.4.4 mypy: None IPython: None sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8703/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2189919172 I_kwDOAMm_X86Ch4PE 8848 numpy where argument crashes the program ShaiAvr 35605235 closed 0     3 2024-03-16T11:17:22Z 2024-04-11T06:53:28Z 2024-04-11T06:53:28Z NONE      

What happened?

I was trying to divide two DataArray objects, but I needed to handle zero values in the divisor. I thought to do something like:

```python
res = np.divide(a, b, out=np.zeros_like(a, dtype=float), where=b != 0)
```

to get zero whenever b == 0, but running this crashed with infinite recursion. Part of the traceback is shown below; the full traceback is far too long to post here.

What did you expect to happen?

The division would succeed and I'd get a DataArray with the same dimensions as a and b with the correct results and zero whenever b == 0.

Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr

a = xr.DataArray([-1, 0, 1, 2, 3], dims="x")
b = xr.DataArray([0, 0, 0, 2, 2], dims="x")
print(np.divide(a, b, out=np.zeros_like(a, dtype=float), where=b != 0))
```
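
Until the recursion itself is addressed, the intended result can be obtained without numpy's where= keyword; a workaround sketch using plain xarray operations:

```python
import xarray as xr

a = xr.DataArray([-1, 0, 1, 2, 3], dims="x")
b = xr.DataArray([0, 0, 0, 2, 2], dims="x")

# Mask the zero divisors first (NaN), divide, then fill the masked slots with 0.
res = (a / b.where(b != 0)).fillna(0.0)
print(res.values)  # [0.  0.  0.  1.  1.5]
```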

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Python File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return 
func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 898, in apply_array_ufunc return func(*args) ^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\arithmetic.py", line 83, in __array_ufunc__ return apply_ufunc( ^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 1283, in apply_ufunc return apply_array_ufunc(func, *args, dask=dask) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 880, in apply_array_ufunc if any(is_chunked_array(arg) for arg in args): ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\core\computation.py", line 
880, in <genexpr> if any(is_chunked_array(arg) for arg in args): ^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\namedarray\pycompat.py", line 92, in is_chunked_array return is_duck_dask_array(x) or (is_duck_array(x) and hasattr(x, "chunks")) ^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\namedarray\utils.py", line 94, in is_duck_dask_array return is_duck_array(x) and is_dask_collection(x) ^^^^^^^^^^^^^^^^^^^^^ File "D:\Portable_Programs\.pyenv\pyenv-win\versions\3.11.5\Lib\site-packages\xarray\namedarray\utils.py", line 68, in is_dask_collection if module_available("dask"): ^^^^^^^^^^^^^^^^^^^^^^^^ RecursionError: maximum recursion depth exceeded while calling a Python object

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34) [MSC v.1936 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('English_United States', '1252') libhdf5: 1.14.0 libnetcdf: 4.9.2 xarray: 2024.2.0 pandas: 2.2.1 numpy: 1.26.4 scipy: 1.12.0 netCDF4: 1.6.5 pydap: None h5netcdf: 1.3.0 h5py: 3.10.0 Nio: None zarr: 2.17.1 cftime: 1.6.3 nc_time_axis: 1.4.1 iris: None bottleneck: 1.3.8 dask: 2024.3.1 distributed: 2024.3.1 matplotlib: 3.8.2 cartopy: None seaborn: 0.13.2 numbagg: 0.8.1 fsspec: 2024.2.0 cupy: None pint: 0.23 sparse: None flox: 0.9.3 numpy_groupies: 0.10.2 setuptools: 69.1.1 pip: 24.0 conda: None pytest: 8.0.2 mypy: 1.8.0 IPython: 8.22.1 sphinx: 7.2.6
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8848/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2231409978 PR_kwDOAMm_X85sBYyR 8920 Enhance the ugly error in constructor when no data passed aimtsou 2598924 closed 0     6 2024-04-08T14:42:57Z 2024-04-10T22:46:57Z 2024-04-10T22:46:53Z CONTRIBUTOR   0 pydata/xarray/pulls/8920

This change improves the error message described in issue #8860; a sketch of the kind of guard involved follows below. I did not add any test, since no new functionality is added for this case.
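
A standalone sketch of the kind of guard involved, using hypothetical names rather than the actual diff:

```python
def check_variable_tuple(name, obj):
    # Validate the tuple length before any element is indexed, so an empty
    # tuple fails with a readable message instead of "tuple index out of range".
    if isinstance(obj, tuple) and len(obj) < 2:
        raise ValueError(
            f"Variable {name!r}: tuple must be of the form "
            f"(dims, data[, attrs[, encoding]]), got {obj!r}"
        )


try:
    check_variable_tuple("t", ())  # mirrors xr.Dataset({"t": ()})
except ValueError as err:
    print(err)
```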

  • [X] Closes #8860
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8920/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2198196326 I_kwDOAMm_X86DBdBm 8860 Ugly error in constructor when no data passed TomNicholas 35968931 closed 0     2 2024-03-20T17:55:52Z 2024-04-10T22:46:55Z 2024-04-10T22:46:54Z MEMBER      

What happened?

Passing no data to the Dataset constructor can result in a very unhelpful "tuple index out of range" error when this is a clear case of malformed input that we should be able to catch.

What did you expect to happen?

An error more like "tuple must be of form (dims, data[, attrs])"

Minimal Complete Verifiable Example

```Python
xr.Dataset({"t": ()})
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python

IndexError Traceback (most recent call last) Cell In[2], line 1 ----> 1 xr.Dataset({"t": ()})

File ~/Documents/Work/Code/xarray/xarray/core/dataset.py:693, in Dataset.init(self, data_vars, coords, attrs) 690 if isinstance(coords, Dataset): 691 coords = coords._variables --> 693 variables, coord_names, dims, indexes, _ = merge_data_and_coords( 694 data_vars, coords 695 ) 697 self._attrs = dict(attrs) if attrs else None 698 self._close = None

File ~/Documents/Work/Code/xarray/xarray/core/dataset.py:422, in merge_data_and_coords(data_vars, coords) 418 coords = create_coords_with_default_indexes(coords, data_vars) 420 # exclude coords from alignment (all variables in a Coordinates object should 421 # already be aligned together) and use coordinates' indexes to align data_vars --> 422 return merge_core( 423 [data_vars, coords], 424 compat="broadcast_equals", 425 join="outer", 426 explicit_coords=tuple(coords), 427 indexes=coords.xindexes, 428 priority_arg=1, 429 skip_align_args=[1], 430 )

File ~/Documents/Work/Code/xarray/xarray/core/merge.py:718, in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value, skip_align_args) 715 for pos, obj in skip_align_objs: 716 aligned.insert(pos, obj) --> 718 collected = collect_variables_and_indexes(aligned, indexes=indexes) 719 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat) 720 variables, out_indexes = merge_collected( 721 collected, prioritized, compat=compat, combine_attrs=combine_attrs 722 )

File ~/Documents/Work/Code/xarray/xarray/core/merge.py:358, in collect_variables_and_indexes(list_of_mappings, indexes) 355 indexes_.pop(name, None) 356 append_all(coords_, indexes_) --> 358 variable = as_variable(variable, name=name, auto_convert=False) 359 if name in indexes: 360 append(name, variable, indexes[name])

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:126, in as_variable(obj, name, auto_convert) 124 obj = obj.copy(deep=False) 125 elif isinstance(obj, tuple): --> 126 if isinstance(obj[1], DataArray): 127 raise TypeError( 128 f"Variable {name!r}: Using a DataArray object to construct a variable is" 129 " ambiguous, please extract the data using the .data property." 130 ) 131 try:

IndexError: tuple index out of range ```

Anything else we need to know?

No response

Environment

Xarray main

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8860/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2232134629 PR_kwDOAMm_X85sD5gg 8922 Add typing to some functions in indexing.py Illviljan 14371165 closed 0     0 2024-04-08T21:45:30Z 2024-04-10T18:05:52Z 2024-04-10T18:05:52Z MEMBER   0 pydata/xarray/pulls/8922

A drive-by PR as I was trying to figure out how these functions works.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8922/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2230680765 I_kwDOAMm_X86E9Xy9 8919 Using the xarray.Dataset.where() function takes up a lot of memory isLiYang 69391863 closed 0     4 2024-04-08T09:15:49Z 2024-04-09T02:45:09Z 2024-04-09T02:45:08Z NONE      

What is your issue?

My python script was killed because it took up too much memory. After checking, I found that the problem is the ds.where() function.

The original netcdf file opened from the hard disk takes up about 10 MB of storage, but when I mask out the data that doesn't match the latitude and longitude selection, the variable ds takes up a dozen GB of memory. When I delete this variable using del ds, the memory used by the script immediately returns to normal.

```
# Open this netcdf file.
ds = xr.open_dataset(track)

# If longitude range is [-180, 180], then convert to [0, 360].
if np.any(ds[var_lon] < 0):
    ds[var_lon] = ds[var_lon] % 360

# Extract data by longitude and latitude.
ds = ds.where((ds[var_lon] >= region[0]) & (ds[var_lon] <= region[1]) &
              (ds[var_lat] >= region[2]) & (ds[var_lat] <= region[3]))

# Select data by range and value of some variables.
for key, value in range_select.items():
    ds = ds.where((ds[key] >= value[0]) & (ds[key] <= value[1]))
for key, value in value_select.items():
    ds = ds.where(ds[key].isin(value))
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8919/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2230357492 PR_kwDOAMm_X85r9wiV 8918 Bump codecov/codecov-action from 4.1.1 to 4.2.0 in the actions group dependabot[bot] 49699333 closed 0     0 2024-04-08T06:21:47Z 2024-04-08T16:31:12Z 2024-04-08T16:31:11Z CONTRIBUTOR   0 pydata/xarray/pulls/8918

Bumps the actions group with 1 update: codecov/codecov-action.

Updates codecov/codecov-action from 4.1.1 to 4.2.0

Release notes

Sourced from codecov/codecov-action's releases.

v4.2.0

What's Changed

  • chore(deps): update deps by @​thomasrockhu-codecov in codecov/codecov-action#1351
  • feat: allow for authentication via OIDC token by @​thomasrockhu-codecov in codecov/codecov-action#1330
  • fix: use_oidc shoudl be required false by @​thomasrockhu-codecov in codecov/codecov-action#1353

Full Changelog: https://github.com/codecov/codecov-action/compare/v4.1.1...v4.2.0

Commits
  • 7afa10e fix: use_oidc shoudl be required false (#1353)
  • d820d60 feat: allow for authentication via OIDC token (#1330)
  • 3a20752 chore(deps): update deps (#1351)
  • See full diff in compare view


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore <dependency name> major version` will close this group update PR and stop Dependabot creating any more for the specific dependency's major version (unless you unignore this specific dependency's major version or upgrade to it yourself) - `@dependabot ignore <dependency name> minor version` will close this group update PR and stop Dependabot creating any more for the specific dependency's minor version (unless you unignore this specific dependency's minor version or upgrade to it yourself) - `@dependabot ignore <dependency name>` will close this group update PR and stop Dependabot creating any more for the specific dependency (unless you unignore this specific dependency or upgrade to it yourself) - `@dependabot unignore <dependency name>` will remove all of the ignore conditions of the specified dependency - `@dependabot unignore <dependency name> <ignore condition>` will remove the ignore condition of the specified dependency and ignore conditions
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8918/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2228373305 I_kwDOAMm_X86E0kc5 8915 Weird behavior of DataSet.where(... , drop=True) johannespletzer 22961670 closed 0     4 2024-04-05T16:03:05Z 2024-04-08T09:32:48Z 2024-04-08T09:32:48Z NONE      

What happened?

I work with an aircraft emission dataset that is freely available online: emission dataset

During my calculations I eventually convert the Dataset to a DataFrame; my motivation for dropping is to avoid unnecessary rows in the DataFrame. While doing some calculations, my code returned unexpected results. Eventually I narrowed it down to a Dataset.where(..., drop=True) call I had added along the way, which introduces differences in the data. Here are two examples:

Example 1: Along some dimensions data points vanished if drop=True

Example 2: For other dimensions (these?) data points appeared elsewhere if drop=True

What did you expect to happen?

I expect for my calculations to return the same results, regardless of whether drop=True is active or not.

Minimal Complete Verifiable Example

```Python
!wget "https://zenodo.org/records/10818082/files/Emission_Inventory_H2O_Optimized_v0.1_MR3_Fleet_BRU-MYA_2075.nc"

import matplotlib.pyplot as plt
import xarray as xr

nc_file = xr.open_dataset('Emission_Inventory_H2O_Optimized_v0.1_MR3_Fleet_BRU-MYA_2075.nc')

fig, axs = plt.subplots(1, 2, figsize=(10, 4))

nc_file.H2O.where(nc_file.H2O != 0, drop=True).sum(('lon', 'time')).plot.contour(x='lat', ax=axs[0])
axs[0].set_xlim(-50, 90)
axs[0].set_title('With drop=True')

nc_file.H2O.where(nc_file.H2O != 0, drop=False).sum(('lon', 'time')).plot.contour(x='lat', ax=axs[1])
axs[1].set_xlim(-50, 90)
axs[1].set_title('With drop=False')

plt.tight_layout()
plt.show()

fig, axs = plt.subplots(1, 2, figsize=(10, 4))

nc_file.H2O.where(nc_file.H2O != 0, drop=True).sum(('lat', 'time')).plot.contour(x='lon', ax=axs[0])
axs[0].set_title('With drop=True')

nc_file.H2O.where(nc_file.H2O != 0, drop=False).sum(('lat', 'time')).plot.contour(x='lon', ax=axs[1])
axs[1].set_title('With drop=False')

plt.tight_layout()
plt.show()
```

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.9 | packaged by Anaconda, Inc. | (main, Mar 1 2023, 18:18:15) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 165 Stepping 2, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: ('en_US', 'ISO8859-1') libhdf5: 1.14.0 libnetcdf: 4.9.2 xarray: 2022.11.0 pandas: 1.5.3 numpy: 1.23.5 scipy: 1.13.0 netCDF4: 1.6.5 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.3 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.5 dask: None distributed: None matplotlib: 3.7.0 cartopy: 0.21.1 seaborn: 0.12.2 numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 65.6.3 pip: 22.3.1 conda: None pytest: None IPython: 8.10.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8915/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2227630565 I_kwDOAMm_X86ExvHl 8912 xarray and pyinstaller happymvd 135848029 closed 0     3 2024-04-05T10:14:44Z 2024-04-08T07:10:46Z 2024-04-05T20:26:48Z NONE      

What is your issue?

I am working on a Windows 11 computer with Python 3.11.9 installed as the only version of Python. I am working in Visual Studio Code, and I have created and activated a venv virtual environment for my project (not a conda one). I used pip to install xarray into the virtual environment. I have a test script called Testing.py; the only thing it does is import xarray as xr and then print a message to the terminal window. This works 100% in Python in the VSC terminal window. I then issue the following command: pyinstaller --onefile Testing.py

This creates a file called Testing.exe for me. When I run Testing.exe I get the following error message:

```
Traceback (most recent call last):
  File "importlib\metadata\__init__.py", line 563, in from_name
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Testing.py", line 20, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 419, in exec_module
  File "xarray\__init__.py", line 3, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 419, in exec_module
  File "xarray\testing\__init__.py", line 1, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 419, in exec_module
  File "xarray\testing\assertions.py", line 11, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 419, in exec_module
  File "xarray\core\duck_array_ops.py", line 36, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 419, in exec_module
  File "xarray\core\dask_array_ops.py", line 3, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 419, in exec_module
  File "xarray\core\nputils.py", line 14, in <module>
  File "xarray\namedarray\utils.py", line 60, in module_available
  File "importlib\metadata\__init__.py", line 1009, in version
  File "importlib\metadata\__init__.py", line 982, in distribution
  File "importlib\metadata\__init__.py", line 565, in from_name
importlib.metadata.PackageNotFoundError: No package metadata was found for numpy
[15596] Failed to execute script 'Testing' due to unhandled exception!
```

Please, can anyone suggest what I need to do to get around this problem? I need to distribute my project when it is complete.
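
A possible workaround (an assumption based on PyInstaller's generic metadata-collection options, not something confirmed for this exact setup) is to tell PyInstaller to bundle the package metadata that xarray queries via importlib.metadata at import time. A minimal sketch using PyInstaller's programmatic entry point:

```python
# Hypothetical workaround sketch: bundle the dist-info metadata that
# importlib.metadata looks up inside the frozen app. Roughly equivalent to:
#   pyinstaller --onefile --copy-metadata numpy --copy-metadata xarray Testing.py
import PyInstaller.__main__

PyInstaller.__main__.run(
    [
        "Testing.py",
        "--onefile",
        "--copy-metadata", "numpy",
        "--copy-metadata", "xarray",
    ]
)
```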

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8912/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2229440952 I_kwDOAMm_X86E4pG4 8917 from_array() got an unexpected keyword argument 'inline_array' Shiviiii23 29259305 closed 0     6 2024-04-06T22:09:47Z 2024-04-08T00:15:38Z 2024-04-08T00:15:37Z NONE      

What happened?

I got the error `from_array() got an unexpected keyword argument 'inline_array'` for this line in xarray: `data = da.from_array(data, chunks, name=name, lock=lock, inline_array=inline_array, **kwargs)`

lines 1230 and 1231 in xarray/core/variable.py
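
For context (an assumption, not confirmed by the report): `inline_array` was only added to `dask.array.from_array` in relatively recent dask releases, so this error typically means the installed dask predates the keyword that this xarray version expects. A minimal check:

```python
# If dask is older than the release that introduced the `inline_array`
# keyword of dask.array.from_array, upgrading dask (or pinning an older
# xarray) should remove the mismatch.
import dask
import xarray as xr

print("dask:", dask.__version__)
print("xarray:", xr.__version__)
```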

What did you expect to happen?

No response

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8917/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  not_planned xarray 13221727 issue
2228266052 PR_kwDOAMm_X85r24hE 8913 Update hypothesis action to always save the cache dcherian 2448579 closed 0     0 2024-04-05T15:09:35Z 2024-04-05T16:51:05Z 2024-04-05T16:51:03Z MEMBER   0 pydata/xarray/pulls/8913

Update the cache always.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8913/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2215113392 PR_kwDOAMm_X85rJ3wR 8889 Add typing to test_plot.py Illviljan 14371165 closed 0     0 2024-03-29T10:49:39Z 2024-04-05T16:42:27Z 2024-04-05T16:42:27Z MEMBER   0 pydata/xarray/pulls/8889

Enforce typing on all tests in test_plot.py and add the remaining type hints.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8889/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2057651682 PR_kwDOAMm_X85i2Byx 8573 ddof vs correction kwargs in std/var TomNicholas 35968931 closed 0     0 2023-12-27T18:10:52Z 2024-04-04T16:46:55Z 2024-04-04T16:46:55Z MEMBER   0 pydata/xarray/pulls/8573
  • [x] Attempt to close the issue described in https://github.com/pydata/xarray/issues/8566#issuecomment-1870472827
  • [x] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8573/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2136709010 I_kwDOAMm_X85_W5eS 8753 Lazy Loading with `DataArray` vs. `Variable` dcherian 2448579 closed 0     0 2024-02-15T14:42:24Z 2024-04-04T16:46:54Z 2024-04-04T16:46:54Z MEMBER      

Discussed in https://github.com/pydata/xarray/discussions/8751

<sup>Originally posted by **ilan-gold** February 15, 2024</sup> My goal is to get a dataset from [custom io-zarr backend lazy-loaded](https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html#how-to-support-lazy-loading). But when I declare a `DataArray` based on the `Variable` which uses `LazilyIndexedArray`, everything is read in. Is this expected? I specifically don't want to have to use dask if possible. I have seen https://github.com/aurghs/xarray-backend-tutorial/blob/main/2.Backend_with_Lazy_Loading.ipynb but it's a little bit different. While I have a custom backend array inheriting from `ZarrArrayWrapper`, this example using `ZarrArrayWrapper` directly still highlights the same unexpected behavior of everything being read in.

```python
import zarr
import xarray as xr
from tempfile import mkdtemp
import numpy as np
from pathlib import Path
from collections import defaultdict


class AccessTrackingStore(zarr.DirectoryStore):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._access_count = {}
        self._accessed = defaultdict(set)

    def __getitem__(self, key):
        for tracked in self._access_count:
            if tracked in key:
                self._access_count[tracked] += 1
                self._accessed[tracked].add(key)
        return super().__getitem__(key)

    def get_access_count(self, key):
        return self._access_count[key]

    def set_key_trackers(self, keys_to_track):
        if isinstance(keys_to_track, str):
            keys_to_track = [keys_to_track]
        for k in keys_to_track:
            self._access_count[k] = 0

    def get_subkeys_accessed(self, key):
        return self._accessed[key]


orig_path = Path(mkdtemp())
z = zarr.group(orig_path / "foo.zarr")
z['array'] = np.random.randn(1000, 1000)
store = AccessTrackingStore(orig_path / "foo.zarr")
store.set_key_trackers(['array'])
z = zarr.group(store)
arr = xr.backends.zarr.ZarrArrayWrapper(z['array'])
lazy_arr = xr.core.indexing.LazilyIndexedArray(arr)  # just `.zarray`
var = xr.Variable(('x', 'y'), lazy_arr)
print('Variable read in ', store.get_subkeys_accessed('array'))

# now everything is read in
da = xr.DataArray(var)
print('DataArray read in ', store.get_subkeys_accessed('array'))
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8753/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2136724736 PR_kwDOAMm_X85m_MtN 8754 Don't access data when creating DataArray from Variable. dcherian 2448579 closed 0     2 2024-02-15T14:48:32Z 2024-04-04T16:46:54Z 2024-04-04T16:46:53Z MEMBER   0 pydata/xarray/pulls/8754
  • [x] Closes #8753

This seems to have been around since 2016-ish, so presumably our backend code path is passing arrays around, not Variables.

cc @ilan-gold

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8754/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2220470499 I_kwDOAMm_X86EWbDj 8902 opening a zarr dataset taking so much time DarshanSP19 93967637 closed 0     10 2024-04-02T13:01:52Z 2024-04-04T07:49:51Z 2024-04-04T07:49:51Z NONE      

What is your issue?

I have an ERA5 dataset stored in a GCS bucket as zarr. It contains 273 weather-related variables and 4 dimensions, with hourly data from 1940 to 2023. When I try to open it with ds = xr.open_zarr("gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3"), it takes 90 seconds for the open call to finish. The chunk scheme is { 'time': 1 }.
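
Two things that may be worth checking here (assumptions, not a confirmed diagnosis): whether the store is opened with consolidated metadata, and how much of the 90 seconds goes into building dask graphs for the 273 variables. A minimal sketch:

```python
import xarray as xr

path = "gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3"

# Use consolidated metadata explicitly and skip dask chunking entirely,
# to see how much of the open time comes from per-variable graph construction.
ds = xr.open_zarr(path, consolidated=True, chunks=None)
```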

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8902/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2224300175 PR_kwDOAMm_X85rpG4S 8907 Trigger hypothesis stateful tests nightly dcherian 2448579 closed 0     0 2024-04-04T02:16:59Z 2024-04-04T02:17:49Z 2024-04-04T02:17:47Z MEMBER   0 pydata/xarray/pulls/8907
  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8907/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2098659175 PR_kwDOAMm_X85k-T6b 8658 Stateful tests with Dataset dcherian 2448579 closed 0     8 2024-01-24T16:34:59Z 2024-04-03T21:29:38Z 2024-04-03T21:29:36Z MEMBER   0 pydata/xarray/pulls/8658

I was curious to see if the hypothesis stateful testing would catch an inconsistent sequence of index manipulation operations like #8646. Turns out rename_vars is basically broken? (filed #8659) :P

PS: this blog post is amazing.

```
E    state = DatasetStateMachine()
E    state.assert_invariants()
E    > ===
E
E    <xarray.Dataset>
E    Dimensions:  ()
E    Data variables:
E        *empty*
E    ===
E
E
E    > vars: ('1', '1_')
E    state.add_dim_coord(var=<xarray.Variable (1: 1)>
E    array([0], dtype=uint32))
E    state.assert_invariants()
E    > ===
E
E    <xarray.Dataset>
E    Dimensions:  (1: 1)
E    Coordinates:
E      * 1        (1) uint32 0
E    Data variables:
E        1_       (1) uint32 0
E    ===
E
E
E    > renaming 1 to 0
E    state.rename_vars(newname='0')
E    state.assert_invariants()
E    > ===
E
E    <xarray.Dataset>
E    Dimensions:  (1: 1)
E    Coordinates:
E      * 0        (1) uint32 0
E    Dimensions without coordinates: 1
E    Data variables:
E        1_       (1) uint32 0
E    ===
E
E
E    state.teardown()
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8658/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2000205407 PR_kwDOAMm_X85fzupc 8467 [skip-ci] dev whats-new dcherian 2448579 closed 0     0 2023-11-18T03:59:29Z 2024-04-03T21:08:45Z 2023-11-18T15:20:37Z MEMBER   0 pydata/xarray/pulls/8467
  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8467/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1989233637 PR_kwDOAMm_X85fOdAk 8446 Remove PseudoNetCDF dcherian 2448579 closed 0     0 2023-11-12T04:29:50Z 2024-04-03T21:08:44Z 2023-11-13T21:53:56Z MEMBER   0 pydata/xarray/pulls/8446

joining the party

  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8446/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2064698904 PR_kwDOAMm_X85jLHsQ 8584 Silence a bunch of CachingFileManager warnings dcherian 2448579 closed 0     1 2024-01-03T21:57:07Z 2024-04-03T21:08:27Z 2024-01-03T22:52:58Z MEMBER   0 pydata/xarray/pulls/8584  
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8584/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2102850331 PR_kwDOAMm_X85lMW8k 8674 Fix negative slicing of Zarr arrays dcherian 2448579 closed 0     0 2024-01-26T20:22:21Z 2024-04-03T21:08:26Z 2024-02-10T02:57:32Z MEMBER   0 pydata/xarray/pulls/8674

Closes #8252 Closes #3921

  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8674/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2148245262 PR_kwDOAMm_X85nmmqX 8777 Return a dataclass from Grouper.factorize dcherian 2448579 closed 0     0 2024-02-22T05:41:29Z 2024-04-03T21:08:25Z 2024-03-15T04:47:30Z MEMBER   0 pydata/xarray/pulls/8777

Toward #8510, builds on #8776

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8777/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2148164557 PR_kwDOAMm_X85nmU5w 8775 [skip-ci] NamedArray: Add lazy indexing array refactoring plan dcherian 2448579 closed 0     0 2024-02-22T04:25:49Z 2024-04-03T21:08:21Z 2024-02-23T22:20:09Z MEMBER   0 pydata/xarray/pulls/8775

This adds a proposal for decoupling the lazy indexing array machinery, indexing adapter machinery, and Variable's setitem and getitem methods, so that the latter can be migrated to NamedArray.

cc @andersy005

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8775/reactions",
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 2,
    "eyes": 0
}
    xarray 13221727 pull
2198991054 PR_kwDOAMm_X85qTNFP 8861 upstream-dev CI: Fix interp and cumtrapz dcherian 2448579 closed 0     0 2024-03-21T02:49:40Z 2024-04-03T21:08:17Z 2024-03-21T04:16:45Z MEMBER   0 pydata/xarray/pulls/8861
  • [x] xref #8844
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8861/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2206243581 I_kwDOAMm_X86DgJr9 8876 Possible race condition when appending to an existing zarr rsemlal-murmuration 157591329 closed 0     4 2024-03-25T16:59:52Z 2024-04-03T15:23:14Z 2024-03-29T14:35:52Z NONE      

What happened?

When appending to an existing zarr along a dimension (to_zarr(..., mode='a', append_dim="x" ,..)), if the dask chunking of the dataset to append does not align with the chunking of the existing zarr, the resulting consolidated zarr store may have NaNs instead of the actual values it is supposed to have.

What did you expect to happen?

We would expect the zarr append to behave the same as if we concatenated the datasets in memory (using concat) and wrote the whole result to a new zarr store in one go.

Minimal Complete Verifiable Example

```Python
from distributed import Client, LocalCluster
import xarray as xr
import tempfile

ds1 = xr.Dataset({"a": ("x", [1., 1.])}, coords={'x': [1, 2]}).chunk({"x": 3})
ds2 = xr.Dataset({"a": ("x", [1., 1., 1., 1.])}, coords={'x': [3, 4, 5, 6]}).chunk({"x": 3})

with Client(LocalCluster(processes=False, n_workers=1, threads_per_worker=2)):
    # The issue happens only when: threads_per_worker > 1
    for i in range(0, 100):
        with tempfile.TemporaryDirectory() as store:
            print(store)
            ds1.to_zarr(store, mode="w")                  # write first dataset
            ds2.to_zarr(store, mode="a", append_dim="x")  # append second dataset

            rez = xr.open_zarr(store).compute()  # open consolidated dataset
            nb_values = rez.a.count().item(0)    # count non NaN values
            if nb_values != 6:
                print("found NaNs:")
                print(rez.to_dataframe())
                break
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```
/tmp/tmptg_pe6ox
/tmp/tmpm7ncmuxd
/tmp/tmpiqcgoiw2
/tmp/tmppma1ieo7
/tmp/tmpw5vi4cf0
/tmp/tmp1rmgwju0
/tmp/tmpm6tfswzi
found NaNs:
     a
x
1  1.0
2  1.0
3  1.0
4  1.0
5  1.0
6  NaN
```

Anything else we need to know?

The example code snippet provided here, reproduces the issue.

Since the issue occurs randomly, we loop in the example for a few times and stop when the issue occurs.

In the example, when ds1 is first written, since it only contains 2 values along the x dimension, the resulting .zarr store has the chunking {'x': 2}, even though we called .chunk({"x": 3}).

Side note: This behaviour in itself is not problematic in this case, but the fact that the chunking is silently changed made this issue harder to spot.

However, when we try to append the second dataset ds2, which contains 4 values, the .chunk({"x": 3}) at the beginning splits the dask array into 2 dask chunks, but in a way that does not align with the zarr chunks.

Zarr chunks:
  • chunk1: x: [1; 2]
  • chunk2: x: [3; 4]
  • chunk3: x: [5; 6]

Dask chunks for ds2:
  • chunk A: x: [3; 4; 5]
  • chunk B: x: [6]

Both dask chunks A and B are supposed to write to zarr chunk3, and depending on which writes first, we can end up with NaN at x = 5 or x = 6 instead of the actual values.

The issue obviously happens only when dask tasks are run in parallel. Using safe_chunks = True when calling to_zarr does not seem to help.

We couldn't figure out from the documentation how to detect this kind of issue, or how to prevent it from happening (maybe using a synchronizer?).
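
One mitigation we can sketch (an assumption on our side, not a verified fix): rechunk the dataset being appended so that its dask chunks line up with the chunk size already on disk, so each zarr chunk is written by exactly one task:

```python
import zarr

# Hypothetical mitigation: align dask chunks with the on-disk zarr chunks
# before appending, so no two dask tasks write into the same zarr chunk.
on_disk_chunk = zarr.open(store)["a"].chunks[0]  # 2 in this example
ds2.chunk({"x": on_disk_chunk}).to_zarr(store, mode="a", append_dim="x")
```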

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0] python-bits: 64 OS: Linux OS-release: 5.15.133.1-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.3-development xarray: 2024.2.0 pandas: 2.2.1 numpy: 1.26.4 scipy: 1.12.0 netCDF4: 1.6.5 pydap: None h5netcdf: 1.3.0 h5py: 3.10.0 Nio: None zarr: 2.17.1 cftime: 1.6.3 nc_time_axis: 1.4.1 iris: None bottleneck: 1.3.8 dask: 2024.3.1 distributed: 2024.3.1 matplotlib: 3.8.3 cartopy: None seaborn: 0.13.2 numbagg: 0.8.1 fsspec: 2024.3.1 cupy: None pint: None sparse: None flox: 0.9.5 numpy_groupies: 0.10.2 setuptools: 69.2.0 pip: 24.0 conda: None pytest: 8.1.1 mypy: None IPython: 8.22.2 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8876/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2171912634 PR_kwDOAMm_X85o3Ify 8809 Pass variable name to `encode_zarr_variable` slevang 39069044 closed 0     6 2024-03-06T16:21:53Z 2024-04-03T14:26:49Z 2024-04-03T14:26:48Z CONTRIBUTOR   0 pydata/xarray/pulls/8809
  • [x] Closes https://github.com/xarray-contrib/xeofs/issues/148
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

The change from https://github.com/pydata/xarray/pull/8672 mostly fixed the issue of serializing a reset multiindex in the backends, but there was an additional niche issue that turned up in xeofs that was causing serialization to still fail on the zarr backend.

The issue is that zarr is the only backend that uses a custom version of encode_cf_variable called encode_zarr_variable, and, the way this gets called, we don't pass the variable's name through before running ensure_not_multiindex.

As a minimal fix, this PR just passes name through as an additional arg to the general encode_variable function. See @benbovy's comment that maybe we should actually unwrap the level coordinate in reset_index and clean up the checks in ensure_not_multiindex, but I wasn't able to get that working easily.

The exact workflow this turned up in involves DataTree and looks like this:

```python
import numpy as np
import xarray as xr
from datatree import DataTree

# ND DataArray that gets stacked along a multiindex
da = xr.DataArray(np.ones((3, 3)), coords={"dim1": [1, 2, 3], "dim2": [4, 5, 6]})
da = da.stack(feature=["dim1", "dim2"])

# Extract just the stacked coordinates for saving in a dataset
ds = xr.Dataset(data_vars={"feature": da.feature})

# Reset the multiindex, which should make things serializable
ds = ds.reset_index("feature")
dt1 = DataTree()
dt2 = DataTree(name="feature", data=ds)
dt1["foo"] = dt2

# Somehow in this step, dt1.foo.feature.dim1.variable becomes an IndexVariable again
print(type(dt1.foo.feature.dim1.variable))

# Works
dt1.to_netcdf("test.nc", mode="w")

# Fails
dt1.to_zarr("test.zarr", mode="w")
```

But we can reproduce in xarray with the test added here.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8809/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2220487961 PR_kwDOAMm_X85rb6ea 8903 Update docstring for compute and persist saschahofmann 24508496 closed 0     2 2024-04-02T13:10:02Z 2024-04-03T07:45:10Z 2024-04-02T23:52:32Z CONTRIBUTOR   0 pydata/xarray/pulls/8903
  • Updates the docstring for persist to mention that it is not altering the original object.
  • Adds a return value to the docstring for compute and persist on both Dataset and DataArray

  • [x] Closes #8901

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8903/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2220228856 I_kwDOAMm_X86EVgD4 8901 Is .persist in place or like .compute? saschahofmann 24508496 closed 0     3 2024-04-02T11:09:59Z 2024-04-02T23:52:33Z 2024-04-02T23:52:33Z CONTRIBUTOR      

What is your issue?

I am playing around with Dataset.persist and assumed it would work like .load. Having just looked at the source code, it looks to me like it should indeed replace the original data, but I can see both in performance and in the dask dashboard that steps are recomputed if I don't use the object returned by .persist, which points me towards .persist behaving more like .compute.

In either case, I would make a PR to clarify in the docs whether persist leaves the original data untouched or not.
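
For reference, a tiny sketch of the usage pattern that avoids the recomputation described above (assuming, as concluded here, that persist returns a new object rather than modifying the original in place):

```python
import dask.array as da
import xarray as xr

ds = xr.Dataset({"a": ("x", da.zeros(1_000_000, chunks=100_000))})

# Like .compute(), .persist() returns a new object; keep the returned
# dataset, otherwise later operations recompute from the original graph.
ds = ds.persist()
```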

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8901/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2218195713 PR_kwDOAMm_X85rUCVG 8898 Update reference to 'Weighted quantile estimators' AndreyAkinshin 2259237 closed 0     3 2024-04-01T12:49:36Z 2024-04-02T12:51:28Z 2024-04-01T15:42:19Z CONTRIBUTOR   0 pydata/xarray/pulls/8898  
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8898/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2215890029 I_kwDOAMm_X86EE8xt 8894 Rolling reduction with a custom function generates an excesive use of memory that kills the workers josephnowak 25071375 closed 0     8 2024-03-29T19:15:28Z 2024-04-01T20:57:59Z 2024-03-30T01:49:17Z CONTRIBUTOR      

What happened?

Hi, I have been trying to use a custom function with the rolling reduction method. The original function tries to filter the NaN values (any numpy function I have used that handles NaNs generates the same problem) and then apply some simple aggregate functions, but it is killing all my workers even though the data is very small (I have 7 workers, each with 3 GB of RAM).

What did you expect to happen?

I would expect lower memory use, taking into account the size of the rolling window, the simplicity of the function, and the amount of data used in the example.

Minimal Complete Verifiable Example

```Python
import numpy as np
import dask.array as da
import xarray as xr
import dask


def f(x, axis):
    # If I replace np.nansum by np.sum everything works perfectly
    # and the amount of memory used is very small
    return np.nansum(x, axis=axis)


arr = xr.DataArray(
    dask.array.zeros(
        shape=(300, 30000),
        dtype=float,
        chunks=(30, 6000)
    ),
    dims=["a", "b"],
    coords={"a": list(range(300)), "b": list(range(30000))}
)

arr.rolling(a=252).reduce(f).chunk({"a": 252}).to_zarr("/data/test/test_write", mode="w")
```
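
As a point of comparison, a sketch of the same reduction without a custom callable (under the assumption that xarray's own skipna-aware reductions are better optimized than a generic reduce; this is not a verified fix for the memory blow-up, and edge handling of incomplete windows may differ):

```python
# NaN-skipping rolling sum expressed via rolling.construct(), which exposes
# the window as a new dimension and lets xarray's built-in sum handle NaNs.
rolled = arr.rolling(a=252).construct("window").sum("window", skipna=True)
```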

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```
KilledWorker: Attempted to run task ('nansum-overlap-sum-aggregate-sum-aggregate-e732de6ad917d5f4084b05192ca671c4', 0, 0) on 4 different workers, but all those workers died while running it. The last worker that attempt to run the task was tcp://172.18.0.2:39937. Inspecting worker logs is often a good next step to diagnose what went wrong. For more information see https://distributed.dask.org/en/stable/killed.html.
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:43:09) [GCC 12.3.0] python-bits: 64 OS: Linux OS-release: 4.14.275-207.503.amzn2.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.14.3 libnetcdf: None xarray: 2024.1.0 pandas: 2.2.1 numpy: 1.26.3 scipy: 1.11.4 netCDF4: None pydap: None h5netcdf: None h5py: 3.10.0 Nio: None zarr: 2.16.1 cftime: None nc_time_axis: None iris: None bottleneck: 1.3.7 dask: 2024.1.0 distributed: 2024.1.0 matplotlib: 3.8.2 cartopy: None seaborn: 0.13.1 numbagg: 0.7.0 fsspec: 2023.12.2 cupy: None pint: None sparse: 0.15.1 flox: 0.8.9 numpy_groupies: 0.10.2 setuptools: 69.0.3 pip: 23.3.2 conda: 23.11.0 pytest: 7.4.4 mypy: None IPython: 8.20.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8894/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2218712684 PR_kwDOAMm_X85rV1i2 8900 [pre-commit.ci] pre-commit autoupdate pre-commit-ci[bot] 66853113 closed 0     0 2024-04-01T17:27:26Z 2024-04-01T18:57:43Z 2024-04-01T18:57:42Z CONTRIBUTOR   0 pydata/xarray/pulls/8900

updates:
  • github.com/astral-sh/ruff-pre-commit: v0.2.0 → v0.3.4
  • github.com/psf/black-pre-commit-mirror: 24.1.1 → 24.3.0
  • github.com/pre-commit/mirrors-mypy: v1.8.0 → v1.9.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8900/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2217657228 PR_kwDOAMm_X85rSLKy 8896 Bump the actions group with 1 update dependabot[bot] 49699333 closed 0     0 2024-04-01T06:50:24Z 2024-04-01T18:02:56Z 2024-04-01T18:02:56Z CONTRIBUTOR   0 pydata/xarray/pulls/8896

Bumps the actions group with 1 update: codecov/codecov-action.

Updates codecov/codecov-action from 4.1.0 to 4.1.1

Release notes

Sourced from codecov/codecov-action's releases.

v4.1.1

What's Changed

  • build(deps): bump github/codeql-action from 3.24.5 to 3.24.6 by @​dependabot in codecov/codecov-action#1315
  • build(deps-dev): bump typescript from 5.3.3 to 5.4.2 by @​dependabot in codecov/codecov-action#1319
  • Removed mention of Mercurial by @​drazisil-codecov in codecov/codecov-action#1325
  • build(deps): bump github/codeql-action from 3.24.6 to 3.24.7 by @​dependabot in codecov/codecov-action#1332
  • build(deps): bump actions/checkout from 4.1.1 to 4.1.2 by @​dependabot in codecov/codecov-action#1331
  • fix: force version by @​thomasrockhu-codecov in codecov/codecov-action#1329
  • build(deps-dev): bump typescript from 5.4.2 to 5.4.3 by @​dependabot in codecov/codecov-action#1334
  • build(deps): bump undici from 5.28.2 to 5.28.3 by @​dependabot in codecov/codecov-action#1338
  • build(deps): bump github/codeql-action from 3.24.7 to 3.24.9 by @​dependabot in codecov/codecov-action#1341
  • fix: typo in disable_safe_directory by @​mkroening in codecov/codecov-action#1343
  • chore(release): 4.1.1 by @​thomasrockhu-codecov in codecov/codecov-action#1344

New Contributors

  • @​mkroening made their first contribution in codecov/codecov-action#1343

Full Changelog: https://github.com/codecov/codecov-action/compare/v4.1.0...v4.1.1

Changelog

Sourced from codecov/codecov-action's changelog.

4.0.0-beta.2

Fixes

  • #1085 not adding -n if empty to do-upload command

4.0.0-beta.1

v4 represents a move from the universal uploader to the Codecov CLI. Although this will unlock new features for our users, the CLI is not yet at feature parity with the universal uploader.

Breaking Changes

  • No current support for aarch64 and alpine architectures.
  • Tokenless uploading is unsuported
  • Various arguments to the Action have been removed

3.1.4

Fixes

  • #967 Fix typo in README.md
  • #971 fix: add back in working dir
  • #969 fix: CLI option names for uploader

Dependencies

  • #970 build(deps-dev): bump @​types/node from 18.15.12 to 18.16.3
  • #979 build(deps-dev): bump @​types/node from 20.1.0 to 20.1.2
  • #981 build(deps-dev): bump @​types/node from 20.1.2 to 20.1.4

3.1.3

Fixes

  • #960 fix: allow for aarch64 build

Dependencies

  • #957 build(deps-dev): bump jest-junit from 15.0.0 to 16.0.0
  • #958 build(deps): bump openpgp from 5.7.0 to 5.8.0
  • #959 build(deps-dev): bump @​types/node from 18.15.10 to 18.15.12

3.1.2

Fixes

  • #718 Update README.md
  • #851 Remove unsupported path_to_write_report argument
  • #898 codeql-analysis.yml
  • #901 Update README to contain correct information - inputs and negate feature
  • #955 fix: add in all the extra arguments for uploader

Dependencies

  • #819 build(deps): bump openpgp from 5.4.0 to 5.5.0
  • #835 build(deps): bump node-fetch from 3.2.4 to 3.2.10
  • #840 build(deps): bump ossf/scorecard-action from 1.1.1 to 2.0.4
  • #841 build(deps): bump @​actions/core from 1.9.1 to 1.10.0
  • #843 build(deps): bump @​actions/github from 5.0.3 to 5.1.1
  • #869 build(deps): bump node-fetch from 3.2.10 to 3.3.0
  • #872 build(deps-dev): bump jest-junit from 13.2.0 to 15.0.0
  • #879 build(deps): bump decode-uri-component from 0.2.0 to 0.2.2

... (truncated)

Commits
  • c16abc2 chore(release): 4.1.1 (#1344)
  • 3e33441 fix: typo in disable_safe_directory (#1343)
  • 85aacc9 build(deps): bump github/codeql-action from 3.24.7 to 3.24.9 (#1341)
  • 4ea9be0 build(deps): bump undici from 5.28.2 to 5.28.3 (#1338)
  • 164fade build(deps-dev): bump typescript from 5.4.2 to 5.4.3 (#1334)
  • 4621ecc fix: force version (#1329)
  • 251ba34 build(deps): bump actions/checkout from 4.1.1 to 4.1.2 (#1331)
  • 5a593a5 build(deps): bump github/codeql-action from 3.24.6 to 3.24.7 (#1332)
  • a15c0e4 Removed mention of Mercurial (#1325)
  • 8be6ba5 build(deps-dev): bump typescript from 5.3.3 to 5.4.2 (#1319)
  • Additional commits viewable in compare view


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore <dependency name> major version` will close this group update PR and stop Dependabot creating any more for the specific dependency's major version (unless you unignore this specific dependency's major version or upgrade to it yourself) - `@dependabot ignore <dependency name> minor version` will close this group update PR and stop Dependabot creating any more for the specific dependency's minor version (unless you unignore this specific dependency's minor version or upgrade to it yourself) - `@dependabot ignore <dependency name>` will close this group update PR and stop Dependabot creating any more for the specific dependency (unless you unignore this specific dependency or upgrade to it yourself) - `@dependabot unignore <dependency name>` will remove all of the ignore conditions of the specified dependency - `@dependabot unignore <dependency name> <ignore condition>` will remove the ignore condition of the specified dependency and ignore conditions
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8896/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2218574880 PR_kwDOAMm_X85rVXJC 8899 New empty whatsnew entry TomNicholas 35968931 closed 0     0 2024-04-01T16:04:27Z 2024-04-01T17:49:09Z 2024-04-01T17:49:06Z MEMBER   0 pydata/xarray/pulls/8899

Should have been done as part of the last release https://github.com/pydata/xarray/releases/tag/v2024.03.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8899/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2215539648 PR_kwDOAMm_X85rLW_p 8891 2024.03.0: Add whats-new dcherian 2448579 closed 0     0 2024-03-29T15:01:35Z 2024-03-29T17:07:19Z 2024-03-29T17:07:17Z MEMBER   0 pydata/xarray/pulls/8891  
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8891/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2215324218 PR_kwDOAMm_X85rKmW7 8890 Add typing to test_groupby.py Illviljan 14371165 closed 0     1 2024-03-29T13:13:59Z 2024-03-29T16:38:17Z 2024-03-29T16:38:16Z MEMBER   0 pydata/xarray/pulls/8890

Enforce typing on all tests in test_groupby.py and add the remaining type hints.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8890/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2203250238 PR_kwDOAMm_X85qh2s8 8867 Avoid in-place multiplication of a large value to an array with small integer dtype Illviljan 14371165 closed 0     3 2024-03-22T20:22:22Z 2024-03-29T15:26:38Z 2024-03-29T15:26:38Z MEMBER   0 pydata/xarray/pulls/8867

Upstream numpy has become a bit more particular about which types you can use for in-place operations. This PR fixes:
```
__ TestImshow.test_imshow_rgb_values_in_valid_range __

self = <xarray.tests.test_plot.TestImshow object at 0x7f88320c2780>

def test_imshow_rgb_values_in_valid_range(self) -> None:
    da = DataArray(np.arange(75, dtype="uint8").reshape((5, 5, 3)))
    _, ax = plt.subplots()
  out = da.plot.imshow(ax=ax).get_array()

/home/runner/work/xarray/xarray/xarray/tests/test_plot.py:2034:


/home/runner/work/xarray/xarray/xarray/plot/accessor.py:421: in imshow
    return dataarray_plot.imshow(self._da, *args, **kwargs)
/home/runner/work/xarray/xarray/xarray/plot/dataarray_plot.py:1601: in newplotfunc
    primitive = plotfunc(
/home/runner/work/xarray/xarray/xarray/plot/dataarray_plot.py:1853: in imshow
    alpha *= 255


self = masked_array( data=[[[1], [1], [1], [1], [1]],

    [[1],
     [1],
...,
     [1],
     [1],
     [1],
     [1]]],

mask=False, fill_value=np.int64(999999), dtype=uint8) other = 255

def __imul__(self, other):
    """
    Multiply self by other in-place.

    """
    m = getmask(other)
    if self._mask is nomask:
        if m is not nomask and m.any():
            self._mask = make_mask_none(self.shape, self.dtype)
            self._mask += m
    elif m is not nomask:
        self._mask += m
    other_data = getdata(other)
    other_data = np.where(self._mask, other_data.dtype.type(1), other_data)
  self._data.__imul__(other_data)

E numpy._core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'multiply' output from dtype('int64') to dtype('uint8') with casting rule 'same_kind'

/home/runner/micromamba/envs/xarray-tests/lib/python3.12/site-packages/numpy/ma/core.py:4415: UFuncTypeError
```

Some curious behaviors seen while debugging:

```python
alpha = np.array([1], dtype=np.int8)
alpha *= 255
repr(alpha)  # 'array([-1], dtype=int8)'

alpha = np.array([1], dtype=np.int16)
alpha *= 255
repr(alpha)  # 'array([255], dtype=int16)'
```

xref: #8844

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8867/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);