Error while saving an altered dataset to NetCDF when loaded from a file

  • issue: #8694 (pydata/xarray)
  • id: 2115049090
  • node_id: I_kwDOAMm_X85-ERaC
  • user: tarik (12544636)
  • state: open
  • comments: 4
  • created_at: 2024-02-02T14:18:03Z
  • updated_at: 2024-02-07T13:38:40Z
  • author_association: NONE

What happened?

When saving an altered Xarray dataset to a NetCDF file with the to_netcdf method, an error occurs if the original dataset was loaded from a file. The error does not occur when the dataset is created directly in memory; it happens only when the dataset was first loaded from a file.

What did you expect to happen?

The altered Xarray dataset should be saved to a NetCDF file by the to_netcdf method without error.

Minimal Complete Verifiable Example

```Python
import xarray as xr

ds = xr.Dataset(
    data_vars=dict(
        win_1=("attempt", [True, False, True, False, False, True]),
        win_2=("attempt", [False, True, False, True, False, False]),
    ),
    coords=dict(
        attempt=[1, 2, 3, 4, 5, 6],
        player_1=("attempt", ["paper", "paper", "scissors", "scissors", "paper", "paper"]),
        player_2=("attempt", ["rock", "scissors", "paper", "rock", "paper", "rock"]),
    ),
)
ds.to_netcdf("dataset.nc")

ds_from_file = xr.load_dataset("dataset.nc")

ds_altered = ds_from_file.where(ds_from_file["player_1"] == "paper", drop=True)
ds_altered.to_netcdf("dataset_altered.nc")
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python
Traceback (most recent call last):
  File "example.py", line 20, in <module>
    ds_altered.to_netcdf("dataset_altered.nc")
  File ".../python3.9/site-packages/xarray/core/dataset.py", line 2303, in to_netcdf
    return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
  File ".../python3.9/site-packages/xarray/backends/api.py", line 1315, in to_netcdf
    dump_to_store(
  File ".../python3.9/site-packages/xarray/backends/api.py", line 1362, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File ".../python3.9/site-packages/xarray/backends/common.py", line 356, in store
    self.set_variables(
  File ".../python3.9/site-packages/xarray/backends/common.py", line 398, in set_variables
    writer.add(source, target)
  File ".../python3.9/site-packages/xarray/backends/common.py", line 243, in add
    target[...] = source
  File ".../python3.9/site-packages/xarray/backends/scipy_.py", line 78, in __setitem__
    data[key] = value
  File ".../python3.9/site-packages/scipy/io/_netcdf.py", line 1019, in __setitem__
    self.data[index] = data
ValueError: could not broadcast input array from shape (4,5) into shape (4,8)
```

Anything else we need to know?

Findings:

The issue is caused by the dataset's encoding information becoming stale after the data is filtered with the where method: to_netcdf uses the stored encoding rather than the actual shape of the data.

In the provided example, the maximum length of the strings stored in "player_1" and "player_2" is originally 8 characters. After filtering with the where method, the maximum string length becomes 5 in "player_1" and remains 8 in "player_2", yet the encoding of both variables still records a length of 8, particularly via the char_dim_name attribute (see the sketch below).
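
A minimal sketch of how to observe this, reusing the "dataset.nc" file produced by the MVCE above (the exact encoding keys depend on the backend):

```Python
import xarray as xr

ds_from_file = xr.load_dataset("dataset.nc")
ds_altered = ds_from_file.where(ds_from_file["player_1"] == "paper", drop=True)

# After filtering, the longest string left in player_1 is "paper" (5 characters)...
print(ds_altered["player_1"].values)
# ...but the encoding inherited from the file still describes the original
# 8-character on-disk layout (e.g. the char_dim_name entry mentioned above).
print(ds_altered["player_1"].encoding)
```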

Workaround:

A workaround is to call the drop_encoding method on the dataset before saving it with to_netcdf. This removes the stale encoding information, forcing to_netcdf to derive the on-disk shapes from the data itself and preventing the broadcasting error.
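
A minimal sketch of the workaround, again reusing the objects from the MVCE above (note that drop_encoding returns a new dataset rather than modifying in place):

```Python
import xarray as xr

ds_from_file = xr.load_dataset("dataset.nc")
ds_altered = ds_from_file.where(ds_from_file["player_1"] == "paper", drop=True)

# drop_encoding returns a copy with all encoding information removed, so
# to_netcdf must derive the on-disk string lengths from the data itself.
ds_altered.drop_encoding().to_netcdf("dataset_altered.nc")
```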

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.14 (main, Aug 24 2023, 14:01:46) [GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 6.3.1-060301-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.1.1
pandas: 2.2.0
numpy: 1.26.3
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.0.3
pip: 23.3.2
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
