Error while saving an altered dataset to NetCDF when loaded from a file

  • issue: #8694 (pydata/xarray)
  • id: 2115049090
  • node_id: I_kwDOAMm_X85-ERaC
  • user: tarik (12544636)
  • state: open
  • comments: 4
  • created_at: 2024-02-02T14:18:03Z
  • updated_at: 2024-02-07T13:38:40Z
  • author_association: NONE

What happened?

When saving an altered Xarray dataset to a NetCDF file with the to_netcdf method, an error occurs if the original dataset was loaded from a file. The error does not occur when the dataset is created directly in memory; it happens only when the dataset was first loaded from a file.

What did you expect to happen?

The altered Xarray dataset should be saved to a NetCDF file by the to_netcdf method without error.

Minimal Complete Verifiable Example

```Python
import xarray as xr

ds = xr.Dataset(
    data_vars=dict(
        win_1=("attempt", [True, False, True, False, False, True]),
        win_2=("attempt", [False, True, False, True, False, False]),
    ),
    coords=dict(
        attempt=[1, 2, 3, 4, 5, 6],
        player_1=("attempt", ["paper", "paper", "scissors", "scissors", "paper", "paper"]),
        player_2=("attempt", ["rock", "scissors", "paper", "rock", "paper", "rock"]),
    ),
)
ds.to_netcdf("dataset.nc")

ds_from_file = xr.load_dataset("dataset.nc")

ds_altered = ds_from_file.where(ds_from_file["player_1"] == "paper", drop=True)
ds_altered.to_netcdf("dataset_altered.nc")
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python
Traceback (most recent call last):
  File "example.py", line 20, in <module>
    ds_altered.to_netcdf("dataset_altered.nc")
  File ".../python3.9/site-packages/xarray/core/dataset.py", line 2303, in to_netcdf
    return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
  File ".../python3.9/site-packages/xarray/backends/api.py", line 1315, in to_netcdf
    dump_to_store(
  File ".../python3.9/site-packages/xarray/backends/api.py", line 1362, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File ".../python3.9/site-packages/xarray/backends/common.py", line 356, in store
    self.set_variables(
  File ".../python3.9/site-packages/xarray/backends/common.py", line 398, in set_variables
    writer.add(source, target)
  File ".../python3.9/site-packages/xarray/backends/common.py", line 243, in add
    target[...] = source
  File ".../python3.9/site-packages/xarray/backends/scipy_.py", line 78, in __setitem__
    data[key] = value
  File ".../python3.9/site-packages/scipy/io/_netcdf.py", line 1019, in __setitem__
    self.data[index] = data
ValueError: could not broadcast input array from shape (4,5) into shape (4,8)
```

Anything else we need to know?

Findings:

The issue is caused by the dataset's encoding information becoming stale after the data is filtered with the where method: to_netcdf uses the stored encoding rather than the actual shape of the data.

In the provided example, the maximum length of the strings stored in "player_1" and "player_2" is originally 8 characters. After filtering with the where method, the maximum string length becomes 5 in "player_1" and remains 8 in "player_2", yet the encoding of both variables still records a length of 8, particularly via the char_dim_name attribute (see the sketch below).
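
A minimal sketch of how to observe this, reusing the "dataset.nc" file produced by the MVCE above (the exact encoding keys depend on the backend):

```Python
import xarray as xr

ds_from_file = xr.load_dataset("dataset.nc")
ds_altered = ds_from_file.where(ds_from_file["player_1"] == "paper", drop=True)

# After filtering, the longest string left in player_1 is "paper" (5 characters)...
print(ds_altered["player_1"].values)
# ...but the encoding inherited from the file still describes the original
# 8-character on-disk layout (e.g. the char_dim_name entry mentioned above).
print(ds_altered["player_1"].encoding)
```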

Workaround:

A workaround is to call the drop_encoding method on the dataset before saving it with to_netcdf. This removes the stale encoding information, forcing to_netcdf to derive the on-disk shapes from the data itself and preventing the broadcasting error.
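
A minimal sketch of the workaround, again reusing the objects from the MVCE above (note that drop_encoding returns a new dataset rather than modifying in place):

```Python
import xarray as xr

ds_from_file = xr.load_dataset("dataset.nc")
ds_altered = ds_from_file.where(ds_from_file["player_1"] == "paper", drop=True)

# drop_encoding returns a copy with all encoding information removed, so
# to_netcdf must derive the on-disk string lengths from the data itself.
ds_altered.drop_encoding().to_netcdf("dataset_altered.nc")
```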

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.14 (main, Aug 24 2023, 14:01:46) [GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 6.3.1-060301-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.1.1
pandas: 2.2.0
numpy: 1.26.3
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.0.3
pip: 23.3.2
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
