issues
2 rows where repo = 13221727, state = "open" and user = 6574622 sorted by updated_at descending

Issue #5475: Is `_FillValue` really the same as zarr's `fill_value`?

Opened by d70-t (user 6574622) · state: open · 2 comments · created 2021-06-16T16:03:21Z · updated 2024-04-02T08:17:23Z · author association: CONTRIBUTOR

The zarr backend uses the fill_value of zarr's .zarray key as if it were the _FillValue according to the CF conventions:

https://github.com/pydata/xarray/blob/1a7b285be676d5404a4140fc86e8756de75ee7ac/xarray/backends/zarr.py#L373

I think this interpretation of the fill_value is wrong and creates problems. Here's why:

The zarr v2 spec is still a little vague, but states that fill_value is

> A scalar value providing the default value to use for uninitialized portions of the array, or null if no fill_value is to be used.

Accordingly, all areas of a variable which are not backed by a stored chunk should be filled with this value on read. This is different from what the CF conventions state (emphasis mine):
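To make that semantics concrete, here is a minimal sketch in plain NumPy (the `read_chunk` helper and dict store are hypothetical illustrations, not the zarr API): chunks that were never written are not stored at all, and reads materialize them from the background fill_value.

```python
import numpy as np

# Hypothetical sketch of zarr-style fill_value semantics: a chunk that was
# never written is simply absent from the store; reading it yields the
# background fill_value instead.
def read_chunk(store: dict, key: tuple, chunk_shape, dtype, fill_value):
    if key in store:                                  # chunk was written
        return store[key]
    return np.full(chunk_shape, fill_value, dtype)    # uninitialized chunk

store = {(0,): np.array([1.0, 2.0])}                  # only chunk 0 exists
a = np.concatenate([read_chunk(store, (i,), (2,), "f8", 0.0) for i in range(2)])
print(a)  # [1. 2. 0. 0.] -- the trailing zeros are valid data, not "missing"
```

Note that nothing here marks the filled-in zeros as invalid; they are ordinary values that happen not to be stored.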

> The scalar attribute with the name _FillValue and of the same type as its variable is recognized by the netCDF library as the value used to pre-fill disk space allocated to the variable. This value is considered to be a special value that indicates undefined or missing data, and is returned when reading values that were not written.

The difference between the two is that fill_value is only a background value which just isn't stored as a chunk, whereas _FillValue is (possibly) a background value and is additionally interpreted as not being valid data. In my opinion, this conflation of _FillValue and missing_value could be considered a defect in the CF conventions, but it is probably far too late to change, as many applications depend on this behaviour.

Thinking of an example: when storing a density field (i.e. water droplets forming clouds) in a zarr dataset, it might be perfectly valid to set the fill_value to 0 and then store chunks only in regions of the atmosphere where clouds are actually present. In that case, 0 (i.e. no drops) would be a perfectly valid value which just isn't stored. As most parts of the atmosphere are indeed cloud-free, this may save quite a bit of storage. Other formats (e.g. OpenVDB) commonly use this trick.
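The droplet example can be sketched as follows (hypothetical chunking code, not zarr itself): only the chunk that actually contains cloud is stored, while all-background chunks are skipped entirely.

```python
import numpy as np

# Hypothetical sketch: a mostly-zero density field split into 4x4 chunks.
# Chunks equal to the background value (fill_value=0) are never stored.
field = np.zeros((8, 8))
field[2:4, 2:4] = 0.5          # one small cloud

chunks = {}
for i in range(0, 8, 4):
    for j in range(0, 8, 4):
        block = field[i:i + 4, j:j + 4]
        if np.any(block != 0.0):     # skip all-background chunks
            chunks[(i, j)] = block

print(len(chunks))  # 1 -- only 1 of 4 chunks needs to be stored
```

Reading the missing chunks back as zeros reproduces the field exactly, which is why treating 0 as "missing" here would be wrong.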


The issue gets worse when looking at the upcoming zarr v3 spec, where fill_value is described as:

> Provides an element value to use for uninitialised portions of the Zarr array.
>
> If the data type of the Zarr array is Boolean then the value must be the literal false or true. If the data type is one of the integer data types defined in this specification, then the value must be a number with no fraction or exponent part and must be within the range of the data type.
>
> For any data type, if the fill_value is the literal null then the fill value is undefined and the implementation may use any arbitrary value that is consistent with the data type as the fill value.
>
> [...]

Thus for boolean arrays, if the fill_value were interpreted as a missing-value indicator, only (missing, True) or (False, missing) arrays could be represented; a (False, True) array would not be possible. The issue applies similarly to the integer types.

Reactions: 👍 2
Issue #6329: `to_zarr` with append or region mode and `_FillValue` doesn't work

Opened by d70-t (user 6574622) · state: open · 17 comments · created 2022-03-04T18:21:32Z · updated 2023-03-17T16:14:30Z · author association: CONTRIBUTOR

What happened?

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": ("x", [3.], {"_FillValue": np.nan})})
m = {}
ds.to_zarr(m)
ds.to_zarr(m, append_dim="x")
```

raises `ValueError: failed to prevent overwriting existing key _FillValue in attrs. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.`

What did you expect to happen?

I'd expect this to just work (effectively concatenating the dataset to itself).

Anything else we need to know?

appears also for region writes

The same issue appears for region writes:

```python
import numpy as np
import dask.array as da
import xarray as xr

ds = xr.Dataset({"a": ("x", da.array([3., 4.]), {"_FillValue": np.nan})})
m = {}
ds.to_zarr(m, compute=False, encoding={"a": {"chunks": (1,)}})
ds.isel(x=slice(0, 1)).to_zarr(m, region={"x": slice(0, 1)})
```

raises

ValueError: failed to prevent overwriting existing key _FillValue in attrs. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.
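The failure mode can be sketched as follows (hypothetical guard logic that only illustrates the behaviour the error message describes, not xarray's actual implementation): the first write persists `_FillValue` into the store's attributes, so the second write finds the key already present and refuses to overwrite it.

```python
def check_safe_attrs(existing_attrs: dict, new_encoding: dict) -> None:
    # Hypothetical sketch: refuse to write encoding keys (e.g. the encoded
    # _FillValue) that an earlier to_zarr call already left in the attrs.
    for key in new_encoding:
        if key in existing_attrs:
            raise ValueError(
                f"failed to prevent overwriting existing key {key} in attrs."
            )

existing = {"_FillValue": float("nan")}   # left behind by the first write
try:
    check_safe_attrs(existing, {"_FillValue": float("nan")})
except ValueError as e:
    print(e)  # failed to prevent overwriting existing key _FillValue in attrs.
```

This is why an append or region write of the *same* dataset trips the guard even though nothing would actually change.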

there's a workaround

The workaround (deleting the `_FillValue` in subsequent writes):

```python
m = {}
ds.to_zarr(m)
del ds.a.attrs["_FillValue"]
ds.to_zarr(m, append_dim="x")
```

seems to do the trick.

There are indications that the result might still be broken, but it's not yet clear how to reproduce them (see comments below).

This issue has been split off from #6069

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.10 (main, Jan 15 2022, 11:48:00) [Clang 13.0.0 (clang-1300.0.29.3)]
python-bits: 64
OS: Darwin
OS-release: 20.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.20.1
pandas: 1.2.0
numpy: 1.21.2
scipy: 1.6.2
netCDF4: 1.5.8
pydap: installed
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: 2.11.0
cftime: 1.3.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2021.11.1
distributed: 2021.11.1
matplotlib: 3.4.1
cartopy: 0.20.1
seaborn: 0.11.1
numbagg: None
fsspec: 2021.11.1
cupy: None
pint: 0.17
sparse: 0.13.0
setuptools: 60.5.0
pip: 21.3.1
conda: None
pytest: 6.2.2
IPython: 8.0.0.dev
sphinx: 3.5.0
```
