issues

3 rows where state = "open", type = "issue" and user = 206773 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
762323609 MDU6SXNzdWU3NjIzMjM2MDk= 4681 Uncompressed Zarr arrays can no longer be written to Zarr forman 206773 open 0     2 2020-12-11T13:02:28Z 2023-10-24T23:08:35Z   NONE      

What happened:

We create xarray.Dataset instances using xr.open_zarr(store) with custom chunk store instances. These will lazily fetch data chunks for data variables from the Sentinel Hub API. For coordinate variables lon, lat, time we use "static" store entries: uncompressed, bytified numpy arrays.

Since xarray 0.16.2 and Zarr 2.6.1 this approach no longer works. When we write datasets opened from such a store using Dataset.to_zarr(dst_store), e.g. with dst_store=s3fs.S3Map(), we get encoding errors. For a coordinate array lon, for example, botocore reports:

Invalid type for parameter Body, value: [55.0475 55.0465 55.0455 ... 53.0025 53.0015 53.0005], type: <class 'numpy.ndarray'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object

(Full traceback is below.) It seems that our static numpy arrays won't be encoded at all, because they are uncompressed. If we use a compressor, it works again. (That's our current workaround.)

What you expected to happen:

Before data is written into a Zarr chunk store, it must be encoded from numpy arrays to bytes. This does not seem to happen if uncompressed data is written, that is, when the Zarr encoding's compressor and filters are both None.
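The expectation above can be illustrated with a minimal sketch. `BytesOnlyStore` is a hypothetical stand-in for a backend that, like botocore in the error message, accepts only bytes-like values; it is not part of Zarr, s3fs or xarray, and the coordinate values are made up for illustration.

```python
import numpy as np


class BytesOnlyStore(dict):
    """Hypothetical store that rejects anything that is not bytes-like."""

    def __setitem__(self, key, value):
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError(
                f"Invalid type for value of {key!r}: {type(value).__name__}"
            )
        super().__setitem__(key, value)


# Illustrative coordinate chunk (values are an assumption, not real data)
lon = np.linspace(55.0475, 53.0005, 5)
store = BytesOnlyStore()

# Handing over the raw array fails, similar to the reported error
try:
    store["lon/0"] = lon
    raw_write_failed = False
except TypeError:
    raw_write_failed = True

# Serialising the chunk first gives the backend the bytes it expects
store["lon/0"] = lon.tobytes()
restored = np.frombuffer(store["lon/0"], dtype=lon.dtype)
```

With a compressor configured, the codec output is already bytes-like, which would be consistent with the workaround mentioned above.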

Minimal Complete Verifiable Example:

A minimal, self-contained example is the entire test module test_reprod_27.py of the xcube Sentinel Hub plugin xcube-sh.

Original issue in the Sentinel Hub xcube plugin is xcube-sh #27.

Environment:

Output of xr.show_versions():

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Nov 27 2020, 18:58:29) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: de_DE.cp1252
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.2
pandas: 1.1.5
numpy: 1.19.4
scipy: 1.5.3
netCDF4: 1.5.5
pydap: installed
h5netcdf: None
h5py: None
Nio: None
zarr: 2.6.1
cftime: 1.3.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: None
iris: None
bottleneck: None
dask: 2.30.0
distributed: 2.30.1
matplotlib: 3.3.3
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.3.1
conda: None
pytest: 6.1.2
IPython: 7.19.0
sphinx: 3.3.1

Traceback:

```
  File "D:\Projects\xcube\xcube\cli_gen2\write.py", line 47, in write_cube
    data_id = writer.write_data(cube,
  File "D:\Projects\xcube\xcube\core\store\stores\s3.py", line 213, in write_data
    self._new_s3_writer(writer_id).write_data(data, data_id=path, replace=replace, **write_params)
  File "D:\Projects\xcube\xcube\core\store\accessors\dataset.py", line 313, in write_data
    data.to_zarr(s3fs.S3Map(root=f'{bucket_name}/{data_id}' if bucket_name else data_id,
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\core\dataset.py", line 1745, in to_zarr
    return to_zarr(
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\backends\api.py", line 1481, in to_zarr
    dump_to_store(dataset, zstore, writer, encoding=encoding)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\backends\api.py", line 1158, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\backends\zarr.py", line 473, in store
    self.set_variables(
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\backends\zarr.py", line 549, in set_variables
    writer.add(v.data, zarr_array, region)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\backends\common.py", line 143, in add
    target[region] = source
  File "D:\Miniconda3\envs\xcube\lib\site-packages\zarr\core.py", line 1122, in __setitem__
    self.set_basic_selection(selection, value, fields=fields)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\zarr\core.py", line 1217, in set_basic_selection
    return self._set_basic_selection_nd(selection, value, fields=fields)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\zarr\core.py", line 1508, in _set_basic_selection_nd
    self._set_selection(indexer, value, fields=fields)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\zarr\core.py", line 1580, in _set_selection
    self._chunk_setitems(lchunk_coords, lchunk_selection, chunk_values,
  File "D:\Miniconda3\envs\xcube\lib\site-packages\zarr\core.py", line 1709, in _chunk_setitems
    self.chunk_store.setitems({k: v for k, v in zip(ckeys, cdatas)})
  File "D:\Miniconda3\envs\xcube\lib\site-packages\fsspec\mapping.py", line 110, in setitems
    self.fs.pipe(values)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\fsspec\asyn.py", line 121, in wrapper
    return maybe_sync(func, self, *args, **kwargs)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\fsspec\asyn.py", line 100, in maybe_sync
    return sync(loop, func, *args, **kwargs)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\fsspec\asyn.py", line 71, in sync
    raise exc.with_traceback(tb)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\fsspec\asyn.py", line 55, in f
    result[0] = await future
  File "D:\Miniconda3\envs\xcube\lib\site-packages\fsspec\asyn.py", line 211, in _pipe
    await asyncio.gather(
  File "D:\Miniconda3\envs\xcube\lib\site-packages\s3fs\core.py", line 608, in _pipe_file
    return await self._call_s3(
  File "D:\Miniconda3\envs\xcube\lib\site-packages\s3fs\core.py", line 225, in _call_s3
    raise translate_boto_error(err) from err
  File "D:\Miniconda3\envs\xcube\lib\site-packages\s3fs\core.py", line 207, in _call_s3
    return await method(**additional_kwargs)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\aiobotocore\client.py", line 123, in _make_api_call
    request_dict = await self._convert_to_request_dict(
  File "D:\Miniconda3\envs\xcube\lib\site-packages\aiobotocore\client.py", line 171, in _convert_to_request_dict
    request_dict = self._serializer.serialize_to_request(
  File "D:\Miniconda3\envs\xcube\lib\site-packages\botocore\validate.py", line 297, in serialize_to_request
    raise ParamValidationError(report=report.generate_report())

Invalid type for parameter Body, value: [55.0475 55.0465 55.0455 ... 53.0025 53.0015 53.0005], type: <class 'numpy.ndarray'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4681/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1226272301 I_kwDOAMm_X85JF24t 6573 32- vs 64-bit coordinates in where() forman 206773 open 0     6 2022-05-05T06:57:36Z 2022-09-28T08:17:09Z   NONE      

What happened?

I'm unsure whether this is a bug or not, but the behaviour I encountered was very unexpected.

For two given data arrays a and b with same dimensions and equal coordinates, c for c = a.where(b) should have equal dimensions and coordinates.

However if the coordinates of a have dtype of float32 and those of b are float64, then the dimension sizes of c will always be two. Of course, this way the coordinates of a and b are no longer exactly equal, but from a user perspective they represent the same labels.

The behaviour is likely caused by the fact that the indexes generated for the coordinates are no longer strictly equal, so where() keeps only the two outer cells of each dimension. Allowing indexes to be passed explicitly may help here; see #6392.
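The suspected cause can be checked with plain numpy, without xarray. The sketch below assumes that label alignment keeps only exactly equal coordinate values, and uses `np.intersect1d` merely as a stand-in for that; it is not how xarray implements alignment.

```python
import numpy as np

c32 = np.linspace(0, 1, 10, dtype=np.float32)
c64 = np.linspace(0, 1, 10, dtype=np.float64)

# Comparison promotes float32 to float64, but the float32 rounding remains
exact_matches = c32.astype(np.float64) == c64

# Labels present in both arrays when compared exactly
shared_labels = np.intersect1d(c32, c64)

# The labels do agree within floating-point tolerance
approx_equal = np.allclose(c32, c64)

print(int(exact_matches.sum()), shared_labels, approx_equal)
```

Only the end points 0 and 1 are exactly representable in both precisions, which matches the two remaining cells observed in the example.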

What did you expect to happen?

In the case described above, the dimensions and coordinates of c should be equal to a (and b).

Minimal Complete Verifiable Example

```python
import numpy as np
import xarray as xr

c32 = xr.DataArray(np.linspace(0, 1, 10, dtype=np.float32), dims='x')
c64 = xr.DataArray(np.linspace(0, 1, 10, dtype=np.float64), dims='x')

c3 = c32.where(c64 > 0.5)
assert len(c32) == len(c3)

v32 = xr.DataArray(np.random.random(10), dims='x', coords=dict(x=c32))
v64 = xr.DataArray(np.random.random(10), dims='x', coords=dict(x=c64))

v3 = v32.where(v64 > 0.5)
assert len(v32) == len(v3)

# --> Assertion error, Expected :10, Actual :2
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:17:03) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: AMD64 Family 25 Model 80 Stepping 0, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('de_DE', 'cp1252')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.3.0
pandas: 1.4.2
numpy: 1.21.6
scipy: 1.8.0
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2022.04.1
distributed: 2022.4.1
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.3.0
cupy: None
pint: None
sparse: None
setuptools: 62.1.0
pip: 22.0.4
conda: None
pytest: 7.1.2
IPython: 8.2.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6573/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
906748201 MDU6SXNzdWU5MDY3NDgyMDE= 5405 Control CF-encoding in to_zarr() forman 206773 open 0     2 2021-05-30T12:57:40Z 2021-06-23T15:47:32Z   NONE      

Is your feature request related to a problem? Please describe.

I believe xarray's dataset.to_zarr() is somewhat inconsistent between creating variables and appending data to existing variables: when creating variables it can deal with writing already encoded data, but when appending it expects decoded data.

When appending data, xarray will always CF-encode variable data according to encoding information of existing variables before it appends new data. This is fine if data to be appended is decoded, but if the data to be appended is already encoded (e.g. because it was previously read by dataset = xr.open_dataset(..., decode_cf=False)) then this leads to entirely corrupt data.
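The corruption can be reproduced numerically with a simplified sketch. `cf_encode` below is a hypothetical scale-and-cast helper, not xarray's actual encoding code, and it ignores fill-value handling; it assumes that encoding divides by scale_factor and that values outside the uint16 range wrap around. The input values are those of the test case in this issue.

```python
import numpy as np

scale_factor = 0.0001
encoded = np.array([0, 10000, 15000, 20000], dtype=np.uint16)


def cf_encode(values, scale, dtype=np.uint16):
    """Simplified, hypothetical scale-and-cast step (not xarray's code)."""
    scaled = np.rint(np.asarray(values, dtype=np.float64) / scale).astype(np.int64)
    # Emulate integer wrap-around for values outside the target range
    return (scaled % (np.iinfo(dtype).max + 1)).astype(dtype)


# Roughly what reading with decode_cf=True yields (fill value ignored here)
decoded = encoded * scale_factor

# Encoding decoded data restores the stored values
round_trip = cf_encode(decoded, scale_factor)

# Encoding data that is already encoded scales it a second time
double_encoded = cf_encode(encoded, scale_factor)

print(round_trip)      # [    0 10000 15000 20000]
print(double_encoded)  # [    0 57600 53632 49664]
```

Under these assumptions the doubly encoded values are the same ones reported by the failing assertion in the test below.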

See also xarray issue #5263 and my actual problem described in https://github.com/bcdev/nc2zarr/issues/35.

Describe the solution you'd like

A possible workaround is to redundantly call dataset = decode_cf(dataset) before appending, so that encoding it again on append restores the already encoded values, as described in #5263. This of course costs extra CPU time for a computation that cancels itself out.

I'd like to control whether encoding of data shall take place when appending. If I already have encoded data, I'd like to call encoded_dataset.to_zarr(..., append_dim='time', encode_cf=False).

For example, when I uncomment line 469 in xarray/backends/zarr.py, then this fixes this issue too:

https://github.com/pydata/xarray/blob/1b4412eeb7011f53932779e1d7c3534163aedd63/xarray/backends/zarr.py#L460-L471

Minimal Complete Verifiable Example:

Here is a test that explains the observed inconsistency.

```python
import shutil
import unittest

import numpy as np
import xarray as xr
import zarr

SRC_DS_1_PATH = 'src_ds_1.zarr'
SRC_DS_2_PATH = 'src_ds_2.zarr'
DST_DS_PATH = 'dst_ds.zarr'


class XarrayToZarrAppendInconsistencyTest(unittest.TestCase):
    @classmethod
    def del_paths(cls):
        for path in (SRC_DS_1_PATH, SRC_DS_2_PATH, DST_DS_PATH):
            shutil.rmtree(path, ignore_errors=True)

    def setUp(self):
        self.del_paths()

        scale_factor = 0.0001
        self.v_values_encoded = np.array([[0, 10000, 15000, 20000]], dtype=np.uint16)
        self.v_values_decoded = np.array([[np.nan, 1., 1.5, 2.]], dtype=np.float32)

        # The variable for the two source datasets
        v = xr.DataArray(self.v_values_encoded,
                         dims=('t', 'x'),
                         attrs=dict(scale_factor=scale_factor, _FillValue=0))

        # Create two source datasets
        src_ds = xr.Dataset(data_vars=dict(v=v))
        src_ds.to_zarr(SRC_DS_1_PATH)
        src_ds.to_zarr(SRC_DS_2_PATH)

        # Assert we have written encoded data
        a1 = zarr.convenience.open_array(SRC_DS_1_PATH + '/v')
        a2 = zarr.convenience.open_array(SRC_DS_2_PATH + '/v')
        np.testing.assert_equal(a1, self.v_values_encoded)  # succeeds
        np.testing.assert_equal(a2, self.v_values_encoded)  # succeeds

        # Assert we correctly decode data
        src_ds_1 = xr.open_zarr(SRC_DS_1_PATH, decode_cf=True)
        src_ds_2 = xr.open_zarr(SRC_DS_2_PATH, decode_cf=True)
        np.testing.assert_equal(src_ds_1.v.data, self.v_values_decoded)  # succeeds
        np.testing.assert_equal(src_ds_2.v.data, self.v_values_decoded)  # succeeds

    def tearDown(self):
        self.del_paths()

    def test_decode_cf_true(self):
        """
        This test succeeds.
        """
        # Open the two source datasets
        src_ds_1 = xr.open_zarr(SRC_DS_1_PATH, decode_cf=True)
        src_ds_2 = xr.open_zarr(SRC_DS_2_PATH, decode_cf=True)
        # Expect data is decoded
        np.testing.assert_equal(src_ds_1.v.data, self.v_values_decoded)  # succeeds
        np.testing.assert_equal(src_ds_2.v.data, self.v_values_decoded)  # succeeds

        # Write 1st source datasets to new dataset, append the 2nd source
        src_ds_1.to_zarr(DST_DS_PATH, mode='w-')
        src_ds_2.to_zarr(DST_DS_PATH, append_dim='t')

        # Open the new dataset
        dst_ds = xr.open_zarr(DST_DS_PATH, decode_cf=True)
        dst_ds_1 = dst_ds.isel(t=slice(0, 1))
        dst_ds_2 = dst_ds.isel(t=slice(1, 2))
        # Expect data is decoded
        np.testing.assert_equal(dst_ds_1.v.data, self.v_values_decoded)  # succeeds
        np.testing.assert_equal(dst_ds_2.v.data, self.v_values_decoded)  # succeeds

    def test_decode_cf_false(self):
        """
        This test fails by the last assertion with

        AssertionError:
        Arrays are not equal

        Mismatched elements: 3 / 4 (75%)
        Max absolute difference: 47600
        Max relative difference: 4.76
         x: array([[    0, 57600, 53632, 49664]], dtype=uint16)
         y: array([[    0, 10000, 15000, 20000]], dtype=uint16)
        """
        # Open the two source datasets
        src_ds_1 = xr.open_zarr(SRC_DS_1_PATH, decode_cf=False)
        src_ds_2 = xr.open_zarr(SRC_DS_2_PATH, decode_cf=False)
        # Expect data is NOT decoded (still encoded)
        np.testing.assert_equal(src_ds_1.v.data, self.v_values_encoded)  # succeeds
        np.testing.assert_equal(src_ds_2.v.data, self.v_values_encoded)  # succeeds

        # Write 1st source datasets to new dataset, append the 2nd source
        src_ds_1.to_zarr(DST_DS_PATH, mode='w-')
        # Avoid ValueError: failed to prevent overwriting existing key scale_factor in attrs. ...
        del src_ds_2.v.attrs['scale_factor']
        del src_ds_2.v.attrs['_FillValue']
        src_ds_2.to_zarr(DST_DS_PATH, append_dim='t')

        # Open the new dataset
        dst_ds = xr.open_zarr(DST_DS_PATH, decode_cf=False)
        dst_ds_1 = dst_ds.isel(t=slice(0, 1))
        dst_ds_2 = dst_ds.isel(t=slice(1, 2))
        # Expect data is NOT decoded (still encoded)
        np.testing.assert_equal(dst_ds_1.v.data, self.v_values_encoded)  # succeeds
        np.testing.assert_equal(dst_ds_2.v.data, self.v_values_encoded)  # fails
```

Environment:

Output of xr.show_versions():

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 15:50:08) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: de_DE.cp1252
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.17.0
pandas: 1.2.2
numpy: 1.20.1
scipy: 1.6.0
netCDF4: 1.5.6
pydap: installed
h5netcdf: None
h5py: None
Nio: None
zarr: 2.6.1
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.0
cfgrib: None
iris: None
bottleneck: None
dask: 2021.02.0
distributed: 2021.02.0
matplotlib: 3.3.4
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: 6.2.2
IPython: 7.21.0
sphinx: 3.5.1
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5405/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
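
The filter that produced the three rows listed above (state = "open", type = "issue", user = 206773, sorted by updated_at descending) can be sketched against this schema with Python's built-in sqlite3 module. The in-memory database and the reduced column set are assumptions made for brevity; the row values are taken from the listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced version of the [issues] table; only the columns the query needs
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [state] TEXT,
       [updated_at] TEXT,
       [type] TEXT
    )
    """
)
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

rows = [
    (906748201, 5405, "Control CF-encoding in to_zarr()",
     206773, "open", "2021-06-23T15:47:32Z", "issue"),
    (762323609, 4681, "Uncompressed Zarr arrays can no longer be written to Zarr",
     206773, "open", "2023-10-24T23:08:35Z", "issue"),
    (1226272301, 6573, "32- vs 64-bit coordinates in where()",
     206773, "open", "2022-09-28T08:17:09Z", "issue"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

query = """
    SELECT [number], [updated_at]
    FROM [issues]
    WHERE [state] = :state AND [type] = :type AND [user] = :user
    ORDER BY [updated_at] DESC
"""
params = {"state": "open", "type": "issue", "user": 206773}
result = conn.execute(query, params).fetchall()

# ISO 8601 timestamps stored as TEXT sort chronologically
for number, updated_at in result:
    print(number, updated_at)
```

Because updated_at is stored as ISO 8601 text, a plain ORDER BY on the TEXT column yields chronological order without date parsing.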