
issues: 692238160

id: 692238160
node_id: MDU6SXNzdWU2OTIyMzgxNjA=
number: 4405
title: open_zarr: concat_characters has no effect when dtype=U1
user: 6130352
state: open
locked: 0
comments: 8
created_at: 2020-09-03T19:22:52Z
updated_at: 2022-04-27T23:48:29Z
author_association: NONE

What happened:

It appears that either `to_zarr` or `open_zarr` incorrectly concatenates the trailing dimension of single-byte character arrays (dtype `S1`), so the last dimension is dropped on a round trip:

```python
import xarray as xr
import numpy as np
xr.set_options(display_style='text')

chrs = np.array([
    ['A', 'B'],
    ['C', 'D'],
    ['E', 'F'],
], dtype='S1')
ds = xr.Dataset(dict(x=(('dim0', 'dim1'), chrs)))
ds.x
# <xarray.DataArray 'x' (dim0: 3, dim1: 2)>
# array([[b'A', b'B'],
#        [b'C', b'D'],
#        [b'E', b'F']], dtype='|S1')
# Dimensions without coordinates: dim0, dim1

ds.to_zarr('/tmp/test.zarr', mode='w')
xr.open_zarr('/tmp/test.zarr').x.compute()
# The second dimension is lost and the values end up being concatenated:
# <xarray.DataArray 'x' (dim0: 3)>
# array([b'AB', b'CD', b'EF'], dtype='|S2')
# Dimensions without coordinates: dim0
```

For N columns in a 2D array, you end up with a 1D "|SN" array. With "S2", or any fixed length greater than 1, this does not happen.
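The result reported for the 2D example is what you would get by joining the trailing axis of an `S1` array into fixed-width byte strings. The sketch below reproduces that effect with NumPy alone, so the expected and observed shapes can be compared without writing a store. `concat_trailing_chars` is a hypothetical helper written for illustration; it is not xarray's or zarr's actual code, only a model of the behaviour described in this report.

```python
import numpy as np


def concat_trailing_chars(arr):
    """Join the last axis of an S1 array into fixed-width byte strings.

    Hypothetical helper: mimics the effect described in the report,
    not the code path xarray or zarr actually uses.
    """
    arr = np.ascontiguousarray(arr)
    # Only single-byte string arrays with a non-empty last axis are joined;
    # everything else is returned unchanged.
    if arr.dtype != np.dtype("S1") or arr.shape[-1] == 0:
        return arr
    n = arr.shape[-1]
    # Reinterpret each run of n single bytes as one n-byte string, which
    # leaves a trailing axis of length 1, then drop that axis.
    return arr.view(f"S{n}")[..., 0]


chrs = np.array([["A", "B"], ["C", "D"], ["E", "F"]], dtype="S1")
joined = concat_trailing_chars(chrs)
print(joined.shape, joined.dtype)
```

Because only the last axis is reinterpreted, an array with more dimensions loses just its trailing one, and an `S2` input passes through untouched.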

Interestingly, it only affects the trailing dimension: with 3 dimensions, you get a 2D result with the third dimension dropped:

```python
chrs = np.array([[
    ['A', 'B'],
    ['C', 'D'],
    ['E', 'F'],
]], dtype='S1')
ds = xr.Dataset(dict(x=(('dim0', 'dim1', 'dim2'), chrs)))
ds
# <xarray.Dataset>
# Dimensions:  (dim0: 1, dim1: 3, dim2: 2)
# Dimensions without coordinates: dim0, dim1, dim2
# Data variables:
#     x        (dim0, dim1, dim2) |S1 b'A' b'B' b'C' b'D' b'E' b'F'

ds.to_zarr('/tmp/test.zarr', mode='w')
xr.open_zarr('/tmp/test.zarr').x.compute()
# dim2 is gone and the data concatenated to dim1:
# <xarray.DataArray 'x' (dim0: 1, dim1: 3)>
# array([[b'AB', b'CD', b'EF']], dtype='|S2')
# Dimensions without coordinates: dim0, dim1
```

In short, this only affects the "S1" data type. "U1" is fine, as is "SN" where N > 1.
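If it helps when checking a dataset before writing it, the condition above can be expressed as a small dtype predicate. This is a sketch based only on the observations in this report; `is_affected` is a hypothetical name, not part of xarray or zarr.

```python
import numpy as np


def is_affected(dtype):
    """Return True for the single-byte string dtype this report describes.

    Hypothetical helper: encodes the observation that only "S1" is
    affected, while "U1" and "SN" with N > 1 round-trip correctly.
    """
    dt = np.dtype(dtype)
    # kind "S" is a fixed-width byte string; itemsize 1 means exactly "S1".
    return dt.kind == "S" and dt.itemsize == 1


for name in ("S1", "S2", "U1"):
    print(name, is_affected(name))
```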

Environment:

Output of `xr.show_versions()`:

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-42-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: None

xarray: 0.16.0
pandas: 1.0.5
numpy: 1.19.0
scipy: 1.5.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.21.0
distributed: 2.21.0
matplotlib: 3.3.0
cartopy: None
seaborn: 0.10.1
numbagg: None
pint: None
setuptools: 47.3.1.post20200616
pip: 20.1.1
conda: 4.8.2
pytest: 5.4.3
IPython: 7.15.0
sphinx: 3.2.1
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4405/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
