
issues


4 rows where repo = 13221727, state = "closed" and user = 27021858 sorted by updated_at descending


id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1876858952 I_kwDOAMm_X85v3phI 8134 Unable to append data in s3 bucket with to_zarr() and append mode meteoDaniel 27021858 closed 0     2 2023-09-01T06:57:32Z 2023-09-01T16:03:50Z 2023-09-01T16:03:49Z NONE      

What happened?

After updating my packages, xarray + zarr can no longer append data to an existing Zarr store in S3.

What did you expect to happen?

The data should be appended to the existing Zarr store.

Minimal Complete Verifiable Example

```Python
from datetime import datetime

import numpy as np
import s3fs
import xarray
import zarr
from s3fs import S3FileSystem

append_dim = 'dt_calc'
consolidated = True

ds = xarray.Dataset(
    {'temp': (('dt_calc', 'y', 'x'), np.array([[[1., 2., 3., 4.], [3., 4., 5., 6.]]]))},
    coords={
        'lon': ('y', np.array([50., 51.])),
        'lat': ('x', np.array([4., 5., 6., 7.])),
        'dt_calc': ('dt_calc', [datetime(2022, 1, 1)]),
    },
)
ds_2 = xarray.Dataset(
    {'temp': (('dt_calc', 'y', 'x'), np.array([[[1., 2., 3., 4.], [3., 4., 5., 6.]]]))},
    coords={
        'lon': ('y', np.array([50., 51.])),
        'lat': ('x', np.array([4., 5., 6., 7.])),
        'dt_calc': ('dt_calc', [datetime(2022, 1, 1, 1)]),
    },
)

# storage_class, bucket_name and dataset_name are not defined in the report;
# set them to your own values before running.
s3_out = S3FileSystem(
    anon=False,
    s3_additional_kwargs={"StorageClass": storage_class},
)
store_out = s3fs.S3Map(
    root=f"s3:///{bucket_name}/{dataset_name}.zarr", s3=s3_out, check=False
)

# First write creates the store; mode="w-" fails if it already exists.
ds.to_zarr(store_out, mode="w-", compute=True, consolidated=consolidated)

try:
    ds_2.to_zarr(store_out, mode="w-", compute=True, consolidated=consolidated)
except zarr.errors.ContainsGroupError:
    # The store already exists, so append along dt_calc instead.
    ds_2.to_zarr(
        store_out,
        mode="a",
        append_dim=append_dim,
        compute=True,
        consolidated=consolidated,
    )
```

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
In [6]: xarray.open_zarr(store_out, consolidated=True)
Out[6]:
<xarray.Dataset>
Dimensions:  (dt_calc: 1, x: 4, y: 2)
Coordinates:
  * dt_calc  (dt_calc) datetime64[ns] 2022-01-01
    lat      (x) float64 dask.array<chunksize=(4,), meta=np.ndarray>
    lon      (y) float64 dask.array<chunksize=(2,), meta=np.ndarray>
Dimensions without coordinates: x, y
Data variables:
    temp     (dt_calc, y, x) float64 dask.array<chunksize=(1, 2, 4), meta=np.ndarray>

In [7]: dataset.to_zarr(
   ...:     store_out,
   ...:     mode="a",
   ...:     append_dim=append_dim,
   ...:     compute=True,
   ...:     consolidated=consolidated,
   ...: )

ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 dataset.to_zarr(
      2     store_out,
      3     mode="a",
      4     append_dim=append_dim,
      5     compute=True,
      6     consolidated=consolidated,
      7 )

File /usr/local/lib/python3.9/site-packages/xarray/core/dataset.py:2461, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs)
   2329 """Write dataset contents to a zarr group.
   2330
   2331 Zarr chunks are determined in the following way:
   (...)
   2457     The I/O user guide, with more details and examples.
   2458 """
   2459 from xarray.backends.api import to_zarr
-> 2461 return to_zarr(  # type: ignore[call-overload,misc]
   2462     self,
   2463     store=store,
   2464     chunk_store=chunk_store,
   2465     storage_options=storage_options,
   2466     mode=mode,
   2467     synchronizer=synchronizer,
   2468     group=group,
   2469     encoding=encoding,
   2470     compute=compute,
   2471     consolidated=consolidated,
   2472     append_dim=append_dim,
   2473     region=region,
   2474     safe_chunks=safe_chunks,
   2475     zarr_version=zarr_version,
   2476     write_empty_chunks=write_empty_chunks,
   2477     chunkmanager_store_kwargs=chunkmanager_store_kwargs,
   2478 )

File /usr/local/lib/python3.9/site-packages/xarray/backends/api.py:1670, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs)
   1668 existing_dims = zstore.get_dimensions()
   1669 if append_dim not in existing_dims:
-> 1670     raise ValueError(
   1671         f"append_dim={append_dim!r} does not match any existing "
   1672         f"dataset dimensions {existing_dims}"
   1673     )
   1674 existing_var_names = set(zstore.zarr_group.array_keys())
   1675 for var_name in existing_var_names:

ValueError: append_dim='dt_calc' does not match any existing dataset dimensions {}

In [8]: dataset
Out[8]:
<xarray.Dataset>
Dimensions:  (dt_calc: 1, y: 2, x: 4)
Coordinates:
    lon      (y) float64 50.0 51.0
    lat      (x) float64 4.0 5.0 6.0 7.0
  * dt_calc  (dt_calc) datetime64[ns] 2022-01-01T01:00:00
Dimensions without coordinates: y, x
Data variables:
    temp     (dt_calc, y, x) float64 1.0 2.0 3.0 4.0 3.0 4.0 5.0 6.0

In [9]:
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.10 (main, Mar 2 2022, 04:31:58) [GCC 10.2.1 20210110]
python-bits: 64
OS: Linux
OS-release: 6.2.0-26-generic
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2023.8.0
pandas: 2.1.0
numpy: 1.25.2
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.8.1
distributed: 2023.8.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 53.0.0
pip: 21.2.4
conda: None
pytest: 6.1.1
mypy: None
IPython: 8.12.0
sphinx: None

boto3==1.26.45
aiobotocore==2.5.0
botocore==1.29.76
s3fs==2023.6.0
zarr==2.16.1
xarray==2023.8.0
dask==2023.8.1
dask[distributed]==2023.8.1
dask-cloudprovider==2022.10.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8134/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
545297609 MDU6SXNzdWU1NDUyOTc2MDk= 3661 How to handle 2d latitude and longitude coordinates within DataArray creation meteoDaniel 27021858 closed 0     3 2020-01-04T15:31:43Z 2022-07-09T21:54:21Z 2020-01-04T19:44:15Z NONE      

MCVE Code Sample

```python
# Shapes of the inputs:
# frames_z.shape      -> (25, 1100, 900)
# grid[:, :, 1].shape -> (1100, 900)
# grid[:, :, 0].shape -> (1100, 900)

xarray.DataArray(
    frames_z,
    coords={'time': timestamps, 'latitude': grid[:, :, 1], 'longitude': grid[:, :, 0]},
    dims=['time', 'latitude', 'longitude'],
)
```

Expected Output

Array with latitude and longitude as coordinates.

Problem Description

I receive the following error message:

MissingDimensionsError: cannot set variable 'latitude' with 2-dimensional data without explicit dimension names. Pass a tuple of (dims, data) instead.

I have tried several ways of defining coords and dims, but it always fails.
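The error message points at the fix: a 2-D coordinate has to be passed as a `(dims, data)` tuple, and the array's own dimensions cannot share a name with a multi-dimensional coordinate. A minimal sketch, using small synthetic arrays as hypothetical stand-ins for `frames_z`, `grid`, and `timestamps`, and assuming the grid dimensions are named `"y"` and `"x"` (names chosen for illustration):

```python
import numpy as np
import pandas as pd
import xarray

# Hypothetical stand-ins for frames_z, grid and timestamps (small shapes).
ntime, ny, nx = 3, 4, 5
frames_z = np.random.rand(ntime, ny, nx)
lon2d, lat2d = np.meshgrid(np.linspace(5.0, 11.0, nx), np.linspace(45.0, 49.0, ny))
grid = np.stack([lon2d, lat2d], axis=-1)  # grid[:, :, 0] = lon, grid[:, :, 1] = lat
timestamps = pd.date_range("2020-01-01", periods=ntime, freq="D")

# 2-D coordinates are given as (dims, data) tuples; the array itself uses the
# generic dimension names "y" and "x" rather than "latitude"/"longitude".
da = xarray.DataArray(
    frames_z,
    coords={
        "time": timestamps,
        "latitude": (("y", "x"), grid[:, :, 1]),
        "longitude": (("y", "x"), grid[:, :, 0]),
    },
    dims=["time", "y", "x"],
)
```

With this layout, `latitude` and `longitude` are auxiliary coordinates on the `(y, x)` grid, while `time` remains the only indexed dimension coordinate.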

Output of xr.show_versions()

xarray==0.11.3
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3661/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
947796627 MDU6SXNzdWU5NDc3OTY2Mjc= 5620 `xr.where()` does not work like `np.where()`on meshgrids meteoDaniel 27021858 closed 0     6 2021-07-19T15:55:10Z 2021-07-20T11:32:45Z 2021-07-20T07:58:18Z NONE      

When selecting 2-D data, `xarray.where()` does not behave like `numpy.where()`. The documentation names `np.where()` as the counterpart of `xr.where()`, but the two appear to behave quite differently.

Here is my code:

```python
import numpy as np
import xarray

data = xarray.open_dataset('path_to_attached_file')
minLat, minLon, maxLat, maxLon = (45.08903556483102, 5.625000000000013, 48.92249926375824, 11.249999999999993)
latitudes = data.lat.values
longitudes = data.lon.values
slice_mask = np.where(
    (latitudes <= maxLat) & (latitudes > minLat) & (longitudes <= maxLon) & (longitudes > minLon)
)
_sliced_data = data.where(
    (data.lat <= maxLat) & (data.lat > minLat) & (data.lon <= maxLon) & (data.lon > minLon),
    drop=True,
)
_sliced_data.latitude.values.max()
# 49.305596

latitudes[slice_mask].max()
# 48.922172
```

I have also tried to translate the NumPy result into a boolean DataArray:

```python
mask_array = data.copy()
mask_array.update({'air_temperature_2m': (("y", "x"), (latitudes <= maxLat) & (latitudes > minLat) & (longitudes <= maxLon) & (longitudes > minLon))})

_sliced_data = data.where(
    mask_array.air_temperature_2m,
    drop=True,
)
```

This gives the same result. It seems that the masking step does not work correctly.
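One likely explanation, sketched on a small synthetic curvilinear grid (hypothetical data, not the attached file): with `drop=True`, `Dataset.where` removes only the rows and columns in which the condition is False everywhere. Cells in the kept rows and columns that fail the condition become NaN in the data variables, but 2-D coordinates such as `lat` are not masked, so their maximum can exceed `maxLat`. Masking the coordinate with the same selection reproduces the NumPy result:

```python
import numpy as np
import xarray

# Hypothetical rotated grid: lat and lon both vary along y and x.
ny, nx = 6, 6
j, i = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
lat = 45.0 + 0.5 * j + 0.2 * i
lon = 5.0 + 0.5 * i + 0.2 * j
data = xarray.Dataset(
    {"air_temperature_2m": (("y", "x"), np.arange(ny * nx, dtype=float).reshape(ny, nx))},
    coords={"lat": (("y", "x"), lat), "lon": (("y", "x"), lon)},
)

minLat, maxLat, minLon, maxLon = 46.05, 47.45, 6.05, 7.45
cond = (data.lat <= maxLat) & (data.lat > minLat) & (data.lon <= maxLon) & (data.lon > minLon)

# NumPy: maximum latitude over the selected points only.
np_max = lat[np.where(cond.values)].max()

# xarray: drop=True keeps every row/column containing at least one True value;
# the data variable is NaN elsewhere, but the lat coordinate is left unmasked.
sliced = data.where(cond, drop=True)
xr_coord_max = float(sliced.lat.max())

# Masking lat wherever the data variable is NaN recovers the NumPy answer.
xr_masked_max = float(sliced.lat.where(sliced.air_temperature_2m.notnull()).max())
```

Here `xr_coord_max` comes from a grid cell that lies inside the kept bounding rows and columns but outside the latitude/longitude box, which mirrors the 49.305596 vs. 48.922172 discrepancy reported above.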

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.1 (default, Feb 9 2021, 07:55:26) [GCC 8.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.0-1033-oem
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.16.2
pandas: 1.2.1
numpy: 1.20.0
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: None
h5netcdf: 0.8.0
h5py: 3.2.1
Nio: None
zarr: None
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.4
iris: None
bottleneck: None
dask: 2021.05.0
distributed: None
matplotlib: 3.3.4
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 53.0.0
pip: 21.0.1
conda: None
pytest: 6.1.1
IPython: 7.21.0
sphinx: None

harmonie_knmi_grid_fixture.zip

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5620/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
545324954 MDU6SXNzdWU1NDUzMjQ5NTQ= 3662 Select on 2 dimensional indices/How to define indices correctly? meteoDaniel 27021858 closed 0     2 2020-01-04T19:49:16Z 2020-09-01T02:48:44Z 2020-09-01T02:48:44Z NONE      

MCVE Code Sample

```python
ds = xarray.tutorial.open_dataset('rasm').load()
ds.sel(yc=50, xc=50, method='nearest')
```

Expected Output

An interpolated time series of the data closest to (yc, xc) = (50, 50).

Problem Description

How do I need to define the indices so that I can select on multidimensional coordinates?
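`.sel(..., method='nearest')` works on indexed dimension coordinates, and 2-D coordinates such as `yc`/`xc` are not indexes, so one workaround is to compute the distance to the target point yourself and select the closest cell with `isel`. A minimal sketch on a small synthetic dataset that mimics the layout of `rasm` (a `Tair` variable on `(time, y, x)` with 2-D `yc`/`xc`); the data and the helper `sel_nearest_2d` are hypothetical, and this is nearest-neighbour selection, not interpolation:

```python
import numpy as np
import xarray

# Hypothetical stand-in for the 'rasm' data: yc/xc are 2-D coordinates on (y, x).
ny, nx, nt = 5, 6, 3
y, x = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
ds = xarray.Dataset(
    {"Tair": (("time", "y", "x"), np.arange(nt * ny * nx, dtype=float).reshape(nt, ny, nx))},
    coords={
        "yc": (("y", "x"), 40.0 + 2.0 * y + 0.5 * x),
        "xc": (("y", "x"), 30.0 + 3.0 * x + 0.5 * y),
    },
)


def sel_nearest_2d(ds, yc_target, xc_target):
    """Select the (y, x) cell whose yc/xc values are closest to the target.

    Uses squared Euclidean distance in coordinate space: no great-circle
    distance and no longitude wrap-around handling.
    """
    dist2 = ((ds.yc - yc_target) ** 2 + (ds.xc - xc_target) ** 2).transpose("y", "x")
    iy, ix = np.unravel_index(np.argmin(dist2.values), dist2.shape)
    return ds.isel(y=int(iy), x=int(ix))


point = sel_nearest_2d(ds, 50.0, 50.0)  # time series at the nearest grid cell
```

On real latitude/longitude grids a great-circle distance (or a spatial index such as a k-d tree) would be a more appropriate metric than this plain squared distance.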

Output of xr.show_versions()

xarray==0.11.3
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3662/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 37.469ms · About: xarray-datasette