
issue_comments


2 rows where author_association = "CONTRIBUTOR", issue = 1077079208 and user = 43613877 sorted by updated_at descending

id: 1033814820
html_url: https://github.com/pydata/xarray/issues/6069#issuecomment-1033814820
issue_url: https://api.github.com/repos/pydata/xarray/issues/6069
node_id: IC_kwDOAMm_X849nsMk
user: observingClouds (43613877)
created_at: 2022-02-09T14:23:54Z
updated_at: 2022-02-09T14:36:48Z
author_association: CONTRIBUTOR

You are right, the coordinates should not be dropped.

I think the function _validate_region has a bug. Currently it checks, for all ds.variables, whether at least one of their dimensions agrees with the ones given in the region argument. However, ds.variables also returns the coordinates, whereas we actually only want to check whether the ds.data_vars have a dimension intersecting with the given region.
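To illustrate the distinction (a minimal sketch; the dataset and variable names are invented for demonstration):

```python
import numpy as np
import xarray as xr

# Toy dataset: two coordinates plus one data variable.
ds = xr.Dataset()
ds.coords["time"] = ("time", np.arange(5))
ds.coords["layer"] = ("layer", np.arange(3))
ds["temp"] = (("time", "layer"), np.zeros((5, 3)))

print(list(ds.variables))  # ['time', 'layer', 'temp'] -> coordinates included
print(list(ds.data_vars))  # ['temp']                  -> data variables only

# With region={"time": ...}, a check over ds.variables flags the coordinate
# "layer" (it has no 'time' dimension) even though it is not a data variable
# being written -- hence the spurious error.
```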

Changing the function to

```python
def _validate_region(ds, region):
    if not isinstance(region, dict):
        raise TypeError(f"``region`` must be a dict, got {type(region)}")

    for k, v in region.items():
        if k not in ds.dims:
            raise ValueError(
                f"all keys in ``region`` are not in Dataset dimensions, got "
                f"{list(region)} and {list(ds.dims)}"
            )
        if not isinstance(v, slice):
            raise TypeError(
                "all values in ``region`` must be slice objects, got "
                f"region={region}"
            )
        if v.step not in {1, None}:
            raise ValueError(
                "step on all slices in ``region`` must be 1 or None, got "
                f"region={region}"
            )

    non_matching_vars = [
        k for k, v in ds.data_vars.items() if not set(region).intersection(v.dims)
    ]
    if non_matching_vars:
        raise ValueError(
            f"when setting `region` explicitly in to_zarr(), all "
            f"variables in the dataset to write must have at least "
            f"one dimension in common with the region's dimensions "
            f"{list(region.keys())}, but that is not "
            f"the case for some variables here. To drop these variables "
            f"from this dataset before exporting to zarr, write: "
            f".drop({non_matching_vars!r})"
        )
```

seems to work.
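As a quick sanity check (a minimal sketch; the dataset is made up), the patched validator accepts a dataset whose coordinates lack the region dimension:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset()
ds.coords["time"] = ("time", np.arange(5))
ds.coords["node_x"] = ("node", np.arange(4))  # has no 'time' dimension
ds["potato"] = (("time", "node"), np.zeros((5, 4)))

# With the patched function above, only the data variable 'potato' is
# checked, so the coordinate 'node_x' no longer triggers an error.
_validate_region(ds, {"time": slice(0, 2)})
```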

reactions:
{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 0
}
issue: to_zarr: region not recognised as dataset dimensions (1077079208)
id: 1031773761
html_url: https://github.com/pydata/xarray/issues/6069#issuecomment-1031773761
issue_url: https://api.github.com/repos/pydata/xarray/issues/6069
node_id: IC_kwDOAMm_X849f55B
user: observingClouds (43613877)
created_at: 2022-02-07T18:19:08Z
updated_at: 2022-02-07T18:19:08Z
author_association: CONTRIBUTOR

Hi @Boorhin, I just ran into the same issue. The region argument has to be of type slice; in your case `slice(t)` instead of just `t` works:

```python
import xarray as xr
from datetime import datetime, timedelta
import numpy as np

dt = datetime.now()
times = np.arange(dt, dt + timedelta(days=6), timedelta(hours=1))
nodesx, nodesy, layers = np.arange(10, 50), np.arange(10, 50) + 15, np.arange(10)
ds = xr.Dataset()
ds.coords['time'] = ('time', times)
ds.coords['node_x'] = ('node', nodesx)
ds.coords['node_y'] = ('node', nodesy)
ds.coords['layer'] = ('layer', layers)
outfile = 'my_zarr'
varnames = ['potato', 'banana', 'apple']
for var in varnames:
    ds[var] = (('time', 'layer', 'node'), np.zeros((len(times), len(layers), len(nodesx))))
ds.to_zarr(outfile, mode='a')
for t in range(len(times)):
    for var in varnames:
        ds[var].isel(time=slice(t)).values += np.random.random((len(layers), len(nodesx)))
    ds.isel(time=slice(t)).to_zarr(outfile, region={"time": slice(t)})
```

This, however, leads to another issue:

```python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-52-bb3d2c1adc12> in <module>
     18     for var in varnames:
     19         ds[var].isel(time=slice(t)).values += np.random.random((len(layers),len(nodesx)))
---> 20     ds.isel(time=slice(t)).to_zarr(outfile, region={"time": slice(t)})

~/.local/lib/python3.8/site-packages/xarray/core/dataset.py in to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks)
   2029             encoding = {}
   2030 
-> 2031         return to_zarr(
   2032             self,
   2033             store=store,

~/.local/lib/python3.8/site-packages/xarray/backends/api.py in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks)
   1359 
   1360     if region is not None:
-> 1361         _validate_region(dataset, region)
   1362     if append_dim is not None and append_dim in region:
   1363         raise ValueError(

~/.local/lib/python3.8/site-packages/xarray/backends/api.py in _validate_region(ds, region)
   1272     ]
   1273     if non_matching_vars:
-> 1274         raise ValueError(
   1275             f"when setting `region` explicitly in to_zarr(), all "
   1276             f"variables in the dataset to write must have at least "

ValueError: when setting `region` explicitly in to_zarr(), all variables in the dataset to write must have at least one dimension in common with the region's dimensions ['time'], but that is not the case for some variables here. To drop these variables from this dataset before exporting to zarr, write: .drop(['node_x', 'node_y', 'layer'])
```

Here, however, the solution is provided by the error message itself. Following its instructions, the snippet below finally works (as far as I can tell):

```python
import xarray as xr
from datetime import datetime, timedelta
import numpy as np

dt = datetime.now()
times = np.arange(dt, dt + timedelta(days=6), timedelta(hours=1))
nodesx, nodesy, layers = np.arange(10, 50), np.arange(10, 50) + 15, np.arange(10)
ds = xr.Dataset()
ds.coords['time'] = ('time', times)
# ds.coords['node_x'] = ('node', nodesx)
# ds.coords['node_y'] = ('node', nodesy)
# ds.coords['layer'] = ('layer', layers)
outfile = 'my_zarr'
varnames = ['potato', 'banana', 'apple']
for var in varnames:
    ds[var] = (('time', 'layer', 'node'), np.zeros((len(times), len(layers), len(nodesx))))
ds.to_zarr(outfile, mode='a')
for t in range(len(times)):
    for var in varnames:
        ds[var].isel(time=slice(t)).values += np.random.random((len(layers), len(nodesx)))
    ds.isel(time=slice(t)).to_zarr(outfile, region={"time": slice(t)})
```
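Equivalently, one could keep the coordinate assignments from the first snippet and follow the error message's suggestion literally, dropping the offending coordinates only at write time (a sketch, reusing the names from the snippets above):

```python
# Keep node_x, node_y and layer in ds, but drop them from the slice
# that is written to the region, as the error message suggests.
ds.isel(time=slice(t)).drop(["node_x", "node_y", "layer"]).to_zarr(
    outfile, region={"time": slice(t)}
)
```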

Maybe one would like to generalise `region` in api.py to allow for single indices, or to throw a hint in case a type other than a slice is provided.
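A hypothetical sketch of what such a generalisation could look like (`_normalize_region` is an invented helper name, not xarray's actual API):

```python
def _normalize_region(region):
    """Accept bare integers in ``region`` by widening them to 1-element slices."""
    normalized = {}
    for dim, sel in region.items():
        if isinstance(sel, int):
            # single index -> equivalent one-element slice
            normalized[dim] = slice(sel, sel + 1)
        elif isinstance(sel, slice):
            normalized[dim] = sel
        else:
            raise TypeError(
                f"all values in ``region`` must be slices or ints, got {type(sel)} "
                f"for {dim!r}; hint: wrap a single index i as slice(i, i + 1)"
            )
    return normalized

print(_normalize_region({"time": 3}))            # {'time': slice(3, 4, None)}
print(_normalize_region({"time": slice(0, 4)}))  # {'time': slice(0, 4, None)}
```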

Cheers

reactions:
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: to_zarr: region not recognised as dataset dimensions (1077079208)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);