
issues


3 rows where type = "issue" and user = 4801430, sorted by updated_at descending

863332419 · MDU6SXNzdWU4NjMzMzI0MTk= · issue #5197: allow you to raise error on missing zarr chunks with open_dataset/open_zarr
user: bolliger32 (4801430) · state: closed (not_planned) · locked: 0 · comments: 1 · author_association: CONTRIBUTOR
created_at: 2021-04-21T00:00:03Z · updated_at: 2023-11-24T22:14:18Z · closed_at: 2023-11-24T22:14:18Z · repo: xarray (13221727)

Is your feature request related to a problem? Please describe. Currently, if a zarr store has a missing chunk, that chunk is treated as all-missing. This is upstream behavior, but there may soon be a kwarg allowing you to raise an error instead in these cases (https://github.com/zarr-developers/zarr-python/pull/489). This is valuable when you would like to distinguish intentional NaN data from I/O errors that caused some chunks never to be written. Here's an example of a problematic case (courtesy of @delgadom):

```python
import xarray as xr
import numpy as np

xr.Dataset(
    {'myarr': (('x', 'y'), [[0., np.nan], [2., 3.]]), 'x': [0, 1], 'y': [0, 1]}
).chunk({'x': 1, 'y': 1}).to_zarr('myzarr.zarr');

print('\n\ndata read into xarray\n' + '-'*30)
print(xr.open_zarr('myzarr.zarr').compute().myarr)

print('\n\nstructure of zarr store\n' + '-'*30)
! ls -R myzarr.zarr

print('\n\nremove a chunk\n' + '-'*30 + '\nrm myzarr.zarr/myarr/1.0')
! rm myzarr.zarr/myarr/1.0

print('\n\ndata read into xarray\n' + '-'*30)
print(xr.open_zarr('myzarr.zarr').compute().myarr)
```

This prints:

```
data read into xarray
------------------------------
<xarray.DataArray 'myarr' (x: 2, y: 2)>
array([[ 0., nan],
       [ 2.,  3.]])
Coordinates:
  * x        (x) int64 0 1
  * y        (y) int64 0 1

structure of zarr store
------------------------------
myzarr.zarr:
myarr  x  y

myzarr.zarr/myarr:
0.0  0.1  1.0  1.1

myzarr.zarr/x:
0

myzarr.zarr/y:
0

remove a chunk
------------------------------
rm myzarr.zarr/myarr/1.0

data read into xarray
------------------------------
<xarray.DataArray 'myarr' (x: 2, y: 2)>
array([[ 0., nan],
       [nan,  3.]])
Coordinates:
  * x        (x) int64 0 1
  * y        (y) int64 0 1
```

Describe the solution you'd like. I'm not sure where a kwarg to the __init__ method of a zarr Array object would come into play within open_zarr or open_dataset (once https://github.com/zarr-developers/zarr-python/pull/489 is merged), but I figured I'd ask to see if anyone could point me in the right direction, and to get ready for when that zarr feature exists. Happy to file a PR once I know where I'm looking; I couldn't figure it out from some initial browsing.
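Until that upstream kwarg exists, one stopgap is to check for the chunk keys yourself before reading. Below is a minimal sketch, assuming zarr-python 2.x, the default "." dimension separator, and the store layout from the example above; find_missing_chunks is a hypothetical helper, not part of xarray or zarr:

```python
import itertools
import zarr

def find_missing_chunks(store_path, var):
    """Return the chunk keys of `var` that are absent from the store."""
    grp = zarr.open(store_path, mode="r")
    arr = grp[var]
    missing = []
    # cdata_shape gives the number of chunks along each dimension
    for idx in itertools.product(*(range(n) for n in arr.cdata_shape)):
        key = f"{var}/{'.'.join(map(str, idx))}"  # e.g. 'myarr/1.0'
        if key not in arr.store:
            missing.append(key)
    return missing

missing = find_missing_chunks("myzarr.zarr", "myarr")
if missing:
    raise IOError(f"zarr store is missing chunks: {missing}")
```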

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5197/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
868352536 · MDU6SXNzdWU4NjgzNTI1MzY= · issue #5219: Zarr encoding attributes persist after slicing data, raising error on `to_zarr`
user: bolliger32 (4801430) · state: open · locked: 0 · comments: 9 · author_association: CONTRIBUTOR
created_at: 2021-04-27T01:34:52Z · updated_at: 2022-12-06T16:16:20Z · repo: xarray (13221727)

What happened: Opened a dataset using open_zarr, sliced the dataset, and then tried to resave to a zarr store using to_zarr.

What you expected to happen: The file would save without needing to explicitly modify any encoding dictionary values.

Minimal Complete Verifiable Example:

```python
import xarray as xr

ds = xr.Dataset({"data": (("dimA",), [10, 20, 30, 40])}, coords={"dimA": [1, 2, 3, 4]})
ds = ds.chunk({"dimA": 2})
ds.to_zarr("test.zarr", consolidated=True, mode="w")

ds2 = xr.open_zarr("test.zarr", consolidated=True).sel(dimA=[1, 3]).persist()
ds2.to_zarr("test2.zarr", consolidated=True, mode="w")
```

This raises:

```
NotImplementedError: Specified zarr chunks encoding['chunks']=(2,) for variable named 'data' would overlap multiple dask chunks ((1, 1),). This is not implemented in xarray yet. Consider either rechunking using `chunk()` or instead deleting or modifying `encoding['chunks']`.
```

Anything else we need to know?:

Not sure if there is a good way around this (or perhaps this is even desired behavior?), but figured I would flag it as it seemed unexpected and took us a second to diagnose. Once you've loaded the data from a zarr store, I feel like the default behavior should probably be to forget the encodings used to save that zarr, treating the in-memory dataset object just like any other in-memory dataset object that could have been loaded from any source. But maybe I'm in the minority or missing some nuance about why you'd want the encoding to hang around.
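Until then, a common workaround is to drop the stale chunk encoding before writing, so that to_zarr derives the output chunks from the in-memory dask layout instead. A minimal sketch, assuming the ds2 from the example above:

```python
# Discard the chunk layout inherited from the source zarr store; with the
# "chunks" key gone, to_zarr falls back to the current dask chunking.
for name in ds2.variables:
    ds2[name].encoding.pop("chunks", None)

ds2.to_zarr("test2.zarr", consolidated=True, mode="w")
```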

Environment:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.89+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.17.0
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.6.2
netCDF4: 1.5.6
pydap: installed
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: 2.7.1
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.2.2
cfgrib: 0.9.9.0
iris: 3.0.1
bottleneck: 1.3.2
dask: 2021.04.1
distributed: 2021.04.1
matplotlib: 3.4.1
cartopy: 0.19.0
seaborn: 0.11.1
numbagg: None
pint: 0.17
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: 6.2.3
IPython: 7.22.0
sphinx: 3.5.4
```

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5219/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
540578460 · MDU6SXNzdWU1NDA1Nzg0NjA= · issue #3648: combine_by_coords should allow for missing panels in hypercube
user: bolliger32 (4801430) · state: closed (completed) · locked: 0 · comments: 0 · author_association: CONTRIBUTOR
created_at: 2019-12-19T21:29:02Z · updated_at: 2019-12-24T13:46:28Z · closed_at: 2019-12-24T13:46:28Z · repo: xarray (13221727)

MCVE Code Sample

```python
import numpy as np
import xarray as xr

x1 = xr.Dataset(
    {"temperature": (("y", "x"), 20 * np.random.rand(6).reshape(2, 3))},
    coords={"y": [0, 1], "x": [10, 20, 30]},
)
x2 = xr.Dataset(
    {"temperature": (("y", "x"), 20 * np.random.rand(6).reshape(2, 3))},
    coords={"y": [2, 3], "x": [10, 20, 30]},
)
x3 = xr.Dataset(
    {"temperature": (("y", "x"), 20 * np.random.rand(6).reshape(2, 3))},
    coords={"y": [2, 3], "x": [40, 50, 60]},
)
xr.combine_by_coords([x1, x2, x3])
```

Expected Output

```
<xarray.Dataset>
Dimensions:      (x: 6, y: 4)
Coordinates:
  * x            (x) int64 10 20 30 40 50 60
  * y            (y) int64 0 1 2 3
Data variables:
    temperature  (y, x) float64 14.11 19.19 10.77 nan ... 4.86 10.57 4.38 15.09
```

Problem Description

Currently, it throws the following error:

```
ValueError: The supplied objects do not form a hypercube because sub-lists do not have consistent lengths along dimension0
```

This is because combine_by_coords calls xr.core.combine._check_shape_tile_ids, which mandates that the passed datasets form a complete hypercube. This check function also serves the purpose of mandating that the dimension depths are the same. Could we pull that part out as a separate function and, for combine_by_coords, call only that first part but NOT mandate that the hypercube is complete? The expected behavior, in my mind, should be to simply replace the missing tiles of the hypercube with fill_value, as sketched below. I'll file a PR to this effect and welcome comments.
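For reference, here is what the proposed behavior would look like with the datasets above, assuming combine_by_coords fills the incomplete hypercube using its fill_value argument; this is a sketch of the intended result, not the behavior at the time of filing:

```python
# The tile at (y=[0, 1], x=[40, 50, 60]) was never supplied, so those
# cells would be filled with fill_value rather than raising ValueError.
combined = xr.combine_by_coords([x1, x2, x3], fill_value=np.nan)
print(combined.temperature.sel(y=0, x=40).item())  # nan
```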

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Dec  6 2019, 08:54:18) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.14.150+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.1

xarray: 0.14.1
pandas: 0.25.3
numpy: 1.17.3
scipy: 1.3.2
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: 2.3.2
cftime: 1.0.4.2
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.1.1
cfgrib: None
iris: 2.2.0
bottleneck: None
dask: 2.8.1
distributed: 2.8.1
matplotlib: 3.1.2
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 42.0.2.post20191201
pip: 19.3.1
conda: 4.8.0
pytest: 5.3.1
IPython: 7.10.1
sphinx: 2.2.2
```

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3648/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

Table schema:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
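
Given this schema, the rows on this page correspond to a query like the one below. This is a sketch assuming the data lives in a local SQLite copy of the database, here hypothetically named github.db:

```python
import sqlite3

# Reproduce this page's row selection: issues filed by user 4801430,
# most recently updated first (see the row summary at the top).
conn = sqlite3.connect("github.db")  # hypothetical local database file
rows = conn.execute(
    """
    SELECT number, title, state, updated_at
    FROM issues
    WHERE type = 'issue' AND user = 4801430
    ORDER BY updated_at DESC
    """
).fetchall()
for number, title, state, updated_at in rows:
    print(f"#{number} [{state}] {title} (updated {updated_at})")
```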