issues

2 rows where state = "open", type = "issue" and user = 11750960 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
614785886 MDU6SXNzdWU2MTQ3ODU4ODY= 4046 automatic chunking of zarr archive apatlpo 11750960 open 0     3 2020-05-08T14:42:00Z 2023-01-18T21:54:42Z   CONTRIBUTOR      

I store data that is not chunked in a zarr archive, and the resulting zarr archive is chunked. This may be a simple usage question: I don't know how to turn this behavior off.

Code sample

Here is a minimal example that reproduces the issue:

```python
import os
import numpy as np
import xarray as xr

# Build an in-memory (non-dask) dataset, so it has no chunks
ds = xr.DataArray(np.ones((200,800))).rename('foo').to_dataset()
print('Initial chunks = {}'.format(ds.foo.chunks))
ds.to_zarr('test.zarr', mode='w')
print('zarr archives contains: {}'.format(os.listdir('test.zarr/foo')))
# Reopen the archive and inspect the chunks
ds = xr.open_zarr('test.zarr')
print('Final chunks = {}'.format(ds.foo.chunks))
```

returns:

```
Initial chunks = None
zarr archives contains: ['.zarray', '.zattrs', '0.0', '0.1', '1.0', '1.1']
Final chunks = ((100, 100), (400, 400))
```
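The four chunk files in the listing above are what one would expect if the (200, 800) array were split into (100, 400) chunks. The following is a small pure-Python sketch of that arithmetic, not zarr's own code; the helper `chunk_keys` is hypothetical and assumes the dot-separated `i.j` key naming visible in the listing.

```python
import itertools
import math


def chunk_keys(shape, chunks):
    """Return 'i.j'-style chunk keys for an array of `shape` split into `chunks`.

    Hypothetical helper for illustration; it only mimics the file names
    seen in the directory listing.
    """
    # Number of chunks along each dimension (the last chunk may be partial)
    counts = [math.ceil(n / c) for n, c in zip(shape, chunks)]
    # One key per combination of chunk indices
    index_ranges = (range(k) for k in counts)
    return [".".join(map(str, idx)) for idx in itertools.product(*index_ranges)]


keys = chunk_keys((200, 800), (100, 400))
print(keys)  # ['0.0', '0.1', '1.0', '1.1']

# A single chunk covering the whole array would give exactly one key
single = chunk_keys((200, 800), (200, 800))
print(single)  # ['0.0']
```

An archive that is "not chunked" in the sense of this issue would therefore contain a single data file, `0.0`, next to the metadata files.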

Expected Output

I would expect the archive not to be chunked.

Versions

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 23:03:20) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.12.53-60.30-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.4

xarray: 0.15.2.dev29+g6048356
pandas: 1.0.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: 1.1.1.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.13.0
distributed: 2.13.0
matplotlib: 3.2.1
cartopy: 0.17.0
seaborn: 0.10.0
numbagg: None
pint: None
setuptools: 46.1.3.post20200325
pip: 20.0.2
conda: None
pytest: None
IPython: 7.13.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4046/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
283518232 MDU6SXNzdWUyODM1MTgyMzI= 1795 open_mfdataset concat_dim chunk apatlpo 11750960 open 0     2 2017-12-20T10:34:58Z 2020-01-07T16:19:39Z   CONTRIBUTOR      

open_mfdataset does not allow chunking along concat_dim.

As a result, if a user wants specific chunking along that dimension, it may be best not to pass chunks at the open_mfdataset stage and to rechunk the variables afterwards. This would be the case, for example, if chunks are large across files but small within files: https://github.com/apatlpo/lops-array/blob/master/sandbox/natl60_tseries_debug.ipynb
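The effect described here can be illustrated with a toy model of the chunk sizes along the concatenation dimension. This is a pure-Python sketch under the assumption that, at open time, no chunk spans two files; the helpers `split_length` and `chunks_per_file` and the file lengths are hypothetical, and this is not xarray's or dask's actual implementation.

```python
def split_length(length, chunk):
    """Split `length` elements into pieces of at most `chunk` elements."""
    full, rest = divmod(length, chunk)
    return (chunk,) * full + ((rest,) if rest else ())


def chunks_per_file(file_lengths, chunk):
    """Chunk sizes along the concatenation dimension when each file is
    chunked on its own, so that no chunk crosses a file boundary."""
    sizes = ()
    for n in file_lengths:
        sizes += split_length(n, chunk)
    return sizes


# Hypothetical layout: three files with 24 steps each along concat_dim
file_lengths = [24, 24, 24]

# Requesting chunks of 72 at open time still yields one chunk per file
at_open = chunks_per_file(file_lengths, 72)
print(at_open)  # (24, 24, 24)

# Rechunking the concatenated dimension afterwards can merge across files
after_rechunk = split_length(sum(file_lengths), 72)
print(after_rechunk)  # (72,)
```

In xarray, the second step would correspond to calling `.chunk(...)` on the dataset returned by `open_mfdataset`.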

I believe this is difficult to anticipate for new users (like me).

Couldn't this be specified in the documentation of open_mfdataset?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1795/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
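The filter shown at the top of this page (state = "open", type = "issue", user = 11750960, sorted by updated_at descending) can be reproduced against this schema. The sketch below uses Python's standard `sqlite3` module with an in-memory database and a trimmed subset of the columns, without the foreign keys; the two rows are the ones listed above, and the exact SQL that Datasette generates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed subset of the [issues] schema, enough for the filter and sort
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [state] TEXT,
       [updated_at] TEXT,
       [type] TEXT
    )
    """
)
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

rows = [
    (283518232, 1795, "open_mfdataset concat_dim chunk",
     11750960, "open", "2020-01-07T16:19:39Z", "issue"),
    (614785886, 4046, "automatic chunking of zarr archive",
     11750960, "open", "2023-01-18T21:54:42Z", "issue"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort correctly as strings
query = """
    SELECT [number], [title]
    FROM [issues]
    WHERE [state] = ? AND [type] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
"""
result = conn.execute(query, ("open", "issue", 11750960)).fetchall()
for number, title in result:
    print(number, title)
```

Because `updated_at` is stored as TEXT in a fixed-width UTC format, a plain string sort gives the same order as a chronological sort.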