
issue_comments


4 rows where author_association = "MEMBER", issue = 342531772 and user = 1197350, sorted by updated_at descending. All four comments are by rabernat on "zarr and xarray chunking compatibility and `to_zarr` performance" (pydata/xarray#2300).

805883595 · rabernat (1197350) · MEMBER · 2021-03-24T14:48:55Z
https://github.com/pydata/xarray/issues/2300#issuecomment-805883595

In #5056, I have implemented the solution of deleting chunks from encoding when chunk() is called on a variable. A review of that PR would be welcome.
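A minimal sketch of what that approach might look like (hypothetical; the actual implementation in #5056 may differ in detail):

```python
import xarray as xr

# Hypothetical helper: rechunk a dataset and drop any stale zarr 'chunks'
# hints from each variable's encoding, since they describe the old layout
# and would otherwise conflict with the new dask chunks at write time.
def chunk_and_drop_stale_encoding(ds: xr.Dataset, chunks: dict) -> xr.Dataset:
    rechunked = ds.chunk(chunks)
    for var in rechunked.variables.values():
        var.encoding.pop("chunks", None)
    return rechunked
```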

Reactions: 2 (heart: 2)
790088409 · rabernat (1197350) · MEMBER · 2021-03-03T21:55:44Z
https://github.com/pydata/xarray/issues/2300#issuecomment-790088409

> alternatively `to_zarr` could ignore `encoding["chunks"]` when the data is already chunked?

I would not favor that. A user may choose to define their desired zarr chunks by putting this information in encoding. In this case, it's good to raise the error. (This is the case I had in mind when I wrote this code.)

The problem here is that encoding is often being carried over from the original dataset and persisted across operations that change chunk size.
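For concreteness, the intentional case described above might look like this (a sketch; the array and chunk sizes are illustrative):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"foo": (["bar"], np.zeros(100_000))})

# The user deliberately requests 10,000-element zarr chunks via encoding;
# silently ignoring this request would write chunks they did not ask for,
# so raising on a conflict is the safer behavior.
ds.foo.encoding["chunks"] = (10_000,)
ds.to_zarr("intentional.zarr", mode="w")
```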

Reactions: none
789974968 · rabernat (1197350) · MEMBER · 2021-03-03T18:54:43Z
https://github.com/pydata/xarray/issues/2300#issuecomment-789974968

I think we are all in agreement. Just waiting for someone to make a PR. It's probably just a few lines of code changes.

Reactions: none
598790404 · rabernat (1197350) · MEMBER · 2020-03-13T15:51:54Z
https://github.com/pydata/xarray/issues/2300#issuecomment-598790404

Hi all. I am looking into this issue, trying to figure out if it is still a thing. I just tried @chrisbarber's MRE above using xarray v0.15:

```python
import numpy as np  # needed for np.zeros; missing from the original snippet
import xarray as xr

ds = xr.Dataset({'foo': (['bar'], np.zeros((505359,)))})
ds.to_zarr('test.zarr', mode='w')
ds2 = xr.open_zarr('test.zarr')
ds2.to_zarr('test2.zarr', mode='w')
```

This now works without error, thanks to #2487.

I can trigger the error in a third step:

```python
ds3 = ds2.chunk({'bar': 10000})
ds3.to_zarr('test3.zarr', mode='w')
```

which raises:

```
NotImplementedError: Specified zarr chunks (63170,) would overlap multiple dask chunks ((10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 5359),). This is not implemented in xarray yet. Consider rechunking the data using `chunk()` or specifying different chunks in encoding.
```

The problem is that, even though we rechunked the data, the `chunks` key is still present in `encoding`:

```python
>>> print(ds3.foo.encoding)
{'chunks': (63170,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, '_FillValue': nan, 'dtype': dtype('float64')}
```

This was populated when the variable was read from test.zarr.

As a workaround, you can delete the encoding (either just the `chunks` key or all of it):

```python
ds3.foo.encoding = {}
ds3.to_zarr('test3.zarr', mode='w')
```

This allows the operation to complete successfully.

For all the users stuck on this problem (e.g. @abarciauskas-bgse):

  • update to the latest version of xarray, and then
  • delete the encoding on your variables, or overwrite it with the `encoding` keyword in `to_zarr` (see the sketch after this list).
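Continuing the example above, the second option might look like this (a sketch; the chunk size shown is illustrative and must match the variable's actual dask chunks):

```python
# Override the stale encoding at write time instead of clearing it:
ds3.to_zarr("test3.zarr", mode="w",
            encoding={"foo": {"chunks": (10_000,)}})
```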

For xarray developers, the question is whether the chunk() method should delete existing chunks attributes from encoding.

Reactions: 3 (+1: 3)

Table schema:
```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```