issue_comments

5 rows where author_association = "MEMBER", issue = 837243943 ("Zarr chunking fixes"), and user = 1217238 (shoyer), sorted by updated_at descending

shoyer (MEMBER) · 2021-03-31T21:35:11Z
https://github.com/pydata/xarray/pull/5065#issuecomment-811481334

Why is chunk getting called here? Does it actually get called every time we load a dataset with chunks? If so, we will need a more sophisticated solution.

This happens specifically on this line: https://github.com/pydata/xarray/blob/ddc352faa6de91f266a1749773d08ae8d6f09683/xarray/core/dataset.py#L438

So perhaps it would make sense to copy encoding specifically in this case, e.g.:

```python
new_var = var.chunk(chunks, name=name2, lock=lock)
new_var.encoding = var.encoding
```
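
For context, a minimal sketch of the user-facing path that reaches the dataset.py line referenced above (the file path and variable name are placeholders, not from the PR):

```python
import xarray as xr

# Passing chunks= makes xarray call chunk() internally on each variable,
# which appears to be how the dataset.py call site above is reached --
# i.e., chunk() can run on load without the user ever calling it directly.
ds = xr.open_dataset("example.nc", chunks={"time": 100})  # placeholder file
print(ds["air"].encoding)  # "air" is a hypothetical variable name
```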

shoyer (MEMBER) · 2021-03-31T20:54:46Z
https://github.com/pydata/xarray/pull/5065#issuecomment-811458761

Hmm. I would also be happy with explicitly deleting chunks from encoding for now. It's not adding a lot of technical debt.

In the long term, the whole handling of encoding should be revisited, e.g., see https://github.com/pydata/xarray/issues/5082
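
For illustration, a user-level sketch of the same idea, explicitly deleting the stale chunks entry before writing (store paths and the dimension name are placeholders):

```python
import xarray as xr

ds = xr.open_zarr("source.zarr")      # placeholder store
ds = ds.chunk({"time": 100})          # re-chunk; old encoding may linger
for var in ds.variables.values():
    var.encoding.pop("chunks", None)  # explicitly delete chunks from encoding
ds.to_zarr("dest.zarr")               # placeholder store
```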

shoyer (MEMBER) · 2021-03-25T17:26:46Z
https://github.com/pydata/xarray/pull/5065#issuecomment-807140762

FWIW, I would also favor dropping encoding['chunks'] after indexing, coarsening, interpolating, etc. Basically anything that changes the array shape or chunk structure.

We already drop all of encoding after indexing. My guess is that we do the same for coarsening and interpolation as well (though I haven't checked).
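
A quick check of the indexing behavior described above (dimension and variable names are arbitrary):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"x": ("t", np.arange(10))})
ds["x"].encoding["chunks"] = (5,)  # pretend this came from a Zarr store

# Per the comment above, indexing drops all of encoding, so the stale
# chunk hint should not survive the shape change.
print(ds.isel(t=slice(0, 3))["x"].encoding)  # expected: {}
```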

shoyer (MEMBER) · 2021-03-25T17:08:09Z
https://github.com/pydata/xarray/pull/5065#issuecomment-807111762

Xarray knows to drop the dtype encoding after an arithmetic operation. How does that work? To me, .chunk feels like a similar case: an operation that invalidates any existing encoding.

To be honest, the existing convention is quite ad hoc, just based on what seemed most appropriate at the time.

https://github.com/pydata/xarray/issues/1614 is the most comprehensive description of the current state of things.

We were considering saying that attrs and encoding should always use the same rules, but perhaps we should be more aggressive about dropping encoding.
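
To make the arithmetic convention concrete, a small example (the int16 hint is an arbitrary stand-in for what a file reader might set):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4.0), dims="x")
da.encoding["dtype"] = "int16"  # e.g. a packing hint set by a file reader

# Arithmetic returns a fresh object; as described above, xarray drops the
# dtype encoding here, so the result starts with empty encoding.
print((da + 1).encoding)  # expected: {}
```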

shoyer (MEMBER) · 2021-03-24T20:10:19Z
https://github.com/pydata/xarray/pull/5065#issuecomment-806154872

I'm a little conflicted about dealing with encoding['chunks'] specifically in chunk():

  • On one hand, it feels inconsistent for only this single method in xarray to modify part of encoding. Nothing else in xarray (after CF decoding) does this. Effectively, encoding['chunks'] is now becoming part of xarray's data model.
  • On the other hand, this would absolutely fix a recurrent pain point for users, and in that sense it's worth doing.

Maybe this isn't such a big deal in this particular case, especially if we don't think we would need to add such encoding-specific logic to any other methods. But are we really sure about that -- what about cases like indexing?

I guess the other alternative is to make chunk() and various other methods that would change chunking drop encoding entirely. I don't know if this would really be a better comprehensive solution (I know dropping attrs is much hated), but at least it's an easier mental model.
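
A minimal sketch of that "drop everything" alternative (the helper name is hypothetical, not xarray API): any re-chunking operation clears encoding wholesale instead of surgically removing encoding['chunks']:

```python
def chunk_and_drop_encoding(ds, chunks):
    """Hypothetical helper: re-chunk a Dataset and clear encoding entirely,
    the 'easier mental model' alternative discussed above."""
    new_ds = ds.chunk(chunks)
    for var in new_ds.variables.values():
        var.encoding = {}  # nothing stale survives a chunk change
    return new_ds
```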


Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);