issue_comments

id: 1301171446
html_url: https://github.com/pydata/xarray/issues/7248#issuecomment-1301171446
issue_url: https://api.github.com/repos/pydata/xarray/issues/7248
node_id: IC_kwDOAMm_X85Njkz2
user: matthewyakubiw (92732695)
created_at: 2022-11-02T20:09:36Z
updated_at: 2022-11-02T20:09:36Z
author_association: NONE
body:

> The size of the data seems consistent with that (i.e. 2*27*4*3/2*48*8MB=125GB, though not exactly 58GB, looking at the difference in the "Array Shape" between the two images)

That would be correct if we had added that many 8MB datasets, but we added only one! I'd expect the entire store to be roughly that size (8MB) before compression, since the code example above adds just one data array to the Zarr store. Could you clarify the calculation 2*27*4*3/2*48*8MB=125GB?
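For reference, the quoted expression can be evaluated directly. This is only the arithmetic as written; reading each factor as a per-dimension growth ratio is an assumption, since the quote does not say which factor belongs to which dimension.

```python
# Factors copied from the quoted expression 2*27*4*3/2*48*8MB.
# Interpreting them as per-dimension growth ratios is an assumption.
factors = [2, 27, 4, 3 / 2, 48]
source_mb = 8  # size of the single source array, in MB

multiplier = 1.0
for factor in factors:
    multiplier *= factor

inflated_mb = multiplier * source_mb
inflated_gb = inflated_mb / 1000  # decimal gigabytes

print(f"multiplier: {multiplier:.0f}x")
print(f"inflated size: {inflated_mb:.0f} MB = {inflated_gb:.1f} GB")
```

The product is 15552 x 8MB = 124416MB, or about 124GB, which matches the quoted ~125GB figure but describes one array stretched over a grid 15552 times larger, not 15552 separate 8MB datasets.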

> Does looking at the values of the dimension that's grown to size 48 help: are those possibly different, even though you don't intend them to be, and causing it to be huge (but sparse!)?
>
> Does that make sense?

The dimension sizes have all grown because we initialize the Zarr store with the full set of values each coordinate might take (I read somewhere that this is what you should do: initialize an empty Zarr store with the coords/dimensions you will need). I then figured I needed to align the dimensions of the .nc file with those of the Zarr store, so that the coordinates and dimensions line up when the data array is added. This is why we see the dimensions grow.
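A minimal sketch of the mechanism described above, using a small made-up template and source (the names `template`, `source`, `temp`, and all sizes are illustrative, not taken from the issue). Aligning a small dataset against a template that carries every possible coordinate value reindexes the data onto the full grid and pads the new positions with NaN, so the in-memory array grows even though only the original values are real.

```python
import numpy as np
import xarray as xr

# Hypothetical template holding every coordinate value the store may ever see.
template = xr.Dataset(coords={"time": np.arange(48), "level": np.arange(4)})

# Hypothetical source file contents: one time step, two levels.
source = xr.Dataset(
    {"temp": (("time", "level"), np.ones((1, 2)))},
    coords={"time": [5], "level": [0, 1]},
)

# join="right" reindexes `source` onto the template's coordinates,
# filling positions that have no source data with NaN.
aligned, _ = xr.align(source, template, join="right")

print("source: ", source["temp"].shape, source["temp"].nbytes, "bytes")
print("aligned:", aligned["temp"].shape, aligned["temp"].nbytes, "bytes")
print("non-NaN values after align:", int(aligned["temp"].notnull().sum()))
```

In this toy case the array goes from 2 values to 192, of which only 2 are not NaN. The same effect at the scale of the real coordinates would plausibly account for the growth seen in the "Array Shape" comparison.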

Is this the correct approach? Because to_zarr() only lets you append along one dimension, appending new data to a Zarr store seems to require that all other dimensions/coords already exist, with matching sizes, in both the store and the individual Dataset being appended. This is what drove the decision to use xarray.align.

Without aligning, I got error messages complaining that the dimension sizes were different. Thanks again for your help!

issue: xarray.align Inflating Source NetCDF Data (1433534927)
