
issue_comments


5 rows where issue = 1030811490 and user = 1197350 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1315553661 https://github.com/pydata/xarray/issues/5878#issuecomment-1315553661 https://api.github.com/repos/pydata/xarray/issues/5878 IC_kwDOAMm_X85OacF9 rabernat 1197350 2022-11-15T16:22:30Z 2022-11-15T16:22:30Z MEMBER

Your issue is that the consolidated metadata have not been updated:

```python
import gcsfs
fs = gcsfs.GCSFileSystem()

# the latest array metadata
print(fs.cat('gs://ldeo-glaciology/append_test/test30/temperature/.zarray').decode())
# -> "shape": [ 6 ]

# the consolidated metadata
print(fs.cat('gs://ldeo-glaciology/append_test/test30/.zmetadata').decode())
# -> "shape": [ 3 ]
```

There are two ways to fix this:

  1. Don't use consolidated metadata on read (this will be a bit slower):

     ```python
     ds = xr.open_dataset('gs://ldeo-glaciology/append_test/test30', engine='zarr', consolidated=False)
     ```

  2. Reconsolidate your metadata after appending (see the sketch below this list): https://zarr.readthedocs.io/en/stable/tutorial.html#consolidating-metadata
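
A minimal sketch of option 2, assuming the same `test30` paths as above and zarr v2's `consolidate_metadata` API:

```python
import gcsfs
import zarr

# open the same store the append wrote to
fs = gcsfs.GCSFileSystem()
store = fs.get_mapper('gs://ldeo-glaciology/append_test/test30')

# rewrite .zmetadata from the current per-array metadata
zarr.consolidate_metadata(store)
```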
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  problem appending to zarr on GCS when using json token  1030811490
969021506 https://github.com/pydata/xarray/issues/5878#issuecomment-969021506 https://api.github.com/repos/pydata/xarray/issues/5878 IC_kwDOAMm_X845whhC rabernat 1197350 2021-11-15T15:25:37Z 2021-11-15T15:25:46Z MEMBER

So there are two layers here where caching could be happening:

  • gcsfs / fsspec (python)
  • GCS itself

I propose we eliminate the python layer entirely for the moment. Whenever you load the dataset, its shape is completely determined by whatever zarr sees in `gs://ldeo-glaciology/append_test/test5/temperature/.zarray`. So try looking at this file directly. You can figure out its public URL and just use curl, e.g.:

```
$ curl https://storage.googleapis.com/ldeo-glaciology/append_test/test5/temperature/.zarray
{
    "chunks": [ 3 ],
    "compressor": { "blocksize": 0, "clevel": 5, "cname": "lz4", "id": "blosc", "shuffle": 1 },
    "dtype": "<i8",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [ 6 ],
    "zarr_format": 2
}
```

Run this from the command line on your JupyterHub. Then try `gcs.cat('ldeo-glaciology/append_test/test5/temperature/.zarray')` and see if you see the same thing. Basically, eliminate as many layers as possible from the problem until you get to the core issue.
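
A minimal sketch of that layer-by-layer check, assuming the public `test5` paths above (the `curl` call goes straight to GCS; the `gcs.cat` call goes through gcsfs):

```python
import subprocess
import gcsfs

url = 'https://storage.googleapis.com/ldeo-glaciology/append_test/test5/temperature/.zarray'
path = 'ldeo-glaciology/append_test/test5/temperature/.zarray'

# layer 1: GCS directly, bypassing all python-side caching
print(subprocess.run(['curl', '-s', url], capture_output=True, text=True).stdout)

# layer 2: the same object through gcsfs
gcs = gcsfs.GCSFileSystem()
print(gcs.cat(path).decode())
```

If the two outputs disagree, the discrepancy is in the python layer; if they agree but are both stale, the caching is happening on the GCS side.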

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  problem appending to zarr on GCS when using json token  1030811490
967363845 https://github.com/pydata/xarray/issues/5878#issuecomment-967363845 https://api.github.com/repos/pydata/xarray/issues/5878 IC_kwDOAMm_X845qM0F rabernat 1197350 2021-11-12T19:18:38Z 2021-11-12T19:18:38Z MEMBER

Ok, I think I may understand what is happening.

```python
# load the zarr store
ds_both = xr.open_zarr(mapper)
```

When you do this, zarr reads a file called gs://ldeo-glaciology/append_test/test5/temperature/.zarray. Since the data are public, I can look at it right now:

```
$ gsutil cat gs://ldeo-glaciology/append_test/test5/temperature/.zarray
{
    "chunks": [ 3 ],
    "compressor": { "blocksize": 0, "clevel": 5, "cname": "lz4", "id": "blosc", "shuffle": 1 },
    "dtype": "<i8",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [ 6 ],
    "zarr_format": 2
}
```

Right now, it shows the shape is [6], as expected after appending. However, if you read the file immediately after appending (within the 3600s max-age), you will get the cached copy. The cached copy will still have shape [3]; it won't know about the append.

To test this hypothesis, you would need to disable caching on the bucket. Do you have privileges to do that?
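
If you do have those privileges, here is a minimal sketch of disabling caching on the metadata object. It assumes the google-cloud-storage client, which is not something used elsewhere in this thread:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket('ldeo-glaciology')
blob = bucket.get_blob('append_test/test5/temperature/.zarray')

# 'no-store' tells GCS not to serve cached copies of this object
blob.cache_control = 'no-store'
blob.patch()  # push the metadata change to the bucket
```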

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  problem appending to zarr on GCS when using json token  1030811490
967142419 https://github.com/pydata/xarray/issues/5878#issuecomment-967142419 https://api.github.com/repos/pydata/xarray/issues/5878 IC_kwDOAMm_X845pWwT rabernat 1197350 2021-11-12T14:05:36Z 2021-11-12T14:05:36Z MEMBER

Can you post the full stack trace of the error you get when you try to append?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  problem appending to zarr on GCS when using json token  1030811490
966665066 https://github.com/pydata/xarray/issues/5878#issuecomment-966665066 https://api.github.com/repos/pydata/xarray/issues/5878 IC_kwDOAMm_X845niNq rabernat 1197350 2021-11-11T22:17:32Z 2021-11-11T22:17:32Z MEMBER

I think that this is not an issue with xarray, zarr, or anything in the Python world, but rather an issue with how caching works on GCS public buckets: https://cloud.google.com/storage/docs/metadata

To test this, forget about xarray and zarr for a minute and just use gcsfs to list the bucket contents before and after your writes. I think you will find that the default cache lifetime of 3600 seconds means that you cannot "see" the changes to the bucket or the objects as quickly as needed in order to append.
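
A minimal sketch of that check, assuming the `append_test` prefix used elsewhere in this thread (substitute your own bucket path):

```python
import gcsfs

gcs = gcsfs.GCSFileSystem()

# listing before the write
print(gcs.ls('ldeo-glaciology/append_test/test5'))

# ... perform the initial write and the append here ...

# drop gcsfs's own directory-listing cache, then list again
gcs.invalidate_cache()
print(gcs.ls('ldeo-glaciology/append_test/test5'))
```

If the second listing is still stale even after `invalidate_cache()`, the staleness is coming from GCS itself rather than from the python layer.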

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  problem appending to zarr on GCS when using json token  1030811490


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);