issue_comments
16 rows where issue = 1030811490 sorted by updated_at descending
Issue: problem appending to zarr on GCS when using json token (pydata/xarray#5878, id 1030811490)

jkingslake (NONE) · 2022-11-15T22:04:20Z · https://github.com/pydata/xarray/issues/5878#issuecomment-1315919098

This is my latest attempt to avoid the cache issue. It is not working, but I wanted to document it here for the next time this comes up.

1. Run the following in a local jupyter notebook:

```python
import fsspec
import xarray as xr
import json
import gcsfs

# define a mapper to the ldeo-glaciology bucket - needs a token
with open('/Users/jkingslake/Documents/misc/ldeo-glaciology-bc97b12df06b.json') as token_file:
    token = json.load(token_file)

filename = 'gs://ldeo-glaciology/append_test/test56'
mapper = fsspec.get_mapper(filename, mode='w', token=token)

# define two simple datasets
ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': [1, 2, 3]})
ds1 = xr.Dataset({'temperature': (['time'], [53, 54, 55])}, coords={'time': [4, 5, 6]})

# write the first ds to the bucket
ds0.to_zarr(mapper)
```

2. Run the following in a local terminal:
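The terminal command did not survive the export; given the rest of the thread, it was presumably a `gsutil` call disabling caching on the newly written store, something like:

```bash
# hypothetical reconstruction - "no-store" is an assumption; any value that
# defeats the default 3600 s public-bucket caching would do
gsutil -m setmeta -h "Cache-Control:no-store" "gs://ldeo-glaciology/append_test/test56/**"
```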
3. Run the following in the local notebook:

```python
# append the second ds to the same zarr store
ds1.to_zarr(mapper, mode='a', append_dim='time')

ds = xr.open_dataset('gs://ldeo-glaciology/append_test/test56', engine='zarr', consolidated=False)
len(ds.time)
```

This returns 3. At least it sometimes does; sometimes it works later, and sometimes it works immediately.
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |

---

jkingslake (NONE) · 2022-11-15T21:45:22Z · https://github.com/pydata/xarray/issues/5878#issuecomment-1315901990

Thanks @rabernat. It appears that the default caching metadata on each object in a bucket overrides any argument you send when loading. But following this
https://stackoverflow.com/questions/52499015/set-metadata-for-all-objects-in-a-bucket-in-google-cloud-storage
I can turn off caching for all objects in the bucket with a bulk `setmeta`.
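The exact command is stripped from the export; following that Stack Overflow answer, it is presumably along the lines of:

```bash
# hypothetical reconstruction - applies Cache-Control to every existing object
gsutil -m setmeta -h "Cache-Control:no-cache" "gs://ldeo-glaciology/**"
```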
But I don't think this affects new objects. So when writing new objects that I want to append to, maybe the approach is to write the first one, then turn off caching for that object, then continue to append.
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |

---

rabernat (MEMBER) · 2022-11-15T16:22:30Z · https://github.com/pydata/xarray/issues/5878#issuecomment-1315553661

Your issue is that the consolidated metadata have not been updated:

```python
import gcsfs
fs = gcsfs.GCSFileSystem()

# the latest array metadata
print(fs.cat('gs://ldeo-glaciology/append_test/test30/temperature/.zarray').decode())
# -> "shape": [ 6 ]

# the consolidated metadata
print(fs.cat('gs://ldeo-glaciology/append_test/test30/.zmetadata').decode())
# -> "shape": [ 3 ]
```

There are two ways to fix this.
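The two fixes are cut off in the export. Based on the rest of the thread, they are presumably: skip the consolidated metadata when reading, or re-consolidate it after appending. A sketch, assuming the zarr v2 API and the `mapper` defined earlier in the thread:

```python
import xarray as xr
import zarr

# option 1: ignore the stale .zmetadata and read the live array metadata
ds = xr.open_zarr(mapper, consolidated=False)

# option 2: rewrite .zmetadata so it matches the appended arrays
zarr.consolidate_metadata(mapper)
ds = xr.open_zarr(mapper, consolidated=True)
```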
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |

---

jkingslake (NONE) · 2022-11-15T15:57:55Z · https://github.com/pydata/xarray/issues/5878#issuecomment-1315516624

Coming back to this a year later, I am still having the same issue. Running gsutil locally shows shape 6, whereas running fsspec on leap-pangeo shows only shape 3:

```python
import fsspec
import xarray as xr
import json
import gcsfs

mapper = fsspec.get_mapper('gs://ldeo-glaciology/append_test/test30', mode='r')
ds_both = xr.open_zarr(mapper)
len(ds_both.temperature)
```

And trying to append using a new toy dataset written from leap-pangeo has the same issue. Any ideas on what to try next?
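The gsutil check itself is missing from the export; it was presumably a direct read of the array metadata, along the lines of:

```bash
# hypothetical reconstruction of the local check
gsutil cat gs://ldeo-glaciology/append_test/test30/temperature/.zarray
```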
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |

---

jhamman (MEMBER) · 2021-11-15T23:45:20Z · https://github.com/pydata/xarray/issues/5878#issuecomment-969446050

Thought I would drop a related note here. Gcsfs just added support for fixed-key metadata: https://github.com/fsspec/gcsfs/pull/429. So if you are testing out different fsspec/gcsfs options for caching, make sure you are using a version that includes it.
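A hypothetical sketch of using that option; the `fixed_key_metadata` name is taken from the linked PR, but check the gcsfs docs for the exact signature:

```python
import gcsfs

fs = gcsfs.GCSFileSystem()
# set Cache-Control on an existing object through fixed-key metadata
# (path and value are illustrative assumptions)
fs.setxattrs(
    'ldeo-glaciology/append_test/test30/.zmetadata',
    fixed_key_metadata={'cache_control': 'no-cache'},
)
```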
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |

---

jkingslake (NONE) · 2021-11-15T23:18:46Z · https://github.com/pydata/xarray/issues/5878#issuecomment-969427141

but I am now really confused, because running the same check now returns:

```
6
```

@porterdf did you disable caching when you wrote the first zarr? How did you do that exactly?
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |

---

jkingslake (NONE) · 2021-11-15T23:15:38Z · https://github.com/pydata/xarray/issues/5878#issuecomment-969423439

1. In the jupyterhub (pangeo) command line with gsutil.
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |

---

rabernat (MEMBER) · 2021-11-15T15:25:37Z · https://github.com/pydata/xarray/issues/5878#issuecomment-969021506

So there are two layers here where caching could be happening:

- gcsfs / fsspec (python)
- gcs itself

I propose we eliminate the python layer entirely for the moment. Whenever you load the dataset, its shape is completely determined by whatever zarr sees in `.zmetadata`. Run this from the jupyterhub command line, then try loading again.
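The command itself is stripped from the export; given the preceding sentence, it was presumably a direct read of the consolidated metadata, e.g. (with a hypothetical store name):

```bash
# hypothetical reconstruction; <store> stands in for the test store of the day
gsutil cat gs://ldeo-glaciology/append_test/<store>/.zmetadata
```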
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |

---

jkingslake (NONE) · 2021-11-15T14:59:33Z · https://github.com/pydata/xarray/issues/5878#issuecomment-968994399

Thanks for looking into this.

@porterdf you should have full permissions to do things like this. But in any case, I could only see how to change metadata for individual existing objects, not for the entire bucket. How do I edit the cache-control for the whole bucket?

I have tried writing the first dataset, then disabling caching for that object, then appending. I still do not see the full-length (shape = [6]) dataset when I reload it.
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |

---

porterdf (NONE) · 2021-11-13T23:43:17Z · https://github.com/pydata/xarray/issues/5878#issuecomment-968176008

Update: my local notebook accessing the public bucket does see the appended zarr store exactly as expected, while the 2i2c-hosted notebook still does not (it has been well over 3600 s). Also, I do as @jkingslake does above and set `cache_timeout=0`.
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |

---

porterdf (NONE) · 2021-11-12T19:40:46Z · https://github.com/pydata/xarray/issues/5878#issuecomment-967408017

Ignorant question: is this cache relevant to the client (Jupyter) side or the server (GCS) side? It has been well over 3600 s and I'm still not seeing the appended zarr when reading it in using xarray.

> To test this hypothesis, you would need to disable caching on the bucket.

I tried to do this last night but did not have permission myself. Perhaps @jkingslake does?
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |

---

rabernat (MEMBER) · 2021-11-12T19:18:38Z · https://github.com/pydata/xarray/issues/5878#issuecomment-967363845

Ok I think I may understand what is happening.

```python
# load the zarr store
ds_both = xr.open_zarr(mapper)
```

When you do this, zarr reads a file called `.zmetadata`, which GCS caches. Right now, it shows the shape is [ 3 ].

To test this hypothesis, you would need to disable caching on the bucket. Do you have privileges to do that?
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |

---

porterdf (NONE) · 2021-11-12T18:52:01Z · https://github.com/pydata/xarray/issues/5878#issuecomment-967340995

Thanks for pointing out this cache feature @rabernat. I had no idea - it makes sense in general, but it slows down testing if you don't know about it! Anyway, for my case, when appending the second Zarr store to the first, the Zarr's size (using …) …

In my instance, there is no error, only this returned: …
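The size-checking tool is elided above; a typical check of this kind would be something like (hypothetical path):

```bash
# summarize the store's total size, human-readable
gsutil du -sh gs://ldeo-glaciology/append_test/<store>
```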
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |

---

rabernat (MEMBER) · 2021-11-12T14:05:36Z · https://github.com/pydata/xarray/issues/5878#issuecomment-967142419

Can you post the full stack trace of the error you get when you try to append?
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |

---

jkingslake (NONE) · 2021-11-12T02:05:04Z · https://github.com/pydata/xarray/issues/5878#issuecomment-966758709

Thanks for taking a look @rabernat.

The code below writes a new zarr and checks immediately if it's there using gcsfs. It seems to appear within a few seconds. Is this what you meant?

```python
%%time
import fsspec
import xarray as xr
import json
import gcsfs

# define a mapper to the ldeo-glaciology bucket - needs a token
with open('../secrets/ldeo-glaciology-bc97b12df06b.json') as token_file:
    token = json.load(token_file)

# get a mapper with fsspec for a new zarr
mapper = fsspec.get_mapper('gs://ldeo-glaciology/append_test/test11', mode='w', token=token)

# check what files are in there
fs = gcsfs.GCSFileSystem(project='pangeo-integration-te-3eea', mode='ab', cache_timeout=0)
print('Files in the test directory before writing:')
filesBefore = fs.ls('gs://ldeo-glaciology/append_test/')
print(*filesBefore, sep='\n')

# define a simple dataset
ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': [1, 2, 3]})

# write the simple dataset to zarr
ds0.to_zarr(mapper)

# check to see if the new file is there
print('Files in the test directory after writing:')
filesAfter = fs.ls('gs://ldeo-glaciology/append_test/')
print(*filesAfter, sep='\n')
```
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |

---

rabernat (MEMBER) · 2021-11-11T22:17:32Z · https://github.com/pydata/xarray/issues/5878#issuecomment-966665066

I think that this is not an issue with xarray, zarr, or anything in the python world, but rather an issue with how caching works on GCS public buckets: https://cloud.google.com/storage/docs/metadata

To test this, forget about xarray and zarr for a minute and just use gcsfs to list the bucket contents before and after your writes. I think you will find that the default cache lifetime of 3600 seconds means that you cannot "see" the changes to the bucket or the objects as quickly as needed in order to append.

(Reactions: 👍 1)

---

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```
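Given this schema, the query behind the view above is presumably equivalent to:

```sql
-- select the 16 comments on this issue, newest first
SELECT * FROM issue_comments
WHERE issue = 1030811490
ORDER BY updated_at DESC;
```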