issue_comments
10 rows where author_association = "NONE" and issue = 1030811490 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1315919098 | https://github.com/pydata/xarray/issues/5878#issuecomment-1315919098 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X85Ob1T6 | jkingslake 48723181 | 2022-11-15T22:04:20Z | 2022-11-15T22:04:20Z | NONE | This is my latest attempt to avoid the cache issue. It is not working, but I wanted to document it here for the next time this comes up.
1. Run the following in a local jupyter notebook:
```
import fsspec
import xarray as xr
import json
import gcsfs

# define a mapper to the ldeo-glaciology bucket
# needs a token
with open('/Users/jkingslake/Documents/misc/ldeo-glaciology-bc97b12df06b.json') as token_file:
    token = json.load(token_file)

filename = 'gs://ldeo-glaciology/append_test/test56'
mapper = fsspec.get_mapper(filename, mode='w', token=token)

# define two simple datasets
ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': [1, 2, 3]})
ds1 = xr.Dataset({'temperature': (['time'], [53, 54, 55])}, coords={'time': [4, 5, 6]})

# write the first ds to the bucket
ds0.to_zarr(mapper)
```
2. Run the following in a local terminal
3. Run the following in the local notebook:
```
# append the second ds to the same zarr store
ds1.to_zarr(mapper, mode='a', append_dim='time')

ds = xr.open_dataset('gs://ldeo-glaciology/append_test/test56', engine='zarr', consolidated=False)
len(ds.time)
```
3

At least, it sometimes does this; sometimes it works later, and sometimes it works immediately. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |
1315901990 | https://github.com/pydata/xarray/issues/5878#issuecomment-1315901990 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X85ObxIm | jkingslake 48723181 | 2022-11-15T21:45:22Z | 2022-11-15T21:45:22Z | NONE | Thanks @rabernat. It appears from here that the default caching metadata on each object in a bucket overrides any argument you send when loading. But following this
https://stackoverflow.com/questions/52499015/set-metadata-for-all-objects-in-a-bucket-in-google-cloud-storage
I can turn off caching for all objects in the bucket with
But I don't think this affects new objects. So when writing new objects that I want to append to, maybe the approach is to write the first one, then turn off caching for that object, then continue to append. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |
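The Stack Overflow approach referenced above amounts to rewriting the `Cache-Control` metadata on the existing objects. With gsutil that might look like the following; the bucket prefix is taken from this thread, but the `no-store` value and the recursive `**` glob are assumptions, not commands quoted from the comment:

```shell
# Disable HTTP caching on every existing object under the prefix.
# -m parallelizes the operation; ** matches objects recursively.
# Note: as the comment says, objects written afterwards are NOT affected.
gsutil -m setmeta -h "Cache-Control:no-store" "gs://ldeo-glaciology/append_test/**"
```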
1315516624 | https://github.com/pydata/xarray/issues/5878#issuecomment-1315516624 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X85OaTDQ | jkingslake 48723181 | 2022-11-15T15:57:55Z | 2022-11-15T15:57:55Z | NONE | Coming back to this a year later, I am still having the same issue. Running gsutil locally
whereas running fsspec on leap-pangeo shows only shape 3:
```
import fsspec
import xarray as xr
import json
import gcsfs

mapper = fsspec.get_mapper('gs://ldeo-glaciology/append_test/test30', mode='r')
ds_both = xr.open_zarr(mapper)
len(ds_both.temperature)
```
And trying to append using a new toy dataset written from leap-pangeo has the same issue. Any ideas on what to try next? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |
969427141 | https://github.com/pydata/xarray/issues/5878#issuecomment-969427141 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X845yEjF | jkingslake 48723181 | 2021-11-15T23:18:46Z | 2021-11-15T23:18:46Z | NONE | but now I am really confused because
```
6
```
@porterdf did you disable caching when you wrote the first zarr? How did you do that exactly? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |
969423439 | https://github.com/pydata/xarray/issues/5878#issuecomment-969423439 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X845yDpP | jkingslake 48723181 | 2021-11-15T23:15:38Z | 2021-11-15T23:15:38Z | NONE | 1. In the jupyterhub (pangeo) command line with
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |
968994399 | https://github.com/pydata/xarray/issues/5878#issuecomment-968994399 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X845wa5f | jkingslake 48723181 | 2021-11-15T14:59:33Z | 2021-11-15T14:59:33Z | NONE | thanks for looking into this.
@porterdf you should have full permissions to do things like this. But in any case, I could only see how to change metadata for individual existing objects rather than the entire bucket. How do I edit the cache-control for the whole bucket? I have tried writing the first dataset, then disabling caching for that object, then appending. I still do not see the full-length (shape = [6]) dataset when I reload it. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |
968176008 | https://github.com/pydata/xarray/issues/5878#issuecomment-968176008 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X845tTGI | porterdf 7237617 | 2021-11-13T23:43:17Z | 2021-11-13T23:44:27Z | NONE | Update: my local notebook accessing the public bucket does see the appended zarr store exactly as expected, while the 2i2c-hosted notebook still does not (it has been well over 3600s). Also, I do as @jkingslake does above and set the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |
967408017 | https://github.com/pydata/xarray/issues/5878#issuecomment-967408017 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X845qXmR | porterdf 7237617 | 2021-11-12T19:40:46Z | 2021-11-13T23:25:53Z | NONE |
Ignorant question: is this cache relevant to client (Jupyter) side or server (GCS) side? It has been well over 3600s and I'm still not seeing the appended zarr when reading it in using Xarray.
I tried to do this last night but did not have permission myself. Perhaps @jkingslake does? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |
967340995 | https://github.com/pydata/xarray/issues/5878#issuecomment-967340995 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X845qHPD | porterdf 7237617 | 2021-11-12T18:52:01Z | 2021-11-12T18:58:52Z | NONE | Thanks for pointing out this cache feature @rabernat. I had no idea - it makes sense in general but slows down testing if not known about! Anyway, in my case, when appending the second Zarr store to the first, the Zarr's size (using
In my instance, there is no error, only this returned: |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |
966758709 | https://github.com/pydata/xarray/issues/5878#issuecomment-966758709 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X845n5E1 | jkingslake 48723181 | 2021-11-12T02:05:04Z | 2021-11-12T02:05:04Z | NONE | Thanks for taking a look @rabernat. The code below writes a new zarr and checks immediately if it's there using gcsfs. It seems to appear within a few seconds. Is this what you meant?
```
%%time
import fsspec
import xarray as xr
import json
import gcsfs

# define a mapper to the ldeo-glaciology bucket - needs a token
with open('../secrets/ldeo-glaciology-bc97b12df06b.json') as token_file:
    token = json.load(token_file)

# get a mapper with fsspec for a new zarr
mapper = fsspec.get_mapper('gs://ldeo-glaciology/append_test/test11', mode='w', token=token)

# check what files are in there
fs = gcsfs.GCSFileSystem(project='pangeo-integration-te-3eea', mode='ab', cache_timeout=0)
print('Files in the test directory before writing:')
filesBefore = fs.ls('gs://ldeo-glaciology/append_test/')
print(*filesBefore, sep='\n')

# define a simple dataset
ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': [1, 2, 3]})

# write the simple dataset to zarr
ds0.to_zarr(mapper)

# check to see if the new file is there
print('Files in the test directory after writing:')
filesAfter = fs.ls('gs://ldeo-glaciology/append_test/')
print(*filesAfter, sep='\n')
```
 |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 |
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);