issue_comments
7 rows where issue = 1030811490 and user = 48723181 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1315919098 | https://github.com/pydata/xarray/issues/5878#issuecomment-1315919098 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X85Ob1T6 | jkingslake 48723181 | 2022-11-15T22:04:20Z | 2022-11-15T22:04:20Z | NONE | This is my latest attempt to avoid the cache issue. It is not working, but I wanted to document it here for the next time this comes up.
1. Run the following in a local jupyter notebook:
```
import fsspec
import xarray as xr
import json
import gcsfs

# define a mapper to the ldeo-glaciology bucket
# needs a token
with open('/Users/jkingslake/Documents/misc/ldeo-glaciology-bc97b12df06b.json') as token_file:
    token = json.load(token_file)

filename = 'gs://ldeo-glaciology/append_test/test56'
mapper = fsspec.get_mapper(filename, mode='w', token=token)

# define two simple datasets
ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': [1, 2, 3]})
ds1 = xr.Dataset({'temperature': (['time'], [53, 54, 55])}, coords={'time': [4, 5, 6]})

# write the first ds to bucket
ds0.to_zarr(mapper)
```
2. Run the following in a local terminal.
3. Run the following in the local notebook:
```
# append the second ds to the same zarr store
ds1.to_zarr(mapper, mode='a', append_dim='time')

ds = xr.open_dataset('gs://ldeo-glaciology/append_test/test56', engine='zarr', consolidated=False)
len(ds.time)
```
```
3
```
At least, it sometimes returns 3; sometimes it works later, and sometimes it works immediately. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |
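The stale read above is consistent with gcsfs caching directory listings. As a minimal sketch of one way to rule that out (this is an editor's illustration, not something from the thread; it assumes the `token` loaded in the notebook cell above, and uses the `cache_timeout=0` argument that appears later in this thread to disable listing reuse):
```
import gcsfs
import xarray as xr

# hypothetical re-read with listing caching disabled (cache_timeout=0)
fs = gcsfs.GCSFileSystem(token=token, cache_timeout=0)
mapper = fs.get_mapper('ldeo-glaciology/append_test/test56')
ds = xr.open_zarr(mapper, consolidated=False)
print(len(ds.time))  # 6 if the append is visible
```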
1315901990 | https://github.com/pydata/xarray/issues/5878#issuecomment-1315901990 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X85ObxIm | jkingslake 48723181 | 2022-11-15T21:45:22Z | 2022-11-15T21:45:22Z | NONE | Thanks @rabernat. It appears from here that the default caching metadata on each object in a bucket overrides any argument you send when loading. But following this
https://stackoverflow.com/questions/52499015/set-metadata-for-all-objects-in-a-bucket-in-google-cloud-storage
I can turn off caching for all existing objects in the bucket.
But I don't think this affects new objects. So when writing new objects that I want to append to, maybe the approach is to write the first one, then turn off caching for that object, then continue to append. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |
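A minimal sketch of the per-object plan above (write first, disable caching on those objects, then append). This uses the `google-cloud-storage` client, which is not used elsewhere in this thread and is an assumption here; the bucket and prefix are taken from the comments above:
```
from google.cloud import storage

# hypothetical: set Cache-Control: no-store on every existing object under the
# store's prefix, following the approach in the linked Stack Overflow answer
client = storage.Client.from_service_account_json(
    '/Users/jkingslake/Documents/misc/ldeo-glaciology-bc97b12df06b.json')
for blob in client.list_blobs('ldeo-glaciology', prefix='append_test/test56'):
    blob.cache_control = 'no-store'
    blob.patch()  # push the metadata change to GCS
```
New objects written by a later append would still get the default metadata, so the sweep would need to be re-run after each write.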
1315516624 | https://github.com/pydata/xarray/issues/5878#issuecomment-1315516624 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X85OaTDQ | jkingslake 48723181 | 2022-11-15T15:57:55Z | 2022-11-15T15:57:55Z | NONE | Coming back to this a year later, I am still having the same issue. Running gsutil locally shows the full appended dataset, whereas running fsspec on leap-pangeo shows only shape 3:
```
import fsspec
import xarray as xr
import json
import gcsfs

mapper = fsspec.get_mapper('gs://ldeo-glaciology/append_test/test30', mode='r')
ds_both = xr.open_zarr(mapper)
len(ds_both.temperature)
```
And trying to append using a new toy dataset written from leap-pangeo has the same issue. Any ideas on what to try next? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |
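When two clients disagree like this, it can be worth forcing gcsfs to drop any cached listings before re-reading. A sketch (an editor's illustration, assuming only the standard `invalidate_cache` method that fsspec filesystems expose):
```
import gcsfs
import xarray as xr

# hypothetical: drop cached listings, then re-open the store fresh
fs = gcsfs.GCSFileSystem()
fs.invalidate_cache('ldeo-glaciology/append_test/test30')
ds_both = xr.open_zarr(fs.get_mapper('ldeo-glaciology/append_test/test30'))
print(len(ds_both.temperature))
```
Note that fsspec also caches filesystem *instances*, so a mapper obtained via `fsspec.get_mapper` may be backed by a previously created (and previously cached) filesystem object.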
969427141 | https://github.com/pydata/xarray/issues/5878#issuecomment-969427141 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X845yEjF | jkingslake 48723181 | 2021-11-15T23:18:46Z | 2021-11-15T23:18:46Z | NONE | but I now am really confused because the same check now returns:
```
6
```
@porterdf did you disable caching when you wrote the first zarr? How did you do that exactly? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |
969423439 | https://github.com/pydata/xarray/issues/5878#issuecomment-969423439 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X845yDpP | jkingslake 48723181 | 2021-11-15T23:15:38Z | 2021-11-15T23:15:38Z | NONE | 1. In the jupyterhub (pangeo) command line.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |
968994399 | https://github.com/pydata/xarray/issues/5878#issuecomment-968994399 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X845wa5f | jkingslake 48723181 | 2021-11-15T14:59:33Z | 2021-11-15T14:59:33Z | NONE | Thanks for looking into this.
@porterdf you should have full permissions to do things like this. But in any case, I could only see how to change metadata for individual existing objects rather than for the entire bucket. How do I edit the cache-control for the whole bucket? I have tried writing the first dataset, then disabling caching for that object, then appending. I still do not see the full-length (shape = [6]) dataset when I reload it. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 | |
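When a change like this doesn't seem to take effect, it may be worth verifying what metadata the objects actually carry. A minimal sketch, again assuming the `google-cloud-storage` client (an editor's illustration, not part of the thread):
```
from google.cloud import storage

# hypothetical check: print the Cache-Control actually stored on each object
client = storage.Client.from_service_account_json(
    '/Users/jkingslake/Documents/misc/ldeo-glaciology-bc97b12df06b.json')
for blob in client.list_blobs('ldeo-glaciology', prefix='append_test/'):
    blob.reload()  # fetch the current metadata from GCS
    print(blob.name, blob.cache_control)
```
There is no single bucket-level Cache-Control setting for existing objects; the usual approach (per the Stack Overflow link earlier in the thread) is to update every object, which is what the loop above does for one prefix.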
966758709 | https://github.com/pydata/xarray/issues/5878#issuecomment-966758709 | https://api.github.com/repos/pydata/xarray/issues/5878 | IC_kwDOAMm_X845n5E1 | jkingslake 48723181 | 2021-11-12T02:05:04Z | 2021-11-12T02:05:04Z | NONE | Thanks for taking a look @rabernat. The code below writes a new zarr and checks immediately if it's there using gcsfs. It seems to appear within a few seconds. Is this what you meant?
```
%%time
import fsspec
import xarray as xr
import json
import gcsfs

# define a mapper to the ldeo-glaciology bucket - needs a token
with open('../secrets/ldeo-glaciology-bc97b12df06b.json') as token_file:
    token = json.load(token_file)

# get a mapper with fsspec for a new zarr
mapper = fsspec.get_mapper('gs://ldeo-glaciology/append_test/test11', mode='w', token=token)

# check what files are in there
fs = gcsfs.GCSFileSystem(project='pangeo-integration-te-3eea', mode='ab', cache_timeout=0)
print('Files in the test directory before writing:')
filesBefore = fs.ls('gs://ldeo-glaciology/append_test/')
print(*filesBefore, sep='\n')

# define a simple dataset
ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': [1, 2, 3]})

# write the simple dataset to zarr
ds0.to_zarr(mapper)

# check to see if the new file is there
print('Files in the test directory after writing:')
filesAfter = fs.ls('gs://ldeo-glaciology/append_test/')
print(*filesAfter, sep='\n')
```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
problem appending to zarr on GCS when using json token 1030811490 |
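Since this thread mixes consolidated and unconsolidated reads (`consolidated=False` appears above), one low-cost thing to try after an append is re-consolidating the store's metadata, so readers that consult `.zmetadata` see the new length. A sketch, assuming zarr's standard `consolidate_metadata` convenience function and a `mapper` pointing at the store that was appended to (an editor's illustration, not something attempted in the thread):
```
import zarr
import xarray as xr

# hypothetical: rewrite .zmetadata after appending, so consolidated readers
# see the appended dimension size rather than the pre-append one
zarr.consolidate_metadata(mapper)
ds = xr.open_zarr(mapper, consolidated=True)
print(len(ds.time))
```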
```
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```