issue_comments

16 rows where issue = 1030811490 sorted by updated_at descending

Issue: problem appending to zarr on GCS when using json token (pydata/xarray#5878)
jkingslake (NONE) · 2022-11-15T22:04:20Z · https://github.com/pydata/xarray/issues/5878#issuecomment-1315919098

This is my latest attempt to avoid the cache issue. It is not working. But I wanted to document it here for the next time this comes up.

1. Run the following in a local jupyter notebook

```python
import fsspec
import xarray as xr
import json
import gcsfs

# define a mapper to the ldeo-glaciology bucket
# needs a token
with open('/Users/jkingslake/Documents/misc/ldeo-glaciology-bc97b12df06b.json') as token_file:
    token = json.load(token_file)

filename = 'gs://ldeo-glaciology/append_test/test56'

mapper = fsspec.get_mapper(filename, mode='w', token=token)

# define two simple datasets
ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': [1, 2, 3]})
ds1 = xr.Dataset({'temperature': (['time'], [53, 54, 55])}, coords={'time': [4, 5, 6]})

# write the first ds to bucket
ds0.to_zarr(mapper)
```

2. Run the following in a local terminal, to turn off caching for this zarr store and all the files associated with it:

```
gsutil setmeta -h "Cache-Control:no-store" gs://ldeo-glaciology/append_test/test56/**
```

3. Run the following in the local notebook:

```python
# append the second ds to the same zarr store
ds1.to_zarr(mapper, mode='a', append_dim='time')
ds = xr.open_dataset('gs://ldeo-glaciology/append_test/test56', engine='zarr', consolidated=False)
len(ds.time)
```

```
3
```

At least, it sometimes does this; sometimes it works later, and sometimes it works immediately.
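As a quick sanity check that the `setmeta` call actually took effect, the object metadata can be inspected from python. A sketch (the object name and the exact key gcsfs reports are assumptions; check against your store):

```python
import gcsfs

fs = gcsfs.GCSFileSystem()

# inspect one object's metadata; if the gsutil setmeta worked, the GCS
# object resource should carry the Cache-Control value we set
info = fs.info('ldeo-glaciology/append_test/test56/.zmetadata')
print(info.get('cacheControl'))  # hoping for 'no-store'
```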

jkingslake (NONE) · 2022-11-15T21:45:22Z · https://github.com/pydata/xarray/issues/5878#issuecomment-1315901990

Thanks @rabernat.

Using `consolidated=False` when reading seems to work, but not immediately after the append, and there is very strange behavior where the size of the dataset changes each time you read it. So maybe this is the cache issue again.

It appears from here that the default caching metadata on each object in a bucket overrides any argument you send when loading.

But following this: https://stackoverflow.com/questions/52499015/set-metadata-for-all-objects-in-a-bucket-in-google-cloud-storage, I can turn off caching for all objects in the bucket with:

```
gsutil setmeta -h "Cache-Control:no-store" gs://ldeo-glaciology/**
```

But I don't think this affects new objects.

So when writing new objects that I want to append to, maybe the approach is to write the first one, then turn off caching for that object, then continue to append.
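Spelled out, that write-then-setmeta-then-append sequence might look like the following sketch (it reuses the toy datasets from above; `test57` is a hypothetical store name and credentials are omitted):

```python
import subprocess

import fsspec
import xarray as xr

ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': [1, 2, 3]})
ds1 = xr.Dataset({'temperature': (['time'], [53, 54, 55])}, coords={'time': [4, 5, 6]})

store = 'gs://ldeo-glaciology/append_test/test57'  # hypothetical store
mapper = fsspec.get_mapper(store, mode='w')

# 1. write the first dataset
ds0.to_zarr(mapper)

# 2. turn off caching on everything just written (shelling out to gsutil)
subprocess.run(
    ['gsutil', 'setmeta', '-h', 'Cache-Control:no-store', store + '/**'],
    check=True,
)

# 3. append; later reads should no longer be served a stale cached copy
ds1.to_zarr(mapper, mode='a', append_dim='time')
```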

rabernat (MEMBER) · 2022-11-15T16:22:30Z · https://github.com/pydata/xarray/issues/5878#issuecomment-1315553661

Your issue is that the consolidated metadata have not been updated:

```python
import gcsfs
fs = gcsfs.GCSFileSystem()

# the latest array metadata
print(fs.cat('gs://ldeo-glaciology/append_test/test30/temperature/.zarray').decode())
# -> "shape": [ 6 ]

# the consolidated metadata
print(fs.cat('gs://ldeo-glaciology/append_test/test30/.zmetadata').decode())
# -> "shape": [ 3 ]
```

There are two ways to fix this.

  1. Don't use consolidated metadata on read (this will be a bit slower):

     ```python
     ds = xr.open_dataset('gs://ldeo-glaciology/append_test/test30', engine='zarr', consolidated=False)
     ```
  2. Reconsolidate your metadata after the append (https://zarr.readthedocs.io/en/stable/tutorial.html#consolidating-metadata); see the sketch below.
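For option 2, a minimal sketch of the reconsolidation step (same store path as above):

```python
import fsspec
import zarr

mapper = fsspec.get_mapper('gs://ldeo-glaciology/append_test/test30')

# rewrite .zmetadata so the consolidated copy matches the appended arrays
zarr.consolidate_metadata(mapper)
```

After this, a consolidated read (the xarray default) should see the appended shape again, modulo the GCS-side caching discussed elsewhere in this thread.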
jkingslake (NONE) · 2022-11-15T15:57:55Z · https://github.com/pydata/xarray/issues/5878#issuecomment-1315516624

Coming back to this a year later, I am still having the same issue.

Running gsutil locally:

```
gsutil cat gs://ldeo-glaciology/append_test/test30/temperature/.zarray
```

shows shape 6:

```
{
    "chunks": [ 3 ],
    "compressor": {
        "blocksize": 0,
        "clevel": 5,
        "cname": "lz4",
        "id": "blosc",
        "shuffle": 1
    },
    "dtype": "<i8",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [ 6 ],
    "zarr_format": 2
}
```

whereas running fsspec on leap-pangeo shows only shape 3:

```python
import fsspec
import xarray as xr
import json
import gcsfs

mapper = fsspec.get_mapper('gs://ldeo-glaciology/append_test/test30', mode='r')
ds_both = xr.open_zarr(mapper)
len(ds_both.temperature)
```

And trying to append using a new toy dataset written from leap-pangeo has the same issue.

Any ideas on what to try next?

jhamman (MEMBER) · 2021-11-15T23:45:20Z · https://github.com/pydata/xarray/issues/5878#issuecomment-969446050

Thought I would drop a related note here. Gcsfs just added support for fixed-key metadata: https://github.com/fsspec/gcsfs/pull/429. So if you are testing out different fsspec/gcsfs options for caching, make sure you are using gcsfs==2021.11.0.
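A rough sketch of how that option might be used when creating new objects (the `fixed_key_metadata` argument comes from the linked PR; treat the exact argument and key spellings as assumptions and check the gcsfs docs):

```python
import gcsfs

fs = gcsfs.GCSFileSystem()  # requires gcsfs >= 2021.11.0

# hypothetical: create an object with caching disabled at write time,
# rather than patching Cache-Control afterwards with gsutil setmeta
with fs.open(
    'ldeo-glaciology/append_test/test58/.zattrs',  # hypothetical path
    'wb',
    fixed_key_metadata={'cache_control': 'no-store'},
) as f:
    f.write(b'{}')
```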

jkingslake (NONE) · 2021-11-15T23:18:46Z · https://github.com/pydata/xarray/issues/5878#issuecomment-969427141

But I am now really confused, because test5 from a few days ago shows up as shape [6]:

```python
import fsspec
import xarray as xr
import json
import gcsfs

mapper = fsspec.get_mapper('gs://ldeo-glaciology/append_test/test5', mode='r')
ds_both = xr.open_zarr(mapper)
len(ds_both.time)
```

```
/tmp/ipykernel_1040/570416536.py:7: RuntimeWarning: Failed to open Zarr store with consolidated metadata, falling back to try reading non-consolidated metadata. This is typically much slower for opening a dataset. To silence this warning, consider:
1. Consolidating metadata in this existing store with zarr.consolidate_metadata().
2. Explicitly setting consolidated=False, to avoid trying to read consolidate metadata, or
3. Explicitly setting consolidated=True, to raise an error in this case instead of falling back to try reading non-consolidated metadata.
  ds_both = xr.open_zarr(mapper)

6
```

@porterdf did you disable caching when you wrote the first zarr? How did you do that exactly?

jkingslake (NONE) · 2021-11-15T23:15:38Z · https://github.com/pydata/xarray/issues/5878#issuecomment-969423439

1. In the jupyterhub (pangeo) command line with curl, I get shape [6]:

```
curl https://storage.googleapis.com/ldeo-glaciology/append_test/test30/temperature/.zarray
{
    "chunks": [ 3 ],
    "compressor": {
        "blocksize": 0,
        "clevel": 5,
        "cname": "lz4",
        "id": "blosc",
        "shuffle": 1
    },
    "dtype": "<i8",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [ 6 ],
    "zarr_format": 2
}
```

2. On my local machine using gsutil, I get shape [6]:

```
gsutil cat gs://ldeo-glaciology/append_test/test30/temperature/.zarray
{
    "chunks": [ 3 ],
    "compressor": {
        "blocksize": 0,
        "clevel": 5,
        "cname": "lz4",
        "id": "blosc",
        "shuffle": 1
    },
    "dtype": "<i8",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [ 6 ],
    "zarr_format": 2
}
```

3. When I use `fsspec` in the jupyterhub, I get something different (shape [3]):

```python
import fsspec
import xarray as xr
import json
import gcsfs

mapper = fsspec.get_mapper('gs://ldeo-glaciology/append_test/test30', mode='r')
ds_both = xr.open_zarr(mapper)
len(ds_both.time)
```

```
3
```

4. Using gcsfs in the jupyterhub, I get shape [3]:

```python
gcs = gcsfs.GCSFileSystem(project='pangeo-integration-te-3eea')
gcs.cat('ldeo-glaciology/append_test/test5/temperature/.zarray')
```

```
b'{\n "chunks": [\n 3\n ],\n "compressor": {\n "blocksize": 0,\n "clevel": 5,\n "cname": "lz4",\n "id": "blosc",\n "shuffle": 1\n },\n "dtype": "<i8",\n "fill_value": null,\n "filters": null,\n "order": "C",\n "shape": [\n 3\n ],\n "zarr_format": 2\n}'
```

rabernat (MEMBER) · 2021-11-15T15:25:37Z (edited 2021-11-15T15:25:46Z) · https://github.com/pydata/xarray/issues/5878#issuecomment-969021506

So there are two layers here where caching could be happening:

- gcsfs / fsspec (python)
- GCS itself

I propose we eliminate the python layer entirely for the moment. Whenever you load the dataset, its shape is completely determined by whatever zarr sees in `gs://ldeo-glaciology/append_test/test5/temperature/.zarray`. So try looking at this file directly. You can figure out its public URL and just use curl, e.g.:

```
curl https://storage.googleapis.com/ldeo-glaciology/append_test/test5/temperature/.zarray
{
    "chunks": [ 3 ],
    "compressor": {
        "blocksize": 0,
        "clevel": 5,
        "cname": "lz4",
        "id": "blosc",
        "shuffle": 1
    },
    "dtype": "<i8",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [ 6 ],
    "zarr_format": 2
}
```

Run this from the jupyterhub command line. Then try `gcs.cat('ldeo-glaciology/append_test/test5/temperature/.zarray')` and see if you see the same thing. Basically, just eliminate as many layers as possible from the problem until you get to the core issue.
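One more way to take the python layer out of the picture is to explicitly drop gcsfs's in-memory cache before re-reading; a sketch:

```python
import gcsfs

fs = gcsfs.GCSFileSystem()

# invalidate_cache() clears cached listings/metadata, so if the shape is
# still stale after this, the staleness must be coming from GCS itself
fs.invalidate_cache()
print(fs.cat('ldeo-glaciology/append_test/test5/temperature/.zarray').decode())
```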

jkingslake (NONE) · 2021-11-15T14:59:33Z · https://github.com/pydata/xarray/issues/5878#issuecomment-968994399

Thanks for looking into this.

> To test this hypothesis, you would need to disable caching on the bucket. Do you have privileges to do that?

> I tried to do this last night but did not have permission myself. Perhaps @jkingslake does?

@porterdf you should have full permissions to do things like this. But in any case, I could only see how to change metadata for individual existing objects rather than for the entire bucket. How do I edit the cache-control for the whole bucket?

I have tried writing the first dataset, then disabling caching for that object, then appending. I still do not see the full-length (shape = [6]) dataset when I reload it.

porterdf (NONE) · 2021-11-13T23:43:17Z (edited 2021-11-13T23:44:27Z) · https://github.com/pydata/xarray/issues/5878#issuecomment-968176008

Update: my local notebook accessing the public bucket does see the appended zarr store exactly as expected, while the 2i2c-hosted notebook still does not (it has been well over 3600s).

Also, I do as @jkingslake does above and set `cache_timeout=0`. The GCSFS docs say "Set cache_timeout <= 0 for no caching", which seems like the functionality we desire, yet I continue to see only the un-appended zarr.
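For concreteness, a sketch of wiring `cache_timeout=0` all the way through to the read (store path reused from earlier in the thread):

```python
import gcsfs
import xarray as xr

# cache_timeout <= 0 is documented to disable gcsfs's listings cache
fs = gcsfs.GCSFileSystem(cache_timeout=0)
mapper = fs.get_mapper('ldeo-glaciology/append_test/test30')

ds = xr.open_zarr(mapper, consolidated=False)
print(len(ds.time))
```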

porterdf (NONE) · 2021-11-12T19:40:46Z (edited 2021-11-13T23:25:53Z) · https://github.com/pydata/xarray/issues/5878#issuecomment-967408017

> Right now, it shows the shape is [6], as expected after the appending. However, if you read the file immediately after appending (within the 3600s max-age), you will get the cached copy. The cached copy will still be of shape [3]--it won't know about the append.

Ignorant question: is this cache relevant on the client (Jupyter) side or the server (GCS) side? It has been well over 3600s and I'm still not seeing the appended zarr when reading it in using xarray.

> To test this hypothesis, you would need to disable caching on the bucket. Do you have privileges to do that?

I tried to do this last night but did not have permission myself. Perhaps @jkingslake does?

rabernat (MEMBER) · 2021-11-12T19:18:38Z · https://github.com/pydata/xarray/issues/5878#issuecomment-967363845

OK, I think I may understand what is happening.

```python
# load the zarr store
ds_both = xr.open_zarr(mapper)
```

When you do this, zarr reads a file called gs://ldeo-glaciology/append_test/test5/temperature/.zarray. Since the data are public, I can look at it right now:

```
$ gsutil cat gs://ldeo-glaciology/append_test/test5/temperature/.zarray
{
    "chunks": [ 3 ],
    "compressor": {
        "blocksize": 0,
        "clevel": 5,
        "cname": "lz4",
        "id": "blosc",
        "shuffle": 1
    },
    "dtype": "<i8",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [ 6 ],
}
```

Right now, it shows the shape is [6], as expected after the appending. However, if you read the file immediately after appending (within the 3600s max-age), you will get the cached copy. The cached copy will still be of shape [3]--it won't know about the append.

To test this hypothesis, you would need to disable caching on the bucket. Do you have privileges to do that?
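In the meantime, the cache policy GCS is actually serving can be checked without any special privileges; a sketch (the header value shown is the documented default for public objects, not a measurement):

```python
import requests

# inspect the HTTP caching headers GCS returns for the public object
r = requests.head(
    'https://storage.googleapis.com/ldeo-glaciology/append_test/test5/temperature/.zarray'
)
print(r.headers.get('Cache-Control'))  # e.g. 'public, max-age=3600'
```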

porterdf (NONE) · 2021-11-12T18:52:01Z (edited 2021-11-12T18:58:52Z) · https://github.com/pydata/xarray/issues/5878#issuecomment-967340995

Thanks for pointing out this cache feature @rabernat. I had no idea - it makes sense in general, but slows down testing if you don't know about it! Anyway, for my case, when appending the second Zarr store to the first, the Zarr's size (using `gsutil du`) does indeed double. I'm new to cloud storage, but my hunch is that this suggests it was appended?
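That same size check can also be done from python without leaving the notebook; a sketch using fsspec's `du`:

```python
import gcsfs

fs = gcsfs.GCSFileSystem()

# total bytes under the store; it should roughly double after the append
print(fs.du('ldeo-glaciology/append_test/test5'))
```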

> Can you post the full stack trace of the error you get when you try to append?

In my instance, there is no error; only this is returned: `<xarray.backends.zarr.ZarrStore at 0x7f662d31f3a0>`

rabernat (MEMBER) · 2021-11-12T14:05:36Z · https://github.com/pydata/xarray/issues/5878#issuecomment-967142419

Can you post the full stack trace of the error you get when you try to append?

jkingslake (NONE) · 2021-11-12T02:05:04Z · https://github.com/pydata/xarray/issues/5878#issuecomment-966758709

Thanks for taking a look @rabernat.

The code below writes a new zarr and checks immediately if it's there using gcsfs. It seems to appear within a few seconds.

Is this what you meant?

```python
%%time
import fsspec
import xarray as xr
import json
import gcsfs

# define a mapper to the ldeo-glaciology bucket - needs a token
with open('../secrets/ldeo-glaciology-bc97b12df06b.json') as token_file:
    token = json.load(token_file)

# get a mapper with fsspec for a new zarr
mapper = fsspec.get_mapper('gs://ldeo-glaciology/append_test/test11', mode='w', token=token)

# check what files are in there
fs = gcsfs.GCSFileSystem(project='pangeo-integration-te-3eea', mode='ab', cache_timeout=0)
print('Files in the test directory before writing:')
filesBefore = fs.ls('gs://ldeo-glaciology/append_test/')
print(*filesBefore, sep='\n')

# define a simple dataset
ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': [1, 2, 3]})

# write the simple dataset to zarr
ds0.to_zarr(mapper)

# check to see if the new file is there
print('Files in the test directory after writing:')
filesAfter = fs.ls('gs://ldeo-glaciology/append_test/')
print(*filesAfter, sep='\n')
```

Output:

```
Files in the test directory before writing:
ldeo-glaciology/append_test/test1
ldeo-glaciology/append_test/test10
ldeo-glaciology/append_test/test2
ldeo-glaciology/append_test/test3
ldeo-glaciology/append_test/test4
ldeo-glaciology/append_test/test5
ldeo-glaciology/append_test/test6
ldeo-glaciology/append_test/test7
ldeo-glaciology/append_test/test8
ldeo-glaciology/append_test/test9
Files in the test directory after writing:
ldeo-glaciology/append_test/test1
ldeo-glaciology/append_test/test10
ldeo-glaciology/append_test/test11
ldeo-glaciology/append_test/test2
ldeo-glaciology/append_test/test3
ldeo-glaciology/append_test/test4
ldeo-glaciology/append_test/test5
ldeo-glaciology/append_test/test6
ldeo-glaciology/append_test/test7
ldeo-glaciology/append_test/test8
ldeo-glaciology/append_test/test9
CPU times: user 130 ms, sys: 16.5 ms, total: 146 ms
Wall time: 2.19 s
```

rabernat (MEMBER) · 2021-11-11T22:17:32Z · https://github.com/pydata/xarray/issues/5878#issuecomment-966665066

I think that this is not an issue with xarray, zarr, or anything in the python world, but rather an issue with how caching works on GCS public buckets: https://cloud.google.com/storage/docs/metadata

To test this, forget about xarray and zarr for a minute and just use gcsfs to list the bucket contents before and after your writes. I think you will find that the default cache lifetime of 3600 seconds means that you cannot "see" the changes to the bucket or the objects as quickly as needed in order to append.

👍 1

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 21.95ms · About: xarray-datasette