html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/5878#issuecomment-1315919098,https://api.github.com/repos/pydata/xarray/issues/5878,1315919098,IC_kwDOAMm_X85Ob1T6,48723181,2022-11-15T22:04:20Z,2022-11-15T22:04:20Z,NONE,"This is my latest attempt to avoid the cache issue. It is not working. But I wanted to document it here for the next time this comes up.
### 1. Run the following in a local jupyter notebook
```
import fsspec
import xarray as xr
import json
import gcsfs
## define a mapper to the ldeo-glaciology bucket
### needs a token
with open('/Users/jkingslake/Documents/misc/ldeo-glaciology-bc97b12df06b.json') as token_file:
    token = json.load(token_file)
filename = 'gs://ldeo-glaciology/append_test/test56'
mapper = fsspec.get_mapper(filename, mode='w', token=token)
## define two simple datasets
ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': [1, 2, 3]})
ds1 = xr.Dataset({'temperature': (['time'], [53, 54, 55])}, coords={'time': [4, 5, 6]})
## write the first ds to bucket
ds0.to_zarr(mapper)
```
### 2. Run the following in a local terminal
` gsutil setmeta -h ""Cache-Control:no-store"" gs://ldeo-glaciology/append_test/test56/**`
to turn off caching for this zarr store and all the files associated with it
### 3. Run the following in the local notebook
```
## append the second ds to the same zarr store
ds1.to_zarr(mapper, mode='a', append_dim='time')
ds = xr.open_dataset('gs://ldeo-glaciology/append_test/test56', engine='zarr', consolidated=False)
len(ds.time)
```
3
At least, it sometimes does this; at other times it works after a delay, and sometimes it works immediately.
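
Since the read only sometimes reflects the append, one way to quantify the delay is to re-read until the expected length shows up. This is a minimal sketch: `wait_for_length` is a hypothetical helper (not part of xarray or gcsfs), and `read_length` is whatever callable you supply to re-open the store and return its length.

```python
import time

def wait_for_length(read_length, expected, retries=5, delay=1.0, sleep=time.sleep):
    # Call read_length() up to `retries` times, pausing `delay` seconds between tries.
    # Returns the number of the attempt that saw `expected`, or None if none did.
    for attempt in range(1, retries + 1):
        if read_length() == expected:
            return attempt
        if attempt < retries:
            sleep(delay)
    return None
```

For the store above this could be called as `wait_for_length(lambda: len(xr.open_dataset(filename, engine='zarr', consolidated=False).time), expected=6)`. Whether re-reading ever gets past a cached response is exactly what is in question here, so this only measures the delay; it does not fix anything.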
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1030811490
https://github.com/pydata/xarray/issues/5878#issuecomment-1315901990,https://api.github.com/repos/pydata/xarray/issues/5878,1315901990,IC_kwDOAMm_X85ObxIm,48723181,2022-11-15T21:45:22Z,2022-11-15T21:45:22Z,NONE,"Thanks @rabernat.
Using `consolidated=False` when reading seems to work, but not immediately after the append, and there is very strange behavior where the size of the dataset changes each time you read it. So maybe this is the cache issue again.
It appears from [here](https://cloud.google.com/storage/docs/metadata#cache-control) that the default caching metadata on each object in a bucket overrides any argument you send when loading.
But following this
https://stackoverflow.com/questions/52499015/set-metadata-for-all-objects-in-a-bucket-in-google-cloud-storage
I can turn off caching for all objects in the bucket with
``` gsutil setmeta -h ""Cache-Control:no-store"" gs://ldeo-glaciology/**```
But I don't think this affects new objects.
So when writing new objects that I want to append to, maybe the approach is to write the first one, then turn off caching for that object, then continue to append.
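
A rough sketch of that middle step, so it can be run from the notebook between the first write and the append. It just wraps the gsutil call above for a single store; `no_store_command` and `disable_caching` are hypothetical helper names, and it assumes gsutil is installed and authenticated locally.

```python
import subprocess

def no_store_command(store_url):
    # Build the gsutil call that sets Cache-Control:no-store on every object under the store.
    return ['gsutil', 'setmeta', '-h', 'Cache-Control:no-store', store_url.rstrip('/') + '/**']

def disable_caching(store_url, run=subprocess.run):
    # Run the command; check=True raises CalledProcessError if gsutil exits non-zero.
    return run(no_store_command(store_url), check=True)
```

The order would then be: write the first dataset with `to_zarr`, call `disable_caching` on the store URL, then append. Objects created by the append itself would presumably still get the default caching metadata, so the call may need repeating after each append.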
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1030811490
https://github.com/pydata/xarray/issues/5878#issuecomment-1315516624,https://api.github.com/repos/pydata/xarray/issues/5878,1315516624,IC_kwDOAMm_X85OaTDQ,48723181,2022-11-15T15:57:55Z,2022-11-15T15:57:55Z,NONE,"Coming back to this a year later, I am still having the same issue.
Running gsutil locally
```
gsutil cat gs://ldeo-glaciology/append_test/test30/temperature/.zarray
```
shows shape 6:
```
{
""chunks"": [
3
],
""compressor"": {
""blocksize"": 0,
""clevel"": 5,
""cname"": ""lz4"",
""id"": ""blosc"",
""shuffle"": 1
},
""dtype"": "" >To test this hypothesis, you would need to disable caching on the bucket. Do you have privileges to do that?
>I tried to do this last night but did not have permission myself. Perhaps @jkingslake does?
@porterdf you should have full permissions to do things like this. But in any case, I could only see how to change metadata for individual existing objects rather than the entire bucket.
How do I edit the cache-control for the whole bucket?
I have tried writing the first dataset, then disabling caching for that object, then appending. I still do not see the full-length (shape = [6]) dataset when I reload it. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1030811490
https://github.com/pydata/xarray/issues/5878#issuecomment-966758709,https://api.github.com/repos/pydata/xarray/issues/5878,966758709,IC_kwDOAMm_X845n5E1,48723181,2021-11-12T02:05:04Z,2021-11-12T02:05:04Z,NONE,"Thanks for taking a look @rabernat.
The code below writes a new zarr and checks immediately if it's there using gcsfs. It seems to appear within a few seconds.
Is this what you meant?
```
%%time
import fsspec
import xarray as xr
import json
import gcsfs
# define a mapper to the ldeo-glaciology bucket. - needs a token
with open('../secrets/ldeo-glaciology-bc97b12df06b.json') as token_file:
    token = json.load(token_file)
# get a mapper with fsspec for a new zarr
mapper = fsspec.get_mapper('gs://ldeo-glaciology/append_test/test11', mode='w', token=token)
# check what files are in there
fs = gcsfs.GCSFileSystem(project='pangeo-integration-te-3eea', mode='ab', cache_timeout = 0)
print('Files in the test directory before writing:')
filesBefore = fs.ls('gs://ldeo-glaciology/append_test/')
print(*filesBefore,sep='\n')
# define a simple dataset
ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': [1, 2, 3]})
# write the simple dataset to zarr
ds0.to_zarr(mapper)
# check to see if the new file is there
print('Files in the test directory after writing:')
filesAfter = fs.ls('gs://ldeo-glaciology/append_test/')
print(*filesAfter,sep='\n')
```
```
Output:
Files in the test directory before writing:
ldeo-glaciology/append_test/test1
ldeo-glaciology/append_test/test10
ldeo-glaciology/append_test/test2
ldeo-glaciology/append_test/test3
ldeo-glaciology/append_test/test4
ldeo-glaciology/append_test/test5
ldeo-glaciology/append_test/test6
ldeo-glaciology/append_test/test7
ldeo-glaciology/append_test/test8
ldeo-glaciology/append_test/test9
Files in the test directory after writing:
ldeo-glaciology/append_test/test1
ldeo-glaciology/append_test/test10
ldeo-glaciology/append_test/test11
ldeo-glaciology/append_test/test2
ldeo-glaciology/append_test/test3
ldeo-glaciology/append_test/test4
ldeo-glaciology/append_test/test5
ldeo-glaciology/append_test/test6
ldeo-glaciology/append_test/test7
ldeo-glaciology/append_test/test8
ldeo-glaciology/append_test/test9
CPU times: user 130 ms, sys: 16.5 ms, total: 146 ms
Wall time: 2.19 s
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1030811490