issues: 1039844354
id: 1039844354
node_id: I_kwDOAMm_X849-sQC
number: 5918
title: Reading zarr gives unspecific PermissionError: Access Denied when public data has been consolidated after being written to S3
user: 5509356
state: open
locked: 0
comments: 2
created_at: 2021-10-29T18:54:23Z
updated_at: 2021-11-04T16:30:33Z
author_association: NONE

body:

This is a minor issue with an invalid data setup that is easy to get into; I'm reporting it here more for documentation than in expectation of a fix.

**Quick summary**

If you consolidate metadata on a public zarr dataset in S3 after it has been written, the `.zmetadata` files end up permission-restricted. Reading the dataset with xarray 0.19 then fails with an unclear error message, while reading with xarray <= 0.18.x works fine. Contrast this with the clear warning you get in 0.19 when the `.zmetadata` simply doesn't exist.

**How this happens**

People who wrote data without `consolidated=True` in past versions of xarray might run into this issue, but in our case we actually did pass that parameter originally. I've previously reported a few issues with the fact that xarray will write arbitrary zarr hierarchies if the variable names contain slashes, and then can't read them back properly. One consequence is that data written with `consolidated=True` still doesn't have `.zmetadata` files where `xarray.open_mfdataset` needs them. If you then try to add the `.zmetadata` by consolidating directly against the S3 store (see the sketch below), the new `.zmetadata` objects come out permission-restricted even though the rest of the dataset is public.
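A minimal sketch of the kind of after-the-fact consolidation described here, assuming `s3fs` and `zarr`; the bucket path is a placeholder and this is not the exact code from the original report:

```python
import s3fs
import zarr

# Hypothetical store location; the real layout comes from the slash-named variables above.
fs = s3fs.S3FileSystem()  # authenticated as the data owner, not anonymous
store = fs.get_mapper("s3://some-public-bucket/path/to/group")

# Writes a .zmetadata object into the store. On an otherwise-public bucket, the new
# object can end up readable only with the credentials that wrote it.
zarr.consolidate_metadata(store)
```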
**Why it's a problem**

It would be nice if, when xarray goes to read this data, it noticed that it has access to the data but not to any usable `.zmetadata`, and emitted the same warning it does when `.zmetadata` doesn't exist. Instead it fails with an uncaught `PermissionError: Access Denied`, and it isn't clear from the output that this is only a `.zmetadata` problem and that the user can still get the data by passing `consolidated=False`. A second problem is that data which reads just fine in xarray 0.18.x, without even a warning, suddenly gives Access Denied from the same code once you update to xarray 0.19.

**Workaround**

If you're trying to read a dataset with this issue, you can get the same behavior as in previous versions of xarray by telling the zarr backend to skip consolidated metadata, like so:
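A sketch of that workaround with placeholder store URLs (the real ones are the `s3_lookups` list from the traceback below); in xarray 0.19 the flag can be passed via `backend_kwargs` or directly as `consolidated=False`:

```python
import xarray as xr

# Placeholder paths; substitute the real s3:// zarr store URLs.
s3_lookups = ["s3://some-public-bucket/path/to/group"]

# Skipping consolidated metadata avoids touching the restricted .zmetadata objects,
# matching the pre-0.19 behavior of reading the unconsolidated metadata.
test_data = xr.open_mfdataset(
    s3_lookups,
    engine="zarr",
    backend_kwargs={"consolidated": False},
)
```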
**Stacktrace**

```
ClientError                               Traceback (most recent call last)
/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/s3fs/core.py in _call_s3(self, method, *akwarglist, **kwargs)
    245             try:
--> 246                 out = await method(**additional_kwargs)
    247                 return out

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/aiobotocore/client.py in _make_api_call(self, operation_name, api_params)
    154             error_class = self.exceptions.from_code(error_code)
--> 155             raise error_class(parsed_response, operation_name)
    156         else:

ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied

The above exception was the direct cause of the following exception:

PermissionError                           Traceback (most recent call last)
/var/folders/xf/xwjm3rj52ls9780rvrbbb9tm0000gn/T/ipykernel_84970/645651303.py in <module>
     16
     17 # Look up the data
---> 18 test_data = xr.open_mfdataset(s3_lookups, engine="zarr")
     19 test_data

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
    911     getattr_ = getattr
    912
--> 913     datasets = [open_(p, **open_kwargs) for p in paths]
    914     closers = [getattr_(ds, "_close") for ds in datasets]
    915     if preprocess is not None:

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/xarray/backends/api.py in <listcomp>(.0)
    911     getattr_ = getattr
    912
--> 913     datasets = [open_(p, **open_kwargs) for p in paths]
    914     closers = [getattr_(ds, "_close") for ds in datasets]
    915     if preprocess is not None:

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    495
    496     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 497     backend_ds = backend.open_dataset(
    498         filename_or_obj,
    499         drop_variables=drop_variables,

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/xarray/backends/zarr.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel, lock)
    824
    825         filename_or_obj = _normalize_path(filename_or_obj)
--> 826         store = ZarrStore.open_group(
    827             filename_or_obj,
    828             group=group,

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/xarray/backends/zarr.py in open_group(cls, store, mode, synchronizer, group, consolidated, consolidate_on_close, chunk_store, storage_options, append_dim, write_region, safe_chunks, stacklevel)
    367         if consolidated is None:
    368             try:
--> 369                 zarr_group = zarr.open_consolidated(store, **open_kwargs)
    370             except KeyError:
    371                 warnings.warn(

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/zarr/convenience.py in open_consolidated(store, metadata_key, mode, **kwargs)
   1176
   1177     # setup metadata store
-> 1178     meta_store = ConsolidatedMetadataStore(store, metadata_key=metadata_key)
   1179
   1180     # pass through

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/zarr/storage.py in __init__(self, store, metadata_key)
   2767
   2768         # retrieve consolidated metadata
-> 2769         meta = json_loads(store[metadata_key])
   2770
   2771         # check format of consolidated metadata

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/fsspec/mapping.py in __getitem__(self, key, default)
    131         k = self._key_to_str(key)
    132         try:
--> 133             result = self.fs.cat(k)
    134         except self.missing_exceptions:
    135             if default is not None:

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/fsspec/asyn.py in wrapper(*args, **kwargs)
     86     def wrapper(*args, **kwargs):
     87         self = obj or args[0]
---> 88         return sync(self.loop, func, *args, **kwargs)
     89
     90     return wrapper

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/fsspec/asyn.py in sync(loop, func, timeout, *args, **kwargs)
     67         raise FSTimeoutError
     68     if isinstance(result[0], BaseException):
---> 69         raise result[0]
     70     return result[0]
     71

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/fsspec/asyn.py in _runner(event, coro, result, timeout)
     23     coro = asyncio.wait_for(coro, timeout=timeout)
     24     try:
---> 25         result[0] = await coro
     26     except Exception as ex:
     27         result[0] = ex

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/fsspec/asyn.py in _cat(self, path, recursive, on_error, **kwargs)
    342         ex = next(filter(is_exception, out), False)
    343         if ex:
--> 344             raise ex
    345         if (
    346             len(paths) > 1

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/s3fs/core.py in _cat_file(self, path, version_id, start, end)
    849         else:
    850             head = {}
--> 851         resp = await self._call_s3(
    852             "get_object",
    853             Bucket=bucket,

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/s3fs/core.py in _call_s3(self, method, *akwarglist, **kwargs)
    263             except Exception as e:
    264                 err = e
--> 265         raise translate_boto_error(err)
    266
    267     call_s3 = sync_wrapper(_call_s3)

PermissionError: Access Denied
```
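For reference, a sketch of the fallback behavior being asked for, modeled on the `open_group` frame above but not taken from xarray's code: treat an unreadable `.zmetadata` the same as a missing one, warn, and fall back to unconsolidated metadata.

```python
import warnings

import zarr


def open_group_with_fallback(store, **open_kwargs):
    # Try consolidated metadata first, but degrade gracefully if .zmetadata is
    # missing (KeyError) or unreadable (PermissionError from a restricted object).
    try:
        return zarr.open_consolidated(store, **open_kwargs)
    except (KeyError, PermissionError):
        warnings.warn(
            "Could not read consolidated metadata (missing or inaccessible "
            ".zmetadata); falling back to consolidated=False behavior.",
            RuntimeWarning,
        )
        return zarr.open_group(store, **open_kwargs)
```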
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5918/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
repo: 13221727
type: issue