Reading zarr gives unspecific PermissionError: Access Denied when public data has been consolidated after being written to S3

pydata/xarray issue #5918 · state: open · 2 comments · created 2021-10-29 · updated 2021-11-04

This is a minor issue with an invalid data setup that's easy to get into; I'm reporting it here more for documentation than in the expectation of a fix.

Quick summary: if you consolidate metadata on a public zarr dataset already sitting in S3, the .zmetadata files end up permission-restricted. Reading with xarray 0.19 then fails to open the dataset with an unclear error message, while reading with xarray <=0.18.x works fine. Contrast this with the helpful warning you get in 0.19 when .zmetadata simply doesn't exist.


How this happens

People who wrote data without consolidated=True in past versions of xarray might run into this issue, but in our case we actually did have that parameter originally.

I've previously reported a few issues stemming from the fact that xarray will write arbitrary zarr hierarchies when variable names contain slashes, and then can't read them back properly. One consequence is that data written with consolidated=True still doesn't have .zmetadata files where xarray.open_mfdataset needs them.

If you try to add the .zmetadata by running

```
import s3fs
import zarr

fs = s3fs.S3FileSystem()
zarr.consolidate_metadata(s3fs.S3Map(s3_path, s3=fs))
```

directly on the cloud bucket in S3, it writes the .zmetadata files... but their permissions are restricted to the user who uploaded them, even if you're writing to a public bucket. (It's an AWS thing.)
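For reference, the consolidation step itself is mechanically simple: in zarr v2 it gathers every per-array/per-group metadata document into one new .zmetadata object, which is why the freshly written key picks up the uploader's ACL rather than the bucket's existing permissions. A rough pure-Python sketch over a dict-like store (not zarr's actual implementation, just an illustration of the format):

```python
import json

def consolidate_metadata_sketch(store, metadata_key=".zmetadata"):
    # Collect every metadata document (.zarray / .zattrs / .zgroup) in the
    # store into a single JSON blob, mirroring the zarr v2 consolidated
    # metadata format written by zarr.consolidate_metadata.
    meta = {
        key: json.loads(store[key])
        for key in store
        if key.rsplit("/", 1)[-1] in (".zarray", ".zattrs", ".zgroup")
    }
    # This is the one *new* object written to the bucket -- the key whose
    # ACL ends up belonging to whoever ran the consolidation.
    store[metadata_key] = json.dumps(
        {"zarr_consolidated_format": 1, "metadata": meta}
    ).encode()
    return store
```

Chunk data is never touched; only this single extra key is created, so only it ends up with the wrong permissions.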


Why it's a problem

It would be nice if, when xarray goes to read this data and finds that it has access to the data but not to a usable .zmetadata, it emitted the same warning it does when .zmetadata doesn't exist. Instead it fails with an uncaught PermissionError: Access Denied, and it's not clear from the output that this is just a .zmetadata problem and that the user can still get the data by passing consolidated=False.
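In the 0.19 code path shown in the stacktrace below, xarray already falls back with a warning when zarr.open_consolidated raises KeyError; catching PermissionError at the same spot would cover this case too. A hypothetical sketch of that dispatch, with the two openers passed in as stand-ins for zarr.open_consolidated and zarr.open_group:

```python
import warnings

def open_zarr_group(store, open_consolidated, open_unconsolidated):
    # Mirrors xarray's consolidated=None dispatch, but also treats a
    # PermissionError on .zmetadata the same as a missing .zmetadata:
    # warn, then fall back to unconsolidated reads.
    try:
        return open_consolidated(store)
    except (KeyError, PermissionError) as err:
        warnings.warn(
            f"Could not read consolidated metadata ({err!r}); "
            "falling back to reading unconsolidated metadata. "
            "Pass consolidated=False to silence this warning."
        )
        return open_unconsolidated(store)
```

With this, the restricted-.zmetadata case would degrade to a slower-but-working open instead of an opaque failure.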

Another problem with this situation is that it causes data that reads just fine in xarray 0.18.x without even a warning message to suddenly give Access Denied from the same code when you update to xarray 0.19.


Workaround

If you're trying to read a dataset that has this issue, you can get the same behavior as in previous versions of xarray like so:

```
import xarray as xr

xr.open_mfdataset(s3_lookups, engine="zarr", consolidated=False)
```

The data loads fine, just more slowly than if the .zmetadata were accessible.
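If you'd rather not hard-code consolidated=False, another option is to probe whether .zmetadata is actually readable and choose the flag from that. A hypothetical helper written against any dict-like store (for S3, that would be the s3fs.S3Map for the dataset):

```python
def can_use_consolidated(store, metadata_key=".zmetadata"):
    # True only if .zmetadata both exists and is readable. A
    # permission-restricted key raises PermissionError, which we treat
    # the same as a missing key, so the caller falls back cleanly.
    try:
        store[metadata_key]
    except (KeyError, PermissionError):
        return False
    return True
```

You could then pass e.g. `consolidated=can_use_consolidated(mapper)` to xr.open_mfdataset, where `mapper` is the store mapping (hypothetical usage, at the cost of one extra GET per open).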


Stacktrace

```
ClientError                               Traceback (most recent call last)
/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/s3fs/core.py in _call_s3(self, method, *akwarglist, **kwargs)
    245         try:
--> 246             out = await method(**additional_kwargs)
    247             return out

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/aiobotocore/client.py in _make_api_call(self, operation_name, api_params)
    154                 error_class = self.exceptions.from_code(error_code)
--> 155                 raise error_class(parsed_response, operation_name)
    156             else:

ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied

The above exception was the direct cause of the following exception:

PermissionError                           Traceback (most recent call last)
/var/folders/xf/xwjm3rj52ls9780rvrbbb9tm0000gn/T/ipykernel_84970/645651303.py in <module>
     16
     17 # Look up the data
---> 18 test_data = xr.open_mfdataset(s3_lookups, engine="zarr")
     19 test_data

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
    911         getattr_ = getattr
    912
--> 913     datasets = [open_(p, **open_kwargs) for p in paths]
    914     closers = [getattr_(ds, "_close") for ds in datasets]
    915     if preprocess is not None:

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/xarray/backends/api.py in <listcomp>(.0)
    911         getattr_ = getattr
    912
--> 913     datasets = [open_(p, **open_kwargs) for p in paths]
    914     closers = [getattr_(ds, "_close") for ds in datasets]
    915     if preprocess is not None:

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    495
    496     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 497     backend_ds = backend.open_dataset(
    498         filename_or_obj,
    499         drop_variables=drop_variables,

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/xarray/backends/zarr.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel, lock)
    824
    825     filename_or_obj = _normalize_path(filename_or_obj)
--> 826     store = ZarrStore.open_group(
    827         filename_or_obj,
    828         group=group,

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/xarray/backends/zarr.py in open_group(cls, store, mode, synchronizer, group, consolidated, consolidate_on_close, chunk_store, storage_options, append_dim, write_region, safe_chunks, stacklevel)
    367         if consolidated is None:
    368             try:
--> 369                 zarr_group = zarr.open_consolidated(store, **open_kwargs)
    370             except KeyError:
    371                 warnings.warn(

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/zarr/convenience.py in open_consolidated(store, metadata_key, mode, **kwargs)
   1176
   1177     # setup metadata store
-> 1178     meta_store = ConsolidatedMetadataStore(store, metadata_key=metadata_key)
   1179
   1180     # pass through

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/zarr/storage.py in __init__(self, store, metadata_key)
   2767
   2768         # retrieve consolidated metadata
-> 2769         meta = json_loads(store[metadata_key])
   2770
   2771         # check format of consolidated metadata

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/fsspec/mapping.py in __getitem__(self, key, default)
    131         k = self._key_to_str(key)
    132         try:
--> 133             result = self.fs.cat(k)
    134         except self.missing_exceptions:
    135             if default is not None:

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/fsspec/asyn.py in wrapper(*args, **kwargs)
     86     def wrapper(*args, **kwargs):
     87         self = obj or args[0]
---> 88         return sync(self.loop, func, *args, **kwargs)
     89
     90     return wrapper

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/fsspec/asyn.py in sync(loop, func, timeout, *args, **kwargs)
     67         raise FSTimeoutError
     68     if isinstance(result[0], BaseException):
---> 69         raise result[0]
     70     return result[0]
     71

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/fsspec/asyn.py in _runner(event, coro, result, timeout)
     23         coro = asyncio.wait_for(coro, timeout=timeout)
     24     try:
---> 25         result[0] = await coro
     26     except Exception as ex:
     27         result[0] = ex

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/fsspec/asyn.py in _cat(self, path, recursive, on_error, **kwargs)
    342         ex = next(filter(is_exception, out), False)
    343         if ex:
--> 344             raise ex
    345         if (
    346             len(paths) > 1

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/s3fs/core.py in _cat_file(self, path, version_id, start, end)
    849         else:
    850             head = {}
--> 851         resp = await self._call_s3(
    852             "get_object",
    853             Bucket=bucket,

/opt/anaconda3/envs/gefs/lib/python3.9/site-packages/s3fs/core.py in _call_s3(self, method, *akwarglist, **kwargs)
    263         except Exception as e:
    264             err = e
--> 265             raise translate_boto_error(err)
    266
    267     call_s3 = sync_wrapper(_call_s3)

PermissionError: Access Denied
```

