issues: 1685503657
| field | value |
| --- | --- |
| id | 1685503657 |
| node_id | I_kwDOAMm_X85kdr6p |
| number | 7789 |
| title | Cannot access zarr data on Azure using shared access signatures (SAS) |
| user | 8382834 |
| state | closed |
| locked | 0 |
| assignee | |
| milestone | |
| comments | 1 |
| created_at | 2023-04-26T18:21:08Z |
| updated_at | 2023-04-26T18:32:33Z |
| closed_at | 2023-04-26T18:31:01Z |
| author_association | CONTRIBUTOR |
| active_lock_reason | |
| draft | |
| pull_request | |

**body:**

### What happened?

I am trying to access some zarr data that are stored on Azure blob storage. I am able to access them using the Azure account name and key method, i.e. this works fine, and I get a dataset back.
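The snippet showing the working call did not survive this export; below is a minimal sketch of what it plausibly looked like. `file_list`, `AZURE_STORAGE_ACCOUNT_NAME`, and `AZURE_STORAGE_KEY` are assumed placeholder names, not from the report:

```python
import xarray as xr

# Assumed placeholders (not from the report); adjust to your storage layout.
file_list = ["abfs://my-container/path/to/data.zarr"]
AZURE_STORAGE_ACCOUNT_NAME = "myaccount"          # assumption
AZURE_STORAGE_KEY = "<secret-account-key>"        # assumption

# Credentials travel xarray -> zarr -> fsspec -> adlfs via storage_options:
# adlfs's AzureBlobFileSystem accepts account_name and account_key.
ds = xr.open_mfdataset(
    file_list,
    engine="zarr",
    storage_options={
        "account_name": AZURE_STORAGE_ACCOUNT_NAME,
        "account_key": AZURE_STORAGE_KEY,
    },
)
```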
However, if I understand correctly, it is not recommended to use the account name and key just to read some zarr data on Azure: this is a "far too powerful" credential for simply accessing data, and it is better to use a dedicated SAS token for this kind of task (see for example the first answer in the discussion at https://github.com/Azure/azure-storage-azcopy/issues/1867).

If I understand correctly, the zarr backend functionality is provided through the following "chaining" of backends: xarray -> zarr -> fsspec -> adlfs. This looks good, as adlfs does seem to support SAS: the "Setting credentials" section at https://github.com/fsspec/adlfs lists `sas_token` among the accepted options. However, passing only the SAS token fails:

```
In [26]: xr.open_mfdataset(file_list, engine="zarr", storage_options={'sas_token': AZURE_STORAGE_SAS})

ValueError                                Traceback (most recent call last)
File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/adlfs/spec.py:447, in AzureBlobFileSystem.do_connect(self)
    446 else:
--> 447     raise ValueError(
    448         "Must provide either a connection_string or account_name with credentials!!"
    449     )
    451 except RuntimeError:

ValueError: Must provide either a connection_string or account_name with credentials!!

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[26], line 1
----> 1 xr.open_mfdataset([filename], engine="zarr", storage_options={'sas_token': AZURE_STORAGE_SAS})

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/xarray/backends/api.py:982, in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
    979     open_ = open_dataset
    980     getattr_ = getattr
--> 982 datasets = [open_(p, **open_kwargs) for p in paths]
    983 closers = [getattr_(ds, "_close") for ds in datasets]
    984 if preprocess is not None:

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/xarray/backends/api.py:982, in <listcomp>(.0)
    979     open_ = open_dataset
    980     getattr_ = getattr
--> 982 datasets = [open_(p, **open_kwargs) for p in paths]
    983 closers = [getattr_(ds, "_close") for ds in datasets]
    984 if preprocess is not None:

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/xarray/backends/api.py:525, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, backend_kwargs, **kwargs)
    513 decoders = _resolve_decoders_kwargs(
    514     decode_cf,
    515     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    521     decode_coords=decode_coords,
    522 )
    524 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 525 backend_ds = backend.open_dataset(
    526     filename_or_obj,
    527     drop_variables=drop_variables,
    528     **decoders,
    529     **kwargs,
    530 )
    531 ds = _dataset_from_backend_dataset(
    532     backend_ds,
    533     filename_or_obj,
   (...)
    541     **kwargs,
    542 )
    543 return ds

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/xarray/backends/zarr.py:908, in ZarrBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel, zarr_version)
    887 def open_dataset(  # type: ignore[override]  # allow LSP violation, not supporting **kwargs
    888     self,
    889     filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
   (...)
    905     zarr_version=None,
    906 ) -> Dataset:
    907     filename_or_obj = _normalize_path(filename_or_obj)
--> 908     store = ZarrStore.open_group(
    909         filename_or_obj,
    910         group=group,
    911         mode=mode,
    912         synchronizer=synchronizer,
    913         consolidated=consolidated,
    914         consolidate_on_close=False,
    915         chunk_store=chunk_store,
    916         storage_options=storage_options,
    917         stacklevel=stacklevel + 1,
    918         zarr_version=zarr_version,
    919     )
    921 store_entrypoint = StoreBackendEntrypoint()
    922 with close_on_error(store):

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/xarray/backends/zarr.py:419, in ZarrStore.open_group(cls, store, mode, synchronizer, group, consolidated, consolidate_on_close, chunk_store, storage_options, append_dim, write_region, safe_chunks, stacklevel, zarr_version)
    417 if consolidated is None:
    418     try:
--> 419         zarr_group = zarr.open_consolidated(store, **open_kwargs)
    420     except KeyError:
    421         try:

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/zarr/convenience.py:1282, in open_consolidated(store, metadata_key, mode, **kwargs)
   1280 # normalize parameters
   1281 zarr_version = kwargs.get('zarr_version')
-> 1282 store = normalize_store_arg(store, storage_options=kwargs.get("storage_options"), mode=mode,
   1283                             zarr_version=zarr_version)
   1284 if mode not in {'r', 'r+'}:
   1285     raise ValueError("invalid mode, expected either 'r' or 'r+'; found {!r}"
   1286                      .format(mode))

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/zarr/storage.py:181, in normalize_store_arg(store, storage_options, mode, zarr_version)
    179 else:
    180     raise ValueError("zarr_version must be either 2 or 3")
--> 181 return normalize_store(store, storage_options, mode)

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/zarr/storage.py:154, in _normalize_store_arg_v2(store, storage_options, mode)
    152 if isinstance(store, str):
    153     if "://" in store or "::" in store:
--> 154         return FSStore(store, mode=mode, **(storage_options or {}))
    155 elif storage_options:
    156     raise ValueError("storage_options passed with non-fsspec path")

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/zarr/storage.py:1345, in FSStore.__init__(self, url, normalize_keys, key_separator, mode, exceptions, dimension_separator, fs, check, create, missing_exceptions, storage_options)
   1343 if protocol in (None, "file") and not storage_options.get("auto_mkdir"):
   1344     storage_options["auto_mkdir"] = True
-> 1345 self.map = fsspec.get_mapper(url, **{**mapper_options, **storage_options})
   1346 self.fs = self.map.fs  # for direct operations
   1347 self.path = self.fs._strip_protocol(url)

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/fsspec/mapping.py:237, in get_mapper(url, check, create, missing_exceptions, alternate_root, **kwargs)
    206 """Create key-value interface for given URL and options
    207 
    208 The URL will be of the form "protocol://location" and point to the root
   (...)
    234 

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/fsspec/core.py:375, in url_to_fs(url, **kwargs)
    373     inkwargs["fo"] = urls
    374 urlpath, protocol, _ = chain[0]
--> 375 fs = filesystem(protocol, **inkwargs)
    376 return fs, urlpath

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/fsspec/registry.py:257, in filesystem(protocol, **storage_options)
    250     warnings.warn(
    251         "The 'arrow_hdfs' protocol has been deprecated and will be "
    252         "removed in the future. Specify it as 'hdfs'.",
    253         DeprecationWarning,
    254     )
    256 cls = get_filesystem_class(protocol)
--> 257 return cls(**storage_options)

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/fsspec/spec.py:76, in _Cached.__call__(cls, *args, **kwargs)
     74     return cls._cache[token]
     75 else:
---> 76     obj = super().__call__(*args, **kwargs)
     77     # Setting _fs_token here causes some static linters to complain.
     78     obj._fs_token = token

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/adlfs/spec.py:281, in AzureBlobFileSystem.__init__(self, account_name, account_key, connection_string, credential, sas_token, request_session, socket_timeout, blocksize, client_id, client_secret, tenant_id, anon, location_mode, loop, asynchronous, default_fill_cache, default_cache_type, version_aware, **kwargs)
    269 if (
    270     self.credential is None
    271     and self.anon is False
    272     and self.sas_token is None
    273     and self.account_key is None
    274 ):
    276     (
    277         self.credential,
    278         self.sync_credential,
    279     ) = self._get_default_azure_credential(**kwargs)
--> 281 self.do_connect()
    282 weakref.finalize(self, sync, self.loop, close_service_client, self)
    284 if self.credential is not None:

File ~/miniconda3/envs/harvest/lib/python3.11/site-packages/adlfs/spec.py:457, in AzureBlobFileSystem.do_connect(self)
    454     self.do_connect()
    456 except Exception as e:
--> 457     raise ValueError(f"unable to connect to account for {e}")

ValueError: unable to connect to account for Must provide either a connection_string or account_name with credentials!!
```

### What did you expect to happen?

I would expect to be able to access the zarr dataset on Azure using the SAS token alone, as I can do in, for example, …
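A note on the error itself: the underlying adlfs exception says it needs either a `connection_string` or an `account_name` together with a credential, so a SAS token on its own does not tell adlfs which storage account to contact. A hedged sketch of the call this points to, reusing the assumed placeholder names from above (this is what the error message suggests, not a fix confirmed in the report):

```python
import xarray as xr

# Assumed placeholders (not from the report).
file_list = ["abfs://my-container/path/to/data.zarr"]
AZURE_STORAGE_ACCOUNT_NAME = "myaccount"   # assumption
AZURE_STORAGE_SAS = "<sas-token>"          # assumption

# Per the adlfs error, the SAS token is the credential, but account_name is
# still required so adlfs can build the blob endpoint URL.
ds = xr.open_mfdataset(
    file_list,
    engine="zarr",
    storage_options={
        "account_name": AZURE_STORAGE_ACCOUNT_NAME,
        "sas_token": AZURE_STORAGE_SAS,
    },
)
```

Alternatively, the traceback shows `AzureBlobFileSystem.__init__` also accepts a `connection_string`, and Azure connection strings can embed a SAS (`BlobEndpoint=...;SharedAccessSignature=...`), which would avoid the account key entirely.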
### Minimal Complete Verifiable Example

I cannot share the access tokens / account name and key, unfortunately, as these are secret, which makes it hard to create an MCVE.

### MVCE confirmation

### Relevant log output

No response

### Anything else we need to know?

No response

### Environment

```
In [27]: xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.3 | packaged by conda-forge | (main, Apr 6 2023, 08:57:19) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-69-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.0
libnetcdf: 4.9.2

xarray: 2023.4.2
pandas: 2.0.1
numpy: 1.24.2
scipy: 1.10.1
netCDF4: 1.6.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.14.2
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2022.12.1
distributed: 2022.12.1
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: None
numbagg: None
fsspec: 2023.4.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.7.2
pip: 23.1.1
conda: None
pytest: None
mypy: None
IPython: 8.12.0
sphinx: None
```
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7789/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |