pull_requests: 691699433
id | node_id | number | state | locked | title | user | body | created_at | updated_at | closed_at | merged_at | merge_commit_sha | assignee | milestone | draft | head | base | author_association | auto_merge | repo | url | merged_by |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
691699433 | MDExOlB1bGxSZXF1ZXN0NjkxNjk5NDMz | 5615 | closed | 0 | add storage_options arg to to_zarr | 17162724 | <!-- Feel free to remove check-list items that aren't relevant to your change --> - [ ] Tests added - [x] Passes `pre-commit run --all-files` - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` ### What does this PR do? Adds a `storage_options` arg to `to_zarr`. ### What is the `storage_options` arg? The `storage_options` arg is used throughout the pydata ecosystem wherever you can write a file to cloud storage, for example: - **pandas**: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_parquet.html?highlight=to_parquet - **dask.dataframe**: https://docs.dask.org/en/latest/generated/dask.dataframe.to_parquet.html - **dask.array**: https://docs.dask.org/en/latest/generated/dask.array.to_zarr.html#dask.array.to_zarr It allows you to write to a prd account or a dev account, for example. It can currently be used in xarray when reading data via `open_dataset`, e.g. `xr.open_dataset(..., backend_kwargs=dict(storage_options=storage_options))` ### Why is it needed? When working in cloud-based environments, your storage options are often created automatically. I recently found that when doing things in parallel, dask can hiccup on finding the credentials (see https://dask.slack.com/archives/C02282J3Q6Q/p1626201210115500); passing the storage options as an arg gave it some stability. I imagine this could help when writing multiple zarr stores to cloud storage in parallel (e.g. delayed). This PR also brings functionality similar to `pandas.DataFrame.to_parquet` and `dask.array.to_zarr` to xarray. ### How is this tested? Not obvious how to test, e.g. there is no `storage_options` test in dask.array (https://github.com/dask/dask/blob/main/dask/array/tests/test_array_core.py). Pandas has more extensive tests: https://github.com/pandas-dev/pandas/blob/master/pandas/tests/io/test_fsspec.py ### Background: The idea for this PR is discussed here: https://github.com/pydata/xarray/discussions/5601 cc. @martindurant | 2021-07-16T19:26:54Z | 2021-08-21T23:19:12Z | 2021-08-21T22:52:18Z | 2021-08-21T22:52:18Z | befd1b98bd84047d62307419a30bcda7a0727926 | | | 0 | 9194359ba0dadb89c9b43819575dcf832ff04fcc | e26aec9500e04f3b926b248988b976dbfcb9c632 | CONTRIBUTOR | | 13221727 | https://github.com/pydata/xarray/pull/5615 | |
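
The PR body describes passing `storage_options` explicitly to `to_zarr` so that a write targets a chosen account (prd or dev) instead of relying on automatically discovered credentials. A minimal sketch of that pattern follows. The `ACCOUNTS` mapping, the helper names, and the `profile` key are assumptions made for illustration; the keys that are actually valid depend on the fsspec backend behind the store URL.

```python
from typing import Any, Dict

# Hypothetical per-account settings. The "profile" key and its values are
# placeholders; which keys are accepted depends on the fsspec backend in use.
ACCOUNTS: Dict[str, Dict[str, Any]] = {
    "dev": {"profile": "dev"},
    "prd": {"profile": "prd"},
}


def storage_options_for(account: str) -> Dict[str, Any]:
    """Return a copy of the storage options for the named account."""
    try:
        return dict(ACCOUNTS[account])
    except KeyError:
        raise ValueError(f"unknown account: {account!r}") from None


def write_store(ds: Any, url: str, account: str) -> Any:
    """Write `ds` (expected to be an xarray.Dataset) to a zarr store at `url`.

    The options are passed explicitly, so the write does not depend on
    credentials being discovered from the environment.
    """
    return ds.to_zarr(url, storage_options=storage_options_for(account))
```

With a real dataset this might be called as `write_store(ds, "s3://my-bucket/out.zarr", "dev")`, where the bucket name is again a placeholder. Returning a copy of the options keeps the shared mapping from being mutated by a caller.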
Links from other tables
- 1 row from pull_requests_id in labels_pull_requests