pull_requests: 691699433

id: 691699433
node_id: MDExOlB1bGxSZXF1ZXN0NjkxNjk5NDMz
number: 5615
state: closed
locked: 0
title: add storage_options arg to to_zarr
user: 17162724
created_at: 2021-07-16T19:26:54Z
updated_at: 2021-08-21T23:19:12Z
closed_at: 2021-08-21T22:52:18Z
merged_at: 2021-08-21T22:52:18Z
merge_commit_sha: befd1b98bd84047d62307419a30bcda7a0727926
assignee: (empty)
milestone: (empty)
draft: 0
head: 9194359ba0dadb89c9b43819575dcf832ff04fcc
base: e26aec9500e04f3b926b248988b976dbfcb9c632
author_association: CONTRIBUTOR
auto_merge: (empty)
repo: 13221727
url: https://github.com/pydata/xarray/pull/5615
merged_by: (empty)

body:

<!-- Feel free to remove check-list items that aren't relevant to your change -->

- [ ] Tests added
- [x] Passes `pre-commit run --all-files`
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`

### What does this PR do?

Adds a `storage_options` arg to `to_zarr`.

### What is the `storage_options` arg?

The `storage_options` arg is used throughout the pydata ecosystem wherever you can write a file to cloud storage, for example:

- **pandas**: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_parquet.html?highlight=to_parquet
- **dask.dataframe**: https://docs.dask.org/en/latest/generated/dask.dataframe.to_parquet.html
- **dask.array**: https://docs.dask.org/en/latest/generated/dask.array.to_zarr.html#dask.array.to_zarr

It allows you to write to a prd account or a dev account, for example. It can currently be used in xarray when reading data via `open_dataset`, e.g. `xr.open_dataset(..., backend_kwargs=dict(storage_options=storage_options))`.

### Why is it needed?

When working in cloud-based environments, your storage options are often created automatically. I recently found that when doing things in parallel, dask can hiccup on finding the credentials (see https://dask.slack.com/archives/C02282J3Q6Q/p1626201210115500); passing the storage options as an arg gave it some stability. I imagine this could help when writing multiple zarr stores to cloud storage in parallel (e.g. delayed). This PR also brings functionality similar to `pandas.DataFrame.to_parquet` and `dask.array.to_zarr` to xarray.

### How is this tested?

Not obvious how to test. For example, there is no `storage_options` test in dask.array (https://github.com/dask/dask/blob/main/dask/array/tests/test_array_core.py). Pandas has more extensive tests (https://github.com/pandas-dev/pandas/blob/master/pandas/tests/io/test_fsspec.py).

### Background

The idea for this PR is discussed here: https://github.com/pydata/xarray/discussions/5601

cc. @martindurant
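
To illustrate the prd/dev use case described in the PR body, here is a minimal sketch of passing explicit credentials to `to_zarr` through `storage_options`. It assumes an xarray version that includes this PR and an S3 target accessed through s3fs (hence the `key` and `secret` option names). The account table, the placeholder credential values, and the helpers `storage_options_for` and `write_zarr` are hypothetical and are not part of xarray.

```python
from typing import Any, Dict

# Placeholder credentials keyed by account name (assumption for illustration).
# The "key"/"secret" names follow s3fs conventions; the values are dummies.
ACCOUNT_OPTIONS: Dict[str, Dict[str, Any]] = {
    "dev": {"key": "DEV_KEY_ID", "secret": "DEV_SECRET"},
    "prd": {"key": "PRD_KEY_ID", "secret": "PRD_SECRET"},
}


def storage_options_for(account: str) -> Dict[str, Any]:
    """Return a copy of the storage options for a named account (hypothetical helper)."""
    if account not in ACCOUNT_OPTIONS:
        raise ValueError(f"unknown account: {account!r}")
    # Copy so callers cannot mutate the shared table.
    return dict(ACCOUNT_OPTIONS[account])


def write_zarr(ds: Any, url: str, account: str) -> Any:
    """Write `ds` to `url`, passing credentials explicitly rather than relying on discovery."""
    # `ds` is expected to be an xarray.Dataset; `storage_options` is the
    # keyword argument this PR adds to `to_zarr`.
    return ds.to_zarr(url, storage_options=storage_options_for(account))
```

With a real dataset, a call such as `write_zarr(ds, "s3://example-dev-bucket/data.zarr", "dev")` (the bucket name is made up) would forward the dev credentials to the filesystem layer. Passing the options explicitly is the design choice the PR argues for: each parallel task carries its own credentials instead of looking them up from the environment.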
