
issues: 946543524


id: 946543524
node_id: MDExOlB1bGxSZXF1ZXN0NjkxNjk5NDMz
number: 5615
title: add storage_options arg to to_zarr
user: 17162724
state: closed
locked: 0
comments: 7
created_at: 2021-07-16T19:26:54Z
updated_at: 2021-08-21T23:19:12Z
closed_at: 2021-08-21T22:52:18Z
author_association: CONTRIBUTOR
draft: 0
pull_request: pydata/xarray/pulls/5615
body:
  • [ ] Tests added
  • [x] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst

What does this PR do?

Adds a storage_options arg to to_zarr.

What is the storage_options arg?

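The storage_options argument is a dict of extra options (for example credentials or a profile name) that is forwarded to the filesystem backend, usually fsspec, that handles the target URL. It is used throughout the PyData ecosystem wherever you can write a file to cloud storage, such as: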

  • pandas: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_parquet.html?highlight=to_parquet
  • dask.dataframe: https://docs.dask.org/en/latest/generated/dask.dataframe.to_parquet.html
  • dask.array: https://docs.dask.org/en/latest/generated/dask.array.to_zarr.html#dask.array.to_zarr

For example, it lets you choose whether to write to a production (prd) account or a development (dev) account.

It can already be used in xarray when reading data via open_dataset, e.g. xr.open_dataset(..., backend_kwargs=dict(storage_options=storage_options)).
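The mechanism can be pictured with a small toy model in plain Python (no fsspec, xarray or cloud access). Everything in it is hypothetical: `FakeCloudFS`, `BACKENDS`, `resolve_fs`, `write_store` and the `profile` option are made up for illustration. The writer parses the protocol from the URL and forwards `storage_options` as keyword arguments to that protocol's backend, so the same call can target a prd or a dev account. In fsspec, the real counterpart of `resolve_fs` is along the lines of `fsspec.get_mapper(url, **storage_options)`.

```python
from typing import Callable, Dict, Optional, Tuple


class FakeCloudFS:
    """Hypothetical cloud filesystem; `profile` stands in for credentials."""

    # Class-level dict so every instance sees the same "cloud" contents.
    _objects: Dict[Tuple[str, str], bytes] = {}

    def __init__(self, profile: str = "default") -> None:
        self.profile = profile

    def write(self, path: str, data: bytes) -> None:
        # Objects are namespaced by account, like separate prd/dev accounts.
        self._objects[(self.profile, path)] = data

    def read(self, path: str) -> bytes:
        return self._objects[(self.profile, path)]


# Hypothetical protocol -> backend registry (fsspec keeps a real one).
BACKENDS: Dict[str, Callable[..., FakeCloudFS]] = {"fakecloud": FakeCloudFS}


def resolve_fs(url: str, storage_options: Optional[dict] = None) -> Tuple[FakeCloudFS, str]:
    """Pick the backend from the URL protocol and forward storage_options to it."""
    protocol, _, path = url.partition("://")
    return BACKENDS[protocol](**(storage_options or {})), path


def write_store(data: bytes, url: str, storage_options: Optional[dict] = None) -> None:
    fs, path = resolve_fs(url, storage_options)
    fs.write(path, data)


# Same URL, two accounts: only storage_options differs between the calls.
write_store(b"prd data", "fakecloud://bucket/out.zarr", storage_options={"profile": "prd"})
write_store(b"dev data", "fakecloud://bucket/out.zarr", storage_options={"profile": "dev"})
```

The design point is that account selection travels with the call instead of being read from ambient state, which is what makes the same writer reusable across accounts.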

Why is it needed?

When working in cloud-based environments, your storage options are often created automatically. I recently found that when doing things in parallel, dask can hiccup when finding the credentials (see https://dask.slack.com/archives/C02282J3Q6Q/p1626201210115500); passing the storage options explicitly as an argument made it more stable.

I imagine this could help when writing multiple zarr stores to cloud storage in parallel (e.g. with dask.delayed).

This PR also brings functionality similar to pandas.DataFrame.to_parquet and dask.array.to_zarr to xarray.

How is this tested?

It is not obvious how to test this. For example, there is no storage_options test in dask.array: https://github.com/dask/dask/blob/main/dask/array/tests/test_array_core.py

pandas has more extensive tests: https://github.com/pandas-dev/pandas/blob/master/pandas/tests/io/test_fsspec.py

Background:

The idea for this PR was discussed here: https://github.com/pydata/xarray/discussions/5601 cc @martindurant

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5615/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: pull
