pull_requests: 137819104
field | value
---|---
id | 137819104
node_id | MDExOlB1bGxSZXF1ZXN0MTM3ODE5MTA0
number | 1528
state | closed
locked | 0
title | WIP: Zarr backend
user | 1197350
created_at | 2017-08-27T02:38:01Z
updated_at | 2018-02-13T21:35:03Z
closed_at | 2017-12-14T02:11:36Z
merged_at | 2017-12-14T02:11:36Z
merge_commit_sha | 8fe7eb0fbcb7aaa90d894bcf32dc1408735e5d9d
assignee |
milestone |
draft | 0
head | f5633cabd19189675b607379badc2c19b86c0b8e
base | 89a1a9883c0c8409dad8dbcccf1ab73a3ea2cafc
author_association | MEMBER
auto_merge |
repo | 13221727
url | https://github.com/pydata/xarray/pull/1528
merged_by |

body:

- [x] Closes #1223
- [x] Tests added / passed
- [x] Passes ``git diff upstream/master | flake8 --diff``
- [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API

I think that a zarr backend could be the ideal storage format for xarray datasets, overcoming many of the frustrations associated with netCDF and enabling optimal performance on cloud platforms.

This is a very basic start to implementing a zarr backend (as proposed in #1223); however, I am taking a somewhat different approach. I store the whole dataset in a single zarr group. I encode the extra metadata needed by xarray (so far just dimension information) as attributes within the zarr group and child arrays. I hide these special attributes from the user by wrapping the attribute dictionaries in a "`HiddenKeyDict`", so that they can't be viewed or modified (a minimal sketch of such a wrapper follows after the questions below).

I have no tests yet (:flushed:), but the following code works:

```python
from xarray.backends.zarr import ZarrStore
import xarray as xr
import numpy as np

ds = xr.Dataset(
    {'foo': (('y', 'x'), np.ones((100, 200)), {'myattr1': 1, 'myattr2': 2}),
     'bar': (('x',), np.zeros(200))},
    {'y': (('y',), np.arange(100)), 'x': (('x',), np.arange(200))},
    {'some_attr': 'copana'}
).chunk({'y': 50, 'x': 40})

zs = ZarrStore(store='zarr_test')
ds.dump_to_store(zs)
ds2 = xr.Dataset.load_store(zs)
assert ds2.equals(ds)
```

There is a very long way to go here, but I thought I would just get a PR started. Here are some questions whose answers would help me move forward:

1. What is "encoding" at the variable level? (I have never understood this part of xarray.) How should encoding be handled with zarr? (See the note below.)
2. Should we encode / decode CF for zarr stores?
3. Do we want to always automatically align dask chunks with the underlying zarr chunks? (See the chunk-alignment sketch below.)
4. What sort of public API should the zarr backend have? Should you be able to load zarr stores via `open_dataset`? Or do we need a new method? I think `.to_zarr()` would be quite useful. (A hypothetical usage sketch follows below.)
5. zarr arrays are extensible along all axes. What does this imply for unlimited dimensions? (See the resize/append sketch below.)
6. Is any autoclose logic needed? As far as I can tell, zarr objects don't need to be closed.
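For concreteness, a `HiddenKeyDict`-style wrapper could be as simple as the following sketch. This is not the implementation from this PR, and the hidden attribute name `_ARRAY_DIMENSIONS` is used purely for illustration:

```python
from collections.abc import MutableMapping

class HiddenKeyDict(MutableMapping):
    """Dict wrapper that makes a fixed set of keys invisible and
    unmodifiable (a sketch, not this PR's actual implementation)."""

    def __init__(self, data, hidden_keys):
        self._data = data
        self._hidden = frozenset(hidden_keys)

    def _raise_if_hidden(self, key):
        if key in self._hidden:
            raise KeyError('%r is a hidden key' % key)

    def __getitem__(self, key):
        self._raise_if_hidden(key)
        return self._data[key]

    def __setitem__(self, key, value):
        self._raise_if_hidden(key)
        self._data[key] = value

    def __delitem__(self, key):
        self._raise_if_hidden(key)
        del self._data[key]

    def __iter__(self):
        return (k for k in self._data if k not in self._hidden)

    def __len__(self):
        return sum(1 for _ in self)

# The xarray-specific attribute stays in the underlying zarr attributes
# but is invisible through the wrapper.
attrs = {'_ARRAY_DIMENSIONS': ['y', 'x'], 'myattr1': 1}
visible = HiddenKeyDict(attrs, ['_ARRAY_DIMENSIONS'])
assert list(visible) == ['myattr1']
```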
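On question 1: each xarray variable carries an `encoding` dict that records how the data is stored on disk (dtype, compression, fill value, chunk shape), separate from its in-memory representation. A freshly constructed dataset has empty encodings; backends populate them on read and consult them on write. A quick way to see this, reusing the `ds` from the example above:

```python
# For an in-memory dataset the encoding dict starts out empty.
print(ds['foo'].encoding)   # -> {}
# Hypothetically, a zarr backend might record something like
# {'chunks': (50, 40), 'compressor': ...} here after a round trip.
```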
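On question 3, "aligning dask chunks with the underlying zarr chunks" could mean constructing the dask array directly from the zarr array's own chunking, so that no dask task reads across a zarr chunk boundary. A sketch, assuming the store written above contains an array named `foo`:

```python
import dask.array as da
import zarr

# Open one array from the store (read-only) and wrap it in a dask
# array whose chunks exactly match the on-disk zarr chunks.
z = zarr.open('zarr_test/foo', mode='r')
arr = da.from_array(z, chunks=z.chunks)
print(arr.chunks)  # ((50, 50), (40, 40, 40, 40, 40)) for the 100x200 array
```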
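On question 4, the public API being floated might eventually look something like this. This is entirely hypothetical; neither the method nor the engine name exists in this PR:

```python
# Hypothetical round trip mirroring the existing netCDF-style API.
ds.to_zarr('zarr_test')                            # proposed writer
ds2 = xr.open_dataset('zarr_test', engine='zarr')  # or a new open_zarr()
```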
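On question 5, the extensibility in question is plain zarr behavior, independent of this PR: any zarr array can be resized or appended to along any axis, so in principle every dimension behaves like a netCDF unlimited dimension:

```python
import numpy as np
import zarr

# Grow a zarr array after creation, along any axis.
z = zarr.zeros((100, 200), chunks=(50, 40), dtype='f8')
z.resize(150, 200)            # grow the first axis in place
z.append(np.ones((10, 200)))  # append along axis 0 (the default)
print(z.shape)                # (160, 200)
```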
Links from other tables
- 2 rows from pull_requests_id in labels_pull_requests