pull_requests: 137819104

id: 137819104
node_id: MDExOlB1bGxSZXF1ZXN0MTM3ODE5MTA0
number: 1528
state: closed
locked: 0
title: WIP: Zarr backend
user: 1197350
created_at: 2017-08-27T02:38:01Z
updated_at: 2018-02-13T21:35:03Z
closed_at: 2017-12-14T02:11:36Z
merged_at: 2017-12-14T02:11:36Z
merge_commit_sha: 8fe7eb0fbcb7aaa90d894bcf32dc1408735e5d9d
assignee:
milestone:
draft: 0
head: f5633cabd19189675b607379badc2c19b86c0b8e
base: 89a1a9883c0c8409dad8dbcccf1ab73a3ea2cafc
author_association: MEMBER
auto_merge:
repo: 13221727
url: https://github.com/pydata/xarray/pull/1528
merged_by:

body:

- [x] Closes #1223
- [x] Tests added / passed
- [x] Passes ``git diff upstream/master | flake8 --diff``
- [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API

I think that a zarr backend could be the ideal storage format for xarray datasets, overcoming many of the frustrations associated with netCDF and enabling optimal performance on cloud platforms.

This is a very basic start to implementing a zarr backend (as proposed in #1223); however, I am taking a somewhat different approach. I store the whole dataset in a single zarr group. I encode the extra metadata needed by xarray (so far just dimension information) as attributes within the zarr group and child arrays. I hide these special attributes from the user by wrapping the attribute dictionaries in a "`HiddenKeyDict`", so that they can't be viewed or modified. (Sketches of both of these ideas follow the questions below.)

I have no tests yet (:flushed:), but the following code works.

```python
from xarray.backends.zarr import ZarrStore
import xarray as xr
import numpy as np

ds = xr.Dataset(
    {'foo': (('y', 'x'), np.ones((100, 200)), {'myattr1': 1, 'myattr2': 2}),
     'bar': (('x',), np.zeros(200))},
    {'y': (('y',), np.arange(100)),
     'x': (('x',), np.arange(200))},
    {'some_attr': 'copana'}
).chunk({'y': 50, 'x': 40})

zs = ZarrStore(store='zarr_test')
ds.dump_to_store(zs)
ds2 = xr.Dataset.load_store(zs)
assert ds2.equals(ds)
```

There is a very long way to go here, but I thought I would just get a PR started. Some questions that would help me move forward:

1. What is "encoding" at the variable level? (I have never understood this part of xarray.) How should encoding be handled with zarr?
2. Should we encode / decode CF for zarr stores?
3. Do we want to always automatically align dask chunks with the underlying zarr chunks? (One possible alignment rule is sketched below.)
4. What sort of public API should the zarr backend have? Should you be able to load zarr stores via `open_dataset`? Or do we need a new method? I think `.to_zarr()` would be quite useful. (A hypothetical wrapper is sketched below.)
5. zarr arrays are extensible along all axes. What does this imply for unlimited dimensions?
6. Is any autoclose logic needed? As far as I can tell, zarr objects don't need to be closed.
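To make the dimension-metadata idea above concrete, here is a minimal sketch of how dimension names could ride along as an ordinary attribute on each zarr array. The reserved key `_XARRAY_DIMENSIONS` is a hypothetical name chosen for illustration, and the code assumes the zarr 2.x API (`open_group` / `create_dataset`).

```python
import numpy as np
import zarr

# Hypothetical reserved key for dimension names; the key actually used
# in the PR may differ.
DIMENSION_KEY = '_XARRAY_DIMENSIONS'

group = zarr.open_group('zarr_dims_demo', mode='w')
foo = group.create_dataset('foo', data=np.ones((100, 200)), chunks=(50, 40))

# Dimension names are JSON-serializable, so they can be stored as an
# ordinary zarr attribute on the array.
foo.attrs[DIMENSION_KEY] = ['y', 'x']

# On read, the backend would pop the reserved key back out to rebuild the
# xarray dimensions, and hide it from the user-facing attrs.
dims = tuple(group['foo'].attrs[DIMENSION_KEY])
assert dims == ('y', 'x')
```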
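Likewise, a `HiddenKeyDict` could be as simple as a `MutableMapping` that wraps the real attribute dict and refuses to expose a fixed set of reserved keys. This is an illustrative sketch; the class in the PR may differ in detail.

```python
from collections.abc import MutableMapping

class HiddenKeyDict(MutableMapping):
    """Sketch of a dict wrapper that hides a fixed set of keys."""

    def __init__(self, data, hidden_keys):
        self._data = data
        self._hidden_keys = frozenset(hidden_keys)

    def _raise_if_hidden(self, key):
        if key in self._hidden_keys:
            raise KeyError('Key %r is hidden from this mapping' % key)

    def __getitem__(self, key):
        self._raise_if_hidden(key)
        return self._data[key]

    def __setitem__(self, key, value):
        self._raise_if_hidden(key)
        self._data[key] = value

    def __delitem__(self, key):
        self._raise_if_hidden(key)
        del self._data[key]

    def __iter__(self):
        # Iteration simply skips the hidden keys.
        return (k for k in self._data if k not in self._hidden_keys)

    def __len__(self):
        return sum(1 for _ in self)


# The reserved dimension attribute is invisible through the wrapper:
attrs = {'myattr1': 1, '_XARRAY_DIMENSIONS': ['y', 'x']}
visible = HiddenKeyDict(attrs, ['_XARRAY_DIMENSIONS'])
assert list(visible) == ['myattr1']
assert len(visible) == 1
```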
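On question 3, one plausible alignment rule, assumed here purely for illustration, is that every dask chunk except the last along a dimension must be a whole multiple of the zarr chunk size, so that no zarr chunk is written by two different dask tasks:

```python
def dask_chunks_align(dask_chunks, zarr_chunk_size):
    """Return True if dask chunks along one dimension line up with the
    zarr chunk size: every chunk but the last must be a multiple of it,
    so no zarr chunk straddles two dask tasks."""
    return all(c % zarr_chunk_size == 0 for c in dask_chunks[:-1])

# Dask chunks of 50 along 'y' align with zarr chunks of 25 or 50, but not 40:
assert dask_chunks_align((50, 50), 25)
assert dask_chunks_align((50, 50), 50)
assert not dask_chunks_align((50, 50, 30), 40)
```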
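On question 4, the convenience layer could be quite thin. `to_zarr` and `open_zarr` below are hypothetical names, built only from the `ZarrStore` machinery shown in the example above:

```python
import xarray as xr
from xarray.backends.zarr import ZarrStore  # as introduced in this PR

def to_zarr(ds, store):
    """Hypothetical helper: write a Dataset into a zarr group."""
    zs = ZarrStore(store=store)
    ds.dump_to_store(zs)
    return zs

def open_zarr(store):
    """Hypothetical helper: load a Dataset back from a zarr group."""
    return xr.Dataset.load_store(ZarrStore(store=store))
```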
