html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/1528#issuecomment-364954680,https://api.github.com/repos/pydata/xarray/issues/1528,364954680,MDEyOklzc3VlQ29tbWVudDM2NDk1NDY4MA==,1197350,2018-02-12T15:21:51Z,2018-02-12T15:21:51Z,MEMBER,I'm enjoying this discussion. Zarr offers lots of new possibilities for appending / updating datasets that we should try to support. I personally would really like to be able to append / extend existing arrays from within xarray.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-351401474,https://api.github.com/repos/pydata/xarray/issues/1528,351401474,MDEyOklzc3VlQ29tbWVudDM1MTQwMTQ3NA==,1197350,2017-12-13T14:09:12Z,2017-12-13T14:09:12Z,MEMBER,Will merge later today if no further comments.,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-350365780,https://api.github.com/repos/pydata/xarray/issues/1528,350365780,MDEyOklzc3VlQ29tbWVudDM1MDM2NTc4MA==,1197350,2017-12-08T20:36:26Z,2017-12-08T20:36:26Z,MEMBER,Any more reviews? @fmaussion & @pwolfram: you have experience with backends. Your reviews would be valuable.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-350336238,https://api.github.com/repos/pydata/xarray/issues/1528,350336238,MDEyOklzc3VlQ29tbWVudDM1MDMzNjIzOA==,1197350,2017-12-08T18:26:58Z,2017-12-08T18:26:58Z,MEMBER,"There is a silly lingering issue that I need help resolving.
In a8b478543a978bd98c37711609c610432fdc7d07, @jhamman added a function `_replace_slices_with_arrays` related to vectorized indexing. This function contains a line
```python
array_subspace_size = max(
    (k.ndim for k in key if isinstance(k, np.ndarray)), default=0)
```
The `default` keyword was introduced in python 3.4, so this doesn't work in 2.7. I have tried a couple of options to overcome this but none of them have worked. Would someone care to help out with this? It is possibly the last remaining issue to resolve before this PR is really ready to be merged.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-349992006,https://api.github.com/repos/pydata/xarray/issues/1528,349992006,MDEyOklzc3VlQ29tbWVudDM0OTk5MjAwNg==,1197350,2017-12-07T14:59:12Z,2017-12-07T14:59:12Z,MEMBER,"@jhamman, I can't reproduce your error. If you can give me a reproducible example, I will make a test for it.
I think this is converging.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-349766763,https://api.github.com/repos/pydata/xarray/issues/1528,349766763,MDEyOklzc3VlQ29tbWVudDM0OTc2Njc2Mw==,1197350,2017-12-06T20:36:03Z,2017-12-06T20:36:03Z,MEMBER,"@jhamman - but the error being raised is wrong! There is a string formatting error raised in trying to generate a useful, informative error message.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-349540155,https://api.github.com/repos/pydata/xarray/issues/1528,349540155,MDEyOklzc3VlQ29tbWVudDM0OTU0MDE1NQ==,1197350,2017-12-06T05:38:26Z,2017-12-06T05:38:26Z,MEMBER,"I believe that this is now complete enough to consider merging. I have addressed nearly all of @shoyer's suggestions. I have added a bunch more tests and am now quite satisfied with the test suite. I wrote some basic documentation, with the usual disclaimers about the experimental nature of this new feature.
The zarr tests will not run if the zarr version is less than 2.2.0. This is not released yet. This means that only the py36-zarr-dev build actually runs the zarr tests. Once @alimanfoo releases the next version, the zarr tests should kick in on all the builds.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-349495568,https://api.github.com/repos/pydata/xarray/issues/1528,349495568,MDEyOklzc3VlQ29tbWVudDM0OTQ5NTU2OA==,1197350,2017-12-06T01:08:11Z,2017-12-06T01:08:11Z,MEMBER,@jhamman - could you elaborate on the nature of the error you got with uneven dask chunks. We should be catching this and raising a useful error message.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-348564159,https://api.github.com/repos/pydata/xarray/issues/1528,348564159,MDEyOklzc3VlQ29tbWVudDM0ODU2NDE1OQ==,1197350,2017-12-01T17:58:59Z,2017-12-01T17:59:06Z,MEMBER,"Sorry this has become such a behemoth. I know it is hard to review. I couldn't see how to make a more atomic PR because a new backend has lots of interrelated parts that need each other in order to work.
To finish it up, I propose to raise an error when attempting to encode variable-length string data. If someone can give me a quick one liner to help identify such datatypes, that would be helpful.
We will revisit these encoding issues once Stephan's refactoring is merged. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-347989858,https://api.github.com/repos/pydata/xarray/issues/1528,347989858,MDEyOklzc3VlQ29tbWVudDM0Nzk4OTg1OA==,1197350,2017-11-29T20:42:34Z,2017-11-29T20:42:34Z,MEMBER,"Actually, I think I just realized how to do it without too much pain. Stand by.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-347987097,https://api.github.com/repos/pydata/xarray/issues/1528,347987097,MDEyOklzc3VlQ29tbWVudDM0Nzk4NzA5Nw==,1197350,2017-11-29T20:32:07Z,2017-11-29T20:32:07Z,MEMBER,"> Is it possible to add one of these filters to XArray's default use of Zarr?
Because of the way the backends are structured right now, it is hard to bypass the existing encoding and replace it with a new encoding scheme. #1087 will make this easy to do, but for now it is complicated.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-347983448,https://api.github.com/repos/pydata/xarray/issues/1528,347983448,MDEyOklzc3VlQ29tbWVudDM0Nzk4MzQ0OA==,1197350,2017-11-29T20:18:08Z,2017-11-29T20:18:08Z,MEMBER,"Right now I am in a dilemma over how to move forward. Fixing this string encoding issue will require some serious hacks to cf encoding. If I do this before #1087 is finished, it will be a waste of time (and a pain). On the other hand #1087 could take a long time, since it is a major refactor itself.
Is there some way to punt on the variable-length string encoding for now? We could just error if such variables are present. This would allow us to get the experimental zarr backend out into the wild. FWIW, none of the datasets I want to use this with actually have any string data variables at all. I believe 95% of netcdf datasets are just regular numbers. This is an edge case.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-347382612,https://api.github.com/repos/pydata/xarray/issues/1528,347382612,MDEyOklzc3VlQ29tbWVudDM0NzM4MjYxMg==,1197350,2017-11-28T01:21:34Z,2017-11-28T01:21:34Z,MEMBER,"> When still in the original interpreter session, all the objects still exist
> in memory, so all the pointers stored in the array are still valid.
Do you think this persistence could affect xarray's tests? The way the tests work is via a context manager, like this
```python
@contextlib.contextmanager
def roundtrip(self, data, save_kwargs={}, open_kwargs={},
              allow_cleanup_failure=False):
    with create_tmp_file(
            suffix='.zarr',
            allow_cleanup_failure=allow_cleanup_failure) as tmp_file:
        data.to_zarr(store=tmp_file, **save_kwargs)
        yield xr.open_zarr(tmp_file, **open_kwargs)
```
Do we need to add an extra step after `data.to_zarr` to somehow purge such objects?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-347381865,https://api.github.com/repos/pydata/xarray/issues/1528,347381865,MDEyOklzc3VlQ29tbWVudDM0NzM4MTg2NQ==,1197350,2017-11-28T01:16:58Z,2017-11-28T01:16:58Z,MEMBER,"`Out[2]: Bus error: 10` 😱
Perhaps zarr should raise an error when assigning `zgs.x[:] = values`?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-347380750,https://api.github.com/repos/pydata/xarray/issues/1528,347380750,MDEyOklzc3VlQ29tbWVudDM0NzM4MDc1MA==,1197350,2017-11-28T01:10:01Z,2017-11-28T01:10:10Z,MEMBER,">zarr needs a filter that can encode and pack the strings into a single buffer, except in the special case where the data are being stored in-memory
@alimanfoo: the following also seems to work with a directory store:
```python
values = np.array([b'ab', b'cdef', np.nan], dtype=object)
zgs = zarr.open_group(store='zarr_directory')
zgs.create('x', shape=values.shape, dtype=values.dtype)
zgs.x[:] = values
```
This seems to contradict your statement above. What am I missing?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-347323043,https://api.github.com/repos/pydata/xarray/issues/1528,347323043,MDEyOklzc3VlQ29tbWVudDM0NzMyMzA0Mw==,1197350,2017-11-27T20:48:35Z,2017-11-27T20:53:28Z,MEMBER,"After a few more tweaks, this is now quite close to passing all the `CFEncodedDataTest` tests.
The remaining issues are all related to the encoding of strings. Basically, zarr's handling of strings:
http://zarr.readthedocs.io/en/latest/tutorial.html?highlight=strings#string-arrays
is considerably different from netCDF's. Because `ZarrStore` is a subclass of `WritableCFDataStore`, all of the dataset variables get passed through `encode_cf_variable` before writing. This screws up things that already work quite naturally.
Consider the following direct creation of a variable-length string array in zarr:
```python
import numpy as np
import zarr

values = np.array([b'ab', b'cdef', np.nan], dtype=object)
zgs = zarr.open_group()
zgs.create('x', shape=values.shape, dtype=values.dtype)
zgs.x[:] = values
zgs.x
```
```
Array(/x, (3,), object, chunks=(3,), order=C)
nbytes: 24; nbytes_stored: 350; ratio: 0.1; initialized: 1/1
compressor: Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
store: DictStore
```
It seems we can encode variable-length strings into objects just fine. (`np.testing.assert_array_equal(values, zgs.x[:])` fails only because of the `nan` value. The array round-trips just fine.)
However, after passing through xarray's cf encoding, this no longer works:
```python
encoding = {'_FillValue': b'X', 'dtype': 'S1'}
original = xr.Dataset({'x': ('t', values, {}, encoding)})
zarr_dict_store = {}
original.to_zarr(store=zarr_dict_store)
zs = zarr.open_group(store=zarr_dict_store)
print(zs.x)
print(zs.x[:])
```
```
Array(/x, (3, 4), |S1, chunks=(3, 4), order=C)
nbytes: 12; nbytes_stored: 428; ratio: 0.0; initialized: 1/1
compressor: Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
store: dict
array([[b'a', b'b', b'', b''],
[b'c', b'd', b'e', b'f'],
[b'X', b'', b'', b'']],
dtype='|S1')
```
Here is everything that happens in `encode_cf_variable`:
```python
var = maybe_encode_datetime(var, name=name)
var = maybe_encode_timedelta(var, name=name)
var, needs_copy = maybe_encode_offset_and_scale(var, needs_copy, name=name)
var, needs_copy = maybe_encode_fill_value(var, needs_copy, name=name)
var = maybe_encode_nonstring_dtype(var, name=name)
var = maybe_default_fill_value(var)
var = maybe_encode_bools(var)
var = ensure_dtype_not_object(var, name=name)
var = maybe_encode_string_dtype(var, name=name)
```
The challenge now is to figure out which parts of this we need to bypass for zarr and how to implement that bypassing.
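One way that bypassing could look (just a sketch, assuming we stay inside the `conventions` module and reuse the helpers listed above, simply skipping the two string/object coercion steps):
```python
def encode_zarr_variable(var, needs_copy=True, name=None):
    # hypothetical zarr-specific encoder: the same pipeline as
    # encode_cf_variable, minus ensure_dtype_not_object and
    # maybe_encode_string_dtype, so variable-length strings reach zarr as-is
    var = maybe_encode_datetime(var, name=name)
    var = maybe_encode_timedelta(var, name=name)
    var, needs_copy = maybe_encode_offset_and_scale(var, needs_copy, name=name)
    var, needs_copy = maybe_encode_fill_value(var, needs_copy, name=name)
    var = maybe_encode_nonstring_dtype(var, name=name)
    var = maybe_default_fill_value(var)
    var = maybe_encode_bools(var)
    return var
```
How the zarr store would select such an encoder instead of the default one is exactly the open design question.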
Overall, I find the `conventions` module to be a bit unwieldy. There is a lot of stuff in there, not all of which is related to CF conventions. It would be useful to separate the actual conventions from the encoding / decoding needed for different backends.
At this point, I would appreciate some input from an encoding expert before I go refactoring stuff.
edit: The actual tests that fail are `CFEncodedDataTest.test_roundtrip_bytes_with_fill_value` and `CFEncodedDataTest.test_roundtrip_string_encoded_characters`. One option to move forward would be just to skip those tests for zarr. I am eager to get this out in the wild to see how it plays with real datasets.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-345574445,https://api.github.com/repos/pydata/xarray/issues/1528,345574445,MDEyOklzc3VlQ29tbWVudDM0NTU3NDQ0NQ==,1197350,2017-11-20T02:21:08Z,2017-11-20T02:21:08Z,MEMBER,"Those following this thread will probably be very excited to learn that the following code works with my zarr_backend branch:
```python
import gcsfs
import xarray as xr

fs = gcsfs.GCSFileSystem(project='pangeo-181919', token=None)
gcsmap = gcsfs.mapping.GCSMap('zarr_store_test', gcs=fs, check=True, create=False)
# ds is an existing xarray.Dataset to be written to the GCS-backed zarr store
ds.to_zarr(store=gcsmap)
ds_gcs = xr.open_zarr(gcsmap, mode='r')
```
I never doubted this would be possible, but seeing it in action is quite exciting!","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-345126452,https://api.github.com/repos/pydata/xarray/issues/1528,345126452,MDEyOklzc3VlQ29tbWVudDM0NTEyNjQ1Mg==,1197350,2017-11-17T02:24:56Z,2017-11-17T02:24:56Z,MEMBER,"@jhamman would it screw you up if I pushed a few commits tonight? I won't touch the ZarrArrayWrapper. But I figured out how to fix auto_chunk.
> On Nov 16, 2017, at 7:12 PM, Matthew Rocklin wrote:
>
> Hooray for standard interfaces!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-345034208,https://api.github.com/repos/pydata/xarray/issues/1528,345034208,MDEyOklzc3VlQ29tbWVudDM0NTAzNDIwOA==,1197350,2017-11-16T19:22:01Z,2017-11-16T19:22:01Z,MEMBER,"Some things I would like to add to the zarr test suite:
- [ ] specifying zarr-specific encoding options ([compressors and filters](http://zarr.readthedocs.io/en/latest/tutorial.html#compressors))
- [ ] writing to different zarr storage backends (e.g. dict store, can we mock an S3 store?)
- [ ] different combinations of zarr and dask chunks. one <=> one and many <=> one are supported; one <=> many and many <=> many should raise errors / warnings (not thread safe)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-345030848,https://api.github.com/repos/pydata/xarray/issues/1528,345030848,MDEyOklzc3VlQ29tbWVudDM0NTAzMDg0OA==,1197350,2017-11-16T19:10:31Z,2017-11-16T19:10:31Z,MEMBER,"> FYI: I'm playing with your branch a bit today.
Great! If you use the latest zarr master, you should get the same test results as this travis build:
https://travis-ci.org/pydata/xarray/jobs/301606996
There are two outstanding failures related to encoding (`test_roundtrip_bytes_with_fill_value` and `test_roundtrip_string_encoded_characters`). And auto-caching is not working (`test_dataset_caching`). I consider these pretty minor.
The biggest problem is that, for reasons I don't understand, my ""auto-chunking"" behavior does not work (this is covered by the only zarr-specific test method: `test_auto_chunk`). My goal is to have zarr be lazy-by-default and create dask chunks for every zarr chunk, but my implementation of this fails:
https://github.com/pydata/xarray/pull/1528/files#diff-1bba25ab0d8275d763572bfdd10377c6R325
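For reference, the underlying pattern I am aiming for is roughly the following (a standalone sketch, not the code at that link):
```python
import dask.array as da
import zarr

z = zarr.open_array('example.zarr', mode='r')  # placeholder path
# lazy-by-default: one dask chunk per zarr chunk, so reads stay aligned
# with the on-disk chunking
lazy = da.from_array(z, chunks=z.chunks)
```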
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-344040853,https://api.github.com/repos/pydata/xarray/issues/1528,344040853,MDEyOklzc3VlQ29tbWVudDM0NDA0MDg1Mw==,1197350,2017-11-13T20:04:12Z,2017-11-13T20:04:12Z,MEMBER,😬 that's my punishment for being slow!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-339815147,https://api.github.com/repos/pydata/xarray/issues/1528,339815147,MDEyOklzc3VlQ29tbWVudDMzOTgxNTE0Nw==,1197350,2017-10-26T22:07:10Z,2017-10-26T22:07:10Z,MEMBER,"Fantastic! Are you planning a release any time soon? If not we can set up to test against the github master.
> On Oct 26, 2017, at 5:04 PM, Alistair Miles wrote:
>
> Just to say, support for 0d arrays, and for arrays with one or more zero-length dimensions, is in zarr master.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-335204883,https://api.github.com/repos/pydata/xarray/issues/1528,335204883,MDEyOklzc3VlQ29tbWVudDMzNTIwNDg4Mw==,1197350,2017-10-09T16:09:50Z,2017-10-09T16:09:50Z,MEMBER,"> I'm on paternity leave for the next 2 weeks
Congratulations! If you could just merge alimanfoo/zarr#154, it would really help us move forward.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-335162205,https://api.github.com/repos/pydata/xarray/issues/1528,335162205,MDEyOklzc3VlQ29tbWVudDMzNTE2MjIwNQ==,1197350,2017-10-09T13:43:49Z,2017-10-09T13:43:49Z,MEMBER,"> I won't be able to put any effort into zarr in the
> next month
Does this include merging PRs?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-335027491,https://api.github.com/repos/pydata/xarray/issues/1528,335027491,MDEyOklzc3VlQ29tbWVudDMzNTAyNzQ5MQ==,1197350,2017-10-08T18:23:50Z,2017-10-08T18:23:50Z,MEMBER,"> For thoroughness this might be worth doing with custom JSON encoder on the zarr side, but would also be easy to do in the xarray wrapper.
My impression is that zarr development is moving conservatively, so we would be better off finding workarounds in xarray.
@shoyer: where in the code would you recommend putting this logic? It seems like part of encoding / decoding to me.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-334981929,https://api.github.com/repos/pydata/xarray/issues/1528,334981929,MDEyOklzc3VlQ29tbWVudDMzNDk4MTkyOQ==,1197350,2017-10-08T04:16:58Z,2017-10-08T18:21:30Z,MEMBER,"There are two zarr issues that are causing some tests to fail:
1. zarr can't store zero-dimensional arrays.
```python
za = zarr.create(shape=(), store='tmp_file')
za[...] = 0
```
raises a file permission error. I believe that this is alimanfoo/zarr#150.
1. lots of the things that xarray likes to put in attributes are not serializable by zarr
```python
za = zarr.create(shape=(1), store='tmp_file')
za.attrs['foo'] = np.float32(0)
```
raises `TypeError: Object of type 'float32' is not JSON serializable`. This is alimanfoo/zarr#156.
Most of the failures of tests inherited from `CFEncodedDataTest` can be attributed to one of these two issues.
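For the second issue, a possible stopgap on the xarray side (just a sketch, until zarr grows a custom JSON encoder) would be to coerce numpy scalars to native Python types before they reach zarr attributes:
```python
import numpy as np

def _ensure_json_serializable(value):
    # numpy scalars all derive from np.generic and expose .item(), which
    # returns the equivalent native Python int / float / bool / str
    if isinstance(value, np.generic):
        return value.item()
    return value

print(_ensure_json_serializable(np.float32(0)))  # 0.0, a plain Python float
```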
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-334982373,https://api.github.com/repos/pydata/xarray/issues/1528,334982373,MDEyOklzc3VlQ29tbWVudDMzNDk4MjM3Mw==,1197350,2017-10-08T04:31:02Z,2017-10-08T04:31:09Z,MEMBER,"I worked on this on the plane back from Seattle. Yay for having no internet access!
Would appreciate feedback on the questions raised above from @shoyer, @jhamman, and anyone else with backend expertise.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-334633708,https://api.github.com/repos/pydata/xarray/issues/1528,334633708,MDEyOklzc3VlQ29tbWVudDMzNDYzMzcwOA==,1197350,2017-10-06T01:15:05Z,2017-10-06T01:15:05Z,MEMBER,"Here is where we are at with the Zarr backend tests
```
xarray/tests/test_backends.py::ZarrDataTest::test_coordinates_encoding PASSED
xarray/tests/test_backends.py::ZarrDataTest::test_dataset_caching FAILED
xarray/tests/test_backends.py::ZarrDataTest::test_dataset_compute PASSED
xarray/tests/test_backends.py::ZarrDataTest::test_default_fill_value FAILED
xarray/tests/test_backends.py::ZarrDataTest::test_encoding_kwarg FAILED
xarray/tests/test_backends.py::ZarrDataTest::test_encoding_same_dtype PASSED
xarray/tests/test_backends.py::ZarrDataTest::test_invalid_dataarray_names_raise FAILED
xarray/tests/test_backends.py::ZarrDataTest::test_load PASSED
xarray/tests/test_backends.py::ZarrDataTest::test_orthogonal_indexing FAILED
xarray/tests/test_backends.py::ZarrDataTest::test_pickle FAILED
xarray/tests/test_backends.py::ZarrDataTest::test_pickle_dataarray PASSED
xarray/tests/test_backends.py::ZarrDataTest::test_roundtrip_None_variable PASSED
xarray/tests/test_backends.py::ZarrDataTest::test_roundtrip_boolean_dtype PASSED
xarray/tests/test_backends.py::ZarrDataTest::test_roundtrip_coordinates PASSED
xarray/tests/test_backends.py::ZarrDataTest::test_roundtrip_datetime_data FAILED
xarray/tests/test_backends.py::ZarrDataTest::test_roundtrip_endian PASSED
xarray/tests/test_backends.py::ZarrDataTest::test_roundtrip_example_1_netcdf FAILED
xarray/tests/test_backends.py::ZarrDataTest::test_roundtrip_float64_data PASSED
xarray/tests/test_backends.py::ZarrDataTest::test_roundtrip_mask_and_scale FAILED
xarray/tests/test_backends.py::ZarrDataTest::test_roundtrip_object_dtype FAILED
xarray/tests/test_backends.py::ZarrDataTest::test_roundtrip_string_data PASSED
xarray/tests/test_backends.py::ZarrDataTest::test_roundtrip_strings_with_fill_value FAILED
xarray/tests/test_backends.py::ZarrDataTest::test_roundtrip_test_data PASSED
xarray/tests/test_backends.py::ZarrDataTest::test_roundtrip_timedelta_data FAILED
xarray/tests/test_backends.py::ZarrDataTest::test_unsigned_roundtrip_mask_and_scale FAILED
xarray/tests/test_backends.py::ZarrDataTest::test_write_store PASSED
xarray/tests/test_backends.py::ZarrDataTest::test_zero_dimensional_variable FAILED
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-334633152,https://api.github.com/repos/pydata/xarray/issues/1528,334633152,MDEyOklzc3VlQ29tbWVudDMzNDYzMzE1Mg==,1197350,2017-10-06T01:10:29Z,2017-10-06T01:10:29Z,MEMBER,"With @jhamman's help, I just made a little progress on this.
We now have a bare bones test suite for the zarr backend. This is very helpful for revealing where more work is needed: encoding. So the next step is to seriously confront that issue. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-333336320,https://api.github.com/repos/pydata/xarray/issues/1528,333336320,MDEyOklzc3VlQ29tbWVudDMzMzMzNjMyMA==,1197350,2017-09-30T21:13:48Z,2017-09-30T21:13:48Z,MEMBER,@martindurant: I may have some time to get back to working on this next week. (Especially if @jhamman can help me sort out the backend testing.) What is the status of your branch?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-327849640,https://api.github.com/repos/pydata/xarray/issues/1528,327849640,MDEyOklzc3VlQ29tbWVudDMyNzg0OTY0MA==,1197350,2017-09-07T16:17:13Z,2017-09-07T16:17:13Z,MEMBER,"I am stuck on figuring out how to develop a new test case for this. (It doesn't help that #1531 is messing up the backend tests.)
If @shoyer can give us a few hints about how to best implement a test class (i.e. what to subclass, etc.), I think that could jumpstart testing and move the PR forward.
I welcome contributions from others such as @martindurant on this. I won't have much time in the near future, since a new semester just dropped on me like a load of bricks.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-325738019,https://api.github.com/repos/pydata/xarray/issues/1528,325738019,MDEyOklzc3VlQ29tbWVudDMyNTczODAxOQ==,1197350,2017-08-29T17:35:09Z,2017-08-29T17:35:09Z,MEMBER,"One path forward for now would be to ignore the filters like `FixedScaleOffset` that are not present in netCDF, let xarray handle the CF encoding / decoding, and just put the compressors (e.g. `Blosc`, `Zlib`) and their parameters in the xarray variable encoding.
If we think there is an advantage to using the zarr native filters, that could be added via a future PR once we have the basic backend working.
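To make that concrete, the kind of user-facing pattern I have in mind is something like this (hypothetical sketch; the encoding key names and exact API are not settled):
```python
import xarray as xr
import zarr

ds = xr.Dataset({'foo': (('x',), [1.0, 2.0, 3.0])})
# pass a per-variable compressor through encoding, mirroring how the netCDF
# backends accept zlib / complevel; 'compressor' is a placeholder key name
ds.to_zarr(store='example.zarr',
           encoding={'foo': {'compressor': zarr.Blosc(cname='zstd', clevel=3)}})
```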
@alimanfoo: when do you anticipate the 2.2 zarr release to happen? Will the API change significantly? If so, I will wait for that to move forward here.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-325690352,https://api.github.com/repos/pydata/xarray/issues/1528,325690352,MDEyOklzc3VlQ29tbWVudDMyNTY5MDM1Mg==,1197350,2017-08-29T14:54:53Z,2017-08-29T14:54:53Z,MEMBER,"I am now trying to understand the backend test suite structure.
Can someone explain to me why so many tests are skipped? For example, if I run
```
py.test -v xarray/tests/test_backends.py -rsx -k GenericNetCDFDataTest
```
I get
```
================================================== test session starts ==================================================
platform darwin -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0 -- /Users/rpa/anaconda/bin/python
cachedir: .cache
rootdir: /Users/rpa/RND/Public/xarray, inifile: setup.cfg
plugins: cov-2.5.1
collected 683 items
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_coordinates_encoding SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_cross_engine_read_write_netcdf3 PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_dataset_caching SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_dataset_compute SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_default_fill_value SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_encoding_kwarg SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_encoding_same_dtype SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_encoding_unlimited_dims PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_engine PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_invalid_dataarray_names_raise SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_load SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_orthogonal_indexing PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_pickle SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_pickle_dataarray SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_None_variable SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_boolean_dtype SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_coordinates SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_datetime_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_endian SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_example_1_netcdf SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_float64_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_mask_and_scale SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_object_dtype SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_string_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_strings_with_fill_value SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_test_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_timedelta_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_write_store PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_zero_dimensional_variable SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_coordinates_encoding SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_cross_engine_read_write_netcdf3 PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_dataset_caching SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_dataset_compute SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_default_fill_value SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_encoding_kwarg SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_encoding_same_dtype SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_encoding_unlimited_dims PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_engine PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_invalid_dataarray_names_raise SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_load SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_orthogonal_indexing PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_pickle SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_pickle_dataarray SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_None_variable SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_boolean_dtype SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_coordinates SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_datetime_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_endian SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_example_1_netcdf SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_float64_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_mask_and_scale SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_object_dtype SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_string_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_strings_with_fill_value SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_test_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_timedelta_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_write_store PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_zero_dimensional_variable SKIPPED
================================================ short test summary info ================================================
SKIP [2] xarray/tests/test_backends.py:382: requires pynio
SKIP [2] xarray/tests/test_backends.py:214: requires pynio
SKIP [2] xarray/tests/test_backends.py:178: requires pynio
SKIP [2] xarray/tests/test_backends.py:468: requires pynio
SKIP [2] xarray/tests/test_backends.py:439: requires pynio
SKIP [2] xarray/tests/test_backends.py:490: requires pynio
SKIP [2] xarray/tests/test_backends.py:428: requires pynio
SKIP [2] xarray/tests/test_backends.py:145: requires pynio
SKIP [2] xarray/tests/test_backends.py:197: requires pynio
SKIP [2] xarray/tests/test_backends.py:207: requires pynio
SKIP [2] xarray/tests/test_backends.py:230: requires pynio
SKIP [2] xarray/tests/test_backends.py:311: requires pynio
SKIP [2] xarray/tests/test_backends.py:300: requires pynio
SKIP [2] xarray/tests/test_backends.py:271: requires pynio
SKIP [2] xarray/tests/test_backends.py:409: requires pynio
SKIP [2] xarray/tests/test_backends.py:291: requires pynio
SKIP [2] xarray/tests/test_backends.py:286: requires pynio
SKIP [2] xarray/tests/test_backends.py:362: requires pynio
SKIP [2] xarray/tests/test_backends.py:235: requires pynio
SKIP [2] xarray/tests/test_backends.py:264: requires pynio
SKIP [2] xarray/tests/test_backends.py:334: requires pynio
SKIP [2] xarray/tests/test_backends.py:139: requires pynio
SKIP [2] xarray/tests/test_backends.py:280: requires pynio
SKIP [2] xarray/tests/test_backends.py:109: requires pynio
```
Those line numbers refer to all of the skipped methods. Why should I need pynio to run those tests?
It looks like the same thing is happening on travis: https://travis-ci.org/pydata/xarray/jobs/268805771#L1527
Maybe @pwolfram understands this stuff?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-325660754,https://api.github.com/repos/pydata/xarray/issues/1528,325660754,MDEyOklzc3VlQ29tbWVudDMyNTY2MDc1NA==,1197350,2017-08-29T13:18:33Z,2017-08-29T13:18:33Z,MEMBER,"> encoding keeps track of how variables are represented in a file (e.g., chunking schemes, _FillValue/add_offset/scale_factor compression, time units), so we reconstruct a netCDF file that looks almost exactly like the file we've read from disk.
Is the goal here to be able to round-trip the file, such that calling `.to_netcdf()` produces an identical file to the original source file? For zarr, I think this would mean having the ability to read from one zarr store into xarray, and then write back to a different store, and have these two stores be identical. That makes sense to me.
What I *don't* understand is how encoding interacts with attributes. When is something an attribute vs. an encoding (`add_offset`, for example)? How does xarray know whether the store encodes / decodes these automatically vs. when it has to be done by xarray, e.g. by calling [`mask_and_scale`](https://github.com/pydata/xarray/blob/master/xarray/conventions.py#L35)?
> > Should we encode / decode CF for zarr stores?
> Yes, probably, if we want to handle netcdf conventions for times, fill values and scaling.
Does this mean that my `ZarrStore` should inherit from `WritableCFDataStore` instead of `AbstractWritableDataStore`?
Regarding encoding, zarr has its own internal encoding mechanism, which it calls ""filters"", that closely resembles some of the CF encoding options. For example the [`FixedScaleOffset`](http://zarr.readthedocs.io/en/latest/api/codecs.html#zarr.codecs.FixedScaleOffset) filter does something similar to xarray's [`mask_and_scale`](https://github.com/pydata/xarray/blob/master/xarray/conventions.py#L35) function.
I don't yet understand how to make these elements work together properly, for example, to avoid applying the scale / offset function twice, as I mentioned above.
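To illustrate the double-application worry with just the CF formulas (a toy sketch, no zarr involved):
```python
import numpy as np

scale_factor, add_offset = 0.01, 273.15
encoded = np.array([0, 150, 300], dtype='int16')

decoded_once = encoded * scale_factor + add_offset        # correct CF decode
decoded_twice = decoded_once * scale_factor + add_offset  # what we must avoid
print(decoded_once)   # [273.15 274.65 276.15]
print(decoded_twice)  # [275.8815 275.8965 275.9115]
```
If both a zarr filter and xarray's decoding apply the transform, we end up with the second line.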
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-325226656,https://api.github.com/repos/pydata/xarray/issues/1528,325226656,MDEyOklzc3VlQ29tbWVudDMyNTIyNjY1Ng==,1197350,2017-08-27T21:42:23Z,2017-08-27T21:42:23Z,MEMBER,"> Is the aim to reduce the number of metadata files hanging around?
This is also part of my goal. I think all the metadata can be stored internally to zarr via attributes. There just have to be some ""special"" attributes that xarray hides from the user. This is the same as h5netcdf.
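As a toy illustration of what I mean (attribute names here are placeholders, not a settled convention):
```python
import zarr

DIMENSIONS_KEY = '_XARRAY_DIMENSIONS'  # hypothetical hidden attribute name

group = zarr.group()  # in-memory store
arr = group.create('temperature', shape=(4, 3), dtype='f8')
arr.attrs[DIMENSIONS_KEY] = ['time', 'space']  # hidden by the xarray wrapper
arr.attrs['units'] = 'degC'                    # ordinary attribute, exposed as-is
```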
@alimanfoo suggested this should be possible in that earlier thread:
> Specifically I'm wondering if this could all be stored as attributes on the
> Zarr array, with some conventions for special xarray attribute names?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-325226495,https://api.github.com/repos/pydata/xarray/issues/1528,325226495,MDEyOklzc3VlQ29tbWVudDMyNTIyNjQ5NQ==,1197350,2017-08-27T21:38:35Z,2017-08-27T21:38:35Z,MEMBER,"> Could you comment more on the difference between your approach and mine?
Your functions are a great proof of concept for the relative ease of interoperability between xarray and zarr. What I have done here is to implement an xarray ""backend"" (i.e. DataStore) that uses zarr as its storage medium. This puts zarr on the same level as netCDF and HDF5 as a ""first class"" storage format for xarray data, as suggested by @shoyer in the comment on that thread. My hope is that this will enable the magical performance benefits that you have anticipated.
Digging deeper into that thread, I see @shoyer makes the following proposition:
> So we could either directly write a DataStore or write a separate ""znetcdf"" or ""netzdf"" module that implements an interface similar to [h5netcdf](https://github.com/shoyer/h5netcdf) (which itself is a thin wrapper on top of h5py).
With this PR, I have started to do the former (write a DataStore). However, I can already see the wisdom of what he says next:
> All things being equal, I would prefer the later approach, because people seem to find these intermediate interfaces useful, and it would help clarify the specification of the file format vs. details of how xarray uses it.
I have already implemented my own [custom DataStore](https://github.com/xgcm/xmitgcm/blob/master/xmitgcm/mds_store.py) for a different project, so I felt comfortable diving into this. But I might end up reinventing the wheel several times over if I continue down this road. In particular, I can see that my `HiddenKeyDict` is very similar to h5netcdf's [treatment of attributes](https://github.com/shoyer/h5netcdf/blob/master/h5netcdf/attrs.py#L9). (I had never looked at the h5netcdf code until just now!)
On the other hand, zarr is so simple to use that a separate wrapper package might be overkill.
So I am still not sure whether the approach I am taking here is worth pursuing further. I consider this a highly experimental PR, and I'm really looking for feedback.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-325173551,https://api.github.com/repos/pydata/xarray/issues/1528,325173551,MDEyOklzc3VlQ29tbWVudDMyNTE3MzU1MQ==,1197350,2017-08-27T02:40:22Z,2017-08-27T02:40:22Z,MEMBER,"cc @martindurant, @mrocklin, @alimanfoo","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694