html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/1528#issuecomment-365412033,https://api.github.com/repos/pydata/xarray/issues/1528,365412033,MDEyOklzc3VlQ29tbWVudDM2NTQxMjAzMw==,6042212,2018-02-13T21:35:03Z,2018-02-13T21:35:03Z,CONTRIBUTOR,"Yeah, ideally when adding a variable like
```
ds['myvar'] = xr.DataArray(data=da.zeros(..., chunks=(..)), dims=['l', 'b', 'v'])
ds.to_zarr(mapping)
```
we should be able to apply an optimization strategy in which the zarr array is created without filling in all those unnecessary zeros. This seems doable.
On the other hand, implementing
```
ds.myvar[slice, slice, slice] = some data
ds.to_zarr(mapping)
```
(which cannot be done currently with dask-arrays at all), in such a way that only partitions with data get updated - this seems really hard.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-364817111,https://api.github.com/repos/pydata/xarray/issues/1528,364817111,MDEyOklzc3VlQ29tbWVudDM2NDgxNzExMQ==,6042212,2018-02-12T02:43:43Z,2018-02-12T03:47:48Z,CONTRIBUTOR,"OK, so the way to do this in pure-zarr appears to be to simply create the appropriate zarr array and set its dimensions attribute:
```
ds = xr.Dataset(coords={'b': np.arange(-4, 6, 0.005),
'l': np.arange(150, 72, -0.005),
'v': np.arange(58722.24288, -164706.4225401, -8.2446e2)})
ds.to_zarr(mapping)
g = zarr.open_group(mapping)
arr = g.zeros(..., shape like l, b, v)
arr.attrs['_ARRAY_DIMENSIONS'] = ['l', 'b', 'v']
```
`xr.open_zarr(mapping)` now shows the new array, without having to materialize any data into it, and `arr` can be written to piecemeal, although without the convenience of the coordinate mapping, of course.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-364804697,https://api.github.com/repos/pydata/xarray/issues/1528,364804697,MDEyOklzc3VlQ29tbWVudDM2NDgwNDY5Nw==,6042212,2018-02-12T00:19:55Z,2018-02-12T00:19:55Z,CONTRIBUTOR,"It might be enough, in this case, to provide some helper function in zarr to create and fetch arrays that will show up as variables in xarray - this need not be specific to being used via dask. I am assuming with the work done in this PR, that there is an unambiguous way to determine if a zarr group can be interpreted as an xarray dataset, and that zarr then knows how to add things that look like variables (which generally in the zarr case don't involve writing any actual data until the parts of the array are filled in).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-364803984,https://api.github.com/repos/pydata/xarray/issues/1528,364803984,MDEyOklzc3VlQ29tbWVudDM2NDgwMzk4NA==,6042212,2018-02-12T00:12:36Z,2018-02-12T00:12:36Z,CONTRIBUTOR,"@jhamman , that partially solves what I mean. I can probably turn my data into dask arrays with some difficulty, but really I was hoping for something like the following:
```
ds = xr.Dataset(coords={'b': np.arange(-4, 6, 0.005),
'l': np.arange(150, 72, -0.005),
'v': np.arange(58722.24288, -164706.4225401, -8.2446e2)})
arr = ds.create_new_zero_array(dims=['l', 'b', 'v'])
arr[0:10, :, :] = 1
```
and expect to be able to set the values of the new variable in the same way that you can with the equivalent zarr array. I can probably get around this by setting the values with `da.zeros`, finding the zarr array in the dataset, and then setting its values.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-364801073,https://api.github.com/repos/pydata/xarray/issues/1528,364801073,MDEyOklzc3VlQ29tbWVudDM2NDgwMTA3Mw==,6042212,2018-02-11T23:35:34Z,2018-02-11T23:35:34Z,CONTRIBUTOR,"Question: how would one *build* a zarr-xarray dataset?
With zarr you can open an array that contains no data, and use set-slice notation to fill in the values (which is what dask's store essentially does).
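To make the idea concrete, here is a toy model of that behaviour in plain Python. It is only a sketch under stated assumptions: the class `LazyChunkedArray` and its methods are hypothetical and are not zarr's API, the array is 1-D, and the store is an ordinary dict standing in for any MutableMapping. The point it shows is that a chunk gets an entry in the store only once some slice assignment touches it.

```python
# Toy model only: a chunked 1-D array backed by a dict (any MutableMapping).
# LazyChunkedArray is a hypothetical name, not part of zarr or xarray.
class LazyChunkedArray:
    def __init__(self, length, chunk, store, fill_value=0.0):
        self.length = length
        self.chunk = chunk
        self.store = store          # maps chunk index -> list of values
        self.fill_value = fill_value

    def _chunk_len(self, idx):
        # The last chunk may be shorter than the others.
        return min(self.chunk, self.length - idx * self.chunk)

    def __setitem__(self, key, value):
        # Only contiguous slices and scalar values are handled in this sketch.
        start, stop, step = key.indices(self.length)
        if step != 1:
            raise ValueError('only contiguous slices are supported')
        for i in range(start, stop):
            idx, offset = divmod(i, self.chunk)
            if idx not in self.store:
                # A chunk is materialised the first time it is written to.
                self.store[idx] = [self.fill_value] * self._chunk_len(idx)
            self.store[idx][offset] = value

    def __getitem__(self, key):
        start, stop, step = key.indices(self.length)
        out = []
        for i in range(start, stop, step):
            idx, offset = divmod(i, self.chunk)
            block = self.store.get(idx)
            # Chunks that were never written read back as the fill value.
            out.append(self.fill_value if block is None else block[offset])
        return out


store = {}
arr = LazyChunkedArray(length=1000, chunk=100, store=store)
arr[0:150] = 1.0                    # touches only chunks 0 and 1
written_chunks = sorted(store)
```

After the assignment, `store` holds entries for two of the ten chunks, and reading any untouched region returns the fill value without anything having been stored for it.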
If I have some pre-known coordinates and bigger-than-memory data arrays, how would I go about getting the values into the zarr structure? If this can't be done directly with the xarray interface, is there a way to call zarr's open/create/zeros such that the corresponding array will appear as a variable when the same dataset is opened with xarray?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-345770374,https://api.github.com/repos/pydata/xarray/issues/1528,345770374,MDEyOklzc3VlQ29tbWVudDM0NTc3MDM3NA==,6042212,2017-11-20T17:37:01Z,2017-11-20T17:37:01Z,CONTRIBUTOR,"This is, of course, by design :)
I imagine there is much that could be done to optimise performance, but for fewer, larger chunks, it should be pretty good.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-345104440,https://api.github.com/repos/pydata/xarray/issues/1528,345104440,MDEyOklzc3VlQ29tbWVudDM0NTEwNDQ0MA==,6042212,2017-11-17T00:10:19Z,2017-11-17T00:10:19Z,CONTRIBUTOR,"`hdfs3` also has a MutableMapping for HDFS. I did not succeed in getting one into azure-datalake-store, but it would not be hard to make. In this way, zarr can become a pretty general array cloud storage mechanism.","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-333400272,https://api.github.com/repos/pydata/xarray/issues/1528,333400272,MDEyOklzc3VlQ29tbWVudDMzMzQwMDI3Mg==,6042212,2017-10-01T19:26:22Z,2017-10-01T19:26:22Z,CONTRIBUTOR,"I'm afraid I have not done anything since posting my commit. Its content is just an example of how you might pass parameters down to zarr, plus a test case showing that the basic data round-trips properly, although the dataset does not come back with the same structure it started with.
We can loop back and decide where to go from here.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-327901739,https://api.github.com/repos/pydata/xarray/issues/1528,327901739,MDEyOklzc3VlQ29tbWVudDMyNzkwMTczOQ==,6042212,2017-09-07T19:36:15Z,2017-09-07T19:36:15Z,CONTRIBUTOR,"@shoyer , is https://github.com/martindurant/xarray/commit/6c1fb6b76ebba862a1c5831210ce026160da0065 a reasonable start ?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-327833777,https://api.github.com/repos/pydata/xarray/issues/1528,327833777,MDEyOklzc3VlQ29tbWVudDMyNzgzMzc3Nw==,6042212,2017-09-07T15:23:31Z,2017-09-07T15:23:31Z,CONTRIBUTOR,"@rabernat , is there anything I can do to help push this along?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-325728378,https://api.github.com/repos/pydata/xarray/issues/1528,325728378,MDEyOklzc3VlQ29tbWVudDMyNTcyODM3OA==,6042212,2017-08-29T17:00:29Z,2017-08-29T17:00:29Z,CONTRIBUTOR,"A further rather big advantage in zarr that I'm not aware of in cdf/hdf (I may be wrong) is not just null values, but not having a given block be written to disc at all if it only contains null data. This probably meshes perfectly well with most users' understanding of missing data/fill value.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-325727354,https://api.github.com/repos/pydata/xarray/issues/1528,325727354,MDEyOklzc3VlQ29tbWVudDMyNTcyNzM1NA==,6042212,2017-08-29T16:57:10Z,2017-08-29T16:57:10Z,CONTRIBUTOR,"Worth pointing out here, that the zarr filter-set is extensible (I suppose hdf5 is too, but I don't think this is ever done in practice), but I don't think it makes any particular claims to performance.
I think both of the options above are reasonable, and there is no particular reason to exclude either: a zarr variable could look to xarray like floats but actually be stored as ints (i.e., arguments are passed to zarr), or it could look like ints which xarray expects to inflate to floats (i.e., stored as an attribute). I mean, if a user stores a float variable, but includes kwargs to zarr for scale/filter (or any other filter arguments), we should make no attempt to interrupt that.
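Either way the arithmetic is the same, and only the layer that performs it differs. As a minimal sketch, assuming the CF-style convention `unpacked = packed * scale_factor + add_offset` (the function names and the example values below are made up for illustration, and this is not xarray's or zarr's actual code path):

```python
# Sketch of scale/offset packing, assuming the CF-style convention
#   unpacked = packed * scale_factor + add_offset
# pack/unpack are hypothetical helper names, not library functions.
def pack(values, scale_factor, add_offset):
    # Floats are stored as integers after removing the offset and scaling.
    return [int(round((v - add_offset) / scale_factor)) for v in values]


def unpack(packed, scale_factor, add_offset):
    # Integers are inflated back to floats on read.
    return [p * scale_factor + add_offset for p in packed]


temps = [271.25, 280.5, 299.75]     # illustrative values only
packed = pack(temps, scale_factor=0.25, add_offset=270.0)
restored = unpack(packed, scale_factor=0.25, add_offset=270.0)
```

If the packing runs as a zarr filter, the user only ever sees the floats. If it runs in xarray, `scale_factor` and `add_offset` have to travel with the variable as attributes so that the integers can be inflated on read.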
The only question is, if the user wishes to apply scale/offset in xarray, which is their most likely intention? I would guess the latter, compute in xarray and use attributes, since xarray users probably don't know about zarr and its filters.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-325390391,https://api.github.com/repos/pydata/xarray/issues/1528,325390391,MDEyOklzc3VlQ29tbWVudDMyNTM5MDM5MQ==,6042212,2017-08-28T15:41:08Z,2017-08-28T15:41:08Z,CONTRIBUTOR,"@rabernat : on actually looking through your code :) Happy to see you doing exactly what I felt I was not knowledgeable enough to do, namely poking around in xarray's guts. If I can help in any way, please let me know, although I don't have a lot of spare hours right now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-325220001,https://api.github.com/repos/pydata/xarray/issues/1528,325220001,MDEyOklzc3VlQ29tbWVudDMyNTIyMDAwMQ==,6042212,2017-08-27T19:46:31Z,2017-08-27T19:46:31Z,CONTRIBUTOR,"Sorry that I let this slide - there was not a huge upswell of interest around what I had done, and I was not ready to dive into xarray internals.
Could you comment more on the difference between your approach and mine? Is the aim to reduce the number of metadata files hanging around? zarr has made an effort with the groups interface to parallel netCDF, which is, after all, what xarray essentially expects of all its data sources.
As in [this comment](https://github.com/pydata/xarray/issues/1223#issuecomment-274230041) I have come to the realisation that although nice to/from zarr methods can be made relatively easily, they will not get traction unless they can be put within a class that mimics the existing xarray infrastructure, i.e., the user would never know, except that magically they have extra encoding/compression options, the file-path can be an S3 URL (say), and dask parallel computation suddenly works on a cluster and/or with out-of-core processing.
That would raise some eyebrows!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694