html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7833#issuecomment-1544199022,https://api.github.com/repos/pydata/xarray/issues/7833,1544199022,IC_kwDOAMm_X85cCptu,703554,2023-05-11T15:26:52Z,2023-05-11T15:26:52Z,CONTRIBUTOR,"Awesome, thanks @kmuehlbauer and @Illviljan 🙏🏻 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1704950804
https://github.com/pydata/xarray/issues/3564#issuecomment-1190061811,https://api.github.com/repos/pydata/xarray/issues/3564,1190061811,IC_kwDOAMm_X85G7ubz,703554,2022-07-20T09:44:40Z,2022-07-20T09:44:40Z,CONTRIBUTOR,"Hi folks, 

Just to mention that we've created a short tutorial on xarray which is meant as a gentle intro to folks coming from the malaria genetics field, who mostly have never heard of xarray before. We illustrate xarray first using outputs from a geostatistical model of how insecticide-treated bednets are used in Africa. We then give a couple of brief examples of how we use xarray for genomic data. There's video walkthroughs in French and English:

https://anopheles-genomic-surveillance.github.io/workshop-5/module-1-xarray.html

Please feel free to link to this in the xarray tutorial site if you'd like to :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,527323165
https://github.com/pydata/xarray/issues/6771#issuecomment-1190057727,https://api.github.com/repos/pydata/xarray/issues/6771,1190057727,IC_kwDOAMm_X85G7tb_,703554,2022-07-20T09:40:41Z,2022-07-20T09:41:07Z,CONTRIBUTOR,"Hi @dcherian,

> We are currently reworking https://tutorial.xarray.dev/intro.html and would love to either add your material or link to it if you're creating a consolidated collection of genetics-related material. xref (#3564). We don't have a ""domain-specific"" section yet but are planning to create one after SciPy.

FWIW we've created a short tutorial on xarray which is meant as a gentle intro to folks coming from the malaria genetics field. We illustrate xarray first using outputs from a geostatistical model of how insecticide-treated bednets are used in Africa. We then give a couple of brief examples of how we use xarray for genomic data. There's video walkthroughs in French and English:

https://anopheles-genomic-surveillance.github.io/workshop-5/module-1-xarray.html

Please feel free to link to this in the xarray tutorial site if you'd like to :)","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,1300534066
https://github.com/pydata/xarray/issues/6771#issuecomment-1190052947,https://api.github.com/repos/pydata/xarray/issues/6771,1190052947,IC_kwDOAMm_X85G7sRT,703554,2022-07-20T09:36:10Z,2022-07-20T09:36:10Z,CONTRIBUTOR,"Hi @TomNicholas,

> > I would've thought that latitude and longitude would be 1-dimensional coordinate variables, yet they are drawn as 2-D arrays?
> 
> I think that if you assume that the axes of your grid data align with the cardinal directions (East-West / North-South) then you would expect latitude and longitude to be 1D, but if they don't align then the coordinates would need be 2D (i.e. if x and y are merely arbitrary lines along the Earth's surface).
> 
> I agree with you though that 2D lat/lon grids are unnecessarily confusing, especially for non-geoscience users.

Interesting, I hadn't considered that. Definitely a bit mind-bending though for us non-geoscientists :)

> I like the second diagram you showed more (it's also a neater version of the labelled one I made [here](https://github.com/pydata/xarray/pull/6076)). I think it's debatable whether `elevation` and `land_cover` constitute coordinates or data variables, but I have no strong opinion on that.
> 
> As for improvements, I think it would be clearer to at least use the second image over the first, and perhaps we could improve it further.

SGTM. FWIW on the second diagram I would use ""dimensions"" instead of ""indexes"". Getting dimensions first then helps to explain how you can use a coordinate variable to index a dimension.
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1300534066
https://github.com/pydata/xarray/issues/324#issuecomment-1054526670,https://api.github.com/repos/pydata/xarray/issues/324,1054526670,IC_kwDOAMm_X84-2szO,703554,2022-02-28T18:10:02Z,2022-02-28T18:10:02Z,CONTRIBUTOR,"Still relevant, would like to be able to group by multiple variables along a single dimension.","{""total_count"": 6, ""+1"": 6, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,58117200
https://github.com/pydata/xarray/issues/4663#issuecomment-802732278,https://api.github.com/repos/pydata/xarray/issues/4663,802732278,MDEyOklzc3VlQ29tbWVudDgwMjczMjI3OA==,703554,2021-03-19T10:44:31Z,2021-03-19T10:44:31Z,CONTRIBUTOR,"Thanks @dcherian.

Just to add that if we make progress with supporting indexing with dask arrays then at some point I think we'll hit a separate issue, which is that xarray will require that the chunk sizes of the indexed arrays are computed, but currently calling the dask array method `compute_chunk_sizes()` is inefficient for n-d arrays. Raised here: https://github.com/dask/dask/issues/7416

In case anyone needs a workaround for indexing a dataset with a 1d boolean dask array, I'm currently using [this hacked implementation](https://github.com/malariagen/malariagen-data-python/blob/e39dac2404f8f8c37449169bee0f61dd9c6fcb8c/malariagen_data/util.py#L129) of a compress() style function that operates on an xarray dataset, which includes more efficient computation of chunk sizes. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,759709924
https://github.com/pydata/xarray/issues/5054#issuecomment-802101178,https://api.github.com/repos/pydata/xarray/issues/5054,802101178,MDEyOklzc3VlQ29tbWVudDgwMjEwMTE3OA==,703554,2021-03-18T16:45:51Z,2021-03-18T16:58:44Z,CONTRIBUTOR,"FWIW my use case actually only needs indexing a single dimension, i.e., something equivalent to the numpy (or dask.array) [compress](https://numpy.org/doc/stable/reference/generated/numpy.compress.html) function. This can be hacked for xarray datasets in a fairly straightforward way:

```python
def _compress_dataarray(a, indexer, dim):
    data = a.data
    try:
        axis = a.dims.index(dim)
    except ValueError:
        v = data
    else:
        # rely on __array_function__ to handle dispatching to dask if
        # data is a dask array
        v = np.compress(indexer, a.data, axis=axis)
        if hasattr(v, 'compute_chunk_sizes'):
            # needed to know dim lengths
            v.compute_chunk_sizes()
    return v


def compress_dataset(ds, indexer, dim):
    if isinstance(indexer, str):
        indexer = ds[indexer].data
    
    coords = dict()
    for k in ds.coords:
        a = ds[k]
        v = _compress_dataarray(a, indexer, dim)
        coords[k] = (a.dims, v)
    
    data_vars = dict()
    for k in ds.data_vars:
        a = ds[k]
        v = _compress_dataarray(a, indexer, dim)
        data_vars[k] = (a.dims, v)
        
    attrs = ds.attrs.copy()
    
    return xr.Dataset(data_vars=data_vars, coords=coords, attrs=attrs)
```

Given the complexity of fancy indexing in general, I wonder if it's worth contemplating implementing a `Dataset.compress()` method as a first step.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,834972299
https://github.com/pydata/xarray/issues/5054#issuecomment-802096873,https://api.github.com/repos/pydata/xarray/issues/5054,802096873,MDEyOklzc3VlQ29tbWVudDgwMjA5Njg3Mw==,703554,2021-03-18T16:39:59Z,2021-03-18T16:39:59Z,CONTRIBUTOR,Thanks @dcherian.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,834972299
https://github.com/pydata/xarray/pull/4984#issuecomment-800504527,https://api.github.com/repos/pydata/xarray/issues/4984,800504527,MDEyOklzc3VlQ29tbWVudDgwMDUwNDUyNw==,703554,2021-03-16T18:28:09Z,2021-03-16T18:28:09Z,CONTRIBUTOR,"Yay, first xarray PR :partying_face: ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,819911891
https://github.com/pydata/xarray/pull/4984#issuecomment-800317378,https://api.github.com/repos/pydata/xarray/issues/4984,800317378,MDEyOklzc3VlQ29tbWVudDgwMDMxNzM3OA==,703554,2021-03-16T14:40:45Z,2021-03-16T14:40:45Z,CONTRIBUTOR,"> Could we add a very small test for the DataArray? Given the coverage on Dataset, it should mostly just test that the method works.

No problem, some DataArray tests are there.

> Any thoughts from others before we merge?

Good to go from my side.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,819911891
https://github.com/pydata/xarray/pull/4984#issuecomment-800176868,https://api.github.com/repos/pydata/xarray/issues/4984,800176868,MDEyOklzc3VlQ29tbWVudDgwMDE3Njg2OA==,703554,2021-03-16T11:24:42Z,2021-03-16T11:24:42Z,CONTRIBUTOR,"Hi @max-sixty,

> It looks like we need a `requires_numexpr` decorator on the tests — would you be OK to add that?

Sure, done.
 
> Could we add a simple method to `DataArray` which converts to a Dataset, calls the functions, and converts back too? (there are lots of examples already of this, let me know any issues)

Done. 

> And we should add the methods to `api.rst`, and a whatsnew entry if possible.

Done.

Let me know if there's anything else. Looking forward to using this :smile: ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,819911891
https://github.com/pydata/xarray/pull/4984#issuecomment-798993998,https://api.github.com/repos/pydata/xarray/issues/4984,798993998,MDEyOklzc3VlQ29tbWVudDc5ODk5Mzk5OA==,703554,2021-03-14T22:44:49Z,2021-03-14T22:44:49Z,CONTRIBUTOR,"> Currently the test runs over an array of two dimensions — `x` & `y`. Would `pd.query` work if there were also a `z` dimension?

No worries, yes any number of dimensions can be queried. I've added tests showing three dimensions can be queried.

As an aside, in writing these tests I came upon a probable upstream bug in pandas, reported as https://github.com/pandas-dev/pandas/issues/40436. I don't think this affects this PR though, and has low impact as only the ""python"" query parser is affected, and most people will use the default ""pandas"" query parser. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,819911891
https://github.com/pydata/xarray/pull/4984#issuecomment-797668635,https://api.github.com/repos/pydata/xarray/issues/4984,797668635,MDEyOklzc3VlQ29tbWVudDc5NzY2ODYzNQ==,703554,2021-03-12T18:16:15Z,2021-03-12T18:16:15Z,CONTRIBUTOR,Just to mention I've added tests to verify this works with variables backed by dask arrays. Also added explicit tests of different eval engine and query parser options. And added a docstring.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,819911891
https://github.com/pydata/xarray/pull/4984#issuecomment-797636489,https://api.github.com/repos/pydata/xarray/issues/4984,797636489,MDEyOklzc3VlQ29tbWVudDc5NzYzNjQ4OQ==,703554,2021-03-12T17:21:29Z,2021-03-12T17:21:29Z,CONTRIBUTOR,"Hi @max-sixty, no problem. Re this...

> Does the `pd.eval` work with more than two dimensions?

...not quite sure what you mean, could you elaborate?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,819911891
https://github.com/pydata/xarray/pull/4984#issuecomment-788828644,https://api.github.com/repos/pydata/xarray/issues/4984,788828644,MDEyOklzc3VlQ29tbWVudDc4ODgyODY0NA==,703554,2021-03-02T11:10:20Z,2021-03-02T11:10:20Z,CONTRIBUTOR,"Hi folks, thought I'd put up a proof of concept PR here for further discussion. Any advice/suggestions about if/how to take this forward would be very welcome.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,819911891
https://github.com/pydata/xarray/issues/4079#issuecomment-631075010,https://api.github.com/repos/pydata/xarray/issues/4079,631075010,MDEyOklzc3VlQ29tbWVudDYzMTA3NTAxMA==,703554,2020-05-19T20:50:26Z,2020-05-19T20:50:51Z,CONTRIBUTOR,"> In the specific example from your notebook, where do the dimensions lengths `__variants/BaseCounts_dim1`, `__variants/MLEAC_dim1` and `__variants/MLEAF_dim1` come from?
> 
> `BaseCounts_dim1` is length 4, so maybe that corresponds to DNA bases ATGC?

In this specific example, I do actually know where these dimension lengths come from. In fact I should've used the shared dimension `alt_alleles` instead of `__variants/MLEAC_dim1` and `__variants/MLEAF_dim1`. And yes `BaseCounts_dim1` does correspond to DNA bases.

But two points.

First, I don't care about these dimensions. The only dimensions I care about and will use are `variants`, `samples` and `ploidy`. 

Second, more important, this kind of data can come from a number of different sources, each of which includes a different set of arrays with different names and semantics. While there are some common arrays and naming conventions where I can guess what the dimensions mean, in general I can't know all of those up front and bake them in as special cases.  ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,621078539
https://github.com/pydata/xarray/issues/4081#issuecomment-631071623,https://api.github.com/repos/pydata/xarray/issues/4081,631071623,MDEyOklzc3VlQ29tbWVudDYzMTA3MTYyMw==,703554,2020-05-19T20:43:07Z,2020-05-19T20:43:07Z,CONTRIBUTOR,"Thanks @shoyer for raising this, would be nice to wrap the dimensions, I'd vote for one per line.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,621123222
https://github.com/pydata/xarray/issues/4079#issuecomment-630924754,https://api.github.com/repos/pydata/xarray/issues/4079,630924754,MDEyOklzc3VlQ29tbWVudDYzMDkyNDc1NA==,703554,2020-05-19T16:14:27Z,2020-05-19T16:14:27Z,CONTRIBUTOR,"Thanks @shoyer. 

For reference, I'm exploring putting some genome variation data into xarray, here's an [initial experiment](https://nbviewer.jupyter.org/gist/alimanfoo/b74b08465727894538d5b161b3ced764) and [discussion here](https://discourse.smadstatgen.org/t/xarray-conventions-for-variation-data/44/2).

In general I will have some arrays where I won't know what some of the dimensions mean, and so cannot give them a meaningful name.

No worries if this is hard, was just wondering if it was supported already.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,621078539
https://github.com/pydata/xarray/issues/4079#issuecomment-630913851,https://api.github.com/repos/pydata/xarray/issues/4079,630913851,MDEyOklzc3VlQ29tbWVudDYzMDkxMzg1MQ==,703554,2020-05-19T15:55:54Z,2020-05-19T15:55:54Z,CONTRIBUTOR,Thanks so much @rabernat for quick response.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,621078539
https://github.com/pydata/xarray/issues/3831#issuecomment-605179227,https://api.github.com/repos/pydata/xarray/issues/3831,605179227,MDEyOklzc3VlQ29tbWVudDYwNTE3OTIyNw==,703554,2020-03-27T18:10:05Z,2020-03-27T18:10:05Z,CONTRIBUTOR,"Just to say having some kind of stack integration tests is a marvellous idea. Another example of an issue that's very hard to pin down is https://github.com/zarr-developers/zarr-python/issues/528.

Btw we have also run into issues with fsspec caching directory listings and not invalidating the cache when store changes are made, although I haven't checked with latest master. We have a lot of workarounds in our code where we reopen everything after we've made changes to a store. Probably an area where some more digging and careful testing may be needed.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,576337745
https://github.com/pydata/xarray/pull/3526#issuecomment-554463832,https://api.github.com/repos/pydata/xarray/issues/3526,554463832,MDEyOklzc3VlQ29tbWVudDU1NDQ2MzgzMg==,703554,2019-11-15T17:57:42Z,2019-11-15T17:57:42Z,CONTRIBUTOR,"FWIW in the Zarr Python implementation I don't think we do any special encoding or decoding of attribute values. Whatever value is given then gets serialised using the built-in `json.dumps`. This means I believe that if someone provides a `dict` as an attribute value then that will get serialised as a JSON object, and get deserialised back to a `dict`, although this is not something we test for currently.

From the zarr v2 spec point of view I think anything goes in the `.zattrs` file, as long as `.zattrs` is a JSON object at the root.

Hth.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,522519084
https://github.com/pydata/xarray/issues/2586#issuecomment-455374760,https://api.github.com/repos/pydata/xarray/issues/2586,455374760,MDEyOklzc3VlQ29tbWVudDQ1NTM3NDc2MA==,703554,2019-01-17T23:49:07Z,2019-01-17T23:49:07Z,CONTRIBUTOR,"> IMO, zarr needs some kind of ""resolver"" mechanism that takes a string and decides what kind of store it represents. For example, if the path ends with `.zip`, then it should know it's zip store, if it starts with `gs://`, it should know it's a google cloud store, etc.

Some very limited support for this is there already, e.g., if string ends with '.zip' then a zip store will be used, but there's no support for dispatching to cloud stores via a URL-like protocol. There's an open issue for that: https://github.com/zarr-developers/zarr/issues/214","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,386515973
https://github.com/pydata/xarray/issues/1603#issuecomment-444187219,https://api.github.com/repos/pydata/xarray/issues/1603,444187219,MDEyOklzc3VlQ29tbWVudDQ0NDE4NzIxOQ==,703554,2018-12-04T17:33:34Z,2018-12-04T17:33:34Z,CONTRIBUTOR,"> I think that one big source of confusion has been so far mixing
> coordinates/variables and indexes. These are really two separate concepts,
> and the indexes refactoring should address that IMHO.
>
> For example, I think that da[some_name] should never return indexes but
> only coordinates (and/or data variables for Dataset). That would be much
> simpler.
>
Can't claim to be following every detail here, but this sounds very
sensible to me FWIW.

>
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,262642978
https://github.com/pydata/xarray/pull/2559#issuecomment-442801741,https://api.github.com/repos/pydata/xarray/issues/2559,442801741,MDEyOklzc3VlQ29tbWVudDQ0MjgwMTc0MQ==,703554,2018-11-29T11:33:33Z,2018-11-29T11:33:33Z,CONTRIBUTOR,"Great to see this. On the API, FWIW I'd vote for using the same keyword (``consolidated``) in both, less burden on the user to remember what to use.","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,382497709
https://github.com/pydata/xarray/issues/1603#issuecomment-392831984,https://api.github.com/repos/pydata/xarray/issues/1603,392831984,MDEyOklzc3VlQ29tbWVudDM5MjgzMTk4NA==,703554,2018-05-29T15:59:46Z,2018-05-29T15:59:46Z,CONTRIBUTOR,"Ok, cool. Was wondering if now was right time to revisit that, alongside
the work proposed in this PR. Happy to participate in that discussion,
still interested in implementing some alternative index classes.

On Tue, 29 May 2018, 15:45 Stephan Hoyer, <notifications@github.com> wrote:

> Yes, the index API still needs to be determined. But I think we want to
> support something like that.
> On Tue, May 29, 2018 at 1:20 AM Alistair Miles <notifications@github.com>
> wrote:
>
> > I see this mentions an Index API, is that still to be decided?
> >
> > On Tue, 29 May 2018, 05:28 Stephan Hoyer, <notifications@github.com>
> > wrote:
> >
> > > I started thinking about how to do this incrementally, and it occurs to
> > me
> > > that a good place to start would be to write some of the utility
> > functions
> > > we'll need for this:
> > >
> > > 1. Normalizing and creating default indexes in the Dataset/DataArray
> > > constructor.
> > > 2. Combining indexes from all xarray objects that are inputs for an
> > > operations into indexes for the outputs.
> > > 3. Extracting MultiIndex objects from arguments into Dataset/DataArray
> > > and expanding them into multiple variables.
> > >
> > > I drafted up docstrings for each of these functions and did a little
> bit
> > > of working starting to think through implementations in #2195
> > > <https://github.com/pydata/xarray/pull/2195>. So this would be a great
> > > place for others to help out. Each of these could be separate PRs.
> > >
> > > —
> > > You are receiving this because you commented.
> > > Reply to this email directly, view it on GitHub
> > > <https://github.com/pydata/xarray/issues/1603#issuecomment-392649605>,
> > or mute
> > > the thread
> > > <
> >
> https://github.com/notifications/unsubscribe-auth/AAq8QvMauEPa6hfgorDoShZ2PwyYWk6Tks5t3M6AgaJpZM4PtACU
> > >
> > > .
> > >
> >
> > —
> > You are receiving this because you were mentioned.
> > Reply to this email directly, view it on GitHub
> > <https://github.com/pydata/xarray/issues/1603#issuecomment-392692996>,
> or mute
> > the thread
> > <
> https://github.com/notifications/unsubscribe-auth/ABKS1p8RjrupPM2z2d4_ylWX7826RQ0Rks5t3QTHgaJpZM4PtACU
> >
> > .
> >
>
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub
> <https://github.com/pydata/xarray/issues/1603#issuecomment-392803210>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AAq8QgygnzTX053NlGZ5A5j_tRkRxMj7ks5t3V79gaJpZM4PtACU>
> .
>
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,262642978
https://github.com/pydata/xarray/issues/1603#issuecomment-392692996,https://api.github.com/repos/pydata/xarray/issues/1603,392692996,MDEyOklzc3VlQ29tbWVudDM5MjY5Mjk5Ng==,703554,2018-05-29T08:20:22Z,2018-05-29T08:20:22Z,CONTRIBUTOR,"I see this mentions an Index API, is that still to be decided?

On Tue, 29 May 2018, 05:28 Stephan Hoyer, <notifications@github.com> wrote:

> I started thinking about how to do this incrementally, and it occurs to me
> that a good place to start would be to write some of the utility functions
> we'll need for this:
>
>    1. Normalizing and creating default indexes in the Dataset/DataArray
>    constructor.
>    2. Combining indexes from all xarray objects that are inputs for an
>    operations into indexes for the outputs.
>    3. Extracting MultiIndex objects from arguments into Dataset/DataArray
>    and expanding them into multiple variables.
>
> I drafted up docstrings for each of these functions and did a little bit
> of working starting to think through implementations in #2195
> <https://github.com/pydata/xarray/pull/2195>. So this would be a great
> place for others to help out. Each of these could be separate PRs.
>
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub
> <https://github.com/pydata/xarray/issues/1603#issuecomment-392649605>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AAq8QvMauEPa6hfgorDoShZ2PwyYWk6Tks5t3M6AgaJpZM4PtACU>
> .
>
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,262642978
https://github.com/pydata/xarray/issues/1974#issuecomment-371626776,https://api.github.com/repos/pydata/xarray/issues/1974,371626776,MDEyOklzc3VlQ29tbWVudDM3MTYyNjc3Ng==,703554,2018-03-08T21:15:04Z,2018-03-08T21:15:04Z,CONTRIBUTOR,"It worked! Thanks again, pangeo.pydata.org is super cool.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,303270676
https://github.com/pydata/xarray/issues/1974#issuecomment-371603679,https://api.github.com/repos/pydata/xarray/issues/1974,371603679,MDEyOklzc3VlQ29tbWVudDM3MTYwMzY3OQ==,703554,2018-03-08T19:52:01Z,2018-03-08T19:52:01Z,CONTRIBUTOR,I have it running! Will try to start the talk with it.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,303270676
https://github.com/pydata/xarray/issues/1974#issuecomment-371561259,https://api.github.com/repos/pydata/xarray/issues/1974,371561259,MDEyOklzc3VlQ29tbWVudDM3MTU2MTI1OQ==,703554,2018-03-08T17:30:21Z,2018-03-08T17:30:21Z,CONTRIBUTOR,Actually just realising @rabernat and @mrocklin you guys already demoed all of this to ESIP back in January (really nice talk btw). So maybe I don't need to repeat.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,303270676
https://github.com/pydata/xarray/issues/1974#issuecomment-371558334,https://api.github.com/repos/pydata/xarray/issues/1974,371558334,MDEyOklzc3VlQ29tbWVudDM3MTU1ODMzNA==,703554,2018-03-08T17:21:08Z,2018-03-08T17:21:08Z,CONTRIBUTOR,Thanks @mrocklin.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,303270676
https://github.com/pydata/xarray/issues/1974#issuecomment-371544386,https://api.github.com/repos/pydata/xarray/issues/1974,371544386,MDEyOklzc3VlQ29tbWVudDM3MTU0NDM4Ng==,703554,2018-03-08T16:38:48Z,2018-03-08T16:38:48Z,CONTRIBUTOR,"Ha, Murphy's law. Shame because the combination of jupyterlab interface, launching a kubernetes cluster, and being able to click through to the Dask dashboard looks futuristic cool :-) I was really looking forward to seeing all my jobs spinning through the Dask dashboard as they work. I actually have a pretty packed talk already so don't absolutely need to include this, but if it does come back in time I'll slot it in. Talk starts 8pm GMT so still a few hours yet... ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,303270676
https://github.com/pydata/xarray/issues/1974#issuecomment-371538819,https://api.github.com/repos/pydata/xarray/issues/1974,371538819,MDEyOklzc3VlQ29tbWVudDM3MTUzODgxOQ==,703554,2018-03-08T16:22:16Z,2018-03-08T16:22:16Z,CONTRIBUTOR,"Just tried to run the xarray-data notebook from within pangeo.pydata.org jupyterlab, when I run this command:

```
gcsmap = gcsfs.mapping.GCSMap('pangeo-data/newman-met-ensemble')
```

...it hangs there indefinitely. If I keyboard interrupt  it bottoms out here:

```
/opt/conda/lib/python3.6/site-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
     71             if source_address:
     72                 sock.bind(source_address)
---> 73             sock.connect(sa)
     74             return sock
     75 
```

...suggesting it is not able to make a connection. Am I doing something wrong?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,303270676
https://github.com/pydata/xarray/issues/1974#issuecomment-371299755,https://api.github.com/repos/pydata/xarray/issues/1974,371299755,MDEyOklzc3VlQ29tbWVudDM3MTI5OTc1NQ==,703554,2018-03-07T21:58:49Z,2018-03-07T21:58:49Z,CONTRIBUTOR,"Wonderful, thanks both!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,303270676
https://github.com/pydata/xarray/pull/1528#issuecomment-350375750,https://api.github.com/repos/pydata/xarray/issues/1528,350375750,MDEyOklzc3VlQ29tbWVudDM1MDM3NTc1MA==,703554,2017-12-08T21:24:45Z,2017-12-08T22:27:47Z,CONTRIBUTOR,"Just to confirm, if writes are aligned with chunk boundaries in the
destination array then no locking is required.

Also if you're going to be moving large datasets into cloud storage and
doing distributed computing then it may be worth investigating compressors
and compressor options as good compression ratio may make a big difference
where network bandwidth may be the limiting factor. I would suggest using
the Blosc compressor with cname='zstd'. I would also suggest using shuffle,
the Blosc codec in latest numcodecs has an AUTOSHUFFLE option so byte
shuffle is used for arrays with >1 byte item size and bit shuffle is used
for arrays with 1 byte item size
. I would also experiment with compression level (clevel) to see how speed
balances against compression ratio. E.g., Blosc(cname='zstd', clevel=5,
shuffle=Blosc.AUTOSHUFFLE) may be a good starting point. The default
compressor is Blosc(cname='lz4', ...) is more optimised for fast local
storage, so speed is very good but compression ratio is moderate, this may
not be best for distributed computing.
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-350379064,https://api.github.com/repos/pydata/xarray/issues/1528,350379064,MDEyOklzc3VlQ29tbWVudDM1MDM3OTA2NA==,703554,2017-12-08T21:40:40Z,2017-12-08T22:27:35Z,CONTRIBUTOR,"Some examples of compressor benchmarking here may be useful
http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html

The specific conclusions probably won't apply to your data but some of the
code and ideas may be useful. Since writing that article I added Zstd and
LZ4 compressors in numcodecs so those may also be worth trying in addition
to Blosc with various configurations. (Blosc breaks up each chunk into
blocks which enables multithreaded compression/decompression but can also
reduce compression ratio over the same compressor library used without
Blosc. I.e., Blosc(cname='zstd', clevel=1) will behave differently from
Zstd(level=1) even though the same underlying compression library
(Zstandard) is being used.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-348839453,https://api.github.com/repos/pydata/xarray/issues/1528,348839453,MDEyOklzc3VlQ29tbWVudDM0ODgzOTQ1Mw==,703554,2017-12-04T01:40:57Z,2017-12-04T01:40:57Z,CONTRIBUTOR,"I know you're not including string support in this PR, but for interest, there are a couple of changes coming into zarr via https://github.com/alimanfoo/zarr/pull/212 that may be relevant in future. 

It should now be impossible to generate a segfault via a badly configured object array. It is also now much harder to badly configure an object array. When creating an object array, an object codec should be provided via the ``object_codec`` parameter. There are now three codecs in numcodecs that can be used for variable length text strings: MsgPack, Pickle and JSON (new). [Examples notebook here](https://github.com/alimanfoo/zarr/blob/14ac8d9bf19633232f6522dfcd925f300722b82b/notebooks/object_arrays.ipynb). In that notebook I also ran some simple benchmarks and MsgPack comes out well, but JSON isn't too shabby either.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1087#issuecomment-348183062,https://api.github.com/repos/pydata/xarray/issues/1087,348183062,MDEyOklzc3VlQ29tbWVudDM0ODE4MzA2Mg==,703554,2017-11-30T13:07:53Z,2017-11-30T13:07:53Z,CONTRIBUTOR,"FWIW for the filters, if it would be possible to use the numcodecs Codec
API http://numcodecs.readthedocs.io/en/latest/abc.html then that could be
beneficial beyond xarray, as any work you put into developing filters could
then be used elsewhere (e.g., in zarr).

On Thu, Nov 30, 2017 at 12:05 PM, Stephan Hoyer <notifications@github.com>
wrote:

> OK, I'm going to try to reboot this and finish it up in the form of an API
> that we'll be happy with going forward. I just discovered two more xarray
> backends over the past two days (in Unidata's Siphon and something
> @alexamici <https://github.com/alexamici> and colleagues are writing to
> reading GRIB files), so clearly the demand is here.
>
> One additional change I'd like to make is try to rewrite the
> encoding/decoding functions for variables into a series of invertible
> coding filters that can potentially be chained together in a flexible way
> (this is somewhat inspired by zarr). This will allow different backends to
> mix/match filters as necessary, depending on their particular needs. I'll
> start on that in another PR.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/pydata/xarray/pull/1087#issuecomment-348169779>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AAq8QmzjKBnyjuGDFN6btGfhr2eFrhoiks5s7poXgaJpZM4Kq10M>
> .
>


-- 
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
Big Data Institute Building
Old Road Campus
Roosevelt Drive
Oxford
OX3 7LF
United Kingdom
Phone: +44 (0)1865 743596
Email: alimanfoo@googlemail.com
Web: http://a <http://purl.org/net/aliman>limanfoo.github.io/
Twitter: https://twitter.com/alimanfoo
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187625917
https://github.com/pydata/xarray/pull/1528#issuecomment-347385269,https://api.github.com/repos/pydata/xarray/issues/1528,347385269,MDEyOklzc3VlQ29tbWVudDM0NzM4NTI2OQ==,703554,2017-11-28T01:36:29Z,2017-11-28T01:49:24Z,CONTRIBUTOR,"FWIW I think the best option at the moment is to make sure you add either Pickle or MsgPack filter for any zarr array with an object dtype. 

BTW I was thinking that zarr should automatically add one of these filters any time someone creates an array with an object dtype, to avoid them hitting the pointer issue. If you have any thoughts on best solution drop them here: https://github.com/alimanfoo/zarr/issues/208
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-347381734,https://api.github.com/repos/pydata/xarray/issues/1528,347381734,MDEyOklzc3VlQ29tbWVudDM0NzM4MTczNA==,703554,2017-11-28T01:16:07Z,2017-11-28T01:16:07Z,CONTRIBUTOR,"When still in the original interpreter session, all the objects still exist
in memory, so all the pointers stored in the array are still valid. Restart
the session and the objects are gone and the pointers are invalid.

On Tue, Nov 28, 2017 at 1:14 AM, Alistair Miles <alimanfoo@googlemail.com>
wrote:

> Try exiting and restarting the interpreter, then running:
>
> zgs = zarr.open_group(store='zarr_directory')
> zgs.x[:]
>
>
> On Tue, Nov 28, 2017 at 1:10 AM, Ryan Abernathey <notifications@github.com
> > wrote:
>
>> zarr needs a filter that can encode and pack the strings into a single
>> buffer, except in the special case where the data are being stored in-memory
>>
>> @alimanfoo <https://github.com/alimanfoo>: the following also seems to
>> works with directory store
>>
>> values = np.array([b'ab', b'cdef', np.nan], dtype=object)
>> zgs = zarr.open_group(store='zarr_directory')
>> zgs.create('x', shape=values.shape, dtype=values.dtype)
>> zgs.x[:] = values
>>
>> This seems to contradict your statement above. What am I missing?
>>
>> —
>> You are receiving this because you were mentioned.
>> Reply to this email directly, view it on GitHub
>> <https://github.com/pydata/xarray/pull/1528#issuecomment-347380750>, or mute
>> the thread
>> <https://github.com/notifications/unsubscribe-auth/AAq8QnNQ7bI5GRyHsUUSQAgusymx8eJnks5s611rgaJpZM4PDrlp>
>> .
>>
>
>
>
> --
> Alistair Miles
> Head of Epidemiological Informatics
> Centre for Genomics and Global Health <http://cggh.org>
> Big Data Institute Building
> Old Road Campus
> Roosevelt Drive
> Oxford
> OX3 7LF
> United Kingdom
> Phone: +44 (0)1865 743596 <+44%201865%20743596>
> Email: alimanfoo@googlemail.com
> Web: http://a <http://purl.org/net/aliman>limanfoo.github.io/
> Twitter: https://twitter.com/alimanfoo
>


-- 
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
Big Data Institute Building
Old Road Campus
Roosevelt Drive
Oxford
OX3 7LF
United Kingdom
Phone: +44 (0)1865 743596
Email: alimanfoo@googlemail.com
Web: http://a <http://purl.org/net/aliman>limanfoo.github.io/
Twitter: https://twitter.com/alimanfoo
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-347381500,https://api.github.com/repos/pydata/xarray/issues/1528,347381500,MDEyOklzc3VlQ29tbWVudDM0NzM4MTUwMA==,703554,2017-11-28T01:14:42Z,2017-11-28T01:14:42Z,CONTRIBUTOR,"Try exiting and restarting the interpreter, then running:

zgs = zarr.open_group(store='zarr_directory')
zgs.x[:]


On Tue, Nov 28, 2017 at 1:10 AM, Ryan Abernathey <notifications@github.com>
wrote:

> zarr needs a filter that can encode and pack the strings into a single
> buffer, except in the special case where the data are being stored in-memory
>
> @alimanfoo <https://github.com/alimanfoo>: the following also seems to
> works with directory store
>
> values = np.array([b'ab', b'cdef', np.nan], dtype=object)
> zgs = zarr.open_group(store='zarr_directory')
> zgs.create('x', shape=values.shape, dtype=values.dtype)
> zgs.x[:] = values
>
> This seems to contradict your statement above. What am I missing?
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/pydata/xarray/pull/1528#issuecomment-347380750>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AAq8QnNQ7bI5GRyHsUUSQAgusymx8eJnks5s611rgaJpZM4PDrlp>
> .
>


-- 
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
Big Data Institute Building
Old Road Campus
Roosevelt Drive
Oxford
OX3 7LF
United Kingdom
Phone: +44 (0)1865 743596
Email: alimanfoo@googlemail.com
Web: http://a <http://purl.org/net/aliman>limanfoo.github.io/
Twitter: https://twitter.com/alimanfoo
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-347363503,https://api.github.com/repos/pydata/xarray/issues/1528,347363503,MDEyOklzc3VlQ29tbWVudDM0NzM2MzUwMw==,703554,2017-11-27T23:27:41Z,2017-11-27T23:27:41Z,CONTRIBUTOR,"For variable length strings (or any array with an object dtype) zarr needs
a filter that can encode and pack the strings into a single buffer, except
in the special case where the data are being stored in-memory (as in your
first example). The filter has to be specified manually, some examples
here: http://zarr.readthedocs.io/en/master/tutorial.html#string-arrays.
There are two codecs currently in numcodecs that can do this, one is
Pickle, the other is MsgPack. I haven't done any benchmarking of data size
or encoding speed, but MsgPack may be preferable because it's more portable.

There was some discussion a while back about creating a codec that handles
variable-length strings by encoding via UTF8 then concatenating encoded
bytes and lengths or offsets, IIRC similar to Arrow, and maybe even
creating a special ""text"" dtype that inserts this filter automatically so
you don't have to add it manually. But there hasn't been a strong
motivation so far.

On Mon, Nov 27, 2017 at 10:32 PM, Stephan Hoyer <notifications@github.com>
wrote:

> Overall, I find the conventions module to be a bit unwieldy. There is a
> lot of stuff in there, not all of which is related to CF conventions. It
> would be useful to separate the actual conventions from the encoding /
> decoding needed for different backends.
>
> Agreed!
>
> I wonder why zarr doesn't have a UTF-8 variable length string type (
> alimanfoo/zarr#206 <https://github.com/alimanfoo/zarr/issues/206>) --
> that would feel like the obvious first choice for encoding this data.
>
> That said, xarary *should* be able to use first-length bytes just fine,
> doing UTF-8 encoding/decoding on the fly.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/pydata/xarray/pull/1528#issuecomment-347351224>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AAq8QkLTQUuspLhiXYR2_WMW8Hg9LFziks5s6ziTgaJpZM4PDrlp>
> .
>


-- 
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
Big Data Institute Building
Old Road Campus
Roosevelt Drive
Oxford
OX3 7LF
United Kingdom
Phone: +44 (0)1865 743596
Email: alimanfoo@googlemail.com
Web: http://a <http://purl.org/net/aliman>limanfoo.github.io/
Twitter: https://twitter.com/alimanfoo
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-345619509,https://api.github.com/repos/pydata/xarray/issues/1528,345619509,MDEyOklzc3VlQ29tbWVudDM0NTYxOTUwOQ==,703554,2017-11-20T08:07:44Z,2017-11-20T08:07:44Z,CONTRIBUTOR,"Fantastic!

On Monday, November 20, 2017, Matthew Rocklin <notifications@github.com>
wrote:

> That is, indeed, quite exciting. Also exciting is that I was able to look
> at and compute on your data easily.
>
> In [1]: import zarr
>
> In [2]: import gcsfs
>
> In [3]: fs = gcsfs.GCSFileSystem(project='pangeo-181919')
>
> In [4]: gcsmap = gcsfs.mapping.GCSMap('zarr_store_test', gcs=fs, check=True, create=False)
>
> In [5]: import xarray as xr
>
> In [6]: ds_gcs = xr.open_zarr(gcsmap, mode='r')
>
> In [7]: ds_gcs
> Out[7]: <xarray.Dataset>
> Dimensions:  (x: 200, y: 100)
> Coordinates:
>   * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
>   * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
> Data variables:
>     bar      (x) float64 dask.array<shape=(200,), chunksize=(40,)>
>     foo      (y, x) float32 dask.array<shape=(100, 200), chunksize=(50, 40)>
> Attributes:
>     array_atr:  [1, 2]
>     some_attr:  copana
>
> In [8]: ds_gcs.sum()
> Out[8]: <xarray.Dataset>
> Dimensions:  ()
> Data variables:
>     bar      float64 dask.array<shape=(), chunksize=()>
>     foo      float32 dask.array<shape=(), chunksize=()>
>
> In [9]: ds_gcs.sum().compute()
> Out[9]: <xarray.Dataset>
> Dimensions:  ()
> Data variables:
>     bar      float64 0.0
>     foo      float32 20000.0
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/pydata/xarray/pull/1528#issuecomment-345575240>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AAq8Quu1UYM4BO3i_KzMkXGnN-g-TFczks5s4OO5gaJpZM4PDrlp>
> .
>


-- 
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
Big Data Institute Building
Old Road Campus
Roosevelt Drive
Oxford
OX3 7LF
United Kingdom
Phone: +44 (0)1865 743596
Email: alimanfoo@googlemail.com
Web: http://a <http://purl.org/net/aliman>limanfoo.github.io/
Twitter: https://twitter.com/alimanfoo
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-345080945,https://api.github.com/repos/pydata/xarray/issues/1528,345080945,MDEyOklzc3VlQ29tbWVudDM0NTA4MDk0NQ==,703554,2017-11-16T22:18:04Z,2017-11-16T22:18:04Z,CONTRIBUTOR,"Re different zarr storage backends, main options are plain dict, DirectoryStore, ZipStore, and there's a [new DBMStore class just merged](https://github.com/alimanfoo/zarr/pull/186) which enables storage in any DBM-style database (e.g., Berkeley DB). ZipStore has some constraints because of how zip files work, you can't really replace an entry in a zip file which means anything that writes the same array chunk more than once will generate warnings. Dask's S3Map should also work, I haven't tried it and obviously not ideal for unit tests but I'd be interested if you get any experience with it.

Re different combinations of zarr and dask chunks, it can be thread safe even if chunks are not aligned, just need to pass a synchronizer when instantiating the array or group. Zarr has a ThreadSynchronizer class which can be used for thread-based parallelism. If a synchronizer is provided, it is used to lock each chunk individually during write operations. [More info here](http://zarr.readthedocs.io/en/latest/tutorial.html#parallel-computing-and-synchronization). 

Re fill values, zarr has a native concept of fill value for each array, with the fill value stored as part of the array metadata. Array metadata are stored as JSON and I recently [merged a fix](https://github.com/alimanfoo/zarr/pull/176) so that a bytes fill values could be used (via base64 encoding). I believe the netcdf way is to store fill value separately as value of ""_FillValue"" attribute? You could do this with zarr but user attributes are also JSON and so you would need to do your own encoding/decoding. But if possible I'd suggest using the native zarr fill_value support as it handles bytes fill value encoding and also checks to ensure fill values are valid wrt the array dtype.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-339897936,https://api.github.com/repos/pydata/xarray/issues/1528,339897936,MDEyOklzc3VlQ29tbWVudDMzOTg5NzkzNg==,703554,2017-10-27T07:42:34Z,2017-10-27T07:42:34Z,CONTRIBUTOR,"Suggest testing against GitHub master, there are a few other issues I'd
like to work through before next release.

On Thu, 26 Oct 2017 at 23:07, Ryan Abernathey <notifications@github.com>
wrote:

> Fantastic! Are you planning a release any time soon? If not we can set up
> to test against the github master.
>
> Sent from my iPhone
>
> > On Oct 26, 2017, at 5:04 PM, Alistair Miles <notifications@github.com>
> wrote:
> >
> > Just to say, support for 0d arrays, and for arrays with one or more
> zero-length dimensions, is in zarr master.
> >
> > —
> > You are receiving this because you were mentioned.
> > Reply to this email directly, view it on GitHub, or mute the thread.
> >
>
> —
> You are receiving this because you were mentioned.
>
> Reply to this email directly, view it on GitHub
> <https://github.com/pydata/xarray/pull/1528#issuecomment-339815147>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AAq8QtP5kta-H9Y90Puv9BHig7krEI0Wks5swQKQgaJpZM4PDrlp>
> .
>
-- 
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
Big Data Institute Building
Old Road Campus
Roosevelt Drive
Oxford
OX3 7LF
United Kingdom
Phone: +44 (0)1865 743596
Email: alimanfoo@googlemail.com
Web: http://a <http://purl.org/net/aliman>limanfoo.github.io/
Twitter: https://twitter.com/alimanfoo
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-339800443,https://api.github.com/repos/pydata/xarray/issues/1528,339800443,MDEyOklzc3VlQ29tbWVudDMzOTgwMDQ0Mw==,703554,2017-10-26T21:04:17Z,2017-10-26T21:04:17Z,CONTRIBUTOR,"Just to say, support for 0d arrays, and for arrays with one or more zero-length dimensions, is in zarr master.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/issues/1650#issuecomment-338786761,https://api.github.com/repos/pydata/xarray/issues/1650,338786761,MDEyOklzc3VlQ29tbWVudDMzODc4Njc2MQ==,703554,2017-10-23T20:29:41Z,2017-10-23T20:29:41Z,CONTRIBUTOR,"Index API sounds good.

Also I was just looking at [dask.dataframe indexing](https://github.com/dask/dask/blob/master/dask/dataframe/indexing.py), there .loc is implemented using information about index values at the boundaries of each partition (chunk). Not sure xarray should use same strategy for chunked datasets, but is another approach to avoid loading indexes into memory. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,267628781
https://github.com/pydata/xarray/issues/1650#issuecomment-338687376,https://api.github.com/repos/pydata/xarray/issues/1650,338687376,MDEyOklzc3VlQ29tbWVudDMzODY4NzM3Ng==,703554,2017-10-23T14:58:59Z,2017-10-23T14:58:59Z,CONTRIBUTOR,"It looks like #1017 is about having no index at all. I want indexes, but I
want to avoid loading all coordinate values into memory.

On Mon, Oct 23, 2017 at 1:47 PM, Fabien Maussion <notifications@github.com>
wrote:

> Has anyone considered implementing an index for monotonic data that does
> not require loading all values into main memory?
>
> But this is already the case? #1017
> <https://github.com/pydata/xarray/pull/1017>
>
> With on file datasets I *think* it is sufficient to drop_variables when
> opening the dataset in order not to parse the coordinates:
>
> ds = xr.open_dataset(f, drop_variables=['lon', 'lat'])
>
> —
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub
> <https://github.com/pydata/xarray/issues/1650#issuecomment-338647540>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AAq8QsbZ81N2pKybO1sFHVHK0KTk1aELks5svIrJgaJpZM4QCq62>
> .
>


-- 
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
Big Data Institute Building
Old Road Campus
Roosevelt Drive
Oxford
OX3 7LF
United Kingdom
Phone: +44 (0)1865 743596
Email: alimanfoo@googlemail.com
Web: http://a <http://purl.org/net/aliman>limanfoo.github.io/
Twitter: https://twitter.com/alimanfoo
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,267628781
https://github.com/pydata/xarray/issues/1650#issuecomment-338627454,https://api.github.com/repos/pydata/xarray/issues/1650,338627454,MDEyOklzc3VlQ29tbWVudDMzODYyNzQ1NA==,703554,2017-10-23T11:19:30Z,2017-10-23T11:19:30Z,CONTRIBUTOR,"Just to add a further thought, which is that the upper levels of the binary search tree could be be cached to get faster performance for repeated searches.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,267628781
https://github.com/pydata/xarray/issues/1603#issuecomment-338622746,https://api.github.com/repos/pydata/xarray/issues/1603,338622746,MDEyOklzc3VlQ29tbWVudDMzODYyMjc0Ng==,703554,2017-10-23T10:56:40Z,2017-10-23T10:56:40Z,CONTRIBUTOR,"Just to say I'm interested in how MultiIndexes are handled also. In our use case, we have two variables conventionally named CHROM (chromosome) and POS (position) which together describe a location in a genome. I want to combine both variables into a multi-index so I can, e.g., select all data from some data variable for chromosome X between positions 100,000-200,000. For all our data variables, this genome location multi-index would be used to index the first dimension.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,262642978
https://github.com/pydata/xarray/issues/66#issuecomment-338459385,https://api.github.com/repos/pydata/xarray/issues/66,338459385,MDEyOklzc3VlQ29tbWVudDMzODQ1OTM4NQ==,703554,2017-10-22T08:02:29Z,2017-10-22T08:02:29Z,CONTRIBUTOR,"Just to say thanks for the work on this, I've been looking at the h5netcdf
code recently to understand better how dimensions are plumbed in netcdf4.
I'm exploring refactoring all my data model classes in scikit-allel to
build on xarray, I think the time is right, especially if xarray gets a
Zarr backend too.

On Sun, 22 Oct 2017 at 02:01, Stephan Hoyer <notifications@github.com>
wrote:

> Closed #66 <https://github.com/pydata/xarray/issues/66>.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/pydata/xarray/issues/66#event-1304360167>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AAq8QqPs_6iyjBqHhFoB2CV7blLX8TUYks5supQEgaJpZM4BpxKD>
> .
>
-- 
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
Big Data Institute Building
Old Road Campus
Roosevelt Drive
Oxford
OX3 7LF
United Kingdom
Phone: +44 (0)1865 743596
Email: alimanfoo@googlemail.com
Web: http://a <http://purl.org/net/aliman>limanfoo.github.io/
Twitter: https://twitter.com/alimanfoo
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,29453809
https://github.com/pydata/xarray/pull/1528#issuecomment-335186616,https://api.github.com/repos/pydata/xarray/issues/1528,335186616,MDEyOklzc3VlQ29tbWVudDMzNTE4NjYxNg==,703554,2017-10-09T15:07:29Z,2017-10-09T17:23:21Z,CONTRIBUTOR,"I'm on paternity leave for the next 2 weeks, then will be catching up for a
couple of weeks I expect. May be able to merge straightforward PRs but will
have limited bandwidth.","{""total_count"": 3, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 3, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-335030993,https://api.github.com/repos/pydata/xarray/issues/1528,335030993,MDEyOklzc3VlQ29tbWVudDMzNTAzMDk5Mw==,703554,2017-10-08T19:17:27Z,2017-10-08T23:37:47Z,CONTRIBUTOR,"FWIW I think some JSON encoders for attributes would ultimately be a useful
addition to zarr, but I won't be able to put any effort into zarr in the
next month, so workarounds in xarray sounds like a good idea for now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-325813339,https://api.github.com/repos/pydata/xarray/issues/1528,325813339,MDEyOklzc3VlQ29tbWVudDMyNTgxMzMzOQ==,703554,2017-08-29T21:43:48Z,2017-08-29T21:43:48Z,CONTRIBUTOR,"On Tuesday, August 29, 2017, Ryan Abernathey <notifications@github.com>
wrote:
>
> @alimanfoo <https://github.com/alimanfoo>: when do you anticipate the 2.2
> zarr release to happen? Will the API change significantly? If so, I will
> wait for that to move forward here.
>
Zarr 2.2 will hopefully happen some time in the next 2 months, but it will
be fully backwards-compatible, no breaking API changes.


-- 
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
Big Data Institute Building
Old Road Campus
Roosevelt Drive
Oxford
OX3 7LF
United Kingdom
Phone: +44 (0)1865 743596
Email: alimanfoo@googlemail.com
Web: http://a <http://purl.org/net/aliman>limanfoo.github.io/
Twitter: https://twitter.com/alimanfoo
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-325729013,https://api.github.com/repos/pydata/xarray/issues/1528,325729013,MDEyOklzc3VlQ29tbWVudDMyNTcyOTAxMw==,703554,2017-08-29T17:02:41Z,2017-08-29T17:02:41Z,CONTRIBUTOR,"FWIW all filter (codec) classes have been migrated from zarr to a separate packaged called numcodecs and will be imported from there in the next (2.2) zarr release. Here is [FixedScaleOffset](https://github.com/alimanfoo/numcodecs/blob/master/numcodecs/fixedscaleoffset.py). Implementation is basic numpy, probably some room for optimization. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/pull/1528#issuecomment-325727280,https://api.github.com/repos/pydata/xarray/issues/1528,325727280,MDEyOklzc3VlQ29tbWVudDMyNTcyNzI4MA==,703554,2017-08-29T16:56:55Z,2017-08-29T16:56:55Z,CONTRIBUTOR,"Following this with interest.

Regarding autoclose, just to confirm that zarr doesn't really have any notion of whether something is open or closed. When using the DirectoryStore storage class (most common use case I imagine), all files are automatically closed, nothing is kept open. There are some storage classes (e.g., ZipStore) that do require an explicit close call to finalise the file on disk if you have been writing data, but I think you can ignore this in xarray and leave it up to the user to manage this themselves.

Out of interest, @shoyer do you still think there would be value in writing a wrapper for zarr analogous to h5netcdf? Or does this PR provide all the necessary functionality?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,253136694
https://github.com/pydata/xarray/issues/1223#issuecomment-282031922,https://api.github.com/repos/pydata/xarray/issues/1223,282031922,MDEyOklzc3VlQ29tbWVudDI4MjAzMTkyMg==,703554,2017-02-23T15:55:38Z,2017-02-23T15:55:38Z,CONTRIBUTOR,"FWIW I think it would be better in xarray or a separate package, at least
at the moment, just because I don't have a lot of time right now for OSS
and need to keep Zarr as lean as possible.

On Thursday, February 23, 2017, Martin Durant <notifications@github.com>
wrote:

> @alimanfoo <https://github.com/alimanfoo> , do you think this work would
> make more sense as part of zarr rather than as part of xarray?
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/pydata/xarray/issues/1223#issuecomment-281990573>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AAq8QoeCQOn7WvB8gtLP5Bs6cifIKRQiks5rfYjSgaJpZM4Lp0yH>
> .
>


-- 
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Email: alimanfoo@googlemail.com
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202260275
https://github.com/pydata/xarray/issues/1223#issuecomment-281829618,https://api.github.com/repos/pydata/xarray/issues/1223,281829618,MDEyOklzc3VlQ29tbWVudDI4MTgyOTYxOA==,703554,2017-02-22T22:43:52Z,2017-02-22T22:43:52Z,CONTRIBUTOR,"Yep, that looks good. I was wondering about the xarray_to_zarr() function?

On Wednesday, February 22, 2017, Martin Durant <notifications@github.com>
wrote:

> @alimanfoo <https://github.com/alimanfoo> , in the new dataset save
> function, I do exactly [as you suggest] (https://gist.github.com/
> martindurant/06a1e98c91f0033c4649a48a2f943390#file-zarr_xarr-py-L168),
> with everything getting put as a dict into the main zarr group attributes,
> with special attribute names ""attrs"" for the data-set root, ""coords"" for
> the set of coordinate objects and ""variables"" for the set of variables
> objects (all of these have their own attributes in xarray).
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/pydata/xarray/issues/1223#issuecomment-281813651>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AAq8QqSXNzQkrR0xOhhcp9QxWUIkz8Teks5rfKvggaJpZM4Lp0yH>
> .
>


-- 
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Email: alimanfoo@googlemail.com
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202260275
https://github.com/pydata/xarray/issues/1223#issuecomment-281496902,https://api.github.com/repos/pydata/xarray/issues/1223,281496902,MDEyOklzc3VlQ29tbWVudDI4MTQ5NjkwMg==,703554,2017-02-21T22:05:39Z,2017-02-21T22:05:39Z,CONTRIBUTOR,"Just to say this is looking neat.

For storing an xarray.DataArray, do you think it would be possible to do
away with pickling up all metadata and storing in the .xarray resource?
Specifically I'm wondering if this could all be stored as attributes on the
Zarr array, with some conventions for special xarray attribute names? I'm
guessing there must be some conventions for storing all this metadata as
attributes in an HDF5 (netCDF) file, it would potentially be nice to mirror
that as much as possible?

On Sat, Feb 11, 2017 at 10:56 PM, Martin Durant <notifications@github.com>
wrote:

> I have developed my example a little to sidestep subclassing you suggest,
> which seemed tricky to implement.
>
> Please see https://gist.github.com/martindurant/
> 06a1e98c91f0033c4649a48a2f943390
> (dataset_to/from_zarr functions)
>
> I can use the zarr groups structure to mirror at least typical use of
> xarrays: variables, coordinates and sets of attributes on each. I have
> tested this with s3 too, stealing a little code from dask to show the idea.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/pydata/xarray/issues/1223#issuecomment-279181938>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AAq8QtydMLiMvgETYyaVF5D1CLb-4ot4ks5rbjy5gaJpZM4Lp0yH>
> .
>


-- 
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Email: alimanfoo@googlemail.com
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202260275
https://github.com/pydata/xarray/issues/1223#issuecomment-274214755,https://api.github.com/repos/pydata/xarray/issues/1223,274214755,MDEyOklzc3VlQ29tbWVudDI3NDIxNDc1NQ==,703554,2017-01-21T00:24:27Z,2017-01-21T00:24:27Z,CONTRIBUTOR,"Happy to help if there's anything to do on the zarr side.

On Fri, 20 Jan 2017 at 23:47, Matthew Rocklin <notifications@github.com>
wrote:

> Also cc @alimanfoo <https://github.com/alimanfoo>
>
>
>
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/pydata/xarray/issues/1223#issuecomment-274209930>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AAq8QlwtJQ_OKOekveWuYtLmpR-caHvgks5rUUeTgaJpZM4Lp0yH>
> .
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,202260275
https://github.com/pydata/xarray/issues/66#issuecomment-90813596,https://api.github.com/repos/pydata/xarray/issues/66,90813596,MDEyOklzc3VlQ29tbWVudDkwODEzNTk2,703554,2015-04-08T06:04:53Z,2015-04-08T06:04:53Z,CONTRIBUTOR,"Thanks Stephan, I'll take a look.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,29453809
https://github.com/pydata/xarray/pull/127#issuecomment-43385302,https://api.github.com/repos/pydata/xarray/issues/127,43385302,MDEyOklzc3VlQ29tbWVudDQzMzg1MzAy,703554,2014-05-16T22:16:01Z,2014-05-16T22:16:01Z,CONTRIBUTOR,"No worries, glad to contribute.

On Friday, 16 May 2014, Stephan Hoyer notifications@github.com wrote:

> Thanks @alimanfoo https://github.com/alimanfoo!
> 
> ## 
> 
> Reply to this email directly or view it on GitHubhttps://github.com/xray-pydata/xray/pull/127#issuecomment-43287303
> .

## 

Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health http://cggh.org
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Web: http://purl.org/net/aliman
Email: alimanfoo@gmail.com
Tel: +44 (0)1865 287721 **_new number**_
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,33396232
https://github.com/pydata/xarray/pull/127#issuecomment-43059199,https://api.github.com/repos/pydata/xarray/issues/127,43059199,MDEyOklzc3VlQ29tbWVudDQzMDU5MTk5,703554,2014-05-14T09:20:01Z,2014-05-14T09:20:01Z,CONTRIBUTOR,"I've added a test to check for an error when a group is not found. I also changed the implementation of the group access function to avoid recursion, it seemed simpler.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,33396232
https://github.com/pydata/xarray/pull/127#issuecomment-43024743,https://api.github.com/repos/pydata/xarray/issues/127,43024743,MDEyOklzc3VlQ29tbWVudDQzMDI0NzQz,703554,2014-05-13T23:11:07Z,2014-05-13T23:11:07Z,CONTRIBUTOR,"Thanks for the comments, all makes good sense.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,33396232
https://github.com/pydata/xarray/issues/66#issuecomment-42869488,https://api.github.com/repos/pydata/xarray/issues/66,42869488,MDEyOklzc3VlQ29tbWVudDQyODY5NDg4,703554,2014-05-12T18:29:57Z,2014-05-12T18:29:57Z,CONTRIBUTOR,"One other detail, I have an HDF5 group for each conceptual dataset, but then variables may be organised into subgroups. It would be nice if this could be accommodated, e.g., when opening an HDF5 group as an xray dataset, assume the dataset contains all variables in the group and any subgroups searched recursively. Again apologies I don't know if this is allowed in NetCDF4, will do the research. 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,29453809
https://github.com/pydata/xarray/issues/66#issuecomment-42840763,https://api.github.com/repos/pydata/xarray/issues/66,42840763,MDEyOklzc3VlQ29tbWVudDQyODQwNzYz,703554,2014-05-12T14:45:57Z,2014-05-12T14:45:57Z,CONTRIBUTOR,"Thanks @akleeman for the info, much appreciated.

A couple of other points I thought maybe worth mentioning if you're
considering wrapping h5py.

First I've been using lzf as the compression filter in my HDF5 files. I
believe h5py bundles the source for lzf. I don't know if lzf would be
supported if accessing through the python netcdf API.

Second, I have a situation where I have multiple datasets, each of which is
stored in a separate groups, each of which has two dimensions (genome
position and biological sample). The genome position scale is different for
each dataset (there's one dataset per chromosome), however, the biological
sample scale is actually common to all of the datasets. So at the moment I
have a variable in the root group with the ""samples"" dimension scale, then
each dataset group has it's own ""position"" dimension scale. You can
represent all this with HDF5 dimension scales, but I've no idea if this is
accommodated by NetCDF4 or could fit into the xray model. I could work
around this by copying the samples variable into each dataset, but just
thought I mention this pattern as something to be aware of.

On Mon, May 12, 2014 at 3:04 PM, akleeman notifications@github.com wrote:

> @alimanfoo https://github.com/alimanfoo
> 
> Glad you're enjoying xray!
> 
> From your description it sounds like it should be relatively simple for
> you to get xray working with your dataset. NetCDF4 is a subset of h5py and
> simply adding dimension scales should get you most of the way there.
> 
> Re: groups, each xray.Dataset corresponds to one HDF5 group. So while xray
> doesn't currently support groups, you could split your HDF5 dataset into
> separate files for each group and load those files using xray.
> Alternatively (if you feel ambitious) it shouldn't be too hard to get
> xray's NetCDF4DataStore (backends.netCDF4_.py) to work with groups,
> allowing you to do something like:
> 
> dataset = xray.open_dataset('multiple_groups.h5', group='/one_group')
> 
> Thishttp://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4-module.htmlgives some good examples of how groups work within the netCDF4.
> 
> Also, as @shoyer https://github.com/shoyer mentioned, it might make
> sense to modify xray so that NetCDF4 support is obtained by wrapping h5py
> instead of netCDF4 which might make your life even easier.
> 
> ## 
> 
> Reply to this email directly or view it on GitHubhttps://github.com/xray-pydata/xray/issues/66#issuecomment-42835510
> .

## 

Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health http://cggh.org
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Web: http://purl.org/net/aliman
Email: alimanfoo@gmail.com
Tel: +44 (0)1865 287721 **_new number**_
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,29453809
https://github.com/pydata/xarray/issues/66#issuecomment-42805550,https://api.github.com/repos/pydata/xarray/issues/66,42805550,MDEyOklzc3VlQ29tbWVudDQyODA1NTUw,703554,2014-05-12T08:08:37Z,2014-05-12T08:08:37Z,CONTRIBUTOR,"I'm really enjoying working with xray, it's so nice to be able to think of my dimensions as named and labeled dimensions, no more remembering which axis is which!

I'm not sure if this is relevant to this specific issue, but I am working for the most part with HDF5 files created using h5py. I'm only just learning about NetCDF-4, but I have datasets that comprise a number of 1D and 2D variables with shared dimensions, so I think my data is already very close to the right model. I have a couple of questions:

(1) If I have multiple datasets within an HDF5 file, each within a separate group, can I access those through xray? 

(2) What would I need to add to my HDF5 to make it fully compliant with the xray/NetCDF4 model? Is it just a question of creating and attaching [dimension scales](http://docs.h5py.org/en/latest/high/dims.html) or would I need to do something else as well?

Thanks in advance.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,29453809