
issue_comments


66 rows where author_association = "CONTRIBUTOR" and user = 703554 sorted by updated_at descending


issue 21

  • WIP: Zarr backend 16
  • xarray/zarr cloud demo 7
  • Adds Dataset.query() method, analogous to pandas DataFrame.query() 7
  • HDF5 backend for xray 5
  • zarr as persistent store for xarray 4
  • Explicit indexes in xarray's data-model (Future of MultiIndex) 4
  • initial implementation of support for NetCDF groups 3
  • Low memory/out-of-core index? 3
  • Unnamed dimensions 3
  • Fancy indexing a Dataset with dask DataArray causes excessive memory usage 2
  • Explaining xarray in a single picture 2
  • Support multi-dimensional grouped operations and group_over 1
  • WIP: New DataStore / Encoder / Decoder API for review 1
  • Zarr consolidated 1
  • Zarr loading from ZipStore gives error on default arguments 1
  • Allow nested dictionaries in the Zarr backend (#3517) 1
  • DOC: from examples to tutorials 1
  • Errors using to_zarr for an s3 store 1
  • Wrap "Dimensions" onto multiple lines in xarray.Dataset repr? 1
  • Fancy indexing a Dataset with dask DataArray triggers multiple computes 1
  • Slow performance of concat() 1

user 1

  • alimanfoo · 66

author_association 1

  • CONTRIBUTOR · 66
Columns: id · html_url · issue_url · node_id · user · created_at · updated_at ▲ (sort) · author_association · body · reactions · performed_via_github_app · issue
1544199022 https://github.com/pydata/xarray/issues/7833#issuecomment-1544199022 https://api.github.com/repos/pydata/xarray/issues/7833 IC_kwDOAMm_X85cCptu alimanfoo 703554 2023-05-11T15:26:52Z 2023-05-11T15:26:52Z CONTRIBUTOR

Awesome, thanks @kmuehlbauer and @Illviljan 🙏🏻

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of concat() 1704950804
1190061811 https://github.com/pydata/xarray/issues/3564#issuecomment-1190061811 https://api.github.com/repos/pydata/xarray/issues/3564 IC_kwDOAMm_X85G7ubz alimanfoo 703554 2022-07-20T09:44:40Z 2022-07-20T09:44:40Z CONTRIBUTOR

Hi folks,

Just to mention that we've created a short tutorial on xarray which is meant as a gentle intro for folks coming from the malaria genetics field, who mostly have never heard of xarray before. We illustrate xarray first using outputs from a geostatistical model of how insecticide-treated bednets are used in Africa. We then give a couple of brief examples of how we use xarray for genomic data. There are video walkthroughs in French and English:

https://anopheles-genomic-surveillance.github.io/workshop-5/module-1-xarray.html

Please feel free to link to this in the xarray tutorial site if you'd like to :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DOC: from examples to tutorials 527323165
1190057727 https://github.com/pydata/xarray/issues/6771#issuecomment-1190057727 https://api.github.com/repos/pydata/xarray/issues/6771 IC_kwDOAMm_X85G7tb_ alimanfoo 703554 2022-07-20T09:40:41Z 2022-07-20T09:41:07Z CONTRIBUTOR

Hi @dcherian,

> We are currently reworking https://tutorial.xarray.dev/intro.html and would love to either add your material or link to it if you're creating a consolidated collection of genetics-related material. xref (#3564). We don't have a "domain-specific" section yet but are planning to create one after SciPy.

FWIW we've created a short tutorial on xarray which is meant as a gentle intro for folks coming from the malaria genetics field. We illustrate xarray first using outputs from a geostatistical model of how insecticide-treated bednets are used in Africa. We then give a couple of brief examples of how we use xarray for genomic data. There are video walkthroughs in French and English:

https://anopheles-genomic-surveillance.github.io/workshop-5/module-1-xarray.html

Please feel free to link to this in the xarray tutorial site if you'd like to :)

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Explaining xarray in a single picture 1300534066
1190052947 https://github.com/pydata/xarray/issues/6771#issuecomment-1190052947 https://api.github.com/repos/pydata/xarray/issues/6771 IC_kwDOAMm_X85G7sRT alimanfoo 703554 2022-07-20T09:36:10Z 2022-07-20T09:36:10Z CONTRIBUTOR

Hi @TomNicholas,

> I would've thought that latitude and longitude would be 1-dimensional coordinate variables, yet they are drawn as 2-D arrays?

> I think that if you assume that the axes of your grid data align with the cardinal directions (East-West / North-South) then you would expect latitude and longitude to be 1D, but if they don't align then the coordinates would need to be 2D (i.e. if x and y are merely arbitrary lines along the Earth's surface).
>
> I agree with you though that 2D lat/lon grids are unnecessarily confusing, especially for non-geoscience users.

Interesting, I hadn't considered that. Definitely a bit mind-bending though for us non-geoscientists :)

I like the second diagram you showed more (it's also a neater version of the labelled one I made here). I think it's debatable whether elevation and land_cover constitute coordinates or data variables, but I have no strong opinion on that.

> As for improvements, I think it would be clearer to at least use the second image over the first, and perhaps we could improve it further.

SGTM. FWIW on the second diagram I would use "dimensions" instead of "indexes". Getting dimensions first then helps to explain how you can use a coordinate variable to index a dimension.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explaining xarray in a single picture 1300534066
1054526670 https://github.com/pydata/xarray/issues/324#issuecomment-1054526670 https://api.github.com/repos/pydata/xarray/issues/324 IC_kwDOAMm_X84-2szO alimanfoo 703554 2022-02-28T18:10:02Z 2022-02-28T18:10:02Z CONTRIBUTOR

Still relevant, would like to be able to group by multiple variables along a single dimension.

{
    "total_count": 6,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support multi-dimensional grouped operations and group_over 58117200
802732278 https://github.com/pydata/xarray/issues/4663#issuecomment-802732278 https://api.github.com/repos/pydata/xarray/issues/4663 MDEyOklzc3VlQ29tbWVudDgwMjczMjI3OA== alimanfoo 703554 2021-03-19T10:44:31Z 2021-03-19T10:44:31Z CONTRIBUTOR

Thanks @dcherian.

Just to add that if we make progress with supporting indexing with dask arrays then at some point I think we'll hit a separate issue, which is that xarray will require that the chunk sizes of the indexed arrays are computed, but currently calling the dask array method compute_chunk_sizes() is inefficient for n-d arrays. Raised here: https://github.com/dask/dask/issues/7416

In case anyone needs a workaround for indexing a dataset with a 1d boolean dask array, I'm currently using this hacked implementation of a compress() style function that operates on an xarray dataset, which includes more efficient computation of chunk sizes.
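To make the chunk-size issue concrete, here is a minimal sketch (mine, not from the original comment) of how boolean fancy indexing leaves a dask array with unknown chunk sizes until compute_chunk_sizes() is called:

```python
import dask.array as da
import numpy as np

# Boolean fancy indexing leaves the result with unknown (NaN) chunk sizes.
x = da.arange(10, chunks=5)
mask = da.from_array(np.array([True, False] * 5), chunks=5)
y = x[mask]
print(y.chunks)  # ((nan, nan),) -- lengths unknown until computed

# compute_chunk_sizes() evaluates the mask to resolve the chunk lengths,
# which is the step that is currently inefficient for n-d arrays.
y.compute_chunk_sizes()
print(y.chunks)  # ((3, 2),)
```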

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fancy indexing a Dataset with dask DataArray triggers multiple computes 759709924
802101178 https://github.com/pydata/xarray/issues/5054#issuecomment-802101178 https://api.github.com/repos/pydata/xarray/issues/5054 MDEyOklzc3VlQ29tbWVudDgwMjEwMTE3OA== alimanfoo 703554 2021-03-18T16:45:51Z 2021-03-18T16:58:44Z CONTRIBUTOR

FWIW my use case actually only needs indexing a single dimension, i.e., something equivalent to the numpy (or dask.array) compress function. This can be hacked for xarray datasets in a fairly straightforward way:

```python
import numpy as np
import xarray as xr


def _compress_dataarray(a, indexer, dim):
    data = a.data
    try:
        axis = a.dims.index(dim)
    except ValueError:
        v = data
    else:
        # rely on __array_function__ to handle dispatching to dask if
        # data is a dask array
        v = np.compress(indexer, a.data, axis=axis)
        if hasattr(v, 'compute_chunk_sizes'):
            # needed to know dim lengths
            v.compute_chunk_sizes()
    return v


def compress_dataset(ds, indexer, dim):
    if isinstance(indexer, str):
        indexer = ds[indexer].data

    coords = dict()
    for k in ds.coords:
        a = ds[k]
        v = _compress_dataarray(a, indexer, dim)
        coords[k] = (a.dims, v)

    data_vars = dict()
    for k in ds.data_vars:
        a = ds[k]
        v = _compress_dataarray(a, indexer, dim)
        data_vars[k] = (a.dims, v)

    attrs = ds.attrs.copy()

    return xr.Dataset(data_vars=data_vars, coords=coords, attrs=attrs)
```
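For illustration, a hedged usage sketch of the helper above on a toy dataset (the variable and dimension names here are made up):

```python
import numpy as np
import xarray as xr

# Toy dataset: drop rows of the "variants" dimension with a boolean mask,
# using the compress_dataset() helper sketched above.
ds = xr.Dataset(
    data_vars={"calls": (("variants", "samples"), np.arange(12).reshape(6, 2))},
    coords={"pos": ("variants", np.array([10, 20, 30, 40, 50, 60]))},
)
keep = np.array([True, False, True, True, False, True])
subset = compress_dataset(ds, keep, dim="variants")
print(subset.sizes)  # variants: 4, samples: 2
```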

Given the complexity of fancy indexing in general, I wonder if it's worth contemplating implementing a Dataset.compress() method as a first step.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fancy indexing a Dataset with dask DataArray causes excessive memory usage 834972299
802096873 https://github.com/pydata/xarray/issues/5054#issuecomment-802096873 https://api.github.com/repos/pydata/xarray/issues/5054 MDEyOklzc3VlQ29tbWVudDgwMjA5Njg3Mw== alimanfoo 703554 2021-03-18T16:39:59Z 2021-03-18T16:39:59Z CONTRIBUTOR

Thanks @dcherian.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fancy indexing a Dataset with dask DataArray causes excessive memory usage 834972299
800504527 https://github.com/pydata/xarray/pull/4984#issuecomment-800504527 https://api.github.com/repos/pydata/xarray/issues/4984 MDEyOklzc3VlQ29tbWVudDgwMDUwNDUyNw== alimanfoo 703554 2021-03-16T18:28:09Z 2021-03-16T18:28:09Z CONTRIBUTOR

Yay, first xarray PR :partying_face:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adds Dataset.query() method, analogous to pandas DataFrame.query() 819911891
800317378 https://github.com/pydata/xarray/pull/4984#issuecomment-800317378 https://api.github.com/repos/pydata/xarray/issues/4984 MDEyOklzc3VlQ29tbWVudDgwMDMxNzM3OA== alimanfoo 703554 2021-03-16T14:40:45Z 2021-03-16T14:40:45Z CONTRIBUTOR

> Could we add a very small test for the DataArray? Given the coverage on Dataset, it should mostly just test that the method works.

No problem, some DataArray tests are there.

> Any thoughts from others before we merge?

Good to go from my side.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adds Dataset.query() method, analogous to pandas DataFrame.query() 819911891
800176868 https://github.com/pydata/xarray/pull/4984#issuecomment-800176868 https://api.github.com/repos/pydata/xarray/issues/4984 MDEyOklzc3VlQ29tbWVudDgwMDE3Njg2OA== alimanfoo 703554 2021-03-16T11:24:42Z 2021-03-16T11:24:42Z CONTRIBUTOR

Hi @max-sixty,

> It looks like we need a requires_numexpr decorator on the tests — would you be OK to add that?

Sure, done.

> Could we add a simple method to DataArray which converts to a Dataset, calls the functions, and converts back too? (there are lots of examples already of this, let me know any issues)

Done.

> And we should add the methods to api.rst, and a whatsnew entry if possible.

Done.

Let me know if there's anything else. Looking forward to using this :smile:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adds Dataset.query() method, analogous to pandas DataFrame.query() 819911891
798993998 https://github.com/pydata/xarray/pull/4984#issuecomment-798993998 https://api.github.com/repos/pydata/xarray/issues/4984 MDEyOklzc3VlQ29tbWVudDc5ODk5Mzk5OA== alimanfoo 703554 2021-03-14T22:44:49Z 2021-03-14T22:44:49Z CONTRIBUTOR

> Currently the test runs over an array of two dimensions — x & y. Would pd.query work if there were also a z dimension?

No worries, yes any number of dimensions can be queried. I've added tests showing three dimensions can be queried.

As an aside, in writing these tests I came upon a probable upstream bug in pandas, reported as https://github.com/pandas-dev/pandas/issues/40436. I don't think it affects this PR though, and it has low impact, as only the "python" query parser is affected and most people will use the default "pandas" query parser.
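For illustration, a minimal sketch of the API this PR adds (the variable names are made up; see the PR's tests for the authoritative examples):

```python
import numpy as np
import xarray as xr

# Filter along a dimension with a pandas-style query string evaluated
# against the dataset's 1-D variables.
ds = xr.Dataset({"a": ("x", np.arange(10)), "b": ("x", np.linspace(0, 1, 10))})
out = ds.query(x="a > 5")  # keep points along x where variable a > 5
print(out["a"].values)     # [6 7 8 9]
```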

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adds Dataset.query() method, analogous to pandas DataFrame.query() 819911891
797668635 https://github.com/pydata/xarray/pull/4984#issuecomment-797668635 https://api.github.com/repos/pydata/xarray/issues/4984 MDEyOklzc3VlQ29tbWVudDc5NzY2ODYzNQ== alimanfoo 703554 2021-03-12T18:16:15Z 2021-03-12T18:16:15Z CONTRIBUTOR

Just to mention I've added tests to verify this works with variables backed by dask arrays. Also added explicit tests of different eval engine and query parser options. And added a docstring.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adds Dataset.query() method, analogous to pandas DataFrame.query() 819911891
797636489 https://github.com/pydata/xarray/pull/4984#issuecomment-797636489 https://api.github.com/repos/pydata/xarray/issues/4984 MDEyOklzc3VlQ29tbWVudDc5NzYzNjQ4OQ== alimanfoo 703554 2021-03-12T17:21:29Z 2021-03-12T17:21:29Z CONTRIBUTOR

Hi @max-sixty, no problem. Re this...

> Does the pd.eval work with more than two dimensions?

...not quite sure what you mean, could you elaborate?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adds Dataset.query() method, analogous to pandas DataFrame.query() 819911891
788828644 https://github.com/pydata/xarray/pull/4984#issuecomment-788828644 https://api.github.com/repos/pydata/xarray/issues/4984 MDEyOklzc3VlQ29tbWVudDc4ODgyODY0NA== alimanfoo 703554 2021-03-02T11:10:20Z 2021-03-02T11:10:20Z CONTRIBUTOR

Hi folks, thought I'd put up a proof of concept PR here for further discussion. Any advice/suggestions about if/how to take this forward would be very welcome.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adds Dataset.query() method, analogous to pandas DataFrame.query() 819911891
631075010 https://github.com/pydata/xarray/issues/4079#issuecomment-631075010 https://api.github.com/repos/pydata/xarray/issues/4079 MDEyOklzc3VlQ29tbWVudDYzMTA3NTAxMA== alimanfoo 703554 2020-05-19T20:50:26Z 2020-05-19T20:50:51Z CONTRIBUTOR

> In the specific example from your notebook, where do the dimension lengths __variants/BaseCounts_dim1, __variants/MLEAC_dim1 and __variants/MLEAF_dim1 come from?
>
> BaseCounts_dim1 is length 4, so maybe that corresponds to DNA bases ATGC?

In this specific example, I do actually know where these dimension lengths come from. In fact I should've used the shared dimension alt_alleles instead of __variants/MLEAC_dim1 and __variants/MLEAF_dim1. And yes BaseCounts_dim1 does correspond to DNA bases.

But two points.

First, I don't care about these dimensions. The only dimensions I care about and will use are variants, samples and ploidy.

Second, more important, this kind of data can come from a number of different sources, each of which includes a different set of arrays with different names and semantics. While there are some common arrays and naming conventions where I can guess what the dimensions mean, in general I can't know all of those up front and bake them in as special cases.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unnamed dimensions 621078539
631071623 https://github.com/pydata/xarray/issues/4081#issuecomment-631071623 https://api.github.com/repos/pydata/xarray/issues/4081 MDEyOklzc3VlQ29tbWVudDYzMTA3MTYyMw== alimanfoo 703554 2020-05-19T20:43:07Z 2020-05-19T20:43:07Z CONTRIBUTOR

Thanks @shoyer for raising this, would be nice to wrap the dimensions, I'd vote for one per line.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Wrap "Dimensions" onto multiple lines in xarray.Dataset repr? 621123222
630924754 https://github.com/pydata/xarray/issues/4079#issuecomment-630924754 https://api.github.com/repos/pydata/xarray/issues/4079 MDEyOklzc3VlQ29tbWVudDYzMDkyNDc1NA== alimanfoo 703554 2020-05-19T16:14:27Z 2020-05-19T16:14:27Z CONTRIBUTOR

Thanks @shoyer.

For reference, I'm exploring putting some genome variation data into xarray; there's an initial experiment and discussion here.

In general I will have some arrays where I won't know what some of the dimensions mean, and so cannot give them a meaningful name.

No worries if this is hard, was just wondering if it was supported already.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unnamed dimensions 621078539
630913851 https://github.com/pydata/xarray/issues/4079#issuecomment-630913851 https://api.github.com/repos/pydata/xarray/issues/4079 MDEyOklzc3VlQ29tbWVudDYzMDkxMzg1MQ== alimanfoo 703554 2020-05-19T15:55:54Z 2020-05-19T15:55:54Z CONTRIBUTOR

Thanks so much @rabernat for the quick response.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unnamed dimensions 621078539
605179227 https://github.com/pydata/xarray/issues/3831#issuecomment-605179227 https://api.github.com/repos/pydata/xarray/issues/3831 MDEyOklzc3VlQ29tbWVudDYwNTE3OTIyNw== alimanfoo 703554 2020-03-27T18:10:05Z 2020-03-27T18:10:05Z CONTRIBUTOR

Just to say having some kind of stack integration tests is a marvellous idea. Another example of an issue that's very hard to pin down is https://github.com/zarr-developers/zarr-python/issues/528.

Btw we have also run into issues with fsspec caching directory listings and not invalidating the cache when store changes are made, although I haven't checked with latest master. We have a lot of workarounds in our code where we reopen everything after we've made changes to a store. Probably an area where some more digging and careful testing may be needed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Errors using to_zarr for an s3 store 576337745
554463832 https://github.com/pydata/xarray/pull/3526#issuecomment-554463832 https://api.github.com/repos/pydata/xarray/issues/3526 MDEyOklzc3VlQ29tbWVudDU1NDQ2MzgzMg== alimanfoo 703554 2019-11-15T17:57:42Z 2019-11-15T17:57:42Z CONTRIBUTOR

FWIW in the Zarr Python implementation I don't think we do any special encoding or decoding of attribute values. Whatever value is given gets serialised using the built-in json.dumps. I believe this means that if someone provides a dict as an attribute value, it will be serialised as a JSON object and deserialised back to a dict, although this is not something we currently test for.

From the zarr v2 spec point of view I think anything goes in the .zattrs file, as long as .zattrs is a JSON object at the root.
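A minimal sketch of the round-trip described above (the file name is illustrative, and as noted, this behaviour isn't covered by a test):

```python
import zarr

# Attribute values pass straight through json.dumps/json.loads, so a dict
# should round-trip as a JSON object.
g = zarr.open_group("example.zarr", mode="w")
g.attrs["nested"] = {"units": "m", "scale": [1, 2, 3]}
print(dict(g.attrs))  # {'nested': {'units': 'm', 'scale': [1, 2, 3]}}
```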

Hth.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow nested dictionaries in the Zarr backend (#3517) 522519084
455374760 https://github.com/pydata/xarray/issues/2586#issuecomment-455374760 https://api.github.com/repos/pydata/xarray/issues/2586 MDEyOklzc3VlQ29tbWVudDQ1NTM3NDc2MA== alimanfoo 703554 2019-01-17T23:49:07Z 2019-01-17T23:49:07Z CONTRIBUTOR

> IMO, zarr needs some kind of "resolver" mechanism that takes a string and decides what kind of store it represents. For example, if the path ends with .zip, then it should know it's a zip store, if it starts with gs://, it should know it's a google cloud store, etc.

Some very limited support for this is there already, e.g., if the string ends with '.zip' then a zip store will be used, but there's no support for dispatching to cloud stores via a URL-like protocol. There's an open issue for that: https://github.com/zarr-developers/zarr/issues/214
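For illustration, a hypothetical sketch of such a resolver (resolve_store and its rules are made up for this example; they are not zarr API):

```python
import zarr

def resolve_store(path):
    # Hypothetical resolver mapping a path/URL string to a zarr store.
    # Only the '.zip' rule reflects behaviour zarr had at the time; the
    # 'gs://' branch is the missing piece tracked in zarr issue #214.
    if path.endswith(".zip"):
        return zarr.ZipStore(path)
    if path.startswith("gs://"):
        raise NotImplementedError("cloud store dispatch not implemented")
    return zarr.DirectoryStore(path)
```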

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr loading from ZipStore gives error on default arguments 386515973
444187219 https://github.com/pydata/xarray/issues/1603#issuecomment-444187219 https://api.github.com/repos/pydata/xarray/issues/1603 MDEyOklzc3VlQ29tbWVudDQ0NDE4NzIxOQ== alimanfoo 703554 2018-12-04T17:33:34Z 2018-12-04T17:33:34Z CONTRIBUTOR

> I think that one big source of confusion has been so far mixing coordinates/variables and indexes. These are really two separate concepts, and the indexes refactoring should address that IMHO.
>
> For example, I think that da[some_name] should never return indexes but only coordinates (and/or data variables for Dataset). That would be much simpler.

Can't claim to be following every detail here, but this sounds very sensible to me FWIW.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
442801741 https://github.com/pydata/xarray/pull/2559#issuecomment-442801741 https://api.github.com/repos/pydata/xarray/issues/2559 MDEyOklzc3VlQ29tbWVudDQ0MjgwMTc0MQ== alimanfoo 703554 2018-11-29T11:33:33Z 2018-11-29T11:33:33Z CONTRIBUTOR

Great to see this. On the API, FWIW I'd vote for using the same keyword (consolidated) in both, less burden on the user to remember what to use.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr consolidated 382497709
392831984 https://github.com/pydata/xarray/issues/1603#issuecomment-392831984 https://api.github.com/repos/pydata/xarray/issues/1603 MDEyOklzc3VlQ29tbWVudDM5MjgzMTk4NA== alimanfoo 703554 2018-05-29T15:59:46Z 2018-05-29T15:59:46Z CONTRIBUTOR

Ok, cool. Was wondering if now was the right time to revisit that, alongside the work proposed in this PR. Happy to participate in that discussion, still interested in implementing some alternative index classes.

On Tue, 29 May 2018, 15:45 Stephan Hoyer, notifications@github.com wrote:

Yes, the index API still needs to be determined. But I think we want to support something like that.

On Tue, May 29, 2018 at 1:20 AM Alistair Miles notifications@github.com wrote:

I see this mentions an Index API, is that still to be decided?

On Tue, 29 May 2018, 05:28 Stephan Hoyer, notifications@github.com wrote:

I started thinking about how to do this incrementally, and it occurs to me that a good place to start would be to write some of the utility functions we'll need for this:

  1. Normalizing and creating default indexes in the Dataset/DataArray constructor.
  2. Combining indexes from all xarray objects that are inputs for an operation into indexes for the outputs.
  3. Extracting MultiIndex objects from arguments into Dataset/DataArray and expanding them into multiple variables.

I drafted up docstrings for each of these functions and did a little bit of work starting to think through implementations in #2195 https://github.com/pydata/xarray/pull/2195. So this would be a great place for others to help out. Each of these could be separate PRs.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
392692996 https://github.com/pydata/xarray/issues/1603#issuecomment-392692996 https://api.github.com/repos/pydata/xarray/issues/1603 MDEyOklzc3VlQ29tbWVudDM5MjY5Mjk5Ng== alimanfoo 703554 2018-05-29T08:20:22Z 2018-05-29T08:20:22Z CONTRIBUTOR

I see this mentions an Index API, is that still to be decided?

On Tue, 29 May 2018, 05:28 Stephan Hoyer, notifications@github.com wrote:

I started thinking about how to do this incrementally, and it occurs to me that a good place to start would be to write some of the utility functions we'll need for this:

  1. Normalizing and creating default indexes in the Dataset/DataArray constructor.
  2. Combining indexes from all xarray objects that are inputs for an operation into indexes for the outputs.
  3. Extracting MultiIndex objects from arguments into Dataset/DataArray and expanding them into multiple variables.

I drafted up docstrings for each of these functions and did a little bit of work starting to think through implementations in #2195 https://github.com/pydata/xarray/pull/2195. So this would be a great place for others to help out. Each of these could be separate PRs.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
371626776 https://github.com/pydata/xarray/issues/1974#issuecomment-371626776 https://api.github.com/repos/pydata/xarray/issues/1974 MDEyOklzc3VlQ29tbWVudDM3MTYyNjc3Ng== alimanfoo 703554 2018-03-08T21:15:04Z 2018-03-08T21:15:04Z CONTRIBUTOR

It worked! Thanks again, pangeo.pydata.org is super cool.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray/zarr cloud demo 303270676
371603679 https://github.com/pydata/xarray/issues/1974#issuecomment-371603679 https://api.github.com/repos/pydata/xarray/issues/1974 MDEyOklzc3VlQ29tbWVudDM3MTYwMzY3OQ== alimanfoo 703554 2018-03-08T19:52:01Z 2018-03-08T19:52:01Z CONTRIBUTOR

I have it running! Will try to start the talk with it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray/zarr cloud demo 303270676
371561259 https://github.com/pydata/xarray/issues/1974#issuecomment-371561259 https://api.github.com/repos/pydata/xarray/issues/1974 MDEyOklzc3VlQ29tbWVudDM3MTU2MTI1OQ== alimanfoo 703554 2018-03-08T17:30:21Z 2018-03-08T17:30:21Z CONTRIBUTOR

Actually just realising @rabernat and @mrocklin you guys already demoed all of this to ESIP back in January (really nice talk btw). So maybe I don't need to repeat.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray/zarr cloud demo 303270676
371558334 https://github.com/pydata/xarray/issues/1974#issuecomment-371558334 https://api.github.com/repos/pydata/xarray/issues/1974 MDEyOklzc3VlQ29tbWVudDM3MTU1ODMzNA== alimanfoo 703554 2018-03-08T17:21:08Z 2018-03-08T17:21:08Z CONTRIBUTOR

Thanks @mrocklin.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray/zarr cloud demo 303270676
371544386 https://github.com/pydata/xarray/issues/1974#issuecomment-371544386 https://api.github.com/repos/pydata/xarray/issues/1974 MDEyOklzc3VlQ29tbWVudDM3MTU0NDM4Ng== alimanfoo 703554 2018-03-08T16:38:48Z 2018-03-08T16:38:48Z CONTRIBUTOR

Ha, Murphy's law. Shame because the combination of jupyterlab interface, launching a kubernetes cluster, and being able to click through to the Dask dashboard looks futuristic cool :-) I was really looking forward to seeing all my jobs spinning through the Dask dashboard as they work. I actually have a pretty packed talk already so don't absolutely need to include this, but if it does come back in time I'll slot it in. Talk starts 8pm GMT so still a few hours yet...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray/zarr cloud demo 303270676
371538819 https://github.com/pydata/xarray/issues/1974#issuecomment-371538819 https://api.github.com/repos/pydata/xarray/issues/1974 MDEyOklzc3VlQ29tbWVudDM3MTUzODgxOQ== alimanfoo 703554 2018-03-08T16:22:16Z 2018-03-08T16:22:16Z CONTRIBUTOR

Just tried to run the xarray-data notebook from within pangeo.pydata.org jupyterlab. When I run this command:

gcsmap = gcsfs.mapping.GCSMap('pangeo-data/newman-met-ensemble')

...it hangs there indefinitely. If I keyboard interrupt it bottoms out here:

/opt/conda/lib/python3.6/site-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
     71     if source_address:
     72         sock.bind(source_address)
---> 73     sock.connect(sa)
     74     return sock
     75

...suggesting it is not able to make a connection. Am I doing something wrong?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray/zarr cloud demo 303270676
371299755 https://github.com/pydata/xarray/issues/1974#issuecomment-371299755 https://api.github.com/repos/pydata/xarray/issues/1974 MDEyOklzc3VlQ29tbWVudDM3MTI5OTc1NQ== alimanfoo 703554 2018-03-07T21:58:49Z 2018-03-07T21:58:49Z CONTRIBUTOR

Wonderful, thanks both!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray/zarr cloud demo 303270676
350375750 https://github.com/pydata/xarray/pull/1528#issuecomment-350375750 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM1MDM3NTc1MA== alimanfoo 703554 2017-12-08T21:24:45Z 2017-12-08T22:27:47Z CONTRIBUTOR

Just to confirm, if writes are aligned with chunk boundaries in the destination array then no locking is required.

Also if you're going to be moving large datasets into cloud storage and doing distributed computing then it may be worth investigating compressors and compressor options, as a good compression ratio may make a big difference where network bandwidth is the limiting factor. I would suggest using the Blosc compressor with cname='zstd'. I would also suggest using shuffle; the Blosc codec in the latest numcodecs has an AUTOSHUFFLE option, so byte shuffle is used for arrays with >1 byte item size and bit shuffle is used for arrays with 1 byte item size. I would also experiment with compression level (clevel) to see how speed balances against compression ratio. E.g., Blosc(cname='zstd', clevel=5, shuffle=Blosc.AUTOSHUFFLE) may be a good starting point. The default compressor, Blosc(cname='lz4', ...), is more optimised for fast local storage, so speed is very good but compression ratio is moderate; this may not be best for distributed computing.
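As a concrete sketch of that starting point (the array shape, chunks and dtype are arbitrary):

```python
import numpy as np
import zarr
from numcodecs import Blosc

# zstd via Blosc with automatic shuffle selection, clevel=5 as a starting
# point to compare against other compression levels.
compressor = Blosc(cname="zstd", clevel=5, shuffle=Blosc.AUTOSHUFFLE)
z = zarr.zeros((1000, 1000), chunks=(100, 100), dtype="i4",
               compressor=compressor)
z[:] = np.random.randint(0, 100, size=z.shape)
print(z.info)  # reports the storage ratio, useful when tuning clevel
```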

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
350379064 https://github.com/pydata/xarray/pull/1528#issuecomment-350379064 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM1MDM3OTA2NA== alimanfoo 703554 2017-12-08T21:40:40Z 2017-12-08T22:27:35Z CONTRIBUTOR

Some examples of compressor benchmarking here may be useful http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html

The specific conclusions probably won't apply to your data but some of the code and ideas may be useful. Since writing that article I added Zstd and LZ4 compressors in numcodecs so those may also be worth trying in addition to Blosc with various configurations. (Blosc breaks up each chunk into blocks which enables multithreaded compression/decompression but can also reduce compression ratio over the same compressor library used without Blosc. I.e., Blosc(cname='zstd', clevel=1) will behave differently from Zstd(level=1) even though the same underlying compression library (Zstandard) is being used.)
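A minimal sketch of the two configurations being contrasted (the data here is synthetic, so the exact sizes are only indicative):

```python
from numcodecs import Blosc, Zstd

# The same underlying library (Zstandard) configured two ways: Blosc splits
# each chunk into blocks (enabling multithreading, sometimes at some cost
# in ratio), while Zstd compresses the chunk as a single buffer.
blosc_zstd = Blosc(cname="zstd", clevel=1)
raw_zstd = Zstd(level=1)

data = bytes(range(256)) * 4096
print(len(blosc_zstd.encode(data)), len(raw_zstd.encode(data)))
```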

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
348839453 https://github.com/pydata/xarray/pull/1528#issuecomment-348839453 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0ODgzOTQ1Mw== alimanfoo 703554 2017-12-04T01:40:57Z 2017-12-04T01:40:57Z CONTRIBUTOR

I know you're not including string support in this PR, but for interest, there are a couple of changes coming into zarr via https://github.com/alimanfoo/zarr/pull/212 that may be relevant in future.

It should now be impossible to generate a segfault via a badly configured object array. It is also now much harder to badly configure an object array. When creating an object array, an object codec should be provided via the object_codec parameter. There are now three codecs in numcodecs that can be used for variable length text strings: MsgPack, Pickle and JSON (new). Examples notebook here. In that notebook I also ran some simple benchmarks and MsgPack comes out well, but JSON isn't too shabby either.
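For illustration, a sketch of the object_codec parameter described above (assuming zarr >= 2.2 with numcodecs; the values are made up):

```python
import numpy as np
import zarr
from numcodecs import MsgPack

# Variable-length strings stored with an explicit object codec.
texts = np.array(["foo", "bar", "baz quux"], dtype=object)
z = zarr.array(texts, dtype=object, object_codec=MsgPack())
print(z[:])  # ['foo' 'bar' 'baz quux']
```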

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
348183062 https://github.com/pydata/xarray/pull/1087#issuecomment-348183062 https://api.github.com/repos/pydata/xarray/issues/1087 MDEyOklzc3VlQ29tbWVudDM0ODE4MzA2Mg== alimanfoo 703554 2017-11-30T13:07:53Z 2017-11-30T13:07:53Z CONTRIBUTOR

FWIW for the filters, if it would be possible to use the numcodecs Codec API http://numcodecs.readthedocs.io/en/latest/abc.html then that could be beneficial beyond xarray, as any work you put into developing filters could then be used elsewhere (e.g., in zarr).

On Thu, Nov 30, 2017 at 12:05 PM, Stephan Hoyer notifications@github.com wrote:

OK, I'm going to try to reboot this and finish it up in the form of an API that we'll be happy with going forward. I just discovered two more xarray backends over the past two days (in Unidata's Siphon and something @alexamici https://github.com/alexamici and colleagues are writing to read GRIB files), so clearly the demand is here.

One additional change I'd like to make is try to rewrite the encoding/decoding functions for variables into a series of invertible coding filters that can potentially be chained together in a flexible way (this is somewhat inspired by zarr). This will allow different backends to mix/match filters as necessary, depending on their particular needs. I'll start on that in another PR.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: New DataStore / Encoder / Decoder API for review 187625917
347385269 https://github.com/pydata/xarray/pull/1528#issuecomment-347385269 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0NzM4NTI2OQ== alimanfoo 703554 2017-11-28T01:36:29Z 2017-11-28T01:49:24Z CONTRIBUTOR

FWIW I think the best option at the moment is to make sure you add either a Pickle or MsgPack filter for any zarr array with an object dtype.

BTW I was thinking that zarr should automatically add one of these filters any time someone creates an array with an object dtype, to avoid them hitting the pointer issue. If you have any thoughts on the best solution, drop them here: https://github.com/alimanfoo/zarr/issues/208

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
347381734 https://github.com/pydata/xarray/pull/1528#issuecomment-347381734 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0NzM4MTczNA== alimanfoo 703554 2017-11-28T01:16:07Z 2017-11-28T01:16:07Z CONTRIBUTOR

When still in the original interpreter session, all the objects still exist in memory, so all the pointers stored in the array are still valid. Restart the session and the objects are gone and the pointers are invalid.

On Tue, Nov 28, 2017 at 1:14 AM, Alistair Miles alimanfoo@googlemail.com wrote:

Try exiting and restarting the interpreter, then running:

zgs = zarr.open_group(store='zarr_directory')
zgs.x[:]

On Tue, Nov 28, 2017 at 1:10 AM, Ryan Abernathey notifications@github.com wrote:

zarr needs a filter that can encode and pack the strings into a single buffer, except in the special case where the data are being stored in-memory

@alimanfoo https://github.com/alimanfoo: the following also seems to work with a directory store

values = np.array([b'ab', b'cdef', np.nan], dtype=object)
zgs = zarr.open_group(store='zarr_directory')
zgs.create('x', shape=values.shape, dtype=values.dtype)
zgs.x[:] = values

This seems to contradict your statement above. What am I missing?


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
347381500 https://github.com/pydata/xarray/pull/1528#issuecomment-347381500 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0NzM4MTUwMA== alimanfoo 703554 2017-11-28T01:14:42Z 2017-11-28T01:14:42Z CONTRIBUTOR

Try exiting and restarting the interpreter, then running:

zgs = zarr.open_group(store='zarr_directory')
zgs.x[:]

On Tue, Nov 28, 2017 at 1:10 AM, Ryan Abernathey notifications@github.com wrote:

zarr needs a filter that can encode and pack the strings into a single buffer, except in the special case where the data are being stored in-memory

@alimanfoo https://github.com/alimanfoo: the following also seems to work with a directory store

values = np.array([b'ab', b'cdef', np.nan], dtype=object)
zgs = zarr.open_group(store='zarr_directory')
zgs.create('x', shape=values.shape, dtype=values.dtype)
zgs.x[:] = values

This seems to contradict your statement above. What am I missing?


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
347363503 https://github.com/pydata/xarray/pull/1528#issuecomment-347363503 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0NzM2MzUwMw== alimanfoo 703554 2017-11-27T23:27:41Z 2017-11-27T23:27:41Z CONTRIBUTOR

For variable length strings (or any array with an object dtype) zarr needs a filter that can encode and pack the strings into a single buffer, except in the special case where the data are being stored in-memory (as in your first example). The filter has to be specified manually, some examples here: http://zarr.readthedocs.io/en/master/tutorial.html#string-arrays. There are two codecs currently in numcodecs that can do this, one is Pickle, the other is MsgPack. I haven't done any benchmarking of data size or encoding speed, but MsgPack may be preferable because it's more portable.

There was some discussion a while back about creating a codec that handles variable-length strings by encoding via UTF8 then concatenating encoded bytes and lengths or offsets, IIRC similar to Arrow, and maybe even creating a special "text" dtype that inserts this filter automatically so you don't have to add it manually. But there hasn't been a strong motivation so far.
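To illustrate what such a filter does, a minimal sketch using the MsgPack codec directly (values made up):

```python
import numpy as np
from numcodecs import MsgPack

# Pack an object array of strings into a single encoded buffer, then
# recover it -- the encode/pack step described above.
codec = MsgPack()
values = np.array(["ab", "cdef", "ghij"], dtype=object)
buf = codec.encode(values)  # one contiguous buffer
print(codec.decode(buf))    # ['ab' 'cdef' 'ghij']
```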

On Mon, Nov 27, 2017 at 10:32 PM, Stephan Hoyer notifications@github.com wrote:

Overall, I find the conventions module to be a bit unwieldy. There is a lot of stuff in there, not all of which is related to CF conventions. It would be useful to separate the actual conventions from the encoding / decoding needed for different backends.

Agreed!

I wonder why zarr doesn't have a UTF-8 variable length string type ( alimanfoo/zarr#206 https://github.com/alimanfoo/zarr/issues/206) -- that would feel like the obvious first choice for encoding this data.

That said, xarray should be able to use fixed-length bytes just fine, doing UTF-8 encoding/decoding on the fly.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
345619509 https://github.com/pydata/xarray/pull/1528#issuecomment-345619509 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0NTYxOTUwOQ== alimanfoo 703554 2017-11-20T08:07:44Z 2017-11-20T08:07:44Z CONTRIBUTOR

Fantastic!

On Monday, November 20, 2017, Matthew Rocklin notifications@github.com wrote:

That is, indeed, quite exciting. Also exciting is that I was able to look at and compute on your data easily.

In [1]: import zarr

In [2]: import gcsfs

In [3]: fs = gcsfs.GCSFileSystem(project='pangeo-181919')

In [4]: gcsmap = gcsfs.mapping.GCSMap('zarr_store_test', gcs=fs, check=True, create=False)

In [5]: import xarray as xr

In [6]: ds_gcs = xr.open_zarr(gcsmap, mode='r')

In [7]: ds_gcs
Out[7]:
<xarray.Dataset>
Dimensions:  (x: 200, y: 100)
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
Data variables:
    bar      (x) float64 dask.array<shape=(200,), chunksize=(40,)>
    foo      (y, x) float32 dask.array<shape=(100, 200), chunksize=(50, 40)>
Attributes:
    array_atr:  [1, 2]
    some_attr:  copana

In [8]: ds_gcs.sum()
Out[8]:
<xarray.Dataset>
Dimensions:  ()
Data variables:
    bar      float64 dask.array<shape=(), chunksize=()>
    foo      float32 dask.array<shape=(), chunksize=()>

In [9]: ds_gcs.sum().compute()
Out[9]:
<xarray.Dataset>
Dimensions:  ()
Data variables:
    bar      float64 0.0
    foo      float32 20000.0


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
345080945 https://github.com/pydata/xarray/pull/1528#issuecomment-345080945 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0NTA4MDk0NQ== alimanfoo 703554 2017-11-16T22:18:04Z 2017-11-16T22:18:04Z CONTRIBUTOR

Re different zarr storage backends, the main options are a plain dict, DirectoryStore, ZipStore, and a new DBMStore class just merged, which enables storage in any DBM-style database (e.g., Berkeley DB). ZipStore has some constraints because of how zip files work: you can't really replace an entry in a zip file, which means anything that writes the same array chunk more than once will generate warnings. Dask's S3Map should also work; I haven't tried it, and it's obviously not ideal for unit tests, but I'd be interested if you get any experience with it.

Re different combinations of zarr and dask chunks, it can be thread safe even if chunks are not aligned; you just need to pass a synchronizer when instantiating the array or group. Zarr has a ThreadSynchronizer class which can be used for thread-based parallelism. If a synchronizer is provided, it is used to lock each chunk individually during write operations. More info here.

Re fill values, zarr has a native concept of a fill value for each array, with the fill value stored as part of the array metadata. Array metadata are stored as JSON, and I recently merged a fix so that bytes fill values can be used (via base64 encoding). I believe the netcdf way is to store the fill value separately as the value of a "_FillValue" attribute? You could do this with zarr, but user attributes are also JSON, so you would need to do your own encoding/decoding. But if possible I'd suggest using the native zarr fill_value support, as it handles bytes fill value encoding and also checks to ensure fill values are valid wrt the array dtype.
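Pulling those three points together, a hedged sketch (the store path, shapes and fill value are illustrative):

```python
import zarr

# A directory store, a thread synchronizer so unaligned writes lock each
# chunk individually, and a native fill_value kept in the array metadata.
store = zarr.DirectoryStore("example_sync.zarr")
z = zarr.open_array(store, mode="w", shape=(100, 100), chunks=(10, 10),
                    dtype="f8", fill_value=-9999.0,
                    synchronizer=zarr.ThreadSynchronizer())
print(z.fill_value)  # -9999.0
```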

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
339897936 https://github.com/pydata/xarray/pull/1528#issuecomment-339897936 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMzOTg5NzkzNg== alimanfoo 703554 2017-10-27T07:42:34Z 2017-10-27T07:42:34Z CONTRIBUTOR

Suggest testing against GitHub master; there are a few other issues I'd like to work through before the next release.

On Thu, 26 Oct 2017 at 23:07, Ryan Abernathey notifications@github.com wrote:

Fantastic! Are you planning a release any time soon? If not we can set up to test against the github master.


On Oct 26, 2017, at 5:04 PM, Alistair Miles notifications@github.com wrote:

Just to say, support for 0d arrays, and for arrays with one or more zero-length dimensions, is in zarr master.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
339800443 https://github.com/pydata/xarray/pull/1528#issuecomment-339800443 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMzOTgwMDQ0Mw== alimanfoo 703554 2017-10-26T21:04:17Z 2017-10-26T21:04:17Z CONTRIBUTOR

Just to say, support for 0d arrays, and for arrays with one or more zero-length dimensions, is in zarr master.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
338786761 https://github.com/pydata/xarray/issues/1650#issuecomment-338786761 https://api.github.com/repos/pydata/xarray/issues/1650 MDEyOklzc3VlQ29tbWVudDMzODc4Njc2MQ== alimanfoo 703554 2017-10-23T20:29:41Z 2017-10-23T20:29:41Z CONTRIBUTOR

Index API sounds good.

Also I was just looking at dask.dataframe indexing, where .loc is implemented using information about index values at the boundaries of each partition (chunk). Not sure whether xarray should use the same strategy for chunked datasets, but it is another approach to avoid loading indexes into memory.
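A hypothetical sketch of that strategy (the division values are made up; this is not dask or xarray API):

```python
import bisect

# Keep only the index values at partition boundaries in memory, then use
# binary search to decide which chunk to load for a .loc-style lookup.
divisions = [0, 1_000_000, 2_000_000, 3_000_000]  # one value per chunk edge

def partition_for(value):
    # Index of the chunk whose [divisions[i], divisions[i + 1]) range
    # contains value.
    return bisect.bisect_right(divisions, value) - 1

print(partition_for(1_500_000))  # 1 -> only the second chunk is loaded
```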

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Low memory/out-of-core index? 267628781
338687376 https://github.com/pydata/xarray/issues/1650#issuecomment-338687376 https://api.github.com/repos/pydata/xarray/issues/1650 MDEyOklzc3VlQ29tbWVudDMzODY4NzM3Ng== alimanfoo 703554 2017-10-23T14:58:59Z 2017-10-23T14:58:59Z CONTRIBUTOR

It looks like #1017 is about having no index at all. I want indexes, but I want to avoid loading all coordinate values into memory.

On Mon, Oct 23, 2017 at 1:47 PM, Fabien Maussion notifications@github.com wrote:

Has anyone considered implementing an index for monotonic data that does not require loading all values into main memory?

But this is already the case? #1017 https://github.com/pydata/xarray/pull/1017

With on file datasets I think it is sufficient to drop_variables when opening the dataset in order not to parse the coordinates:

ds = xr.open_dataset(f, drop_variables=['lon', 'lat'])


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Low memory/out-of-core index? 267628781
338627454 https://github.com/pydata/xarray/issues/1650#issuecomment-338627454 https://api.github.com/repos/pydata/xarray/issues/1650 MDEyOklzc3VlQ29tbWVudDMzODYyNzQ1NA== alimanfoo 703554 2017-10-23T11:19:30Z 2017-10-23T11:19:30Z CONTRIBUTOR

Just to add a further thought, which is that the upper levels of the binary search tree could be cached to get faster performance for repeated searches.
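For illustration, a hypothetical sketch of that caching idea (block size and data are made up):

```python
import bisect
import numpy as np

# Cache a coarse in-memory sample of a large sorted on-disk coordinate
# (the "upper levels" of the search tree); each query then touches only
# one block of the full array.
coord = np.arange(0, 10_000_000, 3)  # stands in for an on-disk sorted array
block = 100_000
sample = coord[::block]              # the cached upper levels

def searchsorted_out_of_core(value):
    i = max(bisect.bisect_right(sample, value) - 1, 0)  # cheap, in-memory
    lo, hi = i * block, min((i + 1) * block, len(coord))
    return lo + np.searchsorted(coord[lo:hi], value)    # one block read

print(searchsorted_out_of_core(12_345))  # 4115
```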

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Low memory/out-of-core index? 267628781
338622746 https://github.com/pydata/xarray/issues/1603#issuecomment-338622746 https://api.github.com/repos/pydata/xarray/issues/1603 MDEyOklzc3VlQ29tbWVudDMzODYyMjc0Ng== alimanfoo 703554 2017-10-23T10:56:40Z 2017-10-23T10:56:40Z CONTRIBUTOR

Just to say I'm interested in how MultiIndexes are handled also. In our use case, we have two variables conventionally named CHROM (chromosome) and POS (position) which together describe a location in a genome. I want to combine both variables into a multi-index so I can, e.g., select all data from some data variable for chromosome X between positions 100,000-200,000. For all our data variables, this genome location multi-index would be used to index the first dimension.
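For illustration, a sketch of that use case assuming xarray's MultiIndex support via set_index (variable names follow the comment; the data is made up):

```python
import numpy as np
import xarray as xr

# CHROM + POS combined into a MultiIndex on the first dimension.
ds = xr.Dataset(
    {"DP": ("variants", np.array([10, 20, 30, 40]))},
    coords={
        "CHROM": ("variants", np.array(["X", "X", "X", "2R"])),
        "POS": ("variants", np.array([50_000, 150_000, 180_000, 120_000])),
    },
)
ds = ds.set_index(variants=["CHROM", "POS"])

# Scalar selection on the CHROM level, then a window on POS.
chrom_x = ds.sel(CHROM="X")
region = chrom_x.where(
    (chrom_x.POS >= 100_000) & (chrom_x.POS <= 200_000), drop=True
)
print(region["DP"].values)  # positions 150,000 and 180,000 -> DP 20, 30
```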

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
338459385 https://github.com/pydata/xarray/issues/66#issuecomment-338459385 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDMzODQ1OTM4NQ== alimanfoo 703554 2017-10-22T08:02:29Z 2017-10-22T08:02:29Z CONTRIBUTOR

Just to say thanks for the work on this, I've been looking at the h5netcdf code recently to understand better how dimensions are plumbed in netcdf4. I'm exploring refactoring all my data model classes in scikit-allel to build on xarray; I think the time is right, especially if xarray gets a Zarr backend too.

On Sun, 22 Oct 2017 at 02:01, Stephan Hoyer notifications@github.com wrote:

Closed #66 https://github.com/pydata/xarray/issues/66.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
335186616 https://github.com/pydata/xarray/pull/1528#issuecomment-335186616 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMzNTE4NjYxNg== alimanfoo 703554 2017-10-09T15:07:29Z 2017-10-09T17:23:21Z CONTRIBUTOR

I'm on paternity leave for the next 2 weeks, then will be catching up for a couple of weeks I expect. May be able to merge straightforward PRs but will have limited bandwidth.

{
    "total_count": 3,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 3,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
335030993 https://github.com/pydata/xarray/pull/1528#issuecomment-335030993 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMzNTAzMDk5Mw== alimanfoo 703554 2017-10-08T19:17:27Z 2017-10-08T23:37:47Z CONTRIBUTOR

FWIW I think some JSON encoders for attributes would ultimately be a useful addition to zarr, but I won't be able to put any effort into zarr in the next month, so workarounds in xarray sound like a good idea for now.
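
A sketch of the kind of workaround meant here (the helper name is hypothetical): coerce numpy values to JSON-safe Python types before writing them into zarr attributes, since zarr serialises attributes as JSON:

```python
import numpy as np

def json_safe(value):
    """Recursively convert numpy values to plain Python equivalents."""
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, np.generic):   # numpy scalars, e.g. np.int64
        return value.item()
    if isinstance(value, (list, tuple)):
        return [json_safe(v) for v in value]
    if isinstance(value, dict):
        return {k: json_safe(v) for k, v in value.items()}
    return value
```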

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
325813339 https://github.com/pydata/xarray/pull/1528#issuecomment-325813339 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMyNTgxMzMzOQ== alimanfoo 703554 2017-08-29T21:43:48Z 2017-08-29T21:43:48Z CONTRIBUTOR

On Tuesday, August 29, 2017, Ryan Abernathey notifications@github.com wrote:

@alimanfoo https://github.com/alimanfoo: when do you anticipate the 2.2 zarr release to happen? Will the API change significantly? If so, I will wait for that to move forward here.

Zarr 2.2 will hopefully happen some time in the next 2 months, but it will be fully backwards-compatible, no breaking API changes.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
325729013 https://github.com/pydata/xarray/pull/1528#issuecomment-325729013 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMyNTcyOTAxMw== alimanfoo 703554 2017-08-29T17:02:41Z 2017-08-29T17:02:41Z CONTRIBUTOR

FWIW all filter (codec) classes have been migrated from zarr to a separate package called numcodecs and will be imported from there in the next (2.2) zarr release. Here is FixedScaleOffset. The implementation is basic numpy, so there is probably some room for optimization.
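
For illustration, here is how FixedScaleOffset can be used from numcodecs (the parameter values are arbitrary): floats near 1000 are stored as single bytes with two decimal places of precision:

```python
import numpy as np
from numcodecs import FixedScaleOffset

codec = FixedScaleOffset(offset=1000, scale=100, dtype='f8', astype='u1')
x = np.linspace(1000, 1001, 11)
enc = codec.encode(x)    # round((x - 1000) * 100) stored as uint8
dec = codec.decode(enc)  # back to float64, quantised to 0.01
```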

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
325727280 https://github.com/pydata/xarray/pull/1528#issuecomment-325727280 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMyNTcyNzI4MA== alimanfoo 703554 2017-08-29T16:56:55Z 2017-08-29T16:56:55Z CONTRIBUTOR

Following this with interest.

Regarding autoclose, just to confirm that zarr doesn't really have any notion of whether something is open or closed. When using the DirectoryStore storage class (the most common use case, I imagine), all files are automatically closed and nothing is kept open. There are some storage classes (e.g., ZipStore) that do require an explicit close call to finalise the file on disk if you have been writing data, but I think you can ignore this in xarray and leave it up to the user to manage themselves.
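
A short illustration of that distinction, assuming the zarr 2.x storage API:

```python
import numpy as np
import zarr

# DirectoryStore: files are opened and closed per operation,
# so there is nothing to "close".
store = zarr.DirectoryStore('example.zarr')
root = zarr.group(store=store, overwrite=True)
root.create_dataset('x', data=np.arange(10), chunks=(5,))

# ZipStore: the zip file must be finalised with an explicit close
# (or by using the store as a context manager) after writing.
zstore = zarr.ZipStore('example.zip', mode='w')
root = zarr.group(store=zstore)
root.create_dataset('x', data=np.arange(10), chunks=(5,))
zstore.close()
```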

Out of interest, @shoyer do you still think there would be value in writing a wrapper for zarr analogous to h5netcdf? Or does this PR provide all the necessary functionality?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
282031922 https://github.com/pydata/xarray/issues/1223#issuecomment-282031922 https://api.github.com/repos/pydata/xarray/issues/1223 MDEyOklzc3VlQ29tbWVudDI4MjAzMTkyMg== alimanfoo 703554 2017-02-23T15:55:38Z 2017-02-23T15:55:38Z CONTRIBUTOR

FWIW I think it would be better in xarray or a separate package, at least at the moment, just because I don't have a lot of time right now for OSS and need to keep Zarr as lean as possible.

On Thursday, February 23, 2017, Martin Durant notifications@github.com wrote:

@alimanfoo https://github.com/alimanfoo , do you think this work would make more sense as part of zarr rather than as part of xarray?


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr as persistent store for xarray 202260275
281829618 https://github.com/pydata/xarray/issues/1223#issuecomment-281829618 https://api.github.com/repos/pydata/xarray/issues/1223 MDEyOklzc3VlQ29tbWVudDI4MTgyOTYxOA== alimanfoo 703554 2017-02-22T22:43:52Z 2017-02-22T22:43:52Z CONTRIBUTOR

Yep, that looks good. I was wondering about the xarray_to_zarr() function?

On Wednesday, February 22, 2017, Martin Durant notifications@github.com wrote:

@alimanfoo https://github.com/alimanfoo , in the new dataset save function, I do exactly [as you suggest](https://gist.github.com/martindurant/06a1e98c91f0033c4649a48a2f943390#file-zarr_xarr-py-L168), with everything getting put as a dict into the main zarr group attributes, with special attribute names "attrs" for the data-set root, "coords" for the set of coordinate objects and "variables" for the set of variables objects (all of these have their own attributes in xarray).


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr as persistent store for xarray 202260275
281496902 https://github.com/pydata/xarray/issues/1223#issuecomment-281496902 https://api.github.com/repos/pydata/xarray/issues/1223 MDEyOklzc3VlQ29tbWVudDI4MTQ5NjkwMg== alimanfoo 703554 2017-02-21T22:05:39Z 2017-02-21T22:05:39Z CONTRIBUTOR

Just to say this is looking neat.

For storing an xarray.DataArray, do you think it would be possible to do away with pickling all the metadata and storing it in the .xarray resource? Specifically, I'm wondering whether it could all be stored as attributes on the Zarr array, with some conventions for special xarray attribute names. I'm guessing there must be conventions for storing all this metadata as attributes in an HDF5 (netCDF) file; it would potentially be nice to mirror those as much as possible.
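
A minimal sketch of the suggestion (the dimension attribute shown, _ARRAY_DIMENSIONS, is the name xarray's zarr backend eventually standardised on):

```python
import numpy as np
import zarr

root = zarr.group()  # in-memory group, purely for illustration
temp = root.create_dataset('temperature', data=np.zeros((4, 3)))

# Record dimension names and other metadata as plain zarr attributes,
# rather than pickling them into a separate resource.
temp.attrs['_ARRAY_DIMENSIONS'] = ['time', 'space']
temp.attrs['units'] = 'K'
```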

On Sat, Feb 11, 2017 at 10:56 PM, Martin Durant notifications@github.com wrote:

I have developed my example a little to sidestep the subclassing you suggest, which seemed tricky to implement.

Please see https://gist.github.com/martindurant/06a1e98c91f0033c4649a48a2f943390 (dataset_to/from_zarr functions)

I can use the zarr groups structure to mirror at least typical use of xarrays: variables, coordinates and sets of attributes on each. I have tested this with s3 too, stealing a little code from dask to show the idea.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr as persistent store for xarray 202260275
274214755 https://github.com/pydata/xarray/issues/1223#issuecomment-274214755 https://api.github.com/repos/pydata/xarray/issues/1223 MDEyOklzc3VlQ29tbWVudDI3NDIxNDc1NQ== alimanfoo 703554 2017-01-21T00:24:27Z 2017-01-21T00:24:27Z CONTRIBUTOR

Happy to help if there's anything to do on the zarr side.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr as persistent store for xarray 202260275
90813596 https://github.com/pydata/xarray/issues/66#issuecomment-90813596 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDkwODEzNTk2 alimanfoo 703554 2015-04-08T06:04:53Z 2015-04-08T06:04:53Z CONTRIBUTOR

Thanks Stephan, I'll take a look.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
43385302 https://github.com/pydata/xarray/pull/127#issuecomment-43385302 https://api.github.com/repos/pydata/xarray/issues/127 MDEyOklzc3VlQ29tbWVudDQzMzg1MzAy alimanfoo 703554 2014-05-16T22:16:01Z 2014-05-16T22:16:01Z CONTRIBUTOR

No worries, glad to contribute.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  initial implementation of support for NetCDF groups 33396232
43059199 https://github.com/pydata/xarray/pull/127#issuecomment-43059199 https://api.github.com/repos/pydata/xarray/issues/127 MDEyOklzc3VlQ29tbWVudDQzMDU5MTk5 alimanfoo 703554 2014-05-14T09:20:01Z 2014-05-14T09:20:01Z CONTRIBUTOR

I've added a test to check for an error when a group is not found. I also changed the implementation of the group access function to avoid recursion; it seemed simpler.
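
A sketch of the non-recursive lookup, assuming the netCDF4-python API (the function name is hypothetical): walk the path one component at a time via the .groups mapping:

```python
def _nc4_group(ds, group):
    """Look up a nested group from a path such as '/foo/bar'."""
    if group in (None, '', '/'):
        return ds
    current = ds
    for part in group.strip('/').split('/'):
        # netCDF4 Dataset/Group objects expose subgroups via .groups
        try:
            current = current.groups[part]
        except KeyError:
            raise IOError('group not found: %s' % group)
    return current
```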

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  initial implementation of support for NetCDF groups 33396232
43024743 https://github.com/pydata/xarray/pull/127#issuecomment-43024743 https://api.github.com/repos/pydata/xarray/issues/127 MDEyOklzc3VlQ29tbWVudDQzMDI0NzQz alimanfoo 703554 2014-05-13T23:11:07Z 2014-05-13T23:11:07Z CONTRIBUTOR

Thanks for the comments, all makes good sense.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  initial implementation of support for NetCDF groups 33396232
42869488 https://github.com/pydata/xarray/issues/66#issuecomment-42869488 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDQyODY5NDg4 alimanfoo 703554 2014-05-12T18:29:57Z 2014-05-12T18:29:57Z CONTRIBUTOR

One other detail: I have an HDF5 group for each conceptual dataset, but then variables may be organised into subgroups. It would be nice if this could be accommodated, e.g., when opening an HDF5 group as an xray dataset, assume the dataset contains all variables in the group, with any subgroups searched recursively. Again, apologies, I don't know whether this is allowed in NetCDF4; I will do the research.
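
A sketch of that recursive search using h5py's visititems, which walks a group and all its subgroups (the function name is hypothetical):

```python
import h5py

def collect_variables(group):
    """Map variable name -> dataset for everything under `group`,
    including datasets nested in subgroups."""
    variables = {}

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            variables[name] = obj   # name includes any subgroup path

    group.visititems(visit)
    return variables
```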

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
42840763 https://github.com/pydata/xarray/issues/66#issuecomment-42840763 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDQyODQwNzYz alimanfoo 703554 2014-05-12T14:45:57Z 2014-05-12T14:45:57Z CONTRIBUTOR

Thanks @akleeman for the info, much appreciated.

A couple of other points I thought maybe worth mentioning if you're considering wrapping h5py.

First, I've been using lzf as the compression filter in my HDF5 files. I believe h5py bundles the source for lzf. I don't know whether lzf would be supported when accessing the files through the Python netCDF API.

Second, I have a situation where I have multiple datasets, each of which is stored in a separate group, and each of which has two dimensions (genome position and biological sample). The genome position scale is different for each dataset (there's one dataset per chromosome); however, the biological sample scale is actually common to all of the datasets. So at the moment I have a variable in the root group with the "samples" dimension scale, and each dataset group has its own "position" dimension scale. You can represent all this with HDF5 dimension scales, but I've no idea whether this is accommodated by NetCDF4 or could fit into the xray model. I could work around it by copying the samples variable into each dataset, but I thought I'd mention this pattern as something to be aware of.
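
A sketch of that layout using h5py's dimension scales API (make_scale requires a reasonably recent h5py; names and sizes are illustrative). The shared samples scale lives in the root group and is attached to every per-chromosome dataset, alongside each group's own position scale:

```python
import numpy as np
import h5py

with h5py.File('calls.h5', 'w') as f:
    samples = f.create_dataset('samples', data=np.array([b's1', b's2', b's3']))
    samples.make_scale('samples')

    for chrom, n_pos in [('2L', 100), ('X', 80)]:
        grp = f.create_group(chrom)
        pos = grp.create_dataset('position', data=np.arange(n_pos))
        pos.make_scale('position')

        gt = grp.create_dataset('genotype', shape=(n_pos, 3), dtype='i1')
        gt.dims[0].attach_scale(pos)
        gt.dims[1].attach_scale(samples)  # shared across all groups
```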

On Mon, May 12, 2014 at 3:04 PM, akleeman notifications@github.com wrote:

@alimanfoo https://github.com/alimanfoo

Glad you're enjoying xray!

From your description it sounds like it should be relatively simple for you to get xray working with your dataset. NetCDF4 is a subset of HDF5, and simply adding dimension scales should get you most of the way there.

Re: groups, each xray.Dataset corresponds to one HDF5 group. So while xray doesn't currently support groups, you could split your HDF5 dataset into separate files for each group and load those files using xray. Alternatively (if you feel ambitious) it shouldn't be too hard to get xray's NetCDF4DataStore (backends.netCDF4_.py) to work with groups, allowing you to do something like:

dataset = xray.open_dataset('multiple_groups.h5', group='/one_group')

This http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4-module.html gives some good examples of how groups work within netCDF4.

Also, as @shoyer https://github.com/shoyer mentioned, it might make sense to modify xray so that NetCDF4 support is obtained by wrapping h5py instead of netCDF4 which might make your life even easier.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809
42805550 https://github.com/pydata/xarray/issues/66#issuecomment-42805550 https://api.github.com/repos/pydata/xarray/issues/66 MDEyOklzc3VlQ29tbWVudDQyODA1NTUw alimanfoo 703554 2014-05-12T08:08:37Z 2014-05-12T08:08:37Z CONTRIBUTOR

I'm really enjoying working with xray; it's so nice to be able to think of my axes as named and labeled dimensions, no more remembering which axis is which!

I'm not sure if this is relevant to this specific issue, but I am working for the most part with HDF5 files created using h5py. I'm only just learning about NetCDF-4, but I have datasets that comprise a number of 1D and 2D variables with shared dimensions, so I think my data is already very close to the right model. I have a couple of questions:

(1) If I have multiple datasets within an HDF5 file, each within a separate group, can I access those through xray?

(2) What would I need to add to my HDF5 to make it fully compliant with the xray/NetCDF4 model? Is it just a question of creating and attaching dimension scales or would I need to do something else as well?

Thanks in advance.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HDF5 backend for xray 29453809

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);