issue_comments
66 rows where author_association = "CONTRIBUTOR" and user = 703554 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1544199022 | https://github.com/pydata/xarray/issues/7833#issuecomment-1544199022 | https://api.github.com/repos/pydata/xarray/issues/7833 | IC_kwDOAMm_X85cCptu | alimanfoo 703554 | 2023-05-11T15:26:52Z | 2023-05-11T15:26:52Z | CONTRIBUTOR | Awesome, thanks @kmuehlbauer and @Illviljan 🙏🏻 |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Slow performance of concat() 1704950804 | |
1190061811 | https://github.com/pydata/xarray/issues/3564#issuecomment-1190061811 | https://api.github.com/repos/pydata/xarray/issues/3564 | IC_kwDOAMm_X85G7ubz | alimanfoo 703554 | 2022-07-20T09:44:40Z | 2022-07-20T09:44:40Z | CONTRIBUTOR | Hi folks, Just to mention that we've created a short tutorial on xarray which is meant as a gentle intro to folks coming from the malaria genetics field, who mostly have never heard of xarray before. We illustrate xarray first using outputs from a geostatistical model of how insecticide-treated bednets are used in Africa. We then give a couple of brief examples of how we use xarray for genomic data. There's video walkthroughs in French and English: https://anopheles-genomic-surveillance.github.io/workshop-5/module-1-xarray.html Please feel free to link to this in the xarray tutorial site if you'd like to :) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
DOC: from examples to tutorials 527323165 | |
1190057727 | https://github.com/pydata/xarray/issues/6771#issuecomment-1190057727 | https://api.github.com/repos/pydata/xarray/issues/6771 | IC_kwDOAMm_X85G7tb_ | alimanfoo 703554 | 2022-07-20T09:40:41Z | 2022-07-20T09:41:07Z | CONTRIBUTOR | Hi @dcherian,
FWIW we've created a short tutorial on xarray which is meant as a gentle intro to folks coming from the malaria genetics field. We illustrate xarray first using outputs from a geostatistical model of how insecticide-treated bednets are used in Africa. We then give a couple of brief examples of how we use xarray for genomic data. There's video walkthroughs in French and English: https://anopheles-genomic-surveillance.github.io/workshop-5/module-1-xarray.html Please feel free to link to this in the xarray tutorial site if you'd like to :) |
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 } |
Explaining xarray in a single picture 1300534066 | |
1190052947 | https://github.com/pydata/xarray/issues/6771#issuecomment-1190052947 | https://api.github.com/repos/pydata/xarray/issues/6771 | IC_kwDOAMm_X85G7sRT | alimanfoo 703554 | 2022-07-20T09:36:10Z | 2022-07-20T09:36:10Z | CONTRIBUTOR | Hi @TomNicholas,
Interesting, I hadn't considered that. Definitely a bit mind-bending though for us non-geoscientists :)
SGTM. FWIW on the second diagram I would use "dimensions" instead of "indexes". Getting dimensions first then helps to explain how you can use a coordinate variable to index a dimension. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explaining xarray in a single picture 1300534066 | |
1054526670 | https://github.com/pydata/xarray/issues/324#issuecomment-1054526670 | https://api.github.com/repos/pydata/xarray/issues/324 | IC_kwDOAMm_X84-2szO | alimanfoo 703554 | 2022-02-28T18:10:02Z | 2022-02-28T18:10:02Z | CONTRIBUTOR | Still relevant, would like to be able to group by multiple variables along a single dimension. |
{ "total_count": 6, "+1": 6, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Support multi-dimensional grouped operations and group_over 58117200 | |
802732278 | https://github.com/pydata/xarray/issues/4663#issuecomment-802732278 | https://api.github.com/repos/pydata/xarray/issues/4663 | MDEyOklzc3VlQ29tbWVudDgwMjczMjI3OA== | alimanfoo 703554 | 2021-03-19T10:44:31Z | 2021-03-19T10:44:31Z | CONTRIBUTOR | Thanks @dcherian. Just to add that if we make progress with supporting indexing with dask arrays then at some point I think we'll hit a separate issue, which is that xarray will require that the chunk sizes of the indexed arrays are computed, but currently calling the dask array method `compute_chunk_sizes()` is inefficient. In case anyone needs a workaround for indexing a dataset with a 1d boolean dask array, I'm currently using this hacked implementation of a compress() style function that operates on an xarray dataset, which includes more efficient computation of chunk sizes. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Fancy indexing a Dataset with dask DataArray triggers multiple computes 759709924 | |
802101178 | https://github.com/pydata/xarray/issues/5054#issuecomment-802101178 | https://api.github.com/repos/pydata/xarray/issues/5054 | MDEyOklzc3VlQ29tbWVudDgwMjEwMTE3OA== | alimanfoo 703554 | 2021-03-18T16:45:51Z | 2021-03-18T16:58:44Z | CONTRIBUTOR | FWIW my use case actually only needs indexing a single dimension, i.e., something equivalent to the numpy (or dask.array) compress function. This can be hacked for xarray datasets in a fairly straightforward way:

```python
import numpy as np
import xarray as xr


def _compress_dataarray(a, indexer, dim):
    data = a.data
    try:
        axis = a.dims.index(dim)
    except ValueError:
        v = data
    else:
        # rely on __array_function__ to handle dispatching to dask if
        # data is a dask array
        v = np.compress(indexer, data, axis=axis)
    if hasattr(v, 'compute_chunk_sizes'):
        # needed to know dim lengths
        v.compute_chunk_sizes()
    return v


def compress_dataset(ds, indexer, dim):
    if isinstance(indexer, str):
        indexer = ds[indexer].data
    # (the remainder of this function was lost in extraction; a minimal
    # reconstruction: apply the compress to every variable in the dataset)
    variables = {
        name: (a.dims, _compress_dataarray(a, indexer, dim))
        for name, a in ds.variables.items()
    }
    return xr.Dataset(variables, attrs=ds.attrs)
```

Given the complexity of fancy indexing in general, I wonder if it's worth contemplating implementing a `compress()`-style method first. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Fancy indexing a Dataset with dask DataArray causes excessive memory usage 834972299 | |
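For context, a minimal usage sketch for the compress_dataset() helper reconstructed above. The dataset contents and variable names ("call", "keep", "variants") are illustrative, not from the original comment:

```python
import dask.array as da
import numpy as np
import xarray as xr

# illustrative dataset: one data variable and one boolean mask variable,
# both along a "variants" dimension, with the mask backed by dask
ds = xr.Dataset({"call": ("variants", np.arange(10))})
ds["keep"] = ("variants", da.from_array(np.arange(10) % 2 == 0, chunks=5))

# keep only the rows where the mask variable is True
subset = compress_dataset(ds, "keep", dim="variants")
print(subset.sizes["variants"])  # 5
```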
802096873 | https://github.com/pydata/xarray/issues/5054#issuecomment-802096873 | https://api.github.com/repos/pydata/xarray/issues/5054 | MDEyOklzc3VlQ29tbWVudDgwMjA5Njg3Mw== | alimanfoo 703554 | 2021-03-18T16:39:59Z | 2021-03-18T16:39:59Z | CONTRIBUTOR | Thanks @dcherian. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Fancy indexing a Dataset with dask DataArray causes excessive memory usage 834972299 | |
800504527 | https://github.com/pydata/xarray/pull/4984#issuecomment-800504527 | https://api.github.com/repos/pydata/xarray/issues/4984 | MDEyOklzc3VlQ29tbWVudDgwMDUwNDUyNw== | alimanfoo 703554 | 2021-03-16T18:28:09Z | 2021-03-16T18:28:09Z | CONTRIBUTOR | Yay, first xarray PR :partying_face: |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Adds Dataset.query() method, analogous to pandas DataFrame.query() 819911891 | |
800317378 | https://github.com/pydata/xarray/pull/4984#issuecomment-800317378 | https://api.github.com/repos/pydata/xarray/issues/4984 | MDEyOklzc3VlQ29tbWVudDgwMDMxNzM3OA== | alimanfoo 703554 | 2021-03-16T14:40:45Z | 2021-03-16T14:40:45Z | CONTRIBUTOR |
No problem, some DataArray tests are there.
Good to go from my side. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Adds Dataset.query() method, analogous to pandas DataFrame.query() 819911891 | |
800176868 | https://github.com/pydata/xarray/pull/4984#issuecomment-800176868 | https://api.github.com/repos/pydata/xarray/issues/4984 | MDEyOklzc3VlQ29tbWVudDgwMDE3Njg2OA== | alimanfoo 703554 | 2021-03-16T11:24:42Z | 2021-03-16T11:24:42Z | CONTRIBUTOR | Hi @max-sixty,
Sure, done.
Done.
Done. Let me know if there's anything else. Looking forward to using this :smile: |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Adds Dataset.query() method, analogous to pandas DataFrame.query() 819911891 | |
798993998 | https://github.com/pydata/xarray/pull/4984#issuecomment-798993998 | https://api.github.com/repos/pydata/xarray/issues/4984 | MDEyOklzc3VlQ29tbWVudDc5ODk5Mzk5OA== | alimanfoo 703554 | 2021-03-14T22:44:49Z | 2021-03-14T22:44:49Z | CONTRIBUTOR |
No worries, yes any number of dimensions can be queried. I've added tests showing three dimensions can be queried. As an aside, in writing these tests I came upon a probable upstream bug in pandas, reported as https://github.com/pandas-dev/pandas/issues/40436. I don't think this affects this PR though, and has low impact as only the "python" query parser is affected, and most people will use the default "pandas" query parser. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Adds Dataset.query() method, analogous to pandas DataFrame.query() 819911891 | |
797668635 | https://github.com/pydata/xarray/pull/4984#issuecomment-797668635 | https://api.github.com/repos/pydata/xarray/issues/4984 | MDEyOklzc3VlQ29tbWVudDc5NzY2ODYzNQ== | alimanfoo 703554 | 2021-03-12T18:16:15Z | 2021-03-12T18:16:15Z | CONTRIBUTOR | Just to mention I've added tests to verify this works with variables backed by dask arrays. Also added explicit tests of different eval engine and query parser options. And added a docstring. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Adds Dataset.query() method, analogous to pandas DataFrame.query() 819911891 | |
797636489 | https://github.com/pydata/xarray/pull/4984#issuecomment-797636489 | https://api.github.com/repos/pydata/xarray/issues/4984 | MDEyOklzc3VlQ29tbWVudDc5NzYzNjQ4OQ== | alimanfoo 703554 | 2021-03-12T17:21:29Z | 2021-03-12T17:21:29Z | CONTRIBUTOR | Hi @max-sixty, no problem. Re this...
...not quite sure what you mean, could you elaborate? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Adds Dataset.query() method, analogous to pandas DataFrame.query() 819911891 | |
788828644 | https://github.com/pydata/xarray/pull/4984#issuecomment-788828644 | https://api.github.com/repos/pydata/xarray/issues/4984 | MDEyOklzc3VlQ29tbWVudDc4ODgyODY0NA== | alimanfoo 703554 | 2021-03-02T11:10:20Z | 2021-03-02T11:10:20Z | CONTRIBUTOR | Hi folks, thought I'd put up a proof of concept PR here for further discussion. Any advice/suggestions about if/how to take this forward would be very welcome. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Adds Dataset.query() method, analogous to pandas DataFrame.query() 819911891 | |
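For context on the PR discussed in the rows above: a minimal sketch of what the Dataset.query() method does, mirroring the example style of the xarray documentation (variable names here are made up for illustration):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": ("x", np.arange(5)), "b": ("x", np.linspace(0, 1, 5))})

# select along dimension x, keeping only points where the expression holds;
# the expression string is evaluated against the dataset's variables
selected = ds.query(x="a > 2")
print(selected["a"].values)  # [3 4]
```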
631075010 | https://github.com/pydata/xarray/issues/4079#issuecomment-631075010 | https://api.github.com/repos/pydata/xarray/issues/4079 | MDEyOklzc3VlQ29tbWVudDYzMTA3NTAxMA== | alimanfoo 703554 | 2020-05-19T20:50:26Z | 2020-05-19T20:50:51Z | CONTRIBUTOR |
In this specific example, I do actually know where these dimension lengths come from, and in fact I should've used the shared dimension. But two points. First, I don't care about these dimensions; they aren't the ones I will actually use. Second, more important, this kind of data can come from a number of different sources, each of which includes a different set of arrays with different names and semantics. While there are some common arrays and naming conventions where I can guess what the dimensions mean, in general I can't know all of those up front and bake them in as special cases. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Unnamed dimensions 621078539 | |
631071623 | https://github.com/pydata/xarray/issues/4081#issuecomment-631071623 | https://api.github.com/repos/pydata/xarray/issues/4081 | MDEyOklzc3VlQ29tbWVudDYzMTA3MTYyMw== | alimanfoo 703554 | 2020-05-19T20:43:07Z | 2020-05-19T20:43:07Z | CONTRIBUTOR | Thanks @shoyer for raising this, would be nice to wrap the dimensions, I'd vote for one per line. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Wrap "Dimensions" onto multiple lines in xarray.Dataset repr? 621123222 | |
630924754 | https://github.com/pydata/xarray/issues/4079#issuecomment-630924754 | https://api.github.com/repos/pydata/xarray/issues/4079 | MDEyOklzc3VlQ29tbWVudDYzMDkyNDc1NA== | alimanfoo 703554 | 2020-05-19T16:14:27Z | 2020-05-19T16:14:27Z | CONTRIBUTOR | Thanks @shoyer. For reference, I'm exploring putting some genome variation data into xarray; here's an initial experiment and discussion. In general I will have some arrays where I won't know what some of the dimensions mean, and so cannot give them a meaningful name. No worries if this is hard, was just wondering if it was supported already. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Unnamed dimensions 621078539 | |
630913851 | https://github.com/pydata/xarray/issues/4079#issuecomment-630913851 | https://api.github.com/repos/pydata/xarray/issues/4079 | MDEyOklzc3VlQ29tbWVudDYzMDkxMzg1MQ== | alimanfoo 703554 | 2020-05-19T15:55:54Z | 2020-05-19T15:55:54Z | CONTRIBUTOR | Thanks so much @rabernat for quick response. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Unnamed dimensions 621078539 | |
605179227 | https://github.com/pydata/xarray/issues/3831#issuecomment-605179227 | https://api.github.com/repos/pydata/xarray/issues/3831 | MDEyOklzc3VlQ29tbWVudDYwNTE3OTIyNw== | alimanfoo 703554 | 2020-03-27T18:10:05Z | 2020-03-27T18:10:05Z | CONTRIBUTOR | Just to say having some kind of stack integration tests is a marvellous idea. Another example of an issue that's very hard to pin down is https://github.com/zarr-developers/zarr-python/issues/528. Btw we have also run into issues with fsspec caching directory listings and not invalidating the cache when store changes are made, although I haven't checked with latest master. We have a lot of workarounds in our code where we reopen everything after we've made changes to a store. Probably an area where some more digging and careful testing may be needed. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Errors using to_zarr for an s3 store 576337745 | |
554463832 | https://github.com/pydata/xarray/pull/3526#issuecomment-554463832 | https://api.github.com/repos/pydata/xarray/issues/3526 | MDEyOklzc3VlQ29tbWVudDU1NDQ2MzgzMg== | alimanfoo 703554 | 2019-11-15T17:57:42Z | 2019-11-15T17:57:42Z | CONTRIBUTOR | FWIW in the Zarr Python implementation I don't think we do any special encoding or decoding of attribute values. Whatever value is given then gets serialised using the built-in `json` module. From the zarr v2 spec point of view, I think anything goes in the attributes as long as the whole document is valid JSON. Hth. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Allow nested dictionaries in the Zarr backend (#3517) 522519084 | |
455374760 | https://github.com/pydata/xarray/issues/2586#issuecomment-455374760 | https://api.github.com/repos/pydata/xarray/issues/2586 | MDEyOklzc3VlQ29tbWVudDQ1NTM3NDc2MA== | alimanfoo 703554 | 2019-01-17T23:49:07Z | 2019-01-17T23:49:07Z | CONTRIBUTOR |
Some very limited support for this is there already, e.g., if the string ends with '.zip' then a zip store will be used, but there's no support for dispatching to cloud stores via a URL-like protocol. There's an open issue for that: https://github.com/zarr-developers/zarr/issues/214 |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Zarr loading from ZipStore gives error on default arguments 386515973 | |
444187219 | https://github.com/pydata/xarray/issues/1603#issuecomment-444187219 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0NDE4NzIxOQ== | alimanfoo 703554 | 2018-12-04T17:33:34Z | 2018-12-04T17:33:34Z | CONTRIBUTOR |
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442801741 | https://github.com/pydata/xarray/pull/2559#issuecomment-442801741 | https://api.github.com/repos/pydata/xarray/issues/2559 | MDEyOklzc3VlQ29tbWVudDQ0MjgwMTc0MQ== | alimanfoo 703554 | 2018-11-29T11:33:33Z | 2018-11-29T11:33:33Z | CONTRIBUTOR | Great to see this. On the API, FWIW I'd vote for using the same keyword (`consolidated`) for both reading and writing. |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Zarr consolidated 382497709 | |
392831984 | https://github.com/pydata/xarray/issues/1603#issuecomment-392831984 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM5MjgzMTk4NA== | alimanfoo 703554 | 2018-05-29T15:59:46Z | 2018-05-29T15:59:46Z | CONTRIBUTOR | Ok, cool. Was wondering if now was the right time to revisit that, alongside the work proposed in this PR. Happy to participate in that discussion, still interested in implementing some alternative index classes.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
392692996 | https://github.com/pydata/xarray/issues/1603#issuecomment-392692996 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM5MjY5Mjk5Ng== | alimanfoo 703554 | 2018-05-29T08:20:22Z | 2018-05-29T08:20:22Z | CONTRIBUTOR | I see this mentions an Index API; is that still to be decided?
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
371626776 | https://github.com/pydata/xarray/issues/1974#issuecomment-371626776 | https://api.github.com/repos/pydata/xarray/issues/1974 | MDEyOklzc3VlQ29tbWVudDM3MTYyNjc3Ng== | alimanfoo 703554 | 2018-03-08T21:15:04Z | 2018-03-08T21:15:04Z | CONTRIBUTOR | It worked! Thanks again, pangeo.pydata.org is super cool. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray/zarr cloud demo 303270676 | |
371603679 | https://github.com/pydata/xarray/issues/1974#issuecomment-371603679 | https://api.github.com/repos/pydata/xarray/issues/1974 | MDEyOklzc3VlQ29tbWVudDM3MTYwMzY3OQ== | alimanfoo 703554 | 2018-03-08T19:52:01Z | 2018-03-08T19:52:01Z | CONTRIBUTOR | I have it running! Will try to start the talk with it. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray/zarr cloud demo 303270676 | |
371561259 | https://github.com/pydata/xarray/issues/1974#issuecomment-371561259 | https://api.github.com/repos/pydata/xarray/issues/1974 | MDEyOklzc3VlQ29tbWVudDM3MTU2MTI1OQ== | alimanfoo 703554 | 2018-03-08T17:30:21Z | 2018-03-08T17:30:21Z | CONTRIBUTOR | Actually just realising @rabernat and @mrocklin you guys already demoed all of this to ESIP back in January (really nice talk btw). So maybe I don't need to repeat. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray/zarr cloud demo 303270676 | |
371558334 | https://github.com/pydata/xarray/issues/1974#issuecomment-371558334 | https://api.github.com/repos/pydata/xarray/issues/1974 | MDEyOklzc3VlQ29tbWVudDM3MTU1ODMzNA== | alimanfoo 703554 | 2018-03-08T17:21:08Z | 2018-03-08T17:21:08Z | CONTRIBUTOR | Thanks @mrocklin. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray/zarr cloud demo 303270676 | |
371544386 | https://github.com/pydata/xarray/issues/1974#issuecomment-371544386 | https://api.github.com/repos/pydata/xarray/issues/1974 | MDEyOklzc3VlQ29tbWVudDM3MTU0NDM4Ng== | alimanfoo 703554 | 2018-03-08T16:38:48Z | 2018-03-08T16:38:48Z | CONTRIBUTOR | Ha, Murphy's law. Shame because the combination of jupyterlab interface, launching a kubernetes cluster, and being able to click through to the Dask dashboard looks futuristic cool :-) I was really looking forward to seeing all my jobs spinning through the Dask dashboard as they work. I actually have a pretty packed talk already so don't absolutely need to include this, but if it does come back in time I'll slot it in. Talk starts 8pm GMT so still a few hours yet... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray/zarr cloud demo 303270676 | |
371538819 | https://github.com/pydata/xarray/issues/1974#issuecomment-371538819 | https://api.github.com/repos/pydata/xarray/issues/1974 | MDEyOklzc3VlQ29tbWVudDM3MTUzODgxOQ== | alimanfoo 703554 | 2018-03-08T16:22:16Z | 2018-03-08T16:22:16Z | CONTRIBUTOR | Just tried to run the xarray-data notebook from within pangeo.pydata.org jupyterlab, when I run this command:
...it hangs there indefinitely. If I keyboard interrupt it bottoms out here:
...suggesting it is not able to make a connection. Am I doing something wrong? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray/zarr cloud demo 303270676 | |
371299755 | https://github.com/pydata/xarray/issues/1974#issuecomment-371299755 | https://api.github.com/repos/pydata/xarray/issues/1974 | MDEyOklzc3VlQ29tbWVudDM3MTI5OTc1NQ== | alimanfoo 703554 | 2018-03-07T21:58:49Z | 2018-03-07T21:58:49Z | CONTRIBUTOR | Wonderful, thanks both! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray/zarr cloud demo 303270676 | |
350375750 | https://github.com/pydata/xarray/pull/1528#issuecomment-350375750 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM1MDM3NTc1MA== | alimanfoo 703554 | 2017-12-08T21:24:45Z | 2017-12-08T22:27:47Z | CONTRIBUTOR | Just to confirm, if writes are aligned with chunk boundaries in the destination array then no locking is required. Also, if you're going to be moving large datasets into cloud storage and doing distributed computing, then it may be worth investigating compressors and compressor options, as a good compression ratio may make a big difference where network bandwidth is the limiting factor. I would suggest using the Blosc compressor with cname='zstd'. I would also suggest using shuffle; the Blosc codec in the latest numcodecs has an AUTOSHUFFLE option, so byte shuffle is used for arrays with >1 byte item size and bit shuffle for arrays with 1 byte item size. I would also experiment with compression level (clevel) to see how speed balances against compression ratio. E.g., Blosc(cname='zstd', clevel=5, shuffle=Blosc.AUTOSHUFFLE) may be a good starting point. The default compressor, Blosc(cname='lz4', ...), is optimised more for fast local storage, so speed is very good but compression ratio is moderate; this may not be best for distributed computing. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
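A short sketch of the compressor configuration suggested in the comment above, using the zarr v2 / numcodecs API of the period (the array shape and dtype are made up for illustration):

```python
import numpy as np
import zarr
from numcodecs import Blosc

# zstd inside Blosc, moderate compression level, automatic shuffle selection
compressor = Blosc(cname="zstd", clevel=5, shuffle=Blosc.AUTOSHUFFLE)

z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype="i4",
               compressor=compressor)
z[:] = np.random.randint(0, 4, size=z.shape)
print(z.info)  # reports the storage ratio achieved with this configuration
```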
350379064 | https://github.com/pydata/xarray/pull/1528#issuecomment-350379064 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM1MDM3OTA2NA== | alimanfoo 703554 | 2017-12-08T21:40:40Z | 2017-12-08T22:27:35Z | CONTRIBUTOR | Some examples of compressor benchmarking here may be useful http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html The specific conclusions probably won't apply to your data but some of the code and ideas may be useful. Since writing that article I added Zstd and LZ4 compressors in numcodecs so those may also be worth trying in addition to Blosc with various configurations. (Blosc breaks up each chunk into blocks which enables multithreaded compression/decompression but can also reduce compression ratio over the same compressor library used without Blosc. I.e., Blosc(cname='zstd', clevel=1) will behave differently from Zstd(level=1) even though the same underlying compression library (Zstandard) is being used.) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
348839453 | https://github.com/pydata/xarray/pull/1528#issuecomment-348839453 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0ODgzOTQ1Mw== | alimanfoo 703554 | 2017-12-04T01:40:57Z | 2017-12-04T01:40:57Z | CONTRIBUTOR | I know you're not including string support in this PR, but for interest, there are a couple of changes coming into zarr via https://github.com/alimanfoo/zarr/pull/212 that may be relevant in future. It should now be impossible to generate a segfault via a badly configured object array. It is also now much harder to badly configure an object array. When creating an object array, an object codec should be provided via the `object_codec` parameter. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
348183062 | https://github.com/pydata/xarray/pull/1087#issuecomment-348183062 | https://api.github.com/repos/pydata/xarray/issues/1087 | MDEyOklzc3VlQ29tbWVudDM0ODE4MzA2Mg== | alimanfoo 703554 | 2017-11-30T13:07:53Z | 2017-11-30T13:07:53Z | CONTRIBUTOR | FWIW for the filters, if it would be possible to use the numcodecs Codec API http://numcodecs.readthedocs.io/en/latest/abc.html then that could be beneficial beyond xarray, as any work you put into developing filters could then be used elsewhere (e.g., in zarr). |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: New DataStore / Encoder / Decoder API for review 187625917 | |
347385269 | https://github.com/pydata/xarray/pull/1528#issuecomment-347385269 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NzM4NTI2OQ== | alimanfoo 703554 | 2017-11-28T01:36:29Z | 2017-11-28T01:49:24Z | CONTRIBUTOR | FWIW I think the best option at the moment is to make sure you add either Pickle or MsgPack filter for any zarr array with an object dtype. BTW I was thinking that zarr should automatically add one of these filters any time someone creates an array with an object dtype, to avoid them hitting the pointer issue. If you have any thoughts on best solution drop them here: https://github.com/alimanfoo/zarr/issues/208 |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
347381734 | https://github.com/pydata/xarray/pull/1528#issuecomment-347381734 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NzM4MTczNA== | alimanfoo 703554 | 2017-11-28T01:16:07Z | 2017-11-28T01:16:07Z | CONTRIBUTOR | When still in the original interpreter session, all the objects still exist in memory, so all the pointers stored in the array are still valid. Restart the session and the objects are gone and the pointers are invalid. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
347381500 | https://github.com/pydata/xarray/pull/1528#issuecomment-347381500 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NzM4MTUwMA== | alimanfoo 703554 | 2017-11-28T01:14:42Z | 2017-11-28T01:14:42Z | CONTRIBUTOR | Try exiting and restarting the interpreter, then running `zgs = zarr.open_group(store='zarr_directory')` followed by `zgs.x[:]`. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
347363503 | https://github.com/pydata/xarray/pull/1528#issuecomment-347363503 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NzM2MzUwMw== | alimanfoo 703554 | 2017-11-27T23:27:41Z | 2017-11-27T23:27:41Z | CONTRIBUTOR | For variable length strings (or any array with an object dtype) zarr needs a filter that can encode and pack the strings into a single buffer, except in the special case where the data are being stored in-memory (as in your first example). The filter has to be specified manually, some examples here: http://zarr.readthedocs.io/en/master/tutorial.html#string-arrays. There are two codecs currently in numcodecs that can do this, one is Pickle, the other is MsgPack. I haven't done any benchmarking of data size or encoding speed, but MsgPack may be preferable because it's more portable. There was some discussion a while back about creating a codec that handles variable-length strings by encoding via UTF8 then concatenating encoded bytes and lengths or offsets, IIRC similar to Arrow, and maybe even creating a special "text" dtype that inserts this filter automatically so you don't have to add it manually. But there hasn't been a strong motivation so far. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
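A sketch of the approach described in the comment above, using the numcodecs MsgPack codec for an object-dtype string array. This uses the object_codec parameter mentioned a few rows up (zarr PR 212 era); in older zarr the codec was passed via filters=[MsgPack()]:

```python
import zarr
from numcodecs import MsgPack

# object-dtype arrays need an explicit object codec to pack the strings
# into a single buffer per chunk
z = zarr.empty(4, dtype=object, object_codec=MsgPack())
z[:] = ["foo", "bar", "baz", "quux"]
print(z[:])
```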
345619509 | https://github.com/pydata/xarray/pull/1528#issuecomment-345619509 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NTYxOTUwOQ== | alimanfoo 703554 | 2017-11-20T08:07:44Z | 2017-11-20T08:07:44Z | CONTRIBUTOR | Fantastic! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
345080945 | https://github.com/pydata/xarray/pull/1528#issuecomment-345080945 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NTA4MDk0NQ== | alimanfoo 703554 | 2017-11-16T22:18:04Z | 2017-11-16T22:18:04Z | CONTRIBUTOR | Re different zarr storage backends, main options are plain dict, DirectoryStore, ZipStore, and there's a new DBMStore class just merged which enables storage in any DBM-style database (e.g., Berkeley DB). ZipStore has some constraints because of how zip files work: you can't really replace an entry in a zip file, which means anything that writes the same array chunk more than once will generate warnings. Dask's S3Map should also work; I haven't tried it, and it's obviously not ideal for unit tests, but I'd be interested if you get any experience with it.

Re different combinations of zarr and dask chunks, it can be thread safe even if chunks are not aligned, just need to pass a synchronizer when instantiating the array or group. Zarr has a ThreadSynchronizer class which can be used for thread-based parallelism. If a synchronizer is provided, it is used to lock each chunk individually during write operations. More info here.

Re fill values, zarr has a native concept of fill value for each array, with the fill value stored as part of the array metadata. Array metadata are stored as JSON, and I recently merged a fix so that a bytes fill value can be used (via base64 encoding). I believe the netcdf way is to store the fill value separately as the value of a "_FillValue" attribute? You could do this with zarr, but user attributes are also JSON and so you would need to do your own encoding/decoding. But if possible I'd suggest using the native zarr fill_value support, as it handles bytes fill value encoding and also checks that fill values are valid wrt the array dtype. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
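A sketch combining two of the points above: a ThreadSynchronizer for safe writes that are not chunk-aligned, and a native fill_value stored in the array metadata (zarr v2 API; the path and shapes are illustrative):

```python
import zarr

synchronizer = zarr.ThreadSynchronizer()

# with a synchronizer, each chunk is locked individually during writes,
# so threads whose writes cross chunk boundaries do not corrupt each other
z = zarr.open_array("example.zarr", mode="w", shape=(100, 100),
                    chunks=(10, 10), dtype="f8", fill_value=-9999.0,
                    synchronizer=synchronizer)
print(z.fill_value)  # -9999.0, persisted in the .zarray JSON metadata
```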
339897936 | https://github.com/pydata/xarray/pull/1528#issuecomment-339897936 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMzOTg5NzkzNg== | alimanfoo 703554 | 2017-10-27T07:42:34Z | 2017-10-27T07:42:34Z | CONTRIBUTOR | Suggest testing against GitHub master, there are a few other issues I'd like to work through before next release.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
339800443 | https://github.com/pydata/xarray/pull/1528#issuecomment-339800443 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMzOTgwMDQ0Mw== | alimanfoo 703554 | 2017-10-26T21:04:17Z | 2017-10-26T21:04:17Z | CONTRIBUTOR | Just to say, support for 0d arrays, and for arrays with one or more zero-length dimensions, is in zarr master. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
338786761 | https://github.com/pydata/xarray/issues/1650#issuecomment-338786761 | https://api.github.com/repos/pydata/xarray/issues/1650 | MDEyOklzc3VlQ29tbWVudDMzODc4Njc2MQ== | alimanfoo 703554 | 2017-10-23T20:29:41Z | 2017-10-23T20:29:41Z | CONTRIBUTOR | Index API sounds good. Also I was just looking at dask.dataframe indexing, where .loc is implemented using information about index values at the boundaries of each partition (chunk). Not sure xarray should use the same strategy for chunked datasets, but it is another approach to avoid loading indexes into memory. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Low memory/out-of-core index? 267628781 | |
338687376 | https://github.com/pydata/xarray/issues/1650#issuecomment-338687376 | https://api.github.com/repos/pydata/xarray/issues/1650 | MDEyOklzc3VlQ29tbWVudDMzODY4NzM3Ng== | alimanfoo 703554 | 2017-10-23T14:58:59Z | 2017-10-23T14:58:59Z | CONTRIBUTOR | It looks like #1017 is about having no index at all. I want indexes, but I want to avoid loading all coordinate values into memory. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Low memory/out-of-core index? 267628781 | |
338627454 | https://github.com/pydata/xarray/issues/1650#issuecomment-338627454 | https://api.github.com/repos/pydata/xarray/issues/1650 | MDEyOklzc3VlQ29tbWVudDMzODYyNzQ1NA== | alimanfoo 703554 | 2017-10-23T11:19:30Z | 2017-10-23T11:19:30Z | CONTRIBUTOR | Just to add a further thought, which is that the upper levels of the binary search tree could be cached to get faster performance for repeated searches. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Low memory/out-of-core index? 267628781 | |
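A rough sketch of the caching idea from the comment above, assuming the coordinate values are sorted on disk: keep every k-th value in memory as the "upper levels" of the search, then touch only one block per query. The class and parameter names are illustrative, not from the original discussion:

```python
import numpy as np


class BlockedIndex:
    """Binary search over a sorted on-disk array, with the upper
    levels of the search cached as an in-memory sample."""

    def __init__(self, values, block=4096):
        self.values = values  # e.g. a np.memmap or zarr array, sorted
        self.block = block
        # cache one value per block: the top of the search tree
        self.sample = np.asarray(values[::block])

    def searchsorted(self, key):
        # first narrow to a single block using the in-memory sample...
        i = max(int(np.searchsorted(self.sample, key, side="right")) - 1, 0)
        start = i * self.block
        stop = min(start + self.block, len(self.values))
        # ...then load just that block and finish the search there
        chunk = np.asarray(self.values[start:stop])
        return start + int(np.searchsorted(chunk, key))
```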
338622746 | https://github.com/pydata/xarray/issues/1603#issuecomment-338622746 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzODYyMjc0Ng== | alimanfoo 703554 | 2017-10-23T10:56:40Z | 2017-10-23T10:56:40Z | CONTRIBUTOR | Just to say I'm interested in how MultiIndexes are handled also. In our use case, we have two variables conventionally named CHROM (chromosome) and POS (position) which together describe a location in a genome. I want to combine both variables into a multi-index so I can, e.g., select all data from some data variable for chromosome X between positions 100,000-200,000. For all our data variables, this genome location multi-index would be used to index the first dimension. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
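An illustrative sketch of the CHROM/POS use case described above, building a pandas MultiIndex over the first dimension; the dataset contents, and the boolean-mask range selection (rather than multi-index slicing, which is what the issue asks for), are assumptions for the example:

```python
import numpy as np
import pandas as pd
import xarray as xr

# genome locations: chromosome name plus position along the chromosome
index = pd.MultiIndex.from_arrays(
    [["2L", "2L", "X", "X", "X"],
     [100_000, 150_000, 120_000, 180_000, 250_000]],
    names=("CHROM", "POS"),
)
ds = xr.Dataset({"DP": ("variants", np.array([10, 7, 30, 12, 9]))},
                coords={"variants": index})

# select all data on chromosome X between positions 100,000 and 200,000,
# via a boolean mask over the level coordinates
mask = (ds["CHROM"] == "X") & (ds["POS"] >= 100_000) & (ds["POS"] <= 200_000)
sel = ds.isel(variants=np.flatnonzero(mask.values))
print(sel["DP"].values)  # [30 12]
```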
338459385 | https://github.com/pydata/xarray/issues/66#issuecomment-338459385 | https://api.github.com/repos/pydata/xarray/issues/66 | MDEyOklzc3VlQ29tbWVudDMzODQ1OTM4NQ== | alimanfoo 703554 | 2017-10-22T08:02:29Z | 2017-10-22T08:02:29Z | CONTRIBUTOR | Just to say thanks for the work on this, I've been looking at the h5netcdf code recently to understand better how dimensions are plumbed in netcdf4. I'm exploring refactoring all my data model classes in scikit-allel to build on xarray, I think the time is right, especially if xarray gets a Zarr backend too.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
HDF5 backend for xray 29453809 | |
335186616 | https://github.com/pydata/xarray/pull/1528#issuecomment-335186616 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMzNTE4NjYxNg== | alimanfoo 703554 | 2017-10-09T15:07:29Z | 2017-10-09T17:23:21Z | CONTRIBUTOR | I'm on paternity leave for the next 2 weeks, then will be catching up for a couple of weeks I expect. May be able to merge straightforward PRs but will have limited bandwidth. |
{ "total_count": 3, "+1": 0, "-1": 0, "laugh": 0, "hooray": 3, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
335030993 | https://github.com/pydata/xarray/pull/1528#issuecomment-335030993 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMzNTAzMDk5Mw== | alimanfoo 703554 | 2017-10-08T19:17:27Z | 2017-10-08T23:37:47Z | CONTRIBUTOR | FWIW I think some JSON encoders for attributes would ultimately be a useful addition to zarr, but I won't be able to put any effort into zarr in the next month, so workarounds in xarray sounds like a good idea for now. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
325813339 | https://github.com/pydata/xarray/pull/1528#issuecomment-325813339 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMyNTgxMzMzOQ== | alimanfoo 703554 | 2017-08-29T21:43:48Z | 2017-08-29T21:43:48Z | CONTRIBUTOR |
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
325729013 | https://github.com/pydata/xarray/pull/1528#issuecomment-325729013 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMyNTcyOTAxMw== | alimanfoo 703554 | 2017-08-29T17:02:41Z | 2017-08-29T17:02:41Z | CONTRIBUTOR | FWIW all filter (codec) classes have been migrated from zarr to a separate package called numcodecs and will be imported from there in the next (2.2) zarr release. Here is FixedScaleOffset. Implementation is basic numpy, probably some room for optimization. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
325727280 | https://github.com/pydata/xarray/pull/1528#issuecomment-325727280 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMyNTcyNzI4MA== | alimanfoo 703554 | 2017-08-29T16:56:55Z | 2017-08-29T16:56:55Z | CONTRIBUTOR | Following this with interest. Regarding autoclose, just to confirm that zarr doesn't really have any notion of whether something is open or closed. When using the DirectoryStore storage class (most common use case I imagine), all files are automatically closed, nothing is kept open. There are some storage classes (e.g., ZipStore) that do require an explicit close call to finalise the file on disk if you have been writing data, but I think you can ignore this in xarray and leave it up to the user to manage this themselves. Out of interest, @shoyer do you still think there would be value in writing a wrapper for zarr analogous to h5netcdf? Or does this PR provide all the necessary functionality? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
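A small sketch of the ZipStore point made above: DirectoryStore needs no lifecycle management, but a ZipStore that has been written to must be explicitly closed to finalise the zip file (zarr v2 API; the file name is illustrative):

```python
import zarr

# a ZipStore opened for writing must be closed to finalise the file on disk
store = zarr.ZipStore("example.zip", mode="w")
root = zarr.group(store=store)
root.zeros("x", shape=(100,), chunks=(10,))
store.close()

# reading back
store = zarr.ZipStore("example.zip", mode="r")
print(zarr.open_group(store=store, mode="r")["x"].shape)  # (100,)
store.close()
```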
282031922 | https://github.com/pydata/xarray/issues/1223#issuecomment-282031922 | https://api.github.com/repos/pydata/xarray/issues/1223 | MDEyOklzc3VlQ29tbWVudDI4MjAzMTkyMg== | alimanfoo 703554 | 2017-02-23T15:55:38Z | 2017-02-23T15:55:38Z | CONTRIBUTOR | FWIW I think it would be better in xarray or a separate package, at least at the moment, just because I don't have a lot of time right now for OSS and need to keep Zarr as lean as possible. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
zarr as persistent store for xarray 202260275 | |
281829618 | https://github.com/pydata/xarray/issues/1223#issuecomment-281829618 | https://api.github.com/repos/pydata/xarray/issues/1223 | MDEyOklzc3VlQ29tbWVudDI4MTgyOTYxOA== | alimanfoo 703554 | 2017-02-22T22:43:52Z | 2017-02-22T22:43:52Z | CONTRIBUTOR | Yep, that looks good. I was wondering about the xarray_to_zarr() function? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
zarr as persistent store for xarray 202260275 | |
281496902 | https://github.com/pydata/xarray/issues/1223#issuecomment-281496902 | https://api.github.com/repos/pydata/xarray/issues/1223 | MDEyOklzc3VlQ29tbWVudDI4MTQ5NjkwMg== | alimanfoo 703554 | 2017-02-21T22:05:39Z | 2017-02-21T22:05:39Z | CONTRIBUTOR | Just to say this is looking neat. For storing an xarray.DataArray, do you think it would be possible to do away with pickling up all metadata and storing in the .xarray resource? Specifically I'm wondering if this could all be stored as attributes on the Zarr array, with some conventions for special xarray attribute names? I'm guessing there must be some conventions for storing all this metadata as attributes in an HDF5 (netCDF) file, it would potentially be nice to mirror that as much as possible? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
zarr as persistent store for xarray 202260275 | |
274214755 | https://github.com/pydata/xarray/issues/1223#issuecomment-274214755 | https://api.github.com/repos/pydata/xarray/issues/1223 | MDEyOklzc3VlQ29tbWVudDI3NDIxNDc1NQ== | alimanfoo 703554 | 2017-01-21T00:24:27Z | 2017-01-21T00:24:27Z | CONTRIBUTOR | Happy to help if there's anything to do on the zarr side.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
zarr as persistent store for xarray 202260275 | |
90813596 | https://github.com/pydata/xarray/issues/66#issuecomment-90813596 | https://api.github.com/repos/pydata/xarray/issues/66 | MDEyOklzc3VlQ29tbWVudDkwODEzNTk2 | alimanfoo 703554 | 2015-04-08T06:04:53Z | 2015-04-08T06:04:53Z | CONTRIBUTOR | Thanks Stephan, I'll take a look. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
HDF5 backend for xray 29453809 | |
43385302 | https://github.com/pydata/xarray/pull/127#issuecomment-43385302 | https://api.github.com/repos/pydata/xarray/issues/127 | MDEyOklzc3VlQ29tbWVudDQzMzg1MzAy | alimanfoo 703554 | 2014-05-16T22:16:01Z | 2014-05-16T22:16:01Z | CONTRIBUTOR | No worries, glad to contribute. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
initial implementation of support for NetCDF groups 33396232 | |
43059199 | https://github.com/pydata/xarray/pull/127#issuecomment-43059199 | https://api.github.com/repos/pydata/xarray/issues/127 | MDEyOklzc3VlQ29tbWVudDQzMDU5MTk5 | alimanfoo 703554 | 2014-05-14T09:20:01Z | 2014-05-14T09:20:01Z | CONTRIBUTOR | I've added a test to check for an error when a group is not found. I also changed the implementation of the group access function to avoid recursion, it seemed simpler. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
initial implementation of support for NetCDF groups 33396232 | |
43024743 | https://github.com/pydata/xarray/pull/127#issuecomment-43024743 | https://api.github.com/repos/pydata/xarray/issues/127 | MDEyOklzc3VlQ29tbWVudDQzMDI0NzQz | alimanfoo 703554 | 2014-05-13T23:11:07Z | 2014-05-13T23:11:07Z | CONTRIBUTOR | Thanks for the comments, all makes good sense. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
initial implementation of support for NetCDF groups 33396232 | |
42869488 | https://github.com/pydata/xarray/issues/66#issuecomment-42869488 | https://api.github.com/repos/pydata/xarray/issues/66 | MDEyOklzc3VlQ29tbWVudDQyODY5NDg4 | alimanfoo 703554 | 2014-05-12T18:29:57Z | 2014-05-12T18:29:57Z | CONTRIBUTOR | One other detail: I have an HDF5 group for each conceptual dataset, but then variables may be organised into subgroups. It would be nice if this could be accommodated, e.g., when opening an HDF5 group as an xray dataset, assume the dataset contains all variables in the group and any subgroups, searched recursively. Again, apologies, I don't know if this is allowed in NetCDF4; will do the research. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
HDF5 backend for xray 29453809 | |
42840763 | https://github.com/pydata/xarray/issues/66#issuecomment-42840763 | https://api.github.com/repos/pydata/xarray/issues/66 | MDEyOklzc3VlQ29tbWVudDQyODQwNzYz | alimanfoo 703554 | 2014-05-12T14:45:57Z | 2014-05-12T14:45:57Z | CONTRIBUTOR | Thanks @akleeman for the info, much appreciated. A couple of other points I thought maybe worth mentioning if you're considering wrapping h5py. First, I've been using lzf as the compression filter in my HDF5 files. I believe h5py bundles the source for lzf. I don't know if lzf would be supported if accessing through the python netcdf API. Second, I have a situation where I have multiple datasets, each of which is stored in a separate group, each of which has two dimensions (genome position and biological sample). The genome position scale is different for each dataset (there's one dataset per chromosome); however, the biological sample scale is actually common to all of the datasets. So at the moment I have a variable in the root group with the "samples" dimension scale, then each dataset group has its own "position" dimension scale. You can represent all this with HDF5 dimension scales, but I've no idea if this is accommodated by NetCDF4 or could fit into the xray model. I could work around this by copying the samples variable into each dataset, but just thought I'd mention this pattern as something to be aware of. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
HDF5 backend for xray 29453809 | |
42805550 | https://github.com/pydata/xarray/issues/66#issuecomment-42805550 | https://api.github.com/repos/pydata/xarray/issues/66 | MDEyOklzc3VlQ29tbWVudDQyODA1NTUw | alimanfoo 703554 | 2014-05-12T08:08:37Z | 2014-05-12T08:08:37Z | CONTRIBUTOR | I'm really enjoying working with xray, it's so nice to be able to think of my dimensions as named and labeled dimensions, no more remembering which axis is which! I'm not sure if this is relevant to this specific issue, but I am working for the most part with HDF5 files created using h5py. I'm only just learning about NetCDF-4, but I have datasets that comprise a number of 1D and 2D variables with shared dimensions, so I think my data is already very close to the right model. I have a couple of questions: (1) If I have multiple datasets within an HDF5 file, each within a separate group, can I access those through xray? (2) What would I need to add to my HDF5 to make it fully compliant with the xray/NetCDF4 model? Is it just a question of creating and attaching dimension scales or would I need to do something else as well? Thanks in advance. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
HDF5 backend for xray 29453809 |
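For reference on question (2) in the comment above, this is roughly what creating and attaching dimension scales looks like in modern h5py (a sketch; the file name, dataset names, and shapes are made up for illustration, and full netCDF4 compatibility involves conventions not shown here):

```python
import h5py
import numpy as np

with h5py.File("example.h5", "w") as f:
    # coordinate variables
    pos = f.create_dataset("position", data=np.arange(100))
    samples = f.create_dataset("samples", data=np.arange(10))

    # a 2D data variable with dimensions (position, samples)
    gt = f.create_dataset("genotype", shape=(100, 10), dtype="i1")

    # mark the coordinate datasets as dimension scales...
    pos.make_scale("position")
    samples.make_scale("samples")

    # ...and attach them to the data variable's dimensions
    gt.dims[0].attach_scale(pos)
    gt.dims[1].attach_scale(samples)
```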
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);