issue_comments
71 rows where author_association = "MEMBER" and issue = 253136694 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
364954680 | https://github.com/pydata/xarray/pull/1528#issuecomment-364954680 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM2NDk1NDY4MA== | rabernat 1197350 | 2018-02-12T15:21:51Z | 2018-02-12T15:21:51Z | MEMBER | I'm enjoying this discussion. Zarr offers lots of new possibilities for appending / updating datasets that we should try to support. I personally would really like to be able to append / extend existing arrays from within xarray. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
364804265 | https://github.com/pydata/xarray/pull/1528#issuecomment-364804265 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM2NDgwNDI2NQ== | shoyer 1217238 | 2018-02-12T00:15:23Z | 2018-02-12T00:15:23Z | MEMBER | See https://github.com/dask/dask/issues/2000 for the dask issue. Once this works in dask it should be quite easy to implement in xarray, too. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
364804162 | https://github.com/pydata/xarray/pull/1528#issuecomment-364804162 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM2NDgwNDE2Mg== | shoyer 1217238 | 2018-02-12T00:14:22Z | 2018-02-12T00:14:22Z | MEMBER | @martindurant that could probably be addressed most cleanly by improving |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
364802374 | https://github.com/pydata/xarray/pull/1528#issuecomment-364802374 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM2NDgwMjM3NA== | jhamman 2443309 | 2018-02-11T23:54:01Z | 2018-02-11T23:54:01Z | MEMBER | @martindurant - If I understand your question correctly, I think you should be able to follow a pretty standard xarray workflow:
```Python
ds = xr.Dataset()
ds['your_varname'] = xr.DataArray(some_dask_array, dims=['dimname0', 'dimname1', ...], coords=dict_of_preknown_coords)
# repeat for each variable you want in your dataset
ds.to_zarr(some_zarr_store)
# then to open
ds2 = xr.open_zarr(some_zarr_store)
```
Two things to note: 1) if you are looking for decent performance when writing to a remote store, make sure you're working off xarray@master as #1800 fixed a number of choke points in the to_zarr implementation
2) if you are pushing to GCS, |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
364801395 | https://github.com/pydata/xarray/pull/1528#issuecomment-364801395 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM2NDgwMTM5NQ== | mrocklin 306380 | 2018-02-11T23:40:18Z | 2018-02-11T23:40:18Z | MEMBER | Does the to_zarr method suffice: http://xarray.pydata.org/en/latest/generated/xarray.Dataset.to_zarr.html#xarray.Dataset.to_zarr ? On Sun, Feb 11, 2018 at 6:35 PM, Martin Durant notifications@github.com wrote:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
351588678 | https://github.com/pydata/xarray/pull/1528#issuecomment-351588678 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM1MTU4ODY3OA== | shoyer 1217238 | 2017-12-14T02:23:03Z | 2017-12-14T02:23:03Z | MEMBER | woohoo, thank you Ryan! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
351401474 | https://github.com/pydata/xarray/pull/1528#issuecomment-351401474 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM1MTQwMTQ3NA== | rabernat 1197350 | 2017-12-13T14:09:12Z | 2017-12-13T14:09:12Z | MEMBER | Will merge later today if no further comments. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
350557153 | https://github.com/pydata/xarray/pull/1528#issuecomment-350557153 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM1MDU1NzE1Mw== | fmaussion 10050469 | 2017-12-10T15:45:13Z | 2017-12-10T15:45:13Z | MEMBER | Thanks for the tremendous work, @rabernat; looking forward to testing this! In the future it would be nice to briefly describe the advantages of zarr over netcdf for new users. A speed benchmark could help, too! This can be done once the backend is more mature and when we refactor the I/O docs. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
350365780 | https://github.com/pydata/xarray/pull/1528#issuecomment-350365780 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM1MDM2NTc4MA== | rabernat 1197350 | 2017-12-08T20:36:26Z | 2017-12-08T20:36:26Z | MEMBER | Any more reviews? @fmaussion & @pwolfram: you have experience with backends. Your reviews would be valuable. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
350352097 | https://github.com/pydata/xarray/pull/1528#issuecomment-350352097 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM1MDM1MjA5Nw== | shoyer 1217238 | 2017-12-08T19:34:09Z | 2017-12-08T19:34:09Z | MEMBER |
Oops, this is my fault! Instead, try:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
350343117 | https://github.com/pydata/xarray/pull/1528#issuecomment-350343117 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM1MDM0MzExNw== | mrocklin 306380 | 2017-12-08T18:55:35Z | 2017-12-08T18:55:35Z | MEMBER | Not as far as I know. On Fri, Dec 8, 2017 at 1:53 PM, Ryan Abernathey notifications@github.com wrote:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
350336238 | https://github.com/pydata/xarray/pull/1528#issuecomment-350336238 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM1MDMzNjIzOA== | rabernat 1197350 | 2017-12-08T18:26:58Z | 2017-12-08T18:26:58Z | MEMBER | There is a silly lingering issue that I need help resolving. In a8b478543a978bd98c37711609c610432fdc7d07, @jhamman added a function
The |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
349992006 | https://github.com/pydata/xarray/pull/1528#issuecomment-349992006 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0OTk5MjAwNg== | rabernat 1197350 | 2017-12-07T14:59:12Z | 2017-12-07T14:59:12Z | MEMBER | @jhamman, I can't reproduce your error. If you can give me a reproducible example, I will make a test for it. I think this is converging. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
349766763 | https://github.com/pydata/xarray/pull/1528#issuecomment-349766763 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0OTc2Njc2Mw== | rabernat 1197350 | 2017-12-06T20:36:03Z | 2017-12-06T20:36:03Z | MEMBER | @jhamman - but the error being raised is wrong! There is a string formatting error raised in trying to generate a useful, informative error message. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
349738624 | https://github.com/pydata/xarray/pull/1528#issuecomment-349738624 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0OTczODYyNA== | jhamman 2443309 | 2017-12-06T18:54:41Z | 2017-12-06T18:54:56Z | MEMBER | @rabernat - in trying out your branch, I've run into this error (mentioned by @mrocklin in pangeo-data/pangeo#19):
```Python-traceback
...
~/anaconda/envs/pangeo-dev/lib/python3.6/site-packages/xarray-0.10.0_79_g7b50320-py3.6.egg/xarray/backends/zarr.py in _extract_zarr_variable_encoding(variable, raise_on_invalid)
    228
    229     chunks = _determine_zarr_chunks(encoding.get('chunks'), variable.chunks,
--> 230                                     variable.ndim)
    231     encoding['chunks'] = chunks
    232     return encoding

~/anaconda/envs/pangeo-dev/lib/python3.6/site-packages/xarray-0.10.0_79_g7b50320-py3.6.egg/xarray/backends/zarr.py in _determine_zarr_chunks(enc_chunks, var_chunks, ndim)
    134 "Zarr requires uniform chunk sizes excpet for final chunk."
    135 " Variable %r has incompatible chunks. Consider "
--> 136 "rechunking using

TypeError: not all arguments converted during string formatting
```
As far as I can tell, reworking my chunk sizes to divide evenly into the dataset dimensions has corrected the problem. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
349554730 | https://github.com/pydata/xarray/pull/1528#issuecomment-349554730 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0OTU1NDczMA== | shoyer 1217238 | 2017-12-06T07:10:37Z | 2017-12-06T07:10:37Z | MEMBER | I just pushed a commit adding a test for |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
349540155 | https://github.com/pydata/xarray/pull/1528#issuecomment-349540155 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0OTU0MDE1NQ== | rabernat 1197350 | 2017-12-06T05:38:26Z | 2017-12-06T05:38:26Z | MEMBER | I believe that this is now complete enough to consider merging. I have addressed nearly all of @shoyer's suggestions. I have added a bunch more tests and am now quite satisfied with the test suite. I wrote some basic documentation, with the usual disclaimers about the experimental nature of this new feature. The zarr tests will not run if the zarr version is less than 2.2.0. This is not released yet. This means that only the py36-zarr-dev build actually runs the zarr tests. Once @alimanfoo releases the next version, the zarr tests should kick in on all the builds. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
349495568 | https://github.com/pydata/xarray/pull/1528#issuecomment-349495568 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0OTQ5NTU2OA== | rabernat 1197350 | 2017-12-06T01:08:11Z | 2017-12-06T01:08:11Z | MEMBER | @jhamman - could you elaborate on the nature of the error you got with uneven dask chunks? We should be catching this and raising a useful error message. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
349488598 | https://github.com/pydata/xarray/pull/1528#issuecomment-349488598 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0OTQ4ODU5OA== | mrocklin 306380 | 2017-12-06T00:30:21Z | 2017-12-06T00:30:21Z | MEMBER | We tried this out on a cloud-deployed cluster on GCE and things worked pleasantly. Some conversation here: https://github.com/pangeo-data/pangeo/issues/19 |
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
348569223 | https://github.com/pydata/xarray/pull/1528#issuecomment-348569223 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0ODU2OTIyMw== | shoyer 1217238 | 2017-12-01T18:20:32Z | 2017-12-01T18:20:32Z | MEMBER |
Variable length strings are stored with |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
348564159 | https://github.com/pydata/xarray/pull/1528#issuecomment-348564159 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0ODU2NDE1OQ== | rabernat 1197350 | 2017-12-01T17:58:59Z | 2017-12-01T17:59:06Z | MEMBER | Sorry this has become such a behemoth. I know it is hard to review. I couldn't see how to make a more atomic PR because a new backend has lots of interrelated parts that need each other in order to work. To finish it up, I propose to raise an error when attempting to encode variable-length string data. If someone can give me a quick one-liner to help identify such datatypes, that would be helpful. We will revisit these encoding issues once Stephan's refactoring is merged. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
348560326 | https://github.com/pydata/xarray/pull/1528#issuecomment-348560326 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0ODU2MDMyNg== | shoyer 1217238 | 2017-12-01T17:43:03Z | 2017-12-01T17:43:03Z | MEMBER | I'll give this another look over the weekend. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
348414545 | https://github.com/pydata/xarray/pull/1528#issuecomment-348414545 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0ODQxNDU0NQ== | jhamman 2443309 | 2017-12-01T06:40:47Z | 2017-12-01T06:40:47Z | MEMBER | @rabernat - following @shoyer's thoughts here and in #1753, I'm not opposed to skipping the last few failing tests and living to fight strings another day. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
347989858 | https://github.com/pydata/xarray/pull/1528#issuecomment-347989858 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0Nzk4OTg1OA== | rabernat 1197350 | 2017-11-29T20:42:34Z | 2017-11-29T20:42:34Z | MEMBER | Actually, I think I just realized how to do it without too much pain. Stand by. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
347987097 | https://github.com/pydata/xarray/pull/1528#issuecomment-347987097 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0Nzk4NzA5Nw== | rabernat 1197350 | 2017-11-29T20:32:07Z | 2017-11-29T20:32:07Z | MEMBER |
Because of the way the backends are structured right now, it is hard to bypass the existing encoding and replace it with a new encoding scheme. #1087 will make this easy to do. But now it is complicated. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
347984582 | https://github.com/pydata/xarray/pull/1528#issuecomment-347984582 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0Nzk4NDU4Mg== | shoyer 1217238 | 2017-11-29T20:22:33Z | 2017-11-29T20:22:33Z | MEMBER | I'm fine skipping strings entirely for now. They are indeed unneeded for most netCDF datasets. On Wed, Nov 29, 2017 at 8:18 PM Ryan Abernathey notifications@github.com wrote:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
347983854 | https://github.com/pydata/xarray/pull/1528#issuecomment-347983854 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0Nzk4Mzg1NA== | mrocklin 306380 | 2017-11-29T20:19:37Z | 2017-11-29T20:19:37Z | MEMBER |
Is it possible to add one of these filters to XArray's default use of Zarr? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
347983448 | https://github.com/pydata/xarray/pull/1528#issuecomment-347983448 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0Nzk4MzQ0OA== | rabernat 1197350 | 2017-11-29T20:18:08Z | 2017-11-29T20:18:08Z | MEMBER | Right now I am in a dilemma over how to move forward. Fixing this string encoding issue will require some serious hacks to cf encoding. If I do this before #1087 is finished, it will be a waste of time (and a pain). On the other hand #1087 could take a long time, since it is a major refactor itself. Is there some way to punt on the multi-length string encoding for now? We could just error if such variables are present. This would allow us to get the experimental zarr backend out into the wild. FWIW, none of the datasets I want to use this with actually have any string data variables at all. I believe 95% of netcdf datasets are just regular numbers. This is an edge case. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
347981682 | https://github.com/pydata/xarray/pull/1528#issuecomment-347981682 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0Nzk4MTY4Mg== | mrocklin 306380 | 2017-11-29T20:11:25Z | 2017-11-29T20:11:25Z | MEMBER | FWIW my vote is for msgpack over pickle for both performance and cross-language reasons |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
347351224 | https://github.com/pydata/xarray/pull/1528#issuecomment-347351224 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NzM1MTIyNA== | shoyer 1217238 | 2017-11-27T22:32:47Z | 2017-11-28T07:51:31Z | MEMBER |
Agreed! I wonder why zarr doesn't have a UTF-8 variable length string type (https://github.com/alimanfoo/zarr/issues/206) -- that would feel like the obvious first choice for encoding this data. That said, xarray should be able to use fixed-length bytes just fine, doing UTF-8 encoding/decoding on the fly. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
347382612 | https://github.com/pydata/xarray/pull/1528#issuecomment-347382612 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NzM4MjYxMg== | rabernat 1197350 | 2017-11-28T01:21:34Z | 2017-11-28T01:21:34Z | MEMBER |
Do you think this persistence could affect xarray's tests? The way the tests work is via a context manager, like this
Do we need to add an extra step after |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
347381865 | https://github.com/pydata/xarray/pull/1528#issuecomment-347381865 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NzM4MTg2NQ== | rabernat 1197350 | 2017-11-28T01:16:58Z | 2017-11-28T01:16:58Z | MEMBER |
Perhaps zarr should raise an error when assigning |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
347380750 | https://github.com/pydata/xarray/pull/1528#issuecomment-347380750 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NzM4MDc1MA== | rabernat 1197350 | 2017-11-28T01:10:01Z | 2017-11-28T01:10:10Z | MEMBER |
@alimanfoo: the following also seems to work with a directory store
This seems to contradict your statement above. What am I missing? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
347323043 | https://github.com/pydata/xarray/pull/1528#issuecomment-347323043 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NzMyMzA0Mw== | rabernat 1197350 | 2017-11-27T20:48:35Z | 2017-11-27T20:53:28Z | MEMBER | After a few more tweaks, this is now quite close to passing all the The remaining issues are all related to the encoding of strings. Basically, zarr's handling of strings:
http://zarr.readthedocs.io/en/latest/tutorial.html?highlight=strings#string-arrays
is considerably different from netCDF's. Because Consider the following direct creation of a variable length string in zarr:
It seems we can encode variable-length strings into objects just fine. ( However, after passing through xarray's cf encoding, this no longer works:
Here is everything that happens in The challenge now is to figure out which parts of this we need to bypass for zarr and how to implement that bypassing. Overall, I find the At this point, I would appreciate some input from an encoding expert before I go refactoring stuff. edit: The actual tests that fail are |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
345778844 | https://github.com/pydata/xarray/pull/1528#issuecomment-345778844 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NTc3ODg0NA== | mrocklin 306380 | 2017-11-20T18:05:25Z | 2017-11-20T18:05:25Z | MEMBER |
It's so nice when well-designed things come together and just work as planned :) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
345575240 | https://github.com/pydata/xarray/pull/1528#issuecomment-345575240 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NTU3NTI0MA== | mrocklin 306380 | 2017-11-20T02:28:07Z | 2017-11-20T02:28:07Z | MEMBER | That is, indeed, quite exciting. Also exciting is that I was able to look at and compute on your data easily.
```python
In [1]: import zarr

In [2]: import gcsfs

In [3]: fs = gcsfs.GCSFileSystem(project='pangeo-181919')

In [4]: gcsmap = gcsfs.mapping.GCSMap('zarr_store_test', gcs=fs, check=True, create=False)

In [5]: import xarray as xr

In [6]: ds_gcs = xr.open_zarr(gcsmap, mode='r')

In [7]: ds_gcs
Out[7]:
<xarray.Dataset>
Dimensions:  (x: 200, y: 100)
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
Data variables:
    bar      (x) float64 dask.array<shape=(200,), chunksize=(40,)>
    foo      (y, x) float32 dask.array<shape=(100, 200), chunksize=(50, 40)>
Attributes:
    array_atr:  [1, 2]
    some_attr:  copana

In [8]: ds_gcs.sum()
Out[8]:
<xarray.Dataset>
Dimensions:  ()
Data variables:
    bar      float64 dask.array<shape=(), chunksize=()>
    foo      float32 dask.array<shape=(), chunksize=()>

In [9]: ds_gcs.sum().compute()
Out[9]:
<xarray.Dataset>
Dimensions:  ()
Data variables:
    bar      float64 0.0
    foo      float32 20000.0
```
|
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
345574445 | https://github.com/pydata/xarray/pull/1528#issuecomment-345574445 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NTU3NDQ0NQ== | rabernat 1197350 | 2017-11-20T02:21:08Z | 2017-11-20T02:21:08Z | MEMBER | Those following this thread will probably be very excited to learn that the following code works with my zarr_backend branch:
I never doubted this would be possible, but seeing it in action is quite exciting! |
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
345128506 | https://github.com/pydata/xarray/pull/1528#issuecomment-345128506 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NTEyODUwNg== | jhamman 2443309 | 2017-11-17T02:38:41Z | 2017-11-17T02:38:41Z | MEMBER | @rabernat - It might, a little, but we'll sort it out. See https://github.com/rabernat/xarray/pull/3. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
345126452 | https://github.com/pydata/xarray/pull/1528#issuecomment-345126452 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NTEyNjQ1Mg== | rabernat 1197350 | 2017-11-17T02:24:56Z | 2017-11-17T02:24:56Z | MEMBER | @jhamman would it screw you up if I pushed a few commits tonight? I won’t touch the ZarrArrayWrapper. But I figured out how to fix auto_chunk.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
345104713 | https://github.com/pydata/xarray/pull/1528#issuecomment-345104713 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NTEwNDcxMw== | mrocklin 306380 | 2017-11-17T00:12:01Z | 2017-11-17T00:12:01Z | MEMBER | Hooray for standard interfaces! |
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 1, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
345101150 | https://github.com/pydata/xarray/pull/1528#issuecomment-345101150 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NTEwMTE1MA== | mrocklin 306380 | 2017-11-16T23:52:48Z | 2017-11-16T23:52:48Z | MEMBER | The gcsfs library also provides a MutableMapping for Google Cloud Storage. The dask.distributed library now also provides a distributed lock for synchronization if necessary, though in practice we should just rechunk the dask.array before writing. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
345091139 | https://github.com/pydata/xarray/pull/1528#issuecomment-345091139 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NTA5MTEzOQ== | shoyer 1217238 | 2017-11-16T23:02:14Z | 2017-11-16T23:02:14Z | MEMBER |
We will need to write new adapter code to map xarray's explicit indexer classes onto the appropriate zarr methods, e.g.,
```python
import numpy as np

def __getitem__(self, key):
    array = self.get_array()
    if isinstance(key, BasicIndexer):
        return array[key.tuple]
    elif isinstance(key, VectorizedIndexer):
        return array.vindex[_replace_slices_with_arrays(key.tuple, self.shape)]
    else:
        assert isinstance(key, OuterIndexer)
        return array.oindex[key.tuple]

# untested, but I think this does the appropriate shape munging to make slices
# appear as the last axes of the result array
def _replace_slices_with_arrays(key, shape):
    num_slices = sum(1 for k in key if isinstance(k, slice))
    num_arrays = len(shape) - num_slices
    new_key = []
    slice_count = 0
    for k, size in zip(key, shape):
        if isinstance(k, slice):
            array = np.arange(*k.indices(size))
            # keep this slice's values along its own trailing axis and
            # insert length-1 axes everywhere else
            sl = [np.newaxis] * len(shape)
            sl[num_arrays + slice_count] = slice(None)
            k = array[tuple(sl)]
            slice_count += 1
        else:
            assert isinstance(k, np.ndarray)
            k = k[(slice(None),) * num_arrays + (np.newaxis,) * num_slices]
        new_key.append(k)
    return tuple(new_key)
```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
345034208 | https://github.com/pydata/xarray/pull/1528#issuecomment-345034208 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NTAzNDIwOA== | rabernat 1197350 | 2017-11-16T19:22:01Z | 2017-11-16T19:22:01Z | MEMBER | Some things I would like to add to the zarr test suite:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
345030848 | https://github.com/pydata/xarray/pull/1528#issuecomment-345030848 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NTAzMDg0OA== | rabernat 1197350 | 2017-11-16T19:10:31Z | 2017-11-16T19:10:31Z | MEMBER |
Great! If you use the latest zarr master, you should get the same test results as this travis build: https://travis-ci.org/pydata/xarray/jobs/301606996 There are two outstanding failures related to encoding ( The biggest problem is that, for reasons I don't understand, my "auto-chunking" behavior does not work (this is covered by the only zarr-specific test method: |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
345026224 | https://github.com/pydata/xarray/pull/1528#issuecomment-345026224 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NTAyNjIyNA== | jhamman 2443309 | 2017-11-16T18:53:42Z | 2017-11-16T18:53:42Z | MEMBER | @rabernat - FYI: I'm playing with your branch a bit today. @shoyer and @rabernat, can we brainstorm what a
```Python
class ZarrArrayWrapper(BackendArray):
    def __init__(self, variable_name, datastore):
        self.datastore = datastore
        self.variable_name = variable_name
        array = self.get_array()
        self.shape = array.shape
        self.dtype = np.dtype(array.dtype.kind + str(array.dtype.itemsize))
``` |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
344040853 | https://github.com/pydata/xarray/pull/1528#issuecomment-344040853 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NDA0MDg1Mw== | rabernat 1197350 | 2017-11-13T20:04:12Z | 2017-11-13T20:04:12Z | MEMBER | 😬 that's my punishment for being slow! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
344040250 | https://github.com/pydata/xarray/pull/1528#issuecomment-344040250 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDM0NDA0MDI1MA== | shoyer 1217238 | 2017-11-13T20:02:03Z | 2017-11-13T20:02:03Z | MEMBER | @rabernat sorry for the churn here, but you are also probably going to need to update after the explicit indexing changes in #1705. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
339815147 | https://github.com/pydata/xarray/pull/1528#issuecomment-339815147 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMzOTgxNTE0Nw== | rabernat 1197350 | 2017-10-26T22:07:10Z | 2017-10-26T22:07:10Z | MEMBER | Fantastic! Are you planning a release any time soon? If not we can set up to test against the github master.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
335204883 | https://github.com/pydata/xarray/pull/1528#issuecomment-335204883 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMzNTIwNDg4Mw== | rabernat 1197350 | 2017-10-09T16:09:50Z | 2017-10-09T16:09:50Z | MEMBER |
Congratulations! If you could just merge alimanfoo/zarr#154, it would really help us move forward. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
335162205 | https://github.com/pydata/xarray/pull/1528#issuecomment-335162205 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMzNTE2MjIwNQ== | rabernat 1197350 | 2017-10-09T13:43:49Z | 2017-10-09T13:43:49Z | MEMBER |
Does this include merging PRs? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
335027491 | https://github.com/pydata/xarray/pull/1528#issuecomment-335027491 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMzNTAyNzQ5MQ== | rabernat 1197350 | 2017-10-08T18:23:50Z | 2017-10-08T18:23:50Z | MEMBER |
My impression is that zarr development is moving conservatively, so we would be better off finding workarounds in xarray. @shoyer: where in the code would you recommend putting this logic? It seems like part of encoding / decoding to me. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
334981929 | https://github.com/pydata/xarray/pull/1528#issuecomment-334981929 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMzNDk4MTkyOQ== | rabernat 1197350 | 2017-10-08T04:16:58Z | 2017-10-08T18:21:30Z | MEMBER | There are two zarr issues that are causing some tests to fail:
Most of the failures of tests inherited from |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
335015485 | https://github.com/pydata/xarray/pull/1528#issuecomment-335015485 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMzNTAxNTQ4NQ== | shoyer 1217238 | 2017-10-08T15:46:36Z | 2017-10-08T15:46:36Z | MEMBER | For serializing attributes, the easiest fix is to call |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
334982373 | https://github.com/pydata/xarray/pull/1528#issuecomment-334982373 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMzNDk4MjM3Mw== | rabernat 1197350 | 2017-10-08T04:31:02Z | 2017-10-08T04:31:09Z | MEMBER | I worked on this on the plane back from Seattle. Yay for having no internet access! Would appreciate feedback on the questions raised above from @shoyer, @jhamman, and anyone else with backend expertise. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
334633708 | https://github.com/pydata/xarray/pull/1528#issuecomment-334633708 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMzNDYzMzcwOA== | rabernat 1197350 | 2017-10-06T01:15:05Z | 2017-10-06T01:15:05Z | MEMBER | Here is where we are at with the Zarr backend tests
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
334633152 | https://github.com/pydata/xarray/pull/1528#issuecomment-334633152 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMzNDYzMzE1Mg== | rabernat 1197350 | 2017-10-06T01:10:29Z | 2017-10-06T01:10:29Z | MEMBER | With @jhamman's help, I just made a little progress on this. We now have a bare bones test suite for the zarr backend. This is very helpful for revealing where more work is needed: encoding. So the next step is to seriously confront that issue. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
334316122 | https://github.com/pydata/xarray/pull/1528#issuecomment-334316122 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMzNDMxNjEyMg== | jhamman 2443309 | 2017-10-04T23:14:58Z | 2017-10-04T23:14:58Z | MEMBER | @rabernat - testing should be fully functional now. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
333579128 | https://github.com/pydata/xarray/pull/1528#issuecomment-333579128 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMzMzU3OTEyOA== | jhamman 2443309 | 2017-10-02T15:58:05Z | 2017-10-02T15:58:05Z | MEMBER | @rabernat - re backends testing, #1557 is pretty close. I can wrap it up this week. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
333336320 | https://github.com/pydata/xarray/pull/1528#issuecomment-333336320 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMzMzMzNjMyMA== | rabernat 1197350 | 2017-09-30T21:13:48Z | 2017-09-30T21:13:48Z | MEMBER | @martindurant: I may have some time to get back to working on this next week. (Especially if @jhamman can help me sort out the backend testing.) What is the status of your branch? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
327900874 | https://github.com/pydata/xarray/pull/1528#issuecomment-327900874 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMyNzkwMDg3NA== | shoyer 1217238 | 2017-09-07T19:32:41Z | 2017-09-07T19:32:41Z | MEMBER | @rabernat indeed, the backend tests are not terribly well organized right now. Probably the place to start is to inherit from |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
327849640 | https://github.com/pydata/xarray/pull/1528#issuecomment-327849640 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMyNzg0OTY0MA== | rabernat 1197350 | 2017-09-07T16:17:13Z | 2017-09-07T16:17:13Z | MEMBER | I am stuck on figuring out how to develop a new test case for this. (It doesn't help that #1531 is messing up the backend tests.) If @shoyer can give us a few hints about how to best implement a test class (i.e. what to subclass, etc.), I think that could jumpstart testing and move the PR forward. I welcome contributions from others such as @martindurant on this. I won't have much time in the near future, since a new semester just dropped on me like a load of bricks. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
325742232 | https://github.com/pydata/xarray/pull/1528#issuecomment-325742232 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMyNTc0MjIzMg== | shoyer 1217238 | 2017-08-29T17:50:04Z | 2017-08-29T17:50:04Z | MEMBER |
The only advantage here would be for non-xarray users, who could use zarr to do this decoding/encoding automatically. For what it's worth, the implementation of scale offsets in xarray looks basically equivalent to what's done in zarr. I don't think there's a performance difference either way.
If you use chunks, I believe HDF5/NetCDF4 do the same thing, e.g.,
```
In [10]: with h5py.File('one-chunk.h5') as f: f.create_dataset('foo', (100, 100), chunks=(100, 100))

In [11]: with h5py.File('many-chunk.h5') as f: f.create_dataset('foo', (100000, 100000), chunks=(100, 100))

In [12]: ls -l | grep chunk.h5
-rw-r--r-- 1 shoyer eng 1400 Aug 29 10:48 many-chunk.h5
-rw-r--r-- 1 shoyer eng 1400 Aug 29 10:48 one-chunk.h5
```
(Note the same file-size) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
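For readers unfamiliar with the scale/offset encoding discussed in the comment above, here is a minimal sketch of the idea in plain Python. It is not the xarray or zarr implementation; the function names `encode` and `decode` and the sample values are assumptions made for illustration. It only shows the packing arithmetic: values are stored as integers and restored by multiplying by `scale_factor` and adding `add_offset`.

```python
def encode(values, scale_factor, add_offset):
    """Pack floats as integers: packed = round((value - add_offset) / scale_factor)."""
    return [round((v - add_offset) / scale_factor) for v in values]


def decode(packed, scale_factor, add_offset):
    """Unpack integers to floats: value = packed * scale_factor + add_offset."""
    return [p * scale_factor + add_offset for p in packed]


# Illustrative data only; the numbers are chosen so the round trip is exact.
temps = [271.5, 280.25, 300.0]
packed = encode(temps, scale_factor=0.25, add_offset=270.0)
restored = decode(packed, scale_factor=0.25, add_offset=270.0)
```

Whichever layer applies this transform, it has to be applied exactly once in each direction, which is the double-application concern raised elsewhere in the thread.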
325738019 | https://github.com/pydata/xarray/pull/1528#issuecomment-325738019 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMyNTczODAxOQ== | rabernat 1197350 | 2017-08-29T17:35:09Z | 2017-08-29T17:35:09Z | MEMBER | One path forward for now would be to ignore the filters like If we think there is an advantage to using the zarr native filters, that could be added via a future PR once we have the basic backend working. @alimanfoo: when do you anticipate the 2.2 zarr release to happen? Will the API change significantly? If so, I will wait for that to move forward here. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
325723577 | https://github.com/pydata/xarray/pull/1528#issuecomment-325723577 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMyNTcyMzU3Nw== | shoyer 1217238 | 2017-08-29T16:43:58Z | 2017-08-29T16:44:25Z | MEMBER |
Yes, exactly.
Typically, we store things in encoding that are attributes on the underlying NetCDF file, but no longer make sense to describe the decoded data. For example:
- On the file,
Currently, we assume that stores never do this, and always handle it ourselves. We might need a special exception for zarr and scale/offset encoding.
Maybe, though again it will probably need slightly customized conventions for writing data (if we let zarr handle scale/offset encoding).
We have two options:
1. Handle it all in xarray via the machinery in I think (2) would be the preferred way to do this. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
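The attribute-versus-encoding distinction described in the comment above can be sketched roughly as follows. This is a simplified illustration with plain dicts, not xarray's actual decoding machinery; `ENCODING_KEYS`, `split_attrs`, and `merge_for_write` are hypothetical names, and the key list is an assumption chosen to match the scale/offset and fill-value conventions mentioned in the thread.

```python
# Hypothetical set of on-disk attribute names that describe how the data
# is packed, rather than what the decoded data means.
ENCODING_KEYS = ("scale_factor", "add_offset", "_FillValue")


def split_attrs(file_attrs):
    """On read: move packing-related attributes out of attrs into encoding."""
    attrs = dict(file_attrs)
    encoding = {key: attrs.pop(key) for key in ENCODING_KEYS if key in attrs}
    return attrs, encoding


def merge_for_write(attrs, encoding):
    """On write: put the encoding entries back so the file round-trips."""
    merged = dict(attrs)
    merged.update(encoding)
    return merged


# Illustrative attribute dictionary as it might appear on disk.
on_disk = {"units": "K", "scale_factor": 0.25, "add_offset": 270.0}
attrs, encoding = split_attrs(on_disk)
round_tripped = merge_for_write(attrs, encoding)
```

After decoding, `attrs` only holds metadata that still describes the in-memory values, while `encoding` remembers how to write them back.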
325716892 | https://github.com/pydata/xarray/pull/1528#issuecomment-325716892 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMyNTcxNjg5Mg== | shoyer 1217238 | 2017-08-29T16:19:57Z | 2017-08-29T16:19:57Z | MEMBER | @rabernat I think this is #1531 -- |
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 1, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
325690352 | https://github.com/pydata/xarray/pull/1528#issuecomment-325690352 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMyNTY5MDM1Mg== | rabernat 1197350 | 2017-08-29T14:54:53Z | 2017-08-29T14:54:53Z | MEMBER | I am now trying to understand the backend test suite structure. Can someone explain to me why so many tests are skipped? For example, if I run
I get
```
================================================== test session starts ==================================================
platform darwin -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0 -- /Users/rpa/anaconda/bin/python
cachedir: .cache
rootdir: /Users/rpa/RND/Public/xarray, inifile: setup.cfg
plugins: cov-2.5.1
collected 683 items

xarray/tests/test_backends.py::GenericNetCDFDataTest::test_coordinates_encoding SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_cross_engine_read_write_netcdf3 PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_dataset_caching SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_dataset_compute SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_default_fill_value SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_encoding_kwarg SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_encoding_same_dtype SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_encoding_unlimited_dims PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_engine PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_invalid_dataarray_names_raise SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_load SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_orthogonal_indexing PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_pickle SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_pickle_dataarray SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_None_variable SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_boolean_dtype SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_coordinates SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_datetime_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_endian SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_example_1_netcdf SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_float64_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_mask_and_scale SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_object_dtype SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_string_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_strings_with_fill_value SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_test_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_roundtrip_timedelta_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_write_store PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTest::test_zero_dimensional_variable SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_coordinates_encoding SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_cross_engine_read_write_netcdf3 PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_dataset_caching SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_dataset_compute SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_default_fill_value SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_encoding_kwarg SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_encoding_same_dtype SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_encoding_unlimited_dims PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_engine PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_invalid_dataarray_names_raise SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_load SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_orthogonal_indexing PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_pickle SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_pickle_dataarray SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_None_variable SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_boolean_dtype SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_coordinates SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_datetime_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_endian SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_example_1_netcdf SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_float64_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_mask_and_scale SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_object_dtype SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_string_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_strings_with_fill_value SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_test_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_roundtrip_timedelta_data SKIPPED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_write_store PASSED
xarray/tests/test_backends.py::GenericNetCDFDataTestAutocloseTrue::test_zero_dimensional_variable SKIPPED
================================================ short test summary info ================================================
SKIP [2] xarray/tests/test_backends.py:382: requires pynio
SKIP [2] xarray/tests/test_backends.py:214: requires pynio
SKIP [2] xarray/tests/test_backends.py:178: requires pynio
SKIP [2] xarray/tests/test_backends.py:468: requires pynio
SKIP [2] xarray/tests/test_backends.py:439: requires pynio
SKIP [2] xarray/tests/test_backends.py:490: requires pynio
SKIP [2] xarray/tests/test_backends.py:428: requires pynio
SKIP [2] xarray/tests/test_backends.py:145: requires pynio
SKIP [2] xarray/tests/test_backends.py:197: requires pynio
SKIP [2] xarray/tests/test_backends.py:207: requires pynio
SKIP [2] xarray/tests/test_backends.py:230: requires pynio
SKIP [2] xarray/tests/test_backends.py:311: requires pynio
SKIP [2] xarray/tests/test_backends.py:300: requires pynio
SKIP [2] xarray/tests/test_backends.py:271: requires pynio
SKIP [2] xarray/tests/test_backends.py:409: requires pynio
SKIP [2] xarray/tests/test_backends.py:291: requires pynio
SKIP [2] xarray/tests/test_backends.py:286: requires pynio
SKIP [2] xarray/tests/test_backends.py:362: requires pynio
SKIP [2] xarray/tests/test_backends.py:235: requires pynio
SKIP [2] xarray/tests/test_backends.py:264: requires pynio
SKIP [2] xarray/tests/test_backends.py:334: requires pynio
SKIP [2] xarray/tests/test_backends.py:139: requires pynio
SKIP [2] xarray/tests/test_backends.py:280: requires pynio
SKIP [2] xarray/tests/test_backends.py:109: requires pynio
```
Those line numbers refer to all of the skipped methods. Why should I need pynio to run those tests? It looks like the same thing is happening on travis: https://travis-ci.org/pydata/xarray/jobs/268805771#L1527 Maybe @pwolfram understands this stuff? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
325660754 | https://github.com/pydata/xarray/pull/1528#issuecomment-325660754 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMyNTY2MDc1NA== | rabernat 1197350 | 2017-08-29T13:18:33Z | 2017-08-29T13:18:33Z | MEMBER |
Is the goal here to be able to round-trip the file, such that calling I don't understand how encoding interacts with attributes? When is something an attribute vs. an encoding (
Does this mean that my Regarding encoding, zarr has its own internal mechanism for encoding, which it calls "filters", that closely resemble some of the CF encoding options. For example the I don't yet understand how to make these elements work together properly, for example, to avoid applying the scale / offset function twice, as I mentioned above. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
325525827 | https://github.com/pydata/xarray/pull/1528#issuecomment-325525827 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMyNTUyNTgyNw== | shoyer 1217238 | 2017-08-29T01:14:05Z | 2017-08-29T01:14:05Z | MEMBER |
Yes, probably, if we want to handle netcdf conventions for times, fill values and scaling.
This would be nice! But it's also a bigger issue (will look for the number, I think it's already been opened).
Still need to think about this one.
I guess we can ignore them (maybe add a warning?) -- they're not part of the zarr data model.
I don't think we need any autoclose logic at all -- zarr doesn't leave open files hanging around already. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
325226656 | https://github.com/pydata/xarray/pull/1528#issuecomment-325226656 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMyNTIyNjY1Ng== | rabernat 1197350 | 2017-08-27T21:42:23Z | 2017-08-27T21:42:23Z | MEMBER |
This is also part of my goal. I think all the metadata can be stored internally to zarr via attributes. There just have to be some "special" attributes that xarray hides from the user. This is the same as h5netcdf. @alimanfoo suggested this should be possible in that earlier thread:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
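The "special attributes that xarray hides from the user" idea in the comment above could look roughly like the sketch below. It uses plain dicts in place of a real zarr attribute store, and the key name `_hidden_dimension_names` and both helper functions are hypothetical, invented here for illustration rather than taken from the PR.

```python
# Hypothetical reserved attribute key, used only for this sketch.
DIMS_KEY = "_hidden_dimension_names"


def to_store_attrs(user_attrs, dims):
    """Combine user attributes with the reserved key before writing."""
    if DIMS_KEY in user_attrs:
        raise ValueError(f"{DIMS_KEY!r} is reserved")
    stored = dict(user_attrs)
    stored[DIMS_KEY] = list(dims)  # lists are JSON-serializable, tuples become lists
    return stored


def from_store_attrs(stored_attrs):
    """Split stored attributes into (dims, user-visible attributes)."""
    visible = dict(stored_attrs)
    dims = tuple(visible.pop(DIMS_KEY))
    return dims, visible


stored = to_store_attrs({"units": "K"}, ("time", "lat", "lon"))
dims, attrs = from_store_attrs(stored)
```

The point of the design is that dimension names travel with the array in its own metadata, while the user only ever sees `attrs` without the reserved key.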
325226495 | https://github.com/pydata/xarray/pull/1528#issuecomment-325226495 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMyNTIyNjQ5NQ== | rabernat 1197350 | 2017-08-27T21:38:35Z | 2017-08-27T21:38:35Z | MEMBER |
Your functions are a great proof of concept for the relative ease of interoperability between xarray and zarr. What I have done here is to implement an xarray "backend" (i.e. DataStore) that uses zarr as its storage medium. This puts zarr on the same level as netCDF and HDF5 as a "first class" storage format for xarray data, as suggested by @shoyer in the comment on that thread. My hope is that this will enable the magical performance benefits that you have anticipated. Digging deeper into that thread, I see @shoyer makes the following proposition:
With this PR, I have started to do the former (write a DataStore). However, I can already see the wisdom of what he says next:
I have already implemented my own custom DataStore for a different project, so I felt comfortable diving into this. But I might end up reinventing the wheel several times over if I continue down this road. In particular, I can see that my On the other hand, zarr is so simple to use that a separate wrapper package might be overkill. So I am still not sure whether the approach I am taking here is worth pursuing further. I consider this a highly experimental PR, and I'm really looking for feedback. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 | |
325173551 | https://github.com/pydata/xarray/pull/1528#issuecomment-325173551 | https://api.github.com/repos/pydata/xarray/issues/1528 | MDEyOklzc3VlQ29tbWVudDMyNTE3MzU1MQ== | rabernat 1197350 | 2017-08-27T02:40:22Z | 2017-08-27T02:40:22Z | MEMBER | cc @martindurant, @mrocklin, @alimanfoo |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Zarr backend 253136694 |
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);