issue_comments
15 rows where issue = 342531772 sorted by updated_at descending
Issue: zarr and xarray chunking compatibility and `to_zarr` performance (#2300) · 15 comments
**rabernat** (MEMBER) · 2021-03-24T14:48:55Z · https://github.com/pydata/xarray/issues/2300#issuecomment-805883595

In #5056, I have implemented the solution of deleting …

Reactions: ❤️ 2
**rabernat** (MEMBER) · 2021-03-03T21:55:44Z · https://github.com/pydata/xarray/issues/2300#issuecomment-790088409

I would not favor that. A user may choose to define their desired zarr chunks by putting this information in encoding; in that case, it's good to raise the error. (This is the case I had in mind when I wrote this code.) The problem here is that encoding is often carried over from the original dataset and persists across operations that change the chunk size.
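Not from the thread, but a minimal sketch of the carry-over being described, with a hypothetical store path and variable name: the `chunks` entry recorded in `.encoding` at read time survives a later `.chunk()` call, so a subsequent `to_zarr` compares a stale chunk hint against the new dask chunks.

```python
import numpy as np
import xarray as xr

# Write a store whose variable has 10-element zarr chunks.
ds = xr.Dataset({"foo": ("x", np.arange(100))})
ds.to_zarr("example.zarr", mode="w", encoding={"foo": {"chunks": (10,)}})

# Reading records the on-disk chunking in .encoding ...
ds2 = xr.open_zarr("example.zarr")
print(ds2.foo.encoding.get("chunks"))  # (10,)

# ... and rechunking the dask array does not update it.
ds3 = ds2.chunk({"x": 25})
print(ds3.foo.chunks)                  # ((25, 25, 25, 25),)
print(ds3.foo.encoding.get("chunks"))  # still (10,), so to_zarr may raise
```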
**dcherian** (MEMBER) · 2021-03-03T18:59:39Z · https://github.com/pydata/xarray/issues/2300#issuecomment-789978111

alternatively …
**rabernat** (MEMBER) · 2021-03-03T18:54:43Z · https://github.com/pydata/xarray/issues/2300#issuecomment-789974968

I think we are all in agreement. Just waiting for someone to make a PR. It's probably just a few lines of code changes.
**jbusecke** (CONTRIBUTOR) · 2021-03-03T18:50:18Z · https://github.com/pydata/xarray/issues/2300#issuecomment-789972117

Just ran into this issue myself and wanted to add a +1 to stripping the encoding when …
**LunarLanding** (NONE) · 2020-08-13T16:04:04Z · https://github.com/pydata/xarray/issues/2300#issuecomment-673565228

I arrived here due to a different use case / problem, which I ultimately solved, but I think there's value in documenting it here. My use case is the following workflow:

1. take raw data, build a dataset, append it to a zarr store Z
2. analyze the data on Z, then maybe go to 1.

Step 2's performance is much better when the data on Z is chunked properly along the appending dimension 'frame' (chunks of size 50); however, step 1 only adds 1 element along it, so I end up with Z having chunks (1, 1, 1, 1, ...) on 'frame'. On xarray 0.16.0, this seems solvable via the encoding parameter, if we take care to only use it on store creation. Before that version, I was using something like the monkey patch posted by @chrisbarber. Code:

```python
import contextlib
import shutil
import tempfile

import numpy as np
import xarray as xr

zarr_path = tempfile.mkdtemp()

def append_test(ds, chunks):
    shutil.rmtree(zarr_path)
    ...

# sometime before 0.16.0
@contextlib.contextmanager
def change_determine_zarr_chunks(chunks):
    orig_determine_zarr_chunks = xr.backends.zarr._determine_zarr_chunks
    try:
        def new_determine_zarr_chunks(enc_chunks, var_chunks, ndim, name):
            da = ds[name]
            zchunks = tuple(
                chunks[dim] if (dim in chunks and chunks[dim] is not None) else da.shape[i]
                for i, dim in enumerate(da.dims)
            )
            return zchunks
        xr.backends.zarr._determine_zarr_chunks = new_determine_zarr_chunks
        yield
    finally:
        xr.backends.zarr._determine_zarr_chunks = orig_determine_zarr_chunks

chunks = {'frame': 10, 'other': 50}
ds = xr.Dataset({'data': xr.DataArray(data=np.random.rand(100, 100), dims=('frame', 'other'))})

append_test(ds, chunks)
with change_determine_zarr_chunks(chunks):
    append_test(ds, chunks)

# with 0.16.0
def append_test_encoding(ds, chunks):
    shutil.rmtree(zarr_path)
    ...

append_test_encoding(ds, chunks)
```
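A sketch of the 0.16.0-era approach this comment alludes to: pass `encoding` only at store creation, then append without it. The store path, sizes, and variable names here are illustrative, not from the original code.

```python
import numpy as np
import xarray as xr

def make_frame():
    # One new element along the append dimension 'frame'.
    return xr.Dataset({"data": (("frame", "other"), np.random.rand(1, 100))})

store = "Z.zarr"

# Create the store once, fixing the target zarr chunks via encoding.
make_frame().to_zarr(store, mode="w",
                     encoding={"data": {"chunks": (50, 100)}})

# Later appends must NOT pass encoding; zarr keeps the (50, 100) chunks
# instead of accumulating one tiny chunk per appended frame.
for _ in range(4):
    make_frame().to_zarr(store, append_dim="frame")
```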
**chrisroat** (CONTRIBUTOR) · 2020-06-02T15:42:43Z · https://github.com/pydata/xarray/issues/2300#issuecomment-637635620

If there is a non-dimension coordinate, the error is also tickled.

```python
import xarray as xr
import numpy as np

ds = xr.Dataset({'foo': (['bar'], np.zeros((100,)))})
# Problem also affects non-dimension coords
ds.coords['baz'] = ('bar', ['mah'] * 100)
ds.to_zarr('test.zarr', mode='w')

ds2 = xr.open_zarr('test.zarr')
ds3 = ds2.chunk({'bar': 2})
ds3.foo.encoding = {}
ds3.coords['baz'].encoding = {}  # Need this, too.
ds3.to_zarr('test3.zarr', mode='w')
```
**dcherian** (MEMBER) · 2020-05-12T16:22:55Z · https://github.com/pydata/xarray/issues/2300#issuecomment-627448680

IMO this is the user-friendly thing to do.
**rabernat** (MEMBER) · 2020-03-13T15:51:54Z · https://github.com/pydata/xarray/issues/2300#issuecomment-598790404

Hi all. I am looking into this issue, trying to figure out if it is still a thing. I just tried @chrisbarber's MRE above using xarray v0.15. I can trigger the error in a third step: … raises …

The problem is that, even though we rechunked the data, … This was populated when the variable was read from … As a workaround, you can delete the encoding (either just the …).

For all the users stuck on this problem (e.g. @abarciauskas-bgse):

- update to the latest version of xarray, and then
- delete the encoding on your variables, or overwrite it with the …

For xarray developers, the question is whether the …

Reactions: 👍 3
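A minimal sketch of that workaround, with hypothetical store and variable names: either delete the stale `chunks` entry, or overwrite it with the on-disk chunking you actually want.

```python
import numpy as np
import xarray as xr

# Set up: a store written with 10-element zarr chunks.
xr.Dataset({"foo": ("x", np.zeros(100))}).to_zarr(
    "store.zarr", mode="w", encoding={"foo": {"chunks": (10,)}}
)

ds = xr.open_zarr("store.zarr").chunk({"x": 25})

# Option 1: drop the stale chunk hint; to_zarr then uses the dask chunks.
ds["foo"].encoding.pop("chunks", None)

# Option 2 (instead of option 1): overwrite it explicitly.
# ds["foo"].encoding["chunks"] = (25,)

ds.to_zarr("rechunked.zarr", mode="w")
```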
**nbren12** (CONTRIBUTOR) · 2019-08-08T21:10:54Z · https://github.com/pydata/xarray/issues/2300#issuecomment-519688842

I am getting the same error too.
**tinaok** (NONE) · 2019-05-17T10:37:35Z · https://github.com/pydata/xarray/issues/2300#issuecomment-493408428

Hi, I'm new to xarray & zarr. After reading a zarr file, I re-chunk the data using xarray.Dataset.chunk, then create a newly chunked dataset stored as a zarr file with xarray.Dataset.to_zarr. But I get the error message:

'NotImplementedError: Specified zarr chunks (200, 100, 1) would overlap multiple dask chunks ((50, 50, 50, 50), (25, 25, 25, 25), (10000,)). This is not implemented in xarray yet. Consider rechunking the data using `chunk()` …'

So why do I get a NotImplementedError? Do I have to use `del dsread.data.encoding['chunks']` each time before using Dataset.to_zarr as a workaround? Probably I am missing something; I hope someone can point me in the right direction. I made a notebook here for reproducing the problem. Thanks for your help, regards, Tina

Reactions: 👍 2
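A self-contained sketch (hypothetical names and sizes) of the failure path described here, ending with the `del ...encoding['chunks']` workaround from this thread:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"data": ("t", np.zeros(100))})
ds.to_zarr("orig.zarr", mode="w", encoding={"data": {"chunks": (10,)}})

dsread = xr.open_zarr("orig.zarr")           # encoding['chunks'] == (10,)
rechunked = dsread.chunk({"t": 25})          # dask chunks now (25, 25, 25, 25)

try:
    rechunked.to_zarr("new.zarr", mode="w")  # (10,) vs 25-chunks -> error
except NotImplementedError as e:
    print(e)

del rechunked["data"].encoding["chunks"]     # the workaround from this thread
rechunked.to_zarr("new.zarr", mode="w")
```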
**chrisbarber** (NONE) · 2018-07-20T21:33:08Z · https://github.com/pydata/xarray/issues/2300#issuecomment-406732486

I took a closer look and noticed my one-dimensional fields of size 505359 were reporting a chunksize of 63170. Turns out that's enough to come up with a minimal repro:

```python
...
```

Reactions: 👍 1
**shoyer** (MEMBER) · 2018-07-20T20:31:42Z · https://github.com/pydata/xarray/issues/2300#issuecomment-406718847

No, there's no downside here. It's just not possible to define a single dict of chunks in this case. Can you look into the …

It would also help to come up with a self-contained example that reproduces this using dummy data.
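The attribute being referred to is cut off in this export, but inspecting each variable's `.encoding` is one way to see what `to_zarr` will compare against the dask chunks; a small sketch with made-up data:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"foo": ("x", np.zeros(100))}).chunk({"x": 10})
ds["foo"].encoding["chunks"] = (25,)  # pretend this came from open_zarr

# Compare the chunk hints carried in encoding against the live dask chunks.
for name, var in ds.variables.items():
    print(name, "encoding:", var.encoding.get("chunks"), "| dask:", var.chunks)
```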
**chrisbarber** (NONE) · 2018-07-20T19:36:08Z (edited 2018-07-20T19:38:03Z) · https://github.com/pydata/xarray/issues/2300#issuecomment-406705740

Ah, that's great. I do see some improvement. Specifically, I can now set chunks using xarray, successfully write to zarr, and reopen it. However, when reopening it I find that the chunks have been inconsistently applied (some fields have the expected chunksize, whereas some small fields have the entire variable in one chunk). Furthermore, trying to write a second time with …

I also tried loading my entire dataset into memory, allowing the initial …

Curious: is there any downside in xarray to using datasets with inconsistent chunks? I take it that it is a supported configuration, because xarray allows it to happen but just outputs that error when calling …

One other thing to add: it might be nice to have an option to allow zarr auto-chunking even when …
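On the inconsistent-chunks question, a sketch (not from the thread; names illustrative) of how xarray tolerates per-variable differences until you ask for a single chunk mapping:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({
    "big":   ("x", np.zeros(100)),
    "small": ("x", np.zeros(100)),
})
# Chunk the two variables differently along the shared dimension 'x'.
ds["big"] = ds["big"].chunk({"x": 10})
ds["small"] = ds["small"].chunk({"x": 100})

print(ds["big"].chunks, ds["small"].chunks)  # each variable is fine on its own
try:
    print(ds.chunks)  # a single dict of chunks is ill-defined here
except ValueError as e:
    print(e)          # xarray reports inconsistent chunks along 'x'
```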
**shoyer** (MEMBER) · 2018-07-19T06:08:26Z · https://github.com/pydata/xarray/issues/2300#issuecomment-406165245

I just pushed a new xarray release (0.10.8) earlier today. We had a fix for zarr chunking in there (https://github.com/pydata/xarray/pull/2228) -- does that solve your issue?