issue_comments

15 rows where issue = 342531772 sorted by updated_at descending

user 9

  • rabernat 4
  • shoyer 2
  • chrisbarber 2
  • dcherian 2
  • chrisroat 1
  • nbren12 1
  • LunarLanding 1
  • jbusecke 1
  • tinaok 1

author_association 3

  • MEMBER 8
  • NONE 4
  • CONTRIBUTOR 3

issue 1

  • zarr and xarray chunking compatibility and `to_zarr` performance · 15
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
805883595 https://github.com/pydata/xarray/issues/2300#issuecomment-805883595 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDgwNTg4MzU5NQ== rabernat 1197350 2021-03-24T14:48:55Z 2021-03-24T14:48:55Z MEMBER

In #5056, I have implemented the solution of deleting chunks from encoding when chunk() is called on a variable. A review of that PR would be welcome.
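As a rough sketch of the behaviour that change targets (this is an illustration, not the code in #5056): after rechunking, the zarr chunk layout remembered in encoding no longer matches the data, so it would be dropped instead of carried forward.

```python
# Hypothetical illustration only, not the PR's implementation.
def rechunk_without_stale_encoding(da, chunks):
    rechunked = da.chunk(chunks)             # new dask chunking
    rechunked.encoding.pop('chunks', None)   # forget the old on-disk zarr layout
    return rechunked
```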

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
790088409 https://github.com/pydata/xarray/issues/2300#issuecomment-790088409 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDc5MDA4ODQwOQ== rabernat 1197350 2021-03-03T21:55:44Z 2021-03-03T21:55:44Z MEMBER

> alternatively to_zarr could ignore encoding["chunks"] when the data is already chunked?

I would not favor that. A user may choose to define their desired zarr chunks by putting this information in encoding. In this case, it's good to raise the error. (This is the case I had in mind when I wrote this code.)

The problem here is that encoding is often being carried over from the original dataset and persisted across operations that change chunk size.
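A minimal sketch of that carry-over in the xarray versions discussed here (store path and sizes are illustrative):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'foo': (['bar'], np.zeros(100_000))})
ds.to_zarr('orig.zarr', mode='w')

ds2 = xr.open_zarr('orig.zarr')         # encoding['chunks'] records the on-disk layout
ds3 = ds2.chunk({'bar': 10_000})        # the dask chunks change...
print(ds3.foo.encoding.get('chunks'))   # ...but the old zarr chunks persist in encoding,
                                        # which is what later trips up to_zarr
```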

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
789978111 https://github.com/pydata/xarray/issues/2300#issuecomment-789978111 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDc4OTk3ODExMQ== dcherian 2448579 2021-03-03T18:59:39Z 2021-03-03T18:59:39Z MEMBER

alternatively to_zarr could ignore encoding["chunks"] when the data is already chunked?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
789974968 https://github.com/pydata/xarray/issues/2300#issuecomment-789974968 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDc4OTk3NDk2OA== rabernat 1197350 2021-03-03T18:54:43Z 2021-03-03T18:54:43Z MEMBER

I think we are all in agreement. Just waiting for someone to make a PR. It's probably just a few lines of code changes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
789972117 https://github.com/pydata/xarray/issues/2300#issuecomment-789972117 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDc4OTk3MjExNw== jbusecke 14314623 2021-03-03T18:50:18Z 2021-03-03T18:50:18Z CONTRIBUTOR

> the question is whether the chunk() method should delete existing chunks attributes from encoding.
>
> IMO this is the user-friendly thing to do.

Just ran into this issue myself and just wanted to add a +1 to stripping the encoding when .chunk() is used.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
673565228 https://github.com/pydata/xarray/issues/2300#issuecomment-673565228 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDY3MzU2NTIyOA== LunarLanding 4441338 2020-08-13T16:04:04Z 2020-08-13T16:04:04Z NONE

I arrived here due to a different use case / problem, which ultimately I solved, but I think there's value in documenting it here. My use case is the following workflow:

1. take raw data, build a dataset, append it to a zarr store Z
2. analyze the data on Z, then maybe go to 1.

Step 2's performance is much better when data on Z is chunked properly along the appending dimension 'frame' (chunks of size 50), however step 1 only adds 1 element along it. I end up with Z having chunks (1,1,1,1,1...) on 'frame'. On xarray 0.16.0, this seems solvable via the encoding parameter, if we take care to only use it on the store creation. Before that version, I was using something like the monkey patch posted by @chrisbarber. Code:

```python
import contextlib
import shutil
import tempfile

import numpy as np
import xarray as xr

zarr_path = tempfile.mkdtemp()

def append_test(ds, chunks):
    shutil.rmtree(zarr_path)
    for i in range(21):
        d = ds.isel(frame=slice(i, i + 1))
        d = d.chunk(chunks)
        d.to_zarr(zarr_path, consolidated=True,
                  **(dict(mode='a', append_dim='frame') if i > 0 else {}))
    dsa = xr.open_zarr(str(zarr_path), consolidated=True)
    print(dsa.chunks, dsa.dims)

# sometime before 0.16.0
@contextlib.contextmanager
def change_determine_zarr_chunks(chunks):
    orig_determine_zarr_chunks = xr.backends.zarr._determine_zarr_chunks
    try:
        def new_determine_zarr_chunks(enc_chunks, var_chunks, ndim, name):
            da = ds[name]
            zchunks = tuple(
                chunks[dim] if (dim in chunks and chunks[dim] is not None) else da.shape[i]
                for i, dim in enumerate(da.dims)
            )
            return zchunks
        xr.backends.zarr._determine_zarr_chunks = new_determine_zarr_chunks
        yield
    finally:
        xr.backends.zarr._determine_zarr_chunks = orig_determine_zarr_chunks

chunks = {'frame': 10, 'other': 50}
ds = xr.Dataset({'data': xr.DataArray(data=np.random.rand(100, 100), dims=('frame', 'other'))})

append_test(ds, chunks)
with change_determine_zarr_chunks(chunks):
    append_test(ds, chunks)

# with 0.16.0
def append_test_encoding(ds, chunks):
    shutil.rmtree(zarr_path)
    encoding = {}
    for k, v in ds.variables.items():
        encoding[k] = {'chunks': tuple(chunks[dk] if dk in chunks else v.shape[i]
                                       for i, dk in enumerate(v.dims))}
    for i in range(21):
        d = ds.isel(frame=slice(i, i + 1))
        d = d.chunk(chunks)
        d.to_zarr(zarr_path, consolidated=True,
                  **(dict(mode='a', append_dim='frame') if i > 0 else dict(encoding=encoding)))
    dsa = xr.open_zarr(str(zarr_path), consolidated=True)
    print(dsa.chunks, dsa.dims)

append_test_encoding(ds, chunks)
```

Output:

```
Frozen(SortedKeysDict({'frame': (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), 'other': (50, 50)})) Frozen(SortedKeysDict({'frame': 21, 'other': 100}))
Frozen(SortedKeysDict({'frame': (10, 10, 1), 'other': (50, 50)})) Frozen(SortedKeysDict({'frame': 21, 'other': 100}))
Frozen(SortedKeysDict({'frame': (10, 10, 1), 'other': (50, 50)})) Frozen(SortedKeysDict({'frame': 21, 'other': 100}))
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
637635620 https://github.com/pydata/xarray/issues/2300#issuecomment-637635620 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDYzNzYzNTYyMA== chrisroat 1053153 2020-06-02T15:42:43Z 2020-06-02T15:42:43Z CONTRIBUTOR

If there is a non-dimension coordinate, the error is also tickled.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'foo': (['bar'], np.zeros((100,)))})

# Problem also affects non-dimension coords
ds.coords['baz'] = ('bar', ['mah'] * 100)

ds.to_zarr('test.zarr', mode='w')
ds2 = xr.open_zarr('test.zarr')

ds3 = ds2.chunk({'bar': 2})

ds3.foo.encoding = {}
ds3.coords['baz'].encoding = {}  # Need this, too.

ds3.to_zarr('test3.zarr', mode='w')
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
627448680 https://github.com/pydata/xarray/issues/2300#issuecomment-627448680 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDYyNzQ0ODY4MA== dcherian 2448579 2020-05-12T16:22:55Z 2020-05-12T16:22:55Z MEMBER

> the question is whether the chunk() method should delete existing chunks attributes from encoding.

IMO this is the user-friendly thing to do.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
598790404 https://github.com/pydata/xarray/issues/2300#issuecomment-598790404 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDU5ODc5MDQwNA== rabernat 1197350 2020-03-13T15:51:54Z 2020-03-13T15:51:54Z MEMBER

Hi all. I am looking into this issue, trying to figure out if it is still a thing. I just tried @chrisbarber's MRE above using xarray v0.15.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'foo': (['bar'], np.zeros((505359,)))})
ds.to_zarr('test.zarr', mode='w')
ds2 = xr.open_zarr('test.zarr')
ds2.to_zarr('test2.zarr', mode='w')
```

This now works without error, thanks to #2487.

I can trigger the error in a third step:

```python
ds3 = ds2.chunk({'bar': 10000})
ds3.to_zarr('test3.zarr', mode='w')
```

which raises:

```
NotImplementedError: Specified zarr chunks (63170,) would overlap multiple dask chunks ((10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 5359),). This is not implemented in xarray yet. Consider rechunking the data using `chunk()` or specifying different chunks in encoding.
```

The problem is that, even though we rechunked the data, the `chunks` key is still present in encoding:

```python
>>> print(ds3.foo.encoding)
{'chunks': (63170,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, '_FillValue': nan, 'dtype': dtype('float64')}
```

This was populated when the variable was read from test.zarr.

As a workaround, you can delete the encoding (either just the `chunks` attribute or all of it):

```python
ds3.foo.encoding = {}
ds3.to_zarr('test3.zarr', mode='w')
```

This allows the operation to complete successfully.

For all the users stuck on this problem (e.g. @abarciauskas-bgse):

- update to the latest version of xarray, and then
- delete the encoding on your variables, or overwrite it with the encoding keyword in to_zarr (a sketch of the latter follows below).
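A minimal sketch of that second option, reusing ds3 from the example above (the chunk size is illustrative):

```python
# Hand to_zarr an explicit zarr chunk layout that matches the dask chunks,
# so the stale value stored in ds3.foo.encoding is not used.
ds3.to_zarr(
    'test3.zarr',
    mode='w',
    encoding={'foo': {'chunks': (10000,)}},  # aligned with the (10000, ..., 5359) dask chunks
)
```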

For xarray developers, the question is whether the chunk() method should delete existing chunks attributes from encoding.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
519688842 https://github.com/pydata/xarray/issues/2300#issuecomment-519688842 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDUxOTY4ODg0Mg== nbren12 1386642 2019-08-08T21:10:54Z 2019-08-08T21:10:54Z CONTRIBUTOR

I am getting the same error too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
493408428 https://github.com/pydata/xarray/issues/2300#issuecomment-493408428 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDQ5MzQwODQyOA== tinaok 46813815 2019-05-17T10:37:35Z 2019-05-17T10:37:35Z NONE

Hi, I'm new to xarray & zarr. After reading a zarr file, I re-chunk the data using xarray.Dataset.chunk, then try to store the newly chunked data as a zarr file with xarray.Dataset.to_zarr. But I get this error message:

```
NotImplementedError: Specified zarr chunks (200, 100, 1) would overlap multiple dask chunks ((50, 50, 50, 50), (25, 25, 25, 25), (10000,)). This is not implemented in xarray yet. Consider rechunking the data using chunk() or specifying different chunks in encoding.
```

My xarray version is 12.1, and my understanding from this thread (https://github.com/pydata/xarray/issues/2300) is that this was fixed, so is the fix included in 12.1? Then why do I get the NotImplementedError?

Do I have to use `del dsread.data.encoding['chunks']` each time before using Dataset.to_zarr as a workaround? Probably I am missing something; I hope someone can point it out.

I made a notebook reproducing the problem here:
https://github.com/tinaok/Pangeo-for-beginners/blob/master/3-1%20zarr%20and%20re-chunking%20bug%20report.ipynb

Thanks for your help, regards, Tina
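For reference, a minimal sketch of that per-variable workaround (store paths, dimension name, and chunk size are illustrative):

```python
import xarray as xr

ds = xr.open_zarr('input.zarr')
ds = ds.chunk({'x': 200})

# Drop the stale on-disk chunk layout so to_zarr derives the zarr chunks
# from the current dask chunks instead of the old encoding.
for var in ds.variables.values():
    var.encoding.pop('chunks', None)

ds.to_zarr('rechunked.zarr', mode='w')
```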

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
406732486 https://github.com/pydata/xarray/issues/2300#issuecomment-406732486 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDQwNjczMjQ4Ng== chrisbarber 1530840 2018-07-20T21:33:08Z 2018-07-20T21:33:08Z NONE

I took a closer look and noticed my one-dimensional fields of size 505359 were reporting a chunksize of 63170. Turns out that's enough to come up with a minimal repro:

```python
>>> xr.__version__
'0.10.8'
>>> ds = xr.Dataset({'foo': (['bar'], np.zeros((505359,)))})
>>> ds.to_zarr('test.zarr')
<xarray.backends.zarr.ZarrStore object at 0x7fd9680f7fd0>
>>> ds2 = xr.open_zarr('test.zarr')
>>> ds2
<xarray.Dataset>
Dimensions:  (bar: 505359)
Dimensions without coordinates: bar
Data variables:
    foo      (bar) float64 dask.array<shape=(505359,), chunksize=(63170,)>
>>> ds2.foo.encoding
{'chunks': (63170,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, '_FillValue': nan, 'dtype': dtype('float64')}
>>> ds2.to_zarr('test2.zarr')
```

which raises:

```
NotImplementedError: Specified zarr chunks (63170,) would overlap multiple dask chunks ((63170, 63170, 63170, 63170, 63170, 63170, 63170, 63169),). This is not implemented in xarray yet. Consider rechunking the data using chunk() or specifying different chunks in encoding.
```

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
406718847 https://github.com/pydata/xarray/issues/2300#issuecomment-406718847 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDQwNjcxODg0Nw== shoyer 1217238 2018-07-20T20:31:42Z 2018-07-20T20:31:42Z MEMBER

> Curious: Is there any downside in xarray to using datasets with inconsistent chunks?

No, there's no downside here. It's just not possible to define a single dict of chunks in this case.

Can you look into the encoding attributes of any variables you load from disk?

It would also help to come up with a self-contained example that reproduces this using dummy data.
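A quick way to do that inspection, as a sketch (the store path is illustrative):

```python
import xarray as xr

ds = xr.open_zarr('store.zarr')
for name, var in ds.variables.items():
    # Compare the zarr chunk layout remembered in encoding with the current dask chunks.
    print(name, var.encoding.get('chunks'), var.chunks)
```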

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
406705740 https://github.com/pydata/xarray/issues/2300#issuecomment-406705740 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDQwNjcwNTc0MA== chrisbarber 1530840 2018-07-20T19:36:08Z 2018-07-20T19:38:03Z NONE

Ah, that's great. I do see some improvement. Specifically, I can now set chunks using xarray, successfully write to zarr, and reopen it. However, when reopening it I find that the chunks have been inconsistently applied (some fields have the expected chunksize whereas some small fields have the entire variable in one chunk). Furthermore, trying to write a second time with to_zarr leads to:

```
NotImplementedError: Specified zarr chunks (100,) would overlap multiple dask chunks ((100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 4),). This is not implemented in xarray yet. Consider rechunking the data using chunk() or specifying different chunks in encoding.
```

Trying to reapply the original chunks with xr.Dataset.chunk succeeds, and ds.chunks no longer reports "inconsistent chunks", but trying to write still produces the same error.

I also tried loading my entire dataset into memory, allowing the initial to_zarr to default to zarr's chunking heuristics. Trying to read and write a second time again results in the same error:

```
NotImplementedError: Specified zarr chunks (63170,) would overlap multiple dask chunks ((63170, 63170, 63170, 63170, 63170, 63170, 63170, 63169),). This is not implemented in xarray yet. Consider rechunking the data using chunk() or specifying different chunks in encoding.
```

I tried this round-tripping experiment with my monkey patches, and it works for a sequence of read/write/read/write... without any intervention in between. This only works for default zarr chunking, however, since the patch to xr.backends.zarr._determine_zarr_chunks overrides whatever chunks are on the originating dataset.

Curious: Is there any downside in xarray to using datasets with inconsistent chunks? I take it that it is a supported configuration because xarray allows it to happen, but just outputs that error when calling ds.chunks, which is just a sort of convenience method for looking at chunks across a whole dataset which happens to have consistent chunks...?

One other thing to add: it might be nice to have an option to allow zarr auto-chunking even when chunks!={}. I don't know how sensitive zarr performance is to chunksizes, but it'd be nice to have some form of sane auto-chunking available when you don't want to bother with manually choosing.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
406165245 https://github.com/pydata/xarray/issues/2300#issuecomment-406165245 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDQwNjE2NTI0NQ== shoyer 1217238 2018-07-19T06:08:26Z 2018-07-19T06:08:26Z MEMBER

I just pushed a new xarray release (0.10.8) earlier today. We had a fix for zarr chunking in there (https://github.com/pydata/xarray/pull/2228) -- does that solve your issue?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);