
issue_comments


731 rows where user = 1197350 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
285380106 https://github.com/pydata/xarray/issues/1303#issuecomment-285380106 https://api.github.com/repos/pydata/xarray/issues/1303 MDEyOklzc3VlQ29tbWVudDI4NTM4MDEwNg== rabernat 1197350 2017-03-09T15:18:18Z 2024-02-06T17:57:21Z MEMBER

Just wanted to link to a somewhat related discussion happening in brian-rose/climlab#50.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `xarray.core.variable.as_variable()` part of the public API? 213004586
1534724554 https://github.com/pydata/xarray/issues/3213#issuecomment-1534724554 https://api.github.com/repos/pydata/xarray/issues/3213 IC_kwDOAMm_X85begnK rabernat 1197350 2023-05-04T12:51:59Z 2023-05-04T12:51:59Z MEMBER

> I suspect (but don't know, as I'm just a user of xarray, not a developer) that it's also not thoroughly tested.

Existing sparse testing is here: https://github.com/pydata/xarray/blob/main/xarray/tests/test_sparse.py

We would welcome enhancements to this!

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
1534001190 https://github.com/pydata/xarray/issues/3213#issuecomment-1534001190 https://api.github.com/repos/pydata/xarray/issues/3213 IC_kwDOAMm_X85bbwAm rabernat 1197350 2023-05-04T02:36:57Z 2023-05-04T02:36:57Z MEMBER

Hi @jdbutler and welcome! We would welcome this sort of contribution eagerly.

I would characterize our current support of sparse arrays as really just a proof of concept. When to use sparse and how to do it effectively is not well documented. Simply adding more documentation around the already-supported use cases would be a great place to start IMO.

My own explorations of this are described in this Pangeo post. The use case is regridding. It touches on quite a few of the points you're interested in, in particular the integration with geodataframe. Along similar lines, @dcherian has been working on using opt_einsum together with sparse in https://github.com/pangeo-data/xESMF/issues/222#issuecomment-1524041837 and https://github.com/pydata/xarray/issues/7764.

I'd also suggest catching up on what @martinfleis is doing with vector data cubes in xvec. (See also Pangeo post on this topic.)

Of the three topics you enumerated, I'm most interested in the serialization one. However, I'd rather see serialization of sparse arrays prototyped in Zarr, as it's much more conducive to experimentation than NetCDF (which requires writing C to do anything custom). I would recommend exploring serialization from a sparse array in memory to a sparse format on disk via a custom codec. Zarr recently added support for a meta_array parameter that determines what array type is materialized by the codec pipeline (see https://github.com/zarr-developers/zarr-python/pull/1131). The use case there was loading data directly to GPU. In a way sparse is similar--it's an array container that is not numpy or dask.
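To make the custom-codec idea concrete, here is a rough, hypothetical sketch (not an existing xarray or zarr feature; the codec id and byte layout are invented for illustration) of packing a sparse.COO chunk into bytes with a numcodecs codec:

```python
import numpy as np
import sparse
from numcodecs import register_codec
from numcodecs.abc import Codec


class SparseCOOCodec(Codec):
    """Pack a chunk as (ndim, nnz, shape, coords, data); purely illustrative."""

    codec_id = "sparse_coo"  # hypothetical id, not a registered zarr codec

    def encode(self, buf):
        arr = sparse.as_coo(buf)
        header = np.array([arr.ndim, arr.nnz, *arr.shape], dtype="<i8")
        return (header.tobytes()
                + arr.coords.astype("<i8").tobytes()
                + arr.data.astype("<f8").tobytes())

    def decode(self, buf, out=None):
        buf = memoryview(buf)
        ndim, nnz = np.frombuffer(buf[:16], dtype="<i8")
        shape = tuple(np.frombuffer(buf[16:16 + 8 * ndim], dtype="<i8"))
        off = 16 + 8 * ndim
        coords = np.frombuffer(buf[off:off + 8 * ndim * nnz], dtype="<i8")
        data = np.frombuffer(buf[off + 8 * ndim * nnz:], dtype="<f8")
        return sparse.COO(coords.reshape(ndim, nnz), data, shape=shape)


register_codec(SparseCOOCodec)
```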

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
1524332001 https://github.com/pydata/xarray/issues/7764#issuecomment-1524332001 https://api.github.com/repos/pydata/xarray/issues/7764 IC_kwDOAMm_X85a23Xh rabernat 1197350 2023-04-27T00:56:21Z 2023-04-27T00:56:21Z MEMBER

Is there ever a case where it would be preferable to use numpy if opt_einsum were installed? If not, I would propose that, like bottleneck, we just automatically use it if available.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support opt_einsum in xr.dot 1672288892
1497579600 https://github.com/pydata/xarray/issues/7716#issuecomment-1497579600 https://api.github.com/repos/pydata/xarray/issues/7716 IC_kwDOAMm_X85ZQ0BQ rabernat 1197350 2023-04-05T14:23:57Z 2023-04-05T14:23:57Z MEMBER

Do we have a plan to support pandas 2?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  bad conda solve with pandas 2 1654022522
1492139481 https://github.com/pydata/xarray/issues/6323#issuecomment-1492139481 https://api.github.com/repos/pydata/xarray/issues/6323 IC_kwDOAMm_X85Y8D3Z rabernat 1197350 2023-03-31T15:31:55Z 2023-03-31T15:31:55Z MEMBER

We should also consider a configuration option to automatically drop encoding.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  propagation of `encoding` 1158378382
1460185069 https://github.com/pydata/xarray/issues/7039#issuecomment-1460185069 https://api.github.com/repos/pydata/xarray/issues/7039 IC_kwDOAMm_X85XCKft rabernat 1197350 2023-03-08T13:51:06Z 2023-03-08T13:51:06Z MEMBER

Rather than using the scale_factor and add_offset approach, I would look into xbitinfo if you want to optimize your compression.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Encoding error when saving netcdf 1373352524
1460182260 https://github.com/pydata/xarray/pull/7540#issuecomment-1460182260 https://api.github.com/repos/pydata/xarray/issues/7540 IC_kwDOAMm_X85XCJz0 rabernat 1197350 2023-03-08T13:48:51Z 2023-03-08T13:49:21Z MEMBER

Regarding locks, I think we need to think hard about the best way to deal with this across the stack. There are a couple of different options:

  • Current status: just use a global lock on the entire array--super inefficient.
  • A bit better: use per-variable locks.
  • Even better: have locks at the shard level. This would allow concurrent writing of shards.
  • Alternative which accomplishes the same thing: expose different virtual chunks when reading vs. writing. When writing, the writer library (e.g. Xarray or Dask) would see the shards as the chunks (with a lower layer of the stack handling breaking the shard down into chunks). When reading, the individual, smaller chunks would be accessible.

Note that there are still some deep inefficiencies in the way zarr-python writes shards (see https://github.com/zarr-developers/zarr-python/discussions/1338). I think we should be optimizing things at the Zarr level first, before implementing workarounds in Xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  added 'storage_transformers' to valid_encodings 1588516592
1460175664 https://github.com/pydata/xarray/pull/7540#issuecomment-1460175664 https://api.github.com/repos/pydata/xarray/issues/7540 IC_kwDOAMm_X85XCIMw rabernat 1197350 2023-03-08T13:44:02Z 2023-03-08T13:44:02Z MEMBER

It's great to see this PR get started in Xarray! Thanks @JMorado!

From the perspective of a Zarr developer, the sharding feature is still highly experimental. The API may change significantly. While the sharding code is released in the sense that it is available deep in Zarr, it is not really considered part of the public API yet.

So perhaps it's a bit too early to be doing this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  added 'storage_transformers' to valid_encodings 1588516592
1422860618 https://github.com/pydata/xarray/issues/7515#issuecomment-1422860618 https://api.github.com/repos/pydata/xarray/issues/7515 IC_kwDOAMm_X85UzyFK rabernat 1197350 2023-02-08T16:05:13Z 2023-02-08T16:47:59Z MEMBER

It seems like there are at least 3 separate topics being discussed here.

  1. Could Xarray wrap Aesara / PyTensor arrays, in the same way it wraps numpy arrays, Dask arrays, cupy arrays, sparse arrays, pint arrays, etc? This way, Xarray users could benefit from the performance and other features of Aesara while keeping the high-level analysis API they know and love. AFAIU, any array library that implements the NEP 37 protocol should be wrappable. This is Joe's original topic.
  2. Should Aesara / PyTensor implement their own versions of named dimensions and coordinates? This is an internal question for those projects. Not the original topic, but nevertheless we would love to help by exposing some Xarray internals for reuse by other packages (this is on our roadmap). It would be a shame to reinvent wheels unnecessarily. I would be interested in understanding the tradeoffs and different use cases between this and topic 1.
  3. Pre-existing tensions between Aesara and PyTensor. Since this conversation is happening on our issue tracker, I'll point to our code of conduct and hope that the conversation can remain positive and respectful of all viewpoints. From our point of view as Xarray devs, PyTensor and Aesara do indeed seem quite similar in scope. It would be wonderful if we could all work together in some way towards topic 1.
{
    "total_count": 8,
    "+1": 8,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Aesara as an array backend in Xarray 1575494367
1416643026 https://github.com/pydata/xarray/pull/7142#issuecomment-1416643026 https://api.github.com/repos/pydata/xarray/issues/7142 IC_kwDOAMm_X85UcEHS rabernat 1197350 2023-02-04T03:02:09Z 2023-02-04T03:02:09Z MEMBER

I just noticed our very low coverage rating and found this PR. Did this PR work? Should we update it and merge?

It would be great to have our coverage back in the 90s rather than the 50s 😝 .

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix Codecov 1401132297
1412408324 https://github.com/pydata/xarray/pull/7496#issuecomment-1412408324 https://api.github.com/repos/pydata/xarray/issues/7496 IC_kwDOAMm_X85UL6QE rabernat 1197350 2023-02-01T17:06:47Z 2023-02-01T17:06:47Z MEMBER

It is true that Xarray is now becoming very different from pandas in how it opens data.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  deprecate open_zarr 1564661430
1385683582 https://github.com/pydata/xarray/issues/7446#issuecomment-1385683582 https://api.github.com/repos/pydata/xarray/issues/7446 IC_kwDOAMm_X85Sl9p- rabernat 1197350 2023-01-17T16:23:01Z 2023-01-17T16:23:01Z MEMBER

Hi @gauteh! This is very cool! Thanks for sharing. I'm really excited about the way that Rust can be used to optimize different parts of our stack.

A couple of questions:

  • Can your reader read over the HTTP / S3 protocol? Or is it just local files?
  • Do you know about kerchunk? The approach you described:

    > The reader works by indexing the chunks of a dataset so that chunks can be accessed independently.

    ...is identical to the approach taken by Kerchunk (although the implementation is different). I'm curious what specification you use to store your indexes. Could we make your implementation interoperable with kerchunk, such that a kerchunk reference specification could be read by your reader? It would be great to reach for some degree of alignment here.
  • Do you know about hdf5-coro (http://icesat2sliderule.org/h5coro/)? They have similar goals, but focused on cloud-based access.

> I hope this can be of general interest, and if it would be of interest to move the hidefix xarray backend into xarray that would be very cool.

This is definitely of general interest! However, it is not necessary to add a new backend directly into xarray. We support entry points which allow packages to implement their own readers, as you have apparently already discovered: https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html

Installing your package should be enough to enable the new engine.

We would, however, welcome a documentation PR that described how to use this package on the I/O page.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Parallel + multi-threaded reading of NetCDF4 + HDF5: Hidefix! 1536004355
1378079073 https://github.com/pydata/xarray/pull/7418#issuecomment-1378079073 https://api.github.com/repos/pydata/xarray/issues/7418 IC_kwDOAMm_X85SI9Fh rabernat 1197350 2023-01-11T00:34:03Z 2023-01-11T00:34:03Z MEMBER

> we should carefully evaluate the datatree API to make sure we won't want to change it soon

I agree with this. We could use the PR process for this review.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Import datatree in xarray? 1519552711
1373993285 https://github.com/pydata/xarray/issues/3996#issuecomment-1373993285 https://api.github.com/repos/pydata/xarray/issues/3996 IC_kwDOAMm_X85R5XlF rabernat 1197350 2023-01-06T18:36:56Z 2023-01-06T18:47:48Z MEMBER

We found a nice solution to this using @TomNicholas's Datatree

```python
import xarray as xr
import datatree

dt = datatree.open_datatree("AQUA_MODIS.20220809T182500.L2.OC.nc")

def fix_dimension_names(ds):
    if 'pixel_control_points' in ds.dims:
        ds = ds.swap_dims({'pixel_control_points': 'pixels_per_line'})
    return ds

dt_fixed = dt.map_over_subtree(fix_dimension_names)

all_dsets = [subtree.ds for node, subtree in dt_fixed.items()]
ds = xr.merge(all_dsets, combine_attrs="drop_conflicts")
ds = ds.set_coords(['latitude', 'longitude'])

ds.chlor_a.plot(x="longitude", y="latitude", robust=True)
```

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 0
}
  MODIS L2 Data Missing Data Variables and Geolocation Data 605608998
1372822656 https://github.com/pydata/xarray/pull/7418#issuecomment-1372822656 https://api.github.com/repos/pydata/xarray/issues/7418 IC_kwDOAMm_X85R05yA rabernat 1197350 2023-01-05T21:50:53Z 2023-01-05T21:50:53Z MEMBER

I personally favor just copying the code into Xarray and archiving the old repo.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Import datatree in xarray? 1519552711
1372802153 https://github.com/pydata/xarray/pull/7418#issuecomment-1372802153 https://api.github.com/repos/pydata/xarray/issues/7418 IC_kwDOAMm_X85R00xp rabernat 1197350 2023-01-05T21:31:33Z 2023-01-05T21:31:33Z MEMBER
  • At what stage is datatree "ready" to be moved in here? At what stage should it become encouraged public API?

My opinion is that Datatree should move into Xarray now, ideally in a way that does not disrupt any existing user code, and that Datatree should become a first-class Xarray object (together with DataArray and Dataset). Since it's a new feature, we don't necessarily have to be super conservative here. I think it is more than good enough / stable enough in its current state.

  • What's a good way to slowly roll the feature out?

Since Datatree sits above DataArray and Dataset, it should not interfere with any of our existing API. As long as test coverage is good, documentation is solid, and the code style matches the rest of Xarray, I think we can just bring it in.

  • How do I decrease the bus factor on datatree's code? Can I get some code reviews during the merging process? 🙏

I think that it is inevitable that you, Tom, will be the main owner of the Datatree code at the beginning (as @shoyer was of all of Xarray when he first released it). Over time, if people use it, some fraction of users will become maintainers, starting with the existing dev team.

  • Should I make a new CI environment just for testing datatree stuff?

Why? Are its dependencies different from Xarray?

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Import datatree in xarray? 1519552711
1315553661 https://github.com/pydata/xarray/issues/5878#issuecomment-1315553661 https://api.github.com/repos/pydata/xarray/issues/5878 IC_kwDOAMm_X85OacF9 rabernat 1197350 2022-11-15T16:22:30Z 2022-11-15T16:22:30Z MEMBER

Your issue is that the consolidated metadata have not been updated:

```python
import gcsfs
fs = gcsfs.GCSFileSystem()

# the latest array metadata
print(fs.cat('gs://ldeo-glaciology/append_test/test30/temperature/.zarray').decode())
# -> "shape": [ 6 ]

# the consolidated metadata
print(fs.cat('gs://ldeo-glaciology/append_test/test30/.zmetadata').decode())
# -> "shape": [ 3 ]
```

There are two ways to fix this.

  1. Don't use consolidated metadata on read. (This will be a bit slower.)

     ```python
     ds = xr.open_dataset('gs://ldeo-glaciology/append_test/test30', engine='zarr', consolidated=False)
     ```

  2. Reconsolidate your metadata after append (see the sketch below): https://zarr.readthedocs.io/en/stable/tutorial.html#consolidating-metadata
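For option 2, a minimal sketch of reconsolidating in place, assuming gcsfs is installed and you have write access to the bucket:

```python
import gcsfs
import zarr

fs = gcsfs.GCSFileSystem()
store = fs.get_mapper("gs://ldeo-glaciology/append_test/test30")
zarr.consolidate_metadata(store)  # rewrite .zmetadata from the arrays' current metadata
```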
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  problem appending to zarr on GCS when using json token  1030811490
1300863799 https://github.com/pydata/xarray/issues/6308#issuecomment-1300863799 https://api.github.com/repos/pydata/xarray/issues/6308 IC_kwDOAMm_X85NiZs3 rabernat 1197350 2022-11-02T16:39:53Z 2022-11-02T16:39:53Z MEMBER

Just found this issue! I agree that this would be helpful. But isn't it fundamentally a Dask issue? Vanilla Xarray + Numpy has none of these problems because everything is in memory.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.doctor(): diagnostics on a Dataset / DataArray ? 1151751524
1255550548 https://github.com/pydata/xarray/issues/6818#issuecomment-1255550548 https://api.github.com/repos/pydata/xarray/issues/6818 IC_kwDOAMm_X85K1i5U rabernat 1197350 2022-09-22T21:09:15Z 2022-09-22T21:09:15Z MEMBER

I just hit this same bug with numpy 1.23.3. Installing xarray from github main branch fixed it.

I think we really need to release soon (#7069).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray 2022.6.0 doesn't work well with numpy 1.20 1315607023
1248302788 https://github.com/pydata/xarray/issues/7039#issuecomment-1248302788 https://api.github.com/repos/pydata/xarray/issues/7039 IC_kwDOAMm_X85KZ5bE rabernat 1197350 2022-09-15T16:02:17Z 2022-09-15T16:02:17Z MEMBER

> I am curious as to what exactly from the encoding introduces the noise (I still need to read through the documentation more thoroughly)?

The encoding says that your data should be encoded according to the following pseudocode formula:

```
encoded = int((original - offset) / scale_factor)
decoded = (scale_factor * float(encoded)) + offset
```

So the floating-point data are converted back and forth to a less precise type (integer) in order to save space. These numerical operations cannot preserve exact floating-point accuracy. That's just how numerical floating-point operations work. If you skip the encoding, then you just write the floating point bytes directly to disk, with no loss of precision.

This sort of encoding is a crude form of lossy compression that is still unfortunately in use, even though there are much better algorithms available (and built into netcdf and zarr). Differences on the order of 10^-14 should not affect any real-world calculations.
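As a small numeric illustration of that round trip (the scale_factor, add_offset, and data value here are made up, not taken from this issue):

```python
import numpy as np

scale_factor, add_offset = 0.01, 250.0   # illustrative encoding parameters
original = np.float64(287.654321)

encoded = np.int16(round((original - add_offset) / scale_factor))
decoded = scale_factor * np.float64(encoded) + add_offset

print(decoded - original)   # quantization error, bounded by scale_factor / 2
```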

However, this seems like a much, much smaller difference than the problem you originally reported. This suggests that the MRE does not actually reproduce the bug after all. How was the plot above (https://github.com/pydata/xarray/issues/7039#issue-1373352524) generated? From your actual MRE code? Or from your earlier example with real data?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Encoding error when saving netcdf 1373352524
1248241823 https://github.com/pydata/xarray/issues/7039#issuecomment-1248241823 https://api.github.com/repos/pydata/xarray/issues/7039 IC_kwDOAMm_X85KZqif rabernat 1197350 2022-09-15T15:12:34Z 2022-09-15T15:12:34Z MEMBER

I'm puzzled that I was not able to reproduce this error. I modified the end slightly as follows

```python
# save dataset as netcdf
ds.to_netcdf("test.nc")

# load saved dataset
ds_test = xr.open_dataset('test.nc')

# verify that the two are equal within numerical precision
xr.testing.assert_allclose(ds, ds_test)

# plot
plt.plot(ds.t2m - ds_test.t2m)
```

In my case, the differences were just numerical noise (order 10^-14)

I used the binder environment for this.

I'm pretty stumped.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Encoding error when saving netcdf 1373352524
1248098918 https://github.com/pydata/xarray/issues/7039#issuecomment-1248098918 https://api.github.com/repos/pydata/xarray/issues/7039 IC_kwDOAMm_X85KZHpm rabernat 1197350 2022-09-15T13:25:11Z 2022-09-15T13:25:11Z MEMBER

Thanks so much for taking the time to write up this detailed bug report! 🙏

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Encoding error when saving netcdf 1373352524
1246005938 https://github.com/pydata/xarray/issues/2812#issuecomment-1246005938 https://api.github.com/repos/pydata/xarray/issues/2812 IC_kwDOAMm_X85KRIqy rabernat 1197350 2022-09-13T22:18:31Z 2022-09-13T22:18:31Z MEMBER

Glad you got it working! So you're saying it does not work with open_zarr and does work with open_dataset(...engine='zarr')? Weird. We should deprecate open_zarr.

> However, the behavior in Dask is strange. I think it is making each worker have its own cache and blowing up memory if I ask for a large cache.

Yes, I think I experienced that as well. I think the entire cache is serialized and passed around between workers.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  expose zarr caching from xarray 421029352
1243823078 https://github.com/pydata/xarray/issues/2812#issuecomment-1243823078 https://api.github.com/repos/pydata/xarray/issues/2812 IC_kwDOAMm_X85KIzvm rabernat 1197350 2022-09-12T14:25:39Z 2022-09-12T14:25:39Z MEMBER

I have successfully used the Zarr LRU cache with Xarray. You just have to initialize the Store object outside of Xarray and then pass it to open_zarr or open_dataset(store, engine="zarr").

Have you tried that?
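A minimal sketch of that pattern, assuming zarr v2 and an fsspec-backed store (the bucket path and cache size are placeholders):

```python
import fsspec
import xarray as xr
import zarr

store = fsspec.get_mapper("gs://my-bucket/my-dataset.zarr")  # placeholder path
cached_store = zarr.LRUStoreCache(store, max_size=2**28)     # ~256 MB in-memory cache

ds = xr.open_dataset(cached_store, engine="zarr")
```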

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  expose zarr caching from xarray 421029352
1216491512 https://github.com/pydata/xarray/issues/6916#issuecomment-1216491512 https://api.github.com/repos/pydata/xarray/issues/6916 IC_kwDOAMm_X85Igi_4 rabernat 1197350 2022-08-16T11:11:38Z 2022-08-16T11:11:38Z MEMBER

As a general principle, I think we should try to put enough information in encoding to enable one to re-open the dataset from scratch with the same parameters. So that would mean including the engine and other open_dataset options in encoding.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Given zarr-backed Xarray determine store and group 1339129609
1170451917 https://github.com/pydata/xarray/pull/6721#issuecomment-1170451917 https://api.github.com/repos/pydata/xarray/issues/6721 IC_kwDOAMm_X85Fw63N rabernat 1197350 2022-06-29T20:15:15Z 2022-06-29T20:15:15Z MEMBER

Awesome work!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix .chunks loading lazy backed array data 1284071791
1146377099 https://github.com/pydata/xarray/issues/6662#issuecomment-1146377099 https://api.github.com/repos/pydata/xarray/issues/6662 IC_kwDOAMm_X85EVFOL rabernat 1197350 2022-06-03T21:30:48Z 2022-06-03T21:30:48Z MEMBER

Following up on the suggestion from @shoyer to not use a context manager, if I redefine my function as

```python
def open_pickle_and_reload(path):
    of = fsspec.open(path, mode='rb').open()
    ds1 = xr.open_dataset(of, engine='h5netcdf')

    # pickle it and reload it
    ds2 = loads(dumps(ds1))
    ds2.load()
```

...it appears to work fine.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Obscure h5netcdf http serialization issue with python's http.server 1260047355
1146184372 https://github.com/pydata/xarray/issues/6662#issuecomment-1146184372 https://api.github.com/repos/pydata/xarray/issues/6662 IC_kwDOAMm_X85EUWK0 rabernat 1197350 2022-06-03T17:05:00Z 2022-06-03T17:06:26Z MEMBER

```python
with fsspec.open('http://127.0.0.1:8000/tiny.nc', mode='rb') as fp:
    with xr.open_dataset(fp, engine='h5netcdf') as ds1:
        print(type(fp))
        print(fp.__dict__)
        ds1.load()
```

```
<class 'fsspec.implementations.http.HTTPFile'>
{'asynchronous': False, 'url': 'http://127.0.0.1:8000/tiny.nc', 'session': <aiohttp.client.ClientSession object at 0x18bcdddc0>, '_details': {'name': 'http://127.0.0.1:8000/tiny.nc', 'size': 6164, 'type': 'file'}, 'size': 6164, 'path': 'http://127.0.0.1:8000/tiny.nc', 'fs': <fsspec.implementations.http.HTTPFileSystem object at 0x110059dc0>, 'mode': 'rb', 'blocksize': 5242880, 'loc': 1075, 'autocommit': True, 'end': None, 'start': None, '_closed': False, 'kwargs': {}, 'cache': <fsspec.caching.BytesCache object at 0x18eda16d0>, 'loop': <_UnixSelectorEventLoop running=True closed=False debug=False>}
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Obscure h5netcdf http serialization issue with python's http.server 1260047355
1146119478 https://github.com/pydata/xarray/issues/6662#issuecomment-1146119478 https://api.github.com/repos/pydata/xarray/issues/6662 IC_kwDOAMm_X85EUGU2 rabernat 1197350 2022-06-03T16:04:21Z 2022-06-03T16:05:40Z MEMBER

The http.server apparently does not accept range requests. That could definitely be related. However, I don't understand why that would affect only the pickled version. If the server doesn't support range requests, how are we able to load the file at all? This works:

```python
with fsspec.open('http://127.0.0.1:8000/tiny.nc', mode='rb') as fp:
    with xr.open_dataset(fp, engine='h5netcdf') as ds1:
        ds1.load()
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Obscure h5netcdf http serialization issue with python's http.server 1260047355
1146099479 https://github.com/pydata/xarray/issues/6662#issuecomment-1146099479 https://api.github.com/repos/pydata/xarray/issues/6662 IC_kwDOAMm_X85EUBcX rabernat 1197350 2022-06-03T15:54:34Z 2022-06-03T15:54:34Z MEMBER

> Python's HTTP server does not normally provide content lengths without some extra work, that might be the difference.

Don't think that's it.

```
% curl -I "http://127.0.0.1:8000/tiny.nc"
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.9.9
Date: Fri, 03 Jun 2022 15:53:52 GMT
Content-type: application/x-netcdf
Content-Length: 6164
Last-Modified: Fri, 03 Jun 2022 15:00:52 GMT
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Obscure h5netcdf http serialization issue with python's http.server 1260047355
1137851771 https://github.com/pydata/xarray/issues/6633#issuecomment-1137851771 https://api.github.com/repos/pydata/xarray/issues/6633 IC_kwDOAMm_X85D0j17 rabernat 1197350 2022-05-25T21:10:44Z 2022-05-25T21:10:44Z MEMBER

Yes it is definitely a pathological example. 💣 But the fact remains that there are many cases where we just want to discover dataset contents as quickly as possible and want to avoid the cost of loading coordinates and creating indexes.

{
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening dataset without loading any indexes? 1247010680
1137821786 https://github.com/pydata/xarray/issues/6633#issuecomment-1137821786 https://api.github.com/repos/pydata/xarray/issues/6633 IC_kwDOAMm_X85D0cha rabernat 1197350 2022-05-25T20:34:30Z 2022-05-25T20:34:59Z MEMBER

Here is an example that really highlights the performance cost of always loading dimension coordinates:

```python
import zarr
store = zarr.storage.FSStore("s3://mur-sst/zarr/", anon=True)
%time list(zarr.open_consolidated(store))         # -> Wall time: 86.4 ms
%time ds = xr.open_dataset(store, engine='zarr')  # -> Wall time: 17.1 s
```

%prun confirms that Xarray is spending most of its time just loading data for the time axis, which you can reproduce at the zarr level as:

```python
zgroup = zarr.open_consolidated(store)
%time _ = zgroup['time'][:]   # -> Wall time: 14.7 s
```

Obviously this example is pretty extreme. There are things that could be done to optimize it, etc. But it really highlights the costs of eagerly loading dimension coordinates. If I don't care about label-based indexing for this dataset, I would rather have my 17s back!

:+1: to "indexes={} (empty dictionary) to explicitly skip creating indexes".

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening dataset without loading any indexes? 1247010680
1122649316 https://github.com/pydata/xarray/issues/4628#issuecomment-1122649316 https://api.github.com/repos/pydata/xarray/issues/4628 IC_kwDOAMm_X85C6kTk rabernat 1197350 2022-05-10T17:00:47Z 2022-05-10T17:02:34Z MEMBER

> Any pointers regarding where to start / modules involved to implement this? I would like to have a try.

The starting point would be to look at the code in indexing.py and try to understand how lazy indexing works.

In particular, look at

https://github.com/pydata/xarray/blob/3920c48d61d1f213a849bae51faa473b9c471946/xarray/core/indexing.py#L465-L470

Then you may want to try writing a class that looks like

```python
class LazilyConcatenatedArray:  # have to decide what to inherit from

    def __init__(self, *arrays: LazilyIndexedArray, concat_axis=0):
        ...  # figure out what you need to keep track of

    @property
    def shape(self):
        ...  # figure out how to determine the total shape

    def __getitem__(self, indexer) -> LazilyIndexedArray:
        ...  # figure out how to map an indexer to the right piece of data
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Lazy concatenation of arrays 753852119
1122567902 https://github.com/pydata/xarray/issues/6588#issuecomment-1122567902 https://api.github.com/repos/pydata/xarray/issues/6588 IC_kwDOAMm_X85C6Qbe rabernat 1197350 2022-05-10T15:48:03Z 2022-05-10T15:48:03Z MEMBER

Oops sorry for the duplicate issue! 🤦

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support lazy concatenation *without dask* 1231184996
1115292947 https://github.com/pydata/xarray/pull/6566#issuecomment-1115292947 https://api.github.com/repos/pydata/xarray/issues/6566 IC_kwDOAMm_X85CegUT rabernat 1197350 2022-05-02T19:46:06Z 2022-05-02T19:46:06Z MEMBER

Exposing this option seems like a great idea IMO.

I'm not sure the best way to test this. I think the most basic test is just to make sure the inline=True option gets invoked in the test suite. Going further, one could examine the dask graph to make sure inlining is actually happening, but that sounds fragile and maybe also not xarray's responsibility. Let's just make sure it gets to dask.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New inline_array kwarg for open_dataset 1223270563
1113408611 https://github.com/pydata/xarray/issues/6538#issuecomment-1113408611 https://api.github.com/repos/pydata/xarray/issues/6538 IC_kwDOAMm_X85CXURj rabernat 1197350 2022-04-29T14:46:13Z 2022-04-29T14:46:13Z MEMBER

Thanks so much for opening this @philippjfr!

I agree this is a major regression. Accessing .chunk on a variable should not trigger eager loading of the data.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Accessing chunks on zarr backed xarray seems to load entire array into memory 1220990859
1102992117 https://github.com/pydata/xarray/issues/6484#issuecomment-1102992117 https://api.github.com/repos/pydata/xarray/issues/6484 IC_kwDOAMm_X85BvlL1 rabernat 1197350 2022-04-19T19:08:31Z 2022-04-19T19:08:31Z MEMBER

Big :+1:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should we raise a more informative error on no zarr dir? 1203835220
1099797820 https://github.com/pydata/xarray/issues/6448#issuecomment-1099797820 https://api.github.com/repos/pydata/xarray/issues/6448 IC_kwDOAMm_X85BjZU8 rabernat 1197350 2022-04-15T02:38:48Z 2022-04-15T02:38:48Z MEMBER

I am guilty of sidetracking this issue into the politics of CRS encoding. That discussion is important. But in the meantime, @wankoelias's original issue reveals a narrower technical issue with Xarray's Zarr writer: Xarray won't let you serialize a dictionary attribute to zarr, even though zarr has no problem with this. That is a problem we can fix pretty easily.

The _validate_attrs helper function was just borrowed from to_netcdf:

https://github.com/pydata/xarray/blob/586992e8d2998751cb97b1cab4d3caa9dca116e0/xarray/backends/api.py#L133-L135

We could refactor this function to be more flexible to account for zarr's broader range of allowed attribute types (as we have evidently already done for h5netcdf). Or we could just bypass it completely in the to_zarr method. That is the only real decision we need to make here right now.

@wankoelias - you seem to understand the issue pretty well. Would you be game for making a PR? We would be glad to support you along the way.

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 2,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writing GDAL ZARR _CRS attribute not possible 1194993450
1091703481 https://github.com/pydata/xarray/issues/6448#issuecomment-1091703481 https://api.github.com/repos/pydata/xarray/issues/6448 IC_kwDOAMm_X85BEhK5 rabernat 1197350 2022-04-07T12:57:17Z 2022-04-07T12:57:17Z MEMBER

@christophenoel - I share your perspective. But there is a huge swath of the geospatial world who basically hate NetCDF and avoid it like the plague. These communities prefer to use geotiff and GDAL. We need to reach for interoperability.

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Writing GDAL ZARR _CRS attribute not possible 1194993450
1090742693 https://github.com/pydata/xarray/issues/6448#issuecomment-1090742693 https://api.github.com/repos/pydata/xarray/issues/6448 IC_kwDOAMm_X85BA2ml rabernat 1197350 2022-04-06T20:21:20Z 2022-04-06T20:22:40Z MEMBER

I think the core problem here is that Zarr itself supports arbitrary json data structures as attributes, but netCDF does not. The Zarr serialization in Xarray is designed to emulate netCDF, but we could make that optional, for example, with a flag to bypass attribute encoding / decoding and just pass the python data directly through to Zarr.

However, my concern would be that the netCDF4 C library would not be able to read those files (nczarr). What happens if you try to open up a GDAL-created Zarr with netCDF4?

FWIW, the new GeoZarr Spec by @christophenoel does not use the GDAL convention for CRS. Instead, it recommends to use CF conventions for encoding CRS. This is more compatible with NetCDF, but won't be parsed correctly by GDAL.

I am a little discouraged that we have not managed to align better across projects so far (e.g. having this conversation before the GDAL Zarr CRS convention was implemented). 😞 For example, either of these two GDAL PRs:
  • https://github.com/OSGeo/gdal/pull/3896
  • https://github.com/OSGeo/gdal/pull/4521

However, it is not too late! Let's try to reach for a standard way of encoding CRS in Zarr that can be used across languages and implementations of Zarr.

My own preference would be to try to get GDAL to support the GeoZarr Spec and thus the CF-convention CRS attribute, rather than trying to get Xarray to be able to write the GDAL CRS convention.

{
    "total_count": 7,
    "+1": 7,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writing GDAL ZARR _CRS attribute not possible 1194993450
1076810559 https://github.com/pydata/xarray/issues/6374#issuecomment-1076810559 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85ALtM_ rabernat 1197350 2022-03-23T20:54:39Z 2022-03-23T20:54:39Z MEMBER

Sure, to be clear, my hesitancy is mostly just around being reluctant to maintain more complexity in our zarr interface. If there is momentum to implement and maintain this compatibility, I am definitely not opposed. 🚀

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1076622767 https://github.com/pydata/xarray/issues/6374#issuecomment-1076622767 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85AK_Wv rabernat 1197350 2022-03-23T17:39:57Z 2022-03-23T17:39:57Z MEMBER

My opinion is that we should not try to support the nczarr conventions directly. Xarray already supports nczarr via netCDF4. If netCDF4 can open the Zarr store, then Xarray can read it.

Supporting nczarr directly would require lots of custom logic within xarray. That's because nczarr introduces several additional metadata files that are not part of the zarr spec. These additional metadata files break the abstractions through which xarray interacts with zarr; working around this requires going under the hood and accessing the store object directly (rather than the zarr groups and arrays).

I would turn this question around and ask: if netCDF4 supports access to these datasets directly, what's the advantage of xarray bypassing netCDF4 and opening them directly? If there are significant performance benefits, I would be more likely to consider it worthwhile.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1065385198 https://github.com/pydata/xarray/issues/6345#issuecomment-1065385198 https://api.github.com/repos/pydata/xarray/issues/6345 IC_kwDOAMm_X84_gHzu rabernat 1197350 2022-03-11T18:41:11Z 2022-03-11T18:41:11Z MEMBER

It seems like what we really want to do is verify that the datatype of the appended data matches the data type on disk.
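A rough sketch of the kind of check being suggested; check_append_dtypes is a hypothetical helper, not existing xarray code:

```python
import zarr

def check_append_dtypes(ds, store, group=None):
    """Compare each appended variable's dtype with the on-disk zarr array."""
    zgroup = zarr.open_group(store, mode="r", path=group)
    for name, var in ds.variables.items():
        if name in zgroup and zgroup[name].dtype != var.dtype:
            raise ValueError(
                f"dtype mismatch for {name!r}: on disk {zgroup[name].dtype}, "
                f"appending {var.dtype}"
            )
```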

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `to_zarr` raises `ValueError: Invalid dtype` with `mode='a'` (but not with `mode='w'`) 1164454058
1065350469 https://github.com/pydata/xarray/issues/6345#issuecomment-1065350469 https://api.github.com/repos/pydata/xarray/issues/6345 IC_kwDOAMm_X84_f_VF rabernat 1197350 2022-03-11T17:58:28Z 2022-03-11T17:58:28Z MEMBER

Thanks for reporting this @kmsampson. My feeling is that it is a bug...which we can hopefully fix pretty easily!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `to_zarr` raises `ValueError: Invalid dtype` with `mode='a'` (but not with `mode='w'`) 1164454058
1063401936 https://github.com/pydata/xarray/issues/6345#issuecomment-1063401936 https://api.github.com/repos/pydata/xarray/issues/6345 IC_kwDOAMm_X84_YjnQ rabernat 1197350 2022-03-09T21:43:49Z 2022-03-09T21:43:49Z MEMBER

The relevant code is here

https://github.com/pydata/xarray/blob/d293f50f9590251ce09543319d1f0dc760466f1b/xarray/backends/api.py#L1405-L1406

and here

https://github.com/pydata/xarray/blob/d293f50f9590251ce09543319d1f0dc760466f1b/xarray/backends/api.py#L1280-L1298

What I don't understand is why different validation is needed for the append scenario than for the write scenario. @shoyer worked on this in #5252, so maybe he has some ideas.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `to_zarr` raises `ValueError: Invalid dtype` with `mode='a'` (but not with `mode='w'`) 1164454058
1043038150 https://github.com/pydata/xarray/issues/1385#issuecomment-1043038150 https://api.github.com/repos/pydata/xarray/issues/1385 IC_kwDOAMm_X84-K3_G rabernat 1197350 2022-02-17T14:57:03Z 2022-02-17T14:57:03Z MEMBER

See deeper dive in https://github.com/pydata/xarray/discussions/6284

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  slow performance with open_mfdataset 224553135
1043016100 https://github.com/pydata/xarray/issues/1385#issuecomment-1043016100 https://api.github.com/repos/pydata/xarray/issues/1385 IC_kwDOAMm_X84-Kymk rabernat 1197350 2022-02-17T14:36:23Z 2022-02-17T14:36:23Z MEMBER

Ah ok so if that is your goal, decode_times=False should be enough to solve it.

There is a problem with the time encoding in this file. The units (days since 1950-01-01T00:00:00Z) are not compatible with the values (738457.04166667, etc.). That would place your measurements sometime in the year 3971. This is part of the problem, but not the whole story.
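The arithmetic behind "the year 3971", using the values quoted above:

```python
days = 738457.04166667   # time value stored in the file
years = days / 365.25     # interpret "days since 1950-01-01" literally
print(1950 + years)       # ~3971.8, i.e. roughly the year 3971
```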

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  slow performance with open_mfdataset 224553135
1043001146 https://github.com/pydata/xarray/issues/1385#issuecomment-1043001146 https://api.github.com/repos/pydata/xarray/issues/1385 IC_kwDOAMm_X84-Ku86 rabernat 1197350 2022-02-17T14:21:45Z 2022-02-17T14:22:23Z MEMBER

> (I could post to a web server if there's any reason to prefer that.)

In general that would be a little more convenient than google drive, because then we could download the file from python (rather than having a manual step). This would allow us to share a fully copy-pasteable code snippet to reproduce the issue. But don't worry about that for now.

First, I'd note that your issue is not really related to open_mfdataset at all, since it is reproduced just using open_dataset. The core problem is that you have ~15M timesteps, and it is taking forever to decode the times out of them. It's fast when you do decode_times=False because the data aren't actually being read. I'm going to make a post over in discussions to dig a bit deeper into this. StackOverflow isn't monitored too regularly by this community.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  slow performance with open_mfdataset 224553135
1042937825 https://github.com/pydata/xarray/issues/1385#issuecomment-1042937825 https://api.github.com/repos/pydata/xarray/issues/1385 IC_kwDOAMm_X84-Kffh rabernat 1197350 2022-02-17T13:14:50Z 2022-02-17T13:14:50Z MEMBER

Hi Tom! 👋

So much has evolved about xarray since this original issue was posted. However, we continue to use it as a catchall for people looking to speed up open_mfdataset. I saw your stackoverflow post. Any chance you could post a link to the actual file in question?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  slow performance with open_mfdataset 224553135
1033782892 https://github.com/pydata/xarray/pull/6258#issuecomment-1033782892 https://api.github.com/repos/pydata/xarray/issues/6258 IC_kwDOAMm_X849nkZs rabernat 1197350 2022-02-09T13:51:55Z 2022-02-09T13:51:55Z MEMBER

> came to the conclusion that the previously existing tests had been overly restrictive

Sounds very likely!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  removed check for last dask chunk size in to_zarr 1128485610
1033779138 https://github.com/pydata/xarray/pull/5692#issuecomment-1033779138 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X849njfC rabernat 1197350 2022-02-09T13:47:43Z 2022-02-09T13:47:43Z MEMBER

Just chiming in to say 💪 ! We see the work you are putting in @benbovy. I'm so excited to be using this feature. Is there a way I can help?

{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1033757210 https://github.com/pydata/xarray/pull/6258#issuecomment-1033757210 https://api.github.com/repos/pydata/xarray/issues/6258 IC_kwDOAMm_X849neIa rabernat 1197350 2022-02-09T13:23:23Z 2022-02-09T13:23:23Z MEMBER

Thanks for working on this Tobias! Yes I implemented much of the Dask / Zarr interface and would be happy to review when you're ready.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  removed check for last dask chunk size in to_zarr 1128485610
984940677 https://github.com/pydata/xarray/issues/1068#issuecomment-984940677 https://api.github.com/repos/pydata/xarray/issues/1068 IC_kwDOAMm_X846tQCF rabernat 1197350 2021-12-02T19:36:12Z 2021-12-02T19:36:12Z MEMBER

One solution to this problem might be the creation of a custom Xarray backend for NASA EarthData. This backend could manage authentication with EDL and have its own documentation. If this package were maintained by NASA, it would close the feedback loop more effectively.
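A very rough sketch of what such a backend could look like, using xarray's documented BackendEntrypoint mechanism; the class name and the earthdata_open helper are hypothetical, and a real implementation would live in its own package registered under the "xarray.backends" entry-point group:

```python
import xarray as xr
from xarray.backends import BackendEntrypoint


def earthdata_open(url):
    # placeholder for Earthdata Login (EDL) authentication; a real backend
    # would return an authenticated file-like object here
    raise NotImplementedError


class EarthDataBackendEntrypoint(BackendEntrypoint):
    def open_dataset(self, filename_or_obj, *, drop_variables=None, **kwargs):
        fileobj = earthdata_open(filename_or_obj)
        return xr.open_dataset(fileobj, engine="h5netcdf", drop_variables=drop_variables)

    def guess_can_open(self, filename_or_obj):
        return isinstance(filename_or_obj, str) and "earthdata.nasa.gov" in filename_or_obj
```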

{
    "total_count": 5,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 4,
    "eyes": 1
}
  Use xarray.open_dataset() for password-protected Opendap files 186169975
984920867 https://github.com/pydata/xarray/issues/1068#issuecomment-984920867 https://api.github.com/repos/pydata/xarray/issues/1068 IC_kwDOAMm_X846tLMj rabernat 1197350 2021-12-02T19:08:54Z 2021-12-02T19:08:54Z MEMBER

Just wanted to say how much I appreciate @betolink acting as a communication channel between Xarray and NASA. Users often end up on our issue tracker because Xarray raises errors whenever it can't read data. But the source of these problems is not with Xarray, it's with the upstream data provider.

This also happens all the time with xmitgcm, e.g. https://github.com/MITgcm/xmitgcm/issues/266

It would be great if NASA had a better way to respond to these issues which didn't require that you "know a guy".

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use xarray.open_dataset() for password-protected Opendap files 186169975
971790307 https://github.com/pydata/xarray/issues/5995#issuecomment-971790307 https://api.github.com/repos/pydata/xarray/issues/5995 IC_kwDOAMm_X8457Ffj rabernat 1197350 2021-11-17T17:18:41Z 2021-11-17T17:18:41Z MEMBER

> How can i tell xarray to load/dump variable by variable without loading the entire file?

You could try to chunk the data and then Dask will write it for you in chunks. To do it in serial you could use the dask single-threaded scheduler.
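A minimal sketch of that suggestion (the file names and chunk sizes are placeholders):

```python
import dask
import xarray as xr

ds = xr.open_dataset("input.nc", chunks={"time": 100})  # chunk so Dask writes piecewise
with dask.config.set(scheduler="single-threaded"):       # serial, low-memory write
    ds.to_netcdf("output.nc")
```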

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  High memory usage of xarray vs netCDF4 function 1056247970
969021506 https://github.com/pydata/xarray/issues/5878#issuecomment-969021506 https://api.github.com/repos/pydata/xarray/issues/5878 IC_kwDOAMm_X845whhC rabernat 1197350 2021-11-15T15:25:37Z 2021-11-15T15:25:46Z MEMBER

So there are two layers here where caching could be happening:
  • gcsfs / fsspec (python)
  • gcs itself

I propose we eliminate the python layer entirely for the moment. Whenever you load the dataset, its shape is completely determined by whatever zarr sees in gs://ldeo-glaciology/append_test/test5/temperature/.zarray. So try looking at this file directly. You can figure out its public URL and just do curl, e.g.

```
curl https://storage.googleapis.com/ldeo-glaciology/append_test/test5/temperature/.zarray
{
    "chunks": [ 3 ],
    "compressor": { "blocksize": 0, "clevel": 5, "cname": "lz4", "id": "blosc", "shuffle": 1 },
    "dtype": "<i8",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [ 6 ],
    "zarr_format": 2
}
```

Run this from the command line on your JupyterHub. Then try gcs.cat('ldeo-glaciology/append_test/test5/temperature/.zarray') and see if you see the same thing. Basically, just eliminate as many layers as possible from the problem until you get to the core issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  problem appending to zarr on GCS when using json token  1030811490
968993065 https://github.com/pydata/xarray/issues/1068#issuecomment-968993065 https://api.github.com/repos/pydata/xarray/issues/1068 IC_kwDOAMm_X845wakp rabernat 1197350 2021-11-15T14:58:05Z 2021-11-15T14:58:05Z MEMBER

At what point do we escalate this issue to NASA? Is there a channel via which they can receive and respond to user feedback?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use xarray.open_dataset() for password-protected Opendap files 186169975
967363845 https://github.com/pydata/xarray/issues/5878#issuecomment-967363845 https://api.github.com/repos/pydata/xarray/issues/5878 IC_kwDOAMm_X845qM0F rabernat 1197350 2021-11-12T19:18:38Z 2021-11-12T19:18:38Z MEMBER

Ok I think I may understand what is happening

```python
# load the zarr store
ds_both = xr.open_zarr(mapper)
```

When you do this, zarr reads a file called gs://ldeo-glaciology/append_test/test5/temperature/.zarray. Since the data are public, I can look at it right now:

```
$ gsutil cat gs://ldeo-glaciology/append_test/test5/temperature/.zarray
{
    "chunks": [ 3 ],
    "compressor": { "blocksize": 0, "clevel": 5, "cname": "lz4", "id": "blosc", "shuffle": 1 },
    "dtype": "<i8",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [ 6 ],
}
```

Right now, it shows the shape is [6], as expected after the appending. However, if you read the file immediately after appending (within the 3600s max-age), you will get the cached copy. The cached copy will still be of shape [3]--it won't know about the append.

To test this hypothesis, you would need to disable caching on the bucket. Do you have privileges to do that?
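If you do have those privileges, one possible way is to set Cache-Control metadata on the object; a sketch using the google-cloud-storage client (not gcsfs):

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("ldeo-glaciology")
blob = bucket.blob("append_test/test5/temperature/.zarray")
blob.cache_control = "no-store"   # ask GCS not to serve cached copies of this object
blob.patch()                      # push the metadata change
```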

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  problem appending to zarr on GCS when using json token  1030811490
967142419 https://github.com/pydata/xarray/issues/5878#issuecomment-967142419 https://api.github.com/repos/pydata/xarray/issues/5878 IC_kwDOAMm_X845pWwT rabernat 1197350 2021-11-12T14:05:36Z 2021-11-12T14:05:36Z MEMBER

Can you post the full stack trace of the error you get when you try to append?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  problem appending to zarr on GCS when using json token  1030811490
966665066 https://github.com/pydata/xarray/issues/5878#issuecomment-966665066 https://api.github.com/repos/pydata/xarray/issues/5878 IC_kwDOAMm_X845niNq rabernat 1197350 2021-11-11T22:17:32Z 2021-11-11T22:17:32Z MEMBER

I think that this is not an issue with xarray, zarr, or anything in python world but rather an issue with how caching works on GCS public buckets: https://cloud.google.com/storage/docs/metadata

To test this, forget about xarray and zarr for a minute and just use gcsfs to list the bucket contents before and after your writes. I think you will find that the default cache lifetime of 3600 seconds means that you cannot "see" the changes to the bucket or the objects as quickly as needed in order to append.
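A minimal version of that test, using the paths from this issue (the append call in the middle is a placeholder):

```python
import gcsfs

fs = gcsfs.GCSFileSystem()
before = fs.ls("ldeo-glaciology/append_test/test5")
# ... run the to_zarr(..., append_dim=...) call here ...
fs.invalidate_cache()             # drop gcsfs's own directory-listing cache
after = fs.ls("ldeo-glaciology/append_test/test5")
print(set(after) - set(before))   # new objects visible after the append (if any)
```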

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  problem appending to zarr on GCS when using json token  1030811490
966324523 https://github.com/pydata/xarray/issues/1068#issuecomment-966324523 https://api.github.com/repos/pydata/xarray/issues/1068 IC_kwDOAMm_X845mPEr rabernat 1197350 2021-11-11T13:59:55Z 2021-11-11T13:59:55Z MEMBER

I'd like to tag @betolink in this issue. He knows quite a bit about both Xarray and Earthdata login. Maybe he can help us get to the bottom of these problems. Luis, any ideas?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use xarray.open_dataset() for password-protected Opendap files 186169975
964084038 https://github.com/pydata/xarray/issues/5954#issuecomment-964084038 https://api.github.com/repos/pydata/xarray/issues/5954 IC_kwDOAMm_X845dsFG rabernat 1197350 2021-11-09T11:56:30Z 2021-11-09T11:56:30Z MEMBER

Thanks for the info @alexamici!

> 2. but most backends serialise writes anyway, so the advantage is limited.

I'm not sure I understand this comment, specifically what is meant by "serialise writes". I often use Xarray to do distributed writes to Zarr stores using 100+ distributed dask workers. It works great. We would need the same thing from a TileDB backend.

We are focusing on the user-facing API, but in the end, whether we call it .to, .to_dataset, or .store_dataset is not really a difficult or important question. It's clear we need some generic writing method. The much harder question is the back-end API. As Alessandro says:

Adding support for a single save_dataset entry point to the backend API is trivial, but adding full support for possibly distributed writes looks like it is much more work.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writeable backends via entrypoints 1047608434
961202990 https://github.com/pydata/xarray/issues/5918#issuecomment-961202990 https://api.github.com/repos/pydata/xarray/issues/5918 IC_kwDOAMm_X845Sssu rabernat 1197350 2021-11-04T16:21:23Z 2021-11-04T16:21:23Z MEMBER

Maybe @martindurant has some insights?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading zarr gives unspecific PermissionError: Access Denied when public data has been consolidated after being written to S3 1039844354
938741037 https://github.com/pydata/xarray/issues/1900#issuecomment-938741037 https://api.github.com/repos/pydata/xarray/issues/1900 IC_kwDOAMm_X8439A0t rabernat 1197350 2021-10-08T15:41:29Z 2021-10-08T15:41:29Z MEMBER

But Pydantic looks promising

Big :+1: to this.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Representing & checking Dataset schemas  295959111
863452266 https://github.com/pydata/xarray/pull/5252#issuecomment-863452266 https://api.github.com/repos/pydata/xarray/issues/5252 MDEyOklzc3VlQ29tbWVudDg2MzQ1MjI2Ng== rabernat 1197350 2021-06-17T18:07:28Z 2021-06-17T18:07:28Z MEMBER

Really sorry I didn't get around to review. My excuse is that I moved back to NYC last week and fell behind on everything. Thanks for moving it forward. 💪

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add mode="r+" for to_zarr and use consolidated writes/reads by default 874331538
863213400 https://github.com/pydata/xarray/issues/5028#issuecomment-863213400 https://api.github.com/repos/pydata/xarray/issues/5028 MDEyOklzc3VlQ29tbWVudDg2MzIxMzQwMA== rabernat 1197350 2021-06-17T12:53:16Z 2021-06-17T12:53:22Z MEMBER

So glad this got fixed upstream! That's how it is supposed to work! 🏆 Thanks to everyone for making this happen.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Saving zarr to remote location lower cases all data_vars 830507003
839106491 https://github.com/pydata/xarray/issues/5219#issuecomment-839106491 https://api.github.com/repos/pydata/xarray/issues/5219 MDEyOklzc3VlQ29tbWVudDgzOTEwNjQ5MQ== rabernat 1197350 2021-05-11T20:08:27Z 2021-05-11T20:08:27Z MEMBER

Instead we could require explicitly supplying chunks via the encoding parameter in the to_zarr() call.

This could also break existing workflows though. For example, pangeo-forge is using the encoding.chunks attribute to specify target dataset chunks.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr encoding attributes persist after slicing data, raising error on `to_zarr` 868352536
832712426 https://github.com/pydata/xarray/issues/3653#issuecomment-832712426 https://api.github.com/repos/pydata/xarray/issues/3653 MDEyOklzc3VlQ29tbWVudDgzMjcxMjQyNg== rabernat 1197350 2021-05-05T14:01:25Z 2021-05-05T14:01:33Z MEMBER

Update: there is now a way to read a remote netCDF file from an HTTP server directly using the netcdf-python library. The trick is to append #mode=bytes to the end of the url.

```python
import xarray as xr
import netCDF4  # I'm using version 1.5.6

url = "https://www.ldeo.columbia.edu/~rpa/NOAA_NCDC_ERSST_v3b_SST.nc#mode=bytes"

# raw netcdf4 Dataset
ds = netCDF4.Dataset(url)

# xarray Dataset
ds = xr.open_dataset(url)
```

{
    "total_count": 12,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 6,
    "rocket": 0,
    "eyes": 0
}
  "[Errno -90] NetCDF: file not found: b" when opening netCDF from server 543197350
831970193 https://github.com/pydata/xarray/pull/5252#issuecomment-831970193 https://api.github.com/repos/pydata/xarray/issues/5252 MDEyOklzc3VlQ29tbWVudDgzMTk3MDE5Mw== rabernat 1197350 2021-05-04T14:07:03Z 2021-05-04T14:07:03Z MEMBER

Question: does this mode still require eager loading of dimension coordinates?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add mode="r+" for to_zarr and use consolidated writes/reads by default 874331538
828071017 https://github.com/pydata/xarray/issues/5219#issuecomment-828071017 https://api.github.com/repos/pydata/xarray/issues/5219 MDEyOklzc3VlQ29tbWVudDgyODA3MTAxNw== rabernat 1197350 2021-04-28T01:26:34Z 2021-04-28T01:26:34Z MEMBER

we probably would NOT want to use safe_chunks=False, correct?

correct

The problem in this issue is that the dataset is carrying around its original chunks in .encoding, and then xarray tries to use these values to set the chunk encoding on the second write op. The solution is to manually delete the chunk encoding from all your data variables. Something like

```python
for var in ds:
    del ds[var].encoding['chunks']
```

Originally part of #5056 was a change that would have xarray automatically do this deletion after some operations (such as calling .chunk()); however, we could not reach a consensus on the best way to implement that change. Your example is interesting because it is a slightly different scenario -- calling sel() instead of chunk() -- but the root cause appears to be the same: encoding['chunks'] is being kept around too conservatively.
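
For concreteness, a hypothetical version of the workflow in question (store paths and the time slice are made up):

```python
# encoding['chunks'] read from the source store survives the .sel() and can
# then conflict with the dask chunking of the subset on write.
import xarray as xr

ds = xr.open_zarr("source.zarr")                  # picks up encoding['chunks']
subset = ds.sel(time=slice("2000-01", "2000-06"))
for var in subset:
    del subset[var].encoding["chunks"]            # drop the stale chunk encoding
subset.to_zarr("subset.zarr", mode="w")
```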

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr encoding attributes persist after slicing data, raising error on `to_zarr` 868352536
826913149 https://github.com/pydata/xarray/pull/5065#issuecomment-826913149 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgyNjkxMzE0OQ== rabernat 1197350 2021-04-26T15:08:43Z 2021-04-26T15:08:43Z MEMBER

I think this PR has received a very thorough review. I would be pleased if someone from @pydata/xarray would merge it soon.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
826888674 https://github.com/pydata/xarray/pull/5065#issuecomment-826888674 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgyNjg4ODY3NA== rabernat 1197350 2021-04-26T14:38:49Z 2021-04-26T14:38:49Z MEMBER

The pre-commit workflow is raising a blackdoc error I am not seeing in my local env

```diff
diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst
index eb5c4d8..2bc3c1f 100644
--- a/doc/internals/duck-arrays-integration.rst
+++ b/doc/internals/duck-arrays-integration.rst
@@ -25,7 +25,7 @@ argument:
     ...

     def _repr_inline_(self, max_width):
-        """ format to a single line with at most max_width characters """
+        """format to a single line with at most max_width characters"""
     ...
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
822571688 https://github.com/pydata/xarray/issues/4554#issuecomment-822571688 https://api.github.com/repos/pydata/xarray/issues/4554 MDEyOklzc3VlQ29tbWVudDgyMjU3MTY4OA== rabernat 1197350 2021-04-19T15:44:07Z 2021-04-19T15:44:07Z MEMBER

we rearrange the DataArrays to 2D arrays

FWIW, this is the exact same thing we do in xhistogram in order to apply histogram over a specific group of axes:

https://github.com/xgcm/xhistogram/blob/2681aee6fe04e7656c458f32277f87e76653b6e8/xhistogram/core.py#L238-L254

We noticed a similar problem with Dask's reshape implementation, raised here: https://github.com/dask/dask/issues/5544
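
A rough illustration of the reshape-to-2D trick and its chunking side effects (the array shape and chunks here are made up):

```python
# Collapse the axes being reduced into a single trailing axis; dask's reshape
# can rechunk the result in surprising ways (see dask/dask#5544).
import dask.array as da

arr = da.random.random((12, 100, 200), chunks=(1, 100, 200))  # (time, y, x)
flat = arr.reshape(arr.shape[0], -1)                          # -> (time, y*x)
print(flat.chunks)
```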

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unexpected chunking of 3d DataArray in `polyfit()` 732910109
821315433 https://github.com/pydata/xarray/issues/5172#issuecomment-821315433 https://api.github.com/repos/pydata/xarray/issues/5172 MDEyOklzc3VlQ29tbWVudDgyMTMxNTQzMw== rabernat 1197350 2021-04-16T17:07:03Z 2021-04-16T17:07:03Z MEMBER

Yes I agree. Should I just close this and move it to h5netcdf?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Inconsistent attribute handling between netcdf4 and h5netcdf engines 859945463
817990859 https://github.com/pydata/xarray/pull/5065#issuecomment-817990859 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxNzk5MDg1OQ== rabernat 1197350 2021-04-12T17:27:28Z 2021-04-12T17:27:28Z MEMBER

Any further feedback on this now reduced-scope PR? Merging this would help move Pangeo Forge forward.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
815019613 https://github.com/pydata/xarray/pull/5065#issuecomment-815019613 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxNTAxOTYxMw== rabernat 1197350 2021-04-07T15:44:25Z 2021-04-07T15:44:25Z MEMBER

I have removed the controversial encoding['chunks'] stuff from the PR. Now it only contains the safe_chunks option in to_zarr.

If there are no further comments on this, I think this is good to go.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
814102743 https://github.com/pydata/xarray/pull/5065#issuecomment-814102743 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxNDEwMjc0Mw== rabernat 1197350 2021-04-06T13:03:53Z 2021-04-06T13:03:53Z MEMBER

We seem to be unable to resolve the complexities around chunk encoding. I propose to remove this from the PR and reduce the scope to just the safe_chunks features. @aurghs should probably be the one to tackle the chunk encoding problem; unfortunately it exceeds my understanding, and I don't have time to dig deeper at the moment. In the meantime safe_chunks is important for pangeo-forge forward progress.

Please give a 👍 or 👎 to this idea if you have an opinion.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
811975731 https://github.com/pydata/xarray/pull/5065#issuecomment-811975731 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTk3NTczMQ== rabernat 1197350 2021-04-01T15:12:15Z 2021-04-01T15:12:15Z MEMBER

But it seems to me that having two different definitions of chunks (dask one and encoded one), is not very intuitive and it's not easy to define a clear default in writing.

My use for encoding.chunks is to tell Zarr what chunks to use on disk.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
811308284 https://github.com/pydata/xarray/pull/5065#issuecomment-811308284 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTMwODI4NA== rabernat 1197350 2021-03-31T18:23:03Z 2021-03-31T18:23:03Z MEMBER

So any ideas how to proceed? 🧐

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
811275436 https://github.com/pydata/xarray/pull/5065#issuecomment-811275436 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTI3NTQzNg== rabernat 1197350 2021-03-31T17:31:53Z 2021-03-31T17:32:12Z MEMBER

I just pushed a new commit which deletes all encoding inside variable.chunk(). But as you will see when the CI finishes, this leads to a lot of test failures. For example:

```
=============================================================================== FAILURES ===============================================================================
_______________________ TestNetCDF4ViaDaskData.test_roundtrip_string_encoded_characters _______________________

self = <xarray.tests.test_backends.TestNetCDF4ViaDaskData object at 0x18cba4c40>

    def test_roundtrip_string_encoded_characters(self):
        expected = Dataset({"x": ("t", ["ab", "cdef"])})
        expected["x"].encoding["dtype"] = "S1"
        with self.roundtrip(expected) as actual:
            assert_identical(expected, actual)
>           assert actual["x"].encoding["_Encoding"] == "utf-8"
E           KeyError: '_Encoding'

/Users/rpa/Code/xarray/xarray/tests/test_backends.py:485: KeyError
```

Why is chunk getting called here? Does it actually get called every time we load a dataset with chunks? If so, we will need a more sophisticated solution.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
811265134 https://github.com/pydata/xarray/pull/5065#issuecomment-811265134 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTI2NTEzNA== rabernat 1197350 2021-03-31T17:17:07Z 2021-03-31T17:17:07Z MEMBER

Replace self._encoding with None here?

Thanks! Yeah that's what I had in mind. But I was wondering if there was an existing example of doing that elsewhere that I could copy.

In any case, I'll give it a try now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
811189539 https://github.com/pydata/xarray/pull/5065#issuecomment-811189539 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTE4OTUzOQ== rabernat 1197350 2021-03-31T16:12:13Z 2021-03-31T16:12:23Z MEMBER

In today's dev call, we proposed to handle encoding in chunk the same way we handle it in indexing: by deleting all encoding.

The problem is, I can't figure out where this happens. Can someone point me to the place in the code where indexing operations delete encoding?

A related question: I discovered this encoding option preferred_chunks, which is treated specially: https://github.com/pydata/xarray/blob/57a4479fcd3ebc579cf00e0d6bf85007eda44b56/xarray/core/dataset.py#L396

Should the Zarr backend be setting this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
811148122 https://github.com/pydata/xarray/pull/5065#issuecomment-811148122 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTE0ODEyMg== rabernat 1197350 2021-03-31T15:16:37Z 2021-03-31T15:16:37Z MEMBER

I appreciate the discussion on this PR. Does anyone have a concrete suggestion of what to do?

If we are not in agreement about the encoding stuff, perhaps I should remove that and just move forward with the safe_chunks part of this PR?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
810683846 https://github.com/pydata/xarray/issues/4470#issuecomment-810683846 https://api.github.com/repos/pydata/xarray/issues/4470 MDEyOklzc3VlQ29tbWVudDgxMDY4Mzg0Ng== rabernat 1197350 2021-03-31T01:22:29Z 2021-03-31T01:22:29Z MEMBER

I just saw this very cool tweet about ipyvista / iris integration and it reminded me of this thread.

Are there any clear steps we can take to help advance the vtk / pyvista / xarray integration further?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray / vtk integration 710357592
807128780 https://github.com/pydata/xarray/pull/5065#issuecomment-807128780 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwNzEyODc4MA== rabernat 1197350 2021-03-25T17:19:15Z 2021-03-25T17:19:15Z MEMBER

Perhaps a kwarg in to_zarr like ignore_encoding_chunks?

I would argue that this is unnecessary. If you want to explicitly drop encoding, just del da.encoding['chunks'] before writing. But most users don't figure out that they should do this, because the default behavior is counterintuitive.

The problem here is with the default behavior of propagating chunk encoding through computations when it no longer makes sense. My example with the dtype encoding illustrates that we already drop encoding on certain operations, so it's not unprecedented. It's more of an implementation question: where and how to do the dropping.

FWIW, I would also favor dropping encoding['chunks'] after indexing, coarsening, interpolating, etc. Basically anything that changes the array shape or chunk structure.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
806724345 https://github.com/pydata/xarray/pull/5065#issuecomment-806724345 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwNjcyNDM0NQ== rabernat 1197350 2021-03-25T13:17:03Z 2021-03-25T13:17:59Z MEMBER

I see your point. I guess I don't fully understand where else in the code path encoding gets dropped. Consider this example

```python
import xarray as xr

ds = xr.Dataset({'foo': ('time', [1, 1], {'dtype': 'int16'})})
ds = xr.decode_cf(ds).compute()
assert "dtype" in ds.foo.encoding
assert "dtype" not in (0.5 * ds.foo).encoding
```

Xarray knows to drop the dtype encoding after an arithmetic operation. How does that work? To me .chunk feels like a similar case: an operation that invalidates any existing encoding.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
806701802 https://github.com/pydata/xarray/issues/4118#issuecomment-806701802 https://api.github.com/repos/pydata/xarray/issues/4118 MDEyOklzc3VlQ29tbWVudDgwNjcwMTgwMg== rabernat 1197350 2021-03-25T13:01:56Z 2021-03-25T13:05:03Z MEMBER

So we have:
- Numerous promising prototypes to draw from
- A technical team who can write the proposal and execute the proposed work (@aurghs & @alexamici of B-open)
- Numerous supporting use cases from the bioimaging (@joshmoore), condensed matter (@tacaswell), and bayesian modeling (ArviZ; @OriolAbril) domains

We are just missing a PI, someone who is willing to put their name on top of the proposal and click submit. I have gone on record as committed to not leading any new proposals this year. And in any case, this is a good opportunity for someone else from the @pydata/xarray core dev team to try on a leadership role.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature Request: Hierarchical storage and processing in xarray 628719058
805883595 https://github.com/pydata/xarray/issues/2300#issuecomment-805883595 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDgwNTg4MzU5NQ== rabernat 1197350 2021-03-24T14:48:55Z 2021-03-24T14:48:55Z MEMBER

In #5056, I have implemented the solution of deleting chunks from encoding when chunk() is called on a variable. A review of that PR would be welcome.

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
804050169 https://github.com/pydata/xarray/pull/5065#issuecomment-804050169 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwNDA1MDE2OQ== rabernat 1197350 2021-03-22T13:12:45Z 2021-03-22T13:12:45Z MEMBER

Thanks Anderson. Fixed by rebasing. Now the RTD build is failing, but there is no obvious error in the logs...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
803712024 https://github.com/pydata/xarray/pull/5065#issuecomment-803712024 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwMzcxMjAyNA== rabernat 1197350 2021-03-22T01:58:23Z 2021-03-22T02:02:00Z MEMBER

Confused about the test error. It seems unrelated. In test_sparse.py:test_variable_method

```
E TypeError: no implementation found for 'numpy.allclose' on types that implement __array_function__: [<class 'numpy.ndarray'>, <class 'sparse._coo.core.COO'>]
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
801240559 https://github.com/pydata/xarray/issues/4118#issuecomment-801240559 https://api.github.com/repos/pydata/xarray/issues/4118 MDEyOklzc3VlQ29tbWVudDgwMTI0MDU1OQ== rabernat 1197350 2021-03-17T16:47:20Z 2021-03-17T16:47:20Z MEMBER

On today's Xarray dev call, we discussed pursuing another CZI grant to support this feature in Xarray. The image pyramid use case would provide a strong link to the bioimaging community. @alexamici and the B-open folks seem enthusiastic.

I had to leave the meeting early, so I didn't hear the end of the conversation. But did we decide who might serve as PI for such a proposal?

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Feature Request: Hierarchical storage and processing in xarray 628719058
790088409 https://github.com/pydata/xarray/issues/2300#issuecomment-790088409 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDc5MDA4ODQwOQ== rabernat 1197350 2021-03-03T21:55:44Z 2021-03-03T21:55:44Z MEMBER

alternatively to_zarr could ignore encoding["chunks"] when the data is already chunked?

I would not favor that. A user may choose to define their desired zarr chunks by putting this information in encoding. In this case, it's good to raise the error. (This is the case I had in mind when I wrote this code.)

The problem here is that encoding is often being carried over from the original dataset and persisted across operations that change chunk size.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
789974968 https://github.com/pydata/xarray/issues/2300#issuecomment-789974968 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDc4OTk3NDk2OA== rabernat 1197350 2021-03-03T18:54:43Z 2021-03-03T18:54:43Z MEMBER

I think we are all in agreement. Just waiting for someone to make a PR. It's probably just a few lines of code changes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
761136148 https://github.com/pydata/xarray/issues/4691#issuecomment-761136148 https://api.github.com/repos/pydata/xarray/issues/4691 MDEyOklzc3VlQ29tbWVudDc2MTEzNjE0OA== rabernat 1197350 2021-01-15T19:18:50Z 2021-01-15T19:18:50Z MEMBER

cc @martindurant for fsspec issue

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Non-HTTPS remote URLs no longer work as input for open_zarr 766826777
758373462 https://github.com/pydata/xarray/issues/4789#issuecomment-758373462 https://api.github.com/repos/pydata/xarray/issues/4789 MDEyOklzc3VlQ29tbWVudDc1ODM3MzQ2Mg== rabernat 1197350 2021-01-12T03:36:26Z 2021-01-12T03:36:26Z MEMBER

I uncovered this issue with Dask's SVG in its _repr_html_ method: https://github.com/dask/dask/issues/6670. The fix made a big difference in repr size. Possibly related?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Poor performance of repr of large arrays, particularly jupyter repr 782943813
741949159 https://github.com/pydata/xarray/pull/4461#issuecomment-741949159 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDc0MTk0OTE1OQ== rabernat 1197350 2020-12-09T18:02:03Z 2020-12-09T18:02:11Z MEMBER

I think @shoyer has laid out the options in a very clear way.

I weakly favor option 2, as I think it preferable in terms of software architecture and our broader roadmap for Xarray. However, I am cognizant of the significant effort that @martindurant has put into this, and I don't want his effort to go to waste.

Some mitigating factors are:
- The example I gave above (https://github.com/pydata/xarray/pull/4461#issuecomment-741939277) shows that one high-impact feature that users want (async capabilities in Zarr) already works, albeit with a different syntax. So this PR is more about convenience.
- Presumably the knowledge about Xarray that Martin has gained by implementing this PR is transferable to a different context, and so we would not be starting from scratch if we went with 2.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
741939277 https://github.com/pydata/xarray/pull/4461#issuecomment-741939277 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDc0MTkzOTI3Nw== rabernat 1197350 2020-12-09T17:44:55Z 2020-12-09T17:44:55Z MEMBER

@rsignell-usgs: note that your example works without this PR (but with the just-released zarr 2.6.1) as follows:

```python
mapper = fsspec.get_mapper('s3://noaa-nwm-retro-v2.0-zarr-pds')
ds = xr.open_zarr(mapper, consolidated=True)
```

Took 4s on my laptop (outside of AWS).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
736786380 https://github.com/pydata/xarray/issues/4631#issuecomment-736786380 https://api.github.com/repos/pydata/xarray/issues/4631 MDEyOklzc3VlQ29tbWVudDczNjc4NjM4MA== rabernat 1197350 2020-12-01T20:03:54Z 2020-12-01T20:03:54Z MEMBER

Ok then I am 👍 on @dcherian's solution.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Decode_cf fails when scale_factor is a length-1 list 753965875
736526797 https://github.com/pydata/xarray/issues/4631#issuecomment-736526797 https://api.github.com/repos/pydata/xarray/issues/4631 MDEyOklzc3VlQ29tbWVudDczNjUyNjc5Nw== rabernat 1197350 2020-12-01T12:39:53Z 2020-12-01T12:39:53Z MEMBER

But what did we do before?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Decode_cf fails when scale_factor is a length-1 list 753965875

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);