html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1303#issuecomment-285380106,https://api.github.com/repos/pydata/xarray/issues/1303,285380106,MDEyOklzc3VlQ29tbWVudDI4NTM4MDEwNg==,1197350,2017-03-09T15:18:18Z,2024-02-06T17:57:21Z,MEMBER,Just wanted to link to a somewhat related discussion happening in [brian-rose/climlab#50](https://github.com/climlab/climlab/issues/50).,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,213004586
https://github.com/pydata/xarray/issues/3213#issuecomment-1534724554,https://api.github.com/repos/pydata/xarray/issues/3213,1534724554,IC_kwDOAMm_X85begnK,1197350,2023-05-04T12:51:59Z,2023-05-04T12:51:59Z,MEMBER,"> I suspect (but don't know, as I'm just a user of xarray, not a developer) that it's also not thoroughly _tested_.

Existing sparse testing is here: https://github.com/pydata/xarray/blob/main/xarray/tests/test_sparse.py

We would welcome enhancements to this!
","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,479942077
https://github.com/pydata/xarray/issues/3213#issuecomment-1534001190,https://api.github.com/repos/pydata/xarray/issues/3213,1534001190,IC_kwDOAMm_X85bbwAm,1197350,2023-05-04T02:36:57Z,2023-05-04T02:36:57Z,MEMBER,"Hi @jdbutler and welcome! We would welcome this sort of contribution eagerly.

I would characterize our current support of sparse arrays as really just a proof of concept. When to use sparse and how to do it effectively is not well documented. Simply adding more documentation around the already-supported use cases would be a great place to start IMO.

My own exploration of this are described in [this Pangeo post](https://discourse.pangeo.io/t/conservative-region-aggregation-with-xarray-geopandas-and-sparse/2715). The use case is regridding. It touches on quite a few of the points you're interested in, in particular the integration with geodataframe. Along similar lines, @dcherian has been working on using opt_einsum together with sparse in https://github.com/pangeo-data/xESMF/issues/222#issuecomment-1524041837 and https://github.com/pydata/xarray/issues/7764. 

I'd also suggest catching up on what @martinfleis is doing with vector data cubes in [xvec](https://github.com/xarray-contrib/xvec). (See also [Pangeo post on this topic](https://discourse.pangeo.io/t/vector-data-cubes/2904).)

Of the three topics you enumerated, I'm most interested in the serialization one. However, I'd rather see serialization of sparse arrays prototyped in Zarr, as its much more conducive to experimentation than NetCDF (which requires writing C to do anything custom). I would recommend exploring serialization from a sparse array in memory to a sparse format on disk via a [custom codec](https://numcodecs.readthedocs.io/). Zarr recently added support for a `meta_array` parameter that determines what array type is materialized by the codec pipeline (see https://github.com/zarr-developers/zarr-python/pull/1131). The use case there was loading data [direct to GPU](https://xarray.dev/blog/xarray-kvikio). In a way sparse is similar--it's an array container that is not numpy or dask.




","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,479942077
https://github.com/pydata/xarray/issues/7764#issuecomment-1524332001,https://api.github.com/repos/pydata/xarray/issues/7764,1524332001,IC_kwDOAMm_X85a23Xh,1197350,2023-04-27T00:56:21Z,2023-04-27T00:56:21Z,MEMBER,"Is there ever a case where it would be preferable to use numpy if opt_einsum were installed? If not, I would propose that, like bottleneck, we just automatically use it if available.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1672288892
https://github.com/pydata/xarray/issues/7716#issuecomment-1497579600,https://api.github.com/repos/pydata/xarray/issues/7716,1497579600,IC_kwDOAMm_X85ZQ0BQ,1197350,2023-04-05T14:23:57Z,2023-04-05T14:23:57Z,MEMBER,Do we have a plan to support pandas 2?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1654022522
https://github.com/pydata/xarray/issues/6323#issuecomment-1492139481,https://api.github.com/repos/pydata/xarray/issues/6323,1492139481,IC_kwDOAMm_X85Y8D3Z,1197350,2023-03-31T15:31:55Z,2023-03-31T15:31:55Z,MEMBER,We should also consider a configuration option to automatically drop encoding.,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1158378382
https://github.com/pydata/xarray/issues/7039#issuecomment-1460185069,https://api.github.com/repos/pydata/xarray/issues/7039,1460185069,IC_kwDOAMm_X85XCKft,1197350,2023-03-08T13:51:06Z,2023-03-08T13:51:06Z,MEMBER,"Rather than using the scale_factor and add_offset approach, I would look into [xbitinfo](https://xbitinfo.readthedocs.io/) if you want to optimize your compression.","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1373352524
https://github.com/pydata/xarray/pull/7540#issuecomment-1460182260,https://api.github.com/repos/pydata/xarray/issues/7540,1460182260,IC_kwDOAMm_X85XCJz0,1197350,2023-03-08T13:48:51Z,2023-03-08T13:49:21Z,MEMBER,"Regarding locks, I think we need to think hard about the best way to deal with this across the stack. There are a couple of different options:
- Current status: just use a global lock on the entire array--super inefficient
- A bit better: use per-variable locks
- Even better: have locks at the shard level. This would allow concurrent writing of shards
- Alternative which accomplishes the same thing: expose different virtual chunks when reading vs. writing. When writing, the writer library (e.g. Xarray or Dask) would see the shards as the chunks (with a lower layer of the stack handling breaking the shard down into chunks). When reading, the individual, smaller chunks would be accessible.

Note that there are still some deep inefficiencies in the way zarr-python writes shards (see https://github.com/zarr-developers/zarr-python/discussions/1338). I think we should be optimizing things at the Zarr level first, before implementing workarounds in Xarray.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1588516592
https://github.com/pydata/xarray/pull/7540#issuecomment-1460175664,https://api.github.com/repos/pydata/xarray/issues/7540,1460175664,IC_kwDOAMm_X85XCIMw,1197350,2023-03-08T13:44:02Z,2023-03-08T13:44:02Z,MEMBER,"It's great to see this PR get started in Xarray! Thanks @JMorado!

From the perspective of a Zarr developer, the sharding feature is still highly experimental. The API may change significantly. While the sharding code is released in the sense that it is available deep in Zarr, it is not really considered part of the public API yet.

So perhaps it's a bit too early to be doing this?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1588516592
https://github.com/pydata/xarray/issues/7515#issuecomment-1422860618,https://api.github.com/repos/pydata/xarray/issues/7515,1422860618,IC_kwDOAMm_X85UzyFK,1197350,2023-02-08T16:05:13Z,2023-02-08T16:47:59Z,MEMBER,"It seems like there are at least 3 separate topics being discussed here.

1. Could Xarray wrap Aesara / PyTensor arrays, in the same way it wraps numpy arrays, Dask arrays, cupy arrays, sparse arrays, pint arrays, etc? This way, Xarray users could benefit from the performance and other features of Aesara while keeping the high-level analysis API they know and love. AFAIU, Any array library that implements the [NEP 37](https://numpy.org/neps/nep-0037-array-module.html) protocol should be wrappable. This is Joe's original topic. 
2. Should Aesara / PyTensor implement their own versions of named dimensions and coordinates? This is an internal question for those projects. Not the original topic, but nevertheless we would love to help by exposing some Xarray internals for reuse by other packages (this is on our [roadmap](https://docs.xarray.dev/en/stable/roadmap.html#labeled-array-without-coordinates)). It would be a shame to reinvent wheels unnecessarily. I would be interested in understanding the tradeoffs and different use cases between this and topic 1. 
3. Pre-existing tensions between Aesara and PyTensor. Since this conversation is happening on our issue tracker, I'll point to our [code of conduct](https://github.com/pydata/xarray/blob/main/CODE_OF_CONDUCT.md) and hope that the conversation can remain positive and respectful of all viewpoints. From our point of view as Xarray devs, PyTensor and Aesara do indeed seem quite similar in scope. It would be wonderful if we could all work together in some way towards topic 1.","{""total_count"": 8, ""+1"": 8, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1575494367
https://github.com/pydata/xarray/pull/7142#issuecomment-1416643026,https://api.github.com/repos/pydata/xarray/issues/7142,1416643026,IC_kwDOAMm_X85UcEHS,1197350,2023-02-04T03:02:09Z,2023-02-04T03:02:09Z,MEMBER,"I just noticed our very low coverage rating and found this PR. Did this PR work? Should we update it and merge?

It would be great to have our coverage back in the 90s rather than the 50s 😝 .","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1401132297
https://github.com/pydata/xarray/pull/7496#issuecomment-1412408324,https://api.github.com/repos/pydata/xarray/issues/7496,1412408324,IC_kwDOAMm_X85UL6QE,1197350,2023-02-01T17:06:47Z,2023-02-01T17:06:47Z,MEMBER,It is true that Xarray is now becoming very different from pandas in how it opens data.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1564661430
https://github.com/pydata/xarray/issues/7446#issuecomment-1385683582,https://api.github.com/repos/pydata/xarray/issues/7446,1385683582,IC_kwDOAMm_X85Sl9p-,1197350,2023-01-17T16:23:01Z,2023-01-17T16:23:01Z,MEMBER,"Hi @gauteh! This is very cool! Thanks for sharing. I'm really excited about way that Rust can be used to optimized different parts of our stack.

A couple of questions:
- Can your reader read over HTTP / S3 protocol? Or is it just local files? 
- Do you know about [kerchunk](https://fsspec.github.io/kerchunk/)? The approach you described: 
  > The reader works by indexing the chunks of a dataset so that chunks can be accessed independently.
  
  ...is identical to the approach taken by Kerchunk (although the implementation is different). I'm curious what specification you use to store your indexes. Could we make your implementation interoperable with kerchunk, such that a kerchunk reference specification could be read by your reader? It would be great to reach for some degree of alignment here.
- Do you know about hdf5-coro - http://icesat2sliderule.org/h5coro/ - they have similar goals, but focused on cloud-based access

> I hope this can be of general interest, and if it would be of interest to move the hidefix xarray backend into xarray that would be very cool.

This is definitely of general interest! However, it is not necessary to add a new backend directly into xarray. We support entry points which allow packages to implement their own readers, as you have apparently already discovered: https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html

Installing your package should be enough to enable the new engine.

We would, however, welcome a documentation PR that described how to use this package on the I/O page.

","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1536004355
https://github.com/pydata/xarray/pull/7418#issuecomment-1378079073,https://api.github.com/repos/pydata/xarray/issues/7418,1378079073,IC_kwDOAMm_X85SI9Fh,1197350,2023-01-11T00:34:03Z,2023-01-11T00:34:03Z,MEMBER,"> we should carefully evaluate the datatree API to make sure we won't want to change it soon

I agree with this. We could use the PR process for this review.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1519552711
https://github.com/pydata/xarray/issues/3996#issuecomment-1373993285,https://api.github.com/repos/pydata/xarray/issues/3996,1373993285,IC_kwDOAMm_X85R5XlF,1197350,2023-01-06T18:36:56Z,2023-01-06T18:47:48Z,MEMBER,"We found a nice solution to this using @TomNicholas's Datatree

```python
import xarray as xr
import datatree

dt = datatree.open_datatree(""AQUA_MODIS.20220809T182500.L2.OC.nc"")

def fix_dimension_names(ds):
    if 'pixel_control_points' in ds.dims:
        ds = ds.swap_dims({'pixel_control_points': 'pixels_per_line'})
    return ds

dt_fixed = dt.map_over_subtree(fix_dimension_names)

all_dsets = [subtree.ds for node, subtree in dt_fixed.items()]
ds = xr.merge(all_dsets, combine_attrs=""drop_conflicts"")
ds = ds.set_coords(['latitude', 'longitude'])

ds.chlor_a.plot(x=""longitude"", y=""latitude"", robust=True)
```

![image](https://user-images.githubusercontent.com/1197350/211076797-1c53f08c-4359-40a2-a409-d748e20b7da1.png)
","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 1, ""eyes"": 0}",,605608998
https://github.com/pydata/xarray/pull/7418#issuecomment-1372822656,https://api.github.com/repos/pydata/xarray/issues/7418,1372822656,IC_kwDOAMm_X85R05yA,1197350,2023-01-05T21:50:53Z,2023-01-05T21:50:53Z,MEMBER,I personally favor just copying the code into Xarray and archiving the old repo.,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1519552711
https://github.com/pydata/xarray/pull/7418#issuecomment-1372802153,https://api.github.com/repos/pydata/xarray/issues/7418,1372802153,IC_kwDOAMm_X85R00xp,1197350,2023-01-05T21:31:33Z,2023-01-05T21:31:33Z,MEMBER,"
> * At what stage is atatree ""ready"" to moved in here? At what stage should it become encouraged public API?

My opinion is that Datatree should move into Xarray now, ideally in a way that does not disrupt any existing user code, and that Datatree should become a first-class Xarray object (together with DataArray, and Dataset). Since it's a new feature, we don't necessarily have to be super conservative here. I think it is more than good enough / stable enough in its current state.

> * What's a good way to slowly roll the feature out?

Since Datatree sits _above_ DataArray and Dataset, it should not interfere with any of our existing API. As long as test coverage is good, documentation is solid, and the code style matches the rest of Xarray, I think we can just bring it in.

> * How do I decrease the bus factor on datatree's code? Can I get some code reviews during the merging process? 🙏

I think that it is inevitable that you Tom will be the main owner of the Datatree code at the beginning (as @shoyer was of all of Xarray when he first released it). Over time, if people use it, some fraction of users will become maintainers, starting with the existing dev team. 

> * Should I make a new CI environment just for testing datatree stuff?

Why? Are its dependencies different from Xarray?","{""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1519552711
https://github.com/pydata/xarray/issues/5878#issuecomment-1315553661,https://api.github.com/repos/pydata/xarray/issues/5878,1315553661,IC_kwDOAMm_X85OacF9,1197350,2022-11-15T16:22:30Z,2022-11-15T16:22:30Z,MEMBER,"Your issue is that the consolidated metadata have not been updated:

```python
import gcsfs
fs = gcsfs.GCSFileSystem()
# the latest array metadata
print(fs.cat('gs://ldeo-glaciology/append_test/test30/temperature/.zarray').decode())
# ->     ""shape"": [ 6 ]
# the consolidated metadata
print(fs.cat(''gs://ldeo-glaciology/append_test/test30/.zmetadata'').decode())
# ->     ""shape"": [ 3 ]
```

There are two ways to fix this.

1. Don't use consolidated metadatda on read. (This will be a bit slower)
   ```python
   ds = xr.open_dataset('gs://ldeo-glaciology/append_test/test30', engine='zarr', consolidated=False)
   ```
1. Reconsolidate your metadata after append. https://zarr.readthedocs.io/en/stable/tutorial.html#consolidating-metadata

","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1030811490
https://github.com/pydata/xarray/issues/6308#issuecomment-1300863799,https://api.github.com/repos/pydata/xarray/issues/6308,1300863799,IC_kwDOAMm_X85NiZs3,1197350,2022-11-02T16:39:53Z,2022-11-02T16:39:53Z,MEMBER,Just found this issue! I agree that this would be helpful. But isn't it fundamentally a Dask issue? Vanilla Xarray + Numpy has none of these problems because everything is in memory.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1151751524
https://github.com/pydata/xarray/issues/6818#issuecomment-1255550548,https://api.github.com/repos/pydata/xarray/issues/6818,1255550548,IC_kwDOAMm_X85K1i5U,1197350,2022-09-22T21:09:15Z,2022-09-22T21:09:15Z,MEMBER,"I just hit this same bug with numpy 1.23.3. Installing xarray from github main branch fixed it.

I think we really need to release soon (#7069).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1315607023
https://github.com/pydata/xarray/issues/7039#issuecomment-1248302788,https://api.github.com/repos/pydata/xarray/issues/7039,1248302788,IC_kwDOAMm_X85KZ5bE,1197350,2022-09-15T16:02:17Z,2022-09-15T16:02:17Z,MEMBER,"> I am curious as to what exactly from the encoding introduces the noise (I still need to read through the documentation more thoroughly)?

The encoding says that your data should be encoded according to the following pseudocode formula:
```
encoded = int((original - offset) / scale_factor)
decoded = (scale_factor * float(encoded)) + offset
```

So the floating-point data are converted back and forth to a less precise type (integer) in order to save space. These numerical operations cannot preserve exact floating point accuracy. That's just how numerical float-point operations work. If you skip the encoding, then you just write the floating point bytes directly to disk, with no loss of precision. 

This sort of encoding a crude form of lossy compression that is still unfortunately in use, even though there are much better algorithms available (and built into netcdf and zarr). Differences on the order of 10^-14 should not affect any real-world calculations.

However, this seems like a much, much smaller difference than the problem you originally reported. This suggests that the MRE does not actually reproduce the bug after all. How was the plot above (https://github.com/pydata/xarray/issues/7039#issue-1373352524) generated? From your actual MRE code? Or from your earlier example with real data?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1373352524
https://github.com/pydata/xarray/issues/7039#issuecomment-1248241823,https://api.github.com/repos/pydata/xarray/issues/7039,1248241823,IC_kwDOAMm_X85KZqif,1197350,2022-09-15T15:12:34Z,2022-09-15T15:12:34Z,MEMBER,"I'm puzzled that I was not able to reproduce this error. I modified the end slightly as follows

```python
# save dataset as netcdf
ds.to_netcdf(""test.nc"")
# load saved dataset
ds_test = xr.open_dataset('test.nc')
# verify that the two are equal within numerical precision
xr.testing.assert_allclose(ds, ds_test)

# plot
plt.plot(ds.t2m - ds_test.t2m)
```

In my case, the differences were just numerical noise (order 10^-14)
![image](https://user-images.githubusercontent.com/1197350/190440267-36547265-d107-4adb-8286-e9faf317eeeb.png)

I used the [binder environment](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb) for this.

I'm pretty stumped.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1373352524
https://github.com/pydata/xarray/issues/7039#issuecomment-1248098918,https://api.github.com/repos/pydata/xarray/issues/7039,1248098918,IC_kwDOAMm_X85KZHpm,1197350,2022-09-15T13:25:11Z,2022-09-15T13:25:11Z,MEMBER,Thanks so much for taking the time to write up this detailed bug report! 🙏 ,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1373352524
https://github.com/pydata/xarray/issues/2812#issuecomment-1246005938,https://api.github.com/repos/pydata/xarray/issues/2812,1246005938,IC_kwDOAMm_X85KRIqy,1197350,2022-09-13T22:18:31Z,2022-09-13T22:18:31Z,MEMBER,"Glad you got it working! So you're saying it _does not work_ with `open_zarr` and _does work_ with `open_dataset(...engine='zarr')`? Weird. We should deprecate `open_zarr`.

> However, the behavior in Dask is strange. I think it is making each worker have its own cache and blowing up memory if I ask for a large cache.

Yes, I think I experienced that as well. I think the entire cache is serialized and passed around between workers.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,421029352
https://github.com/pydata/xarray/issues/2812#issuecomment-1243823078,https://api.github.com/repos/pydata/xarray/issues/2812,1243823078,IC_kwDOAMm_X85KIzvm,1197350,2022-09-12T14:25:39Z,2022-09-12T14:25:39Z,MEMBER,"I have successfully used the Zarr LRU cache with Xarray. You just have to initialize the Store object outside of Xarray and then pass it to `open_zarr` or `open_dataset(store, engine=""zarr"")`.

Have you tried that?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,421029352
https://github.com/pydata/xarray/issues/6916#issuecomment-1216491512,https://api.github.com/repos/pydata/xarray/issues/6916,1216491512,IC_kwDOAMm_X85Igi_4,1197350,2022-08-16T11:11:38Z,2022-08-16T11:11:38Z,MEMBER,"As a general principle, I think we should try to put enough information in `encoding` to enable one to re-open the dataset from scratch with the same parameters. So that would mean including the engine and other `open_dataset` options in `encoding`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1339129609
https://github.com/pydata/xarray/pull/6721#issuecomment-1170451917,https://api.github.com/repos/pydata/xarray/issues/6721,1170451917,IC_kwDOAMm_X85Fw63N,1197350,2022-06-29T20:15:15Z,2022-06-29T20:15:15Z,MEMBER,Awesome work!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1284071791
https://github.com/pydata/xarray/issues/6662#issuecomment-1146377099,https://api.github.com/repos/pydata/xarray/issues/6662,1146377099,IC_kwDOAMm_X85EVFOL,1197350,2022-06-03T21:30:48Z,2022-06-03T21:30:48Z,MEMBER,"Following up on the suggestion from @shoyer in to not use a context manager, if I redefine my function as

```python
def open_pickle_and_reload(path):
    of = fsspec.open(path, mode='rb').open()
    ds1 = xr.open_dataset(of, engine='h5netcdf')

    # pickle it and reload it
    ds2 = loads(dumps(ds1))
    ds2.load()
```

...it appears to work fine.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1260047355
https://github.com/pydata/xarray/issues/6662#issuecomment-1146184372,https://api.github.com/repos/pydata/xarray/issues/6662,1146184372,IC_kwDOAMm_X85EUWK0,1197350,2022-06-03T17:05:00Z,2022-06-03T17:06:26Z,MEMBER,"```python
with fsspec.open('http://127.0.0.1:8000/tiny.nc', mode='rb') as fp:
    with xr.open_dataset(fp, engine='h5netcdf') as ds1:
        print(type(fp))
        print(fp.__dict__)
        ds1.load()
```

```
<class 'fsspec.implementations.http.HTTPFile'>
{'asynchronous': False, 'url': 'http://127.0.0.1:8000/tiny.nc',
'session': <aiohttp.client.ClientSession object at 0x18bcdddc0>,
'_details': {'name': 'http://127.0.0.1:8000/tiny.nc', 'size': 6164, 'type': 'file'},
'size': 6164, 'path': 'http://127.0.0.1:8000/tiny.nc',
'fs': <fsspec.implementations.http.HTTPFileSystem object at 0x110059dc0>,
'mode': 'rb', 'blocksize': 5242880, 'loc': 1075,
'autocommit': True, 'end': None, 'start': None, '_closed': False,
'kwargs': {}, 'cache': <fsspec.caching.BytesCache object at 0x18eda16d0>,
'loop': <_UnixSelectorEventLoop running=True closed=False debug=False>}
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1260047355
https://github.com/pydata/xarray/issues/6662#issuecomment-1146119478,https://api.github.com/repos/pydata/xarray/issues/6662,1146119478,IC_kwDOAMm_X85EUGU2,1197350,2022-06-03T16:04:21Z,2022-06-03T16:05:40Z,MEMBER,"The `http.server` apparently does not accept range requests. That could definitely be related. However, I don't understand why that would affect only the pickled version. If the server doesn't support range requests, how are we able to load the file at all? This works:

```python
with fsspec.open('http://127.0.0.1:8000/tiny.nc', mode='rb') as fp:
    with xr.open_dataset(fp, engine='h5netcdf') as ds1:
        ds1.load()
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1260047355
https://github.com/pydata/xarray/issues/6662#issuecomment-1146099479,https://api.github.com/repos/pydata/xarray/issues/6662,1146099479,IC_kwDOAMm_X85EUBcX,1197350,2022-06-03T15:54:34Z,2022-06-03T15:54:34Z,MEMBER,"> Python's HTTP server does not normally provide content lengths without some extra work, that might be the difference.

Don't think that's it.

```
% curl -I ""http://127.0.0.1:8000/tiny.nc""
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.9.9
Date: Fri, 03 Jun 2022 15:53:52 GMT
Content-type: application/x-netcdf
Content-Length: 6164
Last-Modified: Fri, 03 Jun 2022 15:00:52 GMT
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1260047355
https://github.com/pydata/xarray/issues/6633#issuecomment-1137851771,https://api.github.com/repos/pydata/xarray/issues/6633,1137851771,IC_kwDOAMm_X85D0j17,1197350,2022-05-25T21:10:44Z,2022-05-25T21:10:44Z,MEMBER,Yes it is definitely a pathological example. 💣  But the fact remains that there are many cases where we just want to discover dataset contents as quickly as possible and want to avoid the cost of loading coordinates and creating indexes.,"{""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1247010680
https://github.com/pydata/xarray/issues/6633#issuecomment-1137821786,https://api.github.com/repos/pydata/xarray/issues/6633,1137821786,IC_kwDOAMm_X85D0cha,1197350,2022-05-25T20:34:30Z,2022-05-25T20:34:59Z,MEMBER,"Here is an example that really highlights the performance cost of always loading dimension coordinates:

```python
import zarr
store = zarr.storage.FSStore(""s3://mur-sst/zarr/"", anon=True)
%time list(zarr.open_consolidated(store)) # -> Wall time: 86.4 ms
%time ds = xr.open_dataset(store, engine='zarr') # -> Wall time: 17.1 s
```

`%prun` confirms that Xarray is spending most of its time just loading data for the `time` axis, which you can reproduce at the zarr level as:

```python
zgroup = zarr.open_consolidated(store)
%time _ = zgroup['time'][:] # -> Wall time: 14.7 s
```

Obviously this example is pretty extreme. There are things that could be done to optimize it, etc. But it really highlights the costs of eagerly loading dimension coordinates. If I don't care about label-based indexing for this dataset, I would rather have my 17s back!

:+1: to ""`indexes={}` (empty dictionary) to explicitly skip creating indexes"".
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1247010680
https://github.com/pydata/xarray/issues/4628#issuecomment-1122649316,https://api.github.com/repos/pydata/xarray/issues/4628,1122649316,IC_kwDOAMm_X85C6kTk,1197350,2022-05-10T17:00:47Z,2022-05-10T17:02:34Z,MEMBER,"> Any pointers regarding where to start / modules involved to implement this? I would like to have a try.

The starting point would be to look at the code in [indexing.py](https://github.com/pydata/xarray/blob/main/xarray/core/indexing.py) and try to understand how lazy indexing works.

In particular, look at 

https://github.com/pydata/xarray/blob/3920c48d61d1f213a849bae51faa473b9c471946/xarray/core/indexing.py#L465-L470

Then you may want to try writing a class that looks like

```python
 class LazilyConcatenatedArray:  # have to decide what to inherit from 
  
    def __init__(self, *arrays: LazilyIndexedArray, concat_axis=0):
        # figure out what you need to keep track of

    @property
    def shape(self):
        # figure out how to determine the total shape

    def __getitem__(self, indexer) -> LazilyIndexedArray:
        # figure out how to map an indexer to the right piece of data
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,753852119
https://github.com/pydata/xarray/issues/6588#issuecomment-1122567902,https://api.github.com/repos/pydata/xarray/issues/6588,1122567902,IC_kwDOAMm_X85C6Qbe,1197350,2022-05-10T15:48:03Z,2022-05-10T15:48:03Z,MEMBER,Oops sorry for the duplicate issue! 🤦 ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1231184996
https://github.com/pydata/xarray/pull/6566#issuecomment-1115292947,https://api.github.com/repos/pydata/xarray/issues/6566,1115292947,IC_kwDOAMm_X85CegUT,1197350,2022-05-02T19:46:06Z,2022-05-02T19:46:06Z,MEMBER,"Exposing this options seems like a great idea IMO.

I'm not sure the best way to test this. I think the most basic test is just to make sure the `inline=True` option gets invoked in the test suite. Going further, one could examine the dask graph to make sure inlining is actually happening, but that sounds fragile and maybe also not xarray's responsibility. Let's just make sure it gets to dask.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1223270563
https://github.com/pydata/xarray/issues/6538#issuecomment-1113408611,https://api.github.com/repos/pydata/xarray/issues/6538,1113408611,IC_kwDOAMm_X85CXURj,1197350,2022-04-29T14:46:13Z,2022-04-29T14:46:13Z,MEMBER,"Thanks so much for opening this @philippjfr!

I agree this is a major regression. Accessing `.chunk` on a variable should not trigger eager loading of the data.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1220990859
https://github.com/pydata/xarray/issues/6484#issuecomment-1102992117,https://api.github.com/repos/pydata/xarray/issues/6484,1102992117,IC_kwDOAMm_X85BvlL1,1197350,2022-04-19T19:08:31Z,2022-04-19T19:08:31Z,MEMBER,Big :+1:,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1203835220
https://github.com/pydata/xarray/issues/6448#issuecomment-1099797820,https://api.github.com/repos/pydata/xarray/issues/6448,1099797820,IC_kwDOAMm_X85BjZU8,1197350,2022-04-15T02:38:48Z,2022-04-15T02:38:48Z,MEMBER,"I am guilty of sidetracking this issue into the politics of CRS encoding. That discussion is important. But in the meantime, @wankoelias's original issue reveals is narrower technical issue with Xarray's Zarr writer: **Xarray won't let you serialize a dictionary attribute to zarr, even though zarr has no problem with this**. That is a problem we can fix pretty easily. 

The `_validate_attrs` helper function was just borrowed from `to_netcdf`:

https://github.com/pydata/xarray/blob/586992e8d2998751cb97b1cab4d3caa9dca116e0/xarray/backends/api.py#L133-L135

We could refactor this function to be more flexible to account for zarr's broader range of allowed attribute types (as we have evidently already done for h5netcdf). Or we could just bypass it completely in the `to_zarr` method. That is the only real decision we need to make here right now.

@wankoelias - you seem to understand the issue pretty well. Would you be game for making a PR? We would be glad to support you along the way.","{""total_count"": 2, ""+1"": 0, ""-1"": 0, ""laugh"": 2, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1194993450
https://github.com/pydata/xarray/issues/6448#issuecomment-1091703481,https://api.github.com/repos/pydata/xarray/issues/6448,1091703481,IC_kwDOAMm_X85BEhK5,1197350,2022-04-07T12:57:17Z,2022-04-07T12:57:17Z,MEMBER,@christophenoel - I share your perspective. But there is a huge swath of the geospatial world who basically hate NetCDF and avoid it like the plague. These communities prefer to use geotiff and GDAL. We need to reach for interoperability.,"{""total_count"": 2, ""+1"": 0, ""-1"": 0, ""laugh"": 1, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,1194993450
https://github.com/pydata/xarray/issues/6448#issuecomment-1090742693,https://api.github.com/repos/pydata/xarray/issues/6448,1090742693,IC_kwDOAMm_X85BA2ml,1197350,2022-04-06T20:21:20Z,2022-04-06T20:22:40Z,MEMBER,"I think the core problem here is that Zarr itself supports arbitrary json data structures as attributes, but netCDF does not. The Zarr serialization in Xarray is designed to emulate netCDF, but we could make that optional, for example, with a flag to bypass attribute encoding / decoding and just pass the python data directly through to Zarr. 

However, my concern would be that netCDF4 C library would not be able to read those files (nczarr). What happens if you try to open up a GDAL-created Zarr with netCDF4?

FWIW, the new [GeoZarr Spec](https://github.com/christophenoel/geozarr-spec/blob/main/geozarr-spec.md#coordinate-reference-system) by @christophenoel does not use the GDAL convention for CRS. Instead, it recommends to use CF conventions for encoding CRS. This is more compatible with NetCDF, but won't be parsed correctly by GDAL.

I am a little discouraged that we have not managed to align better across projects so far (e.g. having this conversation before the GDAL Zarr CRS convention was implemented). 😞  For example, either of these two GDAL PRs:
- https://github.com/OSGeo/gdal/pull/3896
- https://github.com/OSGeo/gdal/pull/4521

However, it is not too late! Let's try to reach for a standard way of encoding CRS in Zarr that can be used across languages and implementations of Zarr.

My own preference would be to try to get GDAL to support the GeoZarr Spec and thus the CF-convention CRS attribute, rather than trying to get Xarray to be able to write the GDAL CRS convention.","{""total_count"": 7, ""+1"": 7, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1194993450
https://github.com/pydata/xarray/issues/6374#issuecomment-1076810559,https://api.github.com/repos/pydata/xarray/issues/6374,1076810559,IC_kwDOAMm_X85ALtM_,1197350,2022-03-23T20:54:39Z,2022-03-23T20:54:39Z,MEMBER,"Sure, to be clear, my hesitancy is mostly just around being reluctant to maintain more complexity in our zarr interface. If there is momentum to implement and maintain this compatibility, I am definitely not opposed. 🚀 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1172229856
https://github.com/pydata/xarray/issues/6374#issuecomment-1076622767,https://api.github.com/repos/pydata/xarray/issues/6374,1076622767,IC_kwDOAMm_X85AK_Wv,1197350,2022-03-23T17:39:57Z,2022-03-23T17:39:57Z,MEMBER,"My opinion is that we should not try to support the nczarr conventions directly. _Xarray already supports nczarr via netCDF4_. If netCDF4 can open the Zarr store, then Xarray can read it.

Supporting nczarr directly would require lots of custom logic within xarray. That's because nczarr introduces several additional metadata files that are not part of the zarr spec. These additional metadata files break the abstractions through which xarray interacts with zarr; working around this requires going under the hood, access the store object directly (rather than the zarr groups and arrays).

I would turn this question around and ask: _if netCDF4 supports access to these datasets directly, what's the advantage of xarray bypassing netCDF4 and opening them directly?_ If there are significant performance benefits, I would be more likely to consider it worthwhile.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1172229856
https://github.com/pydata/xarray/issues/6345#issuecomment-1065385198,https://api.github.com/repos/pydata/xarray/issues/6345,1065385198,IC_kwDOAMm_X84_gHzu,1197350,2022-03-11T18:41:11Z,2022-03-11T18:41:11Z,MEMBER,It seems like what we really want to do is verify that the datatype of the appended data matches the data type on disk.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1164454058
https://github.com/pydata/xarray/issues/6345#issuecomment-1065350469,https://api.github.com/repos/pydata/xarray/issues/6345,1065350469,IC_kwDOAMm_X84_f_VF,1197350,2022-03-11T17:58:28Z,2022-03-11T17:58:28Z,MEMBER,Thanks for reporting this @kmsampson. My feeling is that it is a bug...which we can hopefully fix pretty easily!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1164454058
https://github.com/pydata/xarray/issues/6345#issuecomment-1063401936,https://api.github.com/repos/pydata/xarray/issues/6345,1063401936,IC_kwDOAMm_X84_YjnQ,1197350,2022-03-09T21:43:49Z,2022-03-09T21:43:49Z,MEMBER,"The relevant code is here

https://github.com/pydata/xarray/blob/d293f50f9590251ce09543319d1f0dc760466f1b/xarray/backends/api.py#L1405-L1406

and here

https://github.com/pydata/xarray/blob/d293f50f9590251ce09543319d1f0dc760466f1b/xarray/backends/api.py#L1280-L1298

What I don't understand is _why different validation is needed for the append scenario than for the the write scenario_. @shoyer worked on this in #5252, so maybe he has some ideas.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1164454058
https://github.com/pydata/xarray/issues/1385#issuecomment-1043038150,https://api.github.com/repos/pydata/xarray/issues/1385,1043038150,IC_kwDOAMm_X84-K3_G,1197350,2022-02-17T14:57:03Z,2022-02-17T14:57:03Z,MEMBER,See deeper dive in https://github.com/pydata/xarray/discussions/6284,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/issues/1385#issuecomment-1043016100,https://api.github.com/repos/pydata/xarray/issues/1385,1043016100,IC_kwDOAMm_X84-Kymk,1197350,2022-02-17T14:36:23Z,2022-02-17T14:36:23Z,MEMBER,"Ah ok so if that is your goal, `decode_times=False` should be enough to solve it.

There is a problem with the time encoding in this file. The units (`days since 1950-01-01T00:00:00Z`) are not compatible with the values (738457.04166667, etc.). That would place your measurements sometime in the year 3971. This is part of the problem, but not the whole story.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/issues/1385#issuecomment-1043001146,https://api.github.com/repos/pydata/xarray/issues/1385,1043001146,IC_kwDOAMm_X84-Ku86,1197350,2022-02-17T14:21:45Z,2022-02-17T14:22:23Z,MEMBER,"> (I could post to a web server if there's any reason to prefer that.)

In general that would be a little more convenient than google drive, because then we could download the file from python (rather than having a manual step). This would allow us to share a fully copy-pasteable code snippet to reproduce the issue. But don't worry about that for now.

First, I'd note that your issue is not really related to `open_mfdataset` at all, since it is reproduced just using `open_dataset`. The core problem is that you have ~15M timesteps, and it is taking forever to decode the times out of them. It's fast when you do `decode_times=False` because the data aren't actually being read. I'm going to make a post over in [discussions](https://github.com/pydata/xarray/discussions) to dig a bit deeper into this. StackOverflow isn't monitored too regularly by this community.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/issues/1385#issuecomment-1042937825,https://api.github.com/repos/pydata/xarray/issues/1385,1042937825,IC_kwDOAMm_X84-Kffh,1197350,2022-02-17T13:14:50Z,2022-02-17T13:14:50Z,MEMBER,"Hi Tom! 👋 

So much has evolved about xarray since this original issue was posted. However, we continue to use it as a catchall for people looking to speed up open_mfdataset. I saw your stackoverflow post. Any chance you could post a link to the actual file in question?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/pull/6258#issuecomment-1033782892,https://api.github.com/repos/pydata/xarray/issues/6258,1033782892,IC_kwDOAMm_X849nkZs,1197350,2022-02-09T13:51:55Z,2022-02-09T13:51:55Z,MEMBER,"> came to the conclusion that the previously existing tests had been overly restrictive

Sounds very likely!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1128485610
https://github.com/pydata/xarray/pull/5692#issuecomment-1033779138,https://api.github.com/repos/pydata/xarray/issues/5692,1033779138,IC_kwDOAMm_X849njfC,1197350,2022-02-09T13:47:43Z,2022-02-09T13:47:43Z,MEMBER,Just chiming in to say 💪 ! We see the work you are putting in @benbovy. I'm so excited to be using this feature. Is there a way I can help?,"{""total_count"": 5, ""+1"": 5, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,966983801
https://github.com/pydata/xarray/pull/6258#issuecomment-1033757210,https://api.github.com/repos/pydata/xarray/issues/6258,1033757210,IC_kwDOAMm_X849neIa,1197350,2022-02-09T13:23:23Z,2022-02-09T13:23:23Z,MEMBER,Thanks for working on this Tobias! Yes I implemented much of the Dask / Zarr interface and would be happy to review when you're ready.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1128485610
https://github.com/pydata/xarray/issues/1068#issuecomment-984940677,https://api.github.com/repos/pydata/xarray/issues/1068,984940677,IC_kwDOAMm_X846tQCF,1197350,2021-12-02T19:36:12Z,2021-12-02T19:36:12Z,MEMBER,"One solution to this problem might be the creation of a [custom Xarray backend](http://xarray.pydata.org/en/stable/internals/how-to-add-new-backend.html) for NASA EarthData. This backend could manage authentication with EDL and have its own documentation. If this package were maintained by NASA, it would close the feedback loop more effectively.","{""total_count"": 5, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 4, ""eyes"": 1}",,186169975
https://github.com/pydata/xarray/issues/1068#issuecomment-984920867,https://api.github.com/repos/pydata/xarray/issues/1068,984920867,IC_kwDOAMm_X846tLMj,1197350,2021-12-02T19:08:54Z,2021-12-02T19:08:54Z,MEMBER,"Just wanted to say how much I appreciate @betolink acting as a communication channel between Xarray and NASA. Users often end up on our issue tracker because Xarray raises errors whenever it can't read data. But the source of these problems is not with Xarray, it's with the upstream data provider.

This also happens all the time with xmitgcm, e.g. https://github.com/MITgcm/xmitgcm/issues/266

It would be great if NASA had a better way to respond to these issues which didn't require that you ""know a guy"".","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,186169975
https://github.com/pydata/xarray/issues/5995#issuecomment-971790307,https://api.github.com/repos/pydata/xarray/issues/5995,971790307,IC_kwDOAMm_X8457Ffj,1197350,2021-11-17T17:18:41Z,2021-11-17T17:18:41Z,MEMBER,"> How can i tell xarray to load/dump variable by variable without loading the entire file ?

You could try to [chunk the data](http://xarray.pydata.org/en/stable/user-guide/dask.html) and then Dask will write it for you in chunks. To do in in serial you could use the dask single-threaded scheduler.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1056247970
https://github.com/pydata/xarray/issues/5878#issuecomment-969021506,https://api.github.com/repos/pydata/xarray/issues/5878,969021506,IC_kwDOAMm_X845whhC,1197350,2021-11-15T15:25:37Z,2021-11-15T15:25:46Z,MEMBER,"So there are two layers here where caching could be happening:
- gcsfs / fsspec (python)
- gcs itself 

I propose we eliminate the python layer entirely for the moment. Whenever you load the dataset, it's shape is completely determined by whatever zarr sees in `gs://ldeo-glaciology/append_test/test5/temperature/.zarray`. So try looking at this file directly. You can figure out its public URL and just do curl, e.g.
```
curl https://storage.googleapis.com/ldeo-glaciology/append_test/test5/temperature/.zarray
{
    ""chunks"": [
        3
    ],
    ""compressor"": {
        ""blocksize"": 0,
        ""clevel"": 5,
        ""cname"": ""lz4"",
        ""id"": ""blosc"",
        ""shuffle"": 1
    },
    ""dtype"": ""<i8"",
    ""fill_value"": null,
    ""filters"": null,
    ""order"": ""C"",
    ""shape"": [
        6
    ],
    ""zarr_format"": 2
}
```

Run this from jupyterhub from the command line. Then try `gcs.cat('ldeo-glaciology/append_test/test5/temperature/.zarray'` and see if you see the same thing. Basically just eliminate as many layers as possible from the problem until you get to the core issue.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1030811490
https://github.com/pydata/xarray/issues/1068#issuecomment-968993065,https://api.github.com/repos/pydata/xarray/issues/1068,968993065,IC_kwDOAMm_X845wakp,1197350,2021-11-15T14:58:05Z,2021-11-15T14:58:05Z,MEMBER,At what point do we escalate this issue to NASA? Is there a channel via which they can receive and respond to user feedback?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,186169975
https://github.com/pydata/xarray/issues/5878#issuecomment-967363845,https://api.github.com/repos/pydata/xarray/issues/5878,967363845,IC_kwDOAMm_X845qM0F,1197350,2021-11-12T19:18:38Z,2021-11-12T19:18:38Z,MEMBER,"Ok I think I may understand what is happening

```python
## load the zarr store
ds_both = xr.open_zarr(mapper)
```

When you do this, zarr reads a file called `gs://ldeo-glaciology/append_test/test5/temperature/.zarray`. Since the data are public, I can look at it right now:

```
$ gsutil cat gs://ldeo-glaciology/append_tet/test5/temperature/.zarray
{
    ""chunks"": [
        3
    ],
    ""compressor"": {
        ""blocksize"": 0,
        ""clevel"": 5,
        ""cname"": ""lz4"",
        ""id"": ""blosc"",
        ""shuffle"": 1
    },
    ""dtype"": ""<i8"",
    ""fill_value"": null,
    ""filters"": null,
    ""order"": ""C"",
    ""shape"": [
        6
    ],
}
```

Right now, it shows the shape is `[6]`, as expected after the appending. However, if you read the file immediately after appending (within the 3600s `max-age`), you will get the cached copy. The cached copy will still be of shape `[3]`--it won't know about the append.

To test this hypothesis, you would need to [disable caching](https://cloud.google.com/storage/docs/metadata) on the bucket. Do you have privileges to do that? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1030811490
https://github.com/pydata/xarray/issues/5878#issuecomment-967142419,https://api.github.com/repos/pydata/xarray/issues/5878,967142419,IC_kwDOAMm_X845pWwT,1197350,2021-11-12T14:05:36Z,2021-11-12T14:05:36Z,MEMBER,Can you post the full stack trace of the error you get when you try to append?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1030811490
https://github.com/pydata/xarray/issues/5878#issuecomment-966665066,https://api.github.com/repos/pydata/xarray/issues/5878,966665066,IC_kwDOAMm_X845niNq,1197350,2021-11-11T22:17:32Z,2021-11-11T22:17:32Z,MEMBER,"I think that this is not an issue with xarray, zarr, or anything in python world but rather an issue with how caching works on GCS public buckets: https://cloud.google.com/storage/docs/metadata

To test this, forget about xarray and zarr for a minute and just use [gcsfs](https://gcsfs.readthedocs.io/en/latest/) to list the bucket contents before and after your writes. I think you will find that the default cache lifetime of 3600 seconds means that you cannot ""see"" the changes to the bucket or the objects as quickly as needed in order to append.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1030811490
https://github.com/pydata/xarray/issues/1068#issuecomment-966324523,https://api.github.com/repos/pydata/xarray/issues/1068,966324523,IC_kwDOAMm_X845mPEr,1197350,2021-11-11T13:59:55Z,2021-11-11T13:59:55Z,MEMBER,"I'd like to tag @betolink in this issue. He knows quite a bit about both Xarray and Earthdata login. Maybe he can help us get to the bottom of these problems. Luis, any ideas?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,186169975
https://github.com/pydata/xarray/issues/5954#issuecomment-964084038,https://api.github.com/repos/pydata/xarray/issues/5954,964084038,IC_kwDOAMm_X845dsFG,1197350,2021-11-09T11:56:30Z,2021-11-09T11:56:30Z,MEMBER,"Thanks for the info @alexamici!

> 2\. but most backends serialise writes anyway, so the advantage is limited.

I'm not sure I understand this comment, specifically what is meant by ""serialise writes"". I often use Xarray to do distributed writes to Zarr stores using 100+ distributed dask workers. It works great. We would need the same thing from a TileDB backend.

We are focusing on the user-facing API, but in the end, whether we call it `.to`, `.to_dataset`, or `.store_dataset` is not really a difficult or important question. It's clear we need _some_ generic writing method. The much harder question is the **back-end API**. As Alessandro says:

>  Adding support for a single save_dataset entry point to the backend API is trivial, but adding full support for possibly distributed writes looks like it is much more work.

","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1047608434
https://github.com/pydata/xarray/issues/5918#issuecomment-961202990,https://api.github.com/repos/pydata/xarray/issues/5918,961202990,IC_kwDOAMm_X845Sssu,1197350,2021-11-04T16:21:23Z,2021-11-04T16:21:23Z,MEMBER,Maybe @martindurant has some insights?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1039844354
https://github.com/pydata/xarray/issues/1900#issuecomment-938741037,https://api.github.com/repos/pydata/xarray/issues/1900,938741037,IC_kwDOAMm_X8439A0t,1197350,2021-10-08T15:41:29Z,2021-10-08T15:41:29Z,MEMBER,"> But Pydantic looks promising

Big :+1: to this. 

![this is the way](https://media4.giphy.com/media/Ld77zD3fF3Run8olIt/giphy.gif?cid=ecf05e47rz2271ul6pj0n0p3oopuzbi5wbivpodbcz0l4095&rid=giphy.gif&ct=g)","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,295959111
https://github.com/pydata/xarray/pull/5252#issuecomment-863452266,https://api.github.com/repos/pydata/xarray/issues/5252,863452266,MDEyOklzc3VlQ29tbWVudDg2MzQ1MjI2Ng==,1197350,2021-06-17T18:07:28Z,2021-06-17T18:07:28Z,MEMBER,Really sorry I didn't get around to review. My excuse is that I moved back to NYC last week and fell behind on everything. Thanks for moving it forward. 💪 ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,874331538
https://github.com/pydata/xarray/issues/5028#issuecomment-863213400,https://api.github.com/repos/pydata/xarray/issues/5028,863213400,MDEyOklzc3VlQ29tbWVudDg2MzIxMzQwMA==,1197350,2021-06-17T12:53:16Z,2021-06-17T12:53:22Z,MEMBER,So glad this got fixed upstream! That's how it is supposed to work! 🏆  Thanks to everyone for making this happen.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,830507003
https://github.com/pydata/xarray/issues/5219#issuecomment-839106491,https://api.github.com/repos/pydata/xarray/issues/5219,839106491,MDEyOklzc3VlQ29tbWVudDgzOTEwNjQ5MQ==,1197350,2021-05-11T20:08:27Z,2021-05-11T20:08:27Z,MEMBER,"> Instead we could require explicitly supplying `chunks` vis the `encoding` parameter in the `to_zarr()` call.

This could also break existing workflows though. For example, pangeo-forge is using the encoding.chunks attribute to specify target dataset chunks.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,868352536
https://github.com/pydata/xarray/issues/3653#issuecomment-832712426,https://api.github.com/repos/pydata/xarray/issues/3653,832712426,MDEyOklzc3VlQ29tbWVudDgzMjcxMjQyNg==,1197350,2021-05-05T14:01:25Z,2021-05-05T14:01:33Z,MEMBER,"Update: there is now a way to read a remote netCDF file from an HTTP server directly using the netcdf-python library. The trick is to append `#mode=bytes` to the end of the url.

```python
import xarray as xr
import netCDF4  # I'm using version 1.5.6

url = ""https://www.ldeo.columbia.edu/~rpa/NOAA_NCDC_ERSST_v3b_SST.nc#mode=bytes""

# raw netcdf4 Dataset
ds = netCDF4.Dataset(url)

# xarray Dataset
ds = xr.open_dataset(url)
```

","{""total_count"": 12, ""+1"": 5, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 6, ""rocket"": 0, ""eyes"": 0}",,543197350
https://github.com/pydata/xarray/pull/5252#issuecomment-831970193,https://api.github.com/repos/pydata/xarray/issues/5252,831970193,MDEyOklzc3VlQ29tbWVudDgzMTk3MDE5Mw==,1197350,2021-05-04T14:07:03Z,2021-05-04T14:07:03Z,MEMBER,Question: does this mode still require eager loading of dimension coordinates?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,874331538
https://github.com/pydata/xarray/issues/5219#issuecomment-828071017,https://api.github.com/repos/pydata/xarray/issues/5219,828071017,MDEyOklzc3VlQ29tbWVudDgyODA3MTAxNw==,1197350,2021-04-28T01:26:34Z,2021-04-28T01:26:34Z,MEMBER,"> we probably would NOT want to use `safe_chunks=False`, correct?

correct

The problem in this issue is that the dataset is carrying around its original chunks in `.encoding` and then xarray tries to use these values to set the chunk encoding on the second write op. The solution is to *manually delete the chunk encoding from all your data variables*. Something like
```python
for var in ds:
    del ds[var].encoding['chunks']
```

Originally part of #5056 was a change that would have xarray automatically do this deletion after some operations (such as calling `.chunk()`);  however, we could not reach a consensus on the best way to implement that change. Your example is interesting because it is a slightly different scenario -- calling `sel()` instead of `chunk()` -- but the root cause appears to be the same: `encoding['chunks']` is being kept around too conservatively.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,868352536
https://github.com/pydata/xarray/pull/5065#issuecomment-826913149,https://api.github.com/repos/pydata/xarray/issues/5065,826913149,MDEyOklzc3VlQ29tbWVudDgyNjkxMzE0OQ==,1197350,2021-04-26T15:08:43Z,2021-04-26T15:08:43Z,MEMBER,I think this PR has received a very thorough review. I would be pleased if someone from @pydata/xarray would merge it soon.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,837243943
https://github.com/pydata/xarray/pull/5065#issuecomment-826888674,https://api.github.com/repos/pydata/xarray/issues/5065,826888674,MDEyOklzc3VlQ29tbWVudDgyNjg4ODY3NA==,1197350,2021-04-26T14:38:49Z,2021-04-26T14:38:49Z,MEMBER,"The pre-commit workflow is raising a blackdoc error I am not seeing in my local env

```diff
diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst
index eb5c4d8..2bc3c1f 100644
--- a/doc/internals/duck-arrays-integration.rst
+++ b/doc/internals/duck-arrays-integration.rst
@@ -25,7 +25,7 @@ argument:
         ...
 
         def _repr_inline_(self, max_width):
-            """""" format to a single line with at most max_width characters """"""
+            """"""format to a single line with at most max_width characters""""""
             ...
 ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,837243943
https://github.com/pydata/xarray/issues/4554#issuecomment-822571688,https://api.github.com/repos/pydata/xarray/issues/4554,822571688,MDEyOklzc3VlQ29tbWVudDgyMjU3MTY4OA==,1197350,2021-04-19T15:44:07Z,2021-04-19T15:44:07Z,MEMBER,"> we rearrange the DataArrays to 2D arrays

FWIW, this is the exact same thing we do in xhistorgram in order to apply histogram over a specific group of axes:

https://github.com/xgcm/xhistogram/blob/2681aee6fe04e7656c458f32277f87e76653b6e8/xhistogram/core.py#L238-L254

We noticed a similar problem with Dask's reshape implementation, raised here: https://github.com/dask/dask/issues/5544
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,732910109
https://github.com/pydata/xarray/issues/5172#issuecomment-821315433,https://api.github.com/repos/pydata/xarray/issues/5172,821315433,MDEyOklzc3VlQ29tbWVudDgyMTMxNTQzMw==,1197350,2021-04-16T17:07:03Z,2021-04-16T17:07:03Z,MEMBER,Yes I agree. Should I just close this and move it to h5netcdf?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,859945463
https://github.com/pydata/xarray/pull/5065#issuecomment-817990859,https://api.github.com/repos/pydata/xarray/issues/5065,817990859,MDEyOklzc3VlQ29tbWVudDgxNzk5MDg1OQ==,1197350,2021-04-12T17:27:28Z,2021-04-12T17:27:28Z,MEMBER,Any further feedback on this now reduced-scope PR? Merging this would be helpful for moving forward Pangeo forge.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,837243943
https://github.com/pydata/xarray/pull/5065#issuecomment-815019613,https://api.github.com/repos/pydata/xarray/issues/5065,815019613,MDEyOklzc3VlQ29tbWVudDgxNTAxOTYxMw==,1197350,2021-04-07T15:44:25Z,2021-04-07T15:44:25Z,MEMBER,"I have removed the controversial `encoding['chunks']` stuff from the PR. Now it only contains the `safe_chunks` option in `to_zarr`.

If there are no further comments on this, I think this is good to go.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,837243943
https://github.com/pydata/xarray/pull/5065#issuecomment-814102743,https://api.github.com/repos/pydata/xarray/issues/5065,814102743,MDEyOklzc3VlQ29tbWVudDgxNDEwMjc0Mw==,1197350,2021-04-06T13:03:53Z,2021-04-06T13:03:53Z,MEMBER,"We seem to be unable to resolve the complexities around chunk encoding. I propose to remove this from the PR and reduce the scope to just the `safe_chunks` features. @aurghs should probably be the one to tackle the chunk encoding problem; unfortunately it exceeds my understanding, and I don't have time to dig deeper at the moment. In the meantime `safe_chunks` is important for pangeo-forge forward progress.

Please give a 👍  or 👎  to this idea if you have an opinion.","{""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,837243943
https://github.com/pydata/xarray/pull/5065#issuecomment-811975731,https://api.github.com/repos/pydata/xarray/issues/5065,811975731,MDEyOklzc3VlQ29tbWVudDgxMTk3NTczMQ==,1197350,2021-04-01T15:12:15Z,2021-04-01T15:12:15Z,MEMBER,"> But it seems to me that having two different definitions of chunks (dask one and encoded one), is not very intuitive and it's not easy to define a clear default in writing.

My use for `encoding.chunks` is to tell Zarr what chunks to use on disk.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,837243943
https://github.com/pydata/xarray/pull/5065#issuecomment-811308284,https://api.github.com/repos/pydata/xarray/issues/5065,811308284,MDEyOklzc3VlQ29tbWVudDgxMTMwODI4NA==,1197350,2021-03-31T18:23:03Z,2021-03-31T18:23:03Z,MEMBER,So any ideas how to proceed? 🧐,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,837243943
https://github.com/pydata/xarray/pull/5065#issuecomment-811275436,https://api.github.com/repos/pydata/xarray/issues/5065,811275436,MDEyOklzc3VlQ29tbWVudDgxMTI3NTQzNg==,1197350,2021-03-31T17:31:53Z,2021-03-31T17:32:12Z,MEMBER,"A just pushed a new commit which deletes all encoding inside `variable.chunk()`. But as you will see when the CI finishes, this leads to a lot of test failures. For example:

```
=============================================================================== FAILURES ================================================================================
____________________________________________________ TestNetCDF4ViaDaskData.test_roundtrip_string_encoded_characters ____________________________________________________

self = <xarray.tests.test_backends.TestNetCDF4ViaDaskData object at 0x18cba4c40>

    def test_roundtrip_string_encoded_characters(self):
        expected = Dataset({""x"": (""t"", [""ab"", ""cdef""])})
        expected[""x""].encoding[""dtype""] = ""S1""
        with self.roundtrip(expected) as actual:
            assert_identical(expected, actual)
>           assert actual[""x""].encoding[""_Encoding""] == ""utf-8""
E           KeyError: '_Encoding'

/Users/rpa/Code/xarray/xarray/tests/test_backends.py:485: KeyError
```

Why is `chunk` getting called here? Does it actually get called every time we load a dataset with chunks? If so, we will need a more sophisticated solution.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,837243943
https://github.com/pydata/xarray/pull/5065#issuecomment-811265134,https://api.github.com/repos/pydata/xarray/issues/5065,811265134,MDEyOklzc3VlQ29tbWVudDgxMTI2NTEzNA==,1197350,2021-03-31T17:17:07Z,2021-03-31T17:17:07Z,MEMBER,"> Replace `self._encoding` with `None` here?

Thanks! Yeah that's what I had in mind. But I was wondering if there was an example of doing that it else I could copy.

In any case, I'll give it a try now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,837243943
https://github.com/pydata/xarray/pull/5065#issuecomment-811189539,https://api.github.com/repos/pydata/xarray/issues/5065,811189539,MDEyOklzc3VlQ29tbWVudDgxMTE4OTUzOQ==,1197350,2021-03-31T16:12:13Z,2021-03-31T16:12:23Z,MEMBER,"In today's dev call, we proposed to handle encoding in `chunk` the same way we handle it in indexing: by deleting all encoding. 

The problem is, I can't figure out where this happens. Can someone point me to the place in the code where indexing operations delete encoding?

A related question: I discovered this encoding option `preferred_chunks`, which is treated specially:
https://github.com/pydata/xarray/blob/57a4479fcd3ebc579cf00e0d6bf85007eda44b56/xarray/core/dataset.py#L396

Should the Zarr backend be setting this?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,837243943
https://github.com/pydata/xarray/pull/5065#issuecomment-811148122,https://api.github.com/repos/pydata/xarray/issues/5065,811148122,MDEyOklzc3VlQ29tbWVudDgxMTE0ODEyMg==,1197350,2021-03-31T15:16:37Z,2021-03-31T15:16:37Z,MEMBER,"I appreciate the discussion on this PR. Does anyone have a concrete suggestion of what to do?

If we are not in agreement about the encoding stuff, perhaps I should remove that and just move forward with the `safe_chunks` part of this PR?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,837243943
https://github.com/pydata/xarray/issues/4470#issuecomment-810683846,https://api.github.com/repos/pydata/xarray/issues/4470,810683846,MDEyOklzc3VlQ29tbWVudDgxMDY4Mzg0Ng==,1197350,2021-03-31T01:22:29Z,2021-03-31T01:22:29Z,MEMBER,"I just saw this very [cool tweet](https://twitter.com/billjameslittle/status/1377064778036822017) about ipyvista / iris integration and it reminded me of this thread. 

Are there any clear steps we can take to help advance the vtk / pyvista / xarray integration further?","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,710357592
https://github.com/pydata/xarray/pull/5065#issuecomment-807128780,https://api.github.com/repos/pydata/xarray/issues/5065,807128780,MDEyOklzc3VlQ29tbWVudDgwNzEyODc4MA==,1197350,2021-03-25T17:19:15Z,2021-03-25T17:19:15Z,MEMBER,"> Perhaps a kwarg in `to_zarr` like `ignore_encoding_chunks`?

I would argue that this is unnecessary. If you want to explicitly drop encoding, just `del da.encoding['chunks']` before writing. But most users don't figure out that they should do this, because the default behavior is counterintuitive.

The problem here is with the default behavior of propagating chunk encoding through computations when it no longer makes sense. My example with the `dtype` encoding illustrates that we already drop encoding on certain operations, so it's not unprecedented. It's more of an implementation question: where and how to do the dropping.

FWIW, I would also favor dropping `encoding['chunks']` after indexing, coarsening, interpolating, etc. Basically anything that changes the array shape or chunk structure.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,837243943
https://github.com/pydata/xarray/pull/5065#issuecomment-806724345,https://api.github.com/repos/pydata/xarray/issues/5065,806724345,MDEyOklzc3VlQ29tbWVudDgwNjcyNDM0NQ==,1197350,2021-03-25T13:17:03Z,2021-03-25T13:17:59Z,MEMBER,"I see your point. I guess I don't fully understand where else in the code path encoding gets dropped. Consider this example

```python
import xarray as xr
ds = xr.Dataset({'foo': ('time', [1, 1], {'dtype': 'int16'})})
ds = xr.decode_cf(ds).compute()
assert ""dtype"" in ds.foo.encoding
assert ""dtype"" not in (0.5 * ds.foo).encoding
```

Xarray knows to drop the `dtype` encoding after an arithmetic operation. How does that work? To me `.chunk` feel like a similar case: an operation that invalidates any existing encoding.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,837243943
https://github.com/pydata/xarray/issues/4118#issuecomment-806701802,https://api.github.com/repos/pydata/xarray/issues/4118,806701802,MDEyOklzc3VlQ29tbWVudDgwNjcwMTgwMg==,1197350,2021-03-25T13:01:56Z,2021-03-25T13:05:03Z,MEMBER,"So we have:
- Numerous promising prototypes to draw from
- A technical team who can write the proposal and execute the proposed work (@aurghs & @alexamici of B-open)
- Numerous supporting use cases from the bioimaging (@joshmoore), condensed matter (@tacaswell), and bayesian modeling (ArviZ; @OriolAbril) domains

We are just missing a PI, someone who is willing to put their name on top of the proposal and click submit. I have [gone on record](https://rabernat.medium.com/advising-and-collaborating-during-a-pandemic-and-sabbatical-ca9531b82b6d) as committed to not leading any new proposals this year. And in any case, this is a good opportunity for someone else from the @pydata/xarray core dev team to try on a leadership role.


","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,628719058
https://github.com/pydata/xarray/issues/2300#issuecomment-805883595,https://api.github.com/repos/pydata/xarray/issues/2300,805883595,MDEyOklzc3VlQ29tbWVudDgwNTg4MzU5NQ==,1197350,2021-03-24T14:48:55Z,2021-03-24T14:48:55Z,MEMBER,"In #5056, I have implemented the solution of deleting `chunks` from encoding when `chunk()` is called on a variable. A review of that PR would be welcome. ","{""total_count"": 2, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 2, ""rocket"": 0, ""eyes"": 0}",,342531772
https://github.com/pydata/xarray/pull/5065#issuecomment-804050169,https://api.github.com/repos/pydata/xarray/issues/5065,804050169,MDEyOklzc3VlQ29tbWVudDgwNDA1MDE2OQ==,1197350,2021-03-22T13:12:45Z,2021-03-22T13:12:45Z,MEMBER,"Thanks Anderson. Fixed by rebasing. Now RTD build is failing, but there is no obvious error in the logs...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,837243943
https://github.com/pydata/xarray/pull/5065#issuecomment-803712024,https://api.github.com/repos/pydata/xarray/issues/5065,803712024,MDEyOklzc3VlQ29tbWVudDgwMzcxMjAyNA==,1197350,2021-03-22T01:58:23Z,2021-03-22T02:02:00Z,MEMBER,"Confused about the test error. It seems unrelated. In `test_sparse.py:test_variable_method`

```
E   TypeError: no implementation found for 'numpy.allclose' on types that implement __array_function__: [<class 'numpy.ndarray'>, <class 'sparse._coo.core.COO'>]
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,837243943
https://github.com/pydata/xarray/issues/4118#issuecomment-801240559,https://api.github.com/repos/pydata/xarray/issues/4118,801240559,MDEyOklzc3VlQ29tbWVudDgwMTI0MDU1OQ==,1197350,2021-03-17T16:47:20Z,2021-03-17T16:47:20Z,MEMBER,"On today's Xarray dev call, we discussed pursuing another CZI grant to support this feature in Xarray. The image pyramid use case would provide a strong link to the bioimaging community. @alexamici and the B-open folks seem enthusiastic.

I had to leave the meeting early, so I didn't hear the end of the conversation. But did we decide who might serve as PI for such a proposal?","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,628719058
https://github.com/pydata/xarray/issues/2300#issuecomment-790088409,https://api.github.com/repos/pydata/xarray/issues/2300,790088409,MDEyOklzc3VlQ29tbWVudDc5MDA4ODQwOQ==,1197350,2021-03-03T21:55:44Z,2021-03-03T21:55:44Z,MEMBER,"> alternatively `to_zarr` could ignore `encoding[""chunks""]` when the data is already chunked?

I would not favor that. A user may choose to define their desired zarr chunks by putting this information in encoding. In this case, it's good to raise the error. (This is the case I had in mind when I wrote this code.)

The problem here is that encoding is often being carried over from the original dataset and persisted across operations that change chunk size.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,342531772
https://github.com/pydata/xarray/issues/2300#issuecomment-789974968,https://api.github.com/repos/pydata/xarray/issues/2300,789974968,MDEyOklzc3VlQ29tbWVudDc4OTk3NDk2OA==,1197350,2021-03-03T18:54:43Z,2021-03-03T18:54:43Z,MEMBER,I think we are all in agreement. Just waiting for someone to make a PR. It's probably just a few lines of code changes.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,342531772
https://github.com/pydata/xarray/issues/4691#issuecomment-761136148,https://api.github.com/repos/pydata/xarray/issues/4691,761136148,MDEyOklzc3VlQ29tbWVudDc2MTEzNjE0OA==,1197350,2021-01-15T19:18:50Z,2021-01-15T19:18:50Z,MEMBER,cc @martindurant for fsspec issue,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,766826777
https://github.com/pydata/xarray/issues/4789#issuecomment-758373462,https://api.github.com/repos/pydata/xarray/issues/4789,758373462,MDEyOklzc3VlQ29tbWVudDc1ODM3MzQ2Mg==,1197350,2021-01-12T03:36:26Z,2021-01-12T03:36:26Z,MEMBER,I uncovered this issue with Dask's SVG in its `_repr_html` function: https://github.com/dask/dask/issues/6670. The fix made a big difference in repr size. Possibly related?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,782943813
https://github.com/pydata/xarray/pull/4461#issuecomment-741949159,https://api.github.com/repos/pydata/xarray/issues/4461,741949159,MDEyOklzc3VlQ29tbWVudDc0MTk0OTE1OQ==,1197350,2020-12-09T18:02:03Z,2020-12-09T18:02:11Z,MEMBER,"I think @shoyer has laid out the options in a very clear way.

I weakly favor option 2, as I think it preferable in terms of software architecture and our broader roadmap for Xarray. However, I am cognizant of the significant effort that @martindurant has put into this, and I don't want his effort to go to waste.

Some mitigating factors are:
- The example I gave above (https://github.com/pydata/xarray/pull/4461#issuecomment-741939277) shows that one high-impact feature that users want (async capabilities in Zarr) already works, albiet with a different syntax. So this PR is more about convenience.
- Presumably the knowledge about Xarray that Martin has gained by implementing this PR is transferrable to a different context, and so we would not be starting from scratch if we went with 2.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,709187212
https://github.com/pydata/xarray/pull/4461#issuecomment-741939277,https://api.github.com/repos/pydata/xarray/issues/4461,741939277,MDEyOklzc3VlQ29tbWVudDc0MTkzOTI3Nw==,1197350,2020-12-09T17:44:55Z,2020-12-09T17:44:55Z,MEMBER,"@rsignell-usgs: note that your example works without this PR (but with the just released zarr 2.6.1) as follows
```python
mapper = fsspec.get_mapper('s3://noaa-nwm-retro-v2.0-zarr-pds')
ds = xr.open_zarr(mapper, consolidated=True)
```

Took 4s on my laptop (outside of AWS).","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,709187212
https://github.com/pydata/xarray/issues/4631#issuecomment-736786380,https://api.github.com/repos/pydata/xarray/issues/4631,736786380,MDEyOklzc3VlQ29tbWVudDczNjc4NjM4MA==,1197350,2020-12-01T20:03:54Z,2020-12-01T20:03:54Z,MEMBER,Ok then I am 👍  on @dcherian's solution.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,753965875
https://github.com/pydata/xarray/issues/4631#issuecomment-736526797,https://api.github.com/repos/pydata/xarray/issues/4631,736526797,MDEyOklzc3VlQ29tbWVudDczNjUyNjc5Nw==,1197350,2020-12-01T12:39:53Z,2020-12-01T12:39:53Z,MEMBER,But what did we do before?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,753965875