id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
396806015,MDU6SXNzdWUzOTY4MDYwMTU=,2660,DataArrays to/from Zarr Arrays,1197350,open,0,,,7,2019-01-08T08:56:05Z,2023-10-27T14:00:20Z,,MEMBER,,,,"Right now, `open_zarr` and `Dataset.to_zarr` only work with [Zarr groups](https://zarr.readthedocs.io/en/stable/tutorial.html#groups). Zarr Groups can contain multiple [Array](https://zarr.readthedocs.io/en/stable/tutorial.html#creating-an-array) objects.

It would be nice if we could open Zarr Arrays directly as xarray DataArrays and write xarray DataArrays directly to Zarr Arrays.

However, this might not make sense, because, unlike xarray DataArrays, zarr Arrays can't hold any coordinates.

Just raising this idea for discussion.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2660/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,reopened,13221727,issue
357808970,MDExOlB1bGxSZXF1ZXN0MjEzNzM2NTAx,2405,WIP: don't create indexes on multidimensional dimensions,1197350,closed,0,,,7,2018-09-06T20:13:11Z,2023-07-19T18:33:17Z,2023-07-19T18:33:17Z,MEMBER,,0,pydata/xarray/pulls/2405," - [x] Closes #2368, Closes #2233
 - [ ] Tests added (for all bug fixes or enhancements)
 - [ ] Tests passed (for all non-documentation changes)
 - [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)

This is just a start to the solution proposed in #2368. A surprisingly small number of tests broke in my local environment.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2405/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1047608434,I_kwDOAMm_X84-cTxy,5954,Writeable backends via entrypoints,1197350,open,0,,,7,2021-11-08T15:47:12Z,2021-11-09T16:28:59Z,,MEMBER,,,,"The backend refactor has gone a long way towards making it easier to implement custom backend readers via entry points. However, it is still not clear how to implement a _writeable_ backend from a third party package as an entry point. Some of the reasons for this are:

- While our reading function (`open_dataset`) has a generic name, our writing functions (`Dataset.to_netcdf` / `Dataset.to_zarr`) are still format specific. (Related to https://github.com/pydata/xarray/issues/3638). **I propose we introduce a generic `Dataset.to` method and deprecate the others.**
- The `BackendEntrypoint` base class does not have a writing method, just `open_dataset`:
   https://github.com/pydata/xarray/blob/e0deb9cf0a5cd5c9e3db033fd13f075added9c1e/xarray/backends/common.py#L356-L370
   (Related to https://github.com/pydata/xarray/issues/1970)
- As a result, writing is implemented ad-hoc for each backend.
- This makes it impossible for a third-party package to to implement writing.


 We should fix this situation! Here are the steps I would take.

- [ ] Decide on the desired API for writeable backends.
- [ ] Formalize this in the `BackendEntrypoint` base class.
- [ ] Refactor the existing writeable backends (netcdf4-python, h5netcdf, scipy, Zarr) to use this API
- [ ] Maybe deprecate `to_zarr` and `to_netcdf` (or at least refactor to make a shallow call to a generic method)
- [ ] Encourage third party implementors to try it (e.g. TileDB)
 



","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5954/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,,13221727,issue
193657418,MDU6SXNzdWUxOTM2NTc0MTg=,1154,netCDF reading is not prominent in the docs,1197350,closed,0,,,7,2016-12-06T01:18:40Z,2019-02-02T06:33:44Z,2019-02-02T06:33:44Z,MEMBER,,,,"Just opening an issue to highlight what I think is a problem with the docs.

For me, the primary use of xarray is to read and process existing netCDF data files. @shoyer's  popular [blog post](https://www.continuum.io/content/xray-dask-out-core-labeled-arrays-python) illustrates this use case extremely well.

However, when I open the [docs](http://xarray.pydata.org/), I have to dig quite deep before I can see how to read a netCDF file. This could be turning away many potential users. The stuff about netCDF reading is hidden under ""Serialization and IO"". Many potential users will have no idea what either of these words mean.

IMO the solution to this is to reorganize the docs to make reading netCDF much more prominent and obvious.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1154/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
180536861,MDExOlB1bGxSZXF1ZXN0ODc2NDc0MDk=,1027,Groupby bins empty groups,1197350,closed,0,,,7,2016-10-02T21:31:32Z,2016-10-03T15:22:18Z,2016-10-03T15:22:15Z,MEMBER,,0,pydata/xarray/pulls/1027,"This PR fixes a bug in `groupby_bins` in which empty bins were dropped from the grouped results. Now `groupby_bins` restores any empty bins automatically. To recover the old behavior, one could apply `dropna` after a groupby operation.

Fixes #1019  
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1027/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
94508580,MDExOlB1bGxSZXF1ZXN0Mzk3NTI1MTQ=,468,Option for closing files with scipy backend,1197350,closed,0,,,7,2015-07-11T21:24:24Z,2015-08-10T12:50:45Z,2015-08-09T00:04:12Z,MEMBER,,0,pydata/xarray/pulls/468,"This addresses issue #463, in which open_mfdataset failed when trying to open a list of files longer than my system's ulimit. I tried to find a solution in which the underlying netcdf file objects are kept closed by default and only reopened ""when needed"".

I ended up subclassing scipy.io.netcdf_file and overwriting the variable attribute with a property which first checks whether the file is open or closed and opens it if needed. That was the easy part. The hard part was figuring out when to close them. The problem is that a couple of different parts of the code (e.g. each individual variable and also the datastore object itself) keep references to the netcdf_file object. In the end I used the debugger to find out when during initialization the variables were actually being read and added some calls to close() in various different places. It is relatively easy to close the files up at the end of the initialization, but it was much harder to make sure that the whole array of files is never open at the same time. I also had to disable mmap when this option is active.

This solution is messy and, moreover, extremely slow. There is a factor of ~100 performance penalty during initialization for reopening and closing the files all the time (but only a factor of 10 for the actual calculation). I am sure this could be reduced if someone who understands the code better found some judicious points at which to call close() on the netcdf_file. The loss of mmap also sucks.

This option can be accessed with the close_files key word, which I added to api.

Timing for loading and doing a calculation with close_files=True:

``` python
count_open_files()
%time mfds =  xray.open_mfdataset(ddir + '/dt_global_allsat_msla_uv_2014101*.nc', engine='scipy', close_files=True)
count_open_files()
%time print float(mfds.variables['u'].mean())
count_open_files()
```

output:

```
3 open files
CPU times: user 11.1 s, sys: 17.5 s, total: 28.5 s
Wall time: 27.7 s
2 open files
0.0055650632367
CPU times: user 649 ms, sys: 974 ms, total: 1.62 s
Wall time: 633 ms
2 open files
```

Timing for loading and doing a calculation with close_files=False (default, should revert to old behavior):

``` python
count_open_files()
%time mfds =  xray.open_mfdataset(ddir + '/dt_global_allsat_msla_uv_2014101*.nc', engine='scipy', close_files=False)
count_open_files()
%time print float(mfds.variables['u'].mean())
count_open_files()
```

```
3 open files
CPU times: user 264 ms, sys: 85.3 ms, total: 349 ms
Wall time: 291 ms
22 open files
0.0055650632367
CPU times: user 174 ms, sys: 141 ms, total: 315 ms
Wall time: 56 ms
22 open files
```

This is not a very serious pull request, but I spent all day on it, so I thought I would share. Maybe you can see some obvious way to improve it...
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/468/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull