id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2266174558,I_kwDOAMm_X86HExRe,8975,Xarray sponsorship guidelines,1217238,open,0,,,3,2024-04-26T17:05:01Z,2024-04-30T20:52:33Z,,MEMBER,,,,"### At what level of support should Xarray acknowledge sponsors on our website?

I would like to surface this for open discussion because there are potential sponsoring organizations with conflicts of interest with members of Xarray's leadership team (e.g., [Earthmover](https://earthmover.io/), which employs @jhamman, @rabernat and @dcherian).

My suggestion is to use [NumPy's guidelines](https://numpy.org/neps/nep-0046-sponsorship-guidelines.html), with an adjustment down to 1/3 of the thresholds to account for the smaller size of the project:

- $10,000/yr for unrestricted financial contributions (e.g., donations)
- $20,000/yr for financial contributions for a particular purpose (e.g., grants)
- $30,000/yr for in-kind contributions (e.g., time for employees to contribute)
- 2 person-months/yr of paid work time for one or more Xarray maintainers or regular contributors to any Xarray team or activity

The NumPy guidelines also include a grace period of a minimum of 6 months for acknowledging support. I would suggest increasing this to a minimum of 1 year for Xarray.

I would greatly appreciate any feedback from members of the community, either in this issue or on the next [team meeting](https://docs.xarray.dev/en/stable/developers-meeting.html).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8975/reactions"", ""total_count"": 6, ""+1"": 5, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
271043420,MDU6SXNzdWUyNzEwNDM0MjA=,1689,Roundtrip serialization of coordinate variables with spaces in their names,1217238,open,0,,,5,2017-11-03T16:43:20Z,2024-03-22T14:02:48Z,,MEMBER,,,,"If coordinates have spaces in their names, they get restored from netCDF files as data variables instead:
```
>>> xarray.open_dataset(xarray.Dataset(coords={'name with spaces': 1}).to_netcdf())
<xarray.Dataset>
Dimensions:           ()
Data variables:
    name with spaces  int32 1
````

This happens because the CF convention is to indicate coordinates as a space separated string, e.g., `coordinates='latitude longitude'`.

Even though these aren't CF compliant variable names (which cannot have strings) It would be nice to have an ad-hoc convention for xarray that allows us to serialize/deserialize coordinates in all/most cases. Maybe we could use escape characters for spaces (e.g., `coordinates='name\ with\ spaces'`) or quote names if they have spaces (e.g., `coordinates='""name\ with\ spaces""'`?

At the very least, we should issue a warning in these cases.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1689/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
267542085,MDU6SXNzdWUyNjc1NDIwODU=,1647,Representing missing values in string arrays on disk,1217238,closed,0,,,3,2017-10-23T05:01:10Z,2024-02-06T13:03:40Z,2024-02-06T13:03:40Z,MEMBER,,,,"This came up as part of my clean-up of serializing unicode strings in https://github.com/pydata/xarray/pull/1648.

There are two ways to represent strings in netCDF files.

- As character arrays (`NC_CHAR`), supported by both netCDF3 and netCDF4
- As variable length unicode strings (`NC_STRING`), only supported by netCDF4/HDF5.

Currently, by default (if no `_FillValue` is set) we replace missing values (NaN) with an empty string when writing data to disk.

For character arrays, we *could* use the normal `_FillValue` mechanism to set a fill value and decode when data is read back from disk. In fact, this already currently works for `dtype=bytes` (though it isn't documented):
```
In [10]: ds = xr.Dataset({'foo': ('x', np.array([b'bar', np.nan], dtype=object), {}, {'_FillValue': b''})})

In [11]: ds
Out[11]:
<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    foo      (x) object b'bar' nan

In [12]: ds.to_netcdf('foobar.nc')

In [13]: xr.open_dataset('foobar.nc').load()
Out[13]:
<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    foo      (x) object b'bar' nan
```

For variable length strings, it [currently isn't possible](https://github.com/Unidata/netcdf4-python/issues/730) to set a fill-value. So there's no good way to indicate missing values, though this may change if the future depending on the resolution of the netCDF-python issue.

It would obviously be nice to always automatically round-trip missing values, both for strings and bytes. I see two possible ways to do this:
1. Require setting an explicit `_FillValue` when a string contains missing values, by raising an error if this isn't done. We need an explicit choice because there aren't any extra unused characters left over, at least for character arrays. (NetCDF explicitly allows arbitrary bytes to be stored in `NC_CHAR`, even though this maps to an HDF5 fixed-width string with ASCII encoding.)
 For variable length strings, we could potentially set a [non-character unicode symbol](https://en.wikipedia.org/wiki/Specials_(Unicode_block)) like `U+FFFF`, but again that isn't supported yet.
2. Treat empty strings as equivalent to a missing value (NaN). This has the advantage of not requiring an explicit choice of `_FillValue`, so we don't need to wait for any netCDF4 issues to be resolved. However, this does mean that empty strings would not round-trip. Still, given the relative prevalence of missing values vs empty strings in xarray/pandas, it's probably the lesser evil to not preserve empty string.

The default option is to adopt neither of these, and keep the current behavior where missing values are written as empty strings and not decoded at all.

Any opinions? I am leaning towards option (2).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1647/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
842436143,MDU6SXNzdWU4NDI0MzYxNDM=,5081,Lazy indexing arrays as a stand-alone package,1217238,open,0,,,6,2021-03-27T07:06:03Z,2023-12-15T13:20:03Z,,MEMBER,,,,"From @rabernat on [Twitter](https://twitter.com/rabernat/status/1330707155742322689):

> ""Xarray has some secret private classes for lazily indexing / wrapping arrays that are so useful I think they should be broken out into a standalone package. https://github.com/pydata/xarray/blob/master/xarray/core/indexing.py#L516""

The idea here is create a first-class  ""duck array"" library for lazy indexing that could replace xarray's internal classes for lazy indexing. This would be in some ways similar to dask.array, but much simpler, because it doesn't have to worry about parallel computing.

Desired features:

- Lazy indexing
- Lazy transposes
- Lazy concatenation (#4628) and stacking
- Lazy vectorized operations (e.g., unary and binary arithmetic) 
     - needed for decoding variables from disk (`xarray.encoding`) and 
     - building lazy multi-dimensional coordinate arrays corresponding to map projections  (#3620)
- Maybe: lazy reshapes (#4113)

A common feature of these operations is they can (and almost always should) be _fused_ with indexing: if N elements are selected via indexing, only O(N) compute and memory is required to produce them, regards of the size of the original arrays as long as the number of applied operations can be treated as a constant. Memory access is significantly slower than compute on modern hardware, so recomputing these operations on the fly is almost always a good idea.

Out of scope: lazy computation when indexing could require access to many more elements to compute the desired value than are returned. For example, `mean()` probably should not be lazy, because that could involve computation of a very large number of elements that one might want to cache.

This is valuable functionality for Xarray for two reasons:

1. It allows for ""previewing"" small bits of data loaded from disk or remote storage, even if that data needs some form of cheap ""decoding"" from its form on disk.
2. It allows for xarray to decode data in a lazy fashion that is compatible with full-featured systems for lazy computation (e.g., Dask), without requiring the user to choose dask when reading the data.

Related issues:

- [Proposal] Expose Variable without Pandas dependency #3981
- Lazy concatenation of arrays #4628
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5081/reactions"", ""total_count"": 6, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 6, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
197939448,MDU6SXNzdWUxOTc5Mzk0NDg=,1189,Document using a spawning multiprocessing pool for multiprocessing with dask,1217238,closed,0,,,3,2016-12-29T01:21:50Z,2023-12-05T21:51:04Z,2023-12-05T21:51:04Z,MEMBER,,,,"This is a nice option for working with in-file HFD5/netCDF4 compression:
https://github.com/pydata/xarray/pull/1128#issuecomment-261936849

Mixed multi-threading/multi-processing could also be interesting, if anyone wants to revive that: https://github.com/dask/dask/pull/457 (I think it would work now that xarray data stores are pickle-able)

CC @mrocklin","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1189/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
430188626,MDU6SXNzdWU0MzAxODg2MjY=,2873,Dask distributed tests fail locally,1217238,closed,0,,,3,2019-04-07T20:26:53Z,2023-12-05T21:43:02Z,2023-12-05T21:43:02Z,MEMBER,,,,"I'm not sure why, but when I run the integration tests with dask-distributed locally (on my MacBook pro), they fail:
```
$ pytest xarray/tests/test_distributed.py  --maxfail 1
================================================ test session starts =================================================
platform darwin -- Python 3.7.2, pytest-4.0.1, py-1.7.0, pluggy-0.8.0
rootdir: /Users/shoyer/dev/xarray, inifile: setup.cfg
plugins: repeat-0.7.0
collected 19 items

xarray/tests/test_distributed.py F

====================================================== FAILURES ======================================================
__________________________ test_dask_distributed_netcdf_roundtrip[netcdf4-NETCDF3_CLASSIC] ___________________________

loop = <tornado.platform.asyncio.AsyncIOLoop object at 0x1c182da1d0>
tmp_netcdf_filename = '/private/var/folders/15/qdcz0wqj1t9dg40m_ld0fjkh00b4kd/T/pytest-of-shoyer/pytest-3/test_dask_distributed_netcdf_r0/testfile.nc'
engine = 'netcdf4', nc_format = 'NETCDF3_CLASSIC'

    @pytest.mark.parametrize('engine,nc_format', ENGINES_AND_FORMATS)  # noqa
    def test_dask_distributed_netcdf_roundtrip(
            loop, tmp_netcdf_filename, engine, nc_format):

        if engine not in ENGINES:
            pytest.skip('engine not available')

        chunks = {'dim1': 4, 'dim2': 3, 'dim3': 6}

        with cluster() as (s, [a, b]):
            with Client(s['address'], loop=loop):

                original = create_test_data().chunk(chunks)

                if engine == 'scipy':
                    with pytest.raises(NotImplementedError):
                        original.to_netcdf(tmp_netcdf_filename,
                                           engine=engine, format=nc_format)
                    return

                original.to_netcdf(tmp_netcdf_filename,
                                   engine=engine, format=nc_format)

                with xr.open_dataset(tmp_netcdf_filename,
                                     chunks=chunks, engine=engine) as restored:
                    assert isinstance(restored.var1.data, da.Array)
                    computed = restored.compute()
>                   assert_allclose(original, computed)

xarray/tests/test_distributed.py:87:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../miniconda3/envs/xarray-py37/lib/python3.7/contextlib.py:119: in __exit__
    next(self.gen)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

nworkers = 2, nanny = False, worker_kwargs = {}, active_rpc_timeout = 1, scheduler_kwargs = {}

    @contextmanager
    def cluster(nworkers=2, nanny=False, worker_kwargs={}, active_rpc_timeout=1,
                scheduler_kwargs={}):
        ...  # trimmed
        start = time()
        while list(ws):
            sleep(0.01)
>           assert time() < start + 1, 'Workers still around after one second'
E           AssertionError: Workers still around after one second

../../miniconda3/envs/xarray-py37/lib/python3.7/site-packages/distributed/utils_test.py:721: AssertionError
------------------------------------------------ Captured stderr call ------------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at:     tcp://127.0.0.1:51715
distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:51718
distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:51718
distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:51715
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          1
distributed.worker - INFO -                Memory:                   17.18 GB
distributed.worker - INFO -       Local Directory: /Users/shoyer/dev/xarray/_test_worker-5cabd1b7-4d9c-49eb-a79e-205c588f5dae/worker-n8uv72yx
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:51720
distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:51720
distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:51715
distributed.scheduler - INFO - Register tcp://127.0.0.1:51718
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          1
distributed.worker - INFO -                Memory:                   17.18 GB
distributed.worker - INFO -       Local Directory: /Users/shoyer/dev/xarray/_test_worker-71a426d4-bd34-4808-9d33-79cac2bb4801/worker-a70rlf4r
distributed.worker - INFO - -------------------------------------------------
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:51718
distributed.core - INFO - Starting established connection
distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:51715
distributed.worker - INFO - -------------------------------------------------
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register tcp://127.0.0.1:51720
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:51720
distributed.core - INFO - Starting established connection
distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:51715
distributed.worker - INFO - -------------------------------------------------
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Receive client connection: Client-59a7918c-5972-11e9-912a-8c85907bce57
distributed.core - INFO - Starting established connection
distributed.core - INFO - Event loop was unresponsive in Worker for 1.05s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.scheduler - INFO - Receive client connection: Client-worker-5a5c81de-5972-11e9-9136-8c85907bce57
distributed.core - INFO - Starting established connection
distributed.core - INFO - Event loop was unresponsive in Worker for 1.33s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.scheduler - INFO - Receive client connection: Client-worker-5b2496d8-5972-11e9-9137-8c85907bce57
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Remove client Client-59a7918c-5972-11e9-912a-8c85907bce57
distributed.scheduler - INFO - Remove client Client-59a7918c-5972-11e9-912a-8c85907bce57
distributed.scheduler - INFO - Close client connection: Client-59a7918c-5972-11e9-912a-8c85907bce57
distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:51720
distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:51718
distributed.scheduler - INFO - Remove worker tcp://127.0.0.1:51720
distributed.core - INFO - Removing comms to tcp://127.0.0.1:51720
distributed.scheduler - INFO - Remove worker tcp://127.0.0.1:51718
distributed.core - INFO - Removing comms to tcp://127.0.0.1:51718
distributed.scheduler - INFO - Lost all workers
distributed.scheduler - INFO - Remove client Client-worker-5b2496d8-5972-11e9-9137-8c85907bce57
distributed.scheduler - INFO - Remove client Client-worker-5a5c81de-5972-11e9-9136-8c85907bce57
distributed.scheduler - INFO - Close client connection: Client-worker-5b2496d8-5972-11e9-9137-8c85907bce57
distributed.scheduler - INFO - Close client connection: Client-worker-5a5c81de-5972-11e9-9136-8c85907bce57
distributed.scheduler - INFO - Scheduler closing...
distributed.scheduler - INFO - Scheduler closing all comms
```

Version info:
```
In [2]: xarray.show_versions()

INSTALLED VERSIONS
------------------
commit: 2ce0639ee2ba9c7b1503356965f77d847d6cfcdf
python: 3.7.2 (default, Dec 29 2018, 00:00:04)
[Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.1+4.g2ce0639e
pandas: 0.24.0
numpy: 1.15.4
scipy: 1.1.0
netCDF4: 1.4.3.2
pydap: None
h5netcdf: 0.7.0
h5py: 2.9.0
Nio: None
zarr: 2.2.0
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 1.1.5
distributed: 1.26.1
matplotlib: 3.0.2
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 40.0.0
pip: 18.0
conda: None
pytest: 4.0.1
IPython: 6.5.0
sphinx: 1.8.2
```

@mrocklin does this sort of error look familiar to you?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2873/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue
707647715,MDExOlB1bGxSZXF1ZXN0NDkyMDEzODg4,4453,Simplify and restore old behavior for deep-copies,1217238,closed,0,,,3,2020-09-23T20:10:33Z,2023-09-14T03:06:34Z,2023-09-14T03:06:33Z,MEMBER,,1,pydata/xarray/pulls/4453,"Intended to fix https://github.com/pydata/xarray/issues/4449

The goal is to restore behavior to match what we had prior to https://github.com/pydata/xarray/pull/4379 for all types of `data` other than `np.ndarray` objects

Needs tests!

<!-- Feel free to remove check-list items aren't relevant to your change -->

 - [ ] Closes #xxxx
 - [ ] Tests added
 - [ ] Passes `isort . && black . && mypy . && flake8`
 - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
 - [ ] New functions/methods are listed in `api.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4453/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
588105641,MDU6SXNzdWU1ODgxMDU2NDE=,3893,HTML repr in the online docs,1217238,open,0,,,3,2020-03-26T02:17:51Z,2023-09-11T17:41:59Z,,MEMBER,,,,"I noticed two minor issues in our online docs, now that we've switched to the hip new HTML repr by default.

1. Most doc pages still show text, not HTML. I suspect this is a limitation of the [IPython sphinx derictive](https://ipython.readthedocs.io/en/stable/sphinxext.html) we use for our snippets. We might be able to fix that by switching to [jupyter-sphinx](https://jupyter-sphinx.readthedocs.io/en/latest/)?

2. The ""attributes"" part of the HTML repr in our notebook examples [looks a little funny](http://xarray.pydata.org/en/stable/examples/multidimensional-coords.html), with strange blue formatting around each attribute name. It looks like part of the outer style of our docs is leaking into the HTML repr:
![image](https://user-images.githubusercontent.com/1217238/77603390-31bc5a80-6ecd-11ea-911d-f2b6ed2714f6.png)
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3893/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1376109308,I_kwDOAMm_X85SBcL8,7045,Should Xarray stop doing automatic index-based alignment?,1217238,open,0,,,13,2022-09-16T15:31:03Z,2023-08-23T07:42:34Z,,MEMBER,,,,"### What is your issue?

I am increasingly thinking that automatic index-based alignment in Xarray (copied from pandas) may have been a design mistake. Almost every time I work with datasets with different indexes, I find myself writing code to explicitly align them:

1. Automatic alignment is **hard to predict**. The implementation is complicated, and the exact mode of automatic alignment (outer vs inner vs left join) depends on the specific operation. It's also no longer possible to predict the shape (or even the dtype) resulting from most Xarray operations purely from input shape/dtype.
2. Automatic alignment brings unexpected **performance penalty**. In some domains (analytics) this is OK, but in others (e.g,. numerical modeling or deep learning) this is a complete deal-breaker.
3. Automatic alignment is **not useful for float indexes**, because exact matches are rare. In practice, this makes it less useful in Xarray's usual domains than it for pandas.

Would it be insane to consider changing Xarray's behavior to stop doing automatic alignment? I imagine we could roll this out slowly, first with warnings and then with an option for disabling it.

If you think this is a good or bad idea, consider responding to this issue with a 👍  or 👎  reaction.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7045/reactions"", ""total_count"": 13, ""+1"": 9, ""-1"": 2, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 2}",,,13221727,issue
342928718,MDExOlB1bGxSZXF1ZXN0MjAyNzE0MjUx,2302,WIP: lazy=True in apply_ufunc(),1217238,open,0,,,1,2018-07-20T00:01:21Z,2023-07-18T04:19:17Z,,MEMBER,,0,pydata/xarray/pulls/2302," - [x] Closes https://github.com/pydata/xarray/issues/2298
 - [ ] Tests added
 - [ ] Tests passed
 - [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API

Still needs more tests and documentation.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2302/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1767947798,PR_kwDOAMm_X85TkPzV,7933,Update calendar for developers meeting,1217238,closed,0,,,0,2023-06-21T16:09:44Z,2023-06-21T17:56:22Z,2023-06-21T17:56:22Z,MEMBER,,0,pydata/xarray/pulls/7933,"The old calendar was on @jhamman's UCAR account, which he no longer has access to!

xref https://github.com/pydata/xarray/issues/4001","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7933/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
479942077,MDU6SXNzdWU0Nzk5NDIwNzc=,3213,How should xarray use/support sparse arrays?,1217238,open,0,,,55,2019-08-13T03:29:42Z,2023-06-07T15:43:55Z,,MEMBER,,,,"I'm looking forward to being easily able to create sparse xarray objects from pandas: https://github.com/pydata/xarray/issues/3206

Are there other xarray APIs that could make good use of sparse arrays, or could make sparse arrays easier to use?

Some ideas:
- `to_sparse()`/`to_dense()` methods for converting to/from sparse without requiring using  `.data`
- `to_dataframe()`/`to_series()` could grow options for skipping the fill-value in sparse arrays, so they can round-trip MultiIndex data back to pandas
-  Serialization to/from netCDF files, using some custom convention (see https://github.com/pydata/xarray/issues/1375#issuecomment-402699810)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3213/reactions"", ""total_count"": 14, ""+1"": 14, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1465287257,I_kwDOAMm_X85XVoJZ,7325,Support reading Zarr data via TensorStore,1217238,open,0,,,1,2022-11-27T00:12:17Z,2023-05-11T01:24:27Z,,MEMBER,,,,"### What is your issue?

[TensorStore](https://github.com/google/tensorstore/) is another high performance API for reading distributed arrays in formats such as Zarr, written in C++.

It could be interesting to write an Xarray storage backend using TensorStore as an alternative way to read Zarr files.

As an exercise, I make a little demo of doing this: https://gist.github.com/shoyer/5b0c485979cc9c36a9685d8cf8e94565

I have not tested it for performance. The main annoyance is that TensorStore doesn't understand Zarr groups or Zarr array attributes, so I needed to write my own helpers for reading this metadata.

Also, there's a bit of an impedance mis-match between TensorStore (where everything returns futures) and Xarray (which assumes that indexing results in numpy arrays). This could likely be improved with some amount of effort -- in particular https://github.com/pydata/xarray/pull/6874/files should help.

CC @jbms who may have better ideas about how to use the TensorStore API.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7325/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
253395960,MDU6SXNzdWUyNTMzOTU5NjA=,1533,Index variables loaded from dask can be computed twice,1217238,closed,0,,,6,2017-08-28T17:18:27Z,2023-04-06T04:15:46Z,2023-04-06T04:15:46Z,MEMBER,,,,as reported by @crusaderky  in #1522 ,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1533/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
209653741,MDU6SXNzdWUyMDk2NTM3NDE=,1285,FAQ page could use some updating,1217238,open,0,,,1,2017-02-23T03:29:16Z,2023-03-26T16:32:44Z,,MEMBER,,,,"Along the same lines as https://github.com/pydata/xarray/issues/1282, we haven't done much updating for frequently asked questions -- it's mostly still the original handful of FAQ entries I wrote in the first version of the docs.

Topics worth addressing:

- [ ] How xarray handles missing values
- [x] File formats -- how can I read format *X* in xarray? (Maybe we should make a table with links to other packages?)

(please add suggestions for this list!)

StackOverflow may be a helpful reference here: http://stackoverflow.com/questions/tagged/python-xarray?sort=votes&pageSize=50","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1285/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
176805500,MDU6SXNzdWUxNzY4MDU1MDA=,1004,Remove IndexVariable.name,1217238,open,0,,,3,2016-09-14T03:27:43Z,2023-03-11T19:57:40Z,,MEMBER,,,,"As discussed in #947, we should remove the `IndexVariable.name` attribute. It should be fine to use an `IndexVariable` anywhere, regardless of whether or not it labels ticks along a dimension.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1004/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
98587746,MDU6SXNzdWU5ODU4Nzc0Ng==,508,Ignore missing variables when concatenating datasets?,1217238,closed,0,,,8,2015-08-02T06:03:57Z,2023-01-20T16:04:28Z,2023-01-20T16:04:28Z,MEMBER,,,,"Several users (@raj-kesavan, @richardotis, now myself) have wondered about how to concatenate xray Datasets with different variables.

With the current `xray.concat`, you need to awkwardly create dummy variables filled with `NaN` in datasets that don't have them (or drop mismatched variables entirely). Neither of these are great options -- `concat` should have an option (the default?) to take care of this for the user.

This would also be more consistent with `pd.concat`, which takes a more relaxed approach to matching dataframes with different variables (it does an outer join).
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/508/reactions"", ""total_count"": 6, ""+1"": 6, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
895983112,MDExOlB1bGxSZXF1ZXN0NjQ4MTM1NTcy,5351,Add xarray.backends.NoMatchingEngineError,1217238,open,0,,,4,2021-05-19T22:09:21Z,2022-11-16T15:19:54Z,,MEMBER,,0,pydata/xarray/pulls/5351,"<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #5329
- [x] Tests added
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [x] New functions/methods are listed in `api.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5351/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
803068773,MDExOlB1bGxSZXF1ZXN0NTY5MDU5MTEz,4879,Cache files for different CachingFileManager objects separately,1217238,closed,0,,,10,2021-02-07T21:48:06Z,2022-10-18T16:40:41Z,2022-10-18T16:40:40Z,MEMBER,,0,pydata/xarray/pulls/4879,"This means that explicitly opening a file multiple times with
``open_dataset`` (e.g., after modifying it on disk) now reopens the file
from scratch, rather than reusing a cached version.

If users want to reuse the cached file, they can reuse the same xarray
object. We don't need this for handling many files in Dask (the original
motivation for caching), because in those cases only a single
CachingFileManager is created.

I think this should some long-standing usability issues: #4240, #4862

Conveniently, this also obviates the need for some messy reference
counting logic.

<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #4240, #4862
- [x] Tests added
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4879/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
623804131,MDU6SXNzdWU2MjM4MDQxMzE=,4090,Error with indexing 2D lat/lon coordinates,1217238,closed,0,,,2,2020-05-24T06:19:45Z,2022-09-28T12:06:03Z,2022-09-28T12:06:03Z,MEMBER,,,,"```
filslp = ""ChonghuaYinData/prmsl.mon.mean.nc""
filtmp = ""ChonghuaYinData/air.sig995.mon.mean.nc""
filprc = ""ChonghuaYinData/precip.mon.mean.nc""

ds_slp = xr.open_dataset(filslp).sel(time=slice(str(yrStrt)+'-01-01', str(yrLast)+'-12-31'))

ds_slp
```
outputs:
```
<xarray.Dataset>
Dimensions:            (nbnds: 2, time: 480, x: 349, y: 277)
Coordinates:
  * time               (time) datetime64[ns] 1979-01-01 ... 2018-12-01
    lat                (y, x) float32 ...
    lon                (y, x) float32 ...
  * y                  (y) float32 0.0 32463.0 64926.0 ... 8927325.0 8959788.0
  * x                  (x) float32 0.0 32463.0 64926.0 ... 11264660.0 11297120.0
Dimensions without coordinates: nbnds
Data variables:
    Lambert_Conformal  int32 ...
    prmsl              (time, y, x) float32 ...
    time_bnds          (time, nbnds) float64 ...
Attributes:
    Conventions:    CF-1.2
    centerlat:      50.0
    centerlon:      -107.0
    comments:       
    institution:    National Centers for Environmental Prediction
    latcorners:     [ 1.000001  0.897945 46.3544   46.63433 ]
    loncorners:     [-145.5       -68.32005    -2.569891  148.6418  ]
    platform:       Model
    standardpar1:   50.0
    standardpar2:   50.000001
    title:          NARR Monthly Means
    dataset_title:  NCEP North American Regional Reanalysis (NARR)
    history:        created 2016/04/12 by NOAA/ESRL/PSD
    references:     https://www.esrl.noaa.gov/psd/data/gridded/data.narr.html
    source:         http://www.emc.ncep.noaa.gov/mmb/rreanl/index.html
    References:     
```

```
yrStrt  = 1950          # manually specify for convenience
yrLast  = 2018          # 20th century ends 2018  

clStrt  = 1950          # reference climatology for SOI
clLast  = 1979          

yrStrtP = 1979          # 1st year GPCP
yrLastP = yrLast        # match 20th century

latT = -17.6         # Tahiti
lonT = 210.75  
latD = -12.5         # Darwin 
lonD = 130.83  

# select grids of T and D
T = ds_slp.sel(lat=latT, lon=lonT, method='nearest')
D = ds_slp.sel(lat=latD, lon=lonD, method='nearest')
```
outputs:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-27-6702b30f473f> in <module>
      1 # select grids of T and D
----> 2 T = ds_slp.sel(lat=latT, lon=lonT, method='nearest')
      3 D = ds_slp.sel(lat=latD, lon=lonD, method='nearest')

~\Anaconda3\lib\site-packages\xarray\core\dataset.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   2004         indexers = either_dict_or_kwargs(indexers, indexers_kwargs, ""sel"")
   2005         pos_indexers, new_indexes = remap_label_indexers(
-> 2006             self, indexers=indexers, method=method, tolerance=tolerance
   2007         )
   2008         result = self.isel(indexers=pos_indexers, drop=drop)

~\Anaconda3\lib\site-packages\xarray\core\coordinates.py in remap_label_indexers(obj, indexers, method, tolerance, **indexers_kwargs)
    378 
    379     pos_indexers, new_indexes = indexing.remap_label_indexers(
--> 380         obj, v_indexers, method=method, tolerance=tolerance
    381     )
    382     # attach indexer's coordinate to pos_indexers

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in remap_label_indexers(data_obj, indexers, method, tolerance)
    257     new_indexes = {}
    258 
--> 259     dim_indexers = get_dim_indexers(data_obj, indexers)
    260     for dim, label in dim_indexers.items():
    261         try:

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in get_dim_indexers(data_obj, indexers)
    223     ]
    224     if invalid:
--> 225         raise ValueError(""dimensions or multi-index levels %r do not exist"" % invalid)
    226 
    227     level_indexers = defaultdict(dict)

ValueError: dimensions or multi-index levels ['lat', 'lon'] do not exist
```

Does any know how fix to this problem?Thank you very much.

_Originally posted by @JimmyGao0204 in https://github.com/pydata/xarray/issues/475#issuecomment-633172787_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4090/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1210147360,I_kwDOAMm_X85IIWIg,6504,test_weighted.test_weighted_operations_nonequal_coords should avoid depending on random number seed,1217238,closed,0,1217238,,0,2022-04-20T19:56:19Z,2022-08-29T20:42:30Z,2022-08-29T20:42:30Z,MEMBER,,,,"### What happened?

In testing an upgrade to the latest version of xarray in our systems, I noticed this test failing:
```
def test_weighted_operations_nonequal_coords():
        # There are no weights for a == 4, so that data point is ignored.
        weights = DataArray(np.random.randn(4), dims=(""a"",), coords=dict(a=[0, 1, 2, 3]))
        data = DataArray(np.random.randn(4), dims=(""a"",), coords=dict(a=[1, 2, 3, 4]))
        check_weighted_operations(data, weights, dim=""a"", skipna=None)
    
        q = 0.5
        result = data.weighted(weights).quantile(q, dim=""a"")
        # Expected value computed using code from [https://aakinshin.net/posts/weighted-quantiles/](https://www.google.com/url?q=https://aakinshin.net/posts/weighted-quantiles/&sa=D) with values at a=1,2,3
        expected = DataArray([0.9308707], coords={""quantile"": [q]}).squeeze()
>       assert_allclose(result, expected)
E       AssertionError: Left and right DataArray objects are not close
E       
E       Differing values:
E       L
E           array(0.919569)
E       R
E           array(0.930871)
```

It appears that this test is hard-coded to match a particular random number seed, which in turn would fix the resutls of `np.random.randn()`.

### What did you expect to happen?

Whenever possible, Xarray's own tests should avoid relying on particular random number generators, e.g., in this case we could specify random numbers instead.

A back-up option would be to explicitly set random seed locally inside the tests, e.g., by creating a `np.random.RandomState()` with a fixed seed and using that. The global random state used by `np.random.randn()` is sensitive to implementation details like the order in which tests are run.

### Minimal Complete Verifiable Example

_No response_

### Relevant log output

_No response_

### Anything else we need to know?

_No response_

### Environment

...","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6504/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1210267320,I_kwDOAMm_X85IIza4,6505,Dropping a MultiIndex variable raises an error after explicit indexes refactor,1217238,closed,0,,,3,2022-04-20T22:07:26Z,2022-07-21T14:46:58Z,2022-07-21T14:46:58Z,MEMBER,,,,"### What happened?

With the latest released version of Xarray, it is possible to delete all variables corresponding to a MultiIndex by simply deleting the name of the MultiIndex.

After the explicit indexes refactor (i.e,. using the ""main"" development branch) this now raises error about how this would ""corrupt"" index state. This comes up when using `drop()` and `assign_coords()` and possibly some other methods.

This is not hard to work around, but we may want to consider this bug a blocker for the next Xarray release. I found the issue surfaced in several projects when attempting to use the new version of Xarray inside Google's codebase.

CC @benbovy in case you have any thoughts to share.

### What did you expect to happen?

For now, we should preserve the behavior of deleting the variables corresponding to MultiIndex levels, but should issue a deprecation warning encouraging users to explicitly delete everything.

### Minimal Complete Verifiable Example

```Python
import xarray

array = xarray.DataArray(
    [[1, 2], [3, 4]],
    dims=['x', 'y'],
    coords={'x': ['a', 'b']},
)
stacked = array.stack(z=['x', 'y'])
print(stacked.drop('z'))
print()
print(stacked.assign_coords(z=[1, 2, 3, 4]))
```


### Relevant log output

```Python
ValueError                                Traceback (most recent call last)
Input In [1], in <cell line: 9>()
      3 array = xarray.DataArray(
      4     [[1, 2], [3, 4]],
      5     dims=['x', 'y'],
      6     coords={'x': ['a', 'b']},
      7 )
      8 stacked = array.stack(z=['x', 'y'])
----> 9 print(stacked.drop('z'))
     10 print()
     11 print(stacked.assign_coords(z=[1, 2, 3, 4]))

File ~/dev/xarray/xarray/core/dataarray.py:2425, in DataArray.drop(self, labels, dim, errors, **labels_kwargs)
   2408 def drop(
   2409     self,
   2410     labels: Mapping = None,
   (...)
   2414     **labels_kwargs,
   2415 ) -> DataArray:
   2416     """"""Backward compatible method based on `drop_vars` and `drop_sel`
   2417
   2418     Using either `drop_vars` or `drop_sel` is encouraged
   (...)
   2423     DataArray.drop_sel
   2424     """"""
-> 2425     ds = self._to_temp_dataset().drop(labels, dim, errors=errors)
   2426     return self._from_temp_dataset(ds)

File ~/dev/xarray/xarray/core/dataset.py:4590, in Dataset.drop(self, labels, dim, errors, **labels_kwargs)
   4584 if dim is None and (is_scalar(labels) or isinstance(labels, Iterable)):
   4585     warnings.warn(
   4586         ""dropping variables using `drop` will be deprecated; using drop_vars is encouraged."",
   4587         PendingDeprecationWarning,
   4588         stacklevel=2,
   4589     )
-> 4590     return self.drop_vars(labels, errors=errors)
   4591 if dim is not None:
   4592     warnings.warn(
   4593         ""dropping labels using list-like labels is deprecated; using ""
   4594         ""dict-like arguments with `drop_sel`, e.g. `ds.drop_sel(dim=[labels])."",
   4595         DeprecationWarning,
   4596         stacklevel=2,
   4597     )

File ~/dev/xarray/xarray/core/dataset.py:4549, in Dataset.drop_vars(self, names, errors)
   4546 if errors == ""raise"":
   4547     self._assert_all_in_dataset(names)
-> 4549 assert_no_index_corrupted(self.xindexes, names)
   4551 variables = {k: v for k, v in self._variables.items() if k not in names}
   4552 coord_names = {k for k in self._coord_names if k in variables}

File ~/dev/xarray/xarray/core/indexes.py:1394, in assert_no_index_corrupted(indexes, coord_names)
   1392 common_names_str = "", "".join(f""{k!r}"" for k in common_names)
   1393 index_names_str = "", "".join(f""{k!r}"" for k in index_coords)
-> 1394 raise ValueError(
   1395     f""cannot remove coordinate(s) {common_names_str}, which would corrupt ""
   1396     f""the following index built from coordinates {index_names_str}:\n""
   1397     f""{index}""
   1398 )

ValueError: cannot remove coordinate(s) 'z', which would corrupt the following index built from coordinates 'z', 'x', 'y':
<xarray.core.indexes.PandasMultiIndex object at 0x148c95150>
```


### Anything else we need to know?

_No response_

### Environment

<details>

INSTALLED VERSIONS
------------------
commit: 33cdabd261b5725ac357c2823bd0f33684d3a954
python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:42:03) [Clang 12.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.4.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.18.3.dev137+g96c56836
pandas: 1.4.2
numpy: 1.22.3
scipy: 1.8.0
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.04.1
distributed: 2022.4.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.3.0
cupy: None
pint: None
sparse: None
setuptools: 62.1.0
pip: 22.0.4
conda: None
pytest: 7.1.1
IPython: 8.2.0
sphinx: None
</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6505/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
168272291,MDExOlB1bGxSZXF1ZXN0NzkzMjE2NTc=,924,WIP: progress toward making groupby work with multiple arguments,1217238,open,0,,,16,2016-07-29T08:07:57Z,2022-06-09T14:50:17Z,,MEMBER,,0,pydata/xarray/pulls/924,"Fixes #324 

It definitely doesn't work properly yet, totally mixing up coordinates,
data variables and multi-indexes (as shown by the failing tests).

A simple example:

```
In [4]: coords = {'a': ('x', [0, 0, 1, 1]), 'b': ('y', [0, 0, 1, 1])}

In [5]: square = xr.DataArray(np.arange(16).reshape(4, 4), coords=coords, dims=['x', 'y'])

In [6]: square
Out[6]:
<xarray.DataArray (x: 4, y: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Coordinates:
    b        (y) int64 0 0 1 1
    a        (x) int64 0 0 1 1
  * x        (x) int64 0 1 2 3
  * y        (y) int64 0 1 2 3

In [7]: square.groupby(['a', 'b']).mean()
Out[7]:
<xarray.DataArray (a: 2, b: 2)>
array([[  2.5,   4.5],
       [ 10.5,  12.5]])
Coordinates:
  * a        (a) int64 0 1
  * b        (b) int64 0 1

In [8]: square.groupby(['x', 'y']).mean()
Out[8]:
<xarray.DataArray (x: 4, y: 4)>
array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.],
       [ 12.,  13.,  14.,  15.]])
Coordinates:
  * x        (x) int64 0 1 2 3
  * y        (y) int64 0 1 2 3
```

More examples:
https://gist.github.com/shoyer/5cfa4d5751e8a78a14af25f8442ad8d5
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/924/reactions"", ""total_count"": 4, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 3, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
711626733,MDU6SXNzdWU3MTE2MjY3MzM=,4473,Wrap numpy-groupies to speed up Xarray's groupby aggregations,1217238,closed,0,,,8,2020-09-30T04:43:04Z,2022-05-15T02:38:29Z,2022-05-15T02:38:29Z,MEMBER,,,,"<!-- Please do a quick search of existing issues to make sure that this has not been asked before. -->

**Is your feature request related to a problem? Please describe.**

Xarray's groupby aggregations (e.g., `groupby(..).sum()`) are very slow compared to pandas, as described in https://github.com/pydata/xarray/issues/659.

**Describe the solution you'd like**

We could speed things up considerably (easily 100x) by wrapping the [numpy-groupies](https://github.com/ml31415/numpy-groupies) package.

**Additional context**

One challenge is how to handle dask arrays (and other duck arrays). In some cases it might make sense to apply the numpy-groupies function (using apply_ufunc), but in other cases it might be better to stick with the current indexing + concatenate solution. We could either pick some simple heuristics for choosing the algorithm to use on dask arrays, or could just stick with the current algorithm for now.

In particular, it might make sense to stick with the current algorithm if there are a many chunks in the arrays to aggregated along the ""grouped"" dimension (depending on the size of the unique group values).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4473/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
326205036,MDU6SXNzdWUzMjYyMDUwMzY=,2180,How should Dataset.update() handle conflicting coordinates?,1217238,open,0,,,16,2018-05-24T16:46:23Z,2022-04-30T13:40:28Z,,MEMBER,,,,"Recently, we updated `Dataset.__setitem__` to drop conflicting coordinates from DataArray values being assigned if they conflict with existing coordinates (https://github.com/pydata/xarray/pull/2087). Because `update` and `__setitem__` share the same code path, this inadvertently updated `update` as well. Is this something we want?

In v0.10.3, both `__setitem__` and `update` prioritize coordinates from the assigned objects (e.g., `value` in `dataset[key] = value`).

In v0.10.4, both `__setitem__` and `update` prioritize coordinates from the original object (e.g., `dataset`).

I'm not sure this is the right behavior. In particular, in the case of `dataset.update(other)` where `other` is also an `xarray.Dataset`, it seems like coordinates from `other` should take priority.

Note that one advantage of the current logic (which is violated by my current fix in https://github.com/pydata/xarray/pull/2162), is that we maintain the invariant that `dataset[key] = value` is equivalent to `dataset.update({key: value})`.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2180/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
612918997,MDU6SXNzdWU2MTI5MTg5OTc=,4034,Fix tight_layout warning on cartopy facetgrid docs example,1217238,open,0,,,1,2020-05-05T21:54:46Z,2022-04-30T12:37:50Z,,MEMBER,,,,"Per the fix in https://github.com/pydata/xarray/pull/4032, I'm pretty sure we will soon start seeing a warning message printed on ReadTheDocs in Cartopy FacetGrid example:
http://xarray.pydata.org/en/stable/plotting.html#maps

This would be nice to fix for users, especially because it's likely users will see this warning when running code outside of our documentation, too.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4034/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
621123222,MDU6SXNzdWU2MjExMjMyMjI=,4081,"Wrap ""Dimensions"" onto multiple lines in xarray.Dataset repr?",1217238,closed,0,,,4,2020-05-19T16:31:59Z,2022-04-29T19:59:24Z,2022-04-29T19:59:24Z,MEMBER,,,,"Here's an example dataset of a large dataset from @alimanfoo:
https://nbviewer.jupyter.org/gist/alimanfoo/b74b08465727894538d5b161b3ced764
```
<xarray.Dataset>
Dimensions:                         (__variants/BaseCounts_dim1: 4, __variants/MLEAC_dim1: 3, __variants/MLEAF_dim1: 3, alt_alleles: 3, ploidy: 2, samples: 1142, variants: 21442865)
Coordinates:
    samples/ID                      (samples) object dask.array<chunksize=(1142,), meta=np.ndarray>
    variants/CHROM                  (variants) object dask.array<chunksize=(21442865,), meta=np.ndarray>
    variants/POS                    (variants) int32 dask.array<chunksize=(4194304,), meta=np.ndarray>
Dimensions without coordinates: __variants/BaseCounts_dim1, __variants/MLEAC_dim1, __variants/MLEAF_dim1, alt_alleles, ploidy, samples, variants
Data variables:
    variants/ABHet                  (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
    variants/ABHom                  (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
    variants/AC                     (variants, alt_alleles) int32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
    variants/AF                     (variants, alt_alleles) float32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
...
```

I know similarly large datasets with lots of dimensions come up in other contexts as well, e.g., with geophysical model output.

That's a very long first line! This would be easier to read as:
```
<xarray.Dataset>
Dimensions:                         (__variants/BaseCounts_dim1: 4, __variants/MLEAC_dim1: 3,
                                     __variants/MLEAF_dim1: 3, alt_alleles: 3, ploidy: 2,
                                     samples: 1142, variants: 21442865)
Coordinates:
    samples/ID                      (samples) object dask.array<chunksize=(1142,), meta=np.ndarray>
    variants/CHROM                  (variants) object dask.array<chunksize=(21442865,), meta=np.ndarray>
    variants/POS                    (variants) int32 dask.array<chunksize=(4194304,), meta=np.ndarray>
Dimensions without coordinates: __variants/BaseCounts_dim1, __variants/MLEAC_dim1, __variants/MLEAF_dim1, alt_alleles, ploidy, samples, variants
Data variables:
    variants/ABHet                  (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
    variants/ABHom                  (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
    variants/AC                     (variants, alt_alleles) int32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
    variants/AF                     (variants, alt_alleles) float32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
...
```

or maybe:
```
<xarray.Dataset>
Dimensions:
    __variants/BaseCounts_dim1: 4
    __variants/MLEAC_dim1: 3
    __variants/MLEAF_dim1: 3
    alt_alleles: 3
    ploidy: 2
    samples: 1142
    variants: 21442865
Coordinates:
    samples/ID                      (samples) object dask.array<chunksize=(1142,), meta=np.ndarray>
    variants/CHROM                  (variants) object dask.array<chunksize=(21442865,), meta=np.ndarray>
    variants/POS                    (variants) int32 dask.array<chunksize=(4194304,), meta=np.ndarray>
Dimensions without coordinates: __variants/BaseCounts_dim1, __variants/MLEAC_dim1, __variants/MLEAF_dim1, alt_alleles, ploidy, samples, variants
Data variables:
    variants/ABHet                  (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
    variants/ABHom                  (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
    variants/AC                     (variants, alt_alleles) int32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
    variants/AF                     (variants, alt_alleles) float32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
...
```

`Dimensions without coordinates` could probably use some wrapping, too.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4081/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
205455788,MDU6SXNzdWUyMDU0NTU3ODg=,1251,Consistent naming for xarray's methods that apply functions,1217238,closed,0,,,13,2017-02-05T21:27:24Z,2022-04-27T20:06:25Z,2022-04-27T20:06:25Z,MEMBER,,,,"We currently have two types of methods that take a function to apply to xarray objects:
- `pipe` (on `DataArray` and `Dataset`): apply a function to this entire object (`array.pipe(func)` -> `func(array)`)
- `apply` (on `Dataset` and `GroupBy`): apply a function to each labeled object in this object (e.g., `ds.apply(func)` -> `ds({k: func(v) for k, v in ds.data_vars.items()})`).

And one more method that we want to add but isn't finalized yet -- currently named `apply_ufunc`:
- Apply a function that acts on unlabeled (i.e., numpy) arrays to each array in the object

I'd like to have three distinct names that makes it clear what these methods do and how they are different.  This has come up a few times recently, e.g., https://github.com/pydata/xarray/issues/1130

One proposal: rename `apply` to `map`, and then use `apply` only for methods that act on unlabeled arrays. This would require a deprecation cycle, but eventually it would let us add `.apply` methods for handling raw arrays to both Dataset and DataArray. (We could use a separate apply method from `apply_ufunc` to convert `dim` arguments to `axis` and not do automatic broadcasting.)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1251/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
342180429,MDU6SXNzdWUzNDIxODA0Mjk=,2298,Making xarray math lazy,1217238,open,0,,,7,2018-07-18T05:18:53Z,2022-04-19T15:38:59Z,,MEMBER,,,,"At SciPy, I had the realization that it would be relatively straightforward to make element-wise math between xarray objects lazy. This would let us support lazy coordinate arrays, a feature that has quite a few use-cases, e.g., for both geoscience and astronomy.

The trick would be to write a lazy array class that holds an element-wise vectorized function and passes indexers on to its arguments. I haven't thought too hard about this yet for vectorized indexing, but it could be quite efficient for outer indexing. I have some prototype code but no tests yet.

The question is how to hook this into xarray operations. In particular, supposing that the inputs to a function do no hold dask arrays:
- Should we try to make *every* element-wise operation with vectorized functions (ufuncs) lazy by default? This might have negative performance implications and would be a little tricky to implement with xarray's current code, since we still implement binary operations like `+` with separate logic from `apply_ufunc`.
- Should we make every element-wise operation that explicitly uses `apply_ufunc()` lazy by default?
- Or should we only make element-wise operations lazy with `apply_ufunc()` if you use some special flag, e.g., `apply_ufunc(..., lazy=True)`?

I am leaning towards the last option for now but would welcome other opinions.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2298/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
902622057,MDU6SXNzdWU5MDI2MjIwNTc=,5381,concat() with compat='no_conflicts' on dask arrays has accidentally quadratic runtime,1217238,open,0,,,0,2021-05-26T16:12:06Z,2022-04-19T03:48:27Z,,MEMBER,,,,"This ends up calling `fillna()` in a loop inside `xarray.core.merge.unique_variable()`, something like:
```python
  out = variables[0]
  for var in variables[1:]:
    out = out.fillna(var)
```
https://github.com/pydata/xarray/blob/55e5b5aaa6d9c27adcf9a7cb1f6ac3bf71c10dea/xarray/core/merge.py#L147-L149

This has quadratic behavior if the variables are stored in dask arrays (the dask graph gets one element larger after each loop iteration). This is OK for `merge()` (which typically only has two arguments) but is problematic for dealing with variables that shouldn't be concatenated inside `concat()`, which should be able to handle very long lists of arguments.

I encountered this because `compat='no_conflicts'` is the default for `xarray.combine_nested()`.

I guess there's also the related issue which is that even if we produced the output dask graph by hand without a loop, it still wouldn't be easy to evaluate for a large number of elements. Ideally we would use some sort of tree-reduction to ensure the operation can be parallelized.

xref https://github.com/google/xarray-beam/pull/13","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5381/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
325439138,MDU6SXNzdWUzMjU0MzkxMzg=,2171,Support alignment/broadcasting with unlabeled dimensions of size 1,1217238,open,0,,,5,2018-05-22T19:52:21Z,2022-04-19T03:15:24Z,,MEMBER,,,,"Sometimes, it's convenient to include placeholder dimensions of size 1, which allows for removing any ambiguity related to the order of output dimensions.

Currently, this is not supported with xarray:
```
>>> xr.DataArray([1], dims='x') + xr.DataArray([1, 2, 3], dims='x')
ValueError: arguments without labels along dimension 'x' cannot be aligned because they have different dimension sizes: {1, 3}

>>> xr.Variable(('x',), [1]) + xr.Variable(('x',), [1, 2, 3])
ValueError: operands cannot be broadcast together with mismatched lengths for dimension 'x': (1, 3)
```

However, these operations aren't really ambiguous. With size 1 dimensions, we could logically do broadcasting like NumPy arrays, e.g.,
```
>>> np.array([1]) + np.array([1, 2, 3])
array([2, 3, 4])
```

This would be particularly convenient if we add `keepdims=True` to xarray operations (#2170).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2171/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
665488672,MDU6SXNzdWU2NjU0ODg2NzI=,4267,CachingFileManager should not use __del__,1217238,open,0,,,2,2020-07-25T01:20:52Z,2022-04-17T21:42:39Z,,MEMBER,,,,"`__del__` is sometimes called after modules have been deallocated, which results in errors printed to stderr when Python exits. This manifests itself in the following bug:
https://github.com/shoyer/h5netcdf/issues/50

Per https://github.com/shoyer/h5netcdf/issues/50#issuecomment-572191867, the right solution is probably to use `weakref.finalize`.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4267/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
469440752,MDU6SXNzdWU0Njk0NDA3NTI=,3139,"Change the signature of DataArray to DataArray(data, dims, coords, ...)?",1217238,open,0,,,1,2019-07-17T20:54:57Z,2022-04-09T15:28:51Z,,MEMBER,,,,"Currently, the signature of DataArray is `DataArray(data, coords, dims, ...)`: http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html

In the long term, I think `DataArray(data, dims, coords, ...)` would be more intuitive: dimensions are a more fundamental part of xarray's data model than coordinates. Certainly I find it much more common to omit `coords` than to omit `dims` when I create a `DataArray`.

My original reasoning for this argument order was that `dims` could be copied from `coords`, e.g., `DataArray(new_data, old_dataarray.coords)`, and it was nice to be able to pass this sole argument by position instead of by name. But a cleaner way to write this now is `old_dataarray.copy(data=new_data)`.

The challenge in making any change here would be to have a smooth deprecation process, and that ideally avoids requiring users to rewrite all of their code and avoids loads of pointless/extraneous warnings. I'm not entirely sure this is possible. We could likely use heuristics to distinguish between `dims` and `coords` arguments regardless of their order, but this probably isn't something we would want to preserve in the long term.

An alternative that might achieve some of the convenience of this change would be to allow for passing lists of strings in the `coords` argument by position, which are interpreted as dimensions, e.g., `DataArray(data, ['x', 'y'])`. The downside of this alternative is that it would add even more special cases to the `DataArray` constructor , which would make it harder to understand.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3139/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
327166000,MDExOlB1bGxSZXF1ZXN0MTkxMDMwMjA4,2195,WIP: explicit indexes,1217238,closed,0,,,3,2018-05-29T04:25:15Z,2022-03-21T14:59:52Z,2022-03-21T14:59:52Z,MEMBER,,0,pydata/xarray/pulls/2195,"Some utility functions that should be useful for https://github.com/pydata/xarray/issues/1603

Still very much a work in progress -- it would be great if someone has time to finish writing any of these in another PR!","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2195/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
864249974,MDU6SXNzdWU4NjQyNDk5NzQ=,5202,Make creating a MultiIndex in stack optional,1217238,closed,0,,,7,2021-04-21T20:21:03Z,2022-03-17T17:11:42Z,2022-03-17T17:11:42Z,MEMBER,,,,"As @Hoeze notes in https://github.com/pydata/xarray/issues/5179, calling `stack()` can be ""incredibly slow and memory-demanding, since it creates a MultiIndex of every possible coordinate in the array.""

This is true with how `stack()` works currently, but I'm not sure this is necessary. I suspect it's a vestigial design choice from copying pandas, back from before Xarray had optional indexes. One benefit is that it's convenient for making `unstack()` the inverse of `stack()`, but isn't always required.

Regardless of how we define the semantics for boolean indexing (https://github.com/pydata/xarray/issues/1887), it seems like it could be a good idea to allow stack to skip creating a MultiIndex for the new dimension, via a new keyword argument such as `ds.stack(index=False)`. This would be equivalent to calling `reset_index()` after `stack()` but would be cheaper because the MultiIndex is never created in the first place.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5202/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
237008177,MDU6SXNzdWUyMzcwMDgxNzc=,1460,groupby should still squeeze for non-monotonic inputs,1217238,open,0,,,5,2017-06-19T20:05:14Z,2022-03-04T21:31:41Z,,MEMBER,,,,"We can simply use `argsort()` to determine `group_indices` instead of `np.arange()`:
https://github.com/pydata/xarray/blob/22ff955d53e253071f6e4fa849e5291d0005282a/xarray/core/groupby.py#L256","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1460/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
58117200,MDU6SXNzdWU1ODExNzIwMA==,324,Support multi-dimensional grouped operations and group_over,1217238,open,0,,741199,12,2015-02-18T19:42:20Z,2022-02-28T19:03:17Z,,MEMBER,,,,"Multi-dimensional grouped operations should be relatively straightforward -- the main complexity will be writing an N-dimensional concat that doesn't involve repetitively copying data.

The idea with `group_over` would be to support groupby operations that act on a single element from each of the given groups, rather than the unique values. For example, `ds.group_over(['lat', 'lon'])` would let you iterate over or apply to 2D slices of `ds`, no matter how many dimensions it has.

Roughly speaking (it's a little more complex for the case of non-dimension variables), `ds.group_over(dims)` would get translated into `ds.groupby([d for d in ds.dims if d not in dims])`. 

Related: #266
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/324/reactions"", ""total_count"": 18, ""+1"": 18, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1090700695,I_kwDOAMm_X85BAsWX,6125,[Bug]: HTML repr does not display well in notebooks hosted on GitHub,1217238,open,0,,,0,2021-12-29T19:05:49Z,2021-12-29T19:36:25Z,,MEMBER,,,,"### What happened?

We see _both_ the raw text *and* a malformed version of the HTML (without CSS formatting).

Example (https://github.com/microsoft/PlanetaryComputerExamples/blob/main/quickstarts/reading-zarr-data.ipynb):
![image](https://user-images.githubusercontent.com/1217238/147695209-127feae1-7dd2-48b9-9626-f0c8eb3815eb.png)

### What did you expect to happen?

Either:

1. Ideally, we only see the HTML repr, with CSS formatting applied.
2. Or, if that isn't possible, we should figure out how to only show the raw text.

nbviewer [gets this right](https://nbviewer.org/github/microsoft/PlanetaryComputerExamples/blob/main/quickstarts/reading-zarr-data.ipynb):
![image](https://user-images.githubusercontent.com/1217238/147695174-eebcefff-f99a-4391-b9c1-13ccf77f36ba.png)


### Minimal Complete Verifiable Example

_No response_

### Relevant log output

_No response_

### Anything else we need to know?

_No response_

### Environment

NA","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6125/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1062709354,PR_kwDOAMm_X84u-sO9,6025,Simplify missing value handling in xarray.corr,1217238,closed,0,,,1,2021-11-24T17:48:03Z,2021-11-28T04:39:22Z,2021-11-28T04:39:22Z,MEMBER,,0,pydata/xarray/pulls/6025,"This PR simplifies the fix from https://github.com/pydata/xarray/pull/5731, specifically for the benefit of xarray.corr. There is no need to use `map_blocks` instead of using `where` directly.

It is a basically an alternative version of https://github.com/pydata/xarray/pull/5284. It is potentially slightly less efficient to do this masking step when unnecessary, but I doubt this makes a noticeable performance difference in practice (and I doubt this optimization is useful insdie `map_blocks`, anyways).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6025/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1044151556,PR_kwDOAMm_X84uELYB,5935,Docs: fix URL for PTSA,1217238,closed,0,,,1,2021-11-03T21:56:44Z,2021-11-05T09:36:04Z,2021-11-05T09:36:04Z,MEMBER,,0,pydata/xarray/pulls/5935,One of the PTSA authors told me about the new URL by email.,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5935/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
874292512,MDU6SXNzdWU4NzQyOTI1MTI=,5251,Switch default for Zarr reading/writing to consolidated=True?,1217238,closed,0,,,4,2021-05-03T06:59:42Z,2021-08-30T15:21:11Z,2021-08-30T15:21:11Z,MEMBER,,,,"Consolidated metadata was a new feature in Zarr v2.3, which was released over two year ago (March 22, 2019).

Since then, I have used `consolidated=True` _every_ time I've written or opened a Zarr store. As far as I can tell, this is almost always a good idea:
- With local storage, it usually doesn't really matter. You spend a bit of time writing the consolidated metadata and have one extra file on disk, but the overhead is typically negligible.
- With Cloud object stores or network filesystems, it can matter quite a large amount. Without consolidated metadata, these systems can be unusably slow for opening datasets. Cloud storage is of course the main use-case for Zarr. If you're using a local disk, you might as well stick with single files such as netCDF.

I wonder if consolidated metadata is mature enough now that we could consider switching the default behavior in Xarray. From my perspective, this is a big ""gotcha"" for getting good performance with Zarr. More than one of my colleagues has been unimpressed with the performance of Zarr until they learned to set `consolidated=True`.

I would suggest doing this in way is almost entirely backwards compatible, with only a minor performance costs for reading non-consolidated datasets:
- `to_zarr()` switches the default to `consolidated=True`. The `consolidate_metadata()` will thus happen by default.
- `open_zarr()` switches the default to `consolidated=None`, which means ""Try reading consolidated metadata, and fall-back to non-consolidated if that fails."" This will be slightly slower for non-consolidated metadata due to the extra file-lookup, but given that opening with non-consolidated metadata already requires a moderately large number of file look-ups, I doubt anyone will notice the difference.

CC @rabernat ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5251/reactions"", ""total_count"": 11, ""+1"": 11, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
928402742,MDU6SXNzdWU5Mjg0MDI3NDI=,5516,Rename master branch -> main,1217238,closed,0,,,4,2021-06-23T15:45:57Z,2021-07-23T21:58:39Z,2021-07-23T21:58:39Z,MEMBER,,,,"This is a best practice for inclusive projects.

See https://github.com/github/renaming for guidance.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5516/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
948890466,MDExOlB1bGxSZXF1ZXN0NjkzNjY1NDEy,5624,Make typing-extensions optional,1217238,closed,0,,,6,2021-07-20T17:43:22Z,2021-07-22T23:30:49Z,2021-07-22T23:02:03Z,MEMBER,,0,pydata/xarray/pulls/5624,"Type checking may be a little worse if typing-extensions are not installed, but I don't think it's worth the trouble of adding another hard dependency just for one use for TypeGuard.

Note: sadly this doesn't work yet. Mypy (and pylance) don't like the type alias defined with try/except. Any ideas? In the worst case, we could revert the TypeGuard entirely, but that would be a shame...

- [x] Closes #5495
- [x] Passes `pre-commit run --all-files`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5624/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
890534794,MDU6SXNzdWU4OTA1MzQ3OTQ=,5295,"Engine is no longer inferred for filenames not ending in "".nc""",1217238,closed,0,,,0,2021-05-12T22:28:46Z,2021-07-15T14:57:54Z,2021-05-14T22:40:14Z,MEMBER,,,,"This works with xarray=0.17.0:
```python
import xarray
xarray.Dataset({'x': [1, 2, 3]}).to_netcdf('tmp')
xarray.open_dataset('tmp')
```

On xarray 0.18.0, it fails:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-20e128a730aa> in <module>()
      2 
      3 xarray.Dataset({'x': [1, 2, 3]}).to_netcdf('tmp')
----> 4 xarray.open_dataset('tmp')

/usr/local/lib/python3.7/dist-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    483 
    484     if engine is None:
--> 485         engine = plugins.guess_engine(filename_or_obj)
    486 
    487     backend = plugins.get_backend(engine)

/usr/local/lib/python3.7/dist-packages/xarray/backends/plugins.py in guess_engine(store_spec)
    110             warnings.warn(f""{engine!r} fails while guessing"", RuntimeWarning)
    111 
--> 112     raise ValueError(""cannot guess the engine, try passing one explicitly"")
    113 
    114 

ValueError: cannot guess the engine, try passing one explicitly
```

I'm not entirely sure what changed. My guess is that we used to fall-back to trying to use SciPy, but don't do that anymore. A potential fix would be reading strings as filenames in `xarray.backends.utils.read_magic_number`.

Related: https://github.com/pydata/xarray/issues/5291","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5295/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
252707680,MDU6SXNzdWUyNTI3MDc2ODA=,1525,Consider setting name=False in Variable.chunk(),1217238,open,0,,,4,2017-08-24T19:34:28Z,2021-07-13T01:50:16Z,,MEMBER,,,,"@mrocklin writes:

>  The following will be slower:
```
b = (a.chunk(...) + 1) + (a.chunk(...) + 1)
```
> In current operation this will be optimized to
```
tmp = a.chunk(...) + 1
b = tmp + tmp
```
> So you'll lose that, but I suspect that in your case chunking the same dataset many times is somewhat rare.

See here for discussion: https://github.com/pydata/xarray/pull/1517#issuecomment-324722153

Whether this is worth doing really depends on on what people would find most useful -- and what is the most intuitive behavior.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1525/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
254888879,MDU6SXNzdWUyNTQ4ODg4Nzk=,1552,Flow chart for choosing indexing operations,1217238,open,0,,,2,2017-09-03T17:33:30Z,2021-07-11T22:26:17Z,,MEMBER,,,,"We have a lot of indexing operations, even though `sel_points` and `isel_points` are about to be deprecated (#1473).

A flow chart / decision tree to help users pick the right indexing operation might be helpful (e.g., like [this skimage FlowChart](http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html)). It would ask various questions (e.g., do you have labels or integer positions? do you want to select or impose coordinates?) and then suggest appropriate the indexer methods.

cc @fujiisoup ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1552/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
891281614,MDU6SXNzdWU4OTEyODE2MTQ=,5302,Suggesting specific IO backends to install when open_dataset() fails,1217238,closed,0,,,3,2021-05-13T18:45:28Z,2021-06-23T08:18:07Z,2021-06-23T08:18:07Z,MEMBER,,,,"Currently, Xarray's internal backends don't get registered unless the necessary dependencies are installed:
https://github.com/pydata/xarray/blob/1305d9b624723b86050ca5b2d854e5326bbaa8e6/xarray/backends/netCDF4_.py#L567-L568

In order to facilitating suggesting a specific backend to install (e.g., to improve error messages from opening tutorial datasets https://github.com/pydata/xarray/issues/5291), I would suggest that Xarray _always_ registers its own backend entrypoints. Then we make the following changes to the plugin protocol:

- `guess_can_open()` should work _regardless_ of whether the underlying backend is installed
- `installed()` returns a boolean reporting whether backend is installed. The default method in the base class would return `True`, for backwards compatibility.
- `open_dataset()` of course should error if the backend is not installed.

This will let us leverage the existing `guess_can_open()` functionality to suggest specific optional dependencies to install. E.g., if you supply a netCDF3 file: `Xarray cannot find a matching installed backend for this file in the installed backends [""h5netcdf""]. Consider installing one of the following backends which reports a match: [""scipy"", ""netcdf4""]`

Does this reasonable and worthwhile?

CC @aurghs @alexamici ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5302/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
874331538,MDExOlB1bGxSZXF1ZXN0NjI4OTE0NDQz,5252,"Add mode=""r+"" for to_zarr and use consolidated writes/reads by default",1217238,closed,0,,,14,2021-05-03T07:57:16Z,2021-06-22T06:51:35Z,2021-06-17T17:19:26Z,MEMBER,,0,pydata/xarray/pulls/5252,"`mode=""r+""` only allows for modifying pre-existing array values in a
Zarr store. This makes it a safer default `mode` when doing a limited
`region` write. It also offers a nice performance bonus when using
consolidated metadata, because the store to modify can be opened in
""consolidated"" mode -- rather than painfully slow non-consolidated
mode.

This PR includes several related changes to `to_zarr()`:

1. It adds support for the new `mode=""r+""`.
2. `consolidated=True` in `to_zarr()` now means ""open in consolidated
   mode"" if using using `mode=""r+""`, instead of ""write in consolidated
   mode"" (which would not make sense for r+).
3. It allows setting `consolidated=True` when using `region`, mostly
   for the sake of fast store opening with r+.
4. Validation in `to_zarr()` has been reorganized to always use the
   _existing_ Zarr group, rather than re-opening zar stores from
   scratch, which could require additional network requests.
5. Incidentally, I've renamed the `ZarrStore.ds` attribute to
   `ZarrStore.zarr_group`, which is a much more descriptive name.

These changes gave me a ~5x boost in write performance in a large
parallel job making use of `to_zarr` with `region`.

<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Tests added
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5252/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
340733448,MDU6SXNzdWUzNDA3MzM0NDg=,2283,Exact alignment should allow missing dimension coordinates,1217238,open,0,,,2,2018-07-12T17:40:24Z,2021-06-15T09:52:29Z,,MEMBER,,,,"#### Code Sample, a copy-pastable example if possible

```python
import xarray as xr
xr.align(xr.DataArray([1, 2, 3], dims='x'),
         xr.DataArray([1, 2, 3], dims='x', coords=[[0, 1, 2]]),
         join='exact')
```
#### Problem description

This currently results in an error, but a missing index of size 3 does not actually conflict:
```python-traceback
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-1d63d3512fb6> in <module>()
      1 xr.align(xr.DataArray([1, 2, 3], dims='x'),
      2          xr.DataArray([1, 2, 3], dims='x', coords=[[0, 1, 2]]),
----> 3          join='exact')

/usr/local/lib/python3.6/dist-packages/xarray/core/alignment.py in align(*objects, **kwargs)
    129                     raise ValueError(
    130                         'indexes along dimension {!r} are not equal'
--> 131                         .format(dim))
    132                 index = joiner(matching_indexes)
    133                 joined_indexes[dim] = index

ValueError: indexes along dimension 'x' are not equal
```

This surfaced as an issue on StackOverflow: https://stackoverflow.com/questions/51308962/computing-matrix-vector-multiplication-for-each-time-point-in-two-dataarrays

#### Expected Output

Both output arrays should end up with the `x` coordinate from the input that has it, like the output of the above expression if `join='inner'`:
```
(<xarray.DataArray (x: 3)>
 array([1, 2, 3])
 Coordinates:
   * x        (x) int64 0 1 2, <xarray.DataArray (x: 3)>
 array([1, 2, 3])
 Coordinates:
   * x        (x) int64 0 1 2)
```

#### Output of ``xr.show_versions()``

<details>
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.33+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.7
pandas: 0.22.0
numpy: 1.14.5
scipy: 0.19.1
netCDF4: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.1.2
cartopy: None
seaborn: 0.7.1
setuptools: 39.1.0
pip: 10.0.1
conda: None
pytest: None
IPython: 5.5.0
sphinx: None
</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2283/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
842438533,MDU6SXNzdWU4NDI0Mzg1MzM=,5082,Move encoding from xarray.Variable to duck arrays?,1217238,open,0,,,2,2021-03-27T07:21:55Z,2021-06-13T01:34:00Z,,MEMBER,,,,"The `encoding` property on `Variable` has always been an awkward part of Xarray's API, and an example of poor separation of concerns. It add conceptual overhead to all uses of `xarray.Variable`, but exists only for the (somewhat niche) benefit of Xarray's backend IO functionality. This is particularly problematic if we consider the possible separation of `xarray.Variable` into a separate package to remove the pandas dependency (https://github.com/pydata/xarray/issues/3981).

I think a cleaner way to handle `encoding` would be to move it from `Variable` onto array objects, specifically duck array objects that Xarray creates when loading data from disk. As long as these duck arrays don't ""propagate"" themselves under array operations but rather turn into raw numpy arrays (or whatever is wrapped), this would automatically resolve all issues around propagating `encoding` attributes (e.g., https://github.com/pydata/xarray/pull/5065, https://github.com/pydata/xarray/issues/1614). And users who don't care about `encoding` because they don't use Xarray's IO functionality would never need to think about it.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5082/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
416554477,MDU6SXNzdWU0MTY1NTQ0Nzc=,2797,Stalebot is being overly aggressive,1217238,closed,0,,,7,2019-03-03T19:37:37Z,2021-06-03T21:31:46Z,2021-06-03T21:22:48Z,MEMBER,,,,"E.g., see https://github.com/pydata/xarray/issues/1151 where stalebot closed an issue even after another comment.

Is this something we need to reconfigure or just a bug?

cc @pydata/xarray ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2797/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
276241764,MDU6SXNzdWUyNzYyNDE3NjQ=,1739,Utility to restore original dimension order after apply_ufunc,1217238,open,0,,,11,2017-11-23T00:47:57Z,2021-05-29T07:39:33Z,,MEMBER,,,,"This seems to be coming up quite a bit for wrapping functions that apply an operation along an axis, e.g., for `interpolate` in #1640 or `rank` in #1733.

We should either write a utility function to do this or consider adding an option to `apply_ufunc`.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1739/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
901047466,MDU6SXNzdWU5MDEwNDc0NjY=,5372,Consider revising the _repr_inline_ protocol,1217238,open,0,,,0,2021-05-25T16:18:31Z,2021-05-25T16:18:31Z,,MEMBER,,,,"`_repr_inline_` looks like an [IPython special method](https://ipython.readthedocs.io/en/stable/config/integrating.html#rich-display) but is actually includes some xarray specific details: the result should not include `shape` or `dtype`.

As I wrote in https://github.com/pydata/xarray/pull/5352, I would suggest revising it in one of two ways:

1. Giving it a name like `_xarray_repr_inline_` to make it clearer that it's Xarray specific
2. Include some more generic way of indicating that `shape`/`dtype` is redundant, e.g,. call it like `obj._repr_ndarray_inline_(dtype=False, shape=False)`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5372/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
891253662,MDExOlB1bGxSZXF1ZXN0NjQ0MTQ5Mzc2,5300,Better error message when no backend engine is found.,1217238,closed,0,,,4,2021-05-13T18:10:04Z,2021-05-18T21:23:00Z,2021-05-18T21:23:00Z,MEMBER,,0,pydata/xarray/pulls/5300,"Also includes a better error message when loading a tutorial dataset but an underlying IO dependency is not found.

<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Fixes #5291
- [x] Tests added
- [x] Passes `pre-commit run --all-files`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5300/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
890573049,MDExOlB1bGxSZXF1ZXN0NjQzNTc1Mjc5,5296,More robust guess_can_open for netCDF4/scipy/h5netcdf entrypoints,1217238,closed,0,,,1,2021-05-12T23:53:32Z,2021-05-14T22:40:14Z,2021-05-14T22:40:14Z,MEMBER,,0,pydata/xarray/pulls/5296,"The new version checks magic numbers in files on disk, not just already open file objects.

I've also added a bunch of unit-tests.

Fixes GH5295

<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Closes #5295
- [x] Tests added
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5296/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
46049691,MDU6SXNzdWU0NjA0OTY5MQ==,255,Add Dataset.to_pandas() method,1217238,closed,0,,987654,2,2014-10-17T00:01:36Z,2021-05-04T13:56:00Z,2021-05-04T13:56:00Z,MEMBER,,,,"This would be the complement of the DataArray constructor, converting an xray.DataArray into a 1D series, 2D DataFrame or 3D panel, whichever is appropriate.

`to_pandas` would also makes sense for Dataset, if it could convert 0d datasets to series, e.g., `pd.Series({k: v.item() for k, v in ds.items()})` (there is currently no direct way to do this), and revert to to_dataframe for higher dimensional input.
- [x] DataArray method
- [ ] Dataset method
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/255/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
294241734,MDU6SXNzdWUyOTQyNDE3MzQ=,1887,Boolean indexing with multi-dimensional key arrays,1217238,open,0,,,13,2018-02-04T23:28:45Z,2021-04-22T21:06:47Z,,MEMBER,,,,"Originally from https://github.com/pydata/xarray/issues/974

For _boolean indexing_:
- `da[key]` where `key` is a boolean labelled array (with _any_ number of dimensions) is made equivalent to `da.where(key.reindex_like(ds), drop=True)`. This matches the existing behavior if `key` is a 1D boolean array. For multi-dimensional arrays, even though the result is now multi-dimensional, this coupled with automatic skipping of NaNs means that `da[key].mean()` gives the same result as in NumPy.
- `da[key] = value` where `key` is a boolean labelled array can be made equivalent to `da = da.where(*align(key.reindex_like(da), value.reindex_like(da)))` (that is, the three argument form of `where`).
- `da[key_0, ..., key_n]` where all of `key_i` are boolean arrays gets handled in the usual way. It is an `IndexingError` to supply multiple labelled keys if any of them are not already aligned with as the corresponding index coordinates (and share the same dimension name). If they want alignment, we suggest users simply write `da[key_0 & ... & key_n]`.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1887/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
346822633,MDU6SXNzdWUzNDY4MjI2MzM=,2336,test_88_character_filename_segmentation_fault should not try to write to the current working directory,1217238,closed,0,,,2,2018-08-02T01:06:41Z,2021-04-20T23:38:53Z,2021-04-20T23:38:53Z,MEMBER,,,,"This files in cases where the current working directory does not support writes, e.g., as seen here
```
    def test_88_character_filename_segmentation_fault(self):
        # should be fixed in netcdf4 v1.3.1
        with mock.patch('netCDF4.__version__', '1.2.4'):
            with warnings.catch_warnings():
                message = ('A segmentation fault may occur when the '
                           'file path has exactly 88 characters')
                warnings.filterwarnings('error', message)
                with pytest.raises(Warning):
                    # Need to construct 88 character filepath
>                   xr.Dataset().to_netcdf('a' * (88 - len(os.getcwd()) - 1))

tests/test_backends.py:1234: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
core/dataset.py:1150: in to_netcdf
    compute=compute)
backends/api.py:715: in to_netcdf
    autoclose=autoclose, lock=lock)
backends/netCDF4_.py:332: in open
    ds = opener()
backends/netCDF4_.py:231: in _open_netcdf4_group
    ds = nc4.Dataset(filename, mode=mode, **kwargs)
third_party/py/netCDF4/_netCDF4.pyx:2111: in netCDF4._netCDF4.Dataset.__init__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   IOError: [Errno 13] Permission denied
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2336/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
843996137,MDU6SXNzdWU4NDM5OTYxMzc=,5092,Concurrent loading of coordinate arrays from Zarr,1217238,open,0,,,0,2021-03-30T02:19:50Z,2021-04-19T02:43:31Z,,MEMBER,,,,"When you open a dataset with Zarr, xarray loads coordinate arrays corresponding to indexes in serial. This can be slow (multiple seconds) even with only a handful of such arrays if they are stored in a remote filesystem (e.g., cloud object stores). This is similar to the use-cases for [consolidated metadata](https://zarr.readthedocs.io/en/latest/tutorial.html#consolidating-metadata).

In principle, we could speed up loading datasets from Zarr into Xarray significantly by reading the data corresponding to these arrays in parallel (e.g., in multiple threads).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5092/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
621082480,MDU6SXNzdWU2MjEwODI0ODA=,4080,Most arguments to open_dataset should be keyword only,1217238,closed,0,,,1,2020-05-19T15:38:51Z,2021-03-16T10:56:09Z,2021-03-16T10:56:09Z,MEMBER,,,,"`open_dataset` has a long list of arguments: `xarray.open_dataset(filename_or_obj, group=None, decode_cf=True, mask_and_scale=None, decode_times=True, autoclose=None, concat_characters=True, decode_coords=True, engine=None, chunks=None, lock=None, cache=None, drop_variables=None, backend_kwargs=None, use_cftime=None)`

Similarly to the case for pandas (https://github.com/pandas-dev/pandas/issues/27544), it would be nice to make most of these arguments keyword-only, e.g., `def open_dataset(filename_or_obj, group, *, ...)`. For consistency, this would also apply to `open_dataarray`, `decode_cf`, `open_mfdataset`, etc.

This would encourage writing readable code when calling `open_dataset()` and would allow us to use better organization when adding new arguments (e.g., `decode_timedelta` in https://github.com/pydata/xarray/pull/4071).

To make this change, we could make use of the `deprecate_nonkeyword_arguments` decorator from https://github.com/pandas-dev/pandas/pull/27573","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4080/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
645062817,MDExOlB1bGxSZXF1ZXN0NDM5NTg4OTU1,4178,Fix min_deps_check; revert to support numpy=1.14 and pandas=0.24,1217238,closed,0,,,5,2020-06-25T00:37:19Z,2021-02-27T21:46:43Z,2021-02-27T21:46:42Z,MEMBER,,1,pydata/xarray/pulls/4178,"Fixes the issue noticed in:
https://github.com/pydata/xarray/pull/4175#issuecomment-649135372

Let's see if this passes CI...

<!-- Feel free to remove check-list items aren't relevant to your change -->

 - [x] Passes `isort -rc . && black . && mypy . && flake8`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4178/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
645154872,MDU6SXNzdWU2NDUxNTQ4NzI=,4179,Consider revising our minimum dependency version policy,1217238,closed,0,,,7,2020-06-25T05:04:38Z,2021-02-22T05:02:25Z,2021-02-22T05:02:25Z,MEMBER,,,,"Our [current policy](http://xarray.pydata.org/en/stable/installing.html#minimum-dependency-versions) is that xarray supports ""the minor version (X.Y) initially published no more than N months ago"" where N is:

- Python: 42 months (NEP 29)
- numpy: 24 months (NEP 29)
- pandas: 12 months
- scipy: 12 months
- sparse, pint and other libraries that rely on NEP-18 for integration: very latest available versions only,
- all other libraries: 6 months

I think this policy is too aggressive, particularly for pandas, SciPy and other libraries. Some of these projects can go 6+ months between minor releases. For example, version 2.3 of zarr is currently more than 6 months old. So if zarr released 2.4 *today* and xarray issued a new release *tomorrow*, and then our policy would dictate that we could ask users to upgrade to the new version.

In https://github.com/pydata/xarray/pull/4178, I misinterpreted our policy as supporting ""the most recent minor version (X.Y) initially published more than N months ago"". This version makes a bit more sense to me: users only need to upgrade dependencies at least every N months to use the latest xarray release.

I understand that NEP-29 chose its language intentionally, so that distributors know ahead of time when they can drop support for a Python or NumPy version. But this seems like a (very) poor fit for projects without regular releases. At the very least we should adjust the specific time windows.

I'll see if I can gain some understanding of the motivation for this particular language over on the NumPy tracker...","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4179/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
267927402,MDU6SXNzdWUyNjc5Mjc0MDI=,1652,Resolve warnings issued in the xarray test suite,1217238,closed,0,,,10,2017-10-24T07:36:55Z,2021-02-21T23:06:35Z,2021-02-21T23:06:34Z,MEMBER,,,,"82 warnings are currently issued in the process of running our test suite:
https://gist.github.com/shoyer/db0b2c82efd76b254453216e957c4345

Some of can probably be safely ignored, but others are likely noticed by users, e.g.,
https://stackoverflow.com/questions/41130138/why-is-invalid-value-encountered-in-greater-warning-thrown-in-python-xarray-fo/41147570#41147570

It would be nice to clean up all of these, either by catching the appropriate upstream warning (if irrelevant) or changing our usage to avoid the warning. There may very well be a lurking FutureWarning in there somewhere that could cause issues when another library updates.

Probably the easiest way to get started here is to get the test suite running locally, and use `py.test -W error` to turn all warnings into errors.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1652/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
777327298,MDU6SXNzdWU3NzczMjcyOTg=,4749,Option for combine_attrs with conflicting values silently dropped,1217238,closed,0,,,0,2021-01-01T18:04:49Z,2021-02-10T19:50:17Z,2021-02-10T19:50:17Z,MEMBER,,,,"`merge()` currently supports four options for merging `attrs`:
```
    combine_attrs : {""drop"", ""identical"", ""no_conflicts"", ""override""}, \
                    default: ""drop""
        String indicating how to combine attrs of the objects being merged:
        - ""drop"": empty attrs on returned Dataset.
        - ""identical"": all attrs must be the same on every object.
        - ""no_conflicts"": attrs from all objects are combined, any that have
          the same name must also have the same value.
        - ""override"": skip comparing and copy attrs from the first dataset to
          the result.
```

It would be nice to have an option to combine attrs from all objects like ""no_conflicts"", but that drops attributes with conflicting values rather than raising an error. We might call this `combine_attrs=""drop_conflicts""` or `combine_attrs=""matching""`.

This is similar to how xarray currently handles conflicting values for `DataArray.name` and would be more suitable to consider for the default behavior of `merge` and other functions/methods that merge coordinates (e.g., apply_ufunc, concat, where, binary arithmetic).

cc @keewis ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4749/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
264098632,MDU6SXNzdWUyNjQwOTg2MzI=,1618,apply_raw() for a simpler version of apply_ufunc(),1217238,open,0,,,4,2017-10-10T04:51:38Z,2021-01-01T17:14:43Z,,MEMBER,,,,"`apply_raw()` would work like `apply_ufunc()`, but without the hard to understand broadcasting behavior and core dimensions.

The rule for `apply_raw()` would be that it directly unwraps its arguments and passes them on to the wrapped function, without any broadcasting. We would also include a `dim` argument that is automatically converted into the appropriate `axis` argument when calling the wrapped function.

Output dimensions would be determined from a simple rule of some sort:
- Default output dimensions would either be copied from the first argument, or would take on the ordered union on all input dimensions.
- Custom dimensions could either be set by adding a `drop_dims` argument (like `dask.array.map_blocks`), or require an explicit override `output_dims`.

This also could be suitable for defining as a method instead of a separate function. See https://github.com/pydata/xarray/issues/1251 and https://github.com/pydata/xarray/issues/1130 for related issues.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1618/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
269700511,MDU6SXNzdWUyNjk3MDA1MTE=,1672,Append along an unlimited dimension to an existing netCDF file,1217238,open,0,,,8,2017-10-30T18:09:54Z,2020-11-29T17:35:04Z,,MEMBER,,,,"This would be a nice feature to have for some use cases, e.g., for writing simulation time-steps:
https://stackoverflow.com/questions/46951981/create-and-write-xarray-dataarray-to-netcdf-in-chunks

It should be relatively straightforward to add, too, building on support for writing files with unlimited dimensions. User facing API would probably be a new keyword argument to `to_netcdf()`, e.g., `extend='time'` to indicate the extended dimension.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1672/reactions"", ""total_count"": 21, ""+1"": 21, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
314444743,MDU6SXNzdWUzMTQ0NDQ3NDM=,2059,How should xarray serialize bytes/unicode strings across Python/netCDF versions?,1217238,open,0,,,5,2018-04-15T19:36:55Z,2020-11-19T10:08:16Z,,MEMBER,,,,"# netCDF string types

We have several options for storing strings in netCDF files:
- `NC_CHAR`: netCDF's legacy character type. The closest match is NumPy `'S1'` dtype. In principle, it's supposed to be able to store arbitrary bytes. On HDF5, it uses an UTF-8 encoded string with a fixed-size of 1 (but note that HDF5 does not complain about storing arbitrary bytes).
- `NC_STRING`: netCDF's newer variable length string type. It's only available on netCDF4 (not netCDF3). It corresponds to an HDF5 variable-length string with UTF-8 encoding.
- `NC_CHAR` with an `_Encoding` attribute: xarray and netCDF4-Python support an ad-hoc convention for storing unicode strings in `NC_CHAR` data-types, by adding an attribute `{'_Encoding': 'UTF-8'}`. The data is still stored as fixed width strings, but xarray (and netCDF4-Python) can decode them as unicode.

`NC_STRING` would seem like a clear win in cases where it's supported, but as @crusaderky points out in https://github.com/pydata/xarray/issues/2040, it actually results in much larger netCDF files in many cases than using character arrays, which are more easily compressed. Nonetheless, we currently default to storing unicode strings in `NC_STRING`, because it's the most portable option -- every tool that handles HDF5 and netCDF4 should be able to read it properly as unicode strings.

# NumPy/Python string types

On the Python side, our options are perhaps even more confusing:
- NumPy's `dtype=np.string_` corresponds to fixed-length bytes. This is the default dtype for strings on Python 2, because on Python 2 strings are the same as bytes.
- NumPy's `dtype=np.unicode_` corresponds to fixed-length unicode. This is the default dtype for strings on Python 3, because on Python 3 strings are the same as unicode.
- Strings are also commonly stored in numpy arrays with `dtype=np.object_`, as arrays of either `bytes` or `unicode` objects. This is a pragmatic choice, because otherwise NumPy has no support for variable length strings. We also use this (like pandas) to mark missing values with `np.nan`.

Like pandas, we are pretty liberal with converting back and forth between fixed-length (`np.string`/`np.unicode_`) and variable-length (object dtype) representations of strings as necessary. This works pretty well, though converting from object arrays in particular has downsides, since it cannot be done lazily with dask.

# Current behavior of xarray

Currently, xarray uses the same behavior on Python 2/3. The priority was faithfully round-tripping data from a particular version of Python to netCDF and back, which the current serialization behavior achieves:

| Python version | NetCDF version | NumPy datatype | NetCDF datatype |
| --------- | ---------- | -------------- | ------------ |
| Python 2 | NETCDF3 | np.string_ / str | NC_CHAR |
| Python 2 | NETCDF4 | np.string_ / str | NC_CHAR |
| Python 3 | NETCDF3 | np.string_ / bytes | NC_CHAR |
| Python 3 | NETCDF4 | np.string_ / bytes | NC_CHAR |
| Python 2 | NETCDF3 | np.unicode_ / unicode | NC_CHAR with UTF-8 encoding |
| Python 2 | NETCDF4 | np.unicode_ / unicode | NC_STRING |
| Python 3 | NETCDF3 | np.unicode_ / str | NC_CHAR with UTF-8 encoding |
| Python 3 | NETCDF4 | np.unicode_ / str | NC_STRING |
| Python 2 | NETCDF3 | object bytes/str | NC_CHAR |
| Python 2 | NETCDF4 | object bytes/str | NC_CHAR |
| Python 3 | NETCDF3 | object bytes | NC_CHAR |
| Python 3 | NETCDF4 | object bytes | NC_CHAR |
| Python 2 | NETCDF3 | object unicode | NC_CHAR with UTF-8 encoding |
| Python 2 | NETCDF4 | object unicode | NC_STRING |
| Python 3 | NETCDF3 | object unicode/str | NC_CHAR with UTF-8 encoding |
| Python 3 | NETCDF4 | object unicode/str | NC_STRING |

This can also be selected explicitly for most data-types by setting dtype in encoding:
- `'S1'` for NC_CHAR (with or without encoding)
- `str` for NC_STRING (though I'm not 100% sure it works properly currently when given bytes)

Script for generating table:
<details>

```python
from __future__ import print_function
import xarray as xr
import uuid
import netCDF4
import numpy as np
import sys

for dtype_name, value in [
    ('np.string_ / ' + type(b'').__name__, np.array([b'abc'])),
    ('np.unicode_ / ' + type(u'').__name__, np.array([u'abc'])),
    ('object bytes/' + type(b'').__name__, np.array([b'abc'], dtype=object)),
    ('object unicode/' + type(u'').__name__, np.array([u'abc'], dtype=object)),
]:
    for format in ['NETCDF3_64BIT', 'NETCDF4']:
        filename = str(uuid.uuid4()) + '.nc'
        xr.Dataset({'data': value}).to_netcdf(filename, format=format)
        with netCDF4.Dataset(filename) as f:
            var = f.variables['data']
            disk_dtype = var.dtype
            has_encoding = hasattr(var, '_Encoding')
        disk_dtype_name = (('NC_CHAR' if disk_dtype == 'S1' else 'NC_STRING') +
                           (' with UTF-8 encoding' if has_encoding else ''))
        print('|', 'Python %i' % sys.version_info[0],
              '|', format[:7],
              '|', dtype_name,
              '|', disk_dtype_name,
              '|')
```
</details>

# Potential alternatives

The main option I'm considering is switching to default to `NC_CHAR` with UTF-8 encoding for np.string_ / str and object bytes/str on Python 2. The current behavior could be explicitly toggled by setting an encoding of `{'_Encoding': None}`.

This would imply two changes:
1. Attempting to serialize arbitrary bytes (on Python 2) would start raising an error -- anything that isn't ASCII would require explicitly disabling `_Encoding`.
2. Strings read back from disk on Python 2 would come back as unicode instead of bytes.

This implicit conversion would be consistent with Python 2's general handling of bytes/unicode, and facilitate reading netCDF files on Python 3 that were written with Python 2.

The counter-argument is that it may not be worth changing this at this late point, given that we will be sunsetting Python 2 support by year's end.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2059/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
613012939,MDExOlB1bGxSZXF1ZXN0NDEzODQ3NzU0,4035,Support parallel writes to regions of zarr stores,1217238,closed,0,,,17,2020-05-06T02:40:19Z,2020-11-04T06:19:01Z,2020-11-04T06:19:01Z,MEMBER,,0,pydata/xarray/pulls/4035,"This PR adds support for a `region` keyword argument to `to_zarr()`, to support parallel writes to different parts of arrays in a zarr stores, e.g., `ds.to_zarr(..., region={'x': slice(1000, 2000)})` to write a dataset over the range `1000:2000` along the `x` dimension.

This is useful for creating large Zarr datasets _without_ requiring dask. For example, the separate workers in a simulation job might each write a single non-overlapping chunk of a Zarr file. The standard way to handle such datasets today is to first write netCDF files in each process, and then consolidate them afterwards with dask (see #3096).

### Creating empty Zarr stores

In order to do so, the Zarr file must be pre-existing with desired variables in the right shapes/chunks. It is desirable to be able to create such stores without actually writing data, because datasets that we want to write in parallel may be very large.

In the example below, I achieve this filling a `Dataset` with dask arrays, and passing `compute=False` to `to_zarr()`. This works, but it relies on an undocumented implementation detail of the `compute` argument. We should either:

1. Officially document that the `compute` argument only controls writing array values, not metadata (at least for zarr).
2. Add a new keyword argument or entire new method for creating an unfilled Zarr store, e.g., `write_values=False`.

I think (1) is maybe the cleanest option (no extra API endpoints).

### Unchunked variables

One potential gotcha concerns coordinate arrays that are not chunked, e.g., consider parallel writing of a dataset divided along time with 2D `latitude` and `longitude` arrays that are fixed over all chunks. With the current PR, such coordinate arrays would get rewritten by each separate writer.

If a Zarr store does not have atomic writes, then conceivably this could result in corrupted data. The default DirectoryStore has atomic writes and cloud based object stores should also be atomic, so perhaps this doesn't matter in practice, but at the very least it's inefficient and could cause issues for large-scale jobs due to resource contention.

Options include:

1. Current behavior. Variables whose dimensions do not overlap with `region` are written by `to_zarr()`. *This is likely the most intuitive behavior for writing from a single process at a time.*
2. Exclude variables whose dimensions do not overlap with `region` from being written. This is likely the most convenient behavior for writing from multiple processes at once.
3. Like (2), but issue a warning  if any such variables exist instead of silently dropping them.
4. Like (2), but raise an error instead of a warning. Require the user to explicitly drop them with `.drop()`. This is probably the safest behavior.

I think (4) would be my preferred option. Some users would undoubtedly find this annoying, but the power-users for whom we are adding this feature would likely appreciate it.

### Usage example

```python
import xarray
import dask.array as da

ds = xarray.Dataset({'u': (('x',), da.arange(1000, chunks=100))})

# create the new zarr store, but don't write data
path = 'my-data.zarr'
ds.to_zarr(path, compute=False)

# look at the unwritten data
ds_opened = xarray.open_zarr(path)
print('Data before writing:', ds_opened.u.data[::100].compute())
# Data before writing: [  1 100   1 100 100   1   1   1   1   1]

# write out each slice (could be in separate processes)
for start in range(0, 1000, 100):
  selection = {'x': slice(start, start + 100)}
  ds.isel(selection).to_zarr(path, region=selection)

print('Data after writing:', ds_opened.u.data[::100].compute())
# Data after writing: [  0 100 200 300 400 500 600 700 800 900]
```

 - [x] Closes https://github.com/pydata/xarray/issues/3096
 - [x] Integration test
 - [x] Unit tests
 - [x] Passes `isort -rc . && black . && mypy . && flake8`
 - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4035/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
124809636,MDU6SXNzdWUxMjQ4MDk2MzY=,703,Document xray internals / advanced API,1217238,closed,0,,,2,2016-01-04T18:12:30Z,2020-11-03T17:33:32Z,2020-11-03T17:33:32Z,MEMBER,,,,"It would be useful to document the internal `Variable` class and the internal structure of `Dataset` and `DataArray`. This would be helpful for both new contributors and expert users, who might find `Variable` helpful as an advanced API.

I had some notes in an earlier version of the docs that could be adapted. Note, however, that the internal structure of `DataArray` changed in #648:
http://xray.readthedocs.org/en/v0.2/tutorial.html#notes-on-xray-s-internals
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/703/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
715374721,MDU6SXNzdWU3MTUzNzQ3MjE=,4490,Group together decoding options into a single argument,1217238,open,0,,,6,2020-10-06T06:15:18Z,2020-10-29T04:07:46Z,,MEMBER,,,,"**Is your feature request related to a problem? Please describe.**

`open_dataset()` currently has a _very_ long function signature. This makes it hard to keep track of everything it can do, and is particularly problematic for the authors of _new_ backends (e.g., see https://github.com/pydata/xarray/pull/4477), which might need to know how to handle all these arguments.

**Describe the solution you'd like**

To simple the interface, I propose to group together all the decoding options into a new `DecodingOptions` class. I'm thinking something like:
```python
from dataclasses import dataclass, field, asdict
from typing import Optional, List

@dataclass(frozen=True)
class DecodingOptions:
    mask: Optional[bool] = None
    scale: Optional[bool] = None
    datetime: Optional[bool] = None
    timedelta: Optional[bool] = None
    use_cftime: Optional[bool] = None
    concat_characters: Optional[bool] = None
    coords: Optional[bool] = None
    drop_variables: Optional[List[str]] = None

    @classmethods
    def disabled(cls):
        return cls(mask=False, scale=False, datetime=False, timedelta=False,
                  concat_characters=False, coords=False)

    def non_defaults(self):
        return {k: v for k, v in asdict(self).items() if v is not None}

    # add another method for creating default Variable Coder() objects,
    # e.g., those listed in encode_cf_variable()
```

The signature of `open_dataset` would then become:
```python
def open_dataset(
    filename_or_obj,
    group=None,
    *
    engine=None,
    chunks=None,
    lock=None,
    cache=None,
    backend_kwargs=None,
    decode: Union[DecodingOptions, bool] = None, 
    **deprecated_kwargs
):
    if decode is None:
        decode = DecodingOptions()
    if decode is False:
        decode = DecodingOptions.disabled()
    # handle deprecated_kwargs...
    ...
```

**Question**: are `decode` and `DecodingOptions` the right names? Maybe these should still include the name ""CF"", e.g., `decode_cf` and `CFDecodingOptions`, given that these are specific to CF conventions?

**Note**: the current signature is `open_dataset(filename_or_obj, group=None, decode_cf=True, mask_and_scale=None, decode_times=True, autoclose=None, concat_characters=True, decode_coords=True, engine=None, chunks=None, lock=None, cache=None, drop_variables=None, backend_kwargs=None, use_cftime=None, decode_timedelta=None)`

Usage with the new interface would look like `xr.open_dataset(filename, decode=False)` or `xr.open_dataset(filename, decode=xr.DecodingOptions(mask=False, scale=False))`.

This requires a _little_ bit more typing than what we currently have, but it has a few advantages:

1. It's easier to understand the role of different arguments. Now there is a function with ~8 arguments and a class with ~8 arguments rather than a function with ~15 arguments.
2. It's easier to add new decoding arguments (e.g., for more advanced CF conventions), because they don't clutter the `open_dataset` interface. For example, I separated out `mask` and `scale` arguments, versus the current `mask_and_scale` argument.
3. If a new backend plugin for `open_dataset()` needs to handle every option supported by `open_dataset()`, this makes that task significantly easier. The only decoding options they need to worry about are _non-default_ options that were explicitly set, i.e., those exposed by the `non_defaults()` method. If another decoding option wasn't explicitly set and isn't recognized by the backend, they can just ignore it.

**Describe alternatives you've considered**

For the overall approach:

1. We could keep the current design, with separate keyword arguments for decoding options, and just be very careful about passing around these arguments. This seems pretty painful for the backend refactor, though.
2. We could keep the current design only for the user facing `open_dataset()` interface, and then internally convert into the `DecodingOptions()` struct for passing to backend constructors. This would provide much needed flexibility for backend authors, but most users wouldn't benefit from the new interface. Perhaps this would make sense as an intermediate step?
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4490/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
718492237,MDExOlB1bGxSZXF1ZXN0NTAwODc5MTY3,4500,Add variable/attribute names to netCDF validation errors,1217238,closed,0,,,1,2020-10-10T00:47:18Z,2020-10-10T05:28:08Z,2020-10-10T05:28:08Z,MEMBER,,0,pydata/xarray/pulls/4500,"This should result in a better user experience, e.g., specifically
pointing out the attribute with an invalid value.

<!-- Feel free to remove check-list items aren't relevant to your change -->

 - [x] Tests added
 - [x] Passes `isort . && black . && mypy . && flake8`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4500/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
169274464,MDU6SXNzdWUxNjkyNzQ0NjQ=,939,Consider how to deal with the proliferation of decoder options on open_dataset,1217238,closed,0,,,8,2016-08-04T01:57:26Z,2020-10-06T15:39:11Z,2020-10-06T15:39:11Z,MEMBER,,,,"There are already lots of keyword arguments, and users want even more! (#843)

Maybe we should use some sort of object to encapsulate desired options?
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/939/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
253107677,MDU6SXNzdWUyNTMxMDc2Nzc=,1527,"Binary operations with ds.groupby('time.dayofyear') errors out, but ds.groupby('time.month') works",1217238,open,0,,,10,2017-08-26T16:54:53Z,2020-09-29T10:05:42Z,,MEMBER,,,,"Reported on the mailing list:

Original datasets:
```
>>> ds_xr
<xarray.DataArray (time: 12775)>
array([-0.01, -0.01, -0.01, ..., -0.27, -0.27, -0.27])
Coordinates:
  * time     (time) datetime64[ns] 1979-01-01 1979-01-02 1979-01-03 ...

>>> slope_itcp_ds
<xarray.Dataset>
Dimensions:                        (lat: 73, level: 2, lon: 144, time: 366)
Coordinates:
  * lon                            (lon) float32 0.0 2.5 5.0 7.5 10.0 12.5 ...
  * lat                            (lat) float32 90.0 87.5 85.0 82.5 80.0 ...
  * level                          (level) float64 0.0 1.0
  * time                           (time) datetime64[ns] 2010-01-01 ...
Data variables:
    __xarray_dataarray_variable__  (time, level, lat, lon) float64 -0.8795 ...
Attributes:
    CDI:          Climate Data Interface version 1.7.1 (http://mpimet.mpg.de/...
    Conventions:  CF-1.4
    history:      Fri Aug 25 18:55:50 2017: cdo -inttime,2010-01-01,00:00:00,...
    CDO:          Climate Data Operators version 1.7.1 (http://mpimet.mpg.de/...
```

Issue:
Grouping by month works and outputs this:
```
>>> ds_xr.groupby('time.month') - slope_itcp_ds.groupby('time.month').mean('time')
<xarray.Dataset>
Dimensions:                        (lat: 73, level: 2, lon: 144, time: 12775)
Coordinates:
  * lon                            (lon) float32 0.0 2.5 5.0 7.5 10.0 12.5 ...
  * lat                            (lat) float32 90.0 87.5 85.0 82.5 80.0 ...
  * level                          (level) float64 0.0 1.0
    month                          (time) int64 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
  * time                           (time) datetime64[ns] 1979-01-01 ...
Data variables:
    __xarray_dataarray_variable__  (time, level, lat, lon) float64 1.015 ...
```

Grouping by dayofyear doesn't work and gives this traceback:
```
>>> ds_xr.groupby('time.dayofyear') - slope_itcp_ds.groupby('time.dayofyear').mean('time')
KeyError                                  Traceback (most recent call last)
<ipython-input-10-01c0cf4c980a> in <module>()
----> 1 ds_xr.groupby('time.dayofyear') - slope_itcp_ds.groupby('time.dayofyear').mean('time')

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/groupby.py in func(self, other)
    316             g = f if not reflexive else lambda x, y: f(y, x)
    317             applied = self._yield_binary_applied(g, other)
--> 318             combined = self._combine(applied)
    319             return combined
    320         return func

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/groupby.py in _combine(self, applied, shortcut)
    532             combined = self._concat_shortcut(applied, dim, positions)
    533         else:
--> 534             combined = concat(applied, dim)
    535             combined = _maybe_reorder(combined, dim, positions)
    536 

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in concat(objs, dim, data_vars, coords, compat, positions, indexers, mode, concat_over)
    118         raise TypeError('can only concatenate xarray Dataset and DataArray '
    119                         'objects, got %s' % type(first_obj))
--> 120     return f(objs, dim, data_vars, coords, compat, positions)
    121 
    122 

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions)
    210     datasets = align(*datasets, join='outer', copy=False, exclude=[dim])
    211 
--> 212     concat_over = _calc_concat_over(datasets, dim, data_vars, coords)
    213 
    214     def insert_result_variable(k, v):

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in _calc_concat_over(datasets, dim, data_vars, coords)
    190                            if dim in v.dims)
    191     concat_over.update(process_subset_opt(data_vars, 'data_vars'))
--> 192     concat_over.update(process_subset_opt(coords, 'coords'))
    193     if dim in datasets[0]:
    194         concat_over.add(dim)

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in process_subset_opt(opt, subset)
    165                                for ds in datasets[1:])
    166                 # all nonindexes that are not the same in each dataset
--> 167                 concat_new = set(k for k in getattr(datasets[0], subset)
    168                                  if k not in concat_over and differs(k))
    169             elif opt == 'all':

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in <genexpr>(.0)
    166                 # all nonindexes that are not the same in each dataset
    167                 concat_new = set(k for k in getattr(datasets[0], subset)
--> 168                                  if k not in concat_over and differs(k))
    169             elif opt == 'all':
    170                 concat_new = (set(getattr(datasets[0], subset)) -

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in differs(vname)
    163                     v = datasets[0].variables[vname]
    164                     return any(not ds.variables[vname].equals(v)
--> 165                                for ds in datasets[1:])
    166                 # all nonindexes that are not the same in each dataset
    167                 concat_new = set(k for k in getattr(datasets[0], subset)

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in <genexpr>(.0)
    163                     v = datasets[0].variables[vname]
    164                     return any(not ds.variables[vname].equals(v)
--> 165                                for ds in datasets[1:])
    166                 # all nonindexes that are not the same in each dataset
    167                 concat_new = set(k for k in getattr(datasets[0], subset)

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/utils.py in __getitem__(self, key)
    288 
    289     def __getitem__(self, key):
--> 290         return self.mapping[key]
    291 
    292     def __iter__(self):

KeyError: 'lon'
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1527/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
644821435,MDU6SXNzdWU2NDQ4MjE0MzU=,4176,Pre-expand data and attributes in DataArray/Variable HTML repr?,1217238,closed,0,,,7,2020-06-24T18:22:35Z,2020-09-21T20:10:26Z,2020-06-28T17:03:40Z,MEMBER,,,,"## Proposal

Given that a major purpose for plotting an array is to look at data or attributes, I wonder if we should expand these sections by default?
- I worry that clicking on icons to expand sections may not be easy to discover
- This would also be consistent with the text repr, which shows these sections by default (the Dataset repr is already consistent by default between text and HTML already)

## Context

Currently the HTML repr for DataArray/Variable looks like this:
![image](https://user-images.githubusercontent.com/1217238/85610183-9e014400-b60b-11ea-8be1-5f9196126acd.png)

To see array data, you have to click on the ![image](https://user-images.githubusercontent.com/1217238/85610286-b7a28b80-b60b-11ea-9496-a4f9d9b048ac.png) icon:
![image](https://user-images.githubusercontent.com/1217238/85610262-b1acaa80-b60b-11ea-9621-17f0bcffb885.png)

(thanks to @max-sixty for making this a little bit more manageably sized in https://github.com/pydata/xarray/pull/3905!)

There's also a really nice repr for nested dask arrays:
![image](https://user-images.githubusercontent.com/1217238/85610598-fcc6bd80-b60b-11ea-8b1a-5cf950449dcb.png)

","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4176/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
702372014,MDExOlB1bGxSZXF1ZXN0NDg3NjYxMzIz,4426,Fix for h5py deepcopy issues,1217238,closed,0,,,6,2020-09-16T01:11:00Z,2020-09-18T22:31:13Z,2020-09-18T22:31:09Z,MEMBER,,0,pydata/xarray/pulls/4426,"<!-- Feel free to remove check-list items aren't relevant to your change -->

 - [x] Closes #4425
 - [x] Tests added
 - [x] Passes `isort . && black . && mypy . && flake8`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4426/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
669307837,MDExOlB1bGxSZXF1ZXN0NDU5Njk1NDA5,4292,Fix indexing with datetime64[ns] with pandas=1.1,1217238,closed,0,,,11,2020-07-31T00:48:50Z,2020-09-16T03:11:48Z,2020-09-16T01:33:30Z,MEMBER,,0,pydata/xarray/pulls/4292,"Fixes #4283

The underlying issue is that calling `.item()` on a NumPy array with
`dtype=datetime64[ns]` returns an _integer_, rather than an `np.datetime64`
scalar. This is somewhat baffling but works this way because `.item()`
returns native Python types, but `datetime.datetime` doesn't support
nanosecond precision.

`pandas.Index.get_loc` used to support these integers, but now is more strict.
Hence we get errors.

We can fix this by using `array[()]` to convert 0d arrays into NumPy scalars
instead of calling `array.item()`.

I've added a crude regression test. There may well be a better way to test this
but I haven't figured it out yet.

<!-- Feel free to remove check-list items aren't relevant to your change -->

 - [x] Tests added
 - [x] Passes `isort . && black . && mypy . && flake8`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4292/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
417542619,MDU6SXNzdWU0MTc1NDI2MTk=,2803,Test failure with TestValidateAttrs.test_validating_attrs,1217238,closed,0,,,6,2019-03-05T23:03:02Z,2020-08-25T14:29:19Z,2019-03-14T15:59:13Z,MEMBER,,,,"This is due to setting multi-dimensional attributes being an error, as of the latest netCDF4-Python release: https://github.com/Unidata/netcdf4-python/blob/master/Changelog

E.g., as seen on Appveyor: https://ci.appveyor.com/project/shoyer/xray/builds/22834250/job/9q0ip6i3cchlbkw2
```
================================== FAILURES ===================================
___________________ TestValidateAttrs.test_validating_attrs ___________________
self = <xarray.tests.test_backends.TestValidateAttrs object at 0x00000096BE5FAFD0>
    def test_validating_attrs(self):
        def new_dataset():
            return Dataset({'data': ('y', np.arange(10.0))},
                           {'y': np.arange(10)})
    
        def new_dataset_and_dataset_attrs():
            ds = new_dataset()
            return ds, ds.attrs
    
        def new_dataset_and_data_attrs():
            ds = new_dataset()
            return ds, ds.data.attrs
    
        def new_dataset_and_coord_attrs():
            ds = new_dataset()
            return ds, ds.coords['y'].attrs
    
        for new_dataset_and_attrs in [new_dataset_and_dataset_attrs,
                                      new_dataset_and_data_attrs,
                                      new_dataset_and_coord_attrs]:
            ds, attrs = new_dataset_and_attrs()
    
            attrs[123] = 'test'
            with raises_regex(TypeError, 'Invalid name for attr'):
                ds.to_netcdf('test.nc')
    
            ds, attrs = new_dataset_and_attrs()
            attrs[MiscObject()] = 'test'
            with raises_regex(TypeError, 'Invalid name for attr'):
                ds.to_netcdf('test.nc')
    
            ds, attrs = new_dataset_and_attrs()
            attrs[''] = 'test'
            with raises_regex(ValueError, 'Invalid name for attr'):
                ds.to_netcdf('test.nc')
    
            # This one should work
            ds, attrs = new_dataset_and_attrs()
            attrs['test'] = 'test'
            with create_tmp_file() as tmp_file:
                ds.to_netcdf(tmp_file)
    
            ds, attrs = new_dataset_and_attrs()
            attrs['test'] = {'a': 5}
            with raises_regex(TypeError, 'Invalid value for attr'):
                ds.to_netcdf('test.nc')
    
            ds, attrs = new_dataset_and_attrs()
            attrs['test'] = MiscObject()
            with raises_regex(TypeError, 'Invalid value for attr'):
                ds.to_netcdf('test.nc')
    
            ds, attrs = new_dataset_and_attrs()
            attrs['test'] = 5
            with create_tmp_file() as tmp_file:
                ds.to_netcdf(tmp_file)
    
            ds, attrs = new_dataset_and_attrs()
            attrs['test'] = 3.14
            with create_tmp_file() as tmp_file:
                ds.to_netcdf(tmp_file)
    
            ds, attrs = new_dataset_and_attrs()
            attrs['test'] = [1, 2, 3, 4]
            with create_tmp_file() as tmp_file:
                ds.to_netcdf(tmp_file)
    
            ds, attrs = new_dataset_and_attrs()
            attrs['test'] = (1.9, 2.5)
            with create_tmp_file() as tmp_file:
                ds.to_netcdf(tmp_file)
    
            ds, attrs = new_dataset_and_attrs()
            attrs['test'] = np.arange(5)
            with create_tmp_file() as tmp_file:
                ds.to_netcdf(tmp_file)
    
            ds, attrs = new_dataset_and_attrs()
            attrs['test'] = np.arange(12).reshape(3, 4)
            with create_tmp_file() as tmp_file:
>               ds.to_netcdf(tmp_file)
xarray\tests\test_backends.py:3450: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
xarray\core\dataset.py:1323: in to_netcdf
    compute=compute)
xarray\backends\api.py:767: in to_netcdf
    unlimited_dims=unlimited_dims)
xarray\backends\api.py:810: in dump_to_store
    unlimited_dims=unlimited_dims)
xarray\backends\common.py:262: in store
    self.set_attributes(attributes)
xarray\backends\common.py:278: in set_attributes
    self.set_attribute(k, v)
xarray\backends\netCDF4_.py:418: in set_attribute
    _set_nc_attribute(self.ds, key, value)
xarray\backends\netCDF4_.py:294: in _set_nc_attribute
    obj.setncattr(key, value)
netCDF4\_netCDF4.pyx:2781: in netCDF4._netCDF4.Dataset.setncattr
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>   ???
E   ValueError: multi-dimensional array attributes not supported
netCDF4\_netCDF4.pyx:1514: ValueError
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2803/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
676306518,MDU6SXNzdWU2NzYzMDY1MTg=,4331,Support explicitly setting a dimension order with to_dataframe(),1217238,closed,0,,,0,2020-08-10T17:45:17Z,2020-08-14T18:28:26Z,2020-08-14T18:28:26Z,MEMBER,,,,"As discussed in https://github.com/pydata/xarray/issues/2346, it would be nice to support explicitly setting the desired order of dimensions when calling `Dataset.to_dataframe()` or `DataArray.to_dataframe()`.

There is nice precedent for this in the `to_dask_dataframe` method:
http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_dask_dataframe.html

I imagine we could copy the exact same API for `to_dataframe.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4331/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
671019427,MDU6SXNzdWU2NzEwMTk0Mjc=,4295,We shouldn't require a recent version of setuptools to install xarray,1217238,closed,0,,,33,2020-08-01T16:49:57Z,2020-08-14T09:52:42Z,2020-08-14T09:52:42Z,MEMBER,,,,"@canol reports on our mailing that our setuptools 41.2 (released 21 August 2019) install requirement is making it hard to install recent versions of xarray at his company:
https://groups.google.com/g/xarray/c/HS_xcZDEEtA/m/GGmW-3eMCAAJ

> Hello, this is just a feedback about an issue we experienced which caused our internal tools stack to stay with xarray 0.15 version instead of a newer versions.
>
> We are a company using xarray in our internal frameworks and at the beginning we didn't have any restrictions on xarray version in our requirements file, so that new installations of our framework were using the latest version of xarray. But a few months ago we started to hear complaints from users who were having problems with installing our framework and the installation was failing because of xarray's requirement to use at least setuptools 41.2 which is released on 21th of August last year. So it hasn't been a year since it got released which might be considered relatively new.
>
> During the installation of our framework, pip was failing to update setuptools by saying that some other process is already using setuptools files so it cannot update setuptools. The people who are using our framework are not software developers so they didn't know how to solve this problem and it became so overwhelming for us maintainers that we set the xarray requirement to version >=0.15 <0.16. We also share our internal framework with customers of our company so we didn't want to bother the customers with any potential problems.
>
> You can see some other people having having similar problem when trying to update setuptools here (although not related to xarray): https://stackoverflow.com/questions/49338652/pip-install-u-setuptools-fail-windows-10
> 
> It is not a big deal but I just wanted to give this as a feedback. I don't know how much xarray depends on setuptools' 41.2 version.


I was surprised to see this in our `setup.cfg` file, added by @crusaderky in #3628. The version requirement is not documented in our docs.

Given that setuptools may be challenging to upgrade, would it be possible to relax this version requirement?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4295/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
638597800,MDExOlB1bGxSZXF1ZXN0NDM0MzMxNzQ3,4154,Update issue templates inspired/based on dask,1217238,closed,0,,,1,2020-06-15T07:00:53Z,2020-08-05T13:05:33Z,2020-06-17T16:50:57Z,MEMBER,,0,pydata/xarray/pulls/4154,See https://github.com/dask/dask/issues/new/choose for an approximate example of what this looks like.,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4154/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
290593053,MDU6SXNzdWUyOTA1OTMwNTM=,1850,xarray contrib module,1217238,closed,0,,,25,2018-01-22T19:50:08Z,2020-07-23T16:34:10Z,2020-07-23T16:34:10Z,MEMBER,,,,"Over in #1288 @nbren12 wrote:

> Overall, I think the xarray community could really benefit from some kind of centralized contrib package which has a low barrier to entry for these kinds of functions.

Yes, I agree that we should explore this. There are a lot of interesting projects building on xarray now but not great ways to discover them.

Are there other open source projects with a good model we should copy here?
- Scikit-Learn has a separate GitHub org/repositories for contrib projects: https://github.com/scikit-learn-contrib.
- TensorFlow has a contrib module within the TensorFlow namespace: `tensorflow.contrib`

This gives us two different models to consider. The first ""separate repository"" model might be easier/flexible from a maintenance perspective. Any preferences/thoughts?

There's also some nice overlap with the [Pangeo project](https://pangeo-data.github.io/).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1850/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
646073396,MDExOlB1bGxSZXF1ZXN0NDQwNDMxNjk5,4184,Improve the speed of from_dataframe with a MultiIndex (by 40x!),1217238,closed,0,,,1,2020-06-26T07:39:14Z,2020-07-02T20:39:02Z,2020-07-02T20:39:02Z,MEMBER,,0,pydata/xarray/pulls/4184,"Before:

    pandas.MultiIndexSeries.time_to_xarray
    ======= ========= ==========
    --             subset
    ------- --------------------
    dtype     True     False
    ======= ========= ==========
      int    505±0ms   37.1±0ms
     float   485±0ms   38.3±0ms
    ======= ========= ==========

After:

    pandas.MultiIndexSeries.time_to_xarray
    ======= ============ ==========
    --               subset
    ------- -----------------------
    dtype      True       False
    ======= ============ ==========
      int    10.7±0.4ms   22.6±1ms
     float   10.0±0.8ms   21.1±1ms
    ======= ============ ==========

~~There are still some cases where we have to fall back to the existing
slow implementation, but hopefully they should now be relatively rare.~~ Edit: now we always use the new implementation


<!-- Feel free to remove check-list items aren't relevant to your change -->

 - [x] Closes #2459, closes #4186 
 - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
 - [x] Passes `isort -rc . && black . && mypy . && flake8`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4184/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 1, ""eyes"": 0}",,,13221727,pull
645961347,MDExOlB1bGxSZXF1ZXN0NDQwMzQ2NTQz,4182,Show data by default in HTML repr for DataArray,1217238,closed,0,,,0,2020-06-26T02:25:08Z,2020-06-28T17:03:41Z,2020-06-28T17:03:41Z,MEMBER,,0,pydata/xarray/pulls/4182,"<!-- Feel free to remove check-list items aren't relevant to your change -->

 - [x] Closes #4176
 - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4182/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
644170008,MDExOlB1bGxSZXF1ZXN0NDM4ODQxMjk2,4171,Remove <pre> from nested HTML repr,1217238,closed,0,,,0,2020-06-23T21:51:14Z,2020-06-24T15:45:20Z,2020-06-24T15:45:00Z,MEMBER,,0,pydata/xarray/pulls/4171,"Using `<pre>` messes up the display of nested HTML reprs, e.g., from dask. Now we only use the `<pre>` tag when displaying raw text reprs.

Before (Jupyter notebook):
![image](https://user-images.githubusercontent.com/1217238/85467844-8faa1e00-b560-11ea-8565-b22105ca603a.png)

After:
![image](https://user-images.githubusercontent.com/1217238/85467860-946ed200-b560-11ea-90ed-79ea6505e07f.png)

 - [x] Tests added
 - [x] Passes `isort -rc . && black . && mypy . && flake8`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4171/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
613546626,MDExOlB1bGxSZXF1ZXN0NDE0MjgwMDEz,4039,Revise pull request template,1217238,closed,0,,,5,2020-05-06T19:08:19Z,2020-06-18T05:45:11Z,2020-06-18T05:45:10Z,MEMBER,,0,pydata/xarray/pulls/4039,"See below for the new language, to clarify that documentation is only necessary
for ""user visible changes.""

I added ""including notable bug fixes"" to indicate that minor bug fixes may not
be worth noting (I was thinking of test-suite only fixes in this category) but
perhaps that is too confusing.

cc @pydata/xarray for opinions!

<!-- Feel free to remove check-list items aren't relevant to your change -->

 - [ ] Closes #xxxx
 - [ ] Tests added
 - [ ] Passes `isort -rc . && black . && mypy . && flake8`
 - [ ] Fully documented, including `whats-new.rst` for user visible changes
       (including notable bug fixes) and `api.rst` for new API
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4039/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
639334065,MDExOlB1bGxSZXF1ZXN0NDM0OTQ0NTc4,4159,Test RTD's new pull request builder,1217238,closed,0,,,1,2020-06-16T03:06:32Z,2020-06-17T16:54:02Z,2020-06-17T16:54:02Z,MEMBER,,1,pydata/xarray/pulls/4159,"https://docs.readthedocs.io/en/latest/guides/autobuild-docs-for-pull-requests.html

<!-- Feel free to remove check-list items aren't relevant to your change -->

Don't merge this!","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4159/reactions"", ""total_count"": 3, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 3, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
639397110,MDExOlB1bGxSZXF1ZXN0NDM0OTk1NzQz,4160,Fix failing upstream-dev build & remove docs build,1217238,closed,0,,,0,2020-06-16T06:08:55Z,2020-06-16T06:35:49Z,2020-06-16T06:35:44Z,MEMBER,,0,pydata/xarray/pulls/4160,"Instead, we'll use RTD's new doc builder instead. For an example, click on
""docs/readthedocs.org:xray"" below or look at GH4159

<!-- Feel free to remove check-list items aren't relevant to your change -->

 - [x] Closes https://github.com/pydata/xarray/issues/4146
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4160/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
35682274,MDU6SXNzdWUzNTY4MjI3NA==,158,groupby should work with name=None,1217238,closed,0,,,2,2014-06-13T15:38:00Z,2020-05-30T13:15:56Z,2020-05-30T13:15:56Z,MEMBER,,,,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/158/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
612214951,MDExOlB1bGxSZXF1ZXN0NDEzMjIyOTEx,4028,Remove broken test for Panel with to_pandas(),1217238,closed,0,,,5,2020-05-04T22:41:42Z,2020-05-06T01:50:21Z,2020-05-06T01:50:21Z,MEMBER,,0,pydata/xarray/pulls/4028,"We don't support creating a Panel with to_pandas() with *any* version of
pandas at present, so this test was previous broken if pandas < 0.25 was
installed.

<!-- Feel free to remove check-list items aren't relevant to your change -->
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4028/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
612772669,MDU6SXNzdWU2MTI3NzI2Njk=,4030,Doc build on Azure is timing out on master,1217238,closed,0,,,1,2020-05-05T17:30:16Z,2020-05-05T21:49:26Z,2020-05-05T21:49:26Z,MEMBER,,,,"I don't know what's going on, but it currently times out after 1 hour:
https://dev.azure.com/xarray/xarray/_build/results?buildId=2767&view=logs&j=7e620c85-24a8-5ffa-8b1f-642bc9b1fc36&t=68484831-0a19-5145-bfe9-6309e5f7691d

Is it possible to login to Azure to debug this stuff?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4030/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
612838635,MDExOlB1bGxSZXF1ZXN0NDEzNzA3Mzgy,4032,Allow warning with cartopy in docs plotting build,1217238,closed,0,,,1,2020-05-05T19:25:11Z,2020-05-05T21:49:26Z,2020-05-05T21:49:26Z,MEMBER,,0,pydata/xarray/pulls/4032,"Fixes https://github.com/pydata/xarray/issues/4030

It looks like this is triggered by the new cartopy version now being installed
on RTD (version 0.17.0 -> 0.18.0).

Long term we should fix this, but for now it's better just to disable the
warning.

Here's the message from RTD:
```
Exception occurred:
  File ""/home/docs/checkouts/readthedocs.org/user_builds/xray/conda/latest/lib/python3.8/site-packages/IPython/sphinxext/ipython_directive.py"", line 586, in process_input
    raise RuntimeError('Non Expected warning in `{}` line {}'.format(filename, lineno))
RuntimeError: Non Expected warning in `/home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/latest/doc/plotting.rst` line 732
The full traceback has been saved in /tmp/sphinx-err-qav6jjmm.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!

>>>-------------------------------------------------------------------------
Warning in /home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/latest/doc/plotting.rst at block ending on line 732
Specify :okwarning: as an option in the ipython:: block to suppress this message
----------------------------------------------------------------------------
/home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/latest/xarray/plot/facetgrid.py:373: UserWarning: Tight layout not applied. The left and right margins cannot be made large enough to accommodate all axes decorations.
  self.fig.tight_layout()
<<<-------------------------------------------------------------------------
```
https://readthedocs.org/projects/xray/builds/10969146/

<!-- Feel free to remove check-list items aren't relevant to your change -->
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4032/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
612262200,MDExOlB1bGxSZXF1ZXN0NDEzMjYwNTY2,4029,Support overriding existing variables in to_zarr() without appending,1217238,closed,0,,,2,2020-05-05T01:06:40Z,2020-05-05T19:28:02Z,2020-05-05T19:28:02Z,MEMBER,,0,pydata/xarray/pulls/4029,"This is nice for consistency with `to_netcdf`. It should be useful for cases where users want to update values in existing Zarr datasets.

<!-- Feel free to remove check-list items aren't relevant to your change -->

 - [x] Tests added
 - [x] Passes `isort -rc . && black . && mypy . && flake8`
 - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4029/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
187625917,MDExOlB1bGxSZXF1ZXN0OTI1MjQzMjg=,1087,WIP: New DataStore / Encoder / Decoder API for review,1217238,closed,0,,,8,2016-11-07T05:02:04Z,2020-04-17T18:37:45Z,2020-04-17T18:37:45Z,MEMBER,,0,pydata/xarray/pulls/1087,"The goal here is to make something extensible that we can live with for quite
some time, and to clean up the internals of xarray's backend interface.

Most of these are analogues of existing xarray classes with a cleaned up
interface. I have not yet worried about backwards compatibility or tests -- I
would appreciate feedback on the approach here.

Several parts of the logic exist for the sake of dask. I've included the word
""dask"" in comments to facilitate inspection by mrocklin.

CC @rabernat, @pwolfram, @jhamman, @mrocklin -- for review

CC @mcgibbon, @JoyMonteiro -- this is relevant to our discussion today about
adding support for appending to netCDF files. Don't let this stop you from
getting started on that with the existing interface, though.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1087/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
598567792,MDU6SXNzdWU1OTg1Njc3OTI=,3966,HTML repr is slightly broken in Google Colab,1217238,closed,0,,,1,2020-04-12T20:44:51Z,2020-04-16T20:14:37Z,2020-04-16T20:14:32Z,MEMBER,,,,"The ""data"" toggles are pre-expanded and don't work.

See https://github.com/googlecolab/colabtools/issues/1145 for a full description.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3966/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
479434052,MDU6SXNzdWU0Nzk0MzQwNTI=,3206,DataFrame with MultiIndex -> xarray with sparse array,1217238,closed,0,,,1,2019-08-12T00:46:16Z,2020-04-06T20:41:26Z,2019-08-27T08:54:26Z,MEMBER,,,,"Now that we have preliminary support for [sparse](https://sparse.pydata.org/en/latest/) arrays in xarray, one really cool feature we could explore is creating sparse arrays from MultiIndexed pandas DataFrames.

Right now, xarray's methods for creating objects from pandas always create dense arrays, but the size of these dense arrays can get big really quickly if the MultiIndex is sparsely populated, e.g.,
```python
import pandas as pd
import numpy as np
import xarray
df = pd.DataFrame({
    'w': range(10),
    'x': list('abcdefghij'),
    'y': np.arange(0, 100, 10),
    'z': np.ones(10),
}).set_index(['w', 'x', 'y'])
print(xarray.Dataset.from_dataframe(df))
```
This length 10 DataFrame turned into a dense array with 1000 elements (only 10 of which are not NaN):
```
<xarray.Dataset>
Dimensions:  (w: 10, x: 10, y: 10)
Coordinates:
  * w        (w) int64 0 1 2 3 4 5 6 7 8 9
  * x        (x) object 'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j'
  * y        (y) int64 0 10 20 30 40 50 60 70 80 90
Data variables:
    z        (w, x, y) float64 1.0 nan nan nan nan nan ... nan nan nan nan 1.0
```

We can imagine `xarray.Dataset.from_dataframe(df, sparse=True)` would make the same Dataset, but with sparse array (with a `NaN` fill value) instead of dense arrays.

Once sparse arrays work pretty well, this could actually obviate most of the use cases for `MultiIndex` in arrays. Arguably the model is quite a bit cleaner.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3206/reactions"", ""total_count"": 3, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 3, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
479940669,MDU6SXNzdWU0Nzk5NDA2Njk=,3212,Custom fill_value for from_dataframe/from_series,1217238,open,0,,,0,2019-08-13T03:22:46Z,2020-04-06T20:40:26Z,,MEMBER,,,,"It would be to have the option to customize the fill value when creating an xarray objects from pandas, instead of requiring to always be NaN.

This would probably be especially useful when creating sparse arrays (https://github.com/pydata/xarray/issues/3206), for which it often makes sense to use a fill value of zero. If your data has integer values (e.g., it represents counts), you probably don't want to let it be cast to float first.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3212/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
314482923,MDU6SXNzdWUzMTQ0ODI5MjM=,2061,Backend specific conventions decoding,1217238,open,0,,,1,2018-04-16T02:45:46Z,2020-04-05T23:42:34Z,,MEMBER,,,,"Currently, we have a single function `xarray.decode_cf()` that we apply to data loaded from all xarray backends.

This is appropriate for netCDF data, but it's not appropriate for backends with different implementations. For example, it doesn't work for zarr (which is why we have the separate `open_zarr`), and is also a poor fit for PseudoNetCDF (https://github.com/pydata/xarray/pull/1905). In the worst cases (e.g., for PseudoNetCDF) it can actually result in data being decoded *twice*, which can result in incorrectly scaled data.

Instead, we should declare default decoders as part of the backend API, and use those decoders as the defaults for `open_dataset()`.

This should probably be tackled as part of the broader backends refactor: https://github.com/pydata/xarray/issues/1970
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2061/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
28376794,MDU6SXNzdWUyODM3Njc5NA==,25,Consistent rules for handling merges between variables with different attributes,1217238,closed,0,,,13,2014-02-26T22:37:01Z,2020-04-05T19:13:13Z,2014-09-04T06:50:49Z,MEMBER,,,,"Currently, variable attributes are checked for equality before allowing for a merge via a call to `xarray_equal`. It should be possible to merge datasets even if some of the variable metadata disagrees (conflicting attributes should be dropped). This is already the behavior for global attributes.

The right design of this feature should probably include some optional argument to `Dataset.merge` indicating how strict we want the merge to be. I can see at least three versions that could be useful:
1. Drop conflicting metadata silently.
2. Don't allow for conflicting values, but drop non-matching keys.
3. Require all keys and values to match.

We can argue about which of these should be the default option. My inclination is to be as flexible as possible by using 1 or 2 in most cases.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/25/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
173612265,MDU6SXNzdWUxNzM2MTIyNjU=,988,Hooks for custom attribute handling in xarray operations,1217238,open,0,,,24,2016-08-27T19:48:22Z,2020-04-05T18:19:11Z,,MEMBER,,,,"Over in #964, I am working on a rewrite/unification of the guts of xarray's logic for computation with labelled data. The goal is to get all of xarray's internal logic for working with labelled data going through a minimal set of flexible functions which we can also expose as part of the API.

Because we will finally have all (or at least nearly all) xarray operations using the same code path, I think it will also finally become feasible to open up hooks allowing extensions how xarray handles metadata.

Two obvious use cases here are units (#525) and automatic maintenance of metadata (e.g., [`cell_methods`](https://github.com/pydata/xarray/issues/987#issuecomment-242912131) or [`history`](#826) fields). Both of these are out of scope for xarray itself, mostly because the specific logic tends to be domain specific. This could also subsume options like the existing `keep_attrs` on many operations.

I like the idea of supporting something like NumPy's [`__array_wrap__`](http://docs.scipy.org/doc/numpy-1.11.0/reference/arrays.classes.html#numpy.class.__array_wrap__) to allow third-party code to finalize xarray objects in some way before they are returned. However, it's not obvious to me what the right design is.
- Should we lookup a custom attribute on subclasses like `__array_wrap__` (or `__numpy_ufunc__`) in NumPy, or should we have a system (e.g., unilaterally or with a context manager and `xarray.set_options`) for registering hooks that are then checked on _all_ xarray objects? I am inclined toward the later, even though it's a little slower, just because it will be simpler and easier to get right
- Should these methods be able to control the full result objects, or only set `attrs` and/or `name`?
- To be useful, do we need to allow extensions to take control of the full operation, to support things like automatic unit conversion? This would suggest something closing to `__numpy_ufunc__`, which is a little more ambitious than what I had in mind here.

Feedback would be greatly appreciated.

CC @darothen @rabernat @jhamman @pwolfram
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/988/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
29136905,MDU6SXNzdWUyOTEzNjkwNQ==,60,Implement DataArray.idxmax(),1217238,closed,0,,741199,14,2014-03-10T22:03:06Z,2020-03-29T01:54:25Z,2020-03-29T01:54:25Z,MEMBER,,,,"Should match the pandas function: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.idxmax.html
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/60/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue