id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2161133346,PR_kwDOAMm_X85oSZw7,8797,tokenize() should ignore difference between None and {} attrs,6213168,closed,0,6213168,,1,2024-02-29T12:22:24Z,2024-03-01T11:15:30Z,2024-03-01T03:29:51Z,MEMBER,,0,pydata/xarray/pulls/8797,- Closes #8788,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8797/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
2088095900,PR_kwDOAMm_X85kaiOH,8618,Re-enable mypy checks for parse_dims unit tests,6213168,closed,0,6213168,,1,2024-01-18T11:32:28Z,2024-01-19T15:49:33Z,2024-01-18T15:34:23Z,MEMBER,,0,pydata/xarray/pulls/8618,"As per https://github.com/pydata/xarray/pull/8606#discussion_r1452680454
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8618/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
2079054085,PR_kwDOAMm_X85j77Os,8606,Clean up Dims type annotation,6213168,closed,0,6213168,,1,2024-01-12T15:05:40Z,2024-01-18T18:14:15Z,2024-01-16T10:26:08Z,MEMBER,,0,pydata/xarray/pulls/8606,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8606/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1678587031,I_kwDOAMm_X85kDTSX,7777,xarray minimum versions policy is more aggressive than NEP-29,6213168,closed,0,,,1,2023-04-21T14:06:15Z,2023-05-01T22:26:57Z,2023-05-01T22:26:57Z,MEMBER,,,,"### What is your issue?

In #4179 / #4907, the xarray policy around minimum supported version of dependencies was changed, with the reasoning that the previous policy (based on [NEP-29](https://numpy.org/neps/nep-0029-deprecation_policy.html)) was too aggressive.
Ironically, this caused xarray to drop Python 3.8 on Jan 26th (#7461), 3 months *before* what NEP-29 recommends (Apr 14th).
This is hard to defend - and in fact it sparked discontent (see late comments in #7461).

Regardless of what policy xarray decides to use internally, it should never be more aggressive than NEP-29.
[The xarray documentation](https://docs.xarray.dev/en/stable/getting-started-guide/installing.html#minimum-dependency-versions) is also incorrect, as it states ""Python: 24 months ([NEP-29](https://numpy.org/neps/nep-0029-deprecation_policy.html))"" which is not, in fact, in NEP-29.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7777/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1683335751,PR_kwDOAMm_X85PHLmT,7785,Remove pandas<2 pin,6213168,closed,0,,,1,2023-04-25T14:55:12Z,2023-04-26T17:51:53Z,2023-04-25T15:03:10Z,MEMBER,,0,pydata/xarray/pulls/7785,XREF #7650,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7785/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1140046499,PR_kwDOAMm_X84y7YhY,6282,Remove xfail from tests decorated by @gen_cluster,6213168,closed,0,,,1,2022-02-16T13:47:56Z,2023-04-25T14:53:35Z,2022-02-16T16:32:35Z,MEMBER,,0,pydata/xarray/pulls/6282,``@gen_cluster`` has now been fixed upstream.,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6282/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
309691307,MDU6SXNzdWUzMDk2OTEzMDc=,2028,slice using non-index coordinates,6213168,closed,0,,,21,2018-03-29T09:53:33Z,2023-02-08T19:47:22Z,2022-10-03T10:38:57Z,MEMBER,,,,"It should be relatively straightforward to allow slicing on coordinates that are not backed by an IndexVariable, or in other words coordinates that are on a dimension with a different name, as long as they are 1-dimensional (unsure about the multidimensional case).

E.g. given this array:
```
a = xarray.DataArray(
    [10, 20, 30],
    dims=['country'],
    coords={
        'country': ['US', 'Germany', 'France'],
        'currency': ('country', ['USD', 'EUR', 'EUR'])
    })

<xarray.DataArray (country: 3)>
array([10, 20, 30])
Coordinates:
  * country   (country) <U7 'US' 'Germany' 'France'
    currency  (country) <U3 'USD' 'EUR' 'EUR'
```

This is currently not possible:
```
a.sel(currency='EUR')


ValueError: dimensions or multi-index levels ['currency'] do not exist
```

It should be interpreted as a shorthand for:
```
a.sel(country=a.currency == 'EUR')

<xarray.DataArray (country: 2)>
array([20, 30])
Coordinates:
  * country   (country) <U7 'Germany' 'France'
    currency  (country) <U3 'EUR' 'EUR'
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2028/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
166441031,MDU6SXNzdWUxNjY0NDEwMzE=,907,unstack() treats string coords as objects,6213168,closed,0,,,7,2016-07-19T21:33:28Z,2022-09-27T12:11:36Z,2022-09-27T12:11:35Z,MEMBER,,,,"unstack() should be smart enough to recognise that all labels in a coord are strings, and convert them to numpy strings.
This is particularly relevant e.g. if you want to dump the xarray to netcdf and then read it with a non-python library.

``` python
import xarray

a = xarray.DataArray([[1,2],[3,4]], dims=['x', 'y'], coords={'x': ['x1', 'x2'], 'y': ['y1', 'y2']})
a
```

```
<xarray.DataArray (x: 2, y: 2)>
array([[1, 2],
       [3, 4]])
Coordinates:
  * y        (y) <U2 'y1' 'y2'
  * x        (x) <U2 'x1' 'x2'
```

``` python
a.stack(s=['x', 'y']).unstack('s')
```

```
<xarray.DataArray (x: 2, y: 2)>
array([[1, 2],
       [3, 4]])
Coordinates:
  * x        (x) object 'x1' 'x2'
  * y        (y) object 'y1' 'y2'
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/907/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
264509098,MDU6SXNzdWUyNjQ1MDkwOTg=,1624,Improve documentation and error validation for set_options(arithmetic_join),6213168,closed,0,,,7,2017-10-11T09:05:49Z,2022-06-25T20:01:07Z,2022-06-25T20:01:07Z,MEMBER,,,,"The documentation for set_options laconically says:

```
arithmetic_join: DataArray/Dataset alignment in binary operations. Default: 'inner'.
```

leaving the user wonder what the other options are.
Also, the set_options code does not make any kind of domain check on the possible values.
By scanning the code I gathered that the valid values (and their meanings) should be the same as align(join=...), but I'd like confirmation on that...","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1624/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
309686915,MDU6SXNzdWUzMDk2ODY5MTU=,2027,square-bracket slice a Dataset with a DataArray,6213168,open,0,,,4,2018-03-29T09:39:57Z,2022-04-18T03:51:25Z,,MEMBER,,,,"Given this:
```
ds = xarray.Dataset(
    data_vars={
        'vote': ('pupil', [5, 7, 8]),
        'age': ('pupil', [15, 14, 16])
    },
    coords={
        'pupil': ['Alice', 'Bob', 'Charlie']
    })


<xarray.Dataset>
Dimensions:  (pupil: 3)
Coordinates:
  * pupil    (pupil) <U7 'Alice' 'Bob' 'Charlie'
Data variables:
    vote     (pupil) int64 5 7 8
    age      (pupil) int64 15 14 16
```

Why does this work:
```
ds.age[ds.vote >= 6]

<xarray.DataArray 'age' (pupil: 2)>
array([14, 16])
Coordinates:
  * pupil    (pupil) <U7 'Bob' 'Charlie'
```

But this doesn't?
```
ds[ds.vote >= 6]

KeyError: False
```
``ds.vote >= 6`` is a DataArray with dims=('pupil', ) and dtype=bool, so I can't think of any ambiguity in what I want to achieve?

Workaround:
```
ds.sel(pupil=ds.vote >= 6)


<xarray.Dataset>
Dimensions:  (pupil: 2)
Coordinates:
  * pupil    (pupil) <U7 'Bob' 'Charlie'
Data variables:
    vote     (pupil) int64 7 8
    age      (pupil) int64 14 16
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2027/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
502130982,MDU6SXNzdWU1MDIxMzA5ODI=,3370,Hundreds of Sphinx errors,6213168,closed,0,,,14,2019-10-03T15:17:09Z,2022-04-17T20:33:05Z,2022-04-17T20:33:05Z,MEMBER,,,,"sphinx-build emits a ton of errors that need to be polished out:

https://readthedocs.org/projects/xray/builds/ -> latest -> open last step

Options for the long term:
- Change the ""Docs"" azure pipelines job to crash if there are new failures. From past experience though, this should come together with a sensible way to whitelist errors that can't be fixed. This will severely slow down development as PRs will systematically fail on such a check.
- Add a task in the release process where, immediately before closing a release, the maintainer needs to manually go through the sphinx-build log and fix any new issues. This would be a major extra piece of work for the maintainer.

I am honestly not excited by either of the above. Alternative suggestions are welcome.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3370/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
666896781,MDU6SXNzdWU2NjY4OTY3ODE=,4279,intersphinx looks for implementation modules,6213168,open,0,,,0,2020-07-28T08:55:12Z,2022-04-09T03:03:30Z,,MEMBER,,,,"This is a widespread issue caused by the pattern of defining objects in private module and then exposing them to the final user by importing them in the top-level ``__init__.py``, vs. how intersphinx works.

Exact same issue in different projects:
- https://github.com/aio-libs/aiohttp/issues/3714
- https://jira.mongodb.org/browse/MOTOR-338
- https://github.com/tkem/cachetools/issues/178
- https://github.com/AmphoraInc/xarray_mongodb/pull/22
- https://github.com/jonathanslenders/asyncio-redis/issues/143

If a project 
1. uses xarray, intersphinx, and autodoc 
2. subclasses any of the classes exposed by ``xarray/__init__.py`` and documents the new class with the ``:show-inheritance:`` flag
3. Starting from Sphinx 3, **has any of the above classes anywhere in a type annotation**

Then Sphinx emits a warning and fails to create a hyperlink, because intersphinx uses the ``__module__`` attribute to look up the object in objects.inv, but ``__module__`` points to the implementation module while objects.inv points to the top-level ``xarray`` module.

# Workaround
In conf.py:

```python
import xarray
xarray.DataArray.__module__ = ""xarray""
```

# Solution
Put the above hack in ``xarray/__init__.py``","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4279/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
505550120,MDU6SXNzdWU1MDU1NTAxMjA=,3391,map_blocks doesn't work when dask isn't installed,6213168,closed,0,,,1,2019-10-10T22:53:55Z,2021-11-24T17:25:24Z,2021-11-24T17:25:24Z,MEMBER,,,,"Iterative improvement on #3276 @dcherian

map_blocks crashes with ImportError if dask isn't installed, even if it's legal to run it on a DataArray/Dataset without any dask variables.
This forces writers of extension libraries to either not use map_blocks, add dask as a strict requirement, or write a switch in their own code.

Please change the code so that it works without dask (you'll need to write a stub of ``dask.is_dask_collection`` that always returns False) and add relevant tests to be triggered in our py36-bare-minimum CI environment.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3391/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
980223048,MDExOlB1bGxSZXF1ZXN0NzIwNTAxNTkz,5740,Remove ad-hoc handling of NEP18 libraries in CI,6213168,closed,0,,,1,2021-08-26T13:04:36Z,2021-09-04T10:53:39Z,2021-08-31T10:14:35Z,MEMBER,,0,pydata/xarray/pulls/5740,sparse and pint are mature enough that it is no longer necessary to have a separate CI environment for them.,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5740/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
945560052,MDExOlB1bGxSZXF1ZXN0NjkwODcyNTk1,5610,Fix gen_cluster failures; dask_version tweaks,6213168,closed,0,,,5,2021-07-15T16:26:21Z,2021-07-15T18:04:00Z,2021-07-15T17:25:43Z,MEMBER,,0,pydata/xarray/pulls/5610,"- fixes one of the issues reported in #5600
- ``distributed.utils_test.gen_cluster`` no longer accepts timeout=None for the sake of robustness
- deleted ancient dask backwards compatibility code
- clean up code around ``dask.__version__``","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5610/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
193294569,MDU6SXNzdWUxOTMyOTQ1Njk=,1151,Scalar coords vs. concat,6213168,open,0,,,11,2016-12-03T15:42:18Z,2021-07-08T17:42:18Z,,MEMBER,,,,"Why does this work:
```
>> import xarray
>> a = xarray.DataArray([1, 2, 3], dims=['x'], coords={'y': 10})
>> b = xarray.DataArray([4, 5, 6], dims=['x'])
>> a + b
<xarray.DataArray (x: 3)>
array([5, 7, 9])
Coordinates:
    y        int64 10
```
But this doesn't?
```
>> xarray.concat([a, b], dim='x')
KeyError: 'y'
```
It doesn't seem coherent to me...","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1151/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
305757822,MDU6SXNzdWUzMDU3NTc4MjI=,1995,apply_ufunc support for chunks on input_core_dims,6213168,open,0,,,13,2018-03-15T23:50:22Z,2021-05-17T18:59:18Z,,MEMBER,,,,"I am trying to optimize the following function:

    c = (a * b).sum('x', skipna=False)

where a and b are xarray.DataArray's, both with dimension x and both with dask backend.

I successfully obtained a 5.5x speedup with the following:

    @numba.guvectorize(['void(float64[:], float64[:], float64[:])'], '(n),(n)->()', nopython=True, cache=True)
    def mulsum(a, b, res):
        acc = 0
        for i in range(a.size):
            acc += a[i] * b[i]
        res.flat[0] = acc

    c = xarray.apply_ufunc(
        mulsum, a, b,
        input_core_dims=[['x'], ['x']],
        dask='parallelized', output_dtypes=[float])

The problem is that this introduces a (quite problematic, in my case) constraint that a and b can't be chunked on dimension x - which is theoretically avoidable as long as the kernel function doesn't need interaction between x[i] and x[j] (e.g. it can't work for an interpolator, which would require to rely on dask ghosting).

# Proposal 
Add a parameter to apply_ufunc, ``reduce_func=None``. reduce_func is a function which takes as input two parameters a, b that are the output of func. apply_ufunc will invoke it whenever there's chunking on an input_core_dim.

e.g. my use case above would simply become:

    c = xarray.apply_ufunc(
        mulsum, a, b,
        input_core_dims=[['x'], ['x']],
        dask='parallelized', output_dtypes=[float], reduce_func=operator.sum)

So if I have 2 chunks in a and b on dimension x, apply_ufunc will internally do

    c1 = mulsum(a1, b1)
    c2 = mulsum(a2, b2)
    c = operator.sum(c1, c2)

Note that reduce_func will be invoked exclusively in presence of dask='parallelized' and when there's chunking on one or more of the input_core_dims. If reduce_func is left to None, apply_ufunc will keep crashing like it does now.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1995/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
417356439,MDU6SXNzdWU0MTczNTY0Mzk=,2801,NaN-sized chunks,6213168,open,0,,,2,2019-03-05T15:30:14Z,2021-04-24T02:41:34Z,,MEMBER,,,,"It would be nice to have support for NaN-sized dask chunks, e.g. ``x[x > 2]``.
There are two problems:

1. ``x[x > 2]`` silently resolves the dask graph. It definitely shouldn't. There needs to be some discussion on what needs to happen to indices on the NaN-sized dimension; I can think of 3 options:
- silently drop any index that would become undefined
- drop any index that would become undefined and issue a warning
- hard crash if there is any index that would become undefined
- redesign IndexVariable so that it can contain dask data (probably much more complicated than the 3 above).
The above design decision is anyway for when there _is_ an index; dims without indices should just work.

2. This crashes:
```>>> a = xarray.DataArray([1, 2, 3, 4]).chunk(2)
>>> xarray.DataArray(a.data[a.data > 2]).compute()

ValueError: replacement data must match the Variable's shape
```
I didn't investigate but I suspect it should be trivial to fix. I'm not sure why there is a check at all? Any such health check  should be in dask only IMHO.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2801/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
817271773,MDExOlB1bGxSZXF1ZXN0NTgwNzkxNTcy,4965,Support for dask.graph_manipulation,6213168,closed,0,,,1,2021-02-26T11:19:09Z,2021-03-05T09:24:17Z,2021-03-05T09:24:14Z,MEMBER,,0,pydata/xarray/pulls/4965,"Second iteration upon https://github.com/pydata/xarray/pull/4884
CI is currently failing vs. dask git tip because of https://github.com/dask/dask/issues/7263 (unrelated to this PR)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4965/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
804694945,MDExOlB1bGxSZXF1ZXN0NTcwNDE5NjIz,4884,Compatibility with dask 2021.02.0,6213168,closed,0,,,0,2021-02-09T16:12:02Z,2021-02-11T18:33:03Z,2021-02-11T18:32:59Z,MEMBER,,0,pydata/xarray/pulls/4884,"Closes #4860
Reverts #4873 

Restore compatibility with dask 2021.02.0 by avoiding improper assumptions on the implementation details of ``da.Array.__dask_postpersist__()``.

This PR *does not* align xarray to the new dask collection spec (https://github.com/dask/dask/issues/7093), as I just realized that Datasets violate the rule of having all dask keys with the same name if they contain more than one dask variable - and cannot do otherwise. So I have to change the dask collection spec again to accommodate them.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4884/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
311573817,MDU6SXNzdWUzMTE1NzM4MTc=,2039,open_mfdataset: skip loading for indexes and coordinates from all but the first file,6213168,open,0,,,1,2018-04-05T11:32:02Z,2021-01-27T17:49:21Z,,MEMBER,,,,"This is a follow-up from #1521.

When invoking open_mfdataset, very frequently the user knows in advance that all of his coords that aren't
on the concat_dim are already aligned, and may be willing to blindly trust such assumption in exchange of a huge performance boost.

My production data: 200x NetCDF files on a not very performant NFS file system, concatenated on the ""scenario"" dimension:

```
xarray.open_mfdataset('cube.*.nc', engine='h5netcdf', concat_dim='scenario')

<xarray.Dataset>
Dimensions:      (attribute: 1, fx_id: 40, instr_id: 10765, scenario: 500001, timestep: 1)
Coordinates:
  * attribute    (attribute) object 'THEO/Value'
    currency     (instr_id) object 'ZAR' 'EUR' 'EUR' 'EUR' 'EUR' 'EUR' 'GBP' ...
  * fx_id        (fx_id) object 'GBP' 'USD' 'EUR' 'JPY' 'ARS' 'AUD' 'BRL' ...
  * instr_id     (instr_id) object 'S01626556_ZAE000204921' '537805_1275' ...
  * timestep     (timestep) datetime64[ns] 2016-12-31
    type         (instr_id) object 'American' 'Bond Future' 'Bond Future' ...
  * scenario     (scenario) object 'Base Scenario' 'SSMC_1' 'SSMC_2' ...
Data variables:
    FX           (fx_id, timestep, scenario) float64 dask.array<shape=(40, 1, 500001), chunksize=(40, 1, 2501)>
    instruments  (instr_id, attribute, timestep, scenario) float64 dask.array<shape=(10765, 1, 1, 500001), chunksize=(10765, 1, 1, 2501)>

CPU times: user 19.6 s, sys: 981 ms, total: 20.6 s
Wall time: 24.4 s
```


If I skip loading and comparing the non-index coords from all 200 files:

```
xarray.open_mfdataset('cube.*.nc'), engine='h5netcdf', concat_dim='scenario', coords='all')

<xarray.Dataset>
Dimensions:      (attribute: 1, fx_id: 40, instr_id: 10765, scenario: 500001, timestep: 1)
Coordinates:
  * attribute    (attribute) object 'THEO/Value'
  * fx_id        (fx_id) object 'GBP' 'USD' 'EUR' 'JPY' 'ARS' 'AUD' 'BRL' ...
  * instr_id     (instr_id) object 'S01626556_ZAE000204921' '537805_1275' ...
  * timestep     (timestep) datetime64[ns] 2016-12-31
    currency     (scenario, instr_id) object dask.array<shape=(500001, 10765), chunksize=(2501, 10765)>
  * scenario     (scenario) object 'Base Scenario' 'SSMC_1' 'SSMC_2' ...
    type         (scenario, instr_id) object dask.array<shape=(500001, 10765), chunksize=(2501, 10765)>
Data variables:
    FX           (fx_id, timestep, scenario) float64 dask.array<shape=(40, 1, 500001), chunksize=(40, 1, 2501)>
    instruments  (instr_id, attribute, timestep, scenario) float64 dask.array<shape=(10765, 1, 1, 500001), chunksize=(10765, 1, 1, 2501)>

CPU times: user 12.7 s, sys: 305 ms, total: 13 s
Wall time: 14.8 s
```

If I skip loading and comparing also the index coords from all 200 files:

```
cube = xarray.open_mfdataset(sh.resolve_env(f'{dynamic}/mtf/{cubename}/nc/cube.*.nc'), engine='h5netcdf',
                             concat_dim='scenario', 
                             drop_variables=['attribute', 'fx_id', 'instr_id', 'timestep', 'currency', 'type'])

<xarray.Dataset>
Dimensions:      (attribute: 1, fx_id: 40, instr_id: 10765, scenario: 500001, timestep: 1)
Coordinates:
  * scenario     (scenario) object 'Base Scenario' 'SSMC_1' 'SSMC_2' ...
Dimensions without coordinates: attribute, fx_id, instr_id, timestep
Data variables:
    FX           (fx_id, timestep, scenario) float64 dask.array<shape=(40, 1, 500001), chunksize=(40, 1, 2501)>
    instruments  (instr_id, attribute, timestep, scenario) float64 dask.array<shape=(10765, 1, 1, 500001), chunksize=(10765, 1, 1, 2501)>

CPU times: user 7.31 s, sys: 61 ms, total: 7.37 s
Wall time: 9.05 s
```

# Proposed design
Add a new optional parameter to open_mfdataset, ``assume_aligned=None``.
It can be valued to a list of variable names or ""all"", and requires ``concat_dim`` to be explicitly set.
It causes open_mfdataset to use the first occurrence of every variable and blindly skip loading the subsequent ones.

## Algorithm
1. Perform the first invocation to the underlying open_dataset like it happens now
2. if assume_aligned is not None: for each new NetCDF file, figure out which variables need to be aligned & compared (as opposed to concatenated), and add them to a drop_variables list.
3. if assume_aligned != ""all"": drop_variables &= assume_aligned
3. Pass the increasingly long drop_variables list to the underlying open_dataset","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2039/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
671216158,MDExOlB1bGxSZXF1ZXN0NDYxNDM4MDIz,4297,Lazily load resource files,6213168,closed,0,6213168,,4,2020-08-01T21:31:36Z,2020-09-22T05:32:38Z,2020-08-02T07:05:15Z,MEMBER,,0,pydata/xarray/pulls/4297,"- Marginal speed-up and RAM footprint reduction when not running in Jupyter Notebook
- Closes #4294","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4297/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
671108068,MDExOlB1bGxSZXF1ZXN0NDYxMzM1NDAx,4296,Increase support window of all dependencies,6213168,closed,0,6213168,,7,2020-08-01T18:55:54Z,2020-08-14T09:52:46Z,2020-08-14T09:52:42Z,MEMBER,,0,pydata/xarray/pulls/4296,"Closes #4295

Increase width of the sliding window for minimum supported version:
- setuptools from 6 months sliding window to hardcoded >= 38.4, and to 42 months sliding window starting from July 2021
- dask and distributed from 6 months sliding window to hardcoded >= 2.9, and to 12 months sliding window starting from January 2021
- all other libraries from 6 months to 12 months sliding window","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4296/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
671561223,MDExOlB1bGxSZXF1ZXN0NDYxNzY2OTA1,4299,Support PyCharm deployment over SSH,6213168,closed,0,,,3,2020-08-02T06:19:09Z,2020-08-03T19:41:36Z,2020-08-03T19:41:29Z,MEMBER,,0,pydata/xarray/pulls/4299,Fix ``pip install .`` when no ``.git`` directory exists; namely when the xarray source directory has been rsync'ed by PyCharm Professional for a remote deployment over SSH.,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4299/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
272004812,MDU6SXNzdWUyNzIwMDQ4MTI=,1699,apply_ufunc(dask='parallelized') output_dtypes for datasets,6213168,open,0,,,8,2017-11-07T22:18:23Z,2020-04-06T15:31:17Z,,MEMBER,,,,"When a Dataset has variables with different dtypes, there's no way to tell apply_ufunc that the same function applied to different variables will produce different dtypes:

```
ds1 = xarray.Dataset(data_vars={'a': ('x', [1, 2]), 'b': ('x', [3.0, 4.5])}).chunk()
ds2 = xarray.apply_ufunc(lambda x: x + 1, ds1, dask='parallelized', output_dtypes=[float])
ds2

<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    a        (x) float64 dask.array<shape=(2,), chunksize=(2,)>
    b        (x) float64 dask.array<shape=(2,), chunksize=(2,)>

ds2.compute()

<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    a        (x) int64 2 3
    b        (x) float64 4.0 5.5
```

### Proposed solution
When the output is a dataset, apply_ufunc could accept either ``output_dtypes=[t]`` (if all output variables will have the same dtype) or ``output_dtypes=[{var1: t1, var2: t2, ...}]``. In the example above, it would be ``output_dtypes=[{'a': int, 'b': float}]``.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1699/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
555752381,MDExOlB1bGxSZXF1ZXN0MzY3NjM3MzUw,3724,setuptools-scm (3),6213168,closed,0,,,3,2020-01-27T18:26:11Z,2020-02-14T12:07:22Z,2020-01-27T18:51:50Z,MEMBER,,0,pydata/xarray/pulls/3724,"Fix https://github.com/pydata/xarray/pull/3714#issuecomment-578626605
@shoyer I have no way of testing if this fixes github - please see by yourself after merging to master.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3724/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
557020666,MDExOlB1bGxSZXF1ZXN0MzY4Njg4MTAz,3727,Python 3.8 CI,6213168,closed,0,,,6,2020-01-29T17:50:52Z,2020-02-10T09:41:07Z,2020-01-31T15:52:19Z,MEMBER,,0,pydata/xarray/pulls/3727,"- Run full-fat suite of tests for Python 3.8
- Move asv, MacOSX tests, readthedocs, binder, and more to Python 3.8
- Test windows against latest numpy version
- Windows tests remain on Python 3.7 because of a couple of Python 3.8 tests that fail exclusively in CI. Will investigate later.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3727/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
557012230,MDExOlB1bGxSZXF1ZXN0MzY4NjgxMjgw,3726,Avoid unsafe use of pip,6213168,closed,0,,,3,2020-01-29T17:33:48Z,2020-01-30T12:23:05Z,2020-01-29T23:39:40Z,MEMBER,,0,pydata/xarray/pulls/3726,Closes #3725,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3726/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
554662467,MDExOlB1bGxSZXF1ZXN0MzY2Nzc1ODIz,3721,Add isort to CI,6213168,closed,0,,,9,2020-01-24T10:41:54Z,2020-01-30T12:22:53Z,2020-01-28T19:41:52Z,MEMBER,,0,pydata/xarray/pulls/3721,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3721/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
553518018,MDExOlB1bGxSZXF1ZXN0MzY1ODM1MjQ3,3714,setuptools-scm and one-liner setup.py,6213168,closed,0,,,12,2020-01-22T12:46:43Z,2020-01-27T07:42:36Z,2020-01-22T15:40:34Z,MEMBER,,0,pydata/xarray/pulls/3714,"- Closes #3369
- Replace versioneer with setuptools-scm
- Replace setup.py with setup.cfg
- Drop pytest-runner as instructed by deprecation notice on the project webpage
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3714/reactions"", ""total_count"": 2, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 2, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
554647652,MDExOlB1bGxSZXF1ZXN0MzY2NzYzMzQ3,3720,setuptools-scm and isort tweaks,6213168,closed,0,,,2,2020-01-24T10:12:03Z,2020-01-24T15:34:34Z,2020-01-24T15:28:48Z,MEMBER,,0,pydata/xarray/pulls/3720,"Follow-up on https://github.com/pydata/xarray/pull/3714

- Fix regression in mypy if pip creates a zipped archive
- Avoid breakage in the extremely unlikely event that setuptools is not installed
- Guarantee ``xarray.__version__`` to be always PEP440-compatible. This prevents a breakage if you run pandas without xarray installed and with the xarray sources folder in PYTHONPATH.
- Apply isort to ``xarray.__init__``
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3720/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
502082831,MDU6SXNzdWU1MDIwODI4MzE=,3369,Define a process to test the readthedocs CI before merging into master,6213168,closed,0,,,3,2019-10-03T13:56:02Z,2020-01-22T15:40:34Z,2020-01-22T15:40:33Z,MEMBER,,,,"This is an offshoot of #3358.

The readthedocs CI has a bad habit of failing even after the Azure Pipelines job ""Docs"" has succeeded.

After major changes that impact the documentation, and before merging everything into master, it would be advisable to explicitly verify that RTD builds correctly.

So far I tried to
1. create my own readthedocs project, https://readthedocs.org/projects/crusaderky-xarray/
2. point it to my fork https://github.com/crusaderky/xarray/
3. enable build for the branch I want to merge

This is currently failing because of an issue with versioneer, which incorrectly sets ``xarray.__version__`` to ``0+untagged.111.g6d60700``.
This in turn causes a failure in a minimum version check in ``pandas.DataFrame.to_xarray()`` on pandas>=0.25.

In the master RTD project https://readthedocs.org/projects/xray/, I can instead read ``xarray: 0.13.0+20.gdd2b803a``.

So far the only workaround I could find was to downgrade pandas to 0.24 in ``ci/requirements/doc.yml``.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3369/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
551532886,MDExOlB1bGxSZXF1ZXN0MzY0MjM4MTM2,3703,hardcoded xarray.__all__,6213168,closed,0,,,4,2020-01-17T17:09:45Z,2020-01-18T00:58:06Z,2020-01-17T20:42:25Z,MEMBER,,0,pydata/xarray/pulls/3703,Closes #3695,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3703/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
551544665,MDExOlB1bGxSZXF1ZXN0MzY0MjQ3NjE3,3705,One-off isort run,6213168,closed,0,,,4,2020-01-17T17:36:10Z,2020-01-17T22:59:26Z,2020-01-17T21:00:24Z,MEMBER,,0,pydata/xarray/pulls/3705,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3705/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
551544199,MDExOlB1bGxSZXF1ZXN0MzY0MjQ3MjQz,3704,Bump mypy to v0.761,6213168,closed,0,,,1,2020-01-17T17:35:09Z,2020-01-17T22:59:19Z,2020-01-17T18:51:51Z,MEMBER,,0,pydata/xarray/pulls/3704,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3704/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
522935511,MDExOlB1bGxSZXF1ZXN0MzQxMDM3NTg5,3533,2x~5x speed up for isel() in most cases,6213168,closed,0,,,7,2019-11-14T15:34:24Z,2019-12-05T16:45:40Z,2019-12-05T16:39:40Z,MEMBER,,0,pydata/xarray/pulls/3533,"Yet another major improvement for #2799.

Achieve a 2x to 5x boost in isel performance when slicing small arrays by int, slice, list of int, scalar ndarray, or 1-dimensional ndarray.

```python
import xarray

da = xarray.DataArray([[1, 2], [3, 4]], dims=['x', 'y'])
v = da.variable
a = da.variable.values
ds = da.to_dataset(name=""d"")

ds_with_idx = xarray.Dataset({
    'x': [10, 20],
    'y': [100, 200],
    'd': (('x', 'y'), [[1, 2], [3, 4]])
})
da_with_idx = ds_with_idx.d

# before -> after
%timeit a[0] # 121 ns
%timeit v[0] # 7 µs
%timeit v.isel(x=0) # 10 µs
%timeit da[0]  # 65 µs -> 15 µs
%timeit da.isel(x=0)  # 63 µs -> 13 µs
%timeit ds.isel(x=0)  # 48 µs -> 24 µs
%timeit da_with_idx[0]  # 209 µs -> 82 µs
%timeit da_with_idx.isel(x=0, drop=False)  # 135 µs -> 34 µs
%timeit da_with_idx.isel(x=0, drop=True)  # 101 µs -> 34 µs
%timeit ds_with_idx.isel(x=0, drop=False)  # 90 µs -> 49 µs
%timeit ds_with_idx.isel(x=0, drop=True)  # 65 µs -> 49 µs
```

Marked as WIP because this commands running the asv suite to verify there are no regressions for large arrays.
(on a separate note, we really need to add the small size cases to asv - as discussed in #3382).

This profoundly alters one of the most important methods in xarray and I must confess it makes me nervous, particularly as I am unsure if the test coverage of  DataArray.isel() is as through as that for Dataset.isel().","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3533/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
525689517,MDExOlB1bGxSZXF1ZXN0MzQzMjYxNDg0,3551,Clarify conda environments for new contributors,6213168,closed,0,,,1,2019-11-20T09:47:15Z,2019-11-20T14:50:48Z,2019-11-20T09:47:57Z,MEMBER,,0,pydata/xarray/pulls/3551,"<!-- Feel free to remove check-list items aren't relevant to your change -->

 - [x] Closes #3549","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3551/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
510915725,MDU6SXNzdWU1MTA5MTU3MjU=,3434,v0.14.1 Release,6213168,closed,0,,,18,2019-10-22T21:08:15Z,2019-11-19T23:44:52Z,2019-11-19T23:44:52Z,MEMBER,,,,"I think with the multiple recent breakages we've just had due to dependency upgrades, we should push out a patch release with some haste.

Please comment/add/object

Must have
--------------
- [x] numpy 1.18 support #3409
- [x] pseudonetcdf 3.1.0 support #3409, #3420 
- [x] require cftime != 1.0.4 #3463 
- [x] groupby reduce regression fix #3403
- [x] pandas master support #3440 
 
Nice to have
-----------------
- [x] ellipsis (...) work #1081, #3414, #3418, #3421, #3423, #3424
- [x] HTML repr #3425 (really mouth-watering, but I'm unsure about how far it is from completion)
- [x] groupby drop nan groups #3406 
- [x] deprecate `allow_lazy` #3435 
- [x] \_\_dask_tokenize\_\_ #3446 
- [x] dask name equality #3453
- [x] Leave empty slot when not using accessors #3531 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3434/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
523438384,MDExOlB1bGxSZXF1ZXN0MzQxNDQyMTI4,3537,Numpy 1.18 support,6213168,closed,0,,,13,2019-11-15T12:17:32Z,2019-11-19T14:06:50Z,2019-11-19T14:06:46Z,MEMBER,,0,pydata/xarray/pulls/3537,"Fix mean() and nanmean() for datetime64 arrays on numpy backend when upgrading from numpy 1.17 to 1.18.
All other nan-reductions on datetime64s were broken before and remain broken.
mean() on datetime64 and dask was broken before and remains broken.

 - [x] Closes #3409
 - [x] Passes `black . && mypy . && flake8`
 - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3537/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
522780826,MDExOlB1bGxSZXF1ZXN0MzQwOTEwMjQ3,3531,Leave empty slot when not using accessors,6213168,closed,0,,,1,2019-11-14T10:54:55Z,2019-11-15T17:43:57Z,2019-11-15T17:43:54Z,MEMBER,,0,pydata/xarray/pulls/3531,"Save a few bytes and nanoseconds for the overwhelming majority of the users that don't use accessors.
Lay the groundwork for potential future use of ``@pandas.utils.cache_readonly``.

xref https://github.com/pydata/xarray/issues/3514","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3531/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
503983776,MDU6SXNzdWU1MDM5ODM3NzY=,3382,Improve indexing performance benchmarks,6213168,open,0,,,0,2019-10-08T11:20:39Z,2019-11-14T15:52:33Z,,MEMBER,,,,"As discussed in #3375 - FYI @jhamman 

``asv_bench/benchmarks/indexing.py`` is currently missing some key use cases:

- All tests in the above module use arrays with 2~6 million points.
While this is important to spot any case where the numpy underlying functions start being unnecessarily called more than once, it also means any performance improvement or degradation in any of the pure-Python code will be completely drowned out.
All tests should be run twice, once with the current ``nx = 3000; ny = 2000; nt = 1000`` and again with ``nx = 15; ny = 10; nt = 5``.
- DataArray slicing (sel, isel, and square brackets)
- Slicing when there are no IndexVariables (verify that we're not creating dummy variables, doing a full scan on them, and then discarding them)
- other?
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3382/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
521842949,MDExOlB1bGxSZXF1ZXN0MzQwMTQ1OTg0,3515,Recursive tokenization,6213168,closed,0,6213168,,1,2019-11-12T22:35:13Z,2019-11-13T00:54:32Z,2019-11-13T00:53:27Z,MEMBER,,0,pydata/xarray/pulls/3515,"After misreading the dask documentation  <https://docs.dask.org/en/latest/custom-collections.html#deterministic-hashing>, I was under the impression that the output of ``__dask_tokenize__`` would be recursively parsed, like it happens for ``__getstate__`` or ``__reduce__``. That's not the case - the output of ``__dask_tokenize__`` is just fed into a str() function so it has to be made explicitly recursive!","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3515/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
329251342,MDU6SXNzdWUzMjkyNTEzNDI=,2214,Simplify graph of DataArray.chunk(),6213168,closed,0,,,2,2018-06-04T23:30:19Z,2019-11-10T04:34:58Z,2019-11-10T04:34:58Z,MEMBER,,,,"```
>>> dict(xarray.DataArray([1, 2]).chunk().__dask_graph__())
{
    ('xarray-<this-array>-7e885b8e329090da3fe58d4483c0cf8b', 0): (<function dask.array.core.getter(a, b, asarray=True, lock=None)>, 'xarray-<this-array>-7e885b8e329090da3fe58d4483c0cf8b', (slice(0, 2, None),)),
    'xarray-<this-array>-7e885b8e329090da3fe58d4483c0cf8b': ImplicitToExplicitIndexingAdapter(array=NumpyIndexingAdapter(array=array([1, 2])))
}
```
There is no reason why this should be any more complicated than da.from_array:
```
>>> dict(da.from_array(np.array([1, 2]), chunks=2).__dask_graph__())
{
    ('array-de932becc43e72c010bc91ffefe42af1', 0): (<function dask.array.core.getter(a, b, asarray=True, lock=None)>, 'array-original-de932becc43e72c010bc91ffefe42af1', (slice(0, 2, None),)),
    'array-original-de932becc43e72c010bc91ffefe42af1': array([1, 2])
}
```
da.from_array itself should be simplified - see twin issue https://github.com/dask/dask/issues/3556","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2214/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
510527025,MDExOlB1bGxSZXF1ZXN0MzMwODg1Mzk2,3429,minor lint tweaks,6213168,closed,0,,,4,2019-10-22T09:15:03Z,2019-10-24T12:53:24Z,2019-10-24T12:53:21Z,MEMBER,,0,pydata/xarray/pulls/3429,"- Ran pyflakes 2.1.1
- Some f-string tweaks
- Ran black -t py36
- Ran mypy 0.740. We'll need to skip it and jump directly to 0.750 once it's released because of https://github.com/python/mypy/issues/7735
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3429/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
511869575,MDExOlB1bGxSZXF1ZXN0MzMxOTg2MzUw,3442,pandas-dev workaround,6213168,closed,0,,,0,2019-10-24T10:59:55Z,2019-10-24T11:43:42Z,2019-10-24T11:43:36Z,MEMBER,,0,pydata/xarray/pulls/3442,Temporary hack around #3440 to get green CI,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3442/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
510974193,MDExOlB1bGxSZXF1ZXN0MzMxMjU4MjQx,3436,MAGA (Make Azure Green Again),6213168,closed,0,,,3,2019-10-22T22:56:21Z,2019-10-24T09:57:59Z,2019-10-23T01:06:10Z,MEMBER,,0,pydata/xarray/pulls/3436,"Let all CI tests become green again to avoid hindering developers who are working on PRs unrelated to the present incompatibilities (numpy=1.18, cftime=1.0.4, pseudonetcdf=3.1.0).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3436/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
272002705,MDU6SXNzdWUyNzIwMDI3MDU=,1698,apply_ufunc(dask='parallelized') to infer output_dtypes,6213168,open,0,,,3,2017-11-07T22:11:11Z,2019-10-22T08:33:38Z,,MEMBER,,,,"If one doesn't provide the ``dtype`` parameter to ``dask.map_blocks()``, it automatically infers it by running the kernel on trivial dummy data.
It should be straightforward to make ``xarray.apply_ufunc(dask='parallelized')`` use the same functionality if the ``output_dtypes`` parameter is omitted.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1698/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
509655174,MDExOlB1bGxSZXF1ZXN0MzMwMTYwMDQy,3420,Restore crashing CI tests on pseudonetcdf-3.1,6213168,closed,0,,,5,2019-10-20T21:26:40Z,2019-10-21T01:32:54Z,2019-10-20T22:42:36Z,MEMBER,,0,pydata/xarray/pulls/3420,"Related to #3409 

The crashes caused by pseudonetcdf-3.1 are blocking all PRs.
Sorry I don't know anything about pseudonetcdf. This PR takes the issue out of the critical path so that whoever knows about the library can deal with it in due time.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3420/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
506885041,MDU6SXNzdWU1MDY4ODUwNDE=,3397,"""How Do I..."" formatting issues",6213168,closed,0,,,4,2019-10-14T21:32:27Z,2019-10-16T21:41:06Z,2019-10-16T21:41:06Z,MEMBER,,,,"@dcherian 
The new page http://xarray.pydata.org/en/stable/howdoi.html (#3357) is somewhat painful to read on readthedocs. The table goes out of the screen and one is forced to scroll left and right non stop.

Maybe a better alternative could be with Sphinx definitions syntax (which allows for automatic reflowing)?

```rst
How do I ...
============
Add variables from other datasets to my dataset?
    :py:meth:`Dataset.merge`
```
(that's a 4 spaces indent)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3397/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
506216396,MDExOlB1bGxSZXF1ZXN0MzI3NDg0OTQ2,3395,Annotate LRUCache,6213168,closed,0,,,0,2019-10-12T17:44:43Z,2019-10-12T20:05:36Z,2019-10-12T20:05:33Z,MEMBER,,0,pydata/xarray/pulls/3395,Very minor type annotations work,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3395/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
503163130,MDExOlB1bGxSZXF1ZXN0MzI1MDc2MzQ5,3375,Speed up isel and __getitem__,6213168,closed,0,6213168,,5,2019-10-06T21:27:42Z,2019-10-10T09:21:56Z,2019-10-09T18:01:30Z,MEMBER,,0,pydata/xarray/pulls/3375,"First iterative improvement for #2799.

Speed up Dataset.isel up to 33% and DataArray.isel up to 25% (when there are no indices and the numpy array is small). 15% speedup when there are indices.

Benchmarks can be found in #2799.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3375/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
500582648,MDExOlB1bGxSZXF1ZXN0MzIzMDIwOTY1,3358,Rolling minimum dependency versions policy,6213168,closed,0,6213168,,24,2019-09-30T23:50:39Z,2019-10-09T02:02:29Z,2019-10-08T21:23:47Z,MEMBER,,0,pydata/xarray/pulls/3358,"Closes #3222
Closes #3293 

- Drop support for Python 3.5
- Upgrade numpy to 1.14 (24 months old)
- Upgrade pandas to 0.24 (12 months old)
- Downgrade scipy to 1.0 (policy allows for 1.2, but it breaks numpy=1.14)
- Downgrade dask to 1.2 (6 months old)
- Other upgrades/downgrades to comply with the policy
- CI tool to verify that the minimum dependencies requirements in CI are compliant with the policy
- Overhaul CI environment for readthedocs

Out of scope:
- Purge away all OrderedDict's","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3358/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
481250429,MDU6SXNzdWU0ODEyNTA0Mjk=,3222,Minimum versions for optional libraries,6213168,closed,0,,,12,2019-08-15T17:18:16Z,2019-10-08T21:23:47Z,2019-10-08T21:23:47Z,MEMBER,,,,"In CI there are:

- tests for all the latest versions of all libraries, mandatory and optional (py36, py37, py37-windows)
- tests for the minimum versions of the mandatory libraries only (py35-min)

There are no tests for legacy versions of the optional libraries.

Today I tried downgrading dask in the py37 environment to dask=1.1.2, which is 6 months old...

**...it's a bloodbath.** 383 errors of the most diverse kind.

In the codebase I found mentions to much older minimum versions: installing.rst mentions dask >=0.16.1, and Dataset.chunk() even asks for dask>=0.9.

It think we should add CI tests for old versions of the optional dependencies.
What policy should we adopt when we find an incompatibility? How old a library should be not to bother fixing bugs and just require a newer version? I personally would go for an aggressive 6 months worth' of backwards compatibility; less if the time it takes to fix the issues is excessive.
The tests should run on py36 because py35 builds are becoming very scarce in anaconda.

This has the outlook of being an exercise in extreme frustration. I'm afraid I personally hold zero interest towards packages older than the latest available in the anaconda official repo, so I'm not volunteering for this one (sorry).

I'd like to hear other people's opinions and/or offers of self-immolation... :)

","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3222/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
485708282,MDU6SXNzdWU0ODU3MDgyODI=,3268,Stateful user-defined accessors,6213168,open,0,,,15,2019-08-27T09:54:28Z,2019-10-08T11:13:25Z,,MEMBER,,,,"If anybody decorates a stateful class with ``@register_dataarray_accessor`` or ``@register_dataset_accessor``, the instance will lose its state on any method that invokes ``_to_temp_dataset``, as well as on a shallow copy. 

```python

In [1]: @xarray.register_dataarray_accessor('foo') 
   ...: class Foo: 
   ...:     def __init__(self, obj): 
   ...:         self.obj = obj 
   ...:         self.x = 1 
   ...:          
   ...:                                                                                                                                                                                                                                                        

In [2]: a = xarray.DataArray()                                                                                                                                                                                                                                 

In [3]: a.foo.x                                                                                                                                                                                                                                                
Out[3]: 1

In [4]: a.foo.x = 2                                                                                                                                                                                                                                            

In [5]: a.foo.x                                                                                                                                                                                                                                               
Out[5]: 2

In [6]: a.roll().foo.x                                                                                                                                                                                                                                        
Out[6]: 1

In [7]: a.copy(deep=False).foo.x                                                                                                                                                                                                                              
Out[7]: 1
```

While in the case of ``_to_temp_dataset`` it could be possible to spend (substantial) effort to retain the state, on the case of copy() it's impossible without modifying the accessor duck API, as one would need to tamper with the accessor instance in place and modify the pointer back to the DataArray/Dataset.

This issue is so glaring that it makes me strongly suspect that nobody saves any state in accessor classes. This kind of use would also be problematic in practical terms, as the accessor object would have a hard time realising when its own state is no longer coherent with the referenced DataArray/Dataset.

This design also carries the problem that it introduces a circular reference in the DataArray/Dataset. This means that, after someone invokes an accessor method on his DataArray/Dataset, then the whole object - _including the numpy buffers!_ - won't be instantly collected when it's dereferenced by the user, and it will have to instead wait for the next ``gc`` pass. This could cause huge increases in RAM usage overnight in a user application, which would be very hard to logically link to a change that just added a custom method.

Finally, with https://github.com/pydata/xarray/pull/3250/, this statefulness forces us to increase the RAM usage of all datasets and dataarrays by an extra slot, for all users, even if this feature is quite niche.

**Proposed solution**

Get rid of accessor caching altogether, and just recreate the accessor object from scratch every time it is invoked. In the documentation, clarify that the ``__init__`` method should not perform anything computationally intensive.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3268/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
502530652,MDExOlB1bGxSZXF1ZXN0MzI0NTkyODE1,3373,Lint,6213168,closed,0,,,1,2019-10-04T09:29:46Z,2019-10-04T22:18:48Z,2019-10-04T22:17:57Z,MEMBER,,0,pydata/xarray/pulls/3373,Minor cosmetic changes,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3373/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
470714103,MDU6SXNzdWU0NzA3MTQxMDM=,3154,pynio causes dependency conflicts in py36 CI build,6213168,closed,0,,,9,2019-07-20T21:00:43Z,2019-10-03T15:22:17Z,2019-10-03T15:22:17Z,MEMBER,,,,"On Saturday night, all Python 3.6 CI builds started failing. Python 3.7 is unaffected.
See https://dev.azure.com/xarray/xarray/_build/results?buildId=362&view=logs

MacOSX py36:
```
UnsatisfiableError: The following specifications were found to be in conflict:
  - pynio
  - python=3.6
  - rasterio
```

Linux py36:
```
UnsatisfiableError: The following specifications were found to be in conflict:
  - cfgrib[version='>=0.9.2']
  - h5netcdf
  - pynio
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3154/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
495221393,MDExOlB1bGxSZXF1ZXN0MzE4ODA4Njgy,3318,Allow weakref,6213168,closed,0,,,2,2019-09-18T13:19:09Z,2019-10-03T13:39:35Z,2019-09-18T15:53:51Z,MEMBER,,0,pydata/xarray/pulls/3318,"<!-- Feel free to remove check-list items aren't relevant to your change -->

 - [x] Closes #3317 
 - [x] Tests added
 - [x] Passes `black . && mypy . && flake8`
 - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3318/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
501461219,MDExOlB1bGxSZXF1ZXN0MzIzNzI5Mjkx,3365,Demo: CI offline?,6213168,closed,0,,,0,2019-10-02T12:34:38Z,2019-10-02T17:32:18Z,2019-10-02T17:32:13Z,MEMBER,,0,pydata/xarray/pulls/3365,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3365/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
501461397,MDU6SXNzdWU1MDE0NjEzOTc=,3366,CI offline?,6213168,closed,0,,,2,2019-10-02T12:35:00Z,2019-10-02T17:32:03Z,2019-10-02T17:32:03Z,MEMBER,,,,"Azure pipelines is not being triggered by PRs this morning. See https://github.com/pydata/xarray/pull/3358 and https://github.com/pydata/xarray/pull/3365.

Last run was 12 hours ago.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3366/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
500777641,MDExOlB1bGxSZXF1ZXN0MzIzMTc1OTk0,3359,Revisit # noqa annotations,6213168,closed,0,,,1,2019-10-01T09:35:15Z,2019-10-01T18:13:59Z,2019-10-01T18:13:56Z,MEMBER,,0,pydata/xarray/pulls/3359,"Revisit all ``# noqa`` annotation. Remove useless ones; replace blanket ones with specific error messages.
Work around https://github.com/PyCQA/pyflakes/issues/453.

note: ``# noqa: F811`` on the ``@overload``'ed functions works around a pyflakes bug already fixed in git master (https://github.com/PyCQA/pyflakes/pull/435) but not in a release yet, so it has to stay for now.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3359/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
500912288,MDExOlB1bGxSZXF1ZXN0MzIzMjg1ODgw,3360,WIP: Fix codecov.io upload on Windows,6213168,closed,0,,,1,2019-10-01T13:53:19Z,2019-10-01T15:13:21Z,2019-10-01T14:11:22Z,MEMBER,,0,pydata/xarray/pulls/3360,Closes #3354,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3360/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
498399866,MDExOlB1bGxSZXF1ZXN0MzIxMzM5MjE1,3346,CI test suites with pinned minimum dependencies,6213168,closed,0,,,2,2019-09-25T16:38:44Z,2019-09-26T09:38:59Z,2019-09-26T09:38:47Z,MEMBER,,0,pydata/xarray/pulls/3346,"Second step towards resolving #3222.
Added two suites of CI tests:
- Pinned minimum versions for all optional dependencies, except NEP18-dependant ones
- Pinned minimum versions for NEP18 optional dependencies - at the moment only sparse; soon also pint (#3238)

**All versions are the frozen snapshot of what py36.yml deploys today**.
This PR ensures that we won't have accidental breakages _from this moment on_. 
I made no effort  to try downgrading to sensible obsolete versions, as that would require a completely different order of magnitude of work. 
I would suggest to proceed  with the downgrades (and consequent bugfixes) over several small, iterative future PRs that build upon this framework.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3346/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
497945632,MDExOlB1bGxSZXF1ZXN0MzIwOTgwNzIw,3340,CI environments overhaul,6213168,closed,0,,,7,2019-09-24T22:01:10Z,2019-09-25T01:50:08Z,2019-09-25T01:40:55Z,MEMBER,,0,pydata/xarray/pulls/3340,"Propaedeutic CI work to #3222.

- py36 and py37 are now identical
- Many optional dependencies were missing in one test suite or another (see details below)
- Tests that require hypothesis now always run if hypothesis is installed
- py37-windows.yml requirements file has been rebuilt starting from py37.yml
- Sorted requirements files alphabetically for better maintainability
- Added black. This is not needed by CI, but I personally use these yaml files to deploy my dev environment and I would expect many more developers to do the same.
  Alternatively, we could go the other way around and remove flake8 from everywhere and mypy from py36 and py37-windows. IMHO the marginal speedup would not be worth the complication.

Added packages to py36.yml (net of changes in order):
+ black
+ hypothesis
+ nc-time-axis
+ numba
+ numbagg
+ pynio (https://github.com/pydata/xarray/issues/3154 seems to be now fixed upstream)
+ sparse

Added packages to py37.yml (net of changes in order):

+ black
+ cdms2
+ hypothesis
+ iris>=1.10
+ numba (previously implicitly installed from pip by numbagg; now installed from conda)
+ pynio

Added packages to py37-windows.yml (net of changes in order):

+ black
+ bottleneck
+ flake8
+ hypothesis
+ iris>=1.10
+ lxml
+ mypy==0.720
+ numba
+ numbagg
+ pseudonetcdf>=3.0.1
+ pydap
+ sparse
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3340/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
478886013,MDExOlB1bGxSZXF1ZXN0MzA1OTA3Mzk2,3196,One-off isort run,6213168,closed,0,,,5,2019-08-09T09:17:39Z,2019-09-09T08:28:05Z,2019-08-23T20:33:04Z,MEMBER,,0,pydata/xarray/pulls/3196,"A one-off, manually vetted and tweaked isort run","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3196/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
484499801,MDExOlB1bGxSZXF1ZXN0MzEwMzYxOTMz,3250,__slots__,6213168,closed,0,,,10,2019-08-23T12:16:44Z,2019-08-30T12:13:28Z,2019-08-29T17:14:20Z,MEMBER,,0,pydata/xarray/pulls/3250,"What changes:
- Most classes now define ``__slots__``
- removed ``_initialized`` property
- Enforced checks that all subclasses must also define ``__slots__``. For third-party subclasses, this is for now a DeprecationWarning and should be changed into a hard crash later on.
- 22% reduction in RAM usage
- 5% performance speedup for a DataArray method that performs a ``_to_temp_dataset`` roundtrip 

**DISCUSS:** support for third party subclasses is very poor at the moment (#1097). Should we skip the deprecation altogether?

Performance benchmark:
```python
import timeit
import psutil
import xarray

a = xarray.DataArray([1, 2], dims=['x'], coords={'x': [10, 20]})
RUNS = 10000
t = timeit.timeit(""a.roll(x=1, roll_coords=True)"", globals=globals(), number=RUNS)
print(""{:.0f} us"".format(t / RUNS * 1e6))

p = psutil.Process()
N = 100000
rss0 = p.memory_info().rss
x = [
    xarray.DataArray([1, 2], dims=['x'], coords={'x': [10, 20]})
    for _ in range(N)
]
rss1 = p.memory_info().rss
print(""{:.0f} bytes"".format((rss1 - rss0) / N))
```
Output:

| test          | env | master     | slots      |
|:-------------:|:---:|:----------:| ----------:|
| DataArray.roll | py35-min | 332 us     | 360 us     |
|  DataArray.roll | py37 | 354 us     | 337 us     |
| RAM usage of a DataArray | py35-min     | 2755 bytes | 2074 bytes |
| RAM usage of a DataArray | py37     | 1970 bytes | 1532 bytes |

The performance degradation on Python 3.5 is caused by the deprecation mechanism - see changes to common.py.

I honestly never realised that xarray objects are measured in kilobytes (vs. 32 bytes of underlying buffers!)

","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3250/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
479587855,MDExOlB1bGxSZXF1ZXN0MzA2NDQ4ODIw,3207,Annotations for .data_vars() and .coords(),6213168,closed,0,,,0,2019-08-12T11:08:45Z,2019-08-13T04:01:26Z,2019-08-12T20:49:02Z,MEMBER,,0,pydata/xarray/pulls/3207,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3207/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
479359871,MDExOlB1bGxSZXF1ZXN0MzA2Mjc0MTUz,3203,Match mypy version between CI and pre-commit hook,6213168,closed,0,,,0,2019-08-11T11:30:36Z,2019-08-12T21:03:11Z,2019-08-11T22:32:41Z,MEMBER,,0,pydata/xarray/pulls/3203,Pre-commit hook is currently failing because of an issue detected by mypy 0.720 but not by mypy 0.650,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3203/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
479359010,MDExOlB1bGxSZXF1ZXN0MzA2MjczNTY3,3202,chunk sparse arrays,6213168,closed,0,6213168,,4,2019-08-11T11:19:16Z,2019-08-12T21:02:31Z,2019-08-12T21:02:25Z,MEMBER,,0,pydata/xarray/pulls/3202,"Closes #3191

@shoyer I completely disabled wrapping in ImplicitToExplicitIndexingAdapter for sparse arrays, cupy arrays, etc.
I'm not sure if it's desirable; the chief problem is that I don't think I understand the purpose of ImplicitToExplicitIndexingAdapter to begin with... some enlightenment would be appreciated.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3202/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
478343417,MDU6SXNzdWU0NzgzNDM0MTc=,3191,DataArray.chunk() from sparse array produces malformed dask array,6213168,closed,0,,,1,2019-08-08T09:08:56Z,2019-08-12T21:02:24Z,2019-08-12T21:02:24Z,MEMBER,,,,"#3117 by @nvictus introduces support for sparse in plain xarray.
dask already supports it.

Running with:
- xarray git head
- dask 2.2.0
- numpy 1.16.4
- sparse 0.7.0
- NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1

```python
>>> import numpy, sparse, xarray, dask.array
>>> s = sparse.COO(numpy.array([1, 2]))         
>>> da1 =  dask.array.from_array(s)
>>> da1._meta
<COO: shape=(0,), dtype=int64, nnz=0, fill_value=0>
>>> da1.compute()
<COO: shape=(2,), dtype=int64, nnz=2, fill_value=0>
>>> da2 = xarray.DataArray(s).chunk().data
>>> da2._meta                                                                                                                                                                                                      
array([], dtype=int64)  # Wrong
>>> da2.compute()
RuntimeError: Cannot convert a sparse array to dense automatically. To manually densify, use the todense method.
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3191/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
478891507,MDExOlB1bGxSZXF1ZXN0MzA1OTExODA2,3197,Enforce mypy compliance in CI,6213168,closed,0,,,6,2019-08-09T09:29:55Z,2019-08-11T08:49:02Z,2019-08-10T09:48:33Z,MEMBER,,0,pydata/xarray/pulls/3197,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3197/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
478969353,MDExOlB1bGxSZXF1ZXN0MzA1OTc1NjM4,3198,Ignore example.grib.0112.idx,6213168,closed,0,,,0,2019-08-09T12:47:12Z,2019-08-09T12:49:02Z,2019-08-09T12:48:08Z,MEMBER,,0,pydata/xarray/pulls/3198,"``open_dataset(""<name>.grib"", engine=""cfgrib"")`` creates a new file in the same directory called ``<name>.grib.<numbers>.idx``","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3198/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
477814538,MDExOlB1bGxSZXF1ZXN0MzA1MDYyMzUw,3190,pyupgrade one-off run,6213168,closed,0,,,2,2019-08-07T09:32:57Z,2019-08-09T08:50:22Z,2019-08-07T17:26:01Z,MEMBER,,0,pydata/xarray/pulls/3190,"A one-off, manually vetted and tweaked run of pyupgrade","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3190/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
476218350,MDExOlB1bGxSZXF1ZXN0MzAzODE4ODg3,3177,More annotations,6213168,closed,0,,,6,2019-08-02T14:49:50Z,2019-08-09T08:50:13Z,2019-08-06T01:19:36Z,MEMBER,,0,pydata/xarray/pulls/3177,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3177/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
202423683,MDU6SXNzdWUyMDI0MjM2ODM=,1224,fast weighted sum,6213168,closed,0,,,5,2017-01-23T00:29:19Z,2019-08-09T08:36:11Z,2019-08-09T08:36:11Z,MEMBER,,,,"In my project I'm struggling with weighted sums of 2000-4000 dask-based xarrays. The time to reach the final dask-based array, the size of the final dask dict, and the time to compute the actual result are horrendous.

So I wrote the below which - as laborious as it may look - gives a performance boost nothing short of miraculous. At the bottom you'll find some benchmarks as well.

https://gist.github.com/crusaderky/62832a5ffc72ccb3e0954021b0996fdf

In my project, this deflated the size of the final dask dict from 5.2 million keys to 3.3 million and cut a 30% from the time required to define it.

I think it's generic enough to be a good addition to the core xarray module. Impressions?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1224/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
466750687,MDU6SXNzdWU0NjY3NTA2ODc=,3092,black formatting,6213168,closed,0,,,14,2019-07-11T08:43:55Z,2019-08-08T22:34:53Z,2019-08-08T22:34:53Z,MEMBER,,,,"I, like many others, have irreversibly fallen in love with black.
Can we apply it to the existing codebase and as an enforced CI test?
The only (big) problem is that developers will need to manually apply it to any open branches and then merge from master - and even then, merging likely won't be trivial.
How did the dask project tackle the issue?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3092/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
475599589,MDU6SXNzdWU0NzU1OTk1ODk=,3174,CI failure downloading external data,6213168,closed,0,,,2,2019-08-01T10:21:36Z,2019-08-07T08:41:13Z,2019-08-07T08:41:13Z,MEMBER,,,,"The 'Docs' ci project is failing because http://naciscdn.org is unresponsive:

https://dev.azure.com/xarray/xarray/_build/results?buildId=408&view=logs&jobId=7e620c85-24a8-5ffa-8b1f-642bc9b1fc36

Excerpt:
```
/usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/cartopy/io/__init__.py:260: DownloadWarning: Downloading: http://naciscdn.org/naturalearth/110m/physical/ne_110m_coastline.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)

Exception occurred:
  File ""/usr/share/miniconda/envs/xarray-tests/lib/python3.7/urllib/request.py"", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>
The full traceback has been saved in /tmp/sphinx-err-nq73diee.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!
##[error]Bash exited with code '2'.
##[section]Finishing: Build HTML docs
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3174/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
466886456,MDExOlB1bGxSZXF1ZXN0Mjk2NjQ1MTgy,3095,Fix regression: IndexVariable.copy(deep=True) casts dtype=U to object,6213168,closed,0,,,6,2019-07-11T13:16:16Z,2019-08-02T14:37:52Z,2019-08-02T14:02:50Z,MEMBER,,0,pydata/xarray/pulls/3095," - [x] Closes #3094
 - [x] Tests added
 - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3095/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
466815556,MDU6SXNzdWU0NjY4MTU1NTY=,3094,REGRESSION: copy(deep=True) casts unicode indices to object,6213168,closed,0,,,3,2019-07-11T10:46:28Z,2019-08-02T14:02:50Z,2019-08-02T14:02:50Z,MEMBER,,,,"Dataset.copy(deep=True) and DataArray.copy (deep=True/False) accidentally cast IndexVariable's with dtype='<U*' to object. Same applies to copy.copy() and copy.deepcopy().

This is a regression in xarray >= 0.12.2. xarray 0.12.1 and earlier are unaffected.

```

In [1]: ds = xarray.Dataset(
   ...:     coords={'x': ['foo'], 'y': ('x', ['bar'])},
   ...:     data_vars={'z': ('x', ['baz'])})                                                              

In [2]: ds                                                                                                                                                                                                                     
Out[2]: 
<xarray.Dataset>
Dimensions:  (x: 1)
Coordinates:
  * x        (x) <U3 'foo'
    y        (x) <U3 'bar'
Data variables:
    z        (x) <U3 'baz'

In [3]: ds.copy()                                                                                                                                                                                                              
Out[3]: 
<xarray.Dataset>
Dimensions:  (x: 1)
Coordinates:
  * x        (x) <U3 'foo'
    y        (x) <U3 'bar'
Data variables:
    z        (x) <U3 'baz'

In [4]: ds.copy(deep=True)                                                                                                                                                                                                     
Out[4]: 
<xarray.Dataset>
Dimensions:  (x: 1)
Coordinates:
  * x        (x) object 'foo'
    y        (x) <U3 'bar'
Data variables:
    z        (x) <U3 'baz'

In [5]: ds.z                                                                                                                                                                                                                   
Out[5]: 
<xarray.DataArray 'z' (x: 1)>
array(['baz'], dtype='<U3')
Coordinates:
  * x        (x) <U3 'foo'
    y        (x) <U3 'bar'

In [6]: ds.z.copy()                                                                                                                                                                                                            
Out[6]: 
<xarray.DataArray 'z' (x: 1)>
array(['baz'], dtype='<U3')
Coordinates:
  * x        (x) object 'foo'
    y        (x) <U3 'bar'

In [7]: ds.z.copy(deep=True)                                                                                                                                                                                                   
Out[7]: 
<xarray.DataArray 'z' (x: 1)>
array(['baz'], dtype='<U3')
Coordinates:
  * x        (x) object 'foo'
    y        (x) <U3 'bar'
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3094/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
475571573,MDExOlB1bGxSZXF1ZXN0MzAzMjk0OTEx,3173,Fix distributed.Client.compute applied to DataArray,6213168,closed,0,,,1,2019-08-01T09:22:39Z,2019-08-02T05:04:51Z,2019-08-01T21:43:11Z,MEMBER,,0,pydata/xarray/pulls/3173,"<!-- Feel free to remove check-list items aren't relevant to your change -->

 - [x] Closes #3171
 - [x] Tests added
 - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3173/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
475244610,MDU6SXNzdWU0NzUyNDQ2MTA=,3171,distributed.Client.compute fails on DataArray,6213168,closed,0,,,2,2019-07-31T16:33:01Z,2019-08-01T21:43:11Z,2019-08-01T21:43:11Z,MEMBER,,,,"As of
- dask 2.1.0
- distributed 2.1.0
- xarray 0.12.1 or git head (didn't try older versions):

```python
>>> import xarray
>>> import distributed
>>> client = distributed.Client(set_as_default=False)
>>> ds = xarray.Dataset({'d': ('x', [1, 2])}).chunk(1)
>>> client.compute(ds).result()
<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    d        (x) int64 1 2

>>> client.compute(ds.d).result()
distributed.worker - WARNING -  Compute Failed
Function:  _dask_finalize
args:      ([[array([1]), array([2])]], <function Dataset._dask_postcompute at 0x316a1db70>, ([(True, <this-array>, (<function Variable._dask_finalize at 0x3168f7f28>, (<function finalize at 0x1166bb8c8>, (), ('x',), OrderedDict(), None)))], set(), {'x': 2}, None, None, None, None), 'd')
kwargs:    {}
Exception: KeyError(<this-array>)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-8-2dbfe1b2ff17> in <module>
----> 1 client.compute(ds.d).result()

/anaconda3/lib/python3.7/site-packages/distributed/client.py in result(self, timeout)
    226         result = self.client.sync(self._result, callback_timeout=timeout, raiseit=False)
    227         if self.status == ""error"":
--> 228             six.reraise(*result)
    229         elif self.status == ""cancelled"":
    230             raise result

/anaconda3/lib/python3.7/site-packages/six.py in reraise(tp, value, tb)
    690                 value = tp()
    691             if value.__traceback__ is not tb:
--> 692                 raise value.with_traceback(tb)
    693             raise value
    694         finally:

~/PycharmProjects/xarray/xarray/core/dataarray.py in _dask_finalize()
    706     def _dask_finalize(results, func, args, name):
    707         ds = func(results, *args)
--> 708         variable = ds._variables.pop(_THIS_ARRAY)
    709         coords = ds._variables
    710         return DataArray(variable, coords, name=name, fastpath=True)
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3171/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
467756080,MDExOlB1bGxSZXF1ZXN0Mjk3MzQwNTEy,3112,More annotations in Dataset,6213168,closed,0,,,10,2019-07-13T19:06:49Z,2019-08-01T10:41:51Z,2019-07-31T17:48:00Z,MEMBER,,0,pydata/xarray/pulls/3112,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3112/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
252548859,MDU6SXNzdWUyNTI1NDg4NTk=,1524,(trivial) xarray.quantile silently resolves dask arrays,6213168,closed,0,,,9,2017-08-24T09:54:11Z,2019-07-23T00:18:06Z,2017-08-28T17:31:57Z,MEMBER,,,,"In variable.py, line 1116, you're missing a raise statement:

```
        if isinstance(self.data, dask_array_type):
            TypeError(""quantile does not work for arrays stored as dask ""
                      ""arrays. Load the data via .compute() or .load() prior ""
                      ""to calling this method."")
```

Currently looking into extending dask.percentile() to support more than 1D arrays, and then use it in xarray too.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1524/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
466765652,MDExOlB1bGxSZXF1ZXN0Mjk2NTQ1MjA4,3093,Increase minimum Python version to 3.5.3,6213168,closed,0,,,2,2019-07-11T09:12:02Z,2019-07-13T23:54:48Z,2019-07-13T21:58:31Z,MEMBER,,0,pydata/xarray/pulls/3093,"Closes #3089
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3093/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
465984161,MDU6SXNzdWU0NjU5ODQxNjE=,3089,Python 3.5.0-3.5.1 support,6213168,closed,0,,,5,2019-07-09T21:04:28Z,2019-07-13T21:58:31Z,2019-07-13T21:58:31Z,MEMBER,,,,"Python 3.5.0 has gone out of the conda-forge repository. 3.5.1 is still there... for now.
The anaconda repository starts directly from 3.5.4.
3.5.0 and 3.5.1 are a colossal pain in the back for typing support.
Is this a good time to increase the requirement to >= 3.5.2?
I honestly can't think how anybody could be unable to upgrade to the latest available 3.5 with minimal effort...","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3089/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
264517839,MDU6SXNzdWUyNjQ1MTc4Mzk=,1625,Option for arithmetics to ignore nans created by alignment,6213168,closed,0,,,3,2017-10-11T09:33:34Z,2019-07-11T09:48:07Z,2019-07-11T09:48:07Z,MEMBER,,,,"Can anybody tell me if there is anybody who benefits from this behaviour? I can't think of any good use cases.

```
wallet = xarray.DataArray([50, 70], dims=['currency'], coords={'currency': ['EUR', 'USD']})
restaurant_bill = xarray.DataArray([30], dims=['currency'], coords={'currency': ['USD']})
with xarray.set_options(arithmetic_join=""outer""):
    print(wallet - restaurant_bill)

<xarray.DataArray (currency: 2)>
array([ nan,  40.])
Coordinates:
  * currency  (currency) object 'EUR' 'USD'
```

While it is fairly clear why it can be desirable to have ``nan + not nan = nan`` as a default in arithmetic when the nan is already present in one of the input arrays, when the nan is introduced as part of an automatic align things become much less intuitive.

Proposal:
- add a parameter to ``xarray.align``, ``fillvalue=numpy.nan``, which determines what will appear in the newly created array elements
- change \_\_add\_\_, \_\_sub\_\_ etc. to invoke ``xarray.align(fillvalue=0)``
- change \_\_mul\_\_, \_\_truediv\_\_ etc. to invoke ``xarray.align(fillvalue=1)``

In theory the setting could be left as an opt-in as ``set_options(arithmetic_align_fillvalue='neutral')``, yet I wonder who would actually want the current behaviour?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1625/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
466004569,MDExOlB1bGxSZXF1ZXN0Mjk1OTM1Nzg2,3090,WIP: more annotations,6213168,closed,0,,,3,2019-07-09T22:02:44Z,2019-07-11T08:40:34Z,2019-07-11T04:20:56Z,MEMBER,,0,pydata/xarray/pulls/3090,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3090/reactions"", ""total_count"": 2, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 2, ""eyes"": 0}",,,13221727,pull
442159309,MDExOlB1bGxSZXF1ZXN0Mjc3MzMxMjQx,2950,Base classes in Python 3 don't need to subclass object,6213168,closed,0,,,3,2019-05-09T10:14:38Z,2019-07-09T20:06:21Z,2019-05-09T16:01:37Z,MEMBER,,0,pydata/xarray/pulls/2950,"<!-- Feel free to remove check-list items aren't relevant to your change -->

 - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2950/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
462401539,MDExOlB1bGxSZXF1ZXN0MjkzMTAxODQx,3065,kwargs.pop() cleanup,6213168,closed,0,,,7,2019-06-30T12:47:07Z,2019-07-09T20:06:13Z,2019-07-01T01:58:50Z,MEMBER,,0,pydata/xarray/pulls/3065,"- Clean up everywhere the pattern
```
def my_func(*args, **kwargs):
    my_optional_arg = kwargs.pop('my_optional_arg', None)
```
which was inherited from not being able to put named keyword arguments after ``*args`` in Python 2.

- Fix bug in SplineInterpolator where the ``__init__`` method would write to the class attributes of BaseInterpolator.
- ``map_dataarray`` was unintentionally and subtly relying on ``_process_cmap_cbar_kwargs`` to modify the kwargs in place. ``_process_cmap_cbar_kwargs`` is now strictly read-only and the modifications in kwargs have been made explicit in the caller function.
- Rename all 'kwds' to 'kwargs' for sake of coherency","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3065/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
464929212,MDExOlB1bGxSZXF1ZXN0Mjk1MDg4MjMx,3088,More annotations,6213168,closed,0,,,3,2019-07-07T08:40:15Z,2019-07-09T20:04:37Z,2019-07-09T16:23:12Z,MEMBER,,0,pydata/xarray/pulls/3088,"A little incremental addition to type annotations. By no means complete, but it should be ready for merge in its own right nonetheless.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3088/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
438421176,MDExOlB1bGxSZXF1ZXN0Mjc0NDU1ODQz,2929,Typing for DataArray/Dataset,6213168,closed,0,,,25,2019-04-29T17:19:35Z,2019-06-30T10:08:39Z,2019-06-25T22:03:40Z,MEMBER,,0,pydata/xarray/pulls/2929,"Status:
* I'm generally not pleased with the amount of added verbosity. Happy to accept suggestions on how to improve.
* Switching all variable names from str to Hashable. Without proper unit tests however (out of scope) non-string hashables are expected not to work most of the times. My preference would still be to stay limited on str...
* DataArray done.
* Dataset not done (except where it was hindering DataArray).
* mypy passes with the only error ``""Mapping[...]"" has no attribute ""copy""``. This is due to the fact that I can't see a way to use ``typing.OrderedDict`` without breaking compatibility with python < 3.7.2.
* py.test should be successful

@shoyer any early feedback is appreciated","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2929/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
430214243,MDExOlB1bGxSZXF1ZXN0MjY4MTUyODIw,2877,WIP: type annotations,6213168,closed,0,,,12,2019-04-08T00:55:31Z,2019-04-24T14:54:07Z,2019-04-10T18:41:50Z,MEMBER,,0,pydata/xarray/pulls/2877,Fixes #2869,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2877/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
341355638,MDU6SXNzdWUzNDEzNTU2Mzg=,2289,DataArray.to_csv(),6213168,closed,0,,,6,2018-07-15T21:56:20Z,2019-03-12T15:01:18Z,2019-03-12T15:01:18Z,MEMBER,,,,"I'm using xarray to aggregate 38 GB worth of NetCDF data into a bunch of CSV reports.
I have two problems:

1. The reports are 500,000 rows by 2,000 columns. Before somebody says ""if you're using CSV for this size of data you're doing it wrong"" - yes, I know, but it was the only way to make the data accessible to a bunch of people that only know how to use Excel and VBA. :tired_face:
The sheer size of the reports means that (1) it's unsavory to keep the whole thing in RAM (2) pandas to_csv will take ages to complete (as it's single-threaded). The slowness is compounded by the fact that I have to compress everything with gzip.
2. I have to produce up to 40 reports from the exact same NetCDF files. I use dask to perform the computation, and different reports share a large amount of intermediate graph nodes. So I need to do everything in a single invocation to ``dask.compute()`` to allow the dask scheduler to de-duplicate the nodes.

To solve both problems, I wrote a new function:
http://xarray-extras.readthedocs.io/en/latest/api/csv.html

And now my high level wrapper code looks like this:
```
# DataSet from 200 .nc files, with a total of 500000 points on the 'row' dimension
nc = xarray.open_mfdataset('inputs.*.nc')
reports = [
    # DataArrays with shape (500000, 2000), with the rows split in 200 chunks
    gen_report0(nc),
    gen_report1(nc),
    ....
    gen_report39(nc),
]
futures = [
    # dask.delayed objects
    to_csv(reports[0], 'report0.csv.gz', compression='gzip'),
    to_csv(reports[1], 'report1.csv.gz', compression='gzip'),
    ....
    to_csv(reports[39], 'report39.csv.gz', compression='gzip'),
]
dask.compute(*futures)
```
The function is currently production quality in xarray-extras, but it would be very easy to refactor it as a method of xarray.DataArray in the main library.

Opinions?
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2289/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
166439490,MDU6SXNzdWUxNjY0Mzk0OTA=,906,unstack() sorts data alphabetically,6213168,closed,0,,,14,2016-07-19T21:25:26Z,2019-02-23T12:47:00Z,2019-02-23T12:47:00Z,MEMBER,,,,"DataArray.unstack() sorts the data alphabetically by label.
Besides being poor for performance, this is very problematic whenever the order matters, and the labels are not in alphabetical order to begin with.

``` python

import xarray
import pandas

index = [
    ['x1', 'first' ],
    ['x1', 'second'],
    ['x1', 'third' ],
    ['x1', 'fourth'],
    ['x0', 'first' ],
    ['x0', 'second'],
    ['x0', 'third' ],
    ['x0', 'fourth'],
]
index = pandas.MultiIndex.from_tuples(index, names=['x', 'count'])
s = pandas.Series(list(range(8)), index)
a = xarray.DataArray(s)
a
```

```
<xarray.DataArray (dim_0: 8)>
array([0, 1, 2, 3, 4, 5, 6, 7], dtype=int64)
Coordinates:
  * dim_0    (dim_0) object ('x1', 'first') ('x1', 'second') ('x1', 'third') ...
```

``` python
a.unstack('dim_0')
```

```
<xarray.DataArray (x: 2, count: 4)>
array([[4, 7, 5, 6],
       [0, 3, 1, 2]], dtype=int64)
Coordinates:
  * x        (x) object 'x0' 'x1'
  * count    (count) object 'first' 'fourth' 'second' 'third'
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/906/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
168469112,MDU6SXNzdWUxNjg0NjkxMTI=,926,stack() on dask array produces inefficient chunking,6213168,closed,0,,,4,2016-07-30T14:12:34Z,2019-02-01T16:04:43Z,2019-02-01T16:04:43Z,MEMBER,,,,"Whe the stack() method is used on a xarray with dask backend, one would expect that every output chunk is produced by exactly 1 input chunk.

This is not the case, as stack() actually produces an extremely fragmented dask array:
https://gist.github.com/crusaderky/07991681d49117bfbef7a8870e3cba67
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/926/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
193294729,MDU6SXNzdWUxOTMyOTQ3Mjk=,1152,Scalar coords seep into index coords,6213168,closed,0,,,8,2016-12-03T15:43:53Z,2019-02-01T16:02:12Z,2019-02-01T16:02:12Z,MEMBER,,,,"Is this by design? I can't put any sense in it
```
>> a = xarray.DataArray([1, 2, 3], dims=['x'], coords={'x': [1, 2, 3], 'y': 10})
>> a.coords['x']
<xarray.DataArray 'x' (x: 3)>
array([1, 2, 3])
Coordinates:
  * x        (x) int64 1 2 3
    y        int64 10
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1152/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
172291585,MDU6SXNzdWUxNzIyOTE1ODU=,979,align() should align chunks,6213168,closed,0,,,4,2016-08-20T21:25:01Z,2019-01-24T17:19:30Z,2019-01-24T17:19:30Z,MEMBER,,,,"In the xarray docs I read

> With the current version of dask, there is no automatic alignment of chunks when performing operations between dask arrays with different chunk sizes. If your computation involves multiple dask arrays with different chunks, you may need to explicitly rechunk each array to ensure compatibility. 

While chunk auto-alignment could be done within the dask library, that would be limited to arrays with the same dimensionality and same dims order. For example it would not be possible to have a dask library call to align the chunks on xarrays with the following dims:
- (time, latitude, longitude)
- (time)
- (longitude, latitude)

even if it makes perfect sense in xarray.

I think xarray.align() should take care of it automatically.

A safe algorithm would be to always scale down the chunksize when in conflict. This would prevent having chunks larger than expected, and should minimise (in a greedy way) the number of operations. It's also a good idea on dask.distributed, where merging two chunks could cause one of them to travel on the network - which is very expensive.

e.g. to reconcile chunksizes 
a: (5, 10, 6) 
b: (5, 7, 9)
the algorithm would rechunk both arrays to (5, 7, 3, 6).

Finally, when served with a numpy-based array and a dask-based array, align() should convert the numpy array to dask. The critical use case that would benefit from this behaviour is when align() is invoked inside a broadcast() between a tiny constant you just loaded from csv/pandas/pure python list/whatever - e.g. dims=(time, ) shape=(100, ) - and a huge dask-backed array e.g. dims=(time, scenario) shape=(100, 2\*\*30) chunks=(25, 2\*\*20).
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/979/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
296927704,MDU6SXNzdWUyOTY5Mjc3MDQ=,1909,Failure in test_cross_engine_read_write_netcdf3,6213168,closed,0,,,3,2018-02-13T23:48:44Z,2019-01-13T20:56:14Z,2019-01-13T20:56:14Z,MEMBER,,,,"Two unit tests are failing in the latest git master:
- GenericNetCDFDataTest.test_cross_engine_read_write_netcdf3
- GenericNetCDFDataTestAutocloseTrue.test_cross_engine_read_write_netcdf3

Both with the message:

```
xarray/tests/test_backends.py:1558: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
xarray/backends/api.py:286: in open_dataset
    autoclose=autoclose)
xarray/backends/netCDF4_.py:275: in open
    ds = opener()
xarray/backends/netCDF4_.py:199: in _open_netcdf4_group
    ds = nc4.Dataset(filename, mode=mode, **kwargs)
netCDF4/_netCDF4.pyx:2015: in netCDF4._netCDF4.Dataset.__init__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   OSError: [Errno -36] NetCDF: Invalid argument: b'/tmp/tmpwp675lnc/temp-1069.nc'

netCDF4/_netCDF4.pyx:1636: OSError
```

Attaching conda list: [conda.txt](https://github.com/pydata/xarray/files/1722111/conda.txt)



","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1909/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
311578894,MDU6SXNzdWUzMTE1Nzg4OTQ=,2040,to_netcdf() to automatically switch to fixed-length strings for compressed variables ,6213168,open,0,,,2,2018-04-05T11:50:16Z,2019-01-13T01:42:03Z,,MEMBER,,,,"When you have fixed-length numpy arrays of unicode characters (<U...) in a dataset, and you invoke to_netcdf() without any particular encoding, they are automatically stored as variable-length strings, unless you explicitly specify ``{'dtype': 'S1'}``.

Is this in order to save disk space in case strings vary wildly in size? I *may* be able to see the point in this case.
However, this approach is disastrous if variables are compressed, as any compression algorithm will reduce the zero-panning at the end of the strings to a negligible size.

My test data: a dataset with \~50 variables, of which half are strings of 10\~100 english characters and the other half are floats, all on a single dimension with 12k points.

Test 1:
```
ds.to_netcdf('uncompressed.nc')
```
Result: 45MB

Test 2:
```
encoding = {k: {'gzip': True, 'shuffle': True} for k in ds.variables}
ds.to_netcdf('bad-compression.nc', encoding=encoding)
```
Result: 42MB

Test 3:
```
encoding = {}
for k, v in ds.variables.items():
    encoding[k] = {'gzip': True, 'shuffle': True}
    if v.dtype.kind == 'U':
        encoding[k]['dtype'] = 'S1'
ds.to_netcdf('good-compression.nc', encoding=encoding)
```
Result: 5MB

# Proposal
In case of string variables, if no dtype is explicitly defined, to_netcdf() should dynamically assign it to S1 if compression is enabled, str if disabled.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2040/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
330473082,MDU6SXNzdWUzMzA0NzMwODI=,2219,to_netcdf broken encoding: dtype='S1' + chunksizes,6213168,open,0,,,2,2018-06-07T23:46:13Z,2019-01-13T01:38:51Z,,MEMBER,,,,"```
xarray.Dataset({'x': ['foo', 'bar', 'baz']}).to_netcdf(
    'foo.nc', engine='h5netcdf',
    encoding={'x': {'dtype': 'S1', 'zlib': True, 'chunksizes': (2, )}})

ValueError: ""chunks"" must have same rank as dataset shape
```
Same with ``engine='netcdf4'``. The issue is present in 0.10.6 as well as in 0.10.3.
The problem is obviously that dtype=S1 changes the shape of the variable before passing it to the backend, but while doing so doesn't also change an eventual chunksizes setting.

The workaround is to omit chunksizes or set it to True.

","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2219/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
339611449,MDU6SXNzdWUzMzk2MTE0NDk=,2273,to_netcdf uses deprecated and unnecessary dask call,6213168,closed,0,,,4,2018-07-09T21:20:20Z,2018-07-31T20:03:41Z,2018-07-31T19:42:20Z,MEMBER,,,,"```
>>> ds = xarray.Dataset({'x': 1})
>>> ds.to_netcdf('foo.nc')
dask/utils.py:1010: UserWarning: Deprecated, see dask.base.get_scheduler instead
```

Stack trace:
```
> xarray/backends/common.py(44)get_scheduler()
     43         from dask.utils import effective_get
---> 44         actual_get = effective_get(get, collection)
```
There are two separate problems here:

- dask recently changed API from ``get(get=callable)`` to ``get(scheduler=str)``. Should we
   - just increase the minimum version of dask (I doubt anybody will complain)
   - go through the hoops of dynamically invoking a different API depending on the dask version :sweat:
   - silence the warning now, and then increase the minimum version of dask the day that dask removes the old API entirely (risky)?
- xarray is calling dask even when it's unnecessary, as none of the variables in the example Dataset had a dask backend.
I don't think there are any CI suites for NetCDF without dask. I'm also wondering if they would bring any actual added value, as dask is small, has no exotic dependencies, and is pure Python; so I doubt anybody will have problems installing it whatever his setup is.

@shoyer opinion?
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2273/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue