id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1216178209,PR_kwDOAMm_X8420Do_,6516,Use new importlib.metadata.entry_points interface where available,367900,closed,0,,,1,2022-04-26T16:06:35Z,2022-04-27T06:01:08Z,2022-04-27T01:07:51Z,CONTRIBUTOR,,0,pydata/xarray/pulls/6516,"With Python 3.10, the entry_points() method returning a SelectableGroups dict interface was deprecated. The preferred way is to now filter by group through a keyword argument.

- [x] Closes #6514.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6516/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
856900805,MDU6SXNzdWU4NTY5MDA4MDU=,5148,Handling of non-string dimension names,367900,open,0,,,5,2021-04-13T12:13:44Z,2022-04-09T01:36:19Z,,CONTRIBUTOR,,,,"While working on a pull request (#5149) for #5146 I came across an inconsistency in allowed dimension names. If I try and create a DataArray with a non-string dimension, I get a TypeError:

```python console
>>> import xarray as xr
>>> da = xr.DataArray(np.ones((5, 5)), dims=[1, ""y""])
...
TypeError: dimension 1 is not a string
```

But creating it with a string and renaming it works:

```python console
>>> da = xr.DataArray(np.ones((5, 5)), dims=[""x"", ""y""]).rename(x=1)
>>> da
<xarray.DataArray (1: 5, y: 5)>
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
Dimensions without coordinates: 1, y
```

I can create a dataset via this renaming, but trying to get the repr value fails as `xarray.core.utils.SortedKeysDict` tries to sort it and cannot compare the string dimension to the int dimension:

```python console
>>> import xarray as xr
>>> ds = xr.Dataset({""test"": xr.DataArray(np.ones((5, 5)), dims=[""x"", ""y""]).rename(x=1)})
>>> ds
...
~/software/external/xarray/xarray/core/formatting.py in dataset_repr(ds)
    519 
    520     dims_start = pretty_print(""Dimensions:"", col_width)
--> 521     summary.append(""{}({})"".format(dims_start, dim_summary(ds)))
    522 
    523     if ds.coords:

~/software/external/xarray/xarray/core/formatting.py in dim_summary(obj)
    422 
    423 def dim_summary(obj):
--> 424     elements = [f""{k}: {v}"" for k, v in obj.sizes.items()]
    425     return "", "".join(elements)
    426 

~/software/external/xarray/xarray/core/formatting.py in <listcomp>(.0)
    422 
    423 def dim_summary(obj):
--> 424     elements = [f""{k}: {v}"" for k, v in obj.sizes.items()]
    425     return "", "".join(elements)
    426 

/usr/lib/python3.9/_collections_abc.py in __iter__(self)
    847 
    848     def __iter__(self):
--> 849         for key in self._mapping:
    850             yield (key, self._mapping[key])
    851 

~/software/external/xarray/xarray/core/utils.py in __iter__(self)
    437 
    438     def __iter__(self) -> Iterator[K]:
--> 439         return iter(self.mapping)
    440 
    441     def __len__(self) -> int:

~/software/external/xarray/xarray/core/utils.py in __iter__(self)
    504     def __iter__(self) -> Iterator[K]:
    505         # see #4571 for the reason of the type ignore
--> 506         return iter(sorted(self.mapping))  # type: ignore[type-var]
    507 
    508     def __len__(self) -> int:

TypeError: '<' not supported between instances of 'str' and 'int'
```
The same thing happens if I call rename on the dataset rather than the array it is initialised with.

If the initialiser requires the dimension names to be strings, and other code (which includes the HTML formatter I was looking at when I found this) assume that they are, then `rename` and any other method which can alter dimension names should also enforce the string requirement.

**Environment**:

<details><summary>Output of <tt>xr.show_versions()</tt></summary>

INSTALLED VERSIONS
------------------
commit: 851d85b9203b49039237b447b3707b270d613db5
python: 3.9.2 (default, Feb 20 2021, 18:40:11) 
[GCC 10.2.0]
python-bits: 64
OS: Linux
OS-release: 5.11.13-arch1-1
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: en_NZ.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.17.0
pandas: 1.2.3
numpy: 1.20.1
scipy: 1.6.2
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.10.0
h5py: 3.2.1
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.2
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.03.0
distributed: 2021.03.0
matplotlib: 3.4.1
cartopy: 0.18.0
seaborn: 0.11.1
numbagg: None
pint: None
setuptools: 54.2.0
pip: 20.3.1
conda: None
pytest: 6.2.3
IPython: 7.22.0
sphinx: 3.5.4
</details>","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5148/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
856728083,MDU6SXNzdWU4NTY3MjgwODM=,5146,HTML formatting of non-string attribute names fails,367900,closed,0,,,3,2021-04-13T08:36:31Z,2021-11-11T18:21:31Z,2021-11-11T18:21:31Z,CONTRIBUTOR,,,,"Working in a notebook (and presumably, anywhere else that uses the HTML formatter to show an array), non-string attribute keys cause an exception. The output then falls back to the repr value.

```python console
In [1]: import xarray as xr

In [2]: data = xr.DataArray([1, 2, 3], attrs={1: 3.14})

In [3]: data.attrs
Out[3]: {1: 3.14}

In [4]: data
```
```python traceback
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/usr/lib/python3.9/site-packages/IPython/core/formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

~/software/external/xarray/xarray/core/common.py in _repr_html_(self)
    148         if OPTIONS[""display_style""] == ""text"":
    149             return f""<pre>{escape(repr(self))}</pre>""
--> 150         return formatting_html.array_repr(self)
    151 
    152     def _iter(self: Any) -> Iterator[Any]:

~/software/external/xarray/xarray/core/formatting_html.py in array_repr(arr)
    269         sections.append(coord_section(arr.coords))
    270 
--> 271     sections.append(attr_section(arr.attrs))
    272 
    273     return _obj_repr(arr, header_components, sections)

~/software/external/xarray/xarray/core/formatting_html.py in _mapping_section(mapping, name, details_func, max_items_collapse, enabled)
    171     return collapsible_section(
    172         name,
--> 173         details=details_func(mapping),
    174         n_items=n_items,
    175         enabled=enabled,

~/software/external/xarray/xarray/core/formatting_html.py in summarize_attrs(attrs)
     47 
     48 def summarize_attrs(attrs):
---> 49     attrs_dl = """".join(
     50         f""<dt><span>{escape(k)} :</span></dt>"" f""<dd>{escape(str(v))}</dd>""
     51         for k, v in attrs.items()

~/software/external/xarray/xarray/core/formatting_html.py in <genexpr>(.0)
     48 def summarize_attrs(attrs):
     49     attrs_dl = """".join(
---> 50         f""<dt><span>{escape(k)} :</span></dt>"" f""<dd>{escape(str(v))}</dd>""
     51         for k, v in attrs.items()
     52     )

/usr/lib/python3.9/html/__init__.py in escape(s, quote)
     17     translated.
     18     """"""
---> 19     s = s.replace(""&"", ""&amp;"") # Must be done first!
     20     s = s.replace(""<"", ""&lt;"")
     21     s = s.replace("">"", ""&gt;"")

AttributeError: 'int' object has no attribute 'replace'
```
```python console
Out[4]: <xarray.DataArray (dim_0: 3)>
        array([1, 2, 3])
        Dimensions without coordinates: dim_0
        Attributes:
            1:        3.14
```

**Environment**:

<details><summary>Output of <tt>xr.show_versions()</tt></summary>

INSTALLED VERSIONS
------------------
commit: c91983d4765b23e0474231c85057d31f9b6b2f33
python: 3.9.2 (default, Feb 20 2021, 18:40:11) 
[GCC 10.2.0]
python-bits: 64
OS: Linux
OS-release: 5.11.13-arch1-1
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: en_NZ.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.17.0
pandas: 1.2.3
numpy: 1.20.1
scipy: 1.6.2
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.10.0
h5py: 3.2.1
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.2
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.03.0
distributed: 2021.03.0
matplotlib: 3.4.1
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 54.2.0
pip: 20.3.1
conda: None
pytest: 6.2.3
IPython: 7.22.0
sphinx: 3.5.4
</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5146/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
567678992,MDU6SXNzdWU1Njc2Nzg5OTI=,3781,to_netcdf() doesn't work with multiprocessing scheduler,367900,open,0,,,4,2020-02-19T16:28:22Z,2021-09-25T16:02:41Z,,CONTRIBUTOR,,,,"If I create a chunked lazily-computed array, writing it to disk with `to_netcdf()` computes and writes it with the threading and distributed schedulers, but not with the multiprocessing scheduler. The only reference I've found when searching for the exception message comes from [this StackOverflow question](https://stackoverflow.com/q/55852025).

#### MCVE Code Sample

```python
import dask
import numpy as np
import xarray as xr

if __name__ == ""__main__"":
    # Simple worker function.
    def inner(ds):
        if sum(ds.dims.values()) == 0:
            return ds
        return ds**2

    # Some random data to work with.
    ds = xr.Dataset(
            {""test"": ((""a"", ""b""), np.random.uniform(size=(1000, 1000)))},
            {""a"": np.arange(1000), ""b"": np.arange(1000)}
    )

    # Chunk it and apply the worker to each chunk.
    ds_chunked = ds.chunk({""a"": 100, ""b"": 200})
    ds_squared = ds_chunked.map_blocks(inner)

    # Thread pool scheduler can compute while writing.
    dask.config.set(scheduler=""threads"")
    print(""Writing thread pool test to disk."")
    ds_squared.to_netcdf(""test-threads.nc"")

    # Local cluster with distributed works too.
    c = dask.distributed.Client()
    dask.config.set(scheduler=c)
    print(""Writing local cluster test to disk."")
    ds_squared.to_netcdf(""test-localcluster.nc"")

    # Process pool scheduler can compute.
    dask.config.set(scheduler=""processes"")
    print(""Computing with process pool scheduler."")
    ds_squared.compute()

    # But it cannot compute while writing.
    print(""Trying to write process pool test to disk."")
    ds_squared.to_netcdf(""test-process.nc"")

```

#### Expected Output

Complete netCDF files should be created from all three schedulers.


#### Problem Description

The thread pool and distributed local cluster schedulers result in a complete output. The process pool scheduler fails when trying to write (note that test-process.nc is created with the header and coordinate information, but no actual data is written). The traceback is:

```pytb
Traceback (most recent call last):
  File ""bug.py"", line 54, in <module>
    ds_squared.to_netcdf(""test-process.nc"")
  File ""/usr/lib/python3.8/site-packages/xarray/core/dataset.py"", line 1535, in to_netcdf
    return to_netcdf(
  File ""/usr/lib/python3.8/site-packages/xarray/backends/api.py"", line 1097, in to_netcdf
    writes = writer.sync(compute=compute)
  File ""/usr/lib/python3.8/site-packages/xarray/backends/common.py"", line 198, in sync
    delayed_store = da.store(
  File ""/usr/lib/python3.8/site-packages/dask/array/core.py"", line 923, in store
    result.compute(**kwargs)
  File ""/usr/lib/python3.8/site-packages/dask/base.py"", line 165, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File ""/usr/lib/python3.8/site-packages/dask/base.py"", line 436, in compute
    results = schedule(dsk, keys, **kwargs)
  File ""/usr/lib/python3.8/site-packages/dask/multiprocessing.py"", line 212, in get
    result = get_async(
  File ""/usr/lib/python3.8/site-packages/dask/local.py"", line 494, in get_async
    fire_task()
  File ""/usr/lib/python3.8/site-packages/dask/local.py"", line 460, in fire_task
    dumps((dsk[key], data)),
  File ""/usr/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py"", line 62, in dumps
    cp.dump(obj)
  File ""/usr/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py"", line 538, in dump
    return Pickler.dump(self, obj)
  File ""/usr/lib/python3.8/multiprocessing/synchronize.py"", line 101, in __getstate__
    context.assert_spawning(self)
  File ""/usr/lib/python3.8/multiprocessing/context.py"", line 363, in assert_spawning
    raise err
RuntimeError: Lock objects should only be shared between processes through inheritance
```

With a bit of editing of the system multiprocessing module I was able to determine that the lock being reported by this exception was the first lock created. I then added a breakpoint to the Lock constructor to get a traceback of what was creating it:

| File                 | Line | Function
|----------------------|------|-------------------------
| core/dataset.py      | 1535 | Dataset.to_netcdf
| backends/api.py      | 1071 | to_netcdf
| backends/netCDF4_.py |  350 | open
| backends/locks.py    |  114 | get_write_lock
| backends/locks.py    |   39 | _get_multiprocessing_lock

This last function creates the offending multiprocessing.Lock() object. Note that there are six Locks constructed and so its possible that the later-created ones would also cause an issue.

The h5netcdf backend has the same problem with Lock. However the SciPy backend gives a NotImplementedError for this:

```python
ds_squared.to_netcdf(""test-process.nc"", engine=""scipy"")
```

```pytb
Traceback (most recent call last):
  File ""bug.py"", line 54, in <module>
    ds_squared.to_netcdf(""test-process.nc"", engine=""scipy"")
  File ""/usr/lib/python3.8/site-packages/xarray/core/dataset.py"", line 1535, in to_netcdf
    return to_netcdf(
  File ""/usr/lib/python3.8/site-packages/xarray/backends/api.py"", line 1056, in to_netcdf
    raise NotImplementedError(
NotImplementedError: Writing netCDF files with the scipy backend is not currently supported with dask's multiprocessing scheduler
```

I'm not sure how simple it would be to get this working with the multiprocessing scheduler, or how vital it is given that the distributed scheduler works. If nothing else, it would be good to get the same NotImplementedError as with the SciPy backend.

#### Output of ``xr.show_versions()``
<details>

commit: None
python: 3.8.1 (default, Jan 22 2020, 06:38:00) 
[GCC 9.2.0]
python-bits: 64
OS: Linux
OS-release: 5.5.4-arch1-1
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: en_NZ.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3

xarray: 0.15.0
pandas: 1.0.1
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.1.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: 3.1.3
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 45.2.0
pip: 19.3
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: 2.4.2
</details>","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3781/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
856901056,MDExOlB1bGxSZXF1ZXN0NjE0NDA4MzQz,5149,Convert attribute and dimension names to strings when generating HTML repr,367900,closed,0,,,5,2021-04-13T12:14:03Z,2021-05-04T03:39:00Z,2021-05-04T03:38:53Z,CONTRIBUTOR,,0,pydata/xarray/pulls/5149,"The standard repr() already handled non-string attribute names, but the HTML formatter failed when trying to escape HTML entitites in non-string names. This just calls str() before escape(). It also includes tests for Dataset, DataArray and Variable.

Reported in #5146. ~~Note that there may be a need to do the same for dimension names if they are allowed to be strings. Currently dimensions must be created as strings but can later be renamed to non-strings, see #5148.~~ Dimensions can be non-str, updated.

<!-- Feel free to remove check-list items aren't relevant to your change -->

- [x] Tests added
- [x] Passes `pre-commit run --all-files`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5149/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
853473276,MDU6SXNzdWU4NTM0NzMyNzY=,5132,Backend caching should not use a relative path,367900,closed,0,,,4,2021-04-08T13:27:03Z,2021-04-15T12:12:26Z,2021-04-15T12:12:26Z,CONTRIBUTOR,,,,"Datasets opened from disk are cached with a key based (amongst other things) on their filename. If you have the same filename in different directories, and open them after changing directory, a cache collision occurs as the filename is the same and so the first opened dataset is always returned.

**Minimal Complete Verifiable Example**:

```python
import os
from pathlib import Path
import tempfile

import numpy as np
import xarray as xr


with tempfile.TemporaryDirectory() as d:
    base = Path(d).resolve()

    # Create some data in separate directories but with same filename.
    (base / ""zeros"").mkdir()
    z_fn = base / ""zeros"" / ""data.nc""
    xr.DataArray(np.zeros((5, 5), dtype=int)).to_netcdf(z_fn)
    (base / ""ones"").mkdir()
    o_fn = base / ""ones"" / ""data.nc""
    xr.DataArray(np.ones((5, 5), dtype=int)).to_netcdf(o_fn)

    # Open with the absolute path and check we get what we expect.
    z_abs = xr.open_dataarray(z_fn)
    o_abs = xr.open_dataarray(o_fn)
    assert (z_abs == 0).all(), ""zeros with absolute path incorrect""
    assert (o_abs == 1).all(), ""zeros with absolute path incorrect""

    # Open with relative path from base directory.
    os.chdir(base)
    z_base = xr.open_dataarray(""zeros/data.nc"")
    o_base = xr.open_dataarray(""ones/data.nc"")
    assert (z_base == 0).all(), ""zeros with relative path from base incorrect""
    assert (o_base == 1).all(), ""zeros with relative path from base incorrect""

    # Open from containing directory.
    os.chdir(base / ""zeros"")
    z_local = xr.open_dataarray(""data.nc"")
    os.chdir(base / ""ones"")
    o_local = xr.open_dataarray(""data.nc"")
    assert (z_local == 0).all(), ""zeros opened from containing dir incorrect""
    assert (o_local == 1).all(), ""ones opened from containing dir incorrect""

```

**What happened**: On master, the final assertion is triggered as the cache returns the zeros array instead of the ones.

**What you expected to happen**: No assertion.

**Anything else we need to know?**:
This was introduced in 50d97e9d. I found this with the above test script (named `cache_bug.py`) with a Git bisect session:

```console
$ git bisect start master v0.16.2
Bisecting: 88 revisions left to test after this (roughly 7 steps)
[d555172c7d069ca9cf7a9a32bfd5f422be133861] Allow swap_dims to take kwargs (#4841)
$ git bisect run python cache_bug.py
...
50d97e9d35bac783850827fa66ff5eb768e62905 is the first bad commit
...
```

I then manually confirmed this by running the script on 50d97e9d and its parent.

The caching is performed by `xarray.backends.file_manager.CachingFileManager`. The obvious solution would be to use `pathlib` / `os.path` (whichever is preferred in xarray) to convert the paths to absolute before caching. For example, changing the default netCDF4 backend from

https://github.com/pydata/xarray/blob/e56905889c836c736152b11a7e6117a229715975/xarray/backends/netCDF4_.py#L375-L377

to

```python
        manager = CachingFileManager(
            netCDF4.Dataset, os.path.abspath(filename), mode=mode, kwargs=kwargs
        )
```

fixes this for me.  I guess this should be done (if needed) by each backend to keep CachingFileManager as general as possible.

If my analysis and proposed solution seems correct, I'm happy to work up a pull request with these fixes and some regression tests.

If you're wondering about the use case where I bumped into this problem: we're using Click for a CLI, and using its test helpers. One of these (``isolated_filesystem``) creates and changes into an empty temporary directory before running the CLI function under test, so we can use `open_dataset(""output.nc"")` to load the CLI output for checking. Since it does this in the same process, using a parametrized test function means the first created file is always loaded for checking. Took a while to track down what was happening!

**Environment**:

<details><summary>Output of <tt>xr.show_versions()</tt></summary>

INSTALLED VERSIONS
------------------
commit: ec4e8b5f279e28588eee8ff43a328ca6c2f89f01
python: 3.9.2 (default, Feb 20 2021, 18:40:11) 
[GCC 10.2.0]
python-bits: 64
OS: Linux
OS-release: 5.11.11-arch1-1
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: en_NZ.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.17.0
pandas: 1.2.3
numpy: 1.20.1
scipy: 1.6.2
netCDF4: 1.5.6
pydap: None
h5netcdf: 0.9.0
h5py: 3.1.0
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.1
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.03.0
distributed: 2021.03.0
matplotlib: 3.4.1
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 54.2.0
pip: 20.3.1
conda: None
pytest: 6.2.3
IPython: 7.22.0
sphinx: 3.5.2
</details>","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5132/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
808558647,MDExOlB1bGxSZXF1ZXN0NTczNTc5NzUx,4911,Fix behaviour of min_count in reducing functions,367900,closed,0,,,6,2021-02-15T13:53:34Z,2021-02-19T08:12:39Z,2021-02-19T08:12:02Z,CONTRIBUTOR,,0,pydata/xarray/pulls/4911,"The first commit modifies existing tests to check Dask-backed arrays are not computed. It also adds some specific checks that the correct result (NaN or a number as appropriate) is returned and some tests for checking membership of `xarray.core.dtypes.NAT_TYPES`. After this commit I get 89 test failures, and they seem to cover the cases reported in #4898.

The second commit fixes these failures:

* The checks of the nan mask in `xarray.core.nanops._maybe_null_out` are changed to use `np.where` which allows lazy evaluation. 

* Previously, `xarray.core.dtypes.NAT_TYPES` was a tuple of datetime64 and timedelta64 instances; I've changed it to a set of the dtypes of these instances. It is only used for the membership check in `_maybe_null_out` so a set seems appropriate. The previous use of instances rather than dtypes caused a bug -- ``np.float64 in NAT_TYPES`` returned true even though it only contained datetime64/timedelta64. This meant that reducing operations over all axes (`axis=None` or `...`) with float64 arrays ignored min_count as the membership check in `_maybe_null_out` caused it to be skipped.

- [x] Closes #4898
- [x] Tests added
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4911/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
807089005,MDU6SXNzdWU4MDcwODkwMDU=,4898,Sum and prod with min_count forces evaluation,367900,closed,0,,,5,2021-02-12T09:42:06Z,2021-02-19T08:12:02Z,2021-02-19T08:12:01Z,CONTRIBUTOR,,,,"If I use the `sum` method on a lazy array with `min_count != None` then evaluation is forced. If there is some limitation of the implementation which means it cannot be added to the computation graph for lazy evaluation then this should be mentioned in the docs.

**Minimal Complete Verifiable Example**:

```python
import numpy as np
import xarray as xr


def worker(da):
    if da.shape == (0, 0):
        return da

    raise RuntimeError(""I was evaluated"")


da = xr.DataArray(
    np.random.normal(size=(20, 500)),
    dims=(""x"", ""y""),
    coords=(np.arange(20), np.arange(500)),
)

da = da.chunk(dict(x=5))
lazy = da.map_blocks(worker)
result1 = lazy.sum(""x"", skipna=True)
result2 = lazy.sum(""x"", skipna=True, min_count=5)

```

**What happened**: ``RuntimeError: I was evaluated``

**What you expected to happen**: No output or exceptions, as the result1 and result2 arrays are not printed or saved.

**Environment**:

<details><summary>Output of <tt>xr.show_versions()</tt></summary>

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.1 (default, Feb  6 2021, 06:49:13) 
[GCC 10.2.0]
python-bits: 64
OS: Linux
OS-release: 5.10.15-arch1-1
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: en_NZ.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.16.2
pandas: 1.2.1
numpy: 1.20.0
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: None
h5netcdf: 0.9.0
h5py: 3.1.0
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.0
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2020.12.0
distributed: 2020.12.0
matplotlib: 3.3.4
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 53.0.0
pip: 20.3.1
conda: None
pytest: 6.2.1
IPython: 7.19.0
sphinx: 3.4.3

</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4898/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue