id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
279456192,MDU6SXNzdWUyNzk0NTYxOTI=,1761,Importing xarray fails if old version of bottleneck is installed,1882397,closed,0,,,5,2017-12-05T17:10:25Z,2020-02-09T21:39:48Z,2020-02-09T21:39:48Z,NONE,,,,"Importing version 0.11 of xarray fails if version 1.0.0 of Bottleneck is installed. Bottleneck is an optional dependency of xarray: at runtime xarray replaces functions with their bottleneck counterparts if bottleneck is installed, but it does not check whether the installed bottleneck version is new enough to provide those functions. The `getattr` here fails with an AttributeError in this case:

https://github.com/pydata/xarray/blob/b46fcd656391d786b8d25b0615f6d4bd30b524b7/xarray/core/ops.py#L361-L365

`AttributeError: 'module' object has no attribute 'move_argmax'`

`move_argmax` was added to bottleneck in version 1.1.0, so if version 1.0 is installed this can't work. I saw this on Python 2.7, but I don't think that should matter.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1761/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
372006204,MDU6SXNzdWUzNzIwMDYyMDQ=,2496,Incorrect conversion from sliced pd.MultiIndex,1882397,closed,0,,,2,2018-10-19T15:25:38Z,2019-02-19T09:42:52Z,2019-02-19T09:42:51Z,NONE,,,,"If we take a pandas DataFrame with a MultiIndex, slice it to remove some entries from the index, and then convert it, the resulting DataArray still contains the removed items in the coordinates (although the values are NaN).

```python
# We create an example dataframe
idx = pd.MultiIndex.from_product([list('abc'), list('xyz')])
df = pd.DataFrame(data={'col': np.random.randn(len(idx))}, index=idx)
df.columns.name = 'cols'
df.index.names = ['idx1', 'idx2']
df2 = df.loc[['a', 'b']]
```

```python
# df2 does not contain `c` in the first level
>>> df2
cols            col
idx1 idx2
a    x    -0.844476
     y    -0.845998
     z     1.965143
b    x    -0.159293
     y     0.188163
     z    -1.076204

# It still shows up in the converted xarray though:
>>> xr.DataArray(df2).unstack('dim_0')
array([[[-0.844476, -0.845998,  1.965143],
        [-0.159293,  0.188163, -1.076204],
        [      nan,       nan,       nan]]])
Coordinates:
  * cols     (cols) object 'col'
  * idx1     (idx1) object 'a' 'b' 'c'
  * idx2     (idx2) object 'x' 'y' 'z'
```

If the original dataframe is very sparse, this can lead to gigantic unnecessary memory usage.
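A possible workaround on the pandas side (a sketch of mine, not from the report above; it relies on `MultiIndex.remove_unused_levels`, available since pandas 0.20) is to drop the now-unused level values from the sliced index before converting:

```python
# Drop unused MultiIndex levels so 'c' never reaches the DataArray coordinates.
# Workaround sketch only; it does not address the underlying conversion issue.
df2 = df.loc[['a', 'b']]
df2.index = df2.index.remove_unused_levels()
xr.DataArray(df2).unstack('dim_0')  # idx1 now only contains 'a' and 'b'
```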
#### Output of ``xr.show_versions()``

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_GB.UTF-8
LANG: None
LOCALE: en_GB.UTF-8
xarray: 0.10.9
pandas: 0.23.4
numpy: 1.15.2
scipy: 1.1.0
netCDF4: 1.4.1
h5netcdf: 0.6.2
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.0b1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.19.2
distributed: 1.23.2
matplotlib: 3.0.0
cartopy: None
seaborn: 0.9.0
setuptools: 40.4.3
pip: 18.0
conda: 4.5.11
pytest: 3.8.1
IPython: 7.0.1
sphinx: 1.8.1
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2496/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
355264812,MDU6SXNzdWUzNTUyNjQ4MTI=,2389,Large pickle overhead in ds.to_netcdf() involving dask.delayed functions,1882397,closed,0,,,11,2018-08-29T17:43:28Z,2019-01-13T21:17:12Z,2019-01-13T21:17:12Z,NONE,,,,"If we use `ds.to_netcdf` to write a dask array that doesn't involve `dask.delayed` functions, there is only a little pickle overhead:

```python
vals = da.random.random(500, chunks=(1,))
ds = xr.Dataset({'vals': (['a'], vals)})
write = ds.to_netcdf('file2.nc', compute=False)
%prun -stime -l10 write.compute()
```

```
         123410 function calls (104395 primitive calls) in 13.720 seconds

   Ordered by: internal time
   List reduced from 203 to 10 due to restriction <10>

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
          8   10.032    1.254   10.032    1.254 {method 'acquire' of '_thread.lock' objects}
       1001    2.939    0.003    2.950    0.003 {built-in method _pickle.dumps}
       1001    0.614    0.001    3.569    0.004 pickle.py:30(dumps)
  6504/1002    0.012    0.000    0.021    0.000 utils.py:803(convert)
 11507/1002    0.010    0.000    0.019    0.000 utils_comm.py:144(unpack_remotedata)
       6013    0.009    0.000    0.009    0.000 utils.py:767(tokey)
  3002/1002    0.008    0.000    0.017    0.000 utils_comm.py:181()
      11512    0.007    0.000    0.008    0.000 core.py:26(istask)
       1002    0.006    0.000    3.589    0.004 worker.py:788(dumps_task)
          1    0.005    0.005    0.007    0.007 core.py:273()
```

But if we use results from `dask.delayed`, pickle takes up most of the time:

```python
@dask.delayed
def make_data():
    return np.array(np.random.randn())

vals = da.stack([da.from_delayed(make_data(), (), np.float64) for _ in range(500)])
ds = xr.Dataset({'vals': (['a'], vals)})
write = ds.to_netcdf('file5.nc', compute=False)
%prun -stime -l10 write.compute()
```

```
         115045243 function calls (104115443 primitive calls) in 67.240 seconds

   Ordered by: internal time
   List reduced from 292 to 10 due to restriction <10>

      ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 8120705/501   17.597    0.000   59.036    0.118 pickle.py:457(save)
 2519027/501    7.581    0.000   59.032    0.118 pickle.py:723(save_tuple)
           4    6.978    1.745    6.978    1.745 {method 'acquire' of '_thread.lock' objects}
     3082150    5.362    0.000    8.748    0.000 pickle.py:413(memoize)
    11474396    4.516    0.000    5.970    0.000 pickle.py:213(write)
     8121206    4.186    0.000    5.202    0.000 pickle.py:200(commit_frame)
    13747943    2.703    0.000    2.703    0.000 {method 'get' of 'dict' objects}
    17057538    1.887    0.000    1.887    0.000 {built-in method builtins.id}
     4568116    1.772    0.000    1.782    0.000 {built-in method _struct.pack}
     2762513    1.613    0.000    2.826    0.000 pickle.py:448(get)
```

This additional pickle overhead does not happen if we compute the dataset without writing it to a file.
Output of `%prun -stime -l10 ds.compute()` without `dask.delayed`:

```
         83856 function calls (73348 primitive calls) in 0.566 seconds

   Ordered by: internal time
   List reduced from 259 to 10 due to restriction <10>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         4    0.441    0.110    0.441    0.110 {method 'acquire' of '_thread.lock' objects}
       502    0.013    0.000    0.013    0.000 {method 'send' of '_socket.socket' objects}
       500    0.011    0.000    0.011    0.000 {built-in method _pickle.dumps}
      1000    0.007    0.000    0.008    0.000 core.py:159(get_dependencies)
      3500    0.007    0.000    0.007    0.000 utils.py:767(tokey)
  3000/500    0.006    0.000    0.010    0.000 utils.py:803(convert)
       500    0.005    0.000    0.019    0.000 pickle.py:30(dumps)
         1    0.004    0.004    0.008    0.008 core.py:3826(concatenate3)
  4500/500    0.004    0.000    0.008    0.000 utils_comm.py:144(unpack_remotedata)
         1    0.004    0.004    0.017    0.017 order.py:83(order)
```

With `dask.delayed`:

```
         149376 function calls (139868 primitive calls) in 1.738 seconds

   Ordered by: internal time
   List reduced from 264 to 10 due to restriction <10>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         4    1.568    0.392    1.568    0.392 {method 'acquire' of '_thread.lock' objects}
         1    0.015    0.015    0.038    0.038 optimization.py:455(fuse)
       502    0.012    0.000    0.012    0.000 {method 'send' of '_socket.socket' objects}
      6500    0.010    0.000    0.010    0.000 utils.py:767(tokey)
 5500/1000    0.009    0.000    0.012    0.000 utils_comm.py:144(unpack_remotedata)
      2500    0.008    0.000    0.009    0.000 core.py:159(get_dependencies)
       500    0.007    0.000    0.009    0.000 client.py:142(__init__)
      1000    0.005    0.000    0.008    0.000 core.py:280(subs)
 2000/1000    0.005    0.000    0.008    0.000 utils.py:803(convert)
         1    0.004    0.004    0.022    0.022 order.py:83(order)
```
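For reference, here is a small parametrised version of the delayed-backed reproduction above (the helper name, the file name, and the chunk counts are mine, not from the original report), which makes it easy to check how the overhead scales with the number of `dask.delayed` tasks:

```python
import numpy as np
import dask
import dask.array as da
import xarray as xr


def make_delayed_ds(n):
    # Build a Dataset whose 1-d variable is backed by n dask.delayed tasks,
    # mirroring the construction in the snippet above.
    @dask.delayed
    def make_data():
        return np.array(np.random.randn())

    vals = da.stack([da.from_delayed(make_data(), (), np.float64) for _ in range(n)])
    return xr.Dataset({'vals': (['a'], vals)})


# e.g. profile the write at different sizes (in IPython):
#   write = make_delayed_ds(1000).to_netcdf('scaling_test.nc', compute=False)
#   %prun -stime -l10 write.compute()
```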
I am using `dask.distributed`. I haven't tested it with anything else.

#### Software versions
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_GB.UTF-8
LANG: None
LOCALE: en_GB.UTF-8
xarray: 0.10.8
pandas: 0.23.4
numpy: 1.15.1
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: 0.6.2
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.2
distributed: 1.22.1
matplotlib: 2.2.2
cartopy: None
seaborn: 0.9.0
setuptools: 40.2.0
pip: 18.0
conda: 4.5.11
pytest: 3.7.3
IPython: 6.5.0
sphinx: 1.7.7
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2389/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
342426261,MDU6SXNzdWUzNDI0MjYyNjE=,2299,Confusing behaviour with MultiIndex,1882397,closed,0,6815844,,1,2018-07-18T17:41:12Z,2018-08-13T22:16:31Z,2018-08-13T22:16:31Z,NONE,,,,"`Dataset` allows assignment of new variables with dimension names that are used in a MultiIndex, even if the lengths do not match the existing coordinate.

```python
a = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).unstack('a')
a.index.names = ['dim0', 'dim1']
a.index.name = 'stacked_dim'

b = xr.Dataset(coords={'dim0': ['a', 'b'], 'dim1': [0, 1]})
b = b.stack(dim_stacked=['dim0', 'dim1'])
assert(len(b.dim0) == 4)

# This should raise an error because the length is != 4
b['c'] = (('dim0',), [10, 11])
b
```

Instead, it reports `dim0` as a new dimension without coordinates:

```
Dimensions:      (dim0: 2, dim_stacked: 4)
Coordinates:
  * dim_stacked  (dim_stacked) MultiIndex
  - dim0         (dim_stacked) object 'a' 'a' 'b' 'b'
  - dim1         (dim_stacked) int64 0 1 0 1
Dimensions without coordinates: dim0
Data variables:
    c            (dim0) int64 10 11
```

Similar cases with coordinates that are not used as dimensions do raise an error:

```python
ds = xr.Dataset()
ds.coords['a'] = [1, 2, 3]
ds = ds.sel(a=1)
ds['b'] = (('a',), [1, 2])
ds
```

A possible stop-gap guard is sketched below, after the version output.

#### Output of ``xr.show_versions()``
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_GB.UTF-8
LANG: None
LOCALE: en_GB.UTF-8
xarray: 0.10.7
pandas: 0.23.2
numpy: 1.14.5
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.1
distributed: 1.22.0
matplotlib: 2.2.2
cartopy: None
seaborn: 0.8.1
setuptools: 39.2.0
pip: 10.0.1
conda: 4.5.8
pytest: 3.6.2
IPython: 6.4.0
sphinx: 1.7.5
```
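As mentioned above, a manual guard can catch the length mismatch before the assignment (a sketch of mine, not part of the report; `new_data` is a hypothetical name and `b` is the stacked Dataset from the example):

```python
# Refuse the assignment when 'dim0' already exists as a (MultiIndex level)
# coordinate whose length differs from the new data: the kind of check the
# report expects xarray itself to perform.
new_data = [10, 11]
if 'dim0' in b.coords and b.coords['dim0'].size != len(new_data):
    msg = 'dim0 has length {}, but the new variable has length {}'
    raise ValueError(msg.format(b.coords['dim0'].size, len(new_data)))
b['c'] = (('dim0',), new_data)
```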
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2299/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue