id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
595784008,MDU6SXNzdWU1OTU3ODQwMDg=,3945,Implement `value_counts` method,1200058,open,0,,,3,2020-04-07T11:05:06Z,2023-09-12T15:47:22Z,,NONE,,,,"Implement `value_counts` method

#### MCVE Code Sample

```python
print(object)

dask.array
Coordinates:
  * gene_id    (gene_id) object 'ENSG00000000003' ... 'ENSG00000285966'
  * sample     (sample) object 'GTEX-1117F' 'GTEX-111CU' ... 'GTEX-ZZPU'
  * subtissue  (subtissue) object 'Adipose - Subcutaneous' ... 'Whole Blood'
```

#### Suggested API:
`object.value_counts(**kwargs)` should return an array with a new dimension defined by the kwargs key, containing the count values of all dimensions defined by the kwargs value.

#### Expected Output

```python
object.value_counts(observation_counts=[""subtissue"", ""sample""])

dask.array
Coordinates:
  * gene_id             (gene_id) object 'ENSG00000000003' ... 'ENSG00000285966'
  * observation_counts  (observation_counts) object 'underexpressed' 'normal' 'overexpressed'
```

#### Problem Description

Currently there is no existing equivalent to this method that I know of in xarray.

#### Versions
Output of `xr.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.3.11-1.el7.elrepo.x86_64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.0
pandas: 1.0.0
numpy: 1.17.5
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: 3.1.3
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 45.1.0.post20200119
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: 2.0.1
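As a sketch of the kind of helper this issue asks for, here is a minimal `value_counts` built on `np.unique` (the function name, signature, and toy data are made up for illustration; it counts over the whole array instead of per-dimension as proposed above):

```python
import numpy as np
import xarray as xr

def value_counts(da, new_dim='value_counts'):
    # Hypothetical helper: count occurrences of each unique value in `da`
    # and return the counts along a new dimension. Sketch only; a real
    # implementation would also accept the per-dimension kwargs proposed above.
    values, counts = np.unique(da.values, return_counts=True)
    return xr.DataArray(counts, coords={new_dim: values}, dims=new_dim)

da = xr.DataArray(['under', 'normal', 'normal', 'over'], dims='observations')
counts = value_counts(da)
```

For dask-backed arrays a lazy variant would be needed, e.g. built on something like `dask.array.unique`.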
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3945/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
712217045,MDU6SXNzdWU3MTIyMTcwNDU=,4476,Reimplement GroupBy.argmax,1200058,open,0,,,5,2020-09-30T19:25:22Z,2023-03-03T06:59:40Z,,NONE,,,,"Please implement `argmax` on GroupBy objects.

**Is your feature request related to a problem? Please describe.**
Observed:

```python
da.groupby(""g"").argmax(dim=""t"")
```

```python
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
in
----> 1 da.groupby(""g"").argmax(dim=""t"")

AttributeError: 'DataArrayGroupBy' object has no attribute 'argmax'
```

**Describe the solution you'd like**
Expected: Vector of length `len(unique(g))` containing the indices of `da[""t""]` where the value was maximum.

**Workaround:**

```python
da.groupby(""g"").apply(lambda c: c.argmax(dim=""t""))
```

```
array([[ 7],
       [ 0],
       [14],
       [14],
       [ 0],
       [ 0],
       [ 7],
       [ 0],
       [14],
       [ 0],
       [ 7]])
Coordinates:
  * st       (st) object 'a' ... 'z'
  * g        (g) object 'E'
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4476/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
860418546,MDU6SXNzdWU4NjA0MTg1NDY=,5179,N-dimensional boolean indexing ,1200058,open,0,,,6,2021-04-17T14:07:48Z,2021-07-16T17:30:45Z,,NONE,,,,"Currently, the docs state that boolean indexing is only possible with 1-dimensional arrays: http://xarray.pydata.org/en/stable/indexing.html

However, I often have the case where I'd like to convert a subset of an xarray to a dataframe.
Usually, I would call e.g.:

```python
data = xrds.stack(observations=[""dim1"", ""dim2"", ""dim3""])
data = data.isel(~ data.missing)
df = data.to_dataframe()
```

However, this approach is incredibly slow and memory-demanding, since it creates a MultiIndex of every possible coordinate in the array.

**Describe the solution you'd like**
A better approach would be to directly allow index selection with the boolean array:

```python
data = xrds.isel(~ xrds.missing, dim=""observations"")
df = data.to_dataframe()
```

This way, it is possible to
1) Identify the resulting coordinates with `np.argwhere()`
2) Directly use the underlying array for fancy indexing: `variable.data[mask]`

**Additional context**
I created a proof-of-concept that works for my projects:
https://gist.github.com/Hoeze/c746ea1e5fef40d99997f765c48d3c0d

The most important lines are these:

```python
def core_dim_locs_from_cond(cond, new_dim_name, core_dims=None) -> List[Tuple[str, xr.DataArray]]:
    [...]
    core_dim_locs = np.argwhere(cond.data)
    if isinstance(core_dim_locs, dask.array.core.Array):
        core_dim_locs = core_dim_locs.persist().compute_chunk_sizes()

def subset_variable(variable, core_dim_locs, new_dim_name, mask=None):
    [...]
    subset = dask.array.asanyarray(variable.data)[mask]
    # force-set chunk size from known chunks
    chunk_sizes = core_dim_locs[0][1].chunks[0]
    subset._chunks = (chunk_sizes, *subset._chunks[1:])
```

As a result, I would expect something like this:
![image](https://user-images.githubusercontent.com/1200058/115115833-d907a600-9f96-11eb-9c3f-eb91a6a5dbd2.png)
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5179/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
489825483,MDU6SXNzdWU0ODk4MjU0ODM=,3281,"[proposal] concatenate by axis, ignore dimension names",1200058,open,0,,,4,2019-09-05T15:06:22Z,2021-07-08T17:42:53Z,,NONE,,,,"Hi, I wrote a helper function which allows concatenating arrays like `xr.combine_nested`, with the difference that it only supports `xr.DataArrays`, concatenates them by axis position similar to `np.concatenate`, and overwrites all dimension names. I often need this to combine very different feature types.

```python
from typing import Union, Tuple, List

import numpy as np
import xarray as xr


def concat_by_axis(
        darrs: Union[List[xr.DataArray], Tuple[xr.DataArray]],
        dims: Union[List[str], Tuple[str]],
        axis: int = None,
        **kwargs
):
    """"""
    Concat arrays along some axis similar to `np.concatenate`.
    Automatically renames the dimensions to `dims`.
    Please note that this renaming happens by the axis position,
    therefore make sure to transpose all arrays to the correct dimension order.

    :param darrs: List or tuple of xr.DataArrays
    :param dims: The dimension names of the resulting array. Renames axes where necessary.
    :param axis: The axis which should be concatenated along
    :param kwargs: Additional arguments which will be passed to `xr.concat()`
    :return: Concatenated xr.DataArray with dimensions `dim`.
    """"""
    # Get depth of nested lists. Assumes `darrs` is correctly formatted as list of lists.
    if axis is None:
        axis = 0
        l = darrs
        # while l is a list or tuple and contains elements:
        while isinstance(l, (list, tuple)) and l:
            # increase depth by one
            axis -= 1
            l = l[0]
        if axis == 0:
            raise ValueError(""`darrs` has to be a (possibly nested) list or tuple of xr.DataArrays!"")

    to_concat = list()
    for i, da in enumerate(darrs):
        # recursive call for nested arrays;
        # most inner call should have axis = -1,
        # most outer call should have axis = - depth_of_darrs
        if isinstance(da, (list, tuple)):
            da = concat_by_axis(da, dims=dims, axis=axis + 1, **kwargs)
        if not isinstance(da, xr.DataArray):
            raise ValueError(""Input %d must be a xr.DataArray"" % i)
        if len(da.dims) != len(dims):
            raise ValueError(""Input %d must have the same number of dimensions as specified in the `dims` argument!"" % i)

        # force-rename dimensions
        da = da.rename(dict(zip(da.dims, dims)))
        to_concat.append(da)

    return xr.concat(to_concat, dim=dims[axis], **kwargs)
```

Would it make sense to include this in xarray?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3281/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
636512559,MDU6SXNzdWU2MzY1MTI1NTk=,4143,[Feature request] Masked operations,1200058,open,0,,,1,2020-06-10T20:04:45Z,2021-04-22T20:54:03Z,,NONE,,,,"Xarray already has `unstack(sparse=True)`, which is quite awesome. However, in many cases it is costly to convert a very dense array (existing values >> missing values) to a sparse representation. Also, many calculations require converting the sparse array back into a dense array and manually masking the missing values (e.g. Keras).

Logically, a sparse array is equal to a masked dense array; they only differ in their internal data representation. Therefore, I would propose a `masked=True` option for all operations that can create missing values.
These cover (amongst others):
- `.unstack([...], masked=True)`
- `.where(, masked=True)`
- `.align([...], masked=True)`

This would solve a number of problems:
- No more conversion of int -> float
- Explicit value for missingness
- When stacking data with missing values, the missing values can be just dropped
- When converting data with missing values to DataFrame, the missing values can be just dropped

#### MCVE Code Sample

An example would be outer joins with slightly different coordinates (taken from the documentation):

```python
>>> x
array([[25, 35],
       [10, 24]])
Coordinates:
  * lat      (lat) float64 35.0 40.0
  * lon      (lon) float64 100.0 120.0

>>> y
array([[20,  5],
       [ 7, 13]])
Coordinates:
  * lat      (lat) float64 35.0 42.0
  * lon      (lon) float64 100.0 120.0
```

#### Non-masked outer join:

```python
>>> a, b = xr.align(x, y, join=""outer"")
>>> a
array([[25., 35.],
       [10., 24.],
       [nan, nan]])
Coordinates:
  * lat      (lat) float64 35.0 40.0 42.0
  * lon      (lon) float64 100.0 120.0

>>> b
array([[20.,  5.],
       [nan, nan],
       [ 7., 13.]])
Coordinates:
  * lat      (lat) float64 35.0 40.0 42.0
  * lon      (lon) float64 100.0 120.0
```

#### The masked version:

```python
>>> a, b = xr.align(x, y, join=""outer"", masked=True)
>>> a
masked_array(
  data=[[25, 35],
        [10, 24],
        [--, --]],
  mask=[[False, False],
        [False, False],
        [True, True]],
  fill_value=0)
Coordinates:
  * lat      (lat) float64 35.0 40.0 42.0
  * lon      (lon) float64 100.0 120.0

>>> b
masked_array(
  data=[[20, 5],
        [--, --],
        [7, 13]],
  mask=[[False, False],
        [True, True],
        [False, False]],
  fill_value=0)
Coordinates:
  * lat      (lat) float64 35.0 40.0 42.0
  * lon      (lon) float64 100.0 120.0
```

Related issue: https://github.com/pydata/xarray/issues/3955
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4143/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
512879550,MDU6SXNzdWU1MTI4Nzk1NTA=,3452,[feature request] __iter__() for rolling-window on datasets,1200058,open,0,,,2,2019-10-26T20:08:06Z,2021-02-18T21:41:58Z,,NONE,,,,"Currently, rolling() on a dataset does not return an iterator:

#### MCVE Code Sample

```python
arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5), dims=('x', 'y'))
r = arr.to_dataset(name=""test"").rolling(y=3)
for label, arr_window in r:
    print(label)
```

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in
      3
      4 r = arr.to_dataset(name=""test"").rolling(y=3)
----> 5 for label, arr_window in r:
      6     print(label)

TypeError: 'DatasetRolling' object is not iterable
```

#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.3.7-arch1-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8
libhdf5: 1.10.4
libnetcdf: None
xarray: 0.13.0
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: None
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.1.0
distributed: 2.1.0
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.1.1
conda: None
pytest: None
IPython: 7.8.0
sphinx: None
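In the meantime, the desired loop can be emulated with plain slicing; a sketch (the helper name is made up, and unlike real rolling it skips the incomplete leading windows instead of padding them with NaN):

```python
import numpy as np
import xarray as xr

def iter_rolling(ds, dim, window):
    # Yield (last_index, window_slice) pairs, roughly what an __iter__
    # on DatasetRolling could return. Sketch only: real rolling would
    # also emit the incomplete leading windows.
    for stop in range(window, ds.sizes[dim] + 1):
        yield stop - 1, ds.isel({dim: slice(stop - window, stop)})

arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5), dims=('x', 'y'))
ds = arr.to_dataset(name='test')
windows = list(iter_rolling(ds, 'y', 3))
```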
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3452/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
528060435,MDU6SXNzdWU1MjgwNjA0MzU=,3570,fillna on dataset converts all variables to float,1200058,open,0,,,5,2019-11-25T12:39:49Z,2020-09-15T15:35:04Z,,NONE,,,,"#### MCVE Code Sample

```python
xr.Dataset(
    {
        ""A"": (""x"", [np.nan, 2, np.nan, 0]),
        ""B"": (""x"", [3, 4, np.nan, 1]),
        ""C"": (""x"", [True, True, False, False]),
        ""D"": (""x"", [np.nan, 3, np.nan, 4])
    },
    coords={""x"": [0, 1, 2, 3]}
).fillna(value={""A"": 0})
```

```
Dimensions:  (x: 4)
Coordinates:
  * x        (x) int64 0 1 2 3
Data variables:
    A        (x) float64 0.0 2.0 0.0 0.0
    B        (x) float64 3.0 4.0 nan 1.0
    C        (x) float64 1.0 1.0 0.0 0.0
    D        (x) float64 nan 3.0 nan 4.0
```

#### Expected Output

```
Dimensions:  (x: 4)
Coordinates:
  * x        (x) int64 0 1 2 3
Data variables:
    A        (x) float64 0.0 2.0 0.0 0.0
    B        (x) float64 3.0 4.0 nan 1.0
    C        (x) bool True True False False
    D        (x) float64 nan 3.0 nan 4.0
```

#### Problem Description

I'd like to use `fillna` to replace NaN's in some of a `Dataset`'s variables. However, `fillna` unexpectedly converts all variables to float, even if they are boolean or integer.

Would it be possible to apply `fillna` only to float / object types, and to honor the `value` argument when I only want to apply `fillna` to a subset of the dataset?

#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.27.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: None
pytest: 5.0.1
IPython: 7.8.0
sphinx: None
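Until this is fixed, a workaround sketch: call `fillna` per variable instead of on the whole `Dataset`, so variables that are not mentioned keep their dtype (toy data reduced from the example above):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        'A': ('x', [np.nan, 2, np.nan, 0]),
        'C': ('x', [True, True, False, False]),
    },
    coords={'x': [0, 1, 2, 3]},
)

# Fill each requested variable individually, so that variables not
# listed in the mapping (e.g. the boolean 'C') are left untouched.
filled = ds.copy()
for name, fill in {'A': 0}.items():
    filled[name] = ds[name].fillna(fill)
```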
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3570/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
566509807,MDU6SXNzdWU1NjY1MDk4MDc=,3775,[Question] Efficient shortcut for unstacking only parts of dimension?,1200058,open,0,,,1,2020-02-17T20:46:03Z,2020-03-07T04:53:05Z,,NONE,,,,"Hi all, is there an efficient way to unstack only parts of a MultiIndex? Consider for example the following array:

```python
Dimensions:                      (observations: 17525)
Coordinates:
  * observations                 (observations) MultiIndex
  - subtissue                    (observations) object 'Skin_Sun_Exposed_Lower_leg' ... 'Thyroid'
  - individual                   (observations) object 'GTEX-111FC' ... 'GTEX-ZZPU'
  - gene                         (observations) object 'ENSG00000140400' ... 'ENSG00000174233'
  - end                          (observations) object '5' '5' '5' ... '3' '3'
Data variables:
    fraser_min_pval              (observations) float64 dask.array
    fraser_min_minus_log10_pval  (observations) float64 dask.array
```

Here, I have a MultiIndex `observations=[""subtissue"", ""individual"", ""gene"", ""end""]`. However, I would like to have `end` in its own dimension. Currently, I have to do the following to solve this issue:

```python3
xrds.unstack(""observations"").stack(observations=[""subtissue"", ""individual"", ""gene""])
```

However, this seems quite inefficient and introduces `NaN`'s.

#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1062.1.2.el7.x86_64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.0
pandas: 1.0.0
numpy: 1.17.5
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: 3.1.3
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 45.1.0.post20200119
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: None
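One workaround sketch for pulling a single level out of a MultiIndex without the full unstack/stack round-trip: select each value of the level and concatenate along a new dimension (the helper name is made up, and it assumes every combination of the remaining levels exists for each value of the extracted level):

```python
import numpy as np
import pandas as pd
import xarray as xr

def unstack_one_level(ds, dim, level, values):
    # Hypothetical helper: move MultiIndex level `level` of dimension
    # `dim` into its own dimension by select-and-concat.
    parts = []
    for v in values:
        # positions where the level equals this value
        part = ds.isel({dim: np.flatnonzero(ds[level].values == v)})
        # drop the now-constant level from the MultiIndex
        parts.append(part.reset_index(level, drop=True))
    # concatenate along a new dimension named after the level
    return xr.concat(parts, dim=pd.Index(values, name=level))

# Toy stand-in for the dataset above: two genes x two ends, stacked.
ds = xr.Dataset(
    {'pval': (('gene', 'end'), np.arange(4.0).reshape(2, 2))},
    coords={'gene': ['g1', 'g2'], 'end': ['5', '3']},
).stack(observations=('gene', 'end'))

out = unstack_one_level(ds, 'observations', 'end', ['5', '3'])
```

This avoids materializing the full cross-product that `unstack` builds, at the cost of one pass per level value.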
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3775/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
325661581,MDU6SXNzdWUzMjU2NjE1ODE=,2175,[Feature Request] Visualizing dimensions,1200058,open,0,,,4,2018-05-23T11:22:29Z,2019-07-12T16:10:23Z,,NONE,,,,"Hi, I'm curious how you created your logo:

![grafik](https://user-images.githubusercontent.com/1200058/40421311-c4d18d62-5e8b-11e8-94f4-b217f51b61b0.png)

I'd like to create visualizations of the dimensions in my dataset similar to your logo. Functionality that simplifies this task would be a very useful feature in xarray.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2175/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue