id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
595784008,MDU6SXNzdWU1OTU3ODQwMDg=,3945,Implement `value_counts` method,1200058,open,0,,,3,2020-04-07T11:05:06Z,2023-09-12T15:47:22Z,,NONE,,,,"Implement `value_counts` method

#### MCVE Code Sample

```python
print(object)

dask.array
Coordinates:
  * gene_id    (gene_id) object 'ENSG00000000003' ... 'ENSG00000285966'
  * sample     (sample) object 'GTEX-1117F' 'GTEX-111CU' ... 'GTEX-ZZPU'
  * subtissue  (subtissue) object 'Adipose - Subcutaneous' ... 'Whole Blood'
```

#### Suggested API:
`object.value_counts(**kwargs)` should return an array with a new dimension defined by the kwargs key, containing the count values of all dimensions defined by the kwargs value.

#### Expected Output

```python
object.value_counts(observation_counts=[""subtissue"", ""sample""])

dask.array
Coordinates:
  * gene_id             (gene_id) object 'ENSG00000000003' ... 'ENSG00000285966'
  * observation_counts  (observation_counts) object 'underexpressed' 'normal' 'overexpressed'
```

#### Problem Description

Currently there is no existing equivalent to this method that I know of in xarray.

#### Versions
Output of `xr.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.3.11-1.el7.elrepo.x86_64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.0
pandas: 1.0.0
numpy: 1.17.5
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: 3.1.3
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 45.1.0.post20200119
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: 2.0.1
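As a sketch of the kind of helper this issue asks for, here is a minimal `value_counts` built on `np.unique` (the function name, signature, and toy data are made up for illustration; it counts over the whole array instead of per-dimension as proposed above):

```python
import numpy as np
import xarray as xr

def value_counts(da, new_dim='value_counts'):
    # Hypothetical helper: count occurrences of each unique value in `da`
    # and return the counts along a new dimension. Sketch only; a real
    # implementation would also accept the per-dimension kwargs proposed above.
    values, counts = np.unique(da.values, return_counts=True)
    return xr.DataArray(counts, coords={new_dim: values}, dims=new_dim)

da = xr.DataArray(['under', 'normal', 'normal', 'over'], dims='observations')
counts = value_counts(da)
```

For dask-backed arrays a lazy variant would be needed, e.g. built on something like `dask.array.unique`.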
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3945/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
712217045,MDU6SXNzdWU3MTIyMTcwNDU=,4476,Reimplement GroupBy.argmax,1200058,open,0,,,5,2020-09-30T19:25:22Z,2023-03-03T06:59:40Z,,NONE,,,,"Please implement `argmax` on GroupBy objects.

**Is your feature request related to a problem? Please describe.**
Observed:

```python
da.groupby(""g"").argmax(dim=""t"")
```

```python
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
in
----> 1 da.groupby(""g"").argmax(dim=""t"")

AttributeError: 'DataArrayGroupBy' object has no attribute 'argmax'
```

**Describe the solution you'd like**
Expected: Vector of length `len(unique(g))` containing the indices of `da[""t""]` where the value was maximum.

**Workaround:**

```python
da.groupby(""g"").apply(lambda c: c.argmax(dim=""t""))
```

```
array([[ 7],
       [ 0],
       [14],
       [14],
       [ 0],
       [ 0],
       [ 7],
       [ 0],
       [14],
       [ 0],
       [ 7]])
Coordinates:
  * st       (st) object 'a' ... 'z'
  * g        (g) object 'E'
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4476/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
860418546,MDU6SXNzdWU4NjA0MTg1NDY=,5179,N-dimensional boolean indexing ,1200058,open,0,,,6,2021-04-17T14:07:48Z,2021-07-16T17:30:45Z,,NONE,,,,"Currently, the docs state that boolean indexing is only possible with 1-dimensional arrays: http://xarray.pydata.org/en/stable/indexing.html

However, I often have the case where I'd like to convert a subset of an xarray to a dataframe.
Usually, I would call e.g.:

```python
data = xrds.stack(observations=[""dim1"", ""dim2"", ""dim3""])
data = data.isel(~ data.missing)
df = data.to_dataframe()
```

However, this approach is incredibly slow and memory-demanding, since it creates a MultiIndex of every possible coordinate in the array.

**Describe the solution you'd like**
A better approach would be to directly allow index selection with the boolean array:

```python
data = xrds.isel(~ xrds.missing, dim=""observations"")
df = data.to_dataframe()
```

This way, it is possible to
1) Identify the resulting coordinates with `np.argwhere()`
2) Directly use the underlying array for fancy indexing: `variable.data[mask]`

**Additional context**
I created a proof-of-concept that works for my projects:
https://gist.github.com/Hoeze/c746ea1e5fef40d99997f765c48d3c0d

The most important lines are these:

```python
def core_dim_locs_from_cond(cond, new_dim_name, core_dims=None) -> List[Tuple[str, xr.DataArray]]:
    [...]
    core_dim_locs = np.argwhere(cond.data)
    if isinstance(core_dim_locs, dask.array.core.Array):
        core_dim_locs = core_dim_locs.persist().compute_chunk_sizes()

def subset_variable(variable, core_dim_locs, new_dim_name, mask=None):
    [...]
    subset = dask.array.asanyarray(variable.data)[mask]
    # force-set chunk size from known chunks
    chunk_sizes = core_dim_locs[0][1].chunks[0]
    subset._chunks = (chunk_sizes, *subset._chunks[1:])
```

As a result, I would expect something like this:
![image](https://user-images.githubusercontent.com/1200058/115115833-d907a600-9f96-11eb-9c3f-eb91a6a5dbd2.png)
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5179/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
489825483,MDU6SXNzdWU0ODk4MjU0ODM=,3281,"[proposal] concatenate by axis, ignore dimension names",1200058,open,0,,,4,2019-09-05T15:06:22Z,2021-07-08T17:42:53Z,,NONE,,,,"Hi, I wrote a helper function which allows concatenating arrays like `xr.combine_nested`, with the difference that it only supports `xr.DataArrays`, concatenates them by axis position similar to `np.concatenate`, and overwrites all dimension names. I often need this to combine very different feature types.

```python
from typing import Union, Tuple, List

import numpy as np
import xarray as xr


def concat_by_axis(
        darrs: Union[List[xr.DataArray], Tuple[xr.DataArray]],
        dims: Union[List[str], Tuple[str]],
        axis: int = None,
        **kwargs
):
    """"""
    Concat arrays along some axis similar to `np.concatenate`.
    Automatically renames the dimensions to `dims`.
    Please note that this renaming happens by the axis position,
    therefore make sure to transpose all arrays to the correct dimension order.

    :param darrs: List or tuple of xr.DataArrays
    :param dims: The dimension names of the resulting array. Renames axes where necessary.
    :param axis: The axis which should be concatenated along
    :param kwargs: Additional arguments which will be passed to `xr.concat()`
    :return: Concatenated xr.DataArray with dimensions `dim`.
    """"""
    # Get depth of nested lists. Assumes `darrs` is correctly formatted as list of lists.
    if axis is None:
        axis = 0
        l = darrs
        # while l is a list or tuple and contains elements:
        while isinstance(l, (list, tuple)) and l:
            # increase depth by one
            axis -= 1
            l = l[0]
        if axis == 0:
            raise ValueError(""`darrs` has to be a (possibly nested) list or tuple of xr.DataArrays!"")

    to_concat = list()
    for i, da in enumerate(darrs):
        # recursive call for nested arrays;
        # most inner call should have axis = -1,
        # most outer call should have axis = - depth_of_darrs
        if isinstance(da, (list, tuple)):
            da = concat_by_axis(da, dims=dims, axis=axis + 1, **kwargs)
        if not isinstance(da, xr.DataArray):
            raise ValueError(""Input %d must be a xr.DataArray"" % i)
        if len(da.dims) != len(dims):
            raise ValueError(""Input %d must have the same number of dimensions as specified in the `dims` argument!"" % i)

        # force-rename dimensions
        da = da.rename(dict(zip(da.dims, dims)))
        to_concat.append(da)

    return xr.concat(to_concat, dim=dims[axis], **kwargs)
```

Would it make sense to include this in xarray?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3281/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
636512559,MDU6SXNzdWU2MzY1MTI1NTk=,4143,[Feature request] Masked operations,1200058,open,0,,,1,2020-06-10T20:04:45Z,2021-04-22T20:54:03Z,,NONE,,,,"Xarray already has `unstack(sparse=True)`, which is quite awesome. However, in many cases it is costly to convert a very dense array (existing values >> missing values) to a sparse representation. Also, many calculations require converting the sparse array back into a dense array and manually masking the missing values (e.g. Keras).

Logically, a sparse array is equal to a masked dense array; they only differ in their internal data representation. Therefore, I would propose a `masked=True` option for all operations that can create missing values.
These cover (amongst others):
- `.unstack([...], masked=True)`
- `.where(, masked=True)`
- `.align([...], masked=True)`

This would solve a number of problems:
- No more conversion of int -> float
- Explicit value for missingness
- When stacking data with missing values, the missing values can be just dropped
- When converting data with missing values to DataFrame, the missing values can be just dropped

#### MCVE Code Sample

An example would be outer joins with slightly different coordinates (taken from the documentation):

```python
>>> x
array([[25, 35],
       [10, 24]])
Coordinates:
  * lat      (lat) float64 35.0 40.0
  * lon      (lon) float64 100.0 120.0

>>> y
array([[20,  5],
       [ 7, 13]])
Coordinates:
  * lat      (lat) float64 35.0 42.0
  * lon      (lon) float64 100.0 120.0
```

#### Non-masked outer join:

```python
>>> a, b = xr.align(x, y, join=""outer"")
>>> a
array([[25., 35.],
       [10., 24.],
       [nan, nan]])
Coordinates:
  * lat      (lat) float64 35.0 40.0 42.0
  * lon      (lon) float64 100.0 120.0

>>> b
array([[20.,  5.],
       [nan, nan],
       [ 7., 13.]])
Coordinates:
  * lat      (lat) float64 35.0 40.0 42.0
  * lon      (lon) float64 100.0 120.0
```

#### The masked version:

```python
>>> a, b = xr.align(x, y, join=""outer"", masked=True)
>>> a
masked_array(
  data=[[25, 35],
        [10, 24],
        [--, --]],
  mask=[[False, False],
        [False, False],
        [True, True]],
  fill_value=0)
Coordinates:
  * lat      (lat) float64 35.0 40.0 42.0
  * lon      (lon) float64 100.0 120.0

>>> b
masked_array(
  data=[[20, 5],
        [--, --],
        [7, 13]],
  mask=[[False, False],
        [True, True],
        [False, False]],
  fill_value=0)
Coordinates:
  * lat      (lat) float64 35.0 40.0 42.0
  * lon      (lon) float64 100.0 120.0
```

Related issue: https://github.com/pydata/xarray/issues/3955
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4143/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
512879550,MDU6SXNzdWU1MTI4Nzk1NTA=,3452,[feature request] __iter__() for rolling-window on datasets,1200058,open,0,,,2,2019-10-26T20:08:06Z,2021-02-18T21:41:58Z,,NONE,,,,"Currently, rolling() on a dataset does not return an iterator:

#### MCVE Code Sample

```python
arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5), dims=('x', 'y'))
r = arr.to_dataset(name=""test"").rolling(y=3)
for label, arr_window in r:
    print(label)
```

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in
      3
      4 r = arr.to_dataset(name=""test"").rolling(y=3)
----> 5 for label, arr_window in r:
      6     print(label)

TypeError: 'DatasetRolling' object is not iterable
```

#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.3.7-arch1-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8
libhdf5: 1.10.4
libnetcdf: None
xarray: 0.13.0
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: None
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.1.0
distributed: 2.1.0
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.1.1
conda: None
pytest: None
IPython: 7.8.0
sphinx: None
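In the meantime, the desired loop can be emulated with plain slicing; a sketch (the helper name is made up, and unlike real rolling it skips the incomplete leading windows instead of padding them with NaN):

```python
import numpy as np
import xarray as xr

def iter_rolling(ds, dim, window):
    # Yield (last_index, window_slice) pairs, roughly what an __iter__
    # on DatasetRolling could return. Sketch only: real rolling would
    # also emit the incomplete leading windows.
    for stop in range(window, ds.sizes[dim] + 1):
        yield stop - 1, ds.isel({dim: slice(stop - window, stop)})

arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5), dims=('x', 'y'))
ds = arr.to_dataset(name='test')
windows = list(iter_rolling(ds, 'y', 3))
```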
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3452/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
528060435,MDU6SXNzdWU1MjgwNjA0MzU=,3570,fillna on dataset converts all variables to float,1200058,open,0,,,5,2019-11-25T12:39:49Z,2020-09-15T15:35:04Z,,NONE,,,,"#### MCVE Code Sample

```python
xr.Dataset(
    {
        ""A"": (""x"", [np.nan, 2, np.nan, 0]),
        ""B"": (""x"", [3, 4, np.nan, 1]),
        ""C"": (""x"", [True, True, False, False]),
        ""D"": (""x"", [np.nan, 3, np.nan, 4])
    },
    coords={""x"": [0, 1, 2, 3]}
).fillna(value={""A"": 0})
```

```
Dimensions:  (x: 4)
Coordinates:
  * x        (x) int64 0 1 2 3
Data variables:
    A        (x) float64 0.0 2.0 0.0 0.0
    B        (x) float64 3.0 4.0 nan 1.0
    C        (x) float64 1.0 1.0 0.0 0.0
    D        (x) float64 nan 3.0 nan 4.0
```

#### Expected Output

```
Dimensions:  (x: 4)
Coordinates:
  * x        (x) int64 0 1 2 3
Data variables:
    A        (x) float64 0.0 2.0 0.0 0.0
    B        (x) float64 3.0 4.0 nan 1.0
    C        (x) bool True True False False
    D        (x) float64 nan 3.0 nan 4.0
```

#### Problem Description

I'd like to use `fillna` to replace NaN's in some of a `Dataset`'s variables. However, `fillna` unexpectedly converts all variables to float, even if they are boolean or integer.

Would it be possible to apply `fillna` only to float / object types, and to honor the `value` argument when I only want to apply `fillna` to a subset of the dataset?

#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.27.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: None
pytest: 5.0.1
IPython: 7.8.0
sphinx: None
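Until this is fixed, a workaround sketch: call `fillna` per variable instead of on the whole `Dataset`, so variables that are not mentioned keep their dtype (toy data reduced from the example above):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        'A': ('x', [np.nan, 2, np.nan, 0]),
        'C': ('x', [True, True, False, False]),
    },
    coords={'x': [0, 1, 2, 3]},
)

# Fill each requested variable individually, so that variables not
# listed in the mapping (e.g. the boolean 'C') are left untouched.
filled = ds.copy()
for name, fill in {'A': 0}.items():
    filled[name] = ds[name].fillna(fill)
```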
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3570/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
566509807,MDU6SXNzdWU1NjY1MDk4MDc=,3775,[Question] Efficient shortcut for unstacking only parts of dimension?,1200058,open,0,,,1,2020-02-17T20:46:03Z,2020-03-07T04:53:05Z,,NONE,,,,"Hi all, is there an efficient way to unstack only parts of a MultiIndex? Consider for example the following array:

```python
Dimensions:                      (observations: 17525)
Coordinates:
  * observations                 (observations) MultiIndex
  - subtissue                    (observations) object 'Skin_Sun_Exposed_Lower_leg' ... 'Thyroid'
  - individual                   (observations) object 'GTEX-111FC' ... 'GTEX-ZZPU'
  - gene                         (observations) object 'ENSG00000140400' ... 'ENSG00000174233'
  - end                          (observations) object '5' '5' '5' ... '3' '3'
Data variables:
    fraser_min_pval              (observations) float64 dask.array
    fraser_min_minus_log10_pval  (observations) float64 dask.array
```

Here, I have a MultiIndex `observations=[""subtissue"", ""individual"", ""gene"", ""end""]`. However, I would like to have `end` in its own dimension. Currently, I have to do the following to solve this issue:

```python3
xrds.unstack(""observations"").stack(observations=[""subtissue"", ""individual"", ""gene""])
```

However, this seems quite inefficient and introduces `NaN`'s.

#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1062.1.2.el7.x86_64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.0
pandas: 1.0.0
numpy: 1.17.5
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: 3.1.3
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 45.1.0.post20200119
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: None
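One workaround sketch for pulling a single level out of a MultiIndex without the full unstack/stack round-trip: select each value of the level and concatenate along a new dimension (the helper name is made up, and it assumes every combination of the remaining levels exists for each value of the extracted level):

```python
import numpy as np
import pandas as pd
import xarray as xr

def unstack_one_level(ds, dim, level, values):
    # Hypothetical helper: move MultiIndex level `level` of dimension
    # `dim` into its own dimension by select-and-concat.
    parts = []
    for v in values:
        # positions where the level equals this value
        part = ds.isel({dim: np.flatnonzero(ds[level].values == v)})
        # drop the now-constant level from the MultiIndex
        parts.append(part.reset_index(level, drop=True))
    # concatenate along a new dimension named after the level
    return xr.concat(parts, dim=pd.Index(values, name=level))

# Toy stand-in for the dataset above: two genes x two ends, stacked.
ds = xr.Dataset(
    {'pval': (('gene', 'end'), np.arange(4.0).reshape(2, 2))},
    coords={'gene': ['g1', 'g2'], 'end': ['5', '3']},
).stack(observations=('gene', 'end'))

out = unstack_one_level(ds, 'observations', 'end', ['5', '3'])
```

This avoids materializing the full cross-product that `unstack` builds, at the cost of one pass per level value.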
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3775/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
325661581,MDU6SXNzdWUzMjU2NjE1ODE=,2175,[Feature Request] Visualizing dimensions,1200058,open,0,,,4,2018-05-23T11:22:29Z,2019-07-12T16:10:23Z,,NONE,,,,"Hi, I'm curious how you created your logo:

![grafik](https://user-images.githubusercontent.com/1200058/40421311-c4d18d62-5e8b-11e8-94f4-b217f51b61b0.png)

I'd like to create visualizations of the dimensions in my dataset similar to your logo. Functionality that simplifies this task would be a very useful feature in xarray.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2175/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue