
issues


9 rows where state = "open" and user = 1200058 sorted by updated_at descending


id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
595784008 MDU6SXNzdWU1OTU3ODQwMDg= 3945 Implement `value_counts` method Hoeze 1200058 open 0     3 2020-04-07T11:05:06Z 2023-09-12T15:47:22Z   NONE      

Implement value_counts method

MCVE Code Sample

```python
print(object)
```

```
<xarray.DataArray (subtissue: 49, sample: 532, gene_id: 31490)>
dask.array<where, shape=(49, 532, 31490), dtype=object, chunksize=(1, 10, 31490), chunktype=numpy.ndarray>
Coordinates:
  * gene_id    (gene_id) object 'ENSG00000000003' ... 'ENSG00000285966'
  * sample     (sample) object 'GTEX-1117F' 'GTEX-111CU' ... 'GTEX-ZZPU'
  * subtissue  (subtissue) object 'Adipose - Subcutaneous' ... 'Whole Blood'
```

Suggested API:

`object.value_counts(**kwargs)` should return an array with a new dimension named after the kwarg key, containing the counts of values along all dimensions given by the kwarg value.

Expected Output

```python
object.value_counts(observation_counts=["subtissue", "sample"])
```

```
<xarray.DataArray (observation_counts: 3, gene_id: 31490)>
dask.array<where, shape=(3, 31490), dtype=int, chunksize=(3, 31490), chunktype=numpy.ndarray>
Coordinates:
  * gene_id             (gene_id) object 'ENSG00000000003' ... 'ENSG00000285966'
  * observation_counts  (observation_counts) object 'underexpressed' 'normal' 'overexpressed'
```

Problem Description

Currently, there is no equivalent to this method in xarray that I know of.
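The proposed behaviour can be sketched with existing primitives. The helper below is hypothetical (the name `value_counts`, its signature, and the toy array are illustrative, not xarray API); it counts, for every unique value in the array, how often that value occurs along the given dimensions:

```python
import numpy as np
import xarray as xr


def value_counts(da, new_dim, dims):
    # hypothetical helper: for each unique value in `da`, count its
    # occurrences along `dims`, stacking the counts along `new_dim`
    values = np.unique(da.values)
    counts = xr.concat([(da == v).sum(dim=dims) for v in values], dim=new_dim)
    return counts.assign_coords({new_dim: values})


da = xr.DataArray([["a", "b", "a"], ["b", "b", "a"]], dims=("x", "y"))
out = value_counts(da, new_dim="value", dims=["x"])
```

For dask-backed arrays the comprehension stays lazy, since `==` and `sum` are themselves lazy; the `np.unique` call, however, would force a compute.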

Versions

Output of `xr.show_versions()`

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.3.11-1.el7.elrepo.x86_64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.0
pandas: 1.0.0
numpy: 1.17.5
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: 3.1.3
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 45.1.0.post20200119
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: 2.0.1
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3945/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
712217045 MDU6SXNzdWU3MTIyMTcwNDU= 4476 Reimplement GroupBy.argmax Hoeze 1200058 open 0     5 2020-09-30T19:25:22Z 2023-03-03T06:59:40Z   NONE      

Please implement `GroupBy.argmax`.

Is your feature request related to a problem? Please describe.
Observed:

```python
da.groupby("g").argmax(dim="t")
```

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-84-15c199b0f7d4> in <module>
----> 1 da.groupby("g").argmax(dim="t")

AttributeError: 'DataArrayGroupBy' object has no attribute 'argmax'
```

Describe the solution you'd like
Expected: a vector of length `len(unique(g))` containing the indices of `da["t"]` where the value is maximal.

Workaround:

```python
da.groupby("g").apply(lambda c: c.argmax(dim="t"))
```

```
<xarray.DataArray 'da' (st: 11, g: 1)>
array([[ 7],
       [ 0],
       [14],
       [14],
       [ 0],
       [ 0],
       [ 7],
       [ 0],
       [14],
       [ 0],
       [ 7]])
Coordinates:
  * st       (st) object 'a' ... 'z'
  * g        (g) object 'E'
```
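The workaround can be reproduced on a small self-contained example (the data and group labels below are made up; `.map` is what newer xarray versions offer alongside `.apply` on groupby objects). Note that `argmax` here returns indices relative to each group's own slice of `t`:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    [3.0, 9.0, 1.0, 4.0, 2.0, 8.0],
    dims="t",
    coords={"g": ("t", ["a", "a", "a", "b", "b", "b"])},
)
# apply argmax per group; each group result is a scalar index
idx = da.groupby("g").map(lambda c: c.argmax(dim="t"))
```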

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4476/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
860418546 MDU6SXNzdWU4NjA0MTg1NDY= 5179 N-dimensional boolean indexing Hoeze 1200058 open 0     6 2021-04-17T14:07:48Z 2021-07-16T17:30:45Z   NONE      

Currently, the docs state that boolean indexing is only possible with 1-dimensional arrays: http://xarray.pydata.org/en/stable/indexing.html

However, I often have the case where I'd like to convert a subset of an xarray object to a dataframe. Usually, I would call e.g.:

```python
data = xrds.stack(observations=["dim1", "dim2", "dim3"])
data = data.isel(~ data.missing)
df = data.to_dataframe()
```

However, this approach is incredibly slow and memory-demanding, since it creates a MultiIndex of every possible coordinate in the array.

Describe the solution you'd like
A better approach would be to directly allow index selection with the boolean array:

```python
data = xrds.isel(~ xrds.missing, dim="observations")
df = data.to_dataframe()
```

This way, it is possible to:
1) Identify the resulting coordinates with `np.argwhere()`
2) Directly use the underlying array for fancy indexing: `variable.data[mask]`
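A rough sketch of what such a selection could look like with today's API, combining `np.argwhere` with xarray's vectorized (pointwise) indexing. The helper `isel_mask` and the toy data are hypothetical, not xarray API:

```python
import numpy as np
import xarray as xr


def isel_mask(ds, mask, new_dim="observations"):
    # hypothetical helper: select every point where the boolean `mask` is
    # True, collapsing the mask's dimensions into a single new dimension
    locs = np.argwhere(np.asarray(mask))
    indexers = {
        d: xr.DataArray(locs[:, i], dims=new_dim)
        for i, d in enumerate(mask.dims)
    }
    # DataArray indexers sharing one dim trigger pointwise indexing
    return ds.isel(**indexers)


ds = xr.Dataset({"v": (("x", "y"), np.arange(6).reshape(2, 3))})
mask = ds["v"] % 2 == 0          # keep the even entries
sub = isel_mask(ds, mask)
```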

Additional context
I created a proof-of-concept that works for my projects: https://gist.github.com/Hoeze/c746ea1e5fef40d99997f765c48d3c0d
Some important lines are these:

```python
def core_dim_locs_from_cond(cond, new_dim_name, core_dims=None) -> List[Tuple[str, xr.DataArray]]:
    [...]
    core_dim_locs = np.argwhere(cond.data)
    if isinstance(core_dim_locs, dask.array.core.Array):
        core_dim_locs = core_dim_locs.persist().compute_chunk_sizes()

def subset_variable(variable, core_dim_locs, new_dim_name, mask=None):
    [...]
    subset = dask.array.asanyarray(variable.data)[mask]
    # force-set chunk size from known chunks
    chunk_sizes = core_dim_locs[0][1].chunks[0]
    subset._chunks = (chunk_sizes, *subset._chunks[1:])
```

As a result, I would expect something like this:

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5179/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
489825483 MDU6SXNzdWU0ODk4MjU0ODM= 3281 [proposal] concatenate by axis, ignore dimension names Hoeze 1200058 open 0     4 2019-09-05T15:06:22Z 2021-07-08T17:42:53Z   NONE      

Hi, I wrote a helper function which allows concatenating arrays like `xr.combine_nested`, with the difference that it only supports `xr.DataArray`s, concatenates them by axis position similar to `np.concatenate`, and overwrites all dimension names.

I often need this to combine very different feature types.

```python
from typing import Union, Tuple, List
import numpy as np
import xarray as xr


def concat_by_axis(
        darrs: Union[List[xr.DataArray], Tuple[xr.DataArray]],
        dims: Union[List[str], Tuple[str]],
        axis: int = None,
        **kwargs
):
    """
    Concat arrays along some axis similar to `np.concatenate`.
    Automatically renames the dimensions to `dims`.
    Please note that this renaming happens by axis position; therefore,
    make sure to transpose all arrays to the correct dimension order.

    :param darrs: List or tuple of xr.DataArrays
    :param dims: The dimension names of the resulting array. Renames axes where necessary.
    :param axis: The axis which should be concatenated along
    :param kwargs: Additional arguments which will be passed to `xr.concat()`
    :return: Concatenated xr.DataArray with dimensions `dims`.
    """
    # Get depth of nested lists. Assumes `darrs` is correctly formatted as list of lists.
    if axis is None:
        axis = 0
        l = darrs
        # while l is a non-empty list or tuple, descend one nesting level;
        # axis counts from the end
        while isinstance(l, (list, tuple)) and l:
            axis -= 1
            l = l[0]
        if axis == 0:
            raise ValueError("`darrs` has to be a (possibly nested) list or tuple of xr.DataArrays!")

    to_concat = list()
    for i, da in enumerate(darrs):
        # recursive call for nested arrays;
        # the innermost call should have axis = -1,
        # the outermost call should have axis = -depth_of_darrs
        if isinstance(da, (list, tuple)):
            da = concat_by_axis(da, dims=dims, axis=axis + 1, **kwargs)

        if not isinstance(da, xr.DataArray):
            raise ValueError("Input %d must be a xr.DataArray" % i)
        if len(da.dims) != len(dims):
            raise ValueError("Input %d must have the same number of dimensions as specified in the `dims` argument!" % i)

        # force-rename dimensions
        da = da.rename(dict(zip(da.dims, dims)))

        to_concat.append(da)

    return xr.concat(to_concat, dim=dims[axis], **kwargs)
```
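For illustration, the core of this helper (rename dimensions by position, then concatenate) can be exercised on a tiny self-contained example; the array contents and dimension names here are made up:

```python
import numpy as np
import xarray as xr

# two arrays with unrelated dimension names but compatible shapes
a = xr.DataArray(np.zeros((2, 3)), dims=("x", "y"))
b = xr.DataArray(np.ones((2, 3)), dims=("u", "v"))

# rename dims by position, then concat along the first target dim
dims = ("row", "col")
renamed = [da.rename(dict(zip(da.dims, dims))) for da in (a, b)]
out = xr.concat(renamed, dim="row")
```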

Would it make sense to include this in xarray?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3281/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
636512559 MDU6SXNzdWU2MzY1MTI1NTk= 4143 [Feature request] Masked operations Hoeze 1200058 open 0     1 2020-06-10T20:04:45Z 2021-04-22T20:54:03Z   NONE      

Xarray already has `unstack(sparse=True)`, which is quite awesome. However, in many cases it is costly to convert a very dense array (existing values >> missing values) to a sparse representation. Also, many calculations require converting the sparse array back into a dense array and manually masking the missing values (e.g. Keras).

Logically, a sparse array is equal to a masked dense array; they differ only in their internal data representation. Therefore, I would propose a `masked=True` option for all operations that can create missing values. These include (amongst others):

- `.unstack([...], masked=True)`
- `.where(<multi-dimensional array>, masked=True)`
- `.align([...], masked=True)`

This would solve a number of problems:

- No more conversion of int -> float
- An explicit value for missingness
- When stacking data with missing values, the missing values can simply be dropped
- When converting data with missing values to a DataFrame, the missing values can simply be dropped

MCVE Code Sample

An example would be outer joins with slightly different coordinates (taken from the documentation):

```python
x
<xarray.DataArray (lat: 2, lon: 2)>
array([[25, 35],
       [10, 24]])
Coordinates:
  * lat      (lat) float64 35.0 40.0
  * lon      (lon) float64 100.0 120.0

y
<xarray.DataArray (lat: 2, lon: 2)>
array([[20,  5],
       [ 7, 13]])
Coordinates:
  * lat      (lat) float64 35.0 42.0
  * lon      (lon) float64 100.0 120.0
```

Non-masked outer join:

```python
a, b = xr.align(x, y, join="outer")
a
<xarray.DataArray (lat: 3, lon: 2)>
array([[25., 35.],
       [10., 24.],
       [nan, nan]])
Coordinates:
  * lat      (lat) float64 35.0 40.0 42.0
  * lon      (lon) float64 100.0 120.0
b
<xarray.DataArray (lat: 3, lon: 2)>
array([[20.,  5.],
       [nan, nan],
       [ 7., 13.]])
Coordinates:
  * lat      (lat) float64 35.0 40.0 42.0
  * lon      (lon) float64 100.0 120.0
```

The masked version:

```python
a, b = xr.align(x, y, join="outer", masked=True)
a
<xarray.DataArray (lat: 3, lon: 2)>
masked_array(
  data=[[25, 35],
        [10, 24],
        [--, --]],
  mask=[[False, False],
        [False, False],
        [True, True]],
  fill_value=0)
Coordinates:
  * lat      (lat) float64 35.0 40.0 42.0
  * lon      (lon) float64 100.0 120.0
b
<xarray.DataArray (lat: 3, lon: 2)>
masked_array(
  data=[[20, 5],
        [--, --],
        [7, 13]],
  mask=[[False, False],
        [True, True],
        [False, False]],
  fill_value=0)
Coordinates:
  * lat      (lat) float64 35.0 40.0 42.0
  * lon      (lon) float64 100.0 120.0
```

Related issue: https://github.com/pydata/xarray/issues/3955

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4143/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
512879550 MDU6SXNzdWU1MTI4Nzk1NTA= 3452 [feature request] __iter__() for rolling-window on datasets Hoeze 1200058 open 0     2 2019-10-26T20:08:06Z 2021-02-18T21:41:58Z   NONE      

Currently, rolling() on a dataset does not return an iterator:

MCVE Code Sample

```python
arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5), dims=('x', 'y'))

r = arr.to_dataset(name="test").rolling(y=3)
for label, arr_window in r:
    print(label)
```

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-b1703cb71c1e> in <module>
      3
      4 r = arr.to_dataset(name="test").rolling(y=3)
----> 5 for label, arr_window in r:
      6     print(label)

TypeError: 'DatasetRolling' object is not iterable
```
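Until `DatasetRolling` becomes iterable, one possible workaround is `.construct()`, which materializes the rolling windows along a new dimension that can then be iterated by position. A sketch on the same toy data:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5), dims=("x", "y"))
ds = arr.to_dataset(name="test")

# construct() adds a "window" dimension holding each rolling window;
# windows at the start are NaN-padded
windows = ds.rolling(y=3).construct("window")
for i in range(ds.sizes["y"]):
    w = windows.isel(y=i)  # the window ending at position i
```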

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.3.7-arch1-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8
libhdf5: 1.10.4
libnetcdf: None
xarray: 0.13.0
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: None
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.1.0
distributed: 2.1.0
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.1.1
conda: None
pytest: None
IPython: 7.8.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3452/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
528060435 MDU6SXNzdWU1MjgwNjA0MzU= 3570 fillna on dataset converts all variables to float Hoeze 1200058 open 0     5 2019-11-25T12:39:49Z 2020-09-15T15:35:04Z   NONE      

MCVE Code Sample

```python
xr.Dataset(
    {
        "A": ("x", [np.nan, 2, np.nan, 0]),
        "B": ("x", [3, 4, np.nan, 1]),
        "C": ("x", [True, True, False, False]),
        "D": ("x", [np.nan, 3, np.nan, 4])
    },
    coords={"x": [0, 1, 2, 3]}
).fillna(value={"A": 0})
```

```
<xarray.Dataset>
Dimensions:  (x: 4)
Coordinates:
  * x        (x) int64 0 1 2 3
Data variables:
    A        (x) float64 0.0 2.0 0.0 0.0
    B        (x) float64 3.0 4.0 nan 1.0
    C        (x) float64 1.0 1.0 0.0 0.0
    D        (x) float64 nan 3.0 nan 4.0
```

Expected Output

```
<xarray.Dataset>
Dimensions:  (x: 4)
Coordinates:
  * x        (x) int64 0 1 2 3
Data variables:
    A        (x) float64 0.0 2.0 0.0 0.0
    B        (x) float64 3.0 4.0 nan 1.0
    C        (x) bool True True False False
    D        (x) float64 nan 3.0 nan 4.0
```

Problem Description

I'd like to use `fillna` to replace NaNs in some of a Dataset's variables. However, `fillna` unexpectedly converts all variables to float, even if they are boolean or integer.

Would it be possible to apply `fillna` only to float/object dtypes, and to respect the `value` argument when I only want to fill a subset of the dataset's variables?
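A workaround that avoids the dtype upcast today is to fill only the targeted variable and assign it back, leaving the other variables untouched. A sketch on a reduced version of the dataset above:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "A": ("x", [np.nan, 2, np.nan, 0]),
        "C": ("x", [True, True, False, False]),
    },
    coords={"x": [0, 1, 2, 3]},
)
# fill only "A"; per-variable fillna never touches "C", so its dtype survives
ds = ds.assign(A=ds["A"].fillna(0))
```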

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.27.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: None
pytest: 5.0.1
IPython: 7.8.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3570/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
566509807 MDU6SXNzdWU1NjY1MDk4MDc= 3775 [Question] Efficient shortcut for unstacking only parts of dimension? Hoeze 1200058 open 0     1 2020-02-17T20:46:03Z 2020-03-07T04:53:05Z   NONE      

Hi all, is there an efficient way to unstack only parts of a MultiIndex?

Consider for example the following dataset:

```
<xarray.Dataset>
Dimensions:                      (observations: 17525)
Coordinates:
  * observations                 (observations) MultiIndex
  - subtissue                    (observations) object 'Skin_Sun_Exposed_Lower_leg' ... 'Thyroid'
  - individual                   (observations) object 'GTEX-111FC' ... 'GTEX-ZZPU'
  - gene                         (observations) object 'ENSG00000140400' ... 'ENSG00000174233'
  - end                          (observations) object '5' '5' '5' ... '3' '3'
Data variables:
    fraser_min_pval              (observations) float64 dask.array<chunksize=(17525,), meta=np.ndarray>
    fraser_min_minus_log10_pval  (observations) float64 dask.array<chunksize=(17525,), meta=np.ndarray>
```

Here, I have a MultiIndex `observations=["subtissue", "individual", "gene", "end"]`. However, I would like to have `end` in its own dimension. Currently, I have to do the following:

```python
xrds.unstack("observations").stack(observations=["subtissue", "individual", "gene"])
```

However, this seems quite inefficient and introduces NaNs.
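One possible workaround is to go through pandas, which can unstack a single MultiIndex level directly. Note that this materializes the data, losing dask laziness; the small index and variable name below are a made-up stand-in for the dataset above:

```python
import numpy as np
import pandas as pd
import xarray as xr

# hypothetical miniature of the dataset: 3-level MultiIndex, one variable
idx = pd.MultiIndex.from_product(
    [["tissueA", "tissueB"], ["ind1", "ind2"], ["5", "3"]],
    names=["subtissue", "individual", "end"],
)
ds = xr.Dataset({"pval": ("observations", np.arange(8.0))},
                coords={"observations": idx})

# pandas unstacks only the "end" level, avoiding the full
# unstack/restack round trip through xarray
df = ds.to_dataframe()["pval"].unstack(level="end")
```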

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1062.1.2.el7.x86_64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.0
pandas: 1.0.0
numpy: 1.17.5
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: 3.1.3
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 45.1.0.post20200119
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3775/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
325661581 MDU6SXNzdWUzMjU2NjE1ODE= 2175 [Feature Request] Visualizing dimensions Hoeze 1200058 open 0     4 2018-05-23T11:22:29Z 2019-07-12T16:10:23Z   NONE      

Hi, I'm curious how you created your logo.

I'd like to create visualizations of the dimensions in my dataset, similar to your logo. Functionality that simplifies this task would be a very useful addition to xarray.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2175/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 20.569ms · About: xarray-datasette