
issues


5 rows where user = 5308236 sorted by updated_at descending



Issue #2978 (id 446868198): sel(method=x) is not propagated for MultiIndex
opened by mschrimpf (5308236) · state: open · comments: 3 · created: 2019-05-21T23:30:56Z · updated: 2022-04-09T02:09:00Z · author_association: NONE

When passing a method different from None to the selection (e.g. `.sel(method='nearest')`), it is not propagated if the index is a MultiIndex. Specifically, the `method` key appears to be dropped in `xarray/core/indexing.py:convert_label_indexer`: https://github.com/pydata/xarray/blob/0811141e8f985a1f3b95ead92c3850cc74e160a5/xarray/core/indexing.py#L158-L159

For a normal index, the method is passed properly: https://github.com/pydata/xarray/blob/0811141e8f985a1f3b95ead92c3850cc74e160a5/xarray/core/indexing.py#L181

This leads to an unexpected KeyError when the selection value is not in the index, even if a nearest value could have been found.
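The gap is easiest to see at the pandas level that xarray delegates to. A minimal sketch in plain pandas (not xarray's actual code path): an exact `get_loc` raises `KeyError` for a missing label, while `get_indexer(..., method='nearest')` finds the closest one, which is what a propagated `method` would enable.

```python
import pandas as pd

flat = pd.Index([0.0, 1.0, 2.0])

# An exact lookup raises KeyError for a label not in the index...
try:
    flat.get_loc(0.9)
except KeyError:
    print("exact lookup: KeyError")

# ...while a nearest-match lookup succeeds:
print("nearest position:", flat.get_indexer([0.9], method="nearest")[0])
```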

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-143-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.8
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.1.0
netCDF4: 1.4.2
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.20.0
distributed: None
matplotlib: 3.0.1
cartopy: None
seaborn: 0.9.0
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: 3.10.0
IPython: 7.1.1
sphinx: None
```
Reactions: none (total_count 0) — https://api.github.com/repos/pydata/xarray/issues/2978/reactions
repo: xarray (13221727) · type: issue
Issue #2537 (id 376953925): single coordinate is overwritten with dimension by set_index
opened by mschrimpf (5308236) · state: open · comments: 8 · created: 2018-11-02T20:17:54Z · updated: 2020-11-02T17:24:35Z · author_association: NONE

Code Sample

```python
import xarray as xr

d = xr.DataArray([0], coords={'coord': ('dim', [0])}, dims=['dim'])
d.set_index(append=True, inplace=True, dim=['coord'])
d.sel(dim=0)    # works
d.sel(coord=0)  # doesn't work, coord does not exist anymore
print(d)
```

```
<xarray.DataArray (dim: 1)>
array([0])
Coordinates:
  * dim      (dim) int64 0
```

Problem description

When a DataArray is initialized with a dimension containing only one coordinate, selecting on that coordinate is not directly possible. As a workaround, we can call set_index, but if there is only one coordinate on a dimension, the coordinate vanishes and its values are attached directly to the dimension.

The DataArrays in my use case are generic: in some cases there are multiple coordinates on a dimension, and sometimes there is only one. If the one consistent coordinate is discarded in some cases, follow-up code becomes tedious. Having a single-coordinate MultiIndex would be much more intuitive, so that one can still .sel over the coordinate.

Expected Output

```
<xarray.DataArray (dim: 1)>
array([0])
Coordinates:
  * dim      (dim) MultiIndex
  - coord    (dim) int64 0
```

For more than one coordinate on the dimension, the dimension becomes a MultiIndex with all the coordinates. With only a single coordinate however, this does not happen.
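At the pandas level, a single-level MultiIndex is perfectly legal, so the expected output above is representable; a minimal sketch in plain pandas (the xarray wiring is the part this issue asks for):

```python
import pandas as pd

# pandas happily builds a MultiIndex with just one level
mi = pd.MultiIndex.from_arrays([[0]], names=["coord"])
print(type(mi).__name__)  # MultiIndex
print(mi.nlevels)         # 1
```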

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-137-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.8
pandas: 0.23.4
numpy: 1.15.1
scipy: 1.1.0
netCDF4: 1.4.1
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.19.1
distributed: None
matplotlib: 2.2.3
cartopy: None
seaborn: 0.9.0
setuptools: 39.1.0
pip: 10.0.1
conda: None
pytest: 3.8.0
IPython: 6.5.0
sphinx: None
```
Reactions: none (total_count 0) — https://api.github.com/repos/pydata/xarray/issues/2537/reactions
repo: xarray (13221727) · type: issue
Issue #2452 (id 365678022): DataArray.sel extremely slow
opened by mschrimpf (5308236) · state: closed · comments: 5 · created: 2018-10-01T23:09:47Z · updated: 2018-10-02T16:15:00Z · closed: 2018-10-02T15:58:21Z · author_association: NONE

Problem description

.sel is an xarray method I use a lot, and I would have expected it to be fairly efficient. However, even on tiny DataArrays, it takes seconds.

Code Sample, a copy-pastable example if possible

```python
import timeit

setup = """
import itertools
import numpy as np
import xarray as xr
import string

a = list(string.printable)
b = list(string.ascii_lowercase)
d = xr.DataArray(np.random.rand(len(a), len(b)), coords={'a': a, 'b': b}, dims=['a', 'b'])
d.load()
"""

run = """
for _a, _b in itertools.product(a, b):
    d.sel(a=_a, b=_b)
"""

running_times = timeit.repeat(run, setup, repeat=3, number=10)
print("xarray", running_times)
# e.g. [14.792144000064582, 15.19372400001157, 15.345327000017278]
```

Expected Output

I would have expected the above code to run in milliseconds. However, it takes over 10 seconds! Adding an additional `d = d.stack(aa=['a'], bb=['b'])` makes it even slower, about twice as slow.

For reference, a naive dict-indexing implementation in plain Python takes about 0.01 seconds:

```python
import timeit

setup = """
import itertools
import numpy as np
import string

a = list(string.printable)
b = list(string.ascii_lowercase)

d = np.random.rand(len(a), len(b))
indexers = {'a': {coord: index for (index, coord) in enumerate(a)},
            'b': {coord: index for (index, coord) in enumerate(b)}}
"""

run = """
for _a, _b in itertools.product(a, b):
    index_a, index_b = indexers['a'][_a], indexers['b'][_b]
    item = d[index_a][index_b]
"""

running_times = timeit.repeat(run, setup, repeat=3, number=10)
print("dicts", running_times)
# e.g. [0.015355999930761755, 0.01466800004709512, 0.014295000000856817]
```
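The per-call overhead of `.sel` dominates here; batching all lookups into one vectorized operation amortizes it. A hedged sketch in plain numpy/pandas (not xarray's API) of what a batched version of the loop above looks like:

```python
import itertools
import string

import numpy as np
import pandas as pd

a = list(string.printable)
b = list(string.ascii_lowercase)
d = np.random.rand(len(a), len(b))

ia, ib = pd.Index(a), pd.Index(b)
pairs = list(itertools.product(a, b))

# one vectorized label -> position translation per axis,
# then a single fancy-indexing call instead of len(pairs) .sel calls
rows = ia.get_indexer([p[0] for p in pairs])
cols = ib.get_indexer([p[1] for p in pairs])
vals = d[rows, cols]
print(vals.shape)  # (2600,) == (len(a) * len(b),)
```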

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-17134-Microsoft
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8

xarray: 0.10.8
pandas: 0.23.4
numpy: 1.15.1
scipy: 1.1.0
netCDF4: 1.4.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.2.3
cartopy: None
seaborn: None
setuptools: 40.2.0
pip: 10.0.1
conda: None
pytest: 3.7.4
IPython: 6.5.0
sphinx: None
```

This is a follow-up to #2438.

Reactions: none (total_count 0) — https://api.github.com/repos/pydata/xarray/issues/2452/reactions
state_reason: completed · repo: xarray (13221727) · type: issue
Issue #2438 (id 363629186): Efficient workaround to group by multiple dimensions
opened by mschrimpf (5308236) · state: closed · comments: 3 · created: 2018-09-25T15:11:38Z · updated: 2018-10-02T15:56:53Z · closed: 2018-10-02T15:56:53Z · author_association: NONE

Grouping by multiple dimensions is not yet supported (#324):

```python
d = DataAssembly([[1, 2, 3], [4, 5, 6]],
                 coords={'a': ('multi_dim', ['a', 'b']),
                         'c': ('multi_dim', ['c', 'c']),
                         'b': ['x', 'y', 'z']},
                 dims=['multi_dim', 'b'])
d.groupby(['a', 'b'])
# TypeError: `group` must be an xarray.DataArray or the name of
# an xarray variable or dimension
```

An inefficient solution is to run the for loops manually:

```python
a, b = np.unique(d['a'].values), np.unique(d['b'].values)
result = xr.DataArray(np.zeros([len(a), len(b)]), coords={'a': a, 'b': b}, dims=['a', 'b'])
for a, b in itertools.product(a, b):
    cells = d.sel(a=a, b=b)
    merge = cells.mean()
    result.loc[{'a': a, 'b': b}] = merge
```

which yields

```
result = <xarray.DataArray (a: 2, b: 2)>
array([[2., 3.],
       [5., 6.]])
Coordinates:
  * a        (a) <U1 'x' 'y'
  * b        (b) int64 0 1
```

This is, however, horribly slow for larger arrays. Is there a more efficient / straightforward workaround?
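One common workaround is to drop to pandas, which does support grouping by several keys at once. A sketch with hypothetical stand-in data (a real DataArray would be flattened first, e.g. via `.to_dataframe()`):

```python
import pandas as pd

# stand-in for the flattened DataArray: one row per cell,
# with both grouping keys as ordinary columns
df = pd.DataFrame({
    "a":   ["a", "a", "b", "b"],
    "b":   ["x", "y", "x", "y"],
    "val": [1.0, 2.0, 4.0, 5.0],
})

# group by both keys in one pass, then pivot back to a 2-D layout
out = df.groupby(["a", "b"])["val"].mean().unstack("b")
print(out)
```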

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-17134-Microsoft
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8

xarray: 0.10.8
pandas: 0.23.4
numpy: 1.15.1
scipy: 1.1.0
netCDF4: 1.4.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.2.3
cartopy: None
seaborn: None
setuptools: 40.2.0
pip: 10.0.1
conda: None
pytest: 3.7.4
IPython: 6.5.0
sphinx: None
```

Related: #324, https://stackoverflow.com/questions/52453426/grouping-by-multiple-dimensions

Reactions: none (total_count 0) — https://api.github.com/repos/pydata/xarray/issues/2438/reactions
state_reason: completed · repo: xarray (13221727) · type: issue
Issue #2095 (id 319085244): combine complementary DataArrays
opened by mschrimpf (5308236) · state: closed · comments: 1 · created: 2018-05-01T01:02:26Z · updated: 2018-05-02T01:34:53Z · closed: 2018-05-02T01:34:52Z · author_association: NONE

I have a list of DataArrays with three dimensions. For each item in the list, two of the dimensions hold only a single value, but the combination of all items covers the full set of combinatorial values.

Code Sample

```python
import itertools

import numpy as np
import xarray as xr

ds = []
for vals_dim1, vals_dim2 in itertools.product(list(range(2)), list(range(3))):
    d = xr.DataArray(np.random.rand(1, 1, 4),
                     coords={'dim1': [vals_dim1], 'dim2': [vals_dim2], 'dim3': range(4)},
                     dims=['dim1', 'dim2', 'dim3'])
    ds.append(d)

```

Expected Output

I then want to combine these complementary DataArrays, but nothing I have tried so far works. The result should be a DataArray with shape 2×3×4 and dimensions dim1: 2, dim2: 3, dim3: 4.

The following do not work:

```python
# does not automatically infer dimensions and fails with
# "ValueError: conflicting sizes for dimension 'concat_dim': length 2 on 'concat_dim' and length 6 on <this-array>"
ds = xr.concat(ds, dim=['dim1', 'dim2'])

# will still try to insert a new `concat_dim` and fails with
# "ValueError: conflicting MultiIndex level name(s): 'dim1' (concat_dim), (dim1) 'dim2' (concat_dim), (dim2)"
import pandas as pd
dims = [[0] * 3 + [1] * 3, list(range(3)) * 2]
dims = pd.MultiIndex.from_arrays(dims, names=['dim1', 'dim2'])
ds = xr.concat(ds, dim=dims)

# fails with
# AttributeError: 'DataArray' object has no attribute 'data_vars'
ds = xr.auto_combine(ds)

```
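What the combination has to do can be sketched at the numpy level: concatenate along dim2 within each dim1 value, then concatenate the results along dim1 (i.e. a nested concat; the xarray equivalent would be two nested `xr.concat` calls):

```python
import numpy as np

# six (1, 1, 4) blocks in dim1-major order, as produced by the loop above
arrs = [np.random.rand(1, 1, 4) for _ in range(2 * 3)]

# inner concat along dim2, outer concat along dim1
full = np.concatenate(
    [np.concatenate(arrs[i * 3:(i + 1) * 3], axis=1) for i in range(2)],
    axis=0,
)
print(full.shape)  # (2, 3, 4)
```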

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-43-Microsoft
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

xarray: 0.10.2
pandas: 0.22.0
numpy: 1.14.2
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
setuptools: 38.5.1
pip: 10.0.1
conda: None
pytest: 3.4.2
IPython: 6.2.1
sphinx: None
```
Reactions: none (total_count 0) — https://api.github.com/repos/pydata/xarray/issues/2095/reactions
state_reason: completed · repo: xarray (13221727) · type: issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 3279.974ms · About: xarray-datasette