issues
3 rows where repo = 13221727, state = "closed" and user = 5308236 sorted by updated_at descending
Columns: id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type

id: 365678022 | node_id: MDU6SXNzdWUzNjU2NzgwMjI= | number: 2452 | title: DataArray.sel extremely slow
user: mschrimpf 5308236 | state: closed | locked: 0 | comments: 5 | author_association: NONE
created_at: 2018-10-01T23:09:47Z | updated_at: 2018-10-02T16:15:00Z | closed_at: 2018-10-02T15:58:21Z
state_reason: completed | repo: xarray 13221727 | type: issue
reactions: { "url": "https://api.github.com/repos/pydata/xarray/issues/2452/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }

body:

Problem description

Code Sample, a copy-pastable example if possible

```python
import timeit

setup = """
import itertools
import numpy as np
import xarray as xr
import string

a = list(string.printable)
b = list(string.ascii_lowercase)
d = xr.DataArray(np.random.rand(len(a), len(b)), coords={'a': a, 'b': b}, dims=['a', 'b'])
d.load()
"""
run = """
for _a, _b in itertools.product(a, b):
    d.sel(a=_a, b=_b)
"""
running_times = timeit.repeat(run, setup, repeat=3, number=10)
print("xarray", running_times)  # e.g. [14.792144000064582, 15.19372400001157, 15.345327000017278]
```

Expected Output

I would have expected the above code to run in milliseconds. However, it takes over 10 seconds!

Adding an additional

For reference, a naive dict-indexing implementation in Python takes 0.01 seconds:

```python
setup = """
import itertools
import numpy as np
import string

a = list(string.printable)
b = list(string.ascii_lowercase)
d = np.random.rand(len(a), len(b))
indexers = {'a': {coord: index for (index, coord) in enumerate(a)},
            'b': {coord: index for (index, coord) in enumerate(b)}}
"""
run = """
for _a, _b in itertools.product(a, b):
    index_a, index_b = indexers['a'][_a], indexers['b'][_b]
    item = d[index_a][index_b]
"""
running_times = timeit.repeat(run, setup, repeat=3, number=10)
print("dicts", running_times)  # e.g. [0.015355999930761755, 0.01466800004709512, 0.014295000000856817]
```

Output of
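
A possible way around the slow per-element `.sel` calls above (a sketch, not taken from the issue thread): resolve each label's integer position once through the DataArray's underlying pandas indexes via `get_index`, then read items directly from the NumPy data. This assumes the coordinate labels along 'a' and 'b' are unique.

```python
# Sketch: look up label positions through the pandas indexes,
# instead of building a new DataArray with every d.sel(a=..., b=...) call.
import itertools
import string

import numpy as np
import xarray as xr

a = list(string.printable)
b = list(string.ascii_lowercase)
d = xr.DataArray(np.random.rand(len(a), len(b)), coords={'a': a, 'b': b}, dims=['a', 'b'])

index_a = d.get_index('a')  # pandas.Index over the 'a' labels
index_b = d.get_index('b')
values = d.values           # plain NumPy array, cheap to index repeatedly

for _a, _b in itertools.product(a, b):
    item = values[index_a.get_loc(_a), index_b.get_loc(_b)]
```

This mirrors the dict-based comparison in the report while reusing xarray's own index objects instead of hand-built dictionaries.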

id: 363629186 | node_id: MDU6SXNzdWUzNjM2MjkxODY= | number: 2438 | title: Efficient workaround to group by multiple dimensions
user: mschrimpf 5308236 | state: closed | locked: 0 | comments: 3 | author_association: NONE
created_at: 2018-09-25T15:11:38Z | updated_at: 2018-10-02T15:56:53Z | closed_at: 2018-10-02T15:56:53Z
state_reason: completed | repo: xarray 13221727 | type: issue
reactions: { "url": "https://api.github.com/repos/pydata/xarray/issues/2438/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }

body:

Grouping by multiple dimensions is not yet supported (#324):

An inefficient solution is to run the for loops manually:

```python
a, b = np.unique(d['a'].values), np.unique(d['b'].values)
result = xr.DataArray(np.zeros([len(a), len(b)]), coords={'a': a, 'b': b}, dims=['a', 'b'])
for a, b in itertools.product(a, b):
    cells = d.sel(a=a, b=b)
    merge = cells.mean()
    result.loc[{'a': a, 'b': b}] = merge

result = <xarray.DataArray (a: 2, b: 2)>
array([[2., 3.],
       [5., 6.]])
Coordinates:
  * a        (a) <U1 'x' 'y'
  * b        (b) int64 0 1
```

This is however horribly slow for larger arrays. Is there a more efficient / straight-forward work-around?

Output of
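
One frequently suggested workaround for a two-dimension groupby, shown here as a sketch rather than as the thread's resolution: round-trip through pandas, whose groupby accepts several index levels at once. The construction of `d` below is an assumption about the issue's setup, with duplicate coordinate labels along 'a' and 'b' that should be averaged together.

```python
# Sketch: emulate a groupby over two dimensions via pandas, then convert back to xarray.
import numpy as np
import xarray as xr

# Assumed setup: duplicate labels along both dims, to be averaged per (a, b) pair.
d = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    coords={'a': ['x', 'x', 'y'], 'b': [0, 1, 0, 1]},
    dims=['a', 'b'],
)

# to_series() yields a MultiIndex over (a, b); pandas can group on both levels at once.
result = d.to_series().groupby(level=['a', 'b']).mean().to_xarray()
print(result)  # DataArray with dims (a: 2, b: 2), one mean per (a, b) label pair
```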

id: 319085244 | node_id: MDU6SXNzdWUzMTkwODUyNDQ= | number: 2095 | title: combine complementary DataArrays
user: mschrimpf 5308236 | state: closed | locked: 0 | comments: 1 | author_association: NONE
created_at: 2018-05-01T01:02:26Z | updated_at: 2018-05-02T01:34:53Z | closed_at: 2018-05-02T01:34:52Z
state_reason: completed | repo: xarray 13221727 | type: issue
reactions: { "url": "https://api.github.com/repos/pydata/xarray/issues/2095/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }

body:

I have a list of DataArrays with three dimensions. For each item in the list, two of the dimensions are a single value but the combination of all items would yield the full combinatorial values.

Code Sample

```python
import itertools
import numpy as np
import xarray as xr
```

Expected Output

I then want to combine these complementary

The following do not work:

```python
# does not automatically infer dimensions and fails with
# "ValueError: conflicting sizes for dimension 'concat_dim': length 2 on 'concat_dim' and length 6 on <this-array>"
ds = xr.concat(ds, dim=['dim1', 'dim2'])
```

Output of
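
A sketch of one way to assemble such complementary pieces, not taken from the issue itself: `xr.combine_by_coords` (added to xarray after this 2018 report) infers where each piece belongs from its coordinate values and can combine along several dimensions at once. The construction of `parts` below is an illustrative stand-in for the truncated code sample, and the variable name `v` is arbitrary.

```python
# Sketch: combine pieces that each cover a single (dim1, dim2) cell into one array.
import itertools

import numpy as np
import xarray as xr

# Assumed setup: one piece per (dim1, dim2) pair, each of size 1 x 1 x 4.
parts = [
    xr.DataArray(
        np.random.rand(1, 1, 4),
        coords={'dim1': [i], 'dim2': [j], 'dim3': np.arange(4)},
        dims=['dim1', 'dim2', 'dim3'],
    )
    for i, j in itertools.product(range(2), range(3))
]

# Wrapping each piece in a named Dataset keeps this compatible with releases
# that do not accept bare DataArrays in combine_by_coords.
combined = xr.combine_by_coords([p.to_dataset(name='v') for p in parts])['v']
print(combined.sizes)  # expected: dim1: 2, dim2: 3, dim3: 4
```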

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
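
The filtered view described at the top of the page ("3 rows where repo = 13221727, state = 'closed' and user = 5308236 sorted by updated_at descending") can be reproduced against this schema; a minimal sketch using Python's sqlite3 module, assuming the database lives in a local file named github.db (the filename is an assumption):

```python
# Sketch: rerun the page's filter against the issues table with sqlite3.
import sqlite3

conn = sqlite3.connect("github.db")  # assumed filename for the SQLite database
rows = conn.execute(
    """
    SELECT id, number, title, created_at, updated_at, closed_at
    FROM issues
    WHERE repo = ? AND state = ? AND user = ?
    ORDER BY updated_at DESC
    """,
    (13221727, "closed", 5308236),
).fetchall()

for row in rows:
    print(row)
```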