issues


6 rows where repo = 13221727, state = "closed" and user = 13190237 sorted by updated_at descending


type 2

  • issue 4
  • pull 2

state 1

  • closed · 6 ✖

repo 1

  • xarray · 6 ✖
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
374025325 MDU6SXNzdWUzNzQwMjUzMjU= 2511 Array indexing with dask arrays ulijh 13190237 closed 0     20 2018-10-25T16:13:11Z 2023-03-15T02:48:00Z 2023-03-15T02:48:00Z CONTRIBUTOR      

Code example

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.ones((10, 10))).chunk(2)
indc = xr.DataArray(np.random.randint(0, 9, 10)).chunk(2)

# This fails:
da[{'dim_1': indc}].values
```

Problem description

Indexing with chunked arrays fails, whereas it's fine with "normal" arrays. In case the indices are the result of a lazy calculation, I would like to continue lazily.

Expected Output

I would expect an output just like in the "un-chunked" case:

```
da[{'dim_1': indc.compute()}].values

# Returns: array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
```
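
The expected result is one-dimensional because `indc` carries its own dimension named `dim_0`, so each index is paired with the matching position along `dim_0` instead of selecting whole columns. The NumPy-only sketch below illustrates that pairing; it does not involve dask and does not reproduce the failure, and the variable names are illustrative.

```python
import numpy as np

data = np.ones((10, 10))
rng = np.random.default_rng(0)
idx = rng.integers(0, 9, size=10)  # same value range as np.random.randint(0, 9, 10)

# Pair row i with column idx[i]: one value per row, not a 10x10 block.
picked = data[np.arange(data.shape[0]), idx]
print(picked.shape)
```

The lazy version requested in the issue would produce the same values, only deferred until `.values` or `.compute()` is called.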

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.14-arch1-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.utf8
LOCALE: de_DE.UTF-8
xarray: 0.10.9
pandas: 0.23.4
numpy: 1.15.2
scipy: 1.1.0
netCDF4: None
h5netcdf: 0.6.2
h5py: 2.8.0
Nio: None
zarr: None
cftime: None
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.19.4
distributed: None
matplotlib: 2.2.3
cartopy: 0.16.0
seaborn: None
setuptools: 40.4.3
pip: 18.0
conda: None
pytest: 3.8.2
IPython: 6.5.0
sphinx: 1.8.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2511/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
608974755 MDU6SXNzdWU2MDg5NzQ3NTU= 4015 apply_ufunc gives wrong dtype with dask=parallelized and vectorized=True ulijh 13190237 closed 0     2 2020-04-29T11:17:48Z 2020-08-19T06:57:56Z 2020-08-19T06:57:56Z CONTRIBUTOR      

Applying a function to a data array with dtype = complex returns one with dtype = float. It seems to have worked before commit 17b70caa6eafa062fd31e7f39334b3de922ff422.

MCVE Code Sample

```python
import numpy as np
import xarray as xr


def func(x):
    return np.sum(x ** 2)


# 2 * 3 * 4 = 24 elements, matching the (2, 3, 4) shape
da = xr.DataArray(np.arange(2 * 3 * 4).reshape(2, 3, 4))
da = da + 1j * da
da = da.chunk(dict(dim_1=1))

da2 = xr.apply_ufunc(
    func,
    da,
    vectorize=True,
    dask="parallelized",
    output_dtypes=[da.dtype],
)

assert da2.dtype == da.dtype, "wrong dtype"
```

Expected Output

da and da2 should both have the same dtype=complex.

Problem Description

It seems to me that the kwarg meta=None somehow causes dask to allocate a float array and ignore the dtype kwarg (which seems to be carried through correctly down to dask.array.blockwise.blockwise()). I'm not familiar with the apply_ufunc and dask code, so I can't tell on which end the bug sits.
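
For reference, `vectorize=True` makes `apply_ufunc` wrap `func` with `numpy.vectorize`. The sketch below uses plain NumPy only (no dask, so it does not reproduce the bug) to show the dtype the report expects. Treating `otypes` as the counterpart of `output_dtypes` is an analogy made for this illustration, not a description of xarray's internals.

```python
import numpy as np


def func(x):
    return np.sum(x ** 2)


arr = np.arange(2 * 3 * 4).reshape(2, 3, 4)
arr = arr + 1j * arr  # complex input, as in the report

# With no core dimensions, func is applied to one element at a time.
vfunc = np.vectorize(func, otypes=[arr.dtype])
out = vfunc(arr)

print(out.dtype)
```

Here the output keeps the complex dtype of the input, which is what `da2.dtype == da.dtype` asserts in the report.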

Versions

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.2 (default, Apr 8 2020, 14:31:25) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.6.5-arch3-1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.utf8
LOCALE: de_DE.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.2.dev47+g33a66d63
pandas: 1.0.3
numpy: 1.18.3
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.1.1.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.12.0
distributed: 2.14.0
matplotlib: 3.2.1
cartopy: 0.17.0
seaborn: 0.10.0
numbagg: None
pint: None
setuptools: 46.1.3
pip: 20.0.2
conda: None
pytest: 5.4.1
IPython: 7.13.0
sphinx: 3.0.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4015/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
484212164 MDExOlB1bGxSZXF1ZXN0MzEwMTQwMTI2 3244 Make argmin/max work lazy with dask ulijh 13190237 closed 0     7 2019-08-22T20:55:49Z 2019-10-25T11:53:42Z 2019-09-06T23:15:19Z CONTRIBUTOR   0 pydata/xarray/pulls/3244

As @shoyer pointed out in https://github.com/pydata/xarray/issues/3237, nanargmax/min from numpy or dask should be used when not working on object arrays. Also, nanargmin/max were added to the nputils module so they should be using bottleneck if available.

  • [x] Closes #3237
  • [x] Tests added
  • [x] Passes black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3244/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
483280810 MDU6SXNzdWU0ODMyODA4MTA= 3237 ``argmax()`` causes dask to compute ulijh 13190237 closed 0     4 2019-08-21T08:41:20Z 2019-09-06T23:15:19Z 2019-09-06T23:15:19Z CONTRIBUTOR      

Problem Description

While digging into #2511, I found that da.argmax() triggers a compute on a dask array in nanargmax(a, axis=None): https://github.com/pydata/xarray/blob/131f6022763d35edf461b11857d7c4ec6630b19d/xarray/core/nanops.py#L120. I feel this shouldn't be the case, since da.max() and da.data.argmax() don't compute, and it renders the laziness useless.

MCVE Code Sample

```python
In [1]: import numpy as np
   ...: import dask
   ...: import xarray as xr

In [2]: class Scheduler:
   ...:     """ From: https://stackoverflow.com/questions/53289286/ """
   ...:
   ...:     def __init__(self, max_computes=0):
   ...:         self.max_computes = max_computes
   ...:         self.total_computes = 0
   ...:
   ...:     def __call__(self, dsk, keys, **kwargs):
   ...:         self.total_computes += 1
   ...:         if self.total_computes > self.max_computes:
   ...:             raise RuntimeError(
   ...:                 "Too many dask computations were scheduled: {}".format(
   ...:                     self.total_computes
   ...:                 )
   ...:             )
   ...:         return dask.get(dsk, keys, **kwargs)
   ...:

In [3]: scheduler = Scheduler()

In [4]: with dask.config.set(scheduler=scheduler):
   ...:     da = xr.DataArray(
   ...:         np.random.rand(2 * 3 * 4).reshape((2, 3, 4)),
   ...:     ).chunk(dict(dim_0=1))
   ...:
   ...:     dim = da.dims[-1]
   ...:
   ...:     dask_idcs = da.data.argmax(axis=-1)  # Computes 0 times
   ...:     print("Total number of computes: %d" % scheduler.total_computes)
   ...:
   ...:     da.max(dim)  # Computes 0 times
   ...:     print("Total number of computes: %d" % scheduler.total_computes)
   ...:
   ...:     da.argmax(dim)  # Does compute
   ...:     print("Total number of computes: %d" % scheduler.total_computes)
   ...:
Total number of computes: 0
Total number of computes: 0


RuntimeError                              Traceback (most recent call last)
<ipython-input-4-f95c8753dbe6> in <module>
     12     print("Total number of computes: %d" % scheduler.total_computes)
     13
---> 14     da.argmax(dim)  # Does compute
     15     print("Total number of computes: %d" % scheduler.total_computes)
     16

~/src/xarray/xarray/core/common.py in wrapped_func(self, dim, axis, skipna, **kwargs)
     42             def wrapped_func(self, dim=None, axis=None, skipna=None, **kwargs):
     43                 return self.reduce(
---> 44                     func, dim, axis, skipna=skipna, allow_lazy=True, **kwargs
     45                 )
     46

~/src/xarray/xarray/core/dataarray.py in reduce(self, func, dim, axis, keep_attrs, keepdims, **kwargs)
   2120         """
   2121
-> 2122         var = self.variable.reduce(func, dim, axis, keep_attrs, keepdims, **kwargs)
   2123         return self._replace_maybe_drop_dims(var)
   2124

~/src/xarray/xarray/core/variable.py in reduce(self, func, dim, axis, keep_attrs, keepdims, allow_lazy, **kwargs)
   1456         input_data = self.data if allow_lazy else self.values
   1457         if axis is not None:
-> 1458             data = func(input_data, axis=axis, **kwargs)
   1459         else:
   1460             data = func(input_data, **kwargs)

~/src/xarray/xarray/core/duck_array_ops.py in f(values, axis, skipna, **kwargs)
    279
    280         try:
--> 281             return func(values, axis=axis, **kwargs)
    282         except AttributeError:
    283             if isinstance(values, dask_array_type):

~/src/xarray/xarray/core/nanops.py in nanargmax(a, axis)
    118     if mask is not None:
    119         mask = mask.all(axis=axis)
--> 120         if mask.any():
    121             raise ValueError("All-NaN slice encountered")
    122     return res

/usr/lib/python3.7/site-packages/dask/array/core.py in __bool__(self)
   1370             )
   1371         else:
-> 1372             return bool(self.compute())
   1373
   1374     __nonzero__ = __bool__  # python 2

/usr/lib/python3.7/site-packages/dask/base.py in compute(self, **kwargs)
    173         dask.base.compute
    174         """
--> 175         (result,) = compute(self, traverse=False, **kwargs)
    176         return result
    177

/usr/lib/python3.7/site-packages/dask/base.py in compute(*args, **kwargs)
    444     keys = [x.__dask_keys__() for x in collections]
    445     postcomputes = [x.__dask_postcompute__() for x in collections]
--> 446     results = schedule(dsk, keys, **kwargs)
    447     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    448
```

Expected Output

None of the methods should actually compute:

```python
Total number of computes: 0
Total number of computes: 0
Total number of computes: 0
```
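
The traceback above shows the compute being triggered by `if mask.any():`, which calls `__bool__` on a lazy dask array. The toy class below is a stand-in and not dask code; `LazyValue` and its counter are hypothetical names used only to illustrate why a truth test on a deferred value forces evaluation, while chaining further deferred operations does not.

```python
class LazyValue:
    """Toy deferred value; counts how often it is evaluated."""

    computes = 0  # class-wide counter, for illustration only

    def __init__(self, thunk):
        self._thunk = thunk

    def map(self, fn):
        # Stays lazy: wraps the pending work in a new thunk.
        return LazyValue(lambda: fn(self._thunk()))

    def compute(self):
        LazyValue.computes += 1
        return self._thunk()

    def __bool__(self):
        # A truth test needs a concrete answer, so it must evaluate.
        return bool(self.compute())


mask = LazyValue(lambda: [False, False, True])
any_nan = mask.map(any)  # lazy, nothing evaluated yet
before = LazyValue.computes

if any_nan:  # implicit __bool__ -> compute()
    triggered = True
else:
    triggered = False
after = LazyValue.computes
print(before, after)
```

Keeping a reduction lazy therefore means avoiding any `if` on an intermediate lazy result, or deferring the check itself.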

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Jul 16 2019, 07:12:58) [GCC 9.1.0]
python-bits: 64
OS: Linux
OS-release: 5.2.9-arch1-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.utf8
LOCALE: de_DE.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.0
xarray: 0.12.3+63.g131f6022
pandas: 0.25.0
numpy: 1.17.0
scipy: 1.3.1
netCDF4: 1.5.1.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.25
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.1.0
distributed: 1.27.1
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 41.0.1
pip: 19.0.3
conda: None
pytest: 5.0.1
IPython: 7.6.1
sphinx: 2.2.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3237/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
481110823 MDExOlB1bGxSZXF1ZXN0MzA3NjczNzgx 3221 Allow invalid_netcdf=True in to_netcdf() ulijh 13190237 closed 0     6 2019-08-15T11:32:56Z 2019-08-22T20:20:42Z 2019-08-22T20:12:11Z CONTRIBUTOR   0 pydata/xarray/pulls/3221

Hi all,

I prepared a little PR which could close #2243 and would allow for what is, IMO, a clean way of writing data with complex dtypes (and others).

What do you think?

TODOs:

  • [X] Closes #2243
  • [X] Tests added
  • [x] Passes black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3221/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
260279615 MDU6SXNzdWUyNjAyNzk2MTU= 1591 indexing/groupby fails on array opened with chunks from netcdf ulijh 13190237 closed 0     2 2017-09-25T13:37:43Z 2017-09-26T08:15:45Z 2017-09-26T05:36:26Z CONTRIBUTOR      

Hi, since the last update of dask (to version 0.15.3), iterating over a groupby object and indexing using np.int64 fail when the DataArray was opened with chunks from netcdf.

I'm using xarray version 0.9.6 and the 'h5netcdf' engine for reading/writing.

To reproduce:

```
import numpy as np
import xarray as xr

arr = xr.DataArray(np.random.rand(2, 3, 4), dims=['one', 'two', 'three'])
arr.to_netcdf('test.nc', engine='h5netcdf')

arr_disk = xr.open_dataarray('test.nc', engine='h5netcdf', chunks=dict(one=1))

# This produces the error:
[g for g in arr_disk.groupby('one')]

/usr/lib/python3.6/site-packages/xarray/core/groupby.py in _iter_grouped(self)
    296         """Iterate over each element in this group"""
    297         for indices in self._group_indices:
--> 298             yield self._obj.isel(**{self._group_dim: indices})
    299
    300     def _infer_concat_args(self, applied_example):

/usr/lib/python3.6/site-packages/xarray/core/dataarray.py in isel(self, drop, **indexers)
    677         DataArray.sel
    678         """
--> 679         ds = self._to_temp_dataset().isel(drop=drop, **indexers)
    680         return self._from_temp_dataset(ds)
    681

/usr/lib/python3.6/site-packages/xarray/core/dataset.py in isel(self, drop, **indexers)
   1141         for name, var in iteritems(self._variables):
   1142             var_indexers = dict((k, v) for k, v in indexers if k in var.dims)
-> 1143             new_var = var.isel(**var_indexers)
   1144             if not (drop and name in var_indexers):
   1145                 variables[name] = new_var

/usr/lib/python3.6/site-packages/xarray/core/variable.py in isel(self, **indexers)
    568             if dim in indexers:
    569                 key[i] = indexers[dim]
--> 570         return self[tuple(key)]
    571
    572     def squeeze(self, dim=None):

/usr/lib/python3.6/site-packages/xarray/core/variable.py in __getitem__(self, key)
    398         dims = tuple(dim for k, dim in zip(key, self.dims)
    399                      if not isinstance(k, integer_types))
--> 400         values = self._indexable_data[key]
    401         # orthogonal indexing should ensure the dimensionality is consistent
    402         if hasattr(values, 'ndim'):

/usr/lib/python3.6/site-packages/xarray/core/indexing.py in __getitem__(self, key)
    496                 value = value[(slice(None),) * axis + (subkey,)]
    497         else:
--> 498             value = self.array[key]
    499         return value
    500

/home/herter/.local/lib/python3.6/site-packages/dask/array/core.py in __getitem__(self, index)
   1220
   1221         from .slicing import normalize_index, slice_with_dask_array
-> 1222         index2 = normalize_index(index, self.shape)
   1223
   1224         if any(isinstance(i, Array) for i in index2):

/home/herter/.local/lib/python3.6/site-packages/dask/array/slicing.py in normalize_index(idx, shape)
    760     idx = idx + (slice(None),) * (len(shape) - n_sliced_dims)
    761     if len([i for i in idx if i is not None]) > len(shape):
--> 762         raise IndexError("Too many indices for array")
    763
    764     none_shape = []

IndexError: Too many indices for array

/home/herter/.local/lib/python3.6/site-packages/dask/array/slicing.py(762)normalize_index()
    760     idx = idx + (slice(None),) * (len(shape) - n_sliced_dims)
    761     if len([i for i in idx if i is not None]) > len(shape):
--> 762         raise IndexError("Too many indices for array")
    763
    764     none_shape = []
```

I'm getting the same error when doing arr_disk[np.int64(0)]

Thanks Uli

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1591/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
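
The filter summarized at the top of this page (repo = 13221727, state = "closed", user = 13190237, sorted by updated_at descending) can be written as SQL against this schema. The sketch below uses Python's built-in `sqlite3` with an in-memory table; the reduced column set is an assumption made to keep the example short, and the six rows are the ones listed on this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the [issues] table: only the columns the filter needs.
conn.execute(
    "CREATE TABLE [issues] ("
    " [id] INTEGER PRIMARY KEY, [number] INTEGER, [user] INTEGER,"
    " [state] TEXT, [updated_at] TEXT, [repo] INTEGER, [type] TEXT)"
)
rows = [
    (374025325, 2511, 13190237, "closed", "2023-03-15T02:48:00Z", 13221727, "issue"),
    (608974755, 4015, 13190237, "closed", "2020-08-19T06:57:56Z", 13221727, "issue"),
    (484212164, 3244, 13190237, "closed", "2019-10-25T11:53:42Z", 13221727, "pull"),
    (483280810, 3237, 13190237, "closed", "2019-09-06T23:15:19Z", 13221727, "issue"),
    (481110823, 3221, 13190237, "closed", "2019-08-22T20:20:42Z", 13221727, "pull"),
    (260279615, 1591, 13190237, "closed", "2017-09-26T08:15:45Z", 13221727, "issue"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

query = (
    "SELECT [number], [type] FROM [issues]"
    " WHERE [repo] = ? AND [state] = ? AND [user] = ?"
    " ORDER BY [updated_at] DESC"
)
result = conn.execute(query, (13221727, "closed", 13190237)).fetchall()
numbers = [number for number, _ in result]
print(numbers)
```

ISO 8601 timestamps stored as TEXT sort correctly as plain strings, which is why `ORDER BY [updated_at] DESC` works without any date conversion.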