home / github / issues

Menu
  • GraphQL API
  • Search all tables

issues: 2115621781

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2115621781 I_kwDOAMm_X85-GdOV 8696 🐛 compatibility issues with ArrayAPI and SparseAPI Protocols in `namedarray` 13301940 open 0     2 2024-02-02T19:27:07Z 2024-02-03T10:55:04Z   MEMBER      

What happened?

i'm experiencing compatibility issues when using _arrayfunction_or_api and _sparsearrayfunction_or_api with the sparse arrays with dtype=object. specifically, runtime checks using isinstance with these protocols are failing, despite the sparse array object appearing to meet the necessary criteria (attributes and methods).

What did you expect to happen?

i expected that since COO arrays from the sparse library provide the necessary attributes and methods, they would pass the isinstance checks with the defined protocols.

```python In [56]: from xarray.namedarray._typing import _arrayfunction_or_api, _sparsearrayfunc ...: tion_or_api

In [57]: import xarray as xr, sparse, numpy as np, sparse, pandas as pd ```

  • numeric dtypes work

```python In [58]: x = np.random.random((10))

In [59]: x[x < 0.9] = 0

In [60]: s = sparse.COO(x)

In [61]: isinstance(s, _arrayfunction_or_api) Out[61]: True

In [62]: s Out[62]: <COO: shape=(10,), dtype=float64, nnz=0, fill_value=0.0> ```

  • string dtypes work

```python In [63]: p = sparse.COO(np.array(['a', 'b']))

In [64]: p Out[64]: <COO: shape=(2,), dtype=<U1, nnz=2, fill_value=>

In [65]: isinstance(s, _arrayfunction_or_api) Out[65]: True ``` - object dtype doesn't work

```python In [66]: q = sparse.COO(np.array(['a', 'b']).astype(object))

In [67]: isinstance(s, _arrayfunction_or_api) Out[67]: True

In [68]: isinstance(q, _arrayfunction_or_api)

TypeError Traceback (most recent call last) File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/sparse/_umath.py:606, in _Elemwise._get_func_coords_data(self, mask) 605 try: --> 606 func_data = self.func(func_args, dtype=self.dtype, *self.kwargs) 607 except TypeError:

TypeError: real() got an unexpected keyword argument 'dtype'

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last) File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/sparse/_umath.py:611, in _Elemwise._get_func_coords_data(self, mask) 610 out = np.empty(func_args[0].shape, dtype=self.dtype) --> 611 func_data = self.func(func_args, out=out, *self.kwargs) 612 except TypeError:

TypeError: real() got an unexpected keyword argument 'out'

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last) Cell In[68], line 1 ----> 1 isinstance(q, _arrayfunction_or_api)

File ~/mambaforge/envs/xarray-tests/lib/python3.9/typing.py:1149, in _ProtocolMeta.instancecheck(cls, instance) 1147 return True 1148 if cls._is_protocol: -> 1149 if all(hasattr(instance, attr) and 1150 # All methods can be blocked by setting them to None. 1151 (not callable(getattr(cls, attr, None)) or 1152 getattr(instance, attr) is not None) 1153 for attr in _get_protocol_attrs(cls)): 1154 return True 1155 return super().instancecheck(instance)

File ~/mambaforge/envs/xarray-tests/lib/python3.9/typing.py:1149, in <genexpr>(.0) 1147 return True 1148 if cls._is_protocol: -> 1149 if all(hasattr(instance, attr) and 1150 # All methods can be blocked by setting them to None. 1151 (not callable(getattr(cls, attr, None)) or 1152 getattr(instance, attr) is not None) 1153 for attr in _get_protocol_attrs(cls)): 1154 return True 1155 return super().instancecheck(instance)

File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/sparse/_sparse_array.py:900, in SparseArray.real(self) 875 @property 876 def real(self): 877 """The real part of the array. 878 879 Examples (...) 898 numpy.real : NumPy equivalent function. 899 """ --> 900 return self.array_ufunc(np.real, "call", self)

File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/sparse/_sparse_array.py:340, in SparseArray.array_ufunc(self, ufunc, method, inputs, kwargs) 337 inputs = tuple(reversed(inputs_transformed)) 339 if method == "call": --> 340 result = elemwise(ufunc, inputs, kwargs) 341 elif method == "reduce": 342 result = SparseArray._reduce(ufunc, *inputs, kwargs)

File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/sparse/_umath.py:49, in elemwise(func, args, kwargs) 12 def elemwise(func, args, kwargs): 13 """ 14 Apply a function to any number of arguments. 15 (...) 46 it is necessary to convert Numpy arrays to :obj:COO objects. 47 """ ---> 49 return _Elemwise(func, *args, kwargs).get_result()

File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/sparse/_umath.py:480, in _Elemwise.get_result(self) 477 if not any(mask): 478 continue --> 480 r = self._get_func_coords_data(mask) 482 if r is not None: 483 coords_list.append(r[0])

File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/sparse/_umath.py:613, in _Elemwise._get_func_coords_data(self, mask) 611 func_data = self.func(func_args, out=out, self.kwargs) 612 except TypeError: --> 613 func_data = self.func(func_args, **self.kwargs).astype(self.dtype) 615 unmatched_mask = ~equivalent(func_data, self.fill_value) 617 if not unmatched_mask.any():

ValueError: invalid literal for int() with base 10: 'a'

In [69]: q Out[69]: <COO: shape=(2,), dtype=object, nnz=2, fill_value=0> ```

the failing case appears to be a well know issue

  • https://github.com/pydata/sparse/issues/104

Minimal Complete Verifiable Example

```Python In [69]: q Out[69]: <COO: shape=(2,), dtype=object, nnz=2, fill_value=0>

In [70]: n = xr.NamedArray(data=q, dims=['x']) ```

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python In [71]: n.data Out[71]: <COO: shape=(2,), dtype=object, nnz=2, fill_value=0>

In [72]: n Out[72]: --------------------------------------------------------------------------- TypeError Traceback (most recent call last) File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/sparse/_umath.py:606, in _Elemwise._get_func_coords_data(self, mask) 605 try: --> 606 func_data = self.func(func_args, dtype=self.dtype, *self.kwargs) 607 except TypeError:

TypeError: real() got an unexpected keyword argument 'dtype'

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last) File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/sparse/_umath.py:611, in _Elemwise._get_func_coords_data(self, mask) 610 out = np.empty(func_args[0].shape, dtype=self.dtype) --> 611 func_data = self.func(func_args, out=out, *self.kwargs) 612 except TypeError:

TypeError: real() got an unexpected keyword argument 'out'

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last) File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/IPython/core/formatters.py:708, in PlainTextFormatter.call(self, obj) 701 stream = StringIO() 702 printer = pretty.RepresentationPrinter(stream, self.verbose, 703 self.max_width, self.newline, 704 max_seq_length=self.max_seq_length, 705 singleton_pprinters=self.singleton_printers, 706 type_pprinters=self.type_printers, 707 deferred_pprinters=self.deferred_printers) --> 708 printer.pretty(obj) 709 printer.flush() 710 return stream.getvalue()

File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/IPython/lib/pretty.py:410, in RepresentationPrinter.pretty(self, obj) 407 return meth(obj, self, cycle) 408 if cls is not object \ 409 and callable(cls.dict.get('repr')): --> 410 return _repr_pprint(obj, self, cycle) 412 return _default_pprint(obj, self, cycle) 413 finally:

File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/IPython/lib/pretty.py:778, in repr_pprint(obj, p, cycle) 776 """A pprint that just redirects to the normal repr function.""" 777 # Find newlines and replace them with p.break() --> 778 output = repr(obj) 779 lines = output.splitlines() 780 with p.group():

File ~/devel/pydata/xarray/xarray/namedarray/core.py:987, in NamedArray.repr(self) 986 def repr(self) -> str: --> 987 return formatting.array_repr(self)

File ~/mambaforge/envs/xarray-tests/lib/python3.9/reprlib.py:21, in recursive_repr.<locals>.decorating_function.<locals>.wrapper(self) 19 repr_running.add(key) 20 try: ---> 21 result = user_function(self) 22 finally: 23 repr_running.discard(key)

File ~/devel/pydata/xarray/xarray/core/formatting.py:665, in array_repr(arr) 658 name_str = "" 660 if ( 661 isinstance(arr, Variable) 662 or _get_boolean_with_default("display_expand_data", default=True) 663 or isinstance(arr.variable._data, MemoryCachedArray) 664 ): --> 665 data_repr = short_data_repr(arr) 666 else: 667 data_repr = inline_variable_array_repr(arr.variable, OPTIONS["display_width"])

File ~/devel/pydata/xarray/xarray/core/formatting.py:633, in short_data_repr(array) 631 if isinstance(array, np.ndarray): 632 return short_array_repr(array) --> 633 elif isinstance(internal_data, _arrayfunction_or_api): 634 return limit_lines(repr(array.data), limit=40) 635 elif getattr(array, "_in_memory", None):

File ~/mambaforge/envs/xarray-tests/lib/python3.9/typing.py:1149, in _ProtocolMeta.instancecheck(cls, instance) 1147 return True 1148 if cls._is_protocol: -> 1149 if all(hasattr(instance, attr) and 1150 # All methods can be blocked by setting them to None. 1151 (not callable(getattr(cls, attr, None)) or 1152 getattr(instance, attr) is not None) 1153 for attr in _get_protocol_attrs(cls)): 1154 return True 1155 return super().instancecheck(instance)

File ~/mambaforge/envs/xarray-tests/lib/python3.9/typing.py:1149, in <genexpr>(.0) 1147 return True 1148 if cls._is_protocol: -> 1149 if all(hasattr(instance, attr) and 1150 # All methods can be blocked by setting them to None. 1151 (not callable(getattr(cls, attr, None)) or 1152 getattr(instance, attr) is not None) 1153 for attr in _get_protocol_attrs(cls)): 1154 return True 1155 return super().instancecheck(instance)

File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/sparse/_sparse_array.py:900, in SparseArray.real(self) 875 @property 876 def real(self): 877 """The real part of the array. 878 879 Examples (...) 898 numpy.real : NumPy equivalent function. 899 """ --> 900 return self.array_ufunc(np.real, "call", self)

File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/sparse/_sparse_array.py:340, in SparseArray.array_ufunc(self, ufunc, method, inputs, kwargs) 337 inputs = tuple(reversed(inputs_transformed)) 339 if method == "call": --> 340 result = elemwise(ufunc, inputs, kwargs) 341 elif method == "reduce": 342 result = SparseArray._reduce(ufunc, *inputs, kwargs)

File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/sparse/_umath.py:49, in elemwise(func, args, kwargs) 12 def elemwise(func, args, kwargs): 13 """ 14 Apply a function to any number of arguments. 15 (...) 46 it is necessary to convert Numpy arrays to :obj:COO objects. 47 """ ---> 49 return _Elemwise(func, *args, kwargs).get_result()

File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/sparse/_umath.py:480, in _Elemwise.get_result(self) 477 if not any(mask): 478 continue --> 480 r = self._get_func_coords_data(mask) 482 if r is not None: 483 coords_list.append(r[0])

File ~/mambaforge/envs/xarray-tests/lib/python3.9/site-packages/sparse/_umath.py:613, in _Elemwise._get_func_coords_data(self, mask) 611 func_data = self.func(func_args, out=out, self.kwargs) 612 except TypeError: --> 613 func_data = self.func(func_args, **self.kwargs).astype(self.dtype) 615 unmatched_mask = ~equivalent(func_data, self.fill_value) 617 if not unmatched_mask.any():

ValueError: invalid literal for int() with base 10: 'a' ```

Anything else we need to know?

i was trying to replace instances of is_duck_array with the protocol runtime checks (as part of https://github.com/pydata/xarray/pull/8319), and i've come to a realization that these runtime checks are rigid to accommodate the diverse behaviors of different array types, and is_duck_array() the function-based approach might be more manageable.

@Illviljan, are there any changes that could be made to both protocols without making them too complex?

Environment

```python INSTALLED VERSIONS ------------------ commit: 541049f45edeb518a767cb3b23fa53f6045aa508 python: 3.9.18 | packaged by conda-forge | (main, Dec 23 2023, 16:35:41) [Clang 16.0.6 ] python-bits: 64 OS: Darwin OS-release: 23.2.0 machine: arm64 processor: arm byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.14.3 libnetcdf: 4.9.2 xarray: 2024.1.2.dev50+g78dec61f pandas: 2.2.0 numpy: 1.26.3 scipy: 1.12.0 netCDF4: 1.6.5 pydap: installed h5netcdf: 1.3.0 h5py: 3.10.0 Nio: None zarr: 2.16.1 cftime: 1.6.3 nc_time_axis: 1.4.1 iris: 3.7.0 bottleneck: 1.3.7 dask: 2024.1.1 distributed: 2024.1.1 matplotlib: 3.8.2 cartopy: 0.22.0 seaborn: 0.13.2 numbagg: 0.7.1 fsspec: 2023.12.2 cupy: None pint: 0.23 sparse: 0.15.1 flox: 0.9.0 numpy_groupies: 0.9.22 setuptools: 67.7.2 pip: 23.3.2 conda: None pytest: 8.0.0 mypy: 1.8.0 IPython: 8.14.0 sphinx: None ```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8696/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    13221727 issue

Links from other tables

  • 1 row from issues_id in issues_labels
  • 0 rows from issue in issue_comments
Powered by Datasette · Queries took 87.247ms · About: xarray-datasette