
issues


5 rows where user = 1634164, sorted by updated_at descending

#6822: RuntimeError when formatting sparse-backed DataArray in f-string
id: 1316423844 · node_id: I_kwDOAMm_X85Odwik · user: khaeru (1634164) · state: closed · locked: 0 · comments: 2 · created_at: 2022-07-25T07:58:11Z · updated_at: 2022-08-09T09:17:39Z · closed_at: 2022-08-08T15:11:35Z · author_association: NONE

What happened?

On upgrading from xarray 2022.3.0 to 2022.6.0, f-string formatting of a sparse-backed DataArray raises an exception.

What did you expect to happen?

  • The code does not raise an error, or
  • A breaking change is listed in the “Breaking changes” section of the docs.

Minimal Complete Verifiable Example

```python
import pandas as pd
import xarray as xr

s = pd.Series(
    range(4),
    index=pd.MultiIndex.from_product([list("ab"), list("cd")]),
)

da = xr.DataArray.from_series(s, sparse=True)

print(f"{da}")
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```python
# xarray 2022.3.0:
<xarray.DataArray (level_0: 2, level_1: 2)>
<COO: shape=(2, 2), dtype=float64, nnz=4, fill_value=nan>
Coordinates:
  * level_0  (level_0) object 'a' 'b'
  * level_1  (level_1) object 'c' 'd'

# xarray 2022.6.0:
Traceback (most recent call last):
  File "/home/khaeru/bug.py", line 11, in <module>
    print(f"{da}")
  File "/home/khaeru/.local/lib/python3.10/site-packages/xarray/core/common.py", line 168, in __format__
    return self.values.__format__(format_spec)
  File "/home/khaeru/.local/lib/python3.10/site-packages/xarray/core/dataarray.py", line 685, in values
    return self.variable.values
  File "/home/khaeru/.local/lib/python3.10/site-packages/xarray/core/variable.py", line 527, in values
    return _as_array_or_item(self._data)
  File "/home/khaeru/.local/lib/python3.10/site-packages/xarray/core/variable.py", line 267, in _as_array_or_item
    data = np.asarray(data)
  File "/home/khaeru/.local/lib/python3.10/site-packages/sparse/_sparse_array.py", line 229, in __array__
    raise RuntimeError(
RuntimeError: Cannot convert a sparse array to dense automatically. To manually densify, use the `todense` method.
```

Anything else we need to know?

Along with the versions below, I have confirmed the error occurs with both sparse 0.12 and sparse 0.13.
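A possible interim workaround (a sketch, not taken from the issue): densify a throwaway copy of the data only for display, so the original DataArray stays sparse.

```python
import pandas as pd
import xarray as xr

s = pd.Series(
    range(4),
    index=pd.MultiIndex.from_product([list("ab"), list("cd")]),
)
da = xr.DataArray.from_series(s, sparse=True)

# Densify a copy just for formatting; `da` itself remains COO-backed.
print(f"{da.copy(data=da.data.todense())}")
```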

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-41-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: ('en_CA', 'UTF-8')
libhdf5: 1.10.7
libnetcdf: 4.8.1
xarray: 2022.6.0
pandas: 1.4.2
numpy: 1.22.4
scipy: 1.8.0
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.12.0
h5py: 3.6.0
Nio: None
zarr: None
cftime: 1.5.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2022.01.0+dfsg
distributed: 2022.01.0+ds.1
matplotlib: 3.5.1
cartopy: 0.20.2
seaborn: 0.11.2
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: 0.18
sparse: 0.13.0
flox: None
numpy_groupies: None
setuptools: 62.1.0
pip: 22.0.2
conda: None
pytest: 6.2.5
IPython: 7.31.1
sphinx: 4.5.0
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6822/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
#4007: Allow DataArray.to_series() without invoking sparse.COO.todense()
id: 606846911 · node_id: MDExOlB1bGxSZXF1ZXN0NDA4OTY0MTM3 · user: khaeru (1634164) · state: open · locked: 0 · comments: 1 · created_at: 2020-04-25T20:15:16Z · updated_at: 2022-06-09T14:50:17Z · author_association: FIRST_TIME_CONTRIBUTOR · draft: 0 · pull_request: pydata/xarray/pulls/4007

This adds some code (from iiasa/ixmp#317) that allows DataArray.to_series() to be called without invoking sparse.COO.todense() when that is the backing data type.

I'm aware this needs some improvement to meet the standard of the existing codebase, so I hope I can ask for guidance on how to address the following points (including whom to ask about them):

  • [ ] Make the same improvement in {DataArray,Dataset}.to_dataframe().
  • [ ] Possibly move the code out of dataarray.py to a more appropriate location (where?).
  • [ ] Possibly check for sparse.COO explicitly instead of xarray.core.pycompat.sparse_array_type; other SparseArray subclasses, e.g. DOK, may not have the same attributes.

Standard items:

  • [ ] Tests added.
  • [x] Passes isort -rc . && black . && mypy . && flake8 (sort of: these wanted to modify 7 files beyond the one I touched; I didn't commit those changes).
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API.
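To illustrate the idea in this PR's description, a minimal sketch follows (a hypothetical helper, not the PR's actual code), assuming sparse.COO-backed data with a pandas index along every dimension:

```python
import pandas as pd
import sparse
import xarray as xr

def to_series_sparse(da: xr.DataArray) -> pd.Series:
    """Build a pd.Series from a sparse.COO-backed DataArray without todense().

    Only stored (non-fill) elements appear in the result.
    """
    coo = da.data
    assert isinstance(coo, sparse.COO)
    # coo.coords has shape (ndim, nnz); map each row of integer positions
    # to the coordinate labels for that dimension.
    index = pd.MultiIndex.from_arrays(
        [da.indexes[dim][i] for dim, i in zip(da.dims, coo.coords)],
        names=list(da.dims),
    )
    return pd.Series(coo.data, index=index, name=da.name)
```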

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4007/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#3381: concat() fails when args have sparse.COO data and different fill values
id: 503711327 · node_id: MDU6SXNzdWU1MDM3MTEzMjc= · user: khaeru (1634164) · state: open · locked: 0 · comments: 4 · created_at: 2019-10-07T21:54:06Z · updated_at: 2021-07-08T17:43:57Z · author_association: NONE

MCVE Code Sample

```python
import numpy as np
import pandas as pd
import sparse
import xarray as xr

# Indices and raw data
foo = [f'foo{i}' for i in range(6)]
bar = [f'bar{i}' for i in range(6)]
raw = np.random.rand(len(foo) // 2, len(bar))

# DataArray
a = xr.DataArray(
    data=sparse.COO.from_numpy(raw),
    coords=[foo[:3], bar],
    dims=['foo', 'bar'])

print(a.data.fill_value)  # 0.0

# Created from a pd.Series
b_series = pd.DataFrame(raw, index=foo[3:], columns=bar) \
    .stack() \
    .rename_axis(index=['foo', 'bar'])
b = xr.DataArray.from_series(b_series, sparse=True)

print(b.data.fill_value)  # nan

# Works despite inconsistent fill-values
a + b
a * b

# Fails: complains about inconsistent fill-values
xr.concat([a, b], dim='foo')  # ***

# The fill_value argument doesn't help
xr.concat([a, b], dim='foo', fill_value=np.nan)


def fill_value(da):
    """Try to coerce one argument to a consistent fill-value."""
    return xr.DataArray(
        data=sparse.as_coo(da.data, fill_value=np.nan),
        coords=da.coords,
        dims=da.dims,
        name=da.name,
        attrs=da.attrs,
    )


# Fails: "Cannot provide a fill-value in combination with something that
# already has a fill-value"
print(xr.concat([a.pipe(fill_value), b], dim='foo'))

# If we cheat by recreating 'a' from scratch, copying the fill value of the
# intended other argument, it works again:
a = xr.DataArray(
    data=sparse.COO.from_numpy(raw, fill_value=b.data.fill_value),
    coords=[foo[:3], bar],
    dims=['foo', 'bar'])
c = xr.concat([a, b], dim='foo')

print(c.data.fill_value)  # nan

# But simple operations again create objects with potentially incompatible
# fill-values
d = c.sum(dim='bar')
print(d.data.fill_value)  # 0.0
```

Expected

concat() can be used without having to create new objects; i.e., the line marked *** above just works.

Problem Description

Some basic xarray manipulations don't work on sparse.COO-backed objects.

xarray should automatically coerce objects into a compatible state, or at least provide users with methods to do so. The behaviour should also be documented, e.g. noting which operations (here, .sum()) modify the underlying storage format in ways that necessitate some kind of (re-)conversion.
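One possible user-side coercion is sketched below (the helper name with_fill_value is illustrative, not an xarray API; sparse.COO's public constructor is assumed). Note the caveat: rebuilding with a new fill value reinterprets all unstored elements.

```python
import numpy as np
import sparse
import xarray as xr

def with_fill_value(da: xr.DataArray, fill_value=np.nan) -> xr.DataArray:
    # Rebuild the COO from its stored coordinates and values under a new
    # fill value. This avoids sparse.as_coo(), which refuses to override
    # an existing fill value. All unstored elements now mean `fill_value`,
    # so this only makes sense when that is the intended meaning.
    coo = da.data
    new = sparse.COO(coo.coords, coo.data, shape=coo.shape,
                     fill_value=fill_value)
    return da.copy(data=new)

# With a and b from the MCVE above coerced to the same fill value,
# concat() succeeds:
# c = xr.concat([with_fill_value(a), with_fill_value(b)], dim='foo')
```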

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, Aug 20 2019, 17:04:43) [GCC 8.3.0]
python-bits: 64
OS: Linux
OS-release: 5.0.0-32-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.13.0
pandas: 0.25.0
numpy: 1.17.2
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.1
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.1.0
distributed: None
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 40.8.0
pip: 19.2.3
conda: None
pytest: 5.0.1
IPython: 5.8.0
sphinx: 2.2.0
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3381/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#805: pd.Period can't be used as a 1-element coord
id: 143764621 · node_id: MDU6SXNzdWUxNDM3NjQ2MjE= · user: khaeru (1634164) · state: closed · locked: 0 · comments: 5 · created_at: 2016-03-27T00:45:52Z · updated_at: 2016-12-24T00:09:48Z · closed_at: 2016-12-24T00:09:48Z · author_association: NONE

With xarray 0.7.2, following this basic example from the docs, but with a modification in the last line to use pd.Period instead of pd.Timestamp:

```python
import numpy as np
import pandas as pd  # needed for pd.date_range and pd.Period below
import xarray as xr

temp = 15 + 8 * np.random.randn(2, 2, 3)
precip = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]

ds = xr.Dataset({'temperature': (['x', 'y', 'time'], temp),
                 'precipitation': (['x', 'y', 'time'], precip)},
                coords={'lon': (['x', 'y'], lon),
                        'lat': (['x', 'y'], lat),
                        'time': pd.date_range('2014-09-06', periods=3),
                        'reference_time': pd.Period('2014')})
```

This raises:

ValueError: dimensions ('reference_time',) must have the same length as the number of data dimensions, ndim=0

I noticed (#645) that there are other issues stemming from pandas' PeriodIndex & company, so if this is not a straightforward fix I will understand!
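A possible workaround, sketched below (not from the issue, and untested on xarray 0.7.2): convert the Period to a Timestamp, which the docs' example already handles as a scalar coordinate.

```python
import pandas as pd
import xarray as xr

# Period -> Timestamp before assigning it as a scalar coordinate.
ds = xr.Dataset(
    coords={'reference_time': pd.Period('2014').to_timestamp()},
)
print(ds.reference_time)
```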

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/805/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
#401: Handle bool in NetCDF4 conversion
id: 70805273 · node_id: MDExOlB1bGxSZXF1ZXN0MzQwODk5MDk= · user: khaeru (1634164) · state: closed · locked: 0 · comments: 9 · created_at: 2015-04-24T21:59:08Z · updated_at: 2016-05-26T18:51:06Z · closed_at: 2016-05-23T04:54:40Z · author_association: FIRST_TIME_CONTRIBUTOR · draft: 0 · pull_request: pydata/xarray/pulls/401

I am working on some code that creates xray.Datasets with a 'bool' dtype.

Calling Dataset.to_netcdf() on these datasets causes _nc4_values_and_dtype() to raise a ValueError, so I added a few lines to force these variables to be stored as 1-byte integers.

Perhaps it should be 'u1' instead; I can change that if need be.
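A minimal sketch of the coercion this PR describes (illustrative user-level code using the modern xarray import, not the PR's actual diff against _nc4_values_and_dtype()):

```python
import numpy as np
import xarray as xr

# Classic NetCDF4 types have no boolean, so store flags as 1-byte
# integers ('i1', or 'u1' as discussed above) with 0 = False, 1 = True.
ds = xr.Dataset({'flag': (['x'], np.array([True, False, True]))})
ds['flag'] = ds['flag'].astype('i1')
ds.to_netcdf('flags.nc')
```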

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/401/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
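For reference, the query behind this page ("5 rows where user = 1634164, sorted by updated_at descending") can be reproduced against a local SQLite copy of the database; the filename github.db below is an assumption:

```python
import sqlite3

# Assumption: a local SQLite copy of this Datasette database, github.db.
con = sqlite3.connect("github.db")
rows = con.execute(
    """
    SELECT number, title, state, updated_at
    FROM issues
    WHERE [user] = 1634164
    ORDER BY updated_at DESC
    """
).fetchall()
for number, title, state, updated_at in rows:
    print(number, state, updated_at, title)
```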