
issues


3 rows where comments = 4, repo = 13221727 and "updated_at" is on date 2021-07-08 sorted by updated_at descending

issue #3381: concat() fails when args have sparse.COO data and different fill values
  • id: 503711327 · node_id: MDU6SXNzdWU1MDM3MTEzMjc=
  • user: khaeru (1634164) · state: open · locked: 0 · comments: 4
  • created_at: 2019-10-07T21:54:06Z · updated_at: 2021-07-08T17:43:57Z
  • author_association: NONE · repo: xarray (13221727) · type: issue

MCVE Code Sample

```python
import numpy as np
import pandas as pd
import sparse
import xarray as xr

# Indices and raw data
foo = [f'foo{i}' for i in range(6)]
bar = [f'bar{i}' for i in range(6)]
raw = np.random.rand(len(foo) // 2, len(bar))

# DataArray
a = xr.DataArray(
    data=sparse.COO.from_numpy(raw),
    coords=[foo[:3], bar],
    dims=['foo', 'bar'])

print(a.data.fill_value)  # 0.0

# Created from a pd.Series
b_series = pd.DataFrame(raw, index=foo[3:], columns=bar) \
    .stack() \
    .rename_axis(index=['foo', 'bar'])
b = xr.DataArray.from_series(b_series, sparse=True)

print(b.data.fill_value)  # nan

# Works despite inconsistent fill-values
a + b
a * b

# Fails: complains about inconsistent fill-values
xr.concat([a, b], dim='foo')  # ***

# The fill_value argument doesn't help
xr.concat([a, b], dim='foo', fill_value=np.nan)

def fill_value(da):
    """Try to coerce one argument to a consistent fill-value."""
    return xr.DataArray(
        data=sparse.as_coo(da.data, fill_value=np.nan),
        coords=da.coords,
        dims=da.dims,
        name=da.name,
        attrs=da.attrs,
    )

# Fails: "Cannot provide a fill-value in combination with something that
# already has a fill-value"
print(xr.concat([a.pipe(fill_value), b], dim='foo'))

# If we cheat by recreating 'a' from scratch, copying the fill value of the
# intended other argument, it works again:
a = xr.DataArray(
    data=sparse.COO.from_numpy(raw, fill_value=b.data.fill_value),
    coords=[foo[:3], bar],
    dims=['foo', 'bar'])
c = xr.concat([a, b], dim='foo')

print(c.data.fill_value)  # nan

# But simple operations again create objects with potentially incompatible
# fill-values
d = c.sum(dim='bar')
print(d.data.fill_value)  # 0.0
```

Expected

concat() can be used without having to create new objects; i.e. the line marked *** just works.

Problem Description

Some basic xarray manipulations don't work on sparse.COO-backed objects.

xarray should automatically coerce objects into a compatible state, or at least provide users with methods to do so. The behaviour should also be documented; e.g., in this instance, which operations (here, .sum()) modify the underlying storage format in ways that necessitate some kind of (re-)conversion.
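As a point of reference while this remains open, one value-preserving way to coerce a DataArray to a given fill value is to densify and rebuild the COO; sparse.as_coo() cannot be used for this because, as shown above, it refuses to override an existing fill value. A minimal sketch (the helper name with_fill_value is made up, and densifying defeats the point of sparse storage for large arrays):

```python
import numpy as np
import sparse
import xarray as xr


def with_fill_value(da: xr.DataArray, fill_value=np.nan) -> xr.DataArray:
    """Return a copy of `da` whose sparse.COO data uses `fill_value`.

    Densify with the old fill value, then re-sparsify with the new one,
    so every element keeps its value. Note: this materializes the full
    array in memory, so it is only viable for moderately sized data.
    """
    dense = da.data.todense()  # implicit entries take the old fill value
    return da.copy(data=sparse.COO.from_numpy(dense, fill_value=fill_value))


# With both operands coerced to the same fill value, the line marked ***
# above goes through:
# c = xr.concat([with_fill_value(a), with_fill_value(b)], dim='foo')
```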

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, Aug 20 2019, 17:04:43) [GCC 8.3.0]
python-bits: 64
OS: Linux
OS-release: 5.0.0-32-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.13.0
pandas: 0.25.0
numpy: 1.17.2
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.1
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.1.0
distributed: None
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 40.8.0
pip: 19.2.3
conda: None
pytest: 5.0.1
IPython: 5.8.0
sphinx: 2.2.0
```
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3381/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue #3281: [proposal] concatenate by axis, ignore dimension names
  • id: 489825483 · node_id: MDU6SXNzdWU0ODk4MjU0ODM=
  • user: Hoeze (1200058) · state: open · locked: 0 · comments: 4
  • created_at: 2019-09-05T15:06:22Z · updated_at: 2021-07-08T17:42:53Z
  • author_association: NONE · repo: xarray (13221727) · type: issue

Hi, I wrote a helper function that makes it possible to concatenate arrays much like xr.combine_nested, with the difference that it only supports xr.DataArrays, concatenates them by axis position (similar to np.concatenate), and overwrites all dimension names.

I often need this to combine very different feature types.

```python
from typing import List, Tuple, Union

import xarray as xr


def concat_by_axis(
        darrs: Union[List[xr.DataArray], Tuple[xr.DataArray]],
        dims: Union[List[str], Tuple[str]],
        axis: int = None,
        **kwargs
):
    """
    Concat arrays along some axis, similar to np.concatenate.
    Automatically renames the dimensions to `dims`. Please note that this
    renaming happens by axis position, so make sure to transpose all
    arrays to the correct dimension order first.

    :param darrs: List or tuple of xr.DataArrays
    :param dims: The dimension names of the resulting array. Renames axes where necessary.
    :param axis: The axis which should be concatenated along
    :param kwargs: Additional arguments which will be passed to `xr.concat()`
    :return: Concatenated xr.DataArray with dimensions `dims`.
    """
    # Infer the axis from the depth of the nested lists.
    # Assumes `darrs` is correctly formatted as a list of lists.
    if axis is None:
        axis = 0
        l = darrs
        # While l is a non-empty list or tuple, descend one level.
        while isinstance(l, (list, tuple)) and l:
            axis -= 1
            l = l[0]
        if axis == 0:
            raise ValueError("`darrs` has to be a (possibly nested) list or tuple of xr.DataArrays!")

    to_concat = list()
    for i, da in enumerate(darrs):
        # Recursive call for nested arrays; the innermost call should have
        # axis = -1, the outermost call axis = -(depth of darrs).
        if isinstance(da, (list, tuple)):
            da = concat_by_axis(da, dims=dims, axis=axis + 1, **kwargs)

        if not isinstance(da, xr.DataArray):
            raise ValueError("Input %d must be an xr.DataArray" % i)
        if len(da.dims) != len(dims):
            raise ValueError("Input %d must have the same number of dimensions as specified in the `dims` argument!" % i)

        # Force-rename the dimensions, matching them by position.
        da = da.rename(dict(zip(da.dims, dims)))

        to_concat.append(da)

    return xr.concat(to_concat, dim=dims[axis], **kwargs)
```

Would it make sense to include this in xarray?
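For illustration, a hypothetical call to the helper above (the toy arrays and the dimension names 'row'/'col' are made up for the example):

```python
import numpy as np
import xarray as xr

# Two arrays with unrelated dimension names but compatible shapes.
x = xr.DataArray(np.zeros((2, 3)), dims=['a', 'b'])
y = xr.DataArray(np.ones((2, 3)), dims=['c', 'd'])

# np.concatenate-style: dimensions are renamed purely by position
# (both arrays become ('row', 'col')), then concatenated along axis 0.
z = concat_by_axis([x, y], dims=['row', 'col'], axis=0)
print(z.dims, z.shape)  # ('row', 'col') (4, 3)
```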

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3281/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue #1379: xr.concat consuming too much resources
  • id: 223231729 · node_id: MDU6SXNzdWUyMjMyMzE3Mjk=
  • user: rafa-guedes (7799184) · state: open · locked: 0 · comments: 4
  • created_at: 2017-04-20T23:33:52Z · updated_at: 2021-07-08T17:42:18Z
  • author_association: CONTRIBUTOR · repo: xarray (13221727) · type: issue

Hi, I am reading several (~1000) small ascii files into Dataset objects and trying to concatenate them over one specific dimension, but I eventually blow my memory up. The file glob is not huge (~700M; my computer has ~16G), and I can read them all fine if I only append the Datasets to a list without concatenating them (my memory usage increases by only 5% or so by the time I have read them all).

However, when trying to concatenate each file into one single Dataset upon reading, over a loop, the processing speed drops drastically before I have read 10% of the files or so, and memory usage keeps going up until it eventually blows up before 30% of these files have been read and concatenated (the screenshot below was taken before it blew up; memory usage was under 20% at the start of the processing).

I was wondering whether this is expected, or whether something could be improved to make this work more efficiently. I'm changing my approach now: extracting numpy arrays from the individual Datasets, concatenating those numpy arrays, and defining the final Dataset only at the end.

Thanks.
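A likely explanation, for anyone landing here: concatenating inside the loop copies the entire accumulated Dataset on every iteration, so the total work grows quadratically with the number of files, whereas collecting the pieces and calling xr.concat once is linear. A minimal sketch of the two patterns (the file pattern and the 'time' dimension are made up, and the issue above reads ascii files rather than netCDF):

```python
import glob

import xarray as xr

paths = sorted(glob.glob("data/*.nc"))  # illustrative file list

# Slow: each iteration copies everything read so far into a new Dataset,
# so total work and allocation grow quadratically with the file count.
ds = xr.open_dataset(paths[0])
for p in paths[1:]:
    ds = xr.concat([ds, xr.open_dataset(p)], dim="time")

# Faster: collect the pieces first and concatenate exactly once.
pieces = [xr.open_dataset(p) for p in paths]
ds_all = xr.concat(pieces, dim="time")
```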

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1379/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
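For context, the filter described at the top of the page corresponds to SQL along these lines (a sketch; the exact query Datasette generates may differ), run here through Python's sqlite3 module against a hypothetical local copy of the database:

```python
import sqlite3

conn = sqlite3.connect("github.db")  # hypothetical local database file
rows = conn.execute(
    """
    SELECT id, number, title, comments, updated_at
    FROM issues
    WHERE comments = 4
      AND repo = 13221727
      AND date(updated_at) = '2021-07-08'
    ORDER BY updated_at DESC
    """
).fetchall()
for row in rows:
    print(row)
```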