

issues


16 rows where repo = 13221727 and "updated_at" is on date 2021-07-08 sorted by updated_at descending




state 2

  • open 15
  • closed 1

repo 1

  • xarray 16
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
714905042 MDU6SXNzdWU3MTQ5MDUwNDI= 4486 Feature request: xr.concat: `stack` parameter FRidh 2129135 open 0     1 2020-10-05T14:43:40Z 2021-07-08T17:44:38Z   NONE      

Is your feature request related to a problem? Please describe. In the case of dependent dimensions there is a lot of missing data, and using a stacked layout is preferable. Composing an array using concat and then stack is not very memory efficient and results in NaNs that have to be removed.

I am now composing an array using concat and then reshaping it using stack. This can consume a lot of memory and requires explicit removal of the NaNs after stacking. Having a stack parameter to concat that takes the desired index would be very useful.

Describe the solution you'd like A stack parameter to concat that takes the desired index. Initially it may just do the naive concat followed by a stack and removal of NaNs, but eventually it should insert items correctly without creating NaNs.

Describe alternatives you've considered Composing an array using concat and then stack, which is not very memory efficient and results in NaNs that have to be removed.

Additional context Issue related to concat and stack https://github.com/pydata/xarray/issues/981.
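
For reference, a minimal sketch of the workaround described above (dimension names and values are illustrative): concat pads with NaN, stack flattens, and dropna removes the padding afterwards.

```python
import numpy as np
import xarray as xr

# two arrays over a shared dimension "x" but disjoint coordinate values
a = xr.DataArray(np.random.rand(3), dims="x", coords={"x": [0, 1, 2], "case": "a"})
b = xr.DataArray(np.random.rand(2), dims="x", coords={"x": [3, 4], "case": "b"})

combined = xr.concat([a, b], dim="case")        # shape (case, x), half NaN padding
stacked = combined.stack(sample=("case", "x"))  # flatten into a single index
stacked = stacked.dropna("sample")              # remove the NaN padding explicitly
```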

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4486/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
788534915 MDU6SXNzdWU3ODg1MzQ5MTU= 4824 combine_by_coords can succeed when it shouldn't mathause 10194086 open 0     15 2021-01-18T20:39:29Z 2021-07-08T17:44:38Z   MEMBER      

What happened:

combine_by_coords can succeed when it should not - depending on the names of the dimensions (which determine the order of operations in combine_by_coords).

What you expected to happen:

  • I think it should throw an error in both cases.

Minimal Complete Verifiable Example:

```python
import numpy as np
import xarray as xr

data = np.arange(5).reshape(1, 5)
x = np.arange(5)
x_name = "lat"

da0 = xr.DataArray(data, dims=("t", x_name), coords={"t": [1], x_name: x}).to_dataset(name="a")
x = x + 1e-6
da1 = xr.DataArray(data, dims=("t", x_name), coords={"t": [2], x_name: x}).to_dataset(name="a")
ds = xr.combine_by_coords((da0, da1))

ds
```

returns:

```python
<xarray.Dataset>
Dimensions:  (lat: 10, t: 2)
Coordinates:
  * lat      (lat) float64 0.0 1e-06 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0
  * t        (t) int64 1 2
Data variables:
    a        (t, lat) float64 0.0 nan 1.0 nan 2.0 nan ... 2.0 nan 3.0 nan 4.0
```

Thus lat is interlaced - I don't think combine_by_coords should do this. If you set

```python
x_name = "x"
```

and run the example again, it returns:

```python-traceback
ValueError: Resulting object does not have monotonic global indexes along dimension x
```

Anything else we need to know?:

  • this is vaguely related to #4077 but I think it is separate
  • combine_by_coords concatenates over all dimensions where the coords are different - therefore compat="override" doesn't actually do anything? Or does it?

https://github.com/pydata/xarray/blob/ba42c08af9afbd9e79d47bda404bf4a92a7314a0/xarray/core/combine.py#L69

cc @dcherian @TomNicholas

Environment:

Output of xr.show_versions()
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4824/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
546303413 MDU6SXNzdWU1NDYzMDM0MTM= 3666 Raise nice error when attempting to concatenate CFTimeIndex & DatetimeIndex tlogan2000 22454970 open 0     9 2020-01-07T14:08:03Z 2021-07-08T17:43:58Z   NONE      

MCVE Code Sample

```python
import subprocess
import sys
import wget
import glob

def install(package):
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])

try:
    from xclim import ensembles
except:
    install('xclim')
    from xclim import ensembles

# download the four test files, then glob them back
outdir = 'tmp'
url = []
url.append('https://github.com/Ouranosinc/xclim/raw/master/tests/testdata/EnsembleStats/BCCAQv2+ANUSPLIN300_ACCESS1-0_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc')
url.append('https://github.com/Ouranosinc/xclim/raw/master/tests/testdata/EnsembleStats/BCCAQv2+ANUSPLIN300_BNU-ESM_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc')
url.append('https://github.com/Ouranosinc/xclim/raw/master/tests/testdata/EnsembleStats/BCCAQv2+ANUSPLIN300_CCSM4_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc')
url.append('https://github.com/Ouranosinc/xclim/raw/master/tests/testdata/EnsembleStats/BCCAQv2+ANUSPLIN300_CCSM4_historical+rcp45_r2i1p1_1950-2100_tg_mean_YS.nc')
for u in url:
    wget.download(u, out=outdir)

datasets = glob.glob(f'{outdir}/*1950*.nc')
ens1 = ensembles.create_ensemble(datasets)
print(ens1)
```

Expected Output

Following advice of @dcherian (https://github.com/Ouranosinc/xclim/issues/281#issue-508073942) we have started testing builds of xclim against the master branch as well as the current release:

Using xarray 0.14.1 via pip, the above code generates a concatenated dataset with a new added dimension 'realization'.

Problem Description

Using xarray@master, the xclim.ensembles.create_ensemble call gives the following error:

```python-traceback
Traceback (most recent call last):
  File "/home/travis/.PyCharmCE2019.3/config/scratches/scratch_26.py", line 23, in <module>
    ens1 = ensembles.create_ensemble(datasets)
  File "/home/travis/github_xclim/xclim/xclim/ensembles.py", line 83, in create_ensemble
    data = xr.concat(list1, dim=dim)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/xarray/core/concat.py", line 135, in concat
    return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/xarray/core/concat.py", line 439, in _dataarray_concat
    join=join,
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/xarray/core/concat.py", line 303, in _dataset_concat
    *datasets, join=join, copy=False, exclude=[dim], fill_value=fill_value
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/xarray/core/alignment.py", line 298, in align
    index = joiner(matching_indexes)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2385, in __or__
    return self.union(other)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2517, in union
    return self._union_incompatible_dtypes(other, sort=sort)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2436, in _union_incompatible_dtypes
    return Index.union(this, other, sort=sort).astype(object, copy=False)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2517, in union
    return self._union_incompatible_dtypes(other, sort=sort)
  ....
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 498, in __new__
    return DatetimeIndex(subarr, copy=copy, name=name, **kwargs)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 334, in __new__
    int_as_wall_time=True,
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 446, in _from_sequence
    int_as_wall_time=int_as_wall_time,
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1854, in sequence_to_dt64ns
    data, copy = maybe_convert_dtype(data, copy)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 2060, in maybe_convert_dtype
    elif is_extension_type(data) and not is_datetime64tz_dtype(data):
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/dtypes/common.py", line 1734, in is_extension_type
    if is_categorical(arr):
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/dtypes/common.py", line 387, in is_categorical
    return isinstance(arr, ABCCategorical) or is_categorical_dtype(arr)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/dtypes/common.py", line 708, in is_categorical_dtype
    return CategoricalDtype.is_dtype(arr_or_dtype)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/dtypes/base.py", line 256, in is_dtype
    if isinstance(dtype, (ABCSeries, ABCIndexClass, ABCDataFrame, np.dtype)):
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/dtypes/generic.py", line 9, in _check
    return getattr(inst, attr, "_typ") in comp
RecursionError: maximum recursion depth exceeded while calling a Python object
```
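
The underlying clash can be shown without xclim; a minimal sketch, assuming cftime is installed (dates, values, and the calendar are illustrative). On the affected versions this recursed into the RecursionError above; the request here is for a clear error instead.

```python
import pandas as pd
import xarray as xr

# one array indexed by a CFTimeIndex, one by a pandas DatetimeIndex
a = xr.DataArray([0.0], dims="time",
                 coords={"time": xr.cftime_range("2000-01-01", periods=1, calendar="noleap")})
b = xr.DataArray([1.0], dims="time",
                 coords={"time": pd.date_range("2000-01-02", periods=1)})

# concatenating mixes the two index types; this should fail with a nice error
xr.concat([a, b], dim="time")
```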

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.5 (default, Oct 25 2019, 15:51:11) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-74-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3
xarray: 0.14.1+37.gdb36c5c0
pandas: 0.25.3
numpy: 1.17.4
scipy: 1.3.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.1
cfgrib: None
iris: None
bottleneck: 1.3.1
dask: 2.6.0
distributed: 2.6.0
matplotlib: 3.1.1
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.6.0.post20191030
pip: 19.3.1
conda: None
pytest: 5.2.2
IPython: None
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3666/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
503711327 MDU6SXNzdWU1MDM3MTEzMjc= 3381 concat() fails when args have sparse.COO data and different fill values khaeru 1634164 open 0     4 2019-10-07T21:54:06Z 2021-07-08T17:43:57Z   NONE      

MCVE Code Sample

```python
import numpy as np
import pandas as pd
import sparse
import xarray as xr

# Indices and raw data
foo = [f'foo{i}' for i in range(6)]
bar = [f'bar{i}' for i in range(6)]
raw = np.random.rand(len(foo) // 2, len(bar))

# DataArray
a = xr.DataArray(
    data=sparse.COO.from_numpy(raw),
    coords=[foo[:3], bar],
    dims=['foo', 'bar'])

print(a.data.fill_value)  # 0.0

# Created from a pd.Series
b_series = pd.DataFrame(raw, index=foo[3:], columns=bar) \
    .stack() \
    .rename_axis(index=['foo', 'bar'])
b = xr.DataArray.from_series(b_series, sparse=True)

print(b.data.fill_value)  # nan

# Works despite inconsistent fill-values
a + b
a * b

# Fails: complains about inconsistent fill-values
xr.concat([a, b], dim='foo')  # ***

# The fill_value argument doesn't help
xr.concat([a, b], dim='foo', fill_value=np.nan)

def fill_value(da):
    """Try to coerce one argument to a consistent fill-value."""
    return xr.DataArray(
        data=sparse.as_coo(da.data, fill_value=np.nan),
        coords=da.coords,
        dims=da.dims,
        name=da.name,
        attrs=da.attrs,
    )

# Fails: "Cannot provide a fill-value in combination with something that
# already has a fill-value"
print(xr.concat([a.pipe(fill_value), b], dim='foo'))

# If we cheat by recreating 'a' from scratch, copying the fill value of the
# intended other argument, it works again:
a = xr.DataArray(
    data=sparse.COO.from_numpy(raw, fill_value=b.data.fill_value),
    coords=[foo[:3], bar],
    dims=['foo', 'bar'])
c = xr.concat([a, b], dim='foo')

print(c.data.fill_value)  # nan

# But simple operations again create objects with potentially incompatible
# fill-values
d = c.sum(dim='bar')
print(d.data.fill_value)  # 0.0
```

Expected

concat() can be used without having to create new objects; i.e. the line marked *** just works.

Problem Description

Some basic xarray manipulations don't work on sparse.COO-backed objects.

xarray should automatically coerce objects into a compatible state, or at least provide users with methods to do so. Behaviour should also be documented, e.g. in this instance, which operations (here, .sum()) modify the underlying storage format in ways that necessitate some kind of (re-)conversion.
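
In the meantime, a hedged workaround sketch (it round-trips through dense, so it is only viable for small arrays): rebuild each operand with a common fill value before concatenating. a and b refer to the MCVE above.

```python
import numpy as np
import sparse
import xarray as xr

def with_fill(da, fill=np.nan):
    """Recreate a sparse.COO-backed DataArray with the given fill value."""
    dense = da.data.todense()  # densify, then re-sparsify with the target fill
    return da.copy(data=sparse.COO.from_numpy(dense, fill_value=fill))

c = xr.concat([with_fill(a), with_fill(b)], dim='foo')
print(c.data.fill_value)  # nan
```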

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, Aug 20 2019, 17:04:43) [GCC 8.3.0]
python-bits: 64
OS: Linux
OS-release: 5.0.0-32-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.13.0
pandas: 0.25.0
numpy: 1.17.2
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.1
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.1.0
distributed: None
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 40.8.0
pip: 19.2.3
conda: None
pytest: 5.0.1
IPython: 5.8.0
sphinx: 2.2.0
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3381/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
512205079 MDU6SXNzdWU1MTIyMDUwNzk= 3445 Merge fails when sparse Dataset has overlapping dimension values k-a-mendoza 4605410 open 0     3 2019-10-24T22:08:12Z 2021-07-08T17:43:57Z   NONE      

Sparse numpy arrays used in a merge operation seem to fail under certain coordinate settings. For example, this works perfectly:

```python
import xarray as xr
import numpy as np

# dense data and time (assumed here, mirroring the sparse example below)
data = np.random.uniform(-1, 1, (1, 1, 100))
time = np.linspace(0, 1, num=100)

data_array1 = xr.DataArray(data, name='default',
                           dims=['source', 'receiver', 'time'],
                           coords={'source': ['X.1'], 'receiver': ['X.2'], 'time': time}).to_dataset()
data_array2 = xr.DataArray(data, name='default',
                           dims=['source', 'receiver', 'time'],
                           coords={'source': ['X.2'], 'receiver': ['X.1'], 'time': time}).to_dataset()

dataset1 = xr.merge([data_array1, data_array2])
```

But this raises `IndexError: Only indices with at most one iterable index are supported` from the sparse package:

```python
import xarray as xr
import numpy as np
import sparse

data = sparse.COO.from_numpy(np.random.uniform(-1, 1, (1, 1, 100)))
time = np.linspace(0, 1, num=100)

data_array1 = xr.DataArray(data, name='default',
                           dims=['source', 'receiver', 'time'],
                           coords={'source': ['X.1'], 'receiver': ['X.2'], 'time': time}).to_dataset()
data_array2 = xr.DataArray(data, name='default',
                           dims=['source', 'receiver', 'time'],
                           coords={'source': ['X.2'], 'receiver': ['X.1'], 'time': time}).to_dataset()

dataset1 = xr.merge([data_array1, data_array2])
```

I have noticed this occurs when the merge would seem to add dimensions filled with nan values.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3445/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
544375718 MDU6SXNzdWU1NDQzNzU3MTg= 3659 Error concatenating Multiindex variables hazbottles 14136435 open 0     1 2020-01-01T16:36:26Z 2021-07-08T17:43:57Z   CONTRIBUTOR      

MCVE Code Sample

```python
>>> import xarray as xr
>>> da = xr.DataArray([0, 1], dims=["location"], coords={"lat": ("location", [10, 11]), "lon": ("location", [20, 21])}).set_index(location=["lat", "lon"])
>>> da2 = xr.DataArray([2, 3], dims=["location"], coords={"lat": ("location", [12, 13]), "lon": ("location", [22, 23])}).set_index(location=["lat", "lon"])
>>> xr.concat([da["location"], da2["location"]], dim="location")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/harry/code/xarray/xarray/core/concat.py", line 135, in concat
    return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
  File "/home/harry/code/xarray/xarray/core/concat.py", line 431, in _dataarray_concat
    ds = _dataset_concat(
  File "/home/harry/code/xarray/xarray/core/concat.py", line 384, in _dataset_concat
    result = Dataset(result_vars, attrs=result_attrs)
  File "/home/harry/code/xarray/xarray/core/dataset.py", line 541, in __init__
    variables, coord_names, dims, indexes = merge_data_and_coords(
  File "/home/harry/code/xarray/xarray/core/merge.py", line 466, in merge_data_and_coords
    return merge_core(
  File "/home/harry/code/xarray/xarray/core/merge.py", line 556, in merge_core
    assert_unique_multiindex_level_names(variables)
  File "/home/harry/code/xarray/xarray/core/variable.py", line 2363, in assert_unique_multiindex_level_names
    raise ValueError("conflicting MultiIndex level name(s):\n%s" % conflict_str)
ValueError: conflicting MultiIndex level name(s):
'lat' (location), 'lat' (<this-array>)
'lon' (location), 'lon' (<this-array>)
```

Expected Output

The output should be the same as first concatenating the DataArrays, then extracting the dimension location:

```python
>>> xr.concat([da, da2], dim="location")["location"]
<xarray.DataArray 'location' (location: 4)>
array([(10, 20), (11, 21), (12, 22), (13, 23)], dtype=object)
Coordinates:
  * location  (location) MultiIndex
  - lat       (location) int64 10 11 12 13
  - lon       (location) int64 20 21 22 23
```

Problem Description

```python
# da["location"] looks like a normal DataArray
>>> location = da["location"]
>>> location
<xarray.DataArray 'location' (location: 2)>
array([(10, 20), (11, 21)], dtype=object)
Coordinates:
  * location  (location) MultiIndex
  - lat       (location) int64 10 11
  - lon       (location) int64 20 21

# but in actual fact, the variable._data is a MultiIndex
>>> location.variable._data
PandasIndexAdapter(array=MultiIndex([(10, 20), (11, 21)], names=['lat', 'lon']), dtype=dtype('O'))
```

This is why an error is thrown: variable.assert_unique_multiindex_level_names gets passed two variables: location.variable (the DataArray data values), and also location["location"].variable (the coordinate values), which are both MultiIndexes.

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: b3d3b4480b7fb63402eb6c02103bb8d6c7dbf93a
python: 3.8.0 | packaged by conda-forge | (default, Nov 22 2019, 19:11:38) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.4.0-18362-Microsoft
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.14.1+36.gb3d3b44
pandas: 0.25.3
numpy: 1.18.0
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.9.1
distributed: 2.9.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 42.0.2.post20191201
pip: 19.3.1
conda: None
pytest: 5.3.2
IPython: None
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3659/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
489825483 MDU6SXNzdWU0ODk4MjU0ODM= 3281 [proposal] concatenate by axis, ignore dimension names Hoeze 1200058 open 0     4 2019-09-05T15:06:22Z 2021-07-08T17:42:53Z   NONE      

Hi, I wrote a helper function that concatenates arrays like xr.combine_nested, with the difference that it only supports xr.DataArrays, concatenates them by axis position similar to np.concatenate, and overwrites all dimension names (a usage sketch follows the code below).

I often need this to combine very different feature types.

```python
from typing import List, Tuple, Union

import numpy as np
import xarray as xr


def concat_by_axis(
        darrs: Union[List[xr.DataArray], Tuple[xr.DataArray]],
        dims: Union[List[str], Tuple[str]],
        axis: int = None,
        **kwargs
):
    """
    Concat arrays along some axis similar to np.concatenate.
    Automatically renames the dimensions to dims.
    Please note that this renaming happens by the axis position, therefore
    make sure to transpose all arrays to the correct dimension order.

    :param darrs: List or tuple of xr.DataArrays
    :param dims: The dimension names of the resulting array. Renames axes where necessary.
    :param axis: The axis which should be concatenated along
    :param kwargs: Additional arguments which will be passed to `xr.concat()`
    :return: Concatenated xr.DataArray with dimensions `dim`.
    """
    # Get depth of nested lists. Assumes `darrs` is correctly formatted as list of lists.
    if axis is None:
        axis = 0
        l = darrs
        # while l is a list or tuple and contains elements:
        while isinstance(l, (list, tuple)) and l:
            # increase depth by one
            axis -= 1
            l = l[0]
        if axis == 0:
            raise ValueError("`darrs` has to be a (possibly nested) list or tuple of xr.DataArrays!")

    to_concat = list()
    for i, da in enumerate(darrs):
        # recursive call for nested arrays;
        # most inner call should have axis = -1,
        # most outer call should have axis = - depth_of_darrs
        if isinstance(da, (list, tuple)):
            da = concat_by_axis(da, dims=dims, axis=axis + 1, **kwargs)

        if not isinstance(da, xr.DataArray):
            raise ValueError("Input %d must be a xr.DataArray" % i)
        if len(da.dims) != len(dims):
            raise ValueError("Input %d must have the same number of dimensions as specified in the `dims` argument!" % i)

        # force-rename dimensions
        da = da.rename(dict(zip(da.dims, dims)))

        to_concat.append(da)

    return xr.concat(to_concat, dim=dims[axis], **kwargs)
```

Would it make sense to include this in xarray?
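
For illustration, a usage sketch assuming the function above is defined (array shapes and dimension names are made up):

```python
import numpy as np
import xarray as xr

# two 2-D arrays whose dimension names disagree
a = xr.DataArray(np.zeros((2, 3)), dims=["x", "y"])
b = xr.DataArray(np.ones((2, 4)), dims=["u", "v"])

# rename both to ("row", "col") by position and concatenate along the last axis
c = concat_by_axis([a, b], dims=["row", "col"], axis=-1)
assert c.dims == ("row", "col") and c.shape == (2, 7)
```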

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3281/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
494906646 MDU6SXNzdWU0OTQ5MDY2NDY= 3315 xr.combine_nested() fails when passed nested DataSets friedrichknuth 10554254 open 0     8 2019-09-17T23:47:44Z 2021-07-08T17:42:53Z   NONE      

```python
>>> xr.__version__
'0.13.0'
```

xr.combine_nested() works when passed a nested list of DataArray objects:

```python
da1 = xr.DataArray(name="a", data=[[0]], dims=["x", "y"])
da2 = xr.DataArray(name="b", data=[[1]], dims=["x", "y"])
da3 = xr.DataArray(name="a", data=[[2]], dims=["x", "y"])
da4 = xr.DataArray(name="b", data=[[3]], dims=["x", "y"])
xr.combine_nested([[da1, da2], [da3, da4]], concat_dim=["x", "y"])
```

returns

```python
<xarray.DataArray 'a' (x: 2, y: 2)>
array([[0, 1],
       [2, 3]])
Dimensions without coordinates: x, y
```

but fails if passed a nested list of DataSet objects.

```python
ds1 = da1.to_dataset()
ds2 = da2.to_dataset()
ds3 = da3.to_dataset()
ds4 = da4.to_dataset()
xr.combine_nested([[ds1, ds2], [ds3, ds4]], concat_dim=["x", "y"])
```

returns

```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-8-c0035883fc68> in <module>
      3 ds3 = da3.to_dataset()
      4 ds4 = da4.to_dataset()
----> 5 xr.combine_nested([[ds1, ds2], [ds3, ds4]], concat_dim=["x", "y"])

~/repos/contribute/xarray/xarray/core/combine.py in combine_nested(datasets, concat_dim, compat, data_vars, coords, fill_value, join)
    462         ids=False,
    463         fill_value=fill_value,
--> 464         join=join,
    465     )
    466

~/repos/contribute/xarray/xarray/core/combine.py in _nested_combine(datasets, concat_dims, compat, data_vars, coords, ids, fill_value, join)
    305         coords=coords,
    306         fill_value=fill_value,
--> 307         join=join,
    308     )
    309     return combined

~/repos/contribute/xarray/xarray/core/combine.py in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat, fill_value, join)
    196             compat=compat,
    197             fill_value=fill_value,
--> 198             join=join,
    199         )
    200     (combined_ds,) = combined_ids.values()

~/repos/contribute/xarray/xarray/core/combine.py in _combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat, fill_value, join)
    218     datasets = combined_ids.values()
    219     new_combined_ids[new_id] = _combine_1d(
--> 220         datasets, dim, compat, data_vars, coords, fill_value, join
    221     )
    222     return new_combined_ids

~/repos/contribute/xarray/xarray/core/combine.py in _combine_1d(datasets, concat_dim, compat, data_vars, coords, fill_value, join)
    246             compat=compat,
    247             fill_value=fill_value,
--> 248             join=join,
    249         )
    250     except ValueError as err:

~/repos/contribute/xarray/xarray/core/concat.py in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    131             "objects, got %s" % type(first_obj)
    132         )
--> 133     return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    134
    135

~/repos/contribute/xarray/xarray/core/concat.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join)
    363     for k in datasets[0].variables:
    364         if k in concat_over:
--> 365             vars = ensure_common_dims([ds.variables[k] for ds in datasets])
    366             combined = concat_vars(vars, dim, positions)
    367             assert isinstance(combined, Variable)

~/repos/contribute/xarray/xarray/core/concat.py in <listcomp>(.0)
    363     for k in datasets[0].variables:
    364         if k in concat_over:
--> 365             vars = ensure_common_dims([ds.variables[k] for ds in datasets])
    366             combined = concat_vars(vars, dim, positions)
    367             assert isinstance(combined, Variable)

~/repos/contribute/xarray/xarray/core/utils.py in __getitem__(self, key)
    383
    384     def __getitem__(self, key: K) -> V:
--> 385         return self.mapping[key]
    386
    387     def __iter__(self) -> Iterator[K]:

KeyError: 'a'
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3315/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
329575874 MDU6SXNzdWUzMjk1NzU4NzQ= 2217 tolerance for alignment naomi-henderson 31460695 open 0     23 2018-06-05T18:34:45Z 2021-07-08T17:42:52Z   NONE      

When using open_mfdataset on files which 'should' share a grid, there is often a small mismatch which results in the grid not aligning properly. This happens frequently when trying to read data from large climate models from multiple files of the same variable, same lon,lat grid and different time intervals. This silent behavior means that I always have to check the sizes of the lon,lat grids whenever I rely on mfdataset to concatenate the data in time.

Here is an example in which I create two 1d DataArrays which have slightly different coordinates:

```python
import xarray as xr
import numpy as np
from glob import glob

tol = 1e-14
x1 = np.arange(1, 6) + tol * np.random.rand(5)
da1 = xr.DataArray([9, 0, 2, 1, 0], dims=['x'], coords={'x': x1})

x2 = np.arange(1, 6) + tol * np.random.rand(5)
da2 = da1.copy()
da2['x'] = x2

print(da1.x, '\n', da2.x)
```

```
<xarray.DataArray 'x' (x: 5)>
array([1., 2., 3., 4., 5.])
Coordinates:
  * x        (x) float64 1.0 2.0 3.0 4.0 5.0
<xarray.DataArray 'x' (x: 5)>
array([1., 2., 3., 4., 5.])
Coordinates:
  * x        (x) float64 1.0 2.0 3.0 4.0 5.0
```

First I save both DataArrays as netcdf files and then use open_mfdataset to load them:

```python
da1.to_netcdf('da1.nc', encoding={'x': {'dtype': 'float64'}})
da2.to_netcdf('da2.nc', encoding={'x': {'dtype': 'float64'}})

db = xr.open_mfdataset(glob('da?.nc'))
db
```

```
<xarray.Dataset>
Dimensions:                    (x: 10)
Coordinates:
  * x                          (x) float64 1.0 2.0 3.0 4.0 5.0 1.0 2.0 ...
Data variables:
    xarray_dataarray_variable  (x) int64 dask.array<shape=(10,), chunksize=(5,)>
```

So the x grid is now twice the size. This behavior is the same if I just use align with join='outer':

```python
xr.align(da1, da2, join='outer')
```

```
(<xarray.DataArray (x: 10)>
 array([nan,  9., nan,  0.,  2., nan, nan,  1.,  0., nan])
 Coordinates:
   * x        (x) float64 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0 5.0 5.0,
 <xarray.DataArray (x: 10)>
 array([ 9., nan,  0., nan, nan,  2.,  1., nan, nan,  0.])
 Coordinates:
   * x        (x) float64 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0 5.0 5.0)
```

Request/ suggestion

What is needed is a user-specified tolerance level, given to open_mfdataset and passed on to align, which will accept these grids as the same.
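
Until such an option exists, one possible workaround (a sketch; the tolerance value is illustrative) is to snap one array's coordinates onto the other's with reindex, which already accepts a tolerance:

```python
# match da2's x values to da1's grid within 1e-6, then align normally
da2_snapped = da2.reindex(x=da1.x, method='nearest', tolerance=1e-6)
da1_aligned, da2_aligned = xr.align(da1, da2_snapped, join='outer')
```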

Possibly related to https://github.com/pydata/xarray/issues/2215

```python
>>> xr.__version__
'0.10.4'
```

thanks, Naomi

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2217/reactions",
    "total_count": 10,
    "+1": 10,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
415802678 MDU6SXNzdWU0MTU4MDI2Nzg= 2796 Better explanation of 'minimal' in xarray.open_mfdataset(data_vars='minimal') in docs? wckoeppen 5704500 open 0     2 2019-02-28T20:11:42Z 2021-07-08T17:42:52Z   NONE      

Problem description

I'm currently troubleshooting some overly long (to me) load times using open_mfdataset on GFS data. In trying to speed things up, I'm trying to specify just the four variables I actually care about using data_vars=[strings], but to no avail. It still takes ~30 minutes to load 52 time slices from 7 files.

In the docs I do see that if data_vars =

list of str: "The listed data variables will be concatenated, in addition to the ‘minimal’ data variables."

However, I can't seem to understand what the 'minimal' variables are from this sentence in the docs:

‘minimal’: Only data variables in which the dimension already appears are included.

All the variables in the CF-compliant GFS data are associated with dimensions. So does that mean that all the variables in the files will be concatenated, regardless of whether I specify which ones I want? I feel like I'm misunderstanding what is included by default.
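
For reference, a small sketch of how data_vars='minimal' behaves in xr.concat (variable names here are made up): only variables that already contain the concatenation dimension are concatenated; the rest must be identical across files and are carried through once.

```python
import xarray as xr

# 't2m' has the concat dimension 'time'; 'orog' does not
ds1 = xr.Dataset({"t2m": ("time", [280.0]), "orog": ("y", [10.0, 20.0])}, coords={"time": [0]})
ds2 = xr.Dataset({"t2m": ("time", [281.0]), "orog": ("y", [10.0, 20.0])}, coords={"time": [1]})

out = xr.concat([ds1, ds2], dim="time", data_vars="minimal")
# 't2m' is concatenated along 'time'; 'orog' is kept once, unconcatenated
```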

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2796/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
423749397 MDU6SXNzdWU0MjM3NDkzOTc= 2836 xarray.concat() with compat='identical' fails for DataArray attrs aldanor 2418513 open 0     9 2019-03-21T14:11:29Z 2021-07-08T17:42:52Z   NONE      

Not sure if it was ever supposed to work with numpy arrays, but it actually does :thinking::

```python
attr = np.array([[3, 4]])
d1 = xr.Dataset({'z': 1}, attrs={'y': attr})
d2 = xr.Dataset({'z': 2}, attrs={'y': attr.copy()})
xr.concat([d1, d2], dim='z', compat='identical')
```

However, it fails if you use DataArray attrs:

```python
attr = xr.DataArray([3, 4], {'x': [1, 2]}, 'x')
d1 = xr.Dataset({'z': 1}, attrs={'y': attr})
d2 = xr.Dataset({'z': 2}, attrs={'y': attr.copy()})
xr.concat([d1, d2], dim='z', compat='identical')
```

```
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

Given that the check is simply (a is b) or (a == b), should it try to do something smarter for array-like attrs?
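
A sketch of what a smarter, array-safe check might look like (this is not xarray's actual implementation, just an illustration of the idea):

```python
import numpy as np

def attrs_equal(a, b):
    """Sketch: equality test that tolerates array-like attrs."""
    if a is b:
        return True
    try:
        return bool(a == b)
    except ValueError:
        # `a == b` produced an array with an ambiguous truth value;
        # fall back to element-wise comparison
        return np.array_equal(np.asarray(a), np.asarray(b))
```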

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2836/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
446054247 MDU6SXNzdWU0NDYwNTQyNDc= 2975 Inconsistent/confusing behaviour when concatenating dimension coords TomNicholas 35968931 open 0     2 2019-05-20T11:01:37Z 2021-07-08T17:42:52Z   MEMBER      

I noticed that with multiple conflicting dimension coords then concat can give pretty weird/counterintuitive results, at least compared to what the documentation suggests they should give:

```python
# Create two datasets with conflicting coordinates
objs = [Dataset({'x': [0], 'y': [1]}),
        Dataset({'y': [0], 'x': [1]})]

[<xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Coordinates:
   * x        (x) int64 0
   * y        (y) int64 1
 Data variables:
     *empty*,
 <xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Coordinates:
   * y        (y) int64 0
   * x        (x) int64 1
 Data variables:
     *empty*]
```

```python
# Try to join along only 'x',
# coords='minimal' so concatenate "Only coordinates in which the dimension already appears"
concat(objs, dim='x', coords='minimal')

<xarray.Dataset>
Dimensions:  (x: 2, y: 2)
Coordinates:
  * y        (y) int64 0 1
  * x        (x) int64 0 1
Data variables:
    *empty*

# It's joined along x and y!
```

Based on my reading of the docstring for concat, I would have expected this to not attempt to concatenate y, because coords='minimal', and instead to throw an error because 'y' is a "non-concatenated variable" whose values are not the same across datasets.

Now let's try to get concat to broadcast 'y' across 'x':

```python
# Try to join along only 'x' by setting coords='different'
concat(objs, dim='x', coords='different')
```

Now as "Data variables which are not equal (ignoring attributes) across all datasets are also concatenated" then I would have expected 'y' to be concatenated across 'x', i.e. to add the 'x' dimension to the 'y' coord, i.e:

```python
<xarray.Dataset>
Dimensions:  (x: 2, y: 1)
Coordinates:
  * y        (y, x) int64 1 0
  * x        (x) int64 0 1
Data variables:
    *empty*
```

But that's not what we get!:

```python
<xarray.Dataset>
Dimensions:  (x: 2, y: 2)
Coordinates:
  * y        (y) int64 0 1
  * x        (x) int64 0 1
Data variables:
    *empty*
```

Same again but without dimension coords

If we create the same sort of objects but the variables are data vars not coords, then everything behaves exactly as expected:

```python
objs2 = [Dataset({'a': ('x', [0]), 'b': ('y', [1])}),
         Dataset({'a': ('x', [1]), 'b': ('y', [0])})]

[<xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Dimensions without coordinates: x, y
 Data variables:
     a        (x) int64 0
     b        (y) int64 1,
 <xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Dimensions without coordinates: x, y
 Data variables:
     a        (x) int64 1
     b        (y) int64 0]

concat(objs2, dim='x', data_vars='minimal')

ValueError: variable b not equal across datasets

concat(objs2, dim='x', data_vars='different')

<xarray.Dataset>
Dimensions:  (x: 2, y: 1)
Dimensions without coordinates: x, y
Data variables:
    a        (x) int64 0 1
    b        (x, y) int64 1 0
```

Also if you do the same again but with coordinates which are not dimension coords, i.e:

```python
objs3 = [Dataset(coords={'a': ('x', [0]), 'b': ('y', [1])}),
         Dataset(coords={'a': ('x', [1]), 'b': ('y', [0])})]

[<xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Coordinates:
     a        (x) int64 0
     b        (y) int64 1
 Dimensions without coordinates: x, y
 Data variables:
     *empty*,
 <xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Coordinates:
     a        (x) int64 1
     b        (y) int64 0
 Dimensions without coordinates: x, y
 Data variables:
     *empty*]
```

then this again gives the expected concatenation behaviour.

So this implies that the compatibility checks that are being done on the data vars are not being done on the coords, but only if they are dimension coordinates!

Either this is not the desired behaviour or the concat docstring needs to be a lot clearer. If we agree that this is not the desired behaviour then I will have a look inside concat to work out why it's happening.

EDIT: Presumably this has something to do with the ToDo in the code for concat: # TODO: support concatenating scalar coordinates even if the concatenated dimension already exists...

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2975/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
193294569 MDU6SXNzdWUxOTMyOTQ1Njk= 1151 Scalar coords vs. concat crusaderky 6213168 open 0     11 2016-12-03T15:42:18Z 2021-07-08T17:42:18Z   MEMBER      

Why does this work:

```python
>>> import xarray
>>> a = xarray.DataArray([1, 2, 3], dims=['x'], coords={'y': 10})
>>> b = xarray.DataArray([4, 5, 6], dims=['x'])
>>> a + b
<xarray.DataArray (x: 3)>
array([5, 7, 9])
Coordinates:
    y        int64 10
```

But this doesn't?

```python
>>> xarray.concat([a, b], dim='x')
KeyError: 'y'
```

It doesn't seem coherent to me...
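
A possible workaround sketch: give b the matching scalar coordinate before concatenating, so the coordinate exists on both arguments (the value 10 comes from a above).

```python
b2 = b.assign_coords(y=10)                  # add the missing scalar coord
combined = xarray.concat([a, b2], dim='x')  # now succeeds
```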

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1151/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
223231729 MDU6SXNzdWUyMjMyMzE3Mjk= 1379 xr.concat consuming too much resources rafa-guedes 7799184 open 0     4 2017-04-20T23:33:52Z 2021-07-08T17:42:18Z   CONTRIBUTOR      

Hi, I am reading in several (~1000) small ascii files into Dataset objects and trying to concatenate them over one specific dimension, but I eventually blow up my memory. The file glob is not huge (~700M, my computer has ~16G) and I can do it fine if I only read in the Datasets, appending them to a list without concatenating them (my memory increases by 5% or so by the time I have read them all).

However, when trying to concatenate each file into one single Dataset upon reading over a loop, the processing speed drops drastically before I have read 10% of the files or so, and my memory usage keeps going up until it eventually blows up before I have read and concatenated 30% of these files (a screenshot taken before it blew up showed memory usage under 20% at the start of the processing).

I was wondering if this is expected, or if there is something that could be improved to make this work more efficiently. I'm changing my approach now by extracting numpy arrays from the individual Datasets, concatenating these numpy arrays, and defining the final Dataset only at the end.
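
For context, a sketch of the cheaper pattern (read_one and paths are hypothetical placeholders): collect everything first and concatenate once, since concatenating inside the loop re-copies the accumulated data on every iteration.

```python
import xarray as xr

# hypothetical reader turning one ascii file into a Dataset
datasets = [read_one(path) for path in paths]

# a single concat at the end, instead of ds = xr.concat([ds, new]) per file
combined = xr.concat(datasets, dim='time')
```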

Thanks.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1379/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
471409673 MDU6SXNzdWU0NzE0MDk2NzM= 3158 Out of date docstring for concat_dim in open_mfdataset zdgriffith 17169544 open 0     3 2019-07-23T00:01:05Z 2021-07-08T17:40:45Z   CONTRIBUTOR      

In the open_mfdataset docstring:

concat_dim : str, or list of str, DataArray, Index or None, optional Dimensions to concatenate files along. You only need to provide this argument if any of the dimensions along which you want to concatenate is not a dimension in the original datasets, e.g., if you want to stack a collection of 2D arrays along a third dimension. ...

This is true for the default combine='_old_auto', but when using combine='nested' the argument is required, and combine='by_coords' does not use it at all. It would be clearer to make that distinction here.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3158/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
496688781 MDU6SXNzdWU0OTY2ODg3ODE= 3330 Feature requests for DataArray.rolling fjanoos 923438 closed 0     1 2019-09-21T18:58:21Z 2021-07-08T16:29:18Z 2021-07-08T16:29:18Z NONE      

In DataArray.rolling it would be really nice to have support for window sizes specified in the units of the dimension (esp. time). For example, if da has dimensions (time, space, feature) with time as a DatetimeIndex, then it should be possible to specify da.rolling(time=pd.Timedelta(100, 'D')) as a valid window.
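
In the meantime, a workaround sketch for the 1-D case (assuming da from the example above is reduced to a single datetime 'time' dimension): round-trip through pandas, whose rolling already accepts time offsets.

```python
# da: 1-D DataArray over a datetime 'time' dimension
series = da.to_series()                 # pd.Series indexed by time
rolled = series.rolling("100D").mean()  # time-based window in pandas
result = rolled.to_xarray()             # back to a DataArray
```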

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3330/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);