id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 714905042,MDU6SXNzdWU3MTQ5MDUwNDI=,4486,Feature request: xr.concat: `stack` parameter,2129135,open,0,,,1,2020-10-05T14:43:40Z,2021-07-08T17:44:38Z,,NONE,,,," **Is your feature request related to a problem? Please describe.** In the case of dependent dimensions, there is a lot of missing data, and a stacked layout is preferable. I am now composing an array using `concat` and then reshaping it using `stack`. This can consume a lot of memory, and requires explicit removal of the NaNs after stacking. **Describe the solution you'd like** A `stack` parameter to `concat` that takes the desired index would be very useful. Initially it may just do the naive `concat` followed by a `stack` and removal of NaNs, but eventually it should insert items correctly without creating NaNs. **Describe alternatives you've considered** Composing an array using `concat` and then `stack`, which is not very memory efficient and results in NaNs that have to be removed. **Additional context** Issue related to `concat` and `stack`: https://github.com/pydata/xarray/issues/981. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4486/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 788534915,MDU6SXNzdWU3ODg1MzQ5MTU=,4824,combine_by_coords can succeed when it shouldn't,10194086,open,0,,,15,2021-01-18T20:39:29Z,2021-07-08T17:44:38Z,,MEMBER,,,,"**What happened**: `combine_by_coords` can succeed when it should not - depending on the name of the dimensions (which determines the order of operations in `combine_by_coords`). **What you expected to happen**: * I think it should throw an error in both cases. **Minimal Complete Verifiable Example**: ```python import numpy as np import xarray as xr data = np.arange(5).reshape(1, 5) x = np.arange(5) x_name = ""lat"" da0 = xr.DataArray(data, dims=(""t"", x_name), coords={""t"": [1], x_name: x}).to_dataset(name=""a"") x = x + 1e-6 da1 = xr.DataArray(data, dims=(""t"", x_name), coords={""t"": [2], x_name: x}).to_dataset(name=""a"") ds = xr.combine_by_coords((da0, da1)) ds ``` returns: ```python Dimensions: (lat: 10, t: 2) Coordinates: * lat (lat) float64 0.0 1e-06 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0 * t (t) int64 1 2 Data variables: a (t, lat) float64 0.0 nan 1.0 nan 2.0 nan ... 2.0 nan 3.0 nan 4.0 ``` Thus lat is interlaced - I don't think `combine_by_coords` should do this. If you set ```python x_name = ""x"" ``` and run the example again, it returns: ```python-traceback ValueError: Resulting object does not have monotonic global indexes along dimension x ``` **Anything else we need to know?**: * this is vaguely related to #4077 but I think it is separate * `combine_by_coords` concatenates over all dimensions where the coords are different - therefore `compat=""override""` doesn't actually do anything? Or does it? https://github.com/pydata/xarray/blob/ba42c08af9afbd9e79d47bda404bf4a92a7314a0/xarray/core/combine.py#L69 cc @dcherian @TomNicholas **Environment**:
Output of xr.show_versions()
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4824/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 546303413,MDU6SXNzdWU1NDYzMDM0MTM=,3666,Raise nice error when attempting to concatenate CFTimeIndex & DatetimeIndex,22454970,open,0,,,9,2020-01-07T14:08:03Z,2021-07-08T17:43:58Z,,NONE,,,,"#### MCVE Code Sample ```python import subprocess import sys import wget import glob def install(package): subprocess.check_call([sys.executable, ""-m"", ""pip"", ""install"", package]) try: from xclim import ensembles except: install('xclim') from xclim import ensembles outdir = 'tmp' url = [] url.append('https://github.com/Ouranosinc/xclim/raw/master/tests/testdata/EnsembleStats/BCCAQv2+ANUSPLIN300_ACCESS1-0_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc') url.append('https://github.com/Ouranosinc/xclim/raw/master/tests/testdata/EnsembleStats/BCCAQv2+ANUSPLIN300_BNU-ESM_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc') url.append('https://github.com/Ouranosinc/xclim/raw/master/tests/testdata/EnsembleStats/BCCAQv2+ANUSPLIN300_CCSM4_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc') url.append('https://github.com/Ouranosinc/xclim/raw/master/tests/testdata/EnsembleStats/BCCAQv2+ANUSPLIN300_CCSM4_historical+rcp45_r2i1p1_1950-2100_tg_mean_YS.nc') for u in url: wget.download(u,out=outdir) datasets = glob.glob(f'{outdir}/*1950*.nc') ens1 = ensembles.create_ensemble(datasets) print(ens1) ``` #### Expected Output Following advice of @dcherian (https://github.com/Ouranosinc/xclim/issues/281#issue-508073942) we have started testing builds of ```xclim``` against the master branch as well as the current release: Using xarray 0.14.1 via pip the above code generates a concatenated dataset with new added dimension 'realization' #### Problem Description using xarray@master the ```xclim.ensembles.create_ensemble``` call gives the following error: ``` Traceback (most recent call last): File ""/home/travis/.PyCharmCE2019.3/config/scratches/scratch_26.py"", line 23, in ens1 = ensembles.create_ensemble(datasets) File ""/home/travis/github_xclim/xclim/xclim/ensembles.py"", line 83, in create_ensemble data = xr.concat(list1, dim=dim) File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/xarray/core/concat.py"", line 135, in concat return f(objs, dim, data_vars, coords, compat, positions, fill_value, join) File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/xarray/core/concat.py"", line 439, in _dataarray_concat join=join, File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/xarray/core/concat.py"", line 303, in _dataset_concat *datasets, join=join, copy=False, exclude=[dim], fill_value=fill_value File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/xarray/core/alignment.py"", line 298, in align index = joiner(matching_indexes) File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/base.py"", line 2385, in __or__ return self.union(other) File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/base.py"", line 2517, in union return self._union_incompatible_dtypes(other, sort=sort) File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/base.py"", line 2436, in _union_incompatible_dtypes return Index.union(this, other, sort=sort).astype(object, copy=False) File 
""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/base.py"", line 2517, in union return self._union_incompatible_dtypes(other, sort=sort) .... File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/base.py"", line 498, in __new__ return DatetimeIndex(subarr, copy=copy, name=name, **kwargs) File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py"", line 334, in __new__ int_as_wall_time=True, File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py"", line 446, in _from_sequence int_as_wall_time=int_as_wall_time, File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py"", line 1854, in sequence_to_dt64ns data, copy = maybe_convert_dtype(data, copy) File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py"", line 2060, in maybe_convert_dtype elif is_extension_type(data) and not is_datetime64tz_dtype(data): File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/dtypes/common.py"", line 1734, in is_extension_type if is_categorical(arr): File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/dtypes/common.py"", line 387, in is_categorical return isinstance(arr, ABCCategorical) or is_categorical_dtype(arr) File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/dtypes/common.py"", line 708, in is_categorical_dtype return CategoricalDtype.is_dtype(arr_or_dtype) File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/dtypes/base.py"", line 256, in is_dtype if isinstance(dtype, (ABCSeries, ABCIndexClass, ABCDataFrame, np.dtype)): File ""/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/dtypes/generic.py"", line 9, in _check return getattr(inst, attr, ""_typ"") in comp RecursionError: maximum recursion depth exceeded while calling a Python object ``` #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.7.5 (default, Oct 25 2019, 15:51:11) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.15.0-74-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: en_CA.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.3 xarray: 0.14.1+37.gdb36c5c0 pandas: 0.25.3 numpy: 1.17.4 scipy: 1.3.1 netCDF4: 1.5.3 pydap: None h5netcdf: 0.7.4 h5py: 2.9.0 Nio: None zarr: None cftime: 1.0.4.2 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.1 cfgrib: None iris: None bottleneck: 1.3.1 dask: 2.6.0 distributed: 2.6.0 matplotlib: 3.1.1 cartopy: None seaborn: None numbagg: None setuptools: 41.6.0.post20191030 pip: 19.3.1 conda: None pytest: 5.2.2 IPython: None sphinx: None
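For reference, the mixed-index situation can likely be reproduced without xclim or network access. A minimal sketch (an assumption based on the traceback: the failure comes from unioning a CFTimeIndex with a pandas DatetimeIndex; requires cftime to be installed):

```python
import numpy as np
import pandas as pd
import xarray as xr

# One array indexed by a CFTimeIndex (non-standard calendar), the other by a
# pandas DatetimeIndex; concatenating forces a union of the two index types.
times_cf = xr.cftime_range('2000-01-01', periods=3, calendar='noleap')
times_pd = pd.date_range('2000-01-04', periods=3)

da_cf = xr.DataArray(np.arange(3), coords=[('time', times_cf)])
da_pd = xr.DataArray(np.arange(3), coords=[('time', times_pd)])

# Expected: a clear error explaining the index mismatch, not a RecursionError.
xr.concat([da_cf, da_pd], dim='time')
```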
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3666/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 503711327,MDU6SXNzdWU1MDM3MTEzMjc=,3381,concat() fails when args have sparse.COO data and different fill values,1634164,open,0,,,4,2019-10-07T21:54:06Z,2021-07-08T17:43:57Z,,NONE,,,,"#### MCVE Code Sample ```python import numpy as np import pandas as pd import sparse import xarray as xr # Indices and raw data foo = [f'foo{i}' for i in range(6)] bar = [f'bar{i}' for i in range(6)] raw = np.random.rand(len(foo) // 2, len(bar)) # DataArray a = xr.DataArray( data=sparse.COO.from_numpy(raw), coords=[foo[:3], bar], dims=['foo', 'bar']) print(a.data.fill_value) # 0.0 # Created from a pd.Series b_series = pd.DataFrame(raw, index=foo[3:], columns=bar) \ .stack() \ .rename_axis(index=['foo', 'bar']) b = xr.DataArray.from_series(b_series, sparse=True) print(b.data.fill_value) # nan # Works despite inconsistent fill-values a + b a * b # Fails: complains about inconsistent fill-values # xr.concat([a, b], dim='foo') # *** # The fill_value argument doesn't help # xr.concat([a, b], dim='foo', fill_value=np.nan) def fill_value(da): """"""Try to coerce one argument to a consistent fill-value."""""" return xr.DataArray( data=sparse.as_coo(da.data, fill_value=np.nan), coords=da.coords, dims=da.dims, name=da.name, attrs=da.attrs, ) # Fails: ""Cannot provide a fill-value in combination with something that # already has a fill-value"" # print(xr.concat([a.pipe(fill_value), b], dim='foo')) # If we cheat by recreating 'a' from scratch, copying the fill value of the # intended other argument, it works again: a = xr.DataArray( data=sparse.COO.from_numpy(raw, fill_value=b.data.fill_value), coords=[foo[:3], bar], dims=['foo', 'bar']) c = xr.concat([a, b], dim='foo') print(c.data.fill_value) # nan # But simple operations again create objects with potentially incompatible # fill-values d = c.sum(dim='bar') print(d.data.fill_value) # 0.0 ``` #### Expected `concat()` can be used without having to create new objects; i.e. the line marked `***` just works. #### Problem Description Some basic xarray manipulations don't work on `sparse.COO`-backed objects. xarray should automatically coerce objects into a compatible state, or at least provide users with methods to do so. Behaviour should also be documented, e.g. in this instance, which operations (here, `.sum()`) modify the underlying storage format in ways that necessitate some kind of (re-)conversion. #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 (default, Aug 20 2019, 17:04:43) [GCC 8.3.0] python-bits: 64 OS: Linux OS-release: 5.0.0-32-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: en_CA.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.13.0 pandas: 0.25.0 numpy: 1.17.2 scipy: 1.2.1 netCDF4: 1.4.2 pydap: None h5netcdf: 0.7.1 h5py: 2.8.0 Nio: None zarr: None cftime: 1.0.3.4 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.2.1 dask: 2.1.0 distributed: None matplotlib: 3.1.1 cartopy: 0.17.0 seaborn: 0.9.0 numbagg: None setuptools: 40.8.0 pip: 19.2.3 conda: None pytest: 5.0.1 IPython: 5.8.0 sphinx: 2.2.0
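Until fill-value handling is reconciled, one possible workaround is to densify, concatenate, and re-sparsify with a single agreed-upon fill value. A sketch only (it assumes the original `a` and `b` from the top of the MCVE, and temporarily gives up the memory savings of sparse storage):

```python
import numpy as np
import sparse
import xarray as xr

# Densify both operands so concat no longer sees conflicting fill values.
a_dense = a.copy(data=a.data.todense())
b_dense = b.copy(data=b.data.todense())

c = xr.concat([a_dense, b_dense], dim='foo')

# Re-sparsify the result with one explicit fill value.
c = c.copy(data=sparse.COO.from_numpy(c.data, fill_value=np.nan))
print(c.data.fill_value)  # nan
```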
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3381/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 512205079,MDU6SXNzdWU1MTIyMDUwNzk=,3445,Merge fails when sparse Dataset has overlapping dimension values,4605410,open,0,,,3,2019-10-24T22:08:12Z,2021-07-08T17:43:57Z,,NONE,,,,"Sparse numpy arrays used in a merge operation seem to fail under certain coordinate settings. for example, this works perfectly: ```python import xarray as xr import numpy as np data_array1 = xr.DataArray(data,name='default', dims=['source','receiver','time'], coords={'source':['X.1'], 'receiver':['X.2'], 'time':time}).to_dataset() data_array2 = xr.DataArray(data,name='default', dims=['source','receiver','time'], coords={'source':['X.2'], 'receiver':['X.1'], 'time':time}).to_dataset() dataset1 = xr.merge([data_array1,data_array2]) ``` But this raises an ```IndexError: Only indices with at most one iterable index are supported.``` from the sparse package: ```python import xarray as xr import numpy as np import sparse data = sparse.COO.from_numpy(np.random.uniform(-1,1,(1,1,100))) time = np.linspace(0,1,num=100) data_array1 = xr.DataArray(data,name='default', dims=['source','receiver','time'], coords={'source':['X.1'], 'receiver':['X.2'], 'time':time}).to_dataset() data_array2 = xr.DataArray(data,name='default', dims=['source','receiver','time'], coords={'source':['X.2'], 'receiver':['X.1'], 'time':time}).to_dataset() dataset1 = xr.merge([data_array1,data_array2]) ``` I have noticed this occurs when the merger would seem to add dimensions filled with nan values. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3445/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 544375718,MDU6SXNzdWU1NDQzNzU3MTg=,3659,Error concatenating Multiindex variables,14136435,open,0,,,1,2020-01-01T16:36:26Z,2021-07-08T17:43:57Z,,CONTRIBUTOR,,,,"#### MCVE Code Sample ```python >>> import xarray as xr >>> da = xr.DataArray([0, 1], dims=[""location""], coords={""lat"": (""location"", [10, 11]), ""lon"": (""location"", [20, 21])}).set_index(location=[""lat"", ""lon""]) >>> da2 = xr.DataArray([2, 3], dims=[""location""], coords={""lat"": (""location"", [12, 13]), ""lon"": (""location"", [22, 23])}).set_index(location=[""lat"", ""lon""]) >>> xr.concat([da[""location""], da2[""location""]], dim=""location"") Traceback (most recent call last): File """", line 1, in File ""/home/harry/code/xarray/xarray/core/concat.py"", line 135, in concat return f(objs, dim, data_vars, coords, compat, positions, fill_value, join) File ""/home/harry/code/xarray/xarray/core/concat.py"", line 431, in _dataarray_concat ds = _dataset_concat( File ""/home/harry/code/xarray/xarray/core/concat.py"", line 384, in _dataset_concat result = Dataset(result_vars, attrs=result_attrs) File ""/home/harry/code/xarray/xarray/core/dataset.py"", line 541, in __init__ variables, coord_names, dims, indexes = merge_data_and_coords( File ""/home/harry/code/xarray/xarray/core/merge.py"", line 466, in merge_data_and_coords return merge_core( File ""/home/harry/code/xarray/xarray/core/merge.py"", line 556, in merge_core assert_unique_multiindex_level_names(variables) File ""/home/harry/code/xarray/xarray/core/variable.py"", line 2363, in assert_unique_multiindex_level_names raise ValueError(""conflicting MultiIndex level name(s):\n%s"" % 
conflict_str) ValueError: conflicting MultiIndex level name(s): 'lat' (location), 'lat' () 'lon' (location), 'lon' () ``` #### Expected Output The output should be the same as first concatenating the DataArrays, then extracting the dimension location: ```python >>> xr.concat([da, da2], dim=""location"")[""location""] array([(10, 20), (11, 21), (12, 22), (13, 23)], dtype=object) Coordinates: * location (location) MultiIndex - lat (location) int64 10 11 12 13 - lon (location) int64 20 21 22 23 ``` #### Problem Description ```python >>> # da[""location""] looks like a normal DataArray >>> location = da[""location""] >>> location array([(10, 20), (11, 21)], dtype=object) Coordinates: * location (location) MultiIndex - lat (location) int64 10 11 - lon (location) int64 20 21 >>> # but in actual fact, the variable._data is a MultiIndex >>> location.variable._data PandasIndexAdapter(array=MultiIndex([(10, 20), (11, 21)], names=['lat', 'lon']), dtype=dtype('O')) ``` This is why an error is thrown: `variable.assert_unique_multiindex_level_names` gets passed two variables: `location.variable` (the DataArray data values), and also `location[""location""].variable` (the coordinate values), which are both MultiIndexes. #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: b3d3b4480b7fb63402eb6c02103bb8d6c7dbf93a python: 3.8.0 | packaged by conda-forge | (default, Nov 22 2019, 19:11:38) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.4.0-18362-Microsoft machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: None libnetcdf: None xarray: 0.14.1+36.gb3d3b44 pandas: 0.25.3 numpy: 1.18.0 scipy: None netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.9.1 distributed: 2.9.1 matplotlib: None cartopy: None seaborn: None numbagg: None setuptools: 42.0.2.post20191201 pip: 19.3.1 conda: None pytest: 5.3.2 IPython: None sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3659/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 489825483,MDU6SXNzdWU0ODk4MjU0ODM=,3281,"[proposal] concatenate by axis, ignore dimension names",1200058,open,0,,,4,2019-09-05T15:06:22Z,2021-07-08T17:42:53Z,,NONE,,,,"Hi, I wrote a helper function which allows to concatenate arrays like `xr.combine_nested` with the difference that it only supports `xr.DataArrays`, concatenates them by axis position similar to `np.concatenate` and overwrites all dimension names. I often need this to combine very different feature types. ```python from typing import Union, Tuple, List import numpy as np import xarray as xr def concat_by_axis( darrs: Union[List[xr.DataArray], Tuple[xr.DataArray]], dims: Union[List[str], Tuple[str]], axis: int = None, **kwargs ): """""" Concat arrays along some axis similar to `np.concatenate`. Automatically renames the dimensions to `dims`. Please note that this renaming happens by the axis position, therefore make sure to transpose all arrays to the correct dimension order. :param darrs: List or tuple of xr.DataArrays :param dims: The dimension names of the resulting array. Renames axes where necessary. :param axis: The axis which should be concatenated along :param kwargs: Additional arguments which will be passed to `xr.concat()` :return: Concatenated xr.DataArray with dimensions `dim`. """""" # Get depth of nested lists. Assumes `darrs` is correctly formatted as list of lists. if axis is None: axis = 0 l = darrs # while l is a list or tuple and contains elements: while isinstance(l, List) or isinstance(l, Tuple) and l: # increase depth by one axis -= 1 l = l[0] if axis == 0: raise ValueError(""`darrs` has to be a (possibly nested) list or tuple of xr.DataArrays!"") to_concat = list() for i, da in enumerate(darrs): # recursive call for nested arrays; # most inner call should have axis = -1, # most outer call should have axis = - depth_of_darrs if isinstance(da, list) or isinstance(da, tuple): da = concat_axis(da, dims=dims, axis=axis + 1, **kwargs) if not isinstance(da, xr.DataArray): raise ValueError(""Input %d must be a xr.DataArray"" % i) if len(da.dims) != len(dims): raise ValueError(""Input %d must have the same number of dimensions as specified in the `dims` argument!"" % i) # force-rename dimensions da = da.rename(dict(zip(da.dims, dims))) to_concat.append(da) return xr.concat(to_concat, dim=dims[axis], **kwargs) ``` Would it make sense to include this in xarray?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3281/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 494906646,MDU6SXNzdWU0OTQ5MDY2NDY=,3315,xr.combine_nested() fails when passed nested DataSets,10554254,open,0,,,8,2019-09-17T23:47:44Z,2021-07-08T17:42:53Z,,NONE,,,,"`xr.__version__ '0.13.0'` xr.combine_nested() works when passed a nested list of DataArray objects. 
``` da1 = xr.DataArray(name=""a"", data=[[0]], dims=[""x"", ""y""]) da2 = xr.DataArray(name=""b"", data=[[1]], dims=[""x"", ""y""]) da3 = xr.DataArray(name=""a"", data=[[2]], dims=[""x"", ""y""]) da4 = xr.DataArray(name=""b"", data=[[3]], dims=[""x"", ""y""]) xr.combine_nested([[da1, da2], [da3, da4]], concat_dim=[""x"", ""y""]) ``` returns ``` array([[0, 1], [2, 3]]) Dimensions without coordinates: x, y ``` but fails if passed a nested list of DataSet objects. ``` ds1 = da1.to_dataset() ds2 = da2.to_dataset() ds3 = da3.to_dataset() ds4 = da4.to_dataset() xr.combine_nested([[ds1, ds2], [ds3, ds4]], concat_dim=[""x"", ""y""]) ``` returns ``` --------------------------------------------------------------------------- KeyError Traceback (most recent call last) in 3 ds3 = da3.to_dataset() 4 ds4 = da4.to_dataset() ----> 5 xr.combine_nested([[ds1, ds2], [ds3, ds4]], concat_dim=[""x"", ""y""]) ~/repos/contribute/xarray/xarray/core/combine.py in combine_nested(datasets, concat_dim, compat, data_vars, coords, fill_value, join) 462 ids=False, 463 fill_value=fill_value, --> 464 join=join, 465 ) 466 ~/repos/contribute/xarray/xarray/core/combine.py in _nested_combine(datasets, concat_dims, compat, data_vars, coords, ids, fill_value, join) 305 coords=coords, 306 fill_value=fill_value, --> 307 join=join, 308 ) 309 return combined ~/repos/contribute/xarray/xarray/core/combine.py in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat, fill_value, join) 196 compat=compat, 197 fill_value=fill_value, --> 198 join=join, 199 ) 200 (combined_ds,) = combined_ids.values() ~/repos/contribute/xarray/xarray/core/combine.py in _combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat, fill_value, join) 218 datasets = combined_ids.values() 219 new_combined_ids[new_id] = _combine_1d( --> 220 datasets, dim, compat, data_vars, coords, fill_value, join 221 ) 222 return new_combined_ids ~/repos/contribute/xarray/xarray/core/combine.py in _combine_1d(datasets, concat_dim, compat, data_vars, coords, fill_value, join) 246 compat=compat, 247 fill_value=fill_value, --> 248 join=join, 249 ) 250 except ValueError as err: ~/repos/contribute/xarray/xarray/core/concat.py in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join) 131 ""objects, got %s"" % type(first_obj) 132 ) --> 133 return f(objs, dim, data_vars, coords, compat, positions, fill_value, join) 134 135 ~/repos/contribute/xarray/xarray/core/concat.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join) 363 for k in datasets[0].variables: 364 if k in concat_over: --> 365 vars = ensure_common_dims([ds.variables[k] for ds in datasets]) 366 combined = concat_vars(vars, dim, positions) 367 assert isinstance(combined, Variable) ~/repos/contribute/xarray/xarray/core/concat.py in (.0) 363 for k in datasets[0].variables: 364 if k in concat_over: --> 365 vars = ensure_common_dims([ds.variables[k] for ds in datasets]) 366 combined = concat_vars(vars, dim, positions) 367 assert isinstance(combined, Variable) ~/repos/contribute/xarray/xarray/core/utils.py in __getitem__(self, key) 383 384 def __getitem__(self, key: K) -> V: --> 385 return self.mapping[key] 386 387 def __iter__(self) -> Iterator[K]: KeyError: 'a' ```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3315/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 329575874,MDU6SXNzdWUzMjk1NzU4NzQ=,2217,tolerance for 
alignment,31460695,open,0,,,23,2018-06-05T18:34:45Z,2021-07-08T17:42:52Z,,NONE,,,," When using open_mfdataset on files which 'should' share a grid, there is often a small mismatch which results in the grid not aligning properly. This happens frequently when trying to read data from large climate models from multiple files of the same variable, with the same lon,lat grid and different time intervals. This silent behavior means that I always have to check the sizes of the lon,lat grids whenever I rely on mfdataset to concatenate the data in time. Here is an example in which I create two 1d DataArrays which have slightly different coordinates: ```python import xarray as xr import numpy as np from glob import glob tol=1e-14 x1 = np.arange(1,6) + tol*np.random.rand(5) da1 = xr.DataArray([9, 0, 2, 1, 0], dims=['x'], coords={'x': x1}) x2 = np.arange(1,6) + tol*np.random.rand(5) da2 = da1.copy() da2['x'] = x2 print(da1.x,'\n', da2.x) ``` ``` array([1., 2., 3., 4., 5.]) Coordinates: * x (x) float64 1.0 2.0 3.0 4.0 5.0 array([1., 2., 3., 4., 5.]) Coordinates: * x (x) float64 1.0 2.0 3.0 4.0 5.0 ``` First I save both DataArrays as netCDF files and then use open_mfdataset to load them: ``` da1.to_netcdf('da1.nc',encoding={'x':{'dtype':'float64'}}) da2.to_netcdf('da2.nc',encoding={'x':{'dtype':'float64'}}) db = xr.open_mfdataset(glob('da?.nc')) db ``` ``` Dimensions: (x: 10) Coordinates: * x (x) float64 1.0 2.0 3.0 4.0 5.0 1.0 2.0 ... Data variables: __xarray_dataarray_variable__ (x) int64 dask.array ``` So the x grid is now twice the size. This behavior is the same if I just use align with join='outer': ``` xr.align(da1,da2,join='outer') ``` ``` ( array([nan, 9., nan, 0., 2., nan, nan, 1., 0., nan]) Coordinates: * x (x) float64 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0 5.0 5.0, array([ 9., nan, 0., nan, nan, 2., 1., nan, nan, 0.]) Coordinates: * x (x) float64 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0 5.0 5.0) ``` #### Request / suggestion What is needed is a user-specified tolerance, given to open_mfdataset and passed on to align, that accepts these grids as the same. Possibly related to https://github.com/pydata/xarray/issues/2215
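Until such a tolerance argument exists, one possible stop-gap is to snap one grid onto the other before combining. A sketch only, reusing `da1` and `da2` from the example above (the `tolerance` value is an arbitrary choice that must exceed the expected grid noise):

```python
# Reindex da2 onto da1's coordinates, matching points that differ by less
# than the tolerance; afterwards alignment is exact.
da2_snapped = da2.reindex(x=da1['x'], method='nearest', tolerance=1e-9)
aligned = xr.align(da1, da2_snapped, join='outer')  # x stays at 5 points
```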
xr.__version__ '0.10.4'
thanks, Naomi","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2217/reactions"", ""total_count"": 10, ""+1"": 10, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 415802678,MDU6SXNzdWU0MTU4MDI2Nzg=,2796,Better explanation of 'minimal' in xarray.open_mfdataset(data_vars='minimal') in docs?,5704500,open,0,,,2,2019-02-28T20:11:42Z,2021-07-08T17:42:52Z,,NONE,,,,"#### Problem description I'm currently troubleshooting some overly long (to me) load times using open_mfdataset on GFS data. In trying to speed things up, I'm trying to specify just the four variables I actually care about using `data_vars=[strings]`, but to no avail. It still takes ~30 minutes to load 52 time slices from 7 files. In the [docs](http://xarray.pydata.org/en/stable/generated/xarray.open_mfdataset.html) I do see that if `data_vars = ` > list of str: ""The listed data variables will be concatenated, in addition to the ‘minimal’ data variables."" However, I can't seem to understand what the 'minimal' variables are from this sentence in the docs: > ‘minimal’: Only data variables in which the dimension already appears are included. All the variables in the CF-compliant GFS data are associated with dimensions. So does that mean that all the variables in the files will be concatenated, regardless if I specify which ones I want? I feel like I'm misunderstanding what is included by default.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2796/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 423749397,MDU6SXNzdWU0MjM3NDkzOTc=,2836,xarray.concat() with compat='identical' fails for DataArray attrs,2418513,open,0,,,9,2019-03-21T14:11:29Z,2021-07-08T17:42:52Z,,NONE,,,,"Not sure if it was ever supposed to work with numpy arrays, but it actually does :thinking:: ```python >>> attr = np.array([[3, 4]]) >>> d1 = xr.Dataset({'z': 1}, attrs={'y': attr}) >>> d2 = xr.Dataset({'z': 2}, attrs={'y': attr.copy()}) >>> xr.concat([d1, d2], dim='z', compat='identical') ``` However, it fails if you use DataArray attrs: ```python >>> attr = xr.DataArray([3, 4], {'x': [1, 2]}, 'x') >>> d1 = xr.Dataset({'z': 1}, attrs={'y': attr}) >>> d2 = xr.Dataset({'z': 2}, attrs={'y': attr.copy()}) >>> xr.concat([d1, d2], dim='z', compat='identical') ValueError: The truth value of an array with more than one element is ambiguous. 
Use a.any() or a.all() ``` Given that the check is simply `(a is b) or (a == b)`, should it try to do something smarter for array-like attrs?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2836/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 446054247,MDU6SXNzdWU0NDYwNTQyNDc=,2975,Inconsistent/confusing behaviour when concatenating dimension coords,35968931,open,0,,,2,2019-05-20T11:01:37Z,2021-07-08T17:42:52Z,,MEMBER,,,,"I noticed that with multiple conflicting dimension coords then concat can give pretty weird/counterintuitive results, at least compared to what the documentation suggests they should give: ```python # Create two datasets with conflicting coordinates objs = [Dataset({'x': [0], 'y': [1]}), Dataset({'y': [0], 'x': [1]})] [ Dimensions: (x: 1, y: 1) Coordinates: * x (x) int64 0 * y (y) int64 1 Data variables: *empty*, Dimensions: (x: 1, y: 1) Coordinates: * y (y) int64 0 * x (x) int64 1 Data variables: *empty*] ``` ```python # Try to join along only 'x', # coords='minimal' so concatenate ""Only coordinates in which the dimension already appears"" concat(objs, dim='x', coords='minimal') Dimensions: (x: 2, y: 2) Coordinates: * y (y) int64 0 1 * x (x) int64 0 1 Data variables: *empty* # It's joined along x and y! ``` Based on my reading of the [docstring for concat](http://xarray.pydata.org/en/stable/generated/xarray.concat.html), I would have expected this to not attempt to concatenate y, because `coords='minimal'`, and instead to throw an error because 'y' is a ""non-concatenated variable"" whose values are not the same across datasets. Now let's try to get concat to broadcast 'y' across 'x': ```python # Try to join along only 'x' by setting coords='different' concat(objs, dim='x', coords='different') ``` Now as ""Data variables which are not equal (ignoring attributes) across all datasets are also concatenated"" then I would have expected 'y' to be concatenated across 'x', i.e. 
to add the 'x' dimension to the 'y' coord, i.e: ```python Dimensions: (x: 2, y: 1) Coordinates: * y (y, x) int64 1 0 * x (x) int64 0 1 Data variables: *empty* ``` But that's not what we get!: ``` Dimensions: (x: 2, y: 2) Coordinates: * y (y) int64 0 1 * x (x) int64 0 1 Data variables: *empty* ``` ### Same again but without dimension coords If we create the same sort of objects but the variables are data vars not coords, then everything behaves exactly as expected: ```python objs2 = [Dataset({'a': ('x', [0]), 'b': ('y', [1])}), Dataset({'a': ('x', [1]), 'b': ('y', [0])})] [ Dimensions: (x: 1, y: 1) Dimensions without coordinates: x, y Data variables: a (x) int64 0 b (y) int64 1, Dimensions: (x: 1, y: 1) Dimensions without coordinates: x, y Data variables: a (x) int64 1 b (y) int64 0] concat(objs2, dim='x', data_vars='minimal') ValueError: variable b not equal across datasets concat(objs2, dim='x', data_vars='different') Dimensions: (x: 2, y: 1) Dimensions without coordinates: x, y Data variables: a (x) int64 0 1 b (x, y) int64 1 0 ``` Also if you do the same again but with coordinates which are not dimension coords, i.e: ```python objs3 = [Dataset(coords={'a': ('x', [0]), 'b': ('y', [1])}), Dataset(coords={'a': ('x', [1]), 'b': ('y', [0])})] [ Dimensions: (x: 1, y: 1) Coordinates: a (x) int64 0 b (y) int64 1 Dimensions without coordinates: x, y Data variables: *empty*, Dimensions: (x: 1, y: 1) Coordinates: a (x) int64 1 b (y) int64 0 Dimensions without coordinates: x, y Data variables: *empty*] ``` then this again gives the expected concatenation behaviour. So this implies that the compatibility checks that are being done on the data vars are not being done on the coords, but only if they are dimension coordinates! Either this is not the desired behaviour or the concat docstring needs to be a lot clearer. If we agree that this is not the desired behaviour then I will have a look inside `concat` to work out why it's happening. EDIT: Presumably this has something to do with the ToDo in the code for `concat`: `# TODO: support concatenating scalar coordinates even if the concatenated dimension already exists`...","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2975/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 193294569,MDU6SXNzdWUxOTMyOTQ1Njk=,1151,Scalar coords vs. concat,6213168,open,0,,,11,2016-12-03T15:42:18Z,2021-07-08T17:42:18Z,,MEMBER,,,,"Why does this work: ``` >> import xarray >> a = xarray.DataArray([1, 2, 3], dims=['x'], coords={'y': 10}) >> b = xarray.DataArray([4, 5, 6], dims=['x']) >> a + b array([5, 7, 9]) Coordinates: y int64 10 ``` But this doesn't? ``` >> xarray.concat([a, b], dim='x') KeyError: 'y' ``` It doesn't seem coherent to me...","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1151/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 223231729,MDU6SXNzdWUyMjMyMzE3Mjk=,1379,xr.concat consuming too much resources,7799184,open,0,,,4,2017-04-20T23:33:52Z,2021-07-08T17:42:18Z,,CONTRIBUTOR,,,,"Hi, I am reading in several (~1000) small ascii files into Dataset objects and trying to concatenate them over one specific dimension but I eventually blow my memory up. 
The file glob is not huge (~700M, my computer has ~16G) and I can do it fine if I only read in the Datasets, appending them to a list without concatenating them (my memory use increases by only 5% or so by the time I have read them all). However, when trying to concatenate each file into one single Dataset upon reading over a loop, the processing speed drops drastically before I have read 10% of the files or so, and my memory usage keeps going up until it eventually blows up before I have read and concatenated 30% of these files (the screenshot below was taken before it blew up; the memory usage was under 20% at the start of the processing). I was wondering if this is expected, or if there is something that could be improved to make that work more efficiently. I'm changing my approach now by extracting numpy arrays from the individual Datasets, concatenating these numpy arrays, and defining the final Dataset only at the end. Thanks. ![screenshot from 2017-04-21 11-14-27](https://cloud.githubusercontent.com/assets/7799184/25256452/e7cdd4b4-2684-11e7-9c27-e28c76317a77.png) ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1379/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 471409673,MDU6SXNzdWU0NzE0MDk2NzM=,3158,Out of date docstring for concat_dim in open_mfdataset,17169544,open,0,,,3,2019-07-23T00:01:05Z,2021-07-08T17:40:45Z,,CONTRIBUTOR,,,,"In the `open_mfdataset` [docstring](https://github.com/pydata/xarray/blob/f7e299f665dada1934e0f53d4294902103dcec74/xarray/backends/api.py#L608): ``` concat_dim : str, or list of str, DataArray, Index or None, optional Dimensions to concatenate files along. You only need to provide this argument if any of the dimensions along which you want to concatenate is not a dimension in the original datasets, e.g., if you want to stack a collection of 2D arrays along a third dimension. ... ``` This is true for the default `combine='_old_auto'`, but with `combine='nested'` the argument is required, while `combine='by_coords'` does not use it at all. It would be clearer to make that distinction here.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3158/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 496688781,MDU6SXNzdWU0OTY2ODg3ODE=,3330,Feature requests for DataArray.rolling,923438,closed,0,,,1,2019-09-21T18:58:21Z,2021-07-08T16:29:18Z,2021-07-08T16:29:18Z,NONE,,,,"In `DataArray.rolling` it would be really nice to have support for window sizes specified in the units of the dimension (esp. time). For example, if `da` has dimensions ```(time, space, feature)``` with `time` as a `DatetimeIndex`, then it should be possible to specify `da.rolling(time=pd.Timedelta(100, 'D'))` as a valid window ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3330/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
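For regularly sampled data there is a stop-gap for the rolling request above: translate the desired timedelta into the integer sample count that `rolling` accepts today. A minimal sketch (the daily example data is illustrative, not from the report):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Build a regularly sampled array, then convert a 100-day window into a
# number of samples for the integer-based rolling API.
time = pd.date_range('2000-01-01', periods=365, freq='D')
da = xr.DataArray(np.random.rand(365, 2),
                  coords={'time': time}, dims=['time', 'space'])

step = pd.Timedelta(time[1] - time[0])       # sampling interval (1 day here)
window = int(pd.Timedelta(100, 'D') / step)  # 100 days -> 100 samples
smoothed = da.rolling(time=window).mean()
```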