

issues


16 rows where repo = 13221727 and "updated_at" is on date 2021-07-08 sorted by updated_at descending




state 2

  • open 15
  • closed 1

repo 1

  • xarray 16
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
714905042 MDU6SXNzdWU3MTQ5MDUwNDI= 4486 Feature request: xr.concat: `stack` parameter FRidh 2129135 open 0     1 2020-10-05T14:43:40Z 2021-07-08T17:44:38Z   NONE      

Is your feature request related to a problem? Please describe. In the case of dependent dimensions there is a lot of missing data, and using a stacked layout is preferable. Composing an array using concat and then stack is not very memory efficient and results in NaNs that have to be removed.

I am now composing an array using concat and then reshaping it using stack. This can consume a lot of memory and requires explicit removal of the NaNs after stacking. Having a stack parameter to concat that takes the desired index would be very useful.

Describe the solution you'd like A stack parameter to concat that takes the desired index. Initially it may just do the naive concat followed by a stack and removal of NaNs, but eventually it should insert items correctly without creating NaNs.

Describe alternatives you've considered Composing an array using concat and then stack, which is not very memory efficient and results in NaNs that have to be removed.

Additional context Issue related to concat and stack https://github.com/pydata/xarray/issues/981.
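
For reference, a minimal sketch of the workaround described above (dimension names and values are illustrative): concat pads with NaN, stack flattens, and dropna removes the padding afterwards.

```python
import numpy as np
import xarray as xr

# two arrays over a shared dimension "x" but disjoint coordinate values
a = xr.DataArray(np.random.rand(3), dims="x", coords={"x": [0, 1, 2], "case": "a"})
b = xr.DataArray(np.random.rand(2), dims="x", coords={"x": [3, 4], "case": "b"})

combined = xr.concat([a, b], dim="case")        # shape (case, x), half NaN padding
stacked = combined.stack(sample=("case", "x"))  # flatten into a single index
stacked = stacked.dropna("sample")              # remove the NaN padding explicitly
```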

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4486/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
788534915 MDU6SXNzdWU3ODg1MzQ5MTU= 4824 combine_by_coords can succeed when it shouldn't mathause 10194086 open 0     15 2021-01-18T20:39:29Z 2021-07-08T17:44:38Z   MEMBER      

What happened:

combine_by_coords can succeed when it should not - depending on the names of the dimensions (which determine the order of operations in combine_by_coords).

What you expected to happen:

  • I think it should throw an error in both cases.

Minimal Complete Verifiable Example:

```python
import numpy as np
import xarray as xr

data = np.arange(5).reshape(1, 5)
x = np.arange(5)
x_name = "lat"

da0 = xr.DataArray(data, dims=("t", x_name), coords={"t": [1], x_name: x}).to_dataset(name="a")
x = x + 1e-6
da1 = xr.DataArray(data, dims=("t", x_name), coords={"t": [2], x_name: x}).to_dataset(name="a")
ds = xr.combine_by_coords((da0, da1))

ds
```

returns:

```python
<xarray.Dataset>
Dimensions:  (lat: 10, t: 2)
Coordinates:
  * lat      (lat) float64 0.0 1e-06 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0
  * t        (t) int64 1 2
Data variables:
    a        (t, lat) float64 0.0 nan 1.0 nan 2.0 nan ... 2.0 nan 3.0 nan 4.0
```

Thus lat is interlaced - I don't think combine_by_coords should do this. If you set

```python
x_name = "x"
```

and run the example again, it returns:

```python-traceback
ValueError: Resulting object does not have monotonic global indexes along dimension x
```

Anything else we need to know?:

  • this is vaguely related to #4077 but I think it is separate
  • combine_by_coords concatenates over all dimensions where the coords are different - therefore compat="override" doesn't actually do anything? Or does it?

https://github.com/pydata/xarray/blob/ba42c08af9afbd9e79d47bda404bf4a92a7314a0/xarray/core/combine.py#L69

cc @dcherian @TomNicholas

Environment:

Output of xr.show_versions()
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4824/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
546303413 MDU6SXNzdWU1NDYzMDM0MTM= 3666 Raise nice error when attempting to concatenate CFTimeIndex & DatetimeIndex tlogan2000 22454970 open 0     9 2020-01-07T14:08:03Z 2021-07-08T17:43:58Z   NONE      

MCVE Code Sample

```python
import subprocess
import sys
import wget
import glob

def install(package):
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])

try:
    from xclim import ensembles
except:
    install('xclim')
    from xclim import ensembles

# download the four test files, then glob them back
outdir = 'tmp'
url = []
url.append('https://github.com/Ouranosinc/xclim/raw/master/tests/testdata/EnsembleStats/BCCAQv2+ANUSPLIN300_ACCESS1-0_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc')
url.append('https://github.com/Ouranosinc/xclim/raw/master/tests/testdata/EnsembleStats/BCCAQv2+ANUSPLIN300_BNU-ESM_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc')
url.append('https://github.com/Ouranosinc/xclim/raw/master/tests/testdata/EnsembleStats/BCCAQv2+ANUSPLIN300_CCSM4_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc')
url.append('https://github.com/Ouranosinc/xclim/raw/master/tests/testdata/EnsembleStats/BCCAQv2+ANUSPLIN300_CCSM4_historical+rcp45_r2i1p1_1950-2100_tg_mean_YS.nc')
for u in url:
    wget.download(u, out=outdir)

datasets = glob.glob(f'{outdir}/*1950*.nc')
ens1 = ensembles.create_ensemble(datasets)
print(ens1)
```

Expected Output

Following advice of @dcherian (https://github.com/Ouranosinc/xclim/issues/281#issue-508073942) we have started testing builds of xclim against the master branch as well as the current release:

Using xarray 0.14.1 via pip, the above code generates a concatenated dataset with a new added dimension 'realization'.

Problem Description

Using xarray@master, the xclim.ensembles.create_ensemble call gives the following error:

```python-traceback
Traceback (most recent call last):
  File "/home/travis/.PyCharmCE2019.3/config/scratches/scratch_26.py", line 23, in <module>
    ens1 = ensembles.create_ensemble(datasets)
  File "/home/travis/github_xclim/xclim/xclim/ensembles.py", line 83, in create_ensemble
    data = xr.concat(list1, dim=dim)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/xarray/core/concat.py", line 135, in concat
    return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/xarray/core/concat.py", line 439, in _dataarray_concat
    join=join,
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/xarray/core/concat.py", line 303, in _dataset_concat
    *datasets, join=join, copy=False, exclude=[dim], fill_value=fill_value
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/xarray/core/alignment.py", line 298, in align
    index = joiner(matching_indexes)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2385, in __or__
    return self.union(other)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2517, in union
    return self._union_incompatible_dtypes(other, sort=sort)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2436, in _union_incompatible_dtypes
    return Index.union(this, other, sort=sort).astype(object, copy=False)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2517, in union
    return self._union_incompatible_dtypes(other, sort=sort)
  ....
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 498, in __new__
    return DatetimeIndex(subarr, copy=copy, name=name, **kwargs)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 334, in __new__
    int_as_wall_time=True,
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 446, in _from_sequence
    int_as_wall_time=int_as_wall_time,
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1854, in sequence_to_dt64ns
    data, copy = maybe_convert_dtype(data, copy)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 2060, in maybe_convert_dtype
    elif is_extension_type(data) and not is_datetime64tz_dtype(data):
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/dtypes/common.py", line 1734, in is_extension_type
    if is_categorical(arr):
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/dtypes/common.py", line 387, in is_categorical
    return isinstance(arr, ABCCategorical) or is_categorical_dtype(arr)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/dtypes/common.py", line 708, in is_categorical_dtype
    return CategoricalDtype.is_dtype(arr_or_dtype)
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/dtypes/base.py", line 256, in is_dtype
    if isinstance(dtype, (ABCSeries, ABCIndexClass, ABCDataFrame, np.dtype)):
  File "/home/travis/.conda/envs/xclim_dev/lib/python3.7/site-packages/pandas/core/dtypes/generic.py", line 9, in _check
    return getattr(inst, attr, "_typ") in comp
RecursionError: maximum recursion depth exceeded while calling a Python object
```
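
The underlying clash can be shown without xclim; a minimal sketch, assuming cftime is installed (dates, values, and the calendar are illustrative). On the affected versions this recursed into the RecursionError above; the request here is for a clear error instead.

```python
import pandas as pd
import xarray as xr

# one array indexed by a CFTimeIndex, one by a pandas DatetimeIndex
a = xr.DataArray([0.0], dims="time",
                 coords={"time": xr.cftime_range("2000-01-01", periods=1, calendar="noleap")})
b = xr.DataArray([1.0], dims="time",
                 coords={"time": pd.date_range("2000-01-02", periods=1)})

# concatenating mixes the two index types; this should fail with a nice error
xr.concat([a, b], dim="time")
```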

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.5 (default, Oct 25 2019, 15:51:11) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-74-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3
xarray: 0.14.1+37.gdb36c5c0
pandas: 0.25.3
numpy: 1.17.4
scipy: 1.3.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.1
cfgrib: None
iris: None
bottleneck: 1.3.1
dask: 2.6.0
distributed: 2.6.0
matplotlib: 3.1.1
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.6.0.post20191030
pip: 19.3.1
conda: None
pytest: 5.2.2
IPython: None
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3666/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
503711327 MDU6SXNzdWU1MDM3MTEzMjc= 3381 concat() fails when args have sparse.COO data and different fill values khaeru 1634164 open 0     4 2019-10-07T21:54:06Z 2021-07-08T17:43:57Z   NONE      

MCVE Code Sample

```python
import numpy as np
import pandas as pd
import sparse
import xarray as xr

# Indices and raw data
foo = [f'foo{i}' for i in range(6)]
bar = [f'bar{i}' for i in range(6)]
raw = np.random.rand(len(foo) // 2, len(bar))

# DataArray
a = xr.DataArray(
    data=sparse.COO.from_numpy(raw),
    coords=[foo[:3], bar],
    dims=['foo', 'bar'])

print(a.data.fill_value)  # 0.0

# Created from a pd.Series
b_series = pd.DataFrame(raw, index=foo[3:], columns=bar) \
    .stack() \
    .rename_axis(index=['foo', 'bar'])
b = xr.DataArray.from_series(b_series, sparse=True)

print(b.data.fill_value)  # nan

# Works despite inconsistent fill-values
a + b
a * b

# Fails: complains about inconsistent fill-values
xr.concat([a, b], dim='foo')  # ***

# The fill_value argument doesn't help
xr.concat([a, b], dim='foo', fill_value=np.nan)

def fill_value(da):
    """Try to coerce one argument to a consistent fill-value."""
    return xr.DataArray(
        data=sparse.as_coo(da.data, fill_value=np.nan),
        coords=da.coords,
        dims=da.dims,
        name=da.name,
        attrs=da.attrs,
    )

# Fails: "Cannot provide a fill-value in combination with something that
# already has a fill-value"
print(xr.concat([a.pipe(fill_value), b], dim='foo'))

# If we cheat by recreating 'a' from scratch, copying the fill value of the
# intended other argument, it works again:
a = xr.DataArray(
    data=sparse.COO.from_numpy(raw, fill_value=b.data.fill_value),
    coords=[foo[:3], bar],
    dims=['foo', 'bar'])
c = xr.concat([a, b], dim='foo')

print(c.data.fill_value)  # nan

# But simple operations again create objects with potentially incompatible
# fill-values
d = c.sum(dim='bar')
print(d.data.fill_value)  # 0.0
```

Expected

concat() can be used without having to create new objects; i.e. the line marked *** just works.

Problem Description

Some basic xarray manipulations don't work on sparse.COO-backed objects.

xarray should automatically coerce objects into a compatible state, or at least provide users with methods to do so. Behaviour should also be documented, e.g. in this instance, which operations (here, .sum()) modify the underlying storage format in ways that necessitate some kind of (re-)conversion.
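
In the meantime, a hedged workaround sketch (it round-trips through dense, so it is only viable for small arrays): rebuild each operand with a common fill value before concatenating. a and b refer to the MCVE above.

```python
import numpy as np
import sparse
import xarray as xr

def with_fill(da, fill=np.nan):
    """Recreate a sparse.COO-backed DataArray with the given fill value."""
    dense = da.data.todense()  # densify, then re-sparsify with the target fill
    return da.copy(data=sparse.COO.from_numpy(dense, fill_value=fill))

c = xr.concat([with_fill(a), with_fill(b)], dim='foo')
print(c.data.fill_value)  # nan
```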

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, Aug 20 2019, 17:04:43) [GCC 8.3.0]
python-bits: 64
OS: Linux
OS-release: 5.0.0-32-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.13.0
pandas: 0.25.0
numpy: 1.17.2
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.1
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.1.0
distributed: None
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 40.8.0
pip: 19.2.3
conda: None
pytest: 5.0.1
IPython: 5.8.0
sphinx: 2.2.0
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3381/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
512205079 MDU6SXNzdWU1MTIyMDUwNzk= 3445 Merge fails when sparse Dataset has overlapping dimension values k-a-mendoza 4605410 open 0     3 2019-10-24T22:08:12Z 2021-07-08T17:43:57Z   NONE      

Sparse numpy arrays used in a merge operation seem to fail under certain coordinate settings. For example, this works perfectly:

```python
import xarray as xr
import numpy as np

# dense data and time (assumed here, mirroring the sparse example below)
data = np.random.uniform(-1, 1, (1, 1, 100))
time = np.linspace(0, 1, num=100)

data_array1 = xr.DataArray(data, name='default',
                           dims=['source', 'receiver', 'time'],
                           coords={'source': ['X.1'], 'receiver': ['X.2'], 'time': time}).to_dataset()
data_array2 = xr.DataArray(data, name='default',
                           dims=['source', 'receiver', 'time'],
                           coords={'source': ['X.2'], 'receiver': ['X.1'], 'time': time}).to_dataset()

dataset1 = xr.merge([data_array1, data_array2])
```

But this raises `IndexError: Only indices with at most one iterable index are supported` from the sparse package:

```python
import xarray as xr
import numpy as np
import sparse

data = sparse.COO.from_numpy(np.random.uniform(-1, 1, (1, 1, 100)))
time = np.linspace(0, 1, num=100)

data_array1 = xr.DataArray(data, name='default',
                           dims=['source', 'receiver', 'time'],
                           coords={'source': ['X.1'], 'receiver': ['X.2'], 'time': time}).to_dataset()
data_array2 = xr.DataArray(data, name='default',
                           dims=['source', 'receiver', 'time'],
                           coords={'source': ['X.2'], 'receiver': ['X.1'], 'time': time}).to_dataset()

dataset1 = xr.merge([data_array1, data_array2])
```

I have noticed this occurs when the merge would seem to add dimensions filled with nan values.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3445/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
544375718 MDU6SXNzdWU1NDQzNzU3MTg= 3659 Error concatenating Multiindex variables hazbottles 14136435 open 0     1 2020-01-01T16:36:26Z 2021-07-08T17:43:57Z   CONTRIBUTOR      

MCVE Code Sample

```python
>>> import xarray as xr
>>> da = xr.DataArray([0, 1], dims=["location"], coords={"lat": ("location", [10, 11]), "lon": ("location", [20, 21])}).set_index(location=["lat", "lon"])
>>> da2 = xr.DataArray([2, 3], dims=["location"], coords={"lat": ("location", [12, 13]), "lon": ("location", [22, 23])}).set_index(location=["lat", "lon"])
>>> xr.concat([da["location"], da2["location"]], dim="location")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/harry/code/xarray/xarray/core/concat.py", line 135, in concat
    return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
  File "/home/harry/code/xarray/xarray/core/concat.py", line 431, in _dataarray_concat
    ds = _dataset_concat(
  File "/home/harry/code/xarray/xarray/core/concat.py", line 384, in _dataset_concat
    result = Dataset(result_vars, attrs=result_attrs)
  File "/home/harry/code/xarray/xarray/core/dataset.py", line 541, in __init__
    variables, coord_names, dims, indexes = merge_data_and_coords(
  File "/home/harry/code/xarray/xarray/core/merge.py", line 466, in merge_data_and_coords
    return merge_core(
  File "/home/harry/code/xarray/xarray/core/merge.py", line 556, in merge_core
    assert_unique_multiindex_level_names(variables)
  File "/home/harry/code/xarray/xarray/core/variable.py", line 2363, in assert_unique_multiindex_level_names
    raise ValueError("conflicting MultiIndex level name(s):\n%s" % conflict_str)
ValueError: conflicting MultiIndex level name(s):
'lat' (location), 'lat' (<this-array>)
'lon' (location), 'lon' (<this-array>)
```

Expected Output

The output should be the same as first concatenating the DataArrays, then extracting the dimension location:

```python
>>> xr.concat([da, da2], dim="location")["location"]
<xarray.DataArray 'location' (location: 4)>
array([(10, 20), (11, 21), (12, 22), (13, 23)], dtype=object)
Coordinates:
  * location  (location) MultiIndex
  - lat       (location) int64 10 11 12 13
  - lon       (location) int64 20 21 22 23
```

Problem Description

```python
# da["location"] looks like a normal DataArray
>>> location = da["location"]
>>> location
<xarray.DataArray 'location' (location: 2)>
array([(10, 20), (11, 21)], dtype=object)
Coordinates:
  * location  (location) MultiIndex
  - lat       (location) int64 10 11
  - lon       (location) int64 20 21

# but in actual fact, the variable._data is a MultiIndex
>>> location.variable._data
PandasIndexAdapter(array=MultiIndex([(10, 20), (11, 21)], names=['lat', 'lon']), dtype=dtype('O'))
```

This is why an error is thrown: variable.assert_unique_multiindex_level_names gets passed two variables: location.variable (the DataArray data values), and also location["location"].variable (the coordinate values), which are both MultiIndexes.

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: b3d3b4480b7fb63402eb6c02103bb8d6c7dbf93a
python: 3.8.0 | packaged by conda-forge | (default, Nov 22 2019, 19:11:38) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.4.0-18362-Microsoft
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.14.1+36.gb3d3b44
pandas: 0.25.3
numpy: 1.18.0
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.9.1
distributed: 2.9.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 42.0.2.post20191201
pip: 19.3.1
conda: None
pytest: 5.3.2
IPython: None
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3659/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
489825483 MDU6SXNzdWU0ODk4MjU0ODM= 3281 [proposal] concatenate by axis, ignore dimension names Hoeze 1200058 open 0     4 2019-09-05T15:06:22Z 2021-07-08T17:42:53Z   NONE      

Hi, I wrote a helper function that concatenates arrays like xr.combine_nested, with the difference that it only supports xr.DataArrays, concatenates them by axis position similar to np.concatenate, and overwrites all dimension names (a usage sketch follows the code below).

I often need this to combine very different feature types.

```python
from typing import List, Tuple, Union

import numpy as np
import xarray as xr


def concat_by_axis(
        darrs: Union[List[xr.DataArray], Tuple[xr.DataArray]],
        dims: Union[List[str], Tuple[str]],
        axis: int = None,
        **kwargs
):
    """
    Concat arrays along some axis similar to np.concatenate.
    Automatically renames the dimensions to dims.
    Please note that this renaming happens by the axis position, therefore
    make sure to transpose all arrays to the correct dimension order.

    :param darrs: List or tuple of xr.DataArrays
    :param dims: The dimension names of the resulting array. Renames axes where necessary.
    :param axis: The axis which should be concatenated along
    :param kwargs: Additional arguments which will be passed to `xr.concat()`
    :return: Concatenated xr.DataArray with dimensions `dim`.
    """
    # Get depth of nested lists. Assumes `darrs` is correctly formatted as list of lists.
    if axis is None:
        axis = 0
        l = darrs
        # while l is a list or tuple and contains elements:
        while isinstance(l, (list, tuple)) and l:
            # increase depth by one
            axis -= 1
            l = l[0]
        if axis == 0:
            raise ValueError("`darrs` has to be a (possibly nested) list or tuple of xr.DataArrays!")

    to_concat = list()
    for i, da in enumerate(darrs):
        # recursive call for nested arrays;
        # most inner call should have axis = -1,
        # most outer call should have axis = - depth_of_darrs
        if isinstance(da, (list, tuple)):
            da = concat_by_axis(da, dims=dims, axis=axis + 1, **kwargs)

        if not isinstance(da, xr.DataArray):
            raise ValueError("Input %d must be a xr.DataArray" % i)
        if len(da.dims) != len(dims):
            raise ValueError("Input %d must have the same number of dimensions as specified in the `dims` argument!" % i)

        # force-rename dimensions
        da = da.rename(dict(zip(da.dims, dims)))

        to_concat.append(da)

    return xr.concat(to_concat, dim=dims[axis], **kwargs)
```

Would it make sense to include this in xarray?
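
For illustration, a usage sketch assuming the function above is defined (array shapes and dimension names are made up):

```python
import numpy as np
import xarray as xr

# two 2-D arrays whose dimension names disagree
a = xr.DataArray(np.zeros((2, 3)), dims=["x", "y"])
b = xr.DataArray(np.ones((2, 4)), dims=["u", "v"])

# rename both to ("row", "col") by position and concatenate along the last axis
c = concat_by_axis([a, b], dims=["row", "col"], axis=-1)
assert c.dims == ("row", "col") and c.shape == (2, 7)
```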

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3281/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
494906646 MDU6SXNzdWU0OTQ5MDY2NDY= 3315 xr.combine_nested() fails when passed nested DataSets friedrichknuth 10554254 open 0     8 2019-09-17T23:47:44Z 2021-07-08T17:42:53Z   NONE      

```python
>>> xr.__version__
'0.13.0'
```

xr.combine_nested() works when passed a nested list of DataArray objects:

```python
da1 = xr.DataArray(name="a", data=[[0]], dims=["x", "y"])
da2 = xr.DataArray(name="b", data=[[1]], dims=["x", "y"])
da3 = xr.DataArray(name="a", data=[[2]], dims=["x", "y"])
da4 = xr.DataArray(name="b", data=[[3]], dims=["x", "y"])
xr.combine_nested([[da1, da2], [da3, da4]], concat_dim=["x", "y"])
```

returns

```python
<xarray.DataArray 'a' (x: 2, y: 2)>
array([[0, 1],
       [2, 3]])
Dimensions without coordinates: x, y
```

but fails if passed a nested list of DataSet objects.

```python
ds1 = da1.to_dataset()
ds2 = da2.to_dataset()
ds3 = da3.to_dataset()
ds4 = da4.to_dataset()
xr.combine_nested([[ds1, ds2], [ds3, ds4]], concat_dim=["x", "y"])
```

returns

```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-8-c0035883fc68> in <module>
      3 ds3 = da3.to_dataset()
      4 ds4 = da4.to_dataset()
----> 5 xr.combine_nested([[ds1, ds2], [ds3, ds4]], concat_dim=["x", "y"])

~/repos/contribute/xarray/xarray/core/combine.py in combine_nested(datasets, concat_dim, compat, data_vars, coords, fill_value, join)
    462         ids=False,
    463         fill_value=fill_value,
--> 464         join=join,
    465     )
    466

~/repos/contribute/xarray/xarray/core/combine.py in _nested_combine(datasets, concat_dims, compat, data_vars, coords, ids, fill_value, join)
    305         coords=coords,
    306         fill_value=fill_value,
--> 307         join=join,
    308     )
    309     return combined

~/repos/contribute/xarray/xarray/core/combine.py in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat, fill_value, join)
    196             compat=compat,
    197             fill_value=fill_value,
--> 198             join=join,
    199         )
    200     (combined_ds,) = combined_ids.values()

~/repos/contribute/xarray/xarray/core/combine.py in _combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat, fill_value, join)
    218     datasets = combined_ids.values()
    219     new_combined_ids[new_id] = _combine_1d(
--> 220         datasets, dim, compat, data_vars, coords, fill_value, join
    221     )
    222     return new_combined_ids

~/repos/contribute/xarray/xarray/core/combine.py in _combine_1d(datasets, concat_dim, compat, data_vars, coords, fill_value, join)
    246             compat=compat,
    247             fill_value=fill_value,
--> 248             join=join,
    249         )
    250     except ValueError as err:

~/repos/contribute/xarray/xarray/core/concat.py in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    131             "objects, got %s" % type(first_obj)
    132         )
--> 133     return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    134
    135

~/repos/contribute/xarray/xarray/core/concat.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join)
    363     for k in datasets[0].variables:
    364         if k in concat_over:
--> 365             vars = ensure_common_dims([ds.variables[k] for ds in datasets])
    366             combined = concat_vars(vars, dim, positions)
    367             assert isinstance(combined, Variable)

~/repos/contribute/xarray/xarray/core/concat.py in <listcomp>(.0)
    363     for k in datasets[0].variables:
    364         if k in concat_over:
--> 365             vars = ensure_common_dims([ds.variables[k] for ds in datasets])
    366             combined = concat_vars(vars, dim, positions)
    367             assert isinstance(combined, Variable)

~/repos/contribute/xarray/xarray/core/utils.py in __getitem__(self, key)
    383
    384     def __getitem__(self, key: K) -> V:
--> 385         return self.mapping[key]
    386
    387     def __iter__(self) -> Iterator[K]:

KeyError: 'a'
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3315/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
329575874 MDU6SXNzdWUzMjk1NzU4NzQ= 2217 tolerance for alignment naomi-henderson 31460695 open 0     23 2018-06-05T18:34:45Z 2021-07-08T17:42:52Z   NONE      

When using open_mfdataset on files which 'should' share a grid, there is often a small mismatch which results in the grid not aligning properly. This happens frequently when trying to read data from large climate models from multiple files of the same variable, same lon,lat grid and different time intervals. This silent behavior means that I always have to check the sizes of the lon,lat grids whenever I rely on mfdataset to concatenate the data in time.

Here is an example in which I create two 1d DataArrays which have slightly different coordinates:

```python
import xarray as xr
import numpy as np
from glob import glob

tol = 1e-14
x1 = np.arange(1, 6) + tol * np.random.rand(5)
da1 = xr.DataArray([9, 0, 2, 1, 0], dims=['x'], coords={'x': x1})

x2 = np.arange(1, 6) + tol * np.random.rand(5)
da2 = da1.copy()
da2['x'] = x2

print(da1.x, '\n', da2.x)
```

```
<xarray.DataArray 'x' (x: 5)>
array([1., 2., 3., 4., 5.])
Coordinates:
  * x        (x) float64 1.0 2.0 3.0 4.0 5.0
<xarray.DataArray 'x' (x: 5)>
array([1., 2., 3., 4., 5.])
Coordinates:
  * x        (x) float64 1.0 2.0 3.0 4.0 5.0
```

First I save both DataArrays as netcdf files and then use open_mfdataset to load them:

```python
da1.to_netcdf('da1.nc', encoding={'x': {'dtype': 'float64'}})
da2.to_netcdf('da2.nc', encoding={'x': {'dtype': 'float64'}})

db = xr.open_mfdataset(glob('da?.nc'))
db
```

```
<xarray.Dataset>
Dimensions:                    (x: 10)
Coordinates:
  * x                          (x) float64 1.0 2.0 3.0 4.0 5.0 1.0 2.0 ...
Data variables:
    xarray_dataarray_variable  (x) int64 dask.array<shape=(10,), chunksize=(5,)>
```

So the x grid is now twice the size. This behavior is the same if I just use align with join='outer':

```python
xr.align(da1, da2, join='outer')
```

```
(<xarray.DataArray (x: 10)>
 array([nan,  9., nan,  0.,  2., nan, nan,  1.,  0., nan])
 Coordinates:
   * x        (x) float64 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0 5.0 5.0,
 <xarray.DataArray (x: 10)>
 array([ 9., nan,  0., nan, nan,  2.,  1., nan, nan,  0.])
 Coordinates:
   * x        (x) float64 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0 5.0 5.0)
```

Request/ suggestion

What is needed is a user-specified tolerance level, given to open_mfdataset and passed on to align, which will accept these grids as the same.
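
Until such an option exists, one possible workaround (a sketch; the tolerance value is illustrative) is to snap one array's coordinates onto the other's with reindex, which already accepts a tolerance:

```python
# match da2's x values to da1's grid within 1e-6, then align normally
da2_snapped = da2.reindex(x=da1.x, method='nearest', tolerance=1e-6)
da1_aligned, da2_aligned = xr.align(da1, da2_snapped, join='outer')
```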

Possibly related to https://github.com/pydata/xarray/issues/2215

```python
>>> xr.__version__
'0.10.4'
```

thanks, Naomi

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2217/reactions",
    "total_count": 10,
    "+1": 10,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
415802678 MDU6SXNzdWU0MTU4MDI2Nzg= 2796 Better explanation of 'minimal' in xarray.open_mfdataset(data_vars='minimal') in docs? wckoeppen 5704500 open 0     2 2019-02-28T20:11:42Z 2021-07-08T17:42:52Z   NONE      

Problem description

I'm currently troubleshooting some overly long (to me) load times using open_mfdataset on GFS data. In trying to speed things up, I'm trying to specify just the four variables I actually care about using data_vars=[strings], but to no avail. It still takes ~30 minutes to load 52 time slices from 7 files.

In the docs I do see that if data_vars =

list of str: "The listed data variables will be concatenated, in addition to the ‘minimal’ data variables."

However, I can't seem to understand what the 'minimal' variables are from this sentence in the docs:

‘minimal’: Only data variables in which the dimension already appears are included.

All the variables in the CF-compliant GFS data are associated with dimensions. So does that mean that all the variables in the files will be concatenated, regardless of whether I specify which ones I want? I feel like I'm misunderstanding what is included by default.
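
For reference, a small sketch of how data_vars='minimal' behaves in xr.concat (variable names here are made up): only variables that already contain the concatenation dimension are concatenated; the rest must be identical across files and are carried through once.

```python
import xarray as xr

# 't2m' has the concat dimension 'time'; 'orog' does not
ds1 = xr.Dataset({"t2m": ("time", [280.0]), "orog": ("y", [10.0, 20.0])}, coords={"time": [0]})
ds2 = xr.Dataset({"t2m": ("time", [281.0]), "orog": ("y", [10.0, 20.0])}, coords={"time": [1]})

out = xr.concat([ds1, ds2], dim="time", data_vars="minimal")
# 't2m' is concatenated along 'time'; 'orog' is kept once, unconcatenated
```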

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2796/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
423749397 MDU6SXNzdWU0MjM3NDkzOTc= 2836 xarray.concat() with compat='identical' fails for DataArray attrs aldanor 2418513 open 0     9 2019-03-21T14:11:29Z 2021-07-08T17:42:52Z   NONE      

Not sure if it was ever supposed to work with numpy arrays, but it actually does :thinking::

```python
attr = np.array([[3, 4]])
d1 = xr.Dataset({'z': 1}, attrs={'y': attr})
d2 = xr.Dataset({'z': 2}, attrs={'y': attr.copy()})
xr.concat([d1, d2], dim='z', compat='identical')
```

However, it fails if you use DataArray attrs:

```python
attr = xr.DataArray([3, 4], {'x': [1, 2]}, 'x')
d1 = xr.Dataset({'z': 1}, attrs={'y': attr})
d2 = xr.Dataset({'z': 2}, attrs={'y': attr.copy()})
xr.concat([d1, d2], dim='z', compat='identical')
```

```
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

Given that the check is simply (a is b) or (a == b), should it try to do something smarter for array-like attrs?
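
A sketch of what a smarter, array-safe check might look like (this is not xarray's actual implementation, just an illustration of the idea):

```python
import numpy as np

def attrs_equal(a, b):
    """Sketch: equality test that tolerates array-like attrs."""
    if a is b:
        return True
    try:
        return bool(a == b)
    except ValueError:
        # `a == b` produced an array with an ambiguous truth value;
        # fall back to element-wise comparison
        return np.array_equal(np.asarray(a), np.asarray(b))
```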

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2836/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
446054247 MDU6SXNzdWU0NDYwNTQyNDc= 2975 Inconsistent/confusing behaviour when concatenating dimension coords TomNicholas 35968931 open 0     2 2019-05-20T11:01:37Z 2021-07-08T17:42:52Z   MEMBER      

I noticed that with multiple conflicting dimension coords then concat can give pretty weird/counterintuitive results, at least compared to what the documentation suggests they should give:

```python
# Create two datasets with conflicting coordinates
objs = [Dataset({'x': [0], 'y': [1]}),
        Dataset({'y': [0], 'x': [1]})]

[<xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Coordinates:
   * x        (x) int64 0
   * y        (y) int64 1
 Data variables:
     *empty*,
 <xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Coordinates:
   * y        (y) int64 0
   * x        (x) int64 1
 Data variables:
     *empty*]
```

```python
# Try to join along only 'x',
# coords='minimal' so concatenate "Only coordinates in which the dimension already appears"
concat(objs, dim='x', coords='minimal')

<xarray.Dataset>
Dimensions:  (x: 2, y: 2)
Coordinates:
  * y        (y) int64 0 1
  * x        (x) int64 0 1
Data variables:
    *empty*

# It's joined along x and y!
```

Based on my reading of the docstring for concat, I would have expected this to not attempt to concatenate y, because coords='minimal', and instead to throw an error because 'y' is a "non-concatenated variable" whose values are not the same across datasets.

Now let's try to get concat to broadcast 'y' across 'x':

```python
# Try to join along only 'x' by setting coords='different'
concat(objs, dim='x', coords='different')
```

Now as "Data variables which are not equal (ignoring attributes) across all datasets are also concatenated" then I would have expected 'y' to be concatenated across 'x', i.e. to add the 'x' dimension to the 'y' coord, i.e:

```python
<xarray.Dataset>
Dimensions:  (x: 2, y: 1)
Coordinates:
  * y        (y, x) int64 1 0
  * x        (x) int64 0 1
Data variables:
    *empty*
```

But that's not what we get!:

```python
<xarray.Dataset>
Dimensions:  (x: 2, y: 2)
Coordinates:
  * y        (y) int64 0 1
  * x        (x) int64 0 1
Data variables:
    *empty*
```

Same again but without dimension coords

If we create the same sort of objects but the variables are data vars not coords, then everything behaves exactly as expected:

```python
objs2 = [Dataset({'a': ('x', [0]), 'b': ('y', [1])}),
         Dataset({'a': ('x', [1]), 'b': ('y', [0])})]

[<xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Dimensions without coordinates: x, y
 Data variables:
     a        (x) int64 0
     b        (y) int64 1,
 <xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Dimensions without coordinates: x, y
 Data variables:
     a        (x) int64 1
     b        (y) int64 0]

concat(objs2, dim='x', data_vars='minimal')

ValueError: variable b not equal across datasets

concat(objs2, dim='x', data_vars='different')

<xarray.Dataset>
Dimensions:  (x: 2, y: 1)
Dimensions without coordinates: x, y
Data variables:
    a        (x) int64 0 1
    b        (x, y) int64 1 0
```

Also if you do the same again but with coordinates which are not dimension coords, i.e:

```python
objs3 = [Dataset(coords={'a': ('x', [0]), 'b': ('y', [1])}),
         Dataset(coords={'a': ('x', [1]), 'b': ('y', [0])})]

[<xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Coordinates:
     a        (x) int64 0
     b        (y) int64 1
 Dimensions without coordinates: x, y
 Data variables:
     *empty*,
 <xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Coordinates:
     a        (x) int64 1
     b        (y) int64 0
 Dimensions without coordinates: x, y
 Data variables:
     *empty*]
```

then this again gives the expected concatenation behaviour.

So this implies that the compatibility checks that are being done on the data vars are not being done on the coords, but only if they are dimension coordinates!

Either this is not the desired behaviour or the concat docstring needs to be a lot clearer. If we agree that this is not the desired behaviour then I will have a look inside concat to work out why it's happening.

EDIT: Presumably this has something to do with the ToDo in the code for concat: # TODO: support concatenating scalar coordinates even if the concatenated dimension already exists...

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2975/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
193294569 MDU6SXNzdWUxOTMyOTQ1Njk= 1151 Scalar coords vs. concat crusaderky 6213168 open 0     11 2016-12-03T15:42:18Z 2021-07-08T17:42:18Z   MEMBER      

Why does this work:

```python
>>> import xarray
>>> a = xarray.DataArray([1, 2, 3], dims=['x'], coords={'y': 10})
>>> b = xarray.DataArray([4, 5, 6], dims=['x'])
>>> a + b
<xarray.DataArray (x: 3)>
array([5, 7, 9])
Coordinates:
    y        int64 10
```

But this doesn't?

```python
>>> xarray.concat([a, b], dim='x')
KeyError: 'y'
```

It doesn't seem coherent to me...
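
A possible workaround sketch: give b the matching scalar coordinate before concatenating, so the coordinate exists on both arguments (the value 10 comes from a above).

```python
b2 = b.assign_coords(y=10)                  # add the missing scalar coord
combined = xarray.concat([a, b2], dim='x')  # now succeeds
```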

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1151/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
223231729 MDU6SXNzdWUyMjMyMzE3Mjk= 1379 xr.concat consuming too much resources rafa-guedes 7799184 open 0     4 2017-04-20T23:33:52Z 2021-07-08T17:42:18Z   CONTRIBUTOR      

Hi, I am reading in several (~1000) small ascii files into Dataset objects and trying to concatenate them over one specific dimension, but I eventually blow up my memory. The file glob is not huge (~700M, my computer has ~16G) and I can do it fine if I only read in the Datasets, appending them to a list without concatenating them (my memory increases by 5% or so by the time I have read them all).

However, when trying to concatenate each file into one single Dataset upon reading over a loop, the processing speed drops drastically before I have read 10% of the files or so, and my memory usage keeps going up until it eventually blows up before I have read and concatenated 30% of these files (a screenshot taken before it blew up showed memory usage under 20% at the start of the processing).

I was wondering if this is expected, or if there is something that could be improved to make this work more efficiently. I'm changing my approach now by extracting numpy arrays from the individual Datasets, concatenating these numpy arrays, and defining the final Dataset only at the end.
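
For context, a sketch of the cheaper pattern (read_one and paths are hypothetical placeholders): collect everything first and concatenate once, since concatenating inside the loop re-copies the accumulated data on every iteration.

```python
import xarray as xr

# hypothetical reader turning one ascii file into a Dataset
datasets = [read_one(path) for path in paths]

# a single concat at the end, instead of ds = xr.concat([ds, new]) per file
combined = xr.concat(datasets, dim='time')
```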

Thanks.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1379/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
471409673 MDU6SXNzdWU0NzE0MDk2NzM= 3158 Out of date docstring for concat_dim in open_mfdataset zdgriffith 17169544 open 0     3 2019-07-23T00:01:05Z 2021-07-08T17:40:45Z   CONTRIBUTOR      

In the open_mfdataset docstring:

concat_dim : str, or list of str, DataArray, Index or None, optional Dimensions to concatenate files along. You only need to provide this argument if any of the dimensions along which you want to concatenate is not a dimension in the original datasets, e.g., if you want to stack a collection of 2D arrays along a third dimension. ...

This is true for the default combine='_old_auto', but when using combine='nested' the argument is required, and combine='by_coords' does not use it at all. It would be clearer to make that distinction here.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3158/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
496688781 MDU6SXNzdWU0OTY2ODg3ODE= 3330 Feature requests for DataArray.rolling fjanoos 923438 closed 0     1 2019-09-21T18:58:21Z 2021-07-08T16:29:18Z 2021-07-08T16:29:18Z NONE      

In DataArray.rolling it would be really nice to have support for window sizes specified in the units of the dimension (esp. time). For example, if da has dimensions (time, space, feature) with time as a DatetimeIndex, then it should be possible to specify da.rolling(time=pd.Timedelta(100, 'D')) as a valid window.
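
In the meantime, a workaround sketch for the 1-D case (assuming da from the example above is reduced to a single datetime 'time' dimension): round-trip through pandas, whose rolling already accepts time offsets.

```python
# da: 1-D DataArray over a datetime 'time' dimension
series = da.to_series()                 # pd.Series indexed by time
rolled = series.rolling("100D").mean()  # time-based window in pandas
result = rolled.to_xarray()             # back to a DataArray
```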

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3330/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);