
issues


58 rows where type = "issue" and "updated_at" is on date 2022-04-09 sorted by updated_at descending


state 2

  • open 30
  • closed 28

type 1

  • issue 58

repo 1

  • xarray 58
Columns: id, node_id, number, title, user, state, locked, assignee, milestone, comments, created_at, updated_at, closed_at, author_association, active_lock_reason, draft, pull_request, body, reactions, performed_via_github_app, state_reason, repo, type
1177665302 I_kwDOAMm_X85GMb8W 6401 Unnecessary warning when specifying `chunks` opening dataset with empty dimension jaicher 4666753 closed 0     0 2022-03-23T06:38:25Z 2022-04-09T20:27:40Z 2022-04-09T20:27:40Z CONTRIBUTOR      

What happened?

I receive unnecessary warnings when opening Zarr datasets with empty dimensions/arrays using the chunks argument (for a non-empty dimension).

If an array has zero size (due to an empty dimension), it is saved as a single chunk regardless of Dask chunking on other dimensions (#5742). If the chunks parameter is then provided for the other dimensions when loading the Zarr file (based on the chunk sizes the array would have if it were nonempty), xarray warns about potentially degraded performance from splitting that single chunk.

What did you expect to happen?

I expect no warning to be raised when there is no data:

  • Performance degradation on an empty array should be negligible.
  • We don't always know whether one of the dimensions is empty until loading, but we would still use the chunks parameter for dimensions with consistent chunk sizes (to specify a multiple of what's on disk) -- this is thrown off when other dimensions are empty.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

# each a is expected to be chunked separately
ds = xr.Dataset({"x": (("a", "b"), np.empty((4, 0)))}).chunk({"a": 1})

# but when we save it, it gets saved as a single chunk
ds.to_zarr("tmp.zarr")

# so if we open it up with expected chunksizes (not knowing that b is empty):
ds2 = xr.open_zarr("tmp.zarr", chunks={"a": 1})

# we get a warning :(
```

Relevant log output

```Python
{...}/miniconda3/envs/new-majiq/lib/python3.8/site-packages/xarray/core/dataset.py:410: UserWarning: Specified Dask chunks (1, 1, 1, 1) would separate on disk chunk shape 4 for dimension a. This could degrade performance. (chunks = {'a': (1, 1, 1, 1), 'b': (0,)}, preferred_chunks = {'a': 4, 'b': 1}). Consider rechunking after loading instead.
  _check_chunks_compatibility(var, output_chunks, preferred_chunks)
```

Anything else we need to know?

This can be fixed by calling _check_chunks_compatibility() only when var is nonempty (PR forthcoming).
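A minimal sketch of that guard (the surrounding call site is an assumption; only the helper name comes from the warning above):

```Python
# Hypothetical sketch: skip the chunk-compatibility check for zero-size
# variables, since splitting an empty chunk cannot degrade performance.
if var.size > 0:
    _check_chunks_compatibility(var, output_chunks, preferred_chunks)
```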

Environment

INSTALLED VERSIONS

commit: None python: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:42:07) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 5.4.72-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1 libnetcdf: None

xarray: 2022.3.0 pandas: 1.4.1 numpy: 1.22.2 scipy: 1.8.0 netCDF4: None pydap: None h5netcdf: None h5py: 3.6.0 Nio: None zarr: 2.11.1 cftime: 1.5.1.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.4 dask: 2022.01.0 distributed: 2022.01.0 matplotlib: 3.5.1 cartopy: None seaborn: 0.11.2 numbagg: None fsspec: 2022.01.0 cupy: None pint: None sparse: None setuptools: 59.8.0 pip: 22.0.4 conda: None pytest: 7.0.1 IPython: 8.1.1 sphinx: 4.4.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6401/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1167883842 I_kwDOAMm_X85FnH5C 6352 to_netcdf from subsetted Dataset with strings loaded from char array netCDF can sometimes fail DocOtak 868027 open 0     0 2022-03-14T04:52:38Z 2022-04-09T16:59:52Z   CONTRIBUTOR      

What happened?

Not quite sure what to actually title this, so feel free to edit it.

I have some netCDF files modeled after the Argo _prof file format (CF discrete sampling geometry, incomplete multidimensional array representation). While working on splitting these into individual profiles, I would occasionally get exceptions complaining about broadcasting. I eventually narrowed this down to some string variables we maintain for historical purposes. Depending on the row being split out, the string data in each cell could be shorter, which would result in a stringN dimension with a different N (e.g. string4 = 3 in the CDL). If, while serializing, a different string variable that actually has length 4 is being encoded, it would reuse the now-incorrect string4 dim name.

The above situation seems to only occur when a netCDF file is read back into xarray and the char_dim_name encoding key is set.

What did you expect to happen?

Successful serialization to netCDF.

Minimal Complete Verifiable Example

```Python
# setup
import numpy as np
import xarray as xr

one_two = xr.DataArray(np.array(["a", "aa"], dtype="object"), dims=["dim0"])
two_two = xr.DataArray(np.array(["aa", "aa"], dtype="object"), dims=["dim0"])
ds = xr.Dataset({"var0": one_two, "var1": two_two})
ds.var0.encoding["dtype"] = "S1"
ds.var1.encoding["dtype"] = "S1"

# need to write out and read back in
ds.to_netcdf("test.nc")

# only selecting the shorter string will fail
ds1 = xr.load_dataset("test.nc")
ds1[{"dim0": 1}].to_netcdf("ok.nc")
ds1[{"dim0": 0}].to_netcdf("error.nc")

# will work if the char dim name is removed from encoding of the now shorter arr
ds1 = xr.load_dataset("test.nc")
del ds1.var0.encoding["char_dim_name"]
ds1[{"dim0": 0}].to_netcdf("will_work.nc")
```

Relevant log output

```Python

IndexError Traceback (most recent call last) /var/folders/y1/63dlf4614h5d2cgr5g1t_5lh0000gn/T/ipykernel_64155/447008818.py in <module> 2 ds1 = xr.load_dataset("test.nc") 3 ds1[{"dim0": 1}].to_netcdf("ok.nc") ----> 4 ds1[{"dim0": 0}].to_netcdf("error.nc")

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 1899 from ..backends.api import to_netcdf 1900 -> 1901 return to_netcdf( 1902 self, 1903 path,

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1070 # TODO: allow this work (setting up the file for writing array data) 1071 # to be parallelized with dask -> 1072 dump_to_store( 1073 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims 1074 )

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims) 1117 variables, attrs = encoder(variables, attrs) 1118 -> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) 1120 1121

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims) 263 self.set_attributes(attributes) 264 self.set_dimensions(variables, unlimited_dims=unlimited_dims) --> 265 self.set_variables( 266 variables, check_encoding_set, writer, unlimited_dims=unlimited_dims 267 )

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims) 305 ) 306 --> 307 writer.add(source, target) 308 309 def set_dimensions(self, variables, unlimited_dims=None):

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in add(self, source, target, region) 154 target[region] = source 155 else: --> 156 target[...] = source 157 158 def sync(self, compute=True):

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/netCDF4_.py in __setitem__(self, key, value) 70 with self.datastore.lock: 71 data = self.get_array(needs_lock=False) ---> 72 data[key] = value 73 if self.datastore.autoclose: 74 self.datastore.close(needs_lock=False)

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__setitem__()

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._put()

IndexError: size of data array does not conform to slice
```

Anything else we need to know?

I've been unable to recreate the specific error I'm getting in a minimal example. However, removing the char_dim_name encoding key does solve this.
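A hedged, more general form of that workaround (an illustration, not from the report; it assumes dropping char_dim_name from every variable and letting the backend recompute it is acceptable):

```Python
# Drop the remembered char dimension name from every variable's encoding
# before writing, so the backend picks names that match the current string widths.
for var in ds1.variables.values():
    var.encoding.pop("char_dim_name", None)
ds1[{"dim0": 0}].to_netcdf("will_work_too.nc")
```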

When digging through the xarray issues, these looked possibly relevant: #2219 #2895

Actual traceback I get with my data:

```python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/y1/63dlf4614h5d2cgr5g1t_5lh0000gn/T/ipykernel_64155/3328648456.py in <module> ----> 1 ds[{"N_PROF": 0}].to_netcdf("test.nc")
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 1899 from ..backends.api import to_netcdf 1900 -> 1901 return to_netcdf( 1902 self, 1903 path,
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1070 # TODO: allow this work (setting up the file for writing array data) 1071 # to be parallelized with dask -> 1072 dump_to_store( 1073 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims 1074 )
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims) 1117 variables, attrs = encoder(variables, attrs) 1118 -> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) 1120 1121
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims) 263 self.set_attributes(attributes) 264 self.set_dimensions(variables, unlimited_dims=unlimited_dims) --> 265 self.set_variables( 266 variables, check_encoding_set, writer, unlimited_dims=unlimited_dims 267 )
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims) 305 ) 306 --> 307 writer.add(source, target) 308 309 def set_dimensions(self, variables, unlimited_dims=None):
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in add(self, source, target, region) 154 target[region] = source 155 else: --> 156 target[...] = source 157 158 def sync(self, compute=True):
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/netCDF4_.py in __setitem__(self, key, value) 70 with self.datastore.lock: 71 data = self.get_array(needs_lock=False) ---> 72 data[key] = value 73 if self.datastore.autoclose: 74 self.datastore.close(needs_lock=False)
src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__setitem__()
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/netCDF4/utils.py in _StartCountStride(elem, shape, dimensions, grp, datashape, put, use_get_vars) 354 fullslice = False 355 if fullslice and datashape and put and not hasunlim: --> 356 datashape = broadcasted_shape(shape, datashape) 357 358 # pad datashape with zeros for dimensions not being sliced (issue #906)
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/netCDF4/utils.py in broadcasted_shape(shp1, shp2) 962 a = as_strided(x, shape=shp1, strides=[0] * len(shp1)) 963 b = as_strided(x, shape=shp2, strides=[0] * len(shp2)) --> 964 return np.broadcast(a, b).shape
ValueError: shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (5,) and arg 1 with shape (6,).
```

Environment

INSTALLED VERSIONS

commit: None python: 3.9.9 (main, Jan 5 2022, 11:21:18) [Clang 13.0.0 (clang-1300.0.29.30)] python-bits: 64 OS: Darwin OS-release: 21.3.0 machine: arm64 processor: arm byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.13.0 libnetcdf: 4.8.1

xarray: 2022.3.0 pandas: 1.3.5 numpy: 1.22.0 scipy: None netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.5.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: 0.18 sparse: None setuptools: 58.1.0 pip: 21.2.4 conda: None pytest: 6.2.5 IPython: 7.31.0 sphinx: 4.4.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6352/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
558455147 MDU6SXNzdWU1NTg0NTUxNDc= 3740 Error during slicing of a dataarray ankitesh97 16163706 closed 0     1 2020-02-01T01:26:55Z 2022-04-09T15:52:32Z 2022-04-09T15:52:31Z NONE      

MCVE Code Sample

```python
# loaded the dataset using
ds = xr.open_mfdataset(in_fns, decode_times=False, decode_cf=False, concat_dim='time')
```

Expected Output

Problem Description

This is my data array (da):

```
<xarray.DataArray 'QAP' (time: 5184, lev: 30, lat: 64, lon: 128)>
dask.array<concatenate, shape=(5184, 30, 64, 128), dtype=float32, chunksize=(48, 30, 64, 128), chunktype=numpy.ndarray>
Coordinates:
  * lev      (lev) float64 3.643 7.595 14.36 ... 957.5 976.3 992.6
  * lon      (lon) float64 0.0 2.812 5.625 8.438 ... 351.6 354.4 357.2
  * lat      (lat) float64 -87.86 -85.1 -82.31 ... 82.31 85.1 87.86
  * time     (time) float64 365.0 365.0 365.0 ... 707.9 708.0 708.0
Attributes:
    units:      kg/kg
    long_name:  Q after physics
```

When I try to slice it via da[1:], it throws an error saying: conflicting sizes for dimension 'time': length 96 on 'this-array' and length 5183 on 'time'

Output of xr.show_versions()

version = 0.14.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3740/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
596606599 MDU6SXNzdWU1OTY2MDY1OTk= 3957 Sort DataArray by data values along one dim zxdawn 30388627 closed 0     10 2020-04-08T14:05:44Z 2022-04-09T15:52:20Z 2022-04-09T15:52:20Z NONE      

.sortby() only supports sorting DataArray by coords values. I'm trying to sort one DataArray (cld) by data values along one dim and sort another DataArray (pair) by the same order.

MCVE Code Sample

```python
import xarray as xr
import numpy as np

x = 4
y = 2
z = 4
data = np.arange(x*y*z).reshape(z, y, x)

# 3d array with coords
cld_1 = xr.DataArray(data, dims=['z', 'y', 'x'], coords={'z': np.arange(z)})

# 2d array without coords
cld_2 = xr.DataArray(np.arange(x*y).reshape(y, x)*1.5+1, dims=['y', 'x'])

# expand 2d to 3d
cld_2 = cld_2.expand_dims(z=[4])

# concat
cld = xr.concat([cld_1, cld_2], dim='z')

# paired array
pair = cld.copy(data=np.arange(x*y*(z+1)).reshape(z+1, y, x))

print(cld)
print(pair)
```

Output

```
<xarray.DataArray (z: 5, y: 2, x: 4)>
array([[[ 0. ,  1. ,  2. ,  3. ],
        [ 4. ,  5. ,  6. ,  7. ]],

       [[ 8. ,  9. , 10. , 11. ],
        [12. , 13. , 14. , 15. ]],

       [[16. , 17. , 18. , 19. ],
        [20. , 21. , 22. , 23. ]],

       [[24. , 25. , 26. , 27. ],
        [28. , 29. , 30. , 31. ]],

       [[ 1. ,  2.5,  4. ,  5.5],
        [ 7. ,  8.5, 10. , 11.5]]])
Coordinates:
  * z        (z) int64 0 1 2 3 4
Dimensions without coordinates: y, x

<xarray.DataArray (z: 5, y: 2, x: 4)>
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]],

       [[16, 17, 18, 19],
        [20, 21, 22, 23]],

       [[24, 25, 26, 27],
        [28, 29, 30, 31]],

       [[32, 33, 34, 35],
        [36, 37, 38, 39]]])
Coordinates:
  * z        (z) int64 0 1 2 3 4
Dimensions without coordinates: y, x
```

Problem Description

I've tried argsort(): cld.argsort(axis=0), but the result is wrong:

```
<xarray.DataArray (z: 5, y: 2, x: 4)>
array([[[0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[4, 4, 4, 4],
        [4, 4, 4, 4]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[2, 2, 2, 2],
        [2, 2, 2, 2]],

       [[3, 3, 3, 3],
        [3, 3, 3, 3]]], dtype=int64)
Coordinates:
  * z        (z) int64 0 1 2 3 4
Dimensions without coordinates: y, x
```
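One possible approach (my sketch using the MCVE variables above; np.take_along_axis is an assumption, not something proposed in the issue): argsort gives the index order along z, and the same indices can reorder both arrays.

```python
import numpy as np

# index order along "z" at every (y, x) position, by the values of cld
order = cld.argsort(axis=0)

# reorder both arrays with the same indices, keeping dims and coords intact
cld_sorted = cld.copy(data=np.take_along_axis(cld.values, order.values, axis=0))
pair_sorted = pair.copy(data=np.take_along_axis(pair.values, order.values, axis=0))
```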

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3957/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
621177286 MDU6SXNzdWU2MjExNzcyODY= 4082 "write to read-only" Error in xarray.open_mfdataset() with opendap datasets EliT1626 65610153 closed 0     26 2020-05-19T18:00:58Z 2022-04-09T15:51:46Z 2022-04-09T15:51:46Z NONE      

Error loading data from a THREDDS server. I can't find any info on what might be causing it based on the error messages themselves.

Code Sample

```
def list_dates(start, end):
    num_days = (end - start).days
    return [start + dt.timedelta(days=x) for x in range(num_days)]

start_date = dt.date(2017, 3, 1)
end_date = dt.date(2017, 3, 31)
date_list = list_dates(start_date, end_date)
window = dt.timedelta(days=5)

url = 'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.0/AVHRR/{0:%Y%m}/avhrr-only-v2.{0:%Y%m%d}.nc'
data = []
cur_date = start_date
for cur_date in date_list:
    date_window = list_dates(cur_date - window, cur_date + window)
    url_list = [url.format(x) for x in date_window]
    window_data = xr.open_mfdataset(url_list).sst
    data.append(window_data.mean('time'))

dataf = xr.concat(data, dim=pd.DatetimeIndex(date_list, name='time'))
```

Expected Output

No error, with dataf containing a data array with the dates listed above.

Error Description

Error 1:

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.0/AVHRR/201703/avhrr-only-v2.20170322.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

Error 2: OSError: [Errno -37] NetCDF: Write to read only: b'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.0/AVHRR/201703/avhrr-only-v2.20170322.nc'

Versions python: 3.7.4 xarray: 0.15.0 pandas: 0.25.1 numpy: 1.16.5 scipy: 1.3.1 netcdf4: 1.5.3

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4082/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
643035732 MDU6SXNzdWU2NDMwMzU3MzI= 4169 "write to read-only" Error in xarray.open_mfdataset() when trying to write to a netcdf file EliT1626 65610153 closed 0     4 2020-06-22T12:35:57Z 2022-04-09T15:50:51Z 2022-04-09T15:50:51Z NONE      

Code Sample

```
xr.set_options(file_cache_maxsize=10)

# Assumes daily increments
def list_dates(start, end):
    num_days = (end - start).days
    return [start + dt.timedelta(days=x) for x in range(num_days)]

def list_dates1(start, end):
    num_days = (end - start).days
    dates = [start + dt.timedelta(days=x) for x in range(num_days)]
    sorted_dates = sorted(dates, key=lambda date: (date.month, date.day))
    grouped_dates = [list(g) for _, g in groupby(sorted_dates, key=lambda date: (date.month, date.day))]
    return grouped_dates

start_date = dt.date(2010, 1, 1)
end_date = dt.date(2019, 12, 31)
date_list = list_dates1(start_date, end_date)
window1 = dt.timedelta(days=5)
window2 = dt.timedelta(days=6)

url = 'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/{0:%Y%m}/oisst-avhrr-v02r01.{0:%Y%m%d}.nc'
end_date2 = dt.date(2010, 1, 2)

sst_mean = []
cur_date = start_date

for cur_date in date_list:
    sst_mean_calc = []
    for i in cur_date:
        date_window = list_dates(i - window1, i + window2)
        url_list_window = [url.format(x) for x in date_window]
        window_data = xr.open_mfdataset(url_list_window).sst
        sst_mean_calc.append(window_data.mean('time'))
    sst_mean.append(xr.concat(sst_mean_calc, dim='time').mean('time'))
    cur_date += cur_date
    if cur_date[0] >= end_date2:
        break
    else:
        continue

sst_mean_climo_test = xr.concat(sst_mean, dim='time')

sst_std = xr.concat(sst_std_calc, dim=pd.DatetimeIndex(date_list, name='time'))

sst_min = xr.concat(sst_min_calc, dim=pd.DatetimeIndex(date_list, name='time'))

sst_max = xr.concat(sst_max_calc, dim=pd.DatetimeIndex(date_list, name='time'))

sst_mean_climo_test.to_netcdf(path='E:/Riskpulse_HD/SST_stuff/sst_mean_climo_test')
```

Explanation of Code

This code (climatology for SSTs) creates a list of dates between the specified start and end dates that contains the same day number for every month through the year span. For example, date_list[0] contains 10 datetime dates that start with 1-1-2010, 1-1-2011...1-1-2019. I then request OISST data from an opendap server and take a centered mean of the date in question (in this case I did it for the first and second of January). In other words, I am opening the files for Dec 27-Jan 6 and averaging all of them together. The final xarray dataset then contains two 'times', which is 10 years worth of data for Jan 1 and Jan 2. I want to then send this to a netcdf file so that I can save it on my local machine and use it to create plots down the road. Hope this makes sense.

Error Messages

```
KeyError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock) 197 try: --> 198 file = self._cache[self._key] 199 except KeyError:

~\Anaconda3\lib\site-packages\xarray\backends\lru_cache.py in getitem(self, key) 52 with self._lock: ---> 53 value = self._cache[key] 54 self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/201801/oisst-avhrr-v02r01.20180106.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

RuntimeError Traceback (most recent call last) <ipython-input-3-f8395dcffb5e> in <module> 1 #xr.set_options(file_cache_maxsize=500) ----> 2 sst_mean_climo_test.to_netcdf(path='E:/Riskpulse_HD/SST_stuff/sst_mean_climo_test')

~\Anaconda3\lib\site-packages\xarray\core\dataarray.py in to_netcdf(self, args, kwargs) 2356 dataset = self.to_dataset() 2357 -> 2358 return dataset.to_netcdf(args, **kwargs) 2359 2360 def to_dict(self, data: bool = True) -> dict:

~\Anaconda3\lib\site-packages\xarray\core\dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 1552 unlimited_dims=unlimited_dims, 1553 compute=compute, -> 1554 invalid_netcdf=invalid_netcdf, 1555 ) 1556

~\Anaconda3\lib\site-packages\xarray\backends\api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1095 return writer, store 1096 -> 1097 writes = writer.sync(compute=compute) 1098 1099 if path_or_file is None:

~\Anaconda3\lib\site-packages\xarray\backends\common.py in sync(self, compute) 202 compute=compute, 203 flush=True, --> 204 regions=self.regions, 205 ) 206 self.sources = []

~\Anaconda3\lib\site-packages\dask\array\core.py in store(sources, targets, lock, regions, compute, return_stored, kwargs) 943 944 if compute: --> 945 result.compute(kwargs) 946 return None 947 else:

~\Anaconda3\lib\site-packages\dask\base.py in compute(self, kwargs) 164 dask.base.compute 165 """ --> 166 (result,) = compute(self, traverse=False, kwargs) 167 return result 168

~\Anaconda3\lib\site-packages\dask\base.py in compute(args, kwargs) 442 postcomputes.append(x.dask_postcompute()) 443 --> 444 results = schedule(dsk, keys, kwargs) 445 return repack([f(r, a) for r, (f, a) in zip(results, postcomputes)]) 446

~\Anaconda3\lib\site-packages\dask\threaded.py in get(dsk, result, cache, num_workers, pool, kwargs) 82 get_id=_thread_get_id, 83 pack_exception=pack_exception, ---> 84 kwargs 85 ) 86

~\Anaconda3\lib\site-packages\dask\local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs) 484 _execute_task(task, data) # Re-execute locally 485 else: --> 486 raise_exception(exc, tb) 487 res, worker_id = loads(res_info) 488 state["cache"][key] = res

~\Anaconda3\lib\site-packages\dask\local.py in reraise(exc, tb) 314 if exc.traceback is not tb: 315 raise exc.with_traceback(tb) --> 316 raise exc 317 318

~\Anaconda3\lib\site-packages\dask\local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception) 220 try: 221 task, data = loads(task_info) --> 222 result = _execute_task(task, data) 223 id = get_id() 224 result = dumps((result, id))

~\Anaconda3\lib\site-packages\dask\core.py in _execute_task(arg, cache, dsk) 119 # temporaries by their reference count and can execute certain 120 # operations in-place. --> 121 return func(*(_execute_task(a, cache) for a in args)) 122 elif not ishashable(arg): 123 return arg

~\Anaconda3\lib\site-packages\dask\array\core.py in getter(a, b, asarray, lock) 98 c = a[b] 99 if asarray: --> 100 c = np.asarray(c) 101 finally: 102 if lock:

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 489 490 def array(self, dtype=None): --> 491 return np.asarray(self.array, dtype=dtype) 492 493 def getitem(self, key):

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 651 652 def array(self, dtype=None): --> 653 return np.asarray(self.array, dtype=dtype) 654 655 def getitem(self, key):

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 555 def array(self, dtype=None): 556 array = as_indexable(self.array) --> 557 return np.asarray(array[self.key], dtype=None) 558 559 def transpose(self, order):

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\coding\variables.py in array(self, dtype) 70 71 def array(self, dtype=None): ---> 72 return self.func(self.array) 73 74 def repr(self):

~\Anaconda3\lib\site-packages\xarray\coding\variables.py in _scale_offset_decoding(data, scale_factor, add_offset, dtype) 216 217 def _scale_offset_decoding(data, scale_factor, add_offset, dtype): --> 218 data = np.array(data, dtype=dtype, copy=True) 219 if scale_factor is not None: 220 data *= scale_factor

~\Anaconda3\lib\site-packages\xarray\coding\variables.py in array(self, dtype) 70 71 def array(self, dtype=None): ---> 72 return self.func(self.array) 73 74 def repr(self):

~\Anaconda3\lib\site-packages\xarray\coding\variables.py in _apply_mask(data, encoded_fill_values, decoded_fill_value, dtype) 136 ) -> np.ndarray: 137 """Mask all matching values in a NumPy arrays.""" --> 138 data = np.asarray(data, dtype=dtype) 139 condition = False 140 for fv in encoded_fill_values:

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 555 def array(self, dtype=None): 556 array = as_indexable(self.array) --> 557 return np.asarray(array[self.key], dtype=None) 558 559 def transpose(self, order):

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in getitem(self, key) 71 def getitem(self, key): 72 return indexing.explicit_indexing_adapter( ---> 73 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem 74 ) 75

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method) 835 """ 836 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support) --> 837 result = raw_indexing_method(raw_key.tuple) 838 if numpy_indices.tuple: 839 # index the loaded np.ndarray

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in _getitem(self, key) 82 try: 83 with self.datastore.lock: ---> 84 original_array = self.get_array(needs_lock=False) 85 array = getitem(original_array, key) 86 except IndexError:

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in get_array(self, needs_lock) 61 62 def get_array(self, needs_lock=True): ---> 63 ds = self.datastore._acquire(needs_lock) 64 variable = ds.variables[self.variable_name] 65 variable.set_auto_maskandscale(False)

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in _acquire(self, needs_lock) 359 360 def _acquire(self, needs_lock=True): --> 361 with self._manager.acquire_context(needs_lock) as root: 362 ds = _nc4_require_group(root, self._group, self._mode) 363 return ds

~\Anaconda3\lib\contextlib.py in enter(self) 110 del self.args, self.kwds, self.func 111 try: --> 112 return next(self.gen) 113 except StopIteration: 114 raise RuntimeError("generator didn't yield") from None

~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in acquire_context(self, needs_lock) 184 def acquire_context(self, needs_lock=True): 185 """Context manager for acquiring a file.""" --> 186 file, cached = self._acquire_with_cache_info(needs_lock) 187 try: 188 yield file

~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock) 206 # ensure file doesn't get overriden when opened again 207 self._mode = "a" --> 208 self._cache[self._key] = file 209 return file, False 210 else:

~\Anaconda3\lib\site-packages\xarray\backends\lru_cache.py in setitem(self, key, value) 71 elif self._maxsize: 72 # make room if necessary ---> 73 self._enforce_size_limit(self._maxsize - 1) 74 self._cache[key] = value 75 elif self._on_evict is not None:

~\Anaconda3\lib\site-packages\xarray\backends\lru_cache.py in _enforce_size_limit(self, capacity) 61 key, value = self._cache.popitem(last=False) 62 if self._on_evict is not None: ---> 63 self._on_evict(key, value) 64 65 def setitem(self, key: K, value: V) -> None:

~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in <lambda>(k, v) 12 # Global cache for storing open files. 13 FILE_CACHE: LRUCache[str, io.IOBase] = LRUCache( ---> 14 maxsize=cast(int, OPTIONS["file_cache_maxsize"]), on_evict=lambda k, v: v.close() 15 ) 16 assert FILE_CACHE.maxsize, "file cache must be at least size one"

netCDF4_netCDF4.pyx in netCDF4._netCDF4.Dataset.close()

netCDF4_netCDF4.pyx in netCDF4._netCDF4.Dataset._close()

netCDF4_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: HDF error
```

I also tried setting xr.set_options(file_cache_maxsize=500) outside of the loop before trying to create the netcdf file and received this error:

```
KeyError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock) 197 try: --> 198 file = self._cache[self._key] 199 except KeyError:

~\Anaconda3\lib\site-packages\xarray\backends\lru_cache.py in getitem(self, key) 52 with self._lock: ---> 53 value = self._cache[key] 54 self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/201512/oisst-avhrr-v02r01.20151231.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last) <ipython-input-4-474cdce51e60> in <module> 1 xr.set_options(file_cache_maxsize=500) ----> 2 sst_mean_climo_test.to_netcdf(path='E:/Riskpulse_HD/SST_stuff/sst_mean_climo_test')

~\Anaconda3\lib\site-packages\xarray\core\dataarray.py in to_netcdf(self, args, kwargs) 2356 dataset = self.to_dataset() 2357 -> 2358 return dataset.to_netcdf(args, **kwargs) 2359 2360 def to_dict(self, data: bool = True) -> dict:

~\Anaconda3\lib\site-packages\xarray\core\dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 1552 unlimited_dims=unlimited_dims, 1553 compute=compute, -> 1554 invalid_netcdf=invalid_netcdf, 1555 ) 1556

~\Anaconda3\lib\site-packages\xarray\backends\api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1095 return writer, store 1096 -> 1097 writes = writer.sync(compute=compute) 1098 1099 if path_or_file is None:

~\Anaconda3\lib\site-packages\xarray\backends\common.py in sync(self, compute) 202 compute=compute, 203 flush=True, --> 204 regions=self.regions, 205 ) 206 self.sources = []

~\Anaconda3\lib\site-packages\dask\array\core.py in store(sources, targets, lock, regions, compute, return_stored, kwargs) 943 944 if compute: --> 945 result.compute(kwargs) 946 return None 947 else:

~\Anaconda3\lib\site-packages\dask\base.py in compute(self, kwargs) 164 dask.base.compute 165 """ --> 166 (result,) = compute(self, traverse=False, kwargs) 167 return result 168

~\Anaconda3\lib\site-packages\dask\base.py in compute(args, kwargs) 442 postcomputes.append(x.dask_postcompute()) 443 --> 444 results = schedule(dsk, keys, kwargs) 445 return repack([f(r, a) for r, (f, a) in zip(results, postcomputes)]) 446

~\Anaconda3\lib\site-packages\dask\threaded.py in get(dsk, result, cache, num_workers, pool, kwargs) 82 get_id=_thread_get_id, 83 pack_exception=pack_exception, ---> 84 kwargs 85 ) 86

~\Anaconda3\lib\site-packages\dask\local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs) 484 _execute_task(task, data) # Re-execute locally 485 else: --> 486 raise_exception(exc, tb) 487 res, worker_id = loads(res_info) 488 state["cache"][key] = res

~\Anaconda3\lib\site-packages\dask\local.py in reraise(exc, tb) 314 if exc.traceback is not tb: 315 raise exc.with_traceback(tb) --> 316 raise exc 317 318

~\Anaconda3\lib\site-packages\dask\local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception) 220 try: 221 task, data = loads(task_info) --> 222 result = _execute_task(task, data) 223 id = get_id() 224 result = dumps((result, id))

~\Anaconda3\lib\site-packages\dask\core.py in _execute_task(arg, cache, dsk) 119 # temporaries by their reference count and can execute certain 120 # operations in-place. --> 121 return func(*(_execute_task(a, cache) for a in args)) 122 elif not ishashable(arg): 123 return arg

~\Anaconda3\lib\site-packages\dask\array\core.py in getter(a, b, asarray, lock) 98 c = a[b] 99 if asarray: --> 100 c = np.asarray(c) 101 finally: 102 if lock:

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 489 490 def array(self, dtype=None): --> 491 return np.asarray(self.array, dtype=dtype) 492 493 def getitem(self, key):

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 651 652 def array(self, dtype=None): --> 653 return np.asarray(self.array, dtype=dtype) 654 655 def getitem(self, key):

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 555 def array(self, dtype=None): 556 array = as_indexable(self.array) --> 557 return np.asarray(array[self.key], dtype=None) 558 559 def transpose(self, order):

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\coding\variables.py in array(self, dtype) 70 71 def array(self, dtype=None): ---> 72 return self.func(self.array) 73 74 def repr(self):

~\Anaconda3\lib\site-packages\xarray\coding\variables.py in _scale_offset_decoding(data, scale_factor, add_offset, dtype) 216 217 def _scale_offset_decoding(data, scale_factor, add_offset, dtype): --> 218 data = np.array(data, dtype=dtype, copy=True) 219 if scale_factor is not None: 220 data *= scale_factor

~\Anaconda3\lib\site-packages\xarray\coding\variables.py in array(self, dtype) 70 71 def array(self, dtype=None): ---> 72 return self.func(self.array) 73 74 def repr(self):

~\Anaconda3\lib\site-packages\xarray\coding\variables.py in _apply_mask(data, encoded_fill_values, decoded_fill_value, dtype) 136 ) -> np.ndarray: 137 """Mask all matching values in a NumPy arrays.""" --> 138 data = np.asarray(data, dtype=dtype) 139 condition = False 140 for fv in encoded_fill_values:

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 555 def array(self, dtype=None): 556 array = as_indexable(self.array) --> 557 return np.asarray(array[self.key], dtype=None) 558 559 def transpose(self, order):

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in getitem(self, key) 71 def getitem(self, key): 72 return indexing.explicit_indexing_adapter( ---> 73 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem 74 ) 75

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method) 835 """ 836 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support) --> 837 result = raw_indexing_method(raw_key.tuple) 838 if numpy_indices.tuple: 839 # index the loaded np.ndarray

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in _getitem(self, key) 82 try: 83 with self.datastore.lock: ---> 84 original_array = self.get_array(needs_lock=False) 85 array = getitem(original_array, key) 86 except IndexError:

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in get_array(self, needs_lock) 61 62 def get_array(self, needs_lock=True): ---> 63 ds = self.datastore._acquire(needs_lock) 64 variable = ds.variables[self.variable_name] 65 variable.set_auto_maskandscale(False)

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in _acquire(self, needs_lock) 359 360 def _acquire(self, needs_lock=True): --> 361 with self._manager.acquire_context(needs_lock) as root: 362 ds = _nc4_require_group(root, self._group, self._mode) 363 return ds

~\Anaconda3\lib\contextlib.py in enter(self) 110 del self.args, self.kwds, self.func 111 try: --> 112 return next(self.gen) 113 except StopIteration: 114 raise RuntimeError("generator didn't yield") from None

~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in acquire_context(self, needs_lock) 184 def acquire_context(self, needs_lock=True): 185 """Context manager for acquiring a file.""" --> 186 file, cached = self._acquire_with_cache_info(needs_lock) 187 try: 188 yield file

~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock) 202 kwargs = kwargs.copy() 203 kwargs["mode"] = self._mode --> 204 file = self._opener(self._args, *kwargs) 205 if self._mode == "w": 206 # ensure file doesn't get overriden when opened again

netCDF4_netCDF4.pyx in netCDF4._netCDF4.Dataset.init()

netCDF4_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -37] NetCDF: Write to read only: b'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/201512/oisst-avhrr-v02r01.20151231.nc'
```

I believe these errors have something to do with a post I created a couple of weeks ago (https://github.com/pydata/xarray/issues/4082).

I'm not sure if you can @-mention users on here, but @rsignell-usgs found out something about the caching beforehand. It seems that this is some sort of Windows issue.

Versions python: 3.7.4 xarray: 0.15.1 pandas: 1.0.3 numpy: 1.18.1 scipy: 1.4.1 netcdf4: 1.4.2

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4169/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
924002003 MDU6SXNzdWU5MjQwMDIwMDM= 5483 Cannot interpolate on a multifile .grib array. Single file works fine. Alexander-Serov 22743277 closed 0     1 2021-06-17T14:36:57Z 2022-04-09T15:50:24Z 2022-04-09T15:50:23Z NONE      

What happened: I have multiple .grib files that I can successfully open using the xr.open_mfdataset() function and the cfgrib engine. However, I cannot interpolate the opened array due to a NotImplementedError from the dask package: apparently the interpolation internally requires some slicing that is not implemented yet. The latitude and longitude are well within the stored grid. The interpolation works just fine if I open a single file using xr.load_dataset('file.grb', engine='cfgrib'). Since the files are too big, I cannot just load the array completely or resave it into a single file. So I was wondering whether you might have ideas for a workaround that would let me get to the values I need, until it's implemented in dask. Basically, I just need to extract (interpolate) all variables at a handful of locations.

What you expected to happen: Interpolate the multifile grib array along latitude and longitude.

Minimal Complete Verifiable Example:

```python
dsmf = xr.open_mfdataset(glob('<root_path>/**/*.grb', recursive=True), engine='cfgrib',
                         parallel=True, combine='nested', concat_dim='time')
dsmf.interp(latitude=48, longitude=12)
```

Result:

```python
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\xarray\core\dataset.py", line 2989, in interp
    obj = self if assume_sorted else self.sortby([k for k in coords])
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\xarray\core\dataset.py", line 5920, in sortby
    return aligned_self.isel(**indices)
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\xarray\core\dataset.py", line 2230, in isel
    var_value = var_value.isel(var_indexers)
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\xarray\core\variable.py", line 1135, in isel
    return self[key]
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\xarray\core\variable.py", line 780, in __getitem__
    data = as_indexable(self._data)[indexer]
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\xarray\core\indexing.py", line 1312, in __getitem__
    return array[key]
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\dask\array\core.py", line 1749, in __getitem__
    dsk, chunks = slice_array(out, self.name, self.chunks, index2, self.itemsize)
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\dask\array\slicing.py", line 170, in slice_array
    dsk_out, bd_out = slice_with_newaxes(out_name, in_name, blockdims, index, itemsize)
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\dask\array\slicing.py", line 192, in slice_with_newaxes
    dsk, blockdims2 = slice_wrap_lists(out_name, in_name, blockdims, index2, itemsize)
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\dask\array\slicing.py", line 238, in slice_wrap_lists
    raise NotImplementedError("Don't yet support nd fancy indexing")
NotImplementedError: Don't yet support nd fancy indexing
```

Anything else we need to know?: Since the files are too big, I am unable to share them for the moment, but I suspect the issue is reproducible with any multifile grib combination.
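A hedged workaround sketch (my suggestion, not from the issue; the slice bounds and the descending-latitude assumption are illustrative):

```python
# Load only a small window around the target point into memory, then
# interpolate the in-memory data; this sidesteps dask's nd fancy indexing.
# GRIB latitudes are often stored descending, hence the reversed slice.
window = dsmf.sel(latitude=slice(49, 47), longitude=slice(11, 13)).load()
point = window.interp(latitude=48, longitude=12)
```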

Environment:

INSTALLED VERSIONS

commit: None python: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 06:25:23) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 158 Stepping 13, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: ('English_United Kingdom', '1252') libhdf5: 1.10.6 libnetcdf: 4.7.3 xarray: 0.18.2 pandas: 1.2.4 numpy: 1.20.3 scipy: 1.6.3 netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.8.3 cftime: 1.5.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: 0.9.9.0 iris: None bottleneck: None dask: 2021.06.0 distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20210108 pip: 21.1.2 conda: None pytest: 6.2.4 IPython: None sphinx: None

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5483/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
878481461 MDU6SXNzdWU4Nzg0ODE0NjE= 5276 open_mfdataset: Not a valid ID minhhg 11815787 closed 0     4 2021-05-07T05:34:02Z 2022-04-09T15:49:50Z 2022-04-09T15:49:50Z NONE      

I have about 601 NETCDF4 files saved using xarray. We try to use open_mfdataset to access these files. The main code calls this function many times. The first few calls work fine, but after a while it throws the following error message: "RuntimeError: NetCDF: Not a valid ID".

```python
def func(xpath, spec):
    doc = deepcopy(spec)
    with xr.open_mfdataset(xpath + "/*.nc", concat_dim='maturity') as data:
        var_name = list(data.data_vars)[0]
        ar = data[var_name]
        maturity = spec['maturity']
        ann = ar.cumsum(dim='maturity')
        ann = ann - 1
        ar1 = ann.sel(maturity=maturity)
        doc['data'] = ar1.load().values
    return doc
```
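A hedged workaround sketch (my assumption, not from the report): with ~601 files opened repeatedly, the default LRU file cache can evict and close handles that dask still references, so raising the cache size may avoid the invalid-ID error.

```python
import xarray as xr

# Allow more netCDF file handles to stay open at once (default is 128),
# so handles are not closed out from under lazily referenced arrays.
xr.set_options(file_cache_maxsize=1024)
```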

Environment:

Output of xr.show_versions():

INSTALLED VERSIONS ------------------ commit: None python: 3.6.8.final.0 python-bits: 64 OS: Linux OS-release: 5.4.0-1047-aws machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.11.0 pandas: 0.24.1 numpy: 1.15.4 scipy: 1.2.0 netCDF4: 1.4.2 h5netcdf: None h5py: 2.9.0 Nio: None zarr: None cftime: 1.0.3.4 PseudonetCDF: None rasterio: None iris: None bottleneck: 1.2.1 cyordereddict: None dask: 1.1.1 distributed: 1.25.3 matplotlib: 3.0.2 cartopy: None seaborn: 0.9.0 setuptools: 40.7.3 pip: 19.0.1 conda: None pytest: 4.2.0 IPython: 7.1.1 sphinx: 1.8.4

This error also happens with xarray version 0.10.9

Error trace:

```python 2021-05-05 09:28:19,911, DEBUG 7621, sim_io.py:483 - load_unique_document(), xpa th=/home/ubuntu/runs/20210331_001/nominal_dfs/uk 2021-05-05 09:28:42,774, ERROR 7621, run_gov_ret.py:33 - <module>(), Unknown error=NetCDF: Not a valid ID Traceback (most recent call last): File "/home/ubuntu/dev/py36/python/ev/model/api3/run_gov_ret.py", line 31, in <module> res = govRet() File "/home/ubuntu/dev/py36/python/ev/model/api3/returns.py", line 56, in __ca ll__ decompose=self.decompose)) File "/home/ubuntu/dev/py36/python/ev/model/returns/returnsGenerator.py", line 70, in calc_returns dfs_data = self.mongo_dfs.get_data(mats=[1,mat,mat-1]) File "/home/ubuntu/dev/py36/python/ev/model/api3/dfs.py", line 262, in get_dat a record = self.mdb.load_unique_document(self.dfs_collection_name, spec) File "/home/ubuntu/dev/py36/python/ev/model/api3/sim_io.py", line 1109, in load_unique_document return self.collections[collection].load_unique_document(query, *args, **kwargs) File "/home/ubuntu/dev/py36/python/ev/model/api3/sim_io.py", line 501, in load_unique_document doc['data'] = ar1.load().values File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/dataarray.py", line 631, in load ds = self._to_temp_dataset().load(**kwargs) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/dataset.py", line 494, in load evaluated_data = da.compute(*lazy_data.values(), **kwargs) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/base.py", line 398, in compute results = schedule(dsk, keys, **kwargs) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/threaded.py", line 76, in get pack_exception=pack_exception, **kwargs) pack_exception=pack_exception, **kwargs) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/local .py", line 459, in get_async raise_exception(exc, tb) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/compa tibility.py", line 112, in reraise raise exc File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/local .py", line 230, in execute_task result = _execute_task(task, data) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/core. 
py", line 119, in _execute_task return func(*args2) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/array /core.py", line 82, in getter c = np.asarray(c) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/numpy/core /numeric.py", line 501, in asarray return array(a, dtype, copy=False, order=order) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/cor e/indexing.py", line 602, in __array__ return np.asarray(self.array, dtype=dtype) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/numpy/core/numeric.py", line 501, in asarray return array(a, dtype, copy=False, order=order) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/indexing.py", line 508, in __array__ return np.asarray(array[self.key], dtype=None) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 64, in __getitem__ self._getitem) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/indexing.py", line 776, in explicit_indexing_adapter result = raw_indexing_method(raw_key.tuple) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 76, in _getitem array = getitem(original_array, key) File "netCDF4/_netCDF4.pyx", line 4095, in netCDF4._netCDF4.Variable.__getitem__ File "netCDF4/_netCDF4.pyx", line 3798, in netCDF4._netCDF4.Variable.shape.__get__ File "netCDF4/_netCDF4.pyx", line 3746, in netCDF4._netCDF4.Variable._getdims File "netCDF4/_netCDF4.pyx", line 1754, in netCDF4._netCDF4._ensure_nc_success RuntimeError: NetCDF: Not a valid ID ```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5276/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
427644858 MDU6SXNzdWU0Mjc2NDQ4NTg= 2861 WHERE function, problems with memory operations? rpnaut 30219501 closed 0     8 2019-04-01T11:09:11Z 2022-04-09T15:41:51Z 2022-04-09T15:41:51Z NONE      

I am facing a problem with the where functionality in xarray. I have two datasets:

```
ref =
array([[14.82, 14.94,   nan, ..., 16.21, 16.24,   nan],
       [14.52, 14.97,   nan, ..., 16.32, 16.34,   nan],
       [15.72, 16.09,   nan, ..., 17.38, 17.44,   nan],
       ...,
       [ 6.55,  6.34,   nan, ...,  6.67,  6.6 ,   nan],
       [ 8.76,  9.12,   nan, ...,  9.07,  9.52,   nan],
       [ 8.15,  8.97,   nan, ...,  9.65,  9.52,   nan]], dtype=float32)
Coordinates:
  * height_WSS  (height_WSS) float32 40.3 50.3 60.3 70.3 80.3 90.3 101.2 105.0
    lat         float32 54.01472
    lon         float32 6.5875
  * time        (time) datetime64[ns] 2006-10-31T00:10:00 ... 2006-11-03T23:10:00
Attributes:
    standard_name:  wind_speed
    long_name:      wind speed
    units:          m s-1
    cell_methods:   time: mean
    comment:        direction of the boom holding the measurement devices: 41...
    sensor:         cup anemometer
    sensor_type:    Vector Instruments Windspeed Ltd. A100LK/PC3/WR
    accuracy:       0.1 m s-1
```

and

```
proof = <xarray.DataArray 'WSS' (time: 96, height_WSS: 8)>
array([[13.395692, 13.653825, 13.911958, ..., 14.511758, 14.703774, 14.770716],
       [14.740592, 15.010887, 15.281183, ..., 15.866542, 16.045753, 16.10823 ],
       [15.241853, 15.523318, 15.804785, ..., 16.417458, 16.605673, 16.67129 ],
       ...,
       [ 8.254081,  8.309716,  8.365352, ...,  8.46401 ,  8.489728,  8.498694],
       [ 9.83241 ,  9.895019,  9.957627, ..., 10.055538, 10.077768, 10.085519],
       [ 8.772054,  8.849378,  8.926702, ...,  9.065577,  9.102219,  9.114992]], dtype=float32)
Coordinates:
  * time        (time) datetime64[ns] 2006-10-31T00:10:00 ... 2006-11-03T23:10:00
    lon         float32 6.5875
    lat         float32 54.01472
  * height_WSS  (height_WSS) float32 40.3 50.3 60.3 70.3 80.3 90.3 101.2 105.0
Attributes:
    standard_name:  wind_speed
    long_name:      wind speed
    units:          m s-1
```

Applying something like this: DSproof = proof["WSS"].where(ref["WSS"].notnull()).to_dataset(name="WSS")

gives me a dataset of time length zero:

```
<xarray.Dataset>
Dimensions:     (height_WSS: 8, time: 0)
Coordinates:
  * time        (time) datetime64[ns]
    lon         float32 6.5875
    lat         float32 54.01472
  * height_WSS  (height_WSS) float32 40.3 50.3 60.3 70.3 80.3 90.3 101.2 105.0
Data variables:
    WSS         (time, height_WSS) float32
```

Problem description

The problem seems to be that 'ref' and 'proof' are somehow not entirely consistent regarding coordinates. But if I subtract the coordinates from each other, I do not get a difference. However, since I always fight with getting datasets consistent with each other for mathematical calculations in xarray, I have figured out the following workarounds:

  1. One can drop the coordinates lon and lat from both datasets. Then everything works fine with 'where'.
  2. I am using WHERE in a large script with some operations done before WHERE is called. One operation is to make the data types and the coordinate names between 'ref' and 'proof' consistent (that's why the above output is very similar). If I save the files and reload them immediately before applying WHERE, it fixes my problem.
  3. Using a selection of all height levels, proof["WSS"].isel(height=slice(0,9)).where(ref["WSS"].isel(height=slice(0,9)).notnull()).to_dataset(name="WSS") also fixes my problem.

Maybe I am dealing with a problem of incomplete operations in memory? The printouts of the datasets look consistent, but perhaps an additional operation on the datasets is required to make them consistent in memory?
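A minimal sketch of workaround 1 above (an illustration, assuming ref and proof are Datasets with a 'WSS' variable and that the current drop_vars API is available):

```python
# Drop the scalar lon/lat coordinates from both datasets before masking,
# so alignment cannot silently discard all time steps.
ref_nc = ref.drop_vars(["lon", "lat"])
proof_nc = proof.drop_vars(["lon", "lat"])
DSproof = proof_nc["WSS"].where(ref_nc["WSS"].notnull()).to_dataset(name="WSS")
```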

Thanks in advance for your help

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2861/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
469440752 MDU6SXNzdWU0Njk0NDA3NTI= 3139 Change the signature of DataArray to DataArray(data, dims, coords, ...)? shoyer 1217238 open 0     1 2019-07-17T20:54:57Z 2022-04-09T15:28:51Z   MEMBER      

Currently, the signature of DataArray is DataArray(data, coords, dims, ...): http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html

In the long term, I think DataArray(data, dims, coords, ...) would be more intuitive: dimensions are a more fundamental part of xarray's data model than coordinates. Certainly I find it much more common to omit coords than to omit dims when I create a DataArray.

My original reasoning for this argument order was that dims could be copied from coords, e.g., DataArray(new_data, old_dataarray.coords), and it was nice to be able to pass this sole argument by position instead of by name. But a cleaner way to write this now is old_dataarray.copy(data=new_data).

The challenge in making any change here would be to have a smooth deprecation process, and that ideally avoids requiring users to rewrite all of their code and avoids loads of pointless/extraneous warnings. I'm not entirely sure this is possible. We could likely use heuristics to distinguish between dims and coords arguments regardless of their order, but this probably isn't something we would want to preserve in the long term.

An alternative that might achieve some of the convenience of this change would be to allow passing lists of strings in the coords argument by position, which would be interpreted as dimensions, e.g., DataArray(data, ['x', 'y']). The downside of this alternative is that it would add even more special cases to the DataArray constructor, which would make it harder to understand.
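For illustration (my example, not from the issue), the difference between the two argument orders:

```python
import numpy as np
import xarray as xr

data = np.zeros((2, 3))

# Current signature DataArray(data, coords, dims, ...): omitting coords
# means dims has to be passed by keyword.
da = xr.DataArray(data, dims=("x", "y"))

# Under the proposed DataArray(data, dims, coords, ...) order, the same
# array could be built positionally, e.g. xr.DataArray(data, ("x", "y")).
```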

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3139/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
208312826 MDU6SXNzdWUyMDgzMTI4MjY= 1273 replace a dim with a coordinate from another dataset rabernat 1197350 open 0     4 2017-02-17T02:15:36Z 2022-04-09T15:26:20Z   MEMBER      

I often want a function that takes a dataarray / dataset and replaces a dimension with a coordinate from a different dataset.

@shoyer proposed the following simple solution.

```python
def replace_dim(da, olddim, newdim):
    renamed = da.rename({olddim: newdim.name})

    # note that alignment along a dimension is skipped when you are overriding
    # the relevant coordinate values
    renamed.coords[newdim.name] = newdim
    return renamed
```
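A quick usage sketch of the helper above (coordinate names here are hypothetical, purely for illustration):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(3), dims="x", coords={"x": [0, 1, 2]})
depth = xr.DataArray([10.0, 20.0, 30.0], dims="depth", name="depth")

replaced = replace_dim(da, "x", depth)
print(replaced.dims)  # ('depth',)
```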

Is this of broad enough interest to add a built-in method for?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1273/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
995207525 MDU6SXNzdWU5OTUyMDc1MjU= 5790 combining 2 arrays with xr.merge() causes temporary spike in memory usage ~3x the combined size of the arrays zachglee 23262800 closed 0     6 2021-09-13T18:42:03Z 2022-04-09T15:25:28Z 2022-04-09T15:25:28Z NONE      

What happened: When attempting to combine two arrays of sizes b1 and b2 bytes: xr.merge([da1, da2]), I observe that memory usage temporarily increases by about ~3*(b1+b2) bytes. Once the operation finishes, the memory usage has a net increase of (b1+b2) bytes, which is what I would expect, since that's the size of the merged array I just created. What I did not expect was the temporary increase of ~3*(b1+b2) bytes.

For small arrays this temporary spike in memory is fine, but for larger arrays this means we are essentially limited to combining arrays of total size below 1/3rd of an instance's memory limit. Anything above that and the temporary spike causes the instance to crash.

What you expected to happen: I expected there to be only a memory increase of b1+b2 bytes, the amount needed to store the merged array. I did not expect memory increase to go higher than that during the merge operation.

Minimal Complete Verifiable Example:

```python
# Put your MCVE code here

import numpy as np
import xarray as xr
import tracemalloc

tracemalloc.start()

print("(current, peak) memory at start:")
print(tracemalloc.get_traced_memory())

# create the test data (each is a 100 by 100 by 10 array of random floats)
# Their A and B coordinates are completely matching. Their C coordinates are completely disjoint.
data1 = np.random.rand(100, 100, 10)
da1 = xr.DataArray(
    data1,
    dims=("A", "B", "C"),
    coords={
        "A": [f"A{i}" for i in range(100)],
        "B": [f"B{i}" for i in range(100)],
        "C": [f"C{i}" for i in range(10)]},
)
da1.name = "da"

data2 = np.random.rand(100, 100, 10)
da2 = xr.DataArray(
    data2,
    dims=("A", "B", "C"),
    coords={
        "A": [f"A{i}" for i in range(100)],
        "B": [f"B{i}" for i in range(100)],
        "C": [f"C{i+10}" for i in range(10)]},
)
da2.name = "da"

print("(current, peak) memory after creation of arrays to be combined:")
print(tracemalloc.get_traced_memory())
print(f"da1.nbytes = {da1.nbytes}")
print(f"da2.nbytes = {da2.nbytes}")

da_combined = xr.merge([da1, da2]).to_array()

print("(current, peak) memory after merging. You should observe that the peak memory usage is now much higher.")
print(tracemalloc.get_traced_memory())
print(f"da_combined.nbytes = {da_combined.nbytes}")

print(da_combined)
```

Anything else we need to know?:

Interestingly, when I try merging 3 arrays at once, (sizes b1, b2, b3) I observe temporary memory usage increase of about 5*(b1+b2+b3). I have a hunch that all arrays get aligned to the final merged coordinate space (which is much bigger), before they are combined, which means at some point in the middle of the process we have a bunch of arrays in memory that have been inflated to the size of the final output array.

If that's the case, it seems like it should be possible to make this operation more efficient by creating just one inflated array and adding the data from the input arrays to it in-place? Or is this an expected and unavoidable behavior with merging? (fwiw this also affects several other combination methods, presumably because they use merge() under the hood?)
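A rough way to check that hunch from user code, using the arrays from the example above (this only illustrates the suspected intermediate, it is not how merge is implemented verbatim):

```python
# align both inputs to the union of their coordinates, which is effectively
# what a merge has to do, and compare the inflated sizes to the originals
a1, a2 = xr.align(da1, da2, join="outer")
print(da1.nbytes, a1.nbytes)  # the aligned copy is inflated to the merged shape
```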

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 4.19.121-linuxkit machine: x86_64 processor: byteorder: little LC_ALL: None LANG: None LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.8.0 xarray: 0.17.0 pandas: 1.2.3 numpy: 1.19.5 scipy: 1.6.0 netCDF4: 1.5.6 pydap: None h5netcdf: 0.11.0 h5py: 3.3.0 Nio: None zarr: None cftime: 1.5.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.4.2 cartopy: None seaborn: None numbagg: None pint: 0.16.1 setuptools: 57.4.0 pip: 21.2.4 conda: None pytest: 6.2.2 IPython: 7.23.1 sphinx: 3.5.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5790/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
257070215 MDU6SXNzdWUyNTcwNzAyMTU= 1569 Grouping with multiple levels jjpr-mit 25231875 closed 0     6 2017-09-12T14:46:12Z 2022-04-09T15:25:07Z 2022-04-09T15:25:06Z NONE      

http://xarray.pydata.org/en/stable/groupby.html says:

xarray supports “group by” operations with the same API as pandas

but when I supply the level keyword argument as described at https://pandas.pydata.org/pandas-docs/stable/groupby.html#groupby-with-multiindex, I get:
```
TypeError                                 Traceback (most recent call last)
<ipython-input-12-566fc67c0151> in <module>()
----> 1 hvm_it_v6_obj = hvm_it_v6.groupby(level=["category","obj"]).mean(dim="presentation")
      2 hvm_it_v6_obj

TypeError: groupby() got an unexpected keyword argument 'level'
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1569/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
438947247 MDU6SXNzdWU0Mzg5NDcyNDc= 2933 Stack() & unstack() issues on Multindex ray306 1559890 closed 0     4 2019-04-30T19:47:51Z 2022-04-09T15:23:28Z 2022-04-09T15:23:28Z NONE      

I would like to reshape the DataArray by one level in the Multindex, and I thought the stack()/unstack() should be the solution.

Make a DataArray with a MultiIndex:
```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two'])]
da = pd.DataFrame(np.random.randn(6, 4)).to_xarray().to_array()
da.coords['index'] = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
da
```
```
<xarray.DataArray (variable: 4, index: 6)>
array([[ 0.379189,  1.082292, -2.073478, -0.84626 , -1.529927, -0.837407],
       [-0.267983, -0.2516  , -1.016653, -0.085762, -0.058382, -0.667891],
       [-0.013488, -0.855332, -0.038072, -0.385211, -2.149742, -0.304361],
       [ 1.749561, -0.606031,  1.914146,  1.6292  , -0.515519,  1.996283]])
Coordinates:
  * index     (index) MultiIndex
  - first     (index) object 'bar' 'bar' 'baz' 'baz' 'foo' 'foo'
  - second    (index) object 'one' 'two' 'one' 'two' 'one' 'two'
  * variable  (variable) int32 0 1 2 3
```

Stack problem:

I want one dimension to merge into another one:
```python
da.stack({'index': ['variable']})
```
```
ValueError: cannot create a new dimension with the same name as an existing dimension
```

Unstack problem:

Unstacking by the whole MultiIndex worked:
```python
da.unstack('index')
```
```
<xarray.DataArray (variable: 4, first: 3, second: 2)>
array([[[ 0.379189,  1.082292],
    [-2.073478, -0.84626 ],
    [-1.529927, -0.837407]],

   [[-0.267983, -0.2516  ],
    [-1.016653, -0.085762],
    [-0.058382, -0.667891]],

   [[-0.013488, -0.855332],
    [-0.038072, -0.385211],
    [-2.149742, -0.304361]],

   [[ 1.749561, -0.606031],
    [ 1.914146,  1.6292  ],
    [-0.515519,  1.996283]]])

Coordinates:
  * variable  (variable) int32 0 1 2 3
  * first     (first) object 'bar' 'baz' 'foo'
  * second    (second) object 'one' 'two'
```

But unstacking by a specified level failed:
```python
da.unstack('first')
```
```
ValueError: Dataset does not contain the dimensions: ['first']
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2933/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
816540158 MDU6SXNzdWU4MTY1NDAxNTg= 4958 to_zarr mode='a-', append_dim; if dim value exists raise error ahuang11 15331990 open 0     1 2021-02-25T15:26:02Z 2022-04-09T15:19:28Z   CONTRIBUTOR      

If I have a ds with time, lat, lon and I call the same command twice:
```python
ds.to_zarr('test.zarr', append_dim='time')
ds.to_zarr('test.zarr', append_dim='time')
```
Can it raise an error since all the times already exist?

Kind of like:
```python
import numpy as np
import xarray as xr

ds = xr.tutorial.open_dataset('air_temperature')
ds.to_zarr('test_air.zarr', append_dim='time')
ds_tmp = xr.open_mfdataset('test_air.zarr', engine='zarr')
overlap = np.intersect1d(ds['time'], ds_tmp['time'])
if len(overlap) > 1:
    raise ValueError(f'Found overlapping values in datasets {overlap}')
ds.to_zarr('test_air.zarr', append_dim='time')
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4958/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
707571360 MDU6SXNzdWU3MDc1NzEzNjA= 4452 Change default for concat_characters to False in open_* functions eric-czech 6130352 open 0     2 2020-09-23T18:06:07Z 2022-04-09T03:21:43Z   NONE      

I wanted to propose that concat_characters default to False for open_{dataset,zarr,dataarray}. I'm not sure how often that affects anyone, since working with individual character arrays is probably rare, but it's a particularly bad default in genetics. We often represent individual variations as single characters, and the concatenation is destructive because we can't invert it when one of the characters is an empty string (which often corresponds to a deletion at a base pair location, and the order of the characters matters).

I also find it to be confusing behavior (e.g. https://github.com/pydata/xarray/issues/4405) since no other arrays are automatically transformed like this when deserialized.
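For reference, the opt-out that exists today looks like this (the file name is hypothetical):

```python
import xarray as xr

# keep single-character arrays as-is instead of concatenating them along the last dim
ds = xr.open_dataset("variants.nc", concat_characters=False)
```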

If I submit a PR for this, would anybody object?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4452/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
903922477 MDU6SXNzdWU5MDM5MjI0Nzc= 5386 Add xr.open_dataset("file.tif", engine="rasterio") to docs raybellwaves 17162724 closed 0     1 2021-05-27T15:39:29Z 2022-04-09T03:15:45Z 2022-04-09T03:15:45Z CONTRIBUTOR      

Kind of related to https://github.com/pydata/xarray/issues/4697

I see https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html#rioxarray

shows

ds = xarray.open_dataset("file.tif", engine="rasterio")

This could be added to

https://xarray.pydata.org/en/latest/user-guide/io.html#rasterio

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5386/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
312203596 MDU6SXNzdWUzMTIyMDM1OTY= 2042 Anyone working on a to_tiff? Alternatively, how do you write an xarray to a geotiff? ebo 601025 closed 0     31 2018-04-07T12:43:41Z 2022-04-09T03:14:41Z 2022-04-09T01:19:10Z NONE      

Matthew Rocklin wrote a gist https://gist.github.com/mrocklin/3df315e93d4bdeccf76db93caca2a9bd to demonstrate using XArray to read tiled GeoTIFF datasets, but I am still confused as to how to write them to a GeoTIFF. I can easily create a tiff with "rasterio.open(out, 'w', **src.profile)", but the following does not seem like the best/cleanest way to do this:

```
ds = xr.open_rasterio('myfile.tif', chunks={'band': 1, 'x': 2048, 'y': 2048})
with rasterio.open('myfile.tif', 'r') as src:
    with rasterio.open('new_myfile.tif', 'w', **src.profile) as dst:
        for i in range(1, src.count + 1):
            dst.write(ds.variable.data[i-1].compute(), i)
```
Also, if the profile and tags were propagated through open_rasterio, then the second open would not be necessary and would be generally useful.
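A minimal sketch of how this can be done nowadays with rioxarray (assuming rioxarray is installed; this is a third-party extension, not part of xarray itself):

```python
import rioxarray  # registers the .rio accessor on xarray objects

da = rioxarray.open_rasterio("myfile.tif", chunks={"band": 1, "x": 2048, "y": 2048})
da.rio.to_raster("new_myfile.tif")  # profile/CRS are carried through by the accessor
```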

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2042/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
956259734 MDU6SXNzdWU5NTYyNTk3MzQ= 5649 xr.merge bug? when using combine_attrs='drop_conflicts' jbusecke 14314623 open 0 keewis 14808389   3 2021-07-29T22:47:43Z 2022-04-09T03:14:24Z   CONTRIBUTOR      

What happened: I have recently encountered a situation where combining two datasets failed due to the data type of their attributes. This example illustrates the situation:
```python
ds1 = xr.Dataset(attrs={'a': [5]})
ds2 = xr.Dataset(attrs={'a': 6})

xr.merge([ds1, ds2], combine_attrs='drop_conflicts')
```
This gives me the following error:
```


TypeError Traceback (most recent call last) <ipython-input-12-1c8e82be0882> in <module> 2 ds2 = xr.Dataset(attrs={'a':6}) 3 ----> 4 xr.merge([ds1, ds2], combine_attrs='drop_conflicts')

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge(objects, compat, join, fill_value, combine_attrs) 898 dict_like_objects.append(obj) 899 --> 900 merge_result = merge_core( 901 dict_like_objects, 902 compat,

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value) 654 ) 655 --> 656 attrs = merge_attrs( 657 [var.attrs for var in coerced if isinstance(var, (Dataset, DataArray))], 658 combine_attrs,

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge_attrs(variable_attrs, combine_attrs, context) 544 } 545 ) --> 546 result = { 547 key: value 548 for key, value in result.items()

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in <dictcomp>(.0) 547 key: value 548 for key, value in result.items() --> 549 if key not in attrs or equivalent(attrs[key], value) 550 } 551 dropped_keys |= {key for key in attrs if key not in result}

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/utils.py in equivalent(first, second) 171 return duck_array_ops.array_equiv(first, second) 172 elif isinstance(first, list) or isinstance(second, list): --> 173 return list_equiv(first, second) 174 else: 175 return (

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/utils.py in list_equiv(first, second) 182 def list_equiv(first, second): 183 equiv = True --> 184 if len(first) != len(second): 185 return False 186 else:

TypeError: object of type 'int' has no len()
```
Took me a while to find out what the root cause of this was with a fully populated dataset, since the error is less than obvious.

What you expected to happen: In my understanding this should just drop the attribute a. The example works as expected when both attributes are integers or both are lists containing an integer. The error is only triggered when the types are mixed.

Is there a way to handle this case more elegantly?

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.4.89+ machine: x86_64 processor: x86_64 byteorder: little LC_ALL: C.UTF-8 LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.1.dev8+gda99a566 pandas: 1.2.4 numpy: 1.20.2 scipy: 1.6.2 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.11.0 h5py: 3.2.1 Nio: None zarr: 2.7.1 cftime: 1.4.1 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: 1.2.2 cfgrib: 0.9.9.0 iris: None bottleneck: 1.3.2 dask: 2021.04.1 distributed: 2021.04.1 matplotlib: 3.4.1 cartopy: 0.19.0 seaborn: None numbagg: None pint: 0.17 setuptools: 49.6.0.post20210108 pip: 20.3.4 conda: None pytest: None IPython: 7.22.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5649/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
576502871 MDU6SXNzdWU1NzY1MDI4NzE= 3834 encode_cf_datetime() casts dask arrays to NumPy arrays andersy005 13301940 open 0     2 2020-03-05T20:11:37Z 2022-04-09T03:10:49Z   MEMBER      

Currently, when xarray.coding.times.encode_cf_datetime() is called, it always casts the input to a NumPy array. This is not what I would expect when the input is a dask array. I am wondering if we could make this operation lazy when the input is a dask array?

https://github.com/pydata/xarray/blob/01462d65c7213e5e1cddf36492c6a34a7e53ce55/xarray/coding/times.py#L352-L354

```python In [46]: import numpy as np

In [47]: import xarray as xr

In [48]: import pandas as pd

In [49]: times = pd.date_range("2000-01-01", "2001-01-01", periods=11)

In [50]: time_bounds = np.vstack((times[:-1], times[1:])).T

In [51]: arr = xr.DataArray(time_bounds).chunk()

In [52]: arr
Out[52]:
<xarray.DataArray (dim_0: 10, dim_1: 2)>
dask.array<xarray-<this-array>, shape=(10, 2), dtype=datetime64[ns], chunksize=(10, 2), chunktype=numpy.ndarray>
Dimensions without coordinates: dim_0, dim_1

In [53]: xr.coding.times.encode_cf_datetime(arr)
Out[53]:
(array([[     0,  52704],
        [ 52704, 105408],
        [105408, 158112],
        [158112, 210816],
        [210816, 263520],
        [263520, 316224],
        [316224, 368928],
        [368928, 421632],
        [421632, 474336],
        [474336, 527040]]),
 'minutes since 2000-01-01 00:00:00',
 'proleptic_gregorian')

```
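For illustration, a sketch of what a lazy, per-chunk encoding could look like from user code today, assuming the units and calendar are fixed up front (this is not an xarray-provided API, just a workaround idea):

```python
# encode each dask chunk lazily instead of materializing the whole array;
# units/calendar are assumed known in advance and match the eager result above
units = "minutes since 2000-01-01 00:00:00"
calendar = "proleptic_gregorian"
encoded = arr.data.map_blocks(
    lambda block: xr.coding.times.encode_cf_datetime(block, units, calendar)[0],
    dtype="int64",
)
```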

Cc @jhamman

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3834/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
606165039 MDU6SXNzdWU2MDYxNjUwMzk= 4000 Add hook to get progress of long-running operations cwerner 13906519 closed 0     3 2020-04-24T09:13:02Z 2022-04-09T03:08:45Z 2022-04-09T03:08:45Z NONE      

Hi. I currently work on a large dataframe that I convert to a Xarray dataset. It works, but takes quite some (unknown) amount of time.

MCVE Code Sample

```python
data = pd.DataFrame("huge data frame with time, lat, Lon as multiindex and about 60 data columns")
dsout = xr.Dataset()
dsout = dsout.from_dataframe(data)
```

Expected Output

A progress report/ bar about the operation

Problem Description

It would be nice to have some hook or other functionality to tap into xr.Dataset.from_dataframe() and report a progress status that I could then pass to tqdm or something similar...
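As a rough workaround sketch in the meantime (assuming the tqdm package is available), one can convert column by column so that progress is at least visible:

```python
from tqdm import tqdm
import xarray as xr

# `data` is the pandas DataFrame with the (time, lat, lon) MultiIndex from above;
# converting one column at a time lets tqdm report progress per column
dsout = xr.Dataset({col: data[col].to_xarray() for col in tqdm(data.columns)})
```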

Versions

0.15.1

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4000/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
607718350 MDU6SXNzdWU2MDc3MTgzNTA= 4011 missing empty group when iterate over groupby_bins miniufo 9312831 open 0     4 2020-04-27T17:22:31Z 2022-04-09T03:08:14Z   NONE      

When I try to iterate over the grouped object returned by groupby_bins, I found that empty groups are silently missing. Here is a simple case:
```python
import numpy as np
import xarray as xr

array = xr.DataArray(np.arange(4), dims='dim_0')

# one of these bins will be empty
bins = [0, 4, 5]
grouped = array.groupby_bins('dim_0', bins)

for i, (label, group) in enumerate(grouped):
    print(i, label, group)
```
When a bin contains no samples (the bin (4, 5]), the empty group is dropped. How can I iterate over the full set of bins even when some bins contain nothing? I've read the related issue #1019, but my case needs the correct order in grouped, and empty groups need to be iterated over.
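A possible workaround sketch: build the full list of bins with pandas and look each one up, assuming the default right-closed intervals that groupby_bins produces:

```python
import pandas as pd

intervals = pd.IntervalIndex.from_breaks(bins)  # (0, 4], (4, 5]
groups = dict(grouped)  # empty bins are simply missing from this mapping
for i, interval in enumerate(intervals):
    group = groups.get(interval)
    print(i, interval, group)  # group is None for the empty bin
```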

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4011/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
666896781 MDU6SXNzdWU2NjY4OTY3ODE= 4279 intersphinx looks for implementation modules crusaderky 6213168 open 0     0 2020-07-28T08:55:12Z 2022-04-09T03:03:30Z   MEMBER      

This is a widespread issue caused by the pattern of defining objects in private module and then exposing them to the final user by importing them in the top-level __init__.py, vs. how intersphinx works.

Exact same issue in different projects: - https://github.com/aio-libs/aiohttp/issues/3714 - https://jira.mongodb.org/browse/MOTOR-338 - https://github.com/tkem/cachetools/issues/178 - https://github.com/AmphoraInc/xarray_mongodb/pull/22 - https://github.com/jonathanslenders/asyncio-redis/issues/143

If a project
1. uses xarray, intersphinx, and autodoc
2. subclasses any of the classes exposed by xarray/__init__.py and documents the new class with the :show-inheritance: flag
3. starting from Sphinx 3, has any of the above classes anywhere in a type annotation

Then Sphinx emits a warning and fails to create a hyperlink, because intersphinx uses the __module__ attribute to look up the object in objects.inv, but __module__ points to the implementation module while objects.inv points to the top-level xarray module.

Workaround

In conf.py:

```python
import xarray
xarray.DataArray.__module__ = "xarray"
```

Solution

Put the above hack in xarray/__init__.py

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4279/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
667203487 MDU6SXNzdWU2NjcyMDM0ODc= 4282 Values change when writing combined Dataset loaded with open_mfdataset chpolste 11723107 closed 0     1 2020-07-28T16:20:09Z 2022-04-09T03:00:55Z 2022-04-09T03:00:55Z NONE      

What happened:

Loading two netcdf files with open_mfdataset then writing into a combined file results in some values changed in the file.

What you expected to happen:

That the written file contains the same values as the in-memory Dataset when read again.

Minimal Complete Verifiable Example:

```python
>>> import numpy as np
>>> import xarray as xr
>>> data1 = xr.open_dataset("file1.nc")
>>> data2 = xr.open_dataset("file2.nc")
>>> merged = xr.open_mfdataset(["file1.nc", "file2.nc"])
>>> np.all(np.isclose(merged["u"].values[0], data1["u"].values[0]))
True
>>> np.all(np.isclose(merged["u"].values[-1], data2["u"].values[-1]))
True
>>> merged.to_netcdf("foo.nc")
>>> merged_file = xr.load_dataset("foo.nc")
>>> np.all(np.isclose(merged_file["u"].values, merged["u"].values))
False
```

The files contain wind data from the ERA5 reanalysis, downloaded from CDS.

Anything else we need to know?:

The issue might be related to the scale and offset values of the variable. Continuing the example:

```python
>>> np.all(np.isclose(merged_file["u"].values[0], data1["u"].values[0]))
True
>>> np.all(np.isclose(merged_file["u"].values[-1], data2["u"].values[-1]))
False
```

Data from the first file seems to be correct. When writing the combined dataset, the scale and offset from the first file are written to the combined file:

```python
>>> data1_nomas = xr.open_dataset("file1.nc", mask_and_scale=False)
>>> data2_nomas = xr.open_dataset("file2.nc", mask_and_scale=False)
>>> merged_file_nomas = xr.open_dataset("foo.nc", mask_and_scale=False)
>>> data1_nomas["u"].attrs
{'scale_factor': 0.002397265127278432, 'add_offset': 25.620963232670736, '_FillValue': -32767, 'missing_value': -32767, 'units': 'm s-1', 'long_name': 'U component of wind', 'standard_name': 'eastward_wind'}
>>> data2_nomas["u"].attrs
{'scale_factor': 0.0024358825557859445, 'add_offset': 21.288035293585388, '_FillValue': -32767, 'missing_value': -32767, 'units': 'm s-1', 'long_name': 'U component of wind', 'standard_name': 'eastward_wind'}
>>> merged_file_nomas["u"].attrs
{'scale_factor': 0.002397265127278432, 'add_offset': 25.620963232670736, '_FillValue': -32767, 'units': 'm s**-1', 'long_name': 'U component of wind', 'standard_name': 'eastward_wind', 'missing_value': -32767}
```

Maybe the data from the second file is not adjusted to fit the new scaling and offset.
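One possible workaround sketch: drop the packing inherited from the first file before writing, so the combined file is written unpacked (larger on disk, but lossless relative to the in-memory values):

```python
# remove the scale/offset encoding copied from file1.nc so to_netcdf writes floats
for name, var in merged.data_vars.items():
    var.encoding.pop("scale_factor", None)
    var.encoding.pop("add_offset", None)
merged.to_netcdf("foo.nc")
```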

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 4.15.0-107-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.0 pandas: 1.0.4 numpy: 1.18.5 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: 0.9.8.2 iris: None bottleneck: None dask: 2.18.1 distributed: 2.21.0 matplotlib: 3.2.1 cartopy: 0.18.0 seaborn: None numbagg: None pint: 0.14 setuptools: 49.2.0.post20200712 pip: 20.1.1 conda: 4.8.3 pytest: None IPython: 7.16.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4282/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
673682661 MDU6SXNzdWU2NzM2ODI2NjE= 4313 Using Dependabot to manage doc build and CI versions jthielen 3460034 open 0     4 2020-08-05T16:24:24Z 2022-04-09T02:59:21Z   CONTRIBUTOR      

As brought up on the bi-weekly community developers meeting, it sounds like Pandas v1.1.0 is breaking doc builds on RTD. One solution to the issues of frequent breakages in doc builds and CI due to upstream updates is having fixed version lists for all of these, which are then incrementally updated as new versions come out. @dopplershift has done a lot of great work in MetPy getting such a workflow set up with Dependabot (https://github.com/Unidata/MetPy/pull/1410) among other CI updates, and this could be adapted for use here in xarray.

We've generally been quite happy with our updated CI configuration with Dependabot over the past couple weeks. The only major issue has been https://github.com/Unidata/MetPy/issues/1424 / https://github.com/dependabot/dependabot-core/issues/2198#issuecomment-649726022, which has required some contributors to have to delete and recreate their forks in order for Dependabot to not auto-submit PRs to the forked repos.

Any thoughts that you had here @dopplershift would be appreciated!

xref https://github.com/pydata/xarray/issues/4287, https://github.com/pydata/xarray/pull/4296

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4313/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
685739084 MDU6SXNzdWU2ODU3MzkwODQ= 4375 allow using non-dimension coordinates in polyfit mathause 10194086 open 0     1 2020-08-25T19:40:55Z 2022-04-09T02:58:48Z   MEMBER      

polyfit currently only allows fitting along a dimension, not along a non-dimension coordinate (or a virtual coordinate).

Example:
```python
da = xr.DataArray(
    [1, 3, 2], dims=["x"], coords=dict(x=["a", "b", "c"], y=("x", [0, 1, 2]))
)

print(da)

da.polyfit("y", 1)
```
Output:
```python
<xarray.DataArray (x: 3)>
array([1, 3, 2])
Coordinates:
  * x        (x) <U1 'a' 'b' 'c'
    y        (x) int64 0 1 2

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-80-9bb2dacf50f7> in <module>
      5 print(da)
      6
----> 7 da.polyfit("y", 1)

~/.conda/envs/ipcc_ar6/lib/python3.7/site-packages/xarray/core/dataarray.py in polyfit(self, dim, deg, skipna, rcond, w, full, cov) 3507 """ 3508 return self._to_temp_dataset().polyfit( -> 3509 dim, deg, skipna=skipna, rcond=rcond, w=w, full=full, cov=cov 3510 ) 3511

~/.conda/envs/ipcc_ar6/lib/python3.7/site-packages/xarray/core/dataset.py in polyfit(self, dim, deg, skipna, rcond, w, full, cov) 6005 skipna_da = skipna 6006 -> 6007 x = get_clean_interp_index(self, dim, strict=False) 6008 xname = "{}_".format(self[dim].name) 6009 order = int(deg) + 1

~/.conda/envs/ipcc_ar6/lib/python3.7/site-packages/xarray/core/missing.py in get_clean_interp_index(arr, dim, use_coordinate, strict) 246 247 if use_coordinate is True: --> 248 index = arr.get_index(dim) 249 250 else: # string

~/.conda/envs/ipcc_ar6/lib/python3.7/site-packages/xarray/core/common.py in get_index(self, key) 378 """ 379 if key not in self.dims: --> 380 raise KeyError(key) 381 382 try:

KeyError: 'y'
```

Describe the solution you'd like

Would be nice if that worked.

Describe alternatives you've considered

One could just set the non-dimension coordinate as index, e.g.: da = da.set_index(x="y")
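Roughly like this sketch, where y's values become the index of dimension x before fitting (behaviour as in the xarray version used above):

```python
# promote the non-dimension coordinate to the x index, then fit along x
fit = da.set_index(x="y").polyfit("x", 1)
```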

Additional context

Allowing this may be as easy as replacing

https://github.com/pydata/xarray/blob/9c85dd5f792805bea319f01f08ee51b83bde0f3b/xarray/core/missing.py#L248

by index = arr[dim] but I might be missing something. Or probably a use_coordinate must be threaded through to get_clean_interp_index (although I am a bit confused by this argument).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4375/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
792651098 MDU6SXNzdWU3OTI2NTEwOTg= 4840 Opening a dataset doesn't display groups. dklink 11861183 open 0     2 2021-01-23T21:16:32Z 2022-04-09T02:31:03Z   NONE      

Problem

I know xarray doesn't support netCDF4 Group functionality. That's fine, I bet it's incredibly thorny. My issue is, when you open the root group of a netCDF4 file which contains groups, xarray doesn't even tell you that there are groups; they are totally invisible. This seems like a big flaw; you've opened a file, shouldn't you at least be told what's in it?

Solution

When you open a dataset with the netcdf4-python library, you get something like this:

```
>>> netCDF4.Dataset(path)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    some global attribute: some value
    dimensions(sizes): ...
    variables(dimensions): ...
    groups: group1, group2
```

"groups" shows up sort of like an auto-generated attribute. Surely xarray can do something similar:

```
>>> xr.open_dataset(path)
<xarray.Dataset>
Dimensions:        ...
Coordinates:       ...
Data variables:    ...
Attributes:        ...
Groups:            group1, group2
```

Workaround

The workaround I am considering is to actually add an attribute to my root group which contains a list of the groups in the file, so people using xarray will see that there are more groups in the file. However, this is redundant considering the information is already in the netCDF file, and also brittle since there's no guarantee the attribute truly reflects the groups in the file.

Conclusion

Considering that xr.open_dataset has a group parameter to open groups, it seems unfortunate that when you open a file, you don't see what groups are in there. Instead, you have to use an external tool to get information on the file's groups, then open them with xarray. Since this is only a matter of extracting group data and printing it, surely this is a simple (and imo, valuable) addition. I'd be happy to implement it and submit a PR if people are on-board. I might need some direction though, this is my first time digging into the xarray source code, and I don't see a __str__ method on the Dataset class, which is where I expected to make this addition.
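For reference, the external-tool dance currently looks something like this sketch (group names are hypothetical):

```python
import netCDF4
import xarray as xr

# step 1: use netcdf4-python just to discover the group names
with netCDF4.Dataset(path) as nc:
    group_names = list(nc.groups)  # e.g. ['group1', 'group2']

# step 2: open each group separately with xarray
datasets = {name: xr.open_dataset(path, group=name) for name in group_names}
```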

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4840/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
770006670 MDU6SXNzdWU3NzAwMDY2NzA= 4704 Retries for rare failures eric-czech 6130352 open 0     2 2020-12-17T13:06:51Z 2022-04-09T02:30:16Z   NONE      

I recently ran into several issues with gcsfs (https://github.com/dask/gcsfs/issues/316, https://github.com/dask/gcsfs/issues/315, and https://github.com/dask/gcsfs/issues/318) where errors are occasionally thrown, but only in large worfklows where enough http calls are made for them to become probable.

@martindurant suggested forcing dask to retry tasks that may fail like this with .compute(... retries=N) in https://github.com/dask/gcsfs/issues/316, which has worked well. However, I also see this in Xarray/Zarr code interacting with gcsfs directly:

Example Traceback ``` Traceback (most recent call last): File "scripts/convert_phesant_data.py", line 100, in <module> fire.Fire() File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py", line 138, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire component, remaining_args = _CallAndUpdateTrace( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File "scripts/convert_phesant_data.py", line 96, in sort_zarr ds.to_zarr(fsspec.get_mapper(output_path), consolidated=True, mode="w") File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/core/dataset.py", line 1652, in to_zarr return to_zarr( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/api.py", line 1368, in to_zarr dump_to_store(dataset, zstore, writer, encoding=encoding) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/api.py", line 1128, in dump_to_store store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/zarr.py", line 417, in store self.set_variables( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/zarr.py", line 489, in set_variables writer.add(v.data, zarr_array, region=region) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/common.py", line 145, in add target[...] 
= source File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1115, in __setitem__ self.set_basic_selection(selection, value, fields=fields) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1210, in set_basic_selection return self._set_basic_selection_nd(selection, value, fields=fields) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1501, in _set_basic_selection_nd self._set_selection(indexer, value, fields=fields) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1550, in _set_selection self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1664, in _chunk_setitem self._chunk_setitem_nosync(chunk_coords, chunk_selection, value, File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1729, in _chunk_setitem_nosync self.chunk_store[ckey] = cdata File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/mapping.py", line 151, in __setitem__ self.fs.pipe_file(key, maybe_convert(value)) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 121, in wrapper return maybe_sync(func, self, *args, **kwargs) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 100, in maybe_sync return sync(loop, func, *args, **kwargs) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync raise exc.with_traceback(tb) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 55, in f result[0] = await future File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 1007, in _pipe_file return await simple_upload( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 1523, in simple_upload j = await fs._call( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 525, in _call raise e File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 507, in _call self.validate_response(status, contents, json, path, headers) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 1228, in validate_response raise HttpError(error) gcsfs.utils.HttpError: Required ```

Has there already been a discussion about how to address rare errors like this? Arguably, I could file the same issue with Zarr but it seemed more productive to start here at a higher level of abstraction.

To be clear, the code for the example failure above typically succeeds and reproducing this failure is difficult. I have only seen it a couple times now like this, where the calling code does not include dask, but it did make me want to know if there were any plans to tolerate rare failures in Xarray as Dask does.
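For what it's worth, the kind of blunt retry wrapper I have in mind, here around to_zarr (a user-side sketch, not a proposal for the actual API):

```python
import time

def to_zarr_with_retries(ds, store, retries=3, **kwargs):
    # retry the whole write a few times with simple exponential backoff;
    # catching bare Exception is deliberately blunt for illustration
    for attempt in range(retries):
        try:
            return ds.to_zarr(store, **kwargs)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)
```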

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4704/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
517192343 MDU6SXNzdWU1MTcxOTIzNDM= 3482 geo raster accessor shaharkadmiel 6872529 closed 0     1 2019-11-04T14:34:27Z 2022-04-09T02:28:38Z 2022-04-09T02:28:25Z NONE      

Hi, I have put together a very simple package that provides a universal read function for reading various raster formats including of course netCDF but also any other format that gdal or rasterio can recognize. This read function can also handle merging several tiles into one dataset.

In addition, the package provides a .geo dataset accessor that currently adds trimming functionality to extract a geographical subset of the data.

I plan to also add reprojection and spatial resampling methods which will wrap either rasterio functionality or directly use gdal's api.

I hope this is of interest to the geosciences community and perhaps even a broader community.

Contributions and any other input from others is of course welcome.

Have a quick look at the Demo section in the readme file to get some ideas as to what this package can do for you.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3482/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
532647948 MDU6SXNzdWU1MzI2NDc5NDg= 3593 xr.open_dataset not reloading data in jupyter-notebook lkroen 58510627 closed 0     1 2019-12-04T12:17:13Z 2022-04-09T02:27:17Z 2022-04-09T02:27:17Z NONE      

First, I reported this issue on Jupyter-Notebook and was told, that it might be an issue of xarry: https://github.com/jupyter/notebook/issues/5101

I load an .nc file and print it

Cell 1
```python
import xarray as xr
data_file = 'path_to_file/WMI_Lear.nc'
```

Cell 2
```python
data = ''
data = xr.open_dataset(data_file)
print(data)
```

and I get the correct output:
```python
<xarray.Dataset>
Dimensions:    (time: 180)
Coordinates:
  * time       (time) datetime64[ns] 2003-07-06T06:30:13 ... 2003-07-06T06:59:59
Data variables:
    altitude   (time) float32 ...
    latitude   (time) float32 ...
    longitude  (time) float32 ...
    pressure   (time) float32 ...
    tdry       (time) float32 ...
    dp         (time) float32 ...
    mr         (time) float32 ...
    wspd       (time) float32 ...
    wdir       (time) float32 ...
    Drops      (time) float64 ...
Attributes:
    history:  $Id: TrackFile.java,v 1.20 2003/05/07 04:53:23 maclean Exp $
```

Now I move the file in a terminal so that it no longer exists under the same name:
```bash
mv path_to_file/WMI_Lear.nc path_to_file/WMI_Lear.nc_new
```

Cell 3
```python
data = ''
data = xr.open_dataset(data_file)
print(data)
```

and I correctly get an error saying that the file does not exist:
```python
FileNotFoundError: [Errno 2] No such file or directory: b'/path_to_file/WMI_Lear.nc'
```

Now I move the file back in the terminal so that it exists again under the correct name:
```bash
mv path_to_file/WMI_Lear.nc_new path_to_file/WMI_Lear.nc
```

Cell 4
```python
data = ''
data = xr.open_dataset(data_file)
print(data)
```
And again the output is correct, as after cell 2.

Now I remove the file in the terminal again so that it no longer exists under the same name:
```bash
mv path_to_file/WMI_Lear.nc path_to_file/WMI_Lear.nc_new
```

Cell 5
```python
data = ''
data = xr.open_dataset(data_file)
print(data)
```

Now I again expect the error message from cell 3, saying that the file does not exist. Instead, I get output as if the file still existed.

If I replace
```python
data = xr.open_dataset(data_file)
```
with
```python
data = xr.open_dataset(data_file, cache='old')
```
I get the same problem.

The same issue occurs if I only change the file: the changed file isn't loaded anymore. Deleting the file is just a drastic example.

The same issue occurs if I just repeatedly run the cell which is supposed to load the file: the changed file is not loaded anymore. This is a real issue, since I would always need to restart the kernel, which is just not practical.
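A workaround sketch that may avoid restarting the kernel: explicitly close the previous handle so that the cached file object is not reused on the next open (this assumes the stale handle is indeed the cause):

```python
data = xr.open_dataset(data_file)
print(data)
data.close()  # release the cached file handle before the file changes on disk
```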

Attached you'll find the simple .nc file.

WMI_Lear.nc.zip

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3593/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
552987067 MDU6SXNzdWU1NTI5ODcwNjc= 3712 [Documentation/API?] {DataArray,Dataset}.sortby is stable sort? jaicher 4666753 open 0     0 2020-01-21T16:27:37Z 2022-04-09T02:26:34Z   CONTRIBUTOR      

I noticed that {DataArray,Dataset}.sortby() are implemented using np.lexsort(), which is a stable sort. Can we expect this function to remain a stable sort in the future even if the implementation is changed for some reason?

It is not explicitly stated in the docs that the sorting will be stable. If this function is meant to always be stable, I think the documentation should explicitly state this. If not, I think it would be helpful to have an optional argument to ensure that the sort is kept stable in case the implementation changes in the future.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3712/reactions",
    "total_count": 3,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 1
}
    xarray 13221727 issue
559283550 MDU6SXNzdWU1NTkyODM1NTA= 3745 groupby drops the variable used to group malmans2 22245117 open 0     0 2020-02-03T19:25:06Z 2022-04-09T02:25:17Z   CONTRIBUTOR      

MCVE Code Sample

```python
import xarray as xr
ds = xr.tutorial.load_dataset('rasm')

# Seasonal mean
ds_season = ds.groupby('time.season').mean()
ds_season
```

<xarray.Dataset>
Dimensions:  (season: 4, x: 275, y: 205)
Coordinates:
    yc       (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51
    xc       (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91
  * season   (season) object 'DJF' 'JJA' 'MAM' 'SON'
Dimensions without coordinates: x, y
Data variables:
    Tair     (season, y, x) float64 nan nan nan nan ... 23.13 22.06 21.72 21.94

```python
# The seasons are ordered in alphabetical order.
# I want to sort them based on time.
# But time was dropped, so I have to do this:
time_season = ds['time'].groupby('time.season').mean()
ds_season.sortby(time_season)
```

<xarray.Dataset>
Dimensions:  (season: 4, x: 275, y: 205)
Coordinates:
    yc       (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51
    xc       (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91
  * season   (season) object 'SON' 'DJF' 'MAM' 'JJA'
Dimensions without coordinates: x, y
Data variables:
    Tair     (season, y, x) float64 nan nan nan nan ... 29.27 28.39 27.94 28.05

Expected Output

```python
# Why does groupby drop time?
# I would expect a dataset that looks like this:
ds_season['time'] = time_season
ds_season
```

<xarray.Dataset>
Dimensions:  (season: 4, x: 275, y: 205)
Coordinates:
    yc       (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51
    xc       (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91
  * season   (season) object 'DJF' 'JJA' 'MAM' 'SON'
Dimensions without coordinates: x, y
Data variables:
    Tair     (season, y, x) float64 nan nan nan nan ... 23.13 22.06 21.72 21.94
    time     (season) object 1982-01-16 12:00:00 ... 1981-10-17 00:00:00

Problem Description

I often use groupby on time variables. When I do that, the time variable is dropped and replaced (e.g., time is replaced by season, month, year, ...). Most of the time I also want to sort the new dataset based on the original time. The example above shows why this is useful for seasons. Another example would be to sort monthly averages of a dataset that originally had daily data from Sep-2000 to Aug-2001. Why is time dropped? Does it make sense to keep it in the grouped dataset?

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.15.0-1067-oem machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.1 xarray: 0.14.1 pandas: 1.0.0 numpy: 1.18.1 scipy: None netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.4.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.1 dask: 2.10.1 distributed: 2.10.0 matplotlib: 3.1.2 cartopy: None seaborn: None numbagg: None setuptools: 45.1.0.post20200127 pip: 20.0.2 conda: None pytest: None IPython: 7.11.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3745/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
564240510 MDU6SXNzdWU1NjQyNDA1MTA= 3767 ValueError when reading netCDF jjm0022 16228337 closed 0     2 2020-02-12T20:08:45Z 2022-04-09T02:24:48Z 2022-04-09T02:24:48Z NONE      

MCVE Code Sample

```python

ds = xr.open_dataset('20090327_0600')

```

Problem Description

Whenever I try to read certain netCDF files it raises a ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array. Here is a link to one of the files that raises the error: https://madis-data.ncep.noaa.gov/madisPublic1/data/archive/2009/03/27/LDAD/hfmetar/netCDF/20090327_0600.gz I don't have any issues reading this file though: https://madis-data.ncep.noaa.gov/madisPublic1/data/archive/2019/05/15/LDAD/hfmetar/netCDF/20190515_1200.gz

The traceback looks like this:
```python
KeyError                                  Traceback (most recent call last)
~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    197         try:
--> 198             file = self._cache[self._key]
    199         except KeyError:

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/lru_cache.py in getitem(self, key) 52 with self._lock: ---> 53 value = self._cache[key] 54 self._cache.move_to_end(key)

KeyError: [<function _open_scipy_netcdf at 0x11c8fc160>, ('/Users/jmiller/data/madis/20090327_0600',), 'r', (('mmap', None), ('version', 2))]

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last) <ipython-input-26-04ef422e5840> in <module> ----> 1 ds = xr.open_dataset('madis/20090327_0600')

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime) 536 537 with close_on_error(store): --> 538 ds = maybe_decode_store(store) 539 540 # Ensure source filename always stored in dataset object (GH issue #2550)

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/api.py in maybe_decode_store(store, lock) 444 445 def maybe_decode_store(store, lock=False): --> 446 ds = conventions.decode_cf( 447 store, 448 mask_and_scale=mask_and_scale,

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime) 568 encoding = obj.encoding 569 elif isinstance(obj, AbstractDataStore): --> 570 vars, attrs = obj.load() 571 extra_coords = set() 572 file_obj = obj

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/common.py in load(self) 121 """ 122 variables = FrozenDict( --> 123 (_decode_variable_name(k), v) for k, v in self.get_variables().items() 124 ) 125 attributes = FrozenDict(self.get_attrs())

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/scipy_.py in get_variables(self) 155 def get_variables(self): 156 return FrozenDict( --> 157 (k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items() 158 ) 159

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/scipy_.py in ds(self) 144 @property 145 def ds(self): --> 146 return self._manager.acquire() 147 148 def open_store_variable(self, name, var):

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/file_manager.py in acquire(self, needs_lock) 178 An open file object, as returned by opener(*args, **kwargs). 179 """ --> 180 file, _ = self._acquire_with_cache_info(needs_lock) 181 return file 182

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock) 202 kwargs = kwargs.copy() 203 kwargs["mode"] = self._mode --> 204 file = self._opener(self._args, *kwargs) 205 if self._mode == "w": 206 # ensure file doesn't get overriden when opened again

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/scipy_.py in _open_scipy_netcdf(filename, mode, mmap, version) 81 82 try: ---> 83 return scipy.io.netcdf_file(filename, mode=mode, mmap=mmap, version=version) 84 except TypeError as e: # netcdf3 message is obscure in this case 85 errmsg = e.args[0]

~/miniconda3/envs/proc/lib/python3.8/site-packages/scipy/io/netcdf.py in init(self, filename, mode, mmap, version, maskandscale) 282 283 if mode in 'ra': --> 284 self._read() 285 286 def setattr(self, attr, value):

~/miniconda3/envs/proc/lib/python3.8/site-packages/scipy/io/netcdf.py in _read(self) 614 self._read_dim_array() 615 self._read_gatt_array() --> 616 self._read_var_array() 617 618 def _read_numrecs(self):

~/miniconda3/envs/proc/lib/python3.8/site-packages/scipy/io/netcdf.py in _read_var_array(self) 720 # Build rec array. 721 if self.use_mmap: --> 722 rec_array = self._mm_buf[begin:begin+self._recs*self._recsize].view(dtype=dtypes) 723 rec_array.shape = (self._recs,) 724 else:

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
```

Output of xr.show_versions()

# Paste the output here xr.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.8.1 | packaged by conda-forge | (default, Jan 29 2020, 15:06:10) [Clang 9.0.1 ] python-bits: 64 OS: Darwin OS-release: 19.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: None libnetcdf: None xarray: 0.15.0 pandas: 1.0.0 numpy: 1.18.1 scipy: 1.4.1 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.10.1 distributed: 2.10.0 matplotlib: 3.1.3 cartopy: 0.17.0 seaborn: 0.10.0 numbagg: None setuptools: 45.1.0.post20200119 pip: 20.0.2 conda: None pytest: None IPython: 7.12.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3767/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
361237908 MDU6SXNzdWUzNjEyMzc5MDg= 2419 Document ways to reshape a DataArray dimitryx2017 9844249 open 0     5 2018-09-18T10:27:36Z 2022-04-09T02:21:15Z   NONE      

Code Sample, a copy-pastable example if possible

A "Minimal, Complete and Verifiable Example" will make it much easier for maintainers to help you: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

```python
# Your code here

def xr_reshape(A, dim, newdims, coords):
    """Reshape DataArray A to convert its dimension dim into sub-dimensions
    given by newdims and the corresponding coords.

    Example: Ar = xr_reshape(A, 'time', ['year', 'month'], [(2017, 2018), np.arange(12)])
    """

# Create a pandas MultiIndex from these labels
ind = pd.MultiIndex.from_product(coords, names=newdims)

# Replace the time index in the DataArray by this new index,
A1 = A.copy()

A1.coords[dim] = ind

# Convert multiindex to individual dims using DataArray.unstack().
# This changes dimension order! The new dimensions are at the end.
A1 = A1.unstack(dim)

# Permute to restore dimensions
i = A.dims.index(dim)
dims = list(A1.dims)

for d in newdims[::-1]:
    dims.insert(i, d)

for d in newdims:
    _ = dims.pop(-1)


return A1.transpose(*dims)

```

Problem description

[this should explain why the current behavior is a problem and why the expected output is a better solution.]

It would be great to have the above function as a DataArray's method.

Expected Output

A reshaped DataArray. In the example in the function comment it would correspond to an array like

In [1]: Ar.dims
Out[1]: ('year', 'month', 'lat', 'lon')
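A small usage sketch consistent with the docstring example (random data; it assumes an xarray version, like the one above, where a pandas MultiIndex can be assigned directly to a coordinate, and that pd/np are imported for xr_reshape itself):

```python
import numpy as np
import pandas as pd
import xarray as xr

# 24 monthly steps spanning 2017-2018 on a small lat/lon grid
A = xr.DataArray(np.random.rand(24, 3, 4), dims=("time", "lat", "lon"))
Ar = xr_reshape(A, "time", ["year", "month"], [(2017, 2018), np.arange(12)])
print(Ar.dims)  # ('year', 'month', 'lat', 'lon')
```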

Output of xr.show_versions()

# Paste the output here xr.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.6.3.final.0 python-bits: 64 OS: Linux OS-release: 3.10.0-693.5.2.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: fr_FR.UTF-8 LOCALE: fr_FR.UTF-8 xarray: 0.10.4 pandas: 0.23.0 numpy: 1.13.3 scipy: 0.19.1 netCDF4: 1.3.1 h5netcdf: None h5py: 2.7.0 Nio: None zarr: None bottleneck: 1.2.1 cyordereddict: None dask: 0.15.3 distributed: 1.19.1 matplotlib: 2.1.0 cartopy: 0.16.0 seaborn: 0.8.1 setuptools: 36.5.0.post20170921 pip: 18.0 conda: 4.4.7 pytest: 3.2.1 IPython: 6.1.0 sphinx: 1.6.3
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2419/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
414641120 MDU6SXNzdWU0MTQ2NDExMjA= 2789 Appending to zarr with string dtype davidbrochart 4711805 open 0     2 2019-02-26T14:31:42Z 2022-04-09T02:18:05Z   CONTRIBUTOR      

```python
import xarray as xr

da = xr.DataArray(['foo'])
ds = da.to_dataset(name='da')
ds.to_zarr('ds')  # no special encoding specified

ds = xr.open_zarr('ds')
print(ds.da.values)
```

The code above prints ['foo'] (string type). The encoding chosen by zarr is "dtype": "|S3", which corresponds to bytes, but it seems to be decoded back to a string, which is what we want.

$ cat ds/da/.zarray
{
    "chunks": [1],
    "compressor": {
        "blocksize": 0,
        "clevel": 5,
        "cname": "lz4",
        "id": "blosc",
        "shuffle": 1
    },
    "dtype": "|S3",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [1],
    "zarr_format": 2
}

The problem is that if I want to append to the zarr archive, like so:

```python
import zarr

ds = zarr.open('ds', mode='a')
da_new = xr.DataArray(['barbar'])
ds.da.append(da_new)

ds = xr.open_zarr('ds')
print(ds.da.values)
```

It prints ['foo' 'bar']. Indeed the encoding was kept as "dtype": "|S3", which is fine for a string of 3 characters but not for 6.

If I want to specify the encoding with the maximum length, e.g:

ds.to_zarr('ds', encoding={'da': {'dtype': '|S6'}})

It solves the length problem, but now my strings are kept as bytes: [b'foo' b'barbar']. If I specify a Unicode encoding:

ds.to_zarr('ds', encoding={'da': {'dtype': 'U6'}})

It is not taken into account. The zarr encoding is "dtype": "|S3" and I am back to my length problem: ['foo' 'bar'].

The solution with 'dtype': '|S6' is acceptable, but I need to encode my strings to bytes when indexing, which is annoying.
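For reference, a minimal sketch of that workaround (fixed-width bytes on disk, decoded back to str after reading); the store name is mine and the decode step assumes plain ASCII strings:

```python
import xarray as xr

ds = xr.DataArray(['foo']).to_dataset(name='da')
ds.to_zarr('ds_bytes', encoding={'da': {'dtype': '|S6'}})

loaded = xr.open_zarr('ds_bytes')
strings = loaded.da.astype(str)  # decode the fixed-width bytes back to unicode
```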

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2789/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
415192339 MDU6SXNzdWU0MTUxOTIzMzk= 2790 Bug in xarray.open_dataset with variables/coordinates of dtype 'timedelta64[ns]' SK-E 48060979 closed 0     1 2019-02-27T15:48:14Z 2022-04-09T02:17:56Z 2022-04-09T02:17:56Z NONE      

Code Sample, a copy-pastable example if possible

```python
import xarray as xr
import pandas as pd
import numpy as np

# Create array, coordinate time's dtype is timedelta64[ns]
time = pd.timedelta_range(f"{2.0}s", f"{2.05}s", freq="10ms", name="time")
data = range(len(time))
arr = xr.DataArray(data=data, coords={"time": time}, dims="time", name="psi")

# Save array
savefile = "/path/to/file/BugXarray.nc"
arr.to_netcdf(savefile)

# Load array
arr_loaded = xr.open_dataset(savefile)

# Show time-coordinate on arr and arr_loaded
print(arr.time.values)
# Output: [2000000000 2010000000 2020000000 2030000000 2040000000 2050000000]
print(arr_loaded.time.values)
# Output: [2000000000 2009999999 2020000000 2029999999 2040000000 2049999999]

# Same problem with pandas to_timedelta
timedelta = np.arange(200, 206, 1) / 100
timedelta = pd.to_timedelta(timedelta, unit="s")

# Show time and timedelta
print(time.values)
# Output: [2000000000 2010000000 2020000000 2030000000 2040000000 2050000000]
print(timedelta.values)
# Output: [2000000000 2009999999 2020000000 2029999999 2040000000 2049999999]
```

Problem description

Opening a netCDF file that contains variables/coordinates with a dtype that is supposed to be 'timedelta64[ns]' might cause errors due to a loss in precision. I noticed that the pandas function pandas.to_timedelta shows the same misbehavior, though I don't know whether xarray.open_dataset uses that function internally.
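For illustration (my addition, just restating the floating-point arithmetic involved): if the timedeltas pass through floating-point seconds at any point, 2.01 s is not exactly representable in binary floating point, and truncating back to integer nanoseconds gives 2009999999.

```python
seconds = 2.01
print(f"{seconds:.20f}")      # 2.00999999999999978684...
print(int(seconds * 1e9))     # 2009999999 (truncated), not 2010000000
print(round(seconds * 1e9))   # 2010000000 (rounding would avoid the drift)
```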

Expected Output

In the example above arr_loaded.time.values should equal arr.time.values!

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.2 (default, Dec 29 2018, 06:19:36) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.15.0-45-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: de_DE.UTF-8 LOCALE: de_DE.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.1 xarray: 0.11.3 pandas: 0.24.1 numpy: 1.15.4 scipy: 1.2.1 netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: 2.9.0 Nio: None zarr: None cftime: 1.0.3.4 PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.2.1 cyordereddict: None dask: 1.1.1 distributed: 1.25.3 matplotlib: 3.0.2 cartopy: None seaborn: 0.9.0 setuptools: 40.8.0 pip: 19.0.1 conda: 4.6.4 pytest: 4.2.1 IPython: 7.2.0 sphinx: 1.8.4
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2790/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
428180638 MDU6SXNzdWU0MjgxODA2Mzg= 2863 Memory Error for simple operations on NETCDF4 internally zipped files rpnaut 30219501 closed 0     3 2019-04-02T11:48:01Z 2022-04-09T02:15:45Z 2022-04-09T02:15:45Z NONE      

Assuming you want to do simple computations with a data array loaded from an internally zipped NETCDF4 file, you first need to load a dataset:

In [2]: eobs = xarray.open_dataset("eObs_ens_mean_0.1deg_reg_v18.0e.T_2M.1950-2018.nc")

In [3]: eobs
Out[3]:
<xarray.Dataset>
Dimensions: (lat: 465, lon: 705, time: 25049)
Coordinates:
  * time (time) datetime64[ns] 1950-01-01 1950-01-02 1950-01-03 ...
  * lon (lon) float64 -24.95 -24.85 -24.75 -24.65 -24.55 -24.45 -24.35 ...
  * lat (lat) float64 25.05 25.15 25.25 25.35 25.45 25.55 25.65 25.75 ...
Data variables:
    T_2M (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
Attributes:
    _NCProperties: version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17
    E-OBS_version: 18.0e
    Conventions: CF-1.4
    References: http://surfobs.climate.copernicus.eu/dataaccess/access_eo...

Afterwards I have tried to do this:

```
In [4]: datarray = eobs["T_2M"] + 273.15


MemoryError Traceback (most recent call last) <ipython-input-4-eaff3bff5e27> in <module>() ----> 1 datarray=eobs["T_2M"]+273.15

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/dataarray.py in func(self, other) 1539 1540 variable = (f(self.variable, other_variable) -> 1541 if not reflexive 1542 else f(other_variable, self.variable)) 1543 coords = self.coords._merge_raw(other_coords)

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in func(self, other) 1139 if isinstance(other, (xr.DataArray, xr.Dataset)): 1140 return NotImplemented -> 1141 self_data, other_data, dims = _broadcast_compat_data(self, other) 1142 new_data = (f(self_data, other_data) 1143 if not reflexive

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in _broadcast_compat_data(self, other) 1379 else: 1380 # rely on numpy broadcasting rules -> 1381 self_data = self.data 1382 other_data = other 1383 dims = self.dims

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in data(self) 265 return self._data 266 else: --> 267 return self.values 268 269 @data.setter

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in values(self) 306 def values(self): 307 """The variable's data as a numpy.ndarray""" --> 308 return _as_array_or_item(self._data) 309 310 @values.setter

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in _as_array_or_item(data) 182 TODO: remove this (replace with np.asarray) once these issues are fixed 183 """ --> 184 data = np.asarray(data) 185 if data.ndim == 0: 186 if data.dtype.kind == 'M':

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/numpy-1.11.2-py3.5-linux-x86_64.egg/numpy/core/numeric.py in asarray(a, dtype, order) 480 481 """ --> 482 return array(a, dtype, copy=False, order=order) 483 484 def asanyarray(a, dtype=None, order=None):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/indexing.py in array(self, dtype) 417 418 def array(self, dtype=None): --> 419 self._ensure_cached() 420 return np.asarray(self.array, dtype=dtype) 421

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/indexing.py in _ensure_cached(self) 414 def _ensure_cached(self): 415 if not isinstance(self.array, np.ndarray): --> 416 self.array = np.asarray(self.array) 417 418 def array(self, dtype=None):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/numpy-1.11.2-py3.5-linux-x86_64.egg/numpy/core/numeric.py in asarray(a, dtype, order) 480 481 """ --> 482 return array(a, dtype, copy=False, order=order) 483 484 def asanyarray(a, dtype=None, order=None):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/indexing.py in array(self, dtype) 398 399 def array(self, dtype=None): --> 400 return np.asarray(self.array, dtype=dtype) 401 402 def getitem(self, key):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/numpy-1.11.2-py3.5-linux-x86_64.egg/numpy/core/numeric.py in asarray(a, dtype, order) 480 481 """ --> 482 return array(a, dtype, copy=False, order=order) 483 484 def asanyarray(a, dtype=None, order=None):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/indexing.py in array(self, dtype) 373 def array(self, dtype=None): 374 array = orthogonally_indexable(self.array) --> 375 return np.asarray(array[self.key], dtype=None) 376 377 def getitem(self, key):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/conventions.py in getitem(self, key) 361 def getitem(self, key): 362 return mask_and_scale(self.array[key], self.fill_value, --> 363 self.scale_factor, self.add_offset, self._dtype) 364 365 def repr(self):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/conventions.py in mask_and_scale(array, fill_value, scale_factor, add_offset, dtype) 57 """ 58 # by default, cast to float to ensure NaN is meaningful ---> 59 values = np.array(array, dtype=dtype, copy=True) 60 if fill_value is not None and not np.all(pd.isnull(fill_value)): 61 if getattr(fill_value, 'size', 1) > 1:

MemoryError:
```

I have uploaded the datafile to the following link:

https://swiftbrowser.dkrz.de/public/dkrz_c0725fe8741c474b97f291aac57f268f/GregorMoeller/

Am I using the wrong netCDF engine?
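A possible direction (a hedged sketch on my part, assuming dask is installed; not a confirmed fix): open the file with chunks so the addition is evaluated lazily, block by block, instead of materialising the full (25049, 465, 705) float64 array at once.

```python
import xarray as xr

eobs = xr.open_dataset(
    "eObs_ens_mean_0.1deg_reg_v18.0e.T_2M.1950-2018.nc",
    chunks={"time": 365},
)
datarray = eobs["T_2M"] + 273.15     # builds a lazy dask computation
datarray.isel(time=0).compute()      # only this slice is loaded and evaluated
```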

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2863/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
441192361 MDU6SXNzdWU0NDExOTIzNjE= 2945 Implicit conversion from int to float tampers with values when int is not representable as float floogit 14000880 closed 0     1 2019-05-07T11:57:20Z 2022-04-09T02:14:28Z 2022-04-09T02:14:28Z NONE      

```python
import xarray as xr

ds = xr.Dataset()
val = 95042027804193144
ds['var1'] = xr.DataArray(val)
ds_1 = ds.where(ds.var1 == val)

print(ds_1.var1.dtype)  # dtype('float64')
print(int(ds_1.var1))   # 95042027804193152
```

Problem description

As described in #2183, int values are converted to float in where(), also when there are no NaNs in the data. This is a serious issue for the case when the int64 number is not representable as float64, as is the case in the example above. The resulting numbers are then actually different from the original numbers, without any warning.
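For illustration (my addition, restating the arithmetic behind the example): float64 has a 53-bit significand, so integers above 2**53 are not all exactly representable, which is why the value shifts.

```python
val = 95042027804193144
print(val > 2**53)                      # True: beyond the exact-integer range of float64
print(float(val) == 95042027804193152)  # True: the nearest representable float64
print(int(float(val)))                  # 95042027804193152
```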

Expected Output

I guess this is hard to fix. At a minimum, where() should probably not cast to float when there are no NaNs (which would already fix our use case). I would also rather expect an error instead of silently changing the values of a variable.

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.5.3 (default, Sep 27 2018, 17:25:39) [GCC 6.3.0 20170516] python-bits: 64 OS: Linux OS-release: 4.19.0-0.bpo.2-amd64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.8.18 libnetcdf: 4.4.1.1 xarray: 0.12.1 pandas: 0.24.2 numpy: 1.15.4 scipy: 1.2.1 netCDF4: 1.2.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.3.4 nc_time_axis: None PseudonetCDF: None rasterio: 0.36.0 cfgrib: 0.9.6.post1 iris: None bottleneck: 1.2.1 dask: 1.1.4 distributed: None matplotlib: 3.0.3 cartopy: 0.16.0 seaborn: 0.8.1 setuptools: 39.2.0 pip: 19.0.3 conda: None pytest: 4.4.1 IPython: 7.5.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2945/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
326344778 MDU6SXNzdWUzMjYzNDQ3Nzg= 2183 converting int vars to floats when I where the enclosing ds? IvoCrnkovic 1778852 open 0     5 2018-05-25T00:48:43Z 2022-04-09T02:14:23Z   NONE      

Code Sample

```python
import numpy as np
import xarray as xr

test_ds = xr.Dataset()

test_ds['var1'] = xr.DataArray(np.arange(5))
test_ds['var2'] = xr.DataArray(np.ones(5))

assert test_ds['var1'].dtype == np.int64

assert test_ds.where(test_ds['var2'] == 1)['var1'].dtype == np.int64
```

Problem description

The second assert fails, which is a bit strange I think. Is that intended? If so, what's the reasoning?
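For what it's worth, a sketch of a workaround I would expect to keep the integer dtype (an assumption on my part, not a confirmed answer): supply an integer fill value so where() never has to introduce NaN.

```python
import numpy as np
import xarray as xr

test_ds = xr.Dataset()
test_ds['var1'] = xr.DataArray(np.arange(5))
test_ds['var2'] = xr.DataArray(np.ones(5))

masked = test_ds.where(test_ds['var2'] == 1, other=-1)
print(masked['var1'].dtype)  # expected to stay int64 with an integer `other`
```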

Output of xr.show_versions()

commit: None python: 2.7.14.final.0 python-bits: 64 OS: Linux OS-release: 4.9.87-linuxkit-aufs machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: None LOCALE: None.None xarray: 0.10.3 pandas: 0.22.0 numpy: 1.14.3 scipy: 1.1.0 netCDF4: None h5netcdf: None h5py: None Nio: None zarr: None bottleneck: 1.2.1 cyordereddict: None dask: None distributed: None matplotlib: 2.2.2 cartopy: None seaborn: 0.8.1 setuptools: 39.1.0 pip: 10.0.1 conda: None pytest: 3.5.1 IPython: 5.6.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2183/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
446868198 MDU6SXNzdWU0NDY4NjgxOTg= 2978 sel(method=x) is not propagated for MultiIndex mschrimpf 5308236 open 0     3 2019-05-21T23:30:56Z 2022-04-09T02:09:00Z   NONE      

When passing a method different from None to the selection method (e.g. .sel(method='nearest')), it is not propagated if the index is a MultiIndex. Specifically, the passing of the method key seems to be missing in xarray/core/indexing.py:convert_label_indexer https://github.com/pydata/xarray/blob/0811141e8f985a1f3b95ead92c3850cc74e160a5/xarray/core/indexing.py#L158-L159

For a normal index, the method is passed properly: https://github.com/pydata/xarray/blob/0811141e8f985a1f3b95ead92c3850cc74e160a5/xarray/core/indexing.py#L181

This leads to an unexpected KeyError when the selection value is not in the index, even if a nearest value could have been found.
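A hypothetical reproduction sketch (coordinate names and values are mine, not taken from the report):

```python
import numpy as np
import pandas as pd
import xarray as xr

idx = pd.MultiIndex.from_product([["a", "b"], [0.0, 1.0]], names=["letter", "value"])
da = xr.DataArray(np.arange(4), dims="points", coords={"points": idx})

da.sel(points=("a", 0.0))            # exact lookup works
da.sel(value=0.4, method="nearest")  # reported to raise KeyError instead of matching 0.0
```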

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.7.final.0 python-bits: 64 OS: Linux OS-release: 4.4.0-143-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.10.8 pandas: 0.24.2 numpy: 1.16.2 scipy: 1.1.0 netCDF4: 1.4.2 h5netcdf: None h5py: 2.8.0 Nio: None zarr: None bottleneck: None cyordereddict: None dask: 0.20.0 distributed: None matplotlib: 3.0.1 cartopy: None seaborn: 0.9.0 setuptools: 40.8.0 pip: 19.0.3 conda: None pytest: 3.10.0 IPython: 7.1.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2978/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
447044177 MDU6SXNzdWU0NDcwNDQxNzc= 2980 Jupyter Notebooks for Tutorials(USER GUIDE) hdsingh 30382331 open 0     3 2019-05-22T10:01:26Z 2022-04-09T02:07:55Z   NONE      

This issue is more of a suggestion.

A small issue that users reading the documentation face is the unavailability of Jupyter notebooks for the tutorial docs in the User Guide. Users constantly have to copy-paste code from the documentation or .rst files, which wastes time. Having executable notebooks would help new users save time and quickly move on to using xarray for their specific tasks. It would also ease the learning process for new users, which may in turn bring more contributors to the xarray community.

Let's take example of pyviz, holoviews, pytorch.

00 Setup — PyViz 0.10.0 documentation

holoviews/examples/user_guide at master · pyviz/holoviews · GitHub

Chatbot Tutorial — PyTorch Tutorials 1.1.0.dev20190507 documentation

All of them provide option to download the tutorial in the form of .ipynb file either in the beginning or end of the notebook.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2980/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
467736580 MDU6SXNzdWU0Njc3MzY1ODA= 3109 In the contribution instructions, the py36.yml fails to set up mmartini-usgs 23199378 closed 0     2 2019-07-13T15:55:23Z 2022-04-09T02:05:48Z 2022-04-09T02:05:48Z NONE      

Code Sample, a copy-pastable example if possible

conda env create -f ci/requirements/py36.yml

Problem description

In the contribution instructions, the py36.yml fails to set up, so the test environment does not get created.

Expected Output

A test environment

Output of xr.show_versions()

Environment fails to build, cannot be resolved.

The fix is to change

conda env create -f ci/requirements/py36.yml

to

conda env create -f ci/requirements/py37.yml

on this page: http://xarray.pydata.org/en/latest/contributing.html

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3109/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
484699415 MDU6SXNzdWU0ODQ2OTk0MTU= 3256 .item() on a DataArray with dtype='datetime64[ns]' returns int IvoCrnkovic 1778852 open 0     4 2019-08-23T20:29:50Z 2022-04-09T02:03:43Z   NONE      

MCVE Code Sample

```python
import datetime
import xarray as xr

test_da = xr.DataArray(datetime.datetime(2019, 1, 1, 1, 1))

test_da
# <xarray.DataArray ()>
# array('2019-01-01T01:01:00.000000000', dtype='datetime64[ns]')

test_da.item()
# 1546304460000000000
```

Expected Output

I would think it would be nicer to get a datetime out of the .item() call rather than the nanosecond representation.
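Continuing from the example above, a possible workaround sketch (my suggestion, not from the report): go via the numpy scalar and convert with pandas.

```python
import pandas as pd

ts = pd.Timestamp(test_da.values[()])  # extract the numpy datetime64 scalar, wrap as Timestamp
py_dt = ts.to_pydatetime()             # datetime.datetime(2019, 1, 1, 1, 1)
```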

Output of xr.show_versions()

When I call xr.show_versions() I get an error, but I'm running xarray 0.12.3.
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3256/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
478398026 MDU6SXNzdWU0NzgzOTgwMjY= 3192 Cloud Storage Buckets pl-marasco 22492773 closed 0     1 2019-08-08T10:58:05Z 2022-04-09T01:51:09Z 2022-04-09T01:51:09Z NONE      

Following the instructions to create cloud storage here, I stumbled on the fact that gcsfs apparently no longer implements .mapping; in the example it is used as: gcsfs.mapping.GCSMap('<bucket-name>', gcs=fs, check=True, create=False)

Is the example correct, or must it be rewritten?
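For context, a sketch of the newer gcsfs/fsspec spelling (an assumption based on current fsspec conventions, not taken from the linked docs; project and bucket names are placeholders):

```python
import gcsfs

fs = gcsfs.GCSFileSystem(project='<project-id>')
store = fs.get_mapper('<bucket-name>')  # replaces gcsfs.mapping.GCSMap
# ds.to_zarr(store=store)  or  xr.open_zarr(store)
```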

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3192/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
60303760 MDU6SXNzdWU2MDMwMzc2MA== 364 pd.Grouper support? naught101 167164 open 0     24 2015-03-09T06:25:14Z 2022-04-09T01:48:48Z   NONE      

In pandas, you can pass a pandas.TimeGrouper object to a .groupby() call, and it allows you to group by month, year, day, or other times, without manually creating a new index with those values first. It would be great if you could do this with xray, but at the moment, I get:

```
/usr/local/lib/python3.4/dist-packages/xray/core/groupby.py in __init__(self, obj, group, squeeze)
     66         if the dimension is squeezed out.
     67         """
---> 68         if group.ndim != 1:
     69             # TODO: remove this limitation?
     70             raise ValueError('`group` must be 1 dimensional')

AttributeError: 'TimeGrouper' object has no attribute 'ndim'
```

Not sure how this will work though, because pandas.TimeGrouper doesn't appear to work with multi-index dataframes yet anyway, so maybe there needs to be a feature request over there too, or maybe it's better to implement something from scratch...
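For reference, a small sketch of the time grouping that is already possible without pd.Grouper (standard xarray API; the array is my own toy example):

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.random.rand(365),
    dims="time",
    coords={"time": pd.date_range("2015-01-01", periods=365)},
)

by_month = da.groupby("time.month").mean()  # group by calendar month number
monthly = da.resample(time="1M").mean()     # resample to calendar months
```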

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/364/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
400289716 MDU6SXNzdWU0MDAyODk3MTY= 2686 Is `create_test_data()` public API? TomNicholas 35968931 open 0     3 2019-01-17T14:00:20Z 2022-04-09T01:48:14Z   MEMBER      

We want to encourage people to use and extend xarray, and we already provide testing functions as public API to help with this.

One function I keep using when writing code which uses xarray is xarray.tests.test_dataset.create_test_data(). This is very useful for quickly writing tests for the same reasons that it's useful in xarray's internal tests, but it's not explicitly public API. This means that there's no guarantee it won't change/disappear, which is not ideal if you're trying to write a test suite for separate software. But so many tests in xarray rely on it that presumably it's not going to get changed.

Is there any reason why it shouldn't be public API? Is there something I should use instead?
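For concreteness, this is the pattern in question (a minimal sketch; the caveat above is precisely that this import path is not documented as public):

```python
from xarray.tests.test_dataset import create_test_data

ds = create_test_data()  # a small Dataset handy for third-party test suites
```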

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2686/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
377096851 MDU6SXNzdWUzNzcwOTY4NTE= 2539 Request: Add support for the ERDDAP griddap request rmendels 1919031 closed 0     3 2018-11-03T21:56:10Z 2022-04-09T01:47:28Z 2022-04-09T01:47:28Z NONE      

xarray already supports OPeNDAP requests, and the ERDDAP service is being installed in many places. While an ERDDAP server can function as an OPeNDAP server, and its syntax is very close to the OPeNDAP syntax, ERDDAP/griddap has the advantage that requests can be made in coordinate space.

Moreover, it would not have to be coded from scratch, ERDDAPy (https://github.com/pyoceans/erddapy) already has the code, it would be more of a question on how to integrate it. The ERDDAP service can return both netcdf or .dds files if that makes it easier to integrate.
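For context, a sketch of what already works today through the OPeNDAP endpoint that ERDDAP exposes (the server URL, dataset id, and variable name are placeholders, not real examples from this request):

```python
import xarray as xr

url = "https://<erddap-server>/erddap/griddap/<datasetID>"
ds = xr.open_dataset(url)  # plain OPeNDAP access; needs netCDF4 or pydap installed
subset = ds["<variable>"].sel(latitude=slice(30, 40), longitude=slice(-80, -70))
```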

Thanks.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2539/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
818266159 MDU6SXNzdWU4MTgyNjYxNTk= 4973 NetCDF encoded data not automatically decoded back into original dtype chrism0dwk 625462 closed 0     2 2021-02-28T17:57:33Z 2022-04-09T01:41:22Z 2022-04-09T01:41:22Z NONE      

What happened: When reading in an encoded netCDF4 file, encoded variables are not transformed back to their original dtype in the resulting xarray.

What you expected to happen: As with the raw netCDF4 package, if an xarray.DataArray of dtype float64 is encoded into a netCDF4 file as a float32, it should be converted back to the original float64 when the netCDF4 dataset is read back in.

Minimal Complete Verifiable Example:

import xarray as xr
import numpy as np

foo = xr.DataArray(np.random.uniform(size=[100, 100]).astype(np.float64))
foo.dtype  # float64
ds = xr.Dataset({'foo': foo})
ds.to_netcdf("foo.nc", encoding={'foo': {'dtype': 'float32', 'scale_factor': 1.0, 'add_offset': 0.0}})
ds1 = xr.open_dataset("foo.nc")
ds1['foo'].dtype  # float32, not float64 as expected
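A simple interim workaround sketch (my suggestion, not part of the report): cast back explicitly after decoding.

```python
import numpy as np
import xarray as xr

ds1 = xr.open_dataset("foo.nc")
foo64 = ds1["foo"].astype(np.float64)  # restore the original dtype by hand
```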

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.7.7 (default, Mar 23 2020, 22:36:06) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 5.4.0-66-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 0.17.0 pandas: 1.1.5 numpy: 1.19.5 scipy: None netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.4.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.02.0 distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 49.6.0 pip: 20.2.2 conda: None pytest: None IPython: 7.21.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4973/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
830040696 MDU6SXNzdWU4MzAwNDA2OTY= 5024 xr.DataArray.sum() converts string objects into unicode FabianHofmann 19226431 open 0     0 2021-03-12T11:47:06Z 2022-04-09T01:40:09Z   CONTRIBUTOR      

What happened:

When summing over all axes of a DataArray with strings of dtype object, the result is a one-size unicode DataArray.

What you expected to happen:

I expected the summation would preserve the dtype, meaning the one-size DataArray would be of dtype object

Minimal Complete Verifiable Example:

ds = xr.DataArray('a', [range(3), range(3)]).astype(object)
ds.sum()

Output:
<xarray.DataArray ()>
array('aaaaaaaaa', dtype='<U9')

On the other hand, when summing over one dimension only, the dtype is preserved:

ds.sum('dim_0')

Output:
<xarray.DataArray (dim_1: 3)>
array(['aaa', 'aaa', 'aaa'], dtype=object)
Coordinates:
  * dim_1 (dim_1) int64 0 1 2

Anything else we need to know?:

The problem becomes relevant as soon as dask is used in the workflow. Dask expects the aggregated DataArray to be of dtype object which will likely lead to errors in the operations to follow.

Probably the behavior comes from creating a new DataArray after the reduction with np.sum() (which itself results in a pure Python string).
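A short sketch of that suspected mechanism (my reading of the above, stated as an assumption): the full reduction yields a plain Python str, and re-wrapping that scalar infers a unicode dtype instead of object.

```python
import numpy as np
import xarray as xr

arr = np.full((3, 3), 'a', dtype=object)
total = arr.sum()                  # string concatenation -> 'aaaaaaaaa'
print(type(total))                 # <class 'str'>
print(xr.DataArray(total).dtype)   # <U9, matching the output above
```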

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 5.4.0-66-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.2 pandas: 1.2.1 numpy: 1.19.5 scipy: 1.6.0 netCDF4: 1.5.5.1 pydap: None h5netcdf: 0.7.4 h5py: 3.1.0 Nio: None zarr: 2.3.2 cftime: 1.3.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.2.0 cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.01.1 distributed: 2021.01.1 matplotlib: 3.3.3 cartopy: 0.18.0 seaborn: 0.11.1 numbagg: None pint: None setuptools: 52.0.0.post20210125 pip: 21.0 conda: 4.9.2 pytest: 6.2.2 IPython: 7.19.0 sphinx: 3.4.3
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5024/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
882105903 MDU6SXNzdWU4ODIxMDU5MDM= 5281 'Parallelized' apply_ufunc for scripy.interpolate.griddata LJaksic 74414841 open 0     4 2021-05-09T10:08:46Z 2022-04-09T01:39:13Z   NONE      

Hi,

I'm working with large files from an ocean model with an unstructured grid. For instance, the flow velocity variable ux has dimensions (194988, 1009, 20) for, respectively, 'nFlowElement' (the unstructured grid element), 'time' and 'laydim' (the depth dimension). I'd like to interpolate these results to a structured grid with dimensions (600, 560, 1009, 20) for, respectively, latitude, longitude, time and laydim. For this I am using scipy.interpolate.griddata. As these dataarrays are too large to load into working memory at once, I am trying to work with 'chunks' (dask). Unfortunately, I bump into problems when trying to use apply_ufunc with dask='parallelized'.

For smaller computational domains (a smaller nFlowElement dimension) I am still able to load the dataarray into working memory. Then, the following code gives me the wanted result:

```python
def interp_to_grid(u, xc, yc, xint, yint):
    print(u.shape, xc.shape, xint.shape)
    ug = griddata((xc, yc), u, (xint, yint), method='nearest', fill_value=np.nan)
    return ug

uxg = xr.apply_ufunc(
    interp_to_grid,
    ux, xc, yc, xint, yint,
    dask='allowed',
    input_core_dims=[['nFlowElem', 'time', 'laydim'], ['nFlowElem'], ['nFlowElem'],
                     ['dim_0', 'dim_1'], ['dim_0', 'dim_1']],
    output_core_dims=[['dim_0', 'dim_1', 'time', 'laydim']],
    output_dtypes=[xr.DataArray],
)
```

Notice that in the function interp_to_grid the input variables have the following dimensions:

- u (i.e. ux, the original flow velocity output): (194988, 1009, 20) for (nFlowElem, time, laydim)
- xc, yc (the latitude and longitude coordinates associated with these 194988 elements): both (194988,)
- xint, yint (the structured grid coordinates to which I would like to interpolate the data): both (600, 560) for (dim_0, dim_1)

Notice that scipy.interpolate.griddata does not require me to loop over the time and laydim dimensions (as formulated in the code above). For this it is critical to feed griddata the dimensions in the right order ('time' and 'laydim' last). The interpolated result, uxg, has dimensions (600, 560, 1009, 20) - as wanted and expected.

However, for much larger spatial domains it is required to work with dask='parallelized', because these input dataarrays can no longer be loaded into my working memory. I have tried to apply chunks over the time dimension, but also over the nFlowElement dimension. I am aware that it is not possible to chunk over core dimensions.

This is one of my "parallel" attempts (with chunks along the time dim):

Input ux:

<xarray.DataArray 'ucx' (nFlowElem: 194988, time: 1009, laydim: 20)>
dask.array<transpose, shape=(194988, 1009, 20), dtype=float64, chunksize=(194988, 10, 20), chunktype=numpy.ndarray>
Coordinates:
    FlowElem_xcc  (nFlowElem) float64 dask.array<chunksize=(194988,), meta=np.ndarray>
    FlowElem_ycc  (nFlowElem) float64 dask.array<chunksize=(194988,), meta=np.ndarray>
  * time          (time) datetime64[ns] 2014-09-17 ... 2014-10-01
Dimensions without coordinates: nFlowElem, laydim
Attributes:
    standard_name: eastward_sea_water_velocity
    long_name: velocity on flow element center, x-component
    units: m s-1
    grid_mapping: wgs84

The apply_ufunc call:

uxg = xr.apply_ufunc(
    interp_to_grid,
    ux, xc, yc, xint, yint,
    dask='parallelized',
    input_core_dims=[['nFlowElem'], ['nFlowElem'], ['nFlowElem'],
                     ['dim_0', 'dim_1'], ['dim_0', 'dim_1']],
    output_core_dims=[['dim_0', 'dim_1']],
    output_dtypes=[xr.DataArray],
)

Gives error:

```
File "interpnd.pyx", line 78, in scipy.interpolate.interpnd.NDInterpolatorBase.__init__

File "interpnd.pyx", line 192, in scipy.interpolate.interpnd._check_init_shape

ValueError: different number of values and points
```

I have played around a lot with changing the core dimensions in apply_ufunc and the dimension along which to chunk. Also I have tried to manually change the order of dimensions of the dataarray u which is 'fed to' griddata (in interp_to_grid).

Any advice is very welcome! Best Wishes, Luka
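One direction that might resolve the shape mismatch, sketched under my own assumptions (names match the snippets above, dask and scipy are installed, the target grid is 600 x 560, and the xarray version accepts dask_gufunc_kwargs): with dask='parallelized', apply_ufunc moves the core dims to the end of each block, so the wrapper has to move nFlowElem back to the front before calling griddata, move the interpolated grid dims to the end afterwards, and declare the sizes of the new output dims.

```python
import numpy as np
import xarray as xr
from scipy.interpolate import griddata

def interp_block(u, xc, yc, xint, yint):
    # u arrives as (time_chunk, laydim, nFlowElem): core dims are placed last.
    u = np.moveaxis(u, -1, 0)                           # -> (nFlowElem, time_chunk, laydim)
    ug = griddata((xc, yc), u, (xint, yint),
                  method='nearest', fill_value=np.nan)  # -> (dim_0, dim_1, time_chunk, laydim)
    return np.moveaxis(ug, (0, 1), (-2, -1))            # -> (time_chunk, laydim, dim_0, dim_1)

uxg = xr.apply_ufunc(
    interp_block,
    ux, xc, yc, xint, yint,
    dask='parallelized',
    input_core_dims=[['nFlowElem'], ['nFlowElem'], ['nFlowElem'],
                     ['dim_0', 'dim_1'], ['dim_0', 'dim_1']],
    output_core_dims=[['dim_0', 'dim_1']],
    dask_gufunc_kwargs={'output_sizes': {'dim_0': 600, 'dim_1': 560}},
    output_dtypes=[np.float64],
)
```

The result would come back as (time, laydim, dim_0, dim_1) and can be transposed afterwards if the (dim_0, dim_1, time, laydim) order is needed.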

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5281/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
856900805 MDU6SXNzdWU4NTY5MDA4MDU= 5148 Handling of non-string dimension names bcbnz 367900 open 0     5 2021-04-13T12:13:44Z 2022-04-09T01:36:19Z   CONTRIBUTOR      

While working on a pull request (#5149) for #5146 I came across an inconsistency in allowed dimension names. If I try and create a DataArray with a non-string dimension, I get a TypeError:

```python console
>>> import numpy as np
>>> import xarray as xr
>>> da = xr.DataArray(np.ones((5, 5)), dims=[1, "y"])
...
TypeError: dimension 1 is not a string
```

But creating it with a string and renaming it works:

```python console
>>> da = xr.DataArray(np.ones((5, 5)), dims=["x", "y"]).rename(x=1)
>>> da
<xarray.DataArray (1: 5, y: 5)>
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
Dimensions without coordinates: 1, y
```

I can create a dataset via this renaming, but trying to get the repr value fails as xarray.core.utils.SortedKeysDict tries to sort it and cannot compare the string dimension to the int dimension:

```python console

>>> import xarray as xr
>>> ds = xr.Dataset({"test": xr.DataArray(np.ones((5, 5)), dims=["x", "y"]).rename(x=1)})
>>> ds
...
~/software/external/xarray/xarray/core/formatting.py in dataset_repr(ds)
    519
    520     dims_start = pretty_print("Dimensions:", col_width)
--> 521     summary.append("{}({})".format(dims_start, dim_summary(ds)))
    522
    523     if ds.coords:

~/software/external/xarray/xarray/core/formatting.py in dim_summary(obj)
    422
    423 def dim_summary(obj):
--> 424     elements = [f"{k}: {v}" for k, v in obj.sizes.items()]
    425     return ", ".join(elements)
    426

~/software/external/xarray/xarray/core/formatting.py in <listcomp>(.0)
    422
    423 def dim_summary(obj):
--> 424     elements = [f"{k}: {v}" for k, v in obj.sizes.items()]
    425     return ", ".join(elements)
    426

/usr/lib/python3.9/_collections_abc.py in __iter__(self)
    847
    848     def __iter__(self):
--> 849         for key in self._mapping:
    850             yield (key, self._mapping[key])
    851

~/software/external/xarray/xarray/core/utils.py in __iter__(self)
    437
    438     def __iter__(self) -> Iterator[K]:
--> 439         return iter(self.mapping)
    440
    441     def __len__(self) -> int:

~/software/external/xarray/xarray/core/utils.py in __iter__(self)
    504     def __iter__(self) -> Iterator[K]:
    505         # see #4571 for the reason of the type ignore
--> 506         return iter(sorted(self.mapping))  # type: ignore[type-var]
    507
    508     def __len__(self) -> int:

TypeError: '<' not supported between instances of 'str' and 'int'
```

The same thing happens if I call rename on the dataset rather than the array it is initialised with.

If the initialiser requires the dimension names to be strings, and other code (which includes the HTML formatter I was looking at when I found this) assumes that they are, then rename and any other method which can alter dimension names should also enforce the string requirement.

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: 851d85b9203b49039237b447b3707b270d613db5 python: 3.9.2 (default, Feb 20 2021, 18:40:11) [GCC 10.2.0] python-bits: 64 OS: Linux OS-release: 5.11.13-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_NZ.UTF-8 LOCALE: en_NZ.UTF-8 libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 0.17.0 pandas: 1.2.3 numpy: 1.20.1 scipy: 1.6.2 netCDF4: 1.5.6 pydap: None h5netcdf: 0.10.0 h5py: 3.2.1 Nio: None zarr: None cftime: 1.4.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.2.2 cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.03.0 distributed: 2021.03.0 matplotlib: 3.4.1 cartopy: 0.18.0 seaborn: 0.11.1 numbagg: None pint: None setuptools: 54.2.0 pip: 20.3.1 conda: None pytest: 6.2.3 IPython: 7.22.0 sphinx: 3.5.4
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5148/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
910844095 MDU6SXNzdWU5MTA4NDQwOTU= 5434 xarray.open_rasterio ghost 10137 closed 0     2 2021-06-03T20:51:38Z 2022-04-09T01:31:26Z 2022-04-09T01:31:26Z NONE      

Could you please change xarray.open_rasterio from experimental to stable, with faster reading of geotiff files if possible? For the original array indexing capabilities, I would rather stick with xarray than rioxarray. With much respect, thank you.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5434/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1030768250 I_kwDOAMm_X849cEZ6 5877 Rolling() gives values different from pd.rolling() chiaral 8453445 open 0     4 2021-10-19T21:41:42Z 2022-04-09T01:29:07Z   CONTRIBUTOR      

I am not sure this is a bug - but it clearly doesn't give the results the user would expect.

The rolling sum of zeros gives me values that are not zeros

```python
import numpy as np
import xarray as xr

var = np.array([0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.31 , 0.91999996, 8.3 , 1.42 , 0.03 , 1.22 , 0.09999999, 0.14 , 0.13 , 0. , 0.12 , 0.03 , 2.53 , 0. , 0.19999999, 0.19999999, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ], dtype='float32')

timet = np.array([ 43200000000000, 129600000000000, 216000000000000, 302400000000000, 388800000000000, 475200000000000, 561600000000000, 648000000000000, 734400000000000, 820800000000000, 907200000000000, 993600000000000, 1080000000000000, 1166400000000000, 1252800000000000, 1339200000000000, 1425600000000000, 1512000000000000, 1598400000000000, 1684800000000000, 1771200000000000, 1857600000000000, 1944000000000000, 2030400000000000, 2116800000000000, 2203200000000000, 2289600000000000, 2376000000000000, 2462400000000000, 2548800000000000, 2635200000000000, 2721600000000000, 2808000000000000, 2894400000000000, 2980800000000000], dtype='timedelta64[ns]')

ds_ex = xr.Dataset(
    data_vars=dict(pr=(["time"], var)),
    coords=dict(time=("time", timet)),
)

ds_ex.rolling(time=3).sum().pr.values
```

it gives me this result:

array([ nan, nan, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 3.1000000e-01, 1.2300000e+00, 9.5300007e+00, 1.0640000e+01, 9.7500000e+00, 2.6700001e+00, 1.3500001e+00, 1.4600002e+00, 3.7000012e-01, 2.7000013e-01, 2.5000012e-01, 1.5000013e-01, 2.6800001e+00, 2.5600002e+00, 2.7300003e+00, 4.0000033e-01, 4.0000033e-01, 2.0000035e-01, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07], dtype=float32)

Note the non-zero values - the non-zero value changes depending on whether I use float64 or float32 as the precision of my data. So this seems to be a precision related issue (although the first values are correctly set to zero); in fact other sums of values are not exactly what they should be.

The small difference at the 8th/9th decimal position can be expected due to precision, but the fact that the 0s become non zeros is problematic imho, especially if not documented. Oftentimes zero in geoscience data can mean a very specific thing (i.e. zero rainfall will be characterized differently than non-zero).

in pandas this instead works:

df_ex = ds_ex.to_dataframe()
df_ex.rolling(window=3).sum().values.T

gives me

array([[ nan, nan, 0. , 0. , 0. , 0. , 0. , 0.31 , 1.22999996, 9.53000015, 10.6400001 , 9.75000015, 2.66999999, 1.35000001, 1.46000002, 0.36999998, 0.27 , 0.24999999, 0.15 , 2.67999997, 2.55999997, 2.72999996, 0.39999998, 0.39999998, 0.19999999, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ]])

What you expected to happen:

the sum of zeros should be zero. If this cannot be achieved/expected because of precision issues, it should be documented.

Anything else we need to know?:

I discovered this behavior in my old environments, but I created a new ad hoc environment with the latest versions, and it does the same thing.
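Two possible workarounds, sketched under my own assumptions (a recent xarray, and my guess that the running-sum shortcut taken by bottleneck is what accumulates the drift):

```python
import xarray as xr

# 1. build the rolling window explicitly and reduce over it
exact = ds_ex.rolling(time=3).construct("window").sum("window", skipna=False)

# 2. or disable bottleneck for the computation (option available in newer versions)
with xr.set_options(use_bottleneck=False):
    exact2 = ds_ex.rolling(time=3).sum()
```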

Environment:

INSTALLED VERSIONS

commit: None python: 3.9.7 (default, Sep 16 2021, 08:50:36) [Clang 10.0.0 ] python-bits: 64 OS: Darwin OS-release: 17.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None

xarray: 0.19.0 pandas: 1.3.3 numpy: 1.21.2 scipy: None netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 58.0.4 pip: 21.2.4 conda: None pytest: None IPython: 7.28.0 sphinx: None

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5877/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
653442225 MDU6SXNzdWU2NTM0NDIyMjU= 4209 `xr.save_mfdataset()` doesn't honor `compute=False` argument andersy005 13301940 open 0     4 2020-07-08T16:40:11Z 2022-04-09T01:25:56Z   MEMBER      

What happened:

While using the xr.save_mfdataset() function with compute=False, I noticed that the function returns a dask.delayed object, but it doesn't actually defer the computation, i.e. it writes the datasets right away.

What you expected to happen:

I expect the datasets to be written when I explicitly call .compute() on the returned delayed object.

Minimal Complete Verifiable Example:

```python
In [2]: import xarray as xr

In [3]: ds = xr.tutorial.open_dataset('rasm', chunks={})

In [4]: ds
Out[4]:
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (y, x) float64 dask.array<chunksize=(205, 275), meta=np.ndarray>
    yc       (y, x) float64 dask.array<chunksize=(205, 275), meta=np.ndarray>
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 dask.array<chunksize=(36, 205, 275), meta=np.ndarray>
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based on the initial model of Liang et al., 19...
    comment:                   Output from the Variable Infiltration Capacity...
    nco_openmp_thread_number:  1
    NCO:                       "4.6.0"
    history:                   Tue Dec 27 14:15:22 2016: ncatted -a dimension...

In [5]: path = "test.nc"

In [7]: ls -ltrh test.nc
ls: cannot access test.nc: No such file or directory

In [8]: tasks = xr.save_mfdataset(datasets=[ds], paths=[path], compute=False)

In [9]: tasks
Out[9]: Delayed('list-aa0b52e0-e909-4e65-849f-74526d137542')

In [10]: ls -ltrh test.nc
-rw-r--r-- 1 abanihi ncar 14K Jul  8 10:29 test.nc
```
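For reference, the deferred-write pattern I would expect to work (standard dask usage; the report above is that the file already exists before compute is called):

```python
import dask

tasks = xr.save_mfdataset(datasets=[ds], paths=[path], compute=False)
# nothing should have been written yet at this point
dask.compute(tasks)  # the actual write should happen here
```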

Anything else we need to know?:

Environment:

Output of <tt>xr.show_versions()</tt> ```python INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 3.10.0-693.21.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.4 xarray: 0.15.1 pandas: 0.25.3 numpy: 1.18.5 scipy: 1.5.0 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.2.0 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.20.0 distributed: 2.20.0 matplotlib: 3.2.1 cartopy: None seaborn: None numbagg: None setuptools: 49.1.0.post20200704 pip: 20.1.1 conda: None pytest: None IPython: 7.16.1 sphinx: None ```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4209/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
439875798 MDU6SXNzdWU0Mzk4NzU3OTg= 2937 encoding of boolean dtype in zarr rabernat 1197350 open 0     3 2019-05-03T03:53:27Z 2022-04-09T01:22:42Z   MEMBER      

I want to store an array with 1364688000 boolean values in zarr. I will have to read this array many times, so I am trying to do it as efficiently as possible.

I have noticed that, if we try to write boolean data to zarr from xarray, zarr stores it as i8. ~This means we are using 8x more memory than we actually need.~ In researching this, I actually learned that numpy bools use a full byte of memory 😲! However, we could still improve performance (albeit very marginally) by skipping the unnecessary dtype encoding that happens here.

Example

import xarray as xr
import zarr

for dtype in ['f8', 'i4', 'bool']:
    ds = xr.DataArray([1, 0]).astype(dtype).to_dataset('foo')
    store = {}
    ds.to_zarr(store)
    za = zarr.open(store)['foo']
    print(dtype, za.dtype, za.attrs.get('dtype'))

gives

f8 float64 None
i4 int32 None
bool int8 bool

So it seems like, during serialization of bool data, xarray is converting the data to int8 and then adding a {'dtype': 'bool'} to the attributes as encoding. When the data is read back, this gets decoded and the data is coerced back to bool.

Problem description

Since zarr is fully capable of storing bool data directly, we should not need to encode the data as i8.
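As a quick confirmation of that premise (standard zarr API, my own toy check):

```python
import numpy as np
import zarr

z = zarr.array(np.array([True, False]))  # zarr stores the bool dtype natively
print(z.dtype)                           # bool
```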

I think this happens in encode_cf_variable: https://github.com/pydata/xarray/blob/612d390f925e5490314c363e5e368b2a8bd5daf0/xarray/conventions.py#L236

which calls maybe_encode_bools: https://github.com/pydata/xarray/blob/612d390f925e5490314c363e5e368b2a8bd5daf0/xarray/conventions.py#L105-L112

So maybe we make the boolean encoding optional?

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-693.17.1.el7.centos.plus.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.8.18 libnetcdf: 4.4.1.1 xarray: 0.12.1 pandas: 0.20.3 numpy: 1.13.3 scipy: 1.1.0 netCDF4: 1.3.0 pydap: None h5netcdf: 0.5.0 h5py: 2.7.1 Nio: None zarr: 2.3.1 cftime: None nc_time_axis: None PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.2.1 dask: 0.19.0+3.g064ebb1 distributed: 1.21.8 matplotlib: 3.0.3 cartopy: 0.16.0 seaborn: 0.8.1 setuptools: 36.6.0 pip: 9.0.1 conda: None pytest: 3.2.1 IPython: 6.2.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2937/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
650549352 MDU6SXNzdWU2NTA1NDkzNTI= 4197 Provide a "shrink" command to remove bounding nan/ whitespace of DataArray cwerner 13906519 open 0     7 2020-07-03T11:55:05Z 2022-04-09T01:22:31Z   NONE      

I'm currently trying to come up with an elegant solution to remove extra whitespace/NaN values along the edges of a 2D DataArray. I'm working with geographic data and am searching for an automatic way to shrink the extent to valid data only. Think of a map of the EU, but remove all columns/rows of the array (starting from the edges) that only contain NaN.

Describe the solution you'd like
A shrink command that removes all NaN rows/cols at the edges of a DataArray.

Describe alternatives you've considered
I currently do this with NumPy, operating on the raw data and creating a new DataArray afterwards.
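For comparison, a sketch of one existing way to do this with plain xarray (my own toy example; note that dropna also removes all-NaN rows in the interior, not just at the edges):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.full((4, 5), np.nan), dims=("y", "x"))
da[1:3, 1:4] = 1.0

shrunk = da.dropna("y", how="all").dropna("x", how="all")
print(shrunk.shape)  # (2, 3)
```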

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4197/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
528168017 MDU6SXNzdWU1MjgxNjgwMTc= 3573 rasterio test failure dcherian 2448579 closed 0     1 2019-11-25T15:40:19Z 2022-04-09T01:17:32Z 2022-04-09T01:17:32Z MEMBER      

version rasterio 1.1.1 py36h900e953_0 conda-forge

```
=================================== FAILURES ===================================
_________________________ TestRasterio.test_rasterio_vrt _________________________

self = <xarray.tests.test_backends.TestRasterio object at 0x7fc8355c8f60>

def test_rasterio_vrt(self):
    import rasterio

    # tmp_file default crs is UTM: CRS({'init': 'epsg:32618'}
    with create_tmp_geotiff() as (tmp_file, expected):
        with rasterio.open(tmp_file) as src:
            with rasterio.vrt.WarpedVRT(src, crs="epsg:4326") as vrt:
                expected_shape = (vrt.width, vrt.height)
                expected_crs = vrt.crs
                expected_res = vrt.res
                # Value of single pixel in center of image
                lon, lat = vrt.xy(vrt.width // 2, vrt.height // 2)
              expected_val = next(vrt.sample([(lon, lat)]))

xarray/tests/test_backends.py:3966:


/usr/share/miniconda/envs/xarray-tests/lib/python3.6/site-packages/rasterio/sample.py:43: in sample_gen data = read(indexes, window=window, masked=masked, boundless=True)


??? E ValueError: WarpedVRT does not permit boundless reads

rasterio/_warp.pyx:978: ValueError
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3573/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
504497403 MDU6SXNzdWU1MDQ0OTc0MDM= 3386 add option to open_mfdataset for not using dask sipposip 42270910 closed 0     6 2019-10-09T08:33:53Z 2022-04-09T01:16:21Z 2022-04-09T01:16:21Z NONE      

open_mfdataset only works with dask, whereas with open_dataset one can choose whether or not to use dask. It would be nice to have an option (e.g. use_dask=False) to not use dask.

My special use-case is the following: I use netcdf data as input for a tensorflow/keras application, with parallel preprocessing threads in Keras. When using dask arrays, it gets complicated because both dask and tensorflow work with threads. I do not need any processing capability of dask/xarray, I only need a lazily loaded array that I can slice, and where the slices are loaded the moment they are accessed. My application works nicely with open_dataset (without defining chunks, and thus not using dask; the data is accessed slice by slice, so it is never loaded as a whole into memory). However, it would be nice to have the same with open_mfdataset. Right now my workaround is to use netCDF4.MFDataset. (Obviously another workaround would be to concatenate my files into one and use open_dataset.) Opening each file separately with open_dataset and then concatenating them with xr.concat does not work, as this loads the data into memory.
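For reference, the netCDF4.MFDataset workaround mentioned above, sketched with a hypothetical file pattern and variable name:

```python
import netCDF4

nc = netCDF4.MFDataset("data_*.nc")  # hypothetical multi-file pattern
var = nc.variables["t2m"]            # hypothetical variable name
batch = var[0:32]                    # only this slice is read from disk
```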

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3386/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);