
issues


58 rows where type = "issue" and "updated_at" is on date 2022-04-09 sorted by updated_at descending


state 2

  • open 30
  • closed 28

type 1

  • issue 58

repo 1

  • xarray 58
Columns: id, node_id, number, title, user, state, locked, assignee, milestone, comments, created_at, updated_at, closed_at, author_association, active_lock_reason, draft, pull_request, body, reactions, performed_via_github_app, state_reason, repo, type
1177665302 I_kwDOAMm_X85GMb8W 6401 Unnecessary warning when specifying `chunks` opening dataset with empty dimension jaicher 4666753 closed 0     0 2022-03-23T06:38:25Z 2022-04-09T20:27:40Z 2022-04-09T20:27:40Z CONTRIBUTOR      

What happened?

I receive unnecessary warnings when opening Zarr datasets with empty dimensions/arrays using the chunks argument (for a non-empty dimension).

If an array has zero size (due to an empty dimension), it is saved as a single chunk regardless of Dask chunking on other dimensions (#5742). If the chunks parameter is then provided for the other dimensions when loading the Zarr file (based on the chunk sizes the array would have if it were nonempty), xarray warns about potentially degraded performance from splitting that single chunk.

What did you expect to happen?

I expect no warning to be raised when there is no data:

  • Performance degradation on an empty array should be negligible.
  • We don't always know whether one of the dimensions is empty until loading, but we would still use the chunks parameter for dimensions with consistent chunk sizes (to specify a multiple of what's on disk) -- this is thrown off when other dimensions are empty.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

# each a is expected to be chunked separately
ds = xr.Dataset({"x": (("a", "b"), np.empty((4, 0)))}).chunk({"a": 1})

# but when we save it, it gets saved as a single chunk
ds.to_zarr("tmp.zarr")

# so if we open it up with expected chunksizes (not knowing that b is empty):
ds2 = xr.open_zarr("tmp.zarr", chunks={"a": 1})

# we get a warning :(
```

Relevant log output

```Python
{...}/miniconda3/envs/new-majiq/lib/python3.8/site-packages/xarray/core/dataset.py:410: UserWarning: Specified Dask chunks (1, 1, 1, 1) would separate on disk chunk shape 4 for dimension a. This could degrade performance. (chunks = {'a': (1, 1, 1, 1), 'b': (0,)}, preferred_chunks = {'a': 4, 'b': 1}). Consider rechunking after loading instead.
  _check_chunks_compatibility(var, output_chunks, preferred_chunks)
```

Anything else we need to know?

This can be fixed by calling _check_chunks_compatibility() only when var is nonempty (PR forthcoming).
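A minimal sketch of that guard (the surrounding call site is an assumption; only the helper name comes from the warning above):

```Python
# Hypothetical sketch: skip the chunk-compatibility check for zero-size
# variables, since splitting an empty chunk cannot degrade performance.
if var.size > 0:
    _check_chunks_compatibility(var, output_chunks, preferred_chunks)
```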

Environment

INSTALLED VERSIONS

commit: None python: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:42:07) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 5.4.72-microsoft-standard-WSL2 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1 libnetcdf: None

xarray: 2022.3.0 pandas: 1.4.1 numpy: 1.22.2 scipy: 1.8.0 netCDF4: None pydap: None h5netcdf: None h5py: 3.6.0 Nio: None zarr: 2.11.1 cftime: 1.5.1.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.4 dask: 2022.01.0 distributed: 2022.01.0 matplotlib: 3.5.1 cartopy: None seaborn: 0.11.2 numbagg: None fsspec: 2022.01.0 cupy: None pint: None sparse: None setuptools: 59.8.0 pip: 22.0.4 conda: None pytest: 7.0.1 IPython: 8.1.1 sphinx: 4.4.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6401/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1167883842 I_kwDOAMm_X85FnH5C 6352 to_netcdf from subsetted Dataset with strings loaded from char array netCDF can sometimes fail DocOtak 868027 open 0     0 2022-03-14T04:52:38Z 2022-04-09T16:59:52Z   CONTRIBUTOR      

What happened?

Not quite sure what to actually title this, so feel free to edit it.

I have some netCDF files modeled after the Argo _prof file format (CF discrete sampling geometry, incomplete multidimensional array representation). While working on splitting these into individual profiles, I would occasionally get exceptions complaining about broadcasting. I eventually narrowed this down to some string variables we maintain for historical purposes. Depending on the row being split out, the string data in each cell could be shorter, which would result in a stringN dimension with a different N (e.g. string4 = 3 in the CDL). If, while serializing, a different string variable that actually has length 4 is being encoded, it would reuse the now-incorrect string4 dim name.

The above situation seems to only occur when a netCDF file is read back into xarray and the char_dim_name encoding key is set.

What did you expect to happen?

Successful serialization to netCDF.

Minimal Complete Verifiable Example

```Python
# setup
import numpy as np
import xarray as xr

one_two = xr.DataArray(np.array(["a", "aa"], dtype="object"), dims=["dim0"])
two_two = xr.DataArray(np.array(["aa", "aa"], dtype="object"), dims=["dim0"])
ds = xr.Dataset({"var0": one_two, "var1": two_two})
ds.var0.encoding["dtype"] = "S1"
ds.var1.encoding["dtype"] = "S1"

# need to write out and read back in
ds.to_netcdf("test.nc")

# only selecting the shorter string will fail
ds1 = xr.load_dataset("test.nc")
ds1[{"dim0": 1}].to_netcdf("ok.nc")
ds1[{"dim0": 0}].to_netcdf("error.nc")

# will work if the char dim name is removed from encoding of the now shorter arr
ds1 = xr.load_dataset("test.nc")
del ds1.var0.encoding["char_dim_name"]
ds1[{"dim0": 0}].to_netcdf("will_work.nc")
```

Relevant log output

```Python

IndexError Traceback (most recent call last) /var/folders/y1/63dlf4614h5d2cgr5g1t_5lh0000gn/T/ipykernel_64155/447008818.py in <module> 2 ds1 = xr.load_dataset("test.nc") 3 ds1[{"dim0": 1}].to_netcdf("ok.nc") ----> 4 ds1[{"dim0": 0}].to_netcdf("error.nc")

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 1899 from ..backends.api import to_netcdf 1900 -> 1901 return to_netcdf( 1902 self, 1903 path,

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1070 # TODO: allow this work (setting up the file for writing array data) 1071 # to be parallelized with dask -> 1072 dump_to_store( 1073 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims 1074 )

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims) 1117 variables, attrs = encoder(variables, attrs) 1118 -> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) 1120 1121

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims) 263 self.set_attributes(attributes) 264 self.set_dimensions(variables, unlimited_dims=unlimited_dims) --> 265 self.set_variables( 266 variables, check_encoding_set, writer, unlimited_dims=unlimited_dims 267 )

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims) 305 ) 306 --> 307 writer.add(source, target) 308 309 def set_dimensions(self, variables, unlimited_dims=None):

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in add(self, source, target, region) 154 target[region] = source 155 else: --> 156 target[...] = source 157 158 def sync(self, compute=True):

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/netCDF4_.py in __setitem__(self, key, value) 70 with self.datastore.lock: 71 data = self.get_array(needs_lock=False) ---> 72 data[key] = value 73 if self.datastore.autoclose: 74 self.datastore.close(needs_lock=False)

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__setitem__()

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._put()

IndexError: size of data array does not conform to slice
```

Anything else we need to know?

I've been unable to recreate the specific error I'm getting in a minimal example. However, removing the char_dim_name encoding key does solve this.
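A hedged, more general form of that workaround (an illustration, not from the report; it assumes dropping char_dim_name from every variable and letting the backend recompute it is acceptable):

```Python
# Drop the remembered char dimension name from every variable's encoding
# before writing, so the backend picks names that match the current string widths.
for var in ds1.variables.values():
    var.encoding.pop("char_dim_name", None)
ds1[{"dim0": 0}].to_netcdf("will_work_too.nc")
```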

When digging through the xarray issues, these looked possibly relevant: #2219 #2895

Actual traceback I get with my data:

```python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/y1/63dlf4614h5d2cgr5g1t_5lh0000gn/T/ipykernel_64155/3328648456.py in <module> ----> 1 ds[{"N_PROF": 0}].to_netcdf("test.nc")
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 1899 from ..backends.api import to_netcdf 1900 -> 1901 return to_netcdf( 1902 self, 1903 path,
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1070 # TODO: allow this work (setting up the file for writing array data) 1071 # to be parallelized with dask -> 1072 dump_to_store( 1073 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims 1074 )
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims) 1117 variables, attrs = encoder(variables, attrs) 1118 -> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) 1120 1121
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims) 263 self.set_attributes(attributes) 264 self.set_dimensions(variables, unlimited_dims=unlimited_dims) --> 265 self.set_variables( 266 variables, check_encoding_set, writer, unlimited_dims=unlimited_dims 267 )
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims) 305 ) 306 --> 307 writer.add(source, target) 308 309 def set_dimensions(self, variables, unlimited_dims=None):
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in add(self, source, target, region) 154 target[region] = source 155 else: --> 156 target[...] = source 157 158 def sync(self, compute=True):
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/netCDF4_.py in __setitem__(self, key, value) 70 with self.datastore.lock: 71 data = self.get_array(needs_lock=False) ---> 72 data[key] = value 73 if self.datastore.autoclose: 74 self.datastore.close(needs_lock=False)
src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__setitem__()
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/netCDF4/utils.py in _StartCountStride(elem, shape, dimensions, grp, datashape, put, use_get_vars) 354 fullslice = False 355 if fullslice and datashape and put and not hasunlim: --> 356 datashape = broadcasted_shape(shape, datashape) 357 358 # pad datashape with zeros for dimensions not being sliced (issue #906)
~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/netCDF4/utils.py in broadcasted_shape(shp1, shp2) 962 a = as_strided(x, shape=shp1, strides=[0] * len(shp1)) 963 b = as_strided(x, shape=shp2, strides=[0] * len(shp2)) --> 964 return np.broadcast(a, b).shape
ValueError: shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (5,) and arg 1 with shape (6,).
```

Environment

INSTALLED VERSIONS

commit: None python: 3.9.9 (main, Jan 5 2022, 11:21:18) [Clang 13.0.0 (clang-1300.0.29.30)] python-bits: 64 OS: Darwin OS-release: 21.3.0 machine: arm64 processor: arm byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.13.0 libnetcdf: 4.8.1

xarray: 2022.3.0 pandas: 1.3.5 numpy: 1.22.0 scipy: None netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.5.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: 0.18 sparse: None setuptools: 58.1.0 pip: 21.2.4 conda: None pytest: 6.2.5 IPython: 7.31.0 sphinx: 4.4.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6352/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
558455147 MDU6SXNzdWU1NTg0NTUxNDc= 3740 Error during slicing of a dataarray ankitesh97 16163706 closed 0     1 2020-02-01T01:26:55Z 2022-04-09T15:52:32Z 2022-04-09T15:52:31Z NONE      

MCVE Code Sample

```python
# loaded the dataset using
ds = xr.open_mfdataset(in_fns, decode_times=False, decode_cf=False, concat_dim='time')
```

Expected Output

Problem Description

This is my data array (da):

```
<xarray.DataArray 'QAP' (time: 5184, lev: 30, lat: 64, lon: 128)>
dask.array<concatenate, shape=(5184, 30, 64, 128), dtype=float32, chunksize=(48, 30, 64, 128), chunktype=numpy.ndarray>
Coordinates:
  * lev      (lev) float64 3.643 7.595 14.36 ... 957.5 976.3 992.6
  * lon      (lon) float64 0.0 2.812 5.625 8.438 ... 351.6 354.4 357.2
  * lat      (lat) float64 -87.86 -85.1 -82.31 ... 82.31 85.1 87.86
  * time     (time) float64 365.0 365.0 365.0 ... 707.9 708.0 708.0
Attributes:
    units:      kg/kg
    long_name:  Q after physics
```

When I try to slice it via da[1:], it throws an error saying: conflicting sizes for dimension 'time': length 96 on 'this-array' and length 5183 on 'time'

Output of xr.show_versions()

version = 0.14.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3740/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
596606599 MDU6SXNzdWU1OTY2MDY1OTk= 3957 Sort DataArray by data values along one dim zxdawn 30388627 closed 0     10 2020-04-08T14:05:44Z 2022-04-09T15:52:20Z 2022-04-09T15:52:20Z NONE      

.sortby() only supports sorting DataArray by coords values. I'm trying to sort one DataArray (cld) by data values along one dim and sort another DataArray (pair) by the same order.

MCVE Code Sample

```python
import xarray as xr
import numpy as np

x = 4
y = 2
z = 4
data = np.arange(x*y*z).reshape(z, y, x)

# 3d array with coords
cld_1 = xr.DataArray(data, dims=['z', 'y', 'x'], coords={'z': np.arange(z)})

# 2d array without coords
cld_2 = xr.DataArray(np.arange(x*y).reshape(y, x)*1.5+1, dims=['y', 'x'])

# expand 2d to 3d
cld_2 = cld_2.expand_dims(z=[4])

# concat
cld = xr.concat([cld_1, cld_2], dim='z')

# paired array
pair = cld.copy(data=np.arange(x*y*(z+1)).reshape(z+1, y, x))

print(cld)
print(pair)
```

Output

```
<xarray.DataArray (z: 5, y: 2, x: 4)>
array([[[ 0. ,  1. ,  2. ,  3. ],
        [ 4. ,  5. ,  6. ,  7. ]],

       [[ 8. ,  9. , 10. , 11. ],
        [12. , 13. , 14. , 15. ]],

       [[16. , 17. , 18. , 19. ],
        [20. , 21. , 22. , 23. ]],

       [[24. , 25. , 26. , 27. ],
        [28. , 29. , 30. , 31. ]],

       [[ 1. ,  2.5,  4. ,  5.5],
        [ 7. ,  8.5, 10. , 11.5]]])
Coordinates:
  * z        (z) int64 0 1 2 3 4
Dimensions without coordinates: y, x

<xarray.DataArray (z: 5, y: 2, x: 4)>
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]],

       [[16, 17, 18, 19],
        [20, 21, 22, 23]],

       [[24, 25, 26, 27],
        [28, 29, 30, 31]],

       [[32, 33, 34, 35],
        [36, 37, 38, 39]]])
Coordinates:
  * z        (z) int64 0 1 2 3 4
Dimensions without coordinates: y, x
```

Problem Description

I've tried argsort(): cld.argsort(axis=0), but the result is wrong:

```
<xarray.DataArray (z: 5, y: 2, x: 4)>
array([[[0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[4, 4, 4, 4],
        [4, 4, 4, 4]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[2, 2, 2, 2],
        [2, 2, 2, 2]],

       [[3, 3, 3, 3],
        [3, 3, 3, 3]]], dtype=int64)
Coordinates:
  * z        (z) int64 0 1 2 3 4
Dimensions without coordinates: y, x
```
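One possible approach (my sketch using the MCVE variables above; np.take_along_axis is an assumption, not something proposed in the issue): argsort gives the index order along z, and the same indices can reorder both arrays.

```python
import numpy as np

# index order along "z" at every (y, x) position, by the values of cld
order = cld.argsort(axis=0)

# reorder both arrays with the same indices, keeping dims and coords intact
cld_sorted = cld.copy(data=np.take_along_axis(cld.values, order.values, axis=0))
pair_sorted = pair.copy(data=np.take_along_axis(pair.values, order.values, axis=0))
```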

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3957/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
621177286 MDU6SXNzdWU2MjExNzcyODY= 4082 "write to read-only" Error in xarray.open_mfdataset() with opendap datasets EliT1626 65610153 closed 0     26 2020-05-19T18:00:58Z 2022-04-09T15:51:46Z 2022-04-09T15:51:46Z NONE      

Error loading data from a THREDDS server. I can't find any info on what might be causing it based on the error messages themselves.

Code Sample

```
def list_dates(start, end):
    num_days = (end - start).days
    return [start + dt.timedelta(days=x) for x in range(num_days)]

start_date = dt.date(2017, 3, 1)
end_date = dt.date(2017, 3, 31)
date_list = list_dates(start_date, end_date)
window = dt.timedelta(days=5)

url = 'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.0/AVHRR/{0:%Y%m}/avhrr-only-v2.{0:%Y%m%d}.nc'
data = []
cur_date = start_date
for cur_date in date_list:
    date_window = list_dates(cur_date - window, cur_date + window)
    url_list = [url.format(x) for x in date_window]
    window_data = xr.open_mfdataset(url_list).sst
    data.append(window_data.mean('time'))

dataf = xr.concat(data, dim=pd.DatetimeIndex(date_list, name='time'))
```

Expected Output

No error, with dataf containing a data array with the dates listed above.

Error Description

Error 1:

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.0/AVHRR/201703/avhrr-only-v2.20170322.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

Error 2: OSError: [Errno -37] NetCDF: Write to read only: b'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.0/AVHRR/201703/avhrr-only-v2.20170322.nc'

Versions python: 3.7.4 xarray: 0.15.0 pandas: 0.25.1 numpy: 1.16.5 scipy: 1.3.1 netcdf4: 1.5.3

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4082/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
643035732 MDU6SXNzdWU2NDMwMzU3MzI= 4169 "write to read-only" Error in xarray.open_mfdataset() when trying to write to a netcdf file EliT1626 65610153 closed 0     4 2020-06-22T12:35:57Z 2022-04-09T15:50:51Z 2022-04-09T15:50:51Z NONE      

Code Sample

```
xr.set_options(file_cache_maxsize=10)

# Assumes daily increments
def list_dates(start, end):
    num_days = (end - start).days
    return [start + dt.timedelta(days=x) for x in range(num_days)]

def list_dates1(start, end):
    num_days = (end - start).days
    dates = [start + dt.timedelta(days=x) for x in range(num_days)]
    sorted_dates = sorted(dates, key=lambda date: (date.month, date.day))
    grouped_dates = [list(g) for _, g in groupby(sorted_dates, key=lambda date: (date.month, date.day))]
    return grouped_dates

start_date = dt.date(2010, 1, 1)
end_date = dt.date(2019, 12, 31)
date_list = list_dates1(start_date, end_date)
window1 = dt.timedelta(days=5)
window2 = dt.timedelta(days=6)

url = 'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/{0:%Y%m}/oisst-avhrr-v02r01.{0:%Y%m%d}.nc'
end_date2 = dt.date(2010, 1, 2)

sst_mean = []
cur_date = start_date

for cur_date in date_list:
    sst_mean_calc = []
    for i in cur_date:
        date_window = list_dates(i - window1, i + window2)
        url_list_window = [url.format(x) for x in date_window]
        window_data = xr.open_mfdataset(url_list_window).sst
        sst_mean_calc.append(window_data.mean('time'))
    sst_mean.append(xr.concat(sst_mean_calc, dim='time').mean('time'))
    cur_date += cur_date
    if cur_date[0] >= end_date2:
        break
    else:
        continue

sst_mean_climo_test = xr.concat(sst_mean, dim='time')

sst_std = xr.concat(sst_std_calc, dim=pd.DatetimeIndex(date_list, name='time'))

sst_min = xr.concat(sst_min_calc, dim=pd.DatetimeIndex(date_list, name='time'))

sst_max = xr.concat(sst_max_calc, dim=pd.DatetimeIndex(date_list, name='time'))

sst_mean_climo_test.to_netcdf(path='E:/Riskpulse_HD/SST_stuff/sst_mean_climo_test')
```

Explanation of Code

This code (climatology for SSTs) creates a list of dates between the specified start and end dates that contains the same day number for every month through the year span. For example, date_list[0] contains 10 datetime dates that start with 1-1-2010, 1-1-2011...1-1-2019. I then request OISST data from an opendap server and take a centered mean of the date in question (in this case I did it for the first and second of January). In other words, I am opening the files for Dec 27-Jan 6 and averaging all of them together. The final xarray dataset then contains two 'times', which is 10 years worth of data for Jan 1 and Jan 2. I want to then send this to a netcdf file so that I can save it on my local machine and use it to create plots down the road. Hope this makes sense.

Error Messages

```
KeyError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock) 197 try: --> 198 file = self._cache[self._key] 199 except KeyError:

~\Anaconda3\lib\site-packages\xarray\backends\lru_cache.py in getitem(self, key) 52 with self._lock: ---> 53 value = self._cache[key] 54 self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/201801/oisst-avhrr-v02r01.20180106.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

RuntimeError Traceback (most recent call last) <ipython-input-3-f8395dcffb5e> in <module> 1 #xr.set_options(file_cache_maxsize=500) ----> 2 sst_mean_climo_test.to_netcdf(path='E:/Riskpulse_HD/SST_stuff/sst_mean_climo_test')

~\Anaconda3\lib\site-packages\xarray\core\dataarray.py in to_netcdf(self, args, kwargs) 2356 dataset = self.to_dataset() 2357 -> 2358 return dataset.to_netcdf(args, **kwargs) 2359 2360 def to_dict(self, data: bool = True) -> dict:

~\Anaconda3\lib\site-packages\xarray\core\dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 1552 unlimited_dims=unlimited_dims, 1553 compute=compute, -> 1554 invalid_netcdf=invalid_netcdf, 1555 ) 1556

~\Anaconda3\lib\site-packages\xarray\backends\api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1095 return writer, store 1096 -> 1097 writes = writer.sync(compute=compute) 1098 1099 if path_or_file is None:

~\Anaconda3\lib\site-packages\xarray\backends\common.py in sync(self, compute) 202 compute=compute, 203 flush=True, --> 204 regions=self.regions, 205 ) 206 self.sources = []

~\Anaconda3\lib\site-packages\dask\array\core.py in store(sources, targets, lock, regions, compute, return_stored, kwargs) 943 944 if compute: --> 945 result.compute(kwargs) 946 return None 947 else:

~\Anaconda3\lib\site-packages\dask\base.py in compute(self, kwargs) 164 dask.base.compute 165 """ --> 166 (result,) = compute(self, traverse=False, kwargs) 167 return result 168

~\Anaconda3\lib\site-packages\dask\base.py in compute(args, kwargs) 442 postcomputes.append(x.dask_postcompute()) 443 --> 444 results = schedule(dsk, keys, kwargs) 445 return repack([f(r, a) for r, (f, a) in zip(results, postcomputes)]) 446

~\Anaconda3\lib\site-packages\dask\threaded.py in get(dsk, result, cache, num_workers, pool, kwargs) 82 get_id=_thread_get_id, 83 pack_exception=pack_exception, ---> 84 kwargs 85 ) 86

~\Anaconda3\lib\site-packages\dask\local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs) 484 _execute_task(task, data) # Re-execute locally 485 else: --> 486 raise_exception(exc, tb) 487 res, worker_id = loads(res_info) 488 state["cache"][key] = res

~\Anaconda3\lib\site-packages\dask\local.py in reraise(exc, tb) 314 if exc.traceback is not tb: 315 raise exc.with_traceback(tb) --> 316 raise exc 317 318

~\Anaconda3\lib\site-packages\dask\local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception) 220 try: 221 task, data = loads(task_info) --> 222 result = _execute_task(task, data) 223 id = get_id() 224 result = dumps((result, id))

~\Anaconda3\lib\site-packages\dask\core.py in _execute_task(arg, cache, dsk) 119 # temporaries by their reference count and can execute certain 120 # operations in-place. --> 121 return func(*(_execute_task(a, cache) for a in args)) 122 elif not ishashable(arg): 123 return arg

~\Anaconda3\lib\site-packages\dask\array\core.py in getter(a, b, asarray, lock) 98 c = a[b] 99 if asarray: --> 100 c = np.asarray(c) 101 finally: 102 if lock:

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 489 490 def array(self, dtype=None): --> 491 return np.asarray(self.array, dtype=dtype) 492 493 def getitem(self, key):

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 651 652 def array(self, dtype=None): --> 653 return np.asarray(self.array, dtype=dtype) 654 655 def getitem(self, key):

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 555 def array(self, dtype=None): 556 array = as_indexable(self.array) --> 557 return np.asarray(array[self.key], dtype=None) 558 559 def transpose(self, order):

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\coding\variables.py in array(self, dtype) 70 71 def array(self, dtype=None): ---> 72 return self.func(self.array) 73 74 def repr(self):

~\Anaconda3\lib\site-packages\xarray\coding\variables.py in _scale_offset_decoding(data, scale_factor, add_offset, dtype) 216 217 def _scale_offset_decoding(data, scale_factor, add_offset, dtype): --> 218 data = np.array(data, dtype=dtype, copy=True) 219 if scale_factor is not None: 220 data *= scale_factor

~\Anaconda3\lib\site-packages\xarray\coding\variables.py in array(self, dtype) 70 71 def array(self, dtype=None): ---> 72 return self.func(self.array) 73 74 def repr(self):

~\Anaconda3\lib\site-packages\xarray\coding\variables.py in _apply_mask(data, encoded_fill_values, decoded_fill_value, dtype) 136 ) -> np.ndarray: 137 """Mask all matching values in a NumPy arrays.""" --> 138 data = np.asarray(data, dtype=dtype) 139 condition = False 140 for fv in encoded_fill_values:

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 555 def array(self, dtype=None): 556 array = as_indexable(self.array) --> 557 return np.asarray(array[self.key], dtype=None) 558 559 def transpose(self, order):

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in getitem(self, key) 71 def getitem(self, key): 72 return indexing.explicit_indexing_adapter( ---> 73 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem 74 ) 75

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method) 835 """ 836 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support) --> 837 result = raw_indexing_method(raw_key.tuple) 838 if numpy_indices.tuple: 839 # index the loaded np.ndarray

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in _getitem(self, key) 82 try: 83 with self.datastore.lock: ---> 84 original_array = self.get_array(needs_lock=False) 85 array = getitem(original_array, key) 86 except IndexError:

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in get_array(self, needs_lock) 61 62 def get_array(self, needs_lock=True): ---> 63 ds = self.datastore._acquire(needs_lock) 64 variable = ds.variables[self.variable_name] 65 variable.set_auto_maskandscale(False)

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in _acquire(self, needs_lock) 359 360 def _acquire(self, needs_lock=True): --> 361 with self._manager.acquire_context(needs_lock) as root: 362 ds = _nc4_require_group(root, self._group, self._mode) 363 return ds

~\Anaconda3\lib\contextlib.py in enter(self) 110 del self.args, self.kwds, self.func 111 try: --> 112 return next(self.gen) 113 except StopIteration: 114 raise RuntimeError("generator didn't yield") from None

~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in acquire_context(self, needs_lock) 184 def acquire_context(self, needs_lock=True): 185 """Context manager for acquiring a file.""" --> 186 file, cached = self._acquire_with_cache_info(needs_lock) 187 try: 188 yield file

~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock) 206 # ensure file doesn't get overriden when opened again 207 self._mode = "a" --> 208 self._cache[self._key] = file 209 return file, False 210 else:

~\Anaconda3\lib\site-packages\xarray\backends\lru_cache.py in setitem(self, key, value) 71 elif self._maxsize: 72 # make room if necessary ---> 73 self._enforce_size_limit(self._maxsize - 1) 74 self._cache[key] = value 75 elif self._on_evict is not None:

~\Anaconda3\lib\site-packages\xarray\backends\lru_cache.py in _enforce_size_limit(self, capacity) 61 key, value = self._cache.popitem(last=False) 62 if self._on_evict is not None: ---> 63 self._on_evict(key, value) 64 65 def setitem(self, key: K, value: V) -> None:

~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in <lambda>(k, v) 12 # Global cache for storing open files. 13 FILE_CACHE: LRUCache[str, io.IOBase] = LRUCache( ---> 14 maxsize=cast(int, OPTIONS["file_cache_maxsize"]), on_evict=lambda k, v: v.close() 15 ) 16 assert FILE_CACHE.maxsize, "file cache must be at least size one"

netCDF4_netCDF4.pyx in netCDF4._netCDF4.Dataset.close()

netCDF4_netCDF4.pyx in netCDF4._netCDF4.Dataset._close()

netCDF4_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: HDF error
```

I also tried setting xr.set_options(file_cache_maxsize=500) outside of the loop before trying to create the netcdf file and received this error:

```
KeyError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock) 197 try: --> 198 file = self._cache[self._key] 199 except KeyError:

~\Anaconda3\lib\site-packages\xarray\backends\lru_cache.py in getitem(self, key) 52 with self._lock: ---> 53 value = self._cache[key] 54 self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/201512/oisst-avhrr-v02r01.20151231.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last) <ipython-input-4-474cdce51e60> in <module> 1 xr.set_options(file_cache_maxsize=500) ----> 2 sst_mean_climo_test.to_netcdf(path='E:/Riskpulse_HD/SST_stuff/sst_mean_climo_test')

~\Anaconda3\lib\site-packages\xarray\core\dataarray.py in to_netcdf(self, args, kwargs) 2356 dataset = self.to_dataset() 2357 -> 2358 return dataset.to_netcdf(args, **kwargs) 2359 2360 def to_dict(self, data: bool = True) -> dict:

~\Anaconda3\lib\site-packages\xarray\core\dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 1552 unlimited_dims=unlimited_dims, 1553 compute=compute, -> 1554 invalid_netcdf=invalid_netcdf, 1555 ) 1556

~\Anaconda3\lib\site-packages\xarray\backends\api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1095 return writer, store 1096 -> 1097 writes = writer.sync(compute=compute) 1098 1099 if path_or_file is None:

~\Anaconda3\lib\site-packages\xarray\backends\common.py in sync(self, compute) 202 compute=compute, 203 flush=True, --> 204 regions=self.regions, 205 ) 206 self.sources = []

~\Anaconda3\lib\site-packages\dask\array\core.py in store(sources, targets, lock, regions, compute, return_stored, kwargs) 943 944 if compute: --> 945 result.compute(kwargs) 946 return None 947 else:

~\Anaconda3\lib\site-packages\dask\base.py in compute(self, kwargs) 164 dask.base.compute 165 """ --> 166 (result,) = compute(self, traverse=False, kwargs) 167 return result 168

~\Anaconda3\lib\site-packages\dask\base.py in compute(args, kwargs) 442 postcomputes.append(x.dask_postcompute()) 443 --> 444 results = schedule(dsk, keys, kwargs) 445 return repack([f(r, a) for r, (f, a) in zip(results, postcomputes)]) 446

~\Anaconda3\lib\site-packages\dask\threaded.py in get(dsk, result, cache, num_workers, pool, kwargs) 82 get_id=_thread_get_id, 83 pack_exception=pack_exception, ---> 84 kwargs 85 ) 86

~\Anaconda3\lib\site-packages\dask\local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs) 484 _execute_task(task, data) # Re-execute locally 485 else: --> 486 raise_exception(exc, tb) 487 res, worker_id = loads(res_info) 488 state["cache"][key] = res

~\Anaconda3\lib\site-packages\dask\local.py in reraise(exc, tb) 314 if exc.traceback is not tb: 315 raise exc.with_traceback(tb) --> 316 raise exc 317 318

~\Anaconda3\lib\site-packages\dask\local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception) 220 try: 221 task, data = loads(task_info) --> 222 result = _execute_task(task, data) 223 id = get_id() 224 result = dumps((result, id))

~\Anaconda3\lib\site-packages\dask\core.py in _execute_task(arg, cache, dsk) 119 # temporaries by their reference count and can execute certain 120 # operations in-place. --> 121 return func(*(_execute_task(a, cache) for a in args)) 122 elif not ishashable(arg): 123 return arg

~\Anaconda3\lib\site-packages\dask\array\core.py in getter(a, b, asarray, lock) 98 c = a[b] 99 if asarray: --> 100 c = np.asarray(c) 101 finally: 102 if lock:

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 489 490 def array(self, dtype=None): --> 491 return np.asarray(self.array, dtype=dtype) 492 493 def getitem(self, key):

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 651 652 def array(self, dtype=None): --> 653 return np.asarray(self.array, dtype=dtype) 654 655 def getitem(self, key):

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 555 def array(self, dtype=None): 556 array = as_indexable(self.array) --> 557 return np.asarray(array[self.key], dtype=None) 558 559 def transpose(self, order):

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\coding\variables.py in array(self, dtype) 70 71 def array(self, dtype=None): ---> 72 return self.func(self.array) 73 74 def repr(self):

~\Anaconda3\lib\site-packages\xarray\coding\variables.py in _scale_offset_decoding(data, scale_factor, add_offset, dtype) 216 217 def _scale_offset_decoding(data, scale_factor, add_offset, dtype): --> 218 data = np.array(data, dtype=dtype, copy=True) 219 if scale_factor is not None: 220 data *= scale_factor

~\Anaconda3\lib\site-packages\xarray\coding\variables.py in array(self, dtype) 70 71 def array(self, dtype=None): ---> 72 return self.func(self.array) 73 74 def repr(self):

~\Anaconda3\lib\site-packages\xarray\coding\variables.py in _apply_mask(data, encoded_fill_values, decoded_fill_value, dtype) 136 ) -> np.ndarray: 137 """Mask all matching values in a NumPy arrays.""" --> 138 data = np.asarray(data, dtype=dtype) 139 condition = False 140 for fv in encoded_fill_values:

~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 555 def array(self, dtype=None): 556 array = as_indexable(self.array) --> 557 return np.asarray(array[self.key], dtype=None) 558 559 def transpose(self, order):

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in getitem(self, key) 71 def getitem(self, key): 72 return indexing.explicit_indexing_adapter( ---> 73 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem 74 ) 75

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method) 835 """ 836 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support) --> 837 result = raw_indexing_method(raw_key.tuple) 838 if numpy_indices.tuple: 839 # index the loaded np.ndarray

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in _getitem(self, key) 82 try: 83 with self.datastore.lock: ---> 84 original_array = self.get_array(needs_lock=False) 85 array = getitem(original_array, key) 86 except IndexError:

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in get_array(self, needs_lock) 61 62 def get_array(self, needs_lock=True): ---> 63 ds = self.datastore._acquire(needs_lock) 64 variable = ds.variables[self.variable_name] 65 variable.set_auto_maskandscale(False)

~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in _acquire(self, needs_lock) 359 360 def _acquire(self, needs_lock=True): --> 361 with self._manager.acquire_context(needs_lock) as root: 362 ds = _nc4_require_group(root, self._group, self._mode) 363 return ds

~\Anaconda3\lib\contextlib.py in enter(self) 110 del self.args, self.kwds, self.func 111 try: --> 112 return next(self.gen) 113 except StopIteration: 114 raise RuntimeError("generator didn't yield") from None

~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in acquire_context(self, needs_lock) 184 def acquire_context(self, needs_lock=True): 185 """Context manager for acquiring a file.""" --> 186 file, cached = self._acquire_with_cache_info(needs_lock) 187 try: 188 yield file

~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock) 202 kwargs = kwargs.copy() 203 kwargs["mode"] = self._mode --> 204 file = self._opener(self._args, *kwargs) 205 if self._mode == "w": 206 # ensure file doesn't get overriden when opened again

netCDF4_netCDF4.pyx in netCDF4._netCDF4.Dataset.init()

netCDF4_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -37] NetCDF: Write to read only: b'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/201512/oisst-avhrr-v02r01.20151231.nc'
```

I believe these errors have something to do with a post I created a couple of weeks ago (https://github.com/pydata/xarray/issues/4082).

I'm not sure if you can @-mention users on here, but @rsignell-usgs found out something about the caching beforehand. It seems that this is some sort of Windows issue.

Versions python: 3.7.4 xarray: 0.15.1 pandas: 1.0.3 numpy: 1.18.1 scipy: 1.4.1 netcdf4: 1.4.2

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4169/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
924002003 MDU6SXNzdWU5MjQwMDIwMDM= 5483 Cannot interpolate on a multifile .grib array. Single file works fine. Alexander-Serov 22743277 closed 0     1 2021-06-17T14:36:57Z 2022-04-09T15:50:24Z 2022-04-09T15:50:23Z NONE      

What happened: I have multiple .grib files that I can successfully open using the xr.open_mfdataset() function and the cfgrib engine. However, I cannot interpolate the opened array due to a NotImplementedError from the dask package: apparently the interpolation internally requires some slicing that is not implemented yet. The latitude and longitude are well within the stored grid. The interpolation works just fine if I open a single file using xr.load_dataset('file.grb', engine='cfgrib'). Since the files are too big, I cannot just load the array completely or resave it into a single file. So I was wondering whether you might have ideas for a workaround that would let me get to the values I need, until it's implemented in dask. Basically, I just need to extract (interpolate) all variables at a handful of locations.

What you expected to happen: Interpolate the multifile grib array along latitude and longitude.

Minimal Complete Verifiable Example:

```python
dsmf = xr.open_mfdataset(glob('<root_path>/**/*.grb', recursive=True), engine='cfgrib',
                         parallel=True, combine='nested', concat_dim='time')
dsmf.interp(latitude=48, longitude=12)
```

Result:

```python
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\xarray\core\dataset.py", line 2989, in interp
    obj = self if assume_sorted else self.sortby([k for k in coords])
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\xarray\core\dataset.py", line 5920, in sortby
    return aligned_self.isel(**indices)
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\xarray\core\dataset.py", line 2230, in isel
    var_value = var_value.isel(var_indexers)
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\xarray\core\variable.py", line 1135, in isel
    return self[key]
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\xarray\core\variable.py", line 780, in __getitem__
    data = as_indexable(self._data)[indexer]
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\xarray\core\indexing.py", line 1312, in __getitem__
    return array[key]
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\dask\array\core.py", line 1749, in __getitem__
    dsk, chunks = slice_array(out, self.name, self.chunks, index2, self.itemsize)
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\dask\array\slicing.py", line 170, in slice_array
    dsk_out, bd_out = slice_with_newaxes(out_name, in_name, blockdims, index, itemsize)
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\dask\array\slicing.py", line 192, in slice_with_newaxes
    dsk, blockdims2 = slice_wrap_lists(out_name, in_name, blockdims, index2, itemsize)
  File "C:\tools\miniconda3\envs\my_env\lib\site-packages\dask\array\slicing.py", line 238, in slice_wrap_lists
    raise NotImplementedError("Don't yet support nd fancy indexing")
NotImplementedError: Don't yet support nd fancy indexing
```

Anything else we need to know?: Since the files are too big, I am unable to share them for the moment, but I suspect the issue is reproducible with any multifile grib combination.
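A hedged workaround sketch (my suggestion, not from the issue; the slice bounds and the descending-latitude assumption are illustrative):

```python
# Load only a small window around the target point into memory, then
# interpolate the in-memory data; this sidesteps dask's nd fancy indexing.
# GRIB latitudes are often stored descending, hence the reversed slice.
window = dsmf.sel(latitude=slice(49, 47), longitude=slice(11, 13)).load()
point = window.interp(latitude=48, longitude=12)
```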

Environment:

INSTALLED VERSIONS

commit: None python: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 06:25:23) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 158 Stepping 13, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: ('English_United Kingdom', '1252') libhdf5: 1.10.6 libnetcdf: 4.7.3 xarray: 0.18.2 pandas: 1.2.4 numpy: 1.20.3 scipy: 1.6.3 netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.8.3 cftime: 1.5.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: 0.9.9.0 iris: None bottleneck: None dask: 2021.06.0 distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20210108 pip: 21.1.2 conda: None pytest: 6.2.4 IPython: None sphinx: None

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5483/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
878481461 MDU6SXNzdWU4Nzg0ODE0NjE= 5276 open_mfdataset: Not a valid ID minhhg 11815787 closed 0     4 2021-05-07T05:34:02Z 2022-04-09T15:49:50Z 2022-04-09T15:49:50Z NONE      

I have about 601 NETCDF4 files saved using xarray. We try to use open_mfdataset to access these files. The main code calls this function many times. The first few calls work fine, but after a while it throws the following error message: "RuntimeError: NetCDF: Not a valid ID".

```python
def func(xpath, spec):
    doc = deepcopy(spec)
    with xr.open_mfdataset(xpath + "/*.nc", concat_dim='maturity') as data:
        var_name = list(data.data_vars)[0]
        ar = data[var_name]
        maturity = spec['maturity']
        ann = ar.cumsum(dim='maturity')
        ann = ann - 1
        ar1 = ann.sel(maturity=maturity)
        doc['data'] = ar1.load().values
    return doc
```
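A hedged workaround sketch (my assumption, not from the report): with ~601 files opened repeatedly, the default LRU file cache can evict and close handles that dask still references, so raising the cache size may avoid the invalid-ID error.

```python
import xarray as xr

# Allow more netCDF file handles to stay open at once (default is 128),
# so handles are not closed out from under lazily referenced arrays.
xr.set_options(file_cache_maxsize=1024)
```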

Environment:

Output of xr.show_versions():

INSTALLED VERSIONS ------------------ commit: None python: 3.6.8.final.0 python-bits: 64 OS: Linux OS-release: 5.4.0-1047-aws machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.11.0 pandas: 0.24.1 numpy: 1.15.4 scipy: 1.2.0 netCDF4: 1.4.2 h5netcdf: None h5py: 2.9.0 Nio: None zarr: None cftime: 1.0.3.4 PseudonetCDF: None rasterio: None iris: None bottleneck: 1.2.1 cyordereddict: None dask: 1.1.1 distributed: 1.25.3 matplotlib: 3.0.2 cartopy: None seaborn: 0.9.0 setuptools: 40.7.3 pip: 19.0.1 conda: None pytest: 4.2.0 IPython: 7.1.1 sphinx: 1.8.4

This error also happens with xarray version 0.10.9

Error trace:

```python 2021-05-05 09:28:19,911, DEBUG 7621, sim_io.py:483 - load_unique_document(), xpa th=/home/ubuntu/runs/20210331_001/nominal_dfs/uk 2021-05-05 09:28:42,774, ERROR 7621, run_gov_ret.py:33 - <module>(), Unknown error=NetCDF: Not a valid ID Traceback (most recent call last): File "/home/ubuntu/dev/py36/python/ev/model/api3/run_gov_ret.py", line 31, in <module> res = govRet() File "/home/ubuntu/dev/py36/python/ev/model/api3/returns.py", line 56, in __ca ll__ decompose=self.decompose)) File "/home/ubuntu/dev/py36/python/ev/model/returns/returnsGenerator.py", line 70, in calc_returns dfs_data = self.mongo_dfs.get_data(mats=[1,mat,mat-1]) File "/home/ubuntu/dev/py36/python/ev/model/api3/dfs.py", line 262, in get_dat a record = self.mdb.load_unique_document(self.dfs_collection_name, spec) File "/home/ubuntu/dev/py36/python/ev/model/api3/sim_io.py", line 1109, in load_unique_document return self.collections[collection].load_unique_document(query, *args, **kwargs) File "/home/ubuntu/dev/py36/python/ev/model/api3/sim_io.py", line 501, in load_unique_document doc['data'] = ar1.load().values File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/dataarray.py", line 631, in load ds = self._to_temp_dataset().load(**kwargs) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/dataset.py", line 494, in load evaluated_data = da.compute(*lazy_data.values(), **kwargs) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/base.py", line 398, in compute results = schedule(dsk, keys, **kwargs) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/threaded.py", line 76, in get pack_exception=pack_exception, **kwargs) pack_exception=pack_exception, **kwargs) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/local .py", line 459, in get_async raise_exception(exc, tb) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/compa tibility.py", line 112, in reraise raise exc File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/local .py", line 230, in execute_task result = _execute_task(task, data) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/core. 
py", line 119, in _execute_task return func(*args2) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/array /core.py", line 82, in getter c = np.asarray(c) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/numpy/core /numeric.py", line 501, in asarray return array(a, dtype, copy=False, order=order) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/cor e/indexing.py", line 602, in __array__ return np.asarray(self.array, dtype=dtype) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/numpy/core/numeric.py", line 501, in asarray return array(a, dtype, copy=False, order=order) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/indexing.py", line 508, in __array__ return np.asarray(array[self.key], dtype=None) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 64, in __getitem__ self._getitem) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/indexing.py", line 776, in explicit_indexing_adapter result = raw_indexing_method(raw_key.tuple) File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 76, in _getitem array = getitem(original_array, key) File "netCDF4/_netCDF4.pyx", line 4095, in netCDF4._netCDF4.Variable.__getitem__ File "netCDF4/_netCDF4.pyx", line 3798, in netCDF4._netCDF4.Variable.shape.__get__ File "netCDF4/_netCDF4.pyx", line 3746, in netCDF4._netCDF4.Variable._getdims File "netCDF4/_netCDF4.pyx", line 1754, in netCDF4._netCDF4._ensure_nc_success RuntimeError: NetCDF: Not a valid ID ```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5276/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
427644858 MDU6SXNzdWU0Mjc2NDQ4NTg= 2861 WHERE function, problems with memory operations? rpnaut 30219501 closed 0     8 2019-04-01T11:09:11Z 2022-04-09T15:41:51Z 2022-04-09T15:41:51Z NONE      

I am facing a problem with the where functionality in xarray. I have two datasets:

```
ref =
array([[14.82, 14.94,   nan, ..., 16.21, 16.24,   nan],
       [14.52, 14.97,   nan, ..., 16.32, 16.34,   nan],
       [15.72, 16.09,   nan, ..., 17.38, 17.44,   nan],
       ...,
       [ 6.55,  6.34,   nan, ...,  6.67,  6.6 ,   nan],
       [ 8.76,  9.12,   nan, ...,  9.07,  9.52,   nan],
       [ 8.15,  8.97,   nan, ...,  9.65,  9.52,   nan]], dtype=float32)
Coordinates:
  * height_WSS  (height_WSS) float32 40.3 50.3 60.3 70.3 80.3 90.3 101.2 105.0
    lat         float32 54.01472
    lon         float32 6.5875
  * time        (time) datetime64[ns] 2006-10-31T00:10:00 ... 2006-11-03T23:10:00
Attributes:
    standard_name:  wind_speed
    long_name:      wind speed
    units:          m s-1
    cell_methods:   time: mean
    comment:        direction of the boom holding the measurement devices: 41...
    sensor:         cup anemometer
    sensor_type:    Vector Instruments Windspeed Ltd. A100LK/PC3/WR
    accuracy:       0.1 m s-1
```

and

```
proof = <xarray.DataArray 'WSS' (time: 96, height_WSS: 8)>
array([[13.395692, 13.653825, 13.911958, ..., 14.511758, 14.703774, 14.770716],
       [14.740592, 15.010887, 15.281183, ..., 15.866542, 16.045753, 16.10823 ],
       [15.241853, 15.523318, 15.804785, ..., 16.417458, 16.605673, 16.67129 ],
       ...,
       [ 8.254081,  8.309716,  8.365352, ...,  8.46401 ,  8.489728,  8.498694],
       [ 9.83241 ,  9.895019,  9.957627, ..., 10.055538, 10.077768, 10.085519],
       [ 8.772054,  8.849378,  8.926702, ...,  9.065577,  9.102219,  9.114992]], dtype=float32)
Coordinates:
  * time        (time) datetime64[ns] 2006-10-31T00:10:00 ... 2006-11-03T23:10:00
    lon         float32 6.5875
    lat         float32 54.01472
  * height_WSS  (height_WSS) float32 40.3 50.3 60.3 70.3 80.3 90.3 101.2 105.0
Attributes:
    standard_name:  wind_speed
    long_name:      wind speed
    units:          m s-1
```

Applying something like this: DSproof = proof["WSS"].where(ref["WSS"].notnull()).to_dataset(name="WSS")

gives me a dataset of time length zero:

```
<xarray.Dataset>
Dimensions:     (height_WSS: 8, time: 0)
Coordinates:
  * time        (time) datetime64[ns]
    lon         float32 6.5875
    lat         float32 54.01472
  * height_WSS  (height_WSS) float32 40.3 50.3 60.3 70.3 80.3 90.3 101.2 105.0
Data variables:
    WSS         (time, height_WSS) float32
```

Problem description

The problem seems to be that 'ref' and 'proof' are somehow not entirely consistent regarding coordinates. But if I subtract the coordinates from each other, I do not get a difference. However, since I always fight with getting datasets consistent with each other for mathematical calculations in xarray, I have figured out the following workarounds:

  1. One can drop the coordinates lon and lat from both datasets. Then everything works fine with 'where'.
  2. I am using WHERE in a large script with some operations done before WHERE is called. One operation is to make the data types and the coordinate names between 'ref' and 'proof' consistent (that's why the above output is very similar). If I save the files and reload them immediately before applying WHERE, it fixes my problem.
  3. Using a selection of all height levels, proof["WSS"].isel(height=slice(0,9)).where(ref["WSS"].isel(height=slice(0,9)).notnull()).to_dataset(name="WSS") also fixes my problem.

Maybe I am dealing with a problem of incomplete operations in memory? The printouts of the datasets look consistent, but perhaps an additional operation on the datasets is required to make them consistent in memory?
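A minimal sketch of workaround 1 above (an illustration, assuming ref and proof are Datasets with a 'WSS' variable and that the current drop_vars API is available):

```python
# Drop the scalar lon/lat coordinates from both datasets before masking,
# so alignment cannot silently discard all time steps.
ref_nc = ref.drop_vars(["lon", "lat"])
proof_nc = proof.drop_vars(["lon", "lat"])
DSproof = proof_nc["WSS"].where(ref_nc["WSS"].notnull()).to_dataset(name="WSS")
```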

Thanks in advance for your help

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2861/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
469440752 MDU6SXNzdWU0Njk0NDA3NTI= 3139 Change the signature of DataArray to DataArray(data, dims, coords, ...)? shoyer 1217238 open 0     1 2019-07-17T20:54:57Z 2022-04-09T15:28:51Z   MEMBER      

Currently, the signature of DataArray is DataArray(data, coords, dims, ...): http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html

In the long term, I think DataArray(data, dims, coords, ...) would be more intuitive: dimensions are a more fundamental part of xarray's data model than coordinates. Certainly I find it much more common to omit coords than to omit dims when I create a DataArray.

My original reasoning for this argument order was that dims could be copied from coords, e.g., DataArray(new_data, old_dataarray.coords), and it was nice to be able to pass this sole argument by position instead of by name. But a cleaner way to write this now is old_dataarray.copy(data=new_data).

The challenge in making any change here would be to have a smooth deprecation process, and that ideally avoids requiring users to rewrite all of their code and avoids loads of pointless/extraneous warnings. I'm not entirely sure this is possible. We could likely use heuristics to distinguish between dims and coords arguments regardless of their order, but this probably isn't something we would want to preserve in the long term.

An alternative that might achieve some of the convenience of this change would be to allow passing lists of strings in the coords argument by position, which would be interpreted as dimensions, e.g., DataArray(data, ['x', 'y']). The downside of this alternative is that it would add even more special cases to the DataArray constructor, which would make it harder to understand.
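For illustration (my example, not from the issue), the difference between the two argument orders:

```python
import numpy as np
import xarray as xr

data = np.zeros((2, 3))

# Current signature DataArray(data, coords, dims, ...): omitting coords
# means dims has to be passed by keyword.
da = xr.DataArray(data, dims=("x", "y"))

# Under the proposed DataArray(data, dims, coords, ...) order, the same
# array could be built positionally, e.g. xr.DataArray(data, ("x", "y")).
```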

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3139/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
208312826 MDU6SXNzdWUyMDgzMTI4MjY= 1273 replace a dim with a coordinate from another dataset rabernat 1197350 open 0     4 2017-02-17T02:15:36Z 2022-04-09T15:26:20Z   MEMBER      

I often want a function that takes a dataarray / dataset and replaces a dimension with a coordinate from a different dataset.

@shoyer proposed the following simple solution.

```python
def replace_dim(da, olddim, newdim):
    renamed = da.rename({olddim: newdim.name})

    # note that alignment along a dimension is skipped when you are overriding
    # the relevant coordinate values
    renamed.coords[newdim.name] = newdim
    return renamed
```
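A quick usage sketch of the helper above (coordinate names here are hypothetical, purely for illustration):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(3), dims="x", coords={"x": [0, 1, 2]})
depth = xr.DataArray([10.0, 20.0, 30.0], dims="depth", name="depth")

replaced = replace_dim(da, "x", depth)
print(replaced.dims)  # ('depth',)
```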

Is this of broad enough interest to add a built-in method for?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1273/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
995207525 MDU6SXNzdWU5OTUyMDc1MjU= 5790 combining 2 arrays with xr.merge() causes temporary spike in memory usage ~3x the combined size of the arrays zachglee 23262800 closed 0     6 2021-09-13T18:42:03Z 2022-04-09T15:25:28Z 2022-04-09T15:25:28Z NONE      

What happened: When attempting to combine two arrays of sizes b1 and b2 bytes: xr.merge([da1, da2]), I observe that memory usage temporarily increases by about ~3*(b1+b2) bytes. Once the operation finishes, the memory usage has a net increase of (b1+b2) bytes, which is what I would expect, since that's the size of the merged array I just created. What I did not expect was the temporary increase of ~3*(b1+b2) bytes.

For small arrays this temporary spike in memory is fine, but for larger arrays this means we are essentially limited to combining arrays of total size below 1/3rd of an instance's memory limit. Anything above that and the temporary spike causes the instance to crash.

What you expected to happen: I expected there to be only a memory increase of b1+b2 bytes, the amount needed to store the merged array. I did not expect memory increase to go higher than that during the merge operation.

Minimal Complete Verifiable Example:

```python
# Put your MCVE code here

import numpy as np
import xarray as xr
import tracemalloc

tracemalloc.start()

print("(current, peak) memory at start:")
print(tracemalloc.get_traced_memory())

# create the test data (each is a 100 by 100 by 10 array of random floats)
# Their A and B coordinates are completely matching. Their C coordinates are completely disjoint.
data1 = np.random.rand(100, 100, 10)
da1 = xr.DataArray(
    data1,
    dims=("A", "B", "C"),
    coords={
        "A": [f"A{i}" for i in range(100)],
        "B": [f"B{i}" for i in range(100)],
        "C": [f"C{i}" for i in range(10)]},
)
da1.name = "da"

data2 = np.random.rand(100, 100, 10)
da2 = xr.DataArray(
    data2,
    dims=("A", "B", "C"),
    coords={
        "A": [f"A{i}" for i in range(100)],
        "B": [f"B{i}" for i in range(100)],
        "C": [f"C{i+10}" for i in range(10)]},
)
da2.name = "da"

print("(current, peak) memory after creation of arrays to be combined:")
print(tracemalloc.get_traced_memory())
print(f"da1.nbytes = {da1.nbytes}")
print(f"da2.nbytes = {da2.nbytes}")

da_combined = xr.merge([da1, da2]).to_array()

print("(current, peak) memory after merging. You should observe that the peak memory usage is now much higher.")
print(tracemalloc.get_traced_memory())
print(f"da_combined.nbytes = {da_combined.nbytes}")

print(da_combined)
```

Anything else we need to know?:

Interestingly, when I try merging 3 arrays at once, (sizes b1, b2, b3) I observe temporary memory usage increase of about 5*(b1+b2+b3). I have a hunch that all arrays get aligned to the final merged coordinate space (which is much bigger), before they are combined, which means at some point in the middle of the process we have a bunch of arrays in memory that have been inflated to the size of the final output array.

If that's the case, it seems like it should be possible to make this operation more efficient by creating just one inflated array and adding the data from the input arrays to it in-place? Or is this an expected and unavoidable behavior with merging? (fwiw this also affects several other combination methods, presumably because they use merge() under the hood?)
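A rough way to check that hunch from user code, using the arrays from the example above (this only illustrates the suspected intermediate, it is not how merge is implemented verbatim):

```python
# align both inputs to the union of their coordinates, which is effectively
# what a merge has to do, and compare the inflated sizes to the originals
a1, a2 = xr.align(da1, da2, join="outer")
print(da1.nbytes, a1.nbytes)  # the aligned copy is inflated to the merged shape
```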

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 4.19.121-linuxkit machine: x86_64 processor: byteorder: little LC_ALL: None LANG: None LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.8.0 xarray: 0.17.0 pandas: 1.2.3 numpy: 1.19.5 scipy: 1.6.0 netCDF4: 1.5.6 pydap: None h5netcdf: 0.11.0 h5py: 3.3.0 Nio: None zarr: None cftime: 1.5.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.4.2 cartopy: None seaborn: None numbagg: None pint: 0.16.1 setuptools: 57.4.0 pip: 21.2.4 conda: None pytest: 6.2.2 IPython: 7.23.1 sphinx: 3.5.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5790/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
257070215 MDU6SXNzdWUyNTcwNzAyMTU= 1569 Grouping with multiple levels jjpr-mit 25231875 closed 0     6 2017-09-12T14:46:12Z 2022-04-09T15:25:07Z 2022-04-09T15:25:06Z NONE      

http://xarray.pydata.org/en/stable/groupby.html says:

xarray supports “group by” operations with the same API as pandas

but when I supply the level keyword argument as described at https://pandas.pydata.org/pandas-docs/stable/groupby.html#groupby-with-multiindex, I get:
```
TypeError                                 Traceback (most recent call last)
<ipython-input-12-566fc67c0151> in <module>()
----> 1 hvm_it_v6_obj = hvm_it_v6.groupby(level=["category","obj"]).mean(dim="presentation")
      2 hvm_it_v6_obj

TypeError: groupby() got an unexpected keyword argument 'level'
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1569/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
438947247 MDU6SXNzdWU0Mzg5NDcyNDc= 2933 Stack() & unstack() issues on Multindex ray306 1559890 closed 0     4 2019-04-30T19:47:51Z 2022-04-09T15:23:28Z 2022-04-09T15:23:28Z NONE      

I would like to reshape the DataArray by one level in the Multindex, and I thought the stack()/unstack() should be the solution.

Make a DataArray with a MultiIndex:
```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two'])]
da = pd.DataFrame(np.random.randn(6, 4)).to_xarray().to_array()
da.coords['index'] = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
da
```
```
<xarray.DataArray (variable: 4, index: 6)>
array([[ 0.379189,  1.082292, -2.073478, -0.84626 , -1.529927, -0.837407],
       [-0.267983, -0.2516  , -1.016653, -0.085762, -0.058382, -0.667891],
       [-0.013488, -0.855332, -0.038072, -0.385211, -2.149742, -0.304361],
       [ 1.749561, -0.606031,  1.914146,  1.6292  , -0.515519,  1.996283]])
Coordinates:
  * index     (index) MultiIndex
  - first     (index) object 'bar' 'bar' 'baz' 'baz' 'foo' 'foo'
  - second    (index) object 'one' 'two' 'one' 'two' 'one' 'two'
  * variable  (variable) int32 0 1 2 3
```

Stack problem:

I want one dimension to merge into another one:
```python
da.stack({'index': ['variable']})
```
```
ValueError: cannot create a new dimension with the same name as an existing dimension
```

Unstack problem:

Unstacking by the whole MultiIndex worked:
```python
da.unstack('index')
```
```
<xarray.DataArray (variable: 4, first: 3, second: 2)>
array([[[ 0.379189,  1.082292],
    [-2.073478, -0.84626 ],
    [-1.529927, -0.837407]],

   [[-0.267983, -0.2516  ],
    [-1.016653, -0.085762],
    [-0.058382, -0.667891]],

   [[-0.013488, -0.855332],
    [-0.038072, -0.385211],
    [-2.149742, -0.304361]],

   [[ 1.749561, -0.606031],
    [ 1.914146,  1.6292  ],
    [-0.515519,  1.996283]]])

Coordinates:
  * variable  (variable) int32 0 1 2 3
  * first     (first) object 'bar' 'baz' 'foo'
  * second    (second) object 'one' 'two'
```

But unstacking by a specified level failed:
```python
da.unstack('first')
```
```
ValueError: Dataset does not contain the dimensions: ['first']
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2933/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
816540158 MDU6SXNzdWU4MTY1NDAxNTg= 4958 to_zarr mode='a-', append_dim; if dim value exists raise error ahuang11 15331990 open 0     1 2021-02-25T15:26:02Z 2022-04-09T15:19:28Z   CONTRIBUTOR      

If I have a ds with time, lat, lon and I call the same command twice:
```python
ds.to_zarr('test.zarr', append_dim='time')
ds.to_zarr('test.zarr', append_dim='time')
```
Can it raise an error since all the times already exist?

Kind of like:
```python
import numpy as np
import xarray as xr

ds = xr.tutorial.open_dataset('air_temperature')
ds.to_zarr('test_air.zarr', append_dim='time')
ds_tmp = xr.open_mfdataset('test_air.zarr', engine='zarr')
overlap = np.intersect1d(ds['time'], ds_tmp['time'])
if len(overlap) > 1:
    raise ValueError(f'Found overlapping values in datasets {overlap}')
ds.to_zarr('test_air.zarr', append_dim='time')
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4958/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
707571360 MDU6SXNzdWU3MDc1NzEzNjA= 4452 Change default for concat_characters to False in open_* functions eric-czech 6130352 open 0     2 2020-09-23T18:06:07Z 2022-04-09T03:21:43Z   NONE      

I wanted to propose that concat_characters default to False for open_{dataset,zarr,dataarray}. I'm not sure how often that affects anyone, since working with individual character arrays is probably rare, but it's a particularly bad default in genetics. We often represent individual variations as single characters, and the concatenation is destructive because we can't invert it when one of the characters is an empty string (which often corresponds to a deletion at a base pair location, and the order of the characters matters).

I also find it to be confusing behavior (e.g. https://github.com/pydata/xarray/issues/4405) since no other arrays are automatically transformed like this when deserialized.
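For reference, the opt-out that exists today looks like this (the file name is hypothetical):

```python
import xarray as xr

# keep single-character arrays as-is instead of concatenating them along the last dim
ds = xr.open_dataset("variants.nc", concat_characters=False)
```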

If I submit a PR for this, would anybody object?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4452/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
903922477 MDU6SXNzdWU5MDM5MjI0Nzc= 5386 Add xr.open_dataset("file.tif", engine="rasterio") to docs raybellwaves 17162724 closed 0     1 2021-05-27T15:39:29Z 2022-04-09T03:15:45Z 2022-04-09T03:15:45Z CONTRIBUTOR      

Kind of related to https://github.com/pydata/xarray/issues/4697

I see https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html#rioxarray

shows

ds = xarray.open_dataset("file.tif", engine="rasterio")

This could be added to

https://xarray.pydata.org/en/latest/user-guide/io.html#rasterio

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5386/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
312203596 MDU6SXNzdWUzMTIyMDM1OTY= 2042 Anyone working on a to_tiff? Alternatively, how do you write an xarray to a geotiff? ebo 601025 closed 0     31 2018-04-07T12:43:41Z 2022-04-09T03:14:41Z 2022-04-09T01:19:10Z NONE      

Matthew Rocklin wrote a gist https://gist.github.com/mrocklin/3df315e93d4bdeccf76db93caca2a9bd to demonstrate using XArray to read tiled GeoTIFF datasets, but I am still confused as to how to write them to a GeoTIFF. I can easily create a tiff with "rasterio.open(out, 'w', **src.profile)", but the following does not seem like the best/cleanest way to do this:

```
ds = xr.open_rasterio('myfile.tif', chunks={'band': 1, 'x': 2048, 'y': 2048})
with rasterio.open('myfile.tif', 'r') as src:
    with rasterio.open('new_myfile.tif', 'w', **src.profile) as dst:
        for i in range(1, src.count + 1):
            dst.write(ds.variable.data[i-1].compute(), i)
```
Also, if the profile and tags were propagated through open_rasterio, then the second open would not be necessary and would be generally useful.
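A minimal sketch of how this can be done nowadays with rioxarray (assuming rioxarray is installed; this is a third-party extension, not part of xarray itself):

```python
import rioxarray  # registers the .rio accessor on xarray objects

da = rioxarray.open_rasterio("myfile.tif", chunks={"band": 1, "x": 2048, "y": 2048})
da.rio.to_raster("new_myfile.tif")  # profile/CRS are carried through by the accessor
```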

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2042/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
956259734 MDU6SXNzdWU5NTYyNTk3MzQ= 5649 xr.merge bug? when using combine_attrs='drop_conflicts' jbusecke 14314623 open 0 keewis 14808389   3 2021-07-29T22:47:43Z 2022-04-09T03:14:24Z   CONTRIBUTOR      

What happened: I have recently encountered a situation where combining two datasets failed due to the data type of their attributes. This example illustrates the situation:
```python
ds1 = xr.Dataset(attrs={'a': [5]})
ds2 = xr.Dataset(attrs={'a': 6})

xr.merge([ds1, ds2], combine_attrs='drop_conflicts')
```
This gives me the following error:
```


TypeError Traceback (most recent call last) <ipython-input-12-1c8e82be0882> in <module> 2 ds2 = xr.Dataset(attrs={'a':6}) 3 ----> 4 xr.merge([ds1, ds2], combine_attrs='drop_conflicts')

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge(objects, compat, join, fill_value, combine_attrs) 898 dict_like_objects.append(obj) 899 --> 900 merge_result = merge_core( 901 dict_like_objects, 902 compat,

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value) 654 ) 655 --> 656 attrs = merge_attrs( 657 [var.attrs for var in coerced if isinstance(var, (Dataset, DataArray))], 658 combine_attrs,

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge_attrs(variable_attrs, combine_attrs, context) 544 } 545 ) --> 546 result = { 547 key: value 548 for key, value in result.items()

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in <dictcomp>(.0) 547 key: value 548 for key, value in result.items() --> 549 if key not in attrs or equivalent(attrs[key], value) 550 } 551 dropped_keys |= {key for key in attrs if key not in result}

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/utils.py in equivalent(first, second) 171 return duck_array_ops.array_equiv(first, second) 172 elif isinstance(first, list) or isinstance(second, list): --> 173 return list_equiv(first, second) 174 else: 175 return (

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/utils.py in list_equiv(first, second) 182 def list_equiv(first, second): 183 equiv = True --> 184 if len(first) != len(second): 185 return False 186 else:

TypeError: object of type 'int' has no len()
```
Took me a while to find out what the root cause of this was with a fully populated dataset, since the error is less than obvious.

What you expected to happen: In my understanding this should just drop the attribute a. The example works as expected when both attributes are integers or both are lists containing an integer. The error is only triggered when the types are mixed.

Is there a way to handle this case more elegantly?

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.4.89+ machine: x86_64 processor: x86_64 byteorder: little LC_ALL: C.UTF-8 LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.1.dev8+gda99a566 pandas: 1.2.4 numpy: 1.20.2 scipy: 1.6.2 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.11.0 h5py: 3.2.1 Nio: None zarr: 2.7.1 cftime: 1.4.1 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: 1.2.2 cfgrib: 0.9.9.0 iris: None bottleneck: 1.3.2 dask: 2021.04.1 distributed: 2021.04.1 matplotlib: 3.4.1 cartopy: 0.19.0 seaborn: None numbagg: None pint: 0.17 setuptools: 49.6.0.post20210108 pip: 20.3.4 conda: None pytest: None IPython: 7.22.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5649/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
576502871 MDU6SXNzdWU1NzY1MDI4NzE= 3834 encode_cf_datetime() casts dask arrays to NumPy arrays andersy005 13301940 open 0     2 2020-03-05T20:11:37Z 2022-04-09T03:10:49Z   MEMBER      

Currently, when xarray.coding.times.encode_cf_datetime() is called, it always casts the input to a NumPy array. This is not what I would expect when the input is a dask array. I am wondering if we could make this operation lazy when the input is a dask array?

https://github.com/pydata/xarray/blob/01462d65c7213e5e1cddf36492c6a34a7e53ce55/xarray/coding/times.py#L352-L354

```python In [46]: import numpy as np

In [47]: import xarray as xr

In [48]: import pandas as pd

In [49]: times = pd.date_range("2000-01-01", "2001-01-01", periods=11)

In [50]: time_bounds = np.vstack((times[:-1], times[1:])).T

In [51]: arr = xr.DataArray(time_bounds).chunk()

In [52]: arr
Out[52]:
<xarray.DataArray (dim_0: 10, dim_1: 2)>
dask.array<xarray-<this-array>, shape=(10, 2), dtype=datetime64[ns], chunksize=(10, 2), chunktype=numpy.ndarray>
Dimensions without coordinates: dim_0, dim_1

In [53]: xr.coding.times.encode_cf_datetime(arr)
Out[53]:
(array([[     0,  52704],
        [ 52704, 105408],
        [105408, 158112],
        [158112, 210816],
        [210816, 263520],
        [263520, 316224],
        [316224, 368928],
        [368928, 421632],
        [421632, 474336],
        [474336, 527040]]),
 'minutes since 2000-01-01 00:00:00',
 'proleptic_gregorian')

```
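For illustration, a sketch of what a lazy, per-chunk encoding could look like from user code today, assuming the units and calendar are fixed up front (this is not an xarray-provided API, just a workaround idea):

```python
# encode each dask chunk lazily instead of materializing the whole array;
# units/calendar are assumed known in advance and match the eager result above
units = "minutes since 2000-01-01 00:00:00"
calendar = "proleptic_gregorian"
encoded = arr.data.map_blocks(
    lambda block: xr.coding.times.encode_cf_datetime(block, units, calendar)[0],
    dtype="int64",
)
```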

Cc @jhamman

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3834/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
606165039 MDU6SXNzdWU2MDYxNjUwMzk= 4000 Add hook to get progress of long-running operations cwerner 13906519 closed 0     3 2020-04-24T09:13:02Z 2022-04-09T03:08:45Z 2022-04-09T03:08:45Z NONE      

Hi. I currently work on a large dataframe that I convert to a Xarray dataset. It works, but takes quite some (unknown) amount of time.

MCVE Code Sample

```python
data = pd.DataFrame("huge data frame with time, lat, Lon as multiindex and about 60 data columns")
dsout = xr.Dataset()
dsout = dsout.from_dataframe(data)
```

Expected Output

A progress report/ bar about the operation

Problem Description

It would be nice to have some hook or other functionality to tap into xr.Dataset.from_dataframe() and report a progress status that I could then pass to tqdm or something similar...
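As a rough workaround sketch in the meantime (assuming the tqdm package is available), one can convert column by column so that progress is at least visible:

```python
from tqdm import tqdm
import xarray as xr

# `data` is the pandas DataFrame with the (time, lat, lon) MultiIndex from above;
# converting one column at a time lets tqdm report progress per column
dsout = xr.Dataset({col: data[col].to_xarray() for col in tqdm(data.columns)})
```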

Versions

0.15.1

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4000/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
607718350 MDU6SXNzdWU2MDc3MTgzNTA= 4011 missing empty group when iterate over groupby_bins miniufo 9312831 open 0     4 2020-04-27T17:22:31Z 2022-04-09T03:08:14Z   NONE      

When I try to iterate over the grouped object returned by groupby_bins, I found that empty groups are silently missing. Here is a simple case:
```python
import numpy as np
import xarray as xr

array = xr.DataArray(np.arange(4), dims='dim_0')

# one of these bins will be empty
bins = [0, 4, 5]
grouped = array.groupby_bins('dim_0', bins)

for i, (label, group) in enumerate(grouped):
    print(i, label, group)
```
When a bin contains no samples (the bin (4, 5]), the empty group is dropped. How can I iterate over the full set of bins even when some bins contain nothing? I've read the related issue #1019, but my case needs the correct order in grouped, and empty groups need to be iterated over.
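A possible workaround sketch: build the full list of bins with pandas and look each one up, assuming the default right-closed intervals that groupby_bins produces:

```python
import pandas as pd

intervals = pd.IntervalIndex.from_breaks(bins)  # (0, 4], (4, 5]
groups = dict(grouped)  # empty bins are simply missing from this mapping
for i, interval in enumerate(intervals):
    group = groups.get(interval)
    print(i, interval, group)  # group is None for the empty bin
```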

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4011/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
666896781 MDU6SXNzdWU2NjY4OTY3ODE= 4279 intersphinx looks for implementation modules crusaderky 6213168 open 0     0 2020-07-28T08:55:12Z 2022-04-09T03:03:30Z   MEMBER      

This is a widespread issue caused by the pattern of defining objects in private module and then exposing them to the final user by importing them in the top-level __init__.py, vs. how intersphinx works.

Exact same issue in different projects: - https://github.com/aio-libs/aiohttp/issues/3714 - https://jira.mongodb.org/browse/MOTOR-338 - https://github.com/tkem/cachetools/issues/178 - https://github.com/AmphoraInc/xarray_mongodb/pull/22 - https://github.com/jonathanslenders/asyncio-redis/issues/143

If a project
1. uses xarray, intersphinx, and autodoc
2. subclasses any of the classes exposed by xarray/__init__.py and documents the new class with the :show-inheritance: flag
3. starting from Sphinx 3, has any of the above classes anywhere in a type annotation

Then Sphinx emits a warning and fails to create a hyperlink, because intersphinx uses the __module__ attribute to look up the object in objects.inv, but __module__ points to the implementation module while objects.inv points to the top-level xarray module.

Workaround

In conf.py:

```python
import xarray
xarray.DataArray.__module__ = "xarray"
```

Solution

Put the above hack in xarray/__init__.py

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4279/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
667203487 MDU6SXNzdWU2NjcyMDM0ODc= 4282 Values change when writing combined Dataset loaded with open_mfdataset chpolste 11723107 closed 0     1 2020-07-28T16:20:09Z 2022-04-09T03:00:55Z 2022-04-09T03:00:55Z NONE      

What happened:

Loading two netcdf files with open_mfdataset then writing into a combined file results in some values changed in the file.

What you expected to happen:

That the written file contains the same values as the in-memory Dataset when read again.

Minimal Complete Verifiable Example:

```python
>>> import numpy as np
>>> import xarray as xr
>>> data1 = xr.open_dataset("file1.nc")
>>> data2 = xr.open_dataset("file2.nc")
>>> merged = xr.open_mfdataset(["file1.nc", "file2.nc"])
>>> np.all(np.isclose(merged["u"].values[0], data1["u"].values[0]))
True
>>> np.all(np.isclose(merged["u"].values[-1], data2["u"].values[-1]))
True
>>> merged.to_netcdf("foo.nc")
>>> merged_file = xr.load_dataset("foo.nc")
>>> np.all(np.isclose(merged_file["u"].values, merged["u"].values))
False
```

The files contain wind data from the ERA5 reanalysis, downloaded from CDS.

Anything else we need to know?:

The issue might be related to the scale and offset values of the variable. Continuing the example:

```python
>>> np.all(np.isclose(merged_file["u"].values[0], data1["u"].values[0]))
True
>>> np.all(np.isclose(merged_file["u"].values[-1], data2["u"].values[-1]))
False
```

Data from the first file seems to be correct. When writing the combined dataset, the scale and offset from the first file are written to the combined file:

```python
>>> data1_nomas = xr.open_dataset("file1.nc", mask_and_scale=False)
>>> data2_nomas = xr.open_dataset("file2.nc", mask_and_scale=False)
>>> merged_file_nomas = xr.open_dataset("foo.nc", mask_and_scale=False)
>>> data1_nomas["u"].attrs
{'scale_factor': 0.002397265127278432, 'add_offset': 25.620963232670736, '_FillValue': -32767, 'missing_value': -32767, 'units': 'm s-1', 'long_name': 'U component of wind', 'standard_name': 'eastward_wind'}
>>> data2_nomas["u"].attrs
{'scale_factor': 0.0024358825557859445, 'add_offset': 21.288035293585388, '_FillValue': -32767, 'missing_value': -32767, 'units': 'm s-1', 'long_name': 'U component of wind', 'standard_name': 'eastward_wind'}
>>> merged_file_nomas["u"].attrs
{'scale_factor': 0.002397265127278432, 'add_offset': 25.620963232670736, '_FillValue': -32767, 'units': 'm s**-1', 'long_name': 'U component of wind', 'standard_name': 'eastward_wind', 'missing_value': -32767}
```

Maybe the data from the second file is not adjusted to fit the new scaling and offset.
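One possible workaround sketch: drop the packing inherited from the first file before writing, so the combined file is written unpacked (larger on disk, but lossless relative to the in-memory values):

```python
# remove the scale/offset encoding copied from file1.nc so to_netcdf writes floats
for name, var in merged.data_vars.items():
    var.encoding.pop("scale_factor", None)
    var.encoding.pop("add_offset", None)
merged.to_netcdf("foo.nc")
```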

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 4.15.0-107-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.0 pandas: 1.0.4 numpy: 1.18.5 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: 0.9.8.2 iris: None bottleneck: None dask: 2.18.1 distributed: 2.21.0 matplotlib: 3.2.1 cartopy: 0.18.0 seaborn: None numbagg: None pint: 0.14 setuptools: 49.2.0.post20200712 pip: 20.1.1 conda: 4.8.3 pytest: None IPython: 7.16.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4282/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
673682661 MDU6SXNzdWU2NzM2ODI2NjE= 4313 Using Dependabot to manage doc build and CI versions jthielen 3460034 open 0     4 2020-08-05T16:24:24Z 2022-04-09T02:59:21Z   CONTRIBUTOR      

As brought up on the bi-weekly community developers meeting, it sounds like Pandas v1.1.0 is breaking doc builds on RTD. One solution to the issues of frequent breakages in doc builds and CI due to upstream updates is having fixed version lists for all of these, which are then incrementally updated as new versions come out. @dopplershift has done a lot of great work in MetPy getting such a workflow set up with Dependabot (https://github.com/Unidata/MetPy/pull/1410) among other CI updates, and this could be adapted for use here in xarray.

We've generally been quite happy with our updated CI configuration with Dependabot over the past couple weeks. The only major issue has been https://github.com/Unidata/MetPy/issues/1424 / https://github.com/dependabot/dependabot-core/issues/2198#issuecomment-649726022, which has required some contributors to have to delete and recreate their forks in order for Dependabot to not auto-submit PRs to the forked repos.

Any thoughts that you had here @dopplershift would be appreciated!

xref https://github.com/pydata/xarray/issues/4287, https://github.com/pydata/xarray/pull/4296

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4313/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
685739084 MDU6SXNzdWU2ODU3MzkwODQ= 4375 allow using non-dimension coordinates in polyfit mathause 10194086 open 0     1 2020-08-25T19:40:55Z 2022-04-09T02:58:48Z   MEMBER      

polyfit currently only allows fitting along a dimension, not along a non-dimension coordinate (or a virtual coordinate).

Example:
```python
da = xr.DataArray(
    [1, 3, 2], dims=["x"], coords=dict(x=["a", "b", "c"], y=("x", [0, 1, 2]))
)

print(da)

da.polyfit("y", 1)
```
Output:
```python
<xarray.DataArray (x: 3)>
array([1, 3, 2])
Coordinates:
  * x        (x) <U1 'a' 'b' 'c'
    y        (x) int64 0 1 2

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-80-9bb2dacf50f7> in <module>
      5 print(da)
      6
----> 7 da.polyfit("y", 1)

~/.conda/envs/ipcc_ar6/lib/python3.7/site-packages/xarray/core/dataarray.py in polyfit(self, dim, deg, skipna, rcond, w, full, cov) 3507 """ 3508 return self._to_temp_dataset().polyfit( -> 3509 dim, deg, skipna=skipna, rcond=rcond, w=w, full=full, cov=cov 3510 ) 3511

~/.conda/envs/ipcc_ar6/lib/python3.7/site-packages/xarray/core/dataset.py in polyfit(self, dim, deg, skipna, rcond, w, full, cov) 6005 skipna_da = skipna 6006 -> 6007 x = get_clean_interp_index(self, dim, strict=False) 6008 xname = "{}_".format(self[dim].name) 6009 order = int(deg) + 1

~/.conda/envs/ipcc_ar6/lib/python3.7/site-packages/xarray/core/missing.py in get_clean_interp_index(arr, dim, use_coordinate, strict) 246 247 if use_coordinate is True: --> 248 index = arr.get_index(dim) 249 250 else: # string

~/.conda/envs/ipcc_ar6/lib/python3.7/site-packages/xarray/core/common.py in get_index(self, key) 378 """ 379 if key not in self.dims: --> 380 raise KeyError(key) 381 382 try:

KeyError: 'y'
```

Describe the solution you'd like

Would be nice if that worked.

Describe alternatives you've considered

One could just set the non-dimension coordinate as index, e.g.: da = da.set_index(x="y")
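Roughly like this sketch, where y's values become the index of dimension x before fitting (behaviour as in the xarray version used above):

```python
# promote the non-dimension coordinate to the x index, then fit along x
fit = da.set_index(x="y").polyfit("x", 1)
```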

Additional context

Allowing this may be as easy as replacing

https://github.com/pydata/xarray/blob/9c85dd5f792805bea319f01f08ee51b83bde0f3b/xarray/core/missing.py#L248

by index = arr[dim] but I might be missing something. Or probably a use_coordinate must be threaded through to get_clean_interp_index (although I am a bit confused by this argument).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4375/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
792651098 MDU6SXNzdWU3OTI2NTEwOTg= 4840 Opening a dataset doesn't display groups. dklink 11861183 open 0     2 2021-01-23T21:16:32Z 2022-04-09T02:31:03Z   NONE      

Problem

I know xarray doesn't support netCDF4 Group functionality. That's fine, I bet it's incredibly thorny. My issue is, when you open the root group of a netCDF4 file which contains groups, xarray doesn't even tell you that there are groups; they are totally invisible. This seems like a big flaw; you've opened a file, shouldn't you at least be told what's in it?

Solution

When you open a dataset with the netcdf4-python library, you get something like this:

```
>>> netCDF4.Dataset(path)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    some global attribute: some value
    dimensions(sizes): ...
    variables(dimensions): ...
    groups: group1, group2
```

"groups" shows up sort of like an auto-generated attribute. Surely xarray can do something similar:

```
>>> xr.open_dataset(path)
<xarray.Dataset>
Dimensions:        ...
Coordinates:       ...
Data variables:    ...
Attributes:        ...
Groups:            group1, group2
```

Workaround

The workaround I am considering is to actually add an attribute to my root group which contains a list of the groups in the file, so people using xarray will see that there are more groups in the file. However, this is redundant considering the information is already in the netCDF file, and also brittle since there's no guarantee the attribute truly reflects the groups in the file.

Conclusion

Considering that xr.open_dataset has a group parameter to open groups, it seems unfortunate that when you open a file, you don't see what groups are in there. Instead, you have to use an external tool to get information on the file's groups, then open them with xarray. Since this is only a matter of extracting group data and printing it, surely this is a simple (and imo, valuable) addition. I'd be happy to implement it and submit a PR if people are on-board. I might need some direction though, this is my first time digging into the xarray source code, and I don't see a __str__ method on the Dataset class, which is where I expected to make this addition.
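For reference, the external-tool dance currently looks something like this sketch (group names are hypothetical):

```python
import netCDF4
import xarray as xr

# step 1: use netcdf4-python just to discover the group names
with netCDF4.Dataset(path) as nc:
    group_names = list(nc.groups)  # e.g. ['group1', 'group2']

# step 2: open each group separately with xarray
datasets = {name: xr.open_dataset(path, group=name) for name in group_names}
```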

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4840/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
770006670 MDU6SXNzdWU3NzAwMDY2NzA= 4704 Retries for rare failures eric-czech 6130352 open 0     2 2020-12-17T13:06:51Z 2022-04-09T02:30:16Z   NONE      

I recently ran into several issues with gcsfs (https://github.com/dask/gcsfs/issues/316, https://github.com/dask/gcsfs/issues/315, and https://github.com/dask/gcsfs/issues/318) where errors are occasionally thrown, but only in large worfklows where enough http calls are made for them to become probable.

@martindurant suggested forcing dask to retry tasks that may fail like this with .compute(... retries=N) in https://github.com/dask/gcsfs/issues/316, which has worked well. However, I also see this in Xarray/Zarr code interacting with gcsfs directly:

Example Traceback ``` Traceback (most recent call last): File "scripts/convert_phesant_data.py", line 100, in <module> fire.Fire() File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py", line 138, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire component, remaining_args = _CallAndUpdateTrace( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File "scripts/convert_phesant_data.py", line 96, in sort_zarr ds.to_zarr(fsspec.get_mapper(output_path), consolidated=True, mode="w") File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/core/dataset.py", line 1652, in to_zarr return to_zarr( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/api.py", line 1368, in to_zarr dump_to_store(dataset, zstore, writer, encoding=encoding) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/api.py", line 1128, in dump_to_store store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/zarr.py", line 417, in store self.set_variables( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/zarr.py", line 489, in set_variables writer.add(v.data, zarr_array, region=region) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/common.py", line 145, in add target[...] 
= source File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1115, in __setitem__ self.set_basic_selection(selection, value, fields=fields) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1210, in set_basic_selection return self._set_basic_selection_nd(selection, value, fields=fields) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1501, in _set_basic_selection_nd self._set_selection(indexer, value, fields=fields) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1550, in _set_selection self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1664, in _chunk_setitem self._chunk_setitem_nosync(chunk_coords, chunk_selection, value, File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1729, in _chunk_setitem_nosync self.chunk_store[ckey] = cdata File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/mapping.py", line 151, in __setitem__ self.fs.pipe_file(key, maybe_convert(value)) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 121, in wrapper return maybe_sync(func, self, *args, **kwargs) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 100, in maybe_sync return sync(loop, func, *args, **kwargs) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync raise exc.with_traceback(tb) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 55, in f result[0] = await future File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 1007, in _pipe_file return await simple_upload( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 1523, in simple_upload j = await fs._call( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 525, in _call raise e File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 507, in _call self.validate_response(status, contents, json, path, headers) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 1228, in validate_response raise HttpError(error) gcsfs.utils.HttpError: Required ```

Has there already been a discussion about how to address rare errors like this? Arguably, I could file the same issue with Zarr but it seemed more productive to start here at a higher level of abstraction.

To be clear, the code for the example failure above typically succeeds and reproducing this failure is difficult. I have only seen it a couple times now like this, where the calling code does not include dask, but it did make me want to know if there were any plans to tolerate rare failures in Xarray as Dask does.
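For what it's worth, the kind of blunt retry wrapper I have in mind, here around to_zarr (a user-side sketch, not a proposal for the actual API):

```python
import time

def to_zarr_with_retries(ds, store, retries=3, **kwargs):
    # retry the whole write a few times with simple exponential backoff;
    # catching bare Exception is deliberately blunt for illustration
    for attempt in range(retries):
        try:
            return ds.to_zarr(store, **kwargs)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)
```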

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4704/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
517192343 MDU6SXNzdWU1MTcxOTIzNDM= 3482 geo raster accessor shaharkadmiel 6872529 closed 0     1 2019-11-04T14:34:27Z 2022-04-09T02:28:38Z 2022-04-09T02:28:25Z NONE      

Hi, I have put together a very simple package that provides a universal read function for reading various raster formats including of course netCDF but also any other format that gdal or rasterio can recognize. This read function can also handle merging several tiles into one dataset.

In addition, the package provides a .geo dataset accessor that currently adds trimming functionality to extract a geographical subset of the data.

I plan to also add reprojection and spatial resampling methods which will wrap either rasterio functionality or directly use gdal's api.

I hope this is of interest to the geosciences community and perhaps even a broader community.

Contributions and any other input from others is of course welcome.

Have a quick look at the Demo section in the readme file to get some ideas as to what this package can do for you.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3482/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
532647948 MDU6SXNzdWU1MzI2NDc5NDg= 3593 xr.open_dataset not reloading data in jupyter-notebook lkroen 58510627 closed 0     1 2019-12-04T12:17:13Z 2022-04-09T02:27:17Z 2022-04-09T02:27:17Z NONE      

First, I reported this issue on Jupyter-Notebook and was told, that it might be an issue of xarry: https://github.com/jupyter/notebook/issues/5101

I load an .nc file and print it

Cell 1
```python
import xarray as xr
data_file = 'path_to_file/WMI_Lear.nc'
```

Cell 2
```python
data = ''
data = xr.open_dataset(data_file)
print(data)
```

and I get the correct output:
```python
<xarray.Dataset>
Dimensions:    (time: 180)
Coordinates:
  * time       (time) datetime64[ns] 2003-07-06T06:30:13 ... 2003-07-06T06:59:59
Data variables:
    altitude   (time) float32 ...
    latitude   (time) float32 ...
    longitude  (time) float32 ...
    pressure   (time) float32 ...
    tdry       (time) float32 ...
    dp         (time) float32 ...
    mr         (time) float32 ...
    wspd       (time) float32 ...
    wdir       (time) float32 ...
    Drops      (time) float64 ...
Attributes:
    history:  $Id: TrackFile.java,v 1.20 2003/05/07 04:53:23 maclean Exp $
```

Now I move the file in a terminal so that it no longer exists under the same name:
```bash
mv path_to_file/WMI_Lear.nc path_to_file/WMI_Lear.nc_new
```

Cell 3
```python
data = ''
data = xr.open_dataset(data_file)
print(data)
```

and I correctly get an error saying that the file does not exist:
```python
FileNotFoundError: [Errno 2] No such file or directory: b'/path_to_file/WMI_Lear.nc'
```

Now I move the file back in the terminal so that it exists again under the correct name:
```bash
mv path_to_file/WMI_Lear.nc_new path_to_file/WMI_Lear.nc
```

Cell 4
```python
data = ''
data = xr.open_dataset(data_file)
print(data)
```
And again the output is correct, as after cell 2.

Now I remove the file in the terminal again so that it no longer exists under the same name:
```bash
mv path_to_file/WMI_Lear.nc path_to_file/WMI_Lear.nc_new
```

Cell 5
```python
data = ''
data = xr.open_dataset(data_file)
print(data)
```

Now I again expect the error message from cell 3, saying that the file does not exist. Instead, I get output as if the file still existed.

If I replace
```python
data = xr.open_dataset(data_file)
```
with
```python
data = xr.open_dataset(data_file, cache='old')
```
I get the same problem.

The same issue occurs if I only change the file: the changed file isn't loaded anymore. Deleting the file is just a drastic example.

The same issue occurs if I just repeatedly run the cell which is supposed to load the file: the changed file is not loaded anymore. This is a real issue, since I would always need to restart the kernel, which is just not practical.
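A workaround sketch that may avoid restarting the kernel: explicitly close the previous handle so that the cached file object is not reused on the next open (this assumes the stale handle is indeed the cause):

```python
data = xr.open_dataset(data_file)
print(data)
data.close()  # release the cached file handle before the file changes on disk
```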

Attached you'll find the simple .nc file.

WMI_Lear.nc.zip

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3593/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
552987067 MDU6SXNzdWU1NTI5ODcwNjc= 3712 [Documentation/API?] {DataArray,Dataset}.sortby is stable sort? jaicher 4666753 open 0     0 2020-01-21T16:27:37Z 2022-04-09T02:26:34Z   CONTRIBUTOR      

I noticed that {DataArray,Dataset}.sortby() are implemented using np.lexsort(), which is a stable sort. Can we expect this function to remain a stable sort in the future even if the implementation is changed for some reason?

It is not explicitly stated in the docs that the sorting will be stable. If this function is meant to always be stable, I think the documentation should explicitly state this. If not, I think it would be helpful to have an optional argument to ensure that the sort is kept stable in case the implementation changes in the future.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3712/reactions",
    "total_count": 3,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 1
}
    xarray 13221727 issue
559283550 MDU6SXNzdWU1NTkyODM1NTA= 3745 groupby drops the variable used to group malmans2 22245117 open 0     0 2020-02-03T19:25:06Z 2022-04-09T02:25:17Z   CONTRIBUTOR      

MCVE Code Sample

```python
import xarray as xr
ds = xr.tutorial.load_dataset('rasm')

# Seasonal mean
ds_season = ds.groupby('time.season').mean()
ds_season
```

<xarray.Dataset>
Dimensions:  (season: 4, x: 275, y: 205)
Coordinates:
    yc       (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51
    xc       (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91
  * season   (season) object 'DJF' 'JJA' 'MAM' 'SON'
Dimensions without coordinates: x, y
Data variables:
    Tair     (season, y, x) float64 nan nan nan nan ... 23.13 22.06 21.72 21.94

```python
# The seasons are ordered in alphabetical order.
# I want to sort them based on time.
# But time was dropped, so I have to do this:
time_season = ds['time'].groupby('time.season').mean()
ds_season.sortby(time_season)
```

<xarray.Dataset>
Dimensions:  (season: 4, x: 275, y: 205)
Coordinates:
    yc       (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51
    xc       (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91
  * season   (season) object 'SON' 'DJF' 'MAM' 'JJA'
Dimensions without coordinates: x, y
Data variables:
    Tair     (season, y, x) float64 nan nan nan nan ... 29.27 28.39 27.94 28.05

Expected Output

```python
# Why does groupby drop time?
# I would expect a dataset that looks like this:
ds_season['time'] = time_season
ds_season
```

<xarray.Dataset>
Dimensions:  (season: 4, x: 275, y: 205)
Coordinates:
    yc       (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51
    xc       (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91
  * season   (season) object 'DJF' 'JJA' 'MAM' 'SON'
Dimensions without coordinates: x, y
Data variables:
    Tair     (season, y, x) float64 nan nan nan nan ... 23.13 22.06 21.72 21.94
    time     (season) object 1982-01-16 12:00:00 ... 1981-10-17 00:00:00

Problem Description

I often use groupby on time variables. When I do that, the time variable is dropped and replaced (e.g., time is replaced by season, month, year, ...). Most of the time I also want to sort the new dataset based on the original time. The example above shows why this is useful for seasons. Another example would be to sort monthly averages of a dataset that originally had daily data from Sep-2000 to Aug-2001. Why is time dropped? Does it make sense to keep it in the grouped dataset?

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.15.0-1067-oem machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.1 xarray: 0.14.1 pandas: 1.0.0 numpy: 1.18.1 scipy: None netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.4.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.1 dask: 2.10.1 distributed: 2.10.0 matplotlib: 3.1.2 cartopy: None seaborn: None numbagg: None setuptools: 45.1.0.post20200127 pip: 20.0.2 conda: None pytest: None IPython: 7.11.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3745/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
564240510 MDU6SXNzdWU1NjQyNDA1MTA= 3767 ValueError when reading netCDF jjm0022 16228337 closed 0     2 2020-02-12T20:08:45Z 2022-04-09T02:24:48Z 2022-04-09T02:24:48Z NONE      

MCVE Code Sample

```python

ds = xr.open_dataset('20090327_0600')

```

Problem Description

Whenever I try to read certain netCDF files it raises a ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array. Here is a link to one of the files that raises the error: https://madis-data.ncep.noaa.gov/madisPublic1/data/archive/2009/03/27/LDAD/hfmetar/netCDF/20090327_0600.gz I don't have any issues reading this file though: https://madis-data.ncep.noaa.gov/madisPublic1/data/archive/2019/05/15/LDAD/hfmetar/netCDF/20190515_1200.gz

The traceback looks like this:
```python
KeyError                                  Traceback (most recent call last)
~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    197         try:
--> 198             file = self._cache[self._key]
    199         except KeyError:

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/lru_cache.py in getitem(self, key) 52 with self._lock: ---> 53 value = self._cache[key] 54 self._cache.move_to_end(key)

KeyError: [<function _open_scipy_netcdf at 0x11c8fc160>, ('/Users/jmiller/data/madis/20090327_0600',), 'r', (('mmap', None), ('version', 2))]

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last) <ipython-input-26-04ef422e5840> in <module> ----> 1 ds = xr.open_dataset('madis/20090327_0600')

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime) 536 537 with close_on_error(store): --> 538 ds = maybe_decode_store(store) 539 540 # Ensure source filename always stored in dataset object (GH issue #2550)

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/api.py in maybe_decode_store(store, lock) 444 445 def maybe_decode_store(store, lock=False): --> 446 ds = conventions.decode_cf( 447 store, 448 mask_and_scale=mask_and_scale,

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime) 568 encoding = obj.encoding 569 elif isinstance(obj, AbstractDataStore): --> 570 vars, attrs = obj.load() 571 extra_coords = set() 572 file_obj = obj

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/common.py in load(self) 121 """ 122 variables = FrozenDict( --> 123 (_decode_variable_name(k), v) for k, v in self.get_variables().items() 124 ) 125 attributes = FrozenDict(self.get_attrs())

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/scipy_.py in get_variables(self) 155 def get_variables(self): 156 return FrozenDict( --> 157 (k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items() 158 ) 159

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/scipy_.py in ds(self) 144 @property 145 def ds(self): --> 146 return self._manager.acquire() 147 148 def open_store_variable(self, name, var):

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/file_manager.py in acquire(self, needs_lock) 178 An open file object, as returned by opener(*args, **kwargs). 179 """ --> 180 file, _ = self._acquire_with_cache_info(needs_lock) 181 return file 182

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock) 202 kwargs = kwargs.copy() 203 kwargs["mode"] = self._mode --> 204 file = self._opener(self._args, *kwargs) 205 if self._mode == "w": 206 # ensure file doesn't get overriden when opened again

~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/scipy_.py in _open_scipy_netcdf(filename, mode, mmap, version) 81 82 try: ---> 83 return scipy.io.netcdf_file(filename, mode=mode, mmap=mmap, version=version) 84 except TypeError as e: # netcdf3 message is obscure in this case 85 errmsg = e.args[0]

~/miniconda3/envs/proc/lib/python3.8/site-packages/scipy/io/netcdf.py in init(self, filename, mode, mmap, version, maskandscale) 282 283 if mode in 'ra': --> 284 self._read() 285 286 def setattr(self, attr, value):

~/miniconda3/envs/proc/lib/python3.8/site-packages/scipy/io/netcdf.py in _read(self) 614 self._read_dim_array() 615 self._read_gatt_array() --> 616 self._read_var_array() 617 618 def _read_numrecs(self):

~/miniconda3/envs/proc/lib/python3.8/site-packages/scipy/io/netcdf.py in _read_var_array(self) 720 # Build rec array. 721 if self.use_mmap: --> 722 rec_array = self._mm_buf[begin:begin+self._recs*self._recsize].view(dtype=dtypes) 723 rec_array.shape = (self._recs,) 724 else:

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
```

Output of xr.show_versions()

# Paste the output here xr.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.8.1 | packaged by conda-forge | (default, Jan 29 2020, 15:06:10) [Clang 9.0.1 ] python-bits: 64 OS: Darwin OS-release: 19.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: None libnetcdf: None xarray: 0.15.0 pandas: 1.0.0 numpy: 1.18.1 scipy: 1.4.1 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.10.1 distributed: 2.10.0 matplotlib: 3.1.3 cartopy: 0.17.0 seaborn: 0.10.0 numbagg: None setuptools: 45.1.0.post20200119 pip: 20.0.2 conda: None pytest: None IPython: 7.12.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3767/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
361237908 MDU6SXNzdWUzNjEyMzc5MDg= 2419 Document ways to reshape a DataArray dimitryx2017 9844249 open 0     5 2018-09-18T10:27:36Z 2022-04-09T02:21:15Z   NONE      

Code Sample, a copy-pastable example if possible

A "Minimal, Complete and Verifiable Example" will make it much easier for maintainers to help you: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

```python
# Your code here

def xr_reshape(A, dim, newdims, coords):
    """Reshape DataArray A to convert its dimension dim into sub-dimensions
    given by newdims and the corresponding coords.

    Example: Ar = xr_reshape(A, 'time', ['year', 'month'], [(2017, 2018), np.arange(12)])
    """

# Create a pandas MultiIndex from these labels
ind = pd.MultiIndex.from_product(coords, names=newdims)

# Replace the time index in the DataArray by this new index,
A1 = A.copy()

A1.coords[dim] = ind

# Convert multiindex to individual dims using DataArray.unstack().
# This changes dimension order! The new dimensions are at the end.
A1 = A1.unstack(dim)

# Permute to restore dimensions
i = A.dims.index(dim)
dims = list(A1.dims)

for d in newdims[::-1]:
    dims.insert(i, d)

for d in newdims:
    _ = dims.pop(-1)


return A1.transpose(*dims)

```

Problem description

[this should explain why the current behavior is a problem and why the expected output is a better solution.]

It would be great to have the above function as a DataArray's method.

Expected Output

A reshaped DataArray. In the example in the function comment it would correspond to an array like

In [1]: Ar.dims
Out[1]: ('year', 'month', 'lat', 'lon')
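A small usage sketch consistent with the docstring example (random data; it assumes an xarray version, like the one above, where a pandas MultiIndex can be assigned directly to a coordinate, and that pd/np are imported for xr_reshape itself):

```python
import numpy as np
import pandas as pd
import xarray as xr

# 24 monthly steps spanning 2017-2018 on a small lat/lon grid
A = xr.DataArray(np.random.rand(24, 3, 4), dims=("time", "lat", "lon"))
Ar = xr_reshape(A, "time", ["year", "month"], [(2017, 2018), np.arange(12)])
print(Ar.dims)  # ('year', 'month', 'lat', 'lon')
```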

Output of xr.show_versions()

# Paste the output here xr.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.6.3.final.0 python-bits: 64 OS: Linux OS-release: 3.10.0-693.5.2.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: fr_FR.UTF-8 LOCALE: fr_FR.UTF-8 xarray: 0.10.4 pandas: 0.23.0 numpy: 1.13.3 scipy: 0.19.1 netCDF4: 1.3.1 h5netcdf: None h5py: 2.7.0 Nio: None zarr: None bottleneck: 1.2.1 cyordereddict: None dask: 0.15.3 distributed: 1.19.1 matplotlib: 2.1.0 cartopy: 0.16.0 seaborn: 0.8.1 setuptools: 36.5.0.post20170921 pip: 18.0 conda: 4.4.7 pytest: 3.2.1 IPython: 6.1.0 sphinx: 1.6.3
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2419/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
414641120 MDU6SXNzdWU0MTQ2NDExMjA= 2789 Appending to zarr with string dtype davidbrochart 4711805 open 0     2 2019-02-26T14:31:42Z 2022-04-09T02:18:05Z   CONTRIBUTOR      

```python
import xarray as xr

da = xr.DataArray(['foo'])
ds = da.to_dataset(name='da')
ds.to_zarr('ds')  # no special encoding specified

ds = xr.open_zarr('ds')
print(ds.da.values)
```

The code above prints ['foo'] (string type). The encoding chosen by zarr is "dtype": "|S3", which corresponds to bytes, but it seems to be decoded back to a string, which is what we want.

$ cat ds/da/.zarray
{
    "chunks": [1],
    "compressor": {
        "blocksize": 0,
        "clevel": 5,
        "cname": "lz4",
        "id": "blosc",
        "shuffle": 1
    },
    "dtype": "|S3",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [1],
    "zarr_format": 2
}

The problem is that if I want to append to the zarr archive, like so:

```python
import zarr

ds = zarr.open('ds', mode='a')
da_new = xr.DataArray(['barbar'])
ds.da.append(da_new)

ds = xr.open_zarr('ds')
print(ds.da.values)
```

It prints ['foo' 'bar']. Indeed the encoding was kept as "dtype": "|S3", which is fine for a string of 3 characters but not for 6.

If I want to specify the encoding with the maximum length, e.g:

ds.to_zarr('ds', encoding={'da': {'dtype': '|S6'}})

It solves the length problem, but now my strings are kept as bytes: [b'foo' b'barbar']. If I specify a Unicode encoding:

ds.to_zarr('ds', encoding={'da': {'dtype': 'U6'}})

It is not taken into account. The zarr encoding is "dtype": "|S3" and I am back to my length problem: ['foo' 'bar'].

The solution with 'dtype': '|S6' is acceptable, but I need to encode my strings to bytes when indexing, which is annoying.
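For reference, a minimal sketch of that workaround (fixed-width bytes on disk, decoded back to str after reading); the store name is mine and the decode step assumes plain ASCII strings:

```python
import xarray as xr

ds = xr.DataArray(['foo']).to_dataset(name='da')
ds.to_zarr('ds_bytes', encoding={'da': {'dtype': '|S6'}})

loaded = xr.open_zarr('ds_bytes')
strings = loaded.da.astype(str)  # decode the fixed-width bytes back to unicode
```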

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2789/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
415192339 MDU6SXNzdWU0MTUxOTIzMzk= 2790 Bug in xarray.open_dataset with variables/coordinates of dtype 'timedelta64[ns]' SK-E 48060979 closed 0     1 2019-02-27T15:48:14Z 2022-04-09T02:17:56Z 2022-04-09T02:17:56Z NONE      

Code Sample, a copy-pastable example if possible

```python
import xarray as xr
import pandas as pd
import numpy as np

# Create array, coordinate time's dtype is timedelta64[ns]
time = pd.timedelta_range(f"{2.0}s", f"{2.05}s", freq="10ms", name="time")
data = range(len(time))
arr = xr.DataArray(data=data, coords={"time": time}, dims="time", name="psi")

# Save array
savefile = "/path/to/file/BugXarray.nc"
arr.to_netcdf(savefile)

# Load array
arr_loaded = xr.open_dataset(savefile)

# Show time-coordinate on arr and arr_loaded
print(arr.time.values)
# Output: [2000000000 2010000000 2020000000 2030000000 2040000000 2050000000]
print(arr_loaded.time.values)
# Output: [2000000000 2009999999 2020000000 2029999999 2040000000 2049999999]

# Same problem with pandas to_timedelta
timedelta = np.arange(200, 206, 1) / 100
timedelta = pd.to_timedelta(timedelta, unit="s")

# Show time and timedelta
print(time.values)
# Output: [2000000000 2010000000 2020000000 2030000000 2040000000 2050000000]
print(timedelta.values)
# Output: [2000000000 2009999999 2020000000 2029999999 2040000000 2049999999]
```

Problem description

Opening a netCDF file that contains variables/coordinates with a dtype that is supposed to be 'timedelta64[ns]' might cause errors due to a loss in precision. I noticed that the pandas function pandas.to_timedelta shows the same misbehavior, though I don't know whether xarray.open_dataset uses that function internally.
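For illustration (my addition, just restating the floating-point arithmetic involved): if the timedeltas pass through floating-point seconds at any point, 2.01 s is not exactly representable in binary floating point, and truncating back to integer nanoseconds gives 2009999999.

```python
seconds = 2.01
print(f"{seconds:.20f}")      # 2.00999999999999978684...
print(int(seconds * 1e9))     # 2009999999 (truncated), not 2010000000
print(round(seconds * 1e9))   # 2010000000 (rounding would avoid the drift)
```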

Expected Output

In the example above arr_loaded.time.values should equal arr.time.values!

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.2 (default, Dec 29 2018, 06:19:36) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.15.0-45-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: de_DE.UTF-8 LOCALE: de_DE.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.1 xarray: 0.11.3 pandas: 0.24.1 numpy: 1.15.4 scipy: 1.2.1 netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: 2.9.0 Nio: None zarr: None cftime: 1.0.3.4 PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.2.1 cyordereddict: None dask: 1.1.1 distributed: 1.25.3 matplotlib: 3.0.2 cartopy: None seaborn: 0.9.0 setuptools: 40.8.0 pip: 19.0.1 conda: 4.6.4 pytest: 4.2.1 IPython: 7.2.0 sphinx: 1.8.4
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2790/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
428180638 MDU6SXNzdWU0MjgxODA2Mzg= 2863 Memory Error for simple operations on NETCDF4 internally zipped files rpnaut 30219501 closed 0     3 2019-04-02T11:48:01Z 2022-04-09T02:15:45Z 2022-04-09T02:15:45Z NONE      

Assuming you want to do simple computations with a data array loaded from an internally zipped NETCDF4 file, you first need to load a dataset:

In [2]: eobs = xarray.open_dataset("eObs_ens_mean_0.1deg_reg_v18.0e.T_2M.1950-2018.nc")

In [3]: eobs
Out[3]:
<xarray.Dataset>
Dimensions: (lat: 465, lon: 705, time: 25049)
Coordinates:
  * time (time) datetime64[ns] 1950-01-01 1950-01-02 1950-01-03 ...
  * lon (lon) float64 -24.95 -24.85 -24.75 -24.65 -24.55 -24.45 -24.35 ...
  * lat (lat) float64 25.05 25.15 25.25 25.35 25.45 25.55 25.65 25.75 ...
Data variables:
    T_2M (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
Attributes:
    _NCProperties: version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17
    E-OBS_version: 18.0e
    Conventions: CF-1.4
    References: http://surfobs.climate.copernicus.eu/dataaccess/access_eo...

Afterwards I have tried to do this:

```
In [4]: datarray = eobs["T_2M"] + 273.15


MemoryError Traceback (most recent call last) <ipython-input-4-eaff3bff5e27> in <module>() ----> 1 datarray=eobs["T_2M"]+273.15

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/dataarray.py in func(self, other) 1539 1540 variable = (f(self.variable, other_variable) -> 1541 if not reflexive 1542 else f(other_variable, self.variable)) 1543 coords = self.coords._merge_raw(other_coords)

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in func(self, other) 1139 if isinstance(other, (xr.DataArray, xr.Dataset)): 1140 return NotImplemented -> 1141 self_data, other_data, dims = _broadcast_compat_data(self, other) 1142 new_data = (f(self_data, other_data) 1143 if not reflexive

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in _broadcast_compat_data(self, other) 1379 else: 1380 # rely on numpy broadcasting rules -> 1381 self_data = self.data 1382 other_data = other 1383 dims = self.dims

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in data(self) 265 return self._data 266 else: --> 267 return self.values 268 269 @data.setter

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in values(self) 306 def values(self): 307 """The variable's data as a numpy.ndarray""" --> 308 return _as_array_or_item(self._data) 309 310 @values.setter

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in _as_array_or_item(data) 182 TODO: remove this (replace with np.asarray) once these issues are fixed 183 """ --> 184 data = np.asarray(data) 185 if data.ndim == 0: 186 if data.dtype.kind == 'M':

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/numpy-1.11.2-py3.5-linux-x86_64.egg/numpy/core/numeric.py in asarray(a, dtype, order) 480 481 """ --> 482 return array(a, dtype, copy=False, order=order) 483 484 def asanyarray(a, dtype=None, order=None):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/indexing.py in array(self, dtype) 417 418 def array(self, dtype=None): --> 419 self._ensure_cached() 420 return np.asarray(self.array, dtype=dtype) 421

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/indexing.py in _ensure_cached(self) 414 def _ensure_cached(self): 415 if not isinstance(self.array, np.ndarray): --> 416 self.array = np.asarray(self.array) 417 418 def array(self, dtype=None):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/numpy-1.11.2-py3.5-linux-x86_64.egg/numpy/core/numeric.py in asarray(a, dtype, order) 480 481 """ --> 482 return array(a, dtype, copy=False, order=order) 483 484 def asanyarray(a, dtype=None, order=None):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/indexing.py in array(self, dtype) 398 399 def array(self, dtype=None): --> 400 return np.asarray(self.array, dtype=dtype) 401 402 def getitem(self, key):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/numpy-1.11.2-py3.5-linux-x86_64.egg/numpy/core/numeric.py in asarray(a, dtype, order) 480 481 """ --> 482 return array(a, dtype, copy=False, order=order) 483 484 def asanyarray(a, dtype=None, order=None):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/indexing.py in array(self, dtype) 373 def array(self, dtype=None): 374 array = orthogonally_indexable(self.array) --> 375 return np.asarray(array[self.key], dtype=None) 376 377 def getitem(self, key):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/conventions.py in getitem(self, key) 361 def getitem(self, key): 362 return mask_and_scale(self.array[key], self.fill_value, --> 363 self.scale_factor, self.add_offset, self._dtype) 364 365 def repr(self):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/conventions.py in mask_and_scale(array, fill_value, scale_factor, add_offset, dtype) 57 """ 58 # by default, cast to float to ensure NaN is meaningful ---> 59 values = np.array(array, dtype=dtype, copy=True) 60 if fill_value is not None and not np.all(pd.isnull(fill_value)): 61 if getattr(fill_value, 'size', 1) > 1:

MemoryError:
```

I have uploaded the datafile to the following link:

https://swiftbrowser.dkrz.de/public/dkrz_c0725fe8741c474b97f291aac57f268f/GregorMoeller/

Am I using the wrong netCDF engine?
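A possible direction (a hedged sketch on my part, assuming dask is installed; not a confirmed fix): open the file with chunks so the addition is evaluated lazily, block by block, instead of materialising the full (25049, 465, 705) float64 array at once.

```python
import xarray as xr

eobs = xr.open_dataset(
    "eObs_ens_mean_0.1deg_reg_v18.0e.T_2M.1950-2018.nc",
    chunks={"time": 365},
)
datarray = eobs["T_2M"] + 273.15     # builds a lazy dask computation
datarray.isel(time=0).compute()      # only this slice is loaded and evaluated
```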

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2863/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
441192361 MDU6SXNzdWU0NDExOTIzNjE= 2945 Implicit conversion from int to float tampers with values when int is not representable as float floogit 14000880 closed 0     1 2019-05-07T11:57:20Z 2022-04-09T02:14:28Z 2022-04-09T02:14:28Z NONE      

```python
import xarray as xr

ds = xr.Dataset()
val = 95042027804193144
ds['var1'] = xr.DataArray(val)
ds_1 = ds.where(ds.var1 == val)

print(ds_1.var1.dtype)  # dtype('float64')
print(int(ds_1.var1))   # 95042027804193152
```

Problem description

As described in #2183, int values are converted to float in where(), also when there are no NaNs in the data. This is a serious issue for the case when the int64 number is not representable as float64, as is the case in the example above. The resulting numbers are then actually different from the original numbers, without any warning.
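For illustration (my addition, restating the arithmetic behind the example): float64 has a 53-bit significand, so integers above 2**53 are not all exactly representable, which is why the value shifts.

```python
val = 95042027804193144
print(val > 2**53)                      # True: beyond the exact-integer range of float64
print(float(val) == 95042027804193152)  # True: the nearest representable float64
print(int(float(val)))                  # 95042027804193152
```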

Expected Output

I guess this is hard to fix. At a minimum, where() should probably not cast to float when there are no NaNs (which would already fix our use case). I would also rather expect an error instead of silently changing the values of a variable.

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.5.3 (default, Sep 27 2018, 17:25:39) [GCC 6.3.0 20170516] python-bits: 64 OS: Linux OS-release: 4.19.0-0.bpo.2-amd64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.8.18 libnetcdf: 4.4.1.1 xarray: 0.12.1 pandas: 0.24.2 numpy: 1.15.4 scipy: 1.2.1 netCDF4: 1.2.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.3.4 nc_time_axis: None PseudonetCDF: None rasterio: 0.36.0 cfgrib: 0.9.6.post1 iris: None bottleneck: 1.2.1 dask: 1.1.4 distributed: None matplotlib: 3.0.3 cartopy: 0.16.0 seaborn: 0.8.1 setuptools: 39.2.0 pip: 19.0.3 conda: None pytest: 4.4.1 IPython: 7.5.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2945/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
326344778 MDU6SXNzdWUzMjYzNDQ3Nzg= 2183 converting int vars to floats when I where the enclosing ds? IvoCrnkovic 1778852 open 0     5 2018-05-25T00:48:43Z 2022-04-09T02:14:23Z   NONE      

Code Sample

```python
import numpy as np
import xarray as xr

test_ds = xr.Dataset()

test_ds['var1'] = xr.DataArray(np.arange(5))
test_ds['var2'] = xr.DataArray(np.ones(5))

assert test_ds['var1'].dtype == np.int64

assert test_ds.where(test_ds['var2'] == 1)['var1'].dtype == np.int64
```

Problem description

The second assert fails, which is a bit strange I think. Is that intended? If so, what's the reasoning?
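For what it's worth, a sketch of a workaround I would expect to keep the integer dtype (an assumption on my part, not a confirmed answer): supply an integer fill value so where() never has to introduce NaN.

```python
import numpy as np
import xarray as xr

test_ds = xr.Dataset()
test_ds['var1'] = xr.DataArray(np.arange(5))
test_ds['var2'] = xr.DataArray(np.ones(5))

masked = test_ds.where(test_ds['var2'] == 1, other=-1)
print(masked['var1'].dtype)  # expected to stay int64 with an integer `other`
```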

Output of xr.show_versions()

commit: None python: 2.7.14.final.0 python-bits: 64 OS: Linux OS-release: 4.9.87-linuxkit-aufs machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: None LOCALE: None.None xarray: 0.10.3 pandas: 0.22.0 numpy: 1.14.3 scipy: 1.1.0 netCDF4: None h5netcdf: None h5py: None Nio: None zarr: None bottleneck: 1.2.1 cyordereddict: None dask: None distributed: None matplotlib: 2.2.2 cartopy: None seaborn: 0.8.1 setuptools: 39.1.0 pip: 10.0.1 conda: None pytest: 3.5.1 IPython: 5.6.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2183/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
446868198 MDU6SXNzdWU0NDY4NjgxOTg= 2978 sel(method=x) is not propagated for MultiIndex mschrimpf 5308236 open 0     3 2019-05-21T23:30:56Z 2022-04-09T02:09:00Z   NONE      

When passing a method different from None to the selection method (e.g. .sel(method='nearest')), it is not propagated if the index is a MultiIndex. Specifically, the passing of the method key seems to be missing in xarray/core/indexing.py:convert_label_indexer https://github.com/pydata/xarray/blob/0811141e8f985a1f3b95ead92c3850cc74e160a5/xarray/core/indexing.py#L158-L159

For a normal index, the method is passed properly: https://github.com/pydata/xarray/blob/0811141e8f985a1f3b95ead92c3850cc74e160a5/xarray/core/indexing.py#L181

This leads to an unexpected KeyError when the selection value is not in the index, even if a nearest value could have been found.
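A hypothetical reproduction sketch (coordinate names and values are mine, not taken from the report):

```python
import numpy as np
import pandas as pd
import xarray as xr

idx = pd.MultiIndex.from_product([["a", "b"], [0.0, 1.0]], names=["letter", "value"])
da = xr.DataArray(np.arange(4), dims="points", coords={"points": idx})

da.sel(points=("a", 0.0))            # exact lookup works
da.sel(value=0.4, method="nearest")  # reported to raise KeyError instead of matching 0.0
```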

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.7.final.0 python-bits: 64 OS: Linux OS-release: 4.4.0-143-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.10.8 pandas: 0.24.2 numpy: 1.16.2 scipy: 1.1.0 netCDF4: 1.4.2 h5netcdf: None h5py: 2.8.0 Nio: None zarr: None bottleneck: None cyordereddict: None dask: 0.20.0 distributed: None matplotlib: 3.0.1 cartopy: None seaborn: 0.9.0 setuptools: 40.8.0 pip: 19.0.3 conda: None pytest: 3.10.0 IPython: 7.1.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2978/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
447044177 MDU6SXNzdWU0NDcwNDQxNzc= 2980 Jupyter Notebooks for Tutorials(USER GUIDE) hdsingh 30382331 open 0     3 2019-05-22T10:01:26Z 2022-04-09T02:07:55Z   NONE      

This issue is more of a suggestion.

A small issue that users reading the documentation face is the unavailability of Jupyter notebooks for the tutorial docs in the User Guide. Users constantly have to copy-paste code from the documentation or .rst files, which wastes time. Having executable notebooks would help new users save time and quickly move on to using xarray for their specific tasks. It would also ease the learning process for new users, which may in turn bring more contributors to the xarray community.

Let's take example of pyviz, holoviews, pytorch.

00 Setup — PyViz 0.10.0 documentation

holoviews/examples/user_guide at master · pyviz/holoviews · GitHub

Chatbot Tutorial — PyTorch Tutorials 1.1.0.dev20190507 documentation

All of them provide option to download the tutorial in the form of .ipynb file either in the beginning or end of the notebook.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2980/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
467736580 MDU6SXNzdWU0Njc3MzY1ODA= 3109 In the contribution instructions, the py36.yml fails to set up mmartini-usgs 23199378 closed 0     2 2019-07-13T15:55:23Z 2022-04-09T02:05:48Z 2022-04-09T02:05:48Z NONE      

Code Sample, a copy-pastable example if possible

conda env create -f ci/requirements/py36.yml

Problem description

In the contribution instructions, the py36.yml fails to set up, so the test environment does not get created.

Expected Output

A test environment

Output of xr.show_versions()

Environment fails to build, cannot be resolved.

The fix is to change

conda env create -f ci/requirements/py36.yml

to

conda env create -f ci/requirements/py37.yml

on this page: http://xarray.pydata.org/en/latest/contributing.html

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3109/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
484699415 MDU6SXNzdWU0ODQ2OTk0MTU= 3256 .item() on a DataArray with dtype='datetime64[ns]' returns int IvoCrnkovic 1778852 open 0     4 2019-08-23T20:29:50Z 2022-04-09T02:03:43Z   NONE      

MCVE Code Sample

```python
import datetime
import xarray as xr

test_da = xr.DataArray(datetime.datetime(2019, 1, 1, 1, 1))

test_da
# <xarray.DataArray ()>
# array('2019-01-01T01:01:00.000000000', dtype='datetime64[ns]')

test_da.item()
# 1546304460000000000
```

Expected Output

I would think it would be nicer to get a datetime out of the .item() call rather than the nanosecond representation.
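Continuing from the example above, a possible workaround sketch (my suggestion, not from the report): go via the numpy scalar and convert with pandas.

```python
import pandas as pd

ts = pd.Timestamp(test_da.values[()])  # extract the numpy datetime64 scalar, wrap as Timestamp
py_dt = ts.to_pydatetime()             # datetime.datetime(2019, 1, 1, 1, 1)
```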

Output of xr.show_versions()

When I call xr.show_versions() I get an error, but I'm running xarray 0.12.3.
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3256/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
478398026 MDU6SXNzdWU0NzgzOTgwMjY= 3192 Cloud Storage Buckets pl-marasco 22492773 closed 0     1 2019-08-08T10:58:05Z 2022-04-09T01:51:09Z 2022-04-09T01:51:09Z NONE      

Following the instructions to create cloud storage here, I stumbled on the fact that gcsfs apparently no longer implements .mapping; in the example it is used as: gcsfs.mapping.GCSMap('<bucket-name>', gcs=fs, check=True, create=False)

Is the example correct, or must it be rewritten?
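For context, a sketch of the newer gcsfs/fsspec spelling (an assumption based on current fsspec conventions, not taken from the linked docs; project and bucket names are placeholders):

```python
import gcsfs

fs = gcsfs.GCSFileSystem(project='<project-id>')
store = fs.get_mapper('<bucket-name>')  # replaces gcsfs.mapping.GCSMap
# ds.to_zarr(store=store)  or  xr.open_zarr(store)
```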

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3192/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
60303760 MDU6SXNzdWU2MDMwMzc2MA== 364 pd.Grouper support? naught101 167164 open 0     24 2015-03-09T06:25:14Z 2022-04-09T01:48:48Z   NONE      

In pandas, you can pass a pandas.TimeGrouper object to a .groupby() call, and it allows you to group by month, year, day, or other times, without manually creating a new index with those values first. It would be great if you could do this with xray, but at the moment, I get:

```
/usr/local/lib/python3.4/dist-packages/xray/core/groupby.py in __init__(self, obj, group, squeeze)
     66         if the dimension is squeezed out.
     67         """
---> 68         if group.ndim != 1:
     69             # TODO: remove this limitation?
     70             raise ValueError('`group` must be 1 dimensional')

AttributeError: 'TimeGrouper' object has no attribute 'ndim'
```

Not sure how this will work though, because pandas.TimeGrouper doesn't appear to work with multi-index dataframes yet anyway, so maybe there needs to be a feature request over there too, or maybe it's better to implement something from scratch...
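For reference, a small sketch of the time grouping that is already possible without pd.Grouper (standard xarray API; the array is my own toy example):

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.random.rand(365),
    dims="time",
    coords={"time": pd.date_range("2015-01-01", periods=365)},
)

by_month = da.groupby("time.month").mean()  # group by calendar month number
monthly = da.resample(time="1M").mean()     # resample to calendar months
```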

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/364/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
400289716 MDU6SXNzdWU0MDAyODk3MTY= 2686 Is `create_test_data()` public API? TomNicholas 35968931 open 0     3 2019-01-17T14:00:20Z 2022-04-09T01:48:14Z   MEMBER      

We want to encourage people to use and extend xarray, and we already provide testing functions as public API to help with this.

One function I keep using when writing code which uses xarray is xarray.tests.test_dataset.create_test_data(). This is very useful for quickly writing tests for the same reasons that it's useful in xarray's internal tests, but it's not explicitly public API. This means that there's no guarantee it won't change/disappear, which is not ideal if you're trying to write a test suite for separate software. But so many tests in xarray rely on it that presumably it's not going to get changed.

Is there any reason why it shouldn't be public API? Is there something I should use instead?
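For concreteness, this is the pattern in question (a minimal sketch; the caveat above is precisely that this import path is not documented as public):

```python
from xarray.tests.test_dataset import create_test_data

ds = create_test_data()  # a small Dataset handy for third-party test suites
```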

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2686/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
377096851 MDU6SXNzdWUzNzcwOTY4NTE= 2539 Request: Add support for the ERDDAP griddap request rmendels 1919031 closed 0     3 2018-11-03T21:56:10Z 2022-04-09T01:47:28Z 2022-04-09T01:47:28Z NONE      

xarray already supports OPeNDAP requests, and the ERDDAP service is being installed in many places. While an ERDDAP server can function as an OPeNDAP server, and its syntax is very close to the OPeNDAP syntax, ERDDAP/griddap has the advantage that requests can be made in coordinate space.

Moreover, it would not have to be coded from scratch, ERDDAPy (https://github.com/pyoceans/erddapy) already has the code, it would be more of a question on how to integrate it. The ERDDAP service can return both netcdf or .dds files if that makes it easier to integrate.
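For context, a sketch of what already works today through the OPeNDAP endpoint that ERDDAP exposes (the server URL, dataset id, and variable name are placeholders, not real examples from this request):

```python
import xarray as xr

url = "https://<erddap-server>/erddap/griddap/<datasetID>"
ds = xr.open_dataset(url)  # plain OPeNDAP access; needs netCDF4 or pydap installed
subset = ds["<variable>"].sel(latitude=slice(30, 40), longitude=slice(-80, -70))
```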

Thanks.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2539/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
818266159 MDU6SXNzdWU4MTgyNjYxNTk= 4973 NetCDF encoded data not automatically decoded back into original dtype chrism0dwk 625462 closed 0     2 2021-02-28T17:57:33Z 2022-04-09T01:41:22Z 2022-04-09T01:41:22Z NONE      

What happened: When reading in an encoded netCDF4 file, encoded variables are not transformed back to their original dtype in the resulting xarray.

What you expected to happen: As with the raw netCDF4 package, if an xarray.DataArray of dtype float64 is encoded into a netCDF4 file as a float32, it should be converted back to the original float64 when the netCDF4 dataset is read back in.

Minimal Complete Verifiable Example:

import xarray as xr
import numpy as np

foo = xr.DataArray(np.random.uniform(size=[100, 100]).astype(np.float64))
foo.dtype  # float64
ds = xr.Dataset({'foo': foo})
ds.to_netcdf("foo.nc", encoding={'foo': {'dtype': 'float32', 'scale_factor': 1.0, 'add_offset': 0.0}})
ds1 = xr.open_dataset("foo.nc")
ds1['foo'].dtype  # float32, not float64 as expected
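A simple interim workaround sketch (my suggestion, not part of the report): cast back explicitly after decoding.

```python
import numpy as np
import xarray as xr

ds1 = xr.open_dataset("foo.nc")
foo64 = ds1["foo"].astype(np.float64)  # restore the original dtype by hand
```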

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.7.7 (default, Mar 23 2020, 22:36:06) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 5.4.0-66-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 0.17.0 pandas: 1.1.5 numpy: 1.19.5 scipy: None netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.4.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.02.0 distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 49.6.0 pip: 20.2.2 conda: None pytest: None IPython: 7.21.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4973/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
830040696 MDU6SXNzdWU4MzAwNDA2OTY= 5024 xr.DataArray.sum() converts string objects into unicode FabianHofmann 19226431 open 0     0 2021-03-12T11:47:06Z 2022-04-09T01:40:09Z   CONTRIBUTOR      

What happened:

When summing over all axes of a DataArray with strings of dtype object, the result is a one-size unicode DataArray.

What you expected to happen:

I expected the summation would preserve the dtype, meaning the one-size DataArray would be of dtype object

Minimal Complete Verifiable Example:

ds = xr.DataArray('a', [range(3), range(3)]).astype(object)
ds.sum()

Output:
<xarray.DataArray ()>
array('aaaaaaaaa', dtype='<U9')

On the other hand, when summing over one dimension only, the dtype is preserved:

ds.sum('dim_0')

Output:
<xarray.DataArray (dim_1: 3)>
array(['aaa', 'aaa', 'aaa'], dtype=object)
Coordinates:
  * dim_1 (dim_1) int64 0 1 2

Anything else we need to know?:

The problem becomes relevant as soon as dask is used in the workflow. Dask expects the aggregated DataArray to be of dtype object which will likely lead to errors in the operations to follow.

Probably the behavior comes from creating a new DataArray after the reduction with np.sum() (which itself results in a pure Python string).
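A short sketch of that suspected mechanism (my reading of the above, stated as an assumption): the full reduction yields a plain Python str, and re-wrapping that scalar infers a unicode dtype instead of object.

```python
import numpy as np
import xarray as xr

arr = np.full((3, 3), 'a', dtype=object)
total = arr.sum()                  # string concatenation -> 'aaaaaaaaa'
print(type(total))                 # <class 'str'>
print(xr.DataArray(total).dtype)   # <U9, matching the output above
```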

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 5.4.0-66-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.2 pandas: 1.2.1 numpy: 1.19.5 scipy: 1.6.0 netCDF4: 1.5.5.1 pydap: None h5netcdf: 0.7.4 h5py: 3.1.0 Nio: None zarr: 2.3.2 cftime: 1.3.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.2.0 cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.01.1 distributed: 2021.01.1 matplotlib: 3.3.3 cartopy: 0.18.0 seaborn: 0.11.1 numbagg: None pint: None setuptools: 52.0.0.post20210125 pip: 21.0 conda: 4.9.2 pytest: 6.2.2 IPython: 7.19.0 sphinx: 3.4.3
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5024/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
882105903 MDU6SXNzdWU4ODIxMDU5MDM= 5281 'Parallelized' apply_ufunc for scripy.interpolate.griddata LJaksic 74414841 open 0     4 2021-05-09T10:08:46Z 2022-04-09T01:39:13Z   NONE      

Hi,

I'm working with large files from an ocean model with an unstructured grid. For instance, the flow velocity variable ux has dimensions (194988, 1009, 20) for, respectively, 'nFlowElement' (the unstructured grid element), 'time' and 'laydim' (the depth dimension). I'd like to interpolate these results to a structured grid with dimensions (600, 560, 1009, 20) for, respectively, latitude, longitude, time and laydim. For this I am using scipy.interpolate.griddata. As these dataarrays are too large to load into working memory at once, I am trying to work with 'chunks' (dask). Unfortunately, I bump into problems when trying to use apply_ufunc with dask='parallelized'.

For smaller computational domains (a smaller nFlowElement dimension) I am still able to load the dataarray into working memory. Then, the following code gives me the wanted result:

```python
def interp_to_grid(u, xc, yc, xint, yint):
    print(u.shape, xc.shape, xint.shape)
    ug = griddata((xc, yc), u, (xint, yint), method='nearest', fill_value=np.nan)
    return ug

uxg = xr.apply_ufunc(
    interp_to_grid,
    ux, xc, yc, xint, yint,
    dask='allowed',
    input_core_dims=[['nFlowElem', 'time', 'laydim'], ['nFlowElem'], ['nFlowElem'],
                     ['dim_0', 'dim_1'], ['dim_0', 'dim_1']],
    output_core_dims=[['dim_0', 'dim_1', 'time', 'laydim']],
    output_dtypes=[xr.DataArray],
)
```

Notice that in the function interp_to_grid the input variables have the following dimensions:

- u (i.e. ux, the original flow velocity output): (194988, 1009, 20) for (nFlowElem, time, laydim)
- xc, yc (the latitude and longitude coordinates associated with these 194988 elements): both (194988,)
- xint, yint (the structured grid coordinates to which I would like to interpolate the data): both (600, 560) for (dim_0, dim_1)

Notice that scipy.interpolate.griddata does not require me to loop over the time and laydim dimensions (as formulated in the code above). For this it is critical to feed griddata the dimensions in the right order ('time' and 'laydim' last). The interpolated result, uxg, has dimensions (600, 560, 1009, 20) - as wanted and expected.

However, for much larger spatial domains it is required to work with dask='parallelized', because these input dataarrays can no longer be loaded into my working memory. I have tried to apply chunks over the time dimension, but also over the nFlowElement dimension. I am aware that it is not possible to chunk over core dimensions.

This is one of my "parallel" attempts (with chunks along the time dim):

Input ux:

<xarray.DataArray 'ucx' (nFlowElem: 194988, time: 1009, laydim: 20)>
dask.array<transpose, shape=(194988, 1009, 20), dtype=float64, chunksize=(194988, 10, 20), chunktype=numpy.ndarray>
Coordinates:
    FlowElem_xcc  (nFlowElem) float64 dask.array<chunksize=(194988,), meta=np.ndarray>
    FlowElem_ycc  (nFlowElem) float64 dask.array<chunksize=(194988,), meta=np.ndarray>
  * time          (time) datetime64[ns] 2014-09-17 ... 2014-10-01
Dimensions without coordinates: nFlowElem, laydim
Attributes:
    standard_name: eastward_sea_water_velocity
    long_name: velocity on flow element center, x-component
    units: m s-1
    grid_mapping: wgs84

The apply_ufunc call:

uxg = xr.apply_ufunc(
    interp_to_grid,
    ux, xc, yc, xint, yint,
    dask='parallelized',
    input_core_dims=[['nFlowElem'], ['nFlowElem'], ['nFlowElem'],
                     ['dim_0', 'dim_1'], ['dim_0', 'dim_1']],
    output_core_dims=[['dim_0', 'dim_1']],
    output_dtypes=[xr.DataArray],
)

Gives error:

```
File "interpnd.pyx", line 78, in scipy.interpolate.interpnd.NDInterpolatorBase.__init__

File "interpnd.pyx", line 192, in scipy.interpolate.interpnd._check_init_shape

ValueError: different number of values and points
```

I have played around a lot with changing the core dimensions in apply_ufunc and the dimension along which to chunk. Also I have tried to manually change the order of dimensions of the dataarray u which is 'fed to' griddata (in interp_to_grid).

Any advice is very welcome! Best Wishes, Luka
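One direction that might resolve the shape mismatch, sketched under my own assumptions (names match the snippets above, dask and scipy are installed, the target grid is 600 x 560, and the xarray version accepts dask_gufunc_kwargs): with dask='parallelized', apply_ufunc moves the core dims to the end of each block, so the wrapper has to move nFlowElem back to the front before calling griddata, move the interpolated grid dims to the end afterwards, and declare the sizes of the new output dims.

```python
import numpy as np
import xarray as xr
from scipy.interpolate import griddata

def interp_block(u, xc, yc, xint, yint):
    # u arrives as (time_chunk, laydim, nFlowElem): core dims are placed last.
    u = np.moveaxis(u, -1, 0)                           # -> (nFlowElem, time_chunk, laydim)
    ug = griddata((xc, yc), u, (xint, yint),
                  method='nearest', fill_value=np.nan)  # -> (dim_0, dim_1, time_chunk, laydim)
    return np.moveaxis(ug, (0, 1), (-2, -1))            # -> (time_chunk, laydim, dim_0, dim_1)

uxg = xr.apply_ufunc(
    interp_block,
    ux, xc, yc, xint, yint,
    dask='parallelized',
    input_core_dims=[['nFlowElem'], ['nFlowElem'], ['nFlowElem'],
                     ['dim_0', 'dim_1'], ['dim_0', 'dim_1']],
    output_core_dims=[['dim_0', 'dim_1']],
    dask_gufunc_kwargs={'output_sizes': {'dim_0': 600, 'dim_1': 560}},
    output_dtypes=[np.float64],
)
```

The result would come back as (time, laydim, dim_0, dim_1) and can be transposed afterwards if the (dim_0, dim_1, time, laydim) order is needed.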

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5281/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
856900805 MDU6SXNzdWU4NTY5MDA4MDU= 5148 Handling of non-string dimension names bcbnz 367900 open 0     5 2021-04-13T12:13:44Z 2022-04-09T01:36:19Z   CONTRIBUTOR      

While working on a pull request (#5149) for #5146 I came across an inconsistency in allowed dimension names. If I try and create a DataArray with a non-string dimension, I get a TypeError:

```python console
>>> import numpy as np
>>> import xarray as xr
>>> da = xr.DataArray(np.ones((5, 5)), dims=[1, "y"])
...
TypeError: dimension 1 is not a string
```

But creating it with a string and renaming it works:

```python console
>>> da = xr.DataArray(np.ones((5, 5)), dims=["x", "y"]).rename(x=1)
>>> da
<xarray.DataArray (1: 5, y: 5)>
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
Dimensions without coordinates: 1, y
```

I can create a dataset via this renaming, but trying to get the repr value fails as xarray.core.utils.SortedKeysDict tries to sort it and cannot compare the string dimension to the int dimension:

```python console

>>> import xarray as xr
>>> ds = xr.Dataset({"test": xr.DataArray(np.ones((5, 5)), dims=["x", "y"]).rename(x=1)})
>>> ds
...
~/software/external/xarray/xarray/core/formatting.py in dataset_repr(ds)
    519
    520     dims_start = pretty_print("Dimensions:", col_width)
--> 521     summary.append("{}({})".format(dims_start, dim_summary(ds)))
    522
    523     if ds.coords:

~/software/external/xarray/xarray/core/formatting.py in dim_summary(obj)
    422
    423 def dim_summary(obj):
--> 424     elements = [f"{k}: {v}" for k, v in obj.sizes.items()]
    425     return ", ".join(elements)
    426

~/software/external/xarray/xarray/core/formatting.py in <listcomp>(.0)
    422
    423 def dim_summary(obj):
--> 424     elements = [f"{k}: {v}" for k, v in obj.sizes.items()]
    425     return ", ".join(elements)
    426

/usr/lib/python3.9/_collections_abc.py in __iter__(self)
    847
    848     def __iter__(self):
--> 849         for key in self._mapping:
    850             yield (key, self._mapping[key])
    851

~/software/external/xarray/xarray/core/utils.py in __iter__(self)
    437
    438     def __iter__(self) -> Iterator[K]:
--> 439         return iter(self.mapping)
    440
    441     def __len__(self) -> int:

~/software/external/xarray/xarray/core/utils.py in __iter__(self)
    504     def __iter__(self) -> Iterator[K]:
    505         # see #4571 for the reason of the type ignore
--> 506         return iter(sorted(self.mapping))  # type: ignore[type-var]
    507
    508     def __len__(self) -> int:

TypeError: '<' not supported between instances of 'str' and 'int'
```

The same thing happens if I call rename on the dataset rather than the array it is initialised with.

If the initialiser requires the dimension names to be strings, and other code (which includes the HTML formatter I was looking at when I found this) assumes that they are, then rename and any other method which can alter dimension names should also enforce the string requirement.

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: 851d85b9203b49039237b447b3707b270d613db5 python: 3.9.2 (default, Feb 20 2021, 18:40:11) [GCC 10.2.0] python-bits: 64 OS: Linux OS-release: 5.11.13-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_NZ.UTF-8 LOCALE: en_NZ.UTF-8 libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 0.17.0 pandas: 1.2.3 numpy: 1.20.1 scipy: 1.6.2 netCDF4: 1.5.6 pydap: None h5netcdf: 0.10.0 h5py: 3.2.1 Nio: None zarr: None cftime: 1.4.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.2.2 cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.03.0 distributed: 2021.03.0 matplotlib: 3.4.1 cartopy: 0.18.0 seaborn: 0.11.1 numbagg: None pint: None setuptools: 54.2.0 pip: 20.3.1 conda: None pytest: 6.2.3 IPython: 7.22.0 sphinx: 3.5.4
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5148/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
910844095 MDU6SXNzdWU5MTA4NDQwOTU= 5434 xarray.open_rasterio ghost 10137 closed 0     2 2021-06-03T20:51:38Z 2022-04-09T01:31:26Z 2022-04-09T01:31:26Z NONE      

Could you please change xarray.open_rasterio from experimental to stable, with faster reading of geotiff files if possible? For the original array indexing capabilities, I would rather stick with xarray than rioxarray. With much respect, thank you.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5434/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1030768250 I_kwDOAMm_X849cEZ6 5877 Rolling() gives values different from pd.rolling() chiaral 8453445 open 0     4 2021-10-19T21:41:42Z 2022-04-09T01:29:07Z   CONTRIBUTOR      

I am not sure this is a bug - but it clearly doesn't give the results the user would expect.

The rolling sum of zeros gives me values that are not zeros

```python
import numpy as np
import xarray as xr

var = np.array([0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.31 , 0.91999996, 8.3 , 1.42 , 0.03 , 1.22 , 0.09999999, 0.14 , 0.13 , 0. , 0.12 , 0.03 , 2.53 , 0. , 0.19999999, 0.19999999, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ], dtype='float32')

timet = np.array([ 43200000000000, 129600000000000, 216000000000000, 302400000000000, 388800000000000, 475200000000000, 561600000000000, 648000000000000, 734400000000000, 820800000000000, 907200000000000, 993600000000000, 1080000000000000, 1166400000000000, 1252800000000000, 1339200000000000, 1425600000000000, 1512000000000000, 1598400000000000, 1684800000000000, 1771200000000000, 1857600000000000, 1944000000000000, 2030400000000000, 2116800000000000, 2203200000000000, 2289600000000000, 2376000000000000, 2462400000000000, 2548800000000000, 2635200000000000, 2721600000000000, 2808000000000000, 2894400000000000, 2980800000000000], dtype='timedelta64[ns]')

ds_ex = xr.Dataset(
    data_vars=dict(pr=(["time"], var)),
    coords=dict(time=("time", timet)),
)

ds_ex.rolling(time=3).sum().pr.values
```

it gives me this result:

array([ nan, nan, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 3.1000000e-01, 1.2300000e+00, 9.5300007e+00, 1.0640000e+01, 9.7500000e+00, 2.6700001e+00, 1.3500001e+00, 1.4600002e+00, 3.7000012e-01, 2.7000013e-01, 2.5000012e-01, 1.5000013e-01, 2.6800001e+00, 2.5600002e+00, 2.7300003e+00, 4.0000033e-01, 4.0000033e-01, 2.0000035e-01, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07], dtype=float32)

Note the non-zero values - the non-zero value changes depending on whether I use float64 or float32 as the precision of my data. So this seems to be a precision related issue (although the first values are correctly set to zero); in fact other sums of values are not exactly what they should be.

The small difference at the 8th/9th decimal position can be expected due to precision, but the fact that the 0s become non zeros is problematic imho, especially if not documented. Oftentimes zero in geoscience data can mean a very specific thing (i.e. zero rainfall will be characterized differently than non-zero).

in pandas this instead works:

df_ex = ds_ex.to_dataframe()
df_ex.rolling(window=3).sum().values.T

gives me

array([[ nan, nan, 0. , 0. , 0. , 0. , 0. , 0.31 , 1.22999996, 9.53000015, 10.6400001 , 9.75000015, 2.66999999, 1.35000001, 1.46000002, 0.36999998, 0.27 , 0.24999999, 0.15 , 2.67999997, 2.55999997, 2.72999996, 0.39999998, 0.39999998, 0.19999999, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ]])

What you expected to happen:

the sum of zeros should be zero. If this cannot be achieved/expected because of precision issues, it should be documented.

Anything else we need to know?:

I discovered this behavior in my old environments, but I created a new ad hoc environment with the latest versions, and it does the same thing.
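Two possible workarounds, sketched under my own assumptions (a recent xarray, and my guess that the running-sum shortcut taken by bottleneck is what accumulates the drift):

```python
import xarray as xr

# 1. build the rolling window explicitly and reduce over it
exact = ds_ex.rolling(time=3).construct("window").sum("window", skipna=False)

# 2. or disable bottleneck for the computation (option available in newer versions)
with xr.set_options(use_bottleneck=False):
    exact2 = ds_ex.rolling(time=3).sum()
```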

Environment:

INSTALLED VERSIONS

commit: None python: 3.9.7 (default, Sep 16 2021, 08:50:36) [Clang 10.0.0 ] python-bits: 64 OS: Darwin OS-release: 17.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None

xarray: 0.19.0 pandas: 1.3.3 numpy: 1.21.2 scipy: None netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 58.0.4 pip: 21.2.4 conda: None pytest: None IPython: 7.28.0 sphinx: None

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5877/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
653442225 MDU6SXNzdWU2NTM0NDIyMjU= 4209 `xr.save_mfdataset()` doesn't honor `compute=False` argument andersy005 13301940 open 0     4 2020-07-08T16:40:11Z 2022-04-09T01:25:56Z   MEMBER      

What happened:

While using the xr.save_mfdataset() function with compute=False, I noticed that the function returns a dask.delayed object, but it doesn't actually defer the computation, i.e. it writes the datasets right away.

What you expected to happen:

I expect the datasets to be written when I explicitly call .compute() on the returned delayed object.

Minimal Complete Verifiable Example:

```python
In [2]: import xarray as xr

In [3]: ds = xr.tutorial.open_dataset('rasm', chunks={})

In [4]: ds
Out[4]:
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (y, x) float64 dask.array<chunksize=(205, 275), meta=np.ndarray>
    yc       (y, x) float64 dask.array<chunksize=(205, 275), meta=np.ndarray>
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 dask.array<chunksize=(36, 205, 275), meta=np.ndarray>
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based on the initial model of Liang et al., 19...
    comment:                   Output from the Variable Infiltration Capacity...
    nco_openmp_thread_number:  1
    NCO:                       "4.6.0"
    history:                   Tue Dec 27 14:15:22 2016: ncatted -a dimension...

In [5]: path = "test.nc"

In [7]: ls -ltrh test.nc
ls: cannot access test.nc: No such file or directory

In [8]: tasks = xr.save_mfdataset(datasets=[ds], paths=[path], compute=False)

In [9]: tasks
Out[9]: Delayed('list-aa0b52e0-e909-4e65-849f-74526d137542')

In [10]: ls -ltrh test.nc
-rw-r--r-- 1 abanihi ncar 14K Jul  8 10:29 test.nc
```
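For reference, the deferred-write pattern I would expect to work (standard dask usage; the report above is that the file already exists before compute is called):

```python
import dask

tasks = xr.save_mfdataset(datasets=[ds], paths=[path], compute=False)
# nothing should have been written yet at this point
dask.compute(tasks)  # the actual write should happen here
```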

Anything else we need to know?:

Environment:

Output of <tt>xr.show_versions()</tt> ```python INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 3.10.0-693.21.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.4 xarray: 0.15.1 pandas: 0.25.3 numpy: 1.18.5 scipy: 1.5.0 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.2.0 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.20.0 distributed: 2.20.0 matplotlib: 3.2.1 cartopy: None seaborn: None numbagg: None setuptools: 49.1.0.post20200704 pip: 20.1.1 conda: None pytest: None IPython: 7.16.1 sphinx: None ```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4209/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
439875798 MDU6SXNzdWU0Mzk4NzU3OTg= 2937 encoding of boolean dtype in zarr rabernat 1197350 open 0     3 2019-05-03T03:53:27Z 2022-04-09T01:22:42Z   MEMBER      

I want to store an array with 1364688000 boolean values in zarr. I will have to read this array many times, so I am trying to do it as efficiently as possible.

I have noticed that, if we try to write boolean data to zarr from xarray, zarr stores it as i8. ~This means we are using 8x more memory than we actually need.~ In researching this, I actually learned that numpy bools use a full byte of memory 😲! However, we could still improve performance (albeit very marginally) by skipping the unnecessary dtype encoding that happens here.

Example

import xarray as xr
import zarr

for dtype in ['f8', 'i4', 'bool']:
    ds = xr.DataArray([1, 0]).astype(dtype).to_dataset('foo')
    store = {}
    ds.to_zarr(store)
    za = zarr.open(store)['foo']
    print(dtype, za.dtype, za.attrs.get('dtype'))

gives

f8 float64 None
i4 int32 None
bool int8 bool

So it seems like, during serialization of bool data, xarray is converting the data to int8 and then adding a {'dtype': 'bool'} to the attributes as encoding. When the data is read back, this gets decoded and the data is coerced back to bool.

Problem description

Since zarr is fully capable of storing bool data directly, we should not need to encode the data as i8.
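As a quick confirmation of that premise (standard zarr API, my own toy check):

```python
import numpy as np
import zarr

z = zarr.array(np.array([True, False]))  # zarr stores the bool dtype natively
print(z.dtype)                           # bool
```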

I think this happens in encode_cf_variable: https://github.com/pydata/xarray/blob/612d390f925e5490314c363e5e368b2a8bd5daf0/xarray/conventions.py#L236

which calls maybe_encode_bools: https://github.com/pydata/xarray/blob/612d390f925e5490314c363e5e368b2a8bd5daf0/xarray/conventions.py#L105-L112

So maybe we make the boolean encoding optional?

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-693.17.1.el7.centos.plus.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.8.18 libnetcdf: 4.4.1.1 xarray: 0.12.1 pandas: 0.20.3 numpy: 1.13.3 scipy: 1.1.0 netCDF4: 1.3.0 pydap: None h5netcdf: 0.5.0 h5py: 2.7.1 Nio: None zarr: 2.3.1 cftime: None nc_time_axis: None PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.2.1 dask: 0.19.0+3.g064ebb1 distributed: 1.21.8 matplotlib: 3.0.3 cartopy: 0.16.0 seaborn: 0.8.1 setuptools: 36.6.0 pip: 9.0.1 conda: None pytest: 3.2.1 IPython: 6.2.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2937/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
650549352 MDU6SXNzdWU2NTA1NDkzNTI= 4197 Provide a "shrink" command to remove bounding nan/ whitespace of DataArray cwerner 13906519 open 0     7 2020-07-03T11:55:05Z 2022-04-09T01:22:31Z   NONE      

I'm currently trying to come up with an elegant solution to remove extra whitespace/NaN values along the edges of a 2D DataArray. I'm working with geographic data and am searching for an automatic way to shrink the extent to valid data only. Think of a map of the EU, but remove all columns/rows of the array (starting from the edges) that only contain NaN.

Describe the solution you'd like
A shrink command that removes all NaN rows/cols at the edges of a DataArray.

Describe alternatives you've considered
I currently do this with NumPy, operating on the raw data and creating a new DataArray afterwards.
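For comparison, a sketch of one existing way to do this with plain xarray (my own toy example; note that dropna also removes all-NaN rows in the interior, not just at the edges):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.full((4, 5), np.nan), dims=("y", "x"))
da[1:3, 1:4] = 1.0

shrunk = da.dropna("y", how="all").dropna("x", how="all")
print(shrunk.shape)  # (2, 3)
```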

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4197/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
528168017 MDU6SXNzdWU1MjgxNjgwMTc= 3573 rasterio test failure dcherian 2448579 closed 0     1 2019-11-25T15:40:19Z 2022-04-09T01:17:32Z 2022-04-09T01:17:32Z MEMBER      

version rasterio 1.1.1 py36h900e953_0 conda-forge

```
=================================== FAILURES ===================================
_________________________ TestRasterio.test_rasterio_vrt _________________________

self = <xarray.tests.test_backends.TestRasterio object at 0x7fc8355c8f60>

def test_rasterio_vrt(self):
    import rasterio

    # tmp_file default crs is UTM: CRS({'init': 'epsg:32618'}
    with create_tmp_geotiff() as (tmp_file, expected):
        with rasterio.open(tmp_file) as src:
            with rasterio.vrt.WarpedVRT(src, crs="epsg:4326") as vrt:
                expected_shape = (vrt.width, vrt.height)
                expected_crs = vrt.crs
                expected_res = vrt.res
                # Value of single pixel in center of image
                lon, lat = vrt.xy(vrt.width // 2, vrt.height // 2)
              expected_val = next(vrt.sample([(lon, lat)]))

xarray/tests/test_backends.py:3966:


/usr/share/miniconda/envs/xarray-tests/lib/python3.6/site-packages/rasterio/sample.py:43: in sample_gen data = read(indexes, window=window, masked=masked, boundless=True)


??? E ValueError: WarpedVRT does not permit boundless reads

rasterio/_warp.pyx:978: ValueError
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3573/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
504497403 MDU6SXNzdWU1MDQ0OTc0MDM= 3386 add option to open_mfdataset for not using dask sipposip 42270910 closed 0     6 2019-10-09T08:33:53Z 2022-04-09T01:16:21Z 2022-04-09T01:16:21Z NONE      

open_mfdataset only works with dask, whereas with open_dataset one can choose whether or not to use dask. It would be nice to have an option (e.g. use_dask=False) to not use dask.

My special use-case is the following: I use netcdf data as input for a tensorflow/keras application, with parallel preprocessing threads in Keras. When using dask arrays, it gets complicated because both dask and tensorflow work with threads. I do not need any processing capability of dask/xarray, I only need a lazily loaded array that I can slice, and where the slices are loaded the moment they are accessed. My application works nicely with open_dataset (without defining chunks, and thus not using dask; the data is accessed slice by slice, so it is never loaded as a whole into memory). However, it would be nice to have the same with open_mfdataset. Right now my workaround is to use netCDF4.MFDataset. (Obviously another workaround would be to concatenate my files into one and use open_dataset.) Opening each file separately with open_dataset and then concatenating them with xr.concat does not work, as this loads the data into memory.
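For reference, the netCDF4.MFDataset workaround mentioned above, sketched with a hypothetical file pattern and variable name:

```python
import netCDF4

nc = netCDF4.MFDataset("data_*.nc")  # hypothetical multi-file pattern
var = nc.variables["t2m"]            # hypothetical variable name
batch = var[0:32]                    # only this slice is read from disk
```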

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3386/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);