issues

12 rows where repo = 13221727, state = "closed" and user = 43613877 sorted by updated_at descending

type 2

  • issue 7
  • pull 5

state 1

  • closed · 12 ✖

repo 1

  • xarray · 12 ✖
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2193178037 I_kwDOAMm_X86CuT21 8850 `groupy_bins` returns conflicting sizes error when DataArray to group is lazy observingClouds 43613877 closed 0     2 2024-03-18T20:04:19Z 2024-03-18T22:14:19Z 2024-03-18T22:14:19Z CONTRIBUTOR      

What happened?

`xr.DataArray.groupby_bins` seems to have an issue with lazy DataArrays: it throws a misleading error message about conflicting sizes for a dimension when an aggregation function, e.g. `mean`, is called.

What did you expect to happen?

I expected the arrays to be handled without any issue, or an error message stating that the arrays have to be loaded before `groupby_bins` can be applied.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

x = np.arange(10)
y = np.arange(5)

var1 = np.random.rand(len(y), len(x))
var2 = np.random.rand(len(y), len(x))

ds = xr.Dataset(
    {
        'var1': (['y', 'x'], var1),
        'var2': (['y', 'x'], 10+var2*10),
    },
    coords={
        'x': x,
        'y': y,
    }
)

ds['var1'] = ds.var1.chunk(x=3)

ds.var1.groupby_bins(ds.var2, bins=np.linspace(10,20,100)).mean()
# fails with
# ValueError: conflicting sizes for dimension 'var2_bins': length 99 on 'var2_bins' and length 41 on {'var2_bins': <this-array>}

ds.var1.compute().groupby_bins(ds.var2, bins=np.linspace(10,20,100)).mean()
# returns the expected output
"""
<xarray.DataArray 'var1' (var2_bins: 99)> Size: 792B
array([0.90665731, 0.39259895, 0.09858736, 0.94222699, 0.83785883,
       nan, 0.46287129, nan, nan, 0.02260558,
       0.06989385, nan, nan, nan, 0.41192196,
       nan, nan, nan, 0.90680258, 0.74418783,
       0.84559937, 0.43462018, nan, nan, 0.00244231,
       0.65950057, nan, nan, 0.00515549, nan,
       0.41554394, 0.74563456, nan, nan, nan,
       nan, nan, nan, nan, nan,
       nan, 0.48631902, nan, nan, 0.86050492,
       0.05572065, nan, 0.7567633 , nan, 0.70537106,
       nan, nan, nan, nan, nan,
       nan, 0.65957427, 0.39201731, 0.3159046 , nan,
       0.71012231, nan, nan, nan, nan,
       nan, nan, nan, 0.7104425 , nan,
       nan, 0.94564132, 0.81052373, nan, nan,
       0.94000787, nan, 0.88280569, nan, nan,
       0.33939775, 0.50393615, nan, 0.84943353, nan,
       nan, nan, nan, 0.28231671, 0.35149525,
       nan, nan, 0.18657728, nan, 0.23287227,
       0.34968875, nan, nan, 0.3135791 ])
Coordinates:
  * var2_bins  (var2_bins) object 792B (10.0, 10.101] ... (19.899, 20.0]
"""
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python

ValueError Traceback (most recent call last) Input In [11], in <cell line: 1>() ----> 1 ds.var1.groupby_bins(ds.var2, bins=np.linspace(10,20,100)).mean()

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/_aggregations.py:5984, in DataArrayGroupByAggregations.mean(self, dim, skipna, keep_attrs, kwargs) 5900 """ 5901 Reduce this DataArray's data by applying mean along some dimension(s). 5902 (...) 5977 * labels (labels) object 24B 'a' 'b' 'c' 5978 """ 5979 if ( 5980 flox_available 5981 and OPTIONS["use_flox"] 5982 and contains_only_chunked_or_numpy(self._obj) 5983 ): -> 5984 return self._flox_reduce( 5985 func="mean", 5986 dim=dim, 5987 skipna=skipna, 5988 # fill_value=fill_value, 5989 keep_attrs=keep_attrs, 5990 kwargs, 5991 ) 5992 else: 5993 return self._reduce_without_squeeze_warn( 5994 duck_array_ops.mean, 5995 dim=dim, (...) 5998 **kwargs, 5999 )

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/groupby.py:1079, in GroupBy._flox_reduce(self, dim, keep_attrs, kwargs) 1076 kwargs.setdefault("min_count", 1) 1078 output_index = grouper.full_index -> 1079 result = xarray_reduce( 1080 obj.drop_vars(non_numeric.keys()), 1081 self._codes, 1082 dim=parsed_dim, 1083 # pass RangeIndex as a hint to flox that by is already factorized 1084 expected_groups=(pd.RangeIndex(len(output_index)),), 1085 isbin=False, 1086 keep_attrs=keep_attrs, 1087 kwargs, 1088 ) 1090 # we did end up reducing over dimension(s) that are 1091 # in the grouped variable 1092 group_dims = grouper.group.dims

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/flox/xarray.py:384, in xarray_reduce(obj, func, expected_groups, isbin, sort, dim, split_out, fill_value, method, engine, keep_attrs, skipna, min_count, reindex, by, *finalize_kwargs) 382 actual = actual.set_coords(levelnames) 383 else: --> 384 actual[name] = expect 385 if keep_attrs: 386 actual[name].attrs = by_.attrs

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/dataset.py:1603, in Dataset.setitem(self, key, value) 1598 if isinstance(value, Dataset): 1599 raise TypeError( 1600 "Cannot assign a Dataset to a single key - only a DataArray or Variable " 1601 "object can be stored under a single key." 1602 ) -> 1603 self.update({key: value}) 1605 elif utils.iterable_of_hashable(key): 1606 keylist = list(key)

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/dataset.py:5617, in Dataset.update(self, other) 5581 def update(self, other: CoercibleMapping) -> Self: 5582 """Update this dataset's variables with those from another dataset. 5583 5584 Just like :py:meth:dict.update this is a in-place operation. (...) 5615 Dataset.merge 5616 """ -> 5617 merge_result = dataset_update_method(self, other) 5618 return self._replace(inplace=True, **merge_result._asdict())

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/merge.py:1075, in dataset_update_method(dataset, other) 1072 if coord_names: 1073 other[key] = value.drop_vars(coord_names) -> 1075 return merge_core( 1076 [dataset, other], 1077 priority_arg=1, 1078 indexes=dataset.xindexes, 1079 combine_attrs="override", 1080 )

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/merge.py:724, in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value, skip_align_args) 719 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat) 720 variables, out_indexes = merge_collected( 721 collected, prioritized, compat=compat, combine_attrs=combine_attrs 722 ) --> 724 dims = calculate_dimensions(variables) 726 coord_names, noncoord_names = determine_coords(coerced) 727 if compat == "minimal": 728 # coordinates may be dropped in merged results

File /work/mh0010/m300408/envs/covariability/lib/python3.9/site-packages/xarray/core/variable.py:2947, in calculate_dimensions(variables) 2945 last_used[dim] = k 2946 elif dims[dim] != size: -> 2947 raise ValueError( 2948 f"conflicting sizes for dimension {dim!r}: " 2949 f"length {size} on {k!r} and length {dims[dim]} on {last_used!r}" 2950 ) 2951 return dims

ValueError: conflicting sizes for dimension 'var2_bins': length 99 on 'var2_bins' and length 41 on {'var2_bins': <this-array>} ```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:24:11) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 4.18.0-425.10.1.el8_7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 2024.2.0 pandas: 1.4.3 numpy: 1.24.4 scipy: 1.9.0 netCDF4: 1.6.0 pydap: installed h5netcdf: 1.0.2 h5py: 3.7.0 Nio: None zarr: 2.12.0 cftime: 1.6.0 nc_time_axis: 1.4.1 iris: None bottleneck: 1.3.5 dask: 2022.9.2 distributed: 2022.9.2 matplotlib: 3.5.1 cartopy: 0.20.1 seaborn: 0.11.2 numbagg: None fsspec: 2023.9.2 cupy: None pint: 0.17 sparse: None flox: 0.5.9 numpy_groupies: 0.9.19 setuptools: 68.0.0 pip: 22.0.4 conda: 4.11.0 pytest: 7.1.3 mypy: 0.971 IPython: 8.1.1 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8850/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1396889729 I_kwDOAMm_X85TQtiB 7127 Document that `Variable.encoding` is ignored if encoding is given in `to_netcdf` observingClouds 43613877 closed 0     1 2022-10-04T21:57:48Z 2023-07-21T21:57:41Z 2023-07-21T21:57:41Z CONTRIBUTOR      

What happened?

After upgrading xarray from version 2022.06.0 to 2022.09.0, the following output is written as float64 instead of float32.

What did you expect to happen?

I expected the output to have the same dtype.

Minimal Complete Verifiable Example

```Python
import xarray as xr

ds = xr.tutorial.load_dataset("eraint_uvz")
encoding = {'z': {'zlib': True}}
ds.z.to_netcdf("compressed.nc", encoding=encoding)
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python

xarray version == 2022.06.0

netcdf compressed {
dimensions:
    longitude = 480 ;
    latitude = 241 ;
    level = 3 ;
    month = 2 ;
variables:
    float longitude(longitude) ;
        longitude:_FillValue = NaNf ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    float latitude(latitude) ;
        latitude:_FillValue = NaNf ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    int level(level) ;
        level:units = "millibars" ;
        level:long_name = "pressure_level" ;
    int month(month) ;
    float z(month, level, latitude, longitude) ;
        z:_FillValue = NaNf ;
        z:number_of_significant_digits = 5 ;
        z:units = "m**2 s**-2" ;
        z:long_name = "Geopotential" ;
        z:standard_name = "geopotential" ;

xarray version == 2022.09.0

netcdf compressed {
dimensions:
    longitude = 480 ;
    latitude = 241 ;
    level = 3 ;
    month = 2 ;
variables:
    float longitude(longitude) ;
        longitude:_FillValue = NaNf ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    float latitude(latitude) ;
        latitude:_FillValue = NaNf ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    int level(level) ;
        level:units = "millibars" ;
        level:long_name = "pressure_level" ;
    int month(month) ;
    double z(month, level, latitude, longitude) ;
        z:_FillValue = NaN ;
        z:number_of_significant_digits = 5 ;
        z:units = "m**2 s**-2" ;
        z:long_name = "Geopotential" ;
        z:standard_name = "geopotential" ;
```

Anything else we need to know?

In addition to the change of dtype from float to double, I wonder if both outputs should actually rather be int16, because this is the dtype of the original dataset:

```python

import xarray as xr

ds = xr.tutorial.load_dataset("eraint_uvz")
ds.z.encoding
# {'source': '.../.cache/xarray_tutorial_data/e4bb6ebf67663eeab3ff30beae6a5acf-eraint_uvz.nc',
#  'original_shape': (2, 3, 241, 480),
#  'dtype': dtype('int16'),
#  '_FillValue': nan,
#  'scale_factor': -1.7250274674967954,
#  'add_offset': 66825.5}
ds.z.to_netcdf("original.nc")
```

netcdf original {
dimensions:
    longitude = 480 ;
    latitude = 241 ;
    level = 3 ;
    month = 2 ;
variables:
    float longitude(longitude) ;
        longitude:_FillValue = NaNf ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    float latitude(latitude) ;
        latitude:_FillValue = NaNf ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    int level(level) ;
        level:units = "millibars" ;
        level:long_name = "pressure_level" ;
    int month(month) ;
    short z(month, level, latitude, longitude) ;
        z:_FillValue = 0s ;
        z:number_of_significant_digits = 5 ;
        z:units = "m**2 s**-2" ;
        z:long_name = "Geopotential" ;
        z:standard_name = "geopotential" ;
        z:add_offset = 66825.5 ;
        z:scale_factor = -1.7250274674968 ;

Sorry for mixing an issue with a question, but why are add_offset and scale_factor applied and the values saved as float32/float64 when encoding is set? I guess the encoding given in to_netcdf overwrites the initial encoding, because

`ds.z.to_netcdf("test_w_offset.nc", encoding={"z":{"add_offset":66825.5, "scale_factor":-1.7250274674968, "dtype":'int16'}})` produces the expected output that matches the original one. So I imagine a good way of setting the output encoding is currently something like `ds.to_netcdf("compressed.nc", encoding={v:{**ds[v].encoding, "zlib":True} for v in ds.data_vars})` when an encoding similar to the input encoding, with additional parameters (e.g. 'zlib'), is requested.
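The merging pattern described above can be sketched on plain dictionaries, without touching any file. This is only an illustration: `with_extra_encoding` is a hypothetical helper, and `existing` is a hand-written stand-in for the per-variable encodings of a real dataset, using the values quoted in this issue.

```python
def with_extra_encoding(var_encodings, **extra):
    """Return a per-variable encoding mapping with `extra` options added.

    The input dictionaries are copied, so the original encodings stay untouched.
    """
    return {name: {**enc, **extra} for name, enc in var_encodings.items()}


# Stand-in for {v: ds[v].encoding for v in ds.data_vars}
existing = {
    "z": {
        "dtype": "int16",
        "scale_factor": -1.7250274674967954,
        "add_offset": 66825.5,
    }
}

encoding = with_extra_encoding(existing, zlib=True)
print(encoding["z"])
```

With an actual dataset, the resulting mapping would be passed as `ds.to_netcdf("compressed.nc", encoding=encoding)`, so that the packed int16 representation is kept and compression is added on top.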

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0] python-bits: 64 OS: Linux OS-release: 4.18.0-305.25.1.el8_4.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1 xarray: 2022.6.0. # or 2022.9.0 pandas: 1.5.0 numpy: 1.23.3 scipy: None netCDF4: 1.6.1 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.13.2 cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 65.4.1 pip: 22.2.2 conda: None pytest: None IPython: 8.3.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7127/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1571143098 PR_kwDOAMm_X85JRTpv 7500 Zarr: drop "source" and "original_shape" from encoding observingClouds 43613877 closed 0     3 2023-02-04T22:01:30Z 2023-02-07T04:39:37Z 2023-02-07T04:22:09Z CONTRIBUTOR   0 pydata/xarray/pulls/7500
  • [x] Closes #7129
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7500/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1396997022 I_kwDOAMm_X85TRHue 7129 dataset encodings 'source' and 'original_shape' are not dropped in `zarr` backend observingClouds 43613877 closed 0     1 2022-10-05T00:12:12Z 2023-02-07T04:22:11Z 2023-02-07T04:22:11Z CONTRIBUTOR      

What happened?

When opening a dataset, like one from the tutorial, and writing it as a zarr file, an error is raised due to encodings that are invalid for the zarr driver when the encoding is given in to_zarr. In this particular case, the encodings source and original_shape are added by xarray itself, so I expect xarray to handle them without raising an error.

What did you expect to happen?

I expect the encodings source and original_shape to be dropped, as in the netCDF4 backend.

Minimal Complete Verifiable Example

```Python
import xarray as xr

ds = xr.tutorial.load_dataset("eraint_uvz")
ds.to_zarr("test.zarr", encoding={"z":{**ds.z.encoding}})
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python

ds.to_zarr("test_w_offset.zarr01", encoding={"z":{**ds.z.encoding}})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../.conda/envs/xarray2022090/lib/python3.10/site-packages/xarray/core/dataset.py", line 2068, in to_zarr
    return to_zarr(  # type: ignore
  File ".../.conda/envs/xarray2022090/lib/python3.10/site-packages/xarray/backends/api.py", line 1653, in to_zarr
    dump_to_store(dataset, zstore, writer, encoding=encoding)
  File ".../.conda/envs/xarray2022090/lib/python3.10/site-packages/xarray/backends/api.py", line 1273, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File ".../.conda/envs/xarray2022090/lib/python3.10/site-packages/xarray/backends/zarr.py", line 574, in store
    self.set_variables(
  File ".../.conda/envs/xarray2022090/lib/python3.10/site-packages/xarray/backends/zarr.py", line 621, in set_variables
    encoding = extract_zarr_variable_encoding(
  File ".../.conda/envs/xarray2022090/lib/python3.10/site-packages/xarray/backends/zarr.py", line 247, in extract_zarr_variable_encoding
    raise ValueError(
ValueError: unexpected encoding parameters for zarr backend: ['source', 'original_shape']
```

Anything else we need to know?

The respective lines in the netCDF4 backend are: https://github.com/pydata/xarray/blob/13c52b27b777709fc3316cf4334157f50904c02b/xarray/backends/netCDF4_.py#L235 and

https://github.com/pydata/xarray/blob/13c52b27b777709fc3316cf4334157f50904c02b/xarray/backends/netCDF4_.py#L272-L274
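A possible user-side workaround is to strip the two keys named in the error before passing the encoding on. The sketch below works on a plain dictionary; `strip_reader_keys` is a hypothetical helper, the sample encoding is a stand-in, and the assumption that only these two keys need removing comes from the error message above, not from the zarr backend's source.

```python
# Keys that xarray records when reading a file and that the error message rejects
READER_ONLY_KEYS = ("source", "original_shape")


def strip_reader_keys(encoding, drop=READER_ONLY_KEYS):
    """Return a copy of `encoding` without the keys listed in `drop`."""
    return {key: value for key, value in encoding.items() if key not in drop}


# Stand-in for ds.z.encoding
sample = {
    "source": "eraint_uvz.nc",
    "original_shape": (2, 3, 241, 480),
    "dtype": "int16",
}

cleaned = strip_reader_keys(sample)
print(cleaned)
```

The cleaned dictionary could then be used as `ds.to_zarr("test.zarr", encoding={"z": cleaned})`; whether any remaining key is accepted still depends on the backend.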

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0] python-bits: 64 OS: Linux OS-release: 4.18.0-305.25.1.el8_4.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.8.1 xarray: 2022.9.0 pandas: 1.5.0 numpy: 1.23.3 scipy: None netCDF4: 1.6.1 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.13.2 cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 65.4.1 pip: 22.2.2 conda: None pytest: None IPython: 8.3.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7129/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1419649041 I_kwDOAMm_X85UniAR 7198 `groupby_bins` raises `AttributeError` when used with Dask, flox and option `labels` observingClouds 43613877 closed 0     0 2022-10-23T05:44:04Z 2022-10-26T15:56:36Z 2022-10-26T15:56:36Z CONTRIBUTOR      

What happened?

I updated xarray and installed flox in my environment, which caused my previously working call of groupby_bins to fail. The failure only occurs when I use the labels option.

What did you expect to happen?

I expected groupby_bins to work regardless of the algorithm used.

Minimal Complete Verifiable Example

```Python
import dask.array
import xarray as xr
import numpy as np

ds = xr.Dataset(
    {"d": (("x", "y"), dask.array.random.random((10, 20)))},
    coords={'x': np.arange(10), 'y': np.arange(10, 30)},
)
xr.set_options(use_flox=True)
ds.groupby_bins('x', np.arange(0, 11, 5), labels=[5, 10]).sum().compute()
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python

AttributeError Traceback (most recent call last) Cell In [1], line 7 5 ds = xr.Dataset({"d":(("x","y"),dask.array.random.random((10,20)))}, coords={'x':np.arange(10),'y':np.arange(10,30)}) 6 xr.set_options(use_flox=True) ----> 7 ds.groupby_bins('x', np.arange(0,11,5), labels=[5,10]).sum().compute()

File ~/mambaforge/envs/flox/lib/python3.10/site-packages/xarray/core/_reductions.py:2774, in DatasetGroupByReductions.sum(self, dim, skipna, min_count, keep_attrs, kwargs) 2678 """ 2679 Reduce this Dataset's data by applying sum along some dimension(s). 2680 (...) 2771 da (labels) float64 nan 4.0 4.0 2772 """ 2773 if flox and OPTIONS["use_flox"] and contains_only_dask_or_numpy(self._obj): -> 2774 return self._flox_reduce( 2775 func="sum", 2776 dim=dim, 2777 skipna=skipna, 2778 min_count=min_count, 2779 numeric_only=True, 2780 # fill_value=fill_value, 2781 keep_attrs=keep_attrs, 2782 kwargs, 2783 ) 2784 else: 2785 return self.reduce( 2786 duck_array_ops.sum, 2787 dim=dim, (...) 2792 **kwargs, 2793 )

File ~/mambaforge/envs/flox/lib/python3.10/site-packages/xarray/core/groupby.py:774, in GroupBy._flox_reduce(self, dim, keep_attrs, **kwargs) 769 if self._bins is not None: 770 # bins provided to flox are at full precision 771 # the bin edge labels have a default precision of 3 772 # reassign to fix that. 773 assert self._full_index is not None --> 774 new_coord = [ 775 pd.Interval(inter.left, inter.right) for inter in self._full_index 776 ] 777 result[self._group.name] = new_coord 778 # Fix dimension order when binning a dimension coordinate 779 # Needed as long as we do a separate code path for pint; 780 # For some reason Datasets and DataArrays behave differently!

File ~/mambaforge/envs/flox/lib/python3.10/site-packages/xarray/core/groupby.py:775, in <listcomp>(.0) 769 if self._bins is not None: 770 # bins provided to flox are at full precision 771 # the bin edge labels have a default precision of 3 772 # reassign to fix that. 773 assert self._full_index is not None 774 new_coord = [ --> 775 pd.Interval(inter.left, inter.right) for inter in self._full_index 776 ] 777 result[self._group.name] = new_coord 778 # Fix dimension order when binning a dimension coordinate 779 # Needed as long as we do a separate code path for pint; 780 # For some reason Datasets and DataArrays behave differently!

AttributeError: 'int' object has no attribute 'left' ```

Anything else we need to know?

`xr.set_options(use_flox=False)` followed by `ds.groupby_bins('x', np.arange(0,11,5), labels=[5,10]).sum().compute()` works as expected.
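The traceback suggests that the list comprehension assumes every element of the index is an interval with `.left` and `.right`, whereas with `labels=[5, 10]` the elements are plain integers. Below is a minimal sketch of that failure mode and of one possible guard. It uses a stand-in `Interval` named tuple instead of pandas, `rebuild_coord` is a hypothetical name, and this is not necessarily the fix adopted in xarray.

```python
from collections import namedtuple

Interval = namedtuple("Interval", ["left", "right"])  # stand-in for pd.Interval


def rebuild_coord(index):
    """Rebuild interval entries, leaving custom labels untouched."""
    return [
        Interval(item.left, item.right) if hasattr(item, "left") else item
        for item in index
    ]


print(rebuild_coord([Interval(0, 5), Interval(5, 10)]))
print(rebuild_coord([5, 10]))  # custom labels pass through unchanged
```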

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:43:44) [Clang 13.0.1 ] python-bits: 64 OS: Darwin OS-release: 21.6.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: None LOCALE: (None, 'UTF-8') libhdf5: None libnetcdf: None xarray: 2022.10.0 pandas: 1.5.1 numpy: 1.23.4 scipy: None netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2022.10.0 distributed: 2022.10.0 matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: 2022.10.0 cupy: None pint: None sparse: None flox: 0.6.1 numpy_groupies: 0.9.20 setuptools: 65.5.0 pip: 22.3 conda: None pytest: None IPython: 8.5.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7198/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1255359858 PR_kwDOAMm_X8440kR1 6656 pdyap version dependent client.open_url call observingClouds 43613877 closed 0     4 2022-06-01T08:33:39Z 2022-06-17T09:38:32Z 2022-06-16T21:17:30Z CONTRIBUTOR   0 pydata/xarray/pulls/6656
  • [x] Closes #6648
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6656/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1251032633 I_kwDOAMm_X85KkT45 6648 Loading data with pydap engine fails due to outdated/missing dependency observingClouds 43613877 closed 0     1 2022-05-27T17:24:21Z 2022-06-16T21:17:29Z 2022-06-16T21:17:29Z CONTRIBUTOR      

What happened?

Hi,

I was trying to load a dataset with the pydap engine, but it failed due to a mismatch between the arguments passed by xarray's pydap backend and those accepted by the installed version of pydap (3.2.2).

What did you expect to happen?

Retrieving a nice dataset 😄

Minimal Complete Verifiable Example

```Python
import xarray as xr

xr.open_dataset('http://test.opendap.org/dap/data/nc/coads_climatology.nc', engine='pydap')
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
TypeError                                 Traceback (most recent call last)
<ipython-input-4-36e60ce43c69> in <module>
----> 1 xr.open_dataset('http://test.opendap.org/dap/data/nc/coads_climatology.nc', engine='pydap')

~/mambaforge/envs/how_to_eurec4a/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    493
    494     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 495     backend_ds = backend.open_dataset(
    496         filename_or_obj,
    497         drop_variables=drop_variables,

~/mambaforge/envs/how_to_eurec4a/lib/python3.8/site-packages/xarray/backends/pydap_.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, application, session, output_grid, timeout, verify, user_charset)
    163         ):
    164
--> 165         store = PydapDataStore.open(
    166             url=filename_or_obj,
    167             application=application,

~/mambaforge/envs/how_to_eurec4a/lib/python3.8/site-packages/xarray/backends/pydap_.py in open(cls, url, application, session, output_grid, timeout, verify, user_charset)
    112             user_charset = "ascii"
    113
--> 114         ds = pydap.client.open_url(
    115             url=url,
    116             application=application,

TypeError: open_url() got an unexpected keyword argument 'verify'
```

Anything else we need to know?

The root cause of this issue seems to be an outdated or missing dependency.

With https://github.com/pydata/xarray/commit/dfaedb2773208c78ab93940ef4a1979238ee0f55 the verify argument was added to xarray/backends/pydap_.py. pydap has supported this argument since late 2018 (https://github.com/pydap/pydap/pull/112), but a release that includes the change was published only recently:

  • Version 3.3.0, released on 1 Feb. 2022 (includes verify)
  • Version 3.2.2, released on 25 May 2017 (has no verify)

Unfortunately, version 3.3.0 is not yet available on PyPI, only on conda-forge.

I couldn't find any pins or limitations on versions in e.g. requirements.txt or setup.cfg for non-core dependencies. Should this dependency be introduced somewhere? At least https://github.com/pydata/xarray/blob/e02b1c3f6d18c7afcdf4f78cf3463652b4cc96c9/ci/requirements/min-all-deps.yml needs to be updated I guess.
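Besides pinning a minimum version, the call itself could be made version dependent. The following sketch only builds the keyword arguments and does not import pydap. `parse_version` and `open_url_kwargs` are hypothetical helpers, and the 3.3.0 threshold is taken from the release notes quoted in this issue rather than checked against pydap's source.

```python
def parse_version(version):
    """Turn '3.2.2' into (3, 2, 2); assumes purely numeric components."""
    return tuple(int(part) for part in version.split("."))


def open_url_kwargs(pydap_version, url, verify=None, **other):
    """Build keyword arguments for an open_url-style call.

    `verify` is only forwarded for pydap >= 3.3.0, the first release
    reported in this issue to accept it.
    """
    kwargs = {"url": url, **other}
    if parse_version(pydap_version) >= (3, 3, 0):
        kwargs["verify"] = verify
    return kwargs


url = "http://test.opendap.org/dap/data/nc/coads_climatology.nc"
print(open_url_kwargs("3.2.2", url, verify=True))  # no 'verify' key
print(open_url_kwargs("3.3.0", url, verify=True))  # includes 'verify'
```

Comparing tuples of integers avoids the pitfall of comparing version strings, where "3.10.0" would sort before "3.3.0".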

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:06:49) [Clang 12.0.1 ] python-bits: 64 OS: Darwin OS-release: 21.4.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: None LOCALE: (None, 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 2022.3.0 pandas: 1.4.2 numpy: 1.22.3 scipy: 1.8.0 netCDF4: 1.5.8 pydap: installed h5netcdf: None h5py: None Nio: None zarr: 2.11.1 cftime: 1.6.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2022.04.1 distributed: 2022.4.1 matplotlib: 3.5.1 cartopy: 0.20.2 seaborn: None numbagg: None fsspec: 2022.5.0 cupy: None pint: None sparse: None setuptools: 61.1.1 pip: 22.0.4 conda: None pytest: 7.1.2 IPython: 7.29.0 sphinx: 4.5.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6648/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
822256201 MDExOlB1bGxSZXF1ZXN0NTg0OTEwODE4 4994 Add date attribute to datetime accessor observingClouds 43613877 closed 0     4 2021-03-04T15:47:17Z 2021-03-16T10:34:19Z 2021-03-16T10:00:23Z CONTRIBUTOR   0 pydata/xarray/pulls/4994
  • [x] Closes #4983
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4994/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
819897789 MDU6SXNzdWU4MTk4OTc3ODk= 4983 Date missing in datetime accessor observingClouds 43613877 closed 0     4 2021-03-02T10:52:00Z 2021-03-16T10:00:23Z 2021-03-16T10:00:23Z CONTRIBUTOR      

What happened: I wonder whether there is a reason why the datetime accessor has no date attribute.

What you expected to happen: As the time attribute is supported, I would expect the date attribute to be supported as well.

Minimal Complete Verifiable Example:

```python
import xarray as xr
import pandas as pd

time_coord = pd.date_range("2020-01-01","2020-01-03", freq="12H")
da = xr.DataArray([1,2,3,4,5], dims=["time"], coords={'time': time_coord})

print(da.time.dt.time)
# <xarray.DataArray 'time' (time: 5)>
# array([datetime.time(0, 0), datetime.time(12, 0), datetime.time(0, 0),
#        datetime.time(12, 0), datetime.time(0, 0)], dtype=object)
# Coordinates:
#   * time     (time) datetime64[ns] 2020-01-01 2020-01-01T12:00:00 ... 2020-01-03

print(da.time.dt.date)
# ---------------------------------------------------------------------------
# AttributeError                            Traceback (most recent call last)
# <ipython-input-42-13741f407661> in <module>
# ----> 1 da.time.dt.date
# AttributeError: 'DatetimeAccessor' object has no attribute 'date'
```

Suggestion: Simply adding `date = Properties._tslib_field_accessor("date", "Date corresponding to datetimes", object)` in core/accessor_dt.py should do the trick. Happy to do a PR.
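For comparison, the standard library already draws the same distinction: a `datetime` splits into a `time` part and a `date` part. The sketch below uses only the `datetime` module, with no xarray involved, to show what a `date` accessor would be expected to return for the five timestamps of the example above.

```python
from datetime import datetime, timedelta

# Five timestamps, 12 hours apart, starting at 2020-01-01 00:00
start = datetime(2020, 1, 1)
stamps = [start + timedelta(hours=12 * i) for i in range(5)]

times = [stamp.time() for stamp in stamps]  # counterpart of da.time.dt.time
dates = [stamp.date() for stamp in stamps]  # counterpart of the requested da.time.dt.date

print(times)
print(dates)
```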

Anything else we need to know?:

Environment:

Output of `xr.show_versions()` INSTALLED VERSIONS ------------------ commit: None python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 2.6.32-754.33.1.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.17.0 pandas: 1.2.1 numpy: 1.20.0 scipy: 1.6.0 netCDF4: 1.5.5.1 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.6.1 cftime: 1.4.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.02.0 distributed: 2021.02.0 matplotlib: 3.3.4 cartopy: 0.18.0 seaborn: 0.11.1 numbagg: None pint: 0.16.1 setuptools: 49.6.0.post20210108 pip: 21.0.1 conda: None pytest: None IPython: 7.20.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4983/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
822323098 MDExOlB1bGxSZXF1ZXN0NTg0OTY2OTk5 4996 Drop indices outside tolerance when selecting with method nearest observingClouds 43613877 closed 0     0 2021-03-04T17:02:49Z 2021-03-15T02:29:39Z 2021-03-15T02:29:39Z CONTRIBUTOR   1 pydata/xarray/pulls/4996
  • [x] Closes #4995
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4996/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
403462155 MDExOlB1bGxSZXF1ZXN0MjQ3OTAxNzYy 2716 ENH: resample methods with tolerance observingClouds 43613877 closed 0     5 2019-01-26T17:23:22Z 2019-01-31T17:28:16Z 2019-01-31T17:28:10Z CONTRIBUTOR   0 pydata/xarray/pulls/2716
  • ENH: resample methods bfill, pad, nearest accept tolerance keyword
  • [x] Closes #2695
  • [x] Tests added
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2716/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
401392318 MDU6SXNzdWU0MDEzOTIzMTg= 2695 Resample with limit/tolerance observingClouds 43613877 closed 0     3 2019-01-21T15:04:30Z 2019-01-31T17:28:09Z 2019-01-31T17:28:09Z CONTRIBUTOR      

Upsampling methods cannot be limited

It comes in very handy to limit the scope of a resample method, e.g. nearest, in time series. In pandas, the limit argument can be given, such that:

```python
import pandas as pd
import datetime as dt

dates = [dt.datetime(2018, 1, 1), dt.datetime(2018, 1, 2)]
data = [10, 20]
df = pd.DataFrame(data, index=dates)
df.resample('1H').nearest(limit=1)
```

This leads to

    2018-01-01 00:00:00    10.0
    2018-01-01 01:00:00    10.0
    2018-01-01 02:00:00     NaN
    2018-01-01 03:00:00     NaN
    2018-01-01 04:00:00     NaN
    ...
    2018-01-01 20:00:00     NaN
    2018-01-01 21:00:00     NaN
    2018-01-01 22:00:00     NaN
    2018-01-01 23:00:00    20.0
    2018-01-02 00:00:00    20.0

Currently,

    import xarray as xr
    xdf = xr.Dataset.from_dataframe(df)
    xdf.resample({'index':'1H'}).nearest(limit=1)

leads to

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: nearest() got an unexpected keyword argument 'limit'

Problem description

A limit would be very helpful, as one might not want to fill gaps with the nearest method indefinitely. To my understanding, the following modifications could be made, by comparison with the pandas code:

/xarray/core/resample.py

    def _upsample(self, method, limit=None, *args, **kwargs):
        ...
        elif method in ['pad', 'ffill', 'backfill', 'bfill', 'nearest']:
            kwargs = kwargs.copy()
            kwargs.update(**{self._dim: upsampled_index})
            return self._obj.reindex(method=method, tolerance=limit, *args, **kwargs)
        ...

and

    def nearest(self, limit=None):
        """Take new values from nearest original coordinate to up-sampled
        frequency coordinates.
        """
        return self._upsample('nearest', limit=limit)

So I think that reindex already supports the limit through its tolerance keyword; it just hasn't been forwarded to the _upsample and nearest methods.
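The intended behaviour can be sketched without xarray or pandas. The following is an illustrative standard-library example, not the library's implementation: the function name `nearest_with_tolerance` is hypothetical, and the tolerance is assumed to be a maximum time distance (as with reindex's tolerance keyword) rather than a count of periods.

```python
import datetime as dt
import math


def nearest_with_tolerance(source, targets, tolerance):
    """Fill each target timestamp from the nearest source timestamp.

    source:    dict mapping datetime -> value
    targets:   iterable of datetimes to fill
    tolerance: timedelta; a target farther than this from every source
               timestamp gets NaN instead of a value
    """
    filled = {}
    for t in targets:
        # Closest original timestamp to the up-sampled timestamp.
        nearest = min(source, key=lambda s: abs(s - t))
        if abs(nearest - t) <= tolerance:
            filled[t] = source[nearest]
        else:
            filled[t] = math.nan
    return filled


# Same data as in the pandas example: two daily values, up-sampled hourly.
source = {dt.datetime(2018, 1, 1): 10, dt.datetime(2018, 1, 2): 20}
targets = [dt.datetime(2018, 1, 1) + dt.timedelta(hours=h) for h in range(25)]

result = nearest_with_tolerance(source, targets, dt.timedelta(hours=1))
for stamp, value in result.items():
    print(stamp, value)
```

With a one-hour tolerance, only the timestamps within one hour of an original value are filled, and every other hour stays NaN.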

Current Output

```python
import xarray as xr

xdf = xr.Dataset.from_dataframe(df)
xdf.resample({'index':'1H'}).nearest()
<xarray.Dataset>
Dimensions:  (index: 25)
Coordinates:
  * index    (index) datetime64[ns] 2018-01-01 ... 2018-01-02
Data variables:
    0        (index) int64 10 10 10 10 10 10 10 10 ... 20 20 20 20 20 20 20 20
```

However, it would be nice if the following worked:

```python
xdf.resample({'index':'1H'}).nearest(limit=1)

<xarray.Dataset>
Dimensions:  (index: 25)
Coordinates:
  * index    (index) datetime64[ns] 2018-01-01 ... 2018-01-02
Data variables:
    0        (index) float64 10.0 10.0 nan nan nan nan ... nan nan nan 20.0 20.0
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2695/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);