
issues


18 rows where repo = 13221727, state = "open" and user = 10194086 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2163675672 PR_kwDOAMm_X85obI_8 8803 missing chunkmanager: update error message mathause 10194086 open 0     4 2024-03-01T15:48:00Z 2024-03-15T11:02:45Z   MEMBER   0 pydata/xarray/pulls/8803

When dask is missing we get the following error message:

```python-traceback
ValueError: unrecognized chunk manager dask - must be one of: []
```

This could be confusing - the error message seems geared towards a typo in the requested manager. However, I think it's much more likely that a chunk manager is simply not installed. I tried to update the error message - happy to get feedback.
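The gist of the change, as a hypothetical sketch (function name and structure assumed here, not the actual PR diff):

```python
# Hypothetical sketch: distinguish "no chunk managers installed at all"
# from a typo in the requested manager's name.
def get_chunked_array_type(requested: str, available: dict):
    if not available:
        raise ImportError(
            f"chunk manager {requested!r} is not available."
            " Please make sure 'dask' (or another chunk manager) is installed."
        )
    if requested not in available:
        raise ValueError(
            f"unrecognized chunk manager {requested!r}"
            f" - must be one of: {list(available)}"
        )
    return available[requested]
```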

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8803/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2105703882 I_kwDOAMm_X859gn3K 8679 Dataset.weighted along a dimension not on weights errors mathause 10194086 open 0     2 2024-01-29T15:03:39Z 2024-02-04T11:24:54Z   MEMBER      

What happened?

ds.weighted(weights).mean(dims) errors when reducing over a dimension that is neither on the weights nor on the variable.

What did you expect to happen?

This used to work and was "broken" by #8606. However, we may want to fix this by ignoring (?) those data vars instead (#7027).
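In the meantime, a workaround sketch with the current API (assuming the goal is to skip variables that lack the reduction dimension):

```python
# Workaround sketch: reduce only the variables that carry the dimension,
# then merge the untouched variables back in.
import xarray as xr

ds = xr.Dataset({"a": (("y", "x"), [[1, 2]]), "scalar": 1})
weights = xr.DataArray([1, 2], dims="x")

dim = "y"
has_dim = [name for name, var in ds.data_vars.items() if dim in var.dims]
reduced = ds[has_dim].weighted(weights).mean(dim)
result = xr.merge([reduced, ds.drop_vars(has_dim)])
```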

Minimal Complete Verifiable Example

```Python
import xarray as xr

ds = xr.Dataset({"a": (("y", "x"), [[1, 2]]), "scalar": 1})
weights = xr.DataArray([1, 2], dims="x")

ds.weighted(weights).mean("y")
```

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python
ValueError                                Traceback (most recent call last)
Cell In[1], line 6
      3 ds = xr.Dataset({"a": (("y", "x"), [[1, 2]]), "scalar": 1})
      4 weights = xr.DataArray([1, 2], dims="x")
----> 6 ds.weighted(weights).mean("y")

File ~/code/xarray/xarray/util/deprecation_helpers.py:115, in _deprecate_positional_args.<locals>._decorator.<locals>.inner(*args, **kwargs)
    111     kwargs.update({name: arg for name, arg in zip_args})
    113     return func(*args[:-n_extra_args], **kwargs)
--> 115 return func(*args, **kwargs)

File ~/code/xarray/xarray/core/weighted.py:497, in Weighted.mean(self, dim, skipna, keep_attrs)
    489 @_deprecate_positional_args("v2023.10.0")
    490 def mean(
    491     self,
   (...)
    495     keep_attrs: bool | None = None,
    496 ) -> T_Xarray:
--> 497     return self._implementation(
    498         self._weighted_mean, dim=dim, skipna=skipna, keep_attrs=keep_attrs
    499     )

File ~/code/xarray/xarray/core/weighted.py:558, in DatasetWeighted._implementation(self, func, dim, **kwargs)
    555 def _implementation(self, func, dim, **kwargs) -> Dataset:
    556     self._check_dim(dim)
--> 558     return self.obj.map(func, dim=dim, **kwargs)

File ~/code/xarray/xarray/core/dataset.py:6924, in Dataset.map(self, func, keep_attrs, args, **kwargs)
   6922 if keep_attrs is None:
   6923     keep_attrs = _get_keep_attrs(default=False)
-> 6924 variables = {
   6925     k: maybe_wrap_array(v, func(v, *args, **kwargs))
   6926     for k, v in self.data_vars.items()
   6927 }
   6928 if keep_attrs:
   6929     for k, v in variables.items():

File ~/code/xarray/xarray/core/dataset.py:6925, in <dictcomp>(.0)
   6922 if keep_attrs is None:
   6923     keep_attrs = _get_keep_attrs(default=False)
   6924 variables = {
-> 6925     k: maybe_wrap_array(v, func(v, *args, **kwargs))
   6926     for k, v in self.data_vars.items()
   6927 }
   6928 if keep_attrs:
   6929     for k, v in variables.items():

File ~/code/xarray/xarray/core/weighted.py:286, in Weighted._weighted_mean(self, da, dim, skipna)
    278 def _weighted_mean(
    279     self,
    280     da: T_DataArray,
    281     dim: Dims = None,
    282     skipna: bool | None = None,
    283 ) -> T_DataArray:
    284     """Reduce a DataArray by a weighted mean along some dimension(s)."""
--> 286     weighted_sum = self._weighted_sum(da, dim=dim, skipna=skipna)
    288     sum_of_weights = self._sum_of_weights(da, dim=dim)
    290     return weighted_sum / sum_of_weights

File ~/code/xarray/xarray/core/weighted.py:276, in Weighted._weighted_sum(self, da, dim, skipna)
    268 def _weighted_sum(
    269     self,
    270     da: T_DataArray,
    271     dim: Dims = None,
    272     skipna: bool | None = None,
    273 ) -> T_DataArray:
    274     """Reduce a DataArray by a weighted sum along some dimension(s)."""
--> 276     return self._reduce(da, self.weights, dim=dim, skipna=skipna)

File ~/code/xarray/xarray/core/weighted.py:231, in Weighted._reduce(da, weights, dim, skipna)
    227     da = da.fillna(0.0)
    229 # dot does not broadcast arrays, so this avoids creating a large
    230 # DataArray (if weights has additional dimensions)
--> 231 return dot(da, weights, dim=dim)

File ~/code/xarray/xarray/util/deprecation_helpers.py:140, in deprecate_dims.<locals>.wrapper(*args, **kwargs)
    132     emit_user_level_warning(
    133         "The dims argument has been renamed to dim, and will be removed "
    134         "in the future. This renaming is taking place throughout xarray over the "
   (...)
    137         PendingDeprecationWarning,
    138     )
    139     kwargs["dim"] = kwargs.pop("dims")
--> 140 return func(*args, **kwargs)

File ~/code/xarray/xarray/core/computation.py:1885, in dot(dim, *arrays, **kwargs)
   1883     dim = tuple(d for d, c in dim_counts.items() if c > 1)
   1884 else:
-> 1885     dim = parse_dims(dim, all_dims=tuple(all_dims))
   1887 dot_dims: set[Hashable] = set(dim)
   1889 # dimensions to be parallelized

File ~/code/xarray/xarray/core/utils.py:1046, in parse_dims(dim, all_dims, check_exists, replace_none)
   1044     dim = (dim,)
   1045 if check_exists:
-> 1046     _check_dims(set(dim), set(all_dims))
   1047 return tuple(dim)

File ~/code/xarray/xarray/core/utils.py:1131, in _check_dims(dim, all_dims)
   1129 if wrong_dims:
   1130     wrong_dims_str = ", ".join(f"'{d!s}'" for d in wrong_dims)
-> 1131     raise ValueError(
   1132         f"Dimension(s) {wrong_dims_str} do not exist. Expected one or more of {all_dims}"
   1133     )

ValueError: Dimension(s) 'y' do not exist. Expected one or more of {'x'}
```

Anything else we need to know?

No response

Environment

Newest main (i.e. 2024.01)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8679/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
748684119 MDU6SXNzdWU3NDg2ODQxMTk= 4601 Don't type check __getattr__? mathause 10194086 open 0     8 2020-11-23T10:41:21Z 2023-09-25T05:33:09Z   MEMBER      

In #4592 I had the issue that mypy did not raise an error on a missing method:

```python
import xarray as xr
from xarray.core.common import DataWithCoords

hasattr(xr.core.common.DataWithCoords, "reduce")  # -> False


def test(x: "DataWithCoords"):
    x.reduce()  # mypy does not error
```

This is because DataWithCoords implements __getattr__:

```python
class A:
    pass


class B:
    def __getattr__(self, name): ...


def testA(x: "A"):
    x.reduce()  # mypy errors


def testB(x: "B"):
    x.reduce()  # mypy does not error
```

The solution seems to be to not typecheck __getattr__ (see https://github.com/python/mypy/issues/6251#issuecomment-457287161):

```python
from typing import no_type_check


class C:
    @no_type_check
    def __getattr__(self, name): ...


def testC(x: "C"):
    x.reduce()  # mypy errors
```

The only __getattr__ within xarray is here:

https://github.com/pydata/xarray/blob/17358922d480c038e66430735bf4c365a7677df8/xarray/core/common.py#L221

Using @no_type_check leads to 24 errors and not all of them can be trivially solved. E.g. DataWithCoords wants to use self.isel but does not implement the method. The solution is probably to add isel to DataWithCoords as an abstract method (ABC) or using NotImplemented.
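A minimal sketch of the ABC idea (class and method names are hypothetical, not xarray's actual code):

```python
# With the required attribute declared as an abstract method, mypy can
# check calls on the mixin without relying on __getattr__.
from abc import ABC, abstractmethod
from typing import Any


class MixinWithIsel(ABC):
    @abstractmethod
    def isel(self, **indexers: Any) -> "MixinWithIsel": ...

    def first_element(self) -> "MixinWithIsel":
        # mypy accepts this call because isel is declared above
        return self.isel(x=0)
```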

Thoughts?

All errors

```python-traceback
xarray/core/common.py:370: error: "DataWithCoords" has no attribute "isel"
xarray/core/common.py:374: error: "DataWithCoords" has no attribute "dims"
xarray/core/common.py:378: error: "DataWithCoords" has no attribute "indexes"
xarray/core/common.py:381: error: "DataWithCoords" has no attribute "sizes"
xarray/core/common.py:698: error: "DataWithCoords" has no attribute "_groupby_cls"
xarray/core/common.py:761: error: "DataWithCoords" has no attribute "_groupby_cls"
xarray/core/common.py:866: error: "DataWithCoords" has no attribute "_rolling_cls"; maybe "_rolling_exp_cls"?
xarray/core/common.py:977: error: "DataWithCoords" has no attribute "_coarsen_cls"
xarray/core/common.py:1108: error: "DataWithCoords" has no attribute "dims"
xarray/core/common.py:1109: error: "DataWithCoords" has no attribute "dims"
xarray/core/common.py:1133: error: "DataWithCoords" has no attribute "indexes"
xarray/core/common.py:1144: error: "DataWithCoords" has no attribute "_resample_cls"; maybe "resample"?
xarray/core/common.py:1261: error: "DataWithCoords" has no attribute "isel"
xarray/core/alignment.py:278: error: "DataAlignable" has no attribute "copy"
xarray/core/alignment.py:283: error: "DataAlignable" has no attribute "dims"
xarray/core/alignment.py:286: error: "DataAlignable" has no attribute "indexes"
xarray/core/alignment.py:288: error: "DataAlignable" has no attribute "sizes"
xarray/core/alignment.py:348: error: "DataAlignable" has no attribute "dims"
xarray/core/alignment.py:351: error: "DataAlignable" has no attribute "copy"
xarray/core/alignment.py:353: error: "DataAlignable" has no attribute "reindex"
xarray/core/alignment.py:356: error: "DataAlignable" has no attribute "encoding"
xarray/core/weighted.py:157: error: "DataArray" has no attribute "notnull"
xarray/core/dataset.py:3792: error: "Dataset" has no attribute "virtual_variables"
xarray/core/dataset.py:6135: error: "DataArray" has no attribute "isnull"
```

Edit: one problem is certainly the method injection, as mypy cannot detect those types.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4601/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1371397741 I_kwDOAMm_X85Rvd5t 7027 don't apply `weighted`, `groupby`, etc. to `DataArray` without `dims`? mathause 10194086 open 0     1 2022-09-13T12:44:34Z 2023-08-26T19:13:39Z   MEMBER      

What is your issue?

Applying e.g. ds.weighted(weights).mean() applies the operation over all DataArray objects - even those that don't have the dimensions over which it is applied (or that are scalar variables). I don't think this is wanted.

```python
import xarray as xr

air = xr.tutorial.open_dataset("air_temperature")
air.attrs = {}

# add variable without dims
air["foo"] = 5

print("resample")
print(air.resample(time="MS").mean(dim="time").foo.dims)

print("groupby")
print(air.groupby("time.year").mean(dim="time").foo.dims)

print("weighted")
print(air.weighted(weights=air.time.dt.year).mean("lat").foo.dims)

print("where")
print(air.where(air.air > 5).foo.dims)
```

Results:

```
resample ('time',)
groupby ('year',)
weighted ('time',)
```

Related #6952 - I am sure there are other issues, but couldn't find them quickly...

rolling and coarsen don't seem to do this.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7027/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1719805837 I_kwDOAMm_X85mgieN 7860 diff of cftime.Datetime mathause 10194086 open 0     3 2023-05-22T14:21:06Z 2023-08-04T12:01:33Z   MEMBER      

What happened?

Calling diff (or + / -) on a cftime variable returns timedelta64[ns] values, which can then not be added to or subtracted from the original data.

What did you expect to happen?

That the timedeltas can be added back to the cftime data.

Minimal Complete Verifiable Example

```Python
import xarray as xr

air = xr.tutorial.open_dataset("air_temperature", use_cftime=True)

air.time + air.time.diff("time") / 2
```

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [x] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
air.time.variable.values[1:] - air.time.variable.values[:-1]
# returns array([datetime.timedelta(seconds=21600), ...])
```

but then

```Python
xr.Variable(("time",), np.array([datetime.timedelta(0)]))
# returns a dtype='timedelta64[ns]' array
```

Anything else we need to know?

  • See upstream PR: xarray-contrib/cf-xarray#441
  • Similar to #7381 (but I don't think it's the same issue, feel free to close if you disagree)
  • That might need a special data type for timedeltas of cftime.Datetime objects, or allowing 'timedelta64[ns]' values to be added to cftime.Datetime objects
  • The casting comes from

https://github.com/pydata/xarray/blob/d8ec3a3f6b02a8b941b484b3d254537af84b5fde/xarray/core/variable.py#L366

https://github.com/pydata/xarray/blob/d8ec3a3f6b02a8b941b484b3d254537af84b5fde/xarray/core/variable.py#L272

Environment

```
INSTALLED VERSIONS
------------------
commit: d8ec3a3f6b02a8b941b484b3d254537af84b5fde
python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150400.24.63-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1
xarray: 2023.2.1.dev20+g06a87062
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.10.1
netCDF4: 1.6.2
pydap: installed
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.13.6
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
iris: 3.4.1
bottleneck: 1.3.6
dask: 2023.2.1
distributed: 2023.2.1
matplotlib: 3.7.0
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: 0.2.2
fsspec: 2023.1.0
cupy: None
pint: 0.20.1
sparse: 0.14.0
flox: 0.6.8
numpy_groupies: 0.9.20
setuptools: 67.4.0
pip: 23.0.1
conda: None
pytest: 7.2.1
mypy: None
IPython: 8.11.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7860/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
594669577 MDU6SXNzdWU1OTQ2Njk1Nzc= 3937 compose weighted with groupby, coarsen, resample, rolling etc. mathause 10194086 open 0     7 2020-04-05T22:00:40Z 2023-07-27T18:10:10Z   MEMBER      

It would be nice to make weighted work with groupby - e.g. #3935 (comment)

However, it is not entirely clear to me how that should be done. One way would be to do:

```python
da.groupby(...).weighted(weights).mean()
```

this would require that the groupby operation is applied over the weights as well (how would this be done?). Or should it be

```python
da.weighted(weights).groupby(...).mean()
```

but this seems less intuitive to me. Or

```python
da.groupby(..., weights=weights).mean()
```
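For reference, one way to get this behavior with the current API (a workaround sketch, not the proposed feature):

```python
# Group the data and the weights in lockstep, then reduce each group
# with `weighted` and stitch the results back together.
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", periods=6, freq="QS")
da = xr.DataArray(np.arange(6.0), dims="time", coords={"time": time})
weights = xr.DataArray([1.0, 2, 1, 2, 1, 2], dims="time", coords={"time": time})

years, means = zip(
    *[
        (year, group.weighted(weights.sel(time=group.time)).mean("time"))
        for year, group in da.groupby("time.year")
    ]
)
result = xr.concat(list(means), dim="year").assign_coords(year=list(years))
```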

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3937/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1094725752 I_kwDOAMm_X85BQDB4 6142 dimensions: type as `str | Iterable[Hashable]`? mathause 10194086 open 0     14 2022-01-05T20:39:00Z 2022-06-26T11:57:40Z   MEMBER      

What happened?

We generally type dimensions as:

```python
dims: Hashable | Iterable[Hashable]
```

However, this is in conflict with passing a tuple of independent dimensions to a method - e.g. da.mean(("x", "y")) because a tuple is also hashable.

Also mypy requires an isinstance(dims, Hashable) check when typing a function. We use an isinstance(dims, str) check in many places to wrap a single dimension in a list. Changing this to isinstance(dims, Hashable) will change the behavior for tuples.

What did you expect to happen?

In the community call today we discussed changing this to

```python
dims: str | Iterable[Hashable]
```

i.e. if a single dim is passed it has to be a string, and wrapping it in a list is done for convenience. Special use cases with Hashable types should be wrapped in an Iterable by the user. This probably best reflects the current state of the repo (dims = [dims] if isinstance(dims, str) else dims).
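To illustrate the ambiguity and the proposed convention (a minimal sketch; the helper name is assumed):

```python
from collections.abc import Hashable

# a tuple is Hashable, so under the old typing ("x", "y") could mean one
# tuple-named dimension or two independent dimensions
print(isinstance(("x", "y"), Hashable))  # True


def normalize_dims(dims):
    # proposed convention: a single dim must be a str; anything else is
    # treated as an iterable of dims
    return [dims] if isinstance(dims, str) else list(dims)


print(normalize_dims("x"))         # ['x']
print(normalize_dims(("x", "y")))  # ['x', 'y']
```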

The disadvantage could be that it is a bit more difficult to explain in the docstrings?

@shoyer - did I get this right from the discussion?


Other options

  1. Require str as dimension names.

This could be too restrictive. @keewis mentioned that tuple dimension names are already used somewhere in the xarray repo. Also we discussed in another issue or PR (which I cannot find right now) that we want to keep allowing Hashable.

  2. Disallow passing tuples (only allow tuples if a dimension is a tuple), require lists to pass several dimensions.

This is too restrictive in the other direction and will probably lead to a lot of downstream troubles. Naming a single dimension with a tuple will be a very rare case, in contrast to passing several dimension names as a tuple.

  3. Special case tuples. We could potentially check if dims is a tuple and if there are any dimension names consisting of a tuple. Seems more complicated and potentially brittle for probably small gains (IMO).

Minimal Complete Verifiable Example

No response

Relevant log output

No response

Anything else we need to know?

  • We need to check carefully where general Hashable are really allowed. E.g. dims of a DataArray are typed as

https://github.com/pydata/xarray/blob/e056cacdca55cc9d9118c830ca622ea965ebcdef/xarray/core/dataarray.py#L380

but tuples are not actually allowed:

```python
import xarray as xr

xr.DataArray([1], dims=("x", "y"))
# ValueError: different number of dimensions on data and dims: 1 vs 2

xr.DataArray([1], dims=[("x", "y")])
# TypeError: dimension ('x', 'y') is not a string
```

  • We need to be careful typing functions where only one dim is allowed, e.g. xr.concat, which should probably set dim: Hashable (and make sure it works).
  • Do you have examples for other real-world hashable types except for str and tuple? (Would be good for testing purposes).

Environment

N/A

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6142/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
685739084 MDU6SXNzdWU2ODU3MzkwODQ= 4375 allow using non-dimension coordinates in polyfit mathause 10194086 open 0     1 2020-08-25T19:40:55Z 2022-04-09T02:58:48Z   MEMBER      

polyfit currently only allows fitting along a dimension, not along a non-dimension coordinate (or a virtual coordinate).

Example:

```python
da = xr.DataArray(
    [1, 3, 2], dims=["x"], coords=dict(x=["a", "b", "c"], y=("x", [0, 1, 2]))
)

print(da)

da.polyfit("y", 1)
```

Output:

```python-traceback
<xarray.DataArray (x: 3)>
array([1, 3, 2])
Coordinates:
  * x        (x) <U1 'a' 'b' 'c'
    y        (x) int64 0 1 2

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-80-9bb2dacf50f7> in <module>
      5 print(da)
      6
----> 7 da.polyfit("y", 1)

~/.conda/envs/ipcc_ar6/lib/python3.7/site-packages/xarray/core/dataarray.py in polyfit(self, dim, deg, skipna, rcond, w, full, cov)
   3507         """
   3508         return self._to_temp_dataset().polyfit(
-> 3509             dim, deg, skipna=skipna, rcond=rcond, w=w, full=full, cov=cov
   3510         )
   3511

~/.conda/envs/ipcc_ar6/lib/python3.7/site-packages/xarray/core/dataset.py in polyfit(self, dim, deg, skipna, rcond, w, full, cov)
   6005             skipna_da = skipna
   6006
-> 6007         x = get_clean_interp_index(self, dim, strict=False)
   6008         xname = "{}_".format(self[dim].name)
   6009         order = int(deg) + 1

~/.conda/envs/ipcc_ar6/lib/python3.7/site-packages/xarray/core/missing.py in get_clean_interp_index(arr, dim, use_coordinate, strict)
    246
    247     if use_coordinate is True:
--> 248         index = arr.get_index(dim)
    249
    250     else:  # string

~/.conda/envs/ipcc_ar6/lib/python3.7/site-packages/xarray/core/common.py in get_index(self, key)
    378         """
    379         if key not in self.dims:
--> 380             raise KeyError(key)
    381
    382         try:

KeyError: 'y'
```

Describe the solution you'd like

Would be nice if that worked.

Describe alternatives you've considered

One could just set the non-dimension coordinate as index, e.g.: da = da.set_index(x="y")
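A runnable version of that workaround (a sketch; swap_dims achieves the same effect as the set_index route here):

```python
# Make "y" the dimension coordinate, then fit along it.
import xarray as xr

da = xr.DataArray(
    [1, 3, 2], dims=["x"], coords=dict(x=["a", "b", "c"], y=("x", [0, 1, 2]))
)

fit = da.swap_dims({"x": "y"}).polyfit("y", 1)
print(fit.polyfit_coefficients.values)
```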

Additional context

Allowing this may be as easy as replacing

https://github.com/pydata/xarray/blob/9c85dd5f792805bea319f01f08ee51b83bde0f3b/xarray/core/missing.py#L248

by index = arr[dim] but I might be missing something. Or probably a use_coordinate must be threaded through to get_clean_interp_index (although I am a bit confused by this argument).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4375/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
310833761 MDU6SXNzdWUzMTA4MzM3NjE= 2037 to_netcdf -> _fill_value without NaN mathause 10194086 open 0     8 2018-04-03T13:20:19Z 2022-03-10T10:59:17Z   MEMBER      

Code Sample, a copy-pastable example if possible

```python
import numpy as np
import xarray as xr

x = np.arange(10.0)
da = xr.Dataset(data_vars=dict(data=("dim1", x)), coords=dict(dim1=("dim1", x)))
da.to_netcdf("tst.nc")
```

Problem description

Apologies if this was discussed somewhere - it probably does not matter much, but tst.nc has a _FillValue although it is not really necessary.
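For what it's worth, continuing the example above, the _FillValue can already be suppressed explicitly through the encoding argument:

```python
# Setting _FillValue to None disables it per variable.
da.to_netcdf(
    "tst.nc",
    encoding={"data": {"_FillValue": None}, "dim1": {"_FillValue": None}},
)
```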

Output of xr.show_versions()

# Paste the output here xr.show_versions() here
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2037/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1150251120 I_kwDOAMm_X85Ej3Bw 6304 add join argument to xr.broadcast? mathause 10194086 open 0     1 2022-02-25T09:52:14Z 2022-02-25T21:50:16Z   MEMBER      

Is your feature request related to a problem?

xr.broadcast always does an outer join:

https://github.com/pydata/xarray/blob/de965f342e1c9c5de92ab135fbc4062e21e72453/xarray/core/alignment.py#L702

https://github.com/pydata/xarray/blob/de965f342e1c9c5de92ab135fbc4062e21e72453/xarray/core/alignment.py#L768

This is not how the (default) broadcasting (arithmetic join) works, e.g. the following first does an inner join and then broadcasts:

```python
import xarray as xr

da1 = xr.DataArray([[0, 1, 2]], dims=("y", "x"), coords={"x": [0, 1, 2]})
da2 = xr.DataArray([0, 1, 2, 3, 4], dims="x", coords={"x": [0, 1, 2, 3, 4]})
da1 + da2
```

```
<xarray.DataArray (y: 1, x: 3)>
array([[0, 2, 4]])
Coordinates:
  * x        (x) int64 0 1 2
Dimensions without coordinates: y
```

Describe the solution you'd like

Add a join argument to xr.broadcast. I would propose to leave the default as is

```python
def broadcast(*args, exclude=None, join="outer"):
    args = align(*args, join=join, copy=False, exclude=exclude)
```
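Until such an argument exists, the same effect can be had by aligning first (a sketch using the public API):

```python
import xarray as xr

da1 = xr.DataArray([[0, 1, 2]], dims=("y", "x"), coords={"x": [0, 1, 2]})
da2 = xr.DataArray([0, 1, 2, 3, 4], dims="x", coords={"x": [0, 1, 2, 3, 4]})

# inner-join first, then broadcast - mirrors the arithmetic behavior
b1, b2 = xr.broadcast(*xr.align(da1, da2, join="inner"))
print(b1.sizes, b2.sizes)  # both have y: 1, x: 3
```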

Describe alternatives you've considered

  • We could make broadcast respect options -> arithmetic_join but that would be a breaking change and I am not sure how the deprecation should/ would be handled...
  • We could leave it as is.

Additional context

  • xr.broadcast should not be used often because this should happen automatically in most cases
  • in #6059 I use broadcast because I couldn't get it to work otherwise (maybe there is a better way?). However, the "outer elements" are immediately discarded again - so it's kind of pointless to do an outer join.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(6).reshape(3, 2), coords={"dim_0": [0, 1, 2]})
w = xr.DataArray([1, 1, 1, 1, 1, 1], coords={"dim_0": [0, 1, 2, 4, 5, 6]})
da.weighted(w).quantile(0.5)
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6304/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
307783090 MDU6SXNzdWUzMDc3ODMwOTA= 2007 rolling: allow control over padding mathause 10194086 open 0     20 2018-03-22T19:27:07Z 2021-07-14T19:10:47Z   MEMBER      

Code Sample, a copy-pastable example if possible

```python
import numpy as np
import xarray as xr

x = np.arange(1, 366)
y = np.random.randn(365)
ds = xr.DataArray(y, dims="dayofyear", coords=dict(dayofyear=x))

ds.rolling(dayofyear=31, center=True).mean()
```

Problem description

rolling cannot directly handle periodic boundary conditions (lon, dayofyear, ...), but supporting them could be very helpful, e.g. to calculate climate indices. Also I cannot really think of an easy way to append the first elements to the end of the dataset and then calculate rolling.

Is there a way to do this? Should xarray support this feature?

This might also belong to SO...
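One workaround that is possible with today's API (a sketch; DataArray.pad was added well after this issue was opened):

```python
# Wrap-pad the periodic dimension, apply rolling, then trim the padding.
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.randn(365), dims="dayofyear")

half = 15  # half-width of a 31-wide window
smoothed = (
    da.pad(dayofyear=half, mode="wrap")
    .rolling(dayofyear=31, center=True)
    .mean()
    .isel(dayofyear=slice(half, -half))
)
print(smoothed.sizes)  # back to 365, with no NaNs at the edges
```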

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2007/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
788534915 MDU6SXNzdWU3ODg1MzQ5MTU= 4824 combine_by_coords can succed when it shouldn't mathause 10194086 open 0     15 2021-01-18T20:39:29Z 2021-07-08T17:44:38Z   MEMBER      

What happened:

combine_by_coords can succeed when it should not - depending on the name of the dimensions (which determines the order of operations in combine_by_coords).

What you expected to happen:

  • I think it should throw an error in both cases.

Minimal Complete Verifiable Example:

```python
import numpy as np
import xarray as xr

data = np.arange(5).reshape(1, 5)
x = np.arange(5)
x_name = "lat"

da0 = xr.DataArray(data, dims=("t", x_name), coords={"t": [1], x_name: x}).to_dataset(name="a")
x = x + 1e-6
da1 = xr.DataArray(data, dims=("t", x_name), coords={"t": [2], x_name: x}).to_dataset(name="a")
ds = xr.combine_by_coords((da0, da1))

ds
```

returns:

```python
<xarray.Dataset>
Dimensions:  (lat: 10, t: 2)
Coordinates:
  * lat      (lat) float64 0.0 1e-06 1.0 1.0 2.0 2.0 3.0 3.0 4.0 4.0
  * t        (t) int64 1 2
Data variables:
    a        (t, lat) float64 0.0 nan 1.0 nan 2.0 nan ... 2.0 nan 3.0 nan 4.0
```

Thus lat is interlaced - I don't think combine_by_coords should do this. If you set

```python
x_name = "x"
```

and run the example again, it returns:

```python-traceback
ValueError: Resulting object does not have monotonic global indexes along dimension x
```

Anything else we need to know?:

  • this is vaguely related to #4077 but I think it is separate
  • combine_by_coords concatenates over all dimensions where the coords are different - therefore compat="override" doesn't actually do anything? Or does it?

https://github.com/pydata/xarray/blob/ba42c08af9afbd9e79d47bda404bf4a92a7314a0/xarray/core/combine.py#L69

cc @dcherian @TomNicholas

Environment:

Output of xr.show_versions()
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4824/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
773750763 MDU6SXNzdWU3NzM3NTA3NjM= 4727 xr.testing.assert_equal does not test for dtype mathause 10194086 open 0     5 2020-12-23T13:14:41Z 2021-07-04T04:08:51Z   MEMBER      

In #4622 @toddrjen points out that xr.testing.assert_equal does not test for the dtype, only for the value. Therefore the following does not raise an error:

```python
import numpy as np
import pandas as pd
import xarray as xr

xr.testing.assert_equal(
    xr.DataArray(np.array(1, dtype=int)), xr.DataArray(np.array(1, dtype=float))
)
xr.testing.assert_equal(
    xr.DataArray(np.array(1, dtype=int)), xr.DataArray(np.array(1, dtype=object))
)
xr.testing.assert_equal(
    xr.DataArray(np.array("a", dtype=str)), xr.DataArray(np.array("a", dtype=object))
)
```

This comes back to numpy, i.e. the following is True:

```python
np.array(1, dtype=int) == np.array(1, dtype=float)
```

Depending on the situation one or the other is desirable. Thus, I would suggest adding a check_dtype argument to xr.testing.assert_equal and also to DataArray.equals (and Dataset, Variable, and identical). I have not seen such an option in numpy, but pandas has it (e.g. pd.testing.assert_series_equal(left, right, check_dtype=True, ...)). I would not change __eq__.
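A minimal sketch of the proposed behavior (a hypothetical helper, not part of xarray):

```python
import numpy as np
import xarray as xr


def assert_equal_with_dtype(a, b, check_dtype=True):
    # compare values first, then dtypes
    xr.testing.assert_equal(a, b)
    if check_dtype and a.dtype != b.dtype:
        raise AssertionError(f"dtypes differ: {a.dtype} vs {b.dtype}")


assert_equal_with_dtype(
    xr.DataArray(np.array(1, dtype=int)),
    xr.DataArray(np.array(1, dtype=float)),
)  # raises AssertionError: dtypes differ
```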

  • Thoughts?
  • What should the default be? We could try True first and see how many failures this creates?
  • What to do with coords and indexes? pd.testing.assert_series_equal has a check_index_type keyword. Probably we need check_coords_type as well? This makes the whole thing much more complicated... Also #4543
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4727/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
559217441 MDU6SXNzdWU1NTkyMTc0NDE= 3744 Contour with vmin/ vmax differs from matplotlib mathause 10194086 open 0     0 2020-02-03T17:11:24Z 2021-07-04T02:03:02Z   MEMBER      

MCVE Code Sample

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

data = xr.DataArray(np.arange(24).reshape(4, 6))

data.plot.contour(vmax=10, add_colorbar=True)
```

Expected Output

```python
h = plt.contour(data.values, vmax=10)
plt.colorbar(h)
```

Problem Description

A contour(vmax=vmax) plot differs between xarray and matplotlib. I think the problem is here:

https://github.com/pydata/xarray/blob/95e4f6c7a636878c94b892ee8d49866823d0748f/xarray/plot/utils.py#L265

xarray calculates the levels from vmax while matplotlib (probably) calculates the levels from data.max() and uses vmax only for the norm. For contourf and pcolormesh this is not so relevant as the capped values are then drawn with the over color. However, there may also be a good reason for this behavior.

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: 4c96d53e6caa78d56b785f4edee49bbd4037a82f
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.12.14-lp151.28.36-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.2
xarray: 999 (master)
pandas: 0.25.3
numpy: 1.17.3
scipy: 1.4.1
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: 1.5.5
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: 1.2.0
PseudoNetCDF: installed
rasterio: 1.1.0
cfgrib: 0.9.7.6
iris: 2.2.0
bottleneck: 1.3.1
dask: 2.9.2
distributed: 2.9.2
matplotlib: 3.1.2
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: installed
setuptools: 45.0.0.post20200113
pip: 19.3.1
conda: None
pytest: 5.3.3
IPython: 7.11.1
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3744/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
587048587 MDU6SXNzdWU1ODcwNDg1ODc= 3883 weighted operations: performance optimisations mathause 10194086 open 0     3 2020-03-24T15:31:54Z 2021-07-04T02:01:28Z   MEMBER      

There was a discussion on the performance of the weighted mean/ sum in terms of memory footprint but also speed, and there may indeed be some things that can be optimized. See the posts at the end of the PR. However, the optimal implementation will probably depend on the use case and some profiling will be required.

I'll just open an issue to keep track of this. @seth-p

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3883/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
806218687 MDU6SXNzdWU4MDYyMTg2ODc= 4892 disallow boolean coordinates? mathause 10194086 open 0     2 2021-02-11T09:33:17Z 2021-03-31T10:30:49Z   MEMBER      

Today I stumbled over a small pitfall, which I think could be avoided:

I am working with arrays that have axes labeled with categorical values and I ended up using True/False as labels for some binary categories:

```python
import numpy
import xarray

test = xarray.DataArray(
    numpy.ones((3, 2)),
    dims=["ternary", "binary"],
    coords={"ternary": [3, 7, 9], "binary": [False, True]},
)
```

now came the big surprise, when I wanted to reduce over selections of the data:

```python
test.sel(ternary=[9, 3, 7])     # does exactly what I expect: the correctly permuted 3x2 array
test.sel(binary=[True, False])  # does not do what I expect
```

Instead of using the coordinate values like with the ternary category, it uses the list as a boolean mask, and hence I get a 3x1 array at the binary=False coordinate.

I assume that this behavior is reasonable in most cases - and I for sure will stop using bools as binary category labels. That said, in the above case the conceptually identical call results in a completely different outcome.

My (radical) proposal would be: forbid binary coordinates in general to avoid such confusion.

Curious about your thoughts! Hth,

Marti

Originally posted by @martinitus in https://github.com/pydata/xarray/discussions/4861

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4892/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
683777199 MDU6SXNzdWU2ODM3NzcxOTk= 4364 plt.pcolormesh will infer interval breaks per default mathause 10194086 open 0     3 2020-08-21T19:15:57Z 2021-03-19T14:09:52Z   MEMBER      

Looking at some warnings in #3266 I saw that matplotlib will deprecate the old behaviour of pcolormesh when the shape of the data and the coordinates are equal (they silently cut a row and a column of the data). With the new behaviour they will interpolate the coordinates.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 2, 3, 4, 5])

data = np.random.randn(*y.shape + x.shape)

f, axes = plt.subplots(1, 2)

for ax, shading, behavior in zip(axes, ["flat", "nearest"], ["old", "new"]):
    ax.pcolormesh(x, y, data, shading=shading, vmin=-0.75, vmax=0.75)
    ax.set_title(f"{behavior}: shading='{shading}'")
```

This is a good thing in general - we have already been doing this for a long time with the infer_intervals keyword. Unfortunately they don't check if the data is monotonic (matplotlib/matplotlib#18317), which can lead to problems for maps (scitools/cartopy#1638). I don't think there is a need to do something right now - let's see what they think upstream.

This change was introduced in mpl 3.3.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4364/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
802992417 MDU6SXNzdWU4MDI5OTI0MTc= 4875 assigning values with incompatible dtype mathause 10194086 open 0     0 2021-02-07T16:28:24Z 2021-02-07T16:28:24Z   MEMBER      

The behavior of xarray when assigning values with incompatible dtypes is a bit arbitrary. This is partly due to the behavior of numpy.... numpy 1.20 got a bit cleverer but still seems inconsistent at times... I am not sure what to do about this (and if we should actually be clever here).

  1. Direct assignment (dupe of #4612)

```python
import numpy as np
import xarray as xr

arr = np.array([2])

arr[0] = np.nan
# ValueError (since numpy 1.20)

arr[0:1] = np.array([np.nan])
# -> array([-9223372036854775808])

da = xr.DataArray([5], dims="x")

da[0] = np.nan
# <xarray.DataArray (x: 1)>
# array([-9223372036854775808])
# Dimensions without coordinates: x
# (because this gets converted to da.variable._data[0:1, 0:1] = np.array([np.nan]), approximately)

da[0] = 1.2345
# casts the assigned value to int
```

  2. Via a numpy function (pad, shift, rolling)

pad

```python
da.pad(x=1, constant_values=np.nan)
# ValueError: cannot convert float NaN to integer

da.pad(x=1, constant_values=None)
# casts da to float

da.pad(x=1, constant_values=1.5)
# casts constant_values to int
```

shift

```python
da.shift(x=1, fill_value=np.nan)
# ValueError: cannot convert float NaN to integer

# da.shift(x=1, fill_value=None)
# None not allowed by shift

da.shift(x=1, fill_value=1.5)
# casts fill_value to int
```

rolling

```python
da.rolling(x=1).construct("new_axis", stride=3, fill_value=np.nan)
# ValueError: cannot convert float NaN to integer

# da.rolling(x=1).construct("new_axis", stride=3, fill_value=None)
# None not allowed by rolling

da.rolling(x=3).construct("new_axis", stride=3, fill_value=1.5)
# casts fill_value to int
```
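A defensive pattern that sidesteps these inconsistencies (a sketch, not a proposed xarray behavior): cast explicitly before assigning or padding with values of a different dtype.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([5], dims="x")

da_float = da.astype(float)
da_float[0] = np.nan  # now well-defined
padded = da_float.pad(x=1, constant_values=np.nan)
print(padded.values)  # [nan nan nan]
```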

To check:

  • What does dask do in these cases?
  • What does pandas do?
  • What about str dtypes?
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4875/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);