
issues


29 rows where type = "issue" and user = 43316012 sorted by updated_at descending


Of these 29 issues, 20 are closed and 9 are open; all are in the xarray repo.
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1462173557 I_kwDOAMm_X85XJv91 7316 Support for python 3.11 headtr1ck 43316012 closed 0     2 2022-11-23T17:52:18Z 2024-03-15T06:07:26Z 2024-03-15T06:07:26Z COLLABORATOR      

Is your feature request related to a problem?

Now that python 3.11 has been released, we should start to support it officially.

Describe the solution you'd like

I guess the first step would be to replace python 3.10 as a maximum version in the tests and see what crashes (and get lucky).

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7316/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2038622503 I_kwDOAMm_X855gukn 8548 Shaping the future of Backends headtr1ck 43316012 open 0     3 2023-12-12T22:08:50Z 2023-12-15T17:14:59Z   COLLABORATOR      

What is your issue?

Backends in xarray are used to read and write files (or in general objects) and transform them into useful xarray Datasets.

This issue will collect ideas on how to continuously improve them.

Current state

Along the reading and writing process there are many implicit and explicit configuration possibilities. There are many backend-specific options and many encoder- and decoder-specific options. Most of them are currently difficult or even impossible to discover.

There is the infamous open_dataset method which can do everything, but there are also some specialized methods like open_zarr or to_netcdf.

The only really formalized way to extend xarray's capabilities is via the BackendEntrypoint, and currently only for reading files. This has proven to work, and things are going so well that people are discussing getting rid of the specialized reading methods (#7495). A major critique in that thread is, again, the discoverability of configuration options.
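For reference, the current read-only extension point looks roughly like the sketch below; the class name, the `.my` file extension and the `scale` option are made up for illustration, and registration would happen via the `xarray.backends` entry-point group:

```python
import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


class MyBackendEntrypoint(BackendEntrypoint):
    """Hypothetical read-only backend illustrating the current extension point."""

    description = "Open .my files in xarray"
    url = "https://example.com/my-backend-docs"

    def open_dataset(self, filename_or_obj, *, drop_variables=None, scale=1.0):
        # every backend-specific option (here the made-up ``scale``) ends up as
        # yet another keyword argument of xr.open_dataset, which is hard to discover
        data = np.atleast_1d(np.loadtxt(filename_or_obj)) * scale
        return xr.Dataset({"data": (("x",), data)})

    def guess_can_open(self, filename_or_obj):
        return str(filename_or_obj).endswith(".my")
```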

Problems

To name a few:

  • Discoverability of configuration options is poor
  • No distinction between backend and encoding options
  • New options are simply added as another keyword argument to open_dataset
  • No writing support for backends

What already improved

  • Adding URL and description attributes to the backends (#7000, #7200)
  • Add static typing
  • Allow creating instances of backends with their respective options (#8520)

The future

After listing all the problems, let's see how we can improve the situation and make backends an all-round solution for reading and writing all kinds of files.

What happens behind the scenes

In general the reading and writing of Datasets in xarray is a three-step process.

```
[done by backend.open_dataset]
Dataset < chunking < decoding < opening_in_store < file
Dataset > validating > encoding > storing_in_store > file
```

Probably you could consider combining the chunking and decoding, as well as validation and encoding, into a single logical step in the pipeline. This view should help decide how to set up a future architecture of backends.

You can see that there is a common middle object in this process: an in-memory representation of the file on disk, sitting between en-/decoding and the abstract store. This is actually an xarray.Dataset and is internally called a "backend dataset".

write_dataset method

A quite natural extension of backends would be to implement a write_dataset method (name pending). This would allow backends to fulfill the complete right side of the pipeline.
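A very rough sketch of what that could look like; the method name and signature are pure speculation, mirroring open_dataset:

```python
import xarray as xr
from xarray.backends import BackendEntrypoint


class MyWritableBackend(BackendEntrypoint):
    """Speculative sketch: a backend that can also store Datasets."""

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        ...  # read path, as today

    # name pending -- this method does not exist in xarray yet
    def write_dataset(self, dataset: xr.Dataset, store, *, encoding=None, **kwargs):
        # the backend receives the already encoded ("backend") dataset and is
        # responsible for persisting it into its own store / file format
        ...
```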

Transformer class

For lack of a common word for a class that handles "encoding" and "decoding", I will call it a transformer here.

The process of en- and decoding is currently hardcoded in the respective open_dataset and to_netcdf methods. One could imagine introducing a common class that handles both.

This class could handle the implemented CF or netcdf encoding conventions. But it would also allow users to define their own storing conventions (Why not create a custom transformer that adds indexes based on variable attributes?) The possibilities are endless, and an interface that fulfills all the requirements still has to be found.

This would homogenize the reading and writing process to: Dataset <> Transformer <> Backend <> file. As a bonus, this would improve the discoverability of the decoding options (which would then be transformer arguments).

The new interface could then be:

```python
backend = Netcdf4BackendEntrypoint(group="data")
decoder = CFTransformer(cftime=True)

ds = xr.open_dataset("file.nc", engine=backend, decoder=decoder)
```

while of course still allowing all options to be passed simply as kwargs (since this is still the easiest way of telling beginners how to open files).

The final improvement here would be to add additional entrypoints for these transformers ;)

Disclaimer

Now, this issue is just a bunch of random ideas that require quite some refinement, or they might even turn out to be nonsense. So let's have an exciting discussion about these things :) If you have something to add to the above points I will include your ideas as well. This is meant as a collection of ideas on how to improve our backends :)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8548/reactions",
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2034528244 I_kwDOAMm_X855RG_0 8537 Doctests failing headtr1ck 43316012 closed 0     1 2023-12-10T20:49:43Z 2023-12-11T21:00:03Z 2023-12-11T21:00:03Z COLLABORATOR      

What is your issue?

The doctest is currently failing with

E UserWarning: h5py is running against HDF5 1.14.3 when it was built against 1.14.2, this may cause problems

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8537/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1936080078 I_kwDOAMm_X85zZjzO 8291 `NamedArray.shape` does not support unknown dimensions headtr1ck 43316012 closed 0     1 2023-10-10T19:36:42Z 2023-10-18T06:22:54Z 2023-10-18T06:22:54Z COLLABORATOR      

What is your issue?

According to the array API standard, the shape property returns tuple[int | None, ...]. Currently we only support tuple[int, ...].

This will actually raise some errors if a duck array returns some None. E.g. NamedArray.size will fail.

(On a side note: dask arrays actually use NaN instead of None for some reason... The only advantage of this is that NamedArray.size will also return NaN instead of raising...)
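A minimal sketch of the problem; the None-handling shown here is just one possible fix, not what NamedArray currently does:

```python
import math

# a duck array following the array API standard may report unknown
# dimensions, i.e. shape has type tuple[int | None, ...]:
shape = (3, None, 4)

# math.prod(shape), roughly what size does today, raises TypeError on the
# None entry; one possible fix is to report the total size as unknown too:
size = None if any(dim is None for dim in shape) else math.prod(shape)
print(size)  # None, i.e. "unknown"
```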

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8291/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1915876808 I_kwDOAMm_X85yMfXI 8236 DataArray with multiple (Pandas)Indexes on the same dimension is impossible to align headtr1ck 43316012 closed 0     3 2023-09-27T15:52:05Z 2023-10-02T06:53:27Z 2023-10-01T07:19:09Z COLLABORATOR      

What happened?

I have a DataArray with a single dimension and multiple (Pandas)Indexes assigned to various coordinates for efficient indexing using sel.

Edit: the problem is even worse than originally described below: such a DataArray breaks all alignment and it's basically unusable...


When I try to add an additional coordinate without any index (I simply use the tuple[dimension, values] way) I get a ValueError about aligning with conflicting indexes.

If the original DataArray only has a single (Pandas)Index everything works as expected.

What did you expect to happen?

I expected that I can simply assign new coordinates without an index.

Minimal Complete Verifiable Example

```python
import xarray as xr

da = xr.DataArray(
    [1, 2, 3],
    dims="t",
    coords={"a": ("t", [3, 4, 5]), "b": ("t", [5, 6, 7])},
)

# set one index
da2 = da.set_xindex("a")

# set second index (same dimension, maybe that's a problem?)
da3 = da2.set_xindex("b")

# this works
da2.coords["c"] = ("t", [2, 3, 4])

# this does not
da3.coords["c"] = ("t", [2, 3, 4])
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```
ValueError: cannot re-index or align objects with conflicting indexes found for the following dimensions: 't' (2 conflicting indexes)
Conflicting indexes may occur when
- they relate to different sets of coordinate and/or dimension names
- they don't have the same type
- they may be used to reindex data along common dimensions
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.9.10 (main, Mar 21 2022, 13:08:11) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.66.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.0 xarray: 2022.12.0 pandas: 2.0.2 numpy: 1.24.3 scipy: 1.10.0 netCDF4: 1.6.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.6.3 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 58.1.0 pip: 21.2.4 conda: None pytest: 7.3.2 mypy: 1.0.0 IPython: 8.8.0 sphinx: None

I have not yet tried this with a newer version of xarray....

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8236/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  not_planned xarray 13221727 issue
1899895419 I_kwDOAMm_X85xPhp7 8199 Use Generic Types instead of Hashable or Any headtr1ck 43316012 open 0     2 2023-09-17T19:41:39Z 2023-09-18T14:16:02Z   COLLABORATOR      

Is your feature request related to a problem?

Currently, part of the static type of a DataArray or Dataset is a Mapping[Hashable, DataArray]. I'm quite sure that 99% of users will actually use str key values (aka variable names), while some exotic people (me included) want to use e.g. Enums for their keys. Currently we allow anything as keys as long as it is hashable, but once the DataArray/Dataset is created, the type information of the keys is lost.

Consider e.g.:

```python
import numpy as np
import xarray as xr

for name, da in xr.Dataset({"a": ("t", np.arange(5))}).items():
    reveal_type(name)     # Hashable
    reveal_type(da.dims)  # tuple[Hashable, ...]
```

Wouldn't it be nice if this actually returned str, so you don't have to cast it or assert it every time?

This could be solved by making these classes generic.

Another related issue is the underlying data. This could be introduced as a Generic type as well. Probably, this should reach some common ground on all wrapping array libs that are out there. Every one should use a Generic Array class that keeps track of the type of the wrapped array, e.g. dask.array.core.Array[np.ndarray]. In return, we could do DataArray[np.ndarray] or then DataArray[dask.array.core.Array[nd.ndarray]].

Describe the solution you'd like

The implementation would be something along the lines of:

```python
KeyT = TypeVar("KeyT", bound=Hashable)
DataT = TypeVar("DataT", bound=<some protocol?>)


class DataArray(Generic[KeyT, DataT]):

    _coords: dict[KeyT, Variable[DataT]]
    _indexes: dict[KeyT, Index[DataT]]
    _name: KeyT | None
    _variable: Variable[DataT]

    def __init__(
        self,
        data: DataT = dtypes.NA,
        coords: Sequence[Sequence[DataT] | pd.Index | DataArray[KeyT]]
        | Mapping[KeyT, DataT]
        | None = None,
        dims: str | Sequence[KeyT] | None = None,
        name: KeyT | None = None,
        attrs: Mapping[KeyT, Any] | None = None,
        # internal parameters
        indexes: Mapping[KeyT, Index] | None = None,
        fastpath: bool = False,
    ) -> None:
        ...
```

Now you could create a "classical" DataArray:

```python
da = DataArray(np.arange(10), {"t": np.arange(10)}, dims=["t"])
# will be of type: DataArray[str, np.ndarray]
```

while you could also create something more fancy:

```python
da2 = DataArray(dask.array.array([1, 2, 3]), {}, dims=[("tup1", "tup2"),])
# will be of type: DataArray[tuple[str, str], dask.array.core.Array]
```

And whenever you access the dimensions / coord names / underlying data you will get the correct type.

For now I only see three major problems:

  1. Non-array types (like lists or anything iterable) will get cast to a np.ndarray, and I have no idea how to tell the type checker that DataArray([1, 2, 3], {}, "a") should be DataArray[str, np.ndarray] and not DataArray[str, list[int]]. Depending on the Protocol in the bound TypeVar this might even fail static type analysis or require tons of special casing and overloads.
  2. How does the type checker extract the dimension type for Datasets? This is quite convoluted and I am not sure this can be typed correctly...
  3. The parallel compute workflows are quite dynamic and I am not sure if static type checking can keep track of the underlying datatype... What does DataArray([1, 2, 3], dims="a").chunk({"a": 2}) return? Is it DataArray[str, dask.array.core.Array]? But what about other chunking frameworks?

Describe alternatives you've considered

One could even extend this and add more Generic types.

Different types for dimensions and variable names would be a first (and probably quite a nice) feature addition.

One could even go so far and type the keys and values of variables and coords (for Datasets) differently. This came up e.g. in https://github.com/pydata/xarray/issues/3967 However, this would create a ridiculous amount of Generic types and is probably more confusing than helpful.

Additional context

Probably this feature should be done in consecutive PRs that each implement one Generic, otherwise this will be a giant task!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8199/reactions",
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1275752720 I_kwDOAMm_X85MCnEQ 6704 Future of `DataArray.rename` headtr1ck 43316012 open 0     11 2022-06-18T10:14:43Z 2023-09-11T00:53:31Z   COLLABORATOR      

What is your issue?

In https://github.com/pydata/xarray/pull/6665 the question came up what to do with DataArray.rename in light of the new index refactor.

To be consistent with Dataset we should introduce a

  • DataArray.rename_dims
  • DataArray.rename_vars
  • DataArray.rename

Several open questions about the behavior (Similar things apply to Dataset.rename{, _dims, _vars}):

  • [ ] Should rename_dims also rename indexes (dimension coordinates)?
  • [ ] Should rename_vars also rename the DataArray?
  • [ ] What to do if the DataArray has the same name as one of its coordinates?
  • [ ] Should rename still rename everything (like it is now) or only the name (Possibly with some deprecation cycle)?

The current implementation of DataArray.rename is a bit inconsistent:

As stated by @max-sixty in https://github.com/pydata/xarray/issues/6665#issuecomment-1154368202:

  • rename operates on the DataArray as described in https://github.com/pydata/xarray/pull/6665#issuecomment-1150810485. Generally I'm less keen on "different types have different semantics", and here a positional arg would mean a DataArray rename, and a kwarg would mean a var rename. But it does work locally to DataArray quite well.
  • rename only exists on DataArrays for the name of the DataArray, and we use rename_vars & rename_dims for both DataArrays & Datasets. So Dataset.rename is soft-deprecated.
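For context, a quick sketch of how DataArray.rename behaves today, renaming whatever it is given (variable names and values are made up):

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x", coords={"x": [10, 20, 30]}, name="a")

da.rename("b")          # positional str: renames the DataArray itself
da.rename({"x": "t"})   # mapping: renames the dimension and its coordinate
da.rename(x="t")        # kwargs: same as the mapping form
```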

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6704/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1401066481 I_kwDOAMm_X85TgpPx 7141 Coverage shows reduced value since mypy flag was added headtr1ck 43316012 closed 0     3 2022-10-07T12:01:15Z 2023-08-30T18:47:35Z 2023-08-30T18:47:35Z COLLABORATOR      

What is your issue?

The coverage was reduced from ~94% to ~68% after merging #7126. See https://app.codecov.io/gh/pydata/xarray or our badge.

I think this is because the unittests never included the tests directory while mypy does. And codecov uses the sum of both coverage reports to come up with its number.

Adding the flag to the badge also does not seem to help?

Not sure how or even if that is possible to solve, maybe we need to ask in codecov?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7141/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1648748263 I_kwDOAMm_X85iRebn 7703 Readthedocs build failing headtr1ck 43316012 closed 0     3 2023-03-31T06:20:53Z 2023-03-31T15:45:10Z 2023-03-31T15:45:10Z COLLABORATOR      

What is your issue?

It seems that the readthedocs build is failing since some upstream update. pydata-sphinx-theme seems to be incompatible with the sphinx-book-theme.

Maybe we have to pin to a specific or a maximum version for now.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7703/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1419825696 I_kwDOAMm_X85UoNIg 7199 Deprecate cfgrib backend headtr1ck 43316012 closed 0     4 2022-10-23T15:09:14Z 2023-03-29T15:19:53Z 2023-03-29T15:19:53Z COLLABORATOR      

What is your issue?

Since version 0.9.9 (04/2021), cfgrib comes with its own xarray backend plugin (it looks mainly like a copy of our internal version). We should deprecate our internal plugin.

The deprecation is complicated since we usually bind the minimum version to a minor step, but cfgrib has been on 0.9 for 4 years already. Maybe an exception like for netCDF4?

Anyway, if we decide to leave it as it is for now, this ticket is just a reminder to remove it someday :)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7199/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1603831809 I_kwDOAMm_X85fmIgB 7572 `test_open_nczarr` failing headtr1ck 43316012 closed 0     3 2023-02-28T21:20:22Z 2023-03-02T16:49:25Z 2023-03-02T16:49:25Z COLLABORATOR      

What is your issue?

In the latest CI runs it seems that test_backends.py::TestNCZarr::test_open_nczarr is failing with

KeyError: 'Zarr object is missing the attribute _ARRAY_DIMENSIONS and the NCZarr metadata, which are required for xarray to determine variable dimensions.'

I don't see an obvious reason for this, especially since the zarr version has not changed compared to some runs that were successful (2.13.6).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7572/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1361246796 I_kwDOAMm_X85RIvpM 6985 FutureWarning for pandas date_range headtr1ck 43316012 closed 0     1 2022-09-04T20:35:17Z 2023-02-06T17:51:48Z 2023-02-06T17:51:48Z COLLABORATOR      

What is your issue?

Xarray raises a FutureWarning in its date_range, also observable in your tests. The precise warning is:

xarray/coding/cftime_offsets.py:1130: FutureWarning: Argument closed is deprecated in favor of inclusive.

You should discuss whether to adopt the new inclusive argument or add a workaround.
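For reference, the change on the pandas side looks roughly like this (assuming pandas >= 1.4, where inclusive was introduced):

```python
import pandas as pd

# deprecated since pandas 1.4 (and removed in 2.0); this spelling is what
# triggers the FutureWarning quoted above:
#   pd.date_range("2000-01-01", "2000-01-05", closed="left")

# the replacement spelling:
idx = pd.date_range("2000-01-01", "2000-01-05", inclusive="left")
print(idx)
```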

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6985/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1548948097 I_kwDOAMm_X85cUxKB 7457 Typing of internal datatypes headtr1ck 43316012 open 0     5 2023-01-19T11:08:43Z 2023-01-19T19:49:19Z   COLLABORATOR      

Is your feature request related to a problem?

Currently there is no static typing of the underlying data structures used in DataArrays. Simply running reveal_type(da.data) returns Any.

Adding static typing support to that is unfortunately non-trivial since xarray supports a wide variety of duck-types.

This also comes with internal typing difficulties.

Describe the solution you'd like

I think the way to go is making the DataArray class generic in its underlying data type. Something like DataArray[np.ndarray] or DataArray[dask.array].

The implementation would require a TypeVar that is bound to some minimal required Protocol for internal consistency (I think at least it needs dtype and shape attributes).
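A sketch of what such a bound could look like; the protocol name and its exact members are illustrative only:

```python
from typing import Protocol, TypeVar

import numpy as np


class _ArrayLike(Protocol):
    """Hypothetical minimal duck-array protocol (name made up)."""

    @property
    def dtype(self) -> np.dtype: ...

    @property
    def shape(self) -> tuple[int, ...]: ...


DataT = TypeVar("DataT", bound=_ArrayLike)

# a generic DataArray could then be parametrized as e.g.
# DataArray[np.ndarray] or DataArray[dask.array.core.Array]
```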

Datasets would have to be typed the same way. This means only one datatype for all variables is possible; when you mix them, it will fall back to the common ancestor, which will be the before-mentioned protocol. This is basically the same restriction that a dict has.

Now to the main issue that I see with this approach: I don't know how to type coordinates. They have the same problems as mentioned above for Datasets. I think it is very common to have dask arrays in the variables but simple numpy arrays in the coordinates, so either one excludes them from the typing, or in such cases the common generic typing falls back to the protocol again. Not sure what the best approach is here.

Describe alternatives you've considered

Since the most common workflow for beginners and intermediate-advanced users is to stick with the DataArrays themselves and never touch the underlying data, I am not sure if this change is as beneficial as I want it to be. Maybe it just complicates things, and leaving it as Any is easier for advanced users, who then have to cast or ignore this.

Additional context

It came up in this discussion: https://github.com/pydata/xarray/pull/7020#discussion_r972617770

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7457/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1464905814 I_kwDOAMm_X85XULBW 7322 Doctests failing headtr1ck 43316012 closed 0     4 2022-11-25T20:20:29Z 2022-11-28T19:31:04Z 2022-11-28T19:31:04Z COLLABORATOR      

What is your issue?

It seems that some update in urllib3 causes our doctests to fail.

The reason seems to be that botocore uses an interesting construction to import deprecated urllib3 things:

```python
try:
    # pyopenssl will be removed in urllib3 2.0, we'll fall back to ssl_ at that point.
    # This can be removed once our urllib3 floor is raised to >= 2.0.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=DeprecationWarning)
        # Always import the original SSLContext, even if it has been patched
        from urllib3.contrib.pyopenssl import (
            orig_util_SSLContext as SSLContext,
        )
except ImportError:
    from urllib3.util.ssl_ import
```

I assume that this fails because we use -Werror which translates the warning into an error which then is not ignored...

Not sure if this is an issue with botocore or we have to catch this?
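If we decide to silence it on our side instead of waiting for botocore, a filter along these lines might work; the message pattern here is an assumption and would need to match the actual warning text:

```python
import warnings

# assumption: the DeprecationWarning is emitted when importing
# urllib3.contrib.pyopenssl; adjust the pattern to the real message
warnings.filterwarnings(
    "ignore",
    message=r".*urllib3\.contrib\.pyopenssl.*",
    category=DeprecationWarning,
)
```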

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7322/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1347715262 I_kwDOAMm_X85QVIC- 6949 Plot accessors miss static typing headtr1ck 43316012 closed 0     0 2022-08-23T10:38:56Z 2022-10-16T09:26:55Z 2022-10-16T09:26:55Z COLLABORATOR      

What happened?

The plot accessors, i.e. dataarray.plot of type _PlotMethods, are missing static typing, especially of function attributes. See #6947 for an example.

The problem is that many plotting methods are added using hooks via decorators, something that mypy does not understand.

What did you expect to happen?

As a quick fix: type the plot accessors as _PlotMethods | Any to avoid false positives in mypy.

Better to either restructure the accessor with static methods instead of hooks or figure out another way of telling static type checkers about these methods.

Anyway: mypy should not complain.
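One possible way of telling static type checkers about the dynamically attached methods, sketched with simplified signatures (not necessarily the approach we should take):

```python
from typing import TYPE_CHECKING, Any


class _PlotMethods:
    # methods are attached dynamically via decorators at runtime, so declare
    # them for type checkers only; signatures are simplified placeholders
    if TYPE_CHECKING:

        def contourf(self, *args: Any, **kwargs: Any) -> Any: ...

        def pcolormesh(self, *args: Any, **kwargs: Any) -> Any: ...
```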

Minimal Complete Verifiable Example

```python
import xarray as xr

da = xr.DataArray([[1, 2, 3], [4, 5, 6]], dims=["x", "y"])
da.plot.contourf(x="x", y="y")

# mypy complains:
# error: "_PlotMethods" has no attribute "contourf"
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

On mobile, can edit it later if required. Newest xarray should have this problem, before the accessor was Any.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6949/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1388372090 I_kwDOAMm_X85SwOB6 7094 Align typing of dimension inputs headtr1ck 43316012 open 0     5 2022-09-27T20:59:17Z 2022-10-13T18:02:16Z   COLLABORATOR      

What is your issue?

Currently the input type for "one or more dims" varies from function to function. There are some open PRs that move to str | Iterable[Hashable], which allows the use of tuples as dimensions.

Some changes are still required:

  • [ ] Accept None in all functions that accept dims as default; this would simplify typing a lot (see https://github.com/pydata/xarray/pull/7048#discussion_r973813607)
  • [ ] Check if we can always include ellipsis "..." in dim arguments (see https://github.com/pydata/xarray/pull/7048#pullrequestreview-1111498309)
  • [ ] Iterable[Hashable] includes sets, which do not preserve the ordering (see https://github.com/pydata/xarray/pull/6971#discussion_r981166670 and the sketch after this list). This means we need to distinguish between the cases where the order matters (constructor, transpose etc.) and where it does not (drop_dims, reductions etc.). Probably this needs to be typed as str | Sequence[Hashable] (a numpy.ndarray is not a Sequence, but who uses that for dimensions anyway?).
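A tiny illustration of the ordering point in the last item (plain Python, no xarray involved):

```python
from collections.abc import Hashable, Iterable, Sequence

dims_set: Iterable[Hashable] = {"y", "x", "z"}   # sets do not guarantee an order
dims_seq: Sequence[Hashable] = ("y", "x", "z")   # tuples/lists preserve order

print(list(dims_set))  # order is an implementation detail
print(list(dims_seq))  # always y, x, z
```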

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7094/reactions",
    "total_count": 5,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1120405560 I_kwDOAMm_X85CyAg4 6229 [Bug]: rename_vars to dimension coordinate does not create an index headtr1ck 43316012 closed 0     6 2022-02-01T09:09:50Z 2022-09-27T09:33:42Z 2022-09-27T09:33:42Z COLLABORATOR      

What happened?

We used Data{set,Array}.rename{_vars}({coord: dim_coord}) to make a coordinate a dimension coordinate (instead of set_index). This results in the coordinate correctly being displayed as a dimension coordinate (with the *) but it does not create an index, such that further operations like sel fail with a strange KeyError.

What did you expect to happen?

I expect one of two things to be true:

  1. rename{_vars} does not allow setting dimension coordinates (raises Error and tells you to use set_index)
  2. rename{_vars} checks for this occasion and sets the index correctly

Minimal Complete Verifiable Example

```python
import xarray as xr

data = xr.DataArray([5, 6, 7], coords={"c": ("x", [1, 2, 3])}, dims="x")
# <xarray.DataArray (x: 3)>
# array([5, 6, 7])
# Coordinates:
#     c        (x) int64 1 2 3
# Dimensions without coordinates: x

data_renamed = data.rename({"c": "x"})
# <xarray.DataArray (x: 3)>
# array([5, 6, 7])
# Coordinates:
#   * x        (x) int64 1 2 3

data_renamed.indexes
# Empty

data_renamed.sel(x=2)
# KeyError: 'no index found for coordinate x'

# if we use set_index it works
data_indexed = data.set_index({"x": "c"})
# looks the same as data_renamed!
# <xarray.DataArray (x: 3)>
# array([1, 2, 3])
# Coordinates:
#   * x        (x) int64 1 2 3

data_indexed.indexes
# x: Int64Index([1, 2, 3], dtype='int64', name='x')
```

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None python: 3.9.1 (default, Jan 13 2021, 15:21:08) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.49.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4

xarray: 0.20.2 pandas: 1.3.5 numpy: 1.21.5 scipy: 1.7.3 netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.5.1.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.5.1 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None setuptools: 49.2.1 pip: 22.0.2 conda: None pytest: 6.2.5 IPython: 8.0.0 sphinx: None

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6229/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
917034151 MDU6SXNzdWU5MTcwMzQxNTE= 5458 DataArray.rename docu missing renaming of dimensions headtr1ck 43316012 closed 0     0 2021-06-10T07:57:11Z 2022-07-18T14:48:02Z 2022-07-18T14:48:02Z COLLABORATOR      

What happened: http://xarray.pydata.org/en/stable/generated/xarray.DataArray.rename.html#xarray.DataArray.rename states that:

Returns a new DataArray with renamed coordinates or a new name.

What you expected to happen: It should state: "Returns a new DataArray with renamed coordinates, dimensions or a new name." Since it definitely can do that.

Minimal example: xr.DataArray([1, 2, 3]).rename({"dim_0": "new"})

Further, while at it: Dataset.rename also does not mention explicitly that you can rename coordinates.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5458/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1292284929 I_kwDOAMm_X85NBrQB 6749 What should `Dataset.count` return for missing dims? headtr1ck 43316012 open 0     5 2022-07-03T11:49:12Z 2022-07-14T17:27:23Z   COLLABORATOR      

What is your issue?

When using a dataset with multiple variables and calling Dataset.count("x"), it will return ones for variables that are missing dimension "x", e.g.:

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})
ds.count("x")
# returns:
# <xarray.Dataset>
# Dimensions:  (y: 2)
# Dimensions without coordinates: y
# Data variables:
#     a        int32 3
#     b        (y) int32 1 1
```

I can understand why "1" can be a valid answer, but the result is probably a bit philosophical.

For my use case I would like it to return an array of ds.sizes["x"] (or 0). I think this is also a valid return value, considering the broadcasting rules, where the size of the missing dimension is actually known in the dataset.

Maybe one could make this behavior adjustable with a kwarg, e.g. missing_dim_value: {int, "size"}, default 1.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6749/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1275747776 I_kwDOAMm_X85MCl3A 6703 Add coarsen, rolling and weighted to generate_reductions headtr1ck 43316012 open 0     1 2022-06-18T09:49:22Z 2022-06-18T16:04:15Z   COLLABORATOR      

Is your feature request related to a problem?

Coarsen reductions are currently added dynamically, which is not very useful for typing. This is a follow-up to @Illviljan in https://github.com/pydata/xarray/pull/6702#discussion_r900700532

Same goes for Weighted. And similar for Rolling (not sure if it is exactly the same though?)

Describe the solution you'd like

Extend the generate_reductions script to include DataArrayCoarsen and DatasetCoarsen. Once finished: use type checking in all test_coarsen tests.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6703/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1120378011 I_kwDOAMm_X85Cx5yb 6227 [Bug]: Dataset.where(x, drop=True) behaves inconsistent headtr1ck 43316012 closed 0     0 2022-02-01T08:40:30Z 2022-06-12T22:06:51Z 2022-06-12T22:06:51Z COLLABORATOR      

What happened?

I tried to reduce some dimensions using where (sel did not work in this case) and shorten the dimensions with "drop=True". This works fine on DataArrays and Datasets with only a single dimension but fails as soon as you have a Dataset with two dimensions on different variables. The dimensions are left untouched and you have NaNs in the data, just as if you were using "drop=False" (see example).

I am actually not sure what the expected behavior is, maybe I am wrong and it is correct due to some broadcasting rules?

What did you expect to happen?

I expected that the relevant dims are shortened. If, with ds.where and "drop=False", all variables along a dimension have some NaNs, then using "drop=True" I expect these dimensions to be shortened and the NaNs removed.

Minimal Complete Verifiable Example

```python
import xarray as xr

# this works
ds = xr.Dataset({"a": ("x", [1, 2, 3])})
ds.where(ds > 2, drop=True)
# returns:
# <xarray.Dataset>
# Dimensions:  (x: 1)
# Dimensions without coordinates: x
# Data variables:
#     a        (x) float64 3.0

# this doesn't
ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [2, 3, 4])})
ds.where(ds > 2, drop=True)
# returns:
# <xarray.Dataset>
# Dimensions:  (x: 3, y: 3)
# Dimensions without coordinates: x, y
# Data variables:
#     a        (x) float64 nan nan 3.0
#     b        (y) float64 nan 3.0 4.0
```

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None python: 3.9.1 (default, Jan 13 2021, 15:21:08) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.49.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4

xarray: 0.20.2 pandas: 1.3.5 numpy: 1.21.5 scipy: 1.7.3 netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.5.1.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.5.1 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None setuptools: 49.2.1 pip: 22.0.2 conda: None pytest: 6.2.5 IPython: 8.0.0 sphinx: None

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6227/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1245726154 I_kwDOAMm_X85KQEXK 6632 Literal type of engine argument incompatible with custom backends headtr1ck 43316012 closed 0     5 2022-05-23T21:40:14Z 2022-05-28T10:29:16Z 2022-05-28T10:29:16Z COLLABORATOR      

What is your issue?

In the recent typing improvements the engine argument for open_dataset was changed from str to a Literal of xarray's internal engines. This will cause problems for all third-party backend plugins.

We have several possibilities:

  1. I don't know if there is a way to know installed backends at type checking time. Then we could add this support. (I doubt this is possible seeing how dynamic these imports are)
  2. Is it possible for these plugins to tell type checkers that their engine is valid, i.e. change the type signature of xarrays function? Then we should add a how-to in the docu.
  3. Else we should probably revert to using str.

Any typing experts here that could help?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6632/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1150618439 I_kwDOAMm_X85ElQtH 6306 Assigning to dataset with missing dim raises ValueError headtr1ck 43316012 open 0     1 2022-02-25T16:08:04Z 2022-05-21T20:35:52Z   COLLABORATOR      

What happened?

I tried to assign values to a dataset with a selector-dict where a variable is missing the dim from the selector-dict. This raises a ValueError.

What did you expect to happen?

I expect that assigning works the same as selecting and it will ignore the missing dims.

Minimal Complete Verifiable Example

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})

ds[{"x": 1}]
# this works and returns:
# <xarray.Dataset>
# Dimensions:  (y: 2)
# Dimensions without coordinates: y
# Data variables:
#     a        int64 2
#     b        (y) int64 4 5

ds[{"x": 1}] = 1
# this fails and raises a ValueError
# ValueError: Variable 'b': indexer {'x': 1} not available
```

Relevant log output

```python
Traceback (most recent call last):
  File "xarray/core/dataset.py", line 1591, in _setitem_check
    var_k = var[key]
  File "xarray/core/dataarray.py", line 740, in __getitem__
    return self.isel(indexers=self._item_key_to_dict(key))
  File "xarray/core/dataarray.py", line 1204, in isel
    variable = self._variable.isel(indexers, missing_dims=missing_dims)
  File "xarray/core/variable.py", line 1181, in isel
    indexers = drop_dims_from_indexers(indexers, self.dims, missing_dims)
  File "xarray/core/utils.py", line 834, in drop_dims_from_indexers
    raise ValueError(
ValueError: Dimensions {'x'} do not exist. Expected one or more of ('y',)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xarray/core/dataset.py", line 1521, in __setitem__
    value = self._setitem_check(key, value)
  File "xarray/core/dataset.py", line 1593, in _setitem_check
    raise ValueError(
ValueError: Variable 'b': indexer {'x': 1} not available
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None python: 3.9.1 (default, Jan 13 2021, 15:21:08) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.49.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4

xarray: 0.21.1 pandas: 1.4.0 numpy: 1.21.5 scipy: 1.7.3 netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.5.1.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.5.1 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None setuptools: 49.2.1 pip: 22.0.3 conda: None pytest: 6.2.5 IPython: 8.0.0 sphinx: None

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6306/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1222103599 I_kwDOAMm_X85I19Iv 6554 isel with drop=True does not drop coordinates if using scalar DataArray as indexer headtr1ck 43316012 closed 0     2 2022-05-01T10:14:37Z 2022-05-10T06:18:19Z 2022-05-10T06:18:19Z COLLABORATOR      

What happened?

When using DataArray/Dataset.isel with drop=True and a scalar DataArray as indexer (see example), the resulting scalar coordinates do not get dropped. When using an integer, the behavior is as expected.

What did you expect to happen?

I expect that using a scalar DataArray behaves the same as an integer.

Minimal Complete Verifiable Example

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x", coords={"k": ("x", [0, 1, 2])})
# <xarray.DataArray (x: 3)>
# array([1, 2, 3])
# Coordinates:
#     k        (x) int32 0 1 2

da.isel({"x": 1}, drop=True)
# works
# <xarray.DataArray ()>
# array(2)

da.isel({"x": xr.DataArray(1)}, drop=True)
# does not drop "k" coordinate
# <xarray.DataArray ()>
# array(2)
# Coordinates:
#     k        int32 1
```

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: 4fbca23a9fd8458ec8f917dd0e54656925503e90 python: 3.9.6 | packaged by conda-forge | (default, Jul 6 2021, 08:46:02) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('de_DE', 'cp1252') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.18.2.dev76+g3a7e7ca2.d20210706 pandas: 1.3.0 numpy: 1.21.0 scipy: 1.7.0 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.11.0 h5py: 3.3.0 Nio: None zarr: 2.8.3 cftime: 1.5.0 nc_time_axis: 1.3.1 PseudoNetCDF: installed cfgrib: None iris: 2.4.0 bottleneck: 1.3.2 dask: 2021.06.2 distributed: 2021.06.2 matplotlib: 3.4.2 cartopy: 0.19.0.post1 seaborn: 0.11.1 numbagg: 0.2.1 fsspec: 2021.06.1 cupy: None pint: 0.17 sparse: 0.12.0 setuptools: 49.6.0.post20210108 pip: 21.3.1 conda: None pytest: 6.2.4 IPython: None sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6554/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1217543476 I_kwDOAMm_X85Ikj00 6526 xr.polyval first arg requires name attribute headtr1ck 43316012 closed 0     2 2022-04-27T15:47:02Z 2022-05-05T19:15:58Z 2022-05-05T19:15:58Z COLLABORATOR      

What happened?

I have some polynomial coefficients and want to evaluate them at some values using xr.polyval.

As described in the docstring/docs, I created a 1D coordinate DataArray and passed it to xr.polyval, but it raises a KeyError (see example).

What did you expect to happen?

I expected that the polynomial would be evaluated at the given points.

Minimal Complete Verifiable Example

```python
import xarray as xr

coeffs = xr.DataArray([1, 2, 3], dims="degree")

# With a "handmade" coordinate it fails:
coord = xr.DataArray([0, 1, 2], dims="x")

xr.polyval(coord, coeffs)
# raises:
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "xarray/core/computation.py", line 1847, in polyval
#     x = get_clean_interp_index(coord, coord.name, strict=False)
#   File "xarray/core/missing.py", line 252, in get_clean_interp_index
#     index = arr.get_index(dim)
#   File "xarray/core/common.py", line 404, in get_index
#     raise KeyError(key)
# KeyError: None

# If one adds a name to the coord that is called like the dimension:
coord2 = xr.DataArray([0, 1, 2], dims="x", name="x")

xr.polyval(coord2, coeffs)
# works
```

Relevant log output

No response

Anything else we need to know?

I assume that the "standard" workflow is to obtain the coord argument from an existing DataArray's coordinate, where the name would already be set correctly. However, that is not clear from the description, and it also prevents my "manual" workflow.

It could be that the problem will be solved by replacing the coord DataArray argument by an explicit Index in the future.

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.9.10 (main, Mar 15 2022, 15:56:56) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.49.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 2022.3.0 pandas: 1.4.2 numpy: 1.22.3 scipy: None netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.5.1 cartopy: 0.20.2 seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None setuptools: 58.1.0 pip: 22.0.4 conda: None pytest: None IPython: 8.2.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6526/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1222215528 I_kwDOAMm_X85I2Ydo 6555 sortby with ascending=False should create an index headtr1ck 43316012 closed 0     4 2022-05-01T16:57:51Z 2022-05-01T22:17:50Z 2022-05-01T22:17:50Z COLLABORATOR      

Is your feature request related to a problem?

When using sortby with ascending=False on a DataArray/Dataset without an explicit index, the data gets correctly reversed, but it is not possible to tell anymore which ordering the data has.

If an explicit index (like [0, 1, 2]) exists, it gets correctly reordered and allows correct aligning.
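A small illustration of the situation, sorting by a non-dimension coordinate; the coordinate name "c" and values are made up:

```python
import xarray as xr

da = xr.DataArray([10, 20, 30], dims="x", coords={"c": ("x", [1, 2, 3])})

rev = da.sortby("c", ascending=False)
# rev.values is [30, 20, 10], but "x" still has no index or coordinate,
# so nothing records that the order along "x" was reversed
```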

Describe the solution you'd like

For consistency with aligning xarray should create a new index that indicates that the data has been reordered, i.e. [2, 1, 0].

Only downside: this will break code that relies on non-existent indexes.

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6555/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1221885425 I_kwDOAMm_X85I1H3x 6549 Improved Dataset broadcasting headtr1ck 43316012 open 0     3 2022-04-30T17:51:37Z 2022-05-01T14:37:43Z   COLLABORATOR      

Is your feature request related to a problem?

I am a bit puzzled about how xarray broadcasts Datasets. It seems to always add all dimensions to all variables. Is this what you want in general?

See this example:

```python
import xarray as xr

da = xr.DataArray([[1, 2, 3]], dims=("x", "y"))
# <xarray.DataArray (x: 1, y: 3)>
# array([[1, 2, 3]])

ds = xr.Dataset({"a": ("x", [1]), "b": ("z", [2, 3])})
# <xarray.Dataset>
# Dimensions:  (x: 1, z: 2)
# Dimensions without coordinates: x, z
# Data variables:
#     a        (x) int32 1
#     b        (z) int32 2 3

ds.broadcast_like(da)
# returns:
# <xarray.Dataset>
# Dimensions:  (x: 1, y: 3, z: 2)
# Dimensions without coordinates: x, y, z
# Data variables:
#     a        (x, y, z) int32 1 1 1 1 1 1
#     b        (x, y, z) int32 2 3 2 3 2 3

# I think it should return:
# <xarray.Dataset>
# Dimensions:  (x: 1, y: 3, z: 2)
# Dimensions without coordinates: x, y, z
# Data variables:
#     a        (x, y) int32 1 1 1            # notice: here without the "z" dim
#     b        (x, y, z) int32 2 3 2 3 2 3
```

Describe the solution you'd like

I would like broadcasting to behave the same way as e.g. a simple addition. In the example above, da + ds produces the dimensions that I want.

Describe alternatives you've considered

ds + xr.zeros_like(da) works, but seems more like a "dirty hack".

Additional context

Maybe one can add an option to broadcasting that controls this behavior?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6549/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1155321209 I_kwDOAMm_X85E3M15 6313 groubpy on array with multiindex renames indices headtr1ck 43316012 closed 0     1 2022-03-01T13:08:30Z 2022-03-17T17:11:44Z 2022-03-17T17:11:44Z COLLABORATOR      

What happened?

When grouping and reducing an array or dataset over a multi-index the coordinates that make up the multi-index get renamed to "{name_of_multiindex}_level_{i}".

It only works correctly when the MultiIndex is a "homogeneous grid", i.e. as obtained by stacking.

What did you expect to happen?

I expect that all coordinates keep their initial names.

Minimal Complete Verifiable Example

```python
import xarray as xr

# this works:
d = xr.DataArray(range(4), dims="t", coords={"x": ("t", [0, 0, 1, 1]), "y": ("t", [0, 1, 0, 1])})
dd = d.set_index({"t": ["x", "y"]})
# returns
# <xarray.DataArray (t: 4)>
# array([0, 1, 2, 3])
# Coordinates:
#   * t        (t) MultiIndex
#   - x        (t) int64 0 0 1 1
#   - y        (t) int64 0 1 0 1

dd.groupby("t").mean(...)
# returns
# <xarray.DataArray (t: 4)>
# array([0., 1., 2., 3.])
# Coordinates:
#   * t        (t) MultiIndex
#   - x        (t) int64 0 0 1 1
#   - y        (t) int64 0 1 0 1

# this does not work
d2 = xr.DataArray(range(6), dims="t", coords={"x": ("t", [0, 0, 1, 1, 0, 1]), "y": ("t", [0, 1, 0, 1, 0, 0])})
dd2 = d2.set_index({"t": ["x", "y"]})
# returns
# <xarray.DataArray (t: 6)>
# array([0, 1, 2, 3, 4, 5])
# Coordinates:
#   * t        (t) MultiIndex
#   - x        (t) int64 0 0 1 1 0 1
#   - y        (t) int64 0 1 0 1 0 0

dd2.groupby("t").mean(...)
# returns
# <xarray.DataArray (t: 4)>
# array([2. , 1. , 3.5, 3. ])
# Coordinates:
#   * t          (t) MultiIndex
#   - t_level_0  (t) int64 0 0 1 1
#   - t_level_1  (t) int64 0 1 0 1
```

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None python: 3.9.1 (default, Jan 13 2021, 15:21:08) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.49.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4

xarray: 0.21.1 pandas: 1.4.0 numpy: 1.21.5 scipy: 1.7.3 netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.5.1.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.5.1 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None setuptools: 49.2.1 pip: 22.0.3 conda: None pytest: 6.2.5 IPython: 8.0.0 sphinx: None

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6313/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
932677183 MDU6SXNzdWU5MzI2NzcxODM= 5550 Dataset.transpose support for missing_dims headtr1ck 43316012 closed 0     6 2021-06-29T13:32:37Z 2021-07-17T21:02:59Z 2021-07-17T21:02:59Z COLLABORATOR      

Is your feature request related to a problem? Please describe. I have a dataset where I do not know which of two dimensions (let's call them a and b) exists in this dataset (so either it has dims ("a", "other") or ("b", "other")). I would like to make sure that this dimension comes first using transpose, but currently this is only possible using if or try statements. Just using ds.transpose("a", "b", "other") raises a ValueError: arguments to transpose XXX must be permuted dataset dimensions YYY.

Describe the solution you'd like It would be nice if I could just use ds.transpose("a", "b", "other", missing_dims="ignore") similar to how DataArray.transpose handles it.

Describe alternatives you've considered Currently I'm also using ds.map(lambda x: x.transpose("a", "b", "other", missing_dims="ignore")), which could (maybe?) replace the current implementation of the transpose.

While at it, transpose_coords could also be exposed to Dataset.transpose.
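Spelled out, the desired call versus the workaround above (dimension names as in the description; the Dataset-level missing_dims option does not exist at the time of this issue):

```python
import xarray as xr

ds = xr.Dataset({"v": (("a", "other"), [[1, 2, 3]])})

# desired, but not available on Dataset.transpose yet:
# ds.transpose("a", "b", "other", missing_dims="ignore")

# current workaround: apply the DataArray version per variable
ds2 = ds.map(lambda x: x.transpose("a", "b", "other", missing_dims="ignore"))
```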

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5550/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);