home / github / issues

Menu
  • GraphQL API
  • Search all tables

issues: 1094725752

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1094725752 I_kwDOAMm_X85BQDB4 6142 dimensions: type as `str | Iterable[Hashable]`? 10194086 open 0     14 2022-01-05T20:39:00Z 2022-06-26T11:57:40Z   MEMBER      

What happened?

We generally type dimensions as:

python dims: Hashable | Iterable[Hashable]

However, this is in conflict with passing a tuple of independent dimensions to a method - e.g. da.mean(("x", "y")) because a tuple is also hashable.

Also mypy requires an isinstance(dims, Hashable) check when typing a function. We use an isinstance(dims, str) check in many places to wrap a single dimension in a list. Changing this to isinstance(dims, Hashable) will change the behavior for tuples.

What did you expect to happen?

In the community call today we discussed to change this to

python dims: str | Iterable[Hashable]

i.e. if a single dim is passed it has to be a string and wrapping it in a list is a convenience function. Special use cases with Hashable types should be wrapped in a Iterable by the user. This probably best reflects the current state of the repo (dims = [dims] if isinstance(dims, str) else dims).

The disadvantage could be that it is a bit more difficult to explain in the docstrings?

@shoyer - did I get this right from the discussion?


Other options

  1. Require str as dimension names.

This could be too restrictive. @keewis mentioned that tuple dimension names are already used somwehere in the xarray repo. Also we discussed in another issue or PR (which I cannot find right know) that we want to keep allowing Hashable.

  1. Disallow passing tuples (only allow tuples if a dimension is a tuple), require lists to pass several dimensions.

This is too restrictive in the other direction and will probably lead to a lot of downstream troubles. Naming a single dimension with a tuple will be a very rare case, in contrast to passing several dimension names as a tuple.

  1. Special case tuples. We could potentially check if dims is a tuple and if there are any dimension names consisting of a tuple. Seems more complicated and potentially brittle for probably small gains (IMO).

Minimal Complete Verifiable Example

No response

Relevant log output

No response

Anything else we need to know?

  • We need to check carefully where general Hashable are really allowed. E.g. dims of a DataArray are typed as

https://github.com/pydata/xarray/blob/e056cacdca55cc9d9118c830ca622ea965ebcdef/xarray/core/dataarray.py#L380

but tuples are not actually allowed:

```python import xarray as xr xr.DataArray([1], dims=("x", "y"))

ValueError: different number of dimensions on data and dims: 1 vs 2

xr.DataArray([1], dims=[("x", "y")])

TypeError: dimension ('x', 'y') is not a string

```

  • We need to be careful typing functions where only one dim is allowed, e.g. xr.concat, which should probably set dim: Hashable (and make sure it works).
  • Do you have examples for other real-world hashable types except for str and tuple? (Would be good for testing purposes).

Environment

N/A

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6142/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    13221727 issue

Links from other tables

  • 2 rows from issues_id in issues_labels
  • 14 rows from issue in issue_comments
Powered by Datasette · Queries took 0.652ms · About: xarray-datasette