

issues: 1548948097


id: 1548948097
node_id: I_kwDOAMm_X85cUxKB
number: 7457
title: Typing of internal datatypes
user: 43316012
state: open
locked: 0
comments: 5
created_at: 2023-01-19T11:08:43Z
updated_at: 2023-01-19T19:49:19Z
author_association: COLLABORATOR

Is your feature request related to a problem?

Currently there is no static typing of the underlying data structures used in DataArrays. Simply running reveal_type(da.data) returns Any.

Adding static typing support for it is unfortunately non-trivial, since xarray supports a wide variety of duck-typed arrays.

This also comes with internal typing difficulties.

Describe the solution you'd like

I think the way to go is to make the DataArray class generic in its underlying data type. Something like DataArray[np.ndarray] or DataArray[dask.array.Array].

The implementation would require a TypeVar that is bound to some minimal required Protocol for internal consistency (I think at least it needs dtype and shape attributes).
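A minimal sketch of what such a bound TypeVar could look like, using only the standard library. The names `ArrayLike`, `T_Array`, `TypedDataArray` and `FakeArray` are hypothetical and chosen for illustration; they are not xarray API, and the protocol contains only the two attributes mentioned above.

```python
from __future__ import annotations

from typing import Any, Generic, Protocol, Tuple, TypeVar, runtime_checkable


@runtime_checkable
class ArrayLike(Protocol):
    """Hypothetical minimal protocol: only dtype and shape are required."""

    @property
    def dtype(self) -> Any: ...

    @property
    def shape(self) -> Tuple[int, ...]: ...


# The TypeVar is bound to the protocol, so any duck array that provides
# dtype and shape is an acceptable type argument.
T_Array = TypeVar("T_Array", bound=ArrayLike)


class TypedDataArray(Generic[T_Array]):
    """Toy stand-in for DataArray, generic in its underlying data type."""

    def __init__(self, data: T_Array) -> None:
        self._data = data

    @property
    def data(self) -> T_Array:
        return self._data


class FakeArray:
    """Hypothetical duck array, used only to keep the sketch dependency-free."""

    def __init__(self, shape: Tuple[int, ...], dtype: str = "float64") -> None:
        self.shape = shape
        self.dtype = dtype


da = TypedDataArray(FakeArray((2, 3)))
# A type checker should infer TypedDataArray[FakeArray] here, so
# reveal_type(da.data) would report FakeArray instead of Any.
```

With this shape, internal code that only needs dtype and shape can be written against `ArrayLike`, while user code keeps the concrete array type.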

Datasets would have to be typed the same way. This means only one data type is possible for all variables; when you mix types, the Dataset falls back to their common ancestor, which is the aforementioned protocol. This is basically the same restriction that a dict has.
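The fallback can be illustrated with a small, dependency-free sketch. `TypedDataset`, `ArrayLike`, `EagerArray` and `LazyArray` are hypothetical names, not xarray classes, and how a particular type checker infers the mixed case on its own is an assumption, so the fallback type is annotated explicitly here.

```python
from typing import Any, Dict, Generic, Protocol, Tuple, TypeVar


class ArrayLike(Protocol):
    """Hypothetical minimal protocol shared by all array types."""

    dtype: Any
    shape: Tuple[int, ...]


T_Array = TypeVar("T_Array", bound=ArrayLike)


class TypedDataset(Generic[T_Array]):
    """Toy stand-in for Dataset: one type parameter for all variables."""

    def __init__(self, variables: Dict[str, T_Array]) -> None:
        self.variables = dict(variables)


class EagerArray:
    """Hypothetical in-memory array (plays the role of a numpy array)."""

    def __init__(self, shape: Tuple[int, ...]) -> None:
        self.shape = shape
        self.dtype = "float64"


class LazyArray:
    """Hypothetical lazy array (plays the role of a dask array)."""

    def __init__(self, shape: Tuple[int, ...]) -> None:
        self.shape = shape
        self.dtype = "float64"


# All variables share one type: the parameter is the concrete class.
homogeneous: TypedDataset[EagerArray] = TypedDataset(
    {"a": EagerArray((2,)), "b": EagerArray((3,))}
)

# Mixed variable types: the only common type is the protocol itself,
# just as a dict with mixed values needs a wider value type.
mixed: TypedDataset[ArrayLike] = TypedDataset(
    {"a": EagerArray((2,)), "b": LazyArray((2,))}
)
```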

Now to the main issue that I see with this approach: I don't know how to type coordinates. They have the same problem as described above for Datasets. I think it is very common to have dask arrays in the variables but plain numpy arrays in the coordinates, so either coordinates are excluded from the typing, or in such cases the common generic type falls back to the protocol again. I am not sure what the best approach is here.

Describe alternatives you've considered

Since the most common workflow for beginner and intermediate users is to stick with the DataArrays themselves and never touch the underlying data, I am not sure this change is as beneficial as I want it to be. Maybe it just complicates things, and leaving the type as Any is the easier solution, with advanced users having to cast or ignore it.
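For comparison, the cast workaround under the current Any typing could look roughly like this. The function `get_data` is a hypothetical stand-in for an attribute typed as Any (such as da.data), so that the sketch runs without xarray installed.

```python
from typing import Any, List, cast


def get_data() -> Any:
    """Hypothetical stand-in for a value that is statically typed as Any."""
    return [1.0, 2.0, 3.0]


# cast() does nothing at runtime; it only tells the type checker which
# type the caller expects, so correctness is the caller's responsibility.
values = cast(List[float], get_data())
total = sum(values)
```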

Additional context

It came up in this discussion: https://github.com/pydata/xarray/pull/7020#discussion_r972617770

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7457/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
