home / github / issues

Menu
  • Search all tables
  • GraphQL API

issues: 1198668507

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1198668507 I_kwDOAMm_X85Hcjrb 6462 Provide protocols for creating structural subtypes of DataArray/Dataset 29104956 open 0     5 2022-04-09T15:09:40Z 2023-09-16T19:55:59Z   NONE      

Is your feature request related to a problem?

I frequently find myself wanting to annotate functions in terms of xarray objects that adhere to a particular schema. Given that a dataset's adherence to a schema is a matter of its structure/contents, it is unnatural to try to describe a schema as a subtype of xr.Dataset (or DataArray) (i.e. a type-checker ought not care that a dataset is an instance of a specific subclass of Dataset).

Describe the solution you'd like

Instead, it would be ideal to define a schema as a Protocol (structural subtype) of xr.Dataset. Unfortunately, one cannot subclass a normal class to create a protocol.

Thus, I am proposing that xarray provide Protocol-based descriptions of DataArray and Dataset so that users can describe schemas as structural subtypes of these classes. E.g.

```python from typing import Protocol

from xarray import DataArray from xarray.typing import DatasetProtocol

class ClimateData(DatasetProtocol, Protocol): lat: DataArray lon: DataArray temp: DataArray precip: DataArray

def process_climate_data(ds: ClimateData): ds.banana # type checker flags as unknown attribute ds.temp # type checker sees "DataArray" (as informed by ClimateData) ds.sel(lat=1.0) # type checker sees Dataset (as informed by DatasetProtocol) ```

The contents of DatasetProtocol would essentially look like a modified type stub for xarray.Dataset so the implementation details are relatively simple, I believe.

Describe alternatives you've considered

Creating a strict subtype of Dataset is not ideal for a few reasons:

  1. Static type checkers would then expect to see that datasets must derive from that particular subclass, which is generally not the case.
  2. The annotations / design of xarray.Dataset is too broad for describing a schema. E.g. the presence of __getattr__ prevents type checkers from flagging access to non-existent data variables and coordinates during static analysis. DatasetProtocol would need to be designed to be less permissive than this.

Additional context

Hopefully this could be leveraged by the likes of xarray-schema so that xarray schemas can be used to provide both runtime and static validation capabilities.

I'd love to get feedback on this, and would be happy to open a PR if xarray devs are willing to weigh in on the design of these protocols.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6462/reactions",
    "total_count": 11,
    "+1": 11,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    13221727 issue

Links from other tables

  • 2 rows from issues_id in issues_labels
  • 3 rows from issue in issue_comments
Powered by Datasette · Queries took 0.673ms · About: xarray-datasette