issue_comments: 612513722


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/3959#issuecomment-612513722 https://api.github.com/repos/pydata/xarray/issues/3959 612513722 MDEyOklzc3VlQ29tbWVudDYxMjUxMzcyMg== 6130352 2020-04-11T21:07:07Z 2020-04-11T21:39:42Z NONE

Thanks @keewis! I like those ideas so I experimented a bit and found a few things.

> you could emulate the availability of the accessors by checking your variables in the constructor of the accessor using ... now that we have a "type" registry, we could also have one accessor, and pass a kind parameter to your analyze function:

Is there any reason not to put the name of the type into attrs and just switch on that rather than on the keys in data_vars? Forcing unique data_vars keys across the different dataset types isn't a big deal, but I thought a single type name (or something similar) in attrs would be simpler.
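A minimal sketch of what switching on attrs could look like. Everything here is hypothetical: the `"type"` key, the type names, and the `register`/`analyze` helpers are only for illustration. It works on any object exposing an `attrs` mapping, so a `SimpleNamespace` stands in for `xr.Dataset`:

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict

# Hypothetical registry: type name stored in attrs -> analysis function
ANALYZERS: Dict[str, Callable[[Any], str]] = {}


def register(kind: str) -> Callable[[Callable[[Any], str]], Callable[[Any], str]]:
    """Register an analysis function for datasets tagged with `kind`."""
    def decorator(func: Callable[[Any], str]) -> Callable[[Any], str]:
        ANALYZERS[kind] = func
        return func
    return decorator


@register("genotype_call")
def analyze_calls(ds: Any) -> str:
    return "analyzing genotype calls"


@register("genotype_probability")
def analyze_probabilities(ds: Any) -> str:
    return "analyzing genotype probabilities"


def analyze(ds: Any) -> str:
    """Dispatch on ds.attrs['type'] instead of inspecting the keys of data_vars."""
    kind = ds.attrs.get("type")
    if kind not in ANALYZERS:
        raise TypeError(f"unsupported dataset type: {kind!r}")
    return ANALYZERS[kind](ds)


# Stand-in for something like xr.Dataset(..., attrs={"type": "genotype_call"})
ds = SimpleNamespace(attrs={"type": "genotype_call"})
print(analyze(ds))
```

The dispatch only happens at runtime, so a static checker learns nothing about which variables the dataset holds.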

> If someone actually gets this to work, we might be able to provide a xarray.typing module to allow something like (but depending on the amount of code needed, this could also fit in the Cookbook docs section)

I would love to try to use something like that. I couldn't get it to work either when trying to have a TypedDict that represents entire datasets, so I tried creating them for data_vars and coords separately. I think https://github.com/python/mypy/issues/4976 is particularly problematic in either case though. The gist of that issue is that covariance for TypedDict types doesn't really exist (i.e. TypedDict -> Mapping is ok, but TypedDict -> Mapping[Hashable, Any] is not) and contravariance definitely isn't supported (at least not with Dict or Mapping). Some examples I played around with:

```python
from typing import Any, Hashable, Mapping, TypedDict

MyDict = TypedDict('MyDict', {'x': str})
v1: MyDict = MyDict(x='x')

# This is fine
v2: Mapping = v1

# But this doesn't work:
v2: Mapping[Hashable, Any] = v1 # A notable example since it's used in xr.Dataset
# error: Incompatible types in assignment (expression has type "MyDict", variable has type "Mapping[Hashable, Any]")

# And neither do any of these:
v2: dict = v1
# error: Incompatible types in assignment (expression has type "MyDict", variable has type "Dict[Any, Any]")
v2: Mapping[str, str] = v1
# error: Incompatible types in assignment (expression has type "MyDict", variable has type "Mapping[str, str]")
```
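For reference, PEP 589 states that a TypedDict is consistent with `Mapping[str, object]`, so that is one parameterized annotation a checker should accept; for the others, the only escape hatch I'm aware of is an explicit `cast`. A small sketch (the comments describe the behavior PEP 589 specifies, not output verified against a particular mypy version):

```python
from typing import Any, Hashable, Mapping, TypedDict, cast

MyDict = TypedDict('MyDict', {'x': str})
v1: MyDict = MyDict(x='x')

# Expected to type-check: PEP 589 makes a TypedDict consistent with
# Mapping[str, object]
v3: Mapping[str, object] = v1

# Escape hatch for the annotation used by xr.Dataset: cast() only silences
# the checker and does nothing at runtime, so no type safety is gained
v4 = cast(Mapping[Hashable, Any], v1)

# At runtime a TypedDict instance is a plain dict, and cast returns it unchanged
print(type(v1) is dict, v4 is v1)
```

Neither option helps much here: `Mapping[str, object]` throws away the per-key value types, and `cast` bypasses the checker entirely.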

Going the other direction isn't possible at all (i.e. from Mapping -> TypedDict) since TypedDict acts like a subtype of Mapping. I think that's a big issue downstream if xr.Dataset requires Mapping types for data_vars and coords, since you could never do something like this:

```python
ds = xr.Dataset(data_vars=MyTypedDict(data=...))

# Now assume a user wants to use data_vars/coords with type safety:
data_vars: MyTypedDict = ds.data_vars # This doesn't work
```

Generics seem like a decent solution to all these problems, but it would obviously involve a lot of type annotation changes:

```python
from typing import Generic, Mapping, TypedDict, TypeVar

import xarray as xr
from xarray import DataArray

# Ideally, xarray.typing would help specify more specific constraints,
# but this works with what exists today:
GenotypeDataVars = TypedDict('GenotypeDataVars', {'data': DataArray, 'mask': DataArray})
GenotypeCoords = TypedDict('GenotypeCoords', {'variant': DataArray, 'sample': DataArray})

D = TypeVar('D', bound=Mapping)
C = TypeVar('C', bound=Mapping)

# Assume xr.Dataset was written something like this instead:
class Dataset(Generic[D, C]):

    def __init__(self, data_vars: D, coords: C):
        self.data_vars = data_vars
        self.coords = coords

ds1: Dataset[GenotypeDataVars, GenotypeCoords] = Dataset(
    GenotypeDataVars(data=xr.DataArray(), mask=xr.DataArray()),
    GenotypeCoords(variant=xr.DataArray(), sample=xr.DataArray())
)

# Types should then be preserved even if xarray is constantly redefining
# new instances in internal functions:
ds2: Dataset[GenotypeDataVars, GenotypeCoords] = type(ds1)(ds1.data_vars, ds1.coords) # This is OK
```
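A self-contained variant of the same idea that runs without xarray, to show a method that rebuilds an instance while keeping the type parameters. `Dataset` is still the hypothetical generic class, not the real `xr.Dataset`; `Vars`, `Coords`, and `copy` are illustrative names, and plain lists stand in for `DataArray`:

```python
from typing import Generic, List, Mapping, TypedDict, TypeVar

D = TypeVar('D', bound=Mapping)
C = TypeVar('C', bound=Mapping)

# Illustrative stand-ins for the data_vars/coords TypedDicts
Vars = TypedDict('Vars', {'data': List[int]})
Coords = TypedDict('Coords', {'sample': List[str]})


class Dataset(Generic[D, C]):
    """Hypothetical generic container, not the real xr.Dataset."""

    def __init__(self, data_vars: D, coords: C):
        self.data_vars = data_vars
        self.coords = coords

    def copy(self) -> 'Dataset[D, C]':
        # Rebuilding through type(self) keeps D and C in the declared return type
        return type(self)(self.data_vars, self.coords)


ds: Dataset[Vars, Coords] = Dataset(Vars(data=[1, 2]), Coords(sample=['a', 'b']))

# The annotation below is what a checker should infer after the round trip
data: List[int] = ds.copy().data_vars['data']
print(data)
```

The point of the sketch is that each internal method would need a `Dataset[D, C]` return annotation like `copy` has, which is where the bulk of the annotation changes would come from.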

Anyways, my takeaways from everything on the thread so far are:

  • Using accessors and some kind of runtime type analysis will cover what was in the scope of my original post, but it will prohibit any kind of static type safety.
  • I imagine supporting type safety on the structure of arrays, coordinates, and attributes in Xarray would be far easier than supporting polymorphism for Dataset/DataArray, so I'm curious if that has been discussed much. Do you think it's worth opening a separate issue to continue a conversation that builds on your idea, @keewis?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  597475005