
issue_comments


5 rows where author_association = "MEMBER" and issue = 597475005 (Extending Xarray for domain-specific toolkits), sorted by updated_at descending



keewis (MEMBER) · 2020-04-12T11:11:26Z (edited 2020-04-12T22:18:31Z) · https://github.com/pydata/xarray/issues/3959#issuecomment-612598462

> Is there any reason not to put the name of the type into `attrs` and just switch on that rather than the keys in `data_vars`?

Not really. I just thought the variables in the dataset were a way to uniquely identify its variant (i.e. to validate the dataset's structure). If you have a different means to do so, you can of course use that instead.
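To make the two identification strategies concrete, here is a minimal, self-contained sketch. `SimpleNamespace` stands in for an `xr.Dataset`, so nothing here is real xarray API; the variable names and the `"type"` attrs key are illustrative:

```python
# Sketch of the two ways to identify a dataset "variant" discussed above.
# A SimpleNamespace mocks the Dataset; real code would use xr.Dataset.
from types import SimpleNamespace

dataset_types = {
    frozenset({"variable1", "variable2"}): "type1",
    frozenset({"variable2", "variable3"}): "type2",
}


def type_from_attrs(ds):
    # Strategy 1: trust an explicit tag stored in attrs.
    return ds.attrs["type"]


def type_from_data_vars(ds):
    # Strategy 2: infer the variant from the set of variable names.
    return dataset_types[frozenset(ds.data_vars)]


ds = SimpleNamespace(
    attrs={"type": "type1"},
    data_vars={"variable1": None, "variable2": None},
)
assert type_from_attrs(ds) == type_from_data_vars(ds) == "type1"
```

Both strategies agree as long as the attrs tag is kept consistent with the variables; the `data_vars` approach needs no extra metadata but fails if variable sets overlap between variants.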

Re TypedDict: the PEP introducing TypedDict (PEP 589) explicitly states that it is only intended for Dict[str, Any] (so no TypedDict based on subclasses of Dict). However, looking at the implementation of TypedDict, we should be able to do something similar for Dataset.

Edit: we'd still need to convince mypy that the custom TypedDict is a type...
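For reference, this is what PEP 589's TypedDict gives you for plain dicts (a minimal sketch; the field names are made up). The key/value types are checked statically by a tool like mypy, while at runtime the object is an ordinary dict, which is exactly why it cannot simply be reused for Dataset:

```python
# Minimal TypedDict usage per PEP 589: keys and value types are checked
# statically (e.g. by mypy); at runtime the object is a plain dict.
from typing import TypedDict


class WeatherRecord(TypedDict):
    station: str
    temperature: float


record: WeatherRecord = {"station": "KJFK", "temperature": 21.5}

# At runtime there is no special class -- it really is just a dict.
assert type(record) is dict
assert record["station"] == "KJFK"
```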

> so I'm curious if that has been discussed much

I don't think so? There were a few discussions about subclassing, but I couldn't find anything about static type analysis. It's definitely worth having this discussion, either here (repurposing this issue) or in a new issue.

keewis (MEMBER) · 2020-04-10T15:23:08Z (edited 2020-04-10T15:56:08Z) · https://github.com/pydata/xarray/issues/3959#issuecomment-612076605

you could emulate the availability of the accessors by checking the variables in the constructor of the accessor:

```python
dataset_types = {
    frozenset({"variable1", "variable2"}): "type1",
    frozenset({"variable2", "variable3"}): "type2",
    frozenset({"variable1", "variable3"}): "type3",
}


def _dataset_type(ds):
    data_vars = frozenset(ds.data_vars.keys())
    return dataset_types[data_vars]


@xr.register_dataset_accessor("type1")
class Type1Accessor:
    def __init__(self, ds):
        if _dataset_type(ds) != "type1":
            raise AttributeError("not a type1 dataset")
        self.dataset = ds
```

though now that we have a "type" registry, we could also have a single accessor, and pass a `kind` parameter to your `analyze` function:

```python
def analyze(self, kind="auto"):
    analyzers = {
        "type1": _analyze_type1,
        "type2": _analyze_type2,
    }
    if kind == "auto":
        kind = self.dataset_type
    return analyzers.get(kind)(self.dataset)
```

If you just want static type checking using e.g. mypy, consider using TypedDict. I don't know much about mypy, though, so I wasn't able to get it to accept Dataset objects instead of dict. If someone actually gets this to work, we might be able to provide a xarray.typing module to allow something like the following (though depending on the amount of code needed, this could also fit in the Cookbook docs section):

```python
from xarray.typing import ArrayType, Coordinate, DatasetType, Float64Type


class Dataset1(DatasetType):
    longitude: Coordinate[ArrayType[Float64Type]]
    latitude: Coordinate[ArrayType[Float64Type]]

    temperature: ArrayType[Float64Type]


def function(ds: Dataset1):
    # ...
    return ds
```

and have the type checker validate the structure of the dataset.

keewis (MEMBER) · 2020-04-10T11:49:32Z · https://github.com/pydata/xarray/issues/3959#issuecomment-611997039

Do you have any control over how the datasets are created? If so, you could provide a factory function (maybe passing the arrays in via required kwargs?) that does the checks and describes the required dataset structure in its docstring.
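A hedged sketch of such a factory, using plain lists and a dict in place of real arrays and an `xr.Dataset`; all names (`make_haplo_dataset`, the fields, the validation rules) are illustrative, not anything from the thread's actual codebase:

```python
def make_haplo_dataset(*, genotypes, positions):
    """Build a haplotype dataset.

    Parameters
    ----------
    genotypes : list of int
        One genotype code per site; values must be 0, 1 or 2.
    positions : list of int
        Genomic position of each site; must match len(genotypes).
    """
    if len(genotypes) != len(positions):
        raise ValueError("genotypes and positions must have the same length")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotype codes must be 0, 1 or 2")
    # A real implementation would return xr.Dataset(...); a dict stands in.
    return {"genotypes": genotypes, "positions": positions}


ds = make_haplo_dataset(genotypes=[0, 1, 2], positions=[100, 200, 300])
```

The keyword-only signature makes the expected structure explicit at the call site, and invalid inputs fail at construction time rather than deep inside an analysis function.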

> If you have other questions about dtypes in xarray then please feel free to raise another issue about that.

Will do.

This probably won't happen in the near future, though, since the custom dtypes for numpy are still a work in progress (NEP-40, etc.)

TomNicholas (MEMBER) · 2020-04-10T10:02:39Z · https://github.com/pydata/xarray/issues/3959#issuecomment-611967822

> A docstring on a constructor is great -- is there a way to do something like that with accessors?

There surely must be some way to do that, but I'm afraid I'm not a docs wizard. However, the accessor is still just a class whose methods you want to document -- would it be too unclear for the docstrings to hang off each HaploAccessor.specific_method()?

> Is there a way to avoid running check_* methods multiple times?

There is some caching, but you shouldn't rely on it. In #3268 @crusaderky said "The more high level discussion is that the statefulness of the accessor is something that is OK to use for caching and performance improvements, and not OK for storing functional information like yours."
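The distinction (caching is fine, functional state is not) can be sketched like this: the accessor memoizes only the result of its own validation, a pure performance detail that could be recomputed at any time. `HaploAccessor` and its check are illustrative, and a plain dict stands in for the Dataset:

```python
# Sketch: cache the (potentially expensive) validation result on the
# accessor instance, so it runs at most once per accessor instance.
class HaploAccessor:
    def __init__(self, ds):
        self._ds = ds
        self._validated = None  # cache slot for the check result

    def _check(self):
        if self._validated is None:
            # expensive dtype/dimension checks would go here
            self._validated = "genotypes" in self._ds
        if not self._validated:
            raise AttributeError("not a haplotype dataset")

    def analyse(self):
        self._check()  # cheap after the first call
        return len(self._ds["genotypes"])


acc = HaploAccessor({"genotypes": [0, 1, 2]})
```

Note that xarray constructs a fresh accessor for each new object returned by an operation, so a per-instance cache like this never leaks stale results across datasets.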

> I think those checks could be expensive

> those arrays should meet different dtype and dimensionality constraints

Checking dtype and dimensions shouldn't be expensive though, or is it more than that?

Well, we do actually have that problem in trying to find some way to represent 2-bit integers with sub-byte data types but I wasn't trying to get into that on this thread. I'll make the title better.

If you have other questions about dtypes in xarray then please feel free to raise another issue about that.

TomNicholas (MEMBER) · 2020-04-09T19:46:50Z (edited 2020-04-09T19:47:45Z) · https://github.com/pydata/xarray/issues/3959#issuecomment-611719548

> All that said, is it still a bad idea to try to subclass Xarray data structures even if the intent was never to touch any part of the internal APIs?

One of the more immediate problems you'll find if you subclass is that xarray internally uses methods like self._construct_dataarray(dims, values, coords, attrs) to construct return values, so for many of the methods you call you will only get back a bare DataArray, not the subclass you put in.
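The effect can be demonstrated without xarray at all; `DataArrayLike` below is a toy stand-in for the internal-constructor pattern, not xarray code:

```python
# Why subclassing breaks down: the parent class constructs return values
# with its own type, so methods hand back the base class, not the subclass.
class DataArrayLike:
    def __init__(self, values):
        self.values = values

    def double(self):
        # Internally constructs the base type, analogous to how xarray
        # methods build their return values via _construct_dataarray.
        return DataArrayLike([v * 2 for v in self.values])


class HaploArray(DataArrayLike):
    pass


arr = HaploArray([1, 2])
result = arr.double()
assert isinstance(arr, HaploArray)
assert not isinstance(result, HaploArray)  # subclass type was lost
```

Keeping the subclass alive would require overriding every such method (or the internal constructor), which is exactly the maintenance burden the accessor pattern avoids.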

You could make custom accessors which perform checks on the input arrays when they get used?

```python
@xr.register_dataset_accessor("haplo")
class HaploDatasetAccessor:
    def __init__(self, ds):
        check_conforms_to_haplo_requirements(ds)
        self.data = ds

    def analyse(self):
        ...


ds.haplo.analyse()
```

I'm also wondering whether given that the only real difference (not just by convention) of your desired data structures from xarray's is the dtype, then (if xarray actually offered it) would something akin to pandas' ExtensionDtype solve your problem?


Powered by Datasette · Queries took 15.578ms · About: xarray-datasette