
issue_comments


5 rows where author_association = "MEMBER" and issue = 597475005 (Extending Xarray for domain-specific toolkits), sorted by updated_at descending



keewis (MEMBER) · 2020-04-12T11:11:26Z (edited 2020-04-12T22:18:31Z) · https://github.com/pydata/xarray/issues/3959#issuecomment-612598462

> Is there any reason not to put the name of the type into `attrs` and just switch on that rather than the keys in `data_vars`?

Not really. I just thought the variables in the dataset were a way to uniquely identify its variant (i.e. to validate the dataset's structure). If you have a different means to do so, you can of course use that instead.
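To make the two identification strategies concrete, here is a minimal, self-contained sketch. `SimpleNamespace` stands in for an `xr.Dataset`, so nothing here is real xarray API; the variable names and the `"type"` attrs key are illustrative:

```python
# Sketch of the two ways to identify a dataset "variant" discussed above.
# A SimpleNamespace mocks the Dataset; real code would use xr.Dataset.
from types import SimpleNamespace

dataset_types = {
    frozenset({"variable1", "variable2"}): "type1",
    frozenset({"variable2", "variable3"}): "type2",
}


def type_from_attrs(ds):
    # Strategy 1: trust an explicit tag stored in attrs.
    return ds.attrs["type"]


def type_from_data_vars(ds):
    # Strategy 2: infer the variant from the set of variable names.
    return dataset_types[frozenset(ds.data_vars)]


ds = SimpleNamespace(
    attrs={"type": "type1"},
    data_vars={"variable1": None, "variable2": None},
)
assert type_from_attrs(ds) == type_from_data_vars(ds) == "type1"
```

Both strategies agree as long as the attrs tag is kept consistent with the variables; the `data_vars` approach needs no extra metadata but fails if variable sets overlap between variants.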

Re TypedDict: the PEP introducing TypedDict (PEP 589) explicitly states that it is only intended for Dict[str, Any] (so no TypedDict based on subclasses of Dict). However, looking at the implementation of TypedDict, we should be able to do something similar for Dataset.

Edit: we'd still need to convince mypy that the custom TypedDict is a type...
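For reference, this is what PEP 589's TypedDict gives you for plain dicts (a minimal sketch; the field names are made up). The key/value types are checked statically by a tool like mypy, while at runtime the object is an ordinary dict, which is exactly why it cannot simply be reused for Dataset:

```python
# Minimal TypedDict usage per PEP 589: keys and value types are checked
# statically (e.g. by mypy); at runtime the object is a plain dict.
from typing import TypedDict


class WeatherRecord(TypedDict):
    station: str
    temperature: float


record: WeatherRecord = {"station": "KJFK", "temperature": 21.5}

# At runtime there is no special class -- it really is just a dict.
assert type(record) is dict
assert record["station"] == "KJFK"
```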

> so I'm curious if that has been discussed much

I don't think so? There were a few discussions about subclassing, but I couldn't find anything about static type analysis. It's definitely worth having this discussion, either here (repurposing this issue) or in a new issue.

keewis (MEMBER) · 2020-04-10T15:23:08Z (edited 2020-04-10T15:56:08Z) · https://github.com/pydata/xarray/issues/3959#issuecomment-612076605

you could emulate the availability of the accessors by checking the variables in the constructor of the accessor:

```python
dataset_types = {
    frozenset({"variable1", "variable2"}): "type1",
    frozenset({"variable2", "variable3"}): "type2",
    frozenset({"variable1", "variable3"}): "type3",
}


def _dataset_type(ds):
    data_vars = frozenset(ds.data_vars.keys())
    return dataset_types[data_vars]


@xr.register_dataset_accessor("type1")
class Type1Accessor:
    def __init__(self, ds):
        if _dataset_type(ds) != "type1":
            raise AttributeError("not a type1 dataset")
        self.dataset = ds
```

though now that we have a "type" registry, we could also have a single accessor, and pass a `kind` parameter to your `analyze` function:

```python
def analyze(self, kind="auto"):
    analyzers = {
        "type1": _analyze_type1,
        "type2": _analyze_type2,
    }
    if kind == "auto":
        kind = self.dataset_type
    return analyzers.get(kind)(self.dataset)
```

If you just want static type checking using e.g. mypy, consider using TypedDict. I don't know much about mypy, though, so I wasn't able to get it to accept Dataset objects instead of dict. If someone actually gets this to work, we might be able to provide a xarray.typing module to allow something like the following (though depending on the amount of code needed, this could also fit in the Cookbook docs section):

```python
from xarray.typing import ArrayType, Coordinate, DatasetType, Float64Type


class Dataset1(DatasetType):
    longitude: Coordinate[ArrayType[Float64Type]]
    latitude: Coordinate[ArrayType[Float64Type]]

    temperature: ArrayType[Float64Type]


def function(ds: Dataset1):
    # ...
    return ds
```

and have the type checker validate the structure of the dataset.

keewis (MEMBER) · 2020-04-10T11:49:32Z · https://github.com/pydata/xarray/issues/3959#issuecomment-611997039

Do you have any control over how the datasets are created? If so, you could provide a factory function (maybe passing the arrays in via required kwargs?) that does the checks and describes the required dataset structure in its docstring.
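A hedged sketch of such a factory, using plain lists and a dict in place of real arrays and an `xr.Dataset`; all names (`make_haplo_dataset`, the fields, the validation rules) are illustrative, not anything from the thread's actual codebase:

```python
def make_haplo_dataset(*, genotypes, positions):
    """Build a haplotype dataset.

    Parameters
    ----------
    genotypes : list of int
        One genotype code per site; values must be 0, 1 or 2.
    positions : list of int
        Genomic position of each site; must match len(genotypes).
    """
    if len(genotypes) != len(positions):
        raise ValueError("genotypes and positions must have the same length")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotype codes must be 0, 1 or 2")
    # A real implementation would return xr.Dataset(...); a dict stands in.
    return {"genotypes": genotypes, "positions": positions}


ds = make_haplo_dataset(genotypes=[0, 1, 2], positions=[100, 200, 300])
```

The keyword-only signature makes the expected structure explicit at the call site, and invalid inputs fail at construction time rather than deep inside an analysis function.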

> If you have other questions about dtypes in xarray then please feel free to raise another issue about that.

Will do.

This probably won't happen in the near future, though, since the custom dtypes for numpy are still a work in progress (NEP-40, etc.)

TomNicholas (MEMBER) · 2020-04-10T10:02:39Z · https://github.com/pydata/xarray/issues/3959#issuecomment-611967822

> A docstring on a constructor is great -- is there a way to do something like that with accessors?

There surely must be some way to do that, but I'm afraid I'm not a docs wizard. However, the accessor is still just a class whose methods you want to document -- would it be too unclear for the docstrings to hang off each HaploAccessor.specific_method()?

> Is there a way to avoid running check_* methods multiple times?

There is some caching, but you shouldn't rely on it. In #3268 @crusaderky said "The more high level discussion is that the statefulness of the accessor is something that is OK to use for caching and performance improvements, and not OK for storing functional information like yours."
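The distinction (caching is fine, functional state is not) can be sketched like this: the accessor memoizes only the result of its own validation, a pure performance detail that could be recomputed at any time. `HaploAccessor` and its check are illustrative, and a plain dict stands in for the Dataset:

```python
# Sketch: cache the (potentially expensive) validation result on the
# accessor instance, so it runs at most once per accessor instance.
class HaploAccessor:
    def __init__(self, ds):
        self._ds = ds
        self._validated = None  # cache slot for the check result

    def _check(self):
        if self._validated is None:
            # expensive dtype/dimension checks would go here
            self._validated = "genotypes" in self._ds
        if not self._validated:
            raise AttributeError("not a haplotype dataset")

    def analyse(self):
        self._check()  # cheap after the first call
        return len(self._ds["genotypes"])


acc = HaploAccessor({"genotypes": [0, 1, 2]})
```

Note that xarray constructs a fresh accessor for each new object returned by an operation, so a per-instance cache like this never leaks stale results across datasets.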

> I think those checks could be expensive

> those arrays should meet different dtype and dimensionality constraints

Checking dtype and dimensions shouldn't be expensive though, or is it more than that?

Well, we do actually have that problem in trying to find some way to represent 2-bit integers with sub-byte data types but I wasn't trying to get into that on this thread. I'll make the title better.

If you have other questions about dtypes in xarray then please feel free to raise another issue about that.

TomNicholas (MEMBER) · 2020-04-09T19:46:50Z (edited 2020-04-09T19:47:45Z) · https://github.com/pydata/xarray/issues/3959#issuecomment-611719548

> All that said, is it still a bad idea to try to subclass Xarray data structures even if the intent was never to touch any part of the internal APIs?

One of the more immediate problems you'll find if you subclass is that xarray internally uses methods like self._construct_dataarray(dims, values, coords, attrs) to construct return values, so for many of the methods you call you will only get back a bare DataArray, not the subclass you put in.
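The effect can be demonstrated without xarray at all; `DataArrayLike` below is a toy stand-in for the internal-constructor pattern, not xarray code:

```python
# Why subclassing breaks down: the parent class constructs return values
# with its own type, so methods hand back the base class, not the subclass.
class DataArrayLike:
    def __init__(self, values):
        self.values = values

    def double(self):
        # Internally constructs the base type, analogous to how xarray
        # methods build their return values via _construct_dataarray.
        return DataArrayLike([v * 2 for v in self.values])


class HaploArray(DataArrayLike):
    pass


arr = HaploArray([1, 2])
result = arr.double()
assert isinstance(arr, HaploArray)
assert not isinstance(result, HaploArray)  # subclass type was lost
```

Keeping the subclass alive would require overriding every such method (or the internal constructor), which is exactly the maintenance burden the accessor pattern avoids.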

You could make custom accessors which perform checks on the input arrays when they get used?

```python
@xr.register_dataset_accessor("haplo")
class HaploDatasetAccessor:
    def __init__(self, ds):
        check_conforms_to_haplo_requirements(ds)
        self.data = ds

    def analyse(self):
        ...


ds.haplo.analyse()
```

I'm also wondering whether given that the only real difference (not just by convention) of your desired data structures from xarray's is the dtype, then (if xarray actually offered it) would something akin to pandas' ExtensionDtype solve your problem?


Powered by Datasette · Queries took 15.578ms · About: xarray-datasette