2 rows where comments = 6, state_reason = "completed" and user = 35968931 sorted by updated_at descending

#5843 (pydata/xarray): Why are `da.chunks` and `ds.chunks` properties inconsistent?
Opened by TomNicholas (MEMBER) · state: closed (completed) · 6 comments · created 2021-10-07T17:21:01Z · closed 2021-10-29T18:12:22Z

Basically the title, but what I'm referring to is this:

```python
In [2]: da = xr.DataArray([[0, 1], [2, 3]], name='foo').chunk(1)

In [3]: ds = da.to_dataset()

In [4]: da.chunks
Out[4]: ((1, 1), (1, 1))

In [5]: ds.chunks
Out[5]: Frozen({'dim_0': (1, 1), 'dim_1': (1, 1)})
```

Why does `DataArray.chunks` return a tuple while `Dataset.chunks` returns a frozen dictionary?

This seems a bit silly, for a few reasons:

1) it means that some perfectly reasonable code might fail unnecessarily if passed a `DataArray` instead of a `Dataset` or vice versa, such as

```python
def is_core_dim_chunked(obj, core_dim):
    return len(obj.chunks[core_dim]) > 1
```
which works as intended for a `Dataset` but raises a `TypeError` for a `DataArray`.

2) it breaks the pattern we use for `.sizes`, where

```python
In [14]: da.sizes
Out[14]: Frozen({'dim_0': 2, 'dim_1': 2})

In [15]: ds.sizes
Out[15]: Frozen({'dim_0': 2, 'dim_1': 2})
```

3) if you want the chunks as a tuple, they are always accessible via `da.data.chunks`, which is a more sensible place to look for the chunks without dimension names.

4) It's an undocumented difference: the docstrings for `ds.chunks` and `da.chunks` both say only

`"""Block dimensions for this dataset’s data or None if it’s not a dask array."""`

which doesn't tell me anything about the return type, or warn me that the return types are different.

EDIT: In fact `DataArray.chunk` doesn't even appear to be listed on the API docs page at all.
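For point (1), the difference can be papered over by normalizing chunks to a dict keyed by dimension name. This is only an illustrative sketch (the helper name `chunks_as_dict` is mine, not part of xarray); it assumes nothing beyond the public attributes shown above:

```python
def chunks_as_dict(obj):
    """Return chunks as a {dim_name: chunk_sizes} mapping for either type."""
    chunks = obj.chunks
    if isinstance(chunks, tuple):
        # DataArray case: positional tuple-of-tuples; pair with dim names
        return dict(zip(obj.dims, chunks))
    # Dataset case: already a (Frozen) mapping
    return dict(chunks)


def is_core_dim_chunked(obj, core_dim):
    # now works identically for DataArray and Dataset
    return len(chunks_as_dict(obj)[core_dim]) > 1
```

Of course this only sidesteps the inconsistency at each call site rather than fixing it.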

In our codebase this difference is mostly washed out by our use of `._to_temp_dataset()` everywhere, and by the fact that the `.chunk()` method accepts both the tuple and the dict form, so both of these invariants hold (but in different ways):

```python
ds == ds.chunk(ds.chunks)
da == da.chunk(da.chunks)
```

I'm not sure whether making this consistent is worth the effort of a significant breaking change though :confused:

(Sort of related to https://github.com/pydata/xarray/issues/2103)

#2473 (pydata/xarray): Recommended way to extend xarray Datasets using accessors?
Opened by TomNicholas (MEMBER) · state: closed (completed) · 6 comments · created 2018-10-08T12:19:21Z · closed 2018-10-31T09:58:05Z

Hi,

I'm now regularly using xarray (& dask) for organising and analysing the output of the simulation code I use (BOUT++) and it's very helpful, thank you!

However, my current approach is quite clunky at dealing with the extra information and functionality that's specific to the simulation code I'm using, and I have questions about what the recommended way to extend the xarray Dataset class is. This seems like a general enough problem that I thought I would make an issue for it.

**Desired**

What I ideally want is to extend the xarray.Dataset class to accommodate extra attributes and methods, while retaining as much xarray functionality as possible but avoiding reimplementing any of the API. This might not be possible, but ideally I want to make a BoutDataset class which contains extra attributes to hold information about the run which doesn't naturally fit into the xarray data model, and extra methods to perform analysis/plotting which only users of this code would require, while still being able to use xarray-specific methods and top-level functions:

```python
bd = BoutDataset('/path/to/data')

ds = bd.data                # access the wrapped xarray dataset
extra_data = bd.extra_data  # access the BOUT-specific data

bd.isel(time=-1)            # use xarray dataset methods

bd2 = BoutDataset('/path/to/other/data')
concatenated_bd = xr.concat([bd, bd2])  # apply top-level xarray functions to the data

bd.plot_tokamak()           # methods implementing BOUT-specific functionality
```

**Problems with my current approach**

I have read the documentation about extending xarray, and the issue threads about subclassing Datasets (#706) and accessors (#1080), but I wanted to check that what I'm doing is the recommended approach.

Right now I'm trying to do something like

```python
@xr.register_dataset_accessor('bout')
class BoutDataset:
    def __init__(self, path):
        self.data = collect_data(path)           # collect all my numerical data from output files
        self.extra_data = read_extra_data(path)  # collect extra data about the simulation

    def plot_tokamak(self):
        plot_in_bout_specific_way(self.data, self.extra_data)
```

which works in the sense that I can do

```python
bd = BoutDataset('/path/to/data')

ds = bd.bout.data                # access the wrapped xarray dataset
extra_data = bd.bout.extra_data  # access the BOUT-specific data
bd.bout.plot_tokamak()           # methods implementing BOUT-specific functionality
```

but not so well with

```python
bd.isel(time=-1)            # AttributeError: 'BoutDataset' object has no attribute 'isel'
bd.bout.data.isel(time=-1)  # have to do this instead, but this returns an xr.Dataset not a BoutDataset

concatenated_bd = xr.concat([bd1, bd2])
# TypeError: can only concatenate xarray Dataset and DataArray objects, got <class 'BoutDataset'>
concatenated_ds = xr.concat([bd1.bout.data, bd2.bout.data])
# again have to do this instead, which again returns an xr.Dataset not a BoutDataset
```

If I have to reimplement the API for methods like `.isel()` and top-level functions like `concat()`, then why should I not just subclass `xr.Dataset`?

There aren't very many top-level xarray functions, so reimplementing them would be okay, but there are loads of Dataset methods. However, I think I know how I want my BoutDataset class to behave when an xr.Dataset method is called on it: I want it to apply that method to the underlying dataset and return the full BoutDataset with the extra data and attributes still attached.

Is it possible to do something like: "if calling an xr.Dataset method on an instance of BoutDataset, call the corresponding method on the wrapped dataset and return a BoutDataset that has the extra BOUT-specific data propagated through"?
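For what it's worth, one way to get close to that behaviour is `__getattr__` delegation: forward unknown attribute lookups to the wrapped dataset, and re-wrap any dataset-typed results so the extra data is propagated. This is only a sketch under the assumptions of the example above (a wrapped `data` object plus `extra_data`), not an endorsed xarray pattern:

```python
class BoutDataset:
    """Wraps a dataset; delegates unknown attributes to it and re-wraps results."""

    def __init__(self, data, extra_data=None):
        self.data = data              # the wrapped (xarray) dataset
        self.extra_data = extra_data  # BOUT-specific metadata

    def __getattr__(self, name):
        # __getattr__ is only called when normal lookup fails,
        # so attributes defined on the wrapper itself take precedence
        attr = getattr(self.data, name)
        if not callable(attr):
            return attr

        def method(*args, **kwargs):
            result = attr(*args, **kwargs)
            if isinstance(result, type(self.data)):
                # dataset-returning method: carry the extra data along
                return BoutDataset(result, self.extra_data)
            return result

        return method
```

Note that top-level functions like `xr.concat` would still need explicit handling, since they type-check their arguments rather than calling methods on them.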

Thanks in advance, apologies if this is either impossible or relatively trivial, I just thought other xarray users might have the same questions.


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 790.886ms · About: xarray-datasette