issues: 367763373

id: 367763373
node_id: MDU6SXNzdWUzNjc3NjMzNzM=
number: 2473
title: Recommended way to extend xarray Datasets using accessors?
user: 35968931
state: closed
locked: 0
comments: 6
created_at: 2018-10-08T12:19:21Z
updated_at: 2018-10-31T09:58:05Z
closed_at: 2018-10-31T09:58:05Z
author_association: MEMBER

Hi,

I now regularly use xarray (and dask) to organise and analyse the output of the simulation code I use (BOUT++), and it's very helpful, thank you!

However, my current approach is quite clunky at dealing with the extra information and functionality that's specific to the simulation code I'm using, and I have questions about the recommended way to extend the xarray Dataset class. This seems like a general enough problem that I thought I would open an issue for it.

Desired

Ideally, I want to extend the xarray.Dataset class to accommodate extra attributes and methods while retaining as much xarray functionality as possible and without reimplementing any of the API. This might not be possible, but I would like a BoutDataset class which has extra attributes to hold information about the run that doesn't naturally fit into the xarray data model, and extra methods to perform analysis and plotting that only users of this code would require, while still being able to use xarray-specific methods and top-level functions:

```python
bd = BoutDataset('/path/to/data')

ds = bd.data  # access the wrapped xarray dataset
extra_data = bd.extra_data  # access the BOUT-specific data

bd.isel(time=-1)  # use xarray dataset methods

bd2 = BoutDataset('/path/to/other/data')
concatenated_bd = xr.concat([bd, bd2])  # apply top-level xarray functions to the data

bd.plot_tokamak()  # methods implementing bout-specific functionality
```

Problems with my current approach

I have read the documentation about extending xarray, and the issue threads about subclassing Datasets (#706) and accessors (#1080), but I wanted to check that what I'm doing is the recommended approach.

Right now I'm trying to do something like

```python
import xarray as xr

@xr.register_dataset_accessor('bout')
class BoutDataset:
    def __init__(self, path):
        self.data = collect_data(path)  # collect all my numerical data from output files
        self.extra_data = read_extra_data(path)  # collect extra data about the simulation

    def plot_tokamak(self):
        plot_in_bout_specific_way(self.data, self.extra_data)
```

which works in the sense that I can do

```python
bd = BoutDataset('/path/to/data')

ds = bd.bout.data  # access the wrapped xarray dataset
extra_data = bd.bout.extra_data  # access the BOUT-specific data
bd.bout.plot_tokamak()  # methods implementing bout-specific functionality
```

but not so well with

```python
bd.isel(time=-1)  # AttributeError: 'BoutDataset' object has no attribute 'isel'
bd.bout.data.isel(time=-1)  # have to do this instead, but this returns an xr.Dataset not a BoutDataset

concatenated_bd = xr.concat([bd1, bd2])  # TypeError: can only concatenate xarray Dataset and DataArray objects, got <class 'BoutDataset'>
concatenated_ds = xr.concat([bd1.bout.data, bd2.bout.data])  # again have to do this instead, which again returns an xr.Dataset not a BoutDataset
```

If I have to reimplement the API for methods like .isel() and top-level functions like concat(), then why should I not just subclass xr.Dataset?

There aren't very many top-level xarray functions, so reimplementing them would be okay, but there are loads of Dataset methods. However, I think I know how I want my BoutDataset class to behave when an xr.Dataset method is called on it: it should call that method on the underlying dataset and return the full BoutDataset with the extra data and attributes still attached.

Is it possible to do something like: "if calling an xr.Dataset method on an instance of BoutDataset, call the corresponding method on the wrapped dataset and return a BoutDataset that has the extra BOUT-specific data propagated through"?
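To make the question concrete, here is a minimal sketch of the forwarding behaviour I have in mind, written as a plain wrapper class using `__getattr__`. This is only an illustration, not something xarray provides or recommends: the constructor signature (taking an already-loaded dataset rather than a path) and the `_rewrap` helper are assumptions of mine, and the wrapper is duck-typed, so nothing in it is specific to xarray.

```python
import functools


class BoutDataset:
    """Hypothetical wrapper that forwards unknown attributes to a wrapped dataset."""

    def __init__(self, data, extra_data=None):
        self.data = data  # the wrapped dataset (e.g. an xr.Dataset)
        self.extra_data = extra_data if extra_data is not None else {}

    def _rewrap(self, result):
        # If the result has the same type as the wrapped dataset,
        # re-attach the extra data; otherwise return it unchanged.
        if isinstance(result, type(self.data)):
            return type(self)(result, self.extra_data)
        return result

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails. The guard avoids
        # infinite recursion if `data` is not set yet (e.g. during copying).
        if name.startswith("_") or name in ("data", "extra_data"):
            raise AttributeError(name)
        attr = getattr(self.data, name)
        if not callable(attr):
            return self._rewrap(attr)

        @functools.wraps(attr)
        def method(*args, **kwargs):
            # Call the wrapped dataset's method and rewrap the result.
            return self._rewrap(attr(*args, **kwargs))

        return method
```

With an xr.Dataset as `data`, a call such as `bd.isel(time=-1)` would go through `__getattr__` and come back as a BoutDataset. Operators and indexing (`bd + 1`, `bd['n']`) are looked up on the class rather than the instance, so they are not forwarded this way, and top-level functions like `xr.concat` would still need the unwrapped `.data`.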

Thanks in advance, and apologies if this is either impossible or relatively trivial; I just thought other xarray users might have the same questions.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2473/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
