Issue #8413: Add a perception of a __xarray__ magic method

id: 1977485456
node_id: I_kwDOAMm_X8513giQ
user: swamidass (6273919)
state: open
locked: 0
comments: 4
created_at: 2023-11-04T19:55:14Z
updated_at: 2023-11-05T18:50:14Z
author_association: NONE
repo: xarray (13221727)
type: issue

Is your feature request related to a problem?

I am often moving data from external objects (of all sorts!) into xarray. This is a common use case.

Much of this code would be greatly simplified if non-xarray classes had a way of declaring to xarray how their objects can be marshaled into xarray objects.

Describe the solution you'd like

So here is an initial proposal for comment. Much of this could be implemented in a third party library. But doing this in xarray itself would likely be best.

Magic Methods

It would be great to see these magic method signatures become integrated throughout the library:

__xarray__ -> xr.Dataset | xr.DataArray
__xarray_array__ -> xr.DataArray
__xarray_dataset__ -> xr.Dataset
__xarray_datatree__ -> xr.DataTree  # when DataTree is finally integrated into xarray
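As a rough illustration of how a third-party class might opt in to the first of these hooks, here is a minimal sketch. Nothing in it exists in xarray today: `SupportsXarray` and `Slide` are hypothetical names, and `FakeDataArray` is only a stand-in for xr.DataArray so that the sketch runs without xarray installed.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol, runtime_checkable


@dataclass
class FakeDataArray:
    """Stand-in for xr.DataArray (assumption: keeps the sketch dependency-free)."""
    data: Any
    dims: tuple
    attrs: dict = field(default_factory=dict)


@runtime_checkable
class SupportsXarray(Protocol):
    """Hypothetical protocol: any object exposing the proposed __xarray__ hook."""

    def __xarray__(self) -> FakeDataArray: ...


class Slide:
    """Hypothetical third-party class that opts in by defining __xarray__."""

    def __init__(self, pixels, path):
        self.pixels = pixels
        self.path = path

    def __xarray__(self) -> FakeDataArray:
        # The object itself decides how it maps onto dims and attrs.
        return FakeDataArray(self.pixels, dims=("y", "x"), attrs={"source": self.path})


slide = Slide([[0, 1], [2, 3]], path="slide.tiff")

# A consumer only needs to check for the hook, not for a concrete class.
converted = slide.__xarray__() if isinstance(slide, SupportsXarray) else None
```

The point of the protocol check is that the consuming library never has to import or know about `Slide`; it only looks for the method.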

Conversion Registry

And these extension functions to register converters:

def register_xarray_converter(cls: type, name: str, func: Callable[..., xr.Dataset | xr.DataArray] | None): ...
def register_dataarray_converter(cls: type, name: str, func: Callable[..., xr.DataArray] | None): ...
def register_dataset_converter(cls: type, name: str, func: Callable[..., xr.Dataset] | None): ...
def register_datatree_converter(cls: type, name: str, func: Callable[..., xr.DataTree] | None): ...  # when DataTree is finally integrated into xarray
# func is called with an instance of cls, plus any extra arguments.

A registration clashes if cls implements a corresponding __xarray_*__ method or another converter is already registered for cls. Perhaps add an argument that specifies whether or not the converter should be added when there is a clash. Perhaps these functions return the replaced converter so it can be added back in if needed?

Ideally, "deregister" versions of these functions would also be available, so that context managers that change marshaling behavior could easily be constructed.
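To make the registry idea concrete, here is a minimal sketch of one way it could behave. All names (`register_converter`, `find_converter`, `temporary_converter`, `_CONVERTERS`) are hypothetical. It also makes several simplifying assumptions: a single registry instead of one per return type, no `name` argument, `func=None` treated as "deregister", and converters that return plain Python objects rather than xarray objects.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, Optional

Converter = Callable[..., Any]

# Hypothetical module-level registry mapping a class to its converter.
_CONVERTERS: Dict[type, Converter] = {}


def register_converter(
    cls: type, func: Optional[Converter], *, overwrite: bool = False
) -> Optional[Converter]:
    """Register func for cls and return the converter it replaced, if any.

    Passing func=None removes the entry (one possible reading of `| None`).
    """
    previous = _CONVERTERS.get(cls)
    if func is None:
        _CONVERTERS.pop(cls, None)
        return previous
    # A clash: cls already has a magic method or a registered converter.
    clash = previous is not None or hasattr(cls, "__xarray__")
    if clash and not overwrite:
        raise ValueError(f"a converter already exists for {cls.__name__}")
    _CONVERTERS[cls] = func
    return previous


def find_converter(obj: Any) -> Optional[Converter]:
    """Look up a converter by walking the MRO, most specific class first."""
    for klass in type(obj).__mro__:
        if klass in _CONVERTERS:
            return _CONVERTERS[klass]
    return None


@contextmanager
def temporary_converter(cls: type, func: Converter) -> Iterator[None]:
    """Swap in func for cls inside a with-block, then restore the old state."""
    previous = register_converter(cls, func, overwrite=True)
    try:
        yield
    finally:
        register_converter(cls, previous, overwrite=True)


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


# Converters return plain objects here purely for illustration.
register_converter(Point, lambda p: {"x": p.x, "y": p.y})
with temporary_converter(Point, lambda p: (p.x, p.y)):
    inside = find_converter(Point(1, 2))(Point(1, 2))
outside = find_converter(Point(1, 2))(Point(1, 2))
```

Returning the replaced converter is what makes the context manager a few lines long: restoring the previous state is just another registration call.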

User API

Along with the following new user API functions:

def as_xarray(x, *args, **kwargs) -> xr.Dataset | xr.DataArray: ...
def as_dataarray(x, *args, **kwargs) -> xr.DataArray: ...
def as_dataset(x, *args, **kwargs) -> xr.Dataset: ...
def as_datatree(x, *args, **kwargs) -> xr.DataTree: ...  # when DataTree is finally integrated into xarray

"as_xarray" returns (in order of precedence):

- x unaltered, if it is an xarray object
- registered_xarray_converter(x, *args, **kwargs), if it is callable and does not throw an exception
- registered_dataset_converter(x, *args, **kwargs), if it is callable and does not throw an exception
- registered_dataarray_converter(x, *args, **kwargs), if it is callable and does not throw an exception
- x.__xarray__(*args, **kwargs), if it exists, is callable, and does not throw an exception
- x.__xarray_dataset__(*args, **kwargs), if it exists, is callable, and does not throw an exception
- x.__xarray_dataarray__(*args, **kwargs), if it exists, is callable, and does not throw an exception
- well-known aliases of __xarray_dataarray__, such as x.to_xarray(*args, **kwargs) (see pandas)
- [DESIGN DECISION] convert tuple[dims, data, [attr, encoding]] to a DataArray and return it?
- [DESIGN DECISION] convert a tuple encoding of a Dataset and return it?
- [DESIGN DECISION] wrap a duck-typed array in a DataArray and return it?

The rationale for putting the registered functions first is that this would enable

"as_dataarray" would be similar, but it would only call x.__xarray_dataarray__ and well-known aliases.

"as_dataset" would be similar, but it would only call x.__xarray_dataset__ and well-known aliases, perhaps falling back to calling x.__xarray_dataarray__ and converting the result to a dataset if it has a name attribute.

"as_datatree" would be similar, but it would only call x.__xarray_datatree__, perhaps falling back to calling x.__xarray_dataarray__ and wrapping the result in a single-node datatree. (Though of course at this point this method would probably be implemented by the DataTree package, not xarray.)

The design decisions are flexible from my point of view, and might be decided in whichever way makes the code base simplest or most usable. There is also a question of whether or not this method should default to the backup methods. These decisions can also be deferred entirely by delegating to the converter registry.
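The precedence order proposed for as_xarray could be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: `FakeDataArray` stands in for the real xarray classes so the sketch runs without xarray, `REGISTRY` collapses the three proposed registries into one hypothetical dict, and the open design decisions (tuples, duck-typed arrays) are left out.

```python
from functools import partial
from typing import Any, Callable, Dict


class FakeDataArray:
    """Stand-in for xr.DataArray / xr.Dataset (assumption, avoids importing xarray)."""

    def __init__(self, data, name=None):
        self.data = data
        self.name = name


# Hypothetical registry of converters, keyed by class.
REGISTRY: Dict[type, Callable[..., FakeDataArray]] = {}

# Hook names tried in the proposed order; "to_xarray" is the pandas-style alias.
HOOKS = ("__xarray__", "__xarray_dataset__", "__xarray_dataarray__", "to_xarray")


def as_xarray(x: Any, *args: Any, **kwargs: Any) -> FakeDataArray:
    if isinstance(x, FakeDataArray):
        return x  # already an "xarray" object: return it unaltered

    candidates = []
    # 1. Registered converters come first, most specific class first.
    for klass in type(x).__mro__:
        if klass in REGISTRY:
            candidates.append(partial(REGISTRY[klass], x))
    # 2. Then magic methods and well-known aliases defined on the object.
    for hook in HOOKS:
        method = getattr(x, hook, None)
        if callable(method):
            candidates.append(method)

    for convert in candidates:
        try:
            return convert(*args, **kwargs)
        except Exception:
            continue  # this candidate failed; fall through to the next one
    raise TypeError(f"cannot convert {type(x).__name__} to an xarray object")


class Series:
    """Hypothetical class exposing only the pandas-style alias."""

    def __init__(self, values):
        self.values = values

    def to_xarray(self):
        return FakeDataArray(self.values, name="from_method")


s = Series([1, 2, 3])
via_method = as_xarray(s)

# Once a converter is registered, it takes precedence over the method.
REGISTRY[Series] = lambda obj: FakeDataArray(obj.values, name="from_registry")
via_registry = as_xarray(s)
```

Silently swallowing exceptions follows the "does not throw an exception" wording of the list, but it can hide real bugs in a converter; a real implementation might want to log or chain the failures.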

Across the Xarray Library

Finally, across the xarray library, there may be places where passing input arguments through as_xarray, as_dataarray, or as_dataset would make a lot of sense. This could be the final thing to do, but cannot be handled by a third party library.

Doing this would give third party libraries another pathway to integrate with xarray, one far easier than the converter registry or explicit calls to as_* functions.

Describe alternatives you've considered

This can be done with a private library. But it seems to be a lot of code that is pretty useful for other use cases.

Most of this (but not all) can be accomplished in a 3rd party library, but it wouldn't allow the seamless sort of integration with (for example) xarray's use of _repr_html_ to integrate with pandas.

The existing backend hooks work great when we are marshaling from file-based sources. See, for example, tiffslide-xarray (https://github.com/swamidasslab/tiffslide-xarray). This approach is seamless for reading files, but cannot marshal objects. For example, this is possible:

x = xr.open_dataset("slide.tiff")

But this doesn't work.

t = tiffslide.TiffSlide("slide.tiff")
x = xr.open_dataset(t)  # won't work
x = xr.DataArray(t)  # won't work either

This is an important use case because there are cases where we want to create an xarray like this from objects that are never stored on the filesystem.

Additional context

No response
