issues: 1485037066

  • id: 1485037066
  • node_id: PR_kwDOAMm_X85Ez9Gj
  • number: 7368
  • title: Expose "Coordinates" as part of Xarray's public API
  • user: 4160723
  • state: closed
  • locked: 0
  • assignee: (none)
  • milestone: (none)
  • comments: 31
  • created_at: 2022-12-08T16:59:29Z
  • updated_at: 2023-08-30T09:11:57Z
  • closed_at: 2023-07-21T20:40:03Z
  • author_association: MEMBER
  • active_lock_reason: (none)
  • draft: 0
  • pull_request: pydata/xarray/pulls/7368
  • body:

  • [x] Closes #7214
  • [x] Closes #6392
  • [x] xref #6633
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst

This is a rework of #7214. It follows the suggestions made in https://github.com/pydata/xarray/pull/7214#issuecomment-1295283938, https://github.com/pydata/xarray/pull/7214#issuecomment-1297046405 and https://github.com/pydata/xarray/pull/7214#issuecomment-1293774799:

  • No indexes argument is added to Dataset.__init__, and the indexes argument of DataArray.__init__ is kept private (i.e., valid only if fastpath=True)
  • When a Coordinates object is passed to a new Dataset or DataArray via the coords argument, both coordinate variables and indexes are copied/extracted and added to the new object
  • This PR also adds ~~an IndexedCoordinates subclass~~ public Coordinates constructors used to create Xarray coordinates and indexes from non-Xarray objects. For example, the Coordinates.from_pandas_multiindex() class method creates a new set of indexes and coordinates from an existing pd.MultiIndex.

EDIT: IndexedCoordinates has been merged with Coordinates

EDIT2: this ended up as a pretty big refactor, with Coordinates promoted to a 2nd-class Xarray container that supports alignment like Dataset and DataArray. It is still a fairly advanced API, useful for passing coordinate variables and indexes around. Internally, Coordinates objects are still "virtual" containers (i.e., proxies for coordinate variables and indexes stored in their corresponding DataArray or Dataset objects). For now, a "stand-alone" Coordinates object created from scratch wraps a Dataset with no data variables.

Some examples of usage:

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("one", "two"))

coords = xr.Coordinates.from_pandas_multiindex(midx, "x")
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2

ds = xr.Dataset(coords=coords)
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2
# Data variables:
#     *empty*

ds_to_be_deprecated = xr.Dataset(coords={"x": midx})
ds_to_be_deprecated.identical(ds)
# True

da = xr.DataArray([1, 2, 3, 4], dims="x", coords=ds.coords)
# <xarray.DataArray (x: 4)>
# array([1, 2, 3, 4])
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2
```
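As a quick check that the indexes (and not only the coordinate variables) travel with the Coordinates object, the new Dataset's indexes can be inspected. This is a minimal sketch that assumes an xarray release that includes this PR; the variable names `index_names` and `is_multi` are only for illustration.

```python
import pandas as pd
import xarray as xr
from xarray.indexes import PandasMultiIndex

midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("one", "two"))
coords = xr.Coordinates.from_pandas_multiindex(midx, "x")

# Pass the Coordinates object through the public `coords` argument.
ds = xr.Dataset(coords=coords)

# The dimension coordinate and both level coordinates should be indexed.
index_names = sorted(ds.xindexes)

# The index backing "x" should be xarray's multi-index wrapper.
is_multi = isinstance(ds.xindexes["x"], PandasMultiIndex)
```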

TODO:

  • [x] update assign_coords too so it has the same behavior if a Coordinates object is passed?
  • [x] How to avoid building any default index? It seems silly to add or use the indexes argument just for that purpose. ~~We could address that later.~~ Solution: wrap the coordinates dict in a Coordinates object, e.g., ds = xr.Dataset(coords=xr.Coordinates(coords_dict)).
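
The two TODO items above can be sketched as follows. This assumes an xarray release that includes this PR. Note one assumption in particular: in the released API, as far as I can tell, default indexes are skipped by passing an empty `indexes={}` mapping to the Coordinates constructor, whereas a plain `xr.Coordinates(coords_dict)` still builds default pandas indexes for dimension coordinates. The names `coords_dict`, `no_index`, `mcoords`, `ds2` and the variable `v` are illustrative only.

```python
import pandas as pd
import xarray as xr

# 1. Avoid building a default index for a dimension coordinate.
coords_dict = {"x": [10, 20, 30]}
no_index = xr.Coordinates(coords_dict, indexes={})  # empty mapping: no default index
ds = xr.Dataset(coords=no_index)

# 2. assign_coords accepts a Coordinates object and assigns its indexes too.
midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("one", "two"))
mcoords = xr.Coordinates.from_pandas_multiindex(midx, "y")
ds2 = xr.Dataset({"v": ("y", [1, 2, 3, 4])}).assign_coords(mcoords)
```

Here `ds` keeps `x` as a coordinate variable without an index, so label-based selection along `x` would not be available until an index is set explicitly.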

@shoyer, @dcherian, anyone -- what do you think about the approach proposed here? I'd like to check that with you before going further with tests, docs, etc.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7368/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  • performed_via_github_app: (none)
  • state_reason: (none)
  • repo: 13221727
  • type: pull
