pull_requests: 1154470307
id | node_id | number | state | locked | title | user | body | created_at | updated_at | closed_at | merged_at | merge_commit_sha | assignee | milestone | draft | head | base | author_association | auto_merge | repo | url | merged_by |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1154470307 | PR_kwDOAMm_X85Ez9Gj | 7368 | closed | 0 | Expose "Coordinates" as part of Xarray's public API | 4160723 | <!-- Feel free to remove check-list items that aren't relevant to your change --> - [x] Closes #7214 - [x] Closes #6392 - [x] xref #6633 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [x] New functions/methods are listed in `api.rst` This is a rework of #7214. It follows the suggestions made in https://github.com/pydata/xarray/pull/7214#issuecomment-1295283938, https://github.com/pydata/xarray/pull/7214#issuecomment-1297046405 and https://github.com/pydata/xarray/pull/7214#issuecomment-1293774799: - No `indexes` argument is added to `Dataset.__init__`, and the `indexes` argument of `DataArray.__init__` is kept private (i.e., valid only if `fastpath=True`) - When a `Coordinates` object is passed to a new Dataset or DataArray via the `coords` argument, both coordinate variables and indexes are copied/extracted and added to the new object - This PR also adds ~~an `IndexedCoordinates` subclass~~ `Coordinates` public constructors used to create Xarray coordinates and indexes from non-Xarray objects. For example, the `Coordinates.from_pandas_multiindex()` class method creates a new set of index and coordinates from an existing `pd.MultiIndex`. EDIT: `IndexedCoordinates` has been merged with `Coordinates`. EDIT2: it ended up as a pretty big refactor, with the promotion of `Coordinates` as a 2nd-class Xarray container that supports alignment like Dataset and DataArray. It is still a fairly advanced API, useful for passing coordinate variables and indexes around. Internally, `Coordinates` objects are still "virtual" containers (i.e., proxies for coordinate variables and indexes stored in their corresponding DataArray or Dataset objects). For now, a "stand-alone" `Coordinates` object created from scratch wraps a Dataset with no data variables.
Some examples of usage:

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("one", "two"))

coords = xr.Coordinates.from_pandas_multiindex(midx, "x")
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2

ds = xr.Dataset(coords=coords)
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2
# Data variables:
#     *empty*

ds_to_be_deprecated = xr.Dataset(coords={"x": midx})
ds_to_be_deprecated.identical(ds)
# True

da = xr.DataArray([1, 2, 3, 4], dims="x", coords=ds.coords)
# <xarray.DataArray (x: 4)>
# array([1, 2, 3, 4])
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 1 2 1 2
```

TODO: - [x] update `assign_coords` too so it has the same behavior if a `Coordinates` object is passed? - [x] How to avoid building any default index? It seems silly to add or use the `indexes` argument just for that purpose? ~~We could address that later.~~ Solution: wrap the coordinates dict in a `Coordinates` object, e.g., `ds = xr.Dataset(coords=xr.Coordinates(coords_dict))`. @shoyer, @dcherian, anyone -- what do you think about the approach proposed here? I'd like to check that with you before going further with tests, docs, etc. | 2022-12-08T16:59:29Z | 2023-08-30T09:11:57Z | 2023-07-21T20:40:03Z | 2023-07-21T20:40:03Z | 4441f9915fa978ad5b276096ab67ba49602a09d2 | 0 | 4ef5f17db6d2aefd91fb02485ab7a815fe460b47 | 6b1ff6d13bf360df786500dfa7d62556d23e6df9 | MEMBER | 13221727 | https://github.com/pydata/xarray/pull/7368 |
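
The TODO item about avoiding default indexes is not covered by the examples in the PR body, so here is a minimal sketch of it. It is not taken from the PR: `coords_dict` and its values are hypothetical, and the sketch assumes an xarray release in which `xr.Coordinates` is public. Because the constructor's default behavior may differ between releases from what the PR text describes, the sketch passes an empty `indexes` mapping explicitly rather than relying on the default.

```python
import xarray as xr

# Hypothetical coordinate data, for illustration only.
coords_dict = {"x": [10, 20, 30]}

# Plain dict: xarray builds a default (pandas) index for the
# dimension coordinate "x".
ds_default = xr.Dataset(coords=coords_dict)

# Wrapped in a Coordinates object with an explicitly empty `indexes`
# mapping: the coordinate variable is kept, but no index is built.
# (Assumption: a release where `Coordinates` accepts `indexes={}`.)
coords_no_index = xr.Coordinates(coords_dict, indexes={})
ds_no_index = xr.Dataset(coords=coords_no_index)

print(list(ds_default.xindexes))
print(list(ds_no_index.xindexes))
```

Comparing the two `xindexes` listings shows whether an index was attached to `x` in each case; label-based selection such as `.sel(x=...)` generally needs an index, so the index-free form fits best when the coordinate is only being carried around.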