id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1389295853,I_kwDOAMm_X85Szvjt,7099,Pass arbitrary options to sel(),4160723,open,0,,,4,2022-09-28T12:44:52Z,2024-04-30T00:44:18Z,,MEMBER,,,,"### Is your feature request related to a problem?
Currently `.sel()` accepts two options `method` and `tolerance`. These are relevant for default (pandas) indexes but not necessarily for other, custom indexes.
It would also be useful for custom indexes to expose their own selection options, e.g.,
- index query optimization like the `dualtree` flag of [sklearn.neighbors.KDTree.query](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html#sklearn.neighbors.KDTree.query)
- k-nearest neighbors selection with the creation of a new ""k"" dimension (+ coordinate / index) with user-defined name and size.
From #3223, it would be nice if we could also pass distinct options values per index.
What would be a good API for that?
### Describe the solution you'd like
Some ideas:
A. Allow passing a tuple `(labels, options_dict)` as indexer value
```python
ds.sel(x=([0, 2], {""method"": ""nearest""}), y=3)
```
B. Expose an `options` kwarg that would accept a nested dict
```python
ds.sel(x=[0, 2], y=3, options={""x"": {""method"": ""nearest""}})
```
Option A does not look very readable. Option B is slightly better, although the nested dictionary is not great.
Any other ideas? Some sort of context manager? Some `Index` specific API?
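To make the comparison between options A and B concrete, here is a minimal sketch of how both call forms could be normalized into the same per-coordinate structure before being dispatched to each index. `normalize_sel_args` is a hypothetical helper written only for illustration; it is not part of Xarray and does not reflect how `.sel()` is implemented.

```python
def normalize_sel_args(indexers, options=None):
    # Hypothetical helper (not Xarray API): returns {name: (labels, options_dict)}.
    options = options or {}
    normalized = {}
    for name, value in indexers.items():
        # Option A: a 2-tuple whose second item is a dict carries per-index options.
        if isinstance(value, tuple) and len(value) == 2 and isinstance(value[1], dict):
            labels, inline_opts = value
        else:
            labels, inline_opts = value, {}
        # Option B: options come from a nested dict keyed by coordinate name.
        # In this sketch, inline (option A) values win over the nested dict.
        normalized[name] = (labels, {**options.get(name, {}), **inline_opts})
    return normalized


form_a = normalize_sel_args({'x': ([0, 2], {'method': 'nearest'}), 'y': 3})
form_b = normalize_sel_args({'x': [0, 2], 'y': 3}, options={'x': {'method': 'nearest'}})
```

Both forms end up equivalent here, but the sketch also shows a weakness of option A: a label that happens to be a `(something, dict)` tuple cannot be told apart from a `(labels, options_dict)` pair.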
### Describe alternatives you've considered
The API proposed in #3223 would look great if `method` and `tolerance` were the only accepted options, but less so for arbitrary options.
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7099/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
2227413822,PR_kwDOAMm_X85rz7ZX,8911,Refactor swap dims,4160723,open,0,,,5,2024-04-05T08:45:49Z,2024-04-17T16:46:34Z,,MEMBER,,1,pydata/xarray/pulls/8911,"
- [ ] Attempt at fixing #8646
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`
Here I've tried re-implementing `swap_dims` using `rename_dims`, `drop_indexes` and `set_xindex`. This fixes the example in #8646, but unfortunately it fails to handle the pandas multi-index special case (i.e., a single non-dimension coordinate wrapping a `pd.MultiIndex` that is promoted to a dimension coordinate in `swap_dims` auto-magically results in a `PandasMultiIndex` with both dimension and level coordinates).
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8911/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
2215059449,PR_kwDOAMm_X85rJr7c,8888,to_base_variable: coerce multiindex data to numpy array,4160723,open,0,,,3,2024-03-29T10:10:42Z,2024-03-29T15:54:19Z,,MEMBER,,0,pydata/xarray/pulls/8888,"
- [x] Closes #8887, and probably supersedes #8809
- [x] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- ~~New functions/methods are listed in `api.rst`~~
@slevang this should also make your test case added in #8809 work. I haven't added it here; instead I added a basic check that should be enough.
I don't really understand why the serialization backends (zarr?) do not seem to work with the `PandasMultiIndexingAdapter.__array__()` implementation, which should normally coerce the multi-index levels into numpy arrays as needed. Anyway, I guess that coercing it early like in this PR doesn't hurt and may avoid the confusion of a non-indexed, isolated coordinate variable that still wraps a pandas.MultiIndex. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8888/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
2101987013,PR_kwDOAMm_X85lJbZW,8672,Fix multiindex level serialization after reset_index,4160723,closed,0,,,6,2024-01-26T10:40:42Z,2024-02-23T01:22:17Z,2024-01-31T17:42:29Z,MEMBER,,0,pydata/xarray/pulls/8672,"
- [x] Closes #8628
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8672/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
915057433,MDU6SXNzdWU5MTUwNTc0MzM=,5452,[community] Flexible indexes meeting,4160723,closed,0,,,7,2021-06-08T13:32:16Z,2024-02-15T01:39:08Z,2024-02-15T01:39:08Z,MEMBER,,,,"In addition to the [bi-weekly community developers meeting](https://github.com/pydata/xarray/issues/4001), we plan to have 30min meetings on a weekly basis -- every Tue 8:30-9:00 PDT (17:30-18:00 CEST) -- to discuss the flexible indexes refactor.
Anyone from @pydata/xarray feel free to join! The first meeting is in a couple of hours.
[Zoom link](https://us05web.zoom.us/j/84894064491?pwd=UDFjUjBVbTFQQ1k2SEJIa0UwRFFjZz09) (subject to change).
[Google calendar](https://calendar.google.com/event?action=TEMPLATE&tmeid=OTVsbzRlajE4Y2NyMDg3Nm80bzduamQ1OXNfMjAyMTA2MTVUMTUzMDAwWiBiZW5ib3Z5QG0&tmsrc=benbovy%40gmail.com&scp=ALL)
[Meeting notes](https://hackmd.io/I6u0oA0ISECNl3bwfvIcjA)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5452/reactions"", ""total_count"": 5, ""+1"": 5, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1861543091,I_kwDOAMm_X85u9OSz,8097,Documentation rendering issues (dark mode),4160723,open,0,,,2,2023-08-22T14:06:03Z,2024-02-13T02:31:10Z,,MEMBER,,,,"### What is your issue?
There are a couple of rendering issues in Xarray's documentation landing page, especially with the dark mode.
- we should display two versions of the logo in the light vs. dark mode (note: if the logo is in the svg format, it may be possible to add CSS classes so that it renders consistently with the active mode)
- same for the images in the section cards (would be nice also to display all the images with the same width / height)
- if possible, it would be nice to move the twitter logo right next to the github logo (upper right) with consistent styling.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8097/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
213004586,MDU6SXNzdWUyMTMwMDQ1ODY=,1303,`xarray.core.variable.as_variable()` part of the public API?,4160723,closed,0,,,5,2017-03-09T11:07:52Z,2024-02-06T17:57:21Z,2017-06-02T17:55:12Z,MEMBER,,,,"Is it safe to use `xarray.core.variable.as_variable()` externally? I guess that currently it is not.
I have a specific use case where this would be very useful.
I'm working on a package that heavily uses and extends xarray for landscape evolution modeling, and inside a custom class for model parameters I want to be able to create `xarray.Variable` objects on the fly from any provided object, e.g., a scalar value, an array-like, a `(dims, data[, attrs])` tuple, another `xarray.Variable`, a `xarray.DataArray`... exactly what `xarray.core.variable.as_variable()` does.
Although I know that `Variable` objects are not needed in most use cases, in this specific case a clean solution would be the following
```python
import xarray as xr
class Parameter(object):
    def to_variable(self, obj):
        return xr.as_variable(obj)
        # ... some validation logic on, e.g., data type, value bounds, dimensions...
        # ... add default attributes to the created variable (e.g., units, description...)
```
I don't think it is a viable option to copy `as_variable()` and all its dependent code in my package as it seems to have quite a lot of logic implemented.
A workaround using only public API would be something like:
```python
class Parameter(object):
    def to_variable(self, obj):
        return xr.Dataset(data_vars={'v': obj}).variables['v']
```
but it feels a bit hacky.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1303/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1864056633,PR_kwDOAMm_X85YovK-,8107,Better default behavior of the Coordinates constructor,4160723,closed,0,,,2,2023-08-23T21:42:51Z,2024-02-04T18:32:42Z,2023-08-31T07:35:47Z,MEMBER,,0,pydata/xarray/pulls/8107,"
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
After working more on `Coordinates` I realize that the default behavior of its constructor could be more consistent with other Xarray objects. This PR changes this default behavior such that:
- Pandas indexes are created for dimension coordinates if `indexes=None` (default). To create dimension coordinates with no index, just pass `indexes={}`.
- If another `Coordinates` object is passed as input, its indexes are also added to the newly created object. Since we don't support alignment / merge here, the following call raises an error: `xr.Coordinates(coords=xr.Coordinates(...), indexes={...})`.
This PR introduces a breaking change since `Coordinates` are now exposed in v2023.8.0, which has just been released. It is a bit unfortunate but I think it may be OK for a fresh feature, especially if the next release will be soon after this one.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8107/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1839199929,PR_kwDOAMm_X85XUl4W,8051,Allow setting (or skipping) new indexes in open_dataset,4160723,open,0,,,9,2023-08-07T10:53:46Z,2024-02-03T19:12:48Z,,MEMBER,,0,pydata/xarray/pulls/8051,"
- [x] Closes #6633
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`
This PR introduces a new boolean parameter `set_indexes=True` to `xr.open_dataset()`, which may be used to skip the creation of default (pandas) indexes when opening a dataset.
Currently works with the Zarr backend:
```python
import numpy as np
import xarray as xr
# example dataset (real dataset may be much larger)
arr = np.random.random(size=1_000_000)
xr.Dataset({""x"": arr}).to_zarr(""dataset.zarr"")
xr.open_dataset(""dataset.zarr"", set_indexes=False, engine=""zarr"")
# <xarray.Dataset>
# Dimensions: (x: 1000000)
# Coordinates:
# x (x) float64 ...
# Data variables:
# *empty*
xr.open_zarr(""dataset.zarr"", set_indexes=False)
# <xarray.Dataset>
# Dimensions: (x: 1000000)
# Coordinates:
# x (x) float64 ...
# Data variables:
# *empty*
```
I'll add it to the other Xarray backends as well, but I'd like to get your thoughts about the API first.
1. Do we want to add yet another keyword parameter to `xr.open_dataset()`? There are already many...
2. Do we want to add this parameter to the `BackendEntrypoint.open_dataset()` API?
   - I'm afraid we must do it if we want this parameter in `xr.open_dataset()`
   - this would also make it possible to skip the creation of custom indexes (if any) in custom IO backends
   - con: if we require `set_indexes` in the signature in addition to the `drop_variables` parameter, this is a breaking change for all existing 3rd-party backends. Or should we group `set_indexes` with the other xarray decoder kwargs? This would feel a bit odd to me as setting indexes is different from decoding data.
3. Or should we leave this up to the backends?
   - pros: no breaking change, more flexible (3rd party backends may want to offer more control like choosing between custom indexes and default pandas indexes or skipping the creation of indexes by default)
   - cons: less discoverable, consistency is not enforced across 3rd party backends (although for such advanced case this is probably OK), not available by default in every backend.
Currently 1 and 2 are implemented in this PR, although as I write this comment I think that I would prefer 3. I guess this depends on whether we prefer `open_***` vs. `xr.open_dataset(engine=""***"")` and unless I missed something there is still no real consensus about that? (e.g., #7496).
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8051/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
667864088,MDU6SXNzdWU2Njc4NjQwODg=,4285,Awkward array backend?,4160723,open,0,,,38,2020-07-29T13:53:45Z,2023-12-30T18:47:48Z,,MEMBER,,,,"Just curious if anyone here has thoughts on this.
For more context: [Awkward](https://github.com/scikit-hep/awkward-1.0) is like numpy but for arrays of very arbitrary (dynamic) structure.
I don't know much yet about that library (I've just seen [this SciPy 2020 presentation](https://www.youtube.com/watch?v=WlnUF3LRBj4)), but now I could imagine using xarray for dealing with labelled collections of geometrical / geospatial objects like polylines or polygons.
At this stage, any integration between xarray and awkward arrays would be something highly experimental, but I think this might be an interesting case for flexible arrays (and possibly flexible indexes) mentioned in the [roadmap](http://xarray.pydata.org/en/stable/roadmap.html). There is some discussion here: https://github.com/scikit-hep/awkward-1.0/issues/27.
Does anyone see any other potential use case?
cc @pydata/xarray
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4285/reactions"", ""total_count"": 6, ""+1"": 6, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1989356758,I_kwDOAMm_X852kyzW,8447,Improve discoverability of backend engine options,4160723,open,0,,,5,2023-11-12T11:14:56Z,2023-12-12T20:30:28Z,,MEMBER,,,,"### Is your feature request related to a problem?
Backend engine options are not easily discoverable and we need to know them or figure them out before passing them as kwargs to `xr.open_dataset()`.
### Describe the solution you'd like
The solution is similar to the one proposed in #8002 for setting a new index.
The API could look like this:
```python
import xarray as xr
ds = xr.open_dataset(
    file_or_obj,
    engine=xr.backends.engine(""myengine"").with_options(
        option1=True,
        option2=100,
    ),
)
```
where `xr.backends.engine(""myengine"")` returns the `MyEngineBackendEntrypoint` subclass.
We would need to extend the API for `BackendEntrypoint` with a `.with_options()` factory method:
```python
class BackendEntrypoint:
    _open_dataset_options: dict[str, Any]
    @classmethod
    def with_options(cls):
        """"""This backend does not implement `with_options`.""""""
        raise NotImplementedError()
```
Such that
```python
class MyEngineBackendEntrypoint(BackendEntrypoint):
    open_dataset_parameters = (""option1"", ""option2"")
    @classmethod
    def with_options(
        cls,
        option1: bool = False,
        option2: int | None = None,
    ):
        """"""Get the backend with user-defined options.
        Parameters
        -----------
        option1 : bool, optional
            This is option1.
        option2 : int, optional
            This is option2.
        """"""
        obj = cls()
        # maybe validate the given input options
        if option2 is None:
            option2 = 1
        obj._options = {""option1"": option1, ""option2"": option2}
        return obj
    def open_dataset(
        self,
        filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
        *,
        drop_variables: str | Iterable[str] | None = None,
        **kwargs,  # no static checker error (liskov substitution principle)
    ):
        # kwargs passed directly to open_dataset take precedence over options
        # or alternatively raise an error?
        option1 = kwargs.get(""option1"", self._options.get(""option1"", False))
        ...
```
Pros:
- Using `.with_options(...)` would seamlessly work with IDE auto-completion, static type checkers (I guess? I'm not sure how static checkers support entry-points), documentation, etc.
- There is no breaking change (`xr.open_dataset(obj, engine=...)` accepts either a string or a BackendEntrypoint subtype but not yet a BackendEntrypoint object) and this feature could be adopted progressively by existing 3rd-party backends.
Cons:
- The possible duplicated declaration of options among `open_dataset_parameters`, `.with_options()` and `.open_dataset()` does not look super nice but I don't really know how to avoid that.
### Describe alternatives you've considered
A `BackendEntrypoint.with_options()` factory is not really needed and we could just go with `BackendEntrypoint.__init__()` instead. Perhaps `with_options` looks a bit clearer and leaves room for more flexibility in `__init__`, though?
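For comparison, here is a minimal, self-contained sketch of the `__init__`-based alternative. `ToyBackendEntrypoint` and `ToyEngine` are hypothetical stand-ins written for illustration only (they do not subclass or reproduce Xarray's backend classes), and `open_dataset` here just returns the resolved options instead of opening anything.

```python
class ToyBackendEntrypoint:
    # Stand-in for a backend entrypoint base class (assumption, not Xarray's class).
    def __init__(self, **options):
        self._options = dict(options)

    def open_dataset(self, filename_or_obj, **kwargs):
        # kwargs passed at call time take precedence over the stored options.
        resolved = {**self._options, **kwargs}
        resolved['source'] = filename_or_obj
        return resolved


class ToyEngine(ToyBackendEntrypoint):
    # Options are declared once, in the constructor signature.
    def __init__(self, option1=False, option2=1):
        super().__init__(option1=option1, option2=option2)


engine = ToyEngine(option1=True)
result = engine.open_dataset('data.bin', option2=100)
```

With this variant the options are declared in a single signature, at the cost of reserving `__init__` for user-facing options.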
### Additional context
cc @jsignell https://github.com/stac-utils/pystac/issues/846#issuecomment-1405758442","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8447/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1148021907,I_kwDOAMm_X85EbWyT,6293,Explicit indexes: next steps,4160723,open,0,,,3,2022-02-23T12:19:38Z,2023-12-01T09:34:28Z,,MEMBER,,,,"#5692 is ~~not merged yet~~ now merged ~~but~~ and we can ~~already~~ start thinking about the next steps. I’m opening this issue to list and track the remaining tasks. @pydata/xarray, do not hesitate to add a comment below if you think about something that is missing here.
## Continue the refactoring of the internals
Although in #5692 everything seems to work with the current pandas index wrappers for dimension coordinates, not all of Xarray's internals have been refactored yet to fully support (or at least be compatible with) custom indexes. Here is a list of `Dataset` / `DataArray` methods that still need to be checked / updated (this list may be incomplete):
- [ ] `as_numpy` (#8001)
- [ ] `broadcast` (#6430, #6481 )
- [ ] `drop_sel` (#6605, #7699)
- [ ] `drop_isel`
- [ ] `drop_dims`
- [ ] `drop_duplicates` (#8499)
- [ ] `transpose`
- [ ] `interpolate_na`
- [ ] `ffill`
- [ ] `bfill`
- [ ] `reduce`
- [ ] `map`
- [ ] `apply`
- [ ] `quantile`
- [ ] `rank`
- [ ] `integrate`
- [ ] `cumulative_integrate`
- [ ] `filter_by_attrs`
- [ ] `idxmin`
- [ ] `idxmax`
- [ ] `argmin`
- [ ] `argmax`
- [ ] `concat` (partially refactored, may not fully work with multi-dimension indexes)
- [ ] `polyfit`
I ended up following a common pattern in #5692 when adding explicit / flexible index support for various features (it is quite generic, though, the actual procedure may vary from one case to another and many steps may be skipped):
- Check if it’s worth adding a new method to the Xarray `Index` base class. There may be several motivations:
  - Avoid handling Pandas index objects inside Dataset or DataArray methods (even if we don’t plan to fully support custom indexes for everything, it is preferable to put this logic behind the `PandasIndex` or `PandasMultiIndex` wrapper classes for clarity and also if eventually we want to make Xarray less dependent on Pandas)
  - We want a specific implementation rather than relying on the `Variable`’s corresponding method for speed-up or for other reasons, e.g.,
    - `IndexVariable.concat` exists to avoid unnecessary Pandas/Numpy conversions; in #5692 `PandasIndex.concat` has the same logic and will fully replace the former if/once we get rid of `IndexVariable`
    - `PandasIndex.roll` reuses `pandas.Index` indexing and `append` capabilities
  - `Index` API closely follows DataArray, Dataset and Variable API (i.e., same method names) for consistency
- Within the Dataset or DataArray method, first call the `Index` API (if it exists) to create new indexes
  - The `Indexes` class (i.e., the `.xindexes` property returns an instance of this class) provides convenient API for iterating through indexes (e.g., get a list of unique indexes, get all coordinates or dimensions for a given index, etc.)
  - If there’s no implementation for the called `Index` API, either raise an error or fallback to calling the `Variable` API (below) depending on the case
- Create new coordinate variables for each of the new indexes using `Index.create_variables`
  - It is possible to pass a dict of current coordinate variables to `Index.create_variables`; it is used to propagate variable metadata (`dtype`, `attrs` and `encoding`)
  - Not all indexes should create new coordinate variables, only those for which it is possible to reuse index data as coordinate variable data (like Pandas indexes)
- Iterate through the variables and call the `Variable` API (if it exists)
  - Skip new coordinate variables created at the previous step (just reuse it)
- Propagate the indexes that are not affected by the operation and clean up all indexes, i.e., ensure consistency between indexes and coordinate variables
  - There are a couple of convenient methods that have been added in #5692 for that purpose: `filter_indexes_from_coords` and `assert_no_index_corrupted`
- Replace indexes and variables, e.g., using `_replace`, `_replace_with_new_dims` or `_overwrite_indexes` methods
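The pattern above can be sketched with plain Python objects, taking a `roll`-like operation as an example. Everything here is a simplified stand-in: `ToyIndex`, `roll_values` and `roll_all` are hypothetical names, variables are plain lists, and the real Xarray internals involve more steps (metadata propagation, index consistency checks) that are left out.

```python
class ToyIndex:
    # Hypothetical stand-in for an Index subclass wrapping 1-d labels.
    def __init__(self, coord_name, labels):
        self.coord_name = coord_name
        self.labels = list(labels)

    def roll(self, shift):
        return ToyIndex(self.coord_name, roll_values(self.labels, shift))

    def create_variables(self):
        # Reuse the index data as coordinate variable data.
        return {self.coord_name: list(self.labels)}


def roll_values(values, shift):
    # Circular shift of a list, moving items towards higher positions.
    n = len(values)
    k = shift % n if n else 0
    return values[n - k:] + values[:n - k]


def roll_all(variables, indexes, shift):
    # 1. Call the index API first to create the new indexes.
    new_indexes = {name: idx.roll(shift) for name, idx in indexes.items()}
    # 2. Create coordinate variables from the new indexes.
    new_variables = {}
    for idx in new_indexes.values():
        new_variables.update(idx.create_variables())
    # 3. Apply the variable-level operation, skipping variables created in step 2.
    for name, values in variables.items():
        if name not in new_variables:
            new_variables[name] = roll_values(values, shift)
    # 4. Return the replaced variables and indexes together.
    return new_variables, new_indexes


variables = {'x': [10, 20, 30], 'temp': [1.0, 2.0, 3.0]}
indexes = {'x': ToyIndex('x', [10, 20, 30])}
rolled, rolled_indexes = roll_all(variables, indexes, 1)
```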
## Relax all constraints related to “dimension (index) coordinates” in Xarray
- [x] Allow multi-dimensional variables with the name matching one of its dimensions: #2233 #2405 (https://github.com/pydata/xarray/pull/2405#issuecomment-419969570)
- #7989
## Indexes repr
- [x] Add an `Indexes` section to Dataset and DataArray reprs
- #6795
- #7185
- [ ] Make the repr of `Indexes` (i.e., `.xindexes` property) consistent with the repr of `Coordinates` (`.coords` property)
- [x] Add `Index._repr_inline_` for tweaking the inline representation of each index shown in the reprs above
- #7183
## Public API for assigning and (re)setting indexes
There is no public API yet for creating and/or assigning existing indexes to Dataset and DataArray objects.
- [ ] Enable and/or document the `indexes` parameter in Dataset and DataArray constructors
- [ ] Deprecate the implicit creation of pandas multi-index wrappers (and their corresponding coordinates) from anything passed via the `data`, `data_vars` or `coords` arguments in favor of a more explicit way to pass it.
- [ ] https://github.com/pydata/xarray/issues/6633 (pass empty dictionary)
- #6392
- #7214
- #7368
- [x] Add `set_xindex` and `drop_indexes` methods
- #6849
- #6971
- Deprecate `set_index` and `reset_index`? See https://github.com/pydata/xarray/issues/4366#issuecomment-920458966
We still need to figure out how best we can (1) assign existing indexes (possibly with their coordinates) and (2) pass index build options.
## Other public API for index-based operations
To fully leverage the power and flexibility of custom indexes, we might want to update some parts of Xarray’s public API in order to allow passing arbitrary options per index. For example:
- [ ] `sel`: the current `method` and `tolerance` may not be relevant for all indexes, pass extra arguments to Scipy's [cKDTree.query](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.cKDTree.query.html#scipy.spatial.cKDTree.query), etc. #7099
- [ ] `align`: #2217
Also:
- [ ] Make public the `Indexes` API as it provides convenient methods that might be useful for end-users
- [ ] Import the `Index` base class into Xarray’s main namespace (i.e., `xr.Index`)? Also `PandasIndex` and `PandasMultiIndex`? The latter may be useful if we deprecate `set_index(append=True)` and/or if we deprecate “unpacking” `pandas.MultiIndex` objects to coordinates when given as `coords` in the Dataset / DataArray constructors.
- [ ] Add references in docstrings (https://github.com/pydata/xarray/pull/5692#discussion_r820117354).
## Documentation
- [ ] User guide:
- [x] Update the “Terminology” section: “Index” may include custom indexes, review “Dimension coordinate” / “Non-dimension coordinate” as “Indexed coordinate” / “Non-indexed coordinate”
- [ ] Update the “Data structure” section such that it clearly mentions indexes as 1st class citizen of the Xarray data model
- [ ] Maybe update other parts of the documentation that refer to the concept of “dimension coordinate”
- [ ] API reference:
- [ ] add `Indexes` API
- [ ] add `Index` API: #6975
- [ ] Xarray internals: add a subsection on how to add custom indexes, maybe with some basic examples: #6975
- [ ] Update development roadmap section
## Index types and helper classes built in Xarray
- [ ] Since a lot of potential use-cases for custom indexes may consist in adding some extra logic on top of one or more pandas indexes along one or more dimensions (i.e., “meta-indexes”), it might be worth providing a helper `Index` abstract subclass that would basically dispatch the given arguments to the corresponding, encapsulated `PandasIndex` instances and then merge the results
- #7182
- [ ] Deprecate `PandasMultiIndex` dimension coordinate?
## 3rd party indexes
- [ ] Add custom index entrypoint / plugin system, similarly to storage backend entrypoints
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6293/reactions"", ""total_count"": 12, ""+1"": 6, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 6, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1879109770,PR_kwDOAMm_X85ZbILy,8140,Deprecate passing pd.MultiIndex implicitly,4160723,open,0,,,23,2023-09-03T14:01:18Z,2023-11-15T20:15:00Z,,MEMBER,,0,pydata/xarray/pulls/8140,"
- Follow-up #8094
- [x] Closes #6481
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
This PR should normally raise a warning *each time* when indexed coordinates are created implicitly from a `pd.MultiIndex` object.
I updated the tests to create coordinates explicitly using `Coordinates.from_pandas_multiindex()`.
I also refactored some parts where a `pd.MultiIndex` could still be passed and promoted internally, with the exception of:
- `swap_dims()`: it should raise a warning! Right now the warning message is a bit confusing for this case, but instead of adding a special case we should probably deprecate the whole method? As it is suggested as a TODO comment... This method was to circumvent the limitations of dimension coordinates, which isn't needed anymore (`rename_dims` and/or `set_xindex` is equivalent and less confusing).
- `xr.DataArray(pandas_obj_with_multiindex, dims=...)`: I guess it should raise a warning too?
- `da.stack(z=...).groupby(""z"")`: it shouldn't raise a warning, but this requires a (heavy?) refactoring of groupby. While building the ""grouper"" objects, `grouper.group1d` or `grouper.unique_coord` may still be built by extracting only the multi-index dimension coordinate. I'd greatly appreciate if anyone familiar with the groupby implementation could help me with this! @dcherian ?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8140/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1865494976,PR_kwDOAMm_X85Ytlq0,8111,Alignment: allow flexible index coordinate order,4160723,open,0,,,3,2023-08-24T16:18:49Z,2023-09-28T15:58:38Z,,MEMBER,,0,pydata/xarray/pulls/8111,"
- [x] Closes #7002
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
This PR relaxes some of the rules used in alignment for finding the indexes to compare or join together. Those indexes must still be of the same type and must relate to the same set of coordinates (and dimensions), but the order of coordinates is now ignored.
It is up to the index to implement the equal / join logic if it needs to care about that order.
Regarding `pandas.MultiIndex`, it seems that the level names are ignored when comparing indexes:
```python
import pandas as pd
midx = pd.MultiIndex.from_product([[""a"", ""b""], [0, 1]], names=(""one"", ""two""))
midx2 = pd.MultiIndex.from_product([[""a"", ""b""], [0, 1]], names=(""two"", ""one""))
midx.equals(midx2) # True
```
However, in Xarray the names of the multi-index levels (and their order) matter since each level has its own xarray coordinate. In this PR, `PandasMultiIndex.equals()` and `PandasMultiIndex.join()` thus check that the level names match. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8111/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1869879398,PR_kwDOAMm_X85Y8P4c,8118,Add Coordinates `set_xindex()` and `drop_indexes()` methods,4160723,open,0,,,0,2023-08-28T14:28:24Z,2023-09-19T01:53:18Z,,MEMBER,,0,pydata/xarray/pulls/8118,"
- Complements #8102
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`
I don't think that we need to copy most API from Dataset / DataArray to `Coordinates`, but I find it convenient to have some relevant methods there too. For example, building Coordinates from scratch (with custom indexes) before passing the whole coords + indexes bundle around:
```python
import dask.array as da
import numpy as np
import xarray as xr
coords = (
    xr.Coordinates(
        coords={""x"": da.arange(100_000_000), ""y"": np.arange(100)},
        indexes={},
    )
    .set_xindex(""x"", DaskIndex)
    .set_xindex(""y"", xr.indexes.PandasIndex)
)
ds = xr.Dataset(coords=coords)
# <xarray.Dataset>
# Dimensions: (x: 100000000, y: 100)
# Coordinates:
# * x (x) int64 dask.array
# * y (y) int64 0 1 2 3 4 5 6 7 8 9 10 ... 90 91 92 93 94 95 96 97 98 99
# Data variables:
# *empty*
# Indexes:
# x DaskIndex
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8118/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1890893841,I_kwDOAMm_X85wtMAR,8171,Fancy reprs,4160723,open,0,,,10,2023-09-11T16:46:43Z,2023-09-15T21:07:52Z,,MEMBER,,,,"### What is your issue?
In Xarray we already have the plain-text and html reprs, which is great.
Recently, I've tried [anywidget](https://anywidget.dev/) and I think that it has potential to overcome some of the limitations of the current repr and possibly go well beyond it.
The main advantages of anywidget:
- it is broadly compatible with jupyter-like front-ends (Jupyterlab, notebook, vscode, colab, etc.), although I haven't tested it myself on all those front-ends yet.
- it is super easy to get started: almost no project setup (build, packaging) is required before experimenting with it, although it still requires writing Javascript / HTML / CSS, etc..
I don't think we should replace the current html repr (it is still useful to have a basic, pure HTML/CSS version), but having a new widget could improve some aspects like not including the whole CSS each time an object repr is displayed, removing some HTML/CSS hacks... and actually has much more potential since we would have the whole javascript ecosystem at our fingertips (quick plots, etc.). Also bi-directional communication with Python is possible.
I'm opening this issue to brainstorm about what would be nice to have in widget-based Xarray reprs:
- fancy hover effects (e.g., highlight all variables sharing common dimensions, coordinates sharing a common index, etc.)
- more icons next to each variable reprs (attributes, array repr, quick plot? quick map?)
- ... ?
cc @pydata/xarray ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8171/reactions"", ""total_count"": 5, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 2, ""eyes"": 0}",,,13221727,issue
1889195671,I_kwDOAMm_X85wmtaX,8166,Dataset.from_dataframe: deprecate expanding the multi-index,4160723,open,0,,,3,2023-09-10T15:54:31Z,2023-09-11T06:20:50Z,,MEMBER,,,,"### What is your issue?
Let's continue here the discussion about changing the behavior of Dataset.from_dataframe (see https://github.com/pydata/xarray/pull/8140#issuecomment-1712485626).
> The current behaviour of Dataset.from_dataframe where it always unstacks feels wrong to me.
> To me, it seems sensible that Dataset.from_dataframe(df) automatically creates a Dataset with PandasMultiIndex if df has a MultiIndex. The user can then use that or quite easily unstack to a dense or sparse array.
If we no longer unstack the multi-index in `Dataset.from_dataframe`, are we OK that the ""Dataset -> DataFrame -> Dataset"" round-trip will not yield expected results unless we unstack explicitly?
```python
ds = xr.Dataset(
    {""foo"": ((""x"", ""y""), [[1, 2], [3, 4]])},
    coords={""x"": [""a"", ""b""], ""y"": [1, 2]},
)
df = ds.to_dataframe()
ds2 = xr.Dataset.from_dataframe(df, dim=""z"")
ds2.identical(ds) # False
ds2.unstack(""z"").identical(ds) # True
```
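For reference, a minimal sketch of the current behavior, using a made-up DataFrame: `Dataset.from_dataframe` always expands the multi-index into one dimension per level, so the stacked form has to be rebuilt explicitly.

```python
import pandas as pd
import xarray as xr

# hypothetical example data: a DataFrame with a 2-level MultiIndex
midx = pd.MultiIndex.from_product([['a', 'b'], [1, 2]], names=('x', 'y'))
df = pd.DataFrame({'foo': [1, 2, 3, 4]}, index=midx)

# current behavior: the multi-index is unstacked into dimensions 'x' and 'y'
ds = xr.Dataset.from_dataframe(df)
print(dict(ds.sizes))

# the stacked (multi-indexed) form must be rebuilt explicitly
stacked = ds.stack(z=['x', 'y'])
print(dict(stacked.sizes))
```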
cc @max-sixty @dcherian
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8166/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1889751633,PR_kwDOAMm_X85Z-5v1,8170,Dataset.from_dataframe: optionally keep multi-index unexpanded,4160723,open,0,,,0,2023-09-11T06:20:17Z,2023-09-11T06:20:17Z,,MEMBER,,1,pydata/xarray/pulls/8170,"
- [x] Closes #8166
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`
I added both the `unstack` and `dim` arguments but we can change that.
- [ ] update `DataArray.from_series()`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8170/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1879864306,PR_kwDOAMm_X85ZdmTF,8142,Dirty workaround for mypy 1.5 error,4160723,closed,0,,,8,2023-09-04T09:21:18Z,2023-09-07T16:04:55Z,2023-09-07T08:21:12Z,MEMBER,,0,pydata/xarray/pulls/8142,"I wanted to fix the following error with mypy 1.5:
```
xarray/core/dataset.py:505: error: Definition of ""__eq__"" in base class ""DatasetOpsMixin"" is incompatible with definition in base class ""Mapping"" [misc]
```
This looks similar to https://github.com/python/mypy/issues/9319. It is odd that it worked with mypy versions < 1.5, though.
I don't know if there is a better fix, but I thought that redefining `__eq__` in `Dataset` would be a slightly less dirty workaround than adding `type: ignore` in the class declaration.
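A minimal sketch of the kind of workaround I mean, with made-up class names (`OpsMixin` and `Container` stand in for `DatasetOpsMixin` and `Dataset`; this is an illustration under those assumptions, not the actual diff, and whether mypy still needs an ignore comment on the method depends on the real annotations):

```python
from collections.abc import Iterator, Mapping
from typing import Any


class OpsMixin:
    # stand-in for a mixin whose __eq__ returns something other than bool
    def __eq__(self, other: Any) -> Any:
        return ('eq', other)


class Container(OpsMixin, Mapping):
    def __init__(self, data: dict) -> None:
        self._data = dict(data)

    def __getitem__(self, key: Any) -> Any:
        return self._data[key]

    def __iter__(self) -> Iterator:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    # redefine __eq__ in the class itself, so that the definition in use is
    # explicit instead of being picked between the two base classes
    def __eq__(self, other: Any) -> Any:
        return super().__eq__(other)


c = Container({'a': 1})
print(c == 1)
```

At runtime the redefined method simply delegates to the mixin, so behavior is unchanged.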
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8142/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1879652439,PR_kwDOAMm_X85Zc4ub,8141,Fix doctests: pandas 2.1 MultiIndex repr with nan,4160723,closed,0,,,0,2023-09-04T07:08:55Z,2023-09-05T08:35:37Z,2023-09-05T08:35:36Z,MEMBER,,0,pydata/xarray/pulls/8141,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8141/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1880184915,PR_kwDOAMm_X85ZespA,8143,Deprecate the multi-index dimension coordinate,4160723,open,0,,,0,2023-09-04T12:32:36Z,2023-09-04T12:32:48Z,,MEMBER,,0,pydata/xarray/pulls/8143,"
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
This PR adds a `future_no_mindex_dim_coord=False` option that, if set to True, enables the future behavior of `PandasMultiIndex` (i.e., no added dimension coordinate with tuple values):
```python
import xarray as xr
ds = xr.Dataset(coords={""x"": [""a"", ""b""], ""y"": [1, 2]})
ds.stack(z=[""x"", ""y""])
#
# Dimensions: (z: 4)
# Coordinates:
# * z (z) object MultiIndex
# * x (z)
# Dimensions: (z: 4)
# Coordinates:
# * x (z)
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [x] New functions/methods are listed in `api.rst`
This is consistent with the Dataset and DataArray `assign` methods (now that `Coordinates` is also exposed as public API).
This allows writing:
```python
midx = pd.MultiIndex.from_arrays([[""a"", ""a"", ""b"", ""b""], [0, 1, 0, 1]])
midx_coords = xr.Coordinates.from_pandas_multiindex(midx, ""x"")
ds = xr.Dataset(coords=midx_coords.assign(y=[1, 2]))
```
which is quite common (at least in the tests) and a bit nicer than
```python
ds = xr.Dataset(coords=midx_coords.merge({""y"": [1, 2]}).coords)
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8102/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1874412700,PR_kwDOAMm_X85ZLe24,8124,More flexible index variables,4160723,open,0,,,0,2023-08-30T21:45:12Z,2023-08-31T16:02:20Z,,MEMBER,,1,pydata/xarray/pulls/8124,"
- [ ] Closes #xxxx
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`
The goal of this PR is to provide a more general solution to indexed coordinate variables, i.e., support arbitrary dimensions and/or duck arrays for those variables while at the same time prevent them from being updated in a way that would invalidate their index.
This would solve problems like the one mentioned here: https://github.com/pydata/xarray/issues/1650#issuecomment-1697237429
@shoyer I've tried to implement what you have suggested in https://github.com/pydata/xarray/pull/4979#discussion_r589798510. It would be nice indeed if eventually we could get rid of `IndexVariable`. It won't be easy to deprecate it until we finish the index refactor (i.e., all methods listed in #6293), though. Also, I didn't find an easy way to refactor that class as it has been designed too closely around a 1-d variable backed by a `pandas.Index`.
So the approach implemented in this PR is to keep using `IndexVariable` for PandasIndex until we can deprecate / remove it later, and for the other cases use `Variable` with data wrapped in a custom `IndexedCoordinateArray` object.
The latter solution (wrapper) doesn't always work nicely, though. For example, several methods of `Variable` expect that `self._data` directly returns a duck array (e.g., a dask array or a chunked duck array). A wrapped duck array will result in unexpected behavior there. We could probably add some checks / indirection or extend the wrapper API... But I wonder if there wouldn't be a more elegant approach?
More generally, which operations should we allow / forbid / skip for an indexed coordinate variable?
- Set array items in-place? Do not allow.
- Replace data? Do not allow.
- (Re)Chunk?
- Load lazy data?
- ... ?
(Note: we could add `Index.chunk()` and `Index.load()` methods in order to allow an Xarray index to implement custom logic for the latter two cases, e.g., converting a DaskIndex to a PandasIndex during load, see #8128).
cc @andersy005 (some changes made here may conflict with what you are refactoring in #8075).
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8124/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1875631817,PR_kwDOAMm_X85ZPnjq,8128,Add Index.load() and Index.chunk() methods,4160723,open,0,,,0,2023-08-31T14:16:27Z,2023-08-31T15:49:06Z,,MEMBER,,1,pydata/xarray/pulls/8128,"
- [ ] Closes #xxxx
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`
As mentioned in #8124, this gives custom Xarray indexes more control over what to do when the Dataset / DataArray `load()` and `chunk()` counterpart methods are called.
`PandasIndex.load()` and `PandasIndex.chunk()` always return self (no action required).
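A minimal sketch of the pandas case, assuming the method signatures proposed here (the `EagerIndex` class name and the argument names are made up for illustration and are not part of Xarray's API):

```python
import pandas as pd
from xarray.indexes import PandasIndex


class EagerIndex(PandasIndex):
    # hypothetical signatures, mirroring the proposal in this PR
    def load(self, **kwargs):
        # the data is already in memory: nothing to do
        return self

    def chunk(self, chunks, **kwargs):
        # a pandas index cannot be chunked: keep it as is
        return self


idx = EagerIndex(pd.Index([10, 20, 30]), 'x')
print(idx.load() is idx, idx.chunk({'x': 2}) is idx)
```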
For a DaskIndex, we might want to return a PandasIndex (or another non-lazy index) from `load()` and rebuild a DaskIndex object from `chunk()` (rechunk).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8128/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
180638999,MDExOlB1bGxSZXF1ZXN0ODc3MTUzMDM=,1028,"Add `set_index`, `reset_index` and `reorder_levels` methods",4160723,closed,0,,,8,2016-10-03T13:22:24Z,2023-08-30T09:28:26Z,2016-12-27T17:03:00Z,MEMBER,,0,pydata/xarray/pulls/1028,"Another item in #719.
I added tests and updated the docs, so this is ready for review.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1028/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1864650372,PR_kwDOAMm_X85YqtUk,8109,Better error message when trying to set an index from a scalar coordinate,4160723,closed,0,,,0,2023-08-24T08:18:13Z,2023-08-30T09:27:27Z,2023-08-30T07:13:15Z,MEMBER,,0,pydata/xarray/pulls/8109,"
- [x] Closes #4091
- [x] Tests added
The message suggests using `.expand_dims()`.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8109/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
966983801,MDExOlB1bGxSZXF1ZXN0NzA5MTg3NDY2,5692,Explicit indexes,4160723,closed,0,,,46,2021-08-11T15:57:41Z,2023-08-30T09:26:37Z,2022-03-17T17:11:44Z,MEMBER,,0,pydata/xarray/pulls/5692,"
- [x] Closes many issues:
- [x] closes #1366
- [x] closes #1408
- [x] closes #2489
- [x] closes #3432
- [x] closes #4542
- [x] closes #4955
- [x] closes #5202
- [x] closes #5645
- [x] closes #5691
- [x] closes #5697
- [x] closes #5700
- [x] closes #5727
- [x] closes #5953
- [x] closes #6183
- [x] closes #6313
- [x] Tests added
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- New functions/methods are listed in `api.rst` (new `Index` and `Indexes` API not public yet)
Follow-up on #5636 (work in progress), supersedes #2195.
This is likely to get big, sorry in advance! It'll be safer to make a release before merging this PR.
Current progress:
- [x] create (default) indexes using the `Index` classes
- [x] refactor default indexes created when 1st accessing `.xindexes` or `.indexes`
- [x] support for non-default indexes (no public API yet)
- [x] remove multi-index virtual coordinates (replace it by regular coordinates)
- [x] refactor internal (text / html) formatting functions
- [x] internal refactor of location-based selection (`.isel()`)
- [x] internal refactor of label-based selection (`.sel()`)
- [x] internal refactor of `.rename()`
- Some changes in behavior (see comments below)
- see #4108
- see #4107
- see #4417
- [x] internal refactor of `set_index` / `reset_index`
- [x] internal refactor of `stack` / `unstack`
- Some changes in behavior (see comments below)
- [x] internal refactor of `Dataset.to_stacked_array`
- [x] internal refactor of `swap_dims`
- [x] internal refactor of `expand_dims`
- [x] internal refactor of alignment
- [x] internal refactor of `reindex` and `reindex_like`
- [x] internal refactor of `interp` and `interp_like`
- [x] internal refactor of merge
- [x] internal refactor of concat
- [x] internal refactor of computation
- [x] internal refactor of copy
- [x] internal refactor of `update`, `assign`, `__setitem__`, `del`, `drop_vars`, etc.
- updates must not corrupt multi-coordinate indexes
- [x] internal refactor of `set_coords` and `reset_coords`
- internal refactor of `drop_sel` and `drop_isel` (maybe later)
- [x] internal refactor of `pad`
- [x] internal refactor of `shift`
- [x] internal refactor of `roll`
TODO:
- [x] Uniformize Index API with Xarray's API
- [x] rename `Index.query()` -> `Index.sel()`?
- [x] rename `PandasMultiIndex.from_product()` -> `PandasMultiIndex.stack()`? Add `Index.stack()` and `Index.unstack()`.
- [x] remove `Index.union()` and `Index.intersection()`
- [x] Use `Index.create_variables()` internally
- [x] remove `PandasIndex.from_pandas_index()` and `PandasMultiIndex.from_pandas_index()` (use constructor + `.create_variables()` instead)
- [x] Review where `.xindexes` is used and use private API instead (`._indexes`) if possible for speed
- [x] requires that `_indexes` always returns a mapping
- [x] Use `from __future__ import annotations` in `indexes.py`
- [x] Re-activate default indexes invariant check (with opt-out for some tests)
In next PRs:
- custom `Index.__repr__` and `Index._repr_inline_`
- add an `Indexes` section in `DataArray` / `Dataset` reprs
- update public API (`set_index`, `reset_index`, `drop_indexes`, `Dataset` and `DataArray` constructors, etc.)
- allow multi-dimensional variables with `name` in `var.dims`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5692/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
953235338,MDExOlB1bGxSZXF1ZXN0Njk3MzA3NDc3,5636,Refactor index vs. coordinate variable(s),4160723,closed,0,,,4,2021-07-26T19:54:25Z,2023-08-30T09:21:55Z,2021-08-09T07:56:56Z,MEMBER,,0,pydata/xarray/pulls/5636,"
- [x] Closes #5553
- [x] Tests added
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
This implements option 3 (sort of) described in https://github.com/pydata/xarray/issues/5553#issue-933551030:
- the goal is to avoid wrapping an `xarray.Index` into an `xarray.Variable` and keep those two concepts distinct from each other.
- the `xarray.Index.from_variables` class constructor accepts a dictionary of `xarray.Variable` objects as argument and may (or should?) also return corresponding `xarray.IndexVariable` objects to ensure immutability.
- for `PandasIndex`, the new returned `xarray.IndexVariable` wraps the underlying `pd.Index` via a `PandasIndexingAdapter` (this reverts some changes made in #5102).
- for `PandasMultiIndex`, this PR adds `PandasMultiIndexingAdapter` so that we can wrap the pandas multi-index in separate coordinate variable objects: one for the dimension + one for each level. The level coordinates data internally hold a reference to the dimension coordinate data to avoid indexing the same underlying `pd.MultiIndex` for each of those coordinates (`PandasMultiIndexingAdapter.__getitem__` is memoized for that purpose).
This is very much work in progress, I need to update (or revert) all related parts of Xarray's internals, update tests, etc. At this stage any comment on the approach described above is welcome. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5636/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1485037066,PR_kwDOAMm_X85Ez9Gj,7368,"Expose ""Coordinates"" as part of Xarray's public API",4160723,closed,0,,,31,2022-12-08T16:59:29Z,2023-08-30T09:11:57Z,2023-07-21T20:40:03Z,MEMBER,,0,pydata/xarray/pulls/7368,"
- [x] Closes #7214
- [x] Closes #6392
- [x] xref #6633
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [x] New functions/methods are listed in `api.rst`
This is a rework of #7214. It follows the suggestions made in https://github.com/pydata/xarray/pull/7214#issuecomment-1295283938, https://github.com/pydata/xarray/pull/7214#issuecomment-1297046405 and https://github.com/pydata/xarray/pull/7214#issuecomment-1293774799:
- No `indexes` argument is added to `Dataset.__init__`, and the `indexes` argument of `DataArray.__init__` is kept private (i.e., valid only if fastpath=True)
- When a `Coordinates` object is passed to a new Dataset or DataArray via the `coords` argument, both coordinate variables and indexes are copied/extracted and added to the new object
- This PR also adds ~~an `IndexedCoordinates` subclass~~ `Coordinates` public constructors used to create Xarray coordinates and indexes from non-Xarray objects. For example, the `Coordinates.from_pandas_multiindex()` class method creates a new set of index and coordinates from an existing `pd.MultiIndex`.
EDIT: `IndexedCoordinates` has been merged with `Coordinates`
EDIT2: it ended up as a pretty big refactor with the promotion of `Coordinates` as a 2nd-class Xarray container that supports alignment like Dataset and DataArray. It is still quite advanced API, useful for passing coordinate variables and indexes around. Internally, `Coordinates` objects are still ""virtual"" containers (i.e., proxies for coordinate variables and indexes stored in their corresponding DataArray or Dataset objects). For now, a ""stand-alone"" `Coordinates` object created from scratch wraps a Dataset with no data variables.
Some examples of usage:
```python
import pandas as pd
import xarray as xr
midx = pd.MultiIndex.from_product([[""a"", ""b""], [1, 2]], names=(""one"", ""two""))
coords = xr.Coordinates.from_pandas_multiindex(midx, ""x"")
# Coordinates:
# * x (x) object MultiIndex
# * one (x) object 'a' 'a' 'b' 'b'
# * two (x) int64 1 2 1 2
ds = xr.Dataset(coords=coords)
#
# Dimensions: (x: 4)
# Coordinates:
# * x (x) object MultiIndex
# * one (x) object 'a' 'a' 'b' 'b'
# * two (x) int64 1 2 1 2
# Data variables:
# *empty*
ds_to_be_deprecated = xr.Dataset(coords={""x"": midx})
ds_to_be_deprecated.identical(ds)
# True
da = xr.DataArray([1, 2, 3, 4], dims=""x"", coords=ds.coords)
#
# array([1, 2, 3, 4])
# Coordinates:
# * x (x) object MultiIndex
# * one (x) object 'a' 'a' 'b' 'b'
# * two (x) int64 1 2 1 2
```
TODO:
- [x] update `assign_coords` too so it has the same behavior if a `Coordinates` object is passed?
- [x] How to avoid building any default index? It seems silly to add or use the `indexes` argument just for that purpose? ~~We could address that later.~~ Solution: wrap the coordinates dict in a `Coordinates` object, e.g., `ds = xr.Dataset(coords=xr.Coordinates(coords_dict))`.
@shoyer, @dcherian, anyone -- what do you think about the approach proposed here? I'd like to check that with you before going further with tests, docs, etc.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7368/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1422543378,PR_kwDOAMm_X85BgRaG,7214,Pass indexes directly to the DataArray and Dataset constructors,4160723,closed,0,,,17,2022-10-25T14:16:44Z,2023-08-30T09:11:56Z,2023-07-18T11:52:11Z,MEMBER,,1,pydata/xarray/pulls/7214,"
- [x] Closes #6392
- [x] Closes #6633 ?
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`
From https://github.com/pydata/xarray/issues/6392#issuecomment-1290454937:
I'm thinking of only accepting one or more instances of [Indexes](https://github.com/pydata/xarray/blob/e678a1d7884a3c24dba22d41b2eef5d7fe5258e7/xarray/core/indexes.py#L1030) as indexes argument in the Dataset and DataArray constructors. The only exception is when `fastpath=True` a mapping can be given directly. Also, when an empty collection of indexes is passed this skips the creation of default pandas indexes for dimension coordinates.
- It is much easier to handle: just check that keys returned by `Indexes.variables` do not conflict with the coordinate names in the `coords` argument
- It is slightly safer: it requires the user to explicitly create an `Indexes` object, thus with less chance to accidentally provide coordinate variables and index objects that do not relate to each other (we could probably add some safeguards in the `Indexes` class itself)
- It is more convenient: an Xarray `Index` may provide a factory method that returns an instance of `Indexes` that we just need to pass as indexes, and we could also do something like `ds = xr.Dataset(indexes=other_ds.xindexes)`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7214/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1863646946,PR_kwDOAMm_X85YnWau,8104,Fix merge with compat=minimal (coord names),4160723,closed,0,,,0,2023-08-23T16:20:48Z,2023-08-30T09:11:18Z,2023-08-30T07:57:35Z,MEMBER,,0,pydata/xarray/pulls/8104,"
- [x] Closes #7405
- [x] Closes #7588
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8104/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1358841264,PR_kwDOAMm_X84-NgIX,6975,Add documentation on custom indexes,4160723,closed,0,,,9,2022-09-01T13:20:00Z,2023-08-30T09:10:34Z,2023-07-17T23:23:22Z,MEMBER,,0,pydata/xarray/pulls/6975,"This PR documents the API of the `Index` base class and adds a guide for creating custom indexes (reworked from https://hackmd.io/Zxw_zCa7Rbynx_iJu6Y3LA). Hopefully it will help anyone experimenting with this feature.
@pydata/xarray your feedback would be very much appreciated! I've been into this for quite some time, so there may be things that seem obvious to me but that you can still find very confusing or non-intuitive. It would then deserve some extra or better explanation.
More specifically, I'm open to any suggestion on how to better illustrate this with clear and succinct examples.
There are other parts of the documentation that still need to be updated regarding the indexes refactor (e.g., ""dimension"" coordinates, `xindexes` property, set/drop indexes, etc.). But I suggest to do that in separate PRs and focus here on creating custom indexes.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6975/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1859437888,PR_kwDOAMm_X85YY-II,8094,Refactor update coordinates to better handle multi-coordinate indexes,4160723,closed,0,,,4,2023-08-21T13:57:38Z,2023-08-30T09:06:28Z,2023-08-29T14:23:29Z,MEMBER,,0,pydata/xarray/pulls/8094,"
- [x] Closes #7563
- [x] Closes #8039
- [x] Closes #8056
- [x] Closes #7885
- [x] Closes #7921
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
This refactor should better handle multi-coordinate indexes when updating (or assigning) new coordinates.
It also fixes, better isolates and better warns about a bunch of deprecated pandas multi-index special cases (i.e., directly passing `pd.MultiIndex` objects or updating a multi-index dimension coordinate). I very much look forward to seeing support for those cases dropped :).
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8094/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1498386428,PR_kwDOAMm_X85FiyaY,7382,Some alignment optimizations,4160723,closed,0,,,4,2022-12-15T12:54:56Z,2023-08-30T09:05:24Z,2023-01-05T21:25:55Z,MEMBER,,0,pydata/xarray/pulls/7382,"
- [x] Benchmark added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
May fix some performance regressions, e.g., see https://github.com/pydata/xarray/issues/7376#issuecomment-1352989233.
@ravwojdyla with this PR `ds.assign(foo=~ds[""d3""])` in your example should be much faster (on par with version 2022.3.0).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7382/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1362148668,PR_kwDOAMm_X84-YVgW,6992,Review (re)set_index,4160723,closed,0,,,1,2022-09-05T15:07:43Z,2023-08-30T09:05:10Z,2022-09-27T10:35:38Z,MEMBER,,0,pydata/xarray/pulls/6992,"
- [x] Closes
- [x] fixes #6946
- [x] fixes #6989
- [x] fixes #6959
- [x] fixes #6969
- [x] fixes #7036
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
Restore behavior prior to the explicit indexes refactor (i.e., refactored but without breaking changes).
TODO:
- [x] review `set_index`
- [x] review `reset_index`
For `reset_index`, the only behavior that is not restored here is the coordinate renamed with a `_` suffix when dropping a single index. This was originally to prevent any coordinate with no index matching a dimension name, which is now irrelevant. That is a quite dirty workaround and I don't know who is relying on it (no complaints yet), but I'm open to restore it if needed (esp. considering that we may later deprecate `reset_index` completely in favor of `drop_indexes` #6971).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6992/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1412901282,PR_kwDOAMm_X85A_96j,7182,add MultiPandasIndex helper class,4160723,open,0,,,2,2022-10-18T09:42:58Z,2023-08-23T16:30:28Z,,MEMBER,,1,pydata/xarray/pulls/7182,"
- [ ] Closes #xxxx
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`
This PR adds a `xarray.indexes.MultiPandasIndex` helper class for building custom, meta-indexes that encapsulate multiple `PandasIndex` instances. Unlike `PandasMultiIndex`, the meta-index classes inheriting from this helper class may encapsulate loosely coupled (pandas) indexes, with coordinates of arbitrary dimensions (each coordinate must be 1-dimensional but an Xarray index may be created from coordinates with differing dimensions).
Early prototype in this [notebook](https://notebooksharing.space/view/3d599addf8bd6b06a6acc241453da95e28c61dea4281ecd194fbe8464c9b296f#displayOptions=)
TODO / TO FIX:
- How to allow custom `__init__` options in subclasses to be passed to all the `type(self)(new_indexes)` calls inside the `MultiPandasIndex` ""base"" class? This could be done via `**kwargs` passed through... However, mypy will certainly complain (Liskov Substitution Principle).
- Is `MultiPandasIndex` a good name for this helper class?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7182/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1364388790,I_kwDOAMm_X85RUuu2,7002,Custom indexes and coordinate (re)ordering,4160723,open,0,,,2,2022-09-07T09:44:12Z,2023-08-23T14:35:32Z,,MEMBER,,,,"### What is your issue?
(From https://github.com/pydata/xarray/issues/5647#issuecomment-946546464).
The current alignment logic (as refactored in #5692) requires that two compatible indexes (i.e., of the same type) relate to one or more coordinates with matching names and in a matching order.
For some multi-coordinate indexes like `PandasMultiIndex` this makes sense. However, for other multi-coordinate indexes (e.g., staggered grid indexes) the order of the coordinates doesn't matter much.
Possible options:
1. Setting new Xarray indexes may reorder the coordinate variables, possibly via `Index.create_variables()`, to ensure consistent order
2. Xarray indexes must implement a `Index.matching_key` abstract property in order to support re-indexing and alignment.
3. Take care of coordinate order (and maybe other things) inside `Index.join` and `Index.equals`, e.g., for `PandasMultiIndex` maybe reorder the levels beforehand.
- pros: more flexible
- cons: not great to implicitly reorder levels if it's a costly operation?
4. Find matching indexes using a two-passes approach: (1) group all indexes by dimension name and (2) check compatibility between the indexes listed in each group.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7002/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
979316661,MDU6SXNzdWU5NzkzMTY2NjE=,5738,Flexible indexes: how to handle possible dimension vs. coordinate name conflicts?,4160723,closed,0,,,4,2021-08-25T15:31:39Z,2023-08-23T13:28:41Z,2023-08-23T13:28:40Z,MEMBER,,,,"Another thing that I've noticed while working on #5692.
Currently it is not possible to have a Dataset with the same name used for both a dimension and a multi-index level. I guess the reason is to prevent some errors like unmatched dimension sizes when eventually the multi-index is dropped with renamed dimension(s) according to the level names (e.g., with `sel` or `unstack`). See #2299.
I'm wondering how we should handle this in the context of flexible / custom indexes:
A. Keep this current behavior as a special case for (pandas) multi-indexes. This would avoid breaking changes but how to support custom indexes that could eventually be used like pandas multi-indexes in `sel` or `stack`?
B. Introduce some tag in `xarray.Index` so that we can identify a multi-coordinate index that behaves like a hierarchical index (i.e., levels may be dropped into a single index/coordinate with dimension renaming)
C. Do not allow any dimension name matching the name of a coordinate attached to a multi-coordinate index. This seems silly?
D. Eventually revert #2353 and let users take care of potential conflicts.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5738/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1175329407,I_kwDOAMm_X85GDhp_,6392,Pass indexes to the Dataset and DataArray constructors,4160723,closed,0,,,6,2022-03-21T12:41:51Z,2023-07-21T20:40:05Z,2023-07-21T20:40:04Z,MEMBER,,,,"### Is your feature request related to a problem?
This is part of #6293 (explicit indexes next steps).
### Describe the solution you'd like
A `Mapping[Hashable, Index]` would probably be the most obvious (optional) value type accepted for the `indexes` argument of the Dataset and DataArray constructors.
pros:
- consistent with the `xindexes` property
cons:
- need to be careful with what is passed as `coords` and `indexes`
- multi-indexes: redundancy and order matters (e.g., pandas multi-index levels)
### An example with a pandas multi-index
Currently a pandas multi-index may be passed directly as one (dimension) coordinate; it is then ""unpacked"" into one dimension (tuple values) coordinate and one or more level coordinates. I would suggest deprecating this behavior in favor of a more explicit (although more verbose) way to pass an existing pandas multi-index:
```python
import pandas as pd
import xarray as xr
pd_idx = pd.MultiIndex.from_product([[""a"", ""b""], [1, 2]], names=(""foo"", ""bar""))
idx = xr.PandasMultiIndex(pd_idx, ""x"")
indexes = {""x"": idx, ""foo"": idx, ""bar"": idx}
coords = idx.create_variables()
ds = xr.Dataset(coords=coords, indexes=indexes)
```
The cases below should raise an error:
```python
ds = xr.Dataset(indexes=indexes)
# ValueError: missing coordinate(s) for index(es): 'x', 'foo', 'bar'
ds = xr.Dataset(
    coords=coords,
    indexes={""x"": idx, ""foo"": idx},
)
# ValueError: missing index(es) for coordinate(s): 'bar'
ds = xr.Dataset(
    coords={""x"": coords[""x""], ""foo"": [0, 1, 2, 3], ""bar"": coords[""bar""]},
    indexes=indexes,
)
# ValueError: conflict between coordinate(s) and index(es): 'foo'
ds = xr.Dataset(
    coords=coords,
    indexes={""x"": idx, ""foo"": idx, ""bar"": xr.PandasIndex([0, 1, 2], ""y"")},
)
# ValueError: conflict between coordinate(s) and index(es): 'bar'
```
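The checks above could be sketched roughly as follows. This is only an illustration in plain Python, not Xarray's implementation: `FakeIndex` and `check_coords_indexes` are hypothetical names, and each index is assumed to know the names of the coordinates it covers (which the `Index` base class does not provide today). Conflicts between coordinate *values* and index data (the `'foo'` case) are not covered.
```python
# Hypothetical sketch: none of these names exist in xarray.
class FakeIndex:
    # Stand-in for an index that knows which coordinates it covers (assumption).
    def __init__(self, coord_names):
        self.coord_names = tuple(coord_names)


def check_coords_indexes(coords, indexes):
    # coords: mapping of coordinate name -> any value
    # indexes: mapping of coordinate name -> FakeIndex
    missing = [name for name in indexes if name not in coords]
    if missing:
        names = ', '.join(repr(name) for name in missing)
        raise ValueError(f'missing coordinate(s) for index(es): {names}')
    for idx in indexes.values():
        for name in idx.coord_names:
            # every coordinate covered by an index must be mapped to that index
            if name not in indexes:
                raise ValueError(f'missing index(es) for coordinate(s): {name!r}')
            if indexes[name] is not idx:
                raise ValueError(
                    f'conflict between coordinate(s) and index(es): {name!r}'
                )
```
With `idx = FakeIndex(['x', 'foo', 'bar'])`, passing `{'x': idx, 'foo': idx}` as indexes raises the missing-index error for `'bar'`, mirroring the second case above.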
Should we raise an error or simply ignore the index in the case below?
```python
ds = xr.Dataset(coords=coords)
# ValueError: missing index(es) for coordinate(s): 'x', 'foo', 'bar'
# or
# create unindexed coordinates 'foo' and 'bar' and a 'x' coordinate with a single pandas index
```
Should we silently reorder the coordinates and/or indexes when the levels are not passed in the right order? It seems odd requiring mapping elements be passed in a given order.
```python
ds = xr.Dataset(coords=coords, indexes={""bar"": idx, ""x"": idx, ""foo"": idx})
list(ds.xindexes.keys())
# [""x"", ""foo"", ""bar""]
```
### How to generalize to any (custom) index?
With the case of multi-index, it is pretty easy to check whether the coordinates and indexes are consistent because we ensure consistent `pd_idx.names` vs. coordinate names and because `idx.get_variables()` returns Xarray `IndexVariable` objects where variable data wraps the pandas multi-index.
However, this may not be easy for other indexes. Some Xarray custom indexes (like a KD-Tree index) likely won't return anything from `.get_variables()` as they don't support wrapping internal data as coordinate data. Right now there's nothing in the Xarray `Index` base class that could help checking consistency between indexes vs. coordinates for *any* kind of index.
How could we solve this?
- A. add a `.coords` property to the Xarray `Index` base class, that returns a `dict[Hashable, IndexVariable]`.
- Ambiguous when an Index is created directly, i.e., like above `xr.PandasMultiIndex(pd_idx, ""x"")`. Should `.coords` return `None` or return the coordinates returned by the last `.get_variables()` call?
- What if different sets of coordinates refer to a common index (e.g., after copying the coordinate variables, etc.)?
- B. add a `.coord_names` property to the Xarray `Index` base class that returns `tuple[Hashable, ...]`, and add a private attribute to `IndexVariable` that returns the index object (or return it via a very lightweight `IndexAdapter` base class used to wrap variable data).
- `Index.get_variables(variables)` would by default return shallow copies of the input variables with a reference to the index object.
- If that's necessary, we could also store the coordinate dimensions in `coord_names`, i.e., using `tuple[tuple[Hashable, tuple[Hashable, ...]], ...]`.
I think I prefer the second option.
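To make option B a bit more concrete, here is a minimal sketch in plain Python. `SketchIndex`, `SketchVariable` and the private `_index` attribute are hypothetical stand-ins, not Xarray classes; the sketch only shows `coord_names` plus a default `get_variables()` that returns shallow copies carrying a reference to the index.
```python
# Hypothetical sketch of option B; these are not xarray's real classes.
import copy


class SketchVariable:
    def __init__(self, dims, data):
        self.dims = dims
        self.data = data
        self._index = None  # private back-reference to the index (assumption)


class SketchIndex:
    def __init__(self, coord_names):
        self._coord_names = tuple(coord_names)

    @property
    def coord_names(self):
        return self._coord_names

    def get_variables(self, variables):
        # Default behaviour: shallow copies of the inputs that reference the index.
        new_variables = {}
        for name in self._coord_names:
            var = copy.copy(variables[name])
            var._index = self
            new_variables[name] = var
        return new_variables
```
Checking consistency then reduces to comparing `index.coord_names` with the coordinate names and checking that each coordinate's `_index` is the expected object, without requiring the index to wrap its internal data as coordinate data.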
### Describe alternatives you've considered
### Also allow passing index types (and build options) via `indexes`
I.e., `Mapping[Hashable, Index | Type[Index] | tuple[Type[Index], Mapping[Any, Any]]]`, so that new indexes can be created from the passed coordinates at DataArray or Dataset creation.
pros:
- Flexible.
cons:
- This is complicated. Constructing the Dataset / DataArray (with default indexes) first then calling `.set_index` is probably better.
- Hard to deal with multi-index (redundancy of build option, etc.)
### Pass multi-indexes once, grouped by coordinate names
I.e., `indexes` keys accept tuples: `Mapping[Hashable | tuple[Hashable, ...], Index]`
pros:
- No redundancy and easier to check consistency between indexes vs. coordinates
cons:
- Not consistent with the `.xindexes` property
- Complicated when eventually using tuples for coordinate names?
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6392/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1812008663,I_kwDOAMm_X85sAQ7X,8002,Improve discoverability of index build options,4160723,open,0,,,2,2023-07-19T13:54:09Z,2023-07-19T17:48:51Z,,MEMBER,,,,"### Is your feature request related to a problem?
Currently `Dataset.set_xindex(coord_names, index_cls=None, **options)` allows passing index build options (if any) via the `**options` arguments. Those options are not easily discoverable, though (no auto-completion, etc.).
### Describe the solution you'd like
What about something like this?
```python
ds.set_xindex(""x"", MyCustomIndex.with_options(foo=1, bar=True))
# or
ds.set_xindex(""x"", *MyCustomIndex.with_options(foo=1, bar=True))
```
This would require adding a `.with_options()` class method that can be overridden in Index subclasses (optional):
```python
# xarray.core.indexes
class Index:
    @classmethod
    def with_options(cls) -> tuple[type[Self], dict[str, Any]]:
        return cls, {}
```
```python
# third-party code
from xarray.indexes import Index
class MyCustomIndex(Index):
    @classmethod
    def with_options(cls, foo: int = 0, bar: bool = False) -> tuple[type[Self], dict[str, Any]]:
        """"""Set a new MyCustomIndex with options.
        Parameters
        ----------
        foo : int, optional
            The foo option (default: 0).
        bar : bool, optional
            The bar option (default: False).
        """"""
        return cls, {""foo"": foo, ""bar"": bar}
```
Thoughts?
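For illustration, this is how the receiving side might consume the `(class, options)` pair. It is a sketch under assumptions: `SketchIndex` and `build_index` are hypothetical stand-ins written in plain Python, and only the tuple convention comes from the proposal above.
```python
# Hypothetical stand-ins; only the (class, options) convention is from the proposal.
class SketchIndex:
    def __init__(self, variables, **options):
        self.variables = variables
        self.options = options

    @classmethod
    def with_options(cls, **options):
        return cls, options


class MyCustomIndex(SketchIndex):
    @classmethod
    def with_options(cls, foo=0, bar=False):
        # explicit keyword arguments make the options discoverable
        return cls, {'foo': foo, 'bar': bar}


def build_index(variables, index_spec, **options):
    # Accept either an index class or the pair returned by with_options().
    if isinstance(index_spec, tuple):
        index_cls, spec_options = index_spec
        options = {**spec_options, **options}
    else:
        index_cls = index_spec
    return index_cls(variables, **options)
```
Here `build_index({'x': [1, 2]}, MyCustomIndex.with_options(foo=1, bar=True))` creates a `MyCustomIndex` whose options are `{'foo': 1, 'bar': True}`, while passing the bare class keeps working as before.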
### Describe alternatives you've considered
Build options are also likely defined in the Index constructor, e.g.,
```python
# third-party code
from xarray.indexes import Index
class MyCustomIndex(Index):
    def __init__(self, data, foo=0, bar=False):
        ...
```
However, the Index constructor is not public API (only used internally and indirectly in Xarray when setting a new index from existing coordinates).
Any other idea?
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8002/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1307195361,PR_kwDOAMm_X847hz6o,6800,"(scipy 2022 branch) Add an ""options"" argument to Index.from_variables()",4160723,closed,0,,,1,2022-07-17T20:01:00Z,2022-12-08T09:38:50Z,2022-09-02T13:54:46Z,MEMBER,,0,pydata/xarray/pulls/6800,"It allows passing options to the constructor of a custom `Index` subclass, in case there are any relevant build options to expose to users. This could for example be the distance metric chosen for an index based on `sklearn.neighbors.BallTree`, or the CRS definition for a geospatial index.
The `**options` arguments of `Dataset.set_xindex()` are passed through.
An alternative way would be to pass options via coordinate metadata, like the `spatial_ref` coordinate in rioxarray. Perhaps both alternatives may co-exist?
This PR also adds type annotations to `set_xindex()`.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6800/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1357296406,PR_kwDOAMm_X84-IR52,6971,Add set_xindex and drop_indexes methods,4160723,closed,0,,,7,2022-08-31T12:54:35Z,2022-12-08T09:38:13Z,2022-09-28T07:25:15Z,MEMBER,,0,pydata/xarray/pulls/6971,"
- [x] Closes #6849
- [x] Supersedes #6800
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [x] New functions/methods are listed in `api.rst`
This PR adds Dataset and DataArray `.set_xindex` and `.drop_indexes` methods (the latter is also discussed in #4366). I've cherry-picked the relevant commits from the `scipy22` branch and added a few more commits. This PR also allows passing build options to any `Index`.
Some comments and open questions:
- Should we make the `index_cls` argument of `set_xindex` optional?
- I.e., `set_xindex(coord_names, index_cls=None, **options)` where a pandas index is created by default (or a pandas multi-index if several coordinate names are given), provided that the coordinate(s) are valid 1-d candidates.
- This would be redundant with the existing `set_index` method, but this would be convenient if we later deprecate it.
- Should we deprecate `set_index` and `reset_index`? I think we should, but probably not at this point yet.
- There's a special case for multi-indexes where `set_xindex([""foo"", ""bar""], PandasMultiIndex)` adds a dimension coordinate in addition to the ""foo"" and ""bar"" level coordinates so that it is consistent with the rest of Xarray. I find it a bit annoying, though. Probably another motivation for deprecating this dimension coordinate.
- In this PR I also imported the `Index` base class in Xarray's root namespace.
- It is needed for custom indexes and it's just a little more convenient than importing it from `xarray.core.indexes`.
- Should we do the same for `PandasIndex` and `PandasMultiIndex` subclasses? Maybe if one wants to create a custom index inheriting from it. `PandasMultiIndex` factory methods could be also useful if we deprecate passing `pd.MultiIndex` objects as DataArray / Dataset coordinates.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6971/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1363524666,PR_kwDOAMm_X84-c82D,6999,Raise UserWarning when rename creates a new dimension coord,4160723,closed,0,,,2,2022-09-06T16:16:17Z,2022-12-08T09:38:13Z,2022-09-27T09:33:40Z,MEMBER,,0,pydata/xarray/pulls/6999,"
- [x] Closes #6607
- [x] Closes #4107
- [x] Closes #6229
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
Current implemented ""fix"": raise a `UserWarning` and suggest using `swap_dims` (*)
Alternatively, we could:
- revert the breaking change (i.e., create the index again) and raise a `DeprecationWarning` instead
- raise an error instead of a warning
I don't have strong opinions on this, I'm happy to implement another alternative. The downside of reverting the breaking change now is that unfortunately it will introduce a breaking change in the next release, while workarounds are pretty straightforward.
(*) from https://github.com/pydata/xarray/issues/6607#issuecomment-1126587818, doing `ds.set_coords(['lon']).rename(x='lon').set_index(lon='lon')` is working too. With #6971, `.set_xindex('lon')` could work as well.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6999/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1364493817,PR_kwDOAMm_X84-gJCw,7003,Misc. fixes for Indexes with pd.Index objects,4160723,closed,0,,,0,2022-09-07T11:05:02Z,2022-12-08T09:36:51Z,2022-09-23T07:30:38Z,MEMBER,,0,pydata/xarray/pulls/7003,"
- [x] Closes #6987
- [x] Tests added
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7003/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1390999159,PR_kwDOAMm_X84_3QjW,7105,Fix to_index(): return multiindex level as single index,4160723,closed,0,,,4,2022-09-29T14:44:22Z,2022-12-08T09:36:51Z,2022-10-12T14:12:48Z,MEMBER,,0,pydata/xarray/pulls/7105,"
- [x] Closes #6836
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7105/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1193611401,PR_kwDOAMm_X841rm9D,6443,Fix concat with scalar coordinate (wrong index type),4160723,closed,0,,,1,2022-04-05T19:16:30Z,2022-12-08T09:36:50Z,2022-04-06T01:19:48Z,MEMBER,,0,pydata/xarray/pulls/6443,"
- [x] Closes #6434
- [x] Tests added
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6443/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1389632629,PR_kwDOAMm_X84_ywy1,7101,Fix Dataset.assign_coords overwriting multi-index,4160723,closed,0,,,0,2022-09-28T16:21:48Z,2022-12-08T09:36:50Z,2022-09-28T18:02:16Z,MEMBER,,0,pydata/xarray/pulls/7101,"
- [x] Closes #7097
- [x] Tests added
@dcherian the `DeprecationWarning` was ignored by default for `.assign_coords()` because of https://github.com/pydata/xarray/pull/6798#discussion_r924653224. I changed it to `FutureWarning` so that it is shown for both `.assign()` and `.assign_coords()`.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7101/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1324225268,PR_kwDOAMm_X848a7mk,6857,Fix aligned index variable metadata side effect,4160723,closed,0,,,0,2022-08-01T10:57:16Z,2022-12-08T09:36:49Z,2022-08-31T07:16:14Z,MEMBER,,0,pydata/xarray/pulls/6857,"
- [x] Closes #6852
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6857/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1472483025,PR_kwDOAMm_X85EHyv7,7347,Fix assign_coords resetting all dimension coords to default index,4160723,closed,0,,,3,2022-12-02T08:19:01Z,2022-12-08T09:36:49Z,2022-12-02T16:32:40Z,MEMBER,,0,pydata/xarray/pulls/7347,"
- [x] Closes #7346
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7347/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1472470718,I_kwDOAMm_X85XxB6-,7346,assign_coords reset all dimension coords to default (pandas) index,4160723,closed,0,,,0,2022-12-02T08:07:55Z,2022-12-02T16:32:41Z,2022-12-02T16:32:41Z,MEMBER,,,,"### What happened?
See https://github.com/martinfleis/xvec/issues/13#issue-1472023524
### What did you expect to happen?
`assign_coords()` should preserve the index of coordinates that are not updated or not part of a dropped multi-coordinate index.
### Minimal Complete Verifiable Example
See https://github.com/martinfleis/xvec/issues/13#issue-1472023524
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
Xarray version 2022.11.0
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7346/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1151751524,I_kwDOAMm_X85EplVk,6308,xr.doctor(): diagnostics on a Dataset / DataArray ?,4160723,open,0,,,4,2022-02-26T12:10:07Z,2022-11-07T15:28:35Z,,MEMBER,,,,"### Is your feature request related to a problem?
Recently I've been reading through various issue reports here and there (GH issues and discussions, forums, etc.) and I'm wondering if it wouldn't be useful to have some function in Xarray that inspects a Dataset or DataArray and reports a bunch of diagnostics, so that the community could better help troubleshooting performance or other issues faced by users.
It's not always obvious where to look (e.g., number of chunks of a dask array, number of tasks of a dask graph, etc.) to diagnose issues, sometimes even for experienced users.
### Describe the solution you'd like
A `xr.doctor(dataset_or_dataarray)` top-level function (or `Dataset.doctor()` / `DataArray.doctor()` methods) that would perform a battery of checks and return helpful diagnostics, e.g.,
- ""Data variable ""x"" wraps a dask array that contains a lot of tasks, which may affect performance""
- ""Data variable ""x"" wraps a dask array that contains many small chunks""
- ... possibly many other diagnostics?
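As a rough idea of what a single check could look like, here is a sketch in plain Python. Everything in it is an assumption made for illustration: `check_small_chunks` is a hypothetical name, the 1 MB threshold is arbitrary, and `chunks` is expected as per-dimension tuples of chunk lengths in the style of dask's `.chunks` (with at least one chunk per dimension).
```python
# Hypothetical sketch of one diagnostic; xr.doctor() does not exist in xarray.
import math


def check_small_chunks(name, chunks, itemsize, min_chunk_bytes=1_000_000):
    # chunks: per-dimension tuples of chunk lengths, e.g. ((10, 10), (5, 5, 5))
    n_chunks = math.prod(len(c) for c in chunks)
    n_elements = math.prod(sum(c) for c in chunks)
    mean_bytes = n_elements * itemsize / n_chunks
    if mean_bytes < min_chunk_bytes:
        return (
            f'Data variable {name!r} wraps a dask array that contains '
            f'many small chunks ({n_chunks} chunks, ~{mean_bytes:.0f} bytes each)'
        )
    return None  # nothing to report
```
A `doctor()` function could run a list of such checks over every variable and collect the messages that are not `None`.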
### Describe alternatives you've considered
None
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6308/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1322198907,I_kwDOAMm_X85Ozyd7,6849,Public API for setting new indexes: add a set_xindex method?,4160723,closed,0,,,5,2022-07-29T12:38:34Z,2022-09-28T07:25:16Z,2022-09-28T07:25:16Z,MEMBER,,,,"### What is your issue?
xref https://github.com/pydata/xarray/pull/6795#discussion_r932665544 and #6293 (Public API section).
The `scipy22` branch contains the addition of a `.set_xindex()` method to DataArray and Dataset so that participants at the SciPy 2022 Xarray sprint could experiment with custom indexes. After thinking more about it, I'm wondering if it couldn't actually be part of Xarray's public API alongside `.set_index()` (at least for a while).
- Having two methods `.set_xindex()` vs. `.set_index()` would be quite consistent with the `.xindexes` vs. `.indexes` properties that are already there.
- I actually like the `.set_xindex()` API proposed in the `scipy22` branch, i.e., setting one index at a time from one or more coordinates, possibly with build options. While it *could* be possible to support both that and `.set_index()`'s current API (quite specific to pandas multi-indexes) all in one method, it would certainly result in a much more confusing API and internal implementation.
- In the long term we could progressively get rid of `.indexes` and `.set_index()` and/or rename `.xindexes` to `.indexes` and `.set_xindex()` to `.set_index()`.
Thoughts @pydata/xarray?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6849/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1361896826,I_kwDOAMm_X85RLOV6,6989,reset multi-index to single index (level): coordinate not renamed,4160723,closed,0,4160723,,0,2022-09-05T12:45:22Z,2022-09-27T10:35:39Z,2022-09-27T10:35:39Z,MEMBER,,,,"### What happened?
Resetting a multi-index to a single level (i.e., a single index) does not rename the remaining level coordinate to the dimension name.
### What did you expect to happen?
While it is certainly more consistent not to rename the level coordinate here (since an index can be assigned to a non-dimension coordinate now), it breaks from the old behavior. I think it's better not to introduce any breaking change. As discussed elsewhere, we might eventually want to deprecate `reset_index` in favor of `drop_indexes` (#6971).
### Minimal Complete Verifiable Example
```Python
import pandas as pd
import xarray as xr
midx = pd.MultiIndex.from_product([[""a"", ""b""], [1, 2]], names=(""foo"", ""bar""))
ds = xr.Dataset(coords={""x"": midx})
#
# Dimensions: (x: 4)
# Coordinates:
# * x (x) object MultiIndex
# * foo (x) object 'a' 'a' 'b' 'b'
# * bar (x) int64 1 2 1 2
# Data variables:
# *empty*
rds = ds.reset_index(""foo"")
# v2022.03.0
#
#
# Dimensions: (x: 4)
# Coordinates:
# * x (x) int64 1 2 1 2
# foo (x) object 'a' 'a' 'b' 'b'
# Data variables:
# *empty*
# v2022.06.0
#
#
# Dimensions: (x: 4)
# Coordinates:
# foo (x) object 'a' 'a' 'b' 'b'
# * bar (x) int64 1 2 1 2
# Dimensions without coordinates: x
# Data variables:
# *empty*
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6989/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1361626450,I_kwDOAMm_X85RKMVS,6987,Indexes.get_unique() TypeError with pandas indexes,4160723,closed,0,4160723,,0,2022-09-05T09:02:50Z,2022-09-23T07:30:39Z,2022-09-23T07:30:39Z,MEMBER,,,,"@benbovy I also just tested the `get_unique()` method that you mentioned and maybe noticed a related issue here, which I'm not sure is wanted / expected.
Taking the above dataset `ds`, accessing this function results in an error:
```python
> ds.indexes.get_unique()
TypeError: unhashable type: 'MultiIndex'
```
However, for `xindexes` it works:
```python
> ds.xindexes.get_unique()
[]
```
_Originally posted by @lukasbindreiter in https://github.com/pydata/xarray/issues/6752#issuecomment-1236717180_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6987/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1364798843,PR_kwDOAMm_X84-hLRI,7004,Rework PandasMultiIndex.sel internals,4160723,open,0,,,2,2022-09-07T14:57:29Z,2022-09-22T20:38:41Z,,MEMBER,,0,pydata/xarray/pulls/7004,"
- [x] Closes #6838
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
This PR hopefully improves how the labels provided for multi-index level coordinates are handled in `.sel()`.
More specifically, slices are handled in a cleaner way and it is now allowed to provide array-like labels.
`PandasMultiIndex.sel()` relies on the underlying `pandas.MultiIndex` methods like this:
- use ``get_loc`` when every level is provided with a scalar label (no slice, no array)
- always drops the index and returns scalar coordinates for each multi-index level
- use ``get_loc_level`` when only a subset of levels are provided with scalar labels only
- may collapse one or more levels of the multi-index (dropped levels result in scalar coordinates)
- if only one level remains: renames the dimension and the corresponding dimension coordinate
- use ``get_locs`` for all other cases.
- always keeps the multi-index and its coordinates (even if only one item or one level is selected)
This yields a predictable behavior: as soon as one of the provided labels is a slice or array-like, the multi-index and all its level coordinates are kept in the result.
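The dispatch rules above can be sketched as a small pure-Python helper. This is an illustration only, not the PR's code: `choose_lookup` and `is_scalar_label` are hypothetical names, and the helper merely returns the name of the `pandas.MultiIndex` method that would be used.
```python
# Hypothetical helper illustrating the dispatch rules; not the PR's actual code.
import numbers


def is_scalar_label(label):
    return isinstance(label, (str, bytes, numbers.Number))


def choose_lookup(labels, level_names):
    # labels: mapping of level name -> label (scalar, slice or array-like)
    if not all(is_scalar_label(value) for value in labels.values()):
        return 'get_locs'  # any slice or array-like: the multi-index is kept
    if set(labels) == set(level_names):
        return 'get_loc'  # one scalar per level: the index is dropped
    return 'get_loc_level'  # scalars for a subset of levels only
```
For levels `('one', 'two')`, `choose_lookup({'one': 'a', 'two': slice(1, 1)}, ...)` selects `get_locs`, which is why that case keeps the multi-index even though a single item is selected.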
Some cases illustrated below (I compare this PR with an older release due to the errors reported in #6838):
```python
import xarray as xr
import pandas as pd
midx = pd.MultiIndex.from_product([list(""abc""), range(4)], names=(""one"", ""two""))
ds = xr.Dataset(coords={""x"": midx})
#
# Dimensions: (x: 12)
# Coordinates:
# * x (x) object MultiIndex
# * one (x) object 'a' 'a' 'a' 'a' 'b' 'b' 'b' 'b' 'c' 'c' 'c' 'c'
# * two (x) int64 0 1 2 3 0 1 2 3 0 1 2 3
# Data variables:
# *empty*
```
```python
ds.sel(one=""a"", two=0)
# this PR
#
#
# Dimensions: ()
# Coordinates:
# x object ('a', 0)
# one
# Dimensions: ()
# Coordinates:
# x object ('a', 0)
# Data variables:
# *empty*
#
```
```python
ds.sel(one=""a"")
# this PR:
#
#
# Dimensions: (two: 4)
# Coordinates:
# * two (two) int64 0 1 2 3
# one
# Dimensions: (two: 4)
# Coordinates:
# * two (two) int64 0 1 2 3
# Data variables:
# *empty*
#
```
```python
ds.sel(one=slice(""a"", ""b""))
# this PR
#
#
# Dimensions: (x: 8)
# Coordinates:
# * x (x) object MultiIndex
# * one (x) object 'a' 'a' 'a' 'a' 'b' 'b' 'b' 'b'
# * two (x) int64 0 1 2 3 0 1 2 3
# Data variables:
# *empty*
#
# v2022.3.0
#
#
# Dimensions: (two: 8)
# Coordinates:
# * two (two) int64 0 1 2 3 0 1 2 3
# Data variables:
# *empty*
#
```
```python
ds.sel(one=""a"", two=slice(1, 1))
# this PR
#
#
# Dimensions: (x: 1)
# Coordinates:
# * x (x) object MultiIndex
# * one (x) object 'a'
# * two (x) int64 1
# Data variables:
# *empty*
#
# v2022.3.0
#
#
# Dimensions: (x: 1)
# Coordinates:
# * x (x) MultiIndex
# - one (x) object 'a'
# - two (x) int64 1
# Data variables:
# *empty*
#
```
```python
ds.sel(one=[""b"", ""c""], two=[0, 2])
# this PR
#
#
# Dimensions: (x: 4)
# Coordinates:
# * x (x) object MultiIndex
# * one (x) object 'b' 'b' 'c' 'c'
# * two (x) int64 0 2 0 2
# Data variables:
# *empty*
#
# v2022.3.0
#
# ValueError: Vectorized selection is not available along coordinate 'one' (multi-index level)
#
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7004/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
302077805,MDU6SXNzdWUzMDIwNzc4MDU=,1961,"Extend xarray with custom ""coordinate wrappers""",4160723,closed,0,,,10,2018-03-04T11:26:15Z,2022-09-19T08:47:45Z,2022-09-19T08:47:44Z,MEMBER,,,,"Recent and ongoing developments in xarray turn DataArray and Dataset more and more into data wrappers that are extensible at (almost) every level:
- domain-specific methods (accessors)
- io backends (netcdf, raster, zarr, etc.) via an abstract `DataStore` interface
- array backends (numpy, dask, sparse) via multidispatch or hooks (#1938)
- soon custom indexes? (kd-tree, out-of-core indexes... #1603, #1650, #475)
Regarding the latter, I’m thinking about the idea of extending xarray at an even more abstract level, i.e., the possibility of adding / registering ""coordinate wrappers"" to `DataArray` or `Dataset` objects. Basically, it would correspond to adding any *object that allows to do some operation based on one or several coordinates* ~~(I haven’t found any better name than ""coordinate agent"" to describe that)~~.
(EDIT: ""coordinate agents"" may not be quite right here, so I changed that to ""coordinate wrappers"".)
Indexes are a specific case of coordinate wrappers that serve the purpose of indexing. This is built in xarray.
While indexing is enough in 80% of cases, I see a couple of use cases where other coordinate wrappers (built outside of xarray) would be nice to have:
- Grids. For example, [xgcm](https://github.com/xgcm/xgcm) implements operations (interp, diff) on physical axes that may each include several coordinates, depending on the position of the coordinate labels on the axis (center, left…). Other grids define their topology using a greater number of coordinates (e.g., [ugrid](https://github.com/ugrid-conventions/ugrid-conventions)). Storing regridding weights might be another use case?
- Clocks. For example, [xarray-simlab](https://github.com/benbovy/xarray-simlab/) uses one or several coordinates to define the timeline of a computational simulation.
In those examples we usually rely on coordinate attributes and/or classes that encapsulate xarray objects to implement the specific features that we need. While it works, it has limitations and I think it can be improved.
Custom coordinate wrappers would be a way of extending xarray that is very consistent with other current (or considered) extension mechanisms.
This is still a very vague idea and I’m sure that there are lots of details that can be discussed (serialization, etc.).
But before going further, I’d like to know your thoughts @pydata/xarray. Do you think it is a silly idea? Do you have in mind other use cases where custom coordinate wrappers would be useful?
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1961/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
955936490,MDU6SXNzdWU5NTU5MzY0OTA=,5647,Flexible indexes: review the implementation of alignment and merge,4160723,closed,0,,,12,2021-07-29T15:03:23Z,2022-09-07T09:47:13Z,2022-09-07T09:47:13Z,MEMBER,,,,"The current implementation of the `align` function is problematic in the context of flexible indexes because:
- the sizes of the joined indexes are reused for checking compatibility with unlabelled dimension sizes
- the joined indexes are used as indexers to compute the aligned Dataset / DataArray.
This currently works well since a pd.Index can be directly treated as a 1-d array, but this won't always be the case with custom indexes.
I'm opening this issue to gather ideas on how best to handle alignment in a more flexible way (I haven't thought much about this problem yet).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5647/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1325016510,I_kwDOAMm_X85O-iW-,6860,Align with join='override' may update index coordinate metadata,4160723,open,0,,,0,2022-08-01T21:45:13Z,2022-08-01T21:49:41Z,,MEMBER,,,,"### What happened?
It seems that `align(*, join=""override"")` may have affected and still may affect the metadata of index coordinate data in an incorrect way. See the MCVE below.
cf. @keewis' original https://github.com/pydata/xarray/pull/6857#discussion_r934425142.
### What did you expect to happen?
Index coordinate metadata unaffected by alignment (i.e., metadata is passed through object -> aligned object for each object), like for align with other join methods.
### Minimal Complete Verifiable Example
```Python
import xarray as xr
ds1 = xr.Dataset(coords={""x"": (""x"", [1, 2, 3], {""foo"": 1})})
ds2 = xr.Dataset(coords={""x"": (""x"", [1, 2, 3], {""bar"": 2})})
aligned1, aligned2 = xr.align(ds1, ds2, join=""override"")
aligned1.x.attrs
# v2022.03.0 -> {'foo': 1}
# v2022.06.0 -> {'foo': 1, 'bar': 2}
# PR #6857 -> {'foo': 1}
# expected -> {'foo': 1}
aligned2.x.attrs
# v2022.03.0 -> {}
# v2022.06.0 -> {'foo': 1, 'bar': 2}
# PR #6857 -> {'foo': 1, 'bar': 2}
# expected -> {'bar': 2}
aligned11, aligned22 = xr.align(ds1, ds2, join=""inner"")
aligned11.x.attrs
# {'foo': 1}
aligned22.x.attrs
# {'bar': 2}
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:36:15)
[Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 20.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 0.21.2.dev137+g30023a484
pandas: 1.4.0
numpy: 1.22.2
scipy: 1.7.1
netCDF4: 1.5.8
pydap: installed
h5netcdf: 0.11.0
h5py: 3.4.0
Nio: None
zarr: 2.6.1
cftime: 1.5.2
nc_time_axis: 1.2.0
PseudoNetCDF: installed
rasterio: 1.2.10
cfgrib: 0.9.8.5
iris: 3.0.4
bottleneck: 1.3.2
dask: 2022.01.1
distributed: 2022.01.1
matplotlib: 3.4.3
cartopy: 0.20.1
seaborn: 0.11.1
numbagg: 0.2.1
fsspec: 0.8.5
cupy: None
pint: 0.16.1
sparse: 0.13.0
flox: None
numpy_groupies: None
setuptools: 57.4.0
pip: 20.2.4
conda: None
pytest: 6.2.5
IPython: 7.27.0
sphinx: 3.3.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6860/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1322190255,I_kwDOAMm_X85OzwWv,6848,Update API,4160723,closed,0,,,0,2022-07-29T12:30:08Z,2022-07-29T12:30:23Z,2022-07-29T12:30:23Z,MEMBER,,,,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6848/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1176745736,PR_kwDOAMm_X840z4zt,6400,Speed-up multi-index html repr + add display_values_threshold option,4160723,closed,0,,,3,2022-03-22T12:57:37Z,2022-03-29T07:10:22Z,2022-03-29T07:05:32Z,MEMBER,,0,pydata/xarray/pulls/6400,"This adds `PandasMultiIndexingAdapter._repr_html_`, which can greatly speed up the html repr of Xarray objects with multi-indexes.
This optimized `_repr_html_` implementation is now used for formatting the array detailed view of all multi-index coordinates in the html repr, instead of converting the full index and each of its levels to numpy arrays before formatting them.
```python
import xarray as xr
ds = xr.tutorial.load_dataset(""air_temperature"")
da = ds[""air""].stack(z=[...])
da.shape
# (3869000,)
%timeit -n 1 -r 1 da._repr_html_()
# 9.96 ms !
```
- [x] Closes #5529
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6400/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1174675456,PR_kwDOAMm_X840tJ9A,6388,isel: convert IndexVariable to Variable if index is dropped,4160723,closed,0,,,1,2022-03-20T20:29:58Z,2022-03-29T07:10:08Z,2022-03-21T04:47:48Z,MEMBER,,0,pydata/xarray/pulls/6388,"
- [x] Closes #6381
- [x] Tests added
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6388/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
616432851,MDExOlB1bGxSZXF1ZXN0NDE2NTQ0MzE4,4053,Fix html repr in untrusted notebooks (plain text fallback),4160723,closed,0,,,5,2020-05-12T07:38:22Z,2022-03-29T07:10:07Z,2020-05-20T17:06:40Z,MEMBER,,0,pydata/xarray/pulls/4053,"
- [x] Closes #4041
- [x] Tests added
- [x] Passes `isort -rc . && black . && mypy . && flake8`
- [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
This is not very elegant (actually plain text repr is already included in the notebook as `text/plain` mime type but it is ignored when `text/html` mime type is present), but it seems to work. I haven't found a better workaround.
I don't really know if this can be properly tested (I only added a basic test).
Steps to test this fix:
- To ""untrust"" a notebook: open an existing notebook with a simple editor, manually edit one output cell with an xarray object repr, and save the ipynb file.
- Open this notebook with the Notebook app, you should see the plain text repr.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4053/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
849315490,MDExOlB1bGxSZXF1ZXN0NjA4MTEwNjI0,5102,Flexible indexes: add Index base class and xindexes properties,4160723,closed,0,,,10,2021-04-02T16:18:07Z,2022-03-29T07:10:07Z,2021-05-11T08:21:26Z,MEMBER,,0,pydata/xarray/pulls/5102,"This PR clears the path for flexible indexes:
- it adds a new ~~`IndexAdapter`~~ `Index` base class that is meant to be inherited by all xarray-compatible indexes (built-in or 3rd-party)
- `PandasIndexAdapter` now inherits from ~~`IndexAdapter`~~ `Index`
- the `xarray_obj.xindexes` properties return `Index` (`PandasIndexAdapter`) instances. `xarray_obj.indexes` properties still return `pandas.Index` instances.
~~The latter is a breaking change, although I'm not sure if the `indexes` property has been made public yet.~~
This is still work in progress, there are many broken tests that are not fixed yet. (EDIT: all tests should be fixed now).
There are a lot of dirty fixes to avoid circular dependencies and in the many places where we still need direct access to the `pandas.Index` objects, but I'd expect that these will be cleaned up later in the refactoring.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5102/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
893415955,MDExOlB1bGxSZXF1ZXN0NjQ1OTMzODI3,5322,Internal refactor of label-based data selection,4160723,closed,0,,,1,2021-05-17T14:52:49Z,2022-03-29T07:10:07Z,2021-06-08T09:35:54Z,MEMBER,,0,pydata/xarray/pulls/5322,"Xarray label-based data selection now relies on a newly added `xarray.Index.query(self, labels: Dict[Hashable, Any]) -> Tuple[Any, Optional[Index]]` method where:
- `labels` is always a dictionary with coordinate name(s) as key(s) and the corresponding selection label(s) as values
- When calling `.sel` with some coordinate(s)/label(s) pairs, those are first grouped by index so that only the relevant pairs are passed to an `Index.query`
- the returned tuple contains the positional indexers and (optionally) a new index object
For a simple `pd.Index`, `labels` always corresponds to a 1-item dictionary like `{'coord_name': label_values}`, which is not very useful in this case, but this format is useful for `pd.MultiIndex` and will likely be for other, custom indexes.
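As a rough illustration of the grouping step described above, here is a minimal pure-Python sketch. `ToyIndex`, `group_labels_by_index` and `coord_to_index` are hypothetical names made up for this example and are not part of Xarray; only the idea of a `query(labels)` method returning a `(positional indexer, new index or None)` tuple is taken from the description.

```python
from typing import Any, Dict, Hashable, List, Optional, Tuple


class ToyIndex:
    '''Hypothetical stand-in for an index covering one or more coordinates.'''

    def __init__(self, coords: Dict[Hashable, List[Any]]):
        # all coordinates of one index share the same dimension / length
        self.coords = coords

    def query(self, labels: Dict[Hashable, Any]) -> Tuple[List[int], Optional['ToyIndex']]:
        # keep the positions where every requested coordinate matches its label
        size = len(next(iter(self.coords.values())))
        positions = [
            i
            for i in range(size)
            if all(self.coords[name][i] == label for name, label in labels.items())
        ]
        # no new index is created in this toy example
        return positions, None


def group_labels_by_index(coord_to_index, labels):
    # group the (coordinate, label) pairs so that each index is queried once
    grouped = {}
    for name, label in labels.items():
        index = coord_to_index[name]
        grouped.setdefault(id(index), (index, {}))[1][name] = label
    return list(grouped.values())


simple = ToyIndex({'y': [10, 20, 30]})
multi = ToyIndex({'foo': ['a', 'a', 'b'], 'bar': [0, 1, 0]})
coord_to_index = {'y': simple, 'foo': multi, 'bar': multi}

results = {}
for index, index_labels in group_labels_by_index(coord_to_index, {'y': 20, 'foo': 'a', 'bar': 1}):
    positions, _ = index.query(index_labels)
    results[tuple(index_labels)] = positions
```

The single-coordinate index receives a 1-item dictionary, while the multi-coordinate index receives both of its labels in one call, which is the case where the dictionary format pays off.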
While moving the label->positional indexer conversion logic into `PandasIndex.query()`, I've tried to separate `pd.Index` vs `pd.MultiIndex` concerns by adding a new `PandasMultiIndex` wrapper class (it will probably be useful for other things as well) and refactoring the complex logic that was implemented in `convert_label_indexer`. Hopefully it is a bit clearer now.
Working towards a more flexible/generic system, we still need to figure out how to:
- pass index query extra arguments like `method` and `tolerance` for `pd.Index` but in a more generic way
- handle several positional indexers over multiple dimensions possibly returned by a custom ""meta-index"" (e.g., staggered grid index)
- handle the case of positional indexers returned from querying >1 indexes along the same dimension (e.g., multiple coordinates along `x` with a simple `pd.Index`)
- pandas indexes don't need information like the names or shapes of their corresponding coordinate(s) to perform label-based selection, but this kind of information will probably be needed for other indexes (we actually need it for advanced point-wise selection using tree-based indexes in [xoak](https://github.com/xarray-contrib/xoak)).
This could be done in follow-up PRs.
Side note: I've initially tried to return from `xindexes` items for multi-index levels as well (not only index dimensions), but it's probably wiser to save this for later (when we'll tackle the multi-index virtual coordinate refactoring) as there are many places in Xarray where this is clearly not expected.
Happy to hear your thoughts @pydata/xarray.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5322/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
819062172,MDExOlB1bGxSZXF1ZXN0NTgyMjI0MTQ4,4979,Flexible indexes refactoring notes,4160723,closed,0,,,22,2021-03-01T16:57:32Z,2022-03-29T07:09:31Z,2021-03-17T16:47:29Z,MEMBER,,0,pydata/xarray/pulls/4979,"As a preliminary step before I take on the refactoring and implementation of flexible indexes in Xarray for the next few months, I reviewed the status of https://github.com/pydata/xarray/projects/1 and started compiling partially implemented or planned changes, thoughts, etc. into a single document that may serve as a basis for further discussion and implementation work.
It's still very much work in progress (I will update it regularly in the forthcoming days) and it is very open to discussion (we can use this PR for that)!
I'm not sure if Xarray's root folder is a good place for this document, though. We could move this into a new repository in `xarray-contrib` (that could also host other enhancement proposals) if that's necessary.
I'm looking forward to getting started on this and to getting your thoughts/feedback!
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4979/reactions"", ""total_count"": 13, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 7, ""confused"": 0, ""heart"": 3, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
903899735,MDExOlB1bGxSZXF1ZXN0NjU1MTA5NDg0,5385,Cast PandasIndex to pd.(Multi)Index,4160723,closed,0,,,0,2021-05-27T15:15:41Z,2022-03-29T07:09:31Z,2021-05-28T08:28:11Z,MEMBER,,0,pydata/xarray/pulls/5385,"
- [x] Closes #5384
- [x] Tests added
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5385/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1174687047,PR_kwDOAMm_X840tLrz,6389,Re-index: fix missing variable metadata,4160723,closed,0,,,2,2022-03-20T21:11:38Z,2022-03-29T07:09:31Z,2022-03-21T07:53:05Z,MEMBER,,0,pydata/xarray/pulls/6389,"
- [x] Closes #6382
- [x] Tests added
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6389/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1174610081,PR_kwDOAMm_X840s_xU,6385,Fix concat with scalar coordinate,4160723,closed,0,,,0,2022-03-20T16:46:48Z,2022-03-29T07:09:30Z,2022-03-21T04:49:23Z,MEMBER,,0,pydata/xarray/pulls/6385,"
- [x] Closes #6384
- [x] Tests added
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6385/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1174615799,PR_kwDOAMm_X840tAtL,6386,Fix Dataset groupby returning a DataArray,4160723,closed,0,,,0,2022-03-20T17:06:13Z,2022-03-29T07:09:30Z,2022-03-20T18:55:27Z,MEMBER,,0,pydata/xarray/pulls/6386,"
- [x] Closes #6379
- [x] Tests added
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6386/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1175490214,PR_kwDOAMm_X840vt1_,6394,Fix DataArray groupby returning a Dataset,4160723,closed,0,,,0,2022-03-21T14:43:21Z,2022-03-29T07:09:30Z,2022-03-21T15:26:20Z,MEMBER,,0,pydata/xarray/pulls/6394,"
- [x] Closes #6393
- [x] Tests added
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6394/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1174622308,PR_kwDOAMm_X840tBvD,6387,Fix concat with variable or dataarray as dim (propagate attrs),4160723,closed,0,,,1,2022-03-20T17:27:41Z,2022-03-29T07:09:29Z,2022-03-20T18:53:46Z,MEMBER,,0,pydata/xarray/pulls/6387,"
- [x] Closes #6380
- [x] Tests added
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6387/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1183360119,PR_kwDOAMm_X841JuRv,6418,Fix concat with scalar coordinate (dtype),4160723,closed,0,,,0,2022-03-28T12:22:50Z,2022-03-29T07:06:46Z,2022-03-28T16:05:01Z,MEMBER,,0,pydata/xarray/pulls/6418,"
- [x] Closes #6416
- [x] Tests added
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6418/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
968796847,MDU6SXNzdWU5Njg3OTY4NDc=,5697,Coerce the labels passed to Index.query to array-like objects,4160723,closed,0,,,3,2021-08-12T13:09:40Z,2022-03-17T17:11:43Z,2022-03-17T17:11:43Z,MEMBER,,,,"When looking at #5691 I noticed that the labels are sometimes coerced to arrays (i.e., #3153) but not always.
Later in `PandasIndex.query` those may again be coerced to arrays (i.e., `_as_array_tuplesafe`). In #5692 (https://github.com/pydata/xarray/pull/5692/commits/a551c7f05abf90a492fb59068b59ebb2bac8cb4c) they are always coerced to arrays before possibly being converted to scalars.
Shouldn't we therefore make things easier and ensure that the labels given to `xarray.Index.query()` always have an array interface? This would also yield a more predictable behavior to anyone who wants to implement custom xarray indexes.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5697/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
968990058,MDU6SXNzdWU5Njg5OTAwNTg=,5700,Selection with multi-index and float32 values,4160723,closed,0,,,0,2021-08-12T14:55:11Z,2022-03-17T17:11:43Z,2022-03-17T17:11:43Z,MEMBER,,,,"I guess it's rather an edge case, but an issue similar to the one fixed in #3153 may occur with multi-indexes:
```python
>>> foo_data = ['a', 'a', 'b', 'b']
>>> bar_data = np.array([0.1, 0.2, 0.7, 0.9], dtype=np.float32)
>>> da = xr.DataArray([1, 2, 3, 4], dims=""x"", coords={""foo"": (""x"", foo_data), ""bar"": (""x"", bar_data)})
>>> da = da.set_index(x=[""foo"", ""bar""])
```
```python
>>> da.sel(bar=0.1)
KeyError: 0.1
```
```python
>>> da.sel(bar=np.array(0.1, dtype=np.float32).item())
array([1])
Coordinates:
* foo (foo) object 'a'
```
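The mismatch comes from the float32 round trip: the float32 value closest to 0.1 is not the same number as the Python float (float64) `0.1`. A small sketch of this, using the standard-library `struct` module instead of NumPy purely for illustration:

```python
import struct

# round-trip 0.1 through a 32-bit float, as happens when it is stored as float32
as_float32 = struct.unpack('f', struct.pack('f', 0.1))[0]

# the float32 value converted back to a Python float differs from the literal 0.1
labels_match = as_float32 == 0.1
difference = abs(as_float32 - 0.1)
```

Here `labels_match` is `False` even though the difference is tiny, which is why an exact label lookup with the float64 literal fails.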
(xarray version: 0.18.2 as there's a regression introduced in 0.19.0 #5691)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5700/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
955605233,MDU6SXNzdWU5NTU2MDUyMzM=,5645,Flexible indexes: handle renaming coordinate variables,4160723,closed,0,,,0,2021-07-29T08:42:00Z,2022-03-17T17:11:42Z,2022-03-17T17:11:42Z,MEMBER,,,,"We should have some API in `xarray.Index` to update the index when its corresponding coordinate variables are renamed.
This is currently implemented here, where the underlying `pd.Index` name(s) are updated: https://github.com/pydata/xarray/blob/c5530d52d1bcbd071f4a22d471b728a4845ea36f/xarray/core/dataset.py#L3299-L3314
This logic should be moved into `PandasIndex` and `PandasMultiIndex`.
Other, custom indexes might also have internal attributes to update, so we might need formal API for that.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5645/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1005623261,I_kwDOAMm_X8478Jfd,5812,Check explicit indexes when comparing two xarray objects,4160723,open,0,,,2,2021-09-23T16:19:32Z,2021-09-24T15:59:02Z,,MEMBER,,,,"
**Is your feature request related to a problem? Please describe.**
With the explicit index refactor, two Dataset or DataArray objects `a` and `b` may have the same variables / coordinates and attributes but different indexes.
**Describe the solution you'd like**
I'd suggest that `a.identical(b)` by default also checks for equality between `a.xindexes` and `b.xindexes`.
One drawback is when we want to check either the attributes or the indexes but not both. Should we then add options like those suggested in #5733?
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5812/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1006335177,I_kwDOAMm_X847-3TJ,5814,Confusing assertion message when comparing datasets with differing coordinates,4160723,open,0,,,1,2021-09-24T10:50:11Z,2021-09-24T15:17:00Z,,MEMBER,,,,"
**What happened**:
When two datasets `a` and `b` have only differing coordinates, `xr.testing.assert_*` may output a confusing message that also reports differing data variables (although strictly equal/identical) sharing common dimensions with those differing coordinates. I guess it is because when comparing the data variables we compare `DataArray` objects (thus including the coordinates).
**What you expected to happen**:
An output assertion error message that shows only the differing coordinates.
**Minimal Complete Verifiable Example**:
```python
>>> import xarray as xr
>>> a = xr.Dataset(data_vars={""var"": (""x"", [10.0, 11.0])}, coords={""x"": [0, 1]})
>>> b = xr.Dataset(data_vars={""var"": (""x"", [10.0, 11.0])}, coords={""x"": [2, 3]})
>>> xr.testing.assert_equal(a, b)
```
```
AssertionError: Left and right Dataset objects are not equal
Differing coordinates:
L * x (x) int64 0 1
R * x (x) int64 2 3
Differing data variables:
L var (x) float64 10.0 11.0
R var (x) float64 10.0 11.0
```
I would rather expect:
```python
>>> xr.testing.assert_equal(a, b)
```
```
AssertionError: Left and right Dataset objects are not equal
Differing coordinates:
L * x (x) int64 0 1
R * x (x) int64 2 3
```
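One possible way to get such a message, sketched with plain dictionaries rather than Xarray objects: compare coordinates and data variables separately, using only the values of each data variable and not its attached coordinates. The function name `diff_sections` and the dictionary layout are assumptions made for this example, not Xarray internals.

```python
def diff_sections(left, right):
    '''Return the names that differ in each section of two dataset-like dicts.'''
    report = {}
    for section in ('coords', 'data_vars'):
        names = sorted(set(left[section]) | set(right[section]))
        # a name differs if it is missing on one side or its values are unequal
        differing = [n for n in names if left[section].get(n) != right[section].get(n)]
        if differing:
            report[section] = differing
    return report


a = {'coords': {'x': [0, 1]}, 'data_vars': {'var': [10.0, 11.0]}}
b = {'coords': {'x': [2, 3]}, 'data_vars': {'var': [10.0, 11.0]}}

report = diff_sections(a, b)
```

With the two inputs above, only the `coords` section is reported, since `var` holds the same values on both sides.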
**Anything else we need to know?**:
**Environment**:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:36:15)
[Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 20.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.19.1.dev72+ga8d84c703.d20210901
pandas: 1.3.2
numpy: 1.21.2
scipy: 1.7.1
netCDF4: 1.5.6
pydap: installed
h5netcdf: 0.8.1
h5py: 3.3.0
Nio: None
zarr: 2.6.1
cftime: 1.5.0
nc_time_axis: 1.2.0
PseudoNetCDF: installed
rasterio: 1.2.1
cfgrib: 0.9.8.5
iris: 3.0.4
bottleneck: 1.3.2
dask: 2021.01.1
distributed: 2021.01.1
matplotlib: 3.4.3
cartopy: 0.18.0
seaborn: 0.11.1
numbagg: None
fsspec: 0.8.5
cupy: None
pint: 0.16.1
sparse: 0.11.2
setuptools: 57.4.0
pip: 20.2.4
conda: None
pytest: 6.2.5
IPython: 7.27.0
sphinx: 3.3.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5814/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
985162305,MDU6SXNzdWU5ODUxNjIzMDU=,5755,Mypy errors with the last version of _typed_ops.pyi ,4160723,closed,0,,,5,2021-09-01T13:34:52Z,2021-09-13T10:53:16Z,2021-09-13T00:04:54Z,MEMBER,,,,"**What happened**:
Since #5569 I get a lot of mypy errors from `_typed_ops.pyi` (see below). What's weird is that it is not happening in all cases:
```
$ mypy # ok
$ mypy . # errors
$ pre-commit run --all-files # ok
$ pre-commit run # errors
$ git commit # (via pre-commit hooks) errors
```
I also tried `pre-commit clean` with no luck. EDIT: I also tried on a freshly cloned xarray repository.
@max-sixty @Illviljan Any idea on what's happening?
**What you expected to happen**:
No mypy error in all cases.
**Anything else we need to know?**:
```
xarray/core/_typed_ops.pyi:32: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:33: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:34: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:35: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:36: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:37: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:38: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:39: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:40: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:41: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:42: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:43: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:44: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:45: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:46: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:47: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:48: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:49: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:50: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:51: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:52: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:53: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:54: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:55: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:56: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:57: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:60: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:61: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:62: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:63: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:64: error: The erased type of self ""xarray.core.dataset.Dataset"" is not a supertype of its class ""xarray.core._typed_ops.DatasetOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:65: error: The erased type of self ""xarray.core.dataset.Dataset"" is not a supertype of its class ""xarray.core._typed_ops.DatasetOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:66: error: The erased type of self ""xarray.core.dataset.Dataset"" is not a supertype of its class ""xarray.core._typed_ops.DatasetOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:67: error: The erased type of self ""xarray.core.dataset.Dataset"" is not a supertype of its class ""xarray.core._typed_ops.DatasetOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:77: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:83: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:89: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:95: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:101: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:107: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:113: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:119: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:125: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:131: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:137: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:143: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:149: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:155: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:161: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:167: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:173: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:179: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:185: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:191: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:197: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:203: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:209: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:215: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:221: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:227: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:230: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:231: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:232: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:233: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:234: error: The erased type of self ""xarray.core.dataarray.DataArray"" is not a supertype of its class ""xarray.core._typed_ops.DataArrayOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:235: error: The erased type of self ""xarray.core.dataarray.DataArray"" is not a supertype of its class ""xarray.core._typed_ops.DataArrayOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:236: error: The erased type of self ""xarray.core.dataarray.DataArray"" is not a supertype of its class ""xarray.core._typed_ops.DataArrayOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:237: error: The erased type of self ""xarray.core.dataarray.DataArray"" is not a supertype of its class ""xarray.core._typed_ops.DataArrayOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:247: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:253: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:259: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:265: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:271: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:277: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:283: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:289: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:295: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:301: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:307: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:313: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:319: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:325: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:331: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:337: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:343: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:349: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:355: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:361: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:367: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:373: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:379: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:385: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:391: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:397: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:400: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:401: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:402: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:403: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:404: error: The erased type of self ""xarray.core.variable.Variable"" is not a supertype of its class ""xarray.core._typed_ops.VariableOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:405: error: The erased type of self ""xarray.core.variable.Variable"" is not a supertype of its class ""xarray.core._typed_ops.VariableOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:406: error: The erased type of self ""xarray.core.variable.Variable"" is not a supertype of its class ""xarray.core._typed_ops.VariableOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:407: error: The erased type of self ""xarray.core.variable.Variable"" is not a supertype of its class ""xarray.core._typed_ops.VariableOpsMixin"" [misc]
```
**Environment**:
mypy 0.910
python 3.9.6 (also tested with 3.8)
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5755/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
977149831,MDU6SXNzdWU5NzcxNDk4MzE=,5732,Coordinates implicitly created when passing a DataArray as coord to Dataset constructor,4160723,open,0,,,3,2021-08-23T15:20:37Z,2021-08-24T14:18:09Z,,MEMBER,,,,"I stumbled on this while working on #5692. Is this intended behavior or unwanted side effect?
**What happened**:
Creating a new Dataset by passing a DataArray object as a coordinate also adds the DataArray's coordinates to the dataset:
```python
>>> foo = xr.DataArray([1.0, 2.0, 3.0], coords={""x"": [0, 1, 2]}, dims=""x"")
>>> ds = xr.Dataset(coords={""foo"": foo})
>>> ds
Dimensions: (x: 3)
Coordinates:
* x (x) int64 0 1 2
foo (x) float64 1.0 2.0 3.0
Data variables:
*empty*
```
**What you expected to happen**:
The behavior above seems a bit counter-intuitive to me. I would rather expect no additional coordinates auto-magically added to the dataset, i.e. only one `foo` coordinate in this example:
```python
>>> ds
Dimensions: (x: 3)
Coordinates:
foo (x) float64 1.0 2.0 3.0
Data variables:
*empty*
```
**Environment**:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Nov 27 2020, 19:17:44)
[Clang 11.0.0 ]
python-bits: 64
OS: Darwin
OS-release: 20.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.19.0
pandas: 1.1.5
numpy: 1.21.1
scipy: 1.7.0
netCDF4: 1.5.5.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.3.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.07.2
distributed: 2021.07.2
matplotlib: 3.3.3
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.3.1
conda: None
pytest: 6.1.2
IPython: 7.25.0
sphinx: 3.3.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5732/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
933551030,MDU6SXNzdWU5MzM1NTEwMzA=,5553,Flexible indexes: how best to implement the new data model?,4160723,closed,0,,,2,2021-06-30T10:38:13Z,2021-08-09T07:56:56Z,2021-08-09T07:56:56Z,MEMBER,,,,"Yesterday during the flexible indexes weekly meeting we discussed with @shoyer and @jhamman what would be the best approach to implement the new data model described [here](https://github.com/pydata/xarray/blob/main/design_notes/flexible_indexes_notes.md#1-data-model). In this issue I summarize the implementation of the current data model as well as some suggestions for the new data model along with their pros / cons (I might still be missing important ones!). I don't think there's an easy or ideal solution unfortunately, so @pydata/xarray any feedback would be very welcome!
## Current data model implementation
Currently any (pandas) index is wrapped into an `IndexVariable` object through an intermediate adapter to preserve dtypes and handle explicit indexing. This allows directly reusing the index data as an xarray coordinate variable. For a pandas multi-index, virtual coordinates are created for each level from the `IndexVariable` object wrapping the index. Although relying on ""virtual coordinates"" has more or less worked so far, it is over-complicated. Moreover, this wouldn't work with the new data model where an index may be built from a set of coordinates with different dimensions.
## Proposed alternatives
### Option 1: independent (coordinate) variables and indexes
Indexes and coordinates are loosely coupled, i.e., an `xarray.Index` holds a reference (mapping) to the coordinate variable(s) from which it is built but both manage their own data independently of each other.
Pros:
- separation of concerns.
- we no longer need those complicated adapters for reusing the index data as xarray (virtual) variable(s), which may simplify some xarray internals.
- if we drop an index, that's simple, we just drop it and all its related coordinate variables are left as-is.
- we could theoretically build a (pandas) index from a chunked coordinate, and then when we drop the index we still have this chunked coordinate left untouched.
Cons:
- data duplication
- this would clearly be a regression when using pandas indexes, but maybe less so for other indexes like kd-trees, where adapting those objects for use as coordinate variables wouldn't be easy or even possible.
- what if we want to build a `DataArray` or `Dataset` from one or more existing indexes (pandas or other)? Passing an index, treating it as an array and then re-building an index from this array is not optimal.
- keeping an index and its corresponding coordinate variable(s) in a consistent, in-sync state may be tricky, given that those variables may be mutable (although we could prevent this by encapsulating those variables using a very lightweight wrapper inspired by `IndexVariable`).
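A minimal sketch of the loose coupling described in Option 1, using plain Python stand-ins rather than real xarray classes (`ToyVariable` and `ToyIndex` are hypothetical names, not xarray API). It only illustrates the data duplication and the sync problem listed in the cons above:

```python
class ToyVariable:
    # Stand-in for a mutable coordinate variable.
    def __init__(self, dim, values):
        self.dim = dim
        self.values = list(values)


class ToyIndex:
    # Stand-in for an index that only remembers the names of the
    # coordinates it was built from and keeps its own copy of the labels.
    def __init__(self, coord_names, labels):
        self.coord_names = tuple(coord_names)
        self._positions = {label: i for i, label in enumerate(labels)}

    @classmethod
    def from_variables(cls, variables):
        # Only single-coordinate indexes are handled in this sketch.
        (name, var), = variables.items()
        return cls([name], var.values)

    def query(self, label):
        return self._positions[label]


coords = {'x': ToyVariable('x', [10, 20, 30])}
index = ToyIndex.from_variables(coords)

# Label lookup goes through the index's own copy of the labels.
position_before = index.query(20)

# Mutating the coordinate does not update the index: the two are now
# out of sync, and the index still answers for the old label.
coords['x'].values[1] = 25
position_after = index.query(20)
index_is_stale = 25 not in index._positions
```

Dropping `index` here would leave `coords` untouched, which is the simplicity argument from the pros, at the price of keeping two copies of the labels.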
### Option 2: indexes hold coordinate variables
This is the opposite of the current approach. Here, an `xarray.Index` would wrap one or more `xarray.Variable` objects.
Pros:
- probably easier to keep an index and its corresponding coordinate variable(s) in-sync.
- sharing data between an index and its coordinate variables may be easier.
Cons:
- accessing / iterating through all coordinate variables in a `DataArray` or `Dataset` may be less straightforward.
- when the index is dropped, we might need some logic / API to return the coordinates as new `xarray.Variable` objects with their own data (or should we simply always drop the corresponding coordinates too? maybe not...).
- more responsibility / work for developers who want to provide 3rd party xarray indexes.
### Option 3: intermediate solution
When an index is set (or unset), it returns a new set of coordinate variables to replace the existing ones.
Pros:
- it keeps some separation of concerns, while it allows data sharing through adapters and/or ensures that variables are immutable using lightweight wrappers.
Cons:
- like option 2, more things to take care of for 3rd party xarray index developers.
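A rough sketch of Option 3, again with plain-Python stand-ins (`ReadOnlyVariable`, `ToyIndex` and `set_index` are hypothetical, not xarray API): setting the index returns replacement coordinate variables that share the index data and cannot be reassigned.

```python
class ReadOnlyVariable:
    # Lightweight wrapper exposing index data as an immutable coordinate.
    def __init__(self, dim, values):
        self.dim = dim
        self._values = values  # expected to be an immutable tuple

    @property
    def values(self):
        # No setter is defined, so assigning to .values raises AttributeError.
        return self._values


class ToyIndex:
    def __init__(self, labels):
        self.labels = tuple(labels)

    def create_variables(self, name, dim):
        # The variable wraps the very same tuple as the index: no copy.
        return {name: ReadOnlyVariable(dim, self.labels)}


def set_index(coords, name, index_cls=ToyIndex):
    # coords maps a coordinate name to a (dim, values) pair.
    # Returns the new index and a new coordinate mapping in which the
    # indexed coordinate is replaced by the variable the index created.
    dim, values = coords[name]
    index = index_cls(values)
    new_coords = dict(coords)
    new_coords.update(index.create_variables(name, dim))
    return index, new_coords


coords = {'x': ('x', [10, 20, 30]), 'y': ('y', [1, 2])}
index, new_coords = set_index(coords, 'x')

shares_data = new_coords['x'].values is index.labels
try:
    new_coords['x'].values = (0, 0, 0)
    mutable = True
except AttributeError:
    mutable = False
```

The original `coords` mapping is left as it was, so unsetting the index could follow the same pattern and hand back ordinary variables.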
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5553/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
187859705,MDU6SXNzdWUxODc4NTk3MDU=,1092,Dataset groups,4160723,closed,0,,,20,2016-11-07T23:28:36Z,2021-07-02T19:56:50Z,2021-07-02T19:56:49Z,MEMBER,,,,"EDIT: see https://github.com/pydata/xarray/issues/4118 for ongoing discussion
-------------------
This has probably been suggested already, but similarly to netCDF4 groups it would be nice if we could access `Dataset` data variables, coordinates and attributes via groups.
Currently xarray allows loading a specific netCDF4 group into a `Dataset`. Different groups can be loaded as separate `Dataset` objects, which may be then combined into a single, flat `Dataset`. Yet, in some cases it makes sense to represent data as a single object while it would be convenient to keep some nested structure. For example, a `Dataset` representing data on a staggered grid might have `scalar_vars` and `flux_vars` groups. [Here](https://unidata.ucar.edu/software/netcdf/workshops/2010/groups-types/GroupUses.html) are some potential uses for groups. When there are a lot of data variables and/or attributes, it would also help to have a more concise repr.
I'm thinking of an implementation of `Dataset.groups` that would be specific to xarray, i.e., independent of any backend, and which would easily co-exist with the flat `Dataset`. Backends shouldn't be required to support groups (some existing backends simply don't). It would be up to each backend to map the `Dataset.groups` logic onto its own group logic, if any.
`Dataset.groups` might return a `DatasetGroups` object, which, much like `xarray.core.coordinates.DatasetCoordinates`, would (1) have a reference to the Dataset object, (2) basically consist of a Mapping of group names to data variable/coordinate/attribute names and (3) dynamically create another `Dataset` object (sub-dataset) on `__getitem__`. Keys of `Dataset.groups` should be accessible as attributes, e.g., `ds.groups['scalar_vars'] == ds.scalar_vars`.
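To make the idea concrete, here is a toy sketch of the three points above. `ToyDataset` is a hypothetical plain-Python stand-in for `Dataset`, the `DatasetGroups` class is only an illustration of the proposal, and only data variables are handled:

```python
from collections.abc import Mapping


class DatasetGroups(Mapping):
    # (1) keeps a reference to the parent dataset,
    # (2) maps group names to variable names,
    # (3) builds a sub-dataset on __getitem__.
    def __init__(self, dataset, groups):
        self._dataset = dataset
        self._groups = {k: tuple(v) for k, v in groups.items()}

    def __getitem__(self, key):
        names = self._groups[key]
        return ToyDataset({n: self._dataset.data_vars[n] for n in names})

    def __iter__(self):
        return iter(self._groups)

    def __len__(self):
        return len(self._groups)


class ToyDataset:
    def __init__(self, data_vars, groups=None):
        self.data_vars = dict(data_vars)
        self.groups = DatasetGroups(self, groups or {})

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails: expose group
        # names as attributes, e.g. ds.scalar_vars.
        groups = self.__dict__.get('groups', {})
        try:
            return groups[name]
        except KeyError:
            raise AttributeError(name) from None

    def __eq__(self, other):
        return isinstance(other, ToyDataset) and self.data_vars == other.data_vars


ds = ToyDataset(
    {'temperature': [1.0, 2.0], 'pressure': [3.0, 4.0], 'flux_u': [0.1, 0.2, 0.3]},
    groups={'scalar_vars': ['temperature', 'pressure'], 'flux_vars': ['flux_u']},
)
same = ds.groups['scalar_vars'] == ds.scalar_vars
```

Nothing in this sketch prevents a variable name from appearing in two groups, which is one of the open questions below.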
Questions:
- How to handle hierarchies of > 1 levels (i.e., groups of groups...)?
- How to ensure that a variable / attribute in one group is not also present in another group?
- Case of methods called from groups with `inplace=True`?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1092/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
902009258,MDU6SXNzdWU5MDIwMDkyNTg=,5376,Multi-scale datasets and custom indexes,4160723,open,0,,,6,2021-05-26T08:38:00Z,2021-06-02T08:07:38Z,,MEMBER,,,,"I've been wondering if:
- multi-scale datasets are generic enough to implement some related functionality in Xarray, e.g., as new `Dataset` and/or `DataArray` method(s)
- we could leverage custom indexes for that (see the [design notes](https://github.com/pydata/xarray/blob/master/design_notes/flexible_indexes_notes.md))
I'm thinking of an API that would look like this:
```python
# lazily load a big n-d image (full resolution) as a xarray.Dataset
xyz_dataset = ...
# set a new index for the x/y/z coordinates
# (`reduction` and `pre_compute_scales` are optional and passed
# as arguments to `ImagePyramidIndex`)
xyz_dataset.set_index(
    ('x', 'y', 'z'),
    ImagePyramidIndex,
    reduction=np.mean,
    pre_compute_scales=(2, 2),
)
# get a slice (ImagePyramidIndex will be used to dynamically scale the data
# or load the right pre-computed dataset)
xyz_slice = xyz_dataset.sel_and_rescale(x=slice(...), y=slice(...), z=slice(...))
```
where `ImagePyramidIndex` is not a ""common"" index, i.e., it cannot be used directly with Xarray's `.sel()` nor for data alignment. Using an index here might still make sense for such data extraction and resampling operation IMHO. We could extend the `xarray.Index` API to handle multi-scale datasets, so that `ImagePyramidIndex` could either do the scaling dynamically (maybe using a cache) or just lazily load pre-computed data, e.g., from a [NGFF](https://ngff.openmicroscopy.org/latest/) / OME-Zarr dataset... Both the implementation and functionality can be pretty flexible. Custom options may be passed through the Xarray API either when creating the index or when extracting a data slice.
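As a very rough illustration of the two strategies mentioned above (scale dynamically with a cache, or return pre-computed data), here is a 1-d toy in plain Python. `ToyPyramid1D` and its methods are hypothetical, it is not an `xarray.Index`, and `statistics.mean` stands in for `np.mean` to keep the sketch dependency-free:

```python
from functools import lru_cache
from statistics import mean


class ToyPyramid1D:
    # Hypothetical 1-d stand-in for ImagePyramidIndex: it either returns a
    # pre-computed level or computes (and caches) the level on demand.
    def __init__(self, data, reduction=mean, pre_computed=None):
        self._data = tuple(data)
        self._reduction = reduction
        self._pre_computed = dict(pre_computed or {})
        self._scaled = lru_cache(maxsize=None)(self._compute)

    def _compute(self, scale):
        # Reduce consecutive blocks of `scale` samples to one value each.
        d = self._data
        return tuple(
            self._reduction(d[i:i + scale]) for i in range(0, len(d), scale)
        )

    def level(self, scale):
        if scale in self._pre_computed:
            return self._pre_computed[scale]
        return self._scaled(scale)

    def sel_and_rescale(self, start, stop, scale=1):
        # start/stop are positions at full resolution; the stop bound is
        # rounded up so partially covered blocks are included.
        return self.level(scale)[start // scale:-(-stop // scale)]


pyramid = ToyPyramid1D([1, 2, 3, 4, 5, 6, 7, 8])
coarse = pyramid.level(2)
window = pyramid.sel_and_rescale(2, 6, scale=2)
```

The second call reuses the cached scale-2 level instead of recomputing it; a real implementation would work lazily on chunked n-d data.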
A hierarchical structure of `xarray.Dataset` objects is already discussed in #4118 for multi-scale datasets, but I'm wondering if using indexes could be an alternative approach (it could also be complementary, i.e., `ImagePyramidIndex` could rely on such hierarchical structure under the hood).
I'd see some advantages of the index approach, although this is the perspective from a naive user who is not working with multi-scale datasets:
- it is flexible: the scaling may be done dynamically without having to store the results in a hierarchical collection with some predefined discrete levels
- we don't need to expose anything other than a simple `xarray.Dataset` + a ""black-box"" index in which we abstract away all the implementation details. The API example shown above seems more intuitive to me than having to deal directly with Dataset groups.
- Xarray will provide a plugin system for 3rd party indexes, allowing for more `ImagePyramidIndex` variants. Xarray already provides an extension mechanism (accessors) for methods like `sel_and_rescale` in the example above...
That said, I'd also see the benefits of exposing Dataset groups more transparently to users (in case those are loaded from a store that supports it).
cc @thewtex @joshmoore @d-v-b","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5376/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,,13221727,issue
869721207,MDU6SXNzdWU4Njk3MjEyMDc=,5226,Attributes encoding compatibility between backends,4160723,open,0,,,1,2021-04-28T09:11:19Z,2021-04-28T15:42:42Z,,MEMBER,,,,"**What happened**:
Let's create a Zarr dataset with some ""less common"" dtype and fill value, open it with Xarray and save the dataset as NetCDF:
```python
import xarray as xr
import zarr
g = zarr.group()
g.create('arr', shape=3, fill_value='z', dtype='Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
libhdf5: None
libnetcdf: None
xarray: 0.17.0
pandas: 1.0.3
numpy: 1.18.1
scipy: 1.3.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.8.1
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.11.0
distributed: 2.14.0
matplotlib: 3.1.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 46.1.3.post20200325
pip: 19.2.3
conda: None
pytest: 5.4.1
IPython: 7.13.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5226/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
733077617,MDU6SXNzdWU3MzMwNzc2MTc=,4555,Vectorized indexing (isel) of chunked data with 1D indices gives weird chunks,4160723,open,0,,,1,2020-10-30T10:55:33Z,2021-03-02T17:36:48Z,,MEMBER,,,,"
**What happened**:
Applying `.isel()` on a DataArray or Dataset with chunked data using 1-d indices (either stored in a `xarray.Variable` or a `numpy.ndarray`) gives weird chunks (i.e., a lot of chunks with small sizes).
**What you expected to happen**:
More consistent chunk sizes.
**Minimal Complete Verifiable Example**:
Let's create a chunked DataArray
```python
In [1]: import numpy as np
In [2]: import xarray as xr
In [3]: da = xr.DataArray(np.random.rand(100), dims='points').chunk(50)
In [4]: da
Out[4]:
dask.array, shape=(100,), dtype=float64, chunksize=(50,), chunktype=numpy.ndarray>
Dimensions without coordinates: points
```
Selecting random indices results in a lot of small chunks
```python
In [5]: indices = xr.Variable('nodes', np.random.choice(np.arange(100, dtype='int'), size=10))
In [6]: da_sel = da.isel(points=indices)
In [7]: da_sel.chunks
Out[7]: ((1, 1, 3, 1, 1, 3),)
```
What I would expect
```python
In [8]: da.data.vindex[indices.data].chunks
Out[8]: ((10,),)
```
This works fine with 2+ dimensional indexers, e.g.,
```python
In [9]: indices_2d = xr.Variable(('x', 'y'), np.random.choice(np.arange(100), size=(10, 10)))
In [10]: da_sel_2d = da.isel(points=indices_2d)
In [11]: da_sel_2d.chunks
Out[11]: ((10,), (10,))
```
**Anything else we need to know?**:
I suspect the issue is here:
https://github.com/pydata/xarray/blob/063606b90946d869e90a6273e2e18ed24bffb052/xarray/core/variable.py#L616-L617
In the example above I think we still want vectorized indexing (i.e., call `dask.array.Array.vindex[]` instead of `dask.array.Array[]`).
**Environment**:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.3 | packaged by conda-forge | (default, Jun 1 2020, 17:21:09)
[Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.16.1
pandas: 1.1.3
numpy: 1.19.1
scipy: 1.5.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.19.0
distributed: 2.25.0
matplotlib: 3.3.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 47.3.1.post20200616
pip: 20.1.1
conda: None
pytest: 5.4.3
IPython: 7.16.1
sphinx: 3.2.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4555/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
187873247,MDU6SXNzdWUxODc4NzMyNDc=,1094,Supporting out-of-core computation/indexing for very large indexes,4160723,open,0,,,5,2016-11-08T00:56:56Z,2021-01-26T20:09:12Z,,MEMBER,,,,"(Follow-up of discussion here https://github.com/pydata/xarray/pull/1024#issuecomment-258524115).
xarray + dask.array successfully enable out-of-core computation for very large variables that don't fit in memory. One current limitation is that the indexes of a `Dataset` or `DataArray`, which rely on `pandas.Index`, are still fully loaded into memory (they will soon be loaded eagerly after #1024). In many cases this is not a problem, as the sizes of 1-dimensional indexes are usually much smaller than the sizes of n-dimensional variables or coordinates.
However, this may be problematic in some specific cases where we have to deal with very large indexes. As an example, big unstructured meshes often have coordinates (x, y, z) arranged as 1-d arrays of length that equals the number of nodes, which can be very large!! (See, e.g., [ugrid conventions](http://ugrid-conventions.github.io/ugrid-conventions/)).
It would be very nice if xarray could also help for these use cases. Therefore I'm wondering if (and how) out-of-core support can be extended to indexes and indexing.
I've briefly looked at the documentation on `dask.dataframe`, and a first naive approach I have in mind would be to allow partitioning an index into multiple, contiguous indexes. For label-based indexing, we might for example map `indexing.convert_label_indexer` to each partition and combine the returned indexers.
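A naive sketch of that idea in plain Python, with no dask involved. `partition_labels` and `lookup` are hypothetical helpers (they are not `indexing.convert_label_indexer`), each partition is a small in-memory dict, and labels are assumed to be unique:

```python
def partition_labels(labels, size):
    # Split the labels into contiguous partitions, remembering the
    # global offset of each one.
    return [
        (start, {label: i for i, label in enumerate(labels[start:start + size])})
        for start in range(0, len(labels), size)
    ]


def lookup(partitions, wanted):
    # Map the label lookup over every partition and combine the
    # partition-local positions into global positions.
    found = {}
    for offset, local_index in partitions:
        for label in wanted:
            if label in local_index:
                found[label] = offset + local_index[label]
    missing = [label for label in wanted if label not in found]
    if missing:
        raise KeyError(missing)
    return [found[label] for label in wanted]


labels = [f'node{i}' for i in range(10)]
partitions = partition_labels(labels, size=4)
positions = lookup(partitions, ['node7', 'node0', 'node9'])
```

In an out-of-core setting each partition would be loaded (or built) lazily inside the loop rather than held in memory all at once, which is where something like dask would come in.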
My knowledge of dask is very limited, though. So I've no doubt that this suggestion is very simplistic and not very efficient, or that there are better approaches. I'm also certainly missing other issues not directly related to indexing.
Any thoughts?
cc @shoyer @mrocklin
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1094/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
512564243,MDExOlB1bGxSZXF1ZXN0MzMyNTUyNTA3,3448,Add license for the icons used in the html repr,4160723,closed,0,,,1,2019-10-25T14:57:20Z,2019-10-25T15:48:52Z,2019-10-25T15:40:46Z,MEMBER,,0,pydata/xarray/pulls/3448,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3448/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
249584098,MDExOlB1bGxSZXF1ZXN0MTM1Mjk4ODY3,1507,Detailed report for testing.assert_equal and testing.assert_identical,4160723,closed,0,,,18,2017-08-11T09:38:23Z,2019-10-25T15:07:39Z,2019-01-18T09:16:31Z,MEMBER,,0,pydata/xarray/pulls/1507," - ~~Closes #xxxx~~
- [x] Tests added / passed
- [x] Passes ``git diff upstream/master | flake8 --diff``
- [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
~~In addition to `Dataset` repr, the error message also shows the output of `Dataset.info()` for both datasets.~~
~~This may not be the most elegant solution, but it is helpful when datasets only differ by their attributes attached to coordinates or data variables (not shown in repr). I'm open to any suggestion.~~
The report shows the differences for dimensions, data values (``Variable`` and ``DataArray``), coordinates, data variables and attributes (the latter only for ``testing.assert_identical``).
There are currently not many tests for `xarray.testing` functions, but I'm willing to add more if needed.
Not sure if it's worth a what's new entry (EDIT: added one).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1507/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
274619743,MDExOlB1bGxSZXF1ZXN0MTUzMTE4MjQ3,1723,Fix unexpected behavior of .set_index() since pandas 0.21.0,4160723,closed,0,,,0,2017-11-16T18:37:20Z,2019-10-25T15:07:18Z,2017-11-17T00:54:51Z,MEMBER,,0,pydata/xarray/pulls/1723," - [x] Closes #1722
- [x] Tests added / passed
- [x] Passes ``git diff upstream/master **/*py | flake8 --diff``
- [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1723/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
287844110,MDExOlB1bGxSZXF1ZXN0MTYyNDI2NzU2,1820,WIP: html repr,4160723,closed,0,,,40,2018-01-11T16:33:07Z,2019-10-25T15:06:58Z,2019-10-24T16:48:46Z,MEMBER,,0,pydata/xarray/pulls/1820," - [x] Closes #1627
- [ ] Tests added
- [ ] Tests passed
- [ ] Passes ``git diff upstream/master **/*py | flake8 --diff``
- [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
This is work in progress, although the basic functionality is there. You can see a preview here:
http://nbviewer.jupyter.org/gist/benbovy/3009f342fb283bd0288125a1f7883ef2
TODO:
- [ ] Add support for Multi-indexes
- [ ] Probably good to have some opt-in or fallback system for cases where we (or users) know that the rendering will not work
- [ ] Add some tests
Nice to have (keep this for later):
- Clean-up CSS code and HTML template (track CSS [subgrid support](https://caniuse.com/#feat=css-subgrid) in browsers, this may simplify things a lot here).
- Dynamically adapt cell widths (given the length of the names of variables and dimensions). Currently all cells have a fixed width. This is tricky, though, as we don't use a monospace font here.
- Integration with jupyterlab/notebook themes (CSS classes) and maybe allow custom CSS.
- Integration of Dask arrays HTML repr (+ integration of repr for other array backends).
- Maybe find a way (if possible) to include CSS only once in the notebook (currently it is included each time an xarray object is displayed in an output cell, which is not very nice).
- Review the rules for collapsing the `Coordinates`, `Data variables` and `Attributes` sections (maybe expose them as global options).
- Maybe also define some rules to collapse automatically the data section (DataArray and Variable) when the data repr is too long.
- Maybe add rich representation for `Dataset.coords` and `Dataset.data_vars` as well?
Other thoughts (old)
A big challenge here is to provide both robust and flexible styling (CSS):
- I have tested the current styling in jupyterlab (0.30.6, light theme), notebook (5.2.2) and nbviewer: despite some slight differences it looks quite good!
- However, the current CSS code is a bit fragile (I had to add a lot of `!important`). It could probably be cleaned up and optimized a bit (unfortunately my CSS skills are limited).
- Also, with the jupyterlab's dark theme it looks ugly. We probably need to use jupyterlab CSS variables so that our CSS scheme is compatible with the theme machinery, but at the same time we need to support other front-ends. So we probably need to maintain different stylings (i.e., multiple CSS files, one of them picked-up depending on the front-end), though I don't know if it's easy to automatically detect the front-end (choosing a default style is difficult too).
- The notebook rendering on Github seems to disable style tags (no style is applied to the output, see https://gist.github.com/benbovy/3009f342fb283bd0288125a1f7883ef2). Output is not readable at all in this case, so it might be useful to allow turning off rich output as an option.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1820/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
264747372,MDU6SXNzdWUyNjQ3NDczNzI=,1627,html repr of xarray object (for the notebook),4160723,closed,0,,,39,2017-10-11T21:49:20Z,2019-10-24T16:56:15Z,2019-10-24T16:48:47Z,MEMBER,,,,"Edit: preview for `Dataset` and `DataArray` (pure html/css)
`Dataset`: https://jsfiddle.net/tay08cn9/4/
`DataArray`: https://jsfiddle.net/43z4v2wt/9/
---
I started to think a bit more deeply about what a richer, HTML-based representation of xarray objects could look like, e.g., in jupyter notebooks.
Here are some ideas for `Dataset`: https://jsfiddle.net/9ab4c3tr/35/
Some notes:
- The html repr looks pretty similar to the plain-text repr. I think it's better if they don't differ too much from each other.
- For the sake of consistency, I've stolen some style from the `pandas.DataFrame` repr as it is shown in jupyterlab.
- I tried to emphasize the most important parts of the repr, i.e., the lists of dimensions, coordinates and variables.
- I think it's best if we keep a very lightweight implementation, i.e., pure HTML/CSS (no Javascript). It already allows some interaction like hover effects and collapsible sections. However, I doubt that more fancy stuff (like, e.g., highlighting on hover a specific dimension simultaneously at several places of the repr) would be possible here without Javascript. I have limited skills in this area, though.
These are still, of course, preliminary thoughts. Any feedback/suggestion is welcome, even opinions about whether an html repr is really needed or not!
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1627/reactions"", ""total_count"": 11, ""+1"": 7, ""-1"": 0, ""laugh"": 0, ""hooray"": 4, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
234658224,MDU6SXNzdWUyMzQ2NTgyMjQ=,1447,"Package naming ""conventions"" for xarray extensions",4160723,closed,0,,,5,2017-06-08T21:14:24Z,2019-06-28T22:58:33Z,2019-06-28T21:58:33Z,MEMBER,,,,"I'm wondering what would be a good name for a package that primarily aims at providing an xarray extension (in the form of a `DataArray` and/or `Dataset` accessor).
I'm currently thinking about using a prefix like the `scikit` package family (e.g., `scikit-learn`, `scikit-image`).
For example, for an xarray extension for signal processing we would have:
package full name: `xarray-signal`
package import name: `xrsignal` (like `sklearn`)
accessor name: `signal`.
```python
>>> import xarray as xr
>>> import xrsignal
>>> ds = xr.Dataset()
>>> ds.signal.process(...)
```
The main advantage is that we directly have an idea of what the package is about. It may also be good for the overall visibility of both xarray and its 3rd-party extensions.
The downside is that there are three name variations: one for getting and installing the package, another one for importing the package and yet another one for using the accessor. This may be annoying, especially for new users who are not accustomed to this kind of naming convention.
Conversely, choosing a different, unrelated name like [salem](https://github.com/fmaussion/salem) or [pangaea](https://github.com/snowman2/pangaea) has the advantage of using the same name everywhere and perhaps providing multiple accessors in the same package, but given that the number of xarray extensions is likely to grow in the near future (see, e.g., the [pangeo-data](https://pangeo-data.github.io/) project) it would become difficult to have a clear view of the whole xarray package ecosystem.
Any thoughts?
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1447/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
180676935,MDU6SXNzdWUxODA2NzY5MzU=,1030,Concatenate multiple variables into one variable with a multi-index (categories),4160723,closed,0,,,3,2016-10-03T15:54:23Z,2019-02-25T07:25:40Z,2019-02-25T07:25:40Z,MEMBER,,,,"I often have to deal with datasets in this form (multiple variables of different sizes, each representing different categories, on the same physical dimension but using different names as they have different labels),
```
Dimensions: (wn_band1: 4, wn_band2: 6, wn_band3: 8)
Coordinates:
* wn_band1 (wn_band1) float64 200.0 266.7 333.3 400.0
* wn_band2 (wn_band2) float64 500.0 560.0 620.0 680.0 740.0 800.0
* wn_band3 (wn_band3) float64 1.5e+03 1.643e+03 1.786e+03 1.929e+03 ...
Data variables:
data_band3 (wn_band3) float64 0.7515 0.5302 0.6697 0.9621 0.01815 ...
data_band1 (wn_band1) float64 0.3801 0.6649 0.01884 0.9407
data_band2 (wn_band2) float64 0.8813 0.4481 0.2353 0.9681 0.1085 0.0835
```
where it would be more convenient to have the data re-arranged into the following form (concatenate the variables into a single variable with a multi-index with the labels of both the categories and the physical coordinate):
```
Dimensions: (spectrum: 18)
Coordinates:
* spectrum (spectrum) MultiIndex
- band (spectrum) int64 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 3 3
- wn (spectrum) float64 200.0 266.7 333.3 400.0 500.0 560.0 620.0 ...
Data variables:
data (spectrum) float64 0.3801 0.6649 0.01884 0.9407 0.8813 0.4481 ...
```
The latter would allow using xarray's nice features like `ds.groupby('band').mean()`.
Currently, the best way that I've found to transform the data is something like:
``` python
data = np.concatenate([ds.data_band1, ds.data_band2, ds.data_band3])
wn = np.concatenate([ds.wn_band1, ds.wn_band2, ds.wn_band3])
band = np.concatenate([np.repeat(1, 4), np.repeat(2, 6), np.repeat(3, 8)])
midx = pd.MultiIndex.from_arrays([band, wn], names=('band', 'wn'))
ds2 = xr.Dataset({'data': ('spectrum', data)}, coords={'spectrum': midx})
```
Maybe I'm missing a better way to do this? If not, it would be nice to have a convenience method for this, unless this use case is too rare to be worth it. I'm also not sure at all what would be a good API for such a method.
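One possible shape for such a helper, as a sketch only: `stack_categories` is a hypothetical name, and it uses NumPy alone to return the three flat arrays that the snippet above feeds into `pd.MultiIndex.from_arrays` and `xr.Dataset`. The per-category lengths are derived from the inputs instead of being hard-coded, and the numbers below are illustrative values:

```python
import numpy as np


def stack_categories(data_by_cat, coord_by_cat):
    # data_by_cat / coord_by_cat: dicts mapping a category label to a
    # 1-d sequence; both dicts must share keys and per-key lengths.
    cats = list(data_by_cat)
    for cat in cats:
        if len(data_by_cat[cat]) != len(coord_by_cat[cat]):
            raise ValueError(f'length mismatch for category {cat!r}')
    data = np.concatenate([np.asarray(data_by_cat[c]) for c in cats])
    coord = np.concatenate([np.asarray(coord_by_cat[c]) for c in cats])
    # Repeat each category label once per element of that category.
    category = np.repeat(cats, [len(data_by_cat[c]) for c in cats])
    return category, coord, data


band, wn, data = stack_categories(
    {1: [0.38, 0.66], 2: [0.88, 0.45, 0.24]},
    {1: [200.0, 400.0], 2: [500.0, 650.0, 800.0]},
)
```

The returned `band` and `wn` arrays line up element by element, so they can serve as the two levels of the multi-index.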
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1030/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
349078381,MDExOlB1bGxSZXF1ZXN0MjA3Mjc3NDg2,2357,DOC: move xarray related projects to top-level TOC section,4160723,closed,0,,,1,2018-08-09T10:57:47Z,2018-08-11T13:41:24Z,2018-08-10T20:13:08Z,MEMBER,,0,pydata/xarray/pulls/2357,"Make xarray-related projects more discoverable, as suggested on the xarray mailing list.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2357/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
300588788,MDExOlB1bGxSZXF1ZXN0MTcxNjMxNTQ1,1946,DOC: add main sections to toc,4160723,closed,0,,,0,2018-02-27T11:13:17Z,2018-02-27T21:16:18Z,2018-02-27T19:04:24Z,MEMBER,,0,pydata/xarray/pulls/1946,"Not a big change, but adds a little more clarity IMO.
I'm open to any suggestion for better section names and/or organization. Also I left ""What's new"" at the top, but I'm not sure if ""Getting started"" is the right section.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1946/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
275033174,MDU6SXNzdWUyNzUwMzMxNzQ=,1727,IPython auto-completion triggers data loading,4160723,closed,0,,,11,2017-11-18T00:14:00Z,2017-11-18T07:09:41Z,2017-11-18T07:09:40Z,MEMBER,,,,"I create a big netcdf file like this:
```python
In [1]: import xarray as xr
In [2]: import numpy as np
In [3]: ds = xr.Dataset({'myvar': np.arange(100000000, dtype='float64')})
In [4]: ds.to_netcdf('test.nc')
```
Then, when I open the file in an IPython console and use auto-completion, it triggers loading the data.
```python
In [1]: import xarray as xr
In [2]: ds = xr.open_dataset('test.nc')
In [3]: ds.my # autocompletion with any character -> triggers loading
```
I don't have that issue using the python console. Auto-completion for dictionary access in IPython (#1632) works fine too.
#### Output of ``xr.show_versions()``
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_BE.UTF-8
LOCALE: fr_BE.UTF-8
xarray: 0.10.0rc1-2-gf83361c
pandas: 0.21.0
numpy: 1.13.1
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: 0.5.0
Nio: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.15.4
matplotlib: None
cartopy: None
seaborn: None
setuptools: 36.6.0
pip: 9.0.1
conda: None
pytest: None
IPython: 6.2.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1727/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
274591962,MDU6SXNzdWUyNzQ1OTE5NjI=,1722,Change in behavior of .set_index() from pandas 0.20.3 to 0.21.0,4160723,closed,0,,,1,2017-11-16T17:05:20Z,2017-11-17T00:54:51Z,2017-11-17T00:54:51Z,MEMBER,,,,"I use xarray 0.9.6 for both examples below.
With pandas 0.20.3, `Dataset.set_index` gives me what I expect (i.e., the `grid__x` data variable becomes a coordinate `x`):
```python
In [1]: import xarray as xr
In [2]: import pandas as pd
In [3]: pd.__version__
Out[3]: '0.20.3'
In [4]: ds = xr.Dataset({'grid__x': ('x', [1, 2, 3])})
In [5]: ds.set_index(x='grid__x')
Out[5]:
Dimensions: (x: 3)
Coordinates:
* x (x) int64 1 2 3
Data variables:
*empty*
```
With pandas 0.21.0, it creates a `MultiIndex`, which is not what I expect here when setting an index with only one data variable:
```python
In [1]: import xarray as xr
In [2]: import pandas as pd
In [3]: pd.__version__
Out[3]: '0.21.0'
In [4]: ds = xr.Dataset({'grid__x': ('x', [1, 2, 3])})
In [5]: ds.set_index(x='grid__x')
Out[5]:
Dimensions: (x: 3)
Coordinates:
* x (x) MultiIndex
- grid__x (x) int64 1 2 3
Data variables:
*empty*
```
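A quick way to check which behavior a given environment shows is to inspect the type of the resulting index. This is only a minimal sketch, assuming a recent xarray release where `Dataset.indexes` returns pandas index objects; the names `indexed` and `is_multi` are just for illustration.

```python
import pandas as pd
import xarray as xr

ds = xr.Dataset({'grid__x': ('x', [1, 2, 3])})
indexed = ds.set_index(x='grid__x')

# Expected behavior: a plain pandas index for a single variable, not a MultiIndex.
is_multi = isinstance(indexed.indexes['x'], pd.MultiIndex)
print(type(indexed.indexes['x']).__name__, is_multi)
```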
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1722/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
230631480,MDExOlB1bGxSZXF1ZXN0MTIxOTQyNjMx,1422,xarray.core.variable.as_variable part of the public API,4160723,closed,0,,,6,2017-05-23T08:44:08Z,2017-06-10T18:33:34Z,2017-06-02T17:55:12Z,MEMBER,,0,pydata/xarray/pulls/1422," - [x] Closes #1303
- [x] Tests added / passed
- [x] Passes ``git diff upstream/master | flake8 --diff`` (if we ignore messages for .rst files and ""imported but not used"" messages for `xarray.__init__.py`)
- [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
Make `xarray.core.variable.as_variable` part of the public API and accessible as a top-level function: `xarray.as_variable`.
I changed the docstrings to follow the numpydoc format more closely.
I also removed the `copy=False` keyword argument, as it was apparently unused. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1422/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
134359597,MDU6SXNzdWUxMzQzNTk1OTc=,767,MultiIndex and data selection,4160723,closed,0,,,9,2016-02-17T18:24:00Z,2016-09-14T14:28:29Z,2016-09-14T14:28:29Z,MEMBER,,,,"[Edited for more clarity]
First of all, I find the MultiIndex very useful and I'm looking forward to seeing the TODOs in #719 implemented in the next releases, especially the first three in the list!
Apart from these issues, I think that some other aspects may be improved, notably regarding data selection. Or maybe I've not correctly understood how to deal with multi-index and data selection...
To illustrate this, I use some fake spectral data with two discontinuous bands of different length / resolution:
```
In [1]: import numpy as np; import pandas as pd
In [2]: import xarray as xr
In [3]: band = np.array(['foo', 'foo', 'bar', 'bar', 'bar'])
In [4]: wavenumber = np.array([4050.2, 4050.3, 4100.1, 4100.3, 4100.5])
In [5]: spectrum = np.array([1.7e-4, 1.4e-4, 1.2e-4, 1.0e-4, 8.5e-5])
In [6]: s = pd.Series(spectrum, index=[band, wavenumber])
In [7]: s.index.names = ('band', 'wavenumber')
In [8]: da = xr.DataArray(s, dims='band_wavenumber')
In [9]: da
Out[9]:
array([ 1.70000000e-04, 1.40000000e-04, 1.20000000e-04,
1.00000000e-04, 8.50000000e-05])
Coordinates:
* band_wavenumber (band_wavenumber) object ('foo', 4050.2) ...
```
I extract the band 'bar' using `sel`:
```
In [10]: da_bar = da.sel(band_wavenumber='bar')
In [11]: da_bar
Out[11]:
array([ 1.20000000e-04, 1.00000000e-04, 8.50000000e-05])
Coordinates:
* band_wavenumber (band_wavenumber) object ('bar', 4100.1) ...
```
It selects the data the way I want, although using the dimension name is confusing in this case. It would be nice if we could also use the `MultiIndex` level names as arguments of the `sel` method, even though I don't know if it is easy to implement.
Furthermore, `da_bar` still has the 'band_wavenumber' dimension and the 'band' index level, but the latter is not very useful anymore. Ideally, I'd rather obtain a `DataArray` object with a 'wavenumber' dimension / coordinate and the 'bar' band name dropped from the multi-index, i.e., something that would require automatic index-level removal and/or automatic unstack when selecting data.
Extracting the band 'bar' from the pandas `Series` object gives something closer to what I need (see below), but using pandas is not an option as my spectral data involves other dimensions (e.g., time, scans, iterations...) not shown here for simplicity.
```
In [12]: s_bar = s.loc['bar']
In [13]: s_bar
Out[13]:
wavenumber
4100.1 0.000120
4100.3 0.000100
4100.5 0.000085
dtype: float64
```
The problem is also that the unstacked `DataArray` object resulting from the selection has the same dimensions and size as the original, unstacked `DataArray` object. The only difference is that unselected values are replaced by `nan`.
```
In [13]: da.unstack('band_wavenumber')
Out[13]:
array([[ nan, nan, 1.20000000e-04,
1.00000000e-04, 8.50000000e-05],
[ 1.70000000e-04, 1.40000000e-04, nan,
nan, nan]])
Coordinates:
* band (band) object 'bar' 'foo'
* wavenumber (wavenumber) float64 4.05e+03 4.05e+03 4.1e+03 4.1e+03 4.1e+03
In [14]: da_bar.unstack('band_wavenumber')
Out[14]:
array([[ nan, nan, 1.20000000e-04,
1.00000000e-04, 8.50000000e-05],
[ nan, nan, nan,
nan, nan]])
Coordinates:
* band (band) object 'bar' 'foo'
* wavenumber (wavenumber) float64 4.05e+03 4.05e+03 4.1e+03 4.1e+03 4.1e+03
```
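The extra all-`nan` row appears to come from the index itself: a pandas `MultiIndex` keeps all of its defined levels after a selection, even those that are no longer used. The pandas-only sketch below illustrates this and shows `MultiIndex.remove_unused_levels()` as one possible workaround before unstacking. Whether xarray should do this automatically is left open, and the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

band = np.array(['foo', 'foo', 'bar', 'bar', 'bar'])
wavenumber = np.array([4050.2, 4050.3, 4100.1, 4100.3, 4100.5])
spectrum = np.array([1.7e-4, 1.4e-4, 1.2e-4, 1.0e-4, 8.5e-5])
index = pd.MultiIndex.from_arrays([band, wavenumber],
                                  names=('band', 'wavenumber'))
s = pd.Series(spectrum, index=index)

# Select 'bar' with a boolean mask, which keeps both index levels.
s_bar = s[s.index.get_level_values('band') == 'bar']

# The 'band' level still lists 'foo' although no row uses it anymore.
stale_bands = list(s_bar.index.levels[0])

# Dropping unused levels leaves only the values that are actually present.
trimmed = s_bar.index.remove_unused_levels()
used_bands = list(trimmed.levels[0])
used_wavenumbers = list(trimmed.levels[1])
```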
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/767/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
169588316,MDExOlB1bGxSZXF1ZXN0ODAyMjk0OTM=,947,Multi-index levels as coordinates,4160723,closed,0,,,17,2016-08-05T11:34:49Z,2016-09-14T03:35:04Z,2016-09-14T03:34:51Z,MEMBER,,0,pydata/xarray/pulls/947,"Implements 2, 4 and 5 in #719.
Demo:
```
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: import xarray as xr
In [4]: index = pd.MultiIndex.from_product((list('ab'), range(2)),
...: names= ('level_1', 'level_2'))
In [5]: da = xr.DataArray(np.random.rand(4, 4), coords={'x': index},
...: dims=('x', 'y'), name='test')
In [6]: da
Out[6]:
array([[ 0.15036153, 0.68974802, 0.40082234, 0.94451318],
[ 0.26732938, 0.49598123, 0.8679231 , 0.6149102 ],
[ 0.3313594 , 0.93857424, 0.73023367, 0.44069622],
[ 0.81304837, 0.81244159, 0.37274953, 0.86405196]])
Coordinates:
* level_1 (x) object 'a' 'a' 'b' 'b'
* level_2 (x) int64 0 1 0 1
* y (y) int64 0 1 2 3
In [7]: da['level_1']
Out[7]:
array(['a', 'a', 'b', 'b'], dtype=object)
Coordinates:
* level_1 (x) object 'a' 'a' 'b' 'b'
* level_2 (x) int64 0 1 0 1
In [8]: da.sel(x='a', level_2=1)
Out[8]:
array([ 0.26732938, 0.49598123, 0.8679231 , 0.6149102 ])
Coordinates:
x object ('a', 1)
* y (y) int64 0 1 2 3
In [9]: da.sel(level_2=1)
Out[9]:
array([[ 0.26732938, 0.49598123, 0.8679231 , 0.6149102 ],
[ 0.81304837, 0.81244159, 0.37274953, 0.86405196]])
Coordinates:
* level_1 (level_1) object 'a' 'b'
* y (y) int64 0 1 2 3
```
Some notes about the implementation:
- I slightly modified `Coordinate` so that it allows setting different values for the names of the coordinate and its dimension. There is no breaking change.
- I also added a `Coordinate.get_level_coords` method to get independent, single-index coordinates objects from a MultiIndex coordinate.
Remaining issues:
- `Coordinate.get_level_coords` calls `pandas.MultiIndex.get_level_values` for each level and is itself called each time when indexing and for repr. This can be very costly!! It would be nice to return some kind of lazy index object instead of computing the actual level values.
- repr replaces a MultiIndex coordinate with its level coordinates. That can be confusing in some cases (see below). Maybe we can use a different marker than `*` for level coordinates.
```
In [6]: [name for name in da.coords]
Out[6]: ['x', 'y']
In [7]: da.coords.keys()
Out[7]:
KeysView(Coordinates:
* level_1 (x) object 'a' 'a' 'b' 'b'
* level_2 (x) int64 0 1 0 1
* y (y) int64 0 1 2 3)
```
- `DataArray.level_1` doesn't return another `DataArray` object:
```
In [10]: da.level_1
Out[10]:
array(['a', 'a', 'b', 'b'], dtype=object)
```
- Maybe we need to test the uniqueness of level names at `DataArray` or `Dataset` creation.
Of course, this still needs proper tests and docs...
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/947/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
159768214,MDExOlB1bGxSZXF1ZXN0NzM0NjU0MTA=,879,Multi-index repr,4160723,closed,0,,,2,2016-06-11T10:58:13Z,2016-08-31T21:40:59Z,2016-08-31T21:40:59Z,MEMBER,,0,pydata/xarray/pulls/879,"Another item of #719.
An example:
``` python
>>> index = pd.MultiIndex.from_product((list('ab'), range(10)))
>>> index.names= ('a_long_level_name', 'level_1')
>>> data = xr.DataArray(range(20), [('x', index)])
>>> data
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19])
Coordinates:
* x (x) object MultiIndex
- a_long_level_name object 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'a' 'b' ...
- level_1 int64 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
```
To be consistent with the displayed coordinates and/or data variables, it displays the level values that are actually used. Using the `pandas.MultiIndex.get_level_values` method would be expensive for big indexes, so I re-implemented it in xarray so that we can truncate the computation to the first _x_ values, which is very cheap.
It still needs testing.
Maybe it would be nice to align the level values.
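
The truncation idea can be sketched as follows. This is not the code of this PR, only an illustration: `first_level_values` is a hypothetical helper, and it assumes pandas 0.24 or later, where the integer codes of each level are exposed as `MultiIndex.codes` (the attribute was named `labels` before).

```python
import pandas as pd


def first_level_values(index, level, n):
    '''Return the first n values of one level without expanding the whole level.'''
    pos = index.names.index(level)
    # Integer codes pointing into the level values, truncated to n entries.
    codes = index.codes[pos][:n]
    # Note: missing values (code -1) are not handled in this sketch.
    return index.levels[pos].take(codes)


index = pd.MultiIndex.from_product((list('ab'), range(10)),
                                   names=('a_long_level_name', 'level_1'))
head = first_level_values(index, 'level_1', 12)
print(list(head))
```

Only `n` codes are looked up, so the cost depends on the number of displayed values and not on the size of the index.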
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/879/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull