id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
915057433,MDU6SXNzdWU5MTUwNTc0MzM=,5452,[community] Flexible indexes meeting,4160723,closed,0,,,7,2021-06-08T13:32:16Z,2024-02-15T01:39:08Z,2024-02-15T01:39:08Z,MEMBER,,,,"In addition to the [bi-weekly community developers meeting](https://github.com/pydata/xarray/issues/4001), we plan to have 30min meetings on a weekly basis -- every Tue 8:30-9:00 PDT (17:30-18:00 CEST) -- to discuss the flexible indexes refactor.
Anyone from @pydata/xarray feel free to join! The first meeting is in a couple of hours.
[Zoom link](https://us05web.zoom.us/j/84894064491?pwd=UDFjUjBVbTFQQ1k2SEJIa0UwRFFjZz09) (subject to change).
[Google calendar](https://calendar.google.com/event?action=TEMPLATE&tmeid=OTVsbzRlajE4Y2NyMDg3Nm80bzduamQ1OXNfMjAyMTA2MTVUMTUzMDAwWiBiZW5ib3Z5QG0&tmsrc=benbovy%40gmail.com&scp=ALL)
[Meeting notes](https://hackmd.io/I6u0oA0ISECNl3bwfvIcjA)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5452/reactions"", ""total_count"": 5, ""+1"": 5, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
213004586,MDU6SXNzdWUyMTMwMDQ1ODY=,1303,`xarray.core.variable.as_variable()` part of the public API?,4160723,closed,0,,,5,2017-03-09T11:07:52Z,2024-02-06T17:57:21Z,2017-06-02T17:55:12Z,MEMBER,,,,"Is it safe to use `xarray.core.variable.as_variable()` externally? I guess that currently it is not.
I have a specific use case where this would be very useful.
I'm working on a package that heavily uses and extends xarray for landscape evolution modeling, and inside a custom class for model parameters I want to be able to create `xarray.Variable` objects on the fly from any provided object, e.g., a scalar value, an array-like, a `(dims, data[, attrs])` tuple, another `xarray.Variable`, an `xarray.DataArray`... exactly what `xarray.core.variable.as_variable()` does.
Although I know that `Variable` objects are not needed in most use cases, in this specific case a clean solution would be the following
```python
import xarray as xr


class Parameter(object):
    def to_variable(self, obj):
        variable = xr.as_variable(obj)
        # ... some validation logic on, e.g., data type, value bounds, dimensions...
        # ... add default attributes to the created variable (e.g., units, description...)
        return variable
```
I don't think it is a viable option to copy `as_variable()` and all its dependent code in my package as it seems to have quite a lot of logic implemented.
A workaround using only public API would be something like:
```python
class Parameter(object):
    def to_variable(self, obj):
        return xr.Dataset(data_vars={'v': obj}).variables['v']
```
but it feels a bit hacky.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1303/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
979316661,MDU6SXNzdWU5NzkzMTY2NjE=,5738,Flexible indexes: how to handle possible dimension vs. coordinate name conflicts?,4160723,closed,0,,,4,2021-08-25T15:31:39Z,2023-08-23T13:28:41Z,2023-08-23T13:28:40Z,MEMBER,,,,"Another thing that I've noticed while working on #5692.
Currently it is not possible to have a Dataset with the same name used for both a dimension and a multi-index level. I guess the reason is to prevent errors such as unmatched dimension sizes when the multi-index is eventually dropped and its dimension(s) renamed according to the level names (e.g., with `sel` or `unstack`). See #2299.
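As a rough illustration of the check described above, here is a simplified sketch in plain Python. It is not the actual xarray implementation; the function name and both of its arguments are hypothetical stand-ins for Dataset internals.

```python
def find_level_dim_conflicts(dims, multi_index_levels):
    '''Return (dimension, level) pairs where a level reuses a dimension name.

    dims: iterable of dimension names in the Dataset (hypothetical input).
    multi_index_levels: mapping of multi-index dimension -> level names
    (hypothetical input).
    '''
    dims = set(dims)
    conflicts = []
    for dim, levels in multi_index_levels.items():
        for level in levels:
            # a level may not reuse the name of any existing dimension
            if level in dims:
                conflicts.append((dim, level))
    return conflicts


# the level 'y' of the multi-index along 'x' clashes with the dimension 'y'
print(find_level_dim_conflicts(['x', 'y'], {'x': ['y', 'z']}))
# no clash when level names differ from all dimension names
print(find_level_dim_conflicts(['x', 'y'], {'x': ['foo', 'bar']}))
```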
I'm wondering how we should handle this in the context of flexible / custom indexes:
A. Keep this current behavior as a special case for (pandas) multi-indexes. This would avoid breaking changes but how to support custom indexes that could eventually be used like pandas multi-indexes in `sel` or `stack`?
B. Introduce some tag in `xarray.Index` so that we can identify a multi-coordinate index that behaves like a hierarchical index (i.e., levels may be dropped into a single index/coordinate with dimension renaming)
C. Do not allow any dimension name matching the name of a coordinate attached to a multi-coordinate index. This seems silly?
D. Eventually revert #2353 and let users take care of potential conflicts.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5738/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1175329407,I_kwDOAMm_X85GDhp_,6392,Pass indexes to the Dataset and DataArray constructors,4160723,closed,0,,,6,2022-03-21T12:41:51Z,2023-07-21T20:40:05Z,2023-07-21T20:40:04Z,MEMBER,,,,"### Is your feature request related to a problem?
This is part of #6293 (explicit indexes next steps).
### Describe the solution you'd like
A `Mapping[Hashable, Index]` would probably be the most obvious (optional) value type accepted for the `indexes` argument of the Dataset and DataArray constructors.
pros:
- consistent with the `xindexes` property
cons:
- need to be careful with what is passed as `coords` and `indexes`
- multi-indexes: redundancy and order matters (e.g., pandas multi-index levels)
### An example with a pandas multi-index
Currently a pandas multi-index may be passed directly as one (dimension) coordinate; it is then ""unpacked"" into one dimension (tuple values) coordinate and one or more level coordinates. I would suggest deprecating this behavior in favor of a more explicit (although more verbose) way to pass an existing pandas multi-index:
```python
import pandas as pd
import xarray as xr
pd_idx = pd.MultiIndex.from_product([[""a"", ""b""], [1, 2]], names=(""foo"", ""bar""))
idx = xr.PandasMultiIndex(pd_idx, ""x"")
indexes = {""x"": idx, ""foo"": idx, ""bar"": idx}
coords = idx.create_variables()
ds = xr.Dataset(coords=coords, indexes=indexes)
```
The cases below should raise an error:
```python
ds = xr.Dataset(indexes=indexes)
# ValueError: missing coordinate(s) for index(es): 'x', 'foo', 'bar'
ds = xr.Dataset(
    coords=coords,
    indexes={""x"": idx, ""foo"": idx},
)
# ValueError: missing index(es) for coordinate(s): 'bar'
ds = xr.Dataset(
    coords={""x"": coords[""x""], ""foo"": [0, 1, 2, 3], ""bar"": coords[""bar""]},
    indexes=indexes,
)
# ValueError: conflict between coordinate(s) and index(es): 'foo'
ds = xr.Dataset(
    coords=coords,
    indexes={""x"": idx, ""foo"": idx, ""bar"": xr.PandasIndex([0, 1, 2], ""y"")},
)
# ValueError: conflict between coordinate(s) and index(es): 'bar'
```
Should we raise an error or simply ignore the index in the case below?
```python
ds = xr.Dataset(coords=coords)
# ValueError: missing index(es) for coordinate(s): 'x', 'foo', 'bar'
# or
# create unindexed coordinates 'foo' and 'bar' and a 'x' coordinate with a single pandas index
```
Should we silently reorder the coordinates and/or indexes when the levels are not passed in the right order? It seems odd to require that mapping elements be passed in a given order.
```python
ds = xr.Dataset(coords=coords, indexes={""bar"": idx, ""x"": idx, ""foo"": idx})
list(ds.xindexes.keys())
# [""x"", ""foo"", ""bar""]
```
### How to generalize to any (custom) index?
With the case of multi-index, it is pretty easy to check whether the coordinates and indexes are consistent because we ensure consistent `pd_idx.names` vs. coordinate names and because `idx.get_variables()` returns Xarray `IndexVariable` objects where variable data wraps the pandas multi-index.
However, this may not be easy for other indexes. Some Xarray custom indexes (like a KD-Tree index) likely won't return anything from `.get_variables()` as they don't support wrapping internal data as coordinate data. Right now there's nothing in the Xarray `Index` base class that could help checking consistency between indexes vs. coordinates for *any* kind of index.
How could we solve this?
- A. add a `.coords` property to the Xarray `Index` base class, that returns a `dict[Hashable, IndexVariable]`.
  - Ambiguous when an Index is created directly, i.e., like above `xr.PandasMultiIndex(pd_idx, ""x"")`. Should `.coords` return `None` or return the coordinates returned by the last `.get_variables()` call?
  - What if different sets of coordinates refer to a common index (e.g., after copying the coordinate variables, etc.)?
- B. add a `.coord_names` property to the Xarray `Index` base class that returns `tuple[Hashable, ...]`, and add a private attribute to `IndexVariable` that returns the index object (or return it via a very lightweight `IndexAdapter` base class used to wrap variable data).
  - `Index.get_variables(variables)` would by default return shallow copies of the input variables with a reference to the index object.
  - If that's necessary, we could also store the coordinate dimensions in `coord_names`, i.e., using `tuple[tuple[Hashable, tuple[Hashable, ...]], ...]`.
I think I prefer the second option.
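A minimal sketch of how option B could make the checks from the multi-index example above generic. `FakeIndex`, `FakeCoord` and `check_consistency` are hypothetical stand-ins written in plain Python. They are not part of Xarray, and the real `Index` / `IndexVariable` classes would look different.

```python
from dataclasses import dataclass


class FakeIndex:
    '''Hypothetical stand-in for an Xarray index exposing coord_names (option B).'''

    def __init__(self, coord_names):
        self.coord_names = tuple(coord_names)


@dataclass
class FakeCoord:
    '''Hypothetical stand-in for a coordinate variable referencing its index.'''
    data: list
    index: object = None


def _fmt(names):
    return ', '.join(repr(name) for name in names)


def check_consistency(coords, indexes):
    '''Raise ValueError when the coords and indexes mappings disagree.'''
    missing_coords = [name for name in indexes if name not in coords]
    if missing_coords:
        raise ValueError(f'missing coordinate(s) for index(es): {_fmt(missing_coords)}')
    # a coordinate must reference the very index object registered under its name
    conflicts = [
        name for name, idx in indexes.items()
        if coords[name].index is not idx or name not in idx.coord_names
    ]
    if conflicts:
        raise ValueError(f'conflict between coordinate(s) and index(es): {_fmt(conflicts)}')
    # every coordinate of a multi-coordinate index must be listed in indexes
    missing_indexes = []
    for idx in indexes.values():
        for name in idx.coord_names:
            if name not in indexes and name not in missing_indexes:
                missing_indexes.append(name)
    if missing_indexes:
        raise ValueError(f'missing index(es) for coordinate(s): {_fmt(missing_indexes)}')


idx = FakeIndex(['x', 'foo', 'bar'])
coords = {name: FakeCoord([], idx) for name in idx.coord_names}
check_consistency(coords, {name: idx for name in idx.coord_names})  # passes silently
```

With such a reference stored on each coordinate, the check does not need to inspect the wrapped data at all, which is what would let it work for indexes (e.g., a KD-Tree) that cannot expose their internal data as coordinate data.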
### Describe alternatives you've considered
### Also allow passing index types (and build options) via `indexes`
I.e., `Mapping[Hashable, Index | Type[Index] | tuple[Type[Index], Mapping[Any, Any]]]`, so that new indexes can be created from the passed coordinates at DataArray or Dataset creation.
pros:
- Flexible.
cons:
- This is complicated. Constructing the Dataset / DataArray (with default indexes) first then calling `.set_index` is probably better.
- Hard to deal with multi-index (redundancy of build option, etc.)
### Pass multi-indexes once, grouped by coordinate names
I.e., `indexes` keys accept tuples: `Mapping[Hashable | tuple[Hashable, ...], Index]`
pros:
- No redundancy and easier to check consistency between indexes vs. coordinates
cons:
- Not consistent with the `.xindexes` property
- Complicated when eventually using tuples for coordinate names?
### Additional context
_No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6392/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1472470718,I_kwDOAMm_X85XxB6-,7346,assign_coords reset all dimension coords to default (pandas) index,4160723,closed,0,,,0,2022-12-02T08:07:55Z,2022-12-02T16:32:41Z,2022-12-02T16:32:41Z,MEMBER,,,,"### What happened?
See https://github.com/martinfleis/xvec/issues/13#issue-1472023524
### What did you expect to happen?
`assign_coords()` should preserve the index of coordinates that are not updated or not part of a dropped multi-coordinate index.
### Minimal Complete Verifiable Example
See https://github.com/martinfleis/xvec/issues/13#issue-1472023524
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
Xarray version 2022.11.0
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7346/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1322198907,I_kwDOAMm_X85Ozyd7,6849,Public API for setting new indexes: add a set_xindex method?,4160723,closed,0,,,5,2022-07-29T12:38:34Z,2022-09-28T07:25:16Z,2022-09-28T07:25:16Z,MEMBER,,,,"### What is your issue?
xref https://github.com/pydata/xarray/pull/6795#discussion_r932665544 and #6293 (Public API section).
The `scipy22` branch contains the addition of a `.set_xindex()` method to DataArray and Dataset so that participants at the SciPy 2022 Xarray sprint could experiment with custom indexes. After thinking more about it, I'm wondering if it couldn't actually be part of Xarray's public API alongside `.set_index()` (at least for a while).
- Having two methods `.set_xindex()` vs. `.set_index()` would be quite consistent with the `.xindexes` vs. `.indexes` properties that are already there.
- I actually like the `.set_xindex()` API proposed in the `scipy22` branch, i.e., setting one index at a time from one or more coordinates, possibly with build options. While it *could* be possible to support both that and `.set_index()`'s current API (quite specific to pandas multi-indexes) all in one method, it would certainly result in a much more confusing API and internal implementation.
- In the long term we could progressively get rid of `.indexes` and `.set_index()` and/or rename `.xindexes` to `.indexes` and `.set_xindex()` to `.set_index()`.
Thoughts @pydata/xarray?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6849/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1361896826,I_kwDOAMm_X85RLOV6,6989,reset multi-index to single index (level): coordinate not renamed,4160723,closed,0,4160723,,0,2022-09-05T12:45:22Z,2022-09-27T10:35:39Z,2022-09-27T10:35:39Z,MEMBER,,,,"### What happened?
Resetting a multi-index to a single level (i.e., a single index) does not rename the remaining level coordinate to the dimension name.
### What did you expect to happen?
While it is certainly more consistent not to rename the level coordinate here (since an index can be assigned to a non-dimension coordinate now), it breaks from the old behavior. I think it's better not to introduce any breaking change. As discussed elsewhere, we might eventually want to deprecate `reset_index` in favor of `drop_indexes` (#6971).
### Minimal Complete Verifiable Example
```Python
import pandas as pd
import xarray as xr
midx = pd.MultiIndex.from_product([[""a"", ""b""], [1, 2]], names=(""foo"", ""bar""))
ds = xr.Dataset(coords={""x"": midx})
# <xarray.Dataset>
# Dimensions: (x: 4)
# Coordinates:
# * x (x) object MultiIndex
# * foo (x) object 'a' 'a' 'b' 'b'
# * bar (x) int64 1 2 1 2
# Data variables:
# *empty*
rds = ds.reset_index(""foo"")
# v2022.03.0
#
# <xarray.Dataset>
# Dimensions: (x: 4)
# Coordinates:
# * x (x) int64 1 2 1 2
# foo (x) object 'a' 'a' 'b' 'b'
# Data variables:
# *empty*
# v2022.06.0
#
# <xarray.Dataset>
# Dimensions: (x: 4)
# Coordinates:
# foo (x) object 'a' 'a' 'b' 'b'
# * bar (x) int64 1 2 1 2
# Dimensions without coordinates: x
# Data variables:
# *empty*
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6989/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1361626450,I_kwDOAMm_X85RKMVS,6987,Indexes.get_unique() TypeError with pandas indexes,4160723,closed,0,4160723,,0,2022-09-05T09:02:50Z,2022-09-23T07:30:39Z,2022-09-23T07:30:39Z,MEMBER,,,,"@benbovy I also just tested the `get_unique()` method that you mentioned and maybe noticed a related issue here, which I'm not sure is wanted / expected.
Taking the above dataset `ds`, accessing this function results in an error:
```python
> ds.indexes.get_unique()
TypeError: unhashable type: 'MultiIndex'
```
However, for `xindexes` it works:
```python
> ds.xindexes.get_unique()
[]
```
_Originally posted by @lukasbindreiter in https://github.com/pydata/xarray/issues/6752#issuecomment-1236717180_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6987/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
302077805,MDU6SXNzdWUzMDIwNzc4MDU=,1961,"Extend xarray with custom ""coordinate wrappers""",4160723,closed,0,,,10,2018-03-04T11:26:15Z,2022-09-19T08:47:45Z,2022-09-19T08:47:44Z,MEMBER,,,,"Recent and ongoing developments in xarray turn DataArray and Dataset more and more into data wrappers that are extensible at (almost) every level:
- domain-specific methods (accessors)
- io backends (netcdf, raster, zarr, etc.) via an abstract `DataStore` interface
- array backends (numpy, dask, sparse) via multidispatch or hooks (#1938)
- soon custom indexes? (kd-tree, out-of-core indexes... #1603, #1650, #475)
Regarding the latter, I’m thinking about the idea of extending xarray at an even more abstract level, i.e., the possibility of adding / registering ""coordinate wrappers"" to `DataArray` or `Dataset` objects. Basically, it would correspond to adding any *object that allows to do some operation based on one or several coordinates* ~~(I haven’t found any better name than ""coordinate agent"" to describe that)~~.
(EDIT: ""coordinate agents"" may not be quite right here, so I changed that to ""coordinate wrappers"".)
Indexes are a specific case of coordinate wrappers that serve the purpose of indexing. This is built in xarray.
While indexing is enough in 80% of cases, I see a couple of use cases where other coordinate wrappers (built outside of xarray) would be nice to have:
- Grids. For example, [xgcm](https://github.com/xgcm/xgcm) implements operations (interp, diff) on physical axes that may each include several coordinates, depending on the position of the coordinate labels on the axis (center, left…). Other grids define their topology using a greater number of coordinates (e.g., [ugrid](https://github.com/ugrid-conventions/ugrid-conventions)). Storing regridding weights might be another use case?
- Clocks. For example, [xarray-simlab](https://github.com/benbovy/xarray-simlab/) uses one or several coordinates to define the timeline of a computational simulation.
In those examples we usually rely on coordinate attributes and/or classes that encapsulate xarray objects to implement the specific features that we need. While it works, it has limitations and I think it can be improved.
Custom coordinate wrappers would be a way of extending xarray that is very consistent with other current (or considered) extension mechanisms.
This is still a very vague idea and I’m sure that there are lots of details that can be discussed (serialization, etc.).
But before going further, I’d like to know your thoughts @pydata/xarray. Do you think it is a silly idea? Do you have in mind other use cases where custom coordinate wrappers would be useful?
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1961/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
955936490,MDU6SXNzdWU5NTU5MzY0OTA=,5647,Flexible indexes: review the implementation of alignment and merge,4160723,closed,0,,,12,2021-07-29T15:03:23Z,2022-09-07T09:47:13Z,2022-09-07T09:47:13Z,MEMBER,,,,"The current implementation of the `align` function is problematic in the context of flexible indexes because:
- the sizes of the joined indexes are reused for checking compatibility with unlabelled dimension sizes
- the joined indexes are used as indexers to compute the aligned Dataset / DataArray.
This currently works well since a pd.Index can be directly treated as a 1-d array, but this won’t always be the case with custom indexes.
I'm opening this issue to gather ideas on how best to handle alignment in a more flexible way (I haven't thought much about this problem yet).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5647/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1322190255,I_kwDOAMm_X85OzwWv,6848,Update API,4160723,closed,0,,,0,2022-07-29T12:30:08Z,2022-07-29T12:30:23Z,2022-07-29T12:30:23Z,MEMBER,,,,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6848/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
968796847,MDU6SXNzdWU5Njg3OTY4NDc=,5697,Coerce the labels passed to Index.query to array-like objects,4160723,closed,0,,,3,2021-08-12T13:09:40Z,2022-03-17T17:11:43Z,2022-03-17T17:11:43Z,MEMBER,,,,"When looking at #5691 I noticed that the labels are sometimes coerced to arrays (i.e., #3153) but not always.
Later in `PandasIndex.query` those may again be coerced to arrays (i.e., `_as_array_tuplesafe`). In #5692 (https://github.com/pydata/xarray/pull/5692/commits/a551c7f05abf90a492fb59068b59ebb2bac8cb4c) they are always coerced to arrays before possibly being converted to scalars.
Shouldn't we therefore make things easier and ensure that the labels given to `xarray.Index.query()` always have an array interface? This would also yield a more predictable behavior to anyone who wants to implement custom xarray indexes.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5697/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
968990058,MDU6SXNzdWU5Njg5OTAwNTg=,5700,Selection with multi-index and float32 values,4160723,closed,0,,,0,2021-08-12T14:55:11Z,2022-03-17T17:11:43Z,2022-03-17T17:11:43Z,MEMBER,,,,"I guess it's rather an edge case, but an issue similar to the one fixed in #3153 may occur with multi-indexes:
```python
>>> import numpy as np
>>> import xarray as xr
>>> foo_data = ['a', 'a', 'b', 'b']
>>> bar_data = np.array([0.1, 0.2, 0.7, 0.9], dtype=np.float32)
>>> da = xr.DataArray([1, 2, 3, 4], dims=""x"", coords={""foo"": (""x"", foo_data), ""bar"": (""x"", bar_data)})
>>> da = da.set_index(x=[""foo"", ""bar""])
```
```python
>>> da.sel(bar=0.1)
KeyError: 0.1
```
```python
>>> da.sel(bar=np.array(0.1, dtype=np.float32).item())
array([1])
Coordinates:
* foo (foo) object 'a'
```
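The failing lookup is presumably due to float32 rounding: the value stored in the `bar` level is the float32 nearest to 0.1, which differs from the Python float `0.1`. A small standard-library sketch of that rounding (the helper `as_float32` is hypothetical and only mimics what `np.array(0.1, dtype=np.float32).item()` returns):

```python
import struct


def as_float32(value):
    '''Round a Python float to the nearest float32, then widen it back to a Python float.'''
    return struct.unpack('f', struct.pack('f', value))[0]


label = 0.1                 # what the user passes to sel()
stored = as_float32(label)  # what a float32 level effectively holds
print(stored == label)              # False: the float64 label does not match
print(stored == as_float32(label))  # True: a float32-rounded label matches
```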
(xarray version: 0.18.2 as there's a regression introduced in 0.19.0 #5691)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5700/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
955605233,MDU6SXNzdWU5NTU2MDUyMzM=,5645,Flexible indexes: handle renaming coordinate variables,4160723,closed,0,,,0,2021-07-29T08:42:00Z,2022-03-17T17:11:42Z,2022-03-17T17:11:42Z,MEMBER,,,,"We should have some API in `xarray.Index` to update the index when its corresponding coordinate variables are renamed.
This is currently implemented here, where the underlying `pd.Index` name(s) are updated: https://github.com/pydata/xarray/blob/c5530d52d1bcbd071f4a22d471b728a4845ea36f/xarray/core/dataset.py#L3299-L3314
This logic should be moved into `PandasIndex` and `PandasMultiIndex`.
Other, custom indexes might also have internal attributes to update, so we might need formal API for that.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5645/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
985162305,MDU6SXNzdWU5ODUxNjIzMDU=,5755,Mypy errors with the last version of _typed_ops.pyi ,4160723,closed,0,,,5,2021-09-01T13:34:52Z,2021-09-13T10:53:16Z,2021-09-13T00:04:54Z,MEMBER,,,,"**What happened**:
Since #5569 I get a lot of mypy errors from `_typed_ops.pyi` (see below). What's weird is that it is not happening in all cases:
```
$ mypy # ok
$ mypy . # errors
$ pre-commit run --all-files # ok
$ pre-commit run # errors
$ git commit # (via pre-commit hooks) errors
```
I also tried `pre-commit clean` with no luck. EDIT: I also tried on a freshly cloned xarray repository.
@max-sixty @Illviljan Any idea on what's happening?
**What you expected to happen**:
No mypy error in all cases.
**Anything else we need to know?**:
```
xarray/core/_typed_ops.pyi:32: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:33: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:34: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:35: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:36: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:37: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:38: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:39: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:40: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:41: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:42: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:43: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:44: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:45: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:46: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:47: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:48: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:49: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:50: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:51: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:52: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:53: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:54: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:55: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:56: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:57: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:60: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:61: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:62: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:63: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:64: error: The erased type of self ""xarray.core.dataset.Dataset"" is not a supertype of its class ""xarray.core._typed_ops.DatasetOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:65: error: The erased type of self ""xarray.core.dataset.Dataset"" is not a supertype of its class ""xarray.core._typed_ops.DatasetOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:66: error: The erased type of self ""xarray.core.dataset.Dataset"" is not a supertype of its class ""xarray.core._typed_ops.DatasetOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:67: error: The erased type of self ""xarray.core.dataset.Dataset"" is not a supertype of its class ""xarray.core._typed_ops.DatasetOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:77: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:83: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:89: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:95: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:101: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:107: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:113: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:119: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:125: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:131: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:137: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:143: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:149: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:155: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:161: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:167: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:173: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:179: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:185: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:191: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:197: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:203: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:209: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:215: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:221: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:227: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:230: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:231: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:232: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:233: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:234: error: The erased type of self ""xarray.core.dataarray.DataArray"" is not a supertype of its class ""xarray.core._typed_ops.DataArrayOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:235: error: The erased type of self ""xarray.core.dataarray.DataArray"" is not a supertype of its class ""xarray.core._typed_ops.DataArrayOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:236: error: The erased type of self ""xarray.core.dataarray.DataArray"" is not a supertype of its class ""xarray.core._typed_ops.DataArrayOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:237: error: The erased type of self ""xarray.core.dataarray.DataArray"" is not a supertype of its class ""xarray.core._typed_ops.DataArrayOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:247: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:253: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:259: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:265: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:271: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:277: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:283: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:289: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:295: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:301: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:307: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:313: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:319: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:325: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:331: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:337: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:343: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:349: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:355: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:361: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:367: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:373: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:379: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:385: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:391: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:397: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:400: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:401: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:402: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:403: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:404: error: The erased type of self ""xarray.core.variable.Variable"" is not a supertype of its class ""xarray.core._typed_ops.VariableOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:405: error: The erased type of self ""xarray.core.variable.Variable"" is not a supertype of its class ""xarray.core._typed_ops.VariableOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:406: error: The erased type of self ""xarray.core.variable.Variable"" is not a supertype of its class ""xarray.core._typed_ops.VariableOpsMixin"" [misc]
xarray/core/_typed_ops.pyi:407: error: The erased type of self ""xarray.core.variable.Variable"" is not a supertype of its class ""xarray.core._typed_ops.VariableOpsMixin"" [misc]
```
**Environment**:
mypy 0.910
python 3.9.6 (also tested with 3.8)
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5755/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
933551030,MDU6SXNzdWU5MzM1NTEwMzA=,5553,Flexible indexes: how best to implement the new data model?,4160723,closed,0,,,2,2021-06-30T10:38:13Z,2021-08-09T07:56:56Z,2021-08-09T07:56:56Z,MEMBER,,,,"Yesterday, during the flexible indexes weekly meeting, @shoyer, @jhamman and I discussed the best approach to implement the new data model described [here](https://github.com/pydata/xarray/blob/main/design_notes/flexible_indexes_notes.md#1-data-model). In this issue I summarize the implementation of the current data model as well as some suggestions for the new data model, along with their pros / cons (I might still be missing important ones!). Unfortunately, I don't think there's an easy or ideal solution, so @pydata/xarray any feedback would be very welcome!
## Current data model implementation
Currently any (pandas) index is wrapped into an `IndexVariable` object through an intermediate adapter to preserve dtypes and handle explicit indexing. This allows directly reusing the index data as an xarray coordinate variable. For a pandas multi-index, virtual coordinates are created for each level from the `IndexVariable` object wrapping the index. Although relying on ""virtual coordinates"" has more or less worked so far, it is over-complicated. Moreover, this wouldn't work with the new data model, where an index may be built from a set of coordinates with different dimensions.
## Proposed alternatives
### Option 1: independent (coordinate) variables and indexes
Indexes and coordinates are loosely coupled, i.e., an `xarray.Index` holds a reference (mapping) to the coordinate variable(s) from which it is built, but both manage their own data independently of each other.
Pros:
- separation of concerns.
- we no longer need those complicated adapters for reusing the index data as xarray (virtual) variable(s), which may simplify some xarray internals.
- if we drop an index, that's simple, we just drop it and all its related coordinate variables are left as-is.
- we could theoretically build a (pandas) index from a chunked coordinate, and then when we drop the index we still have this chunked coordinate left untouched.
Cons:
- data duplication
- this would clearly be a regression when using pandas indexes, but maybe less so for other indexes like kd-trees, where adapting those objects for use as coordinate variables wouldn't be easy or even possible.
- what if we want to build a `DataArray` or `Dataset` from one or more existing indexes (pandas or other)? Passing an index, treating it as an array and then re-building an index from this array is not optimal.
- keeping an index and its corresponding coordinate variable(s) in a consistent, in-sync state may be tricky, given that those variables may be mutable (although we could prevent this by encapsulating those variables using a very lightweight wrapper inspired by `IndexVariable`).
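To make Option 1 more concrete, here is a toy, pure-Python sketch (not xarray code) of loosely coupled indexes and coordinates. `ToyVariable` and `ToyIndex` are hypothetical names used only for illustration; the index keeps its own copy of the labels, which also shows the data duplication listed in the cons.

```python
from dataclasses import dataclass


@dataclass
class ToyVariable:
    # hypothetical stand-in for a coordinate variable
    dims: tuple
    data: list


@dataclass
class ToyIndex:
    # hypothetical stand-in for an index: it only remembers the names of the
    # coordinates it was built from and manages its own lookup data
    coord_names: tuple
    lookup: dict

    @classmethod
    def from_variable(cls, name, var):
        # the labels are copied into the lookup table (data duplication)
        return cls((name,), {label: pos for pos, label in enumerate(var.data)})

    def sel(self, label):
        # label-based lookup returns an integer position
        return self.lookup[label]


coords = {'x': ToyVariable(('x',), [10, 20, 30])}
indexes = {'x': ToyIndex.from_variable('x', coords['x'])}

position = indexes['x'].sel(20)

# dropping the index is trivial and leaves the coordinate variable as-is
del indexes['x']
```

Nothing in this sketch prevents `coords['x'].data` from being mutated after the index is built, which is the in-sync problem mentioned in the last con.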
### Option 2: indexes hold coordinate variables
This is the opposite of the current approach. Here, an `xarray.Index` would wrap one or more `xarray.Variable` objects.
Pros:
- probably easier to keep an index and its corresponding coordinate variable(s) in-sync.
- sharing data between an index and its coordinate variables may be easier.
Cons:
- accessing / iterating through all coordinate variables in a `DataArray` or `Dataset` may be less straightforward.
- when the index is dropped, we might need some logic / API to return the coordinates as new `xarray.Variable` objects with their own data (or should we simply always drop the corresponding coordinates too? maybe not...).
- more responsibility / work for developers who want to provide 3rd party xarray indexes.
### Option 3: intermediate solution
When an index is set (or unset), it returns a new set of coordinate variables to replace the existing ones.
Pros:
- it keeps some separation of concerns, while it allows data sharing through adapters and/or ensures that variables are immutable using lightweight wrappers.
Cons:
- like option 2, more things to take care of for 3rd party xarray index developers.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5553/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
187859705,MDU6SXNzdWUxODc4NTk3MDU=,1092,Dataset groups,4160723,closed,0,,,20,2016-11-07T23:28:36Z,2021-07-02T19:56:50Z,2021-07-02T19:56:49Z,MEMBER,,,,"EDIT: see https://github.com/pydata/xarray/issues/4118 for ongoing discussion
-------------------
It has probably already been suggested, but similarly to netCDF4 groups it would be nice if we could access `Dataset` data variables, coordinates and attributes via groups.
Currently xarray allows loading a specific netCDF4 group into a `Dataset`. Different groups can be loaded as separate `Dataset` objects, which may then be combined into a single, flat `Dataset`. Yet, in some cases it makes sense to represent data as a single object while it would be convenient to keep some nested structure. For example, a `Dataset` representing data on a staggered grid might have `scalar_vars` and `flux_vars` groups. [Here](https://unidata.ucar.edu/software/netcdf/workshops/2010/groups-types/GroupUses.html) are some potential uses for groups. When there are a lot of data variables and/or attributes, it would also help to have a more concise repr.
I'm thinking of an implementation of `Dataset.groups` that would be specific to xarray, i.e., independent of any backend, and which would easily co-exist with the flat `Dataset`. Backends shouldn't be required to support groups (some existing backends simply don't). It would be up to each backend to translate the `Dataset.groups` logic into its own group logic.
`Dataset.groups` might return a `DatasetGroups` object which, much like `xarray.core.coordinates.DatasetCoordinates`, would (1) have a reference to the Dataset object, (2) basically consist of a Mapping of group names to data variable/coordinate/attribute names and (3) dynamically create another `Dataset` object (sub-dataset) on `__getitem__`. Keys of `Dataset.groups` should be accessible as attributes, e.g., `ds.groups['scalar_vars'] == ds.scalar_vars`.
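A minimal sketch of points (1) to (3), assuming groups only map to data variable names. The `DatasetGroups` class below is hypothetical (it does not exist in xarray) and is kept outside of `Dataset` so that it runs against the current API; attribute-style access and attribute groups are left out.

```python
from collections.abc import Mapping

import xarray as xr


class DatasetGroups(Mapping):
    # hypothetical read-only mapping of group names to sub-datasets

    def __init__(self, dataset, groups):
        self._dataset = dataset      # (1) reference to the parent Dataset
        self._groups = dict(groups)  # (2) group name -> list of variable names

    def __getitem__(self, key):
        # (3) the sub-dataset is created on access, not stored
        return self._dataset[self._groups[key]]

    def __iter__(self):
        return iter(self._groups)

    def __len__(self):
        return len(self._groups)


ds = xr.Dataset(
    {'h': ('x', [1.0, 2.0]), 'q': ('x', [0.1, 0.2]), 'u': ('x', [3.0, 4.0])}
)
groups = DatasetGroups(ds, {'scalar_vars': ['h'], 'flux_vars': ['q', 'u']})
flux = groups['flux_vars']
```

Here `flux` is a regular `Dataset` holding only `q` and `u`, while `ds` itself stays flat.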
Questions:
- How to handle hierarchies of > 1 levels (i.e., groups of groups...)?
- How to ensure that a variable / attribute in one group is not also present in another group?
- Case of methods called from groups with `inplace=True`?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1092/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
264747372,MDU6SXNzdWUyNjQ3NDczNzI=,1627,html repr of xarray object (for the notebook),4160723,closed,0,,,39,2017-10-11T21:49:20Z,2019-10-24T16:56:15Z,2019-10-24T16:48:47Z,MEMBER,,,,"Edit: preview for `Dataset` and `DataArray` (pure html/css)
`Dataset`: https://jsfiddle.net/tay08cn9/4/
`DataArray`: https://jsfiddle.net/43z4v2wt/9/
---
I started to think a bit more deeply about what a richer, HTML-based representation of xarray objects could look like when displayed, e.g., in jupyter notebooks.
Here are some ideas for `Dataset`: https://jsfiddle.net/9ab4c3tr/35/
Some notes:
- The html repr looks pretty similar to the plain-text repr. I think it's better if they don't differ too much from each other.
- For the sake of consistency, I've stolen some style from the `pandas.DataFrame` repr as it is shown in jupyterlab.
- I tried to emphasize the most important parts of the repr, i.e., the lists of dimensions, coordinates and variables.
- I think it's best if we keep a very lightweight implementation, i.e., pure HTML/CSS (no Javascript). It already allows some interaction like hover effects and collapsible sections. However, I doubt that more fancy stuff (like, e.g., highlighting on hover a specific dimension simultaneously at several places of the repr) would be possible here without Javascript. I have limited skills in this area, though.
These are, of course, still preliminary thoughts. Any feedback/suggestion is welcome, even opinions about whether an html repr is really needed or not!
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1627/reactions"", ""total_count"": 11, ""+1"": 7, ""-1"": 0, ""laugh"": 0, ""hooray"": 4, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
234658224,MDU6SXNzdWUyMzQ2NTgyMjQ=,1447,"Package naming ""conventions"" for xarray extensions",4160723,closed,0,,,5,2017-06-08T21:14:24Z,2019-06-28T22:58:33Z,2019-06-28T21:58:33Z,MEMBER,,,,"I'm wondering what would be a good name for a package that primarily aims at providing an xarray extension (in the form of a `DataArray` and/or `Dataset` accessor).
I'm currently thinking about using a prefix like the `scikit` package family (e.g., `scikit-learn`, `scikit-image`).
For example, for an xarray extension for signal processing we would have:
package full name: `xarray-signal`
package import name: `xrsignal` (like `sklearn`)
accessor name: `signal`.
```python
>>> import xarray as xr
>>> import xrsignal
>>> ds = xr.Dataset()
>>> ds.signal.process(...)
```
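For context, the hypothetical `xrsignal` package could register its accessor roughly as follows. This is only a sketch: `xr.register_dataset_accessor` is the existing xarray decorator, while `SignalAccessor` and the body of `process` (removing the mean of each variable) are placeholders made up for illustration.

```python
import xarray as xr


@xr.register_dataset_accessor('signal')
class SignalAccessor:
    # hypothetical accessor class that would live in the `xrsignal` package

    def __init__(self, xarray_obj):
        self._obj = xarray_obj

    def process(self):
        # placeholder processing step: subtract the mean of each variable
        return self._obj - self._obj.mean()


ds = xr.Dataset({'a': ('t', [1.0, 2.0, 3.0])})
out = ds.signal.process()
```

The accessor name passed to the decorator (`signal`) is independent of both the install name and the import name, which is where the three name variations come from.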
The main advantage is that we immediately get an idea of what the package is about. It may also be good for the overall visibility of both xarray and its 3rd-party extensions.
The downside is that there are three name variations: one for getting and installing the package, another one for importing the package and yet another one for using the accessor. This may be annoying, especially for new users who are not accustomed to this kind of naming convention.
Conversely, choosing a different, unrelated name like [salem](https://github.com/fmaussion/salem) or [pangaea](https://github.com/snowman2/pangaea) has the advantage of using the same name everywhere and perhaps providing multiple accessors in the same package, but given that the number of xarray extensions is likely to grow in the near future (see, e.g., the [pangeo-data](https://pangeo-data.github.io/) project) it would become difficult to have a clear view of the whole xarray package ecosystem.
Any thoughts?
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1447/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
180676935,MDU6SXNzdWUxODA2NzY5MzU=,1030,Concatenate multiple variables into one variable with a multi-index (categories),4160723,closed,0,,,3,2016-10-03T15:54:23Z,2019-02-25T07:25:40Z,2019-02-25T07:25:40Z,MEMBER,,,,"I often have to deal with datasets in this form (multiple variables of different sizes, each representing different categories, on the same physical dimension but using different names as they have different labels),
```
Dimensions: (wn_band1: 4, wn_band2: 6, wn_band3: 8)
Coordinates:
* wn_band1 (wn_band1) float64 200.0 266.7 333.3 400.0
* wn_band2 (wn_band2) float64 500.0 560.0 620.0 680.0 740.0 800.0
* wn_band3 (wn_band3) float64 1.5e+03 1.643e+03 1.786e+03 1.929e+03 ...
Data variables:
data_band3 (wn_band3) float64 0.7515 0.5302 0.6697 0.9621 0.01815 ...
data_band1 (wn_band1) float64 0.3801 0.6649 0.01884 0.9407
data_band2 (wn_band2) float64 0.8813 0.4481 0.2353 0.9681 0.1085 0.0835
```
where it would be more convenient to have the data re-arranged into the following form (concatenate the variables into a single variable with a multi-index with the labels of both the categories and the physical coordinate):
```
Dimensions: (spectrum: 18)
Coordinates:
* spectrum (spectrum) MultiIndex
- band (spectrum) int64 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 3 3
- wn (spectrum) float64 200.0 266.7 333.3 400.0 500.0 560.0 620.0 ...
Data variables:
data (spectrum) float64 0.3801 0.6649 0.01884 0.9407 0.8813 0.4481 ...
```
The latter would allow using xarray's nice features like `ds.groupby('band').mean()`.
Currently, the best way that I've found to transform the data is something like:
``` python
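import numpy as np
import pandas as pd
import xarray as xr
# `ds` is the dataset shown above; concatenate the bands in order 1, 2, 3
data = np.concatenate([ds.data_band1, ds.data_band2, ds.data_band3])
wn = np.concatenate([ds.wn_band1, ds.wn_band2, ds.wn_band3])
# band label repeated once per wavenumber of each band
band = np.concatenate([np.repeat(1, 4), np.repeat(2, 6), np.repeat(3, 8)])
midx = pd.MultiIndex.from_arrays([band, wn], names=('band', 'wn'))
ds2 = xr.Dataset({'data': ('spectrum', data)}, coords={'spectrum': midx})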
```
Maybe I'm missing a better way to do this? If not, it would be nice to have a convenience method for this, unless this use case is too rare to be worth it. I'm also not sure at all what a good API for such a method would be.
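As a starting point for discussing the API, here is a rough sketch of such a helper. The function name `concat_bands` and the hard-coded `data_band<N>` / `wn_band<N>` naming scheme are assumptions taken from the example above, not an existing xarray feature. It builds the multi-index with `Dataset.set_index` instead of passing a `pd.MultiIndex` directly.

```python
import numpy as np
import xarray as xr


def concat_bands(ds, bands):
    # hypothetical helper: stack `data_band<N>` variables along a new
    # `spectrum` dimension indexed by (band, wn)
    data, wn, band = [], [], []
    for b in bands:
        values = ds[f'data_band{b}'].values
        data.append(values)
        wn.append(ds[f'wn_band{b}'].values)
        # repeat the band label once per wavenumber of that band
        band.append(np.full(values.size, b))
    flat = xr.Dataset(
        {'data': ('spectrum', np.concatenate(data))},
        coords={
            'band': ('spectrum', np.concatenate(band)),
            'wn': ('spectrum', np.concatenate(wn)),
        },
    )
    # combine the two coordinates into a multi-index on `spectrum`
    return flat.set_index(spectrum=['band', 'wn'])


ds = xr.Dataset(
    {
        'data_band1': ('wn_band1', [0.1, 0.2]),
        'data_band2': ('wn_band2', [0.3, 0.4, 0.5]),
    },
    coords={'wn_band1': [200.0, 300.0], 'wn_band2': [500.0, 560.0, 620.0]},
)
ds2 = concat_bands(ds, [1, 2])
```

A real convenience method would probably need a more general way to say which variables and coordinates belong together than a naming scheme.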
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1030/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
275033174,MDU6SXNzdWUyNzUwMzMxNzQ=,1727,IPython auto-completion triggers data loading,4160723,closed,0,,,11,2017-11-18T00:14:00Z,2017-11-18T07:09:41Z,2017-11-18T07:09:40Z,MEMBER,,,,"I create a big netcdf file like this:
```python
In [1]: import xarray as xr
In [2]: import numpy as np
In [3]: ds = xr.Dataset({'myvar': np.arange(100000000, dtype='float64')})
In [4]: ds.to_netcdf('test.nc')
```
Then, when I open the file in an IPython console and use auto-completion, it triggers loading the data.
```python
In [1]: import xarray as xr
In [2]: ds = xr.open_dataset('test.nc')
In [3]: ds.my # autocompletion with any character -> triggers loading
```
I don't have that issue when using the plain Python console. Auto-completion for dictionary access in IPython (#1632) works fine too.
#### Output of ``xr.show_versions()``
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_BE.UTF-8
LOCALE: fr_BE.UTF-8
xarray: 0.10.0rc1-2-gf83361c
pandas: 0.21.0
numpy: 1.13.1
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: 0.5.0
Nio: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.15.4
matplotlib: None
cartopy: None
seaborn: None
setuptools: 36.6.0
pip: 9.0.1
conda: None
pytest: None
IPython: 6.2.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1727/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
274591962,MDU6SXNzdWUyNzQ1OTE5NjI=,1722,Change in behavior of .set_index() from pandas 0.20.3 to 0.21.0,4160723,closed,0,,,1,2017-11-16T17:05:20Z,2017-11-17T00:54:51Z,2017-11-17T00:54:51Z,MEMBER,,,,"I use xarray 0.9.6 for both examples below.
With pandas 0.20.3, `Dataset.set_index` gives me what I expect (i.e., the `grid__x` data variable becomes a coordinate `x`):
```python
In [1]: import xarray as xr
In [2]: import pandas as pd
In [3]: pd.__version__
Out[3]: '0.20.3'
In [4]: ds = xr.Dataset({'grid__x': ('x', [1, 2, 3])})
In [5]: ds.set_index(x='grid__x')
Out[5]:
Dimensions: (x: 3)
Coordinates:
* x (x) int64 1 2 3
Data variables:
*empty*
```
With pandas 0.21.0, it creates a `MultiIndex`, which is not what I expect here when setting an index with only one data variable:
```python
In [1]: import xarray as xr
In [2]: import pandas as pd
In [3]: pd.__version__
Out[3]: '0.21.0'
In [4]: ds = xr.Dataset({'grid__x': ('x', [1, 2, 3])})
In [5]: ds.set_index(x='grid__x')
Out[5]:
Dimensions: (x: 3)
Coordinates:
* x (x) MultiIndex
- grid__x (x) int64 1 2 3
Data variables:
*empty*
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1722/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
134359597,MDU6SXNzdWUxMzQzNTk1OTc=,767,MultiIndex and data selection,4160723,closed,0,,,9,2016-02-17T18:24:00Z,2016-09-14T14:28:29Z,2016-09-14T14:28:29Z,MEMBER,,,,"[Edited for more clarity]
First of all, I find the MultiIndex very useful and I'm looking forward to seeing the TODOs in #719 implemented in the next releases, especially the first three in the list!
Apart from these issues, I think that some other aspects may be improved, notably regarding data selection. Or maybe I haven't correctly understood how to deal with multi-index and data selection...
To illustrate this, I use some fake spectral data with two discontinuous bands of different length / resolution:
```
In [1]: import numpy as np; import pandas as pd
In [2]: import xarray as xr
In [3]: band = np.array(['foo', 'foo', 'bar', 'bar', 'bar'])
In [4]: wavenumber = np.array([4050.2, 4050.3, 4100.1, 4100.3, 4100.5])
In [5]: spectrum = np.array([1.7e-4, 1.4e-4, 1.2e-4, 1.0e-4, 8.5e-5])
In [6]: s = pd.Series(spectrum, index=[band, wavenumber])
In [7]: s.index.names = ('band', 'wavenumber')
In [8]: da = xr.DataArray(s, dims='band_wavenumber')
In [9]: da
Out[9]:
array([ 1.70000000e-04, 1.40000000e-04, 1.20000000e-04,
1.00000000e-04, 8.50000000e-05])
Coordinates:
* band_wavenumber (band_wavenumber) object ('foo', 4050.2) ...
```
I extract the band 'bar' using `sel`:
```
In [10]: da_bar = da.sel(band_wavenumber='bar')
In [11]: da_bar
Out[11]:
array([ 1.20000000e-04, 1.00000000e-04, 8.50000000e-05])
Coordinates:
* band_wavenumber (band_wavenumber) object ('bar', 4100.1) ...
```
It selects the data the way I want, although using the dimension name is confusing in this case. It would be nice if we could also use the `MultiIndex` names as arguments of the `sel` method, even though I don't know if it is easy to implement.
Furthermore, `da_bar` still has the 'band_wavenumber' dimension and the 'band' index-level, but it is not very useful anymore. Ideally, I'd rather obtain a `DataArray` object with a 'wavenumber' dimension / coordinate and the 'bar' band name dropped from the multi-index, i.e., something that would require automatic index-level removal and/or automatic unstack when selecting data.
Extracting the band 'bar' from the pandas `Series` object gives something closer to what I need (see below), but using pandas is not an option as my spectral data involves other dimensions (e.g., time, scans, iterations...) not shown here for simplicity.
```
In [12]: s_bar = s.loc['bar']
In [13]: s_bar
Out[13]:
wavenumber
4100.1 0.000120
4100.3 0.000100
4100.5 0.000085
dtype: float64
```
Another problem is that the unstacked `DataArray` object resulting from the selection has the same dimensions and size as the original, unstacked `DataArray` object. The only difference is that unselected values are replaced by `nan`.
```
In [13]: da.unstack('band_wavenumber')
Out[13]:
array([[ nan, nan, 1.20000000e-04,
1.00000000e-04, 8.50000000e-05],
[ 1.70000000e-04, 1.40000000e-04, nan,
nan, nan]])
Coordinates:
* band (band) object 'bar' 'foo'
* wavenumber (wavenumber) float64 4.05e+03 4.05e+03 4.1e+03 4.1e+03 4.1e+03
In [14]: da_bar.unstack('band_wavenumber')
Out[14]:
array([[ nan, nan, 1.20000000e-04,
1.00000000e-04, 8.50000000e-05],
[ nan, nan, nan,
nan, nan]])
Coordinates:
* band (band) object 'bar' 'foo'
* wavenumber (wavenumber) float64 4.05e+03 4.05e+03 4.1e+03 4.1e+03 4.1e+03
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/767/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
169368546,MDU6SXNzdWUxNjkzNjg1NDY=,942,Filtering by data variable name,4160723,closed,0,,,3,2016-08-04T13:01:20Z,2016-08-04T19:09:07Z,2016-08-04T19:09:07Z,MEMBER,,,,"Given #844 and #916, it might be useful to also have a `Dataset.filter_by_name` method?
I currently deal with datasets that have many data variables with names like:
```
...
reference__HONO (rlevel) float64 3.16e-15 1e-14 1e-14 1e-14 ...
reference__NO (rlevel) float64 2.16e-05 3.57e-06 9.3e-07 ...
reference__HO2NO2 (rlevel) float64 9.58e-20 7.32e-19 4.63e-18 ...
...
retrieved__O3 (level) float64 1.552e-06 5.618e-07 ...
retrieved__N2O (level) float64 4.714e-11 9.905e-11 ...
retrieved__CO2 (level) float64 0.0002816 0.0003592 ...
...
```
Using `ds.filter_by_name(like='reference__')` would be less verbose than, e.g., `xr.Dataset({name: ds[name] for name in ds.keys() if 'reference__' in name})`, unless there is already a more convenient way that I'm missing?
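For illustration, the proposed behavior could be sketched as a standalone function. `filter_by_name` below is hypothetical (it is not an xarray method) and it only implements a substring match for the `like` argument.

```python
import xarray as xr


def filter_by_name(ds, like):
    # hypothetical helper: keep the data variables whose name contains `like`
    names = [name for name in ds.data_vars if like in str(name)]
    # indexing a Dataset with a list of names returns a new Dataset
    return ds[names]


ds = xr.Dataset(
    {
        'reference__NO': ('rlevel', [2.16e-05, 3.57e-06]),
        'retrieved__O3': ('level', [1.552e-06, 5.618e-07]),
    }
)
ref = filter_by_name(ds, 'reference__')
```

Compared with the dict comprehension above, this only iterates over data variables, so coordinates are never matched by name.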
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/942/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue