github: issues: 42 rows where type = "issue" and user = 4160723 sorted by updated_at descending

id	node_id	number	title	user	state	assignee	comments	created_at	updated_at	closed_at	author_association	body	reactions	state_reason	repo	type
1389295853	I_kwDOAMm_X85Szvjt	7099	Pass arbitrary options to sel()	benbovy 4160723	open		4	2022-09-28T12:44:52Z	2024-04-30T00:44:18Z		MEMBER	Is your feature request related to a problem? Currently `.sel()` accepts two options, `method` and `tolerance`. These are relevant for default (pandas) indexes but not necessarily for other, custom indexes. It would also be useful for custom indexes to expose their own selection options, e.g., index query optimizations like the `dualtree` flag of `sklearn.neighbors.KDTree.query`, or k-nearest neighbors selection with the creation of a new "k" dimension (+ coordinate / index) with user-defined name and size. From #3223, it would be nice if we could also pass distinct option values per index. What would be a good API for that? Describe the solution you'd like Some ideas: A. Allow passing a tuple `(labels, options_dict)` as indexer value: `ds.sel(x=([0, 2], {"method": "nearest"}), y=3)` B. Expose an `options` kwarg that would accept a nested dict: `ds.sel(x=[0, 2], y=3, options={"x": {"method": "nearest"}})` Option A does not look very readable. Option B is slightly better, although the nested dictionary is not great. Any other ideas? Some sort of context manager? Some `Index`-specific API? Describe alternatives you've considered The API proposed in #3223 would look great if `method` and `tolerance` were the only accepted options, but less so for arbitrary options. Additional context No response	{ "url": "https://api.github.com/repos/pydata/xarray/issues/7099/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
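The nested-dict variant (option B) can be prototyped in pure Python. This is a hypothetical sketch: `group_sel_options` is not part of Xarray; it only shows how an `options={"x": {...}}` argument could be paired with each indexer before being dispatched to the corresponding index, rejecting options given for coordinates that are not being selected.

```python
from __future__ import annotations

from collections.abc import Hashable, Mapping
from typing import Any


def group_sel_options(
    indexers: Mapping[Hashable, Any],
    options: Mapping[Hashable, Mapping[str, Any]] | None = None,
) -> dict[Hashable, tuple[Any, dict[str, Any]]]:
    """Pair each indexer's labels with its own (possibly empty) options dict.

    Hypothetical helper, not part of Xarray: ``options`` is a nested dict keyed
    by coordinate name, as in ``ds.sel(x=[0, 2], y=3, options={"x": {...}})``.
    """
    options = dict(options or {})
    # Options given for a coordinate that is not being selected are likely typos.
    unknown = [name for name in options if name not in indexers]
    if unknown:
        raise ValueError(f"options given for non-selected coordinate(s): {unknown}")
    return {
        name: (labels, dict(options.get(name, {})))
        for name, labels in indexers.items()
    }


# Per-index options for "x", no options for "y".
print(group_sel_options({"x": [0, 2], "y": 3}, {"x": {"method": "nearest"}}))
# {'x': ([0, 2], {'method': 'nearest'}), 'y': (3, {})}
```

Each index would then receive only its own `(labels, options)` pair, which is what makes distinct option values per index possible.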
915057433	MDU6SXNzdWU5MTUwNTc0MzM=	5452	[community] Flexible indexes meeting	benbovy 4160723	closed		7	2021-06-08T13:32:16Z	2024-02-15T01:39:08Z	2024-02-15T01:39:08Z	MEMBER	In addition to the bi-weekly community developers meeting, we plan to have 30min meetings on a weekly basis -- every Tue 8:30-9:00 PDT (17:30-18:00 CEST) -- to discuss the flexible indexes refactor. Anyone from @pydata/xarray feel free to join! The first meeting is in a couple of hours. Zoom link (subject to change). Google calendar Meeting notes	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5452/reactions", "total_count": 5, "+1": 5, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
1861543091	I_kwDOAMm_X85u9OSz	8097	Documentation rendering issues (dark mode)	benbovy 4160723	open		2	2023-08-22T14:06:03Z	2024-02-13T02:31:10Z		MEMBER	What is your issue? There are a couple of rendering issues on Xarray's documentation landing page, especially in dark mode. We should display two versions of the logo in light vs. dark mode (note: if the logo is in SVG format, it may be possible to add CSS classes so that it renders consistently with the active mode); same for the images in the section cards (it would also be nice to display all the images with the same width / height); if possible, it would be nice to move the Twitter logo right next to the GitHub logo (upper right) with consistent styling.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/8097/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
213004586	MDU6SXNzdWUyMTMwMDQ1ODY=	1303	`xarray.core.variable.as_variable()` part of the public API?	benbovy 4160723	closed		5	2017-03-09T11:07:52Z	2024-02-06T17:57:21Z	2017-06-02T17:55:12Z	MEMBER	Is it safe to use `xarray.core.variable.as_variable()` externally? I guess that currently it is not. I have a specific use case where this would be very useful. I'm working on a package that heavily uses and extends xarray for landscape evolution modeling, and inside a custom class for model parameters I want to be able to create `xarray.Variable` objects on the fly from any provided object, e.g., a scalar value, an array-like, a `(dims, data[, attrs])` tuple, another `xarray.Variable`, an `xarray.DataArray`... exactly what `xarray.core.variable.as_variable()` does. Although I know that `Variable` objects are not needed in most use cases, in this specific case a clean solution would be the following: ```python import xarray as xr class Parameter(object): def to_variable(self, obj): var = xr.as_variable(obj) # ... some validation logic on, e.g., data type, value bounds, dimensions... # ... add default attributes to the created variable (e.g., units, description...) return var ``` I don't think it is a viable option to copy `as_variable()` and all its dependent code into my package, as it seems to implement quite a lot of logic. A workaround using only the public API would be something like: ```python class Parameter(object): def to_variable(self, obj): return xr.Dataset(data_vars={'v': obj}).variables['v'] ``` but it feels a bit hacky.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1303/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
667864088	MDU6SXNzdWU2Njc4NjQwODg=	4285	Awkward array backend?	benbovy 4160723	open		38	2020-07-29T13:53:45Z	2023-12-30T18:47:48Z		MEMBER	Just curious if anyone here has thoughts on this. For more context: Awkward is like numpy but for arrays of very arbitrary (dynamic) structure. I don't know much yet about that library (I've just seen this SciPy 2020 presentation), but now I could imagine using xarray for dealing with labelled collections of geometrical / geospatial objects like polylines or polygons. At this stage, any integration between xarray and awkward arrays would be something highly experimental, but I think this might be an interesting case for flexible arrays (and possibly flexible indexes) mentioned in the roadmap. There is some discussion here: https://github.com/scikit-hep/awkward-1.0/issues/27. Does anyone see any other potential use case? cc @pydata/xarray	{ "url": "https://api.github.com/repos/pydata/xarray/issues/4285/reactions", "total_count": 6, "+1": 6, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
1989356758	I_kwDOAMm_X852kyzW	8447	Improve discoverability of backend engine options	benbovy 4160723	open		5	2023-11-12T11:14:56Z	2023-12-12T20:30:28Z		MEMBER	Is your feature request related to a problem? Backend engine options are not easily discoverable and we need to know or figure them out before passing them as kwargs to `xr.open_dataset()`. Describe the solution you'd like The solution is similar to the one proposed in #8002 for setting a new index. The API could look like this: ```python import xarray as xr ds = xr.open_dataset( file_or_obj, engine=xr.backends.engine("myengine").with_options( option1=True, option2=100, ), ) ``` where `xr.backends.engine("myengine")` returns the `MyEngineBackendEntrypoint` subclass. We would need to extend the API for `BackendEntrypoint` with a `.with_options()` factory method: ```python class BackendEntrypoint: _options: dict[str, Any] @classmethod def with_options(cls): """This backend does not implement `with_options`.""" raise NotImplementedError() ``` Such that ```python class MyEngineBackendEntrypoint(BackendEntrypoint): open_dataset_parameters = ("option1", "option2") @classmethod def with_options( cls, option1: bool = False, option2: int | None = None, ): """Get the backend with user-defined options. Parameters ----------- option1 : bool, optional This is option1. option2 : int, optional This is option2. """ obj = cls() # maybe validate the given input options if option2 is None: option2 = 1 obj._options = {"option1": option1, "option2": option2} return obj def open_dataset( self, filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore, *, drop_variables: str | Iterable[str] | None = None, **kwargs, # no static checker error (Liskov substitution principle) ): # kwargs passed directly to open_dataset take precedence over options # or alternatively raise an error? option1 = kwargs.get("option1", self._options.get("option1", False)) ... ``` Pros: Using `.with_options(...)` would seamlessly work with IDE auto-completion, static type checkers (I guess? I'm not sure how static checkers support entry points), documentation, etc. There is no breaking change (`xr.open_dataset(obj, engine=...)` accepts either a string or a `BackendEntrypoint` subclass but not yet a `BackendEntrypoint` instance) and this feature could be adopted progressively by existing third-party backends. Cons: The possible duplicated declaration of options among `open_dataset_parameters`, `.with_options()` and `.open_dataset()` does not look super nice, but I don't really know how to avoid that. Describe alternatives you've considered A `BackendEntrypoint.with_options()` factory is not really needed and we could just go with `BackendEntrypoint.__init__()` instead. Perhaps `with_options` looks a bit clearer and leaves room for more flexibility in `__init__`, though? Additional context cc @jsignell https://github.com/stac-utils/pystac/issues/846#issuecomment-1405758442	{ "url": "https://api.github.com/repos/pydata/xarray/issues/8447/reactions", "total_count": 4, "+1": 4, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
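A minimal, self-contained sketch of the `.with_options()` factory pattern proposed in #8447, runnable without Xarray. `EntrypointBase` is a hypothetical stand-in for `xarray.backends.BackendEntrypoint` and `MyEngineEntrypoint` is a hypothetical backend; instead of opening a file, `open_dataset` returns the resolved options so the precedence rule (keyword arguments passed directly override options stored by `.with_options()`) can be checked.

```python
from __future__ import annotations

from typing import Any


class EntrypointBase:
    """Hypothetical stand-in for xarray's BackendEntrypoint."""

    # Options stored by with_options(); empty for a plain instance.
    _options: dict[str, Any] = {}

    @classmethod
    def with_options(cls) -> EntrypointBase:
        raise NotImplementedError("This backend does not implement `with_options`.")


class MyEngineEntrypoint(EntrypointBase):
    """Hypothetical third-party backend with two options."""

    open_dataset_parameters = ("option1", "option2")
    _defaults = {"option1": False, "option2": 1}

    @classmethod
    def with_options(
        cls, option1: bool = False, option2: int | None = None
    ) -> MyEngineEntrypoint:
        """Return a backend instance carrying user-defined options."""
        obj = cls()
        obj._options = {
            "option1": option1,
            "option2": 1 if option2 is None else option2,
        }
        return obj

    def open_dataset(
        self, filename_or_obj: Any, *, drop_variables: Any = None, **kwargs: Any
    ) -> dict[str, Any]:
        # Defaults < options stored by with_options() < kwargs passed directly.
        resolved = {**self._defaults, **self._options}
        resolved.update(
            (k, v) for k, v in kwargs.items() if k in self.open_dataset_parameters
        )
        # A real backend would open `filename_or_obj` and return a Dataset here.
        return resolved


backend = MyEngineEntrypoint.with_options(option1=True)
print(backend.open_dataset("data.nc", option2=5))  # {'option1': True, 'option2': 5}
```

Storing the options on an instance (rather than on the class) keeps two differently configured uses of the same engine independent.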
1148021907	I_kwDOAMm_X85EbWyT	6293	Explicit indexes: next steps	benbovy 4160723	open		3	2022-02-23T12:19:38Z	2023-12-01T09:34:28Z		MEMBER	#5692 is ~~not merged yet~~ now merged ~~but~~ and we can ~~already~~ start thinking about the next steps. I’m opening this issue to list and track the remaining tasks. @pydata/xarray, do not hesitate to add a comment below if you think of something that is missing here. Continue the refactoring of the internals. Although in #5692 everything seems to work with the current pandas index wrappers for dimension coordinates, not all of Xarray's internals have been refactored yet to fully support (or at least be compatible with) custom indexes. Here is a list of `Dataset` / `DataArray` methods that still need to be checked / updated (this list may be incomplete): [ ] `as_numpy` (#8001) [ ] `broadcast` (#6430, #6481) [ ] `drop_sel` (#6605, #7699) [ ] `drop_isel` [ ] `drop_dims` [ ] `drop_duplicates` (#8499) [ ] `transpose` [ ] `interpolate_na` [ ] `ffill` [ ] `bfill` [ ] `reduce` [ ] `map` [ ] `apply` [ ] `quantile` [ ] `rank` [ ] `integrate` [ ] `cumulative_integrate` [ ] `filter_by_attrs` [ ] `idxmin` [ ] `idxmax` [ ] `argmin` [ ] `argmax` [ ] `concat` (partially refactored, may not fully work with multi-dimension indexes) [ ] `polyfit` I ended up following a common pattern in #5692 when adding explicit / flexible index support for various features (it is quite generic, though; the actual procedure may vary from one case to another and many steps may be skipped): Check if it’s worth adding a new method to the Xarray `Index` base class.
There may be several motivations: avoid handling Pandas index objects inside Dataset or DataArray methods (even if we don’t plan to fully support custom indexes for everything, it is preferable to put this logic behind the `PandasIndex` or `PandasMultiIndex` wrapper classes for clarity, and also in case we eventually want to make Xarray less dependent on Pandas); we want a specific implementation rather than relying on the `Variable`’s corresponding method, for speed-up or for other reasons, e.g., `IndexVariable.concat` exists to avoid unnecessary Pandas/NumPy conversions; in #5692 `PandasIndex.concat` has the same logic and will fully replace the former if/once we get rid of `IndexVariable`; `PandasIndex.roll` reuses `pandas.Index` indexing and `append` capabilities. The `Index` API closely follows the DataArray, Dataset and Variable API (i.e., same method names) for consistency. Within the Dataset or DataArray method, first call the `Index` API (if it exists) to create new indexes. The `Indexes` class (i.e., the `.xindexes` property returns an instance of this class) provides a convenient API for iterating through indexes (e.g., get a list of unique indexes, get all coordinates or dimensions for a given index, etc.).
If there’s no implementation for the called `Index` API, either raise an error or fall back to calling the `Variable` API (below), depending on the case. Create new coordinate variables for each of the new indexes using `Index.create_variables`. It is possible to pass a dict of current coordinate variables to `Index.create_variables`; it is used to propagate variable metadata (`dtype`, `attrs` and `encoding`). Not all indexes should create new coordinate variables, only those for which it is possible to reuse index data as coordinate variable data (like Pandas indexes). Iterate through the variables and call the `Variable` API (if it exists). Skip new coordinate variables created at the previous step (just reuse them). Propagate the indexes that are not affected by the operation and clean up all indexes, i.e., ensure consistency between indexes and coordinate variables. There are a couple of convenient methods that have been added in #5692 for that purpose: `filter_indexes_from_coords` and `assert_no_index_corrupted`. Replace indexes and variables, e.g., using the `_replace`, `_replace_with_new_dims` or `_overwrite_indexes` methods. Relax all constraints related to “dimension (index) coordinates” in Xarray: [x] Allow multi-dimensional variables whose name matches one of their dimensions: #2233 #2405 (https://github.com/pydata/xarray/pull/2405#issuecomment-419969570) #7989. Indexes repr: [x] Add an `Indexes` section to Dataset and DataArray reprs #6795 #7185 [ ] Make the repr of `Indexes` (i.e., `.xindexes` property) consistent with the repr of `Coordinates` (`.coords` property) [x] Add `Index._repr_inline_` for tweaking the inline representation of each index shown in the reprs above #7183. Public API for assigning and (re)setting indexes: There is no public API yet for creating and/or assigning existing indexes to Dataset and DataArray objects.
[ ] Enable and/or document the `indexes` parameter in Dataset and DataArray constructors [ ] Deprecate the implicit creation of pandas multi-index wrappers (and their corresponding coordinates) from anything passed via the `data`, `data_vars` or `coords` arguments in favor of a more explicit way to pass it. [ ] https://github.com/pydata/xarray/issues/6633 (pass empty dictionary) #6392 #7214 #7368 [x] Add `set_xindex` and `drop_indexes` methods #6849 #6971 Deprecate `set_index` and `reset_index`? (See https://github.com/pydata/xarray/issues/4366#issuecomment-920458966) We still need to figure out how best we can (1) assign existing indexes (possibly with their coordinates) and (2) pass index build options. Other public API for index-based operations: To fully leverage the power and flexibility of custom indexes, we might want to update some parts of Xarray’s public API in order to allow passing arbitrary options per index. For example: [ ] `sel`: the current `method` and `tolerance` may not be relevant for all indexes; pass extra arguments to SciPy's `cKDTree.query`, etc. #7099 [ ] `align`: #2217 Also: [ ] Make the `Indexes` API public, as it provides convenient methods that might be useful for end-users [ ] Import the `Index` base class into Xarray’s main namespace (i.e., `xr.Index`)? Also `PandasIndex` and `PandasMultiIndex`? The latter may be useful if we deprecate `set_index(append=True)` and/or if we deprecate “unpacking” `pandas.MultiIndex` objects to coordinates when given as `coords` in the Dataset / DataArray constructors. [ ] Add references in docstrings (https://github.com/pydata/xarray/pull/5692#discussion_r820117354).
Documentation: [ ] User guide: [x] Update the “Terminology” section: “Index” may include custom indexes, review “Dimension coordinate” / “Non-dimension coordinate” as “Indexed coordinate” / “Non-indexed coordinate” [ ] Update the “Data structure” section such that it clearly mentions indexes as first-class citizens of the Xarray data model [ ] Maybe update other parts of the documentation that refer to the concept of “dimension coordinate” [ ] API reference: [ ] add `Indexes` API [ ] add `Index` API: #6975 [ ] Xarray internals: add a subsection on how to add custom indexes, maybe with some basic examples: #6975 [ ] Update development roadmap section. Index types and helper classes built in Xarray: [ ] Since a lot of potential use cases for custom indexes may consist of adding some extra logic on top of one or more pandas indexes along one or more dimensions (i.e., “meta-indexes”), it might be worth providing a helper `Index` abstract subclass that would basically dispatch the given arguments to the corresponding, encapsulated `PandasIndex` instances and then merge the results #7182 [ ] Deprecate `PandasMultiIndex` dimension coordinate? Third-party indexes: [ ] Add custom index entrypoint / plugin system, similarly to storage backend entrypoints	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6293/reactions", "total_count": 12, "+1": 6, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 6, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
1890893841	I_kwDOAMm_X85wtMAR	8171	Fancy reprs	benbovy 4160723	open		10	2023-09-11T16:46:43Z	2023-09-15T21:07:52Z		MEMBER	What is your issue? In Xarray we already have the plain-text and HTML reprs, which is great. Recently, I've tried anywidget and I think that it has the potential to overcome some of the limitations of the current repr and possibly go well beyond it. The main advantages of anywidget: it is broadly compatible with Jupyter-like front-ends (JupyterLab, Notebook, VS Code, Colab, etc.), although I haven't tested it myself on all those front-ends yet; it is super easy to get started: almost no project setup (build, packaging) is required before experimenting with it, although it still requires writing JavaScript / HTML / CSS, etc. I don't think we should replace the current HTML repr (it is still useful to have a basic, pure HTML/CSS version), but having a new widget could improve some aspects like not including the whole CSS each time an object repr is displayed, removing some HTML/CSS hacks... and actually has much more potential since we would have the whole JavaScript ecosystem at our fingertips (quick plots, etc.). Bidirectional communication with Python is also possible. I'm opening this issue to brainstorm about what would be nice to have in widget-based Xarray reprs: fancy hover effects (e.g., highlight all variables sharing common dimensions, coordinates sharing a common index, etc.); more icons next to each variable repr (attributes, array repr, quick plot? quick map?); ... ? cc @pydata/xarray	{ "url": "https://api.github.com/repos/pydata/xarray/issues/8171/reactions", "total_count": 5, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 2, "eyes": 0 }		xarray 13221727	issue
1889195671	I_kwDOAMm_X85wmtaX	8166	Dataset.from_dataframe: deprecate expanding the multi-index	benbovy 4160723	open		3	2023-09-10T15:54:31Z	2023-09-11T06:20:50Z		MEMBER	What is your issue? Let's continue the discussion here about changing the behavior of Dataset.from_dataframe (see https://github.com/pydata/xarray/pull/8140#issuecomment-1712485626). The current behaviour of Dataset.from_dataframe where it always unstacks feels wrong to me. To me, it seems sensible that Dataset.from_dataframe(df) automatically creates a Dataset with PandasMultiIndex if df has a MultiIndex. The user can then use that or quite easily unstack to a dense or sparse array. If we no longer unstack the multi-index in `Dataset.from_dataframe`, are we OK with the "Dataset -> DataFrame -> Dataset" round-trip no longer yielding the expected result unless we unstack explicitly? ```python ds = xr.Dataset( {"foo": (("x", "y"), [[1, 2], [3, 4]])}, coords={"x": ["a", "b"], "y": [1, 2]}, ) df = ds.to_dataframe() ds2 = xr.Dataset.from_dataframe(df, dim="z") ds2.identical(ds) # False ds2.unstack("z").identical(ds) # True ``` cc @max-sixty @dcherian	{ "url": "https://api.github.com/repos/pydata/xarray/issues/8166/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
1364388790	I_kwDOAMm_X85RUuu2	7002	Custom indexes and coordinate (re)ordering	benbovy 4160723	open		2	2022-09-07T09:44:12Z	2023-08-23T14:35:32Z		MEMBER	What is your issue? (From https://github.com/pydata/xarray/issues/5647#issuecomment-946546464). The current alignment logic (as refactored in #5692) requires that two compatible indexes (i.e., of the same type) relate to one or more coordinates with matching names, and also in a matching order. For some multi-coordinate indexes like `PandasMultiIndex` this makes sense. However, for other multi-coordinate indexes (e.g., staggered grid indexes) the order of the coordinates doesn't matter much. Possible options: Setting new Xarray indexes may reorder the coordinate variables, possibly via `Index.create_variables()`, to ensure consistent order. Xarray indexes must implement an `Index.matching_key` abstract property in order to support re-indexing and alignment. Take care of coordinate order (and maybe other things) inside `Index.join` and `Index.equals`, e.g., for `PandasMultiIndex` maybe reorder the levels beforehand. Pros: more flexible. Cons: not great to implicitly reorder levels if it's a costly operation? Find matching indexes using a two-pass approach: (1) group all indexes by dimension name and (2) check compatibility between the indexes listed in each group.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/7002/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
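The two-pass matching option could look roughly like the following sketch. `IndexInfo` and `match_indexes` are hypothetical illustrations, not Xarray internals, and "compatible" is simplified here to: same index type and same set of coordinate names, with their order ignored. `StaggeredGridIndex` is likewise a made-up type name.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexInfo:
    """Hypothetical minimal description of an index attached to one object."""

    index_type: str
    coord_names: tuple[str, ...]
    dims: tuple[str, ...]


def match_indexes(objects: list[list[IndexInfo]]) -> dict[frozenset[str], list[IndexInfo]]:
    """Group indexes across objects by dimension(s), then check compatibility."""
    # Pass 1: group every index of every object by the dimensions it spans.
    groups: dict[frozenset[str], list[IndexInfo]] = defaultdict(list)
    for indexes in objects:
        for idx in indexes:
            groups[frozenset(idx.dims)].append(idx)

    # Pass 2: indexes in a group must have the same type and the same
    # coordinate names, but the *order* of those names is ignored.
    for dims, members in groups.items():
        first = members[0]
        for other in members[1:]:
            same_type = other.index_type == first.index_type
            same_coords = set(other.coord_names) == set(first.coord_names)
            if not (same_type and same_coords):
                raise ValueError(f"incompatible indexes along dimension(s) {sorted(dims)}")
    return dict(groups)


# A staggered-grid-like index whose coordinates are listed in a different order.
a = [IndexInfo("StaggeredGridIndex", ("x_center", "x_face"), ("x",))]
b = [IndexInfo("StaggeredGridIndex", ("x_face", "x_center"), ("x",))]
print(len(match_indexes([a, b])[frozenset({"x"})]))  # 2
```

A `PandasMultiIndex`-like index that does care about level order could override this by comparing `coord_names` as tuples instead of sets.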
979316661	MDU6SXNzdWU5NzkzMTY2NjE=	5738	Flexible indexes: how to handle possible dimension vs. coordinate name conflicts?	benbovy 4160723	closed		4	2021-08-25T15:31:39Z	2023-08-23T13:28:41Z	2023-08-23T13:28:40Z	MEMBER	Another thing that I've noticed while working on #5692. Currently it is not possible to have a Dataset with the same name used for both a dimension and a multi-index level. I guess the reason is to prevent errors such as mismatched dimension sizes when the multi-index is eventually dropped with renamed dimension(s) according to the level names (e.g., with `sel` or `unstack`). See #2299. I'm wondering how we should handle this in the context of flexible / custom indexes: A. Keep the current behavior as a special case for (pandas) multi-indexes. This would avoid breaking changes, but how would we support custom indexes that could eventually be used like pandas multi-indexes in `sel` or `stack`? B. Introduce some tag in `xarray.Index` so that we can identify a multi-coordinate index that behaves like a hierarchical index (i.e., levels may be dropped into a single index/coordinate with dimension renaming). C. Do not allow any dimension name matching the name of a coordinate attached to a multi-coordinate index. This seems silly? D. Eventually revert #2353 and let users take care of potential conflicts.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5738/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
1175329407	I_kwDOAMm_X85GDhp_	6392	Pass indexes to the Dataset and DataArray constructors	benbovy 4160723	closed		6	2022-03-21T12:41:51Z	2023-07-21T20:40:05Z	2023-07-21T20:40:04Z	MEMBER	Is your feature request related to a problem? This is part of #6293 (explicit indexes next steps). Describe the solution you'd like A `Mapping[Hashable, Index]` would probably be the most obvious (optional) value type accepted for the `indexes` argument of the Dataset and DataArray constructors. Pros: consistent with the `xindexes` property. Cons: need to be careful with what is passed as `coords` and `indexes`; for multi-indexes, redundancy and order matter (e.g., pandas multi-index levels). An example with a pandas multi-index: Currently a pandas multi-index may be passed directly as one (dimension) coordinate; it is then "unpacked" into one dimension (tuple values) coordinate and one or more level coordinates. I would suggest deprecating this behavior in favor of a more explicit (although more verbose) way to pass an existing pandas multi-index: ```python import pandas as pd import xarray as xr pd_idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("foo", "bar")) idx = xr.PandasMultiIndex(pd_idx, "x") indexes = {"x": idx, "foo": idx, "bar": idx} coords = idx.create_variables() ds = xr.Dataset(coords=coords, indexes=indexes) ``` The cases below should raise an error: ```python ds = xr.Dataset(indexes=indexes) # ValueError: missing coordinate(s) for index(es): 'x', 'foo', 'bar' ds = xr.Dataset( coords=coords, indexes={"x": idx, "foo": idx}, ) # ValueError: missing index(es) for coordinate(s): 'bar' ds = xr.Dataset( coords={"x": coords["x"], "foo": [0, 1, 2, 3], "bar": coords["bar"]}, indexes=indexes, ) # ValueError: conflict between coordinate(s) and index(es): 'foo' ds = xr.Dataset( coords=coords, indexes={"x": idx, "foo": idx, "bar": xr.PandasIndex([0, 1, 2], "y")}, ) # ValueError: conflict between coordinate(s) and index(es): 'bar' ``` Should we raise an error or simply ignore the
index in the case below? ```python ds = xr.Dataset(coords=coords) # ValueError: missing index(es) for coordinate(s): 'x', 'foo', 'bar' # or create unindexed coordinates 'foo' and 'bar' and an 'x' coordinate with a single pandas index ``` Should we silently reorder the coordinates and/or indexes when the levels are not passed in the right order? It seems odd to require mapping elements to be passed in a given order. ```python ds = xr.Dataset(coords=coords, indexes={"bar": idx, "x": idx, "foo": idx}) list(ds.xindexes.keys()) # ["x", "foo", "bar"] ``` How to generalize to any (custom) index? In the case of a multi-index, it is pretty easy to check whether the coordinates and indexes are consistent because we ensure consistent `pd_idx.names` vs. coordinate names and because `idx.get_variables()` returns Xarray `IndexVariable` objects where variable data wraps the pandas multi-index. However, this may not be easy for other indexes. Some Xarray custom indexes (like a KD-Tree index) likely won't return anything from `.get_variables()` as they don't support wrapping internal data as coordinate data. Right now there's nothing in the Xarray `Index` base class that could help check consistency between indexes and coordinates for any kind of index. How could we solve this? A. Add a `.coords` property to the Xarray `Index` base class that returns a `dict[Hashable, IndexVariable]`. This is ambiguous when an Index is created directly, e.g., like above `xr.PandasMultiIndex(pd_idx, "x")`. Should `.coords` return `None` and return the coordinates returned by the last `.get_variables()` call? What if different sets of coordinates refer to a common index (e.g., after copying the coordinate variables, etc.)? B. Add a `.coord_names` property to the Xarray `Index` base class that returns `tuple[Hashable, ...]`, and add a private attribute to `IndexVariable` that returns the index object (or return it via a very lightweight `IndexAdapter` base class used to wrap variable data).
`Index.get_variables(variables)` would by default return shallow copies of the input variables with a reference to the index object. If necessary, we could also store the coordinate dimensions in `coord_names`, i.e., using `tuple[tuple[Hashable, tuple[Hashable, ...]], ...]`. I think I prefer the second option. Describe alternatives you've considered Also allow passing index types (and build options) via `indexes`, i.e., `Mapping[Hashable, Index | Type[Index] | tuple[Type[Index], Mapping[Any, Any]]]`, so that new indexes can be created from the passed coordinates at DataArray or Dataset creation. Pros: flexible. Cons: this is complicated; constructing the Dataset / DataArray (with default indexes) first and then calling `.set_index` is probably better; hard to deal with multi-indexes (redundancy of build options, etc.). Pass multi-indexes once, grouped by coordinate names, i.e., `indexes` keys accept tuples: `Mapping[Hashable | tuple[Hashable, ...], Index]`. Pros: no redundancy and easier to check consistency between indexes and coordinates. Cons: not consistent with the `.xindexes` property; complicated if tuples are also used as coordinate names? Additional context No response	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6392/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
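Option B in #6392 (an `Index.coord_names` property) would make the consistency checks listed in that issue straightforward for any index type. The following is a hypothetical pure-Python validator: `SketchIndex` stands in for an Xarray index and `check_coords_indexes` is not an Xarray function, while the error messages mirror the ones proposed in the issue. Unlike the `'foo'` case in the issue, it does not compare coordinate data with index data.

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable, Mapping


class SketchIndex:
    """Hypothetical stand-in for an Xarray index exposing ``coord_names``."""

    def __init__(self, *coord_names: Hashable) -> None:
        self.coord_names = coord_names


def _fmt(names: Iterable[Hashable]) -> str:
    return ", ".join(repr(name) for name in names)


def check_coords_indexes(
    coord_names: Iterable[Hashable], indexes: Mapping[Hashable, SketchIndex]
) -> None:
    """Raise ValueError if coordinates and indexes are inconsistent."""
    coord_names = list(coord_names)

    # Every key of `indexes` must refer to an existing coordinate.
    missing_coords = [name for name in indexes if name not in coord_names]
    if missing_coords:
        raise ValueError(f"missing coordinate(s) for index(es): {_fmt(missing_coords)}")

    for idx in indexes.values():
        # A multi-coordinate index must be given once for each of its coordinates...
        missing_idx = [c for c in idx.coord_names if c not in indexes]
        if missing_idx:
            raise ValueError(f"missing index(es) for coordinate(s): {_fmt(missing_idx)}")
        # ...and every one of those entries must be the very same index object.
        conflicts = [c for c in idx.coord_names if indexes[c] is not idx]
        if conflicts:
            raise ValueError(
                f"conflict between coordinate(s) and index(es): {_fmt(conflicts)}"
            )


idx = SketchIndex("x", "foo", "bar")
check_coords_indexes(["x", "foo", "bar"], {"x": idx, "foo": idx, "bar": idx})  # OK
try:
    check_coords_indexes(["x", "foo", "bar"], {"x": idx, "foo": idx})
except ValueError as err:
    print(err)  # missing index(es) for coordinate(s): 'bar'
```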
1812008663	I_kwDOAMm_X85sAQ7X	8002	Improve discoverability of index build options	benbovy 4160723	open		2	2023-07-19T13:54:09Z	2023-07-19T17:48:51Z		MEMBER	Is your feature request related to a problem? Currently `Dataset.set_xindex(coord_names, index_cls=None, **options)` allows passing index build options (if any) via the `**options` keyword arguments. Those options are not easily discoverable, though (no auto-completion, etc.). Describe the solution you'd like What about something like this? ```python ds.set_xindex("x", MyCustomIndex.with_options(foo=1, bar=True)) # or ds.set_xindex("x", MyCustomIndex.with_options(foo=1, bar=True)) ``` This would require adding a `.with_options()` class method that can be overridden in Index subclasses (optional): ```python # xarray.core.indexes class Index: @classmethod def with_options(cls) -> tuple[type[Self], dict[str, Any]]: return cls, {} ``` ```python # third-party code from xarray.indexes import Index class MyCustomIndex(Index): @classmethod def with_options(cls, foo: int = 0, bar: bool = False) -> tuple[type[Self], dict[str, Any]]: """Set a new MyCustomIndex with options. Parameters ------------ foo : int, optional The foo option (default: 0). bar : bool, optional The bar option (default: False). """ return cls, {"foo": foo, "bar": bar} ``` Thoughts? Describe alternatives you've considered Build options are also likely defined in the Index constructor, e.g., ```python # third-party code from xarray.indexes import Index class MyCustomIndex(Index): def __init__(self, data, foo=0, bar=False): ... ``` However, the Index constructor is not public API (only used internally and indirectly in Xarray when setting a new index from existing coordinates). Any other idea? Additional context No response	{ "url": "https://api.github.com/repos/pydata/xarray/issues/8002/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
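A self-contained sketch of the `(index_cls, options)` pair returned by the `.with_options()` class method proposed in #8002, and of how a `set_xindex`-like function could unpack it. `IndexBase`, `MyCustomIndex` and `build_index` are hypothetical stand-ins, not Xarray's API. The sketch assumes that keyword options passed explicitly override those bundled by `.with_options()`; the issue does not specify a precedence.

```python
from __future__ import annotations

from typing import Any


class IndexBase:
    """Hypothetical stand-in for xarray.Index (not the real class)."""

    @classmethod
    def with_options(cls) -> tuple[type[IndexBase], dict[str, Any]]:
        # Default: no build options.
        return cls, {}


class MyCustomIndex(IndexBase):
    """Hypothetical third-party index with two build options."""

    def __init__(self, coord_names: list[str], foo: int = 0, bar: bool = False):
        self.coord_names, self.foo, self.bar = coord_names, foo, bar

    @classmethod
    def with_options(
        cls, foo: int = 0, bar: bool = False
    ) -> tuple[type[MyCustomIndex], dict[str, Any]]:
        """Return this index class with its build options (IDE-discoverable)."""
        return cls, {"foo": foo, "bar": bar}


def build_index(
    coord_names: list[str],
    index: type[IndexBase] | tuple[type[IndexBase], dict[str, Any]],
    **options: Any,
) -> IndexBase:
    """Hypothetical helper mimicking how set_xindex could accept either form."""
    if isinstance(index, tuple):
        index_cls, bundled = index
        # Assumption: options passed explicitly override bundled ones.
        options = {**bundled, **options}
    else:
        index_cls = index
    return index_cls(coord_names, **options)


new_index = build_index(["x"], MyCustomIndex.with_options(foo=1, bar=True))
print(new_index.foo, new_index.bar)  # 1 True
```

Because `.with_options()` has an explicit signature, IDEs and type checkers can complete and validate `foo` and `bar`, which is the discoverability gain the issue is after.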
1472470718	I_kwDOAMm_X85XxB6-	7346	assign_coords reset all dimension coords to default (pandas) index	benbovy 4160723	closed		0	2022-12-02T08:07:55Z	2022-12-02T16:32:41Z	2022-12-02T16:32:41Z	MEMBER	What happened? See https://github.com/martinfleis/xvec/issues/13#issue-1472023524 What did you expect to happen? `assign_coords()` should preserve the index of coordinates that are not updated or not part of a dropped multi-coordinate index. Minimal Complete Verifiable Example See https://github.com/martinfleis/xvec/issues/13#issue-1472023524 MVCE confirmation [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. [X] Complete example — the example is self-contained, including all data and the text of any traceback. [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result. [X] New issue — a search of GitHub Issues suggests this is not a duplicate. Relevant log output No response Anything else we need to know? No response Environment Xarray version 2022.11.0	{ "url": "https://api.github.com/repos/pydata/xarray/issues/7346/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
1151751524	I_kwDOAMm_X85EplVk	6308	xr.doctor(): diagnostics on a Dataset / DataArray ?	benbovy 4160723	open		4	2022-02-26T12:10:07Z	2022-11-07T15:28:35Z		MEMBER	Is your feature request related to a problem? Recently I've been reading through various issue reports here and there (GH issues and discussions, forums, etc.) and I'm wondering if it would be useful to have a function in Xarray that inspects a Dataset or DataArray and reports a set of diagnostics, so that the community could better help troubleshoot performance or other issues faced by users. It's not always obvious where to look (e.g., number of chunks of a dask array, number of tasks of a dask graph, etc.) to diagnose issues, sometimes even for experienced users. Describe the solution you'd like An `xr.doctor(dataset_or_dataarray)` top-level function (or `Dataset.doctor()` / `DataArray.doctor()` methods) that would perform a battery of checks and return helpful diagnostics, e.g., "Data variable "x" wraps a dask array that contains a lot of tasks, which may affect performance" "Data variable "x" wraps a dask array that contains many small chunks" ... possibly many other diagnostics? Describe alternatives you've considered None Additional context No response	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6308/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
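A rough sketch of the check-and-report loop an `xr.doctor()` function could run. It is not Xarray code: `ArrayInfo` is a hypothetical summary of a dask-backed variable (its chunk count, task count and typical chunk size), and both thresholds are arbitrary assumptions chosen only for illustration. A real implementation would read these values from the wrapped dask arrays:

```python
from dataclasses import dataclass


@dataclass
class ArrayInfo:
    """Hypothetical summary of one dask-backed data variable."""

    name: str
    n_chunks: int
    n_tasks: int
    chunk_nbytes: int  # approximate size of a single chunk, in bytes


def doctor(variables, max_tasks=100_000, min_chunk_nbytes=1_000_000):
    """Return human-readable diagnostics (thresholds are illustrative assumptions)."""
    messages = []
    for v in variables:
        if v.n_tasks > max_tasks:
            messages.append(
                f'Data variable "{v.name}" wraps a dask array that contains '
                "a lot of tasks, which may affect performance"
            )
        if v.n_chunks > 1 and v.chunk_nbytes < min_chunk_nbytes:
            messages.append(
                f'Data variable "{v.name}" wraps a dask array that contains '
                "many small chunks"
            )
    return messages
```

Returning a list of messages (rather than printing) would let users and maintainers paste the report into an issue, or let tests assert on it.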
1322198907	I_kwDOAMm_X85Ozyd7	6849	Public API for setting new indexes: add a set_xindex method?	benbovy 4160723	closed		5	2022-07-29T12:38:34Z	2022-09-28T07:25:16Z	2022-09-28T07:25:16Z	MEMBER	What is your issue? xref https://github.com/pydata/xarray/pull/6795#discussion_r932665544 and #6293 (Public API section). The `scipy22` branch contains the addition of a `.set_xindex()` method to DataArray and Dataset so that participants at the SciPy 2022 Xarray sprint could experiment with custom indexes. After thinking more about it, I'm wondering if it couldn't actually be part of Xarray's public API alongside `.set_index()` (at least for a while). Having two methods `.set_xindex()` vs. `.set_index()` would be quite consistent with the `.xindexes` vs. `.indexes` properties that are already there. I actually like the `.set_xindex()` API proposed in the `scipy22` branch, i.e., setting one index at a time from one or more coordinates, possibly with build options. While it could be possible to support both that and `.set_index()`'s current API (quite specific to pandas multi-indexes) all in one method, it would certainly result in a much more confusing API and internal implementation. In the long term we could progressively get rid of `.indexes` and `.set_index()` and/or rename `.xindexes` to `.indexes` and `.set_xindex()` to `.set_index()`. Thoughts @pydata/xarray?	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6849/reactions", "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
1361896826	I_kwDOAMm_X85RLOV6	6989	reset multi-index to single index (level): coordinate not renamed	benbovy 4160723	closed	benbovy 4160723	0	2022-09-05T12:45:22Z	2022-09-27T10:35:39Z	2022-09-27T10:35:39Z	MEMBER	What happened? Resetting a multi-index to a single level (i.e., a single index) does not rename the remaining level coordinate to the dimension name. What did you expect to happen? While it is certainly more consistent not to rename the level coordinate here (since an index can now be assigned to a non-dimension coordinate), it breaks from the old behavior. I think it's better not to introduce any breaking change. As discussed elsewhere, we might eventually want to deprecate `reset_index` in favor of `drop_indexes` (#6971). Minimal Complete Verifiable Example ```Python import pandas as pd import xarray as xr midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("foo", "bar")) ds = xr.Dataset(coords={"x": midx}) <xarray.Dataset> Dimensions: (x: 4) Coordinates: * x (x) object MultiIndex * foo (x) object 'a' 'a' 'b' 'b' * bar (x) int64 1 2 1 2 Data variables: empty rds = ds.reset_index("foo") v2022.03.0 <xarray.Dataset> Dimensions: (x: 4) Coordinates: * x (x) int64 1 2 1 2 foo (x) object 'a' 'a' 'b' 'b' Data variables: empty v2022.06.0 <xarray.Dataset> Dimensions: (x: 4) Coordinates: foo (x) object 'a' 'a' 'b' 'b' * bar (x) int64 1 2 1 2 Dimensions without coordinates: x Data variables: empty ``` MVCE confirmation [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. [X] Complete example — the example is self-contained, including all data and the text of any traceback. [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result. [X] New issue — a search of GitHub Issues suggests this is not a duplicate. Relevant log output No response Anything else we need to know? 
No response Environment	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6989/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
1361626450	I_kwDOAMm_X85RKMVS	6987	Indexes.get_unique() TypeError with pandas indexes	benbovy 4160723	closed	benbovy 4160723	0	2022-09-05T09:02:50Z	2022-09-23T07:30:39Z	2022-09-23T07:30:39Z	MEMBER	@benbovy I also just tested the `get_unique()` method that you mentioned and maybe noticed a related issue here, which I'm not sure is wanted / expected. Taking the above dataset `ds`, accessing this function results in an error: ```python ds.indexes.get_unique() TypeError: unhashable type: 'MultiIndex' ``` However, for `xindexes` it works: ```python ds.xindexes.get_unique() [<xarray.core.indexes.PandasMultiIndex at 0x7f105bf1df20>] ``` Originally posted by @lukasbindreiter in https://github.com/pydata/xarray/issues/6752#issuecomment-1236717180	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6987/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
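The `TypeError` above comes from pandas: `pandas.Index.__hash__` raises `TypeError("unhashable type: ...")`, so any deduplication that puts pandas indexes in a set or dict fails. A minimal sketch of one possible workaround, deduplicating by object identity instead of by hash. `UnhashableIndex` is a hypothetical stand-in for a pandas index and `get_unique_by_identity` is a hypothetical helper, not Xarray's actual implementation:

```python
class UnhashableIndex:
    """Stand-in for a pandas index: hash() on an instance raises TypeError."""

    __hash__ = None  # same effect as pandas.Index.__hash__ raising TypeError


def get_unique_by_identity(indexes):
    """Return distinct index objects in order, comparing them by identity (id())."""
    seen_ids = set()
    unique = []
    for idx in indexes:
        if id(idx) not in seen_ids:
            seen_ids.add(id(idx))
            unique.append(idx)
    return unique
```

For a multi-index, the same object is typically shared by the dimension coordinate and each level coordinate (`x`, `foo`, `bar`), so identity is enough to collapse them into a single entry without ever hashing the index.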
302077805	MDU6SXNzdWUzMDIwNzc4MDU=	1961	Extend xarray with custom "coordinate wrappers"	benbovy 4160723	closed		10	2018-03-04T11:26:15Z	2022-09-19T08:47:45Z	2022-09-19T08:47:44Z	MEMBER	Recent and ongoing developments in xarray turn DataArray and Dataset more and more into data wrappers that are extensible at (almost) every level: domain-specific methods (accessors) io backends (netcdf, raster, zarr, etc.) via an abstract `DataStore` interface array backends (numpy, dask, sparse) via multidispatch or hooks (#1938) soon custom indexes? (kd-tree, out-of-core indexes... #1603, #1650, #475) Regarding the latter, I’m thinking about the idea of extending xarray at an even more abstract level, i.e., the possibility of adding / registering "coordinate wrappers" to `DataArray` or `Dataset` objects. Basically, it would correspond to adding any object that performs some operation based on one or several coordinates ~~(I haven’t found any better name than "coordinate agent" to describe that)~~. (EDIT: "coordinate agents" may not be quite right here, I changed that to "coordinate wrappers".) Indexes are a specific case of coordinate wrappers that serve the purpose of indexing. This is built into xarray. While indexing is enough in 80% of cases, I see a couple of use cases where other coordinate wrappers (built outside of xarray) would be nice to have: Grids. For example, xgcm implements operations (interp, diff) on physical axes that may each include several coordinates, depending on the position of the coordinate labels on the axis (center, left…). Other grids define their topology using a greater number of coordinates (e.g., ugrid). Storing regridding weights might be another use case? Clocks. For example, xarray-simlab uses one or several coordinates to define the timeline of a computational simulation. In those examples we usually rely on coordinate attributes and/or classes that encapsulate xarray objects to implement the specific features that we need. 
While it works, it has limitations and I think it can be improved. Custom coordinate wrappers would be a way of extending xarray that is very consistent with other current (or considered) extension mechanisms. This is still a very vague idea and I’m sure that there are lots of details that can be discussed (serialization, etc.). But before going further, I’d like to know your thoughts @pydata/xarray. Do you think it is a silly idea? Do you have in mind other use cases where custom coordinate wrappers would be useful?	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1961/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
955936490	MDU6SXNzdWU5NTU5MzY0OTA=	5647	Flexible indexes: review the implementation of alignment and merge	benbovy 4160723	closed		12	2021-07-29T15:03:23Z	2022-09-07T09:47:13Z	2022-09-07T09:47:13Z	MEMBER	The current implementation of the `align` function is problematic in the context of flexible indexes because: (1) the sizes of the joined indexes are reused for checking compatibility with unlabelled dimension sizes; (2) the joined indexes are used as indexers to compute the aligned Dataset / DataArray. This currently works well since a pd.Index can be directly treated as a 1-d array, but this won’t always be the case anymore with custom indexes. I'm opening this issue to gather ideas on how best to handle alignment in a more flexible way (I haven't been thinking much about this problem yet).	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5647/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
1325016510	I_kwDOAMm_X85O-iW-	6860	Align with join='override' may update index coordinate metadata	benbovy 4160723	open		0	2022-08-01T21:45:13Z	2022-08-01T21:49:41Z		MEMBER	What happened? It seems that `align(, join="override")` may have affected and still may affect the metadata of index coordinate data in an incorrect way. See the MCV example below. cf. @keewis' original https://github.com/pydata/xarray/pull/6857#discussion_r934425142. What did you expect to happen? Index coordinate metadata unaffected by alignment (i.e., metadata is passed through object -> aligned object for each object), like for align with other join methods. Minimal Complete Verifiable Example ```Python import xarray as xr ds1 = xr.Dataset(coords={"x": ("x", [1, 2, 3], {"foo": 1})}) ds2 = xr.Dataset(coords={"x": ("x", [1, 2, 3], {"bar": 2})}) aligned1, aligned2 = xr.align(ds1, ds2, join="override") aligned1.x.attrs v2022.03.0 -> {'foo': 1} v2022.06.0 -> {'foo': 1, 'bar': 2} PR #6857 -> {'foo': 1} expected -> {'foo': 1} aligned2.x.attrs v2022.03.0 -> {} v2022.06.0 -> {'foo': 1, 'bar': 2} PR #6857 -> {'foo': 1, 'bar': 2} expected -> {'bar': 2} aligned11, aligned22 = xr.align(ds1, ds2, join="inner") aligned11.x.attrs {'foo': 1} aligned22.x.attrs {'bar': 2} ``` MVCE confirmation [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. [X] Complete example — the example is self-contained, including all data and the text of any traceback. [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result. [X] New issue — a search of GitHub Issues suggests this is not a duplicate. Relevant log output No response Anything else we need to know? 
No response Environment INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 \| packaged by conda-forge \| (default, Jul 11 2021, 03:36:15) [Clang 11.1.0 ] python-bits: 64 OS: Darwin OS-release: 20.6.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: None LOCALE: (None, 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 0.21.2.dev137+g30023a484 pandas: 1.4.0 numpy: 1.22.2 scipy: 1.7.1 netCDF4: 1.5.8 pydap: installed h5netcdf: 0.11.0 h5py: 3.4.0 Nio: None zarr: 2.6.1 cftime: 1.5.2 nc_time_axis: 1.2.0 PseudoNetCDF: installed rasterio: 1.2.10 cfgrib: 0.9.8.5 iris: 3.0.4 bottleneck: 1.3.2 dask: 2022.01.1 distributed: 2022.01.1 matplotlib: 3.4.3 cartopy: 0.20.1 seaborn: 0.11.1 numbagg: 0.2.1 fsspec: 0.8.5 cupy: None pint: 0.16.1 sparse: 0.13.0 flox: None numpy_groupies: None setuptools: 57.4.0 pip: 20.2.4 conda: None pytest: 6.2.5 IPython: 7.27.0 sphinx: 3.3.1	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6860/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
1322190255	I_kwDOAMm_X85OzwWv	6848	Update API	benbovy 4160723	closed		0	2022-07-29T12:30:08Z	2022-07-29T12:30:23Z	2022-07-29T12:30:23Z	MEMBER		{ "url": "https://api.github.com/repos/pydata/xarray/issues/6848/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
968796847	MDU6SXNzdWU5Njg3OTY4NDc=	5697	Coerce the labels passed to Index.query to array-like objects	benbovy 4160723	closed		3	2021-08-12T13:09:40Z	2022-03-17T17:11:43Z	2022-03-17T17:11:43Z	MEMBER	When looking at #5691 I noticed that the labels are sometimes coerced to arrays (i.e., #3153) but not always. Later in `PandasIndex.query` those may again be coerced to arrays (i.e., `_as_array_tuplesafe`). In #5692 (https://github.com/pydata/xarray/pull/5692/commits/a551c7f05abf90a492fb59068b59ebb2bac8cb4c) they are always coerced to arrays before possibly being converted to scalars. Shouldn't we therefore make things easier and ensure that the labels given to `xarray.Index.query()` always have an array interface? This would also yield more predictable behavior for anyone who wants to implement custom xarray indexes.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5697/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
968990058	MDU6SXNzdWU5Njg5OTAwNTg=	5700	Selection with multi-index and float32 values	benbovy 4160723	closed		0	2021-08-12T14:55:11Z	2022-03-17T17:11:43Z	2022-03-17T17:11:43Z	MEMBER	I guess it's rather an edge case, but an issue similar to the one fixed in #3153 may occur with multi-indexes: ```python foo_data = ['a', 'a', 'b', 'b'] bar_data = np.array([0.1, 0.2, 0.7, 0.9], dtype=np.float32) da = xr.DataArray([1, 2, 3, 4], dims="x", coords={"foo": ("x", foo_data), "bar": ("x", bar_data)}) da = da.set_index(x=["foo", "bar"]) ``` ```python da.sel(bar=0.1) KeyError: 0.1 ``` ```python da.sel(bar=np.array(0.1, dtype=np.float32).item()) <xarray.DataArray (foo: 1)> array([1]) Coordinates: * foo (foo) object 'a' ``` (xarray version: 0.18.2 as there's a regression introduced in 0.19.0 #5691)	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5700/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
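The `KeyError` arises because the `bar` level stores float32 values, and the float32 value nearest to 0.1 is not exactly equal to the float64 literal `0.1`, so an exact-match lookup misses. The workaround shown above (`np.array(0.1, dtype=np.float32).item()`) first rounds the label to float32. The sketch below reproduces that rounding with the standard-library `struct` module instead of NumPy (a substitution made only to keep the example dependency-free); `as_float32` is a hypothetical helper name:

```python
import struct


def as_float32(value: float) -> float:
    """Round a Python float to the nearest float32, returned as a Python float."""
    return struct.unpack("f", struct.pack("f", value))[0]


stored = as_float32(0.1)  # the value a float32 coordinate actually holds
print(stored)             # 0.10000000149011612
print(stored == 0.1)      # False: exact lookup with the float64 label 0.1 misses
```

Coercing the label to the index dtype before the lookup (as #3153 does for single indexes) would make `da.sel(bar=0.1)` match here too.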
955605233	MDU6SXNzdWU5NTU2MDUyMzM=	5645	Flexible indexes: handle renaming coordinate variables	benbovy 4160723	closed		0	2021-07-29T08:42:00Z	2022-03-17T17:11:42Z	2022-03-17T17:11:42Z	MEMBER	We should have some API in `xarray.Index` to update the index when its corresponding coordinate variables are renamed. This is currently implemented here, where the underlying `pd.Index` name(s) are updated: https://github.com/pydata/xarray/blob/c5530d52d1bcbd071f4a22d471b728a4845ea36f/xarray/core/dataset.py#L3299-L3314 This logic should be moved into `PandasIndex` and `PandasMultiIndex`. Other, custom indexes might also have internal attributes to update, so we might need a formal API for that.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5645/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
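A minimal sketch of the delegation this issue suggests: instead of the Dataset patching `pd.Index` names itself, each index object exposes a rename hook and returns an updated copy. `ToyIndex`, its `rename(name_dict)` method and `rename_coords` are all hypothetical names used for illustration only, not Xarray's API:

```python
class ToyIndex:
    """Hypothetical index that stores the name of the coordinate it is built from."""

    def __init__(self, labels, coord_name):
        self.labels = list(labels)
        self.coord_name = coord_name

    def rename(self, name_dict):
        """Return an index whose internal name follows a coordinate rename."""
        new_name = name_dict.get(self.coord_name, self.coord_name)
        if new_name == self.coord_name:
            return self  # nothing to update, reuse the same object
        return type(self)(self.labels, new_name)


def rename_coords(indexes, name_dict):
    """Hypothetical Dataset-level logic: rename keys and delegate to each index."""
    return {
        name_dict.get(name, name): idx.rename(name_dict)
        for name, idx in indexes.items()
    }
```

Returning a new object rather than mutating in place keeps indexes shared between objects safe, which matters once custom indexes hold their own internal state.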
1005623261	I_kwDOAMm_X8478Jfd	5812	Check explicit indexes when comparing two xarray objects	benbovy 4160723	open		2	2021-09-23T16:19:32Z	2021-09-24T15:59:02Z		MEMBER	Is your feature request related to a problem? Please describe. With the explicit index refactor, two Dataset or DataArray objects `a` and `b` may have the same variables / coordinates and attributes but different indexes. Describe the solution you'd like I'd suggest that `a.identical(b)` by default also checks for equality between `a.xindexes` and `b.xindexes`. One drawback arises when we want to check either the attributes or the indexes but not both. Should we add options like those suggested in #5733 then?	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5812/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
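A toy sketch of the proposed semantics, where `identical()` also compares indexes by default, with an opt-out flag in the spirit of the options discussed in #5733. The objects are plain dicts standing in for Datasets, and the `check_indexes` keyword is a hypothetical name, not an existing Xarray parameter:

```python
def identical(a, b, check_indexes=True):
    """Hypothetical comparison: variables and attrs, plus indexes unless disabled."""
    same = a["variables"] == b["variables"] and a["attrs"] == b["attrs"]
    if check_indexes:
        same = same and a["indexes"] == b["indexes"]
    return same
```

With this design, two objects that differ only in the index type set on `x` compare as not identical by default, which is exactly the case described above.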
1006335177	I_kwDOAMm_X847-3TJ	5814	Confusing assertion message when comparing datasets with differing coordinates	benbovy 4160723	open		1	2021-09-24T10:50:11Z	2021-09-24T15:17:00Z		MEMBER	What happened: When two datasets `a` and `b` have only differing coordinates, `xr.testing.assert_` may output a confusing message that also reports differing data variables (although strictly equal/identical) sharing common dimensions with those differing coordinates. I guess it is because when comparing the data variables we compare `DataArray` objects (thus including the coordinates). What you expected to happen: An output assertion error message that shows only the differing coordinates. Minimal Complete Verifiable Example: ```python import xarray as xr a = xr.Dataset(data_vars={"var": ("x", [10.0, 11.0])}, coords={"x": [0, 1]}) b = xr.Dataset(data_vars={"var": ("x", [10.0, 11.0])}, coords={"x": [2, 3]}) xr.testing.assert_equal(a, b) AssertionError: Left and right Dataset objects are not equal Differing coordinates: L x (x) int64 0 1 R * x (x) int64 2 3 Differing data variables: L var (x) float64 10.0 11.0 R var (x) float64 10.0 11.0 ``` I would rather expect: ```python xr.testing.assert_equal(a, b) AssertionError: Left and right Dataset objects are not equal Differing coordinates: L * x (x) int64 0 1 R * x (x) int64 2 3 ``` Anything else we need to know?: Environment: Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 \| packaged by conda-forge \| (default, Jul 11 2021, 03:36:15) [Clang 11.1.0 ] python-bits: 64 OS: Darwin OS-release: 20.3.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: None LOCALE: (None, 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.1.dev72+ga8d84c703.d20210901 pandas: 1.3.2 numpy: 1.21.2 scipy: 1.7.1 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.8.1 h5py: 3.3.0 Nio: None zarr: 2.6.1 cftime: 1.5.0 nc_time_axis: 1.2.0 PseudoNetCDF: installed rasterio: 1.2.1 cfgrib: 0.9.8.5 iris: 
3.0.4 bottleneck: 1.3.2 dask: 2021.01.1 distributed: 2021.01.1 matplotlib: 3.4.3 cartopy: 0.18.0 seaborn: 0.11.1 numbagg: None fsspec: 0.8.5 cupy: None pint: 0.16.1 sparse: 0.11.2 setuptools: 57.4.0 pip: 20.2.4 conda: None pytest: 6.2.5 IPython: 7.27.0 sphinx: 3.3.1	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5814/reactions", "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
985162305	MDU6SXNzdWU5ODUxNjIzMDU=	5755	Mypy errors with the last version of _typed_ops.pyi	benbovy 4160723	closed		5	2021-09-01T13:34:52Z	2021-09-13T10:53:16Z	2021-09-13T00:04:54Z	MEMBER	What happened: Since #5569 I get a lot of mypy errors from `_typed_ops.pyi` (see below). What's weird is that it is not happening in all cases: `$ mypy # ok $ mypy . # errors $ pre-commit run --all-files # ok $ pre-commit run # errors $ git commit # (via pre-commit hooks) errors` I also tried `pre-commit clean` with no luck. EDIT: I also tried on a freshly cloned xarray repository. @max-sixty @Illviljan Any idea on what's happening? What you expected to happen: No mypy error in all cases. Anything else we need to know?: xarray/core/_typed_ops.pyi:32: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:33: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:34: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:35: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:36: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:37: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:38: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:39: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:40: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:41: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:42: error: Self argument missing for a 
non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:43: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:44: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:45: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:46: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:47: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:48: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:49: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:50: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:51: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:52: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:53: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:54: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:55: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:56: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:57: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:60: error: Self argument missing for a non-static method (or an invalid type for self) [misc] 
xarray/core/_typed_ops.pyi:61: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:62: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:63: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:64: error: The erased type of self "xarray.core.dataset.Dataset" is not a supertype of its class "xarray.core._typed_ops.DatasetOpsMixin" [misc] xarray/core/_typed_ops.pyi:65: error: The erased type of self "xarray.core.dataset.Dataset" is not a supertype of its class "xarray.core._typed_ops.DatasetOpsMixin" [misc] xarray/core/_typed_ops.pyi:66: error: The erased type of self "xarray.core.dataset.Dataset" is not a supertype of its class "xarray.core._typed_ops.DatasetOpsMixin" [misc] xarray/core/_typed_ops.pyi:67: error: The erased type of self "xarray.core.dataset.Dataset" is not a supertype of its class "xarray.core._typed_ops.DatasetOpsMixin" [misc] xarray/core/_typed_ops.pyi:77: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:83: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:89: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:95: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:101: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:107: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:113: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:119: error: Self argument missing for a non-static method (or an 
invalid type for self) [misc] xarray/core/_typed_ops.pyi:125: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:131: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:137: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:143: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:149: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:155: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:161: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:167: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:173: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:179: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:185: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:191: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:197: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:203: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:209: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:215: error: Self argument missing for a non-static method (or an invalid type for self) [misc] 
xarray/core/_typed_ops.pyi:221: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:227: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:230: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:231: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:232: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:233: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:234: error: The erased type of self "xarray.core.dataarray.DataArray" is not a supertype of its class "xarray.core._typed_ops.DataArrayOpsMixin" [misc] xarray/core/_typed_ops.pyi:235: error: The erased type of self "xarray.core.dataarray.DataArray" is not a supertype of its class "xarray.core._typed_ops.DataArrayOpsMixin" [misc] xarray/core/_typed_ops.pyi:236: error: The erased type of self "xarray.core.dataarray.DataArray" is not a supertype of its class "xarray.core._typed_ops.DataArrayOpsMixin" [misc] xarray/core/_typed_ops.pyi:237: error: The erased type of self "xarray.core.dataarray.DataArray" is not a supertype of its class "xarray.core._typed_ops.DataArrayOpsMixin" [misc] xarray/core/_typed_ops.pyi:247: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:253: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:259: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:265: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:271: error: Self argument missing 
for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:277: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:283: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:289: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:295: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:301: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:307: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:313: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:319: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:325: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:331: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:337: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:343: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:349: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:355: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:361: error: Self argument missing for a non-static method (or an invalid type for self) [misc] xarray/core/_typed_ops.pyi:367: error: Self argument missing for a non-static method (or an invalid type for 
self) [misc]
xarray/core/_typed_ops.pyi:373: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:379: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:385: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:391: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:397: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:400: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:401: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:402: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:403: error: Self argument missing for a non-static method (or an invalid type for self) [misc]
xarray/core/_typed_ops.pyi:404: error: The erased type of self "xarray.core.variable.Variable" is not a supertype of its class "xarray.core._typed_ops.VariableOpsMixin" [misc]
xarray/core/_typed_ops.pyi:405: error: The erased type of self "xarray.core.variable.Variable" is not a supertype of its class "xarray.core._typed_ops.VariableOpsMixin" [misc]
xarray/core/_typed_ops.pyi:406: error: The erased type of self "xarray.core.variable.Variable" is not a supertype of its class "xarray.core._typed_ops.VariableOpsMixin" [misc]
xarray/core/_typed_ops.pyi:407: error: The erased type of self "xarray.core.variable.Variable" is not a supertype of its class "xarray.core._typed_ops.VariableOpsMixin" [misc]

Environment: mypy 0.910, python 3.9.6 (also tested with 3.8)	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5755/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
977149831	MDU6SXNzdWU5NzcxNDk4MzE=	5732	Coordinates implicitly created when passing a DataArray as coord to Dataset constructor	benbovy 4160723	open		3	2021-08-23T15:20:37Z	2021-08-24T14:18:09Z		MEMBER	I stumbled on this while working on #5692. Is this intended behavior or an unwanted side effect?

What happened:

Creating a new Dataset by passing a DataArray object as a coordinate also adds the DataArray's own coordinates to the dataset:

```python
import xarray as xr

foo = xr.DataArray([1.0, 2.0, 3.0], coords={"x": [0, 1, 2]}, dims="x")
ds = xr.Dataset(coords={"foo": foo})
ds
```

```
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 0 1 2
    foo      (x) float64 1.0 2.0 3.0
Data variables:
    *empty*
```

What you expected to happen:

The behavior above seems a bit counter-intuitive to me. I would rather expect no additional coordinates to be added auto-magically to the dataset, i.e., only one `foo` coordinate in this example:

```
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
    foo      (x) float64 1.0 2.0 3.0
Data variables:
    *empty*
```

Environment:

Output of `xr.show_versions()`: INSTALLED VERSIONS ------------------ commit: None python: 3.8.6 | packaged by conda-forge | (default, Nov 27 2020, 19:17:44) [Clang 11.0.0 ] python-bits: 64 OS: Darwin OS-release: 20.3.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: None LOCALE: (None, 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.0 pandas: 1.1.5 numpy: 1.21.1 scipy: 1.7.0 netCDF4: 1.5.5.1 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.3.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.07.2 distributed: 2021.07.2 matplotlib: 3.3.3 cartopy: 0.19.0.post1 seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20201009 pip: 20.3.1 conda: None pytest: 6.1.2 IPython: 7.25.0 sphinx: 3.3.1	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5732/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
933551030	MDU6SXNzdWU5MzM1NTEwMzA=	5553	Flexible indexes: how best to implement the new data model?	benbovy 4160723	closed		2	2021-06-30T10:38:13Z	2021-08-09T07:56:56Z	2021-08-09T07:56:56Z	MEMBER	Yesterday, during the flexible indexes weekly meeting, we discussed with @shoyer and @jhamman the best approach for implementing the new data model described here. In this issue I summarize how the current data model is implemented, as well as some suggestions for the new data model along with their pros / cons (I might still be missing important ones!). Unfortunately, I don't think there's an easy or ideal solution, so @pydata/xarray any feedback would be very welcome!

Current data model implementation

Currently, any (pandas) index is wrapped into an `IndexVariable` object through an intermediate adapter that preserves dtypes and handles explicit indexing. This allows reusing the index data directly as an xarray coordinate variable.

For a pandas multi-index, virtual coordinates are created for each level from the `IndexVariable` object wrapping the index. Although relying on "virtual coordinates" has more or less worked so far, it is over-complicated. Moreover, this wouldn't work with the new data model, where an index may be built from a set of coordinates with different dimensions.

Proposed alternatives

Option 1: independent (coordinate) variables and indexes

Indexes and coordinates are loosely coupled, i.e., an `xarray.Index` holds a reference (mapping) to the coordinate variable(s) from which it is built, but each manages its own data independently of the other.

Pros:

- separation of concerns.
- we no longer need those complicated adapters for reusing the index data as xarray (virtual) variable(s), which may simplify some xarray internals.
- dropping an index is simple: we just drop it and all its related coordinate variables are left as-is.
- we could theoretically build a (pandas) index from a chunked coordinate, and when we drop the index this chunked coordinate is left untouched.

Cons:

- data duplication: this would clearly be a regression when using pandas indexes, but maybe less so for other indexes like kd-trees, where adapting those objects so they can be used like coordinate variables wouldn't be easy or even possible.
- what if we want to build a `DataArray` or `Dataset` from one or more existing indexes (pandas or other)? Passing an index, treating it as an array, then re-building an index from this array is not optimal.
- keeping an index and its corresponding coordinate variable(s) in a consistent, in-sync state may be tricky, given that those variables may be mutable (although we could prevent this by encapsulating those variables in a very lightweight wrapper inspired by `IndexVariable`).

Option 2: indexes hold coordinate variables

This is the opposite of the current approach. Here, an `xarray.Index` would wrap one or more `xarray.Variable` objects.

Pros:

- probably easier to keep an index and its corresponding coordinate variable(s) in sync.
- sharing data between an index and its coordinate variables may be easier.

Cons:

- accessing / iterating through all coordinate variables in a `DataArray` or `Dataset` may be less straightforward.
- when the index is dropped, we might need some logic / API to return the coordinates as new `xarray.Variable` objects with their own data (or should we simply always drop the corresponding coordinates too? maybe not...).
- more responsibility / work for developers who want to provide 3rd party xarray indexes.

Option 3: intermediate solution

When an index is set (or unset), it returns a new set of coordinate variables to replace the existing ones.

Pros:

- it keeps some separation of concerns, while allowing data sharing through adapters and/or ensuring that variables are immutable using lightweight wrappers.

Cons:

- like option 2, more things to take care of for 3rd party xarray index developers.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5553/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
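To make Option 3 more concrete, here is a minimal sketch in plain Python with no xarray involved. Everything in it is hypothetical. `Variable`, `ToyIndex`, `create_variables` and `set_index` are illustrative names, not xarray's API. The sketch shows one idea: setting an index returns a new set of immutable coordinate variables that replace the existing ones. As a result, the index and its coordinate can share the same data without either side mutating it.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Mapping, Tuple


@dataclass(frozen=True)
class Variable:
    """Hypothetical, immutable stand-in for an xarray coordinate variable."""
    dims: Tuple[str, ...]
    data: Tuple  # a tuple, so the values cannot be modified in place


class ToyIndex:
    """Hypothetical label -> position index built from one 1-d coordinate."""

    def __init__(self, name: str, variable: Variable):
        self.name = name
        self.dim = variable.dims[0]
        self._labels = variable.data  # shared with the coordinate, not copied
        self._positions = {label: i for i, label in enumerate(variable.data)}

    def create_variables(self) -> Dict[str, Variable]:
        # Option 3: the index returns the coordinate variable(s) that should
        # replace the existing ones; here they reuse the index's own labels.
        return {self.name: Variable((self.dim,), self._labels)}

    def sel(self, label: Hashable) -> int:
        return self._positions[label]


def set_index(coords: Mapping[str, Variable], name: str):
    """Return (new_coords, index); the input mapping is left untouched."""
    index = ToyIndex(name, coords[name])
    new_coords = dict(coords)
    new_coords.update(index.create_variables())
    return new_coords, index


coords = {"x": Variable(("x",), (10, 20, 30)), "t": Variable(("x",), (1.0, 2.0, 3.0))}
new_coords, x_index = set_index(coords, "x")
print(x_index.sel(20))  # 1
```

`Variable` is frozen and stores a tuple. The coordinate returned by `create_variables` can therefore share the index's labels without the two falling out of sync, which is the trade-off Option 3 aims for. A real implementation would still need an adapter for array-backed data.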
187859705	MDU6SXNzdWUxODc4NTk3MDU=	1092	Dataset groups	benbovy 4160723	closed		20	2016-11-07T23:28:36Z	2021-07-02T19:56:50Z	2021-07-02T19:56:49Z	MEMBER	EDIT: see https://github.com/pydata/xarray/issues/4118 for ongoing discussion

This has probably been suggested already, but, similarly to netCDF4 groups, it would be nice if we could access `Dataset` data variables, coordinates and attributes via groups.

Currently xarray allows loading a specific netCDF4 group into a `Dataset`. Different groups can be loaded as separate `Dataset` objects, which may then be combined into a single, flat `Dataset`. Yet, in some cases it makes sense to represent data as a single object while keeping some nested structure. For example, a `Dataset` representing data on a staggered grid might have `scalar_vars` and `flux_vars` groups. Here are some potential uses for groups. When there are a lot of data variables and/or attributes, groups would also allow a more concise repr.

I'm thinking of an implementation of `Dataset.groups` that would be specific to xarray, i.e., independent of any backend, and that would easily co-exist with the flat `Dataset`. Backends shouldn't be required to support groups (some existing backends simply don't). It is up to each backend to eventually translate the `Dataset.groups` logic into its own group logic.

`Dataset.groups` might return a `DatasetGroups` object. Quite similarly to `xarray.core.coordinates.DatasetCoordinates`, it would (1) hold a reference to the Dataset object, (2) basically consist of a Mapping of group names to data variable/coordinate/attribute names and (3) dynamically create another `Dataset` object (sub-dataset) on `__getitem__`. Keys of `Dataset.groups` should be accessible as attributes, e.g., `ds.groups['scalar_vars'] == ds.scalar_vars`.

Questions:

- How to handle hierarchies of > 1 levels (i.e., groups of groups...)?
- How to ensure that a variable / attribute in one group is not also present in another group?
- How to handle methods called from groups with `inplace=True`?	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1092/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
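The `DatasetGroups` idea described in this issue can be sketched in a few lines of plain Python. This is a hypothetical illustration with several simplifications. The "dataset" is a plain dict mapping variable names to values, and only data variables are grouped, not coordinates or attributes. Rejecting a variable listed in more than one group is just one possible answer to the overlap question raised in the issue.

```python
from collections.abc import Mapping


class DatasetGroups(Mapping):
    """Hypothetical sketch of `Dataset.groups`: a read-only mapping of group
    names to sub-datasets that are built on demand from the parent dataset."""

    def __init__(self, dataset, groups):
        self._dataset = dataset  # (1) reference to the parent "dataset"
        # (2) group name -> list of variable names
        self._groups = {key: list(names) for key, names in groups.items()}
        # refuse a variable that would be present in more than one group
        seen = set()
        for names in self._groups.values():
            overlap = seen.intersection(names)
            if overlap:
                raise ValueError(f"variables in several groups: {sorted(overlap)}")
            seen.update(names)

    def __getitem__(self, key):
        # (3) build the sub-dataset dynamically from the parent's current content
        return {name: self._dataset[name] for name in self._groups[key]}

    def __iter__(self):
        return iter(self._groups)

    def __len__(self):
        return len(self._groups)

    def __getattr__(self, key):
        # attribute-style access, e.g. groups.flux_vars == groups["flux_vars"]
        if key.startswith("_"):
            raise AttributeError(key)  # avoid recursion on private attributes
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None


dataset = {"h": [1.0, 2.0], "u": [0.1, 0.2], "v": [0.3, 0.4]}
groups = DatasetGroups(dataset, {"scalar_vars": ["h"], "flux_vars": ["u", "v"]})
print(groups.flux_vars)  # {'u': [0.1, 0.2], 'v': [0.3, 0.4]}
```

With xarray, the sub-dataset could instead be built as `ds[names]`, since indexing a `Dataset` with a list of variable names returns a new `Dataset`. A plain dict keeps this sketch dependency-free.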
902009258	MDU6SXNzdWU5MDIwMDkyNTg=	5376	Multi-scale datasets and custom indexes	benbovy 4160723	open		6	2021-05-26T08:38:00Z	2021-06-02T08:07:38Z		MEMBER	I've been wondering if:

- multi-scale datasets are generic enough to implement some related functionality in Xarray, e.g., as new `Dataset` and/or `DataArray` method(s)
- we could leverage custom indexes for that (see the design notes)

I'm thinking of an API that would look like this:

```python
# lazily load a big n-d image (full resolution) as an xarray.Dataset
xyz_dataset = ...

# set a new index for the x/y/z coordinates
# (`reduction` and `pre_compute_scales` are optional and passed as arguments to `ImagePyramidIndex`)
xyz_dataset.set_index(
    ('x', 'y', 'z'),
    ImagePyramidIndex,
    reduction=np.mean,
    pre_compute_scales=(2, 2),
)

# get a slice (ImagePyramidIndex will be used to dynamically scale the data
# or load the right pre-computed dataset)
xyz_slice = xyz_dataset.sel_and_rescale(x=slice(...), y=slice(...), z=slice(...))
```

where `ImagePyramidIndex` is not a "common" index, i.e., it cannot be used directly with Xarray's `.sel()` or for data alignment. Using an index here might still make sense for such data extraction and resampling operations IMHO. We could extend the `xarray.Index` API to handle multi-scale datasets, so that `ImagePyramidIndex` could either do the scaling dynamically (maybe using a cache) or just lazily load pre-computed data, e.g., from an NGFF / OME-Zarr dataset... Both the implementation and functionality can be pretty flexible. Custom options may be passed through the Xarray API either when creating the index or when extracting a data slice.

A hierarchical structure of `xarray.Dataset` objects is already discussed in #4118 for multi-scale datasets, but I'm wondering if using indexes could be an alternative approach. It could also be complementary, i.e., `ImagePyramidIndex` could rely on such a hierarchical structure under the hood.

I'd see some advantages to the index approach, although this is the perspective of a naive user who is not working with multi-scale datasets:

- it is flexible: the scaling may be done dynamically without having to store the results in a hierarchical collection with some predefined discrete levels
- we don't need to expose anything other than a simple `xarray.Dataset` + a "black-box" index in which we abstract away all the implementation details. The API example shown above seems more intuitive to me than having to deal directly with Dataset groups.
- Xarray will provide a plugin system for 3rd party indexes, allowing for more `ImagePyramidIndex` variants. Xarray already provides an extension mechanism (accessors) for methods like `sel_and_rescale` in the example above...

That said, I'd also see the benefits of exposing Dataset groups more transparently to users (in case those are loaded from a store that supports them).

cc @thewtex @joshmoore @d-v-b	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5376/reactions", "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 1 }		xarray 13221727	issue
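The "do the scaling dynamically (maybe using a cache)" option can be illustrated with a small, dependency-free sketch. It is hypothetical and deliberately 1-d. `ImagePyramid`, `level` and this `sel_and_rescale(start, stop, k)` signature are illustrative names only, not an xarray API. `statistics.mean` stands in for `np.mean`. Each coarser level divides the resolution by `factor` (2 by default) by reducing non-overlapping blocks of the level below.

```python
import statistics
from typing import Callable, Dict, List, Sequence


class ImagePyramid:
    """Hypothetical 1-d multi-scale helper: level 0 is the full-resolution
    data; level k is computed on first use by reducing non-overlapping blocks
    of `factor` samples of level k - 1, then cached."""

    def __init__(self, data: Sequence[float], factor: int = 2,
                 reduction: Callable[[Sequence[float]], float] = statistics.mean):
        self.factor = factor
        self.reduction = reduction
        self._cache: Dict[int, List[float]] = {0: list(data)}

    def level(self, k: int) -> List[float]:
        if k < 0:
            raise ValueError("level must be >= 0")
        if k not in self._cache:
            finer, f = self.level(k - 1), self.factor
            # trailing samples that do not fill a whole block are dropped
            self._cache[k] = [self.reduction(finer[i:i + f])
                              for i in range(0, len(finer) - f + 1, f)]
        return self._cache[k]

    def sel_and_rescale(self, start: int, stop: int, k: int) -> List[float]:
        """Return the full-resolution positions [start, stop) at level k."""
        scale = self.factor ** k
        return self.level(k)[start // scale:stop // scale]


pyramid = ImagePyramid([1, 3, 5, 7, 9, 11, 13, 15])
print(pyramid.level(1))                    # [2, 6, 10, 14]
print(pyramid.sel_and_rescale(4, 8, k=2))  # [12]
```

A real index would work on n-d arrays. Its `level` method could also read pre-computed levels (e.g., from an OME-Zarr store) instead of computing them. The cache is what makes repeated slices at the same scale cheap.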
869721207	MDU6SXNzdWU4Njk3MjEyMDc=	5226	Attributes encoding compatibility between backends	benbovy 4160723	open		1	2021-04-28T09:11:19Z	2021-04-28T15:42:42Z		MEMBER	What happened:

Let's create a Zarr dataset with some "less common" dtype and fill value, open it with Xarray and save the dataset as NetCDF:

```python
import xarray as xr
import zarr

g = zarr.group()
g.create('arr', shape=3, fill_value='z', dtype='<U1')
# Zarr stores the dimension names as a list
g['arr'].attrs['_ARRAY_DIMENSIONS'] = ['dim_1']

# -- without masking fill values
ds = xr.open_zarr(g.store, mask_and_scale=False)
ds.arr.attrs  # returns {'_FillValue': 'z'}

# error: netCDF4 does not yet support setting a fill value for variable-length strings
ds.to_netcdf('test.nc')

# -- with masking fill values
ds2 = xr.open_zarr(g.store, mask_and_scale=True)

# returns a dict that includes the item '_FillValue': 'z'
ds2.arr.encoding

# same error as above
ds2.to_netcdf('out2.nc')
```

What you expected to happen:

Seamless conversion (read/write) from one backend to another.

Is there anything we could do to improve the case shown above, and maybe other cases like the one described in #5223?

Environment:

Output of `xr.show_versions()`: INSTALLED VERSIONS ------------------ commit: None libhdf5: None libnetcdf: None xarray: 0.17.0 pandas: 1.0.3 numpy: 1.18.1 scipy: 1.3.1 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.8.1 cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.11.0 distributed: 2.14.0 matplotlib: 3.1.1 cartopy: None seaborn: None numbagg: None pint: None setuptools: 46.1.3.post20200325 pip: 19.2.3 conda: None pytest: 5.4.1 IPython: 7.13.0 sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5226/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
733077617	MDU6SXNzdWU3MzMwNzc2MTc=	4555	Vectorized indexing (isel) of chunked data with 1D indices gives weird chunks	benbovy 4160723	open		1	2020-10-30T10:55:33Z	2021-03-02T17:36:48Z		MEMBER	What happened:

Applying `.isel()` on a DataArray or Dataset with chunked data using 1-d indices (stored either in an `xarray.Variable` or a `numpy.ndarray`) gives weird chunks (i.e., a lot of chunks with small sizes).

What you expected to happen:

More consistent chunk sizes.

Minimal Complete Verifiable Example:

Let's create a chunked DataArray

```python
In [1]: import numpy as np

In [2]: import xarray as xr

In [3]: da = xr.DataArray(np.random.rand(100), dims='points').chunk(50)

In [4]: da
Out[4]:
<xarray.DataArray (points: 100)>
dask.array<xarray-<this-array>, shape=(100,), dtype=float64, chunksize=(50,), chunktype=numpy.ndarray>
Dimensions without coordinates: points
```

Selecting random indices results in a lot of small chunks

```python
In [5]: indices = xr.Variable('nodes', np.random.choice(np.arange(100, dtype='int'), size=10))

In [6]: da_sel = da.isel(points=indices)

In [7]: da_sel.chunks
Out[7]: ((1, 1, 3, 1, 1, 3),)
```

What I would expect

```python
In [8]: da.data.vindex[indices.data].chunks
Out[8]: ((10,),)
```

This works fine with 2+ dimensional indexers, e.g.,

```python
In [9]: indices_2d = xr.Variable(('x', 'y'), np.random.choice(np.arange(100), size=(10, 10)))

In [10]: da_sel_2d = da.isel(points=indices_2d)

In [11]: da_sel_2d.chunks
Out[11]: ((10,), (10,))
```

Anything else we need to know?:

I suspect the issue is here: https://github.com/pydata/xarray/blob/063606b90946d869e90a6273e2e18ed24bffb052/xarray/core/variable.py#L616-L617

In the example above I think we still want vectorized indexing (i.e., call `dask.array.Array.vindex[]` instead of `dask.array.Array[]`).

Environment:

Output of `xr.show_versions()`: INSTALLED VERSIONS ------------------ commit: None python: 3.8.3 | packaged by conda-forge | (default, Jun 1 2020, 17:21:09) [Clang 9.0.1 ] python-bits: 64 OS: Darwin OS-release: 18.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: None.UTF-8 libhdf5: None libnetcdf: None xarray: 0.16.1 pandas: 1.1.3 numpy: 1.19.1 scipy: 1.5.2 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.19.0 distributed: 2.25.0 matplotlib: 3.3.1 cartopy: None seaborn: None numbagg: None pint: None setuptools: 47.3.1.post20200616 pip: 20.1.1 conda: None pytest: 5.4.3 IPython: 7.16.1 sphinx: 3.2.1	{ "url": "https://api.github.com/repos/pydata/xarray/issues/4555/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
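The simplified model below shows one way chunks like `(1, 1, 3, 1, 1, 3)` can arise. It is an assumption for illustration, not dask's actual slicing code. The model assumes that, with orthogonal (`Array[...]`) indexing, each run of consecutive indices falling into the same input chunk becomes one output chunk. Unsorted random indices over two 50-element chunks then produce many short runs. By contrast, the `vindex` result expected in the issue is a single chunk. The index values are made up, chosen so that the model reproduces the pattern reported above.

```python
from bisect import bisect_right
from itertools import accumulate, groupby
from typing import Sequence, Tuple


def take_output_chunks(index: Sequence[int], chunks: Sequence[int]) -> Tuple[int, ...]:
    """Simplified model of orthogonal indexing: one output chunk per run of
    consecutive indices that fall into the same input chunk."""
    ends = list(accumulate(chunks))  # e.g. (50, 50) -> [50, 100]

    def source_chunk(position: int) -> int:
        return bisect_right(ends, position)  # 0..49 -> 0, 50..99 -> 1

    return tuple(len(list(run)) for _, run in groupby(index, key=source_chunk))


def vindex_output_chunks(index: Sequence[int]) -> Tuple[int, ...]:
    """What the issue expects from vectorized indexing: a single chunk."""
    return (len(index),)


indices = [70, 3, 60, 61, 99, 10, 80, 5, 8, 30]  # made-up "random" positions
print(take_output_chunks(indices, (50, 50)))          # (1, 1, 3, 1, 1, 3)
print(take_output_chunks(sorted(indices), (50, 50)))  # (5, 5)
print(vindex_output_chunks(indices))                  # (10,)
```

Under this model, sorting the indexer avoids the fragmentation, although sorting changes the output order. Calling `vindex`, as suggested in the issue, does not have that drawback. Treat the sketch as intuition for the reported behavior rather than a description of dask internals.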
187873247	MDU6SXNzdWUxODc4NzMyNDc=	1094	Supporting out-of-core computation/indexing for very large indexes	benbovy 4160723	open		5	2016-11-08T00:56:56Z	2021-01-26T20:09:12Z		MEMBER	(Follow-up of discussion here https://github.com/pydata/xarray/pull/1024#issuecomment-258524115).

xarray + dask.array successfully enable out-of-core computation for very large variables that don't fit in memory. One current limitation is that the indexes of a `Dataset` or `DataArray`, which rely on `pandas.Index`, are still fully loaded into memory (and, after #1024, they will soon be loaded eagerly). In many cases this is not a problem, as 1-dimensional indexes are usually much smaller than n-dimensional variables or coordinates.

However, this may be problematic in some specific cases where we have to deal with very large indexes. As an example, big unstructured meshes often have coordinates (x, y, z) arranged as 1-d arrays whose length equals the number of nodes, which can be very large! (See, e.g., ugrid conventions).

It would be very nice if xarray could also help with these use cases. I'm therefore wondering if (and how) out-of-core support can be extended to indexes and indexing. I've briefly looked at the documentation on `dask.dataframe`. A first naive approach I have in mind would be to allow partitioning an index into multiple, contiguous indexes. For label-based indexing, we might for example map `indexing.convert_label_indexer` to each partition and combine the returned indexers.

My knowledge of dask is very limited, though. So I've no doubt that this suggestion is very simplistic and not very efficient, or that there are better approaches. I'm also certainly missing other issues not directly related to indexing.

Any thoughts?

cc @shoyer @mrocklin	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1094/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
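The naive partitioning approach described in this issue can be sketched without dask or pandas. In this hypothetical illustration, each partition is a plain sorted list of labels, standing in for a `pandas.Index` that could be loaded lazily. Per-partition (min, max) labels are used to skip partitions outside the requested range. The positional indexers found in each partition are shifted by the partition's offset and concatenated. `PartitionedIndex`, `positions_between` and `positions_of` are illustrative names, not xarray or dask APIs.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate
from typing import List, Sequence


class PartitionedIndex:
    """Hypothetical index split into contiguous, individually sorted partitions.

    In this sketch all partitions are in memory; the (min, max) bounds show how
    partitions that cannot match a query could be skipped without loading them."""

    def __init__(self, partitions: Sequence[Sequence[float]]):
        self.partitions = [list(p) for p in partitions]  # assumed non-empty, sorted
        self.bounds = [(p[0], p[-1]) for p in self.partitions]
        # global position of the first label of each partition
        self.offsets = list(accumulate([0] + [len(p) for p in self.partitions[:-1]]))

    def positions_between(self, start: float, stop: float) -> List[int]:
        """Global positions of labels in the closed interval [start, stop]."""
        positions: List[int] = []
        for part, (low, high), offset in zip(self.partitions, self.bounds, self.offsets):
            if high < start or low > stop:
                continue  # this partition cannot contain matching labels
            lo = bisect_left(part, start)
            hi = bisect_right(part, stop)
            positions.extend(range(offset + lo, offset + hi))
        return positions

    def positions_of(self, label: float) -> List[int]:
        """Global positions of an exact label (possibly in several partitions)."""
        return self.positions_between(label, label)


index = PartitionedIndex([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0]])
print(index.positions_between(1.5, 4.0))  # [2, 3, 4]
print(index.positions_of(7.0))            # [7]
```

Label slices here are inclusive at both ends, as with pandas label-based slicing. Each partition is searched independently, so a label repeated across partition boundaries is found in every partition that holds it.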
264747372	MDU6SXNzdWUyNjQ3NDczNzI=	1627	html repr of xarray object (for the notebook)	benbovy 4160723	closed		39	2017-10-11T21:49:20Z	2019-10-24T16:56:15Z	2019-10-24T16:48:47Z	MEMBER	Edit: preview for `Dataset` and `DataArray` (pure html/css)

- `Dataset`: https://jsfiddle.net/tay08cn9/4/
- `DataArray`: https://jsfiddle.net/43z4v2wt/9/

I started to think a bit more deeply about what a richer, HTML-based representation of xarray objects could look like, e.g., in Jupyter notebooks.

Here are some ideas for `Dataset`: https://jsfiddle.net/9ab4c3tr/35/

Some notes:

- The html repr looks pretty similar to the plain-text repr. I think it's better if they don't differ too much from each other.
- For the sake of consistency, I've stolen some style from the `pandas.DataFrame` repr as it is shown in jupyterlab.
- I tried to emphasize the most important parts of the repr, i.e., the lists of dimensions, coordinates and variables.
- I think it's best if we keep a very lightweight implementation, i.e., pure HTML/CSS (no Javascript). It already allows some interaction like hover effects and collapsible sections. However, I doubt that fancier features would be possible here without Javascript, for example highlighting a specific dimension on hover simultaneously at several places of the repr. I have limited skills in this area, though.

These are, of course, still preliminary thoughts. Any feedback/suggestion is welcome, even opinions about whether an html repr is really needed or not!	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1627/reactions", "total_count": 11, "+1": 7, "-1": 0, "laugh": 0, "hooray": 4, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
234658224	MDU6SXNzdWUyMzQ2NTgyMjQ=	1447	Package naming "conventions" for xarray extensions	benbovy 4160723	closed		5	2017-06-08T21:14:24Z	2019-06-28T22:58:33Z	2019-06-28T21:58:33Z	MEMBER	I'm wondering what would be a good name for a package that primarily aims at providing an xarray extension (in the form of a `DataArray` and/or `Dataset` accessor).

I'm currently thinking about using a prefix like the `scikit` package family (e.g., `scikit-learn`, `scikit-image`). For example, for an xarray extension for signal processing we would have:

- package full name: `xarray-signal`
- package import name: `xrsignal` (like `sklearn`)
- accessor name: `signal`.

```python
import xarray as xr
import xrsignal

ds = xr.Dataset()
ds.signal.process(...)
```

The main advantage is that the name immediately tells what the package is about. It may also be good for the overall visibility of both xarray and its 3rd-party extensions.

The downside is that there are three name variations: one for getting and installing the package, another one for importing the package and yet another one for using the accessor. This may be annoying, especially for new users who are not accustomed to this kind of naming convention.

Conversely, choosing a different, unrelated name like salem or pangaea has some advantages. The same name is used everywhere, and a single package could perhaps provide multiple accessors. However, the number of xarray extensions is likely to grow in the near future (see, e.g., the pangeo-data project), and it would then become difficult to have a clear view of the whole xarray package ecosystem.

Any thoughts?	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1447/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
180676935	MDU6SXNzdWUxODA2NzY5MzU=	1030	Concatenate multiple variables into one variable with a multi-index (categories)	benbovy 4160723	closed		3	2016-10-03T15:54:23Z	2019-02-25T07:25:40Z	2019-02-25T07:25:40Z	MEMBER	I often have to deal with datasets in this form: multiple variables of different sizes, each representing a different category. They share the same physical dimension but use different dimension names, as they have different labels.

```
<xarray.Dataset>
Dimensions:     (wn_band1: 4, wn_band2: 6, wn_band3: 8)
Coordinates:
  * wn_band1    (wn_band1) float64 200.0 266.7 333.3 400.0
  * wn_band2    (wn_band2) float64 500.0 560.0 620.0 680.0 740.0 800.0
  * wn_band3    (wn_band3) float64 1.5e+03 1.643e+03 1.786e+03 1.929e+03 ...
Data variables:
    data_band3  (wn_band3) float64 0.7515 0.5302 0.6697 0.9621 0.01815 ...
    data_band1  (wn_band1) float64 0.3801 0.6649 0.01884 0.9407
    data_band2  (wn_band2) float64 0.8813 0.4481 0.2353 0.9681 0.1085 0.0835
```

In such cases it would be more convenient to re-arrange the data into the following form. The variables are concatenated into a single variable whose multi-index holds the labels of both the categories and the physical coordinate:

```
<xarray.Dataset>
Dimensions:   (spectrum: 18)
Coordinates:
  * spectrum  (spectrum) MultiIndex
  - band      (spectrum) int64 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 3 3
  - wn        (spectrum) float64 200.0 266.7 333.3 400.0 500.0 560.0 620.0 ...
Data variables:
    data      (spectrum) float64 0.3801 0.6649 0.01884 0.9407 0.8813 0.4481 ...
```

The latter would allow using xarray's nice features like `ds.groupby('band').mean()`.

Currently, the best way that I've found to transform the data is something like:

```python
import numpy as np
import pandas as pd
import xarray as xr

# concatenate the per-band data and wavenumbers along a single dimension
data = np.concatenate([ds.data_band1, ds.data_band2, ds.data_band3])
wn = np.concatenate([ds.wn_band1, ds.wn_band2, ds.wn_band3])
# band labels: 4, 6 and 8 values for bands 1, 2 and 3
band = np.concatenate([np.repeat(1, 4), np.repeat(2, 6), np.repeat(3, 8)])

midx = pd.MultiIndex.from_arrays([band, wn], names=('band', 'wn'))
ds2 = xr.Dataset({'data': ('spectrum', data)}, coords={'spectrum': midx})
```

Am I missing a better way to do this? If not, it would be nice to have a convenience method for this, unless this use case is too rare to be worth it. I'm also not at all sure what a good API for such a method would be.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1030/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
275033174	MDU6SXNzdWUyNzUwMzMxNzQ=	1727	IPython auto-completion triggers data loading	benbovy 4160723	closed		11	2017-11-18T00:14:00Z	2017-11-18T07:09:41Z	2017-11-18T07:09:40Z	MEMBER	I create a big netCDF file like this:

```python
In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: ds = xr.Dataset({'myvar': np.arange(100000000, dtype='float64')})

In [4]: ds.to_netcdf('test.nc')
```

Then, when I open the file in an IPython console and use auto-completion, it triggers loading the data.

```python
In [1]: import xarray as xr

In [2]: ds = xr.open_dataset('test.nc')

In [3]: ds.my  # <TAB> autocompletion with any character -> triggers loading
```

I don't have that issue using the python console. Auto-completion for dictionary access in IPython (#1632) works fine too.

Output of `xr.show_versions()`: commit: None python: 3.6.3.final.0 python-bits: 64 OS: Darwin OS-release: 16.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: fr_BE.UTF-8 LOCALE: fr_BE.UTF-8 xarray: 0.10.0rc1-2-gf83361c pandas: 0.21.0 numpy: 1.13.1 scipy: 0.19.1 netCDF4: 1.3.1 h5netcdf: 0.5.0 Nio: None bottleneck: 1.2.1 cyordereddict: None dask: 0.15.4 matplotlib: None cartopy: None seaborn: None setuptools: 36.6.0 pip: 9.0.1 conda: None pytest: None IPython: 6.2.1 sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1727/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
274591962	MDU6SXNzdWUyNzQ1OTE5NjI=	1722	Change in behavior of .set_index() from pandas 0.20.3 to 0.21.0	benbovy 4160723	closed		1	2017-11-16T17:05:20Z	2017-11-17T00:54:51Z	2017-11-17T00:54:51Z	MEMBER	I use xarray 0.9.6 for both examples below.

With pandas 0.20.3, `Dataset.set_index` gives me what I expect (i.e., the `grid__x` data variable becomes a coordinate `x`):

```python
In [1]: import xarray as xr

In [2]: import pandas as pd

In [3]: pd.__version__
Out[3]: '0.20.3'

In [4]: ds = xr.Dataset({'grid__x': ('x', [1, 2, 3])})

In [5]: ds.set_index(x='grid__x')
Out[5]:
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 1 2 3
Data variables:
    *empty*
```

With pandas 0.21.0, it creates a `MultiIndex`, which is not what I expect when setting an index from only one data variable:

```python
In [1]: import xarray as xr

In [2]: import pandas as pd

In [3]: pd.__version__
Out[3]: '0.21.0'

In [4]: ds = xr.Dataset({'grid__x': ('x', [1, 2, 3])})

In [5]: ds.set_index(x='grid__x')
Out[5]:
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) MultiIndex
  - grid__x  (x) int64 1 2 3
Data variables:
    *empty*
```	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1722/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
134359597	MDU6SXNzdWUxMzQzNTk1OTc=	767	MultiIndex and data selection	benbovy 4160723	closed		9	2016-02-17T18:24:00Z	2016-09-14T14:28:29Z	2016-09-14T14:28:29Z	MEMBER	[Edited for more clarity]

First of all, I find the MultiIndex very useful and I'm looking forward to seeing the TODOs in #719 implemented in the next releases, especially the first three in the list!

Apart from these issues, I think that some other aspects could be improved, notably regarding data selection. Or maybe I haven't correctly understood how to deal with multi-index and data selection...

To illustrate this, I use some fake spectral data with two discontinuous bands of different length / resolution:

```
In [1]: import pandas as pd

In [2]: import xarray as xr

In [3]: band = np.array(['foo', 'foo', 'bar', 'bar', 'bar'])

In [4]: wavenumber = np.array([4050.2, 4050.3, 4100.1, 4100.3, 4100.5])

In [5]: spectrum = np.array([1.7e-4, 1.4e-4, 1.2e-4, 1.0e-4, 8.5e-5])

In [6]: s = pd.Series(spectrum, index=[band, wavenumber])

In [7]: s.index.names = ('band', 'wavenumber')

In [8]: da = xr.DataArray(s, dims='band_wavenumber')

In [9]: da
Out[9]:
<xarray.DataArray (band_wavenumber: 5)>
array([ 1.70000000e-04, 1.40000000e-04, 1.20000000e-04, 1.00000000e-04, 8.50000000e-05])
Coordinates:
  * band_wavenumber  (band_wavenumber) object ('foo', 4050.2) ...
```

I extract the band 'bar' using `sel`:

```
In [10]: da_bar = da.sel(band_wavenumber='bar')

In [11]: da_bar
Out[11]:
<xarray.DataArray (band_wavenumber: 3)>
array([ 1.20000000e-04, 1.00000000e-04, 8.50000000e-05])
Coordinates:
  * band_wavenumber  (band_wavenumber) object ('bar', 4100.1) ...
```

It selects the data the way I want, although using the dimension name is confusing in this case. It would be nice if we could also use the `MultiIndex` names as arguments of the `sel` method, even though I don't know if it is easy to implement.

Furthermore, `da_bar` still has the 'band_wavenumber' dimension and the 'band' index-level, but they are not very useful anymore. Ideally, I'd rather obtain a `DataArray` object with a 'wavenumber' dimension / coordinate and the 'bar' band name dropped from the multi-index. This would require automatic index-level removal and/or automatic unstacking when selecting data.

Extracting the band 'bar' from the pandas `Series` object gives something closer to what I need (see below). However, using pandas is not an option, as my spectral data involves other dimensions (e.g., time, scans, iterations...) not shown here for simplicity.

```
In [12]: s_bar = s.loc['bar']

In [13]: s_bar
Out[13]:
wavenumber
4100.1    0.000120
4100.3    0.000100
4100.5    0.000085
dtype: float64
```

Another problem is that unstacking the `DataArray` object resulting from the selection gives the same dimensions and size as unstacking the original `DataArray` object. The only difference is that unselected values are replaced by `nan`.

```
In [13]: da.unstack('band_wavenumber')
Out[13]:
<xarray.DataArray (band: 2, wavenumber: 5)>
array([[ nan, nan, 1.20000000e-04, 1.00000000e-04, 8.50000000e-05],
       [ 1.70000000e-04, 1.40000000e-04, nan, nan, nan]])
Coordinates:
  * band        (band) object 'bar' 'foo'
  * wavenumber  (wavenumber) float64 4.05e+03 4.05e+03 4.1e+03 4.1e+03 4.1e+03

In [14]: da_bar.unstack('band_wavenumber')
Out[14]:
<xarray.DataArray (band: 2, wavenumber: 5)>
array([[ nan, nan, 1.20000000e-04, 1.00000000e-04, 8.50000000e-05],
       [ nan, nan, nan, nan, nan]])
Coordinates:
  * band        (band) object 'bar' 'foo'
  * wavenumber  (wavenumber) float64 4.05e+03 4.05e+03 4.1e+03 4.1e+03 4.1e+03
```	{ "url": "https://api.github.com/repos/pydata/xarray/issues/767/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
169368546	MDU6SXNzdWUxNjkzNjg1NDY=	942	Filtering by data variable name	benbovy 4160723	closed		3	2016-08-04T13:01:20Z	2016-08-04T19:09:07Z	2016-08-04T19:09:07Z	MEMBER	Given #844 and #916, it might be useful to also have a `Dataset.filter_by_name` method.

I currently deal with datasets that have many data variables with names like:

```
...
reference__HONO    (rlevel) float64 3.16e-15 1e-14 1e-14 1e-14 ...
reference__NO      (rlevel) float64 2.16e-05 3.57e-06 9.3e-07 ...
reference__HO2NO2  (rlevel) float64 9.58e-20 7.32e-19 4.63e-18 ...
...
retrieved__O3      (level) float64 1.552e-06 5.618e-07 ...
retrieved__N2O     (level) float64 4.714e-11 9.905e-11 ...
retrieved__CO2     (level) float64 0.0002816 0.0003592 ...
...
```

Using `ds.filter_by_name(like='reference__')` would be less verbose than, e.g., `xr.Dataset({name: ds[name] for name in ds.keys() if 'reference__' in name})`. Or is there already a more convenient way that I'm missing?	{ "url": "https://api.github.com/repos/pydata/xarray/issues/942/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
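Until such a method exists, a small helper can cover this use case. The sketch below is hypothetical: `filter_names` is not an xarray function. Its `like` and `regex` argument names are borrowed from pandas' `DataFrame.filter`. Here `like` is a substring test and `regex` is matched with `re.search`.

```python
import re
from typing import Iterable, List, Optional


def filter_names(names: Iterable[str], like: Optional[str] = None,
                 regex: Optional[str] = None) -> List[str]:
    """Hypothetical helper: keep the names that contain `like` and/or match
    `regex` anywhere (re.search), preserving their original order."""
    if like is None and regex is None:
        raise ValueError("pass `like` and/or `regex`")
    pattern = re.compile(regex) if regex is not None else None
    return [
        name for name in names
        if (like is None or like in name)
        and (pattern is None or pattern.search(name) is not None)
    ]


names = ["reference__HONO", "reference__NO", "retrieved__O3", "retrieved__N2O"]
print(filter_names(names, like="reference__"))      # ['reference__HONO', 'reference__NO']
print(filter_names(names, regex=r"^retrieved__N"))  # ['retrieved__N2O']
```

With an xarray `Dataset`, iterating over `ds.data_vars` yields the variable names, and indexing a `Dataset` with a list of names returns a new `Dataset`. The matching variables could therefore be selected with `ds[filter_names(ds.data_vars, like="reference__")]`.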

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
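The filter stated in the page header (type = "issue", user = 4160723, sorted by updated_at descending) maps onto a simple query against the schema above. This sketch uses Python's built-in `sqlite3` module with a reduced copy of the `[issues]` table: only the columns the query touches, and no `REFERENCES` clauses since the related tables are not created here. Two rows are copied from this page; the other two are hypothetical rows that the filter should exclude.

```python
import sqlite3

# Reduced [issues] table: a subset of the columns in the schema above.
SCHEMA = """
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER,
   [state] TEXT,
   [updated_at] TEXT,
   [type] TEXT
);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
"""

ROWS = [
    # Rows copied from this page.
    (1389295853, 7099, "Pass arbitrary options to sel()", 4160723, "open",
     "2024-04-30T00:44:18Z", "issue"),
    (169368546, 942, "Filtering by data variable name", 4160723, "closed",
     "2016-08-04T19:09:07Z", "issue"),
    # Hypothetical rows: another user, and a non-issue row by the same user.
    (1, 1, "Hypothetical issue by another user", 1, "open",
     "2020-01-01T00:00:00Z", "issue"),
    (2, 2, "Hypothetical pull request", 4160723, "open",
     "2021-01-01T00:00:00Z", "pull"),
]

# ISO-8601 UTC timestamps sort correctly as plain text.
QUERY = """
SELECT [number], [title] FROM [issues]
WHERE [type] = ? AND [user] = ?
ORDER BY [updated_at] DESC
"""


def issues_by_user(conn: sqlite3.Connection, user_id: int) -> list:
    """Return (number, title) pairs for one user's issues, newest update first."""
    return conn.execute(QUERY, ("issue", user_id)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
print(issues_by_user(conn, 4160723))
```

The `[user]` index from the schema lets SQLite narrow the rows by user before sorting.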

#5692 is ~~not merged yet~~ now merged ~~but~~ and we can ~~already~~ start thinking about the next steps. I’m opening this issue to list and track the remaining tasks. @pydata/xarray, do not hesitate to add a comment below if you think of something that is missing here.

Continue the refactoring of the internals

Relax all constraints related to “dimension (index) coordinates” in Xarray

#7989

Indexes repr

#6795

#7185

#7183

Public API for assigning and (re)setting indexes

#6392

#7214

#7368

#6849

#6971

Other public API for index-based operations

Documentation

Index types and helper classes built in Xarray

#7182

3rd party indexes

Is your feature request related to a problem?

Describe the solution you'd like

An example with a pandas multi-index

ValueError: missing coordinate(s) for index(es): 'x', 'foo', 'bar'

ValueError: missing index(es) for coordinate(s): 'bar'

ValueError: conflict between coordinate(s) and index(es): 'foo'

ValueError: conflict between coordinate(s) and index(es): 'bar'

ValueError: missing index(es) for coordinate(s): 'x', 'foo', 'bar'

or

create unindexed coordinates 'foo' and 'bar' and an 'x' coordinate with a single pandas index

["x", "foo", "bar"]
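The two "missing ..." messages above suggest a check comparing the coordinates passed to a constructor with the indexes passed alongside them. The sketch below is one possible reading of that check, under stated assumptions: `HypotheticalMultiIndex`, `find_inconsistency` and `check_indexes` are illustrative names, not xarray API, and the "conflict" messages are not modelled.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class HypotheticalMultiIndex:
    """Stand-in for a pandas multi-index: it expects one coordinate for the
    dimension and one coordinate per level."""

    dim: str
    level_names: tuple

    def expected_coord_names(self) -> tuple:
        return (self.dim, *self.level_names)


def _fmt(names) -> str:
    return ", ".join(repr(name) for name in names)


def find_inconsistency(
    coord_names: Sequence[str], indexes: Mapping[str, HypotheticalMultiIndex]
) -> Optional[str]:
    """Return an error message if coordinates and indexes disagree, else None."""
    # Every name that carries an index must also be given as a coordinate.
    missing_coords = [name for name in indexes if name not in coord_names]
    if missing_coords:
        return f"missing coordinate(s) for index(es): {_fmt(missing_coords)}"
    # Each distinct index object must be assigned to all the coordinates it expects.
    unique = {id(index): index for index in indexes.values()}
    for index in unique.values():
        missing = [n for n in index.expected_coord_names() if n not in indexes]
        if missing:
            return f"missing index(es) for coordinate(s): {_fmt(missing)}"
    return None


def check_indexes(coord_names, indexes) -> None:
    """Raise ValueError with the message found by find_inconsistency, if any."""
    message = find_inconsistency(coord_names, indexes)
    if message is not None:
        raise ValueError(message)
```

For example, passing a multi-index for 'x', 'foo' and 'bar' without any coordinates reports all three names, while omitting the index for 'bar' reports only 'bar'.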

How to generalize to any (custom) index?

Describe alternatives you've considered

Also allow passing index types (and build options) via `indexes`

Pass multi-indexes once, grouped by coordinate names

Additional context

Is your feature request related to a problem?

Describe the solution you'd like

or

`xarray.core.indexes`

third-party code

Describe alternatives you've considered

third-party code

Additional context

What happened?

What did you expect to happen?

Minimal Complete Verifiable Example

<xarray.Dataset>
Dimensions:  (x: 4)
Coordinates:
  * x        (x) object MultiIndex
  * foo      (x) object 'a' 'a' 'b' 'b'
  * bar      (x) int64 1 2 1 2
Data variables:
    *empty*
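A Dataset matching the repr above can be built with a sketch like the following, assuming a recent xarray release that provides `xr.Coordinates.from_pandas_multiindex`. It also shows the level-dropping selection discussed in issue #767: selecting a single value of one level removes that level and, when only one level is left, xarray renames the dimension after the remaining level.

```python
import pandas as pd
import xarray as xr

# Two-level pandas multi-index with the same labels as the repr above.
midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("foo", "bar"))

# Builds the 'x' dimension coordinate plus one coordinate per level,
# all backed by a single shared multi-index.
coords = xr.Coordinates.from_pandas_multiindex(midx, "x")
ds = xr.Dataset(coords=coords)

# Scalar selection on the 'foo' level drops it; 'bar' is the only level
# left, so the 'x' dimension is replaced by a 'bar' dimension.
ds_a = ds.sel(foo="a")
print(dict(ds_a.sizes))  # {'bar': 2}
```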

Also allow passing index types (and build options) via `indexes`

(`reduction` and `pre_compute_scales` are optional and passed as arguments to `ImagePyramidIndex`)
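Build options for a custom index can already be forwarded when the index is set on existing coordinates: `Dataset.set_xindex(coord_names, index_cls, **options)` passes its extra keyword arguments to `index_cls.from_variables(..., options=...)`. This sketch assumes a recent xarray release with `set_xindex`; `OptionsRecordingIndex` is hypothetical and only stores the options it receives, standing in for an index such as `ImagePyramidIndex`, which is not implemented here.

```python
import numpy as np
import xarray as xr
from xarray.indexes import Index


class OptionsRecordingIndex(Index):
    """Hypothetical index that only keeps the build options it is given."""

    def __init__(self, options):
        self.options = dict(options)

    @classmethod
    def from_variables(cls, variables, *, options):
        # set_xindex forwards its extra keyword arguments here as `options`.
        return cls(options)


# 'lat' is a non-dimension coordinate, so it has no default index yet.
ds = xr.Dataset(coords={"lat": ("y", np.array([10.0, 20.0, 30.0]))})

ds2 = ds.set_xindex(
    "lat", OptionsRecordingIndex, reduction="mean", pre_compute_scales=False
)
print(ds2.xindexes["lat"].options)
```

Passing the same options through the `indexes` argument of a constructor, as proposed above, would avoid this separate step.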

Output of `xr.show_versions()`