issues


196 rows where user = TomNicholas (35968931), sorted by updated_at descending: 117 pulls and 79 issues; 134 closed, 62 open; all in pydata/xarray.

#8995 Why does xr.apply_ufunc support numpy/dask.arrays? · issue · open · 0 comments · created 2024-05-02T20:18:41Z · updated 2024-05-03T22:03:43Z

What is your issue?

@keewis pointed out that it's weird that xarray.apply_ufunc supports passing numpy/dask arrays directly, and I'm inclined to agree. I don't understand why we do, and think we should consider removing that feature.

Two arguments in favour of removing it:

1) It exposes users to transposition errors

Consider this example:

```python
In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: arr = np.arange(12).reshape(3, 4)

In [4]: def mean(obj, dim):
   ...:     # note: apply always moves core dimensions to the end
   ...:     return xr.apply_ufunc(
   ...:         np.mean, obj, input_core_dims=[[dim]], kwargs={"axis": -1}
   ...:     )
   ...:

In [5]: mean(arr, dim='time')
Out[5]: array([1.5, 5.5, 9.5])

In [6]: mean(arr.T, dim='time')
Out[6]: array([4., 5., 6., 7.])
```

Transposing the input leads to a different result, with the value of the dim kwarg effectively ignored. This kind of error is what xarray code is supposed to prevent by design.

2) There is an alternative input pattern that doesn't require accepting bare arrays

Instead, any numpy/dask array can just be wrapped up into an xarray Variable/NamedArray before passing it to apply_ufunc.

```python
In [7]: from xarray.core.variable import Variable

In [8]: var = Variable(data=arr, dims=['time', 'space'])

In [9]: mean(var, dim='time')
Out[9]:
<xarray.Variable (space: 4)> Size: 32B
array([4., 5., 6., 7.])

In [10]: mean(var.T, dim='time')
Out[10]:
<xarray.Variable (space: 4)> Size: 32B
array([4., 5., 6., 7.])
```

This now guards against the transposition error, and puts the onus on the user to be clear about which axes of their array correspond to which dimension.

With Variable/NamedArray as public API, this latter pattern can handle every case that passing bare arrays in could.

I suggest we deprecate accepting bare arrays in favour of having users wrap them in Variable/NamedArray/DataArray objects instead.

(Note 1: We also accept raw scalars, but this doesn't expose anyone to transposition errors.)

(Note 2: In a quick scan of the apply_ufunc docstring, the docs on it in computation.rst, and the extensive guide that @dcherian wrote in the xarray tutorial repository, I can't see any examples that actually pass bare arrays to apply_ufunc.)
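A minimal sketch of how that deprecation could start inside apply_ufunc (the helper name and the exact check are illustrative, not existing xarray code):

```python
import warnings

import numpy as np


def _warn_on_bare_arrays(*args):
    # hypothetical guard: warn when a bare duck array (rather than an
    # xarray object or a scalar) is passed directly to apply_ufunc
    for arg in args:
        if isinstance(arg, np.ndarray) or hasattr(arg, "__array_namespace__"):
            warnings.warn(
                "Passing bare arrays to apply_ufunc is deprecated; wrap them "
                "in a Variable, NamedArray, or DataArray first.",
                DeprecationWarning,
                stacklevel=3,
            )
```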

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8995/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8994 Improving performance of open_datatree · issue · open · 4 comments · created 2024-05-02T19:43:17Z · updated 2024-05-03T15:25:33Z

What is your issue?

The implementation of open_datatree works, but is inefficient, because it calls open_dataset once for every group in the file. We should refactor this to improve the performance, which would fix issues like https://github.com/xarray-contrib/datatree/issues/330.

We discussed this in the datatree meeting, and my understanding is that concretely we need to:

  • [ ] Create an asv benchmark for open_datatree, probably involving first writing then benchmarking the opening of a special netCDF file that has no data but lots of groups (see the sketch after this list).
  • [ ] Refactor the NetCDFDatastore class to only create one CachingFileManager object per file, not one per group, see https://github.com/pydata/xarray/blob/748bb3a328a65416022ec44ced8d461f143081b5/xarray/backends/netCDF4_.py#L406.
  • [ ] Refactor NetCDF4BackendEntrypoint.open_datatree to use an implementation that goes through NetCDFDatastore without calling the top-level xr.open_dataset again.
  • [ ] Check the performance of calling xr.open_datatree on a netCDF file has actually improved.
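
A rough sketch of what such a benchmark might look like (class and file names are illustrative, and depending on the xarray version open_datatree may need importing from xarray.backends.api instead):

```python
import netCDF4


class OpenDataTree:
    """asv-style benchmark: time opening a netCDF file with many empty groups."""

    def setup(self):
        self.path = "many_groups.nc"
        with netCDF4.Dataset(self.path, mode="w") as ncfile:
            for i in range(100):
                ncfile.createGroup(f"group_{i}")

    def time_open_datatree(self):
        import xarray as xr

        xr.open_datatree(self.path, engine="netcdf4")
```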

It would be great to get this done soon as part of the datatree integration project. @kmuehlbauer I know you were interested - are you willing / do you have time to take this task on?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8994/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8572 Track merging datatree into xarray · issue · open · 27 comments · created 2023-12-22T17:37:20Z · updated 2024-05-02T19:44:29Z

What is your issue?

Master issue to track progress of merging xarray-datatree into xarray main. Would close https://github.com/pydata/xarray/issues/4118 (and many similar issues), as well as one of the goals of our development roadmap.

Also see the project board for DataTree integration.


On calls in the last few dev meetings, we decided to forget about a temporary cross-repo `from xarray import datatree` (so this issue supersedes #7418), and just begin merging datatree into xarray main directly.

Weekly meeting

See https://github.com/pydata/xarray/issues/8747

Task list:

To happen in order:

  • [x] open_datatree in xarray. This doesn't need to be performant initially, and ~~it would initially return a datatree.DataTree object.~~ EDIT: We decided it should return an xarray.DataTree object, or even xarray.core.datatree.DataTree object. So we can start by just copying the basic version in datatree/io.py right now which just calls open_dataset many times. #8697
  • [x] Triage and fix issues: figure out which of the issues on xarray-contrib/datatree need to be fixed before the merge (if any).
  • [ ] Merge in code for DataTree class. I suggest we do this by making one PR for each module, and ideally discussing and merging each before opening a PR for the next module. (Open to other workflow suggestions though.) The main aim here being lowering the bus factor on the code, confirming high-level design decisions, and improving details of the implementation as it goes in.

    Suggested order of modules to merge:

      • [x] datatree/treenode.py - defines the tree structure, without any dimensions/data attached, #8757
      • [x] datatree/datatree.py - adds data to the tree structure, #8789
      • [x] datatree/iterators.py - iterates over a single tree in various ways, currently copied from anytree, #8879
      • [x] datatree/mapping.py - implements map_over_subtree by iterating over N trees at once, https://github.com/pydata/xarray/pull/8948
      • [ ] datatree/ops.py - uses map_over_subtree to map methods like .mean over whole trees (https://github.com/pydata/xarray/pull/8976)
      • [x] datatree/formatting_html.py - HTML repr, works but could do with some optimization, https://github.com/pydata/xarray/pull/8930
      • [x] datatree/{extensions/common}.py - miscellaneous other features e.g. attribute-like access (#8967)

  • [ ] Expose datatree API publicly. Actually expose open_datatree and DataTree in xarray's public API as top-level imports. The full list of things to expose is:

  • [ ] open_datatree
  • [ ] DataTree
  • [ ] map_over_subtree
  • [ ] assert_isomorphic
  • [ ] register_datatree_accessor

  • [ ] Refactor class inheritance - Dataset/DataArray share some mixin classes (e.g. DataWithCoords), and we could probably refactor DataTree to use these too. This is low-priority but would reduce code duplication.

Can happen basically at any time or maybe in parallel with other efforts:

  • [ ] Generalize backends to support groups. Once a basic version of xr.open_datatree exists, we can start refactoring xarray's backend classes to support a general Backend.open_datatree method for any backend that can open multiple groups. Then we can make sure this is more performant than the naive implementation, i.e. only opening the file once. See also #8994.
  • [ ] Support backends other than netCDF and Zarr. - e.g. grib, see https://github.com/pydata/xarray/pull/7437,
  • [ ] Support dask properly - Issue https://github.com/xarray-contrib/datatree/pull/97 and the (stale) PR https://github.com/xarray-contrib/datatree/pull/196 are about dask parallelization over separate nodes in the tree.
  • [ ] Add other new high-level API methods - Things like .reorder_nodes and ideas we've only discussed like https://github.com/xarray-contrib/datatree/issues/79 and https://github.com/xarray-contrib/datatree/issues/254 (cc @dcherian who has had useful ideas here)
  • [ ] Copy xarray-contrib/datatree issues over to xarray's main repository. I think this is quite important and worth doing as a record of why decisions were made. (@jhamman and @TomNicholas)
  • [ ] Copy over any recent bug fixes from original datatree repository
  • [x] Look into merging commit history of xarray-contrib/datatree. I think this would be cool but is less important than keeping the issues. (@jhamman suggested we could do this using some git wizardry that I hadn't heard of before)
  • [ ] xarray.tutorial.open_datatree - I've been meaning to make a tutorial datatree object for ages. There's an issue about it, but actually now I think something close to the CMIP6 ensemble data that @jbusecke and I used in our pangeo blog post would already be pretty good. Once we have this it becomes much easier to write docs about some advanced features.
  • [ ] Merge Docs - I've tried to write these pages so that they should slot neatly into xarray's existing docs structure. Careful reading, additions and improvements would be great though. Summary of what docs exist on this issue https://github.com/xarray-contrib/datatree/issues/61
  • [ ] Write a blog post on the xarray blog highlighting xarray's new functionality, and explicitly thanking the NASA team for their work. Doesn't have to be long, it can just point to the documentation.

Anyone is welcome to help with any of this, including but not limited to @owenlittlejohns , @eni-awowale, @flamingbear (@etienneschalk maybe?).

cc also @shoyer @keewis for any thoughts as to the process.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8572/reactions",
    "total_count": 7,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}

#8872 Avoid auto creation of indexes in concat · pull · open · 15 comments · created 2024-03-25T05:16:33Z · updated 2024-05-01T19:07:01Z

If we create a Coordinates object using the concatenated result_indexes, and pass that to the Dataset constructor, we can explicitly set the correct indexes from the start, instead of auto-creating the wrong ones and then trying to overwrite them with the correct indexes later (which is what the current implementation does).
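A runnable toy version of the idea, using the public Coordinates API (the variable names are stand-ins for concat's intermediate results):

```python
import numpy as np
import xarray as xr
from xarray.indexes import PandasIndex

# stand-in for the coordinate variable produced by concatenation
x = xr.Variable("x", np.array([1, 2, 3, 1, 2, 3]))

# build the index explicitly and pass it via a Coordinates object, so the
# Dataset constructor sets it directly instead of auto-creating a default one
index = PandasIndex.from_variables({"x": x}, options={})
coords = xr.Coordinates({"x": x}, indexes={"x": index})
ds = xr.Dataset({"a": ("x", np.zeros(6))}, coords=coords)
```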

  • [x] Possible fix for #8871
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8872/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8980 Complete deprecation of Dataset.dims returning dict · pull · open · 6 comments · created 2024-04-28T20:32:29Z · updated 2024-05-01T15:40:44Z
  • [x] Completes deprecation cycle described in #8496, and started in #8500
  • [ ] ~~Tests added~~
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8980/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8979 Warn on automatic coercion to coordinate variables in Dataset constructor · pull · open · 2 comments · created 2024-04-28T19:44:20Z · updated 2024-04-29T21:13:00Z
  • [x] Starts the deprecation cycle for #8959
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
  • [ ] Change existing code + examples so as not to emit this new warning everywhere.
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8979/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8494 Filter expected warnings in the test suite · issue · closed as completed 2024-04-29T16:56:16Z · 1 comment · created 2023-11-30T21:50:15Z · updated 2024-04-29T16:57:07Z

FWIW one thing I'd be keen to do generally — though maybe this isn't the place to start it — is handle warnings in the test suite when we add a new warning — i.e. filter them out where we expect them.

In this case, that would be loading the netCDF files that have duplicate dims.

Otherwise warnings become a huge block of text without much salience. I mostly see the 350 lines of them and think "meh mostly units & cftime", but then something breaks on a new upstream release that was buried in there, or we have a supported code path that is raising warnings internally.

(I'm not sure whether it's possible to generally enforce that — maybe we could raise on any warnings coming from within xarray? Would be a non-trivial project to get us there though...)

Originally posted by @max-sixty in https://github.com/pydata/xarray/issues/8491#issuecomment-1834615826
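A minimal sketch of the two usual pytest mechanisms for this (the warning message is illustrative):

```python
import pytest


# assert the warning fires exactly where we expect it
def test_open_file_with_duplicate_dims_warns(tmp_path):
    with pytest.warns(UserWarning, match="duplicate dimension"):
        ...  # open the offending netCDF file here


# or silence it in a test where it is expected but not the point
@pytest.mark.filterwarnings("ignore:.*duplicate dimension.*:UserWarning")
def test_unrelated_behaviour():
    ...
```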

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8494/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8500 Deprecate ds.dims returning dict · pull · closed 2023-12-06T17:52:24Z · 1 comment · created 2023-12-01T18:29:28Z · updated 2024-04-28T20:04:00Z
  • [x] Closes the first step of #8496; would require another PR later to actually change the return type. Also really resolves the second half of #921.
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8500/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8959 Dataset constructor always coerces 1D data variables with same name as dim to coordinates · issue · open · 10 comments · created 2024-04-19T17:54:28Z · updated 2024-04-28T19:57:31Z

What is your issue?

Whilst xarray's data model appears to allow 1D data variables that have the same name as their dimension, it seems to be impossible to actually create this using the Dataset constructor, as they will always be converted to coordinate variables instead.

We can create a 1D data variable with the same name as its dimension like this:

```python
In [9]: ds = xr.Dataset({'x': 0})

In [10]: ds
Out[10]:
<xarray.Dataset> Size: 8B
Dimensions:  ()
Data variables:
    x        int64 8B 0

In [11]: ds.expand_dims('x')
Out[11]:
<xarray.Dataset> Size: 8B
Dimensions:  (x: 1)
Dimensions without coordinates: x
Data variables:
    x        (x) int64 8B 0
```

so it seems to be a valid part of the data model.

But I can't get to that situation from the Dataset constructor. This should create the same dataset:

```python
In [15]: ds = xr.Dataset(data_vars={'x': ('x', [0])})

In [16]: ds
Out[16]:
<xarray.Dataset> Size: 8B
Dimensions:  (x: 1)
Coordinates:
  * x        (x) int64 8B 0
Data variables:
    *empty*
```

But actually it makes x a coordinate variable (and implicitly creates a pandas Index for it). This means that in this case there is no difference between using the `data_vars` and `coords` kwargs to the constructor:

```python
In [17]: ds = xr.Dataset(coords={'x': ('x', [0])})

In [18]: ds
Out[18]:
<xarray.Dataset> Size: 8B
Dimensions:  (x: 1)
Coordinates:
  * x        (x) int64 8B 0
Data variables:
    *empty*
```

This all seems weird to me. I would have thought that if a 1D data variable is allowed, we shouldn't coerce to making it a coordinate variable in the constructor. If anything that's actively misleading.

Note that whilst this came up in the context of trying to avoid auto-creation of 1D indexes for coordinate variables, this issue is actually separate. (xref https://github.com/pydata/xarray/pull/8872#issuecomment-2027571714)

cc @benbovy who probably has thoughts

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8959/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8905 Variable doesn't have an .expand_dims method · issue · closed as completed 2024-04-28T19:54:08Z · 4 comments · created 2024-04-03T22:19:10Z · updated 2024-04-28T19:54:08Z

Is your feature request related to a problem?

DataArray and Dataset have an .expand_dims method, but it looks like Variable doesn't.

Describe the solution you'd like

Variable should also have this method, the only difference being that it wouldn't create any coordinates or indexes.
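A possible workaround today is the (semi-public) Variable.set_dims, which already behaves roughly this way:

```python
import numpy as np
from xarray import Variable

var = Variable(dims=["x"], data=np.arange(3))

# set_dims inserts the new dimension with size 1 (unless a shape is given),
# and there are no coordinates or indexes to worry about on a Variable
expanded = var.set_dims(["time", "x"])
assert expanded.dims == ("time", "x")
```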

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8905/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8871 Concatenation automatically creates indexes where none existed · issue · open · 1 comment · created 2024-03-25T02:43:31Z · updated 2024-04-27T16:50:56Z

What happened?

Currently concatenation will automatically create indexes for any dimension coordinates in the output, even if there were no indexes on the input.

What did you expect to happen?

Indexes not to be created for variables which did not already have them.

Minimal Complete Verifiable Example

```Python
# TODO once passing indexes={} directly to the DataArray constructor is
# allowed, there will be no need to create the coords object separately first
coords = Coordinates({"x": np.array([1, 2, 3])}, indexes={})
arrays = [
    DataArray(
        np.zeros((3, 3)),
        dims=["x", "y"],
        coords=coords,
    )
    for _ in range(2)
]

combined = concat(arrays, dim="x")
assert combined.shape == (6, 3)
assert combined.dims == ("x", "y")

# should not have auto-created any indexes
assert combined.indexes == {}  # this fails

combined = concat(arrays, dim="z")
assert combined.shape == (2, 3, 3)
assert combined.dims == ("z", "x", "y")

# should not have auto-created any indexes
assert combined.indexes == {}  # this also fails
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python
# nor have auto-created any indexes
    assert combined.indexes == {}

E   AssertionError: assert Indexes:\n    x        Index([1, 2, 3, 1, 2, 3], dtype='int64', name='x') == {}
E     Full diff:
E     - {
E     - ,
E     - }
E     + Indexes:
E     +     x        Index([1, 2, 3, 1, 2, 3], dtype='int64', name='x',
E     + )
```

Anything else we need to know?

The culprit is the call to core.indexes.create_default_index_implicit inside merge.py. If I comment out this call my concat test passes, but basic tests in test_merge.py start failing.

I would like to know how to avoid the internal call to create_default_index_implicit. I tried passing compat='override' but that made no difference, so I think we would have to change merge.collect_variables_and_indexes somehow.

Conceptually, I would have thought we should be examining what indexes exist on the objects to be concatenated, and not creating new indexes for any variable that doesn't already have one. Presumably we should therefore be making use of the indexes argument to merge.collect_variables_and_indexes, but currently that just seems to be empty.

Environment

I've been experimenting running this test on a branch that includes both #8711 and #8714, but actually this example will fail in the same way on main.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8871/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8960 Option to not auto-create index during expand_dims · pull · closed 2024-04-27T16:48:24Z · 2 comments · created 2024-04-20T03:27:23Z · updated 2024-04-27T16:48:30Z
  • [x] Solves part of #8871 by pulling out part of https://github.com/pydata/xarray/pull/8872#issuecomment-2027571714
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~

TODO:

  • [x] Add new kwarg to DataArray.expand_dims
  • [ ] Add examples to docstrings?
  • [x] Check it actually solves the problem in #8872

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8960/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8966 HTML repr for chunked variables with high dimensionality · issue · open · 1 comment · created 2024-04-23T22:00:40Z · updated 2024-04-24T13:27:05Z

What is your issue?

The graphical representation of dask arrays with many dimensions can end up off the page in the HTML repr.

Ideally dask would worry about this for us, and we just use their _inline_repr, as mentioned here https://github.com/pydata/xarray/issues/4376#issuecomment-680296332

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8966/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#7811 Generalize delayed · pull · open · 0 comments · created 2023-05-02T18:34:26Z · updated 2024-04-23T17:41:55Z

A small follow-on to #7019 to allow using non-dask implementations of delayed.

(Builds off of #7019)

  • [x] Closes #7810
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7811/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#7810 Generalize dask.delayed calls to go through ChunkManager · issue · open · 0 comments · created 2023-05-02T18:30:32Z · updated 2024-04-23T17:38:58Z

[Deepak: Should we add chunked_array_type and from_array_kwargs to open_mfdataset?]

I actually don't think we need to - from_array_kwargs is only going to get directly passed down to open_dataset, and hence could be considered part of **kwargs.

This should actually just work, except in the case of parallel=True. For that we could add delayed to the ChunkManager ABC, so that if cubed does implement cubed.delayed it could be added, else a NotImplementedError would be raised. I think all of this wouldn't be necessary if we had lazy concatenation in xarray though (xref https://github.com/pydata/xarray/issues/4628). That suggestion would mean we should also replace other instances of dask.delayed in other parts of the codebase though... I think I will split this into a separate issue in the interests of getting this one merged.

Originally posted by @TomNicholas in https://github.com/pydata/xarray/pull/7019#discussion_r1182904134
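A hedged sketch of that idea, reusing the class names from xarray's parallelcompat module (the delayed() method itself is the proposal, not existing API):

```python
from abc import ABC
from typing import Any, Callable


class ChunkManagerEntrypoint(ABC):
    # default: chunked array libraries without a delayed concept opt out,
    # raising NotImplementedError as described above
    def delayed(self, func: Callable, /, *args: Any, **kwargs: Any) -> Any:
        raise NotImplementedError(
            f"{type(self).__name__} does not implement delayed()"
        )


class DaskManager(ChunkManagerEntrypoint):
    def delayed(self, func: Callable, /, *args: Any, **kwargs: Any) -> Any:
        import dask

        return dask.delayed(func)(*args, **kwargs)
```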

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7810/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8669 Fix automatic broadcasting when wrapping array api class · pull · closed 2024-01-26T16:41:30Z · 0 comments · created 2024-01-25T16:05:19Z · updated 2024-04-20T05:58:05Z
  • [x] Closes #8665
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8669/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8747 Datatree design discussions - weekly meeting · issue · open · 10 comments · created 2024-02-14T18:39:16Z · updated 2024-04-18T22:09:16Z

What is your issue?

In the bi-weekly dev meeting today we agreed that deliberate higher-level discussions of datatree's design would be useful. (i.e. we're not worried about our ability to write high-quality code, so let's focus review time more explicitly on the high-level design questions.)

This could take the form of me just talking through what I did in a certain part of the code and why, or a targeted discussion on specific design questions that I was never quite sure about. Some examples of the latter, as food for thought:

  • [ ] Inheritance of dimension coordinates from parent nodes? https://github.com/xarray-contrib/datatree/issues/297
  • [x] ~~Symbolic links? https://github.com/xarray-contrib/datatree/issues/5~~ (we decided this was overkill)
  • [ ] Is dt.ds ugly? See also the difference between dt.ds and dt.to_dataset() https://github.com/xarray-contrib/datatree/issues/303#issuecomment-1917798769
  • [ ] Which methods should map over the subtree and which shouldn't? (can't find the issue for this one)
  • [ ] Ignore missing dims when mapping over subtree? https://github.com/xarray-contrib/datatree/issues/67
  • [ ] API for sub-tree selection https://github.com/xarray-contrib/datatree/issues/254
  • [ ] API for merging leaves https://github.com/xarray-contrib/datatree/issues/192
  • [ ] Dict-like interface ambiguities https://github.com/xarray-contrib/datatree/issues/240
  • [ ] The tree broadcasting rabbit hole https://github.com/xarray-contrib/datatree/issues/199
  • [ ] Relationship between datatree and catalogs https://github.com/xarray-contrib/datatree/issues/134
  • [ ] Should xr.concat/xr.merge accept DataTree objects? (and map over them by default?) Would help with https://github.com/TomNicholas/VirtualiZarr/issues/84#issuecomment-2065410549

There was also this design doc I wrote at one point

@flamingbear are you free at 11:30am EST on Tuesday each week? @shoyer, @keewis and I are all free then. Others also welcome (e.g. @owenlittlejohns , @eni-awowale, @etienneschalk), but not required :)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8747/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8949 Mapping DataTree methods over nodes with variables for which the args are invalid · issue · open · 0 comments · created 2024-04-16T23:45:26Z · updated 2024-04-17T14:58:14Z

What is your issue?

In the datatree call today we narrowed down an issue with how datatree maps methods over many variables in many nodes. This issue is essentially https://github.com/xarray-contrib/datatree/issues/67, but I'll attempt to discuss the problem and solution in more general terms.

Context in xarray

xarray.Dataset is essentially a mapping of variable names to Variable objects, and most Dataset methods implicitly map a method defined on Variable over all these variables (e.g. .mean()). Sometimes the mapped method can be naively applied to every variable in the dataset, but sometimes it doesn't make sense to apply it to some of the variables. For example .mean(dim='time') only makes sense for the variables in the dataset that actually have a time dimension.

xarray.Dataset handles this for the user by either working out what version of the method does make sense for that variable (e.g. only trying to take the mean along the reduction dimensions actually present on that variable), or just passing the variable through unaltered. There are some weird subtleties lurking here, e.g. with statistical reductions like std and var.

https://github.com/pydata/xarray/blob/239309f881ba0d7e02280147bc443e6e286e6a63/xarray/core/dataset.py#L6853

There is therefore a difference between

`ds.map(Variable.{REDUCTION}, dim='time')` and `ds.{REDUCTION}(dim='time')`

For example:

```python
In [13]: ds = xr.Dataset({'a': ('x', [1, 2]), 'b': 0})

In [14]: ds.isel(x=0)
Out[14]:
<xarray.Dataset> Size: 16B
Dimensions:  ()
Data variables:
    a        int64 8B 1
    b        int64 8B 0

In [15]: ds.map(Variable.isel, x=0)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[15], line 1
----> 1 ds.map(Variable.isel, x=0)

...

ValueError: Dimensions {'x'} do not exist. Expected one or more of ()
```

(Aside: It would be nice for Dataset.map to include information about which variable it raised an exception on in the error message.)

Clearly Dataset.isel does more than just applying Variable.isel using Dataset.map.

Issue in DataTree

In datatree we have to map methods over different variables in the same node, but also over different variables in different nodes. Currently the implementation of a method naively maps the Dataset method over every node using map_over_subtree, but if there is a node containing a variable for which the method args are invalid, it will raise an exception.

This causes problems for users, for example in https://github.com/xarray-contrib/datatree/issues/67. A minimal example of this problem would be

```python
In [18]: ds1 = xr.Dataset({'a': ('x', [1, 2])})

In [19]: ds2 = xr.Dataset({'b': 0})

In [20]: dt = DataTree.from_dict({'node1': ds1, 'node2': ds2})

In [21]: dt
Out[21]:
DataTree('None', parent=None)
├── DataTree('node1')
│       Dimensions:  (x: 2)
│       Dimensions without coordinates: x
│       Data variables:
│           a        (x) int64 16B 1 2
└── DataTree('node2')
        Dimensions:  ()
        Data variables:
            b        int64 8B 0

In [22]: dt.isel(x=0)
ValueError: Dimensions {'x'} do not exist. Expected one or more of FrozenMappingWarningOnValuesAccess({})
Raised whilst mapping function over node with path /node2
```

(The slightly weird error message here is related to the deprecation cycle in #8500)

We would have preferred that variable b in node2 survived unchanged, like it does in the pure Dataset example.

Desired behaviour

We can kind of think of the desired behaviour like a hypothesis property we want (xref https://github.com/pydata/xarray/issues/1846), but not quite. It would be something like

```python
dt.{REDUCTION}().flatten_into_dataset() == dt.flatten_into_dataset().{REDUCTION}()
```

except that .flatten_into_dataset() can't really exist for all cases otherwise we wouldn't need datatree.

Proposed Solution

There are two ways I can imagine implementing this:

1) Use map_over_subtree to apply the method as-is and try to catch known possible KeyErrors for missing dimensions. This would be fragile.
2) Do some kind of pre-checking of the data in the tree, potentially adjusting the method before applying it using map_over_subtree.

I think @shoyer and I concluded that we should go with (2), in the form of some kind of new primitive, i.e. DataTree.reduce. (Actually DataTree.reduce already exists, but should be changed to not just map_over_subtree Dataset.reduce). Taking after Dataset.reduce, it would look something like this:

```python
class DataTree:
    def reduce(self, func: Callable, dim: Dims = None, **kwargs) -> DataTree:
        all_dims_in_tree = {d for node in self.subtree for d in node.dims}

        missing_dims = tuple(d for d in dim if d not in all_dims_in_tree)
        if missing_dims:
            raise ValueError()

        # TODO this could probably be refactored to call `map_over_subtree`
        for node in self.subtree:
            # using only the reduction dims that are actually present here
            # would fix datatree GH issue #67
            reduce_dims = [d for d in node.dims if d in dim]
            result = node.ds.reduce(func, dim=reduce_dims, **kwargs)

        # TODO build the result and return it
```

Then every method that has this pattern of acting over one or more dims should be mapped over the tree using DataTree.reduce, not map_over_subtree.

cc @shoyer, @flamingbear, @owenlittlejohns

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8949/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8934 Correct save_mfdataset docstring · pull · closed 2024-04-14T11:14:42Z · 0 comments · created 2024-04-12T20:51:35Z · updated 2024-04-14T19:58:46Z

Noticed the **kwargs part of the docstring was mangled - see here

  • [ ] ~~Closes #xxxx~~
  • [ ] ~~Tests added~~
  • [ ] ~~User visible changes (including notable bug fixes) are documented in whats-new.rst~~
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8934/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8860 Ugly error in constructor when no data passed · issue · closed as completed 2024-04-10T22:46:54Z · 2 comments · created 2024-03-20T17:55:52Z · updated 2024-04-10T22:46:55Z

What happened?

Passing no data to the Dataset constructor can result in a very unhelpful "tuple index out of range" error when this is a clear case of malformed input that we should be able to catch.

What did you expect to happen?

An error more like "tuple must be of form (dims, data[, attrs])"
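A minimal sketch of the kind of guard that could catch this early (the helper is hypothetical, not xarray's actual code):

```python
def _check_tuple_form(obj, name):
    # catch malformed tuples before as_variable indexes into them
    if isinstance(obj, tuple) and len(obj) < 2:
        raise ValueError(
            f"Variable {name!r}: a tuple must be of the form "
            f"(dims, data[, attrs[, encoding]]), got {obj!r}"
        )
```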

Minimal Complete Verifiable Example

```Python
xr.Dataset({"t": ()})
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 xr.Dataset({"t": ()})

File ~/Documents/Work/Code/xarray/xarray/core/dataset.py:693, in Dataset.__init__(self, data_vars, coords, attrs)
    690 if isinstance(coords, Dataset):
    691     coords = coords._variables
--> 693 variables, coord_names, dims, indexes, _ = merge_data_and_coords(
    694     data_vars, coords
    695 )
    697 self._attrs = dict(attrs) if attrs else None
    698 self._close = None

File ~/Documents/Work/Code/xarray/xarray/core/dataset.py:422, in merge_data_and_coords(data_vars, coords)
    418 coords = create_coords_with_default_indexes(coords, data_vars)
    420 # exclude coords from alignment (all variables in a Coordinates object should
    421 # already be aligned together) and use coordinates' indexes to align data_vars
--> 422 return merge_core(
    423     [data_vars, coords],
    424     compat="broadcast_equals",
    425     join="outer",
    426     explicit_coords=tuple(coords),
    427     indexes=coords.xindexes,
    428     priority_arg=1,
    429     skip_align_args=[1],
    430 )

File ~/Documents/Work/Code/xarray/xarray/core/merge.py:718, in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value, skip_align_args)
    715 for pos, obj in skip_align_objs:
    716     aligned.insert(pos, obj)
--> 718 collected = collect_variables_and_indexes(aligned, indexes=indexes)
    719 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
    720 variables, out_indexes = merge_collected(
    721     collected, prioritized, compat=compat, combine_attrs=combine_attrs
    722 )

File ~/Documents/Work/Code/xarray/xarray/core/merge.py:358, in collect_variables_and_indexes(list_of_mappings, indexes)
    355     indexes_.pop(name, None)
    356     append_all(coords_, indexes_)
--> 358 variable = as_variable(variable, name=name, auto_convert=False)
    359 if name in indexes:
    360     append(name, variable, indexes[name])

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:126, in as_variable(obj, name, auto_convert)
    124     obj = obj.copy(deep=False)
    125 elif isinstance(obj, tuple):
--> 126     if isinstance(obj[1], DataArray):
    127         raise TypeError(
    128             f"Variable {name!r}: Using a DataArray object to construct a variable is"
    129             " ambiguous, please extract the data using the .data property."
    130         )
    131     try:

IndexError: tuple index out of range
```

Anything else we need to know?

No response

Environment

Xarray main

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8860/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8573 ddof vs correction kwargs in std/var · pull · closed 2024-04-04T16:46:55Z · 0 comments · created 2023-12-27T18:10:52Z · updated 2024-04-04T16:46:55Z
  • [x] Attempt to close the issue described in https://github.com/pydata/xarray/issues/8566#issuecomment-1870472827
  • [x] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8573/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8899 New empty whatsnew entry · pull · closed 2024-04-01T17:49:06Z · 0 comments · created 2024-04-01T16:04:27Z · updated 2024-04-01T17:49:09Z

Should have been done as part of the last release https://github.com/pydata/xarray/releases/tag/v2024.03.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8899/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#6908 Hypothesis strategies in xarray.testing.strategies · pull · open · 15 comments · created 2022-08-11T15:20:56Z · updated 2024-04-01T16:01:21Z

Adds a whole suite of hypothesis strategies for generating xarray objects, inspired by and separated out from the new hypothesis strategies in #4972. They are placed into the namespace xarray.testing.strategies, and publicly mentioned in the API docs, but with a big warning message. There is also a new testing page in the user guide documenting how to use these strategies.

  • [x] Closes #6911
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst

EDIT: A variables strategy and user-facing documentation were shipped in https://github.com/pydata/xarray/pull/8404

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6908/reactions",
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}

#8714 Avoid coercing to numpy in `as_shared_dtypes` · pull · open · 3 comments · created 2024-02-06T09:35:22Z · updated 2024-03-28T18:31:50Z
  • [x] Solves the problem in https://github.com/pydata/xarray/pull/8712#issuecomment-1929037299
  • [ ] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8714/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8886 Allow multidimensional variable with same name as dim when constructing dataset via coords · pull · closed 2024-03-28T16:28:09Z · 2 comments · created 2024-03-28T14:37:27Z · updated 2024-03-28T17:07:10Z

Supersedes #8884 as a way to close #8883, in light of me having learnt that this is now allowed! https://github.com/pydata/xarray/issues/8883#issuecomment-2024645815. So this is really a follow-up to #7989.

  • [x] Closes #8883
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8886/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8883 Coordinates object permits invalid state · issue · closed as completed 2024-03-28T16:28:11Z · 2 comments · created 2024-03-28T01:49:21Z · updated 2024-03-28T16:28:11Z

What happened?

It is currently possible to create a Coordinates object where a variable shares a name with a dimension, but the variable is not 1D. This is explicitly forbidden by the xarray data model.

What did you expect to happen?

If you try to pass the resulting object into the Dataset constructor you get the expected error telling you that this is forbidden, but that error should have been raised by Coordinates.__init__.
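A sketch of the kind of check Coordinates.__init__ could perform, mirroring assert_valid_explicit_coords in merge.py (the helper name is hypothetical):

```python
def _assert_coords_valid(variables, dims):
    # forbid a coordinate sharing a name with a dimension unless it is
    # 1D along exactly that dimension, per the xarray data model
    for name, var in variables.items():
        if name in dims and var.dims != (name,):
            raise ValueError(
                f"coordinate {name!r} shares a name with a dimension but is "
                "not a 1D variable along that dimension"
            )
```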

Minimal Complete Verifiable Example

```Python
In [1]: from xarray.core.coordinates import Coordinates

In [2]: from xarray.core.variable import Variable

In [4]: import numpy as np

In [5]: var = Variable(data=np.arange(6).reshape(2, 3), dims=['x', 'y'])

In [6]: var
Out[6]:
<xarray.Variable (x: 2, y: 3)> Size: 48B
array([[0, 1, 2],
       [3, 4, 5]])

In [7]: coords = Coordinates(coords={'x': var}, indexes={})

In [8]: coords
Out[8]:
Coordinates:
    x        (x, y) int64 48B 0 1 2 3 4 5

In [10]: import xarray as xr

In [11]: ds = xr.Dataset(coords=coords)
---------------------------------------------------------------------------
MergeError                                Traceback (most recent call last)
Cell In[11], line 1
----> 1 ds = xr.Dataset(coords=coords)

File ~/Documents/Work/Code/xarray/xarray/core/dataset.py:693, in Dataset.__init__(self, data_vars, coords, attrs)
    690 if isinstance(coords, Dataset):
    691     coords = coords._variables
--> 693 variables, coord_names, dims, indexes, _ = merge_data_and_coords(
    694     data_vars, coords
    695 )
    697 self._attrs = dict(attrs) if attrs else None
    698 self._close = None

File ~/Documents/Work/Code/xarray/xarray/core/dataset.py:422, in merge_data_and_coords(data_vars, coords)
    418 coords = create_coords_with_default_indexes(coords, data_vars)
    420 # exclude coords from alignment (all variables in a Coordinates object should
    421 # already be aligned together) and use coordinates' indexes to align data_vars
--> 422 return merge_core(
    423     [data_vars, coords],
    424     compat="broadcast_equals",
    425     join="outer",
    426     explicit_coords=tuple(coords),
    427     indexes=coords.xindexes,
    428     priority_arg=1,
    429     skip_align_args=[1],
    430 )

File ~/Documents/Work/Code/xarray/xarray/core/merge.py:731, in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value, skip_align_args)
    729 coord_names.intersection_update(variables)
    730 if explicit_coords is not None:
--> 731     assert_valid_explicit_coords(variables, dims, explicit_coords)
    732 coord_names.update(explicit_coords)
    733 for dim, size in dims.items():

File ~/Documents/Work/Code/xarray/xarray/core/merge.py:577, in assert_valid_explicit_coords(variables, dims, explicit_coords)
    575 for coord_name in explicit_coords:
    576     if coord_name in dims and variables[coord_name].dims != (coord_name,):
--> 577         raise MergeError(
    578             f"coordinate {coord_name} shares a name with a dataset dimension, but is "
    579             "not a 1D variable along that dimension. This is disallowed "
    580             "by the xarray data model."
    581         )

MergeError: coordinate x shares a name with a dataset dimension, but is not a 1D variable along that dimension. This is disallowed by the xarray data model.
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

I noticed this whilst working on #8872

Environment

main

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8883/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8884 Forbid invalid Coordinates object · pull · closed 2024-03-28T14:38:03Z · 2 comments · created 2024-03-28T02:14:01Z · updated 2024-03-28T14:38:43Z
  • [x] Closes #8883
  • [x] Tests added
  • [ ] ~~User visible changes (including notable bug fixes) are documented in whats-new.rst~~
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8884/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8711 Opt out of auto creating index variables · pull · closed 2024-03-26T13:50:14Z · 11 comments · created 2024-02-05T22:04:36Z · updated 2024-03-26T13:55:16Z

Tries fixing #8704 by cherry-picking from #8124 as @benbovy suggested in https://github.com/pydata/xarray/issues/8704#issuecomment-1926868422

  • [x] Closes #8704
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8711/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8704 Currently no way to create a Coordinates object without indexes for 1D variables · issue · closed as completed 2024-03-26T13:50:15Z · 4 comments · created 2024-02-04T18:30:18Z · updated 2024-03-26T13:50:16Z

What happened?

The workaround described in https://github.com/pydata/xarray/pull/8107#discussion_r1311214263 does not seem to work on main, meaning that I think there is currently no way to create an xr.Coordinates object without 1D variables being coerced to indexes. This means there is no way to create a Dataset object without 1D dimension coordinate variables being coerced to IndexVariables and indexes.

What did you expect to happen?

I expected to at least be able to use the workaround described in https://github.com/pydata/xarray/pull/8107#discussion_r1311214263, i.e.

```python
xr.Coordinates({'x': ('x', uarr)}, indexes={})
```

where uarr is an un-indexable array-like.

Minimal Complete Verifiable Example

```Python
class UnindexableArrayAPI:
    ...


class UnindexableArray:
    """
    Presents like an N-dimensional array but doesn't support changes of any
    kind, nor can it be coerced into a np.ndarray or pd.Index.
    """

    _shape: tuple[int, ...]
    _dtype: np.dtype

    def __init__(self, shape: tuple[int, ...], dtype: np.dtype) -> None:
        self._shape = shape
        self._dtype = dtype
        self.__array_namespace__ = UnindexableArrayAPI

    @property
    def dtype(self) -> np.dtype:
        return self._dtype

    @property
    def shape(self) -> tuple[int, ...]:
        return self._shape

    @property
    def ndim(self) -> int:
        return len(self.shape)

    @property
    def size(self) -> int:
        return np.prod(self.shape)

    @property
    def T(self) -> Self:
        raise NotImplementedError()

    def __repr__(self) -> str:
        return f"UnindexableArray(shape={self.shape}, dtype={self.dtype})"

    def _repr_inline_(self, max_width):
        """
        Format to a single line with at most max_width characters. Used by xarray.
        """
        return self.__repr__()

    def __getitem__(self, key, /) -> Self:
        """
        Only supports extremely limited indexing.

        I only added this method because xarray will apparently attempt to
        index into its lazy indexing classes even if the operation would be
        a no-op anyway.
        """
        from xarray.core.indexing import BasicIndexer

        if isinstance(key, BasicIndexer) and key.tuple == ((slice(None),) * self.ndim):
            # no-op
            return self
        else:
            raise NotImplementedError()

    def __array__(self) -> np.ndarray:
        raise NotImplementedError(
            "UnindexableArrays can't be converted into numpy arrays or pandas Index objects"
        )
```

```python
uarr = UnindexableArray(shape=(3,), dtype=np.dtype('int32'))

xr.Variable(data=uarr, dims=['x'])  # works fine

xr.Coordinates({'x': ('x', uarr)}, indexes={})  # works in xarray v2023.08.0
```

but in versions after that it triggers the NotImplementedError in `__array__`:

```python
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[59], line 1
----> 1 xr.Coordinates({'x': ('x', uarr)}, indexes={})

File ~/Documents/Work/Code/xarray/xarray/core/coordinates.py:301, in Coordinates.__init__(self, coords, indexes)
    299 variables = {}
    300 for name, data in coords.items():
--> 301     var = as_variable(data, name=name)
    302     if var.dims == (name,) and indexes is None:
    303         index, index_vars = create_default_index_implicit(var, list(coords))

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:159, in as_variable(obj, name)
    152     raise TypeError(
    153         f"Variable {name!r}: unable to convert object into a variable without an "
    154         f"explicit list of dimensions: {obj!r}"
    155     )
    157 if name is not None and name in obj.dims and obj.ndim == 1:
    158     # automatically convert the Variable into an Index
--> 159     obj = obj.to_index_variable()

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:572, in Variable.to_index_variable(self)
    570 def to_index_variable(self) -> IndexVariable:
    571     """Return this variable as an xarray.IndexVariable"""
--> 572     return IndexVariable(
    573         self._dims, self._data, self._attrs, encoding=self._encoding, fastpath=True
    574     )

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:2642, in IndexVariable.__init__(self, dims, data, attrs, encoding, fastpath)
   2640 # Unlike in Variable, always eagerly load values into memory
   2641 if not isinstance(self._data, PandasIndexingAdapter):
-> 2642     self._data = PandasIndexingAdapter(self._data)

File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:1481, in PandasIndexingAdapter.__init__(self, array, dtype)
   1478 def __init__(self, array: pd.Index, dtype: DTypeLike = None):
   1479     from xarray.core.indexes import safe_cast_to_index
-> 1481     self.array = safe_cast_to_index(array)
   1483 if dtype is None:
   1484     self._dtype = get_valid_numpy_dtype(array)

File ~/Documents/Work/Code/xarray/xarray/core/indexes.py:469, in safe_cast_to_index(array)
    459     emit_user_level_warning(
    460         (
    461             "pandas.Index does not support the float16 dtype."
    (...)
    465         category=DeprecationWarning,
    466     )
    467     kwargs["dtype"] = "float64"
--> 469 index = pd.Index(np.asarray(array), **kwargs)
    471 return _maybe_cast_to_cftimeindex(index)

Cell In[55], line 63, in UnindexableArray.__array__(self)
     62 def __array__(self) -> np.ndarray:
---> 63     raise NotImplementedError("UnindexableArrays can't be converted into numpy arrays or pandas Index objects")

NotImplementedError: UnindexableArrays can't be converted into numpy arrays or pandas Index objects
```

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [x] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

Context is #8699

Environment

Versions described above

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8704/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#6633 Opening dataset without loading any indexes? · issue · open · 10 comments · created 2022-05-24T19:06:09Z · updated 2024-02-23T05:36:53Z

Is your feature request related to a problem?

Within pangeo-forge's internals we would like to call open_dataset, then to_dict(), and end up with a schema-like representation of the contents of the dataset. This works, but it also has the side-effect of loading all indexes into memory, even if we are loading the data values "lazily".

Describe the solution you'd like

@benbovy do you think it would be possible to (perhaps optionally) also avoid loading indexes upon opening a dataset, so that we actually don't load anything? The end result would act a bit like ncdump does.
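For reference, a minimal sketch of the current pattern (the store path is a placeholder); it still pays the index-loading cost on open:

```python
import xarray as xr

# open lazily, then extract a schema-like dict with no data values;
# the indexes are still loaded eagerly on open, which is exactly the
# cost this issue asks to avoid
ds = xr.open_dataset("store.zarr", engine="zarr", chunks={})
schema = ds.to_dict(data=False)
```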

Describe alternatives you've considered

Otherwise we might have to try using xarray-schema or something but the suggestion here would be much neater and more flexible.

xref: https://github.com/pangeo-forge/pangeo-forge-recipes/issues/256

cc @rabernat @jhamman @cisaacstern

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6633/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#8231 xr.concat concatenates along dimensions that it wasn't asked to · issue · open · 4 comments · created 2023-09-25T18:50:29Z · updated 2024-02-14T20:30:26Z

What happened?

Here are two toy datasets designed to represent sections of a dataset that has variables living on a staggered grid. This type of dataset is common in fluid modelling (it's why xGCM exists).

```python
import xarray as xr

ds1 = xr.Dataset(
    coords={
        'x_center': ('x_center', [1, 2, 3]),
        'x_outer': ('x_outer', [0.5, 1.5, 2.5, 3.5]),
    },
)

ds2 = xr.Dataset(
    coords={
        'x_center': ('x_center', [4, 5, 6]),
        'x_outer': ('x_outer', [4.5, 5.5, 6.5]),
    },
)
```

Calling xr.concat on these with dim='x_center' happily concatenates them

```python
xr.concat([ds1, ds2], dim='x_center')
```

```
<xarray.Dataset>
Dimensions:   (x_outer: 7, x_center: 6)
Coordinates:
  * x_outer   (x_outer) float64 0.5 1.5 2.5 3.5 4.5 5.5 6.5
  * x_center  (x_center) int64 1 2 3 4 5 6
Data variables:
    *empty*
```

but notice that the returned result has been concatenated along both x_center and x_outer.

What did you expect to happen?

I did not expect this to work. I definitely didn't expect the datasets to be concatenated along a dimension I didn't ask them to be concatenated along (i.e. x_outer).

What I expected to happen was that (as by default coords='different') both variables would be attempted to be concatenated along the x_center dimension, which would have succeeded for the x_center variable but failed for the x_outer variable. Indeed, if I name the variables differently so that they are no longer coordinate variables then that is what happens:

```python import xarray as xr

ds1 = xr.Dataset( data_vars={ 'a': ('x_center', [1, 2, 3]), 'b': ('x_outer', [0.5, 1.5, 2.5, 3.5]),
}, )

ds2 = xr.Dataset( data_vars={ 'a': ('x_center', [4, 5, 6]), 'b': ('x_outer', [4.5, 5.5, 6.5]),
}, ) python xr.concat([ds1, ds2], dim='x_center', data_vars='different') ValueError: cannot reindex or align along dimension 'x_outer' because of conflicting dimension sizes: {3, 4} ```

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I was trying to create an example for which you would need the automatic combined concat/merge that happens within xr.combine_by_coords.

Environment

xarray 2023.8.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8231/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1945654275 PR_kwDOAMm_X85c7HL_ 8319 Move parallelcompat and chunkmanagers to NamedArray TomNicholas 35968931 closed 0     9 2023-10-16T16:34:26Z 2024-02-12T22:09:24Z 2024-02-12T22:09:24Z MEMBER   0 pydata/xarray/pulls/8319

@dcherian I got to this point before realizing that simply moving parallelcompat.py over isn't what it says in the design doc, which instead talks about

  • Could this functionality be left in Xarray proper for now? Alternative array types like JAX also have some notion of "chunks" for parallel arrays, but the details differ in a number of ways from the Dask/Cubed.
  • Perhaps variable.chunk/load methods should become functions defined in xarray that convert Variable objects. This is easy so long as xarray can reach in and replace .data

I personally think that simply moving parallelcompat makes sense so long as you expect people to use chunked NamedArray objects. I see the chunked arrays as special cases of duck arrays, and my understanding is that NamedArray is supposed to have full support for duckarrays.

cc @andersy005

  • [x] As requested in #8238
  • [ ] ~~Tests added~~
  • [ ] ~~User visible changes (including notable bug fixes) are documented in whats-new.rst~~
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8319/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2120030667 PR_kwDOAMm_X85mGm4g 8712 Only use CopyOnWriteArray wrapper on BackendArrays TomNicholas 35968931 open 0     6 2024-02-06T06:05:53Z 2024-02-07T17:09:56Z   MEMBER   0 pydata/xarray/pulls/8712

This makes sure we only use the CopyOnWriteArray wrapper on arrays that have been explicitly marked to be lazily-loaded (through being subclasses of BackendArray). Without this change we are implicitly assuming that any array type obtained through the BackendEntrypoint system should be treated as if it points to an on-disk array.

Motivated by https://github.com/pydata/xarray/issues/8699, which is a counterexample to that assumption.

  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8712/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2098882374 I_kwDOAMm_X859GmdG 8660 dtype encoding ignored during IO? TomNicholas 35968931 closed 0     3 2024-01-24T18:50:47Z 2024-02-05T17:35:03Z 2024-02-05T17:35:02Z MEMBER      

What happened?

When I set the .encoding['dtype'] attribute before saving a to disk, the actual on-disk representation appears to store a record of the dtype encoding, but when opening it back up in xarray I get the same dtype I had before, not the one specified in the encoding. Is that what's supposed to happen? How does this work? (This happens with both zarr and netCDF.)

What did you expect to happen?

I expected that setting .encoding['dtype'] would mean that once I open the data back up, it would be in the new dtype that I set in the encoding.

Minimal Complete Verifiable Example

```Python air = xr.tutorial.open_dataset('air_temperature')

air['air'].dtype # returns dtype('float32')

air['air'].encoding['dtype'] # returns dtype('int16'), which already seems weird

air.to_zarr('air.zarr') # I would assume here that the encoding actually does something during IO

now if I check the zarr .zarray metadata for the air variable it says

"dtype":"<i2"`

air2 = xr.open_dataset('air.zarr', engine='zarr') # open it back up

air2['air'].dtype # returns dtype('float32'), but I expected dtype('int16')

(the same thing happens also with saving to netCDF instead of Zarr) ```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

I know I didn't explicitly cast with .asdtype, but I'm still confused as to what the relation between the dtype encoding is supposed to be here.

I am probably just misunderstanding how this is supposed to work, but then this is arguably a docs issue, because here it says "[the encoding dtype field] controls the type of the data written on disk", which I would have thought also affects the data you get back when you open it up again?

Environment

main branch of xarray

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8660/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2118876352 PR_kwDOAMm_X85mCobE 8708 Try pydata-sphinx-theme in docs TomNicholas 35968931 open 0     1 2024-02-05T15:50:01Z 2024-02-05T16:57:33Z   MEMBER   0 pydata/xarray/pulls/8708
  • [x] Closes #8701
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst

How might we want to move headers/sections around to take advantage of now having a navigation bar at the top? Adding an explicit link to the tutorial.xarray.dev site would be good.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8708/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2116695961 I_kwDOAMm_X85-KjeZ 8699 Wrapping a `kerchunk.Array` object directly with xarray TomNicholas 35968931 open 0     3 2024-02-03T22:15:07Z 2024-02-04T21:15:14Z   MEMBER      

What is your issue?

In https://github.com/fsspec/kerchunk/issues/377 the idea came up of using the xarray API to concatenate arrays which represent parts of a zarr store - i.e. using xarray to kerchunk a large set of netCDF files instead of using kerchunk.combine.MultiZarrToZarr.

The idea is to make something like this work for kerchunking sets of netCDF files into zarr stores

```python ds = xr.open_mfdataset( '/my/files*.nc' engine='kerchunk', # kerchunk registers an xarray IO backend that returns zarr.Array objects combine='nested', # 'by_coords' would require actually reading coordinate data parallel=True, # would use dask.delayed to generate reference dicts for each file in parallel )

ds # now wraps a bunch of zarr.Array / kerchunk.Array objects, no need for dask arrays

ds.kerchunk.to_zarr(store='out.zarr') # kerchunk defines an xarray accessor that extracts the zarr arrays and serializes them (which could also be done in parallel if writing to parquet) ```

I had a go at doing this in this notebook, and in doing so discovered a few potential issues with xarray's internals.

For this to work xarray has to: - Wrap a kerchunk.Array object which barely defines any array API methods, including basically not supporting indexing at all, - Store all the information present in a kerchunked Zarr store but without ever loading any data, - Not create any indexes by default during dataset construction or during xr.concat, - Not try to do anything else that can't be defined for a kerchunk.Array. - Possibly we need the Lazy Indexing classes to support concatenation https://github.com/pydata/xarray/issues/4628

It's an interesting exercise in using xarray as an abstraction, with no access to real numerical values at all.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8699/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
    xarray 13221727 issue
2099530269 I_kwDOAMm_X859JEod 8665 Error when broadcasting array API compliant class TomNicholas 35968931 closed 0     1 2024-01-25T04:11:14Z 2024-01-26T16:41:31Z 2024-01-26T16:41:31Z MEMBER      

What happened?

Broadcasting fails for array types that strictly follow the array API standard.

What did you expect to happen?

With a normal numpy array this obviously works fine.

Minimal Complete Verifiable Example

```Python import numpy.array_api as nxp

arr = nxp.asarray([[1, 2, 3], [4, 5, 6]], dtype=np.dtype('float32'))

var = xr.Variable(data=arr, dims=['x', 'y'])

var.isel(x=0) # this is fine

var * var.isel(x=0) # this is not


IndexError Traceback (most recent call last) Cell In[31], line 1 ----> 1 var * var.isel(x=0)

File ~/Documents/Work/Code/xarray/xarray/core/_typed_ops.py:487, in VariableOpsMixin.mul(self, other) 486 def mul(self, other: VarCompatible) -> Self | T_DataArray: --> 487 return self._binary_op(other, operator.mul)

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:2406, in Variable._binary_op(self, other, f, reflexive) 2404 other_data, self_data, dims = _broadcast_compat_data(other, self) 2405 else: -> 2406 self_data, other_data, dims = _broadcast_compat_data(self, other) 2407 keep_attrs = _get_keep_attrs(default=False) 2408 attrs = self._attrs if keep_attrs else None

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:2922, in _broadcast_compat_data(self, other) 2919 def _broadcast_compat_data(self, other): 2920 if all(hasattr(other, attr) for attr in ["dims", "data", "shape", "encoding"]): 2921 # other satisfies the necessary Variable API for broadcast_variables -> 2922 new_self, new_other = _broadcast_compat_variables(self, other) 2923 self_data = new_self.data 2924 other_data = new_other.data

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:2899, in _broadcast_compat_variables(*variables) 2893 """Create broadcast compatible variables, with the same dimensions. 2894 2895 Unlike the result of broadcast_variables(), some variables may have 2896 dimensions of size 1 instead of the size of the broadcast dimension. 2897 """ 2898 dims = tuple(_unified_dims(variables)) -> 2899 return tuple(var.set_dims(dims) if var.dims != dims else var for var in variables)

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:2899, in <genexpr>(.0) 2893 """Create broadcast compatible variables, with the same dimensions. 2894 2895 Unlike the result of broadcast_variables(), some variables may have 2896 dimensions of size 1 instead of the size of the broadcast dimension. 2897 """ 2898 dims = tuple(_unified_dims(variables)) -> 2899 return tuple(var.set_dims(dims) if var.dims != dims else var for var in variables)

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:1479, in Variable.set_dims(self, dims, shape) 1477 expanded_data = duck_array_ops.broadcast_to(self.data, tmp_shape) 1478 else: -> 1479 expanded_data = self.data[(None,) * (len(expanded_dims) - self.ndim)] 1481 expanded_var = Variable( 1482 expanded_dims, expanded_data, self._attrs, self._encoding, fastpath=True 1483 ) 1484 return expanded_var.transpose(*dims)

File ~/miniconda3/envs/dev3.11/lib/python3.12/site-packages/numpy/array_api/_array_object.py:555, in Array.getitem(self, key) 550 """ 551 Performs the operation getitem. 552 """ 553 # Note: Only indices required by the spec are allowed. See the 554 # docstring of _validate_index --> 555 self._validate_index(key) 556 if isinstance(key, Array): 557 # Indexing self._array with array_api arrays can be erroneous 558 key = key._array

File ~/miniconda3/envs/dev3.11/lib/python3.12/site-packages/numpy/array_api/_array_object.py:348, in Array._validate_index(self, key) 344 elif n_ellipsis == 0: 345 # Note boolean masks must be the sole index, which we check for 346 # later on. 347 if not key_has_mask and n_single_axes < self.ndim: --> 348 raise IndexError( 349 f"{self.ndim=}, but the multi-axes index only specifies " 350 f"{n_single_axes} dimensions. If this was intentional, " 351 "add a trailing ellipsis (...) which expands into as many " 352 "slices (:) as necessary - this is what np.ndarray arrays " 353 "implicitly do, but such flat indexing behaviour is not " 354 "specified in the Array API." 355 ) 357 if n_ellipsis == 0: 358 indexed_shape = self.shape

IndexError: self.ndim=1, but the multi-axes index only specifies 0 dimensions. If this was intentional, add a trailing ellipsis (...) which expands into as many slices (:) as necessary - this is what np.ndarray arrays implicitly do, but such flat indexing behaviour is not specified in the Array API. ```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

main branch of xarray, numpy 1.26.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8665/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2099622643 PR_kwDOAMm_X85lBkos 8668 Fix unstack method when wrapping array api class TomNicholas 35968931 closed 0     0 2024-01-25T05:54:38Z 2024-01-26T16:06:04Z 2024-01-26T16:06:01Z MEMBER   0 pydata/xarray/pulls/8668
  • [x] Closes #8666
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8668/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2099550299 I_kwDOAMm_X859JJhb 8666 Error unstacking array API compliant class TomNicholas 35968931 closed 0     0 2024-01-25T04:35:09Z 2024-01-26T16:06:02Z 2024-01-26T16:06:02Z MEMBER      

What happened?

Unstacking fails for array types that strictly follow the array API standard.

What did you expect to happen?

This obviously works fine with a normal numpy array.

Minimal Complete Verifiable Example

```Python import numpy.array_api as nxp

arr = nxp.asarray([[1, 2, 3], [4, 5, 6]], dtype=np.dtype('float32'))

da = xr.DataArray( arr, coords=[("x", ["a", "b"]), ("y", [0, 1, 2])], ) da stacked = da.stack(z=("x", "y")) stacked.indexes["z"] stacked.unstack()


AttributeError Traceback (most recent call last) Cell In[65], line 8 6 stacked = da.stack(z=("x", "y")) 7 stacked.indexes["z"] ----> 8 roundtripped = stacked.unstack() 9 arr.identical(roundtripped)

File ~/Documents/Work/Code/xarray/xarray/util/deprecation_helpers.py:115, in _deprecate_positional_args.<locals>._decorator.<locals>.inner(args, kwargs) 111 kwargs.update({name: arg for name, arg in zip_args}) 113 return func(args[:-n_extra_args], kwargs) --> 115 return func(*args, kwargs)

File ~/Documents/Work/Code/xarray/xarray/core/dataarray.py:2913, in DataArray.unstack(self, dim, fill_value, sparse) 2851 @_deprecate_positional_args("v2023.10.0") 2852 def unstack( 2853 self, (...) 2857 sparse: bool = False, 2858 ) -> Self: 2859 """ 2860 Unstack existing dimensions corresponding to MultiIndexes into 2861 multiple new dimensions. (...) 2911 DataArray.stack 2912 """ -> 2913 ds = self._to_temp_dataset().unstack(dim, fill_value=fill_value, sparse=sparse) 2914 return self._from_temp_dataset(ds)

File ~/Documents/Work/Code/xarray/xarray/util/deprecation_helpers.py:115, in _deprecate_positional_args.<locals>._decorator.<locals>.inner(args, kwargs) 111 kwargs.update({name: arg for name, arg in zip_args}) 113 return func(args[:-n_extra_args], kwargs) --> 115 return func(*args, kwargs)

File ~/Documents/Work/Code/xarray/xarray/core/dataset.py:5581, in Dataset.unstack(self, dim, fill_value, sparse) 5579 for d in dims: 5580 if needs_full_reindex: -> 5581 result = result._unstack_full_reindex( 5582 d, stacked_indexes[d], fill_value, sparse 5583 ) 5584 else: 5585 result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse)

File ~/Documents/Work/Code/xarray/xarray/core/dataset.py:5474, in Dataset._unstack_full_reindex(self, dim, index_and_vars, fill_value, sparse) 5472 if name not in index_vars: 5473 if dim in var.dims: -> 5474 variables[name] = var.unstack({dim: new_dim_sizes}) 5475 else: 5476 variables[name] = var

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:1684, in Variable.unstack(self, dimensions, **dimensions_kwargs) 1682 result = self 1683 for old_dim, dims in dimensions.items(): -> 1684 result = result._unstack_once_full(dims, old_dim) 1685 return result

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:1574, in Variable._unstack_once_full(self, dim, old_dim) 1571 reordered = self.transpose(*dim_order) 1573 new_shape = reordered.shape[: len(other_dims)] + new_dim_sizes -> 1574 new_data = reordered.data.reshape(new_shape) 1575 new_dims = reordered.dims[: len(other_dims)] + new_dim_names 1577 return type(self)( 1578 new_dims, new_data, self._attrs, self._encoding, fastpath=True 1579 )

AttributeError: 'Array' object has no attribute 'reshape' ```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

It fails on the arr.reshape call, because the array API standard has reshape be a function, not a method.

We do in fact have an array API-compatible version of reshape defined in duck_array_ops.py, it just apparently isn't yet used everywhere we call reshape.

https://github.com/pydata/xarray/blob/037a39e249e5387bc15de447c57bfd559fd5a574/xarray/core/duck_array_ops.py#L363

Environment

main branch of xarray, numpy 1.26.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8666/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2099591300 I_kwDOAMm_X859JTiE 8667 Error using vectorized indexing with array API compliant class TomNicholas 35968931 open 0     0 2024-01-25T05:20:31Z 2024-01-25T16:07:12Z   MEMBER      

What happened?

Vectorized indexing can fail for array types that strictly follow the array API standard.

What did you expect to happen?

Vectorized indexing to all work.

Minimal Complete Verifiable Example

```Python import numpy.array_api as nxp

da = xr.DataArray( nxp.reshape(nxp.arange(12), (3, 4)), dims=["x", "y"], coords={"x": [0, 1, 2], "y": ["a", "b", "c", "d"]}, )

da[[0, 2, 2], [1, 3]] # works

ind_x = xr.DataArray([0, 1], dims=["x"]) ind_y = xr.DataArray([0, 1], dims=["y"])

da[ind_x, ind_y] # works

da[[0, 1], ind_x] # doesn't work


TypeError Traceback (most recent call last) Cell In[157], line 1 ----> 1 da[[0, 1], ind_x]

File ~/Documents/Work/Code/xarray/xarray/core/dataarray.py:859, in DataArray.getitem(self, key) 856 return self._getitem_coord(key) 857 else: 858 # xarray-style array indexing --> 859 return self.isel(indexers=self._item_key_to_dict(key))

File ~/Documents/Work/Code/xarray/xarray/core/dataarray.py:1472, in DataArray.isel(self, indexers, drop, missing_dims, **indexers_kwargs) 1469 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "isel") 1471 if any(is_fancy_indexer(idx) for idx in indexers.values()): -> 1472 ds = self._to_temp_dataset()._isel_fancy( 1473 indexers, drop=drop, missing_dims=missing_dims 1474 ) 1475 return self._from_temp_dataset(ds) 1477 # Much faster algorithm for when all indexers are ints, slices, one-dimensional 1478 # lists, or zero or one-dimensional np.ndarray's

File ~/Documents/Work/Code/xarray/xarray/core/dataset.py:3001, in Dataset._isel_fancy(self, indexers, drop, missing_dims) 2997 var_indexers = { 2998 k: v for k, v in valid_indexers.items() if k in var.dims 2999 } 3000 if var_indexers: -> 3001 new_var = var.isel(indexers=var_indexers) 3002 # drop scalar coordinates 3003 # https://github.com/pydata/xarray/issues/6554 3004 if name in self.coords and drop and new_var.ndim == 0:

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:1130, in Variable.isel(self, indexers, missing_dims, **indexers_kwargs) 1127 indexers = drop_dims_from_indexers(indexers, self.dims, missing_dims) 1129 key = tuple(indexers.get(dim, slice(None)) for dim in self.dims) -> 1130 return self[key]

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:812, in Variable.getitem(self, key) 799 """Return a new Variable object whose contents are consistent with 800 getting the provided key from the underlying data. 801 (...) 809 array x.values directly. 810 """ 811 dims, indexer, new_order = self._broadcast_indexes(key) --> 812 data = as_indexable(self._data)[indexer] 813 if new_order: 814 data = np.moveaxis(data, range(len(new_order)), new_order)

File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:1390, in ArrayApiIndexingAdapter.getitem(self, key) 1388 else: 1389 if isinstance(key, VectorizedIndexer): -> 1390 raise TypeError("Vectorized indexing is not supported") 1391 else: 1392 raise TypeError(f"Unrecognized indexer: {key}")

TypeError: Vectorized indexing is not supported ```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

I don't really understand why the first two examples work but the last one doesn't...

Environment

main branch of xarray, numpy 1.26.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8667/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1332231863 I_kwDOAMm_X85PaD63 6894 Public testing framework for duck array integration TomNicholas 35968931 open 0     8 2022-08-08T18:23:49Z 2024-01-25T04:04:11Z   MEMBER      

What is your issue?

In #4972 @keewis started writing a public framework for testing the integration of any duck array class in xarray, inspired by the testing framework pandas has for ExtensionArrays. This is a meta-issue for what our version of that framework for wrapping numpy-like duck arrays should look like.

(Feel free to edit / add to this)

What behaviour should we test?

We have a lot of xarray methods to test with any type of duck array. Each of these bullets should correspond to one or more testing base classes which the duck array library author would inherit from. In rough order of increasing complexity:

  • [x] Constructors - Including for Variable #6903
  • [x] Properties - checking that .shape, .dtype etc. exist on the wrapped array, see #4285 for example #6903
  • [x] Reductions - #4972 also uses parameters to automatically test many methods, and hypothesis to test each method for many different array instances.
  • [ ] Unary ops
  • [ ] Binary ops
  • [ ] Selection
  • [ ] Computation
  • [ ] Combining
  • [ ] Groupby
  • [ ] Rolling
  • [ ] Coarsen
  • [ ] Weighted

We don't need to test that the array class obeys everything else in the Array API Standard. (For instance .device is probably never going to be used by xarray directly.) We instead assume that if the array class doesn't implement something in the API standard but all the generated tests pass, then all is well.

How extensible does our testing framework need to be?

To be able to test any type of wrapped array our testing framework needs to itself be quite flexible.

  • User-defined checking - For some arrays np.testing.assert_equal is not enough to guarantee correctness, so the user creating tests needs to specify additional checks. #4972 shows how to do this for checking the units of resulting pint arrays.
  • User-created data? - Some array libraries might need to test array data that is invalid for numpy arrays. I'm thinking specifically of testing wrapping ragged arrays. #4285
  • Parallel computing frameworks? - Related to the last point is chunked arrays. Here the strategy requires an extra chunks argument when the array is created, and any results need to first call .compute(). Testing parallel-executed arrays might also require pretty complicated SetUps and TearDowns in fixtures too. (see also #6807)

What documentation / examples do we need?

All of this content should really go on a dedicated page in the docs, perhaps grouped alongside other ways of extending xarray.

  • [ ] Motivation
  • [ ] What subset of the Array API standard we expect duck array classes to define (could point to a typing protocol?)
  • [ ] Explanation that the array type needs to return the same type for any numpy-like function which xarray might call upon that type (i.e. the set of duckarray instances is closed under numpy operations)
  • [ ] Explanation of the different base classes
  • [ ] Simple demo of testing a toy numpy-like array class
  • [ ] Point to code testing more advanced examples we actually use (e.g. sparse, pint)
  • [ ] Which advanced behaviours are optional (e.g. Constructors and Properties have to work, but Groupby is optional)

Where should duck array compatibility testing eventually live?

Right now the tests for sparse & pint are going into the xarray repo, but presumably we don't want tests for every duck array type living in this repository. I suggest that we want to work towards eventually having no array library-specific tests in this repository at all. (Except numpy I guess.) Thanks @crusaderky for the original suggestion.

Instead all tests involving pint could live in pint-xarray, all involving sparse could live in the sparse repository (or a new sparse-xarray repo), etc. etc. We would set those test jobs to re-run when xarray is released, and then xref any issues revealed here if needs be.

We should probably also move some of our existing tests https://github.com/pydata/xarray/pull/7023#pullrequestreview-1104932752

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6894/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1716228662 I_kwDOAMm_X85mS5I2 7848 Compatibility with the Array API standard TomNicholas 35968931 open 0     4 2023-05-18T20:34:43Z 2024-01-25T04:03:42Z   MEMBER      

What is your issue?

Meta-issue to track all the smaller issues around making xarray and the array API standard compatible with each other.

We've already had - #6804 - #7067 - #7847

and there will likely be many others.


I suspect this might require changes to the standard as well as to xarray - in particular see this list of common numpy functions which are not currently in the array API standard. Of these xarray currently uses (FYI @ralfgommers ):

  • np.clip
  • np.diff
  • np.pad
  • np.repeat
  • ~np.take~
  • ~np.tile~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7848/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2098535717 PR_kwDOAMm_X85k94wv 8655 Small improvement to HOW_TO_RELEASE.md TomNicholas 35968931 closed 0     1 2024-01-24T15:35:16Z 2024-01-24T21:46:02Z 2024-01-24T21:46:01Z MEMBER   0 pydata/xarray/pulls/8655

Clarify step 8. by pointing to where the ReadTheDocs build actually is

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8655/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2092346228 PR_kwDOAMm_X85ko-Y2 8632 Pin sphinx-book-theme to 1.0.1 to try to deal with #8619 TomNicholas 35968931 closed 0     2 2024-01-21T02:18:49Z 2024-01-23T20:16:13Z 2024-01-23T18:28:35Z MEMBER   0 pydata/xarray/pulls/8632
  • [x] Hopefully closes #8619
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8632/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2088695240 I_kwDOAMm_X858fvXI 8619 Docs sidebar is squished TomNicholas 35968931 open 0     9 2024-01-18T16:54:55Z 2024-01-23T18:38:38Z   MEMBER      

What happened?

Since the v2024.01.0 release yesterday, there seems to be a rendering error in the website - the sidebar is squished up to the left:

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8619/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  reopened xarray 13221727 issue
2086704542 PR_kwDOAMm_X85kVyF6 8617 Release summary for release v2024.01.0 TomNicholas 35968931 closed 0     1 2024-01-17T18:02:29Z 2024-01-17T21:23:45Z 2024-01-17T19:21:11Z MEMBER   0 pydata/xarray/pulls/8617

Someone give this a thumbs up if it looks good

  • [x] Closes #8616
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8617/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1940536602 I_kwDOAMm_X85zqj0a 8298 cftime.DatetimeNoLeap incorrectly decoded from netCDF file TomNicholas 35968931 open 0     14 2023-10-12T18:13:53Z 2024-01-08T01:01:53Z   MEMBER      

What happened?

I have been given a netCDF file (I think it's netCDF3) which when I open it does not decode the time variable in the way I expected it to. The time coordinate created is a numpy object array

What did you expect to happen?

I expected it to automatically create a coordinate backed by a CFTimeIndex object, not a CFTimeIndex object wrapped inside another array type.

Minimal Complete Verifiable Example

The original problematic file is 455MB (I can share it if necessary), but I can create a small netCDF file that displays the same issue.

```python import cftime

time_values = [cftime.DatetimeNoLeap(347, 2, 1, 0, 0, 0, 0, has_year_zero=True)] time_ds = xr.Dataset(coords={'time': (['time'], time_values)}) print(time_ds) time_ds.to_netcdf('time_mwe.nc') <xarray.Dataset> Dimensions: (time: 1) Coordinates: * time (time) object 0347-02-01 00:00:00 Data variables: empty python ds = xr.open_dataset('time_mwe.nc', engine='netcdf4', decode_times=True, use_cftime=True) print(ds) <xarray.Dataset> Dimensions: (time: 1) Coordinates: * time (time) object 0347-02-01 00:00:00 Data variables: empty ```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

cftime 1.6.2 netcdf4 1.6.4 xarray 2023.8.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8298/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1333644214 PR_kwDOAMm_X8486DyE 6903 Duckarray tests for constructors and properties TomNicholas 35968931 open 0     5 2022-08-09T18:36:56Z 2024-01-01T13:33:22Z   MEMBER   0 pydata/xarray/pulls/6903

Builds on top of #4972 to add tests for Variable/DataArray/Dataset constructors and properties when wrapping duck arrays.

Adds a file xarray/tests/duckarrays/base/constructors.py which contains new test base classes.

Also uses those new base classes to test Sparse array integration (not yet tried for pint integration).

  • [x] Closes part of #6894
  • [ ] Tests added (tests for tests?? Maybe...)
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6903/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1519552711 PR_kwDOAMm_X85GqAro 7418 Import datatree in xarray? TomNicholas 35968931 closed 0     18 2023-01-04T20:48:09Z 2023-12-22T17:38:04Z 2023-12-22T17:38:04Z MEMBER   0 pydata/xarray/pulls/7418

I want datatree to live in xarray main, as right now it's in a separate package but imports many xarray internals.

This presents a few questions: 1) At what stage is datatree "ready" to moved in here? At what stage should it become encouraged public API? 2) What's a good way to slowly roll the feature out? 3) How do I decrease the bus factor on datatree's code? Can I get some code reviews during the merging process? :pray: 4) Should I make a new CI environment just for testing datatree stuff?

Today @jhamman and @keewis suggested for now I make it so that you can from xarray import DataTree, using the current xarray-datatree package as an optional dependency. That way I can create a smoother on-ramp, get some more users testing it, but without committing all the code into this repo yet.

@pydata/xarray what do you think? Any other thoughts about best practices when moving a good few thousand lines of code into xarray?

  • [x] First step towards moving solution of #4118 into this repository
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7418/reactions",
    "total_count": 6,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 2
}
    xarray 13221727 pull
2038153739 I_kwDOAMm_X855e8IL 8545 map_blocks should dispatch to ChunkManager TomNicholas 35968931 open 0     5 2023-12-12T16:34:13Z 2023-12-22T16:47:27Z   MEMBER      

Is your feature request related to a problem?

7019 generalized most of xarrays internals to be able to use any chunked array type that we can create a ChunkManagerEntrypoint for. Most functions now go through this (e.g. apply_ufunc), but I did not redirect xarray.map_blocks to go through ChunkManagerEntrypoint.

This redirection works by dispatching to high-level dask.array primitives such as dask.array.apply_gufunc, dask.array.blockwise, and dask.array.map_blocks. However the current implementation of xarray.map_blocks is much lower-level, building a custom HLG, so it was not obvious how to swap it out.

Describe the solution you'd like

I would like to either:

1) Replace the current internals of xarray.map_blocks with a simple call to ChunkManagerEntrypoint.map_blocks. This would be the cleanest separation of concerns we could do here. Presumably there is some obvious reason why this cannot or should not be done, but I have yet to understand what that reason is. (either @dcherian or @tomwhite can you enlighten me perhaps? 🙏)

2) (More likely) refactor so that the existing guts of xarray.map_blocks are only called from the ChunkManagerEntrypoint, and a non-dask chunked array (i.e. cubed, but in theory other types too) would be able to specify how it wants to perform the map_blocks.

Describe alternatives you've considered

Leaving it as the status quo breaks the nice abstraction and separation of concerns that #7019 introduced.

Additional context

Split off from https://github.com/pydata/xarray/issues/8414

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8545/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1820788594 PR_kwDOAMm_X85WW40r 8019 Generalize cumulative reduction (scan) to non-dask types TomNicholas 35968931 closed 0     2 2023-07-25T17:22:07Z 2023-12-18T19:30:18Z 2023-12-18T19:30:18Z MEMBER   0 pydata/xarray/pulls/8019
  • [x] Needed for https://github.com/tomwhite/cubed/issues/277#issuecomment-1648567431 - should have been added in #7019
  • [ ] ~~Tests added~~ (would go in cubed-xarray)
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst (new ABC method will be documented on chunked array types page automatically)
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8019/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1048697792 PR_kwDOAMm_X84uSksS 5961 [Experimental] Refactor Dataset to store variables in a manifest TomNicholas 35968931 closed 0     7 2021-11-09T14:51:03Z 2023-12-06T17:38:53Z 2023-12-06T17:38:52Z MEMBER   0 pydata/xarray/pulls/5961

This PR is part of an experiment to see how to integrate a DataTree into xarray.

What is does is refactor Dataset to store variables in a DataManifest class, which is also capable of maintaining a ledger of child tree nodes. The point of this is to prevent name collisions between stored variables and child datatree nodes, as first mentioned in https://github.com/TomNicholas/datatree/issues/38 and explained further in https://github.com/TomNicholas/datatree/issues/2.

("Manifest" in the old sense, of a noun meaning "a document giving comprehensive details of a ship and its cargo and other contents")

  • [x] Would eventually close https://github.com/TomNicholas/datatree/issues/38
  • [ ] Tests added
  • [x] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5961/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1084220684 PR_kwDOAMm_X84wDPg5 6086 Type protocol for internal variable mapping TomNicholas 35968931 closed 0     9 2021-12-19T23:32:04Z 2023-12-06T17:20:48Z 2023-12-06T17:19:30Z MEMBER   1 pydata/xarray/pulls/6086

In #5961 and #6083 I've been experimenting extending Dataset to store variables in a custom mapping object (instead of always in a dict), so as to eventually fix this mutability problem with DataTree.

I've been writing out new storage class implementations in those PRs, but on Friday @shoyer suggested that I could instead simply alter the allowed type for ._variables in xarray.Dataset's type hints. That would allow me to mess about with storage class implementations outside of xarray, whilst guaranteeing type compatibility with xarray main itself with absolutely minimal changes (hopefully no runtime changes to Dataset at all!).

The idea is to define a protocol in xarray which specifies the structural subtyping behaviour of any custom variable storage class that I might want to set as Dataset._variables. The type hint for the ._variables attribute then refers to this protocol, and will be satisfied as long as whatever object is set as ._variables has compatibly-typed methods. Adding type hints to the ._construct_direct and ._replace constructors is enough to propagate this new type specification all over the codebase.

In practice this means writing a protocol which describes the type behaviour of all the methods on dict that currently get used by ._variable accesses.

So far I've written out a CopyableMutableMapping protocol which defines all the methods needed. The issues I'm stuck on at the moment are:

1) The typing behaviour of overloaded methods, specifically update. (setdefault also has similar problems but I think I can safely omit that from the protocol definition because we don't call ._variables.setdefault() anywhere.) Mypy complains that CopyableMutableMapping is not a compatible type when Dict is specified because the type specification of overloaded methods isn't quite right somehow:

```
xarray/core/computation.py:410: error: Argument 1 to "_construct_direct" of "Dataset" has incompatible type "Dict[Hashable, Variable]"; expected "CopyableMutableMapping[Hashable, Variable]"  [arg-type]
xarray/core/computation.py:410: note: Following member(s) of "Dict[Hashable, Variable]" have conflicts:
xarray/core/computation.py:410: note:     Expected:
xarray/core/computation.py:410: note:         @overload
xarray/core/computation.py:410: note:         def update(self, other: Mapping[Hashable, Variable], **kwargs: Variable) -> None
xarray/core/computation.py:410: note:         @overload
xarray/core/computation.py:410: note:         def update(self, other: Iterable[Tuple[Hashable, Variable]], **kwargs: Variable) -> None
xarray/core/computation.py:410: note:         <1 more overload not shown>
xarray/core/computation.py:410: note:     Got:
xarray/core/computation.py:410: note:         @overload
xarray/core/computation.py:410: note:         def update(self, Mapping[Hashable, Variable], **kwargs: Variable) -> None
xarray/core/computation.py:410: note:         @overload
xarray/core/computation.py:410: note:         def update(self, Iterable[Tuple[Hashable, Variable]], **kwargs: Variable) -> None
```
I don't understand what the inconsistency is because I literally looked up the exact way that [the type stubs](https://github.com/python/typeshed/blob/e6911530d4d52db0fbdf05be3aff89e520ee39bc/stdlib/typing.pyi#L490) for `Dict` were written (via `MutableMapping`).

2) Making functions which expect a Mapping accept my CopyableMutableMapping. I would have thought this would just work because I think my protocol defines all the methods which Mapping has, so CopyableMutableMapping should automatically become a subtype of Mapping. But instead I get errors like this with no further information as to what to do about it.

```xarray/core/dataset.py:785: error: Argument 1 to "Frozen" has incompatible type "CopyableMutableMapping[Hashable, Variable]"; expected "Mapping[Hashable, Variable]"  [arg-type]```

3) I'm expecting to get a runtime problem whenever we assert isinstance(ds._variables, dict), which happens in a few places. I'm no sure what the best way to deal with that is, but I'm hoping that simply adding @typing.runtime_checkable to the protocol class definition will be enough?

Once that passes mypy I will write a test that checks that if I define my own custom variable storage class I can _construct_direct a Dataset which uses it without any errors. At that point I can be confident that Dataset is general enough to hold whichever exact variable storage class I end up needing for DataTree.

@max-sixty this is entirely a typing challenge, so I'm tagging you in case you're interested :)

  • [ ] Would supercede #5961 and #6083
  • [ ] Tests added
  • [ ] Passes pre-commit run --all-files

EDIT: Also using Protocol at all is only available in Python 3.8+

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6086/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2027231531 I_kwDOAMm_X8541Rkr 8524 PR labeler bot broken and possibly dead TomNicholas 35968931 open 0     2 2023-12-05T22:23:44Z 2023-12-06T15:33:42Z   MEMBER      

What is your issue?

The PR labeler bot seems to be broken

https://github.com/pydata/xarray/actions/runs/7107212418/job/19348227101?pr=8404

and even worse the repository has been archived!

https://github.com/andymckay/labeler

I actually like this bot, but unless a similar bot exists somewhere else I guess we should just delete this action 😞

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8524/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  reopened xarray 13221727 issue
2027528985 PR_kwDOAMm_X85hQBHP 8525 Remove PR labeler bot TomNicholas 35968931 closed 0     3 2023-12-06T02:31:56Z 2023-12-06T02:45:46Z 2023-12-06T02:45:41Z MEMBER   0 pydata/xarray/pulls/8525

RIP

  • [x] Closes #8524
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8525/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1974681146 PR_kwDOAMm_X85edMm- 8404 Hypothesis strategy for generating Variable objects TomNicholas 35968931 closed 0     6 2023-11-02T17:04:03Z 2023-12-05T22:45:57Z 2023-12-05T22:45:57Z MEMBER   0 pydata/xarray/pulls/8404

Breaks out just the part of #6908 needed for generating arbitrary xarray.Variable objects. (so ignore the ginormous number of commits)

EDIT: Check out this test which performs a mean on any subset of any Variable object!

```python In [36]: from xarray.testing.strategies import variables

In [37]: variables().example() <xarray.Variable (ĭ: 3)> array([-2.22507386e-313-6.62447795e+016j, nan-6.46207519e+185j, -2.22507386e-309+3.33333333e-001j]) ```

@andersy005 @maxrjones @jhamman I thought this might be useful for the NamedArray testing. (xref #8370 and #8244)

@keewis and @Zac-HD sorry for letting that PR languish for literally a year :sweat_smile: This PR addresses your feedback about accepting a callable that returns a strategy generating arrays. That suggestion makes some things a bit more complex in user code but actually allows me to simplify the internals of the variables strategy significantly. I'm actually really happy with this PR - I think it solves what we were discussing, and is a sensible checkpoint to merge before going back to making strategies for generating composite objects like DataArrays/Datasets work.

  • [x] Closes part of #6911
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8404/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2019594436 I_kwDOAMm_X854YJDE 8496 Dataset.dims should return a set, not a dict of sizes TomNicholas 35968931 open 0     8 2023-11-30T22:12:37Z 2023-12-02T03:10:14Z   MEMBER      

What is your issue?

This is inconsistent:

```python In [25]: ds Out[25]: <xarray.Dataset> Dimensions: (x: 1, y: 2) Dimensions without coordinates: x, y Data variables: a (x, y) int64 0 1

In [26]: ds['a'].dims Out[26]: ('x', 'y')

In [27]: ds['a'].sizes Out[27]: Frozen({'x': 1, 'y': 2})

In [28]: ds.dims Out[28]: Frozen({'x': 1, 'y': 2})

In [29]: ds.sizes Out[29]: Frozen({'x': 1, 'y': 2}) ```

Surely ds.dims should return something like a Frozenset({'x', 'y'})? (because dimension order is meaningless when you have multiple arrays underneath - see https://github.com/pydata/xarray/issues/8498)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8496/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2017285297 PR_kwDOAMm_X85gtObP 8491 Warn on repeated dimension names during construction TomNicholas 35968931 closed 0     13 2023-11-29T19:30:51Z 2023-12-01T01:37:36Z 2023-12-01T00:40:18Z MEMBER   0 pydata/xarray/pulls/8491
  • [x] Closes #2226 and #1499 by forbidding those situations (but we should leave #3731 open as the "official" place to discuss supporting repeated dimensions
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8491/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
552500673 MDU6SXNzdWU1NTI1MDA2NzM= 3709 Feature Proposal: `xarray.interactive` module TomNicholas 35968931 closed 0     36 2020-01-20T20:42:22Z 2023-10-27T18:24:49Z 2021-07-29T15:37:21Z MEMBER      

Feature proposal: xarray.interactive module

I've been experimenting with ipython widgets in jupyter notebooks, and I've been working on how we might use them to make xarray more interactive.

Motivation:

For most users who are exploring their data, it will be common to find themselves rerunning the same cells repeatedly but with slightly different values. In xarray's case that will often be in an .isel() or .sel() call, or selecting variables from a dataset. IPython widgets allow you to interact with your functions in a very intuitive way, which we could exploit. There are lots of tutorials on how to interact with pandas data (e.g. this great one), but I haven't seen any for interacting with xarray objects.

Relationship to other libraries:

Some downstream plotting libaries (such as @hvplot) already use widgets when interactively plotting xarray-derived data structures, but they don't seem to go the full N dimensions. This also isn't something that should be confined to plotting functions - you often choose slices or variables at the start of analysis, not just at the end. I'll come back to this idea later.

The default ipython widgets are pretty good, but we could write an xarray.interactive module in such a way that downstream developers can easily replace them with their own widgets.

Usage examples:

```python

imports

import ipywidgets as widgets import xarray.plot as xplot import xarray.interactive as interactive

Load tutorial data

ds = xr.tutorial.open_dataset('air_temperature')['air'] ```

Plotting against multiple dimensions interactively python interactive.isel(da, xplot.plot, lat=10, lon=50)

Interactively select a range from a dimension python def plot_mean_over_time(da): da.mean(dim=time) interactive.isel(da, plot_mean_over_time, time=slice(100, 500))

Animate over one dimension python from ipywidgets import Play interactive.isel(da, xplot.plot, time=Play())

API ideas:

We can write a function like this

python interactive.isel(da, func=xplot.plot, time=10)

which could also be used as a decorator something like this python @interactive.isel(da, time=10) def plot(da) return xplot.plot(da)

It would be nicer to be able to do this python @Interactive(da).isel(time=10) def plot(da) return xplot.plot(da) but Guido forbade it.

But we can attach these functions to an accessor to get python da.interactive.isel(xplot.plot, time=10)

Other ideas

Select variables from datasets ```python @interactive.data_vars(da1=ds['n'], da2=ds['T'], ...) def correlation(da1, da2, ...) ...

Would produce a dropdown list of variables for each dataset

```

Choose dimensions to apply functions over ```python @interactive.dims(dim='time') def mean(da, dim) ...

Would produce a dropdown list of dimensions in the dataarray

```

General interactive.explore() method to see variation over any number of dimensions, the default being all of them.

What do people think about this? Is it something that makes sense to include within xarray itself? (Dependencies aren't a problem because it's fine to have ipywidgets as an optional dependency just for this module.)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3709/reactions",
    "total_count": 6,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 3,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1806973709 PR_kwDOAMm_X85VoNVM 7992 Docs page on interoperability TomNicholas 35968931 closed 0     3 2023-07-17T05:02:29Z 2023-10-26T16:08:56Z 2023-10-26T16:04:33Z MEMBER   0 pydata/xarray/pulls/7992

Builds upon #7991 by adding a page to the internals enumerating all the different ways in which xarray is interoperable.

Would be nice if https://github.com/pydata/xarray/pull/6975 were merged so that I could link to it from this new page.

  • [x] Addresses comment in https://github.com/pydata/xarray/pull/6975#issuecomment-1246487152
  • [ ] ~~Tests added~~
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7992/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1036473974 PR_kwDOAMm_X84tsaL3 5900 Add .chunksizes property TomNicholas 35968931 closed 0     2 2021-10-26T15:51:09Z 2023-10-20T16:00:15Z 2021-10-29T18:12:22Z MEMBER   0 pydata/xarray/pulls/5900

Adds a new .chunksizes property to Dataset, DataArray and Variable, which returns a mapping from dimensions names to chunk sizes in all cases.

Supercedes #5846 because this PR is backwards-compatible.

  • [x] Closes #5843
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5900/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1790161818 PR_kwDOAMm_X85UvI4i 7963 Suggest installing dask when not discovered by ChunkManager TomNicholas 35968931 open 0     2 2023-07-05T19:34:06Z 2023-10-16T13:31:44Z   MEMBER   0 pydata/xarray/pulls/7963
  • [x] Closes #7962
  • [ ] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7963/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1812811751 I_kwDOAMm_X85sDU_n 8008 "Deep linking" disparate documentation resources together TomNicholas 35968931 open 0     3 2023-07-19T22:18:55Z 2023-10-12T18:36:52Z   MEMBER      

What is your issue?

Our docs have a general issue with having lots of related resources that are not necessarily linked together in a useful way. This results in users (including myself!) getting "stuck" in one part of the docs and being unaware of material that would help them solve their specific issue.

To give a concrete example, if a user wants to know about coarsen, there is relevant material:

  • In the coarsen class docstring
  • On the reshaping page
  • On the computations page
  • On the "how do I?" page
  • On the tutorial repository

Different types of material are great, but only some of these resources are linked to others. Coarsen is actually pretty well covered overall, but for other functions there might be no useful linking at all, or no examples in the docstrings.


The biggest missed opportunity here is the way all the great content on the tutorial.xarray.dev repository is not linked from anywhere on the main documentation site (I believe). To address that we could either (a) integrate the tutorial.xarray.dev material into the main site or (b) add a lot more cross-linking between the two sites.

Identifying sections that could be linked and adding links would be a great task for new contributors.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8008/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
602218021 MDU6SXNzdWU2MDIyMTgwMjE= 3980 Make subclassing easier? TomNicholas 35968931 open 0     9 2020-04-17T20:33:13Z 2023-10-04T16:27:28Z   MEMBER      

Suggestion

We relatively regularly have users asking about subclassing DataArray and Dataset, and I know of at least a few cases where people have gone through with it. However we currently explicitly discourage doing this, on the basis that basically all operations will return a bare xarray object instead of the subclassed version, it's full of trip hazards, and we have the accessor interface to point people to instead.

However, while useful, the accessors aren't enough for some users, and I think we could probably do better. If we refactored internally we might be able to make it much easier to subclass.

Example to follow in Pandas

Pandas takes an interesting approach: while they also explicitly discourage subclassing, they still try to make it easier, and show you what you need to do in order for it to work.

They ask you to override some constructor properties with your own, and allow you to define your own original properties.
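For reference, the pandas pattern looks roughly like this (simplified from their subclassing guide):

```python
import pandas as pd

class SubclassedSeries(pd.Series):
    _metadata = ["my_extra_property"]  # custom attributes to propagate

    @property
    def _constructor(self):
        # Operations on a SubclassedSeries return a SubclassedSeries
        return SubclassedSeries

    @property
    def _constructor_expanddim(self):
        # e.g. .to_frame() returns the subclassed DataFrame
        return SubclassedDataFrame

class SubclassedDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return SubclassedDataFrame

    @property
    def _constructor_sliced(self):
        # Selecting a single column yields a SubclassedSeries
        return SubclassedSeries
```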

Potential complications

  • .construct_dataarray and DataArray.__init__ are used a lot internally to reconstruct a DataArray from dims, coords, data etc. before returning the result of a method call. We would probably need to standardise this, before allowing users to override it.

  • Pandas actually has multiple constructor properties you need to override: _constructor, _constructor_sliced, and _constructor_expanddim. What's the minimum set of similar constructors we would need?

  • Blocking access to attributes - we currently stop people from adding their own attributes quite aggressively, so that we can use attribute access as an alias for variables and attrs. We would need to either relax this, or better allow users to set a list of their own _properties which they want to register, similar to pandas.

  • __slots__ - I think something funky can happen if you inherit from a class that defines __slots__?

Documentation

I think if we do this we should also slightly refactor the relevant docs to make clear the distinction between 3 groups of people:

  • Users - People who import and use xarray at the top-level with (ideally) no particular concern as to how it works. This is who the vast majority of the documentation is for.
  • Developers - People who are actually improving and developing xarray upstream. This is who the Contributing to xarray page is for.
  • Extenders - People who want to subclass, accessorize or wrap xarray objects, in order to do something more complicated. These people are probably writing a domain-specific library which will then bring in a new set of users. There maybe aren't as many of these people, but they are really important IMO. This is implicitly who the xarray internals page is aimed at, but it would be nice to make that distinction much more clear. It might also be nice to give them a guide as to "I want to achieve X, should I use wrapping/subclassing/accessors?"

@max-sixty you had some ideas about what would need to be done for this to work?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3980/reactions",
    "total_count": 11,
    "+1": 11,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1801393806 PR_kwDOAMm_X85VVV4q 7981 Document that Coarsen accepts coord func as callable TomNicholas 35968931 open 0     0 2023-07-12T17:01:31Z 2023-09-19T01:18:49Z   MEMBER   0 pydata/xarray/pulls/7981

Documents a hidden feature I noticed yesterday, corrects incorrect docstrings, and tidies up some of the typing internally.

  • [ ] ~~Closes #xxxx~~
  • [ ] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7981/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1083507645 PR_kwDOAMm_X84wBDeq 6083 Manifest as variables attribute TomNicholas 35968931 closed 0     2 2021-12-17T18:14:26Z 2023-09-14T15:37:38Z 2023-09-14T15:37:37Z MEMBER   1 pydata/xarray/pulls/6083

Another attempt like #5961

@shoyer

  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6083/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
663235664 MDU6SXNzdWU2NjMyMzU2NjQ= 4243 Manually drop DataArray from memory? TomNicholas 35968931 closed 0     3 2020-07-21T18:54:40Z 2023-09-12T16:17:12Z 2023-09-12T16:17:12Z MEMBER      

Is it possible to deliberately drop data associated with a particular DataArray from memory?

Obviously da.close() exists, but what happens if you did for example:

```python
ds = open_dataset(file)
da = ds[var]
da.compute()    # something that loads da into memory
da.close()      # is the memory freed up again now?
ds.something()  # what about now?
```

Also does calling python's built-in garbage collector (i.e. gc.collect()) do anything in this instance?

The context of this question is that I'm trying to resave some massive variables (~65GB each) that were loaded from thousands of files into just a few files for each variable. I would love to use @rabernat 's new rechunker package but I'm not sure how easily I can convert my current netCDF data to Zarr, and I'm interested in this question no matter how I end up solving the problem.

I don't currently have a particularly good understanding of file I/O and memory management in xarray, but would like to improve it. Can anyone recommend a tool I can use to answer this kind of question myself on my own machine? I suppose it would need to be able to tell me the current memory usage of specific objects, not just the total memory usage.
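For what it's worth, a crude way to watch this from outside is to track the process's resident memory with psutil while creating and deleting objects (a rough sketch, not a per-object measurement; the file and variable names are placeholders):

```python
import gc
import psutil
import xarray as xr

def rss_mb():
    return psutil.Process().memory_info().rss / 1e6

print(rss_mb())
ds = xr.open_dataset("file.nc")  # placeholder file
da = ds["var"].compute()         # load the variable into memory
print(rss_mb())
del da                           # drop the reference...
gc.collect()                     # ...and force a collection pass
print(rss_mb())                  # has the memory been freed?
```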

(@johnomotani you might be interested)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4243/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1806949831 PR_kwDOAMm_X85VoH2o 7991 Docs page on internal design TomNicholas 35968931 closed 0     1 2023-07-17T04:46:55Z 2023-09-08T15:41:32Z 2023-09-08T15:41:32Z MEMBER   0 pydata/xarray/pulls/7991

Adds a new page to the xarray internals documentation giving an overview of the internal design of xarray.

This should be helpful for xarray contributors and for developers of extensions because nowhere in the docs does it really explain how DataArray and Dataset are constructed from Variable.
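The layering the page describes can be summarised in a few lines (illustrative):

```python
import numpy as np
import xarray as xr

v = xr.Variable(dims="x", data=np.arange(3))      # dims + data (+ attrs)
da = xr.DataArray(v, coords={"x": [10, 20, 30]})  # a Variable plus coordinates and a name
ds = xr.Dataset({"a": da})                        # a dict-like collection of variables
```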

  • [ ] ~~Closes #xxxx~~
  • [ ] ~~Tests added~~
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7991/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1812188730 I_kwDOAMm_X85sA846 8004 Rotation Functional Index example TomNicholas 35968931 open 0     2 2023-07-19T15:23:20Z 2023-08-24T13:26:56Z   MEMBER      

Is your feature request related to a problem?

I'm trying to think of an example that would demonstrate the "functional index" pattern discussed in https://github.com/pydata/xarray/issues/3620.

I think a 2D rotation is the simplest example of an analytically-expressible, non-trivial, domain-agnostic case where you might want to back a set of multiple coordinates with a single functional index. It's also nice because there is additional information that must be passed and stored (the angle of the rotation), but that part is very simple, and domain-agnostic. I'm proposing we make this example work and put it in the custom index docs.
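The transform itself (as opposed to the index machinery around it) is just a 2D rotation; my sketch of what the example would encode:

```python
import numpy as np

def rotate(i, j, theta):
    # Map "pixel" coordinates (i, j) to rotated "physical" coordinates (x, y)
    x = np.cos(theta) * i - np.sin(theta) * j
    y = np.sin(theta) * i + np.cos(theta) * j
    return x, y

def unrotate(x, y, theta):
    # The inverse transform, needed to implement .sel on the physical coordinates
    return rotate(x, y, -theta)
```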

I had a go at making that example (notebook here) @benbovy, but I'm confused about a couple of things:

1) How do I implement .sel in such a way that it supports indexing with slices (i.e. to crop my image)?
2) How can I make this lazy?
3) Should the implementation be a "MetaIndex" (i.e. wrapping some pandas indexes)?

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

This example is inspired by @jni's use case in napari, where (IIUC) they want to do a lazy functional affine transformation from pixel to physical coordinates, where the simplest example of such a transform might be a linear shear (caused by the imaging focal plane being at an angle to the physical sample).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8004/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1368740629 PR_kwDOAMm_X84-uWtE 7019 Generalize handling of chunked array types TomNicholas 35968931 closed 0     30 2022-09-10T22:02:18Z 2023-07-24T20:40:29Z 2023-05-18T17:34:31Z MEMBER   0 pydata/xarray/pulls/7019

Initial attempt to get cubed working within xarray, as an alternative to dask.

  • [x] Closes #6807, at least for the case of cubed
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
  • [x] Correct type hints

I've added a manager kwarg to the .chunk methods so you can do da.chunk(manager="cubed") to convert to a chunked cubed.CoreArray, with the default still being da.chunk(manager="dask"). (I couldn't think of a better name than "manager", as "backend" and "executor" are already taken.)

~~At the moment it should work except for an import error that I don't understand, see below.~~

For cubed to work at all with this PR we would also need:
- [x] Cubed to expose the correct array type consistently https://github.com/tomwhite/cubed/issues/123
- [x] A cubed version of apply_gufunc https://github.com/tomwhite/cubed/pull/119 - implemented in https://github.com/tomwhite/cubed/pull/149 :partying_face:

To-dos for me on this PR:
- [x] Re-route xarray.apply_ufunc through cubed.apply_gufunc instead of dask's apply_gufunc when appropriate,
- [x] Add from_array_kwargs to opening functions, e.g. open_zarr and open_dataset,
- [x] Add from_array_kwargs to creation functions, such as full_like,
- [x] Add store_kwargs as a way to propagate cubed-specific kwargs when saving to_zarr.

To complete this project more generally we should also:
- [ ] Have cubed.apply_gufunc support multiple output arguments https://github.com/tomwhite/cubed/issues/152
- [x] Have a top-level cubed.unify_chunks to match dask.array.core.unify_chunks
- [ ] Write a test suite for wrapping cubed arrays, which would be best done via #6894
- [ ] Generalise xarray.map_blocks to work on cubed arrays, ideally by first rewriting xarray's implementation of map_blocks to use dask.array.map_blocks

cc @tomwhite

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7019/reactions",
    "total_count": 4,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 2,
    "eyes": 0
}
    xarray 13221727 pull
1810167498 PR_kwDOAMm_X85VzHaS 7999 Core team member guide TomNicholas 35968931 closed 0     4 2023-07-18T15:26:01Z 2023-07-21T14:51:57Z 2023-07-21T13:48:26Z MEMBER   0 pydata/xarray/pulls/7999

Adds a guide for core developers of xarray. Mostly adapted from napari's core dev guide, but with some extra sections and ideas from the pandas maintenance guide.

@pydata/xarray please give your feedback on this! If you prefer to give feedback in a non-public channel for whatever reason then please use the private core team email.

  • [ ] ~~Closes #xxxx~~
  • [ ] ~~Tests added~~
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7999/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1801849622 I_kwDOAMm_X85rZgsW 7982 Use Meilisearch in our docs TomNicholas 35968931 closed 0     1 2023-07-12T22:29:45Z 2023-07-19T19:49:53Z 2023-07-19T19:49:53Z MEMBER      

Is your feature request related to a problem?

Just saw a cool search tool for sphinx, called Meilisearch, in a lightning talk at SciPy

Cc @dcherian

Describe the solution you'd like

Read about it here

https://sphinxdocs.ansys.com/version/stable/user_guide/options.html

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7982/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1807782455 I_kwDOAMm_X85rwJI3 7996 Stable docs build not showing latest changes after release TomNicholas 35968931 closed 0     3 2023-07-17T13:24:58Z 2023-07-17T20:48:19Z 2023-07-17T20:48:19Z MEMBER      

What happened?

I released xarray version v2023.07.0 last night, but I'm not seeing changes to the documentation reflected in the https://docs.xarray.dev/en/stable/ build. (In particular the Internals section now should have an entire extra page on wrapping chunked arrays.) I can however see the newest additions on https://docs.xarray.dev/en/latest/ build. Is that how it's supposed to work?

What did you expect to happen?

No response

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7996/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1807044282 PR_kwDOAMm_X85VodDN 7993 Update whats-new.rst for new release TomNicholas 35968931 closed 0     0 2023-07-17T06:03:19Z 2023-07-17T06:03:43Z 2023-07-17T06:03:42Z MEMBER   0 pydata/xarray/pulls/7993

Needed because I started the release process earlier this week by writing a whats-new entry, which got merged, but the release hasn't been issued since. I'll self-merge this and release now.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7993/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1799476089 PR_kwDOAMm_X85VO0Wz 7979 Release summary for v2023.07.0 TomNicholas 35968931 closed 0     0 2023-07-11T17:59:28Z 2023-07-13T16:33:43Z 2023-07-13T16:33:43Z MEMBER   0 pydata/xarray/pulls/7979  
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7979/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1753401384 PR_kwDOAMm_X85Szs7X 7911 Duck array documentation improvements TomNicholas 35968931 closed 0     0 2023-06-12T19:10:41Z 2023-07-10T09:36:05Z 2023-06-29T14:39:22Z MEMBER   0 pydata/xarray/pulls/7911

Draft improvements to the user guide page on using duck arrays.

Intended as part of the scipy tutorial effort, though I wasn't sure whether to concentrate on content in the main xarray docs or the tutorial repo.

(I wrote this on a train without enough internet to update my conda environment so I will come back and fix anything that doesn't run.)

  • [x] Part of https://github.com/xarray-contrib/xarray-tutorial/issues/170
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

cc @dcherian and @keewis

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7911/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1779880070 PR_kwDOAMm_X85UMTE7 7951 Chunked array docs TomNicholas 35968931 closed 0     3 2023-06-28T23:01:42Z 2023-07-05T20:33:33Z 2023-07-05T20:08:19Z MEMBER   0 pydata/xarray/pulls/7951

Builds upon #7911

  • [x] Documentation for #7019
  • [ ] ~~Tests added~~
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7951/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1786830423 PR_kwDOAMm_X85Uj4NA 7960 Update minimum version of typing extensions in pre-commit TomNicholas 35968931 closed 0     1 2023-07-03T21:27:40Z 2023-07-05T19:09:04Z 2023-07-05T15:43:40Z MEMBER   0 pydata/xarray/pulls/7960

Attempt to fix the pre-commit build failure I keep seeing in the CI (e.g. this failure from https://github.com/pydata/xarray/pull/7881)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7960/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1773373878 PR_kwDOAMm_X85T2T_2 7941 Allow cubed arrays to be passed to flox groupby TomNicholas 35968931 closed 0     0 2023-06-25T16:48:56Z 2023-06-26T15:28:06Z 2023-06-26T15:28:03Z MEMBER   0 pydata/xarray/pulls/7941

Generalizes a small check for chunked arrays in groupby so it now allows cubed arrays through to flox rather than just dask arrays. Does not actually mean that flox groupby will work with cubed yet though, see https://github.com/tomwhite/cubed/issues/223 and https://github.com/xarray-contrib/flox/issues/224

  • [x] Should have been done in #7019
  • [ ] ~~Tests added~~ (the place to test this would be in cubed-xarray)
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7941/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1768095127 PR_kwDOAMm_X85Tkubk 7934 Release summary for v2023.06.0 TomNicholas 35968931 closed 0     4 2023-06-21T17:34:29Z 2023-06-23T03:02:12Z 2023-06-23T03:02:11Z MEMBER   0 pydata/xarray/pulls/7934

Release summary:

This release adds features to curvefit, improves the performance of concatenation, and fixes various bugs.


For some reason when I try to use `git log "$(git tag --sort=v:refname | tail -1).." --format=%aN | sort -u | perl -pe 's/\n/$1, /'` to return the list of all contributors since the last release, it only returns Deepak :laughing: I'm not sure what's going wrong there - I definitely have all the git tags fetched, and other people have definitely contributed since the last version!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7934/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1742035781 I_kwDOAMm_X85n1VtF 7894 Can a "skipna" argument be added for Dataset.integrate() and DataArray.integrate()? TomNicholas 35968931 open 0     2 2023-06-05T15:32:35Z 2023-06-05T21:59:45Z   MEMBER      

Discussed in https://github.com/pydata/xarray/discussions/5283

Originally posted by **chfite**, May 9, 2021:

I am using the Dataset.integrate() function and noticed that because one of my variables has a NaN in it, the function returns NaN for the integrated value of that variable. I know based on the trapezoidal rule one could not get an integrated value at the location of the NaN, but is it not possible to calculate the integrated values where there were regular values? Assuming 0 for NaNs does not work, because it would still integrate between the values before and after 0 and add additional area I do not want. Using DataArray.dropna() is also not sufficient, because it would assume the value before the NaN is connected to the value after the NaN, again adding area that I would not want included. If a "skipna" functionality could not be added to the integrate function, does anyone have a suggestion for another way to calculate my integrated area while excluding the NaNs?
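A small example of the problem, and of why dropna() is not a substitute (values chosen for illustration):

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, 2.0, np.nan, 4.0], dims="x",
                  coords={"x": [0.0, 1.0, 2.0, 3.0]})

da.integrate("x")              # nan: one missing value poisons the whole integral
da.dropna("x").integrate("x")  # 7.5: but this bridges the gap from x=1 to x=3,
                               # adding area that a skipna option should arguably exclude
```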
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7894/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1716200316 PR_kwDOAMm_X85Q1k5D 7847 Array API fixes for astype TomNicholas 35968931 closed 0     0 2023-05-18T20:09:32Z 2023-05-19T15:11:17Z 2023-05-19T15:11:16Z MEMBER   0 pydata/xarray/pulls/7847

Follows on from #7067 and #6804, ensuring that we call xp.astype() on arrays rather than arr.astype(), as the latter is commonly-implemented by array libraries but not part of the array API standard.
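The pattern in question, roughly (a sketch using numpy's experimental array API namespace):

```python
import numpy.array_api as xp  # any array-API-conforming namespace would do

arr = xp.asarray([1, 2, 3])
out = xp.astype(arr, xp.float64)  # standard: astype lives on the namespace,
# whereas arr.astype(xp.float64) is a numpy/dask convention that the
# array API standard does not guarantee.
```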

A bit of a pain to test in isolation because I made the changes so that xarray's .pad would work with array-API-conforming libraries, but actually np.pad is not part of the array API either, so it's going to coerce to numpy for that reason anyway.

(This PR replaces #7815, as making a new branch was easier than merging/rebasing with all the changes in #7019.)

  • [ ] ~~Closes #xxxx~~
  • [ ] ~~Tests added~~
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7847/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1716345200 PR_kwDOAMm_X85Q2EmD 7849 Whats new for release of v2023.05.0 TomNicholas 35968931 closed 0     0 2023-05-18T22:30:32Z 2023-05-19T02:18:03Z 2023-05-19T02:17:55Z MEMBER   0 pydata/xarray/pulls/7849

Summary:

This release adds some new methods and operators, updates our deprecation policy for python versions, fixes some bugs with groupby, and introduces experimental support for alternative chunked parallel array computation backends via a new plugin system!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7849/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1695244129 PR_kwDOAMm_X85PvJSS 7815 Array API fixes for astype TomNicholas 35968931 closed 0     2 2023-05-04T04:33:52Z 2023-05-18T20:10:48Z 2023-05-18T20:10:43Z MEMBER   0 pydata/xarray/pulls/7815

While it's common for duck arrays to have a .astype method, this doesn't exist in the new array API standard. We now have duck_array_ops.astype to deal with this, but for some reason changing it in just a couple more places broke practically every pint test in test_units.py :confused: @keewis

Builds on top of #7019 with just one extra commit to separate out this issue.

  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7815/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1308715638 I_kwDOAMm_X85OAWp2 6807 Alternative parallel execution frameworks in xarray TomNicholas 35968931 closed 0     12 2022-07-18T21:48:10Z 2023-05-18T17:34:33Z 2023-05-18T17:34:33Z MEMBER      

Is your feature request related to a problem?

Since early on the project xarray has supported wrapping dask.array objects in a first-class manner. However recent work on flexible array wrapping has made it possible to wrap all sorts of array types (and with #6804 we should support wrapping any array that conforms to the array API standard).

Currently though the only way to parallelize array operations with xarray "automatically" is to use dask. (You could use xarray-beam or other options too but they don't "automatically" generate the computation for you like dask does.)

When dask is the only type of parallel framework exposing an array-like API then there is no need for flexibility, but now we have nascent projects like cubed to consider too. @tomwhite

Describe the solution you'd like

Refactor the internals so that dask is one option among many, and that any newer options can plug in in an extensible way.

In particular cubed deliberately uses the same API as dask.array, exposing:
1) the methods needed to conform to the array API standard
2) a .chunk and .compute method, which we could dispatch to
3) dask-like functions to create computation graphs, including blockwise, map_blocks, and rechunk

I would like to see xarray able to wrap any array-like object which offers this set of methods / functions, and call the corresponding version of that method for the correct library (i.e. dask vs cubed) automatically.
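A very rough sketch of the kind of dispatch I mean (all names here are hypothetical):

```python
# Registry of parallel frameworks, each knowing its array type and how to
# implement chunk/compute/blockwise/map_blocks/rechunk for that type.
CHUNK_MANAGERS = {}  # e.g. {"dask": DaskManager, "cubed": CubedManager}

def guess_chunkmanager(arr):
    # Pick the framework whose array type matches the wrapped data
    for name, manager in CHUNK_MANAGERS.items():
        if isinstance(arr, manager.array_cls):
            return manager
    raise TypeError(f"no registered chunk manager recognises {type(arr)}")
```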

That way users could try different parallel execution frameworks simply via a switch like

```python
ds.chunk(**chunk_pattern, manager="dask")
```

and see which one works best for their particular problem.

Describe alternatives you've considered

If we leave it the way it is now then xarray will not be truly flexible in this respect.

Any library can wrap (or subclass if they are really brave) xarray objects to provide parallelism but that's not the same level of flexibility.

Additional context

cubed repo

PR about making xarray able to wrap objects conforming to the new array API standard

cc @shoyer @rabernat @dcherian @keewis

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6807/reactions",
    "total_count": 6,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 3,
    "rocket": 2,
    "eyes": 1
}
  completed xarray 13221727 issue
1694956396 I_kwDOAMm_X85lBvts 7813 Task naming for general chunkmanagers TomNicholas 35968931 open 0     3 2023-05-03T22:56:46Z 2023-05-05T10:30:39Z   MEMBER      

What is your issue?

(Follow-up to #7019)

When you create a dask graph of xarray operations, the tasks in the graph get useful names according to the name of the DataArray they operate on, or whether they represent an open_dataset call.

Currently for cubed this doesn't work; see for example the graph in https://github.com/pangeo-data/distributed-array-examples/issues/2#issuecomment-1533852877.

cc @tomwhite @dcherian

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7813/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1468534020 I_kwDOAMm_X85XiA0E 7333 FacetGrid with coords error TomNicholas 35968931 open 0     1 2022-11-29T18:42:48Z 2023-04-03T10:12:40Z   MEMBER      

There may be a small bug here, as DataArrays with and without coords are handled differently. Contrast:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    data=np.random.randn(2, 2, 2, 10, 10),
    coords={'A': ['a1', 'a2'], 'B': [0, 1], 'C': [0, 1], 'X': range(10), 'Y': range(10)},
)

p = da.sel(A='a1').plot.contour(col='B', row='C')
try:
    p.map_dataarray(xr.plot.pcolormesh, y="B", x="C")
except Exception as e:
    print('An uninformative error:')
    print(e)

# An uninformative error:
# tuple index out of range
```

with:

```python
da = xr.DataArray(data=np.random.randn(2, 2, 2, 10, 10))

p = da.sel(dim_0=0).plot.contour(col='dim_1', row='dim_2')
try:
    p.map_dataarray(xr.plot.pcolormesh, y="dim_1", x="dim_2")
except Exception as e:
    print('A more informative error:')
    print(e)

# A more informative error:
# x must be one of None, 'dim_3', 'dim_4'
```

Originally posted by @joshdorrington in https://github.com/pydata/xarray/discussions/7310#discussioncomment-4257643

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7333/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1188523721 I_kwDOAMm_X85G127J 6431 Bug when padding coordinates with NaNs TomNicholas 35968931 open 0     2 2022-03-31T18:57:16Z 2023-03-30T13:33:10Z   MEMBER      

What happened?

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(9), dims='x')
da.pad({'x': (0, 1)}, 'constant', constant_values=np.NAN)
```

```
ValueError                                Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 da.pad({'x': 1}, 'constant', constant_values=np.NAN)

File ~/Documents/Work/Code/xarray/xarray/core/dataarray.py:4158, in DataArray.pad(self, pad_width, mode, stat_length, constant_values, end_values, reflect_type, **pad_width_kwargs)
   4000 def pad(
   4001     self,
   4002     pad_width: Mapping[Any, int | tuple[int, int]] | None = None,
   (...)
   4012     **pad_width_kwargs: Any,
   4013 ) -> DataArray:
   4014     """Pad this array along one or more dimensions.
   4015
   4016     .. warning::
   (...)
   4156     z        (x) float64 nan 100.0 200.0 nan
   4157     """
-> 4158     ds = self._to_temp_dataset().pad(
   4159         pad_width=pad_width,
   4160         mode=mode,
   4161         stat_length=stat_length,
   4162         constant_values=constant_values,
   4163         end_values=end_values,
   4164         reflect_type=reflect_type,
   4165         **pad_width_kwargs,
   4166     )
   4167     return self._from_temp_dataset(ds)

File ~/Documents/Work/Code/xarray/xarray/core/dataset.py:7368, in Dataset.pad(self, pad_width, mode, stat_length, constant_values, end_values, reflect_type, **pad_width_kwargs)
   7366     variables[name] = var
   7367 elif name in self.data_vars:
-> 7368     variables[name] = var.pad(
   7369         pad_width=var_pad_width,
   7370         mode=mode,
   7371         stat_length=stat_length,
   7372         constant_values=constant_values,
   7373         end_values=end_values,
   7374         reflect_type=reflect_type,
   7375     )
   7376 else:
   7377     variables[name] = var.pad(
   7378         pad_width=var_pad_width,
   7379         mode=coord_pad_mode,
   7380         **coord_pad_options,  # type: ignore[arg-type]
   7381     )

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:1360, in Variable.pad(self, pad_width, mode, stat_length, constant_values, end_values, reflect_type, **pad_width_kwargs)
   1357 if reflect_type is not None:
   1358     pad_option_kwargs["reflect_type"] = reflect_type  # type: ignore[assignment]
-> 1360 array = np.pad(  # type: ignore[call-overload]
   1361     self.data.astype(dtype, copy=False),
   1362     pad_width_by_index,
   1363     mode=mode,
   1364     **pad_option_kwargs,
   1365 )
   1367 return type(self)(self.dims, array)

File <__array_function__ internals>:5, in pad(*args, **kwargs)

File ~/miniconda3/envs/py39/lib/python3.9/site-packages/numpy/lib/arraypad.py:803, in pad(array, pad_width, mode, **kwargs)
    801 for axis, width_pair, value_pair in zip(axes, pad_width, values):
    802     roi = _view_roi(padded, original_area_slice, axis)
--> 803     _set_pad_area(roi, axis, width_pair, value_pair)
    805 elif mode == "empty":
    806     pass  # Do nothing as _pad_simple already returned the correct result

File ~/miniconda3/envs/py39/lib/python3.9/site-packages/numpy/lib/arraypad.py:147, in _set_pad_area(padded, axis, width_pair, value_pair)
    130 """
    131 Set empty-padded area in given dimension.
    (...)
    144 broadcastable to the shape of arr.
    145 """
    146 left_slice = _slice_at_axis(slice(None, width_pair[0]), axis)
--> 147 padded[left_slice] = value_pair[0]
    149 right_slice = _slice_at_axis(
    150     slice(padded.shape[axis] - width_pair[1], None), axis)
    151 padded[right_slice] = value_pair[1]

ValueError: cannot convert float NaN to integer
```

What did you expect to happen?

It should have successfully padded with a NaN, same as it does if you don't specify constant_values:

```python
In [14]: da.pad({'x': (0, 1)}, 'constant')
Out[14]:
<xarray.DataArray (x: 3)>
array([ 0.,  1., nan])
Dimensions without coordinates: x
```

Minimal Complete Verifiable Example

No response

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.11.0-7620-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.20.3.dev4+gdbc02d4e
pandas: 1.4.0
numpy: 1.21.4
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.10.3
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.01.1
distributed: 2022.01.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.6.0
pip: 21.3.1
conda: 4.11.0
pytest: 6.2.5
IPython: 8.2.0
sphinx: 4.4.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6431/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1615570467 PR_kwDOAMm_X85LlkLA 7595 Clarifications in contributors guide TomNicholas 35968931 closed 0     5 2023-03-08T16:35:45Z 2023-03-13T17:55:43Z 2023-03-13T17:51:24Z MEMBER   0 pydata/xarray/pulls/7595

Add suggestions @paigem made in #7439, as well as fix a few small formatting things and broken links.

I would like to merge this so that it can be helpful for the new contributors we will hopefully get through Outreachy.

  • [x] Closes #7439
  • [ ] ~~Tests added~~
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7595/reactions",
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1579829674 PR_kwDOAMm_X85JuG-F 7518 State which variables not present in drop vars error message TomNicholas 35968931 closed 0     0 2023-02-10T15:00:35Z 2023-03-09T20:47:47Z 2023-03-09T20:47:47Z MEMBER   0 pydata/xarray/pulls/7518

Makes the error message more informative

  • [ ] Closes #xxxx
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7518/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1588461863 I_kwDOAMm_X85ergEn 7539 Concat doesn't concatenate dimension coordinates along new dims TomNicholas 35968931 open 0     4 2023-02-16T22:32:33Z 2023-02-21T19:07:48Z   MEMBER      

What is your issue?

xr.concat doesn't concatenate dimension coordinates along new dimensions, which leads to pretty unintuitive behavior.

Take this example (motivated by https://github.com/pydata/xarray/discussions/7532#discussioncomment-4988792):

```python
import numpy as np
import xarray as xr

segments = []
for i in range(2):
    time = np.sort(np.random.random(4))
    da = xr.DataArray(
        np.random.randn(4, 2),
        dims=["time", "cols"],
        coords=dict(time=('time', time), cols=["col1", "col2"]),
    )
    segments.append(da)
```

```python
In [86]: segments
Out[86]:
[<xarray.DataArray (time: 4, cols: 2)>
 array([[-0.61199576, -0.9012078 ],
        [-0.54187577,  1.30509994],
        [-3.53720471,  0.97607797],
        [ 0.2593455 ,  0.95920031]])
 Coordinates:
   * time     (time) float64 0.1048 0.168 0.869 0.9432
   * cols     (cols) <U4 'col1' 'col2',
 <xarray.DataArray (time: 4, cols: 2)>
 array([[ 0.90266408, -0.54294821],
        [-1.09087103, -0.17484417],
        [-0.21679558, -0.57377412],
        [ 0.07570151,  0.27433728]])
 Coordinates:
   * time     (time) float64 0.03627 0.09754 0.2434 0.592
   * cols     (cols) <U4 'col1' 'col2']

In [85]: xr.concat(segments, dim='new')
Out[85]:
<xarray.DataArray (new: 2, time: 8, cols: 2)>
array([[[        nan,         nan],
        [        nan,         nan],
        [-0.61199576, -0.9012078 ],
        [-0.54187577,  1.30509994],
        [        nan,         nan],
        [        nan,         nan],
        [-3.53720471,  0.97607797],
        [ 0.2593455 ,  0.95920031]],

       [[ 0.90266408, -0.54294821],
        [-1.09087103, -0.17484417],
        [        nan,         nan],
        [        nan,         nan],
        [-0.21679558, -0.57377412],
        [ 0.07570151,  0.27433728],
        [        nan,         nan],
        [        nan,         nan]]])
Coordinates:
  * time     (time) float64 0.03627 0.09754 0.1048 0.168 ... 0.592 0.869 0.9432
  * cols     (cols) <U4 'col1' 'col2'
Dimensions without coordinates: new
```

I would have expected to get a result of size {new: 2, time: 4, cols: 2}. That would be intuitive, because the default is coords='different', and that would be the result of concatenating the time coordinates (which have different values) and just propagating the cols coordinate (as they have the same values).

Instead what happened is that xr.concat treats the dimension coordinates as indexes to align, and defaults to an outer join. This auto-alignment behaviour has been discussed at length before, I'm just trying to point out another place in which it's problematic.
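For this particular example, one way to opt out of the implicit outer join is join="override", which keeps the first segment's time index (valid here because the time coordinates have equal lengths):

```python
xr.concat(segments, dim="new", join="override")
# -> sizes {new: 2, time: 4, cols: 2}, using the time values from segments[0]
```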

This is briefly mentioned in the concat docstring under coords='all' ("all": All coordinate variables will be concatenated, except those corresponding to other dimensions), but it's not even mentioned under coords='different'.

I don't really know what I would prefer to happen with the coordinates. I guess to have created a time coordinate of size {new: 2, time: 4}, but then I don't know what that implies for the underlying index. @benbovy do you have any thoughts?

At the very least we should make this a lot clearer in the docs.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7539/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1586144997 PR_kwDOAMm_X85KDKDY 7534 Docs page on numpy to xarray TomNicholas 35968931 open 0     0 2023-02-15T16:16:53Z 2023-02-15T16:16:53Z   MEMBER   0 pydata/xarray/pulls/7534
  • [x] Closes #7533
  • [ ] ~~Tests added~~
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7534/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1585231355 I_kwDOAMm_X85efLX7 7533 Numpy to xarray docs TomNicholas 35968931 open 0     0 2023-02-15T05:13:50Z 2023-02-15T06:28:05Z   MEMBER      

We should make a docs page specifically to ease the transition from pure-numpy to xarray.

A lot of new xarray users come from already using numpy as their primary data structure. We relatively often get questions about "what's the xarray equivalent of X numpy function" but we don't have a dedicated place to collect those answers, or explain key conceptual differences.

I think this deserves its own dedicated docs page, with:
- [ ] High-level conceptual differences (e.g. transpose invariance)
- [ ] Arguments for the benefits of using xarray over pure numpy
- [ ] Table of numpy <-> xarray function equivalents (similar to the existing "How do I..." page)
- [ ] Other common recommendations for numpy users (e.g. use netCDF / Zarr instead of .npz or pickle to store data on disk)

For the table I thought of a few already, but I know there will be a lot more:

  • np.concatenate/np.vstack/np.hstack/np.stack → xr.concat
  • np.block → xr.combine_nested
  • np.apply_along_axis → xr.apply_ufunc
  • np.polynomial → xr.polyfit
  • np.reshape -> xr.coarsen().construct()
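To give a flavour of what one row of that table might expand into:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.zeros((2, 3)), dims=("x", "y"))
b = xr.DataArray(np.ones((2, 3)), dims=("x", "y"))

np.concatenate([a.values, b.values], axis=0)  # positional: you must know axis 0 is "x"
xr.concat([a, b], dim="x")                    # named: the dimension is explicit
```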
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7533/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1573538162 PR_kwDOAMm_X85JY_1l 7509 Update apply_ufunc output_sizes error message TomNicholas 35968931 closed 0     0 2023-02-07T01:35:08Z 2023-02-07T15:45:54Z 2023-02-07T05:01:36Z MEMBER   0 pydata/xarray/pulls/7509
  • [x] Closes poor error message reported in https://github.com/pydata/xarray/discussions/7503
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7509/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1366751031 PR_kwDOAMm_X84-n1xC 7011 Add sphinx-codeautolink extension to docs build TomNicholas 35968931 open 0     15 2022-09-08T17:43:47Z 2023-02-06T17:55:52Z   MEMBER   1 pydata/xarray/pulls/7011

I think that sphinx-codeautolink is different from sphinx.ext.linkcode...

  • [x] Closes #7010
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7011/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1565458372 PR_kwDOAMm_X85I-VC2 7497 Enable datatree * dataset commutativity TomNicholas 35968931 open 0     0 2023-02-01T05:24:53Z 2023-02-03T17:32:20Z   MEMBER   0 pydata/xarray/pulls/7497

Change binary operations involving DataTree objects and Dataset objects to be handled by the DataTree class. Necessary to enable ds * dt to return the same type as dt * ds.

Builds on top of #7418.

  • [x] Closes https://github.com/xarray-contrib/datatree/issues/146
  • [x] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7497/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1549861293 I_kwDOAMm_X85cYQGt 7459 Error when broadcast given int TomNicholas 35968931 open 0     0 2023-01-19T19:59:31Z 2023-01-19T21:11:12Z   MEMBER      

What happened?

Unhelpful error raised by xr.broadcast when supplied with an int.

What did you expect to happen?

The broadcast to succeed I think?

Minimal Complete Verifiable Example

```python
In [1]: import xarray as xr

In [2]: da = xr.DataArray([5, 4], dims='x')

In [3]: xr.broadcast(da, 1)
```

```
AttributeError                            Traceback (most recent call last)
Cell In[3], line 1
----> 1 xr.broadcast(da, 1)

File ~/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/core/alignment.py:1049, in broadcast(*args, exclude)
   1047 if exclude is None:
   1048     exclude = set()
-> 1049 args = align(*args, join="outer", copy=False, exclude=exclude)
   1051 dims_map, common_coords = _get_broadcast_dims_map_common_coords(args, exclude)
   1052 result = [_broadcast_helper(arg, exclude, dims_map, common_coords) for arg in args]

File ~/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/core/alignment.py:772, in align(*objects, join, copy, indexes, exclude, fill_value)
    576 """
    577 Given any number of Dataset and/or DataArray objects, returns new
    578 objects with aligned indexes and dimension sizes.
    (...)
    762
    763 """
    764 aligner = Aligner(
    765     objects,
    766     join=join,
    (...)
    770     fill_value=fill_value,
    771 )
--> 772 aligner.align()
    773 return aligner.results

File ~/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/core/alignment.py:556, in Aligner.align(self)
    553     self.results = (obj.copy(deep=self.copy),)
    554     return
--> 556 self.find_matching_indexes()
    557 self.find_matching_unindexed_dims()
    558 self.assert_no_index_conflict()

File ~/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/core/alignment.py:262, in Aligner.find_matching_indexes(self)
    259 objects_matching_indexes = []
    261 for obj in self.objects:
--> 262     obj_indexes, obj_index_vars = self._normalize_indexes(obj.xindexes)
    263     objects_matching_indexes.append(obj_indexes)
    264 for key, idx in obj_indexes.items():

AttributeError: 'int' object has no attribute 'xindexes'
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

This clearly has something to do with a change in the flexible indexes refactor, as it complains about .xindexes not being present. @benbovy
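For what it's worth, wrapping the scalar first works fine, which suggests coercing non-xarray arguments could be the fix:

```python
xr.broadcast(da, xr.DataArray(1))
# returns two DataArrays with dims ('x',): da unchanged, and the scalar
# repeated along x
```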

Environment

The main branch

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7459/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1536556849 I_kwDOAMm_X85blf8x 7447 Add Align to terminology page TomNicholas 35968931 open 0     0 2023-01-17T15:15:16Z 2023-01-17T15:15:16Z   MEMBER      

Is your feature request related to a problem?

The terminology docs page mostly contains explanation of available classes. It should also contain explanation of words we use to describe relationships between those classes.

For example the docstring on xr.align just says "Given any number of Dataset and/or DataArray objects, returns new objects with aligned indexes and dimension sizes.", but there is no link given to a definition of what we mean by "aligned".
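A two-line example of what "aligned" means would go a long way, something like:

```python
import xarray as xr

a = xr.DataArray([1, 2, 3], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray([10, 20, 30], dims="x", coords={"x": [1, 2, 3]})

a2, b2 = xr.align(a, b)  # default join="inner": both results now share x = [1, 2]
```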

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7447/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1365266461 PR_kwDOAMm_X84-it_s 7006 Fix decorators in ipython code blocks in docs TomNicholas 35968931 open 0     0 2022-09-07T22:38:07Z 2023-01-15T18:11:17Z   MEMBER   0 pydata/xarray/pulls/7006

There was a bug in ipython's sphinx extension causing decorators to be skipped when evaluating code blocks. I assume that's why there is this weird workaround in the docs page on defining accessors (which uses decorators).

I fixed that bug, and the fix is in the most recent release of ipython, so this PR bumps our ipython version for the docs, and removes the workaround.

  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7006/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1512290017 I_kwDOAMm_X85aI7bh 7403 Zarr error when trying to overwrite part of existing store TomNicholas 35968931 open 0     3 2022-12-28T00:40:16Z 2023-01-11T21:26:10Z   MEMBER      

What happened?

to_zarr threw an error when I tried to overwrite part of an existing zarr store.

What did you expect to happen?

With mode w I was expecting it to overwrite part of the store with no complaints.

I expected that because that's what the docstring of to_zarr says:

mode ({"w", "w-", "a", "r+", None}, optional) – Persistence mode: “w” means create (overwrite if exists); “w-” means create (fail if exists); “a” means override existing variables (create if does not exist);

The default mode is "w", so I was expecting it to overwrite.

Minimal Complete Verifiable Example

```python
import xarray as xr
import numpy as np
np.random.seed(0)

ds = xr.Dataset()
ds["data"] = (['x', 'y'], np.random.random((100, 100)))
ds.to_zarr("test.zarr")
print(ds["data"].mean().compute())
# returns array(0.49645889) as expected

ds = xr.open_dataset("test.zarr", engine='zarr', chunks={})
ds["data"].mean().compute()
print(ds["data"].mean().compute())
# still returns array(0.49645889) as expected

ds.to_zarr("test.zarr", mode="a")
```

```
<xarray.DataArray 'data' ()>
array(0.49645889)
<xarray.DataArray 'data' ()>
array(0.49645889)
Traceback (most recent call last):
  File "/home/tom/Documents/Work/Code/experimentation/bugs/datatree_nans/mwe_xarray.py", line 16, in <module>
    ds.to_zarr("test.zarr")
  File "/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/core/dataset.py", line 2091, in to_zarr
    return to_zarr(  # type: ignore
  File "/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/backends/api.py", line 1628, in to_zarr
    zstore = backends.ZarrStore.open_group(
  File "/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/backends/zarr.py", line 420, in open_group
    zarr_group = zarr.open_group(store, **open_kwargs)
  File "/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/zarr/hierarchy.py", line 1389, in open_group
    raise ContainsGroupError(path)
zarr.errors.ContainsGroupError: path '' contains a group
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I would like to know what the intended result is supposed to be here, so that I can make sure datatree behaves the same way, see https://github.com/xarray-contrib/datatree/issues/168.

Environment

Main branch of xarray, zarr v2.13.3

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7403/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);