id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 2194953062,PR_kwDOAMm_X85qFqp1,8854,array api-related upstream-dev failures,14808389,open,0,,,15,2024-03-19T13:17:09Z,2024-05-03T22:46:41Z,,MEMBER,,0,pydata/xarray/pulls/8854,"- [x] towards #8844 This ""fixes"" the upstream-dev failures related to the removal of `numpy.array_api`. There are a couple of open questions, though: - `array-api-strict` is not installed by default, so `namedarray` would get a new dependency. Not sure how to deal with that – as far as I can tell, `numpy.array_api` was not supposed to be used that way, so maybe we need to use `array-api-compat` instead? What do you think, @andersy005, @Illviljan? - `array-api-strict` does not define `Array.nbytes` (causing a funny exception that wrongly claims `DataArray` does not define `nbytes`) - `array-api-strict` has a different `DType` class, which makes it tricky to work with both `numpy` dtypes and said dtype class in the same code. In particular, if I understand correctly we're supposed to check dtypes using `isdtype`, but `numpy.isdtype` will only exist in `numpy>=2`, `array-api-strict`'s version does not define datetime / string / object dtypes, and `numpy.issubdtype` does not work with the non-`numpy` dtype class). So maybe we need to use `array-api-compat` internally?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8854/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 2269295936,PR_kwDOAMm_X85uBwtv,8983,fixes for the `pint` tests,14808389,open,0,,,0,2024-04-29T15:09:28Z,2024-05-03T18:30:06Z,,MEMBER,,0,pydata/xarray/pulls/8983,"This removes the use of the deprecated `numpy.core._exceptions.UFuncError` (and multiplication as a way to attach units), and makes sure we run the `pint` tests in the upstream-dev CI again.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8983/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 2234142680,PR_kwDOAMm_X85sK0g8,8923,"`""source""` encoding for datasets opened from `fsspec` objects",14808389,open,0,,,5,2024-04-09T19:12:45Z,2024-04-23T16:54:09Z,,MEMBER,,0,pydata/xarray/pulls/8923,"When opening files from path-like objects (`str`, `pathlib.Path`), the backend machinery (`_dataset_from_backend_dataset`) sets the `""source""` encoding. This is useful if we need the original path for additional processing, like writing to a similarly named file, or to extract additional metadata. This would be useful as well when using `fsspec` to open remote files. In this PR, I'm extracting the `path` attribute that most `fsspec` objects have to set that value. I've considered using `isinstance` checks instead of the `getattr`-with-default, but the list of potential classes is too big to be practical (at least 4 classes just within `fsspec` itself). If this sounds like a good idea, I'll update the documentation of the `""source""` encoding to mention this feature. 
- [x] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8923/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 2241492018,PR_kwDOAMm_X85skF_A,8937,drop support for `python=3.9`,14808389,open,0,,,3,2024-04-13T10:18:04Z,2024-04-15T15:07:39Z,,MEMBER,,0,pydata/xarray/pulls/8937,"According to our policy (and NEP-29) we can drop support for `python=3.9` since about a week ago. Interestingly, SPEC0 says we could have started doing this about half a year ago (Q4 2023). We could delay this until we have a release that is compatible with `numpy>=2.0`, though (`numpy>=2.1` will drop support for `python=3.9`). - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8937/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 2079089277,I_kwDOAMm_X8577GJ9,8607,allow computing just a small number of variables,14808389,open,0,,,4,2024-01-12T15:21:27Z,2024-01-12T20:20:29Z,,MEMBER,,,,"### Is your feature request related to a problem? I frequently find myself computing a handful of variables of a dataset (typically coordinates) and assigning them back to the dataset, and wishing we had a method / function that allowed that. ### Describe the solution you'd like I'd imagine something like ```python ds.compute(variables=variable_names) ``` but I'm undecided on whether that's a good idea (it might make `.compute` more complex?) ### Describe alternatives you've considered So far I've been using something like ```python ds.assign_coords({k: lambda ds: ds[k].compute() for k in variable_names}) ds.pipe(lambda ds: ds.merge(ds[variable_names].compute())) ``` but both are not easy to type / understand (though having `.merge` take a callable would make this much easier). Also, the first option computes variables separately, which may not be ideal? ### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8607/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1655290694,I_kwDOAMm_X85iqbtG,7721,`as_shared_dtype` converts scalars to 0d `numpy` arrays if chunked `cupy` is involved,14808389,open,0,,,7,2023-04-05T09:48:34Z,2023-12-04T10:45:43Z,,MEMBER,,,,"I tried to run `where` with chunked `cupy` arrays: ```python In [1]: import xarray as xr ...: import cupy ...: import dask.array as da ...: ...: arr = xr.DataArray(cupy.arange(4), dims=""x"") ...: mask = xr.DataArray(cupy.array([False, True, True, False]), dims=""x"") ``` this works: ```python In [2]: arr.where(mask) Out[2]: array([nan, 1., 2., nan]) Dimensions without coordinates: x ``` this fails: ```python In [4]: arr.chunk().where(mask).compute() --------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[4], line 1 ----> 1 arr.chunk().where(mask).compute() File ~/repos/xarray/xarray/core/dataarray.py:1095, in DataArray.compute(self, **kwargs) 1076 """"""Manually trigger loading of this array's data from disk or a 1077 remote source into memory and return a new array. The original is 1078 left unaltered. (...) 
1092 dask.compute 1093 """""" 1094 new = self.copy(deep=False) -> 1095 return new.load(**kwargs) File ~/repos/xarray/xarray/core/dataarray.py:1069, in DataArray.load(self, **kwargs) 1051 def load(self: T_DataArray, **kwargs) -> T_DataArray: 1052 """"""Manually trigger loading of this array's data from disk or a 1053 remote source into memory and return this array. 1054 (...) 1067 dask.compute 1068 """""" -> 1069 ds = self._to_temp_dataset().load(**kwargs) 1070 new = self._from_temp_dataset(ds) 1071 self._variable = new._variable File ~/repos/xarray/xarray/core/dataset.py:752, in Dataset.load(self, **kwargs) 749 import dask.array as da 751 # evaluate all the dask arrays simultaneously --> 752 evaluated_data = da.compute(*lazy_data.values(), **kwargs) 754 for k, data in zip(lazy_data, evaluated_data): 755 self.variables[k].data = data File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/base.py:600, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs) 597 keys.append(x.__dask_keys__()) 598 postcomputes.append(x.__dask_postcompute__()) --> 600 results = schedule(dsk, keys, **kwargs) 601 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)]) File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/threaded.py:89, in get(dsk, keys, cache, num_workers, pool, **kwargs) 86 elif isinstance(pool, multiprocessing.pool.Pool): 87 pool = MultiprocessingPoolExecutor(pool) ---> 89 results = get_async( 90 pool.submit, 91 pool._max_workers, 92 dsk, 93 keys, 94 cache=cache, 95 get_id=_thread_get_id, 96 pack_exception=pack_exception, 97 **kwargs, 98 ) 100 # Cleanup pools associated to dead threads 101 with pools_lock: File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/local.py:511, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs) 509 _execute_task(task, data) # Re-execute locally 510 else: --> 511 raise_exception(exc, tb) 512 res, worker_id = loads(res_info) 513 state[""cache""][key] = res File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/local.py:319, in reraise(exc, tb) 317 if exc.__traceback__ is not tb: 318 raise exc.with_traceback(tb) --> 319 raise exc File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/local.py:224, in execute_task(key, task_info, dumps, loads, get_id, pack_exception) 222 try: 223 task, data = loads(task_info) --> 224 result = _execute_task(task, data) 225 id = get_id() 226 result = dumps((result, id)) File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/core.py:119, in _execute_task(arg, cache, dsk) 115 func, args = arg[0], arg[1:] 116 # Note: Don't assign the subtask results to a variable. numpy detects 117 # temporaries by their reference count and can execute certain 118 # operations in-place. 
--> 119 return func(*(_execute_task(a, cache) for a in args)) 120 elif not ishashable(arg): 121 return arg File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/optimization.py:990, in SubgraphCallable.__call__(self, *args) 988 if not len(args) == len(self.inkeys): 989 raise ValueError(""Expected %d args, got %d"" % (len(self.inkeys), len(args))) --> 990 return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args))) File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/core.py:149, in get(dsk, out, cache) 147 for key in toposort(dsk): 148 task = dsk[key] --> 149 result = _execute_task(task, cache) 150 cache[key] = result 151 result = _execute_task(out, cache) File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/core.py:119, in _execute_task(arg, cache, dsk) 115 func, args = arg[0], arg[1:] 116 # Note: Don't assign the subtask results to a variable. numpy detects 117 # temporaries by their reference count and can execute certain 118 # operations in-place. --> 119 return func(*(_execute_task(a, cache) for a in args)) 120 elif not ishashable(arg): 121 return arg File <__array_function__ internals>:180, in where(*args, **kwargs) File cupy/_core/core.pyx:1723, in cupy._core.core._ndarray_base.__array_function__() File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/cupy/_sorting/search.py:211, in where(condition, x, y) 209 if fusion._is_fusing(): 210 return fusion._call_ufunc(_where_ufunc, condition, x, y) --> 211 return _where_ufunc(condition.astype('?'), x, y) File cupy/_core/_kernel.pyx:1287, in cupy._core._kernel.ufunc.__call__() File cupy/_core/_kernel.pyx:160, in cupy._core._kernel._preprocess_args() File cupy/_core/_kernel.pyx:146, in cupy._core._kernel._preprocess_arg() TypeError: Unsupported type ``` this works again: ```python In [7]: arr.chunk().where(mask.chunk(), cupy.array(cupy.nan)).compute() Out[7]: array([nan, 1., 2., nan]) Dimensions without coordinates: x ``` And other methods like `fillna` show similar behavior. I think the reason is that this: https://github.com/pydata/xarray/blob/d4db16699f30ad1dc3e6861601247abf4ac96567/xarray/core/duck_array_ops.py#L195 is not sufficient to detect `cupy` beneath other layers of duckarrays (most commonly `dask`, `pint`, or both). In this specific case we could extend the condition to also match chunked `cupy` arrays (like `arr.cupy.is_cupy` does, but using `is_duck_dask_array`), but this will still break for other duckarray layers or if `dask` is not involved, and we're also in the process of moving away from special-casing `dask`. So short of asking `cupy` to treat 0d arrays like scalars I'm not sure how to fix this. cc @jacobtomlinson","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7721/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1158378382,I_kwDOAMm_X85FC3OO,6323,propagation of `encoding`,14808389,open,0,,,8,2022-03-03T12:57:29Z,2023-10-25T23:20:31Z,,MEMBER,,,,"### What is your issue? We frequently get bug reports related to `encoding` that can usually be fixed by clearing it or by overriding it using the `encoding` parameter of the `to_*` methods, e.g. 
- #4224 - #4380 - #4655 - #5427 - #5490 - fsspec/kerchunk#130 There are also a few discussions with more background: - https://github.com/pydata/xarray/pull/5065#issuecomment-806154872 - https://github.com/pydata/xarray/issues/1614 - #5082 - #5336 We discussed this in the meeting yesterday and as far as I can remember agreed that the current default behavior is not ideal and decided to investigate #5336: a `keep_encoding` option, similar to `keep_attrs`, that would be `True` (propagate `encoding`) by default but will be changed to `False` (drop `encoding` on any operation) in the future. cc @rabernat, @shoyer","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6323/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 683142059,MDU6SXNzdWU2ODMxNDIwNTk=,4361,restructure the contributing guide,14808389,open,0,,,5,2020-08-20T22:51:39Z,2023-03-31T17:39:00Z,,MEMBER,,,,"From #4355 @max-sixty: > Stepping back on the contributing doc — I admit I haven't look at it in a while — I wonder whether we can slim it down a bit, for example by linking to other docs for generic tooling — I imagine we're unlikely to have the best docs on working with GH, for example. Or referencing our PR template rather than the (now out-of-date) PR checklist. We could also add a docstring guide since the `numpydoc` guide does not cover every little detail (for example, `default` notation, type spec vs. type hint, space before the colon separating parameter names from types, no colon for parameters without types, etc.)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4361/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1306795760,I_kwDOAMm_X85N5B7w,6793,improve docstrings with examples and links,14808389,open,0,,,10,2022-07-16T12:30:33Z,2023-03-24T16:33:28Z,,MEMBER,,,,"This is an (incomplete) checklist for #5816 to make it easier to find methods that are in need of examples and links to the narrative docs with further information (of course, changes to the docstrings of all other methods / functions part of the public API are also appreciated). Good examples explicitly construct small xarray objects to make it easier to follow (e.g. use `np.{ones,full,zeros}` or the `np.array` constructor instead of `np.random` / loading from files) and show both input and output of the function. Use ```sh pytest --doctest-modules xarray --ignore xarray/tests/ ``` to verify the examples, or push to a PR to have the CI do it for you (note that you will have much quicker feedback locally though). To easily generate the expected output install `pytest-accept` ([docs]()) in your dev environment and then run ``` pytest --doctest-modules FILE_NAME --accept || true ``` To link to other documentation pages we can use ```python :doc:`project:label` Description of the linked page ``` where we can leave out `project` if we link to somewhere within xarray's documentation. 
To figure out the label, we can either look at the source, search the output of `python -m sphinx.ext.intersphinx https://docs.xarray.dev/en/latest/objects.inv`, or use `sphobjinv` (install from PyPI): ```sh sphobjinv search -su https://docs.xarray.dev/en/latest/ missing ``` Top-level functions: - [ ] `get_options` - [ ] `decode_cf` - [ ] `polyval` - [ ] `unify_chunks` - [ ] `infer_freq` - [ ] `date_range` I/O: - [ ] `load_dataarray` - [ ] `load_dataset` - [ ] `open_dataarray` - [ ] `open_dataset` - [ ] `open_mfdataset` Contents: - [ ] `DataArray.assign_attrs`, `Dataset.assign_attrs` - [ ] `DataArray.expand_dims`, `Dataset.expand_dims` - [ ] `DataArray.drop_duplicates`, `Dataset.drop_duplicates` - [ ] `DataArray.drop_vars`, `Dataset.drop_vars` - [ ] `Dataset.drop_dims` - [ ] `DataArray.convert_calendar`, `Dataset.convert_calendar` - [ ] `DataArray.set_coords`, `Dataset.set_coords` - [ ] `DataArray.reset_coords`, `Dataset.reset_coords` Comparisons: - [ ] `DataArray.equals`, `Dataset.equals` - [ ] `DataArray.identical`, `Dataset.identical` - [ ] `DataArray.broadcast_equals`, `Dataset.broadcast_equals` Dask: - [ ] `DataArray.compute`, `Dataset.compute` - [ ] `DataArray.chunk`, `Dataset.chunk` - [ ] `DataArray.persist`, `Dataset.persist` Missing values: - [ ] `DataArray.bfill`, `Dataset.bfill` - [ ] `DataArray.ffill`, `Dataset.ffill` - [ ] `DataArray.fillna`, `Dataset.fillna` - [ ] `DataArray.dropna`, `Dataset.dropna` Indexing: - [ ] `DataArray.loc` (no docstring at all - came up in https://github.com/pydata/xarray/discussions/7528#discussion-4858556) - [ ] `DataArray.drop_isel` - [ ] `DataArray.drop_sel` - [ ] `DataArray.head`, `Dataset.head` - [ ] `DataArray.tail`, `Dataset.tail` - [ ] `DataArray.interp_like`, `Dataset.interp_like` - [ ] `DataArray.reindex_like`, `Dataset.reindex_like` - [ ] `Dataset.isel` Aggregations: - [ ] `Dataset.argmax` - [ ] `Dataset.argmin` - [ ] `DataArray.cumsum`, `Dataset.cumsum` (intermediate to advanced) - [ ] `DataArray.cumprod`, `Dataset.cumprod` (intermediate to advanced) - [ ] `DataArray.reduce`, `Dataset.reduce`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6793/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 818059250,MDExOlB1bGxSZXF1ZXN0NTgxNDIzNTIx,4972,Automatic duck array testing - reductions,14808389,open,0,,,23,2021-02-27T23:57:23Z,2022-08-16T13:47:05Z,,MEMBER,,1,pydata/xarray/pulls/4972,"This is the first of a series of PRs to add a framework to make testing the integration of duck arrays as simple as possible. It uses `hypothesis` for increased coverage and maintainability. - [x] Tests added - [x] Passes `pre-commit run --all-files` - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4972/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 532696790,MDU6SXNzdWU1MzI2OTY3OTA=,3594,support for units with pint,14808389,open,0,,,7,2019-12-04T13:49:28Z,2022-08-03T11:44:05Z,,MEMBER,,,,"`pint`'s implementation of NEP-18 (see hgrecco/pint#905) is close enough so we can finally start working on the `pint` support (i.e. make the integration tests pass). 
This would be the list of tasks to get there: * integration tests: - [x] implement integration tests for `DataArray`, `Dataset` and top-level functions (#3238, #3447, #3493) - [x] add tests for `Variable` as discussed in #3493 (#3654) - [x] clean up the current tests (#3600) - [x] use the standard `assert_identical` and `assert_allclose` functions (#3611, #3643, #3654, #3706, #3975) - [x] clean up the `TestVariable.test_pad` tests * actually get xarray to support units: - [x] top-level functions (#3611) - [x] `Variable` (#3706) + `rolling_window` and `identical` need larger modifications - [x] `DataArray` (#3643) - [x] `Dataset` - [x] silence all the `UnitStrippedWarnings` in the testsuite (#4163) - [ ] try to get `nanprod` to work with quantities - [x] add support for per variable fill values (#4165) - [x] `repr` with units (#2773) - [ ] type hierarchy (e.g. for `np.maximum(data_array, quantity)` vs `np.maximum(quantity, data_array)`) (#3950) * update the documentation - [x] point to [pint-xarray](https://github.com/xarray-contrib/pint-xarray) (see #4530) - [x] mention the requirement for `UnitRegistry(force_ndarray=True)` or `UnitRegistry(force_ndarray_like=True)` (see https://pint-xarray.readthedocs.io/en/stable/creation.html#attaching-units) - [x] list the known issues (see https://github.com/pydata/xarray/pull/3643#issue-354872657 and https://github.com/pydata/xarray/pull/3643#issuecomment-602225731) (#4530): + `pandas` (indexing) + `bottleneck` (`bfill`, `ffill`) + `scipy` (`interp`) + `numbagg` (`rolling_exp`) + `numpy.lib.stride_tricks.as_strided`: `rolling` + `numpy.vectorize`: `interpolate_na` - [x] ~update the install instructions (we can use standard `conda` / `pip` now)~ this should be done by `pint-xarray`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3594/reactions"", ""total_count"": 14, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 14, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 597566530,MDExOlB1bGxSZXF1ZXN0NDAxNjU2MTc1,3960,examples for special methods on accessors,14808389,open,0,,,6,2020-04-09T21:34:30Z,2022-06-09T14:50:17Z,,MEMBER,,0,pydata/xarray/pulls/3960,"This starts adding the parametrized accessor examples from #3829 to the accessor documentation as suggested by @jhamman. Since then the `weighted` methods have been added, though, so I'd like to use a different example instead (ideas welcome). Also, this feature can be abused to add functions to the main `DataArray` / `Dataset` namespace (by registering a function with the `register_*_accessor` decorators, see the second example). Is this something we want to explicitly discourage? (~When trying to build the docs locally, sphinx keeps complaining about a code block without code. Not sure what that is about~ seems the `ipython` directive does not allow more than one expression, so I used `code` instead) - [x] Closes #3829 - [x] Passes `isort -rc . && black . && mypy . 
&& flake8` - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3960/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 801728730,MDExOlB1bGxSZXF1ZXN0NTY3OTkzOTI3,4863,apply to dataset,14808389,open,0,,,14,2021-02-05T00:05:22Z,2022-06-09T14:50:17Z,,MEMBER,,0,pydata/xarray/pulls/4863,"as discussed in #4837, this adds a method that applies a function to a `DataArray` by first converting it to a temporary dataset using `_to_temp_dataset`, applying the function, and converting it back. I'm not really happy with the name but I can't find a better one. This function is really similar to `pipe`, so I guess a keyword argument to pipe would work, too. The disadvantage of that is that `pipe` passes all kwargs to the passed function, which means we would shadow a specific kwarg. - [x] Closes #4837 - [x] Tests added - [x] Passes `pre-commit run --all-files` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [x] New functions/methods are listed in `api.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4863/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 959063390,MDExOlB1bGxSZXF1ZXN0NzAyMjM0ODc1,5668,create the context objects passed to custom `combine_attrs` functions,14808389,open,0,,,1,2021-08-03T12:24:50Z,2022-06-09T14:50:16Z,,MEMBER,,0,pydata/xarray/pulls/5668,"Follow-up to #4896: this creates the context object in reduce methods and passes it to `merge_attrs`, with more planned. - [ ] might help with xarray-contrib/cf-xarray#228 - [ ] Tests added - [x] Passes `pre-commit run --all-files` - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` Note that for now this is a bit inconvenient to use for provenance tracking (as discussed in the `cf-xarray` issue) because functions implementing that would still have to deal with merging the `attrs`. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5668/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 1265366275,I_kwDOAMm_X85La_UD,6678,exception groups,14808389,open,0,,,1,2022-06-08T22:09:37Z,2022-06-08T23:38:28Z,,MEMBER,,,,"### What is your issue? As I mentioned in the meeting today, we have a lot of features where the exception group support from [PEP654](https://peps.python.org/pep-0654/) (which is scheduled for python 3.11 and consists of the class and a syntax change) might be useful. For example, we might want to collect all errors raised by `rename` in an exception group instead of raising them one-by-one. 
For `python<=3.10` there's a [backport](https://github.com/agronholm/exceptiongroup) that contains the class and a workaround for the new syntax.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6678/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 624778130,MDU6SXNzdWU2MjQ3NzgxMzA=,4095,merging non-dimension coordinates with the Dataset constructor,14808389,open,0,,,1,2020-05-26T10:30:37Z,2022-04-19T13:54:43Z,,MEMBER,,,,"When adding two `DataArray` objects with different coordinates to a `Dataset`, a `MergeError` is raised even though one of the conflicting coords is a subset of the other. Merging dimension coordinates works so I'd expect associated non-dimension coordinates to work, too. This fails: ```python In [1]: import xarray as xr ...: import numpy as np In [2]: a = np.linspace(0, 1, 10) ...: b = np.linspace(-1, 0, 12) ...: ...: x_a = np.arange(10) ...: x_b = np.arange(12) ...: ...: y_a = x_a * 1000 ...: y_b = x_b * 1000 ...: ...: arr1 = xr.DataArray(data=a, coords={""x"": x_a, ""y"": (""x"", y_a)}, dims=""x"") ...: arr2 = xr.DataArray(data=b, coords={""x"": x_b, ""y"": (""x"", y_b)}, dims=""x"") ...: ...: xr.Dataset({""a"": arr1, ""b"": arr2}) ... MergeError: conflicting values for variable 'y' on objects to be combined. You can skip this check by specifying compat='override'. ``` While this works: ```python In [3]: a = np.linspace(0, 1, 10) ...: b = np.linspace(-1, 0, 12) ...: ...: x_a = np.arange(10) ...: x_b = np.arange(12) ...: ...: y_a = x_a * 1000 ...: y_b = x_b * 1000 ...: ...: xr.Dataset({ ...: ""a"": xr.DataArray(data=a, coords={""x"": x_a}, dims=""x""), ...: ""b"": xr.DataArray(data=b, coords={""x"": x_b}, dims=""x""), ...: }) Out[3]: Dimensions: (x: 12) Coordinates: * x (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 Data variables: a (x) float64 0.0 0.1111 0.2222 0.3333 0.4444 ... 0.8889 1.0 nan nan b (x) float64 -1.0 -0.9091 -0.8182 -0.7273 ... -0.1818 -0.09091 0.0 ``` I can work around this by calling: ```python In [4]: xr.merge([arr1.rename(""a"").to_dataset(), arr2.rename(""b"").to_dataset()]) Out[4]: Dimensions: (x: 12) Coordinates: * x (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 y (x) float64 0.0 1e+03 2e+03 3e+03 ... 8e+03 9e+03 1e+04 1.1e+04 Data variables: a (x) float64 0.0 0.1111 0.2222 0.3333 0.4444 ... 0.8889 1.0 nan nan b (x) float64 -1.0 -0.9091 -0.8182 -0.7273 ... -0.1818 -0.09091 0.0 ``` but I think the `Dataset` constructor should be capable of that, too.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4095/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 539181896,MDU6SXNzdWU1MzkxODE4OTY=,3638,load_store and dump_to_store,14808389,open,0,,,1,2019-12-17T16:37:53Z,2021-11-08T21:11:26Z,,MEMBER,,,,"Continuing from #3602, `load_store` and `dump_to_store` look like they are old and unmaintained functions: * `load_store` is referenced once in `api.rst` (I assume the reference to `from_store` was to `load_store`), but never tested, used or mentioned anywhere else * `dump_to_store` is tested (and probably used), but never mentioned except from the section on backends what should we do with these? 
Are they obsolete and should be removed, or just unmaintained (in which case we should properly document and test them)?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3638/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 789106802,MDU6SXNzdWU3ODkxMDY4MDI=,4825,clean up the API for renaming and changing dimensions / coordinates,14808389,open,0,,,5,2021-01-19T15:11:55Z,2021-09-10T15:04:14Z,,MEMBER,,,,"From #4108: I wonder if it would be better to first ""reorganize"" all of the existing functions: we currently have `rename` (and `Dataset.rename_dims` / `Dataset.rename_vars`), `set_coords`, `reset_coords`, `set_index`, `reset_index` and `swap_dims`, which overlap partially. For example, the code sample from #4417 works if instead of ```python ds = ds.rename(b='x') ds = ds.set_coords('x') ``` we use ```python ds = ds.set_index(x=""b"") ``` and something similar for the code sample in #4107. I believe we currently have these use cases (not sure if that list is complete, though): - rename a `DataArray` → `rename` - rename an existing variable to a name that is not yet in the object → `rename` / `Dataset.rename_vars` / `Dataset.rename_dims` - convert a data variable to a coordinate (not a dimension coordinate) → `set_coords` - convert a coordinate (not a dimension coordinate) to a data variable → `reset_coords` - swap an existing dimension coordinate with a coordinate (which may not exist) and rename the dimension → `swap_dims` - use an existing coordinate / data variable as a dimension coordinate (do not rename the dimension) → `set_index` - stop using a coordinate as a dimension coordinate and append `_` to its name (do not rename the dimension) → `reset_index` - use two existing coordinates / data variables as a MultiIndex → `set_index` - stop using a MultiIndex as a dimension coordinate and use its levels as coordinates → `reset_index` Sometimes, some of these can be emulated by combinations of others, for example: ```python # x is a dimension without coordinates assert_identical(ds.set_index({""x"": ""b""}), ds.swap_dims({""x"": ""b""}).rename({""b"": ""x""})) assert_identical(ds.swap_dims({""x"": ""b""}), ds.set_index({""x"": ""b""}).rename({""x"": ""b""})) ``` and, with this PR: ```python assert_identical(ds.set_index({""x"": ""b""}), ds.set_coords(""b"").rename({""b"": ""x""})) assert_identical(ds.swap_dims({""x"": ""b""}), ds.rename({""b"": ""x""})) ``` which means that it would increase the overlap of `rename`, `set_index`, and `swap_dims`. In any case I think we should add a guide which explains which method to pick in which situation (or extend `howdoi`). _Originally posted by @keewis in https://github.com/pydata/xarray/issues/4108#issuecomment-761907785_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4825/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 935531700,MDU6SXNzdWU5MzU1MzE3MDA=,5562,"hooks to ""prepare"" xarray objects for plotting",14808389,open,0,,,6,2021-07-02T08:14:02Z,2021-07-04T08:46:34Z,,MEMBER,,,,"From https://github.com/xarray-contrib/pint-xarray/pull/61#discussion_r662485351 `matplotlib` has a module called `matplotlib.units` which manages a mapping of types to hooks. This is then used to convert custom types to something `matplotlib` can work with, and to optionally add axis labels. 
For example, with `pint`: ```python In [9]: ureg = pint.UnitRegistry() ...: ureg.setup_matplotlib() ...: ...: t = ureg.Quantity(np.arange(10), ""s"") ...: v = ureg.Quantity(5, ""m / s"") ...: x = v * t ...: ...: fig, ax = plt.subplots(1, 1) ...: ax.plot(t, x) ...: ...: plt.show() ``` this will plot the data without `UnitStrippedWarning`s and even attach the units as labels to the axis (the format is hard-coded in `pint` right now). While this is pretty neat there are some issues: - `xarray`'s plotting code converts to `masked_array`, dropping metadata on the duck array (which means `matplotlib` won't see the duck arrays) - we will end up overwriting the axis labels once the variable names are added (not sure if there's a way to specify a label format?) - it is `matplotlib` specific, which means we have to reimplement once we go through with the plotting entrypoints discussed in #3553 and #3640 All of this makes me wonder: should we try to maintain our own mapping of hooks which ""prepare"" the object based on the data's type? My initial idea would be that the hook function receives a `Dataset` or `DataArray` object and modifies it to convert the data to `numpy` arrays and optionally modifies the `attrs`. For example for `pint` the hook would return the result of `.pint.dequantify()` but it could also be used to explicitly call `.get` on `cupy` arrays or `.todense` on `sparse` arrays. xref #5561","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5562/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 589850951,MDU6SXNzdWU1ODk4NTA5NTE=,3917,running numpy functions on xarray objects,14808389,open,0,,,1,2020-03-29T18:17:29Z,2021-07-04T02:00:22Z,,MEMBER,,,,"In the `pint` integration tests I tried to also test calling numpy functions on xarray objects (we provide methods for all of them). Some of these functions, like `numpy.median`, `numpy.searchsorted` and `numpy.clip`, depend on `__array_function__` (i.e. not `__array_ufunc__`) to dispatch. However, neither `Dataset` nor `DataArray` (nor `Variable`, I think?) define these protocols (see #3643). Should we define `__array_function__` on xarray objects?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3917/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 674445594,MDU6SXNzdWU2NzQ0NDU1OTQ=,4321,push inline formatting functions upstream,14808389,open,0,,,0,2020-08-06T16:35:04Z,2021-04-19T03:20:11Z,,MEMBER,,,,"#4248 added a `_repr_inline_` [method](https://xarray.pydata.org/en/latest/internals.html#duck-arrays) duck arrays can use to customize their collapsed variable `repr`. We currently also have `inline_dask_repr` and `inline_sparse_repr` which remove redundant information like `dtype` and `shape` from `dask` and `sparse` arrays. 
In order to reduce the complexity of `inline_variable_array_repr`, we could try to push these functions upstream.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4321/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 675342733,MDU6SXNzdWU2NzUzNDI3MzM=,4324,constructing nested inline reprs,14808389,open,0,,,9,2020-08-07T23:25:31Z,2021-04-19T03:20:01Z,,MEMBER,,,,"While implementing the new `_repr_inline_` in xarray-contrib/pint-xarray#22, I realized that I designed that method with a single level of nesting in mind, e.g. `xarray(pint(x))` or `xarray(dask(x))`. From that PR: @keewis > thinking about this some more, this doesn't work for anything other than `numpy.ndarray` objects. For now I guess we could use the magnitude's `_repr_inline_` (falling back to `__repr__` if that doesn't exist) and only use `format_array_flat` if the magnitude is a `ndarray`. > > However, as we nest deeper (e.g. `xarray(pint(uncertainties(dask(sparse(cupy)))))` – for argument's sake, let's assume that this actually makes sense) this might break or become really complicated. Does anyone have any ideas how to deal with that? > > If I'm simply missing something we have that discussion here, otherwise I guess we should open a issue on `xarray`'s issue tracker. @jthielen > Yes, I agree that `format_array_flat` should probably just apply to magnitude being an `ndarray`. > > I think a cascading series of `_repr_inline_` should work for nested arrays, so long as > > * the metadata of the higher nested objects is considered the priority (if not, then we're back to a fully managed solution to the likes of [dask/dask#5329](https://github.com/dask/dask/issues/5329)) > > * small max lengths are handled gracefully (i.e., a minimum where it is just like `Dask.Array(...)`, then `...`, then nothing) > > * we're okay with the lowest arrays in large nesting chains not having any information show up in the inline repr (situation where there is not enough characters to even describe the full nesting has to be accounted for somehow) > > * it can be adopted without too much complaint across the ecosystem > > > Assuming all this, then each layer of the nesting will reduce the max length of the inline repr string available to the layers below it, until a layer reaches a reasonable minimum where it ""gives up"". At least that's the natural design that I inferred from the simple `_repr_inline_(max_width)` API. > > All that being said, it might still be good to bring up on xarray's end since this is a more general issue with inline reprs of nested duck arrays, with nothing pint-specific other than it being the motivating use-case. How should we deal with this?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4324/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 791277757,MDU6SXNzdWU3OTEyNzc3NTc=,4837,expose _to_temp_dataset / _from_temp_dataset as semi-public API?,14808389,open,0,,,5,2021-01-21T16:11:32Z,2021-01-22T02:07:08Z,,MEMBER,,,,"When writing accessors which behave the same for both `Dataset` and `DataArray`, it would be incredibly useful to be able to use `DataArray._to_temp_dataset` / `DataArray._from_temp_dataset` to deduplicate code. Is it safe to use those in external packages (like `pint-xarray`)? 
Otherwise I guess it would be possible to use ```python name = da.name if da.name is None else ""__temp"" temp_ds = da.to_dataset(name=name) new_da = temp_ds[name] if da.name is None: new_da = new_da.rename(da.name) assert_identical(da, new_da) ``` but that seems less efficient.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4837/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 552896124,MDU6SXNzdWU1NTI4OTYxMjQ=,3711,PseudoNetCDF tests failing randomly,14808389,open,0,,,6,2020-01-21T14:01:49Z,2020-03-23T20:32:32Z,,MEMBER,,,,"The `py37-windows` CI seems to fail for newer PRs: ```pytb _______________ TestPseudoNetCDFFormat.test_uamiv_format_write ________________ self = def test_uamiv_format_write(self): fmtkw = {""format"": ""uamiv""} expected = open_example_dataset( ""example.uamiv"", engine=""pseudonetcdf"", backend_kwargs=fmtkw ) with self.roundtrip( expected, save_kwargs=fmtkw, open_kwargs={""backend_kwargs"": fmtkw}, allow_cleanup_failure=True, ) as actual: > assert_identical(expected, actual) xarray\tests\test_backends.py:3532: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ xarray\core\formatting.py:628: in diff_dataset_repr summary.append(diff_attrs_repr(a.attrs, b.attrs, compat)) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a_mapping = {'CPROJ': 0, 'FILEDESC': 'CAMx ', 'FTYPE': 1, 'GDNAM': 'CAMx ', ...} b_mapping = {'CPROJ': 0, 'FILEDESC': 'CAMx ', 'FTYPE': 1, 'GDNAM': 'CAMx ', ...} compat = 'identical', title = 'Attributes' summarizer = , col_width = None def _diff_mapping_repr(a_mapping, b_mapping, compat, title, summarizer, col_width=None): def extra_items_repr(extra_keys, mapping, ab_side): extra_repr = [summarizer(k, mapping[k], col_width) for k in extra_keys] if extra_repr: header = f""{title} only on the {ab_side} object:"" return [header] + extra_repr else: return [] a_keys = set(a_mapping) b_keys = set(b_mapping) summary = [] diff_items = [] for k in a_keys & b_keys: try: # compare xarray variable compatible = getattr(a_mapping[k], compat)(b_mapping[k]) is_variable = True except AttributeError: # compare attribute value compatible = a_mapping[k] == b_mapping[k] is_variable = False > if not compatible: E ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() ```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3711/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 517195073,MDU6SXNzdWU1MTcxOTUwNzM=,3483,assign_coords with mixed DataArray / array args removes coords,14808389,open,0,,,5,2019-11-04T14:38:40Z,2019-11-07T15:46:15Z,,MEMBER,,,,"I'm not sure if using `assign_coords` to overwrite the data of coords is the best way to do so, but using mixed args (on current master) turns out to have surprising results: ```python >>> obj = xr.DataArray( ... data=[6, 3, 4, 6], ... coords={""x"": list(""abcd""), ""y"": (""x"", range(4))}, ... dims=""x"", ... ) >>> obj array([6, 3, 4, 6]) Coordinates: * x (x) >> # works as expected >>> obj.assign_coords(coords={""x"": list(""efgh""), ""y"": (""x"", [0, 2, 4, 6])}) array([6, 3, 4, 6]) Coordinates: * x (x) >> # works, too (same as .data / .values) >>> obj.assign_coords(coords={ ... ""x"": obj.x.copy(data=list(""efgh"")).variable, ... ""y"": (""x"", [0, 2, 4, 6]), ... 
}) array([6, 3, 4, 6]) Coordinates: * x (x) >> # this drops ""y"" >>> obj.assign_coords(coords={ ... ""x"": obj.x.copy(data=list(""efgh"")), ... ""y"": (""x"", [0, 2, 4, 6]), ... }) array([6, 3, 4, 6]) Coordinates: * x (x) >> obj.assign_coords(x=list(""efgh""), y=obj.y * 2) xarray.core.merge.MergeError: conflicting values for index 'x' on objects to be combined: first value: Index(['e', 'f', 'g', 'h'], dtype='object', name='x') second value: Index(['a', 'b', 'c', 'd'], dtype='object', name='x') ``` I would expect the result to be the same regardless of the type of the new coords. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3483/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue