issues

25 rows where state = "open" and user = 14808389 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2194953062 PR_kwDOAMm_X85qFqp1 8854 array api-related upstream-dev failures keewis 14808389 open 0     15 2024-03-19T13:17:09Z 2024-05-03T22:46:41Z   MEMBER   0 pydata/xarray/pulls/8854
  • [x] towards #8844

This "fixes" the upstream-dev failures related to the removal of numpy.array_api. There are a couple of open questions, though: - array-api-strict is not installed by default, so namedarray would get a new dependency. Not sure how to deal with that – as far as I can tell, numpy.array_api was not supposed to be used that way, so maybe we need to use array-api-compat instead? What do you think, @andersy005, @Illviljan? - array-api-strict does not define Array.nbytes (causing a funny exception that wrongly claims DataArray does not define nbytes) - array-api-strict has a different DType class, which makes it tricky to work with both numpy dtypes and said dtype class in the same code. In particular, if I understand correctly we're supposed to check dtypes using isdtype, but numpy.isdtype will only exist in numpy>=2, array-api-strict's version does not define datetime / string / object dtypes, and numpy.issubdtype does not work with the non-numpy dtype class). So maybe we need to use array-api-compat internally?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8854/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2269295936 PR_kwDOAMm_X85uBwtv 8983 fixes for the `pint` tests keewis 14808389 open 0     0 2024-04-29T15:09:28Z 2024-05-03T18:30:06Z   MEMBER   0 pydata/xarray/pulls/8983

This removes the use of the deprecated numpy.core._exceptions.UFuncError (and multiplication as a way to attach units), and makes sure we run the pint tests in the upstream-dev CI again.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8983/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2234142680 PR_kwDOAMm_X85sK0g8 8923 `"source"` encoding for datasets opened from `fsspec` objects keewis 14808389 open 0     5 2024-04-09T19:12:45Z 2024-04-23T16:54:09Z   MEMBER   0 pydata/xarray/pulls/8923

When opening files from path-like objects (str, pathlib.Path), the backend machinery (_dataset_from_backend_dataset) sets the "source" encoding. This is useful if we need the original path for additional processing, like writing to a similarly named file or extracting additional metadata. It would be useful to have the same when using fsspec to open remote files.

In this PR, I'm extracting the path attribute that most fsspec objects have and using it to set that value. I've considered using isinstance checks instead of the getattr-with-default, but the list of potential classes is too big to be practical (at least 4 classes just within fsspec itself).
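For illustration, a minimal sketch of that getattr-with-default idea (the helper name and the plain-path handling are assumptions, not the PR's actual code):

```python
import pathlib

def infer_source(filename_or_obj):
    # hypothetical helper: best-effort value for the "source" encoding
    if isinstance(filename_or_obj, (str, pathlib.Path)):
        return str(filename_or_obj)
    # most fsspec file objects carry the original URL / path in a `path` attribute
    return getattr(filename_or_obj, "path", None)

print(infer_source("data/example.nc"))  # 'data/example.nc'
print(infer_source(object()))           # None (no `path` attribute)
```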

If this sounds like a good idea, I'll update the documentation of the "source" encoding to mention this feature.

  • [x] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8923/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2241492018 PR_kwDOAMm_X85skF_A 8937 drop support for `python=3.9` keewis 14808389 open 0     3 2024-04-13T10:18:04Z 2024-04-15T15:07:39Z   MEMBER   0 pydata/xarray/pulls/8937

According to our policy (and NEP-29), we have been able to drop support for python=3.9 since about a week ago. Interestingly, SPEC0 says we could have started doing this about half a year ago (Q4 2023).

We could delay this until we have a release that is compatible with numpy>=2.0, though (numpy>=2.1 will drop support for python=3.9).

  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8937/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2079089277 I_kwDOAMm_X8577GJ9 8607 allow computing just a small number of variables keewis 14808389 open 0     4 2024-01-12T15:21:27Z 2024-01-12T20:20:29Z   MEMBER      

Is your feature request related to a problem?

I frequently find myself computing a handful of variables of a dataset (typically coordinates) and assigning them back to the dataset, and wishing we had a method / function that allowed that.

Describe the solution you'd like

I'd imagine something like

```python
ds.compute(variables=variable_names)
```

but I'm undecided on whether that's a good idea (it might make .compute more complex?)

Describe alternatives you've considered

So far I've been using something like

```python
ds.assign_coords({k: lambda ds: ds[k].compute() for k in variable_names})
ds.pipe(lambda ds: ds.merge(ds[variable_names].compute()))
```

but both are not easy to type / understand (though having .merge take a callable would make this much easier). Also, the first option computes variables separately, which may not be ideal?
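For reference, a rough sketch of a helper doing this in one step (compute_variables is a made-up name, and this assumes dask is installed):

```python
import numpy as np
import xarray as xr

def compute_variables(ds, names):
    # compute only the named variables and merge them back, leaving the rest lazy
    names = list(names)
    return ds.merge(ds[names].compute(), overwrite_vars=names)

ds = xr.Dataset({"a": ("x", np.arange(4)), "b": ("x", np.ones(4))}).chunk()
computed = compute_variables(ds, ["a"])  # "a" is in memory, "b" stays chunked
```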

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8607/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1655290694 I_kwDOAMm_X85iqbtG 7721 `as_shared_dtype` converts scalars to 0d `numpy` arrays if chunked `cupy` is involved keewis 14808389 open 0     7 2023-04-05T09:48:34Z 2023-12-04T10:45:43Z   MEMBER      

I tried to run where with chunked cupy arrays:

```python
In [1]: import xarray as xr
   ...: import cupy
   ...: import dask.array as da
   ...:
   ...: arr = xr.DataArray(cupy.arange(4), dims="x")
   ...: mask = xr.DataArray(cupy.array([False, True, True, False]), dims="x")
```

this works:

```python
In [2]: arr.where(mask)
Out[2]:
<xarray.DataArray (x: 4)>
array([nan,  1.,  2., nan])
Dimensions without coordinates: x
```

this fails:

```pytb
In [4]: arr.chunk().where(mask).compute()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 1
----> 1 arr.chunk().where(mask).compute()

File ~/repos/xarray/xarray/core/dataarray.py:1095, in DataArray.compute(self, **kwargs)
   1076 """Manually trigger loading of this array's data from disk or a
   1077 remote source into memory and return a new array. The original is
   1078 left unaltered.
   (...)
   1092 dask.compute
   1093 """
   1094 new = self.copy(deep=False)
-> 1095 return new.load(**kwargs)

File ~/repos/xarray/xarray/core/dataarray.py:1069, in DataArray.load(self, **kwargs)
   1051 def load(self: T_DataArray, **kwargs) -> T_DataArray:
   1052     """Manually trigger loading of this array's data from disk or a
   1053     remote source into memory and return this array.
   1054     (...)
   1067     dask.compute
   1068     """
-> 1069 ds = self._to_temp_dataset().load(**kwargs)
   1070 new = self._from_temp_dataset(ds)
   1071 self._variable = new._variable

File ~/repos/xarray/xarray/core/dataset.py:752, in Dataset.load(self, **kwargs)
    749 import dask.array as da
    751 # evaluate all the dask arrays simultaneously
--> 752 evaluated_data = da.compute(*lazy_data.values(), **kwargs)
    754 for k, data in zip(lazy_data, evaluated_data):
    755     self.variables[k].data = data

File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/base.py:600, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    597 keys.append(x.__dask_keys__())
    598 postcomputes.append(x.__dask_postcompute__())
--> 600 results = schedule(dsk, keys, **kwargs)
    601 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/threaded.py:89, in get(dsk, keys, cache, num_workers, pool, **kwargs)
     86 elif isinstance(pool, multiprocessing.pool.Pool):
     87     pool = MultiprocessingPoolExecutor(pool)
---> 89 results = get_async(
     90     pool.submit,
     91     pool._max_workers,
     92     dsk,
     93     keys,
     94     cache=cache,
     95     get_id=_thread_get_id,
     96     pack_exception=pack_exception,
     97     **kwargs,
     98 )
    100 # Cleanup pools associated to dead threads
    101 with pools_lock:

File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/local.py:511, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
    509 _execute_task(task, data)  # Re-execute locally
    510 else:
--> 511 raise_exception(exc, tb)
    512 res, worker_id = loads(res_info)
    513 state["cache"][key] = res

File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/local.py:319, in reraise(exc, tb)
    317 if exc.__traceback__ is not tb:
    318     raise exc.with_traceback(tb)
--> 319 raise exc

File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/local.py:224, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    222 try:
    223     task, data = loads(task_info)
--> 224 result = _execute_task(task, data)
    225 id = get_id()
    226 result = dumps((result, id))

File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/core.py:119, in _execute_task(arg, cache, dsk)
    115 func, args = arg[0], arg[1:]
    116 # Note: Don't assign the subtask results to a variable. numpy detects
    117 # temporaries by their reference count and can execute certain
    118 # operations in-place.
--> 119 return func(*(_execute_task(a, cache) for a in args))
    120 elif not ishashable(arg):
    121     return arg

File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/optimization.py:990, in SubgraphCallable.__call__(self, *args)
    988 if not len(args) == len(self.inkeys):
    989     raise ValueError("Expected %d args, got %d" % (len(self.inkeys), len(args)))
--> 990 return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))

File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/core.py:149, in get(dsk, out, cache)
    147 for key in toposort(dsk):
    148     task = dsk[key]
--> 149 result = _execute_task(task, cache)
    150 cache[key] = result
    151 result = _execute_task(out, cache)

File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/dask/core.py:119, in _execute_task(arg, cache, dsk)
    115 func, args = arg[0], arg[1:]
    116 # Note: Don't assign the subtask results to a variable. numpy detects
    117 # temporaries by their reference count and can execute certain
    118 # operations in-place.
--> 119 return func(*(_execute_task(a, cache) for a in args))
    120 elif not ishashable(arg):
    121     return arg

File <__array_function__ internals>:180, in where(*args, **kwargs)

File cupy/_core/core.pyx:1723, in cupy._core.core._ndarray_base.__array_function__()

File ~/.local/opt/mambaforge/envs/xarray/lib/python3.10/site-packages/cupy/_sorting/search.py:211, in where(condition, x, y)
    209 if fusion._is_fusing():
    210     return fusion._call_ufunc(_where_ufunc, condition, x, y)
--> 211 return _where_ufunc(condition.astype('?'), x, y)

File cupy/_core/_kernel.pyx:1287, in cupy._core._kernel.ufunc.__call__()

File cupy/_core/_kernel.pyx:160, in cupy._core._kernel._preprocess_args()

File cupy/_core/_kernel.pyx:146, in cupy._core._kernel._preprocess_arg()

TypeError: Unsupported type <class 'numpy.ndarray'>
```

this works again:

```python
In [7]: arr.chunk().where(mask.chunk(), cupy.array(cupy.nan)).compute()
Out[7]:
<xarray.DataArray (x: 4)>
array([nan,  1.,  2., nan])
Dimensions without coordinates: x
```

And other methods like fillna show similar behavior.

I think the reason is that this: https://github.com/pydata/xarray/blob/d4db16699f30ad1dc3e6861601247abf4ac96567/xarray/core/duck_array_ops.py#L195 is not sufficient to detect cupy beneath other layers of duckarrays (most commonly dask, pint, or both). In this specific case we could extend the condition to also match chunked cupy arrays (like arr.cupy.is_cupy does, but using is_duck_dask_array), but this will still break for other duckarray layers or if dask is not involved, and we're also in the process of moving away from special-casing dask. So short of asking cupy to treat 0d arrays like scalars I'm not sure how to fix this.
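For illustration only, the "extend the condition" idea could look roughly like this (not the actual duck_array_ops code; the function name is made up):

```python
def is_chunked_cupy(arr):
    # look through a dask layer via `_meta` to find cupy underneath;
    # returns False when cupy is not installed
    try:
        import cupy
    except ImportError:
        return False
    inner = getattr(arr, "_meta", arr)  # dask arrays expose their chunk type here
    return isinstance(inner, cupy.ndarray)
```

As noted above, this still would not help with other duckarray layers (e.g. pint around dask around cupy).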

cc @jacobtomlinson

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7721/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1158378382 I_kwDOAMm_X85FC3OO 6323 propagation of `encoding` keewis 14808389 open 0     8 2022-03-03T12:57:29Z 2023-10-25T23:20:31Z   MEMBER      

What is your issue?

We frequently get bug reports related to encoding that can usually be fixed by clearing it or by overriding it using the encoding parameter of the to_* methods, e.g.:
  • #4224
  • #4380
  • #4655
  • #5427
  • #5490
  • fsspec/kerchunk#130

There are also a few discussions with more background:
  • https://github.com/pydata/xarray/pull/5065#issuecomment-806154872
  • https://github.com/pydata/xarray/issues/1614
  • #5082
  • #5336

We discussed this in the meeting yesterday and as far as I can remember agreed that the current default behavior is not ideal and decided to investigate #5336: a keep_encoding option, similar to keep_attrs, that would be True (propagate encoding) by default but will be changed to False (drop encoding on any operation) in the future.

cc @rabernat, @shoyer

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6323/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
683142059 MDU6SXNzdWU2ODMxNDIwNTk= 4361 restructure the contributing guide keewis 14808389 open 0     5 2020-08-20T22:51:39Z 2023-03-31T17:39:00Z   MEMBER      

From #4355

@max-sixty:

Stepping back on the contributing doc — I admit I haven't looked at it in a while — I wonder whether we can slim it down a bit, for example by linking to other docs for generic tooling — I imagine we're unlikely to have the best docs on working with GH, for example. Or referencing our PR template rather than the (now out-of-date) PR checklist.

We could also add a docstring guide since the numpydoc guide does not cover every little detail (for example, default notation, type spec vs. type hint, space before the colon separating parameter names from types, no colon for parameters without types, etc.)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4361/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1306795760 I_kwDOAMm_X85N5B7w 6793 improve docstrings with examples and links keewis 14808389 open 0     10 2022-07-16T12:30:33Z 2023-03-24T16:33:28Z   MEMBER      

This is an (incomplete) checklist for #5816 to make it easier to find methods that are in need of examples and links to the narrative docs with further information (of course, changes to the docstrings of all other methods / functions part of the public API are also appreciated).

Good examples explicitly construct small xarray objects to make it easier to follow (e.g. use np.{ones,full,zeros} or the np.array constructor instead of np.random / loading from files) and show both input and output of the function.

Use

```sh
pytest --doctest-modules xarray --ignore xarray/tests/
```

to verify the examples, or push to a PR to have the CI do it for you (note that you will have much quicker feedback locally though).

To easily generate the expected output install pytest-accept (docs) in your dev environment and then run

```sh
pytest --doctest-modules FILE_NAME --accept || true
```

To link to other documentation pages we can use

```rst
:doc:`project:label` Description of the linked page
```

where we can leave out project if we link to somewhere within xarray's documentation. To figure out the label, we can either look at the source, search the output of

```sh
python -m sphinx.ext.intersphinx https://docs.xarray.dev/en/latest/objects.inv
```

or use sphobjinv (install from PyPI):

```sh
sphobjinv search -su https://docs.xarray.dev/en/latest/ missing
```
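As an illustration of the style described above, a docstring for a made-up helper could look like this (example_function is not part of xarray):

```python
def example_function(da):
    """Return the input unchanged (made-up helper used to illustrate the docstring style).

    Examples
    --------
    >>> import numpy as np
    >>> import xarray as xr
    >>> da = xr.DataArray(np.ones((2, 3)), dims=("x", "y"))
    >>> example_function(da).shape
    (2, 3)
    """
    return da
```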

Top-level functions:
  • [ ] get_options
  • [ ] decode_cf
  • [ ] polyval
  • [ ] unify_chunks
  • [ ] infer_freq
  • [ ] date_range

I/O:
  • [ ] load_dataarray
  • [ ] load_dataset
  • [ ] open_dataarray
  • [ ] open_dataset
  • [ ] open_mfdataset

Contents:
  • [ ] DataArray.assign_attrs, Dataset.assign_attrs
  • [ ] DataArray.expand_dims, Dataset.expand_dims
  • [ ] DataArray.drop_duplicates, Dataset.drop_duplicates
  • [ ] DataArray.drop_vars, Dataset.drop_vars
  • [ ] Dataset.drop_dims
  • [ ] DataArray.convert_calendar, Dataset.convert_calendar
  • [ ] DataArray.set_coords, Dataset.set_coords
  • [ ] DataArray.reset_coords, Dataset.reset_coords

Comparisons:
  • [ ] DataArray.equals, Dataset.equals
  • [ ] DataArray.identical, Dataset.identical
  • [ ] DataArray.broadcast_equals, Dataset.broadcast_equals

Dask:
  • [ ] DataArray.compute, Dataset.compute
  • [ ] DataArray.chunk, Dataset.chunk
  • [ ] DataArray.persist, Dataset.persist

Missing values:
  • [ ] DataArray.bfill, Dataset.bfill
  • [ ] DataArray.ffill, Dataset.ffill
  • [ ] DataArray.fillna, Dataset.fillna
  • [ ] DataArray.dropna, Dataset.dropna

Indexing:
  • [ ] DataArray.loc (no docstring at all - came up in https://github.com/pydata/xarray/discussions/7528#discussion-4858556)
  • [ ] DataArray.drop_isel
  • [ ] DataArray.drop_sel
  • [ ] DataArray.head, Dataset.head
  • [ ] DataArray.tail, Dataset.tail
  • [ ] DataArray.interp_like, Dataset.interp_like
  • [ ] DataArray.reindex_like, Dataset.reindex_like
  • [ ] Dataset.isel

Aggregations:
  • [ ] Dataset.argmax
  • [ ] Dataset.argmin
  • [ ] DataArray.cumsum, Dataset.cumsum (intermediate to advanced)
  • [ ] DataArray.cumprod, Dataset.cumprod (intermediate to advanced)
  • [ ] DataArray.reduce, Dataset.reduce

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6793/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
818059250 MDExOlB1bGxSZXF1ZXN0NTgxNDIzNTIx 4972 Automatic duck array testing - reductions keewis 14808389 open 0     23 2021-02-27T23:57:23Z 2022-08-16T13:47:05Z   MEMBER   1 pydata/xarray/pulls/4972

This is the first of a series of PRs to add a framework to make testing the integration of duck arrays as simple as possible. It uses hypothesis for increased coverage and maintainability.
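As a sketch of what such a test can look like (the test itself is illustrative, not the PR's actual API):

```python
import hypothesis.strategies as st
import numpy as np
import xarray as xr
from hypothesis import given

# hypothesis generates the input lists; the reduction on the xarray object
# should match the plain numpy result
@given(st.lists(st.floats(-1e6, 1e6), min_size=1))
def test_mean_matches_numpy(values):
    da = xr.DataArray(np.array(values), dims="x")
    np.testing.assert_allclose(da.mean().item(), np.mean(values))
```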

  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4972/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
532696790 MDU6SXNzdWU1MzI2OTY3OTA= 3594 support for units with pint keewis 14808389 open 0     7 2019-12-04T13:49:28Z 2022-08-03T11:44:05Z   MEMBER      

pint's implementation of NEP-18 (see hgrecco/pint#905) is close enough so we can finally start working on the pint support (i.e. make the integration tests pass). This would be the list of tasks to get there:

integration tests:
  • [x] implement integration tests for DataArray, Dataset and top-level functions (#3238, #3447, #3493)
  • [x] add tests for Variable as discussed in #3493 (#3654)
  • [x] clean up the current tests (#3600)
  • [x] use the standard assert_identical and assert_allclose functions (#3611, #3643, #3654, #3706, #3975)
  • [x] clean up the TestVariable.test_pad tests

actually get xarray to support units:
  • [x] top-level functions (#3611)
  • [x] Variable (#3706)
      + rolling_window and identical need larger modifications
  • [x] DataArray (#3643)
  • [x] Dataset
  • [x] silence all the UnitStrippedWarnings in the testsuite (#4163)
  • [ ] try to get nanprod to work with quantities
  • [x] add support for per variable fill values (#4165)
  • [x] repr with units (#2773)
  • [ ] type hierarchy (e.g. for np.maximum(data_array, quantity) vs np.maximum(quantity, data_array)) (#3950)

update the documentation:
  • [x] point to pint-xarray (see #4530)
  • [x] mention the requirement for UnitRegistry(force_ndarray=True) or UnitRegistry(force_ndarray_like=True) (see https://pint-xarray.readthedocs.io/en/stable/creation.html#attaching-units)
  • [x] list the known issues (see https://github.com/pydata/xarray/pull/3643#issue-354872657 and https://github.com/pydata/xarray/pull/3643#issuecomment-602225731) (#4530):
      + pandas (indexing)
      + bottleneck (bfill, ffill)
      + scipy (interp)
      + numbagg (rolling_exp)
      + numpy.lib.stride_tricks.as_strided: rolling
      + numpy.vectorize: interpolate_na
  • [x] ~update the install instructions (we can use standard conda / pip now)~ this should be done by pint-xarray
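For context, a minimal sketch of what "xarray supporting units" means in practice, assuming a registry created with force_ndarray_like=True as required above:

```python
import numpy as np
import pint
import xarray as xr

ureg = pint.UnitRegistry(force_ndarray_like=True)

# wrap a pint quantity in a DataArray; operations keep the units attached
da = xr.DataArray(ureg.Quantity(np.arange(4.0), "m"), dims="x")
print(da.data.units)  # meter
print((da * 2).data)  # still a pint quantity
```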

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3594/reactions",
    "total_count": 14,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 14,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
597566530 MDExOlB1bGxSZXF1ZXN0NDAxNjU2MTc1 3960 examples for special methods on accessors keewis 14808389 open 0     6 2020-04-09T21:34:30Z 2022-06-09T14:50:17Z   MEMBER   0 pydata/xarray/pulls/3960

This starts adding the parametrized accessor examples from #3829 to the accessor documentation as suggested by @jhamman. Since then the weighted methods have been added, though, so I'd like to use a different example instead (ideas welcome).

Also, this feature can be abused to add functions to the main DataArray / Dataset namespace (by registering a function with the register_*_accessor decorators, see the second example). Is this something we want to explicitly discourage?

(~When trying to build the docs locally, sphinx keeps complaining about a code block without code. Not sure what that is about~ seems the ipython directive does not allow more than one expression, so I used code instead)
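A minimal sketch of that pattern (the accessor name is made up): because the registered object is called with the Dataset and the result is cached, registering a plain function effectively adds a derived attribute to the namespace.

```python
import numpy as np
import xarray as xr

@xr.register_dataset_accessor("doubled")
def doubled(ds):
    # called with the dataset on first attribute access; the result is cached
    return ds * 2

ds = xr.Dataset({"a": ("x", np.arange(3))})
print(ds.doubled["a"].values)  # [0 2 4]
```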

  • [x] Closes #3829
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3960/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
801728730 MDExOlB1bGxSZXF1ZXN0NTY3OTkzOTI3 4863 apply to dataset keewis 14808389 open 0     14 2021-02-05T00:05:22Z 2022-06-09T14:50:17Z   MEMBER   0 pydata/xarray/pulls/4863

As discussed in #4837, this adds a method that applies a function to a DataArray by first converting it to a temporary dataset using _to_temp_dataset, applying the function, and converting it back. I'm not really happy with the name but I can't find a better one.

This function is really similar to pipe, so I guess a keyword argument to pipe would work, too. The disadvantage of that is that pipe passes all kwargs to the passed function, which means we would shadow a specific kwarg.
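Roughly, the round-trip described above looks like this (a sketch only; the method name and signature in the PR may differ):

```python
def apply_to_dataset(da, func, *args, **kwargs):
    # wrap the DataArray in a temporary single-variable Dataset, apply the
    # Dataset -> Dataset function, then unwrap again (uses private helpers)
    ds = da._to_temp_dataset()
    return da._from_temp_dataset(func(ds, *args, **kwargs))
```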

  • [x] Closes #4837
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4863/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
959063390 MDExOlB1bGxSZXF1ZXN0NzAyMjM0ODc1 5668 create the context objects passed to custom `combine_attrs` functions keewis 14808389 open 0     1 2021-08-03T12:24:50Z 2022-06-09T14:50:16Z   MEMBER   0 pydata/xarray/pulls/5668

Follow-up to #4896: this creates the context object in reduce methods and passes it to merge_attrs, with more planned.

  • [ ] might help with xarray-contrib/cf-xarray#228
  • [ ] Tests added
  • [x] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

Note that for now this is a bit inconvenient to use for provenance tracking (as discussed in the cf-xarray issue) because functions implementing that would still have to deal with merging the attrs.
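For reference, a custom combine_attrs callable receives the list of attrs dicts plus the context object this PR constructs; a minimal sketch (the merge strategy here is arbitrary):

```python
def combine_attrs(variable_attrs, context=None):
    # `context` is the object this PR starts passing in; a provenance-tracking
    # implementation could inspect it, but here we just merge left-to-right
    merged = {}
    for attrs in variable_attrs:
        merged.update(attrs)
    return merged
```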

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5668/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1265366275 I_kwDOAMm_X85La_UD 6678 exception groups keewis 14808389 open 0     1 2022-06-08T22:09:37Z 2022-06-08T23:38:28Z   MEMBER      

What is your issue?

As I mentioned in the meeting today, we have a lot of features where the exception group support from PEP 654 (which is scheduled for python 3.11 and consists of the ExceptionGroup class and a syntax change) might be useful. For example, we might want to collect all errors raised by rename in an exception group instead of raising them one-by-one.

For python<=3.10 there's a backport that contains the class and a workaround for the new syntax.
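A sketch of the rename example, using the backport (requires the exceptiongroup package on python<=3.10; the validation logic is made up):

```python
from exceptiongroup import ExceptionGroup  # builtin on python>=3.11

existing = {"x", "y"}
errors = [
    KeyError(f"cannot rename {name!r}: not found in this dataset")
    for name in ("a", "y", "b")
    if name not in existing
]
if errors:
    raise ExceptionGroup("invalid rename arguments", errors)
```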

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6678/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
624778130 MDU6SXNzdWU2MjQ3NzgxMzA= 4095 merging non-dimension coordinates with the Dataset constructor keewis 14808389 open 0     1 2020-05-26T10:30:37Z 2022-04-19T13:54:43Z   MEMBER      

When adding two DataArray objects with different coordinates to a Dataset, a MergeError is raised even though one of the conflicting coords is a subset of the other. Merging dimension coordinates works so I'd expect associated non-dimension coordinates to work, too.

This fails:

```python
In [1]: import xarray as xr
   ...: import numpy as np

In [2]: a = np.linspace(0, 1, 10)
   ...: b = np.linspace(-1, 0, 12)
   ...:
   ...: x_a = np.arange(10)
   ...: x_b = np.arange(12)
   ...:
   ...: y_a = x_a * 1000
   ...: y_b = x_b * 1000
   ...:
   ...: arr1 = xr.DataArray(data=a, coords={"x": x_a, "y": ("x", y_a)}, dims="x")
   ...: arr2 = xr.DataArray(data=b, coords={"x": x_b, "y": ("x", y_b)}, dims="x")
   ...:
   ...: xr.Dataset({"a": arr1, "b": arr2})
...
MergeError: conflicting values for variable 'y' on objects to be combined. You can skip this check by specifying compat='override'.
```

While this works:

```python
In [3]: a = np.linspace(0, 1, 10)
   ...: b = np.linspace(-1, 0, 12)
   ...:
   ...: x_a = np.arange(10)
   ...: x_b = np.arange(12)
   ...:
   ...: y_a = x_a * 1000
   ...: y_b = x_b * 1000
   ...:
   ...: xr.Dataset({
   ...:     "a": xr.DataArray(data=a, coords={"x": x_a}, dims="x"),
   ...:     "b": xr.DataArray(data=b, coords={"x": x_b}, dims="x"),
   ...: })
Out[3]:
<xarray.Dataset>
Dimensions:  (x: 12)
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11
Data variables:
    a        (x) float64 0.0 0.1111 0.2222 0.3333 0.4444 ... 0.8889 1.0 nan nan
    b        (x) float64 -1.0 -0.9091 -0.8182 -0.7273 ... -0.1818 -0.09091 0.0
```

I can work around this by calling:

```python
In [4]: xr.merge([arr1.rename("a").to_dataset(), arr2.rename("b").to_dataset()])
Out[4]:
<xarray.Dataset>
Dimensions:  (x: 12)
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11
    y        (x) float64 0.0 1e+03 2e+03 3e+03 ... 8e+03 9e+03 1e+04 1.1e+04
Data variables:
    a        (x) float64 0.0 0.1111 0.2222 0.3333 0.4444 ... 0.8889 1.0 nan nan
    b        (x) float64 -1.0 -0.9091 -0.8182 -0.7273 ... -0.1818 -0.09091 0.0
```

but I think the Dataset constructor should be capable of that, too.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4095/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
539181896 MDU6SXNzdWU1MzkxODE4OTY= 3638 load_store and dump_to_store keewis 14808389 open 0     1 2019-12-17T16:37:53Z 2021-11-08T21:11:26Z   MEMBER      

Continuing from #3602, load_store and dump_to_store look like they are old and unmaintained functions:
  • load_store is referenced once in api.rst (I assume the reference to from_store was to load_store), but never tested, used or mentioned anywhere else
  • dump_to_store is tested (and probably used), but never mentioned except from the section on backends

What should we do with these? Are they obsolete and should be removed, or just unmaintained (in which case we should properly document and test them)?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3638/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
789106802 MDU6SXNzdWU3ODkxMDY4MDI= 4825 clean up the API for renaming and changing dimensions / coordinates keewis 14808389 open 0     5 2021-01-19T15:11:55Z 2021-09-10T15:04:14Z   MEMBER      

From #4108:

I wonder if it would be better to first "reorganize" all of the existing functions: we currently have rename (and Dataset.rename_dims / Dataset.rename_vars), set_coords, reset_coords, set_index, reset_index and swap_dims, which overlap partially. For example, the code sample from #4417 works if instead of

```python
ds = ds.rename(b='x')
ds = ds.set_coords('x')
```

we use

```python
ds = ds.set_index(x="b")
```

and something similar for the code sample in #4107.

I believe we currently have these use cases (not sure if that list is complete, though):
  • rename a DataArray → rename
  • rename an existing variable to a name that is not yet in the object → rename / Dataset.rename_vars / Dataset.rename_dims
  • convert a data variable to a coordinate (not a dimension coordinate) → set_coords
  • convert a coordinate (not a dimension coordinate) to a data variable → reset_coords
  • swap an existing dimension coordinate with a coordinate (which may not exist) and rename the dimension → swap_dims
  • use an existing coordinate / data variable as a dimension coordinate (do not rename the dimension) → set_index
  • stop using a coordinate as dimension coordinate and append _ to its name (do not rename the dimension) → reset_index
  • use two existing coordinates / data variables as a MultiIndex → set_index
  • stop using a MultiIndex as a dimension coordinate and use its levels as coordinates → reset_index

Sometimes, some of these can be emulated by combinations of others, for example:

```python
# x is a dimension without coordinates
assert_identical(ds.set_index({"x": "b"}), ds.swap_dims({"x": "b"}).rename({"b": "x"}))
assert_identical(ds.swap_dims({"x": "b"}), ds.set_index({"x": "b"}).rename({"x": "b"}))
```

and, with this PR:

```python
assert_identical(ds.set_index({"x": "b"}), ds.set_coords("b").rename({"b": "x"}))
assert_identical(ds.swap_dims({"x": "b"}), ds.rename({"b": "x"}))
```

which means that it would increase the overlap of rename, set_index, and swap_dims.

In any case I think we should add a guide which explains which method to pick in which situation (or extend howdoi).

Originally posted by @keewis in https://github.com/pydata/xarray/issues/4108#issuecomment-761907785

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4825/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
935531700 MDU6SXNzdWU5MzU1MzE3MDA= 5562 hooks to "prepare" xarray objects for plotting keewis 14808389 open 0     6 2021-07-02T08:14:02Z 2021-07-04T08:46:34Z   MEMBER      

From https://github.com/xarray-contrib/pint-xarray/pull/61#discussion_r662485351

matplotlib has a module called matplotlib.units which manages a mapping of types to hooks. This is then used to convert custom types to something matplotlib can work with, and to optionally add axis labels. For example, with pint:

```python
In [9]: ureg = pint.UnitRegistry()
   ...: ureg.setup_matplotlib()
   ...:
   ...: t = ureg.Quantity(np.arange(10), "s")
   ...: v = ureg.Quantity(5, "m / s")
   ...: x = v * t
   ...:
   ...: fig, ax = plt.subplots(1, 1)
   ...: ax.plot(t, x)
   ...:
   ...: plt.show()
```

this will plot the data without UnitStrippedWarnings and even attach the units as labels to the axis (the format is hard-coded in pint right now).

While this is pretty neat there are some issues:
  • xarray's plotting code converts to masked_array, dropping metadata on the duck array (which means matplotlib won't see the duck arrays)
  • we will end up overwriting the axis labels once the variable names are added (not sure if there's a way to specify a label format?)
  • it is matplotlib specific, which means we have to reimplement once we go through with the plotting entrypoints discussed in #3553 and #3640

All of this makes me wonder: should we try to maintain our own mapping of hooks which "prepare" the object based on the data's type? My initial idea would be that the hook function receives a Dataset or DataArray object and modifies it to convert the data to numpy arrays and optionally modifies the attrs.

For example for pint the hook would return the result of .pint.dequantify() but it could also be used to explicitly call .get on cupy arrays or .todense on sparse arrays.
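A rough sketch of such a registry (all names are made up; a real version would need to handle nested duck arrays and Dataset objects):

```python
_PLOT_HOOKS = {}

def register_plot_hook(array_type, hook):
    _PLOT_HOOKS[array_type] = hook

def prepare_for_plotting(da):
    # call the first hook whose type matches the underlying data
    for array_type, hook in _PLOT_HOOKS.items():
        if isinstance(da.variable.data, array_type):
            return hook(da)
    return da

# e.g. for pint: register_plot_hook(pint.Quantity, lambda da: da.pint.dequantify())
```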

xref #5561

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5562/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
589850951 MDU6SXNzdWU1ODk4NTA5NTE= 3917 running numpy functions on xarray objects keewis 14808389 open 0     1 2020-03-29T18:17:29Z 2021-07-04T02:00:22Z   MEMBER      

In the pint integration tests I tried to also test calling numpy functions on xarray objects (we provide methods for all of them).

Some of these functions, like numpy.median, numpy.searchsorted and numpy.clip, depend on __array_function__ (i.e. not __array_ufunc__) to dispatch. However, neither Dataset nor DataArray (nor Variable, I think?) define these protocols (see #3643).

Should we define __array_function__ on xarray objects?
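For context, this is roughly what the protocol looks like for a duck array (illustrative wrapper, not a proposed xarray implementation):

```python
import numpy as np

class Wrapper:
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # unwrap any Wrapper arguments, dispatch to numpy, re-wrap the result
        unwrapped = tuple(a.data if isinstance(a, Wrapper) else a for a in args)
        return Wrapper(func(*unwrapped, **kwargs))

print(np.median(Wrapper([1, 2, 3])).data)  # 2.0
```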

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3917/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
674445594 MDU6SXNzdWU2NzQ0NDU1OTQ= 4321 push inline formatting functions upstream keewis 14808389 open 0     0 2020-08-06T16:35:04Z 2021-04-19T03:20:11Z   MEMBER      

#4248 added a _repr_inline_ method duck arrays can use to customize their collapsed variable repr.

We currently also have inline_dask_repr and inline_sparse_repr which remove redundant information like dtype and shape from dask and sparse arrays.

In order to reduce the complexity of inline_variable_array_repr, we could try to push these functions upstream.
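For reference, the _repr_inline_ hook a duck-array library implements looks roughly like this (illustrative class, not dask or sparse code):

```python
class MyDuckArray:
    def __init__(self, data):
        self.data = data

    def _repr_inline_(self, max_width):
        # return a short, width-limited summary without dtype / shape noise
        summary = f"MyDuckArray({self.data!r})"
        if len(summary) > max_width:
            summary = "MyDuckArray(...)"
        return summary
```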

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4321/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
675342733 MDU6SXNzdWU2NzUzNDI3MzM= 4324 constructing nested inline reprs keewis 14808389 open 0     9 2020-08-07T23:25:31Z 2021-04-19T03:20:01Z   MEMBER      

While implementing the new _repr_inline_ in xarray-contrib/pint-xarray#22, I realized that I designed that method with a single level of nesting in mind, e.g. xarray(pint(x)) or xarray(dask(x)).

From that PR: @keewis

thinking about this some more, this doesn't work for anything other than numpy.ndarray objects. For now I guess we could use the magnitude's _repr_inline_ (falling back to __repr__ if that doesn't exist) and only use format_array_flat if the magnitude is a ndarray.

However, as we nest deeper (e.g. xarray(pint(uncertainties(dask(sparse(cupy))))) – for argument's sake, let's assume that this actually makes sense) this might break or become really complicated. Does anyone have any ideas how to deal with that?

If I'm simply missing something, we can have that discussion here; otherwise I guess we should open an issue on xarray's issue tracker.

@jthielen

Yes, I agree that format_array_flat should probably just apply to magnitude being an ndarray.

I think a cascading series of _repr_inline_ should work for nested arrays, so long as

* the metadata of the higher nested objects is considered the priority (if not, then we're back to a fully managed solution to the likes of [dask/dask#5329](https://github.com/dask/dask/issues/5329))

* small max lengths are handled gracefully (i.e., a minimum where it is just like `Dask.Array(...)`, then `...`, then nothing)

* we're okay with the lowest arrays in large nesting chains not having any information show up in the inline repr (situation where there is not enough characters to even describe the full nesting has to be accounted for somehow)

* it can be adopted without too much complaint across the ecosystem

Assuming all this, then each layer of the nesting will reduce the max length of the inline repr string available to the layers below it, until a layer reaches a reasonable minimum where it "gives up". At least that's the natural design that I inferred from the simple _repr_inline_(max_width) API.

All that being said, it might still be good to bring up on xarray's end since this is a more general issue with inline reprs of nested duck arrays, with nothing pint-specific other than it being the motivating use-case.

How should we deal with this?
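A minimal sketch of the cascading scheme described above, where each layer formats itself and hands the remaining width to the layer below (hypothetical classes):

```python
class Layer:
    def __init__(self, name, inner=None):
        self.name = name
        self.inner = inner

    def _repr_inline_(self, max_width):
        prefix, suffix = f"{self.name}(", ")"
        budget = max_width - len(prefix) - len(suffix)
        if budget <= 0:
            return "..."  # not enough room: give up gracefully
        inner = self.inner._repr_inline_(budget) if self.inner is not None else ""
        return prefix + inner + suffix

print(Layer("pint", Layer("dask", Layer("sparse")))._repr_inline_(30))  # pint(dask(sparse()))
```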

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4324/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
791277757 MDU6SXNzdWU3OTEyNzc3NTc= 4837 expose _to_temp_dataset / _from_temp_dataset as semi-public API? keewis 14808389 open 0     5 2021-01-21T16:11:32Z 2021-01-22T02:07:08Z   MEMBER      

When writing accessors which behave the same for both Dataset and DataArray, it would be incredibly useful to be able to use DataArray._to_temp_dataset / DataArray._from_temp_dataset to deduplicate code. Is it safe to use those in external packages (like pint-xarray)?

Otherwise I guess it would be possible to use

```python
name = da.name if da.name is not None else "__temp"
temp_ds = da.to_dataset(name=name)
new_da = temp_ds[name]
if da.name is None:
    new_da = new_da.rename(da.name)
assert_identical(da, new_da)
```

but that seems less efficient.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4837/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
552896124 MDU6SXNzdWU1NTI4OTYxMjQ= 3711 PseudoNetCDF tests failing randomly keewis 14808389 open 0     6 2020-01-21T14:01:49Z 2020-03-23T20:32:32Z   MEMBER      

The py37-windows CI seems to fail for newer PRs:

```pytb
____________________ TestPseudoNetCDFFormat.test_uamiv_format_write ____________________

self = <xarray.tests.test_backends.TestPseudoNetCDFFormat object at 0x000002E11FF2DC08>

    def test_uamiv_format_write(self):
        fmtkw = {"format": "uamiv"}

        expected = open_example_dataset(
            "example.uamiv", engine="pseudonetcdf", backend_kwargs=fmtkw
        )
        with self.roundtrip(
            expected,
            save_kwargs=fmtkw,
            open_kwargs={"backend_kwargs": fmtkw},
            allow_cleanup_failure=True,
        ) as actual:
>           assert_identical(expected, actual)

xarray\tests\test_backends.py:3532:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

xarray\core\formatting.py:628: in diff_dataset_repr
    summary.append(diff_attrs_repr(a.attrs, b.attrs, compat))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

a_mapping = {'CPROJ': 0, 'FILEDESC': 'CAMx ', 'FTYPE': 1, 'GDNAM': 'CAMx ', ...}
b_mapping = {'CPROJ': 0, 'FILEDESC': 'CAMx ', 'FTYPE': 1, 'GDNAM': 'CAMx ', ...}
compat = 'identical', title = 'Attributes'
summarizer = <function summarize_attr at 0x000002E1156813A8>, col_width = None

    def _diff_mapping_repr(a_mapping, b_mapping, compat, title, summarizer, col_width=None):
        def extra_items_repr(extra_keys, mapping, ab_side):
            extra_repr = [summarizer(k, mapping[k], col_width) for k in extra_keys]
            if extra_repr:
                header = f"{title} only on the {ab_side} object:"
                return [header] + extra_repr
            else:
                return []

        a_keys = set(a_mapping)
        b_keys = set(b_mapping)

        summary = []

        diff_items = []

        for k in a_keys & b_keys:
            try:
                # compare xarray variable
                compatible = getattr(a_mapping[k], compat)(b_mapping[k])
                is_variable = True
            except AttributeError:
                # compare attribute value
                compatible = a_mapping[k] == b_mapping[k]
                is_variable = False
>           if not compatible:
E           ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3711/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
517195073 MDU6SXNzdWU1MTcxOTUwNzM= 3483 assign_coords with mixed DataArray / array args removes coords keewis 14808389 open 0     5 2019-11-04T14:38:40Z 2019-11-07T15:46:15Z   MEMBER      

I'm not sure if using assign_coords to overwrite the data of coords is the best way to do so, but using mixed args (on current master) turns out to have surprising results:

```python
>>> obj = xr.DataArray(
...     data=[6, 3, 4, 6],
...     coords={"x": list("abcd"), "y": ("x", range(4))},
...     dims="x",
... )
>>> obj
<xarray.DataArray 'obj' (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'a' 'b' 'c' 'd'
    y        (x) int64 0 1 2 3

# works as expected
>>> obj.assign_coords(coords={"x": list("efgh"), "y": ("x", [0, 2, 4, 6])})
<xarray.DataArray 'obj' (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'e' 'f' 'g' 'h'
    y        (x) int64 0 2 4 6

# works, too (same as .data / .values)
>>> obj.assign_coords(coords={
...     "x": obj.x.copy(data=list("efgh")).variable,
...     "y": ("x", [0, 2, 4, 6]),
... })
<xarray.DataArray 'obj' (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'e' 'f' 'g' 'h'
    y        (x) int64 0 2 4 6

# this drops "y"
>>> obj.assign_coords(coords={
...     "x": obj.x.copy(data=list("efgh")),
...     "y": ("x", [0, 2, 4, 6]),
... })
<xarray.DataArray 'obj' (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'e' 'f' 'g' 'h'
```

Passing a `DataArray` for `y`, like `obj.y * 2`, while also changing `x` (the type does not matter) always results in a `MergeError`:

```python
>>> obj.assign_coords(x=list("efgh"), y=obj.y * 2)
xarray.core.merge.MergeError: conflicting values for index 'x' on objects to be combined:
first value: Index(['e', 'f', 'g', 'h'], dtype='object', name='x')
second value: Index(['a', 'b', 'c', 'd'], dtype='object', name='x')
```

I would expect the result to be the same regardless of the type of the new coords.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3483/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);