id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 2120340151,PR_kwDOAMm_X85mHqI0,8714,Avoid coercing to numpy in `as_shared_dtypes`,35968931,open,0,,,3,2024-02-06T09:35:22Z,2024-03-28T18:31:50Z,,MEMBER,,0,pydata/xarray/pulls/8714,"- [x] Solves the problem in https://github.com/pydata/xarray/pull/8712#issuecomment-1929037299 - [ ] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] ~~New functions/methods are listed in `api.rst`~~ ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8714/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 2116695961,I_kwDOAMm_X85-KjeZ,8699,Wrapping a `kerchunk.Array` object directly with xarray,35968931,open,0,,,3,2024-02-03T22:15:07Z,2024-02-04T21:15:14Z,,MEMBER,,,,"### What is your issue? In https://github.com/fsspec/kerchunk/issues/377 the idea came up of using the xarray API to concatenate arrays which represent parts of a zarr store - i.e. using xarray to kerchunk a large set of netCDF files instead of using `kerchunk.combine.MultiZarrToZarr`. 
The [idea](https://github.com/fsspec/kerchunk/issues/377#issuecomment-1922688615) is to make something like this work for kerchunking sets of netCDF files into zarr stores ```python ds = xr.open_mfdataset( '/my/files*.nc', engine='kerchunk', # kerchunk registers an xarray IO backend that returns zarr.Array objects combine='nested', # 'by_coords' would require actually reading coordinate data parallel=True, # would use dask.delayed to generate reference dicts for each file in parallel ) ds # now wraps a bunch of zarr.Array / kerchunk.Array objects, no need for dask arrays ds.kerchunk.to_zarr(store='out.zarr') # kerchunk defines an xarray accessor that extracts the zarr arrays and serializes them (which could also be done in parallel if writing to parquet) ``` I had a go at doing this [in this notebook](https://gist.github.com/TomNicholas/d9eb8ac81d3fd214a23b5e921dbd72b7), and in doing so discovered a few potential issues with xarray's internals. For this to work xarray has to: - Wrap a `kerchunk.Array` object which barely defines any array API methods, including basically not supporting indexing at all, - Store all the information present in a kerchunked Zarr store but without ever loading any data, - Not create any indexes by default during dataset construction or during `xr.concat`, - Not try to do anything else that can't be defined for a `kerchunk.Array`. - Possibly we need the Lazy Indexing classes to support concatenation https://github.com/pydata/xarray/issues/4628 It's an interesting exercise in using xarray as an abstraction, with no access to real numerical values at all. 
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8699/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,,13221727,issue 1812811751,I_kwDOAMm_X85sDU_n,8008,"""Deep linking"" disparate documentation resources together",35968931,open,0,,,3,2023-07-19T22:18:55Z,2023-10-12T18:36:52Z,,MEMBER,,,,"### What is your issue? Our docs have a general issue with having lots of related resources that are not necessarily linked together in a useful way. This results in users (including myself!) getting ""stuck"" in one part of the docs and being unaware of material that would help them solve their specific issue. To give a concrete example, if a user wants to know about `coarsen`, there is relevant material: - In the [coarsen class docstring](https://docs.xarray.dev/en/stable/generated/xarray.core.rolling.DatasetCoarsen.html#xarray.core.rolling.DatasetCoarsen) - On the [reshaping page](https://docs.xarray.dev/en/stable/user-guide/reshaping.html#reshaping-via-coarsen) - On the [computations page](https://docs.xarray.dev/en/stable/user-guide/computation.html#coarsen-large-arrays) - On the [""how do I?"" page](https://docs.xarray.dev/en/stable/howdoi.html) - On the [tutorial repository](https://tutorial.xarray.dev/fundamentals/03.3_windowed.html?highlight=coarsen#coarsening) Different types of material are great, but only some of these resources are linked to others. `Coarsen` is actually pretty well covered overall, but for other functions there might be no useful linking at all, or no examples in the docstrings. --- The biggest missed opportunity here is the way all the great content on the [tutorial.xarray.dev](https://tutorial.xarray.dev/) repository is not linked from anywhere on the main documentation site (I believe). 
To address that we could either (a) integrate the `tutorial.xarray.dev` material into the main site or (b) add a lot more cross-linking between the two sites. Identifying sections that could be linked and adding links would be a great task for new contributors.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8008/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1694956396,I_kwDOAMm_X85lBvts,7813,Task naming for general chunkmanagers,35968931,open,0,,,3,2023-05-03T22:56:46Z,2023-05-05T10:30:39Z,,MEMBER,,,,"### What is your issue? (Follow-up to #7019) When you create a dask graph of xarray operations, the tasks in the graph get useful names according to the name of the DataArray they operate on, or whether they represent an `open_dataset` call. Currently for cubed this doesn't work, for example this graph from https://github.com/pangeo-data/distributed-array-examples/issues/2#issuecomment-1533852877: ![image](https://user-images.githubusercontent.com/35968931/236056613-48f3925a-8aa6-418c-b204-1a57b612ff93.png) cc @tomwhite @dcherian ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7813/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1512290017,I_kwDOAMm_X85aI7bh,7403,Zarr error when trying to overwrite part of existing store,35968931,open,0,,,3,2022-12-28T00:40:16Z,2023-01-11T21:26:10Z,,MEMBER,,,,"### What happened? `to_zarr` threw an error when I tried to overwrite part of an existing zarr store. ### What did you expect to happen? With mode `w` I was expecting it to overwrite part of the store with no complaints. 
I expected that because that's what the docstring of `to_zarr` says: > `mode ({""w"", ""w-"", ""a"", ""r+"", None}, optional)` – Persistence mode: “w” means create (overwrite if exists); “w-” means create (fail if exists); “a” means override existing variables (create if does not exist); The default mode is ""w"", so I was expecting it to overwrite. ### Minimal Complete Verifiable Example ```Python import xarray as xr import numpy as np np.random.seed(0) ds = xr.Dataset() ds[""data""] = (['x', 'y'], np.random.random((100,100))) ds.to_zarr(""test.zarr"") print(ds[""data""].mean().compute()) # returns array(0.49645889) as expected ds = xr.open_dataset(""test.zarr"", engine='zarr', chunks={}) ds[""data""].mean().compute() print(ds[""data""].mean().compute()) # still returns array(0.49645889) as expected ds.to_zarr(""test.zarr"", mode=""a"") ``` ```python array(0.49645889) array(0.49645889) Traceback (most recent call last): File ""/home/tom/Documents/Work/Code/experimentation/bugs/datatree_nans/mwe_xarray.py"", line 16, in ds.to_zarr(""test.zarr"") File ""/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/core/dataset.py"", line 2091, in to_zarr return to_zarr( # type: ignore File ""/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/backends/api.py"", line 1628, in to_zarr zstore = backends.ZarrStore.open_group( File ""/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/backends/zarr.py"", line 420, in open_group zarr_group = zarr.open_group(store, **open_kwargs) File ""/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/zarr/hierarchy.py"", line 1389, in open_group raise ContainsGroupError(path) zarr.errors.ContainsGroupError: path '' contains a group ``` ### MVCE confirmation - [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [X] Complete example — the example is self-contained, including all data and the text of any traceback. 
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [X] New issue — a search of GitHub Issues suggests this is not a duplicate. ### Relevant log output _No response_ ### Anything else we need to know? I would like to know what the intended result is supposed to be here, so that I can make sure datatree behaves the same way, see https://github.com/xarray-contrib/datatree/issues/168. ### Environment Main branch of xarray, zarr v2.13.3","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7403/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 906023492,MDExOlB1bGxSZXF1ZXN0NjU3MDYxODI5,5400,Multidimensional histogram,35968931,open,0,,,3,2021-05-28T20:38:53Z,2022-11-21T22:41:01Z,,MEMBER,,0,pydata/xarray/pulls/5400,"Initial work on integrating the multi-dimensional dask-powered histogram functionality from xhistogram into xarray. Just working on the skeleton to fit around the histogram algorithm for now, to be filled in later. - [x] Closes #4610 - [x] API skeleton - [x] Input checking - [ ] Internal `blockwise` algorithm from https://github.com/xgcm/xhistogram/pull/49 - [x] Redirect `plot.hist` - [x] `da.weighted().hist()` - [ ] Tests added for results - [x] Hypothesis tests for different chunking patterns - [ ] Examples in documentation - [ ] Examples in docstrings - [x] Type hints (first time trying these so might be wrong) - [ ] Passes `pre-commit run --all-files` - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [x] New functions/methods are listed in `api.rst` - [x] Range argument - [ ] Handle multidimensional bins (for a future PR? 
- See https://github.com/xgcm/xhistogram/pull/59) - [ ] Handle `np.datetime64` dtypes by refactoring to use `np.searchsorted` (for a future PR? See [discussion](https://github.com/xgcm/xhistogram/pull/44#issuecomment-861139042)) - [ ] Fast path for uniform bin widths (for a future PR? See [suggestion](https://github.com/xgcm/xhistogram/issues/63#issuecomment-861662430)) Question: `da.hist()` or `da.histogram()`?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5400/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 400289716,MDU6SXNzdWU0MDAyODk3MTY=,2686,Is `create_test_data()` public API?,35968931,open,0,,,3,2019-01-17T14:00:20Z,2022-04-09T01:48:14Z,,MEMBER,,,,"We want to encourage people to use and extend xarray, and we already provide testing functions as public API to help with this. One function I keep using when writing code which uses xarray is `xarray.tests.test_dataset.create_test_data()`. This is very useful for quickly writing tests for the same reasons that it's useful in xarray's internal tests, but it's not explicitly public API. This means that there's no guarantee it won't change/disappear, which is [not ideal](https://github.com/boutproject/xBOUT/issues/26) if you're trying to write a test suite for separate software. But so many tests in xarray rely on it that presumably it's not going to get changed. Is there any reason why it shouldn't be public API? Is there something I should use instead? ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2686/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue