id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 2266174558,I_kwDOAMm_X86HExRe,8975,Xarray sponsorship guidelines,1217238,open,0,,,3,2024-04-26T17:05:01Z,2024-04-30T20:52:33Z,,MEMBER,,,,"### At what level of support should Xarray acknowledge sponsors on our website? I would like to surface this for open discussion because there are potential sponsoring organizations with conflicts of interest with members of Xarray's leadership team (e.g., [Earthmover](https://earthmover.io/), which employs @jhamman, @rabernat and @dcherian). My suggestion is to use [NumPy's guidelines](https://numpy.org/neps/nep-0046-sponsorship-guidelines.html), with an adjustment down to 1/3 of the thresholds to account for the smaller size of the project: - $10,000/yr for unrestricted financial contributions (e.g., donations) - $20,000/yr for financial contributions for a particular purpose (e.g., grants) - $30,000/yr for in-kind contributions (e.g., time for employees to contribute) - 2 person-months/yr of paid work time for one or more Xarray maintainers or regular contributors to any Xarray team or activity The NumPy guidelines also include a grace period of a minimum of 6 months for acknowledging support. I would suggest increasing this to a minimum of 1 year for Xarray. I would greatly appreciate any feedback from members of the community, either in this issue or on the next [team meeting](https://docs.xarray.dev/en/stable/developers-meeting.html).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8975/reactions"", ""total_count"": 6, ""+1"": 5, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 271043420,MDU6SXNzdWUyNzEwNDM0MjA=,1689,Roundtrip serialization of coordinate variables with spaces in their names,1217238,open,0,,,5,2017-11-03T16:43:20Z,2024-03-22T14:02:48Z,,MEMBER,,,,"If coordinates have spaces in their names, they get restored from netCDF files as data variables instead: ``` >>> xarray.open_dataset(xarray.Dataset(coords={'name with spaces': 1}).to_netcdf()) Dimensions: () Data variables: name with spaces int32 1 ```` This happens because the CF convention is to indicate coordinates as a space separated string, e.g., `coordinates='latitude longitude'`. Even though these aren't CF compliant variable names (which cannot have strings) It would be nice to have an ad-hoc convention for xarray that allows us to serialize/deserialize coordinates in all/most cases. Maybe we could use escape characters for spaces (e.g., `coordinates='name\ with\ spaces'`) or quote names if they have spaces (e.g., `coordinates='""name\ with\ spaces""'`? At the very least, we should issue a warning in these cases.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1689/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 267542085,MDU6SXNzdWUyNjc1NDIwODU=,1647,Representing missing values in string arrays on disk,1217238,closed,0,,,3,2017-10-23T05:01:10Z,2024-02-06T13:03:40Z,2024-02-06T13:03:40Z,MEMBER,,,,"This came up as part of my clean-up of serializing unicode strings in https://github.com/pydata/xarray/pull/1648. There are two ways to represent strings in netCDF files. 
- As character arrays (`NC_CHAR`), supported by both netCDF3 and netCDF4 - As variable length unicode strings (`NC_STRING`), only supported by netCDF4/HDF5. Currently, by default (if no `_FillValue` is set) we replace missing values (NaN) with an empty string when writing data to disk. For character arrays, we *could* use the normal `_FillValue` mechanism to set a fill value and decode when data is read back from disk. In fact, this already currently works for `dtype=bytes` (though it isn't documented): ``` In [10]: ds = xr.Dataset({'foo': ('x', np.array([b'bar', np.nan], dtype=object), {}, {'_FillValue': b''})}) In [11]: ds Out[11]: Dimensions: (x: 2) Dimensions without coordinates: x Data variables: foo (x) object b'bar' nan In [12]: ds.to_netcdf('foobar.nc') In [13]: xr.open_dataset('foobar.nc').load() Out[13]: Dimensions: (x: 2) Dimensions without coordinates: x Data variables: foo (x) object b'bar' nan ``` For variable length strings, it [currently isn't possible](https://github.com/Unidata/netcdf4-python/issues/730) to set a fill-value. So there's no good way to indicate missing values, though this may change if the future depending on the resolution of the netCDF-python issue. It would obviously be nice to always automatically round-trip missing values, both for strings and bytes. I see two possible ways to do this: 1. Require setting an explicit `_FillValue` when a string contains missing values, by raising an error if this isn't done. We need an explicit choice because there aren't any extra unused characters left over, at least for character arrays. (NetCDF explicitly allows arbitrary bytes to be stored in `NC_CHAR`, even though this maps to an HDF5 fixed-width string with ASCII encoding.) For variable length strings, we could potentially set a [non-character unicode symbol](https://en.wikipedia.org/wiki/Specials_(Unicode_block)) like `U+FFFF`, but again that isn't supported yet. 2. Treat empty strings as equivalent to a missing value (NaN). This has the advantage of not requiring an explicit choice of `_FillValue`, so we don't need to wait for any netCDF4 issues to be resolved. However, this does mean that empty strings would not round-trip. Still, given the relative prevalence of missing values vs empty strings in xarray/pandas, it's probably the lesser evil to not preserve empty string. The default option is to adopt neither of these, and keep the current behavior where missing values are written as empty strings and not decoded at all. Any opinions? I am leaning towards option (2).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1647/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 842436143,MDU6SXNzdWU4NDI0MzYxNDM=,5081,Lazy indexing arrays as a stand-alone package,1217238,open,0,,,6,2021-03-27T07:06:03Z,2023-12-15T13:20:03Z,,MEMBER,,,,"From @rabernat on [Twitter](https://twitter.com/rabernat/status/1330707155742322689): > ""Xarray has some secret private classes for lazily indexing / wrapping arrays that are so useful I think they should be broken out into a standalone package. https://github.com/pydata/xarray/blob/master/xarray/core/indexing.py#L516"" The idea here is create a first-class ""duck array"" library for lazy indexing that could replace xarray's internal classes for lazy indexing. This would be in some ways similar to dask.array, but much simpler, because it doesn't have to worry about parallel computing. 
Desired features: - Lazy indexing - Lazy transposes - Lazy concatenation (#4628) and stacking - Lazy vectorized operations (e.g., unary and binary arithmetic) - needed for decoding variables from disk (`xarray.encoding`) and - building lazy multi-dimensional coordinate arrays corresponding to map projections (#3620) - Maybe: lazy reshapes (#4113) A common feature of these operations is they can (and almost always should) be _fused_ with indexing: if N elements are selected via indexing, only O(N) compute and memory is required to produce them, regards of the size of the original arrays as long as the number of applied operations can be treated as a constant. Memory access is significantly slower than compute on modern hardware, so recomputing these operations on the fly is almost always a good idea. Out of scope: lazy computation when indexing could require access to many more elements to compute the desired value than are returned. For example, `mean()` probably should not be lazy, because that could involve computation of a very large number of elements that one might want to cache. This is valuable functionality for Xarray for two reasons: 1. It allows for ""previewing"" small bits of data loaded from disk or remote storage, even if that data needs some form of cheap ""decoding"" from its form on disk. 2. It allows for xarray to decode data in a lazy fashion that is compatible with full-featured systems for lazy computation (e.g., Dask), without requiring the user to choose dask when reading the data. Related issues: - [Proposal] Expose Variable without Pandas dependency #3981 - Lazy concatenation of arrays #4628 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5081/reactions"", ""total_count"": 6, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 6, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 197939448,MDU6SXNzdWUxOTc5Mzk0NDg=,1189,Document using a spawning multiprocessing pool for multiprocessing with dask,1217238,closed,0,,,3,2016-12-29T01:21:50Z,2023-12-05T21:51:04Z,2023-12-05T21:51:04Z,MEMBER,,,,"This is a nice option for working with in-file HFD5/netCDF4 compression: https://github.com/pydata/xarray/pull/1128#issuecomment-261936849 Mixed multi-threading/multi-processing could also be interesting, if anyone wants to revive that: https://github.com/dask/dask/pull/457 (I think it would work now that xarray data stores are pickle-able) CC @mrocklin","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1189/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 430188626,MDU6SXNzdWU0MzAxODg2MjY=,2873,Dask distributed tests fail locally,1217238,closed,0,,,3,2019-04-07T20:26:53Z,2023-12-05T21:43:02Z,2023-12-05T21:43:02Z,MEMBER,,,,"I'm not sure why, but when I run the integration tests with dask-distributed locally (on my MacBook pro), they fail: ``` $ pytest xarray/tests/test_distributed.py --maxfail 1 ================================================ test session starts ================================================= platform darwin -- Python 3.7.2, pytest-4.0.1, py-1.7.0, pluggy-0.8.0 rootdir: /Users/shoyer/dev/xarray, inifile: setup.cfg plugins: repeat-0.7.0 collected 19 items xarray/tests/test_distributed.py F ====================================================== FAILURES ====================================================== __________________________ 
test_dask_distributed_netcdf_roundtrip[netcdf4-NETCDF3_CLASSIC] ___________________________ loop = tmp_netcdf_filename = '/private/var/folders/15/qdcz0wqj1t9dg40m_ld0fjkh00b4kd/T/pytest-of-shoyer/pytest-3/test_dask_distributed_netcdf_r0/testfile.nc' engine = 'netcdf4', nc_format = 'NETCDF3_CLASSIC' @pytest.mark.parametrize('engine,nc_format', ENGINES_AND_FORMATS) # noqa def test_dask_distributed_netcdf_roundtrip( loop, tmp_netcdf_filename, engine, nc_format): if engine not in ENGINES: pytest.skip('engine not available') chunks = {'dim1': 4, 'dim2': 3, 'dim3': 6} with cluster() as (s, [a, b]): with Client(s['address'], loop=loop): original = create_test_data().chunk(chunks) if engine == 'scipy': with pytest.raises(NotImplementedError): original.to_netcdf(tmp_netcdf_filename, engine=engine, format=nc_format) return original.to_netcdf(tmp_netcdf_filename, engine=engine, format=nc_format) with xr.open_dataset(tmp_netcdf_filename, chunks=chunks, engine=engine) as restored: assert isinstance(restored.var1.data, da.Array) computed = restored.compute() > assert_allclose(original, computed) xarray/tests/test_distributed.py:87: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ../../miniconda3/envs/xarray-py37/lib/python3.7/contextlib.py:119: in __exit__ next(self.gen) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ nworkers = 2, nanny = False, worker_kwargs = {}, active_rpc_timeout = 1, scheduler_kwargs = {} @contextmanager def cluster(nworkers=2, nanny=False, worker_kwargs={}, active_rpc_timeout=1, scheduler_kwargs={}): ... # trimmed start = time() while list(ws): sleep(0.01) > assert time() < start + 1, 'Workers still around after one second' E AssertionError: Workers still around after one second ../../miniconda3/envs/xarray-py37/lib/python3.7/site-packages/distributed/utils_test.py:721: AssertionError ------------------------------------------------ Captured stderr call ------------------------------------------------ distributed.scheduler - INFO - Clear task state distributed.scheduler - INFO - Scheduler at: tcp://127.0.0.1:51715 distributed.worker - INFO - Start worker at: tcp://127.0.0.1:51718 distributed.worker - INFO - Listening to: tcp://127.0.0.1:51718 distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:51715 distributed.worker - INFO - ------------------------------------------------- distributed.worker - INFO - Threads: 1 distributed.worker - INFO - Memory: 17.18 GB distributed.worker - INFO - Local Directory: /Users/shoyer/dev/xarray/_test_worker-5cabd1b7-4d9c-49eb-a79e-205c588f5dae/worker-n8uv72yx distributed.worker - INFO - ------------------------------------------------- distributed.worker - INFO - Start worker at: tcp://127.0.0.1:51720 distributed.worker - INFO - Listening to: tcp://127.0.0.1:51720 distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:51715 distributed.scheduler - INFO - Register tcp://127.0.0.1:51718 distributed.worker - INFO - ------------------------------------------------- distributed.worker - INFO - Threads: 1 distributed.worker - INFO - Memory: 17.18 GB distributed.worker - INFO - Local Directory: /Users/shoyer/dev/xarray/_test_worker-71a426d4-bd34-4808-9d33-79cac2bb4801/worker-a70rlf4r distributed.worker - INFO - ------------------------------------------------- distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:51718 distributed.core - INFO - Starting established 
connection distributed.worker - INFO - Registered to: tcp://127.0.0.1:51715 distributed.worker - INFO - ------------------------------------------------- distributed.core - INFO - Starting established connection distributed.scheduler - INFO - Register tcp://127.0.0.1:51720 distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:51720 distributed.core - INFO - Starting established connection distributed.worker - INFO - Registered to: tcp://127.0.0.1:51715 distributed.worker - INFO - ------------------------------------------------- distributed.core - INFO - Starting established connection distributed.scheduler - INFO - Receive client connection: Client-59a7918c-5972-11e9-912a-8c85907bce57 distributed.core - INFO - Starting established connection distributed.core - INFO - Event loop was unresponsive in Worker for 1.05s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability. distributed.scheduler - INFO - Receive client connection: Client-worker-5a5c81de-5972-11e9-9136-8c85907bce57 distributed.core - INFO - Starting established connection distributed.core - INFO - Event loop was unresponsive in Worker for 1.33s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability. distributed.scheduler - INFO - Receive client connection: Client-worker-5b2496d8-5972-11e9-9137-8c85907bce57 distributed.core - INFO - Starting established connection distributed.scheduler - INFO - Remove client Client-59a7918c-5972-11e9-912a-8c85907bce57 distributed.scheduler - INFO - Remove client Client-59a7918c-5972-11e9-912a-8c85907bce57 distributed.scheduler - INFO - Close client connection: Client-59a7918c-5972-11e9-912a-8c85907bce57 distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:51720 distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:51718 distributed.scheduler - INFO - Remove worker tcp://127.0.0.1:51720 distributed.core - INFO - Removing comms to tcp://127.0.0.1:51720 distributed.scheduler - INFO - Remove worker tcp://127.0.0.1:51718 distributed.core - INFO - Removing comms to tcp://127.0.0.1:51718 distributed.scheduler - INFO - Lost all workers distributed.scheduler - INFO - Remove client Client-worker-5b2496d8-5972-11e9-9137-8c85907bce57 distributed.scheduler - INFO - Remove client Client-worker-5a5c81de-5972-11e9-9136-8c85907bce57 distributed.scheduler - INFO - Close client connection: Client-worker-5b2496d8-5972-11e9-9137-8c85907bce57 distributed.scheduler - INFO - Close client connection: Client-worker-5a5c81de-5972-11e9-9136-8c85907bce57 distributed.scheduler - INFO - Scheduler closing... 
distributed.scheduler - INFO - Scheduler closing all comms ``` Version info: ``` In [2]: xarray.show_versions() INSTALLED VERSIONS ------------------ commit: 2ce0639ee2ba9c7b1503356965f77d847d6cfcdf python: 3.7.2 (default, Dec 29 2018, 00:00:04) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 18.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.12.1+4.g2ce0639e pandas: 0.24.0 numpy: 1.15.4 scipy: 1.1.0 netCDF4: 1.4.3.2 pydap: None h5netcdf: 0.7.0 h5py: 2.9.0 Nio: None zarr: 2.2.0 cftime: 1.0.3.4 nc_time_axis: None PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.2.1 dask: 1.1.5 distributed: 1.26.1 matplotlib: 3.0.2 cartopy: 0.17.0 seaborn: 0.9.0 setuptools: 40.0.0 pip: 18.0 conda: None pytest: 4.0.1 IPython: 6.5.0 sphinx: 1.8.2 ``` @mrocklin does this sort of error look familiar to you?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2873/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue 707647715,MDExOlB1bGxSZXF1ZXN0NDkyMDEzODg4,4453,Simplify and restore old behavior for deep-copies,1217238,closed,0,,,3,2020-09-23T20:10:33Z,2023-09-14T03:06:34Z,2023-09-14T03:06:33Z,MEMBER,,1,pydata/xarray/pulls/4453,"Intended to fix https://github.com/pydata/xarray/issues/4449 The goal is to restore behavior to match what we had prior to https://github.com/pydata/xarray/pull/4379 for all types of `data` other than `np.ndarray` objects Needs tests! - [ ] Closes #xxxx - [ ] Tests added - [ ] Passes `isort . && black . && mypy . && flake8` - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4453/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 588105641,MDU6SXNzdWU1ODgxMDU2NDE=,3893,HTML repr in the online docs,1217238,open,0,,,3,2020-03-26T02:17:51Z,2023-09-11T17:41:59Z,,MEMBER,,,,"I noticed two minor issues in our online docs, now that we've switched to the hip new HTML repr by default. 1. Most doc pages still show text, not HTML. I suspect this is a limitation of the [IPython sphinx derictive](https://ipython.readthedocs.io/en/stable/sphinxext.html) we use for our snippets. We might be able to fix that by switching to [jupyter-sphinx](https://jupyter-sphinx.readthedocs.io/en/latest/)? 2. The ""attributes"" part of the HTML repr in our notebook examples [looks a little funny](http://xarray.pydata.org/en/stable/examples/multidimensional-coords.html), with strange blue formatting around each attribute name. It looks like part of the outer style of our docs is leaking into the HTML repr: ![image](https://user-images.githubusercontent.com/1217238/77603390-31bc5a80-6ecd-11ea-911d-f2b6ed2714f6.png) ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3893/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1376109308,I_kwDOAMm_X85SBcL8,7045,Should Xarray stop doing automatic index-based alignment?,1217238,open,0,,,13,2022-09-16T15:31:03Z,2023-08-23T07:42:34Z,,MEMBER,,,,"### What is your issue? 
I am increasingly thinking that automatic index-based alignment in Xarray (copied from pandas) may have been a design mistake. Almost every time I work with datasets with different indexes, I find myself writing code to explicitly align them: 1. Automatic alignment is **hard to predict**. The implementation is complicated, and the exact mode of automatic alignment (outer vs inner vs left join) depends on the specific operation. It's also no longer possible to predict the shape (or even the dtype) resulting from most Xarray operations purely from input shape/dtype. 2. Automatic alignment brings unexpected **performance penalty**. In some domains (analytics) this is OK, but in others (e.g,. numerical modeling or deep learning) this is a complete deal-breaker. 3. Automatic alignment is **not useful for float indexes**, because exact matches are rare. In practice, this makes it less useful in Xarray's usual domains than it for pandas. Would it be insane to consider changing Xarray's behavior to stop doing automatic alignment? I imagine we could roll this out slowly, first with warnings and then with an option for disabling it. If you think this is a good or bad idea, consider responding to this issue with a 👍 or 👎 reaction.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7045/reactions"", ""total_count"": 13, ""+1"": 9, ""-1"": 2, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 2}",,,13221727,issue 342928718,MDExOlB1bGxSZXF1ZXN0MjAyNzE0MjUx,2302,WIP: lazy=True in apply_ufunc(),1217238,open,0,,,1,2018-07-20T00:01:21Z,2023-07-18T04:19:17Z,,MEMBER,,0,pydata/xarray/pulls/2302," - [x] Closes https://github.com/pydata/xarray/issues/2298 - [ ] Tests added - [ ] Tests passed - [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API Still needs more tests and documentation.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2302/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 1767947798,PR_kwDOAMm_X85TkPzV,7933,Update calendar for developers meeting,1217238,closed,0,,,0,2023-06-21T16:09:44Z,2023-06-21T17:56:22Z,2023-06-21T17:56:22Z,MEMBER,,0,pydata/xarray/pulls/7933,"The old calendar was on @jhamman's UCAR account, which he no longer has access to! xref https://github.com/pydata/xarray/issues/4001","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7933/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 479942077,MDU6SXNzdWU0Nzk5NDIwNzc=,3213,How should xarray use/support sparse arrays?,1217238,open,0,,,55,2019-08-13T03:29:42Z,2023-06-07T15:43:55Z,,MEMBER,,,,"I'm looking forward to being easily able to create sparse xarray objects from pandas: https://github.com/pydata/xarray/issues/3206 Are there other xarray APIs that could make good use of sparse arrays, or could make sparse arrays easier to use? 
Some ideas: - `to_sparse()`/`to_dense()` methods for converting to/from sparse without requiring using `.data` - `to_dataframe()`/`to_series()` could grow options for skipping the fill-value in sparse arrays, so they can round-trip MultiIndex data back to pandas - Serialization to/from netCDF files, using some custom convention (see https://github.com/pydata/xarray/issues/1375#issuecomment-402699810)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3213/reactions"", ""total_count"": 14, ""+1"": 14, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1465287257,I_kwDOAMm_X85XVoJZ,7325,Support reading Zarr data via TensorStore,1217238,open,0,,,1,2022-11-27T00:12:17Z,2023-05-11T01:24:27Z,,MEMBER,,,,"### What is your issue? [TensorStore](https://github.com/google/tensorstore/) is another high performance API for reading distributed arrays in formats such as Zarr, written in C++. It could be interesting to write an Xarray storage backend using TensorStore as an alternative way to read Zarr files. As an exercise, I make a little demo of doing this: https://gist.github.com/shoyer/5b0c485979cc9c36a9685d8cf8e94565 I have not tested it for performance. The main annoyance is that TensorStore doesn't understand Zarr groups or Zarr array attributes, so I needed to write my own helpers for reading this metadata. Also, there's a bit of an impedance mis-match between TensorStore (where everything returns futures) and Xarray (which assumes that indexing results in numpy arrays). This could likely be improved with some amount of effort -- in particular https://github.com/pydata/xarray/pull/6874/files should help. CC @jbms who may have better ideas about how to use the TensorStore API.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7325/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 253395960,MDU6SXNzdWUyNTMzOTU5NjA=,1533,Index variables loaded from dask can be computed twice,1217238,closed,0,,,6,2017-08-28T17:18:27Z,2023-04-06T04:15:46Z,2023-04-06T04:15:46Z,MEMBER,,,,as reported by @crusaderky in #1522 ,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1533/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 209653741,MDU6SXNzdWUyMDk2NTM3NDE=,1285,FAQ page could use some updating,1217238,open,0,,,1,2017-02-23T03:29:16Z,2023-03-26T16:32:44Z,,MEMBER,,,,"Along the same lines as https://github.com/pydata/xarray/issues/1282, we haven't done much updating for frequently asked questions -- it's mostly still the original handful of FAQ entries I wrote in the first version of the docs. Topics worth addressing: - [ ] How xarray handles missing values - [x] File formats -- how can I read format *X* in xarray? (Maybe we should make a table with links to other packages?) (please add suggestions for this list!) 
StackOverflow may be a helpful reference here: http://stackoverflow.com/questions/tagged/python-xarray?sort=votes&pageSize=50","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1285/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 176805500,MDU6SXNzdWUxNzY4MDU1MDA=,1004,Remove IndexVariable.name,1217238,open,0,,,3,2016-09-14T03:27:43Z,2023-03-11T19:57:40Z,,MEMBER,,,,"As discussed in #947, we should remove the `IndexVariable.name` attribute. It should be fine to use an `IndexVariable` anywhere, regardless of whether or not it labels ticks along a dimension. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1004/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 98587746,MDU6SXNzdWU5ODU4Nzc0Ng==,508,Ignore missing variables when concatenating datasets?,1217238,closed,0,,,8,2015-08-02T06:03:57Z,2023-01-20T16:04:28Z,2023-01-20T16:04:28Z,MEMBER,,,,"Several users (@raj-kesavan, @richardotis, now myself) have wondered about how to concatenate xray Datasets with different variables. With the current `xray.concat`, you need to awkwardly create dummy variables filled with `NaN` in datasets that don't have them (or drop mismatched variables entirely). Neither of these are great options -- `concat` should have an option (the default?) to take care of this for the user. This would also be more consistent with `pd.concat`, which takes a more relaxed approach to matching dataframes with different variables (it does an outer join). ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/508/reactions"", ""total_count"": 6, ""+1"": 6, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 895983112,MDExOlB1bGxSZXF1ZXN0NjQ4MTM1NTcy,5351,Add xarray.backends.NoMatchingEngineError,1217238,open,0,,,4,2021-05-19T22:09:21Z,2022-11-16T15:19:54Z,,MEMBER,,0,pydata/xarray/pulls/5351," - [x] Closes #5329 - [x] Tests added - [x] Passes `pre-commit run --all-files` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [x] New functions/methods are listed in `api.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5351/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 803068773,MDExOlB1bGxSZXF1ZXN0NTY5MDU5MTEz,4879,Cache files for different CachingFileManager objects separately,1217238,closed,0,,,10,2021-02-07T21:48:06Z,2022-10-18T16:40:41Z,2022-10-18T16:40:40Z,MEMBER,,0,pydata/xarray/pulls/4879,"This means that explicitly opening a file multiple times with ``open_dataset`` (e.g., after modifying it on disk) now reopens the file from scratch, rather than reusing a cached version. If users want to reuse the cached file, they can reuse the same xarray object. We don't need this for handling many files in Dask (the original motivation for caching), because in those cases only a single CachingFileManager is created. I think this should some long-standing usability issues: #4240, #4862 Conveniently, this also obviates the need for some messy reference counting logic. 
- [x] Closes #4240, #4862 - [x] Tests added - [x] Passes `pre-commit run --all-files` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4879/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 623804131,MDU6SXNzdWU2MjM4MDQxMzE=,4090,Error with indexing 2D lat/lon coordinates,1217238,closed,0,,,2,2020-05-24T06:19:45Z,2022-09-28T12:06:03Z,2022-09-28T12:06:03Z,MEMBER,,,,"``` filslp = ""ChonghuaYinData/prmsl.mon.mean.nc"" filtmp = ""ChonghuaYinData/air.sig995.mon.mean.nc"" filprc = ""ChonghuaYinData/precip.mon.mean.nc"" ds_slp = xr.open_dataset(filslp).sel(time=slice(str(yrStrt)+'-01-01', str(yrLast)+'-12-31')) ds_slp ``` outputs: ``` Dimensions: (nbnds: 2, time: 480, x: 349, y: 277) Coordinates: * time (time) datetime64[ns] 1979-01-01 ... 2018-12-01 lat (y, x) float32 ... lon (y, x) float32 ... * y (y) float32 0.0 32463.0 64926.0 ... 8927325.0 8959788.0 * x (x) float32 0.0 32463.0 64926.0 ... 11264660.0 11297120.0 Dimensions without coordinates: nbnds Data variables: Lambert_Conformal int32 ... prmsl (time, y, x) float32 ... time_bnds (time, nbnds) float64 ... Attributes: Conventions: CF-1.2 centerlat: 50.0 centerlon: -107.0 comments: institution: National Centers for Environmental Prediction latcorners: [ 1.000001 0.897945 46.3544 46.63433 ] loncorners: [-145.5 -68.32005 -2.569891 148.6418 ] platform: Model standardpar1: 50.0 standardpar2: 50.000001 title: NARR Monthly Means dataset_title: NCEP North American Regional Reanalysis (NARR) history: created 2016/04/12 by NOAA/ESRL/PSD references: https://www.esrl.noaa.gov/psd/data/gridded/data.narr.html source: http://www.emc.ncep.noaa.gov/mmb/rreanl/index.html References: ``` ``` yrStrt = 1950 # manually specify for convenience yrLast = 2018 # 20th century ends 2018 clStrt = 1950 # reference climatology for SOI clLast = 1979 yrStrtP = 1979 # 1st year GPCP yrLastP = yrLast # match 20th century latT = -17.6 # Tahiti lonT = 210.75 latD = -12.5 # Darwin lonD = 130.83 # select grids of T and D T = ds_slp.sel(lat=latT, lon=lonT, method='nearest') D = ds_slp.sel(lat=latD, lon=lonD, method='nearest') ``` outputs: ``` --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in 1 # select grids of T and D ----> 2 T = ds_slp.sel(lat=latT, lon=lonT, method='nearest') 3 D = ds_slp.sel(lat=latD, lon=lonD, method='nearest') ~\Anaconda3\lib\site-packages\xarray\core\dataset.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs) 2004 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, ""sel"") 2005 pos_indexers, new_indexes = remap_label_indexers( -> 2006 self, indexers=indexers, method=method, tolerance=tolerance 2007 ) 2008 result = self.isel(indexers=pos_indexers, drop=drop) ~\Anaconda3\lib\site-packages\xarray\core\coordinates.py in remap_label_indexers(obj, indexers, method, tolerance, **indexers_kwargs) 378 379 pos_indexers, new_indexes = indexing.remap_label_indexers( --> 380 obj, v_indexers, method=method, tolerance=tolerance 381 ) 382 # attach indexer's coordinate to pos_indexers ~\Anaconda3\lib\site-packages\xarray\core\indexing.py in remap_label_indexers(data_obj, indexers, method, tolerance) 257 new_indexes = {} 258 --> 259 dim_indexers = get_dim_indexers(data_obj, indexers) 260 for dim, label in dim_indexers.items(): 261 try: 
~\Anaconda3\lib\site-packages\xarray\core\indexing.py in get_dim_indexers(data_obj, indexers) 223 ] 224 if invalid: --> 225 raise ValueError(""dimensions or multi-index levels %r do not exist"" % invalid) 226 227 level_indexers = defaultdict(dict) ValueError: dimensions or multi-index levels ['lat', 'lon'] do not exist ``` Does any know how fix to this problem?Thank you very much. _Originally posted by @JimmyGao0204 in https://github.com/pydata/xarray/issues/475#issuecomment-633172787_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4090/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1210147360,I_kwDOAMm_X85IIWIg,6504,test_weighted.test_weighted_operations_nonequal_coords should avoid depending on random number seed,1217238,closed,0,1217238,,0,2022-04-20T19:56:19Z,2022-08-29T20:42:30Z,2022-08-29T20:42:30Z,MEMBER,,,,"### What happened? In testing an upgrade to the latest version of xarray in our systems, I noticed this test failing: ``` def test_weighted_operations_nonequal_coords(): # There are no weights for a == 4, so that data point is ignored. weights = DataArray(np.random.randn(4), dims=(""a"",), coords=dict(a=[0, 1, 2, 3])) data = DataArray(np.random.randn(4), dims=(""a"",), coords=dict(a=[1, 2, 3, 4])) check_weighted_operations(data, weights, dim=""a"", skipna=None) q = 0.5 result = data.weighted(weights).quantile(q, dim=""a"") # Expected value computed using code from [https://aakinshin.net/posts/weighted-quantiles/](https://www.google.com/url?q=https://aakinshin.net/posts/weighted-quantiles/&sa=D) with values at a=1,2,3 expected = DataArray([0.9308707], coords={""quantile"": [q]}).squeeze() > assert_allclose(result, expected) E AssertionError: Left and right DataArray objects are not close E E Differing values: E L E array(0.919569) E R E array(0.930871) ``` It appears that this test is hard-coded to match a particular random number seed, which in turn would fix the resutls of `np.random.randn()`. ### What did you expect to happen? Whenever possible, Xarray's own tests should avoid relying on particular random number generators, e.g., in this case we could specify random numbers instead. A back-up option would be to explicitly set random seed locally inside the tests, e.g., by creating a `np.random.RandomState()` with a fixed seed and using that. The global random state used by `np.random.randn()` is sensitive to implementation details like the order in which tests are run. ### Minimal Complete Verifiable Example _No response_ ### Relevant log output _No response_ ### Anything else we need to know? _No response_ ### Environment ...","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6504/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1210267320,I_kwDOAMm_X85IIza4,6505,Dropping a MultiIndex variable raises an error after explicit indexes refactor,1217238,closed,0,,,3,2022-04-20T22:07:26Z,2022-07-21T14:46:58Z,2022-07-21T14:46:58Z,MEMBER,,,,"### What happened? With the latest released version of Xarray, it is possible to delete all variables corresponding to a MultiIndex by simply deleting the name of the MultiIndex. After the explicit indexes refactor (i.e,. using the ""main"" development branch) this now raises error about how this would ""corrupt"" index state. 
This comes up when using `drop()` and `assign_coords()` and possibly some other methods. This is not hard to work around, but we may want to consider this bug a blocker for the next Xarray release. I found the issue surfaced in several projects when attempting to use the new version of Xarray inside Google's codebase. CC @benbovy in case you have any thoughts to share. ### What did you expect to happen? For now, we should preserve the behavior of deleting the variables corresponding to MultiIndex levels, but should issue a deprecation warning encouraging users to explicitly delete everything. ### Minimal Complete Verifiable Example ```Python import xarray array = xarray.DataArray( [[1, 2], [3, 4]], dims=['x', 'y'], coords={'x': ['a', 'b']}, ) stacked = array.stack(z=['x', 'y']) print(stacked.drop('z')) print() print(stacked.assign_coords(z=[1, 2, 3, 4])) ``` ### Relevant log output ```Python ValueError Traceback (most recent call last) Input In [1], in () 3 array = xarray.DataArray( 4 [[1, 2], [3, 4]], 5 dims=['x', 'y'], 6 coords={'x': ['a', 'b']}, 7 ) 8 stacked = array.stack(z=['x', 'y']) ----> 9 print(stacked.drop('z')) 10 print() 11 print(stacked.assign_coords(z=[1, 2, 3, 4])) File ~/dev/xarray/xarray/core/dataarray.py:2425, in DataArray.drop(self, labels, dim, errors, **labels_kwargs) 2408 def drop( 2409 self, 2410 labels: Mapping = None, (...) 2414 **labels_kwargs, 2415 ) -> DataArray: 2416 """"""Backward compatible method based on `drop_vars` and `drop_sel` 2417 2418 Using either `drop_vars` or `drop_sel` is encouraged (...) 2423 DataArray.drop_sel 2424 """""" -> 2425 ds = self._to_temp_dataset().drop(labels, dim, errors=errors) 2426 return self._from_temp_dataset(ds) File ~/dev/xarray/xarray/core/dataset.py:4590, in Dataset.drop(self, labels, dim, errors, **labels_kwargs) 4584 if dim is None and (is_scalar(labels) or isinstance(labels, Iterable)): 4585 warnings.warn( 4586 ""dropping variables using `drop` will be deprecated; using drop_vars is encouraged."", 4587 PendingDeprecationWarning, 4588 stacklevel=2, 4589 ) -> 4590 return self.drop_vars(labels, errors=errors) 4591 if dim is not None: 4592 warnings.warn( 4593 ""dropping labels using list-like labels is deprecated; using "" 4594 ""dict-like arguments with `drop_sel`, e.g. `ds.drop_sel(dim=[labels])."", 4595 DeprecationWarning, 4596 stacklevel=2, 4597 ) File ~/dev/xarray/xarray/core/dataset.py:4549, in Dataset.drop_vars(self, names, errors) 4546 if errors == ""raise"": 4547 self._assert_all_in_dataset(names) -> 4549 assert_no_index_corrupted(self.xindexes, names) 4551 variables = {k: v for k, v in self._variables.items() if k not in names} 4552 coord_names = {k for k in self._coord_names if k in variables} File ~/dev/xarray/xarray/core/indexes.py:1394, in assert_no_index_corrupted(indexes, coord_names) 1392 common_names_str = "", "".join(f""{k!r}"" for k in common_names) 1393 index_names_str = "", "".join(f""{k!r}"" for k in index_coords) -> 1394 raise ValueError( 1395 f""cannot remove coordinate(s) {common_names_str}, which would corrupt "" 1396 f""the following index built from coordinates {index_names_str}:\n"" 1397 f""{index}"" 1398 ) ValueError: cannot remove coordinate(s) 'z', which would corrupt the following index built from coordinates 'z', 'x', 'y': ``` ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS ------------------ commit: 33cdabd261b5725ac357c2823bd0f33684d3a954 python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:42:03) [Clang 12.0.1 ] python-bits: 64 OS: Darwin OS-release: 21.4.0 machine: arm64 processor: arm byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 0.18.3.dev137+g96c56836 pandas: 1.4.2 numpy: 1.22.3 scipy: 1.8.0 netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.11.3 cftime: 1.6.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2022.04.1 distributed: 2022.4.1 matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: 2022.3.0 cupy: None pint: None sparse: None setuptools: 62.1.0 pip: 22.0.4 conda: None pytest: 7.1.1 IPython: 8.2.0 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6505/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 168272291,MDExOlB1bGxSZXF1ZXN0NzkzMjE2NTc=,924,WIP: progress toward making groupby work with multiple arguments,1217238,open,0,,,16,2016-07-29T08:07:57Z,2022-06-09T14:50:17Z,,MEMBER,,0,pydata/xarray/pulls/924,"Fixes #324 It definitely doesn't work properly yet, totally mixing up coordinates, data variables and multi-indexes (as shown by the failing tests). A simple example: ``` In [4]: coords = {'a': ('x', [0, 0, 1, 1]), 'b': ('y', [0, 0, 1, 1])} In [5]: square = xr.DataArray(np.arange(16).reshape(4, 4), coords=coords, dims=['x', 'y']) In [6]: square Out[6]: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15]]) Coordinates: b (y) int64 0 0 1 1 a (x) int64 0 0 1 1 * x (x) int64 0 1 2 3 * y (y) int64 0 1 2 3 In [7]: square.groupby(['a', 'b']).mean() Out[7]: array([[ 2.5, 4.5], [ 10.5, 12.5]]) Coordinates: * a (a) int64 0 1 * b (b) int64 0 1 In [8]: square.groupby(['x', 'y']).mean() Out[8]: array([[ 0., 1., 2., 3.], [ 4., 5., 6., 7.], [ 8., 9., 10., 11.], [ 12., 13., 14., 15.]]) Coordinates: * x (x) int64 0 1 2 3 * y (y) int64 0 1 2 3 ``` More examples: https://gist.github.com/shoyer/5cfa4d5751e8a78a14af25f8442ad8d5 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/924/reactions"", ""total_count"": 4, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 3, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 711626733,MDU6SXNzdWU3MTE2MjY3MzM=,4473,Wrap numpy-groupies to speed up Xarray's groupby aggregations,1217238,closed,0,,,8,2020-09-30T04:43:04Z,2022-05-15T02:38:29Z,2022-05-15T02:38:29Z,MEMBER,,,," **Is your feature request related to a problem? Please describe.** Xarray's groupby aggregations (e.g., `groupby(..).sum()`) are very slow compared to pandas, as described in https://github.com/pydata/xarray/issues/659. **Describe the solution you'd like** We could speed things up considerably (easily 100x) by wrapping the [numpy-groupies](https://github.com/ml31415/numpy-groupies) package. **Additional context** One challenge is how to handle dask arrays (and other duck arrays). In some cases it might make sense to apply the numpy-groupies function (using apply_ufunc), but in other cases it might be better to stick with the current indexing + concatenate solution. We could either pick some simple heuristics for choosing the algorithm to use on dask arrays, or could just stick with the current algorithm for now. In particular, it might make sense to stick with the current algorithm if there are a many chunks in the arrays to aggregated along the ""grouped"" dimension (depending on the size of the unique group values).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4473/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 326205036,MDU6SXNzdWUzMjYyMDUwMzY=,2180,How should Dataset.update() handle conflicting coordinates?,1217238,open,0,,,16,2018-05-24T16:46:23Z,2022-04-30T13:40:28Z,,MEMBER,,,,"Recently, we updated `Dataset.__setitem__` to drop conflicting coordinates from DataArray values being assigned if they conflict with existing coordinates (https://github.com/pydata/xarray/pull/2087). 
Because `update` and `__setitem__` share the same code path, this inadvertently updated `update` as well. Is this something we want? In v0.10.3, both `__setitem__` and `update` prioritize coordinates from the assigned objects (e.g., `value` in `dataset[key] = value`). In v0.10.4, both `__setitem__` and `update` prioritize coordinates from the original object (e.g., `dataset`). I'm not sure this is the right behavior. In particular, in the case of `dataset.update(other)` where `other` is also an `xarray.Dataset`, it seems like coordinates from `other` should take priority. Note that one advantage of the current logic (which is violated by my current fix in https://github.com/pydata/xarray/pull/2162), is that we maintain the invariant that `dataset[key] = value` is equivalent to `dataset.update({key: value})`.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2180/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 612918997,MDU6SXNzdWU2MTI5MTg5OTc=,4034,Fix tight_layout warning on cartopy facetgrid docs example,1217238,open,0,,,1,2020-05-05T21:54:46Z,2022-04-30T12:37:50Z,,MEMBER,,,,"Per the fix in https://github.com/pydata/xarray/pull/4032, I'm pretty sure we will soon start seeing a warning message printed on ReadTheDocs in Cartopy FacetGrid example: http://xarray.pydata.org/en/stable/plotting.html#maps This would be nice to fix for users, especially because it's likely users will see this warning when running code outside of our documentation, too.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4034/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 621123222,MDU6SXNzdWU2MjExMjMyMjI=,4081,"Wrap ""Dimensions"" onto multiple lines in xarray.Dataset repr?",1217238,closed,0,,,4,2020-05-19T16:31:59Z,2022-04-29T19:59:24Z,2022-04-29T19:59:24Z,MEMBER,,,,"Here's an example dataset of a large dataset from @alimanfoo: https://nbviewer.jupyter.org/gist/alimanfoo/b74b08465727894538d5b161b3ced764 ``` Dimensions: (__variants/BaseCounts_dim1: 4, __variants/MLEAC_dim1: 3, __variants/MLEAF_dim1: 3, alt_alleles: 3, ploidy: 2, samples: 1142, variants: 21442865) Coordinates: samples/ID (samples) object dask.array variants/CHROM (variants) object dask.array variants/POS (variants) int32 dask.array Dimensions without coordinates: __variants/BaseCounts_dim1, __variants/MLEAC_dim1, __variants/MLEAF_dim1, alt_alleles, ploidy, samples, variants Data variables: variants/ABHet (variants) float32 dask.array variants/ABHom (variants) float32 dask.array variants/AC (variants, alt_alleles) int32 dask.array variants/AF (variants, alt_alleles) float32 dask.array ... ``` I know similarly large datasets with lots of dimensions come up in other contexts as well, e.g., with geophysical model output. That's a very long first line! 
This would be easier to read as: ``` Dimensions: (__variants/BaseCounts_dim1: 4, __variants/MLEAC_dim1: 3, __variants/MLEAF_dim1: 3, alt_alleles: 3, ploidy: 2, samples: 1142, variants: 21442865) Coordinates: samples/ID (samples) object dask.array variants/CHROM (variants) object dask.array variants/POS (variants) int32 dask.array Dimensions without coordinates: __variants/BaseCounts_dim1, __variants/MLEAC_dim1, __variants/MLEAF_dim1, alt_alleles, ploidy, samples, variants Data variables: variants/ABHet (variants) float32 dask.array variants/ABHom (variants) float32 dask.array variants/AC (variants, alt_alleles) int32 dask.array variants/AF (variants, alt_alleles) float32 dask.array ... ``` or maybe: ``` Dimensions: __variants/BaseCounts_dim1: 4 __variants/MLEAC_dim1: 3 __variants/MLEAF_dim1: 3 alt_alleles: 3 ploidy: 2 samples: 1142 variants: 21442865 Coordinates: samples/ID (samples) object dask.array variants/CHROM (variants) object dask.array variants/POS (variants) int32 dask.array Dimensions without coordinates: __variants/BaseCounts_dim1, __variants/MLEAC_dim1, __variants/MLEAF_dim1, alt_alleles, ploidy, samples, variants Data variables: variants/ABHet (variants) float32 dask.array variants/ABHom (variants) float32 dask.array variants/AC (variants, alt_alleles) int32 dask.array variants/AF (variants, alt_alleles) float32 dask.array ... ``` `Dimensions without coordinates` could probably use some wrapping, too.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4081/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 205455788,MDU6SXNzdWUyMDU0NTU3ODg=,1251,Consistent naming for xarray's methods that apply functions,1217238,closed,0,,,13,2017-02-05T21:27:24Z,2022-04-27T20:06:25Z,2022-04-27T20:06:25Z,MEMBER,,,,"We currently have two types of methods that take a function to apply to xarray objects: - `pipe` (on `DataArray` and `Dataset`): apply a function to this entire object (`array.pipe(func)` -> `func(array)`) - `apply` (on `Dataset` and `GroupBy`): apply a function to each labeled object in this object (e.g., `ds.apply(func)` -> `ds({k: func(v) for k, v in ds.data_vars.items()})`). And one more method that we want to add but isn't finalized yet -- currently named `apply_ufunc`: - Apply a function that acts on unlabeled (i.e., numpy) arrays to each array in the object I'd like to have three distinct names that makes it clear what these methods do and how they are different. This has come up a few times recently, e.g., https://github.com/pydata/xarray/issues/1130 One proposal: rename `apply` to `map`, and then use `apply` only for methods that act on unlabeled arrays. This would require a deprecation cycle, but eventually it would let us add `.apply` methods for handling raw arrays to both Dataset and DataArray. (We could use a separate apply method from `apply_ufunc` to convert `dim` arguments to `axis` and not do automatic broadcasting.)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1251/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 342180429,MDU6SXNzdWUzNDIxODA0Mjk=,2298,Making xarray math lazy,1217238,open,0,,,7,2018-07-18T05:18:53Z,2022-04-19T15:38:59Z,,MEMBER,,,,"At SciPy, I had the realization that it would be relatively straightforward to make element-wise math between xarray objects lazy. 
This would let us support lazy coordinate arrays, a feature that has quite a few use-cases, e.g., for both geoscience and astronomy. The trick would be to write a lazy array class that holds an element-wise vectorized function and passes indexers on to its arguments. I haven't thought too hard about this yet for vectorized indexing, but it could be quite efficient for outer indexing. I have some prototype code but no tests yet. The question is how to hook this into xarray operations. In particular, supposing that the inputs to a function do no hold dask arrays: - Should we try to make *every* element-wise operation with vectorized functions (ufuncs) lazy by default? This might have negative performance implications and would be a little tricky to implement with xarray's current code, since we still implement binary operations like `+` with separate logic from `apply_ufunc`. - Should we make every element-wise operation that explicitly uses `apply_ufunc()` lazy by default? - Or should we only make element-wise operations lazy with `apply_ufunc()` if you use some special flag, e.g., `apply_ufunc(..., lazy=True)`? I am leaning towards the last option for now but would welcome other opinions.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2298/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 902622057,MDU6SXNzdWU5MDI2MjIwNTc=,5381,concat() with compat='no_conflicts' on dask arrays has accidentally quadratic runtime,1217238,open,0,,,0,2021-05-26T16:12:06Z,2022-04-19T03:48:27Z,,MEMBER,,,,"This ends up calling `fillna()` in a loop inside `xarray.core.merge.unique_variable()`, something like: ```python out = variables[0] for var in variables[1:]: out = out.fillna(var) ``` https://github.com/pydata/xarray/blob/55e5b5aaa6d9c27adcf9a7cb1f6ac3bf71c10dea/xarray/core/merge.py#L147-L149 This has quadratic behavior if the variables are stored in dask arrays (the dask graph gets one element larger after each loop iteration). This is OK for `merge()` (which typically only has two arguments) but is problematic for dealing with variables that shouldn't be concatenated inside `concat()`, which should be able to handle very long lists of arguments. I encountered this because `compat='no_conflicts'` is the default for `xarray.combine_nested()`. I guess there's also the related issue which is that even if we produced the output dask graph by hand without a loop, it still wouldn't be easy to evaluate for a large number of elements. Ideally we would use some sort of tree-reduction to ensure the operation can be parallelized. xref https://github.com/google/xarray-beam/pull/13","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5381/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 325439138,MDU6SXNzdWUzMjU0MzkxMzg=,2171,Support alignment/broadcasting with unlabeled dimensions of size 1,1217238,open,0,,,5,2018-05-22T19:52:21Z,2022-04-19T03:15:24Z,,MEMBER,,,,"Sometimes, it's convenient to include placeholder dimensions of size 1, which allows for removing any ambiguity related to the order of output dimensions. 
Currently, this is not supported with xarray: ``` >>> xr.DataArray([1], dims='x') + xr.DataArray([1, 2, 3], dims='x') ValueError: arguments without labels along dimension 'x' cannot be aligned because they have different dimension sizes: {1, 3} >>> xr.Variable(('x',), [1]) + xr.Variable(('x',), [1, 2, 3]) ValueError: operands cannot be broadcast together with mismatched lengths for dimension 'x': (1, 3) ``` However, these operations aren't really ambiguous. With size 1 dimensions, we could logically do broadcasting like NumPy arrays, e.g., ``` >>> np.array([1]) + np.array([1, 2, 3]) array([2, 3, 4]) ``` This would be particularly convenient if we add `keepdims=True` to xarray operations (#2170).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2171/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 665488672,MDU6SXNzdWU2NjU0ODg2NzI=,4267,CachingFileManager should not use __del__,1217238,open,0,,,2,2020-07-25T01:20:52Z,2022-04-17T21:42:39Z,,MEMBER,,,,"`__del__` is sometimes called after modules have been deallocated, which results in errors printed to stderr when Python exits. This manifests itself in the following bug: https://github.com/shoyer/h5netcdf/issues/50 Per https://github.com/shoyer/h5netcdf/issues/50#issuecomment-572191867, the right solution is probably to use `weakref.finalize`.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4267/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 469440752,MDU6SXNzdWU0Njk0NDA3NTI=,3139,"Change the signature of DataArray to DataArray(data, dims, coords, ...)?",1217238,open,0,,,1,2019-07-17T20:54:57Z,2022-04-09T15:28:51Z,,MEMBER,,,,"Currently, the signature of DataArray is `DataArray(data, coords, dims, ...)`: http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html In the long term, I think `DataArray(data, dims, coords, ...)` would be more intuitive: dimensions are a more fundamental part of xarray's data model than coordinates. Certainly I find it much more common to omit `coords` than to omit `dims` when I create a `DataArray`. My original reasoning for this argument order was that `dims` could be copied from `coords`, e.g., `DataArray(new_data, old_dataarray.coords)`, and it was nice to be able to pass this sole argument by position instead of by name. But a cleaner way to write this now is `old_dataarray.copy(data=new_data)`. The challenge in making any change here would be to have a smooth deprecation process, and that ideally avoids requiring users to rewrite all of their code and avoids loads of pointless/extraneous warnings. I'm not entirely sure this is possible. We could likely use heuristics to distinguish between `dims` and `coords` arguments regardless of their order, but this probably isn't something we would want to preserve in the long term. An alternative that might achieve some of the convenience of this change would be to allow for passing lists of strings in the `coords` argument by position, which are interpreted as dimensions, e.g., `DataArray(data, ['x', 'y'])`. 
The downside of this alternative is that it would add even more special cases to the `DataArray` constructor , which would make it harder to understand.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3139/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 327166000,MDExOlB1bGxSZXF1ZXN0MTkxMDMwMjA4,2195,WIP: explicit indexes,1217238,closed,0,,,3,2018-05-29T04:25:15Z,2022-03-21T14:59:52Z,2022-03-21T14:59:52Z,MEMBER,,0,pydata/xarray/pulls/2195,"Some utility functions that should be useful for https://github.com/pydata/xarray/issues/1603 Still very much a work in progress -- it would be great if someone has time to finish writing any of these in another PR!","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2195/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 864249974,MDU6SXNzdWU4NjQyNDk5NzQ=,5202,Make creating a MultiIndex in stack optional,1217238,closed,0,,,7,2021-04-21T20:21:03Z,2022-03-17T17:11:42Z,2022-03-17T17:11:42Z,MEMBER,,,,"As @Hoeze notes in https://github.com/pydata/xarray/issues/5179, calling `stack()` can be ""incredibly slow and memory-demanding, since it creates a MultiIndex of every possible coordinate in the array."" This is true with how `stack()` works currently, but I'm not sure this is necessary. I suspect it's a vestigial design choice from copying pandas, back from before Xarray had optional indexes. One benefit is that it's convenient for making `unstack()` the inverse of `stack()`, but isn't always required. Regardless of how we define the semantics for boolean indexing (https://github.com/pydata/xarray/issues/1887), it seems like it could be a good idea to allow stack to skip creating a MultiIndex for the new dimension, via a new keyword argument such as `ds.stack(index=False)`. This would be equivalent to calling `reset_index()` after `stack()` but would be cheaper because the MultiIndex is never created in the first place.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5202/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 237008177,MDU6SXNzdWUyMzcwMDgxNzc=,1460,groupby should still squeeze for non-monotonic inputs,1217238,open,0,,,5,2017-06-19T20:05:14Z,2022-03-04T21:31:41Z,,MEMBER,,,,"We can simply use `argsort()` to determine `group_indices` instead of `np.arange()`: https://github.com/pydata/xarray/blob/22ff955d53e253071f6e4fa849e5291d0005282a/xarray/core/groupby.py#L256","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1460/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 58117200,MDU6SXNzdWU1ODExNzIwMA==,324,Support multi-dimensional grouped operations and group_over,1217238,open,0,,741199,12,2015-02-18T19:42:20Z,2022-02-28T19:03:17Z,,MEMBER,,,,"Multi-dimensional grouped operations should be relatively straightforward -- the main complexity will be writing an N-dimensional concat that doesn't involve repetitively copying data. The idea with `group_over` would be to support groupby operations that act on a single element from each of the given groups, rather than the unique values. 
For example, `ds.group_over(['lat', 'lon'])` would let you iterate over or apply to 2D slices of `ds`, no matter how many dimensions it has. Roughly speaking (it's a little more complex for the case of non-dimension variables), `ds.group_over(dims)` would get translated into `ds.groupby([d for d in ds.dims if d not in dims])`. Related: #266 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/324/reactions"", ""total_count"": 18, ""+1"": 18, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1090700695,I_kwDOAMm_X85BAsWX,6125,[Bug]: HTML repr does not display well in notebooks hosted on GitHub,1217238,open,0,,,0,2021-12-29T19:05:49Z,2021-12-29T19:36:25Z,,MEMBER,,,,"### What happened? We see _both_ the raw text *and* a malformed version of the HTML (without CSS formatting). Example (https://github.com/microsoft/PlanetaryComputerExamples/blob/main/quickstarts/reading-zarr-data.ipynb): ![image](https://user-images.githubusercontent.com/1217238/147695209-127feae1-7dd2-48b9-9626-f0c8eb3815eb.png) ### What did you expect to happen? Either: 1. Ideally, we only see the HTML repr, with CSS formatting applied. 2. Or, if that isn't possible, we should figure out how to only show the raw text. nbviewer [gets this right](https://nbviewer.org/github/microsoft/PlanetaryComputerExamples/blob/main/quickstarts/reading-zarr-data.ipynb): ![image](https://user-images.githubusercontent.com/1217238/147695174-eebcefff-f99a-4391-b9c1-13ccf77f36ba.png) ### Minimal Complete Verifiable Example _No response_ ### Relevant log output _No response_ ### Anything else we need to know? _No response_ ### Environment NA","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6125/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1062709354,PR_kwDOAMm_X84u-sO9,6025,Simplify missing value handling in xarray.corr,1217238,closed,0,,,1,2021-11-24T17:48:03Z,2021-11-28T04:39:22Z,2021-11-28T04:39:22Z,MEMBER,,0,pydata/xarray/pulls/6025,"This PR simplifies the fix from https://github.com/pydata/xarray/pull/5731, specifically for the benefit of xarray.corr. There is no need to use `map_blocks` instead of using `where` directly. It is basically an alternative version of https://github.com/pydata/xarray/pull/5284.
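For illustration, a rough sketch of the masking idea with `where` (hypothetical helper and names, not the actual diff in this PR):

```python
import xarray as xr

def _mask_to_common_valid(da_a: xr.DataArray, da_b: xr.DataArray):
    # mask both inputs wherever either one is missing, using `where`
    # directly rather than wrapping the computation in `map_blocks`
    valid = da_a.notnull() & da_b.notnull()
    return da_a.where(valid), da_b.where(valid)
```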
It is potentially slightly less efficient to do this masking step when unnecessary, but I doubt this makes a noticeable performance difference in practice (and I doubt this optimization is useful inside `map_blocks`, anyways).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6025/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 1044151556,PR_kwDOAMm_X84uELYB,5935,Docs: fix URL for PTSA,1217238,closed,0,,,1,2021-11-03T21:56:44Z,2021-11-05T09:36:04Z,2021-11-05T09:36:04Z,MEMBER,,0,pydata/xarray/pulls/5935,One of the PTSA authors told me about the new URL by email.,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5935/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 874292512,MDU6SXNzdWU4NzQyOTI1MTI=,5251,Switch default for Zarr reading/writing to consolidated=True?,1217238,closed,0,,,4,2021-05-03T06:59:42Z,2021-08-30T15:21:11Z,2021-08-30T15:21:11Z,MEMBER,,,,"Consolidated metadata was a new feature in Zarr v2.3, which was released over two years ago (March 22, 2019). Since then, I have used `consolidated=True` _every_ time I've written or opened a Zarr store. As far as I can tell, this is almost always a good idea: - With local storage, it usually doesn't really matter. You spend a bit of time writing the consolidated metadata and have one extra file on disk, but the overhead is typically negligible. - With Cloud object stores or network filesystems, it can matter quite a large amount. Without consolidated metadata, these systems can be unusably slow for opening datasets. Cloud storage is of course the main use-case for Zarr. If you're using a local disk, you might as well stick with single files such as netCDF. I wonder if consolidated metadata is mature enough now that we could consider switching the default behavior in Xarray. From my perspective, this is a big ""gotcha"" for getting good performance with Zarr. More than one of my colleagues has been unimpressed with the performance of Zarr until they learned to set `consolidated=True`. I would suggest doing this in a way that is almost entirely backwards compatible, with only a minor performance cost for reading non-consolidated datasets: - `to_zarr()` switches the default to `consolidated=True`. The `consolidate_metadata()` will thus happen by default. - `open_zarr()` switches the default to `consolidated=None`, which means ""Try reading consolidated metadata, and fall-back to non-consolidated if that fails."" This will be slightly slower for non-consolidated metadata due to the extra file-lookup, but given that opening with non-consolidated metadata already requires a moderately large number of file look-ups, I doubt anyone will notice the difference. CC @rabernat ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5251/reactions"", ""total_count"": 11, ""+1"": 11, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 928402742,MDU6SXNzdWU5Mjg0MDI3NDI=,5516,Rename master branch -> main,1217238,closed,0,,,4,2021-06-23T15:45:57Z,2021-07-23T21:58:39Z,2021-07-23T21:58:39Z,MEMBER,,,,"This is a best practice for inclusive projects.
See https://github.com/github/renaming for guidance.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5516/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 948890466,MDExOlB1bGxSZXF1ZXN0NjkzNjY1NDEy,5624,Make typing-extensions optional,1217238,closed,0,,,6,2021-07-20T17:43:22Z,2021-07-22T23:30:49Z,2021-07-22T23:02:03Z,MEMBER,,0,pydata/xarray/pulls/5624,"Type checking may be a little worse if typing-extensions are not installed, but I don't think it's worth the trouble of adding another hard dependency just for one use for TypeGuard. Note: sadly this doesn't work yet. Mypy (and pylance) don't like the type alias defined with try/except. Any ideas? In the worst case, we could revert the TypeGuard entirely, but that would be a shame... - [x] Closes #5495 - [x] Passes `pre-commit run --all-files` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5624/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 890534794,MDU6SXNzdWU4OTA1MzQ3OTQ=,5295,"Engine is no longer inferred for filenames not ending in "".nc""",1217238,closed,0,,,0,2021-05-12T22:28:46Z,2021-07-15T14:57:54Z,2021-05-14T22:40:14Z,MEMBER,,,,"This works with xarray=0.17.0: ```python import xarray xarray.Dataset({'x': [1, 2, 3]}).to_netcdf('tmp') xarray.open_dataset('tmp') ``` On xarray 0.18.0, it fails: ``` --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () 2 3 xarray.Dataset({'x': [1, 2, 3]}).to_netcdf('tmp') ----> 4 xarray.open_dataset('tmp') /usr/local/lib/python3.7/dist-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs) 483 484 if engine is None: --> 485 engine = plugins.guess_engine(filename_or_obj) 486 487 backend = plugins.get_backend(engine) /usr/local/lib/python3.7/dist-packages/xarray/backends/plugins.py in guess_engine(store_spec) 110 warnings.warn(f""{engine!r} fails while guessing"", RuntimeWarning) 111 --> 112 raise ValueError(""cannot guess the engine, try passing one explicitly"") 113 114 ValueError: cannot guess the engine, try passing one explicitly ``` I'm not entirely sure what changed. My guess is that we used to fall-back to trying to use SciPy, but don't do that anymore. A potential fix would be reading strings as filenames in `xarray.backends.utils.read_magic_number`. Related: https://github.com/pydata/xarray/issues/5291","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5295/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 252707680,MDU6SXNzdWUyNTI3MDc2ODA=,1525,Consider setting name=False in Variable.chunk(),1217238,open,0,,,4,2017-08-24T19:34:28Z,2021-07-13T01:50:16Z,,MEMBER,,,,"@mrocklin writes: > The following will be slower: ``` b = (a.chunk(...) + 1) + (a.chunk(...) + 1) ``` > In current operation this will be optimized to ``` tmp = a.chunk(...) + 1 b = tmp + tmp ``` > So you'll lose that, but I suspect that in your case chunking the same dataset many times is somewhat rare. 
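As a rough sketch of the underlying trade-off at the dask level (plain `dask.array`, not xarray's internal `Variable.chunk()` code):

```python
import numpy as np
import dask.array as da

x = np.arange(1_000_000)

# default: the graph key is computed by hashing the input, so chunking the
# same data twice produces identical keys and repeated expressions like
# `(a + 1) + (a + 1)` can be deduplicated
a = da.from_array(x, chunks=100_000)

# name=False: skips the hashing step (cheaper graph construction), but the
# resulting arrays get unrelated keys, so a common intermediate is no
# longer recognized and shared
b = da.from_array(x, chunks=100_000, name=False)
```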
See here for discussion: https://github.com/pydata/xarray/pull/1517#issuecomment-324722153 Whether this is worth doing really depends on what people would find most useful -- and what is the most intuitive behavior.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1525/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 254888879,MDU6SXNzdWUyNTQ4ODg4Nzk=,1552,Flow chart for choosing indexing operations,1217238,open,0,,,2,2017-09-03T17:33:30Z,2021-07-11T22:26:17Z,,MEMBER,,,,"We have a lot of indexing operations, even though `sel_points` and `isel_points` are about to be deprecated (#1473). A flow chart / decision tree to help users pick the right indexing operation might be helpful (e.g., like [this skimage FlowChart](http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html)). It would ask various questions (e.g., do you have labels or integer positions? do you want to select or impose coordinates?) and then suggest the appropriate indexer methods. cc @fujiisoup ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1552/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 891281614,MDU6SXNzdWU4OTEyODE2MTQ=,5302,Suggesting specific IO backends to install when open_dataset() fails,1217238,closed,0,,,3,2021-05-13T18:45:28Z,2021-06-23T08:18:07Z,2021-06-23T08:18:07Z,MEMBER,,,,"Currently, Xarray's internal backends don't get registered unless the necessary dependencies are installed: https://github.com/pydata/xarray/blob/1305d9b624723b86050ca5b2d854e5326bbaa8e6/xarray/backends/netCDF4_.py#L567-L568 In order to facilitate suggesting a specific backend to install (e.g., to improve error messages from opening tutorial datasets https://github.com/pydata/xarray/issues/5291), I would suggest that Xarray _always_ registers its own backend entrypoints. Then we make the following changes to the plugin protocol: - `guess_can_open()` should work _regardless_ of whether the underlying backend is installed - `installed()` returns a boolean reporting whether the backend is installed. The default method in the base class would return `True`, for backwards compatibility. - `open_dataset()` of course should error if the backend is not installed. This will let us leverage the existing `guess_can_open()` functionality to suggest specific optional dependencies to install. E.g., if you supply a netCDF3 file: `Xarray cannot find a matching installed backend for this file in the installed backends [""h5netcdf""]. Consider installing one of the following backends which reports a match: [""scipy"", ""netcdf4""]` Does this seem reasonable and worthwhile? CC @aurghs @alexamici ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5302/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 874331538,MDExOlB1bGxSZXF1ZXN0NjI4OTE0NDQz,5252,"Add mode=""r+"" for to_zarr and use consolidated writes/reads by default",1217238,closed,0,,,14,2021-05-03T07:57:16Z,2021-06-22T06:51:35Z,2021-06-17T17:19:26Z,MEMBER,,0,pydata/xarray/pulls/5252,"`mode=""r+""` only allows for modifying pre-existing array values in a Zarr store. This makes it a safer default `mode` when doing a limited `region` write.
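A minimal usage sketch (assuming a pre-existing Zarr store at `path` whose arrays already have matching dtypes, shapes and chunks):

```python
import numpy as np
import xarray as xr

path = 'my-data.zarr'  # assumed to already exist with an 'x' dimension of length >= 100

# overwrite values for x=0..100 only; with mode='r+' the write can only
# modify pre-existing array values, never create variables or resize arrays
chunk = xr.Dataset({'u': (('x',), np.arange(100.0))})
chunk.to_zarr(path, mode='r+', region={'x': slice(0, 100)})
```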
It also offers a nice performance bonus when using consolidated metadata, because the store to modify can be opened in ""consolidated"" mode -- rather than painfully slow non-consolidated mode. This PR includes several related changes to `to_zarr()`: 1. It adds support for the new `mode=""r+""`. 2. `consolidated=True` in `to_zarr()` now means ""open in consolidated mode"" if using `mode=""r+""`, instead of ""write in consolidated mode"" (which would not make sense for r+). 3. It allows setting `consolidated=True` when using `region`, mostly for the sake of fast store opening with r+. 4. Validation in `to_zarr()` has been reorganized to always use the _existing_ Zarr group, rather than re-opening Zarr stores from scratch, which could require additional network requests. 5. Incidentally, I've renamed the `ZarrStore.ds` attribute to `ZarrStore.zarr_group`, which is a much more descriptive name. These changes gave me a ~5x boost in write performance in a large parallel job making use of `to_zarr` with `region`. - [x] Tests added - [x] Passes `pre-commit run --all-files` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5252/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 340733448,MDU6SXNzdWUzNDA3MzM0NDg=,2283,Exact alignment should allow missing dimension coordinates,1217238,open,0,,,2,2018-07-12T17:40:24Z,2021-06-15T09:52:29Z,,MEMBER,,,,"#### Code Sample, a copy-pastable example if possible ```python import xarray as xr xr.align(xr.DataArray([1, 2, 3], dims='x'), xr.DataArray([1, 2, 3], dims='x', coords=[[0, 1, 2]]), join='exact') ``` #### Problem description This currently results in an error, but a missing index of size 3 does not actually conflict: ```python-traceback --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () 1 xr.align(xr.DataArray([1, 2, 3], dims='x'), 2 xr.DataArray([1, 2, 3], dims='x', coords=[[0, 1, 2]]), ----> 3 join='exact') /usr/local/lib/python3.6/dist-packages/xarray/core/alignment.py in align(*objects, **kwargs) 129 raise ValueError( 130 'indexes along dimension {!r} are not equal' --> 131 .format(dim)) 132 index = joiner(matching_indexes) 133 joined_indexes[dim] = index ValueError: indexes along dimension 'x' are not equal ``` This surfaced as an issue on StackOverflow: https://stackoverflow.com/questions/51308962/computing-matrix-vector-multiplication-for-each-time-point-in-two-dataarrays #### Expected Output Both output arrays should end up with the `x` coordinate from the input that has it, like the output of the above expression if `join='inner'`: ``` ( array([1, 2, 3]) Coordinates: * x (x) int64 0 1 2, array([1, 2, 3]) Coordinates: * x (x) int64 0 1 2) ``` #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.6.3.final.0 python-bits: 64 OS: Linux OS-release: 4.14.33+ machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.10.7 pandas: 0.22.0 numpy: 1.14.5 scipy: 0.19.1 netCDF4: None h5netcdf: None h5py: 2.8.0 Nio: None zarr: None bottleneck: None cyordereddict: None dask: None distributed: None matplotlib: 2.1.2 cartopy: None seaborn: 0.7.1 setuptools: 39.1.0 pip: 10.0.1 conda: None pytest: None IPython: 5.5.0 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2283/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 842438533,MDU6SXNzdWU4NDI0Mzg1MzM=,5082,Move encoding from xarray.Variable to duck arrays?,1217238,open,0,,,2,2021-03-27T07:21:55Z,2021-06-13T01:34:00Z,,MEMBER,,,,"The `encoding` property on `Variable` has always been an awkward part of Xarray's API, and an example of poor separation of concerns. It add conceptual overhead to all uses of `xarray.Variable`, but exists only for the (somewhat niche) benefit of Xarray's backend IO functionality. This is particularly problematic if we consider the possible separation of `xarray.Variable` into a separate package to remove the pandas dependency (https://github.com/pydata/xarray/issues/3981). I think a cleaner way to handle `encoding` would be to move it from `Variable` onto array objects, specifically duck array objects that Xarray creates when loading data from disk. As long as these duck arrays don't ""propagate"" themselves under array operations but rather turn into raw numpy arrays (or whatever is wrapped), this would automatically resolve all issues around propagating `encoding` attributes (e.g., https://github.com/pydata/xarray/pull/5065, https://github.com/pydata/xarray/issues/1614). And users who don't care about `encoding` because they don't use Xarray's IO functionality would never need to think about it.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5082/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 416554477,MDU6SXNzdWU0MTY1NTQ0Nzc=,2797,Stalebot is being overly aggressive,1217238,closed,0,,,7,2019-03-03T19:37:37Z,2021-06-03T21:31:46Z,2021-06-03T21:22:48Z,MEMBER,,,,"E.g., see https://github.com/pydata/xarray/issues/1151 where stalebot closed an issue even after another comment. Is this something we need to reconfigure or just a bug? cc @pydata/xarray ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2797/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 276241764,MDU6SXNzdWUyNzYyNDE3NjQ=,1739,Utility to restore original dimension order after apply_ufunc,1217238,open,0,,,11,2017-11-23T00:47:57Z,2021-05-29T07:39:33Z,,MEMBER,,,,"This seems to be coming up quite a bit for wrapping functions that apply an operation along an axis, e.g., for `interpolate` in #1640 or `rank` in #1733. We should either write a utility function to do this or consider adding an option to `apply_ufunc`.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1739/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 901047466,MDU6SXNzdWU5MDEwNDc0NjY=,5372,Consider revising the _repr_inline_ protocol,1217238,open,0,,,0,2021-05-25T16:18:31Z,2021-05-25T16:18:31Z,,MEMBER,,,,"`_repr_inline_` looks like an [IPython special method](https://ipython.readthedocs.io/en/stable/config/integrating.html#rich-display) but is actually includes some xarray specific details: the result should not include `shape` or `dtype`. As I wrote in https://github.com/pydata/xarray/pull/5352, I would suggest revising it in one of two ways: 1. 
Giving it a name like `_xarray_repr_inline_` to make it clearer that it's Xarray specific 2. Include some more generic way of indicating that `shape`/`dtype` is redundant, e.g,. call it like `obj._repr_ndarray_inline_(dtype=False, shape=False)`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5372/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 891253662,MDExOlB1bGxSZXF1ZXN0NjQ0MTQ5Mzc2,5300,Better error message when no backend engine is found.,1217238,closed,0,,,4,2021-05-13T18:10:04Z,2021-05-18T21:23:00Z,2021-05-18T21:23:00Z,MEMBER,,0,pydata/xarray/pulls/5300,"Also includes a better error message when loading a tutorial dataset but an underlying IO dependency is not found. - [x] Fixes #5291 - [x] Tests added - [x] Passes `pre-commit run --all-files` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5300/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 890573049,MDExOlB1bGxSZXF1ZXN0NjQzNTc1Mjc5,5296,More robust guess_can_open for netCDF4/scipy/h5netcdf entrypoints,1217238,closed,0,,,1,2021-05-12T23:53:32Z,2021-05-14T22:40:14Z,2021-05-14T22:40:14Z,MEMBER,,0,pydata/xarray/pulls/5296,"The new version checks magic numbers in files on disk, not just already open file objects. I've also added a bunch of unit-tests. Fixes GH5295 - [x] Closes #5295 - [x] Tests added - [x] Passes `pre-commit run --all-files` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5296/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 46049691,MDU6SXNzdWU0NjA0OTY5MQ==,255,Add Dataset.to_pandas() method,1217238,closed,0,,987654,2,2014-10-17T00:01:36Z,2021-05-04T13:56:00Z,2021-05-04T13:56:00Z,MEMBER,,,,"This would be the complement of the DataArray constructor, converting an xray.DataArray into a 1D series, 2D DataFrame or 3D panel, whichever is appropriate. `to_pandas` would also makes sense for Dataset, if it could convert 0d datasets to series, e.g., `pd.Series({k: v.item() for k, v in ds.items()})` (there is currently no direct way to do this), and revert to to_dataframe for higher dimensional input. - [x] DataArray method - [ ] Dataset method ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/255/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 294241734,MDU6SXNzdWUyOTQyNDE3MzQ=,1887,Boolean indexing with multi-dimensional key arrays,1217238,open,0,,,13,2018-02-04T23:28:45Z,2021-04-22T21:06:47Z,,MEMBER,,,,"Originally from https://github.com/pydata/xarray/issues/974 For _boolean indexing_: - `da[key]` where `key` is a boolean labelled array (with _any_ number of dimensions) is made equivalent to `da.where(key.reindex_like(ds), drop=True)`. This matches the existing behavior if `key` is a 1D boolean array. For multi-dimensional arrays, even though the result is now multi-dimensional, this coupled with automatic skipping of NaNs means that `da[key].mean()` gives the same result as in NumPy. 
- `da[key] = value` where `key` is a boolean labelled array can be made equivalent to `da = da.where(*align(key.reindex_like(da), value.reindex_like(da)))` (that is, the three argument form of `where`). - `da[key_0, ..., key_n]` where all of `key_i` are boolean arrays gets handled in the usual way. It is an `IndexingError` to supply multiple labelled keys if any of them are not already aligned with as the corresponding index coordinates (and share the same dimension name). If they want alignment, we suggest users simply write `da[key_0 & ... & key_n]`. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1887/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 346822633,MDU6SXNzdWUzNDY4MjI2MzM=,2336,test_88_character_filename_segmentation_fault should not try to write to the current working directory,1217238,closed,0,,,2,2018-08-02T01:06:41Z,2021-04-20T23:38:53Z,2021-04-20T23:38:53Z,MEMBER,,,,"This files in cases where the current working directory does not support writes, e.g., as seen here ``` def test_88_character_filename_segmentation_fault(self): # should be fixed in netcdf4 v1.3.1 with mock.patch('netCDF4.__version__', '1.2.4'): with warnings.catch_warnings(): message = ('A segmentation fault may occur when the ' 'file path has exactly 88 characters') warnings.filterwarnings('error', message) with pytest.raises(Warning): # Need to construct 88 character filepath > xr.Dataset().to_netcdf('a' * (88 - len(os.getcwd()) - 1)) tests/test_backends.py:1234: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ core/dataset.py:1150: in to_netcdf compute=compute) backends/api.py:715: in to_netcdf autoclose=autoclose, lock=lock) backends/netCDF4_.py:332: in open ds = opener() backends/netCDF4_.py:231: in _open_netcdf4_group ds = nc4.Dataset(filename, mode=mode, **kwargs) third_party/py/netCDF4/_netCDF4.pyx:2111: in netCDF4._netCDF4.Dataset.__init__ ??? _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > ??? E IOError: [Errno 13] Permission denied ```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2336/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 843996137,MDU6SXNzdWU4NDM5OTYxMzc=,5092,Concurrent loading of coordinate arrays from Zarr,1217238,open,0,,,0,2021-03-30T02:19:50Z,2021-04-19T02:43:31Z,,MEMBER,,,,"When you open a dataset with Zarr, xarray loads coordinate arrays corresponding to indexes in serial. This can be slow (multiple seconds) even with only a handful of such arrays if they are stored in a remote filesystem (e.g., cloud object stores). This is similar to the use-cases for [consolidated metadata](https://zarr.readthedocs.io/en/latest/tutorial.html#consolidating-metadata). 
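As a rough illustration of the kind of concurrent loading suggested below (a hypothetical helper, not part of xarray's API):

```python
from concurrent.futures import ThreadPoolExecutor

def load_coords_concurrently(ds, names):
    # each `.values` access is dominated by request latency on an object
    # store, so issuing the reads from a thread pool hides most of that latency
    with ThreadPoolExecutor() as pool:
        arrays = list(pool.map(lambda name: ds[name].values, names))
    return dict(zip(names, arrays))
```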
In principle, we could speed up loading datasets from Zarr into Xarray significantly by reading the data corresponding to these arrays in parallel (e.g., in multiple threads).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5092/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 621082480,MDU6SXNzdWU2MjEwODI0ODA=,4080,Most arguments to open_dataset should be keyword only,1217238,closed,0,,,1,2020-05-19T15:38:51Z,2021-03-16T10:56:09Z,2021-03-16T10:56:09Z,MEMBER,,,,"`open_dataset` has a long list of arguments: `xarray.open_dataset(filename_or_obj, group=None, decode_cf=True, mask_and_scale=None, decode_times=True, autoclose=None, concat_characters=True, decode_coords=True, engine=None, chunks=None, lock=None, cache=None, drop_variables=None, backend_kwargs=None, use_cftime=None)` Similarly to the case for pandas (https://github.com/pandas-dev/pandas/issues/27544), it would be nice to make most of these arguments keyword-only, e.g., `def open_dataset(filename_or_obj, group, *, ...)`. For consistency, this would also apply to `open_dataarray`, `decode_cf`, `open_mfdataset`, etc. This would encourage writing readable code when calling `open_dataset()` and would allow us to use better organization when adding new arguments (e.g., `decode_timedelta` in https://github.com/pydata/xarray/pull/4071). To make this change, we could make use of the `deprecate_nonkeyword_arguments` decorator from https://github.com/pandas-dev/pandas/pull/27573","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4080/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 645062817,MDExOlB1bGxSZXF1ZXN0NDM5NTg4OTU1,4178,Fix min_deps_check; revert to support numpy=1.14 and pandas=0.24,1217238,closed,0,,,5,2020-06-25T00:37:19Z,2021-02-27T21:46:43Z,2021-02-27T21:46:42Z,MEMBER,,1,pydata/xarray/pulls/4178,"Fixes the issue noticed in: https://github.com/pydata/xarray/pull/4175#issuecomment-649135372 Let's see if this passes CI... - [x] Passes `isort -rc . && black . && mypy . && flake8` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4178/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 645154872,MDU6SXNzdWU2NDUxNTQ4NzI=,4179,Consider revising our minimum dependency version policy,1217238,closed,0,,,7,2020-06-25T05:04:38Z,2021-02-22T05:02:25Z,2021-02-22T05:02:25Z,MEMBER,,,,"Our [current policy](http://xarray.pydata.org/en/stable/installing.html#minimum-dependency-versions) is that xarray supports ""the minor version (X.Y) initially published no more than N months ago"" where N is: - Python: 42 months (NEP 29) - numpy: 24 months (NEP 29) - pandas: 12 months - scipy: 12 months - sparse, pint and other libraries that rely on NEP-18 for integration: very latest available versions only, - all other libraries: 6 months I think this policy is too aggressive, particularly for pandas, SciPy and other libraries. Some of these projects can go 6+ months between minor releases. For example, version 2.3 of zarr is currently more than 6 months old. So if zarr released 2.4 *today* and xarray issued a new release *tomorrow*, and then our policy would dictate that we could ask users to upgrade to the new version. 
In https://github.com/pydata/xarray/pull/4178, I misinterpreted our policy as supporting ""the most recent minor version (X.Y) initially published more than N months ago"". This version makes a bit more sense to me: users only need to upgrade dependencies at least every N months to use the latest xarray release. I understand that NEP-29 chose its language intentionally, so that distributors know ahead of time when they can drop support for a Python or NumPy version. But this seems like a (very) poor fit for projects without regular releases. At the very least we should adjust the specific time windows. I'll see if I can gain some understanding of the motivation for this particular language over on the NumPy tracker...","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4179/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 267927402,MDU6SXNzdWUyNjc5Mjc0MDI=,1652,Resolve warnings issued in the xarray test suite,1217238,closed,0,,,10,2017-10-24T07:36:55Z,2021-02-21T23:06:35Z,2021-02-21T23:06:34Z,MEMBER,,,,"82 warnings are currently issued in the process of running our test suite: https://gist.github.com/shoyer/db0b2c82efd76b254453216e957c4345 Some of can probably be safely ignored, but others are likely noticed by users, e.g., https://stackoverflow.com/questions/41130138/why-is-invalid-value-encountered-in-greater-warning-thrown-in-python-xarray-fo/41147570#41147570 It would be nice to clean up all of these, either by catching the appropriate upstream warning (if irrelevant) or changing our usage to avoid the warning. There may very well be a lurking FutureWarning in there somewhere that could cause issues when another library updates. Probably the easiest way to get started here is to get the test suite running locally, and use `py.test -W error` to turn all warnings into errors.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1652/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 777327298,MDU6SXNzdWU3NzczMjcyOTg=,4749,Option for combine_attrs with conflicting values silently dropped,1217238,closed,0,,,0,2021-01-01T18:04:49Z,2021-02-10T19:50:17Z,2021-02-10T19:50:17Z,MEMBER,,,,"`merge()` currently supports four options for merging `attrs`: ``` combine_attrs : {""drop"", ""identical"", ""no_conflicts"", ""override""}, \ default: ""drop"" String indicating how to combine attrs of the objects being merged: - ""drop"": empty attrs on returned Dataset. - ""identical"": all attrs must be the same on every object. - ""no_conflicts"": attrs from all objects are combined, any that have the same name must also have the same value. - ""override"": skip comparing and copy attrs from the first dataset to the result. ``` It would be nice to have an option to combine attrs from all objects like ""no_conflicts"", but that drops attributes with conflicting values rather than raising an error. We might call this `combine_attrs=""drop_conflicts""` or `combine_attrs=""matching""`. This is similar to how xarray currently handles conflicting values for `DataArray.name` and would be more suitable to consider for the default behavior of `merge` and other functions/methods that merge coordinates (e.g., apply_ufunc, concat, where, binary arithmetic). 
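To make the proposed semantics concrete, a small illustrative sketch (not an actual xarray implementation):

```python
def combine_attrs_drop_conflicts(all_attrs):
    # keep attrs that agree across all inputs; silently drop any key whose
    # values conflict (a real implementation would need an array-safe
    # comparison rather than `!=`)
    result = {}
    dropped = set()
    for attrs in all_attrs:
        for key, value in attrs.items():
            if key in dropped:
                continue
            if key in result and result[key] != value:
                del result[key]
                dropped.add(key)
            else:
                result[key] = value
    return result

combine_attrs_drop_conflicts([{'units': 'm', 'title': 'a'}, {'units': 'm', 'title': 'b'}])
# -> {'units': 'm'}
```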
cc @keewis ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4749/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 264098632,MDU6SXNzdWUyNjQwOTg2MzI=,1618,apply_raw() for a simpler version of apply_ufunc(),1217238,open,0,,,4,2017-10-10T04:51:38Z,2021-01-01T17:14:43Z,,MEMBER,,,,"`apply_raw()` would work like `apply_ufunc()`, but without the hard to understand broadcasting behavior and core dimensions. The rule for `apply_raw()` would be that it directly unwraps its arguments and passes them on to the wrapped function, without any broadcasting. We would also include a `dim` argument that is automatically converted into the appropriate `axis` argument when calling the wrapped function. Output dimensions would be determined from a simple rule of some sort: - Default output dimensions would either be copied from the first argument, or would take on the ordered union on all input dimensions. - Custom dimensions could either be set by adding a `drop_dims` argument (like `dask.array.map_blocks`), or require an explicit override `output_dims`. This also could be suitable for defining as a method instead of a separate function. See https://github.com/pydata/xarray/issues/1251 and https://github.com/pydata/xarray/issues/1130 for related issues.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1618/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 269700511,MDU6SXNzdWUyNjk3MDA1MTE=,1672,Append along an unlimited dimension to an existing netCDF file,1217238,open,0,,,8,2017-10-30T18:09:54Z,2020-11-29T17:35:04Z,,MEMBER,,,,"This would be a nice feature to have for some use cases, e.g., for writing simulation time-steps: https://stackoverflow.com/questions/46951981/create-and-write-xarray-dataarray-to-netcdf-in-chunks It should be relatively straightforward to add, too, building on support for writing files with unlimited dimensions. User facing API would probably be a new keyword argument to `to_netcdf()`, e.g., `extend='time'` to indicate the extended dimension.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1672/reactions"", ""total_count"": 21, ""+1"": 21, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 314444743,MDU6SXNzdWUzMTQ0NDQ3NDM=,2059,How should xarray serialize bytes/unicode strings across Python/netCDF versions?,1217238,open,0,,,5,2018-04-15T19:36:55Z,2020-11-19T10:08:16Z,,MEMBER,,,,"# netCDF string types We have several options for storing strings in netCDF files: - `NC_CHAR`: netCDF's legacy character type. The closest match is NumPy `'S1'` dtype. In principle, it's supposed to be able to store arbitrary bytes. On HDF5, it uses an UTF-8 encoded string with a fixed-size of 1 (but note that HDF5 does not complain about storing arbitrary bytes). - `NC_STRING`: netCDF's newer variable length string type. It's only available on netCDF4 (not netCDF3). It corresponds to an HDF5 variable-length string with UTF-8 encoding. - `NC_CHAR` with an `_Encoding` attribute: xarray and netCDF4-Python support an ad-hoc convention for storing unicode strings in `NC_CHAR` data-types, by adding an attribute `{'_Encoding': 'UTF-8'}`. The data is still stored as fixed width strings, but xarray (and netCDF4-Python) can decode them as unicode. 
`NC_STRING` would seem like a clear win in cases where it's supported, but as @crusaderky points out in https://github.com/pydata/xarray/issues/2040, it actually results in much larger netCDF files in many cases than using character arrays, which are more easily compressed. Nonetheless, we currently default to storing unicode strings in `NC_STRING`, because it's the most portable option -- every tool that handles HDF5 and netCDF4 should be able to read it properly as unicode strings. # NumPy/Python string types On the Python side, our options are perhaps even more confusing: - NumPy's `dtype=np.string_` corresponds to fixed-length bytes. This is the default dtype for strings on Python 2, because on Python 2 strings are the same as bytes. - NumPy's `dtype=np.unicode_` corresponds to fixed-length unicode. This is the default dtype for strings on Python 3, because on Python 3 strings are the same as unicode. - Strings are also commonly stored in numpy arrays with `dtype=np.object_`, as arrays of either `bytes` or `unicode` objects. This is a pragmatic choice, because otherwise NumPy has no support for variable length strings. We also use this (like pandas) to mark missing values with `np.nan`. Like pandas, we are pretty liberal with converting back and forth between fixed-length (`np.string`/`np.unicode_`) and variable-length (object dtype) representations of strings as necessary. This works pretty well, though converting from object arrays in particular has downsides, since it cannot be done lazily with dask. # Current behavior of xarray Currently, xarray uses the same behavior on Python 2/3. The priority was faithfully round-tripping data from a particular version of Python to netCDF and back, which the current serialization behavior achieves: | Python version | NetCDF version | NumPy datatype | NetCDF datatype | | --------- | ---------- | -------------- | ------------ | | Python 2 | NETCDF3 | np.string_ / str | NC_CHAR | | Python 2 | NETCDF4 | np.string_ / str | NC_CHAR | | Python 3 | NETCDF3 | np.string_ / bytes | NC_CHAR | | Python 3 | NETCDF4 | np.string_ / bytes | NC_CHAR | | Python 2 | NETCDF3 | np.unicode_ / unicode | NC_CHAR with UTF-8 encoding | | Python 2 | NETCDF4 | np.unicode_ / unicode | NC_STRING | | Python 3 | NETCDF3 | np.unicode_ / str | NC_CHAR with UTF-8 encoding | | Python 3 | NETCDF4 | np.unicode_ / str | NC_STRING | | Python 2 | NETCDF3 | object bytes/str | NC_CHAR | | Python 2 | NETCDF4 | object bytes/str | NC_CHAR | | Python 3 | NETCDF3 | object bytes | NC_CHAR | | Python 3 | NETCDF4 | object bytes | NC_CHAR | | Python 2 | NETCDF3 | object unicode | NC_CHAR with UTF-8 encoding | | Python 2 | NETCDF4 | object unicode | NC_STRING | | Python 3 | NETCDF3 | object unicode/str | NC_CHAR with UTF-8 encoding | | Python 3 | NETCDF4 | object unicode/str | NC_STRING | This can also be selected explicitly for most data-types by setting dtype in encoding: - `'S1'` for NC_CHAR (with or without encoding) - `str` for NC_STRING (though I'm not 100% sure it works properly currently when given bytes) Script for generating table:
```python from __future__ import print_function import xarray as xr import uuid import netCDF4 import numpy as np import sys for dtype_name, value in [ ('np.string_ / ' + type(b'').__name__, np.array([b'abc'])), ('np.unicode_ / ' + type(u'').__name__, np.array([u'abc'])), ('object bytes/' + type(b'').__name__, np.array([b'abc'], dtype=object)), ('object unicode/' + type(u'').__name__, np.array([u'abc'], dtype=object)), ]: for format in ['NETCDF3_64BIT', 'NETCDF4']: filename = str(uuid.uuid4()) + '.nc' xr.Dataset({'data': value}).to_netcdf(filename, format=format) with netCDF4.Dataset(filename) as f: var = f.variables['data'] disk_dtype = var.dtype has_encoding = hasattr(var, '_Encoding') disk_dtype_name = (('NC_CHAR' if disk_dtype == 'S1' else 'NC_STRING') + (' with UTF-8 encoding' if has_encoding else '')) print('|', 'Python %i' % sys.version_info[0], '|', format[:7], '|', dtype_name, '|', disk_dtype_name, '|') ```
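For reference, a minimal sketch of selecting the on-disk string representation explicitly via `encoding`, as described above (illustrative; exact behavior depends on the xarray and netCDF versions):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'name': ('x', np.array([u'abc', u'def'], dtype=object))})

# request fixed-width character storage (NC_CHAR); unicode data is then
# written using the _Encoding convention described above
ds.to_netcdf('strings_char.nc', encoding={'name': {'dtype': 'S1'}})

# request variable-length strings (NC_STRING, netCDF4/HDF5 only)
ds.to_netcdf('strings_vlen.nc', format='NETCDF4', encoding={'name': {'dtype': str}})
```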
# Potential alternatives The main option I'm considering is switching to default to `NC_CHAR` with UTF-8 encoding for np.string_ / str and object bytes/str on Python 2. The current behavior could be explicitly toggled by setting an encoding of `{'_Encoding': None}`. This would imply two changes: 1. Attempting to serialize arbitrary bytes (on Python 2) would start raising an error -- anything that isn't ASCII would require explicitly disabling `_Encoding`. 2. Strings read back from disk on Python 2 would come back as unicode instead of bytes. This implicit conversion would be consistent with Python 2's general handling of bytes/unicode, and facilitate reading netCDF files on Python 3 that were written with Python 2. The counter-argument is that it may not be worth changing this at this late point, given that we will be sunsetting Python 2 support by year's end.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2059/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 613012939,MDExOlB1bGxSZXF1ZXN0NDEzODQ3NzU0,4035,Support parallel writes to regions of zarr stores,1217238,closed,0,,,17,2020-05-06T02:40:19Z,2020-11-04T06:19:01Z,2020-11-04T06:19:01Z,MEMBER,,0,pydata/xarray/pulls/4035,"This PR adds support for a `region` keyword argument to `to_zarr()`, to support parallel writes to different parts of arrays in a zarr stores, e.g., `ds.to_zarr(..., region={'x': slice(1000, 2000)})` to write a dataset over the range `1000:2000` along the `x` dimension. This is useful for creating large Zarr datasets _without_ requiring dask. For example, the separate workers in a simulation job might each write a single non-overlapping chunk of a Zarr file. The standard way to handle such datasets today is to first write netCDF files in each process, and then consolidate them afterwards with dask (see #3096). ### Creating empty Zarr stores In order to do so, the Zarr file must be pre-existing with desired variables in the right shapes/chunks. It is desirable to be able to create such stores without actually writing data, because datasets that we want to write in parallel may be very large. In the example below, I achieve this filling a `Dataset` with dask arrays, and passing `compute=False` to `to_zarr()`. This works, but it relies on an undocumented implementation detail of the `compute` argument. We should either: 1. Officially document that the `compute` argument only controls writing array values, not metadata (at least for zarr). 2. Add a new keyword argument or entire new method for creating an unfilled Zarr store, e.g., `write_values=False`. I think (1) is maybe the cleanest option (no extra API endpoints). ### Unchunked variables One potential gotcha concerns coordinate arrays that are not chunked, e.g., consider parallel writing of a dataset divided along time with 2D `latitude` and `longitude` arrays that are fixed over all chunks. With the current PR, such coordinate arrays would get rewritten by each separate writer. If a Zarr store does not have atomic writes, then conceivably this could result in corrupted data. The default DirectoryStore has atomic writes and cloud based object stores should also be atomic, so perhaps this doesn't matter in practice, but at the very least it's inefficient and could cause issues for large-scale jobs due to resource contention. Options include: 1. Current behavior. 
Variables whose dimensions do not overlap with `region` are written by `to_zarr()`. *This is likely the most intuitive behavior for writing from a single process at a time.* 2. Exclude variables whose dimensions do not overlap with `region` from being written. This is likely the most convenient behavior for writing from multiple processes at once. 3. Like (2), but issue a warning if any such variables exist instead of silently dropping them. 4. Like (2), but raise an error instead of a warning. Require the user to explicitly drop them with `.drop()`. This is probably the safest behavior. I think (4) would be my preferred option. Some users would undoubtedly find this annoying, but the power-users for whom we are adding this feature would likely appreciate it. ### Usage example ```python import xarray import dask.array as da ds = xarray.Dataset({'u': (('x',), da.arange(1000, chunks=100))}) # create the new zarr store, but don't write data path = 'my-data.zarr' ds.to_zarr(path, compute=False) # look at the unwritten data ds_opened = xarray.open_zarr(path) print('Data before writing:', ds_opened.u.data[::100].compute()) # Data before writing: [ 1 100 1 100 100 1 1 1 1 1] # write out each slice (could be in separate processes) for start in range(0, 1000, 100): selection = {'x': slice(start, start + 100)} ds.isel(selection).to_zarr(path, region=selection) print('Data after writing:', ds_opened.u.data[::100].compute()) # Data after writing: [ 0 100 200 300 400 500 600 700 800 900] ``` - [x] Closes https://github.com/pydata/xarray/issues/3096 - [x] Integration test - [x] Unit tests - [x] Passes `isort -rc . && black . && mypy . && flake8` - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4035/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 124809636,MDU6SXNzdWUxMjQ4MDk2MzY=,703,Document xray internals / advanced API,1217238,closed,0,,,2,2016-01-04T18:12:30Z,2020-11-03T17:33:32Z,2020-11-03T17:33:32Z,MEMBER,,,,"It would be useful to document the internal `Variable` class and the internal structure of `Dataset` and `DataArray`. This would be helpful for both new contributors and expert users, who might find `Variable` helpful as an advanced API. I had some notes in an earlier version of the docs that could be adapted. Note, however, that the internal structure of `DataArray` changed in #648: http://xray.readthedocs.org/en/v0.2/tutorial.html#notes-on-xray-s-internals ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/703/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 715374721,MDU6SXNzdWU3MTUzNzQ3MjE=,4490,Group together decoding options into a single argument,1217238,open,0,,,6,2020-10-06T06:15:18Z,2020-10-29T04:07:46Z,,MEMBER,,,,"**Is your feature request related to a problem? Please describe.** `open_dataset()` currently has a _very_ long function signature. This makes it hard to keep track of everything it can do, and is particularly problematic for the authors of _new_ backends (e.g., see https://github.com/pydata/xarray/pull/4477), which might need to know how to handle all these arguments. **Describe the solution you'd like** To simple the interface, I propose to group together all the decoding options into a new `DecodingOptions` class. 
I'm thinking something like: ```python from dataclasses import dataclass, field, asdict from typing import Optional, List @dataclass(frozen=True) class DecodingOptions: mask: Optional[bool] = None scale: Optional[bool] = None datetime: Optional[bool] = None timedelta: Optional[bool] = None use_cftime: Optional[bool] = None concat_characters: Optional[bool] = None coords: Optional[bool] = None drop_variables: Optional[List[str]] = None @classmethods def disabled(cls): return cls(mask=False, scale=False, datetime=False, timedelta=False, concat_characters=False, coords=False) def non_defaults(self): return {k: v for k, v in asdict(self).items() if v is not None} # add another method for creating default Variable Coder() objects, # e.g., those listed in encode_cf_variable() ``` The signature of `open_dataset` would then become: ```python def open_dataset( filename_or_obj, group=None, * engine=None, chunks=None, lock=None, cache=None, backend_kwargs=None, decode: Union[DecodingOptions, bool] = None, **deprecated_kwargs ): if decode is None: decode = DecodingOptions() if decode is False: decode = DecodingOptions.disabled() # handle deprecated_kwargs... ... ``` **Question**: are `decode` and `DecodingOptions` the right names? Maybe these should still include the name ""CF"", e.g., `decode_cf` and `CFDecodingOptions`, given that these are specific to CF conventions? **Note**: the current signature is `open_dataset(filename_or_obj, group=None, decode_cf=True, mask_and_scale=None, decode_times=True, autoclose=None, concat_characters=True, decode_coords=True, engine=None, chunks=None, lock=None, cache=None, drop_variables=None, backend_kwargs=None, use_cftime=None, decode_timedelta=None)` Usage with the new interface would look like `xr.open_dataset(filename, decode=False)` or `xr.open_dataset(filename, decode=xr.DecodingOptions(mask=False, scale=False))`. This requires a _little_ bit more typing than what we currently have, but it has a few advantages: 1. It's easier to understand the role of different arguments. Now there is a function with ~8 arguments and a class with ~8 arguments rather than a function with ~15 arguments. 2. It's easier to add new decoding arguments (e.g., for more advanced CF conventions), because they don't clutter the `open_dataset` interface. For example, I separated out `mask` and `scale` arguments, versus the current `mask_and_scale` argument. 3. If a new backend plugin for `open_dataset()` needs to handle every option supported by `open_dataset()`, this makes that task significantly easier. The only decoding options they need to worry about are _non-default_ options that were explicitly set, i.e., those exposed by the `non_defaults()` method. If another decoding option wasn't explicitly set and isn't recognized by the backend, they can just ignore it. **Describe alternatives you've considered** For the overall approach: 1. We could keep the current design, with separate keyword arguments for decoding options, and just be very careful about passing around these arguments. This seems pretty painful for the backend refactor, though. 2. We could keep the current design only for the user facing `open_dataset()` interface, and then internally convert into the `DecodingOptions()` struct for passing to backend constructors. This would provide much needed flexibility for backend authors, but most users wouldn't benefit from the new interface. Perhaps this would make sense as an intermediate step? 
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4490/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 718492237,MDExOlB1bGxSZXF1ZXN0NTAwODc5MTY3,4500,Add variable/attribute names to netCDF validation errors,1217238,closed,0,,,1,2020-10-10T00:47:18Z,2020-10-10T05:28:08Z,2020-10-10T05:28:08Z,MEMBER,,0,pydata/xarray/pulls/4500,"This should result in a better user experience, e.g., specifically pointing out the attribute with an invalid value. - [x] Tests added - [x] Passes `isort . && black . && mypy . && flake8` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4500/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 169274464,MDU6SXNzdWUxNjkyNzQ0NjQ=,939,Consider how to deal with the proliferation of decoder options on open_dataset,1217238,closed,0,,,8,2016-08-04T01:57:26Z,2020-10-06T15:39:11Z,2020-10-06T15:39:11Z,MEMBER,,,,"There are already lots of keyword arguments, and users want even more! (#843) Maybe we should use some sort of object to encapsulate desired options? ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/939/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 253107677,MDU6SXNzdWUyNTMxMDc2Nzc=,1527,"Binary operations with ds.groupby('time.dayofyear') errors out, but ds.groupby('time.month') works",1217238,open,0,,,10,2017-08-26T16:54:53Z,2020-09-29T10:05:42Z,,MEMBER,,,,"Reported on the mailing list: Original datasets: ``` >>> ds_xr array([-0.01, -0.01, -0.01, ..., -0.27, -0.27, -0.27]) Coordinates: * time (time) datetime64[ns] 1979-01-01 1979-01-02 1979-01-03 ... >>> slope_itcp_ds Dimensions: (lat: 73, level: 2, lon: 144, time: 366) Coordinates: * lon (lon) float32 0.0 2.5 5.0 7.5 10.0 12.5 ... * lat (lat) float32 90.0 87.5 85.0 82.5 80.0 ... * level (level) float64 0.0 1.0 * time (time) datetime64[ns] 2010-01-01 ... Data variables: __xarray_dataarray_variable__ (time, level, lat, lon) float64 -0.8795 ... Attributes: CDI: Climate Data Interface version 1.7.1 (http://mpimet.mpg.de/... Conventions: CF-1.4 history: Fri Aug 25 18:55:50 2017: cdo -inttime,2010-01-01,00:00:00,... CDO: Climate Data Operators version 1.7.1 (http://mpimet.mpg.de/... ``` Issue: Grouping by month works and outputs this: ``` >>> ds_xr.groupby('time.month') - slope_itcp_ds.groupby('time.month').mean('time') Dimensions: (lat: 73, level: 2, lon: 144, time: 12775) Coordinates: * lon (lon) float32 0.0 2.5 5.0 7.5 10.0 12.5 ... * lat (lat) float32 90.0 87.5 85.0 82.5 80.0 ... * level (level) float64 0.0 1.0 month (time) int64 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ... * time (time) datetime64[ns] 1979-01-01 ... Data variables: __xarray_dataarray_variable__ (time, level, lat, lon) float64 1.015 ... 
``` Grouping by dayofyear doesn't work and gives this traceback: ``` >>> ds_xr.groupby('time.dayofyear') - slope_itcp_ds.groupby('time.dayofyear').mean('time') KeyError Traceback (most recent call last) in () ----> 1 ds_xr.groupby('time.dayofyear') - slope_itcp_ds.groupby('time.dayofyear').mean('time') /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/groupby.py in func(self, other) 316 g = f if not reflexive else lambda x, y: f(y, x) 317 applied = self._yield_binary_applied(g, other) --> 318 combined = self._combine(applied) 319 return combined 320 return func /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/groupby.py in _combine(self, applied, shortcut) 532 combined = self._concat_shortcut(applied, dim, positions) 533 else: --> 534 combined = concat(applied, dim) 535 combined = _maybe_reorder(combined, dim, positions) 536 /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in concat(objs, dim, data_vars, coords, compat, positions, indexers, mode, concat_over) 118 raise TypeError('can only concatenate xarray Dataset and DataArray ' 119 'objects, got %s' % type(first_obj)) --> 120 return f(objs, dim, data_vars, coords, compat, positions) 121 122 /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions) 210 datasets = align(*datasets, join='outer', copy=False, exclude=[dim]) 211 --> 212 concat_over = _calc_concat_over(datasets, dim, data_vars, coords) 213 214 def insert_result_variable(k, v): /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in _calc_concat_over(datasets, dim, data_vars, coords) 190 if dim in v.dims) 191 concat_over.update(process_subset_opt(data_vars, 'data_vars')) --> 192 concat_over.update(process_subset_opt(coords, 'coords')) 193 if dim in datasets[0]: 194 concat_over.add(dim) /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in process_subset_opt(opt, subset) 165 for ds in datasets[1:]) 166 # all nonindexes that are not the same in each dataset --> 167 concat_new = set(k for k in getattr(datasets[0], subset) 168 if k not in concat_over and differs(k)) 169 elif opt == 'all': /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in (.0) 166 # all nonindexes that are not the same in each dataset 167 concat_new = set(k for k in getattr(datasets[0], subset) --> 168 if k not in concat_over and differs(k)) 169 elif opt == 'all': 170 concat_new = (set(getattr(datasets[0], subset)) - /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in differs(vname) 163 v = datasets[0].variables[vname] 164 return any(not ds.variables[vname].equals(v) --> 165 for ds in datasets[1:]) 166 # all nonindexes that are not the same in each dataset 167 concat_new = set(k for k in getattr(datasets[0], subset) /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in (.0) 163 v = datasets[0].variables[vname] 164 return any(not ds.variables[vname].equals(v) --> 165 for ds in datasets[1:]) 166 # all nonindexes that are not the same in each dataset 167 concat_new = set(k for k in getattr(datasets[0], subset) /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/utils.py in __getitem__(self, key) 288 289 def __getitem__(self, key): --> 290 return self.mapping[key] 291 292 def __iter__(self): KeyError: 'lon' ``` ","{""url"": 
""https://api.github.com/repos/pydata/xarray/issues/1527/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 644821435,MDU6SXNzdWU2NDQ4MjE0MzU=,4176,Pre-expand data and attributes in DataArray/Variable HTML repr?,1217238,closed,0,,,7,2020-06-24T18:22:35Z,2020-09-21T20:10:26Z,2020-06-28T17:03:40Z,MEMBER,,,,"## Proposal Given that a major purpose for plotting an array is to look at data or attributes, I wonder if we should expand these sections by default? - I worry that clicking on icons to expand sections may not be easy to discover - This would also be consistent with the text repr, which shows these sections by default (the Dataset repr is already consistent by default between text and HTML already) ## Context Currently the HTML repr for DataArray/Variable looks like this: ![image](https://user-images.githubusercontent.com/1217238/85610183-9e014400-b60b-11ea-8be1-5f9196126acd.png) To see array data, you have to click on the ![image](https://user-images.githubusercontent.com/1217238/85610286-b7a28b80-b60b-11ea-9496-a4f9d9b048ac.png) icon: ![image](https://user-images.githubusercontent.com/1217238/85610262-b1acaa80-b60b-11ea-9621-17f0bcffb885.png) (thanks to @max-sixty for making this a little bit more manageably sized in https://github.com/pydata/xarray/pull/3905!) There's also a really nice repr for nested dask arrays: ![image](https://user-images.githubusercontent.com/1217238/85610598-fcc6bd80-b60b-11ea-8b1a-5cf950449dcb.png) ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4176/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 702372014,MDExOlB1bGxSZXF1ZXN0NDg3NjYxMzIz,4426,Fix for h5py deepcopy issues,1217238,closed,0,,,6,2020-09-16T01:11:00Z,2020-09-18T22:31:13Z,2020-09-18T22:31:09Z,MEMBER,,0,pydata/xarray/pulls/4426," - [x] Closes #4425 - [x] Tests added - [x] Passes `isort . && black . && mypy . && flake8` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4426/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 669307837,MDExOlB1bGxSZXF1ZXN0NDU5Njk1NDA5,4292,Fix indexing with datetime64[ns] with pandas=1.1,1217238,closed,0,,,11,2020-07-31T00:48:50Z,2020-09-16T03:11:48Z,2020-09-16T01:33:30Z,MEMBER,,0,pydata/xarray/pulls/4292,"Fixes #4283 The underlying issue is that calling `.item()` on a NumPy array with `dtype=datetime64[ns]` returns an _integer_, rather than an `np.datetime64` scalar. This is somewhat baffling but works this way because `.item()` returns native Python types, but `datetime.datetime` doesn't support nanosecond precision. `pandas.Index.get_loc` used to support these integers, but now is more strict. Hence we get errors. We can fix this by using `array[()]` to convert 0d arrays into NumPy scalars instead of calling `array.item()`. I've added a crude regression test. There may well be a better way to test this but I haven't figured it out yet. - [x] Tests added - [x] Passes `isort . && black . && mypy . 
&& flake8` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4292/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 417542619,MDU6SXNzdWU0MTc1NDI2MTk=,2803,Test failure with TestValidateAttrs.test_validating_attrs,1217238,closed,0,,,6,2019-03-05T23:03:02Z,2020-08-25T14:29:19Z,2019-03-14T15:59:13Z,MEMBER,,,,"This is due to setting multi-dimensional attributes being an error, as of the latest netCDF4-Python release: https://github.com/Unidata/netcdf4-python/blob/master/Changelog E.g., as seen on Appveyor: https://ci.appveyor.com/project/shoyer/xray/builds/22834250/job/9q0ip6i3cchlbkw2 ``` ================================== FAILURES =================================== ___________________ TestValidateAttrs.test_validating_attrs ___________________ self = def test_validating_attrs(self): def new_dataset(): return Dataset({'data': ('y', np.arange(10.0))}, {'y': np.arange(10)}) def new_dataset_and_dataset_attrs(): ds = new_dataset() return ds, ds.attrs def new_dataset_and_data_attrs(): ds = new_dataset() return ds, ds.data.attrs def new_dataset_and_coord_attrs(): ds = new_dataset() return ds, ds.coords['y'].attrs for new_dataset_and_attrs in [new_dataset_and_dataset_attrs, new_dataset_and_data_attrs, new_dataset_and_coord_attrs]: ds, attrs = new_dataset_and_attrs() attrs[123] = 'test' with raises_regex(TypeError, 'Invalid name for attr'): ds.to_netcdf('test.nc') ds, attrs = new_dataset_and_attrs() attrs[MiscObject()] = 'test' with raises_regex(TypeError, 'Invalid name for attr'): ds.to_netcdf('test.nc') ds, attrs = new_dataset_and_attrs() attrs[''] = 'test' with raises_regex(ValueError, 'Invalid name for attr'): ds.to_netcdf('test.nc') # This one should work ds, attrs = new_dataset_and_attrs() attrs['test'] = 'test' with create_tmp_file() as tmp_file: ds.to_netcdf(tmp_file) ds, attrs = new_dataset_and_attrs() attrs['test'] = {'a': 5} with raises_regex(TypeError, 'Invalid value for attr'): ds.to_netcdf('test.nc') ds, attrs = new_dataset_and_attrs() attrs['test'] = MiscObject() with raises_regex(TypeError, 'Invalid value for attr'): ds.to_netcdf('test.nc') ds, attrs = new_dataset_and_attrs() attrs['test'] = 5 with create_tmp_file() as tmp_file: ds.to_netcdf(tmp_file) ds, attrs = new_dataset_and_attrs() attrs['test'] = 3.14 with create_tmp_file() as tmp_file: ds.to_netcdf(tmp_file) ds, attrs = new_dataset_and_attrs() attrs['test'] = [1, 2, 3, 4] with create_tmp_file() as tmp_file: ds.to_netcdf(tmp_file) ds, attrs = new_dataset_and_attrs() attrs['test'] = (1.9, 2.5) with create_tmp_file() as tmp_file: ds.to_netcdf(tmp_file) ds, attrs = new_dataset_and_attrs() attrs['test'] = np.arange(5) with create_tmp_file() as tmp_file: ds.to_netcdf(tmp_file) ds, attrs = new_dataset_and_attrs() attrs['test'] = np.arange(12).reshape(3, 4) with create_tmp_file() as tmp_file: > ds.to_netcdf(tmp_file) xarray\tests\test_backends.py:3450: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ xarray\core\dataset.py:1323: in to_netcdf compute=compute) xarray\backends\api.py:767: in to_netcdf unlimited_dims=unlimited_dims) xarray\backends\api.py:810: in dump_to_store unlimited_dims=unlimited_dims) xarray\backends\common.py:262: in store self.set_attributes(attributes) xarray\backends\common.py:278: in set_attributes self.set_attribute(k, v) xarray\backends\netCDF4_.py:418: in set_attribute _set_nc_attribute(self.ds, key, value) xarray\backends\netCDF4_.py:294: in 
_set_nc_attribute obj.setncattr(key, value) netCDF4\_netCDF4.pyx:2781: in netCDF4._netCDF4.Dataset.setncattr ??? _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > ??? E ValueError: multi-dimensional array attributes not supported netCDF4\_netCDF4.pyx:1514: ValueError ```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2803/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 676306518,MDU6SXNzdWU2NzYzMDY1MTg=,4331,Support explicitly setting a dimension order with to_dataframe(),1217238,closed,0,,,0,2020-08-10T17:45:17Z,2020-08-14T18:28:26Z,2020-08-14T18:28:26Z,MEMBER,,,,"As discussed in https://github.com/pydata/xarray/issues/2346, it would be nice to support explicitly setting the desired order of dimensions when calling `Dataset.to_dataframe()` or `DataArray.to_dataframe()`. There is nice precedent for this in the `to_dask_dataframe` method: http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_dask_dataframe.html I imagine we could copy the exact same API for `to_dataframe.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4331/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 671019427,MDU6SXNzdWU2NzEwMTk0Mjc=,4295,We shouldn't require a recent version of setuptools to install xarray,1217238,closed,0,,,33,2020-08-01T16:49:57Z,2020-08-14T09:52:42Z,2020-08-14T09:52:42Z,MEMBER,,,,"@canol reports on our mailing that our setuptools 41.2 (released 21 August 2019) install requirement is making it hard to install recent versions of xarray at his company: https://groups.google.com/g/xarray/c/HS_xcZDEEtA/m/GGmW-3eMCAAJ > Hello, this is just a feedback about an issue we experienced which caused our internal tools stack to stay with xarray 0.15 version instead of a newer versions. > > We are a company using xarray in our internal frameworks and at the beginning we didn't have any restrictions on xarray version in our requirements file, so that new installations of our framework were using the latest version of xarray. But a few months ago we started to hear complaints from users who were having problems with installing our framework and the installation was failing because of xarray's requirement to use at least setuptools 41.2 which is released on 21th of August last year. So it hasn't been a year since it got released which might be considered relatively new. > > During the installation of our framework, pip was failing to update setuptools by saying that some other process is already using setuptools files so it cannot update setuptools. The people who are using our framework are not software developers so they didn't know how to solve this problem and it became so overwhelming for us maintainers that we set the xarray requirement to version >=0.15 <0.16. We also share our internal framework with customers of our company so we didn't want to bother the customers with any potential problems. > > You can see some other people having having similar problem when trying to update setuptools here (although not related to xarray): https://stackoverflow.com/questions/49338652/pip-install-u-setuptools-fail-windows-10 > > It is not a big deal but I just wanted to give this as a feedback. I don't know how much xarray depends on setuptools' 41.2 version. 
I was surprised to see this in our `setup.cfg` file, added by @crusaderky in #3628. The version requirement is not documented in our docs. Given that setuptools may be challenging to upgrade, would it be possible to relax this version requirement?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4295/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 638597800,MDExOlB1bGxSZXF1ZXN0NDM0MzMxNzQ3,4154,Update issue templates inspired/based on dask,1217238,closed,0,,,1,2020-06-15T07:00:53Z,2020-08-05T13:05:33Z,2020-06-17T16:50:57Z,MEMBER,,0,pydata/xarray/pulls/4154,See https://github.com/dask/dask/issues/new/choose for an approximate example of what this looks like.,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4154/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 290593053,MDU6SXNzdWUyOTA1OTMwNTM=,1850,xarray contrib module,1217238,closed,0,,,25,2018-01-22T19:50:08Z,2020-07-23T16:34:10Z,2020-07-23T16:34:10Z,MEMBER,,,,"Over in #1288 @nbren12 wrote: > Overall, I think the xarray community could really benefit from some kind of centralized contrib package which has a low barrier to entry for these kinds of functions. Yes, I agree that we should explore this. There are a lot of interesting projects building on xarray now but not great ways to discover them. Are there other open source projects with a good model we should copy here? - Scikit-Learn has a separate GitHub org/repositories for contrib projects: https://github.com/scikit-learn-contrib. - TensorFlow has a contrib module within the TensorFlow namespace: `tensorflow.contrib` This gives us two different models to consider. The first ""separate repository"" model might be easier/flexible from a maintenance perspective. Any preferences/thoughts? There's also some nice overlap with the [Pangeo project](https://pangeo-data.github.io/).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1850/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 646073396,MDExOlB1bGxSZXF1ZXN0NDQwNDMxNjk5,4184,Improve the speed of from_dataframe with a MultiIndex (by 40x!),1217238,closed,0,,,1,2020-06-26T07:39:14Z,2020-07-02T20:39:02Z,2020-07-02T20:39:02Z,MEMBER,,0,pydata/xarray/pulls/4184,"Before: pandas.MultiIndexSeries.time_to_xarray ======= ========= ========== -- subset ------- -------------------- dtype True False ======= ========= ========== int 505±0ms 37.1±0ms float 485±0ms 38.3±0ms ======= ========= ========== After: pandas.MultiIndexSeries.time_to_xarray ======= ============ ========== -- subset ------- ----------------------- dtype True False ======= ============ ========== int 10.7±0.4ms 22.6±1ms float 10.0±0.8ms 21.1±1ms ======= ============ ========== ~~There are still some cases where we have to fall back to the existing slow implementation, but hopefully they should now be relatively rare.~~ Edit: now we always use the new implementation - [x] Closes #2459, closes #4186 - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [x] Passes `isort -rc . && black . && mypy . 
&& flake8` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4184/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 1, ""eyes"": 0}",,,13221727,pull 645961347,MDExOlB1bGxSZXF1ZXN0NDQwMzQ2NTQz,4182,Show data by default in HTML repr for DataArray,1217238,closed,0,,,0,2020-06-26T02:25:08Z,2020-06-28T17:03:41Z,2020-06-28T17:03:41Z,MEMBER,,0,pydata/xarray/pulls/4182," - [x] Closes #4176 - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4182/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 644170008,MDExOlB1bGxSZXF1ZXN0NDM4ODQxMjk2,4171,Remove <pre> from nested HTML repr,1217238,closed,0,,,0,2020-06-23T21:51:14Z,2020-06-24T15:45:20Z,2020-06-24T15:45:00Z,MEMBER,,0,pydata/xarray/pulls/4171,"Using `<pre>` messes up the display of nested HTML reprs, e.g., from dask. Now we only use the `<pre>` tag when displaying raw text reprs.

Before (Jupyter notebook):
![image](https://user-images.githubusercontent.com/1217238/85467844-8faa1e00-b560-11ea-8565-b22105ca603a.png)

After:
![image](https://user-images.githubusercontent.com/1217238/85467860-946ed200-b560-11ea-90ed-79ea6505e07f.png)

 - [x] Tests added
 - [x] Passes `isort -rc . && black . && mypy . && flake8`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4171/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
613546626,MDExOlB1bGxSZXF1ZXN0NDE0MjgwMDEz,4039,Revise pull request template,1217238,closed,0,,,5,2020-05-06T19:08:19Z,2020-06-18T05:45:11Z,2020-06-18T05:45:10Z,MEMBER,,0,pydata/xarray/pulls/4039,"See below for the new language, to clarify that documentation is only necessary
for ""user visible changes.""

I added ""including notable bug fixes"" to indicate that minor bug fixes may not
be worth noting (I was thinking of test-suite-only fixes in this category), but
perhaps that is too confusing.

cc @pydata/xarray for opinions!



 - [ ] Closes #xxxx
 - [ ] Tests added
 - [ ] Passes `isort -rc . && black . && mypy . && flake8`
 - [ ] Fully documented, including `whats-new.rst` for user visible changes
       (including notable bug fixes) and `api.rst` for new API
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4039/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
639334065,MDExOlB1bGxSZXF1ZXN0NDM0OTQ0NTc4,4159,Test RTD's new pull request builder,1217238,closed,0,,,1,2020-06-16T03:06:32Z,2020-06-17T16:54:02Z,2020-06-17T16:54:02Z,MEMBER,,1,pydata/xarray/pulls/4159,"https://docs.readthedocs.io/en/latest/guides/autobuild-docs-for-pull-requests.html



Don't merge this!","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4159/reactions"", ""total_count"": 3, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 3, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
639397110,MDExOlB1bGxSZXF1ZXN0NDM0OTk1NzQz,4159,Test RTD's new pull request builder,1217238,closed,0,,,1,2020-06-16T03:06:32Z,2020-06-17T16:54:02Z,2020-06-17T16:54:02Z,MEMBER,,1,pydata/xarray/pulls/4159,"https://docs.readthedocs.io/en/latest/guides/autobuild-docs-for-pull-requests.html
""docs/readthedocs.org:xray"" below or look at GH4159



 - [x] Closes https://github.com/pydata/xarray/issues/4146
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4160/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
35682274,MDU6SXNzdWUzNTY4MjI3NA==,158,groupby should work with name=None,1217238,closed,0,,,2,2014-06-13T15:38:00Z,2020-05-30T13:15:56Z,2020-05-30T13:15:56Z,MEMBER,,,,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/158/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
612214951,MDExOlB1bGxSZXF1ZXN0NDEzMjIyOTEx,4028,Remove broken test for Panel with to_pandas(),1217238,closed,0,,,5,2020-05-04T22:41:42Z,2020-05-06T01:50:21Z,2020-05-06T01:50:21Z,MEMBER,,0,pydata/xarray/pulls/4028,"We don't support creating a Panel with to_pandas() with *any* version of
pandas at present, so this test was previously broken if pandas < 0.25 was
installed.


","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4028/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
612772669,MDU6SXNzdWU2MTI3NzI2Njk=,4030,Doc build on Azure is timing out on master,1217238,closed,0,,,1,2020-05-05T17:30:16Z,2020-05-05T21:49:26Z,2020-05-05T21:49:26Z,MEMBER,,,,"I don't know what's going on, but it currently times out after 1 hour:
https://dev.azure.com/xarray/xarray/_build/results?buildId=2767&view=logs&j=7e620c85-24a8-5ffa-8b1f-642bc9b1fc36&t=68484831-0a19-5145-bfe9-6309e5f7691d

Is it possible to login to Azure to debug this stuff?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4030/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
612838635,MDExOlB1bGxSZXF1ZXN0NDEzNzA3Mzgy,4032,Allow warning with cartopy in docs plotting build,1217238,closed,0,,,1,2020-05-05T19:25:11Z,2020-05-05T21:49:26Z,2020-05-05T21:49:26Z,MEMBER,,0,pydata/xarray/pulls/4032,"Fixes https://github.com/pydata/xarray/issues/4030

It looks like this is triggered by the new cartopy version now being installed
on RTD (version 0.17.0 -> 0.18.0).

Long term we should fix this, but for now it's better just to disable the
warning.

Here's the message from RTD:
```
Exception occurred:
  File ""/home/docs/checkouts/readthedocs.org/user_builds/xray/conda/latest/lib/python3.8/site-packages/IPython/sphinxext/ipython_directive.py"", line 586, in process_input
    raise RuntimeError('Non Expected warning in `{}` line {}'.format(filename, lineno))
RuntimeError: Non Expected warning in `/home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/latest/doc/plotting.rst` line 732
The full traceback has been saved in /tmp/sphinx-err-qav6jjmm.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at . Thanks!

>>>-------------------------------------------------------------------------
Warning in /home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/latest/doc/plotting.rst at block ending on line 732
Specify :okwarning: as an option in the ipython:: block to suppress this message
----------------------------------------------------------------------------
/home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/latest/xarray/plot/facetgrid.py:373: UserWarning: Tight layout not applied. The left and right margins cannot be made large enough to accommodate all axes decorations.
  self.fig.tight_layout()
<<<-------------------------------------------------------------------------
```
https://readthedocs.org/projects/xray/builds/10969146/


","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4032/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
612262200,MDExOlB1bGxSZXF1ZXN0NDEzMjYwNTY2,4029,Support overriding existing variables in to_zarr() without appending,1217238,closed,0,,,2,2020-05-05T01:06:40Z,2020-05-05T19:28:02Z,2020-05-05T19:28:02Z,MEMBER,,0,pydata/xarray/pulls/4029,"This is nice for consistency with `to_netcdf`. It should be useful for cases where users want to update values in existing Zarr datasets.
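For context, a rough sketch of the workflow this targets, assuming `mode='a'` is the switch that allows overwriting existing variables (the path and data below are illustrative):
```python
import xarray as xr

ds = xr.Dataset({'temperature': ('x', [1.0, 2.0, 3.0])})
ds.to_zarr('example.zarr', mode='w')

# later: rewrite the same variable in place, without appending along a dimension
(ds + 10).to_zarr('example.zarr', mode='a')
```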



 - [x] Tests added
 - [x] Passes `isort -rc . && black . && mypy . && flake8`
 - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4029/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
187625917,MDExOlB1bGxSZXF1ZXN0OTI1MjQzMjg=,1087,WIP: New DataStore / Encoder / Decoder API for review,1217238,closed,0,,,8,2016-11-07T05:02:04Z,2020-04-17T18:37:45Z,2020-04-17T18:37:45Z,MEMBER,,0,pydata/xarray/pulls/1087,"The goal here is to make something extensible that we can live with for quite
some time, and to clean up the internals of xarray's backend interface.

Most of these are analogues of existing xarray classes with a cleaned up
interface. I have not yet worried about backwards compatibility or tests -- I
would appreciate feedback on the approach here.

Several parts of the logic exist for the sake of dask. I've included the word
""dask"" in comments to facilitate inspection by mrocklin.

CC @rabernat, @pwolfram, @jhamman, @mrocklin -- for review

CC @mcgibbon, @JoyMonteiro -- this is relevant to our discussion today about
adding support for appending to netCDF files. Don't let this stop you from
getting started on that with the existing interface, though.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1087/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
598567792,MDU6SXNzdWU1OTg1Njc3OTI=,3966,HTML repr is slightly broken in Google Colab,1217238,closed,0,,,1,2020-04-12T20:44:51Z,2020-04-16T20:14:37Z,2020-04-16T20:14:32Z,MEMBER,,,,"The ""data"" toggles are pre-expanded and don't work.

See https://github.com/googlecolab/colabtools/issues/1145 for a full description.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3966/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
479434052,MDU6SXNzdWU0Nzk0MzQwNTI=,3206,DataFrame with MultiIndex -> xarray with sparse array,1217238,closed,0,,,1,2019-08-12T00:46:16Z,2020-04-06T20:41:26Z,2019-08-27T08:54:26Z,MEMBER,,,,"Now that we have preliminary support for [sparse](https://sparse.pydata.org/en/latest/) arrays in xarray, one really cool feature we could explore is creating sparse arrays from MultiIndexed pandas DataFrames.

Right now, xarray's methods for creating objects from pandas always create dense arrays, but the size of these dense arrays can get big really quickly if the MultiIndex is sparsely populated, e.g.,
```python
import pandas as pd
import numpy as np
import xarray
df = pd.DataFrame({
    'w': range(10),
    'x': list('abcdefghij'),
    'y': np.arange(0, 100, 10),
    'z': np.ones(10),
}).set_index(['w', 'x', 'y'])
print(xarray.Dataset.from_dataframe(df))
```
This length 10 DataFrame turned into a dense array with 1000 elements (only 10 of which are not NaN):
```
<xarray.Dataset>
Dimensions:  (w: 10, x: 10, y: 10)
Coordinates:
  * w        (w) int64 0 1 2 3 4 5 6 7 8 9
  * x        (x) object 'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j'
  * y        (y) int64 0 10 20 30 40 50 60 70 80 90
Data variables:
    z        (w, x, y) float64 1.0 nan nan nan nan nan ... nan nan nan nan 1.0
```

We can imagine `xarray.Dataset.from_dataframe(df, sparse=True)` would make the same Dataset, but with sparse arrays (with a `NaN` fill value) instead of dense arrays.
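Concretely, usage could look like the following (the `sparse` keyword is the proposal here, and the `sparse.COO` backing is an assumption about how it would be implemented):
```python
import sparse  # the pydata/sparse package

# continuing from the df defined above
ds = xarray.Dataset.from_dataframe(df, sparse=True)  # proposed keyword
# 'z' would hold a sparse.COO array with a NaN fill value, storing only
# the 10 populated cells rather than all 1000
assert isinstance(ds['z'].data, sparse.COO)
```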

Once sparse arrays work pretty well, this could actually obviate most of the use cases for `MultiIndex` in arrays. Arguably the model is quite a bit cleaner.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3206/reactions"", ""total_count"": 3, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 3, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
479940669,MDU6SXNzdWU0Nzk5NDA2Njk=,3212,Custom fill_value for from_dataframe/from_series,1217238,open,0,,,0,2019-08-13T03:22:46Z,2020-04-06T20:40:26Z,,MEMBER,,,,"It would be nice to have the option to customize the fill value when creating xarray objects from pandas, instead of requiring it to always be NaN.
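For illustration, the request amounts to something like the following hypothetical keyword (`fill_value` is not an existing argument of `from_series`):
```python
import pandas as pd
import xarray

series = pd.Series(
    [3, 5],
    index=pd.MultiIndex.from_tuples([(0, 'a'), (1, 'b')], names=['x', 'y']),
)
# hypothetical: unobserved (x, y) combinations would become 0 instead of NaN
da = xarray.DataArray.from_series(series, fill_value=0)
```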

This would probably be especially useful when creating sparse arrays (https://github.com/pydata/xarray/issues/3206), for which it often makes sense to use a fill value of zero. If your data has integer values (e.g., it represents counts), you probably don't want to let it be cast to float first.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3212/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
314482923,MDU6SXNzdWUzMTQ0ODI5MjM=,2061,Backend specific conventions decoding,1217238,open,0,,,1,2018-04-16T02:45:46Z,2020-04-05T23:42:34Z,,MEMBER,,,,"Currently, we have a single function `xarray.decode_cf()` that we apply to data loaded from all xarray backends.

This is appropriate for netCDF data, but it's not appropriate for backends with different implementations. For example, it doesn't work for zarr (which is why we have the separate `open_zarr`), and is also a poor fit for PseudoNetCDF (https://github.com/pydata/xarray/pull/1905). In the worst cases (e.g., for PseudoNetCDF) it can actually result in data being decoded *twice*, which can produce incorrectly scaled data.

Instead, we should declare default decoders as part of the backend API, and use those decoders as the defaults for `open_dataset()`.

This should probably be tackled as part of the broader backends refactor: https://github.com/pydata/xarray/issues/1970
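A bare-bones sketch of the idea, with made-up names rather than an actual xarray interface: each backend advertises its own decoding defaults, and explicit user choices override them.
```python
# made-up names; purely illustrative
class PseudoNetCDFStore:
    default_decoders = {'mask_and_scale': False, 'decode_times': True}

def resolve_decoders(store, user_options):
    # explicit user choices win; everything else falls back to the backend defaults
    return {**store.default_decoders, **user_options}

resolve_decoders(PseudoNetCDFStore(), {'decode_times': False})
# -> {'mask_and_scale': False, 'decode_times': False}
```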
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2061/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
28376794,MDU6SXNzdWUyODM3Njc5NA==,25,Consistent rules for handling merges between variables with different attributes,1217238,closed,0,,,13,2014-02-26T22:37:01Z,2020-04-05T19:13:13Z,2014-09-04T06:50:49Z,MEMBER,,,,"Currently, variable attributes are checked for equality before allowing for a merge via a call to `xarray_equal`. It should be possible to merge datasets even if some of the variable metadata disagrees (conflicting attributes should be dropped). This is already the behavior for global attributes.

The right design of this feature should probably include some optional argument to `Dataset.merge` indicating how strict we want the merge to be. I can see at least three versions that could be useful:
1. Drop conflicting metadata silently.
2. Don't allow for conflicting values, but drop non-matching keys.
3. Require all keys and values to match.

We can argue about which of these should be the default option. My inclination is to be as flexible as possible by using 1 or 2 in most cases.
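To make the three levels concrete, here is a standalone sketch (plain Python, not xarray code) of how each rule could combine two attribute dicts:
```python
def merge_attrs(a, b, how):
    common = a.keys() & b.keys()
    conflicts = [k for k in common if a[k] != b[k]]
    if how == 'drop_conflicting':   # option 1: silently drop conflicting keys
        return {k: v for k, v in {**a, **b}.items() if k not in conflicts}
    if how == 'no_conflicts':       # option 2: conflicts are an error, non-matching keys dropped
        if conflicts:
            raise ValueError('conflicting values for attrs: %r' % conflicts)
        return {k: a[k] for k in common}
    if how == 'identical':          # option 3: require all keys and values to match
        if a != b:
            raise ValueError('attrs are not identical')
        return dict(a)
    raise ValueError('unknown merge rule: %r' % how)
```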
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/25/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
173612265,MDU6SXNzdWUxNzM2MTIyNjU=,988,Hooks for custom attribute handling in xarray operations,1217238,open,0,,,24,2016-08-27T19:48:22Z,2020-04-05T18:19:11Z,,MEMBER,,,,"Over in #964, I am working on a rewrite/unification of the guts of xarray's logic for computation with labelled data. The goal is to get all of xarray's internal logic for working with labelled data going through a minimal set of flexible functions which we can also expose as part of the API.

Because we will finally have all (or at least nearly all) xarray operations using the same code path, I think it will also finally become feasible to open up hooks allowing extensions to customize how xarray handles metadata.

Two obvious use cases here are units (#525) and automatic maintenance of metadata (e.g., [`cell_methods`](https://github.com/pydata/xarray/issues/987#issuecomment-242912131) or [`history`](#826) fields). Both of these are out of scope for xarray itself, mostly because the specific logic tends to be domain specific. This could also subsume options like the existing `keep_attrs` on many operations.

I like the idea of supporting something like NumPy's [`__array_wrap__`](http://docs.scipy.org/doc/numpy-1.11.0/reference/arrays.classes.html#numpy.class.__array_wrap__) to allow third-party code to finalize xarray objects in some way before they are returned. However, it's not obvious to me what the right design is.
- Should we look up a custom attribute on subclasses like `__array_wrap__` (or `__numpy_ufunc__`) in NumPy, or should we have a system (e.g., unilaterally or with a context manager and `xarray.set_options`) for registering hooks that are then checked on _all_ xarray objects? I am inclined toward the latter, even though it's a little slower, just because it will be simpler and easier to get right
- Should these methods be able to control the full result objects, or only set `attrs` and/or `name`?
- To be useful, do we need to allow extensions to take control of the full operation, to support things like automatic unit conversion? This would suggest something closer to `__numpy_ufunc__`, which is a little more ambitious than what I had in mind here.

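As one possible shape for the first question, a bare-bones sketch of the registry-style option (illustrative only, not an existing or proposed API):
```python
# illustrative registry-style hook system; not an existing xarray API
_attr_hooks = []

def register_attrs_hook(func):
    _attr_hooks.append(func)
    return func

def finalize(result, context):
    # would be called internally on every operation's result,
    # analogous in spirit to __array_wrap__
    for hook in _attr_hooks:
        result = hook(result, context)
    return result

@register_attrs_hook
def append_history(result, context):
    result.attrs['history'] = result.attrs.get('history', '') + context + '\n'
    return result
```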
Feedback would be greatly appreciated.

CC @darothen @rabernat @jhamman @pwolfram
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/988/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
29136905,MDU6SXNzdWUyOTEzNjkwNQ==,60,Implement DataArray.idxmax(),1217238,closed,0,,741199,14,2014-03-10T22:03:06Z,2020-03-29T01:54:25Z,2020-03-29T01:54:25Z,MEMBER,,,,"Should match the pandas function: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.idxmax.html
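In the meantime, a minimal 1-D workaround sketch that combines `argmax` with a label lookup:
```python
import xarray as xr

da = xr.DataArray([1.0, 3.0, 2.0], dims='x', coords={'x': [10, 20, 30]})

# 1-D stand-in for the missing idxmax: position of the max, then its label
idxmax = da['x'][int(da.argmax())]   # coordinate label 20
```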
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/60/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue