issues


896 rows where user = 1217238 sorted by updated_at descending




type

  • pull 572
  • issue 324

state

  • closed 848
  • open 48

repo

  • xarray 896
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2266174558 I_kwDOAMm_X86HExRe 8975 Xarray sponsorship guidelines shoyer 1217238 open 0     3 2024-04-26T17:05:01Z 2024-04-30T20:52:33Z   MEMBER      

At what level of support should Xarray acknowledge sponsors on our website?

I would like to surface this for open discussion because there are potential sponsoring organizations with conflicts of interest with members of Xarray's leadership team (e.g., Earthmover, which employs @jhamman, @rabernat and @dcherian).

My suggestion is to use NumPy's guidelines, with an adjustment down to 1/3 of the thresholds to account for the smaller size of the project:

  • $10,000/yr for unrestricted financial contributions (e.g., donations)
  • $20,000/yr for financial contributions for a particular purpose (e.g., grants)
  • $30,000/yr for in-kind contributions (e.g., time for employees to contribute)
  • 2 person-months/yr of paid work time for one or more Xarray maintainers or regular contributors to any Xarray team or activity

The NumPy guidelines also include a grace period of a minimum of 6 months for acknowledging support. I would suggest increasing this to a minimum of 1 year for Xarray.

I would greatly appreciate any feedback from members of the community, either in this issue or at the next team meeting.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8975/reactions",
    "total_count": 6,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
271043420 MDU6SXNzdWUyNzEwNDM0MjA= 1689 Roundtrip serialization of coordinate variables with spaces in their names shoyer 1217238 open 0     5 2017-11-03T16:43:20Z 2024-03-22T14:02:48Z   MEMBER      

If coordinates have spaces in their names, they get restored from netCDF files as data variables instead:

```
>>> xarray.open_dataset(xarray.Dataset(coords={'name with spaces': 1}).to_netcdf())
<xarray.Dataset>
Dimensions:           ()
Data variables:
    name with spaces  int32 1
```

This happens because the CF convention is to indicate coordinates as a space separated string, e.g., coordinates='latitude longitude'.

Even though these aren't CF-compliant variable names (which cannot contain spaces), it would be nice to have an ad-hoc convention for xarray that allows us to serialize/deserialize coordinates in all/most cases. Maybe we could use escape characters for spaces (e.g., coordinates='name\ with\ spaces') or quote names if they have spaces (e.g., coordinates='"name\ with\ spaces"')?

At the very least, we should issue a warning in these cases.
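To make the escaping idea concrete, here is a rough sketch (the helper names are made up, not existing xarray API):

```python
import re

# Hypothetical helpers sketching the escaping convention; not part of xarray.

def encode_coordinates_attr(names):
    # Escape spaces so the CF-style space-separated list stays unambiguous.
    return " ".join(name.replace(" ", r"\ ") for name in names)

def decode_coordinates_attr(value):
    # Split on unescaped spaces, then undo the escaping.
    parts = re.split(r"(?<!\\) ", value)
    return [part.replace(r"\ ", " ") for part in parts]

print(encode_coordinates_attr(["name with spaces", "time"]))
# name\ with\ spaces time
print(decode_coordinates_attr(r"name\ with\ spaces time"))
# ['name with spaces', 'time']
```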

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1689/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
267542085 MDU6SXNzdWUyNjc1NDIwODU= 1647 Representing missing values in string arrays on disk shoyer 1217238 closed 0     3 2017-10-23T05:01:10Z 2024-02-06T13:03:40Z 2024-02-06T13:03:40Z MEMBER      

This came up as part of my clean-up of serializing unicode strings in https://github.com/pydata/xarray/pull/1648.

There are two ways to represent strings in netCDF files.

  • As character arrays (NC_CHAR), supported by both netCDF3 and netCDF4
  • As variable length unicode strings (NC_STRING), only supported by netCDF4/HDF5.

Currently, by default (if no _FillValue is set) we replace missing values (NaN) with an empty string when writing data to disk.

For character arrays, we could use the normal _FillValue mechanism to set a fill value and decode when data is read back from disk. In fact, this already currently works for dtype=bytes (though it isn't documented):

```
In [10]: ds = xr.Dataset({'foo': ('x', np.array([b'bar', np.nan], dtype=object), {}, {'_FillValue': b''})})

In [11]: ds
Out[11]:
<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    foo      (x) object b'bar' nan

In [12]: ds.to_netcdf('foobar.nc')

In [13]: xr.open_dataset('foobar.nc').load()
Out[13]:
<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    foo      (x) object b'bar' nan
```

For variable length strings, it currently isn't possible to set a fill-value. So there's no good way to indicate missing values, though this may change in the future depending on the resolution of the netCDF4-python issue.

It would obviously be nice to always automatically round-trip missing values, both for strings and bytes. I see two possible ways to do this:

1. Require setting an explicit _FillValue when a string contains missing values, by raising an error if this isn't done. We need an explicit choice because there aren't any extra unused characters left over, at least for character arrays. (NetCDF explicitly allows arbitrary bytes to be stored in NC_CHAR, even though this maps to an HDF5 fixed-width string with ASCII encoding.) For variable length strings, we could potentially set a non-character unicode symbol like U+FFFF, but again that isn't supported yet.
2. Treat empty strings as equivalent to a missing value (NaN). This has the advantage of not requiring an explicit choice of _FillValue, so we don't need to wait for any netCDF4 issues to be resolved. However, this does mean that empty strings would not round-trip. Still, given the relative prevalence of missing values vs. empty strings in xarray/pandas, it's probably the lesser evil not to preserve empty strings.

The default option is to adopt neither of these, and keep the current behavior where missing values are written as empty strings and not decoded at all.

Any opinions? I am leaning towards option (2).
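For illustration, option (2) would boil down to a decode step roughly like this (a sketch only, not the actual conventions/coding machinery):

```python
import numpy as np

def decode_empty_strings_as_missing(values):
    # Sketch of option (2): treat empty strings read from disk as missing values.
    values = np.asarray(values, dtype=object)
    return np.where(values == "", np.nan, values)

print(decode_empty_strings_as_missing(np.array(["bar", ""], dtype=object)))
# ['bar' nan]
```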

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1647/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
842436143 MDU6SXNzdWU4NDI0MzYxNDM= 5081 Lazy indexing arrays as a stand-alone package shoyer 1217238 open 0     6 2021-03-27T07:06:03Z 2023-12-15T13:20:03Z   MEMBER      

From @rabernat on Twitter:

"Xarray has some secret private classes for lazily indexing / wrapping arrays that are so useful I think they should be broken out into a standalone package. https://github.com/pydata/xarray/blob/master/xarray/core/indexing.py#L516"

The idea here is to create a first-class "duck array" library for lazy indexing that could replace xarray's internal classes for lazy indexing. This would be in some ways similar to dask.array, but much simpler, because it doesn't have to worry about parallel computing.

Desired features:

  • Lazy indexing
  • Lazy transposes
  • Lazy concatenation (#4628) and stacking
  • Lazy vectorized operations (e.g., unary and binary arithmetic)
    • needed for decoding variables from disk (xarray.encoding) and
    • building lazy multi-dimensional coordinate arrays corresponding to map projections (#3620)
  • Maybe: lazy reshapes (#4113)

A common feature of these operations is that they can (and almost always should) be fused with indexing: if N elements are selected via indexing, only O(N) compute and memory is required to produce them, regardless of the size of the original arrays, as long as the number of applied operations can be treated as a constant. Memory access is significantly slower than compute on modern hardware, so recomputing these operations on the fly is almost always a good idea.
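To make the fusion idea concrete, here is a very rough sketch of the kind of wrapper such a package could provide (the class name and details are made up for illustration):

```python
import numpy as np

class LazyElementwiseArray:
    """Sketch: apply `func` element-wise, deferring work until indexing."""

    def __init__(self, func, array):
        self.func = func
        self.array = array
        self.shape = array.shape

    def __getitem__(self, key):
        # Index the underlying array first, then compute only the selected
        # elements: O(N) work for N selected elements.
        return self.func(self.array[key])

    def __array__(self, dtype=None):
        # Materializing the whole array is still possible, just not required.
        return np.asarray(self.func(self.array), dtype=dtype)

scaled = LazyElementwiseArray(lambda x: x * 0.5 + 10, np.arange(1_000_000))
print(scaled[:3])  # only three elements are ever computed here
```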

Out of scope: lazy computation when indexing could require access to many more elements to compute the desired value than are returned. For example, mean() probably should not be lazy, because that could involve computation of a very large number of elements that one might want to cache.

This is valuable functionality for Xarray for two reasons:

  1. It allows for "previewing" small bits of data loaded from disk or remote storage, even if that data needs some form of cheap "decoding" from its form on disk.
  2. It allows for xarray to decode data in a lazy fashion that is compatible with full-featured systems for lazy computation (e.g., Dask), without requiring the user to choose dask when reading the data.

Related issues:

  • [Proposal] Expose Variable without Pandas dependency #3981
  • Lazy concatenation of arrays #4628
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5081/reactions",
    "total_count": 6,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 6,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
197939448 MDU6SXNzdWUxOTc5Mzk0NDg= 1189 Document using a spawning multiprocessing pool for multiprocessing with dask shoyer 1217238 closed 0     3 2016-12-29T01:21:50Z 2023-12-05T21:51:04Z 2023-12-05T21:51:04Z MEMBER      

This is a nice option for working with in-file HDF5/netCDF4 compression: https://github.com/pydata/xarray/pull/1128#issuecomment-261936849
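With current dask, the spawning-pool setup would presumably look something like this (the config keys and file/variable names here are assumptions to check against the dask docs for your version):

```python
import dask
import xarray as xr

# Ask dask's multiprocessing scheduler to start workers with "spawn" rather
# than "fork", so HDF5/netCDF4 library state isn't inherited by forked workers.
with dask.config.set({"multiprocessing.context": "spawn"}, scheduler="processes"):
    ds = xr.open_dataset("compressed.nc", chunks={"time": 100})  # hypothetical file
    result = ds["air"].mean("time").compute()                    # hypothetical variable
```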

Mixed multi-threading/multi-processing could also be interesting, if anyone wants to revive that: https://github.com/dask/dask/pull/457 (I think it would work now that xarray data stores are pickle-able)

CC @mrocklin

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1189/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
430188626 MDU6SXNzdWU0MzAxODg2MjY= 2873 Dask distributed tests fail locally shoyer 1217238 closed 0     3 2019-04-07T20:26:53Z 2023-12-05T21:43:02Z 2023-12-05T21:43:02Z MEMBER      

I'm not sure why, but when I run the integration tests with dask-distributed locally (on my MacBook Pro), they fail:

```
$ pytest xarray/tests/test_distributed.py --maxfail 1
================================================ test session starts =================================================
platform darwin -- Python 3.7.2, pytest-4.0.1, py-1.7.0, pluggy-0.8.0
rootdir: /Users/shoyer/dev/xarray, inifile: setup.cfg
plugins: repeat-0.7.0
collected 19 items

xarray/tests/test_distributed.py F

====================================================== FAILURES ======================================================
_________________________ test_dask_distributed_netcdf_roundtrip[netcdf4-NETCDF3_CLASSIC] _________________________

loop = <tornado.platform.asyncio.AsyncIOLoop object at 0x1c182da1d0>
tmp_netcdf_filename = '/private/var/folders/15/qdcz0wqj1t9dg40m_ld0fjkh00b4kd/T/pytest-of-shoyer/pytest-3/test_dask_distributed_netcdf_r0/testfile.nc'
engine = 'netcdf4', nc_format = 'NETCDF3_CLASSIC'

@pytest.mark.parametrize('engine,nc_format', ENGINES_AND_FORMATS)  # noqa
def test_dask_distributed_netcdf_roundtrip(
        loop, tmp_netcdf_filename, engine, nc_format):

    if engine not in ENGINES:
        pytest.skip('engine not available')

    chunks = {'dim1': 4, 'dim2': 3, 'dim3': 6}

    with cluster() as (s, [a, b]):
        with Client(s['address'], loop=loop):

            original = create_test_data().chunk(chunks)

            if engine == 'scipy':
                with pytest.raises(NotImplementedError):
                    original.to_netcdf(tmp_netcdf_filename,
                                       engine=engine, format=nc_format)
                return

            original.to_netcdf(tmp_netcdf_filename,
                               engine=engine, format=nc_format)

            with xr.open_dataset(tmp_netcdf_filename,
                                 chunks=chunks, engine=engine) as restored:
                assert isinstance(restored.var1.data, da.Array)
                computed = restored.compute()
              assert_allclose(original, computed)

xarray/tests/test_distributed.py:87:


../../miniconda3/envs/xarray-py37/lib/python3.7/contextlib.py:119: in exit next(self.gen)


nworkers = 2, nanny = False, worker_kwargs = {}, active_rpc_timeout = 1, scheduler_kwargs = {}

@contextmanager
def cluster(nworkers=2, nanny=False, worker_kwargs={}, active_rpc_timeout=1,
            scheduler_kwargs={}):
    ...  # trimmed
    start = time()
    while list(ws):
        sleep(0.01)
      assert time() < start + 1, 'Workers still around after one second'

E AssertionError: Workers still around after one second

../../miniconda3/envs/xarray-py37/lib/python3.7/site-packages/distributed/utils_test.py:721: AssertionError ------------------------------------------------ Captured stderr call ------------------------------------------------ distributed.scheduler - INFO - Clear task state distributed.scheduler - INFO - Scheduler at: tcp://127.0.0.1:51715 distributed.worker - INFO - Start worker at: tcp://127.0.0.1:51718 distributed.worker - INFO - Listening to: tcp://127.0.0.1:51718 distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:51715 distributed.worker - INFO - ------------------------------------------------- distributed.worker - INFO - Threads: 1 distributed.worker - INFO - Memory: 17.18 GB distributed.worker - INFO - Local Directory: /Users/shoyer/dev/xarray/_test_worker-5cabd1b7-4d9c-49eb-a79e-205c588f5dae/worker-n8uv72yx distributed.worker - INFO - ------------------------------------------------- distributed.worker - INFO - Start worker at: tcp://127.0.0.1:51720 distributed.worker - INFO - Listening to: tcp://127.0.0.1:51720 distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:51715 distributed.scheduler - INFO - Register tcp://127.0.0.1:51718 distributed.worker - INFO - ------------------------------------------------- distributed.worker - INFO - Threads: 1 distributed.worker - INFO - Memory: 17.18 GB distributed.worker - INFO - Local Directory: /Users/shoyer/dev/xarray/_test_worker-71a426d4-bd34-4808-9d33-79cac2bb4801/worker-a70rlf4r distributed.worker - INFO - ------------------------------------------------- distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:51718 distributed.core - INFO - Starting established connection distributed.worker - INFO - Registered to: tcp://127.0.0.1:51715 distributed.worker - INFO - ------------------------------------------------- distributed.core - INFO - Starting established connection distributed.scheduler - INFO - Register tcp://127.0.0.1:51720 distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:51720 distributed.core - INFO - Starting established connection distributed.worker - INFO - Registered to: tcp://127.0.0.1:51715 distributed.worker - INFO - ------------------------------------------------- distributed.core - INFO - Starting established connection distributed.scheduler - INFO - Receive client connection: Client-59a7918c-5972-11e9-912a-8c85907bce57 distributed.core - INFO - Starting established connection distributed.core - INFO - Event loop was unresponsive in Worker for 1.05s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability. distributed.scheduler - INFO - Receive client connection: Client-worker-5a5c81de-5972-11e9-9136-8c85907bce57 distributed.core - INFO - Starting established connection distributed.core - INFO - Event loop was unresponsive in Worker for 1.33s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability. 
distributed.scheduler - INFO - Receive client connection: Client-worker-5b2496d8-5972-11e9-9137-8c85907bce57 distributed.core - INFO - Starting established connection distributed.scheduler - INFO - Remove client Client-59a7918c-5972-11e9-912a-8c85907bce57 distributed.scheduler - INFO - Remove client Client-59a7918c-5972-11e9-912a-8c85907bce57 distributed.scheduler - INFO - Close client connection: Client-59a7918c-5972-11e9-912a-8c85907bce57 distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:51720 distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:51718 distributed.scheduler - INFO - Remove worker tcp://127.0.0.1:51720 distributed.core - INFO - Removing comms to tcp://127.0.0.1:51720 distributed.scheduler - INFO - Remove worker tcp://127.0.0.1:51718 distributed.core - INFO - Removing comms to tcp://127.0.0.1:51718 distributed.scheduler - INFO - Lost all workers distributed.scheduler - INFO - Remove client Client-worker-5b2496d8-5972-11e9-9137-8c85907bce57 distributed.scheduler - INFO - Remove client Client-worker-5a5c81de-5972-11e9-9136-8c85907bce57 distributed.scheduler - INFO - Close client connection: Client-worker-5b2496d8-5972-11e9-9137-8c85907bce57 distributed.scheduler - INFO - Close client connection: Client-worker-5a5c81de-5972-11e9-9136-8c85907bce57 distributed.scheduler - INFO - Scheduler closing... distributed.scheduler - INFO - Scheduler closing all comms ```

Version info: ``` In [2]: xarray.show_versions()

INSTALLED VERSIONS

commit: 2ce0639ee2ba9c7b1503356965f77d847d6cfcdf python: 3.7.2 (default, Dec 29 2018, 00:00:04) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 18.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2

xarray: 0.12.1+4.g2ce0639e pandas: 0.24.0 numpy: 1.15.4 scipy: 1.1.0 netCDF4: 1.4.3.2 pydap: None h5netcdf: 0.7.0 h5py: 2.9.0 Nio: None zarr: 2.2.0 cftime: 1.0.3.4 nc_time_axis: None PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.2.1 dask: 1.1.5 distributed: 1.26.1 matplotlib: 3.0.2 cartopy: 0.17.0 seaborn: 0.9.0 setuptools: 40.0.0 pip: 18.0 conda: None pytest: 4.0.1 IPython: 6.5.0 sphinx: 1.8.2 ```

@mrocklin does this sort of error look familiar to you?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2873/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  not_planned xarray 13221727 issue
707647715 MDExOlB1bGxSZXF1ZXN0NDkyMDEzODg4 4453 Simplify and restore old behavior for deep-copies shoyer 1217238 closed 0     3 2020-09-23T20:10:33Z 2023-09-14T03:06:34Z 2023-09-14T03:06:33Z MEMBER   1 pydata/xarray/pulls/4453

Intended to fix https://github.com/pydata/xarray/issues/4449

The goal is to restore behavior to match what we had prior to https://github.com/pydata/xarray/pull/4379 for all types of data other than np.ndarray objects

Needs tests!

  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] Passes isort . && black . && mypy . && flake8
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4453/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
588105641 MDU6SXNzdWU1ODgxMDU2NDE= 3893 HTML repr in the online docs shoyer 1217238 open 0     3 2020-03-26T02:17:51Z 2023-09-11T17:41:59Z   MEMBER      

I noticed two minor issues in our online docs, now that we've switched to the hip new HTML repr by default.

  1. Most doc pages still show text, not HTML. I suspect this is a limitation of the IPython sphinx directive we use for our snippets. We might be able to fix that by switching to jupyter-sphinx (a rough configuration sketch is below)?

  2. The "attributes" part of the HTML repr in our notebook examples looks a little funny, with strange blue formatting around each attribute name. It looks like part of the outer style of our docs is leaking into the HTML repr:

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3893/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1376109308 I_kwDOAMm_X85SBcL8 7045 Should Xarray stop doing automatic index-based alignment? shoyer 1217238 open 0     13 2022-09-16T15:31:03Z 2023-08-23T07:42:34Z   MEMBER      

What is your issue?

I am increasingly thinking that automatic index-based alignment in Xarray (copied from pandas) may have been a design mistake. Almost every time I work with datasets with different indexes, I find myself writing code to explicitly align them:

  1. Automatic alignment is hard to predict. The implementation is complicated, and the exact mode of automatic alignment (outer vs inner vs left join) depends on the specific operation. It's also no longer possible to predict the shape (or even the dtype) resulting from most Xarray operations purely from input shape/dtype.
  2. Automatic alignment brings an unexpected performance penalty. In some domains (analytics) this is OK, but in others (e.g., numerical modeling or deep learning) it is a complete deal-breaker.
  3. Automatic alignment is not useful for float indexes, because exact matches are rare. In practice, this makes it less useful in Xarray's usual domains than it is for pandas.

Would it be insane to consider changing Xarray's behavior to stop doing automatic alignment? I imagine we could roll this out slowly, first with warnings and then with an option for disabling it.
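For context, the explicit alternative that users would write instead already exists today:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(3), dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray(np.arange(3), dims="x", coords={"x": [1, 2, 3]})

# Opt in to alignment explicitly instead of relying on the automatic join:
a2, b2 = xr.align(a, b, join="inner")
print((a2 + b2).sizes)  # Frozen({'x': 2})

# With join="exact", mismatched indexes raise instead of silently aligning:
# xr.align(a, b, join="exact")  # -> ValueError
```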

If you think this is a good or bad idea, consider responding to this issue with a 👍 or 👎 reaction.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7045/reactions",
    "total_count": 13,
    "+1": 9,
    "-1": 2,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 2
}
    xarray 13221727 issue
342928718 MDExOlB1bGxSZXF1ZXN0MjAyNzE0MjUx 2302 WIP: lazy=True in apply_ufunc() shoyer 1217238 open 0     1 2018-07-20T00:01:21Z 2023-07-18T04:19:17Z   MEMBER   0 pydata/xarray/pulls/2302
  • [x] Closes https://github.com/pydata/xarray/issues/2298
  • [ ] Tests added
  • [ ] Tests passed
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API

Still needs more tests and documentation.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2302/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1767947798 PR_kwDOAMm_X85TkPzV 7933 Update calendar for developers meeting shoyer 1217238 closed 0     0 2023-06-21T16:09:44Z 2023-06-21T17:56:22Z 2023-06-21T17:56:22Z MEMBER   0 pydata/xarray/pulls/7933

The old calendar was on @jhamman's UCAR account, which he no longer has access to!

xref https://github.com/pydata/xarray/issues/4001

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7933/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
479942077 MDU6SXNzdWU0Nzk5NDIwNzc= 3213 How should xarray use/support sparse arrays? shoyer 1217238 open 0     55 2019-08-13T03:29:42Z 2023-06-07T15:43:55Z   MEMBER      

I'm looking forward to being easily able to create sparse xarray objects from pandas: https://github.com/pydata/xarray/issues/3206

Are there other xarray APIs that could make good use of sparse arrays, or could make sparse arrays easier to use?

Some ideas:

  • to_sparse()/to_dense() methods for converting to/from sparse without requiring using .data
  • to_dataframe()/to_series() could grow options for skipping the fill-value in sparse arrays, so they can round-trip MultiIndex data back to pandas
  • Serialization to/from netCDF files, using some custom convention (see https://github.com/pydata/xarray/issues/1375#issuecomment-402699810)
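For instance, a hand-rolled to_sparse()/to_dense() pair is already possible via DataArray.copy (a sketch; these methods don't exist yet, and sparse duck-array support has caveats):

```python
import numpy as np
import sparse
import xarray as xr

dense = xr.DataArray(np.eye(1000), dims=("x", "y"))

# Roughly what a to_sparse() method might do: wrap the data in a sparse.COO array.
as_sparse = dense.copy(data=sparse.COO.from_numpy(dense.data))

# ...and to_dense() would be the inverse:
back_to_dense = as_sparse.copy(data=as_sparse.data.todense())
```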

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3213/reactions",
    "total_count": 14,
    "+1": 14,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1465287257 I_kwDOAMm_X85XVoJZ 7325 Support reading Zarr data via TensorStore shoyer 1217238 open 0     1 2022-11-27T00:12:17Z 2023-05-11T01:24:27Z   MEMBER      

What is your issue?

TensorStore is another high performance API for reading distributed arrays in formats such as Zarr, written in C++.

It could be interesting to write an Xarray storage backend using TensorStore as an alternative way to read Zarr files.

As an exercise, I made a little demo of doing this: https://gist.github.com/shoyer/5b0c485979cc9c36a9685d8cf8e94565

I have not tested it for performance. The main annoyance is that TensorStore doesn't understand Zarr groups or Zarr array attributes, so I needed to write my own helpers for reading this metadata.

Also, there's a bit of an impedance mis-match between TensorStore (where everything returns futures) and Xarray (which assumes that indexing results in numpy arrays). This could likely be improved with some amount of effort -- in particular https://github.com/pydata/xarray/pull/6874/files should help.

CC @jbms who may have better ideas about how to use the TensorStore API.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7325/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
253395960 MDU6SXNzdWUyNTMzOTU5NjA= 1533 Index variables loaded from dask can be computed twice shoyer 1217238 closed 0     6 2017-08-28T17:18:27Z 2023-04-06T04:15:46Z 2023-04-06T04:15:46Z MEMBER      

as reported by @crusaderky in #1522

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1533/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
209653741 MDU6SXNzdWUyMDk2NTM3NDE= 1285 FAQ page could use some updating shoyer 1217238 open 0     1 2017-02-23T03:29:16Z 2023-03-26T16:32:44Z   MEMBER      

Along the same lines as https://github.com/pydata/xarray/issues/1282, we haven't done much updating for frequently asked questions -- it's mostly still the original handful of FAQ entries I wrote in the first version of the docs.

Topics worth addressing:

  • [ ] How xarray handles missing values
  • [x] File formats -- how can I read format X in xarray? (Maybe we should make a table with links to other packages?)

(please add suggestions for this list!)

StackOverflow may be a helpful reference here: http://stackoverflow.com/questions/tagged/python-xarray?sort=votes&pageSize=50

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1285/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
176805500 MDU6SXNzdWUxNzY4MDU1MDA= 1004 Remove IndexVariable.name shoyer 1217238 open 0     3 2016-09-14T03:27:43Z 2023-03-11T19:57:40Z   MEMBER      

As discussed in #947, we should remove the IndexVariable.name attribute. It should be fine to use an IndexVariable anywhere, regardless of whether or not it labels ticks along a dimension.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1004/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
98587746 MDU6SXNzdWU5ODU4Nzc0Ng== 508 Ignore missing variables when concatenating datasets? shoyer 1217238 closed 0     8 2015-08-02T06:03:57Z 2023-01-20T16:04:28Z 2023-01-20T16:04:28Z MEMBER      

Several users (@raj-kesavan, @richardotis, now myself) have wondered about how to concatenate xray Datasets with different variables.

With the current xray.concat, you need to awkwardly create dummy variables filled with NaN in datasets that don't have them (or drop mismatched variables entirely). Neither of these are great options -- concat should have an option (the default?) to take care of this for the user.

This would also be more consistent with pd.concat, which takes a more relaxed approach to matching dataframes with different variables (it does an outer join).
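For concreteness, the awkward workaround currently looks something like this (a sketch with made-up data):

```python
import numpy as np
import xarray as xr

ds1 = xr.Dataset({"temperature": ("x", [11.0, 12.0])})
ds2 = xr.Dataset({"temperature": ("x", [13.0, 14.0]),
                  "pressure": ("x", [1.0, 2.0])})

# Workaround: add a NaN-filled dummy variable so both datasets have the same variables.
ds1 = ds1.assign(pressure=("x", np.full(2, np.nan)))
combined = xr.concat([ds1, ds2], dim="x")
print(combined)
```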

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/508/reactions",
    "total_count": 6,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
895983112 MDExOlB1bGxSZXF1ZXN0NjQ4MTM1NTcy 5351 Add xarray.backends.NoMatchingEngineError shoyer 1217238 open 0     4 2021-05-19T22:09:21Z 2022-11-16T15:19:54Z   MEMBER   0 pydata/xarray/pulls/5351
  • [x] Closes #5329
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5351/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
803068773 MDExOlB1bGxSZXF1ZXN0NTY5MDU5MTEz 4879 Cache files for different CachingFileManager objects separately shoyer 1217238 closed 0     10 2021-02-07T21:48:06Z 2022-10-18T16:40:41Z 2022-10-18T16:40:40Z MEMBER   0 pydata/xarray/pulls/4879

This means that explicitly opening a file multiple times with open_dataset (e.g., after modifying it on disk) now reopens the file from scratch, rather than reusing a cached version.

If users want to reuse the cached file, they can reuse the same xarray object. We don't need this for handling many files in Dask (the original motivation for caching), because in those cases only a single CachingFileManager is created.

I think this should fix some long-standing usability issues: #4240, #4862

Conveniently, this also obviates the need for some messy reference counting logic.

  • [x] Closes #4240, #4862
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4879/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
623804131 MDU6SXNzdWU2MjM4MDQxMzE= 4090 Error with indexing 2D lat/lon coordinates shoyer 1217238 closed 0     2 2020-05-24T06:19:45Z 2022-09-28T12:06:03Z 2022-09-28T12:06:03Z MEMBER      

```
filslp = "ChonghuaYinData/prmsl.mon.mean.nc"
filtmp = "ChonghuaYinData/air.sig995.mon.mean.nc"
filprc = "ChonghuaYinData/precip.mon.mean.nc"

ds_slp = xr.open_dataset(filslp).sel(time=slice(str(yrStrt)+'-01-01', str(yrLast)+'-12-31'))

ds_slp outputs:
<xarray.Dataset>
Dimensions:            (nbnds: 2, time: 480, x: 349, y: 277)
Coordinates:
  * time               (time) datetime64[ns] 1979-01-01 ... 2018-12-01
    lat                (y, x) float32 ...
    lon                (y, x) float32 ...
  * y                  (y) float32 0.0 32463.0 64926.0 ... 8927325.0 8959788.0
  * x                  (x) float32 0.0 32463.0 64926.0 ... 11264660.0 11297120.0
Dimensions without coordinates: nbnds
Data variables:
    Lambert_Conformal  int32 ...
    prmsl              (time, y, x) float32 ...
    time_bnds          (time, nbnds) float64 ...
Attributes:
    Conventions:    CF-1.2
    centerlat:      50.0
    centerlon:      -107.0
    comments:
    institution:    National Centers for Environmental Prediction
    latcorners:     [ 1.000001  0.897945 46.3544  46.63433 ]
    loncorners:     [-145.5  -68.32005  -2.569891  148.6418 ]
    platform:       Model
    standardpar1:   50.0
    standardpar2:   50.000001
    title:          NARR Monthly Means
    dataset_title:  NCEP North American Regional Reanalysis (NARR)
    history:        created 2016/04/12 by NOAA/ESRL/PSD
    references:     https://www.esrl.noaa.gov/psd/data/gridded/data.narr.html
    source:         http://www.emc.ncep.noaa.gov/mmb/rreanl/index.html
    References:
```

```
yrStrt = 1950    # manually specify for convenience
yrLast = 2018    # 20th century ends 2018

clStrt = 1950    # reference climatology for SOI
clLast = 1979

yrStrtP = 1979   # 1st year GPCP
yrLastP = yrLast # match 20th century

latT = -17.6     # Tahiti
lonT = 210.75
latD = -12.5     # Darwin
lonD = 130.83

# select grids of T and D
T = ds_slp.sel(lat=latT, lon=lonT, method='nearest')
D = ds_slp.sel(lat=latD, lon=lonD, method='nearest')

outputs:


ValueError Traceback (most recent call last) <ipython-input-27-6702b30f473f> in <module> 1 # select grids of T and D ----> 2 T = ds_slp.sel(lat=latT, lon=lonT, method='nearest') 3 D = ds_slp.sel(lat=latD, lon=lonD, method='nearest')

~\Anaconda3\lib\site-packages\xarray\core\dataset.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs) 2004 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel") 2005 pos_indexers, new_indexes = remap_label_indexers( -> 2006 self, indexers=indexers, method=method, tolerance=tolerance 2007 ) 2008 result = self.isel(indexers=pos_indexers, drop=drop)

~\Anaconda3\lib\site-packages\xarray\core\coordinates.py in remap_label_indexers(obj, indexers, method, tolerance, **indexers_kwargs) 378 379 pos_indexers, new_indexes = indexing.remap_label_indexers( --> 380 obj, v_indexers, method=method, tolerance=tolerance 381 ) 382 # attach indexer's coordinate to pos_indexers

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in remap_label_indexers(data_obj, indexers, method, tolerance) 257 new_indexes = {} 258 --> 259 dim_indexers = get_dim_indexers(data_obj, indexers) 260 for dim, label in dim_indexers.items(): 261 try:

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in get_dim_indexers(data_obj, indexers) 223 ] 224 if invalid: --> 225 raise ValueError("dimensions or multi-index levels %r do not exist" % invalid) 226 227 level_indexers = defaultdict(dict)

ValueError: dimensions or multi-index levels ['lat', 'lon'] do not exist ```

Does anyone know how to fix this problem? Thank you very much.

Originally posted by @JimmyGao0204 in https://github.com/pydata/xarray/issues/475#issuecomment-633172787
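For what it's worth, with two-dimensional lat/lon coordinates the usual workaround is to find the nearest grid point by hand and then index positionally, roughly like this (a sketch only, not tested against the dataset above; longitude conventions may need adjusting):

```python
import numpy as np

def sel_nearest_2d(ds, lat0, lon0):
    # Squared distance in degrees to every grid cell; good enough for nearest-neighbour.
    dist = (ds["lat"] - lat0) ** 2 + (ds["lon"] - lon0) ** 2
    iy, ix = np.unravel_index(int(dist.argmin()), dist.shape)
    # lat/lon are defined on dims (y, x), so select by position instead of label.
    return ds.isel(y=iy, x=ix)

T = sel_nearest_2d(ds_slp, latT, lonT)
D = sel_nearest_2d(ds_slp, latD, lonD)
```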

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4090/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1210147360 I_kwDOAMm_X85IIWIg 6504 test_weighted.test_weighted_operations_nonequal_coords should avoid depending on random number seed shoyer 1217238 closed 0 shoyer 1217238   0 2022-04-20T19:56:19Z 2022-08-29T20:42:30Z 2022-08-29T20:42:30Z MEMBER      

What happened?

In testing an upgrade to the latest version of xarray in our systems, I noticed this test failing:

```
def test_weighted_operations_nonequal_coords():
    # There are no weights for a == 4, so that data point is ignored.
    weights = DataArray(np.random.randn(4), dims=("a",), coords=dict(a=[0, 1, 2, 3]))
    data = DataArray(np.random.randn(4), dims=("a",), coords=dict(a=[1, 2, 3, 4]))
    check_weighted_operations(data, weights, dim="a", skipna=None)

    q = 0.5
    result = data.weighted(weights).quantile(q, dim="a")
    # Expected value computed using code from https://aakinshin.net/posts/weighted-quantiles/ with values at a=1,2,3
    expected = DataArray([0.9308707], coords={"quantile": [q]}).squeeze()
  assert_allclose(result, expected)

E       AssertionError: Left and right DataArray objects are not close
E
E       Differing values:
E       L
E         array(0.919569)
E       R
E         array(0.930871)
```

It appears that this test is hard-coded to match a particular random number seed, which in turn would fix the results of np.random.randn().

What did you expect to happen?

Whenever possible, Xarray's own tests should avoid relying on particular random number generators, e.g., in this case we could specify random numbers instead.

A back-up option would be to explicitly set random seed locally inside the tests, e.g., by creating a np.random.RandomState() with a fixed seed and using that. The global random state used by np.random.randn() is sensitive to implementation details like the order in which tests are run.
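A sketch of that back-up option applied to the failing test (check_weighted_operations is the existing helper from the test module; the hard-coded expected quantile would of course need recomputing for whatever seed is chosen):

```python
import numpy as np
from xarray import DataArray

def test_weighted_operations_nonequal_coords():
    # Use a local, explicitly seeded generator, independent of the global
    # np.random state and of the order in which tests run.
    rng = np.random.RandomState(seed=12345)
    weights = DataArray(rng.randn(4), dims=("a",), coords=dict(a=[0, 1, 2, 3]))
    data = DataArray(rng.randn(4), dims=("a",), coords=dict(a=[1, 2, 3, 4]))
    check_weighted_operations(data, weights, dim="a", skipna=None)
```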

Minimal Complete Verifiable Example

No response

Relevant log output

No response

Anything else we need to know?

No response

Environment

...

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6504/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1210267320 I_kwDOAMm_X85IIza4 6505 Dropping a MultiIndex variable raises an error after explicit indexes refactor shoyer 1217238 closed 0     3 2022-04-20T22:07:26Z 2022-07-21T14:46:58Z 2022-07-21T14:46:58Z MEMBER      

What happened?

With the latest released version of Xarray, it is possible to delete all variables corresponding to a MultiIndex by simply deleting the name of the MultiIndex.

After the explicit indexes refactor (i.e., using the "main" development branch) this now raises an error about how this would "corrupt" index state. This comes up when using drop() and assign_coords() and possibly some other methods.

This is not hard to work around, but we may want to consider this bug a blocker for the next Xarray release. I found the issue surfaced in several projects when attempting to use the new version of Xarray inside Google's codebase.

CC @benbovy in case you have any thoughts to share.

What did you expect to happen?

For now, we should preserve the behavior of deleting the variables corresponding to MultiIndex levels, but should issue a deprecation warning encouraging users to explicitly delete everything.

Minimal Complete Verifiable Example

```python
import xarray

array = xarray.DataArray(
    [[1, 2], [3, 4]],
    dims=['x', 'y'],
    coords={'x': ['a', 'b']},
)
stacked = array.stack(z=['x', 'y'])
print(stacked.drop('z'))
print()
print(stacked.assign_coords(z=[1, 2, 3, 4]))
```

Relevant log output

```Python ValueError Traceback (most recent call last) Input In [1], in <cell line: 9>() 3 array = xarray.DataArray( 4 [[1, 2], [3, 4]], 5 dims=['x', 'y'], 6 coords={'x': ['a', 'b']}, 7 ) 8 stacked = array.stack(z=['x', 'y']) ----> 9 print(stacked.drop('z')) 10 print() 11 print(stacked.assign_coords(z=[1, 2, 3, 4]))

File ~/dev/xarray/xarray/core/dataarray.py:2425, in DataArray.drop(self, labels, dim, errors, labels_kwargs) 2408 def drop( 2409 self, 2410 labels: Mapping = None, (...) 2414 labels_kwargs, 2415 ) -> DataArray: 2416 """Backward compatible method based on drop_vars and drop_sel 2417 2418 Using either drop_vars or drop_sel is encouraged (...) 2423 DataArray.drop_sel 2424 """ -> 2425 ds = self._to_temp_dataset().drop(labels, dim, errors=errors) 2426 return self._from_temp_dataset(ds)

File ~/dev/xarray/xarray/core/dataset.py:4590, in Dataset.drop(self, labels, dim, errors, **labels_kwargs) 4584 if dim is None and (is_scalar(labels) or isinstance(labels, Iterable)): 4585 warnings.warn( 4586 "dropping variables using drop will be deprecated; using drop_vars is encouraged.", 4587 PendingDeprecationWarning, 4588 stacklevel=2, 4589 ) -> 4590 return self.drop_vars(labels, errors=errors) 4591 if dim is not None: 4592 warnings.warn( 4593 "dropping labels using list-like labels is deprecated; using " 4594 "dict-like arguments with drop_sel, e.g. `ds.drop_sel(dim=[labels]).", 4595 DeprecationWarning, 4596 stacklevel=2, 4597 )

File ~/dev/xarray/xarray/core/dataset.py:4549, in Dataset.drop_vars(self, names, errors) 4546 if errors == "raise": 4547 self._assert_all_in_dataset(names) -> 4549 assert_no_index_corrupted(self.xindexes, names) 4551 variables = {k: v for k, v in self._variables.items() if k not in names} 4552 coord_names = {k for k in self._coord_names if k in variables}

File ~/dev/xarray/xarray/core/indexes.py:1394, in assert_no_index_corrupted(indexes, coord_names) 1392 common_names_str = ", ".join(f"{k!r}" for k in common_names) 1393 index_names_str = ", ".join(f"{k!r}" for k in index_coords) -> 1394 raise ValueError( 1395 f"cannot remove coordinate(s) {common_names_str}, which would corrupt " 1396 f"the following index built from coordinates {index_names_str}:\n" 1397 f"{index}" 1398 )

ValueError: cannot remove coordinate(s) 'z', which would corrupt the following index built from coordinates 'z', 'x', 'y': <xarray.core.indexes.PandasMultiIndex object at 0x148c95150> ```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: 33cdabd261b5725ac357c2823bd0f33684d3a954 python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:42:03) [Clang 12.0.1 ] python-bits: 64 OS: Darwin OS-release: 21.4.0 machine: arm64 processor: arm byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 0.18.3.dev137+g96c56836 pandas: 1.4.2 numpy: 1.22.3 scipy: 1.8.0 netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.11.3 cftime: 1.6.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2022.04.1 distributed: 2022.4.1 matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: 2022.3.0 cupy: None pint: None sparse: None setuptools: 62.1.0 pip: 22.0.4 conda: None pytest: 7.1.1 IPython: 8.2.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6505/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
168272291 MDExOlB1bGxSZXF1ZXN0NzkzMjE2NTc= 924 WIP: progress toward making groupby work with multiple arguments shoyer 1217238 open 0     16 2016-07-29T08:07:57Z 2022-06-09T14:50:17Z   MEMBER   0 pydata/xarray/pulls/924

Fixes #324

It definitely doesn't work properly yet, totally mixing up coordinates, data variables and multi-indexes (as shown by the failing tests).

A simple example:

```
In [4]: coords = {'a': ('x', [0, 0, 1, 1]), 'b': ('y', [0, 0, 1, 1])}

In [5]: square = xr.DataArray(np.arange(16).reshape(4, 4), coords=coords, dims=['x', 'y'])

In [6]: square
Out[6]:
<xarray.DataArray (x: 4, y: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Coordinates:
    b        (y) int64 0 0 1 1
    a        (x) int64 0 0 1 1
  * x        (x) int64 0 1 2 3
  * y        (y) int64 0 1 2 3

In [7]: square.groupby(['a', 'b']).mean()
Out[7]:
<xarray.DataArray (a: 2, b: 2)>
array([[  2.5,   4.5],
       [ 10.5,  12.5]])
Coordinates:
  * a        (a) int64 0 1
  * b        (b) int64 0 1

In [8]: square.groupby(['x', 'y']).mean()
Out[8]:
<xarray.DataArray (x: 4, y: 4)>
array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.],
       [ 12.,  13.,  14.,  15.]])
Coordinates:
  * x        (x) int64 0 1 2 3
  * y        (y) int64 0 1 2 3
```

More examples: https://gist.github.com/shoyer/5cfa4d5751e8a78a14af25f8442ad8d5

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/924/reactions",
    "total_count": 4,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 3,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
711626733 MDU6SXNzdWU3MTE2MjY3MzM= 4473 Wrap numpy-groupies to speed up Xarray's groupby aggregations shoyer 1217238 closed 0     8 2020-09-30T04:43:04Z 2022-05-15T02:38:29Z 2022-05-15T02:38:29Z MEMBER      

Is your feature request related to a problem? Please describe.

Xarray's groupby aggregations (e.g., groupby(..).sum()) are very slow compared to pandas, as described in https://github.com/pydata/xarray/issues/659.

Describe the solution you'd like

We could speed things up considerably (easily 100x) by wrapping the numpy-groupies package.

Additional context

One challenge is how to handle dask arrays (and other duck arrays). In some cases it might make sense to apply the numpy-groupies function (using apply_ufunc), but in other cases it might be better to stick with the current indexing + concatenate solution. We could either pick some simple heuristics for choosing the algorithm to use on dask arrays, or could just stick with the current algorithm for now.

In particular, it might make sense to stick with the current algorithm if there are many chunks in the arrays to be aggregated along the "grouped" dimension (depending on the number of unique group values).
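A rough sketch of what the numpy-only path could look like (treating numpy_groupies' aggregate signature and the exact apply_ufunc wiring as assumptions to double-check):

```python
import numpy as np
import numpy_groupies as npg
import xarray as xr

def fast_groupby_sum(da, group, dim):
    # `group` is a 1D DataArray of labels along `dim`; factorize into integer codes.
    labels, codes = np.unique(group.values, return_inverse=True)
    codes = xr.DataArray(codes, dims=dim)
    result = xr.apply_ufunc(
        npg.aggregate,          # npg.aggregate(group_idx, a, ...)
        codes,
        da,
        input_core_dims=[[dim], [dim]],
        output_core_dims=[["group"]],
        kwargs={"func": "sum", "size": len(labels), "axis": -1},
    )
    return result.assign_coords(group=("group", labels))

# e.g. fast_groupby_sum(da, da["labels"], "time") as a rough stand-in for
# da.groupby("labels").sum() -- "labels"/"time" are hypothetical names.
```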

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4473/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
326205036 MDU6SXNzdWUzMjYyMDUwMzY= 2180 How should Dataset.update() handle conflicting coordinates? shoyer 1217238 open 0     16 2018-05-24T16:46:23Z 2022-04-30T13:40:28Z   MEMBER      

Recently, we updated Dataset.__setitem__ to drop conflicting coordinates from DataArray values being assigned if they conflict with existing coordinates (https://github.com/pydata/xarray/pull/2087). Because update and __setitem__ share the same code path, this inadvertently updated update as well. Is this something we want?

In v0.10.3, both __setitem__ and update prioritize coordinates from the assigned objects (e.g., value in dataset[key] = value).

In v0.10.4, both __setitem__ and update prioritize coordinates from the original object (e.g., dataset).

I'm not sure this is the right behavior. In particular, in the case of dataset.update(other) where other is also an xarray.Dataset, it seems like coordinates from other should take priority.

Note that one advantage of the current logic (which is violated by my current fix in https://github.com/pydata/xarray/pull/2162), is that we maintain the invariant that dataset[key] = value is equivalent to dataset.update({key: value}).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2180/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
612918997 MDU6SXNzdWU2MTI5MTg5OTc= 4034 Fix tight_layout warning on cartopy facetgrid docs example shoyer 1217238 open 0     1 2020-05-05T21:54:46Z 2022-04-30T12:37:50Z   MEMBER      

Per the fix in https://github.com/pydata/xarray/pull/4032, I'm pretty sure we will soon start seeing a warning message printed on ReadTheDocs in Cartopy FacetGrid example: http://xarray.pydata.org/en/stable/plotting.html#maps

This would be nice to fix for users, especially because it's likely users will see this warning when running code outside of our documentation, too.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4034/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
621123222 MDU6SXNzdWU2MjExMjMyMjI= 4081 Wrap "Dimensions" onto multiple lines in xarray.Dataset repr? shoyer 1217238 closed 0     4 2020-05-19T16:31:59Z 2022-04-29T19:59:24Z 2022-04-29T19:59:24Z MEMBER      

Here's an example dataset of a large dataset from @alimanfoo: https://nbviewer.jupyter.org/gist/alimanfoo/b74b08465727894538d5b161b3ced764 <xarray.Dataset> Dimensions: (__variants/BaseCounts_dim1: 4, __variants/MLEAC_dim1: 3, __variants/MLEAF_dim1: 3, alt_alleles: 3, ploidy: 2, samples: 1142, variants: 21442865) Coordinates: samples/ID (samples) object dask.array<chunksize=(1142,), meta=np.ndarray> variants/CHROM (variants) object dask.array<chunksize=(21442865,), meta=np.ndarray> variants/POS (variants) int32 dask.array<chunksize=(4194304,), meta=np.ndarray> Dimensions without coordinates: __variants/BaseCounts_dim1, __variants/MLEAC_dim1, __variants/MLEAF_dim1, alt_alleles, ploidy, samples, variants Data variables: variants/ABHet (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray> variants/ABHom (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray> variants/AC (variants, alt_alleles) int32 dask.array<chunksize=(4194304, 3), meta=np.ndarray> variants/AF (variants, alt_alleles) float32 dask.array<chunksize=(4194304, 3), meta=np.ndarray> ...

I know similarly large datasets with lots of dimensions come up in other contexts as well, e.g., with geophysical model output.

That's a very long first line! This would be easier to read as: <xarray.Dataset> Dimensions: (__variants/BaseCounts_dim1: 4, __variants/MLEAC_dim1: 3, __variants/MLEAF_dim1: 3, alt_alleles: 3, ploidy: 2, samples: 1142, variants: 21442865) Coordinates: samples/ID (samples) object dask.array<chunksize=(1142,), meta=np.ndarray> variants/CHROM (variants) object dask.array<chunksize=(21442865,), meta=np.ndarray> variants/POS (variants) int32 dask.array<chunksize=(4194304,), meta=np.ndarray> Dimensions without coordinates: __variants/BaseCounts_dim1, __variants/MLEAC_dim1, __variants/MLEAF_dim1, alt_alleles, ploidy, samples, variants Data variables: variants/ABHet (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray> variants/ABHom (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray> variants/AC (variants, alt_alleles) int32 dask.array<chunksize=(4194304, 3), meta=np.ndarray> variants/AF (variants, alt_alleles) float32 dask.array<chunksize=(4194304, 3), meta=np.ndarray> ...

or maybe: <xarray.Dataset> Dimensions: __variants/BaseCounts_dim1: 4 __variants/MLEAC_dim1: 3 __variants/MLEAF_dim1: 3 alt_alleles: 3 ploidy: 2 samples: 1142 variants: 21442865 Coordinates: samples/ID (samples) object dask.array<chunksize=(1142,), meta=np.ndarray> variants/CHROM (variants) object dask.array<chunksize=(21442865,), meta=np.ndarray> variants/POS (variants) int32 dask.array<chunksize=(4194304,), meta=np.ndarray> Dimensions without coordinates: __variants/BaseCounts_dim1, __variants/MLEAC_dim1, __variants/MLEAF_dim1, alt_alleles, ploidy, samples, variants Data variables: variants/ABHet (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray> variants/ABHom (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray> variants/AC (variants, alt_alleles) int32 dask.array<chunksize=(4194304, 3), meta=np.ndarray> variants/AF (variants, alt_alleles) float32 dask.array<chunksize=(4194304, 3), meta=np.ndarray> ...

Dimensions without coordinates could probably use some wrapping, too.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4081/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
205455788 MDU6SXNzdWUyMDU0NTU3ODg= 1251 Consistent naming for xarray's methods that apply functions shoyer 1217238 closed 0     13 2017-02-05T21:27:24Z 2022-04-27T20:06:25Z 2022-04-27T20:06:25Z MEMBER      

We currently have two types of methods that take a function to apply to xarray objects:

  • pipe (on DataArray and Dataset): apply a function to this entire object (array.pipe(func) -> func(array))
  • apply (on Dataset and GroupBy): apply a function to each labeled object in this object (e.g., ds.apply(func) -> ds({k: func(v) for k, v in ds.data_vars.items()})).

And one more method that we want to add but isn't finalized yet -- currently named apply_ufunc:

  • Apply a function that acts on unlabeled (i.e., numpy) arrays to each array in the object

I'd like to have three distinct names that makes it clear what these methods do and how they are different. This has come up a few times recently, e.g., https://github.com/pydata/xarray/issues/1130

One proposal: rename apply to map, and then use apply only for methods that act on unlabeled arrays. This would require a deprecation cycle, but eventually it would let us add .apply methods for handling raw arrays to both Dataset and DataArray. (We could use a separate apply method from apply_ufunc to convert dim arguments to axis and not do automatic broadcasting.)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1251/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
342180429 MDU6SXNzdWUzNDIxODA0Mjk= 2298 Making xarray math lazy shoyer 1217238 open 0     7 2018-07-18T05:18:53Z 2022-04-19T15:38:59Z   MEMBER      

At SciPy, I had the realization that it would be relatively straightforward to make element-wise math between xarray objects lazy. This would let us support lazy coordinate arrays, a feature that has quite a few use-cases, e.g., for both geoscience and astronomy.

The trick would be to write a lazy array class that holds an element-wise vectorized function and passes indexers on to its arguments. I haven't thought too hard about this yet for vectorized indexing, but it could be quite efficient for outer indexing. I have some prototype code but no tests yet.
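Roughly, the prototype idea looks like this (a sketch with made-up names, limited to same-shaped arguments and basic/outer indexing):

```python
import numpy as np

class LazyBinaryOp:
    """Sketch: hold a vectorized function and its arguments, and forward
    indexers to the arguments instead of computing the result up front."""

    def __init__(self, func, left, right):
        self.func = func
        self.left = left
        self.right = right
        self.shape = np.broadcast_shapes(left.shape, right.shape)

    def __getitem__(self, key):
        # Forwarding the same key works for same-shaped arguments and
        # basic/outer indexing; vectorized indexing needs more care.
        return self.func(self.left[key], self.right[key])

lons = np.linspace(-180, 180, 1_000_000)
offsets = np.full(1_000_000, 0.5)
shifted = LazyBinaryOp(np.add, lons, offsets)
print(shifted[:3])  # only three elements are computed here
```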

The question is how to hook this into xarray operations. In particular, supposing that the inputs to a function do not hold dask arrays:

  • Should we try to make every element-wise operation with vectorized functions (ufuncs) lazy by default? This might have negative performance implications and would be a little tricky to implement with xarray's current code, since we still implement binary operations like + with separate logic from apply_ufunc.
  • Should we make every element-wise operation that explicitly uses apply_ufunc() lazy by default?
  • Or should we only make element-wise operations lazy with apply_ufunc() if you use some special flag, e.g., apply_ufunc(..., lazy=True)?

I am leaning towards the last option for now but would welcome other opinions.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2298/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
902622057 MDU6SXNzdWU5MDI2MjIwNTc= 5381 concat() with compat='no_conflicts' on dask arrays has accidentally quadratic runtime shoyer 1217238 open 0     0 2021-05-26T16:12:06Z 2022-04-19T03:48:27Z   MEMBER      

This ends up calling fillna() in a loop inside xarray.core.merge.unique_variable(), something like:

```python
out = variables[0]
for var in variables[1:]:
    out = out.fillna(var)
```

https://github.com/pydata/xarray/blob/55e5b5aaa6d9c27adcf9a7cb1f6ac3bf71c10dea/xarray/core/merge.py#L147-L149

This has quadratic behavior if the variables are stored in dask arrays (the dask graph gets one element larger after each loop iteration). This is OK for merge() (which typically only has two arguments) but is problematic for dealing with variables that shouldn't be concatenated inside concat(), which should be able to handle very long lists of arguments.

I encountered this because compat='no_conflicts' is the default for xarray.combine_nested().

I guess there's also the related issue which is that even if we produced the output dask graph by hand without a loop, it still wouldn't be easy to evaluate for a large number of elements. Ideally we would use some sort of tree-reduction to ensure the operation can be parallelized.
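A sketch of the tree-reduction idea, which keeps the graph depth at O(log n) instead of O(n):

```python
def tree_fillna(variables):
    # Pairwise (tree) reduction: combine neighbours until one variable is left,
    # instead of folding left-to-right. fillna is associative, so the result
    # matches the sequential loop, but the dask graph depth is O(log n).
    variables = list(variables)
    while len(variables) > 1:
        paired = []
        for i in range(0, len(variables) - 1, 2):
            paired.append(variables[i].fillna(variables[i + 1]))
        if len(variables) % 2:
            paired.append(variables[-1])
        variables = paired
    return variables[0]
```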

xref https://github.com/google/xarray-beam/pull/13

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5381/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
325439138 MDU6SXNzdWUzMjU0MzkxMzg= 2171 Support alignment/broadcasting with unlabeled dimensions of size 1 shoyer 1217238 open 0     5 2018-05-22T19:52:21Z 2022-04-19T03:15:24Z   MEMBER      

Sometimes, it's convenient to include placeholder dimensions of size 1, which allows for removing any ambiguity related to the order of output dimensions.

Currently, this is not supported with xarray:

```
>>> xr.DataArray([1], dims='x') + xr.DataArray([1, 2, 3], dims='x')
ValueError: arguments without labels along dimension 'x' cannot be aligned because they have different dimension sizes: {1, 3}

>>> xr.Variable(('x',), [1]) + xr.Variable(('x',), [1, 2, 3])
ValueError: operands cannot be broadcast together with mismatched lengths for dimension 'x': (1, 3)
```

However, these operations aren't really ambiguous. With size-1 dimensions, we could logically do broadcasting like NumPy arrays, e.g.,

```
>>> np.array([1]) + np.array([1, 2, 3])
array([2, 3, 4])
```

This would be particularly convenient if we add keepdims=True to xarray operations (#2170).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2171/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
665488672 MDU6SXNzdWU2NjU0ODg2NzI= 4267 CachingFileManager should not use __del__ shoyer 1217238 open 0     2 2020-07-25T01:20:52Z 2022-04-17T21:42:39Z   MEMBER      

__del__ is sometimes called after modules have been deallocated, which results in errors printed to stderr when Python exits. This manifests itself in the following bug: https://github.com/shoyer/h5netcdf/issues/50

Per https://github.com/shoyer/h5netcdf/issues/50#issuecomment-572191867, the right solution is probably to use weakref.finalize.
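A minimal sketch of the weakref.finalize approach, using a simplified stand-in for the real file manager (the names here are illustrative, not xarray's actual implementation):

```python
import weakref

def _close(file_holder):
    # module-level function: the callback must not hold a reference to the manager
    if file_holder[0] is not None:
        file_holder[0].close()
        file_holder[0] = None

class FileManagerSketch:
    def __init__(self, opener, *args, **kwargs):
        self._opener = opener
        self._args = args
        self._kwargs = kwargs
        self._file_holder = [None]  # mutable cell shared with the finalizer
        # unlike __del__, finalizers run reliably before interpreter shutdown
        # tears down module globals, and they run at most once
        self._finalizer = weakref.finalize(self, _close, self._file_holder)

    def acquire(self):
        if self._file_holder[0] is None:
            self._file_holder[0] = self._opener(*self._args, **self._kwargs)
        return self._file_holder[0]

    def close(self):
        self._finalizer()  # calls _close and marks the finalizer as dead
```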

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4267/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
469440752 MDU6SXNzdWU0Njk0NDA3NTI= 3139 Change the signature of DataArray to DataArray(data, dims, coords, ...)? shoyer 1217238 open 0     1 2019-07-17T20:54:57Z 2022-04-09T15:28:51Z   MEMBER      

Currently, the signature of DataArray is DataArray(data, coords, dims, ...): http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html

In the long term, I think DataArray(data, dims, coords, ...) would be more intuitive: dimensions are a more fundamental part of xarray's data model than coordinates. Certainly I find it much more common to omit coords than to omit dims when I create a DataArray.

My original reasoning for this argument order was that dims could be copied from coords, e.g., DataArray(new_data, old_dataarray.coords), and it was nice to be able to pass this sole argument by position instead of by name. But a cleaner way to write this now is old_dataarray.copy(data=new_data).

The challenge in making any change here would be to have a smooth deprecation process, and that ideally avoids requiring users to rewrite all of their code and avoids loads of pointless/extraneous warnings. I'm not entirely sure this is possible. We could likely use heuristics to distinguish between dims and coords arguments regardless of their order, but this probably isn't something we would want to preserve in the long term.

An alternative that might achieve some of the convenience of this change would be to allow for passing lists of strings in the coords argument by position, which are interpreted as dimensions, e.g., DataArray(data, ['x', 'y']). The downside of this alternative is that it would add even more special cases to the DataArray constructor, which would make it harder to understand.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3139/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
327166000 MDExOlB1bGxSZXF1ZXN0MTkxMDMwMjA4 2195 WIP: explicit indexes shoyer 1217238 closed 0     3 2018-05-29T04:25:15Z 2022-03-21T14:59:52Z 2022-03-21T14:59:52Z MEMBER   0 pydata/xarray/pulls/2195

Some utility functions that should be useful for https://github.com/pydata/xarray/issues/1603

Still very much a work in progress -- it would be great if someone has time to finish writing any of these in another PR!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2195/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
864249974 MDU6SXNzdWU4NjQyNDk5NzQ= 5202 Make creating a MultiIndex in stack optional shoyer 1217238 closed 0     7 2021-04-21T20:21:03Z 2022-03-17T17:11:42Z 2022-03-17T17:11:42Z MEMBER      

As @Hoeze notes in https://github.com/pydata/xarray/issues/5179, calling stack() can be "incredibly slow and memory-demanding, since it creates a MultiIndex of every possible coordinate in the array."

This is true with how stack() works currently, but I'm not sure this is necessary. I suspect it's a vestigial design choice from copying pandas, back from before Xarray had optional indexes. One benefit is that it's convenient for making unstack() the inverse of stack(), but that isn't always required.

Regardless of how we define the semantics for boolean indexing (https://github.com/pydata/xarray/issues/1887), it seems like it could be a good idea to allow stack to skip creating a MultiIndex for the new dimension, via a new keyword argument such as ds.stack(index=False). This would be equivalent to calling reset_index() after stack() but would be cheaper because the MultiIndex is never created in the first place.
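To illustrate the proposed equivalence (the index keyword is hypothetical), the result would match what you get today by stacking and then immediately discarding the MultiIndex:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"v": (("x", "y"), np.arange(6).reshape(2, 3))},
    coords={"x": [10, 20], "y": ["a", "b", "c"]},
)

# today: build the MultiIndex, then drop it again
stacked = ds.stack(z=("x", "y")).reset_index("z")

# hypothetical: ds.stack(z=("x", "y"), index=False) would skip building it entirely
```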

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5202/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
237008177 MDU6SXNzdWUyMzcwMDgxNzc= 1460 groupby should still squeeze for non-monotonic inputs shoyer 1217238 open 0     5 2017-06-19T20:05:14Z 2022-03-04T21:31:41Z   MEMBER      

We can simply use argsort() to determine group_indices instead of np.arange(): https://github.com/pydata/xarray/blob/22ff955d53e253071f6e4fa849e5291d0005282a/xarray/core/groupby.py#L256
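A small sketch of the idea with plain NumPy (not the actual groupby code): a stable argsort of the group labels recovers per-group positions even when the labels are not monotonic.

```python
import numpy as np

labels = np.array([2, 0, 1, 0, 2])          # non-monotonic group labels
order = np.argsort(labels, kind="stable")   # positions sorted by group
_, counts = np.unique(labels, return_counts=True)
group_indices = np.split(order, np.cumsum(counts)[:-1])
# [array([1, 3]), array([2]), array([0, 4])]
```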

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1460/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
58117200 MDU6SXNzdWU1ODExNzIwMA== 324 Support multi-dimensional grouped operations and group_over shoyer 1217238 open 0   1.0 741199 12 2015-02-18T19:42:20Z 2022-02-28T19:03:17Z   MEMBER      

Multi-dimensional grouped operations should be relatively straightforward -- the main complexity will be writing an N-dimensional concat that doesn't involve repetitively copying data.

The idea with group_over would be to support groupby operations that act on a single element from each of the given groups, rather than the unique values. For example, ds.group_over(['lat', 'lon']) would let you iterate over or apply to 2D slices of ds, no matter how many dimensions it has.

Roughly speaking (it's a little more complex for the case of non-dimension variables), ds.group_over(dims) would get translated into ds.groupby([d for d in ds.dims if d not in dims]).

Related: #266

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/324/reactions",
    "total_count": 18,
    "+1": 18,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1090700695 I_kwDOAMm_X85BAsWX 6125 [Bug]: HTML repr does not display well in notebooks hosted on GitHub shoyer 1217238 open 0     0 2021-12-29T19:05:49Z 2021-12-29T19:36:25Z   MEMBER      

What happened?

We see both the raw text and a malformed version of the HTML (without CSS formatting).

Example (https://github.com/microsoft/PlanetaryComputerExamples/blob/main/quickstarts/reading-zarr-data.ipynb):

What did you expect to happen?

Either:

  1. Ideally, we only see the HTML repr, with CSS formatting applied.
  2. Or, if that isn't possible, we should figure out how to only show the raw text.

nbviewer gets this right:

Minimal Complete Verifiable Example

No response

Relevant log output

No response

Anything else we need to know?

No response

Environment

NA

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6125/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1062709354 PR_kwDOAMm_X84u-sO9 6025 Simplify missing value handling in xarray.corr shoyer 1217238 closed 0     1 2021-11-24T17:48:03Z 2021-11-28T04:39:22Z 2021-11-28T04:39:22Z MEMBER   0 pydata/xarray/pulls/6025

This PR simplifies the fix from https://github.com/pydata/xarray/pull/5731, specifically for the benefit of xarray.corr. There is no need to use map_blocks instead of using where directly.

It is basically an alternative version of https://github.com/pydata/xarray/pull/5284. It is potentially slightly less efficient to do this masking step when unnecessary, but I doubt this makes a noticeable performance difference in practice (and I doubt this optimization is useful inside map_blocks, anyway).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6025/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1044151556 PR_kwDOAMm_X84uELYB 5935 Docs: fix URL for PTSA shoyer 1217238 closed 0     1 2021-11-03T21:56:44Z 2021-11-05T09:36:04Z 2021-11-05T09:36:04Z MEMBER   0 pydata/xarray/pulls/5935

One of the PTSA authors told me about the new URL by email.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5935/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
874292512 MDU6SXNzdWU4NzQyOTI1MTI= 5251 Switch default for Zarr reading/writing to consolidated=True? shoyer 1217238 closed 0     4 2021-05-03T06:59:42Z 2021-08-30T15:21:11Z 2021-08-30T15:21:11Z MEMBER      

Consolidated metadata was a new feature in Zarr v2.3, which was released over two years ago (March 22, 2019).

Since then, I have used consolidated=True every time I've written or opened a Zarr store. As far as I can tell, this is almost always a good idea:

  • With local storage, it usually doesn't really matter. You spend a bit of time writing the consolidated metadata and have one extra file on disk, but the overhead is typically negligible.
  • With cloud object stores or network filesystems, it can matter quite a large amount. Without consolidated metadata, these systems can be unusably slow for opening datasets. Cloud storage is of course the main use-case for Zarr. If you're using a local disk, you might as well stick with single files such as netCDF.

I wonder if consolidated metadata is mature enough now that we could consider switching the default behavior in Xarray. From my perspective, this is a big "gotcha" for getting good performance with Zarr. More than one of my colleagues has been unimpressed with the performance of Zarr until they learned to set consolidated=True.

I would suggest doing this in a way that is almost entirely backwards compatible, with only a minor performance cost for reading non-consolidated datasets:

  • to_zarr() switches the default to consolidated=True. consolidate_metadata() will thus happen by default.
  • open_zarr() switches the default to consolidated=None, which means "try reading consolidated metadata, and fall back to non-consolidated if that fails." This will be slightly slower for non-consolidated metadata due to the extra file lookup, but given that opening with non-consolidated metadata already requires a moderately large number of file lookups, I doubt anyone will notice the difference.

CC @rabernat

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5251/reactions",
    "total_count": 11,
    "+1": 11,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
928402742 MDU6SXNzdWU5Mjg0MDI3NDI= 5516 Rename master branch -> main shoyer 1217238 closed 0     4 2021-06-23T15:45:57Z 2021-07-23T21:58:39Z 2021-07-23T21:58:39Z MEMBER      

This is a best practice for inclusive projects.

See https://github.com/github/renaming for guidance.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5516/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
948890466 MDExOlB1bGxSZXF1ZXN0NjkzNjY1NDEy 5624 Make typing-extensions optional shoyer 1217238 closed 0     6 2021-07-20T17:43:22Z 2021-07-22T23:30:49Z 2021-07-22T23:02:03Z MEMBER   0 pydata/xarray/pulls/5624

Type checking may be a little worse if typing-extensions is not installed, but I don't think it's worth the trouble of adding another hard dependency just for one use of TypeGuard.

Note: sadly this doesn't work yet. Mypy (and pylance) don't like the type alias defined with try/except. Any ideas? In the worst case, we could revert the TypeGuard entirely, but that would be a shame...

  • [x] Closes #5495
  • [x] Passes pre-commit run --all-files
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5624/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
890534794 MDU6SXNzdWU4OTA1MzQ3OTQ= 5295 Engine is no longer inferred for filenames not ending in ".nc" shoyer 1217238 closed 0     0 2021-05-12T22:28:46Z 2021-07-15T14:57:54Z 2021-05-14T22:40:14Z MEMBER      

This works with xarray=0.17.0:

```python
import xarray
xarray.Dataset({'x': [1, 2, 3]}).to_netcdf('tmp')
xarray.open_dataset('tmp')
```

On xarray 0.18.0, it fails:

```python-traceback
ValueError                                Traceback (most recent call last)
<ipython-input-1-20e128a730aa> in <module>()
      2
      3 xarray.Dataset({'x': [1, 2, 3]}).to_netcdf('tmp')
----> 4 xarray.open_dataset('tmp')

/usr/local/lib/python3.7/dist-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    483
    484     if engine is None:
--> 485         engine = plugins.guess_engine(filename_or_obj)
    486
    487     backend = plugins.get_backend(engine)

/usr/local/lib/python3.7/dist-packages/xarray/backends/plugins.py in guess_engine(store_spec)
    110             warnings.warn(f"{engine!r} fails while guessing", RuntimeWarning)
    111
--> 112     raise ValueError("cannot guess the engine, try passing one explicitly")
    113
    114

ValueError: cannot guess the engine, try passing one explicitly
```

I'm not entirely sure what changed. My guess is that we used to fall-back to trying to use SciPy, but don't do that anymore. A potential fix would be reading strings as filenames in xarray.backends.utils.read_magic_number.

Related: https://github.com/pydata/xarray/issues/5291

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5295/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
252707680 MDU6SXNzdWUyNTI3MDc2ODA= 1525 Consider setting name=False in Variable.chunk() shoyer 1217238 open 0     4 2017-08-24T19:34:28Z 2021-07-13T01:50:16Z   MEMBER      

@mrocklin writes:

The following will be slower:

```python
b = (a.chunk(...) + 1) + (a.chunk(...) + 1)
```

In current operation this will be optimized to

```python
tmp = a.chunk(...) + 1
b = tmp + tmp
```

So you'll lose that, but I suspect that in your case chunking the same dataset many times is somewhat rare.

See here for discussion: https://github.com/pydata/xarray/pull/1517#issuecomment-324722153

Whether this is worth doing really depends on on what people would find most useful -- and what is the most intuitive behavior.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1525/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
254888879 MDU6SXNzdWUyNTQ4ODg4Nzk= 1552 Flow chart for choosing indexing operations shoyer 1217238 open 0     2 2017-09-03T17:33:30Z 2021-07-11T22:26:17Z   MEMBER      

We have a lot of indexing operations, even though sel_points and isel_points are about to be deprecated (#1473).

A flow chart / decision tree to help users pick the right indexing operation might be helpful (e.g., like this skimage FlowChart). It would ask various questions (e.g., do you have labels or integer positions? do you want to select or impose coordinates?) and then suggest the appropriate indexer methods.

cc @fujiisoup

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1552/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
891281614 MDU6SXNzdWU4OTEyODE2MTQ= 5302 Suggesting specific IO backends to install when open_dataset() fails shoyer 1217238 closed 0     3 2021-05-13T18:45:28Z 2021-06-23T08:18:07Z 2021-06-23T08:18:07Z MEMBER      

Currently, Xarray's internal backends don't get registered unless the necessary dependencies are installed: https://github.com/pydata/xarray/blob/1305d9b624723b86050ca5b2d854e5326bbaa8e6/xarray/backends/netCDF4_.py#L567-L568

In order to facilitate suggesting a specific backend to install (e.g., to improve error messages from opening tutorial datasets https://github.com/pydata/xarray/issues/5291), I would suggest that Xarray always registers its own backend entrypoints. Then we make the following changes to the plugin protocol:

  • guess_can_open() should work regardless of whether the underlying backend is installed
  • installed() returns a boolean reporting whether backend is installed. The default method in the base class would return True, for backwards compatibility.
  • open_dataset() of course should error if the backend is not installed.

This will let us leverage the existing guess_can_open() functionality to suggest specific optional dependencies to install. E.g., if you supply a netCDF3 file:

```
Xarray cannot find a matching installed backend for this file in the installed backends ["h5netcdf"].
Consider installing one of the following backends which reports a match: ["scipy", "netcdf4"]
```

Does this seem reasonable and worthwhile?

CC @aurghs @alexamici

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5302/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
874331538 MDExOlB1bGxSZXF1ZXN0NjI4OTE0NDQz 5252 Add mode="r+" for to_zarr and use consolidated writes/reads by default shoyer 1217238 closed 0     14 2021-05-03T07:57:16Z 2021-06-22T06:51:35Z 2021-06-17T17:19:26Z MEMBER   0 pydata/xarray/pulls/5252

mode="r+" only allows for modifying pre-existing array values in a Zarr store. This makes it a safer default mode when doing a limited region write. It also offers a nice performance bonus when using consolidated metadata, because the store to modify can be opened in "consolidated" mode -- rather than painfully slow non-consolidated mode.

This PR includes several related changes to to_zarr():

  1. It adds support for the new mode="r+".
  2. consolidated=True in to_zarr() now means "open in consolidated mode" if using mode="r+", instead of "write in consolidated mode" (which would not make sense for r+).
  3. It allows setting consolidated=True when using region, mostly for the sake of fast store opening with r+.
  4. Validation in to_zarr() has been reorganized to always use the existing Zarr group, rather than re-opening zarr stores from scratch, which could require additional network requests.
  5. Incidentally, I've renamed the ZarrStore.ds attribute to ZarrStore.zarr_group, which is a much more descriptive name.

These changes gave me a ~5x boost in write performance in a large parallel job making use of to_zarr with region.
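For illustration, a small end-to-end sketch of the new mode (the path and sizes are made up): create a store up front, then rewrite just one region of it in place.

```python
import numpy as np
import xarray as xr

path = "example-store.zarr"
xr.Dataset({"u": ("x", np.zeros(10))}).to_zarr(path, mode="w")

# later, possibly from another worker: overwrite only x=0..4 in the existing
# store, without touching metadata or the rest of the array
update = xr.Dataset({"u": ("x", np.arange(5.0))})
update.to_zarr(path, mode="r+", region={"x": slice(0, 5)})
```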

  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5252/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
340733448 MDU6SXNzdWUzNDA3MzM0NDg= 2283 Exact alignment should allow missing dimension coordinates shoyer 1217238 open 0     2 2018-07-12T17:40:24Z 2021-06-15T09:52:29Z   MEMBER      

Code Sample, a copy-pastable example if possible

```python
import xarray as xr
xr.align(xr.DataArray([1, 2, 3], dims='x'),
         xr.DataArray([1, 2, 3], dims='x', coords=[[0, 1, 2]]),
         join='exact')
```

Problem description

This currently results in an error, but a missing index of size 3 does not actually conflict:

```python-traceback
ValueError                                Traceback (most recent call last)
<ipython-input-15-1d63d3512fb6> in <module>()
      1 xr.align(xr.DataArray([1, 2, 3], dims='x'),
      2          xr.DataArray([1, 2, 3], dims='x', coords=[[0, 1, 2]]),
----> 3          join='exact')

/usr/local/lib/python3.6/dist-packages/xarray/core/alignment.py in align(*objects, **kwargs)
    129                 raise ValueError(
    130                     'indexes along dimension {!r} are not equal'
--> 131                     .format(dim))
    132             index = joiner(matching_indexes)
    133             joined_indexes[dim] = index

ValueError: indexes along dimension 'x' are not equal
```

This surfaced as an issue on StackOverflow: https://stackoverflow.com/questions/51308962/computing-matrix-vector-multiplication-for-each-time-point-in-two-dataarrays

Expected Output

Both output arrays should end up with the x coordinate from the input that has it, like the output of the above expression if join='inner':

```
(<xarray.DataArray (x: 3)>
 array([1, 2, 3])
 Coordinates:
   * x        (x) int64 0 1 2,
 <xarray.DataArray (x: 3)>
 array([1, 2, 3])
 Coordinates:
   * x        (x) int64 0 1 2)
```

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.3.final.0 python-bits: 64 OS: Linux OS-release: 4.14.33+ machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.10.7 pandas: 0.22.0 numpy: 1.14.5 scipy: 0.19.1 netCDF4: None h5netcdf: None h5py: 2.8.0 Nio: None zarr: None bottleneck: None cyordereddict: None dask: None distributed: None matplotlib: 2.1.2 cartopy: None seaborn: 0.7.1 setuptools: 39.1.0 pip: 10.0.1 conda: None pytest: None IPython: 5.5.0 sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2283/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
842438533 MDU6SXNzdWU4NDI0Mzg1MzM= 5082 Move encoding from xarray.Variable to duck arrays? shoyer 1217238 open 0     2 2021-03-27T07:21:55Z 2021-06-13T01:34:00Z   MEMBER      

The encoding property on Variable has always been an awkward part of Xarray's API, and an example of poor separation of concerns. It adds conceptual overhead to all uses of xarray.Variable, but exists only for the (somewhat niche) benefit of Xarray's backend IO functionality. This is particularly problematic if we consider the possible separation of xarray.Variable into a separate package to remove the pandas dependency (https://github.com/pydata/xarray/issues/3981).

I think a cleaner way to handle encoding would be to move it from Variable onto array objects, specifically duck array objects that Xarray creates when loading data from disk. As long as these duck arrays don't "propagate" themselves under array operations but rather turn into raw numpy arrays (or whatever is wrapped), this would automatically resolve all issues around propagating encoding attributes (e.g., https://github.com/pydata/xarray/pull/5065, https://github.com/pydata/xarray/issues/1614). And users who don't care about encoding because they don't use Xarray's IO functionality would never need to think about it.
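A rough sketch of the idea, assuming a hypothetical wrapper class: encoding would live on the array object produced by the backend, and would disappear as soon as any computation coerces the wrapper into a plain array.

```python
import numpy as np

class EncodedArray:
    """Hypothetical duck array that carries encoding loaded from disk."""

    def __init__(self, data, encoding):
        self._data = np.asarray(data)
        self.encoding = dict(encoding)

    @property
    def shape(self):
        return self._data.shape

    @property
    def dtype(self):
        return self._data.dtype

    def __array__(self, dtype=None):
        # any operation that coerces to ndarray drops the wrapper, so results
        # of computation never propagate (possibly stale) encoding
        return np.asarray(self._data, dtype=dtype)

data = EncodedArray([1.0, 2.0], encoding={"dtype": "int16", "scale_factor": 0.1})
print(data.encoding)    # {'dtype': 'int16', 'scale_factor': 0.1}
print(np.add(data, 1))  # plain ndarray, no encoding attached
```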

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5082/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
416554477 MDU6SXNzdWU0MTY1NTQ0Nzc= 2797 Stalebot is being overly aggressive shoyer 1217238 closed 0     7 2019-03-03T19:37:37Z 2021-06-03T21:31:46Z 2021-06-03T21:22:48Z MEMBER      

E.g., see https://github.com/pydata/xarray/issues/1151 where stalebot closed an issue even after another comment.

Is this something we need to reconfigure or just a bug?

cc @pydata/xarray

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2797/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
276241764 MDU6SXNzdWUyNzYyNDE3NjQ= 1739 Utility to restore original dimension order after apply_ufunc shoyer 1217238 open 0     11 2017-11-23T00:47:57Z 2021-05-29T07:39:33Z   MEMBER      

This seems to be coming up quite a bit for wrapping functions that apply an operation along an axis, e.g., for interpolate in #1640 or rank in #1733.

We should either write a utility function to do this or consider adding an option to apply_ufunc.
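One possible shape such a utility could take (a sketch, not an existing xarray function): transpose the result back to the order of dimensions on the original object, appending any newly created dimensions at the end.

```python
def restore_dim_order(result, original):
    # dimensions from the original object first, in their original order...
    order = [dim for dim in original.dims if dim in result.dims]
    # ...then any dimensions the operation introduced
    order += [dim for dim in result.dims if dim not in original.dims]
    return result.transpose(*order)
```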

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1739/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
901047466 MDU6SXNzdWU5MDEwNDc0NjY= 5372 Consider revising the _repr_inline_ protocol shoyer 1217238 open 0     0 2021-05-25T16:18:31Z 2021-05-25T16:18:31Z   MEMBER      

_repr_inline_ looks like an IPython special method but actually includes some xarray-specific details: the result should not include shape or dtype.

As I wrote in https://github.com/pydata/xarray/pull/5352, I would suggest revising it in one of two ways:

  1. Giving it a name like _xarray_repr_inline_ to make it clearer that it's Xarray specific
  2. Include some more generic way of indicating that shape/dtype is redundant, e.g,. call it like obj._repr_ndarray_inline_(dtype=False, shape=False)
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5372/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
891253662 MDExOlB1bGxSZXF1ZXN0NjQ0MTQ5Mzc2 5300 Better error message when no backend engine is found. shoyer 1217238 closed 0     4 2021-05-13T18:10:04Z 2021-05-18T21:23:00Z 2021-05-18T21:23:00Z MEMBER   0 pydata/xarray/pulls/5300

Also includes a better error message when loading a tutorial dataset but an underlying IO dependency is not found.

  • [x] Fixes #5291
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5300/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
890573049 MDExOlB1bGxSZXF1ZXN0NjQzNTc1Mjc5 5296 More robust guess_can_open for netCDF4/scipy/h5netcdf entrypoints shoyer 1217238 closed 0     1 2021-05-12T23:53:32Z 2021-05-14T22:40:14Z 2021-05-14T22:40:14Z MEMBER   0 pydata/xarray/pulls/5296

The new version checks magic numbers in files on disk, not just already open file objects.

I've also added a bunch of unit-tests.

Fixes GH5295

  • [x] Closes #5295
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5296/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
46049691 MDU6SXNzdWU0NjA0OTY5MQ== 255 Add Dataset.to_pandas() method shoyer 1217238 closed 0   0.5 987654 2 2014-10-17T00:01:36Z 2021-05-04T13:56:00Z 2021-05-04T13:56:00Z MEMBER      

This would be the complement of the DataArray constructor, converting an xray.DataArray into a 1D series, 2D DataFrame or 3D panel, whichever is appropriate.

to_pandas would also make sense for Dataset, if it could convert 0d datasets to series, e.g., pd.Series({k: v.item() for k, v in ds.items()}) (there is currently no direct way to do this), and revert to to_dataframe for higher dimensional input.

  • [x] DataArray method
  • [ ] Dataset method
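A minimal sketch of the proposed dispatch (dataset_to_pandas is a hypothetical helper, not existing API):

```python
import pandas as pd

def dataset_to_pandas(ds):
    if len(ds.dims) == 0:
        # 0d datasets become a Series of scalars, one entry per variable
        return pd.Series({k: v.item() for k, v in ds.items()})
    # otherwise fall back to the existing tabular conversion
    return ds.to_dataframe()
```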

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/255/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
294241734 MDU6SXNzdWUyOTQyNDE3MzQ= 1887 Boolean indexing with multi-dimensional key arrays shoyer 1217238 open 0     13 2018-02-04T23:28:45Z 2021-04-22T21:06:47Z   MEMBER      

Originally from https://github.com/pydata/xarray/issues/974

For boolean indexing:

  • da[key] where key is a boolean labelled array (with any number of dimensions) is made equivalent to da.where(key.reindex_like(ds), drop=True). This matches the existing behavior if key is a 1D boolean array. For multi-dimensional arrays, even though the result is now multi-dimensional, this coupled with automatic skipping of NaNs means that da[key].mean() gives the same result as in NumPy.
  • da[key] = value where key is a boolean labelled array can be made equivalent to da = da.where(*align(key.reindex_like(da), value.reindex_like(da))) (that is, the three argument form of where).
  • da[key_0, ..., key_n] where all of key_i are boolean arrays gets handled in the usual way. It is an IndexingError to supply multiple labelled keys if any of them are not already aligned with the corresponding index coordinates (and share the same dimension name). If they want alignment, we suggest users simply write da[key_0 & ... & key_n].
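For the 1D case, the equivalence already holds today, which this small example illustrates:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4.0), dims="x", coords={"x": [0, 1, 2, 3]})
key = da > 1.5

# 1D boolean indexing and where(..., drop=True) select the same elements
assert (da[key] == da.where(key, drop=True)).all()
```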

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1887/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
346822633 MDU6SXNzdWUzNDY4MjI2MzM= 2336 test_88_character_filename_segmentation_fault should not try to write to the current working directory shoyer 1217238 closed 0     2 2018-08-02T01:06:41Z 2021-04-20T23:38:53Z 2021-04-20T23:38:53Z MEMBER      

This fails in cases where the current working directory does not support writes, e.g., as seen here:

```
    def test_88_character_filename_segmentation_fault(self):
        # should be fixed in netcdf4 v1.3.1
        with mock.patch('netCDF4.__version__', '1.2.4'):
            with warnings.catch_warnings():
                message = ('A segmentation fault may occur when the '
                           'file path has exactly 88 characters')
                warnings.filterwarnings('error', message)
                with pytest.raises(Warning):
                    # Need to construct 88 character filepath
                    xr.Dataset().to_netcdf('a' * (88 - len(os.getcwd()) - 1))

tests/test_backends.py:1234:

core/dataset.py:1150: in to_netcdf
    compute=compute)
backends/api.py:715: in to_netcdf
    autoclose=autoclose, lock=lock)
backends/netCDF4_.py:332: in open
    ds = opener()
backends/netCDF4_.py:231: in _open_netcdf4_group
    ds = nc4.Dataset(filename, mode=mode, **kwargs)
third_party/py/netCDF4/_netCDF4.pyx:2111: in netCDF4._netCDF4.Dataset.__init__
    ???

???
E   IOError: [Errno 13] Permission denied
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2336/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
843996137 MDU6SXNzdWU4NDM5OTYxMzc= 5092 Concurrent loading of coordinate arrays from Zarr shoyer 1217238 open 0     0 2021-03-30T02:19:50Z 2021-04-19T02:43:31Z   MEMBER      

When you open a dataset with Zarr, xarray loads coordinate arrays corresponding to indexes in serial. This can be slow (multiple seconds) even with only a handful of such arrays if they are stored in a remote filesystem (e.g., cloud object stores). This is similar to the use-cases for consolidated metadata.

In principle, we could speed up loading datasets from Zarr into Xarray significantly by reading the data corresponding to these arrays in parallel (e.g., in multiple threads).
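A minimal sketch of the idea (not current xarray behavior), assuming we simply want to load all index coordinates of an already-opened dataset with a thread pool:

```python
from concurrent.futures import ThreadPoolExecutor

def load_index_coords_concurrently(ds, max_workers=8):
    # index coordinates are the ones that share a name with a dimension
    names = [name for name in ds.coords if name in ds.dims]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # each .load() triggers the read for one coordinate array
        list(executor.map(lambda name: ds[name].load(), names))
    return ds
```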

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5092/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
621082480 MDU6SXNzdWU2MjEwODI0ODA= 4080 Most arguments to open_dataset should be keyword only shoyer 1217238 closed 0     1 2020-05-19T15:38:51Z 2021-03-16T10:56:09Z 2021-03-16T10:56:09Z MEMBER      

open_dataset has a long list of arguments:

```python
xarray.open_dataset(filename_or_obj, group=None, decode_cf=True, mask_and_scale=None,
                    decode_times=True, autoclose=None, concat_characters=True,
                    decode_coords=True, engine=None, chunks=None, lock=None,
                    cache=None, drop_variables=None, backend_kwargs=None,
                    use_cftime=None)
```

Similarly to the case for pandas (https://github.com/pandas-dev/pandas/issues/27544), it would be nice to make most of these arguments keyword-only, e.g., def open_dataset(filename_or_obj, group, *, ...). For consistency, this would also apply to open_dataarray, decode_cf, open_mfdataset, etc.

This would encourage writing readable code when calling open_dataset() and would allow us to use better organization when adding new arguments (e.g., decode_timedelta in https://github.com/pydata/xarray/pull/4071).

To make this change, we could make use of the deprecate_nonkeyword_arguments decorator from https://github.com/pandas-dev/pandas/pull/27573

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4080/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
645062817 MDExOlB1bGxSZXF1ZXN0NDM5NTg4OTU1 4178 Fix min_deps_check; revert to support numpy=1.14 and pandas=0.24 shoyer 1217238 closed 0     5 2020-06-25T00:37:19Z 2021-02-27T21:46:43Z 2021-02-27T21:46:42Z MEMBER   1 pydata/xarray/pulls/4178

Fixes the issue noticed in: https://github.com/pydata/xarray/pull/4175#issuecomment-649135372

Let's see if this passes CI...

  • [x] Passes isort -rc . && black . && mypy . && flake8
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4178/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
645154872 MDU6SXNzdWU2NDUxNTQ4NzI= 4179 Consider revising our minimum dependency version policy shoyer 1217238 closed 0     7 2020-06-25T05:04:38Z 2021-02-22T05:02:25Z 2021-02-22T05:02:25Z MEMBER      

Our current policy is that xarray supports "the minor version (X.Y) initially published no more than N months ago" where N is:

  • Python: 42 months (NEP 29)
  • numpy: 24 months (NEP 29)
  • pandas: 12 months
  • scipy: 12 months
  • sparse, pint and other libraries that rely on NEP-18 for integration: very latest available versions only,
  • all other libraries: 6 months

I think this policy is too aggressive, particularly for pandas, SciPy and other libraries. Some of these projects can go 6+ months between minor releases. For example, version 2.3 of zarr is currently more than 6 months old. So if zarr released 2.4 today and xarray issued a new release tomorrow, our policy would dictate that we could ask users to upgrade to the new version.

In https://github.com/pydata/xarray/pull/4178, I misinterpreted our policy as supporting "the most recent minor version (X.Y) initially published more than N months ago". This version makes a bit more sense to me: users only need to upgrade dependencies at least every N months to use the latest xarray release.

I understand that NEP-29 chose its language intentionally, so that distributors know ahead of time when they can drop support for a Python or NumPy version. But this seems like a (very) poor fit for projects without regular releases. At the very least we should adjust the specific time windows.

I'll see if I can gain some understanding of the motivation for this particular language over on the NumPy tracker...

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4179/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
267927402 MDU6SXNzdWUyNjc5Mjc0MDI= 1652 Resolve warnings issued in the xarray test suite shoyer 1217238 closed 0     10 2017-10-24T07:36:55Z 2021-02-21T23:06:35Z 2021-02-21T23:06:34Z MEMBER      

82 warnings are currently issued in the process of running our test suite: https://gist.github.com/shoyer/db0b2c82efd76b254453216e957c4345

Some of these can probably be safely ignored, but others are likely noticed by users, e.g., https://stackoverflow.com/questions/41130138/why-is-invalid-value-encountered-in-greater-warning-thrown-in-python-xarray-fo/41147570#41147570

It would be nice to clean up all of these, either by catching the appropriate upstream warning (if irrelevant) or changing our usage to avoid the warning. There may very well be a lurking FutureWarning in there somewhere that could cause issues when another library updates.

Probably the easiest way to get started here is to get the test suite running locally, and use py.test -W error to turn all warnings into errors.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1652/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
777327298 MDU6SXNzdWU3NzczMjcyOTg= 4749 Option for combine_attrs with conflicting values silently dropped shoyer 1217238 closed 0     0 2021-01-01T18:04:49Z 2021-02-10T19:50:17Z 2021-02-10T19:50:17Z MEMBER      

merge() currently supports four options for merging attrs:

```
combine_attrs : {"drop", "identical", "no_conflicts", "override"}, \
                default: "drop"
    String indicating how to combine attrs of the objects being merged:

    - "drop": empty attrs on returned Dataset.
    - "identical": all attrs must be the same on every object.
    - "no_conflicts": attrs from all objects are combined, any that have the
      same name must also have the same value.
    - "override": skip comparing and copy attrs from the first dataset to
      the result.
```

It would be nice to have an option to combine attrs from all objects like "no_conflicts", but that drops attributes with conflicting values rather than raising an error. We might call this combine_attrs="drop_conflicts" or combine_attrs="matching".

This is similar to how xarray currently handles conflicting values for DataArray.name and would be more suitable to consider for the default behavior of merge and other functions/methods that merge coordinates (e.g., apply_ufunc, concat, where, binary arithmetic).
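A minimal sketch of the proposed combination rule on plain dictionaries (the name and behavior are what this issue proposes, not existing API):

```python
def combine_attrs_drop_conflicts(all_attrs):
    result, dropped = {}, set()
    for attrs in all_attrs:
        for key, value in attrs.items():
            if key in dropped:
                continue
            if key in result and result[key] != value:
                del result[key]        # conflicting values: silently drop the key
                dropped.add(key)
            else:
                result[key] = value
    return result

combine_attrs_drop_conflicts([{"units": "m", "title": "a"}, {"units": "m", "title": "b"}])
# -> {'units': 'm'}
```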

cc @keewis

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4749/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
264098632 MDU6SXNzdWUyNjQwOTg2MzI= 1618 apply_raw() for a simpler version of apply_ufunc() shoyer 1217238 open 0     4 2017-10-10T04:51:38Z 2021-01-01T17:14:43Z   MEMBER      

apply_raw() would work like apply_ufunc(), but without the hard to understand broadcasting behavior and core dimensions.

The rule for apply_raw() would be that it directly unwraps its arguments and passes them on to the wrapped function, without any broadcasting. We would also include a dim argument that is automatically converted into the appropriate axis argument when calling the wrapped function.

Output dimensions would be determined from a simple rule of some sort:

  • Default output dimensions would either be copied from the first argument, or would take on the ordered union of all input dimensions.
  • Custom dimensions could either be set by adding a drop_dims argument (like dask.array.map_blocks), or require an explicit override output_dims.

This also could be suitable for defining as a method instead of a separate function. See https://github.com/pydata/xarray/issues/1251 and https://github.com/pydata/xarray/issues/1130 for related issues.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1618/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
269700511 MDU6SXNzdWUyNjk3MDA1MTE= 1672 Append along an unlimited dimension to an existing netCDF file shoyer 1217238 open 0     8 2017-10-30T18:09:54Z 2020-11-29T17:35:04Z   MEMBER      

This would be a nice feature to have for some use cases, e.g., for writing simulation time-steps: https://stackoverflow.com/questions/46951981/create-and-write-xarray-dataarray-to-netcdf-in-chunks

It should be relatively straightforward to add, too, building on support for writing files with unlimited dimensions. User facing API would probably be a new keyword argument to to_netcdf(), e.g., extend='time' to indicate the extended dimension.
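The proposed call might look like ds_step.to_netcdf('out.nc', mode='a', extend='time') (the extend keyword does not exist today). For comparison, a sketch of how appending along an unlimited dimension can be done now by dropping to the netCDF4 library:

```python
import netCDF4
import numpy as np
import xarray as xr

# write the initial time steps, marking "time" as unlimited
xr.Dataset({"t": ("time", np.arange(3.0))}).to_netcdf(
    "out.nc", unlimited_dims=["time"])

# append three more steps by assigning past the current end of the dimension
with netCDF4.Dataset("out.nc", "a") as nc:
    n = nc.dimensions["time"].size
    nc.variables["t"][n:n + 3] = np.arange(3.0, 6.0)
```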

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1672/reactions",
    "total_count": 21,
    "+1": 21,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
314444743 MDU6SXNzdWUzMTQ0NDQ3NDM= 2059 How should xarray serialize bytes/unicode strings across Python/netCDF versions? shoyer 1217238 open 0     5 2018-04-15T19:36:55Z 2020-11-19T10:08:16Z   MEMBER      

netCDF string types

We have several options for storing strings in netCDF files:

  • NC_CHAR: netCDF's legacy character type. The closest match is NumPy's 'S1' dtype. In principle, it's supposed to be able to store arbitrary bytes. On HDF5, it uses a UTF-8 encoded string with a fixed size of 1 (but note that HDF5 does not complain about storing arbitrary bytes).
  • NC_STRING: netCDF's newer variable-length string type. It's only available on netCDF4 (not netCDF3). It corresponds to an HDF5 variable-length string with UTF-8 encoding.
  • NC_CHAR with an _Encoding attribute: xarray and netCDF4-Python support an ad-hoc convention for storing unicode strings in NC_CHAR data-types, by adding an attribute {'_Encoding': 'UTF-8'}. The data is still stored as fixed-width strings, but xarray (and netCDF4-Python) can decode them as unicode.

NC_STRING would seem like a clear win in cases where it's supported, but as @crusaderky points out in https://github.com/pydata/xarray/issues/2040, it actually results in much larger netCDF files in many cases than using character arrays, which are more easily compressed. Nonetheless, we currently default to storing unicode strings in NC_STRING, because it's the most portable option -- every tool that handles HDF5 and netCDF4 should be able to read it properly as unicode strings.

NumPy/Python string types

On the Python side, our options are perhaps even more confusing:

  • NumPy's dtype=np.string_ corresponds to fixed-length bytes. This is the default dtype for strings on Python 2, because on Python 2 strings are the same as bytes.
  • NumPy's dtype=np.unicode_ corresponds to fixed-length unicode. This is the default dtype for strings on Python 3, because on Python 3 strings are the same as unicode.
  • Strings are also commonly stored in numpy arrays with dtype=np.object_, as arrays of either bytes or unicode objects. This is a pragmatic choice, because otherwise NumPy has no support for variable length strings. We also use this (like pandas) to mark missing values with np.nan.

Like pandas, we are pretty liberal with converting back and forth between fixed-length (np.string_/np.unicode_) and variable-length (object dtype) representations of strings as necessary. This works pretty well, though converting from object arrays in particular has downsides, since it cannot be done lazily with dask.

Current behavior of xarray

Currently, xarray uses the same behavior on Python 2/3. The priority was faithfully round-tripping data from a particular version of Python to netCDF and back, which the current serialization behavior achieves:

| Python version | NetCDF version | NumPy datatype | NetCDF datatype |
| --------- | ---------- | -------------- | ------------ |
| Python 2 | NETCDF3 | np.string_ / str | NC_CHAR |
| Python 2 | NETCDF4 | np.string_ / str | NC_CHAR |
| Python 3 | NETCDF3 | np.string_ / bytes | NC_CHAR |
| Python 3 | NETCDF4 | np.string_ / bytes | NC_CHAR |
| Python 2 | NETCDF3 | np.unicode_ / unicode | NC_CHAR with UTF-8 encoding |
| Python 2 | NETCDF4 | np.unicode_ / unicode | NC_STRING |
| Python 3 | NETCDF3 | np.unicode_ / str | NC_CHAR with UTF-8 encoding |
| Python 3 | NETCDF4 | np.unicode_ / str | NC_STRING |
| Python 2 | NETCDF3 | object bytes/str | NC_CHAR |
| Python 2 | NETCDF4 | object bytes/str | NC_CHAR |
| Python 3 | NETCDF3 | object bytes | NC_CHAR |
| Python 3 | NETCDF4 | object bytes | NC_CHAR |
| Python 2 | NETCDF3 | object unicode | NC_CHAR with UTF-8 encoding |
| Python 2 | NETCDF4 | object unicode | NC_STRING |
| Python 3 | NETCDF3 | object unicode/str | NC_CHAR with UTF-8 encoding |
| Python 3 | NETCDF4 | object unicode/str | NC_STRING |

This can also be selected explicitly for most data-types by setting dtype in encoding:

  • 'S1' for NC_CHAR (with or without encoding)
  • str for NC_STRING (though I'm not 100% sure it works properly currently when given bytes)

Script for generating table:

```python
from __future__ import print_function
import xarray as xr
import uuid
import netCDF4
import numpy as np
import sys

for dtype_name, value in [
    ('np.string_ / ' + type(b'').__name__, np.array([b'abc'])),
    ('np.unicode_ / ' + type(u'').__name__, np.array([u'abc'])),
    ('object bytes/' + type(b'').__name__, np.array([b'abc'], dtype=object)),
    ('object unicode/' + type(u'').__name__, np.array([u'abc'], dtype=object)),
]:
    for format in ['NETCDF3_64BIT', 'NETCDF4']:
        filename = str(uuid.uuid4()) + '.nc'
        xr.Dataset({'data': value}).to_netcdf(filename, format=format)
        with netCDF4.Dataset(filename) as f:
            var = f.variables['data']
            disk_dtype = var.dtype
            has_encoding = hasattr(var, '_Encoding')
        disk_dtype_name = (('NC_CHAR' if disk_dtype == 'S1' else 'NC_STRING')
                           + (' with UTF-8 encoding' if has_encoding else ''))
        print('|', 'Python %i' % sys.version_info[0], '|', format[:7], '|',
              dtype_name, '|', disk_dtype_name, '|')
```

Potential alternatives

The main option I'm considering is switching to default to NC_CHAR with UTF-8 encoding for np.string_ / str and object bytes/str on Python 2. The current behavior could be explicitly toggled by setting an encoding of {'_Encoding': None}.

This would imply two changes:

  1. Attempting to serialize arbitrary bytes (on Python 2) would start raising an error -- anything that isn't ASCII would require explicitly disabling _Encoding.
  2. Strings read back from disk on Python 2 would come back as unicode instead of bytes.

This implicit conversion would be consistent with Python 2's general handling of bytes/unicode, and facilitate reading netCDF files on Python 3 that were written with Python 2.

The counter-argument is that it may not be worth changing this at this late point, given that we will be sunsetting Python 2 support by year's end.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2059/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
613012939 MDExOlB1bGxSZXF1ZXN0NDEzODQ3NzU0 4035 Support parallel writes to regions of zarr stores shoyer 1217238 closed 0     17 2020-05-06T02:40:19Z 2020-11-04T06:19:01Z 2020-11-04T06:19:01Z MEMBER   0 pydata/xarray/pulls/4035

This PR adds support for a region keyword argument to to_zarr(), to support parallel writes to different parts of arrays in a zarr stores, e.g., ds.to_zarr(..., region={'x': slice(1000, 2000)}) to write a dataset over the range 1000:2000 along the x dimension.

This is useful for creating large Zarr datasets without requiring dask. For example, the separate workers in a simulation job might each write a single non-overlapping chunk of a Zarr file. The standard way to handle such datasets today is to first write netCDF files in each process, and then consolidate them afterwards with dask (see #3096).

Creating empty Zarr stores

In order to do so, the Zarr file must be pre-existing with desired variables in the right shapes/chunks. It is desirable to be able to create such stores without actually writing data, because datasets that we want to write in parallel may be very large.

In the example below, I achieve this by filling a Dataset with dask arrays and passing compute=False to to_zarr(). This works, but it relies on an undocumented implementation detail of the compute argument. We should either:

  1. Officially document that the compute argument only controls writing array values, not metadata (at least for zarr).
  2. Add a new keyword argument or entire new method for creating an unfilled Zarr store, e.g., write_values=False.

I think (1) is maybe the cleanest option (no extra API endpoints).

Unchunked variables

One potential gotcha concerns coordinate arrays that are not chunked, e.g., consider parallel writing of a dataset divided along time with 2D latitude and longitude arrays that are fixed over all chunks. With the current PR, such coordinate arrays would get rewritten by each separate writer.

If a Zarr store does not have atomic writes, then conceivably this could result in corrupted data. The default DirectoryStore has atomic writes and cloud based object stores should also be atomic, so perhaps this doesn't matter in practice, but at the very least it's inefficient and could cause issues for large-scale jobs due to resource contention.

Options include:

  1. Current behavior. Variables whose dimensions do not overlap with region are written by to_zarr(). This is likely the most intuitive behavior for writing from a single process at a time.
  2. Exclude variables whose dimensions do not overlap with region from being written. This is likely the most convenient behavior for writing from multiple processes at once.
  3. Like (2), but issue a warning if any such variables exist instead of silently dropping them.
  4. Like (2), but raise an error instead of a warning. Require the user to explicitly drop them with .drop(). This is probably the safest behavior.

I think (4) would be my preferred option. Some users would undoubtedly find this annoying, but the power-users for whom we are adding this feature would likely appreciate it.

Usage example

```python
import xarray
import dask.array as da

ds = xarray.Dataset({'u': (('x',), da.arange(1000, chunks=100))})

# create the new zarr store, but don't write data
path = 'my-data.zarr'
ds.to_zarr(path, compute=False)

# look at the unwritten data
ds_opened = xarray.open_zarr(path)
print('Data before writing:', ds_opened.u.data[::100].compute())
# Data before writing: [ 1 100 1 100 100 1 1 1 1 1]

# write out each slice (could be in separate processes)
for start in range(0, 1000, 100):
    selection = {'x': slice(start, start + 100)}
    ds.isel(selection).to_zarr(path, region=selection)

print('Data after writing:', ds_opened.u.data[::100].compute())
# Data after writing: [ 0 100 200 300 400 500 600 700 800 900]
```

  • [x] Closes https://github.com/pydata/xarray/issues/3096
  • [x] Integration test
  • [x] Unit tests
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4035/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
124809636 MDU6SXNzdWUxMjQ4MDk2MzY= 703 Document xray internals / advanced API shoyer 1217238 closed 0     2 2016-01-04T18:12:30Z 2020-11-03T17:33:32Z 2020-11-03T17:33:32Z MEMBER      

It would be useful to document the internal Variable class and the internal structure of Dataset and DataArray. This would be helpful for both new contributors and expert users, who might find Variable helpful as an advanced API.

I had some notes in an earlier version of the docs that could be adapted. Note, however, that the internal structure of DataArray changed in #648: http://xray.readthedocs.org/en/v0.2/tutorial.html#notes-on-xray-s-internals

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/703/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
715374721 MDU6SXNzdWU3MTUzNzQ3MjE= 4490 Group together decoding options into a single argument shoyer 1217238 open 0     6 2020-10-06T06:15:18Z 2020-10-29T04:07:46Z   MEMBER      

Is your feature request related to a problem? Please describe.

open_dataset() currently has a very long function signature. This makes it hard to keep track of everything it can do, and is particularly problematic for the authors of new backends (e.g., see https://github.com/pydata/xarray/pull/4477), which might need to know how to handle all these arguments.

Describe the solution you'd like

To simplify the interface, I propose to group together all the decoding options into a new DecodingOptions class. I'm thinking something like:

```python
from dataclasses import dataclass, field, asdict
from typing import Optional, List

@dataclass(frozen=True)
class DecodingOptions:
    mask: Optional[bool] = None
    scale: Optional[bool] = None
    datetime: Optional[bool] = None
    timedelta: Optional[bool] = None
    use_cftime: Optional[bool] = None
    concat_characters: Optional[bool] = None
    coords: Optional[bool] = None
    drop_variables: Optional[List[str]] = None

    @classmethod
    def disabled(cls):
        return cls(mask=False, scale=False, datetime=False, timedelta=False,
                   concat_characters=False, coords=False)

    def non_defaults(self):
        return {k: v for k, v in asdict(self).items() if v is not None}

    # add another method for creating default Variable Coder() objects,
    # e.g., those listed in encode_cf_variable()
```

The signature of open_dataset would then become:

```python
def open_dataset(
    filename_or_obj,
    group=None,
    *,
    engine=None,
    chunks=None,
    lock=None,
    cache=None,
    backend_kwargs=None,
    decode: Union[DecodingOptions, bool] = None,
    **deprecated_kwargs
):
    if decode is None:
        decode = DecodingOptions()
    if decode is False:
        decode = DecodingOptions.disabled()
    # handle deprecated_kwargs...
    ...
```

Question: are decode and DecodingOptions the right names? Maybe these should still include the name "CF", e.g., decode_cf and CFDecodingOptions, given that these are specific to CF conventions?

Note: the current signature is open_dataset(filename_or_obj, group=None, decode_cf=True, mask_and_scale=None, decode_times=True, autoclose=None, concat_characters=True, decode_coords=True, engine=None, chunks=None, lock=None, cache=None, drop_variables=None, backend_kwargs=None, use_cftime=None, decode_timedelta=None)

Usage with the new interface would look like xr.open_dataset(filename, decode=False) or xr.open_dataset(filename, decode=xr.DecodingOptions(mask=False, scale=False)).

This requires a little bit more typing than what we currently have, but it has a few advantages:

  1. It's easier to understand the role of different arguments. Now there is a function with ~8 arguments and a class with ~8 arguments rather than a function with ~15 arguments.
  2. It's easier to add new decoding arguments (e.g., for more advanced CF conventions), because they don't clutter the open_dataset interface. For example, I separated out mask and scale arguments, versus the current mask_and_scale argument.
  3. If a new backend plugin for open_dataset() needs to handle every option supported by open_dataset(), this makes that task significantly easier. The only decoding options they need to worry about are non-default options that were explicitly set, i.e., those exposed by the non_defaults() method. If another decoding option wasn't explicitly set and isn't recognized by the backend, they can just ignore it.

Describe alternatives you've considered

For the overall approach:

  1. We could keep the current design, with separate keyword arguments for decoding options, and just be very careful about passing around these arguments. This seems pretty painful for the backend refactor, though.
  2. We could keep the current design only for the user facing open_dataset() interface, and then internally convert into the DecodingOptions() struct for passing to backend constructors. This would provide much needed flexibility for backend authors, but most users wouldn't benefit from the new interface. Perhaps this would make sense as an intermediate step?
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4490/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
718492237 MDExOlB1bGxSZXF1ZXN0NTAwODc5MTY3 4500 Add variable/attribute names to netCDF validation errors shoyer 1217238 closed 0     1 2020-10-10T00:47:18Z 2020-10-10T05:28:08Z 2020-10-10T05:28:08Z MEMBER   0 pydata/xarray/pulls/4500

This should result in a better user experience, e.g., specifically pointing out the attribute with an invalid value.

  • [x] Tests added
  • [x] Passes isort . && black . && mypy . && flake8
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4500/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
169274464 MDU6SXNzdWUxNjkyNzQ0NjQ= 939 Consider how to deal with the proliferation of decoder options on open_dataset shoyer 1217238 closed 0     8 2016-08-04T01:57:26Z 2020-10-06T15:39:11Z 2020-10-06T15:39:11Z MEMBER      

There are already lots of keyword arguments, and users want even more! (#843)

Maybe we should use some sort of object to encapsulate desired options?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/939/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
253107677 MDU6SXNzdWUyNTMxMDc2Nzc= 1527 Binary operations with ds.groupby('time.dayofyear') errors out, but ds.groupby('time.month') works shoyer 1217238 open 0     10 2017-08-26T16:54:53Z 2020-09-29T10:05:42Z   MEMBER      

Reported on the mailing list:

Original datasets:

```
ds_xr
<xarray.DataArray (time: 12775)>
array([-0.01, -0.01, -0.01, ..., -0.27, -0.27, -0.27])
Coordinates:
  * time     (time) datetime64[ns] 1979-01-01 1979-01-02 1979-01-03 ...

slope_itcp_ds
<xarray.Dataset>
Dimensions:                    (lat: 73, level: 2, lon: 144, time: 366)
Coordinates:
  * lon                        (lon) float32 0.0 2.5 5.0 7.5 10.0 12.5 ...
  * lat                        (lat) float32 90.0 87.5 85.0 82.5 80.0 ...
  * level                      (level) float64 0.0 1.0
  * time                       (time) datetime64[ns] 2010-01-01 ...
Data variables:
    xarray_dataarray_variable  (time, level, lat, lon) float64 -0.8795 ...
Attributes:
    CDI:          Climate Data Interface version 1.7.1 (http://mpimet.mpg.de/...
    Conventions:  CF-1.4
    history:      Fri Aug 25 18:55:50 2017: cdo -inttime,2010-01-01,00:00:00,...
    CDO:          Climate Data Operators version 1.7.1 (http://mpimet.mpg.de/...
```

Issue: Grouping by month works and outputs this:

```
ds_xr.groupby('time.month') - slope_itcp_ds.groupby('time.month').mean('time')
<xarray.Dataset>
Dimensions:                    (lat: 73, level: 2, lon: 144, time: 12775)
Coordinates:
  * lon                        (lon) float32 0.0 2.5 5.0 7.5 10.0 12.5 ...
  * lat                        (lat) float32 90.0 87.5 85.0 82.5 80.0 ...
  * level                      (level) float64 0.0 1.0
    month                      (time) int64 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
  * time                       (time) datetime64[ns] 1979-01-01 ...
Data variables:
    xarray_dataarray_variable  (time, level, lat, lon) float64 1.015 ...
```

Grouping by dayofyear doesn't work and gives this traceback:

```
ds_xr.groupby('time.dayofyear') - slope_itcp_ds.groupby('time.dayofyear').mean('time')

KeyError                                  Traceback (most recent call last)
<ipython-input-10-01c0cf4c980a> in <module>()
----> 1 ds_xr.groupby('time.dayofyear') - slope_itcp_ds.groupby('time.dayofyear').mean('time')

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/groupby.py in func(self, other)
    316             g = f if not reflexive else lambda x, y: f(y, x)
    317             applied = self._yield_binary_applied(g, other)
--> 318             combined = self._combine(applied)
    319             return combined
    320         return func

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/groupby.py in _combine(self, applied, shortcut)
    532             combined = self._concat_shortcut(applied, dim, positions)
    533         else:
--> 534             combined = concat(applied, dim)
    535             combined = _maybe_reorder(combined, dim, positions)
    536

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in concat(objs, dim, data_vars, coords, compat, positions, indexers, mode, concat_over)
    118         raise TypeError('can only concatenate xarray Dataset and DataArray '
    119                         'objects, got %s' % type(first_obj))
--> 120     return f(objs, dim, data_vars, coords, compat, positions)
    121
    122

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions)
    210     datasets = align(*datasets, join='outer', copy=False, exclude=[dim])
    211
--> 212     concat_over = _calc_concat_over(datasets, dim, data_vars, coords)
    213
    214     def insert_result_variable(k, v):

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in _calc_concat_over(datasets, dim, data_vars, coords)
    190                            if dim in v.dims)
    191     concat_over.update(process_subset_opt(data_vars, 'data_vars'))
--> 192     concat_over.update(process_subset_opt(coords, 'coords'))
    193     if dim in datasets[0]:
    194         concat_over.add(dim)

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in process_subset_opt(opt, subset)
    165                            for ds in datasets[1:])
    166             # all nonindexes that are not the same in each dataset
--> 167             concat_new = set(k for k in getattr(datasets[0], subset)
    168                              if k not in concat_over and differs(k))
    169         elif opt == 'all':

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in <genexpr>(.0)
    166             # all nonindexes that are not the same in each dataset
    167             concat_new = set(k for k in getattr(datasets[0], subset)
--> 168                              if k not in concat_over and differs(k))
    169         elif opt == 'all':
    170             concat_new = (set(getattr(datasets[0], subset)) -

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in differs(vname)
    163                 v = datasets[0].variables[vname]
    164                 return any(not ds.variables[vname].equals(v)
--> 165                            for ds in datasets[1:])
    166             # all nonindexes that are not the same in each dataset
    167             concat_new = set(k for k in getattr(datasets[0], subset)

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in <genexpr>(.0)
    163                 v = datasets[0].variables[vname]
    164                 return any(not ds.variables[vname].equals(v)
--> 165                            for ds in datasets[1:])
    166             # all nonindexes that are not the same in each dataset
    167             concat_new = set(k for k in getattr(datasets[0], subset)

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/utils.py in __getitem__(self, key)
    288
    289     def __getitem__(self, key):
--> 290         return self.mapping[key]
    291
    292     def __iter__(self):

KeyError: 'lon'
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1527/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
644821435 MDU6SXNzdWU2NDQ4MjE0MzU= 4176 Pre-expand data and attributes in DataArray/Variable HTML repr? shoyer 1217238 closed 0     7 2020-06-24T18:22:35Z 2020-09-21T20:10:26Z 2020-06-28T17:03:40Z MEMBER      

Proposal

Given that a major purpose of displaying an array is to look at its data or attributes, I wonder if we should expand these sections by default?

  • I worry that clicking on icons to expand sections may not be easy to discover.
  • This would also be consistent with the text repr, which shows these sections by default (the Dataset repr is already consistent between text and HTML).

Context

Currently the HTML repr for DataArray/Variable looks like this:

To see array data, you have to click on the icon:

(thanks to @max-sixty for making this a little bit more manageably sized in https://github.com/pydata/xarray/pull/3905!)

There's also a really nice repr for nested dask arrays:

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4176/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
702372014 MDExOlB1bGxSZXF1ZXN0NDg3NjYxMzIz 4426 Fix for h5py deepcopy issues shoyer 1217238 closed 0     6 2020-09-16T01:11:00Z 2020-09-18T22:31:13Z 2020-09-18T22:31:09Z MEMBER   0 pydata/xarray/pulls/4426
  • [x] Closes #4425
  • [x] Tests added
  • [x] Passes isort . && black . && mypy . && flake8
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4426/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
669307837 MDExOlB1bGxSZXF1ZXN0NDU5Njk1NDA5 4292 Fix indexing with datetime64[ns] with pandas=1.1 shoyer 1217238 closed 0     11 2020-07-31T00:48:50Z 2020-09-16T03:11:48Z 2020-09-16T01:33:30Z MEMBER   0 pydata/xarray/pulls/4292

Fixes #4283

The underlying issue is that calling .item() on a NumPy array with dtype=datetime64[ns] returns an integer, rather than an np.datetime64 scalar. This is somewhat baffling but works this way because .item() returns native Python types, but datetime.datetime doesn't support nanosecond precision.

pandas.Index.get_loc used to support these integers, but now is more strict. Hence we get errors.

We can fix this by using array[()] to convert 0d arrays into NumPy scalars instead of calling array.item().
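
For illustration, a quick sketch of the difference (this is standard NumPy behavior, not code from this PR):

```python
import numpy as np

value = np.array("2000-01-01", dtype="datetime64[ns]")
print(type(value.item()))  # <class 'int'>: nanoseconds since the epoch
print(type(value[()]))     # <class 'numpy.datetime64'>: keeps the scalar type
```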

I've added a crude regression test. There may well be a better way to test this but I haven't figured it out yet.

  • [x] Tests added
  • [x] Passes isort . && black . && mypy . && flake8
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4292/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
417542619 MDU6SXNzdWU0MTc1NDI2MTk= 2803 Test failure with TestValidateAttrs.test_validating_attrs shoyer 1217238 closed 0     6 2019-03-05T23:03:02Z 2020-08-25T14:29:19Z 2019-03-14T15:59:13Z MEMBER      

This is due to setting multi-dimensional attributes being an error, as of the latest netCDF4-Python release: https://github.com/Unidata/netcdf4-python/blob/master/Changelog
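
A minimal sketch of the failure mode (reconstructed from the test below, not copied from the original report):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'data': ('y', np.arange(10.0))}, {'y': np.arange(10)})
ds.attrs['test'] = np.arange(12).reshape(3, 4)  # multi-dimensional attribute
ds.to_netcdf('test.nc')  # recent netCDF4-Python raises ValueError here
```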

E.g., as seen on Appveyor: https://ci.appveyor.com/project/shoyer/xray/builds/22834250/job/9q0ip6i3cchlbkw2

```
================================== FAILURES ===================================
_________________ TestValidateAttrs.test_validating_attrs _____________________
self = <xarray.tests.test_backends.TestValidateAttrs object at 0x00000096BE5FAFD0>
    def test_validating_attrs(self):
        def new_dataset():
            return Dataset({'data': ('y', np.arange(10.0))}, {'y': np.arange(10)})

    def new_dataset_and_dataset_attrs():
        ds = new_dataset()
        return ds, ds.attrs

    def new_dataset_and_data_attrs():
        ds = new_dataset()
        return ds, ds.data.attrs

    def new_dataset_and_coord_attrs():
        ds = new_dataset()
        return ds, ds.coords['y'].attrs

    for new_dataset_and_attrs in [new_dataset_and_dataset_attrs,
                                  new_dataset_and_data_attrs,
                                  new_dataset_and_coord_attrs]:
        ds, attrs = new_dataset_and_attrs()

        attrs[123] = 'test'
        with raises_regex(TypeError, 'Invalid name for attr'):
            ds.to_netcdf('test.nc')

        ds, attrs = new_dataset_and_attrs()
        attrs[MiscObject()] = 'test'
        with raises_regex(TypeError, 'Invalid name for attr'):
            ds.to_netcdf('test.nc')

        ds, attrs = new_dataset_and_attrs()
        attrs[''] = 'test'
        with raises_regex(ValueError, 'Invalid name for attr'):
            ds.to_netcdf('test.nc')

        # This one should work
        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = 'test'
        with create_tmp_file() as tmp_file:
            ds.to_netcdf(tmp_file)

        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = {'a': 5}
        with raises_regex(TypeError, 'Invalid value for attr'):
            ds.to_netcdf('test.nc')

        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = MiscObject()
        with raises_regex(TypeError, 'Invalid value for attr'):
            ds.to_netcdf('test.nc')

        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = 5
        with create_tmp_file() as tmp_file:
            ds.to_netcdf(tmp_file)

        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = 3.14
        with create_tmp_file() as tmp_file:
            ds.to_netcdf(tmp_file)

        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = [1, 2, 3, 4]
        with create_tmp_file() as tmp_file:
            ds.to_netcdf(tmp_file)

        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = (1.9, 2.5)
        with create_tmp_file() as tmp_file:
            ds.to_netcdf(tmp_file)

        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = np.arange(5)
        with create_tmp_file() as tmp_file:
            ds.to_netcdf(tmp_file)

        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = np.arange(12).reshape(3, 4)
        with create_tmp_file() as tmp_file:
          ds.to_netcdf(tmp_file)

xarray\tests\test_backends.py:3450:


xarray\core\dataset.py:1323: in to_netcdf
    compute=compute)
xarray\backends\api.py:767: in to_netcdf
    unlimited_dims=unlimited_dims)
xarray\backends\api.py:810: in dump_to_store
    unlimited_dims=unlimited_dims)
xarray\backends\common.py:262: in store
    self.set_attributes(attributes)
xarray\backends\common.py:278: in set_attributes
    self.set_attribute(k, v)
xarray\backends\netCDF4_.py:418: in set_attribute
    _set_nc_attribute(self.ds, key, value)
xarray\backends\netCDF4_.py:294: in _set_nc_attribute
    obj.setncattr(key, value)
netCDF4\_netCDF4.pyx:2781: in netCDF4._netCDF4.Dataset.setncattr
    ???

    ???
E   ValueError: multi-dimensional array attributes not supported

netCDF4\_netCDF4.pyx:1514: ValueError
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2803/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
676306518 MDU6SXNzdWU2NzYzMDY1MTg= 4331 Support explicitly setting a dimension order with to_dataframe() shoyer 1217238 closed 0     0 2020-08-10T17:45:17Z 2020-08-14T18:28:26Z 2020-08-14T18:28:26Z MEMBER      

As discussed in https://github.com/pydata/xarray/issues/2346, it would be nice to support explicitly setting the desired order of dimensions when calling Dataset.to_dataframe() or DataArray.to_dataframe().

There is nice precedent for this in the to_dask_dataframe method: http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_dask_dataframe.html

I imagine we could copy the exact same API for `to_dataframe()`.
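
A sketch of what that could look like, assuming we mirror to_dask_dataframe's dim_order argument (dim_order on to_dataframe is the proposal here, not an existing parameter):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"z": (("x", "y"), np.arange(6).reshape(2, 3))})
# proposed: control which dimension varies fastest in the resulting MultiIndex
df = ds.to_dataframe(dim_order=["y", "x"])
```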

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4331/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
671019427 MDU6SXNzdWU2NzEwMTk0Mjc= 4295 We shouldn't require a recent version of setuptools to install xarray shoyer 1217238 closed 0     33 2020-08-01T16:49:57Z 2020-08-14T09:52:42Z 2020-08-14T09:52:42Z MEMBER      

@canol reports on our mailing list that our setuptools 41.2 (released 21 August 2019) install requirement is making it hard to install recent versions of xarray at his company: https://groups.google.com/g/xarray/c/HS_xcZDEEtA/m/GGmW-3eMCAAJ

Hello, this is just a feedback about an issue we experienced which caused our internal tools stack to stay with xarray 0.15 version instead of a newer versions.

We are a company using xarray in our internal frameworks and at the beginning we didn't have any restrictions on xarray version in our requirements file, so that new installations of our framework were using the latest version of xarray. But a few months ago we started to hear complaints from users who were having problems with installing our framework and the installation was failing because of xarray's requirement to use at least setuptools 41.2 which is released on 21th of August last year. So it hasn't been a year since it got released which might be considered relatively new.

During the installation of our framework, pip was failing to update setuptools by saying that some other process is already using setuptools files so it cannot update setuptools. The people who are using our framework are not software developers so they didn't know how to solve this problem and it became so overwhelming for us maintainers that we set the xarray requirement to version >=0.15 <0.16. We also share our internal framework with customers of our company so we didn't want to bother the customers with any potential problems.

You can see some other people having having similar problem when trying to update setuptools here (although not related to xarray): https://stackoverflow.com/questions/49338652/pip-install-u-setuptools-fail-windows-10

It is not a big deal but I just wanted to give this as a feedback. I don't know how much xarray depends on setuptools' 41.2 version.

I was surprised to see this in our setup.cfg file, added by @crusaderky in #3628. The version requirement is not documented in our docs.

Given that setuptools may be challenging to upgrade, would it be possible to relax this version requirement?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4295/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
638597800 MDExOlB1bGxSZXF1ZXN0NDM0MzMxNzQ3 4154 Update issue templates inspired/based on dask shoyer 1217238 closed 0     1 2020-06-15T07:00:53Z 2020-08-05T13:05:33Z 2020-06-17T16:50:57Z MEMBER   0 pydata/xarray/pulls/4154

See https://github.com/dask/dask/issues/new/choose for an approximate example of what this looks like.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4154/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
290593053 MDU6SXNzdWUyOTA1OTMwNTM= 1850 xarray contrib module shoyer 1217238 closed 0     25 2018-01-22T19:50:08Z 2020-07-23T16:34:10Z 2020-07-23T16:34:10Z MEMBER      

Over in #1288 @nbren12 wrote:

Overall, I think the xarray community could really benefit from some kind of centralized contrib package which has a low barrier to entry for these kinds of functions.

Yes, I agree that we should explore this. There are a lot of interesting projects building on xarray now but not great ways to discover them.

Are there other open source projects with a good model we should copy here?

  • Scikit-Learn has a separate GitHub org/repositories for contrib projects: https://github.com/scikit-learn-contrib.
  • TensorFlow has a contrib module within the TensorFlow namespace: tensorflow.contrib

This gives us two different models to consider. The first "separate repository" model might be easier/more flexible from a maintenance perspective. Any preferences/thoughts?

There's also some nice overlap with the Pangeo project.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1850/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
646073396 MDExOlB1bGxSZXF1ZXN0NDQwNDMxNjk5 4184 Improve the speed of from_dataframe with a MultiIndex (by 40x!) shoyer 1217238 closed 0     1 2020-06-26T07:39:14Z 2020-07-02T20:39:02Z 2020-07-02T20:39:02Z MEMBER   0 pydata/xarray/pulls/4184

Before:

pandas.MultiIndexSeries.time_to_xarray
======= ========= ==========
--             subset
------- --------------------
dtype     True     False
======= ========= ==========
  int    505±0ms   37.1±0ms
 float   485±0ms   38.3±0ms
======= ========= ==========

After:

pandas.MultiIndexSeries.time_to_xarray
======= ============ ==========
--               subset
------- -----------------------
dtype      True       False
======= ============ ==========
  int    10.7±0.4ms   22.6±1ms
 float   10.0±0.8ms   21.1±1ms
======= ============ ==========

~~There are still some cases where we have to fall back to the existing slow implementation, but hopefully they should now be relatively rare.~~ Edit: now we always use the new implementation

  • [x] Closes #2459, closes #4186
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] Passes isort -rc . && black . && mypy . && flake8
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4184/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 0
}
    xarray 13221727 pull
645961347 MDExOlB1bGxSZXF1ZXN0NDQwMzQ2NTQz 4182 Show data by default in HTML repr for DataArray shoyer 1217238 closed 0     0 2020-06-26T02:25:08Z 2020-06-28T17:03:41Z 2020-06-28T17:03:41Z MEMBER   0 pydata/xarray/pulls/4182
  • [x] Closes #4176
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4182/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
644170008 MDExOlB1bGxSZXF1ZXN0NDM4ODQxMjk2 4171 Remove <pre> from nested HTML repr shoyer 1217238 closed 0     0 2020-06-23T21:51:14Z 2020-06-24T15:45:20Z 2020-06-24T15:45:00Z MEMBER   0 pydata/xarray/pulls/4171

Using <pre> messes up the display of nested HTML reprs, e.g., from dask. Now we only use the <pre> tag when displaying raw text reprs.

Before (Jupyter notebook):

After:

  • [x] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4171/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
613546626 MDExOlB1bGxSZXF1ZXN0NDE0MjgwMDEz 4039 Revise pull request template shoyer 1217238 closed 0     5 2020-05-06T19:08:19Z 2020-06-18T05:45:11Z 2020-06-18T05:45:10Z MEMBER   0 pydata/xarray/pulls/4039

See below for the new language, to clarify that documentation is only necessary for "user visible changes."

I added "including notable bug fixes" to indicate that minor bug fixes may not be worth noting (I was thinking of test-suite only fixes in this category) but perhaps that is too confusing.

cc @pydata/xarray for opinions!

  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] Passes isort -rc . && black . && mypy . && flake8
  • [ ] Fully documented, including whats-new.rst for user visible changes (including notable bug fixes) and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4039/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
639334065 MDExOlB1bGxSZXF1ZXN0NDM0OTQ0NTc4 4159 Test RTD's new pull request builder shoyer 1217238 closed 0     1 2020-06-16T03:06:32Z 2020-06-17T16:54:02Z 2020-06-17T16:54:02Z MEMBER   1 pydata/xarray/pulls/4159

https://docs.readthedocs.io/en/latest/guides/autobuild-docs-for-pull-requests.html

Don't merge this!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4159/reactions",
    "total_count": 3,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 3,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
639397110 MDExOlB1bGxSZXF1ZXN0NDM0OTk1NzQz 4160 Fix failing upstream-dev build & remove docs build shoyer 1217238 closed 0     0 2020-06-16T06:08:55Z 2020-06-16T06:35:49Z 2020-06-16T06:35:44Z MEMBER   0 pydata/xarray/pulls/4160

We'll use RTD's new doc builder instead. For an example, click on "docs/readthedocs.org:xray" below or look at GH4159

  • [x] Closes https://github.com/pydata/xarray/issues/4146
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4160/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
35682274 MDU6SXNzdWUzNTY4MjI3NA== 158 groupby should work with name=None shoyer 1217238 closed 0     2 2014-06-13T15:38:00Z 2020-05-30T13:15:56Z 2020-05-30T13:15:56Z MEMBER      
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/158/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
612214951 MDExOlB1bGxSZXF1ZXN0NDEzMjIyOTEx 4028 Remove broken test for Panel with to_pandas() shoyer 1217238 closed 0     5 2020-05-04T22:41:42Z 2020-05-06T01:50:21Z 2020-05-06T01:50:21Z MEMBER   0 pydata/xarray/pulls/4028

We don't support creating a Panel with to_pandas() with any version of pandas at present, so this test was previously broken if pandas < 0.25 was installed.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4028/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
612772669 MDU6SXNzdWU2MTI3NzI2Njk= 4030 Doc build on Azure is timing out on master shoyer 1217238 closed 0     1 2020-05-05T17:30:16Z 2020-05-05T21:49:26Z 2020-05-05T21:49:26Z MEMBER      

I don't know what's going on, but it currently times out after 1 hour: https://dev.azure.com/xarray/xarray/_build/results?buildId=2767&view=logs&j=7e620c85-24a8-5ffa-8b1f-642bc9b1fc36&t=68484831-0a19-5145-bfe9-6309e5f7691d

Is it possible to login to Azure to debug this stuff?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4030/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
612838635 MDExOlB1bGxSZXF1ZXN0NDEzNzA3Mzgy 4032 Allow warning with cartopy in docs plotting build shoyer 1217238 closed 0     1 2020-05-05T19:25:11Z 2020-05-05T21:49:26Z 2020-05-05T21:49:26Z MEMBER   0 pydata/xarray/pulls/4032

Fixes https://github.com/pydata/xarray/issues/4030

It looks like this is triggered by the new cartopy version now being installed on RTD (version 0.17.0 -> 0.18.0).

Long term we should fix this, but for now it's better just to disable the warning.

Here's the message from RTD:

```
Exception occurred:
  File "/home/docs/checkouts/readthedocs.org/user_builds/xray/conda/latest/lib/python3.8/site-packages/IPython/sphinxext/ipython_directive.py", line 586, in process_input
    raise RuntimeError('Non Expected warning in `{}` line {}'.format(filename, lineno))
RuntimeError: Non Expected warning in `/home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/latest/doc/plotting.rst` line 732
The full traceback has been saved in /tmp/sphinx-err-qav6jjmm.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at https://github.com/sphinx-doc/sphinx/issues. Thanks!

Warning in /home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/latest/doc/plotting.rst at block ending on line 732
Specify :okwarning: as an option in the ipython:: block to suppress this message

/home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/latest/xarray/plot/facetgrid.py:373: UserWarning: Tight layout not applied. The left and right margins cannot be made large enough to accommodate all axes decorations.
  self.fig.tight_layout() <<<-------------------------------------------------------------------------
```

https://readthedocs.org/projects/xray/builds/10969146/

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4032/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
612262200 MDExOlB1bGxSZXF1ZXN0NDEzMjYwNTY2 4029 Support overriding existing variables in to_zarr() without appending shoyer 1217238 closed 0     2 2020-05-05T01:06:40Z 2020-05-05T19:28:02Z 2020-05-05T19:28:02Z MEMBER   0 pydata/xarray/pulls/4029

This is nice for consistency with to_netcdf. It should be useful for cases where users want to update values in existing Zarr datasets.
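
A sketch of the intended workflow, assuming (as in this change) that mode="a" without an append_dim overwrites variables that already exist in the store:

```python
import xarray as xr

ds = xr.Dataset({"temperature": ("x", [1.0, 2.0, 3.0])})
ds.to_zarr("example.zarr", mode="w")         # initial write

(ds + 10).to_zarr("example.zarr", mode="a")  # update values of an existing variable
```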

  • [x] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4029/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
187625917 MDExOlB1bGxSZXF1ZXN0OTI1MjQzMjg= 1087 WIP: New DataStore / Encoder / Decoder API for review shoyer 1217238 closed 0     8 2016-11-07T05:02:04Z 2020-04-17T18:37:45Z 2020-04-17T18:37:45Z MEMBER   0 pydata/xarray/pulls/1087

The goal here is to make something extensible that we can live with for quite some time, and to clean up the internals of xarray's backend interface.

Most of these are analogues of existing xarray classes with a cleaned up interface. I have not yet worried about backwards compatibility or tests -- I would appreciate feedback on the approach here.

Several parts of the logic exist for the sake of dask. I've included the word "dask" in comments to facilitate inspection by mrocklin.

CC @rabernat, @pwolfram, @jhamman, @mrocklin -- for review

CC @mcgibbon, @JoyMonteiro -- this is relevant to our discussion today about adding support for appending to netCDF files. Don't let this stop you from getting started on that with the existing interface, though.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1087/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
598567792 MDU6SXNzdWU1OTg1Njc3OTI= 3966 HTML repr is slightly broken in Google Colab shoyer 1217238 closed 0     1 2020-04-12T20:44:51Z 2020-04-16T20:14:37Z 2020-04-16T20:14:32Z MEMBER      

The "data" toggles are pre-expanded and don't work.

See https://github.com/googlecolab/colabtools/issues/1145 for a full description.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3966/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
479434052 MDU6SXNzdWU0Nzk0MzQwNTI= 3206 DataFrame with MultiIndex -> xarray with sparse array shoyer 1217238 closed 0     1 2019-08-12T00:46:16Z 2020-04-06T20:41:26Z 2019-08-27T08:54:26Z MEMBER      

Now that we have preliminary support for sparse arrays in xarray, one really cool feature we could explore is creating sparse arrays from MultiIndexed pandas DataFrames.

Right now, xarray's methods for creating objects from pandas always create dense arrays, but the size of these dense arrays can get big really quickly if the MultiIndex is sparsely populated, e.g.,

```python
import pandas as pd
import numpy as np
import xarray

df = pd.DataFrame({
    'w': range(10),
    'x': list('abcdefghij'),
    'y': np.arange(0, 100, 10),
    'z': np.ones(10),
}).set_index(['w', 'x', 'y'])
print(xarray.Dataset.from_dataframe(df))
```

This length-10 DataFrame turned into a dense array with 1000 elements (only 10 of which are not NaN):

```
<xarray.Dataset>
Dimensions:  (w: 10, x: 10, y: 10)
Coordinates:
  * w        (w) int64 0 1 2 3 4 5 6 7 8 9
  * x        (x) object 'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j'
  * y        (y) int64 0 10 20 30 40 50 60 70 80 90
Data variables:
    z        (w, x, y) float64 1.0 nan nan nan nan nan ... nan nan nan nan 1.0
```

We can imagine xarray.Dataset.from_dataframe(df, sparse=True) would make the same Dataset, but with sparse array (with a NaN fill value) instead of dense arrays.
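
Continuing the example above, a sketch of the proposed call (sparse=True is the proposal here, and would rely on the sparse package being installed):

```python
ds_sparse = xarray.Dataset.from_dataframe(df, sparse=True)
# The 'z' variable would then be backed by a sparse.COO array of shape
# (10, 10, 10) storing only the 10 observed values, rather than a dense
# array of 1000 floats that is mostly NaN.
```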

Once sparse arrays work pretty well, this could actually obviate most of the use cases for MultiIndex in arrays. Arguably the model is quite a bit cleaner.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3206/reactions",
    "total_count": 3,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 3,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
479940669 MDU6SXNzdWU0Nzk5NDA2Njk= 3212 Custom fill_value for from_dataframe/from_series shoyer 1217238 open 0     0 2019-08-13T03:22:46Z 2020-04-06T20:40:26Z   MEMBER      

It would be nice to have the option to customize the fill value when creating xarray objects from pandas, instead of requiring it to always be NaN.

This would probably be especially useful when creating sparse arrays (https://github.com/pydata/xarray/issues/3206), for which it often makes sense to use a fill value of zero. If your data has integer values (e.g., it represents counts), you probably don't want to let it be cast to float first.
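
A sketch of the kind of interface this could take (fill_value here is hypothetical, not an existing from_dataframe argument):

```python
import pandas as pd
import xarray

index = pd.MultiIndex.from_tuples([(0, "a"), (1, "b"), (2, "c")], names=["x", "y"])
df = pd.DataFrame({"counts": [1, 2, 3]}, index=index)

# hypothetical: fill unobserved (x, y) combinations with 0 instead of NaN,
# so integer counts stay integers instead of being cast to float
ds = xarray.Dataset.from_dataframe(df, fill_value=0)
```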

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3212/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
314482923 MDU6SXNzdWUzMTQ0ODI5MjM= 2061 Backend specific conventions decoding shoyer 1217238 open 0     1 2018-04-16T02:45:46Z 2020-04-05T23:42:34Z   MEMBER      

Currently, we have a single function xarray.decode_cf() that we apply to data loaded from all xarray backends.

This is appropriate for netCDF data, but it's not appropriate for backends with different implementations. For example, it doesn't work for zarr (which is why we have the separate open_zarr), and is also a poor fit for PseudoNetCDF (https://github.com/pydata/xarray/pull/1905). In the worst cases (e.g., for PseudoNetCDF) it can actually result in data being decoded twice, which can result in incorrectly scaled data.

Instead, we should declare default decoders as part of the backend API, and use those decoders as the defaults for open_dataset().
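
A purely hypothetical sketch of what "default decoders as part of the backend API" might look like; none of these names exist in xarray:

```python
# illustrative only: each backend declares which decoding steps make sense for it
DEFAULT_DECODERS = {
    "netcdf4": {"mask_and_scale", "decode_times", "decode_coords"},
    "zarr": {"mask_and_scale", "decode_times", "decode_coords"},
    "pseudonetcdf": set(),  # the underlying reader already returns decoded data
}

def decoders_for(engine, overrides=None):
    """Decoding steps open_dataset() would apply for a given backend."""
    decoders = set(DEFAULT_DECODERS[engine])
    for name, enabled in (overrides or {}).items():
        if enabled:
            decoders.add(name)
        else:
            decoders.discard(name)
    return decoders

print(decoders_for("pseudonetcdf"))                      # set()
print(decoders_for("netcdf4", {"decode_times": False}))  # no time decoding
```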

This should probably be tackled as part of the broader backends refactor: https://github.com/pydata/xarray/issues/1970

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2061/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
28376794 MDU6SXNzdWUyODM3Njc5NA== 25 Consistent rules for handling merges between variables with different attributes shoyer 1217238 closed 0     13 2014-02-26T22:37:01Z 2020-04-05T19:13:13Z 2014-09-04T06:50:49Z MEMBER      

Currently, variable attributes are checked for equality before allowing for a merge via a call to xarray_equal. It should be possible to merge datasets even if some of the variable metadata disagrees (conflicting attributes should be dropped). This is already the behavior for global attributes.

The right design of this feature should probably include some optional argument to Dataset.merge indicating how strict we want the merge to be. I can see at least three versions that could be useful:

  1. Drop conflicting metadata silently.
  2. Don't allow for conflicting values, but drop non-matching keys.
  3. Require all keys and values to match.

We can argue about which of these should be the default option. My inclination is to be as flexible as possible by using 1 or 2 in most cases.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/25/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
173612265 MDU6SXNzdWUxNzM2MTIyNjU= 988 Hooks for custom attribute handling in xarray operations shoyer 1217238 open 0     24 2016-08-27T19:48:22Z 2020-04-05T18:19:11Z   MEMBER      

Over in #964, I am working on a rewrite/unification of the guts of xarray's logic for computation with labelled data. The goal is to get all of xarray's internal logic for working with labelled data going through a minimal set of flexible functions which we can also expose as part of the API.

Because we will finally have all (or at least nearly all) xarray operations using the same code path, I think it will also finally become feasible to open up hooks allowing extensions how xarray handles metadata.

Two obvious use cases here are units (#525) and automatic maintenance of metadata (e.g., cell_methods or history fields). Both of these are out of scope for xarray itself, mostly because the specific logic tends to be domain specific. This could also subsume options like the existing keep_attrs on many operations.

I like the idea of supporting something like NumPy's __array_wrap__ to allow third-party code to finalize xarray objects in some way before they are returned. However, it's not obvious to me what the right design is.

  • Should we look up a custom attribute on subclasses like __array_wrap__ (or __numpy_ufunc__) in NumPy, or should we have a system (e.g., unilaterally or with a context manager and xarray.set_options) for registering hooks that are then checked on all xarray objects? I am inclined toward the latter, even though it's a little slower, just because it will be simpler and easier to get right.
  • Should these methods be able to control the full result objects, or only set attrs and/or name?
  • To be useful, do we need to allow extensions to take control of the full operation, to support things like automatic unit conversion? This would suggest something closer to __numpy_ufunc__, which is a little more ambitious than what I had in mind here.

Feedback would be greatly appreciated.

CC @darothen @rabernat @jhamman @pwolfram

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/988/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
29136905 MDU6SXNzdWUyOTEzNjkwNQ== 60 Implement DataArray.idxmax() shoyer 1217238 closed 0   1.0 741199 14 2014-03-10T22:03:06Z 2020-03-29T01:54:25Z 2020-03-29T01:54:25Z MEMBER      

Should match the pandas function: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.idxmax.html
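
For reference, a small sketch of the pandas behavior to mirror; the DataArray call shown in the comment is the proposed API, not something that exists yet:

```python
import pandas as pd
import xarray as xr

df = pd.DataFrame({"a": [1, 5, 3]}, index=["x", "y", "z"])
print(df.idxmax())  # a    y   -- index label of the maximum in each column

da = xr.DataArray([1, 5, 3], dims="time", coords={"time": [10, 20, 30]})
# proposed: da.idxmax("time") would return the coordinate label of the
# maximum along the given dimension, i.e. 20 here
```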

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/60/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);