
issues


48 rows where state = "open" and user = 1217238 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2266174558 I_kwDOAMm_X86HExRe 8975 Xarray sponsorship guidelines shoyer 1217238 open 0     3 2024-04-26T17:05:01Z 2024-04-30T20:52:33Z   MEMBER      

At what level of support should Xarray acknowledge sponsors on our website?

I would like to surface this for open discussion because there are potential sponsoring organizations with conflicts of interest with members of Xarray's leadership team (e.g., Earthmover, which employs @jhamman, @rabernat and @dcherian).

My suggestion is to use NumPy's guidelines, with an adjustment down to 1/3 of the thresholds to account for the smaller size of the project:

  • $10,000/yr for unrestricted financial contributions (e.g., donations)
  • $20,000/yr for financial contributions for a particular purpose (e.g., grants)
  • $30,000/yr for in-kind contributions (e.g., time for employees to contribute)
  • 2 person-months/yr of paid work time for one or more Xarray maintainers or regular contributors to any Xarray team or activity

The NumPy guidelines also include a grace period of a minimum of 6 months for acknowledging support. I would suggest increasing this to a minimum of 1 year for Xarray.

I would greatly appreciate any feedback from members of the community, either in this issue or at the next team meeting.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8975/reactions",
    "total_count": 6,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
271043420 MDU6SXNzdWUyNzEwNDM0MjA= 1689 Roundtrip serialization of coordinate variables with spaces in their names shoyer 1217238 open 0     5 2017-11-03T16:43:20Z 2024-03-22T14:02:48Z   MEMBER      

If coordinates have spaces in their names, they get restored from netCDF files as data variables instead:

```
xarray.open_dataset(xarray.Dataset(coords={'name with spaces': 1}).to_netcdf())
<xarray.Dataset>
Dimensions:           ()
Data variables:
    name with spaces  int32 1
```

This happens because the CF convention is to indicate coordinates as a space separated string, e.g., coordinates='latitude longitude'.

Even though these aren't CF-compliant variable names (which cannot contain spaces), it would be nice to have an ad-hoc convention for xarray that allows us to serialize/deserialize coordinates in all/most cases. Maybe we could use escape characters for spaces (e.g., coordinates='name\ with\ spaces') or quote names if they have spaces (e.g., coordinates='"name\ with\ spaces"')?

At the very least, we should issue a warning in these cases.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1689/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
842436143 MDU6SXNzdWU4NDI0MzYxNDM= 5081 Lazy indexing arrays as a stand-alone package shoyer 1217238 open 0     6 2021-03-27T07:06:03Z 2023-12-15T13:20:03Z   MEMBER      

From @rabernat on Twitter:

"Xarray has some secret private classes for lazily indexing / wrapping arrays that are so useful I think they should be broken out into a standalone package. https://github.com/pydata/xarray/blob/master/xarray/core/indexing.py#L516"

The idea here is to create a first-class "duck array" library for lazy indexing that could replace xarray's internal classes for lazy indexing. This would be in some ways similar to dask.array, but much simpler, because it doesn't have to worry about parallel computing.

Desired features:

  • Lazy indexing
  • Lazy transposes
  • Lazy concatenation (#4628) and stacking
  • Lazy vectorized operations (e.g., unary and binary arithmetic)
    • needed for decoding variables from disk (xarray.encoding) and
    • building lazy multi-dimensional coordinate arrays corresponding to map projections (#3620)
  • Maybe: lazy reshapes (#4113)

A common feature of these operations is that they can (and almost always should) be fused with indexing: if N elements are selected via indexing, only O(N) compute and memory is required to produce them, regardless of the size of the original arrays, as long as the number of applied operations can be treated as a constant. Memory access is significantly slower than compute on modern hardware, so recomputing these operations on the fly is almost always a good idea.

Out of scope: lazy computation when indexing could require access to many more elements to compute the desired value than are returned. For example, mean() probably should not be lazy, because that could involve computation of a very large number of elements that one might want to cache.

This is valuable functionality for Xarray for two reasons:

  1. It allows for "previewing" small bits of data loaded from disk or remote storage, even if that data needs some form of cheap "decoding" from its form on disk.
  2. It allows for xarray to decode data in a lazy fashion that is compatible with full-featured systems for lazy computation (e.g., Dask), without requiring the user to choose dask when reading the data.

Related issues:

  • [Proposal] Expose Variable without Pandas dependency #3981
  • Lazy concatenation of arrays #4628
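
As a rough illustration of the fused-indexing idea described above, here is a minimal sketch (the class name and details are hypothetical, not xarray's actual indexing.py machinery):

```python
import numpy as np

class LazyElementwiseArray:
    """Hypothetical duck array: wraps a base array and a cheap element-wise
    function, deferring the function until the data is indexed."""

    def __init__(self, array, func):
        self.array = array      # anything supporting __getitem__ and .shape
        self.func = func        # element-wise function, e.g. a decoder
        self.shape = array.shape

    def __getitem__(self, key):
        # Fuse the deferred operation with indexing: select first, then
        # compute, so cost is proportional to the selected elements only.
        return self.func(self.array[key])

raw = np.arange(1_000_000, dtype="int16").reshape(1000, 1000)
lazy = LazyElementwiseArray(raw, lambda x: x * 0.01 + 273.15)
print(lazy[:2, :3])  # only these 6 elements are ever decoded
```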
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5081/reactions",
    "total_count": 6,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 6,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
588105641 MDU6SXNzdWU1ODgxMDU2NDE= 3893 HTML repr in the online docs shoyer 1217238 open 0     3 2020-03-26T02:17:51Z 2023-09-11T17:41:59Z   MEMBER      

I noticed two minor issues in our online docs, now that we've switched to the hip new HTML repr by default.

  1. Most doc pages still show text, not HTML. I suspect this is a limitation of the IPython sphinx directive we use for our snippets. We might be able to fix that by switching to jupyter-sphinx?

  2. The "attributes" part of the HTML repr in our notebook examples looks a little funny, with strange blue formatting around each attribute name. It looks like part of the outer style of our docs is leaking into the HTML repr:

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3893/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1376109308 I_kwDOAMm_X85SBcL8 7045 Should Xarray stop doing automatic index-based alignment? shoyer 1217238 open 0     13 2022-09-16T15:31:03Z 2023-08-23T07:42:34Z   MEMBER      

What is your issue?

I am increasingly thinking that automatic index-based alignment in Xarray (copied from pandas) may have been a design mistake. Almost every time I work with datasets with different indexes, I find myself writing code to explicitly align them:

  1. Automatic alignment is hard to predict. The implementation is complicated, and the exact mode of automatic alignment (outer vs inner vs left join) depends on the specific operation. It's also no longer possible to predict the shape (or even the dtype) resulting from most Xarray operations purely from input shape/dtype (see the example after this list).
  2. Automatic alignment brings an unexpected performance penalty. In some domains (analytics) this is OK, but in others (e.g., numerical modeling or deep learning) this is a complete deal-breaker.
  3. Automatic alignment is not useful for float indexes, because exact matches are rare. In practice, this makes it less useful in Xarray's usual domains than it is for pandas.
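
A small illustration of the first point, with made-up values:

```python
import xarray as xr

a = xr.DataArray([1, 2, 3], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray([10, 20, 30], dims="x", coords={"x": [1, 2, 3]})

# Arithmetic silently performs an inner join on the "x" index, so the result
# only contains the overlapping labels 1 and 2 -- the output shape cannot be
# predicted from the input shapes alone.
print((a + b).values)  # [12 23]
```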

Would it be insane to consider changing Xarray's behavior to stop doing automatic alignment? I imagine we could roll this out slowly, first with warnings and then with an option for disabling it.

If you think this is a good or bad idea, consider responding to this issue with a 👍 or 👎 reaction.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7045/reactions",
    "total_count": 13,
    "+1": 9,
    "-1": 2,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 2
}
    xarray 13221727 issue
342928718 MDExOlB1bGxSZXF1ZXN0MjAyNzE0MjUx 2302 WIP: lazy=True in apply_ufunc() shoyer 1217238 open 0     1 2018-07-20T00:01:21Z 2023-07-18T04:19:17Z   MEMBER   0 pydata/xarray/pulls/2302
  • [x] Closes https://github.com/pydata/xarray/issues/2298
  • [ ] Tests added
  • [ ] Tests passed
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API

Still needs more tests and documentation.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2302/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
479942077 MDU6SXNzdWU0Nzk5NDIwNzc= 3213 How should xarray use/support sparse arrays? shoyer 1217238 open 0     55 2019-08-13T03:29:42Z 2023-06-07T15:43:55Z   MEMBER      

I'm looking forward to being easily able to create sparse xarray objects from pandas: https://github.com/pydata/xarray/issues/3206

Are there other xarray APIs that could make good use of sparse arrays, or could make sparse arrays easier to use?

Some ideas:

  • to_sparse()/to_dense() methods for converting to/from sparse without requiring using .data
  • to_dataframe()/to_series() could grow options for skipping the fill-value in sparse arrays, so they can round-trip MultiIndex data back to pandas (see the small example below)
  • Serialization to/from netCDF files, using some custom convention (see https://github.com/pydata/xarray/issues/1375#issuecomment-402699810)
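
For reference, a small illustration of the pandas round-trip case (made-up data, and assuming the sparse package is installed):

```python
import pandas as pd
import xarray as xr

index = pd.MultiIndex.from_product([[0, 1], ["a", "b", "c"]], names=["x", "y"])
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=index, name="data")

# sparse=True keeps the result backed by a sparse.COO array instead of
# densifying the MultiIndex into a full (x, y) grid.
ds = xr.Dataset.from_dataframe(series.to_frame(), sparse=True)
print(type(ds["data"].data))
```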

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3213/reactions",
    "total_count": 14,
    "+1": 14,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1465287257 I_kwDOAMm_X85XVoJZ 7325 Support reading Zarr data via TensorStore shoyer 1217238 open 0     1 2022-11-27T00:12:17Z 2023-05-11T01:24:27Z   MEMBER      

What is your issue?

TensorStore is another high performance API for reading distributed arrays in formats such as Zarr, written in C++.

It could be interesting to write an Xarray storage backend using TensorStore as an alternative way to read Zarr files.

As an exercise, I made a little demo of doing this: https://gist.github.com/shoyer/5b0c485979cc9c36a9685d8cf8e94565

I have not tested it for performance. The main annoyance is that TensorStore doesn't understand Zarr groups or Zarr array attributes, so I needed to write my own helpers for reading this metadata.

Also, there's a bit of an impedance mis-match between TensorStore (where everything returns futures) and Xarray (which assumes that indexing results in numpy arrays). This could likely be improved with some amount of effort -- in particular https://github.com/pydata/xarray/pull/6874/files should help.

CC @jbms who may have better ideas about how to use the TensorStore API.
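
For reference, opening a single Zarr array with TensorStore looks roughly like the following sketch (the path and array layout are made up; this is not the backend from the gist above):

```python
import tensorstore as ts

store = ts.open({
    "driver": "zarr",
    "kvstore": {"driver": "file", "path": "example.zarr/temperature"},
}).result()

# Indexing produces a lazy view; .read() returns a future and .result()
# blocks until the underlying chunks are fetched as a numpy array.
block = store[:10, :10].read().result()
print(block.shape)
```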

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7325/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
209653741 MDU6SXNzdWUyMDk2NTM3NDE= 1285 FAQ page could use some updating shoyer 1217238 open 0     1 2017-02-23T03:29:16Z 2023-03-26T16:32:44Z   MEMBER      

Along the same lines as https://github.com/pydata/xarray/issues/1282, we haven't done much updating for frequently asked questions -- it's mostly still the original handful of FAQ entries I wrote in the first version of the docs.

Topics worth addressing:

  • [ ] How xarray handles missing values
  • [x] File formats -- how can I read format X in xarray? (Maybe we should make a table with links to other packages?)

(please add suggestions for this list!)

StackOverflow may be a helpful reference here: http://stackoverflow.com/questions/tagged/python-xarray?sort=votes&pageSize=50

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1285/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
176805500 MDU6SXNzdWUxNzY4MDU1MDA= 1004 Remove IndexVariable.name shoyer 1217238 open 0     3 2016-09-14T03:27:43Z 2023-03-11T19:57:40Z   MEMBER      

As discussed in #947, we should remove the IndexVariable.name attribute. It should be fine to use an IndexVariable anywhere, regardless of whether or not it labels ticks along a dimension.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1004/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
895983112 MDExOlB1bGxSZXF1ZXN0NjQ4MTM1NTcy 5351 Add xarray.backends.NoMatchingEngineError shoyer 1217238 open 0     4 2021-05-19T22:09:21Z 2022-11-16T15:19:54Z   MEMBER   0 pydata/xarray/pulls/5351
  • [x] Closes #5329
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5351/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
168272291 MDExOlB1bGxSZXF1ZXN0NzkzMjE2NTc= 924 WIP: progress toward making groupby work with multiple arguments shoyer 1217238 open 0     16 2016-07-29T08:07:57Z 2022-06-09T14:50:17Z   MEMBER   0 pydata/xarray/pulls/924

Fixes #324

It definitely doesn't work properly yet, totally mixing up coordinates, data variables and multi-indexes (as shown by the failing tests).

A simple example:

```
In [4]: coords = {'a': ('x', [0, 0, 1, 1]), 'b': ('y', [0, 0, 1, 1])}

In [5]: square = xr.DataArray(np.arange(16).reshape(4, 4), coords=coords, dims=['x', 'y'])

In [6]: square
Out[6]:
<xarray.DataArray (x: 4, y: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Coordinates:
    b        (y) int64 0 0 1 1
    a        (x) int64 0 0 1 1
  * x        (x) int64 0 1 2 3
  * y        (y) int64 0 1 2 3

In [7]: square.groupby(['a', 'b']).mean()
Out[7]:
<xarray.DataArray (a: 2, b: 2)>
array([[  2.5,   4.5],
       [ 10.5,  12.5]])
Coordinates:
  * a        (a) int64 0 1
  * b        (b) int64 0 1

In [8]: square.groupby(['x', 'y']).mean()
Out[8]:
<xarray.DataArray (x: 4, y: 4)>
array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.],
       [ 12.,  13.,  14.,  15.]])
Coordinates:
  * x        (x) int64 0 1 2 3
  * y        (y) int64 0 1 2 3
```

More examples: https://gist.github.com/shoyer/5cfa4d5751e8a78a14af25f8442ad8d5

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/924/reactions",
    "total_count": 4,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 3,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
326205036 MDU6SXNzdWUzMjYyMDUwMzY= 2180 How should Dataset.update() handle conflicting coordinates? shoyer 1217238 open 0     16 2018-05-24T16:46:23Z 2022-04-30T13:40:28Z   MEMBER      

Recently, we updated Dataset.__setitem__ to drop conflicting coordinates from DataArray values being assigned if they conflict with existing coordinates (https://github.com/pydata/xarray/pull/2087). Because update and __setitem__ share the same code path, this inadvertently updated update as well. Is this something we want?

In v0.10.3, both __setitem__ and update prioritize coordinates from the assigned objects (e.g., value in dataset[key] = value).

In v0.10.4, both __setitem__ and update prioritize coordinates from the original object (e.g., dataset).

I'm not sure this is the right behavior. In particular, in the case of dataset.update(other) where other is also an xarray.Dataset, it seems like coordinates from other should take priority.

Note that one advantage of the current logic (which is violated by my current fix in https://github.com/pydata/xarray/pull/2162), is that we maintain the invariant that dataset[key] = value is equivalent to dataset.update({key: value}).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2180/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
612918997 MDU6SXNzdWU2MTI5MTg5OTc= 4034 Fix tight_layout warning on cartopy facetgrid docs example shoyer 1217238 open 0     1 2020-05-05T21:54:46Z 2022-04-30T12:37:50Z   MEMBER      

Per the fix in https://github.com/pydata/xarray/pull/4032, I'm pretty sure we will soon start seeing a warning message printed on ReadTheDocs in the Cartopy FacetGrid example: http://xarray.pydata.org/en/stable/plotting.html#maps

This would be nice to fix for users, especially because it's likely users will see this warning when running code outside of our documentation, too.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4034/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
342180429 MDU6SXNzdWUzNDIxODA0Mjk= 2298 Making xarray math lazy shoyer 1217238 open 0     7 2018-07-18T05:18:53Z 2022-04-19T15:38:59Z   MEMBER      

At SciPy, I had the realization that it would be relatively straightforward to make element-wise math between xarray objects lazy. This would let us support lazy coordinate arrays, a feature that has quite a few use-cases, e.g., for both geoscience and astronomy.

The trick would be to write a lazy array class that holds an element-wise vectorized function and passes indexers on to its arguments. I haven't thought too hard about this yet for vectorized indexing, but it could be quite efficient for outer indexing. I have some prototype code but no tests yet.

The question is how to hook this into xarray operations. In particular, supposing that the inputs to a function do not hold dask arrays:

  • Should we try to make every element-wise operation with vectorized functions (ufuncs) lazy by default? This might have negative performance implications and would be a little tricky to implement with xarray's current code, since we still implement binary operations like + with separate logic from apply_ufunc.
  • Should we make every element-wise operation that explicitly uses apply_ufunc() lazy by default?
  • Or should we only make element-wise operations lazy with apply_ufunc() if you use some special flag, e.g., apply_ufunc(..., lazy=True)?

I am leaning towards the last option for now but would welcome other opinions.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2298/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
902622057 MDU6SXNzdWU5MDI2MjIwNTc= 5381 concat() with compat='no_conflicts' on dask arrays has accidentally quadratic runtime shoyer 1217238 open 0     0 2021-05-26T16:12:06Z 2022-04-19T03:48:27Z   MEMBER      

This ends up calling fillna() in a loop inside xarray.core.merge.unique_variable(), something like:

```python
out = variables[0]
for var in variables[1:]:
    out = out.fillna(var)
```

https://github.com/pydata/xarray/blob/55e5b5aaa6d9c27adcf9a7cb1f6ac3bf71c10dea/xarray/core/merge.py#L147-L149

This has quadratic behavior if the variables are stored in dask arrays (the dask graph gets one element larger after each loop iteration). This is OK for merge() (which typically only has two arguments) but is problematic for dealing with variables that shouldn't be concatenated inside concat(), which should be able to handle very long lists of arguments.

I encountered this because compat='no_conflicts' is the default for xarray.combine_nested().

I guess there's also the related issue which is that even if we produced the output dask graph by hand without a loop, it still wouldn't be easy to evaluate for a large number of elements. Ideally we would use some sort of tree-reduction to ensure the operation can be parallelized.
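
A minimal sketch of such a tree reduction (illustrative only, not xarray code):

```python
def tree_fillna(variables):
    # Combine variables pairwise so the resulting dask graph has depth
    # O(log n) instead of the O(n) chain produced by the loop above.
    variables = list(variables)
    while len(variables) > 1:
        paired = [variables[i].fillna(variables[i + 1])
                  for i in range(0, len(variables) - 1, 2)]
        if len(variables) % 2:
            paired.append(variables[-1])
        variables = paired
    return variables[0]
```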

xref https://github.com/google/xarray-beam/pull/13

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5381/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
325439138 MDU6SXNzdWUzMjU0MzkxMzg= 2171 Support alignment/broadcasting with unlabeled dimensions of size 1 shoyer 1217238 open 0     5 2018-05-22T19:52:21Z 2022-04-19T03:15:24Z   MEMBER      

Sometimes, it's convenient to include placeholder dimensions of size 1, which allows for removing any ambiguity related to the order of output dimensions.

Currently, this is not supported with xarray:

```
xr.DataArray([1], dims='x') + xr.DataArray([1, 2, 3], dims='x')
ValueError: arguments without labels along dimension 'x' cannot be aligned because they have different dimension sizes: {1, 3}

xr.Variable(('x',), [1]) + xr.Variable(('x',), [1, 2, 3])
ValueError: operands cannot be broadcast together with mismatched lengths for dimension 'x': (1, 3)
```

However, these operations aren't really ambiguous. With size 1 dimensions, we could logically do broadcasting like NumPy arrays, e.g.,

```
np.array([1]) + np.array([1, 2, 3])
array([2, 3, 4])
```

This would be particularly convenient if we add keepdims=True to xarray operations (#2170).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2171/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
665488672 MDU6SXNzdWU2NjU0ODg2NzI= 4267 CachingFileManager should not use __del__ shoyer 1217238 open 0     2 2020-07-25T01:20:52Z 2022-04-17T21:42:39Z   MEMBER      

__del__ is sometimes called after modules have been deallocated, which results in errors printed to stderr when Python exits. This manifests itself in the following bug: https://github.com/shoyer/h5netcdf/issues/50

Per https://github.com/shoyer/h5netcdf/issues/50#issuecomment-572191867, the right solution is probably to use weakref.finalize.
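
A minimal sketch of the weakref.finalize approach (hypothetical class, not the real CachingFileManager):

```python
import weakref

class FileManagerSketch:
    def __init__(self, opener, *args):
        self._file = opener(*args)
        # The callback references only the file object, not self, so it does
        # not keep the manager alive and still runs reliably at interpreter
        # exit without touching torn-down module globals (unlike __del__).
        self._finalizer = weakref.finalize(self, self._file.close)

    def close(self):
        self._finalizer()  # idempotent: the callback runs at most once
```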

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4267/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
469440752 MDU6SXNzdWU0Njk0NDA3NTI= 3139 Change the signature of DataArray to DataArray(data, dims, coords, ...)? shoyer 1217238 open 0     1 2019-07-17T20:54:57Z 2022-04-09T15:28:51Z   MEMBER      

Currently, the signature of DataArray is DataArray(data, coords, dims, ...): http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html

In the long term, I think DataArray(data, dims, coords, ...) would be more intuitive: dimensions are a more fundamental part of xarray's data model than coordinates. Certainly I find it much more common to omit coords than to omit dims when I create a DataArray.

My original reasoning for this argument order was that dims could be copied from coords, e.g., DataArray(new_data, old_dataarray.coords), and it was nice to be able to pass this sole argument by position instead of by name. But a cleaner way to write this now is old_dataarray.copy(data=new_data).

The challenge in making any change here would be to have a smooth deprecation process, and that ideally avoids requiring users to rewrite all of their code and avoids loads of pointless/extraneous warnings. I'm not entirely sure this is possible. We could likely use heuristics to distinguish between dims and coords arguments regardless of their order, but this probably isn't something we would want to preserve in the long term.

An alternative that might achieve some of the convenience of this change would be to allow for passing lists of strings in the coords argument by position, which are interpreted as dimensions, e.g., DataArray(data, ['x', 'y']). The downside of this alternative is that it would add even more special cases to the DataArray constructor, which would make it harder to understand.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3139/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
237008177 MDU6SXNzdWUyMzcwMDgxNzc= 1460 groupby should still squeeze for non-monotonic inputs shoyer 1217238 open 0     5 2017-06-19T20:05:14Z 2022-03-04T21:31:41Z   MEMBER      

We can simply use argsort() to determine group_indices instead of np.arange(): https://github.com/pydata/xarray/blob/22ff955d53e253071f6e4fa849e5291d0005282a/xarray/core/groupby.py#L256
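
For example (a standalone numpy sketch, not the actual groupby code):

```python
import numpy as np

labels = np.array([2, 0, 1, 0, 2, 1])            # non-monotonic group labels
order = np.argsort(labels, kind="stable")         # positions ordered by group
_, counts = np.unique(labels, return_counts=True)
group_indices = np.split(order, np.cumsum(counts)[:-1])
print(group_indices)  # [array([1, 3]), array([2, 5]), array([0, 4])]
```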

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1460/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
58117200 MDU6SXNzdWU1ODExNzIwMA== 324 Support multi-dimensional grouped operations and group_over shoyer 1217238 open 0   1.0 741199 12 2015-02-18T19:42:20Z 2022-02-28T19:03:17Z   MEMBER      

Multi-dimensional grouped operations should be relatively straightforward -- the main complexity will be writing an N-dimensional concat that doesn't involve repetitively copying data.

The idea with group_over would be to support groupby operations that act on a single element from each of the given groups, rather than the unique values. For example, ds.group_over(['lat', 'lon']) would let you iterate over or apply to 2D slices of ds, no matter how many dimensions it has.

Roughly speaking (it's a little more complex for the case of non-dimension variables), ds.group_over(dims) would get translated into ds.groupby([d for d in ds.dims if d not in dims]).

Related: #266

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/324/reactions",
    "total_count": 18,
    "+1": 18,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1090700695 I_kwDOAMm_X85BAsWX 6125 [Bug]: HTML repr does not display well in notebooks hosted on GitHub shoyer 1217238 open 0     0 2021-12-29T19:05:49Z 2021-12-29T19:36:25Z   MEMBER      

What happened?

We see both the raw text and a malformed version of the HTML (without CSS formatting).

Example (https://github.com/microsoft/PlanetaryComputerExamples/blob/main/quickstarts/reading-zarr-data.ipynb):

What did you expect to happen?

Either:

  1. Ideally, we only see the HTML repr, with CSS formatting applied.
  2. Or, if that isn't possible, we should figure out how to only show the raw text.

nbviewer gets this right:

Minimal Complete Verifiable Example

No response

Relevant log output

No response

Anything else we need to know?

No response

Environment

NA

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6125/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
252707680 MDU6SXNzdWUyNTI3MDc2ODA= 1525 Consider setting name=False in Variable.chunk() shoyer 1217238 open 0     4 2017-08-24T19:34:28Z 2021-07-13T01:50:16Z   MEMBER      

@mrocklin writes:

The following will be slower:

```python
b = (a.chunk(...) + 1) + (a.chunk(...) + 1)
```

In current operation this will be optimized to

```python
tmp = a.chunk(...) + 1
b = tmp + tmp
```

So you'll lose that, but I suspect that in your case chunking the same dataset many times is somewhat rare.

See here for discussion: https://github.com/pydata/xarray/pull/1517#issuecomment-324722153

Whether this is worth doing really depends on what people would find most useful -- and what is the most intuitive behavior.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1525/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
254888879 MDU6SXNzdWUyNTQ4ODg4Nzk= 1552 Flow chart for choosing indexing operations shoyer 1217238 open 0     2 2017-09-03T17:33:30Z 2021-07-11T22:26:17Z   MEMBER      

We have a lot of indexing operations, even though sel_points and isel_points are about to be deprecated (#1473).

A flow chart / decision tree to help users pick the right indexing operation might be helpful (e.g., like this skimage FlowChart). It would ask various questions (e.g., do you have labels or integer positions? do you want to select or impose coordinates?) and then suggest the appropriate indexer methods.

cc @fujiisoup

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1552/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
340733448 MDU6SXNzdWUzNDA3MzM0NDg= 2283 Exact alignment should allow missing dimension coordinates shoyer 1217238 open 0     2 2018-07-12T17:40:24Z 2021-06-15T09:52:29Z   MEMBER      

Code Sample, a copy-pastable example if possible

```python
import xarray as xr
xr.align(xr.DataArray([1, 2, 3], dims='x'),
         xr.DataArray([1, 2, 3], dims='x', coords=[[0, 1, 2]]),
         join='exact')
```

Problem description

This currently results in an error, but a missing index of size 3 does not actually conflict:

```python-traceback
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-1d63d3512fb6> in <module>()
      1 xr.align(xr.DataArray([1, 2, 3], dims='x'),
      2          xr.DataArray([1, 2, 3], dims='x', coords=[[0, 1, 2]]),
----> 3          join='exact')

/usr/local/lib/python3.6/dist-packages/xarray/core/alignment.py in align(*objects, **kwargs)
    129                 raise ValueError(
    130                     'indexes along dimension {!r} are not equal'
--> 131                     .format(dim))
    132             index = joiner(matching_indexes)
    133             joined_indexes[dim] = index

ValueError: indexes along dimension 'x' are not equal
```

This surfaced as an issue on StackOverflow: https://stackoverflow.com/questions/51308962/computing-matrix-vector-multiplication-for-each-time-point-in-two-dataarrays

Expected Output

Both output arrays should end up with the x coordinate from the input that has it, like the output of the above expression if join='inner':

```
(<xarray.DataArray (x: 3)>
 array([1, 2, 3])
 Coordinates:
   * x        (x) int64 0 1 2,
 <xarray.DataArray (x: 3)>
 array([1, 2, 3])
 Coordinates:
   * x        (x) int64 0 1 2)
```

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.33+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.7
pandas: 0.22.0
numpy: 1.14.5
scipy: 0.19.1
netCDF4: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.1.2
cartopy: None
seaborn: 0.7.1
setuptools: 39.1.0
pip: 10.0.1
conda: None
pytest: None
IPython: 5.5.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2283/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
842438533 MDU6SXNzdWU4NDI0Mzg1MzM= 5082 Move encoding from xarray.Variable to duck arrays? shoyer 1217238 open 0     2 2021-03-27T07:21:55Z 2021-06-13T01:34:00Z   MEMBER      

The encoding property on Variable has always been an awkward part of Xarray's API, and an example of poor separation of concerns. It adds conceptual overhead to all uses of xarray.Variable, but exists only for the (somewhat niche) benefit of Xarray's backend IO functionality. This is particularly problematic if we consider the possible separation of xarray.Variable into a separate package to remove the pandas dependency (https://github.com/pydata/xarray/issues/3981).

I think a cleaner way to handle encoding would be to move it from Variable onto array objects, specifically duck array objects that Xarray creates when loading data from disk. As long as these duck arrays don't "propagate" themselves under array operations but rather turn into raw numpy arrays (or whatever is wrapped), this would automatically resolve all issues around propagating encoding attributes (e.g., https://github.com/pydata/xarray/pull/5065, https://github.com/pydata/xarray/issues/1614). And users who don't care about encoding because they don't use Xarray's IO functionality would never need to think about it.
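
A minimal sketch of what such a duck array might look like (the class name and details are hypothetical):

```python
import numpy as np

class EncodedArray:
    """Thin wrapper created by a backend: carries encoding, but degrades to a
    plain numpy array under any operation, so encoding never propagates."""

    def __init__(self, data, encoding):
        self.data = np.asarray(data)
        self.encoding = encoding
        self.shape = self.data.shape
        self.dtype = self.data.dtype

    def __getitem__(self, key):
        return self.data[key]                      # indexing yields plain numpy

    def __array__(self, dtype=None):
        return np.asarray(self.data, dtype=dtype)  # so does any computation

arr = EncodedArray([1.0, 2.0], encoding={"dtype": "int16", "scale_factor": 0.01})
print(type(np.add(arr, 1)))  # <class 'numpy.ndarray'> -- the encoding is dropped
```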

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5082/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
276241764 MDU6SXNzdWUyNzYyNDE3NjQ= 1739 Utility to restore original dimension order after apply_ufunc shoyer 1217238 open 0     11 2017-11-23T00:47:57Z 2021-05-29T07:39:33Z   MEMBER      

This seems to be coming up quite a bit for wrapping functions that apply an operation along an axis, e.g., for interpolate in #1640 or rank in #1733.

We should either write a utility function to do this or consider adding an option to apply_ufunc.
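
Something along these lines might work (the helper name is hypothetical):

```python
def restore_dim_order(result, original):
    # Reorder the dims that survived the operation to match the original
    # object, keeping any newly added dims at the end.
    kept = [d for d in original.dims if d in result.dims]
    new = [d for d in result.dims if d not in original.dims]
    return result.transpose(*kept, *new)
```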

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1739/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
901047466 MDU6SXNzdWU5MDEwNDc0NjY= 5372 Consider revising the _repr_inline_ protocol shoyer 1217238 open 0     0 2021-05-25T16:18:31Z 2021-05-25T16:18:31Z   MEMBER      

_repr_inline_ looks like an IPython special method but actually includes some xarray-specific details: the result should not include shape or dtype.

As I wrote in https://github.com/pydata/xarray/pull/5352, I would suggest revising it in one of two ways:

  1. Giving it a name like _xarray_repr_inline_ to make it clearer that it's Xarray specific
  2. Include some more generic way of indicating that shape/dtype is redundant, e.g., call it like obj._repr_ndarray_inline_(dtype=False, shape=False)
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5372/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
294241734 MDU6SXNzdWUyOTQyNDE3MzQ= 1887 Boolean indexing with multi-dimensional key arrays shoyer 1217238 open 0     13 2018-02-04T23:28:45Z 2021-04-22T21:06:47Z   MEMBER      

Originally from https://github.com/pydata/xarray/issues/974

For boolean indexing:

  • da[key] where key is a boolean labelled array (with any number of dimensions) is made equivalent to da.where(key.reindex_like(ds), drop=True). This matches the existing behavior if key is a 1D boolean array. For multi-dimensional arrays, even though the result is now multi-dimensional, this coupled with automatic skipping of NaNs means that da[key].mean() gives the same result as in NumPy (see the small example below).
  • da[key] = value where key is a boolean labelled array can be made equivalent to da = da.where(*align(key.reindex_like(da), value.reindex_like(da))) (that is, the three argument form of where).
  • da[key_0, ..., key_n] where all of key_i are boolean arrays gets handled in the usual way. It is an IndexingError to supply multiple labelled keys if any of them are not already aligned with the corresponding index coordinates (and share the same dimension name). If they want alignment, we suggest users simply write da[key_0 & ... & key_n].
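
A small illustration of the first equivalence using today's where (made-up data):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(6).reshape(2, 3), dims=("x", "y"),
                  coords={"x": [0, 1], "y": [10, 20, 30]})
key = da > 2   # multi-dimensional boolean key

masked = da.where(key, drop=True)
# NaNs are skipped by default, so the labelled result matches plain NumPy:
print(masked.mean().item(), da.values[key.values].mean())  # 4.0 4.0
```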

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1887/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
843996137 MDU6SXNzdWU4NDM5OTYxMzc= 5092 Concurrent loading of coordinate arrays from Zarr shoyer 1217238 open 0     0 2021-03-30T02:19:50Z 2021-04-19T02:43:31Z   MEMBER      

When you open a dataset with Zarr, xarray loads coordinate arrays corresponding to indexes in serial. This can be slow (multiple seconds) even with only a handful of such arrays if they are stored in a remote filesystem (e.g., cloud object stores). This is similar to the use-cases for consolidated metadata.

In principle, we could speed up loading datasets from Zarr into Xarray significantly by reading the data corresponding to these arrays in parallel (e.g., in multiple threads).
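
A rough sketch of the idea (not xarray internals; the group and variable names are assumptions for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def load_index_variables(zarr_group, names):
    # Fetch each coordinate array in its own thread; with a remote store the
    # total wall time is roughly one round trip instead of one per variable.
    with ThreadPoolExecutor() as executor:
        arrays = list(executor.map(lambda name: zarr_group[name][:], names))
    return dict(zip(names, arrays))

# e.g. load_index_variables(zarr.open_group("gs://bucket/store"), ["time", "lat", "lon"])
```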

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5092/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
264098632 MDU6SXNzdWUyNjQwOTg2MzI= 1618 apply_raw() for a simpler version of apply_ufunc() shoyer 1217238 open 0     4 2017-10-10T04:51:38Z 2021-01-01T17:14:43Z   MEMBER      

apply_raw() would work like apply_ufunc(), but without the hard to understand broadcasting behavior and core dimensions.

The rule for apply_raw() would be that it directly unwraps its arguments and passes them on to the wrapped function, without any broadcasting. We would also include a dim argument that is automatically converted into the appropriate axis argument when calling the wrapped function.

Output dimensions would be determined from a simple rule of some sort:

  • Default output dimensions would either be copied from the first argument, or would take on the ordered union of all input dimensions.
  • Custom dimensions could either be set by adding a drop_dims argument (like dask.array.map_blocks), or require an explicit override output_dims.

This also could be suitable for defining as a method instead of a separate function. See https://github.com/pydata/xarray/issues/1251 and https://github.com/pydata/xarray/issues/1130 for related issues.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1618/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
269700511 MDU6SXNzdWUyNjk3MDA1MTE= 1672 Append along an unlimited dimension to an existing netCDF file shoyer 1217238 open 0     8 2017-10-30T18:09:54Z 2020-11-29T17:35:04Z   MEMBER      

This would be a nice feature to have for some use cases, e.g., for writing simulation time-steps: https://stackoverflow.com/questions/46951981/create-and-write-xarray-dataarray-to-netcdf-in-chunks

It should be relatively straightforward to add, too, building on support for writing files with unlimited dimensions. User facing API would probably be a new keyword argument to to_netcdf(), e.g., extend='time' to indicate the extended dimension.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1672/reactions",
    "total_count": 21,
    "+1": 21,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
314444743 MDU6SXNzdWUzMTQ0NDQ3NDM= 2059 How should xarray serialize bytes/unicode strings across Python/netCDF versions? shoyer 1217238 open 0     5 2018-04-15T19:36:55Z 2020-11-19T10:08:16Z   MEMBER      

netCDF string types

We have several options for storing strings in netCDF files:

  • NC_CHAR: netCDF's legacy character type. The closest match is NumPy 'S1' dtype. In principle, it's supposed to be able to store arbitrary bytes. On HDF5, it uses an UTF-8 encoded string with a fixed-size of 1 (but note that HDF5 does not complain about storing arbitrary bytes).
  • NC_STRING: netCDF's newer variable length string type. It's only available on netCDF4 (not netCDF3). It corresponds to an HDF5 variable-length string with UTF-8 encoding.
  • NC_CHAR with an _Encoding attribute: xarray and netCDF4-Python support an ad-hoc convention for storing unicode strings in NC_CHAR data-types, by adding an attribute {'_Encoding': 'UTF-8'}. The data is still stored as fixed width strings, but xarray (and netCDF4-Python) can decode them as unicode.

NC_STRING would seem like a clear win in cases where it's supported, but as @crusaderky points out in https://github.com/pydata/xarray/issues/2040, it actually results in much larger netCDF files in many cases than using character arrays, which are more easily compressed. Nonetheless, we currently default to storing unicode strings in NC_STRING, because it's the most portable option -- every tool that handles HDF5 and netCDF4 should be able to read it properly as unicode strings.

NumPy/Python string types

On the Python side, our options are perhaps even more confusing:

  • NumPy's dtype=np.string_ corresponds to fixed-length bytes. This is the default dtype for strings on Python 2, because on Python 2 strings are the same as bytes.
  • NumPy's dtype=np.unicode_ corresponds to fixed-length unicode. This is the default dtype for strings on Python 3, because on Python 3 strings are the same as unicode.
  • Strings are also commonly stored in numpy arrays with dtype=np.object_, as arrays of either bytes or unicode objects. This is a pragmatic choice, because otherwise NumPy has no support for variable length strings. We also use this (like pandas) to mark missing values with np.nan.

Like pandas, we are pretty liberal with converting back and forth between fixed-length (np.string/np.unicode_) and variable-length (object dtype) representations of strings as necessary. This works pretty well, though converting from object arrays in particular has downsides, since it cannot be done lazily with dask.

Current behavior of xarray

Currently, xarray uses the same behavior on Python 2/3. The priority was faithfully round-tripping data from a particular version of Python to netCDF and back, which the current serialization behavior achieves:

| Python version | NetCDF version | NumPy datatype | NetCDF datatype |
| -------------- | -------------- | -------------- | --------------- |
| Python 2 | NETCDF3 | np.string_ / str | NC_CHAR |
| Python 2 | NETCDF4 | np.string_ / str | NC_CHAR |
| Python 3 | NETCDF3 | np.string_ / bytes | NC_CHAR |
| Python 3 | NETCDF4 | np.string_ / bytes | NC_CHAR |
| Python 2 | NETCDF3 | np.unicode_ / unicode | NC_CHAR with UTF-8 encoding |
| Python 2 | NETCDF4 | np.unicode_ / unicode | NC_STRING |
| Python 3 | NETCDF3 | np.unicode_ / str | NC_CHAR with UTF-8 encoding |
| Python 3 | NETCDF4 | np.unicode_ / str | NC_STRING |
| Python 2 | NETCDF3 | object bytes/str | NC_CHAR |
| Python 2 | NETCDF4 | object bytes/str | NC_CHAR |
| Python 3 | NETCDF3 | object bytes | NC_CHAR |
| Python 3 | NETCDF4 | object bytes | NC_CHAR |
| Python 2 | NETCDF3 | object unicode | NC_CHAR with UTF-8 encoding |
| Python 2 | NETCDF4 | object unicode | NC_STRING |
| Python 3 | NETCDF3 | object unicode/str | NC_CHAR with UTF-8 encoding |
| Python 3 | NETCDF4 | object unicode/str | NC_STRING |

This can also be selected explicitly for most data-types by setting dtype in encoding:

  • 'S1' for NC_CHAR (with or without encoding)
  • str for NC_STRING (though I'm not 100% sure it works properly currently when given bytes)

Script for generating table:

```python
from __future__ import print_function
import xarray as xr
import uuid
import netCDF4
import numpy as np
import sys

for dtype_name, value in [
        ('np.string_ / ' + type(b'').__name__, np.array([b'abc'])),
        ('np.unicode_ / ' + type(u'').__name__, np.array([u'abc'])),
        ('object bytes/' + type(b'').__name__, np.array([b'abc'], dtype=object)),
        ('object unicode/' + type(u'').__name__, np.array([u'abc'], dtype=object)),
]:
    for format in ['NETCDF3_64BIT', 'NETCDF4']:
        filename = str(uuid.uuid4()) + '.nc'
        xr.Dataset({'data': value}).to_netcdf(filename, format=format)
        with netCDF4.Dataset(filename) as f:
            var = f.variables['data']
            disk_dtype = var.dtype
            has_encoding = hasattr(var, '_Encoding')
            disk_dtype_name = (('NC_CHAR' if disk_dtype == 'S1' else 'NC_STRING')
                               + (' with UTF-8 encoding' if has_encoding else ''))
        print('|', 'Python %i' % sys.version_info[0], '|', format[:7],
              '|', dtype_name, '|', disk_dtype_name, '|')
```

Potential alternatives

The main option I'm considering is switching to default to NC_CHAR with UTF-8 encoding for np.string_ / str and object bytes/str on Python 2. The current behavior could be explicitly toggled by setting an encoding of {'_Encoding': None}.

This would imply two changes:

  1. Attempting to serialize arbitrary bytes (on Python 2) would start raising an error -- anything that isn't ASCII would require explicitly disabling _Encoding.
  2. Strings read back from disk on Python 2 would come back as unicode instead of bytes.

This implicit conversion would be consistent with Python 2's general handling of bytes/unicode, and facilitate reading netCDF files on Python 3 that were written with Python 2.

The counter-argument is that it may not be worth changing this at this late point, given that we will be sunsetting Python 2 support by year's end.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2059/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
715374721 MDU6SXNzdWU3MTUzNzQ3MjE= 4490 Group together decoding options into a single argument shoyer 1217238 open 0     6 2020-10-06T06:15:18Z 2020-10-29T04:07:46Z   MEMBER      

Is your feature request related to a problem? Please describe.

open_dataset() currently has a very long function signature. This makes it hard to keep track of everything it can do, and is particularly problematic for the authors of new backends (e.g., see https://github.com/pydata/xarray/pull/4477), which might need to know how to handle all these arguments.

Describe the solution you'd like

To simplify the interface, I propose to group together all the decoding options into a new DecodingOptions class. I'm thinking something like:

```python
from dataclasses import dataclass, field, asdict
from typing import Optional, List

@dataclass(frozen=True)
class DecodingOptions:
    mask: Optional[bool] = None
    scale: Optional[bool] = None
    datetime: Optional[bool] = None
    timedelta: Optional[bool] = None
    use_cftime: Optional[bool] = None
    concat_characters: Optional[bool] = None
    coords: Optional[bool] = None
    drop_variables: Optional[List[str]] = None

    @classmethod
    def disabled(cls):
        return cls(mask=False, scale=False, datetime=False, timedelta=False,
                   concat_characters=False, coords=False)

    def non_defaults(self):
        return {k: v for k, v in asdict(self).items() if v is not None}

    # add another method for creating default Variable Coder() objects,
    # e.g., those listed in encode_cf_variable()
```

The signature of open_dataset would then become:

```python
def open_dataset(
    filename_or_obj,
    group=None,
    *,
    engine=None,
    chunks=None,
    lock=None,
    cache=None,
    backend_kwargs=None,
    decode: Union[DecodingOptions, bool] = None,
    **deprecated_kwargs
):
    if decode is None:
        decode = DecodingOptions()
    if decode is False:
        decode = DecodingOptions.disabled()
    # handle deprecated_kwargs...
    ...
```

Question: are decode and DecodingOptions the right names? Maybe these should still include the name "CF", e.g., decode_cf and CFDecodingOptions, given that these are specific to CF conventions?

Note: the current signature is open_dataset(filename_or_obj, group=None, decode_cf=True, mask_and_scale=None, decode_times=True, autoclose=None, concat_characters=True, decode_coords=True, engine=None, chunks=None, lock=None, cache=None, drop_variables=None, backend_kwargs=None, use_cftime=None, decode_timedelta=None)

Usage with the new interface would look like xr.open_dataset(filename, decode=False) or xr.open_dataset(filename, decode=xr.DecodingOptions(mask=False, scale=False)).

This requires a little bit more typing than what we currently have, but it has a few advantages:

  1. It's easier to understand the role of different arguments. Now there is a function with ~8 arguments and a class with ~8 arguments rather than a function with ~15 arguments.
  2. It's easier to add new decoding arguments (e.g., for more advanced CF conventions), because they don't clutter the open_dataset interface. For example, I separated out mask and scale arguments, versus the current mask_and_scale argument.
  3. If a new backend plugin for open_dataset() needs to handle every option supported by open_dataset(), this makes that task significantly easier. The only decoding options they need to worry about are non-default options that were explicitly set, i.e., those exposed by the non_defaults() method. If another decoding option wasn't explicitly set and isn't recognized by the backend, they can just ignore it.

Describe alternatives you've considered

For the overall approach:

  1. We could keep the current design, with separate keyword arguments for decoding options, and just be very careful about passing around these arguments. This seems pretty painful for the backend refactor, though.
  2. We could keep the current design only for the user facing open_dataset() interface, and then internally convert into the DecodingOptions() struct for passing to backend constructors. This would provide much needed flexibility for backend authors, but most users wouldn't benefit from the new interface. Perhaps this would make sense as an intermediate step?
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4490/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
253107677 MDU6SXNzdWUyNTMxMDc2Nzc= 1527 Binary operations with ds.groupby('time.dayofyear') errors out, but ds.groupby('time.month') works shoyer 1217238 open 0     10 2017-08-26T16:54:53Z 2020-09-29T10:05:42Z   MEMBER      

Reported on the mailing list:

Original datasets:

```
ds_xr
<xarray.DataArray (time: 12775)>
array([-0.01, -0.01, -0.01, ..., -0.27, -0.27, -0.27])
Coordinates:
  * time     (time) datetime64[ns] 1979-01-01 1979-01-02 1979-01-03 ...

slope_itcp_ds
<xarray.Dataset>
Dimensions:                    (lat: 73, level: 2, lon: 144, time: 366)
Coordinates:
  * lon                        (lon) float32 0.0 2.5 5.0 7.5 10.0 12.5 ...
  * lat                        (lat) float32 90.0 87.5 85.0 82.5 80.0 ...
  * level                      (level) float64 0.0 1.0
  * time                       (time) datetime64[ns] 2010-01-01 ...
Data variables:
    xarray_dataarray_variable  (time, level, lat, lon) float64 -0.8795 ...
Attributes:
    CDI:          Climate Data Interface version 1.7.1 (http://mpimet.mpg.de/...
    Conventions:  CF-1.4
    history:      Fri Aug 25 18:55:50 2017: cdo -inttime,2010-01-01,00:00:00,...
    CDO:          Climate Data Operators version 1.7.1 (http://mpimet.mpg.de/...
```

Issue: Grouping by month works and outputs this:

```
ds_xr.groupby('time.month') - slope_itcp_ds.groupby('time.month').mean('time')
<xarray.Dataset>
Dimensions:                    (lat: 73, level: 2, lon: 144, time: 12775)
Coordinates:
  * lon                        (lon) float32 0.0 2.5 5.0 7.5 10.0 12.5 ...
  * lat                        (lat) float32 90.0 87.5 85.0 82.5 80.0 ...
  * level                      (level) float64 0.0 1.0
    month                      (time) int64 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
  * time                       (time) datetime64[ns] 1979-01-01 ...
Data variables:
    xarray_dataarray_variable  (time, level, lat, lon) float64 1.015 ...
```

Grouping by dayofyear doesn't work and gives this traceback:

```
ds_xr.groupby('time.dayofyear') - slope_itcp_ds.groupby('time.dayofyear').mean('time')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-10-01c0cf4c980a> in <module>()
----> 1 ds_xr.groupby('time.dayofyear') - slope_itcp_ds.groupby('time.dayofyear').mean('time')

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/groupby.py in func(self, other)
    316             g = f if not reflexive else lambda x, y: f(y, x)
    317             applied = self._yield_binary_applied(g, other)
--> 318             combined = self._combine(applied)
    319             return combined
    320         return func

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/groupby.py in _combine(self, applied, shortcut)
    532             combined = self._concat_shortcut(applied, dim, positions)
    533         else:
--> 534             combined = concat(applied, dim)
    535             combined = _maybe_reorder(combined, dim, positions)
    536

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in concat(objs, dim, data_vars, coords, compat, positions, indexers, mode, concat_over)
    118         raise TypeError('can only concatenate xarray Dataset and DataArray '
    119                         'objects, got %s' % type(first_obj))
--> 120     return f(objs, dim, data_vars, coords, compat, positions)
    121
    122

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions)
    210     datasets = align(*datasets, join='outer', copy=False, exclude=[dim])
    211
--> 212     concat_over = _calc_concat_over(datasets, dim, data_vars, coords)
    213
    214     def insert_result_variable(k, v):

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in _calc_concat_over(datasets, dim, data_vars, coords)
    190                            if dim in v.dims)
    191     concat_over.update(process_subset_opt(data_vars, 'data_vars'))
--> 192     concat_over.update(process_subset_opt(coords, 'coords'))
    193     if dim in datasets[0]:
    194         concat_over.add(dim)

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in process_subset_opt(opt, subset)
    165                            for ds in datasets[1:])
    166             # all nonindexes that are not the same in each dataset
--> 167             concat_new = set(k for k in getattr(datasets[0], subset)
    168                              if k not in concat_over and differs(k))
    169         elif opt == 'all':

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in <genexpr>(.0)
    166             # all nonindexes that are not the same in each dataset
    167             concat_new = set(k for k in getattr(datasets[0], subset)
--> 168                              if k not in concat_over and differs(k))
    169         elif opt == 'all':
    170             concat_new = (set(getattr(datasets[0], subset)) -

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in differs(vname)
    163             v = datasets[0].variables[vname]
    164             return any(not ds.variables[vname].equals(v)
--> 165                        for ds in datasets[1:])
    166             # all nonindexes that are not the same in each dataset
    167             concat_new = set(k for k in getattr(datasets[0], subset)

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in <genexpr>(.0)
    163             v = datasets[0].variables[vname]
    164             return any(not ds.variables[vname].equals(v)
--> 165                        for ds in datasets[1:])
    166             # all nonindexes that are not the same in each dataset
    167             concat_new = set(k for k in getattr(datasets[0], subset)

/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/utils.py in __getitem__(self, key)
    288
    289     def __getitem__(self, key):
--> 290         return self.mapping[key]
    291
    292     def __iter__(self):

KeyError: 'lon'
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1527/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
479940669 MDU6SXNzdWU0Nzk5NDA2Njk= 3212 Custom fill_value for from_dataframe/from_series shoyer 1217238 open 0     0 2019-08-13T03:22:46Z 2020-04-06T20:40:26Z   MEMBER      

It would be nice to have the option to customize the fill value when creating xarray objects from pandas, instead of requiring it to always be NaN.

This would probably be especially useful when creating sparse arrays (https://github.com/pydata/xarray/issues/3206), for which it often makes sense to use a fill value of zero. If your data has integer values (e.g., it represents counts), you probably don't want to let it be cast to float first.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3212/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
314482923 MDU6SXNzdWUzMTQ0ODI5MjM= 2061 Backend specific conventions decoding shoyer 1217238 open 0     1 2018-04-16T02:45:46Z 2020-04-05T23:42:34Z   MEMBER      

Currently, we have a single function xarray.decode_cf() that we apply to data loaded from all xarray backends.

This is appropriate for netCDF data, but not for backends with different conventions. For example, it doesn't work for zarr (which is why we have the separate open_zarr), and it is also a poor fit for PseudoNetCDF (https://github.com/pydata/xarray/pull/1905). In the worst cases (e.g., for PseudoNetCDF) it can actually result in data being decoded twice, which can produce incorrectly scaled data.

Instead, we should declare default decoders as part of the backend API, and use those decoders as the defaults for open_dataset().
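
As a rough illustration of the status quo (file and store names below are placeholders), the only way to opt out of the one-size-fits-all decoding today is to disable it and decode explicitly:

```python
import xarray as xr

# Default path: CF decoding is applied regardless of which backend loaded the data.
# ds = xr.open_dataset("data.nc")       # fine for netCDF
# ds = xr.open_zarr("store.zarr")       # zarr needs its own entry point today

# Workaround when a backend's data should not go through CF decoding (or risks
# being decoded twice): turn decoding off and apply it explicitly, exactly once.
raw = xr.open_dataset("data.nc", decode_cf=False)
decoded = xr.decode_cf(raw)
```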

This should probably be tackled as part of the broader backends refactor: https://github.com/pydata/xarray/issues/1970

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2061/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
173612265 MDU6SXNzdWUxNzM2MTIyNjU= 988 Hooks for custom attribute handling in xarray operations shoyer 1217238 open 0     24 2016-08-27T19:48:22Z 2020-04-05T18:19:11Z   MEMBER      

Over in #964, I am working on a rewrite/unification of the guts of xarray's logic for computation with labelled data. The goal is to get all of xarray's internal logic for working with labelled data going through a minimal set of flexible functions which we can also expose as part of the API.

Because we will finally have all (or at least nearly all) xarray operations using the same code path, I think it will also finally become feasible to open up hooks that let extensions control how xarray handles metadata.

Two obvious use cases here are units (#525) and automatic maintenance of metadata (e.g., cell_methods or history fields). Both of these are out of scope for xarray itself, mostly because the required logic tends to be domain specific. This could also subsume options like the existing keep_attrs on many operations.

I like the idea of supporting something like NumPy's __array_wrap__ to allow third-party code to finalize xarray objects in some way before they are returned. However, it's not obvious to me what the right design is (a rough sketch of the registration option follows this list):
- Should we look up a custom attribute on subclasses, like __array_wrap__ (or __numpy_ufunc__) in NumPy, or should we have a system (e.g., unilaterally or with a context manager and xarray.set_options) for registering hooks that are then checked on all xarray objects? I am inclined toward the latter, even though it's a little slower, just because it will be simpler and easier to get right.
- Should these methods be able to control the full result objects, or only set attrs and/or name?
- To be useful, do we need to allow extensions to take control of the full operation, to support things like automatic unit conversion? This would suggest something closer to __numpy_ufunc__, which is a little more ambitious than what I had in mind here.
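
Purely as a sketch of the registration option, every name below is hypothetical and nothing like it exists in xarray today:

```python
import xarray as xr

def record_history(result, context):
    """Example hook: attach a CF-style `history` entry to every returned object."""
    result.attrs["history"] = getattr(context, "func_name", "unknown")
    return result

# Registration could be global or scoped; both spellings below are invented:
# xr.register_attr_hook(record_history)
# with xr.set_options(attr_hooks=[record_history]):
#     result = ds_a + ds_b   # record_history would run on `result` before it is returned
```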

Feedback would be greatly appreciated.

CC @darothen @rabernat @jhamman @pwolfram

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/988/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
296120524 MDU6SXNzdWUyOTYxMjA1MjQ= 1901 Update assign to preserve order for **kwargs shoyer 1217238 open 0     1 2018-02-10T18:05:45Z 2020-02-10T19:44:20Z   MEMBER      

In Python 3.6+, keyword arguments preserve the order in which they are written. We should update assign and assign_coords to rely on this in the next major release, as has been done in pandas: https://github.com/pandas-dev/pandas/issues/14207
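
A small sketch of the behavior in question (the expected ordering is the proposal here, not a statement about current behavior):

```python
import xarray as xr

ds = xr.Dataset()
# In Python 3.6+ the keyword arguments arrive in the order they were written,
# so assign/assign_coords can preserve it:
ds2 = ds.assign(a=1, b=2, c=3)
list(ds2.data_vars)  # with order preserved, this would be ['a', 'b', 'c']
```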

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1901/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
398107776 MDU6SXNzdWUzOTgxMDc3NzY= 2666 Dataset.from_dataframe will produce a FutureWarning for DatetimeTZ data shoyer 1217238 open 0     6 2019-01-11T02:45:49Z 2019-12-30T22:58:23Z   MEMBER      

This appears with the development version of pandas; see https://github.com/pandas-dev/pandas/issues/24716 for details.

Example:
```
In [16]: df = pd.DataFrame({"A": pd.date_range('2000', periods=12, tz='US/Central')})

In [17]: df.to_xarray()
/Users/taugspurger/Envs/pandas-dev/lib/python3.7/site-packages/xarray/core/dataset.py:3111: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'. To accept the future behavior, pass 'dtype=object'. To keep the old behavior, pass 'dtype="datetime64[ns]"'.
  data = np.asarray(series).reshape(shape)
Out[17]:
<xarray.Dataset>
Dimensions:  (index: 12)
Coordinates:
  * index    (index) int64 0 1 2 3 4 5 6 7 8 9 10 11
Data variables:
    A        (index) datetime64[ns] 2000-01-01T06:00:00 ... 2000-01-12T06:00:00
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2666/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
96211612 MDU6SXNzdWU5NjIxMTYxMg== 486 API for multi-dimensional resampling/regridding shoyer 1217238 open 0     32 2015-07-21T02:38:29Z 2019-11-06T18:00:52Z   MEMBER      

This notebook by @kegl shows a nice example of how to use pyresample with xray: https://www.lri.fr/~kegl/Ramps/edaElNino.html#Downsampling

It would be nice to build a wrapper for this machinery directly into xray in some way.

xref #475

cc @jhamman @rabernat

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/486/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
269348789 MDU6SXNzdWUyNjkzNDg3ODk= 1668 Remove use of allow_cleanup_failure in test_backends.py shoyer 1217238 open 0     6 2017-10-28T20:47:31Z 2019-09-29T20:07:03Z   MEMBER      

This exists for the benefit of Windows, on which trying to delete an open file results in an error. But really, it would be nice to have a test suite that doesn't leave any temporary files hanging around.

The main culprit is tests like this, where opening a file triggers an error:
```python
with raises_regex(TypeError, 'pip install netcdf4'):
    open_dataset(tmp_file, engine='scipy')
```

The way to fix this is to use mocking of some sort, to intercept calls to backend file objects and close them afterwards.
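
A rough sketch of that approach, using the scipy backend as an example; the helper names are invented and this is not xarray's actual test code:

```python
from unittest import mock
import scipy.io

opened_files = []
real_netcdf_file = scipy.io.netcdf_file

def tracking_netcdf_file(*args, **kwargs):
    # Record every file object the backend opens so it can be closed in teardown.
    f = real_netcdf_file(*args, **kwargs)
    opened_files.append(f)
    return f

with mock.patch("scipy.io.netcdf_file", new=tracking_netcdf_file):
    pass  # run the test that is expected to raise

for f in opened_files:
    f.close()  # clean up even though the test body errored out
```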

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1668/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
317362786 MDU6SXNzdWUzMTczNjI3ODY= 2078 apply_ufunc should include variable names in error messages shoyer 1217238 open 0     4 2018-04-24T19:26:13Z 2019-08-26T18:10:23Z   MEMBER      

This would make it easier to debug issues with dimensions.

For example, in this case from StackOverflow, the error message was: ValueError: operand to apply_ufunc has required core dimensions ['time', 'lat', 'lon'], but some of these are missing on the input variable: ['lat', 'lon'].

A better error message would be: ValueError: operand to apply_ufunc has required core dimensions ['time', 'lat', 'lon'], but some of these are missing on input variable 'status': ['lat', 'lon']
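
For reference, a minimal sketch of the kind of call that triggers this error (the dataset and variable names are made up to mirror the example above):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "temp": (("time", "lat", "lon"), np.zeros((2, 3, 4))),
        "status": (("time",), np.zeros(2)),  # missing the 'lat'/'lon' core dims
    }
)

# Raises ValueError because 'status' lacks 'lat' and 'lon'; the current message
# does not say which variable is at fault.
xr.apply_ufunc(np.mean, ds, input_core_dims=[["time", "lat", "lon"]])
```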

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2078/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
188113943 MDU6SXNzdWUxODgxMTM5NDM= 1097 Better support for subclasses: tests, docs and API shoyer 1217238 open 0     14 2016-11-08T21:54:00Z 2019-08-22T13:07:44Z   MEMBER      

Given that people do currently subclass xarray objects, it's worth considering making a subclass API like pandas: http://pandas.pydata.org/pandas-docs/stable/internals.html#subclassing-pandas-data-structures

At the very least, it would be nice to have docs that describe how/when it's safe to subclass, and tests that verify our support for such subclasses.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1097/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
292000828 MDU6SXNzdWUyOTIwMDA4Mjg= 1861 Add an example page to the docs on geospatial filtering/indexing shoyer 1217238 open 0     0 2018-01-26T19:07:11Z 2019-07-12T02:53:53Z   MEMBER      

We cover standard time-series stuff pretty well in the "Toy weather data" example, but geospatial filtering/indexing questions come up all the time and aren't well covered.

Topics could include:
- How to filter out a region of interest (sel() with slice and where(..., drop=True))
- How to align two gridded datasets in space.
- How to sample a gridded dataset at a list of station locations
- How to resample a dataset to a new resolution (possibly referencing xESMF)

Not all of these are as smooth as they could be, but hopefully that will clearly point to where we have room for improvement in our APIs :).
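
For instance, the first topic could be sketched roughly like this (made-up data, not an official example):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"t2m": (("lat", "lon"), np.random.rand(180, 360))},
    coords={"lat": np.arange(-89.5, 90), "lon": np.arange(0.5, 360)},
)

# Rectangular region with monotonic coordinates: label-based slicing.
box = ds.sel(lat=slice(10, 50), lon=slice(100, 160))

# Irregular region: mask and drop what falls outside.
masked = ds.where((ds.lat > 10) & (ds.lat < 50), drop=True)
```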

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1861/reactions",
    "total_count": 6,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
35633124 MDU6SXNzdWUzNTYzMzEyNA== 155 Expose a public interface for CF encoding/decoding functions shoyer 1217238 open 0     3 2014-06-12T23:33:42Z 2019-02-04T04:17:40Z   MEMBER      

Relevant discussion: #153

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/155/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
403504120 MDU6SXNzdWU0MDM1MDQxMjA= 2719 Should xarray.align sort indexes in alignment? shoyer 1217238 open 0     1 2019-01-27T01:51:29Z 2019-01-28T18:03:53Z   MEMBER      

I noticed in https://github.com/pandas-dev/pandas/issues/24959 (which turned up as a failure in our test suite) that pandas sorts by default in Index.union and now Index.intersection, unless the indexes are the same or either index has duplicates. (These aspects are probably bugs.)

It occurs to me that we should make an intentional choice about sorting in xarray.align(), rather than merely following the whims of changed upstream behavior. Note that align() is called internally by all xarray operations that combine multiple objects (e.g., in arithmetic).

My proposal is to use "order of appearance" and not sort by default, but add a sort keyword argument to allow users to control this. Reasons for the default behavior of not sorting:
1. Sorting can't be undone if the original order is lost, so this preserves maximum flexibility for users.
2. This matches how we handle the ordering of dimensions in broadcasting.
3. Pandas is quite inconsistent with how it applies sorting and we don't want to copy that in xarray. We definitely don't want to sort in all cases by default (e.g., if objects have the same index), so we should avoid sorting in others.
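
A small sketch of what the proposed default and keyword might look like (the sort argument is the proposal here, not an existing option):

```python
import xarray as xr

a = xr.DataArray([1, 2], coords={"x": ["b", "a"]}, dims="x")
b = xr.DataArray([3, 4], coords={"x": ["a", "c"]}, dims="x")

# Proposed default: keep "order of appearance" for the union of the 'x' labels.
a2, b2 = xr.align(a, b, join="outer")

# Proposed opt-in sorting (hypothetical keyword):
# a2, b2 = xr.align(a, b, join="outer", sort=True)
```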

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2719/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
316448044 MDU6SXNzdWUzMTY0NDgwNDQ= 2069 to_netcdf() should not implicitly load dask arrays of strings into memory shoyer 1217238 open 0     0 2018-04-21T00:57:23Z 2019-01-13T01:41:20Z   MEMBER      

As discussed in https://github.com/pydata/xarray/pull/2058#discussion_r181606513, we should have an explicit interface of some sort, either via encoding or some new keyword argument to to_netcdf().

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2069/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 1655.625ms · About: xarray-datasette