id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 2266174558,I_kwDOAMm_X86HExRe,8975,Xarray sponsorship guidelines,1217238,open,0,,,3,2024-04-26T17:05:01Z,2024-04-30T20:52:33Z,,MEMBER,,,,"### At what level of support should Xarray acknowledge sponsors on our website? I would like to surface this for open discussion because there are potential sponsoring organizations with conflicts of interest with members of Xarray's leadership team (e.g., [Earthmover](https://earthmover.io/), which employs @jhamman, @rabernat and @dcherian). My suggestion is to use [NumPy's guidelines](https://numpy.org/neps/nep-0046-sponsorship-guidelines.html), with an adjustment down to 1/3 of the thresholds to account for the smaller size of the project: - $10,000/yr for unrestricted financial contributions (e.g., donations) - $20,000/yr for financial contributions for a particular purpose (e.g., grants) - $30,000/yr for in-kind contributions (e.g., time for employees to contribute) - 2 person-months/yr of paid work time for one or more Xarray maintainers or regular contributors to any Xarray team or activity The NumPy guidelines also include a grace period of a minimum of 6 months for acknowledging support. I would suggest increasing this to a minimum of 1 year for Xarray. I would greatly appreciate any feedback from members of the community, either in this issue or on the next [team meeting](https://docs.xarray.dev/en/stable/developers-meeting.html).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8975/reactions"", ""total_count"": 6, ""+1"": 5, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 271043420,MDU6SXNzdWUyNzEwNDM0MjA=,1689,Roundtrip serialization of coordinate variables with spaces in their names,1217238,open,0,,,5,2017-11-03T16:43:20Z,2024-03-22T14:02:48Z,,MEMBER,,,,"If coordinates have spaces in their names, they get restored from netCDF files as data variables instead: ``` >>> xarray.open_dataset(xarray.Dataset(coords={'name with spaces': 1}).to_netcdf()) Dimensions: () Data variables: name with spaces int32 1 ```` This happens because the CF convention is to indicate coordinates as a space separated string, e.g., `coordinates='latitude longitude'`. Even though these aren't CF compliant variable names (which cannot have strings) It would be nice to have an ad-hoc convention for xarray that allows us to serialize/deserialize coordinates in all/most cases. Maybe we could use escape characters for spaces (e.g., `coordinates='name\ with\ spaces'`) or quote names if they have spaces (e.g., `coordinates='""name\ with\ spaces""'`? At the very least, we should issue a warning in these cases.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1689/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 842436143,MDU6SXNzdWU4NDI0MzYxNDM=,5081,Lazy indexing arrays as a stand-alone package,1217238,open,0,,,6,2021-03-27T07:06:03Z,2023-12-15T13:20:03Z,,MEMBER,,,,"From @rabernat on [Twitter](https://twitter.com/rabernat/status/1330707155742322689): > ""Xarray has some secret private classes for lazily indexing / wrapping arrays that are so useful I think they should be broken out into a standalone package. 
https://github.com/pydata/xarray/blob/master/xarray/core/indexing.py#L516"" The idea here is to create a first-class ""duck array"" library for lazy indexing that could replace xarray's internal classes for lazy indexing. This would be in some ways similar to dask.array, but much simpler, because it doesn't have to worry about parallel computing. Desired features: - Lazy indexing - Lazy transposes - Lazy concatenation (#4628) and stacking - Lazy vectorized operations (e.g., unary and binary arithmetic) - needed for decoding variables from disk (`xarray.encoding`) and - building lazy multi-dimensional coordinate arrays corresponding to map projections (#3620) - Maybe: lazy reshapes (#4113) A common feature of these operations is they can (and almost always should) be _fused_ with indexing: if N elements are selected via indexing, only O(N) compute and memory is required to produce them, regardless of the size of the original arrays as long as the number of applied operations can be treated as a constant. Memory access is significantly slower than compute on modern hardware, so recomputing these operations on the fly is almost always a good idea. Out of scope: lazy computation when indexing could require access to many more elements to compute the desired value than are returned. For example, `mean()` probably should not be lazy, because that could involve computation of a very large number of elements that one might want to cache. This is valuable functionality for Xarray for two reasons: 1. It allows for ""previewing"" small bits of data loaded from disk or remote storage, even if that data needs some form of cheap ""decoding"" from its form on disk. 2. It allows for xarray to decode data in a lazy fashion that is compatible with full-featured systems for lazy computation (e.g., Dask), without requiring the user to choose dask when reading the data. Related issues: - [Proposal] Expose Variable without Pandas dependency #3981 - Lazy concatenation of arrays #4628 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5081/reactions"", ""total_count"": 6, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 6, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 588105641,MDU6SXNzdWU1ODgxMDU2NDE=,3893,HTML repr in the online docs,1217238,open,0,,,3,2020-03-26T02:17:51Z,2023-09-11T17:41:59Z,,MEMBER,,,,"I noticed two minor issues in our online docs, now that we've switched to the hip new HTML repr by default. 1. Most doc pages still show text, not HTML. I suspect this is a limitation of the [IPython sphinx directive](https://ipython.readthedocs.io/en/stable/sphinxext.html) we use for our snippets. We might be able to fix that by switching to [jupyter-sphinx](https://jupyter-sphinx.readthedocs.io/en/latest/)? 2. The ""attributes"" part of the HTML repr in our notebook examples [looks a little funny](http://xarray.pydata.org/en/stable/examples/multidimensional-coords.html), with strange blue formatting around each attribute name. 
It looks like part of the outer style of our docs is leaking into the HTML repr: ![image](https://user-images.githubusercontent.com/1217238/77603390-31bc5a80-6ecd-11ea-911d-f2b6ed2714f6.png) ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3893/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1376109308,I_kwDOAMm_X85SBcL8,7045,Should Xarray stop doing automatic index-based alignment?,1217238,open,0,,,13,2022-09-16T15:31:03Z,2023-08-23T07:42:34Z,,MEMBER,,,,"### What is your issue? I am increasingly thinking that automatic index-based alignment in Xarray (copied from pandas) may have been a design mistake. Almost every time I work with datasets with different indexes, I find myself writing code to explicitly align them: 1. Automatic alignment is **hard to predict**. The implementation is complicated, and the exact mode of automatic alignment (outer vs inner vs left join) depends on the specific operation. It's also no longer possible to predict the shape (or even the dtype) resulting from most Xarray operations purely from input shape/dtype. 2. Automatic alignment brings unexpected **performance penalty**. In some domains (analytics) this is OK, but in others (e.g,. numerical modeling or deep learning) this is a complete deal-breaker. 3. Automatic alignment is **not useful for float indexes**, because exact matches are rare. In practice, this makes it less useful in Xarray's usual domains than it for pandas. Would it be insane to consider changing Xarray's behavior to stop doing automatic alignment? I imagine we could roll this out slowly, first with warnings and then with an option for disabling it. If you think this is a good or bad idea, consider responding to this issue with a 👍 or 👎 reaction.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7045/reactions"", ""total_count"": 13, ""+1"": 9, ""-1"": 2, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 2}",,,13221727,issue 479942077,MDU6SXNzdWU0Nzk5NDIwNzc=,3213,How should xarray use/support sparse arrays?,1217238,open,0,,,55,2019-08-13T03:29:42Z,2023-06-07T15:43:55Z,,MEMBER,,,,"I'm looking forward to being easily able to create sparse xarray objects from pandas: https://github.com/pydata/xarray/issues/3206 Are there other xarray APIs that could make good use of sparse arrays, or could make sparse arrays easier to use? Some ideas: - `to_sparse()`/`to_dense()` methods for converting to/from sparse without requiring using `.data` - `to_dataframe()`/`to_series()` could grow options for skipping the fill-value in sparse arrays, so they can round-trip MultiIndex data back to pandas - Serialization to/from netCDF files, using some custom convention (see https://github.com/pydata/xarray/issues/1375#issuecomment-402699810)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3213/reactions"", ""total_count"": 14, ""+1"": 14, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1465287257,I_kwDOAMm_X85XVoJZ,7325,Support reading Zarr data via TensorStore,1217238,open,0,,,1,2022-11-27T00:12:17Z,2023-05-11T01:24:27Z,,MEMBER,,,,"### What is your issue? [TensorStore](https://github.com/google/tensorstore/) is another high performance API for reading distributed arrays in formats such as Zarr, written in C++. 
It could be interesting to write an Xarray storage backend using TensorStore as an alternative way to read Zarr files. As an exercise, I make a little demo of doing this: https://gist.github.com/shoyer/5b0c485979cc9c36a9685d8cf8e94565 I have not tested it for performance. The main annoyance is that TensorStore doesn't understand Zarr groups or Zarr array attributes, so I needed to write my own helpers for reading this metadata. Also, there's a bit of an impedance mis-match between TensorStore (where everything returns futures) and Xarray (which assumes that indexing results in numpy arrays). This could likely be improved with some amount of effort -- in particular https://github.com/pydata/xarray/pull/6874/files should help. CC @jbms who may have better ideas about how to use the TensorStore API.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7325/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 209653741,MDU6SXNzdWUyMDk2NTM3NDE=,1285,FAQ page could use some updating,1217238,open,0,,,1,2017-02-23T03:29:16Z,2023-03-26T16:32:44Z,,MEMBER,,,,"Along the same lines as https://github.com/pydata/xarray/issues/1282, we haven't done much updating for frequently asked questions -- it's mostly still the original handful of FAQ entries I wrote in the first version of the docs. Topics worth addressing: - [ ] How xarray handles missing values - [x] File formats -- how can I read format *X* in xarray? (Maybe we should make a table with links to other packages?) (please add suggestions for this list!) StackOverflow may be a helpful reference here: http://stackoverflow.com/questions/tagged/python-xarray?sort=votes&pageSize=50","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1285/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 176805500,MDU6SXNzdWUxNzY4MDU1MDA=,1004,Remove IndexVariable.name,1217238,open,0,,,3,2016-09-14T03:27:43Z,2023-03-11T19:57:40Z,,MEMBER,,,,"As discussed in #947, we should remove the `IndexVariable.name` attribute. It should be fine to use an `IndexVariable` anywhere, regardless of whether or not it labels ticks along a dimension. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1004/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 326205036,MDU6SXNzdWUzMjYyMDUwMzY=,2180,How should Dataset.update() handle conflicting coordinates?,1217238,open,0,,,16,2018-05-24T16:46:23Z,2022-04-30T13:40:28Z,,MEMBER,,,,"Recently, we updated `Dataset.__setitem__` to drop conflicting coordinates from DataArray values being assigned if they conflict with existing coordinates (https://github.com/pydata/xarray/pull/2087). Because `update` and `__setitem__` share the same code path, this inadvertently updated `update` as well. Is this something we want? In v0.10.3, both `__setitem__` and `update` prioritize coordinates from the assigned objects (e.g., `value` in `dataset[key] = value`). In v0.10.4, both `__setitem__` and `update` prioritize coordinates from the original object (e.g., `dataset`). I'm not sure this is the right behavior. In particular, in the case of `dataset.update(other)` where `other` is also an `xarray.Dataset`, it seems like coordinates from `other` should take priority. 
Note that one advantage of the current logic (which is violated by my current fix in https://github.com/pydata/xarray/pull/2162), is that we maintain the invariant that `dataset[key] = value` is equivalent to `dataset.update({key: value})`.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2180/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 612918997,MDU6SXNzdWU2MTI5MTg5OTc=,4034,Fix tight_layout warning on cartopy facetgrid docs example,1217238,open,0,,,1,2020-05-05T21:54:46Z,2022-04-30T12:37:50Z,,MEMBER,,,,"Per the fix in https://github.com/pydata/xarray/pull/4032, I'm pretty sure we will soon start seeing a warning message printed on ReadTheDocs in Cartopy FacetGrid example: http://xarray.pydata.org/en/stable/plotting.html#maps This would be nice to fix for users, especially because it's likely users will see this warning when running code outside of our documentation, too.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4034/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 342180429,MDU6SXNzdWUzNDIxODA0Mjk=,2298,Making xarray math lazy,1217238,open,0,,,7,2018-07-18T05:18:53Z,2022-04-19T15:38:59Z,,MEMBER,,,,"At SciPy, I had the realization that it would be relatively straightforward to make element-wise math between xarray objects lazy. This would let us support lazy coordinate arrays, a feature that has quite a few use-cases, e.g., for both geoscience and astronomy. The trick would be to write a lazy array class that holds an element-wise vectorized function and passes indexers on to its arguments. I haven't thought too hard about this yet for vectorized indexing, but it could be quite efficient for outer indexing. I have some prototype code but no tests yet. The question is how to hook this into xarray operations. In particular, supposing that the inputs to a function do no hold dask arrays: - Should we try to make *every* element-wise operation with vectorized functions (ufuncs) lazy by default? This might have negative performance implications and would be a little tricky to implement with xarray's current code, since we still implement binary operations like `+` with separate logic from `apply_ufunc`. - Should we make every element-wise operation that explicitly uses `apply_ufunc()` lazy by default? - Or should we only make element-wise operations lazy with `apply_ufunc()` if you use some special flag, e.g., `apply_ufunc(..., lazy=True)`? 
I am leaning towards the last option for now but would welcome other opinions.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2298/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 902622057,MDU6SXNzdWU5MDI2MjIwNTc=,5381,concat() with compat='no_conflicts' on dask arrays has accidentally quadratic runtime,1217238,open,0,,,0,2021-05-26T16:12:06Z,2022-04-19T03:48:27Z,,MEMBER,,,,"This ends up calling `fillna()` in a loop inside `xarray.core.merge.unique_variable()`, something like: ```python out = variables[0] for var in variables[1:]: out = out.fillna(var) ``` https://github.com/pydata/xarray/blob/55e5b5aaa6d9c27adcf9a7cb1f6ac3bf71c10dea/xarray/core/merge.py#L147-L149 This has quadratic behavior if the variables are stored in dask arrays (the dask graph gets one element larger after each loop iteration). This is OK for `merge()` (which typically only has two arguments) but is problematic for dealing with variables that shouldn't be concatenated inside `concat()`, which should be able to handle very long lists of arguments. I encountered this because `compat='no_conflicts'` is the default for `xarray.combine_nested()`. I guess there's also the related issue which is that even if we produced the output dask graph by hand without a loop, it still wouldn't be easy to evaluate for a large number of elements. Ideally we would use some sort of tree-reduction to ensure the operation can be parallelized. xref https://github.com/google/xarray-beam/pull/13","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5381/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 325439138,MDU6SXNzdWUzMjU0MzkxMzg=,2171,Support alignment/broadcasting with unlabeled dimensions of size 1,1217238,open,0,,,5,2018-05-22T19:52:21Z,2022-04-19T03:15:24Z,,MEMBER,,,,"Sometimes, it's convenient to include placeholder dimensions of size 1, which allows for removing any ambiguity related to the order of output dimensions. Currently, this is not supported with xarray: ``` >>> xr.DataArray([1], dims='x') + xr.DataArray([1, 2, 3], dims='x') ValueError: arguments without labels along dimension 'x' cannot be aligned because they have different dimension sizes: {1, 3} >>> xr.Variable(('x',), [1]) + xr.Variable(('x',), [1, 2, 3]) ValueError: operands cannot be broadcast together with mismatched lengths for dimension 'x': (1, 3) ``` However, these operations aren't really ambiguous. With size 1 dimensions, we could logically do broadcasting like NumPy arrays, e.g., ``` >>> np.array([1]) + np.array([1, 2, 3]) array([2, 3, 4]) ``` This would be particularly convenient if we add `keepdims=True` to xarray operations (#2170).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2171/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 665488672,MDU6SXNzdWU2NjU0ODg2NzI=,4267,CachingFileManager should not use __del__,1217238,open,0,,,2,2020-07-25T01:20:52Z,2022-04-17T21:42:39Z,,MEMBER,,,,"`__del__` is sometimes called after modules have been deallocated, which results in errors printed to stderr when Python exits. 
This manifests itself in the following bug: https://github.com/shoyer/h5netcdf/issues/50 Per https://github.com/shoyer/h5netcdf/issues/50#issuecomment-572191867, the right solution is probably to use `weakref.finalize`.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4267/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 469440752,MDU6SXNzdWU0Njk0NDA3NTI=,3139,"Change the signature of DataArray to DataArray(data, dims, coords, ...)?",1217238,open,0,,,1,2019-07-17T20:54:57Z,2022-04-09T15:28:51Z,,MEMBER,,,,"Currently, the signature of DataArray is `DataArray(data, coords, dims, ...)`: http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html In the long term, I think `DataArray(data, dims, coords, ...)` would be more intuitive: dimensions are a more fundamental part of xarray's data model than coordinates. Certainly I find it much more common to omit `coords` than to omit `dims` when I create a `DataArray`. My original reasoning for this argument order was that `dims` could be copied from `coords`, e.g., `DataArray(new_data, old_dataarray.coords)`, and it was nice to be able to pass this sole argument by position instead of by name. But a cleaner way to write this now is `old_dataarray.copy(data=new_data)`. The challenge in making any change here would be to have a smooth deprecation process, and that ideally avoids requiring users to rewrite all of their code and avoids loads of pointless/extraneous warnings. I'm not entirely sure this is possible. We could likely use heuristics to distinguish between `dims` and `coords` arguments regardless of their order, but this probably isn't something we would want to preserve in the long term. An alternative that might achieve some of the convenience of this change would be to allow for passing lists of strings in the `coords` argument by position, which are interpreted as dimensions, e.g., `DataArray(data, ['x', 'y'])`. The downside of this alternative is that it would add even more special cases to the `DataArray` constructor , which would make it harder to understand.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3139/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 237008177,MDU6SXNzdWUyMzcwMDgxNzc=,1460,groupby should still squeeze for non-monotonic inputs,1217238,open,0,,,5,2017-06-19T20:05:14Z,2022-03-04T21:31:41Z,,MEMBER,,,,"We can simply use `argsort()` to determine `group_indices` instead of `np.arange()`: https://github.com/pydata/xarray/blob/22ff955d53e253071f6e4fa849e5291d0005282a/xarray/core/groupby.py#L256","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1460/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 58117200,MDU6SXNzdWU1ODExNzIwMA==,324,Support multi-dimensional grouped operations and group_over,1217238,open,0,,741199,12,2015-02-18T19:42:20Z,2022-02-28T19:03:17Z,,MEMBER,,,,"Multi-dimensional grouped operations should be relatively straightforward -- the main complexity will be writing an N-dimensional concat that doesn't involve repetitively copying data. The idea with `group_over` would be to support groupby operations that act on a single element from each of the given groups, rather than the unique values. 
For example, `ds.group_over(['lat', 'lon'])` would let you iterate over or apply to 2D slices of `ds`, no matter how many dimensions it has. Roughly speaking (it's a little more complex for the case of non-dimension variables), `ds.group_over(dims)` would get translated into `ds.groupby([d for d in ds.dims if d not in dims])`. Related: #266 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/324/reactions"", ""total_count"": 18, ""+1"": 18, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1090700695,I_kwDOAMm_X85BAsWX,6125,[Bug]: HTML repr does not display well in notebooks hosted on GitHub,1217238,open,0,,,0,2021-12-29T19:05:49Z,2021-12-29T19:36:25Z,,MEMBER,,,,"### What happened? We see _both_ the raw text *and* a malformed version of the HTML (without CSS formatting). Example (https://github.com/microsoft/PlanetaryComputerExamples/blob/main/quickstarts/reading-zarr-data.ipynb): ![image](https://user-images.githubusercontent.com/1217238/147695209-127feae1-7dd2-48b9-9626-f0c8eb3815eb.png) ### What did you expect to happen? Either: 1. Ideally, we only see the HTML repr, with CSS formatting applied. 2. Or, if that isn't possible, we should figure out how to only show the raw text. nbviewer [gets this right](https://nbviewer.org/github/microsoft/PlanetaryComputerExamples/blob/main/quickstarts/reading-zarr-data.ipynb): ![image](https://user-images.githubusercontent.com/1217238/147695174-eebcefff-f99a-4391-b9c1-13ccf77f36ba.png) ### Minimal Complete Verifiable Example _No response_ ### Relevant log output _No response_ ### Anything else we need to know? _No response_ ### Environment NA","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6125/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 252707680,MDU6SXNzdWUyNTI3MDc2ODA=,1525,Consider setting name=False in Variable.chunk(),1217238,open,0,,,4,2017-08-24T19:34:28Z,2021-07-13T01:50:16Z,,MEMBER,,,,"@mrocklin writes: > The following will be slower: ``` b = (a.chunk(...) + 1) + (a.chunk(...) + 1) ``` > In current operation this will be optimized to ``` tmp = a.chunk(...) + 1 b = tmp + tmp ``` > So you'll lose that, but I suspect that in your case chunking the same dataset many times is somewhat rare. See here for discussion: https://github.com/pydata/xarray/pull/1517#issuecomment-324722153 Whether this is worth doing really depends on on what people would find most useful -- and what is the most intuitive behavior.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1525/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 254888879,MDU6SXNzdWUyNTQ4ODg4Nzk=,1552,Flow chart for choosing indexing operations,1217238,open,0,,,2,2017-09-03T17:33:30Z,2021-07-11T22:26:17Z,,MEMBER,,,,"We have a lot of indexing operations, even though `sel_points` and `isel_points` are about to be deprecated (#1473). A flow chart / decision tree to help users pick the right indexing operation might be helpful (e.g., like [this skimage FlowChart](http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html)). It would ask various questions (e.g., do you have labels or integer positions? do you want to select or impose coordinates?) and then suggest appropriate the indexer methods. 
cc @fujiisoup ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1552/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 340733448,MDU6SXNzdWUzNDA3MzM0NDg=,2283,Exact alignment should allow missing dimension coordinates,1217238,open,0,,,2,2018-07-12T17:40:24Z,2021-06-15T09:52:29Z,,MEMBER,,,,"#### Code Sample, a copy-pastable example if possible ```python import xarray as xr xr.align(xr.DataArray([1, 2, 3], dims='x'), xr.DataArray([1, 2, 3], dims='x', coords=[[0, 1, 2]]), join='exact') ``` #### Problem description This currently results in an error, but a missing index of size 3 does not actually conflict: ```python-traceback --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () 1 xr.align(xr.DataArray([1, 2, 3], dims='x'), 2 xr.DataArray([1, 2, 3], dims='x', coords=[[0, 1, 2]]), ----> 3 join='exact') /usr/local/lib/python3.6/dist-packages/xarray/core/alignment.py in align(*objects, **kwargs) 129 raise ValueError( 130 'indexes along dimension {!r} are not equal' --> 131 .format(dim)) 132 index = joiner(matching_indexes) 133 joined_indexes[dim] = index ValueError: indexes along dimension 'x' are not equal ``` This surfaced as an issue on StackOverflow: https://stackoverflow.com/questions/51308962/computing-matrix-vector-multiplication-for-each-time-point-in-two-dataarrays #### Expected Output Both output arrays should end up with the `x` coordinate from the input that has it, like the output of the above expression if `join='inner'`: ``` ( array([1, 2, 3]) Coordinates: * x (x) int64 0 1 2, array([1, 2, 3]) Coordinates: * x (x) int64 0 1 2) ``` #### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.33+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.7
pandas: 0.22.0
numpy: 1.14.5
scipy: 0.19.1
netCDF4: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.1.2
cartopy: None
seaborn: 0.7.1
setuptools: 39.1.0
pip: 10.0.1
conda: None
pytest: None
IPython: 5.5.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2283/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 842438533,MDU6SXNzdWU4NDI0Mzg1MzM=,5082,Move encoding from xarray.Variable to duck arrays?,1217238,open,0,,,2,2021-03-27T07:21:55Z,2021-06-13T01:34:00Z,,MEMBER,,,,"The `encoding` property on `Variable` has always been an awkward part of Xarray's API, and an example of poor separation of concerns. It add conceptual overhead to all uses of `xarray.Variable`, but exists only for the (somewhat niche) benefit of Xarray's backend IO functionality. This is particularly problematic if we consider the possible separation of `xarray.Variable` into a separate package to remove the pandas dependency (https://github.com/pydata/xarray/issues/3981). I think a cleaner way to handle `encoding` would be to move it from `Variable` onto array objects, specifically duck array objects that Xarray creates when loading data from disk. As long as these duck arrays don't ""propagate"" themselves under array operations but rather turn into raw numpy arrays (or whatever is wrapped), this would automatically resolve all issues around propagating `encoding` attributes (e.g., https://github.com/pydata/xarray/pull/5065, https://github.com/pydata/xarray/issues/1614). And users who don't care about `encoding` because they don't use Xarray's IO functionality would never need to think about it.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5082/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 276241764,MDU6SXNzdWUyNzYyNDE3NjQ=,1739,Utility to restore original dimension order after apply_ufunc,1217238,open,0,,,11,2017-11-23T00:47:57Z,2021-05-29T07:39:33Z,,MEMBER,,,,"This seems to be coming up quite a bit for wrapping functions that apply an operation along an axis, e.g., for `interpolate` in #1640 or `rank` in #1733. We should either write a utility function to do this or consider adding an option to `apply_ufunc`.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1739/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 901047466,MDU6SXNzdWU5MDEwNDc0NjY=,5372,Consider revising the _repr_inline_ protocol,1217238,open,0,,,0,2021-05-25T16:18:31Z,2021-05-25T16:18:31Z,,MEMBER,,,,"`_repr_inline_` looks like an [IPython special method](https://ipython.readthedocs.io/en/stable/config/integrating.html#rich-display) but is actually includes some xarray specific details: the result should not include `shape` or `dtype`. As I wrote in https://github.com/pydata/xarray/pull/5352, I would suggest revising it in one of two ways: 1. Giving it a name like `_xarray_repr_inline_` to make it clearer that it's Xarray specific 2. Include some more generic way of indicating that `shape`/`dtype` is redundant, e.g,. 
call it like `obj._repr_ndarray_inline_(dtype=False, shape=False)`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5372/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 294241734,MDU6SXNzdWUyOTQyNDE3MzQ=,1887,Boolean indexing with multi-dimensional key arrays,1217238,open,0,,,13,2018-02-04T23:28:45Z,2021-04-22T21:06:47Z,,MEMBER,,,,"Originally from https://github.com/pydata/xarray/issues/974 For _boolean indexing_: - `da[key]` where `key` is a boolean labelled array (with _any_ number of dimensions) is made equivalent to `da.where(key.reindex_like(ds), drop=True)`. This matches the existing behavior if `key` is a 1D boolean array. For multi-dimensional arrays, even though the result is now multi-dimensional, this coupled with automatic skipping of NaNs means that `da[key].mean()` gives the same result as in NumPy. - `da[key] = value` where `key` is a boolean labelled array can be made equivalent to `da = da.where(*align(key.reindex_like(da), value.reindex_like(da)))` (that is, the three argument form of `where`). - `da[key_0, ..., key_n]` where all of `key_i` are boolean arrays gets handled in the usual way. It is an `IndexingError` to supply multiple labelled keys if any of them are not already aligned with as the corresponding index coordinates (and share the same dimension name). If they want alignment, we suggest users simply write `da[key_0 & ... & key_n]`. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1887/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 843996137,MDU6SXNzdWU4NDM5OTYxMzc=,5092,Concurrent loading of coordinate arrays from Zarr,1217238,open,0,,,0,2021-03-30T02:19:50Z,2021-04-19T02:43:31Z,,MEMBER,,,,"When you open a dataset with Zarr, xarray loads coordinate arrays corresponding to indexes in serial. This can be slow (multiple seconds) even with only a handful of such arrays if they are stored in a remote filesystem (e.g., cloud object stores). This is similar to the use-cases for [consolidated metadata](https://zarr.readthedocs.io/en/latest/tutorial.html#consolidating-metadata). In principle, we could speed up loading datasets from Zarr into Xarray significantly by reading the data corresponding to these arrays in parallel (e.g., in multiple threads).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5092/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 264098632,MDU6SXNzdWUyNjQwOTg2MzI=,1618,apply_raw() for a simpler version of apply_ufunc(),1217238,open,0,,,4,2017-10-10T04:51:38Z,2021-01-01T17:14:43Z,,MEMBER,,,,"`apply_raw()` would work like `apply_ufunc()`, but without the hard to understand broadcasting behavior and core dimensions. The rule for `apply_raw()` would be that it directly unwraps its arguments and passes them on to the wrapped function, without any broadcasting. We would also include a `dim` argument that is automatically converted into the appropriate `axis` argument when calling the wrapped function. Output dimensions would be determined from a simple rule of some sort: - Default output dimensions would either be copied from the first argument, or would take on the ordered union on all input dimensions. 
- Custom dimensions could either be set by adding a `drop_dims` argument (like `dask.array.map_blocks`), or require an explicit override `output_dims`. This also could be suitable for defining as a method instead of a separate function. See https://github.com/pydata/xarray/issues/1251 and https://github.com/pydata/xarray/issues/1130 for related issues.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1618/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 269700511,MDU6SXNzdWUyNjk3MDA1MTE=,1672,Append along an unlimited dimension to an existing netCDF file,1217238,open,0,,,8,2017-10-30T18:09:54Z,2020-11-29T17:35:04Z,,MEMBER,,,,"This would be a nice feature to have for some use cases, e.g., for writing simulation time-steps: https://stackoverflow.com/questions/46951981/create-and-write-xarray-dataarray-to-netcdf-in-chunks It should be relatively straightforward to add, too, building on support for writing files with unlimited dimensions. User facing API would probably be a new keyword argument to `to_netcdf()`, e.g., `extend='time'` to indicate the extended dimension.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1672/reactions"", ""total_count"": 21, ""+1"": 21, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 314444743,MDU6SXNzdWUzMTQ0NDQ3NDM=,2059,How should xarray serialize bytes/unicode strings across Python/netCDF versions?,1217238,open,0,,,5,2018-04-15T19:36:55Z,2020-11-19T10:08:16Z,,MEMBER,,,,"# netCDF string types We have several options for storing strings in netCDF files: - `NC_CHAR`: netCDF's legacy character type. The closest match is NumPy `'S1'` dtype. In principle, it's supposed to be able to store arbitrary bytes. On HDF5, it uses an UTF-8 encoded string with a fixed-size of 1 (but note that HDF5 does not complain about storing arbitrary bytes). - `NC_STRING`: netCDF's newer variable length string type. It's only available on netCDF4 (not netCDF3). It corresponds to an HDF5 variable-length string with UTF-8 encoding. - `NC_CHAR` with an `_Encoding` attribute: xarray and netCDF4-Python support an ad-hoc convention for storing unicode strings in `NC_CHAR` data-types, by adding an attribute `{'_Encoding': 'UTF-8'}`. The data is still stored as fixed width strings, but xarray (and netCDF4-Python) can decode them as unicode. `NC_STRING` would seem like a clear win in cases where it's supported, but as @crusaderky points out in https://github.com/pydata/xarray/issues/2040, it actually results in much larger netCDF files in many cases than using character arrays, which are more easily compressed. Nonetheless, we currently default to storing unicode strings in `NC_STRING`, because it's the most portable option -- every tool that handles HDF5 and netCDF4 should be able to read it properly as unicode strings. # NumPy/Python string types On the Python side, our options are perhaps even more confusing: - NumPy's `dtype=np.string_` corresponds to fixed-length bytes. This is the default dtype for strings on Python 2, because on Python 2 strings are the same as bytes. - NumPy's `dtype=np.unicode_` corresponds to fixed-length unicode. This is the default dtype for strings on Python 3, because on Python 3 strings are the same as unicode. - Strings are also commonly stored in numpy arrays with `dtype=np.object_`, as arrays of either `bytes` or `unicode` objects. 
This is a pragmatic choice, because otherwise NumPy has no support for variable length strings. We also use this (like pandas) to mark missing values with `np.nan`. Like pandas, we are pretty liberal with converting back and forth between fixed-length (`np.string`/`np.unicode_`) and variable-length (object dtype) representations of strings as necessary. This works pretty well, though converting from object arrays in particular has downsides, since it cannot be done lazily with dask. # Current behavior of xarray Currently, xarray uses the same behavior on Python 2/3. The priority was faithfully round-tripping data from a particular version of Python to netCDF and back, which the current serialization behavior achieves: | Python version | NetCDF version | NumPy datatype | NetCDF datatype | | --------- | ---------- | -------------- | ------------ | | Python 2 | NETCDF3 | np.string_ / str | NC_CHAR | | Python 2 | NETCDF4 | np.string_ / str | NC_CHAR | | Python 3 | NETCDF3 | np.string_ / bytes | NC_CHAR | | Python 3 | NETCDF4 | np.string_ / bytes | NC_CHAR | | Python 2 | NETCDF3 | np.unicode_ / unicode | NC_CHAR with UTF-8 encoding | | Python 2 | NETCDF4 | np.unicode_ / unicode | NC_STRING | | Python 3 | NETCDF3 | np.unicode_ / str | NC_CHAR with UTF-8 encoding | | Python 3 | NETCDF4 | np.unicode_ / str | NC_STRING | | Python 2 | NETCDF3 | object bytes/str | NC_CHAR | | Python 2 | NETCDF4 | object bytes/str | NC_CHAR | | Python 3 | NETCDF3 | object bytes | NC_CHAR | | Python 3 | NETCDF4 | object bytes | NC_CHAR | | Python 2 | NETCDF3 | object unicode | NC_CHAR with UTF-8 encoding | | Python 2 | NETCDF4 | object unicode | NC_STRING | | Python 3 | NETCDF3 | object unicode/str | NC_CHAR with UTF-8 encoding | | Python 3 | NETCDF4 | object unicode/str | NC_STRING | This can also be selected explicitly for most data-types by setting dtype in encoding: - `'S1'` for NC_CHAR (with or without encoding) - `str` for NC_STRING (though I'm not 100% sure it works properly currently when given bytes) Script for generating table:
```python
from __future__ import print_function
import xarray as xr
import uuid
import netCDF4
import numpy as np
import sys

for dtype_name, value in [
        ('np.string_ / ' + type(b'').__name__, np.array([b'abc'])),
        ('np.unicode_ / ' + type(u'').__name__, np.array([u'abc'])),
        ('object bytes/' + type(b'').__name__, np.array([b'abc'], dtype=object)),
        ('object unicode/' + type(u'').__name__, np.array([u'abc'], dtype=object)),
]:
    for format in ['NETCDF3_64BIT', 'NETCDF4']:
        filename = str(uuid.uuid4()) + '.nc'
        xr.Dataset({'data': value}).to_netcdf(filename, format=format)
        with netCDF4.Dataset(filename) as f:
            var = f.variables['data']
            disk_dtype = var.dtype
            has_encoding = hasattr(var, '_Encoding')
            disk_dtype_name = (('NC_CHAR' if disk_dtype == 'S1' else 'NC_STRING')
                               + (' with UTF-8 encoding' if has_encoding else ''))
            print('|', 'Python %i' % sys.version_info[0],
                  '|', format[:7],
                  '|', dtype_name,
                  '|', disk_dtype_name,
                  '|')
```
# Potential alternatives The main option I'm considering is switching to default to `NC_CHAR` with UTF-8 encoding for np.string_ / str and object bytes/str on Python 2. The current behavior could be explicitly toggled by setting an encoding of `{'_Encoding': None}`. This would imply two changes: 1. Attempting to serialize arbitrary bytes (on Python 2) would start raising an error -- anything that isn't ASCII would require explicitly disabling `_Encoding`. 2. Strings read back from disk on Python 2 would come back as unicode instead of bytes. This implicit conversion would be consistent with Python 2's general handling of bytes/unicode, and facilitate reading netCDF files on Python 3 that were written with Python 2. The counter-argument is that it may not be worth changing this at this late point, given that we will be sunsetting Python 2 support by year's end.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2059/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 715374721,MDU6SXNzdWU3MTUzNzQ3MjE=,4490,Group together decoding options into a single argument,1217238,open,0,,,6,2020-10-06T06:15:18Z,2020-10-29T04:07:46Z,,MEMBER,,,,"**Is your feature request related to a problem? Please describe.** `open_dataset()` currently has a _very_ long function signature. This makes it hard to keep track of everything it can do, and is particularly problematic for the authors of _new_ backends (e.g., see https://github.com/pydata/xarray/pull/4477), which might need to know how to handle all these arguments. **Describe the solution you'd like** To simple the interface, I propose to group together all the decoding options into a new `DecodingOptions` class. I'm thinking something like: ```python from dataclasses import dataclass, field, asdict from typing import Optional, List @dataclass(frozen=True) class DecodingOptions: mask: Optional[bool] = None scale: Optional[bool] = None datetime: Optional[bool] = None timedelta: Optional[bool] = None use_cftime: Optional[bool] = None concat_characters: Optional[bool] = None coords: Optional[bool] = None drop_variables: Optional[List[str]] = None @classmethods def disabled(cls): return cls(mask=False, scale=False, datetime=False, timedelta=False, concat_characters=False, coords=False) def non_defaults(self): return {k: v for k, v in asdict(self).items() if v is not None} # add another method for creating default Variable Coder() objects, # e.g., those listed in encode_cf_variable() ``` The signature of `open_dataset` would then become: ```python def open_dataset( filename_or_obj, group=None, * engine=None, chunks=None, lock=None, cache=None, backend_kwargs=None, decode: Union[DecodingOptions, bool] = None, **deprecated_kwargs ): if decode is None: decode = DecodingOptions() if decode is False: decode = DecodingOptions.disabled() # handle deprecated_kwargs... ... ``` **Question**: are `decode` and `DecodingOptions` the right names? Maybe these should still include the name ""CF"", e.g., `decode_cf` and `CFDecodingOptions`, given that these are specific to CF conventions? 
**Note**: the current signature is `open_dataset(filename_or_obj, group=None, decode_cf=True, mask_and_scale=None, decode_times=True, autoclose=None, concat_characters=True, decode_coords=True, engine=None, chunks=None, lock=None, cache=None, drop_variables=None, backend_kwargs=None, use_cftime=None, decode_timedelta=None)` Usage with the new interface would look like `xr.open_dataset(filename, decode=False)` or `xr.open_dataset(filename, decode=xr.DecodingOptions(mask=False, scale=False))`. This requires a _little_ bit more typing than what we currently have, but it has a few advantages: 1. It's easier to understand the role of different arguments. Now there is a function with ~8 arguments and a class with ~8 arguments rather than a function with ~15 arguments. 2. It's easier to add new decoding arguments (e.g., for more advanced CF conventions), because they don't clutter the `open_dataset` interface. For example, I separated out `mask` and `scale` arguments, versus the current `mask_and_scale` argument. 3. If a new backend plugin for `open_dataset()` needs to handle every option supported by `open_dataset()`, this makes that task significantly easier. The only decoding options they need to worry about are _non-default_ options that were explicitly set, i.e., those exposed by the `non_defaults()` method. If another decoding option wasn't explicitly set and isn't recognized by the backend, they can just ignore it. **Describe alternatives you've considered** For the overall approach: 1. We could keep the current design, with separate keyword arguments for decoding options, and just be very careful about passing around these arguments. This seems pretty painful for the backend refactor, though. 2. We could keep the current design only for the user facing `open_dataset()` interface, and then internally convert into the `DecodingOptions()` struct for passing to backend constructors. This would provide much needed flexibility for backend authors, but most users wouldn't benefit from the new interface. Perhaps this would make sense as an intermediate step? ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4490/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 253107677,MDU6SXNzdWUyNTMxMDc2Nzc=,1527,"Binary operations with ds.groupby('time.dayofyear') errors out, but ds.groupby('time.month') works",1217238,open,0,,,10,2017-08-26T16:54:53Z,2020-09-29T10:05:42Z,,MEMBER,,,,"Reported on the mailing list: Original datasets: ``` >>> ds_xr array([-0.01, -0.01, -0.01, ..., -0.27, -0.27, -0.27]) Coordinates: * time (time) datetime64[ns] 1979-01-01 1979-01-02 1979-01-03 ... >>> slope_itcp_ds Dimensions: (lat: 73, level: 2, lon: 144, time: 366) Coordinates: * lon (lon) float32 0.0 2.5 5.0 7.5 10.0 12.5 ... * lat (lat) float32 90.0 87.5 85.0 82.5 80.0 ... * level (level) float64 0.0 1.0 * time (time) datetime64[ns] 2010-01-01 ... Data variables: __xarray_dataarray_variable__ (time, level, lat, lon) float64 -0.8795 ... Attributes: CDI: Climate Data Interface version 1.7.1 (http://mpimet.mpg.de/... Conventions: CF-1.4 history: Fri Aug 25 18:55:50 2017: cdo -inttime,2010-01-01,00:00:00,... CDO: Climate Data Operators version 1.7.1 (http://mpimet.mpg.de/... 
``` Issue: Grouping by month works and outputs this: ``` >>> ds_xr.groupby('time.month') - slope_itcp_ds.groupby('time.month').mean('time') Dimensions: (lat: 73, level: 2, lon: 144, time: 12775) Coordinates: * lon (lon) float32 0.0 2.5 5.0 7.5 10.0 12.5 ... * lat (lat) float32 90.0 87.5 85.0 82.5 80.0 ... * level (level) float64 0.0 1.0 month (time) int64 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ... * time (time) datetime64[ns] 1979-01-01 ... Data variables: __xarray_dataarray_variable__ (time, level, lat, lon) float64 1.015 ... ``` Grouping by dayofyear doesn't work and gives this traceback: ``` >>> ds_xr.groupby('time.dayofyear') - slope_itcp_ds.groupby('time.dayofyear').mean('time') KeyError Traceback (most recent call last) in () ----> 1 ds_xr.groupby('time.dayofyear') - slope_itcp_ds.groupby('time.dayofyear').mean('time') /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/groupby.py in func(self, other) 316 g = f if not reflexive else lambda x, y: f(y, x) 317 applied = self._yield_binary_applied(g, other) --> 318 combined = self._combine(applied) 319 return combined 320 return func /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/groupby.py in _combine(self, applied, shortcut) 532 combined = self._concat_shortcut(applied, dim, positions) 533 else: --> 534 combined = concat(applied, dim) 535 combined = _maybe_reorder(combined, dim, positions) 536 /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in concat(objs, dim, data_vars, coords, compat, positions, indexers, mode, concat_over) 118 raise TypeError('can only concatenate xarray Dataset and DataArray ' 119 'objects, got %s' % type(first_obj)) --> 120 return f(objs, dim, data_vars, coords, compat, positions) 121 122 /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions) 210 datasets = align(*datasets, join='outer', copy=False, exclude=[dim]) 211 --> 212 concat_over = _calc_concat_over(datasets, dim, data_vars, coords) 213 214 def insert_result_variable(k, v): /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in _calc_concat_over(datasets, dim, data_vars, coords) 190 if dim in v.dims) 191 concat_over.update(process_subset_opt(data_vars, 'data_vars')) --> 192 concat_over.update(process_subset_opt(coords, 'coords')) 193 if dim in datasets[0]: 194 concat_over.add(dim) /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in process_subset_opt(opt, subset) 165 for ds in datasets[1:]) 166 # all nonindexes that are not the same in each dataset --> 167 concat_new = set(k for k in getattr(datasets[0], subset) 168 if k not in concat_over and differs(k)) 169 elif opt == 'all': /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in (.0) 166 # all nonindexes that are not the same in each dataset 167 concat_new = set(k for k in getattr(datasets[0], subset) --> 168 if k not in concat_over and differs(k)) 169 elif opt == 'all': 170 concat_new = (set(getattr(datasets[0], subset)) - /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in differs(vname) 163 v = datasets[0].variables[vname] 164 return any(not ds.variables[vname].equals(v) --> 165 for ds in datasets[1:]) 166 # all nonindexes that are not the same in each dataset 167 concat_new = set(k for k in getattr(datasets[0], subset) 
/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in (.0) 163 v = datasets[0].variables[vname] 164 return any(not ds.variables[vname].equals(v) --> 165 for ds in datasets[1:]) 166 # all nonindexes that are not the same in each dataset 167 concat_new = set(k for k in getattr(datasets[0], subset) /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/utils.py in __getitem__(self, key) 288 289 def __getitem__(self, key): --> 290 return self.mapping[key] 291 292 def __iter__(self): KeyError: 'lon' ``` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1527/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 479940669,MDU6SXNzdWU0Nzk5NDA2Njk=,3212,Custom fill_value for from_dataframe/from_series,1217238,open,0,,,0,2019-08-13T03:22:46Z,2020-04-06T20:40:26Z,,MEMBER,,,,"It would be to have the option to customize the fill value when creating an xarray objects from pandas, instead of requiring to always be NaN. This would probably be especially useful when creating sparse arrays (https://github.com/pydata/xarray/issues/3206), for which it often makes sense to use a fill value of zero. If your data has integer values (e.g., it represents counts), you probably don't want to let it be cast to float first.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3212/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 314482923,MDU6SXNzdWUzMTQ0ODI5MjM=,2061,Backend specific conventions decoding,1217238,open,0,,,1,2018-04-16T02:45:46Z,2020-04-05T23:42:34Z,,MEMBER,,,,"Currently, we have a single function `xarray.decode_cf()` that we apply to data loaded from all xarray backends. This is appropriate for netCDF data, but it's not appropriate for backends with different implementations. For example, it doesn't work for zarr (which is why we have the separate `open_zarr`), and is also a poor fit for PseudoNetCDF (https://github.com/pydata/xarray/pull/1905). In the worst cases (e.g., for PseudoNetCDF) it can actually result in data being decoded *twice*, which can result in incorrectly scaled data. Instead, we should declare default decoders as part of the backend API, and use those decoders as the defaults for `open_dataset()`. This should probably be tackled as part of the broader backends refactor: https://github.com/pydata/xarray/issues/1970 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2061/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 173612265,MDU6SXNzdWUxNzM2MTIyNjU=,988,Hooks for custom attribute handling in xarray operations,1217238,open,0,,,24,2016-08-27T19:48:22Z,2020-04-05T18:19:11Z,,MEMBER,,,,"Over in #964, I am working on a rewrite/unification of the guts of xarray's logic for computation with labelled data. The goal is to get all of xarray's internal logic for working with labelled data going through a minimal set of flexible functions which we can also expose as part of the API. Because we will finally have all (or at least nearly all) xarray operations using the same code path, I think it will also finally become feasible to open up hooks allowing extensions how xarray handles metadata. 
Two obvious use cases here are units (#525) and automatic maintenance of metadata (e.g., [`cell_methods`](https://github.com/pydata/xarray/issues/987#issuecomment-242912131) or [`history`](#826) fields). Both of these are out of scope for xarray itself, mostly because the specific logic tends to be domain specific. This could also subsume options like the existing `keep_attrs` on many operations. I like the idea of supporting something like NumPy's [`__array_wrap__`](http://docs.scipy.org/doc/numpy-1.11.0/reference/arrays.classes.html#numpy.class.__array_wrap__) to allow third-party code to finalize xarray objects in some way before they are returned. However, it's not obvious to me what the right design is. - Should we lookup a custom attribute on subclasses like `__array_wrap__` (or `__numpy_ufunc__`) in NumPy, or should we have a system (e.g., unilaterally or with a context manager and `xarray.set_options`) for registering hooks that are then checked on _all_ xarray objects? I am inclined toward the later, even though it's a little slower, just because it will be simpler and easier to get right - Should these methods be able to control the full result objects, or only set `attrs` and/or `name`? - To be useful, do we need to allow extensions to take control of the full operation, to support things like automatic unit conversion? This would suggest something closing to `__numpy_ufunc__`, which is a little more ambitious than what I had in mind here. Feedback would be greatly appreciated. CC @darothen @rabernat @jhamman @pwolfram ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/988/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 296120524,MDU6SXNzdWUyOTYxMjA1MjQ=,1901,Update assign to preserve order for **kwargs,1217238,open,0,,,1,2018-02-10T18:05:45Z,2020-02-10T19:44:20Z,,MEMBER,,,,"In Python 3.6+, keyword arguments preserve the order in which they are written. We should update `assign` and `assign_coords` to rely on this in the next major release, as has been done in pandas: https://github.com/pandas-dev/pandas/issues/14207","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1901/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 398107776,MDU6SXNzdWUzOTgxMDc3NzY=,2666,Dataset.from_dataframe will produce a FutureWarning for DatetimeTZ data,1217238,open,0,,,6,2019-01-11T02:45:49Z,2019-12-30T22:58:23Z,,MEMBER,,,,"This appears with the development version of pandas; see https://github.com/pandas-dev/pandas/issues/24716 for details. Example: ``` In [16]: df = pd.DataFrame({""A"": pd.date_range('2000', periods=12, tz='US/Central')}) In [17]: df.to_xarray() /Users/taugspurger/Envs/pandas-dev/lib/python3.7/site-packages/xarray/core/dataset.py:3111: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'. To accept the future behavior, pass 'dtype=object'. To keep the old behavior, pass 'dtype=""datetime64[ns]""'. data = np.asarray(series).reshape(shape) Out[17]: Dimensions: (index: 12) Coordinates: * index (index) int64 0 1 2 3 4 5 6 7 8 9 10 11 Data variables: A (index) datetime64[ns] 2000-01-01T06:00:00 ... 
2000-01-12T06:00:00 ```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2666/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 96211612,MDU6SXNzdWU5NjIxMTYxMg==,486,API for multi-dimensional resampling/regridding,1217238,open,0,,,32,2015-07-21T02:38:29Z,2019-11-06T18:00:52Z,,MEMBER,,,,"This notebook by @kegl shows a nice example of how to use pyresample with xray: https://www.lri.fr/~kegl/Ramps/edaElNino.html#Downsampling It would be nice to build a wrapper for this machinery directly into xray in some way. xref #475 cc @jhamman @rabernat ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/486/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 269348789,MDU6SXNzdWUyNjkzNDg3ODk=,1668,Remove use of allow_cleanup_failure in test_backends.py,1217238,open,0,,,6,2017-10-28T20:47:31Z,2019-09-29T20:07:03Z,,MEMBER,,,,"This exists for the benefit of Windows, on which trying to delete an open file results in an error. But really, it would be nice to have a test suite that doesn't leave any temporary files hanging around. The main culprit is tests like this, where opening a file triggers an error: ```python with raises_regex(TypeError, 'pip install netcdf4'): open_dataset(tmp_file, engine='scipy') ``` The way to fix this is to use mocking of some sort, to intercept calls to backend file objects and close them afterwards.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1668/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 317362786,MDU6SXNzdWUzMTczNjI3ODY=,2078,apply_ufunc should include variable names in error messages,1217238,open,0,,,4,2018-04-24T19:26:13Z,2019-08-26T18:10:23Z,,MEMBER,,,,"This would make it easier to debug issues with dimensions. For example, in [this example](https://stackoverflow.com/questions/49959449/how-to-use-xr-apply-ufunc-with-changing-dimensions) from StackOverflow, the error message was `ValueError: operand to apply_ufunc has required core dimensions ['time', 'lat', 'lon'], but some of these are missing on the input variable: ['lat', 'lon']`. 
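For context, a minimal call that triggers this class of error (illustrative only, and not taken from the linked StackOverflow post) might look like: ```python import numpy as np
import xarray as xr

# 'status' only has a 'time' dimension, but 'lat' and 'lon' are requested as
# core dimensions, so apply_ufunc raises a ValueError without naming the
# offending variable.
ds = xr.Dataset({"status": ("time", np.arange(4))})
xr.apply_ufunc(
    lambda x: x.sum(),
    ds,
    input_core_dims=[["time", "lat", "lon"]],
)
```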
A better error message would be: `ValueError: operand to apply_ufunc has required core dimensions ['time', 'lat', 'lon'], but some of these are missing on input variable 'status': ['lat', 'lon']`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2078/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 188113943,MDU6SXNzdWUxODgxMTM5NDM=,1097,"Better support for subclasses: tests, docs and API",1217238,open,0,,,14,2016-11-08T21:54:00Z,2019-08-22T13:07:44Z,,MEMBER,,,,"Given that people *do* currently subclass xarray objects, it's worth considering making a subclass API like pandas: http://pandas.pydata.org/pandas-docs/stable/internals.html#subclassing-pandas-data-structures At the very least, it would be nice to have docs that describe how/when it's safe to subclass, and tests that verify our support for such subclasses.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1097/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 292000828,MDU6SXNzdWUyOTIwMDA4Mjg=,1861,Add an example page to the docs on geospatial filtering/indexing,1217238,open,0,,,0,2018-01-26T19:07:11Z,2019-07-12T02:53:53Z,,MEMBER,,,,"We cover standard time-series stuff pretty well in the ""Toy weather data"" example, but geospatial filtering/indexing questions come up all the time and aren't well covered. Topics could include: - How to filter out a region of interest (`sel()` with `slice` and `where(..., drop=True)`) - How to align two gridded datasets in space. - How to sample a gridded dataset at a list of station locations - How to resample a dataset to a new resolution (possibly referencing [xESMF](https://github.com/JiaweiZhuang/xESMF)) Not all of these are as smooth as they could be, but hopefully that will clearly point to where we have room for improvement in our APIs :).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1861/reactions"", ""total_count"": 6, ""+1"": 6, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 35633124,MDU6SXNzdWUzNTYzMzEyNA==,155,Expose a public interface for CF encoding/decoding functions,1217238,open,0,,,3,2014-06-12T23:33:42Z,2019-02-04T04:17:40Z,,MEMBER,,,,"Relevant discussion: #153 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/155/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 403504120,MDU6SXNzdWU0MDM1MDQxMjA=,2719,Should xarray.align sort indexes in alignment?,1217238,open,0,,,1,2019-01-27T01:51:29Z,2019-01-28T18:03:53Z,,MEMBER,,,,"I noticed in https://github.com/pandas-dev/pandas/issues/24959 (which turned up as a failure in our test suite) that pandas sorts by default in `Index.union` and now `Index.intersection`, *unless* the indexes are the same or either index has duplicates. (These aspects are probably bugs.) It occurs to me that we should make an intentional choice about sorting in `xarray.align()`, rather than merely following the whims of changed upstream behavior. Note that `align()` is called internally by all xarray operations that combine multiple objects (e.g., in arithmetic). My proposal is to use ""order of appearance"" and *not* sort by default, but add a `sort` keyword argument to allow users to control this. 
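A small sketch of what this would mean in practice; the `sort` keyword mentioned in the comments is the proposed addition, not an existing argument of `xarray.align`: ```python import xarray as xr

a = xr.DataArray([1, 2], coords={"x": ["b", "a"]}, dims="x")
b = xr.DataArray([3, 4], coords={"x": ["a", "c"]}, dims="x")

a2, b2 = xr.align(a, b, join="outer")
print(a2["x"].values)
# Proposed default, "order of appearance" (sort=False): ['b' 'a' 'c']
# With the proposed opt-in sorting (sort=True):         ['a' 'b' 'c']
```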
Reasons for the default behavior of not sorting: 1. Sorting can't be undone if the original order is lost, so this preserves maximum flexibility for users. 2. This matches how we handle the ordering of dimensions in broadcasting. 3. Pandas is quite inconsistent with how it applies sorting and we don't want to copy that in xarray. We definitely don't want to sort in all cases by default (e.g., if objects have the same index), so we should avoid sorting in the other cases as well.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2719/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 316448044,MDU6SXNzdWUzMTY0NDgwNDQ=,2069,to_netcdf() should not implicitly load dask arrays of strings into memory,1217238,open,0,,,0,2018-04-21T00:57:23Z,2019-01-13T01:41:20Z,,MEMBER,,,,"As discussed in https://github.com/pydata/xarray/pull/2058#discussion_r181606513, we should have an explicit interface of some sort, either via encoding or some new keyword argument to `to_netcdf()`.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2069/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
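As a rough illustration of the implicit load described in the issue above, a minimal sketch (assuming the netCDF4 backend is available); the `encoding` entry in the final comment is one hypothetical shape for the explicit interface, not an existing option: ```python import dask.array as da
import numpy as np
import xarray as xr

# A dask-backed array of strings; writing it currently computes it eagerly.
labels = da.from_array(np.array(["a", "b", "c"]), chunks=1)
ds = xr.Dataset({"labels": ("x", labels)})

ds.to_netcdf("labels.nc")  # today: the dask string array is loaded into memory

# One hypothetical explicit opt-in, via encoding (not an existing option):
# ds.to_netcdf("labels.nc", encoding={"labels": {"allow_string_compute": True}})
```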