id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
342180429,MDU6SXNzdWUzNDIxODA0Mjk=,2298,Making xarray math lazy,1217238,open,0,,,7,2018-07-18T05:18:53Z,2022-04-19T15:38:59Z,,MEMBER,,,,"At SciPy, I had the realization that it would be relatively straightforward to make element-wise math between xarray objects lazy. This would let us support lazy coordinate arrays, a feature that has quite a few use cases, e.g., for both geoscience and astronomy. The trick would be to write a lazy array class that holds an element-wise vectorized function and passes indexers on to its arguments. I haven't thought too hard about this yet for vectorized indexing, but it could be quite efficient for outer indexing. I have some prototype code but no tests yet. The question is how to hook this into xarray operations. In particular, supposing that the inputs to a function do not hold dask arrays:
- Should we try to make *every* element-wise operation with vectorized functions (ufuncs) lazy by default? This might have negative performance implications and would be a little tricky to implement with xarray's current code, since we still implement binary operations like `+` with separate logic from `apply_ufunc`.
- Should we make every element-wise operation that explicitly uses `apply_ufunc()` lazy by default?
- Or should we only make element-wise operations lazy with `apply_ufunc()` if you use some special flag, e.g., `apply_ufunc(..., lazy=True)`?
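A minimal sketch of the lazy array class described above (the name and signature are hypothetical, not an existing xarray API): the object stores the element-wise function and its arguments, and `__getitem__` forwards the indexer to each argument before applying the function, so only the selected pieces are ever computed.

```python
import numpy as np

class LazyElemwiseArray:
    # Hypothetical sketch: hold an element-wise function and its
    # (already-broadcast) arguments, and evaluate only on access.
    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self.shape = args[0].shape
        self.dtype = np.result_type(*args)

    def __getitem__(self, key):
        # Forward the indexer to every argument, then apply the
        # function to the (much smaller) selected pieces.
        return self.func(*(arg[key] for arg in self.args))

    def __array__(self, dtype=None):
        # Full evaluation, only when explicitly coerced to numpy.
        return np.asarray(self[...], dtype=dtype)
```

For example, `LazyElemwiseArray(np.add, a, b)[0]` would compute just the first row of `a + b`, which is why outer indexing could be efficient here.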
I am leaning towards the last option for now but would welcome other opinions.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2298/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
864249974,MDU6SXNzdWU4NjQyNDk5NzQ=,5202,Make creating a MultiIndex in stack optional,1217238,closed,0,,,7,2021-04-21T20:21:03Z,2022-03-17T17:11:42Z,2022-03-17T17:11:42Z,MEMBER,,,,"As @Hoeze notes in https://github.com/pydata/xarray/issues/5179, calling `stack()` can be ""incredibly slow and memory-demanding, since it creates a MultiIndex of every possible coordinate in the array."" This is true with how `stack()` works currently, but I'm not sure this is necessary. I suspect it's a vestigial design choice from copying pandas, back from before Xarray had optional indexes. One benefit is that it's convenient for making `unstack()` the inverse of `stack()`, but that isn't always required. Regardless of how we define the semantics for boolean indexing (https://github.com/pydata/xarray/issues/1887), it seems like it could be a good idea to allow stack to skip creating a MultiIndex for the new dimension, via a new keyword argument such as `ds.stack(index=False)`. This would be equivalent to calling `reset_index()` after `stack()` but would be cheaper because the MultiIndex is never created in the first place.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5202/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
416554477,MDU6SXNzdWU0MTY1NTQ0Nzc=,2797,Stalebot is being overly aggressive,1217238,closed,0,,,7,2019-03-03T19:37:37Z,2021-06-03T21:31:46Z,2021-06-03T21:22:48Z,MEMBER,,,,"E.g., see https://github.com/pydata/xarray/issues/1151 where stalebot closed an issue even after another comment.
Is this something we need to reconfigure or just a bug? cc @pydata/xarray ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2797/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
645154872,MDU6SXNzdWU2NDUxNTQ4NzI=,4179,Consider revising our minimum dependency version policy,1217238,closed,0,,,7,2020-06-25T05:04:38Z,2021-02-22T05:02:25Z,2021-02-22T05:02:25Z,MEMBER,,,,"Our [current policy](http://xarray.pydata.org/en/stable/installing.html#minimum-dependency-versions) is that xarray supports ""the minor version (X.Y) initially published no more than N months ago"" where N is:
- Python: 42 months (NEP 29)
- numpy: 24 months (NEP 29)
- pandas: 12 months
- scipy: 12 months
- sparse, pint and other libraries that rely on NEP-18 for integration: very latest available versions only
- all other libraries: 6 months
I think this policy is too aggressive, particularly for pandas, SciPy and other libraries. Some of these projects can go 6+ months between minor releases. For example, version 2.3 of zarr is currently more than 6 months old. So if zarr released 2.4 *today* and xarray issued a new release *tomorrow*, then our policy would dictate that we could ask users to upgrade to the new version. In https://github.com/pydata/xarray/pull/4178, I misinterpreted our policy as supporting ""the most recent minor version (X.Y) initially published more than N months ago"". This version makes a bit more sense to me: users only need to upgrade dependencies at least every N months to use the latest xarray release. I understand that NEP-29 chose its language intentionally, so that distributors know ahead of time when they can drop support for a Python or NumPy version. But this seems like a (very) poor fit for projects without regular releases. At the very least we should adjust the specific time windows.
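The gap between the two readings is easy to see with a toy calculation (the dependency and its release dates below are made up for illustration):

```python
from datetime import date

# Hypothetical minor releases of some dependency, oldest first.
releases = {'2.2': date(2019, 6, 1),
            '2.3': date(2019, 11, 1),
            '2.4': date(2020, 6, 24)}
today = date(2020, 6, 25)

def months_old(d):
    return (today.year - d.year) * 12 + (today.month - d.month)

# Current reading with N = 6: support versions published no more than
# 6 months ago. The day after 2.4 comes out it is the only version in
# the window, so 2.4 immediately becomes the required minimum.
in_window = [v for v, d in releases.items() if months_old(d) <= 6]

# Alternative reading: the minimum is the most recent version published
# more than 6 months ago, so users get at least 6 months to upgrade.
minimum = max((v for v, d in releases.items() if months_old(d) > 6),
              key=lambda v: releases[v])
```

Here `in_window` is `['2.4']` while `minimum` is `'2.3'`, which is exactly the zarr scenario described above.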
I'll see if I can gain some understanding of the motivation for this particular language over on the NumPy tracker...","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4179/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
644821435,MDU6SXNzdWU2NDQ4MjE0MzU=,4176,Pre-expand data and attributes in DataArray/Variable HTML repr?,1217238,closed,0,,,7,2020-06-24T18:22:35Z,2020-09-21T20:10:26Z,2020-06-28T17:03:40Z,MEMBER,,,,"## Proposal
Given that a major purpose for plotting an array is to look at data or attributes, I wonder if we should expand these sections by default?
- I worry that clicking on icons to expand sections may not be easy to discover
- This would also be consistent with the text repr, which shows these sections by default (the Dataset repr is already consistent by default between text and HTML)
## Context
Currently the HTML repr for DataArray/Variable looks like this:
![image](https://user-images.githubusercontent.com/1217238/85610183-9e014400-b60b-11ea-8be1-5f9196126acd.png)
To see array data, you have to click on the ![image](https://user-images.githubusercontent.com/1217238/85610286-b7a28b80-b60b-11ea-9496-a4f9d9b048ac.png) icon:
![image](https://user-images.githubusercontent.com/1217238/85610262-b1acaa80-b60b-11ea-9621-17f0bcffb885.png)
(thanks to @max-sixty for making this a little bit more manageably sized in https://github.com/pydata/xarray/pull/3905!)
There's also a really nice repr for nested dask arrays:
![image](https://user-images.githubusercontent.com/1217238/85610598-fcc6bd80-b60b-11ea-8b1a-5cf950449dcb.png)
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4176/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
127068208,MDU6SXNzdWUxMjcwNjgyMDg=,719,Follow-ups on MultiIndex support,1217238,closed,0,,,7,2016-01-17T01:42:59Z,2019-02-23T09:47:00Z,2019-02-23T09:47:00Z,MEMBER,,,,"xref #702
- [ ] Serialization to NetCDF
- [x] Better repr, showing level names/dtypes?
- [x] Indexing a scalar at a particular level should drop that level from the MultiIndex (#767)
- [x] Make levels accessible as coordinate variables (e.g., `ds['time']` can pull out the `'time'` level of a multi-index)
- [x] Support indexing with levels, e.g., `ds.sel(time='2000-01')`.
- [x] ~~Make `isel_points`/`sel_points` return objects with a MultiIndex?
(probably after the previous TODO, so we can preserve basic backwards compatibility)~~ (deferred until we figure out #974)
- [x] Add `set_index`/`reset_index`/`swaplevel` to make it easier to create and manipulate multi-indexes ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/719/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
68759727,MDU6SXNzdWU2ODc1OTcyNw==,392,Non-aggregating grouped operations on dask arrays are painfully slow to construct,1217238,closed,0,,,7,2015-04-15T18:45:28Z,2019-02-01T23:06:35Z,2019-02-01T23:06:35Z,MEMBER,,,,"These are both entirely lazy operations:
```
>>> %time res = ds.groupby('time.month').mean('time')
CPU times: user 142 ms, sys: 20.3 ms, total: 162 ms
Wall time: 159 ms
>>> %time res = ds.groupby('time.month').apply(lambda x: x - x.mean())
CPU times: user 46.1 s, sys: 4.9 s, total: 51 s
Wall time: 50.4 s
```
I suspect the issue (in part) is that [_interleaved_concat_slow](https://github.com/xray/xray/blob/e22468f51c2b6ceb0fc4d71b657ee64d0a0c315b/xray/core/ops.py#L113) indexes out single elements from each dask array along the grouped axis prior to concatenating them together (unit tests for interleaved_concat can be found [here](https://github.com/xray/xray/blob/e22468f51c2b6ceb0fc4d71b657ee64d0a0c315b/xray/test/test_ops.py#L81)). So we end up creating way too many small dask arrays. Profiling results on slightly smaller data are in [this gist](https://gist.github.com/shoyer/bfdda77549dcead3e996). It would be great if we could figure out a way to make this faster, because these sorts of operations are a really nice showcase for xray + dask. CC @mrocklin in case you have any ideas.
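For reference, a numpy-only sketch of a faster approach: instead of indexing out single elements, concatenate each group once and restore the interleaved order with a single `take` using the inverted permutation (the function name and signature here are mine, not xray's):

```python
import numpy as np

def interleaved_concat(arrays, indices, axis=0):
    # Concatenate whole blocks once, then restore the interleaved
    # order with a single take along the axis.
    combined = np.concatenate(arrays, axis=axis)
    order = np.concatenate([np.asarray(i) for i in indices])
    inverse = np.empty(len(order), dtype=int)
    inverse[order] = np.arange(len(order))
    return combined.take(inverse, axis=axis)
```

Since this only uses `concatenate` and `take`, the same pattern should map onto dask's lazy equivalents instead of building thousands of tiny graphs.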
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/392/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
148757289,MDU6SXNzdWUxNDg3NTcyODk=,824,Disable lock=True in open_mfdataset when reading netCDF3 files,1217238,closed,0,,,7,2016-04-15T20:14:07Z,2019-01-30T04:37:50Z,2019-01-30T04:37:36Z,MEMBER,,,,"This slows things down unnecessarily.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/824/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
33639540,MDU6SXNzdWUzMzYzOTU0MA==,133,Functions for converting to and from CDAT cdms2 variables,1217238,closed,0,,,7,2014-05-16T01:09:14Z,2015-04-24T22:39:03Z,2014-12-19T09:11:39Z,MEMBER,,,,"Apparently CDAT has a number of useful modules for working with weather and climate data, especially for things like computing climatologies (related: #112). There's no point in duplicating that work in xray, of course (also, climatologies may be too domain specific for xray), so we should make it possible to use both xray and CDAT interchangeably. Unfortunately, I haven't used CDAT, so it's not obvious to me what the right interface is. Also, CDAT seems to be somewhat difficult (impossible?) to install as a Python library, so it may be hard to set up automated testing.
CC @DamienIrving ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/133/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
58310637,MDU6SXNzdWU1ODMxMDYzNw==,328,Support out-of-core computation using dask,1217238,closed,0,,987654,7,2015-02-20T05:02:22Z,2015-04-17T21:03:12Z,2015-04-17T21:03:12Z,MEMBER,,,,"[Dask](https://github.com/ContinuumIO/dask) is a library for out-of-core computation somewhat similar to [biggus](https://github.com/scitools/biggus) in conception, but with slightly grander aspirations. For examples of how Dask could be applied to weather data, see this blog post by @mrocklin: http://matthewrocklin.com/blog/work/2015/02/13/Towards-OOC-Slicing-and-Stacking/ It would be interesting to explore using dask internally in xray, so that we can implement lazy/out-of-core aggregations, concat and groupby to complement the existing lazy indexing. This functionality would be quite useful for xray, even more so than merely supporting datasets-on-disk (#199). A related issue is #79: we can easily imagine using Dask with groupby/apply to power out-of-core and multi-threaded computation.
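As a toy illustration of the out-of-core idea (this is the streaming pattern dask enables, not dask's actual implementation): an aggregation over chunked data only ever needs one block plus a small running state in memory.

```python
import numpy as np

def blocked_mean(blocks):
    # Stream over blocks one at a time, keeping only a running
    # sum and count in memory instead of the full array.
    total, count = 0.0, 0
    for block in blocks:
        total += float(block.sum())
        count += block.size
    return total / count
```

Here `blocks` could just as well be a generator reading one chunk at a time from disk, which is exactly the access pattern netCDF slicing already provides.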
Todos for xray:
- [x] refactor `Variable.concat` to make use of functions like `concatenate` and `stack` instead of in-place array modification (Dask arrays do not support mutation, for good reasons)
- [x] refactor `reindex_variables` to not make direct use of mutation (e.g., by using `da.insert` below)
- [x] add some sort of internal abstraction to represent ""computable"" arrays that are not necessarily `numpy.ndarray` objects (done: this is the `data` attribute)
- [x] expose `reblock` in the public API
- [x] load datasets into dask arrays from disk
- [x] load dataset from multiple files into dask
- [x] ~~some sort of API for user-controlled lazy apply on dask arrays (using groupby, most likely)~~ (not necessary for initial release)
- [x] save from dask arrays
- [x] an API for lazy ufuncs like `sin` and `sqrt`
- [x] robustly handle indexing along orthogonal dimensions if dask can't handle it directly.
Todos for dask (to be clear, none of these are blockers for a proof of concept):
- [x] support for NaN-skipping aggregations
- [x] ~~support for interleaved concatenation (necessary for transformations by group, which are quite common)~~ (turns out to be a one-liner with concatenate and take, see below)
- [x] ~~support for something like `take_nd` from pandas: like `np.take`, but with -1 as a sentinel value for ""missing"" (necessary for many alignment operations)~~ `da.insert`, modeled after `np.insert`, would solve this problem.
- [x] ~~support ""orthogonal"" MATLAB-like array-based indexing along multiple dimensions~~ (taking along one axis at a time is close enough)
- [x] `broadcast_to`: see https://github.com/numpy/numpy/pull/5371 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/328/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue