html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3213#issuecomment-1534695467,https://api.github.com/repos/pydata/xarray/issues/3213,1534695467,IC_kwDOAMm_X85beZgr,1634164,2023-05-04T12:31:22Z,2023-05-04T12:31:22Z,NONE,"That's a totally valid scope limitation for the sparse package, and I understand the motivation.

I'm just saying that the [principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment) is not being followed. At the moment, a user cannot read either the xarray or the sparse docs and learn which parts of the xarray API will work when passing `…, sparse=True` and which instead require a deliberate choice to densify. Nor can they find examples of how best to mix the two. Clarifying this would be helpful; that's all.","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,479942077
https://github.com/pydata/xarray/issues/3213#issuecomment-1534231523,https://api.github.com/repos/pydata/xarray/issues/3213,1534231523,IC_kwDOAMm_X85bcoPj,1634164,2023-05-04T07:40:26Z,2023-05-04T07:40:26Z,NONE,"@jbbutler please also see this comment et seq. https://github.com/pydata/sparse/issues/1#issuecomment-792342987 and related pydata/sparse#438.

To add to @rabernat's point about sparse support being ""not well documented"", I suspect (but don't know, as I'm just a user of xarray, not a developer) that it's also not thoroughly *tested*. I expected to be able to use e.g. `DataArray.cumprod` when the underlying data was sparse, but could not.

IMHO, I/O to/from sparse-backed objects is less valuable if only a small subset of xarray functionality is available on those objects. Perhaps explicitly testing/confirming which parts of the API do/do not currently work with sparse would support the improvements to the docs that Ryan mentioned, and reveal the work remaining to provide full(er) support.

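A minimal sketch of such a check follows. It does not assume that the optional sparse package is installed. It runs a short, illustrative list of DataArray methods on a dense copy of some data and, if sparse is available, on a sparse-backed copy, and it records which methods raise. The method list and the helper names `CHECKS` and `probe` are hypothetical and chosen only for illustration:

```python
import numpy as np
import pandas as pd
import xarray as xr

try:
    import sparse  # noqa: F401  (optional dependency)
    HAVE_SPARSE = True
except ImportError:
    HAVE_SPARSE = False

# Illustrative checklist: each entry maps a label to a call on a DataArray
# with dimensions 'x' and 'y'. Extend it with the methods your code relies on.
CHECKS = {
    'sum': lambda da: da.sum('x'),
    'mean': lambda da: da.mean('x'),
    'cumsum': lambda da: da.cumsum('x'),
    'cumprod': lambda da: da.cumprod('x'),
    'fillna': lambda da: da.fillna(0.0),
    'to_series': lambda da: da.to_series(),
}


def probe(da):
    # Run every check and record 'ok' or the exception that was raised.
    results = {}
    for name, func in CHECKS.items():
        try:
            func(da)
            results[name] = 'ok'
        except Exception as err:  # any failure counts as unsupported
            results[name] = f'{type(err).__name__}: {err}'
    return results


# A 3 x 3 grid with 4 of its 9 cells missing, i.e. partially sparse input.
index = pd.MultiIndex.from_product([['a', 'b', 'c'], [0, 1, 2]], names=['x', 'y'])
s = pd.Series(np.arange(9.0), index=index).iloc[::2]

report = {'dense': probe(xr.DataArray.from_series(s))}
if HAVE_SPARSE:
    report['sparse'] = probe(xr.DataArray.from_series(s, sparse=True))

for backend, results in report.items():
    for name, outcome in results.items():
        print(f'{backend:<7}{name:<10}{outcome}')
```

A passing entry only means that no exception was raised. It does not confirm that the sparse result matches the dense one.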
","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,479942077
https://github.com/pydata/xarray/issues/6822#issuecomment-1209127586,https://api.github.com/repos/pydata/xarray/issues/6822,1209127586,IC_kwDOAMm_X85IEdKi,1634164,2022-08-09T09:17:39Z,2022-08-09T09:17:39Z,NONE,Thanks @Illviljan for the fix! 🙏🏾 ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1316423844
https://github.com/pydata/xarray/issues/5648#issuecomment-896866803,https://api.github.com/repos/pydata/xarray/issues/5648,896866803,IC_kwDOAMm_X841dRnz,1634164,2021-08-11T14:18:05Z,2021-08-11T14:18:05Z,NONE,👂🏾 ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,956103236
https://github.com/pydata/xarray/issues/3381#issuecomment-541160571,https://api.github.com/repos/pydata/xarray/issues/3381,541160571,MDEyOklzc3VlQ29tbWVudDU0MTE2MDU3MQ==,1634164,2019-10-11T17:49:09Z,2019-10-11T17:49:09Z,NONE,"Thanks both for the comments. I understand sparse's behaviour; to clarify, the bug (IMO) is that xarray doesn't handle this for the user. To condense my example:
```python
# Same as in my example above, down to the --- line
import numpy as np
import pandas as pd
import xarray as xr

foo = [f'foo{i}' for i in range(6)]
bar = [f'bar{i}' for i in range(6)]
raw = np.random.rand(len(foo) // 2, len(bar))

b_series = pd.DataFrame(raw, index=foo[3:], columns=bar) \
             .stack() \
             .rename_axis(index=['foo', 'bar'])
# ---

b = xr.DataArray.from_series(b_series, sparse=True)
c = b.sum(dim='foo').expand_dims({'foo': ['total']})
d = xr.concat([b, c], dim='foo')
```

This succeeds when `sparse=False` and fails when `sparse=True`.
- Shouldn't it succeed automatically? I feel like it should.
- If it does, what should the fill value of `d` be? I'm not clear on what the intended behaviour is.

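Until xarray handles this automatically, one workaround is to densify explicitly before the operations that fail. The sketch below assumes that a `todense()` method on the underlying array is an acceptable way to detect and convert sparse data, as `sparse.COO` provides. `densify` and `HAVE_SPARSE` are hypothetical names, not xarray API. If the sparse package is not installed, the sketch falls back to dense input:

```python
import numpy as np
import pandas as pd
import xarray as xr

try:
    import sparse  # noqa: F401  (optional dependency)
    HAVE_SPARSE = True
except ImportError:
    HAVE_SPARSE = False


def densify(da):
    # Hypothetical helper: duck-types on a todense() method and returns a
    # copy backed by a dense NumPy array; dense input is returned unchanged.
    if hasattr(da.data, 'todense'):
        return da.copy(data=da.data.todense())
    return da


foo = [f'foo{i}' for i in range(6)]
bar = [f'bar{i}' for i in range(6)]
raw = np.random.rand(len(foo) // 2, len(bar))
b_series = (
    pd.DataFrame(raw, index=foo[3:], columns=bar)
    .stack()
    .rename_axis(index=['foo', 'bar'])
)

b = xr.DataArray.from_series(b_series, sparse=HAVE_SPARSE)
# Densify explicitly before the sum/concat steps that failed with sparse data.
b_dense = densify(b)
c = b_dense.sum(dim='foo').expand_dims({'foo': ['total']})
d = xr.concat([b_dense, c], dim='foo')
```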
I haven't touched xarray internals before, but if time allows I will try to add some tests.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,503711327
https://github.com/pydata/xarray/issues/3245#issuecomment-539215442,https://api.github.com/repos/pydata/xarray/issues/3245,539215442,MDEyOklzc3VlQ29tbWVudDUzOTIxNTQ0Mg==,1634164,2019-10-07T21:37:53Z,2019-10-07T21:37:53Z,NONE,"As far as I can tell, the proposal here will require either
```python
s = pd.Series(...)
xr.DataArray.from_series(s).to_series()
```
or:
```python
xr.DataArray.from_series(s, sparse=True).to_dense().to_series()
```

For any code that can't guarantee whether its input will be sparse or dense, the first form will sometimes fail, so the latter will have to be written everywhere, which IMO is unnecessarily verbose.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,484240082
https://github.com/pydata/xarray/issues/3213#issuecomment-520741706,https://api.github.com/repos/pydata/xarray/issues/3213,520741706,MDEyOklzc3VlQ29tbWVudDUyMDc0MTcwNg==,1634164,2019-08-13T08:31:30Z,2019-08-13T08:31:30Z,NONE,"This is very exciting! In energy-economic research (unlike, e.g., earth systems research), data are almost *always* sparse, so first-class sparse support will be broadly useful.

I'm leaving a comment here (since this seems to be a meta-issue; please link from wherever else, if needed) with two example use-cases. For the moment, #3206 seems to cover them, so I can't name any specific additional features.

1. [MESSAGEix](http://message.iiasa.ac.at/en/stable/) is an energy systems optimization model framework, formulated as a linear program.
   - Some variables have many dimensions; for instance, the input coefficient for a technology has the dimensions `(node_loc, technology, year_vintage, year_active, mode, node_origin, commodity, level, time, time_origin)`.
     - In the global version of our model, the `technology` dimension has over 400 labels.
     - Often two or more dimensions are tied; e.g., `technology='coal power plant'` will only take input from `(commodity='coal', level='primary energy')`, and all other combinations of `(commodity, level)` are empty for this `technology`.
     - So, this data is inherently sparse.
   - For modeling research, specifying quantities in this way is a good design because (a) it is intuitive to researchers in this domain, and (b) the optimization model is solved using various LP solvers via GAMS, which automatically prune zero rows in the resulting matrices.
   - When we were developing a dask/DAG-based [system for model results post-processing](http://message.iiasa.ac.at/en/stable/reporting.html), we wanted to use xarray, but had some quantities with tens of millions of elements that were less than 1% full. [Here is some test code](https://github.com/iiasa/ixmp/blob/82ae6c92a2076a25d54d64a04c20aff653f4309b/tests/test_reporting.py#L430-L489) that triggered `MemoryError`s with xarray. We chose to fall back on a `pd.Series` subclass that mocks xarray methods.

2. In transportation research, stock models of vehicle fleets are often used.
    - These models always have at least two time dimensions: `cohort` (the time period in which a vehicle was sold) and `period`(s) in which it is used (and thus consumes fuel, etc.).
    - Since a vehicle sold in 2020 can't be used in 2015, these data are always triangular w.r.t. these two dimensions. (The dimensions `year_vintage` and `year_active` in example #1 above have the same relationship.)
    - Once these are multiplied by other dimensions (technology; fuel; size, shape, or market segment; embodied materials; different variables; model runs across various scenarios or input assumptions), the memory overhead of dense arrays can become problematic.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,479942077
https://github.com/pydata/xarray/issues/1761#issuecomment-372148226,https://api.github.com/repos/pydata/xarray/issues/1761,372148226,MDEyOklzc3VlQ29tbWVudDM3MjE0ODIyNg==,1634164,2018-03-11T20:53:49Z,2018-03-11T20:59:08Z,NONE,"~Also experiencing this, though for a different method & version of bottleneck:~

Sorry, it turns out this was due to a Python 3.5 → 3.6 upgrade without reinstalling the pip packages. Please disregard!

```
$ pip list | grep -Ei ""(bottleneck|xarray)""
Bottleneck                         1.2.1       
xarray                             0.10.1      
$ python3 -c ""import xarray""
Traceback (most recent call last):
  File ""<string>"", line 1, in <module>
  File ""/home/khaeru/.local/lib/python3.6/site-packages/xarray/__init__.py"", line 10, in <module>
    from .core.extensions import (register_dataarray_accessor,
  File ""/home/khaeru/.local/lib/python3.6/site-packages/xarray/core/extensions.py"", line 7, in <module>
    from .dataarray import DataArray
  File ""/home/khaeru/.local/lib/python3.6/site-packages/xarray/core/dataarray.py"", line 16, in <module>
    from . import rolling
  File ""/home/khaeru/.local/lib/python3.6/site-packages/xarray/core/rolling.py"", line 377, in <module>
    inject_bottleneck_rolling_methods(DataArrayRolling)
  File ""/home/khaeru/.local/lib/python3.6/site-packages/xarray/core/ops.py"", line 362, in inject_bottleneck_rolling_methods
    f = getattr(bn, bn_name)
AttributeError: module 'bottleneck' has no attribute 'move_sum'
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,279456192
https://github.com/pydata/xarray/pull/401#issuecomment-221960807,https://api.github.com/repos/pydata/xarray/issues/401,221960807,MDEyOklzc3VlQ29tbWVudDIyMTk2MDgwNw==,1634164,2016-05-26T18:51:06Z,2016-05-26T18:51:06Z,NONE,"@jhamman thanks for taking this up and finishing it!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,70805273
https://github.com/pydata/xarray/pull/806#issuecomment-202527380,https://api.github.com/repos/pydata/xarray/issues/806,202527380,MDEyOklzc3VlQ29tbWVudDIwMjUyNzM4MA==,1634164,2016-03-28T18:52:01Z,2016-03-28T18:53:35Z,NONE,"@fmaussion that's still helpful, thanks.

> For cases where you want to do custom initialization, the suggestion (which I should add) is to simply write your own function to use in place of `xarray.open_dataset`.

Now that I think of it, it should also be possible to use some other logic in `__init__()` (or even, as a kludge, to store something like `xarray_obj.attrs['_geoaccessor_state']`) to determine whether the object is already, or needs to be, ""initialized"" (whatever that happens to mean for each accessor).

For instance, if the accessor creates and uses certain variables in a Dataset, it could check for their presence, and skip any initialization code if they already exist.
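
A minimal sketch of that presence check follows. It assumes a hypothetical accessor registered as `geo` whose initialization adds a variable named `_geo_weight`. The cosine weighting is only a placeholder for whatever initialization an accessor actually needs:

```python
import numpy as np
import xarray as xr


# 'geo' and '_geo_weight' are hypothetical names used only for illustration.
@xr.register_dataset_accessor('geo')
class GeoAccessor:
    def __init__(self, xarray_obj):
        self._obj = xarray_obj

    @property
    def initialized(self):
        # The Dataset counts as initialized once the derived variable exists,
        # so the check also works on copies of an initialized Dataset.
        return '_geo_weight' in self._obj.data_vars

    def ensure_initialized(self):
        # Run the (placeholder) initialization code only if it is needed;
        # this adds the variable to the Dataset in place.
        if not self.initialized:
            self._obj['_geo_weight'] = np.cos(np.deg2rad(self._obj['lat']))
        return self._obj
```

Calling `ds.geo.ensure_initialized()` repeatedly is then harmless, because the second and later calls find `_geo_weight` and skip the work.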
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,143877458
https://github.com/pydata/xarray/pull/806#issuecomment-202473784,https://api.github.com/repos/pydata/xarray/issues/806,202473784,MDEyOklzc3VlQ29tbWVudDIwMjQ3Mzc4NA==,1634164,2016-03-28T16:31:59Z,2016-03-28T16:31:59Z,NONE,"Of the two different projects I'm working (sporadically) on that both subclass Dataset, it seems like one (pyGDX) should more properly be a backend, while the other could work as an accessor. This code looks good!

Just to be clear: `xarray_obj` is passed to the `__init__()` method of an accessor. Will this happen before or after `Dataset.__init__()`/`DataArray.__init__()` is invoked?
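
One way to check this empirically is to record when the accessor's `__init__()` runs. The sketch below registers a hypothetical DataArray accessor named `probe`. It assumes that accessors are created on first attribute access and then cached on the object, and the printed values would confirm or refute that assumption:

```python
import xarray as xr

# Records the dims of every DataArray for which the accessor is constructed.
created = []


# 'probe' is a hypothetical accessor name used only for this check.
@xr.register_dataarray_accessor('probe')
class ProbeAccessor:
    def __init__(self, xarray_obj):
        # When this runs, xarray_obj is an already-constructed DataArray.
        created.append(xarray_obj.dims)
        self._obj = xarray_obj


da = xr.DataArray([1, 2, 3], dims='x')
before = len(created)  # accessors constructed by DataArray construction alone
first = da.probe       # first attribute access
second = da.probe      # second access on the same object
print(before, len(created), first is second)  # prints 0 1 True
```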
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,143877458
https://github.com/pydata/xarray/issues/805#issuecomment-202469160,https://api.github.com/repos/pydata/xarray/issues/805,202469160,MDEyOklzc3VlQ29tbWVudDIwMjQ2OTE2MA==,1634164,2016-03-28T16:19:05Z,2016-03-28T16:19:05Z,NONE,"@jhamman — you're right. In truth, I was working with some more complex code using a PeriodIndex and getting errors I couldn't decipher, so I pulled those lines from the docs and played with them to try to understand what was happening. I don't know why it's that way in the docs…maybe because `ds['reference_time']` or `ds.reference_time` is more concise than `ds.attrs['reference_time']`?

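For reference, here is a minimal sketch of the two ways of storing such a value. The first stores it as a scalar coordinate, which is what `ds['reference_time']` in the docs example suggests. The second stores it as a Dataset attribute. The dataset contents are made up for illustration. One visible difference is that a coordinate travels with a data variable extracted from the Dataset, while a Dataset attribute does not:

```python
import pandas as pd
import xarray as xr

# Stored as a scalar coordinate, i.e. a variable of the Dataset.
ds_coord = xr.Dataset(
    {'temperature': ('x', [10.0, 12.0])},
    coords={'reference_time': pd.Timestamp('2014-09-05')},
)
# Stored as Dataset metadata instead.
ds_attr = xr.Dataset(
    {'temperature': ('x', [10.0, 12.0])},
    attrs={'reference_time': '2014-09-05'},
)

# The scalar coordinate is carried along with the extracted variable...
print(ds_coord['temperature'].coords['reference_time'].values)
# ...while the Dataset attribute stays on the Dataset only.
print(ds_attr.attrs['reference_time'], ds_attr['temperature'].attrs)
```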
@shoyer — thanks!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,143764621
https://github.com/pydata/xarray/pull/401#issuecomment-96086024,https://api.github.com/repos/pydata/xarray/issues/401,96086024,MDEyOklzc3VlQ29tbWVudDk2MDg2MDI0,1634164,2015-04-24T22:40:17Z,2015-04-24T22:40:17Z,NONE,"Thanks—putting this up was evidently the fastest ways to get pointers to those examples in the code!

I'll add those items and comment again once I have.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,70805273