id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1617395129,I_kwDOAMm_X85gZ325,7601,groupby_bins groups not correctly applied with built-in methods,69774,closed,0,,,3,2023-03-09T14:44:15Z,2023-03-29T16:28:30Z,2023-03-29T16:28:30Z,NONE,,,,"### What happened?
### Setup
I want to calculate image statistics per chunk in one dimension. Let's assume a very small image for demonstration purposes:
```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(12).reshape(6, 2), dims=('x', 'y'))
a
```

To chunk this into three subimages, I use these bins along the x dimension:
```python
x_bins = (0, 2, 4, 6)
```
I look at the groups this creates by default:
```python
for iv, g in a.groupby_bins('x', x_bins):
    print(iv)
    print(g)
```

I don't understand the use-case for this grouping: the first row (x=0) is missing entirely, and the last group ends up with an uneven size (obviously a follow-on error from dropping the first row).
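For context, `groupby_bins` appears to follow `pandas.cut` semantics, where bins are right-closed by default, so the value `x=0` falls outside `(0, 2]` and is silently dropped (a sketch of my understanding, not verified against the xarray source):

```python
import pandas as pd

# With the default right=True, the bins are (0, 2], (2, 4], (4, 6]:
# the value 0 matches no interval and becomes NaN, and 2 lands in (0, 2].
print(pd.cut([0, 1, 2, 3, 4, 5], bins=(0, 2, 4, 6)))
```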
To force the even chunking of the image I need to call it with these parameters:
```python
groups = a.groupby_bins('x', x_bins, include_lowest=True, right=False)
for iv, g in groups:
    print(iv)
    print(g)
```

### Issue
But now, when I calculate the mean value of each group, I get different results depending on whether I do it by hand by looping over the groups or use the groups' built-in method `mean()`:

Indeed, I verified that these results are what one gets when applying the bins the first (default) way:

The same is true when I use the ellipsis operator (`...`) to take the mean over all remaining dimensions (note that the 2nd cell here uses the `groups` variable as defined in the cell before, so it should really return the same values, but it doesn't):

### Application
I believe that `groupby_bins` is the most appropriate tool for this in xarray. I wished one could make the dask chunks of a dask array survive and return statistics per chunk, but I haven't found a way to do that.
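As a side note, for this particular even-chunk statistics use case, `coarsen` might be a workaround (just an alternative I considered, not a fix for the bug above):

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(12).reshape(6, 2), dims=('x', 'y'))
# Mean over non-overlapping windows of 2 rows along x,
# i.e. the per-chunk means I am after here.
print(a.coarsen(x=2).mean())
```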
### What did you expect to happen?
That the built-in stats methods of the `groups` object respect the interval constraints (`include_lowest`, `right`) from the `groupby_bins` call.
I have also verified that the same problem exists with `groups.std()`.
### Minimal Complete Verifiable Example
```Python
import xarray as xr
import numpy as np
a = xr.DataArray(np.arange(12).reshape(6,2), dims=('x', 'y'))
x_bins = (0, 2, 4, 6)
default_groups = a.groupby_bins('x', x_bins)
my_groups = a.groupby_bins('x', x_bins, include_lowest=True, right=False)
print(""Weird grouping using default call:"")
for iv, g in default_groups:
print(""Interval:"",iv)
print(g.data)
print()
print(""Evenly chunked using `my_groups`:"")
for iv, g in my_groups:
print(""Interval:"", iv)
print(g.data)
print()
print(""Calculating mean on my own using loop over groups:"")
for iv, g in my_groups:
print(g.mean('x').data)
print(""Calculting same using my_groups.mean()"")
print(""No dim given:"")
print(my_groups.mean().data.T)
print(""using mean('x'):"")
print(my_groups.mean('x').data.T)
print(""These results come from the default groups!:"")
for iv, g in default_groups:
print(g.mean('x').data)
print(""STD has the same issue"")
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 6.0.12-76060006-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1
xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.10.1
netCDF4: 1.6.3
pydap: None
h5netcdf: None
h5py: 3.8.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.6
cfgrib: None
iris: None
bottleneck: 1.3.7
dask: 2023.3.0
distributed: 2023.3.0
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: None
fsspec: 2023.3.0
cupy: None
pint: None
sparse: None
flox: 0.6.8
numpy_groupies: 0.9.20
setuptools: 67.5.1
pip: 23.0.1
conda: installed
pytest: 7.1.3
mypy: None
IPython: 8.7.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7601/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
662505658,MDU6SXNzdWU2NjI1MDU2NTg=,4240,jupyter repr caching deleted netcdf file,69774,closed,0,,,9,2020-07-21T02:50:04Z,2022-10-18T16:40:41Z,2022-10-18T16:40:41Z,NONE,,,,"**What happened**:
Testing xarray data storage in a Jupyter notebook with varying data sizes and storing to netCDF, I noticed that `open_dataset`/`open_dataarray` (both show this behaviour) continue to return data from the first testing run, ignoring the fact that each run deletes the previously created netCDF file.
This only happens once the `repr` was used to display the xarray object.
But once in this error mode, even objects that previously printed fine then show the wrong data.
This was hard to track down, as it depends on the precise execution sequence in Jupyter.
**What you expected to happen**:
when I use `open_dataset`/`open_dataarray`, the resulting object should reflect reality on disk.
**Minimal Complete Verifiable Example**:
```python
import xarray as xr
from pathlib import Path
import numpy as np
def test_repr(nx):
    ds = xr.DataArray(np.random.rand(nx))
    path = Path(""saved_on_disk.nc"")
    if path.exists():
        path.unlink()
    ds.to_netcdf(path)
    return path
```
When executed in a cell with print for display, all is fine:
```python
test_repr(4)
print(xr.open_dataset(""saved_on_disk.nc""))
test_repr(5)
print(xr.open_dataset(""saved_on_disk.nc""))
```
but as soon as one cell used the jupyter repr:
```python
xr.open_dataset(""saved_on_disk.nc"")
```
all future file reads, even after executing the test function again and even using `print` and not `repr`, show the data from the last repr use.
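A possible workaround (my assumption; I have not traced xarray's file caching mechanism) is to always open the file via a context manager, so the handle is closed again instead of being kept open by the repr:

```python
import numpy as np
import xarray as xr

xr.DataArray(np.arange(3), name='v').to_netcdf('saved_on_disk.nc')
# Loading the data and closing the dataset explicitly may avoid
# a stale cached file handle being reused on subsequent opens.
with xr.open_dataset('saved_on_disk.nc') as ds:
    data = ds.load()
print(data)
```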
**Anything else we need to know?**:
Here's a notebook showing the issue:
https://gist.github.com/05c2542ed33662cdcb6024815cc0c72c
**Environment**:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-40-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.0
pandas: 1.0.5
numpy: 1.19.0
scipy: 1.5.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: None
iris: None
bottleneck: None
dask: 2.21.0
distributed: 2.21.0
matplotlib: 3.3.0
cartopy: 0.18.0
seaborn: 0.10.1
numbagg: None
pint: None
setuptools: 49.2.0.post20200712
pip: 20.1.1
conda: installed
pytest: 6.0.0rc1
IPython: 7.16.1
sphinx: 3.1.2
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4240/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
657792526,MDExOlB1bGxSZXF1ZXN0NDQ5ODUxMTk4,4230,provide set_option `collapse_html` to control HTML repr collapsed state,69774,closed,0,,,15,2020-07-16T02:29:07Z,2021-05-13T17:02:55Z,2021-05-13T17:02:54Z,NONE,,1,pydata/xarray/pulls/4230,"
- [x] Closes #4229
- [x] Tests added
- [ ] Passes `isort -rc . && black . && mypy . && flake8`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4230/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
657769716,MDU6SXNzdWU2NTc3Njk3MTY=,4229,FR: Provide option for collapsing the HTML display in notebooks,69774,closed,0,,,1,2020-07-16T01:27:15Z,2021-04-27T01:37:54Z,2021-04-27T01:37:54Z,NONE,,,,"# Issue description
The overly long output of xarray's text repr always bugged me, so I was very happy that the recently implemented HTML repr collapsed the data part, and equally sad to see that 0.16.0 reverted that (IMHO correct) design decision, presumably to align it with the text repr.
# Suggested solution
As opinions will vary on what a good repr should do, I would like an option, similar to the existing `xarray.set_options`, that lets me control whether the data part (and maybe other parts?) appears collapsed in the HTML repr.
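A sketch of the kind of API I have in mind, analogous to the existing `display_style` option (the new option name here is only a placeholder, nothing is implemented yet):

```python
import xarray as xr

# Existing option, for reference: switch between 'text' and 'html' repr.
with xr.set_options(display_style='html'):
    pass
# Proposed (hypothetical) option, spelled the same way:
#     xr.set_options(collapse_html=True)
```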
# Additional questions
* Is it worth considering this as well for the text repr? Or is that harder to implement?
Any guidance on
* which files need to change
* potential pitfalls
would be welcome. I'm happy to work on this, as I seem to be the only one not liking the current implementation.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4229/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
650231649,MDU6SXNzdWU2NTAyMzE2NDk=,4194,AttributeError accessing data_array.variable,69774,closed,0,,,3,2020-07-02T22:16:58Z,2020-07-02T22:34:01Z,2020-07-02T22:29:48Z,NONE,,,,"**What happened**:
Accessing the `variable` attribute seems to work, but it also throws an AttributeError:
```
AttributeError: 'Variable' object has no attribute 'variable'
```
**What you expected to happen**:
According to https://xarray.pydata.org/en/stable/terminology.html all DataArrays should have ""an underlying variable that can be accessed via `arr.vairable`"", so I tried that out and got the error.
**Minimal Complete Verifiable Example**:
Using the example code from the docs:
```python
import numpy as np
import xarray as xr

data = xr.DataArray(np.random.randn(2, 3), dims=('x', 'y'), coords={'x': [10, 20]})
data.variable
```
**Environment**:
Python 3.7 on Kubuntu 20.04 using Brave browser in Jupyterlab, up-to-date conda env.
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-40-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.15.1
pandas: 1.0.5
numpy: 1.18.5
scipy: 1.5.0
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: None
iris: None
bottleneck: None
dask: 2.19.0
distributed: 2.19.0
matplotlib: 3.2.2
cartopy: 0.18.0
seaborn: 0.10.1
numbagg: None
setuptools: 47.3.1.post20200616
pip: 20.1.1
conda: installed
pytest: 5.4.3
IPython: 7.16.1
sphinx: 3.1.1

","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4194/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue