issues


5 rows where user = 69774 sorted by updated_at descending




type (2 values)

  • issue: 4
  • pull: 1

state (1 value)

  • closed: 5

repo (1 value)

  • xarray: 5
#7601 (id 1617395129): groupby_bins groups not correctly applied with built-in methods
opened by michaelaye (69774) · state: closed · 3 comments · created 2023-03-09T14:44:15Z · updated 2023-03-29T16:28:30Z · closed 2023-03-29T16:28:30Z · author_association: NONE

What happened?

Setup

I want to calculate image statistics per chunk in one dimension. Let's assume a very small image for demonstration purposes:

```python
a = xr.DataArray(np.arange(12).reshape(6, 2), dims=('x', 'y'))
a
```

Trying to chunk this into three sub-images, I use these bins for the x dimension:

```python
x_bins = (0, 2, 4, 6)
```

I look at the groups this creates by default:

```python
for iv, g in a.groupby_bins('x', x_bins):
    print(iv)
    print(g)
```

I don't understand the use case for this grouping: it is missing the beginning, and the last group is unevenly sized (obviously a follow-on error from not including the first row).

To force the even chunking of the image I need to call it with these parameters:

```python
groups = a.groupby_bins('x', x_bins, include_lowest=True, right=False)
for iv, g in groups:
    print(iv)
    print(g)
```
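Under the hood, groupby_bins uses pandas.cut interval semantics, so the difference between the two calls can be illustrated with pandas alone; a minimal sketch using the same bin edges:

```python
import pandas as pd

x = list(range(6))        # the x coordinate values 0..5
edges = (0, 2, 4, 6)

# Default: right-closed intervals (0,2], (2,4], (4,6] -- the value 0
# falls into no bin, so the first row is dropped.
default = pd.cut(x, bins=edges)

# include_lowest=True, right=False: left-closed [0,2), [2,4), [4,6),
# giving three even chunks of two rows each.
even = pd.cut(x, bins=edges, include_lowest=True, right=False)

print(pd.isna(default).sum())                   # 1 (the dropped x=0)
print(pd.Series(even).value_counts().tolist())  # [2, 2, 2]
```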

Issue

But now, when calculating the mean value of each group, I get different results depending on whether I do it by hand by looping over the groups or via the groups' built-in mean() method:

Indeed, I verified that these results are what one gets with the first (default) version of applying the bins:

The same is true when I use the ellipsis operator to take the mean over the remaining dimensions (note: the second cell here uses the groups variable as defined in the cell before, so it should return the same values, but it doesn't):

Application

I believe that groupby_bins is the most appropriate tool for this in xarray. I wish one could force the dask chunks of dask arrays to survive and return stats computed per chunk, but I haven't found a way to do that.
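As a stopgap, the even per-chunk statistics the setup is after can be computed by slicing along x at the bin edges directly; a minimal numpy-only sketch (plain slicing, not xarray API):

```python
import numpy as np

a = np.arange(12).reshape(6, 2)
edges = (0, 2, 4, 6)

# Left-closed chunks [0,2), [2,4), [4,6) along the x axis (axis 0)
chunk_means = np.array([a[lo:hi].mean(axis=0)
                        for lo, hi in zip(edges[:-1], edges[1:])])
print(chunk_means)   # rows: [1, 2], [5, 6], [9, 10]
```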

What did you expect to happen?

That the built-in stats methods of the groups object respect the interval constraints (include_lowest, right=False) passed in the groupby_bins call.

I also have verified that the same problem exists with groups.std().

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

a = xr.DataArray(np.arange(12).reshape(6, 2), dims=('x', 'y'))

x_bins = (0, 2, 4, 6)

default_groups = a.groupby_bins('x', x_bins)
my_groups = a.groupby_bins('x', x_bins, include_lowest=True, right=False)

print("Weird grouping using default call:")
for iv, g in default_groups:
    print("Interval:", iv)
    print(g.data)
    print()

print("Evenly chunked using my_groups:")
for iv, g in my_groups:
    print("Interval:", iv)
    print(g.data)
    print()

print("Calculating mean on my own using loop over groups:")
for iv, g in my_groups:
    print(g.mean('x').data)

print("Calculating same using my_groups.mean():")
print("No dim given:")
print(my_groups.mean().data.T)
print("Using mean('x'):")
print(my_groups.mean('x').data.T)

print("These results come from the default groups!:")
for iv, g in default_groups:
    print(g.mean('x').data)

print("STD has the same issue")
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0] python-bits: 64 OS: Linux OS-release: 6.0.12-76060006-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.1 xarray: 2023.2.0 pandas: 1.5.3 numpy: 1.23.5 scipy: 1.10.1 netCDF4: 1.6.3 pydap: None h5netcdf: None h5py: 3.8.0 Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None rasterio: 1.3.6 cfgrib: None iris: None bottleneck: 1.3.7 dask: 2023.3.0 distributed: 2023.3.0 matplotlib: 3.7.1 cartopy: 0.21.1 seaborn: 0.12.2 numbagg: None fsspec: 2023.3.0 cupy: None pint: None sparse: None flox: 0.6.8 numpy_groupies: 0.9.20 setuptools: 67.5.1 pip: 23.0.1 conda: installed pytest: 7.1.3 mypy: None IPython: 8.7.0 sphinx: None
Reactions: none
state_reason: completed · repo: xarray (13221727) · type: issue
#4240 (id 662505658): jupyter repr caching deleted netcdf file
opened by michaelaye (69774) · state: closed · 9 comments · created 2020-07-21T02:50:04Z · updated 2022-10-18T16:40:41Z · closed 2022-10-18T16:40:41Z · author_association: NONE

What happened:

Testing xarray data storage in a Jupyter notebook with varying data sizes and storing to netCDF, I noticed that open_dataset/open_dataarray (both show this behaviour) continue to return data from the first testing run, ignoring the fact that each run deletes the previously created netCDF file. This only happens once the repr was used to display the xarray object. But once in this error mode, even objects that previously printed fine then show the wrong data.

This was hard to track down as it depends on the precise sequence in jupyter.

What you expected to happen:

When I use open_dataset/open_dataarray, the resulting object should reflect the reality on disk.

Minimal Complete Verifiable Example:

```python
import xarray as xr
from pathlib import Path
import numpy as np

def test_repr(nx):
    ds = xr.DataArray(np.random.rand(nx))
    path = Path("saved_on_disk.nc")
    if path.exists():
        path.unlink()
    ds.to_netcdf(path)
    return path
```

When executed in a cell with print for display, all is fine:

```python
test_repr(4)
print(xr.open_dataset("saved_on_disk.nc"))
test_repr(5)
print(xr.open_dataset("saved_on_disk.nc"))
```

But as soon as one cell uses the Jupyter repr:

```python
xr.open_dataset("saved_on_disk.nc")
```

all future file reads, even after executing the test function again, and even using print instead of the repr, show the data from the last repr use.
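The behaviour looks like a file cache keyed on the path alone, ignoring that the file was deleted and recreated; a stdlib-only sketch of that failure mode (illustrative only, not xarray's actual caching code):

```python
import tempfile
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=None)        # keyed on the path string only -- no mtime check
def read_cached(path_str):
    return Path(path_str).read_text()

path = Path(tempfile.mkdtemp()) / "saved_on_disk.txt"
path.write_text("run 1")
print(read_cached(str(path)))   # run 1

path.unlink()                   # delete and recreate, like test_repr() does
path.write_text("run 2")
print(read_cached(str(path)))   # still "run 1": a stale cache hit
```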

Anything else we need to know?:

Here's a notebook showing the issue: https://gist.github.com/05c2542ed33662cdcb6024815cc0c72c

Environment:

Output of `xr.show_versions()`:

INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 5.4.0-40-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.0 pandas: 1.0.5 numpy: 1.19.0 scipy: 1.5.1 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.5 cfgrib: None iris: None bottleneck: None dask: 2.21.0 distributed: 2.21.0 matplotlib: 3.3.0 cartopy: 0.18.0 seaborn: 0.10.1 numbagg: None pint: None setuptools: 49.2.0.post20200712 pip: 20.1.1 conda: installed pytest: 6.0.0rc1 IPython: 7.16.1 sphinx: 3.1.2
Reactions: none
state_reason: completed · repo: xarray (13221727) · type: issue
#4230 (id 657792526): provide set_option `collapse_html` to control HTML repr collapsed state
pull request (pydata/xarray/pulls/4230) opened by michaelaye (69774) · state: closed · 15 comments · created 2020-07-16T02:29:07Z · updated 2021-05-13T17:02:55Z · closed 2021-05-13T17:02:54Z · author_association: NONE
  • [x] Closes #4229
  • [x] Tests added
  • [ ] Passes isort -rc . && black . && mypy . && flake8
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
Reactions: +1 × 2
repo: xarray (13221727) · type: pull
#4229 (id 657769716): FR: Provide option for collapsing the HTML display in notebooks
opened by michaelaye (69774) · state: closed · 1 comment · created 2020-07-16T01:27:15Z · updated 2021-04-27T01:37:54Z · closed 2021-04-27T01:37:54Z · author_association: NONE

Issue description

The overly long output of xarray's text repr has always bugged me, so I was very happy that the recently implemented HTML repr collapsed the data part, and equally sad to see that 0.16.0 reverted that (IMHO correct) design decision, presumably to align it with the text repr.

Suggested solution

As opinions will vary on what a good repr should do, I would like to have an option, similar to the existing xarray.set_options, that lets me control whether the data part (and maybe other parts?) appears collapsed in the HTML repr.
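xarray.set_options works as a context manager over a global options dict; a stdlib sketch of how the suggested option could plug into that pattern (the name collapse_html is this issue's proposal, not a shipped xarray option):

```python
import contextlib

OPTIONS = {"collapse_html": False}    # hypothetical default: expanded repr

@contextlib.contextmanager
def set_options(**kwargs):
    saved = {k: OPTIONS[k] for k in kwargs}
    OPTIONS.update(kwargs)
    try:
        yield                          # repr code would consult OPTIONS here
    finally:
        OPTIONS.update(saved)          # restore previous settings on exit

with set_options(collapse_html=True):
    print(OPTIONS["collapse_html"])    # True
print(OPTIONS["collapse_html"])        # False again
```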

Additional questions

  • Is it worth considering this as well for the text repr? Or is that harder to implement?

Any guidance on

  • which files need to change
  • potential pitfalls

would be welcome. I'm happy to work on this, as I seem to be the only one not liking the current implementation.

Reactions: none
state_reason: completed · repo: xarray (13221727) · type: issue
#4194 (id 650231649): AttributeError accessing data_array.variable
opened by michaelaye (69774) · state: closed · 3 comments · created 2020-07-02T22:16:58Z · updated 2020-07-02T22:34:01Z · closed 2020-07-02T22:29:48Z · author_association: NONE

What happened:

Accessing the variable attribute seems to work but also throws an AttributeError:

AttributeError: 'Variable' object has no attribute 'variable'

What you expected to happen:

According to https://xarray.pydata.org/en/stable/terminology.html all DataArrays should have "an underlying variable that can be accessed via arr.variable", so I tried that out and got the error.

Minimal Complete Verifiable Example:

Using the example code from the docs:

```python
data = xr.DataArray(np.random.randn(2, 3), dims=('x', 'y'), coords={'x': [10, 20]})
data.variable
```

Environment: Python 3.7 on Kubuntu 20.04 using Brave browser in Jupyterlab, up-to-date conda env.

Output of `xr.show_versions()`:

INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 5.4.0-40-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.15.1 pandas: 1.0.5 numpy: 1.18.5 scipy: 1.5.0 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.1.3 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.5 cfgrib: None iris: None bottleneck: None dask: 2.19.0 distributed: 2.19.0 matplotlib: 3.2.2 cartopy: 0.18.0 seaborn: 0.10.1 numbagg: None setuptools: 47.3.1.post20200616 pip: 20.1.1 conda: installed pytest: 5.4.3 IPython: 7.16.1 sphinx: 3.1.1

Reactions: none
state_reason: completed · repo: xarray (13221727) · type: issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);