issues: 559283550

id: 559283550
node_id: MDU6SXNzdWU1NTkyODM1NTA=
number: 3745
title: groupby drops the variable used to group
user: 22245117
state: open
locked: 0
comments: 0
created_at: 2020-02-03T19:25:06Z
updated_at: 2022-04-09T02:25:17Z
author_association: CONTRIBUTOR
(empty: assignee, milestone, closed_at, active_lock_reason, draft, pull_request)
body:

MCVE Code Sample

```python
import xarray as xr

ds = xr.tutorial.load_dataset('rasm')
```

```python
# Seasonal mean
ds_season = ds.groupby('time.season').mean()
ds_season
```

<xarray.Dataset>
Dimensions:  (season: 4, x: 275, y: 205)
Coordinates:
    yc       (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51
    xc       (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91
  * season   (season) object 'DJF' 'JJA' 'MAM' 'SON'
Dimensions without coordinates: x, y
Data variables:
    Tair     (season, y, x) float64 nan nan nan nan ... 23.13 22.06 21.72 21.94

```python
# The seasons are ordered in alphabetical order.
# I want to sort them based on time.
# But time was dropped, so I have to do this:
time_season = ds['time'].groupby('time.season').mean()
ds_season.sortby(time_season)
```

<xarray.Dataset>
Dimensions:  (season: 4, x: 275, y: 205)
Coordinates:
    yc       (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51
    xc       (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91
  * season   (season) object 'SON' 'DJF' 'MAM' 'JJA'
Dimensions without coordinates: x, y
Data variables:
    Tair     (season, y, x) float64 nan nan nan nan ... 29.27 28.39 27.94 28.05
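
For comparison, here is a minimal sketch of another way to get the same chronological season order without reducing `time` at all. It uses a small synthetic dataset in place of `rasm`, so the data values and the names `season_order` and `ds_season_sorted` are illustrative assumptions, not part of the original report. The idea is to collect the season labels in the order they first appear along the time axis and select the grouped result in that order.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly data starting in September (a stand-in for 'rasm').
time = pd.date_range("1980-09-01", periods=36, freq="MS")
ds = xr.Dataset(
    {"Tair": ("time", np.arange(time.size, dtype=float))},
    coords={"time": time},
)

# Grouped result: 'time' is replaced by an alphabetically sorted 'season'.
ds_season = ds.groupby("time.season").mean()

# Season labels in the order they first occur along the original time axis
# (dict.fromkeys removes duplicates while preserving order).
season_order = list(dict.fromkeys(ds["time"].dt.season.values.tolist()))

# Reorder the grouped dataset by selecting the labels in that order.
ds_season_sorted = ds_season.sel(season=season_order)
print(list(ds_season_sorted["season"].values))
```

This only reorders the result; it does not bring a `time` variable back into the grouped dataset, which is what the issue asks about.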

Expected Output

```python
# Why does groupby drop time?
# I would expect a dataset that looks like this:
ds_season['time'] = time_season
ds_season
```

<xarray.Dataset>
Dimensions:  (season: 4, x: 275, y: 205)
Coordinates:
    yc       (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51
    xc       (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91
  * season   (season) object 'DJF' 'JJA' 'MAM' 'SON'
Dimensions without coordinates: x, y
Data variables:
    Tair     (season, y, x) float64 nan nan nan nan ... 23.13 22.06 21.72 21.94
    time     (season) object 1982-01-16 12:00:00 ... 1981-10-17 00:00:00

Problem Description

I often use groupby on time variables. When I do that, the time variable is dropped and replaced (e.g., time is replaced by season, month, year, ...). Most of the time I also want to sort the new dataset based on the original time. The example above shows why this is useful for seasons. Another example would be to sort monthly averages of a dataset that originally had daily data from Sep-2000 to Aug-2001. Why is time dropped? Does it make sense to keep it in the grouped dataset?
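
The monthly case mentioned above can be sketched as follows. This is an illustration on synthetic daily data, not output from a real dataset; the names `position` and `first_position` are hypothetical helpers introduced here. Because `time` is not available after the groupby, the sketch builds a numeric sort key (the earliest sample position in each month) and passes it to `sortby`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series from Sep-2000 to Aug-2001 (illustrative values).
time = pd.date_range("2000-09-01", "2001-08-31", freq="D")
da = xr.DataArray(
    np.arange(time.size, dtype=float),
    coords={"time": time},
    dims="time",
    name="value",
)

# Monthly means: indexed by 'month' (1..12); 'time' is no longer present.
monthly = da.groupby("time.month").mean()

# Integer position of each sample along the original time axis.
position = xr.DataArray(np.arange(time.size), coords={"time": time}, dims="time")

# Earliest position per month: a numeric stand-in for the dropped 'time'.
first_position = position.groupby("time.month").min()

# Sort the monthly means chronologically rather than by month number.
monthly_sorted = monthly.sortby(first_position)
print(monthly_sorted["month"].values.tolist())
```

With this data the months come out starting at September and ending at August, which is the ordering a retained `time` variable would make directly available.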

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-1067-oem
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.14.1
pandas: 1.0.0
numpy: 1.18.1
scipy: None
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.1
dask: 2.10.1
distributed: 2.10.0
matplotlib: 3.1.2
cartopy: None
seaborn: None
numbagg: None
setuptools: 45.1.0.post20200127
pip: 20.0.2
conda: None
pytest: None
IPython: 7.11.1
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3745/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
