issues: 271957479

id                   271957479
node_id              MDU6SXNzdWUyNzE5NTc0Nzk=
number               1695
title                Diagnose groupby/groupby_bins issues
user                 14314623
state                closed
locked               0
comments             3
created_at           2017-11-07T19:39:38Z
updated_at           2017-11-09T16:36:26Z
closed_at            2017-11-09T16:36:19Z
author_association   CONTRIBUTOR
state_reason         completed
repo                 13221727
type                 issue

Code Sample, a copy-pastable example if possible

```python
import numpy as np  # needed below for np.array
import xarray as xr

xr.__version__
```
```
'0.9.6'
```
```python
ds = xr.open_dataset('../testing/Bianchi_o2.nc', chunks={'TIME': 1})
ds
```
```
<xarray.Dataset>
Dimensions:     (DEPTH: 33, LATITUDE: 180, LONGITUDE: 360, TIME: 12, bnds: 2)
Coordinates:
  * LONGITUDE   (LONGITUDE) float64 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 ...
  * LATITUDE    (LATITUDE) float64 -89.5 -88.5 -87.5 -86.5 -85.5 -84.5 -83.5 ...
  * DEPTH       (DEPTH) float64 0.0 10.0 20.0 30.0 50.0 75.0 100.0 125.0 ...
  * TIME        (TIME) float64 15.0 44.0 73.5 104.0 134.5 165.0 195.5 226.5 ...
Dimensions without coordinates: bnds
Data variables:
    DEPTH_bnds  (DEPTH, bnds) float64 -5.0 5.0 5.0 15.0 15.0 25.0 25.0 40.0 ...
    TIME_bnds   (TIME, bnds) float64 0.5 29.5 29.5 58.75 58.75 88.75 88.75 ...
    O2_LINEAR   (TIME, DEPTH, LATITUDE, LONGITUDE) float64 nan nan nan nan ...
Attributes:
    history:  FERRET V5.70 (alpha) 29-Sep-11
```

This runs as expected:

```python
ds.isel(TIME=0).groupby_bins('O2_LINEAR', np.array([0, 20, 40, 60, 100])).max()
```

This crashes the kernel:

```python
ds.groupby_bins('O2_LINEAR', np.array([0, 20, 40, 60, 100])).max()
```
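For anyone without the NetCDF file, a synthetic dataset with the same shape and per-time-step chunking should stand in for it (a sketch under that assumption; `o2`, `ds_syn`, and the random values are mine, not real oxygen data):

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for Bianchi_o2.nc: same dims and chunking, with
# uniform random values spanning the bin edges used above.
rng = np.random.default_rng(0)
o2 = xr.DataArray(
    rng.uniform(0, 100, size=(12, 33, 180, 360)),
    dims=('TIME', 'DEPTH', 'LATITUDE', 'LONGITUDE'),
    name='O2_LINEAR',
).chunk({'TIME': 1})
ds_syn = o2.to_dataset()

bins = np.array([0, 20, 40, 60, 100])

# Works: a single time step.
ds_syn.isel(TIME=0).groupby_bins('O2_LINEAR', bins).max()

# Reportedly crashes the kernel on the full chunked dataset,
# so it is left commented out here.
# ds_syn.groupby_bins('O2_LINEAR', bins).max()
```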

Problem description

I am working on ocean oxygen data and would like to compute the volume of the ocean contained within a range of concentration values.

I am trying to use groupby_bins, but even with this modest-sized dataset (1-degree global resolution, 33 depth levels, 12 time steps) my kernel crashes every time, without any error message.
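One way to get more signal than a silent kernel death (a sketch using dask's bundled diagnostics, not anything from the report) is to run the reduction under a resource profiler and watch whether memory grows without bound:

```python
import numpy as np
import xarray as xr
from dask.diagnostics import ProgressBar, ResourceProfiler  # ResourceProfiler needs psutil

ds = xr.open_dataset('../testing/Bianchi_o2.nc', chunks={'TIME': 1})
bins = np.array([0, 20, 40, 60, 100])

# ResourceProfiler samples memory and CPU while the dask graph runs; if
# memory climbs steadily toward the crash, the groupby is materializing
# far more than one chunk at a time.
with ResourceProfiler(dt=0.25) as rprof, ProgressBar():
    ds.groupby_bins('O2_LINEAR', bins).max().compute()

print('peak memory (MB):', max(r.mem for r in rprof.results))
```

If the process still dies mid-run, the last progress-bar output at least narrows down how far the graph got.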

I eventually want to perform this step on several TB of ocean model output, so this is concerning.

First of all, is there an easy way to diagnose the problem further? And secondly, are there recommendations for how to compute the sum over groupby_bins for very large datasets (consisting of dask arrays)?
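Until the crash is understood, one dask-friendly way to get per-bin sums without groupby_bins is to build a boolean mask per bin and reduce under it. This is a sketch of a workaround, not xarray's recommended API; the file path and `O2_LINEAR` come from the report, while `per_bin` and the bin labels are mine:

```python
import numpy as np
import xarray as xr

ds = xr.open_dataset('../testing/Bianchi_o2.nc', chunks={'TIME': 1})
o2 = ds['O2_LINEAR']
bins = np.array([0, 20, 40, 60, 100])

# One lazy reduction per bin; each mask stays a chunked dask array, so no
# step needs the whole 4-D field in memory at once. Right-closed intervals
# mimic the pandas.cut convention that groupby_bins uses.
per_bin = []
for lo, hi in zip(bins[:-1], bins[1:]):
    in_bin = (o2 > lo) & (o2 <= hi)
    per_bin.append(o2.where(in_bin).sum())

# Stack the per-bin scalars into one labeled 1-D result.
result = xr.concat(per_bin, dim='O2_LINEAR_bins').assign_coords(
    O2_LINEAR_bins=[f'({lo}, {hi}]' for lo, hi in zip(bins[:-1], bins[1:])]
)
print(result.compute())
```

To get ocean volume per concentration bin instead of summed concentrations, replace `o2.where(in_bin).sum()` with `volume.where(in_bin).sum()`, where `volume` is a (hypothetical) cell-volume DataArray broadcastable against `O2_LINEAR`.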

