

issues: 335523891


id: 335523891
node_id: MDU6SXNzdWUzMzU1MjM4OTE=
number: 2249
title: stacked_xarray.groupby('lat','lon').apply(func) over 3D array takes too long
user: 40496187
state: closed
locked: 0
comments: 3
created_at: 2018-06-25T18:47:00Z
updated_at: 2020-06-28T04:11:21Z
closed_at: 2020-06-28T04:11:21Z
author_association: NONE

```python
import numpy as np
import xarray as xr

# The function I use to calculate the number of consecutive dry days in a
# time series and store it in the original 3-D shape:

def my_function(x):
    """x is one grid pixel of the stacked Dataset, dims (time, allpoints) with len(allpoints) == 1."""
    mask = (x.precipitation < 1)[:, 0]  # mask dry days, i.e. days with daily rain < 1 mm
    # number of consecutive dry days (consecutive True values): every wet day starts a new run,
    # so the key must be (~mask).cumsum(), not ~mask.cumsum()
    run_id = (~mask).cumsum().rename('run_id')
    dryday_num = mask.groupby(run_id).sum()
    # last True index of each dry spell: dry today and wet (or end of record) tomorrow
    next_dry = mask.shift(time=-1, fill_value=False)
    index = np.where(mask & ~next_dry)[0]
    dry_days = xr.zeros_like(x.precipitation, dtype=float)  # all-zero array of dry days
    # store the number of dry days on the last day of every dry spell
    dry_days[index, 0] = dryday_num.values[dryday_num.values > 0]
    return dry_days

# ===================
# MAIN code
# ===================

# (1) get a slice of the global precipitation data
# (`precip` is the global daily precipitation Dataset, loaded elsewhere and not shown)
pr = precip.isel(time=slice(0, 20), lon=slice(700, 1000), lat=slice(300, 600))
print(pr)

# --------------------------
# <xarray.Dataset>
# Dimensions:        (lat: 300, lon: 300, time: 20)
# Coordinates:
#   * time           (time) datetime64[ns] 2010-05-31 2010-06-01 2010-06-02 ...
#   * lat            (lat) float64 14.88 14.62 14.38 14.12 13.88 13.62 13.38 ...
#   * lon            (lon) float64 -4.875 -4.625 -4.375 -4.125 -3.875 -3.625 ...
# Data variables:
#     precipitation  (time, lon, lat) float64 0.2165 0.02367 0.6997 1.288 ...
# --------------------------

# (2) stack lat/lon to reduce the dimensions, so that my_function() receives one time series per pixel
stacked = pr.stack(allpoints=['lat', 'lon'])
# run my_function over every grid pixel (GroupBy.map is called .apply in older xarray versions)
dry_day_num_stacked = stacked.groupby('allpoints').map(my_function)
dry_day_num = dry_day_num_stacked.unstack('allpoints')
print(dry_day_num)

# --------------------------
# <xarray.DataArray 'precipitation' (time: 20, allpoints_level_0: 300, allpoints_level_1: 300)>
# array([[... ]]])
# Coordinates:
#   * time               (time) datetime64[ns] 2010-05-31 2010-06-01 ...
#   * allpoints_level_0  (allpoints_level_0) float64 14.88 14.62 14.38 ...
#   * allpoints_level_1  (allpoints_level_1) float64 -4.875 -4.625 -4.375 ...
# --------------------------

```

Hello, my task is to (i) calculate the number of consecutive dry days (daily rain < 1 mm) for every grid pixel of a 3-D dataset of daily rainfall, and (ii) write the result out as a netCDF file that preserves the original 3-D shape. The latter is done by assigning the total number of consecutive dry days to the last dry day of each sequence at that pixel. For example, 4 dry days on 1-4 June are stored as ...-0-0-0-4-... on the ...-1st-2nd-3rd-4th-... of June.
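The grouping key in `my_function` is the usual run-labelling trick: a cumulative count of wet days gives every dry spell its own label, and summing the dry mask per label gives the spell length. Below is a minimal NumPy-only sketch of that encoding for a single pixel, assuming the 1 mm threshold used in the code; the helper name `encode_dry_runs` is hypothetical. It reproduces the 0-0-0-4 example above.

```python
import numpy as np

def encode_dry_runs(rain, threshold=1.0):
    """Return an array shaped like `rain` with each dry-spell length stored on its last day."""
    rain = np.asarray(rain, dtype=float)
    dry = rain < threshold                     # True on dry days
    run_id = np.cumsum(~dry)                   # every wet day starts a new group
    lengths = np.bincount(run_id, weights=dry.astype(float))  # dry days per group
    next_dry = np.append(dry[1:], False)       # is tomorrow dry? (False after the last day)
    is_end = dry & ~next_dry                   # last day of each dry spell
    out = np.zeros_like(rain)
    out[is_end] = lengths[run_id[is_end]]      # spell length on its last day, 0 elsewhere
    return out

# 1-4 June dry, 5 June wet
print(encode_dry_runs([0.0, 0.2, 0.5, 0.9, 3.1]))  # [0. 0. 0. 4. 0.]
```

As in the original `shift(...)`/`fillna(False)` logic, a spell that is still running at the end of the record is written on the last time step.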

The problem is that my (probably inefficient) calculation already takes about 15 minutes for a small slice of the data (20 days, 300x300 grid pixels), and I would need to run it over 20 years and the full globe (1440x720 pixels). Most of the time is spent in the internal loop over the stacked lat/lon pairs when my_function() is called. I would be very grateful if anyone could advise me on a better, smarter way to do this. Many thanks!
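The slow part is the Python-level loop: `groupby('allpoints')` calls `my_function` once per pixel, i.e. 90,000 times for the 300x300 slice. One way around it, sketched here as an assumption rather than as the approach taken in this thread, is to compute the encoding for all pixels at once with NumPy array operations along the time axis. The function name `dry_run_lengths` and its `threshold`/`axis` parameters are hypothetical. The running spell length comes from the index of the most recent wet day (via `np.maximum.accumulate`), and only the value on the last day of each spell is kept.

```python
import numpy as np

def dry_run_lengths(rain, threshold=1.0, axis=0):
    """Vectorised dry-spell encoding along `axis` (time) for an N-D rainfall array.

    Every cell is 0 except on the last day of a dry spell, where it holds
    the length of that spell. No per-pixel Python loop is involved.
    """
    dry = np.moveaxis(np.asarray(rain) < threshold, axis, 0)  # put time first
    t = np.arange(dry.shape[0]).reshape((-1,) + (1,) * (dry.ndim - 1))
    # index of the most recent wet day at each pixel (-1 if there has been none yet)
    last_wet = np.maximum.accumulate(np.where(dry, -1, t), axis=0)
    run_so_far = np.where(dry, t - last_wet, 0)  # length of the current dry spell up to today
    next_dry = np.zeros_like(dry)
    next_dry[:-1] = dry[1:]                      # is tomorrow dry? (False after the last day)
    out = np.where(dry & ~next_dry, run_so_far, 0)
    return np.moveaxis(out, 0, axis)
```

To keep the xarray labels, it could be wrapped with `xr.apply_ufunc(dry_run_lengths, pr.precipitation, input_core_dims=[['time']], output_core_dims=[['time']], kwargs={'axis': -1})`, which moves `time` to the last axis (hence `axis=-1`). Twenty years of global daily data is about 7.6 billion values (roughly 60 GB as float64), so in practice the array would probably have to be processed in lat/lon tiles; tiling over space rather than time avoids splitting dry spells at tile boundaries.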

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2249/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
