issues: 335523891
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
335523891 | MDU6SXNzdWUzMzU1MjM4OTE= | 2249 | stacked_xarray.groupby('lat','lon').apply(func) over 3D array takes too long | 40496187 | closed | 0 | 3 | 2018-06-25T18:47:00Z | 2020-06-28T04:11:21Z | 2020-06-28T04:11:21Z | NONE |
```python
# The function I use to calculate the number of consecutive dry days in a
# time series and store it in the original 3D shape:
def my_function(x):
    # Mask dry days, i.e. days with daily rain < 1 mm.
    mask = (x.precipitation < 1)[:, 0]
    # Number of consecutive dry days (consecutive True values in the time series).
    # (~mask).cumsum() increments on every wet day, so each group holds one dry run.
    dryday_num = mask.groupby((~mask).cumsum()).sum()
# =================== MAIN Code ===================
# (1) - get a slice from the global precipitation data
pr = precip.isel(time=slice(0, 20), lon=slice(700, 1000), lat=slice(300, 600))
print(pr)
# --------------------------
# <xarray.Dataset>
# Dimensions:        (lat: 300, lon: 300, time: 20)
# Coordinates:
#   * time           (time) datetime64[ns] 2010-05-31 2010-06-01 2010-06-02 ...
#   * lat            (lat) float64 14.88 14.62 14.38 14.12 13.88 13.62 13.38 ...
#   * lon            (lon) float64 -4.875 -4.625 -4.375 -4.125 -3.875 -3.625 ...
# Data variables:
#     precipitation  (time, lon, lat) float64 0.2165 0.02367 0.6997 1.288 ...
# --------------------------
# (2) - stack lat/lon to reduce the dimensions and pass only a time series to my_function()
stacked = pr.stack(allpoints=['lat', 'lon'])
dry_day_num_stacked = stacked.groupby('allpoints').apply(my_function)  # run my_function over every grid pixel
dry_day_num = dry_day_num_stacked.unstack('allpoints')
print(dry_day_num)
# --------------------------
# <xarray.DataArray 'precipitation' (time: 20, allpoints_level_0: 300, allpoints_level_1: 300)>
# array([[... ]]])
# Coordinates:
#   * time               (time) datetime64[ns] 2010-05-31 2010-06-01 ...
#   * allpoints_level_0  (allpoints_level_0) float64 14.88 14.62 14.38 ...
#   * allpoints_level_1  (allpoints_level_1) float64 -4.875 -4.625 -4.375 ...
# --------------------------
```
Hello, my task is to (i) calculate the number of consecutive dry days (daily rain < 1 mm) for every grid pixel using a 3D dataset of daily rainfall and (ii) write it out as a netCDF file preserving the original 3D dataset shape. The latter is done by assigning the total number of consecutive dry days to the last dry day of the sequence in that pixel. (Ex: 4 dry days on the 1st-4th of June will be stored as ...0-0-0-4-..., corresponding to the ...-1st-2nd-3rd-4th-... days of June.) The problem is that my (probably inefficient) calculation for only a slice of the data (20 days, 300x300 grid pixels) already takes ~15 mins. I would need to run it over 20 years and the full globe (1440x720 pixels).
What takes long is the internal loop over the stacked 'lat'/'lon' pairs when I call my_function(). I would be very grateful if someone could advise me on a better and smarter way to do what I am trying to do. Many thanks! |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2249/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |
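
The storage rule described in the issue body (write the length of each dry spell on its last dry day, and 0 everywhere else) can be illustrated on a single time series. The following is a minimal pure-Python sketch, not the poster's code and not a fix for the performance problem; the helper name `dry_spell_lengths`, the sample rainfall values, and the default 1 mm threshold (taken from the `< 1` mask in the issue) are assumptions made for illustration.

```python
from itertools import groupby


def dry_spell_lengths(rain, threshold=1.0):
    """Return a list as long as `rain` where each dry spell's length is
    stored on its last day and every other day is 0.

    `dry_spell_lengths` is a hypothetical helper, not part of xarray.
    """
    out = [0] * len(rain)
    day = 0  # index just past the run processed so far
    # groupby yields maximal runs of consecutive dry (True) or wet (False) days.
    for is_dry, run in groupby(rain, key=lambda r: r < threshold):
        n = len(list(run))
        day += n
        if is_dry:
            out[day - 1] = n  # last day of this dry spell
    return out


# Hypothetical daily rainfall in mm: four dry days, a wet day,
# two dry days, then another wet day.
rain = [0.2, 0.0, 0.7, 0.1, 3.5, 0.4, 0.0, 2.1]
print(dry_spell_lengths(rain))  # [0, 0, 0, 4, 0, 0, 2, 0]
```

The first four entries reproduce the ...0-0-0-4... pattern from the example in the issue. Applying a per-series function like this to every pixel would still loop in Python, so it only clarifies the expected output, not how to make the computation fast.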