issue_comments: 660459866
https://github.com/pydata/xarray/issues/4236#issuecomment-660459866 (2020-07-18)

I've cleaned up some code, so hopefully it shows my two methods more clearly:

Current method

```
# (Assumed context: import pandas as pd; import xarray as xr;
#  from datetime import timedelta; dateparse is a date-string parser,
#  e.g. dateutil.parser.parse; mmddW and daysW are ipywidgets defined
#  elsewhere; files is the list of input paths.)

# Set some day-of-year globals
DOY1 = 1; DOY2 = 31

def select_time(ds):
    # METHOD 1: Derive start/end dates from external ipy widget values.
    #   Problem: doesn't work with kwarg parallel=True (pickling error).
    #   Unknown: whether the widget values here will actually change when
    #   the widgets are changed.
    year_min, year_max = ds.time.dt.year.min(), ds.time.dt.year.max()
    start_date = pd.Timestamp(dateparse(str(int(year_min)) + mmddW.value))
    end_date = pd.Timestamp(start_date + timedelta(days=daysW.value))
    # Test using fixed values to create start/end dates...this works with pickling:
    # start_date = pd.Timestamp(dateparse(str(int(year_min)) + '0101'))
    # end_date = pd.Timestamp(start_date + timedelta(days=30))
    ds = ds.sel(time=slice(start_date, end_date))

    # METHOD 2: Select a time range based on day of year, where DOY1, DOY2
    # are globals set outside this function. Does pickle, so works with the
    # parallel option.
    #   Problem: DOY1, DOY2 don't update here when changed externally after
    #   the function declaration.
    ds = ds.sel(time=((ds['time.dayofyear'] >= DOY1) & (ds['time.dayofyear'] <= DOY2)))
    return ds

ds = xr.open_mfdataset(
    files,
    chunks={'lat': 50, 'lon': 50},
    combine='nested',
    concat_dim='time',
    preprocess=select_time,
    parallel=True,
)
```

I can appreciate that the pickling error for Method 1 is really caused by the references to the (global) ipy widgets mmddW & daysW. After all, why should those be expected to pickle? It's interesting that this is only a problem with the parallel option, though.
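If I understand the failure right, the reference alone is enough: dask ships the preprocess function to workers with cloudpickle, and for functions defined in a notebook/`__main__` cloudpickle serializes by value, dragging in the globals the function touches. A minimal sketch (untested against real widgets; `threading.Lock` stands in for an unpicklable widget):

```
import threading
import cloudpickle  # what dask uses to ship functions to workers

lock = threading.Lock()  # stand-in for an unpicklable object like mmddW

def f():
    return lock  # merely referencing the global pulls it into the pickle

# Assuming f is defined in __main__ (e.g. a notebook), cloudpickle serializes
# it by value and tries to pickle `lock` too:
cloudpickle.dumps(f)  # TypeError: cannot pickle '_thread.lock' object
```

Without `parallel=True` the preprocess function is just called in-process and never pickled, which would explain why only the parallel path trips over this.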

I don't fully understand it, but I can also appreciate that Method 2 only picks up DOY1/DOY2 when the function is declared, and the values seem to be static thereafter even if DOY1/DOY2 are modified.
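My guess at the mechanism, assuming the same cloudpickle machinery as above: the globals a `__main__` function references are captured by value at dump time, so whatever DOY1/DOY2 hold when the function is shipped is what the workers see. A minimal sketch:

```
import cloudpickle

DOY1, DOY2 = 1, 31

def f():
    return DOY1, DOY2

payload = cloudpickle.dumps(f)  # current values of DOY1/DOY2 captured here
DOY1, DOY2 = 100, 200           # too late: the payload still holds 1 and 31

# A fresh worker process calling cloudpickle.loads(payload)() gets (1, 31),
# which matches the "static after declaration" behaviour I'm seeing.
```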

Both methods are variations on a theme: I'm trying to use globals in the preprocess function as an alternative to passing extra args. The broader question is whether extra arguments would be a useful feature to have.

Another solution

I think the actual solution to my problem is to forget about preprocessing altogether, since nothing is loaded at that stage anyway:

```
ds = xr.open_mfdataset(
    files,
    combine='nested',
    concat_dim='time',
    parallel=True,
)

ds = ds.sel(time=((ds['time.dayofyear'] >= DOY1) & (ds['time.dayofyear'] <= DOY2)))
ds = ds.chunk({'time': -1, 'lat': 50, 'lon': 50}).persist()
```

Doing everything after the `open_mfdataset` seems to work more efficiently. This sort of thing is still counter-intuitive to me: loading less from the outset would seem better, but doing the selection after the fact takes care of the problem.
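A quick way to convince myself that nothing is read eagerly (the variable name `tas` is just a placeholder for one of my data variables):

```
print(ds['tas'].data)  # dask.array<...>: a lazy graph, no bytes read yet

mask = (ds['time.dayofyear'] >= DOY1) & (ds['time.dayofyear'] <= DOY2)
subset = ds.sel(time=mask)  # still lazy; only the in-memory time index is used

subset = subset.compute()   # data is actually read only here (or at .persist())
```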

Still, it's a side-step around the arg passing issue.

> Before I think about this further - could your problem be solved using functools.partial?

I've never used `functools.partial`. From my reading, it seems it's used to wrap a function and fix certain arguments so you can call the wrapper with fewer args. I don't know how to use it to help my current situation.
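Perhaps something like this, though (an untested sketch, assuming select_time is rewritten to take the day-of-year bounds as explicit arguments instead of globals):

```
from functools import partial

def select_time(ds, doy1, doy2):
    # bounds are now explicit parameters rather than module globals
    return ds.sel(time=((ds['time.dayofyear'] >= doy1) &
                        (ds['time.dayofyear'] <= doy2)))

ds = xr.open_mfdataset(
    files,
    combine='nested',
    concat_dim='time',
    preprocess=partial(select_time, doy1=1, doy2=31),  # extra args fixed here
    parallel=True,
)
```

A `partial` object pickles as long as the wrapped function and the bound arguments do, so plain ints should be fine with `parallel=True`.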
