issue_comments: 660459866
https://github.com/pydata/xarray/issues/4236#issuecomment-660459866 (2020-07-18)

I've cleaned up some code, so hopefully it shows my two methods more clearly:

Current method

```
# (Assumed context: import pandas as pd; import xarray as xr;
#  from datetime import timedelta; dateparse is a date-string parser,
#  e.g. dateutil.parser.parse; mmddW and daysW are ipywidgets defined
#  elsewhere; files is the list of input paths.)

# Set some day-of-year globals
DOY1 = 1; DOY2 = 31

def select_time(ds):
    # METHOD 1: Derive start/end dates from external ipy widget values.
    #   Problem: doesn't work with kwarg parallel=True (pickling error).
    #   Unknown: whether the widget values here will actually change when
    #   the widgets are changed.
    year_min, year_max = ds.time.dt.year.min(), ds.time.dt.year.max()
    start_date = pd.Timestamp(dateparse(str(int(year_min)) + mmddW.value))
    end_date = pd.Timestamp(start_date + timedelta(days=daysW.value))
    # Test using fixed values to create start/end dates...this works with pickling:
    # start_date = pd.Timestamp(dateparse(str(int(year_min)) + '0101'))
    # end_date = pd.Timestamp(start_date + timedelta(days=30))
    ds = ds.sel(time=slice(start_date, end_date))

    # METHOD 2: Select a time range based on day of year, where DOY1, DOY2
    # are globals set outside this function. Does pickle, so works with the
    # parallel option.
    #   Problem: DOY1, DOY2 don't update here when changed externally after
    #   the function declaration.
    ds = ds.sel(time=((ds['time.dayofyear'] >= DOY1) & (ds['time.dayofyear'] <= DOY2)))
    return ds

ds = xr.open_mfdataset(
    files,
    chunks={'lat': 50, 'lon': 50},
    combine='nested',
    concat_dim='time',
    preprocess=select_time,
    parallel=True,
)
```

I can appreciate that the pickling error for Method 1 is really caused by the references to the (global) ipy widgets mmddW & daysW. After all, why should those be expected to pickle? It's interesting that this is only a problem with the parallel option, though.
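If I understand the failure right, the reference alone is enough: dask ships the preprocess function to workers with cloudpickle, and for functions defined in a notebook/`__main__` cloudpickle serializes by value, dragging in the globals the function touches. A minimal sketch (untested against real widgets; `threading.Lock` stands in for an unpicklable widget):

```
import threading
import cloudpickle  # what dask uses to ship functions to workers

lock = threading.Lock()  # stand-in for an unpicklable object like mmddW

def f():
    return lock  # merely referencing the global pulls it into the pickle

# Assuming f is defined in __main__ (e.g. a notebook), cloudpickle serializes
# it by value and tries to pickle `lock` too:
cloudpickle.dumps(f)  # TypeError: cannot pickle '_thread.lock' object
```

Without `parallel=True` the preprocess function is just called in-process and never pickled, which would explain why only the parallel path trips over this.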

I don't fully understand it, but I can also appreciate that Method 2 only picks up DOY1/DOY2 when the function is declared, and the values seem to be static thereafter even if DOY1/DOY2 are modified.
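My guess at the mechanism, assuming the same cloudpickle machinery as above: the globals a `__main__` function references are captured by value at dump time, so whatever DOY1/DOY2 hold when the function is shipped is what the workers see. A minimal sketch:

```
import cloudpickle

DOY1, DOY2 = 1, 31

def f():
    return DOY1, DOY2

payload = cloudpickle.dumps(f)  # current values of DOY1/DOY2 captured here
DOY1, DOY2 = 100, 200           # too late: the payload still holds 1 and 31

# A fresh worker process calling cloudpickle.loads(payload)() gets (1, 31),
# which matches the "static after declaration" behaviour I'm seeing.
```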

Both methods are variations on a theme: I'm trying to use globals in the preprocess function as an alternative to passing extra args. The broader question is whether extra arguments would be a useful feature to have.

Another solution

I think the actual solution to my problem is to forget about preprocessing altogether, since nothing is loaded at that stage anyway:

```
ds = xr.open_mfdataset(
    files,
    combine='nested',
    concat_dim='time',
    parallel=True,
)

ds = ds.sel(time=((ds['time.dayofyear'] >= DOY1) & (ds['time.dayofyear'] <= DOY2)))
ds = ds.chunk({'time': -1, 'lat': 50, 'lon': 50}).persist()
```

Doing everything after the `open_mfdataset` seems to work more efficiently. This sort of thing is still counter-intuitive to me: loading less from the outset would seem better, but doing the selection after the fact takes care of the problem.
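A quick way to convince myself that nothing is read eagerly (the variable name `tas` is just a placeholder for one of my data variables):

```
print(ds['tas'].data)  # dask.array<...>: a lazy graph, no bytes read yet

mask = (ds['time.dayofyear'] >= DOY1) & (ds['time.dayofyear'] <= DOY2)
subset = ds.sel(time=mask)  # still lazy; only the in-memory time index is used

subset = subset.compute()   # data is actually read only here (or at .persist())
```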

Still, it's a side-step around the arg passing issue.

> Before I think about this further - could your problem be solved using functools.partial?

I've never used `functools.partial`. From my reading, it seems it's used to wrap a function and fix certain arguments so you can call the wrapper with fewer args. I don't know how to use it to help my current situation.
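Perhaps something like this, though (an untested sketch, assuming select_time is rewritten to take the day-of-year bounds as explicit arguments instead of globals):

```
from functools import partial

def select_time(ds, doy1, doy2):
    # bounds are now explicit parameters rather than module globals
    return ds.sel(time=((ds['time.dayofyear'] >= doy1) &
                        (ds['time.dayofyear'] <= doy2)))

ds = xr.open_mfdataset(
    files,
    combine='nested',
    concat_dim='time',
    preprocess=partial(select_time, doy1=1, doy2=31),  # extra args fixed here
    parallel=True,
)
```

A `partial` object pickles as long as the wrapped function and the bound arguments do, so plain ints should be fine with `parallel=True`.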
