issue_comments
3 rows where user = 8098361 sorted by updated_at descending
| id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue | 
|---|---|---|---|---|---|---|---|---|---|---|---|
| 662868741 | https://github.com/pydata/xarray/issues/4236#issuecomment-662868741 | https://api.github.com/repos/pydata/xarray/issues/4236 | MDEyOklzc3VlQ29tbWVudDY2Mjg2ODc0MQ== | prs247au 8098361 | 2020-07-23T07:53:23Z | 2020-07-23T07:53:23Z | NONE | My minimal example:
from functools import partial
from pathlib import Path
import xarray as xr
def preprocessing(doys, ds):
    # doys is bound in advance by partial; xarray supplies ds for each file
    print(doys)
    return ds  # the preprocess callback must return the dataset
def get_data_set(doys, parallel=True):
    # files is the module-level list built in the __main__ block below
    ds = xr.open_mfdataset(
        files, combine='nested', concat_dim='time', parallel=parallel,
        preprocess=partial(preprocessing, doys)
    )
    return ds
if __name__ == '__main__':
    pth = "/path/to/data"
    day_of_year_range = (100, 140)
    files = list(Path(pth).rglob('*.nc'))
    ds = get_data_set(day_of_year_range, parallel=False)
    print(ds)
 | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | Allow passing args to preprocess function in open_mfdataset 659142789 | |
| 662806773 | https://github.com/pydata/xarray/issues/4236#issuecomment-662806773 | https://api.github.com/repos/pydata/xarray/issues/4236 | MDEyOklzc3VlQ29tbWVudDY2MjgwNjc3Mw== | prs247au 8098361 | 2020-07-23T03:56:09Z | 2020-07-23T03:56:09Z | NONE | Thanks for the suggestion of partial. Otherwise, I do agree with you about when args would need to be passed, i.e. individual file processing that can't be done outside. Obviously if you don't need args, don't pass any. While I see now that my use case doesn't need that, there still might be others that do, though this might be rare (later I'll need to add a dimension for each file with a value that varies between files, but luckily I can extract that from the filename). I was imagining additional args working something like the way the schedule package passes them on: ``` Any additional arguments are passed on to job_func when the job runs. :param job_func: The function to be scheduled :return: The invoked job instance File: d:\anaconda3\lib\site-packages\schedule\__init__.py Type: function ``` My original intent was to cut down the data I was loading from large files by managing that through the preprocess callback. But this is where I readily admit to not knowing how xarray handles things under the covers, which means I do things the wrong (sub-optimal?) way. I'm not the only one struggling with what is optimal, though: Unexpected behaviour when chunking with multiple netcdf files in xarray/dask | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | Allow passing args to preprocess function in open_mfdataset 659142789 | |
| 660459866 | https://github.com/pydata/xarray/issues/4236#issuecomment-660459866 | https://api.github.com/repos/pydata/xarray/issues/4236 | MDEyOklzc3VlQ29tbWVudDY2MDQ1OTg2Ng== | prs247au 8098361 | 2020-07-18T10:03:58Z | 2020-07-18T10:03:58Z | NONE | I've cleaned up some code so hopefully it shows my two methods more clearly.
Current method
```
import pandas as pd
import xarray as xr
from datetime import timedelta
# Assumed: dateparse is dateutil.parser.parse; mmddW, daysW (ipy widgets) and files are defined elsewhere
# Set some day of year globals
DOY1 = 1; DOY2 = 31
def select_time(ds):
    # METHOD 1: Derive start/end date from external ipy widget values
    # Problem: Doesn't work with kwarg parallel=True (pickling error)
    # Unknown: if the widget values here will actually change when widgets are changed
    year_min, year_max = ds.time.dt.year.min(), ds.time.dt.year.max()
    start_date = pd.Timestamp(dateparse(str(int(year_min)) + mmddW.value))
    end_date = pd.Timestamp(start_date + timedelta(days=daysW.value))
    # Test using fixed values to create start/end dates...this works with pickling
    # start_date = pd.Timestamp(dateparse(str(int(year_min)) + '0101'))
    # end_date = pd.Timestamp(start_date + timedelta(days=30))
    ds = ds.sel(time=slice(start_date, end_date))
    return ds  # the preprocess callback must return the dataset
ds = xr.open_mfdataset(
    files, chunks={'lat': 50, 'lon': 50}, combine='nested', concat_dim='time',
    preprocess=select_time, parallel=True
)
```
I can appreciate that the pickling error for Method 1 is actually caused by the reference to the (global) ipy widgets mmddW & daysW. After all, why should it be expected to pickle those? It's interesting that this is only a problem for the parallel option, though. I don't fully understand why, but I can also appreciate that Method 2 only reads DOY1/2 when they're declared and seems to stay static thereafter, even if DOY1/2 are modified. Both methods are variations on a theme: I'm trying to use globals in the preprocess function.
Another solution
I think the actual solution to my problem is to forget about preprocessing. Since nothing is loaded at that stage, the selection can be applied after opening:
```
ds = xr.open_mfdataset(
    files, combine='nested', concat_dim='time', parallel=True
)
ds = ds.sel(time=((ds['time.dayofyear'] >= DOY1) & (ds['time.dayofyear'] <= DOY2)))
ds = ds.chunk({'time': -1, 'lat': 50, 'lon': 50}).persist()
```
Still, it's a side-step around the arg-passing issue.
 | {
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | Allow passing args to preprocess function in open_mfdataset 659142789 | 
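The comments above revolve around binding extra arguments to a one-argument callback with `partial`, and around values that seem to stay fixed once bound. The following is a minimal, library-free sketch of that idea, not xarray's actual machinery: `open_many` is a hypothetical stand-in for a loader that calls `preprocess` once per input, and the lists of day-of-year integers are made-up sample data.

```python
from functools import partial


def preprocessing(doys, record):
    """Keep only the days of year that fall inside the inclusive doys range."""
    first, last = doys
    return [day for day in record if first <= day <= last]


def open_many(records, preprocess=None):
    """Hypothetical loader: applies a one-argument callback to each record."""
    combined = []
    for record in records:
        combined.extend(preprocess(record) if preprocess is not None else record)
    return combined


# Made-up sample data: each inner list stands in for one file's time axis.
records = [[90, 100, 120], [139, 140, 141]]

doys = (100, 140)
callback = partial(preprocessing, doys)  # doys is captured here, at bind time

selected = open_many(records, preprocess=callback)
print(selected)  # [100, 120, 139, 140]

# Rebinding the global afterwards does not change what the callback holds.
doys = (0, 0)
print(callback.args)  # ((100, 140),)
```

Because the partial object stores the argument value itself, later changes to the global name are not seen by the callback, which may be the "static thereafter" behaviour described for Method 2.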
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
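As a rough illustration of how the page's filter ("rows where user = 8098361 sorted by updated_at descending") maps onto this schema, here is a small sketch using Python's built-in `sqlite3` module. It keeps only a subset of the columns, copies the three ids and timestamps from the rows shown earlier, and adds one hypothetical row (id 1, user 1) purely so the filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed-down version of the schema above (only the columns the query needs).
conn.execute(
    """
    CREATE TABLE [issue_comments] (
        [id] INTEGER PRIMARY KEY,
        [user] INTEGER,
        [updated_at] TEXT,
        [issue] INTEGER
    )
    """
)
conn.execute("CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user])")

rows = [
    (660459866, 8098361, "2020-07-18T10:03:58Z", 659142789),
    (662806773, 8098361, "2020-07-23T03:56:09Z", 659142789),
    (662868741, 8098361, "2020-07-23T07:53:23Z", 659142789),
    (1, 1, "2020-07-20T00:00:00Z", 659142789),  # hypothetical row from another user
]
conn.executemany("INSERT INTO [issue_comments] VALUES (?, ?, ?, ?)", rows)

# ISO 8601 timestamps in the same format sort correctly as plain text.
result = conn.execute(
    "SELECT [id] FROM [issue_comments] WHERE [user] = ? ORDER BY [updated_at] DESC",
    (8098361,),
).fetchall()
ids = [row[0] for row in result]
print(ids)  # [662868741, 662806773, 660459866]
```

The index on `[user]` is what lets a filter like this avoid scanning the whole table once it grows beyond a handful of rows.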