issues: 659142789


id: 659142789
node_id: MDU6SXNzdWU2NTkxNDI3ODk=
number: 4236
title: Allow passing args to preprocess function in open_mfdataset
user: 8098361
state: closed
locked: 0
comments: 7
created_at: 2020-07-17T10:52:14Z
updated_at: 2023-09-12T15:59:45Z
closed_at: 2023-09-12T15:59:45Z
author_association: NONE

For a set of netcdf files I'm opening with `open_mfdataset`, I'd also like to pass a couple of extra arguments to the `preprocess` function. At the moment the Dataset seems to be the only argument the `preprocess` function accepts.

The netcdf files have dimensions `(time, lat, lon)`. It's the time dimension I'd like to cut down during a parallel load, e.g. using a start/end dayofyear. Each file covers a different year, and I'd like to slice out a particular dayofyear range within each. I tried calculating dayofyear inside the `preprocess` function and doing `.sel`, which works perfectly fine, but not with the `parallel=True` option. Without the parallel option, loading is much slower. However, if `parallel=True`, a pickling error occurs, possibly because I'm using other functions like `dateparse` or `timedelta` inside the `preprocess` function to calculate the dayofyear (which itself is derived from an ipywidget). I don't really understand the pickling error; it says:

```
D:\Anaconda3\lib\pickle.py in save_global(self, obj, name)
    963             raise PicklingError(
    964                 "Can't pickle %r: it's not the same object as %s.%s" %
--> 965                 (obj, module_name, name))
    966
    967         if self.proto >= 2:

PicklingError: Can't pickle <built-in function input>: it's not the same object as builtins.input
```

I also tried setting the start/end dayofyear as globals outside the `preprocess` function and using them inside the function, but changing those globals (integers) after the function is defined doesn't seem to alter the reference to them inside the function. I don't think this solution is very elegant anyway.

Many other packages have functions that accept callbacks along with additional arguments to pass to them. Has this been considered for `open_mfdataset`?
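One workaround that survives the pickling step (a sketch, not from the issue itself): bind the extra arguments with `functools.partial`, so `preprocess` becomes a picklable single-argument callable rather than a closure over mutable notebook state. The names `subset_dayofyear`, `start_doy`, and `end_doy` here are hypothetical illustrations.

```python
import functools
import pickle

# Hypothetical preprocess callable; start_doy/end_doy are the extra
# arguments the issue wants to pass in. With a real xarray.Dataset the
# body might be something like:
#   return ds.sel(time=ds.time.dt.dayofyear.isin(range(start_doy, end_doy + 1)))
def subset_dayofyear(ds, start_doy, end_doy):
    # Stand-in return value so this sketch runs without xarray installed.
    return {"ds": ds, "window": (start_doy, end_doy)}

# Bind the extra arguments up front; the resulting partial takes only the
# Dataset, matching the single-argument signature preprocess expects.
preprocess = functools.partial(subset_dayofyear, start_doy=60, end_doy=120)

# Unlike a closure over ipywidget state, a partial of a module-level
# function round-trips through pickle, which is what parallel=True needs.
restored = pickle.loads(pickle.dumps(preprocess))
print(restored("fake-dataset")["window"])  # (60, 120)
```

With this in place, the call would presumably look like `xr.open_mfdataset(files, preprocess=preprocess, parallel=True)`; the widget values are read once when the partial is created, which also sidesteps the stale-global problem described above.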

reactions: 0 (all reaction types zero)
state_reason: completed
repo: 13221727
type: issue

Links from other tables

  • 2 rows from issues_id in issues_labels
  • 6 rows from issue in issue_comments