issues: 789653499
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
789653499 | MDU6SXNzdWU3ODk2NTM0OTk= | 4830 | GH2550 revisited | 40218891 | open | 0 |  |  | 2 | 2021-01-20T05:40:16Z | 2021-01-25T23:06:01Z |  | NONE |  |  |  | Is your feature request related to a problem? Please describe.

I am retrieving files from AWS: https://registry.opendata.aws/wrf-se-alaska-snap/. An example:

```
import s3fs
import xarray as xr

s3 = s3fs.S3FileSystem(anon=True)
s3path = 's3://wrf-se-ak-ar5/gfdl/hist/daily/1980/WRFDS_1980-01-0[12].nc'
remote_files = s3.glob(s3path)
fileset = [s3.open(file) for file in remote_files]

ds = xr.open_mfdataset(fileset, concat_dim='Time', decode_cf=False)
ds
```

The data files for 1980 are missing the time coordinate, so the code above fails. The time could be obtained by parsing the file name; however, in the current implementation the `source` attribute is available only when the fileset consists of strings or `Path`s, not file objects like the ones returned by `s3.open`.

Describe the solution you'd like

I would suggest returning to the original suggestion in #2550: pass filename_or_object as an argument to the preprocess function, but with the necessary inspection of that function. Here is my attempt (code in `open_mfdataset`):

```
open_kwargs = dict(
    engine=engine, chunks=chunks or {}, lock=lock, autoclose=autoclose, **kwargs
)
```
```
ds = xr.open_mfdataset(fileset, preprocess=fix, concat_dim='Time', decode_cf=False)
```
```
from functools import partial
from pathlib import Path

import xarray as xr


def fix1(ds):
    print('fix1')
    return ds


def fix2(ds, file):
    print('fix2:', file.as_uri())
    return ds


def fix3(ds, file, arg):
    print('fix3:', file.as_uri(), arg)
    return ds


fileset = [
    Path('/home/george/Downloads/WRFDS_1988-04-23.nc'),
    Path('/home/george/Downloads/WRFDS_1988-04-24.nc'),
]

# One-argument preprocess: the existing behaviour.
ds = xr.open_mfdataset(fileset, preprocess=fix1, concat_dim='Time', parallel=True)
# fix2 and fix3 also receive the file, so they rely on the proposed change.
ds = xr.open_mfdataset(fileset, preprocess=fix2, concat_dim='Time')
# Extra arguments are bound with functools.partial.
ds = xr.open_mfdataset(fileset, preprocess=partial(fix3, arg='additional argument'),
                       concat_dim='Time')
```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4830/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |  |  | 13221727 | issue |
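The issue says the missing time could be recovered by parsing the file name. The sketch below shows only that parsing step. It assumes every file follows the `WRFDS_YYYY-MM-DD.nc` pattern seen in the issue's examples. `time_from_filename` is a hypothetical helper, not part of xarray.

```python
import re
from datetime import datetime

# Assumption: file names follow the WRFDS_YYYY-MM-DD.nc pattern used in this
# issue's examples; other datasets would need a different pattern.
_WRFDS_DATE = re.compile(r"WRFDS_(\d{4}-\d{2}-\d{2})\.nc$")


def time_from_filename(name):
    """Hypothetical helper: parse the date from a WRFDS file name or path."""
    match = _WRFDS_DATE.search(str(name))  # str() also accepts pathlib.Path
    if match is None:
        raise ValueError(f"no WRFDS date in {name!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d")
```

The regex is anchored at the end of the string. Because of that, local paths, S3 keys and bare file names are handled the same way.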
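The underlying gap is that no source name is recorded when the fileset contains file objects. The sketch below does a best-effort lookup. It assumes that fsspec-based file objects, such as those returned by `s3.open`, expose a `path` attribute, and that ordinary Python file objects expose `name`. Check that assumption against the installed s3fs/fsspec versions. `source_name` is a hypothetical helper.

```python
import io
import os
from pathlib import Path
from types import SimpleNamespace


def source_name(obj):
    """Hypothetical helper: best-effort name for one entry of a fileset.

    str and os.PathLike objects (e.g. pathlib.Path) are converted with
    os.fspath. Anything else is treated as file-like: a string ``path``
    attribute (assumed for fsspec/s3fs files) is tried first, then a string
    ``name`` attribute (built-in file objects). Returns None otherwise.
    """
    if isinstance(obj, (str, os.PathLike)):
        return os.fspath(obj)
    for attr in ("path", "name"):
        value = getattr(obj, attr, None)
        if isinstance(value, str):
            return value
    return None


# Usage: SimpleNamespace stands in for a remote file object with ``path``;
# io.BytesIO has neither attribute, so no name can be recovered from it.
examples = [
    "WRFDS_1988-04-23.nc",
    Path("/home/george/Downloads/WRFDS_1988-04-24.nc"),
    SimpleNamespace(path="wrf-se-ak-ar5/gfdl/hist/daily/1980/WRFDS_1980-01-01.nc"),
    io.BytesIO(b""),
]
names = [source_name(obj) for obj in examples]
```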