html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/2550#issuecomment-440009447,https://api.github.com/repos/pydata/xarray/issues/2550,440009447,MDEyOklzc3VlQ29tbWVudDQ0MDAwOTQ0Nw==,1217238,2018-11-19T19:16:48Z,2018-11-19T19:16:48Z,MEMBER,"> Is this something that we want to mandate that backends provide? I think it would be better to do this systematically, e.g., inside `xarray.open_dataset()`. We would need to verify that `filename_or_obj` is provided as a string, but if so we could add it into `encoding` on the Dataset object.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-440002135,https://api.github.com/repos/pydata/xarray/issues/2550,440002135,MDEyOklzc3VlQ29tbWVudDQ0MDAwMjEzNQ==,4806877,2018-11-19T18:53:27Z,2018-11-19T18:53:27Z,CONTRIBUTOR,"Having started writing a test, I now think that `encoding['source']` is backend specific. Here it is implemented in netcdf4: https://github.com/pydata/xarray/blob/70e9eb8fc834e4aeff42c221c04c9713eb465b8a/xarray/backends/netCDF4_.py#L386 but I don't see it for pynio for instance: https://github.com/pydata/xarray/blob/70e9eb8fc834e4aeff42c221c04c9713eb465b8a/xarray/backends/pynio_.py#L77-L81 Is this something that we want to mandate that backends provide? ","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-439961292,https://api.github.com/repos/pydata/xarray/issues/2550,439961292,MDEyOklzc3VlQ29tbWVudDQzOTk2MTI5Mg==,1217238,2018-11-19T16:47:50Z,2018-11-19T16:47:50Z,MEMBER,"Yes, that sounds great! Potentially this would be a good opportunity for a doc update, too. 
On Mon, Nov 19, 2018 at 6:36 AM Julia Signell wrote: > Should I add a test that expects .encoding['source'] to ensure its continued presence?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-439913493,https://api.github.com/repos/pydata/xarray/issues/2550,439913493,MDEyOklzc3VlQ29tbWVudDQzOTkxMzQ5Mw==,4806877,2018-11-19T14:36:37Z,2018-11-19T14:36:37Z,CONTRIBUTOR,Should I add a test that expects `.encoding['source']` to ensure its continued presence?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-439770613,https://api.github.com/repos/pydata/xarray/issues/2550,439770613,MDEyOklzc3VlQ29tbWVudDQzOTc3MDYxMw==,1217238,2018-11-19T04:47:13Z,2018-11-19T04:47:13Z,MEMBER,I'm not sure `.encoding['source']` should really be relied upon -- it wasn't really an intentional API decision. But I guess it's harmless enough to include it...,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-439742167,https://api.github.com/repos/pydata/xarray/issues/2550,439742167,MDEyOklzc3VlQ29tbWVudDQzOTc0MjE2Nw==,4806877,2018-11-19T00:52:03Z,2018-11-19T00:52:03Z,CONTRIBUTOR,"Ah I don't think I understood that adding `source` to encoding was a new addition. 
In latest master (`'0.11.0+3.g70e9eb8'`) this works fine: ```python def func(ds): var = next(var for var in ds) return ds.assign(path=ds[var].encoding['source']) ds = xr.open_mfdataset(['./air_1.nc', './air_2.nc'], concat_dim='path', preprocess=func) ``` I do think it is misleading though that after you've concatenated the data, the `encoding['source']` on a concatenated var seems to be the first path. ```python >>> ds['air'].encoding['source'] '~/air_1.nc' ``` I'll close this one though since there is a clear way to access the filename. Thanks for the tip @jhamman! ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-437464939,https://api.github.com/repos/pydata/xarray/issues/2550,437464939,MDEyOklzc3VlQ29tbWVudDQzNzQ2NDkzOQ==,2448579,2018-11-09T19:14:35Z,2018-11-09T19:14:35Z,MEMBER,"True, maybe we should track down why that isn't happening with your dataset","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-437464067,https://api.github.com/repos/pydata/xarray/issues/2550,437464067,MDEyOklzc3VlQ29tbWVudDQzNzQ2NDA2Nw==,4806877,2018-11-09T19:11:38Z,2018-11-09T19:11:38Z,CONTRIBUTOR,"> A dirty fix would be to add an attribute to each dataset. 
I thought @jhamman was suggesting that it already exists, but I couldn't find it: https://github.com/pydata/xarray/issues/2550#issuecomment-437157299","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-437463476,https://api.github.com/repos/pydata/xarray/issues/2550,437463476,MDEyOklzc3VlQ29tbWVudDQzNzQ2MzQ3Ng==,2448579,2018-11-09T19:09:34Z,2018-11-09T19:09:34Z,MEMBER,A dirty fix would be to add an attribute to each dataset. ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-437458789,https://api.github.com/repos/pydata/xarray/issues/2550,437458789,MDEyOklzc3VlQ29tbWVudDQzNzQ1ODc4OQ==,1217238,2018-11-09T18:53:00Z,2018-11-09T18:53:00Z,MEMBER,"The danger with inspecting user-provided functions is that it's pretty fragile, e.g., it fails if you provide a signature like `*args, **kwargs` (which can happen pretty easily with decorators). Probably the best option is to come up with a new keyword argument to replace ""preprocess"" and to deprecate the current preprocess (if we can think of another good name). We could also do a deprecation cycle with FutureWarning, but that's pretty painful. On Fri, Nov 9, 2018 at 12:29 PM Julia Signell wrote: > Maybe we can inspect the preprocess function like this: > > >>> preprocess = lambda a, b: print(a, b) >>> preprocess.__code__.co_varnames > ('a', 'b') > > This response is ordered, so the first one can always be ds regardless of its name and then we can look for special names (like filename) in the rest. > > From this answer: https://stackoverflow.com/a/4051447/4021797","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407
https://github.com/pydata/xarray/issues/2550#issuecomment-437433736,https://api.github.com/repos/pydata/xarray/issues/2550,437433736,MDEyOklzc3VlQ29tbWVudDQzNzQzMzczNg==,4806877,2018-11-09T17:29:05Z,2018-11-09T17:29:05Z,CONTRIBUTOR,"Maybe we can inspect the `preprocess` function like this: ```python >>> preprocess = lambda a, b: print(a, b) >>> preprocess.__code__.co_varnames ('a', 'b') ``` This response is ordered, so the first one can always be `ds` regardless of its name and then we can look for special names (like `filename`) in the rest. From this answer: https://stackoverflow.com/a/4051447/4021797","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-437408363,https://api.github.com/repos/pydata/xarray/issues/2550,437408363,MDEyOklzc3VlQ29tbWVudDQzNzQwODM2Mw==,2448579,2018-11-09T16:10:08Z,2018-11-09T16:10:08Z,MEMBER,Hmm... Sorry @jsignell. I thought preprocess passed the filename too.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-437394231,https://api.github.com/repos/pydata/xarray/issues/2550,437394231,MDEyOklzc3VlQ29tbWVudDQzNzM5NDIzMQ==,1217238,2018-11-09T15:28:40Z,2018-11-09T15:28:40Z,MEMBER,"@jhamman The problem is that xarray needs a way to figure out what arguments it can safely pass to `preprocess`, i.e., it needs to inspect the `preprocess` function and see if it can handle a `filename` argument. 
It's not obvious what the best way to do this in a backwards compatible way is...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-437243753,https://api.github.com/repos/pydata/xarray/issues/2550,437243753,MDEyOklzc3VlQ29tbWVudDQzNzI0Mzc1Mw==,2443309,2018-11-09T04:08:14Z,2018-11-09T04:08:14Z,MEMBER,"@shoyer and @jsignell - I'd also be happy to see this added to the preprocess function. Ideally the function signature would look like: ```python def preprocess(ds, filename=None): ... return ds ``` This would avoid a breaking change and allow us to add additional kwargs at a later date if need be.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-437173740,https://api.github.com/repos/pydata/xarray/issues/2550,437173740,MDEyOklzc3VlQ29tbWVudDQzNzE3Mzc0MA==,1217238,2018-11-08T22:08:06Z,2018-11-08T22:08:06Z,MEMBER,Hmm. It really seems like the `preprocess` function should pass in the file-name along with the dataset.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-437161279,https://api.github.com/repos/pydata/xarray/issues/2550,437161279,MDEyOklzc3VlQ29tbWVudDQzNzE2MTI3OQ==,4806877,2018-11-08T21:24:45Z,2018-11-08T21:24:45Z,CONTRIBUTOR,"@jhamman that looks pretty good, but I'm not seeing the source in the encoding dict. Is this what you were expecting? 
```python def func(ds): var = next(var for var in ds) return ds.assign(path=ds[var].encoding['source']) xr.open_mfdataset(['./ST4.2018092500.01h', './ST4.2018092501.01h'], engine='pynio', concat_dim='path', preprocess=func) ``` ```python-traceback --------------------------------------------------------------------------- KeyError Traceback (most recent call last) in () ----> 1 ds = xr.open_mfdataset(['./ST4.2018092500.01h', './ST4.2018092501.01h'], engine='pynio', concat_dim='path', preprocess=func) /opt/conda/lib/python3.6/site-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, lock, data_vars, coords, autoclose, parallel, **kwargs) 612 file_objs = [getattr_(ds, '_file_obj') for ds in datasets] 613 if preprocess is not None: --> 614 datasets = [preprocess(ds) for ds in datasets] 615 616 if parallel: /opt/conda/lib/python3.6/site-packages/xarray/backends/api.py in (.0) 612 file_objs = [getattr_(ds, '_file_obj') for ds in datasets] 613 if preprocess is not None: --> 614 datasets = [preprocess(ds) for ds in datasets] 615 616 if parallel: in func(ds) 1 def func(ds): 2 var = next(var for var in ds) ----> 3 return ds.assign(path=ds[var].encoding['source']) KeyError: 'source' ``` xarray version: '0.11.0+1.g575e97ae'","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-437157299,https://api.github.com/repos/pydata/xarray/issues/2550,437157299,MDEyOklzc3VlQ29tbWVudDQzNzE1NzI5OQ==,2443309,2018-11-08T21:11:08Z,2018-11-08T21:11:08Z,MEMBER,"@jsignell - perhaps not a very pretty solution but we do save the source of each variable in the encoding dictionary. 
```python ds['varname'].encoding['source'] ``` Presumably, you could unpack this via a preprocess step.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-437156317,https://api.github.com/repos/pydata/xarray/issues/2550,437156317,MDEyOklzc3VlQ29tbWVudDQzNzE1NjMxNw==,4806877,2018-11-08T21:07:48Z,2018-11-08T21:07:48Z,CONTRIBUTOR,"> There is a preprocess argument. You provide a function and it is run on every file. Yes but the input to that function is just the ds, I couldn't figure out a way to get the filename from within a preprocess function. This is what I was doing to poke around in there: ```python def func(ds): import pdb; pdb.set_trace() xr.open_mfdataset(['./ST4.2018092500.01h', './ST4.2018092501.01h'], engine='pynio', concat_dim='path', preprocess=func) ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407 https://github.com/pydata/xarray/issues/2550#issuecomment-437153461,https://api.github.com/repos/pydata/xarray/issues/2550,437153461,MDEyOklzc3VlQ29tbWVudDQzNzE1MzQ2MQ==,2448579,2018-11-08T20:58:47Z,2018-11-08T20:58:47Z,MEMBER,There is a preprocess argument. You provide a function and it is run on every file.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,378898407