html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7059#issuecomment-1268031159,https://api.github.com/repos/pydata/xarray/issues/7059,1268031159,IC_kwDOAMm_X85LlJ63,691772,2022-10-05T07:02:23Z,2022-10-05T07:02:48Z,CONTRIBUTOR,"> I agree with just passing all args explicitly. Does it work otherwise with `""processes""`? What do you mean by that? > 1. Why are you chunking inside the mapped function? Uhm yes, you are right, this should be removed, not sure how this happened. Removing `.chunk({""time"": None})` in the lambda function does not change the behavior of the example regarding this issue. > 2. If you `conda install flox`, the resample operation should be quite efficient, without the need to use `map_blocks` Oh wow, thanks! Haven't seen flox before.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1379372915
https://github.com/pydata/xarray/issues/7059#issuecomment-1254873700,https://api.github.com/repos/pydata/xarray/issues/7059,1254873700,IC_kwDOAMm_X85Ky9pk,691772,2022-09-22T11:09:16Z,2022-09-22T11:09:16Z,CONTRIBUTOR,"I have managed to reduce the reproducing example (see ""Minimal Complete Verifiable Example 2"" above) and then also find a proper solution to fix this issue. I am still not sure whether this is a bug or intended behavior, so I won't close the issue for now. Basically the issue occurs when a chunked NetCDF file is loaded from disk, passed to `xarray.map_blocks()` and is then used in `.sel()` as a parameter to get a subset of some other xarray object that is not passed to the worker `func()`.
I think the proper solution is to use the `args` parameter of `map_blocks()` instead of `.sel()`: ``` --- run-broken.py 2022-09-22 13:00:41.095555961 +0200 +++ run.py 2022-09-22 13:01:14.452696511 +0200 @@ -30,17 +30,17 @@ def resample_annually(data): return data.sortby(""time"").resample(time=""1A"", label=""left"", loffset=""1D"").mean(dim=""time"") - def worker(data): - locations_chunk = locations.sel(locations=data.locations) - out_raw = data * locations_chunk + def worker(data, locations): + out_raw = data * locations out = resample_annually(out_raw) return out template = resample_annually(data) out = xr.map_blocks( - lambda data: worker(data).compute().chunk({""time"": None}), + lambda data, locations: worker(data, locations).compute().chunk({""time"": None}), data, + (locations,), template=template, ) ``` This seems to fix this issue and seems to be the proper solution anyway. I still don't see why I am not allowed to use `.sel()` on shadowed objects in the worker `func()`. Is this on purpose? If yes, should we add something to the documentation? Is this a specific behavior of `map_blocks()`? Is it related to #6904?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1379372915
https://github.com/pydata/xarray/issues/7059#issuecomment-1252561840,https://api.github.com/repos/pydata/xarray/issues/7059,1252561840,IC_kwDOAMm_X85KqJOw,691772,2022-09-20T15:54:48Z,2022-09-20T15:54:48Z,CONTRIBUTOR,"@benbovy thanks for the hint! I tried passing an explicit lock to `xr.open_mfdataset()` [as suggested](https://github.com/pydata/xarray/issues/6904#issuecomment-1210233503), but it didn't change anything; still the same exception. I will double-check whether I did it the right way; I might be missing something.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1379372915