
issue_comments


3 rows where issue = 1379372915 ("pandas.errors.InvalidIndexError raised when running computation in parallel using dask") and user = 691772 (lumbric), sorted by updated_at descending




lumbric (691772) · CONTRIBUTOR · created 2022-10-05T07:02:23Z · updated 2022-10-05T07:02:48Z
https://github.com/pydata/xarray/issues/7059#issuecomment-1268031159

I agree with just passing all args explicitly.

> Does it work otherwise with "processes"?

What do you mean by that?
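(The quoted question presumably refers to dask's multiprocessing scheduler. As a minimal sketch with a toy array rather than the issue's data, this is how one would rerun a computation under the "processes" scheduler to see whether the error reproduces there:)

```python
import dask
import dask.array as da

# Toy computation standing in for the issue's pipeline.
x = da.random.random((1_000, 1_000), chunks=(100, 100))

# Select the multiprocessing scheduler for this block only;
# outside the context manager the default scheduler is restored.
with dask.config.set(scheduler="processes"):
    total = x.sum().compute()
```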

> 1. Why are you chunking inside the mapped function?

Uhm yes, you are right, this should be removed; I'm not sure how this happened. Removing `.chunk({"time": None})` in the lambda function does not change the behavior of the example with respect to this issue.

> 2. If you `conda install flox`, the resample operation should be quite efficient, without the need to use `map_blocks`.

Oh wow, thanks! Haven't seen flox before.
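(For context: once installed, flox is picked up by xarray's groupby/resample machinery automatically, so the resampling from the issue can be written without `map_blocks` at all. A minimal sketch with made-up data; the dimension names mirror the issue's example, and `loffset` matches the issue's code although it is deprecated in newer xarray versions:)

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up daily data chunked along time, mirroring the issue's layout.
time = pd.date_range("2000-01-01", periods=3 * 365, freq="D")
data = xr.DataArray(
    np.random.rand(time.size, 10),
    dims=("time", "locations"),
    coords={"time": time},
).chunk({"time": 365})

# With flox installed, this grouped reduction is dispatched to flox's
# optimized implementation automatically; no map_blocks needed.
annual = data.sortby("time").resample(time="1A", label="left", loffset="1D").mean(dim="time")
print(annual.compute())
```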

lumbric (691772) · CONTRIBUTOR · created 2022-09-22T11:09:16Z
https://github.com/pydata/xarray/issues/7059#issuecomment-1254873700

I have managed to reduce the reproducing example (see "Minimal Complete Verifiable Example 2" above) and also to find a proper solution to this issue. I am still not sure whether this is a bug or intended behavior, so I won't close the issue for now.

Basically, the issue occurs when a chunked NetCDF file is loaded from disk, passed to `xarray.map_blocks()`, and then used inside the worker as a parameter to `.sel()` to subset some other xarray object that is not itself passed to the worker `func()`. I think the proper solution is to use the `args` parameter of `map_blocks()` instead of `.sel()`:

```diff
--- run-broken.py   2022-09-22 13:00:41.095555961 +0200
+++ run.py          2022-09-22 13:01:14.452696511 +0200
@@ -30,17 +30,17 @@
 def resample_annually(data):
     return data.sortby("time").resample(time="1A", label="left", loffset="1D").mean(dim="time")
 
-def worker(data):
-    locations_chunk = locations.sel(locations=data.locations)
-    out_raw = data * locations_chunk
+def worker(data, locations):
+    out_raw = data * locations
 
     out = resample_annually(out_raw)
     return out
 
 template = resample_annually(data)
 
 out = xr.map_blocks(
-    lambda data: worker(data).compute().chunk({"time": None}),
+    lambda data, locations: worker(data, locations).compute().chunk({"time": None}),
     data,
+    (locations,),
     template=template,
 )
```

This seems to fix the issue and appears to be the proper solution anyway.
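(As a self-contained illustration of the same pattern, with toy data and hypothetical sizes rather than the issue's actual dataset: passing the auxiliary object through `args` lets `map_blocks` subset it per block, so the worker never has to call `.sel()` on a closed-over object.)

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-ins for the issue's `data` and `locations` objects.
time = pd.date_range("2000-01-01", periods=730, freq="D")
coords = {"time": time, "locations": np.arange(4)}
data = xr.DataArray(
    np.random.rand(time.size, 4), dims=("time", "locations"), coords=coords
).chunk({"locations": 2})
locations = xr.DataArray(
    np.random.rand(4), dims=("locations",), coords={"locations": np.arange(4)}
)

def worker(data, locations):
    # `locations` arrives already subset to this block's coordinates,
    # because map_blocks aligns and splits xarray objects passed via `args`.
    return data * locations

# The output has the same shape and chunks as `data`,
# so `data` itself can serve as the template.
out = xr.map_blocks(worker, data, (locations,), template=data)
print(out.compute())
```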

I still don't see why I am not allowed to use `.sel()` on shadowed objects in the worker `func()`. Is this on purpose? If so, should we add something to the documentation? Is this a specific behavior of `map_blocks()`? Is it related to #6904?

lumbric (691772) · CONTRIBUTOR · created 2022-09-20T15:54:48Z
https://github.com/pydata/xarray/issues/7059#issuecomment-1252561840

@benbovy thanks for the hint! I tried passing an explicit lock to `xr.open_mfdataset()` as suggested, but it didn't change anything; I still get the same exception. I will double-check that I did it the right way; I might be missing something.
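(For reference, a minimal sketch of what "passing an explicit lock" might look like. The file pattern and engine are assumptions, and whether the `lock` keyword is forwarded depends on the backend and xarray version:)

```python
import xarray as xr
from dask.utils import SerializableLock

# One shared, serializable lock so that concurrent reads of the NetCDF
# files from dask workers are funneled through a single lock instance.
lock = SerializableLock()
ds = xr.open_mfdataset(
    "data/*.nc",      # hypothetical file pattern
    engine="netcdf4",
    lock=lock,
)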



Table schema:

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```