html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7516#issuecomment-1532601237,https://api.github.com/repos/pydata/xarray/issues/7516,1532601237,IC_kwDOAMm_X85bWaOV,1492047,2023-05-03T07:58:22Z,2023-05-03T07:58:22Z,CONTRIBUTOR,"Hello, I'm not sure the performance problems were fully addressed (we're now forced to fully compute/load the selection expression), but changes made in the latest versions make this issue irrelevant and I think we can close it. Thank you!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1575938277
https://github.com/pydata/xarray/issues/7516#issuecomment-1451754167,https://api.github.com/repos/pydata/xarray/issues/7516,1451754167,IC_kwDOAMm_X85WiAK3,1492047,2023-03-02T11:59:47Z,2023-03-02T11:59:47Z,CONTRIBUTOR,"The `.variable` computation is fast but it cannot be used directly as you suggest: ``` dsx.where(sel.variable, drop=True) TypeError: cond argument is ... but must be a or ``` Doing it like this seems to work correctly (and is fast enough): ``` dsx[""x""] = sel.variable.compute() dsx.where(dsx[""x""], drop=True) ``` `_nadir` variables have the same chunks and are way faster to read than the other ones (a lot smaller). ![image](https://user-images.githubusercontent.com/1492047/222421050-0928ddfc-f5d9-4767-a7d2-84fdf8f91938.png) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1575938277
https://github.com/pydata/xarray/issues/7516#issuecomment-1450712889,https://api.github.com/repos/pydata/xarray/issues/7516,1450712889,IC_kwDOAMm_X85WeB85,2448579,2023-03-01T19:10:15Z,2023-03-01T19:10:15Z,MEMBER,"Yeah that was another change I guess. We could extract out the variable using `.variable`. 
``` .where(sel2.variable.compute(), drop=True) ``` Do your `""_nadir""` variables have smaller chunk sizes or are slower to read for some reason?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1575938277
https://github.com/pydata/xarray/issues/7516#issuecomment-1449714522,https://api.github.com/repos/pydata/xarray/issues/7516,1449714522,IC_kwDOAMm_X85WaONa,1492047,2023-03-01T09:43:27Z,2023-03-01T09:43:27Z,CONTRIBUTOR,"``` sel = (dsx[""longitude""] > 0) & (dsx[""longitude""] < 100) sel.compute() ``` This ""compute"" finishes and takes more than 80 seconds on both versions, with huge memory consumption (it loads the 4 coordinates and the result itself). I know xarray has to keep more information regarding coordinates and dimensions, but doing this (just dask arrays): ``` sel2 = (dsx[""longitude""].data > 0) & (dsx[""longitude""].data < 100) sel2.compute() ``` takes less than 6 seconds.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1575938277
https://github.com/pydata/xarray/issues/7516#issuecomment-1449085012,https://api.github.com/repos/pydata/xarray/issues/7516,1449085012,IC_kwDOAMm_X85WX0hU,2448579,2023-02-28T23:30:59Z,2023-02-28T23:30:59Z,MEMBER,Does `sel.compute()` not finish?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1575938277
https://github.com/pydata/xarray/issues/7516#issuecomment-1447798846,https://api.github.com/repos/pydata/xarray/issues/7516,1447798846,IC_kwDOAMm_X85WS6g-,1492047,2023-02-28T08:54:16Z,2023-02-28T11:24:11Z,CONTRIBUTOR,"Just tried it and it does not seem identical at all to what was happening earlier. 
This is the kind of dataset I'm working with ![image](https://user-images.githubusercontent.com/1492047/221800788-b44051ba-f89a-4dd7-9358-21128858e4d7.png) With this selection: `sel = (dsx[""longitude""] > 0) & (dsx[""longitude""] < 100)` Old xarray takes a little less than 1 minute and less than 6GB of memory. New xarray with compute did not finish and had to be stopped before consuming my 16GB of memory.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1575938277
https://github.com/pydata/xarray/issues/7516#issuecomment-1447565936,https://api.github.com/repos/pydata/xarray/issues/7516,1447565936,IC_kwDOAMm_X85WSBpw,2448579,2023-02-28T04:41:03Z,2023-02-28T04:41:03Z,MEMBER,"The old code had: ``` nonzeros = zip(clipcond.dims, np.nonzero(clipcond.values)) ``` This loaded the array once and then passed numpy values to the indexing code. Now, the dask array is passed to the indexing code and is computed many times. #5873 raises an error saying boolean indexing with dask arrays is not allowed. For here just do `ds.where(sel.compute(), drop=True)`. It's identical to what was happening earlier. I think we should close this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1575938277
https://github.com/pydata/xarray/issues/7516#issuecomment-1447037080,https://api.github.com/repos/pydata/xarray/issues/7516,1447037080,IC_kwDOAMm_X85WQAiY,43316012,2023-02-27T20:27:52Z,2023-02-27T20:27:52Z,COLLABORATOR,"I am a bit puzzled here... The dask graph looks identical, so it must be the way the indexers are constructed. 
The major difference I can find is that the old version used `np.unique` while the new version uses xarray's `cond.any(..)`. Maybe someone with more experience in dask can help out?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1575938277
https://github.com/pydata/xarray/issues/7516#issuecomment-1445469752,https://api.github.com/repos/pydata/xarray/issues/7516,1445469752,IC_kwDOAMm_X85WKB44,43316012,2023-02-26T21:16:35Z,2023-02-26T21:16:35Z,COLLABORATOR,"Git bisect pinpoints this to https://github.com/pydata/xarray/pull/6690 which, funnily enough, is my PR haha. I will look into it when I find time :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1575938277
https://github.com/pydata/xarray/issues/7516#issuecomment-1445467918,https://api.github.com/repos/pydata/xarray/issues/7516,1445467918,IC_kwDOAMm_X85WKBcO,43316012,2023-02-26T21:07:56Z,2023-02-26T21:07:56Z,COLLABORATOR,"Can confirm, on my machine it went from 520ms to 5s","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1575938277