html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7516#issuecomment-1532601237,https://api.github.com/repos/pydata/xarray/issues/7516,1532601237,IC_kwDOAMm_X85bWaOV,1492047,2023-05-03T07:58:22Z,2023-05-03T07:58:22Z,CONTRIBUTOR,"Hello,
I'm not sure the performance problems were fully addressed (we're now forced to fully compute/load the selection expression), but the changes made in the latest versions make this issue irrelevant, and I think we can close it.
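For reference, the pattern that works now looks roughly like this (a sketch only; `dsx` is the dataset and `sel` the boolean selection from the earlier comments):
```
# the selection expression has to be computed/loaded up front before dropping
dsx.where(sel.compute(), drop=True)
```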
Thank you!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1575938277
https://github.com/pydata/xarray/issues/7516#issuecomment-1451754167,https://api.github.com/repos/pydata/xarray/issues/7516,1451754167,IC_kwDOAMm_X85WiAK3,1492047,2023-03-02T11:59:47Z,2023-03-02T11:59:47Z,CONTRIBUTOR,"The `.variable` computation is fast, but it cannot be used directly as you suggest:
```
dsx.where(sel.variable, drop=True)
TypeError: cond argument is <xarray.Variable ...> but must be a
<class 'xarray.core.dataset.Dataset'> or <class 'xarray.core.dataarray.DataArray'>
```
Doing it like this seems to work correctly (and is fast enough):
```
dsx[""x""]= sel.variable.compute()
dsx.where(dsx[""x""], drop=True)
```
The `_nadir` variables have the same chunks and are much faster to read than the others (they are a lot smaller).

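For anyone hitting the same thing, here is the whole workaround as a small helper (a sketch only; the `where_computed` name and the temporary `__mask__` variable are mine, not xarray API):
```
import xarray as xr

def where_computed(ds: xr.Dataset, cond: xr.DataArray, drop: bool = True) -> xr.Dataset:
    # Compute only the underlying Variable of the condition, so the
    # coordinates are not loaded, then attach it as a temporary boolean
    # variable and let .where(..., drop=True) drop on it eagerly.
    tmp = ds.copy()
    tmp[""__mask__""] = cond.variable.compute()
    return tmp.where(tmp[""__mask__""], drop=drop).drop_vars(""__mask__"")

subset = where_computed(dsx, (dsx[""longitude""] > 0) & (dsx[""longitude""] < 100))
```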
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1575938277
https://github.com/pydata/xarray/issues/7516#issuecomment-1449714522,https://api.github.com/repos/pydata/xarray/issues/7516,1449714522,IC_kwDOAMm_X85WaONa,1492047,2023-03-01T09:43:27Z,2023-03-01T09:43:27Z,CONTRIBUTOR,"```
sel = (dsx[""longitude""] > 0) & (dsx[""longitude""] < 100)
sel.compute()
```
This ""compute"" finishes and takes more than 80sec on both versions with a huge memory consumption (it loads the 4 coordinates and the result itself).
I know xarray has to keep more information about coordinates and dimensions, but doing the same thing with just the dask arrays takes less than 6 seconds:
```
sel2 = (dsx[""longitude""].data > 0) & (dsx[""longitude""].data < 100)
sel2.compute()
```
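To feed a mask built this way back into the xarray call, something along these lines should work (a sketch on my side; wrapping the computed mask into a `DataArray` is my addition and not tested on this exact dataset):
```
import xarray as xr

# build the boolean mask at the dask level (the fast path above), compute it,
# then wrap it back into a DataArray so .where(..., drop=True) accepts it
mask = xr.DataArray(sel2.compute(), dims=dsx[""longitude""].dims)
subset = dsx.where(mask, drop=True)
```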
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1575938277
https://github.com/pydata/xarray/issues/7516#issuecomment-1447798846,https://api.github.com/repos/pydata/xarray/issues/7516,1447798846,IC_kwDOAMm_X85WS6g-,1492047,2023-02-28T08:54:16Z,2023-02-28T11:24:11Z,CONTRIBUTOR,"Just tried it, and it does not behave at all like what was happening earlier.
This is the kind of dataset I'm working with:

With this selection:
`sel = (dsx[""longitude""] > 0) & (dsx[""longitude""] < 100)`
The old xarray takes a little less than 1 minute and less than 6 GB of memory.
The new xarray with compute did not finish and had to be stopped before it consumed all 16 GB of my memory.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1575938277