
issue 6413: Lazy label-based .isel() using 1-D Boolean array, followed by .load() is very slow

  • id: 1181573623
  • node_id: I_kwDOAMm_X85GbWH3
  • user: 28287009
  • state: closed (state_reason: not_planned)
  • comments: 1
  • created_at: 2022-03-26T07:31:00Z
  • updated_at: 2023-11-06T06:10:01Z
  • closed_at: 2023-11-06T06:10:00Z
  • author_association: NONE
  • repo: 13221727
  • type: issue

What is your issue?

Info about my dataset

I have a large (~20 GB, ~27,000 x ~300,000, int16) netCDF-4 file written to disk incrementally along the first (unlimited) dimension without using dask (using code adapted from this comment). The DataArray stored in this file also has ~50 coordinates along the first dimension and ~300 coordinates along the second dimension.
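
For context, the incremental-write pattern described above looks roughly like the sketch below. This is a minimal illustration using the netCDF4 package (not the code from the linked comment); the file name, variable name, and sizes are all made up.

```python
import netCDF4
import numpy as np

# Hypothetical file, variable, and sizes, for illustration only.
with netCDF4.Dataset("example.nc", mode="w", format="NETCDF4") as nc:
    nc.createDimension("first_dim", None)   # unlimited, so the file can grow
    nc.createDimension("second_dim", 300)
    var = nc.createVariable("data", "i2", ("first_dim", "second_dim"))
    for i in range(10):
        # Append one block of rows at a time instead of holding everything in memory.
        block = np.random.randint(0, 100, size=(100, 300), dtype=np.int16)
        var[i * 100:(i + 1) * 100, :] = block
```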

Trying to load a subset of the data into memory

I have a 1-D Boolean mask my_mask (with ~15,000 True values) along the second dimension that I'd like to use to index the array. When I do the following, the operation is very slow (I haven't seen it complete):

```python
import xarray as xr

x = xr.open_dataarray(path_to_file)
x = x.isel({"second_dim": my_mask})
x = x.load()
```

However, I can load the entire array and then index (this is slow-ish, but works):

```python
import xarray as xr

x = xr.load_dataarray(path_to_file)
x = x.isel({"second_dim": my_mask})
```

Is this vectorized indexing?

I'm not sure if this is expected behavior: according to the Tip here in the User Guide, indexing is slow when using vectorized indexing, which I assumed to mean indexing along multiple dimensions (outer indexing, in numpy parlance). Is indexing using a 1D Boolean mask (or equivalently a 1D integer array) also slow?
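
For reference, here is a small toy example of the two indexing modes this question distinguishes; the array, dimension names, and index values are invented and have nothing to do with the original dataset.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(12).reshape(3, 4),
    dims=("first_dim", "second_dim"),
)

# Indexing one dimension with a 1-D Boolean mask (equivalent to a 1-D integer
# array): orthogonal/outer indexing in numpy parlance.
mask = np.array([True, False, True, False])
outer = da.isel({"second_dim": mask})

# Vectorized (pointwise) indexing: index arrays sharing a new common dimension
# pick out individual points rather than whole rows/columns.
rows = xr.DataArray([0, 2], dims="points")
cols = xr.DataArray([1, 3], dims="points")
pointwise = da.isel(first_dim=rows, second_dim=cols)
```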

What to do for larger datasets that don't fit in RAM?

Right now, I can load and then isel because my array fits in RAM. I have other datasets that don't fit in RAM: how would you recommend I load a subset of such data from disk?

In the event that I have to use dask, I will be writing along the first dimension (and hence probably chunking along that dimension) and reading along the second dimension: is that going to be efficient (or at least more efficient than whatever xarray is doing sans dask)?
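
If dask does end up being necessary, one possible shape for that workflow is sketched below. This is an assumption rather than a confirmed recommendation; the chunk size and the name of the first dimension are placeholders.

```python
import xarray as xr

# Open lazily with dask chunks along the first (append) dimension, then select
# along the second dimension and compute only what is actually needed.
x = xr.open_dataarray(path_to_file, chunks={"first_dim": 1_000})
subset = x.isel({"second_dim": my_mask}).compute()
```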

