issues: 416962458

  • id: 416962458
  • node_id: MDU6SXNzdWU0MTY5NjI0NTg=
  • number: 2799
  • title: Performance: numpy indexes small amounts of data 1000 faster than xarray
  • user: 1386642
  • state: open
  • locked: 0
  • comments: 42
  • created_at: 2019-03-04T19:44:17Z
  • updated_at: 2024-03-18T17:51:25Z
  • author_association: CONTRIBUTOR
  • repo: 13221727
  • type: issue

Machine learning applications often require iterating over every index along some of the dimensions of a dataset. For instance, iterating over all the (lat, lon) pairs in a 4D dataset with dimensions (time, level, lat, lon). Unfortunately, this is very slow with xarray objects compared to numpy (or h5py) arrays. When the Pangeo machine learning working group met today, we found that several of us have struggled with this.
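
As a concrete illustration of that access pattern, here is a minimal sketch (the dimension names follow the report, but the array sizes and variable names are made up for illustration):

```python
import numpy as np
import xarray as xr

# Hypothetical 4D array with dims (time, level, lat, lon); the sizes
# here are illustrative, not taken from the original report.
da = xr.DataArray(
    np.random.rand(4, 5, 32, 64),
    dims=("time", "level", "lat", "lon"),
)

# Visit every (lat, lon) pair, extracting the (time, level) slice at
# each point -- one small indexing operation per grid cell.
for i in range(da.sizes["lat"]):
    for j in range(da.sizes["lon"]):
        sample = da.isel(lat=i, lon=j)  # each call pays xarray's indexing overhead
```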

I made some simplified benchmarks, which show that xarray is about 1000 times slower than numpy when repeatedly grabbing a small amount of data from an array. This is a problem with both isel and [] indexing. After some profiling, the main culprits appear to be xarray routines like _validate_indexers and _broadcast_indexes.
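
A minimal sketch of that kind of benchmark (the shapes, repeat count, and dimension names here are mine, not the original benchmark code) times one small lookup through numpy, isel, and []:

```python
import timeit

import numpy as np
import xarray as xr

arr = np.random.rand(100, 100)
da = xr.DataArray(arr, dims=("x", "y"))

n = 10_000

# Average time per single small lookup, over n repetitions.
t_numpy = timeit.timeit(lambda: arr[0, 0], number=n) / n
t_isel = timeit.timeit(lambda: da.isel(x=0, y=0), number=n) / n
t_brackets = timeit.timeit(lambda: da[0, 0], number=n) / n

print(f"numpy [0, 0]   : {t_numpy:.2e} s")
print(f"xarray isel    : {t_isel:.2e} s")
print(f"xarray [0, 0]  : {t_brackets:.2e} s")
print(f"slowdown (isel): {t_isel / t_numpy:.0f}x")
```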

While Python will always be slower than C when iterating over an array in this fashion, I would hope that xarray could be nearly as fast as numpy. I am not sure what the best way to improve this is, though.

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2799/reactions",
    "total_count": 9,
    "+1": 9,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
