html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1094#issuecomment-259121188,https://api.github.com/repos/pydata/xarray/issues/1094,259121188,MDEyOklzc3VlQ29tbWVudDI1OTEyMTE4OA==,4160723,2016-11-08T12:12:54Z,2016-11-08T12:13:33Z,MEMBER,"Yes, I understand that using a `pandas.MultiIndex` in such a case is not efficient at all, both for indexing and in terms of memory usage and index building. My example was actually incomplete, since I also have categorical indexes such as a few regions defined in space (with complex geometries) and node types (e.g., boundary, active, inactive). Sorry not to have mentioned that. A KDTree is indeed good for indexing on space coordinates. Looking at the API you suggest in #475, my (2-d) mesh might look like this: ``` >>> ds Dimensions: (node: 10000000) Coordinates: * node - region (node) int32 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ... - node_type (node) object 'b' 'b' 'b' 'b' 'a' 'a' 'a' 'a' 'a' 'a' 'a' ... * spatial_index (node) KDTree - x (node) float64 49.754 56.823 65.765 93.058 96.691 105.832 ... - y (node) float64 37.582 45.769 58.672 77.029 82.983 99.672 ... Data variables: topo_elevation (node) float64 57.352 48.710 47.080 49.072 33.184 54.833 ... ``` Anyway, maybe I've opened this issue a bit too early since my data still fits into memory, though it is likely that I'll have to deal with meshes of 1e8 to 1e9 nodes in the near future. Side note: I don't know why I get much worse performance on my machine when building the KDTree? 
(Intel(R) Xeon(R) CPU x4 5160 @ 3.00GHz, 16 GB RAM, scipy 0.18.1, numpy 1.11.2) ``` In [3]: x = np.random.rand(int(1e7), 3) In [4]: %time tree = scipy.spatial.cKDTree(x, leafsize=100) CPU times: user 38 s, sys: 64 ms, total: 38.1 s Wall time: 38.1 s ``` ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187873247