html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1094#issuecomment-767797103,https://api.github.com/repos/pydata/xarray/issues/1094,767797103,MDEyOklzc3VlQ29tbWVudDc2Nzc5NzEwMw==,1312546,2021-01-26T20:09:11Z,2021-01-26T20:09:11Z,MEMBER,Should this and https://github.com/pydata/xarray/issues/1650 be consolidated into a single issue? I think that they're duplicates of each other.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187873247
https://github.com/pydata/xarray/issues/1094#issuecomment-457468939,https://api.github.com/repos/pydata/xarray/issues/1094,457468939,MDEyOklzc3VlQ29tbWVudDQ1NzQ2ODkzOQ==,26384082,2019-01-25T06:21:47Z,2019-01-25T06:21:47Z,NONE,"In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity
If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187873247
https://github.com/pydata/xarray/issues/1094#issuecomment-259188266,https://api.github.com/repos/pydata/xarray/issues/1094,259188266,MDEyOklzc3VlQ29tbWVudDI1OTE4ODI2Ng==,1197350,2016-11-08T16:38:27Z,2016-11-08T16:38:27Z,MEMBER,"My cKDTree time was:
- 19.2 s on a 32-core Intel(R) Xeon(R) CPU E5-4627 v2 @ 3.30GHz, 512 GB RAM.
- 23 s on my Macbook (1.7 GHz Intel Core i7)
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187873247
https://github.com/pydata/xarray/issues/1094#issuecomment-259121188,https://api.github.com/repos/pydata/xarray/issues/1094,259121188,MDEyOklzc3VlQ29tbWVudDI1OTEyMTE4OA==,4160723,2016-11-08T12:12:54Z,2016-11-08T12:13:33Z,MEMBER,"Yes, I understand that using a `pandas.MultiIndex` for such a case is not efficient at all, whether for indexing, memory usage, or index building.
My example was actually not complete, since I also have categorical indexes such as a few regions defined in space (with complex geometries) and node types (e.g., boundary, active, inactive). Sorry not to have mentioned that.
A KDTree is indeed good for indexing on spatial coordinates. Looking at the API you suggest in #475, my (2-d) mesh might look like this:
```
>>> ds
Dimensions: (node: 10000000)
Coordinates:
* node
- region (node) int32 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ...
- node_type (node) object 'b' 'b' 'b' 'b' 'a' 'a' 'a' 'a' 'a' 'a' 'a' ...
* spatial_index (node) KDTree
- x (node) float64 49.754 56.823 65.765 93.058 96.691 105.832 ...
- y (node) float64 37.582 45.769 58.672 77.029 82.983 99.672 ...
Data variables:
topo_elevation (node) float64 57.352 48.710 47.080 49.072 33.184 54.833 ...
```
Anyway, maybe I've opened this issue a bit too early since my data still fits into memory, though it is likely that I'll have to deal with meshes of 1e8 to 1e9 nodes in the near future.
Side note: I don't know why I get much worse performance on my machine when building the KDTree (Intel(R) Xeon(R) CPU x4 5160 @ 3.00GHz, 16 GB RAM, scipy 0.18.1, numpy 1.11.2):
```
In [3]: x = np.random.rand(int(1e7), 3)
In [4]: %time tree = scipy.spatial.cKDTree(x, leafsize=100)
CPU times: user 38 s, sys: 64 ms, total: 38.1 s
Wall time: 38.1 s
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187873247
https://github.com/pydata/xarray/issues/1094#issuecomment-259022083,https://api.github.com/repos/pydata/xarray/issues/1094,259022083,MDEyOklzc3VlQ29tbWVudDI1OTAyMjA4Mw==,1217238,2016-11-08T01:52:40Z,2016-11-08T01:52:40Z,MEMBER,"For unstructured meshes of points, pandas.MultiIndex is not the right abstraction.
Suppose you have a (very long) list of sorted points `(x, y, z)` in a multi-index. You can efficiently query within fixed bounds along `x` by doing binary search. But for queries in `y` and `z`, you cannot do any better than looking through the entire list. Moreover, pandas.MultiIndex factorizes each level into unique values, which is a complete waste on an unstructured grid where few coordinates overlap.
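The asymmetry between `x` and `y` can be sketched with plain NumPy. This is only an illustration: the point count and the bounds are arbitrary assumptions, and a lexicographically sorted array stands in for the sorted multi-index.

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.random((100_000, 3))  # hypothetical unstructured points (x, y, z)

# Sort lexicographically by x, then y, then z (np.lexsort uses the LAST key as primary).
order = np.lexsort((pts[:, 2], pts[:, 1], pts[:, 0]))
pts = pts[order]

# Bounds on x: binary search yields one contiguous slice in O(log n).
lo, hi = np.searchsorted(pts[:, 0], [0.25, 0.30])
in_x = pts[lo:hi]

# Bounds on y: y is not globally sorted, so every point has to be tested.
in_y = pts[(pts[:, 1] >= 0.25) & (pts[:, 1] < 0.30)]
```

The `x` query touches only the slice `lo:hi`, while the `y` query builds a boolean mask over all rows.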
For unstructured meshes, you need something like a KDTree (see discussion in https://github.com/pydata/xarray/issues/475), ideally with nearby points in space stored in contiguous array chunks.
I would start with trying to get an in-memory KDTree working, and then switch to something out of core only when/if necessary. For example, SciPy's cKDTree can load 1e7 points in 3 dimensions in only a few seconds:
```
import numpy as np
import scipy.spatial

x = np.random.rand(int(1e7), 3)
%time tree = scipy.spatial.cKDTree(x, leafsize=100)
# CPU times: user 2.58 s, sys: 0 ns, total: 2.58 s
# Wall time: 2.55 s
```
That might be good enough.
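Once built, the tree answers spatial queries directly. A minimal sketch, assuming a much smaller random point set than the 1e7 above and an arbitrary query location and radius chosen only for illustration:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
pts = rng.random((100_000, 3))  # hypothetical points, far fewer than 1e7
tree = cKDTree(pts, leafsize=100)

center = [0.5, 0.5, 0.5]  # arbitrary query location

# Distance to, and row index of, the single nearest point.
dist, idx = tree.query(center, k=1)

# Row indices of all points within a radius of the same location.
near = tree.query_ball_point(center, r=0.05)
```

The returned indices refer to rows of `pts`, so they could be used to select the matching nodes from any array that shares the same ordering.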
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187873247