issue_comments

11 rows where issue = 40395257 and user = 291576 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
58570858 https://github.com/pydata/xarray/issues/214#issuecomment-58570858 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTcwODU4 WeatherGod 291576 2014-10-09T20:19:12Z 2014-10-09T20:19:12Z CONTRIBUTOR

Ok, I think I got it (for reals this time...)

```
import numpy as np
import xray
from scipy.spatial import cKDTree as KDTree


def bcast(spat_only, coord_names):
    # Expand each named coordinate so that all of them broadcast to a common shape.
    coords = []
    for i, n in enumerate(coord_names):
        if spat_only[n].ndim != len(spat_only.dims):
            # Needs new axes
            slices = [np.newaxis] * len(spat_only.dims)
            slices[i] = slice(None)
        else:
            slices = [slice(None)] * len(spat_only.dims)
        # Index with a tuple: newer NumPy versions no longer accept a list of slices here.
        coords.append(spat_only[n].values[tuple(slices)])
    return np.broadcast_arrays(*coords)


def grid_to_points2(grid, points, coord_names):
    if not coord_names:
        raise ValueError("No coordinate names provided")
    # Dimensions that any of the named coordinates depend on.
    spat_dims = {d for n in coord_names for d in grid[n].dims}
    not_spatial = set(grid.dims) - spat_dims
    # Take index 0 along every non-spatial dimension to get a spatial-only view.
    spatial_selection = {n: 0 for n in not_spatial}
    spat_only = grid.isel(**spatial_selection)

    coords = bcast(spat_only, coord_names)

    # list(...) keeps this working where zip returns an iterator (Python 3).
    kd = KDTree(list(zip(*[c.ravel() for c in coords])))
    _, indx = kd.query(list(zip(*[points[n].values for n in coord_names])))
    indx = np.unravel_index(indx, coords[0].shape)

    return xray.concat(
            (grid.isel(**{n: j for n, j in zip(spat_only.dims, i)})
             for i in zip(*indx)),
            dim='station')
```

Needs a lot more tests and comments and such, but I think this works. Best part is that it seems to do a very decent job of keeping memory usage low, and only operates upon the coordinates that I specify. Everything else is left alone. So, I have used this on 4-D data, picking out grid points at specified lat/lon positions, and get back a 3D result (time, level, station). And I have used this on just 2D data, getting back just a 1D result (dimension='station').
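
To make the 4-D case concrete, here is a minimal NumPy/SciPy sketch of the same nearest-grid-point lookup without xray. The grid, the field `data`, and the station positions are all made up for illustration, and plain Euclidean distance in degrees is assumed to be good enough for picking neighbours.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical 3x4 curvilinear grid: latitude and longitude both vary with (y, x).
y, x = np.meshgrid(np.arange(3), np.arange(4), indexing="ij")
lat = 30.0 + 1.0 * y + 0.1 * x
lon = -100.0 + 1.0 * x + 0.1 * y

# Hypothetical 4-D field with dimensions (time, level, y, x).
data = np.arange(2 * 5 * 3 * 4, dtype=float).reshape(2, 5, 3, 4)

# Build the tree from the flattened grid coordinates, one row per grid cell.
tree = cKDTree(np.column_stack([lat.ravel(), lon.ravel()]))

# Two made-up station positions as (lat, lon) pairs.
stations = np.array([[31.05, -98.9], [32.2, -97.1]])
_, flat_idx = tree.query(stations)

# Convert the flat indices back into (y, x) index arrays.
iy, ix = np.unravel_index(flat_idx, lat.shape)

# Pick the nearest grid cell for each station; only the spatial axes are indexed,
# so the result has dimensions (time, level, station).
picked = data[:, :, iy, ix]
```

Only the two spatial axes are touched by the fancy indexing, which is why the non-spatial dimensions come through unchanged.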

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
58568933 https://github.com/pydata/xarray/issues/214#issuecomment-58568933 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTY4OTMz WeatherGod 291576 2014-10-09T20:05:01Z 2014-10-09T20:05:01Z CONTRIBUTOR

Consider the following Dataset:

<xray.Dataset>
Dimensions:    (lv_HTGL1: 2, lv_HTGL3: 2, lv_HTGL5: 2, lv_HTGL6: 2, lv_ISBL0: 37, lv_SPDL2: 6, lv_SPDL4: 3, time: 9, xgrid_0: 451, ygrid_0: 337)
Coordinates:
  * xgrid_0    (xgrid_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * ygrid_0    (ygrid_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * lv_ISBL0   (lv_ISBL0) float32 10000.0 12500.0 15000.0 17500.0 20000.0 ...
  * lv_HTGL6   (lv_HTGL6) float32 1000.0 4000.0
  * lv_HTGL1   (lv_HTGL1) float32 2.0 80.0
  * lv_HTGL3   (lv_HTGL3) float32 10.0 80.0
    latitude   (ygrid_0, xgrid_0) float32 16.281 16.3084 16.3356 16.3628 16.3898 ...
    longitude  (ygrid_0, xgrid_0) float32 233.862 233.984 234.106 234.229 ...
  * lv_HTGL5   (lv_HTGL5) int64 0 1
  * lv_SPDL2   (lv_SPDL2) int64 0 1 2 3 4 5
  * lv_SPDL4   (lv_SPDL4) int64 0 1 2
  * time       (time) datetime64[ns] 2014-09-25T01:00:00 ...
Variables:
    gridrot_0         (ygrid_0, xgrid_0) float32 -0.229676 -0.228775 -0.227873 ...
    TMP_P0_L103_GLC0  (time, lv_HTGL1, ygrid_0, xgrid_0) float64 295.8 295.7 295.7 295.7 ...

The latitude and longitude variables are both dependent upon xgrid_0 and ygrid_0. Meanwhile...

<xray.Dataset>
Dimensions:    (station: 120, time: 4)
Coordinates:
    latitude   (station) float32 34.805 34.795 34.585 36.705 34.245 34.915 34.195 36.075 ...
  * station    (station) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 ...
    sixhourly  (time) int64 0 1 2 3
    longitude  (station) float32 -98.025 -96.665 -99.335 -98.705 -95.665 -98.295 ...
  * time       (time) datetime64[ns] 2014-10-07 2014-10-07T06:00:00 ...
Variables:
    MaxGust    (station, time) float64 7.794 7.47 8.675 4.788 7.071 7.903 8.641 5.533 ...

the latitude and longitude variables are independent of each other (they are 1-D).

The variable in the first one cannot be accessed directly by lat/lon values, while the MaxGust variable in the second one can. This poses some difficulties.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
58565934 https://github.com/pydata/xarray/issues/214#issuecomment-58565934 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTY1OTM0 WeatherGod 291576 2014-10-09T19:43:08Z 2014-10-09T19:43:08Z CONTRIBUTOR

Hmmm, limitation that I just encountered. When there are dependent coordinates, the variables representing those coordinates are not the index arrays (and thus, are not "dimensions" either), so my solution is completely broken for dependent coordinates. If I were to go back to my DataArray-only solution, then I still need to correct the code to use the dimension names of the coordinate variables, and still need to fix the coordinates != dimensions issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
58562506 https://github.com/pydata/xarray/issues/214#issuecomment-58562506 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTYyNTA2 WeatherGod 291576 2014-10-09T19:16:52Z 2014-10-09T19:16:52Z CONTRIBUTOR

to/from_dataframe just ate up all my memory. I think I am going to stick with my broadcasting approach...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
58558069 https://github.com/pydata/xarray/issues/214#issuecomment-58558069 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTU4MDY5 WeatherGod 291576 2014-10-09T18:47:22Z 2014-10-09T18:47:22Z CONTRIBUTOR

oooh, didn't realize that dims is different for Dataset and DataArray... Gonna have to fix that, too. I am checking out the broadcasting functions you pointed out. The one limitation I see right away with xray.core.variable.broadcast_variables is that it is limited to two variables (presumably, I would be broadcasting N coordinates, because the variables may or may not have extraneous dimensions that I don't care to broadcast).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
58553935 https://github.com/pydata/xarray/issues/214#issuecomment-58553935 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTUzOTM1 WeatherGod 291576 2014-10-09T18:21:16Z 2014-10-09T18:21:16Z CONTRIBUTOR

And, actually, the example I gave above has a bug in the dependent dimension case. This one should be much better (not fully tested yet, though):

```
import numpy as np
import xray
from scipy.spatial import cKDTree as KDTree


def grid_to_points2(grid, points, coord_names):
    if not coord_names:
        raise ValueError("No coordinate names provided")
    not_spatial = set(grid.dims) - set(coord_names)
    # Take index 0 along every non-spatial dimension.
    spatial_selection = {n: 0 for n in not_spatial}
    # isel takes keyword arguments, so the dict is unpacked with **.
    spat_only = grid.isel(**spatial_selection)
    coords = []
    for i, n in enumerate(spat_only.dims):
        if spat_only[n].ndim != len(spat_only.dims):
            # Needs new axes
            slices = [np.newaxis] * len(spat_only.dims)
            slices[i] = slice(None)
        else:
            slices = [slice(None)] * len(spat_only.dims)
        coords.append(spat_only[n].values[tuple(slices)])
    # broadcast_arrays expects the arrays as separate arguments.
    coords = np.broadcast_arrays(*coords)

    kd = KDTree(list(zip(*[c.flatten() for c in coords])))
    _, indx = kd.query(list(zip(*[points[n].values for n in spat_only.dims])))
    indx = np.unravel_index(indx, coords[0].shape)

    return xray.concat(
            (grid.sel(**{n: c[i] for n, c in zip(spat_only.dims, coords)})
             for i in zip(*indx)),
            dim='station')
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
58551759 https://github.com/pydata/xarray/issues/214#issuecomment-58551759 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTUxNzU5 WeatherGod 291576 2014-10-09T18:06:56Z 2014-10-09T18:06:56Z CONTRIBUTOR

And, I think I just realized how I could generalize it even more. Right now, grid can only be a DataArray, but I would like this to work for a Dataset as well. I bet if I use .sel() instead of .isel() and access the elements of the broadcasted arrays, I could make this work very nicely for both DataArray and Dataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
58550741 https://github.com/pydata/xarray/issues/214#issuecomment-58550741 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTUwNzQx WeatherGod 291576 2014-10-09T18:00:33Z 2014-10-09T18:00:33Z CONTRIBUTOR

Oh, and it does take advantage of a bunch of python2.7 features such as dictionary comprehensions and generator expressions, so...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
58550403 https://github.com/pydata/xarray/issues/214#issuecomment-58550403 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTUwNDAz WeatherGod 291576 2014-10-09T17:58:25Z 2014-10-09T17:58:25Z CONTRIBUTOR

Started using the above snippet for more datasets, some with interdependent coordinates and some without (so the coordinates would be 1-d). I think I have generalized it significantly...

```
import numpy as np
import xray
from scipy.spatial import cKDTree as KDTree


def grid_to_points(grid, points, coord_names):
    not_spatial = set(grid.dims) - set(coord_names)
    # Take index 0 along every non-spatial dimension.
    spatial_selection = {n: 0 for n in not_spatial}
    # isel takes keyword arguments, so the dict is unpacked with **.
    spat_only = grid.isel(**spatial_selection)
    coords = []
    for i, n in enumerate(spat_only.dims):
        if spat_only[n].ndim != len(spat_only.dims):
            # Needs new axes
            slices = [np.newaxis] * len(spat_only.dims)
            slices[i] = slice(None)
        else:
            slices = [slice(None)] * len(spat_only.dims)
        coords.append(spat_only[n].values[tuple(slices)])
    # broadcast_arrays expects the arrays as separate arguments.
    coords = [c.flatten() for c in np.broadcast_arrays(*coords)]

    kd = KDTree(list(zip(*coords)))
    _, indx = kd.query(list(zip(*[points[n].values for n in spat_only.dims])))
    indx = np.unravel_index(indx, spat_only.shape)

    return xray.concat((grid.isel(**{n: j for n, j in zip(spat_only.dims, i)})
                        for i in zip(*indx)), dim='station')
```

I can still imagine some situations where this won't work, such as a requested set of dimensions that are a mix of dependent and independent variables. Currently, if the dimensions are independent, then the number of dimensions of each one is assumed to be 1 and np.newaxis is used for the others. Meanwhile, if the dimensions are dependent, then the number of dimensions for each one is assumed to be the same as the number of dependent variables and is merely flattened (the broadcast is essentially no-op).
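
The two cases described above can be sketched with plain NumPy. The grid values here are invented for illustration. Note that `np.broadcast_arrays` accepts any number of arrays, so it is not limited to two coordinates.

```python
import numpy as np

# Independent (1-D) coordinates on a hypothetical 3x4 grid.
lat_1d = np.array([30.0, 31.0, 32.0])             # varies along axis 0
lon_1d = np.array([-100.0, -99.0, -98.0, -97.0])  # varies along axis 1

# Give each 1-D coordinate a length-1 axis everywhere except its own axis,
# then broadcast both to the full (3, 4) grid shape.
lat_b, lon_b = np.broadcast_arrays(lat_1d[:, np.newaxis],
                                   lon_1d[np.newaxis, :])

# Dependent (2-D) coordinates already have the full grid shape,
# so broadcasting them against each other changes nothing.
lat_2d = lat_b + 0.1
lon_2d = lon_b - 0.1
lat_c, lon_c = np.broadcast_arrays(lat_2d, lon_2d)
```

After this step both cases look the same to the KD-tree: one full-shape array per coordinate, ready to be flattened.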

I should also note that this is technically not restricted to spatial coordinates even though the code says so. It works for anything that can be represented in Euclidean space.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
57857522 https://github.com/pydata/xarray/issues/214#issuecomment-57857522 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU3ODU3NTIy WeatherGod 291576 2014-10-03T20:48:35Z 2014-10-03T20:48:35Z CONTRIBUTOR

Just managed to implement this using your suggestion for my data:

    from scipy.spatial import cKDTree as KDTree
    kd = KDTree(list(zip(mod['longitude'].values.ravel(),
                         mod['latitude'].values.ravel())))
    dists, indx = kd.query(list(zip(obs['longitude'], obs['latitude'])))
    indx = np.unravel_index(indx, mod['longitude'].shape)
    mod_points = xray.concat([mod.isel(x=x, y=y) for y, x in zip(*indx)], dim='station')

Not entirely certain why I needed to reverse y and x in that last part, but, oh well...
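
One plausible explanation for the ordering, assuming the longitude array is stored with y as its first axis and x as its second: `np.unravel_index` returns one index array per axis, in the same order as the shape it is given, so the first array holds y indices. A tiny sketch with a made-up 3x4 grid:

```python
import numpy as np

ny, nx = 3, 4
lon = np.zeros((ny, nx))  # hypothetical grid stored as (y, x)

# Flat indices as a KD-tree built on lon.ravel() would return them.
flat = np.array([5, 11])

# One index array per axis, in shape order: first y (rows), then x (columns).
iy, ix = np.unravel_index(flat, lon.shape)
```

Flat index 5 in a (3, 4) array is row 1, column 1, and flat index 11 is row 2, column 3, so unpacking as `y, x` matches the array layout.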

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
57847940 https://github.com/pydata/xarray/issues/214#issuecomment-57847940 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU3ODQ3OTQw WeatherGod 291576 2014-10-03T19:56:16Z 2014-10-03T19:56:16Z CONTRIBUTOR

Unless I am missing something about xray, that selection operation could only work if pts had values that exactly matched coordinate values in ds. In most scenarios, that would not be the case. One would have to first build pts from a computation of nearest-neighbor indices between the stations and the model grid.
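
A small 1-D illustration of the point, using invented grid and station values: exact matching finds nothing, so the station positions first have to be snapped to the nearest grid values. The brute-force `argmin` here stands in for a proper nearest-neighbor search and is only meant for tiny inputs.

```python
import numpy as np

grid_lat = np.array([30.0, 31.0, 32.0, 33.0])  # hypothetical model grid values
station_lat = np.array([30.4, 32.6])           # hypothetical station positions

# Exact matching: none of the station values appear on the grid.
exact = np.isin(station_lat, grid_lat)

# Nearest-neighbor index for each station (brute force over all grid values).
diff = np.abs(grid_lat[:, np.newaxis] - station_lat[np.newaxis, :])
nearest = diff.argmin(axis=0)

# Grid values that could then be used for an exact-match selection.
snapped = grid_lat[nearest]
```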

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);