
issue_comments


5 rows where author_association = "NONE" and issue = 146182176 ("Multidimensional groupby"), sorted by updated_at descending

Comment 219231028 · monocongo · created 2016-05-14T16:56:37Z · updated 2016-05-14T16:56:37Z · author_association: NONE
https://github.com/pydata/xarray/pull/818#issuecomment-219231028

I would also like to do what is described below but so far have had little success using xarray.

I have time series data (x years of monthly values) at each lat/lon point of a grid (x*12 times, lons, lats). I want to apply a function f() to the time series at each point and get back a corresponding time series of values, then write those values to an output NetCDF that matches the input NetCDF's dimensions and coordinate variables. So instead of looping over every lat and lon, I want to apply f() in a vectorized manner, as described for xarray's groupby (in order to gain the expected performance of xarray's split-apply-combine pattern), but it needs to work over more than the single dimension that groupby currently supports.
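One vectorized route for exactly this pattern is xarray's apply_ufunc, which applies a 1-D function along a named core dimension at every remaining grid point. A minimal, hypothetical sketch (the synthetic data, the variable name tas, and f() are placeholders, not from this thread):

``` python
import numpy as np
import xarray as xr

# Hypothetical example data: 2 years of monthly values on a small grid
ds = xr.Dataset(
    {"tas": (("time", "lat", "lon"), np.random.rand(24, 4, 5))},
    coords={"time": np.arange(24), "lat": np.arange(4), "lon": np.arange(5)},
)

def f(series):
    # placeholder 1-D -> 1-D transform, e.g. an anomaly from the mean
    return series - series.mean()

result = xr.apply_ufunc(
    f, ds["tas"],
    input_core_dims=[["time"]],   # f consumes the time axis
    output_core_dims=[["time"]],  # and returns one of the same length
    vectorize=True,               # np.vectorize loops over lat/lon for us
)
```

Note that vectorize=True still loops in Python under the hood; the gain is that xarray handles the dimension, coordinate, and output bookkeeping.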

Has anyone done what is described above using xarray? What sort of performance gains can be expected using your approach?

Thanks in advance for any help with this topic. My apologies if there is a more appropriate forum for this sort of discussion (please redirect if so), as this may not be applicable to the original issue...

--James

On Wed, May 11, 2016 at 2:24 AM, naught101 notifications@github.com wrote:

> I want to be able to run a scikit-learn model over a bunch of variables in a 3D (lat/lon/time) dataset, and return values for each coordinate point. Is something like this multi-dimensional groupby required (I'm thinking groupby(lat, lon) => 2D matrices that can be fed straight into scikit-learn), or is there already some other mechanism that could achieve something like this? Or is the best way at the moment just to create a null dataset, and loop over lat/lon and fill in the blanks as you go?


Comment 218675077 · naught101 · created 2016-05-12T06:54:53Z · updated 2016-05-12T06:54:53Z · author_association: NONE
https://github.com/pydata/xarray/pull/818#issuecomment-218675077

`forcing_data.isel(lat=lat, lon=lon).values()` returns a `ValuesView`, which scikit-learn doesn't accept. However, `forcing_data.isel(lat=lat, lon=lon).to_array().T` seems to work.
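A minimal sketch of that conversion with synthetic data (variable names are illustrative): Dataset.to_array() stacks the data variables into a new leading "variable" dimension, so one grid cell becomes (variable, time), and .T flips it to (time, variable), scikit-learn's (n_samples, n_features) layout.

``` python
import numpy as np
import xarray as xr

forcing_data = xr.Dataset(
    {"SWdown": (("time", "lat", "lon"), np.random.rand(10, 3, 4)),
     "Tair": (("time", "lat", "lon"), np.random.rand(10, 3, 4))},
)
point = forcing_data.isel(lat=0, lon=0)  # Dataset with dims (time,)
X = point.to_array().T.values            # ndarray, shape (10, 2): samples x features
```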

Comment 218667702 · naught101 · created 2016-05-12T06:02:55Z · updated 2016-05-12T06:02:55Z · author_association: NONE
https://github.com/pydata/xarray/pull/818#issuecomment-218667702

@shoyer: Where does `times` come from in that code?

Comment 218654978 · naught101 · created 2016-05-12T04:02:43Z · updated 2016-05-12T04:03:01Z · author_association: NONE
https://github.com/pydata/xarray/pull/818#issuecomment-218654978

Example forcing data:

<xarray.Dataset>
Dimensions:  (lat: 360, lon: 720, time: 2928)
Coordinates:
  * lon      (lon) float64 0.25 0.75 1.25 1.75 2.25 2.75 3.25 3.75 4.25 4.75 ...
  * lat      (lat) float64 -89.75 -89.25 -88.75 -88.25 -87.75 -87.25 -86.75 ...
  * time     (time) datetime64[ns] 2012-01-01 2012-01-01T03:00:00 ...
Data variables:
    SWdown   (time, lat, lon) float64 446.5 444.9 445.3 447.8 452.4 456.3 ...

There might be an arbitrary number of data variables, and the scikit-learn input would be time (rows) by data variables (columns). I'm currently doing this:

``` python
def predict_gridded(model, forcing_data, flux_vars):
    """Predict model results for gridded forcing data.

    :model: fitted scikit-learn model (multi-output, one column per flux var)
    :forcing_data: xarray.Dataset with (time, lat, lon) data variables
    :flux_vars: names of the predicted variables
    :returns: xarray.Dataset of predictions on the same grid
    """
    # set prediction metadata (coordinate variables only, no data)
    prediction = forcing_data[list(forcing_data.coords)]

    # Arrays like (var, lon, lat, time)
    result = np.full([len(flux_vars),
                      forcing_data.dims['lon'],
                      forcing_data.dims['lat'],
                      forcing_data.dims['time']],
                     np.nan)
    print("predicting for lon: ")
    for lon in range(len(forcing_data['lon'])):
        print(lon, end=', ')
        for lat in range(len(forcing_data['lat'])):
            # one (time, variables) matrix per grid cell; drop the scalar
            # lat/lon columns that to_dataframe() adds
            result[:, lon, lat, :] = model.predict(
                forcing_data.isel(lat=lat, lon=lon)
                            .to_dataframe()
                            .drop(['lat', 'lon'], axis=1)
            ).T
    print("")
    for i, fv in enumerate(flux_vars):
        prediction.update(
            {fv: xr.DataArray(result[i, :, :, :],
                              dims=['lon', 'lat', 'time'],
                              coords=forcing_data.coords)}
        )

    return prediction
```

I think it's working (still debugging, and it's pretty slow to run).
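If the model treats each (time step, variables) row independently, as the per-cell predict above assumes, one possibly faster variant is to stack the whole grid into a single sample dimension and call predict once. An untested sketch reusing the model, forcing_data, and flux_vars names from the function above (and assuming a multi-output model, as in the loop):

``` python
da = forcing_data.to_array()                     # (variable, time, lat, lon)
flat = da.stack(sample=("time", "lat", "lon"))   # (variable, sample)
X = flat.transpose("sample", "variable").values  # (n_samples, n_features)
y = model.predict(X)                             # (n_samples, len(flux_vars))

# Rebuild the grid: the template keeps the sample MultiIndex, so
# unstack() can restore the time/lat/lon dimensions.
template = flat.isel(variable=0, drop=True)      # dims: (sample,)
prediction = xr.Dataset(
    {fv: template.copy(data=y[:, i]).unstack("sample")
     for i, fv in enumerate(flux_vars)}
)
```

On the full 360 x 720 x 2928 grid this builds one very large matrix, so it trades memory for fewer predict calls.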

Comment 218372591 · naught101 · created 2016-05-11T06:24:11Z · updated 2016-05-11T06:24:11Z · author_association: NONE
https://github.com/pydata/xarray/pull/818#issuecomment-218372591

I want to be able to run a scikit-learn model over a bunch of variables in a 3D (lat/lon/time) dataset, and return values for each coordinate point. Is something like this multi-dimensional groupby required (I'm thinking groupby(lat, lon) => 2D matrices that can be fed straight into scikit-learn), or is there already some other mechanism that could achieve something like this? Or is the best way at the moment just to create a null dataset, and loop over lat/lon and fill in the blanks as you go?


Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);