

issues: 195050684


  • id: 195050684
  • node_id: MDU6SXNzdWUxOTUwNTA2ODQ=
  • number: 1161
  • title: Generated Dask graph is huge - performance issue?
  • user: 743508
  • state: closed
  • locked: 0
  • comments: 8
  • created_at: 2016-12-12T18:35:12Z
  • updated_at: 2017-01-23T20:21:14Z
  • closed_at: 2017-01-23T20:21:14Z
  • author_association: CONTRIBUTOR

I've been trying to work around some performance issues when subsetting a set of netCDF files opened with `open_mfdataset`. I managed to print out the generated Dask graph for one variable and it doesn't look right: it's huge (about 5000 elements) and seems to have a `getitem` entry for every requested element of that variable.

The code that generates this selection looks roughly like this:

```python
import xarray as xr

# WEATHER_MET, time_sel, lon, lat and weights are defined elsewhere in my code
paths = WEATHER_MET['latlon'].glob('*_resampled.nc')
dataset = xr.open_mfdataset([str(p) for p in paths])
selection = dataset.sel(time=time_sel).sel_points(method='nearest', tolerance=0.1, lon=lon, lat=lat)
selection *= weights
```
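The `sel_points(method='nearest', tolerance=0.1, lon=lon, lat=lat)` call in the snippet pairs each requested `lon[i]` with `lat[i]` and looks up the nearest grid labels, one selected point per pair. A minimal plain-Python sketch of that label lookup is below, assuming sorted 1-D `lon`/`lat` coordinates. The helpers `nearest_index` and `nearest_points` and the sample coordinates are hypothetical; this is not xarray's implementation, and it models only the label matching, not the Dask indexing that builds the graph.

```python
import bisect


def nearest_index(coord, value, tolerance):
    """Index of the entry in sorted `coord` closest to `value`.

    Raises KeyError if even the nearest entry is farther than `tolerance`
    (a simplified model of a nearest-neighbour lookup with a tolerance).
    """
    pos = bisect.bisect_left(coord, value)
    # The nearest label is one of the neighbours around the insertion point.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(coord)]
    best = min(candidates, key=lambda i: abs(coord[i] - value))
    if abs(coord[best] - value) > tolerance:
        raise KeyError(f"no coordinate within {tolerance} of {value}")
    return best


def nearest_points(lon_coord, lat_coord, lons, lats, tolerance=0.1):
    """Pointwise pairing: one (lon_index, lat_index) pair per requested point."""
    return [
        (nearest_index(lon_coord, lo, tolerance), nearest_index(lat_coord, la, tolerance))
        for lo, la in zip(lons, lats)
    ]


# Hypothetical sample grid and targets.
lon_coord = [0.0, 0.25, 0.5, 0.75, 1.0]
lat_coord = [10.0, 10.25, 10.5]
print(nearest_points(lon_coord, lat_coord, [0.3, 0.95], [10.2, 10.45]))  # → [(1, 1), (4, 2)]
```

Because the selection is pointwise, the number of index pairs grows with the number of requested points, which is consistent with the per-element `getitem` entries described above.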

The graph for one variable in the selection (the irradiance variable) looks like this:

mydask.pdf
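To illustrate why a graph with one `getitem` entry per requested element gets large, here is a toy model in plain Python. The names (`block_and_offset`, `per_point_graph`, `per_chunk_graph`) and the data are made up, and the "graph" is just a dict of keys to task tuples; it does not reproduce how Dask or xarray actually build their graphs. It contrasts a layout with one task per point against a hypothetical alternative that groups points by the chunk they fall in.

```python
from collections import defaultdict

# Toy model only: a "graph" is a dict mapping task keys to task tuples.


def block_and_offset(index, chunk_size):
    """Return (chunk number, offset within that chunk) for a 1-D index."""
    return divmod(index, chunk_size)


def per_point_graph(points, chunk_size):
    """One getitem task per requested (row, col) point, plus one concatenate."""
    graph = {}
    for k, (r, c) in enumerate(points):
        (br, orow), (bc, ocol) = block_and_offset(r, chunk_size), block_and_offset(c, chunk_size)
        graph[("getitem", k)] = ("getitem", (br, bc), (orow, ocol))
    graph[("concatenate",)] = ("concatenate", [("getitem", k) for k in range(len(points))])
    return graph


def per_chunk_graph(points, chunk_size):
    """Group points by the chunk they fall in: one task per chunk touched."""
    groups = defaultdict(list)
    for k, (r, c) in enumerate(points):
        (br, orow), (bc, ocol) = block_and_offset(r, chunk_size), block_and_offset(c, chunk_size)
        groups[(br, bc)].append((k, orow, ocol))
    graph = {("take", blk): ("take", blk, offsets) for blk, offsets in groups.items()}
    graph[("concatenate",)] = ("concatenate", [("take", blk) for blk in groups])
    return graph


# 5000 made-up points on a 100 x 100 grid stored in 50 x 50 chunks.
points = [(i % 100, (i * 7) % 100) for i in range(5000)]
print(len(per_point_graph(points, 50)))  # 5001: grows with the number of points
print(len(per_chunk_graph(points, 50)))  # 5: grows with the number of chunks touched
```

The 5000 points echo the rough graph size in the report. In this toy, the per-point layout needs 5001 tasks while the grouped layout needs 5, because only four chunks are touched.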

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1161/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  • state_reason: completed
  • repo: 13221727
  • type: issue
