issue_comments: 476678007

html_url: https://github.com/pydata/xarray/issues/2852#issuecomment-476678007
issue_url: https://api.github.com/repos/pydata/xarray/issues/2852
id: 476678007
node_id: MDEyOklzc3VlQ29tbWVudDQ3NjY3ODAwNw==
user: 1197350
created_at: 2019-03-26T14:41:59Z
updated_at: 2019-03-26T14:41:59Z
author_association: MEMBER
body:

label (y, x) uint16 dask.array<shape=(10980, 10980), chunksize=(200, 10980)> ... geoms_ds.groupby('label')

It is very hard to make this sort of groupby lazy, because you are grouping over the variable label itself. Groupby uses a split-apply-combine paradigm to transform the data. The apply and combine steps can be lazy, but the split step cannot: xarray uses the group variable to determine how to index the array, i.e. which items belong in which group, and to do that it needs to read the whole variable into memory.
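To make the split step concrete, here is a rough NumPy-only sketch of split-apply-combine. It is not xarray's actual implementation; the tiny `label` and `values` arrays and the names `groups` and `group_sums` are made up for illustration. The point is only that building the per-group indices needs every label value up front.

```python
import numpy as np

# Hypothetical small label image standing in for the (10980, 10980) variable.
label = np.array([[0, 1, 1],
                  [2, 0, 1]], dtype=np.uint16)
# Hypothetical data to be reduced per group.
values = np.arange(label.size, dtype=float).reshape(label.shape)

# Split: working out which positions belong to which group requires
# looking at every label value, so the whole array must be in memory.
flat_label = label.ravel()
groups = {int(g): np.flatnonzero(flat_label == g) for g in np.unique(flat_label)}

# Apply + combine: once the indices exist, each group can be reduced
# independently, which is the part that could in principle be deferred.
flat_values = values.ravel()
group_sums = {g: float(flat_values[idx].sum()) for g, idx in groups.items()}

print(group_sums)
```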

In this specific example, it sounds like what you want is to compute the histogram of labels. That could be accomplished without groupby. For example, you could use apply_ufunc together with dask.array.histogram.
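A minimal sketch of the histogram route, under some assumptions: it calls dask.array.histogram directly on a dask array rather than going through apply_ufunc, it uses a small random array in place of the real label variable, and it assumes the label values are known in advance to lie in a fixed integer range (`n_labels` is a made-up parameter). With an xarray Dataset, the underlying dask array would presumably come from something like `geoms_ds['label'].data`.

```python
import numpy as np
import dask.array as da

# Hypothetical stand-in for the label variable: small, but chunked
# along y in the same spirit as chunksize=(200, 10980).
rng = np.random.default_rng(0)
label_np = rng.integers(0, 5, size=(40, 30)).astype(np.uint16)
label = da.from_array(label_np, chunks=(10, 30))

# Assumption: label values are integers in [0, n_labels).
n_labels = 5
bin_edges = np.arange(n_labels + 1)  # one unit-wide bin per label value

# Building the histogram is lazy; each chunk is counted separately
# and the per-chunk counts are summed.
counts, edges = da.histogram(label, bins=bin_edges)

# Only this call triggers computation, and only the small counts
# array is returned, not the full label image.
result = counts.compute()
print(result)
```

Because the bin edges are fixed beforehand, nothing has to inspect the label values to decide how to split the data, which is what lets this stay lazy where groupby cannot.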

So my recommendation is to think of a way to accomplish what you want that does not involve groupby.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 425320466