issue_comments: 78214797

html_url: https://github.com/pydata/xarray/issues/364#issuecomment-78214797
issue_url: https://api.github.com/repos/pydata/xarray/issues/364
id: 78214797
node_id: MDEyOklzc3VlQ29tbWVudDc4MjE0Nzk3
user: 1217238
created_at: 2015-03-11T07:06:57Z
updated_at: 2015-03-11T07:06:57Z
author_association: MEMBER

body:

The problem is that you've created a new timeofday dimension that is gigantic and orthogonal to all the others. You want timeofday to be a coordinate along the existing time dimension.

d.groupby('timeofday').mean('time') is literally doing the exact same calculation 70128 times. We also implicitly assume that coordinates corresponding to dimensions have unique labels, which is why you end up with 70128 groups instead of the 48 you expected.
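
For concreteness, here is a minimal sketch of the fix. The dataset, variable name, and time range are made up (70128 timestamps is consistent with four years of half-hourly data), and the import is spelled xarray today rather than xray:

```python
import numpy as np
import pandas as pd
import xarray as xr  # spelled `xray` when this comment was written

# Hypothetical stand-in for `d`: four years of half-hourly data (70128 timestamps).
times = pd.date_range("2000-01-01", periods=70128, freq="30min")
d = xr.Dataset(
    {"temperature": ("time", np.random.randn(times.size))},
    coords={"time": times},
)

# Attach timeofday as a coordinate *along* the existing time dimension,
# not as a new orthogonal dimension.
d = d.assign_coords(timeofday=("time", times.hour * 60 + times.minute))

# The 48 half-hour labels repeat along time, so this produces 48 groups:
# one mean per half hour of the day.
climatology = d.groupby("timeofday").mean("time")
```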

Also, unlike pandas, xray currently does the core loop for all groupby operations in pure Python, which means that, yes, it will be slow when you have a very large number of groups (and it loops again to handle your 15 different variables). Using something like Cython or Numba to speed up groupby operations is on my to-do list, but I've found this to be less of a barrier than you might expect for multi-dimensional datasets -- individual group members tend to contain more elements than they would in a DataFrame.
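
For intuition, that loop follows roughly the pattern below (a sketch of the general idea, not xray's actual internals, reusing d and xr from the previous example):

```python
# One Python-level iteration per group (and per variable), so the overhead
# grows with the number of groups rather than the number of array elements:
# 48 groups is cheap, 70128 groups pays the loop cost 70128 times.
grouped = d.groupby("timeofday")
means = [group.mean("time") for _, group in grouped]
climatology = xr.concat(means, dim="timeofday")
```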

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 60303760