issue_comments: 557563566

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1603#issuecomment-557563566 https://api.github.com/repos/pydata/xarray/issues/1603 557563566 MDEyOklzc3VlQ29tbWVudDU1NzU2MzU2Ng== 2067093 2019-11-22T14:59:29Z 2019-11-22T14:59:29Z NONE

I've noticed that basically all my current troubles with xarray lead back to this issue (lack of MultiIndex support). I use xarray for machine learning/data science/econometrics. My current problem requires semi-hierarchical indexing on one of the dimensions, plus slicing/aggregation along some levels of that dimension.

My first attempt was to just assume each dimension was orthogonal, which resulted in out-of-memory errors. I ended up using a MultiIndex for the hierarchy dimension to get a "dense" representation of a sparse subspace. Unfortunately, .sel() and similar methods currently cut out MultiIndex dimensions, so I've had to use boolean masking to keep all the dimensions I need.
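For illustration, here is a minimal sketch of that kind of workaround. It is not the actual code from my project: the names `dense`, `group`, `item`, and `obs` and the toy values are made up, and the exact result of `.sel()` on a level may differ between xarray versions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: most (group, item) combinations do not exist (NaN).
dense = xr.DataArray(
    [[1.0, np.nan, 2.0], [np.nan, 3.0, np.nan]],
    dims=("group", "item"),
    coords={"group": ["a", "b"], "item": [0, 1, 2]},
)

# Stack the two dimensions into one MultiIndex dimension and drop the
# missing combinations: a "dense" representation of the sparse subspace.
sparse = dense.stack(obs=("group", "item")).dropna("obs")

# Selecting on a single level typically reduces the index to the
# remaining level, so `group` no longer takes part in the index.
dropped = sparse.sel(group="a")

# Boolean masking on the level values keeps the full MultiIndex instead.
mask = sparse["group"].values == "a"
kept = sparse.isel(obs=np.flatnonzero(mask))

print(kept.indexes["obs"])
```

Here `kept` still carries both `group` and `item` as index levels, which is what later steps need if they slice or aggregate on either level.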

Multidimensional groupby, especially within the MultiIndex, is a headache as it currently stands. I had to resort to making auxiliary dimensions with one-hot encoded levels (dummy variables) and doing multiply-aggregate operations by hand.
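A rough sketch of what such a hand-rolled multiply-aggregate can look like, assuming a single grouping level. The names `values`, `group_of_obs`, and `onehot` and the numbers are hypothetical, chosen only to show the pattern:

```python
import numpy as np
import xarray as xr

# Hypothetical observations and the hierarchy level each one belongs to.
group_of_obs = np.array(["a", "a", "b"])
values = xr.DataArray([1.0, 2.0, 4.0], dims="obs")

# One-hot encode the level as an auxiliary `group` dimension:
# onehot[i, g] is 1.0 when observation i belongs to group g, else 0.0.
groups = np.unique(group_of_obs)
onehot = xr.DataArray(
    (group_of_obs[:, None] == groups[None, :]).astype(float),
    dims=("obs", "group"),
    coords={"group": groups},
)

# Multiply, then aggregate over `obs`: broadcasting yields an
# (obs, group) array whose sum over `obs` is the per-group total.
group_sums = (values * onehot).sum("obs")
group_means = group_sums / onehot.sum("obs")

print(group_sums.values, group_means.values)
```

The cost is an intermediate array of size obs × group, so this only stays practical while the number of groups is modest.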

xarray is really beautiful and should be used more by data scientists, but it's really difficult to recommend it to colleagues when not all the familiar pandas-style operations are supported.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  262642978