issue_comments

2 rows where issue = 262642978 and user = 2067093 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
557579503 https://github.com/pydata/xarray/issues/1603#issuecomment-557579503 https://api.github.com/repos/pydata/xarray/issues/1603 MDEyOklzc3VlQ29tbWVudDU1NzU3OTUwMw== NowanIlfideme 2067093 2019-11-22T15:34:57Z 2019-11-22T15:34:57Z NONE

> Thanks @NowanIlfideme for your feedback.
>
> Could you perhaps share a gist of code related to your use case?

The first example in this comment is similar to my use case: https://github.com/pydata/xarray/issues/3213#issuecomment-520741706 . There are several "core" dimensions, but some part of the coordinates may be hierarchical or cross-defined (e.g. country > province > city > building, but also country > province > voting district > building). We might have a full or nearly-full panel in the MultiIndex representation, but have a huge cross product (even if we keep strictly hierarchical dimensions out).

Meanwhile using a true COO sparse representation (as I understand it) will likely end up with slower operations overall, since nearly all machine learning models (think: linear regression) require a dense array input anyways.

I'll make an example of this when I find some free time, along with a contrasting one in Pandas. :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
557563566 https://github.com/pydata/xarray/issues/1603#issuecomment-557563566 https://api.github.com/repos/pydata/xarray/issues/1603 MDEyOklzc3VlQ29tbWVudDU1NzU2MzU2Ng== NowanIlfideme 2067093 2019-11-22T14:59:29Z 2019-11-22T14:59:29Z NONE

I've noticed that basically all my current troubles with xarray lead to this issue (lack of MultiIndex support). I use xarray for machine learning/data science/econometrics. My current problem requires a semi-hierarchical indexing on one of the dimensions, and slicing/aggregation along some levels of those dimensions.

My first attempt was to just assume each dimension was orthogonal, which resulted in out-of-memory errors. I ended up using a MultiIndex for the hierarchy dimension to have a "dense" representation of a sparse subspace. Unfortunately, currently .sel() and such will cut out MultiIndex dimensions, and I've had to do boolean masking to keep all the dimensions I need.
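
To make the memory argument concrete, here is a minimal sketch in plain NumPy rather than xarray. The place names, the `n_time` dimension, and every variable name are hypothetical and chosen only for illustration. It compares the cell count of a full cross product with a single stacked "place" axis, then selects one level with a boolean mask so the stacked axis survives.

```python
import numpy as np

# Hypothetical observed (country, province, city) combinations; only these exist.
observed = [
    ("NL", "Utrecht", "Utrecht"),
    ("NL", "Utrecht", "Amersfoort"),
    ("NL", "Gelderland", "Arnhem"),
    ("BE", "Antwerp", "Antwerp"),
    ("BE", "Antwerp", "Mechelen"),
]
n_time = 4  # a second, genuinely orthogonal dimension (assumed)

countries = sorted({country for country, _, _ in observed})
provinces = sorted({province for _, province, _ in observed})
cities = sorted({city for _, _, city in observed})

# Treating every level as its own orthogonal dimension allocates the full cross product.
orthogonal_cells = n_time * len(countries) * len(provinces) * len(cities)

# Stacking the hierarchy into one "place" axis stores only combinations that occur.
stacked = np.arange(n_time * len(observed), dtype=float).reshape(n_time, len(observed))

# A boolean mask on one level keeps the "place" axis instead of dropping it.
province_of_place = np.array([province for _, province, _ in observed])
mask = province_of_place == "Utrecht"
utrecht = stacked[:, mask]

print(orthogonal_cells, stacked.size, utrecht.shape)
```

With real data the gap grows multiplicatively with each added level, which is consistent with the out-of-memory errors described above.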

Multidimensional groupby, especially within the MultiIndex, is a headache as it currently stands. I had to resort to making auxiliary dimensions with one-hot encoded levels (dummy variables) and doing multiply-aggregate operations by hand.
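
The workaround described here is not shown in the comment, so the following is only a guess at its general shape, written in plain NumPy with hypothetical data and names. A one-hot matrix maps each stacked position to its group level, and a matrix product then performs the per-group sum.

```python
import numpy as np

# Hypothetical province label for each position on a stacked "place" axis.
provinces = np.array(["Utrecht", "Utrecht", "Gelderland", "Antwerp", "Antwerp"])

# Hypothetical data with shape (time, place).
values = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [10.0, 20.0, 30.0, 40.0, 50.0],
])

# Integer code per place, then a (place, group) one-hot matrix of dummy variables.
groups, codes = np.unique(provinces, return_inverse=True)
one_hot = np.eye(len(groups))[codes]

# Multiply-aggregate: (time, place) @ (place, group) -> (time, group) sums.
group_sums = values @ one_hot

# Dividing by the group sizes turns the sums into means.
group_means = group_sums / one_hot.sum(axis=0)

print(groups)
print(group_sums)
print(group_means)
```

The one-hot matrix is dense, so this only stays practical while the number of groups is small relative to available memory.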

xarray is really beautiful and should be used more by data scientists, but it's really difficult to recommend it to colleagues when not all the familiar pandas-style operations are supported.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
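
As a rough illustration of how the row filter at the top of the page (issue = 262642978 and user = 2067093, sorted by updated_at descending) maps onto this schema, here is a sketch using Python's standard `sqlite3` module against an in-memory database. The exact SQL that Datasette generates is not shown on the page, so the query below is an assumed equivalent. The third inserted row, with id 1 and user 999, is hypothetical and exists only to show that the filter excludes other users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema copied from the page. SQLite does not enforce the REFERENCES clauses
# unless foreign keys are switched on, so the missing parent tables are harmless here.
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [html_url] TEXT,
       [issue_url] TEXT,
       [id] INTEGER PRIMARY KEY,
       [node_id] TEXT,
       [user] INTEGER REFERENCES [users]([id]),
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [body] TEXT,
       [reactions] TEXT,
       [performed_via_github_app] TEXT,
       [issue] INTEGER REFERENCES [issues]([id])
    );
    CREATE INDEX [idx_issue_comments_issue]
        ON [issue_comments] ([issue]);
    CREATE INDEX [idx_issue_comments_user]
        ON [issue_comments] ([user]);
    """
)

conn.executemany(
    "INSERT INTO issue_comments (id, user, created_at, updated_at, issue) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (557579503, 2067093, "2019-11-22T15:34:57Z", "2019-11-22T15:34:57Z", 262642978),
        (557563566, 2067093, "2019-11-22T14:59:29Z", "2019-11-22T14:59:29Z", 262642978),
        # Hypothetical row from a different user; the filter should skip it.
        (1, 999, "2019-11-22T16:00:00Z", "2019-11-22T16:00:00Z", 262642978),
    ],
)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
rows = conn.execute(
    "SELECT id, updated_at FROM issue_comments "
    "WHERE issue = ? AND user = ? ORDER BY updated_at DESC",
    (262642978, 2067093),
).fetchall()

for row in rows:
    print(row)
```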