issue_comments


2 rows where issue = 241290234 and user = 941907 sorted by updated_at descending


id: 430946620 · user: smartass101 (941907) · author_association: NONE
created_at: 2018-10-18T09:48:20Z · updated_at: 2018-10-18T09:48:20Z
https://github.com/pydata/xarray/issues/1471#issuecomment-430946620

I indeed often resort to using a pandas.MultiIndex, but the dropping of the selected coordinate value in particular (#1408) makes it quite inconvenient.

reactions: none
issue: sharing dimensions across dataarrays in a dataset (241290234)
id: 430324391 · user: smartass101 (941907) · author_association: NONE
created_at: 2018-10-16T17:24:42Z · updated_at: 2018-10-16T17:46:17Z
https://github.com/pydata/xarray/issues/1471#issuecomment-430324391

I've hit this design limitation quite often as well, with several use-cases, both in experiment and simulation. It detracts from xarray's power of conveniently and transparently handling coordinate meta-data. From the Why xarray? page:

> with xarray, you don’t need to keep track of the order of arrays’ dimensions or insert dummy dimensions

Adding what are effectively dummy dimensions or coordinates is exactly what this alignment design forces us to do.

A possible solution would be something like having (some) coordinate arrays in an (Unaligned)Dataset be a "reducible" MultiIndex (it would reduce to a plain Index for each DataArray). A workaround can be to use MultiIndex coordinates directly, but then alignment cannot be done easily, as levels do not behave like real dimensions.

Example use-cases:

1. coordinate "metadata"

I often have measurements on related axes, but also with additional coordinates (different positions, etc.). Consider:

```python
import numpy as np
import xarray as xr

n1 = np.arange(1, 22)
m1 = xr.DataArray(n1 * 0.5, coords={'num': n1, 'B': 'r', 'ar': 'A'}, dims=['num'], name='MA')
n2 = np.arange(2, 21)
m2 = xr.DataArray(n2 * 0.5, coords={'num': n2, 'B': 'r', 'ar': 'B'}, dims=['num'], name='MB')
ds = xr.merge([m1, m2])
print(ds)
```

What I would like to get (pseudocode):

```
<xarray.Dataset>
Dimensions:  (num: 21, ar: 2)  # <-- note that MB is still of dims {'num': 19} only
Coordinates:                   # <-- mostly unions as done by concat
  * num      (num) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
    B        <U1 'r'
  * ar       (ar) <U1 'A' 'B'  # <-- this is now a dim of the dataset, but not of MA or MB
Data variables:
    MA       (num) float64 0.5 1.0 1.5 2.0 2.5 3.0 ... 8.0 8.5 9.0 9.5 10.0 10.5
    MB       (num) float64 1.0 1.5 2.0 2.5 3.0 3.5 ... 7.5 8.0 8.5 9.0 9.5 10.0
```

Instead I get:

```
MergeError: conflicting values for variable 'ar' on objects to be combined:
first value: <xarray.Variable ()>
array('A', dtype='<U1')
second value: <xarray.Variable ()>
array('B', dtype='<U1')
```

While it is possible to concat into something with dimensions (num, ar, B), this often results in huge arrays where most values are nan. I could also store the "position" metadata as attrs, but that pretty much defeats the point of using xarray, which is to carry such metadata transparently as coordinates. Also, sometimes I would like to select arrays from the dataset at a given location, e.g. Dataset.sel(ar='B').
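A minimal sketch of that concat workaround (hypothetical code, not from the original comment; the two arrays from the example above are rebuilt with a shared name so they can be stacked):

```python
import numpy as np
import xarray as xr

# Rebuild the two measurement arrays with a common name so that xr.concat
# can stack them along the scalar 'ar' coordinate.
n1 = np.arange(1, 22)
m1 = xr.DataArray(n1 * 0.5, coords={'num': n1, 'ar': 'A'}, dims=['num'], name='M')
n2 = np.arange(2, 21)
m2 = xr.DataArray(n2 * 0.5, coords={'num': n2, 'ar': 'B'}, dims=['num'], name='M')

# The outer join on 'num' pads the shorter array with NaN at num=1 and num=21.
combined = xr.concat([m1, m2], dim='ar')
print(combined.shape)                # (2, 21)
print(int(combined.isnull().sum()))  # 2
```

Selecting combined.sel(ar='B') now works, but only at the cost of storing NaN padding; with longer, mostly disjoint axes the padding dominates the storage.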

2. unaligned time domains

This is a large problem, especially when different time-bases are involved. A difference in sampling intervals will blow up the storage with a huge number of nan values, which of course greatly complicates further calculations, e.g. filtering in the time domain. Even just non-overlapping time intervals will require at least double the storage.
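To illustrate the blow-up, here is a hypothetical example (not from the original comment) with two signals sampled at 1 ms and 0.7 ms; after the outer join on 'time', almost no sample points coincide, so most of each variable is NaN:

```python
import numpy as np
import xarray as xr

# Two signals on different time-bases: 1 ms vs 0.7 ms sampling.
t1 = np.arange(0.0, 1.0, 0.001)    # 1000 samples
t2 = np.arange(0.0, 1.0, 0.0007)   # 1429 samples
a = xr.DataArray(np.sin(t1), coords={'time': t1}, dims='time', name='a')
b = xr.DataArray(np.cos(t2), coords={'time': t2}, dims='time', name='b')

# merge() does an outer join on 'time'; the union has ~2400 points because
# the two float time axes almost never coincide exactly.
ds = xr.merge([a, b])
nan_fraction_a = float(ds['a'].isnull().mean())  # well over half NaN
```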

I often find myself resorting to pandas.MultiIndex instead, which gladly manages such non-aligned coordinates while still enabling slicing and selection on the various levels. So it can be done, and the pandas code and functionality already exist.
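As a sketch of that pandas workaround (hypothetical code, reusing the two measurement series from the first example), a ('ar', 'num') MultiIndex stores both unaligned series in one object with no NaN padding:

```python
import numpy as np
import pandas as pd

n1 = np.arange(1, 22)  # 21 points for measurement 'A'
n2 = np.arange(2, 21)  # 19 points for measurement 'B'

# One Series keyed by a two-level index: no padding, 21 + 19 = 40 entries.
idx = pd.MultiIndex.from_tuples(
    [('A', n) for n in n1] + [('B', n) for n in n2],
    names=['ar', 'num'],
)
s = pd.Series(np.concatenate([n1 * 0.5, n2 * 0.5]), index=idx)

# Selection on either level works without dummy entries:
sub_b = s.xs('B', level='ar')  # just the 19-point 'B' series
at_5 = s.xs(5, level='num')    # value at num=5 for both 'A' and 'B'
```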

reactions: +1 × 1
issue: sharing dimensions across dataarrays in a dataset (241290234)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 4642.108ms · About: xarray-datasette