
issue_comments


7 rows where issue = 241290234 sorted by updated_at descending




user 4

  • shoyer 3
  • smartass101 2
  • zbarry 1
  • tommylees112 1

author_association 2

  • NONE 4
  • MEMBER 3

issue 1

  • sharing dimensions across dataarrays in a dataset · 7
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
524895731 https://github.com/pydata/xarray/issues/1471#issuecomment-524895731 https://api.github.com/repos/pydata/xarray/issues/1471 MDEyOklzc3VlQ29tbWVudDUyNDg5NTczMQ== zbarry 4762711 2019-08-26T15:00:35Z 2019-08-26T15:00:35Z NONE

I just wanted to chime in on the usefulness of being able to do something like this without the extra mental overhead required by the proposed workaround. My use case parallels @smartass101's very closely. Have there been any updates to xarray since last year that might make streamlining this use case a bit more feasible, by any chance? :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sharing dimensions across dataarrays in a dataset 241290234
433952128 https://github.com/pydata/xarray/issues/1471#issuecomment-433952128 https://api.github.com/repos/pydata/xarray/issues/1471 MDEyOklzc3VlQ29tbWVudDQzMzk1MjEyOA== tommylees112 21049064 2018-10-29T15:21:34Z 2018-10-29T15:21:34Z NONE

@smartass101 & @shoyer what would be the code for working with a pandas.MultiIndex object in this use case? Could you show how it would work related to your example above:

<xarray.Dataset>
Dimensions:  (num: 21, ar: 2)   # <-- note that MB is still of dims {'num': 19} only
Coordinates:                    # <-- mostly unions as done by concat
  * num      (num) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
    B        <U1 'r'
  * ar       <U1 'A' 'B'        # <-- this is now a dim of the dataset, but not of MA or MB
Data variables:
    MA       (num) float64 0.5 1.0 1.5 2.0 2.5 3.0 ... 8.0 8.5 9.0 9.5 10.0 10.5
    MB       (num) float64 1.0 1.5 2.0 2.5 3.0 3.5 ... 7.5 8.0 8.5 9.0 9.5 10.0
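One possible shape of the MultiIndex approach for the example above (a minimal sketch, not from the original thread; the 'sample' dimension name and the rebuilt arrays are illustrative only): concatenate along a new 'ar' dimension, then stack ('ar', 'num') into a single MultiIndex dimension so both levels remain selectable.

import numpy as np
import pandas as pd
import xarray as xr

# Rebuild two arrays resembling the example, on a shared 'num' dimension
ma = xr.DataArray(np.arange(1, 22) * 0.5, coords={'num': np.arange(1, 22)}, dims='num', name='M')
mb = xr.DataArray(np.arange(2, 21) * 0.5, coords={'num': np.arange(2, 21)}, dims='num', name='M')

# Concatenate along a new 'ar' dimension (the outer join pads the shorter array with NaN) ...
combined = xr.concat([ma, mb], dim=pd.Index(['A', 'B'], name='ar'))

# ... then collapse ('ar', 'num') into one MultiIndex dimension and drop the padding
stacked = combined.stack(sample=('ar', 'num')).dropna('sample')

# MultiIndex levels can be used for selection like ordinary coordinates
print(stacked.sel(ar='B'))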

I am working with land surface model outputs. I have lots of one-dimensional data for different lat/lon points, at different times. I want to join them all into one dataset to make plotting easier. E.g. plot the evapotranspiration estimates for all the stations at their x,y coordinates.

Thanks very much!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sharing dimensions across dataarrays in a dataset 241290234
431051341 https://github.com/pydata/xarray/issues/1471#issuecomment-431051341 https://api.github.com/repos/pydata/xarray/issues/1471 MDEyOklzc3VlQ29tbWVudDQzMTA1MTM0MQ== shoyer 1217238 2018-10-18T15:21:24Z 2018-10-18T15:21:24Z MEMBER

I'm marking #1408 as a bug so we won't forget about it. Hopefully it should be fixed automatically as part of the "explicit indexes" refactor.

On Thu, Oct 18, 2018 at 2:48 AM Ondrej Grover notifications@github.com wrote:

I indeed often resort to using a pandas.MultiIndex, but especially the dropping of the selected coordinate value (#1408 https://github.com/pydata/xarray/issues/1408) makes it quite inconvenient.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sharing dimensions across dataarrays in a dataset 241290234
430946620 https://github.com/pydata/xarray/issues/1471#issuecomment-430946620 https://api.github.com/repos/pydata/xarray/issues/1471 MDEyOklzc3VlQ29tbWVudDQzMDk0NjYyMA== smartass101 941907 2018-10-18T09:48:20Z 2018-10-18T09:48:20Z NONE

I indeed often resort to using a pandas.MultiIndex, but especially the dropping of the selected coordinate value (#1408) makes it quite inconvenient.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sharing dimensions across dataarrays in a dataset 241290234
430358013 https://github.com/pydata/xarray/issues/1471#issuecomment-430358013 https://api.github.com/repos/pydata/xarray/issues/1471 MDEyOklzc3VlQ29tbWVudDQzMDM1ODAxMw== shoyer 1217238 2018-10-16T19:00:16Z 2018-10-16T19:00:34Z MEMBER

You can use a pandas.MultiIndex with xarray. The interface/abstraction could be improved and has some rough edges (e.g., see especially https://github.com/pydata/xarray/issues/1603), but I think this is the preferred way to support these use cases. It does already work for indexing.
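For instance (a minimal sketch, not from the original comment; the 'sample', 'station', and 'time' names are made up), a MultiIndex coordinate can be built with set_index and then queried by level:

import numpy as np
import xarray as xr

# A dataset whose 'sample' dimension carries two plain coordinate variables
ds = xr.Dataset(
    {'value': ('sample', np.arange(6.0))},
    coords={
        'station': ('sample', ['A', 'A', 'A', 'B', 'B', 'B']),
        'time': ('sample', [0, 1, 2, 0, 1, 3]),
    },
)

# Promote them to a pandas.MultiIndex so both levels become selectable
ds = ds.set_index(sample=['station', 'time'])

print(ds.sel(station='B'))       # label-based selection on one level
print(ds.sel(sample=('A', 2)))   # or on a full (station, time) tuple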

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sharing dimensions across dataarrays in a dataset 241290234
430324391 https://github.com/pydata/xarray/issues/1471#issuecomment-430324391 https://api.github.com/repos/pydata/xarray/issues/1471 MDEyOklzc3VlQ29tbWVudDQzMDMyNDM5MQ== smartass101 941907 2018-10-16T17:24:42Z 2018-10-16T17:46:17Z NONE

I've hit this design limitation quite often as well, with several use cases, both in experiment and simulation. It detracts from xarray's power of conveniently and transparently handling coordinate metadata. From the Why xarray? page:

with xarray, you don’t need to keep track of the order of an array’s dimensions or insert dummy dimensions

Adding what are effectively dummy dimensions or coordinates is exactly what this alignment design forces us to do.

A possible solution would be for (some) coordinate arrays in an (Unaligned)Dataset to be a "reducible" MultiIndex (it would reduce to an Index for each DataArray). A workaround is to use MultiIndex coordinates directly, but then alignment cannot be done easily because levels do not behave as real dimensions.

Use-case examples:

1. coordinate "metadata"

I often have measurements on related axes, but also with additional coordinates (different positions, etc.). Consider:

import numpy as np
import xarray as xr

n1 = np.arange(1, 22)
m1 = xr.DataArray(n1 * 0.5, coords={'num': n1, 'B': 'r', 'ar': 'A'}, dims=['num'], name='MA')
n2 = np.arange(2, 21)
m2 = xr.DataArray(n2 * 0.5, coords={'num': n2, 'B': 'r', 'ar': 'B'}, dims=['num'], name='MB')
ds = xr.merge([m1, m2])
print(ds)

What I would like to get (pseudocode):

<xarray.Dataset>
Dimensions:  (num: 21, ar: 2)   # <-- note that MB is still of dims {'num': 19} only
Coordinates:                    # <-- mostly unions as done by concat
  * num      (num) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
    B        <U1 'r'
  * ar       <U1 'A' 'B'        # <-- this is now a dim of the dataset, but not of MA or MB
Data variables:
    MA       (num) float64 0.5 1.0 1.5 2.0 2.5 3.0 ... 8.0 8.5 9.0 9.5 10.0 10.5
    MB       (num) float64 1.0 1.5 2.0 2.5 3.0 3.5 ... 7.5 8.0 8.5 9.0 9.5 10.0

Instead I get:

MergeError: conflicting values for variable 'ar' on objects to be combined:
first value: <xarray.Variable ()>
array('A', dtype='<U1')
second value: <xarray.Variable ()>
array('B', dtype='<U1')

While it is possible to concat into something with dimensions (num, ar, B), it often results in huge arrays where most values are NaN (see the sketch below). I could also store the "position" metadata as attrs, but that pretty much defeats the point of using xarray, which is to keep such information transparently available as coordinate metadata. Also, sometimes I would like to select arrays from the dataset at a given location, e.g. Dataset.sel(ar='B').
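For concreteness, the concat workaround mentioned above might look like this (a sketch, not from the original comment; it reuses the imports and m1/m2 from the example code above):

# Concatenating along 'ar' forces an outer join on 'num', so the shorter
# variable is padded with NaN wherever its coordinate values are missing
padded = xr.concat([m1.rename('M'), m2.rename('M')], dim='ar')

print(padded.dims)                  # ('ar', 'num'), i.e. 2 x 21 values
print(int(padded.isnull().sum()))   # counts the NaN padding (2 of 42 values here)
print(padded.sel(ar='B'))           # location-based selection does work on the padded array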

2. unaligned time domains

This is an especially large problem when different time bases are involved. A difference in sampling intervals will blow up the storage with a huge number of NaN values, which of course greatly complicates further calculations, e.g. filtering in the time domain. Even just non-overlapping time intervals will require at least double the storage.

I often find myself resorting instead to pandas.MultiIndex, which gladly manages such non-aligned coordinates while still enabling slicing and selection on various levels. So it can be done, and the pandas code and functionality already exist.
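A minimal sketch of that pandas pattern (not from the original comment; the signal names and sampling rates are made up):

import numpy as np
import pandas as pd

# Two signals sampled on different time bases
t_fast = np.arange(0.0, 1.0, 0.1)
t_slow = np.arange(0.0, 1.0, 0.25)
fast = pd.Series(np.sin(t_fast),
                 index=pd.MultiIndex.from_product([['fast'], t_fast], names=['signal', 'time']))
slow = pd.Series(np.cos(t_slow),
                 index=pd.MultiIndex.from_product([['slow'], t_slow], names=['signal', 'time']))

# Concatenation keeps only the samples that actually exist: no NaN padding
combined = pd.concat([fast, slow])

print(combined.xs('slow', level='signal'))  # select one signal by level value
print(combined.loc['fast'])                 # partial indexing on the first level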

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sharing dimensions across dataarrays in a dataset 241290234
313719395 https://github.com/pydata/xarray/issues/1471#issuecomment-313719395 https://api.github.com/repos/pydata/xarray/issues/1471 MDEyOklzc3VlQ29tbWVudDMxMzcxOTM5NQ== shoyer 1217238 2017-07-07T15:48:05Z 2017-07-07T15:48:05Z MEMBER

I'm afraid this isn't possible, by design. The requirement that every variable in a Dataset shares the same coordinate system is enforced as part of the xarray data model. This makes data analysis and comparison with a Dataset quite straightforward, since everything is already on the same grid.

For cases where you need different coordinate values and/or dimension sizes, your options are to either rename dimensions for different variables or use multiple Dataset/DataArray objects (Python has nice built-in data structures).
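A minimal sketch of the renaming option (not from the original comment; the num_a/num_b dimension names are illustrative):

import numpy as np
import xarray as xr

# Give each variable its own dimension name so merge does not force alignment
ma = xr.DataArray(np.arange(1, 22) * 0.5, dims='num_a',
                  coords={'num_a': np.arange(1, 22)}, name='MA')
mb = xr.DataArray(np.arange(2, 21) * 0.5, dims='num_b',
                  coords={'num_b': np.arange(2, 21)}, name='MB')

# Both variables now live in one Dataset without any NaN padding
ds = xr.merge([ma, mb])
print(ds)   # Dimensions: (num_a: 21, num_b: 19)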

In theory, we could add something like an "UnalignedDataset" that supports most of the Dataset methods without requiring alignment but I'm not sure it's worth the trouble.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sharing dimensions across dataarrays in a dataset 241290234


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);