issue_comments

5 rows where author_association = "CONTRIBUTOR" and issue = 222676855 sorted by updated_at descending

user 3

  • gerritholl 3
  • mangecoeur 1
  • gimperiale 1

issue 1

  • Many methods are broken (e.g., concat/stack/sortby) when using repeated dimensions · 5

author_association 1

  • CONTRIBUTOR · 5
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
602795869 https://github.com/pydata/xarray/issues/1378#issuecomment-602795869 https://api.github.com/repos/pydata/xarray/issues/1378 MDEyOklzc3VlQ29tbWVudDYwMjc5NTg2OQ== mangecoeur 743508 2020-03-23T19:02:26Z 2020-03-23T19:02:26Z CONTRIBUTOR

Just wondering what the status of this is. I've been running into bugs trying to model symmetric distance matrices using the same dimension. Interestingly, it does work very well for selecting: e.g. if I use .sel(nodes=node_list) on a square matrix, I correctly get a square matrix subset 👍 But unfortunately a lot of other things seem to break, e.g. concatenating fails with ValueError: axes don't match array :( What would need to happen to make this work?
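
A minimal sketch of why a single label indexer yields a square subset, assuming (as the comment describes) that the one indexer is applied to every axis carrying the dimension name. This does not use xarray; `select_square`, the node labels and the distances are hypothetical illustrations.

```python
from typing import List, Sequence


def select_square(matrix: List[List[float]], nodes: Sequence[str],
                  subset: Sequence[str]) -> List[List[float]]:
    """Apply the same label indexer to both axes of a square matrix.

    Hypothetical helper: both axes share the label list `nodes`, mimicking
    dims ('nodes', 'nodes'), so one indexer selects rows AND columns.
    """
    positions = [nodes.index(label) for label in subset]  # label -> integer position
    return [[matrix[i][j] for j in positions] for i in positions]


nodes = ["a", "b", "c"]
distances = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [2.0, 3.0, 0.0],
]
sub = select_square(distances, nodes, ["a", "c"])
print(sub)  # [[0.0, 2.0], [2.0, 0.0]]
```

The result stays square and symmetric because the same positions are used on both axes.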

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Many methods are broken (e.g., concat/stack/sortby) when using repeated dimensions 222676855
528920519 https://github.com/pydata/xarray/issues/1378#issuecomment-528920519 https://api.github.com/repos/pydata/xarray/issues/1378 MDEyOklzc3VlQ29tbWVudDUyODkyMDUxOQ== gimperiale 47244312 2019-09-06T16:22:12Z 2019-09-06T16:22:12Z CONTRIBUTOR

I'm not too fond of having multiple dimensions with the same name because, whenever you need to operate on one but not the other, you have little to no choice but to resort to positional indexing.

Consider also how many methods expect either **kwargs or a dict-like parameter with the dimension or variable names as the keys. I would not be surprised to find that many API design choices fall apart in the face of this use case.
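
The name-keyed lookups mentioned above can be illustrated with plain Python containers. The dims borrow the (channel, scanline, element, element) layout from gerritholl's comment in this thread; the sizes are made up. This is not xarray code, only a demonstration of why a dict or **kwargs keyed by dimension name cannot distinguish two axes that share a name.

```python
# Hypothetical dims/shape for a 4-D variable with a repeated 'element' dim.
dims = ("channel", "scanline", "element", "element")
shape = (3, 100, 5, 5)

# A dict keyed by dimension name keeps only one entry per name,
# so the two 'element' axes collapse into a single key.
sizes = dict(zip(dims, shape))
print(sizes)  # {'channel': 3, 'scanline': 100, 'element': 5}

# A name-keyed indexer cannot say WHICH 'element' axis it targets;
# every axis with that name matches. (Passing element=0, element=1 as
# keyword arguments is not even valid Python syntax.)
indexers = {"element": 0}
axes_hit = [axis for axis, name in enumerate(dims) if name in indexers]
print(axes_hit)  # [2, 3]

# Addressing just one of them falls back to position.
first_element_axis = dims.index("element")  # always the first match
print(first_element_axis)  # 2
```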

Also, having two non-positional (as they should always be in xarray!) dimensions with the same name only makes sense when modelling symmetric N:N relationships. Two good examples are covariance matrices and the weights for a Dijkstra algorithm.

The problems start when the object represents an asymmetric relationship, e.g.:

- Cost (for the purpose of graph resolution, so time/money/other) of transportation via river, where going from A->B (downstream) is cheaper than going back from B->A (upstream)
- Currency conversion, where EUR->USD is not identical to 1/(USD->EUR) because of arbitrage and illiquidity
- In financial Monte Carlo simulations, I had to deal with credit rating transition matrices, which define the probability of a company changing its credit rating. In unfavourable market conditions, the chances of being downgraded from AAA to AA are higher than those of being promoted from AA to AAA.
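
One way to make the symmetric/asymmetric distinction concrete is a plain-Python check of whether matrix[i][j] equals matrix[j][i]. The helper `is_symmetric` is hypothetical; the cost values are the 2x2 Kingston/Montreal matrix used in this comment's proposal, and the distance matrix is made up.

```python
from typing import List


def is_symmetric(matrix: List[List[float]], tol: float = 0.0) -> bool:
    """Return True if matrix[i][j] == matrix[j][i] (within tol) for all i, j."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    # Only the upper triangle needs checking against the lower one.
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


# Rows = station_from, columns = station_to (Kingston, Montreal).
river_cost = [[0, 20], [15, 0]]
print(is_symmetric(river_cost))  # False: the two directions differ

# A distance matrix, by contrast, is symmetric.
distance = [[0.0, 1.0], [1.0, 0.0]]
print(is_symmetric(distance))  # True
```

An asymmetric matrix like `river_cost` needs two distinguishable axes ("from" and "to"), which is exactly what a repeated dimension name cannot express.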

I could easily come up with many other cases. In the case of asymmetric N:N relationships, it is highly desirable to share the same index across multiple dimensions with different names (which would typically convey the direction of the relationship, e.g. "from" and "to").

What if, instead of allowing for duplicate dimensions, we allowed sharing an index across different dimensions?

Something like

```python
river_transport = Dataset(
    coords={
        'station': ['Kingston', 'Montreal'],
        'station_from': ('station', ),
        'station_to': ('station', ),
    },
    data_vars={
        'cost': (('station_from', 'station_to'), [[0, 20], [15, 0]]),
    },
)
```

or, for DataArrays:

```python
river_transport = DataArray(
    [[0, 20], [15, 0]],
    dims=('station_from', 'station_to'),
    coords={
        'station': ['Kingston', 'Montreal'],
        'station_from': ('station', ),
        'station_to': ('station', ),
    },
)
```

Note how this syntax doesn't exist as of today:

```python
'station_from': ('station', ),
'station_to': ('station', ),
```

From an implementation point of view, I think it could be easily implemented by keeping track of a map of aliases and with some __getitem__ magic. More effort would be needed to convince DataArrays to accept (and not accidentally drop) a coordinate whose dims don't match any of the data variable's.
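
A minimal sketch of the "map of aliases plus __getitem__" idea, outside xarray. `SharedIndexArray` and its constructor arguments are hypothetical names chosen for illustration, not an existing or proposed xarray API; it only shows how two distinctly named dims can resolve to one shared index while staying individually addressable.

```python
from typing import Dict, List, Sequence, Tuple


class SharedIndexArray:
    """Hypothetical 2-D labeled array whose dims alias one shared index.

    `aliases` maps each dimension name (e.g. 'station_from') to the name of
    the coordinate holding its labels (e.g. 'station').
    """

    def __init__(self, data: List[List[float]], dims: Tuple[str, str],
                 indexes: Dict[str, Sequence[str]], aliases: Dict[str, str]):
        self.data = data
        self.dims = dims
        self.indexes = indexes
        self.aliases = aliases

    def labels(self, dim: str) -> Sequence[str]:
        # Resolve the alias: 'station_from' -> 'station' -> ['Kingston', ...]
        return self.indexes[self.aliases.get(dim, dim)]

    def __getitem__(self, key: Dict[str, str]) -> float:
        # Each dim name is distinct, so each dict key targets exactly one axis.
        row = self.labels(self.dims[0]).index(key[self.dims[0]])
        col = self.labels(self.dims[1]).index(key[self.dims[1]])
        return self.data[row][col]


river_transport = SharedIndexArray(
    [[0, 20], [15, 0]],
    dims=("station_from", "station_to"),
    indexes={"station": ["Kingston", "Montreal"]},
    aliases={"station_from": "station", "station_to": "station"},
)
print(river_transport[{"station_from": "Kingston", "station_to": "Montreal"}])  # 20
print(river_transport[{"station_from": "Montreal", "station_to": "Kingston"}])  # 15
```

Both dims resolve to the same label list, so the index is stored once, yet asymmetric lookups remain unambiguous.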

This design would not resolve the issue of compatibility with NetCDF though. I'd be surprised if the NetCDF designers never came across this - maybe it's a good idea to have a chat with them?

{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Many methods are broken (e.g., concat/stack/sortby) when using repeated dimensions 222676855
376106248 https://github.com/pydata/xarray/issues/1378#issuecomment-376106248 https://api.github.com/repos/pydata/xarray/issues/1378 MDEyOklzc3VlQ29tbWVudDM3NjEwNjI0OA== gerritholl 500246 2018-03-26T09:38:00Z 2018-03-26T09:38:00Z CONTRIBUTOR

This also affects the stack method.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Many methods are broken (e.g., concat/stack/sortby) when using repeated dimensions 222676855
367153633 https://github.com/pydata/xarray/issues/1378#issuecomment-367153633 https://api.github.com/repos/pydata/xarray/issues/1378 MDEyOklzc3VlQ29tbWVudDM2NzE1MzYzMw== gerritholl 500246 2018-02-20T23:10:13Z 2018-02-20T23:10:13Z CONTRIBUTOR

@jhamman Ok, good to hear it's not slated to be removed. I would love to work on this, I wish I had the time! I'll keep it in mind if I do find some spare time.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Many methods are broken (e.g., concat/stack/sortby) when using repeated dimensions 222676855
367147759 https://github.com/pydata/xarray/issues/1378#issuecomment-367147759 https://api.github.com/repos/pydata/xarray/issues/1378 MDEyOklzc3VlQ29tbWVudDM2NzE0Nzc1OQ== gerritholl 500246 2018-02-20T22:46:27Z 2018-02-20T22:46:27Z CONTRIBUTOR

> I cannot see a use case in which repeated dims actually make sense.

I use repeated dimensions to store a covariance matrix. The data variable containing the covariance matrix has 4 dimensions, of which the last 2 are repeated. For example, I have a data variable with dimensions (channel, scanline, element, element), storing an element-element covariance matrix for every scanline in satellite data.
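
For context, the element-element block described here is a sample covariance matrix: both of its axes range over the same elements, and it is symmetric. A minimal pure-Python sketch for a single (channel, scanline) slice, assuming one row per observation and one column per element; `covariance_matrix` and the readings are hypothetical. Repeating it for every channel and scanline would give the (channel, scanline, element, element) layout.

```python
from typing import List


def covariance_matrix(samples: List[List[float]]) -> List[List[float]]:
    """Sample covariance between elements (unbiased, divides by n - 1).

    `samples` has one row per observation and one column per element; the
    result is indexed (element, element).
    """
    n_obs = len(samples)
    n_elem = len(samples[0])
    means = [sum(row[k] for row in samples) / n_obs for k in range(n_elem)]
    cov = [[0.0] * n_elem for _ in range(n_elem)]
    for i in range(n_elem):
        for j in range(n_elem):
            cov[i][j] = sum(
                (row[i] - means[i]) * (row[j] - means[j]) for row in samples
            ) / (n_obs - 1)
    return cov


# Hypothetical readings: 4 observations of 3 elements.
samples = [
    [1.0, 2.0, 0.5],
    [2.0, 4.0, 0.0],
    [3.0, 6.0, 1.0],
    [4.0, 8.0, 0.5],
]
cov = covariance_matrix(samples)
```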

This is valid NetCDF and should be valid in xarray. It would be a significant problem for me if they became disallowed.

{
    "total_count": 6,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Many methods are broken (e.g., concat/stack/sortby) when using repeated dimensions 222676855

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
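
The page's filter (5 rows where author_association = "CONTRIBUTOR" and issue = 222676855, sorted by updated_at descending) can be reproduced against this schema with Python's built-in sqlite3 module. A sketch, assuming an in-memory table trimmed to the four columns the query needs; the row values are copied from the rows listed above and inserted out of order on purpose.

```python
import sqlite3

# In-memory copy of issue_comments, trimmed to the columns the filter uses.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE issue_comments (
        id INTEGER PRIMARY KEY,
        updated_at TEXT,
        author_association TEXT,
        issue INTEGER
    )
    """
)
rows = [
    (367147759, "2018-02-20T22:46:27Z", "CONTRIBUTOR", 222676855),
    (602795869, "2020-03-23T19:02:26Z", "CONTRIBUTOR", 222676855),
    (376106248, "2018-03-26T09:38:00Z", "CONTRIBUTOR", 222676855),
    (528920519, "2019-09-06T16:22:12Z", "CONTRIBUTOR", 222676855),
    (367153633, "2018-02-20T23:10:13Z", "CONTRIBUTOR", 222676855),
]
conn.executemany("INSERT INTO issue_comments VALUES (?, ?, ?, ?)", rows)

# ISO-8601 timestamps in a uniform format sort correctly as plain text.
query = """
    SELECT id, updated_at FROM issue_comments
    WHERE author_association = ? AND issue = ?
    ORDER BY updated_at DESC
"""
result = conn.execute(query, ("CONTRIBUTOR", 222676855)).fetchall()
for comment_id, updated_at in result:
    print(comment_id, updated_at)
```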
Powered by Datasette · About: xarray-datasette