home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

3 rows where issue = 955936490 and user = 1217238 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date)

user 1

  • shoyer · 3 ✖

issue 1

  • Flexible indexes: review the implementation of alignment and merge · 3 ✖

author_association 1

  • MEMBER 3
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
945931002 https://github.com/pydata/xarray/issues/5647#issuecomment-945931002 https://api.github.com/repos/pydata/xarray/issues/5647 IC_kwDOAMm_X844YcL6 shoyer 1217238 2021-10-18T16:07:06Z 2021-10-18T16:07:06Z MEMBER

For alignment, I think we could directly call idx.reindex_like(other) for each index in each object to align where other corresponds to the aligned index that matches idx.

Sounds good to me!

If we like, we could probably even add static type information on reindex_like, to require the same index type: ```python T = TypeVar("T", bound="Index")

... def reindex_like(self: T, other: T) -> DimIntIndexers: ... ```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes: review the implementation of alignment and merge 955936490
945928870 https://github.com/pydata/xarray/issues/5647#issuecomment-945928870 https://api.github.com/repos/pydata/xarray/issues/5647 IC_kwDOAMm_X844Ybqm shoyer 1217238 2021-10-18T16:04:52Z 2021-10-18T16:04:52Z MEMBER

2. dim has a multi-index, labels is a pd.MultiIndex and level names exactly match: no real issue either in this case, but shouldn't we discourage this implicit behavior in the mid/long term and instead encourage using reindex_like for such more advanced use case?

I agree, this doesn't make sense in the long term because the "name" of the MultiIndex is no longer necessary: it's just the same index that happens to index each of the levels.

Let's preserve it (for now) for backwards compat, but in the long term the ideal is certainly either (a) using reindex_like or (b) using reindex with separate kwargs for each level

3. dim has a multi-index and labels is (cast to) a single pandas index (or the other way around): this is currently possible in Xarray but it seems silly? After re-indexing, all data along dim is filled with fill_value... Would it be fine to instead raise an error now? Would it really break any user case?

If part of Xarray's API currently doesn't raise an error but instead returns all NaNs, and this case can be detected based on static type information (e.g., shapes, dtypes, dimension names, variable name, index types), then I agree that the best user experience is almost certainly to raise an error instead.

NaNs in numerical computing are essentially a mechanism for "runtime error handling" that can arise from values in array computation. If something can be identified based on type information, that is a better user experience.

4. dim has a multi-index, labels is a pd.MultiIndex and multi-index level names don't match: same problem than for case 3.

Cases 3 and 4 are a big obstacle for me right now. I really don't know how we can still support those special cases without deeply re-thinking the problem. If they could be considered as a bug, then the new implementation would already raise an nice error message :-).

I think we can consider these edge cases bugs and fix them :)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes: review the implementation of alignment and merge 955936490
889288761 https://github.com/pydata/xarray/issues/5647#issuecomment-889288761 https://api.github.com/repos/pydata/xarray/issues/5647 IC_kwDOAMm_X841AXg5 shoyer 1217238 2021-07-29T16:26:01Z 2021-07-29T16:26:01Z MEMBER

Conceptually, align should ensure that all indexes in the provided arguments match.

This is potentially ambiguous for cases with multiple (non-MultiIndex) indexes, if the result of aligning the separate indexes does not match, e.g., if we have: - Index for x along dimension x: [1, 2, 3] vs [1, 2] - Index for y along dimension x: [1, 2, 3] vs [2, 3]

We should raise an error in this cases (and/or suggest setting a MultiIndex). It should also be OK if not every index implements alignment, in which case they should raise an error if coordinates do not match exactly.

With regards to your concern:

This currently works well since a pd.Index can be directly treated as a 1-d array but this won’t be always the case anymore with custom indexes.

I don't think we should try to support alignment with multi-dimensional (non-orthogonal) inside align(). Instead we can just require the coordinates correspondign to such indexes to match exactly (or within a tolerance). I can see supporting alignment between two DataArrays that are indexed by a KDTreeIndex over multiple coordintaes along a single dimension in the same way tthat we support alignment with MultiIndex indexes currently.

But if your indexes correpsond to multi-dimensional arrays (rather than just multiple coordinates), joining indexes together is a much messier operation, one that may not be possible without thinking carefully about interpolation/regridding. In many cases it may not be possible to retain the multi-dimensional nature of the indexes in the result (e.g., the union of two partially overlapping grids). Since the desired behavior is not clear, it is better to force the user to make a choice, either by stacking the dimensions in multi-dimensional indexes into 1D (like a MultiIndex) or by calling a specialized method for interpolation/regridding.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes: review the implementation of alignment and merge 955936490

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 3303.262ms · About: xarray-datasette