home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

13 rows where issue = 592312709 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 7

  • max-sixty 3
  • TomNicholas 3
  • shoyer 2
  • benbovy 2
  • ivanakcheurov 1
  • mathause 1
  • pep8speaks 1

author_association 2

  • MEMBER 11
  • NONE 2

issue 1

  • sel along 1D non-index coordinates · 13 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1239471756 https://github.com/pydata/xarray/pull/3925#issuecomment-1239471756 https://api.github.com/repos/pydata/xarray/issues/3925 IC_kwDOAMm_X85J4NaM mathause 10194086 2022-09-07T14:31:58Z 2022-09-07T14:31:58Z MEMBER

With #6971 on the way I guess we can close this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sel along 1D non-index coordinates 592312709
1146877063 https://github.com/pydata/xarray/pull/3925#issuecomment-1146877063 https://api.github.com/repos/pydata/xarray/issues/3925 IC_kwDOAMm_X85EW_SH ivanakcheurov 5347026 2022-06-05T20:18:26Z 2022-06-05T20:18:26Z NONE

In #5692 it is possible to perform selection using non-dimension coordinates with an index, although there's no easy way yet to set an index for such coordinates (this will be done in a follow-up PR by updating the API of set_index).

@benbovy, please could you give an example how it is possible? I would like sel based on a non-dim coordinate to be as fast as sel based on the dim itself as per the following timings: ```python

sel based on a non-dim coordinate

(using this coordinate directly .sel(product_id=26) would result in error "'no index found for coordinate product_id")

%timeit xds.sel(product=xds.product_id==26) 1.54 ms ± 64.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

sel based on the dim itself

%timeit xds.sel(product='GN91 Glove Medium') 499 µs ± 16.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit xds.where(xds.product_id==26, drop=True) 4.17 ms ± 39 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) ```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sel along 1D non-index coordinates 592312709
926049276 https://github.com/pydata/xarray/pull/3925#issuecomment-926049276 https://api.github.com/repos/pydata/xarray/issues/3925 IC_kwDOAMm_X843MmP8 TomNicholas 35968931 2021-09-23T18:20:50Z 2021-09-23T18:21:12Z MEMBER

FYI I probably am not going to work on this PR again - especially as I remember getting quite confused by the indexing internals, which have now changed. If we can just wait for this feature to be enabled via the indexing refactor then I would rather just do that. Perhaps I should close this?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sel along 1D non-index coordinates 592312709
925694605 https://github.com/pydata/xarray/pull/3925#issuecomment-925694605 https://api.github.com/repos/pydata/xarray/issues/3925 IC_kwDOAMm_X843LPqN benbovy 4160723 2021-09-23T10:37:07Z 2021-09-23T10:37:07Z MEMBER

In #5692 it is possible to perform selection using non-dimension coordinates with an index, although there's no easy way yet to set an index for such coordinates (this will be done in a follow-up PR by updating the API of set_index).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sel along 1D non-index coordinates 592312709
794116358 https://github.com/pydata/xarray/pull/3925#issuecomment-794116358 https://api.github.com/repos/pydata/xarray/issues/3925 MDEyOklzc3VlQ29tbWVudDc5NDExNjM1OA== benbovy 4160723 2021-03-09T16:23:17Z 2021-03-09T16:23:17Z MEMBER

pandas is 1000x faster than NumPy if the index is pre-existing, but 100x slower if the index is new. That's a 1e5 fold slow-down!

I think users will appreciate the flexibility, but if there's some way we warn users that they really should set the index ahead of time when they are doing repeating indexing that could also be welcome.

I think it's a good use case for some kind of EphemeralIndex (or BasicIndex or NumpyIndex) once the explicit index refactoring is done, along with some good documentation on which index to choose for which purpose.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sel along 1D non-index coordinates 592312709
609821244 https://github.com/pydata/xarray/pull/3925#issuecomment-609821244 https://api.github.com/repos/pydata/xarray/issues/3925 MDEyOklzc3VlQ29tbWVudDYwOTgyMTI0NA== max-sixty 5635139 2020-04-06T14:14:34Z 2020-04-06T14:14:34Z MEMBER

OK! May be a sizeable change but I update on @shoyer 's view, let's do it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sel along 1D non-index coordinates 592312709
609472798 https://github.com/pydata/xarray/pull/3925#issuecomment-609472798 https://api.github.com/repos/pydata/xarray/issues/3925 MDEyOklzc3VlQ29tbWVudDYwOTQ3Mjc5OA== shoyer 1217238 2020-04-05T19:53:44Z 2020-04-05T19:53:44Z MEMBER

Related to my microbenchmark, it might also be worth considering pure NumPy versions of common indexing operations, to avoid the need to repeatedly create hash-tables. But that could be quite a bit of work to do comprehensively.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sel along 1D non-index coordinates 592312709
609471635 https://github.com/pydata/xarray/pull/3925#issuecomment-609471635 https://api.github.com/repos/pydata/xarray/issues/3925 MDEyOklzc3VlQ29tbWVudDYwOTQ3MTYzNQ== shoyer 1217238 2020-04-05T19:45:07Z 2020-04-05T19:45:07Z MEMBER

I think this is generally a good idea!

In the future, creating an index explicit would just mean: 1. Repeated lookups are more efficient, due to caching in a hash-table. 2. The coordinate values are immutable, to ensure that the index values can be cached.

One minor concern I have here is about efficiency: building an pd.Index and its hash-table from scratch can be quite expensive. If we're only doing a single lookup this is fine, but if it's being done in a loop this could be surprisingly slow, and we would do far better sticking with pure NumPy operations.

Here's an microbenchmark that hopefully illustrates the issue: ``` import pandas as pd import numpy as np

def lookup_preindexed(needle, index): return index.get_loc(needle)

def lookup_newindex(needle, haystack): return lookup_preindexed(needle, pd.Index(haystack))

def lookup_numpy(needle, haystack): return (haystack == needle).argmax()

haystack = np.random.permutation(np.arange(1000000)) index = pd.Index(haystack) %timeit lookup_newindex(0, haystack) # 56.1 ms per loop %timeit lookup_preindexed(0, index) # 696 ns per loop %timeit lookup_numpy(0, haystack) # 517 µs per loop ```

pandas is 1000x faster than NumPy if the index is pre-existing, but 100x slower if the index is new. That's a 1e5 fold slow-down!

I think users will appreciate the flexibility, but if there's some way we warn users that they really should set the index ahead of time when they are doing repeating indexing that could also be welcome. Figuring out how to save the state for counting the number of times a new index is created could be pretty messy, though. I guess we could stuff it into Variable.encoding and issue a warning whenever the same variable has been converted into an index at least 100 times.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sel along 1D non-index coordinates 592312709
608018692 https://github.com/pydata/xarray/pull/3925#issuecomment-608018692 https://api.github.com/repos/pydata/xarray/issues/3925 MDEyOklzc3VlQ29tbWVudDYwODAxODY5Mg== TomNicholas 35968931 2020-04-02T18:08:32Z 2020-04-02T18:08:32Z MEMBER

Gotcha, thanks.

Hmm, this seems like a grey area... I think selecting along 1D non-dimension coords is probably way more common than along 2D coords, but I'm biased.

changing the number of dimensions doesn't change functionality

BTW I can immediately think of at least one other place where this rule is broken (#3774) - where going from 0D coords to 1D coords of length 1 changes whether combine_by_coords will accept them.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sel along 1D non-index coordinates 592312709
608014672 https://github.com/pydata/xarray/pull/3925#issuecomment-608014672 https://api.github.com/repos/pydata/xarray/issues/3925 MDEyOklzc3VlQ29tbWVudDYwODAxNDY3Mg== max-sixty 5635139 2020-04-02T18:01:11Z 2020-04-02T18:01:11Z MEMBER

Sorry, could you explain what you mean here? How would this PR violate that?

Of course. I mean that if someone changes the dimensionality of a non-index coord from 1D to 2D, then running .sel over it will stop working.

Whereas now, neither would work. Which in some ways is worse, but also less surprising. I think these are always difficult trade-offs...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sel along 1D non-index coordinates 592312709
608000289 https://github.com/pydata/xarray/pull/3925#issuecomment-608000289 https://api.github.com/repos/pydata/xarray/issues/3925 MDEyOklzc3VlQ29tbWVudDYwODAwMDI4OQ== TomNicholas 35968931 2020-04-02T17:44:39Z 2020-04-02T17:44:39Z MEMBER

In particular, one maxim we've generally held is that changing the number of dimensions doesn't change functionality.

Sorry, could you explain what you mean here? How would this PR violate that?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sel along 1D non-index coordinates 592312709
607977635 https://github.com/pydata/xarray/pull/3925#issuecomment-607977635 https://api.github.com/repos/pydata/xarray/issues/3925 MDEyOklzc3VlQ29tbWVudDYwNzk3NzYzNQ== max-sixty 5635139 2020-04-02T17:14:41Z 2020-04-02T17:14:41Z MEMBER

I think the functionality is useful, but I'm concerned it would make the API more confusing. In particular, one maxim we've generally held is that changing the number of dimensions doesn't change functionality. (ofc it changes whether it's possible to create an index, though the indexing API doesn't change)

What are others' thoughts?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sel along 1D non-index coordinates 592312709
607584705 https://github.com/pydata/xarray/pull/3925#issuecomment-607584705 https://api.github.com/repos/pydata/xarray/issues/3925 MDEyOklzc3VlQ29tbWVudDYwNzU4NDcwNQ== pep8speaks 24736507 2020-04-02T02:24:04Z 2020-04-02T02:24:04Z NONE

Hello @TomNicholas! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

  • In the file xarray/core/indexing.py:

Line 219:12: E131 continuation line unaligned for hanging indent Line 220:14: E131 continuation line unaligned for hanging indent Line 223:10: E111 indentation is not a multiple of four Line 223:10: E117 over-indented

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sel along 1D non-index coordinates 592312709

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 68.874ms · About: xarray-datasette