issue_comments


5 rows where issue = 374025325 (Array indexing with dask arrays) and user = 16700639 (bzah), all with author_association CONTRIBUTOR, sorted by updated_at descending
id: 944328081 · node_id: IC_kwDOAMm_X844SU2R · user: bzah (16700639) · CONTRIBUTOR
created_at: 2021-10-15T14:03:21Z · updated_at: 2021-10-15T14:03:21Z
html_url: https://github.com/pydata/xarray/issues/2511#issuecomment-944328081
issue_url: https://api.github.com/repos/pydata/xarray/issues/2511

I'll open a PR; it might be easier to try and play with that than with a piece of code lost in an issue.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Array indexing with dask arrays (374025325)
id: 931430066 · node_id: IC_kwDOAMm_X843hH6y · user: bzah (16700639) · CONTRIBUTOR
created_at: 2021-09-30T15:30:02Z · updated_at: 2021-10-06T09:48:19Z
html_url: https://github.com/pydata/xarray/issues/2511#issuecomment-931430066

Okay, I was able to redo my test. If I manually call compute() before doing isel(...), my whole computation takes about 5.65 seconds. However, if I try with my naive patch, it takes 32.34 seconds.

I'm sorry, I cannot share my code as is; the relevant portion is buried in the middle of many other things. I'll try to put together a minimal version of it to share with you.
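To illustrate, the "manually call compute() before isel(...)" workaround mentioned above looks roughly like this (a minimal sketch with made-up sizes and names, not the real code):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small illustrative dask-backed array (far smaller than the real data)
arr = xr.DataArray(
    np.random.rand(10, 4, 4),
    dims=("time", "x", "y"),
    coords={"time": pd.date_range("2042-01-01", periods=10)},
).chunk({"time": -1, "x": 2, "y": 2})

# Materialize the dask-backed indexer before the vectorized isel()
idx = arr.argmax("time").compute()
result = arr.isel(time=idx)
```

Here `idx` has dims ("x", "y"), so `isel(time=idx)` does pointwise selection along time and returns an ("x", "y") array.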

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 930153816 · node_id: IC_kwDOAMm_X843cQVY · user: bzah (16700639) · CONTRIBUTOR
created_at: 2021-09-29T13:02:15Z · updated_at: 2021-10-06T09:46:10Z
html_url: https://github.com/pydata/xarray/issues/2511#issuecomment-930153816

@pl-marasco Ok, that's strange. I should have saved my use case :/ I'll try to reproduce it and will provide a gist of it soon.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 932229595 · node_id: IC_kwDOAMm_X843kLHb · user: bzah (16700639) · CONTRIBUTOR
created_at: 2021-10-01T13:29:32Z · updated_at: 2021-10-01T13:29:32Z
html_url: https://github.com/pydata/xarray/issues/2511#issuecomment-932229595

@pl-marasco Thanks for the example! With it I get the same result as you: it takes the same time with the patch as with compute().

However, I was able to construct an example that gives very different results. It is quite close to my original code:

```
import time

import numpy as np
import pandas as pd
import xarray as xr

time_start = time.perf_counter()

COORDS = dict(
    time=pd.date_range("2042-01-01", periods=200, freq=pd.DateOffset(days=1)),
)
da = xr.DataArray(
    np.random.rand(200 * 3500 * 350).reshape((200, 3500, 350)),
    dims=("time", "x", "y"),
    coords=COORDS,
).chunk(dict(time=-1, x=100, y=100))

resampled = da.resample(time="MS")

for label, sample in resampled:
    # sample = sample.compute()
    idx = sample.argmax("time")
    sample.isel(time=idx)

time_elapsed = time.perf_counter() - time_start
print(time_elapsed, " secs")
```

(Basically, I want, for each month, the first event occurring in it.)

Without the patch, uncommenting `sample = sample.compute()`, it takes 5.7 seconds. With the patch, it takes 53.9 seconds.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 922942743 · node_id: IC_kwDOAMm_X843Av0X · user: bzah (16700639) · CONTRIBUTOR
created_at: 2021-09-20T13:45:56Z · updated_at: 2021-09-20T13:45:56Z
html_url: https://github.com/pydata/xarray/issues/2511#issuecomment-922942743

I wrote a very naive fix; it works, but it seems to perform really slowly, and I would appreciate some feedback (I'm a beginner with Dask). Basically, I added `k = dask.array.asarray(k, dtype=np.int64)` to do the exact same thing as with NumPy. I can create a PR if that makes this easier to review.

The patch:

```
class VectorizedIndexer(ExplicitIndexer):
    """Tuple for vectorized indexing.

    All elements should be slice or N-dimensional np.ndarray objects with an
    integer dtype and the same number of dimensions. Indexing follows proposed
    rules for np.ndarray.vindex, which matches NumPy's advanced indexing rules
    (including broadcasting) except sliced axes are always moved to the end:
    https://github.com/numpy/numpy/pull/6256
    """

    __slots__ = ()

    def __init__(self, key):
        if not isinstance(key, tuple):
            raise TypeError(f"key must be a tuple: {key!r}")

        new_key = []
        ndim = None
        for k in key:
            if isinstance(k, slice):
                k = as_integer_slice(k)
            elif isinstance(k, (np.ndarray, dask.array.Array)):
                if not np.issubdtype(k.dtype, np.integer):
                    raise TypeError(
                        f"invalid indexer array, does not have integer dtype: {k!r}"
                    )
                if ndim is None:
                    ndim = k.ndim
                elif ndim != k.ndim:
                    ndims = [k.ndim for k in key if isinstance(k, np.ndarray)]
                    raise ValueError(
                        "invalid indexer key: ndarray arguments "
                        f"have different numbers of dimensions: {ndims}"
                    )
                if isinstance(k, dask.array.Array):
                    # Keep dask indexers lazy instead of coercing via NumPy
                    k = dask.array.asarray(k, dtype=np.int64)
                else:
                    k = np.asarray(k, dtype=np.int64)
            else:
                raise TypeError(
                    f"unexpected indexer type for {type(self).__name__}: {k!r}"
                )
            new_key.append(k)

        super().__init__(new_key)
```

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 2,
    "eyes": 0
}

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette