
issue_comments


9 rows where author_association = "MEMBER" and issue = 225774140 sorted by updated_at descending


Commenters: rabernat (5 comments), shoyer (3), jhamman (1) — all 9 comments are on the issue "selecting a point from an mfdataset" with author_association MEMBER.
jhamman (MEMBER) · 2019-01-13T06:32:45Z · https://github.com/pydata/xarray/issues/1396#issuecomment-453806119

closed via https://github.com/dask/dask/pull/2364 (a long time ago)

Reactions (identical zero counts on every comment in this table):

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
rabernat (MEMBER) · 2017-06-07T16:07:03Z · https://github.com/pydata/xarray/issues/1396#issuecomment-306843285

Hi @JanisGailis. Thanks for looking into this issue! I will give your solution a try as soon as I get some free time.

However, I would like to point out that the issue is completely resolved by dask/dask#2364. So this can probably be closed after the next dask release.

rabernat (MEMBER) · 2017-05-23T00:16:58Z · https://github.com/pydata/xarray/issues/1396#issuecomment-303254497

This dask bug also explains why it is so slow to generate the repr for these big datasets.

rabernat (MEMBER) · 2017-05-16T16:27:52Z · https://github.com/pydata/xarray/issues/1396#issuecomment-301837577

I have created a self-contained, reproducible example of this serious performance problem. https://gist.github.com/rabernat/7175328ee04a3167fa4dede1821964c6

This issue is becoming a big problem for me. I imagine other people must be experiencing it too.

I am happy to try to dig in and fix it, but I think some of @shoyer's backend insight would be valuable first.

rabernat (MEMBER) · 2017-05-02T20:45:29Z (edited 2017-05-02T20:45:41Z) · https://github.com/pydata/xarray/issues/1396#issuecomment-298755789

> dask may be loading full arrays to do this computation

This is definitely what I suspect is happening. The problem with adding more chunks is that I quickly hit my system ulimit (see #1394), since, for some reason, all the 1754 files are opened as soon as I call .load(). Putting more chunks in just multiplies that number.
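The ulimit ceiling rabernat hits can be inspected and, within the hard limit, raised from Python via the standard `resource` module. A minimal sketch, assuming a POSIX system (the 4096 target is an arbitrary illustrative value, not a recommendation from the thread):

```python
import resource

# Query the current soft and hard limits on open file descriptors
# (the "ulimit -n" values that cap how many of the 1754 files can
# be open simultaneously).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# Raise the soft limit toward the hard limit; only root can raise
# the hard limit itself. RLIM_INFINITY means "unlimited".
new_soft = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
```

This only works around the symptom; the underlying problem is that all files are opened eagerly on `.load()`.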

shoyer (MEMBER) · 2017-05-02T20:42:37Z · https://github.com/pydata/xarray/issues/1396#issuecomment-298755027

One thing worth trying is specifying chunks manually in open_mfdataset. Point-wise indexing should not really require chunks specified ahead of time, but the optimizations dask uses to make these operations efficient are somewhat fragile, so dask may be loading full arrays to do this computation.
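The behavior shoyer describes (a properly chunked lazy array lets point-wise indexing materialize only the one chunk containing the point, whereas a fragile optimization can fall back to loading everything) can be illustrated with a toy stand-in. This is a hypothetical pure-Python sketch of the idea, not xarray or dask code:

```python
class LazyChunked1D:
    """Toy 1-D chunked lazy array: loads only the chunk covering an index."""

    def __init__(self, loaders, chunk_size):
        self.loaders = loaders    # one callable per chunk, each returning a list
        self.chunk_size = chunk_size
        self.loads = 0            # count how many chunks were materialized

    def point(self, i):
        """Point-wise indexing: materialize one chunk, not the whole array."""
        chunk, offset = divmod(i, self.chunk_size)
        self.loads += 1
        return self.loaders[chunk]()[offset]


# Four "files" of 5 values each, opened lazily (mimicking open_mfdataset
# with explicit chunks along the concatenation dimension).
data = [list(range(k * 5, k * 5 + 5)) for k in range(4)]
arr = LazyChunked1D([lambda k=k: data[k] for k in range(4)], chunk_size=5)

print(arr.point(12))   # only chunk 2 is loaded to answer this query
print(arr.loads)
```

The bug in the thread is the opposite behavior: despite chunking, the graph optimization degrades and every "file" gets loaded for a single point.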

shoyer (MEMBER) · 2017-05-02T20:40:35Z · https://github.com/pydata/xarray/issues/1396#issuecomment-298754480

OK, so that isn't terribly useful -- the slow-down is somewhere in dask-land. If it was an issue with alignment, that would come up when building the dask graph, not computing it.

rabernat (MEMBER) · 2017-05-02T20:06:10Z · https://github.com/pydata/xarray/issues/1396#issuecomment-298745833

The relevant part of the stack trace is

```
/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/xarray/core/dataarray.py in load(self)
    571         working with many file objects on disk.
    572         """
--> 573         ds = self._to_temp_dataset().load()
    574         new = self._from_temp_dataset(ds)
    575         self._variable = new._variable

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/xarray/core/dataset.py in load(self)
    467
    468         # evaluate all the dask arrays simultaneously
--> 469         evaluated_data = da.compute(*lazy_data.values())
    470
    471         for k, data in zip(lazy_data, evaluated_data):

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/dask/base.py in compute(*args, **kwargs)
    200     dsk = collections_to_dsk(variables, optimize_graph, **kwargs)
    201     keys = [var._keys() for var in variables]
--> 202     results = get(dsk, keys, **kwargs)
    203
    204     results_iter = iter(results)

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, **kwargs)
   1523         return sync(self.loop, self._get, dsk, keys, restrictions=restrictions,
   1524                     loose_restrictions=loose_restrictions,
-> 1525                     resources=resources)
   1526
   1527     def _optimize_insert_futures(self, dsk, keys):

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/distributed/utils.py in sync(loop, func, *args, **kwargs)
    200         loop.add_callback(f)
    201     while not e.is_set():
--> 202         e.wait(1000000)
    203     if error[0]:
    204         six.reraise(*error[0])

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/threading.py in wait(self, timeout)
    547             signaled = self._flag
    548             if not signaled:
--> 549                 signaled = self._cond.wait(timeout)
    550             return signaled
    551

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/threading.py in wait(self, timeout)
    295             else:
    296                 if timeout > 0:
--> 297                     gotit = waiter.acquire(True, timeout)
    298                 else:
    299                     gotit = waiter.acquire(False)

KeyboardInterrupt:
```

I think the issue you are referring to is also mine (#1385).

shoyer (MEMBER) · 2017-05-02T19:38:36Z · https://github.com/pydata/xarray/issues/1396#issuecomment-298738745

Can you try using Ctrl + C to interrupt things and report the stack-trace?

This might be an issue with xarray verifying aligned coordinates, which we should have an option to disable. (This came up somewhere else recently, but I couldn't find the issue with a quick search.)
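For reference, the stack trace shoyer asks for can also be captured programmatically by catching `KeyboardInterrupt`, which is useful when the hang occurs inside a script rather than an interactive session. A minimal stdlib sketch; `long_running` is a hypothetical stand-in for the hanging `ds.load()` call:

```python
import traceback

def long_running():
    # Stand-in for the hanging computation; here we raise the
    # interrupt ourselves so the example is self-contained.
    raise KeyboardInterrupt

try:
    long_running()
except KeyboardInterrupt:
    # format_exc() returns the same text the interpreter would print
    # on Ctrl + C, which is what to paste into the issue.
    trace = traceback.format_exc()
    print(trace)
```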


Table schema:
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette