
issue_comments


5 rows where issue = 225774140 and user = 1197350 sorted by updated_at descending

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at, author_association, body, reactions, performed_via_github_app, issue
306843285 https://github.com/pydata/xarray/issues/1396#issuecomment-306843285 https://api.github.com/repos/pydata/xarray/issues/1396 MDEyOklzc3VlQ29tbWVudDMwNjg0MzI4NQ== rabernat 1197350 2017-06-07T16:07:03Z 2017-06-07T16:07:03Z MEMBER

Hi @JanisGailis. Thanks for looking into this issue! I will give your solution a try as soon as I get some free time.

However, I would like to point out that the issue is completely resolved by dask/dask#2364. So this can probably be closed after the next dask release.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  selecting a point from an mfdataset 225774140
303254497 https://github.com/pydata/xarray/issues/1396#issuecomment-303254497 https://api.github.com/repos/pydata/xarray/issues/1396 MDEyOklzc3VlQ29tbWVudDMwMzI1NDQ5Nw== rabernat 1197350 2017-05-23T00:16:58Z 2017-05-23T00:16:58Z MEMBER

This dask bug also explains why it is so slow to generate the repr for these big datasets.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  selecting a point from an mfdataset 225774140
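
The comment above attributes slow reprs for big multi-file datasets to the same dask bug. As a rough way to see the cost, here is a minimal timing sketch; the part_*.nc files, the temp variable, and the sizes are invented stand-ins, not the datasets from the thread.

```python
# Minimal sketch: time the repr of a lazily opened multi-file dataset.
# The files generated here are tiny stand-ins; the reports in this thread
# involved thousands of large netCDF files.
import glob
import time

import numpy as np
import xarray as xr

# Create a few small netCDF files standing in for a big multi-file dataset.
for i in range(10):
    xr.Dataset(
        {"temp": (("time",), np.random.rand(5))},
        coords={"time": np.arange(i * 5, (i + 1) * 5)},
    ).to_netcdf(f"part_{i:03d}.nc")

ds = xr.open_mfdataset(sorted(glob.glob("part_*.nc")))

t0 = time.time()
text = repr(ds)  # render the text summary of the lazy dataset
print(f"repr took {time.time() - t0:.3f} s")
```
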
301837577 https://github.com/pydata/xarray/issues/1396#issuecomment-301837577 https://api.github.com/repos/pydata/xarray/issues/1396 MDEyOklzc3VlQ29tbWVudDMwMTgzNzU3Nw== rabernat 1197350 2017-05-16T16:27:52Z 2017-05-16T16:27:52Z MEMBER

I have created a self-contained, reproducible example of this serious performance problem. https://gist.github.com/rabernat/7175328ee04a3167fa4dede1821964c6

This issue is becoming a big problem for me. I imagine other people must be experiencing it too.

I am happy to try to dig in and fix it, but I think some of @shoyer's backend insight would be valuable first.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  selecting a point from an mfdataset 225774140
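
The linked gist is the author's actual reproduction; the sketch below only illustrates the shape of that experiment under invented file names, variables, and coordinates: open many files lazily with open_mfdataset, then time the selection of a single point.

```python
# Illustrative sketch of the experiment described above (not the linked gist):
# open many netCDF files as one lazy dataset, then pull out a single point.
import glob
import time

import numpy as np
import xarray as xr

# Small stand-in files; the real case involved ~1754 files.
for i in range(10):
    xr.Dataset(
        {"temp": (("time", "lat", "lon"), np.random.rand(5, 18, 36))},
        coords={
            "time": np.arange(i * 5, (i + 1) * 5),
            "lat": np.linspace(-85.0, 85.0, 18),
            "lon": np.linspace(0.0, 350.0, 36),
        },
    ).to_netcdf(f"point_demo_{i:03d}.nc")

ds = xr.open_mfdataset(sorted(glob.glob("point_demo_*.nc")))

t0 = time.time()
point = ds["temp"].sel(lat=0.0, lon=100.0, method="nearest").load()
print(f"selecting one point of shape {point.shape} took {time.time() - t0:.3f} s")
```
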
298755789 https://github.com/pydata/xarray/issues/1396#issuecomment-298755789 https://api.github.com/repos/pydata/xarray/issues/1396 MDEyOklzc3VlQ29tbWVudDI5ODc1NTc4OQ== rabernat 1197350 2017-05-02T20:45:29Z 2017-05-02T20:45:41Z MEMBER

> dask may be loading full arrays to do this computation

This is definitely what I suspect is happening. The problem with adding more chunks is that I quickly hit my system ulimit (see #1394): for some reason, all 1754 files are opened as soon as I call .load(). Adding more chunks just multiplies the number of open files.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  selecting a point from an mfdataset 225774140
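
One way to observe the behavior described above, where every file is opened as soon as .load() runs, is to watch the process's open file descriptors. psutil and the stand-in files below are my additions for illustration; the thread itself only reports hitting the ulimit (#1394).

```python
# Sketch: count open file descriptors around .load(). psutil is an
# illustrative choice, not something used in the thread.
import glob

import numpy as np
import psutil
import xarray as xr

# Tiny stand-in files for a large multi-file dataset.
for i in range(10):
    xr.Dataset(
        {"temp": (("time",), np.random.rand(5))},
        coords={"time": np.arange(i * 5, (i + 1) * 5)},
    ).to_netcdf(f"fd_demo_{i:03d}.nc")

proc = psutil.Process()
ds = xr.open_mfdataset(sorted(glob.glob("fd_demo_*.nc")))
print("open files after open_mfdataset:", len(proc.open_files()))

ds["temp"].load()
print("open files after .load():", len(proc.open_files()))
```

If the second count scales with the number of files, a large enough dataset will exceed the per-process descriptor limit (ulimit -n), which is the failure mode #1394 describes.
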
298745833 https://github.com/pydata/xarray/issues/1396#issuecomment-298745833 https://api.github.com/repos/pydata/xarray/issues/1396 MDEyOklzc3VlQ29tbWVudDI5ODc0NTgzMw== rabernat 1197350 2017-05-02T20:06:10Z 2017-05-02T20:06:10Z MEMBER

The relevant part of the stack trace is:

```
/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/xarray/core/dataarray.py in load(self)
    571         working with many file objects on disk.
    572         """
--> 573         ds = self._to_temp_dataset().load()
    574         new = self._from_temp_dataset(ds)
    575         self._variable = new._variable

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/xarray/core/dataset.py in load(self)
    467
    468         # evaluate all the dask arrays simultaneously
--> 469         evaluated_data = da.compute(*lazy_data.values())
    470
    471         for k, data in zip(lazy_data, evaluated_data):

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/dask/base.py in compute(*args, **kwargs)
    200     dsk = collections_to_dsk(variables, optimize_graph, **kwargs)
    201     keys = [var._keys() for var in variables]
--> 202     results = get(dsk, keys, **kwargs)
    203
    204     results_iter = iter(results)

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, **kwargs)
   1523         return sync(self.loop, self._get, dsk, keys, restrictions=restrictions,
   1524                     loose_restrictions=loose_restrictions,
-> 1525                     resources=resources)
   1526
   1527     def _optimize_insert_futures(self, dsk, keys):

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/distributed/utils.py in sync(loop, func, *args, **kwargs)
    200     loop.add_callback(f)
    201     while not e.is_set():
--> 202         e.wait(1000000)
    203     if error[0]:
    204         six.reraise(*error[0])

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/threading.py in wait(self, timeout)
    547             signaled = self._flag
    548             if not signaled:
--> 549                 signaled = self._cond.wait(timeout)
    550             return signaled
    551

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/threading.py in wait(self, timeout)
    295         else:
    296             if timeout > 0:
--> 297                 gotit = waiter.acquire(True, timeout)
    298             else:
    299                 gotit = waiter.acquire(False)

KeyboardInterrupt:
```

I think the issue you are referring to is also mine (#1385).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  selecting a point from an mfdataset 225774140

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
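
For reference, the filter described at the top of this page ("5 rows where issue = 225774140 and user = 1197350 sorted by updated_at descending") corresponds to a query like the sketch below; the database file name github.db is a placeholder for the SQLite file behind this Datasette instance.

```python
# Sketch of the query behind this page: comments on issue 225774140 by
# user 1197350, newest update first. "github.db" is a placeholder name.
import sqlite3

conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, created_at, updated_at, author_association, body
    FROM issue_comments
    WHERE [issue] = 225774140 AND [user] = 1197350
    ORDER BY updated_at DESC
    """
).fetchall()

for comment_id, created, updated, assoc, body in rows:
    first_line = body.splitlines()[0] if body else ""
    print(comment_id, updated, first_line)
```
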