issue_comments

5 rows where issue = 302930480, sorted by updated_at descending

Facets:
  • user (4): jhamman (2), mrocklin (1), shoyer (1), Karel-van-de-Plassche (1)
  • author_association (2): MEMBER (4), CONTRIBUTOR (1)
  • issue (1): Should we be testing against multiple dask schedulers? (5)

id: 453865008 (node_id MDEyOklzc3VlQ29tbWVudDQ1Mzg2NTAwOA==)
user: jhamman (2443309) · author_association: MEMBER
created_at: 2019-01-13T20:58:20Z · updated_at: 2019-01-13T20:58:20Z
html_url: https://github.com/pydata/xarray/issues/1971#issuecomment-453865008
issue_url: https://api.github.com/repos/pydata/xarray/issues/1971

Closing this now. The distributed integration test module seems to be covering our IO use cases well enough. I don't think we need to do anything here at this time.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Should we be testing against multiple dask schedulers? (302930480)

id: 392572591 (node_id MDEyOklzc3VlQ29tbWVudDM5MjU3MjU5MQ==)
user: Karel-van-de-Plassche (6404167) · author_association: CONTRIBUTOR
created_at: 2018-05-28T17:12:51Z · updated_at: 2018-05-28T17:13:56Z
html_url: https://github.com/pydata/xarray/issues/1971#issuecomment-392572591
issue_url: https://api.github.com/repos/pydata/xarray/issues/1971

It seems the distributed scheduler is the advised one to use in general, so maybe some tests could be added for it. Especially for disk IO, it would be interesting to see the difference.

http://dask.pydata.org/en/latest/setup.html

> Note that the newer dask.distributed scheduler is often preferable even on single workstations. It contains many diagnostics and features not found in the older single-machine scheduler. The following pages explain in more detail how to set up Dask on a variety of local and distributed hardware.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Should we be testing against multiple dask schedulers? (302930480)
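
A minimal sketch of the kind of distributed-scheduler disk-IO test this comment suggests; the cluster settings, file handling, and test name below are illustrative assumptions, not xarray's actual test suite:

```python
import os
import tempfile

import numpy as np
import xarray as xr
from distributed import Client


def test_netcdf_roundtrip_with_distributed():
    # An in-process cluster keeps the test lightweight; while the Client
    # is active, it is dask's default scheduler.
    with Client(processes=False, n_workers=1, threads_per_worker=2):
        ds = xr.Dataset({"x": ("t", np.arange(100))}).chunk({"t": 10})
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "roundtrip.nc")
            ds.to_netcdf(path)  # the write runs on the distributed scheduler
            with xr.open_dataset(path, chunks={"t": 10}) as loaded:
                xr.testing.assert_identical(ds.compute(), loaded.compute())
```
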
id: 371462262 (node_id MDEyOklzc3VlQ29tbWVudDM3MTQ2MjI2Mg==)
user: mrocklin (306380) · author_association: MEMBER
created_at: 2018-03-08T11:35:25Z · updated_at: 2018-03-08T11:35:25Z
html_url: https://github.com/pydata/xarray/issues/1971#issuecomment-371462262
issue_url: https://api.github.com/repos/pydata/xarray/issues/1971

FWIW most of the logic within the dask collections (array, dataframe, delayed) is only tested with `dask.local.get_sync`. This also makes the test suite much faster.

Obviously though for things like writing to disk it's useful to check different schedulers.

reactions:
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Should we be testing against multiple dask schedulers? (302930480)
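
For reference, a small sketch of pinning dask to the single-threaded scheduler described above; note that current dask releases spell this through `dask.config` rather than `dask.local.get_sync`:

```python
# Sketch: force dask's synchronous scheduler globally, as the dask test
# suite does for most collection logic (per the comment above).
import dask
import dask.array as da

# No threads or processes are involved, so tracebacks and debuggers
# behave like ordinary Python code, and tests run faster.
dask.config.set(scheduler="synchronous")

x = da.arange(1_000, chunks=100)
assert x.sum().compute() == 499_500  # sum of 0..999, computed eagerly
```
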
id: 371334589 (node_id MDEyOklzc3VlQ29tbWVudDM3MTMzNDU4OQ==)
user: jhamman (2443309) · author_association: MEMBER
created_at: 2018-03-08T00:27:52Z · updated_at: 2018-03-08T00:27:52Z
html_url: https://github.com/pydata/xarray/issues/1971#issuecomment-371334589
issue_url: https://api.github.com/repos/pydata/xarray/issues/1971

I managed to dig up some more information here. I was having a test failure in `test_serializable_locks`, resulting in a traceback that looks like:

```
...
        timeout_handle = self.add_timeout(self.time() + timeout, self.stop)
        self.start()
        if timeout is not None:
            self.remove_timeout(timeout_handle)
        if not future_cell[0].done():
            raise TimeoutError('Operation timed out after %s seconds' % timeout)
E   tornado.ioloop.TimeoutError: Operation timed out after 10 seconds

../../../anaconda/envs/xarray36/lib/python3.6/site-packages/tornado/ioloop.py:457: TimeoutError
```

From then on we were using the distributed scheduler, and any test that used dask resulted in an additional timeout (or a similar error).

Unfortunately, my attempts to produce an MCVE have come up short. If I can come up with one, I'll report it upstream, but as it is, I can't reproduce this behavior outside of my example.

cc @mrocklin

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Should we be testing against multiple dask schedulers? (302930480)
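
One way to keep a distributed client from leaking into later tests, as the failure above suggests happened, is to scope it to a fixture so it is torn down even when a test fails. This is a hedged sketch; the fixture name and `Client` arguments are assumptions, not xarray's code:

```python
import pytest
from distributed import Client


@pytest.fixture
def isolated_client():
    # Context-managing the Client guarantees it is closed (and stops
    # being dask's default scheduler) even if the test body raises.
    with Client(processes=False, dashboard_address=None) as client:
        yield client


def test_uses_cluster(isolated_client):
    assert isolated_client.submit(sum, [1, 2, 3]).result() == 6
```
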
id: 371004338 (node_id MDEyOklzc3VlQ29tbWVudDM3MTAwNDMzOA==)
user: shoyer (1217238) · author_association: MEMBER
created_at: 2018-03-07T02:48:16Z · updated_at: 2018-03-07T02:48:16Z
html_url: https://github.com/pydata/xarray/issues/1971#issuecomment-371004338
issue_url: https://api.github.com/repos/pydata/xarray/issues/1971

Huh, that's interesting. Yes, I suppose we should at least consider parametric tests using both dask's multithreaded and distributed schedulers. Though I'll note that for tests we actually set the default scheduler to dask's basic non-parallelized get, for easier debugging: https://github.com/pydata/xarray/blob/54468e1924174a03e7ead3be8545f687f084f4dd/xarray/tests/__init__.py#L87

For #1793, the key thing would be to ensure that we run the tests in an isolated context without changing the default scheduler.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Should we be testing against multiple dask schedulers? (302930480)
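
A rough sketch of the parametric-scheduler idea from this comment; the test and parameter names are assumptions, and the scoped `dask.config.set` context manager restores the default scheduler on exit, which speaks to the isolation concern raised for #1793:

```python
import pytest
import dask
import dask.array as da


@pytest.mark.parametrize("scheduler", ["synchronous", "threads", "processes"])
def test_sum_under_each_scheduler(scheduler):
    # Scoped config: the default scheduler is restored when the block
    # exits, so nothing leaks into other tests.
    with dask.config.set(scheduler=scheduler):
        x = da.ones(1_000, chunks=100)
        assert float(x.sum().compute()) == 1_000.0
```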

Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
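
Given this schema, the page's query (5 rows where issue = 302930480, sorted by updated_at descending) can be reproduced directly against the underlying SQLite file; the database filename below is an assumption:

```python
import sqlite3

# Hypothetical filename for the SQLite database behind this Datasette page.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, user, created_at, author_association
    FROM issue_comments
    WHERE issue = ?
    ORDER BY updated_at DESC
    """,
    (302930480,),
).fetchall()
for row in rows:
    print(row)
```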