issue_comments


13 rows where issue = 304589831 and user = 2443309 sorted by updated_at descending




id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
382487555 https://github.com/pydata/xarray/pull/1983#issuecomment-382487555 https://api.github.com/repos/pydata/xarray/issues/1983 MDEyOklzc3VlQ29tbWVudDM4MjQ4NzU1NQ== jhamman 2443309 2018-04-18T18:38:47Z 2018-04-18T18:38:47Z MEMBER

With my last commits here, this feature is completely optional and defaults to the current behavior. I cleaned up the tests a bit further and am now ready to merge this. Barring any objections, I'll merge this on Friday.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Parallel open_mfdataset 304589831
382157273 https://github.com/pydata/xarray/pull/1983#issuecomment-382157273 https://api.github.com/repos/pydata/xarray/issues/1983 MDEyOklzc3VlQ29tbWVudDM4MjE1NzI3Mw== jhamman 2443309 2018-04-17T21:41:03Z 2018-04-17T21:41:03Z MEMBER

I think that makes sense for now. We need to experiment with this a bit more but I don't see a problem merging the basic workflow we have now (with a minor change to the default behavior).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Parallel open_mfdataset 304589831
382146851 https://github.com/pydata/xarray/pull/1983#issuecomment-382146851 https://api.github.com/repos/pydata/xarray/issues/1983 MDEyOklzc3VlQ29tbWVudDM4MjE0Njg1MQ== jhamman 2443309 2018-04-17T21:08:29Z 2018-04-17T21:08:29Z MEMBER

@NicWayand - Thanks for giving this a go. Some thoughts on your problem...

I have been using this feature for the past few days and have been seeing a speedup on datasets with many files along the lines of what I showed above. I am running my tests on perhaps the perfect test architecture (parallel shared fs, fast interconnect, etc.). I think there are many reasons/cases where this won't work as well.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Parallel open_mfdataset 304589831
381277673 https://github.com/pydata/xarray/pull/1983#issuecomment-381277673 https://api.github.com/repos/pydata/xarray/issues/1983 MDEyOklzc3VlQ29tbWVudDM4MTI3NzY3Mw== jhamman 2443309 2018-04-13T22:42:59Z 2018-04-13T22:42:59Z MEMBER

@rabernat - I got the tests passing here again. If you can make the time to try your example/test again, it would be great to figure out what wasn't working before.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Parallel open_mfdataset 304589831
380257320 https://github.com/pydata/xarray/pull/1983#issuecomment-380257320 https://api.github.com/repos/pydata/xarray/issues/1983 MDEyOklzc3VlQ29tbWVudDM4MDI1NzMyMA== jhamman 2443309 2018-04-10T21:44:28Z 2018-04-10T21:45:02Z MEMBER

@rabernat - I just pushed a few more commits here. Can I ask two questions:

When using the distributed scheduler, what configuration are you using? Can you try:

- autoclose=True (in open_mfdataset)
- processes=True (in client)

If this turns out to be a corner case with the distributed scheduler, I can add an integration test for that specific use case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Parallel open_mfdataset 304589831
380150362 https://github.com/pydata/xarray/pull/1983#issuecomment-380150362 https://api.github.com/repos/pydata/xarray/issues/1983 MDEyOklzc3VlQ29tbWVudDM4MDE1MDM2Mg== jhamman 2443309 2018-04-10T15:49:06Z 2018-04-10T15:49:06Z MEMBER

@rabernat - my last commit(s) seem to have broken the CI so I'll need to revisit this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Parallel open_mfdataset 304589831
379323343 https://github.com/pydata/xarray/pull/1983#issuecomment-379323343 https://api.github.com/repos/pydata/xarray/issues/1983 MDEyOklzc3VlQ29tbWVudDM3OTMyMzM0Mw== jhamman 2443309 2018-04-06T17:33:45Z 2018-04-06T17:33:45Z MEMBER

All the tests are passing here? Any final objectors?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Parallel open_mfdataset 304589831
379306351 https://github.com/pydata/xarray/pull/1983#issuecomment-379306351 https://api.github.com/repos/pydata/xarray/issues/1983 MDEyOklzc3VlQ29tbWVudDM3OTMwNjM1MQ== jhamman 2443309 2018-04-06T16:29:15Z 2018-04-06T16:29:15Z MEMBER

I imagine there will be a small performance cost when the number of files is small. That cost is probably lost in the noise of most I/O operations.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Parallel open_mfdataset 304589831
379303753 https://github.com/pydata/xarray/pull/1983#issuecomment-379303753 https://api.github.com/repos/pydata/xarray/issues/1983 MDEyOklzc3VlQ29tbWVudDM3OTMwMzc1Mw== jhamman 2443309 2018-04-06T16:19:35Z 2018-04-06T16:19:35Z MEMBER

> I'm curious about the logic of defaulting to parallel when using distributed.

I'm not tied to the behavior. It was suggested by @shoyer a while back. Perhaps we try this and evaluate how it works in the wild?
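One way such a default could work is to probe for an active distributed client. This is only a hypothetical sketch of that idea (the function name is invented, not xarray's actual logic), assuming dask.distributed's `default_client` helper, which raises `ValueError` when no client is running:

```python
def choose_parallel_default():
    """Illustrative only: default to parallel opens when a dask.distributed
    client is active, and fall back to serial opens otherwise."""
    try:
        from dask.distributed import default_client
        default_client()   # raises ValueError if no client has been created
        return True
    except (ImportError, ValueError):
        return False
```

With no client running (or dask.distributed absent), this returns False, matching the current serial default.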

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Parallel open_mfdataset 304589831
376689828 https://github.com/pydata/xarray/pull/1983#issuecomment-376689828 https://api.github.com/repos/pydata/xarray/issues/1983 MDEyOklzc3VlQ29tbWVudDM3NjY4OTgyOA== jhamman 2443309 2018-03-27T21:59:35Z 2018-03-27T21:59:35Z MEMBER

> Have you tested this with both a local system and an HPC cluster?

I have. See below for a simple example using this feature on Cheyenne.

```python
In [1]: import xarray as xr
   ...: import glob

In [2]: pattern = '/glade/u/home/jhamman/workdir/LOCA_daily/met_data/CESM1-BGC/16th/rcp45/r1i1p1/*/*.nc'

In [3]: len(glob.glob(pattern))
Out[3]: 285

In [4]: %time ds = xr.open_mfdataset(pattern)
CPU times: user 15.5 s, sys: 2.62 s, total: 18.1 s
Wall time: 42.4 s

In [5]: ds.close()

In [6]: %time ds = xr.open_mfdataset(pattern, parallel=True)
CPU times: user 18.4 s, sys: 5.28 s, total: 23.6 s
Wall time: 30.7 s

In [7]: ds.close()

In [8]: from dask.distributed import Client

In [9]: client = Client()

In [10]: client
Out[10]: <Client: scheduler='tcp://127.0.0.1:39853' processes=72 cores=72>

In [11]: %time ds = xr.open_mfdataset(pattern, parallel=True, autoclose=True)
CPU times: user 10.8 s, sys: 808 ms, total: 11.6 s
Wall time: 12.4 s
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Parallel open_mfdataset 304589831
375799794 https://github.com/pydata/xarray/pull/1983#issuecomment-375799794 https://api.github.com/repos/pydata/xarray/issues/1983 MDEyOklzc3VlQ29tbWVudDM3NTc5OTc5NA== jhamman 2443309 2018-03-23T21:12:33Z 2018-03-23T21:12:33Z MEMBER

> I'm tempted to just skip this test there but thought I should ask for help first...

I've skipped the offending test on appveyor for now. Objectors speak up please. I don't have a windows machine to test on and iterating via appveyor is not something a sane person does 😉.
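A platform-conditional skip like the one described can be sketched with the standard library's `unittest` (the real xarray suite uses pytest's `skipif` markers, and the test body here is a placeholder, not the actual failing test):

```python
import sys
import unittest

class TestParallelOpen(unittest.TestCase):
    # Skip on Windows, mirroring the AppVeyor-only skip described above.
    @unittest.skipIf(sys.platform.startswith('win'),
                     'Windows file handling differs; see AppVeyor failures')
    def test_open(self):
        self.assertTrue(True)  # placeholder for the real parallel-open test
```

On non-Windows platforms the test runs normally; on Windows the runner records it as skipped rather than failed.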

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Parallel open_mfdataset 304589831
373245814 https://github.com/pydata/xarray/pull/1983#issuecomment-373245814 https://api.github.com/repos/pydata/xarray/issues/1983 MDEyOklzc3VlQ29tbWVudDM3MzI0NTgxNA== jhamman 2443309 2018-03-15T03:05:08Z 2018-03-15T03:05:08Z MEMBER

If anyone understands Windows file handling with Python, I'm all ears as to why this is failing on AppVeyor. I'm tempted to just skip this test there but thought I should ask for help first...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Parallel open_mfdataset 304589831
372807932 https://github.com/pydata/xarray/pull/1983#issuecomment-372807932 https://api.github.com/repos/pydata/xarray/issues/1983 MDEyOklzc3VlQ29tbWVudDM3MjgwNzkzMg== jhamman 2443309 2018-03-13T20:30:49Z 2018-03-13T20:30:49Z MEMBER

@shoyer - I updated this to use dask.delayed. I actually like it more because I only have to call compute once. Thanks for the suggestion.
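The appeal of that pattern, building up deferred opens and then triggering them all with a single compute, can be sketched with the standard library as a stand-in for dask.delayed (the `open_one` helper and file names are illustrative, not xarray code):

```python
from concurrent.futures import ThreadPoolExecutor

def open_one(path):
    # Stand-in for opening a single dataset; here it just returns the path length.
    return len(path)

paths = ['a.nc', 'bb.nc', 'ccc.nc']

with ThreadPoolExecutor() as pool:
    # Build all the deferred opens up front (analogous to dask.delayed tasks)...
    futures = [pool.submit(open_one, p) for p in paths]
    # ...then gather every result in one step (the lone "compute" call).
    results = [f.result() for f in futures]

print(results)  # [4, 5, 6]
```

The single gathering step is what lets the scheduler see all the opens at once and run them concurrently.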

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Parallel open_mfdataset 304589831


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 107.35ms · About: xarray-datasette