issue_comments
18 rows where issue = 304589831 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
382487555 | https://github.com/pydata/xarray/pull/1983#issuecomment-382487555 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MjQ4NzU1NQ== | jhamman 2443309 | 2018-04-18T18:38:47Z | 2018-04-18T18:38:47Z | MEMBER | With my last commits here, this feature is completely optional and defaults to the current behavior. I cleaned up the tests a bit further and am now ready to merge this. Barring any objections, I'll merge this on Friday. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
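For context, the opt-in behavior described in the comment above comes down to a single keyword argument. A minimal usage sketch, assuming `files` is a list of NetCDF paths (the filenames here are illustrative, not from the thread):

```python
import xarray as xr

# Hypothetical input files; parallel=False (the default) keeps the old
# serial-open behavior, so parallel opening is strictly opt-in.
files = ['data_2017.nc', 'data_2018.nc']
ds = xr.open_mfdataset(files, concat_dim='time', parallel=True)
```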
382157273 | https://github.com/pydata/xarray/pull/1983#issuecomment-382157273 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MjE1NzI3Mw== | jhamman 2443309 | 2018-04-17T21:41:03Z | 2018-04-17T21:41:03Z | MEMBER | I think that makes sense for now. We need to experiment with this a bit more but I don't see a problem merging the basic workflow we have now (with a minor change to the default behavior). |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
382154051 | https://github.com/pydata/xarray/pull/1983#issuecomment-382154051 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MjE1NDA1MQ== | shoyer 1217238 | 2018-04-17T21:30:53Z | 2018-04-17T21:30:53Z | MEMBER | It sounds like the right resolution for now would be to leave the default as |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
382146851 | https://github.com/pydata/xarray/pull/1983#issuecomment-382146851 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MjE0Njg1MQ== | jhamman 2443309 | 2018-04-17T21:08:29Z | 2018-04-17T21:08:29Z | MEMBER | @NicWayand - Thanks for giving this a go. Some thoughts on your problem... I have been using this feature for the past few days and have been seeing a speedup on datasets with many files, along the lines of what I showed above. I am applying my tests on perhaps the perfect test architecture (parallel shared fs, fast interconnect, etc.). I think there are many reasons/cases where this won't work as well. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
382071801 | https://github.com/pydata/xarray/pull/1983#issuecomment-382071801 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MjA3MTgwMQ== | NicWayand 1117224 | 2018-04-17T17:14:33Z | 2018-04-17T17:38:42Z | NONE | Thanks @jhamman for working on this! I did a test on my real-world data (1202 ~3 MB files) on my local computer and am not getting the results I expected: 1) no speedup with parallel=True, and 2) a slowdown when using distributed (processes=16 cores=16). Am I missing something?

```python
nc_files = glob.glob(E.obs['NSIDC_0081']['sipn_nc']+'/*.nc')
print(len(nc_files))
1202

# Parallel False
%time ds = xr.open_mfdataset(nc_files, concat_dim='time', parallel=False, autoclose=True)
CPU times: user 57.8 s, sys: 3.2 s, total: 1min 1s
Wall time: 1min

# Parallel True with default scheduler
%time ds = xr.open_mfdataset(nc_files, concat_dim='time', parallel=True, autoclose=True)
CPU times: user 1min 16s, sys: 9.82 s, total: 1min 26s
Wall time: 1min 16s

# Parallel True with distributed
from dask.distributed import Client
client = Client()
print(client)
<Client: scheduler='tcp://127.0.0.1:43291' processes=16 cores=16>
%time ds = xr.open_mfdataset(nc_files, concat_dim='time', parallel=True, autoclose=True)
CPU times: user 2min 17s, sys: 12.3 s, total: 2min 29s
Wall time: 3min 48s
```

On feature/parallel_open_netcdf commit 280a46f13426a462fb3e983cfd5ac7a0565d1826 |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
381277673 | https://github.com/pydata/xarray/pull/1983#issuecomment-381277673 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MTI3NzY3Mw== | jhamman 2443309 | 2018-04-13T22:42:59Z | 2018-04-13T22:42:59Z | MEMBER | @rabernat - I got the tests passing here again. If you can make the time to try your example/test again, it would be great to figure out what wasn't working before. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
380257320 | https://github.com/pydata/xarray/pull/1983#issuecomment-380257320 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MDI1NzMyMA== | jhamman 2443309 | 2018-04-10T21:44:28Z | 2018-04-10T21:45:02Z | MEMBER | @rabernat - I just pushed a few more commits here. Can I ask two questions?
- When using the distributed scheduler, what configuration are you using? Can you try:
- If this turns out to be a corner case with the distributed scheduler, I can add an integration test for that specific use case. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
380150362 | https://github.com/pydata/xarray/pull/1983#issuecomment-380150362 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MDE1MDM2Mg== | jhamman 2443309 | 2018-04-10T15:49:06Z | 2018-04-10T15:49:06Z | MEMBER | @rabernat - my last commit(s) seem to have broken the CI so I'll need to revisit this. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
380121937 | https://github.com/pydata/xarray/pull/1983#issuecomment-380121937 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MDEyMTkzNw== | rabernat 1197350 | 2018-04-10T14:32:25Z | 2018-04-10T14:32:25Z | MEMBER | I recently tried this branch with my data server and got an error. I opened a dataset this way

```python
# works fine with parallel=False
ds = xr.open_mfdataset(os.path.join(ddir, 'V1_1.204.nc'), decode_cf=False, parallel=True)
```

and got the following error.
Without the distributed scheduler (but with

Any idea what could be going on? (Sorry for the non-reproducible bug report...I figured some trials "in the field" might be useful.) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
379323343 | https://github.com/pydata/xarray/pull/1983#issuecomment-379323343 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3OTMyMzM0Mw== | jhamman 2443309 | 2018-04-06T17:33:45Z | 2018-04-06T17:33:45Z | MEMBER | All the tests are passing here? Any final objectors? |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
379306351 | https://github.com/pydata/xarray/pull/1983#issuecomment-379306351 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3OTMwNjM1MQ== | jhamman 2443309 | 2018-04-06T16:29:15Z | 2018-04-06T16:29:15Z | MEMBER | I imagine there will be a small performance cost when the number of files is small. That cost is probably lost in the noise in most I/O operations. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
379305062 | https://github.com/pydata/xarray/pull/1983#issuecomment-379305062 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3OTMwNTA2Mg== | rabernat 1197350 | 2018-04-06T16:24:22Z | 2018-04-06T16:24:22Z | MEMBER | Can we imagine cases where it might actually degrade performance? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
379304351 | https://github.com/pydata/xarray/pull/1983#issuecomment-379304351 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3OTMwNDM1MQ== | shoyer 1217238 | 2018-04-06T16:21:51Z | 2018-04-06T16:21:51Z | MEMBER | My reason for suggesting default |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
379303753 | https://github.com/pydata/xarray/pull/1983#issuecomment-379303753 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3OTMwMzc1Mw== | jhamman 2443309 | 2018-04-06T16:19:35Z | 2018-04-06T16:19:35Z | MEMBER |
I'm not tied to the behavior. It was suggested by @shoyer a while back. Perhaps we try this and evaluate how it works in the wild? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
376689828 | https://github.com/pydata/xarray/pull/1983#issuecomment-376689828 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3NjY4OTgyOA== | jhamman 2443309 | 2018-03-27T21:59:35Z | 2018-03-27T21:59:35Z | MEMBER |
I have. See below for a simple example using this feature on Cheyenne.

```python
In [1]: import xarray as xr
   ...: import glob

In [2]: pattern = '/glade/u/home/jhamman/workdir/LOCA_daily/met_data/CESM1-BGC/16th/rcp45/r1i1p1/*/*.nc'

In [3]: len(glob.glob(pattern))
Out[3]: 285

In [4]: %time ds = xr.open_mfdataset(pattern)
CPU times: user 15.5 s, sys: 2.62 s, total: 18.1 s
Wall time: 42.4 s

In [5]: ds.close()

In [6]: %time ds = xr.open_mfdataset(pattern, parallel=True)
CPU times: user 18.4 s, sys: 5.28 s, total: 23.6 s
Wall time: 30.7 s

In [7]: ds.close()

In [8]: from dask.distributed import Client

In [9]: client = Client()

In [10]: client
Out[10]: <Client: scheduler='tcp://127.0.0.1:39853' processes=72 cores=72>

In [11]: %time ds = xr.open_mfdataset(pattern, parallel=True, autoclose=True)
CPU times: user 10.8 s, sys: 808 ms, total: 11.6 s
Wall time: 12.4 s
``` |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
375799794 | https://github.com/pydata/xarray/pull/1983#issuecomment-375799794 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3NTc5OTc5NA== | jhamman 2443309 | 2018-03-23T21:12:33Z | 2018-03-23T21:12:33Z | MEMBER |
I've skipped the offending test on AppVeyor for now. Objectors, speak up please. I don't have a Windows machine to test on, and iterating via AppVeyor is not something a sane person does 😉. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
373245814 | https://github.com/pydata/xarray/pull/1983#issuecomment-373245814 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3MzI0NTgxNA== | jhamman 2443309 | 2018-03-15T03:05:08Z | 2018-03-15T03:05:08Z | MEMBER | If anyone understands Windows file handling with Python, I'm all ears as to why this is failing on AppVeyor. I'm tempted to just skip this test there but thought I should ask for help first... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
372807932 | https://github.com/pydata/xarray/pull/1983#issuecomment-372807932 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3MjgwNzkzMg== | jhamman 2443309 | 2018-03-13T20:30:49Z | 2018-03-13T20:30:49Z | MEMBER | @shoyer - I updated this to use dask.delayed. I actually like it more because I only have to call compute once. Thanks for the suggestion. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 |
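The dask.delayed pattern jhamman describes above (each file open wrapped in a delayed call, everything triggered by a single compute) might look roughly like this. It is a minimal sketch of that description, not the PR's actual code; the function name `open_parallel` and its arguments are illustrative:

```python
import dask
import xarray as xr

def open_parallel(paths, concat_dim='time', **open_kwargs):
    # Each open_dataset call becomes a lazy dask.delayed task...
    delayed_open = dask.delayed(xr.open_dataset)
    tasks = [delayed_open(path, **open_kwargs) for path in paths]
    # ...and a single compute() runs all the opens through whichever
    # scheduler is active (threaded by default, distributed if a
    # Client has been created).
    datasets = dask.compute(*tasks)
    return xr.concat(datasets, dim=concat_dim)
```

Calling compute once, rather than once per file, is the simplification the comment refers to.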
```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```
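Given that schema, the query behind this page ("18 rows where issue = 304589831 sorted by updated_at descending") can be reproduced against a local copy of the database. A sketch using Python's sqlite3 module; the filename `github.db` is an assumption:

```python
import sqlite3

# 'github.db' stands in for a local copy of this Datasette database.
conn = sqlite3.connect('github.db')
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, author_association, body
    FROM issue_comments
    WHERE issue = 304589831
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # expect 18, per the page header
```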