html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/pull/1983#issuecomment-382487555,https://api.github.com/repos/pydata/xarray/issues/1983,382487555,MDEyOklzc3VlQ29tbWVudDM4MjQ4NzU1NQ==,2443309,2018-04-18T18:38:47Z,2018-04-18T18:38:47Z,MEMBER,"With my last commits here, this feature is completely optional and defaults to the current behavior. I cleaned up the tests a bit further and am now ready to merge this. Barring any objections, I'll merge this on Friday. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,304589831 https://github.com/pydata/xarray/pull/1983#issuecomment-382157273,https://api.github.com/repos/pydata/xarray/issues/1983,382157273,MDEyOklzc3VlQ29tbWVudDM4MjE1NzI3Mw==,2443309,2018-04-17T21:41:03Z,2018-04-17T21:41:03Z,MEMBER,I think that makes sense for now. We need to experiment with this a bit more but I don't see a problem merging the basic workflow we have now (with a minor change to the default behavior). ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,304589831 https://github.com/pydata/xarray/pull/1983#issuecomment-382146851,https://api.github.com/repos/pydata/xarray/issues/1983,382146851,MDEyOklzc3VlQ29tbWVudDM4MjE0Njg1MQ==,2443309,2018-04-17T21:08:29Z,2018-04-17T21:08:29Z,MEMBER,"@NicWayand - Thanks for giving this a go. Some thoughts on your problem... I have been using this feature for the past few days and have been seeing a speedup on datasets with many files along the lines of what I showed above. I am applying my tests on perhaps the perfect test architecture (parallel shared fs, fast interconnect, etc.). I think there are many reasons/cases where this won't work as well. 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,304589831 https://github.com/pydata/xarray/pull/1983#issuecomment-381277673,https://api.github.com/repos/pydata/xarray/issues/1983,381277673,MDEyOklzc3VlQ29tbWVudDM4MTI3NzY3Mw==,2443309,2018-04-13T22:42:59Z,2018-04-13T22:42:59Z,MEMBER,"@rabernat - I got the tests passing here again. If you can make the time to try your example/test again, it would be great to figure out what wasn't working before. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,304589831 https://github.com/pydata/xarray/pull/1983#issuecomment-380257320,https://api.github.com/repos/pydata/xarray/issues/1983,380257320,MDEyOklzc3VlQ29tbWVudDM4MDI1NzMyMA==,2443309,2018-04-10T21:44:28Z,2018-04-10T21:45:02Z,MEMBER,"@rabernat - I just pushed a few more commits here. Can I ask two questions: When using the distributed scheduler, what configuration are you using? 
Can you try:
- `autoclose=True` (in open_mfdataset)
- `processes=True` (in client)

If this turns out to be a corner case with the distributed scheduler, I can add an integration test for that specific use case.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,304589831 https://github.com/pydata/xarray/pull/1983#issuecomment-380150362,https://api.github.com/repos/pydata/xarray/issues/1983,380150362,MDEyOklzc3VlQ29tbWVudDM4MDE1MDM2Mg==,2443309,2018-04-10T15:49:06Z,2018-04-10T15:49:06Z,MEMBER,@rabernat - my last commit(s) seem to have broken the CI so I'll need to revisit this.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,304589831 https://github.com/pydata/xarray/pull/1983#issuecomment-379323343,https://api.github.com/repos/pydata/xarray/issues/1983,379323343,MDEyOklzc3VlQ29tbWVudDM3OTMyMzM0Mw==,2443309,2018-04-06T17:33:45Z,2018-04-06T17:33:45Z,MEMBER,All the tests are passing here? Any final objectors?,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,304589831 https://github.com/pydata/xarray/pull/1983#issuecomment-379306351,https://api.github.com/repos/pydata/xarray/issues/1983,379306351,MDEyOklzc3VlQ29tbWVudDM3OTMwNjM1MQ==,2443309,2018-04-06T16:29:15Z,2018-04-06T16:29:15Z,MEMBER,I imagine there will be a small performance cost when the number of files is small. That cost is probably lost in the noise in most i/o operations. 
,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,304589831 https://github.com/pydata/xarray/pull/1983#issuecomment-379303753,https://api.github.com/repos/pydata/xarray/issues/1983,379303753,MDEyOklzc3VlQ29tbWVudDM3OTMwMzc1Mw==,2443309,2018-04-06T16:19:35Z,2018-04-06T16:19:35Z,MEMBER,"> I'm curious about the logic of defaulting to parallel when using distributed.

I'm not tied to the behavior. It was [suggested](https://github.com/pydata/xarray/pull/1983#discussion_r173990300) by @shoyer a while back. Perhaps we try this and evaluate how it works in the wild? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,304589831 https://github.com/pydata/xarray/pull/1983#issuecomment-376689828,https://api.github.com/repos/pydata/xarray/issues/1983,376689828,MDEyOklzc3VlQ29tbWVudDM3NjY4OTgyOA==,2443309,2018-03-27T21:59:35Z,2018-03-27T21:59:35Z,MEMBER,"> Have you tested this with both a local system and an HPC cluster?

I have. See below for a simple example using this feature on [Cheyenne](https://www2.cisl.ucar.edu/resources/computational-systems/cheyenne). 
```python
In [1]: import xarray as xr
   ...:
   ...: import glob
   ...:
In [2]: pattern = '/glade/u/home/jhamman/workdir/LOCA_daily/met_data/CESM1-BGC/16th/rcp45/r1i1p1/*/*nc'
In [3]: len(glob.glob(pattern))
Out[3]: 285
In [4]: %time ds = xr.open_mfdataset(pattern)
CPU times: user 15.5 s, sys: 2.62 s, total: 18.1 s
Wall time: 42.4 s
In [5]: ds.close()
In [6]: %time ds = xr.open_mfdataset(pattern, parallel=True)
CPU times: user 18.4 s, sys: 5.28 s, total: 23.6 s
Wall time: 30.7 s
In [7]: ds.close()
In [8]: from dask.distributed import Client
In [9]: client = Client()
In [10]: client
Out[10]:
In [11]: %time ds = xr.open_mfdataset(pattern, parallel=True, autoclose=True)
CPU times: user 10.8 s, sys: 808 ms, total: 11.6 s
Wall time: 12.4 s
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,304589831 https://github.com/pydata/xarray/pull/1983#issuecomment-375799794,https://api.github.com/repos/pydata/xarray/issues/1983,375799794,MDEyOklzc3VlQ29tbWVudDM3NTc5OTc5NA==,2443309,2018-03-23T21:12:33Z,2018-03-23T21:12:33Z,MEMBER,"> I'm tempted to just skip this test there but thought I should ask for help first...

I've skipped the offending test on AppVeyor for now. Objectors speak up please. I don't have a Windows machine to test on and iterating via AppVeyor is not something a sane person does 😉. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,304589831 https://github.com/pydata/xarray/pull/1983#issuecomment-373245814,https://api.github.com/repos/pydata/xarray/issues/1983,373245814,MDEyOklzc3VlQ29tbWVudDM3MzI0NTgxNA==,2443309,2018-03-15T03:05:08Z,2018-03-15T03:05:08Z,MEMBER,"If anyone understands Windows file handling with Python, I'm all ears as to why this is failing on AppVeyor. 
I'm tempted to just skip this test there but thought I should ask for help first...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,304589831 https://github.com/pydata/xarray/pull/1983#issuecomment-372807932,https://api.github.com/repos/pydata/xarray/issues/1983,372807932,MDEyOklzc3VlQ29tbWVudDM3MjgwNzkzMg==,2443309,2018-03-13T20:30:49Z,2018-03-13T20:30:49Z,MEMBER,@shoyer - I updated this to use dask.delayed. I actually like it more because I only have to call compute once. Thanks for the suggestion. ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,304589831