html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1301#issuecomment-291512017,https://api.github.com/repos/pydata/xarray/issues/1301,291512017,MDEyOklzc3VlQ29tbWVudDI5MTUxMjAxNw==,1360241,2017-04-04T14:11:08Z,2017-04-04T14:11:08Z,NONE,@rabernat This data is computed on demand from the OOI (http://oceanobservatories.org/cyberinfrastructure-technology/). Datasets can be massive so they seem to be split up into ~500 MB files when the data gets too big. That is why obs changes for each file. Would having obs be consistent across all files potentially make open_mfdataset faster?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,212561278
https://github.com/pydata/xarray/issues/1301#issuecomment-286212647,https://api.github.com/repos/pydata/xarray/issues/1301,286212647,MDEyOklzc3VlQ29tbWVudDI4NjIxMjY0Nw==,1360241,2017-03-13T19:12:13Z,2017-03-13T19:12:13Z,NONE,"Data: Five files that are approximately 450 MB each.
venv1:
dask 0.13.0 py27_0 conda-forge
xarray 0.8.2 py27_0 conda-forge
1.51642394066 seconds to load using open_mfdataset
venv2:
dask 0.13.0 py27_0 conda-forge
xarray 0.9.1 py27_0 conda-forge
279.011202097 seconds to load using open_mfdataset
I ran the same code from the OP in two conda envs with the same version of dask but two different versions of xarray. There was a significant difference in load time between the two: about 1.5 seconds with xarray 0.8.2 versus about 279 seconds with xarray 0.9.1.
I've posted the data on my work site if anyone wants to double check: https://marine.rutgers.edu/~michaesm/netcdf/data/
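For anyone double checking, here is a minimal sketch of how a load time like the ones above could be measured. It is not the code from the OP: the helper name time_call and the glob pattern in the usage comment are hypothetical, and timeit.default_timer is used only because it exists on both Python 2.7 and Python 3.

```python
from timeit import default_timer


def time_call(func, *args, **kwargs):
    # Run func once and return (result, elapsed wall-clock seconds).
    start = default_timer()
    result = func(*args, **kwargs)
    elapsed = default_timer() - start
    return result, elapsed


# Hypothetical usage, assuming xarray is installed and the pattern
# matches the downloaded files (the path is a placeholder):
#   import xarray as xr
#   ds, seconds = time_call(xr.open_mfdataset, 'data/*.nc')
#   print(xr.__version__, seconds, 'seconds to load using open_mfdataset')
```

Running the same script in each conda env and printing the xarray version next to the elapsed time keeps the two measurements comparable.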
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,212561278