html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1385#issuecomment-461561653,https://api.github.com/repos/pydata/xarray/issues/1385,461561653,MDEyOklzc3VlQ29tbWVudDQ2MTU2MTY1Mw==,16655388,2019-02-07T19:22:58Z,2019-02-07T19:22:58Z,NONE,"I just tried and it did not help ...
```
In [5]: run test_ouverture_fichier_nc_vs_xr.py
timing glob: 0.00s
timing netcdf4: 3.36s
timing xarray: 44.82s
timing xarray tune: 14.47s
In [6]: xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 19:04:19)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.2.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: None.None
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.11.3
pandas: 0.24.0
numpy: 1.13.3
scipy: 1.2.0
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 1.0.0
distributed: 1.25.2
matplotlib: 2.2.3
cartopy: None
seaborn: None
setuptools: 40.5.0
pip: 19.0.1
conda: None
pytest: None
IPython: 5.8.0
sphinx: 1.8.2
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/issues/1385#issuecomment-461551320,https://api.github.com/repos/pydata/xarray/issues/1385,461551320,MDEyOklzc3VlQ29tbWVudDQ2MTU1MTMyMA==,16655388,2019-02-07T18:52:53Z,2019-02-07T18:52:53Z,NONE,"I have the same problem: `open_mfdataset` is about 10x slower than `nc.MFDataset`. I used the following code to time opening 456 local netCDF files (532 MB in total) located in a `nc_local` directory:
```
import glob
import time

import netCDF4 as nc
import xarray as xr

# Collect the input files in a deterministic order
clef = 'nc_local/*.nc'
t00 = time.time()
l_fichiers_nc = sorted(glob.glob(clef))
print('timing glob: {:6.2f}s'.format(time.time()-t00))

# netcdf4
t00 = time.time()
ds1 = nc.MFDataset(l_fichiers_nc)
#dates1 = ouralib.netcdf.calcule_dates(ds1)
print('timing netcdf4: {:6.2f}s'.format(time.time()-t00))

# xarray with default options
t00 = time.time()
ds2 = xr.open_mfdataset(l_fichiers_nc)
print('timing xarray: {:6.2f}s'.format(time.time()-t00))

# xarray tune: skip CF decoding per file, then decode once after concatenation
t00 = time.time()
ds3 = xr.open_mfdataset(l_fichiers_nc, decode_cf=False, concat_dim='time')
ds3 = xr.decode_cf(ds3)
print('timing xarray tune: {:6.2f}s'.format(time.time()-t00))
```
The output I get is:
> timing glob: 0.00s
timing netcdf4: 3.80s
timing xarray: 44.60s
timing xarray tune: 15.61s
I ran the tests on a CentOS server with Python 2.7 and 3.6, and on macOS with Python 3.6. The timings change, but the ratios between netCDF4 and xarray are similar.
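
For reference, the ratios for the run shown above can be worked out with a few lines of Python. This is only a sketch: the dictionary copies the timings printed above, and the names `timings`, `baseline` and `ratios` are chosen for illustration.

```python
# Timings (in seconds) copied from the output above.
timings = {
    'netcdf4': 3.80,
    'xarray': 44.60,
    'xarray tune': 15.61,
}

# Slowdown of each xarray variant relative to nc.MFDataset.
baseline = timings['netcdf4']
ratios = {name: t / baseline for name, t in timings.items() if name != 'netcdf4'}

for name in sorted(ratios):
    print('{:12s} {:5.1f}x slower than netcdf4'.format(name, ratios[name]))
```

In this run the default call comes out at roughly 12x and the tuned call at roughly 4x the netCDF4 time.
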
Is there any way of making open_mfdataset go faster?
In case it helps, here is the output of `xr.show_versions()` and `%prun xr.open_mfdataset(l_fichiers_nc)`. I am not familiar with the output of `%prun`, but I noticed that the first two lines of the output differ depending on whether I use Python 2.7 or Python 3.6. I ran those tests on CentOS and macOS with Anaconda environments.
For Python 2.7:
```
13996351 function calls (13773659 primitive calls) in 42.133 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
2664 16.290 0.006 16.290 0.006 {time.sleep}
912 6.330 0.007 6.623 0.007 netCDF4_.py:244(_open_netcdf4_group)
```
For Python 3.6:
```
9663408 function calls (9499759 primitive calls) in 31.934 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
5472 15.140 0.003 15.140 0.003 {method 'acquire' of '_thread.lock' objects}
912 5.661 0.006 5.718 0.006 netCDF4_.py:244(_open_netcdf4_group)
```
Longer output of `%prun` with Python 3.6:
```
9663408 function calls (9499759 primitive calls) in 31.934 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
5472 15.140 0.003 15.140 0.003 {method 'acquire' of '_thread.lock' objects}
912 5.661 0.006 5.718 0.006 netCDF4_.py:244(_open_netcdf4_group)
4104 0.564 0.000 0.757 0.000 {built-in method _operator.getitem}
133152/129960 0.477 0.000 0.660 0.000 indexing.py:496(shape)
1554550/1554153 0.414 0.000 0.711 0.000 {built-in method builtins.isinstance}
912 0.260 0.000 0.260 0.000 {method 'close' of 'netCDF4._netCDF4.Dataset' objects}
6384 0.244 0.000 0.953 0.000 netCDF4_.py:361(open_store_variable)
910 0.241 0.000 0.595 0.001 duck_array_ops.py:141(array_equiv)
20990 0.235 0.000 0.343 0.000 {pandas._libs.lib.is_scalar}
37483/36567 0.228 0.000 0.230 0.000 {built-in method builtins.iter}
93986 0.219 0.000 1.607 0.000 variable.py:239(__init__)
93982 0.194 0.000 0.194 0.000 variable.py:706(attrs)
33744 0.189 0.000 0.189 0.000 {method 'getncattr' of 'netCDF4._netCDF4.Variable' objects}
15511 0.175 0.000 0.638 0.000 core.py:1776(normalize_chunks)
5930 0.162 0.000 0.350 0.000 missing.py:183(_isna_ndarraylike)
297391/296926 0.159 0.000 0.380 0.000 {built-in method builtins.getattr}
134230 0.155 0.000 0.269 0.000 abc.py:180(__instancecheck__)
6384 0.142 0.000 0.199 0.000 netCDF4_.py:34(__init__)
93986 0.126 0.000 0.671 0.000 variable.py:414(_parse_dimensions)
156545 0.119 0.000 0.811 0.000 utils.py:450(ndim)
12768 0.119 0.000 0.203 0.000 core.py:747(blockdims_from_blockshape)
6384 0.117 0.000 2.526 0.000 conventions.py:245(decode_cf_variable)
741183/696380 0.116 0.000 0.134 0.000 {built-in method builtins.len}
41957/23717 0.110 0.000 4.395 0.000 {built-in method numpy.core.multiarray.array}
93978 0.110 0.000 0.110 0.000 variable.py:718(encoding)
219940 0.109 0.000 0.109 0.000 _weakrefset.py:70(__contains__)
99458 0.100 0.000 0.440 0.000 variable.py:137(as_compatible_data)
53882 0.085 0.000 0.095 0.000 core.py:891(shape)
140604 0.084 0.000 0.628 0.000 variable.py:272(shape)
3192 0.084 0.000 0.170 0.000 utils.py:88(_StartCountStride)
10494 0.081 0.000 0.081 0.000 {method 'reduce' of 'numpy.ufunc' objects}
44688 0.077 0.000 0.157 0.000 variables.py:102(unpack_for_decoding)
```
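
For anyone who wants to produce a listing like the ones above outside IPython, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. The names `profile_call` and `stand_in_workload` are hypothetical and only for illustration; the stand-in would be replaced by the real call, for example `xr.open_mfdataset(l_fichiers_nc)`. As far as I understand, sorting by `tottime` corresponds to the 'Ordered by: internal time' header in the `%prun` output.

```python
import cProfile
import io
import pstats
import time


def profile_call(func, *args, **kwargs):
    # Run func under cProfile and return (result, report_text).
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by time spent inside each function itself and keep the top 10 rows.
    stats.strip_dirs().sort_stats('tottime').print_stats(10)
    return result, buffer.getvalue()


def stand_in_workload():
    # Hypothetical stand-in for the real call being profiled.
    time.sleep(0.05)
    return sum(i * i for i in range(10000))


result, report = profile_call(stand_in_workload)
print(report)
```

The `tottime` column counts only the time spent in the function itself, while `cumtime` also includes everything it calls, so a large `tottime` on a sleep or lock entry means the time was spent waiting rather than computing.
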
Output of `xr.show_versions()`:
```
xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.2.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
xarray: 0.11.0
pandas: 0.24.1
numpy: 1.15.4
scipy: None
netCDF4: 1.4.2
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: None
cyordereddict: None
dask: 1.1.1
distributed: 1.25.3
matplotlib: 3.0.2
cartopy: None
seaborn: None
setuptools: 40.7.3
pip: 19.0.1
conda: None
pytest: None
IPython: 7.2.0
sphinx: None
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135