github: issue_comments: 2 rows where issue = 224553135 and user = 16655388 sorted by updated

2 rows where issue = 224553135 and user = 16655388 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	performed_via_github_app	issue
461561653	https://github.com/pydata/xarray/issues/1385#issuecomment-461561653	https://api.github.com/repos/pydata/xarray/issues/1385	MDEyOklzc3VlQ29tbWVudDQ2MTU2MTY1Mw==	sbiner 16655388	2019-02-07T19:22:58Z	2019-02-07T19:22:58Z	NONE	I just tried and it did not help ...
```
In [5]: run test_ouverture_fichier_nc_vs_xr.py
timing glob: 0.00s
timing netcdf4: 3.36s
timing xarray: 44.82s
timing xarray tune: 14.47s

In [6]: xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 19:04:19) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.2.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: None.None
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.11.3
pandas: 0.24.0
numpy: 1.13.3
scipy: 1.2.0
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 1.0.0
distributed: 1.25.2
matplotlib: 2.2.3
cartopy: None
seaborn: None
setuptools: 40.5.0
pip: 19.0.1
conda: None
pytest: None
IPython: 5.8.0
sphinx: 1.8.2
```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		slow performance with open_mfdataset 224553135
461551320	https://github.com/pydata/xarray/issues/1385#issuecomment-461551320	https://api.github.com/repos/pydata/xarray/issues/1385	MDEyOklzc3VlQ29tbWVudDQ2MTU1MTMyMA==	sbiner 16655388	2019-02-07T18:52:53Z	2019-02-07T18:52:53Z	NONE	I have the same problem: `open_mfdataset` is about 10X slower than `nc.MFDataset`. I used the following code to time the opening of 456 local netCDF files located in a `nc_local` directory (total size 532 MB):
```
import glob
import time

import netCDF4 as nc
import xarray as xr
import ouralib.netcdf  # assumed to be the poster's own helper library (not public)

clef = 'nc_local/*.nc'
t00 = time.time()
l_fichiers_nc = sorted(glob.glob(clef))
print ('timing glob: {:6.2f}s'.format(time.time()-t00))

# netcdf4
t00 = time.time()
ds1 = nc.MFDataset(l_fichiers_nc)
dates1 = ouralib.netcdf.calcule_dates(ds1)
print ('timing netcdf4: {:6.2f}s'.format(time.time()-t00))

# xarray
t00 = time.time()
ds2 = xr.open_mfdataset(l_fichiers_nc)
print ('timing xarray: {:6.2f}s'.format(time.time()-t00))

# xarray tune: skip CF decoding while opening, decode once afterwards
t00 = time.time()
ds3 = xr.open_mfdataset(l_fichiers_nc, decode_cf=False, concat_dim='time')
ds3 = xr.decode_cf(ds3)
print ('timing xarray tune: {:6.2f}s'.format(time.time()-t00))
```
The output I get is:
timing glob: 0.00s
timing netcdf4: 3.80s
timing xarray: 44.60s
timing xarray tune: 15.61s

I ran the tests on a CentOS server with Python 2.7 and 3.6, and on macOS with Python 3.6. The timings change, but the ratio between netCDF4 and xarray stays similar. Is there any way of making `open_mfdataset` go faster?

In case it helps, here is the output of `xr.show_versions` and `%prun xr.open_mfdataset(l_fichiers_nc)`. I do not know anything about the output of `%prun`, but I noticed that the first two lines of the output differ depending on whether I use Python 2.7 or Python 3.6. I ran those tests on CentOS and macOS with Anaconda environments.

for python 2.7:
```
13996351 function calls (13773659 primitive calls) in 42.133 seconds

Ordered by: internal time

         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
           2664   16.290    0.006   16.290    0.006 {time.sleep}
            912    6.330    0.007    6.623    0.007 netCDF4_.py:244(_open_netcdf4_group)
```
for python 3.6:
```
9663408 function calls (9499759 primitive calls) in 31.934 seconds

Ordered by: internal time

         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
           5472   15.140    0.003   15.140    0.003 {method 'acquire' of '_thread.lock' objects}
            912    5.661    0.006    5.718    0.006 netCDF4_.py:244(_open_netcdf4_group)
```
longer output of %prun with python3.6:
```
9663408 function calls (9499759 primitive calls) in 31.934 seconds

Ordered by: internal time

         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
           5472   15.140    0.003   15.140    0.003 {method 'acquire' of '_thread.lock' objects}
            912    5.661    0.006    5.718    0.006 netCDF4_.py:244(_open_netcdf4_group)
           4104    0.564    0.000    0.757    0.000 {built-in method _operator.getitem}
  133152/129960    0.477    0.000    0.660    0.000 indexing.py:496(shape)
1554550/1554153    0.414    0.000    0.711    0.000 {built-in method builtins.isinstance}
            912    0.260    0.000    0.260    0.000 {method 'close' of 'netCDF4._netCDF4.Dataset' objects}
           6384    0.244    0.000    0.953    0.000 netCDF4_.py:361(open_store_variable)
            910    0.241    0.000    0.595    0.001 duck_array_ops.py:141(array_equiv)
          20990    0.235    0.000    0.343    0.000 {pandas._libs.lib.is_scalar}
    37483/36567    0.228    0.000    0.230    0.000 {built-in method builtins.iter}
          93986    0.219    0.000    1.607    0.000 variable.py:239(__init__)
          93982    0.194    0.000    0.194    0.000 variable.py:706(attrs)
          33744    0.189    0.000    0.189    0.000 {method 'getncattr' of 'netCDF4._netCDF4.Variable' objects}
          15511    0.175    0.000    0.638    0.000 core.py:1776(normalize_chunks)
           5930    0.162    0.000    0.350    0.000 missing.py:183(_isna_ndarraylike)
  297391/296926    0.159    0.000    0.380    0.000 {built-in method builtins.getattr}
         134230    0.155    0.000    0.269    0.000 abc.py:180(__instancecheck__)
           6384    0.142    0.000    0.199    0.000 netCDF4_.py:34(__init__)
          93986    0.126    0.000    0.671    0.000 variable.py:414(_parse_dimensions)
         156545    0.119    0.000    0.811    0.000 utils.py:450(ndim)
          12768    0.119    0.000    0.203    0.000 core.py:747(blockdims_from_blockshape)
           6384    0.117    0.000    2.526    0.000 conventions.py:245(decode_cf_variable)
  741183/696380    0.116    0.000    0.134    0.000 {built-in method builtins.len}
    41957/23717    0.110    0.000    4.395    0.000 {built-in method numpy.core.multiarray.array}
          93978    0.110    0.000    0.110    0.000 variable.py:718(encoding)
         219940    0.109    0.000    0.109    0.000 _weakrefset.py:70(__contains__)
          99458    0.100    0.000    0.440    0.000 variable.py:137(as_compatible_data)
          53882    0.085    0.000    0.095    0.000 core.py:891(shape)
         140604    0.084    0.000    0.628    0.000 variable.py:272(shape)
           3192    0.084    0.000    0.170    0.000 utils.py:88(_StartCountStride)
          10494    0.081    0.000    0.081    0.000 {method 'reduce' of 'numpy.ufunc' objects}
          44688    0.077    0.000    0.157    0.000 variables.py:102(unpack_for_decoding)
```
output of xr.show_versions()
```
xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.2.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
xarray: 0.11.0
pandas: 0.24.1
numpy: 1.15.4
scipy: None
netCDF4: 1.4.2
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: None
cyordereddict: None
dask: 1.1.1
distributed: 1.25.3
matplotlib: 3.0.2
cartopy: None
seaborn: None
setuptools: 40.7.3
pip: 19.0.1
conda: None
pytest: None
IPython: 7.2.0
sphinx: None
```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		slow performance with open_mfdataset 224553135
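
The benchmark in the comment above repeats the same three steps for every backend: reset `t00`, run the call, print the elapsed time. As a minimal sketch, that pattern could be wrapped in a small context manager. The names `timed` and `format_timing` are hypothetical helpers, not part of xarray or netCDF4, and the sorted-list workload is only a stand-in for the real file-opening calls.

```python
import time
from contextlib import contextmanager


def format_timing(label, seconds):
    """Format one result line the same way the benchmark script prints it."""
    return 'timing {}: {:6.2f}s'.format(label, seconds)


@contextmanager
def timed(label, results=None):
    """Time the enclosed block and print the result.

    If `results` is a dict, the elapsed seconds are also stored under `label`
    so that ratios between backends can be computed afterwards.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(format_timing(label, elapsed))


# Stand-in workload; in the benchmark this would be e.g. xr.open_mfdataset(...)
results = {}
with timed('sorted', results):
    sorted(range(100000), reverse=True)
```

Collecting the timings in a dict makes it straightforward to report the netCDF4-to-xarray ratio directly instead of comparing printed lines by eye.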

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
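
The page heading ("2 rows where issue = 224553135 and user = 16655388 sorted by updated_at descending") corresponds to a plain SQL filter on this schema. The following is a minimal sketch using Python's built-in `sqlite3` with an in-memory database. The comment bodies are shortened placeholders, and the third row (id 1, user 99) is hypothetical, added only to show that the filter excludes other users.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# (id, user, created_at, updated_at, body, issue)
rows = [
    (461561653, 16655388, "2019-02-07T19:22:58Z", "2019-02-07T19:22:58Z",
     "I just tried and it did not help ...", 224553135),
    (461551320, 16655388, "2019-02-07T18:52:53Z", "2019-02-07T18:52:53Z",
     "I have the same problem. ...", 224553135),
    # Hypothetical row from another user, only to exercise the filter.
    (1, 99, "2019-01-01T00:00:00Z", "2019-01-01T00:00:00Z",
     "placeholder", 224553135),
]
conn.executemany(
    "INSERT INTO issue_comments ([id], [user], [created_at], [updated_at], [body], [issue]) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

# ISO 8601 timestamps in a single time zone sort correctly as text.
result = conn.execute(
    "SELECT [id], [updated_at] FROM issue_comments "
    "WHERE [issue] = ? AND [user] = ? "
    "ORDER BY [updated_at] DESC",
    (224553135, 16655388),
).fetchall()
print(result)
```

The two indexes in the schema cover the `issue` and `user` columns used in the `WHERE` clause, which is presumably why they are defined.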
