issue_comments
2 rows where issue = 224553135 and user = 16655388 sorted by updated_at descending
This data as json, CSV (advanced)
Suggested facets: created_at (date), updated_at (date)
issue 1
- slow performance with open_mfdataset · 2 ✖
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
461561653 | https://github.com/pydata/xarray/issues/1385#issuecomment-461561653 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDQ2MTU2MTY1Mw== | sbiner 16655388 | 2019-02-07T19:22:58Z | 2019-02-07T19:22:58Z | NONE | I just tried and it did not help ... ``` In [5]: run test_ouverture_fichier_nc_vs_xr.py timing glob: 0.00s timing netcdf4: 3.36s timing xarray: 44.82s timing xarray tune: 14.47s In [6]: xr.show_versions() INSTALLED VERSIONScommit: None python: 2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 19:04:19) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-514.2.2.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: None.None libhdf5: 1.10.4 libnetcdf: 4.6.1 xarray: 0.11.3 pandas: 0.24.0 numpy: 1.13.3 scipy: 1.2.0 netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.3.4 PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.2.1 cyordereddict: None dask: 1.0.0 distributed: 1.25.2 matplotlib: 2.2.3 cartopy: None seaborn: None setuptools: 40.5.0 pip: 19.0.1 conda: None pytest: None IPython: 5.8.0 sphinx: 1.8.2 ``` |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
461551320 | https://github.com/pydata/xarray/issues/1385#issuecomment-461551320 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDQ2MTU1MTMyMA== | sbiner 16655388 | 2019-02-07T18:52:53Z | 2019-02-07T18:52:53Z | NONE | I have the same problem. open_mfdatasset is 10X slower than nc.MFDataset. I used the following code to get some timing on opening 456 local netcdf files located in a netcdf4t00 = time.time() ds1 = nc.MFDataset(l_fichiers_nc) dates1 = ouralib.netcdf.calcule_dates(ds1)print ('timing netcdf4: {:6.2f}s'.format(time.time()-t00)) xarrayt00 = time.time() ds2 = xr.open_mfdataset(l_fichiers_nc) print ('timing xarray: {:6.2f}s'.format(time.time()-t00)) xarray tunet00 = time.time() ds3 = xr.open_mfdataset(l_fichiers_nc, decode_cf=False, concat_dim='time') ds3 = xr.decode_cf(ds3) print ('timing xarray tune: {:6.2f}s'.format(time.time()-t00)) ``` The output I get is :
I made tests on a centOS server using python2.7 and 3.6, and on mac OS as well with python3.6. The timing changes but the ratios are similar between netCDF4 and xarray. Is there any way of making open_mfdataset go faster? In case it helps, here are output from for python 2.7: ``` 13996351 function calls (13773659 primitive calls) in 42.133 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 2664 16.290 0.006 16.290 0.006 {time.sleep} 912 6.330 0.007 6.623 0.007 netCDF4_.py:244(_open_netcdf4_group) ``` for python 3.6: ``` 9663408 function calls (9499759 primitive calls) in 31.934 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function)
5472 15.140 0.003 15.140 0.003 {method 'acquire' of 'thread.lock' objects}
912 5.661 0.006 5.718 0.006 netCDF4.py:244(_open_netcdf4_group)
Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 5472 15.140 0.003 15.140 0.003 {method 'acquire' of 'thread.lock' objects} 912 5.661 0.006 5.718 0.006 netCDF4.py:244(open_netcdf4_group) 4104 0.564 0.000 0.757 0.000 {built-in method _operator.getitem} 133152/129960 0.477 0.000 0.660 0.000 indexing.py:496(shape) 1554550/1554153 0.414 0.000 0.711 0.000 {built-in method builtins.isinstance} 912 0.260 0.000 0.260 0.000 {method 'close' of 'netCDF4._netCDF4.Dataset' objects} 6384 0.244 0.000 0.953 0.000 netCDF4.py:361(open_store_variable) 910 0.241 0.000 0.595 0.001 duck_array_ops.py:141(array_equiv) 20990 0.235 0.000 0.343 0.000 {pandas.libs.lib.is_scalar} 37483/36567 0.228 0.000 0.230 0.000 {built-in method builtins.iter} 93986 0.219 0.000 1.607 0.000 variable.py:239(__init__) 93982 0.194 0.000 0.194 0.000 variable.py:706(attrs) 33744 0.189 0.000 0.189 0.000 {method 'getncattr' of 'netCDF4._netCDF4.Variable' objects} 15511 0.175 0.000 0.638 0.000 core.py:1776(normalize_chunks) 5930 0.162 0.000 0.350 0.000 missing.py:183(_isna_ndarraylike) 297391/296926 0.159 0.000 0.380 0.000 {built-in method builtins.getattr} 134230 0.155 0.000 0.269 0.000 abc.py:180(__instancecheck__) 6384 0.142 0.000 0.199 0.000 netCDF4.py:34(init) 93986 0.126 0.000 0.671 0.000 variable.py:414(_parse_dimensions) 156545 0.119 0.000 0.811 0.000 utils.py:450(ndim) 12768 0.119 0.000 0.203 0.000 core.py:747(blockdims_from_blockshape) 6384 0.117 0.000 2.526 0.000 conventions.py:245(decode_cf_variable) 741183/696380 0.116 0.000 0.134 0.000 {built-in method builtins.len} 41957/23717 0.110 0.000 4.395 0.000 {built-in method numpy.core.multiarray.array} 93978 0.110 0.000 0.110 0.000 variable.py:718(encoding) 219940 0.109 0.000 0.109 0.000 _weakrefset.py:70(contains) 99458 0.100 0.000 0.440 0.000 variable.py:137(as_compatible_data) 53882 0.085 0.000 0.095 0.000 core.py:891(shape) 140604 0.084 0.000 0.628 0.000 variable.py:272(shape) 3192 0.084 0.000 0.170 0.000 utils.py:88(_StartCountStride) 10494 0.081 0.000 0.081 0.000 {method 'reduce' of 'numpy.ufunc' objects} 44688 0.077 0.000 0.157 0.000 variables.py:102(unpack_for_decoding) ``` output of xr.show_versions() ``` xr.show_versions() INSTALLED VERSIONScommit: None python: 3.6.8.final.0 python-bits: 64 OS: Linux OS-release: 3.10.0-514.2.2.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_CA.UTF-8 LOCALE: en_CA.UTF-8 xarray: 0.11.0 pandas: 0.24.1 numpy: 1.15.4 scipy: None netCDF4: 1.4.2 h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.3.4 PseudonetCDF: None rasterio: None iris: None bottleneck: None cyordereddict: None dask: 1.1.1 distributed: 1.25.3 matplotlib: 3.0.2 cartopy: None seaborn: None setuptools: 40.7.3 pip: 19.0.1 conda: None pytest: None IPython: 7.2.0 sphinx: None ``` |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 |
Advanced export
JSON shape: default, array, newline-delimited, object
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) ); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
user 1