
issue_comments


3 rows where author_association = "NONE", issue = 212561278 and user = 10554254 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
344949160 https://github.com/pydata/xarray/issues/1301#issuecomment-344949160 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDM0NDk0OTE2MA== friedrichknuth 10554254 2017-11-16T15:01:59Z 2017-11-16T15:02:48Z NONE

Looks like it has been resolved! Tested with the latest pre-release v0.10.0rc2 on the dataset linked by najascutellatus above. https://marine.rutgers.edu/~michaesm/netcdf/data/

```
da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')
```

xarray==0.10.0rc2-1-g8267fdb, dask==0.15.4

```
     194381 function calls (188429 primitive calls) in 0.869 seconds

Ordered by: internal time
List reduced from 469 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       50    0.393    0.008    0.393    0.008 {numpy.core.multiarray.arange}
       50    0.164    0.003    0.557    0.011 indexing.py:266(index_indexer_1d)
        5    0.083    0.017    0.085    0.017 netCDF4.py:185(open_netcdf4_group)
      190    0.024    0.000    0.066    0.000 netCDF4.py:256(open_store_variable)
      190    0.022    0.000    0.022    0.000 netCDF4_.py:29(init)
       50    0.018    0.000    0.021    0.000 {operator.getitem}
5145/3605    0.012    0.000    0.019    0.000 indexing.py:493(shape)
2317/1291    0.009    0.000    0.094    0.000 _abcoll.py:548(update)
    26137    0.006    0.000    0.013    0.000 {isinstance}
      720    0.005    0.000    0.006    0.000 {method 'getncattr' of 'netCDF4._netCDF4.Variable' objects}
```

xarray==0.9.1, dask==0.13.0

```
     241253 function calls (229881 primitive calls) in 98.123 seconds

Ordered by: internal time
List reduced from 659 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       30   87.527    2.918   87.527    2.918 {pandas._libs.tslib.array_to_timedelta64}
       65    7.055    0.109    7.059    0.109 {operator.getitem}
       80    0.799    0.010    0.799    0.010 {numpy.core.multiarray.arange}
7895/4420    0.502    0.000    0.524    0.000 utils.py:412(shape)
       68    0.442    0.007    0.442    0.007 {pandas._libs.algos.ensure_object}
       80    0.350    0.004    1.150    0.014 indexing.py:318(_index_indexer_1d)
    60/30    0.296    0.005   88.407    2.947 timedeltas.py:158(_convert_listlike)
       30    0.284    0.009    0.298    0.010 algorithms.py:719(checked_add_with_arr)
      123    0.140    0.001    0.140    0.001 {method 'astype' of 'numpy.ndarray' objects}
 1049/719    0.096    0.000   96.513    0.134 {numpy.core.multiarray.array}
```
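To put the two reports side by side, the arithmetic can be sketched as below. The numbers are copied from the profiles in this comment; the dictionary layout and variable names are only illustrative. Note that the two runs also differ in dask version, so the ratio is not a pure xarray-to-xarray comparison.

```python
# Totals and the largest single "tottime" entry, copied from the two
# %prun reports in this comment. Key and variable names are illustrative.
profiles = {
    "xarray 0.10.0rc2 / dask 0.15.4": {"total_s": 0.869, "top_call_s": 0.393},
    "xarray 0.9.1 / dask 0.13.0": {"total_s": 98.123, "top_call_s": 87.527},
}

fast = profiles["xarray 0.10.0rc2 / dask 0.15.4"]
slow = profiles["xarray 0.9.1 / dask 0.13.0"]

# How many times longer the 0.9.1 run took overall.
speedup = slow["total_s"] / fast["total_s"]

# Fraction of the 0.9.1 run spent inside array_to_timedelta64 itself.
timedelta_share = slow["top_call_s"] / slow["total_s"]

print(f"0.9.1 run took {speedup:.1f}x as long as the 0.10.0rc2 run")
print(f"array_to_timedelta64 accounts for {timedelta_share:.1%} of the 0.9.1 run")
```

On these figures the 0.9.1 run takes roughly 113 times as long, and about 89% of it is spent in `array_to_timedelta64`, a call that does not appear in the top ten of the 0.10.0rc2 report.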

{
    "total_count": 3,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 2,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
293619896 https://github.com/pydata/xarray/issues/1301#issuecomment-293619896 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI5MzYxOTg5Ng== friedrichknuth 10554254 2017-04-12T15:42:18Z 2017-04-12T15:42:18Z NONE

Setting `decode_times=False` significantly reduces read time, but the proportional performance discrepancy between xarray 0.8.2 and 0.9.1 remains the same.
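Why skipping time decoding at load can help is easier to see in a toy model. The sketch below is not xarray's implementation: `LazyTimes`, the 1900-01-01 reference date, and the one-value-per-minute raw offsets are all hypothetical, chosen only to contrast converting every value up front with converting values on access.

```python
from datetime import datetime, timedelta


class LazyTimes:
    """Hold raw numeric offsets and decode them to datetimes only on access.

    Hypothetical helper for illustration; not part of xarray.
    """

    def __init__(self, raw_seconds, epoch):
        self.raw_seconds = list(raw_seconds)
        self.epoch = epoch
        self.decoded_count = 0  # number of values actually converted so far

    def __len__(self):
        return len(self.raw_seconds)

    def __getitem__(self, index):
        # Decode a single value on demand (integer indices only).
        self.decoded_count += 1
        return self.epoch + timedelta(seconds=self.raw_seconds[index])


epoch = datetime(1900, 1, 1)        # assumed reference date
raw = range(0, 1_000_000, 60)       # assumed raw offsets, in seconds

# Eager: every value is converted at "load" time.
eager = [epoch + timedelta(seconds=s) for s in raw]

# Lazy: nothing is converted until a value is requested.
lazy = LazyTimes(raw, epoch)
first = lazy[0]
last = lazy[-1]

print(len(eager), "values decoded eagerly;", lazy.decoded_count, "decoded lazily")
```

Both approaches return the same timestamps; the lazy one just pays the conversion cost per access instead of for the whole array at open time.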

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
286220522 https://github.com/pydata/xarray/issues/1301#issuecomment-286220522 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI4NjIyMDUyMg== friedrichknuth 10554254 2017-03-13T19:41:25Z 2017-03-13T19:41:25Z NONE

Looks like the issue might be that xarray 0.9.1 is decoding all timestamps on load.

xarray==0.9.1, dask==0.13.0

```
da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')

     167305 function calls (160352 primitive calls) in 59.688 seconds

Ordered by: internal time
List reduced from 625 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       18   57.057    3.170   57.057    3.170 {pandas.tslib.array_to_timedelta64}
       39    0.860    0.022    0.863    0.022 {operator.getitem}
       48    0.402    0.008    0.402    0.008 {numpy.core.multiarray.arange}
4341/2463    0.257    0.000    0.273    0.000 utils.py:412(shape)
       88    0.245    0.003    0.245    0.003 {pandas.algos.ensure_object}
       48    0.158    0.003    0.561    0.012 indexing.py:318(_index_indexer_1d)
    36/18    0.135    0.004   57.509    3.195 timedeltas.py:150(_convert_listlike)
       18    0.126    0.007    0.130    0.007 nanops.py:815(_checked_add_with_arr)
       51    0.070    0.001    0.070    0.001 {method 'astype' of 'numpy.ndarray' objects}
  676/475    0.047    0.000   58.853    0.124 {numpy.core.multiarray.array}
```

`pandas.tslib.array_to_timedelta64` appears to be the most expensive item on the list, and isn't being run when using xarray 0.8.2.

xarray==0.8.2, dask==0.13.0

```
da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')

     140668 function calls (136769 primitive calls) in 0.766 seconds

Ordered by: internal time
List reduced from 621 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
2571/1800    0.178    0.000    0.184    0.000 utils.py:387(shape)
       18    0.174    0.010    0.174    0.010 {numpy.core.multiarray.arange}
       16    0.079    0.005    0.079    0.005 {numpy.core.multiarray.concatenate}
  483/420    0.077    0.000    0.125    0.000 {numpy.core.multiarray.array}
       15    0.054    0.004    0.197    0.013 indexing.py:259(index_indexer_1d)
        3    0.041    0.014    0.043    0.014 netCDF4.py:181(init)
      105    0.013    0.000    0.057    0.001 netCDF4_.py:196(open_store_variable)
       15    0.012    0.001    0.013    0.001 {operator.getitem}
2715/1665    0.007    0.000    0.178    0.000 indexing.py:343(shape)
     5971    0.006    0.000    0.006    0.000 collections.py:71(setitem)
```

The version of dask is held constant in each test.
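For anyone wanting to repeat this comparison outside IPython, a report of the same shape as `%prun -l 10` can be produced with the standard library's `cProfile` and `pstats`. This is a minimal sketch: `profile_top` and `workload` are hypothetical names, and `workload` is only a stand-in for the `xr.open_mfdataset('./*.nc')` call profiled above.

```python
import cProfile
import io
import pstats


def profile_top(func, limit=10):
    """Run func() under cProfile and return a text report of the top
    `limit` entries, sorted by internal time (tottime)."""
    profiler = cProfile.Profile()
    profiler.runcall(func)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("tottime").print_stats(limit)
    return buffer.getvalue()


def workload():
    # Hypothetical stand-in for the call being profiled,
    # e.g. xr.open_mfdataset('./*.nc') in the comments above.
    return sum(i * i for i in range(100_000))


report = profile_top(workload)
print(report)
```

Running the same function under each library version and comparing the reports line by line is how a newly expensive call, such as `array_to_timedelta64` here, tends to stand out.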

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 12.358ms · About: xarray-datasette