issue_comments
17 rows where issue = 212561278 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
344949160 | https://github.com/pydata/xarray/issues/1301#issuecomment-344949160 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDM0NDk0OTE2MA== | friedrichknuth 10554254 | 2017-11-16T15:01:59Z | 2017-11-16T15:02:48Z | NONE | Looks like it has been resolved! Tested with the latest pre-release v0.10.0rc2 on the dataset linked by najascutellatus above. https://marine.rutgers.edu/~michaesm/netcdf/data/
xarray==0.10.0rc2-1-g8267fdb, dask==0.15.4

```
194381 function calls (188429 primitive calls) in 0.869 seconds

Ordered by: internal time
List reduced from 469 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       50    0.393    0.008    0.393    0.008 {numpy.core.multiarray.arange}
       50    0.164    0.003    0.557    0.011 indexing.py:266(_index_indexer_1d)
        5    0.083    0.017    0.085    0.017 netCDF4_.py:185(open_netcdf4_group)
      190    0.024    0.000    0.066    0.000 netCDF4_.py:256(open_store_variable)
      190    0.022    0.000    0.022    0.000 netCDF4_.py:29(__init__)
       50    0.018    0.000    0.021    0.000 {operator.getitem}
5145/3605    0.012    0.000    0.019    0.000 indexing.py:493(shape)
2317/1291    0.009    0.000    0.094    0.000 _abcoll.py:548(update)
    26137    0.006    0.000    0.013    0.000 {isinstance}
      720    0.005    0.000    0.006    0.000 {method 'getncattr' of 'netCDF4._netCDF4.Variable' objects}

Ordered by: internal time
List reduced from 659 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       30   87.527    2.918   87.527    2.918 {pandas._libs.tslib.array_to_timedelta64}
       65    7.055    0.109    7.059    0.109 {operator.getitem}
       80    0.799    0.010    0.799    0.010 {numpy.core.multiarray.arange}
7895/4420    0.502    0.000    0.524    0.000 utils.py:412(shape)
       68    0.442    0.007    0.442    0.007 {pandas._libs.algos.ensure_object}
       80    0.350    0.004    1.150    0.014 indexing.py:318(_index_indexer_1d)
    60/30    0.296    0.005   88.407    2.947 timedeltas.py:158(_convert_listlike)
       30    0.284    0.009    0.298    0.010 algorithms.py:719(checked_add_with_arr)
      123    0.140    0.001    0.140    0.001 {method 'astype' of 'numpy.ndarray' objects}
 1049/719    0.096    0.000   96.513    0.134 {numpy.core.multiarray.array}
```
|
{ "total_count": 3, "+1": 1, "-1": 0, "laugh": 0, "hooray": 2, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
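A note on reproducing numbers like the ones above: `%prun` is IPython-only. Here is a minimal stdlib equivalent, assuming the five test files from the Rutgers URL above have been downloaded into the current directory:

```python
import cProfile
import pstats

import xarray as xr

# Profile the multi-file open and dump the stats to disk.
cProfile.run("xr.open_mfdataset('./*.nc')", "open_mfdataset.prof")

# Print the ten most expensive calls by internal time, like %prun -l 10.
stats = pstats.Stats("open_mfdataset.prof")
stats.sort_stats("tottime").print_stats(10)
```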
344437569 | https://github.com/pydata/xarray/issues/1301#issuecomment-344437569 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDM0NDQzNzU2OQ== | jhamman 2443309 | 2017-11-14T23:41:57Z | 2017-11-14T23:41:57Z | MEMBER | @friedrichknuth, any chance you can take a look at this with the latest v0.10 release candidate? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
293619896 | https://github.com/pydata/xarray/issues/1301#issuecomment-293619896 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI5MzYxOTg5Ng== | friedrichknuth 10554254 | 2017-04-12T15:42:18Z | 2017-04-12T15:42:18Z | NONE | `decode_times=False` significantly reduces read time, but the proportional performance discrepancy between xarray 0.8.2 and 0.9.1 remains the same. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
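For reference, the workaround mentioned in the comment above is a single keyword on the open call; a minimal sketch (the file glob is assumed from earlier comments):

```python
import xarray as xr

# Leave time variables as raw numbers plus their "units" attribute,
# skipping the pandas timedelta conversion that dominates the profiles
# in this thread. They can still be decoded later with xr.decode_cf().
ds = xr.open_mfdataset("./*.nc", decode_times=False)
```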
293593843 | https://github.com/pydata/xarray/issues/1301#issuecomment-293593843 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI5MzU5Mzg0Mw== | acrosby 865212 | 2017-04-12T14:24:44Z | 2017-04-12T14:25:29Z | NONE | @friedrichknuth Did you try tests with the most recent version? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
291516997 | https://github.com/pydata/xarray/issues/1301#issuecomment-291516997 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI5MTUxNjk5Nw== | rabernat 1197350 | 2017-04-04T14:27:18Z | 2017-04-04T14:27:18Z | MEMBER | My understanding is that you are concatenating across the variable `obs`. My tests showed that it's not necessarily the concat step that is slowing this down; your profiling suggests that it's a netcdf datetime decoding issue. I wonder if @shoyer or @jhamman have any ideas about how to improve performance here. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
291512017 | https://github.com/pydata/xarray/issues/1301#issuecomment-291512017 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI5MTUxMjAxNw== | najascutellatus 1360241 | 2017-04-04T14:11:08Z | 2017-04-04T14:11:08Z | NONE | @rabernat This data is computed on demand from the OOI (http://oceanobservatories.org/cyberinfrastructure-technology/). Datasets can be massive, so they seem to be split up into ~500 MB files when the data gets too big. That is why `obs` changes for each file. Would having `obs` be consistent across all files potentially make open_mfdataset faster? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
286220522 | https://github.com/pydata/xarray/issues/1301#issuecomment-286220522 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI4NjIyMDUyMg== | friedrichknuth 10554254 | 2017-03-13T19:41:25Z | 2017-03-13T19:41:25Z | NONE | Looks like the issue might be that xarray 0.9.1 is decoding all timestamps on load.

xarray==0.9.1, dask==0.13.0

```
da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')

Ordered by: internal time
List reduced from 625 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       18   57.057    3.170   57.057    3.170 {pandas.tslib.array_to_timedelta64}
       39    0.860    0.022    0.863    0.022 {operator.getitem}
       48    0.402    0.008    0.402    0.008 {numpy.core.multiarray.arange}
4341/2463    0.257    0.000    0.273    0.000 utils.py:412(shape)
       88    0.245    0.003    0.245    0.003 {pandas.algos.ensure_object}
       48    0.158    0.003    0.561    0.012 indexing.py:318(_index_indexer_1d)
    36/18    0.135    0.004   57.509    3.195 timedeltas.py:150(_convert_listlike)
       18    0.126    0.007    0.130    0.007 nanops.py:815(_checked_add_with_arr)
       51    0.070    0.001    0.070    0.001 {method 'astype' of 'numpy.ndarray' objects}
  676/475    0.047    0.000   58.853    0.124 {numpy.core.multiarray.array}
```

xarray==0.8.2, dask==0.13.0

```
da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')

Ordered by: internal time
List reduced from 621 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
2571/1800    0.178    0.000    0.184    0.000 utils.py:387(shape)
       18    0.174    0.010    0.174    0.010 {numpy.core.multiarray.arange}
       16    0.079    0.005    0.079    0.005 {numpy.core.multiarray.concatenate}
  483/420    0.077    0.000    0.125    0.000 {numpy.core.multiarray.array}
       15    0.054    0.004    0.197    0.013 indexing.py:259(_index_indexer_1d)
        3    0.041    0.014    0.043    0.014 netCDF4_.py:181(__init__)
      105    0.013    0.000    0.057    0.001 netCDF4_.py:196(open_store_variable)
       15    0.012    0.001    0.013    0.001 {operator.getitem}
2715/1665    0.007    0.000    0.178    0.000 indexing.py:343(shape)
     5971    0.006    0.000    0.006    0.000 collections.py:71(__setitem__)
```

The version of dask is held constant in each test. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
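An aside on the `da.set_options(get=da.async.get_sync)` line in the snippets above: it forces dask to run everything single-threaded so the profiler sees the real numpy/pandas call stacks instead of worker threads. That dask-0.13-era API no longer exists; in current dask the equivalent is roughly this sketch:

```python
import dask

# Run all dask graphs in the calling thread, so cProfile/%prun can
# attribute time to the actual numpy/pandas calls.
dask.config.set(scheduler="synchronous")
```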
286220317 | https://github.com/pydata/xarray/issues/1301#issuecomment-286220317 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI4NjIyMDMxNw== | rabernat 1197350 | 2017-03-13T19:40:50Z | 2017-03-13T19:40:50Z | MEMBER | And the length of `obs` is different in each file. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
286219858 | https://github.com/pydata/xarray/issues/1301#issuecomment-286219858 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI4NjIxOTg1OA== | rabernat 1197350 | 2017-03-13T19:39:15Z | 2017-03-13T19:39:15Z | MEMBER | There is definitely something funky with these datasets that is causing xarray to go very slow. Opening a single file is fast, but even just trying to print the repr is slow. Maybe some of this has to do with the change at 0.9.0 allowing index-less dimensions (i.e. coordinates are optional). All of these datasets have such a dimension, e.g. `obs`. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
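For readers unfamiliar with the 0.9.0 change mentioned above: an "index-less" dimension is one with no coordinate variable backing it. A minimal sketch (the variable name is invented; `obs` is taken from this thread):

```python
import numpy as np
import xarray as xr

# "obs" is a dimension with no coordinate variable attached, hence no
# index; this has been legal since xarray 0.9.0 made indexes optional.
ds = xr.Dataset({"pressure": (("obs",), np.random.rand(100))})
print(ds.indexes)  # empty: there is nothing to align on for "obs"
```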
286212647 | https://github.com/pydata/xarray/issues/1301#issuecomment-286212647 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI4NjIxMjY0Nw== | najascutellatus 1360241 | 2017-03-13T19:12:13Z | 2017-03-13T19:12:13Z | NONE | Data: five files that are approximately 450 MB each.

venv1 (dask 0.13.0 py27_0 conda-forge, xarray 0.8.2 py27_0 conda-forge): 1.51642394066 seconds to load using open_mfdataset

venv2 (dask 0.13.0 py27_0 conda-forge, xarray 0.9.1 py27_0 conda-forge): 279.011202097 seconds to load using open_mfdataset

I ran the same code in the OP on two conda envs with the same version of dask but two different versions of xarray. There was a significant difference in load time between the two conda envs. I've posted the data on my work site if anyone wants to double check: https://marine.rutgers.edu/~michaesm/netcdf/data/ |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
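A hedged sketch of the A/B test described above: the same snippet run once in each conda environment, differing only in the xarray version (written with `.format` so it also runs on the Python 2.7 envs used here):

```python
import glob
import time

import xarray as xr

# Wall-clock the multi-file open against the five ~450 MB test files.
files = sorted(glob.glob("./*.nc"))
start = time.time()
ds = xr.open_mfdataset(files)
print("xarray {}: {:.2f} s to open {} files".format(
    xr.__version__, time.time() - start, len(files)))
```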
285149350 | https://github.com/pydata/xarray/issues/1301#issuecomment-285149350 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI4NTE0OTM1MA== | rabernat 1197350 | 2017-03-08T19:52:11Z | 2017-03-08T19:52:11Z | MEMBER | I just tried this on a few different datasets. Comparing python 2.7, xarray 0.7.2, dask 0.7.1 (an old environment I had on hand) with python 2.7, xarray 0.9.1-28-g1cad803, dask 0.13.0 (my current "production" environment), I could not reproduce. The up-to-date stack was faster by a factor of < 2. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
285110824 | https://github.com/pydata/xarray/issues/1301#issuecomment-285110824 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI4NTExMDgyNA== | shoyer 1217238 | 2017-03-08T17:35:49Z | 2017-03-08T17:35:49Z | MEMBER | Indeed, this is highly recommended; see http://dask.pydata.org/en/latest/faq.html |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
285052725 | https://github.com/pydata/xarray/issues/1301#issuecomment-285052725 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI4NTA1MjcyNQ== | mangecoeur 743508 | 2017-03-08T14:20:30Z | 2017-03-08T14:20:30Z | CONTRIBUTOR | My 2 cents - I've found that with big files any |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
284915063 | https://github.com/pydata/xarray/issues/1301#issuecomment-284915063 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI4NDkxNTA2Mw== | shoyer 1217238 | 2017-03-08T01:16:58Z | 2017-03-08T01:16:58Z | MEMBER | Hmm. It might be interesting to try |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
284914442 | https://github.com/pydata/xarray/issues/1301#issuecomment-284914442 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI4NDkxNDQ0Mg== | jhamman 2443309 | 2017-03-08T01:13:35Z | 2017-03-08T01:13:35Z | MEMBER | This is what I'm seeing for my
Weren't there some recent changes to the thread lock related to dask distributed? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
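On the thread-lock question above: open_mfdataset in this era accepted a `lock` argument (an assumption about the 0.9-series signature; the keyword was deprecated much later) to serialize access to the non-thread-safe netCDF-C library, so passing one explicitly is a quick experiment:

```python
import threading

import xarray as xr

# Force a single explicit lock around all low-level file reads, to test
# whether scheduler/locking changes are implicated in the slowdown.
ds = xr.open_mfdataset("./*.nc", lock=threading.Lock())
```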
284908153 | https://github.com/pydata/xarray/issues/1301#issuecomment-284908153 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI4NDkwODE1Mw== | shoyer 1217238 | 2017-03-08T00:38:55Z | 2017-03-08T00:38:55Z | MEMBER | Wow, that is pretty bad. Try setting … If that doesn't help, try downgrading dask to see if it's responsible. Profiling results from … |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
284905152 | https://github.com/pydata/xarray/issues/1301#issuecomment-284905152 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI4NDkwNTE1Mg== | jhamman 2443309 | 2017-03-08T00:22:10Z | 2017-03-08T00:22:10Z | MEMBER | I've also noticed that we have a bottleneck here. @shoyer - any idea what we changed that could impact this? Could this be coming from a change upstream in dask? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 |
```
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```