
issue_comments


17 rows where issue = 212561278 sorted by updated_at descending


user 7

  • rabernat 4
  • shoyer 3
  • jhamman 3
  • friedrichknuth 3
  • najascutellatus 2
  • mangecoeur 1
  • acrosby 1

author_association 3

  • MEMBER 10
  • NONE 6
  • CONTRIBUTOR 1

issue 1

  • open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 · 17
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
344949160 https://github.com/pydata/xarray/issues/1301#issuecomment-344949160 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDM0NDk0OTE2MA== friedrichknuth 10554254 2017-11-16T15:01:59Z 2017-11-16T15:02:48Z NONE

Looks like it has been resolved! Tested with the latest pre-release v0.10.0rc2 on the dataset linked by najascutellatus above. https://marine.rutgers.edu/~michaesm/netcdf/data/

```
da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')
```

xarray==0.10.0rc2-1-g8267fdb, dask==0.15.4

```
194381 function calls (188429 primitive calls) in 0.869 seconds

Ordered by: internal time
List reduced from 469 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       50    0.393    0.008    0.393    0.008 {numpy.core.multiarray.arange}
       50    0.164    0.003    0.557    0.011 indexing.py:266(_index_indexer_1d)
        5    0.083    0.017    0.085    0.017 netCDF4_.py:185(_open_netcdf4_group)
      190    0.024    0.000    0.066    0.000 netCDF4_.py:256(open_store_variable)
      190    0.022    0.000    0.022    0.000 netCDF4_.py:29(__init__)
       50    0.018    0.000    0.021    0.000 {operator.getitem}
5145/3605    0.012    0.000    0.019    0.000 indexing.py:493(shape)
2317/1291    0.009    0.000    0.094    0.000 _abcoll.py:548(update)
    26137    0.006    0.000    0.013    0.000 {isinstance}
      720    0.005    0.000    0.006    0.000 {method 'getncattr' of 'netCDF4._netCDF4.Variable' objects}
```

xarray==0.9.1, dask==0.13.0

```
241253 function calls (229881 primitive calls) in 98.123 seconds

Ordered by: internal time
List reduced from 659 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       30   87.527    2.918   87.527    2.918 {pandas._libs.tslib.array_to_timedelta64}
       65    7.055    0.109    7.059    0.109 {operator.getitem}
       80    0.799    0.010    0.799    0.010 {numpy.core.multiarray.arange}
7895/4420    0.502    0.000    0.524    0.000 utils.py:412(shape)
       68    0.442    0.007    0.442    0.007 {pandas._libs.algos.ensure_object}
       80    0.350    0.004    1.150    0.014 indexing.py:318(_index_indexer_1d)
    60/30    0.296    0.005   88.407    2.947 timedeltas.py:158(_convert_listlike)
       30    0.284    0.009    0.298    0.010 algorithms.py:719(checked_add_with_arr)
      123    0.140    0.001    0.140    0.001 {method 'astype' of 'numpy.ndarray' objects}
 1049/719    0.096    0.000   96.513    0.134 {numpy.core.multiarray.array}
```

{
    "total_count": 3,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 2,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
344437569 https://github.com/pydata/xarray/issues/1301#issuecomment-344437569 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDM0NDQzNzU2OQ== jhamman 2443309 2017-11-14T23:41:57Z 2017-11-14T23:41:57Z MEMBER

@friedrichknuth, any chance you can take a look at this with the latest v0.10 release candidate?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
293619896 https://github.com/pydata/xarray/issues/1301#issuecomment-293619896 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI5MzYxOTg5Ng== friedrichknuth 10554254 2017-04-12T15:42:18Z 2017-04-12T15:42:18Z NONE

decode_times=False significantly reduces read time, but the proportional performance discrepancy between xarray 0.8.2 and 0.9.1 remains the same.
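
A minimal sketch of the kind of comparison being described here, assuming a directory of netCDF files like the ones linked in this thread (the glob pattern and timing harness are illustrative, not from the original comment):

```python
import time
import xarray as xr

def timed_open(**kwargs):
    """Open the same stack of files and report the wall-clock time."""
    start = time.time()
    ds = xr.open_mfdataset('./*.nc', **kwargs)
    print(kwargs or 'defaults', '->', round(time.time() - start, 1), 's')
    return ds

ds_decoded = timed_open()                    # default: decode CF times on load
ds_raw     = timed_open(decode_times=False)  # skip time decoding entirely
```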

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
293593843 https://github.com/pydata/xarray/issues/1301#issuecomment-293593843 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI5MzU5Mzg0Mw== acrosby 865212 2017-04-12T14:24:44Z 2017-04-12T14:25:29Z NONE

@friedrichknuth Did you try tests with the most recent version, with decode_times=True/False, on a single-file read?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
291516997 https://github.com/pydata/xarray/issues/1301#issuecomment-291516997 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI5MTUxNjk5Nw== rabernat 1197350 2017-04-04T14:27:18Z 2017-04-04T14:27:18Z MEMBER

My understanding is that you are concatenating across the variable obs, so no, it wouldn't make sense to have obs be the same in all the datasets.

My tests showed that it's not necessarily the concat step that is slowing this down. Your profiling suggests that it's a netCDF datetime decoding issue.

I wonder if @shoyer or @jhamman have any ideas about how to improve performance here.
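
One way to isolate that decoding cost (a sketch, not part of the original comment) is to open the files with CF decoding turned off and then apply xr.decode_cf as a separate, measurable step:

```python
import xarray as xr

# Time the raw read/concatenate without any CF decoding...
raw = xr.open_mfdataset('./*.nc', decode_cf=False)

# ...then time the CF decoding (including datetime conversion) on its own.
decoded = xr.decode_cf(raw)
```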

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
291512017 https://github.com/pydata/xarray/issues/1301#issuecomment-291512017 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI5MTUxMjAxNw== najascutellatus 1360241 2017-04-04T14:11:08Z 2017-04-04T14:11:08Z NONE

@rabernat This data is computed on demand from the OOI (http://oceanobservatories.org/cyberinfrastructure-technology/). Datasets can be massive, so they seem to be split into ~500 MB files when the data gets too big. That is why obs changes for each file. Would having obs be consistent across all files potentially make open_mfdataset faster?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
286220522 https://github.com/pydata/xarray/issues/1301#issuecomment-286220522 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI4NjIyMDUyMg== friedrichknuth 10554254 2017-03-13T19:41:25Z 2017-03-13T19:41:25Z NONE

Looks like the issue might be that xarray 0.9.1 is decoding all timestamps on load.

xarray==0.9.1, dask==0.13.0

```
da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')

167305 function calls (160352 primitive calls) in 59.688 seconds

Ordered by: internal time
List reduced from 625 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       18   57.057    3.170   57.057    3.170 {pandas.tslib.array_to_timedelta64}
       39    0.860    0.022    0.863    0.022 {operator.getitem}
       48    0.402    0.008    0.402    0.008 {numpy.core.multiarray.arange}
4341/2463    0.257    0.000    0.273    0.000 utils.py:412(shape)
       88    0.245    0.003    0.245    0.003 {pandas.algos.ensure_object}
       48    0.158    0.003    0.561    0.012 indexing.py:318(_index_indexer_1d)
    36/18    0.135    0.004   57.509    3.195 timedeltas.py:150(_convert_listlike)
       18    0.126    0.007    0.130    0.007 nanops.py:815(_checked_add_with_arr)
       51    0.070    0.001    0.070    0.001 {method 'astype' of 'numpy.ndarray' objects}
  676/475    0.047    0.000   58.853    0.124 {numpy.core.multiarray.array}
```

`pandas.tslib.array_to_timedelta64` appears to be the most expensive item on the list, and isn't being run when using xarray 0.8.2.

xarray==0.8.2, dask==0.13.0

```
da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')

140668 function calls (136769 primitive calls) in 0.766 seconds

Ordered by: internal time
List reduced from 621 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
2571/1800    0.178    0.000    0.184    0.000 utils.py:387(shape)
       18    0.174    0.010    0.174    0.010 {numpy.core.multiarray.arange}
       16    0.079    0.005    0.079    0.005 {numpy.core.multiarray.concatenate}
  483/420    0.077    0.000    0.125    0.000 {numpy.core.multiarray.array}
       15    0.054    0.004    0.197    0.013 indexing.py:259(_index_indexer_1d)
        3    0.041    0.014    0.043    0.014 netCDF4_.py:181(__init__)
      105    0.013    0.000    0.057    0.001 netCDF4_.py:196(open_store_variable)
       15    0.012    0.001    0.013    0.001 {operator.getitem}
2715/1665    0.007    0.000    0.178    0.000 indexing.py:343(shape)
     5971    0.006    0.000    0.006    0.000 collections.py:71(__setitem__)
```

The version of dask is held constant in each test.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
286220317 https://github.com/pydata/xarray/issues/1301#issuecomment-286220317 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI4NjIyMDMxNw== rabernat 1197350 2017-03-13T19:40:50Z 2017-03-13T19:40:50Z MEMBER

And the length of obs is different in each dataset.

```python
for myds in dsets:
    print(myds.dims)

Frozen(SortedKeysDict({u'obs': 7537613}))
Frozen(SortedKeysDict({u'obs': 7247697}))
Frozen(SortedKeysDict({u'obs': 7497680}))
Frozen(SortedKeysDict({u'obs': 7661468}))
Frozen(SortedKeysDict({u'obs': 5750197}))
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
286219858 https://github.com/pydata/xarray/issues/1301#issuecomment-286219858 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI4NjIxOTg1OA== rabernat 1197350 2017-03-13T19:39:15Z 2017-03-13T19:39:15Z MEMBER

There is definitely something funky with these datasets that is causing xarray to go very slow.

This is fast:

```python
%time dsets = [xr.open_dataset(fname) for fname in glob('*.nc')]
CPU times: user 1.1 s, sys: 664 ms, total: 1.76 s
Wall time: 1.78 s
```

But even just trying to print the repr is slow:

```python
%time print(dsets[0])
CPU times: user 3.66 s, sys: 3.49 s, total: 7.15 s
Wall time: 7.28 s
```

Maybe some of this has to do with the change at 0.9.0 to allowing index-less dimensions (i.e. coordinates are optional). All of these datasets have such a dimension, e.g.

```
<xarray.Dataset>
Dimensions:                                      (obs: 7247697)
Coordinates:
    lon                                          (obs) float64 -124.3 -124.3 ...
    lat                                          (obs) float64 44.64 44.64 ...
    time                                         (obs) datetime64[ns] 2014-11-10T00:00:00.011253 ...
Dimensions without coordinates: obs
Data variables:
    oxy_calphase                                 (obs) float64 3.293e+04 ...
    quality_flag                                 (obs) |S2 'ok' 'ok' 'ok' ...
    ctdbp_no_seawater_conductivity_qc_executed   (obs) uint8 29 29 29 29 29 ...
    ...
```
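
A quick way to spot such index-less dimensions (a sketch; the file name is a placeholder, not from the thread):

```python
import xarray as xr

ds = xr.open_dataset('one_of_the_files.nc')  # placeholder file name
dims_without_index = [dim for dim in ds.dims if dim not in ds.indexes]
print(dims_without_index)  # e.g. ['obs'] for these datasets
```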

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
286212647 https://github.com/pydata/xarray/issues/1301#issuecomment-286212647 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI4NjIxMjY0Nw== najascutellatus 1360241 2017-03-13T19:12:13Z 2017-03-13T19:12:13Z NONE

Data: Five files that are approximately 450 MB each.

venv1:
    dask     0.13.0   py27_0   conda-forge
    xarray   0.8.2    py27_0   conda-forge
    1.51642394066 seconds to load using open_mfdataset

venv2:
    dask     0.13.0   py27_0   conda-forge
    xarray   0.9.1    py27_0   conda-forge
    279.011202097 seconds to load using open_mfdataset

I ran the same code in the OP on two conda envs with the same version of dask but two different versions of xarray. There was a significant difference in load time between the two conda envs.

I've posted the data on my work site if anyone wants to double check: https://marine.rutgers.edu/~michaesm/netcdf/data/

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
285149350 https://github.com/pydata/xarray/issues/1301#issuecomment-285149350 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI4NTE0OTM1MA== rabernat 1197350 2017-03-08T19:52:11Z 2017-03-08T19:52:11Z MEMBER

I just tried this on a few different datasets. Comparing python 2.7, xarray 0.7.2, dask 0.7.1 (an old environment I had on hand) with python 2.7, xarray 0.9.1-28-g1cad803, dask 0.13.0 (my current "production" environment), I could not reproduce. The up-to-date stack was faster by a factor of < 2.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
285110824 https://github.com/pydata/xarray/issues/1301#issuecomment-285110824 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI4NTExMDgyNA== shoyer 1217238 2017-03-08T17:35:49Z 2017-03-08T17:35:49Z MEMBER

> One thing that helps get a better profile is setting dask backend to the non-parallel sync option which gives cleaner profiles.

Indeed, this is highly recommended, see http://dask.pydata.org/en/latest/faq.html
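
A sketch of switching to the synchronous scheduler (the commented form matches the dask API used in the profiles elsewhere in this thread; the uncommented call is the equivalent on current dask releases):

```python
import dask
import xarray as xr

# dask of this era:
#   import dask as da
#   da.set_options(get=da.async.get_sync)

# current dask: run every task single-threaded, so %prun attributes time
# to real work instead of '_thread.lock' waits.
dask.config.set(scheduler='synchronous')

ds = xr.open_mfdataset('./*.nc')
```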

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
285052725 https://github.com/pydata/xarray/issues/1301#issuecomment-285052725 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI4NTA1MjcyNQ== mangecoeur 743508 2017-03-08T14:20:30Z 2017-03-08T14:20:30Z CONTRIBUTOR

My 2 cents - I've found that with big files any %prun tends to show method 'acquire' of '_thread.lock' objects as one of the highest-time entries, but that's not necessarily indicative of where the perf issue comes from, because it's effectively just waiting for IO, which is always slow. One thing that helps get a better profile is setting the dask backend to the non-parallel sync option, which gives cleaner profiles.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
284915063 https://github.com/pydata/xarray/issues/1301#issuecomment-284915063 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI4NDkxNTA2Mw== shoyer 1217238 2017-03-08T01:16:58Z 2017-03-08T01:16:58Z MEMBER

Hmm. It might be interesting to try lock=threading.Lock() to revert to the old version of the thread lock as well.
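
A sketch of what that experiment would look like, assuming the lock keyword that open_mfdataset accepted in this era:

```python
import threading
import xarray as xr

# Pass an ordinary threading.Lock instead of the newer default lock,
# to mimic the pre-0.9 locking behaviour.
ds = xr.open_mfdataset('./*.nc', lock=threading.Lock())
```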

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
284914442 https://github.com/pydata/xarray/issues/1301#issuecomment-284914442 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI4NDkxNDQ0Mg== jhamman 2443309 2017-03-08T01:13:35Z 2017-03-08T01:13:35Z MEMBER

This is what I'm seeing for my %prun profiling:

```
     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        204   19.783    0.097   19.783    0.097 {method 'acquire' of '_thread.lock' objects}
89208/51003    2.524    0.000    5.553    0.000 indexing.py:361(shape)
          1    1.359    1.359   37.876   37.876 <string>:1(<module>)
71379/53550    1.242    0.000    3.266    0.000 utils.py:412(shape)
     538295    0.929    0.000    1.317    0.000 {built-in method builtins.isinstance}
24674/13920    0.836    0.000    4.139    0.000 _collections_abc.py:756(update)
          9    0.788    0.088    0.803    0.089 netCDF4_.py:178(_open_netcdf4_group)
```

Weren't there some recent changes to the thread lock related to dask distributed?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
284908153 https://github.com/pydata/xarray/issues/1301#issuecomment-284908153 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI4NDkwODE1Mw== shoyer 1217238 2017-03-08T00:38:55Z 2017-03-08T00:38:55Z MEMBER

Wow, that is pretty bad.

Try setting compat='broadcast_equals' in the open_mfdataset call, to restore the default value of that parameter prior to v0.9.

If that doesn't help, try downgrading dask to see if it's responsible. Profiling results from %prun in IPython would also be helpful at tracking down the culprit.
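
A sketch combining the two suggestions (the glob pattern is a placeholder; %prun is an IPython magic):

```python
import xarray as xr

# Restore the pre-0.9 default for comparing variables across files.
ds = xr.open_mfdataset('./*.nc', compat='broadcast_equals')
```

Then, in IPython:

```
%prun -l 10 ds = xr.open_mfdataset('./*.nc', compat='broadcast_equals')
```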

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
284905152 https://github.com/pydata/xarray/issues/1301#issuecomment-284905152 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI4NDkwNTE1Mg== jhamman 2443309 2017-03-08T00:22:10Z 2017-03-08T00:22:10Z MEMBER

I've also noticed that we have a bottleneck here.

@shoyer - any idea what we changed that could impact this? Could this be coming from a change upstream in dask?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);