github: issue_comments: 10 rows where author_association = "MEMBER" and issue = 212561278 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
344437569	https://github.com/pydata/xarray/issues/1301#issuecomment-344437569	https://api.github.com/repos/pydata/xarray/issues/1301	MDEyOklzc3VlQ29tbWVudDM0NDQzNzU2OQ==	jhamman 2443309	2017-11-14T23:41:57Z	2017-11-14T23:41:57Z	MEMBER	@friedrichknuth, any chance you can take a look at this with the latest v0.10 release candidate?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
291516997	https://github.com/pydata/xarray/issues/1301#issuecomment-291516997	https://api.github.com/repos/pydata/xarray/issues/1301	MDEyOklzc3VlQ29tbWVudDI5MTUxNjk5Nw==	rabernat 1197350	2017-04-04T14:27:18Z	2017-04-04T14:27:18Z	MEMBER	My understanding is that you are concatenating across the variable `obs`, so no, it wouldn't make sense to have `obs` be the same in all the datasets. My tests showed that it's not necessarily the concat step that is slowing this down. Your profiling suggests that it's a netCDF datetime decoding issue. I wonder if @shoyer or @jhamman have any ideas about how to improve performance here.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
286220317	https://github.com/pydata/xarray/issues/1301#issuecomment-286220317	https://api.github.com/repos/pydata/xarray/issues/1301	MDEyOklzc3VlQ29tbWVudDI4NjIyMDMxNw==	rabernat 1197350	2017-03-13T19:40:50Z	2017-03-13T19:40:50Z	MEMBER	And the length of `obs` is different in each dataset. ```python for myds in dsets: print(myds.dims) Frozen(SortedKeysDict({u'obs': 7537613})) Frozen(SortedKeysDict({u'obs': 7247697})) Frozen(SortedKeysDict({u'obs': 7497680})) Frozen(SortedKeysDict({u'obs': 7661468})) Frozen(SortedKeysDict({u'obs': 5750197})) ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
286219858	https://github.com/pydata/xarray/issues/1301#issuecomment-286219858	https://api.github.com/repos/pydata/xarray/issues/1301	MDEyOklzc3VlQ29tbWVudDI4NjIxOTg1OA==	rabernat 1197350	2017-03-13T19:39:15Z	2017-03-13T19:39:15Z	MEMBER	There is definitely something funky with these datasets that is causing xarray to go very slow. This is fast: ```python %time dsets = [xr.open_dataset(fname) for fname in glob('*.nc')] CPU times: user 1.1 s, sys: 664 ms, total: 1.76 s Wall time: 1.78 s ``` But even just trying to print the repr is slow ```python %time print(dsets[0]) CPU times: user 3.66 s, sys: 3.49 s, total: 7.15 s Wall time: 7.28 s ``` Maybe some of this has to do with the change at 0.9.0 to allowing index-less dimensions (i.e. coordinates are optional). All of these datasets have such a dimension, e.g. `<xarray.Dataset> Dimensions: (obs: 7247697) Coordinates: lon (obs) float64 -124.3 -124.3 ... lat (obs) float64 44.64 44.64 ... time (obs) datetime64[ns] 2014-11-10T00:00:00.011253 ... Dimensions without coordinates: obs Data variables: oxy_calphase (obs) float64 3.293e+04 ... quality_flag (obs) \|S2 'ok' 'ok' 'ok' ... ctdbp_no_seawater_conductivity_qc_executed (obs) uint8 29 29 29 29 29 ... ...`	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
285149350	https://github.com/pydata/xarray/issues/1301#issuecomment-285149350	https://api.github.com/repos/pydata/xarray/issues/1301	MDEyOklzc3VlQ29tbWVudDI4NTE0OTM1MA==	rabernat 1197350	2017-03-08T19:52:11Z	2017-03-08T19:52:11Z	MEMBER	I just tried this on a few different datasets. Comparing python 2.7, xarray 0.7.2, dask 0.7.1 (an old environment I had on hand) with python 2.7, xarray 0.9.1-28-g1cad803, dask 0.13.0 (my current "production" environment), I could not reproduce. The up-to-date stack was faster by a factor of < 2.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
285110824	https://github.com/pydata/xarray/issues/1301#issuecomment-285110824	https://api.github.com/repos/pydata/xarray/issues/1301	MDEyOklzc3VlQ29tbWVudDI4NTExMDgyNA==	shoyer 1217238	2017-03-08T17:35:49Z	2017-03-08T17:35:49Z	MEMBER	One thing that helps get a better profile is setting dask backend to the non-parallel sync option which gives cleaner profiles. Indeed, this is highly recommended, see http://dask.pydata.org/en/latest/faq.html	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
284915063	https://github.com/pydata/xarray/issues/1301#issuecomment-284915063	https://api.github.com/repos/pydata/xarray/issues/1301	MDEyOklzc3VlQ29tbWVudDI4NDkxNTA2Mw==	shoyer 1217238	2017-03-08T01:16:58Z	2017-03-08T01:16:58Z	MEMBER	Hmm. It might be interesting to try `lock=threading.Lock()` to revert to the old version of the thread lock as well.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
284914442	https://github.com/pydata/xarray/issues/1301#issuecomment-284914442	https://api.github.com/repos/pydata/xarray/issues/1301	MDEyOklzc3VlQ29tbWVudDI4NDkxNDQ0Mg==	jhamman 2443309	2017-03-08T01:13:35Z	2017-03-08T01:13:35Z	MEMBER	This is what I'm seeing for my `%prun` profiling: ncalls tottime percall cumtime percall filename:lineno(function) 204 19.783 0.097 19.783 0.097 {method 'acquire' of '_thread.lock' objects} 89208/51003 2.524 0.000 5.553 0.000 indexing.py:361(shape) 1 1.359 1.359 37.876 37.876 <string>:1(<module>) 71379/53550 1.242 0.000 3.266 0.000 utils.py:412(shape) 538295 0.929 0.000 1.317 0.000 {built-in method builtins.isinstance} 24674/13920 0.836 0.000 4.139 0.000 _collections_abc.py:756(update) 9 0.788 0.088 0.803 0.089 netCDF4_.py:178(_open_netcdf4_group) Weren't there some recent changes to the thread lock related to dask distributed?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
284908153	https://github.com/pydata/xarray/issues/1301#issuecomment-284908153	https://api.github.com/repos/pydata/xarray/issues/1301	MDEyOklzc3VlQ29tbWVudDI4NDkwODE1Mw==	shoyer 1217238	2017-03-08T00:38:55Z	2017-03-08T00:38:55Z	MEMBER	Wow, that is pretty bad. Try setting `compat='broadcast_equals'` in the `open_mfdataset` call, to restore the default value of that parameter prior to v0.9. If that doesn't help, try downgrading dask to see if it's responsible. Profiling results from `%prun` in IPython would also be helpful for tracking down the culprit.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
284905152	https://github.com/pydata/xarray/issues/1301#issuecomment-284905152	https://api.github.com/repos/pydata/xarray/issues/1301	MDEyOklzc3VlQ29tbWVudDI4NDkwNTE1Mg==	jhamman 2443309	2017-03-08T00:22:10Z	2017-03-08T00:22:10Z	MEMBER	I've also noticed that we have a bottleneck here. @shoyer - any idea what we changed that could impact this? Could this be coming from a change upstream in dask?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
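
The filter and sort order described at the top of this page can be reproduced against the schema above. The following is a minimal sketch using Python's standard `sqlite3` module and an in-memory database. The helper name `member_comments` is an assumption made for illustration, only three of the ten rows are loaded, and one extra row with `author_association = 'NONE'` is hypothetical, added only to show that the filter excludes it.

```python
import sqlite3

# Schema copied from the page above. The referenced users and issues tables
# are not created here; SQLite does not enforce foreign keys by default.
SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
"""


def member_comments(conn, issue_id):
    """Return (id, user, updated_at) for MEMBER comments on one issue, newest first."""
    return conn.execute(
        "SELECT [id], [user], [updated_at] FROM [issue_comments] "
        "WHERE [author_association] = ? AND [issue] = ? "
        "ORDER BY [updated_at] DESC",
        ("MEMBER", issue_id),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

rows = [
    # (id, user, updated_at, author_association, issue), taken from the table above
    (344437569, 2443309, "2017-11-14T23:41:57Z", "MEMBER", 212561278),
    (284905152, 2443309, "2017-03-08T00:22:10Z", "MEMBER", 212561278),
    (291516997, 1197350, "2017-04-04T14:27:18Z", "MEMBER", 212561278),
    # Hypothetical non-member row, not present in the source data
    (1, 0, "2017-03-07T00:00:00Z", "NONE", 212561278),
]
conn.executemany(
    "INSERT INTO [issue_comments] "
    "([id], [user], [updated_at], [author_association], [issue]) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

result = member_comments(conn, 212561278)
for row in result:
    print(row)
```

Because `updated_at` is stored as TEXT in a uniform ISO 8601 UTC format, plain string ordering matches chronological ordering, so no date parsing is needed for the sort.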