github: issue_comments: 10 rows where issue = 316618290 sorted by updated

10 rows where issue = 316618290 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
385116430	https://github.com/pydata/xarray/issues/2074#issuecomment-385116430	https://api.github.com/repos/pydata/xarray/issues/2074	MDEyOklzc3VlQ29tbWVudDM4NTExNjQzMA==	crusaderky 6213168	2018-04-27T23:13:20Z	2018-04-27T23:13:20Z	MEMBER	Done the work - but we'll need to wait for dask 0.17.3 to integrate it	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray.dot() dask problems 316618290
383817119	https://github.com/pydata/xarray/issues/2074#issuecomment-383817119	https://api.github.com/repos/pydata/xarray/issues/2074	MDEyOklzc3VlQ29tbWVudDM4MzgxNzExOQ==	mrocklin 306380	2018-04-24T06:22:39Z	2018-04-24T06:22:39Z	MEMBER	When doing benchmarks with things that might call BLAS operations in multiple threads I recommend setting the OMP_NUM_THREADS environment variable to 1. This will avoid oversubscription.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray.dot() dask problems 316618290
383754980	https://github.com/pydata/xarray/issues/2074#issuecomment-383754980	https://api.github.com/repos/pydata/xarray/issues/2074	MDEyOklzc3VlQ29tbWVudDM4Mzc1NDk4MA==	fujiisoup 6815844	2018-04-23T23:32:33Z	2018-04-23T23:32:33Z	MEMBER	@crusaderky , Thanks for the detailed benchmarking. Further note: `xr.dot` uses `tensordot` if possible, as when I implemented `dask` did not have `einsum`. In the other cases, we use `dask.atop` with `np.einsum`. In your example, `bench(100, False, ['t'], '...i,...i')` uses `dask.tensordot`, `bench(100, True, ['t'], '...i,...i')` uses `np.einsum`. `bench(100, True, [], ...i,...i->...i)` also uses `np.einsum`. But I have no idea yet why `dot(a, b, dims=[])` is faster than `a * b`.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray.dot() dask problems 316618290
383754250	https://github.com/pydata/xarray/issues/2074#issuecomment-383754250	https://api.github.com/repos/pydata/xarray/issues/2074	MDEyOklzc3VlQ29tbWVudDM4Mzc1NDI1MA==	shoyer 1217238	2018-04-23T23:28:27Z	2018-04-23T23:28:27Z	MEMBER	+1 for using dask.array.einsum in xarray.dot.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray.dot() dask problems 316618290
383724765	https://github.com/pydata/xarray/issues/2074#issuecomment-383724765	https://api.github.com/repos/pydata/xarray/issues/2074	MDEyOklzc3VlQ29tbWVudDM4MzcyNDc2NQ==	crusaderky 6213168	2018-04-23T21:12:04Z	2018-04-23T21:12:14Z	MEMBER	"What are the arrays used as input for this case?" See the blob in the opening post. "dot reduces one dimension from each input": `xarray.dot(a, b, dims=[])` is functionally identical to `a * b` to my understanding, but faster in some edge cases - which I can't make any sense of.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray.dot() dask problems 316618290
383723159	https://github.com/pydata/xarray/issues/2074#issuecomment-383723159	https://api.github.com/repos/pydata/xarray/issues/2074	MDEyOklzc3VlQ29tbWVudDM4MzcyMzE1OQ==	jakirkham 3019665	2018-04-23T21:06:42Z	2018-04-23T21:06:42Z	NONE	"from what I understand `da.dot` implements... a limited special case of `da.einsum`?" Basically `dot` is an inner product. Certainly inner products can be formulated using Einstein notation (i.e. calling with `einsum`). The question is whether the performance keeps up with that formulation. It sounds like chunking currently causes some problems, IIUC. However things like `dot` and `tensordot` dispatch through optimized BLAS routines. In theory `einsum` should do the same ( https://github.com/numpy/numpy/pull/9425 ), but the experimental data still shows a few warts. For example, `matmul` is implemented with `einsum`, but is slower than `dot`. ( https://github.com/numpy/numpy/issues/7569 ) ( https://github.com/numpy/numpy/issues/8957 ) Pure `einsum` implementations seem to perform similarly. "I ran a few more benchmarks..." What are the arrays used as input for this case? "...apparently `xarray.dot` on a dask backend is situationally faster than all other implementations when you are not reducing on any dimensions..." Having a little trouble following this. `dot` reduces one dimension from each input, except if one of the inputs is 0-D (i.e. a scalar), in which case it is just multiplying a single scalar through an array. Is that what you are referring to?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray.dot() dask problems 316618290
383711323	https://github.com/pydata/xarray/issues/2074#issuecomment-383711323	https://api.github.com/repos/pydata/xarray/issues/2074	MDEyOklzc3VlQ29tbWVudDM4MzcxMTMyMw==	crusaderky 6213168	2018-04-23T20:26:59Z	2018-04-23T20:26:59Z	MEMBER	@jakirkham from what I understand `da.dot` implements... a limited special case of `da.einsum`? Ok this is funny. I ran a few more benchmarks, and apparently `xarray.dot` on a dask backend is situationally faster than all other implementations when you are not reducing on any dimensions - which I understand is really the same as (a * b), except that faster than (a * b)?!? ``` def bench(...): ... if not dims: print("a * b (numpy backend):") %timeit a.compute() * b.compute() print("a * b (dask backend):") %timeit (a * b).compute() bench(100, False, [], '...i,...i->...i') bench( 20, False, [], '...i,...i->...i') bench(100, True, [], '...i,...i->...i') bench( 20, True, [], '...i,...i->...i') ``` Output: ``` bench(100, False, [], ...i,...i->...i) xarray.dot(numpy backend): 291 ms ± 5.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) numpy.einsum: 296 ms ± 10 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) xarray.dot(dask backend): dimension 's' on 0th function argument to apply_ufunc with dask='parallelized' consists of multiple chunks, but is also a core dimension. To fix, rechunk into a single dask array chunk along this dimension, i.e., `.rechunk({'s': -1})`, but beware that this may significantly increase memory usage. dask.array.einsum: 296 ms ± 21.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) a * b (numpy backend) 279 ms ± 9.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) a * b (dask backend) 241 ms ± 8.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) bench(20, False, [], ...i,...i->...i) xarray.dot(numpy backend): 345 ms ± 6.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) numpy.einsum: 342 ms ± 4.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) xarray.dot(dask backend): dimension 's' on 0th function argument to apply_ufunc with dask='parallelized' consists of multiple chunks, but is also a core dimension. To fix, rechunk into a single dask array chunk along this dimension, i.e., `.rechunk({'s': -1})`, but beware that this may significantly increase memory usage. dask.array.einsum: 347 ms ± 6.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) a * b (numpy backend) 319 ms ± 2.53 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) a * b (dask backend) 247 ms ± 5.37 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) bench(100, True, [], ...i,...i->...i) xarray.dot(numpy backend): 477 ms ± 8.29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) numpy.einsum: 514 ms ± 35.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) xarray.dot(dask backend): 241 ms ± 8.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) dask.array.einsum: 497 ms ± 21.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) a * b (numpy backend) 439 ms ± 27.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) a * b (dask backend) 517 ms ± 41.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) bench(20, True, [], ...i,...i->...i) xarray.dot(numpy backend): 572 ms ± 13.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) numpy.einsum: 563 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) xarray.dot(dask backend): 268 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) dask.array.einsum: 563 ms ± 5.11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) a * b (numpy backend) 501 ms ± 5.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) a * b (dask backend) 922 ms ± 93.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` This particular bit is shocking and I can't wrap my head around it?!? ``` bench(100, True, [], ...i,...i->...i) xarray.dot(dask backend): 241 ms ± 8.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) a * b (dask backend) 517 ms ± 41.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) bench(20, True, [], ...i,...i->...i) xarray.dot(dask backend): 268 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) a * b (dask backend) 922 ms ± 93.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray.dot() dask problems 316618290
383651390	https://github.com/pydata/xarray/issues/2074#issuecomment-383651390	https://api.github.com/repos/pydata/xarray/issues/2074	MDEyOklzc3VlQ29tbWVudDM4MzY1MTM5MA==	mrocklin 306380	2018-04-23T17:12:04Z	2018-04-23T17:12:04Z	MEMBER	See also https://github.com/dask/dask/issues/2225	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray.dot() dask problems 316618290
383637379	https://github.com/pydata/xarray/issues/2074#issuecomment-383637379	https://api.github.com/repos/pydata/xarray/issues/2074	MDEyOklzc3VlQ29tbWVudDM4MzYzNzM3OQ==	jakirkham 3019665	2018-04-23T16:26:51Z	2018-04-23T16:26:51Z	NONE	Might be worth revisiting how `da.dot` is implemented as well. That would be the least amount of rewriting for you and would generally be nice for Dask users. If you have not already, @crusaderky, it would be nice to raise an issue over at Dask with a straight Dask benchmark comparing Dask Array's `dot` and `einsum`. cc @mrocklin	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray.dot() dask problems 316618290
383419435	https://github.com/pydata/xarray/issues/2074#issuecomment-383419435	https://api.github.com/repos/pydata/xarray/issues/2074	MDEyOklzc3VlQ29tbWVudDM4MzQxOTQzNQ==	fujiisoup 6815844	2018-04-22T23:05:39Z	2018-04-22T23:06:05Z	MEMBER	`xr.dot` was implemented before dask/dask#3412 was merged, and thus it is not very efficient for dask now. "The proposed solution is to simply wait for dask/dask#3412 to reach the next release and then reimplement xarray.dot to use dask.array.einsum." Agreed. I think the reimplementation would be easy, https://github.com/pydata/xarray/blob/99b457ce5859bd949cfea4671db5150c7297843a/xarray/core/computation.py#L1039-L1043 `dask='parallelized'` -> `dask='allow'`	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray.dot() dask problems 316618290
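
The two subscript strings discussed in the comments above can be illustrated with plain NumPy. This is only a minimal sketch: the `bench` function from the thread is not shown in full, so the array shapes and variable names here are assumptions, and it says nothing about the dask timings. It shows that `'...i,...i->...i'` performs no reduction (the same values as `a * b`), while `'...i,...i'` reduces over the last axis. It also sets `OMP_NUM_THREADS` to 1 before importing NumPy, following the benchmarking advice in the thread.

```python
import os

# Per the advice in the thread: pin BLAS/OpenMP to a single thread to avoid
# oversubscription when benchmarking. This must happen before NumPy is imported.
os.environ.setdefault("OMP_NUM_THREADS", "1")

import numpy as np

# Hypothetical inputs; the shapes used by the original bench() are not shown.
rng = np.random.default_rng(0)
a = rng.random((4, 3, 5))
b = rng.random((4, 3, 5))

# No reduction: the output keeps the shared trailing axis "i",
# so the result has the same values as the elementwise product.
elementwise = np.einsum("...i,...i->...i", a, b)

# Reduction over "i": an inner product along the last axis.
inner = np.einsum("...i,...i", a, b)

print(elementwise.shape, inner.shape)
```

The first call is the case the thread describes as "not reducing on any dimensions"; the second is the inner-product case that `tensordot` can also handle.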

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
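
The view at the top of this page ("rows where issue = 316618290 sorted by updated_at descending") can be reproduced against this schema. The following is a rough sketch using Python's standard `sqlite3` module with an in-memory database. It is an assumption-laden illustration: the table is trimmed to three columns, the foreign-key references are dropped, three real `id`/`updated_at` pairs from the table above are loaded, and one clearly hypothetical row for a different issue is added to show the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed version of the schema above (assumption: only the columns
# needed for the query; REFERENCES clauses omitted).
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [id] INTEGER PRIMARY KEY,
       [updated_at] TEXT,
       [issue] INTEGER
    );
    CREATE INDEX [idx_issue_comments_issue]
        ON [issue_comments] ([issue]);
    """
)

rows = [
    (383419435, "2018-04-22T23:06:05Z", 316618290),
    (385116430, "2018-04-27T23:13:20Z", 316618290),
    (383817119, "2018-04-24T06:22:39Z", 316618290),
    (1, "2018-01-01T00:00:00Z", 1),  # hypothetical row for another issue
]
conn.executemany("INSERT INTO issue_comments VALUES (?, ?, ?)", rows)

# ISO 8601 timestamps in the same timezone sort correctly as text.
cursor = conn.execute(
    "SELECT id FROM issue_comments WHERE issue = ? ORDER BY updated_at DESC",
    (316618290,),
)
ids = [row[0] for row in cursor]
print(ids)
```

The index on `[issue]` is what lets SQLite avoid a full table scan for the `WHERE issue = ?` filter; the sort on `updated_at` is still done on the matching rows.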