
issue_comments


12 rows where author_association = "CONTRIBUTOR" and issue = 416962458, sorted by updated_at descending


Comments per user: nbren12 (6), hmaarrfk (4), ashwinvis (2)

Issue: Performance: numpy indexes small amounts of data 1000 faster than xarray (416962458)
Each entry below lists the comment id, author (user id), created/updated timestamps, and author_association, followed by the comment URL, body, and any reactions.
1306327743 · hmaarrfk (90008) · created 2022-11-07T22:45:07Z · updated 2022-11-07T22:45:07Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/2799#issuecomment-1306327743

As I've recently been going down this performance rabbit hole, I think the discussion around https://github.com/pydata/xarray/issues/7045 is relevant; it provides some additional historical context as to why this performance penalty might be happening.

Reactions: none

786813358 · hmaarrfk (90008) · created 2021-02-26T18:19:28Z · updated 2021-02-26T18:19:28Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/2799#issuecomment-786813358

I hope the following can help users who struggle with the speed of xarray.

I've found that when doing numerical computation, I often use xarray to grab all the metadata relevant to my computation: scale, chromaticity, experimental information.

Eventually, I create a function that acts as a barrier:
  • xarray input (high-level experimental data)
  • computation-parameter output (the low-level information the implementation needs)

The low-level implementation can then operate on fast numpy arrays. I've found this to be the central tension between high-level APIs that sanitize inputs (xarray routines like `_validate_indexers` and `_broadcast_indexes`) and low-level APIs that are simply interested in moving and computing data.

For the example that @nbren12 brought up originally, it might be better to add xarray routines (if they don't exist already) that return fast iterators over the underlying numpy arrays, given the set of dimensions the user cares about.

Reactions: none

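A minimal sketch of the barrier pattern described in the comment above. The function names and the "scale" attribute are illustrative assumptions, not xarray API; only `DataArray.data`, `.attrs`, and `.copy(data=...)` are real:

```python
import numpy as np
import xarray as xr

def lowlevel_scale(data: np.ndarray, scale: float) -> np.ndarray:
    # Pure-numpy kernel: no xarray overhead inside the hot path.
    return data * scale

def scale_image(da: xr.DataArray) -> xr.DataArray:
    # Barrier: unpack the xarray object once, up front.
    scale = float(da.attrs.get("scale", 1.0))  # hypothetical metadata key
    result = lowlevel_scale(da.data, scale)
    # Re-wrap so callers keep dims, coords, and attrs.
    return da.copy(data=result)

da = xr.DataArray(np.ones((4, 4)), dims=("y", "x"), attrs={"scale": 2.0})
print(scale_image(da).values[0, 0])  # 2.0
```
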
786764651 · nbren12 (1386642) · created 2021-02-26T16:51:50Z · updated 2021-02-26T16:51:50Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/2799#issuecomment-786764651

@jhamman Weren't you talking about an xarray lite (TM) package?

Reactions: none

553294966 · nbren12 (1386642) · created 2019-11-13T08:32:05Z · updated 2019-11-13T08:32:16Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/2799#issuecomment-553294966

This Variable workaround is awesome, @max-sixty. Are there any guidelines on when to use Variable vs. DataArray? Some calculations (e.g. fast differences and derivative/stencil operations) seem cleaner without explicit coordinate labels.

Reactions: none

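For reference, a minimal sketch of that Variable workaround (`DataArray.variable` and `xr.Variable` are real xarray API; the difference kernel is a made-up example):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(1024, 1024), dims=("y", "x"))

# Drop to the Variable layer: dims and attrs survive, but there is no
# coordinate/index machinery to pay for on each operation.
v = da.variable

# A forward difference along x, written without coordinate labels.
diff = v[:, 1:] - v[:, :-1]

# Re-wrap as a DataArray once labels matter again.
out = xr.DataArray(diff)
```
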
552652019 · hmaarrfk (90008) · created 2019-11-11T22:47:47Z · updated 2019-11-11T22:47:47Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/2799#issuecomment-552652019

Sure, I just wanted to note that this operation should be more or less constant time, as opposed to dependent on the size of the array. Somebody had mentioned that it should increase with the size of the array.

Reactions: +1 × 1

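That constant-time claim is easy to check directly; a quick sketch with timeit:

```python
import timeit
import numpy as np

# Basic slicing returns a view, so the cost should not grow with array size.
for size in (256, 1024, 4096):
    n = np.zeros((size, size))
    t = timeit.timeit(lambda: n[size // 4 : size // 2, size // 4 : size // 2],
                      number=100_000)
    print(f"{size}x{size}: {t / 100_000 * 1e9:.0f} ns per slice")
```
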
552619589 · hmaarrfk (90008) · created 2019-11-11T21:16:36Z · updated 2019-11-11T21:16:36Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/2799#issuecomment-552619589

Hmm, slicing should basically be a no-op.

The fact that xarray makes it about 100x slower is a real killer. It seems from this conversation that it might be hard to work around.

```python
import xarray as xr
import numpy as np

n = np.zeros(shape=(1024, 1024))
x = xr.DataArray(n, dims=('y', 'x'))
the_slice = np.s_[256:512, 256:512]

%timeit n[the_slice]
# 186 ns ± 0.778 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit x[the_slice]
# 70.3 µs ± 593 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
Reactions: none

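A workaround consistent with those timings: when labels aren't needed, slice the underlying array through `x.data` (or the `x.variable` layer) and pay the DataArray cost only at the boundaries. A small sketch reusing the setup above:

```python
import numpy as np
import xarray as xr

n = np.zeros(shape=(1024, 1024))
x = xr.DataArray(n, dims=('y', 'x'))
the_slice = np.s_[256:512, 256:512]

sub_numpy = x.data[the_slice]    # plain numpy view: nanoseconds
sub_var = x.variable[the_slice]  # Variable path: keeps dims, skips indexes
sub_da = x[the_slice]            # full DataArray path: microseconds
```
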
539352070 · ashwinvis (9155111) · created 2019-10-08T06:08:27Z · updated 2019-10-08T06:08:48Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/2799#issuecomment-539352070

I suspect system jitter in the profiling, since the time for `Dataset.isel` went up. It would be useful to run `sudo python -m pyperf system tune` before running profilers/benchmarks.

Reactions: none

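Beyond system tuning, pyperf can also drive the benchmark itself, calibrating loop counts and spawning worker processes to average out jitter. A minimal sketch, run as a script (the toy Dataset is an assumption):

```python
import pyperf

runner = pyperf.Runner()
runner.timeit(
    "dataset isel",
    stmt="ds.isel(x=0)",
    setup=(
        "import numpy as np, xarray as xr\n"
        "ds = xr.Dataset({'a': (('x', 'y'), np.zeros((100, 100)))})"
    ),
)
```
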
538366978 · ashwinvis (9155111) · created 2019-10-04T11:57:10Z · updated 2019-10-04T11:57:10Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/2799#issuecomment-538366978

> At first sight it looks somewhat like a hybrid between Cython (for the ahead-of-time transpiling to C++) and numba (for having Python-compatible syntax).

Not really. Pythran always releases the GIL and does a bunch of optimizations between transpilation and compilation.

A good approach would be to try out different compilers and see what performance is obtained without losing readability (https://github.com/pydata/xarray/issues/2799#issuecomment-469444519). See scikit-image/scikit-image#4199, where the package transonic was being experimentally tested to replace Cython-only code with Python code plus type hints. As a bonus, you get to switch between Cython, Pythran, and Numba.

Reactions: none

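As a rough illustration of the decorator-based approach these tools share, here is the same idea with numba's `@njit` (transonic's `@boost` is analogous, with the backend left switchable); the kernel itself is a made-up example:

```python
import numpy as np
from numba import njit

@njit  # compiled to machine code on first call
def forward_diff(a):
    out = np.empty((a.shape[0], a.shape[1] - 1))
    for i in range(a.shape[0]):
        for j in range(a.shape[1] - 1):
            out[i, j] = a[i, j + 1] - a[i, j]
    return out

# First call triggers compilation; subsequent calls run at C-like speed.
forward_diff(np.random.rand(64, 64))
```
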
469451210 · nbren12 (1386642) · created 2019-03-04T22:40:07Z · updated 2019-03-04T22:40:07Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/2799#issuecomment-469451210

Sure, I've been using that as a workaround as well. Unfortunately, that approach throws away all the nice info (e.g. metadata, coordinates) that xarray objects have, and it requires duplicating much of xarray's indexing logic.

Reactions: +1 × 3

469447632 · nbren12 (1386642) · created 2019-03-04T22:27:57Z · updated 2019-03-04T22:27:57Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/2799#issuecomment-469447632

@max-sixty I tend to agree that this use case could be outside the scope of xarray. It sounds like significant progress might require re-implementing core xarray objects in C/Cython. Without more than a 10x improvement, I would probably just continue using numpy arrays.

Reactions: none

469443856 · nbren12 (1386642) · created 2019-03-04T22:15:49Z · updated 2019-03-04T22:15:49Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/2799#issuecomment-469443856

Thanks so much @shoyer. I didn't realize there was that much overhead in a single function call. OTOH, 2x slower than numpy would be way better than 1000x.

After looking at the profiling info more, I tend to agree with your estimate of a 10x maximum speed-up. A couple of particularly slow functions (e.g. `Dataset._validate_indexers`) account for about 75% of the run time, but the remaining 25% is split across several other pure-Python routines.

Reactions: +1 × 1

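The profiling behind numbers like these can be reproduced with the standard library; a minimal sketch using cProfile (the toy Dataset is an assumption):

```python
import cProfile
import pstats

import numpy as np
import xarray as xr

ds = xr.Dataset({"a": (("x", "y"), np.zeros((100, 100)))})

# Profile many small isel calls and print the most expensive functions;
# pure-Python routines like _validate_indexers show up near the top.
cProfile.run("for _ in range(10000): ds.isel(x=0)", "isel.prof")
pstats.Stats("isel.prof").sort_stats("cumulative").print_stats(10)
```
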
469394020 · nbren12 (1386642) · created 2019-03-04T19:45:11Z · updated 2019-03-04T19:45:11Z · CONTRIBUTOR
https://github.com/pydata/xarray/issues/2799#issuecomment-469394020

cc @rabernat

Reactions: +1 × 1


Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);