github: issue_comments: 4 rows where author_association = "NONE", issue = 374025325 and user = 22492773 sorted by updated

4 rows where author_association = "NONE", issue = 374025325 and user = 22492773 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
935769790	https://github.com/pydata/xarray/issues/2511#issuecomment-935769790	https://api.github.com/repos/pydata/xarray/issues/2511	IC_kwDOAMm_X843xra-	pl-marasco 22492773	2021-10-06T08:47:24Z	2021-10-06T08:47:24Z	NONE	@bzah I've been testing your code and I can confirm that the timing increases when .compute() isn't used. I've noticed that, with your modification, the dask array seems to be computed more than once per sample. I ran some tests using a modified version of the code from #3237, and here are my observations: Assuming there is only one sample object after the resample, the expected result is 1 compute, and that's what we get if we call .compute() before .argmax(). If .compute() is removed, I get 3 computations in total. As a confirmation, if you increase the number of samples, the number of computes is a multiple of 3. I still don't know the reason, or whether this is correct, but it seems odd to me; it could, however, explain the increase in time. @dcherian @shyer do you know if all this makes any sense? Should .isel() automatically trigger the computation, or should it return a lazy array?
Here is the code I've been using (it works only with the modification proposed by @bzah):
```
import numpy as np
import pandas as pd
import dask
import xarray as xr


class Scheduler:
    """ From: https://stackoverflow.com/questions/53289286/ """

    def __init__(self, max_computes=20):
        self.max_computes = max_computes
        self.total_computes = 0

    def __call__(self, dsk, keys, **kwargs):
        # Count every graph execution and fail loudly past the limit.
        self.total_computes += 1
        if self.total_computes > self.max_computes:
            raise RuntimeError(
                "Too many dask computations were scheduled: {}".format(
                    self.total_computes
                )
            )
        return dask.get(dsk, keys, **kwargs)


scheduler = Scheduler()

with dask.config.set(scheduler=scheduler):
    COORDS = dict(
        dim_0=pd.date_range("2042-01-01", periods=31, freq="D"),
        dim_1=range(0, 500),
        dim_2=range(0, 500),
    )
    da = xr.DataArray(
        np.random.rand(31 * 500 * 500).reshape((31, 500, 500)),
        dims=("dim_0", "dim_1", "dim_2"),
        coords=COORDS,
    ).chunk(dict(dim_0=-1, dim_1=100, dim_2=100))
    print(da)

    resampled = da.resample(dim_0="MS")
    for label, sample in resampled:
        # sample = sample.compute()
        idx = sample.argmax("dim_0")
        sampled = sample.isel(dim_0=idx)

    print("Total number of computes: %d" % scheduler.total_computes)
```	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Array indexing with dask arrays 374025325
932169790	https://github.com/pydata/xarray/issues/2511#issuecomment-932169790	https://api.github.com/repos/pydata/xarray/issues/2511	IC_kwDOAMm_X843j8g-	pl-marasco 22492773	2021-10-01T12:04:55Z	2021-10-01T12:04:55Z	NONE	@bzah I tested your patch with the following code:
```
import numpy as np
import xarray as xr
from distributed import Client

client = Client()

da = xr.DataArray(
    np.random.rand(20 * 3500 * 3500).reshape((20, 3500, 3500)),
    dims=('time', 'x', 'y'),
).chunk(dict(time=-1, x=100, y=100))

idx = da.argmax('time').compute()
da.isel(time=idx)
```
In my case it seems to take the same time with or without the patch, but I would like to know whether it is the same for you. L.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Array indexing with dask arrays 374025325
930309991	https://github.com/pydata/xarray/issues/2511#issuecomment-930309991	https://api.github.com/repos/pydata/xarray/issues/2511	IC_kwDOAMm_X843c2dn	pl-marasco 22492773	2021-09-29T15:56:33Z	2021-09-29T15:56:33Z	NONE	@pl-marasco Ok that's strange. I should have saved my use case :/ I will try to reproduce it and will provide a gist of it soon. What I noticed in my use case is that it provokes a computation. Is that the reason for what you consider slow? Could it be related to #3237?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Array indexing with dask arrays 374025325
930124657	https://github.com/pydata/xarray/issues/2511#issuecomment-930124657	https://api.github.com/repos/pydata/xarray/issues/2511	IC_kwDOAMm_X843cJNx	pl-marasco 22492773	2021-09-29T12:22:06Z	2021-09-29T12:22:06Z	NONE	@bzah I've been testing your solution and it doesn't seem to be as slow as you mentioned. Do you have a specific test we could run so that we can make a more robust comparison?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Array indexing with dask arrays 374025325

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
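
The filter described in the page title (author_association = "NONE", issue = 374025325, user = 22492773, sorted by updated_at descending) can be reproduced against this schema with plain SQL. The following is a minimal sketch using Python's standard `sqlite3` module and an in-memory database. The helper name `fetch_comments` is hypothetical, only a subset of columns is populated, and the fifth row is invented purely to show that the filter excludes it.

```python
import sqlite3

# In-memory stand-in for the real database (an assumption for illustration).
# The [users] and [issues] tables are not created; SQLite does not enforce
# foreign keys by default, so the REFERENCES clauses are accepted as-is.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
""")

# (id, user, updated_at, author_association, issue)
rows = [
    (930124657, 22492773, "2021-09-29T12:22:06Z", "NONE", 374025325),
    (930309991, 22492773, "2021-09-29T15:56:33Z", "NONE", 374025325),
    (932169790, 22492773, "2021-10-01T12:04:55Z", "NONE", 374025325),
    (935769790, 22492773, "2021-10-06T08:47:24Z", "NONE", 374025325),
    # Hypothetical row that the filter should exclude.
    (1, 99, "2021-10-07T00:00:00Z", "MEMBER", 374025325),
]
conn.executemany(
    "INSERT INTO issue_comments "
    "(id, [user], updated_at, author_association, issue) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)


def fetch_comments(conn, author_association, issue, user):
    """Return (id, updated_at) pairs matching the filter, newest first."""
    query = """
        SELECT id, updated_at
        FROM issue_comments
        WHERE author_association = ?
          AND issue = ?
          AND [user] = ?
        ORDER BY updated_at DESC
    """
    return conn.execute(query, (author_association, issue, user)).fetchall()


result = fetch_comments(conn, "NONE", 374025325, 22492773)
for comment_id, updated_at in result:
    print(comment_id, updated_at)
```

Because the timestamps are stored as ISO 8601 text in a single time zone, ordering them as strings matches chronological order, so no date parsing is needed for the sort.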