issue_comments

10 rows where issue = 425320466 sorted by updated_at descending

user 6

  • rabernat 2
  • shoyer 2
  • dcherian 2
  • jmichel-otb 2
  • C-H-Simpson 1
  • stale[bot] 1

author_association 3

  • MEMBER 6
  • CONTRIBUTOR 2
  • NONE 2

issue 1

  • Allow grouping by dask variables · 10
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1101512178 https://github.com/pydata/xarray/issues/2852#issuecomment-1101512178 https://api.github.com/repos/pydata/xarray/issues/2852 IC_kwDOAMm_X85Bp73y dcherian 2448579 2022-04-18T15:45:41Z 2022-04-18T15:45:41Z MEMBER

You can do this with flox now. Eventually we can update xarray to support grouping by a dask variable.

The limitation will be that the user will have to provide "expected groups" so that we can construct the output coordinate.
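
As a hedged illustration of what this would look like (assuming the flox package; the Dataset `ds`, its "label" variable, and the 1000-label range are hypothetical):

```
import numpy as np
import flox.xarray

# expected_groups supplies the output coordinate up front, so the
# dask-backed group variable never needs to be computed eagerly.
result = flox.xarray.xarray_reduce(
    ds,                               # Dataset or DataArray to reduce
    ds["label"],                      # group variable, possibly dask-backed
    func="mean",
    expected_groups=np.arange(1000),  # the "expected groups": known labels
)
```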

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
  Allow grouping by dask variables 425320466
1100985429 https://github.com/pydata/xarray/issues/2852#issuecomment-1100985429 https://api.github.com/repos/pydata/xarray/issues/2852 IC_kwDOAMm_X85Bn7RV stale[bot] 26384082 2022-04-18T00:43:46Z 2022-04-18T00:43:46Z NONE

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow grouping by dask variables 425320466
653016746 https://github.com/pydata/xarray/issues/2852#issuecomment-653016746 https://api.github.com/repos/pydata/xarray/issues/2852 MDEyOklzc3VlQ29tbWVudDY1MzAxNjc0Ng== rabernat 1197350 2020-07-02T13:48:39Z 2020-07-02T13:48:39Z MEMBER

👀 cc @chiaral

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow grouping by dask variables 425320466
652898319 https://github.com/pydata/xarray/issues/2852#issuecomment-652898319 https://api.github.com/repos/pydata/xarray/issues/2852 MDEyOklzc3VlQ29tbWVudDY1Mjg5ODMxOQ== C-H-Simpson 20053498 2020-07-02T09:29:32Z 2020-07-02T09:29:55Z NONE

I'm going to share a code snippet that might be useful to people reading this issue. I wanted to group my data by month and year, and take the mean for each group.

I did not want to use resample, as I wanted the dimensions to be ('month', 'year') rather than ('time'). The obvious way of doing this is to use a pd.MultiIndex to create a 'year_month' stacked coordinate, but I found this did not have good performance.

My solution was to use xr.apply_ufunc, as suggested above. I think it should be OK with dask chunked data, provided it is not chunked in time.

Here is the code:

```
import numpy as np
import xarray as xr


def _grouped_mean(
    data: np.ndarray, months: np.ndarray, years: np.ndarray
) -> np.ndarray:
    """Similar to grouping by a year_month MultiIndex, but faster.

    Should be used wrapped by _wrapped_grouped_mean.
    """
    unique_months = np.sort(np.unique(months))
    unique_years = np.sort(np.unique(years))
    old_shape = list(data.shape)
    new_shape = old_shape[:-1]
    new_shape.append(unique_months.shape[0])
    new_shape.append(unique_years.shape[0])

    output = np.zeros(new_shape)

    # Average the time steps belonging to each (month, year) pair.
    for i_month, j_year in np.ndindex(output.shape[2:]):
        indices = np.intersect1d(
            (months == unique_months[i_month]).nonzero(),
            (years == unique_years[j_year]).nonzero(),
        )
        output[:, :, i_month, j_year] = np.mean(data[:, :, indices], axis=-1)

    return output


def _wrapped_grouped_mean(da: xr.DataArray) -> xr.DataArray:
    """Similar to grouping by a year_month MultiIndex, but faster.

    Wraps a numpy-style function with xr.apply_ufunc.
    """
    Y = xr.apply_ufunc(
        _grouped_mean,
        da,
        da.time.dt.month,
        da.time.dt.year,
        input_core_dims=[['lat', 'lon', 'time'], ['time'], ['time']],
        output_core_dims=[['lat', 'lon', 'month', 'year']],
    )
    Y = Y.assign_coords(
        {'month': np.sort(np.unique(da.time.dt.month)),
         'year': np.sort(np.unique(da.time.dt.year))})
    return Y
```
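
For reference, a hypothetical call (assuming a DataArray `da` with dims ('lat', 'lon', 'time') and a datetime-like 'time' coordinate, matching the core dims above):

```
monthly = _wrapped_grouped_mean(da)  # result dims: ('lat', 'lon', 'month', 'year')
```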

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow grouping by dask variables 425320466
478624700 https://github.com/pydata/xarray/issues/2852#issuecomment-478624700 https://api.github.com/repos/pydata/xarray/issues/2852 MDEyOklzc3VlQ29tbWVudDQ3ODYyNDcwMA== jmichel-otb 10595679 2019-04-01T15:23:35Z 2019-04-01T15:23:35Z CONTRIBUTOR

That's a tough question ;) In the current dataset I have 950 unique labels, but in my use cases it can be a lot more (e.g. agricultural crops) or a lot less (administrative boundaries or regions).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow grouping by dask variables 425320466
478621867 https://github.com/pydata/xarray/issues/2852#issuecomment-478621867 https://api.github.com/repos/pydata/xarray/issues/2852 MDEyOklzc3VlQ29tbWVudDQ3ODYyMTg2Nw== shoyer 1217238 2019-04-01T15:16:30Z 2019-04-01T15:16:30Z MEMBER

Roughly how many unique labels do you have?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow grouping by dask variables 425320466
478563375 https://github.com/pydata/xarray/issues/2852#issuecomment-478563375 https://api.github.com/repos/pydata/xarray/issues/2852 MDEyOklzc3VlQ29tbWVudDQ3ODU2MzM3NQ== dcherian 2448579 2019-04-01T12:43:03Z 2019-04-01T12:43:03Z MEMBER

It sounds like there is an apply_ufunc solution to your problem, but I don't know how to write it! ;)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow grouping by dask variables 425320466
478488200 https://github.com/pydata/xarray/issues/2852#issuecomment-478488200 https://api.github.com/repos/pydata/xarray/issues/2852 MDEyOklzc3VlQ29tbWVudDQ3ODQ4ODIwMA== jmichel-otb 10595679 2019-04-01T08:37:42Z 2019-04-01T08:37:42Z CONTRIBUTOR

Many thanks for your answers @shoyer and @rabernat.

I am relatively new to xarray and dask, and I am trying to determine whether they can fit our need for analysis of large stacks of Sentinel data on our cluster.

I will give dask.array.histogram a try, as @rabernat suggested.

I also had the following idea. Given that:

  • I know exactly beforehand which labels (or groups) I want to analyse,
  • .where(label=xxx).mean('variable') does the job perfectly for one label,

I do not actually need the discovery of unique labels that groupby() performs; what I really need is an efficient way to perform multiple where() aggregate operations at once, to avoid traversing the data multiple times.

Maybe there is already something like that in xarray, or maybe this is something I can derive from the implementation of where()?
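
A hedged sketch of that idea (the Dataset `ds` and its 'label'/'variable' names are hypothetical; each where() is lazy, so dask can evaluate all the aggregations in one graph pass):

```
import dask

labels = [1, 2, 3]  # the labels known beforehand
means = {
    lab: ds["variable"].where(ds["label"] == lab).mean()
    for lab in labels
}
(means,) = dask.compute(means)  # compute every masked mean at once
```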

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow grouping by dask variables 425320466
478415169 https://github.com/pydata/xarray/issues/2852#issuecomment-478415169 https://api.github.com/repos/pydata/xarray/issues/2852 MDEyOklzc3VlQ29tbWVudDQ3ODQxNTE2OQ== shoyer 1217238 2019-04-01T02:31:58Z 2019-04-01T02:31:58Z MEMBER

The current design of GroupBy.apply() in xarray is entirely ignorant of dask: it simply uses a for loop over the grouped variable to build up a computation with high-level array operations.

This makes operations that group over large keys stored in dask inefficient. This could be done efficiently (dask.dataframe does this, and might be worth trying in your case), but it's a more challenging distributed computing problem, and xarray's current data model would not know how large a dimension to create for the returned arrays (doing this properly would require supporting arrays with unknown dimension sizes).
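
Schematically, the pattern being described looks like this (a sketch only, with a hypothetical `ds`; not xarray's actual source):

```
import xarray as xr

results = []
for label_value, group in ds.groupby("label"):  # split: labels must be in memory
    results.append(group.mean())                # apply: high-level (lazy) array ops
combined = xr.concat(results, dim="label")      # combine
```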

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow grouping by dask variables 425320466
476678007 https://github.com/pydata/xarray/issues/2852#issuecomment-476678007 https://api.github.com/repos/pydata/xarray/issues/2852 MDEyOklzc3VlQ29tbWVudDQ3NjY3ODAwNw== rabernat 1197350 2019-03-26T14:41:59Z 2019-03-26T14:41:59Z MEMBER

Quoting the original issue:

```
label    (y, x) uint16 dask.array<shape=(10980, 10980), chunksize=(200, 10980)>
...
geoms_ds.groupby('label')
```

It is very hard to make this sort of groupby lazy, because you are grouping over the variable label itself. Groupby uses a split-apply-combine paradigm to transform the data. The apply and combine steps can be lazy. But the split step cannot. Xarray uses the group variable to determine how to index the array, i.e. which items belong in which group. To do this, it needs to read the whole variable into memory.

In this specific example, it sounds like what you want is to compute the histogram of labels. That could be accomplished without groupby. For example, you could use apply_ufunc together with dask.array.histogram.

So my recommendation is to think of a way to accomplish what you want that does not involve groupby.
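
A hedged sketch of that suggestion (`geoms_ds` is from the quoted issue; the number of distinct labels is an assumption):

```
import dask.array as da

labels = geoms_ds["label"].data  # dask-backed uint16 label array
n = 1000                         # assumed number of distinct labels
# da.histogram mirrors np.histogram but builds a lazy graph;
# bins and range must be given explicitly for dask arrays.
counts, edges = da.histogram(labels, bins=n, range=(0, n))
counts = counts.compute()        # one pass over the chunks
```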

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow grouping by dask variables 425320466

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);