issue_comments: 457798514

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/2714#issuecomment-457798514	https://api.github.com/repos/pydata/xarray/issues/2714	457798514	MDEyOklzc3VlQ29tbWVudDQ1Nzc5ODUxNA==	1796208	2019-01-26T03:46:36Z	2019-01-26T03:46:36Z	NONE	> Maybe it would help to describe what you were trying to do here.

Sure - thanks!

I have a long dataset: the sample shown below is 200k rows, but the full dataset will be much larger. I'm interested in pairwise distances, but not for all rows: just the distances from a few thousand rows to the full 200k.

Here's how I hack this together:

My starting array

```python
df_array = xr.DataArray(df)
df_array = df_array.rename({PIVOT: 'all_sites'})
df_array

<xarray.DataArray (all_sites: 185084, dim_1: 245)>
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
Coordinates:
  * all_sites  (all_sites) object '0.gravatar.com||gprofiles.js||Gravatar.init' ... 'ÐºÑ\x83Ñ\x80Ñ\x81Ñ\x8b.1Ñ\x81ÐµÐ½Ñ\x82Ñ\x8fÐ±Ñ\x80Ñ\x8f.Ñ\x80Ñ\x84||store.js||store.set'
  * dim_1      (dim_1) object 'AnalyserNode.connect' ... 'HTMLCanvasElement.previousSibling'
```

My slice of the array

```python
sites_of_interest = [...]  # sub list of all sites
df_dye_array = xr.DataArray(df.loc[sites_of_interest])
df_dye_array = df_dye_array.rename({PIVOT: 'dye_sites'})
```

Chunk

```python
df_array_c = df_array.chunk({'all_sites': 10_000})
df_dye_array_c = df_dye_array.chunk({'dye_sites': 100})
```

Get distances

```python
def get_chebyshev_distances_xarray_ufunc(df_array, df_dye_array):
    chebyshev = lambda x: np.abs(df_array[:,0,:] - x).max(axis=1)
    result = np.apply_along_axis(chebyshev, 1, df_dye_array).T
    return result

distance_array = xr.apply_ufunc(
    get_chebyshev_distances_xarray_ufunc,
    df_array_c, df_dye_array_c,
    dask='parallelized',
    output_dtypes=[float],
    input_core_dims=[['dim_1'], ['dim_1']],
)
```

What I get out is an array with the length of my original array and the width of my sites of interest, where each number is the Chebyshev distance between the respective rows of the original dataset (each 245 values long).	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		403378297
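
For reference, the computation described in the comment above (the Chebyshev distance from every row of the full array to every row of a smaller subset) can be sketched in plain NumPy. This is not the commenter's xarray/dask approach, only a minimal illustration of the same result shape. The function name `chebyshev_to_subset`, its `chunk_size` parameter, and the random sample data are hypothetical and do not come from the original comment.

```python
import numpy as np


def chebyshev_to_subset(all_rows, subset_rows, chunk_size=10_000):
    """Return an (n_all, n_subset) array of Chebyshev distances.

    Hypothetical helper: processes `all_rows` in blocks so the temporary
    (block, n_subset, n_features) difference array stays small.
    """
    all_rows = np.asarray(all_rows, dtype=float)
    subset_rows = np.asarray(subset_rows, dtype=float)
    out = np.empty((all_rows.shape[0], subset_rows.shape[0]), dtype=float)
    for start in range(0, all_rows.shape[0], chunk_size):
        block = all_rows[start:start + chunk_size]
        # Broadcast (block, 1, features) against (1, subset, features),
        # then take the largest absolute difference over the feature axis.
        diff = np.abs(block[:, None, :] - subset_rows[None, :, :])
        out[start:start + chunk_size] = diff.max(axis=2)
    return out


# Made-up sample data: 1,000 rows of 245 features, 3 rows of interest.
rng = np.random.default_rng(0)
all_rows = rng.random((1_000, 245))
subset_rows = all_rows[[3, 10, 42]]

distances = chebyshev_to_subset(all_rows, subset_rows, chunk_size=256)
print(distances.shape)  # (1000, 3)
```

Processing the long axis in blocks plays a role similar to the `chunk` calls in the comment: it bounds peak memory, at the cost of a Python-level loop.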