issue_comments: 457798514

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/2714#issuecomment-457798514 https://api.github.com/repos/pydata/xarray/issues/2714 457798514 MDEyOklzc3VlQ29tbWVudDQ1Nzc5ODUxNA== 1796208 2019-01-26T03:46:36Z 2019-01-26T03:46:36Z NONE

> Maybe it would help to describe what you were trying to do here.

Sure - thanks!

I have a long dataset: the sample code below uses about 200k rows, and the full dataset will be much larger. I'm interested in pairwise distances, but not between all rows. I only need the distances from a few thousand rows to the full 200k.

Here's how I hack this together:

My starting array

```python
df_array = xr.DataArray(df)
df_array = df_array.rename({PIVOT: 'all_sites'})
df_array

<xarray.DataArray (all_sites: 185084, dim_1: 245)>
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
Coordinates:
  * all_sites  (all_sites) object '0.gravatar.com||gprofiles.js||Gravatar.init' ... 'кÑ\x83Ñ\x80Ñ\x81Ñ\x8b.1Ñ\x81енÑ\x82Ñ\x8fбÑ\x80Ñ\x8f.Ñ\x80Ñ\x84||store.js||store.set'
  * dim_1      (dim_1) object 'AnalyserNode.connect' ... 'HTMLCanvasElement.previousSibling'
```

My slice of the array

```python
sites_of_interest = [...]  # sub list of all sites
df_dye_array = xr.DataArray(df.loc[sites_of_interest])
df_dye_array = df_dye_array.rename({PIVOT: 'dye_sites'})
```

Chunk

```python
df_array_c = df_array.chunk({'all_sites': 10_000})
df_dye_array_c = df_dye_array.chunk({'dye_sites': 100})
```

Get distances

```python
def get_chebyshev_distances_xarray_ufunc(df_array, df_dye_array):
    chebyshev = lambda x: np.abs(df_array[:,0,:] - x).max(axis=1)
    result = np.apply_along_axis(chebyshev, 1, df_dye_array).T
    return result

distance_array = xr.apply_ufunc(
    get_chebyshev_distances_xarray_ufunc,
    df_array_c,
    df_dye_array_c,
    dask='parallelized',
    output_dtypes=[float],
    input_core_dims=[['dim_1'], ['dim_1']],
)
```

What I get out is an array whose length is that of my original array and whose width is the number of sites of interest. Each value is the Chebyshev distance between the corresponding pair of rows of the original dataset (each row has 245 entries).
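To make the shape and meaning of that result concrete, here is a minimal NumPy-only sketch on toy data. It is an illustration under stated assumptions, not the code used above: the function name `chebyshev_to_subset`, the random 6 x 4 input, and the chosen subset indices are all hypothetical, and it uses plain broadcasting rather than `xr.apply_ufunc` or dask.

```python
import numpy as np


def chebyshev_to_subset(all_rows, subset_rows):
    """Chebyshev distance from every row of `all_rows` to every row of `subset_rows`.

    Returns an array of shape (len(all_rows), len(subset_rows)) whose [i, j]
    entry is max_k |all_rows[i, k] - subset_rows[j, k]|.
    """
    # Broadcast (n_all, 1, n_features) against (1, n_subset, n_features).
    diff = np.abs(all_rows[:, np.newaxis, :] - subset_rows[np.newaxis, :, :])
    # Reduce over the feature axis to get one distance per (row, subset row) pair.
    return diff.max(axis=-1)


# Hypothetical toy data standing in for the real (all_sites, dim_1) array.
rng = np.random.default_rng(0)
all_rows = rng.random((6, 4))
subset_rows = all_rows[[1, 4]]  # stand-in for the "sites of interest"

distances = chebyshev_to_subset(all_rows, subset_rows)
print(distances.shape)
```

The result has one row per original row and one column per row of interest, matching the layout described above. Note that the broadcast builds a full (n_all, n_subset, n_features) intermediate, which is presumably impractical at 185084 x a few thousand x 245 without working in chunks.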

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  403378297