issue_comments


13 rows where issue = 403378297 sorted by updated_at descending

Issue: Extra dimension on first argument passed into apply_ufunc
stale[bot] (NONE) · 2020-12-26T08:11:45Z · https://github.com/pydata/xarray/issues/2714#issuecomment-751333859

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

birdsarah (NONE) · 2019-01-26T04:22:42Z · https://github.com/pydata/xarray/issues/2714#issuecomment-457800642

Unfortunately, neither of your suggestions works. With the second, I get the error:

  • ValueError: parameter 'value': expected array with shape (10000, 100), got (10000, 245)

With the first:

  • ValueError: operands could not be broadcast together with shapes (5000,100,245) (100,)

It's okay. I have something that works. And it's deterministic :D

shoyer (MEMBER) · 2019-01-26T04:05:58Z (edited 2019-01-26T04:07:24Z) · https://github.com/pydata/xarray/issues/2714#issuecomment-457799714

I think this would also work (due to the intrinsic broadcasting behavior of NumPy functions):

```python
def get_chebyshev_distances_xarray_ufunc(array, dye_array):
    return abs(array - dye_array).max(axis=-2)
```
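For intuition, here is a small NumPy-only sketch (toy shapes of my own, not code from the issue) of how the subtraction broadcasts before a max reduction. Note that which axis to reduce depends on where the core dimension ends up: reducing the last (feature) axis yields one distance per pair.

```python
import numpy as np

# Hypothetical small arrays: 4 "all_sites" rows (with an inserted size-1
# axis) and 2 "dye_sites" rows, each with 3 features on the last axis.
array = np.arange(12.0).reshape(4, 1, 3)   # shape (4, 1, 3)
dye_array = np.ones((2, 3))                # shape (2, 3)

# Subtraction broadcasts (4, 1, 3) against (2, 3) to (4, 2, 3).
diff = np.abs(array - dye_array)
print(diff.shape)               # (4, 2, 3)

# Reducing the feature axis gives one value per (all_sites, dye_sites) pair.
print(diff.max(axis=-1).shape)  # (4, 2)
```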

shoyer (MEMBER) · 2019-01-26T04:02:09Z (edited 2019-01-26T04:07:00Z) · https://github.com/pydata/xarray/issues/2714#issuecomment-457799469

OK, I think you're doing this right. You want an output with shape ['all_sites', 'dye_sites'], right?

My suggestion would be to add an explicit call to np.broadcast_arrays() at the start of your applied function. This will make the dimensions a little easier to understand:

```python
def get_chebyshev_distances_xarray_ufunc(array, dye_array):
    array, dye_array = np.broadcast_arrays(array, dye_array)
    # array is a 3D numpy array with logical dimensions
    # ['all_sites', 'dye_sites', 'dim_1']
    # dye_array is a 3D numpy array with logical dimensions
    # ['all_sites', 'dye_sites', 'dim_1']

    # compute the distance matrix
    # return a numpy array with logical dimensions ['all_sites', 'dye_sites']
```

(edit note: fixed dimensions)
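A minimal sketch of what np.broadcast_arrays does to a pair of shapes like these (toy sizes, assumed for illustration):

```python
import numpy as np

# Toy stand-ins: (5, 1, 3) for all_sites x 1 x features,
# and (2, 3) for dye_sites x features.
array = np.zeros((5, 1, 3))
dye_array = np.zeros((2, 3))

# broadcast_arrays returns views expanded to a common shape, so both
# inputs become explicitly 3-D with matching dimensions.
array_b, dye_b = np.broadcast_arrays(array, dye_array)
print(array_b.shape, dye_b.shape)  # (5, 2, 3) (5, 2, 3)
```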

shoyer (MEMBER) · 2019-01-26T04:05:43Z · https://github.com/pydata/xarray/issues/2714#issuecomment-457799701

actually, scratch that

birdsarah (NONE) · 2019-01-26T03:47:08Z · https://github.com/pydata/xarray/issues/2714#issuecomment-457798552

> The behavior is definitely deterministic, if hard to understand!

phew!

birdsarah (NONE) · 2019-01-26T03:46:36Z · https://github.com/pydata/xarray/issues/2714#issuecomment-457798514

> Maybe it would help to describe what you were trying to do here.

Sure - thanks!

I have a dataset that's long: the sample code shown below uses 200k rows, but the full dataset will be much larger. I'm interested in pairwise distances, but not for all rows: just the distances for a few thousand rows with respect to the full 200k.

Here's how I hack this together:

My starting array:

```python
df_array = xr.DataArray(df)
df_array = df_array.rename({PIVOT: 'all_sites'})
df_array
```

```
<xarray.DataArray (all_sites: 185084, dim_1: 245)>
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.]])
Coordinates:
  * all_sites  (all_sites) object '0.gravatar.com||gprofiles.js||Gravatar.init' ... 'курсы.1сентября.рф||store.js||store.set'
  * dim_1      (dim_1) object 'AnalyserNode.connect' ... 'HTMLCanvasElement.previousSibling'
```

My slice of the array:

```python
sites_of_interest = [...]  # sub-list of all sites
df_dye_array = xr.DataArray(df.loc[sites_of_interest])
df_dye_array = df_dye_array.rename({PIVOT: 'dye_sites'})
```

Chunk:

```python
df_array_c = df_array.chunk({'all_sites': 10_000})
df_dye_array_c = df_dye_array.chunk({'dye_sites': 100})
```

Get distances:

```python
def get_chebyshev_distances_xarray_ufunc(df_array, df_dye_array):
    chebyshev = lambda x: np.abs(df_array[:, 0, :] - x).max(axis=1)
    result = np.apply_along_axis(chebyshev, 1, df_dye_array).T
    return result

distance_array = xr.apply_ufunc(
    get_chebyshev_distances_xarray_ufunc,
    df_array_c,
    df_dye_array_c,
    dask='parallelized',
    output_dtypes=[float],
    input_core_dims=[['dim_1'], ['dim_1']],
)
```

What I get out is an array with the length of my original array and the width of my sites of interest, where each number is the Chebyshev distance between the respective rows of the original dataset (which are 245 long).
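As a sanity check, the intended result can be mirrored with a plain-NumPy sketch (toy shapes and variable names are my own, not from the issue): the pairwise Chebyshev distance between every row of the full matrix and every row of the subset.

```python
import numpy as np

rng = np.random.default_rng(0)
full = rng.random((8, 5))     # stand-in for the (185084, 245) matrix
subset = full[[1, 4, 6]]      # stand-in for the sites of interest

# Broadcast (8, 1, 5) against (1, 3, 5) to (8, 3, 5), then reduce the
# feature axis: Chebyshev distance = max absolute componentwise difference.
distances = np.abs(full[:, None, :] - subset[None, :, :]).max(axis=-1)
print(distances.shape)  # (8, 3): rows of `full` x rows of `subset`

# Rows that were picked for the subset are distance 0 from themselves.
assert distances[1, 0] == 0.0
```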

shoyer (MEMBER) · 2019-01-26T02:48:01Z (edited 2019-01-26T03:40:29Z) · https://github.com/pydata/xarray/issues/2714#issuecomment-457794714

The notion of "core dimensions" in apply_ufunc() is definitely quite tricky to understand.

That said, I think this is (mostly) doing the right thing:

  • Your inputs have dimensions ['row_a', 'dim_1'] and ['row_b', 'dim_1'].
  • Xarray broadcasts over dimensions that aren't included in "core dimensions", so inputs are broadcast to have dimensions like ['row_a', 'row_b', 'dim_1'] and ['row_a', 'row_b', 'dim_1'].

This is probably especially confusing because the unlabeled versions of da and db are given "broadcastable" shapes (1000, 1, 100) and (1000, 100) rather than the fully "broadcast" shapes of (1000, 1000, 100) and (1000, 1000, 100), which would make it more obvious what is going on.
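The shape arithmetic here can be checked directly against NumPy's broadcasting rule (a toy check of my own, not code from the issue):

```python
import numpy as np

# A "broadcastable" pair: (1000, 1, 100) against (1000, 100).
# NumPy aligns trailing axes, so (1000, 100) acts as (1, 1000, 100),
# and the fully "broadcast" result shape is (1000, 1000, 100).
result = np.broadcast_shapes((1000, 1, 100), (1000, 100))
print(result)  # (1000, 1000, 100)
```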

For your specific use case: maybe you meant to specify input_core_dims=[['row_a'], ['row_b']] instead? That version would give inputs with dimensions like ['dim_1', 'row_a'] and ['dim_1', 'row_b'].

More generally: I think we really need a version of apply() that doesn't do this confusing broadcasting and dimension reordering. See https://github.com/pydata/xarray/issues/1130 for discussion about that.

shoyer (MEMBER) · 2019-01-26T03:40:14Z · https://github.com/pydata/xarray/issues/2714#issuecomment-457798131

> unlabeled versions of da and db are given "broadcastable" shapes (1, 1000, 100) and (1000, 100)
>
> Is it (1000, 1, 100) as my code seems to return, or, as you said (1, 1000, 100)? Is it deterministic?

That was a typo (I'll fix it). It should be (1000, 1, 100).

The behavior is definitely deterministic, if hard to understand!

birdsarah (NONE) · 2019-01-26T03:38:31Z · https://github.com/pydata/xarray/issues/2714#issuecomment-457798029

Can you clarify one thing in your note:

> unlabeled versions of da and db are given "broadcastable" shapes (1, 1000, 100) and (1000, 100)

Is it (1000, 1, 100) as my code seems to return, or, as you said (1, 1000, 100)? Is it deterministic?

shoyer (MEMBER) · 2019-01-26T03:38:20Z · https://github.com/pydata/xarray/issues/2714#issuecomment-457798018

Maybe it would help to describe what you were trying to do here. What should "one unit" of your calculation look like?

birdsarah (NONE) · 2019-01-26T03:32:10Z · https://github.com/pydata/xarray/issues/2714#issuecomment-457797658

Hi, I will have to think about your response a lot more to see if I can wrap my head around it.

In the meantime I'm not sure I have my input_core_dims correct, but that's the only configuration I could get to work.

I chunk along row_a and row_b, and I output a new array with the dims [row_a, row_b].

By trial and error, the above configuration is the only one I could find where I got out the dims I was expecting and didn't get an error.

birdsarah (NONE) · 2019-01-26T00:09:24Z · https://github.com/pydata/xarray/issues/2714#issuecomment-457777423

I should add, if I pass in plain numpy arrays then I do not have this problem. But ultimately I want to pass in a chunked DataArray, as described here: http://xarray.pydata.org/en/stable/dask.html#automatic-parallelization (this is my whole reason for using xarray).

The workaround is easy (I just use da[:, 0, :]), but it's odd!

1 reaction: +1

Table schema:

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```