
issue_comments


5 rows where issue = 462859457 (Multidimensional dask coordinates unexpectedly computed) and user = 1828519 (djhoese), sorted by updated_at descending




508862961 · djhoese (CONTRIBUTOR) · 2019-07-05T21:10:50Z
https://github.com/pydata/xarray/issues/3068#issuecomment-508862961

Ah, good call. The transpose currently in xarray would still be a problem though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional dask coordinates unexpectedly computed 462859457
507656176 · djhoese (CONTRIBUTOR) · created 2019-07-02T12:31:54Z · updated 2019-07-02T12:33:15Z
https://github.com/pydata/xarray/issues/3068#issuecomment-507656176

@shoyer Understood. That explains why something like this wasn't caught before, but what would be the best short-term fix?

For the long term, I also understand that there isn't really a good way to check the equality of two dask arrays. I wonder if dask's graph optimization could be used to "simplify" two dask arrays' graphs separately and then compare the graphs for equality. For example, two dask arrays created by doing da.zeros((10, 10), chunks=2) + 5 should theoretically be equal because their dask graphs are made up of the same tasks.

Edit: by "short term fix" I mean: what is the best way to avoid the unnecessary transpose? Or is this not even the right way to approach it? Should dask be changed to avoid the unnecessary transpose, should xarray be changed to not do the transpose, or something else?
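The graph-comparison idea can be sketched in plain Python. This is a hypothetical illustration, not dask's API: `normalize_graph` and the toy graphs below are made up for this sketch, and simply rename task keys so two independently built but structurally identical graphs compare equal. (Dask's deterministic `tokenize`-based task naming is the closest real mechanism to this.)

```python
def normalize_graph(graph):
    """Rename task keys to position-independent placeholders so two
    graphs built from the same operations compare equal."""
    names = {key: f"task-{i}" for i, key in enumerate(sorted(graph))}

    def rename(value):
        # Recurse into task tuples, replacing references to other keys.
        if isinstance(value, tuple):
            return tuple(rename(v) for v in value)
        return names.get(value, value) if isinstance(value, str) else value

    return {names[key]: rename(task) for key, task in graph.items()}

# Two toy "graphs" for zeros((2,)) + 5, built independently with
# different suffixes in their task names:
g1 = {"zeros-abc": ("zeros", (2,)), "add-def": ("add", "zeros-abc", 5)}
g2 = {"zeros-xyz": ("zeros", (2,)), "add-uvw": ("add", "zeros-xyz", 5)}

print(normalize_graph(g1) == normalize_graph(g2))  # True: same tasks
```

The sketch glosses over real issues (key ordering, non-string literals, cycles), but it shows why two arrays built by the same sequence of operations could in principle be recognized as equal without computing them.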

507410467 · djhoese (CONTRIBUTOR) · 2019-07-01T20:20:05Z
https://github.com/pydata/xarray/issues/3068#issuecomment-507410467

Modifying this line to:

```python
if dims == expanded_var.dims:
    return expanded_var
return expanded_var.transpose(*dims)
```

avoids this issue, at least for the + case.
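The effect of such a guard can be illustrated with plain NumPy (a sketch with a hypothetical `maybe_transpose` helper, not xarray's actual code): skipping an identity-order transpose hands back the very same array object, so a later object-identity or equivalence check still passes.

```python
import numpy as np

def maybe_transpose(arr, axes):
    """Skip a no-op transpose so the underlying array object is unchanged."""
    if tuple(axes) == tuple(range(arr.ndim)):
        return arr  # identity order: return the same object
    return arr.transpose(*axes)

x = np.zeros((10, 10))
print(maybe_transpose(x, (0, 1)) is x)  # True: no-op, same object
print(maybe_transpose(x, (1, 0)) is x)  # False: a real transpose makes a new object
```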

507405717 · djhoese (CONTRIBUTOR) · 2019-07-01T20:05:51Z
https://github.com/pydata/xarray/issues/3068#issuecomment-507405717

Ok, another update. In the previous example I accidentally added the lons coordinate DataArray with the dimensions redefined, as (('y', 'x'), lons2), which is technically redundant, but it worked (no progress bar).

However, if I fix this redundancy and do:

```python
a = xr.DataArray(da.zeros((10, 10), chunks=2), dims=('y', 'x'), coords={'lons': lons2})
b = xr.DataArray(da.zeros((10, 10), chunks=2), dims=('y', 'x'), coords={'lons': lons2})
with ProgressBar():
    c = a + b
```

I do get a progress bar again (lons2 is being computed). I've tracked it down to this transpose, which transposes when it doesn't need to and thereby changes the dask array:

https://github.com/pydata/xarray/blob/master/xarray/core/variable.py#L1223

I'm not sure whether this would be considered a bug in dask or in xarray. I'm also not sure why the redundant version of the example worked.

507396912 · djhoese (CONTRIBUTOR) · 2019-07-01T19:38:06Z
https://github.com/pydata/xarray/issues/3068#issuecomment-507396912

Ok, I'm getting a little more of an understanding of this. The main issue is that the dask array is not literally the same object, because I'm creating it twice. If I create a single dask array and pass it to both:

```python
lons = da.zeros((10, 10), chunks=2)
a = xr.DataArray(da.zeros((10, 10), chunks=2), dims=('y', 'x'),
                 coords={'y': np.arange(10), 'x': np.arange(10), 'lons': (('y', 'x'), lons)})
b = xr.DataArray(da.zeros((10, 10), chunks=2), dims=('y', 'x'),
                 coords={'y': np.arange(10), 'x': np.arange(10), 'lons': (('y', 'x'), lons)})
```

I still get the progress bar, because xarray creates two new DataArray objects for this lons coordinate. So lons_data_arr.variable._data is not lons_data_arr2.variable._data, causing the equivalency check here to fail.
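That identity-based short-circuit is why sharing one object avoids the compute. A minimal sketch of the pattern (a hypothetical `lazy_equiv` helper mirroring the idea, not xarray's actual implementation):

```python
def lazy_equiv(a, b):
    """Cheap equivalence check for lazy arrays: identical objects are
    trivially equal; anything else needs an expensive value comparison."""
    if a is b:
        return True   # same object: equal without computing anything
    return None       # unknown: caller must fall back to computing values

lons_shared = object()  # stands in for one dask array reused by both coords
print(lazy_equiv(lons_shared, lons_shared))  # True
print(lazy_equiv(object(), object()))        # None -> would trigger a compute
```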

If I make a single DataArray that becomes the coordinate variable then it seems to work:

```python
lons2 = xr.DataArray(lons, dims=('y', 'x'))
a = xr.DataArray(da.zeros((10, 10), chunks=2), dims=('y', 'x'),
                 coords={'y': np.arange(10), 'x': np.arange(10), 'lons': (('y', 'x'), lons2)})
b = xr.DataArray(da.zeros((10, 10), chunks=2), dims=('y', 'x'),
                 coords={'y': np.arange(10), 'x': np.arange(10), 'lons': (('y', 'x'), lons2)})
```

I get no progress bar.


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette