issue_comments

4 rows where author_association = "MEMBER" and issue = 351000813 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
413779588 https://github.com/pydata/xarray/issues/2370#issuecomment-413779588 https://api.github.com/repos/pydata/xarray/issues/2370 MDEyOklzc3VlQ29tbWVudDQxMzc3OTU4OA== fujiisoup 6815844 2018-08-17T07:16:43Z 2018-08-17T07:16:43Z MEMBER

Does it work to simply specify an explicit dtype in the sum?

Yes. If the original array is np.float32 and we specify dtype=np.float64, then the calculation is performed with np.nansum, so we can avoid using bottleneck. But if we set dtype=np.float32 (the same as the input dtype), then bottleneck is used.
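For illustration, a minimal sketch of the behavior described above (the array contents are arbitrary; only the dtype argument matters here):

import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(1_000_000).astype(np.float32))

# Requesting a float64 accumulator routes the reduction through np.nansum,
# bypassing bottleneck even when it is installed.
total_f64 = da.sum(dtype=np.float64)

# Keeping the accumulator at the input dtype leaves bottleneck in play,
# which is where the precision differences show up.
total_f32 = da.sum(dtype=np.float32)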

But I do think it still probably offers a meaningful speedup in many cases....

How about making the numpy function the default and using bottleneck only when it is specified explicitly? It would not simplify our code, though...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Inconsistent results when calculating sums on float32 arrays w/ bottleneck installed 351000813
413775977 https://github.com/pydata/xarray/issues/2370#issuecomment-413775977 https://api.github.com/repos/pydata/xarray/issues/2370 MDEyOklzc3VlQ29tbWVudDQxMzc3NTk3Nw== shoyer 1217238 2018-08-17T06:58:21Z 2018-08-17T06:58:21Z MEMBER

There has been discussion about changing the conda-forge dependencies for xarray: https://github.com/conda-forge/xarray-feedstock/issues/5. Bottleneck definitely isn’t a true required dependency.

Does it work to simply specify an explicit dtype in the sum?

I also wonder if it’s really worth the hassle of using bottleneck here, given these numerical precision issues and how it can’t be used with dask. But I do think it still probably offers a meaningful speedup in many cases....
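A rough numpy-only illustration of the precision concern (np.cumsum stands in for a single-pass float32 accumulation, which is roughly how bottleneck's nansum accumulates; bottleneck is not required to run this):

import numpy as np

# Ten million float32 values of 0.1; the exact sum is about 1,000,000.
x = np.full(10_000_000, 0.1, dtype=np.float32)

print(x.sum(dtype=np.float64))  # float64 accumulator: ~1,000,000 as expected
print(x.sum())                  # numpy's pairwise float32 summation: still close
print(np.cumsum(x)[-1])         # sequential float32 accumulation: drifts well
                                # away from 1,000,000 once the running total
                                # dwarfs each 0.1 increment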

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Inconsistent results when calculating sums on float32 arrays w/ bottleneck installed 351000813
413774281 https://github.com/pydata/xarray/issues/2370#issuecomment-413774281 https://api.github.com/repos/pydata/xarray/issues/2370 MDEyOklzc3VlQ29tbWVudDQxMzc3NDI4MQ== fujiisoup 6815844 2018-08-17T06:48:54Z 2018-08-17T06:48:54Z MEMBER

Right now bottleneck is automatically chosen if it is installed, which is rather annoying since the xarray recipe on conda-forge ships with bottleneck even though it should be an optional dependency.

I didn't notice that. I also think that bottleneck should be an optional dependency. @shoyer, can you check this? Maybe this file defines the dependencies?

Perhaps we could make it possible to set the ops engine (to either numpy or bottleneck) and dtype (float32, float64) via set_options()

I think this is a reasonable option.
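A sketch of what the set_options() idea might look like; the option name is an assumption for illustration only and was not an existing xarray API at the time of this thread:

import numpy as np
import xarray as xr

da = xr.DataArray(np.full(10_000_000, 0.1, dtype=np.float32))

# Hypothetical option name: force the pure-numpy path for reductions
# regardless of whether bottleneck is installed.
with xr.set_options(use_bottleneck=False):
    print(da.sum())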

Personally, I think we can consider stopping using bottleneck entirely or making it completely optional. With the dask backend, bottleneck is not being used; we only use it in the nan-aggregation methods for the numpy backend and in the rolling operations. After #1837, our rolling with pure numpy is not terribly slow compared with bottleneck.
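For reference, the rolling operations mentioned here are the DataArray.rolling(...) window reductions, e.g. (a minimal sketch):

import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(1000).astype(np.float32), dims="x")

# A 30-point moving mean; when bottleneck is installed this kind of window
# reduction can dispatch to bottleneck's move_* functions, otherwise a
# numpy-based implementation is used.
rolled = da.rolling(x=30, min_periods=1).mean()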

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Inconsistent results when calculating sums on float32 arrays w/ bottleneck installed 351000813
413398724 https://github.com/pydata/xarray/issues/2370#issuecomment-413398724 https://api.github.com/repos/pydata/xarray/issues/2370 MDEyOklzc3VlQ29tbWVudDQxMzM5ODcyNA== fujiisoup 6815844 2018-08-16T02:01:38Z 2018-08-16T02:02:29Z MEMBER

After #2236, sum(skipna=False) will use the numpy function even if bottleneck is installed. But in other cases, we still use bottleneck.

I am actually not sure that automatically casting to float64 or switching to the numpy function is the correct path. Some people may want to use float32 to save memory, and others may want to use bn.nansum as it is more efficient than numpy's counterpart. The best algorithm may depend on the use case.

My proposal is to make these methods more explicit, e.g. by supporting an engine='bottleneck' keyword.
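A small sketch of the two behaviors discussed in this comment; the engine keyword shown at the end is only the proposal, not an argument xarray actually accepts:

import numpy as np
import xarray as xr

da = xr.DataArray(np.full(10_000_000, 0.1, dtype=np.float32))

# After #2236, skipna=False takes the numpy path even with bottleneck installed.
print(da.sum(skipna=False))

# The default skipna=True for float data still dispatches to bottleneck's nansum.
print(da.sum())

# The proposal: an explicit, opt-in keyword (hypothetical, shown commented out).
# da.sum(engine="bottleneck")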

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Inconsistent results when calculating sums on float32 arrays w/ bottleneck installed 351000813

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);