issue_comments

13 rows where issue = 978356586 and user = 2448579 sorted by updated_at descending

user 1

  • dcherian · 13 ✖

issue 1

  • Enable `flox` in `GroupBy` and `resample` · 13 ✖

author_association 1

  • MEMBER 13
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1126852038 https://github.com/pydata/xarray/pull/5734#issuecomment-1126852038 https://api.github.com/repos/pydata/xarray/issues/5734 IC_kwDOAMm_X85DKmXG dcherian 2448579 2022-05-15T03:31:50Z 2022-05-15T03:31:50Z MEMBER

and @andersy005 !

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Enable `flox` in `GroupBy` and `resample` 978356586
1125236627 https://github.com/pydata/xarray/pull/5734#issuecomment-1125236627 https://api.github.com/repos/pydata/xarray/issues/5734 IC_kwDOAMm_X85DEb-T dcherian 2448579 2022-05-12T17:14:13Z 2022-05-12T17:14:13Z MEMBER

> a global/context option that changes the default value of method

Unfortunately, the optimal method depends on the distribution of group labels across chunks, so a global option doesn't make sense. It would make sense to create a method="auto" and use that, but it doesn't exist yet ("cohorts" is the closest).
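To make "distribution of group labels across chunks" concrete, here is a minimal pure-Python sketch. It is not flox code: the helper names `chunks_per_label` and `group_into_cohorts` are hypothetical, and the idea that a "cohort" is a set of labels occupying the same chunks is an assumption used only for illustration.

```python
from collections import defaultdict


def chunks_per_label(labels, chunk_sizes):
    """Map each group label to the set of chunk indices in which it appears."""
    if sum(chunk_sizes) != len(labels):
        raise ValueError("chunk sizes must add up to the number of labels")
    where = defaultdict(set)
    start = 0
    for chunk_index, size in enumerate(chunk_sizes):
        for label in labels[start:start + size]:
            where[label].add(chunk_index)
        start += size
    return dict(where)


def group_into_cohorts(labels, chunk_sizes):
    """Collect labels that occupy exactly the same set of chunks (illustrative only)."""
    cohorts = defaultdict(list)
    for label, chunk_set in chunks_per_label(labels, chunk_sizes).items():
        cohorts[frozenset(chunk_set)].append(label)
    return {tuple(sorted(key)): sorted(members) for key, members in cohorts.items()}


# Month-of-year labels for two years of monthly data, stored in chunks of 6.
months = list(range(1, 13)) * 2
print(group_into_cohorts(months, [6, 6, 6, 6]))
```

With a different chunking of the same labels the grouping changes, which is why a single global default is hard to choose.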

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Enable `flox` in `GroupBy` and `resample` 978356586
1124194834 https://github.com/pydata/xarray/pull/5734#issuecomment-1124194834 https://api.github.com/repos/pydata/xarray/issues/5734 IC_kwDOAMm_X85DAdoS dcherian 2448579 2022-05-11T19:14:24Z 2022-05-11T19:15:44Z MEMBER

Thanks for testing it out! I was going to ping xclim when this finally got merged. Presumably you haven't found any bugs?


You can pass method as .mean(..., method=...). Clearly this needs docs :)

We could actually consider adding flox_kwargs to the groupby constructor since a method is really only dependent on the distribution of group labels across the chunks. Right now, I'd just like this to get merged :)

For resampling-type operations, we use "cohorts" by default, which generalizes to "blockwise" when applicable but is slower at graph-construction time. Note that "blockwise" only works if all members of a group are in a single block. So if you are resampling to yearly but a year of data occupies multiple chunks, you want "cohorts", not "blockwise".
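The "blockwise" condition above can be checked by hand. The following is a rough sketch in plain Python, not part of xarray or flox; `can_use_blockwise` is a hypothetical helper, and chunks are modelled simply as a list of sizes along the grouped dimension.

```python
def can_use_blockwise(labels, chunk_sizes):
    """Return True if every group's members sit inside a single chunk."""
    if sum(chunk_sizes) != len(labels):
        raise ValueError("chunk sizes must add up to the number of labels")
    first_chunk = {}  # label -> index of the first chunk where it was seen
    start = 0
    for chunk_index, size in enumerate(chunk_sizes):
        for label in labels[start:start + size]:
            # setdefault records the chunk on first sight, returns it afterwards
            if first_chunk.setdefault(label, chunk_index) != chunk_index:
                return False
        start += size
    return True


# Two years of monthly data resampled to yearly: the group label is the year.
years = [2000] * 12 + [2001] * 12
print(can_use_blockwise(years, [12, 12]))      # one chunk per year
print(can_use_blockwise(years, [6, 6, 6, 6]))  # each year spans two chunks
```

When the check fails, as in the second call, the comment's advice applies: use "cohorts" rather than "blockwise".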

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Enable `flox` in `GroupBy` and `resample` 978356586
1117613319 https://github.com/pydata/xarray/pull/5734#issuecomment-1117613319 https://api.github.com/repos/pydata/xarray/issues/5734 IC_kwDOAMm_X85CnW0H dcherian 2448579 2022-05-04T17:26:08Z 2022-05-04T17:26:08Z MEMBER

Thanks @Illviljan! I'm having trouble getting the inheritance order right and keeping mypy happy. Help is very welcome!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Enable `flox` in `GroupBy` and `resample` 978356586
1117497457 https://github.com/pydata/xarray/pull/5734#issuecomment-1117497457 https://api.github.com/repos/pydata/xarray/issues/5734 IC_kwDOAMm_X85Cm6hx dcherian 2448579 2022-05-04T15:33:30Z 2022-05-04T15:34:06Z MEMBER

@pydata/xarray This is ready to go. It's mostly one adaptor function and a lot of new tests. It does need docs; I can add those in a future PR.

By default, we use a strategy ("split-reduce") that is very similar to our current one with dask arrays, so users will have to explicitly choose a new strategy to see much improvement. For resampling, we can choose a sensible default ("cohorts") that should show only improvements and no regressions.
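As a conceptual illustration only, and assuming "split-reduce" means what the name suggests (split the data by group label, then reduce each group separately), the idea can be sketched in plain Python. `split_reduce_mean` is a hypothetical name; this is not the xarray or flox implementation.

```python
from collections import defaultdict


def split_reduce_mean(values, labels):
    """Split values by label, then reduce each group to its mean."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)  # split step
    # reduce step: one independent reduction per group
    return {label: sum(members) / len(members) for label, members in groups.items()}


print(split_reduce_mean([1.0, 2.0, 3.0, 4.0], ["a", "b", "a", "b"]))
```

The cost of such an approach grows with the number of groups, since each group is handled separately.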

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Enable `flox` in `GroupBy` and `resample` 978356586
1092097037 https://github.com/pydata/xarray/pull/5734#issuecomment-1092097037 https://api.github.com/repos/pydata/xarray/issues/5734 IC_kwDOAMm_X85BGBQN dcherian 2448579 2022-04-07T19:00:28Z 2022-04-07T19:00:28Z MEMBER

@pydata/xarray this is blocked by https://github.com/pydata/xarray/issues/6430 but is ready for review.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Enable `flox` in `GroupBy` and `resample` 978356586
966624963 https://github.com/pydata/xarray/pull/5734#issuecomment-966624963 https://api.github.com/repos/pydata/xarray/issues/5734 IC_kwDOAMm_X845nYbD dcherian 2448579 2021-11-11T21:05:54Z 2021-11-11T21:05:54Z MEMBER

This builds on #5950 so that should be reviewed and merged first.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Enable `flox` in `GroupBy` and `resample` 978356586
933160264 https://github.com/pydata/xarray/pull/5734#issuecomment-933160264 https://api.github.com/repos/pydata/xarray/issues/5734 IC_kwDOAMm_X843nuVI dcherian 2448579 2021-10-04T05:42:51Z 2021-11-11T20:58:53Z MEMBER

!!!

The only failures are in test_units.py so now I think we can figure out how to implement this cleanly.

FAILED xarray/tests/test_units.py::TestDataArray::test_computation_objects[float64-method_groupby-data]
FAILED xarray/tests/test_units.py::TestDataArray::test_computation_objects[float64-method_groupby_bins-data]
FAILED xarray/tests/test_units.py::TestDataArray::test_computation_objects[int64-method_groupby-data]
FAILED xarray/tests/test_units.py::TestDataArray::test_computation_objects[int64-method_groupby_bins-data]
FAILED xarray/tests/test_units.py::TestDataArray::test_resample[float64] - pi...
FAILED xarray/tests/test_units.py::TestDataArray::test_resample[int64] - pint...
FAILED xarray/tests/test_units.py::TestDataset::test_computation_objects[float64-data-method_groupby_bins]
FAILED xarray/tests/test_units.py::TestDataset::test_computation_objects[int64-data-method_groupby_bins]
FAILED xarray/tests/test_units.py::TestDataset::test_resample[float64-data]
FAILED xarray/tests/test_units.py::TestDataset::test_resample[int64-data] - p...

I like @max-sixty's suggestion of generating the reductions like generate_ops.py. It seems like a good first step would be to refactor the existing reductions in a separate PR.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Enable `flox` in `GroupBy` and `resample` 978356586
965574480 https://github.com/pydata/xarray/pull/5734#issuecomment-965574480 https://api.github.com/repos/pydata/xarray/issues/5734 IC_kwDOAMm_X845jX9Q dcherian 2448579 2021-11-10T17:31:22Z 2021-11-10T21:52:17Z MEMBER

OK, CI isn't using the numpy_groupies code path, for reasons I don't understand. Does anyone see a reason why this might happen?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Enable `flox` in `GroupBy` and `resample` 978356586
964409224 https://github.com/pydata/xarray/pull/5734#issuecomment-964409224 https://api.github.com/repos/pydata/xarray/issues/5734 IC_kwDOAMm_X845e7eI dcherian 2448579 2021-11-09T18:14:38Z 2021-11-09T18:14:38Z MEMBER

Benchmarks are looking good (npg=True means numpy_groupies is used). Big gains (10-20x) for a large number of groups (100), especially with dask.

```
[  2.78%] ··· groupby.GroupBy.time_agg_large_num_groups  ok
[  2.78%] ··· ======== ========== =========== ========== ===========
              --                      ndim / npg
              -------- ---------------------------------------------
               method   1 / True   1 / False   2 / True   2 / False
              ======== ========== =========== ========== ===========
                sum     8.38±0ms    101±0ms    9.54±0ms    136±0ms
                mean    7.12±0ms    101±0ms    9.74±0ms    148±0ms
              ======== ========== =========== ========== ===========

[  5.56%] ··· groupby.GroupBy.time_agg_small_num_groups  ok
[  5.56%] ··· ======== ========== =========== ========== ===========
              --                      ndim / npg
              -------- ---------------------------------------------
               method   1 / True   1 / False   2 / True   2 / False
              ======== ========== =========== ========== ===========
                sum     8.27±0ms   4.55±0ms    9.07±0ms   8.46±0ms
                mean    7.19±0ms   4.50±0ms    9.24±0ms   8.36±0ms
              ======== ========== =========== ========== ===========

[  8.33%] ··· groupby.GroupBy.time_init  ok
[  8.33%] ··· ====== ==========
               ndim
              ------ ----------
                1     1.72±0ms
                2     4.06±0ms
              ====== ==========

[ 11.11%] ··· groupby.GroupByDask.time_agg_large_num_groups  ok
[ 11.11%] ··· ======== ========== =========== ========== ===========
              --                      ndim / npg
              -------- ---------------------------------------------
               method   1 / True   1 / False   2 / True   2 / False
              ======== ========== =========== ========== ===========
                sum     8.41±0ms    202±0ms    9.93±0ms    226±0ms
                mean    7.83±0ms    197±0ms    10.7±0ms    213±0ms
              ======== ========== =========== ========== ===========

[ 13.89%] ··· groupby.GroupByDask.time_agg_small_num_groups  ok
[ 13.89%] ··· ======== ========== =========== ========== ===========
              --                      ndim / npg
              -------- ---------------------------------------------
               method   1 / True   1 / False   2 / True   2 / False
              ======== ========== =========== ========== ===========
                sum     8.41±0ms   8.99±0ms    10.5±0ms   12.5±0ms
                mean    7.98±0ms   8.67±0ms    10.1±0ms   12.2±0ms
              ======== ========== =========== ========== ===========

[ 16.67%] ··· groupby.GroupByDask.time_init  ok
[ 16.67%] ··· ====== ==========
               ndim
              ------ ----------
                1     1.77±0ms
                2     4.06±0ms
              ====== ==========

[ 36.11%] ··· groupby.Resample.time_agg_large_num_groups  ok
[ 36.11%] ··· ======== ========== =========== ========== ===========
              --                      ndim / npg
              -------- ---------------------------------------------
               method   1 / True   1 / False   2 / True   2 / False
              ======== ========== =========== ========== ===========
                sum     17.2±0ms   83.3±0ms    17.0±0ms   93.5±0ms
                mean    15.5±0ms   91.0±0ms    17.4±0ms    101±0ms
              ======== ========== =========== ========== ===========

[ 38.89%] ··· groupby.Resample.time_agg_small_num_groups  ok
[ 38.89%] ··· ======== ========== =========== ========== ===========
              --                      ndim / npg
              -------- ---------------------------------------------
               method   1 / True   1 / False   2 / True   2 / False
              ======== ========== =========== ========== ===========
                sum     16.7±0ms   12.3±0ms    16.7±0ms   13.3±0ms
                mean    15.2±0ms   12.5±0ms    19.3±0ms   13.9±0ms
              ======== ========== =========== ========== ===========

[ 41.67%] ··· groupby.Resample.time_init  ok
[ 41.67%] ··· ====== ==========
               ndim
              ------ ----------
                1     7.46±0ms
                2     7.26±0ms
              ====== ==========

[ 44.44%] ··· groupby.ResampleDask.time_agg_large_num_groups  ok
[ 44.44%] ··· ======== ========== =========== ========== ===========
              --                      ndim / npg
              -------- ---------------------------------------------
               method   1 / True   1 / False   2 / True   2 / False
              ======== ========== =========== ========== ===========
                sum     22.3±0ms    561±0ms    28.3±0ms    607±0ms
                mean    22.2±0ms    344±0ms    27.3±0ms    371±0ms
              ======== ========== =========== ========== ===========

[ 47.22%] ··· groupby.ResampleDask.time_agg_small_num_groups  ok
[ 47.22%] ··· ======== ========== =========== ========== ===========
              --                      ndim / npg
              -------- ---------------------------------------------
               method   1 / True   1 / False   2 / True   2 / False
              ======== ========== =========== ========== ===========
                sum     17.7±0ms   31.2±0ms    20.0±0ms   34.2±0ms
                mean    17.2±0ms   24.4±0ms    19.9±0ms   26.6±0ms
              ======== ========== =========== ========== ===========

[ 50.00%] ··· groupby.ResampleDask.time_init  ok
[ 50.00%] ··· ====== ==========
               ndim
              ------ ----------
                1     7.43±0ms
                2     6.91±0ms
              ====== ==========
```
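The size of the gains can be read off the large-number-of-groups tables above by dividing the npg=False timing by the npg=True timing. The snippet below is a small arithmetic sketch using only the ndim=1 values copied from those tables; the dictionary layout and variable names are my own.

```python
# (npg=True, npg=False) timings in ms for ndim=1, large number of groups.
timings_ms = {
    ("GroupBy", "sum"): (8.38, 101.0),
    ("GroupBy", "mean"): (7.12, 101.0),
    ("GroupByDask", "sum"): (8.41, 202.0),
    ("GroupByDask", "mean"): (7.83, 197.0),
    ("Resample", "sum"): (17.2, 83.3),
    ("Resample", "mean"): (15.5, 91.0),
    ("ResampleDask", "sum"): (22.3, 561.0),
    ("ResampleDask", "mean"): (22.2, 344.0),
}

# Speedup = time without numpy_groupies / time with numpy_groupies.
speedups = {
    key: without_npg / with_npg
    for key, (with_npg, without_npg) in timings_ms.items()
}

for (benchmark, method), ratio in speedups.items():
    print(f"{benchmark:<13} {method:<5} {ratio:5.1f}x")
```

The same division applied to the small-number-of-groups tables shows where the new code path is roughly on par with, or slower than, the old one.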

{
    "total_count": 3,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 3,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Enable `flox` in `GroupBy` and `resample` 978356586
963568052 https://github.com/pydata/xarray/pull/5734#issuecomment-963568052 https://api.github.com/repos/pydata/xarray/issues/5734 IC_kwDOAMm_X845buG0 dcherian 2448579 2021-11-08T21:00:04Z 2021-11-08T21:00:04Z MEMBER

Maybe it's also on the pint side? Even if numpy_groupies supports the like argument it will crash because pint doesn't support asanyarray.

cc @keewis

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Enable `flox` in `GroupBy` and `resample` 978356586
954290212 https://github.com/pydata/xarray/pull/5734#issuecomment-954290212 https://api.github.com/repos/pydata/xarray/issues/5734 IC_kwDOAMm_X8444VAk dcherian 2448579 2021-10-28T23:11:08Z 2021-10-28T23:11:08Z MEMBER

> appears numpy_groupies is forcing the duck arrays to numpy arrays.

yes; this will require upstream changes

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Enable `flox` in `GroupBy` and `resample` 978356586
913070347 https://github.com/pydata/xarray/pull/5734#issuecomment-913070347 https://api.github.com/repos/pydata/xarray/issues/5734 IC_kwDOAMm_X842bFkL dcherian 2448579 2021-09-05T01:52:49Z 2021-09-05T01:52:49Z MEMBER

We don't have any asv benchmarks for groupby currently. It would be good to add some!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Enable `flox` in `GroupBy` and `resample` 978356586

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);