issue_comments
2 rows where user = 6491058 sorted by updated_at descending
This data as json, CSV (advanced)
Suggested facets: created_at (date), updated_at (date)
issue 1
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
585452791 | https://github.com/pydata/xarray/issues/3762#issuecomment-585452791 | https://api.github.com/repos/pydata/xarray/issues/3762 | MDEyOklzc3VlQ29tbWVudDU4NTQ1Mjc5MQ== | bjcosta 6491058 | 2020-02-12T22:34:56Z | 2023-08-02T19:50:42Z | NONE |
Hi dcherian, I had a look at the apply_ufunc() example you linked and have re-implemented my code. The example helped me understand apply_ufunc() usage better but is very different from my use case and I still am unable to parallelize using dask. The key difference is apply_ufunc() as described in the docs and the example, applys a function to a vector of data of a single type (in the example case it is air temperature across the 3 dimensions lat,long,time). Where as I need to apply an operation using heterogeneous data (depth_bins, lower_limit, upper_limit) over a single dimension (time) to produce a new array of depths over time (which is why I tried groupby/map initially). Anyhow, I have an implementation using apply_ufunc() that works using xarray and numpy arrays with apply_ufunc(), but when I try to parallelize it using dask my ufunc is called with empty arrays by xarray and it fails. I.e. You can see when running the code below it logs the following when entering the ufunc: args: (array([], shape=(0, 0), dtype=int32), array([], dtype=int32), array([], dtype=int32), array([], dtype=int32)), kwargs: {} I was expecting this to be called once for each chunk with 1000 items for each array. Have I done something wrong in this work-around for the groupby/map code? Thanks, Brendon ```python import sys import math import logging import dask import xarray import numpy logger = logging.getLogger('main') if name == 'main': logging.basicConfig( stream=sys.stdout, format='%(asctime)s %(levelname)-8s %(message)s', level=logging.INFO, datefmt='%Y-%m-%d %H:%M:%S')
``` |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray groupby/map fails to parallelize 561921094 | |
583775471 | https://github.com/pydata/xarray/issues/3762#issuecomment-583775471 | https://api.github.com/repos/pydata/xarray/issues/3762 | MDEyOklzc3VlQ29tbWVudDU4Mzc3NTQ3MQ== | bjcosta 6491058 | 2020-02-08T20:47:23Z | 2020-02-08T20:47:23Z | NONE |
Thanks for the really quick response. I tried chunking and that is when the performance gets worse (from 5msec per interp to 70msec). I just added the exact chunking you suggested and tested again as well. It is slower but it also only seems to be using 10% of one CPU. I will have to wait until tomorrow to have a try of the ufunc method. It sounds like it might be a workaround so thanks for that suggestion. I still think there is a bug in this case (either in my code or in xarray). |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray groupby/map fails to parallelize 561921094 |
Advanced export
JSON shape: default, array, newline-delimited, object
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) ); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
user 1