issue_comments


3 rows where author_association = "NONE" and issue = 225774140 sorted by updated_at descending




id: 383477976
html_url: https://github.com/pydata/xarray/issues/1396#issuecomment-383477976
issue_url: https://api.github.com/repos/pydata/xarray/issues/1396
node_id: MDEyOklzc3VlQ29tbWVudDM4MzQ3Nzk3Ng==
user: rohitshukla0104 (35775735)
created_at: 2018-04-23T07:18:23Z
updated_at: 2019-01-13T06:32:12Z
author_association: NONE

I am using MITgcm and want to incorporate the latitude and longitude information from my grid file into my state file. Could you please help me in this regard?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: selecting a point from an mfdataset (225774140)
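
The comment above is a how-to question about attaching latitude/longitude from an MITgcm grid file to a state file. As a minimal, hypothetical sketch of one way this could be done with xarray (the file names, the variable names XC/YC, and the shared dimensions are assumptions, not from the original thread):

```python
import xarray as xr

# Hypothetical file and variable names; real MITgcm output layouts vary.
grid = xr.open_dataset("grid.nc")     # assumed to hold XC (longitude) and YC (latitude)
state = xr.open_dataset("state.nc")   # the state file to annotate

# Attach the grid's lat/lon as coordinates, assuming both files
# share the same horizontal dimensions (e.g. 'x' and 'y').
state = state.assign_coords(lon=grid["XC"], lat=grid["YC"])
```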
id: 306850272
html_url: https://github.com/pydata/xarray/issues/1396#issuecomment-306850272
issue_url: https://api.github.com/repos/pydata/xarray/issues/1396
node_id: MDEyOklzc3VlQ29tbWVudDMwNjg1MDI3Mg==
user: JanisGailis (9655353)
created_at: 2017-06-07T16:30:04Z
updated_at: 2017-06-07T16:50:40Z
author_association: NONE

That's great to know! I think there's no need to try my 'solution' then, maybe only out of pure interest.

It would of course be interesting to know why a 'custom' chunked dataset was apparently not affected by the bug, and whether that was indeed the case.

EDIT: I read the discussion on the dask GitHub and the xarray mailing list. It's probably because when explicit chunking is used, the chunks are not aliased and fusing works as expected.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: selecting a point from an mfdataset (225774140)
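
The "explicit chunking" referred to above means passing a chunks= mapping to open_mfdataset instead of letting each input file become one dask chunk. A small sketch of how the two resulting chunk layouts could be compared (the file pattern is hypothetical, and 'data' is assumed to be the variable name from the gist discussed below):

```python
import xarray as xr

# Hypothetical file pattern standing in for the gist's file list.
ds_default = xr.open_mfdataset("files_*.nc")                  # one chunk per input file
ds_explicit = xr.open_mfdataset("files_*.nc",
                                chunks={'time': 12, 'x': 1000, 'y': 500})

# Inspect the dask chunk layout that each call produced.
print(ds_default['data'].chunks)
print(ds_explicit['data'].chunks)
```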
id: 306838587
html_url: https://github.com/pydata/xarray/issues/1396#issuecomment-306838587
issue_url: https://api.github.com/repos/pydata/xarray/issues/1396
node_id: MDEyOklzc3VlQ29tbWVudDMwNjgzODU4Nw==
user: JanisGailis (9655353)
created_at: 2017-06-07T15:51:34Z
updated_at: 2017-06-07T15:53:06Z
author_association: NONE

We had similar performance issues with xarray+dask, which we solved by using a chunking heuristic when opening a dataset. You can read about it in #1440. Now, in our case the data really wouldn't fit in memory, which is clearly not the case in your gist. Anyway, I thought I'd play around with your gist and see if chunking can make a difference.

I couldn't use your example directly, as the data it generates in memory is too large for the dev VM I'm working on. So I changed the generated file size to (12, 1000, 2000); the essence of your gist remained, though: it would take ~25 seconds to do the time series extraction, versus ~800 ms using extract_point_xarray().

So, I thought I'd try our 'chunking heuristic' on the generated test datasets: simply split the dataset into 2x2 chunks along the spatial dimensions. So:

```python
ds = xr.open_mfdataset(all_files, decode_cf=False, chunks={'time': 12, 'x': 1000, 'y': 500})
```

To my surprise:

```python
# time extracting a timeseries of a single point
y, x = 200, 300
with ProgressBar():
    %time ts = ds.data[:, y, x].load()
```

results in

```
[########################################] | 100% Completed | 0.7s
CPU times: user 124 ms, sys: 268 ms, total: 392 ms
Wall time: 826 ms
```

I'm not entirely sure what's happening, as the file obviously fits in memory just fine, because the looping approach works well. Maybe it's fine when you loop through the files one by one, but the single-file chunk turns out to be too large when dask wants to parallelize the whole thing. I really have no idea.

I'd be very intrigued to see if you can get a similar result by doing a simple 2x2xtime chunking. By the way, `chunks={'x': 1000, 'y': 500, 'time': 1}` produces similar results with some overhead; extraction took ~1.5 seconds.

EDIT:

```python
print(xr.__version__)
print(dask.__version__)
```

```
0.9.5
0.14.1
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: selecting a point from an mfdataset (225774140)
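
The "chunking heuristic" described in the last comment amounts to halving the spatial dimensions so the dataset splits into 2x2 spatial chunks while the time dimension stays whole. A sketch of how that could be written generically (a hypothetical helper, assuming spatial dimensions named 'x' and 'y' and the sizes quoted in the comment):

```python
import xarray as xr

def spatial_2x2_chunks(sizes, spatial_dims=('x', 'y')):
    """Halve each spatial dimension so the dataset is split into 2x2 spatial chunks."""
    return {dim: (size // 2 if dim in spatial_dims else size)
            for dim, size in sizes.items()}

# For the (time=12, y=1000, x=2000) test files from the comment:
chunks = spatial_2x2_chunks({'time': 12, 'y': 1000, 'x': 2000})
# -> {'time': 12, 'y': 500, 'x': 1000}, matching the open_mfdataset call above.

all_files = "files_*.nc"  # hypothetical pattern standing in for the gist's file list
ds = xr.open_mfdataset(all_files, decode_cf=False, chunks=chunks)
```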

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);