
issue_comments


6 comments on issue "open_mfdataset very slow" (1646267547), sorted by updated_at descending

dcherian (MEMBER) · 2023-03-29T21:20:59Z
https://github.com/pydata/xarray/issues/7697#issuecomment-1489341690

> I thought the compat='override' option bypassed most of the consistency checking.

We still construct a dataset representation for each file, which involves reading all coordinates, etc. The consistency checking is only bypassed at the “concatenation” stage.

You could also speed this up with dask by setting up a cluster and passing parallel=True.
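For illustration, a minimal sketch of that suggestion; the file glob and the local-cluster setup are assumptions, not from this thread:

```python
# Sketch only: a local dask cluster plus parallel=True, so each file is
# opened and inspected concurrently. The glob pattern is a placeholder.
from dask.distributed import Client
import xarray as xr

client = Client()  # local cluster; point at a real scheduler address if you have one

ds = xr.open_mfdataset(
    "data/*.nc",     # placeholder file pattern
    engine="netcdf4",
    parallel=True,   # open/decode each file as a dask.delayed task
)
```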

groutr (NONE) · 2023-03-29T20:59:24Z
https://github.com/pydata/xarray/issues/7697#issuecomment-1489312337

@dcherian I'll look at that. I thought the compat='override' option bypassed most of the consistency checking. In my case, it is typically safe to assume the set of files is consistent (each file represents one timestep; the structure of each file is otherwise identical).

@headtr1ck I was just informed that the underlying filesystem is actually a networked filesystem. The PR might still be useful, but the latest profile seems more reasonable in light of this new information.
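As a reference for that usage, a hedged sketch of the shortcut options xarray documents for sets of files known to be consistent; the glob and the "time" concat dimension are assumptions:

```python
import xarray as xr

# Sketch: skip most per-file compatibility checks when the files are known
# to be consistent. The glob and the "time" dimension are placeholders.
ds = xr.open_mfdataset(
    "output_*.nc",        # one timestep per file (placeholder pattern)
    combine="nested",
    concat_dim="time",
    data_vars="minimal",  # only concatenate variables that contain "time"
    coords="minimal",
    compat="override",    # take metadata from the first file instead of comparing
    join="override",
)
```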

dcherian (MEMBER) · 2023-03-29T20:53:37Z
https://github.com/pydata/xarray/issues/7697#issuecomment-1489302292

Fundamentally, xarray has to touch every file because there is no guarantee they are consistent with each other.

A number of us now use kerchunk to create virtual aggregate datasets that can be read a lot faster.
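A rough sketch of that kerchunk workflow; the paths, the "time" concat dimension, and the exact kerchunk call details are assumptions and may vary by version:

```python
import glob
import xarray as xr
from kerchunk.hdf import SingleHdf5ToZarr
from kerchunk.combine import MultiZarrToZarr

files = sorted(glob.glob("data/*.nc"))  # placeholder file pattern

# Index each netCDF4/HDF5 file once; this scan is paid up front, not per open.
refs = [SingleHdf5ToZarr(f, f).translate() for f in files]

# Combine the per-file references into one virtual aggregate dataset.
combined = MultiZarrToZarr(refs, concat_dims=["time"]).translate()

# Open the aggregate through the zarr engine without touching every file again.
ds = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "storage_options": {"fo": combined, "remote_protocol": "file"},
        "consolidated": False,
    },
)
```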

groutr (NONE) · 2023-03-29T20:30:49Z (edited 2023-03-29T20:33:28Z)
https://github.com/pydata/xarray/issues/7697#issuecomment-1489267595

> It seems that this problematic code is mostly used to determine the engine that is used to finally open it. Did you try specifying the correct engine directly?

I tried setting the engine to 'netcdf4' and while it did help a little bit, it still seems slow on my system.

Here is my profile with engine='netcdf4'

I'm not sure what to make of this profile. I don't see anything in the file_manager that would be especially slow. Perhaps it is a filesystem bottleneck at this point (given that the CPU time is 132s of the total 288s duration).
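The profiling commands aren't included in the thread; purely as an illustration, a profile like this could be produced with the standard library:

```python
import cProfile
import pstats
import xarray as xr

# Illustrative only: profile the open and print the 20 hottest call sites.
with cProfile.Profile() as prof:
    ds = xr.open_mfdataset("data/*.nc", engine="netcdf4")  # placeholder glob

pstats.Stats(prof).sort_stats("cumulative").print_stats(20)
```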

headtr1ck (COLLABORATOR) · 2023-03-29T19:02:39Z
https://github.com/pydata/xarray/issues/7697#issuecomment-1489146483

It seems that this problematic code is mostly used to determine the engine that is used to finally open it. Did you try specifying the correct engine directly?

Reactions: 👍 1
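For illustration, pinning the backend is a one-keyword change (the file pattern is a placeholder); it avoids autodetecting the engine for every file:

```python
import xarray as xr

# Passing engine explicitly skips per-file backend detection.
ds = xr.open_mfdataset("data/*.nc", engine="netcdf4")  # placeholder glob
```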
Illviljan (MEMBER) · 2023-03-29T18:17:35Z
https://github.com/pydata/xarray/issues/7697#issuecomment-1489083542

Looks like you almost got this figured out! You want to create a PR for this?


Table schema
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
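This page corresponds to a query along the following lines; the sketch assumes a local copy of the database saved as github.db:

```python
import sqlite3

# Reproduce this page's query against a local copy of the database.
# "github.db" is an assumed filename.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT user, author_association, created_at, body
    FROM issue_comments
    WHERE issue = 1646267547
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # 6 comments, matching this page
```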