issue_comments


3 rows where user = 10678620 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1489744942 https://github.com/pydata/xarray/pull/7698#issuecomment-1489744942 https://api.github.com/repos/pydata/xarray/issues/7698 IC_kwDOAMm_X85Yy7Qu groutr 10678620 2023-03-30T06:02:41Z 2023-03-30T06:03:03Z NONE

Agreed, and a reference to a pretty authoritative source: https://github.com/python/cpython/blob/3.11/Modules/_io/bufferedio.c#L915

It's confusing that the method has a parameter called filename_or_obj but doesn't actually handle filenames.

One workaround is to use os.read when passed a filename, and .read() when passed a file object. Something similar to:

```python
import io
import os


def get_magic_number(filename_or_obj, count=8):
    if isinstance(filename_or_obj, (str, os.PathLike)):
        # Append os.O_BINARY on Windows
        fd = os.open(filename_or_obj, os.O_RDONLY)
        try:
            magic_number = os.read(fd, count)
        finally:
            os.close(fd)  # close the descriptor even if the read fails
        if len(magic_number) != count:
            raise TypeError("Error reading magic number")
    elif isinstance(filename_or_obj, io.BufferedIOBase):
        if filename_or_obj.seekable():
            pos = filename_or_obj.tell()
            filename_or_obj.seek(0)
            magic_number = filename_or_obj.read(count)
            filename_or_obj.seek(pos)  # restore the caller's position
        else:
            raise TypeError("File not seekable.")
    else:
        raise TypeError("Cannot read magic number.")
    return magic_number
```

On my laptop (w/ SSD), using os.read is about 2x faster than using .read().
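The 2x figure is specific to one machine, so it may be worth re-measuring locally. Below is a minimal timing sketch, not taken from the PR: the helper names `read_with_os` and `read_with_file`, the temporary file, and the iteration count are all assumptions made for illustration. Results on a networked filesystem could look quite different.

```python
import os
import tempfile
import timeit


def read_with_os(path, count=8):
    """Read the first `count` bytes through a raw file descriptor."""
    # O_BINARY exists only on Windows; fall back to 0 elsewhere.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        return os.read(fd, count)
    finally:
        os.close(fd)


def read_with_file(path, count=8):
    """Read the first `count` bytes through a buffered file object."""
    with open(path, "rb") as f:
        return f.read(count)


# Hypothetical test file: an HDF5-style signature followed by padding.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x89HDF\r\n\x1a\n" + b"\0" * 4096)
    path = tmp.name

try:
    # Both approaches should return identical bytes.
    assert read_with_os(path) == read_with_file(path)
    for fn in (read_with_os, read_with_file):
        seconds = timeit.timeit(lambda: fn(path), number=2_000)
        print(f"{fn.__name__}: {seconds:.3f}s for 2,000 reads")
finally:
    os.remove(path)
```

Each call includes the cost of opening and closing the file, which is the case that matters when the magic number is read once per file.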

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use read1 instead of read to get magic number 1646350377
1489312337 https://github.com/pydata/xarray/issues/7697#issuecomment-1489312337 https://api.github.com/repos/pydata/xarray/issues/7697 IC_kwDOAMm_X85YxRpR groutr 10678620 2023-03-29T20:59:24Z 2023-03-29T20:59:24Z NONE

@dcherian I'll look at that. I thought the compat='override' option bypassed most of the consistency checking. In my case, it is typically safe to assume the set of files is consistent (each file represents one timestep, and the structure of each file is otherwise identical).

@headtr1ck I was just informed that the underlying filesystem is actually a networked filesystem. The PR might still be useful, but the latest profile seems more reasonable in light of my new info.
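For reference, here is a sketch of how compat='override' is usually combined with other open_mfdataset options when the files are known to be structurally identical. This is not the exact call from the issue: the dimension name "time", the helper name `open_timesteps`, and the constant `FAST_OPEN_KWARGS` are assumptions. xarray is imported lazily so the settings can be inspected without it installed.

```python
# Options often combined with compat="override" to skip per-file comparisons.
FAST_OPEN_KWARGS = dict(
    engine="netcdf4",      # skip backend guessing
    combine="nested",      # concatenate in the order given
    concat_dim="time",     # assumption: one timestep per file along "time"
    data_vars="minimal",   # only concatenate variables that have "time"
    coords="minimal",      # same for coordinates
    compat="override",     # take other variables from the first file
)


def open_timesteps(paths):
    """Open per-timestep files as one dataset (hypothetical helper)."""
    import xarray as xr  # imported here so the module loads without xarray

    return xr.open_mfdataset(sorted(paths), **FAST_OPEN_KWARGS)
```

Whether this helps depends on where the time goes; on a networked filesystem, opening each file may dominate regardless of these settings.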

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset very slow 1646267547
1489267595 https://github.com/pydata/xarray/issues/7697#issuecomment-1489267595 https://api.github.com/repos/pydata/xarray/issues/7697 IC_kwDOAMm_X85YxGuL groutr 10678620 2023-03-29T20:30:49Z 2023-03-29T20:33:28Z NONE

> It seems that this problematic code is mostly used to determine the engine that is used to finally open it. Did you try specifying the correct engine directly?

I tried setting the engine to 'netcdf4' and while it did help a little bit, it still seems slow on my system.

Here is my profile with engine='netcdf4'

I'm not sure what to make of this profile. I don't see anything in the file_manager that would be especially slow. Perhaps it is a filesystem bottleneck at this point (given that the CPU time is only 132s of the total 288s duration).
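The reasoning here is that wall-clock time well above CPU time suggests the process is waiting on something, such as I/O. A minimal sketch of that comparison follows, using only the standard library; the helper `cpu_vs_wall` and the `time.sleep` stand-in for a blocking read are assumptions, not part of the profile above.

```python
import time


def cpu_vs_wall(func, *args, **kwargs):
    """Run func and return (result, cpu_seconds, wall_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of this process only
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, cpu, wall


def wait_like_slow_io(seconds=0.2):
    """Stand-in for a blocking filesystem call: waits without using CPU."""
    time.sleep(seconds)
    return seconds


_, cpu, wall = cpu_vs_wall(wait_like_slow_io)
print(f"cpu={cpu:.3f}s wall={wall:.3f}s waiting={wall - cpu:.3f}s")
```

Applied to the numbers in the comment, 288s of wall time minus 132s of CPU time leaves roughly 156s spent waiting, which is consistent with a filesystem bottleneck but does not prove it.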

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset very slow 1646267547


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 15.018ms · About: xarray-datasette