issue_comments

4 rows where issue = 1310058435 and user = 1828519 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1205503288 https://github.com/pydata/xarray/issues/6813#issuecomment-1205503288 https://api.github.com/repos/pydata/xarray/issues/6813 IC_kwDOAMm_X85H2oU4 djhoese 1828519 2022-08-04T16:36:43Z 2022-08-04T16:36:43Z CONTRIBUTOR

@wroberts4 I'd suggest opening a pull request so we can see which tests fail (if any) and what the people in charge of merging think about it. I think we've gone through the various possibilities. If the existing exception was meant to protect against thread-safety issues, it wasn't actually doing so, since a later read of the file could still have caused a problem.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening fsspec s3 file twice results in invalid start byte 1310058435
1204355953 https://github.com/pydata/xarray/issues/6813#issuecomment-1204355953 https://api.github.com/repos/pydata/xarray/issues/6813 IC_kwDOAMm_X85HyQNx djhoese 1828519 2022-08-03T18:57:20Z 2022-08-03T18:57:20Z CONTRIBUTOR

Good point. My initial answer was going to be that it isn't a problem, because the second use of the file would raise the exception about .tell() not being at 0. But after the .seek(0), .tell() would be 0 again and we wouldn't get that exception. So I guess it should be documented that xarray doesn't support opening the same file-like object from different threads. In that case, making the changes suggested here would only add usability/functionality and not cause any additional issues, unless we're missing something.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening fsspec s3 file twice results in invalid start byte 1310058435
1204316906 https://github.com/pydata/xarray/issues/6813#issuecomment-1204316906 https://api.github.com/repos/pydata/xarray/issues/6813 IC_kwDOAMm_X85HyGrq djhoese 1828519 2022-08-03T18:17:41Z 2022-08-03T18:17:41Z CONTRIBUTOR

> I am not certain whether seeking/reading from the same file in multiple places might have unforeseen consequences, such as when doing open_dataset in multiple threads.

Oh duh, that's a good point. So it might be fine dask-wise if the assumption is that open_dataset is called in the main thread and then dask is used to do computations on the arrays later on. If we're talking regular Python Threads or dask delayed functions that are calling open_dataset on the same file-like object (that was passed to the worker function) then it would cause issues. Possibly rare case, but still probably something that xarray wants to support.
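To make the concern about a shared file-like object concrete, here is a minimal single-threaded sketch of what goes wrong when two consumers share one cursor. The byte string, the "consumer" roles, and the `read_magic_locked` helper are all hypothetical illustrations and are not part of xarray.

```python
import io
import threading

# Hypothetical stand-in for a remote file: 8 header bytes followed by a payload.
shared = io.BytesIO(b"\x89HDF\r\n\x1a\n" + b"payload-bytes")

# Consumer A reads the header and is now positioned at the payload.
header_a = shared.read(8)

# Consumer B rewinds the same object to sniff the header, then rewinds again.
shared.seek(0)
header_b = shared.read(8)
shared.seek(0)

# Consumer A resumes and expects payload bytes, but the cursor was moved under it.
resumed = shared.read(7)

# A lock can serialize the seek/read/seek sequence itself (hypothetical helper).
_sniff_lock = threading.Lock()

def read_magic_locked(fobj, count=8):
    with _sniff_lock:
        fobj.seek(0)
        magic = fobj.read(count)
        fobj.seek(0)
    return magic
```

Note that the lock only protects the header sniff. Any code that reads from the same object outside the lock can still have its position changed, which is why documenting the limitation may be the more realistic option.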

Yeah I thought the .read/.write omission from the IOBase class was odd too. Just wanted to point out that the if block is using .read but IOBase is not guaranteed to have .read.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening fsspec s3 file twice results in invalid start byte 1310058435
1204300671 https://github.com/pydata/xarray/issues/6813#issuecomment-1204300671 https://api.github.com/repos/pydata/xarray/issues/6813 IC_kwDOAMm_X85HyCt_ djhoese 1828519 2022-08-03T18:02:20Z 2022-08-03T18:02:20Z CONTRIBUTOR

I talked with @wroberts4 about this in person and if we're not missing some reason to not .seek(0) on a data source then this seems like a simple convenience and user experience improvement. We were thinking maybe it would make more sense to change the function to look like:

    import io

    def read_magic_number_from_file(filename_or_obj, count=8) -> bytes:
        # check byte header to determine file type
        if isinstance(filename_or_obj, bytes):
            magic_number = filename_or_obj[:count]
        elif isinstance(filename_or_obj, io.IOBase):
            if filename_or_obj.tell() != 0:
                filename_or_obj.seek(0)
                # warn about re-seeking?
            magic_number = filename_or_obj.read(count)
            filename_or_obj.seek(0)
        else:
            raise TypeError(f"cannot read the magic number from {type(filename_or_obj)}")
        return magic_number
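As a rough check of the proposal above, the following sketch restates the function and calls it twice on the same in-memory object. The `io.BytesIO` buffer and its contents are assumptions standing in for an fsspec S3 file, so this only illustrates the seek behaviour, not real remote I/O.

```python
import io

def read_magic_number_from_file(filename_or_obj, count=8) -> bytes:
    # Restatement of the proposed function: rewind instead of raising.
    if isinstance(filename_or_obj, bytes):
        magic_number = filename_or_obj[:count]
    elif isinstance(filename_or_obj, io.IOBase):
        if filename_or_obj.tell() != 0:
            filename_or_obj.seek(0)
        magic_number = filename_or_obj.read(count)
        filename_or_obj.seek(0)
    else:
        raise TypeError(f"cannot read the magic number from {type(filename_or_obj)}")
    return magic_number

# Hypothetical file contents: a 4-byte header plus padding.
buf = io.BytesIO(b"CDF\x01" + b"\x00" * 12)

# Simulate a first open that left the cursor somewhere in the middle.
buf.read(10)

# Both calls see the same header, and the cursor ends at the start each time.
first = read_magic_number_from_file(buf, count=4)
second = read_magic_number_from_file(buf, count=4)
```

With the rewind in place, the second call no longer depends on where the previous reader left the cursor.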

Additionally, the isinstance check is for io.IOBase but that base class isn't guaranteed to have a .read method. The check should probably be for RawIOBase:

https://docs.python.org/3/library/io.html#class-hierarchy
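The class hierarchy point can be checked directly with the standard library. The sketch below shows which `io` base classes define `.read`, and also that a buffered object such as `io.BytesIO` is not a `RawIOBase` instance, so a check against `RawIOBase` alone would reject it. The `is_binary_readable` helper is a hypothetical name used only for illustration.

```python
import io

# Which abstract base classes in the io module define a .read method?
has_read = {
    cls.__name__: hasattr(cls, "read")
    for cls in (io.IOBase, io.RawIOBase, io.BufferedIOBase, io.TextIOBase)
}

# BytesIO is a buffered stream, not a raw one.
buf = io.BytesIO(b"abc")
is_raw = isinstance(buf, io.RawIOBase)
is_buffered = isinstance(buf, io.BufferedIOBase)

def is_binary_readable(obj) -> bool:
    # Hypothetical helper: accept both raw and buffered binary streams.
    return isinstance(obj, (io.RawIOBase, io.BufferedIOBase))
```

Third-party file-like classes may subclass `IOBase` directly, in which case neither check matches them; a duck-typed test such as `hasattr(obj, "read")` is another option worth weighing in a PR.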

@kmuehlbauer @lamorton I saw you commented on the almost related #3991, do you have any thoughts on this? Should we put a PR together to continue the discussion? Maybe the fsspec folks (@martindurant?) have an opinion on this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening fsspec s3 file twice results in invalid start byte 1310058435

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);