home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

10 rows where issue = 803068773 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date)

user 3

  • shoyer 6
  • cjauvin 2
  • dcherian 2

author_association 2

  • MEMBER 8
  • CONTRIBUTOR 2

issue 1

  • Cache files for different CachingFileManager objects separately · 10 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1278202565 https://github.com/pydata/xarray/pull/4879#issuecomment-1278202565 https://api.github.com/repos/pydata/xarray/issues/4879 IC_kwDOAMm_X85ML9LF shoyer 1217238 2022-10-13T21:34:05Z 2022-10-13T21:34:05Z MEMBER

I think we could fix this by marking CachingFileManager with typing.final

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773
1277939577 https://github.com/pydata/xarray/pull/4879#issuecomment-1277939577 https://api.github.com/repos/pydata/xarray/issues/4879 IC_kwDOAMm_X85MK895 dcherian 2448579 2022-10-13T17:20:38Z 2022-10-13T17:20:38Z MEMBER

mypy error:

xarray/backends/file_manager.py:277: error: Accessing "init" on an instance is unsound, since instance.init could be from an incompatible subclass [misc]

for python def __setstate__(self, state): """Restore from a pickle.""" opener, args, mode, kwargs, lock = state self.__init__(opener, *args, mode=mode, kwargs=kwargs, lock=lock)

Seems like we can just ignore?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773
1269050790 https://github.com/pydata/xarray/pull/4879#issuecomment-1269050790 https://api.github.com/repos/pydata/xarray/issues/4879 IC_kwDOAMm_X85LpC2m shoyer 1217238 2022-10-05T22:27:28Z 2022-10-05T22:27:28Z MEMBER

Anyone want to review here? I think this should be ready to go in.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773
1268700309 https://github.com/pydata/xarray/pull/4879#issuecomment-1268700309 https://api.github.com/repos/pydata/xarray/issues/4879 IC_kwDOAMm_X85LntSV shoyer 1217238 2022-10-05T17:06:02Z 2022-10-05T17:57:19Z MEMBER

~~Actually maybe we don't need to keep files open after pickling... let me give this one more try.~~

Nevermind, this didn't work -- it still results in failing tests with dask-distributed on Windows.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773
1268684962 https://github.com/pydata/xarray/pull/4879#issuecomment-1268684962 https://api.github.com/repos/pydata/xarray/issues/4879 IC_kwDOAMm_X85Lnpii shoyer 1217238 2022-10-05T16:51:14Z 2022-10-05T16:51:14Z MEMBER

OK, after a bit more futzing tests are passing and I think this is actually ready to go in. I ended up leaving in the reference counting after all -- I couldn't figure out another way to keep files open after a pickle round-trip.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773
1261264943 https://github.com/pydata/xarray/pull/4879#issuecomment-1261264943 https://api.github.com/repos/pydata/xarray/issues/4879 IC_kwDOAMm_X85LLWAv dcherian 2448579 2022-09-28T17:52:51Z 2022-09-28T17:52:51Z MEMBER

I don't think we can solve this in Xarray without fixes netCDF4-Python or the netCDF-C library:

I think we should document this and merge. Though the test failures are real (having trouble cleaning up on windows when deleting the temp file), and the diff includes some zarr files right now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773
1259823913 https://github.com/pydata/xarray/pull/4879#issuecomment-1259823913 https://api.github.com/repos/pydata/xarray/issues/4879 IC_kwDOAMm_X85LF2Mp shoyer 1217238 2022-09-27T17:26:06Z 2022-09-27T17:26:06Z MEMBER

I added @cjauvin's integration test, and verified that the fix works for the scipy and h5netcdf backends.

Unfortunately, it doesn't work yet for the netCDF4 backend. I don't think we can solve this in Xarray without fixes netCDF4-Python or the netCDF-C library: https://github.com/Unidata/netcdf4-python/issues/1195

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773
775749858 https://github.com/pydata/xarray/pull/4879#issuecomment-775749858 https://api.github.com/repos/pydata/xarray/issues/4879 MDEyOklzc3VlQ29tbWVudDc3NTc0OTg1OA== shoyer 1217238 2021-02-09T08:05:40Z 2021-02-09T08:05:40Z MEMBER

Thanks for sharing the test case, which I'll add to this PR. I ran it locally and it seems to pass with this PR for the netCDF4 and SciPy netCDF backends (but not h5netcdf). So I'm not entirely sure what to make of that.

On Mon, Feb 8, 2021 at 1:56 PM Christian Jauvin notifications@github.com wrote:

As my colleague @huard https://github.com/huard suggested, I have written an additional test which demonstrates the problem (essentially the same idea I proposed in my initial issue https://github.com/pydata/xarray/issues/4862):

master...cjauvin:add-netcdf-refresh-test https://github.com/pydata/xarray/compare/master...cjauvin:add-netcdf-refresh-test

As I explained in the issue I have a potential fix for the problem:

master...cjauvin:netcdf-caching-bug https://github.com/pydata/xarray/compare/master...cjauvin:netcdf-caching-bug

but the problem is that it feels a bit weird to have to that, so I suspect that there's a better way to solve it.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/pydata/xarray/pull/4879#issuecomment-775489828, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJJFVTISNAZV4YYOUMURLLS6BMZHANCNFSM4XH4MWIA .

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773
775489828 https://github.com/pydata/xarray/pull/4879#issuecomment-775489828 https://api.github.com/repos/pydata/xarray/issues/4879 MDEyOklzc3VlQ29tbWVudDc3NTQ4OTgyOA== cjauvin 488992 2021-02-08T21:56:21Z 2021-02-08T21:56:21Z CONTRIBUTOR

As my colleague @huard suggested, I have written an additional test which demonstrates the problem (essentially the same idea I proposed in my initial issue):

https://github.com/pydata/xarray/compare/master...cjauvin:add-netcdf-refresh-test

As I explained in the issue I have a potential fix for the problem:

https://github.com/pydata/xarray/compare/master...cjauvin:netcdf-caching-bug

but the problem is that it feels a bit weird to have to that, so I suspect that there's a better way to solve it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773
774781361 https://github.com/pydata/xarray/pull/4879#issuecomment-774781361 https://api.github.com/repos/pydata/xarray/issues/4879 MDEyOklzc3VlQ29tbWVudDc3NDc4MTM2MQ== cjauvin 488992 2021-02-07T22:40:44Z 2021-02-07T22:42:11Z CONTRIBUTOR

Thank you for the feedback! I quickly tested your suggested fix against the script I refered to in my original issue, and it's still behaving the same if I'm not mistaken. I looked very quickly so perhaps I'm wrong, but what I seem to understand is that your fix is similar to an idea my colleague @huard had, which was to make the cached item more granular by adding a call to Path(..).stat() in the cache key tuple (the idea being that if the file has changed on disk between the two open calls, this will detect it). It doesn't work because (I think) it doesn't change the fact that the underlying netcdf file is never explicitly close, that is, this line is never called:

https://github.com/pydata/xarray/blob/a5f53e203c52a7605d5db799864046471115d04f/xarray/backends/file_manager.py#L222

Sorry in advance if something in my analysis is wrong, which is very likely!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 13.193ms · About: xarray-datasette