issue_comments


9 rows where issue = 662505658 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1267571723 https://github.com/pydata/xarray/issues/4240#issuecomment-1267571723 https://api.github.com/repos/pydata/xarray/issues/4240 IC_kwDOAMm_X85LjZwL mullenkamp 2656596 2022-10-04T20:58:37Z 2022-10-04T21:00:08Z NONE

Running xarray.backends.file_manager.FILE_CACHE.clear() fixed the issue for me. I couldn't find any other way to stop xarray from pulling up some old data from a newly saved file. I'm using the h5netcdf engine with xarray version 2022.6.0 by the way.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  jupyter repr caching deleted netcdf file 662505658
1258874354 https://github.com/pydata/xarray/issues/4240#issuecomment-1258874354 https://api.github.com/repos/pydata/xarray/issues/4240 IC_kwDOAMm_X85LCOXy brianmapes 2086210 2022-09-27T02:14:28Z 2022-09-27T02:14:28Z NONE

+1 Complicated, and still vexing this user a year+ later, but it is easier for me to just restart the kernel again and again than to read this and #4879, which is closed but doesn't seem to have succeeded, if I read correctly?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  jupyter repr caching deleted netcdf file 662505658
774774033 https://github.com/pydata/xarray/issues/4240#issuecomment-774774033 https://api.github.com/repos/pydata/xarray/issues/4240 MDEyOklzc3VlQ29tbWVudDc3NDc3NDAzMw== shoyer 1217238 2021-02-07T21:48:38Z 2021-02-07T21:48:38Z MEMBER

I have a tentative fix for this in https://github.com/pydata/xarray/pull/4879. It would be great if someone could give this a try to verify that it resolves the issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  jupyter repr caching deleted netcdf file 662505658
764726612 https://github.com/pydata/xarray/issues/4240#issuecomment-764726612 https://api.github.com/repos/pydata/xarray/issues/4240 MDEyOklzc3VlQ29tbWVudDc2NDcyNjYxMg== kmuehlbauer 5821660 2021-01-21T15:34:42Z 2021-01-21T15:46:36Z MEMBER

I've stumbled over this weird behaviour many times and was wondering why this happens. So AFAICT @shoyer hit the nail on the head but the root cause is that the Dataset is added to the notebook namespace somehow, if one just evaluates it in the cell.

This doesn't happen if you invoke the __repr__ via

    display(xr.open_dataset("saved_on_disk.nc"))

I've forced myself to use either print or display for xarray data. Because this also happens when the Dataset is assigned to a variable, you need to explicitly delete that variable (or call .close() on it) before opening the file again.

    try:
        del ds
    except NameError:
        pass
    ds = xr.open_dataset("saved_on_disk.nc")

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  jupyter repr caching deleted netcdf file 662505658
676326130 https://github.com/pydata/xarray/issues/4240#issuecomment-676326130 https://api.github.com/repos/pydata/xarray/issues/4240 MDEyOklzc3VlQ29tbWVudDY3NjMyNjEzMA== markusritschel 3332539 2020-08-19T13:07:05Z 2020-08-19T13:07:05Z NONE

Would it be an option to consider the time stamp of the file's last change as a caching criterion?
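The timestamp idea above could look roughly like the following. This is only a sketch of the suggestion, not something xarray is known to do: `cache_key` is a hypothetical helper, and the file size is added to the key as an assumption, because modification-time resolution can be coarse on some filesystems.

```python
import os
import tempfile


def cache_key(path):
    """Hypothetical cache key that changes whenever the file is rewritten."""
    st = os.stat(path)
    # Absolute path + last-modified time (ns) + size in bytes.
    return (os.path.abspath(path), st.st_mtime_ns, st.st_size)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "saved_on_disk.nc")

    with open(path, "wb") as f:
        f.write(b"old")
    key_before = cache_key(path)

    # Overwrite the file, as re-saving a dataset under the same name would.
    with open(path, "wb") as f:
        f.write(b"newer contents")
    key_after = cache_key(path)

# The path component is unchanged, but the keys differ, so a cache
# keyed this way would not hand back the handle for the old file.
print(key_before != key_after)
```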

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  jupyter repr caching deleted netcdf file 662505658
663794065 https://github.com/pydata/xarray/issues/4240#issuecomment-663794065 https://api.github.com/repos/pydata/xarray/issues/4240 MDEyOklzc3VlQ29tbWVudDY2Mzc5NDA2NQ== shoyer 1217238 2020-07-25T02:05:18Z 2020-07-25T02:05:18Z MEMBER

Probably the easiest workaround is to call .close() on the original dataset. Failing that, the file is cached in xarray.backends.file_manager.FILE_CACHE, which you could muck around with.

I believe it only gets activated by repr() because array values from a netCDF file are loaded lazily. I'm not 100% sure without more testing, though.
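The .close() workaround from this comment can be wrapped in a small helper. This is a minimal sketch: `reopen` is a hypothetical name, not part of xarray, and it only assumes that the old object has a `.close()` method and that `opener` is a callable such as `xr.open_dataset`.

```python
def reopen(old, opener, path, **kwargs):
    """Close `old` (if there is one), then open `path` again with `opener`."""
    if old is not None:
        # Closing the previous dataset first is the workaround suggested above.
        old.close()
    return opener(path, **kwargs)


# Intended use in a notebook (assumes `import xarray as xr` and an existing `ds`):
#     ds = reopen(ds, xr.open_dataset, "saved_on_disk.nc")
```

Passing the opener as an argument keeps the helper independent of any particular backend or engine.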

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  jupyter repr caching deleted netcdf file 662505658
663791784 https://github.com/pydata/xarray/issues/4240#issuecomment-663791784 https://api.github.com/repos/pydata/xarray/issues/4240 MDEyOklzc3VlQ29tbWVudDY2Mzc5MTc4NA== michaelaye 69774 2020-07-25T01:41:20Z 2020-07-25T01:41:20Z NONE

Now I'm wondering why the caching logic is only activated by the repr? As you can see, when printed, it is always updated to the status on disk.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  jupyter repr caching deleted netcdf file 662505658
663791386 https://github.com/pydata/xarray/issues/4240#issuecomment-663791386 https://api.github.com/repos/pydata/xarray/issues/4240 MDEyOklzc3VlQ29tbWVudDY2Mzc5MTM4Ng== michaelaye 69774 2020-07-25T01:37:20Z 2020-07-25T01:37:20Z NONE

Is there a workaround for forcing the opening without restarting the notebook?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  jupyter repr caching deleted netcdf file 662505658
663790991 https://github.com/pydata/xarray/issues/4240#issuecomment-663790991 https://api.github.com/repos/pydata/xarray/issues/4240 MDEyOklzc3VlQ29tbWVudDY2Mzc5MDk5MQ== shoyer 1217238 2020-07-25T01:33:36Z 2020-07-25T01:33:36Z MEMBER

Thanks for the clear example!

This happens due to xarray's caching logic for files: https://github.com/pydata/xarray/blob/b1c7e315e8a18e86c5751a0aa9024d41a42ca5e8/xarray/backends/file_manager.py#L50-L76

This means that when you open the same filename, xarray doesn't actually reopen the file from disk -- instead it points to the same file object already cached in memory.

I can see why this could be confusing. We do need this caching logic for files opened from the same backends.*DataStore class, but this could include some sort of unique identifier (e.g., from uuid) to ensure each separate call to xr.open_dataset results in a separately cached/opened file object: https://github.com/pydata/xarray/blob/b1c7e315e8a18e86c5751a0aa9024d41a42ca5e8/xarray/backends/netCDF4_.py#L355-L357
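To make the behaviour and the proposed fix concrete, here is a toy model of a filename-keyed cache. It is not xarray's implementation: `FakeFile`, `open_cached`, the `unique` flag, and the `disk` dict standing in for the filesystem are all invented for illustration.

```python
import uuid

# Toy stand-in for a file cache keyed by filename.
_CACHE = {}


class FakeFile:
    """Pretend file handle that snapshots the 'disk' content when opened."""

    def __init__(self, content):
        self.content = content


def open_cached(disk, path, unique=False):
    """Return a handle for `path`, reusing a cached one if the key matches.

    disk   -- dict mapping path -> current content (stands in for the filesystem)
    unique -- if True, add a fresh UUID to the key so every call opens anew
    """
    key = (path, uuid.uuid4().hex) if unique else (path,)
    if key not in _CACHE:
        _CACHE[key] = FakeFile(disk[path])
    return _CACHE[key]


disk = {"saved_on_disk.nc": "old data"}
first = open_cached(disk, "saved_on_disk.nc")

disk["saved_on_disk.nc"] = "new data"  # the file is rewritten on disk

stale = open_cached(disk, "saved_on_disk.nc")               # same key, cached handle
fresh = open_cached(disk, "saved_on_disk.nc", unique=True)  # new key, reopened

print(stale is first, stale.content, fresh.content)
```

With the plain filename as the key, the second call returns the handle that was opened before the file changed. Adding a per-call unique token forces a separate cache entry, which is the idea described in the comment above.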

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  jupyter repr caching deleted netcdf file 662505658

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 16.05ms · About: xarray-datasette