Issue #4862: Obtaining fresh data from the disk when reopening a NetCDF file a second time

Opened by cjauvin (488992) · state: closed · 2 comments · repo: xarray
Created 2021-02-04T22:09:09Z · updated 2023-03-30T20:01:06Z · closed 2023-03-30T20:01:06Z
Author association: CONTRIBUTOR · state_reason: completed

I have a program that opens a .nc file, does something with it, and reopens it later, after an external program has modified it. The problem is that the caching mechanism gives me the already-open version of the file, not the refreshed version on disk. To demonstrate this behavior, suppose you have two files with different content: bla.nc and bla_mod.nc:

```python
import shutil

import xarray as xr

a = xr.open_dataset("bla.nc")

# Simulate an external process modifying bla.nc while this script is running
shutil.copy("bla_mod.nc", "bla.nc")

# a.close()  # this is the only thing that WOULD make it work!

b = xr.open_dataset("bla.nc")

# Here I would expect b to be different from a, but it is not
```

I understand that in an ideal world the file SHOULD be closed (or opened with a context manager), and that doing so would make this work, but suppose it is not (perhaps we forgot, or we're simply being lazy).
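To make the failure mode concrete, here is a toy model of a module-level file-handle cache, a simplified stand-in for the real mechanism (all names here are illustrative, not xarray's actual code): the second "open" returns the cached handle, so changes on disk are invisible until the handle is explicitly closed.

```python
# Toy model of a keyed file-handle cache (illustrative only).
_cache = {}

class FakeHandle:
    def __init__(self, path, content):
        self.path = path
        self.content = content  # snapshot of "disk" content at open time
        self.closed = False

    def close(self):
        self.closed = True

def open_cached(path, disk):
    """Return a cached handle for `path`; reopen only if absent or closed."""
    handle = _cache.get(path)
    if handle is None or handle.closed:
        handle = FakeHandle(path, disk[path])
        _cache[path] = handle
    return handle

disk = {"bla.nc": "old data"}
a = open_cached("bla.nc", disk)
disk["bla.nc"] = "new data"      # external process rewrites the file
b = open_cached("bla.nc", disk)  # without a.close(), we get the stale handle
print(b.content)                 # → old data (b is a)
a.close()
c = open_cached("bla.nc", disk)
print(c.content)                 # → new data, only after the explicit close
```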

At first I thought I could use the cache parameter of open_dataset for this purpose, but after studying the code, I discovered that it controls a different caching mechanism than the one at play here.

After some experiments to better understand the code, I came to the conclusion that the only way my particular use case could be supported (that is, without an explicit close or a context manager, which is itself debatable, I admit) is for the underlying netCDF4._netCDF4.Dataset file object to be explicitly closed, as happens when it is flushed out of the cache:

https://github.com/pydata/xarray/blob/5735e163bea43ec9bc3c2e640fbf25a1d4a9d0c0/xarray/backends/file_manager.py#L222
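The close-on-eviction behavior referenced above can be sketched as a small LRU cache whose eviction callback closes the evicted file object (a minimal illustration, not xarray's actual LRUCache implementation):

```python
from collections import OrderedDict

class ClosingLRUCache:
    """LRU cache that invokes a callback (e.g. file.close) on eviction."""

    def __init__(self, maxsize, on_evict):
        self._maxsize = maxsize
        self._on_evict = on_evict
        self._items = OrderedDict()

    def __setitem__(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self._maxsize:
            _, evicted = self._items.popitem(last=False)  # oldest entry
            self._on_evict(evicted)                       # close it

    def __getitem__(self, key):
        self._items.move_to_end(key)  # mark as recently used
        return self._items[key]

closed = []
cache = ClosingLRUCache(maxsize=2, on_evict=closed.append)
cache["a"] = 1
cache["b"] = 2
cache["c"] = 3   # "a" is evicted and "closed"
print(closed)    # → [1]
```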

Given that I cannot really see how, in the particular case where the user calls open_dataset a second time, she would not want the fresh version on disk, I think a fix for this behavior would be to explicitly flush the cache immediately after the CachingFileManager for a particular dataset has been created, as I do here:

https://github.com/pydata/xarray/compare/master...cjauvin:netcdf-caching-bug

Because I admit this looks weird at first sight (why close an object immediately after having created it?), a better option would probably be to add a boolean option to CachingFileManager to make the behavior opt-in (something like flush_and_close_file_if_already_present).
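The proposed option can be sketched as follows: when a manager is created for a path that already has a cached open handle, close and drop the stale handle so the next access reopens from disk. The option name comes from the suggestion above; everything else is a hypothetical simplification, not xarray's API.

```python
# Sketch of the proposed opt-in "flush on re-creation" behavior.
_cache = {}

class Handle:
    def __init__(self, path):
        self.path = path
        self.closed = False

    def close(self):
        self.closed = True

def make_manager(path, flush_and_close_file_if_already_present=False):
    if flush_and_close_file_if_already_present:
        stale = _cache.pop(path, None)
        if stale is not None:
            stale.close()  # force the next open to read fresh data from disk
    return _cache.setdefault(path, Handle(path))

h1 = make_manager("bla.nc")
h2 = make_manager("bla.nc", flush_and_close_file_if_already_present=True)
print(h1.closed, h1 is h2)  # → True False: old handle closed, fresh one returned
```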

I think this subtle change would result in a more coherent experience for the exact use case presented here, but admittedly, I did not study the overall code deeply enough to be certain it could not result in unwanted side effects for some other backends.

