issues: 789755611

id: 789755611
node_id: MDU6SXNzdWU3ODk3NTU2MTE=
number: 4833
title: Strange behaviour when overwriting files with to_netcdf and html repr
user: 42455466
state: closed
locked: 0
comments: 2
created_at: 2021-01-20T08:28:35Z
updated_at: 2021-01-20T20:00:23Z
closed_at: 2021-01-20T20:00:23Z
author_association: NONE

What happened:

I'm experiencing some strange behaviour when overwriting netCDF files with `to_netcdf` in a Jupyter notebook. The issue is quirky and only seems to occur when using xarray's html repr in Jupyter. I've tried to put together a reproducible example that demonstrates it (it's still quite convoluted, sorry):

I can generate some data, save it to a netcdf file, reopen it and everything works as expected:

```python
import numpy as np
import xarray as xr

ones = xr.DataArray(np.ones(5), coords=[range(5)], dims=['x']).to_dataset(name='a')

ones.to_netcdf('./a.nc')
print(xr.open_dataset('./a.nc')['a'])
```

```
<xarray.DataArray 'a' (x: 5)>
array([1., 1., 1., 1., 1.])
Coordinates:
  * x        (x) int64 0 1 2 3 4
```

I can overwrite `a.nc` with a modified dataset and everything still works as expected:

```python
twos = 2 * ones
twos.to_netcdf('./a.nc')
print(xr.open_dataset('./a.nc', cache=False)['a'])
```

```
<xarray.DataArray 'a' (x: 5)>
array([2., 2., 2., 2., 2.])
Coordinates:
  * x        (x) int64 0 1 2 3 4
```

I can run the above cell as many times as I like and always get the expected behaviour. However, if instead of `print`ing the `open_dataset` line, I allow it to be rendered by the xarray html repr, I find that the cell will run once and then will fail with a `Permission denied` error the second time it is run:

```python
twos.to_netcdf('./a.nc')
xr.open_dataset('./a.nc', cache=False)['a']
```

```
KeyError                                  Traceback (most recent call last)
.../lib/python3.8/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    198             try:
--> 199                 file = self._cache[self._key]
    200             except KeyError:

.../lib/python3.8/site-packages/xarray/backends/lru_cache.py in __getitem__(self, key)
     52         with self._lock:
---> 53             value = self._cache[key]
     54             self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('.../a.nc',), 'a', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:
.
.
.
PermissionError: [Errno 13] Permission denied: b'.../a.nc'
```

If I manually remove the file in question, I can resave it, but from then on xarray seems to have its wires crossed somehow and will present `twos` from `a.nc` regardless of what it actually contains:

```python
!rm ./a.nc
ones.to_netcdf('./a.nc')
print(xr.open_dataset('./a.nc')['a'])
```

```
<xarray.DataArray 'a' (x: 5)>
array([2., 2., 2., 2., 2.])
Coordinates:
  * x        (x) int64 0 1 2 3 4
```

Note that in the last example, the data saved on disk is correct (i.e. it contains ones), but xarray is still somehow linked to the `twos` data.
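The stale read described above would be consistent with a file handle that stays cached under a path-based key after the file is removed and recreated. The report does not confirm that this is the cause, so the following is only a minimal standard-library sketch of that general mechanism on a POSIX filesystem. `handle_cache` and `cached_open` are hypothetical names made up for illustration and are not xarray's implementation.

```python
import os
import tempfile

# Hypothetical stand-in for a handle cache keyed by path (NOT xarray's code).
handle_cache = {}


def cached_open(path):
    """Return a cached read handle for `path`, opening it only on first use."""
    if path not in handle_cache:
        handle_cache[path] = open(path, "r")
    handle = handle_cache[path]
    handle.seek(0)  # rewind so every call reads from the start
    return handle


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "a.txt")

# Write "twos" and read it through the cache; the handle stays open.
with open(path, "w") as f:
    f.write("twos")
first_read = cached_open(path).read()

# Remove the file (like `!rm ./a.nc`) and write new content to the same path.
os.remove(path)
with open(path, "w") as f:
    f.write("ones")

# The cached handle still points at the removed file's data (POSIX assumption),
# while a freshly opened handle sees what is actually on disk.
stale_read = cached_open(path).read()
with open(path) as f:
    fresh_read = f.read()

print(first_read, stale_read, fresh_read)

# Clean up the cached handle and temporary files.
for handle in handle_cache.values():
    handle.close()
handle_cache.clear()
os.remove(path)
os.rmdir(workdir)
```

On a POSIX system the cached handle keeps returning the old content, while the fresh handle returns the new content. That mirrors the symptom in the report, where the data on disk is correct but the reopened dataset is not.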

Anything else we need to know?:

I've come across this unexpected behaviour a few times. In the above example, I've had to add `cache=False` to consistently produce the behaviour, but in the past I've managed to produce these symptoms without it (I'm just not exactly sure how). Anecdotally, the behaviour always seems to occur after having rendered the xarray object in Jupyter using the html repr.
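A pattern that may sidestep the problem is to open the file in a `with` block, load the variable into memory, and let the block close the file before it is overwritten. This is a sketch and has not been tested against the setup in this report. It assumes a netCDF backend such as netCDF4 is installed, and both the helper `write_then_read` and the temporary path are hypothetical choices for illustration.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Temporary location used only for this sketch (the report uses './a.nc').
path = os.path.join(tempfile.mkdtemp(), "a.nc")

ones = xr.DataArray(np.ones(5), coords=[range(5)], dims=["x"]).to_dataset(name="a")
twos = 2 * ones


def write_then_read(ds, path):
    """Write `ds` to `path`, then return variable 'a' loaded into memory.

    The file is closed when the `with` block exits, so the returned
    DataArray does not keep the file open.
    """
    ds.to_netcdf(path)
    with xr.open_dataset(path) as opened:
        return opened["a"].load()


first = write_then_read(ones, path)
second = write_then_read(twos, path)
third = write_then_read(ones, path)

print(first.values, second.values, third.values)
```

Because each returned `DataArray` is fully loaded and detached from the file, displaying it with the html repr in Jupyter should not leave the file open.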

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4833/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
