issues: 662505658

id: 662505658
node_id: MDU6SXNzdWU2NjI1MDU2NTg=
number: 4240
title: jupyter repr caching deleted netcdf file
user: 69774
state: closed
locked: 0
comments: 9
created_at: 2020-07-21T02:50:04Z
updated_at: 2022-10-18T16:40:41Z
closed_at: 2022-10-18T16:40:41Z
author_association: NONE
assignee, milestone, active_lock_reason, draft, pull_request: (empty)
body:

What happened:

While testing xarray data storage in a Jupyter notebook with varying data sizes, writing to a netCDF file each time, I noticed that open_dataset and open_dataarray (both show this behaviour) keep returning data from the first test run, even though each run deletes the previously created netCDF file. This only happens once the repr has been used to display the xarray object. Once that has happened, even objects that previously printed correctly show the wrong data.

This was hard to track down as it depends on the precise sequence in jupyter.

What you expected to happen:

When I use open_dataset or open_dataarray, the resulting object should reflect what is actually on disk.

Minimal Complete Verifiable Example:

```python
import xarray as xr
from pathlib import Path
import numpy as np


def test_repr(nx):
    ds = xr.DataArray(np.random.rand(nx))
    path = Path("saved_on_disk.nc")
    if path.exists():
        path.unlink()
    ds.to_netcdf(path)
    return path
```

When executed in a cell with print for display, all is fine:

    test_repr(4)
    print(xr.open_dataset("saved_on_disk.nc"))
    test_repr(5)
    print(xr.open_dataset("saved_on_disk.nc"))

but as soon as one cell used the jupyter repr:

    xr.open_dataset("saved_on_disk.nc")

all future file reads, even after executing the test function again and even using print and not repr, show the data from the last repr use.

Anything else we need to know?:

Here's a notebook showing the issue: https://gist.github.com/05c2542ed33662cdcb6024815cc0c72c

Environment:

Output of `xr.show_versions()`:

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0]
    python-bits: 64
    OS: Linux
    OS-release: 5.4.0-40-generic
    machine: x86_64
    processor: x86_64
    byteorder: little
    LC_ALL: None
    LANG: en_US.UTF-8
    LOCALE: en_US.UTF-8
    libhdf5: 1.10.6
    libnetcdf: 4.7.4

    xarray: 0.16.0
    pandas: 1.0.5
    numpy: 1.19.0
    scipy: 1.5.1
    netCDF4: 1.5.3
    pydap: None
    h5netcdf: None
    h5py: 2.10.0
    Nio: None
    zarr: None
    cftime: 1.2.1
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: 1.1.5
    cfgrib: None
    iris: None
    bottleneck: None
    dask: 2.21.0
    distributed: 2.21.0
    matplotlib: 3.3.0
    cartopy: 0.18.0
    seaborn: 0.10.1
    numbagg: None
    pint: None
    setuptools: 49.2.0.post20200712
    pip: 20.1.1
    conda: installed
    pytest: 6.0.0rc1
    IPython: 7.16.1
    sphinx: 3.1.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4240/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue