
issues


2 rows where user = 36678697 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2037869483 I_kwDOAMm_X855d2ur 8544 Reading netcdf file with string coordinates makes IPython kernel crash (netcdf4 engine) Paul-Aime 36678697 closed 0     14 2023-12-12T14:26:42Z 2024-01-04T21:49:34Z 2023-12-22T09:29:45Z NONE      

What happened?

Opening a netCDF file that has strings as coordinates crashes the notebook kernel.

This only happens when engine=netcdf4, and not when engine=h5netcdf.

The bug occurs in IPython, in Jupyter in the web browser and in VSCode notebooks at least.

The bug can be reproduced consistently by reading the same file twice in the same cell and running that cell twice.

What did you expect to happen?

engine=netcdf4 should behave the same as engine=h5netcdf, i.e. it should not crash the kernel.

Minimal Complete Verifiable Example

```Python
# %%
import numpy as np
import xarray as xr

# %%
fpath = "test.nc"

# Write a small DataArray whose "label" coordinate holds strings.
da = xr.DataArray(
    data=np.random.randn(3, 10),
    dims=["label", "values"],
    coords=dict(
        label=["a", "b", "c"],
    ),
)
da.to_netcdf(fpath)

# %%
# engine = "h5netcdf"  # does not crash
engine = "netcdf4"
xr.open_dataarray(fpath, engine=engine)
xr.open_dataarray(fpath, engine=engine)
```
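
One way to compare the two engines without losing the notebook kernel is to run the reproduction in a child interpreter and inspect its exit code. The following is a minimal sketch, not part of the original report: `run_isolated`, `REPRO` and `compare_engines` are hypothetical names, and it assumes `test.nc` has already been written by the example above.

```python
import subprocess
import sys

# Hypothetical reproduction snippet; assumes test.nc already exists.
REPRO = """
import xarray as xr
xr.open_dataarray("test.nc", engine="{engine}")
xr.open_dataarray("test.nc", engine="{engine}")
"""


def run_isolated(snippet, timeout=60):
    """Run `snippet` in a fresh interpreter and return its exit code."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode


def compare_engines(engines=("h5netcdf", "netcdf4")):
    """Print the exit code of the reproduction for each engine."""
    for engine in engines:
        code = run_isolated(REPRO.format(engine=engine))
        print(f"{engine}: exit code {code}")
```

Calling `compare_engines()` prints one exit code per engine. On POSIX, a negative value means the child process was terminated by the signal with that number, so a crash shows up in the exit code while the calling session keeps running.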

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

IPython crashes with: Segmentation fault (core dumped)
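
To see which Python call was active when the segmentation fault occurred, the standard-library `faulthandler` module can dump the Python traceback on fatal signals. This is a minimal sketch under assumptions: `enable_crash_log` and the log file name are hypothetical, and writing to a file is only a guess at what is convenient when the notebook front end does not display the kernel's stderr.

```python
import faulthandler


def enable_crash_log(path="crash_traceback.log"):
    """Dump the Python traceback of all threads to `path` on a fatal signal."""
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    # The file must stay open for as long as faulthandler is enabled,
    # so the caller should keep a reference to the returned handle.
    return log_file
```

Running `log_file = enable_crash_log()` in the first cell, before the `open_dataarray` calls, should leave a traceback in the log file if the kernel dies.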

Jupyter Notebook logs:

[I 2023-12-12 15:20:00.474 ServerApp] Kernel restarted: 054a63c4-4f46-4dc2-b58f-4dcd4ce9951c
[I 2023-12-12 15:20:00.482 ServerApp] Starting buffering for 054a63c4-4f46-4dc2-b58f-4dcd4ce9951c:0bd5dcd6-faa7-413a-b6c5-080b1c774933
[I 2023-12-12 15:20:00.494 ServerApp] Connecting to kernel 054a63c4-4f46-4dc2-b58f-4dcd4ce9951c.
[I 2023-12-12 15:20:00.494 ServerApp] Restoring connection for 054a63c4-4f46-4dc2-b58f-4dcd4ce9951c:0bd5dcd6-faa7-413a-b6c5-080b1c774933
0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
[IPKernelApp] WARNING | Unknown error in handling startup files:
[I 2023-12-12 15:20:09.463 ServerApp] AsyncIOLoopKernelRestarter: restarting kernel (1/5), keep random ports
[W 2023-12-12 15:20:09.463 ServerApp] kernel 054a63c4-4f46-4dc2-b58f-4dcd4ce9951c restarted
[I 2023-12-12 15:20:09.470 ServerApp] Starting buffering for 054a63c4-4f46-4dc2-b58f-4dcd4ce9951c:0bd5dcd6-faa7-413a-b6c5-080b1c774933
[I 2023-12-12 15:20:09.504 ServerApp] Connecting to kernel 054a63c4-4f46-4dc2-b58f-4dcd4ce9951c.
[I 2023-12-12 15:20:09.505 ServerApp] Restoring connection for 054a63c4-4f46-4dc2-b58f-4dcd4ce9951c:0bd5dcd6-faa7-413a-b6c5-080b1c774933
0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
[IPKernelApp] WARNING | Unknown error in handling startup files:

VSCode notebook Jupyter logs:

```
15:23:03.501 [info] Restart requested ~/Desktop/bug_xarray_notebook/bug.ipynb
15:23:03.502 [info] Dispose Kernel process 2763594.
15:23:03.589 [info] Process Execution: ~/miniconda3/bin/python -c "import ipykernel; print(ipykernel.__version__); print("5dc3a68c-e34e-4080-9c3e-2a532b2ccb4d"); print(ipykernel.__file__)"
15:23:03.671 [info] Process Execution: ~/miniconda3/bin/python -m ipykernel_launcher --f=~/.local/share/jupyter/runtime/kernel-v2-2727807dzOm3m1LEA5V.json
    > cwd: ~/Desktop/bug_xarray_notebook
15:23:04.149 [warn] StdErr from Kernel Process [IPKernelApp] WARNING | Unknown error in handling startup files:
15:23:04.454 [info] Restarted bd04fd87-98e7-486d-a6c6-7308101edcdf
15:23:08.046 [info] Handle Execution of Cells 0 for ~/Desktop/bug_xarray_notebook/bug.ipynb
15:23:08.055 [info] Kernel acknowledged execution of cell 0 @ 1702390988054
15:23:08.412 [info] End cell 0 execution after 0.358s, completed @ 1702390988412, started @ 1702390988054
15:23:09.260 [info] Handle Execution of Cells 1 for ~/Desktop/bug_xarray_notebook/bug.ipynb
15:23:09.269 [info] Kernel acknowledged execution of cell 1 @ 1702390989268
15:23:09.305 [info] End cell 1 execution after 0.036s, completed @ 1702390989304, started @ 1702390989268
15:23:10.893 [info] Handle Execution of Cells 2 for ~/Desktop/bug_xarray_notebook/bug.ipynb
15:23:10.907 [info] Kernel acknowledged execution of cell 2 @ 1702390990907
15:23:10.971 [info] End cell 2 execution after 0.064s, completed @ 1702390990971, started @ 1702390990907
15:23:12.255 [info] Handle Execution of Cells 2 for ~/Desktop/bug_xarray_notebook/bug.ipynb
15:23:12.262 [info] Kernel acknowledged execution of cell 2 @ 1702390992262
15:23:12.504 [error] Disposing session as kernel process died ExitCode: undefined, Reason: [IPKernelApp] WARNING | Unknown error in handling startup files:

15:23:12.505 [info] Dispose Kernel process 2764104.
15:23:12.518 [info] End cell 2 execution after -1702390992.262s, completed @ undefined, started @ 1702390992262
```

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.12.0 | packaged by conda-forge | (main, Oct 3 2023, 08:43:22) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-91-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2023.12.0
pandas: 2.1.4
numpy: 1.26.2
scipy: None
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.18.1
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8544/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
627735640 MDU6SXNzdWU2Mjc3MzU2NDA= 4113 xarray.DataArray.stack load data into memory Paul-Aime 36678697 closed 0     6 2020-05-30T13:45:38Z 2022-04-19T16:10:26Z 2022-04-19T16:10:25Z NONE      

Stacking loads the data into memory, which is unexpected, or at least undocumented, afaik.

MCVE Code Sample

```python
import os

import numpy as np
import psutil
import xarray as xr


def main():
    # Write a ~800 MB array to disk, then reopen it lazily.
    xr.DataArray(
        np.random.randn(1024, 1024, 100),
        dims=("x", "y", "z"),
    ).to_netcdf("da.nc")

    da = xr.open_dataarray("da.nc")
    print(f" da: {mb(da.nbytes)} MB")
    print_ram_state()

    # Stacking x and y into one dimension is where the RAM usage jumps.
    mda = da.stack(px=("x", "y"))
    print_ram_state()


def print_ram_state():
    # https://stackoverflow.com/a/21632554
    process = psutil.Process(os.getpid())
    ram_state = process.memory_info().rss
    print(f"RAM: {mb(ram_state) :.2f} MB")


def mb(nbytes):
    return nbytes / (1024 * 1024)


if __name__ == "__main__":
    main()
```

Problem Description

Using the xarray.DataArray.stack method loads the data into memory, which is unexpected behavior, or at least undocumented afaik.
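
To check whether the extra memory comes from allocations made during `stack` itself, the RSS readings could be complemented with the standard-library `tracemalloc` module. This is a minimal sketch and not part of the original report: `traced_peak` is a hypothetical helper, and it only sees allocations reported through Python's allocator hooks, so memory allocated directly by C libraries may not appear.

```python
import tracemalloc
from contextlib import contextmanager


@contextmanager
def traced_peak(label):
    """Record the peak traced allocation (in MB) made inside the with-block."""
    stats = {}
    tracemalloc.start()
    try:
        yield stats
    finally:
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        stats["peak_mb"] = peak / (1024 * 1024)
        print(f"{label}: peak traced allocation {stats['peak_mb']:.2f} MB")
```

In the example above, wrapping the call as `with traced_peak("stack"): mda = da.stack(px=("x", "y"))` would report the peak allocation attributable to that one step, separately from the file write and open.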

Versions

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 23:03:20) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.3.0-53-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: None

xarray: 0.15.1
pandas: 1.0.3
numpy: 1.17.5
scipy: 1.4.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.16.0
distributed: 2.16.0
matplotlib: 3.2.1
cartopy: None
seaborn: 0.10.1
numbagg: None
setuptools: 46.4.0.post20200518
pip: 20.1.1
conda: 4.8.3
pytest: 5.4.2
IPython: 7.14.0
sphinx: 3.0.4
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4113/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 19.186ms · About: xarray-datasette