issue_comments


16 rows where issue = 1402002645 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1272560073 https://github.com/pydata/xarray/issues/7146#issuecomment-1272560073 https://api.github.com/repos/pydata/xarray/issues/7146 IC_kwDOAMm_X85L2bnJ keewis 14808389 2022-10-09T14:56:28Z 2022-10-09T14:57:44Z MEMBER

Since we have eliminated xarray with this, you should be able to submit an issue to the h5py issue tracker, mentioning that this is probably a bug in libhdf5 since netcdf4 also fails with the same error (you can also link this issue for more information).

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault writing large netcdf files to s3fs 1402002645
1272558504 https://github.com/pydata/xarray/issues/7146#issuecomment-1272558504 https://api.github.com/repos/pydata/xarray/issues/7146 IC_kwDOAMm_X85L2bOo d1mach 11075246 2022-10-09T14:49:33Z 2022-10-09T14:49:33Z NONE

I had to change ints and floats to doubles to reproduce the issue.

```python
import h5py

N_TIMES = 48
with h5py.File("/my_s3_fs/test.nc", mode="w") as f:
    time = f.create_dataset("time", (N_TIMES,), dtype="d")
    time[:] = 0

    d1 = f.create_dataset("d1", (N_TIMES, 201, 201), dtype="d")
    d1[:] = 0
```

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault writing large netcdf files to s3fs 1402002645
1272555653 https://github.com/pydata/xarray/issues/7146#issuecomment-1272555653 https://api.github.com/repos/pydata/xarray/issues/7146 IC_kwDOAMm_X85L2aiF keewis 14808389 2022-10-09T14:36:13Z 2022-10-09T14:36:13Z MEMBER

great, good to know. Can you try this with h5py:

```python
import h5py

N_TIMES = 48
with h5py.File("test.nc", mode="w") as f:
    time = f.create_dataset("time", (N_TIMES,), dtype="i")
    time[:] = 0

    d1 = f.create_dataset("d1", (N_TIMES, 201, 201), dtype="f")
    d1[:] = 0
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault writing large netcdf files to s3fs 1402002645
1272553921 https://github.com/pydata/xarray/issues/7146#issuecomment-1272553921 https://api.github.com/repos/pydata/xarray/issues/7146 IC_kwDOAMm_X85L2aHB d1mach 11075246 2022-10-09T14:27:06Z 2022-10-09T14:27:06Z NONE

The datatype seems not to matter, but both variables are required to get a segfault. The following, with just floats, produces a segfault:

```python
import numpy as np
import xarray as xr

N_TIMES = 48
ds = xr.Dataset(
    {
        "time": ("T", np.zeros(N_TIMES)),
        "d1": (["T", "x", "y"], np.zeros((N_TIMES, 201, 201))),
    }
)
ds.to_netcdf(path="/my_s3_fs/test_netcdf.nc", format="NETCDF4", mode="w")
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault writing large netcdf files to s3fs 1402002645
1272544819 https://github.com/pydata/xarray/issues/7146#issuecomment-1272544819 https://api.github.com/repos/pydata/xarray/issues/7146 IC_kwDOAMm_X85L2X4z d1mach 11075246 2022-10-09T13:37:57Z 2022-10-09T14:25:51Z NONE

It seems that we need the time variable to reproduce the problem. The following code does not fail:

```python
import numpy as np
import xarray as xr

N_TIMES = 64
ds = xr.Dataset({"d1": (["T", "x", "y"], np.zeros((N_TIMES, 201, 201)))})
ds.to_netcdf(path="/my_s3_fs/test_netcdf.nc", format="NETCDF4", mode="w")
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault writing large netcdf files to s3fs 1402002645
1272550986 https://github.com/pydata/xarray/issues/7146#issuecomment-1272550986 https://api.github.com/repos/pydata/xarray/issues/7146 IC_kwDOAMm_X85L2ZZK keewis 14808389 2022-10-09T14:09:44Z 2022-10-09T14:09:44Z MEMBER

okay, then does changing the dtype do anything? I.e. does this only happen with datetime64 / bytes, or do int / float / str also fail?
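
A minimal sketch of such a sweep (assuming the same s3fs mount path and array sizes as the repros above; the loop and dtype list are illustrative, not from the thread):

```python
import numpy as np
import xarray as xr

N_TIMES = 48
# Try the same write with the "time" variable in each candidate dtype:
# datetime64, fixed-width bytes, int, float, and str.
for time_dtype in ("datetime64[ns]", "S14", "i8", "f8", "U14"):
    time_vals = np.zeros(N_TIMES, dtype=time_dtype)
    ds = xr.Dataset(
        {
            "time": ("T", time_vals),
            "d1": (["T", "x", "y"], np.zeros((N_TIMES, 201, 201))),
        }
    )
    print("writing with time dtype:", time_dtype)
    ds.to_netcdf(path="/my_s3_fs/test_netcdf.nc", format="NETCDF4", mode="w")
```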

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault writing large netcdf files to s3fs 1402002645
1272542780 https://github.com/pydata/xarray/issues/7146#issuecomment-1272542780 https://api.github.com/repos/pydata/xarray/issues/7146 IC_kwDOAMm_X85L2XY8 keewis 14808389 2022-10-09T13:26:25Z 2022-10-09T13:26:25Z MEMBER

with this:

```python
ds2 = ds.time.dt.strftime("%Y%m%d%H%M%S").str.encode("utf-8").to_dataset().assign(d1=ds.d1)
```

but we don't really need to check that if the first dataset already fails.

Now I'd probably check whether it's just the size that makes it fail (i.e. remove "time" from ds and keep just d1, maybe increasing N_TIMES by one if it does not fail as-is), or whether it depends on the dtype (i.e. set time_vals to np.arange(N_TIMES, dtype=int)).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault writing large netcdf files to s3fs 1402002645
1272541759 https://github.com/pydata/xarray/issues/7146#issuecomment-1272541759 https://api.github.com/repos/pydata/xarray/issues/7146 IC_kwDOAMm_X85L2XI_ d1mach 11075246 2022-10-09T13:21:22Z 2022-10-09T13:21:40Z NONE

The first one results in a segfault:

```python
import numpy as np
import xarray as xr
import pandas as pd

N_TIMES = 48
time_vals = pd.date_range("2022-10-06", freq="20 min", periods=N_TIMES)
ds = xr.Dataset(
    {
        "time": ("T", time_vals),
        "d1": (["T", "x", "y"], np.zeros((len(time_vals), 201, 201))),
    }
)
ds.to_netcdf(path="/my_s3_fs/test_netcdf.nc", format="NETCDF4", mode="w")
```

Not sure how to add the 3D var to the second dataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault writing large netcdf files to s3fs 1402002645
1272539394 https://github.com/pydata/xarray/issues/7146#issuecomment-1272539394 https://api.github.com/repos/pydata/xarray/issues/7146 IC_kwDOAMm_X85L2WkC keewis 14808389 2022-10-09T13:10:12Z 2022-10-09T13:10:25Z MEMBER

which ones fail if you add the 3D variable?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault writing large netcdf files to s3fs 1402002645
1272539102 https://github.com/pydata/xarray/issues/7146#issuecomment-1272539102 https://api.github.com/repos/pydata/xarray/issues/7146 IC_kwDOAMm_X85L2Wfe d1mach 11075246 2022-10-09T13:08:26Z 2022-10-09T13:08:26Z NONE

Will try to reproduce this with h5py. For the bug to show up the file has to be large enough; that is why my example has a 2D array variable alongside the time dimension. With just the time dimension the script completes without an error. All three cases work without an error: `ds.to_netcdf()`, `ds2.to_netcdf()`, and `ds3.to_netcdf()`.
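
For scale, a back-of-the-envelope on the uncompressed payload in the repro (float64 assumed, as in the failing script), showing why the 3D variable is what makes the file large enough:

```python
import numpy as np

N_TIMES = 48
itemsize = np.dtype("f8").itemsize         # 8 bytes per float64
time_bytes = N_TIMES * itemsize            # 384 bytes on its own
d1_bytes = N_TIMES * 201 * 201 * itemsize  # about 15.5 MB
print(f"time: {time_bytes} B, d1: {d1_bytes / 1e6:.1f} MB")
```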

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault writing large netcdf files to s3fs 1402002645
1272535683 https://github.com/pydata/xarray/issues/7146#issuecomment-1272535683 https://api.github.com/repos/pydata/xarray/issues/7146 IC_kwDOAMm_X85L2VqD keewis 14808389 2022-10-09T12:48:51Z 2022-10-09T12:49:28Z MEMBER

if this crashes with both netcdf4 and h5netcdf this might be a bug in the libhdf5 library. If we can manage to reduce this to use just h5py (or netCDF4), it should be suitable for reporting on their issue tracker, and those libraries can then push it further to libhdf5 (otherwise, if you're up for investigating / debugging the C library, you could also report to libhdf5 directly).
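
A hypothetical pure-netCDF4 analogue of the h5py variants above (same shapes and mount path assumed) that could serve as such a reduced reproducer:

```python
import netCDF4

N_TIMES = 48
# Write the same two variables through netCDF4 directly, bypassing xarray.
with netCDF4.Dataset("/my_s3_fs/test.nc", mode="w", format="NETCDF4") as nc:
    nc.createDimension("T", N_TIMES)
    nc.createDimension("x", 201)
    nc.createDimension("y", 201)
    time = nc.createVariable("time", "f8", ("T",))
    time[:] = 0
    d1 = nc.createVariable("d1", "f8", ("T", "x", "y"))
    d1[:] = 0
```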

As for the MCVE: I wonder if we can trim it a bit. Can you reproduce with

```python
import xarray as xr
import pandas as pd

N_TIMES = 48
time_vals = pd.date_range("2022-10-06", freq="20 min", periods=N_TIMES)
ds = xr.Dataset({"time": ("T", time_vals)})
ds.to_netcdf(path="/my_s3_fs/test_netcdf.nc", format="NETCDF4", mode="w")
```

or, if it is important to have bytes:

```python
ds2 = ds.time.dt.strftime("%Y%m%d%H%M%S").str.encode("utf-8").to_dataset()
ds2.to_netcdf(...)
```

Also, it would be interesting to know if this happens only for data variables, or if coordinates have the same effect (use `ds2` instead of `ds` if bytes are important):

```python
ds3 = ds.set_coords("time")
ds3.to_netcdf(...)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault writing large netcdf files to s3fs 1402002645
1272514059 https://github.com/pydata/xarray/issues/7146#issuecomment-1272514059 https://api.github.com/repos/pydata/xarray/issues/7146 IC_kwDOAMm_X85L2QYL d1mach 11075246 2022-10-09T10:48:35Z 2022-10-09T10:48:35Z NONE

Adding a gdb stackrace.txt obtained from a core file. The container was started with `docker run -v /mnt/fs:/my_s3_fs -it --rm --ulimit core=-1 --privileged netcdf:latest /bin/bash`, and the core file was produced with `sudo sysctl -w kernel.core_pattern=/tmp/core-%e.%p.%h.%t` followed by `python mcve.py`.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault writing large netcdf files to s3fs 1402002645
1272366287 https://github.com/pydata/xarray/issues/7146#issuecomment-1272366287 https://api.github.com/repos/pydata/xarray/issues/7146 IC_kwDOAMm_X85L1sTP max-sixty 5635139 2022-10-08T17:43:18Z 2022-10-08T17:43:18Z MEMBER

Thanks @d1mach. Could it be related to https://github.com/pydata/xarray/issues/7136?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault writing large netcdf files to s3fs 1402002645
1272365368 https://github.com/pydata/xarray/issues/7146#issuecomment-1272365368 https://api.github.com/repos/pydata/xarray/issues/7146 IC_kwDOAMm_X85L1sE4 d1mach 11075246 2022-10-08T17:37:24Z 2022-10-08T17:37:24Z NONE

libnetcdf, netcdf4, and hdf5 are at their latest versions available on conda-forge.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault writing large netcdf files to s3fs 1402002645
1272364031 https://github.com/pydata/xarray/issues/7146#issuecomment-1272364031 https://api.github.com/repos/pydata/xarray/issues/7146 IC_kwDOAMm_X85L1rv_ d1mach 11075246 2022-10-08T17:30:41Z 2022-10-08T17:30:41Z NONE

Can confirm the issue with xarray 2022.6.0 and dask 2022.9.2, the latest versions available on conda-forge. The issue might be related to the netcdf4 and hdf5 libraries; will try to update those as well.
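
One way to capture the exact stack for the report is xarray's built-in helper, which also prints the libhdf5 and libnetcdf versions:

```python
import xarray as xr

# Prints versions of xarray and its I/O dependencies
# (libhdf5, libnetcdf, netCDF4, h5netcdf, dask, ...).
xr.show_versions()
```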

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault writing large netcdf files to s3fs 1402002645
1272362257 https://github.com/pydata/xarray/issues/7146#issuecomment-1272362257 https://api.github.com/repos/pydata/xarray/issues/7146 IC_kwDOAMm_X85L1rUR max-sixty 5635139 2022-10-08T17:20:00Z 2022-10-08T17:20:00Z MEMBER

That's quite an old version of xarray! Could we confirm it has similar results on a more recent version?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault writing large netcdf files to s3fs 1402002645

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);