issue_comments


11 rows where issue = 694112301 sorted by updated_at descending




Facets:
  • user (7 values): hansukyang 3, shoyer 2, bekatd 2, TomAugspurger 1, djhoese 1, tasansal 1, bilelomrani1 1
  • author_association (3 values): NONE 7, MEMBER 3, CONTRIBUTOR 1
  • issue (1 value): Threading Lock issue with to_netcdf and Dask arrays · 11
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sorted descending), author_association, body, reactions, performed_via_github_app, issue
988359778 https://github.com/pydata/xarray/issues/4406#issuecomment-988359778 https://api.github.com/repos/pydata/xarray/issues/4406 IC_kwDOAMm_X8466Sxi tasansal 13684161 2021-12-08T00:05:24Z 2021-12-08T00:06:22Z NONE

I am having a similar issue. I'm using the latest versions of dask, xarray, distributed, fsspec, and gcsfs. I use the h5netcdf backend because it is the only one that works with fsspec's binary streams when reading from the cloud.

My workflow consists of the following steps (sketched in code below):
  1. Start a dask client with 1 process per CPU and 2 threads each, because reading from the cloud doesn't scale up with threads alone.
  2. Open 12x monthly climate data files (hourly sampled) using xarray.open_mfdataset.
  3. Use reasonable dask chunks in the open function.
  4. Take the monthly average across the time axis and write it to a local NetCDF file.
  5. Repeat steps 2-4 for different years.
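
A minimal sketch of this workflow, assuming hypothetical bucket paths, chunk sizes, and worker counts (none of these specifics are given in the comment):

# Hypothetical sketch of the workflow above; the bucket path, chunk size,
# and worker count are placeholders, not the reporter's actual values.
import fsspec
import xarray as xr
from dask.distributed import Client

# 1. One process per CPU with two threads each (threads alone did not scale for cloud reads).
client = Client(n_workers=8, threads_per_worker=2)

# 2.-3. Open 12 monthly files of hourly data via fsspec/gcsfs with the h5netcdf backend.
files = fsspec.open_files("gs://some-bucket/climate-2020-*.nc")
ds = xr.open_mfdataset(
    [f.open() for f in files],
    engine="h5netcdf",
    chunks={"time": 24 * 31},
)

# 4. Monthly average across the time axis, written to a local NetCDF file.
ds.resample(time="1MS").mean().to_netcdf("monthly_mean_2020.nc")

# 5. Repeat steps 2-4 for different years.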

It is hit or miss: it hangs towards the middle or end of a year, and the next time I run it, it doesn't.

Once it hangs and I hit stop, the traceback shows it stuck awaiting a threading lock.

Any ideas how to avoid this?

Things I tried:
  1. Use processes only, with 1 thread per worker.
  2. lock=True and lock=False on open_mfdataset.
  3. Dask scheduler as spawn and forkserver.
  4. Different (but recent) versions of all the libraries.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Threading Lock issue with to_netcdf and Dask arrays 694112301
785432974 https://github.com/pydata/xarray/issues/4406#issuecomment-785432974 https://api.github.com/repos/pydata/xarray/issues/4406 MDEyOklzc3VlQ29tbWVudDc4NTQzMjk3NA== djhoese 1828519 2021-02-24T22:42:15Z 2021-02-24T22:42:15Z CONTRIBUTOR

I'm having a similar issue to what is described here, but I'm seeing it even when I'm not rewriting an output file (although that is an option in my code). I have a delayed function that calls to_netcdf and seems to run into some race condition where I get the same deadlock as the original poster. It seems highly dependent on the number of dask tasks and the number of workers. I think I've gotten around it for now by having my delayed function return the Dataset it is working on and then calling to_netcdf later (sketched below). The problem is I have cases where I might not want to write the file, so my delayed function returns None. To handle this I need to pre-compute my delayed functions before calling to_netcdf, since I don't think there is a way to pass something to to_netcdf so it doesn't create a file.

With the original code it happened quite a bit, but it was part of a much larger application, so I can't really get an MWE together. Just wanted to mention it here as another data point (to_netcdf inside a delayed function may not work 100% of the time).
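
A minimal sketch of that workaround, with hypothetical function and file names (the real application is much larger):

# Hypothetical sketch of the workaround above: return the Dataset from the
# delayed function instead of calling to_netcdf inside it, then write after compute().
import dask
import xarray as xr

@dask.delayed
def process_one(path, write=True):
    ds = xr.open_dataset(path, chunks={})
    result = ds.mean("time")          # placeholder for the real processing
    return result if write else None  # some cases should not produce a file

results = dask.compute(*[process_one(p) for p in ["a.nc", "b.nc"]])
for i, res in enumerate(results):
    if res is not None:
        res.to_netcdf(f"out_{i}.nc")  # writing happens outside the task graph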

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Threading Lock issue with to_netcdf and Dask arrays 694112301
691756409 https://github.com/pydata/xarray/issues/4406#issuecomment-691756409 https://api.github.com/repos/pydata/xarray/issues/4406 MDEyOklzc3VlQ29tbWVudDY5MTc1NjQwOQ== hansukyang 11863789 2020-09-14T01:03:54Z 2020-09-14T01:03:54Z NONE

Good point! Yes, after a bit of trial and error, this is what I did. Is there any limitation when over-writing an existing NetCDF file that hasn't been opened by xarray?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Threading Lock issue with to_netcdf and Dask arrays 694112301
691717920 https://github.com/pydata/xarray/issues/4406#issuecomment-691717920 https://api.github.com/repos/pydata/xarray/issues/4406 MDEyOklzc3VlQ29tbWVudDY5MTcxNzkyMA== shoyer 1217238 2020-09-13T20:01:57Z 2020-09-13T20:01:57Z MEMBER

> was using xarray to manipulate NetCDF and re-write to it

We should probably document this more clearly, but opening and then rewriting the same file in xarray without closing the original file is not something xarray supports.
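
One supported pattern is to load the data fully into memory and close the file before writing back to the same path; a minimal sketch with a hypothetical filename:

# Hypothetical sketch: load into memory and close the file before overwriting it.
import xarray as xr

path = "data.nc"
with xr.open_dataset(path) as ds:
    ds = ds.load()                      # pull all data into memory; the file closes on exit
ds = ds.assign_attrs(note="modified")   # placeholder manipulation
ds.to_netcdf(path)                      # safe: the original handle is closed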

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Threading Lock issue with to_netcdf and Dask arrays 694112301
691670151 https://github.com/pydata/xarray/issues/4406#issuecomment-691670151 https://api.github.com/repos/pydata/xarray/issues/4406 MDEyOklzc3VlQ29tbWVudDY5MTY3MDE1MQ== hansukyang 11863789 2020-09-13T13:15:30Z 2020-09-13T13:15:30Z NONE

For my case, I saw this happen only when I started to run xarray scripts with cron, about a month ago. I would run it once every six hours, and every day or so I would see a NetCDF file locked up. I ended up changing the workflow somewhat so I don't do this any more (I was using xarray to manipulate a NetCDF file and re-write to it), but this was confusing me for quite a while.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Threading Lock issue with to_netcdf and Dask arrays 694112301
691096342 https://github.com/pydata/xarray/issues/4406#issuecomment-691096342 https://api.github.com/repos/pydata/xarray/issues/4406 MDEyOklzc3VlQ29tbWVudDY5MTA5NjM0Mg== bilelomrani1 16692099 2020-09-11T13:30:29Z 2020-09-11T13:30:29Z NONE

> Did this work reliably in the past? If so, any clues about specific versions of dask and/or netCDF that cause the issue would be helpful.

I am working on my first project using Dask arrays in conjunction with xarray, so I cannot tell whether previous combinations of versions worked. I tried downgrading dask to v2.20 but the issue is still there.

> This is just using Dask's threaded scheduler, right? I don't recall any changes there recently.

Without the option chunks={'time': 200} the previous snippet seems to work very reliably.
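
As a hypothetical sketch of the contrast being described (the original reproducer is not shown on this page and the filenames are placeholders):

# Hypothetical sketch of the comparison above; the original snippet is not
# reproduced on this page.
import xarray as xr

ds_ok = xr.open_mfdataset("input_*.nc")                           # reported to work reliably
ds_hangs = xr.open_mfdataset("input_*.nc", chunks={"time": 200})  # reported to hang intermittently

ds_hangs.to_netcdf("output.nc")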

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Threading Lock issue with to_netcdf and Dask arrays 694112301
691083939 https://github.com/pydata/xarray/issues/4406#issuecomment-691083939 https://api.github.com/repos/pydata/xarray/issues/4406 MDEyOklzc3VlQ29tbWVudDY5MTA4MzkzOQ== TomAugspurger 1312546 2020-09-11T13:07:00Z 2020-09-11T13:07:00Z MEMBER

> @TomAugspurger do you know off-hand if there have been any recent changes in Dask's scheduler that could have caused this?

This is just using Dask's threaded scheduler, right? I don't recall any changes there recently.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Threading Lock issue with to_netcdf and Dask arrays 694112301
690922236 https://github.com/pydata/xarray/issues/4406#issuecomment-690922236 https://api.github.com/repos/pydata/xarray/issues/4406 MDEyOklzc3VlQ29tbWVudDY5MDkyMjIzNg== bekatd 6948919 2020-09-11T07:18:03Z 2020-09-11T07:18:03Z NONE

> Did this work reliably in the past? If so, any clues about specific versions of dask and/or netCDF that cause the issue would be helpful.
>
> @TomAugspurger do you know off-hand if there have been any recent changes in Dask's scheduler that could have caused this?

I am new to the xarray/dask combination, but a month ago it was working without issues. I recently reinstalled Python and don't know whether the versions differ from the previous ones.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Threading Lock issue with to_netcdf and Dask arrays 694112301
690912480 https://github.com/pydata/xarray/issues/4406#issuecomment-690912480 https://api.github.com/repos/pydata/xarray/issues/4406 MDEyOklzc3VlQ29tbWVudDY5MDkxMjQ4MA== shoyer 1217238 2020-09-11T06:53:58Z 2020-09-11T06:53:58Z MEMBER

Did this work reliably in the past? If so, any clues about specific versions of dask and/or netCDF that cause the issue would be helpful.

@TomAugspurger do you know off-hand if there have been any recent changes in Dask's scheduler that could have caused this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Threading Lock issue with to_netcdf and Dask arrays 694112301
690340047 https://github.com/pydata/xarray/issues/4406#issuecomment-690340047 https://api.github.com/repos/pydata/xarray/issues/4406 MDEyOklzc3VlQ29tbWVudDY5MDM0MDA0Nw== bekatd 6948919 2020-09-10T14:48:38Z 2020-09-10T14:48:38Z NONE

Using:

  • xarray=0.16.0
  • dask=2.25.0
  • netcdf4=1.5.4

I am experiencing the same when trying to write a NetCDF file using to_netcdf() on files opened via xr.open_mfdataset with lock=None (which is the default).

Then I tried opening the files with lock=False and it worked like a charm. The issue was gone 100% of the time.

BUT

Now I am facing a different issue. It seems that HDF5 is NOT thread safe, since I encounter a NetCDF: HDF error while applying a different function to NetCDF files which were previously processed by other functions with lock=False. The script just terminates without even reaching any calculation step in the code. It seems like lock=False works the opposite way and leaves the file in a corrupted state?

This is the BIGGEST issue and needs resolving ASAP.
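
A minimal sketch of the lock=False workaround described above (filenames are placeholders; the same comment later reports HDF errors after using it, so this is a reported workaround rather than a recommended fix):

# Hypothetical sketch of the lock=False workaround with xarray 0.16 / dask 2.25;
# the default is lock=None.
import xarray as xr

ds = xr.open_mfdataset("input_*.nc", lock=False)
ds.to_netcdf("output.nc")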

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Threading Lock issue with to_netcdf and Dask arrays 694112301
688101357 https://github.com/pydata/xarray/issues/4406#issuecomment-688101357 https://api.github.com/repos/pydata/xarray/issues/4406 MDEyOklzc3VlQ29tbWVudDY4ODEwMTM1Nw== hansukyang 11863789 2020-09-07T07:28:37Z 2020-09-07T07:46:39Z NONE

I seem to have a similar issue as well, running in a docker/Linux environment. It doesn't happen every time, maybe once out of 4-5 runs. I wonder if this is related to the NetCDF/HDF5 file locking issue (https://support.nesi.org.nz/hc/en-gb/articles/360000902955-NetCDF-HDF5-file-locking).
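
If HDF5 file locking is the culprit, a commonly suggested mitigation (not confirmed as the fix in this thread) is to disable it through an environment variable before any HDF5-backed library loads; a minimal sketch:

# Hypothetical mitigation sketch: disable HDF5 file locking before netCDF4/h5py/
# h5netcdf are first imported.
import os
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

import xarray as xr                 # imported only after setting the variable
ds = xr.open_dataset("data.nc")     # placeholder path
ds.to_netcdf("out.nc")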

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Threading Lock issue with to_netcdf and Dask arrays 694112301

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
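
For reference, a sketch of running the query behind this page directly against the underlying SQLite database (the database filename is a placeholder):

# Hypothetical sketch: reproduce "rows where issue = 694112301 sorted by
# updated_at descending" with sqlite3 against a local copy of the database.
import sqlite3

conn = sqlite3.connect("github.db")   # placeholder database filename
rows = conn.execute(
    "SELECT id, user, created_at, updated_at, author_association, body "
    "FROM issue_comments WHERE issue = ? ORDER BY updated_at DESC",
    (694112301,),
).fetchall()
print(len(rows))  # expected: 11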