issue_comments

3 rows where issue = 1333650265 and user = 1217238 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1210976795 https://github.com/pydata/xarray/issues/6904#issuecomment-1210976795 https://api.github.com/repos/pydata/xarray/issues/6904 IC_kwDOAMm_X85ILgob shoyer 1217238 2022-08-10T16:43:36Z 2022-08-10T16:43:36Z MEMBER

You might look into different multiprocessing modes: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

It may also be that the NetCDF or HDF5 libraries were simply not written in a way that can support multi-processing. This would not surprise me.

> BTW is there any advantage or difference in terms of CPU and memory consumption between opening the file only once and letting every process open it? I'm asking because I thought opening it in every process was just plain stupid, but it seems to perform exactly the same, so maybe I'm just creating a problem where there is none.

I agree, maybe this isn't worth the trouble. I have not seen it done successfully before.
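
For readers following the link above, the start method can be chosen per pool through `multiprocessing.get_context`. The sketch below uses only the standard library; `square` and `run_with_start_method` are hypothetical names for illustration, and nothing here is specific to xarray, NetCDF or HDF5. Whether a different start method changes the behaviour reported in this issue is untested.

```python
import multiprocessing as mp


def square(x):
    # Trivial stand-in for the real per-item work.
    return x * x


def run_with_start_method(method, values):
    # get_context returns a context object bound to one start method
    # ("fork", "spawn" or "forkserver", depending on the platform).
    ctx = mp.get_context(method)
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # The guard matters: with "spawn" the main module is re-imported
    # in every worker process.
    print(mp.get_all_start_methods())
    print(run_with_start_method("spawn", [1, 2, 3]))
```

With "spawn", each worker starts from a fresh interpreter instead of inheriting the parent's open file handles, which is one reason the start method can matter for libraries that keep state in C.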

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `sel` behaving randomly when applying to a dataset with multiprocessing 1333650265
1210255676 https://github.com/pydata/xarray/issues/6904#issuecomment-1210255676 https://api.github.com/repos/pydata/xarray/issues/6904 IC_kwDOAMm_X85IIwk8 shoyer 1217238 2022-08-10T07:10:41Z 2022-08-10T07:10:41Z MEMBER

> Will that work in the same way if I still use process_map, which uses concurrent.futures under the hood?

Yes it should, as long as you're using multi-processing under the covers.

If you do multi-threading, then you would want to use threading.Lock(). But I believe we already apply a thread lock by default.
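
As a rough illustration of the multi-threading case, the sketch below serialises access to a shared resource with `threading.Lock`. `SharedReader` and `read_all` are hypothetical names, and the list stands in for a file handle that is assumed not to be thread-safe; this is not how xarray applies its own default lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedReader:
    """Hypothetical stand-in for a file handle that is not thread-safe."""

    def __init__(self, data):
        self._data = data
        self._lock = threading.Lock()
        self.reads = 0

    def read(self, index):
        # Only one thread at a time may touch the underlying resource.
        with self._lock:
            self.reads += 1
            return self._data[index]


def read_all(reader, indices, max_workers=4):
    # All threads share the same reader object, and therefore the same lock.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(reader.read, indices))
```

A `threading.Lock` only coordinates threads inside one process; worker processes each get their own copy, so it does not protect a resource shared across processes.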

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `sel` behaving randomly when applying to a dataset with multiprocessing 1333650265
1210233503 https://github.com/pydata/xarray/issues/6904#issuecomment-1210233503 https://api.github.com/repos/pydata/xarray/issues/6904 IC_kwDOAMm_X85IIrKf shoyer 1217238 2022-08-10T06:45:06Z 2022-08-10T06:45:06Z MEMBER

Can you try explicitly passing in a multiprocessing lock into the open_dataset() constructor? Something like:

    from multiprocessing import Lock
    ds = xarray.open_dataset(file, lock=Lock())

(We automatically select appropriate locks if using Dask, but I'm not sure how we would do that more generally...)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `sel` behaving randomly when applying to a dataset with multiprocessing 1333650265

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);