
issue_comments


5 rows where issue = 94328498 and user = 1197350 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
120668247 https://github.com/pydata/xarray/issues/463#issuecomment-120668247 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDEyMDY2ODI0Nw== rabernat 1197350 2015-07-11T23:01:38Z 2015-07-11T23:01:38Z MEMBER

8 MB. This is daily satellite data, with one file per time point. (Most satellite data is distributed this way.)

There are many other workarounds to this problem. You can try to increase your ulimit, or you can join these small netcdf files together into bigger ones. I had daily data files, and I used NCO to concatenate them into monthly files; that basically solved my problem, but of course it involves going outside of xray. (A sketch of both workarounds follows this comment.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
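Both workarounds mentioned in the comment above (raising the open-file limit and pre-concatenating the daily files with NCO) can be scripted. The sketch below is illustrative only: the file names are hypothetical, and it assumes a Linux system with the NCO command-line tools installed.

    import glob
    import resource
    import subprocess

    # Workaround 1: raise the soft limit on open file descriptors.
    # An unprivileged process may raise the soft limit only up to the hard limit.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard == resource.RLIM_INFINITY or hard >= 8192:
        resource.setrlimit(resource.RLIMIT_NOFILE, (8192, hard))

    # Workaround 2: concatenate a month of daily files into one netcdf file
    # with NCO's ncrcat, so far fewer files need to be open at once later.
    daily = sorted(glob.glob('dt_global_allsat_msla_uv_199301*.nc'))  # hypothetical file names
    subprocess.check_call(['ncrcat'] + daily + ['msla_uv_199301.nc'])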
120662901 https://github.com/pydata/xarray/issues/463#issuecomment-120662901 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDEyMDY2MjkwMQ== rabernat 1197350 2015-07-11T21:37:42Z 2015-07-11T21:37:42Z MEMBER

I came up with a solution for this, but it is so slow that it is useless.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
120449743 https://github.com/pydata/xarray/issues/463#issuecomment-120449743 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDEyMDQ0OTc0Mw== rabernat 1197350 2015-07-10T16:19:15Z 2015-07-10T16:19:15Z MEMBER

Ok, I will have a look at this. I would be happy to contribute to this awesome project.

By the way, by monitoring /proc I was able to see that the scipy backend actually opens each file TWICE, exacerbating the problem. (A sketch of this check follows this comment.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
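The /proc check described in the comment above can be done from inside the Python process. This is a Linux-only sketch, and the helper name below is ours, not part of xray:

    import os

    def open_netcdf_paths():
        # /proc/self/fd contains one symlink per file descriptor open in this
        # process; resolving them shows which .nc files are currently open.
        # The same path appearing twice means a file has been opened twice.
        fd_dir = '/proc/self/fd'
        paths = []
        for fd in os.listdir(fd_dir):
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target.endswith('.nc'):
                paths.append(target)
        return paths

Calling this before and after open_mfdataset would show each netcdf file listed twice if the backend really opens every file two times.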
120446569 https://github.com/pydata/xarray/issues/463#issuecomment-120446569 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDEyMDQ0NjU2OQ== rabernat 1197350 2015-07-10T16:08:48Z 2015-07-10T16:08:48Z MEMBER

I am using the scipy backend because the netcdf4 backend doesn't work for me at all. It core dumps with the error:

python: posixio.c:366: px_rel: Assertion `pxp->bf_offset <= offset && offset < pxp->bf_offset + (off_t) pxp->bf_extent' failed. Aborted (core dumped)

Are you suggesting I work on the scipy backend?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
120442769 https://github.com/pydata/xarray/issues/463#issuecomment-120442769 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDEyMDQ0Mjc2OQ== rabernat 1197350 2015-07-10T15:53:48Z 2015-07-10T15:53:48Z MEMBER

Just a little follow-up... I tried to work around the file limit by serializing the processing of the files and creating xray datasets with fewer files in them. However, I still eventually hit this error, suggesting that the files are never being closed. For example:

I would like to do

    ds = xray.open_mfdataset(ddir + '*.nc', engine='scipy')
    EKE = (ds.variables['u']**2 + ds.variables['v']**2).mean(dim='time').load()

This tries to open 8031 files and produces the error: [Errno 24] Too many open files

So then I tried to create a new dataset for each year:

    EKE = []
    for yr in xrange(1993, 2015):
        print yr
        # this opens about 365 files
        ds = xray.open_mfdataset(ddir + '/dt_global_allsat_msla_uv_%04d*.nc' % yr, engine='scipy')
        EKE.append((ds.variables['u']**2 + ds.variables['v']**2).mean(dim='time').load())

This works okay for the first two years, but by the third year I still get the error: [Errno 24] Too many open files. This is when the ulimit of 1024 files is exceeded. (A sketch of one possible mitigation, explicitly closing each year's dataset, follows this comment.)

Using xray version 0.5.1, installed via conda.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
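One way to keep the descriptor count bounded in the per-year loop above is to release each year's files before opening the next batch. This is only a sketch: it mirrors the loop from the comment, the data directory is a placeholder, and it assumes a version of xray/xarray in which the object returned by open_mfdataset exposes a close() method.

    import xray  # the package was later renamed xarray

    ddir = '/path/to/data'  # placeholder for the directory used in the comment above
    EKE = []
    for yr in xrange(1993, 2015):
        ds = xray.open_mfdataset(ddir + '/dt_global_allsat_msla_uv_%04d*.nc' % yr, engine='scipy')
        # Force the computation while the files are open ...
        EKE.append((ds.variables['u']**2 + ds.variables['v']**2).mean(dim='time').load())
        # ... then explicitly release the underlying file handles so the
        # per-process limit (1024 here) is never reached.
        ds.close()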


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
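For reference, the row selection described at the top of this page (5 rows where issue = 94328498 and user = 1197350, sorted by updated_at descending) corresponds to a straightforward query against the table defined above. A minimal sketch, assuming a local SQLite copy of this database saved as github.db:

    import sqlite3

    conn = sqlite3.connect('github.db')  # hypothetical local copy of this database
    rows = conn.execute(
        "SELECT id, updated_at, author_association, body "
        "FROM issue_comments "
        "WHERE issue = 94328498 AND [user] = 1197350 "
        "ORDER BY updated_at DESC"
    ).fetchall()
    for comment_id, updated_at, role, body in rows:
        print(comment_id, updated_at, role)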