issue_comments
6 rows where issue = 94328498 and user = 743508 sorted by updated_at descending
Issue: open_mfdataset too many files · 6 comments
Each record below lists: id · user · author_association · created_at · updated_at, then html_url, issue_url, and node_id, then the comment body, reactions, and issue (performed_via_github_app is empty for every row).
223918870 · mangecoeur 743508 · CONTRIBUTOR · created_at 2016-06-06T10:09:48Z · updated_at 2016-06-06T10:09:48Z
html_url: https://github.com/pydata/xarray/issues/463#issuecomment-223918870 · issue_url: https://api.github.com/repos/pydata/xarray/issues/463 · node_id: MDEyOklzc3VlQ29tbWVudDIyMzkxODg3MA==

So using a cleaner minimal example, it does appear that the files are closed after the dataset is closed. However, they are all open during dataset loading - this is what blows past the OSX default max-open-files limit. I think this could be a real issue when using xarray to handle too-big-for-RAM datasets - you could easily be trying to access thousands of files (especially with weather data), so xarray should limit the number it holds open at any one time during data load. Not being familiar with the internals, I'm not sure if this is an issue in xarray itself or in the Dask backend.

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } · issue: open_mfdataset too many files (94328498)
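A quick way to confirm that behaviour is to count the process's open netCDF handles around the load. Below is a minimal sketch using the third-party `psutil` package; the `weather/*.nc` pattern and the `count_open_nc` helper are illustrative, not from the thread. At the time of this issue, `open_mfdataset` kept one handle per input file for the lifetime of the dataset:

``` python
import os

import psutil
import xarray as xr

def count_open_nc() -> int:
    # Count the .nc file descriptors currently held by this process.
    return sum(1 for f in psutil.Process(os.getpid()).open_files()
               if f.path.endswith(".nc"))

print("before open:", count_open_nc())
ds = xr.open_mfdataset("weather/*.nc")  # hypothetical file pattern
print("after open: ", count_open_nc())  # roughly one handle per input file
ds.close()
print("after close:", count_open_nc())  # should drop back to the baseline
```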
223905394 · mangecoeur 743508 · CONTRIBUTOR · created_at 2016-06-06T09:06:33Z · updated_at 2016-06-06T09:06:33Z
html_url: https://github.com/pydata/xarray/issues/463#issuecomment-223905394 · issue_url: https://api.github.com/repos/pydata/xarray/issues/463 · node_id: MDEyOklzc3VlQ29tbWVudDIyMzkwNTM5NA==

@shoyer thanks - here's how I'm using mfdataset - not using any options. I'm going to try using the

``` python
def weather_dataset(root_path: Path, *, start_date: datetime = None, end_date: datetime = None):
    flat_files_paths = get_dset_file_paths(root_path, start_date=start_date, end_date=end_date)
    # Convert Paths to list of strings for xarray
    dataset = xr.open_mfdataset([str(f) for f in flat_files_paths])
    return dataset


def cfsr_weather_loader(db, site_lookup_fn=None, dset_start=None, dset_end=None, site_conf=None):
    # Pull values out of the config
    dt_conf = site_conf if site_conf else WEATHER_CFSR
    dset_start = dset_start if dset_start else dt_conf['start_dt']
    dset_end = dset_end if dset_end else dt_conf['end_dt']
```

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } · issue: open_mfdataset too many files (94328498)
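One way to work around the limit with a call pattern like the one above is to open the files in batches, pull each batch into memory, and close its handles before moving on, so only a bounded number of files is ever open at once. A minimal sketch, assuming the files concatenate along a `time` dimension (the function name and defaults are made up for illustration):

``` python
import xarray as xr

def open_mfdataset_batched(paths, dim="time", batch_size=128):
    # Load files a batch at a time so at most `batch_size` handles
    # are open simultaneously (sketch; assumes the files concatenate
    # along `dim`, e.g. one file per month).
    pieces = []
    for i in range(0, len(paths), batch_size):
        batch = [xr.open_dataset(p) for p in paths[i:i + batch_size]]
        merged = xr.concat(batch, dim=dim).load()  # pull into memory
        for ds in batch:
            ds.close()  # release the file handles immediately
        pieces.append(merged)
    return xr.concat(pieces, dim=dim)
```

Note this trades dask's laziness for bounded handles, so it only helps when the selected data fits in memory, which is at odds with the too-big-for-RAM use case discussed above.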
223837612 · mangecoeur 743508 · CONTRIBUTOR · created_at 2016-06-05T21:05:40Z · updated_at 2016-06-05T21:05:40Z
html_url: https://github.com/pydata/xarray/issues/463#issuecomment-223837612 · issue_url: https://api.github.com/repos/pydata/xarray/issues/463 · node_id: MDEyOklzc3VlQ29tbWVudDIyMzgzNzYxMg==

So on investigation, even though my dataset creation is wrapped in a `with` statement …

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } · issue: open_mfdataset too many files (94328498)
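For reference, xarray datasets implement the context-manager protocol, so the pattern under test looks like the sketch below (the glob pattern and variable name are hypothetical). `close()` runs on exit; the open question in this thread is whether the underlying netCDF handles are released promptly as well:

``` python
import xarray as xr

# Dataset defines __enter__/__exit__, so leaving the block calls close().
with xr.open_mfdataset("weather/*.nc") as ds:  # hypothetical pattern
    monthly_mean = ds["temperature"].groupby("time.month").mean()
```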
223810723 · mangecoeur 743508 · CONTRIBUTOR · created_at 2016-06-05T12:34:11Z · updated_at 2016-06-05T12:34:11Z
html_url: https://github.com/pydata/xarray/issues/463#issuecomment-223810723 · issue_url: https://api.github.com/repos/pydata/xarray/issues/463 · node_id: MDEyOklzc3VlQ29tbWVudDIyMzgxMDcyMw==

I still hit this issue after wrapping my open_mfdataset in a `with` statement. I suspect this is an OSX problem: macOS has a very low default max-open-files limit for applications started from the shell (around 256). It's not yet clear to me whether my datasets are being correctly closed; investigating...

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } · issue: open_mfdataset too many files (94328498)
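On Unix-like systems, that per-process limit can be inspected, and the soft limit raised toward the hard limit without root, using Python's standard-library `resource` module. A minimal sketch (the 4096 target is arbitrary):

``` python
import resource

# Soft limit is what the process hits; hard limit is the ceiling it may set.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")  # e.g. soft=256 on macOS

# Raise the soft limit before opening many files at once.
target = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```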
223687053 · mangecoeur 743508 · CONTRIBUTOR · created_at 2016-06-03T20:31:56Z · updated_at 2016-06-03T20:31:56Z
html_url: https://github.com/pydata/xarray/issues/463#issuecomment-223687053 · issue_url: https://api.github.com/repos/pydata/xarray/issues/463 · node_id: MDEyOklzc3VlQ29tbWVudDIyMzY4NzA1Mw==

It seems to happen even with a freshly restarted notebook, but I'll try a `with` statement to see if it helps.

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } · issue: open_mfdataset too many files (94328498)
223651454 · mangecoeur 743508 · CONTRIBUTOR · created_at 2016-06-03T18:08:24Z · updated_at 2016-06-03T18:08:24Z
html_url: https://github.com/pydata/xarray/issues/463#issuecomment-223651454 · issue_url: https://api.github.com/repos/pydata/xarray/issues/463 · node_id: MDEyOklzc3VlQ29tbWVudDIyMzY1MTQ1NA==

I'm also running into this error - but strangely it only happens when using the IPython interactive backend. I have some tests which work fine, but doing the same in IPython fails. I'm opening a few hundred files (about 10 MB each, one per month across a few variables). I'm using the default NetCDF backend.

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } · issue: open_mfdataset too many files (94328498)
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
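For completeness, the query behind this page (comments where issue = 94328498 and user = 743508, newest first) can be reproduced against that schema with Python's standard-library `sqlite3` module; the `github.db` filename is an assumption:

``` python
import sqlite3

conn = sqlite3.connect("github.db")  # hypothetical database file
rows = conn.execute(
    """
    SELECT id, created_at, author_association, body
    FROM issue_comments
    WHERE issue = 94328498 AND [user] = 743508
    ORDER BY updated_at DESC
    """
).fetchall()
for comment_id, created_at, association, body in rows:
    print(comment_id, created_at, association, body[:60])
conn.close()
```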