issue_comments

6 rows where issue = 94328498 (open_mfdataset too many files) and user = 743508 (mangecoeur), sorted by updated_at descending

223918870 · mangecoeur (743508) · CONTRIBUTOR · 2016-06-06T10:09:48Z · https://github.com/pydata/xarray/issues/463#issuecomment-223918870

So, using a cleaner minimal example, it does appear that the files are closed after the dataset is closed. However, they are all open during dataset loading, and this is what blows past the OS X default max-open-files limit.

I think this could be a real issue when using xarray to handle too-big-for-RAM datasets: you could easily be trying to access thousands of files (especially with weather data), so xarray should limit the number it holds open at any one time during data load. Not being familiar with the internals, I'm not sure whether this is an issue in xarray itself or in the Dask backend.
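
One way to cap open handles from user code, pending a fix in xarray itself, is to load each file eagerly and close it before opening the next, then concatenate in memory. A minimal sketch (open_many_limited is a hypothetical helper; it assumes the files fit in RAM and share a 'time' concat dimension, so it trades away lazy loading):

``` python
import xarray as xr

def open_many_limited(paths, concat_dim='time'):
    # Keep at most one file handle open at a time.
    datasets = []
    for p in paths:
        ds = xr.open_dataset(str(p))
        datasets.append(ds.load())  # read the data into memory now
        ds.close()                  # release the handle before the next open
    return xr.concat(datasets, dim=concat_dim)
```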

223905394 · mangecoeur (743508) · CONTRIBUTOR · 2016-06-06T09:06:33Z · https://github.com/pydata/xarray/issues/463#issuecomment-223905394

@shoyer thanks. Here's how I'm using open_mfdataset (not passing any options). I'm going to try the h5netcdf backend to see if I get the same results. I'm still not 100% confident that I'm tracking open files correctly with lsof, so I'm going to make a minimal example to investigate.

``` python
from datetime import datetime
from pathlib import Path

import xarray as xr

# get_dset_file_paths, WEATHER_CFSR, WEATHER_RENAME,
# site_lookup_postcode_district and weighted_regional_timeseries
# are project-local helpers/config, not shown here.


def weather_dataset(root_path: Path, *,
                    start_date: datetime = None,
                    end_date: datetime = None):
    flat_files_paths = get_dset_file_paths(root_path,
                                           start_date=start_date,
                                           end_date=end_date)
    # Convert Paths to a list of strings for xarray
    dataset = xr.open_mfdataset([str(f) for f in flat_files_paths])
    return dataset


def cfsr_weather_loader(db, site_lookup_fn=None,
                        dset_start=None, dset_end=None, site_conf=None):
    # Pull values out of the config
    dt_conf = site_conf if site_conf else WEATHER_CFSR
    dset_start = dset_start if dset_start else dt_conf['start_dt']
    dset_end = dset_end if dset_end else dt_conf['end_dt']

    if site_lookup_fn is None:
        site_lookup_fn = site_lookup_postcode_district

    def weather_loader(site_id, start_date, end_date, resample=None):
        # using the tuple because always getting mixed up with lon/lat
        geo_lookup = site_lookup_fn(site_id, db)

        # With statement should ensure dset is closed after loading.
        with weather_dataset(WEATHER_CFSR['path'],
                             start_date=dset_start,
                             end_date=dset_end) as weather:
            data = weighted_regional_timeseries(weather, start_date, end_date,
                                                lon=geo_lookup.lon,
                                                lat=geo_lookup.lat,
                                                weights=geo_lookup.weights)

        # RENAME from CFSR standard
        data = data.rename(columns=WEATHER_RENAME)

        if resample is not None:
            data = data.resample(resample).mean()
        data.irradiance /= 1000.0  # convert irradiance to kW
        return data

    return weather_loader
```
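
For the minimal example, one way to track open handles from inside the process, instead of lsof, is psutil. A sketch (the data/*.nc glob and the .nc suffix filter are assumptions):

``` python
import psutil
import xarray as xr

def count_open_nc():
    # Count netCDF file handles held by the current process.
    return sum(1 for f in psutil.Process().open_files()
               if f.path.endswith('.nc'))

print('before:', count_open_nc())
with xr.open_mfdataset('data/*.nc') as ds:
    print('during:', count_open_nc())
print('after:', count_open_nc())
```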

223837612 · mangecoeur (743508) · CONTRIBUTOR · 2016-06-05T21:05:40Z · https://github.com/pydata/xarray/issues/463#issuecomment-223837612

On investigation: even though my dataset creation is wrapped in a with block, using lsof to check the file handles held by my IPython kernel suggests that all the input files are still open. Are you certain that the backend correctly closes files in a multi-file dataset? Is there a way to force this explicitly?
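
For explicit closing, Dataset.close() (or the with statement, as suggested in the reply below) is the mechanism xarray exposes; the catch is that any lazy, dask-backed values must be computed while the files are still open. A sketch with a hypothetical file list and variable name:

``` python
import xarray as xr

paths = ['jan.nc', 'feb.nc']  # hypothetical file list
with xr.open_mfdataset(paths) as ds:
    # compute before the files are closed; the result stays usable afterwards
    monthly_mean = ds['t2m'].mean().compute()
print(monthly_mean)
```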

223810723 · mangecoeur (743508) · CONTRIBUTOR · 2016-06-05T12:34:11Z · https://github.com/pydata/xarray/issues/463#issuecomment-223810723

I still hit this issue after wrapping my open_mfdataset in a with statement. I suspect it is an OS X problem: macOS has a very low default max-open-files limit for applications started from the shell (around 256). It's not yet clear to me whether my datasets are being closed correctly; investigating...
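
The limit can be checked, and raised up to the hard cap, from inside the Python process with the resource module. A sketch (POSIX-only; 4096 is an arbitrary target):

``` python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('soft =', soft, 'hard =', hard)  # soft is often 256 in a default OS X shell

# Raise the soft limit for this process, staying within the hard limit.
target = 4096
new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
```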

223687053 · mangecoeur (743508) · CONTRIBUTOR · 2016-06-03T20:31:56Z · https://github.com/pydata/xarray/issues/463#issuecomment-223687053

It seems to happen even with a freshly restarted notebook, but I'll try a with statement to see if it helps. On 3 Jun 2016 19:53, "Stephan Hoyer" notifications@github.com wrote:

I suspect you hit this in IPython after rerunning cells, because file handles are only automatically closed when programs exit. You might find it a good idea to explicitly close files by calling .close() (or using a "with" statement) on Datasets opened with open_mfdataset.


223651454 · mangecoeur (743508) · CONTRIBUTOR · 2016-06-03T18:08:24Z · https://github.com/pydata/xarray/issues/463#issuecomment-223651454

I'm also running into this error, but strangely it only happens when using the IPython interactive backend. I have some tests which work fine, but doing the same in IPython fails.

I'm opening a few hundred files (about 10 MB each, one per month, across a few variables). I'm using the default NetCDF backend.


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);