issue_comments


6 rows where issue = 504497403 sorted by updated_at descending


user 4

  • crusaderky 2
  • sipposip 2
  • shoyer 1
  • dcherian 1

author_association 2

  • MEMBER 4
  • NONE 2

issue 1

  • add option to open_mfdataset for not using dask · 6
sipposip (NONE) · 2019-10-10T09:11:31Z · https://github.com/pydata/xarray/issues/3386#issuecomment-540477057

@dcherian a dump of a single file:

```
ncdump -hs era5_mean_sea_level_pressure_2002.nc
netcdf era5_mean_sea_level_pressure_2002 {
dimensions:
	longitude = 1440 ;
	latitude = 721 ;
	time = 8760 ;
variables:
	float longitude(longitude) ;
		longitude:units = "degrees_east" ;
		longitude:long_name = "longitude" ;
	float latitude(latitude) ;
		latitude:units = "degrees_north" ;
		latitude:long_name = "latitude" ;
	int time(time) ;
		time:units = "hours since 1900-01-01 00:00:00.0" ;
		time:long_name = "time" ;
		time:calendar = "gregorian" ;
	short msl(time, latitude, longitude) ;
		msl:scale_factor = 0.23025422306319 ;
		msl:add_offset = 99003.8223728885 ;
		msl:_FillValue = -32767s ;
		msl:missing_value = -32767s ;
		msl:units = "Pa" ;
		msl:long_name = "Mean sea level pressure" ;
		msl:standard_name = "air_pressure_at_mean_sea_level" ;

// global attributes:
		:Conventions = "CF-1.6" ;
		:history = "2019-10-03 16:05:54 GMT by grib_to_netcdf-2.10.0: /opt/ecmwf/eccodes/bin/grib_to_netcdf -o /cache/data5/adaptor.mars.internal-1570117777.9045198-23871-11-c8564b6f-4db5-48d8-beab-ba9fef91d4e8.nc /cache/tmp/c8564b6f-4db5-48d8-beab-ba9fef91d4e8-adaptor.mars.internal-1570117777.905033-23871-3-tmp.grib" ;
		:_Format = "64-bit offset" ;
}
```

@shoyer: thanks for the tip. I think that simply adding more data-loading threads is indeed the best solution.

crusaderky (MEMBER) · 2019-10-10T09:05:21Z · https://github.com/pydata/xarray/issues/3386#issuecomment-540474492

@sipposip if your dask graph is resolved straight after the load from disk, you can try disabling the dask optimizer to see if you can squeeze some milliseconds out of load(). You can look up the setting syntax in the dask documentation.
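For illustration, a minimal sketch of what that might look like; the exact config key is an assumption here (`optimization.fuse.active` toggles dask's task-fusion pass in recent releases, so check the docs for your version):

```python
import dask
import xarray as xr

# Open lazily, one chunk per time step, as in the setup discussed above.
ds = xr.open_mfdataset("era5_*.nc", chunks={"time": 1})

# Assumed config key: with fusion off, the graph runs as-built,
# skipping the optimization pass before execution.
with dask.config.set({"optimization.fuse.active": False}):
    ds.load()
```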

shoyer (MEMBER) · 2019-10-09T21:28:48Z · https://github.com/pydata/xarray/issues/3386#issuecomment-540208420

netCDF4.MFDataset works on a much more restricted set of netCDF files than xarray.open_mfdataset. I'm not surprised it's a little bit faster, but I'm not sure it's worth the maintenance burden of supporting this separate code path. Making a fully featured version of open_mfdataset without dask would be challenging.

Can you simply add more threads in TensorFlow/Keras for loading the data? My other suggestion is to pre-shuffle the data on disk, so you don't need random access inside your training loop.
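A rough sketch of the first suggestion, assuming TF 2.x where `Model.fit` accepts a `keras.utils.Sequence` together with a `workers` argument (older Keras used `fit_generator` for this); the class and variable names are hypothetical:

```python
import tensorflow as tf

# Hypothetical Sequence that pulls batches out of an already-opened
# xarray.Dataset; Keras can then prefetch batches with worker threads.
class NetCDFBatches(tf.keras.utils.Sequence):
    def __init__(self, ds, batch_size):
        self.ds = ds
        self.batch_size = batch_size

    def __len__(self):
        return self.ds.sizes["time"] // self.batch_size

    def __getitem__(self, i):
        sl = slice(i * self.batch_size, (i + 1) * self.batch_size)
        x = self.ds["msl"].isel(time=sl).values  # reads only this slice
        y = x  # placeholder target; a real pipeline would return labels
        return x, y

# model.fit(NetCDFBatches(ds, 32), workers=4, use_multiprocessing=False)
```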

dcherian (MEMBER) · 2019-10-09T14:43:29Z · https://github.com/pydata/xarray/issues/3386#issuecomment-540033550

It would be useful to see what a single file looks like and what the combined dataset looks like. open_mfdataset can sometimes require some tuning to get good performance.
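For instance (a sketch; the file names are placeholders based on the dump above, and `combine="nested"`/`concat_dim` assume xarray >= 0.13):

```python
import xarray as xr

# Inspect one file: dimensions, dtypes, and encoding often explain
# why the combined open is slow.
single = xr.open_dataset("era5_mean_sea_level_pressure_2002.nc")
print(single)

# Then inspect the combined dataset; chunk sizes and the concat options
# are the usual tuning knobs.
combined = xr.open_mfdataset(
    "era5_mean_sea_level_pressure_*.nc",
    combine="nested",
    concat_dim="time",
    chunks={"time": 1},
)
print(combined)
```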

sipposip (NONE) · 2019-10-09T09:20:06Z · https://github.com/pydata/xarray/issues/3386#issuecomment-539916279

Setting dask.config.set(scheduler="synchronous") globally indeed resolved the threading issues, thanks. However, loading and preprocessing a single time slice of data is ~40% slower with dask and open_mfdataset (with chunks={'time': 1}) than with netCDF4.MFDataset. Is this expected/a known issue? If not, I can try to create a minimal reproducible example.
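Such an example might look roughly like the timing sketch below; the file pattern is an assumption, and the `msl` variable comes from the ERA5 dump shown earlier on this page:

```python
import glob
import timeit

import netCDF4
import xarray as xr

files = sorted(glob.glob("era5_*.nc"))  # assumed file pattern

# netCDF4.MFDataset: one lazily indexed variable spanning all files.
mf = netCDF4.MFDataset(files)
t_nc = timeit.timeit(lambda: mf["msl"][0, :, :], number=100)

# xarray + dask: one dask chunk per time step.
ds = xr.open_mfdataset(files, chunks={"time": 1})
t_xr = timeit.timeit(lambda: ds["msl"].isel(time=0).load(), number=100)

print(f"netCDF4.MFDataset: {t_nc:.2f}s, xarray/dask: {t_xr:.2f}s")
```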

crusaderky (MEMBER) · 2019-10-09T08:58:21Z · https://github.com/pydata/xarray/issues/3386#issuecomment-539907822

@sipposip xarray doesn't use netCDF4.MFDataset, but netCDF4.Dataset, which is wrapped by dask arrays that are then concatenated.

Opening each file separately with open_dataset, and then concatenating them with xr.concat does not work, as this loads the data into memory.

This is by design, for the reason above: NetCDF/HDF5 lazy loading means that data is loaded into a numpy.ndarray on the first operation performed on it, and concatenation is such an operation.
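To make the contrast concrete (a sketch; the file list is hypothetical):

```python
import glob
import xarray as xr

files = sorted(glob.glob("era5_*.nc"))

# Eager: concatenation is an operation, so the lazily opened arrays are
# materialized into numpy before being joined.
eager = xr.concat([xr.open_dataset(f) for f in files], dim="time")

# Lazy: each file is wrapped in dask arrays first, so the concatenation
# stays symbolic until .load() or .compute().
lazy = xr.open_mfdataset(files)
```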

I'm aware that threads within threads, threads within processes, and processes within threads cause a world of pain in the form of random deadlocks; I've been there myself. You can completely disable dask threads process-wide:

```python
dask.config.set(scheduler="synchronous")
...
ds.load()
```

or as a context manager:

```python
with dask.config.set(scheduler="synchronous"):
    ds.load()
```

or for a single operation:

```python
ds.load(scheduler="synchronous")
```

Does this address your issue?


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);