issue_comments
6 rows where issue = 504497403 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
540477057 | https://github.com/pydata/xarray/issues/3386#issuecomment-540477057 | https://api.github.com/repos/pydata/xarray/issues/3386 | MDEyOklzc3VlQ29tbWVudDU0MDQ3NzA1Nw== | sipposip 42270910 | 2019-10-10T09:11:31Z | 2019-10-10T09:11:31Z | NONE | @dcherian a dump of a single file:

```
ncdump -hs era5_mean_sea_level_pressure_2002.nc
netcdf era5_mean_sea_level_pressure_2002 {
dimensions:
	longitude = 1440 ;
	latitude = 721 ;
	time = 8760 ;
variables:
	float longitude(longitude) ;
		longitude:units = "degrees_east" ;
		longitude:long_name = "longitude" ;
	float latitude(latitude) ;
		latitude:units = "degrees_north" ;
		latitude:long_name = "latitude" ;
	int time(time) ;
		time:units = "hours since 1900-01-01 00:00:00.0" ;
		time:long_name = "time" ;
		time:calendar = "gregorian" ;
	short msl(time, latitude, longitude) ;
		msl:scale_factor = 0.23025422306319 ;
		msl:add_offset = 99003.8223728885 ;
		msl:_FillValue = -32767s ;
		msl:missing_value = -32767s ;
		msl:units = "Pa" ;
		msl:long_name = "Mean sea level pressure" ;
		msl:standard_name = "air_pressure_at_mean_sea_level" ;

// global attributes:
		:Conventions = "CF-1.6" ;
		:history = "2019-10-03 16:05:54 GMT by grib_to_netcdf-2.10.0: /opt/ecmwf/eccodes/bin/grib_to_netcdf -o /cache/data5/adaptor.mars.internal-1570117777.9045198-23871-11-c8564b6f-4db5-48d8-beab-ba9fef91d4e8.nc /cache/tmp/c8564b6f-4db5-48d8-beab-ba9fef91d4e8-adaptor.mars.internal-1570117777.905033-23871-3-tmp.grib" ;
		:_Format = "64-bit offset" ;
}
```

@shoyer: thanks for the tip, I think that simply adding more data-loading threads is indeed the best solution. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
add option to open_mfdataset for not using dask 504497403 | |
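As the dump above shows, `msl` is stored as packed `short` values with CF `scale_factor`/`add_offset` attributes; decoding follows the usual CF formula `physical = raw * scale_factor + add_offset`, with `_FillValue` mapped to NaN. A minimal NumPy sketch of that decoding, using the attribute values from the dump but hypothetical packed values (xarray performs this automatically on open):

```python
import numpy as np

# CF-style unpacking: physical = raw * scale_factor + add_offset
scale_factor = 0.23025422306319
add_offset = 99003.8223728885
fill_value = np.int16(-32767)

raw = np.array([0, 100, -32767], dtype=np.int16)  # hypothetical packed values
unpacked = np.where(raw == fill_value, np.nan,
                    raw.astype(np.float64) * scale_factor + add_offset)
print(unpacked)  # fill values become NaN; the rest are pressures in Pa
```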
540474492 | https://github.com/pydata/xarray/issues/3386#issuecomment-540474492 | https://api.github.com/repos/pydata/xarray/issues/3386 | MDEyOklzc3VlQ29tbWVudDU0MDQ3NDQ5Mg== | crusaderky 6213168 | 2019-10-10T09:05:21Z | 2019-10-10T09:05:21Z | MEMBER | @sipposip if your dask graph is resolved straight after the load from disk, you can try disabling the dask optimizer to see if you can squeeze some milliseconds out of load(). You can look up the setting syntax in the dask documentation. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
add option to open_mfdataset for not using dask 504497403 | |
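One way to try crusaderky's suggestion is the `optimize_graph=False` keyword accepted by dask's `.compute()`, which skips graph optimization for that call. The comment refers to a config setting rather than this per-call keyword, and the array below is a stand-in for a real `open_mfdataset` result, so treat this as a hedged sketch:

```python
import dask.array as da

x = da.ones((1000, 1000), chunks=(500, 500))  # stand-in for a lazily loaded array

# Normal path: dask optimizes (e.g. fuses) the graph before execution.
result = x.sum().compute()

# Skipping optimization can shave a little overhead when the graph is a plain load:
result_unopt = x.sum().compute(optimize_graph=False)

assert result == result_unopt == 1_000_000
```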
540208420 | https://github.com/pydata/xarray/issues/3386#issuecomment-540208420 | https://api.github.com/repos/pydata/xarray/issues/3386 | MDEyOklzc3VlQ29tbWVudDU0MDIwODQyMA== | shoyer 1217238 | 2019-10-09T21:28:48Z | 2019-10-09T21:28:48Z | MEMBER | netCDF4.MFDataset works on a much more restricted set of netCDF files than xarray's open_mfdataset. Can you simply add more threads in TensorFlow/Keras for loading the data? My other suggestion is to pre-shuffle the data on disk, so you don't need random access inside your training loop. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
add option to open_mfdataset for not using dask 504497403 | |
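shoyer's pre-shuffling idea can be sketched with plain NumPy: permute the sample (time) axis once, persist the shuffled copy, and then read it sequentially during training. The file name and array shape here are hypothetical stand-ins for the real ERA5 data:

```python
import os
import tempfile

import numpy as np

# Hypothetical training array shaped (time, lat, lon).
rng = np.random.default_rng(0)
data = rng.standard_normal((100, 8, 8)).astype("float32")

# One-off shuffle along the sample axis, written back to disk.
perm = rng.permutation(data.shape[0])
path = os.path.join(tempfile.gettempdir(), "shuffled_training_data.npy")
np.save(path, data[perm])

# The training loop can now iterate sequentially -- no random access needed.
reloaded = np.load(path, mmap_mode="r")
```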
540033550 | https://github.com/pydata/xarray/issues/3386#issuecomment-540033550 | https://api.github.com/repos/pydata/xarray/issues/3386 | MDEyOklzc3VlQ29tbWVudDU0MDAzMzU1MA== | dcherian 2448579 | 2019-10-09T14:43:29Z | 2019-10-09T14:43:29Z | MEMBER | It would be useful to see what a single file looks like and what the combined dataset looks like. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
add option to open_mfdataset for not using dask 504497403 | |
539916279 | https://github.com/pydata/xarray/issues/3386#issuecomment-539916279 | https://api.github.com/repos/pydata/xarray/issues/3386 | MDEyOklzc3VlQ29tbWVudDUzOTkxNjI3OQ== | sipposip 42270910 | 2019-10-09T09:20:06Z | 2019-10-09T09:20:06Z | NONE | setting
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
add option to open_mfdataset for not using dask 504497403 | |
539907822 | https://github.com/pydata/xarray/issues/3386#issuecomment-539907822 | https://api.github.com/repos/pydata/xarray/issues/3386 | MDEyOklzc3VlQ29tbWVudDUzOTkwNzgyMg== | crusaderky 6213168 | 2019-10-09T08:58:21Z | 2019-10-09T08:58:21Z | MEMBER | @sipposip xarray doesn't use netCDF4.MFDataset, but netCDF4.Dataset, which is wrapped by dask arrays that are then concatenated.
This is by design, because of the reason above: NetCDF/HDF5 lazy loading means that data is loaded into a numpy.ndarray on the first operation performed upon it, and that includes concatenation. I'm aware that threads within threads, threads within processes, and processes within threads cause a world of pain in the form of random deadlocks - I've been there myself.
You can completely disable dask threads process-wide with `dask.config.set(scheduler="synchronous")`.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
add option to open_mfdataset for not using dask 504497403 |
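Disabling dask's worker threads process-wide, as suggested in the comment above, can be done with dask's documented synchronous (single-threaded) scheduler. The exact snippet from the original comment is missing from this dump, so the following is a hedged sketch using the current `dask.config` API:

```python
import dask
import dask.array as da

# Force dask to run every task in the calling thread, process-wide,
# avoiding thread-in-thread interactions with e.g. a Keras input pipeline.
dask.config.set(scheduler="synchronous")

x = da.ones(10, chunks=5)  # stand-in for a lazily loaded array
assert x.sum().compute() == 10  # executed without any worker threads
```

The same setting can be applied temporarily as a context manager, `with dask.config.set(scheduler="synchronous"): ...`, which is often preferable to a process-wide change.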