issue_comments
3 rows where issue = 224553135 and user = 53343824 sorted by updated_at descending
id: 781407863
html_url: https://github.com/pydata/xarray/issues/1385#issuecomment-781407863
issue_url: https://api.github.com/repos/pydata/xarray/issues/1385
node_id: MDEyOklzc3VlQ29tbWVudDc4MTQwNzg2Mw==
user: jameshalgren 53343824
created_at: 2021-02-18T15:06:13Z
updated_at: 2021-02-18T15:06:13Z
author_association: NONE
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
issue: slow performance with open_mfdataset 224553135
body:

Indeed @dcherian -- it took some experimentation to get the right engine to support parallel execution, and even then results are still mixed, which, to me, means further work is needed to isolate the issue. Along the lines of suggestions here (thanks @jmccreight for pointing this out), we've introduced a very practical pre-processing step to rewrite the datasets so that the read is not striped across the file system, effectively isolating the performance bottleneck to a position where it can be dealt with independently (a sketch of this idea follows this record). Of course, such an asynchronous workflow is not possible in all situations, so we're still looking at improving the direct performance. Two notes as we keep working:

- The preprocessor. Reading and re-manipulating an individual dataset is lightning fast. We saw that a small change or adjustment in the individual files, made with a preprocessor, made the multi-file read massively faster.
- The "more sophisticated example" referenced here has proven to be very useful.
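The rewrite-then-read workaround described above lends itself to a short sketch. The following is a minimal illustration, not the authors' actual preprocessor: the glob pattern, destination path, and the rewrite_locally helper are all hypothetical; only the reset_coords(drop=True) idiom comes from the script quoted in the next comment.

```python
import glob
import os

import xarray as xr


def rewrite_locally(src_files, dest_dir):
    """Rewrite each dataset as a self-contained netCDF file in dest_dir so the
    later multi-file read is not striped across the shared file system."""
    rewritten = []
    for src in src_files:
        dest = os.path.join(dest_dir, os.path.basename(src))
        with xr.open_dataset(src) as ds:
            # Dropping non-essential coordinates keeps the rewritten files
            # lean and makes the subsequent concatenation cheaper.
            ds.reset_coords(drop=True).to_netcdf(dest)
        rewritten.append(dest)
    return rewritten


# Hypothetical usage: rewrite a day's worth of CHRTOUT files, then read the
# local copies with a single multi-file open.
local_files = rewrite_locally(sorted(glob.glob("CHRTOUT_DOMAIN1.*")), "/tmp/scratch")
ds = xr.open_mfdataset(local_files, combine="by_coords")
```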
id: 756922963
html_url: https://github.com/pydata/xarray/issues/1385#issuecomment-756922963
issue_url: https://api.github.com/repos/pydata/xarray/issues/1385
node_id: MDEyOklzc3VlQ29tbWVudDc1NjkyMjk2Mw==
user: jameshalgren 53343824
created_at: 2021-01-08T18:26:44Z
updated_at: 2021-01-08T18:34:49Z
author_association: NONE
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
issue: slow performance with open_mfdataset 224553135
body:

@dcherian We had looked at a number of options. In the end, the best performance I could achieve was with the work-around pre-processor script rather than any of the built-in options. It's worth noting that a major part of the slowdown we were experiencing came from the dataframe transform we were doing after reading the files. Once that was fixed, performance was much better, but not necessarily with any of the expected options.

This script, reading one day's worth of NWM q_laterals, runs in about 8 seconds (on Cheyenne). If you change the globbing pattern to include a full month, it takes about 380 seconds. We are reading everything into memory, which negates the lazy-access benefits of using a dataset, and our next steps include looking into that (a sketch of the lazy-access direction follows this record). 300 seconds to read a month isn't totally unacceptable, but we'd like it to be faster for the operational runs we'll eventually be doing -- for longer simulations, we may be able to achieve some improvement with asynchronous data access. We'll keep looking into it. (We'll start by trying to adapt the "slightly more sophisticated example" under the docs you referenced here...) Thanks (for the great package and for getting back on this question!)

```python
# /glade/scratch/halgren/qlat_mfopen_test.py
import time

import pandas as pd
import xarray as xr


def get_ql_from_wrf_hydro_mf(
    qlat_files, index_col="feature_id", value_col="q_lateral"
):
    """
    qlat_files: globbed list of CHRTOUT files containing desired lateral inflows
    index_col: column/field in the CHRTOUT files with the segment/link id
    value_col: column/field in the CHRTOUT files with the lateral inflow value
    """
    ...  # body elided in the scraped page


def drop_all_coords(ds):
    return ds.reset_coords(drop=True)


def main():
    ...  # body elided in the scraped page


if __name__ == "__main__":
    main()
```

@groutr, @jmccreight
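The lazy-access direction mentioned in this comment could look roughly like the following. This is a sketch under assumptions: the glob pattern and chunk size are illustrative, parallel=True requires dask to be installed, and this is not the configuration the authors settled on.

```python
import xarray as xr

ds = xr.open_mfdataset(
    "CHRTOUT_DOMAIN1.*",          # illustrative glob pattern, not the real path
    combine="by_coords",
    parallel=True,                # open the files concurrently via dask
    chunks={"time": 24},          # keep variables as lazy dask arrays
    preprocess=lambda ds: ds.reset_coords(drop=True),
)

# Nothing has been read into memory yet; data is pulled from disk only when a
# concrete result is requested, e.g. one day's worth of q_lateral values:
one_day = ds["q_lateral"].isel(time=slice(0, 24)).compute()
```

Unlike the script above, which materializes the whole frame via .values, this defers I/O so only the slices actually requested are read.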
id: 756364564
html_url: https://github.com/pydata/xarray/issues/1385#issuecomment-756364564
issue_url: https://api.github.com/repos/pydata/xarray/issues/1385
node_id: MDEyOklzc3VlQ29tbWVudDc1NjM2NDU2NA==
user: jameshalgren 53343824
created_at: 2021-01-07T20:28:32Z
updated_at: 2021-01-07T20:28:32Z
author_association: NONE
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
issue: slow performance with open_mfdataset 224553135
body:

@rabernat Is the test dataset you mention still somewhere on Cheyenne? We're seeing general slowness processing multi-file netCDF output from the National Water Model (our project here: NOAA-OWP/t-route), and we would like to see how things compare to your mini-benchmark test. cc @groutr
Table schema:

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```
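For reference, the filtered view on this page corresponds to a simple query against this schema. Here is a sketch using Python's sqlite3; the database filename github.db is an assumption (the github-to-sqlite default), as this page does not name the file.

```python
import sqlite3

# Assumed database file; substitute whatever github-to-sqlite actually wrote.
conn = sqlite3.connect("github.db")

rows = conn.execute(
    """
    select id, created_at, updated_at, body
    from issue_comments
    where issue = ? and user = ?
    order by updated_at desc
    """,
    (224553135, 53343824),
).fetchall()

for comment_id, created, updated, body in rows:
    print(comment_id, updated, body[:80])
```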