pull_requests: 1715744126
| id | node_id | number | state | locked | title | user | body | created_at | updated_at | closed_at | merged_at | merge_commit_sha | assignee | milestone | draft | head | base | author_association | auto_merge | repo | url | merged_by |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1715744126 | PR_kwDOAMm_X85mRC1- | 8717 | closed | 0 | Add lru_cache to named_array.utils.module_available and core.utils.module_available | 32731672 | Our application creates many small netcdf3 files: https://github.com/equinor/ert/blob/9c2b60099a54eeb5bb40013acef721e30558a86c/src/ert/storage/local_ensemble.py#L593 . A significant amount of time in xarray.backends.common.py:AbstractWriteableDataStore.set_variables is spent on common.py:is_dask_collection because it checks for the presence of the dask module, which takes about 0.3 ms. This time becomes significant in the case of many small files. This PR uses lru_cache to avoid rechecking for the presence of dask, as it should not change for the lifetime of the application. In one stress test we called dataset.py:2201(to_netcdf) 13634 times, which took 82.27 seconds, of which 46.8 seconds was spent on utils.py:1162(module_available). With the change in this PR, the same test spends only 50s on to_netcdf. Generally, under normal load, a session in our application will call to_netcdf ~1000 times, but 10 000 does happen. - [ ] Closes #xxxx - [ ] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` | 2024-02-07T14:01:35Z | 2024-02-26T11:23:04Z | 2024-02-07T16:26:12Z | 2024-02-07T16:26:12Z | 0f7a0342ce3dea9a011543469372ad782ec4aba2 | | | 0 | e004bc3e3583f037133d54ddf0f800a306333c52 | f33a632bf87ec29dd9346f9b01ad4eec2194f72a | CONTRIBUTOR | | 13221727 | https://github.com/pydata/xarray/pull/8717 | |
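
The caching idea described in the PR body can be sketched roughly as follows. This is a minimal illustration, not xarray's actual code: the single-argument signature and the use of `importlib.util.find_spec` for the lookup are assumptions made for the example.

```python
import importlib.util
from functools import lru_cache


@lru_cache(maxsize=None)
def module_available(module: str) -> bool:
    """Return True if a top-level module can be found on the import path.

    Hypothetical, simplified stand-in for the function named in the PR title.
    """
    # find_spec looks the module up without importing it. lru_cache stores
    # the result per module name, so repeated calls skip the lookup.
    return importlib.util.find_spec(module) is not None


if __name__ == "__main__":
    # Only the first call performs the lookup; the rest are cache hits.
    for _ in range(3):
        module_available("json")
    print(module_available.cache_info())
```

One consequence of this design is that a module installed after the first check stays invisible to the same process until `module_available.cache_clear()` is called, which matches the PR's premise that availability does not change during the application's lifetime.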
Links from other tables
- 1 row from pull_requests_id in labels_pull_requests