


Issue #7224: Insertion speed of new dataset elements
  • id: 1423948375
  • node_id: I_kwDOAMm_X85U37pX
  • user: 90008
  • state: open
  • locked: 0
  • comments: 3
  • created_at: 2022-10-26T12:34:51Z
  • updated_at: 2022-10-29T22:39:39Z
  • author_association: CONTRIBUTOR

What is your issue?

In https://github.com/pydata/xarray/pull/7221 I showed that a major contributor to the slowdown in inserting a new element was the cost of an internal-only debugging assert statement.
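To illustrate why a debug-only `assert` can dominate insertion time, here is a minimal sketch using a hypothetical container (`DebugCheckedStore` is not xarray code, and the specific check removed in #7221 is not reproduced here). If every insertion asserts an invariant that walks all existing entries, n insertions cost about n(n+1)/2 checks instead of n. Python removes `assert` statements entirely under `python -O`, but most users do not run that way, so the check is paid in normal use.

```python
class DebugCheckedStore:
    """Hypothetical container used only to illustrate the cost of a debug assert."""

    def __init__(self):
        self._variables = {}
        self.checks = 0  # how many entries the invariant check has visited in total

    def _invariants_hold(self):
        # Walks *every* stored entry, so one call costs O(len(self._variables)).
        for name in self._variables:
            self.checks += 1
            if not isinstance(name, str):
                return False
        return True

    def insert(self, name, value):
        self._variables[name] = value
        # Debug-only check: compiled away entirely when Python runs with -O.
        assert self._invariants_hold()


store = DebugCheckedStore()
n = 1000
for i in range(n):
    store.insert(f"var{i}", i)

# With assertions enabled, insertion k visits k entries: 1 + 2 + ... + n in total.
print(store.checks)  # 500500 with assertions enabled, 0 under python -O
```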

The benchmark results in #7221 and #7222 are pretty useful to look at.

Thank you for encouraging the creation of a "benchmark" so that we can monitor the performance of element insertion.

Unfortunately, that was the only "free" lunch I got.

A few other minor improvements can be obtained with: https://github.com/pydata/xarray/pull/7222

However, it seems to me that the fundamental reason insertion is "slow" is that element insertion is not so much an "insertion" as it is:
  • a Dataset merge, followed by
  • a replacement of the Dataset's internals (via its internal methods).
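The cost of "merge, then replace" versus a true in-place insertion can be modelled with a toy class. This is a sketch under stated assumptions: `ToyDataset`, `_merge`, `_replace`, and the `visited` counter are hypothetical names, and only the number of entries copied is modelled, not xarray's actual alignment or index handling. Because a general merge rebuilds the whole mapping, inserting n variables one at a time copies n(n+1)/2 entries, while direct insertion copies none.

```python
class ToyDataset:
    """Hypothetical stand-in for a Dataset; only the insertion cost is modelled."""

    def __init__(self):
        self._variables = {}
        self.visited = 0  # entries copied by merge-style insertions

    def _merge(self, other):
        # A general-purpose merge rebuilds the full mapping: existing + new entries.
        merged = {}
        for source in (self._variables, other):
            for name, value in source.items():
                self.visited += 1
                merged[name] = value
        return merged

    def _replace(self, variables):
        # Swap in the freshly built mapping (the "replacement" step).
        self._variables = variables

    def insert_via_merge(self, name, value):
        self._replace(self._merge({name: value}))

    def insert_in_place(self, name, value):
        # Direct insertion touches only the new entry.
        self._variables[name] = value


merged_ds, direct_ds = ToyDataset(), ToyDataset()
n = 1000
for i in range(n):
    merged_ds.insert_via_merge(f"var{i}", i)
    direct_ds.insert_in_place(f"var{i}", i)

print(merged_ds.visited, direct_ds.visited)  # 500500 0
```

Both objects end up with identical contents; the difference is purely in how much work each insertion does.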

This is most evident in https://github.com/pydata/xarray/blob/main/xarray/core/dataset.py#L4918

In my benchmarks, I found that for large datasets, list comprehensions over 1000 or more elements were often used to "search" for variables that were "indexed": https://github.com/pydata/xarray/blob/ca57e5cd984e626487636628b1d34dca85cc2e7c/xarray/core/merge.py#L267
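The exact comprehension at the linked line in `merge.py` is not reproduced here; the sketch below only shows the generic pattern and why it scales poorly. The function names are hypothetical. A membership test against a list is a linear scan, so filtering n variable names against m index names costs O(n·m), while building a `set` once makes each lookup O(1) on average.

```python
import timeit


def indexed_names_by_scan(variable_names, index_names):
    # `in` on a list is a linear scan: O(len(variable_names) * len(index_names)).
    return [name for name in variable_names if name in index_names]


def indexed_names_by_set(variable_names, index_names):
    # Build the lookup once (O(len(index_names))), then each test is O(1) on average.
    index_set = set(index_names)
    return [name for name in variable_names if name in index_set]


variable_names = [f"var{i}" for i in range(2000)]
index_names = [f"var{i}" for i in range(0, 2000, 2)]  # every other variable is "indexed"

# Both approaches return the same names in the same order.
assert indexed_names_by_scan(variable_names, index_names) == indexed_names_by_set(
    variable_names, index_names
)

for func in (indexed_names_by_scan, indexed_names_by_set):
    seconds = timeit.timeit(lambda: func(variable_names, index_names), number=5)
    print(f"{func.__name__}: {seconds:.4f} s")
```

Timings depend on the machine, so none are quoted; the scan version grows with the product of both sizes.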

I think a few speedups can be obtained by avoiding these kinds of "searches" and list comprehensions. However, I think the dataset would have to provide this kind of information to the merge_core routine, instead of merge_core recreating it every time.
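One way to read the suggestion is: keep the set of indexed names current on the dataset and let the merge step accept it instead of rescanning. The sketch below assumes a hypothetical data model (`name -> (value, is_indexed)`) and hypothetical names `CachingToyDataset` and `toy_merge_core`; the latter is not xarray's `merge_core` and does not share its signature.

```python
class CachingToyDataset:
    """Hypothetical dataset that keeps the set of indexed variable names current."""

    def __init__(self):
        self.variables = {}  # name -> (value, is_indexed)
        self.indexed = set()  # updated on every insertion, never rebuilt by scanning

    def insert(self, name, value, is_indexed=False):
        self.variables[name] = (value, is_indexed)
        if is_indexed:
            self.indexed.add(name)
        else:
            # Overwriting may turn an indexed name into a plain one.
            self.indexed.discard(name)


def toy_merge_core(variables, indexed=None):
    """Hypothetical merge step; returns the indexed names and how many entries it scanned."""
    scanned = 0
    if indexed is None:
        # Without help from the caller, rediscover the indexed names with a full scan.
        indexed = set()
        for name, (_, is_indexed) in variables.items():
            scanned += 1
            if is_indexed:
                indexed.add(name)
    return indexed, scanned


ds = CachingToyDataset()
for i in range(1000):
    ds.insert(f"var{i}", i, is_indexed=(i % 100 == 0))

from_scan, cost_scan = toy_merge_core(ds.variables)
from_cache, cost_cache = toy_merge_core(ds.variables, indexed=ds.indexed)
print(len(from_scan), cost_scan, cost_cache)  # 10 1000 0
```

The cached path gives the same answer with no scan, at the price of maintaining `indexed` on every insertion.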

Ultimately, I think this trades off the memory footprint of a dataset (due to the additional data structures you would keep around) against speed.
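To get a rough feel for the memory side of that trade-off, this sketch measures a hypothetical cache of indexed names with `sys.getsizeof`. The variable counts are made up for illustration. Note that `sys.getsizeof` reports only the set's own hash table, not the strings it refers to, which here are shared with the list of names, so the table is the extra memory such a cache would add.

```python
import sys

# Hypothetical: 10_000 variables, of which every other one is "indexed".
variable_names = [f"var{i}" for i in range(10_000)]
indexed_cache = set(variable_names[::2])

# Size of the set's hash table only; the string objects are shared with
# `variable_names` and are not counted again.
cache_bytes = sys.getsizeof(indexed_cache)
print(f"cache for {len(indexed_cache)} names: ~{cache_bytes / 1024:.0f} KiB")
```

The exact figure depends on the Python version and platform, so none is quoted here.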

Anyway, I just wanted to share where I got.

Reactions: 2 total (+1: 2; all other reactions 0), from https://api.github.com/repos/pydata/xarray/issues/7224/reactions
Repo: 13221727 · Type: issue
