Comment by user 1486942 (CONTRIBUTOR) on pydata/xarray#7181, posted 2022-10-18T14:03:40Z
https://github.com/pydata/xarray/issues/7181#issuecomment-1282452427

Our tests all pass, but there is roughly a 20× slowdown, and it's almost entirely due to copies. It's plausible we're already doing far too many copies as it is, but this is still concerning.
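
(As an aside, one quick way to check that copies dominate is to profile a deep copy of an in-memory dataset directly, rather than the full test suite. This is just a sketch with cProfile, not what I actually ran:)

```py
import cProfile
import pstats

import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"A": xr.DataArray(np.random.randn(50, 450, 400), dims=("T", "X", "Y"))}
)

profiler = cProfile.Profile()
profiler.enable()
ds.copy(deep=True)
profiler.disable()

# Sort by cumulative time; on an affected revision the copy-related frames
# should show up near the top.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(15)
```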

I tried adding the following asv benchmark, based on the combine benchmark and the deepcopy tests:

```py
from copy import copy, deepcopy

import numpy as np
import xarray as xr


class Copy:
    def setup(self):
        """Create a dataset with a single 3D variable."""
        t_size, x_size, y_size = 50, 450, 400
        t = np.arange(t_size)
        data = np.random.randn(t_size, x_size, y_size)

        self.ds = xr.Dataset(
            {"A": xr.DataArray(data, coords={"T": t}, dims=("T", "X", "Y"))}
        )

    def time_copy(self) -> None:
        copy(self.ds)

    def time_deepcopy(self) -> None:
        deepcopy(self.ds)

    def time_copy_method(self) -> None:
        self.ds.copy(deep=False)

    def time_copy_method_deep(self) -> None:
        self.ds.copy(deep=True)
```

But I didn't see any regressions between v2022.06.0 and HEAD, so this simplistic case is clearly not enough to reproduce the problem.

There are a few differences between our test datasets and the one in the benchmark above (a sketch approximating them follows below):

- 4D vs 3D
- smaller grid: (2, 5, 4, 3) vs (50, 450, 400)
- more variables: ~20 vs 1
- variable attributes vs none
- multiple files read in via open_mfdataset vs entirely in-memory
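
For completeness, a benchmark variant closer to those test datasets might look something like this (a rough sketch with made-up variable names and attributes; it still leaves out the open_mfdataset / multi-file aspect):

```py
import numpy as np
import xarray as xr


class CopySmall4D:
    """Hypothetical benchmark: small 4D grid, ~20 variables, per-variable attrs."""

    def setup(self):
        t_size, z_size, x_size, y_size = 2, 5, 4, 3
        data = np.random.randn(t_size, z_size, x_size, y_size)

        self.ds = xr.Dataset(
            {
                f"var{i}": xr.DataArray(
                    data,
                    coords={"T": np.arange(t_size)},
                    dims=("T", "Z", "X", "Y"),
                    attrs={"units": "m", "long_name": f"variable {i}"},
                )
                for i in range(20)
            }
        )

    def time_copy_method(self) -> None:
        self.ds.copy(deep=False)

    def time_copy_method_deep(self) -> None:
        self.ds.copy(deep=True)
```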
