issues: 1322491028

id: 1322491028
node_id: I_kwDOAMm_X85O05yU
number: 6850
title: Slow lazy performance on cloud data
user: 31974425
state: closed
locked: 0
assignee:
milestone:
comments: 3
created_at: 2022-07-29T17:05:31Z
updated_at: 2022-09-12T18:39:05Z
closed_at: 2022-09-12T18:39:04Z
author_association: NONE
active_lock_reason:
draft:
pull_request:
body:

Hi, I am not sure if this is the right place to raise this issue, but I'd appreciate any help!

I am working on a more complicated calculation with CESM cloud data (on the Pangeo cloud deployment) and am running into a problem with a simpler calculation that is part of the workflow. When taking a derivative, the cell that does the differencing takes a very long time to run, even though this step is lazy and does not compute anything. It should return quickly, but as the screenshot shows, it reports a run time of ~20 s while the wall time is much longer (~2 min). This becomes a serious problem when taking the derivative of multiple variables as part of a larger workflow. @jbusecke and I repeated the differencing on a randomized dask dataset, where the cell runs much faster. Below is reproducible code that isolates the problem. I am not sure how to fix this slow performance and would appreciate your help, thanks!

```
import xarray as xr
import numpy as np
import dask.array as dsa
import pop_tools
from xgcm import Grid
import xgcm
from intake import open_catalog

# Dask sample dataset
test_values = dsa.random.random((14695, 2400, 3600), chunks=(1, 2400, 3600))
da_sample = xr.DataArray(test_values, dims=['time', 'x', 'y'])
da_sample_u = xr.DataArray(test_values, dims=['time', 'x_u', 'y_u'])
ds_sample = xr.Dataset(data_vars=dict(test_values=da_sample, u=da_sample_u))

# 'y' (length 3600) is assumed to stand in for nlon here,
# since the sample dataset has no 'nlon' dimension
%timeit ds_sample.pad({'y':(2,2)}).diff('y')

# Original dataset
url = "https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean/CESM_POP.yaml"
cat = open_catalog(url)
ds = cat["CESM_POP_hires_control"].to_dask()
# drop dimension coordinates so that only the data variables remain
ds = ds.drop([d for d in ds.dims if d in ds.coords])

%timeit ds.pad({'nlon':(2,2)}).diff('nlon')
```
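
To see where the gap between reported run time and wall time comes from, it may help to measure both explicitly around the lazy call. The following is only a minimal sketch using the standard library: `time_lazy_call` and `fake_graph_build` are hypothetical names, and the `time.sleep` stand-in merely simulates time spent waiting rather than computing. It is not a diagnosis of what xarray or dask actually do here.

```python
import time


def time_lazy_call(build):
    """Run build() once and return (result, cpu_seconds, wall_seconds)."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time used by this process
    result = build()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, cpu, wall


def fake_graph_build():
    # Hypothetical stand-in for a lazy operation: it mostly waits
    # (sleeping uses almost no CPU) and returns a placeholder object.
    time.sleep(0.2)
    return "lazy-object"


result, cpu, wall = time_lazy_call(fake_graph_build)
print(f"CPU time: {cpu:.3f} s, wall time: {wall:.3f} s")
```

In the notebook, the same helper could be called as `time_lazy_call(lambda: ds.pad({'nlon': (2, 2)}).diff('nlon'))`. A wall time far above the CPU time would suggest the cell is mostly waiting rather than doing work in the main process.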

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6850/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
performed_via_github_app:
state_reason: completed
repo: 13221727
type: issue
