issues: 429511994
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
429511994 | MDU6SXNzdWU0Mjk1MTE5OTQ= | 2867 | Very slow coordinate assignment with dask array | 14314623 | closed | 0 |  |  | 7 | 2019-04-04T22:36:57Z | 2019-12-19T17:28:10Z | 2019-12-17T16:21:23Z | CONTRIBUTOR |  |  |  | I am trying to reconstruct the vertical cell depth from a z-star ocean model. This requires several operations on both the dimensions and the coordinates of a dataset like this:
The problematic step is when I assign the computed dask arrays to the original dataset.
This happens in a function like this.
This takes much longer than a version where I assign the values as data variables instead:
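I have profiled my more complex code that uses this function, and the number of calls to {method 'acquire' of '_thread.lock' objects} appears to increase substantially: 268632 calls taking 46.9 s in the first version versus 9492 calls taking 3.7 s in the second.
Profile output of the first version (assigning coordinates):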
27662983 function calls (26798524 primitive calls) in 71.940 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
268632 46.914 0.000 46.914 0.000 {method 'acquire' of '_thread.lock' objects}
438 4.296 0.010 4.296 0.010 {method 'read' of '_io.BufferedReader' objects}
76883 1.909 0.000 1.939 0.000 local.py:240(release_data)
144 1.489 0.010 4.519 0.031 rechunk.py:514(_compute_rechunk)
...
For the second version (assigning data variables):
12928834 function calls (12489174 primitive calls) in 16.554 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
438 3.841 0.009 3.841 0.009 {method 'read' of '_io.BufferedReader' objects}
9492 3.675 0.000 3.675 0.000 {method 'acquire' of '_thread.lock' objects}
144 1.673 0.012 4.712 0.033 rechunk.py:514(_compute_rechunk)
...
Does anyone have a sense of why this could happen, or how I could refine my testing to get to the bottom of it?
Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2867/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |  |
completed | 13221727 | issue |
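The original dataset and function are not included in this export, so the following is only a minimal sketch of how the comparison could be isolated, assuming xarray, dask and NumPy are installed. Everything in it is hypothetical: `make_dataset`, `assign_as_coords`, `assign_as_data_vars`, `time_compute`, `profile`, the array sizes, and the use of a cumulative sum of layer thicknesses as a stand-in "depth". It times both variants (coordinate vs. data variable) on the same lazy input and profiles each one. One assumption worth stating plainly: `cProfile` only records the thread it was enabled in, and under dask's default threaded scheduler that thread mostly waits on worker threads. This is likely why {method 'acquire' of '_thread.lock' objects} dominates the profiles above. Running with `dask.config.set(scheduler="synchronous")` keeps the work in the calling thread so the profile can show where the time actually goes.

```python
import cProfile
import pstats
import time

import dask
import dask.array as da
import numpy as np
import xarray as xr


def make_dataset(nt=4, nz=10, ny=50, nx=50):
    """Hypothetical stand-in for the model output: dask-backed layer thicknesses."""
    dz = da.ones((nt, nz, ny, nx), chunks=(1, nz, ny, nx))
    return xr.Dataset(
        {"dz": (("time", "z", "y", "x"), dz)},
        coords={
            "time": np.arange(nt),
            "z": np.arange(nz),
            "y": np.arange(ny),
            "x": np.arange(nx),
        },
    )


def assign_as_coords(ds):
    # First version: attach the lazily computed depth as a (non-index) coordinate.
    depth = ds["dz"].cumsum("z")
    return ds.assign_coords(depth=depth)


def assign_as_data_vars(ds):
    # Second version: attach the same lazy result as a data variable.
    depth = ds["dz"].cumsum("z")
    return ds.assign(depth=depth)


def time_compute(func, ds):
    """Wall-clock time to build the result and compute every lazy variable."""
    start = time.perf_counter()
    func(ds).load()  # load() computes dask-backed coordinates as well as data variables
    return time.perf_counter() - start


def profile(func, ds, n_lines=5, scheduler="synchronous"):
    """Profile func(ds).load(); the synchronous scheduler runs all dask tasks
    in the calling thread, so cProfile can attribute the time to them."""
    profiler = cProfile.Profile()
    with dask.config.set(scheduler=scheduler):
        profiler.enable()
        func(ds).load()
        profiler.disable()
    stats = pstats.Stats(profiler).sort_stats("tottime")
    stats.print_stats(n_lines)
    return stats


if __name__ == "__main__":
    ds = make_dataset()
    for func in (assign_as_coords, assign_as_data_vars):
        print(f"{func.__name__}: {time_compute(func, ds):.3f} s")
        profile(func, ds)
```

Running both variants on data with the same shape and chunking as the real dataset should show whether the slowdown reproduces outside the larger code. Comparing the top entries of the two synchronous profiles would then point to the step where the coordinate version spends its extra time.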