issues: 902622057


id: 902622057
node_id: MDU6SXNzdWU5MDI2MjIwNTc=
number: 5381
title: concat() with compat='no_conflicts' on dask arrays has accidentally quadratic runtime
user: 1217238
state: open
locked: 0
comments: 0
created_at: 2021-05-26T16:12:06Z
updated_at: 2022-04-19T03:48:27Z
author_association: MEMBER

This ends up calling fillna() in a loop inside xarray.core.merge.unique_variable(), something like:

```python
out = variables[0]
for var in variables[1:]:
    out = out.fillna(var)
```

https://github.com/pydata/xarray/blob/55e5b5aaa6d9c27adcf9a7cb1f6ac3bf71c10dea/xarray/core/merge.py#L147-L149

This has quadratic behavior if the variables are stored in dask arrays: the dask graph gets one element larger after each loop iteration, so each iteration has to process a graph that has grown linearly, and the cumulative work over N variables is O(N²). This is OK for merge() (which typically only has two arguments) but is problematic for dealing with variables that shouldn't be concatenated inside concat(), which should be able to handle very long lists of arguments.
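A toy cost model (plain Python, not dask itself) makes the quadratic total concrete. It assumes each fillna() call adds one task to the graph and each iteration must build on (and so traverse) the graph accumulated so far:

```python
# Toy model of the dask-graph growth described above (an illustration,
# not real dask behavior): each lazy result is represented only by its
# graph size (task count).
def sequential_cost(n_vars):
    graph_size = 1           # graph for variables[0]
    total_work = 0
    for _ in range(n_vars - 1):
        graph_size += 1      # each fillna() adds one task/layer
        total_work += graph_size  # building on a graph costs ~ its size
    return total_work

print(sequential_cost(10))   # 54
print(sequential_cost(100))  # 5049 -- grows roughly as n**2 / 2
```

Doubling the number of variables roughly quadruples the total work, which is the quadratic runtime the issue title refers to.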

I encountered this because compat='no_conflicts' is the default for xarray.combine_nested().

There is also a related issue: even if we produced the output dask graph by hand without a loop, it still wouldn't be easy to evaluate for a large number of elements. Ideally we would use some sort of tree reduction to ensure the operation can be parallelized.
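The tree-reduction idea can be sketched on plain Python lists (pairwise_tree_reduce and fillna_like are hypothetical names for illustration, not xarray or dask API): combining elements pairwise keeps the result's depth at O(log n), and the pairs at each level are independent, so they could run in parallel.

```python
def pairwise_tree_reduce(items, combine):
    """Combine items pairwise in O(log n) rounds instead of one long chain."""
    items = list(items)
    while len(items) > 1:
        items = [
            combine(items[i], items[i + 1]) if i + 1 < len(items) else items[i]
            for i in range(0, len(items), 2)
        ]
    return items[0]

def fillna_like(a, b):
    # Stand-in for Variable.fillna: keep a's value where present,
    # otherwise fall back to b's.
    return [x if x is not None else y for x, y in zip(a, b)]

# Four "variables", each holding one non-missing value:
parts = [[i if j == i else None for j in range(4)] for i in range(4)]
print(pairwise_tree_reduce(parts, fillna_like))  # [0, 1, 2, 3]
```

With a chain, combining N parts takes N-1 sequential steps; with the tree it takes ceil(log2(N)) rounds, which is what makes the operation parallelizable.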

xref https://github.com/google/xarray-beam/pull/13

repo: 13221727
type: issue
