issue_comments: 1352310432

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/7368#issuecomment-1352310432 https://api.github.com/repos/pydata/xarray/issues/7368 1352310432 IC_kwDOAMm_X85Qmp6g 4160723 2022-12-14T22:33:23Z 2022-12-15T01:08:41Z MEMBER

I did some profiling to find the cause of the performance decrease reported in the benchmarks (dataset creation). In summary, it is explained by a Coordinates object (built from the coords mapping) that is now included among the objects to align when merging data variables and coordinates. Previously, all non-DataArray objects in the coords mapping were excluded from alignment (in deep_align). The added overhead comes from a call to Coordinates._reindex_callback(), which (I think?) should do no more than shallow copies and/or wrapping in xarray objects. In the benchmark report, the slowdown is only marked as significant when creating small datasets (1.5-2x slower), and it becomes insignificant for datasets with more data variables.
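For anyone who wants to collect a similar profile, here is a minimal sketch using only the standard library. The helper `profile_callable` and the stand-in workload `build_small_mapping` are hypothetical names introduced for illustration; the stand-in would be replaced by the small Dataset construction under test. This is not the benchmark setup used for the report.

```python
import cProfile
import io
import pstats


def profile_callable(func, repeat=200, top=5):
    """Run func() `repeat` times under cProfile and return (report, stats)."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeat):
        func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so that costly callees show up near the top.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue(), stats


def build_small_mapping():
    """Stand-in workload (hypothetical); swap in the dataset creation to profile."""
    coords = {"x": list(range(10))}
    data_vars = {"a": [value * 2 for value in coords["x"]]}
    return {**coords, **data_vars}


report, stats = profile_callable(build_small_mapping)
print(report)
```

Repeating a small workload many times makes a fixed per-call overhead easier to spot in the cumulative column.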

Maybe there's some way to optimize that? I don't know whether we can completely avoid it with the solution implemented in this PR, though. Promoting Coordinates is pretty clean and future-proof IMO (assuming that we'll further refactor Coordinates to actually store variables and indexes, i.e., no longer act as a proxy). Is the (minor? temporary?) performance regression acceptable, and can we just leave it as is for now?

More details about the new workflow implemented in this PR when creating a new Dataset:

  • if Dataset's coords argument is a "simple" mapping, it is first internally converted into a Coordinates object, with the creation of default indexes for dimension coordinates
      • if one or more DataArray objects are given in coords, their coordinates (variables + indexes) are extracted and merged with the other input coordinates
      • see the implementation in xarray.core.coordinates.create_coords_with_default_indexes
  • otherwise, just reuse the Coordinates object passed as coords
  • coordinates are then merged with data variables
      • the Coordinates object is aligned with every other "alignable" object found in data_vars
      • coordinate indexes (if any) are passed explicitly to align so that they take priority
      • explicitly using a Coordinates object skips the creation of default indexes during merging (in collect_variables_and_indexes())
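The workflow above can be illustrated with a toy model in plain Python. This is only a sketch of the control flow, not xarray's implementation: `ToyCoordinates`, `as_toy_coordinates` and `build_toy_dataset` are hypothetical names, variables are reduced to 1-D `(dim, values)` pairs, and "alignment" is reduced to a size check.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCoordinates:
    """Hypothetical stand-in for a Coordinates object (variables + indexes)."""

    variables: dict  # name -> (dim, values), 1-D only in this toy model
    indexes: dict = field(default_factory=dict)  # name -> {label: position}


def as_toy_coordinates(coords):
    """Reuse a ToyCoordinates object, or convert a plain mapping into one."""
    if isinstance(coords, ToyCoordinates):
        return coords  # reused as-is: no default indexes are added
    variables = dict(coords)
    indexes = {
        name: {label: pos for pos, label in enumerate(values)}
        for name, (dim, values) in variables.items()
        if name == dim  # only dimension coordinates get a default index
    }
    return ToyCoordinates(variables, indexes)


def build_toy_dataset(data_vars, coords):
    """Merge data variables with coordinates after a toy alignment check."""
    coordinates = as_toy_coordinates(coords)
    for name, (dim, values) in data_vars.items():
        index = coordinates.indexes.get(dim)
        # Toy "alignment": an indexed dimension must agree in size.
        if index is not None and len(index) != len(values):
            raise ValueError(f"{name!r} conflicts with the index along {dim!r}")
    return {
        "variables": {**coordinates.variables, **data_vars},
        "indexes": dict(coordinates.indexes),
    }


data = {"temperature": ("x", [11.5, 12.0, 13.2])}

# Plain mapping: a default index is created for the dimension coordinate "x".
with_default = build_toy_dataset(data, {"x": ("x", [10, 20, 30])})

# Explicit coordinates object without indexes: no default index is created.
explicit = ToyCoordinates({"x": ("x", [10, 20, 30])})
without_index = build_toy_dataset(data, explicit)

print(sorted(with_default["indexes"]))   # ['x']
print(sorted(without_index["indexes"]))  # []
```

In this model the coordinates object always takes part in the alignment step, which mirrors where the extra per-dataset overhead discussed above would come from.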
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  1485037066