
issue_comments: 302496987


Comment on pydata/xarray#1413 by user 1197350 (MEMBER)
Created: 2017-05-18T18:14:56Z · Updated: 2017-05-18T18:15:34Z
https://github.com/pydata/xarray/pull/1413#issuecomment-302496987

Let me expand on what this does.

Many netCDF datasets consist of multiple files with identical coordinates, except for one (e.g. time). With xarray we can open these datasets with open_mfdataset, which calls concat on the list of individual dataset objects. concat calls align, which loads all of the dimension indices (and, optionally, non-dimension coordinates) from each file and checks them for consistency / alignment.
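The consistency check that align performs can be illustrated with a minimal sketch (this is not xarray's actual implementation; `check_aligned` is a hypothetical helper): every file's index for a shared dimension must match exactly before concatenation can proceed.

```python
# Hypothetical sketch of the alignment check, not xarray's real code.
# `indexes` is a list of per-file index sequences for one shared dimension.
def check_aligned(indexes):
    first = indexes[0]
    for idx in indexes[1:]:
        if idx != first:
            # any mismatch means the files cannot be naively concatenated
            raise ValueError("indexes are not aligned")
    return True
```

The cost comes from the fact that producing each entry of `indexes` requires reading that file's coordinate data from disk.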

This align step is potentially quite expensive for big collections of files with large indices. For example, an unstructured-grid or particle-based dataset would have just a single dimension coordinate, with the same length as the data variables. If the user knows that the datasets are already aligned, this PR allows the alignment step to be skipped by passing prealigned=True to concat. My goal is to avoid touching the disk as much as possible.
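As a toy illustration of the idea (a sketch only, using plain dicts of lists rather than real xarray Dataset objects), a `prealigned` flag would let concat bypass the per-file index comparison entirely:

```python
def concat(datasets, dim, prealigned=False):
    """Toy concat over datasets represented as dicts mapping names to lists.
    Coordinates other than `dim` are assumed shared across datasets.
    This is an illustrative sketch, not xarray's implementation."""
    first = datasets[0]
    shared = [k for k in first if k not in (dim, "data")]
    if not prealigned:
        # the alignment check: compare every shared index against the first,
        # which in the real case means reading indices from every file
        for ds in datasets[1:]:
            for k in shared:
                if ds[k] != first[k]:
                    raise ValueError(f"datasets are not aligned along {k!r}")
    # concatenate along `dim`; shared coordinates are taken from the first file
    result = {k: first[k] for k in shared}
    result[dim] = [v for ds in datasets for v in ds[dim]]
    result["data"] = [v for ds in datasets for v in ds["data"]]
    return result
```

With prealigned=True the shared coordinates are simply trusted and copied from the first dataset, which is exactly where the disk-access savings would come from.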

This PR is a work in progress. I still need to propagate the prealigned argument up to auto_combine and open_mfdataset.

An alternative API would be to add another option to the coords keyword, i.e. coords='prealigned'.

Feedback welcome.
