

issue_comments: 324735578


html_url: https://github.com/pydata/xarray/issues/1525#issuecomment-324735578
issue_url: https://api.github.com/repos/pydata/xarray/issues/1525
id: 324735578
node_id: MDEyOklzc3VlQ29tbWVudDMyNDczNTU3OA==
user: 306380
created_at: 2017-08-24T19:37:27Z
updated_at: 2017-08-24T19:37:27Z
author_association: MEMBER

To be explicit: by default, da.from_array currently names arrays by hashing all of the data within them. This can be somewhat slow; depending on which hashing libraries you have installed, throughput is generally around 500-1000 MB/s. In exchange, you get a deterministic name for your array: if someone else with the exact same data does the exact same operations, then Dask can recognize that and avoid repeated work.
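
As a rough illustration of the naming trade-off, here is a minimal stdlib-only sketch. It is a stand-in, not Dask's code: Dask uses its own tokenization and may use a faster hash library if one is installed, whereas this sketch uses hashlib.sha256. The helper names deterministic_name and random_name are hypothetical.

```python
import hashlib
import uuid


def deterministic_name(data: bytes, prefix: str = "array") -> str:
    """Name an array by hashing every byte of its contents.

    Cost grows linearly with len(data), but equal contents always
    produce an equal name.
    """
    digest = hashlib.sha256(data).hexdigest()
    return f"{prefix}-{digest}"


def random_name(prefix: str = "array") -> str:
    """Name an array without reading its contents (constant cost)."""
    return f"{prefix}-{uuid.uuid4().hex}"


if __name__ == "__main__":
    a = bytes(range(256)) * 1000
    b = bytes(range(256)) * 1000  # separate object, identical contents
    print(deterministic_name(a) == deterministic_name(b))  # True
    print(random_name() == random_name())  # False (collisions are negligible)
```

In Dask itself, passing name=False to da.from_array requests a random name instead of a content hash; check the from_array docstring for the version you are running.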

So you have to choose:

  1. Avoid repeated work
  2. Avoid hashing data

The choice really depends on how often you plan to repeat the same computation on the same data coming from the same NumPy array. If you only ever call a.chunk(...) once per array, then there is no reason to hash.
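
The trade-off can be sketched with a toy model that uses only the standard library. ToyScheduler and count_computations are hypothetical names; this is not how Dask's scheduler or graph merging actually works. It only mimics the idea that equal keys let an earlier result be reused, while random keys never match.

```python
import hashlib
import uuid


class ToyScheduler:
    """Caches results by task key; a stand-in for reusing prior work."""

    def __init__(self) -> None:
        self.cache: dict = {}
        self.computations = 0

    def compute_sum(self, data: bytes, hash_data: bool) -> int:
        if hash_data:
            # Deterministic: identical data -> identical key (costs a full pass).
            name = "array-" + hashlib.sha256(data).hexdigest()
        else:
            # Random: no pass over the data, but keys never match across calls.
            name = "array-" + uuid.uuid4().hex
        key = (name, "sum")
        if key not in self.cache:
            self.computations += 1
            self.cache[key] = sum(data)
        return self.cache[key]


def count_computations(hash_data: bool, repeats: int) -> int:
    """Run the same sum `repeats` times and report how often it was computed."""
    sched = ToyScheduler()
    data = bytes(range(256)) * 4096  # 1 MiB of sample data
    for _ in range(repeats):
        sched.compute_sum(data, hash_data)
    return sched.computations


if __name__ == "__main__":
    print(count_computations(hash_data=True, repeats=3))   # 1
    print(count_computations(hash_data=False, repeats=3))  # 3
```

With repeats=1 both modes compute exactly once, so hashing only adds cost; that is the single a.chunk(...) call per array case described above.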
