issues: 775502974

id: 775502974
node_id: MDU6SXNzdWU3NzU1MDI5NzQ=
number: 4738
title: ENH: Compute hash of xarray objects
user: 13301940
state: open
locked: 0
comments: 11
created_at: 2020-12-28T17:18:57Z
updated_at: 2023-12-06T18:24:59Z
author_association: MEMBER

Is your feature request related to a problem? Please describe.

I'm working on some caching/data-provenance functionality for xarray objects, and I realized that there's no standard/efficient way of computing hashes for xarray objects.

Describe the solution you'd like

It would be useful to have a configurable, reliable, standard .hexdigest() method on xarray objects. For example, zarr provides a digest method that returns a digest/hash of the data.

```python
In [16]: import zarr

In [17]: z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000))

In [18]: z.hexdigest()  # uses sha1 by default for speed
Out[18]: '7162d416d26a68063b66ed1f30e0a866e4abed60'

In [20]: z.hexdigest(hashname='sha256')
Out[20]: '46fc6e52fc1384e37cead747075f55201667dd539e4e72d0f372eb45abdcb2aa'
```

I think a hashing mechanism built into xarray would provide a more reliable way to handle metadata such as global attributes, encoding, etc. during the hash computation.
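To make the idea concrete, here is a rough sketch of what a metadata-aware digest could look like. This is not an existing xarray API: the `hexdigest` function and its `include_attrs` parameter are hypothetical names, and the sketch assumes numeric or fixed-width dtypes and data small enough to load into memory. Encoding is not handled at all.

```python
import hashlib
import json

import numpy as np
import xarray as xr


def hexdigest(ds: xr.Dataset, hashname: str = "sha1", include_attrs: bool = True) -> str:
    """Hypothetical helper (not part of xarray): digest of names, dims, dtypes, values and attrs."""
    h = hashlib.new(hashname)
    # Sort variable names so the digest does not depend on insertion order.
    # ds.variables covers both data variables and coordinates.
    for name in sorted(ds.variables, key=str):
        var = ds.variables[name]
        header = {
            "name": str(name),
            "dims": [str(d) for d in var.dims],
            "dtype": str(var.dtype),
            "shape": list(var.shape),
        }
        if include_attrs:
            header["attrs"] = var.attrs
        # default=str is a crude fallback for attrs that are not JSON-serializable.
        h.update(json.dumps(header, sort_keys=True, default=str).encode())
        # Assumption: fixed-width dtypes. Object arrays would need a different
        # serialization, and .values loads lazy (e.g. dask) data into memory.
        h.update(np.ascontiguousarray(var.values).tobytes())
    if include_attrs:
        h.update(json.dumps(ds.attrs, sort_keys=True, default=str).encode())
    return h.hexdigest()


# Small illustrative dataset (made up for this sketch).
ds = xr.Dataset(
    {"t": (("x", "y"), np.zeros((2, 3)))},
    coords={"x": [0, 1]},
    attrs={"title": "demo"},
)
with_attrs = hexdigest(ds)
without_attrs = hexdigest(ds, include_attrs=False)
```

With something like this, the caller decides whether global and per-variable attributes contribute to the hash, instead of inheriting whatever a generic pickling-based hasher does.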

Describe alternatives you've considered

So far, I have been using joblib's default hasher, the joblib.hash() function. However, I would prefer a configurable, built-in hasher that is aware of xarray's data model and quirks :)

```python
In [1]: import joblib

In [2]: import xarray as xr

In [3]: ds = xr.tutorial.open_dataset('rasm')

In [5]: joblib.hash(ds, hash_name='sha1')
Out[5]: '3e5e3f56daf81e9e04a94a3dff9fdca9638c36cf'

In [8]: ds.attrs = {}

In [9]: joblib.hash(ds, hash_name='sha1')
Out[9]: 'daab25fe735657e76514040608fadc67067d90a0'
```

repo: 13221727
type: issue