issue_comments
10 rows where issue = 775502974 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1139544716 | https://github.com/pydata/xarray/issues/4738#issuecomment-1139544716 | https://api.github.com/repos/pydata/xarray/issues/4738 | IC_kwDOAMm_X85D7BKM | LunarLanding 4441338 | 2022-05-27T11:48:14Z | 2022-05-27T11:48:14Z | NONE |
@andersy005 This runs with no issues atm.
With:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
ENH: Compute hash of xarray objects 775502974 | |
998992357 | https://github.com/pydata/xarray/issues/4738#issuecomment-998992357 | https://api.github.com/repos/pydata/xarray/issues/4738 | IC_kwDOAMm_X847i2nl | andersy005 13301940 | 2021-12-21T18:14:15Z | 2021-12-21T18:14:15Z | MEMBER | Okay... I think the following comment is still valid:
It appears that whether tokenization is deterministic depends on whether the Dataset/DataArray contains non-dimension coordinates or dimension coordinates.
```python
In [39]: a = ds.isel(time=0)

In [40]: a
Out[40]:
<xarray.Dataset>
Dimensions:  (y: 205, x: 275)
Coordinates:
    time     object 1980-09-16 12:00:00
    xc       (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91
    yc       (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51
Dimensions without coordinates: y, x
Data variables:
    Tair     (y, x) float64 ...

In [41]: dask.base.tokenize(a) == dask.base.tokenize(a)
Out[41]: True
```

```python
In [42]: b = ds.isel(y=0)

In [43]: b
Out[43]:
<xarray.Dataset>
Dimensions:  (time: 36, x: 275)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (x) float64 189.2 189.4 189.6 189.7 ... 293.5 293.8 294.0 294.3
    yc       (x) float64 16.53 16.78 17.02 17.27 ... 27.61 27.36 27.12 26.87
Dimensions without coordinates: x
Data variables:
    Tair     (time, x) float64 ...

In [44]: dask.base.tokenize(b) == dask.base.tokenize(b)
Out[44]: False
```

This looks like a bug in my opinion... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
ENH: Compute hash of xarray objects 775502974 | |
998948715 | https://github.com/pydata/xarray/issues/4738#issuecomment-998948715 | https://api.github.com/repos/pydata/xarray/issues/4738 | IC_kwDOAMm_X847ir9r | andersy005 13301940 | 2021-12-21T17:06:51Z | 2021-12-21T17:11:47Z | MEMBER |
I tried running the reproducer above and things seem to be working fine. I can't for the life of me understand why I got non-deterministic behavior four hours ago :(

```python
In [1]: import dask, xarray as xr

In [2]: ds = xr.tutorial.open_dataset('rasm')

In [3]: dask.base.tokenize(ds) == dask.base.tokenize(ds)
Out[3]: True

In [4]: dask.base.tokenize(ds.Tair._coords) == dask.base.tokenize(ds.Tair._coords)
Out[4]: True
```

```python
In [5]: xr.show_versions()

INSTALLED VERSIONS
commit: None
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 20:33:18) [Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 20.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.20.1
pandas: 1.3.4
numpy: 1.20.3
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.11.0
h5py: 3.6.0
Nio: None
zarr: 2.10.3
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.11.2
distributed: 2021.11.2
matplotlib: 3.5.0
cartopy: None
seaborn: None
numbagg: None
fsspec: 2021.11.1
cupy: None
pint: 0.18
sparse: None
setuptools: 59.4.0
pip: 21.3.1
conda: None
pytest: None
IPython: 7.30.0
sphinx: 4.3.1
```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
ENH: Compute hash of xarray objects 775502974 | |
998764799 | https://github.com/pydata/xarray/issues/4738#issuecomment-998764799 | https://api.github.com/repos/pydata/xarray/issues/4738 | IC_kwDOAMm_X847h_D_ | andersy005 13301940 | 2021-12-21T13:08:21Z | 2021-12-21T13:09:01Z | MEMBER |
@dcherian, I just realized that

```python
In [2]: import dask, xarray as xr

In [3]: ds = xr.tutorial.open_dataset('rasm')

In [4]: dask.base.tokenize(ds) == dask.base.tokenize(ds)
Out[4]: False

In [5]: dask.base.tokenize(ds) == dask.base.tokenize(ds)
Out[5]: False
```

The issue appears to be caused by the coordinates which are used in
Is this the expected behavior or am I missing something? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
ENH: Compute hash of xarray objects 775502974 | |
757504237 | https://github.com/pydata/xarray/issues/4738#issuecomment-757504237 | https://api.github.com/repos/pydata/xarray/issues/4738 | MDEyOklzc3VlQ29tbWVudDc1NzUwNDIzNw== | andersy005 13301940 | 2021-01-10T16:34:20Z | 2021-01-10T16:34:20Z | MEMBER |
👍🏽
Due to the simplicity of |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
ENH: Compute hash of xarray objects 775502974 | |
752719785 | https://github.com/pydata/xarray/issues/4738#issuecomment-752719785 | https://api.github.com/repos/pydata/xarray/issues/4738 | MDEyOklzc3VlQ29tbWVudDc1MjcxOTc4NQ== | dcherian 2448579 | 2020-12-30T18:41:47Z | 2020-12-30T18:41:47Z | MEMBER | @andersy005 if you can rely on dask always being present, |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
ENH: Compute hash of xarray objects 775502974 | |
752226528 | https://github.com/pydata/xarray/issues/4738#issuecomment-752226528 | https://api.github.com/repos/pydata/xarray/issues/4738 | MDEyOklzc3VlQ29tbWVudDc1MjIyNjUyOA== | shoyer 1217238 | 2020-12-29T20:13:02Z | 2020-12-29T20:13:02Z | MEMBER | I asked because this isn't an operation I've used directly on pandas objects in the past. I'm not opposed, but my suggestion would be to write a separate utility function, e.g., in |
{ "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
ENH: Compute hash of xarray objects 775502974 | |
752156934 | https://github.com/pydata/xarray/issues/4738#issuecomment-752156934 | https://api.github.com/repos/pydata/xarray/issues/4738 | MDEyOklzc3VlQ29tbWVudDc1MjE1NjkzNA== | TomAugspurger 1312546 | 2020-12-29T16:53:16Z | 2020-12-29T16:53:16Z | MEMBER | IIUC, something like https://github.com/dask/dask/blob/4a7a2438219c4ee493434042e50f4cdb67b6ec9f/dask/base.py#L778 is what you're looking for. Further down we register tokenizers for various types like pandas' DataFrames and ndarrays. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
ENH: Compute hash of xarray objects 775502974 | |
752154350 | https://github.com/pydata/xarray/issues/4738#issuecomment-752154350 | https://api.github.com/repos/pydata/xarray/issues/4738 | MDEyOklzc3VlQ29tbWVudDc1MjE1NDM1MA== | andersy005 13301940 | 2020-12-29T16:47:03Z | 2020-12-29T16:47:03Z | MEMBER | Pandas has a built-in utility function

```python
In [1]: import pandas as pd

In [3]: df = pd.DataFrame({'A': [4, 5, 6, 7], 'B': [10, 20, 30, 40], 'C': [100, 50, -30, -50]})

In [4]: df
Out[4]:
   A   B    C
0  4  10  100
1  5  20   50
2  6  30  -30
3  7  40  -50

In [6]: row_hashes = pd.util.hash_pandas_object(df)

In [7]: row_hashes
Out[7]:
0    14190898035981950066
1    16858535338008670510
2     1055569624497948892
3     5944630256416341839
dtype: uint64
```

Combining the returned value of

```python
In [8]: import hashlib

In [10]: hashlib.sha1(row_hashes.values).hexdigest()  # Compute overall hash of all rows.
Out[10]: '1e1244d9b0489e1f479271f147025956d4994f67'
```

Regarding dask, I have no idea :) cc @TomAugspurger |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
ENH: Compute hash of xarray objects 775502974 | |
751963435 | https://github.com/pydata/xarray/issues/4738#issuecomment-751963435 | https://api.github.com/repos/pydata/xarray/issues/4738 | MDEyOklzc3VlQ29tbWVudDc1MTk2MzQzNQ== | shoyer 1217238 | 2020-12-29T06:24:30Z | 2020-12-29T06:24:30Z | MEMBER | Interesting! Do pandas or dask have anything like this? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
ENH: Compute hash of xarray objects 775502974 |
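
The pandas approach described in the 2020-12-29 comment above can be sketched as a small helper. This is a minimal illustration, not something proposed in the thread: the name `frame_digest` is hypothetical, and it assumes only `pd.util.hash_pandas_object` and `hashlib.sha1`, the two calls that comment uses. No expected digest is shown, since the exact value may depend on the pandas version.

```python
import hashlib

import pandas as pd


def frame_digest(df: pd.DataFrame) -> str:
    """Return one SHA-1 hex digest for a whole DataFrame (hypothetical helper)."""
    # One uint64 hash per row; index=True folds the index labels into each hash.
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    # Collapse the per-row hashes into a single digest over their raw bytes.
    return hashlib.sha1(row_hashes.to_numpy().tobytes()).hexdigest()


df = pd.DataFrame({"A": [4, 5, 6, 7], "B": [10, 20, 30, 40], "C": [100, 50, -30, -50]})

# Equal content gives an equal digest; changing one cell changes it.
same = frame_digest(df) == frame_digest(df.copy())
changed = df.copy()
changed.loc[0, "A"] = 99
differs = frame_digest(changed) != frame_digest(df)
print(same, differs)
```

Because the row hashes are concatenated in order, the digest is sensitive to row order. Whether that is desirable for xarray objects is not settled in the thread.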
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) ); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
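
As a rough sketch of how the view at the top of this page ("rows where issue = 775502974 sorted by updated_at descending") maps onto the schema above, the following loads three of the listed rows into an in-memory SQLite database and runs the equivalent query. The choice of columns to populate and the use of Python's `sqlite3` module are assumptions made for illustration. The `users` and `issues` tables are not created, which SQLite tolerates because foreign-key enforcement is off by default.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
    [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT,
    [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT,
    [author_association] TEXT, [body] TEXT, [reactions] TEXT,
    [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# (id, user, created_at, updated_at, issue) taken from three rows of the table above.
rows = [
    (751963435, 1217238, "2020-12-29T06:24:30Z", "2020-12-29T06:24:30Z", 775502974),
    (1139544716, 4441338, "2022-05-27T11:48:14Z", "2022-05-27T11:48:14Z", 775502974),
    (998948715, 13301940, "2021-12-21T17:06:51Z", "2021-12-21T17:11:47Z", 775502974),
]
conn.executemany(
    "INSERT INTO issue_comments (id, user, created_at, updated_at, issue) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# ISO 8601 timestamps stored as TEXT sort chronologically as plain strings.
ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM issue_comments WHERE issue = ? ORDER BY updated_at DESC",
        (775502974,),
    )
]
print(ids)  # newest comment first: [1139544716, 998948715, 751963435]
```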