issue_comments


3 rows where author_association = "MEMBER" and issue = 1474785646 sorted by updated_at descending


Comment 1352342017 · jhamman (2443309) · MEMBER
created 2022-12-14T23:08:43Z · updated 2022-12-14T23:09:06Z · https://github.com/pydata/xarray/issues/7354#issuecomment-1352342017
(node_id IC_kwDOAMm_X85QmxoB · issue_url https://api.github.com/repos/pydata/xarray/issues/7354)

After thinking about this a bit longer, I think we should strongly consider dropping source encoding for datasets generated by open_mfdataset, or, if nothing else, think about ways to alert the user when encoding is not consistent across all of the datasets loaded.

Other relevant issues:
- https://github.com/pydata/xarray/issues/1614
- https://github.com/pydata/xarray/issues/6323
- https://github.com/pydata/xarray/issues/7039

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Comment 1352337857 · jhamman (2443309) · MEMBER
created 2022-12-14T23:04:58Z · updated 2022-12-14T23:04:58Z · https://github.com/pydata/xarray/issues/7354#issuecomment-1352337857
(node_id IC_kwDOAMm_X85QmwnB · issue_url https://api.github.com/repos/pydata/xarray/issues/7354)

I took a minute to look into this and think I understand what is going on. First, a little debugging:

```python
for name in [files[0], files[1], path]:
    print(name)
    ds = xr.open_zarr(name, decode_cf=False)
    print(' > time.attrs', ds.time.attrs)
    print(' > time.encoding', ds.time.encoding)
```

```
tmp_dir/2022-09-01T03:00:00.zarr.zip
 > time.attrs {'calendar': 'proleptic_gregorian', 'units': 'days since 2022-09-01 03:00:00'}
 > time.encoding {'chunks': (1,), 'preferred_chunks': {'time': 1}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, 'dtype': dtype('int64')}
tmp_dir/2022-09-01T04:00:00.zarr.zip
 > time.attrs {'calendar': 'proleptic_gregorian', 'units': 'days since 2022-09-01 04:00:00'}
 > time.encoding {'chunks': (1,), 'preferred_chunks': {'time': 1}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, 'dtype': dtype('int64')}
tmp.zarr.zip
 > time.attrs {'calendar': 'proleptic_gregorian', 'units': 'days since 2022-09-01'}
 > time.encoding {'chunks': (1,), 'preferred_chunks': {'time': 1}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, 'dtype': dtype('int64')}
```

A few things that I noticed:
- the dtype of the time variable is `int64`
- the units attr is `days since ...`

`open_mfdataset` tends to take the units from the first file and doesn't check that all the others agree. It also does not clear out the `dtype` encoding.
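That failure mode can be sketched without xarray at all. Below is a minimal, hand-rolled model of CF-style `days since` decoding and int64 re-encoding; the helper names and simplified units parsing are illustrative, not xarray's actual implementation:

```python
from datetime import datetime, timedelta

def decode(raw_days, units):
    # Simplified CF-style decode for units of the form 'days since <timestamp>'
    ref = units.split(' since ')[1]
    fmt = '%Y-%m-%d %H:%M:%S' if ' ' in ref else '%Y-%m-%d'
    origin = datetime.strptime(ref, fmt)
    return [origin + timedelta(days=d) for d in raw_days]

def encode_int64(times, units):
    # Re-encode datetimes as int64 days since the reference; int() truncation
    # silently discards any sub-day offset
    ref = units.split(' since ')[1]
    fmt = '%Y-%m-%d %H:%M:%S' if ' ' in ref else '%Y-%m-%d'
    origin = datetime.strptime(ref, fmt)
    return [int((t - origin) / timedelta(days=1)) for t in times]

# Each zip store decodes correctly against its own units...
t_a = decode([0], 'days since 2022-09-01 03:00:00')  # 03:00
t_b = decode([0], 'days since 2022-09-01 04:00:00')  # 04:00

# ...but writing the combined dataset reuses the first file's units and
# int64 dtype, so the one-hour offset is truncated away:
raw = encode_int64(t_a + t_b, 'days since 2022-09-01 03:00:00')
print(raw)  # [0, 0] -> both timestamps read back as 03:00
```

This is exactly the collapse reported above: distinct hourly timestamps round-trip through the stale encoding into identical values.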

One quick fix here: add

```python
del dataset['time'].encoding['units']
```

right after your open_mfdataset call. You could also update the dtype of your time variable to float64.
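Why either fix works can be shown with plain datetime arithmetic (no xarray; the reference date and variable names here are illustrative):

```python
from datetime import datetime, timedelta

origin = datetime(2022, 9, 1)  # hypothetical reference chosen at write time
times = [datetime(2022, 9, 1, 3), datetime(2022, 9, 1, 4)]

# With a stale int64 'days since' encoding, sub-day offsets truncate to 0:
as_int_days = [int((t - origin) / timedelta(days=1)) for t in times]
print(as_int_days)  # [0, 0] -- the hours are lost

# Deleting the stale 'units' (so fresh encoding is chosen on write) or
# widening the dtype to float64 preserves the fractional days:
as_float_days = [(t - origin) / timedelta(days=1) for t in times]
decoded = [origin + timedelta(days=d) for d in as_float_days]
assert decoded == times  # round-trips exactly
```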

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Comment 1345876149 · jhamman (2443309) · MEMBER
created 2022-12-12T04:57:05Z · updated 2022-12-12T04:57:05Z · https://github.com/pydata/xarray/issues/7354#issuecomment-1345876149
(node_id IC_kwDOAMm_X85QOHC1 · issue_url https://api.github.com/repos/pydata/xarray/issues/7354)

@peterdudfield - have you tried this workflow with the latest version of xarray (2022.12.0)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
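The schema above can be exercised directly with Python's built-in sqlite3. This sketch recreates the table (the `REFERENCES` clauses are dropped for brevity, since the `users` and `issues` tables aren't shown here), loads the three rows from this page, and runs the query this view describes:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY,
   [node_id] TEXT, [user] INTEGER, [created_at] TEXT, [updated_at] TEXT,
   [author_association] TEXT, [body] TEXT, [reactions] TEXT,
   [performed_via_github_app] TEXT, [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
""")

conn.executemany(
    "INSERT INTO issue_comments (id, user, updated_at, author_association, issue)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        (1352342017, 2443309, '2022-12-14T23:09:06Z', 'MEMBER', 1474785646),
        (1352337857, 2443309, '2022-12-14T23:04:58Z', 'MEMBER', 1474785646),
        (1345876149, 2443309, '2022-12-12T04:57:05Z', 'MEMBER', 1474785646),
    ],
)

# "3 rows where author_association = 'MEMBER' and issue = 1474785646
#  sorted by updated_at descending" -- ISO-8601 strings sort correctly as text
rows = conn.execute(
    "SELECT id FROM issue_comments"
    " WHERE author_association = ? AND issue = ?"
    " ORDER BY updated_at DESC",
    ('MEMBER', 1474785646),
).fetchall()
print([r[0] for r in rows])  # [1352342017, 1352337857, 1345876149]
```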
Powered by Datasette · Queries took 14.352ms · About: xarray-datasette