issue_comments

7 rows where author_association = "MEMBER" and issue = 576337745 sorted by updated_at descending

Users: rabernat (4) · max-sixty (2) · dcherian (1)
Issue: Errors using to_zarr for an s3 store (7)
Author association: MEMBER (7)

dcherian (MEMBER) · 2020-03-06T12:29:21Z · https://github.com/pydata/xarray/issues/3831#issuecomment-595747237

> One idea I have thought about is an "integration bot"

I think this should live under pangeo/stack-integration-tests (or similar) and run CI nightly with both git-master and stable-release versions of xarray / dask / zarr / gcsfs etc.
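A minimal sketch of the kind of smoke test such a nightly suite might run, using fsspec's in-memory filesystem so it needs no cloud credentials (the test name and memory:// path are illustrative):

```python
# Hypothetical nightly smoke test: round-trip a Dataset through
# xarray -> zarr -> fsspec and check nothing was lost.
import fsspec
import numpy as np
import xarray as xr

def test_to_zarr_roundtrip_over_fsspec():
    ds = xr.Dataset({"t": (("x",), np.arange(10.0))})
    store = fsspec.get_mapper("memory://integration/test.zarr")
    ds.to_zarr(store, mode="w")
    result = xr.open_zarr(store).load()
    xr.testing.assert_identical(ds, result)
```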

Reactions: +1 (1)

rabernat (MEMBER) · 2020-03-06T11:44:02Z · https://github.com/pydata/xarray/issues/3831#issuecomment-595732104

> but any thoughts on the best way for users & us to identify the appropriate library to direct issues to?

This is basically an integration problem. What we are lacking is a comprehensive set of integration tests for this ecosystem (xarray + dask + zarr + fsspec and all its implementations). Pangeo has served as a de facto point for this discussion, since we are using the whole stack. Some similar issues there are:

  • https://github.com/pangeo-data/pangeo/issues/767 (xarray + opendap)
  • https://github.com/pangeo-data/pangeo/issues/765 (xarray + zarr + s3fs)
  • https://github.com/pangeo-data/pangeo/issues/741 (xarray + zarr + gcsfs)
  • https://github.com/pangeo-data/pangeo/issues/691
  • etc.

All of these libraries understandably want to push the issues somewhere else, since they tend to be complex and hard to reduce to an MCVE. But there are fundamental issues related to integration that have to be addressed somewhere.

> Is it just the last item in the call stack?

Yes and no. The details of how xarray talks to these stores may matter. Continuing our planned refactor of the backend classes to use entry points, and formalizing the interface for backends, should help surface problems. The way we do consolidated metadata, for example, is pretty ad hoc:

https://github.com/pydata/xarray/blob/69723ebf34cb9c37917b44b2ac1ab92ae553fecc/xarray/backends/zarr.py#L451-L455
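For reference, a minimal sketch of the consolidated-metadata round trip under discussion, using a local path for simplicity:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"air": (("x",), np.arange(4.0))})
# consolidated=True writes a single .zmetadata key summarizing all group
# and array metadata, so opening the store needs one read instead of many
# small ones (which matters most on object stores like S3/GCS).
ds.to_zarr("example.zarr", mode="w", consolidated=True)
ds2 = xr.open_zarr("example.zarr", consolidated=True)
```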

> Does xarray need to build diagnostics / assertions to highlight where the problem is? https://github.com/pangeo-data/pangeo/issues/691

Better diagnostics and more informative errors are always good. But we don't want to do this piecemeal; I think it should be part of the backend refactor. When is that happening, btw? 🙃

We can't hope to test every possible permutation of xarray / zarr store / fsspec implementation within xarray itself. One idea I have thought about is an "integration bot" that watches all of these libraries and runs its own integration tests. This bot could even be configured to watch the repos and comment on PRs. That would be a cool project!
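As a rough illustration of the reporting half of that idea (the repo name, issue number, and token handling are all hypothetical), such a bot could post its results through the GitHub REST API:

```python
import os
import requests

def post_report(repo: str, issue_number: int, body: str) -> None:
    """Leave a comment on a PR/issue via the GitHub REST API."""
    url = f"https://api.github.com/repos/{repo}/issues/{issue_number}/comments"
    resp = requests.post(
        url,
        json={"body": body},
        headers={"Authorization": f"token {os.environ['GITHUB_TOKEN']}"},
    )
    resp.raise_for_status()

# Hypothetical usage after a failed nightly run:
post_report("pangeo-data/stack-integration-tests", 1,
            "Nightly run failed: to_zarr round-trip broke against zarr master.")
```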

Would appreciate @jhamman's thoughts here.

Reactions: +1 (1)

max-sixty (MEMBER) · 2020-03-05T19:25:40Z · https://github.com/pydata/xarray/issues/3831#issuecomment-595403356

Not to hijack this specific issue for the general case, but any thoughts on the best way for users & us to identify the appropriate library to direct issues to? Is it just the last item in the call stack? Does xarray need to build diagnostics / assertions to highlight where the problem is?

A quick survey of the first two pages of xarray issues yields a bunch that receive no response from us, and those that do often involve a decent amount of back & forth:

  • https://github.com/pydata/xarray/issues/3815 (zarr?)
  • https://github.com/pydata/xarray/issues/3781 (scipy? dask?)
  • https://github.com/pydata/xarray/issues/3776 (probably xarray, maybe netcdf)
  • https://github.com/pydata/xarray/issues/3767 (scipy? netcdf? this did get a response, from @dcherian)
  • https://github.com/pydata/xarray/issues/3754 (pydap; @dcherian worked through this one with some back and forth)
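One cheap version of such a diagnostic, sketched on the assumption that the innermost traceback frame is a usable proxy for "the last item in the call stack" (the function name and package list are illustrative):

```python
import traceback

KNOWN_PACKAGES = ("xarray", "zarr", "s3fs", "gcsfs", "fsspec", "dask", "scipy", "pydap")

def blame_package(exc: BaseException) -> str:
    """Guess which package an exception came from via its traceback frames."""
    frames = traceback.extract_tb(exc.__traceback__)
    for frame in reversed(frames):  # innermost frame first
        path = frame.filename.replace("\\", "/")
        for pkg in KNOWN_PACKAGES:
            if f"/{pkg}/" in path:
                return pkg
    return "unknown"
```

A triage helper could then print blame_package(err) before asking users to refile an issue elsewhere.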

Reactions: none

rabernat (MEMBER) · 2020-03-05T18:41:32Z · https://github.com/pydata/xarray/issues/3831#issuecomment-595383819

> I had tried to delete all of the uploaded directory structure in between attempts, to give it that same effect of a "fresh path".

Key question: did you restart your kernel or call s3.invalidate_cache() in between attempts as well? If not, it again points to a caching problem.

The goal here is to drill down into the stack and find the point where s3fs is failing to update its cache correctly.
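A hedged sketch of what that experiment might look like (the bucket and path are illustrative):

```python
import numpy as np
import s3fs
import xarray as xr

ds = xr.Dataset({"t": (("x",), np.arange(10.0))})
fs = s3fs.S3FileSystem()
store = s3fs.S3Map("my-bucket/test.zarr", s3=fs)  # hypothetical bucket

ds.to_zarr(store, mode="w")  # first attempt populates fs's listing cache
fs.invalidate_cache()        # what restarting the kernel would also achieve
ds.to_zarr(store, mode="w")  # retry against a fresh view of the bucket
```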

Reactions: none

rabernat (MEMBER) · 2020-03-05T18:27:53Z · https://github.com/pydata/xarray/issues/3831#issuecomment-595377795

In that case, I'm fairly certain it is https://github.com/dask/s3fs/issues/285.

There is a bug in s3fs where it caches the directory listing and then doesn't update it, even if you delete files. This would potentially cause problems when trying to overwrite, since s3fs would think the objects are still there even after they have been deleted.

The same bug means consolidated metadata usually doesn't work. Perhaps @martindurant can weigh in.
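From the user's side, the stale-listing behaviour described above would look roughly like this (bucket name illustrative):

```python
import s3fs

fs = s3fs.S3FileSystem()
fs.ls("my-bucket/data.zarr")                 # populates the listing cache
fs.rm("my-bucket/data.zarr", recursive=True)
# With the bug above, this can still report the deleted keys, because the
# answer is served from the cached listing rather than from S3:
print(fs.ls("my-bucket/data.zarr"))
fs.invalidate_cache("my-bucket/data.zarr")   # workaround: drop that cache entry
```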

Reactions: none

rabernat (MEMBER) · 2020-03-05T15:47:12Z · https://github.com/pydata/xarray/issues/3831#issuecomment-595298921

These are tricky issues because they involve the integration of at least three libraries (xarray, zarr, and s3fs, plus possibly dask as well).

Are you using dask?

There could be some issues with s3fs caching (see https://github.com/dask/s3fs/issues/285). If you start fresh on a new path with nothing in it (so you don't need mode='w'), does it work?
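That "fresh path" experiment might look like the following sketch (bucket illustrative; the uuid suffix just guarantees an unused prefix):

```python
import uuid
import numpy as np
import s3fs
import xarray as xr

ds = xr.Dataset({"t": (("x",), np.arange(10.0))})
fs = s3fs.S3FileSystem()
store = s3fs.S3Map(f"my-bucket/fresh-{uuid.uuid4().hex}.zarr", s3=fs)
ds.to_zarr(store)  # no mode='w' needed: nothing exists at this path yet
```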

Reactions: none

max-sixty (MEMBER) · 2020-03-05T15:37:18Z · https://github.com/pydata/xarray/issues/3831#issuecomment-595293828

Thanks for the issue @LewisJarrod

@jhamman (or @rabernat?) -- what's the best way of identifying whether this is an xarray or zarr issue? There are a few similar issues in the backlog, and they often go unanswered. To the extent we can help people work out where the issue lies and push it to the relevant library, that would keep things moving. Thank you

Reactions: none

Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);