html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/3831#issuecomment-605222008,https://api.github.com/repos/pydata/xarray/issues/3831,605222008,MDEyOklzc3VlQ29tbWVudDYwNTIyMjAwOA==,6042212,2020-03-27T19:11:59Z,2020-03-27T19:11:59Z,CONTRIBUTOR,"Note that s3fs and gcsfs now expose the kwargs `skip_instance_cache` `use_listings_cache`, `listings_expiry_time`, and `max_paths` and pass them to `fsspec`. See https://filesystem-spec.readthedocs.io/en/latest/features.html#instance-caching and https://filesystem-spec.readthedocs.io/en/latest/features.html#listings-caching (although the new releases for both already include the change that accessing a file, contents or metadata, does *not* require a directory listing, which is the right thing for zarr, where the full paths are known)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,576337745 https://github.com/pydata/xarray/issues/3831#issuecomment-605179227,https://api.github.com/repos/pydata/xarray/issues/3831,605179227,MDEyOklzc3VlQ29tbWVudDYwNTE3OTIyNw==,703554,2020-03-27T18:10:05Z,2020-03-27T18:10:05Z,CONTRIBUTOR,"Just to say having some kind of stack integration tests is a marvellous idea. Another example of an issue that's very hard to pin down is https://github.com/zarr-developers/zarr-python/issues/528. Btw we have also run into issues with fsspec caching directory listings and not invalidating the cache when store changes are made, although I haven't checked with latest master. We have a lot of workarounds in our code where we reopen everything after we've made changes to a store. 
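To illustrate what instance caching (the behaviour `skip_instance_cache` controls) means in practice, here is a minimal pure-Python sketch — a toy `FakeFS` class, not fsspec's actual implementation: constructing a filesystem twice with identical arguments returns the same cached object, so any stale internal state survives what looks like a fresh instantiation.

```python
# Toy sketch of instance caching (not fsspec's real code): calling the
# constructor twice with identical arguments returns the same cached object.
_instances = {}

class FakeFS:
    def __new__(cls, *args, skip_instance_cache=False):
        token = (cls, args)
        if skip_instance_cache:
            return super().__new__(cls)            # always a fresh object
        if token not in _instances:
            _instances[token] = super().__new__(cls)
        return _instances[token]                   # reuse the cached instance

a = FakeFS('my-bucket')
b = FakeFS('my-bucket')                            # same cached object as a
c = FakeFS('my-bucket', skip_instance_cache=True)  # genuinely fresh object
```

This is why workarounds that merely re-create the filesystem object may not help: the cached instance (and its listings cache) comes back unless instance caching is skipped or the cache is invalidated.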
Probably an area where some more digging and careful testing may be needed.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,576337745 https://github.com/pydata/xarray/issues/3831#issuecomment-595747237,https://api.github.com/repos/pydata/xarray/issues/3831,595747237,MDEyOklzc3VlQ29tbWVudDU5NTc0NzIzNw==,2448579,2020-03-06T12:29:21Z,2020-03-06T12:29:21Z,MEMBER,"> One idea I have thought about is an ""integration bot"" I think this should be under `pangeo/stack-integration-tests` (or similar) and run CI nightly with git-master versions and stable release versions of xarray / dask / zarr / gcsfs etc.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,576337745 https://github.com/pydata/xarray/issues/3831#issuecomment-595732104,https://api.github.com/repos/pydata/xarray/issues/3831,595732104,MDEyOklzc3VlQ29tbWVudDU5NTczMjEwNA==,1197350,2020-03-06T11:44:02Z,2020-03-06T11:44:02Z,MEMBER,"> but any thoughts on the best way for users & us to identify the appropriate library for users to direct their issues? This is basically an integration problem. What we are lacking is a comprehensive set of integration tests for this ecosystem (xarray + dask + zarr + fsspec and all its implementations). Pangeo has served as a de facto point for this discussion, since we are using the whole stack. Some similar issues there are: - https://github.com/pangeo-data/pangeo/issues/767 (xarray + opendap) - https://github.com/pangeo-data/pangeo/issues/765 (xarray + zarr + s3fs) - https://github.com/pangeo-data/pangeo/issues/741 (xarray + zarr + gcsfs) - https://github.com/pangeo-data/pangeo/issues/691 etc... All of these libraries understandably want to push the issues somewhere else, since they tend to be complex and hard to reduce to an MCVE. 
But there are fundamental issues related to integration that have to be addressed somewhere. > Is it just the last item in the call stack? Yes and no. The details of how xarray is talking to these stores may matter. Continuing our planned refactor of the backend classes to use entry points, and formalizing the interface for backends, should help surface problems. The way we do consolidated metadata, for example, is pretty ad hoc: https://github.com/pydata/xarray/blob/69723ebf34cb9c37917b44b2ac1ab92ae553fecc/xarray/backends/zarr.py#L451-L455 > Does xarray need to build diagnostics / assertions for highlighting where the problem is? https://github.com/pangeo-data/pangeo/issues/691 Better diagnostics and more informative errors are always good. But we don't want to do this randomly. I think it should be part of the backend refactor. When is that happening btw? 🙃 We can't hope to test every possible permutation of xarray / zarr store / fsspec implementation within xarray. One idea I have thought about is an ""integration bot"", who watches all of these libraries and runs its own integration tests. This bot could even be configured to watch the repos and comment on PRs. That would be a cool project! Would appreciate @jhamman's thoughts here.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,576337745 https://github.com/pydata/xarray/issues/3831#issuecomment-595403356,https://api.github.com/repos/pydata/xarray/issues/3831,595403356,MDEyOklzc3VlQ29tbWVudDU5NTQwMzM1Ng==,5635139,2020-03-05T19:25:40Z,2020-03-05T19:25:40Z,MEMBER,"Not to hijack this specific issue for the general case, but any thoughts on the best way for users & us to identify the appropriate library for users to direct their issues? Is it just the last item in the call stack? Does xarray need to build diagnostics / assertions for highlighting where the problem is? 
A quick survey of the first two pages of xarray issues yields a bunch of issues which receive no response from us, and those that do often involve a decent amount of back & forth: https://github.com/pydata/xarray/issues/3815 (zarr?) https://github.com/pydata/xarray/issues/3781 (scipy? dask?) https://github.com/pydata/xarray/issues/3776 (probably xarray, maybe netcdf) https://github.com/pydata/xarray/issues/3767 (scipy? netcdf? this did get a response, from @dcherian ) https://github.com/pydata/xarray/issues/3754 (pydap. @dcherian worked through this one with some back and forth)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,576337745 https://github.com/pydata/xarray/issues/3831#issuecomment-595398641,https://api.github.com/repos/pydata/xarray/issues/3831,595398641,MDEyOklzc3VlQ29tbWVudDU5NTM5ODY0MQ==,15351025,2020-03-05T19:15:28Z,2020-03-05T19:15:28Z,NONE,I never called `s3.invalidate_cache()` in between but I was restarting my kernel fairly regularly although not intentionally as a way to try to clear the cache. ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,576337745
The goal here is to drill down into the stack and find the point where s3fs is failing to update its cache correctly.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,576337745 https://github.com/pydata/xarray/issues/3831#issuecomment-595379998,https://api.github.com/repos/pydata/xarray/issues/3831,595379998,MDEyOklzc3VlQ29tbWVudDU5NTM3OTk5OA==,6042212,2020-03-05T18:32:38Z,2020-03-05T18:32:38Z,CONTRIBUTOR,"https://github.com/intake/filesystem_spec/pull/243 is where my attempt to fix this kind of thing will live. However, writing or deleting keys should invalidate the appropriate part of the cache as it currently stands, so I don't know why the problem has arisen. If it is a cache problem, then `s3.invalidate_cache()` can always be called.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,576337745 https://github.com/pydata/xarray/issues/3831#issuecomment-595377795,https://api.github.com/repos/pydata/xarray/issues/3831,595377795,MDEyOklzc3VlQ29tbWVudDU5NTM3Nzc5NQ==,1197350,2020-03-05T18:27:53Z,2020-03-05T18:27:53Z,MEMBER,"In that case, I'm fairly certain it is https://github.com/dask/s3fs/issues/285. There is a bug in s3fs where it caches the directory listing and then doesn't update it again, even if you delete files. This would potentially cause problems when trying to overwrite, since s3fs would think the objects are already there, even if they are deleted. The same bug means consolidated metadata usually doesn't work. 
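For illustration, a minimal pure-Python sketch of that failure mode — a toy store with made-up names (`ToyStore`, the `group/.zmetadata` key), not s3fs itself: deleting a key does not invalidate the cached directory listing, so a later listing still reports the deleted object.

```python
# Toy store (not s3fs) reproducing the stale-listing failure mode:
# delete() forgets to invalidate the cached directory listing.
class ToyStore:
    def __init__(self):
        self.objects = {}
        self._listing = None          # cached directory listing

    def put(self, key, value):
        self.objects[key] = value
        self._listing = None          # writes invalidate the cache

    def delete(self, key):
        del self.objects[key]         # bug: cache left stale

    def ls(self):
        if self._listing is None:
            self._listing = sorted(self.objects)
        return self._listing

store = ToyStore()
store.put('group/.zmetadata', b'{}')
listing_before = store.ls()           # populates the cache
store.delete('group/.zmetadata')
listing_after = store.ls()            # stale: still lists the deleted key
```

In a sketch like this, an overwrite that first checks `ls()` would conclude the object still exists even though it was deleted — the same shape as the overwrite and consolidated-metadata symptoms described above.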
Perhaps @martindurant can weigh in.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,576337745 https://github.com/pydata/xarray/issues/3831#issuecomment-595376692,https://api.github.com/repos/pydata/xarray/issues/3831,595376692,MDEyOklzc3VlQ29tbWVudDU5NTM3NjY5Mg==,15351025,2020-03-05T18:25:18Z,2020-03-05T18:25:18Z,NONE,"Thanks for the responses. I have tried this both with and without dask so far. Taking out mode='w' did just go through successfully which I'm surprised by. I had tried to delete all of the uploaded directory structure before in between attempts to give it that same effect of a ""fresh path"". ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,576337745 https://github.com/pydata/xarray/issues/3831#issuecomment-595298921,https://api.github.com/repos/pydata/xarray/issues/3831,595298921,MDEyOklzc3VlQ29tbWVudDU5NTI5ODkyMQ==,1197350,2020-03-05T15:47:12Z,2020-03-05T15:47:12Z,MEMBER,"These are tricky issues because they involve the integration of at least three libraries (xarray, zarr, s3fs, and possibly dask as well). Are you using dask? There could be some issues with s3fs caching (see https://github.com/dask/s3fs/issues/285). If you start fresh on a new path with nothing in it (so you don't need `mode='w'`), does it work?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,576337745 https://github.com/pydata/xarray/issues/3831#issuecomment-595293828,https://api.github.com/repos/pydata/xarray/issues/3831,595293828,MDEyOklzc3VlQ29tbWVudDU5NTI5MzgyOA==,5635139,2020-03-05T15:37:18Z,2020-03-05T15:37:18Z,MEMBER,"Thanks for the issue @LewisJarrod @jhamman (or @rabernat ?) -- what's the best way of identifying whether this is an xarray or zarr issue? 
There are a few similar issues in the backlog, and they often go unanswered. To the extent we can help people split out where the issue is and push it to the relevant library, that would keep things moving. Thank you","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,576337745