issues
1 row where user = 38732257 sorted by updated_at descending
id: 2188557281
node_id: I_kwDOAMm_X86Ccrvh
number: 8842
title: Opening zarr dataset with poor connection leads to NaN chunks
user: renaudjester (38732257)
state: open
locked: 0
comments: 21
created_at: 2024-03-15T13:47:18Z
updated_at: 2024-04-28T20:05:15Z
author_association: NONE

body:

**Problem**

I am using xarray to open zarr datasets located in an S3 bucket. However, it can happen that the result doesn't contain all the chunks, and we get NaNs instead. It is usually linked to a low-bandwidth internet connection combined with a request for a lot of chunks.

**More details**

In our case (see the code below), we started tracking the HTTP calls with HTTP tracking software to understand the problem a bit better. Three cases are possible for the response when getting a chunk:
- 200: we get the chunk with the data
- 403: missing data; this is expected, as I am dealing with ocean data, so the chunks associated with the continents don't exist
- no response: there isn't even a response, so the GET request "fails" and we don't have the data
The latter is a big problem, as we end up with randomly empty chunks! As a user, this is also very annoying to detect (see the detection sketch after the scripts below). We also noticed that when using […]

**Questions**

[…]
**To reproduce**

This bug is difficult to reproduce. The only way I managed to reproduce it is with a computer connected to a phone that is on 3G; with that setup it happens all the time, while with a good connection on my computer it never happens. We have had several reports of this problem otherwise. See the two scripts: one with […]

```python
import sys
import time
import logging

import matplotlib.pyplot as plt
import xarray as xr

logging.basicConfig(
    stream=sys.stdout,
    format="%(asctime)s | %(name)14s | %(levelname)7s | %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S",
    encoding="utf-8",
    level=logging.ERROR,
)
logging.getLogger("timeloop").setLevel(logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.DEBUG)
logging.getLogger("botocore").setLevel(logging.DEBUG)
logging.getLogger("s3fs").setLevel(logging.DEBUG)
logging.getLogger("fsspec").setLevel(logging.DEBUG)
logging.getLogger("asyncio").setLevel(logging.DEBUG)
logging.getLogger("numba").setLevel(logging.ERROR)
logging.getLogger("s3transfer").setLevel(logging.DEBUG)

start_time = time.time()
print("Starting...")
data = xr.open_dataset(
    "https://s3.waw3-1.cloudferro.com/mdl-arco-geo-012/arco/"
    "GLOBAL_ANALYSISFORECAST_PHY_001_024/"
    "cmems_mod_glo_phy-thetao_anfc_0.083deg_P1D-m_202211/geoChunked.zarr",
    engine="zarr",
)
print("Dataset opened...")
bla = data.thetao.sel(
    longitude=slice(-170.037309004901026, -70.037309004901026),
    latitude=slice(-80.27257431850789, -40.27257431850789),
    time=slice("2023-03-20T00:00:00", "2023-03-20T00:00:00"),
).sel(elevation=0, method="nearest")
print("Plotting... ")
map = bla.isel(time=0).plot()
print("Saving image...")
plt.savefig("./bla_fast.png")
print("Total processing time:", (time.time() - start_time))
```
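The "no response" case reads like a transport failure that is never retried. As one possible mitigation, here is a minimal sketch that opens the same store through s3fs with an explicit botocore retry policy. The endpoint and bucket path are taken from the URL above, but anonymous access, reachability over the S3 protocol, and the retry settings are assumptions, not something the report confirms:

```python
import s3fs
import xarray as xr

# Assumption: the store is also reachable over the S3 protocol with
# anonymous access. botocore's adaptive retry mode re-issues failed
# requests up to `max_attempts` times.
fs = s3fs.S3FileSystem(
    anon=True,
    client_kwargs={"endpoint_url": "https://s3.waw3-1.cloudferro.com"},
    config_kwargs={"retries": {"max_attempts": 10, "mode": "adaptive"}},
)
store = fs.get_mapper(
    "mdl-arco-geo-012/arco/GLOBAL_ANALYSISFORECAST_PHY_001_024/"
    "cmems_mod_glo_phy-thetao_anfc_0.083deg_P1D-m_202211/geoChunked.zarr"
)
data = xr.open_dataset(store, engine="zarr")
```

Whether such retries actually cover the silent failures seen here is exactly the open question of this report.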
```python
import sys
import time
import logging

import dask

logging.basicConfig(
    stream=sys.stdout,
    format="%(asctime)s | %(name)14s | %(levelname)7s | %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S",
    encoding="utf-8",
    level=logging.ERROR,
)
logging.getLogger("timeloop").setLevel(logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.DEBUG)
logging.getLogger("botocore").setLevel(logging.DEBUG)
logging.getLogger("s3fs").setLevel(logging.DEBUG)
logging.getLogger("fsspec").setLevel(logging.DEBUG)
logging.getLogger("asyncio").setLevel(logging.DEBUG)
logging.getLogger("numba").setLevel(logging.ERROR)
logging.getLogger("s3transfer").setLevel(logging.DEBUG)

start_time = time.time()
with dask.config.set(num_workers=2):
    ...  # rest of the script truncated in the export

print("Total processing time:", (time.time() - start_time))
```

**Expected result**

[…] or failed run

**Obtained result**

[…]
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8842/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue |
```sql
CREATE TABLE [issues] (
    [id] INTEGER PRIMARY KEY,
    [node_id] TEXT,
    [number] INTEGER,
    [title] TEXT,
    [user] INTEGER REFERENCES [users]([id]),
    [state] TEXT,
    [locked] INTEGER,
    [assignee] INTEGER REFERENCES [users]([id]),
    [milestone] INTEGER REFERENCES [milestones]([id]),
    [comments] INTEGER,
    [created_at] TEXT,
    [updated_at] TEXT,
    [closed_at] TEXT,
    [author_association] TEXT,
    [active_lock_reason] TEXT,
    [draft] INTEGER,
    [pull_request] TEXT,
    [body] TEXT,
    [reactions] TEXT,
    [performed_via_github_app] TEXT,
    [state_reason] TEXT,
    [repo] INTEGER REFERENCES [repos]([id]),
    [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
```
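For reference, the row above corresponds to a query along these lines against that schema (a sketch of the filter described at the top of the page, not necessarily the exact SQL the page ran):

```sql
SELECT id, number, title, state, comments, created_at, updated_at
FROM issues
WHERE [user] = 38732257
ORDER BY updated_at DESC;
```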