
issue_comments: 1020190813


html_url: https://github.com/pydata/xarray/issues/6033#issuecomment-1020190813
issue_url: https://api.github.com/repos/pydata/xarray/issues/6033
id: 1020190813
node_id: IC_kwDOAMm_X848zuBd
user: 6042212
created_at: 2022-01-24T15:00:53Z
updated_at: 2022-01-24T15:00:53Z
author_association: CONTRIBUTOR
body, reactions, performed_via_github_app, issue: see below

It would be interesting to turn on s3fs logging to see the access pattern, if you are interested: `fsspec.utils.setup_logging(logger_name="s3fs")`. In particular, I am interested in whether xarray is loading chunk by chunk serially or concurrently. It would also be good to know your chunk size versus the total array size.
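For the chunk size versus total array size comparison, a rough sketch like the following may help. The shape, chunk shape, and item size are made-up illustration values, and `chunk_summary` is a hypothetical helper, not part of xarray or zarr. Sizes are uncompressed, so the bytes actually fetched from S3 will usually be smaller.

```python
import math


def chunk_summary(shape, chunks, itemsize):
    """Compare on-disk chunk size with total array size (uncompressed)."""
    # Chunks along each dimension, rounding up for partial edge chunks.
    chunks_per_dim = [math.ceil(n / c) for n, c in zip(shape, chunks)]
    return {
        # Roughly one object request per chunk when reading the whole array.
        "n_chunks": math.prod(chunks_per_dim),
        "chunk_bytes": math.prod(chunks) * itemsize,
        "total_bytes": math.prod(shape) * itemsize,
    }


# Hypothetical float32 array of shape (100, 50, 50), one chunk per first index.
summary = chunk_summary(shape=(100, 50, 50), chunks=(1, 50, 50), itemsize=4)
print(summary)
```

Many small chunks means many requests, which is where per-request latency starts to dominate and where serial versus concurrent fetching matters most.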

The dask version is interesting: `xr.open_zarr(lookup(f"{path_forecast}/surface"), chunks={})` uses dask, with the dask partition size the same as the underlying chunk size. If you find a lot of latency (small chunks), you can sometimes get an order-of-magnitude increase in download performance by specifying the chunk size along some dimension(s) to be a multiple of the on-disk size. I wouldn't normally recommend Dask just for loading the data into memory, but feel free to experiment.
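To make "a multiple of the on-disk size" concrete, here is a minimal sketch. The dimension names, on-disk chunk sizes, and the helpers `scaled_chunks` and `open_with_larger_chunks` are assumptions for illustration; only the `chunks=` argument of `xr.open_zarr` is the real API being exercised.

```python
def scaled_chunks(disk_chunks, factors):
    """Return dask chunk sizes that are integer multiples of the on-disk chunks."""
    scaled = dict(disk_chunks)
    for dim, factor in factors.items():
        if dim not in disk_chunks:
            raise KeyError(f"unknown dimension: {dim}")
        if not isinstance(factor, int) or factor < 1:
            raise ValueError("factors must be positive integers")
        # Each dask partition then covers `factor` on-disk chunks along `dim`.
        scaled[dim] = disk_chunks[dim] * factor
    return scaled


def open_with_larger_chunks(store, disk_chunks, factors):
    """Open a zarr store with dask partitions spanning several on-disk chunks."""
    import xarray as xr  # imported lazily so the helper above needs no extras

    return xr.open_zarr(store, chunks=scaled_chunks(disk_chunks, factors))


# Hypothetical layout: one time step per on-disk chunk, grouped 24 at a time.
example = scaled_chunks(
    {"time": 1, "latitude": 721, "longitude": 1440}, {"time": 24}
)
print(example)
```

Dimensions not listed in `factors` keep their on-disk chunk size, so each dask partition still aligns with whole stored chunks.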

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  1064837571