issue_comments: 409565674


Comment on pydata/xarray issue #2329 by user 12278765 (author_association: NONE), created/updated 2018-08-01T12:58:31Z.

- html_url: https://github.com/pydata/xarray/issues/2329#issuecomment-409565674
- issue_url: https://api.github.com/repos/pydata/xarray/issues/2329
- id: 409565674
- node_id: MDEyOklzc3VlQ29tbWVudDQwOTU2NTY3NA==

I ran a comparison of the impact of chunk sizes with a profiler:

```python
import xarray as xr
from pyinstrument import Profiler  # the Profiler API used below matches pyinstrument

profiler = Profiler()
for chunks in [{'time': 30}, {'lat': 30}, {'lon': 30}]:
    print(chunks)
    profiler.start()
    with xr.open_dataset(nc_path, chunks=chunks) as ds:
        print(ds.mean(dim='time').load())
    profiler.stop()
    print(profiler.output_text(unicode=True, color=True))
```

I am not sure whether the profiler results are useful:

```
{'time': 30}
<xarray.Dataset>
Dimensions:  (lat: 721, lon: 1440)
Coordinates:
  * lon      (lon) float32 0.0 0.25 0.5 0.75 1.0 1.25 1.5 1.75 2.0 2.25 2.5 ...
  * lat      (lat) float32 90.0 89.75 89.5 89.25 89.0 88.75 88.5 88.25 88.0 ...
Data variables:
    mtpr     (lat, lon) float32 8.30159e-06 8.30159e-06 8.30159e-06 ...

5652.770 compare_chunks  read_grib.py:281
└─ 5652.613 load  xarray/core/dataset.py:466
   └─ 5652.613 compute  dask/base.py:349
      └─ 5652.404 get  dask/threaded.py:33
         └─ 5652.400 get_async  dask/local.py:389
            └─ 5629.663 queue_get  dask/local.py:127
               └─ 5629.663 get  Queue.py:150
                  └─ 5629.656 wait  threading.py:309
```

When chunking along lat or lon only, I get a MemoryError.
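A quick back-of-the-envelope sketch may explain the MemoryError: a dimension that is not listed in `chunks` stays unchunked, so chunking only on lat leaves each chunk spanning the full time and lon extent. The lat/lon sizes below come from the dataset repr; the time length is a made-up number for illustration, since the comment does not state it.

```python
from math import prod

# Hypothetical dimension sizes: lat/lon are from the dataset repr above,
# the time length is an assumption for illustration only.
dims = {'time': 100_000, 'lat': 721, 'lon': 1440}
ITEMSIZE = 4  # float32

def chunk_bytes(chunks):
    """Bytes in a single chunk; dims absent from `chunks` stay whole."""
    return prod(min(chunks.get(d, n), n) for d, n in dims.items()) * ITEMSIZE

print(chunk_bytes({'time': 30}))  # 124_588_800 bytes, roughly 125 MB
print(chunk_bytes({'lat': 30}))   # 17_280_000_000 bytes, roughly 17 GB
```

With these illustrative sizes, a lat-only chunk is two orders of magnitude larger than a time-only chunk, which would plausibly exhaust memory.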

I don't know if this helps, but it would be great to have a solution or workaround for this. Surely I am not the only one working with datasets of this size? What would be the best practice in my case?
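One workaround I would try (hedged: the chunk sizes here are guesses, not a recommendation from the thread) is to pass a chunk size for every dimension, so that no single chunk spans a full dimension. The arithmetic is easy to check without dask:

```python
from math import prod, ceil

# Hypothetical sizes; the time length is invented for illustration.
dims = {'time': 100_000, 'lat': 721, 'lon': 1440}
# Chunk along every dimension instead of just one.
chunks = {'time': 30, 'lat': 181, 'lon': 360}

bytes_per_chunk = prod(chunks.values()) * 4  # float32 itemsize
n_chunks = prod(ceil(dims[d] / chunks[d]) for d in dims)
print(bytes_per_chunk)  # 7_819_200 bytes, under 8 MB per chunk
print(n_chunks)
```

Each chunk stays small regardless of which dimension the reduction runs over; whether this helps the mean-over-time workload in practice is something I have not verified.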
