html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2093#issuecomment-385451876,https://api.github.com/repos/pydata/xarray/issues/2093,385451876,MDEyOklzc3VlQ29tbWVudDM4NTQ1MTg3Ng==,1197350,2018-04-30T16:26:22Z,2018-04-30T16:26:22Z,MEMBER,"There is precedent for auto-aligning dask chunks with the underlying
dataset chunks. This is what we do with the `auto_chunk` argument in
`open_zarr`:
http://xarray.pydata.org/en/latest/generated/xarray.open_zarr.html#xarray.open_zarr
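The manual workflow described below can be sketched as a small helper that turns rasterio's reported block shape into a chunks mapping for open_rasterio. This is only a sketch: the helper name and `path` are placeholders, and the dict keys assume open_rasterio's usual band/y/x dimensions.

```python
# Rough sketch of the manual workflow: read the GeoTIFF's internal
# tiling with rasterio and hand it to open_rasterio via chunks=.
# The helper name and `path` are placeholders, not an xarray API.

def blocks_to_chunks(block_shape, band_count=1):
    '''Map a rasterio block shape (rows, cols) onto a chunks
    mapping keyed by open_rasterio dimension names.'''
    rows, cols = block_shape
    return {'band': band_count, 'y': rows, 'x': cols}

# Usage (requires rasterio and xarray installed):
# import rasterio
# import xarray as xr
# with rasterio.open(path) as src:
#     chunks = blocks_to_chunks(src.block_shapes[0], src.count)
# da = xr.open_rasterio(path, chunks=chunks)
```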
On Mon, Apr 30, 2018 at 12:21 PM, Matthew Rocklin wrote:
> Given a tiled GeoTIFF image I'm looking for the best practice in reading
> it as a chunked dataset. I did this in this notebook by first opening the
> file with rasterio, looking at the block sizes, and then using those to
> inform the argument to chunks= in xarray.open_rasterio. This works, but
> is somewhat cumbersome because I also had to dive into the rasterio API.
> Do we want to provide defaults here?
>
> In dask.array, every time this has come up we've always shot it down;
> automatic chunking is error-prone and hard to do well. However, in these
> cases the object we're being given usually also conveys its chunking in a
> way that matches how dask.array thinks about it, so the extra cognitive
> load on the user has been somewhat low. Rasterio's model and API feel
> much more foreign to me, though, than a project like NetCDF or h5py.
> I find myself wanting a chunks=True or chunks='100MB' option.
>
> Thoughts on this? Is this in-scope? If so then what is the right API and
> what is the right policy for how to make xarray/dask.array chunks larger
> than GeoTIFF chunks?
>
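On the last question, one simple policy would be to grow dask chunks by integer multiples of the native block shape until they approach a byte target. A minimal sketch of that idea, with the function name and the 100MB default as assumptions rather than any existing xarray API:

```python
# Illustrative policy for making xarray/dask chunks larger than the
# GeoTIFF's internal blocks: scale both block dimensions by the same
# integer factor (so chunk area grows quadratically) while staying at
# or under a byte budget.

def grow_chunks(block_shape, itemsize, target_bytes=100_000_000):
    '''Return (rows, cols) that are integer multiples of the native
    block shape, with rows * cols * itemsize <= target_bytes.'''
    rows, cols = block_shape
    factor = 1
    while rows * cols * itemsize * (factor + 1) ** 2 <= target_bytes:
        factor += 1
    return (rows * factor, cols * factor)
```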
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,318950038