Comment by user 1197350 (MEMBER), 2018-04-30T16:26:22Z — https://github.com/pydata/xarray/issues/2093#issuecomment-385451876

There is precedent for auto-aligning dask chunks with the underlying dataset chunks. This is what we do with the `auto_chunk` argument in `open_zarr`: http://xarray.pydata.org/en/latest/generated/xarray.open_zarr.html#xarray.open_zarr

On Mon, Apr 30, 2018 at 12:21 PM, Matthew Rocklin wrote:

> Given a tiled GeoTIFF image, I'm looking for the best practice for reading
> it as a chunked dataset. I did this in this notebook by first opening the
> file with rasterio, looking at the block sizes, and then using those to
> inform the `chunks=` argument in `xarray.open_rasterio`. This works, but it
> is somewhat cumbersome because I also had to dive into the rasterio API.
> Do we want to provide defaults here?
>
> In dask.array, every time this has come up we've shot it down: automatic
> chunking is error prone and hard to do well. However, in those cases the
> object we're given usually also conveys its chunking in a way that matches
> how dask.array thinks about it, so the extra cognitive load on the user
> has been somewhat low. Rasterio's model and API feel much more foreign to
> me than a project like NetCDF or H5Py. I find myself wanting a
> `chunks=True` or `chunks='100MB'` option.
>
> Thoughts on this? Is this in scope? If so, what is the right API, and what
> is the right policy for making xarray/dask.array chunks larger than
> GeoTIFF chunks?
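
For reference, a minimal sketch of the `open_zarr` precedent mentioned above, assuming a local Zarr store at a hypothetical path (`auto_chunk` was the spelling at the time of this comment; later xarray versions express the same idea as `chunks='auto'`):

```python
import xarray as xr

# Hypothetical local Zarr store; the path is illustrative.
# With auto_chunk=True (the default), each Zarr chunk becomes one dask
# chunk, so the dask graph is aligned with the on-disk chunking.
ds = xr.open_zarr("example.zarr", auto_chunk=True)
```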
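
And a hedged sketch of the workflow Matthew describes, assuming a tiled GeoTIFF at a hypothetical path: read the file's internal block (tile) shape with rasterio, then pass it as the `chunks=` argument to `xarray.open_rasterio`:

```python
import rasterio
import xarray as xr

path = "example.tif"  # hypothetical tiled GeoTIFF

# Inspect the file's internal tiling; block_shapes is a list of
# (rows, cols) tuples, one per band.
with rasterio.open(path) as src:
    block_rows, block_cols = src.block_shapes[0]

# Align dask chunks with the GeoTIFF tiles (here one tile per chunk).
# Using a small multiple of the block shape would give larger chunks while
# staying tile-aligned, which is one possible answer to the "larger than
# GeoTIFF chunks" policy question raised above.
da = xr.open_rasterio(path, chunks={"band": 1, "y": block_rows, "x": block_cols})
```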