issue_comments

10 rows where issue = 318950038 sorted by updated_at descending

user 7

  • ebo 4
  • mrocklin 1
  • rabernat 1
  • jhamman 1
  • fmaussion 1
  • Zac-HD 1
  • stale[bot] 1

author_association 3

  • NONE 5
  • MEMBER 4
  • CONTRIBUTOR 1

issue 1

  • Default chunking in GeoTIFF images · 10
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at, author_association, body, reactions, performed_via_github_app, issue
630508073 https://github.com/pydata/xarray/issues/2093#issuecomment-630508073 https://api.github.com/repos/pydata/xarray/issues/2093 MDEyOklzc3VlQ29tbWVudDYzMDUwODA3Mw== stale[bot] 26384082 2020-05-19T00:41:31Z 2020-05-19T00:41:31Z NONE

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Default chunking in GeoTIFF images 318950038
398214607 https://github.com/pydata/xarray/issues/2093#issuecomment-398214607 https://api.github.com/repos/pydata/xarray/issues/2093 MDEyOklzc3VlQ29tbWVudDM5ODIxNDYwNw== ebo 601025 2018-06-18T22:24:18Z 2018-06-18T22:24:18Z NONE

On Jun 18 2018 4:03 PM, Fabien Maussion wrote:

Has a default GeoTIFF chunk been implemented?

No, unfortunately.

OK. Maybe the overall chunking issue has been sorted out; I will try to look into this and see what is currently working in relation to this issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Default chunking in GeoTIFF images 318950038
398210102 https://github.com/pydata/xarray/issues/2093#issuecomment-398210102 https://api.github.com/repos/pydata/xarray/issues/2093 MDEyOklzc3VlQ29tbWVudDM5ODIxMDEwMg== fmaussion 10050469 2018-06-18T22:03:28Z 2018-06-18T22:03:28Z MEMBER

Has a default GeoTIFF chunk been implemented?

No, unfortunately.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Default chunking in GeoTIFF images 318950038
398183774 https://github.com/pydata/xarray/issues/2093#issuecomment-398183774 https://api.github.com/repos/pydata/xarray/issues/2093 MDEyOklzc3VlQ29tbWVudDM5ODE4Mzc3NA== ebo 601025 2018-06-18T20:24:53Z 2018-06-18T20:24:53Z NONE

One of the issues related to this has been closed. Has a default GeoTIFF chunk been implemented?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Default chunking in GeoTIFF images 318950038
387058496 https://github.com/pydata/xarray/issues/2093#issuecomment-387058496 https://api.github.com/repos/pydata/xarray/issues/2093 MDEyOklzc3VlQ29tbWVudDM4NzA1ODQ5Ng== ebo 601025 2018-05-07T13:07:16Z 2018-05-07T13:07:16Z NONE

That would definitely work for me.

On May 7 2018 6:43 AM, Zac Hatfield-Dodds wrote:

With the benefit of almost a year's worth of procrastination, I think the best approach is to take the heuristics from #1440, but only support chunks=True - if a decent default heuristic isn't good enough, the user can specify exact chunks.

The underlying logic for this issue would be identical to that of #1440, so supporting both is "just" a matter of plumbing it in correctly.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Default chunking in GeoTIFF images 318950038
387052622 https://github.com/pydata/xarray/issues/2093#issuecomment-387052622 https://api.github.com/repos/pydata/xarray/issues/2093 MDEyOklzc3VlQ29tbWVudDM4NzA1MjYyMg== Zac-HD 12229877 2018-05-07T12:43:41Z 2018-05-07T12:43:41Z CONTRIBUTOR

With the benefit of almost a year's worth of procrastination, I think the best approach is to take the heuristics from #1440, but only support chunks=True - if a decent default heuristic isn't good enough, the user can specify exact chunks.

The underlying logic for this issue would be identical to that of #1440, so supporting both is "just" a matter of plumbing it in correctly.
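As an illustration only (not the actual heuristic proposed in #1440), a chunks=True default could work by growing the file's native block shape by an integer multiple until a chunk reaches a rough size target:

import math

def expand_blocks(block_shape, dtype_size, target_bytes=100e6):
    """Illustrative only: grow a native (y, x) block shape by an integer
    multiple in each direction until one chunk is roughly target_bytes."""
    y, x = block_shape
    factor = max(1, int(math.sqrt(target_bytes / (y * x * dtype_size))))
    return (y * factor, x * factor)

# 256x256 float32 tiles grown toward a ~100 MB target
print(expand_blocks((256, 256), dtype_size=4))  # (4864, 4864), about 95 MB per chunk

A user who needs exact control would still pass explicit chunks, as proposed above.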

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Default chunking in GeoTIFF images 318950038
385463527 https://github.com/pydata/xarray/issues/2093#issuecomment-385463527 https://api.github.com/repos/pydata/xarray/issues/2093 MDEyOklzc3VlQ29tbWVudDM4NTQ2MzUyNw== ebo 601025 2018-04-30T17:07:43Z 2018-04-30T17:07:43Z NONE

Most standard internal chunking (or what I believe is called 'tiling' by the GIS community) is 256x256 (see: http://www.gdal.org/frmt_gtiff.html, TILED=YES with BLOCKXSIZE=n and BLOCKYSIZE=n). This is used when viewing images within a given region of interest or window. You can really tell the difference in speed between tiled and striped images (the latter have a block size of 1x<width>).

@mrocklin, I agree that we might want to aggregate some number of them, but we would need to get some automation in place up front and sort out how we want to determine the expansion. Adding to the #1440 discussion mentioned above, there will likely be an advantage in increasing the block sizes in particular directions.
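For reference, a short sketch (the file path is hypothetical) of how the tiling described above can be inspected with rasterio:

import rasterio

# "example.tif" is a hypothetical file path
with rasterio.open("example.tif") as src:
    print(src.block_shapes)          # per-band (height, width) blocks, e.g. [(256, 256)]
    print(src.profile.get("tiled"))  # True for files written with TILED=YES
    # A striped file typically reports block shapes like (1, src.width)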

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Default chunking in GeoTIFF images 318950038
385453205 https://github.com/pydata/xarray/issues/2093#issuecomment-385453205 https://api.github.com/repos/pydata/xarray/issues/2093 MDEyOklzc3VlQ29tbWVudDM4NTQ1MzIwNQ== jhamman 2443309 2018-04-30T16:31:19Z 2018-04-30T16:31:19Z MEMBER

#1440 is related but more focused on netCDF.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Default chunking in GeoTIFF images 318950038
385452187 https://github.com/pydata/xarray/issues/2093#issuecomment-385452187 https://api.github.com/repos/pydata/xarray/issues/2093 MDEyOklzc3VlQ29tbWVudDM4NTQ1MjE4Nw== mrocklin 306380 2018-04-30T16:27:37Z 2018-04-30T16:27:37Z MEMBER

My guess is that GeoTIFF chunks will be much smaller than is ideal for dask.array. We might want to expand those chunk sizes by some multiple.
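To put rough numbers on that (a back-of-the-envelope illustration assuming 256x256 blocks of float64 data):

import numpy as np

# One native 256x256 GeoTIFF block of float64 data
block_bytes = 256 * 256 * np.dtype("float64").itemsize
print(block_bytes / 1e6)          # ~0.5 MB, small for a dask.array chunk

# Expanding by 16x in each direction gives a much more dask-friendly chunk
print((256 * 16) ** 2 * 8 / 1e6)  # ~134 MB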

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Default chunking in GeoTIFF images 318950038
385451876 https://github.com/pydata/xarray/issues/2093#issuecomment-385451876 https://api.github.com/repos/pydata/xarray/issues/2093 MDEyOklzc3VlQ29tbWVudDM4NTQ1MTg3Ng== rabernat 1197350 2018-04-30T16:26:22Z 2018-04-30T16:26:22Z MEMBER

There is precedent for auto-aligning dask chunks with the underlying dataset chunks. This is what we do with the auto_chunk argument in open_zarr: http://xarray.pydata.org/en/latest/generated/xarray.open_zarr.html#xarray.open_zarr
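A minimal sketch of that behaviour (the store path is hypothetical; auto_chunk was the keyword at the time and was later superseded by the chunks= argument):

import xarray as xr

# Hypothetical zarr store; auto_chunk=True asks xarray to give each dask
# chunk the same shape as the underlying zarr chunks instead of guessing.
ds = xr.open_zarr("example.zarr", auto_chunk=True)
print(ds.chunks)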

On Mon, Apr 30, 2018 at 12:21 PM, Matthew Rocklin notifications@github.com wrote:

Given a tiled GeoTIFF image I'm looking for the best practice in reading it as a chunked dataset. I did this in this notebook https://gist.github.com/mrocklin/3df315e93d4bdeccf76db93caca2a9bd by first opening the file with rasterio, looking at the block sizes, and then using those to inform the argument to chunks= in xarray.open_rasterio. This works, but is somewhat cumbersome because I also had to dive into the rasterio API. Do we want to provide defaults here?

In dask.array, every time this has come up we've always shot it down; automatic chunking is error-prone and hard to do well. However, in these cases the object we're being given usually also conveys its chunking in a way that matches how dask.array thinks about it, so the extra cognitive load on the user has been somewhat low. Rasterio's model and API feel much more foreign to me, though, than a project like NetCDF or h5py. I find myself wanting a chunks=True or chunks='100MB' option.

Thoughts on this? Is this in-scope? If so then what is the right API and what is the right policy for how to make xarray/dask.array chunks larger than GeoTIFF chunks?

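The workflow described in the quoted issue (open the file with rasterio, look at the block sizes, pass them to chunks=) looks roughly like the sketch below; the path is hypothetical, and xarray.open_rasterio has since been deprecated in favour of rioxarray:

import rasterio
import xarray as xr

path = "example.tif"  # hypothetical tiled GeoTIFF

# 1. Ask rasterio for the file's native block (tile) shape.
with rasterio.open(path) as src:
    block_y, block_x = src.block_shapes[0]

# 2. Reuse those block sizes (or a multiple of them) as dask chunks.
da = xr.open_rasterio(path, chunks={"band": 1, "y": block_y, "x": block_x})
print(da.chunks)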

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Default chunking in GeoTIFF images 318950038

Table schema

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
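For reference, the query behind this page (rows where issue = 318950038, sorted by updated_at descending) can be reproduced against a local copy of the database with Python's sqlite3 module; the database filename is hypothetical:

import sqlite3

con = sqlite3.connect("github.db")  # hypothetical local copy of this database
rows = con.execute(
    """
    SELECT id, user, created_at, author_association, body
    FROM issue_comments
    WHERE issue = ?
    ORDER BY updated_at DESC
    """,
    (318950038,),
).fetchall()
print(len(rows))  # 10 rows for this issue
con.close()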