
issue_comments


15 rows where issue = 140063713 sorted by updated_at descending


user 5

  • jhamman 6
  • IamJeffG 3
  • fmaussion 3
  • perrygeo 2
  • shoyer 1

author_association 3

  • MEMBER 10
  • CONTRIBUTOR 3
  • NONE 2

issue 1

  • ENH: Optional Read-Only RasterIO backend · 15 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
211362112 https://github.com/pydata/xarray/issues/790#issuecomment-211362112 https://api.github.com/repos/pydata/xarray/issues/790 MDEyOklzc3VlQ29tbWVudDIxMTM2MjExMg== IamJeffG 2002703 2016-04-18T12:40:31Z 2016-04-18T12:40:31Z CONTRIBUTOR

> I didn't realize you'd already been integrating xarray with rasterio. Is that library open source?

Alas no, and in fact I don't even have access anymore; it's with my last employer. I believe we'll benefit from the fresh look as you guys are already doing! Mostly I aimed to point out the value of exposing the affine transform, in addition to the data & coordinate variables.

:+1: Agreed to keep warping explicit, and not deal with it in this issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Optional Read-Only RasterIO backend 140063713
211351199 https://github.com/pydata/xarray/issues/790#issuecomment-211351199 https://api.github.com/repos/pydata/xarray/issues/790 MDEyOklzc3VlQ29tbWVudDIxMTM1MTE5OQ== perrygeo 1151287 2016-04-18T12:06:07Z 2016-04-18T12:06:07Z NONE

@IamJeffG thanks for the example. I didn't realize you'd already been integrating xarray with rasterio. Is that library open source?

> Reprojecting or clipping after reading xarray, like I do, goes against @perrygeo's recommendation. So maybe my example is moot, but I really like being able to do this programmatically in python, not CLI.

To clarify, I just want to make sure that clipping/reprojecting/resampling remains an explicit step. That's a great approach you outlined, I just wouldn't want any software to make those assumptions for me!

> It's not good to assume a negative y-step size. Rarely, I will come across a dataset that breaks convention with a positive y coordinate, meaning the first pixel is the lower-left corner, but at least the dataset is self-consistent. Rasterio works beautifully even with these black sheep, so we don't want an xarray reader to force the assumption.

Agreed. We've been doing more testing on this topic and found that rasterio generally works as expected for positive-y rasters. But there are still some built-in assumptions about negative-y rasters that cause spectacular failures. It's still a work in progress...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Optional Read-Only RasterIO backend 140063713
211109010 https://github.com/pydata/xarray/issues/790#issuecomment-211109010 https://api.github.com/repos/pydata/xarray/issues/790 MDEyOklzc3VlQ29tbWVudDIxMTEwOTAxMA== jhamman 2443309 2016-04-17T20:34:31Z 2016-04-17T20:34:31Z MEMBER

Thanks for the comments @IamJeffG. I haven't had any time recently to mess around with this so I haven't made any progress since the original notebook.

> It's not good to assume a negative y-step size. Rarely, I will come across a dataset that breaks convention with a positive y coordinate, meaning the first pixel is the lower-left corner, but at least the dataset is self-consistent. Rasterio works beautifully even with these black sheep, so we don't want an xarray reader to force the assumption.

Agreed. My notebook was just a quick example of how this could work and it would certainly benefit from some generalization when applying this as an xarray backend.

> In a past life I made a side library that wraps rasterio's API to take and return xarray.DataArrays. It provides IO/clip/warp/rasterize operations on DataArrays, which themselves are annotated with the CRS and affine transforms as attributes.

Interesting. Any chance that's available for public viewing?

> Even if xarray's new rasterio backend only provides a reader ...

I only want to expose the reader and the necessary metadata to use the georeferenced dataset. Warping and other projection transformations would need to be handled downstream.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Optional Read-Only RasterIO backend 140063713
211045128 https://github.com/pydata/xarray/issues/790#issuecomment-211045128 https://api.github.com/repos/pydata/xarray/issues/790 MDEyOklzc3VlQ29tbWVudDIxMTA0NTEyOA== IamJeffG 2002703 2016-04-17T15:38:03Z 2016-04-17T15:38:03Z CONTRIBUTOR

In a past life I made a side library that wraps rasterio's API to take and return xarray.DataArrays. It provides IO/clip/warp/rasterize operations on DataArrays, which themselves are annotated with the CRS and affine transforms as attributes.

My most common use case was reading disparate rasters and aligning them to the same grid:

1. Use rasterio to load separate spatial rasters over roughly the same area; let's say one is 30-meter satellite imagery and one is 3-meter agricultural yield. Often I'll immediately wrap them in an xarray.DataArray and persist the CRS and affine transform as attributes.
2. Clip the fine-resolution yield array to my area of interest. I can either use rasterio's src.read(window=(...)) when reading or DataArray.sel(x=slice(...), y=slice(...)) post-hoc.
3. I want to overlay the two arrays in a single xarray.Dataset. They may or may not have the same projection but definitely do not share the same grid. I'll leave the fine-resolution data untouched to avoid needless resampling, but instead upsample and re-align the coarse-resolution satellite array to match the affine transform of the clipped fine-resolution array. Behind the scenes this looks like:

        import numpy as np
        from rasterio.warp import reproject

        # agriculture and satellite are both 2D DataArrays.
        satellite_10m = np.zeros(agriculture.shape, dtype=satellite.dtype)
        # reproject() resamples the source array onto the destination grid.
        reproject(
            satellite.values,
            satellite_10m,
            src_transform=satellite.attrs['transform'],
            src_crs=satellite.attrs['crs'],
            dst_transform=agriculture.attrs['transform'],
            dst_crs=agriculture.attrs['crs'])
        # This forces realization of the dataset in memory. I don't do much out-of-core.

4. Stack them into the same Dataset, sharing the transform and coordinate variables from the fine-resolution array.
5. Now use xarray to do cool computations on the aligned datasets.

Reprojecting or clipping after reading xarray, like I do, goes against @perrygeo's recommendation. So maybe my example is moot, but I really like being able to do this programmatically in python, not CLI.

Even if xarray's new rasterio backend only provides a reader (and not rasterio.warp or rasterio.features functions), Step 3 shows it's very useful if, in addition to the data, the reader will expose the CRS and affine transform objects to the client.

However, if you both expose the transform and realize the coordinate variables, it's possible for them to diverge as the single source of truth. In my above workflow, anytime I clip (step 2) or warp (step 3) data, my side library needed to manually re-set that DataArray's transform and coordinate variables. (This is surely out of scope for rasterio or xarray!)
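
One way to avoid the divergence described above is to treat the transform as the single source of truth and derive the coordinate variables from it whenever they are needed. The following is only a minimal sketch in plain Python, assuming a rotation-free grid; `GridTransform`, `coords` and `clip` are hypothetical names for illustration and are not part of rasterio or xarray.

```python
from typing import List, NamedTuple, Tuple


class GridTransform(NamedTuple):
    """Hypothetical stand-in for a rotation-free affine transform."""
    x_origin: float  # x of the outer corner of the first pixel
    dx: float        # pixel width
    y_origin: float  # y of the outer corner of the first pixel
    dy: float        # pixel height (negative for north-up rasters)


def coords(t: GridTransform, n_rows: int, n_cols: int) -> Tuple[List[float], List[float]]:
    """Derive pixel-center x and y coordinates from the transform."""
    xs = [t.x_origin + (col + 0.5) * t.dx for col in range(n_cols)]
    ys = [t.y_origin + (row + 0.5) * t.dy for row in range(n_rows)]
    return xs, ys


def clip(t: GridTransform, row_off: int, col_off: int) -> GridTransform:
    """Return the transform of a window that starts at (row_off, col_off)."""
    return GridTransform(
        x_origin=t.x_origin + col_off * t.dx,
        dx=t.dx,
        y_origin=t.y_origin + row_off * t.dy,
        dy=t.dy,
    )


full = GridTransform(x_origin=500000.0, dx=30.0, y_origin=4200000.0, dy=-30.0)
xs, ys = coords(full, n_rows=100, n_cols=100)

# Clip to rows 10..19 and columns 20..29, then re-derive the coordinates.
window = clip(full, row_off=10, col_off=20)
win_xs, win_ys = coords(window, n_rows=10, n_cols=10)
```

With this arrangement a clip only has to update the transform; the coordinates derived from the clipped transform match the corresponding slice of the original coordinates, so the two cannot drift apart.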

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Optional Read-Only RasterIO backend 140063713
211044118 https://github.com/pydata/xarray/issues/790#issuecomment-211044118 https://api.github.com/repos/pydata/xarray/issues/790 MDEyOklzc3VlQ29tbWVudDIxMTA0NDExOA== IamJeffG 2002703 2016-04-17T15:28:41Z 2016-04-17T15:28:41Z CONTRIBUTOR

@jhamman Small observation from your notebook: It's not good to assume a negative y-step size. Rarely, I will come across a dataset that breaks convention with a positive y coordinate, meaning the first pixel is the lower-left corner, but at least the dataset is self-consistent. Rasterio works beautifully even with these black sheep, so we don't want an xarray reader to force the assumption.
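
To illustrate the point, here is a small sketch of computing raster bounds without assuming the sign of the y step. It is plain Python with hypothetical function names, assumes a rotation-free grid, and is not taken from the notebook or from rasterio.

```python
from typing import Tuple


def raster_bounds(x_origin: float, dx: float, y_origin: float, dy: float,
                  n_rows: int, n_cols: int) -> Tuple[float, float, float, float]:
    """Return (left, bottom, right, top) without assuming the sign of dx or dy."""
    x_far = x_origin + n_cols * dx
    y_far = y_origin + n_rows * dy
    return (min(x_origin, x_far), min(y_origin, y_far),
            max(x_origin, x_far), max(y_origin, y_far))


def first_pixel_corner(dy: float) -> str:
    """Name the corner where the first pixel sits, assuming dx is positive."""
    return "upper-left" if dy < 0 else "lower-left"


# Conventional raster: origin at the top, y decreases with the row index.
north_up = raster_bounds(0.0, 10.0, 100.0, -10.0, n_rows=10, n_cols=5)
# Unconventional raster: origin at the bottom, y increases with the row index.
south_up = raster_bounds(0.0, 10.0, 0.0, 10.0, n_rows=10, n_cols=5)
```

Both rasters cover the same extent; only the row order differs, which is why a reader should take the sign from the file rather than hard-code it.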

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Optional Read-Only RasterIO backend 140063713
198474766 https://github.com/pydata/xarray/issues/790#issuecomment-198474766 https://api.github.com/repos/pydata/xarray/issues/790 MDEyOklzc3VlQ29tbWVudDE5ODQ3NDc2Ng== jhamman 2443309 2016-03-18T18:04:52Z 2016-03-18T18:04:52Z MEMBER

@shoyer - that's what I was thinking too. In fact, that's more or less what I did in this example, although this is an eager implementation: https://anaconda.org/jhamman/rasterio_to_xarray/notebook

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Optional Read-Only RasterIO backend 140063713
198428399 https://github.com/pydata/xarray/issues/790#issuecomment-198428399 https://api.github.com/repos/pydata/xarray/issues/790 MDEyOklzc3VlQ29tbWVudDE5ODQyODM5OQ== shoyer 1217238 2016-03-18T16:04:40Z 2016-03-18T16:04:40Z MEMBER

Because each point can be computed separately, we could straightforwardly add latitude/longitude as lazily computed 2D arrays (under "coordinates"), similarly to how we currently handle on-the-fly data rescaling.
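
As a rough illustration of the idea, the sketch below defines a 2D longitude/latitude coordinate that computes each cell only when it is indexed. It is plain Python, not xarray's actual lazy-array machinery; the `LazyLonLat` class is hypothetical, and the spherical Mercator inverse is used here only as an example projection.

```python
import math
from typing import Callable, List, Tuple

EARTH_RADIUS_M = 6378137.0  # sphere radius used by spherical (Web) Mercator


def mercator_to_lonlat(x: float, y: float) -> Tuple[float, float]:
    """Inverse spherical Mercator: projected meters to degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat


class LazyLonLat:
    """Hypothetical lazy 2D coordinate: nothing is computed until it is indexed."""

    def __init__(self, xs: List[float], ys: List[float],
                 inverse: Callable[[float, float], Tuple[float, float]]):
        self.xs, self.ys, self.inverse = xs, ys, inverse
        self.shape = (len(ys), len(xs))

    def __getitem__(self, index: Tuple[int, int]) -> Tuple[float, float]:
        row, col = index
        return self.inverse(self.xs[col], self.ys[row])


xs = [0.0, 1000000.0, 2000000.0]
ys = [2000000.0, 1000000.0, 0.0]
lonlat = LazyLonLat(xs, ys, mercator_to_lonlat)
origin_lon, origin_lat = lonlat[2, 0]  # computed only now, for this one cell
```

Because every cell depends only on its own (x, y) pair, the same approach extends to slices or chunks without touching the rest of the grid.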

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Optional Read-Only RasterIO backend 140063713
198343073 https://github.com/pydata/xarray/issues/790#issuecomment-198343073 https://api.github.com/repos/pydata/xarray/issues/790 MDEyOklzc3VlQ29tbWVudDE5ODM0MzA3Mw== perrygeo 1151287 2016-03-18T12:57:14Z 2016-03-18T12:57:14Z NONE

@fmaussion @jhamman re: projecting coordinates to lat-lng.

If you consider the raster cells as independent points, you can project them independently but they will likely not be regularly spaced. With few exceptions, if you need to maintain a regular grid, transforming data between projections will alter the shape of the array and require resampling (GDAL and rasterio call the process "warping" to reflect this). There are decisions and tradeoffs to be considered with the various resampling methods, selecting new extents and cell sizes, etc., so it's typically not something you want to do on-the-fly for analyses.

I think keeping the xarray coordinates as generic cartesian x-y makes sense, at least initially. Even in many GIS tools, analysis is done on a naive 2D plane and it's assumed that the inputs are of the same projection. I'd recommend doing any reprojection outside of xarray as a pre-processing step (with e.g. gdalwarp or rio warp).
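
The irregular spacing mentioned above is easy to see numerically. The sketch below takes rows that are evenly spaced in a projected coordinate system and converts them to latitude independently; spherical Mercator is used purely as an example projection, and the variable names are illustrative.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by spherical Mercator


def mercator_y_to_lat(y: float) -> float:
    """Inverse spherical Mercator for the y axis: meters to degrees latitude."""
    return math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)


# A regular grid in projected space: rows exactly 1,000 km apart.
ys = [0.0, 1000000.0, 2000000.0, 3000000.0, 4000000.0]
lats = [mercator_y_to_lat(y) for y in ys]

# Spacing between consecutive rows, in degrees of latitude.
lat_steps = [b - a for a, b in zip(lats, lats[1:])]
```

The steps in `lat_steps` shrink as the rows move away from the equator, so the projected points no longer form a regular grid in latitude and would need resampling to become one.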

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Optional Read-Only RasterIO backend 140063713
197939841 https://github.com/pydata/xarray/issues/790#issuecomment-197939841 https://api.github.com/repos/pydata/xarray/issues/790 MDEyOklzc3VlQ29tbWVudDE5NzkzOTg0MQ== jhamman 2443309 2016-03-17T15:44:48Z 2016-03-17T15:44:48Z MEMBER

As for 1) I'm open to having more discussion on decoding the coordinates. My contention here is that they are useful, even in their unstructured format, since that permits visualization out of the box. I'll ping @perrygeo for more on this.

2) I don't really want to get into this because there isn't a standard treatment in geotiffs so it would, at best, be a guess on our end.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Optional Read-Only RasterIO backend 140063713
197601132 https://github.com/pydata/xarray/issues/790#issuecomment-197601132 https://api.github.com/repos/pydata/xarray/issues/790 MDEyOklzc3VlQ29tbWVudDE5NzYwMTEzMg== fmaussion 10050469 2016-03-16T23:22:10Z 2016-03-16T23:22:10Z MEMBER

Hi @jhamman, this is close to how I would've done it, but I am maybe not the most qualified (the GIS specialists from rasterio would probably be more helpful). Still, a couple of remarks from my side:

- I wouldn't necessarily do the try_to_get_latlon_coords systematically. When the raster coords are lat-lon, the new coords are redundant. And when the coords are x-y, the lat-lon info is only partly useful (since the grid will be unstructured in lat-lon). Furthermore, I am not sure if +init=EPSG:4326 is the only lat-lon proj available (there are surely more, at least if you leave the WGS 84 area).
- As mentioned by perrygeo in your rasterio post, the data model of geotiffs is not always clear. The pixel coordinates are very likely to be at the top-left corner of the pixel (as I assume in my small salem library). Most netCDF datasets we are using in the meteo/climate community are pixel-centered. I don't know if this is something that xarray wants to consider, but this becomes important if you want to make accurate projections. (In practice, the two concepts are equivalent for most applications, but you have to know what is what: in my small library I called those representations center_grid and corner_grid: https://github.com/fmaussion/salem/blob/master/salem/gis.py#L101)

To your questions:

1. I agree that returning a dataset is a good idea. I don't know if raster is a good name, but I have no other idea right now.
2. I don't know. The projection was always enough for me :flushed:
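
The corner versus center distinction raised in the remarks above amounts to a half-pixel shift. The sketch below shows that shift in plain Python; the function names are hypothetical and are not salem's API.

```python
from typing import List


def corner_to_center(corner_coords: List[float], step: float) -> List[float]:
    """Shift coordinates from the pixel's origin-side edge to the pixel center."""
    return [c + 0.5 * step for c in corner_coords]


def center_to_corner(center_coords: List[float], step: float) -> List[float]:
    """Inverse shift: from pixel centers back to the origin-side edge."""
    return [c - 0.5 * step for c in center_coords]


dx, dy = 30.0, -30.0  # dy is negative here because the origin is the top-left corner
x_corner = [500000.0 + col * dx for col in range(4)]
y_corner = [4200000.0 + row * dy for row in range(3)]

x_center = corner_to_center(x_corner, dx)
y_center = corner_to_center(y_corner, dy)
```

Using the signed step means the same function works for both axes: x moves right by half a pixel and y moves down by half a pixel when dy is negative.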

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Optional Read-Only RasterIO backend 140063713
197573068 https://github.com/pydata/xarray/issues/790#issuecomment-197573068 https://api.github.com/repos/pydata/xarray/issues/790 MDEyOklzc3VlQ29tbWVudDE5NzU3MzA2OA== jhamman 2443309 2016-03-16T22:05:34Z 2016-03-16T22:05:34Z MEMBER

@fmaussion - Here's an example of the basic functionality I'm thinking of implementing: https://anaconda.org/jhamman/rasterio_to_xarray/notebook

A few things to think about:

1. I've given each array the name raster. Does that make sense? This allows us to return a Dataset instead of a DataArray.
2. Which attributes do we want to copy over from the rasterio dataset? It is not entirely clear which attributes in the rasterio._io.RasterReader object should become attrs.
3. I have not implemented lazy or windowed reading yet but it should be pretty straightforward using the window argument to src.read().
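
For the windowed reading mentioned in the last point, one possible building block is a generator of tile boundaries that cover the raster. This is only a sketch: `iter_windows` is a hypothetical helper, and the ((row_start, row_stop), (col_start, col_stop)) tuple layout is assumed to be what the window argument accepts, which should be checked against the rasterio version in use.

```python
from typing import Iterator, Tuple

Window = Tuple[Tuple[int, int], Tuple[int, int]]


def iter_windows(n_rows: int, n_cols: int,
                 chunk_rows: int, chunk_cols: int) -> Iterator[Window]:
    """Yield ((row_start, row_stop), (col_start, col_stop)) tiles covering the raster."""
    for row_start in range(0, n_rows, chunk_rows):
        row_stop = min(row_start + chunk_rows, n_rows)  # clamp the last partial tile
        for col_start in range(0, n_cols, chunk_cols):
            col_stop = min(col_start + chunk_cols, n_cols)
            yield (row_start, row_stop), (col_start, col_stop)


windows = list(iter_windows(n_rows=5, n_cols=7, chunk_rows=2, chunk_cols=4))
# Each tuple could then be passed along as, e.g., src.read(1, window=w)
# (assumed usage, not verified against a specific rasterio release).
```

Each tile maps naturally onto one chunk, so a lazy backend would only need to read the windows that a given selection touches.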

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Optional Read-Only RasterIO backend 140063713
196021945 https://github.com/pydata/xarray/issues/790#issuecomment-196021945 https://api.github.com/repos/pydata/xarray/issues/790 MDEyOklzc3VlQ29tbWVudDE5NjAyMTk0NQ== jhamman 2443309 2016-03-13T19:06:03Z 2016-03-13T19:06:03Z MEMBER

@fmaussion -

As for (1), I like your idea of leaving out the projection of the coordinates. That certainly makes things easier from the perspective of the backend.

A band dimension in (2) seems pretty manageable.

I'm not concerned about the GDAL dependency (3). I would love to see more robust conda support for GDAL but that's another issue. This would be an optional backend, similar to Pynio, which isn't broadly available on conda. We could sort out the CI issues.

So, if we took the simplest approach for implementing a rasterio backend, open_dataset would always return a Dataset with a single unprojected DataArray (name to be determined). The other big question is what to call the dimensions, since their names are not explicitly provided in all raster formats.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Optional Read-Only RasterIO backend 140063713
195955210 https://github.com/pydata/xarray/issues/790#issuecomment-195955210 https://api.github.com/repos/pydata/xarray/issues/790 MDEyOklzc3VlQ29tbWVudDE5NTk1NTIxMA== fmaussion 10050469 2016-03-13T13:13:56Z 2016-03-13T13:15:07Z MEMBER

Hi @jhamman , I tend to agree with your doubts. I'll still comment on your cons:

To (1): I also think that xarray should avoid opening the projection can of worms. But the minimum that xarray could do with rasterio is to read the corner coordinates, dx and dy, and define the two coordinates "x" and "y" from them, regardless of whether these are meters, degrees of arc or whatever. As long as the other rasterio file attributes are available as attributes of the DataArray or Dataset objects, users can do their own mixture.

To (2): some GeoTIFF files also have more than one band. I don't know if these bands are named or have metadata, so maybe xarray will have to make decisions about these names too (most probably 1, 2, 3...).

I'll add a (3): rasterio depends on GDAL, which is huge and every now and then causes trouble on conda. This might also cause trouble for the continuous integration of xarray.

Altogether this might be more complicated than it is worth, but maybe the rasterio folks are interested in this and might provide more support.

If the idea for xarray accessors is implemented (https://github.com/pydata/xarray/issues/706#issuecomment-169099306) this will allow more specific libraries like mine to do their own rasterio support at low cost.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Optional Read-Only RasterIO backend 140063713
195676319 https://github.com/pydata/xarray/issues/790#issuecomment-195676319 https://api.github.com/repos/pydata/xarray/issues/790 MDEyOklzc3VlQ29tbWVudDE5NTY3NjMxOQ== jhamman 2443309 2016-03-12T07:01:12Z 2016-03-12T07:01:12Z MEMBER

Thanks @fmaussion. This was a helpful illustration of how this could be done. The salem GeoTiff and Grid objects include everything I was hoping to implement in xarray, and more. However, after looking into this a bit more, I have mixed feelings about whether this would work in xarray. Here is a brief summary of the pros/cons as I see them now.

Pros:

- Rasterio supports a wide range of raster formats (e.g. GeoTiff, ArcInfo ASCII Grid, etc.).
- Combined with pyproj, coordinate variables can be inferred.
- Supports windowed reading (and writing); this would fit in well with the chunking approach already taken by xarray.
- Supports lazy loading of array values; this would fit in well with the loading policies of the other xarray backends.

Cons:

- Would require xarray to adopt conventions for projecting arrays, naming (coordinates, dimensions, arrays), and handling of raster attributes. I can imagine ways this could be done but it may take us down a path we don't want to go.
- The xarray backends generally return Datasets; however, rasterio returns individual arrays that would map better onto DataArrays.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Optional Read-Only RasterIO backend 140063713
195254611 https://github.com/pydata/xarray/issues/790#issuecomment-195254611 https://api.github.com/repos/pydata/xarray/issues/790 MDEyOklzc3VlQ29tbWVudDE5NTI1NDYxMQ== fmaussion 10050469 2016-03-11T08:25:04Z 2016-03-11T08:25:04Z MEMBER

:+1: Rasterio shines at reading georeferencing metadata out of any file, and I guess it would be no big deal to treat the various info as attributes in an xarray dataset. It is also possible to do lazy reading out of rasterio files.

(example with a geotiff file: https://github.com/fmaussion/salem/blob/master/salem/datasets.py#L263)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Optional Read-Only RasterIO backend 140063713


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);