home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

38 rows where issue = 206905158 sorted by updated_at descending

✖
✖

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 7

  • fmaussion 18
  • shoyer 9
  • gidden 6
  • benbovy 2
  • PeterDSteinberg 1
  • jhamman 1
  • Zac-HD 1

author_association 3

  • MEMBER 30
  • CONTRIBUTOR 7
  • NONE 1

issue 1

  • Add RasterIO backend · 38 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
306546430 https://github.com/pydata/xarray/pull/1260#issuecomment-306546430 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwNjU0NjQzMA== PeterDSteinberg 1445602 2017-06-06T16:44:43Z 2017-06-06T16:44:43Z NONE

+1 @fmaussion - Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
306546073 https://github.com/pydata/xarray/pull/1260#issuecomment-306546073 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwNjU0NjA3Mw== jhamman 2443309 2017-06-06T16:43:30Z 2017-06-06T16:43:30Z MEMBER

Nice work @fmaussion. Thanks for bringing this home.

{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
306445080 https://github.com/pydata/xarray/pull/1260#issuecomment-306445080 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwNjQ0NTA4MA== fmaussion 10050469 2017-06-06T10:25:01Z 2017-06-06T10:25:01Z MEMBER

OK, let's get this one in and see what people will report about it.

Thanks @shoyer for your patience, @NicWayand and @jhamman for the original PR, @gidden for the testing/reviews and @sgillies for rasterio!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
305446339 https://github.com/pydata/xarray/pull/1260#issuecomment-305446339 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwNTQ0NjMzOQ== fmaussion 10050469 2017-06-01T09:53:24Z 2017-06-01T09:53:24Z MEMBER

@gidden I just updated the documentation recipe to use an accessor to compute the lons and lats, let me know what you think

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
305428608 https://github.com/pydata/xarray/pull/1260#issuecomment-305428608 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwNTQyODYwOA== fmaussion 10050469 2017-06-01T08:39:02Z 2017-06-01T08:39:02Z MEMBER

@gidden yes absolutely please give it a try, I think it's ready

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
305417871 https://github.com/pydata/xarray/pull/1260#issuecomment-305417871 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwNTQxNzg3MQ== gidden 1392657 2017-06-01T07:53:59Z 2017-06-01T07:53:59Z CONTRIBUTOR

Hey @fmaussion, is this ready for me to try out again? I wanted to let you and @shoyer iterate first.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
305243114 https://github.com/pydata/xarray/pull/1260#issuecomment-305243114 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwNTI0MzExNA== shoyer 1217238 2017-05-31T16:32:27Z 2017-05-31T16:32:27Z MEMBER

Currently the rasterio tests are running on py36 only. Should I add rasterio to the other test suites as well?

Yes, except for the one that tests bare-minimal dependencies.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
305240650 https://github.com/pydata/xarray/pull/1260#issuecomment-305240650 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwNTI0MDY1MA== fmaussion 10050469 2017-05-31T16:23:30Z 2017-05-31T16:23:30Z MEMBER

OK, all green.

Currently the rasterio tests are running on py36 only. Should I add rasterio to the other test suites as well?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
305231172 https://github.com/pydata/xarray/pull/1260#issuecomment-305231172 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwNTIzMTE3Mg== shoyer 1217238 2017-05-31T15:51:24Z 2017-05-31T15:51:24Z MEMBER

Yes, please!

On Wed, May 31, 2017 at 8:48 AM, Fabien Maussion notifications@github.com wrote:

@fmaussion commented on this pull request.

In xarray/backends/rasterio_.py https://github.com/pydata/xarray/pull/1260#discussion_r119395579:

  • chunks : int, tuple or dict, optional
  • Chunk sizes along each dimension, e.g., 5, (5, 5) or
  • {'x': 5, 'y': 5}. If chunks is provided, it used to load the new
  • DataArray into a dask array. This is an experimental feature; see the
  • documentation for more details.
  • cache : bool, optional
  • If True, cache data loaded from the underlying datastore in memory as
  • NumPy arrays when accessed to avoid reading from the underlying data-
  • store multiple times. Defaults to True unless you specify the chunks
  • argument to use dask, in which case it defaults to False. Does not
  • change the behavior of coordinates corresponding to dimensions, which
  • always load their data from disk into a pandas.Index.
  • lock : False, True or threading.Lock, optional
  • If chunks is provided, this argument is passed on to
  • :py:func:dask.array.from_array. By default, a per-variable lock is
  • used when reading data from netCDF files with the netcdf4 and h5netcdf

Now I understand much more what's going on, and also why salem was having troubles when adding diagnostic variables to netcdf objects. Can I update the docs all at once here?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pydata/xarray/pull/1260#discussion_r119395579, or mute the thread https://github.com/notifications/unsubscribe-auth/ABKS1p7_JyTe8jdB1omZm_4V005U_ex_ks5r_YuygaJpZM4L92Pl .

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
304481844 https://github.com/pydata/xarray/pull/1260#issuecomment-304481844 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwNDQ4MTg0NA== shoyer 1217238 2017-05-27T23:23:02Z 2017-05-27T23:23:02Z MEMBER

I think DataArray.chunk should probably gain those parameters. There just wasn't any need for it before. On Sat, May 27, 2017 at 9:35 AM Fabien Maussion notifications@github.com wrote:

@fmaussion commented on this pull request.

In xarray/backends/rasterio_.py https://github.com/pydata/xarray/pull/1260#discussion_r118824181:

  • y = np.linspace(start=y0, num=ny, stop=(y0 + (ny - 1) * dy)) +
  • Get bands

  • if riods.count < 1:
  • raise ValueError('Unknown dims')
  • bands = np.asarray(riods.indexes) +
  • Attributes

  • attrs = {}
  • if hasattr(riods, 'crs'):
  • CRS is a dict-like object specific to rasterio

  • We convert it back to a PROJ4 string using rasterio itself

  • attrs['crs'] = riods.crs.to_string()
  • Maybe we'd like to parse other attributes here (for later)

    +
  • data = indexing.LazilyIndexedArray(RasterioArrayWrapper(riods))

Implementing cache went well (appart from this bug https://github.com/pydata/xarray/issues/1429.

For chunking I have a question: DataArray.chunk() doesn't have http://xarray.pydata.org/en/stable/generated/xarray.DataArray.chunk.html#xarray.DataArray.chunk the name_prefix, token and lock keywords. Any reason for this?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pydata/xarray/pull/1260#discussion_r118824181, or mute the thread https://github.com/notifications/unsubscribe-auth/ABKS1mo_XZFgCNEq4YoJunYeeWve5XHPks5r-FDDgaJpZM4L92Pl .

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
304137429 https://github.com/pydata/xarray/pull/1260#issuecomment-304137429 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwNDEzNzQyOQ== fmaussion 10050469 2017-05-25T22:05:32Z 2017-05-25T22:05:41Z MEMBER

Thanks @shoyer , I have addressed all your comments but one which I didn't understand. Maybe we should wait for an answer of the rasterio devs about the dtype stuff before going on too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
303800141 https://github.com/pydata/xarray/pull/1260#issuecomment-303800141 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMzgwMDE0MQ== fmaussion 10050469 2017-05-24T17:48:05Z 2017-05-24T17:48:05Z MEMBER

This is ready for another round of reviews! I think this has come out quite nicely. Everything is much simpler now.

I have: - included all your comments - removed the GIS part - added an example on how to parse lons and lats in the new "recipes" section

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
303370889 https://github.com/pydata/xarray/pull/1260#issuecomment-303370889 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMzM3MDg4OQ== gidden 1392657 2017-05-23T11:34:47Z 2017-05-23T11:34:47Z CONTRIBUTOR

Sounds good!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
303369876 https://github.com/pydata/xarray/pull/1260#issuecomment-303369876 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMzM2OTg3Ng== fmaussion 10050469 2017-05-23T11:29:39Z 2017-05-23T11:29:39Z MEMBER

Would it be better to use the string representation of the CRS internally after reading in?

Yes this was my intention.

BTW, your serialization above doesn't work because the variable "raster" also has a CRS attr. This problem will be solved by the next iteration of my code (when data arrays will be returned instead of datasets)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
303330504 https://github.com/pydata/xarray/pull/1260#issuecomment-303330504 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMzMzMDUwNA== gidden 1392657 2017-05-23T08:37:59Z 2017-05-23T08:37:59Z CONTRIBUTOR

Note that the above tiff was made with something like the following code:

``` res = 0.5 nlat = 180 nlon = 360 left_lon = -180 upper_lat = 90 crs = 'epsg:4326'

profile = { 'affine': rio.Affine(res, 0.0, left_lon, 0.0, -res, upper_lat), 'dtype': 'int32', 'height': int(nlat / res), 'width': int(nlon / res), 'nodata': '-1', 'crs': crs, } ```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
303330186 https://github.com/pydata/xarray/pull/1260#issuecomment-303330186 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMzMzMDE4Ng== gidden 1392657 2017-05-23T08:36:46Z 2017-05-23T08:36:46Z CONTRIBUTOR

Hey @fmaussion, I guess my question here is as follows: if a raster was generated by rasterio (thus using a dictionary representation of the CRS object even if a string is provided), should xarrary then fail to write to netcdf with that dataset?

If so, then that means that all users will have to do

ds = open_rasterio('file.tiff') ds.attrs['crs'] = ds.attrs['crs'].to_string() ds.to_netcdf('file.nc')

Would it be better to use the string representation of the CRS internally after reading in? I think this would solve @shoyer's concern about onboarding non-primitive datatypes.

As an aside, even my above suggestion did not work.. I'm not sure why.

``` In [33]: ds = xr.open_rasterio('./isimip_centroid_0_5.tiff')

In [34]: ds Out[34]: <xarray.Dataset> Dimensions: (band: 1, x: 720, y: 360) Coordinates: * y (y) float64 90.0 89.5 89.0 88.5 88.0 87.5 87.0 86.5 86.0 85.5 ... * x (x) float64 -180.0 -179.5 -179.0 -178.5 -178.0 -177.5 -177.0 ... * band (band) int64 1 Data variables: raster (band, y, x) int32 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ... Attributes: crs: CRS({'init': u'epsg:4326'})

In [35]: ds.attrs['crs'] = ds.attrs['crs'].to_string()

In [36]: ds Out[36]: <xarray.Dataset> Dimensions: (band: 1, x: 720, y: 360) Coordinates: * y (y) float64 90.0 89.5 89.0 88.5 88.0 87.5 87.0 86.5 86.0 85.5 ... * x (x) float64 -180.0 -179.5 -179.0 -178.5 -178.0 -177.5 -177.0 ... * band (band) int64 1 Data variables: raster (band, y, x) int32 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ... Attributes: crs: +init=epsg:4326

In [37]: ds.to_netcdf('test.nc')

TypeError Traceback (most recent call last) <ipython-input-37-d7687fe1d488> in <module>() ----> 1 ds.to_netcdf('test.nc')

/home/gidden/.local/lib/python2.7/site-packages/xarray-0.9.5_34_g48c7268-py2.7.egg/xarray/core/dataset.pyc in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims) 975 return to_netcdf(self, path, mode, format=format, group=group, 976 engine=engine, encoding=encoding, --> 977 unlimited_dims=unlimited_dims) 978 979 def unicode(self):

/home/gidden/.local/lib/python2.7/site-packages/xarray-0.9.5_34_g48c7268-py2.7.egg/xarray/backends/api.pyc in to_netcdf(dataset, path_or_file, mode, format, group, engine, writer, encoding, unlimited_dims) 588 # validate Dataset keys, DataArray names, and attr keys/values 589 _validate_dataset_names(dataset) --> 590 _validate_attrs(dataset) 591 592 try:

/home/gidden/.local/lib/python2.7/site-packages/xarray-0.9.5_34_g48c7268-py2.7.egg/xarray/backends/api.pyc in _validate_attrs(dataset) 119 for variable in dataset.variables.values(): 120 for k, v in variable.attrs.items(): --> 121 check_attr(k, v) 122 123

/home/gidden/.local/lib/python2.7/site-packages/xarray-0.9.5_34_g48c7268-py2.7.egg/xarray/backends/api.pyc in check_attr(name, value) 110 'string, ndarray or a list/tuple of ' 111 'numbers/strings for serialization to netCDF ' --> 112 'files'.format(value)) 113 114 # Check attrs on the dataset itself

TypeError: Invalid value for attr: CRS({'init': u'epsg:4326'}) must be a number string, ndarray or a list/tuple of numbers/strings for serialization to netCDF files ```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
303327051 https://github.com/pydata/xarray/pull/1260#issuecomment-303327051 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMzMyNzA1MQ== fmaussion 10050469 2017-05-23T08:23:59Z 2017-05-23T08:23:59Z MEMBER

@shoyer @gidden tjhanks for testing, this is very useful.

Indeed rasterio uses a dict-like mapping for the PROJ4 strings (source: rasterio docs). This isn't a big deal since it provides the to_string() and from_string() methods which allow the do the round-trip.

I personally never noted the difference because pyproj (the python interface to the PROJ.4 library) can handle both representations:

``` In [1]: import xarray as xr

In [2]: ds = xr.open_rasterio('RGB.byte.tif')

In [3]: ds.crs Out[3]: CRS({'init': 'epsg:32618'})

In [4]: import pyproj

In [5]: pyproj.Proj(ds.crs) Out[5]: <pyproj.Proj at 0x7f12317c0468>

In [6]: pyproj.Proj(ds.crs.to_string()) Out[6]: <pyproj.Proj at 0x7f12317c0408> ```

My suggestion here (to avoid the serialisation problems you mention @gidden ) is to convert the CRS object to a string at read time

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
303317475 https://github.com/pydata/xarray/pull/1260#issuecomment-303317475 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMzMxNzQ3NQ== gidden 1392657 2017-05-23T07:47:35Z 2017-05-23T07:57:34Z CONTRIBUTOR

Hey @fmaussion, should we also do any attribute checking, or should we let the user fails as follows? I recently was testing this out and made a *.tiff with a shapefile as read from fiona. I got the following error:

``` In [12]: ds Out[12]: <xarray.Dataset> Dimensions: (band: 1, x: 720, y: 360) Coordinates: * y (y) float64 90.0 89.5 89.0 88.5 88.0 87.5 87.0 86.5 86.0 85.5 ... * x (x) float64 -180.0 -179.5 -179.0 -178.5 -178.0 -177.5 -177.0 ... * band (band) int64 1 Data variables: raster (band, y, x) int32 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ... Attributes: crs: CRS({'init': u'epsg:4326'})

In [13]: ds.to_netcdf('test.nc')

TypeError Traceback (most recent call last) <ipython-input-12-d7687fe1d488> in <module>() ----> 1 ds.to_netcdf('test.nc')

/home/gidden/.local/lib/python2.7/site-packages/xarray-0.9.5_34_g48c7268-py2.7.egg/xarray/core/dataset.pyc in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims) 975 return to_netcdf(self, path, mode, format=format, group=group, 976 engine=engine, encoding=encoding, --> 977 unlimited_dims=unlimited_dims) 978 979 def unicode(self):

/home/gidden/.local/lib/python2.7/site-packages/xarray-0.9.5_34_g48c7268-py2.7.egg/xarray/backends/api.pyc in to_netcdf(dataset, path_or_file, mode, format, group, engine, writer, encoding, unlimited_dims) 588 # validate Dataset keys, DataArray names, and attr keys/values 589 _validate_dataset_names(dataset) --> 590 _validate_attrs(dataset) 591 592 try:

/home/gidden/.local/lib/python2.7/site-packages/xarray-0.9.5_34_g48c7268-py2.7.egg/xarray/backends/api.pyc in _validate_attrs(dataset) 114 # Check attrs on the dataset itself 115 for k, v in dataset.attrs.items(): --> 116 check_attr(k, v) 117 118 # Check attrs on each variable within the dataset

/home/gidden/.local/lib/python2.7/site-packages/xarray-0.9.5_34_g48c7268-py2.7.egg/xarray/backends/api.pyc in check_attr(name, value) 110 'string, ndarray or a list/tuple of ' 111 'numbers/strings for serialization to netCDF ' --> 112 'files'.format(value)) 113 114 # Check attrs on the dataset itself

TypeError: Invalid value for attr: CRS({'init': u'epsg:4326'}) must be a number string, ndarray or a list/tuple of numbers/strings for serialization to netCDF files ```

And it seems like this happens whether I use a dictionary originally or just a string.

Perhaps this is a slightly larger issue with cleaning the CRS object such that it is netcdf compatible? Does this mean just a

crs = ' '.join('+{}={}'.format(k, v) for k, v in crs.items()) '?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
303159959 https://github.com/pydata/xarray/pull/1260#issuecomment-303159959 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMzE1OTk1OQ== fmaussion 10050469 2017-05-22T17:01:58Z 2017-05-22T17:01:58Z MEMBER

Uh, right, this is obviously not a string!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
303153678 https://github.com/pydata/xarray/pull/1260#issuecomment-303153678 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMzE1MzY3OA== shoyer 1217238 2017-05-22T16:38:03Z 2017-05-22T16:38:03Z MEMBER

side note: "rasterio CRS objects" are in fact strings corresponding to a PROJ4 string that will always be understood by pyproj, rasterio, gdal, etc. Examples: '+proj=aea +lat_1=-18 +lat_2=-32 +lat_0=0 +lon_0=24 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs ' or 'EPSG:4326')

Ah, that seems already pretty clean then. I was confused by the repr for a CRS() object in your comment above: https://github.com/pydata/xarray/pull/1260#issuecomment-279064452

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
302951509 https://github.com/pydata/xarray/pull/1260#issuecomment-302951509 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMjk1MTUwOQ== shoyer 1217238 2017-05-21T17:43:45Z 2017-05-22T16:36:32Z MEMBER

You should be able to simply wrap RasterioArrayWrapper in an indexing.LazilyIndexedArray and then pass it into the xarray.DataArray constructor directly.

EDIT: open_dataset also wrap data with CopyOnWriteArray and optionally MemoryCachedArray. These are also probably worth using for consistency, e.g., see these lines from xarray/backends/api.py: python data = indexing.CopyOnWriteArray(variable._data) if cache: data = indexing.MemoryCachedArray(data)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
303005011 https://github.com/pydata/xarray/pull/1260#issuecomment-303005011 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMzAwNTAxMQ== Zac-HD 12229877 2017-05-22T05:49:37Z 2017-05-22T05:49:37Z CONTRIBUTOR

I would also favour doing nothing (ie 3), because most users will already have some solution. It's also easier to change later if we don't do anything now - no need to think at all about backwards compatibility, and the design can be guided by how people are using the existing parts.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
302949290 https://github.com/pydata/xarray/pull/1260#issuecomment-302949290 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMjk0OTI5MA== fmaussion 10050469 2017-05-21T17:04:33Z 2017-05-21T17:04:33Z MEMBER

Thanks for looking into this!

I'm a slightly reluctant to add a dedicated method for convert rasterio CRS objects into lat/lon arrays. It feels a little overly specialized.

OK, so this corresponds to my solution 3 above (do nothing). I will however add the relevant lines of code to the documentation so that users wanting to add lons and lats to their data can do so. This will leave room for solution 2 is someone has the time to do it later. (side note: "rasterio CRS objects" are in fact strings corresponding to a PROJ4 string that will always be understood by pyproj, rasterio, gdal, etc. Examples: '+proj=aea +lat_1=-18 +lat_2=-32 +lat_0=0 +lon_0=24 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs ' or 'EPSG:4326')

Why return a Dataset rather than a DataArray? The later feels a little more natural to me, given that a rasterio describes a single array.

Agreed. Again this is built out of discussions on https://github.com/pydata/xarray/issues/790, but now obsolete. This speaks even stronger against the use of a DataStore, right? Is there any class I should inherit from to create a DataArray from a rasterio file? I basically need an init() and a __getitem()...

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
302941653 https://github.com/pydata/xarray/pull/1260#issuecomment-302941653 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMjk0MTY1Mw== fmaussion 10050469 2017-05-21T14:54:19Z 2017-05-21T16:05:14Z MEMBER

Thanks @gidden for the comments! Will look into it.

Your questions about the add_latlon makes me think that a kwarg at read time isn't the right approach. Here are the scenarios where you don't want to have the lat/lon computed per defaut: - when your data's crs is already WGS84, in that case x and y are lons and lats already - when your file isn't georeferenced properly - when your use case doesn't need them (salem for example will make better use of crs than xarray) - when your file is large: computing lons and lats on a huge 2D grid is going to be prohibitively expensive.

This latter use case is important, because it might be useful for users to first subset their data and then compute the lat lons. For this use case we could go for two options in place of the kwarg: 1. add a top level utility function get_latlon_from_crs which would work on any dataset with a proper crs (and could be extended) 2. compute lons and lats lazily (only when asked for) 3. do nothing and let the users do their own cuisine (consistent with xarray's general purpose)

I don't know how to do 2 because it implies using dask to compute two related variables at the same time. Furthermore, 2 requires dask while 1 could be extended towards other datasets which have a crs.

Right now I tend towards 3 (because I use salem), although I guess that many users will benefit from 1...

@gidden @shoyer @benbovy : thoughts?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
302945676 https://github.com/pydata/xarray/pull/1260#issuecomment-302945676 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMjk0NTY3Ng== gidden 1392657 2017-05-21T16:02:01Z 2017-05-21T16:02:01Z CONTRIBUTOR

Hey @fmaussion, I personally think some lat/lon support should be included here. I would lean toward 1 personally, but would be interested in what others think.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
302931510 https://github.com/pydata/xarray/pull/1260#issuecomment-302931510 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMjkzMTUxMA== benbovy 4160723 2017-05-21T11:42:45Z 2017-05-21T11:42:45Z MEMBER

a vast majority of the datasets xarray is reading are raster datasets (although in NetCDF format), hence "open_raster" could be confusing. "open_rasterio" underlines the fact that this opens "all datasets rasterio can open".

Yep makes sense!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
302652106 https://github.com/pydata/xarray/pull/1260#issuecomment-302652106 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMjY1MjEwNg== fmaussion 10050469 2017-05-19T09:14:47Z 2017-05-19T09:14:47Z MEMBER

which is also why I was questioning reusing the existing xarray backends system.

I am starting to understand what you mean, but in the absence of template I guess this was the easiest way to go (I overtook the design of the original PR). If you agree I'd suggest you to have a more detailed look at the current PR when you have time, and we can decide what to do from here. Since the public facing API shouldn't be affected we could also keep the current design for now and go back to it later when https://github.com/pydata/xarray/pull/1087 is ready.

Just a small suggestion: to me open_raster seems a slightly better name than open_rasterio as the dataset is a 'raster', not a 'rasterio'.

Yes I thought about it too, but a vast majority of the datasets xarray is reading are raster datasets (although in NetCDF format), hence "open_raster" could be confusing. "open_rasterio" underlines the fact that this opens "all datasets rasterio can open". I have no strong opinion about this though

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
302461027 https://github.com/pydata/xarray/pull/1260#issuecomment-302461027 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMjQ2MTAyNw== benbovy 4160723 2017-05-18T16:21:58Z 2017-05-18T16:21:58Z MEMBER

Excellent! I'm looking forward to see this merged!

Just a small suggestion: to me open_raster seems a slightly better name than open_rasterio as the dataset is a 'raster', not a 'rasterio'.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
302448132 https://github.com/pydata/xarray/pull/1260#issuecomment-302448132 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMjQ0ODEzMg== shoyer 1217238 2017-05-18T15:54:49Z 2017-05-18T15:54:49Z MEMBER

I haven't looked at the implementation yet, but I agree that a separate function open_rasterio is the way to go. Structurally, this is pretty different from opening a netCDF file, which is also why I was questioning reusing the existing xarray backends system.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
302255969 https://github.com/pydata/xarray/pull/1260#issuecomment-302255969 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMjI1NTk2OQ== fmaussion 10050469 2017-05-17T23:07:44Z 2017-05-17T23:10:11Z MEMBER

Folks, I finally managed to find a couple of hours to wrap this up: this is now ready for review.

Everything seems to work the way I'd like it to work, and only one thing is missing: the lazy computation of lons and lats with dask (I don't know how to do this quickly and I have not enough time to spend on this right now, unfortunately). This has been waiting for too long now, so I suggest to merge this when ready and this feature later on.

The solution retained for the API is to add a new open_rasterio top-level function: this makes sense since many keywords of open_dataset aren't relevant for rasterio datasets. It also underlines that rasterio datasets are quit different from xarray's data model.

another example I could add to the soon to come xarray gallery could be:

```python import xarray as xr import matplotlib.pyplot as plt import cartopy.crs as ccrs ds = xr.open_rasterio('RGB.byte.tif', add_latlon=True) ax = plt.subplot(projection=ccrs.PlateCarree()) ds.raster.sel(band=1).plot(ax=ax, x='lon', y='lat', transform=ccrs.PlateCarree()); ax.coastlines('10m'); ````

cc @gidden @jhamman @shoyer

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
284225363 https://github.com/pydata/xarray/pull/1260#issuecomment-284225363 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDI4NDIyNTM2Mw== fmaussion 10050469 2017-03-05T12:42:45Z 2017-03-05T12:42:45Z MEMBER

@shoyer thanks for the tips, I think that getting the lons and lats from dask is probably the most elegant method.

After trying various things I am still struggling with dask, and in particular on how to apply elemwise to functions like np.meshgrid (I get shape broadcasting errors). To get me started, I'd be grateful for an example on how to use dask to replace the code snippet below:

```python import xarray as xr import numpy as np

ds = xr.DataArray(np.zeros((2, 3)), coords={'x': np.arange(3), 'y': np.arange(2)}, dims=['y', 'x']).to_dataset(name='data')

non-dask version

lon, lat = np.meshgrid(ds.x, ds.y) ds['lon'] = (('y', 'x'), lon) ds['lat'] = (('y', 'x'), lat) ds.set_coords(['lon', 'lat'], inplace=True) print(ds) ```

Thanks a lot!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
279897816 https://github.com/pydata/xarray/pull/1260#issuecomment-279897816 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDI3OTg5NzgxNg== shoyer 1217238 2017-02-15T02:13:27Z 2017-02-15T02:13:27Z MEMBER

my initial idea to make them lazily evaluated might work, but in an ugly way: computing both lons and lats needs to be done in one single operation, and I'm not sure how this can be done in an elegant way

This could be done sanely with dask.array (example of multiple outputs from an element-wise function).

additionally, there is no way to make them show up as coordinates (as per @shoyer 's comment: #1260 (comment))

We could make this work, it just looks pretty hacky.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
279686053 https://github.com/pydata/xarray/pull/1260#issuecomment-279686053 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDI3OTY4NjA1Mw== fmaussion 10050469 2017-02-14T11:43:27Z 2017-02-14T11:43:27Z MEMBER

I made some progress with the lazy indexing, I'd be glad to have a first rough feedback on whether this is going in the right direction or not.

We have a decision to make regarding the API: I think that creating the optional lon and lat coords automatically is not a good idea: - in some cases, the x and y coordinates are already lons and lats and the 2D coords are obsolete - for big data files this is going to take ages and take a lot of memory - my initial idea to make them lazily evaluated might work, but in an ugly way: computing both lons and lats needs to be done in one single operation, and I'm not sure how this can be done in an elegant way - additionally, there is no way to make them show up as coordinates (as per @shoyer 's comment: https://github.com/pydata/xarray/pull/1260#issuecomment-279101252)

The current implementation delegates this task to a utility function (get_latlon_coords_from_crs). It is currently very rasterio specific, but could be made more general -- API to be defined.

Thoughts?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
279101252 https://github.com/pydata/xarray/pull/1260#issuecomment-279101252 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDI3OTEwMTI1Mg== shoyer 1217238 2017-02-11T00:14:36Z 2017-02-11T00:14:36Z MEMBER

Getting optional coordinates is hard is because the current "backends" system in xarray was really only designed for handling netCDF-like libraries.

For example, every backend gets passed through xarray.conventions.decode_cf to use CF conventions to decode its data. You would actually need to add a "global attribute" specifying coords='lat lon' to set coordinates variables with this system.

One alternative would be to simply write a function for reading files with rasterio straight into xarray Dataset objects.

There are probably hacks that can get this working with data stores but this really calls out for the bigger refactor like https://github.com/pydata/xarray/pull/1087.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
279076276 https://github.com/pydata/xarray/pull/1260#issuecomment-279076276 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDI3OTA3NjI3Ng== fmaussion 10050469 2017-02-10T21:52:35Z 2017-02-10T21:52:35Z MEMBER

thanks @shoyer , I think I can work on this a bit further now and I'll get back to you if I have more questions.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
279072956 https://github.com/pydata/xarray/pull/1260#issuecomment-279072956 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDI3OTA3Mjk1Ng== fmaussion 10050469 2017-02-10T21:37:45Z 2017-02-10T21:37:45Z MEMBER

Can you clarify what you mean by an optional coordinate?

Yes sorry, I meant the have them listed as coordinates without * instead of Data variables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
279070566 https://github.com/pydata/xarray/pull/1260#issuecomment-279070566 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDI3OTA3MDU2Ng== shoyer 1217238 2017-02-10T21:28:02Z 2017-02-10T21:28:02Z MEMBER

The order of the variables and coordinates is random.

Most likely you're using a dict instead of an OrderedDict somewhere.

Is it possible to make the x and y coords lazy?

See here for an example of lazily computing arrays: https://github.com/pydata/xarray/blob/62333208fc2a80c05848a12de67a10f00a6610a1/xarray/conventions.py#L314

There are other examples in that file, for decoding CF conventions.

I'd like to have the lon and lat variables listed as optional coordinates

Can you clarify what you mean by an optional coordinate?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
279064452 https://github.com/pydata/xarray/pull/1260#issuecomment-279064452 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDI3OTA2NDQ1Mg== fmaussion 10050469 2017-02-10T21:00:16Z 2017-02-10T21:00:16Z MEMBER

Before I'll get more into details with what needs to be done with rasterio itself, I'd like to get some xarray internals ready first. No need to do a full review yet, but I'd appreciate help with the following points:

  1. The order of the variables and coordinates is random. The current __repr__ can look like this:

<xarray.Dataset> Dimensions: (band: 1, x: 4, y: 3) Coordinates: * x (x) float64 1.0 1.5 2.0 2.5 * band (band) int64 1 * y (y) float64 2.0 1.0 0.0 Data variables: lat (y, x) float64 2.0 2.0 2.0 2.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 raster (band, y, x) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 ... lon (y, x) float64 1.0 1.5 2.0 2.5 1.0 1.5 2.0 2.5 1.0 1.5 2.0 2.5 Attributes: crs: CRS({'wktext': True, 'proj': 'longlat', 'no_defs': True, 'ellps': 'WGS84'}) transform: [1.0, 0.5, 0.0, 2.0, 0.0, -1.0] How is this possible?

  1. Is it possible to make the x and y coords lazy? Could you point me on a place in the code where this has been done already?

  2. I'd like to have the lon and lat variables listed as optional coordinates, and also make them lazy. Any hint?

Thanks a lot for your help, I'm afraid this is going to need a few iterations ;-)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 186.34ms · About: xarray-datasette
  • Sort ascending
  • Sort descending
  • Facet by this
  • Hide this column
  • Show all columns
  • Show not-blank rows