id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 348462356,MDExOlB1bGxSZXF1ZXN0MjA2ODA3Mjkz,2351,Remove redundant code from open_rasterio and ensure all transform tuples are six elements long,296686,closed,0,,,2,2018-08-07T19:48:39Z,2018-08-13T22:34:18Z,2018-08-13T22:33:54Z,CONTRIBUTOR,,0,pydata/xarray/pulls/2351," - [x] Closes #2348 - [x] Tests added (for all bug fixes or enhancements) - [x] Tests passed (for all non-documentation changes) - [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later) This removes the redundant code that ended up with the `transform` attribute being set twice - and being set to a nine-element long tuple rather than the correct six-element long tuple. It also adds tests to ensure that all `transform` attributes are six-element-long tuples. I haven't made any changes to the documentation, as I wasn't sure if it was needed. This could potentially affect users as the documentation and the code differed and people may have written other interface code (as, in my case, code to export a DataArray to a GeoTIFF using rasterio) which relies on the transform element having 9 elements rather than the 6 it is meant to have. 
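For reference, the shape of the fix is simple: the last row of the 3x3 affine matrix is always (0, 0, 1), so only the first six coefficients carry information and the stored attribute is just a truncation. A minimal sketch, with made-up coefficient values:

``` python
# Hypothetical nine-element tuple of the kind previously stored by
# open_rasterio; the trailing (0.0, 0.0, 1.0) is the constant bottom
# row of the 3x3 affine matrix.
full = (300.0, 0.0, 101985.0, 0.0, -300.0, 2826915.0, 0.0, 0.0, 1.0)
transform = full[:6]  # the six-element form the docs describe
assert len(transform) == 6
assert full[6:] == (0.0, 0.0, 1.0)
```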
Any thoughts?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2351/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 348081353,MDU6SXNzdWUzNDgwODEzNTM=,2348,Should the transform attribute be a six-element or nine-element tuple when reading from rasterio?,296686,closed,0,,,2,2018-08-06T21:03:43Z,2018-08-13T22:33:54Z,2018-08-13T22:33:54Z,CONTRIBUTOR,,,,"My basic question is whether XArray should be storing the rasterio transform as a 6-element tuple or a 9-element tuple - as there seems to be a mismatch between the documentation and the actual code. The documentation at https://github.com/pydata/xarray/blob/7cd3442fc61e94601c3bfb20377f4f795cde584d/xarray/backends/rasterio_.py#L164-L170 says you can run the following code:

```
from affine import Affine
da = xr.open_rasterio('path_to_file.tif')
transform = Affine(*da.attrs['transform'])
```

This takes the tuple stored in the `transform` attribute and uses it as the arguments to the `Affine` class. However, running this gives an error: `TypeError: Expected 6 coefficients, found 9`. If you look at the code, then this line in the `open_rasterio` function sets the `transform` attribute to be a 6-element tuple - the first 6 elements of the full Affine tuple: https://github.com/pydata/xarray/blob/7cd3442fc61e94601c3bfb20377f4f795cde584d/xarray/backends/rasterio_.py#L249. However, about twenty lines later, another chunk of code looks to see if there is a transform attribute on the rasterio dataset and if so, sets the `transform` attribute to be the full Affine tuple (that is, a 9-element tuple): https://github.com/pydata/xarray/blob/7cd3442fc61e94601c3bfb20377f4f795cde584d/xarray/backends/rasterio_.py#L262-L268 Thus there seems to be confusion both within the code and the documentation as to whether the transform should be a six-element or nine-element tuple. Which is the intended behaviour? 
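To make the mismatch concrete: `Affine` accepts exactly six coefficients and nothing else. Here is a stand-in sketch of that check (the real class lives in the `affine` package; this only mirrors its arity validation, and the coefficient values are made up):

``` python
def make_affine(*coeffs):
    # Mirrors affine.Affine's argument check: exactly six coefficients,
    # with the bottom row (0, 0, 1) of the 3x3 matrix implied.
    if len(coeffs) != 6:
        raise TypeError('Expected 6 coefficients, found %d' % len(coeffs))
    return coeffs + (0.0, 0.0, 1.0)

six = (300.0, 0.0, 101985.0, 0.0, -300.0, 2826915.0)
make_affine(*six)                          # fine
try:
    make_affine(*(six + (0.0, 0.0, 1.0)))  # the nine-element attribute
except TypeError as err:
    print(err)                             # Expected 6 coefficients, found 9
```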
I am happy to submit a PR to fix either the code or the docs or both. #### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
xarray: 0.10.8
pandas: 0.23.4
numpy: 1.14.2
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: 0.6.1
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.2
distributed: 1.22.1
matplotlib: 2.2.2
cartopy: None
seaborn: None
setuptools: 40.0.0
pip: 18.0
conda: None
pytest: None
IPython: 6.5.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2348/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 165151235,MDU6SXNzdWUxNjUxNTEyMzU=,897,Allow specification of figsize in plot methods,296686,closed,0,,,1,2016-07-12T18:45:26Z,2016-12-18T22:43:19Z,2016-12-18T22:43:19Z,CONTRIBUTOR,,,,"Pandas allows a call to `plot` like: ``` python df.plot(x='var1', y='var2', figsize=(10, 6)) ``` but this doesn't seem to be possible in xarray - but it would be handy if it were. It looks like fixing this would require modifying the `@_plot2d` decorator, specifically around https://github.com/pydata/xarray/blob/master/xarray/plot/plot.py#L376 (although I can't seem to find how to do the equivalent of `gca()` but set a `figsize` too). Any thoughts or ideas? ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/897/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 173632183,MDExOlB1bGxSZXF1ZXN0ODMwMDc3MzY=,990,Added convenience method for saving DataArray to netCDF file,296686,closed,0,,,17,2016-08-28T06:30:32Z,2016-09-06T04:00:25Z,2016-09-06T04:00:06Z,CONTRIBUTOR,,0,pydata/xarray/pulls/990,"Added a simple function to DataArray that creates a dataset with one variable called 'data' and then saves it to a netCDF file. All parameters are passed through to to_netcdf(). Added an equivalent function called `open_dataarray` to be used to load from these files. Fixes #915. 
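For anyone reading along, the wrapping is equivalent to an explicit `to_dataset` call - a minimal sketch (the file round trip is commented out, as it needs a netCDF backend installed):

``` python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(6).reshape(2, 3), dims=('y', 'x'))
# to_netcdf on a DataArray wraps it in a one-variable Dataset internally;
# the equivalent explicit spelling:
ds = da.to_dataset(name='data')
assert list(ds.data_vars) == ['data']
# Round trip through a file (needs a netCDF backend such as netCDF4):
# da.to_netcdf('da.nc')
# back = xr.open_dataarray('da.nc')
```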
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/990/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 173640823,MDExOlB1bGxSZXF1ZXN0ODMwMTI0MzE=,991,Added validation of attrs before saving to netCDF files,296686,closed,0,,,6,2016-08-28T11:01:18Z,2016-09-02T22:52:09Z,2016-09-02T22:52:04Z,CONTRIBUTOR,,0,pydata/xarray/pulls/991,"This allows us to give nice errors if users try to save a Dataset with attr values that can't be written to a netCDF file. Fixes #911. I've added tests to `test_backends.py` as I can't see a better place to put them. I've also made the tests fairly extensive, but also used some helper functions to stop too much repetition. Please let me know if any of this doesn't fit within the xarray style. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/991/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 166511736,MDU6SXNzdWUxNjY1MTE3MzY=,911,KeyError on saving to NetCDF - due to objects in attrs?,296686,closed,0,,,3,2016-07-20T07:09:38Z,2016-09-02T22:52:04Z,2016-09-02T22:52:04Z,CONTRIBUTOR,,,,"I have an xarray.Dataset that I'm trying to save out to a NetCDF file. The dataset looks like this: ``` python Out[97]: Dimensions: (x: 1240, y: 1162) Coordinates: * x (x) float64 -9.476e+05 -9.464e+05 -9.451e+05 -9.439e+05 ... * y (y) float64 1.429e+06 1.428e+06 1.427e+06 1.426e+06 1.424e+06 ... Data variables: data (y, x) float32 nan nan nan nan nan nan nan nan nan nan nan nan ... ``` It has two attributes, both of which have a string key, and a value which is an object (in this case, instances of classes from the `rasterio` library). 
``` python
OrderedDict([('affine', Affine(1256.5430440955893, 0.0, -947639.6305106478, 0.0, -1256.5430440955893, 1429277.8120091767)), ('crs', CRS({'init': 'epsg:27700'}))])
```

When I try to save out the NetCDF using this code:

``` python
ds.to_netcdf('test.nc')
```

I get the following error:

```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
 in ()
----> 1 ds.to_netcdf('blah3.nc')

/Users/robin/anaconda3/lib/python3.5/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding)
    789         from ..backends.api import to_netcdf
    790         return to_netcdf(self, path, mode, format=format, group=group,
--> 791                          engine=engine, encoding=encoding)
    792
    793     dump = utils.function_alias(to_netcdf, 'dump')

/Users/robin/anaconda3/lib/python3.5/site-packages/xarray/backends/api.py in to_netcdf(dataset, path, mode, format, group, engine, writer, encoding)
    354     store = store_cls(path, mode, format, group, writer)
    355     try:
--> 356         dataset.dump_to_store(store, sync=sync, encoding=encoding)
    357         if isinstance(path, BytesIO):
    358             return path.getvalue()

/Users/robin/anaconda3/lib/python3.5/site-packages/xarray/core/dataset.py in dump_to_store(self, store, encoder, sync, encoding)
    735             variables, attrs = encoder(variables, attrs)
    736
--> 737         store.store(variables, attrs, check_encoding)
    738         if sync:
    739             store.sync()

/Users/robin/anaconda3/lib/python3.5/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set)
    226         cf_variables, cf_attrs = cf_encoder(variables, attributes)
    227         AbstractWritableDataStore.store(self, cf_variables, cf_attrs,
--> 228                                         check_encoding_set)

/Users/robin/anaconda3/lib/python3.5/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set)
    201                          if not (k in neccesary_dims and
    202                                  is_trivial_index(v)))
--> 203         self.set_variables(variables, check_encoding_set)
    204
    205     def set_attributes(self, attributes):

/Users/robin/anaconda3/lib/python3.5/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set)
    211             name = _encode_variable_name(vn)
    212             check = vn in check_encoding_set
--> 213             target, source = self.prepare_variable(name, v, check)
    214             self.writer.add(source, target)
    215

/Users/robin/anaconda3/lib/python3.5/site-packages/xarray/backends/netCDF4_.py in prepare_variable(self, name, variable, check_encoding)
    277             # set attributes one-by-one since netCDF4<1.0.10 can't handle
    278             # OrderedDict as the input to setncatts
--> 279             nc4_var.setncattr(k, v)
    280         return nc4_var, variable.data
    281

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.setncattr (netCDF4/_netCDF4.c:33460)()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4._set_att (netCDF4/_netCDF4.c:6171)()

/Users/robin/anaconda3/lib/python3.5/collections/__init__.py in __getitem__(self, key)
    967         if hasattr(self.__class__, ""__missing__""):
    968             return self.__class__.__missing__(self, key)
--> 969         raise KeyError(key)
    970     def __setitem__(self, key, item): self.data[key] = item
    971     def __delitem__(self, key): del self.data[key]

KeyError: 0
```

The error seems slightly strange to me, but it seems to be related to saving attributes. If I change the attributes to make all of the values strings (for example, using `ds['data'].attrs = {k: repr(v) for k, v in ds['data'].attrs.items()}`) then it saves out fine. Is there a restriction on what sort of values can be stored in `attrs` and saved out to NetCDF? If so, should this be enforced somehow? It would be ideal if any object could be stored as an attr and saved out (eg. as a pickle) - but this may be difficult (for example, for multiple python versions, if using pickle). Any thoughts? 
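The `repr` workaround can be generalised into a small helper - a sketch only (the set of netCDF-writable types below is a rough approximation, not xarray's actual validation logic, and `sanitize_attrs`/`Fake` are hypothetical names):

``` python
def sanitize_attrs(attrs):
    # Keep values netCDF can store (strings, numbers, sequences of
    # numbers) and fall back to repr() for anything else, e.g. the
    # Affine and CRS objects attached by rasterio-based code.
    writable = (str, bytes, int, float, list, tuple)
    return {k: (v if isinstance(v, writable) else repr(v))
            for k, v in attrs.items()}

class Fake(object):
    def __repr__(self):
        return 'Fake()'

clean = sanitize_attrs({'units': 'm', 'count': 3, 'crs': Fake()})
print(clean)  # {'units': 'm', 'count': 3, 'crs': 'Fake()'}
```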
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/911/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 166642852,MDU6SXNzdWUxNjY2NDI4NTI=,913,dtype changes after .load(),296686,closed,0,,,4,2016-07-20T17:56:35Z,2016-07-21T00:49:02Z,2016-07-21T00:49:02Z,CONTRIBUTOR,,,,"I've found that in some situations a `DataArray` using dask as the storage backend will report its `dtype` as `float32`, but then once the data has been loaded (eg. with `load()`) the `dtype` changes to `float64`. This surprised me, and actually caught me out in a few situations where I was writing code to export a DataArray to a custom file format (where the metadata specification for the custom format needed to know the `dtype` but then complained when the actual `dtype` was different). Is this desired behaviour, or a bug? (Or somewhere in between...?). This only seems to occur with dask-backed DataArrays, and not 'normal' DataArrays. **Example:** Create the example netCDF file like this:

``` python
xa = xr.DataArray(data=np.random.rand(10, 10).astype(np.float32))
xa.to_dataset(name='data').to_netcdf('test.nc')
```

Then doing some simple operations with normal DataArrays:

``` python
normal_data = xr.open_dataset('test.nc')['data']
normal_data.dtype  # => float32
normal_data.mean(dim='dim_0').dtype  # => float32
```

But doing the same thing in dask:

``` python
dask_data = xr.open_dataset('test.nc', chunks={'dim_0': 2})['data']
dask_data.mean(dim='dim_0').dtype  # => float32
dask_data.mean(dim='dim_0').load().dtype  # => float64
```

","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/913/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue