id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
692016642,MDU6SXNzdWU2OTIwMTY2NDI=,4403,Add a callback/preprocess option to open_dataset,950575,closed,1,,,6,2020-09-03T14:21:58Z,2023-09-17T16:01:28Z,2023-09-17T16:01:28Z,CONTRIBUTOR,,,,"It is not uncommon to find datasets with bad metadata, like `gregorian_proleptic` instead of the expected `proleptic_gregorian` [1], that will prevent users from reading the full dataset. Ideally we could have functionality similar to iris' callbacks [2] to work around this. In fact, it looks like xarray already does something similar in `open_mfdataset` but not in `open_dataset`. Pinging @dcherian, who gave the idea of using `preprocess`, and @rsignell-usgs, who is a pro at finding bad metadata everywhere. [1] https://nbviewer.jupyter.org/gist/rsignell-usgs/27ba1fdeb934d6fd5b83abe43098a047 [2] https://scitools.org.uk/iris/docs/latest/userguide/navigating_a_cube.html?highlight=callback#adding-and-removing-metadata-to-the-cube-at-load-time","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4403/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
320283034,MDExOlB1bGxSZXF1ZXN0MTg1OTg5ODY1,2105,Deprecate decode timedelta,950575,closed,0,,,3,2018-05-04T13:50:33Z,2019-05-17T13:48:30Z,2019-05-17T13:48:30Z,CONTRIBUTOR,,0,pydata/xarray/pulls/2105,"- [X] Closes #1621 (remove if there is no corresponding issue, which should only be the case for minor changes)
- [X] Tests added (for all bug fixes or enhancements)
- [X] Tests passed (for all non-documentation changes)
- [X] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API (remove if this change should not be visible to users, e.g., if it is an internal
clean-up, or if this is part of a larger project that will be documented later) I'll add tests, docs, and the whats-new entry later if I'm on the right path here. xref: #843, #940, and #2085. See http://nbviewer.jupyter.org/gist/ocefpaf/e736c07faf3d4c9361ecf546a692c2cd","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2105/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
235687353,MDU6SXNzdWUyMzU2ODczNTM=,1452,Expected S1 dtype in datarray but got float64,950575,closed,0,,,2,2017-06-13T20:45:47Z,2017-09-04T20:13:39Z,2017-09-04T20:13:38Z,CONTRIBUTOR,,,,"Not sure if the dataset is pathological or if the problem is in `xarray`. `netCDF4 1.2.4` correctly returns dtype `S1`, but `xarray 0.9.6` returns `float64` and then fails to open the dataset. (I am also having issues loading this variable with `netCDF4 >1.2.4`.) ```python In [1]: import xarray as xr from netCDF4 import Dataset url = 'http://geoport.whoi.edu/thredds/dodsC/usgs/vault0/models/tides/vdatum_gulf_of_maine/adcirc54_38_orig.nc' nc = Dataset(url) ds = xr.open_dataset(url) In [2]: nc.variables['tidenames'].dtype Out[2]: dtype('S1') In [3]: ds['tidenames'].dtype Out[3]: dtype('float64') In [4]: ds['tidenames'] Out[4]: --------------------------------------------------------------------------- ValueError Traceback (most recent call last) lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj) 691 type_pprinters=self.type_printers, 692 deferred_pprinters=self.deferred_printers) --> 693 printer.pretty(obj) 694 printer.flush() 695 return stream.getvalue() lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj) 378 if callable(meth): 379 return meth(obj, self, cycle) --> 380 return _default_pprint(obj, self, cycle) 381 finally: 382 self.end_group() lib/python3.6/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle) 493 if
_safe_getattr(klass, '__repr__', None) is not object.__repr__: 494 # A user-provided repr. Find newlines and replace them with p.break_() --> 495 _repr_pprint(obj, p, cycle) 496 return 497 p.begin_group(1, '<') lib/python3.6/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle) 691 """"""A pprint that just redirects to the normal repr function."""""" 692 # Find newlines and replace them with p.break_() --> 693 output = repr(obj) 694 for idx,output_line in enumerate(output.splitlines()): 695 if idx: lib/python3.6/site-packages/xarray/core/common.py in __repr__(self) 95 96 def __repr__(self): ---> 97 return formatting.array_repr(self) 98 99 def _iter(self): lib/python3.6/site-packages/xarray/core/formatting.py in array_repr(arr) 384 summary.append(repr(arr.data)) 385 elif arr._in_memory or arr.size < 1e5: --> 386 summary.append(short_array_repr(arr.values)) 387 else: 388 summary.append(u'[%s values with dtype=%s]' % (arr.size, arr.dtype)) lib/python3.6/site-packages/xarray/core/dataarray.py in values(self) 401 def values(self): 402 """"""The array's data as a numpy.ndarray"""""" --> 403 return self.variable.values 404 405 @values.setter lib/python3.6/site-packages/xarray/core/variable.py in values(self) 327 def values(self): 328 """"""The variable's data as a numpy.ndarray"""""" --> 329 return _as_array_or_item(self._data) 330 331 @values.setter lib/python3.6/site-packages/xarray/core/variable.py in _as_array_or_item(data) 203 TODO: remove this (replace with np.asarray) once these issues are fixed 204 """""" --> 205 data = np.asarray(data) 206 if data.ndim == 0: 207 if data.dtype.kind == 'M': lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order) 480 481 """""" --> 482 return array(a, dtype, copy=False, order=order) 483 484 def asanyarray(a, dtype=None, order=None): lib/python3.6/site-packages/xarray/core/indexing.py in __array__(self, dtype) 425 426 def __array__(self, dtype=None): --> 427 self._ensure_cached() 428 return 
np.asarray(self.array, dtype=dtype) 429 lib/python3.6/site-packages/xarray/core/indexing.py in _ensure_cached(self) 422 def _ensure_cached(self): 423 if not isinstance(self.array, np.ndarray): --> 424 self.array = np.asarray(self.array) 425 426 def __array__(self, dtype=None): lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order) 480 481 """""" --> 482 return array(a, dtype, copy=False, order=order) 483 484 def asanyarray(a, dtype=None, order=None): lib/python3.6/site-packages/xarray/core/indexing.py in __array__(self, dtype) 406 407 def __array__(self, dtype=None): --> 408 return np.asarray(self.array, dtype=dtype) 409 410 def __getitem__(self, key): lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order) 480 481 """""" --> 482 return array(a, dtype, copy=False, order=order) 483 484 def asanyarray(a, dtype=None, order=None): lib/python3.6/site-packages/xarray/core/indexing.py in __array__(self, dtype) 373 def __array__(self, dtype=None): 374 array = orthogonally_indexable(self.array) --> 375 return np.asarray(array[self.key], dtype=None) 376 377 def __getitem__(self, key): lib/python3.6/site-packages/xarray/conventions.py in __getitem__(self, key) 365 def __getitem__(self, key): 366 return mask_and_scale(self.array[key], self.fill_value, --> 367 self.scale_factor, self.add_offset, self._dtype) 368 369 def __repr__(self): lib/python3.6/site-packages/xarray/conventions.py in mask_and_scale(array, fill_value, scale_factor, add_offset, dtype) 61 """""" 62 # by default, cast to float to ensure NaN is meaningful ---> 63 values = np.array(array, dtype=dtype, copy=True) 64 if fill_value is not None and not np.all(pd.isnull(fill_value)): 65 if getattr(fill_value, 'size', 1) > 1: ValueError: could not convert string to float: 'STEADY ' ``` I will try to investigate this later this week.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1452/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, 
""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
170698635,MDExOlB1bGxSZXF1ZXN0ODA5OTU3NDE=,962,Two minor docs fixes,950575,closed,0,,,0,2016-08-11T17:18:28Z,2016-08-11T21:40:47Z,2016-08-11T21:40:46Z,CONTRIBUTOR,,0,pydata/xarray/pulls/962,"~~I am not sure how to build the docs locally (yet) to test these changes.~~ --- Edit: I found the env file in the `docs` folder and tested the docs locally. These changes look fine in the HTML.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/962/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
169276671,MDExOlB1bGxSZXF1ZXN0ODAwMDg4NTU=,940,Don't convert time data to timedelta by default,950575,closed,0,,,6,2016-08-04T02:19:36Z,2016-08-11T16:15:05Z,2016-08-11T16:15:05Z,CONTRIBUTOR,,0,pydata/xarray/pulls/940,"I don't really like this PR... too much change for such a simple thing. I may try again soon. Closes #843","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/940/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
153066635,MDU6SXNzdWUxNTMwNjY2MzU=,843,Don't convert data with time units to timedeltas by default,950575,closed,0,,,6,2016-05-04T17:10:01Z,2016-08-11T16:14:28Z,2016-08-11T16:14:28Z,CONTRIBUTOR,,,,"Don't convert data with time units to `timedelta`s by default. Most of the time this behavior is not desirable (e.g., wave period data). @shoyer suggests: > possibly we should add an explicit toggle for decoding `timedelta`s vs `datetime`s.
xref: https://github.com/pydata/xarray/pull/842#issuecomment-216933269","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/843/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
153126324,MDExOlB1bGxSZXF1ZXN0Njg5NDAxMDI=,844,Add a filter_by_attrs method to Dataset,950575,closed,0,,,27,2016-05-04T22:08:07Z,2016-08-03T17:53:43Z,2016-08-03T17:53:42Z,CONTRIBUTOR,,0,pydata/xarray/pulls/844,"This PR adds a `get_variables_by_attributes` method similar to the ones in the `netCDF4-python` and netcdf-java libraries. It is useful for filtering a Dataset down to known/expected attributes. @shoyer I don't really like the docs or the changelog entry I created. I will look at them again tomorrow with fresh eyes to see if I can improve them. Closes https://github.com/pydata/xarray/issues/567 xref: https://github.com/pydata/xarray/issues/567#issuecomment-216947679","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/844/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
152888663,MDExOlB1bGxSZXF1ZXN0Njg3ODU0NzM=,842,Fix #665 decode_cf_timedelta 2D,950575,closed,0,,,8,2016-05-03T22:26:34Z,2016-05-14T00:36:12Z,2016-05-04T17:11:59Z,CONTRIBUTOR,,0,pydata/xarray/pulls/842,"Long-time listener, first-time caller :wink: I am not 100% sure about this PR, though. I think that there are cases when we need the actual data rather than the `timedelta`. In [this](http://nbviewer.jupyter.org/gist/ocefpaf/6ed33fb35fe526f677e215b3fb304847) notebook we have wave period (`'mper'`) that should be _seconds_ ranging `0-30`, not those big numpy timedelta numbers. I know that I can keep the raw values with `decode_times=False` when opening the dataset, but then the `time` coordinate does not get decoded either. (Maybe I am way off and there is a way to do this.)
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/842/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
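
The first record above (issue #4403) asks for a `preprocess`-style hook in `open_dataset` to repair bad metadata such as a misspelled `gregorian_proleptic` calendar. A minimal sketch of the workaround pattern the issue alludes to, using the existing `preprocess` hook of `xarray.open_mfdataset`: open with time decoding disabled, fix the attribute, then decode. The helper name `fix_calendar` is hypothetical; only `xr.decode_cf` and the `preprocess` keyword are documented xarray APIs.

```python
# Sketch (not part of the CSV data above): repair a misspelled ``calendar``
# attribute before CF decoding, as discussed in pydata/xarray issue #4403.
import numpy as np
import xarray as xr


def fix_calendar(ds: xr.Dataset) -> xr.Dataset:
    """Rename the bad calendar attribute, then apply normal CF decoding."""
    for var in ds.variables.values():
        if var.attrs.get("calendar") == "gregorian_proleptic":
            var.attrs["calendar"] = "proleptic_gregorian"
    return xr.decode_cf(ds)


# With open_mfdataset the hook runs on every file before the merge, e.g.:
# ds = xr.open_mfdataset("files/*.nc", decode_times=False, preprocess=fix_calendar)
```

Since `open_dataset` never grew this hook before the issue was closed, the same repair can be applied manually: `xr.decode_cf(fixed)` after opening with `decode_times=False`.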