id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
692016642,MDU6SXNzdWU2OTIwMTY2NDI=,4403,Add a callback/preprocess option to open_dataset,950575,closed,1,,,6,2020-09-03T14:21:58Z,2023-09-17T16:01:28Z,2023-09-17T16:01:28Z,CONTRIBUTOR,,,,"It is not uncommon to find datasets with bad metadata, like `gregorian_proleptic` instead of the expected `proleptic_gregorian` [1], that will prevent users from reading the full dataset. Ideally we could have functionality similar to iris' callbacks [2] to work around this. In fact, it looks like xarray already does something similar in `open_mfdataset` but not in `open_dataset`. Pinging @dcherian, who gave the idea of using `preprocess`, and @rsignell-usgs, who is a pro at finding bad metadata everywhere. [1] https://nbviewer.jupyter.org/gist/rsignell-usgs/27ba1fdeb934d6fd5b83abe43098a047 [2] https://scitools.org.uk/iris/docs/latest/userguide/navigating_a_cube.html?highlight=callback#adding-and-removing-metadata-to-the-cube-at-load-time","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4403/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
320283034,MDExOlB1bGxSZXF1ZXN0MTg1OTg5ODY1,2105,Deprecate decode timedelta,950575,closed,0,,,3,2018-05-04T13:50:33Z,2019-05-17T13:48:30Z,2019-05-17T13:48:30Z,CONTRIBUTOR,,0,pydata/xarray/pulls/2105,"- [X] Closes #1621 (remove if there is no corresponding issue, which should only be the case for minor changes)
- [X] Tests added (for all bug fixes or enhancements)
- [X] Tests passed (for all non-documentation changes)
- [X] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API (remove if this change should not be visible to users, e.g., if it is an internal
clean-up, or if this is part of a larger project that will be documented later) I'll add tests, docs, and the whats-new entry later if I'm on the right path here. xref: #843, #940, and #2085. See http://nbviewer.jupyter.org/gist/ocefpaf/e736c07faf3d4c9361ecf546a692c2cd","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2105/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
235687353,MDU6SXNzdWUyMzU2ODczNTM=,1452,Expected S1 dtype in datarray but got float64,950575,closed,0,,,2,2017-06-13T20:45:47Z,2017-09-04T20:13:39Z,2017-09-04T20:13:38Z,CONTRIBUTOR,,,,"Not sure if the dataset is pathological or if the problem is in `xarray`. `netCDF4 1.2.4` correctly returns dtype `S1`, but `xarray 0.9.6` returns `float64` and then fails to open the dataset. (I am also having issues loading this variable with `netCDF4 >1.2.4`.) ```python In [1]: import xarray as xr from netCDF4 import Dataset url = 'http://geoport.whoi.edu/thredds/dodsC/usgs/vault0/models/tides/vdatum_gulf_of_maine/adcirc54_38_orig.nc' nc = Dataset(url) ds = xr.open_dataset(url) In [2]: nc.variables['tidenames'].dtype Out[2]: dtype('S1') In [3]: ds['tidenames'].dtype Out[3]: dtype('float64') In [4]: ds['tidenames'] Out[4]: --------------------------------------------------------------------------- ValueError Traceback (most recent call last) lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj) 691 type_pprinters=self.type_printers, 692 deferred_pprinters=self.deferred_printers) --> 693 printer.pretty(obj) 694 printer.flush() 695 return stream.getvalue() lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj) 378 if callable(meth): 379 return meth(obj, self, cycle) --> 380 return _default_pprint(obj, self, cycle) 381 finally: 382 self.end_group() lib/python3.6/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle) 493 if
_safe_getattr(klass, '__repr__', None) is not object.__repr__: 494 # A user-provided repr. Find newlines and replace them with p.break_() --> 495 _repr_pprint(obj, p, cycle) 496 return 497 p.begin_group(1, '<') lib/python3.6/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle) 691 """"""A pprint that just redirects to the normal repr function."""""" 692 # Find newlines and replace them with p.break_() --> 693 output = repr(obj) 694 for idx,output_line in enumerate(output.splitlines()): 695 if idx: lib/python3.6/site-packages/xarray/core/common.py in __repr__(self) 95 96 def __repr__(self): ---> 97 return formatting.array_repr(self) 98 99 def _iter(self): lib/python3.6/site-packages/xarray/core/formatting.py in array_repr(arr) 384 summary.append(repr(arr.data)) 385 elif arr._in_memory or arr.size < 1e5: --> 386 summary.append(short_array_repr(arr.values)) 387 else: 388 summary.append(u'[%s values with dtype=%s]' % (arr.size, arr.dtype)) lib/python3.6/site-packages/xarray/core/dataarray.py in values(self) 401 def values(self): 402 """"""The array's data as a numpy.ndarray"""""" --> 403 return self.variable.values 404 405 @values.setter lib/python3.6/site-packages/xarray/core/variable.py in values(self) 327 def values(self): 328 """"""The variable's data as a numpy.ndarray"""""" --> 329 return _as_array_or_item(self._data) 330 331 @values.setter lib/python3.6/site-packages/xarray/core/variable.py in _as_array_or_item(data) 203 TODO: remove this (replace with np.asarray) once these issues are fixed 204 """""" --> 205 data = np.asarray(data) 206 if data.ndim == 0: 207 if data.dtype.kind == 'M': lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order) 480 481 """""" --> 482 return array(a, dtype, copy=False, order=order) 483 484 def asanyarray(a, dtype=None, order=None): lib/python3.6/site-packages/xarray/core/indexing.py in __array__(self, dtype) 425 426 def __array__(self, dtype=None): --> 427 self._ensure_cached() 428 return 
np.asarray(self.array, dtype=dtype) 429 lib/python3.6/site-packages/xarray/core/indexing.py in _ensure_cached(self) 422 def _ensure_cached(self): 423 if not isinstance(self.array, np.ndarray): --> 424 self.array = np.asarray(self.array) 425 426 def __array__(self, dtype=None): lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order) 480 481 """""" --> 482 return array(a, dtype, copy=False, order=order) 483 484 def asanyarray(a, dtype=None, order=None): lib/python3.6/site-packages/xarray/core/indexing.py in __array__(self, dtype) 406 407 def __array__(self, dtype=None): --> 408 return np.asarray(self.array, dtype=dtype) 409 410 def __getitem__(self, key): lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order) 480 481 """""" --> 482 return array(a, dtype, copy=False, order=order) 483 484 def asanyarray(a, dtype=None, order=None): lib/python3.6/site-packages/xarray/core/indexing.py in __array__(self, dtype) 373 def __array__(self, dtype=None): 374 array = orthogonally_indexable(self.array) --> 375 return np.asarray(array[self.key], dtype=None) 376 377 def __getitem__(self, key): lib/python3.6/site-packages/xarray/conventions.py in __getitem__(self, key) 365 def __getitem__(self, key): 366 return mask_and_scale(self.array[key], self.fill_value, --> 367 self.scale_factor, self.add_offset, self._dtype) 368 369 def __repr__(self): lib/python3.6/site-packages/xarray/conventions.py in mask_and_scale(array, fill_value, scale_factor, add_offset, dtype) 61 """""" 62 # by default, cast to float to ensure NaN is meaningful ---> 63 values = np.array(array, dtype=dtype, copy=True) 64 if fill_value is not None and not np.all(pd.isnull(fill_value)): 65 if getattr(fill_value, 'size', 1) > 1: ValueError: could not convert string to float: 'STEADY ' ``` I will try to investigate this later this week.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1452/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, 
""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
170698635,MDExOlB1bGxSZXF1ZXN0ODA5OTU3NDE=,962,Two minor docs fixes,950575,closed,0,,,0,2016-08-11T17:18:28Z,2016-08-11T21:40:47Z,2016-08-11T21:40:46Z,CONTRIBUTOR,,0,pydata/xarray/pulls/962,"~~I am not sure how to build the docs locally (yet) to test these changes.~~ --- Edit: I found the env file in the `docs` folder and tested the docs locally. These changes look fine in the HTML.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/962/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
169276671,MDExOlB1bGxSZXF1ZXN0ODAwMDg4NTU=,940,Don't convert time data to timedelta by default,950575,closed,0,,,6,2016-08-04T02:19:36Z,2016-08-11T16:15:05Z,2016-08-11T16:15:05Z,CONTRIBUTOR,,0,pydata/xarray/pulls/940,"I don't really like this PR... too much change for such a simple thing. I may try again soon. Closes #843","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/940/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
153066635,MDU6SXNzdWUxNTMwNjY2MzU=,843,Don't convert data with time units to timedeltas by default,950575,closed,0,,,6,2016-05-04T17:10:01Z,2016-08-11T16:14:28Z,2016-08-11T16:14:28Z,CONTRIBUTOR,,,,"Don't convert data with time units to `timedelta`s by default. Most of the time this behavior is not desirable (e.g., wave period data). @shoyer suggests: > possibly we should add an explicit toggle for decoding `timedelta`s vs `datetime`s.
xref: https://github.com/pydata/xarray/pull/842#issuecomment-216933269","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/843/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
153126324,MDExOlB1bGxSZXF1ZXN0Njg5NDAxMDI=,844,Add a filter_by_attrs method to Dataset,950575,closed,0,,,27,2016-05-04T22:08:07Z,2016-08-03T17:53:43Z,2016-08-03T17:53:42Z,CONTRIBUTOR,,0,pydata/xarray/pulls/844,"This PR adds a `get_variables_by_attributes` method similar to the ones in the `netCDF4-python` and netcdf-java libraries. It is useful for filtering a Dataset down to known/expected attributes. @shoyer I don't really like the docs or the changelog entry I created. I will look at them again tomorrow with fresh eyes to see if I can improve them. Closes https://github.com/pydata/xarray/issues/567 xref: https://github.com/pydata/xarray/issues/567#issuecomment-216947679","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/844/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
152888663,MDExOlB1bGxSZXF1ZXN0Njg3ODU0NzM=,842,Fix #665 decode_cf_timedelta 2D,950575,closed,0,,,8,2016-05-03T22:26:34Z,2016-05-14T00:36:12Z,2016-05-04T17:11:59Z,CONTRIBUTOR,,0,pydata/xarray/pulls/842,"Long-time listener, first-time caller :wink: I am not 100% sure about this PR, though. I think that there are cases when we need the actual data rather than the `timedelta`. In [this](http://nbviewer.jupyter.org/gist/ocefpaf/6ed33fb35fe526f677e215b3fb304847) notebook we have wave period (`'mper'`) that should be _seconds_ ranging `0-30`, not those big numpy timedelta numbers. I know that I can keep the raw values with `decode_times=False` when opening the dataset, but then the `time` coordinate does not get decoded either. (Maybe I am way off and there is a way to do this.)
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/842/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
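
The first record above (issue #4403) asks for a `preprocess`-style hook in `open_dataset` to repair bad metadata such as a misspelled `gregorian_proleptic` calendar. A minimal sketch of the workaround pattern the issue alludes to, using the existing `preprocess` hook of `xarray.open_mfdataset`: open with time decoding disabled, fix the attribute, then decode. The helper name `fix_calendar` is hypothetical; only `xr.decode_cf` and the `preprocess` keyword are documented xarray APIs.

```python
# Sketch (not part of the CSV data above): repair a misspelled ``calendar``
# attribute before CF decoding, as discussed in pydata/xarray issue #4403.
import numpy as np
import xarray as xr


def fix_calendar(ds: xr.Dataset) -> xr.Dataset:
    """Rename the bad calendar attribute, then apply normal CF decoding."""
    for var in ds.variables.values():
        if var.attrs.get("calendar") == "gregorian_proleptic":
            var.attrs["calendar"] = "proleptic_gregorian"
    return xr.decode_cf(ds)


# With open_mfdataset the hook runs on every file before the merge, e.g.:
# ds = xr.open_mfdataset("files/*.nc", decode_times=False, preprocess=fix_calendar)
```

Since `open_dataset` never grew this hook before the issue was closed, the same repair can be applied manually: `xr.decode_cf(fixed)` after opening with `decode_times=False`.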