id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 842436143,MDU6SXNzdWU4NDI0MzYxNDM=,5081,Lazy indexing arrays as a stand-alone package,1217238,open,0,,,6,2021-03-27T07:06:03Z,2023-12-15T13:20:03Z,,MEMBER,,,,"From @rabernat on [Twitter](https://twitter.com/rabernat/status/1330707155742322689): > ""Xarray has some secret private classes for lazily indexing / wrapping arrays that are so useful I think they should be broken out into a standalone package. https://github.com/pydata/xarray/blob/master/xarray/core/indexing.py#L516"" The idea here is to create a first-class ""duck array"" library for lazy indexing that could replace xarray's internal classes for lazy indexing. This would be in some ways similar to dask.array, but much simpler, because it doesn't have to worry about parallel computing. Desired features: - Lazy indexing - Lazy transposes - Lazy concatenation (#4628) and stacking - Lazy vectorized operations (e.g., unary and binary arithmetic) - needed for decoding variables from disk (`xarray.encoding`) and - building lazy multi-dimensional coordinate arrays corresponding to map projections (#3620) - Maybe: lazy reshapes (#4113) A common feature of these operations is they can (and almost always should) be _fused_ with indexing: if N elements are selected via indexing, only O(N) compute and memory is required to produce them, regardless of the size of the original arrays, as long as the number of applied operations can be treated as a constant. Memory access is significantly slower than compute on modern hardware, so recomputing these operations on the fly is almost always a good idea. Out of scope: lazy computation when indexing could require access to many more elements to compute the desired value than are returned. For example, `mean()` probably should not be lazy, because that could involve computation of a very large number of elements that one might want to cache. This is valuable functionality for Xarray for two reasons: 1. It allows for ""previewing"" small bits of data loaded from disk or remote storage, even if that data needs some form of cheap ""decoding"" from its form on disk. 2. It allows for xarray to decode data in a lazy fashion that is compatible with full-featured systems for lazy computation (e.g., Dask), without requiring the user to choose dask when reading the data. 
Related issues: - [Proposal] Expose Variable without Pandas dependency #3981 - Lazy concatenation of arrays #4628 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5081/reactions"", ""total_count"": 6, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 6, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 253395960,MDU6SXNzdWUyNTMzOTU5NjA=,1533,Index variables loaded from dask can be computed twice,1217238,closed,0,,,6,2017-08-28T17:18:27Z,2023-04-06T04:15:46Z,2023-04-06T04:15:46Z,MEMBER,,,,as reported by @crusaderky in #1522 ,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1533/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 948890466,MDExOlB1bGxSZXF1ZXN0NjkzNjY1NDEy,5624,Make typing-extensions optional,1217238,closed,0,,,6,2021-07-20T17:43:22Z,2021-07-22T23:30:49Z,2021-07-22T23:02:03Z,MEMBER,,0,pydata/xarray/pulls/5624,"Type checking may be a little worse if typing-extensions are not installed, but I don't think it's worth the trouble of adding another hard dependency just for one use of TypeGuard. Note: sadly this doesn't work yet. Mypy (and pylance) don't like the type alias defined with try/except. Any ideas? In the worst case, we could revert the TypeGuard entirely, but that would be a shame... - [x] Closes #5495 - [x] Passes `pre-commit run --all-files` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5624/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 715374721,MDU6SXNzdWU3MTUzNzQ3MjE=,4490,Group together decoding options into a single argument,1217238,open,0,,,6,2020-10-06T06:15:18Z,2020-10-29T04:07:46Z,,MEMBER,,,,"**Is your feature request related to a problem? Please describe.** `open_dataset()` currently has a _very_ long function signature. This makes it hard to keep track of everything it can do, and is particularly problematic for the authors of _new_ backends (e.g., see https://github.com/pydata/xarray/pull/4477), which might need to know how to handle all these arguments. **Describe the solution you'd like** To simplify the interface, I propose to group together all the decoding options into a new `DecodingOptions` class. I'm thinking something like: ```python from dataclasses import dataclass, field, asdict from typing import Optional, List @dataclass(frozen=True) class DecodingOptions: mask: Optional[bool] = None scale: Optional[bool] = None datetime: Optional[bool] = None timedelta: Optional[bool] = None use_cftime: Optional[bool] = None concat_characters: Optional[bool] = None coords: Optional[bool] = None drop_variables: Optional[List[str]] = None @classmethod def disabled(cls): return cls(mask=False, scale=False, datetime=False, timedelta=False, concat_characters=False, coords=False) def non_defaults(self): return {k: v for k, v in asdict(self).items() if v is not None} # add another method for creating default Variable Coder() objects, # e.g., those listed in encode_cf_variable() ``` The signature of `open_dataset` would then become: ```python def open_dataset( filename_or_obj, group=None, *, engine=None, chunks=None, lock=None, cache=None, backend_kwargs=None, decode: Union[DecodingOptions, bool] = None, **deprecated_kwargs ): if decode is None: decode = DecodingOptions() if decode is False: decode = DecodingOptions.disabled() # handle deprecated_kwargs... ... 
``` **Question**: are `decode` and `DecodingOptions` the right names? Maybe these should still include the name ""CF"", e.g., `decode_cf` and `CFDecodingOptions`, given that these are specific to CF conventions? **Note**: the current signature is `open_dataset(filename_or_obj, group=None, decode_cf=True, mask_and_scale=None, decode_times=True, autoclose=None, concat_characters=True, decode_coords=True, engine=None, chunks=None, lock=None, cache=None, drop_variables=None, backend_kwargs=None, use_cftime=None, decode_timedelta=None)` Usage with the new interface would look like `xr.open_dataset(filename, decode=False)` or `xr.open_dataset(filename, decode=xr.DecodingOptions(mask=False, scale=False))`. This requires a _little_ bit more typing than what we currently have, but it has a few advantages: 1. It's easier to understand the role of different arguments. Now there is a function with ~8 arguments and a class with ~8 arguments rather than a function with ~15 arguments. 2. It's easier to add new decoding arguments (e.g., for more advanced CF conventions), because they don't clutter the `open_dataset` interface. For example, I separated out `mask` and `scale` arguments, versus the current `mask_and_scale` argument. 3. If a new backend plugin for `open_dataset()` needs to handle every option supported by `open_dataset()`, this makes that task significantly easier. The only decoding options they need to worry about are _non-default_ options that were explicitly set, i.e., those exposed by the `non_defaults()` method. If another decoding option wasn't explicitly set and isn't recognized by the backend, they can just ignore it. **Describe alternatives you've considered** For the overall approach: 1. We could keep the current design, with separate keyword arguments for decoding options, and just be very careful about passing around these arguments. This seems pretty painful for the backend refactor, though. 2. We could keep the current design only for the user facing `open_dataset()` interface, and then internally convert into the `DecodingOptions()` struct for passing to backend constructors. This would provide much needed flexibility for backend authors, but most users wouldn't benefit from the new interface. Perhaps this would make sense as an intermediate step? ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4490/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 702372014,MDExOlB1bGxSZXF1ZXN0NDg3NjYxMzIz,4426,Fix for h5py deepcopy issues,1217238,closed,0,,,6,2020-09-16T01:11:00Z,2020-09-18T22:31:13Z,2020-09-18T22:31:09Z,MEMBER,,0,pydata/xarray/pulls/4426," - [x] Closes #4425 - [x] Tests added - [x] Passes `isort . && black . && mypy . 
&& flake8` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4426/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 417542619,MDU6SXNzdWU0MTc1NDI2MTk=,2803,Test failure with TestValidateAttrs.test_validating_attrs,1217238,closed,0,,,6,2019-03-05T23:03:02Z,2020-08-25T14:29:19Z,2019-03-14T15:59:13Z,MEMBER,,,,"This is due to setting multi-dimensional attributes being an error, as of the latest netCDF4-Python release: https://github.com/Unidata/netcdf4-python/blob/master/Changelog E.g., as seen on Appveyor: https://ci.appveyor.com/project/shoyer/xray/builds/22834250/job/9q0ip6i3cchlbkw2 ``` ================================== FAILURES =================================== ___________________ TestValidateAttrs.test_validating_attrs ___________________ self = def test_validating_attrs(self): def new_dataset(): return Dataset({'data': ('y', np.arange(10.0))}, {'y': np.arange(10)}) def new_dataset_and_dataset_attrs(): ds = new_dataset() return ds, ds.attrs def new_dataset_and_data_attrs(): ds = new_dataset() return ds, ds.data.attrs def new_dataset_and_coord_attrs(): ds = new_dataset() return ds, ds.coords['y'].attrs for new_dataset_and_attrs in [new_dataset_and_dataset_attrs, new_dataset_and_data_attrs, new_dataset_and_coord_attrs]: ds, attrs = new_dataset_and_attrs() attrs[123] = 'test' with raises_regex(TypeError, 'Invalid name for attr'): ds.to_netcdf('test.nc') ds, attrs = new_dataset_and_attrs() attrs[MiscObject()] = 'test' with raises_regex(TypeError, 'Invalid name for attr'): ds.to_netcdf('test.nc') ds, attrs = new_dataset_and_attrs() attrs[''] = 'test' with raises_regex(ValueError, 'Invalid name for attr'): ds.to_netcdf('test.nc') # This one should work ds, attrs = new_dataset_and_attrs() attrs['test'] = 'test' with create_tmp_file() as tmp_file: ds.to_netcdf(tmp_file) ds, attrs = new_dataset_and_attrs() attrs['test'] = {'a': 5} with raises_regex(TypeError, 'Invalid value for attr'): ds.to_netcdf('test.nc') ds, attrs = new_dataset_and_attrs() attrs['test'] = MiscObject() with raises_regex(TypeError, 'Invalid value for attr'): ds.to_netcdf('test.nc') ds, attrs = new_dataset_and_attrs() attrs['test'] = 5 with create_tmp_file() as tmp_file: ds.to_netcdf(tmp_file) ds, attrs = new_dataset_and_attrs() attrs['test'] = 3.14 with create_tmp_file() as tmp_file: ds.to_netcdf(tmp_file) ds, attrs = new_dataset_and_attrs() attrs['test'] = [1, 2, 3, 4] with create_tmp_file() as tmp_file: ds.to_netcdf(tmp_file) ds, attrs = new_dataset_and_attrs() attrs['test'] = (1.9, 2.5) with create_tmp_file() as tmp_file: ds.to_netcdf(tmp_file) ds, attrs = new_dataset_and_attrs() attrs['test'] = np.arange(5) with create_tmp_file() as tmp_file: ds.to_netcdf(tmp_file) ds, attrs = new_dataset_and_attrs() attrs['test'] = np.arange(12).reshape(3, 4) with create_tmp_file() as tmp_file: > ds.to_netcdf(tmp_file) xarray\tests\test_backends.py:3450: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ xarray\core\dataset.py:1323: in to_netcdf compute=compute) xarray\backends\api.py:767: in to_netcdf unlimited_dims=unlimited_dims) xarray\backends\api.py:810: in dump_to_store unlimited_dims=unlimited_dims) xarray\backends\common.py:262: in store self.set_attributes(attributes) xarray\backends\common.py:278: in set_attributes self.set_attribute(k, v) xarray\backends\netCDF4_.py:418: in set_attribute _set_nc_attribute(self.ds, key, value) xarray\backends\netCDF4_.py:294: in 
_set_nc_attribute obj.setncattr(key, value) netCDF4\_netCDF4.pyx:2781: in netCDF4._netCDF4.Dataset.setncattr ??? _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > ??? E ValueError: multi-dimensional array attributes not supported netCDF4\_netCDF4.pyx:1514: ValueError ```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2803/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 398107776,MDU6SXNzdWUzOTgxMDc3NzY=,2666,Dataset.from_dataframe will produce a FutureWarning for DatetimeTZ data,1217238,open,0,,,6,2019-01-11T02:45:49Z,2019-12-30T22:58:23Z,,MEMBER,,,,"This appears with the development version of pandas; see https://github.com/pandas-dev/pandas/issues/24716 for details. Example: ``` In [16]: df = pd.DataFrame({""A"": pd.date_range('2000', periods=12, tz='US/Central')}) In [17]: df.to_xarray() /Users/taugspurger/Envs/pandas-dev/lib/python3.7/site-packages/xarray/core/dataset.py:3111: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'. To accept the future behavior, pass 'dtype=object'. To keep the old behavior, pass 'dtype=""datetime64[ns]""'. data = np.asarray(series).reshape(shape) Out[17]: Dimensions: (index: 12) Coordinates: * index (index) int64 0 1 2 3 4 5 6 7 8 9 10 11 Data variables: A (index) datetime64[ns] 2000-01-01T06:00:00 ... 2000-01-12T06:00:00 ```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2666/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 269348789,MDU6SXNzdWUyNjkzNDg3ODk=,1668,Remove use of allow_cleanup_failure in test_backends.py,1217238,open,0,,,6,2017-10-28T20:47:31Z,2019-09-29T20:07:03Z,,MEMBER,,,,"This exists for the benefit of Windows, on which trying to delete an open file results in an error. But really, it would be nice to have a test suite that doesn't leave any temporary files hanging around. The main culprit is tests like this, where opening a file triggers an error: ```python with raises_regex(TypeError, 'pip install netcdf4'): open_dataset(tmp_file, engine='scipy') ``` The way to fix this is to use mocking of some sort, to intercept calls to backend file objects and close them afterwards.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1668/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 278713328,MDU6SXNzdWUyNzg3MTMzMjg=,1756,Deprecate inplace methods,1217238,closed,0,,2856429,6,2017-12-02T20:09:00Z,2019-03-25T19:19:10Z,2018-11-03T21:24:13Z,MEMBER,,,,"The following methods have an `inplace` argument: `DataArray.reset_coords` `DataArray.set_index` `DataArray.reset_index` `DataArray.reorder_levels` `Dataset.set_coords` `Dataset.reset_coords` `Dataset.rename` `Dataset.swap_dims` `Dataset.set_index` `Dataset.reset_index` `Dataset.reorder_levels` `Dataset.update` `Dataset.merge` As proposed in https://github.com/pydata/xarray/issues/1755#issuecomment-348682403, let's deprecate all of these at the next major release (v0.11). They add unnecessary complexity to methods and promote confusing about xarray's data model. 
Practically, we would change all of the default values to `inplace=None` and issue either a `DeprecationWarning` or `FutureWarning` (see [PEP 565](https://www.python.org/dev/peps/pep-0565/#additional-use-case-for-futurewarning) for more details on that choice).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1756/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 395332265,MDExOlB1bGxSZXF1ZXN0MjQxODExMjc4,2642,Use pycodestyle for lint checks.,1217238,closed,0,,,6,2019-01-02T18:11:38Z,2019-03-14T06:27:20Z,2019-01-03T18:10:13Z,MEMBER,,0,pydata/xarray/pulls/2642,"flake8 includes a few more useful checks, but it's annoying to only see it's output in Travis-CI results. This keeps Travis-CI and pep8speaks in sync. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2642/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 369310993,MDU6SXNzdWUzNjkzMTA5OTM=,2480,test_apply_dask_new_output_dimension is broken on master with dask-dev,1217238,closed,0,,,6,2018-10-11T21:24:33Z,2018-10-12T16:26:17Z,2018-10-12T16:26:17Z,MEMBER,,,,"Example build failure: https://travis-ci.org/pydata/xarray/jobs/439949937 ``` =================================== FAILURES =================================== _____________________ test_apply_dask_new_output_dimension _____________________ @requires_dask def test_apply_dask_new_output_dimension(): import dask.array as da array = da.ones((2, 2), chunks=(1, 1)) data_array = xr.DataArray(array, dims=('x', 'y')) def stack_negative(obj): def func(x): return np.stack([x, -x], axis=-1) return apply_ufunc(func, obj, output_core_dims=[['sign']], dask='parallelized', output_dtypes=[obj.dtype], output_sizes={'sign': 2}) expected = stack_negative(data_array.compute()) actual = stack_negative(data_array) assert actual.dims == ('x', 'y', 'sign') assert actual.shape == (2, 2, 2) assert isinstance(actual.data, da.Array) > assert_identical(expected, actual) xarray/tests/test_computation.py:737: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ xarray/tests/test_computation.py:24: in assert_identical assert a.identical(b), msg xarray/core/dataarray.py:1923: in identical self._all_compat(other, 'identical')) xarray/core/dataarray.py:1875: in _all_compat compat(self, other)) xarray/core/dataarray.py:1872: in compat return getattr(x.variable, compat_str)(y.variable) xarray/core/variable.py:1461: in identical self.equals(other)) xarray/core/variable.py:1439: in equals equiv(self.data, other.data))) xarray/core/duck_array_ops.py:144: in array_equiv arr1, arr2 = as_like_arrays(arr1, arr2) xarray/core/duck_array_ops.py:128: in as_like_arrays return tuple(np.asarray(d) for d in data) xarray/core/duck_array_ops.py:128: in return tuple(np.asarray(d) for d in data) ../../../miniconda/envs/test_env/lib/python3.6/site-packages/numpy/core/numeric.py:501: in asarray return array(a, dtype, copy=False, order=order) ../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/core.py:1118: in __array__ x = self.compute() ../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/base.py:156: in compute (result,) = compute(self, traverse=False, **kwargs) ../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/base.py:390: in compute dsk = collections_to_dsk(collections, optimize_graph, **kwargs) 
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/base.py:194: in collections_to_dsk for opt, (dsk, keys) in groups.items()])) ../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/base.py:194: in for opt, (dsk, keys) in groups.items()])) ../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/optimization.py:41: in optimize dsk = ensure_dict(dsk) ../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/utils.py:830: in ensure_dict result.update(dd) ../../../miniconda/envs/test_env/lib/python3.6/_collections_abc.py:720: in __iter__ yield from self._mapping ../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/top.py:168: in __iter__ return iter(self._dict) ../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/top.py:160: in _dict concatenate=self.concatenate ../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/top.py:305: in top keytups = list(itertools.product(*[range(dims[i]) for i in out_indices])) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ .0 = > keytups = list(itertools.product(*[range(dims[i]) for i in out_indices])) E KeyError: '.0' ../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/top.py:305: KeyError ``` My guess is that this is somehow related to @mrocklin's recent refactor of dask.array.atop: https://github.com/dask/dask/pull/3998 If the cause isn't obvious, I'll try to come up with a simple dask only example that reproduces it.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2480/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 361915770,MDU6SXNzdWUzNjE5MTU3NzA=,2424,0.10.9 release,1217238,closed,0,,,6,2018-09-19T20:31:29Z,2018-09-26T01:05:09Z,2018-09-22T15:14:48Z,MEMBER,,,,"It's now been two months since the 0.10.8 release, so we really ought to issue a new minor release. I was initially thinking of skipping straight to 0.11.0 if we include https://github.com/pydata/xarray/pull/2261 (xarray.backends refactor), but it seems that will take a bit longer to review/test so it's probably worth issuing a 0.10.9 release first. @pydata/xarray -- are there any PRs / bug-fixes in particular we should wait for before issuing the release? I suppose it would be good to sort out https://github.com/pydata/xarray/issues/2422 (Plot2D no longer sorts coordinates before plotting)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2424/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 302153432,MDExOlB1bGxSZXF1ZXN0MTcyNzYxNTAw,1962,Support __array_ufunc__ for xarray objects.,1217238,closed,0,,,6,2018-03-05T02:36:20Z,2018-03-12T20:31:07Z,2018-03-12T20:31:07Z,MEMBER,,0,pydata/xarray/pulls/1962,"This means NumPy ufuncs are now supported directly on xarray.Dataset objects, and opens the door to supporting computation on new data types, such as sparse arrays or arrays with units. 
- [x] Closes #1617 (remove if there is no corresponding issue, which should only be the case for minor changes) - [x] Tests added (for all bug fixes or enhancements) - [x] Tests passed (for all non-documentation changes) - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later) ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1962/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 195579837,MDU6SXNzdWUxOTU1Nzk4Mzc=,1164,Don't warn when doing comparisons or arithmetic with NaN,1217238,closed,0,,,6,2016-12-14T16:33:05Z,2018-02-27T19:35:25Z,2018-02-27T16:03:43Z,MEMBER,,,,"Pandas used to unilaterally disable NumPy's warnings for doing comparisons with NaN, but now it doesn't: https://github.com/pandas-dev/pandas/issues/13109 See also http://stackoverflow.com/questions/41130138/why-is-invalid-value-encountered-in-greater-warning-thrown-in-python-xarray-fo/41147570#41147570 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1164/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 171828347,MDU6SXNzdWUxNzE4MjgzNDc=,974,Indexing with alignment and broadcasting,1217238,closed,0,,741199,6,2016-08-18T06:39:27Z,2018-02-04T23:30:12Z,2018-02-04T23:30:11Z,MEMBER,,,,"I think we can bring all of NumPy's advanced indexing to xarray in a very consistent way, with only very minor breaks in backwards compatibility. For _boolean indexing_: - `da[key]` where `key` is a boolean labelled array (with _any_ number of dimensions) is made equivalent to `da.where(key.reindex_like(ds), drop=True)`. This matches the existing behavior if `key` is a 1D boolean array. For multi-dimensional arrays, even though the result is now multi-dimensional, this coupled with automatic skipping of NaNs means that `da[key].mean()` gives the same result as in NumPy. - `da[key] = value` where `key` is a boolean labelled array can be made equivalent to `da = da.where(*align(key.reindex_like(da), value.reindex_like(da)))` (that is, the three argument form of `where`). - `da[key_0, ..., key_n]` where all of `key_i` are boolean arrays gets handled in the usual way. It is an `IndexingError` to supply multiple labelled keys if any of them are not already aligned with as the corresponding index coordinates (and share the same dimension name). If they want alignment, we suggest users simply write `da[key_0 & ... & key_n]`. For _vectorized indexing_ (by integer or index value): - `da[key_0, ..., key_n]` where all of `key_i` are integer labelled arrays with any number of dimensions gets handled like NumPy, except instead of broadcasting numpy-style we do broadcasting xarray-style: - If any of `key_i` are unlabelled, 1D arrays (e.g., numpy arrays), we convert them into an `xarray.Variable` along the respective dimension. 0D arrays remain scalars. This ensures that the result of broadcasting them (in the next step) will be consistent with our current ""outer indexing"" behavior. Unlabelled higher dimensional arrays triggers an `IndexingError`. - We ensure all keys have the same dimensions/coordinates by mapping it to `da[*broadcast(key_0, ..., key_n)]` (note that broadcast now includes automatic alignment). 
- The result's dimensions and coordinates are copied from the broadcast keys. - The result's values are taken by mapping each set of integer locations specified by the broadcast version of `key_i` to the integer position on the corresponding `i`th axis on `da`. - Labeled indexing like `ds.loc[key_0, ...., key_n]` works exactly as above, except instead of doing integer lookup, we lookup label values in the corresponding index instead. - Indexing with `.isel` and `.sel`/`.reindex` works like the two previous cases, except we lookup axes by dimension name instead of axis position. - I haven't fully thought through the implications for assignment (`da[key] = value` or `da.loc[key] = value`), but I think it works in a straightforwardly similar fashion. All of these methods should also work for indexing on `Dataset` by looping over Dataset variables in the usual way. This framework neatly subsumes most of the major limitations with xarray's existing indexing: - Boolean indexing on multi-dimensional arrays works in an intuitive way, for both selection and assignment. - No more need for specialized methods (`sel_points`/`isel_points`) for pointwise indexing. If you want to select along the diagonal of an array, you simply need to supply indexers that use a new dimension. Instead of `arr.sel_points(lat=stations.lat, lon=stations.lon, dim='station')`, you would simply write `arr.sel(lat=stations.lat, lon=stations.lon)` -- the `station` dimension is taken automatically from the indexer. - Other use cases for NumPy's advanced indexing that currently are impossible in xarray also automatically work. For example, nearest neighbor interpolation to a completely different grid is now as simple as `ds.reindex(lon=grid.lon, lat=grid.lat, method='nearest', tolerance=0.5)` or `ds.reindex_like(grid, method='nearest', tolerance=0.5)`. Questions to consider: - How does this interact with @benbovy's enhancements for MultiIndex indexing? (#802 and #947) - How do we handle mixed slice and array indexing? In NumPy, this is a [major source of confusion](https://github.com/numpy/numpy/pull/6256), because slicing is done before broadcasting and the order of slices in the result is handled separately from broadcast indices. I think we may be able to resolve this by mapping slices in this case to 1D arrays along their respective axes, and using our normal broadcasting rules. - Should we deprecate non-boolean indexing with `[]` and `.loc[]` and non-labelled arrays when some but not all dimensions are provided? Instead, we would require explicitly indexing like `[key, ...]` (yes, writing `...`), which indicates ""all trailing axes"" like NumPy. This behavior has been suggested for new indexers in NumPy because it precludes a class of bugs where the array has an unexpected number of dimensions. On the other hand, it's not so necessary for us when we have explicit indexing by dimension name with `.sel`. xref [these](https://github.com/pydata/xarray/pull/964#issuecomment-239469432) [comments](https://github.com/pydata/xarray/pull/964#issuecomment-239506907) from @MaximilianR and myself Note: I would _certainly_ welcome help making this happen from a contributor other than myself, though you should probably wait until I finish #964, first, which lays important groundwork. 
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/974/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 274308380,MDU6SXNzdWUyNzQzMDgzODA=,1720,Possible regression with PyNIO data not being lazily loaded,1217238,closed,0,,,6,2017-11-15T21:20:41Z,2017-11-17T17:33:13Z,2017-11-17T16:44:40Z,MEMBER,,,,"@weathergod reports on the mailing list: > I just tried [0.10.0 rc2] out in combination with the pynio engine (v1.5.0 from conda-forge), and doing a print on a dataset object causes all of the data to get loaded into memory. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1720/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 207587161,MDU6SXNzdWUyMDc1ODcxNjE=,1269,GroupBy like API for resample,1217238,closed,0,,,6,2017-02-14T17:46:02Z,2017-09-22T16:27:35Z,2017-09-22T16:27:35Z,MEMBER,,,,"Since we wrote `resample` in xarray, pandas updated resample to have a groupyby-like API (e.g., `df.resample('24H').mean()` vs. the old `df.resample('24H')` that uses the mean by default). It would be nice to redo the xarray resample API to match, e.g., `ds.resample(time='24H').mean()` vs `ds.resample('time', '24H')`. This would solve a few use cases, including grouped-resample arithmetic, iterating over groups and (mostly) take care of the need for `pd.TimeGrouper` support (https://github.com/pydata/xarray/issues/364). If we use `**kwargs` for matching dimension names, this could be done with a minimally painful deprecation cycle.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1269/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 88075523,MDU6SXNzdWU4ODA3NTUyMw==,432,Tools for converting between xray.Dataset and nested dictionaries/JSON,1217238,closed,0,,,6,2015-06-13T22:25:28Z,2016-08-11T21:54:51Z,2016-08-11T21:54:51Z,MEMBER,,,,"This came up in discussion with @freeman-lab -- xray does not have direct support for converting datasets to or from nested dictionaries (i.e., as could be serialized in JSON). This is quite straightforward to implement oneself, of course, but there's something to be said for making this more obvious. I'm thinking of a serialization format that looks something like this: ``` { 'variables': { 'temperature': { 'dimensions': ['x'], 'data': [1, 2, 3], 'attributes': {} } ... } 'attributes': { 'title': 'My example dataset', ... } } ``` The solution here would be to either: 1. add a few examples to the IO documentation of how to roll this one-self, or 2. create a few helper methods/functions to make this even easier: `xray.Dataset.to_dict`/`xray.read_dict`. 
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/432/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 104768781,MDExOlB1bGxSZXF1ZXN0NDQxNDkyOTQ=,559,Fix pcolormesh plots with cartopy,1217238,closed,0,,1307323,6,2015-09-03T19:50:22Z,2015-11-15T21:49:11Z,2015-09-14T20:33:36Z,MEMBER,,0,pydata/xarray/pulls/559,"``` python proj = ccrs.Orthographic(central_longitude=230, central_latitude=5) fig, ax = plt.subplots(figsize=(20, 8), subplot_kw=dict(projection=proj)) x.plot.pcolormesh(ax=ax, transform=ccrs.PlateCarree()) ax.coastlines() ``` Before: ![image](https://cloud.githubusercontent.com/assets/1217238/9669158/5149e908-523a-11e5-8d2a-c48326115174.png) After: ![image](https://cloud.githubusercontent.com/assets/1217238/9669163/556316fe-523a-11e5-8322-d7f92e551d65.png) ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/559/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull