id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
272004812,MDU6SXNzdWUyNzIwMDQ4MTI=,1699,apply_ufunc(dask='parallelized') output_dtypes for datasets,6213168,open,0,,,8,2017-11-07T22:18:23Z,2020-04-06T15:31:17Z,,MEMBER,,,,"When a Dataset has variables with different dtypes, there's no way to tell apply_ufunc that the same function applied to different variables will produce different dtypes:

```
ds1 = xarray.Dataset(data_vars={'a': ('x', [1, 2]), 'b': ('x', [3.0, 4.5])}).chunk()
ds2 = xarray.apply_ufunc(lambda x: x + 1, ds1, dask='parallelized', output_dtypes=[float])
ds2
<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    a        (x) float64 dask.array
    b        (x) float64 dask.array

ds2.compute()
<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    a        (x) int64 2 3
    b        (x) float64 4.0 5.5
```

### Proposed solution
When the output is a dataset, apply_ufunc could accept either ``output_dtypes=[t]`` (if all output variables will have the same dtype) or ``output_dtypes=[{var1: t1, var2: t2, ...}]``. In the example above, it would be ``output_dtypes=[{'a': int, 'b': float}]``.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1699/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
193294729,MDU6SXNzdWUxOTMyOTQ3Mjk=,1152,Scalar coords seep into index coords,6213168,closed,0,,,8,2016-12-03T15:43:53Z,2019-02-01T16:02:12Z,2019-02-01T16:02:12Z,MEMBER,,,,"Is this by design? I can't make any sense of it:

```
>> a = xarray.DataArray([1, 2, 3], dims=['x'], coords={'x': [1, 2, 3], 'y': 10})
>> a.coords['x']
<xarray.DataArray 'x' (x: 3)>
array([1, 2, 3])
Coordinates:
  * x        (x) int64 1 2 3
    y        int64 10
```","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1152/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
317421267,MDU6SXNzdWUzMTc0MjEyNjc=,2079,New feature: interp1d,6213168,closed,0,,,8,2018-04-24T22:45:03Z,2018-05-06T19:30:32Z,2018-05-06T19:30:32Z,MEMBER,,,,"I've written a series of wrappers for the 1-dimensional scipy interpolators. Prototype code and colourful demo plots: https://gist.github.com/crusaderky/b0aa6b8fdf6e036cb364f6f40476cc67

# Features
- Interpolate an ND array on any arbitrary dimension
- Nearest-neighbour, linear, quadratic, cubic, Akima, PCHIP, and custom interpolators are supported
- dask supported on both the interpolated array and x_new
- Supports ND x_new arrays
- The CPU-heavy interpolator generation (splrep) is executed only once and then can be applied to multiple x_new (splev)
- Pickleable and distributed-friendly

# Design hacks
- Depends on the dask module, even when all inputs are based on plain numpy.
- Abuses attrs and the ability to invoke a.attrname to get the user experience of a new DataArray method.
- Abuses the fact that the chunks of a ``dask.array.Array`` can contain anything and you won't notice until you compute them.

# Limitations
- Can't dump to netcdf. Not solvable without hacking into the implementation details of scipy.
- Datasets are not supported. Trivial to fix after solving #1699.
- Chunks are not supported on x_new. Trivial to fix after solving #1995.
- Chunks are not supported along the interpolated dimension. This is very complicated to solve.
If x and x_new were always monotonic ascending, it would be (not trivially) solvable with dask.array.ghost.ghost. If you make no assumptions about monotonicity, things become way more complicated. A solution would need to go in the dask module, and then be invoked trivially from here with dask='allowed'.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2079/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
253349435,MDExOlB1bGxSZXF1ZXN0MTM3OTYwNDEw,1532,Avoid computing dask variables on __repr__ and __getattr__,6213168,closed,0,,2415632,8,2017-08-28T14:37:20Z,2017-09-21T22:30:02Z,2017-09-21T20:55:43Z,MEMBER,,0,pydata/xarray/pulls/1532,"- [x] Fixes #1522
- [x] Tests added / passed
- [x] Passes ``git diff upstream/master | flake8 --diff``
- [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API

Stop dataset data vars and non-index dataset/dataarray coords from being loaded by repr() and getattr(). The latter is particularly acute when working in Jupyter, which does a dozen or so getattr() when printing an object.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1532/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
252543868,MDU6SXNzdWUyNTI1NDM4Njg=,1522,Dataset.__repr__ computes dask variables,6213168,closed,0,,,8,2017-08-24T09:37:12Z,2017-09-21T20:55:43Z,2017-09-21T20:55:43Z,MEMBER,,,,"DataArray.\_\_repr\_\_ and Variable.\_\_repr\_\_ print a placeholder if the data uses the dask backend. Not so Dataset.\_\_repr\_\_, which tries computing the data before printing a tiny preview of it.

This issue is extremely annoying when working in Jupyter, and particularly acute if the chunks are very big or are at the end of a very long chain of computation.

For data variables, the expected behaviour is to print a placeholder just like DataArray does. For coords, we could either
- print a placeholder (same treatment as data variables)
- automatically invoke load() when the coord is added to the dataset - see #1521 for discussion.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1522/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue