id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
690518703,MDU6SXNzdWU2OTA1MTg3MDM=,4399,"Dask gufunc kwarg ""output_sizes"" is not deep copied",23618263,closed,0,,,2,2020-09-01T23:41:47Z,2020-09-04T15:57:19Z,2020-09-04T15:57:19Z,CONTRIBUTOR,,,,"
**What happened**:

Defining the kwargs used in `xr.apply_ufunc` in a separate dictionary and reusing it across multiple calls of the method, while using `dask=""parallelized""`, ends in an error, since the dimension names in `output_sizes` (inside `dask_gufunc_kwargs`) are modified internally.

**What you expected to happen**:

The kwargs dictionary should remain unmodified between calls.

**Minimal Complete Verifiable Example**:

```python
import numpy as np
import xarray as xr


def dummy1(data, nfft):
    return data[..., (nfft // 2) + 1 :] * 2


def dummy2(data, nfft):
    return data[..., (nfft // 2) + 1 :] / 2


def xoperations(xarr, **kwargs):
    ufunc_kwargs = dict(
        kwargs=kwargs,
        input_core_dims=[[""time""]],
        output_core_dims=[[""freq""]],
        dask=""parallelized"",
        output_dtypes=[np.float],
        dask_gufunc_kwargs=dict(output_sizes={""freq"": int(kwargs[""nfft""] / 2) + 1}),
    )

    ans1 = xr.apply_ufunc(dummy1, xarr, **ufunc_kwargs)
    ans2 = xr.apply_ufunc(dummy2, xarr, **ufunc_kwargs)

    return ans1, ans2


test = xr.DataArray(
    4, coords=[(""time"", np.arange(1000)), (""lon"", np.arange(160, 300, 10))]
).chunk({""time"": -1, ""lon"": 10})

xoperations(test, nfft=1024)
```

This returns

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in
     32 ).chunk({""time"": -1, ""lon"": 10})
     33
---> 34 xoperations(test, nfft=1024)

in xoperations(xarr, **kwargs)
     23
     24     ans1 = xr.apply_ufunc(dummy1, xarr, **ufunc_kwargs)
---> 25     ans2 = xr.apply_ufunc(dummy2, xarr, **ufunc_kwargs)
     26
     27     return ans1, ans2
~/GitLab/xarray_test/xarray/xarray/core/computation.py in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs, *args)
   1086             join=join,
   1087             exclude_dims=exclude_dims,
-> 1088             keep_attrs=keep_attrs,
   1089         )
   1090     # feed Variables directly through apply_variable_ufunc

~/GitLab/xarray_test/xarray/xarray/core/computation.py in apply_dataarray_vfunc(func, signature, join, exclude_dims, keep_attrs, *args)
    260
    261     data_vars = [getattr(a, ""variable"", a) for a in args]
--> 262     result_var = func(*data_vars)
    263
    264     if signature.num_outputs > 1:

~/GitLab/xarray_test/xarray/xarray/core/computation.py in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, vectorize, keep_attrs, dask_gufunc_kwargs, *args)
    632             if key not in signature.all_output_core_dims:
    633                 raise ValueError(
--> 634                     f""dimension '{key}' in 'output_sizes' must correspond to output_core_dims""
    635                 )
    636             output_sizes_renamed[signature.dims_map[key]] = value

ValueError: dimension 'dim0' in 'output_sizes' must correspond to output_core_dims
```

It is easily verifiable by sneaking a `print` statement before and after calling the first `apply_ufunc`. Everything is the same except the dimension names in `output_sizes`:

```python
{'kwargs': {'nfft': 1024}, 'input_core_dims': [['time']], 'output_core_dims': [['freq']], 'dask': 'parallelized', 'output_dtypes': [<class 'float'>], 'dask_gufunc_kwargs': {'output_sizes': {'freq': 513}}}
{'kwargs': {'nfft': 1024}, 'input_core_dims': [['time']], 'output_core_dims': [['freq']], 'dask': 'parallelized', 'output_dtypes': [<class 'float'>], 'dask_gufunc_kwargs': {'output_sizes': {'dim0': 513}}}
```

**Anything else we need to know?**:

I have a fork with a fix ready to be sent as a PR.
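The leak is just standard Python aliasing of nested dicts; a minimal sketch (hypothetical dict values, mimicking the internal renaming of `freq` to `dim0`) shows why a plain shallow copy of the outer kwargs is not enough:

```python
import copy

# Hypothetical stand-in for the nested kwargs dict passed to apply_ufunc.
def make_kwargs():
    return {'dask_gufunc_kwargs': {'output_sizes': {'freq': 513}}}

# Shallow copy of the outer dict: the inner dicts are still shared, so
# the internal dimension renaming leaks back to the caller.
caller = make_kwargs()
shallow = copy.copy(caller)
shallow['dask_gufunc_kwargs']['output_sizes'] = {'dim0': 513}
print(caller['dask_gufunc_kwargs']['output_sizes'])  # {'dim0': 513} -- mutated!

# Deep copy: the caller's dict survives untouched.
caller = make_kwargs()
deep = copy.deepcopy(caller)
deep['dask_gufunc_kwargs']['output_sizes'] = {'dim0': 513}
print(caller['dask_gufunc_kwargs']['output_sizes'])  # {'freq': 513} -- intact
```

Any copy that duplicates the `dask_gufunc_kwargs` level before it is modified would avoid the error above.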
I just imported the `copy` module and used `deepcopy` like this

```python
dask_gufunc_kwargs = copy.deepcopy(dask_gufunc_kwargs)
```

around here

https://github.com/pydata/xarray/blob/2acd0fc6563c3ad57f16e6ee804d592969419938/xarray/core/computation.py#L1013-L1020

If it's good enough then I can send the PR.

**Environment**:
Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: 2acd0fc6563c3ad57f16e6ee804d592969419938
python: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 3.12.74-60.64.40-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.4

xarray: 0.14.2.dev337+g2acd0fc6
pandas: 1.1.1
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.3
pydap: installed
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: 1.5.5
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: installed
rasterio: 1.1.5
cfgrib: 0.9.8.4
iris: 2.4.0
bottleneck: 1.3.2
dask: 2.25.0
distributed: 2.25.0
matplotlib: 3.3.1
cartopy: 0.18.0
seaborn: 0.10.1
numbagg: installed
pint: 0.15
setuptools: 49.6.0.post20200814
pip: 20.2.2
conda: None
pytest: 6.0.1
IPython: 7.18.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4399/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
690769335,MDExOlB1bGxSZXF1ZXN0NDc3NjEzNTQ5,4402,Use a copy of dask_gufunc_kwargs,23618263,closed,0,,,1,2020-09-02T06:52:07Z,2020-09-04T15:57:19Z,2020-09-04T15:57:19Z,CONTRIBUTOR,,0,pydata/xarray/pulls/4402,"
- [x] Closes #4399
- [ ] Tests added
- [x] Passes `isort . && black . && mypy . && flake8`
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [ ] New functions/methods are listed in `api.rst`

Following the suggestion of @kmuehlbauer, I am using just a shallow copy since a deep one is not required.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4402/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
537144956,MDExOlB1bGxSZXF1ZXN0MzUyNTUzNDkz,3615,Minor docstring fixes,23618263,closed,0,,,1,2019-12-12T18:33:27Z,2019-12-12T19:13:41Z,2019-12-12T18:48:50Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3615,"Really minor docstring fixes: added an 's' at the end of `kwarg`, and deleted ` Default n = 5` from the `thin` method's docstring since it doesn't have a default value.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3615/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
493765130,MDExOlB1bGxSZXF1ZXN0MzE3NjU2NzEz,3309,Fix DataArray api doc,23618263,closed,0,,,2,2019-09-15T17:36:46Z,2019-09-15T21:22:30Z,2019-09-15T20:27:31Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3309,"
Seems like I forgot to point `head`, `tail` and `thin` in the right direction in the `DataArray` API documentation.","{""url"":
""https://api.github.com/repos/pydata/xarray/issues/3309/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
491324262,MDExOlB1bGxSZXF1ZXN0MzE1NzE0MjYz,3298,"Accept int value in head, thin and tail",23618263,closed,0,,,7,2019-09-09T21:00:41Z,2019-09-15T07:05:58Z,2019-09-14T21:46:16Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3298,"
Related #3278

This PR makes the methods `head`, `thin` and `tail` for both `DataArray` and `Dataset` accept a single integer value as a parameter. If no parameter is given, it defaults to 5.

- [x] Tests added
- [x] Passes `black . && mypy . && flake8`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3298/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
488812619,MDExOlB1bGxSZXF1ZXN0MzEzNzU3MjIw,3278,"Add head, tail and thin methods",23618263,closed,0,,,6,2019-09-03T20:41:42Z,2019-09-05T05:49:36Z,2019-09-05T04:22:24Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3278,"
I feel like there's room for improvement in the docstrings; any change or suggestion is welcome!

- [x] Closes #319
- [x] Tests added
- [x] Passes `black . && mypy . && flake8`
- [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3278/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
486153978,MDExOlB1bGxSZXF1ZXN0MzExNjYzOTg2,3271,Raise proper error for scalar array when coords is a dict,23618263,closed,0,,,3,2019-08-28T04:29:02Z,2019-08-29T17:23:20Z,2019-08-29T17:09:00Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3271,"
As explained here https://github.com/pydata/xarray/pull/3159#discussion_r316230281, when a user uses a scalar array to build a `DataArray` with `coords` given as a dictionary, the error is not self-explanatory.

```python
>>> xr.DataArray(np.array(1), coords={'x': np.arange(4), 'y': 'a'}, dims=['x'])
...
KeyError: 'x'
```

This PR makes sure that when `data` is a scalar array and `dims` is not empty, the shape is set to `(0,)` so that it fails with the proper error message:

```python
>>> xr.DataArray(np.array(1), coords={'x': np.arange(4), 'y': 'a'}, dims=['x'])
...
ValueError: conflicting sizes for dimension 'x': length 0 on the data but length 4 on coordinate 'x'
```

- [x] Test updated
- [x] Passes `black . && mypy . && flake8`
- [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API (is this needed for a change like this?)
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3271/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
472100381,MDExOlB1bGxSZXF1ZXN0MzAwNTc2Nzc4,3159,Initialize empty or full DataArray,23618263,closed,0,,,20,2019-07-24T06:21:50Z,2019-08-27T16:28:04Z,2019-08-26T20:36:36Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3159,"
I attempted to implement what has been asked for in #277 as an effort to contribute to this project.
This PR adds the ability to initialize a DataArray with a constant value, including `np.nan`. Also, if `data = None` then it is initialized with `np.empty` to take advantage of its speed for big arrays.

```python
>>> foo = xr.DataArray(None, coords=[range(3), range(4)])
>>> foo
<xarray.DataArray (dim_0: 3, dim_1: 4)>
array([[4.673257e-310, 0.000000e+000, 0.000000e+000, 0.000000e+000],
       [0.000000e+000, 0.000000e+000, 0.000000e+000, 0.000000e+000],
       [0.000000e+000, 0.000000e+000, 0.000000e+000, 0.000000e+000]])
Coordinates:
  * dim_0    (dim_0) int64 0 1 2
  * dim_1    (dim_1) int64 0 1 2 3
```

- [x] Closes #878, #277
- [x] Tests added
- [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API

Regarding the tests, I am not sure how to test the creation of an empty DataArray with `data=None`, since the values change between calls of `np.empty`. This is the reason I only added the test for the constant value.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3159/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull