id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2275107296,I_kwDOAMm_X86Hm2Hg,8992,(i)loc slicer specialization for convenient slicing by dimension label as `.loc('dim_name')[:n]`,941907,open,0,,,0,2024-05-02T10:04:11Z,2024-05-02T14:47:09Z,,NONE,,,,"### Is your feature request related to a problem?

Until something like [PEP 472](https://legacy.python.org/dev/peps/pep-0472/) is accepted, we cannot do indexing with labeled dimension names inside brackets, much as I'm sure we would all love to. Here I'm proposing a slightly modified syntax which is possible to implement and would be quite convenient IMHO.

### Describe the solution you'd like

This is inspired by the Pandas `.loc(axis=n)` specialization. Essentially, the `.(i)loc` accessors would become callable like in Pandas, which would enable specifying the desired order of dimensions in the subsequent slicing brackets. Schematically,

```python
darr.loc('dim name 1', 'dim name 2')[x1:x2, y1:y2]
```

is equivalent to first returning an augmented `_LocIndexer` which associates positional indexes according to the provided dim order

```python
loc_idx_spec = darr.loc('dim name 1', 'dim name 2')
loc_idx_spec[x1:x2, y1:y2]
```

The first part is essentially similar to `.transpose('dim name 1', 'dim name 2')`, and in the case of a `DataArray` the latter could be used instead. But this syntax would work for `Dataset` as well. Additionally, it does not require an actual transpose operation. This accessor becomes especially convenient when you quickly want to index just one dimension, such as

```python
darr.loc('dim name')[:x2]
```

### Describe alternatives you've considered

The equivalent `darr.sel({'dim name 1': slice(x1, x2), 'dim name 2': slice(y1, y2)})` is admittedly not that much worse, but for me writing `slice` feels cumbersome, especially in situations where you have a lot of `None` specifications such as `slice(None, None, 2)`.

### Additional context

This `.loc(axis=n)` API is (not so obviously) documented for Pandas [here](https://pandas.pydata.org/docs/user_guide/advanced.html#using-slicers).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8992/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
489034521,MDU6SXNzdWU0ODkwMzQ1MjE=,3279,Feature request: vector cross product,941907,closed,0,,,2,2019-09-04T09:05:41Z,2021-12-29T07:54:37Z,2021-12-29T07:54:37Z,NONE,,,,"xarray currently has the `xarray.dot()` function for calculating arbitrary dot products, which is indeed very handy. Sometimes, especially for physical applications, I also need a vector cross product. I'm wondering whether you would be interested in having `xarray.cross` as a wrapper of [`numpy.cross`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.cross.html). I currently use the following implementation:

```python
def cross(a, b, spatial_dim, output_dtype=None):
    """"""xarray-compatible cross product

    Compatible with dask, parallelization uses a.dtype as output_dtype
    """"""
    # TODO find spatial dim default by looking for unique 3(or 2)-valued dim?
    for d in (a, b):
        if spatial_dim not in d.dims:
            raise ValueError('dimension {} not in {}'.format(spatial_dim, d))
        if d.sizes[spatial_dim] != 3:  # TODO handle 2-valued cases
            raise ValueError('dimension {} does not have length 3 in {}'.format(spatial_dim, d))
    if output_dtype is None:
        output_dtype = a.dtype  # TODO some better way to determine default?
    c = xr.apply_ufunc(np.cross, a, b,
                       input_core_dims=[[spatial_dim], [spatial_dim]],
                       output_core_dims=[[spatial_dim]],
                       dask='parallelized',
                       output_dtypes=[output_dtype],
                       )
    return c
```

#### Example usage

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.empty((10, 3)), dims=['line', 'cartesian'])
b = xr.full_like(a, 1)
c = cross(a, b, 'cartesian')
```

#### Main question

Do you want such a function (and possibly an associated `DataArray.cross` method) in the `xarray` namespace, or should it be in some other package? I didn't find a package which would be a good fit, as this is close to core numpy functionality and isn't as domain-specific as some geo packages. I'm not aware of any ""xrphysics"" package.

I could make a PR if you'd want to have it in `xarray` directly.
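As an illustrative sanity check (my own suggestion for how one might verify the wrapper, assuming the `cross` implementation above is in scope), the result should agree with plain `numpy.cross`:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.random.rand(10, 3), dims=['line', 'cartesian'])
b = xr.DataArray(np.random.rand(10, 3), dims=['line', 'cartesian'])

# apply_ufunc keeps 'cartesian' as the (last) core dimension, so the
# result should match numpy.cross applied along the last axis
np.testing.assert_allclose(cross(a, b, 'cartesian').values,
                           np.cross(a.values, b.values))
```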
#### Output of ``xr.show_versions()``

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.9.0-9-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.12.3
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.1.0
distributed: 2.1.0
matplotlib: 3.1.0
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.0.1
pip: 19.1.1
conda: 4.7.11
pytest: 5.0.1
IPython: 7.6.1
sphinx: 2.1.2
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3279/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
181340410,MDU6SXNzdWUxODEzNDA0MTA=,1040,DataArray.diff dim argument should be optional as it is in the docstring,941907,closed,0,,,7,2016-10-06T07:14:50Z,2020-03-28T18:18:21Z,2020-03-28T18:18:21Z,NONE,,,,"The docstring of `DataArray.diff` lists the `dim` arg as optional, [but it isn't](https://github.com/pydata/xarray/blob/fbb4f0618eade20981bd5cfb9771b82fd88a8db5/xarray/core/dataarray.py#L1468). IMHO it should indeed be optional, as it is quite convenient to apply `diff` to 1D signals without specifying the dimension.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1040/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
528701910,MDU6SXNzdWU1Mjg3MDE5MTA=,3574,apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta,941907,closed,0,,,12,2019-11-26T12:45:55Z,2020-01-22T15:43:19Z,2020-01-22T15:43:19Z,NONE,,,,"#### MCVE Code Sample

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({
    'signal': (['das_time', 'das', 'record'], np.empty((1000, 120, 45))),
    'min_height': (['das'], np.empty((120,)))  # each DAS has a different resolution
})

def some_peak_finding_func(data1d, min_height):
    """"""process data1d with constraints by min_height""""""
    result = np.zeros((4, 2))  # summary matrix with 2 peak characteristics
    return result

ds_dask = ds.chunk({'record': 3})

xr.apply_ufunc(some_peak_finding_func,
               ds_dask['signal'], ds_dask['min_height'],
               input_core_dims=[['das_time'], []],  # apply peak finding along trace
               output_core_dims=[['peak_pos', 'pulse']],
               vectorize=True,  # up to here works without dask!
               dask='parallelized',
               output_sizes={'peak_pos': 4, 'pulse': 2},
               output_dtypes=[np.float],
               )
```

fails with `ValueError: cannot call `vectorize` with a signature including new output dimensions on size 0 inputs` because `dask.array.utils.compute_meta()` passes it 0-sized arrays.

#### Expected Output

This should work; it does work on the non-chunked `ds` without `dask='parallelized'` and the associated `output*` parameters.

#### Problem Description

I'm trying to parallelize a peak-finding routine with dask (it works well without it), and I hoped that `dask='parallelized'` would make that simple. However, the peak finding needs to be vectorized. It works well with `vectorize=True`, but `np.vectorize` appears to have issues in `compute_meta`, which is internally issued by dask in blockwise application, as indicated in the source code: https://github.com/dask/dask/blob/e6ba8f5de1c56afeaed05c39c2384cd473d7c893/dask/array/utils.py#L118

A possible solution might be for `apply_ufunc` to pass `meta` directly to dask, if it were possible to foresee what `meta` should be. I suppose we are aiming for `np.ndarray` most of the time, though `sparse` might change that in the future.

I know I could use groupby-apply as an alternative, but there are several issues that made us use `apply_ufunc` instead:

- groupby-apply seems to have much larger overhead
- the non-core dimensions would have to be stacked into a new dimension over which to groupby, but some of the dimensions to be stacked are already a MultiIndex and cannot be easily stacked
- we could unstack the MultiIndex dimensions first, at the risk of introducing quite a number of NaNs
- extra coords might lose dimension information (will depend on all) after unstacking and application
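For reference, the root cause can be reproduced with NumPy alone; this is a minimal, illustrative sketch (my reading of what the `compute_meta` code path does, not code from dask itself): `np.vectorize` refuses size-0 inputs whenever the gufunc signature introduces new output dimensions.

```python
import numpy as np

# emulate what dask's compute_meta does: call the vectorized function
# on 0-sized inputs to infer the output metadata
vec = np.vectorize(lambda x: np.zeros(3), signature='()->(n)')
vec(np.empty((0,)))
# ValueError: cannot call `vectorize` with a signature including
# new output dimensions on size 0 inputs
```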
#### Output of ``xr.show_versions()``

commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.9.0-11-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: 4.7.12
pytest: 5.2.1
IPython: 7.8.0
sphinx: 2.2.0
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3574/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
189998469,MDU6SXNzdWUxODk5OTg0Njk=,1130,"pipe, apply should call maybe_wrap_array, possibly resolve dim->axis",941907,closed,0,,,6,2016-11-17T10:04:10Z,2019-01-24T18:34:38Z,2019-01-24T18:34:37Z,NONE,,,,"While `pipe` and `Dataset.apply` (btw, why not call them both the same?) specify that they expect `DataArray`-returning functions, it would be very convenient to have them call `maybe_wrap_array` anyway. I've often tried piping functions which at first looked like ufuncs, only to find out that they forgot to call `__array_wrap__` (I'm looking at you, `np.angle`). The extra call to `maybe_wrap_array` is cheap, does not break anything, and would be very useful. It would greatly enlarge the set of functions that can be readily applied to `DataArray` objects without any need for writing function wrappers (motivated in part by #1080).

Since many such functions expect an `axis` argument, some syntax for `dim -> axis` resolution could also be added. I see some options:

1) check if the axis argument is a string and coerce it to a number, something like

```python
axis = kwargs.get('axis')
if axis is not None:
    if isinstance(axis, str):
        kwargs['axis'] = darray.get_axis_num(axis)
```

Simple, but specifying `axis='smth'` is not very explicit and may mean something else for certain funcs; it assumes a lot about function signatures.

2) similar to 1., but only if both `dim` and `axis='dim'` are specified. Still a possible conflict with a function-specific meaning, but less likely.

```python
if kwargs.get('axis') == 'dim':
    kwargs['axis'] = darray.get_axis_num(kwargs['dim'])
```

Other sentinel encodings might be possible.

3) use some syntax similar to `pipe((f, 'arg2', ('axis', dim)), *args, **kwargs)`, but that's getting complicated and less readable.

Let me know what you think, and perhaps you'll come up with some nicer syntax for dim -> axis resolution.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1130/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
190026722,MDExOlB1bGxSZXF1ZXN0OTQxNTMzNjE=,1131,Fix #1040: diff dim argument should be optional,941907,closed,0,,,2,2016-11-17T11:55:53Z,2019-01-14T21:18:18Z,2019-01-14T21:18:18Z,NONE,,0,pydata/xarray/pulls/1131,"* {Dataset,DataArray}.diff dim argument defaults to last dimension
* add test cases
* add changelog","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1131/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
187373423,MDU6SXNzdWUxODczNzM0MjM=,1080,"accessor extending approach limits functional programming approach, make direct monkey-patching also possible",941907,closed,0,,,9,2016-11-04T16:06:34Z,2016-12-06T10:44:16Z,2016-12-06T10:44:16Z,NONE,,,,"Hi, thanks for creating and continuing the development of xarray. I'm in the process of converting my own functions and classes to it; they did something very similar (label indexing, plotting, etc.) but were inferior in many ways. Right now I'm designing a set of functions for digital signal processing (I need them the most, though interpolation is also important), mostly lowpass/highpass filters and spectrograms based on `scipy.signal`.
Initially I started writing a `dsp` accessor with such methods, but later I realized that this accessor approach makes it quite hard to do something like `dataset.apply(lowpass, 0.5)`. Instead, one has to write something like `dataset.apply(lambda d: d.dsp.lowpass(0.5))`, which is less convenient than the clear functional-programming `apply` approach.

I agree that making sure that adding a method to the class does not overwrite something else is a good idea, but that can be done for single methods as well. It would even be possible to save replaced methods somewhere and restore them later if requested. The great advantage is that the added methods can still be first-class functions as well. Such methods cannot save state as easily as accessor methods, but in many cases that is not necessary.

I actually implemented something similar for my DataArray-like class (before xarray existed; now I'm trying to convert to `xarray`) with such plugin handling (below with slight modifications for `DataArray`). Let me know what you think.

```python
'''Module for handling various DataArray method plugins'''
from types import FunctionType

from xarray import DataArray

# map: name of patched method -> stack of previous methods
_REPLACED_METHODS = {}


def patch_dataarray(method_func):
    '''Sets method_func as a method of the DataArray class

    The method name is inferred from method_func.__name__

    Can be used as decorator for functions that should be added to the
    DataArray class as methods, for example::

        @patch_dataarray
        def square(self, arg):
            return self**2

    The decorated function then becomes a method of the class, so these
    two are equivalent::

        foo(sig) == sig.foo()
    '''
    method_name = method_func.__name__
    method_stack = _REPLACED_METHODS.setdefault(method_name, [])
    method_stack.append(getattr(DataArray, method_name, None))
    setattr(DataArray, method_name, method_func)
    return method_func


def restore_method(method_func):
    '''Restore a previous version of a method of the DataArray class'''
    method_name = method_func.__name__
    try:
        method_stack = _REPLACED_METHODS[method_name]
    except KeyError:
        return  # no previous method to restore
    previous_method = method_stack.pop(-1)
    if previous_method is None:
        delattr(DataArray, method_name)
    else:
        setattr(DataArray, method_name, previous_method)


def unload_module_patches(module):
    '''Restore previous versions of methods found in the given module'''
    for name in dir(module):
        obj = getattr(module, name)
        if isinstance(obj, FunctionType):
            restore_method(obj)


def patch_dataarray_wraps(func, func_name=None):
    '''Return a decorator that patches DataArray with the decorated function

    It copies the name of the func and adds a line to the docstring about
    wrapping the function.
    '''
    if func_name is None:
        func_name = func.__name__

    def updater(new_func):
        '''copy the function name and add a docline'''
        new_func.__name__ = func_name
        new_func.__doc__ = (('Wrapper around function %s\n\n' % func_name)
                            + new_func.__doc__)
        return patch_dataarray(new_func)

    return updater
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1080/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue