id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2275107296,I_kwDOAMm_X86Hm2Hg,8992,(i)loc slicer specialization for convenient slicing by dimension label as `.loc('dim_name')[:n]`,941907,open,0,,,0,2024-05-02T10:04:11Z,2024-05-02T14:47:09Z,,NONE,,,,"### Is your feature request related to a problem?
Until something like [PEP 472](https://legacy.python.org/dev/peps/pep-0472/) is accepted, indexing with labeled dimension names inside brackets is not possible, much as I'm sure we would all love it. Here I'm proposing a slightly modified syntax which is possible to implement today and would be quite convenient IMHO.
### Describe the solution you'd like
This is inspired by the Pandas `.loc(axis=n)` specialization. Essentially, the `.(i)loc` accessors would become callable like in Pandas, which would make it possible to specify the desired order of dimensions for the subsequent slicing brackets. Schematically,
```python
darr.loc('dim name 1', 'dim name 2')[x1:x2, y1:y2]
```
is equivalent to first returning an augmented `_LocIndexer` which associates positional indexes according to the provided dim order:
```python
loc_idx_spec = darr.loc('dim name 1', 'dim name 2')
loc_idx_spec[x1:x2, y1:y2]
```
The first part is essentially similar to `.transpose('dim name 1', 'dim name 2')`, which could be used instead in the case of a `DataArray`. But this syntax would also work for `Dataset`, and additionally it does not require an actual transpose operation.
This accessor becomes especially convenient when you quickly want to index just one dimension, such as
```python
darr.loc('dim name')[:x2]
```
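For concreteness, here is a minimal sketch of the idea as a standalone helper (the names `loc_by_dims` and `_OrderedLoc` are mine for illustration, not xarray API):
```python
import numpy as np
import xarray as xr

class _OrderedLoc:
    '''sketch: remember a dim order, map bracketed slices onto .sel'''
    def __init__(self, obj, dims):
        self._obj = obj
        self._dims = dims

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        return self._obj.sel(dict(zip(self._dims, key)))

def loc_by_dims(obj, *dims):
    return _OrderedLoc(obj, dims)

darr = xr.DataArray(np.arange(12).reshape(3, 4), dims=['x', 'y'],
                    coords={'x': [10, 20, 30], 'y': [0, 1, 2, 3]})
loc_by_dims(darr, 'y')[:2]               # like the proposed darr.loc('y')[:2]
loc_by_dims(darr, 'y', 'x')[1:3, 10:20]  # like darr.loc('y', 'x')[1:3, 10:20]
```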
### Describe alternatives you've considered
The equivalent `darr.sel({'dim name 1': slice(x1, x2), 'dim name 2': slice(y1, y2)})` is admittedly not that much worse, but for me writing `slice` feels cumbersome, especially in situations where you have a lot of `None` specifications such as `slice(None, None, 2)`.
### Additional context
This `.loc(axis=n)` API is (not so obviously) documented for Pandas [here](https://pandas.pydata.org/docs/user_guide/advanced.html#using-slicers).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8992/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
489034521,MDU6SXNzdWU0ODkwMzQ1MjE=,3279,Feature request: vector cross product,941907,closed,0,,,2,2019-09-04T09:05:41Z,2021-12-29T07:54:37Z,2021-12-29T07:54:37Z,NONE,,,,"xarray currently has the `xarray.dot()` function for calculating arbitrary dot products which is indeed very handy.
Sometimes, especially for physics applications, I also need a vector cross product. I'm wondering whether you would be interested in having `xarray.cross` as a wrapper of [`numpy.cross`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.cross.html). I currently use the following implementation:
```python
import numpy as np
import xarray as xr

def cross(a, b, spatial_dim, output_dtype=None):
    """"""xarray-compatible cross product

    Compatible with dask; parallelization uses a.dtype as output_dtype
    """"""
    # TODO: find a spatial_dim default by looking for a unique 3- (or 2-) valued dim?
    for d in (a, b):
        if spatial_dim not in d.dims:
            raise ValueError('dimension {} not in {}'.format(spatial_dim, d))
        if d.sizes[spatial_dim] != 3:  # TODO: handle 2-valued cases
            raise ValueError('dimension {} does not have length 3 in {}'.format(spatial_dim, d))
    if output_dtype is None:
        output_dtype = a.dtype  # TODO: some better way to determine the default?
    c = xr.apply_ufunc(np.cross, a, b,
                       input_core_dims=[[spatial_dim], [spatial_dim]],
                       output_core_dims=[[spatial_dim]],
                       dask='parallelized', output_dtypes=[output_dtype])
    return c
```
#### Example usage
```python
import numpy as np
import xarray as xr
a = xr.DataArray(np.empty((10, 3)), dims=['line', 'cartesian'])
b = xr.full_like(a, 1)
c = cross(a, b, 'cartesian')
```
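A quick consistency check against plain `numpy.cross` (my addition with made-up data, not part of the original example):
```python
a2 = xr.DataArray(np.random.rand(10, 3), dims=['line', 'cartesian'])
b2 = xr.full_like(a2, 1)
# the wrapped result should match numpy applied along the spatial axis
np.testing.assert_allclose(cross(a2, b2, 'cartesian').values,
                           np.cross(a2.values, b2.values, axis=-1))
```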
#### Main question
Do you want such a function (and possibly an associated `DataArray.cross` method) in the `xarray` namespace, or should it live in some other package? I didn't find a package which would be a good fit, as this is close to core numpy functionality and isn't as domain-specific as some geo packages. I'm not aware of any ""xrphysics"" package.
I could make a PR if you'd want to have it in `xarray` directly.
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.9.0-9-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.12.3
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.1.0
distributed: 2.1.0
matplotlib: 3.1.0
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.0.1
pip: 19.1.1
conda: 4.7.11
pytest: 5.0.1
IPython: 7.6.1
sphinx: 2.1.2
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3279/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
181340410,MDU6SXNzdWUxODEzNDA0MTA=,1040,DataArray.diff dim argument should be optional as is in docstring,941907,closed,0,,,7,2016-10-06T07:14:50Z,2020-03-28T18:18:21Z,2020-03-28T18:18:21Z,NONE,,,,"The dosctring of `DataArray.diff` lists the `dim` arg as optional, [but it isn't](https://github.com/pydata/xarray/blob/fbb4f0618eade20981bd5cfb9771b82fd88a8db5/xarray/core/dataarray.py#L1468). IMHO it should indeed be optional as it is quite convenient to apply `diff` to 1D signals without specifying the dimension.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1040/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
528701910,MDU6SXNzdWU1Mjg3MDE5MTA=,3574,apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta,941907,closed,0,,,12,2019-11-26T12:45:55Z,2020-01-22T15:43:19Z,2020-01-22T15:43:19Z,NONE,,,,"#### MCVE Code Sample
```python
import numpy as np
import xarray as xr
ds = xr.Dataset({
    'signal': (['das_time', 'das', 'record'], np.empty((1000, 120, 45))),
    'min_height': (['das'], np.empty((120,)))  # each DAS has a different resolution
})

def some_peak_finding_func(data1d, min_height):
    """"""process data1d with constraints by min_height""""""
    result = np.zeros((4, 2))  # summary matrix with 2 peak characteristics
    return result

ds_dask = ds.chunk({'record': 3})

xr.apply_ufunc(some_peak_finding_func, ds_dask['signal'], ds_dask['min_height'],
               input_core_dims=[['das_time'], []],  # apply peak finding along trace
               output_core_dims=[['peak_pos', 'pulse']],
               vectorize=True,  # up to here works without dask!
               dask='parallelized',
               output_sizes={'peak_pos': 4, 'pulse': 2},
               output_dtypes=[float],
               )
```
fails with `ValueError: cannot call vectorize with a signature including new output dimensions on size 0 inputs`, because `dask.array.utils.compute_meta()` passes it 0-sized arrays.
#### Expected Output
This should work; it does work on the non-chunked `ds`, without `dask='parallelized'` and the associated `output*` parameters.
#### Problem Description
I'm trying to parallelize a peak finding routine with dask (it works well without it) and I hoped that `dask='parallelized'` would make that simple. However, the peak finding needs to be vectorized; it works well with `vectorize=True`, but `np.vectorize` appears to have issues in `compute_meta`, which is internally issued by dask in blockwise application, as indicated in the source code:
https://github.com/dask/dask/blob/e6ba8f5de1c56afeaed05c39c2384cd473d7c893/dask/array/utils.py#L118
A possible solution might be for `apply_ufunc` to pass `meta` directly to dask, if it were possible to foresee what `meta` should be. I suppose we are aiming for `np.ndarray` most of the time, though `sparse` might change that in the future.
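For reference, this is roughly the dask-level mechanism I have in mind; `meta=` is an existing keyword of dask's blockwise functions, while wiring it through `apply_ufunc` would be new (a sketch, not current xarray API):
```python
import numpy as np
import dask.array as da

x = da.ones((6,), chunks=3)

# with an explicit meta, dask does not need to call the function
# on 0-sized arrays to infer the output type
y = x.map_blocks(lambda block: block * 2,
                 meta=np.array((), dtype=x.dtype))
```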
I know I could use groupby-apply as an alternative, but there are several issues that made us use `apply_ufunc` instead:
- groupby-apply seems to have much larger overhead
- the non-core dimensions would have to be stacked into a new dimension over which to groupby, but some of the dimensions to be stacked are already a MultiIndex and cannot be easily stacked
- we could unstack the MultiIndex dimensions first, at the risk of introducing quite a number of NaNs
- extra coords might lose dimension information (they would depend on all dimensions) after unstacking following the application
#### Output of ``xr.show_versions()``
commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.9.0-11-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: 4.7.12
pytest: 5.2.1
IPython: 7.8.0
sphinx: 2.2.0
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3574/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
189998469,MDU6SXNzdWUxODk5OTg0Njk=,1130,"pipe, apply should call maybe_wrap_array, possibly resolve dim->axis",941907,closed,0,,,6,2016-11-17T10:04:10Z,2019-01-24T18:34:38Z,2019-01-24T18:34:37Z,NONE,,,,"While `pipe` and `Dataset.apply` (btw, why not call them both the same?) specify that they expect `DataArray`-returning functions, it would be very convenient to have them call `maybe_wrap_array` anyway.
I've often tried piping functions which at first looked like ufuncs, only to find out that they forgot to call `__array_wrap__` (I'm looking at you, `np.angle`). The extra call to `maybe_wrap_array` is cheap, does not break anything, and would be very useful. It would greatly enlarge the set of functions that can be readily applied to `DataArray` objects without any need for writing function wrappers (motivated in part by #1080).
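For illustration, a minimal sketch of the kind of re-wrapping I mean (`pipe_wrapped` and its shape check are my hypothetical stand-ins, not the actual `maybe_wrap_array` logic):
```python
import numpy as np
import xarray as xr

def pipe_wrapped(darray, func, *args, **kwargs):
    '''hypothetical pipe that re-wraps a bare ndarray result'''
    result = func(darray, *args, **kwargs)
    if isinstance(result, np.ndarray) and result.shape == darray.shape:
        # same-shaped plain array: restore dims/coords from the input
        result = darray.copy(data=result)
    return result

darr = xr.DataArray(np.exp(1j * np.linspace(0, np.pi, 4)), dims=['t'])
angles = pipe_wrapped(darr, np.angle)  # a DataArray even if np.angle returns a bare ndarray
```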
Since many such functions expect an `axis` argument, some syntax for `dim -> axis` resolution could also be added. I see some options:
1) check if the `axis` argument is a string and coerce it to an axis number, something like
```python
axis = kwargs.get('axis')
if axis is not None:
    if isinstance(axis, str):
        kwargs['axis'] = darray.get_axis_num(axis)
```
Simple, but specifying `axis='smth'` is not very explicit and may mean something else for certain funcs; it assumes a lot about function signatures.
2) similar to 1), but only if both `dim` and `axis='dim'` are specified. A conflict with a function-specific meaning is still possible, but less likely.
```python
if kwargs.get('axis') == 'dim':
    kwargs['axis'] = darray.get_axis_num(kwargs['dim'])
```
Other encodings might be possible.
3) use some syntax similar to `pipe((f, 'arg2', ('axis', dim)), *args, **kwargs)`, but that's getting complicated and less readable.
Let me know what you think and perhaps you'll come up with some nicer syntax for `dim -> axis` resolution.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1130/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
190026722,MDExOlB1bGxSZXF1ZXN0OTQxNTMzNjE=,1131,Fix #1040: diff dim argument should be optional,941907,closed,0,,,2,2016-11-17T11:55:53Z,2019-01-14T21:18:18Z,2019-01-14T21:18:18Z,NONE,,0,pydata/xarray/pulls/1131,"* {Dataset,DataArray}.diff dim argument defaults to last dimension
* add test cases
* add changelog","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1131/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
187373423,MDU6SXNzdWUxODczNzM0MjM=,1080,"accessor extending approach limits functional programming approach, make direct monkey-patching also possible",941907,closed,0,,,9,2016-11-04T16:06:34Z,2016-12-06T10:44:16Z,2016-12-06T10:44:16Z,NONE,,,,"Hi, thanks for creating and continuing development of xarray. I'm in the process of converting my own functions and classes over to it; they did something very similar (label indexing, plotting, etc.) but were inferior in many ways.
Right now I'm designing a set of functions for digital signal processing (I need them the most, though interpolation is also important), mostly lowpass/highpass filters and spectrograms based on `scipy.signal`. Initially I started writing a `dsp` accessor with such methods, but later I realized that this accessor approach makes it quite hard to do something like `dataset.apply(lowpass, 0.5)`. Instead, one has to write something like `dataset.apply(lambda d: d.dsp.lowpass(0.5))`, which is less convenient than the clear functional-programming `apply` approach.
I agree that making sure that adding a method to the class does not overwrite something else is a good idea, but that can be done for single methods as well. It would even be possible to save replaced methods somewhere and restore them later if requested. The great advantage is that the added methods can still be first-class functions as well.
Such methods cannot save state as easily as accessor methods, but in many cases that is not necessary.
I actually implemented something similar for my DataArray-like class (before xarray existed; now I'm trying to convert to `xarray`) with such plugin handling (below with slight modifications for `DataArray`). Let me know what you think.
```python
'''Module for handling various DataArray method plugins'''
from xarray import DataArray
from types import FunctionType

# map: name of patched method -> stack of previous methods
_REPLACED_METHODS = {}


def patch_dataarray(method_func):
    '''Set method_func as a method of the DataArray class

    The method name is inferred from method_func.__name__

    Can be used as a decorator for functions that should be added to the
    DataArray class as methods, for example::

        @patch_dataarray
        def square(self):
            return self**2

    The decorated function then becomes a method of the class, so
    these two are equivalent::

        square(sig) == sig.square()
    '''
    method_name = method_func.__name__
    method_stack = _REPLACED_METHODS.setdefault(method_name, [])
    method_stack.append(getattr(DataArray, method_name, None))
    setattr(DataArray, method_name, method_func)
    return method_func


def restore_method(method_func):
    '''Restore a previous version of a method of the DataArray class'''
    method_name = method_func.__name__
    method_stack = _REPLACED_METHODS.get(method_name)
    if not method_stack:
        return  # no previous method to restore
    previous_method = method_stack.pop(-1)
    if previous_method is None:
        delattr(DataArray, method_name)
    else:
        setattr(DataArray, method_name, previous_method)


def unload_module_patches(module):
    '''Restore previous versions of methods found in the given module'''
    for name in dir(module):
        obj = getattr(module, name)
        if isinstance(obj, FunctionType):
            restore_method(obj)


def patch_dataarray_wraps(func, func_name=None):
    '''Return a decorator that patches DataArray with the decorated function

    It also copies the name of func and adds a line to the docstring
    about wrapping the function.
    '''
    if func_name is None:
        func_name = func.__name__

    def updater(new_func):
        '''copy the function name and add a docline'''
        new_func.__name__ = func_name
        new_func.__doc__ = (('Wrapper around function %s\n\n' % func_name)
                            + new_func.__doc__)
        return patch_dataarray(new_func)
    return updater
```
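A quick usage sketch under the same assumptions (`lowpass` here is a hypothetical placeholder standing in for a real `scipy.signal`-based filter):
```python
import numpy as np
import xarray as xr

@patch_dataarray
def lowpass(darr, cutoff):
    '''placeholder: a real implementation would wrap scipy.signal'''
    return darr * cutoff  # stand-in for actual filtering

ds = xr.Dataset({'a': ('t', np.arange(4.0))})

ds.apply(lowpass, cutoff=0.5)  # the functional style keeps working
ds['a'].lowpass(0.5)           # and the method form is available too
```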
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1080/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue