
issues


7 rows where user = 941907 sorted by updated_at descending




Facets: type (issue 6, pull 1) · state (closed 6, open 1) · repo (xarray 7)
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2275107296 I_kwDOAMm_X86Hm2Hg 8992 (i)loc slicer specialization for convenient slicing by dimension label as `.loc('dim_name')[:n]` smartass101 941907 open 0     0 2024-05-02T10:04:11Z 2024-05-02T14:47:09Z   NONE      

Is your feature request related to a problem?

Until something like PEP 472 gets accepted, we cannot do indexing with labeled dimension names inside brackets, though I'm sure we would all love to. Here I'm proposing a slightly modified syntax which is possible to implement and would be quite convenient IMHO.

Describe the solution you'd like

This is inspired by the Pandas `.loc(axis=n)` specialization. Essentially the `.loc`/`.iloc` accessors would become callable as in Pandas, which would make it possible to specify the desired order of dimensions for the subsequent slicing brackets. Schematically,

```python
darr.loc('dim name 1', 'dim name 2')[x1:x2, y1:y2]
```

is equivalent to first returning an augmented `_LocIndexer` which associates positional indexes according to the provided dim order:

```python
loc_idx_spec = darr.loc('dim name 1', 'dim name 2')
loc_idx_spec[x1:x2, y1:y2]
```

The first part is essentially similar to `.transpose('dim name 1', 'dim name 2')`, which could be used instead in the case of a `DataArray`; but this syntax would also work for a `Dataset`. Additionally, it does not require an actual transpose operation.

This accessor becomes especially convenient when you quickly want to index just one dimension:

```python
darr.loc('dim name')[:x2]
```
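For concreteness, here is a minimal sketch of how such a callable indexer could be layered on top of the existing `.sel` machinery; the names `_OrderedLocIndexer` and `loc_ordered` are hypothetical, invented here only for illustration:

```python
import numpy as np
import xarray as xr

class _OrderedLocIndexer:
    """Hypothetical indexer: maps positional slices onto named dimensions."""

    def __init__(self, obj, dims):
        self.obj = obj    # the DataArray/Dataset being indexed
        self.dims = dims  # dimension names, in the order slices will be given

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        # delegate label-based selection to .sel; no transpose is needed
        return self.obj.sel(dict(zip(self.dims, key)))

def loc_ordered(obj, *dims):
    """Stand-in for the proposed callable darr.loc(...)."""
    return _OrderedLocIndexer(obj, dims)

darr = xr.DataArray(np.arange(12.0).reshape(3, 4), dims=['x', 'y'],
                    coords={'x': [0, 1, 2], 'y': [0, 1, 2, 3]})
# slices are interpreted in the dim order given to loc_ordered
sub = loc_ordered(darr, 'y', 'x')[0:2, 1:2]
```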

Describe alternatives you've considered

The equivalent `darr.sel({'dim name 1': slice(x1, x2), 'dim name 2': slice(y1, y2)})` is admittedly not that much worse, but for me writing `slice` feels cumbersome, especially in situations where you have a lot of `None` specifications such as `slice(None, None, 2)`.

Additional context

This `.loc(axis=n)` API is (not so obviously) documented for Pandas here.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8992/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
489034521 MDU6SXNzdWU0ODkwMzQ1MjE= 3279 Feature request: vector cross product smartass101 941907 closed 0     2 2019-09-04T09:05:41Z 2021-12-29T07:54:37Z 2021-12-29T07:54:37Z NONE      

xarray currently has the `xarray.dot()` function for calculating arbitrary dot products, which is indeed very handy. Sometimes, especially for physical applications, I also need a vector cross product. I'm wondering whether you would be interested in having `xarray.cross` as a wrapper of `numpy.cross`. I currently use the following implementation:

```python
import numpy as np
import xarray as xr

def cross(a, b, spatial_dim, output_dtype=None):
    """xarray-compatible cross product

    Compatible with dask, parallelization uses a.dtype as output_dtype
    """
    # TODO find spatial dim default by looking for unique 3(or 2)-valued dim?
    for d in (a, b):
        if spatial_dim not in d.dims:
            raise ValueError('dimension {} not in {}'.format(spatial_dim, d))
        if d.sizes[spatial_dim] != 3:  # TODO handle 2-valued cases
            raise ValueError('dimension {} does not have length 3 in {}'
                             .format(spatial_dim, d))

    if output_dtype is None:
        output_dtype = a.dtype  # TODO some better way to determine default?
    c = xr.apply_ufunc(np.cross, a, b,
                       input_core_dims=[[spatial_dim], [spatial_dim]],
                       output_core_dims=[[spatial_dim]],
                       dask='parallelized', output_dtypes=[output_dtype])
    return c
```

Example usage

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.empty((10, 3)), dims=['line', 'cartesian'])
b = xr.full_like(a, 1)
c = cross(a, b, 'cartesian')
```
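As a quick sanity check (an added sketch, not part of the original proposal), the wrapper can be compared against plain `np.cross`; random data is used so the comparison is meaningful:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.random.rand(10, 3), dims=['line', 'cartesian'])
b = xr.full_like(a, 1)
c = cross(a, b, 'cartesian')
# np.cross works along the last axis, which is where apply_ufunc
# moves the 'cartesian' core dim, so the results should agree
np.testing.assert_allclose(c.values, np.cross(a.values, b.values))
```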

Main question

Do you want such a function (and possibly an associated `DataArray.cross` method) in the xarray namespace, or should it be in some other package? I didn't find a package which would be a good fit, as this is close to core numpy functionality and isn't as domain-specific as some geo packages. I'm not aware of any "xrphysics" package.

I could make a PR if you'd want to have it in xarray directly.

Output of `xr.show_versions()`

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.9.0-9-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.12.3
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.1.0
distributed: 2.1.0
matplotlib: 3.1.0
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.0.1
pip: 19.1.1
conda: 4.7.11
pytest: 5.0.1
IPython: 7.6.1
sphinx: 2.1.2
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3279/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
181340410 MDU6SXNzdWUxODEzNDA0MTA= 1040 DataArray.diff dim argument should be optional as is in docstring smartass101 941907 closed 0     7 2016-10-06T07:14:50Z 2020-03-28T18:18:21Z 2020-03-28T18:18:21Z NONE      

The docstring of `DataArray.diff` lists the `dim` arg as optional, but it isn't. IMHO it should indeed be optional, as it is quite convenient to apply `diff` to 1D signals without specifying the dimension.
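To make the mismatch concrete, a minimal sketch (the commented call is the behaviour the docstring promises, not what currently works):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(5), dims=['t'])
da.diff('t')  # works today: dim must be given explicitly
# da.diff()   # desired per the docstring: default to the last (here only) dim
```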

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1040/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
528701910 MDU6SXNzdWU1Mjg3MDE5MTA= 3574 apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta smartass101 941907 closed 0     12 2019-11-26T12:45:55Z 2020-01-22T15:43:19Z 2020-01-22T15:43:19Z NONE      

MCVE Code Sample

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({
    'signal': (['das_time', 'das', 'record'], np.empty((1000, 120, 45))),
    'min_height': (['das'], np.empty((120,)))  # each DAS has a different resolution
})

def some_peak_finding_func(data1d, min_height):
    """process data1d with constraints by min_height"""
    result = np.zeros((4, 2))  # summary matrix with 2 peak characteristics
    return result

ds_dask = ds.chunk({'record': 3})

xr.apply_ufunc(some_peak_finding_func, ds_dask['signal'], ds_dask['min_height'],
               input_core_dims=[['das_time'], []],  # apply peak finding along trace
               output_core_dims=[['peak_pos', 'pulse']],
               vectorize=True,  # up to here works without dask!
               dask='parallelized',
               output_sizes={'peak_pos': 4, 'pulse': 2},
               output_dtypes=[np.float],
               )
```

fails with `ValueError: cannot call vectorize with a signature including new output dimensions on size 0 inputs` because `dask.array.utils.compute_meta()` passes it 0-sized arrays.

Expected Output

This should work, and it works well on the non-chunked `ds` without `dask='parallelized'` and the associated `output*` parameters.

Problem Description

I'm trying to parallelize a peak finding routine with dask (it works well without it) and I hoped that `dask='parallelized'` would make that simple. However, the peak finding needs to be vectorized; it works well with `vectorize=True`, but `np.vectorize` appears to have issues in `compute_meta`, which dask calls internally during blockwise application, as indicated in the source code:

https://github.com/dask/dask/blob/e6ba8f5de1c56afeaed05c39c2384cd473d7c893/dask/array/utils.py#L118

A possible solution might be for `apply_ufunc` to pass `meta` directly to dask, if it were possible to foresee what `meta` should be. I suppose we are aiming for `np.ndarray` most of the time, though `sparse` might change that in the future.
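For reference, the underlying failure can be reproduced without xarray or dask at all; this is a minimal sketch of what `compute_meta` effectively does with its 0-sized probe arrays:

```python
import numpy as np

# a gufunc-style signature that introduces new output dimensions (i, j)
f = np.vectorize(lambda x: np.zeros((4, 2)), signature='(n)->(i,j)')

f(np.empty((3, 5)))   # fine: output shape (3, 4, 2)
f(np.empty((0, 0)))   # ValueError: cannot call `vectorize` with a signature
                      # including new output dimensions on size 0 inputs
```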

I know I could use groupby-apply as an alternative, but there are several issues that made us use `apply_ufunc` instead:

  • groupby-apply seems to have much larger overhead
  • the non-core dimensions would have to be stacked into a new dimension over which to groupby, but some of the dimensions to be stacked are already a MultiIndex and cannot be easily stacked
  • we could unstack the MultiIndex dimensions first, at the risk of introducing quite a number of NaNs
  • extra coords might lose dimension information (they will depend on all dims) after unstacking after application

Output of `xr.show_versions()`

```
commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.9.0-11-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: 4.7.12
pytest: 5.2.1
IPython: 7.8.0
sphinx: 2.2.0
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3574/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
189998469 MDU6SXNzdWUxODk5OTg0Njk= 1130 pipe, apply should call maybe_wrap_array, possibly resolve dim->axis smartass101 941907 closed 0     6 2016-11-17T10:04:10Z 2019-01-24T18:34:38Z 2019-01-24T18:34:37Z NONE      

While `pipe` and `Dataset.apply` (btw, why not call them both the same?) specify that they expect `DataArray`-returning functions, it would be very convenient to have them call `maybe_wrap_array` anyway.

I've often tried piping functions which at first looked like ufuncs, only to find out that they forgot to call `__array_wrap__` (I'm looking at you, `np.angle`). The extra call to `maybe_wrap_array` is cheap, does not break anything and would be very useful. It would greatly enlarge the set of functions that can be readily applied to `DataArray` objects without any need for writing function wrappers (motivated in part by #1080).

Since many such functions expect an `axis` argument, some syntax for `dim` -> `axis` resolution could also be added. I see some options:

1) check if the `axis` argument is a string and coerce it to a number, something like

```python
axis = kwargs.get('axis')
if axis is not None:
    if isinstance(axis, str):
        kwargs['axis'] = darray.get_axis_num(axis)
```

Simple, but specifying `axis='smth'` is not very explicit and may mean something else for certain funcs; it assumes a lot about function signatures.

2) similar to 1., but only if both `dim` and `axis='dim'` are specified. Still a possible conflict with a func-specific meaning, but less likely.

```python
if kwargs.get('axis') == 'dim':
    kwargs['axis'] = darray.get_axis_num(kwargs['dim'])
```

Other conventions might be possible.

3) use some syntax similar to `pipe((f, 'arg2', ('axis', dim)), *args, **kwargs)`, but that's getting complicated and less readable.

Let me know what you think and perhaps you'll come up with some nicer syntax for `dim` -> `axis` resolution.
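To make the proposal concrete, here is a rough sketch as a free function, combining the `maybe_wrap_array` call with option 1 above (illustration only; it assumes `maybe_wrap_array` can be imported from `xarray.core.utils`):

```python
import numpy as np
import xarray as xr
from xarray.core.utils import maybe_wrap_array  # assumed import location

def pipe_wrapped(darray, func, *args, **kwargs):
    # option 1: coerce a string axis to a positional axis number
    axis = kwargs.get('axis')
    if isinstance(axis, str):
        kwargs['axis'] = darray.get_axis_num(axis)
    # re-wrap plain ndarray results (for funcs that skip __array_wrap__)
    return maybe_wrap_array(darray, func(darray, *args, **kwargs))

da = xr.DataArray(np.exp(1j * np.linspace(0, np.pi, 5)), dims=['t'])
angles = pipe_wrapped(da, np.angle)  # comes back as a DataArray
```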

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1130/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
190026722 MDExOlB1bGxSZXF1ZXN0OTQxNTMzNjE= 1131 Fix #1040: diff dim argument should be optional smartass101 941907 closed 0     2 2016-11-17T11:55:53Z 2019-01-14T21:18:18Z 2019-01-14T21:18:18Z NONE   0 pydata/xarray/pulls/1131
  • {Dataset,DataArray}.diff dim argument defaults to last dimension

  • add test cases

  • add changelog

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1131/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
187373423 MDU6SXNzdWUxODczNzM0MjM= 1080 acccessor extending approach limits functional programming approach, make direct monkey-patching also possible smartass101 941907 closed 0     9 2016-11-04T16:06:34Z 2016-12-06T10:44:16Z 2016-12-06T10:44:16Z NONE      

Hi, thanks for creating and continuing development of xarray. I'm in the process of converting to it my own functions and classes, which did something very similar (label indexing, plotting, etc.) but were inferior in many ways.

Right now I'm designing a set of functions for digital signal processing (I need them the most, though interpolation is also important), mostly lowpass/highpass filters and spectrograms based on `scipy.signal`. Initially I started writing a `dsp` accessor with such methods, but later I realized that this accessor approach makes it quite hard to do something like `dataset.apply(lowpass, 5)`. Instead, one has to do something like `dataset.apply(lambda d: d.dsp.lowpass(0.5))`, which is less convenient than the clear functional-programming `apply` approach.

I agree that making sure that adding a method to the class does not overwrite something else is a good idea, but that can be done for single methods as well. It would even be possible to save the replaced methods somewhere and restore them later if requested. The great advantage is that the added methods can still be first-class functions as well.

Such methods cannot save state as easily as accessor methods, but in many cases that is not necessary.

I actually implemented something similar for my DataArray-like class (before xarray existed; now I'm trying to convert to xarray) with such plugin handling (below, with slight modifications for `DataArray`). Let me know what you think.

```python
'''Module for handling various DataArray method plugins'''
from xarray import DataArray
from types import FunctionType

# map: name of patched method -> stack of previous methods
_REPLACED_METHODS = {}


def patch_dataarray(method_func):
    '''Sets method_func as a method of the DataArray class

    The method name is inferred from method_func.__name__

    Can be used as decorator for functions that should be added to the
    DataArray class as methods, for example::

        @patch_dataarray
        def square(self, arg):
            return self**2

    The decorated function then becomes a method of the class, so
    these two are equivalent::

        foo(sig) == sig.foo()
    '''
    method_name = method_func.__name__
    method_stack = _REPLACED_METHODS.setdefault(method_name, [])
    method_stack.append(getattr(DataArray, method_name, None))
    setattr(DataArray, method_name, method_func)
    return method_func


def restore_method(method_func):
    '''Restore a previous version of a method of the DataArray class'''
    method_name = method_func.__name__
    try:
        method_stack = _REPLACED_METHODS[method_name]
    except KeyError:
        return  # no previous method to restore
    previous_method = method_stack.pop(-1)
    if previous_method is None:
        delattr(DataArray, method_name)
    else:
        setattr(DataArray, method_name, previous_method)


def unload_module_patches(module):
    '''Restore previous versions of methods found in the given module'''
    for name in dir(module):
        obj = getattr(module, name)
        if isinstance(obj, FunctionType):
            restore_method(obj)


def patch_dataarray_wraps(func, func_name=None):
    '''Return a decorator that patches DataArray with the decorated function

    and copies the name of the func and adds a line to the docstring
    about wrapping the function
    '''
    if func_name is None:
        func_name = func.__name__

    def updater(new_func):
        '''copy the function name and add a docline'''
        new_func.__name__ = func_name
        new_func.__doc__ = (('Wrapper around function %s\n\n' % func_name)
                            + new_func.__doc__)
        return patch_dataarray(new_func)
    return updater
```
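A short usage sketch for the module above (assuming it is saved as `dataarray_patches.py`; `demean` is a made-up example function):

```python
import numpy as np
import xarray as xr
from dataarray_patches import patch_dataarray  # hypothetical module name

@patch_dataarray
def demean(self):
    '''Remove the mean of the array'''
    return self - self.mean()

da = xr.DataArray(np.arange(4.0), dims=['t'])
da.demean()   # available as a method ...
demean(da)    # ... while remaining a first-class function, e.g. for apply
```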

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1080/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);