id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
262642978,MDU6SXNzdWUyNjI2NDI5Nzg=,1603,Explicit indexes in xarray's data-model (Future of MultiIndex),6815844,closed,0,,741199,68,2017-10-04T01:51:47Z,2022-09-28T09:24:20Z,2022-09-28T09:24:20Z,MEMBER,,,,"I think we can continue here the discussion we had in #1426 about `MultiIndex`. In [comment](https://github.com/pydata/xarray/pull/1426#issuecomment-304778433), @shoyer recommended removing `MultiIndex` from the public API. I agree with this, as long as my code keeps working with this improvement. I think that if we collect a list of possible `MultiIndex` use cases here, it will be easier to discuss them in depth and reach a consensus on the future API. Current limitations of `MultiIndex` are:

+ It drops the scalar coordinate after selection #1408, #1491
+ It does not support serialization to NetCDF #1077
+ Stack/unstack behaviors are inconsistent #1431","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1603/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
29136905,MDU6SXNzdWUyOTEzNjkwNQ==,60,Implement DataArray.idxmax(),1217238,closed,0,,741199,14,2014-03-10T22:03:06Z,2020-03-29T01:54:25Z,2020-03-29T01:54:25Z,MEMBER,,,,"Should match the pandas function: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.idxmax.html ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/60/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
46756098,MDU6SXNzdWU0Njc1NjA5OA==,266,Easy iteration over slices of a DataArray,358378,closed,0,,741199,2,2014-10-24T16:20:51Z,2019-01-15T20:09:35Z,2019-01-15T20:09:34Z,CONTRIBUTOR,,,,"The `DataArray` object would benefit from functionality similar to `iris.cube.Cube.slices`. Given an array
```
[23]: data.coords
Out[23]:
Coordinates:
  * sza      (sza) float64 0.0 36.87 53.13 60.0 72.54 75.52 81.37 87.13 88.28
  * vza      (vza) float64 0.0 72.54
  * raa      (raa) float64 0.0 60.0 90.0 120.0 180.0
  * wl       (wl) float64 360.0 380.0 400.0 420.0 440.0
```
it would be nice to be able to do
```
for sl in data.slices([""raa"", ""wl""]):
    # do magic with a DataArray of coordinates (sza, vza)
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/266/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
171828347,MDU6SXNzdWUxNzE4MjgzNDc=,974,Indexing with alignment and broadcasting,1217238,closed,0,,741199,6,2016-08-18T06:39:27Z,2018-02-04T23:30:12Z,2018-02-04T23:30:11Z,MEMBER,,,,"I think we can bring all of NumPy's advanced indexing to xarray in a very consistent way, with only very minor breaks in backwards compatibility. For _boolean indexing_: - `da[key]` where `key` is a boolean labelled array (with _any_ number of dimensions) is made equivalent to `da.where(key.reindex_like(da), drop=True)`. This matches the existing behavior if `key` is a 1D boolean array. For multi-dimensional arrays, even though the result is now multi-dimensional, this, coupled with automatic skipping of NaNs, means that `da[key].mean()` gives the same result as in NumPy. - `da[key] = value` where `key` is a boolean labelled array can be made equivalent to `da = da.where(*align(key.reindex_like(da), value.reindex_like(da)))` (that is, the three-argument form of `where`). - `da[key_0, ..., key_n]` where all of `key_i` are boolean arrays gets handled in the usual way.
It is an `IndexingError` to supply multiple labelled keys if any of them are not already aligned with the corresponding index coordinates (and share the same dimension name). If users want alignment, they can simply write `da[key_0 & ... & key_n]`. For _vectorized indexing_ (by integer or index value): - `da[key_0, ..., key_n]` where all of `key_i` are integer labelled arrays with any number of dimensions is handled like NumPy, except that instead of broadcasting NumPy-style we broadcast xarray-style: - If any of `key_i` are unlabelled 1D arrays (e.g., numpy arrays), we convert them into an `xarray.Variable` along the respective dimension. 0D arrays remain scalars. This ensures that the result of broadcasting them (in the next step) will be consistent with our current ""outer indexing"" behavior. Unlabelled higher-dimensional arrays trigger an `IndexingError`. - We ensure all keys have the same dimensions/coordinates by mapping them to `da[*broadcast(key_0, ..., key_n)]` (note that broadcast now includes automatic alignment). - The result's dimensions and coordinates are copied from the broadcast keys. - The result's values are taken by mapping each set of integer locations specified by the broadcast version of `key_i` to the integer position on the corresponding `i`th axis of `da`. - Labeled indexing like `ds.loc[key_0, ..., key_n]` works exactly as above, except that instead of doing integer lookup, we look up label values in the corresponding index. - Indexing with `.isel` and `.sel`/`.reindex` works like the two previous cases, except that we look up axes by dimension name instead of by axis position. - I haven't fully thought through the implications for assignment (`da[key] = value` or `da.loc[key] = value`), but I think it works in a similarly straightforward fashion. All of these methods should also work for indexing on `Dataset` by looping over Dataset variables in the usual way.
This framework neatly resolves most of the major limitations of xarray's existing indexing: - Boolean indexing on multi-dimensional arrays works in an intuitive way, for both selection and assignment. - No more need for specialized methods (`sel_points`/`isel_points`) for pointwise indexing. If you want to select along the diagonal of an array, you simply need to supply indexers that use a new dimension. Instead of `arr.sel_points(lat=stations.lat, lon=stations.lon, dim='station')`, you would simply write `arr.sel(lat=stations.lat, lon=stations.lon)` -- the `station` dimension is taken automatically from the indexer. - Other use cases for NumPy's advanced indexing that are currently impossible in xarray also work automatically. For example, nearest-neighbor interpolation to a completely different grid is now as simple as `ds.reindex(lon=grid.lon, lat=grid.lat, method='nearest', tolerance=0.5)` or `ds.reindex_like(grid, method='nearest', tolerance=0.5)`. Questions to consider: - How does this interact with @benbovy's enhancements for MultiIndex indexing? (#802 and #947) - How do we handle mixed slice and array indexing? In NumPy, this is a [major source of confusion](https://github.com/numpy/numpy/pull/6256), because slicing is done before broadcasting and the order of slices in the result is handled separately from broadcast indices. I think we may be able to resolve this by mapping slices in this case to 1D arrays along their respective axes, and using our normal broadcasting rules. - Should we deprecate non-boolean indexing with `[]` and `.loc[]` and non-labelled arrays when some but not all dimensions are provided? Instead, we would require explicit indexing like `[key, ...]` (yes, writing `...`), which indicates ""all trailing axes"" like NumPy. This behavior has been suggested for new indexers in NumPy because it precludes a class of bugs where the array has an unexpected number of dimensions.
On the other hand, it's less necessary for us because we have explicit indexing by dimension name with `.sel`. xref [these](https://github.com/pydata/xarray/pull/964#issuecomment-239469432) [comments](https://github.com/pydata/xarray/pull/964#issuecomment-239506907) from @MaximilianR and myself. Note: I would _certainly_ welcome help making this happen from a contributor other than myself, though you should probably wait until I finish #964 first, which lays important groundwork. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/974/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
107424151,MDU6SXNzdWUxMDc0MjQxNTE=,585,Parallel map/apply powered by dask.array,1217238,closed,0,,741199,11,2015-09-20T23:27:55Z,2017-10-13T15:58:30Z,2017-10-09T23:26:06Z,MEMBER,,,,"Dask is awesome, but it isn't always easy to use it for parallel operations. In many cases, especially when wrapping routines from external libraries, it is most straightforward to express operations in terms of a function that expects and returns xray objects loaded into memory. Dask array has a `map_blocks` function/method, but its applicability is limited because dask.array doesn't have axis names for unambiguously identifying dimensions. `da.atop` can handle many of these cases, but it's not the easiest to use. Fortunately, we have sufficient metadata in xray that we could probably parallelize many `atop` operations automatically by inferring result dimensions and dtypes from applying the function once. See here for more discussion on the dask side: https://github.com/blaze/dask/issues/702. So I would like to add some convenience methods for using dask to automatically parallelize a function defined on xray objects loaded into memory.
In addition to a `map_blocks` method/function, it would be useful to add some sort of `parallel_apply` method to groupby objects that works very similarly, by lazily applying a function that takes and returns xray objects loaded into memory. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/585/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
171077425,MDU6SXNzdWUxNzEwNzc0MjU=,967,sortby() or sort_index() method for Dataset and DataArray,1217238,closed,0,,741199,8,2016-08-14T20:40:13Z,2017-05-12T00:29:12Z,2017-05-12T00:29:12Z,MEMBER,,,,"They should function like the pandas methods of the same name. Under the covers, I believe it would suffice to simply remap `ds.sort_index('time')` -> `ds.isel(time=ds.indexes['time'].argsort())`. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/967/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
42380798,MDU6SXNzdWU0MjM4MDc5OA==,230,"set_index(keys, inplace=False) should be both a DataArray and Dataset method.",1217238,closed,0,,741199,1,2014-09-10T06:03:56Z,2017-02-01T16:57:50Z,2017-02-01T16:57:50Z,MEMBER,,,,"Originally mentioned in #197.
Ideally this will smoothly create multi-indexes as/when necessary (#164), just like the pandas method. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/230/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
124665607,MDU6SXNzdWUxMjQ2NjU2MDc=,700,BUG: not converting series with CategoricalIndex,953992,closed,0,,741199,2,2016-01-03T19:05:59Z,2017-02-01T16:56:56Z,2017-02-01T16:56:56Z,MEMBER,,,,"xray 0.6.1
```
In [1]: s = Series(range(5),index=pd.CategoricalIndex(list('aabbc'),name='foo'))

In [4]: xray.DataArray.from_series(s)
ValueError: object __array__ method not producing an array
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/700/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
168848449,MDU6SXNzdWUxNjg4NDg0NDk=,931,How to cite xarray in a research paper,358378,closed,0,2443309,741199,4,2016-08-02T10:13:09Z,2016-08-04T21:17:53Z,2016-08-04T21:17:53Z,CONTRIBUTOR,,,,"It would be helpful if the documentation had an entry (for example, in the FAQ) about how to properly cite xarray in a scientific publication. I personally like the way, e.g., the IPython folks do it: they provide BibTeX code to copy and paste (see https://ipython.org/citing.html). This issue is related to #290, but addresses the general problem rather than a specific approach. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/931/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
109589162,MDU6SXNzdWUxMDk1ODkxNjI=,605,Support Two-Dimensional Coordinate Variables,2443309,closed,0,,741199,11,2015-10-02T23:27:18Z,2016-07-31T23:02:46Z,2016-07-31T23:02:46Z,MEMBER,,,,"The CF Conventions support the notion of a 2d coordinate variable in the case of irregularly spaced data. An example of such a dataset is shown below. The CF convention is to add a ""coordinates"" attribute with a string naming the 2d coordinates.
```
dimensions:
    xc = 128 ;
    yc = 64 ;
    lev = 18 ;
variables:
    float T(lev,yc,xc) ;
        T:long_name = ""temperature"" ;
        T:units = ""K"" ;
        T:coordinates = ""lon lat"" ;
    float xc(xc) ;
        xc:axis = ""X"" ;
        xc:long_name = ""x-coordinate in Cartesian system"" ;
        xc:units = ""m"" ;
    float yc(yc) ;
        yc:axis = ""Y"" ;
        yc:long_name = ""y-coordinate in Cartesian system"" ;
        yc:units = ""m"" ;
    float lev(lev) ;
        lev:long_name = ""pressure level"" ;
        lev:units = ""hPa"" ;
    float lon(yc,xc) ;
        lon:long_name = ""longitude"" ;
        lon:units = ""degrees_east"" ;
    float lat(yc,xc) ;
        lat:long_name = ""latitude"" ;
        lat:units = ""degrees_north"" ;
```
I'd like to discuss how we could support this in xray. The motivating application for this is plotting, but it may also apply to other grouping and remapping operations (e.g. #324, #475, #486). One option would be to simply honor the ""coordinates"" attribute in plotting and use the specified coordinates as the x/y values.
ref: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html#idp5559280 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/605/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
33559045,MDU6SXNzdWUzMzU1OTA0NQ==,130,Wrap bottleneck for fast moving window aggregations,1217238,closed,0,,741199,4,2014-05-15T06:42:43Z,2016-02-20T02:35:09Z,2016-02-20T02:35:09Z,MEMBER,,,,"Like pandas, we should wrap [bottleneck](https://github.com/kwgoodman/bottleneck) to create fast moving-window operations and missing-value operations that can be applied to xray data arrays. As xray is designed to make it straightforward to work with high-dimensional arrays, it would be particularly convenient if bottleneck had fast functions for N > 3 dimensions (see kwgoodman/bottleneck/issues/84), but we should wrap bottleneck regardless for functions like rolling_mean, rolling_sum, rolling_min, etc. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/130/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
35956575,MDU6SXNzdWUzNTk1NjU3NQ==,164,Support pandas.MultiIndex axes on xray objects,1217238,closed,0,,741199,0,2014-06-18T05:53:08Z,2016-01-18T00:11:11Z,2016-01-18T00:11:11Z,MEMBER,,,,"- Appropriate casting with `xray.Coordinate`
- Call out to `MultiIndex.get_locs` in `indexing.convert_label_indexer`
- Get multi-index support working with `.loc` and `.sel()`
- Serialization to NetCDF
- Consider stack/unstack and pivot-like methods

Not all of these would be necessary for an MVP.
~~Right now we don't consider the possibility of a MultiIndex at all -- at the very least it would be nice to give an error message.~~ ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/164/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
38109425,MDU6SXNzdWUzODEwOTQyNQ==,185,Plot methods,1217238,closed,0,,741199,10,2014-07-17T18:07:18Z,2015-08-18T18:25:39Z,2015-08-18T18:25:39Z,MEMBER,,,,"It would be awesome to have built-in plot methods, similar to `pandas.DataFrame.plot` and `pandas.Series.plot`. Although we could just copy the basic plotting methods from pandas, the strongest need is for cases where there is no corresponding pandas plot method. Notably, we should have shortcut methods for plotting 2-dimensional arrays with labels, corresponding to matplotlib's contour/contourf/imshow/pcolormesh. If we include an axis argument, such an API should even suffice for plotting data on a map via [cartopy](http://scitools.org.uk/cartopy/docs/latest/matplotlib/intro.html), although it wouldn't hurt to add some optional keyword-argument shortcuts (e.g., `proj='orthographic'`). ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/185/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
59572709,MDU6SXNzdWU1OTU3MjcwOQ==,354,resample method,1217238,closed,0,,741199,0,2015-03-02T23:55:58Z,2015-03-05T19:29:39Z,2015-03-05T19:29:39Z,MEMBER,,,,"This should be a shortcut for `.groupby(resampled_times).mean('time')` (e.g., [this example](http://xray.readthedocs.org/en/v0.3.2/examples/weather-data.html#monthly-averaging)), with an API similar to [`resample` in pandas](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#up-and-downsampling).
Something like the following should work: `ds.resample('24H', dim='time', how='mean', base=12, label='right')`. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/354/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
51070269,MDU6SXNzdWU1MTA3MDI2OQ==,286,Add support for attribute based variable lookups?,1217238,closed,0,,741199,0,2014-12-05T07:17:31Z,2014-12-24T07:07:24Z,2014-12-24T07:07:24Z,MEMBER,,,,"e.g., `ds.latitude` instead of `ds['latitude']`. It should include autocomplete support in editors like IPython. This would make it a little easier to use xray, but isn't a top priority for me to implement right now. Pull requests would be welcome! ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/286/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
39383188,MDU6SXNzdWUzOTM4MzE4OA==,200,"Support mathematical operators (+-*/, etc) for Dataset objects",1217238,closed,0,,741199,0,2014-08-04T00:10:04Z,2014-09-07T04:18:05Z,2014-09-07T04:18:05Z,MEMBER,,,,"(`Dataset`, `Dataset`) operations like `ds - ds` should align based on the names of non-coordinates, and then pass all operations off to the `DataArray` objects. Even when we switch to doing automatic alignment, an exception should be raised if the intersection of non-coordinate names is empty. (`Dataset`, `DataArray`) or (`Dataset`, `ndarray`) operations like `ds - ds['x']` should simply map over the dataset non-coordinates. Note that this behavior is _different_ from pandas, for which `df - df['x']` will usually raise an exception: pandas aligns Series to DataFrame rows, following numpy's broadcasting rules. This would be a nice complement to Dataset summary methods (#131).
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/200/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue