id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
205455788,MDU6SXNzdWUyMDU0NTU3ODg=,1251,Consistent naming for xarray's methods that apply functions,1217238,closed,0,,,13,2017-02-05T21:27:24Z,2022-04-27T20:06:25Z,2022-04-27T20:06:25Z,MEMBER,,,,"We currently have two types of methods that take a function to apply to xarray objects: - `pipe` (on `DataArray` and `Dataset`): apply a function to this entire object (`array.pipe(func)` -> `func(array)`) - `apply` (on `Dataset` and `GroupBy`): apply a function to each labeled object in this object (e.g., `ds.apply(func)` -> `ds({k: func(v) for k, v in ds.data_vars.items()})`). And one more method that we want to add but isn't finalized yet -- currently named `apply_ufunc`: - Apply a function that acts on unlabeled (i.e., numpy) arrays to each array in the object I'd like to have three distinct names that make it clear what these methods do and how they are different. This has come up a few times recently, e.g., https://github.com/pydata/xarray/issues/1130 One proposal: rename `apply` to `map`, and then use `apply` only for methods that act on unlabeled arrays. This would require a deprecation cycle, but eventually it would let us add `.apply` methods for handling raw arrays to both Dataset and DataArray. 
(We could use a separate apply method from `apply_ufunc` to convert `dim` arguments to `axis` and not do automatic broadcasting.)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1251/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
28376794,MDU6SXNzdWUyODM3Njc5NA==,25,Consistent rules for handling merges between variables with different attributes,1217238,closed,0,,,13,2014-02-26T22:37:01Z,2020-04-05T19:13:13Z,2014-09-04T06:50:49Z,MEMBER,,,,"Currently, variable attributes are checked for equality before allowing for a merge via a call to `xarray_equal`. It should be possible to merge datasets even if some of the variable metadata disagrees (conflicting attributes should be dropped). This is already the behavior for global attributes. The right design of this feature should probably include some optional argument to `Dataset.merge` indicating how strict we want the merge to be. I can see at least three versions that could be useful: 1. Drop conflicting metadata silently. 2. Don't allow for conflicting values, but drop non-matching keys. 3. Require all keys and values to match. We can argue about which of these should be the default option. My inclination is to be as flexible as possible by using 1 or 2 in most cases. 
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/25/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
124700322,MDExOlB1bGxSZXF1ZXN0NTQ5NDUxNzE=,702,Basic multiIndex support and stack/unstack methods,1217238,closed,0,,,13,2016-01-04T05:48:49Z,2016-06-01T16:48:54Z,2016-01-18T00:11:11Z,MEMBER,,0,pydata/xarray/pulls/702,"Fixes #164, #700 Example usage: ``` In [3]: df = pd.DataFrame({'foo': range(3), ...: 'x': ['a', 'b', 'b'], ...: 'y': [0, 0, 1]}) ...: In [4]: s = df.set_index(['x', 'y'])['foo'] In [5]: arr = xray.DataArray(s, dims='z') In [6]: arr Out[6]: array([0, 1, 2]) Coordinates: * z (z) object ('a', 0) ('b', 0) ('b', 1) In [7]: arr.indexes['z'] Out[7]: MultiIndex(levels=[[u'a', u'b'], [0, 1]], labels=[[0, 1, 1], [0, 0, 1]], names=[u'x', u'y']) In [8]: arr.unstack('z') Out[8]: array([[ 0., nan], [ 1., 2.]]) Coordinates: * x (x) object 'a' 'b' * y (y) int64 0 1 In [9]: arr.unstack('z').stack(z=('x', 'y')) Out[9]: array([ 0., nan, 1., 2.]) Coordinates: * z (z) object ('a', 0) ('a', 1) ('b', 0) ('b', 1) ``` TODO (maybe not necessary yet, but eventually): - [x] Multi-index support working with `.loc` and `.sel()` - [x] Multi-dimensional `stack`/`unstack` - [ ] Serialization to NetCDF - [ ] Better repr, showing level names/dtypes? - [ ] Make levels accessible as coordinate variables (e.g., `ds['time']` can pull out the `'time'` level of a multi-index) - [ ] Make `isel_points`/`sel_points` return objects with a MultiIndex? (probably after the previous TODO, so we can preserve basic backwards compatibility) - [ ] Add `set_index`/`reset_index`/`swaplevel` to make it easier to create and manipulate multi-indexes It would be nice to eventually build a full example showing how `stack` can be combined with lazy loading / dask to do out-of-core PCA on a large geophysical dataset (e.g., identify El Nino). 
cc @MaximilianR @jreback @jhamman ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/702/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull