id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 771198437,MDExOlB1bGxSZXF1ZXN0NTQyNzk3NDYy,4711,Adding vectorized indexing docs,44210245,closed,0,,,3,2020-12-18T22:10:49Z,2021-02-16T23:37:30Z,2021-02-16T23:37:30Z,CONTRIBUTOR,,0,pydata/xarray/pulls/4711,"- [x] closes #4630, closes #3768 #4630: Adds a new vectorized indexing example to `sel` docstring and narrative docs. Thanks to @dcherian for introducing me to vectorized indexing and @keewis for providing some information to get started. Also thanks to the community for the excellent contribution guide. http://xarray.pydata.org/en/stable/contributing.html Am I missing anything here? Or is there anything that can be improved? I'm happy to see this through - thanks in advance for any feedback/tips! ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4711/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 564555854,MDU6SXNzdWU1NjQ1NTU4NTQ=,3768,Pointwise indexing,8238804,closed,0,,,6,2020-02-13T09:39:27Z,2021-02-16T23:37:29Z,2021-02-16T23:37:29Z,NONE,,,,"#### MCVE Code Sample ```python import xarray as xr import numpy as np da = xr.DataArray( np.arange(56).reshape((7, 8)), coords={ 'x': list('abcdefg'), 'y': 10 * np.arange(8) }, dims=['x', 'y'] ) # Shouldn't this be (2,)? assert da.isel(x=[0, 1], y=[0, 1]).shape == (2, 2) ``` #### Expected Output I had expected `da.isel(x=[0, 1], y=[0, 1])` to have shape `(2,)`. I had generally expected indexing with `isel` to behave more like numpy indexing. It's very possible I'm just missing something, or that this is more of a documentation issue than a behavior issue. 
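For what it's worth, numpy-style pointwise selection is available in xarray via vectorized indexing: when the indexers are themselves DataArrays that share a dimension name, `isel` pairs them up elementwise instead of taking the outer product. A minimal sketch using the array above (the `points` dimension name is an arbitrary choice):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(56).reshape((7, 8)),
    coords={'x': list('abcdefg'), 'y': 10 * np.arange(8)},
    dims=['x', 'y'],
)

# Plain lists broadcast orthogonally, producing the (2, 2) block above.
# DataArray indexers with a shared dimension select pointwise instead:
points = da.isel(
    x=xr.DataArray([0, 1], dims='points'),
    y=xr.DataArray([0, 1], dims='points'),
)
assert points.shape == (2,)  # elements (0, 0) and (1, 1)
```

The result keeps the `x` and `y` coordinate values for each selected point along the new `points` dimension.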
#### Problem Description Going off this example in #507: ```python In [3]: da.isel_points(x=[0, 1, 6], y=[0, 1, 0], dim='points') Out[3]: array([ 0, 9, 48]) Coordinates: y (points) int64 0 10 0 x (points) |S1 'a' 'b' 'g' * points (points) int64 0 1 2 ``` and the [deprecation of `isel_points` with `isel`](http://xarray.pydata.org/en/stable/whats-new.html#id75), I had expected to get numpy-like coordinate indexing using `isel`. This was made a little bit more confusing by the documentation for [setting values by index](http://xarray.pydata.org/en/stable/indexing.html#assigning-values-with-indexing). In particular the example: ```python In [68]: da[ind_x, ind_y] = -2 # assign -2 to (ix, iy) = (0, 0) and (1, 1) In [69]: da Out[69]: array([[-2, -2, -1, -1], [-2, -2, 6, 7], [ 8, 9, 10, 11]]) ``` To me, the comment `# assign -2 to (ix, iy) = (0, 0) and (1, 1)` makes it sound like values will be assigned at the coordinates (0, 0) and (1, 1), not (0, 0), (0, 1), (1, 0), and (1, 1). All in all, I'm not sure if this is a bug, or an issue with documentation. If `isel` is not meant to behave like `isel_points`, it would be nice to see that in the documentation. If it is possible to get and set points by coordinate (without looping over single coordinates) it would be nice to see an example in the documentation where that's shown. #### Output of ``xr.show_versions()``
# Paste the output of xr.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 (default, Jan 4 2020, 12:18:30) [Clang 11.0.0 (clang-1100.0.33.16)] python-bits: 64 OS: Darwin OS-release: 19.3.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.2 libnetcdf: 4.6.3 xarray: 0.15.0 pandas: 1.0.1 numpy: 1.18.1 scipy: 1.4.1 netCDF4: 1.5.2 pydap: None h5netcdf: 0.7.4 h5py: 2.10.0 Nio: None zarr: 2.4.0 cftime: 1.0.3.4 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.9.2 distributed: 2.9.3 matplotlib: 3.1.3 cartopy: None seaborn: 0.10.0 numbagg: None setuptools: 45.2.0 pip: 20.0.2 conda: None pytest: 5.3.4 IPython: 7.11.1 sphinx: 2.3.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3768/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 753874419,MDU6SXNzdWU3NTM4NzQ0MTk=,4630,".sel(...., method='nearest') fails for large requests. ",44210245,closed,0,,,8,2020-11-30T23:20:18Z,2021-02-16T23:37:29Z,2021-02-16T23:37:29Z,CONTRIBUTOR,,,,"A common usage of `xarray` is to retrieve climate model data from the grid cells closest to a weather station. That might look like this: ``` import xarray as xr import numpy as np ds = xr.tutorial.open_dataset(""air_temperature"") # Define target latitude and longitude tgt_lat = np.linspace(0, 100, num=10) tgt_lon = np.linspace(0, 100, num=10) # Retrieve data at target latitude and longitude tgt_data = ds['air'].sel(lon=tgt_lon, lat=tgt_lat, method='nearest') ``` My problem is that I am trying to subset `ds` to 10 points in space (which is the length of tgt_lat and tgt_lon), but in fact `xarray` retrieves 100 points (10 latitude by 10 longitude). I can get around this by calling `tgt_data = tgt_data.values.diagonal()`. But this results in a non-xarray object. Furthermore, if instead of querying for 10 points in space, I query for 10,000, I run out of memory because `xarray` retrieves 100,000,000 points in space (10,000^2). Is there a way to only retrieve the diagonal elements? If not, is this something that should be added? 
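As a follow-up note: the pointwise behaviour is available through vectorized indexing. Passing the targets as DataArrays that share a dimension name makes `sel` pair them up elementwise instead of forming the outer product, so 10 targets yield 10 points and memory stays linear in the number of stations. A small self-contained sketch (synthetic random data stands in for the tutorial dataset, and the `points` dimension name is an arbitrary choice):

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the air_temperature grid
lat = np.linspace(0, 100, num=50)
lon = np.linspace(0, 100, num=50)
air = xr.DataArray(
    np.random.rand(len(lat), len(lon)),
    coords={'lat': lat, 'lon': lon},
    dims=['lat', 'lon'],
)

# Targets wrapped in DataArrays sharing a 'points' dimension
tgt_lat = xr.DataArray(np.linspace(0, 100, num=10), dims='points')
tgt_lon = xr.DataArray(np.linspace(0, 100, num=10), dims='points')

# Shared dimension -> 10 nearest-neighbour values, not a 10 x 10 block
tgt_data = air.sel(lat=tgt_lat, lon=tgt_lon, method='nearest')
assert tgt_data.shape == (10,)
```

The result is still an xarray object, with the matched `lat` and `lon` values carried along as coordinates on the `points` dimension.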
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4630/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 788398518,MDExOlB1bGxSZXF1ZXN0NTU2OTE3MDIx,4823,Allow fsspec URLs in open_(mf)dataset,6042212,closed,0,,,20,2021-01-18T16:22:35Z,2021-02-16T21:26:53Z,2021-02-16T21:18:05Z,CONTRIBUTOR,,0,pydata/xarray/pulls/4823," - [x] Closes #4461 and related - [x] Tests added - [x] Passes `pre-commit run --all-files` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [x] New functions/methods are listed in `api.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4823/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 603309899,MDU6SXNzdWU2MDMzMDk4OTk=,3985,xarray=0.15.1 regression: Groupby drop multi-index,8419157,closed,0,,,4,2020-04-20T15:05:51Z,2021-02-16T15:59:46Z,2021-02-16T15:59:46Z,NONE,,,,"I have written a function `process_stacked_groupby` that stacks all but one dimension of a dataset/dataarray and performs `groupby-apply-combine` on the stacked dimension. However, after upgrading to 0.15.1, the function ceased to work. 
#### MCVE Code Sample ```python import xarray as xr import numpy as np # Dimensions N = xr.DataArray(np.arange(100), dims='N', name='N') reps = xr.DataArray(np.arange(5), dims='reps', name='reps') horizon = xr.DataArray([1, -1], dims='horizon', name='horizon') horizon.attrs = {'long_name': 'Horizontal', 'units': 'H'} vertical = xr.DataArray(np.arange(1, 4), dims='vertical', name='vertical') vertical.attrs = {'long_name': 'Vertical', 'units': 'V'} # Variables x = xr.DataArray(np.random.randn(len(N), len(reps), len(horizon), len(vertical)), dims=['N', 'reps', 'horizon', 'vertical'], name='x') y = x * 0.1 y.name = 'y' # Merge x, y data = xr.merge([x, y]) # Assign coords data = data.assign_coords(reps=reps, vertical=vertical, horizon=horizon) # Function that stacks all but one dimension and performs groupby over the stacked dimension. def process_stacked_groupby(ds, dim, func, *args): # Function to apply to stacked groupby def apply_fn(ds, dim, func, *args): # Get groupby dim groupby_dim = list(ds.dims) groupby_dim.remove(dim) groupby_var = ds[groupby_dim] # Unstack groupby dim ds2 = ds.unstack(groupby_dim).squeeze() # Perform function ds3 = func(ds2, *args) # Add multi-index groupby_var to result ds3 = (ds3 .reset_coords(drop=True) .assign_coords(groupby_var) .expand_dims(groupby_dim) ) return ds3 # Get list of dimensions groupby_dims = list(ds.dims) # Remove dimension not grouped groupby_dims.remove(dim) # Stack all but one dimension stack_dim = '_'.join(groupby_dims) ds2 = ds.stack({stack_dim: groupby_dims}) # Groupby and apply ds2 = ds2.groupby(stack_dim, squeeze=False).map(apply_fn, args=(dim, func, *args)) # Unstack ds2 = ds2.unstack(stack_dim) # Restore attrs for dim in groupby_dims: ds2[dim].attrs = ds[dim].attrs return ds2 # Function to apply on groupby def fn(ds): return ds # Run groupby with applied function data.pipe(process_stacked_groupby, 'N', fn) ``` #### Expected Output Prior to xarray=0.15.0, the above code produced the result I wanted. The function should be able to 1. 
stack chosen dimensions 2. groupby the stacked dimension 3. apply a function on each group a. The function actually passes along another function with the unstacked group coord b. Add the multi-index stacked group coord back to the result of this function 4. combine the groups 5. Unstack the stacked dimension #### Problem Description After upgrading to 0.15.1, the above code stopped working. The error occurred at the line ``` # Unstack ds2 = ds2.unstack(stack_dim) ``` with `ValueError: cannot unstack dimensions that do not have a MultiIndex: ['horizon_reps_vertical']`. This is at the 5th step, where the resulting combined object was found not to contain any multi-index. Somewhere in the 4th step, the combination of groups has lost the multi-index stacked dimension. #### Versions 0.15.1","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3985/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 709187212,MDExOlB1bGxSZXF1ZXN0NDkzMjkyOTIw,4461,Allow fsspec/zarr/mfdataset,6042212,closed,0,,,18,2020-09-25T18:14:38Z,2021-02-16T15:36:54Z,2021-02-16T15:36:54Z,CONTRIBUTOR,,0,pydata/xarray/pulls/4461,"Requires https://github.com/zarr-developers/zarr-python/pull/606 - [ ] ~Closes #xxxx~ - [x] Tests added - [x] Passes `isort . && black . && mypy . 
&& flake8` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [x] New functions/methods are listed in `api.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4461/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 496809167,MDU6SXNzdWU0OTY4MDkxNjc=,3332,Memory usage of `da.rolling().construct`,923438,closed,0,,,5,2019-09-22T17:35:06Z,2021-02-16T15:00:37Z,2021-02-16T15:00:37Z,NONE,,,,"If I were to do `data_array.rolling(time=1000).construct('temp_time')` - what is going on under the hood? Does it make 1000 physical copies of the original dataarray - or is it only returning a view? I feel like it's the latter - but I'm seeing a memory spike (about 20-30% increase in total process memory consumption) when I use it - so there might be something else going on? Any ideas / pointers would be appreciated. Thanks! ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3332/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 748229907,MDU6SXNzdWU3NDgyMjk5MDc=,4598,Calling pd.to_datetime on cftime variable,17162724,closed,0,,,4,2020-11-22T12:14:27Z,2021-02-16T02:42:35Z,2021-02-16T02:42:35Z,CONTRIBUTOR,,,,"It would be nice to be able to convert cftime variables to pandas datetime to utilize the functionality there. I understand this is an upstream issue, as pandas probably isn't aware of cftime. However, I'm curious if a method could be added to cftime such as .to_dataframe(). I've found `pd.to_datetime(np.datetime64(date_cf))` is the best way to do this currently. 
``` import xarray as xr import numpy as np import pandas as pd date_str = '2020-01-01' date_np = np.datetime64(date_str) >>> date_np numpy.datetime64('2020-01-01') date_pd = pd.to_datetime(date_np) >>> date_pd Timestamp('2020-01-01 00:00:00') date_cf = xr.cftime_range(start=date_str, periods=1)[0] pd.to_datetime(date_cf) >>> pd.to_datetime(date_cf) Traceback (most recent call last): File """", line 1, in File ""/home/ray/local/bin/anaconda3/envs/a/lib/python3.8/site-packages/pandas/core/tools/datetimes.py"", line 830, in to_datetime result = convert_listlike(np.array([arg]), format)[0] File ""/home/ray/local/bin/anaconda3/envs/a/lib/python3.8/site-packages/pandas/core/tools/datetimes.py"", line 459, in _convert_listlike_datetimes result, tz_parsed = objects_to_datetime64ns( File ""/home/ray/local/bin/anaconda3/envs/a/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py"", line 2044, in objects_to_datetime64ns result, tz_parsed = tslib.array_to_datetime( File ""pandas/_libs/tslib.pyx"", line 352, in pandas._libs.tslib.array_to_datetime File ""pandas/_libs/tslib.pyx"", line 579, in pandas._libs.tslib.array_to_datetime File ""pandas/_libs/tslib.pyx"", line 718, in pandas._libs.tslib.array_to_datetime_object File ""pandas/_libs/tslib.pyx"", line 552, in pandas._libs.tslib.array_to_datetime TypeError: is not convertible to datetime ``` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4598/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue