id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 779392905,MDU6SXNzdWU3NzkzOTI5MDU=,4768,weighted for xr.corr,12237157,closed,0,,,2,2021-01-05T18:24:29Z,2023-12-12T00:24:22Z,2023-12-12T00:24:22Z,CONTRIBUTOR,,,," **Is your feature request related to a problem? Please describe.** I want to compute a weighted correlation, e.g. a spatial correlation weighted by latitude: `xr.corr(fct,obs,dim=['lon','lat'], weights=np.cos(np.abs(fct.lat)))` So far, `xr.corr` accepts neither `weights` nor `input.weighted(weights)`. A more straightforward case would be weighting different members: `xr.corr(fct,obs,dim='member',weights=np.arange(fct.member.size))` **Describe the solution you'd like** We started xskillscore https://github.com/xarray-contrib/xskillscore some time ago, before xr.corr was implemented, and have the keywords `weighted`, `skipna` and `keep_attrs` implemented. We also have xs.rmse, xs.mse, ... 
implemented via `xr.apply_ufunc` https://github.com/aaronspring/xskillscore/blob/150f7b9b2360750e6077036c7c3fd6e4439c60b6/xskillscore/core/deterministic.py#L849 which are faster than xr-based versions of `mse` https://github.com/aaronspring/xskillscore/blob/150f7b9b2360750e6077036c7c3fd6e4439c60b6/xskillscore/xr/deterministic.py#L6 or `xr.corr`, see https://github.com/xarray-contrib/xskillscore/pull/231 **Additional context** My question here is whether it would be better to move these xskillscore metrics upstream into xarray, or to start a PR adding `weighted` and `skipna` to `xr.corr` (which I prefer).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4768/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1471561942,I_kwDOAMm_X85XtkDW,7342,"`xr.DataArray.plot.pcolormesh(robust=""col/row"")`",12237157,closed,0,,,3,2022-12-01T16:01:27Z,2022-12-12T12:17:45Z,2022-12-12T12:17:45Z,CONTRIBUTOR,,,,"### Is your feature request related to a problem? I often want to get a quick view of multi-dimensional data from an `xr.Dataset` with multiple variables at once in a one-liner. I really like the `robust=True` feature and think it could also allow `""col""` and `""row""` to be robust only across columns or rows. 
### Describe the solution you'd like ```python ds = xr.tutorial.load_dataset(""eraint_uvz"") ds.mean(""month"").to_array().plot(col=""level"", row=""variable"", robust=""row"") ``` What I get and do not like, because it applies robust either to all of the data or to none: ![image](https://user-images.githubusercontent.com/12237157/205099862-a74d2a75-c91f-4b01-b667-be367d01d01c.png) What I would like to see is what I always do manually, shown in the alternative below. ### Describe alternatives you've considered ```python ds = xr.tutorial.load_dataset(""eraint_uvz"") for v in ds.data_vars: ds[v].mean(""month"").plot(col=""level"", robust=True) plt.show() ``` ![image](https://user-images.githubusercontent.com/12237157/205099760-7fc2f7b5-c473-4818-9e9f-6d541f2712e6.png) ### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7342/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1071049280,I_kwDOAMm_X84_1upA,6045,`xr.infer_freq` month bug for `freq='6MS'` starting Jan becomes `freq='2QS-OCT'`,12237157,closed,0,,,3,2021-12-03T23:36:56Z,2022-06-24T22:58:47Z,2022-06-24T22:58:47Z,CONTRIBUTOR,,,,"**What happened**: @dougiesquire brought up https://github.com/pangeo-data/climpred/issues/698. During debugging I discovered unexpected behaviour in `xr.infer_freq`: `freq='6MS'` starting Jan becomes `freq='2QS-OCT'` **What you expected to happen**: `freq='6MS'` starting Jan becomes `freq='2QS-JAN'` **Minimal Complete Verifiable Example**: Creating a `6MS` index starting in Jan with pandas and xarray yields different `freq`. `2QS` and `6MS` are equivalent for quarter-starting months, but the `month` offset in `CFTimeIndex.freq` is wrong. 
```python import pandas as pd i_pd = pd.date_range(start=""2000-01-01"", end=""2002-01-01"", freq=""6MS"") i_pd DatetimeIndex(['2000-01-01', '2000-07-01', '2001-01-01', '2001-07-01', '2002-01-01'], dtype='datetime64[ns]', freq='6MS') pd.infer_freq(i_pd) '2QS-OCT' import xarray as xr xr.cftime_range(start=""2000-01-01"", end=""2002-01-01"", freq=""6MS"") CFTimeIndex([2000-01-01 00:00:00, 2000-07-01 00:00:00, 2001-01-01 00:00:00, 2001-07-01 00:00:00, 2002-01-01 00:00:00], dtype='object', length=5, calendar='gregorian', freq='2QS-OCT') ``` **Anything else we need to know?**: outline how to solve: https://github.com/pangeo-data/climpred/issues/698#issuecomment-985899966 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6045/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1214290591,I_kwDOAMm_X85IYJqf,6510,Feature request: raise more informative error message for `xr.open_dataset(list_of_paths)`,12237157,open,0,,,4,2022-04-25T10:22:25Z,2022-04-29T16:47:56Z,,CONTRIBUTOR,,,,"### Is your feature request related to a problem? I sometimes use `xr.open_dataset` instead of `xr.open_mfdataset` on multiple paths. I propose to raise a more informative error message than `ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4', 'h5netcdf', 'scipy', 'cfgrib']. Consider explicitly selecting one of the installed engines via the ``engine`` parameter, or installing additional IO dependencies, see: https://docs.xarray.dev/en/stable/getting-started-guide/installing.html https://docs.xarray.dev/en/stable/user-guide/io.html`. 
```python import xarray as xr xr.__version__ # '2022.3.0' ds = xr.tutorial.load_dataset(""air_temperature"") ds.isel(time=slice(None,1500)).to_netcdf(""file1.nc"") ds.isel(time=slice(1500,None)).to_netcdf(""file2.nc"") xr.open_mfdataset([""file1.nc"",""file2.nc""]) # works xr.open_mfdataset(""file?.nc"") # works # I understand what I need to do here xr.open_dataset(""file?.nc"") # fails FileNotFoundError: No such file or directory: b'/dir/file?.nc' # I don't understand here; I would also first check whether one of these files is corrupt xr.open_dataset([""file1.nc"",""file2.nc""]) # fails ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4', 'h5netcdf', 'scipy', 'cfgrib']. Consider explicitly selecting one of the installed engines via the ``engine`` parameter, or installing additional IO dependencies, see: links ``` ### Describe the solution you'd like directing the user towards the solution, i.e. ""found path as list, please use open_mfdataset"" ### Describe alternatives you've considered _No response_ ### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6510/reactions"", ""total_count"": 6, ""+1"": 6, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1092867975,I_kwDOAMm_X85BI9eH,6134,[FEATURE]: `CFTimeIndex.shift(float)`,12237157,closed,0,,,1,2022-01-03T22:33:58Z,2022-02-15T23:05:04Z,2022-02-15T23:05:04Z,CONTRIBUTOR,,,,"### Is your feature request related to a problem? `CFTimeIndex.shift()` allows only `int`, but sometimes I'd like to shift by a float, e.g. 0.5. For small freqs, that shouldn't be a problem as `pd.Timedelta` allows floats for `days` and below. For freqs of months and larger, it becomes more tricky. Fractional shifts work for `calendar=360` easily; for other `calendar`s that's not possible. 
### Describe the solution you'd like `CFTimeIndex.shift(0.5, 'D')` `CFTimeIndex.shift(0.5, 'M')` for the 360-day calendar `CFTimeIndex.shift(0.5, 'M')` for other calendars fails ### Describe alternatives you've considered the solution we have in climpred: https://github.com/pangeo-data/climpred/blob/617223b5bea23a094065efe46afeeafe9796fa97/climpred/utils.py#L657 ### Additional context https://xarray.pydata.org/en/stable/generated/xarray.CFTimeIndex.shift.html","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6134/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 1093466537,PR_kwDOAMm_X84wg_Js,6135,Implement multiplication of cftime Tick offsets by floats,12237157,closed,0,,,7,2022-01-04T15:28:16Z,2022-02-15T23:05:04Z,2022-02-15T23:05:04Z,CONTRIBUTOR,,0,pydata/xarray/pulls/6135," - [x] Closes #6134 - [x] Tests added - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [x] ~~New functions/methods are listed in `api.rst`~~ --- - `shift` allows `float` with freq `D`, `H`, `min`, `S`, `ms` --- Refs: - https://docs.python.org/3/library/datetime.html - https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timedelta.html#pandas.Timedelta - https://xarray.pydata.org/en/stable/generated/xarray.CFTimeIndex.shift.html ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6135/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 1120583442,I_kwDOAMm_X85Cyr8S,6230,[PERFORMANCE]: `isin` on `CFTimeIndex`-backed `Coordinate` slow ,12237157,open,0,,,5,2022-02-01T12:04:02Z,2022-02-07T23:40:48Z,,CONTRIBUTOR,,,,"### Is your feature request related to a problem? I want to do `coord1.isin(coord2)` and it is quite slow when coords are large and of object type `CFTimeIndex`. 
```python import xarray as xr import numpy as np n=1000 coord1 = xr.cftime_range(start='2000', freq='MS', periods=n) coord2 = xr.cftime_range(start='2000', freq='3MS', periods=n) # cftimeindex: very fast %timeit coord1.isin(coord2) # 743 µs ± 1.33 µs # np.isin on index.asi8 %timeit np.isin(coord1.asi8,coord2.asi8) # 7.83 ms ± 14.1 µs da = xr.DataArray(np.random.random((n,n)),dims=['a','b'],coords={'a':coord1,'b':coord2}) # when xr.DataArray coordinate slow %timeit da.a.isin(da.b) # 94.9 ms ± 959 µs # when converting xr.DataArray coordinate back to index slow %timeit np.isin(da.a.to_index(), da.b.to_index()) # 97.4 ms ± 819 µs # when converting xr.DataArray coordinate back to index asi %timeit np.isin(da.a.to_index().asi8, da.b.to_index().asi8) # 7.89 ms ± 15.2 µs ``` ### Describe the solution you'd like faster `coord1.isin(coord2)` by default. Could we re-route here, e.g. to the alternative? The conversion from the coordinate via `to_index()` is costly, I guess. ### Describe alternatives you've considered `np.isin(coord1.to_index().asi8, coord2.to_index().asi8)` brings me nice speedups in https://github.com/pangeo-data/climpred/pull/724 ### Additional context unsure whether this issue should go here or in `cftime`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6230/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1119996723,PR_kwDOAMm_X84x3fWS,6223,`GHA` `concurrency` followup,12237157,closed,0,,,1,2022-01-31T22:21:09Z,2022-01-31T23:16:20Z,2022-01-31T23:16:20Z,CONTRIBUTOR,,0,pydata/xarray/pulls/6223,follows #6210 ,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6223/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 1118564242,PR_kwDOAMm_X84xywhB,6210,`GHA` 
`concurrency`,12237157,closed,0,,,3,2022-01-30T14:56:01Z,2022-01-31T22:25:27Z,2022-01-31T16:59:27Z,CONTRIBUTOR,,0,pydata/xarray/pulls/6210," - [x] Closes #5190 - [ ] Tests added - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#concurrency `concurrency` instead of `cancel-duplicate-runs.yaml`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6210/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 1071054456,PR_kwDOAMm_X84vYnOq,6046,Fix `xr.infer_freq` quarterly month,12237157,closed,0,,,0,2021-12-03T23:48:43Z,2022-01-04T13:54:49Z,2022-01-04T13:54:49Z,CONTRIBUTOR,,1,pydata/xarray/pulls/6046," - [ ] Closes #6045 - [ ] Tests added - [x] Passes `pre-commit run --all-files` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` ~~- [ ] New functions/methods are listed in `api.rst`~~ ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6046/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 1058047751,PR_kwDOAMm_X84uv1d0,6007,Use conda's dask-core in ci instead of dask to speed up ci and reduce dependencies,12237157,closed,0,,,1,2021-11-19T02:02:41Z,2021-11-28T21:01:36Z,2021-11-28T04:40:34Z,CONTRIBUTOR,,0,pydata/xarray/pulls/6007," - [ ] Closes #xxxx - [ ] Tests added - [ ] Passes `pre-commit run --all-files` - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` Tried to reduce the installed dependencies: installing dask via conda is like `pip install dask[complete]`, whereas dask-core is like `pip install dask`. 
https://github.com/xgcm/xhistogram/pull/71#discussion_r752738286 Why? dask[complete] includes bokeh etc., which are not needed here; dropping them likely speeds up CI setup/install times, but now both dask and dask-core are conda installed :( seems like iris installs dask https://github.com/conda-forge/iris-feedstock/blob/master/recipe/meta.yaml, so this would require an iris-feedstock PR first linking https://github.com/SciTools/iris/pull/4434 and https://github.com/conda-forge/iris-feedstock/pull/77 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6007/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 954308458,MDExOlB1bGxSZXF1ZXN0Njk4MjI0Mjcx,5639,Del duplicate set_options in api.rst,12237157,closed,0,,,3,2021-07-27T22:19:38Z,2021-07-30T08:47:36Z,2021-07-30T08:20:15Z,CONTRIBUTOR,,0,pydata/xarray/pulls/5639," - [x] Passes `pre-commit run --all-files` - [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5639/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 827561388,MDExOlB1bGxSZXF1ZXN0NTg5NDU5NDQ1,5020,add polyval to polyfit see also,12237157,closed,0,,,1,2021-03-10T11:14:02Z,2021-03-10T14:20:11Z,2021-03-10T12:59:41Z,CONTRIBUTOR,,0,pydata/xarray/pulls/5020," - [x] Closes #5016 - [ ] Tests added - [x] Passes `pre-commit run --all-files` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [ ] New functions/methods are listed in `api.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5020/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 
748094631,MDExOlB1bGxSZXF1ZXN0NTI1MTgzOTQ5,4597,add freq as CFTimeIndex property and to CFTimeIndex.__repr__,12237157,closed,0,,,11,2020-11-21T20:12:36Z,2020-11-25T09:16:49Z,2020-11-24T21:53:27Z,CONTRIBUTOR,,0,pydata/xarray/pulls/4597," - [x] Closes #2416 - [x] Tests added - [x] Passes `isort . && black . && mypy . && flake8` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [x] New functions/methods are listed in `api.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4597/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 707223289,MDU6SXNzdWU3MDcyMjMyODk=,4451,xr.open_dataset(remote_url) file not found,12237157,closed,0,,,1,2020-09-23T10:00:54Z,2020-09-23T12:03:37Z,2020-09-23T12:03:37Z,CONTRIBUTOR,,,,"**What happened**: I tried to open a remote url and got OSError, but !wget url works **What you expected to happen**: open the remote netcdf file **Minimal Complete Verifiable Example**: ```python from netCDF4 import Dataset import netCDF4 netCDF4.__version__ import xarray as xr xr.__version__ url='https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc' # working_url='https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p5deg/GFS_Global_0p5deg_20200923_0000.grib2' xr.open_dataset(url) ... 
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__() netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success() OSError: [Errno -90] NetCDF: file not found: b'https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc' # seems to be netcdf4 upstream issue Dataset(url) --------------------------------------------------------------------------- OSError Traceback (most recent call last) in ----> 1 Dataset(url) netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__() netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success() OSError: [Errno -90] NetCDF: file not found: b'https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc' ``` **Anything else we need to know?**: **Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 2.6.32-754.29.2.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.6.2 xarray: 0.16.1 pandas: 1.1.2 numpy: 1.19.1 scipy: 1.5.2 netCDF4: 1.5.1.2 pydap: installed h5netcdf: 0.8.0 h5py: 2.10.0 Nio: 1.5.5 zarr: 2.4.0 cftime: 1.2.1 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: 1.1.0 cfgrib: 0.9.7.6 iris: 2.2.0 bottleneck: 1.3.1 dask: 2.15.0 distributed: 2.20.0 matplotlib: 3.1.2 cartopy: 0.17.0 seaborn: 0.10.1 numbagg: None pint: 0.11 setuptools: 47.1.1.post20200529 pip: 20.2.3 conda: None pytest: 5.3.5 IPython: 7.15.0 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4451/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 668717850,MDU6SXNzdWU2Njg3MTc4NTA=,4290,bool(Dataset(False)) is True,12237157,closed,0,,,9,2020-07-30T13:23:14Z,2020-08-05T14:25:55Z,2020-08-05T13:48:55Z,CONTRIBUTOR,,,,"**What happened**: ```python v=True bool(xr.DataArray(v)) # True bool(xr.DataArray(v).to_dataset(name='var')) # True v=False bool(xr.DataArray(v)) # False # unexpected behaviour below bool(xr.DataArray(v).to_dataset(name='var')) # True ``` **What you expected to happen**: ```python bool(xr.DataArray(False).to_dataset(name='var')) # False ``` Maybe this is intentional and I dont understand why. xr.__version__ = '0.16.0' ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4290/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 624378150,MDExOlB1bGxSZXF1ZXN0NDIyODEzOTYy,4092,CFTimeIndex calendar in repr,12237157,closed,0,,,19,2020-05-25T15:55:20Z,2020-07-23T17:38:39Z,2020-07-23T10:42:29Z,CONTRIBUTOR,,0,pydata/xarray/pulls/4092," - [x] Closes #2416 - [x] Tests added - [x] Passes `isort -rc . && black . && mypy . 
&& flake8` - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API Done: - added `calendar` property to `CFTimeIndex` - rebuilt `__repr__` from pandas ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4092/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 611839345,MDU6SXNzdWU2MTE4MzkzNDU=,4025,Visualize task tree,12237157,closed,0,,,3,2020-05-04T12:31:25Z,2020-05-08T09:10:08Z,2020-05-04T14:43:25Z,CONTRIBUTOR,,,,"While reading this excellent discussion on working with large one-timestep datasets https://discourse.pangeo.io/t/best-practices-to-go-from-1000s-of-netcdf-files-to-analyses-on-a-hpc-cluster/588/10 I asked myself again why we don’t have the task tree visualisation in xarray as we have in dask. Is there a technical reason that prevents us from implementing visualize? This feature would be extremely useful for me. Maybe it’s easier to do this for dataarrays first. ```python # ds = rasm Tutorial ds = ds.chunk({""time"":2}) ds.visualize() ``` #### Expected Output Figure of the task tree https://docs.dask.org/en/latest/graphviz.html #### Problem Description visualizing the task tree is only implemented in dask. Currently I recreate my xarray problem in dask to work around this. Nicer would be `.visualize()` in xarray. 
https://discourse.pangeo.io/t/best-practices-to-go-from-1000s-of-netcdf-files-to-analyses-on-a-hpc-cluster/588/10 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4025/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 577105538,MDExOlB1bGxSZXF1ZXN0Mzg0OTY0MDcz,3844,Implement skipna kwarg in xr.quantile,12237157,closed,0,,,5,2020-03-06T18:36:55Z,2020-03-09T09:46:25Z,2020-03-08T17:42:44Z,CONTRIBUTOR,,0,pydata/xarray/pulls/3844," - [x] Closes #3843 - [x] Tests added - [x] Passes `isort -rc . && black . && mypy . && flake8` - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3844/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 577088426,MDU6SXNzdWU1NzcwODg0MjY=,3843,Implement `skipna` in xr.quantile for speedup,12237157,closed,0,,,1,2020-03-06T17:58:28Z,2020-03-08T17:42:43Z,2020-03-08T17:42:43Z,CONTRIBUTOR,,,,"`xr.quantile` uses `np.nanquantile`, which is slower than `np.quantile` but only needed when NaNs have to be ignored. Adding `skipna` as a kwarg would lead to a speedup for many use-cases. 
#### MCVE Code Sample `np.quantile` is much faster than `np.nanquantile` ```python control = xr.DataArray(np.random.random((50,256,192)),dims=['time','x','y']) %time _ = control.quantile(dim='time',q=q) CPU times: user 4.14 s, sys: 61.4 ms, total: 4.2 s Wall time: 4.3 s %time _ = np.quantile(control,q,axis=0) CPU times: user 47.1 ms, sys: 4.27 ms, total: 51.4 ms Wall time: 52.6 ms %time _ = np.nanquantile(control,q,axis=0) CPU times: user 3.18 s, sys: 21.4 ms, total: 3.2 s Wall time: 3.22 s ``` #### Expected Output a faster `xr.quantile`: ``` %time _ = control.quantile(dim='time',q=q) CPU times: user 4.95 s, sys: 34.3 ms, total: 4.98 s Wall time: 5.88 s %time _ = control.quantile(dim='time',q=q, skipna=False) CPU times: user 85.3 ms, sys: 16.7 ms, total: 102 ms Wall time: 127 ms ``` #### Problem Description `np.nanquantile` is not always needed #### Versions
Output of `xr.show_versions()` xr=0.15.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3843/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 433833707,MDU6SXNzdWU0MzM4MzM3MDc=,2900,open_mfdataset with preprocess ds[var],12237157,closed,0,,,3,2019-04-16T15:07:36Z,2019-04-16T19:09:34Z,2019-04-16T19:09:34Z,CONTRIBUTOR,,,,"#### Code Sample, a copy-pastable example if possible I would like to load only one variable from larger files containing 10s of variables. The files get really large when I open them. I expect them to be opened lazily and also fast if I only want to extract one variable (maybe this is my misunderstanding). I hoped to use `preprocess`, but I don't get it working. Here is my minimal example with 3 files of 12 timesteps each and two variables, of which I only want to load one: ```python ds = xr.open_mfdataset(path) ds Dimensions: (depth: 1, depth_2: 1, time: 36, x: 2, y: 2) Coordinates: * depth (depth) float64 0.0 lon (y, x) float64 -48.11 -47.43 -48.21 -47.52 lat (y, x) float64 56.52 56.47 56.14 56.09 * depth_2 (depth_2) float64 90.0 * time (time) datetime64[ns] 1850-01-31T23:15:00 ... 
1852-12-31T23:15:00 Dimensions without coordinates: x, y Data variables: co2flux (time, depth, y, x) float32 dask.array caex90 (time, depth_2, y, x) float32 dask.array def preprocess(ds,var='co2flux'): return ds[var] ds = xr.open_mfdataset(path,preprocess=preprocess) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in 1 def preprocess(ds,var='co2flux'): 2 return ds[var] ----> 3 ds = xr.open_mfdataset(path,preprocess=preprocess) /work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, lock, data_vars, coords, autoclose, parallel, **kwargs) 717 data_vars=data_vars, coords=coords, 718 infer_order_from_coords=infer_order_from_coords, --> 719 ids=ids) 720 except ValueError: 721 for ds in datasets: /work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _auto_combine(datasets, concat_dims, compat, data_vars, coords, infer_order_from_coords, ids) 551 # Repeatedly concatenate then merge along each dimension 552 combined = _combine_nd(combined_ids, concat_dims, compat=compat, --> 553 data_vars=data_vars, coords=coords) 554 return combined 555 /work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat) 473 data_vars=data_vars, 474 coords=coords, --> 475 compat=compat) 476 combined_ds = list(combined_ids.values())[0] 477 return combined_ds /work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _auto_combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat) 491 datasets = combined_ids.values() 492 new_combined_ids[new_id] = _auto_combine_1d(datasets, dim, compat, --> 493 data_vars, coords) 494 return new_combined_ids 495 
/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _auto_combine_1d(datasets, concat_dim, compat, data_vars, coords) 505 if concat_dim is not None: 506 dim = None if concat_dim is _CONCAT_DIM_DEFAULT else concat_dim --> 507 sorted_datasets = sorted(datasets, key=vars_as_keys) 508 grouped_by_vars = itertools.groupby(sorted_datasets, key=vars_as_keys) 509 concatenated = [_auto_concat(list(ds_group), dim=dim, /work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in vars_as_keys(ds) 496 497 def vars_as_keys(ds): --> 498 return tuple(sorted(ds)) 499 500 /work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/common.py in __bool__(self) 80 81 def __bool__(self): ---> 82 return bool(self.values) 83 84 # Python 3 uses __bool__, Python 2 uses __nonzero__ ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() ``` I was hoping that `data_vars` could work like this but it has no effect. Probably I got the documentation wrong here. ```python ds = xr.open_mfdataset(path,data_vars=['co2flux']) ds Dimensions: (depth: 1, depth_2: 1, time: 36, x: 2, y: 2) Coordinates: * depth (depth) float64 0.0 lon (y, x) float64 -48.11 -47.43 -48.21 -47.52 lat (y, x) float64 56.52 56.47 56.14 56.09 * depth_2 (depth_2) float64 90.0 * time (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00 Dimensions without coordinates: x, y Data variables: co2flux (time, depth, y, x) float32 dask.array caex90 (time, depth_2, y, x) float32 dask.array ``` #### Problem description I would expect from the documentation the below behaviour. 
#### Expected Output ```python ds = xr.open_mfdataset(path,data_vars=['co2flux']) ds Dimensions: (depth: 1, depth_2: 1, time: 36, x: 2, y: 2) Coordinates: * depth (depth) float64 0.0 lon (y, x) float64 -48.11 -47.43 -48.21 -47.52 lat (y, x) float64 56.52 56.47 56.14 56.09 * depth_2 (depth_2) float64 90.0 * time (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00 Dimensions without coordinates: x, y Data variables: co2flux (time, depth, y, x) float32 dask.array ds = xr.open_mfdataset(path,preprocess=preprocess) ds Dimensions: (depth: 1, depth_2: 1, time: 36, x: 2, y: 2) Coordinates: * depth (depth) float64 0.0 lon (y, x) float64 -48.11 -47.43 -48.21 -47.52 lat (y, x) float64 56.52 56.47 56.14 56.09 * depth_2 (depth_2) float64 90.0 * time (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00 Dimensions without coordinates: x, y Data variables: co2flux (time, depth, y, x) float32 dask.array ``` #### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 |Anaconda, Inc.| (default, Oct 23 2018, 19:16:44) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 2.6.32-696.18.7.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.12.1 pandas: 0.24.2 numpy: 1.14.2 scipy: 1.2.1 netCDF4: 1.4.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.2.0 cftime: 1.0.3.4 nc_time_axis: None PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.2.1 dask: 1.2.0 distributed: 1.27.0 matplotlib: 3.0.3 cartopy: 0.17.0 seaborn: 0.9.0 setuptools: 40.4.0 pip: 18.1 conda: None pytest: None IPython: 7.0.1 sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2900/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue