issues


21 rows where user = 12237157 sorted by updated_at descending


Facets: type (issue 11, pull 10); state (closed 19, open 2); repo (xarray 21)
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
779392905 MDU6SXNzdWU3NzkzOTI5MDU= 4768 weighted for xr.corr aaronspring 12237157 closed 0     2 2021-01-05T18:24:29Z 2023-12-12T00:24:22Z 2023-12-12T00:24:22Z CONTRIBUTOR      

Is your feature request related to a problem? Please describe.

I want to compute a weighted correlation, e.g. a spatial correlation that is weighted: `xr.corr(fct, obs, dim=['lon', 'lat'], weights=np.cos(np.abs(fct.lat)))`. So far, xr.corr accepts neither a weights argument nor weighted input (`input.weighted(weights)`). A more straightforward case would be weighting different members: `xr.corr(fct, obs, dim='member', weights=np.arange(fct.member.size))`.

Describe the solution you'd like

We started xskillscore (https://github.com/xarray-contrib/xskillscore) some time ago, before xr.corr was implemented, and have the keywords weighted, skipna and keep_attrs implemented there. We also have xs.rmse, xs.mse, ... implemented via xr.apply_ufunc (https://github.com/aaronspring/xskillscore/blob/150f7b9b2360750e6077036c7c3fd6e4439c60b6/xskillscore/core/deterministic.py#L849), which are faster than the xr-based versions of mse (https://github.com/aaronspring/xskillscore/blob/150f7b9b2360750e6077036c7c3fd6e4439c60b6/xskillscore/xr/deterministic.py#L6) or xr.corr; see https://github.com/xarray-contrib/xskillscore/pull/231.

Additional context

My question here is whether it would be better to upstream these xskillscore metrics into xarray, or to start a PR adding weighted and skipna to xr.corr (which I prefer).
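The math behind the request can be sketched in plain NumPy. This is a hypothetical helper (`weighted_corr` is not part of the xarray or xskillscore API shown here): a weighted covariance divided by the product of weighted standard deviations.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Weighted Pearson correlation of two 1-D arrays.

    Hypothetical illustration of the requested feature, not xarray API:
    weighted covariance over the product of weighted standard deviations.
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                              # normalize weights
    mx, my = np.sum(w * x), np.sum(w * y)        # weighted means
    cov = np.sum(w * (x - mx) * (y - my))        # weighted covariance
    sx = np.sqrt(np.sum(w * (x - mx) ** 2))      # weighted std devs
    sy = np.sqrt(np.sum(w * (y - my) ** 2))
    return cov / (sx * sy)
```

With uniform weights this reduces to the ordinary Pearson correlation.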

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4768/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1471561942 I_kwDOAMm_X85XtkDW 7342 `xr.DataArray.plot.pcolormesh(robust="col/row")` aaronspring 12237157 closed 0     3 2022-12-01T16:01:27Z 2022-12-12T12:17:45Z 2022-12-12T12:17:45Z CONTRIBUTOR      

Is your feature request related to a problem?

I often want to get a quick view of multi-dimensional data from an xr.Dataset with multiple variables, all at once in a one-liner. I really like the robust=True feature and think it could also allow "col" and "row" to apply robust only across columns or rows.

Describe the solution you'd like

```python
ds = xr.tutorial.load_dataset("eraint_uvz")
ds.mean("month").to_array().plot(col="level", row="variable", robust="row")
```

What I get, and do not like, because robust is applied either to all data or not at all:

What I would like to see: see the alternative below, which is what I always do.

Describe alternatives you've considered

```python
ds = xr.tutorial.load_dataset("eraint_uvz")
for v in ds.data_vars:
    ds[v].mean("month").plot(col="level", robust=True)
    plt.show()
```

Additional context

No response
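For reference, xarray's robust=True clips the colormap to the 2nd and 98th percentiles of all plotted data. A hypothetical robust="row" could compute those limits per facet row instead; a minimal sketch (the function name and its data layout, one row per leading axis entry, are assumptions):

```python
import numpy as np

def robust_limits_per_row(data, q=(2, 98)):
    """Per-facet-row color limits from the 2nd/98th percentiles.

    Sketch of what a hypothetical robust="row" could compute; not xarray API.
    data: array of shape (nrows, ...); returns one (vmin, vmax) per row.
    """
    return [
        (np.nanpercentile(row, q[0]), np.nanpercentile(row, q[1]))
        for row in data
    ]
```

Each row then gets its own vmin/vmax instead of sharing one pair across the whole facet grid.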

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7342/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1071049280 I_kwDOAMm_X84_1upA 6045 `xr.infer_freq` month bug for `freq='6MS'` starting Jan becomes `freq='2QS-OCT'` aaronspring 12237157 closed 0     3 2021-12-03T23:36:56Z 2022-06-24T22:58:47Z 2022-06-24T22:58:47Z CONTRIBUTOR      

What happened:

@dougiesquire brought up https://github.com/pangeo-data/climpred/issues/698. During debugging I discovered unexpected behaviour in xr.infer_freq: freq='6MS' starting Jan becomes freq='2QS-OCT'

What you expected to happen: freq='6MS' starting Jan becomes freq='2QS-JAN'

Minimal Complete Verifiable Example:

Creating a 6MS index starting in Jan with pandas and xarray yields a different freq. 2QS and 6MS are equivalent for quarter starting months, but the month anchor in CFTimeIndex.freq is wrong.

```python
import pandas as pd

i_pd = pd.date_range(start="2000-01-01", end="2002-01-01", freq="6MS")
i_pd
# DatetimeIndex(['2000-01-01', '2000-07-01', '2001-01-01', '2001-07-01', '2002-01-01'],
#               dtype='datetime64[ns]', freq='6MS')

pd.infer_freq(i_pd)
# '2QS-OCT'

import xarray as xr
xr.cftime_range(start="2000-01-01", end="2002-01-01", freq="6MS")
# CFTimeIndex([2000-01-01 00:00:00, 2000-07-01 00:00:00, 2001-01-01 00:00:00,
#              2001-07-01 00:00:00, 2002-01-01 00:00:00],
#             dtype='object', length=5, calendar='gregorian', freq='2QS-OCT')
```

Anything else we need to know?:

An outline of how to solve this: https://github.com/pangeo-data/climpred/issues/698#issuecomment-985899966

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6045/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1214290591 I_kwDOAMm_X85IYJqf 6510 Feature request: raise more informative error message for `xr.open_dataset(list_of_paths)` aaronspring 12237157 open 0     4 2022-04-25T10:22:25Z 2022-04-29T16:47:56Z   CONTRIBUTOR      

Is your feature request related to a problem?

I sometimes use xr.open_dataset instead of xr.open_mfdataset on multiple paths. I propose raising a more informative error message than: ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4', 'h5netcdf', 'scipy', 'cfgrib']. Consider explicitly selecting one of the installed engines via the ``engine`` parameter, or installing additional IO dependencies, see: https://docs.xarray.dev/en/stable/getting-started-guide/installing.html https://docs.xarray.dev/en/stable/user-guide/io.html

```python
import xarray as xr

xr.__version__  # '2022.3.0'

ds = xr.tutorial.load_dataset("air_temperature")

ds.isel(time=slice(None, 1500)).to_netcdf("file1.nc")
ds.isel(time=slice(1500, None)).to_netcdf("file2.nc")

xr.open_mfdataset(["file1.nc", "file2.nc"])  # works
xr.open_mfdataset("file?.nc")  # works

# I understand what I need to do here:
xr.open_dataset("file?.nc")  # fails
# FileNotFoundError: No such file or directory: b'/dir/file?.nc'

# I don't here; I also first try to check whether one of these files is corrupt:
xr.open_dataset(["file1.nc", "file2.nc"])  # fails
# ValueError: did not find a match in any of xarray's currently installed IO backends
# ['netcdf4', 'h5netcdf', 'scipy', 'cfgrib']. Consider explicitly selecting one of the
# installed engines via the engine parameter, or installing additional IO dependencies,
# see: links
```

Describe the solution you'd like

Directing the user towards the solution, i.e. "found path as list, please use open_mfdataset".

Describe alternatives you've considered

No response

Additional context

No response
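The proposed check could look like this sketch (the helper name `check_single_path` is hypothetical, not the actual xarray implementation): fail early with a pointer to open_mfdataset when a sequence of paths reaches a single-file API.

```python
def check_single_path(path):
    """Sketch of the proposed guard for open_dataset (hypothetical name,
    not xarray code): raise an informative error when a sequence of
    paths is passed to an API that accepts only a single path."""
    if isinstance(path, (list, tuple, set)):
        raise TypeError(
            "open_dataset received a sequence of paths; "
            "please use open_mfdataset to open multiple files."
        )
    return path
```

The error type and wording are illustrative; the point is that the message names the API the user most likely wanted.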

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6510/reactions",
    "total_count": 6,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1092867975 I_kwDOAMm_X85BI9eH 6134 [FEATURE]: `CFTimeIndex.shift(float)` aaronspring 12237157 closed 0     1 2022-01-03T22:33:58Z 2022-02-15T23:05:04Z 2022-02-15T23:05:04Z CONTRIBUTOR      

Is your feature request related to a problem?

CFTimeIndex.shift() allows only int, but sometimes I'd like to shift by a float, e.g. 0.5.

For small freqs, that shouldn't be a problem, as pd.Timedelta allows floats for days and below. For freqs of months and larger, it becomes more tricky. Fractional shifts work easily for the 360-day calendar; for other calendars that's not possible.

Describe the solution you'd like

  • `CFTimeIndex.shift(0.5, 'D')`
  • `CFTimeIndex.shift(0.5, 'M')` for the 360-day calendar
  • `CFTimeIndex.shift(0.5, 'M')` for other calendars fails

Describe alternatives you've considered

solution we have in climpred: https://github.com/pangeo-data/climpred/blob/617223b5bea23a094065efe46afeeafe9796fa97/climpred/utils.py#L657

Additional context

https://xarray.pydata.org/en/stable/generated/xarray.CFTimeIndex.shift.html
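For fixed-length frequencies the request is straightforward because pd.Timedelta already accepts floats; a minimal sketch of the behaviour on a plain DatetimeIndex (the helper `fractional_shift` is an assumption, not CFTimeIndex.shift itself):

```python
import pandas as pd

def fractional_shift(index, n, freq="D"):
    """Shift an index by a possibly fractional number of periods.

    Sketch of the requested behaviour for fixed-length freqs (days and
    below), where pd.Timedelta accepts floats; not CFTimeIndex.shift.
    """
    return index + pd.Timedelta(n, unit=freq)

idx = pd.date_range("2000-01-01", periods=3, freq="D")
shifted = fractional_shift(idx, 0.5)  # shifts every timestamp by 12 hours
```

Months and years have no fixed length, which is why the same trick cannot work there outside the 360-day calendar.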

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6134/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1093466537 PR_kwDOAMm_X84wg_Js 6135 Implement multiplication of cftime Tick offsets by floats aaronspring 12237157 closed 0     7 2022-01-04T15:28:16Z 2022-02-15T23:05:04Z 2022-02-15T23:05:04Z CONTRIBUTOR   0 pydata/xarray/pulls/6135
  • [x] Closes #6134
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] ~~New functions/methods are listed in api.rst~~

  • shift allows float with freq D, H, min, S, ms

Refs: - https://docs.python.org/3/library/datetime.html - https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timedelta.html#pandas.Timedelta - https://xarray.pydata.org/en/stable/generated/xarray.CFTimeIndex.shift.html

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6135/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1120583442 I_kwDOAMm_X85Cyr8S 6230 [PERFORMANCE]: `isin` on `CFTimeIndex`-backed `Coordinate` slow aaronspring 12237157 open 0     5 2022-02-01T12:04:02Z 2022-02-07T23:40:48Z   CONTRIBUTOR      

Is your feature request related to a problem?

I want to do coord1.isin(coord2) and it is quite slow when the coords are large and of object dtype backed by a CFTimeIndex.

```python
import xarray as xr
import numpy as np

n = 1000
coord1 = xr.cftime_range(start='2000', freq='MS', periods=n)
coord2 = xr.cftime_range(start='2000', freq='3MS', periods=n)

# cftimeindex: very fast
%timeit coord1.isin(coord2)  # 743 µs ± 1.33 µs

# np.isin on index.asi8
%timeit np.isin(coord1.asi8, coord2.asi8)  # 7.83 ms ± 14.1 µs

da = xr.DataArray(np.random.random((n, n)), dims=['a', 'b'],
                  coords={'a': coord1, 'b': coord2})

# as xr.DataArray coordinate: slow
%timeit da.a.isin(da.b)  # 94.9 ms ± 959 µs

# converting the xr.DataArray coordinate back to an index: slow
%timeit np.isin(da.a.to_index(), da.b.to_index())  # 97.4 ms ± 819 µs

# converting the xr.DataArray coordinate back to an index and using asi8: fast
%timeit np.isin(da.a.to_index().asi8, da.b.to_index().asi8)  # 7.89 ms ± 15.2 µs
```

Describe the solution you'd like

faster coord1.isin(coord2) by default. Could we re-route here, e.g. to the alternative?

I guess the conversion from coordinate via to_index() is costly.

Describe alternatives you've considered

np.isin(coord1.to_index().asi8, coord2.to_index().asi8) brings me nice speedups in https://github.com/pangeo-data/climpred/pull/724

Additional context

Unsure whether this issue should go here or in cftime.
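The workaround above can be wrapped in a tiny helper. This sketch uses pandas DatetimeIndex, whose .asi8 integer (nanosecond) representation is analogous to CFTimeIndex.asi8; the helper name is an assumption:

```python
import numpy as np
import pandas as pd

def fast_isin(idx1, idx2):
    """isin via the integer (nanosecond) representation of datetime-like
    indexes, mirroring the np.isin(....asi8, ....asi8) workaround above;
    a sketch, not library code (CFTimeIndex.asi8 is the analogous attribute)."""
    return np.isin(idx1.asi8, idx2.asi8)

i1 = pd.date_range("2000", periods=12, freq="MS")
i2 = pd.date_range("2000", periods=12, freq="3MS")
mask = fast_isin(i1, i2)  # True for Jan, Apr, Jul, Oct 2000
```

Comparing int64 arrays avoids the element-wise comparison of object-dtype timestamps, which is where the slowdown comes from.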

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6230/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1119996723 PR_kwDOAMm_X84x3fWS 6223 `GHA` `concurrency` followup aaronspring 12237157 closed 0     1 2022-01-31T22:21:09Z 2022-01-31T23:16:20Z 2022-01-31T23:16:20Z CONTRIBUTOR   0 pydata/xarray/pulls/6223

follows #6210

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6223/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1118564242 PR_kwDOAMm_X84xywhB 6210 `GHA` `concurrency` aaronspring 12237157 closed 0     3 2022-01-30T14:56:01Z 2022-01-31T22:25:27Z 2022-01-31T16:59:27Z CONTRIBUTOR   0 pydata/xarray/pulls/6210
  • [x] Closes #5190
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#concurrency

Use concurrency instead of cancel-duplicate-runs.yaml.
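For reference, a minimal sketch of the GHA setting this PR adopts; the group expression below is the common pattern from the GitHub Actions docs, not necessarily the exact one used in the PR:

```yaml
# cancel in-progress runs of the same workflow on the same ref
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```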

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6210/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1071054456 PR_kwDOAMm_X84vYnOq 6046 Fix `xr.infer_freq` quarterly month aaronspring 12237157 closed 0     0 2021-12-03T23:48:43Z 2022-01-04T13:54:49Z 2022-01-04T13:54:49Z CONTRIBUTOR   1 pydata/xarray/pulls/6046
  • [ ] Closes #6045
  • [ ] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6046/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1058047751 PR_kwDOAMm_X84uv1d0 6007 Use condas dask-core in ci instead of dask to speedup ci and reduce dependencies aaronspring 12237157 closed 0     1 2021-11-19T02:02:41Z 2021-11-28T21:01:36Z 2021-11-28T04:40:34Z CONTRIBUTOR   0 pydata/xarray/pulls/6007
  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

Tried to reduce dependencies: installing dask via conda installs the equivalent of pip install "dask[complete]", whereas dask-core is the equivalent of pip install dask. https://github.com/xgcm/xhistogram/pull/71#discussion_r752738286

Why? dask[complete] includes bokeh etc., which are not needed here; dropping them should speed up CI setup/install times.

But now both dask and dask-core are conda-installed :( It seems iris installs dask (https://github.com/conda-forge/iris-feedstock/blob/master/recipe/meta.yaml), so this would require an iris-feedstock PR first.

linking https://github.com/SciTools/iris/pull/4434 and https://github.com/conda-forge/iris-feedstock/pull/77

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6007/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
954308458 MDExOlB1bGxSZXF1ZXN0Njk4MjI0Mjcx 5639 Del duplicate set_options in api.rst aaronspring 12237157 closed 0     3 2021-07-27T22:19:38Z 2021-07-30T08:47:36Z 2021-07-30T08:20:15Z CONTRIBUTOR   0 pydata/xarray/pulls/5639
  • [x] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5639/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
827561388 MDExOlB1bGxSZXF1ZXN0NTg5NDU5NDQ1 5020 add polyval to polyfit see also aaronspring 12237157 closed 0     1 2021-03-10T11:14:02Z 2021-03-10T14:20:11Z 2021-03-10T12:59:41Z CONTRIBUTOR   0 pydata/xarray/pulls/5020
  • [x] Closes #5016
  • [ ] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5020/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
748094631 MDExOlB1bGxSZXF1ZXN0NTI1MTgzOTQ5 4597 add freq as CFTimeIndex property and to CFTimeIndex.__repr__ aaronspring 12237157 closed 0     11 2020-11-21T20:12:36Z 2020-11-25T09:16:49Z 2020-11-24T21:53:27Z CONTRIBUTOR   0 pydata/xarray/pulls/4597
  • [x] Closes #2416
  • [x] Tests added
  • [x] Passes isort . && black . && mypy . && flake8
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4597/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
707223289 MDU6SXNzdWU3MDcyMjMyODk= 4451 xr.open_dataset(remote_url) file not found aaronspring 12237157 closed 0     1 2020-09-23T10:00:54Z 2020-09-23T12:03:37Z 2020-09-23T12:03:37Z CONTRIBUTOR      

What happened:

I tried to open a remote URL and got an OSError, but !wget url works.

What you expected to happen:

open the remote netcdf file

Minimal Complete Verifiable Example:

```python
from netCDF4 import Dataset

import netCDF4
netCDF4.__version__

import xarray as xr
xr.__version__

url = 'https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc'

working_url = 'https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p5deg/GFS_Global_0p5deg_20200923_0000.grib2'

xr.open_dataset(url)
# ...
# netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()
# netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()
# OSError: [Errno -90] NetCDF: file not found: b'https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc'

# seems to be a netcdf4 upstream issue
Dataset(url)
# OSError                                   Traceback (most recent call last)
# <ipython-input-14-265839034cee> in <module>
# ----> 1 Dataset(url)
# netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()
# netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()
# OSError: [Errno -90] NetCDF: file not found: b'https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc'
```

Anything else we need to know?:

Environment:

Output of `xr.show_versions()`:

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 2.6.32-754.29.2.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.2
xarray: 0.16.1
pandas: 1.1.2
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: 1.5.5
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.1.0
cfgrib: 0.9.7.6
iris: 2.2.0
bottleneck: 1.3.1
dask: 2.15.0
distributed: 2.20.0
matplotlib: 3.1.2
cartopy: 0.17.0
seaborn: 0.10.1
numbagg: None
pint: 0.11
setuptools: 47.1.1.post20200529
pip: 20.2.3
conda: None
pytest: 5.3.5
IPython: 7.15.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4451/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
668717850 MDU6SXNzdWU2Njg3MTc4NTA= 4290 bool(Dataset(False)) is True aaronspring 12237157 closed 0     9 2020-07-30T13:23:14Z 2020-08-05T14:25:55Z 2020-08-05T13:48:55Z CONTRIBUTOR      

What happened:

```python
v = True
bool(xr.DataArray(v))                         # True
bool(xr.DataArray(v).to_dataset(name='var'))  # True

v = False
bool(xr.DataArray(v))                         # False

# unexpected behaviour below
bool(xr.DataArray(v).to_dataset(name='var'))  # True
```

What you expected to happen:

```python
bool(xr.DataArray(False).to_dataset(name='var'))  # False
```

Maybe this is intentional and I don't understand why.

`xr.__version__` = '0.16.0'
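A likely explanation (my hedged reading, not confirmed by the issue itself): Dataset implements the Mapping protocol over its data variables, so its truthiness reflects whether it is empty, like a dict, regardless of the values stored inside:

```python
# dict truthiness depends on the number of entries, not on their values;
# a Dataset with one data variable would be truthy for the same reason
assert bool({"var": False}) is True   # non-empty mapping -> True
assert bool({}) is False              # empty mapping -> False
```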

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4290/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
624378150 MDExOlB1bGxSZXF1ZXN0NDIyODEzOTYy 4092 CFTimeIndex calendar in repr aaronspring 12237157 closed 0     19 2020-05-25T15:55:20Z 2020-07-23T17:38:39Z 2020-07-23T10:42:29Z CONTRIBUTOR   0 pydata/xarray/pulls/4092
  • [x] Closes #2416
  • [x] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

Done:
  • added calendar property to CFTimeIndex
  • rebuilt repr from pandas

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4092/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
611839345 MDU6SXNzdWU2MTE4MzkzNDU= 4025 Visualize task tree aaronspring 12237157 closed 0     3 2020-05-04T12:31:25Z 2020-05-08T09:10:08Z 2020-05-04T14:43:25Z CONTRIBUTOR      

While reading this excellent discussion on working with large one-timestep datasets (https://discourse.pangeo.io/t/best-practices-to-go-from-1000s-of-netcdf-files-to-analyses-on-a-hpc-cluster/588/10) I asked myself again why we don't have the task-tree visualisation in xarray that we have in dask. Is there a technical reason that prevents us from implementing visualize?

This feature would be extremely useful for me.

Maybe it’s easier to do this for dataarrays first.

```python
ds = ...  # the rasm tutorial dataset
ds = ds.chunk({"time": 2})
ds.visualize()
```

Expected Output

Figure of task tree

https://docs.dask.org/en/latest/graphviz.html

Problem Description

Visualizing the task tree is only implemented in dask. For now I recreate my xarray problem in dask to circumvent this. A .visualize() method in xarray would be nicer.

https://discourse.pangeo.io/t/best-practices-to-go-from-1000s-of-netcdf-files-to-analyses-on-a-hpc-cluster/588/10

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4025/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
577105538 MDExOlB1bGxSZXF1ZXN0Mzg0OTY0MDcz 3844 Implement skipna kwarg in xr.quantile aaronspring 12237157 closed 0     5 2020-03-06T18:36:55Z 2020-03-09T09:46:25Z 2020-03-08T17:42:44Z CONTRIBUTOR   0 pydata/xarray/pulls/3844
  • [x] Closes #3843
  • [x] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3844/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
577088426 MDU6SXNzdWU1NzcwODg0MjY= 3843 Implement `skipna` in xr.quantile for speedup aaronspring 12237157 closed 0     1 2020-03-06T17:58:28Z 2020-03-08T17:42:43Z 2020-03-08T17:42:43Z CONTRIBUTOR      

xr.quantile uses np.nanquantile, which is slower than np.quantile but only needed when NaNs have to be ignored. Adding skipna as a kwarg would lead to a speedup for many use cases.

MCVE Code Sample

np.quantile is much faster than np.nanquantile:

```python
control = xr.DataArray(np.random.random((50, 256, 192)), dims=['time', 'x', 'y'])
%time _ = control.quantile(dim='time', q=q)
# CPU times: user 4.14 s, sys: 61.4 ms, total: 4.2 s
# Wall time: 4.3 s

%time _ = np.quantile(control, q, axis=0)
# CPU times: user 47.1 ms, sys: 4.27 ms, total: 51.4 ms
# Wall time: 52.6 ms

%time _ = np.nanquantile(control, q, axis=0)
# CPU times: user 3.18 s, sys: 21.4 ms, total: 3.2 s
# Wall time: 3.22 s
```

Expected Output

faster xr.quantile:

```python
%time _ = control.quantile(dim='time', q=q)
# CPU times: user 4.95 s, sys: 34.3 ms, total: 4.98 s
# Wall time: 5.88 s

%time _ = control.quantile(dim='time', q=q, skipna=False)
# CPU times: user 85.3 ms, sys: 16.7 ms, total: 102 ms
# Wall time: 127 ms
```

Problem Description

np.nanquantile not always needed
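The reason skipna=False is safe whenever the data contain no NaNs: np.quantile and np.nanquantile return identical results then and differ only in speed. A minimal NumPy check (the array shape and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random((50, 64, 48))  # random data without NaNs

# identical results when no NaNs are present; np.quantile is just faster
q50 = np.quantile(a, 0.5, axis=0)
nq50 = np.nanquantile(a, 0.5, axis=0)
assert np.allclose(q50, nq50)
```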

Versions

Output of `xr.show_versions()` xr=0.15.1
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3843/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
433833707 MDU6SXNzdWU0MzM4MzM3MDc= 2900 open_mfdataset with proprocess ds[var] aaronspring 12237157 closed 0     3 2019-04-16T15:07:36Z 2019-04-16T19:09:34Z 2019-04-16T19:09:34Z CONTRIBUTOR      

Code Sample, a copy-pastable example if possible

I would like to load only one variable from larger files containing tens of variables. The files get really large when I open them. I expect them to be opened lazily, and therefore quickly, if I only want to extract one variable (maybe this is my misunderstanding).

I hoped to use preprocess, but I don't get it working.

Here is my minimum example with 3 files of 12 timesteps each and two variables, of which I only want to load one:

```python
ds = xr.open_mfdataset(path)
ds
<xarray.Dataset>
Dimensions:  (depth: 1, depth_2: 1, time: 36, x: 2, y: 2)
Coordinates:
  * depth    (depth) float64 0.0
    lon      (y, x) float64 -48.11 -47.43 -48.21 -47.52
    lat      (y, x) float64 56.52 56.47 56.14 56.09
  * depth_2  (depth_2) float64 90.0
  * time     (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00
Dimensions without coordinates: x, y
Data variables:
    co2flux  (time, depth, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>
    caex90   (time, depth_2, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>

def preprocess(ds, var='co2flux'):
    return ds[var]

ds = xr.open_mfdataset(path, preprocess=preprocess)

ValueError                                Traceback (most recent call last)
<ipython-input-17-770267b86462> in <module>
      1 def preprocess(ds,var='co2flux'):
      2     return ds[var]
----> 3 ds = xr.open_mfdataset(path,preprocess=preprocess)

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, lock, data_vars, coords, autoclose, parallel, **kwargs)
    717                 data_vars=data_vars, coords=coords,
    718                 infer_order_from_coords=infer_order_from_coords,
--> 719                 ids=ids)
    720     except ValueError:
    721         for ds in datasets:

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _auto_combine(datasets, concat_dims, compat, data_vars, coords, infer_order_from_coords, ids)
    551     # Repeatedly concatenate then merge along each dimension
    552     combined = _combine_nd(combined_ids, concat_dims, compat=compat,
--> 553                            data_vars=data_vars, coords=coords)
    554     return combined
    555

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat)
    473                                 data_vars=data_vars,
    474                                 coords=coords,
--> 475                                 compat=compat)
    476     combined_ds = list(combined_ids.values())[0]
    477     return combined_ds

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _auto_combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat)
    491     datasets = combined_ids.values()
    492     new_combined_ids[new_id] = _auto_combine_1d(datasets, dim, compat,
--> 493                                                 data_vars, coords)
    494     return new_combined_ids
    495

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _auto_combine_1d(datasets, concat_dim, compat, data_vars, coords)
    505     if concat_dim is not None:
    506         dim = None if concat_dim is _CONCAT_DIM_DEFAULT else concat_dim
--> 507         sorted_datasets = sorted(datasets, key=vars_as_keys)
    508         grouped_by_vars = itertools.groupby(sorted_datasets, key=vars_as_keys)
    509         concatenated = [_auto_concat(list(ds_group), dim=dim,

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in vars_as_keys(ds)
    496
    497 def vars_as_keys(ds):
--> 498     return tuple(sorted(ds))
    499
    500

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/common.py in __bool__(self)
     80
     81     def __bool__(self):
---> 82         return bool(self.values)
     83
     84     # Python 3 uses __bool__, Python 2 uses __nonzero__

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

I was hoping that data_vars could work like this, but it has no effect. Probably I got the documentation wrong here.

```python
ds = xr.open_mfdataset(path, data_vars=['co2flux'])
ds
<xarray.Dataset>
Dimensions:  (depth: 1, depth_2: 1, time: 36, x: 2, y: 2)
Coordinates:
  * depth    (depth) float64 0.0
    lon      (y, x) float64 -48.11 -47.43 -48.21 -47.52
    lat      (y, x) float64 56.52 56.47 56.14 56.09
  * depth_2  (depth_2) float64 90.0
  * time     (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00
Dimensions without coordinates: x, y
Data variables:
    co2flux  (time, depth, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>
    caex90   (time, depth_2, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>
```
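A hedged note on the likely cause: open_mfdataset expects preprocess to return a Dataset, while ds[var] returns a DataArray; selecting with a list, ds[[var]], keeps a Dataset. The same single- vs. double-bracket distinction exists in pandas, which this sketch uses so it stays self-contained:

```python
import pandas as pd

# toy stand-in for the two-variable Dataset from the example above
df = pd.DataFrame({"co2flux": [1.0, 2.0], "caex90": [3.0, 4.0]})

s = df["co2flux"]      # single brackets: a Series (like a DataArray in xarray)
sub = df[["co2flux"]]  # double brackets: still a DataFrame (like a Dataset)

assert isinstance(s, pd.Series)
assert isinstance(sub, pd.DataFrame)
```

So a preprocess along the lines of `return ds[[var]]` might avoid the ValueError above.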

Problem description

I would expect from the documentation the below behaviour.

Expected Output

```python
ds = xr.open_mfdataset(path, data_vars=['co2flux'])
ds
<xarray.Dataset>
Dimensions:  (depth: 1, depth_2: 1, time: 36, x: 2, y: 2)
Coordinates:
  * depth    (depth) float64 0.0
    lon      (y, x) float64 -48.11 -47.43 -48.21 -47.52
    lat      (y, x) float64 56.52 56.47 56.14 56.09
  * depth_2  (depth_2) float64 90.0
  * time     (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00
Dimensions without coordinates: x, y
Data variables:
    co2flux  (time, depth, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>

ds = xr.open_mfdataset(path, preprocess=preprocess)
ds
<xarray.Dataset>
Dimensions:  (depth: 1, depth_2: 1, time: 36, x: 2, y: 2)
Coordinates:
  * depth    (depth) float64 0.0
    lon      (y, x) float64 -48.11 -47.43 -48.21 -47.52
    lat      (y, x) float64 56.52 56.47 56.14 56.09
  * depth_2  (depth_2) float64 90.0
  * time     (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00
Dimensions without coordinates: x, y
Data variables:
    co2flux  (time, depth, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>
```

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 |Anaconda, Inc.| (default, Oct 23 2018, 19:16:44) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.18.7.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.12.1
pandas: 0.24.2
numpy: 1.14.2
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.2.0
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 1.2.0
distributed: 1.27.0
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 40.4.0
pip: 18.1
conda: None
pytest: None
IPython: 7.0.1
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2900/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · About: xarray-datasette