issues


21 rows where user = 12237157 sorted by updated_at descending


Facets: type (issue 11, pull 10); state (closed 19, open 2); repo (xarray 21)
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
779392905 MDU6SXNzdWU3NzkzOTI5MDU= 4768 weighted for xr.corr aaronspring 12237157 closed 0     2 2021-01-05T18:24:29Z 2023-12-12T00:24:22Z 2023-12-12T00:24:22Z CONTRIBUTOR      

Is your feature request related to a problem? Please describe.

I want to compute a weighted correlation, e.g. a spatial correlation that is weighted: `xr.corr(fct, obs, dim=['lon', 'lat'], weights=np.cos(np.abs(fct.lat)))`. So far, xr.corr accepts neither a weights argument nor weighted input (`input.weighted(weights)`). A more straightforward case would be weighting different members: `xr.corr(fct, obs, dim='member', weights=np.arange(fct.member.size))`.

Describe the solution you'd like

We started xskillscore (https://github.com/xarray-contrib/xskillscore) some time ago, before xr.corr was implemented, and have the keywords weighted, skipna and keep_attrs implemented there. We also have xs.rmse, xs.mse, ... implemented via xr.apply_ufunc (https://github.com/aaronspring/xskillscore/blob/150f7b9b2360750e6077036c7c3fd6e4439c60b6/xskillscore/core/deterministic.py#L849), which are faster than the xr-based versions of mse (https://github.com/aaronspring/xskillscore/blob/150f7b9b2360750e6077036c7c3fd6e4439c60b6/xskillscore/xr/deterministic.py#L6) or xr.corr; see https://github.com/xarray-contrib/xskillscore/pull/231.

Additional context

My question here is whether it would be better to upstream these xskillscore metrics into xarray, or to start a PR adding weighted and skipna to xr.corr (which I prefer).
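The math behind the request can be sketched in plain NumPy. This is a hypothetical helper (`weighted_corr` is not part of the xarray or xskillscore API shown here): a weighted covariance divided by the product of weighted standard deviations.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Weighted Pearson correlation of two 1-D arrays.

    Hypothetical illustration of the requested feature, not xarray API:
    weighted covariance over the product of weighted standard deviations.
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                              # normalize weights
    mx, my = np.sum(w * x), np.sum(w * y)        # weighted means
    cov = np.sum(w * (x - mx) * (y - my))        # weighted covariance
    sx = np.sqrt(np.sum(w * (x - mx) ** 2))      # weighted std devs
    sy = np.sqrt(np.sum(w * (y - my) ** 2))
    return cov / (sx * sy)
```

With uniform weights this reduces to the ordinary Pearson correlation.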

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4768/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1471561942 I_kwDOAMm_X85XtkDW 7342 `xr.DataArray.plot.pcolormesh(robust="col/row")` aaronspring 12237157 closed 0     3 2022-12-01T16:01:27Z 2022-12-12T12:17:45Z 2022-12-12T12:17:45Z CONTRIBUTOR      

Is your feature request related to a problem?

I often want to get a quick view of multi-dimensional data from an xr.Dataset with multiple variables, all at once in a one-liner. I really like the robust=True feature and think it could also allow "col" and "row" to apply robust only across columns or rows.

Describe the solution you'd like

```python
ds = xr.tutorial.load_dataset("eraint_uvz")
ds.mean("month").to_array().plot(col="level", row="variable", robust="row")
```

What I get, and do not like, because robust is applied either to all data or not at all:

What I would like to see: see the alternative below, which is what I always do.

Describe alternatives you've considered

```python
ds = xr.tutorial.load_dataset("eraint_uvz")
for v in ds.data_vars:
    ds[v].mean("month").plot(col="level", robust=True)
    plt.show()
```

Additional context

No response
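For reference, xarray's robust=True clips the colormap to the 2nd and 98th percentiles of all plotted data. A hypothetical robust="row" could compute those limits per facet row instead; a minimal sketch (the function name and its data layout, one row per leading axis entry, are assumptions):

```python
import numpy as np

def robust_limits_per_row(data, q=(2, 98)):
    """Per-facet-row color limits from the 2nd/98th percentiles.

    Sketch of what a hypothetical robust="row" could compute; not xarray API.
    data: array of shape (nrows, ...); returns one (vmin, vmax) per row.
    """
    return [
        (np.nanpercentile(row, q[0]), np.nanpercentile(row, q[1]))
        for row in data
    ]
```

Each row then gets its own vmin/vmax instead of sharing one pair across the whole facet grid.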

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7342/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1071049280 I_kwDOAMm_X84_1upA 6045 `xr.infer_freq` month bug for `freq='6MS'` starting Jan becomes `freq='2QS-OCT'` aaronspring 12237157 closed 0     3 2021-12-03T23:36:56Z 2022-06-24T22:58:47Z 2022-06-24T22:58:47Z CONTRIBUTOR      

What happened:

@dougiesquire brought up https://github.com/pangeo-data/climpred/issues/698. During debugging I discovered unexpected behaviour in xr.infer_freq: freq='6MS' starting Jan becomes freq='2QS-OCT'

What you expected to happen: freq='6MS' starting Jan becomes freq='2QS-JAN'

Minimal Complete Verifiable Example:

Creating a 6MS index starting in Jan with pandas and xarray yields a different freq. 2QS and 6MS are equivalent for quarter starting months, but the month anchor in CFTimeIndex.freq is wrong.

```python
import pandas as pd

i_pd = pd.date_range(start="2000-01-01", end="2002-01-01", freq="6MS")
i_pd
# DatetimeIndex(['2000-01-01', '2000-07-01', '2001-01-01', '2001-07-01', '2002-01-01'],
#               dtype='datetime64[ns]', freq='6MS')

pd.infer_freq(i_pd)
# '2QS-OCT'

import xarray as xr
xr.cftime_range(start="2000-01-01", end="2002-01-01", freq="6MS")
# CFTimeIndex([2000-01-01 00:00:00, 2000-07-01 00:00:00, 2001-01-01 00:00:00,
#              2001-07-01 00:00:00, 2002-01-01 00:00:00],
#             dtype='object', length=5, calendar='gregorian', freq='2QS-OCT')
```

Anything else we need to know?:

An outline of how to solve this: https://github.com/pangeo-data/climpred/issues/698#issuecomment-985899966

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6045/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1214290591 I_kwDOAMm_X85IYJqf 6510 Feature request: raise more informative error message for `xr.open_dataset(list_of_paths)` aaronspring 12237157 open 0     4 2022-04-25T10:22:25Z 2022-04-29T16:47:56Z   CONTRIBUTOR      

Is your feature request related to a problem?

I sometimes use xr.open_dataset instead of xr.open_mfdataset on multiple paths. I propose raising a more informative error message than: ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4', 'h5netcdf', 'scipy', 'cfgrib']. Consider explicitly selecting one of the installed engines via the ``engine`` parameter, or installing additional IO dependencies, see: https://docs.xarray.dev/en/stable/getting-started-guide/installing.html https://docs.xarray.dev/en/stable/user-guide/io.html

```python
import xarray as xr

xr.__version__  # '2022.3.0'

ds = xr.tutorial.load_dataset("air_temperature")

ds.isel(time=slice(None, 1500)).to_netcdf("file1.nc")
ds.isel(time=slice(1500, None)).to_netcdf("file2.nc")

xr.open_mfdataset(["file1.nc", "file2.nc"])  # works
xr.open_mfdataset("file?.nc")  # works

# I understand what I need to do here:
xr.open_dataset("file?.nc")  # fails
# FileNotFoundError: No such file or directory: b'/dir/file?.nc'

# I don't here; I also first try to check whether one of these files is corrupt:
xr.open_dataset(["file1.nc", "file2.nc"])  # fails
# ValueError: did not find a match in any of xarray's currently installed IO backends
# ['netcdf4', 'h5netcdf', 'scipy', 'cfgrib']. Consider explicitly selecting one of the
# installed engines via the engine parameter, or installing additional IO dependencies,
# see: links
```

Describe the solution you'd like

Directing the user towards the solution, i.e. "found path as list, please use open_mfdataset".

Describe alternatives you've considered

No response

Additional context

No response
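The proposed check could look like this sketch (the helper name `check_single_path` is hypothetical, not the actual xarray implementation): fail early with a pointer to open_mfdataset when a sequence of paths reaches a single-file API.

```python
def check_single_path(path):
    """Sketch of the proposed guard for open_dataset (hypothetical name,
    not xarray code): raise an informative error when a sequence of
    paths is passed to an API that accepts only a single path."""
    if isinstance(path, (list, tuple, set)):
        raise TypeError(
            "open_dataset received a sequence of paths; "
            "please use open_mfdataset to open multiple files."
        )
    return path
```

The error type and wording are illustrative; the point is that the message names the API the user most likely wanted.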

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6510/reactions",
    "total_count": 6,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1092867975 I_kwDOAMm_X85BI9eH 6134 [FEATURE]: `CFTimeIndex.shift(float)` aaronspring 12237157 closed 0     1 2022-01-03T22:33:58Z 2022-02-15T23:05:04Z 2022-02-15T23:05:04Z CONTRIBUTOR      

Is your feature request related to a problem?

CFTimeIndex.shift() allows only int, but sometimes I'd like to shift by a float, e.g. 0.5.

For small freqs, that shouldn't be a problem, as pd.Timedelta allows floats for days and below. For freqs of months and larger, it becomes more tricky. Fractional shifts work easily for the 360-day calendar; for other calendars that's not possible.

Describe the solution you'd like

  • `CFTimeIndex.shift(0.5, 'D')`
  • `CFTimeIndex.shift(0.5, 'M')` for the 360-day calendar
  • `CFTimeIndex.shift(0.5, 'M')` for other calendars fails

Describe alternatives you've considered

solution we have in climpred: https://github.com/pangeo-data/climpred/blob/617223b5bea23a094065efe46afeeafe9796fa97/climpred/utils.py#L657

Additional context

https://xarray.pydata.org/en/stable/generated/xarray.CFTimeIndex.shift.html
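For fixed-length frequencies the request is straightforward because pd.Timedelta already accepts floats; a minimal sketch of the behaviour on a plain DatetimeIndex (the helper `fractional_shift` is an assumption, not CFTimeIndex.shift itself):

```python
import pandas as pd

def fractional_shift(index, n, freq="D"):
    """Shift an index by a possibly fractional number of periods.

    Sketch of the requested behaviour for fixed-length freqs (days and
    below), where pd.Timedelta accepts floats; not CFTimeIndex.shift.
    """
    return index + pd.Timedelta(n, unit=freq)

idx = pd.date_range("2000-01-01", periods=3, freq="D")
shifted = fractional_shift(idx, 0.5)  # shifts every timestamp by 12 hours
```

Months and years have no fixed length, which is why the same trick cannot work there outside the 360-day calendar.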

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6134/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1093466537 PR_kwDOAMm_X84wg_Js 6135 Implement multiplication of cftime Tick offsets by floats aaronspring 12237157 closed 0     7 2022-01-04T15:28:16Z 2022-02-15T23:05:04Z 2022-02-15T23:05:04Z CONTRIBUTOR   0 pydata/xarray/pulls/6135
  • [x] Closes #6134
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] ~~New functions/methods are listed in api.rst~~

  • shift allows float with freq D, H, min, S, ms

Refs: - https://docs.python.org/3/library/datetime.html - https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timedelta.html#pandas.Timedelta - https://xarray.pydata.org/en/stable/generated/xarray.CFTimeIndex.shift.html

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6135/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1120583442 I_kwDOAMm_X85Cyr8S 6230 [PERFORMANCE]: `isin` on `CFTimeIndex`-backed `Coordinate` slow aaronspring 12237157 open 0     5 2022-02-01T12:04:02Z 2022-02-07T23:40:48Z   CONTRIBUTOR      

Is your feature request related to a problem?

I want to do coord1.isin(coord2) and it is quite slow when the coords are large and of object dtype backed by a CFTimeIndex.

```python
import xarray as xr
import numpy as np

n = 1000
coord1 = xr.cftime_range(start='2000', freq='MS', periods=n)
coord2 = xr.cftime_range(start='2000', freq='3MS', periods=n)

# cftimeindex: very fast
%timeit coord1.isin(coord2)  # 743 µs ± 1.33 µs

# np.isin on index.asi8
%timeit np.isin(coord1.asi8, coord2.asi8)  # 7.83 ms ± 14.1 µs

da = xr.DataArray(np.random.random((n, n)), dims=['a', 'b'],
                  coords={'a': coord1, 'b': coord2})

# as xr.DataArray coordinate: slow
%timeit da.a.isin(da.b)  # 94.9 ms ± 959 µs

# converting the xr.DataArray coordinate back to an index: slow
%timeit np.isin(da.a.to_index(), da.b.to_index())  # 97.4 ms ± 819 µs

# converting the xr.DataArray coordinate back to an index and using asi8: fast
%timeit np.isin(da.a.to_index().asi8, da.b.to_index().asi8)  # 7.89 ms ± 15.2 µs
```

Describe the solution you'd like

faster coord1.isin(coord2) by default. Could we re-route here, e.g. to the alternative?

I guess the conversion from coordinate via to_index() is costly.

Describe alternatives you've considered

np.isin(coord1.to_index().asi8, coord2.to_index().asi8) brings me nice speedups in https://github.com/pangeo-data/climpred/pull/724

Additional context

Unsure whether this issue should go here or in cftime.
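The workaround above can be wrapped in a tiny helper. This sketch uses pandas DatetimeIndex, whose .asi8 integer (nanosecond) representation is analogous to CFTimeIndex.asi8; the helper name is an assumption:

```python
import numpy as np
import pandas as pd

def fast_isin(idx1, idx2):
    """isin via the integer (nanosecond) representation of datetime-like
    indexes, mirroring the np.isin(....asi8, ....asi8) workaround above;
    a sketch, not library code (CFTimeIndex.asi8 is the analogous attribute)."""
    return np.isin(idx1.asi8, idx2.asi8)

i1 = pd.date_range("2000", periods=12, freq="MS")
i2 = pd.date_range("2000", periods=12, freq="3MS")
mask = fast_isin(i1, i2)  # True for Jan, Apr, Jul, Oct 2000
```

Comparing int64 arrays avoids the element-wise comparison of object-dtype timestamps, which is where the slowdown comes from.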

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6230/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1119996723 PR_kwDOAMm_X84x3fWS 6223 `GHA` `concurrency` followup aaronspring 12237157 closed 0     1 2022-01-31T22:21:09Z 2022-01-31T23:16:20Z 2022-01-31T23:16:20Z CONTRIBUTOR   0 pydata/xarray/pulls/6223

follows #6210

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6223/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1118564242 PR_kwDOAMm_X84xywhB 6210 `GHA` `concurrency` aaronspring 12237157 closed 0     3 2022-01-30T14:56:01Z 2022-01-31T22:25:27Z 2022-01-31T16:59:27Z CONTRIBUTOR   0 pydata/xarray/pulls/6210
  • [x] Closes #5190
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#concurrency

Use concurrency instead of cancel-duplicate-runs.yaml.
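For reference, a minimal sketch of the GHA setting this PR adopts; the group expression below is the common pattern from the GitHub Actions docs, not necessarily the exact one used in the PR:

```yaml
# cancel in-progress runs of the same workflow on the same ref
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```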

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6210/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1071054456 PR_kwDOAMm_X84vYnOq 6046 Fix `xr.infer_freq` quarterly month aaronspring 12237157 closed 0     0 2021-12-03T23:48:43Z 2022-01-04T13:54:49Z 2022-01-04T13:54:49Z CONTRIBUTOR   1 pydata/xarray/pulls/6046
  • [ ] Closes #6045
  • [ ] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6046/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1058047751 PR_kwDOAMm_X84uv1d0 6007 Use condas dask-core in ci instead of dask to speedup ci and reduce dependencies aaronspring 12237157 closed 0     1 2021-11-19T02:02:41Z 2021-11-28T21:01:36Z 2021-11-28T04:40:34Z CONTRIBUTOR   0 pydata/xarray/pulls/6007
  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

Tried to reduce dependencies: installing dask via conda installs the equivalent of pip install "dask[complete]", whereas dask-core is the equivalent of pip install dask. https://github.com/xgcm/xhistogram/pull/71#discussion_r752738286

Why? dask[complete] includes bokeh etc., which are not needed here; dropping them should speed up CI setup/install times.

But now both dask and dask-core are conda-installed :( It seems iris installs dask (https://github.com/conda-forge/iris-feedstock/blob/master/recipe/meta.yaml), so this would require an iris-feedstock PR first.

linking https://github.com/SciTools/iris/pull/4434 and https://github.com/conda-forge/iris-feedstock/pull/77

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6007/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
954308458 MDExOlB1bGxSZXF1ZXN0Njk4MjI0Mjcx 5639 Del duplicate set_options in api.rst aaronspring 12237157 closed 0     3 2021-07-27T22:19:38Z 2021-07-30T08:47:36Z 2021-07-30T08:20:15Z CONTRIBUTOR   0 pydata/xarray/pulls/5639
  • [x] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5639/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
827561388 MDExOlB1bGxSZXF1ZXN0NTg5NDU5NDQ1 5020 add polyval to polyfit see also aaronspring 12237157 closed 0     1 2021-03-10T11:14:02Z 2021-03-10T14:20:11Z 2021-03-10T12:59:41Z CONTRIBUTOR   0 pydata/xarray/pulls/5020
  • [x] Closes #5016
  • [ ] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5020/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
748094631 MDExOlB1bGxSZXF1ZXN0NTI1MTgzOTQ5 4597 add freq as CFTimeIndex property and to CFTimeIndex.__repr__ aaronspring 12237157 closed 0     11 2020-11-21T20:12:36Z 2020-11-25T09:16:49Z 2020-11-24T21:53:27Z CONTRIBUTOR   0 pydata/xarray/pulls/4597
  • [x] Closes #2416
  • [x] Tests added
  • [x] Passes isort . && black . && mypy . && flake8
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4597/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
707223289 MDU6SXNzdWU3MDcyMjMyODk= 4451 xr.open_dataset(remote_url) file not found aaronspring 12237157 closed 0     1 2020-09-23T10:00:54Z 2020-09-23T12:03:37Z 2020-09-23T12:03:37Z CONTRIBUTOR      

What happened:

I tried to open a remote URL and got an OSError, but !wget url works.

What you expected to happen:

open the remote netcdf file

Minimal Complete Verifiable Example:

```python
from netCDF4 import Dataset

import netCDF4
netCDF4.__version__

import xarray as xr
xr.__version__

url = 'https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc'

working_url = 'https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p5deg/GFS_Global_0p5deg_20200923_0000.grib2'

xr.open_dataset(url)
# ...
# netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()
# netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()
# OSError: [Errno -90] NetCDF: file not found: b'https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc'

# seems to be a netcdf4 upstream issue
Dataset(url)
# OSError                                   Traceback (most recent call last)
# <ipython-input-14-265839034cee> in <module>
# ----> 1 Dataset(url)
# netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()
# netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()
# OSError: [Errno -90] NetCDF: file not found: b'https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc'
```

Anything else we need to know?:

Environment:

Output of `xr.show_versions()`:

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 2.6.32-754.29.2.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.2
xarray: 0.16.1
pandas: 1.1.2
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: 1.5.5
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.1.0
cfgrib: 0.9.7.6
iris: 2.2.0
bottleneck: 1.3.1
dask: 2.15.0
distributed: 2.20.0
matplotlib: 3.1.2
cartopy: 0.17.0
seaborn: 0.10.1
numbagg: None
pint: 0.11
setuptools: 47.1.1.post20200529
pip: 20.2.3
conda: None
pytest: 5.3.5
IPython: 7.15.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4451/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
668717850 MDU6SXNzdWU2Njg3MTc4NTA= 4290 bool(Dataset(False)) is True aaronspring 12237157 closed 0     9 2020-07-30T13:23:14Z 2020-08-05T14:25:55Z 2020-08-05T13:48:55Z CONTRIBUTOR      

What happened:

```python
v = True
bool(xr.DataArray(v))                         # True
bool(xr.DataArray(v).to_dataset(name='var'))  # True

v = False
bool(xr.DataArray(v))                         # False

# unexpected behaviour below
bool(xr.DataArray(v).to_dataset(name='var'))  # True
```

What you expected to happen:

```python
bool(xr.DataArray(False).to_dataset(name='var'))  # False
```

Maybe this is intentional and I don't understand why.

`xr.__version__` = '0.16.0'
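A likely explanation (my hedged reading, not confirmed by the issue itself): Dataset implements the Mapping protocol over its data variables, so its truthiness reflects whether it is empty, like a dict, regardless of the values stored inside:

```python
# dict truthiness depends on the number of entries, not on their values;
# a Dataset with one data variable would be truthy for the same reason
assert bool({"var": False}) is True   # non-empty mapping -> True
assert bool({}) is False              # empty mapping -> False
```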

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4290/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
624378150 MDExOlB1bGxSZXF1ZXN0NDIyODEzOTYy 4092 CFTimeIndex calendar in repr aaronspring 12237157 closed 0     19 2020-05-25T15:55:20Z 2020-07-23T17:38:39Z 2020-07-23T10:42:29Z CONTRIBUTOR   0 pydata/xarray/pulls/4092
  • [x] Closes #2416
  • [x] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

Done:
  • added calendar property to CFTimeIndex
  • rebuilt repr from pandas

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4092/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
611839345 MDU6SXNzdWU2MTE4MzkzNDU= 4025 Visualize task tree aaronspring 12237157 closed 0     3 2020-05-04T12:31:25Z 2020-05-08T09:10:08Z 2020-05-04T14:43:25Z CONTRIBUTOR      

While reading this excellent discussion on working with large one-timestep datasets (https://discourse.pangeo.io/t/best-practices-to-go-from-1000s-of-netcdf-files-to-analyses-on-a-hpc-cluster/588/10) I asked myself again why we don't have the task-tree visualisation in xarray that we have in dask. Is there a technical reason that prevents us from implementing visualize?

This feature would be extremely useful for me.

Maybe it’s easier to do this for dataarrays first.

```python
ds = ...  # the rasm tutorial dataset
ds = ds.chunk({"time": 2})
ds.visualize()
```

Expected Output

Figure of task tree

https://docs.dask.org/en/latest/graphviz.html

Problem Description

Visualizing the task tree is only implemented in dask. For now I recreate my xarray problem in dask to circumvent this. A .visualize() method in xarray would be nicer.

https://discourse.pangeo.io/t/best-practices-to-go-from-1000s-of-netcdf-files-to-analyses-on-a-hpc-cluster/588/10

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4025/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
577105538 MDExOlB1bGxSZXF1ZXN0Mzg0OTY0MDcz 3844 Implement skipna kwarg in xr.quantile aaronspring 12237157 closed 0     5 2020-03-06T18:36:55Z 2020-03-09T09:46:25Z 2020-03-08T17:42:44Z CONTRIBUTOR   0 pydata/xarray/pulls/3844
  • [x] Closes #3843
  • [x] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3844/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
577088426 MDU6SXNzdWU1NzcwODg0MjY= 3843 Implement `skipna` in xr.quantile for speedup aaronspring 12237157 closed 0     1 2020-03-06T17:58:28Z 2020-03-08T17:42:43Z 2020-03-08T17:42:43Z CONTRIBUTOR      

xr.quantile uses np.nanquantile, which is slower than np.quantile but only needed when NaNs have to be ignored. Adding skipna as a kwarg would lead to a speedup for many use cases.

MCVE Code Sample

np.quantile is much faster than np.nanquantile:

```python
control = xr.DataArray(np.random.random((50, 256, 192)), dims=['time', 'x', 'y'])
%time _ = control.quantile(dim='time', q=q)
# CPU times: user 4.14 s, sys: 61.4 ms, total: 4.2 s
# Wall time: 4.3 s

%time _ = np.quantile(control, q, axis=0)
# CPU times: user 47.1 ms, sys: 4.27 ms, total: 51.4 ms
# Wall time: 52.6 ms

%time _ = np.nanquantile(control, q, axis=0)
# CPU times: user 3.18 s, sys: 21.4 ms, total: 3.2 s
# Wall time: 3.22 s
```

Expected Output

faster xr.quantile:

```python
%time _ = control.quantile(dim='time', q=q)
# CPU times: user 4.95 s, sys: 34.3 ms, total: 4.98 s
# Wall time: 5.88 s

%time _ = control.quantile(dim='time', q=q, skipna=False)
# CPU times: user 85.3 ms, sys: 16.7 ms, total: 102 ms
# Wall time: 127 ms
```

Problem Description

np.nanquantile not always needed
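The reason skipna=False is safe whenever the data contain no NaNs: np.quantile and np.nanquantile return identical results then and differ only in speed. A minimal NumPy check (the array shape and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random((50, 64, 48))  # random data without NaNs

# identical results when no NaNs are present; np.quantile is just faster
q50 = np.quantile(a, 0.5, axis=0)
nq50 = np.nanquantile(a, 0.5, axis=0)
assert np.allclose(q50, nq50)
```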

Versions

Output of `xr.show_versions()` xr=0.15.1
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3843/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
433833707 MDU6SXNzdWU0MzM4MzM3MDc= 2900 open_mfdataset with proprocess ds[var] aaronspring 12237157 closed 0     3 2019-04-16T15:07:36Z 2019-04-16T19:09:34Z 2019-04-16T19:09:34Z CONTRIBUTOR      

Code Sample, a copy-pastable example if possible

I would like to load only one variable from larger files containing tens of variables. The files get really large when I open them. I expect them to be opened lazily, and therefore quickly, if I only want to extract one variable (maybe this is my misunderstanding).

I hoped to use preprocess, but I don't get it working.

Here is my minimum example with 3 files of 12 timesteps each and two variables, of which I only want to load one:

```python
ds = xr.open_mfdataset(path)
ds
<xarray.Dataset>
Dimensions:  (depth: 1, depth_2: 1, time: 36, x: 2, y: 2)
Coordinates:
  * depth    (depth) float64 0.0
    lon      (y, x) float64 -48.11 -47.43 -48.21 -47.52
    lat      (y, x) float64 56.52 56.47 56.14 56.09
  * depth_2  (depth_2) float64 90.0
  * time     (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00
Dimensions without coordinates: x, y
Data variables:
    co2flux  (time, depth, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>
    caex90   (time, depth_2, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>

def preprocess(ds, var='co2flux'):
    return ds[var]

ds = xr.open_mfdataset(path, preprocess=preprocess)

ValueError                                Traceback (most recent call last)
<ipython-input-17-770267b86462> in <module>
      1 def preprocess(ds,var='co2flux'):
      2     return ds[var]
----> 3 ds = xr.open_mfdataset(path,preprocess=preprocess)

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, lock, data_vars, coords, autoclose, parallel, **kwargs)
    717                 data_vars=data_vars, coords=coords,
    718                 infer_order_from_coords=infer_order_from_coords,
--> 719                 ids=ids)
    720     except ValueError:
    721         for ds in datasets:

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _auto_combine(datasets, concat_dims, compat, data_vars, coords, infer_order_from_coords, ids)
    551     # Repeatedly concatenate then merge along each dimension
    552     combined = _combine_nd(combined_ids, concat_dims, compat=compat,
--> 553                            data_vars=data_vars, coords=coords)
    554     return combined
    555

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat)
    473                                 data_vars=data_vars,
    474                                 coords=coords,
--> 475                                 compat=compat)
    476     combined_ds = list(combined_ids.values())[0]
    477     return combined_ds

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _auto_combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat)
    491     datasets = combined_ids.values()
    492     new_combined_ids[new_id] = _auto_combine_1d(datasets, dim, compat,
--> 493                                                 data_vars, coords)
    494     return new_combined_ids
    495

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in _auto_combine_1d(datasets, concat_dim, compat, data_vars, coords)
    505     if concat_dim is not None:
    506         dim = None if concat_dim is _CONCAT_DIM_DEFAULT else concat_dim
--> 507         sorted_datasets = sorted(datasets, key=vars_as_keys)
    508         grouped_by_vars = itertools.groupby(sorted_datasets, key=vars_as_keys)
    509         concatenated = [_auto_concat(list(ds_group), dim=dim,

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/combine.py in vars_as_keys(ds)
    496
    497 def vars_as_keys(ds):
--> 498     return tuple(sorted(ds))
    499
    500

/work/mh0727/m300524/anaconda3/envs/my_jupyter/lib/python3.6/site-packages/xarray/core/common.py in __bool__(self)
     80
     81     def __bool__(self):
---> 82         return bool(self.values)
     83
     84     # Python 3 uses __bool__, Python 2 uses __nonzero__

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

I was hoping that data_vars could work like this, but it has no effect. Probably I got the documentation wrong here.

```python
ds = xr.open_mfdataset(path, data_vars=['co2flux'])
ds
<xarray.Dataset>
Dimensions:  (depth: 1, depth_2: 1, time: 36, x: 2, y: 2)
Coordinates:
  * depth    (depth) float64 0.0
    lon      (y, x) float64 -48.11 -47.43 -48.21 -47.52
    lat      (y, x) float64 56.52 56.47 56.14 56.09
  * depth_2  (depth_2) float64 90.0
  * time     (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00
Dimensions without coordinates: x, y
Data variables:
    co2flux  (time, depth, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>
    caex90   (time, depth_2, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>
```
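A hedged note on the likely cause: open_mfdataset expects preprocess to return a Dataset, while ds[var] returns a DataArray; selecting with a list, ds[[var]], keeps a Dataset. The same single- vs. double-bracket distinction exists in pandas, which this sketch uses so it stays self-contained:

```python
import pandas as pd

# toy stand-in for the two-variable Dataset from the example above
df = pd.DataFrame({"co2flux": [1.0, 2.0], "caex90": [3.0, 4.0]})

s = df["co2flux"]      # single brackets: a Series (like a DataArray in xarray)
sub = df[["co2flux"]]  # double brackets: still a DataFrame (like a Dataset)

assert isinstance(s, pd.Series)
assert isinstance(sub, pd.DataFrame)
```

So a preprocess along the lines of `return ds[[var]]` might avoid the ValueError above.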

Problem description

I would expect from the documentation the below behaviour.

Expected Output

```python
ds = xr.open_mfdataset(path, data_vars=['co2flux'])
ds
<xarray.Dataset>
Dimensions:  (depth: 1, depth_2: 1, time: 36, x: 2, y: 2)
Coordinates:
  * depth    (depth) float64 0.0
    lon      (y, x) float64 -48.11 -47.43 -48.21 -47.52
    lat      (y, x) float64 56.52 56.47 56.14 56.09
  * depth_2  (depth_2) float64 90.0
  * time     (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00
Dimensions without coordinates: x, y
Data variables:
    co2flux  (time, depth, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>

ds = xr.open_mfdataset(path, preprocess=preprocess)
ds
<xarray.Dataset>
Dimensions:  (depth: 1, depth_2: 1, time: 36, x: 2, y: 2)
Coordinates:
  * depth    (depth) float64 0.0
    lon      (y, x) float64 -48.11 -47.43 -48.21 -47.52
    lat      (y, x) float64 56.52 56.47 56.14 56.09
  * depth_2  (depth_2) float64 90.0
  * time     (time) datetime64[ns] 1850-01-31T23:15:00 ... 1852-12-31T23:15:00
Dimensions without coordinates: x, y
Data variables:
    co2flux  (time, depth, y, x) float32 dask.array<shape=(36, 1, 2, 2), chunksize=(12, 1, 2, 2)>
```

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 |Anaconda, Inc.| (default, Oct 23 2018, 19:16:44) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.18.7.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.12.1
pandas: 0.24.2
numpy: 1.14.2
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.2.0
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 1.2.0
distributed: 1.27.0
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 40.4.0
pip: 18.1
conda: None
pytest: None
IPython: 7.0.1
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2900/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · About: xarray-datasette