id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
718436141,MDU6SXNzdWU3MTg0MzYxNDE=,4498,Resample is ~100x slower than Pandas resample; Speed is related to resample period (unlike Pandas),145117,closed,0,,,7,2020-10-09T21:37:20Z,2022-05-15T02:38:29Z,2022-05-15T02:38:29Z,CONTRIBUTOR,,,,"**What happened**:

I have a 10 minute frequency time series. When I resample to hourly it is slow. When I resample to daily it is fast. If I drop to Pandas and resample the speeds are ~100x faster than xarray, and also the same time regardless of the resample period. I've posted this to SO: https://stackoverflow.com/questions/64282393/

**What you expected to happen**:

I expect xarray to be within an order of magnitude speed of Pandas, not > 2 orders of magnitude slower.

**Minimal Complete Verifiable Example**:

```python
import numpy as np
import xarray as xr
import pandas as pd
import time

size = 10000
times = pd.date_range('2000-01-01', periods=size, freq=""10Min"")
da = xr.DataArray(data = np.random.random(size), dims = ['time'], coords = {'time': times}, name='foo')

start = time.time()
da_ = da.resample({'time':""1H""}).mean()
print(""1H"", 'xr', str(time.time() - start))

start = time.time()
da_ = da.to_dataframe().resample(""1H"").mean()
print(""1H"", 'pd', str(time.time() - start), ""\n"")

start = time.time()
da_ = da.resample({'time':""1D""}).mean()
print(""1D"", 'xr', str(time.time() - start))

start = time.time()
da_ = da.to_dataframe().resample(""1D"").mean()
print(""1D"", 'pd', str(time.time() - start))
```

Output/timings

```
: 1H xr 0.1761918067932129
: 1H pd 0.0021948814392089844
:
: 1D xr 0.00958395004272461
: 1D pd 0.001646280288696289
```

**Anything else we need to know?**:

**Environment**:
Output of xr.show_versions()

xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 | packaged by conda-forge | (default, Aug 21 2020, 18:21:27) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-48-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.0
pandas: 1.1.1
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.3.1
cartopy: None
seaborn: None
numbagg: None
pint: 0.15
setuptools: 49.6.0.post20200814
pip: 20.2.2
conda: None
pytest: None
IPython: 7.17.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4498/reactions"", ""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 297780998,MDU6SXNzdWUyOTc3ODA5OTg=,1917,Decode times adds micro-second noise to standard calendar,145117,closed,0,,,5,2018-02-16T13:14:15Z,2018-02-26T10:28:17Z,2018-02-26T10:28:17Z,CONTRIBUTOR,,,,"#### Code Sample, a copy-pastable example if possible I have a simplified NetCDF file with the following header: ```bash netcdf foo { dimensions: time = UNLIMITED ; // (366 currently) x = 2 ; y = 2 ; variables: float time(time) ; time:standard_name = ""time"" ; time:long_name = ""time"" ; time:units = ""DAYS since 2000-01-01 00:00:00"" ; time:calendar = ""standard"" ; time:axis = ""T"" ; ... } ``` I would expect xarray to be able to decode these times. It does, but appears to do so incorrectly and without reporting any issues. Note the fractional time added to each date. ```python In [4]: xr.open_dataset('foo.nc').time Out[4]: array(['2000-01-01T00:00:00.000000000', '2000-01-02T00:00:00.003211264', '2000-01-03T00:00:00.006422528', ..., '2000-12-29T00:00:01.962606592', '2000-12-30T00:00:01.672216576', '2000-12-31T00:00:01.381826560'], dtype='datetime64[ns]') Coordinates: * time (time) datetime64[ns] 2000-01-01 2000-01-02T00:00:00.003211264 ... Attributes: standard_name: time long_name: time axis: T ``` #### Problem description Days since a valid date on a `standard` calendar should not add microseconds. I know that xarray has time issues, for example #118 #521 numpy:#6207 #531 #789 and #848. But all of those appear to address non-standard times. This bug (if it is a bug) seems to occur with a very simple and straight forward calendar, and is silent, so it took me 2 days to figure out what was going on. #### Output of ``xr.show_versions()``
In [5]: xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.0
pandas: 0.22.0
numpy: 1.12.1
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: None
Nio: None
bottleneck: 1.2.1
cyordereddict: 1.0.0
dask: 0.16.0
matplotlib: 2.1.1
cartopy: 0.15.1
seaborn: 0.8.1
setuptools: 38.4.0
pip: 9.0.1
conda: None
pytest: None
IPython: 6.2.1
sphinx: None
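
#### Possible source of the offsets (sketch)

One observation that may help narrow this down, offered as a sketch and not a confirmed diagnosis: the header declares `time` as `float`, i.e. single precision. If a day count is converted to nanoseconds and the result is held in float32 at any point, rounding alone yields offsets of the size shown above. The snippet uses only the standard library. `to_float32` and `float32_noise_ns` are hypothetical helper names, and the code makes no claim about xarray's actual decoding path.

```python
import struct

NS_PER_DAY = 86400 * 10**9  # nanoseconds in one day


def to_float32(x):
    # Round a Python float (double precision) to the nearest float32 value.
    return struct.unpack('f', struct.pack('f', x))[0]


def float32_noise_ns(days):
    # Difference, in ns, between the float32-rounded and the exact
    # nanosecond offset for a whole number of days.
    exact = days * NS_PER_DAY
    rounded = int(to_float32(float(exact)))
    return rounded - exact


offsets = [float32_noise_ns(d) for d in (0, 1, 2)]
print(offsets)  # [0, 3211264, 6422528]
```

The values for days 1 and 2 equal the fractional parts in `Out[4]` (0.003211264 s and 0.006422528 s), which is consistent with single-precision rounding. Under that assumption, a possible workaround is to store `time` as `double` or as an integer type.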
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1917/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue