github: issues: 3 rows where state = "closed", type = "issue" and user = 102827 sorted by updated

3 rows where state = "closed", type = "issue" and user = 102827 sorted by updated_at descending

id	node_id	number	title	user	state	comments	created_at	updated_at ▲	closed_at	author_association	body	reactions	state_reason	repo	type
374279704	MDU6SXNzdWUzNzQyNzk3MDQ=	2514	interpolate_na with limit argument changes size of chunks	cchwala 102827	closed	8	2018-10-26T08:31:35Z	2021-03-26T19:50:50Z	2021-03-26T19:50:50Z	CONTRIBUTOR	Code Sample, a copy-pastable example if possible ```python import pandas as pd import xarray as xr import numpy as np t = pd.date_range(start='2018-01-01', end='2018-02-01', freq='H') foo = np.sin(np.arange(len(t))) bar = np.cos(np.arange(len(t))) foo[1] = np.NaN bar[2] = np.NaN ds_test = xr.Dataset(data_vars={'foo': ('time', foo), 'bar': ('time', bar)}, coords={'time': t}).chunk() print(ds_test) print("\n\n### After `.interpolate_na(dim='time')`\n") print(ds_test.interpolate_na(dim='time')) print("\n\n### After `.interpolate_na(dim='time', limit=5)`\n") print(ds_test.interpolate_na(dim='time', limit=5)) print("\n\n### After `.interpolate_na(dim='time', limit=20)`\n") print(ds_test.interpolate_na(dim='time', limit=20)) ``` Output of the above code. Note the different chunk sizes, depending on the value of `limit`: ``` <xarray.Dataset> Dimensions: (time: 745) Coordinates: * time (time) datetime64[ns] 2018-01-01 2018-01-01T01:00:00 ... 2018-02-01 Data variables: foo (time) float64 dask.array<shape=(745,), chunksize=(745,)> bar (time) float64 dask.array<shape=(745,), chunksize=(745,)> After `.interpolate_na(dim='time')` <xarray.Dataset> Dimensions: (time: 745) Coordinates: * time (time) datetime64[ns] 2018-01-01 2018-01-01T01:00:00 ... 2018-02-01 Data variables: foo (time) float64 dask.array<shape=(745,), chunksize=(745,)> bar (time) float64 dask.array<shape=(745,), chunksize=(745,)> After `.interpolate_na(dim='time', limit=5)` <xarray.Dataset> Dimensions: (time: 745) Coordinates: * time (time) datetime64[ns] 2018-01-01 2018-01-01T01:00:00 ... 
2018-02-01 Data variables: foo (time) float64 dask.array<shape=(745,), chunksize=(3,)> bar (time) float64 dask.array<shape=(745,), chunksize=(3,)> After `.interpolate_na(dim='time', limit=20)` <xarray.Dataset> Dimensions: (time: 745) Coordinates: * time (time) datetime64[ns] 2018-01-01 2018-01-01T01:00:00 ... 2018-02-01 Data variables: foo (time) float64 dask.array<shape=(745,), chunksize=(10,)> bar (time) float64 dask.array<shape=(745,), chunksize=(10,)> ``` Problem description Using `xarray.DataArray.interpolate_na()` with the `limit` kwarg changes the chunksize of the resulting `dask.arrays`. Expected Output The chunksize should not change. Very small chunks, which result from typically small values of `limit`, are not optimal for the performance of `dask`. Also, things like `.rolling()` will fail if the chunksize is smaller than the window length of the rolling window. Output of `xr.show_versions()` INSTALLED VERSIONS ------------------ commit: None python: 2.7.15.final.0 python-bits: 64 OS: Darwin OS-release: 16.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: de_DE.UTF-8 LOCALE: None.None xarray: 0.10.9 pandas: 0.23.3 numpy: 1.13.3 scipy: 1.0.0 netCDF4: 1.4.1 h5netcdf: 0.5.0 h5py: 2.8.0 Nio: None zarr: None cftime: 1.0.1 PseudonetCDF: None rasterio: None iris: None bottleneck: 1.2.1 cyordereddict: 1.0.0 dask: 0.19.4 distributed: 1.23.3 matplotlib: 2.2.2 cartopy: 0.16.0 seaborn: 0.8.1 setuptools: 38.5.2 pip: 9.0.1 conda: 4.5.11 pytest: 3.4.2 IPython: 5.5.0 sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/2514/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
376154741	MDU6SXNzdWUzNzYxNTQ3NDE=	2531	DataArray.rolling() does not preserve chunksizes in some cases	cchwala 102827	closed	2	2018-10-31T20:50:33Z	2021-03-26T19:50:49Z	2021-03-26T19:50:49Z	CONTRIBUTOR	This issue was found and discussed in the related issue #2514. I am opening a separate issue for clarity. Code Sample, a copy-pastable example if possible ```python import pandas as pd import numpy as np import xarray as xr t = pd.date_range(start='2018-01-01', end='2018-02-01', freq='H') bar = np.sin(np.arange(len(t))) baz = np.cos(np.arange(len(t))) da_test = xr.DataArray(data=np.stack([bar, baz]), coords={'time': t, 'sensor': ['one', 'two']}, dims=('sensor', 'time')) print(da_test.chunk({'time': 100}).rolling(time=60).mean().chunks) print(da_test.chunk({'time': 100}).rolling(time=60).count().chunks) Output for `mean`: ((2,), (745,)) Output for `count`: ((2,), (100, 100, 100, 100, 100, 100, 100, 45)) Desired Output: ((2,), (100, 100, 100, 100, 100, 100, 100, 45)) ``` Problem description DataArray.rolling() does not preserve the chunksizes, apparently depending on the applied method. Output of `xr.show_versions()` INSTALLED VERSIONS ------------------ commit: None python: 2.7.15.final.0 python-bits: 64 OS: Darwin OS-release: 16.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: de_DE.UTF-8 LOCALE: None.None xarray: 0.10.9 pandas: 0.23.3 numpy: 1.13.3 scipy: 1.0.0 netCDF4: 1.4.1 h5netcdf: 0.5.0 h5py: 2.8.0 Nio: None zarr: None cftime: 1.0.1 PseudonetCDF: None rasterio: None iris: None bottleneck: 1.2.1 cyordereddict: 1.0.0 dask: 0.19.4 distributed: 1.23.3 matplotlib: 2.2.2 cartopy: 0.16.0 seaborn: 0.8.1 setuptools: 38.5.2 pip: 9.0.1 conda: 4.5.11 pytest: 3.4.2 IPython: 5.5.0 sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/2531/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
226549366	MDU6SXNzdWUyMjY1NDkzNjY=	1399	`decode_cf_datetime()` slow because `pd.to_timedelta()` is slow if floats are passed	cchwala 102827	closed	6	2017-05-05T11:48:00Z	2017-07-25T17:42:52Z	2017-07-25T17:42:52Z	CONTRIBUTOR	Hi, `decode_cf_datetime` is slowed down because it always passes floats to `pd.to_timedelta`, while `pd.to_timedelta` is much faster when working on integers. Here is a notebook that shows the differences. Working with integers is approx. one order of magnitude faster. Hence, it would be great to automatically convert raw float time values to integer nanoseconds where possible (likely limited to resolutions below days or hours, to avoid coping with the differing numbers of nanoseconds within e.g. different months). As an alternative, maybe avoid forcing the cast to floats and indicate in the docstring that the raw values should be integers to speed up the conversion. This could possibly also be resolved in `pd.to_timedelta`, but I assume it would be more complicated to deal with all the edge cases there.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1399/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
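
Issue #1399 proposes converting raw float time values to integer nanoseconds where that can be done without loss. The conversion step might look roughly like the sketch below, written in plain Python without calling pandas or xarray. The helper name `to_integer_nanoseconds` and the unit table are hypothetical and not part of either library; only fixed-length units (days and shorter) are covered, in line with the restriction suggested in the issue.

```python
# Hypothetical unit table: nanoseconds per fixed-length time unit.
NS_PER_UNIT = {
    "seconds": 10**9,
    "minutes": 60 * 10**9,
    "hours": 3600 * 10**9,
    "days": 86400 * 10**9,
}


def to_integer_nanoseconds(values, unit):
    """Return the values as integer nanoseconds, or None if any value
    does not correspond to a whole number of nanoseconds."""
    factor = NS_PER_UNIT[unit]
    result = []
    for value in values:
        ns = float(value) * factor
        # NaN, infinity and fractional nanoseconds all fail this check,
        # so the caller can fall back to the float code path.
        if not ns.is_integer():
            return None
        result.append(int(ns))
    return result


print(to_integer_nanoseconds([0.0, 1.0, 2.5], "hours"))
print(to_integer_nanoseconds([1e-10], "seconds"))
```

Returning `None` rather than raising keeps the fallback decision with the caller; whether the integer path is actually faster depends on the pandas version in use.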

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
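
The filter in the page heading (closed issues by user 102827, sorted by `updated_at` descending) can be reproduced against this schema. The sketch below uses Python's standard `sqlite3` module with an in-memory database and a reduced copy of the table, assuming only the columns the filter needs. The three closed rows take their values from the table above; the fourth row is made up purely to show that the filter excludes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced version of the [issues] table: only the columns used by the filter.
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [user] INTEGER,
       [state] TEXT,
       [updated_at] TEXT,
       [type] TEXT
    )
    """
)

rows = [
    (374279704, 2514, 102827, "closed", "2021-03-26T19:50:50Z", "issue"),
    (376154741, 2531, 102827, "closed", "2021-03-26T19:50:49Z", "issue"),
    (226549366, 1399, 102827, "closed", "2017-07-25T17:42:52Z", "issue"),
    # Made-up row, included only to show that open issues are filtered out.
    (1, 9999, 102827, "open", "2022-01-01T00:00:00Z", "issue"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """
    SELECT [number] FROM [issues]
    WHERE [state] = 'closed' AND [type] = 'issue' AND [user] = 102827
    ORDER BY [updated_at] DESC
"""
numbers = [row[0] for row in conn.execute(query)]
print(numbers)
```

Because the timestamps are stored as ISO 8601 text in a single time zone, sorting them as strings gives chronological order, which is why `updated_at` can be a `TEXT` column here.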
