
issues


3 rows where state = "closed", type = "issue" and user = 102827 sorted by updated_at descending


id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
374279704 MDU6SXNzdWUzNzQyNzk3MDQ= 2514 interpolate_na with limit argument changes size of chunks cchwala 102827 closed 0     8 2018-10-26T08:31:35Z 2021-03-26T19:50:50Z 2021-03-26T19:50:50Z CONTRIBUTOR      

Code Sample, a copy-pastable example if possible

```python
import pandas as pd
import xarray as xr
import numpy as np

t = pd.date_range(start='2018-01-01', end='2018-02-01', freq='H')
foo = np.sin(np.arange(len(t)))
bar = np.cos(np.arange(len(t)))

foo[1] = np.NaN
bar[2] = np.NaN

ds_test = xr.Dataset(data_vars={'foo': ('time', foo), 'bar': ('time', bar)},
                     coords={'time': t}).chunk()

print(ds_test)
print("\n\n### After .interpolate_na(dim='time')\n")
print(ds_test.interpolate_na(dim='time'))
print("\n\n### After .interpolate_na(dim='time', limit=5)\n")
print(ds_test.interpolate_na(dim='time', limit=5))
print("\n\n### After .interpolate_na(dim='time', limit=20)\n")
print(ds_test.interpolate_na(dim='time', limit=20))
```

Output of the above code. Note the different chunk sizes, depending on the value of limit:

```
<xarray.Dataset>
Dimensions:  (time: 745)
Coordinates:
  * time     (time) datetime64[ns] 2018-01-01 2018-01-01T01:00:00 ... 2018-02-01
Data variables:
    foo      (time) float64 dask.array<shape=(745,), chunksize=(745,)>
    bar      (time) float64 dask.array<shape=(745,), chunksize=(745,)>


### After .interpolate_na(dim='time')

<xarray.Dataset>
Dimensions:  (time: 745)
Coordinates:
  * time     (time) datetime64[ns] 2018-01-01 2018-01-01T01:00:00 ... 2018-02-01
Data variables:
    foo      (time) float64 dask.array<shape=(745,), chunksize=(745,)>
    bar      (time) float64 dask.array<shape=(745,), chunksize=(745,)>


### After .interpolate_na(dim='time', limit=5)

<xarray.Dataset>
Dimensions:  (time: 745)
Coordinates:
  * time     (time) datetime64[ns] 2018-01-01 2018-01-01T01:00:00 ... 2018-02-01
Data variables:
    foo      (time) float64 dask.array<shape=(745,), chunksize=(3,)>
    bar      (time) float64 dask.array<shape=(745,), chunksize=(3,)>


### After .interpolate_na(dim='time', limit=20)

<xarray.Dataset>
Dimensions:  (time: 745)
Coordinates:
  * time     (time) datetime64[ns] 2018-01-01 2018-01-01T01:00:00 ... 2018-02-01
Data variables:
    foo      (time) float64 dask.array<shape=(745,), chunksize=(10,)>
    bar      (time) float64 dask.array<shape=(745,), chunksize=(10,)>
```

Problem description

Calling xarray.DataArray.interpolate_na() with the limit kwarg changes the chunk size of the resulting dask arrays. Without limit, the chunk size is preserved.

Expected Output

The chunk size should not change. The very small chunks that result from typical small values of limit are not optimal for dask performance. Also, operations like .rolling() will fail if the chunk size is smaller than the length of the rolling window.
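To put numbers on the concern above, here is a minimal pure-Python sketch. It assumes the 745-element series is split into uniform chunks of the sizes reported in the output (745, 10 and 3). The helper names `count_chunks` and `window_fits` are hypothetical and not part of xarray or dask, and the window length of 60 is an illustrative value only.

```python
def count_chunks(length, chunksize):
    """Return how many chunks of size `chunksize` cover `length` elements."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    return -(-length // chunksize)  # integer ceiling division


def window_fits(chunksize, window):
    """Return True if a rolling window is no longer than a single chunk."""
    return chunksize >= window


# Chunk sizes reported in the output above; window=60 is an assumed example.
for chunksize in (745, 10, 3):
    print(chunksize, count_chunks(745, chunksize), window_fits(chunksize, 60))
```

Under these assumptions the chunk count grows from 1 to 249 when the chunk size drops from 745 to 3, and neither of the small chunk sizes can hold a 60-step window.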

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: None.None
xarray: 0.10.9
pandas: 0.23.3
numpy: 1.13.3
scipy: 1.0.0
netCDF4: 1.4.1
h5netcdf: 0.5.0
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: 1.0.0
dask: 0.19.4
distributed: 1.23.3
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 38.5.2
pip: 9.0.1
conda: 4.5.11
pytest: 3.4.2
IPython: 5.5.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2514/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
376154741 MDU6SXNzdWUzNzYxNTQ3NDE= 2531 DataArray.rolling() does not preserve chunksizes in some cases cchwala 102827 closed 0     2 2018-10-31T20:50:33Z 2021-03-26T19:50:49Z 2021-03-26T19:50:49Z CONTRIBUTOR      

This issue was found and discussed in the related issue #2514.

I am opening a separate issue for clarity.

Code Sample, a copy-pastable example if possible

```python
import pandas as pd
import numpy as np
import xarray as xr

t = pd.date_range(start='2018-01-01', end='2018-02-01', freq='H')
bar = np.sin(np.arange(len(t)))
baz = np.cos(np.arange(len(t)))

da_test = xr.DataArray(data=np.stack([bar, baz]),
                       coords={'time': t, 'sensor': ['one', 'two']},
                       dims=('sensor', 'time'))

print(da_test.chunk({'time': 100}).rolling(time=60).mean().chunks)

print(da_test.chunk({'time': 100}).rolling(time=60).count().chunks)

# Output for mean:  ((2,), (745,))
# Output for count: ((2,), (100, 100, 100, 100, 100, 100, 100, 45))
# Desired output:   ((2,), (100, 100, 100, 100, 100, 100, 100, 45))
```

Problem description

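DataArray.rolling() does not always preserve the chunk sizes. Whether it does appears to depend on the applied method: in the example above, mean() returns a single chunk along time, while count() keeps the original chunks.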
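A check like the following could be used to compare the `.chunks` tuples before and after an operation. It is a minimal pure-Python sketch: `expected_chunks` and `chunks_preserved` are hypothetical helpers, not xarray or dask functions, and the tuples are copied from the outputs reported in this issue rather than computed by running xarray.

```python
def expected_chunks(length, chunksize):
    """Chunk layout for one dimension of `length` split into `chunksize` pieces."""
    full, rest = divmod(length, chunksize)
    return (chunksize,) * full + ((rest,) if rest else ())


def chunks_preserved(before, after):
    """Return True if two `.chunks`-style tuples describe the same layout."""
    return tuple(before) == tuple(after)


# Layout after .chunk({'time': 100}): 2 sensors, 745 time steps.
original = ((2,), expected_chunks(745, 100))

# Values reported in the issue for the two rolling methods.
after_mean = ((2,), (745,))
after_count = ((2,), (100, 100, 100, 100, 100, 100, 100, 45))

print(chunks_preserved(original, after_mean))
print(chunks_preserved(original, after_count))
```

With real data, `before` and `after` would be the `.chunks` attributes of the input and the result.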

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: None.None
xarray: 0.10.9
pandas: 0.23.3
numpy: 1.13.3
scipy: 1.0.0
netCDF4: 1.4.1
h5netcdf: 0.5.0
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: 1.0.0
dask: 0.19.4
distributed: 1.23.3
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 38.5.2
pip: 9.0.1
conda: 4.5.11
pytest: 3.4.2
IPython: 5.5.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2531/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
226549366 MDU6SXNzdWUyMjY1NDkzNjY= 1399 `decode_cf_datetime()` slow because `pd.to_timedelta()` is slow if floats are passed cchwala 102827 closed 0     6 2017-05-05T11:48:00Z 2017-07-25T17:42:52Z 2017-07-25T17:42:52Z CONTRIBUTOR      

Hi, decode_cf_datetime is slowed down because it always passes floats to pd.to_timedelta, while pd.to_timedelta is much faster when working on integers.

Here is a notebook that shows the differences. Working with integers is approx. one order of magnitude faster.

Hence, it would be great to automatically convert raw float time values to integer nanoseconds where possible (likely limited to resolutions below days or hours, to avoid dealing with units such as months, whose length in nanoseconds varies).

As an alternative, xarray could avoid forcing the cast to floats and state in the docstring that the raw values should be integers to speed up the conversion.

This could possibly also be resolved in pd.to_timedelta but I assume it will be more complicated to deal with all the edge cases there.
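The proposed float-to-integer conversion could look roughly like the sketch below. It is a pure-Python illustration, not xarray's or pandas' implementation: `NS_PER_UNIT` and `to_integer_nanoseconds` are hypothetical names, and the set of accepted units (hours and finer) is an assumption based on the restriction suggested above.

```python
# Nanoseconds per unit, only for units with a fixed length (assumed set).
NS_PER_UNIT = {
    "hours": 3600 * 10**9,
    "minutes": 60 * 10**9,
    "seconds": 10**9,
    "milliseconds": 10**6,
    "microseconds": 10**3,
    "nanoseconds": 1,
}


def to_integer_nanoseconds(values, unit):
    """Convert float time offsets in `unit` to integer nanoseconds.

    Fractions of a nanosecond are rounded away, so this is only lossless
    when the inputs are representable at nanosecond resolution.
    """
    try:
        factor = NS_PER_UNIT[unit]
    except KeyError:
        raise ValueError("unit %r has no fixed length in nanoseconds" % unit)
    return [int(round(v * factor)) for v in values]


print(to_integer_nanoseconds([0.0, 1.5, 2.0], "hours"))
```

The resulting integers could then be passed to pd.to_timedelta instead of the raw floats. Whether that yields the speed-up described above would need to be measured.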

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1399/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);