issue_comments


8 rows where issue = 374279704 sorted by updated_at descending


Commenters: cchwala (4), fujiisoup (2), dcherian (1), stale[bot] (1)

Author associations: CONTRIBUTOR (4), MEMBER (3), NONE (1)

Issue: interpolate_na with limit argument changes size of chunks (8 comments)
Comment 702555417 · stale[bot] (NONE) · 2020-10-02T06:38:05Z
https://github.com/pydata/xarray/issues/2514#issuecomment-702555417

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

Comment 433454137 · cchwala (CONTRIBUTOR) · created 2018-10-26T15:49:20Z, updated 2018-10-31T21:14:48Z
https://github.com/pydata/xarray/issues/2514#issuecomment-433454137

EDIT: The issue described in this post has now been split out into #2531.

I think I have a fix, but wanted to write some failing tests before committing the changes. While doing this, I discovered that DataArray.rolling() also does not preserve chunk sizes, apparently depending on the applied method.

```python
import pandas as pd
import numpy as np
import xarray as xr

t = pd.date_range(start='2018-01-01', end='2018-02-01', freq='H')
bar = np.sin(np.arange(len(t)))
baz = np.cos(np.arange(len(t)))

da_test = xr.DataArray(data=np.stack([bar, baz]),
                       coords={'time': t, 'sensor': ['one', 'two']},
                       dims=('sensor', 'time'))

print(da_test.chunk({'time': 100}).rolling(time=60).mean().chunks)
print(da_test.chunk({'time': 100}).rolling(time=60).count().chunks)
```

Output for mean: `((2,), (745,))`
Output for count: `((2,), (100, 100, 100, 100, 100, 100, 100, 45))`
Desired output: `((2,), (100, 100, 100, 100, 100, 100, 100, 45))`

My fix solves my initial problem, but, if done correctly, it should perhaps solve this bug as well.

Any idea why this depends on whether .mean() or .count() is used?

I have already pushed some WIP changes. Should I open a PR even though most of the new tests still fail?
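
A stopgap for the symptom above, sketched under the same da_test setup, is to rechunk the result explicitly:

```python
# Sketch: restore the desired chunk layout after the rolling reduction.
# .chunk() here is a plain dask rechunk, so it adds a rechunking step
# rather than fixing the root cause.
result = da_test.chunk({'time': 100}).rolling(time=60).mean()
result = result.chunk({'time': 100})
print(result.chunks)  # ((2,), (100, 100, 100, 100, 100, 100, 100, 45))
```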

Comment 433992180 · cchwala (CONTRIBUTOR) · 2018-10-29T17:01:12Z
https://github.com/pydata/xarray/issues/2514#issuecomment-433992180

@dcherian Okay. A WIP PR will follow, but might take some days.

Comment 433528586 · fujiisoup (MEMBER) · created 2018-10-26T20:08:33Z, updated 2018-10-27T20:50:47Z
https://github.com/pydata/xarray/issues/2514#issuecomment-433528586

Nice catch!

For historical reasons, mean and some other reduction methods use bottleneck by default, while count does not.

mean goes through this function: https://github.com/pydata/xarray/blob/b622c5e7da928524ef949d9e389f6c7f38644494/xarray/core/dask_array_ops.py#L23

It looks like there is another bug in this function.
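
For context, the chunk-preserving behaviour one would want from the bottleneck path can be sketched directly with dask's map_overlap. This is only an illustration of the general technique, not xarray's actual implementation:

```python
import numpy as np
import dask.array as da
import bottleneck

# Sketch of the technique: share window - 1 edge values between
# neighbouring chunks, apply the moving-window reduction per chunk,
# then trim the overlap again, so the output keeps the input chunks.
x = da.random.random(745, chunks=100)

rolled = x.map_overlap(
    lambda block: bottleneck.move_mean(block, window=60),
    depth={0: 59},       # window - 1 values shared with each neighbour
    boundary=np.nan,     # pad the outer array edges with NaN
)
print(rolled.chunks)     # ((100, 100, 100, 100, 100, 100, 100, 45),)
```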

Comment 433514829 · dcherian (MEMBER) · 2018-10-26T19:14:44Z
https://github.com/pydata/xarray/issues/2514#issuecomment-433514829

@cchwala Discussion is a lot easier on a PR, so go ahead and do that. You can add WIP to the title.

Comment 433369567 · cchwala (CONTRIBUTOR) · 2018-10-26T10:53:32Z
https://github.com/pydata/xarray/issues/2514#issuecomment-433369567

Thanks @fujiisoup for the quick response and the pointers. I will have a look and report back on whether a PR is within my capabilities.

Comment 433353091 · fujiisoup (MEMBER) · 2018-10-26T09:49:04Z
https://github.com/pydata/xarray/issues/2514#issuecomment-433353091

Thanks, @cchwala, for reporting the issue.

It looks like the actual chunk sizes are ((10, 735),), not all 10:

```python
In [16]: ds_test.interpolate_na(dim='time', limit=20)['foo'].chunks
Out[16]: ((10, 735),)
```

(Why does our __repr__ only show the first chunk size?) But it should be ((745,),), as you suggested.

The problem is probably in https://github.com/pydata/xarray/blob/5940100761478604080523ebb1291ecff90e779e/xarray/core/dask_array_ops.py#L74-L85

This method is designed to be used for multiply-chunked arrays, so I did not worry about it adding a small chunk at the head. Do you mind looking into it?
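
The original reproduction is not quoted on this page; a hypothetical reconstruction (names, gap location, and sizes guessed from the chunk tuples above, not the original issue's code) would look something like:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical setup: the names (ds_test, 'foo'), the NaN gap, and the
# single 745-element chunk are guesses based on the discussion.
t = pd.date_range(start='2018-01-01', end='2018-02-01', freq='H')  # 745 steps
foo = np.sin(np.arange(len(t)))
foo[100:130] = np.nan
ds_test = xr.Dataset({'foo': ('time', foo)},
                     coords={'time': t}).chunk({'time': len(t)})

print(ds_test.interpolate_na(dim='time', limit=20)['foo'].chunks)
# reported: ((10, 735),) -- an unexpected small chunk at the head
# expected: ((745,),)
```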

Comment 433346685 · cchwala (CONTRIBUTOR) · 2018-10-26T09:27:19Z
https://github.com/pydata/xarray/issues/2514#issuecomment-433346685

The problem seems to occur here

https://github.com/pydata/xarray/blob/5940100761478604080523ebb1291ecff90e779e/xarray/core/missing.py#L368-L376

because of the usage of .construct(). A quick try without it shows that the chunk size is then preserved.

Hence, .construct() might need a fix to deal correctly with the chunks of dask arrays.
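
To see this for yourself, here is a minimal sketch (reusing a setup like da_test from the earlier comment) that inspects the chunks produced by .construct():

```python
import numpy as np
import pandas as pd
import xarray as xr

# Minimal sketch: rolling(...).construct() adds a new 'window' dimension;
# the point is only to compare the 'time' chunks before and after.
t = pd.date_range(start='2018-01-01', end='2018-02-01', freq='H')
da_test = xr.DataArray(np.sin(np.arange(len(t))),
                       coords={'time': t}, dims='time')

windowed = da_test.chunk({'time': 100}).rolling(time=60).construct('window')
print(windowed.dims)    # ('time', 'window')
print(windowed.chunks)  # check whether the 'time' chunks are still (100, ..., 45)
```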



Table schema:

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```
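
The row selection described at the top of the page ("8 rows where issue = 374279704 sorted by updated_at descending") can be reproduced against the underlying SQLite file. A sketch; the database filename ("github.db") is an assumption:

```python
import sqlite3

# Sketch: query the table behind this page directly. The WHERE/ORDER BY
# clauses mirror the row selection described above; "github.db" is assumed.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    "SELECT id, user, created_at, updated_at, author_association "
    "FROM issue_comments "
    "WHERE issue = ? "
    "ORDER BY updated_at DESC",
    (374279704,),
).fetchall()
for row in rows:
    print(row)
conn.close()
```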