
issue_comments


6 rows where issue = 1088893989 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1001787425 https://github.com/pydata/xarray/issues/6112#issuecomment-1001787425 https://api.github.com/repos/pydata/xarray/issues/6112 IC_kwDOAMm_X847thAh josephnowak 25071375 2021-12-27T22:44:43Z 2021-12-27T22:45:04Z CONTRIBUTOR

I will be on the lookout for any changes that may be required.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Forward Fill not working when there are all-NaN chunks 1088893989
1001744349 https://github.com/pydata/xarray/issues/6112#issuecomment-1001744349 https://api.github.com/repos/pydata/xarray/issues/6112 IC_kwDOAMm_X847tWfd dcherian 2448579 2021-12-27T20:39:57Z 2021-12-27T20:39:57Z MEMBER

Both sound good to me.

Your code for limit looks OK though I didn't look closely. It looks very similar to https://github.com/pydata/xarray/blob/3960ea3ba08f81d211899827612550f6ac2de804/xarray/core/missing.py#L30-L34

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Forward Fill not working when there are all-NaN chunks 1088893989
1001740657 https://github.com/pydata/xarray/issues/6112#issuecomment-1001740657 https://api.github.com/repos/pydata/xarray/issues/6112 IC_kwDOAMm_X847tVlx josephnowak 25071375 2021-12-27T20:27:16Z 2021-12-27T20:27:16Z CONTRIBUTOR

Two questions:

1. Is it possible to set the array used for `test_push_dask` to `np.array([np.nan, 1, 2, 3, np.nan, np.nan, np.nan, np.nan, 4, 5, np.nan, 6])`? Using that array you can validate the test case that I put on this issue without creating another array (it's the original array, but permuted).
2. Can I erase the conditional that checks for the case where all the chunks have size 1? I think that with the new method it is not necessary:

```py
# I think this is only necessary due to the use of map_overlap in the previous method.
if all(c == 1 for c in array.chunks[axis]):
    array = array.rechunk({axis: 2})
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Forward Fill not working when there are all-NaN chunks 1088893989
1001676665 https://github.com/pydata/xarray/issues/6112#issuecomment-1001676665 https://api.github.com/repos/pydata/xarray/issues/6112 IC_kwDOAMm_X847tF95 josephnowak 25071375 2021-12-27T17:53:07Z 2021-12-27T17:59:57Z CONTRIBUTOR

Yes, of course. By the way, would it be possible to add something like the following code for the case where there is a limit? I know this code generates about 4x more tasks, but at least it does the job, so a warning could be sufficient. (If it is not good enough to be added, there is no problem; building the graph manually will probably be a better option than using this algorithm for forward fill with limits.)

```py
import dask.array as da
import numpy as np
import xarray as xr
from bottleneck import push


def ffill(x: xr.DataArray, dim: str, limit=None):
    def _fill_with_last_one(a, b):
        # cumreduction applies the push func over all the blocks first, so
        # the only missing part is filling the missing values using
        # the last data of every one of them
        if isinstance(a, np.ma.masked_array) or isinstance(b, np.ma.masked_array):
            a = np.ma.getdata(a)
            b = np.ma.getdata(b)
            values = np.where(~np.isnan(b), b, a)
            return np.ma.masked_array(values, mask=np.ma.getmaskarray(b))

        return np.where(~np.isnan(b), b, a)

    def _ffill(arr):
        return xr.DataArray(
            da.reductions.cumreduction(
                func=push,
                binop=_fill_with_last_one,
                ident=np.nan,
                x=arr.data,
                axis=arr.dims.index(dim),
                dtype=arr.dtype,
                method="sequential",
            ),
            dims=arr.dims,
            coords=arr.coords,
        )

    if limit is not None:
        axis = x.dims.index(dim)
        # position index along ``dim``, broadcast to the full shape of ``x``
        arange = xr.DataArray(
            da.broadcast_to(
                da.arange(
                    x.shape[axis],
                    chunks=x.chunks[axis],
                    dtype=x.dtype,
                ).reshape(
                    tuple(size if i == axis else 1 for i, size in enumerate(x.shape))
                ),
                x.shape,
                x.chunks,
            ),
            coords=x.coords,
            dims=x.dims,
        )
        # positions farther than ``limit`` from the last valid value stay NaN
        valid_limits = (arange - _ffill(arange.where(x.notnull(), np.nan))) <= limit
        return _ffill(x).where(valid_limits, np.nan)

    return _ffill(x)
```
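The `limit` trick above can be illustrated without dask: forward-fill a position index that is NaN wherever the data is NaN; subtracting the filled index from each position gives the distance to the last valid value, and anything farther than `limit` stays NaN. A minimal 1-D NumPy sketch (the helper names here are illustrative, not xarray's API):

```python
import numpy as np


def _ffill_1d(a):
    # plain 1-D forward fill: find the index of the last valid position, then gather
    idx = np.where(~np.isnan(a), np.arange(a.shape[0]), 0)
    np.maximum.accumulate(idx, out=idx)
    return a[idx]


def ffill_limit_1d(x, limit):
    # same idea as the dask version above, restricted to 1-D NumPy arrays
    arange = np.arange(x.shape[0], dtype=float)
    # distance from each position to its last valid value
    distance = arange - _ffill_1d(np.where(~np.isnan(x), arange, np.nan))
    return np.where(distance <= limit, _ffill_1d(x), np.nan)


out = ffill_limit_1d(np.array([1.0, np.nan, np.nan, np.nan, 5.0]), limit=2)
# → [1., 1., 1., nan, 5.]  (fills at most 2 positions past the last valid value)
```

Leading NaNs stay NaN because the forward-filled index array is itself NaN there, so the distance comparison is False.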

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Forward Fill not working when there are all-NaN chunks 1088893989
1001667358 https://github.com/pydata/xarray/issues/6112#issuecomment-1001667358 https://api.github.com/repos/pydata/xarray/issues/6112 IC_kwDOAMm_X847tDse dcherian 2448579 2021-12-27T17:25:10Z 2021-12-27T17:25:10Z MEMBER

Thanks @josephnowak. This is a great idea! 👏🏾 👏🏾 Can you send in a pull request, please? We'll need to add the example from your first post as a test.

I think you can replace this dask_array_ops.push with your version: https://github.com/pydata/xarray/blob/3960ea3ba08f81d211899827612550f6ac2de804/xarray/core/dask_array_ops.py#L56-L80

This function is expected to return a dask array, so you can just return the result of cumreduction instead of wrapping it up in a DataArray.
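That suggestion can be sketched at the `dask.array` level. This is a minimal sketch, not xarray's actual code: the names `push` and `_np_push` are illustrative, and `_np_push` is a hypothetical pure-NumPy stand-in for `bottleneck.push` so the example runs without bottleneck installed.

```python
import numpy as np
import dask.array as da


def _np_push(array, axis=-1, n=None):
    # pure-NumPy stand-in for bottleneck.push: forward-fill NaNs along ``axis``
    arr = np.moveaxis(np.asarray(array, dtype=float), axis, -1)
    idx = np.where(~np.isnan(arr), np.arange(arr.shape[-1]), 0)
    np.maximum.accumulate(idx, axis=-1, out=idx)
    filled = np.take_along_axis(arr, idx, axis=-1)
    return np.moveaxis(filled, -1, axis)


def _fill_with_last_one(a, b):
    # combine adjacent blocks: keep b where it is valid, else carry a forward
    return np.where(~np.isnan(b), b, a)


def push(array, axis):
    # returns the dask array from cumreduction directly, no DataArray wrapping
    return da.reductions.cumreduction(
        func=_np_push,
        binop=_fill_with_last_one,
        ident=np.nan,
        x=array,
        axis=axis,
        dtype=array.dtype,
        method="sequential",
    )


data = da.from_array(
    np.array([np.nan, 1.0, np.nan, np.nan, np.nan, np.nan, 5.0, np.nan]), chunks=2
)
result = push(data, axis=0).compute()
```

With `chunks=2` the middle chunks are entirely NaN, which is exactly the case reported in this issue; the sequential cumreduction carries the last valid value across them.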

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Forward Fill not working when there are all-NaN chunks 1088893989
1001656569 https://github.com/pydata/xarray/issues/6112#issuecomment-1001656569 https://api.github.com/repos/pydata/xarray/issues/6112 IC_kwDOAMm_X847tBD5 josephnowak 25071375 2021-12-27T17:00:53Z 2021-12-27T17:00:53Z CONTRIBUTOR

You can probably implement the forward fill using the logic of dask's cumsum and cumprod. I checked the dask code that is in xarray a little, and apparently none of it uses the HighLevelGraph, so if the idea is to avoid building the graph manually, I think you can use dask's cumreduction function to do the work (there is probably a better dask function for this kind of computation, but I haven't found it).

```py
import dask.array as da
import numpy as np
import xarray as xr
from bottleneck import push


def ffill(x: xr.DataArray, dim: str, limit=None):
    def _fill_with_last_one(a, b):
        # cumreduction applies the push func over all the blocks first, so
        # the only missing part is filling the missing values using
        # the last data of every one of them
        if isinstance(a, np.ma.masked_array) or isinstance(b, np.ma.masked_array):
            a = np.ma.getdata(a)
            b = np.ma.getdata(b)
            values = np.where(~np.isnan(b), b, a)
            return np.ma.masked_array(values, mask=np.ma.getmaskarray(b))

        return np.where(~np.isnan(b), b, a)

    return xr.DataArray(
        da.reductions.cumreduction(
            func=push,
            binop=_fill_with_last_one,
            ident=np.nan,
            x=x.data,
            axis=x.dims.index(dim),
            dtype=x.dtype,
            method="sequential",
        ),
        dims=x.dims,
        coords=x.coords,
    )
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Forward Fill not working when there are all-NaN chunks 1088893989


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);