html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/6112#issuecomment-1001787425,https://api.github.com/repos/pydata/xarray/issues/6112,1001787425,IC_kwDOAMm_X847thAh,25071375,2021-12-27T22:44:43Z,2021-12-27T22:45:04Z,CONTRIBUTOR,I will be on the lookout for any changes that may be required.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1088893989
https://github.com/pydata/xarray/issues/6112#issuecomment-1001740657,https://api.github.com/repos/pydata/xarray/issues/6112,1001740657,IC_kwDOAMm_X847tVlx,25071375,2021-12-27T20:27:16Z,2021-12-27T20:27:16Z,CONTRIBUTOR,"Two questions:

1. Is it possible to set the array used for `test_push_dask` to `np.array([np.nan, 1, 2, 3, np.nan, np.nan, np.nan, np.nan, 4, 5, np.nan, 6])`? Using that array you can validate the test case that I put on this issue without creating another array (it is the original array, permuted).
2. Can I remove the conditional that checks for the case where all the chunks have size 1? I think it is no longer necessary with the new method.

```py
# I think this is only necessary due to the use of map_overlap in the previous method.
if all(c == 1 for c in array.chunks[axis]):
    array = array.rechunk({axis: 2})
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1088893989
https://github.com/pydata/xarray/issues/6112#issuecomment-1001676665,https://api.github.com/repos/pydata/xarray/issues/6112,1001676665,IC_kwDOAMm_X847tF95,25071375,2021-12-27T17:53:07Z,2021-12-27T17:59:57Z,CONTRIBUTOR,"Yes, of course. By the way, would it be possible to add something like the following code for the case where there is a limit? I know this code generates roughly 4x more tasks, but at least it does the job, so a warning could be sufficient. (If it is not good enough to be added, there is no problem; building the graph manually will probably be a better option than using this algorithm for forward fill with limits.)

```py
def ffill(x: xr.DataArray, dim: str, limit=None):
    def _fill_with_last_one(a, b):
        # cumreduction applies the push func over all the blocks first, so
        # the only missing part is filling the leading missing values of
        # every block using the last data of the previous blocks
        if isinstance(a, np.ma.masked_array) or isinstance(b, np.ma.masked_array):
            a = np.ma.getdata(a)
            b = np.ma.getdata(b)
            values = np.where(~np.isnan(b), b, a)
            return np.ma.masked_array(values, mask=np.ma.getmaskarray(b))
        return np.where(~np.isnan(b), b, a)

    from bottleneck import push

    def _ffill(arr):
        return xr.DataArray(
            da.reductions.cumreduction(
                func=push,
                binop=_fill_with_last_one,
                ident=np.nan,
                x=arr.data,
                axis=arr.dims.index(dim),
                dtype=arr.dtype,
                method=""sequential"",
            ),
            dims=x.dims,
            coords=x.coords,
        )

    if limit is not None:
        axis = x.dims.index(dim)
        arange = xr.DataArray(
            da.broadcast_to(
                da.arange(
                    x.shape[axis], chunks=x.chunks[axis], dtype=x.dtype
                ).reshape(
                    tuple(size if i == axis else 1 for i, size in enumerate(x.shape))
                ),
                x.shape,
                x.chunks,
            ),
            coords=x.coords,
            dims=x.dims,
        )
        valid_limits = (arange - _ffill(arange.where(x.notnull(), np.nan))) <= limit
        # _ffill(x), not _ffill(arr): arr is not defined in this scope
        return _ffill(x).where(valid_limits, np.nan)

    return _ffill(x)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1088893989
https://github.com/pydata/xarray/issues/6112#issuecomment-1001656569,https://api.github.com/repos/pydata/xarray/issues/6112,1001656569,IC_kwDOAMm_X847tBD5,25071375,2021-12-27T17:00:53Z,2021-12-27T17:00:53Z,CONTRIBUTOR,"You can probably implement the forward fill using the logic of dask's cumsum and cumprod. I checked the dask code in Xarray a little, and apparently none of it uses the HighLevelGraph, so if the idea is to avoid building the graph manually, I think you can use dask's cumreduction function to do the work (there is probably a better dask function for this kind of computation, but I haven't found it).

```py
def ffill(x: xr.DataArray, dim: str, limit=None):
    def _fill_with_last_one(a, b):
        # cumreduction applies the push func over all the blocks first, so
        # the only missing part is filling the leading missing values of
        # every block using the last data of the previous blocks
        if isinstance(a, np.ma.masked_array) or isinstance(b, np.ma.masked_array):
            a = np.ma.getdata(a)
            b = np.ma.getdata(b)
            values = np.where(~np.isnan(b), b, a)
            return np.ma.masked_array(values, mask=np.ma.getmaskarray(b))
        return np.where(~np.isnan(b), b, a)

    from bottleneck import push

    return xr.DataArray(
        da.reductions.cumreduction(
            func=push,
            binop=_fill_with_last_one,
            ident=np.nan,
            x=x.data,
            axis=x.dims.index(dim),
            dtype=x.dtype,
            method=""sequential"",
        ),
        dims=x.dims,
        coords=x.coords,
    )
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1088893989
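Editor's note: the chunk-combine idea in the comments above can be illustrated without dask or bottleneck. The sketch below is a minimal NumPy-only model of the sequential scan that `cumreduction(method="sequential")` performs for forward fill: each chunk is filled in order, and the last valid value is carried into the next chunk to seed its leading NaNs. The names `_ffill_1d` and `ffill_chunked` are illustrative only, not part of xarray or dask.

```python
import numpy as np

def _ffill_1d(a, seed=np.nan):
    # Forward-fill one chunk, seeding its leading NaNs with the last valid
    # value carried over from earlier chunks; return the filled chunk and
    # the new carry value.
    out = a.astype(float).copy()
    last = seed
    for i in range(out.size):
        if np.isnan(out[i]):
            out[i] = last
        else:
            last = out[i]
    return out, last

def ffill_chunked(chunks):
    # Sequential scan over chunks, modeling what cumreduction does with
    # bottleneck.push as func and a last-valid-value merge as binop.
    filled, carry = [], np.nan
    for chunk in chunks:
        out, carry = _ffill_1d(chunk, carry)
        filled.append(out)
    return np.concatenate(filled)

chunks = [np.array([np.nan, 1.0]), np.array([np.nan, np.nan]), np.array([4.0, np.nan])]
print(ffill_chunked(chunks))  # [nan  1.  1.  1.  4.  4.]
```

A NaN at the very start of the first chunk stays NaN, matching `ident=np.nan` in the cumreduction calls above.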