html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/6825#issuecomment-1194615618,https://api.github.com/repos/pydata/xarray/issues/6825,1194615618,IC_kwDOAMm_X85HNGNC,8881170,2022-07-25T20:52:55Z,2022-07-25T20:52:55Z,CONTRIBUTOR,Thanks @dcherian!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1317320059
https://github.com/pydata/xarray/pull/6825#issuecomment-1194581162,https://api.github.com/repos/pydata/xarray/issues/6825,1194581162,IC_kwDOAMm_X85HM9yq,8881170,2022-07-25T20:22:28Z,2022-07-25T20:22:28Z,CONTRIBUTOR,Is there some `#noqa` equivalent to avoid testing the docstring example here? Or should I be pointing to a test dataset to open?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1317320059
https://github.com/pydata/xarray/issues/4922#issuecomment-791465015,https://api.github.com/repos/pydata/xarray/issues/4922,791465015,MDEyOklzc3VlQ29tbWVudDc5MTQ2NTAxNQ==,8881170,2021-03-05T14:47:46Z,2021-03-05T14:47:46Z,CONTRIBUTOR,"> I feel like this should not work i.e. rolling window length (6) < size along axis (3). So the bottleneck error seems right.

This is normally the case, but with `min_periods=1` it should just return the given value so long as there's at least one observation (as in case #2, where the boundaries return as normal and the middle number is smoothed).

Thanks for the pointer on #4977!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,811321550
https://github.com/pydata/xarray/issues/4922#issuecomment-790986252,https://api.github.com/repos/pydata/xarray/issues/4922,790986252,MDEyOklzc3VlQ29tbWVudDc5MDk4NjI1Mg==,8881170,2021-03-04T22:21:37Z,2021-03-04T22:32:01Z,CONTRIBUTOR,"@dcherian, to add to the complexity here, it's even weirder than originally reported. See my test cases below. This might alter how this bug is approached.

```python
import xarray as xr

def _rolling(ds):
    return ds.rolling(time=6, center=False, min_periods=1).mean()

# Length-3 array to test that `min_periods` kicks in, despite asking
# for 6 time-steps of smoothing
ds = xr.DataArray([1, 2, 3], dims='time')
ds['time'] = xr.cftime_range(start='2021-01-01', freq='D', periods=3)
```

### 1. With `bottleneck` installed, `min_periods` is ignored as a kwarg with in-memory arrays.

(`bottleneck` installed)

```python
# Just apply rolling to the base array.
ds.rolling(time=6, center=False, min_periods=1).mean()
>>> ValueError: Moving window (=6) must between 1 and 3, inclusive

# Group into single day climatology groups and apply
ds.groupby('time.dayofyear').map(_rolling)
>>> ValueError: Moving window (=6) must between 1 and 1, inclusive
```

### 2. With `bottleneck` uninstalled, `min_periods` works with in-memory arrays.

(`bottleneck` uninstalled)

```python
# Just apply rolling to the base array.
ds.rolling(time=6, center=False, min_periods=1).mean()
>>>
>>> array([1. , 1.5, 2. ])
>>> Coordinates:
>>> * time (time) object 2021-01-01 00:00:00 ... 2021-01-03 00:00:00

# Group into single day climatology groups and apply
ds.groupby('time.dayofyear').map(_rolling)
>>>
>>> array([1., 2., 3.])
>>> Coordinates:
>>> * time (time) object 2021-01-01 00:00:00 ... 2021-01-03 00:00:00
```

### 3. Regardless of `bottleneck`, `dask` objects ignore `min_periods` when applied through a `groupby` object.
This specifically seems like an issue with `.map()` (independent of `bottleneck` installation).

```python
# Just apply rolling to the base array.
ds.chunk().rolling(time=6, center=False, min_periods=1).mean().compute()
>>>
>>> array([1. , 1.5, 2. ])
>>> Coordinates:
>>> * time (time) object 2021-01-01 00:00:00 ... 2021-01-03 00:00:00

# Group into single day climatology groups and apply
ds.chunk().groupby('time.dayofyear').map(_rolling)
>>> ValueError: For window size 6, every chunk should be larger than 3, but the smallest chunk size is 1.
>>> Rechunk your array
>>> with a larger chunk size or a chunk size that
>>> more evenly divides the shape of your array.
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,811321550
https://github.com/pydata/xarray/issues/3813#issuecomment-655142333,https://api.github.com/repos/pydata/xarray/issues/3813,655142333,MDEyOklzc3VlQ29tbWVudDY1NTE0MjMzMw==,8881170,2020-07-07T21:22:30Z,2020-07-07T21:22:30Z,CONTRIBUTOR,"FYI, this is also seen on `xr.apply_ufunc`, but only when `vectorize=True`. It seems like the ndarrays' writeable flag is turned off when `vectorize=True`. This is also solved by `.copy()`, which is good anyway to avoid mutating the original ndarrays.

Perhaps a `copy=bool` kwarg could also be added to `apply_ufunc` to create copies of the ndarrays? I'd be happy to lead that PR if it makes sense.

Example:

```python
import numpy as np
import xarray as xr

def match_nans(a, b):
    """"""Pairwise matching of nans between two time series.""""""
    # Try with and without `.copy` commands.
    # a = a.copy()
    # b = b.copy()
    if np.isnan(a).any() or np.isnan(b).any():
        idx = np.logical_or(np.isnan(a), np.isnan(b))
        a[idx], b[idx] = np.nan, np.nan
    return a, b

A = xr.DataArray(np.random.rand(10, 5), dims=['time', 'space'])
B = xr.DataArray(np.random.rand(10, 5), dims=['time', 'space'])
A[0, 1] = np.nan
B[5, 0] = np.nan

xr.apply_ufunc(match_nans, A, B,
               input_core_dims=[['time'], ['time']],
               output_core_dims=[['time'], ['time']],
               # Try with and without vectorize.
               vectorize=True,)
```","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,573031381
https://github.com/pydata/xarray/issues/1815#issuecomment-628135082,https://api.github.com/repos/pydata/xarray/issues/1815,628135082,MDEyOklzc3VlQ29tbWVudDYyODEzNTA4Mg==,8881170,2020-05-13T17:27:06Z,2020-05-13T17:27:06Z,CONTRIBUTOR,"> > So would you be re-doing the same computation by running .compute() separately on these objects?
>
> Yes. but you can do `dask.compute(xarray_obj1, xarray_obj2,...)` or combine those objects appropriately into a Dataset and then call compute on that.

Good call. I figured there was a workaround.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,287223508
https://github.com/pydata/xarray/issues/1815#issuecomment-628070696,https://api.github.com/repos/pydata/xarray/issues/1815,628070696,MDEyOklzc3VlQ29tbWVudDYyODA3MDY5Ng==,8881170,2020-05-13T15:33:56Z,2020-05-13T15:33:56Z,CONTRIBUTOR,"One issue I see is that this would return multiple dask objects, correct? So to get the results from them, you'd have to run `.compute()` on each separately. I think it's a valid assumption to expect that the multiple output objects would share a lot of the same computational pipeline.
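To illustrate what I mean, here's a minimal, hypothetical sketch (the names `da`, `intermediate`, `out1`, and `out2` are made up) of two lazy outputs hanging off one shared step:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(100, 10), dims=['time', 'space']).chunk({'time': 10})
intermediate = da.mean('space')  # shared, potentially expensive step
out1 = intermediate.max('time')  # lazy output 1
out2 = intermediate.min('time')  # lazy output 2
# out1.compute() and out2.compute() each traverse the graph separately,
# whereas dask.compute(out1, out2) evaluates the shared step once.
```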
So would you be re-doing the same computation by running `.compute()` separately on these objects?

The earlier-mentioned code snippets provide a nice path forward, since you can just run compute on one object, and then split its `result` (or however you name it) dimension into multiple individual objects.

Thoughts?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,287223508
https://github.com/pydata/xarray/pull/3816#issuecomment-624158963,https://api.github.com/repos/pydata/xarray/issues/3816,624158963,MDEyOklzc3VlQ29tbWVudDYyNDE1ODk2Mw==,8881170,2020-05-05T16:28:26Z,2020-05-05T16:28:26Z,CONTRIBUTOR,"I missed this originally @dcherian, but thanks for the great work here. The docs changes are a great help.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,573768194
https://github.com/pydata/xarray/issues/1815#issuecomment-614244205,https://api.github.com/repos/pydata/xarray/issues/1815,614244205,MDEyOklzc3VlQ29tbWVudDYxNDI0NDIwNQ==,8881170,2020-04-15T19:45:50Z,2020-04-15T19:45:50Z,CONTRIBUTOR,"I think ideally it would be nice to return multiple DataArrays or a Dataset of variables. But I'm really happy with this solution. I'm using it on a 600GB dataset of particle trajectories and was able to write a ufunc to go through and return each particle's x, y, z location when it met a certain condition.

I think having something simple like the stackoverflow snippet I posted would be great for the docs as an `apply_ufunc` example. I'd be happy to lead this if folks think it's a good idea.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,287223508
https://github.com/pydata/xarray/issues/1815#issuecomment-614216243,https://api.github.com/repos/pydata/xarray/issues/1815,614216243,MDEyOklzc3VlQ29tbWVudDYxNDIxNjI0Mw==,8881170,2020-04-15T18:49:51Z,2020-04-15T18:49:51Z,CONTRIBUTOR,"This looks essentially the same as @stefraynaud's answer, but I came across this stackoverflow response here: https://stackoverflow.com/questions/52094320/with-xarray-how-to-parallelize-1d-operations-on-a-multidimensional-dataset.

@andersy005, I imagine you're far past this now. And this might have been related to discussions with Genevieve and me anyway.
```python
import numpy as np
import xarray as xr
from scipy import stats

def new_linregress(x, y):
    # Wrapper around scipy linregress to use in apply_ufunc
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
    return np.array([slope, intercept, r_value, p_value, std_err])

# return a new DataArray; bound to `result` rather than `stats` so the
# `stats` module used inside new_linregress isn't shadowed
result = xr.apply_ufunc(new_linregress, ds[x], ds[y],
                        input_core_dims=[['year'], ['year']],
                        output_core_dims=[['parameter']],
                        vectorize=True,
                        dask='parallelized',
                        output_dtypes=['float64'],
                        output_sizes={'parameter': 5},
                        )
```","{""total_count"": 3, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 3, ""rocket"": 0, ""eyes"": 0}",,287223508
https://github.com/pydata/xarray/pull/3667#issuecomment-573107748,https://api.github.com/repos/pydata/xarray/issues/3667,573107748,MDEyOklzc3VlQ29tbWVudDU3MzEwNzc0OA==,8881170,2020-01-10T16:32:47Z,2020-01-10T16:32:47Z,CONTRIBUTOR,Thanks @dcherian -- done in https://github.com/pydata/xarray/pull/3682.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,546451185
https://github.com/pydata/xarray/pull/3667#issuecomment-572688941,https://api.github.com/repos/pydata/xarray/issues/3667,572688941,MDEyOklzc3VlQ29tbWVudDU3MjY4ODk0MQ==,8881170,2020-01-09T18:23:14Z,2020-01-09T18:23:14Z,CONTRIBUTOR,"Oops, forgot to add to `whats-new`, but this is a pretty minor addition.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,546451185
https://github.com/pydata/xarray/pull/3667#issuecomment-572137657,https://api.github.com/repos/pydata/xarray/issues/3667,572137657,MDEyOklzc3VlQ29tbWVudDU3MjEzNzY1Nw==,8881170,2020-01-08T16:04:54Z,2020-01-08T16:04:54Z,CONTRIBUTOR,What's going on here? I use Travis on my repos so I'm not familiar with the Azure setup. I only modified a docstring so I'm not sure why it would break the testing suite? Unless it's testing my code snippet in the docs?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,546451185
https://github.com/pydata/xarray/issues/3580#issuecomment-561261583,https://api.github.com/repos/pydata/xarray/issues/3580,561261583,MDEyOklzc3VlQ29tbWVudDU2MTI2MTU4Mw==,8881170,2019-12-03T17:02:39Z,2019-12-03T17:02:39Z,CONTRIBUTOR,"I can't seem to replicate this issue for some reason. I have the same versions of `xarray`, `numpy`, and `netCDF4` installed.

```python-traceback
IndexError: The indexing operation you are attempting to perform is not valid on netCDF4.Variable object. Try loading your data into memory first by calling .load().
```

This implies that it's having issues slicing numpy-style with a dask array. I bet if you load it into memory and slice that way it'll work. But at ~22GB you might not be able to do that.

The preferred way to slice in `xarray` is to use `.sel()` and `.isel()` to leverage the label-aware nature of `xarray`. So you should have no problem doing this operation explicitly with the following: `fullda['sst'].isel(M=0, S=0, X=0, Y=0)`.
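For example, a quick sketch (assuming `fullda` is your lazily-opened dataset from above, with dims `M`, `S`, `L`, `X`, `Y`):

```python
# Label-aware selection stays lazy and sidesteps the raw netCDF4 indexing path
pt = fullda['sst'].isel(M=0, S=0, X=0, Y=0)
pt.load()  # only this small slice gets read into memory
```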
You of course don't need to slice the `L` dimension since you are taking the full thing, but the equivalent notation there is: `fullda['sst'].isel(L=slice(0, None))`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,529644880
https://github.com/pydata/xarray/issues/2969#issuecomment-494059784,https://api.github.com/repos/pydata/xarray/issues/2969,494059784,MDEyOklzc3VlQ29tbWVudDQ5NDA1OTc4NA==,8881170,2019-05-20T16:30:02Z,2019-05-20T16:30:02Z,CONTRIBUTOR,Thanks for the feedback and link to the other issue. I wasn't sure what to search to find other issues on this. The coordinate transformation seems like the most straightforward approach.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,445175953