html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2808#issuecomment-896827548,https://api.github.com/repos/pydata/xarray/issues/2808,896827548,IC_kwDOAMm_X841dICc,102827,2021-08-11T13:28:08Z,2021-08-11T13:28:08Z,CONTRIBUTOR,"Thanks @keewis for linking the new tutorial. It helped me a lot in figuring out how to use `apply_ufunc` for my 1D case. The fact that the tutorial shows the ""typical"" error messages that you get when trying to use it makes it really nice to follow.","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,420584430
https://github.com/pydata/xarray/pull/2532#issuecomment-434966059,https://api.github.com/repos/pydata/xarray/issues/2532,434966059,MDEyOklzc3VlQ29tbWVudDQzNDk2NjA1OQ==,102827,2018-11-01T08:13:48Z,2018-11-01T08:13:48Z,CONTRIBUTOR,"Yes, tests are still failing. The PR is WIP. I just wanted to open the PR now to have the discussion here instead of in the issues. I will work on fixing the code to pass all current tests. I will also check how the rechunking affects performance.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,376162232
https://github.com/pydata/xarray/issues/2514#issuecomment-433454137,https://api.github.com/repos/pydata/xarray/issues/2514,433454137,MDEyOklzc3VlQ29tbWVudDQzMzQ1NDEzNw==,102827,2018-10-26T15:49:20Z,2018-10-31T21:14:48Z,CONTRIBUTOR,"EDIT: The issue described in this post is now separated out into #2531

I think I have a fix, but wanted to write some failing tests before committing the changes. Doing this, I discovered that `DataArray.rolling()` also does not preserve the chunksizes, apparently depending on the applied method.

```python
import pandas as pd
import numpy as np
import xarray as xr

t = pd.date_range(start='2018-01-01', end='2018-02-01', freq='H')
bar = np.sin(np.arange(len(t)))
baz = np.cos(np.arange(len(t)))

da_test = xr.DataArray(data=np.stack([bar, baz]),
                       coords={'time': t, 'sensor': ['one', 'two']},
                       dims=('sensor', 'time'))

print(da_test.chunk({'time': 100}).rolling(time=60).mean().chunks)
print(da_test.chunk({'time': 100}).rolling(time=60).count().chunks)
```

```
Output for `mean`:  ((2,), (745,))
Output for `count`: ((2,), (100, 100, 100, 100, 100, 100, 100, 45))
Desired Output:     ((2,), (100, 100, 100, 100, 100, 100, 100, 45))
```

My fix solves my initial problem, but if done correctly it should also solve this bug. Any idea why this depends on whether `.mean()` or `.count()` is used?

I have already pushed some [WIP changes](https://github.com/cchwala/xarray/commits/fix_dask_rolling_window_chunksize). Should I already open a PR even though most of the new tests still fail?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,374279704
https://github.com/pydata/xarray/issues/2531#issuecomment-434843563,https://api.github.com/repos/pydata/xarray/issues/2531,434843563,MDEyOklzc3VlQ29tbWVudDQzNDg0MzU2Mw==,102827,2018-10-31T20:52:49Z,2018-10-31T20:52:49Z,CONTRIBUTOR,"The cause has been explained by @fujiisoup here: https://github.com/pydata/xarray/issues/2514#issuecomment-433528586

> Nice catch!
>
> For some historical reasons, `mean` and some reduction methods use bottleneck as default, while `count` does not.
>
> `mean` goes through this function:
>
> [xarray/xarray/core/dask_array_ops.py](https://github.com/pydata/xarray/blob/b622c5e7da928524ef949d9e389f6c7f38644494/xarray/core/dask_array_ops.py#L23)
>
> Line 23 in [b622c5e](/pydata/xarray/commit/b622c5e7da928524ef949d9e389f6c7f38644494)
>
> def dask_rolling_wrapper(moving_func, a, window, min_count=None, axis=-1):
>
> It looks like there is another bug in this function.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,376154741
https://github.com/pydata/xarray/issues/2514#issuecomment-433992180,https://api.github.com/repos/pydata/xarray/issues/2514,433992180,MDEyOklzc3VlQ29tbWVudDQzMzk5MjE4MA==,102827,2018-10-29T17:01:12Z,2018-10-29T17:01:12Z,CONTRIBUTOR,"@dcherian Okay. A WIP PR will follow, but it might take some days.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,374279704
https://github.com/pydata/xarray/issues/2514#issuecomment-433369567,https://api.github.com/repos/pydata/xarray/issues/2514,433369567,MDEyOklzc3VlQ29tbWVudDQzMzM2OTU2Nw==,102827,2018-10-26T10:53:32Z,2018-10-26T10:53:32Z,CONTRIBUTOR,Thanks @fujiisoup for the quick response and the pointers. I will have a look and report back whether a PR is within my capabilities or not.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,374279704
https://github.com/pydata/xarray/issues/2514#issuecomment-433346685,https://api.github.com/repos/pydata/xarray/issues/2514,433346685,MDEyOklzc3VlQ29tbWVudDQzMzM0NjY4NQ==,102827,2018-10-26T09:27:19Z,2018-10-26T09:27:19Z,CONTRIBUTOR,"The problem seems to occur here

https://github.com/pydata/xarray/blob/5940100761478604080523ebb1291ecff90e779e/xarray/core/missing.py#L368-L376

because of the usage of `.construct()`. A quick try without it shows that the chunksize is preserved then. Hence, [`.construct()`](https://github.com/pydata/xarray/blob/5940100761478604080523ebb1291ecff90e779e/xarray/core/rolling.py#L169) might need a fix for correctly dealing with the chunks of `dask.arrays`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,374279704
https://github.com/pydata/xarray/issues/1836#issuecomment-361532119,https://api.github.com/repos/pydata/xarray/issues/1836,361532119,MDEyOklzc3VlQ29tbWVudDM2MTUzMjExOQ==,102827,2018-01-30T09:32:26Z,2018-01-30T09:32:26Z,CONTRIBUTOR,"Thanks @jhamman for looking into this. Currently I am fine with using `persist()`, since I can break my analysis workflow down into time periods for which the data fits into RAM on a large machine. As I have written, the distributed scheduler failed for me because of #1464, but I would like to use it in the future. From other discussions on the dask schedulers (here or on SO), using the distributed scheduler seems to be a general recommendation anyway.

In summary, I am fine with my current workaround. I do not think that solving this issue has a high priority, in particular as the distributed scheduler is further improved. The main annoyance was tracking down the problem described in my first post. Hence, maybe the limitations of the schedulers could be described a bit better in the documentation.
Would you want a PR on this?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,289342234
https://github.com/pydata/xarray/issues/1836#issuecomment-358445479,https://api.github.com/repos/pydata/xarray/issues/1836,358445479,MDEyOklzc3VlQ29tbWVudDM1ODQ0NTQ3OQ==,102827,2018-01-17T21:07:43Z,2018-01-17T21:07:43Z,CONTRIBUTOR,"Thanks for the quick answer. The problem is that my actual use case also involves writing back an `xarray.Dataset` via `to_netcdf()`. I left this out of the example above to isolate the problem. With the `distributed` scheduler and `to_netcdf()`, I ran into issue #1464. As far as I can see, this might be fixed ""soon"" (#1793).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,289342234
https://github.com/pydata/xarray/pull/1414#issuecomment-317786250,https://api.github.com/repos/pydata/xarray/issues/1414,317786250,MDEyOklzc3VlQ29tbWVudDMxNzc4NjI1MA==,102827,2017-07-25T16:03:46Z,2017-07-25T16:03:46Z,CONTRIBUTOR,"@jhamman @shoyer This should be ready to merge. Should I open an xarray issue concerning the bug with `pandas.to_timedelta()`, or is it enough to have the issue I submitted for pandas? I think the bug should be resolved in xarray once it is resolved in pandas, because then the [overflow check here](https://github.com/pydata/xarray/blob/master/xarray/conventions.py#L155) should catch the cases I discovered.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229807027
https://github.com/pydata/xarray/pull/1414#issuecomment-316963228,https://api.github.com/repos/pydata/xarray/issues/1414,316963228,MDEyOklzc3VlQ29tbWVudDMxNjk2MzIyOA==,102827,2017-07-21T10:10:54Z,2017-07-21T10:10:54Z,CONTRIBUTOR,"Hmm... it's still complicated. To avoid the `NaT`s in my code, I tried to extend [the current overflow check](https://github.com/pydata/xarray/blob/master/xarray/conventions.py#L155) so that it switches to `_decode_datetime_with_netcdf4()` earlier. This was my attempt:

```python
(pd.to_timedelta(flat_num_dates.min(), delta) - pd.to_timedelta(1, 'd') + ref_date)
(pd.to_timedelta(flat_num_dates.max(), delta) + pd.to_timedelta(1, 'd') + ref_date)
```

But unfortunately, as shown in my notebook above, `pandas.to_timedelta()` has a bug and does not detect the overflow in those esoteric cases that I have identified... I have filed this as pandas-dev/pandas/issues/17037 because it should be solved there.

Since I do not think this will be fixed soon (I would gladly look at it, but have no time and probably not enough knowledge about the `pandas` core stuff), I am not sure what to do. Do you want to merge this PR, knowing that there still is the overflow issue that was in the code before? Or should I continue to try to fix the current overflow bug in this PR?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229807027
https://github.com/pydata/xarray/pull/1414#issuecomment-315643209,https://api.github.com/repos/pydata/xarray/issues/1414,315643209,MDEyOklzc3VlQ29tbWVudDMxNTY0MzIwOQ==,102827,2017-07-16T22:41:50Z,2017-07-16T22:41:50Z,CONTRIBUTOR,"...but wait. The `NaT`s that my code produces beyond the int64 overflow should be valid dates, produced using `_decode_datetime_with_netcdf4`, right?
Hence, I should still add a check for `NaT` results and fall back to the netCDF4 version in that case.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229807027
https://github.com/pydata/xarray/pull/1414#issuecomment-315637844,https://api.github.com/repos/pydata/xarray/issues/1414,315637844,MDEyOklzc3VlQ29tbWVudDMxNTYzNzg0NA==,102827,2017-07-16T21:15:04Z,2017-07-16T21:34:12Z,CONTRIBUTOR,"@jhamman - I found some differences between the old code in master and my code when decoding values close to the `np.datetime64` overflow. My code produces `NaT` where the old code returned some date. First, I wanted to test and fix that. However, I may have found that the old implementation did not behave correctly when crossing the ""overflow"" line just slightly. I have summed that up in a notebook [here](https://gist.github.com/cchwala/25efac7857c9b53f6f81d6fa44135a45).

My conclusion would be that the code in this PR is not only faster but also more correct than the old one. However, since it is quite late in the evening and my head needs some rest, I would like to get a second (or third) opinion...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229807027
https://github.com/pydata/xarray/pull/1414#issuecomment-315322859,https://api.github.com/repos/pydata/xarray/issues/1414,315322859,MDEyOklzc3VlQ29tbWVudDMxNTMyMjg1OQ==,102827,2017-07-14T10:05:04Z,2017-07-14T10:05:04Z,CONTRIBUTOR,"@jhamman - Sorry. I was away from the office (and everything related to work) for more than a month and had to catch up on a lot of things. I will sum up my stuff and post it here, hopefully after today's lunch break.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229807027
https://github.com/pydata/xarray/pull/1414#issuecomment-305469383,https://api.github.com/repos/pydata/xarray/issues/1414,305469383,MDEyOklzc3VlQ29tbWVudDMwNTQ2OTM4Mw==,102827,2017-06-01T11:43:27Z,2017-06-01T11:43:27Z,CONTRIBUTOR,"Just a short notice. Sorry for the delay. I am still working on this PR, but I am too busy right now to finish the overflow testing. I think I found some edge cases which have to be handled. I will provide more details soon.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229807027
https://github.com/pydata/xarray/pull/1414#issuecomment-302943727,https://api.github.com/repos/pydata/xarray/issues/1414,302943727,MDEyOklzc3VlQ29tbWVudDMwMjk0MzcyNw==,102827,2017-05-21T15:28:15Z,2017-05-21T15:28:15Z,CONTRIBUTOR,"Thanks @shoyer and @jhamman for the feedback. I will change things accordingly. Concerning tests, I will think again about additional checks for correct handling of overflow. I must admit that I am not 100% sure that every case is handled correctly by the current code and checked by the current tests. I will have to think about it a little when I find time within the next days...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229807027
https://github.com/pydata/xarray/issues/1399#issuecomment-300072972,https://api.github.com/repos/pydata/xarray/issues/1399,300072972,MDEyOklzc3VlQ29tbWVudDMwMDA3Mjk3Mg==,102827,2017-05-09T06:26:36Z,2017-05-09T06:26:36Z,CONTRIBUTOR,"Okay.
I will try to come up with a PR within the next few days.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,226549366
https://github.com/pydata/xarray/issues/1399#issuecomment-299819380,https://api.github.com/repos/pydata/xarray/issues/1399,299819380,MDEyOklzc3VlQ29tbWVudDI5OTgxOTM4MA==,102827,2017-05-08T09:32:58Z,2017-05-08T09:32:58Z,CONTRIBUTOR,"Hmm... the ""nanosecond"" issue seems to need a fix at the very foundation. As long as pandas and xarray rely on `datetime64[ns]`, you cannot avoid nanoseconds, right? `pd.to_datetime()` [forces the conversion to nanoseconds](https://github.com/pandas-dev/pandas/blob/c8dafb5a7ae9fe42b9d15c47082a6fb139e78b5d/pandas/core/tools/timedeltas.py#L156) even if you pass integers for a time `unit` different from `ns`. This does not make me as nervous as Fabien, since my data is always quite recent, but I see that this is far from ideal for a tool for climate scientists.

An intermediate fix (@shoyer, do you actually want one?) that I could think of for the performance issue right now would be to do the conversion to `datetime64[ns]` depending on the time unit, e.g.

- multiply the raw values (most likely floats) by the number of nanoseconds in the time `unit` for units smaller than days (or hours?) and use these values as integers in `pd.to_datetime()`
- else, fall back to using netCDF4/netcdftime for months and years (as suggested by @shoyer), casting the raw values to floats

The only thing that bothers me is that I am not sure whether the ""number of nanoseconds"" is always the same in every day or hour in the view of `datetime64`, due to leap seconds or other particularities.

@shoyer: Does this sound reasonable, or did I forget to take into account any side effects?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,226549366