html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/1844#issuecomment-1163265245,https://api.github.com/repos/pydata/xarray/issues/1844,1163265245,IC_kwDOAMm_X85FVgTd,2448579,2022-06-22T15:30:44Z,2022-06-22T15:30:44Z,MEMBER,"You can now do `month_day_str = da.time.dt.strftime(""%m-%d"")` See https://strftime.org/ for more options","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,290023410 https://github.com/pydata/xarray/issues/1844#issuecomment-1161471072,https://api.github.com/repos/pydata/xarray/issues/1844,1161471072,IC_kwDOAMm_X85FOqRg,43267076,2022-06-21T09:05:35Z,2022-06-21T09:05:53Z,NONE,"I also suggest that, for some applications, it can be useful to simply drop all February 29ths. This is accomplished by means of: dataset = dataset.convert_calendar('365_day')","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,290023410 https://github.com/pydata/xarray/issues/1844#issuecomment-441034802,https://api.github.com/repos/pydata/xarray/issues/1844,441034802,MDEyOklzc3VlQ29tbWVudDQ0MTAzNDgwMg==,33062222,2018-11-22T13:43:23Z,2018-11-22T13:44:48Z,NONE,"For anyone stumbling upon this thread in the future, I would like to mention that I used the above grouping approach suggested by @spencerkclark for my dataset to calculate a climatology by calendar day, and it works smoothly. The only thing one should be careful about is that you can't directly plot the data using `In[1]: da.groupby(month_day_str).mean('time').plot()` `Out[1]: TypeError: Plotting requires coordinates to be numeric or dates of type np.datetime64 or datetime.datetime.` To get around it, either group by the modified_ordinal_day instead, or convert the grouped coordinate month_day_str back to numeric. 
However, after doing all this I found out that the CDO function also calculates climatology by the ordinal day of the year. So, to be consistent, I would stick to that method, but it is good to know anyway that there is a workaround in xarray to group by day and month if required.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,290023410 https://github.com/pydata/xarray/issues/1844#issuecomment-418420696,https://api.github.com/repos/pydata/xarray/issues/1844,418420696,MDEyOklzc3VlQ29tbWVudDQxODQyMDY5Ng==,8453445,2018-09-04T15:53:10Z,2018-09-04T15:53:10Z,CONTRIBUTOR,"Thanks - I will give this a try! And thanks for the clarifications.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,290023410 https://github.com/pydata/xarray/issues/1844#issuecomment-418191318,https://api.github.com/repos/pydata/xarray/issues/1844,418191318,MDEyOklzc3VlQ29tbWVudDQxODE5MTMxOA==,6628425,2018-09-03T20:51:37Z,2018-09-03T20:55:08Z,MEMBER,"Building on the above example, if you're OK with using a coordinate of strings, the following might be a little simpler way of defining the labels to use for grouping (this is perhaps closer to a single-attribute solution): ``` In [14]: month_day_str = xr.DataArray(da.indexes['time'].strftime('%m-%d'), coords=da.coords, ...: name='month_day_str') ...: In [15]: da.groupby(month_day_str).mean('time') Out[15]: array([2., 3.]) Coordinates: * month_day_str (month_day_str) object '01-01' '03-01' ``` Note #2090 / #2144 would make this more straightforward.","{""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,290023410 
https://github.com/pydata/xarray/issues/1844#issuecomment-418188977,https://api.github.com/repos/pydata/xarray/issues/1844,418188977,MDEyOklzc3VlQ29tbWVudDQxODE4ODk3Nw==,6628425,2018-09-03T20:30:45Z,2018-09-03T20:30:45Z,MEMBER,"No worries @chiaral; I agree on the xarray side this isn't so well documented (you have to follow the link to the pandas description of the [datetime components](http://pandas.pydata.org/pandas-docs/stable/api.html#time-date-components)). Unfortunately there is not a simple attribute for grouping by matching month and day. It is possible to define your own vector of integers for this purpose, however. Perhaps you've already found a workaround, but just in case, here is one way to define a ""modified ordinal day"" that you can use in a `groupby` call: ``` In [1]: import xarray as xr In [2]: from datetime import datetime In [3]: dates = [datetime(1999, 1, 1), datetime(1999, 3, 1), ...: datetime(2000, 1, 1), datetime(2000, 3, 1)] ...: In [4]: da = xr.DataArray([1, 2, 3, 4], coords=[dates], dims=['time']) In [5]: not_leap_year = xr.DataArray(~da.indexes['time'].is_leap_year, coords=da.coords) In [6]: march_or_later = da.time.dt.month >= 3 In [7]: ordinal_day = da.time.dt.dayofyear In [8]: modified_ordinal_day = ordinal_day + (not_leap_year & march_or_later) In [9]: modified_ordinal_day = modified_ordinal_day.rename('modified_ordinal_day') In [10]: modified_ordinal_day Out[10]: array([ 1, 61, 1, 61]) Coordinates: * time (time) datetime64[ns] 1999-01-01 1999-03-01 2000-01-01 2000-03-01 In [11]: da.groupby(modified_ordinal_day).mean('time') Out[11]: array([2., 3.]) Coordinates: * modified_ordinal_day (modified_ordinal_day) int64 1 61 ``` Note if we use the standard ordinal day we get three groups, because of the difference between non-leap and leap years: ``` In [12]: ordinal_day Out[12]: array([ 1, 60, 1, 61]) Coordinates: * time (time) datetime64[ns] 1999-01-01 1999-03-01 2000-01-01 2000-03-01 In [13]: da.groupby(ordinal_day).mean('time') 
Out[13]: array([2., 2., 4.]) Coordinates: * dayofyear (dayofyear) int64 1 60 61 ```","{""total_count"": 5, ""+1"": 5, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,290023410 https://github.com/pydata/xarray/issues/1844#issuecomment-418175182,https://api.github.com/repos/pydata/xarray/issues/1844,418175182,MDEyOklzc3VlQ29tbWVudDQxODE3NTE4Mg==,8453445,2018-09-03T18:38:47Z,2018-09-03T18:38:47Z,CONTRIBUTOR,"Yes, @spencerkclark, that was my initial intent. For some reason - and I understand now that I was wrong about it - I thought that dayofyear would always align the days on the same grid. To be honest, I had never used it until now, so I wasn't sure how it worked. I was just surprised by that behavior, which I understand is intended; it is just not explained well, IMHO. If we calculate the daily climatology, the 366th day is the 31st of December of every fourth year, right? It just wasn't exactly what I expected, so I thought I would put a note in this issue, which popped up when I was looking for some more details about this attribute. That said - is there a more suitable attribute for what I want to do? This is maybe not the best place to discuss that; I can send an email to the mailing list.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,290023410 https://github.com/pydata/xarray/issues/1844#issuecomment-417855365,https://api.github.com/repos/pydata/xarray/issues/1844,417855365,MDEyOklzc3VlQ29tbWVudDQxNzg1NTM2NQ==,6628425,2018-09-01T12:09:25Z,2018-09-01T12:09:25Z,MEMBER,"@chiaral if I understand correctly, your data does use a standard calendar, but the issue is that you would like to group values based on matching month and day numbers (e.g. all January 1st's, all January 6th's, ..., all March 2nd's etc.) rather than matching ""days since December 31st of the preceding year,"" which is what the `dayofyear` attribute corresponds to. 
Is that right?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,290023410 https://github.com/pydata/xarray/issues/1844#issuecomment-417694660,https://api.github.com/repos/pydata/xarray/issues/1844,417694660,MDEyOklzc3VlQ29tbWVudDQxNzY5NDY2MA==,1217238,2018-08-31T15:09:56Z,2018-08-31T15:09:56Z,MEMBER,@chiaral You should take a look at CFTimeIndex which specifically was designed to solve this problem: http://xarray.pydata.org/en/stable/time-series.html#non-standard-calendars-and-dates-outside-the-timestamp-valid-range,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,290023410 https://github.com/pydata/xarray/issues/1844#issuecomment-417437968,https://api.github.com/repos/pydata/xarray/issues/1844,417437968,MDEyOklzc3VlQ29tbWVudDQxNzQzNzk2OA==,8453445,2018-08-30T19:24:46Z,2018-08-30T19:24:46Z,CONTRIBUTOR,"I am commenting on this issue, because my findings seem relevant to this example. I have just encountered an unexpected (to me) behavior of dayofyear. I have a dataset, ds: ``` Dimensions: (L: 45, S: 1168) Coordinates: * S (S) datetime64[ns] 1999-01-01T12:00:00 1999-01-06T12:00:00 ... * L (L) float64 0.0 24.0 48.0 72.0 96.0 120.0 144.0 168.0 192.0 ... Data variables: pr (S, L) float32 2.0625568e-05 3.5336856e-05 5.2443047e-05 ... truth (S, L) float32 2.0625568e-05 3.5336856e-05 5.2443047e-05 ... ``` S is my time coordinate. It is daily, but not continuous ``` array(['1999-01-01T12:00:00.000000000', '1999-01-06T12:00:00.000000000', '1999-01-11T12:00:00.000000000', ..., '2014-12-17T12:00:00.000000000', '2014-12-22T12:00:00.000000000', '2014-12-27T12:00:00.000000000'], dtype='datetime64[ns]') Coordinates: * S (S) datetime64[ns] 1999-01-01T12:00:00 1999-01-06T12:00:00 ... 
``` For example, for the first three months of 1999: ``` ds.S.sel(S=slice('1999-01-01','1999-03-05')) array(['1999-01-01T12:00:00.000000000', '1999-01-06T12:00:00.000000000', '1999-01-11T12:00:00.000000000', '1999-01-16T12:00:00.000000000', '1999-01-21T12:00:00.000000000', '1999-01-26T12:00:00.000000000', '1999-01-31T12:00:00.000000000', '1999-02-05T12:00:00.000000000', '1999-02-10T12:00:00.000000000', '1999-02-15T12:00:00.000000000', '1999-02-20T12:00:00.000000000', '1999-02-25T12:00:00.000000000', '1999-03-02T12:00:00.000000000'], dtype='datetime64[ns]') Coordinates: * S (S) datetime64[ns] 1999-01-01T12:00:00 1999-01-06T12:00:00 ... ``` and for 2008: ``` ds.S.sel(S=slice('2008-01-01','2008-03-05')) array(['2008-01-01T12:00:00.000000000', '2008-01-06T12:00:00.000000000', '2008-01-11T12:00:00.000000000', '2008-01-16T12:00:00.000000000', '2008-01-21T12:00:00.000000000', '2008-01-26T12:00:00.000000000', '2008-01-31T12:00:00.000000000', '2008-02-05T12:00:00.000000000', '2008-02-10T12:00:00.000000000', '2008-02-15T12:00:00.000000000', '2008-02-20T12:00:00.000000000', '2008-02-25T12:00:00.000000000', '2008-03-02T12:00:00.000000000'], dtype='datetime64[ns]') Coordinates: * S (S) datetime64[ns] 2008-01-01T12:00:00 2008-01-06T12:00:00 ... ``` Please note that within both the non-leap (1999) and leap (2008) years, the days are the same. There are 73 S values per year. However, when I groupby('S.dayofyear'), things are no longer aligned starting from March. For example, if I groupby() and print the value of dayofyear and the grouped values: ``` for k, gg in ds.groupby('S.dayofyear'): print(k) print(gg) ..... 51 ## 51st day of the year Dimensions: (L: 45, S: 16) Coordinates: * S (S) datetime64[ns] 1999-02-20T12:00:00 2000-02-20T12:00:00 ... * L (L) float64 0.0 24.0 48.0 72.0 96.0 120.0 144.0 168.0 192.0 ... Data variables: pr (S, L) float32 2.8822698e-05 3.1478736e-05 3.707411e-05 ... truth (S, L) float32 2.8387214e-05 2.8993465e-05 2.8109233e-05 ... 
56 ## 56th day of the year Dimensions: (L: 45, S: 16) Coordinates: * S (S) datetime64[ns] 1999-02-25T12:00:00 2000-02-25T12:00:00 ... * L (L) float64 0.0 24.0 48.0 72.0 96.0 120.0 144.0 168.0 192.0 ... Data variables: pr (S, L) float32 3.5827405e-05 2.27847e-05 2.8826753e-05 ... truth (S, L) float32 2.9589286e-05 2.6589936e-05 2.7626802e-05 ... ``` Up to here everything looks good: I have 16 values (one for each year of data) for each day of the year. But starting with March 2nd, they start getting split into two groups: ``` 61 ## 61st day of the year Dimensions: (L: 45, S: 12) Coordinates: * S (S) datetime64[ns] 1999-03-02T12:00:00 2001-03-02T12:00:00 ... * L (L) float64 0.0 24.0 48.0 72.0 96.0 120.0 144.0 168.0 192.0 ... Data variables: pr (S, L) float32 2.2245076e-05 2.9928206e-05 3.2708682e-05 ... truth (S, L) float32 2.5899697e-05 2.5815236e-05 2.6628013e-05 ... 62 ## 62nd day of the year Dimensions: (L: 45, S: 4) Coordinates: * S (S) datetime64[ns] 2000-03-02T12:00:00 2004-03-02T12:00:00 ... * L (L) float64 0.0 24.0 48.0 72.0 96.0 120.0 144.0 168.0 192.0 ... Data variables: pr (S, L) float32 2.3905726e-05 2.1646814e-05 1.5209519e-05 ... truth (S, L) float32 2.4452387e-05 2.5048954e-05 2.5876538e-05 ... 66 ## 66th day of the year Dimensions: (L: 45, S: 12) Coordinates: * S (S) datetime64[ns] 1999-03-07T12:00:00 2001-03-07T12:00:00 ... * L (L) float64 0.0 24.0 48.0 72.0 96.0 120.0 144.0 168.0 192.0 ... Data variables: pr (S, L) float32 2.60827e-05 4.9364742e-05 3.838778e-05 ... truth (S, L) float32 2.6537613e-05 2.7840171e-05 2.7700215e-05 ... 67 ## 67th day of the year Dimensions: (L: 45, S: 4) Coordinates: * S (S) datetime64[ns] 2000-03-07T12:00:00 2004-03-07T12:00:00 ... * L (L) float64 0.0 24.0 48.0 72.0 96.0 120.0 144.0 168.0 192.0 ... Data variables: pr (S, L) float32 1.59269e-05 2.7056101e-05 1.8332774e-05 ... truth (S, L) float32 2.1952277e-05 2.7667278e-05 2.5342364e-05 ... ``` and so on. This was unexpected to me, and not well documented. 
It means that, especially when we calculate anomalies, we might not be aligning things correctly - or am I wrong? Is there a way to group the data by the day of the year so that everything is grouped on 366 days?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,290023410 https://github.com/pydata/xarray/issues/1844#issuecomment-359406359,https://api.github.com/repos/pydata/xarray/issues/1844,359406359,MDEyOklzc3VlQ29tbWVudDM1OTQwNjM1OQ==,1889878,2018-01-22T12:12:57Z,2018-01-22T12:12:57Z,NONE,Thanks a lot for the help!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,290023410 https://github.com/pydata/xarray/issues/1844#issuecomment-359366336,https://api.github.com/repos/pydata/xarray/issues/1844,359366336,MDEyOklzc3VlQ29tbWVudDM1OTM2NjMzNg==,10512793,2018-01-22T09:21:56Z,2018-01-22T09:21:56Z,CONTRIBUTOR,"Example for the docs proposed here: https://github.com/pydata/xarray/pull/1848","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,290023410 https://github.com/pydata/xarray/issues/1844#issuecomment-359129344,https://api.github.com/repos/pydata/xarray/issues/1844,359129344,MDEyOklzc3VlQ29tbWVudDM1OTEyOTM0NA==,1217238,2018-01-20T00:49:33Z,2018-01-20T00:49:56Z,MEMBER,"You can do this in a single step with `xarray.apply_ufunc()`, which is a sort of more flexible/powerful interface to xarray's broadcasting arithmetic. 
Extending the [toy weather example](http://xarray.pydata.org/en/stable/examples/weather-data.html) from the docs: ```python import xarray as xr import numpy as np import pandas as pd import seaborn as sns # pandas aware plotting library np.random.seed(123) times = pd.date_range('2000-01-01', '2001-12-31', name='time') annual_cycle = np.sin(2 * np.pi * (np.array(times.dayofyear) / 365.25 - 0.28)) base = 10 + 15 * annual_cycle.reshape(-1, 1) tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3) tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3) ds = xr.Dataset({'tmin': (('time', 'location'), tmin_values), 'tmax': (('time', 'location'), tmax_values)}, {'time': times, 'location': ['IA', 'IN', 'IL']}) # new code ds_mean = ds.groupby('time.month').mean('time') ds_std = ds.groupby('time.month').std('time') xr.apply_ufunc(lambda x, m, s: (x - m) / s, ds.groupby('time.month'), ds_mean, ds_std) ``` The other way (about twice as slow) is to chain two calls to `groupby()`: ```python (ds.groupby('time.month') - ds_mean).groupby('time.month') / ds_std ``` I'll mark this as a documentation issue in case anyone wants to add an example to the docs.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,290023410 https://github.com/pydata/xarray/issues/1844#issuecomment-359066317,https://api.github.com/repos/pydata/xarray/issues/1844,359066317,MDEyOklzc3VlQ29tbWVudDM1OTA2NjMxNw==,1889878,2018-01-19T19:31:43Z,2018-01-19T19:43:38Z,NONE,"I ended up doing the following: ``` # dset, mean, std - all xarray objects as explained above time_index = dset.time.dt.dayofyear dset_mean = mean.sel(dayofyear=time_index) dset_std = std.sel(dayofyear=time_index) new_dset = ((dset - dset_mean) / dset_std).drop(""dayofyear"") ``` One issue though is that this is quite bad on memory, as it constructs 3 arrays in memory as large as the original one. 
If anyone has any suggestion on how to improve this I would be very grateful. Also, is it possible to compute and store `new_dset` simultaneously so I don't create it in memory?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,290023410 https://github.com/pydata/xarray/issues/1844#issuecomment-359061384,https://api.github.com/repos/pydata/xarray/issues/1844,359061384,MDEyOklzc3VlQ29tbWVudDM1OTA2MTM4NA==,1889878,2018-01-19T19:12:23Z,2018-01-19T19:12:23Z,NONE,"Thanks for the suggestion. However, options 2 and 3 are not really options, as after this I need to provide the standardized field with the original time index. I'm using xarray for the first time but will try to do the reindexing. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,290023410 https://github.com/pydata/xarray/issues/1844#issuecomment-359025678,https://api.github.com/repos/pydata/xarray/issues/1844,359025678,MDEyOklzc3VlQ29tbWVudDM1OTAyNTY3OA==,7747527,2018-01-19T16:55:25Z,2018-01-19T16:55:25Z,NONE,"So you have a two-year temperature field with dimensions [730, 1, 481, 781], and mean and std data arrays of [366, 1, 481, 781], and you want to normalize the temperature field. Sorry, I'm not familiar with xarray's groupby functions; I'll try several things before some experts jump in. - Concat two copies of the std/mean fields along dayofyear, and reindex to the time index from the temperature data. Then you can do the (dset-mean)/std - Separate the temperature fields into two one-year chunks, reindex time to dayofyear, then do the calculation. - Flatten the spatial grid, then use numpy to do the trick. I'm also interested in the right way to do it using built-in xarray functions. I'm pretty sure there are some more clever ways to do this. 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,290023410