issue_comments: 417437968

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1844#issuecomment-417437968 https://api.github.com/repos/pydata/xarray/issues/1844 417437968 MDEyOklzc3VlQ29tbWVudDQxNzQzNzk2OA== 8453445 2018-08-30T19:24:46Z 2018-08-30T19:24:46Z CONTRIBUTOR

I am commenting on this issue because my findings seem relevant to this example.

I have just encountered an unexpected (to me) behavior of dayofyear.

I have a dataset, ds:

<xarray.Dataset> Dimensions: (L: 45, S: 1168) Coordinates: * S (S) datetime64[ns] 1999-01-01T12:00:00 1999-01-06T12:00:00 ... * L (L) float64 0.0 24.0 48.0 72.0 96.0 120.0 144.0 168.0 192.0 ... Data variables: pr (S, L) float32 2.0625568e-05 3.5336856e-05 5.2443047e-05 ... truth (S, L) float32 2.0625568e-05 3.5336856e-05 5.2443047e-05 ...

S is my time coordinate. Its values are daily timestamps, but they are not continuous:

<xarray.DataArray 'S' (S: 1168)> array(['1999-01-01T12:00:00.000000000', '1999-01-06T12:00:00.000000000', '1999-01-11T12:00:00.000000000', ..., '2014-12-17T12:00:00.000000000', '2014-12-22T12:00:00.000000000', '2014-12-27T12:00:00.000000000'], dtype='datetime64[ns]') Coordinates: * S (S) datetime64[ns] 1999-01-01T12:00:00 1999-01-06T12:00:00 ...

For example, from January to early March 1999:

```
ds.S.sel(S=slice('1999-01-01','1999-03-05'))

<xarray.DataArray 'S' (S: 13)>
array(['1999-01-01T12:00:00.000000000', '1999-01-06T12:00:00.000000000',
       '1999-01-11T12:00:00.000000000', '1999-01-16T12:00:00.000000000',
       '1999-01-21T12:00:00.000000000', '1999-01-26T12:00:00.000000000',
       '1999-01-31T12:00:00.000000000', '1999-02-05T12:00:00.000000000',
       '1999-02-10T12:00:00.000000000', '1999-02-15T12:00:00.000000000',
       '1999-02-20T12:00:00.000000000', '1999-02-25T12:00:00.000000000',
       '1999-03-02T12:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * S        (S) datetime64[ns] 1999-01-01T12:00:00 1999-01-06T12:00:00 ...
```

and for 2008:

```
broadcasted_data.S.sel(S=slice('2008-01-01','2008-03-05'))

<xarray.DataArray 'S' (S: 13)>
array(['2008-01-01T12:00:00.000000000', '2008-01-06T12:00:00.000000000',
       '2008-01-11T12:00:00.000000000', '2008-01-16T12:00:00.000000000',
       '2008-01-21T12:00:00.000000000', '2008-01-26T12:00:00.000000000',
       '2008-01-31T12:00:00.000000000', '2008-02-05T12:00:00.000000000',
       '2008-02-10T12:00:00.000000000', '2008-02-15T12:00:00.000000000',
       '2008-02-20T12:00:00.000000000', '2008-02-25T12:00:00.000000000',
       '2008-03-02T12:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * S        (S) datetime64[ns] 2008-01-01T12:00:00 2008-01-06T12:00:00 ...
```

Please note that the calendar dates are the same in non-leap years (e.g. 1999) and leap years (e.g. 2008). There are 73 S values per year.

However, when I use groupby('S.dayofyear'), the groups are no longer aligned starting from March.

For example, if I iterate over the groups and print each dayofyear value together with its group:

```
for k, gg in ds.groupby('S.dayofyear'):
    print(k)
    print(gg)

.....
51  ## 51st day of the year
<xarray.Dataset>
Dimensions:  (L: 45, S: 16)
Coordinates:
  * S        (S) datetime64[ns] 1999-02-20T12:00:00 2000-02-20T12:00:00 ...
  * L        (L) float64 0.0 24.0 48.0 72.0 96.0 120.0 144.0 168.0 192.0 ...
Data variables:
    pr       (S, L) float32 2.8822698e-05 3.1478736e-05 3.707411e-05 ...
    truth    (S, L) float32 2.8387214e-05 2.8993465e-05 2.8109233e-05 ...
56  ## 56th day of the year
<xarray.Dataset>
Dimensions:  (L: 45, S: 16)
Coordinates:
  * S        (S) datetime64[ns] 1999-02-25T12:00:00 2000-02-25T12:00:00 ...
  * L        (L) float64 0.0 24.0 48.0 72.0 96.0 120.0 144.0 168.0 192.0 ...
Data variables:
    pr       (S, L) float32 3.5827405e-05 2.27847e-05 2.8826753e-05 ...
    truth    (S, L) float32 2.9589286e-05 2.6589936e-05 2.7626802e-05 ...
```

Up to here everything looks good: I have 16 values (one for each year of data) for each day of the year. But starting with March 2nd, the values get split into two groups:

```
61  ## 61st day of the year
<xarray.Dataset>
Dimensions:  (L: 45, S: 12)
Coordinates:
  * S        (S) datetime64[ns] 1999-03-02T12:00:00 2001-03-02T12:00:00 ...
  * L        (L) float64 0.0 24.0 48.0 72.0 96.0 120.0 144.0 168.0 192.0 ...
Data variables:
    pr       (S, L) float32 2.2245076e-05 2.9928206e-05 3.2708682e-05 ...
    truth    (S, L) float32 2.5899697e-05 2.5815236e-05 2.6628013e-05 ...
62  ## 62nd day of the year
<xarray.Dataset>
Dimensions:  (L: 45, S: 4)
Coordinates:
  * S        (S) datetime64[ns] 2000-03-02T12:00:00 2004-03-02T12:00:00 ...
  * L        (L) float64 0.0 24.0 48.0 72.0 96.0 120.0 144.0 168.0 192.0 ...
Data variables:
    pr       (S, L) float32 2.3905726e-05 2.1646814e-05 1.5209519e-05 ...
    truth    (S, L) float32 2.4452387e-05 2.5048954e-05 2.5876538e-05 ...
66  ## 66th day of the year
<xarray.Dataset>
Dimensions:  (L: 45, S: 12)
Coordinates:
  * S        (S) datetime64[ns] 1999-03-07T12:00:00 2001-03-07T12:00:00 ...
  * L        (L) float64 0.0 24.0 48.0 72.0 96.0 120.0 144.0 168.0 192.0 ...
Data variables:
    pr       (S, L) float32 2.60827e-05 4.9364742e-05 3.838778e-05 ...
    truth    (S, L) float32 2.6537613e-05 2.7840171e-05 2.7700215e-05 ...
67  ## 67th day of the year
<xarray.Dataset>
Dimensions:  (L: 45, S: 4)
Coordinates:
  * S        (S) datetime64[ns] 2000-03-07T12:00:00 2004-03-07T12:00:00 ...
  * L        (L) float64 0.0 24.0 48.0 72.0 96.0 120.0 144.0 168.0 192.0 ...
Data variables:
    pr       (S, L) float32 1.59269e-05 2.7056101e-05 1.8332774e-05 ...
    truth    (S, L) float32 2.1952277e-05 2.7667278e-05 2.5342364e-05 ...
```

and so on.
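The 12/4 split is consistent with how day-of-year is counted: after February 29, every calendar date in a leap year has an ordinal one higher than in a non-leap year, and 4 of the 16 years here (2000, 2004, 2008, 2012) are leap years. A minimal check using only the Python standard library (the helper name `day_of_year` is mine, not part of xarray):

```python
from datetime import date


def day_of_year(year, month, day):
    """Return the 1-based day of the year for a calendar date."""
    return date(year, month, day).timetuple().tm_yday


# Compare the same calendar dates in a non-leap (1999) and a leap (2000) year.
for year in (1999, 2000):
    print(year, day_of_year(year, 2, 25), day_of_year(year, 3, 2))
```

February 25 gets the same ordinal in both years, while March 2 does not, which matches the groups printed above.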

This was unexpected to me, and it is not well documented. It means that, especially when we calculate anomalies, we might not be aligning things correctly. Or am I wrong? Is there a way to group the data by the day of the year so that everything is grouped on 366 days?
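One possible way to get calendar-aligned groups is to key on the (month, day) pair instead of the day-of-year ordinal, which gives at most 366 distinct keys. The sketch below is only an illustration with the standard library, not xarray code: it rebuilds a date list like the one described above (73 dates per year on the same calendar days, 1999 to 2014) and compares the two groupings. The names `template`, `by_dayofyear` and `by_month_day` are hypothetical. With xarray, I assume an equivalent key would have to be built from `S.dt.month` and `S.dt.day` and attached as a coordinate before grouping.

```python
from collections import defaultdict
from datetime import date, timedelta

# Assumed calendar: 73 dates, 5 days apart, taken from a non-leap year,
# then reused on the same month/day in every year from 1999 to 2014.
template = [date(1999, 1, 1) + timedelta(days=5 * i) for i in range(73)]
dates = [date(year, t.month, t.day) for year in range(1999, 2015) for t in template]

by_dayofyear = defaultdict(list)   # key: ordinal day of the year
by_month_day = defaultdict(list)   # key: (month, day) calendar pair

for d in dates:
    by_dayofyear[d.timetuple().tm_yday].append(d)
    by_month_day[(d.month, d.day)].append(d)

# Day-of-year splits March 2 across two keys (leap vs non-leap years).
print(len(by_dayofyear[61]), len(by_dayofyear[62]))

# The (month, day) key keeps all 16 years together for every date.
print(len(by_month_day), {len(v) for v in by_month_day.values()})
```

The trade-off of this key is that February 29, if present in the data, forms its own group containing only leap years.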

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  290023410