
issues


2 rows where state = "open", type = "issue" and user = 5572303 sorted by updated_at descending

id: 182667672 · node_id: MDU6SXNzdWUxODI2Njc2NzI= · number: 1046 · title: center=True for xarray.DataArray.rolling() · user: chunweiyuan (5572303) · state: open · locked: 0 · comments: 8 · created_at: 2016-10-13T00:37:25Z · updated_at: 2024-04-04T21:06:57Z · author_association: CONTRIBUTOR

The logic behind setting center=True confuses me. Say window size = 3. The default behavior (center=False) sets the window to go from i-2 to i, so I would've expected center=True to set the window from i-1 to i+1. But that's not what I see.

For example, this is what `data` looks like:

```
>>> import numpy as np
>>> import xarray as xr
>>> data = xr.DataArray(np.arange(27).reshape(3, 3, 3),
...                     coords=[('x', ['a', 'b', 'c']), ('y', [-2, 0, 2]), ('z', [0, 1, 2])])
>>> data
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])
Coordinates:
  * x        (x) |S1 'a' 'b' 'c'
  * y        (y) int64 -2 0 2
  * z        (z) int64 0 1 2
```

Now, if I set the y-window size to 3, with center=False and a minimum number of entries of 1 (min_periods=1), I get

```
>>> r = data.rolling(y=3, center=False, min_periods=1)
>>> r.mean()
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[  0. ,   1. ,   2. ],
        [  1.5,   2.5,   3.5],
        [  3. ,   4. ,   5. ]],

       [[  9. ,  10. ,  11. ],
        [ 10.5,  11.5,  12.5],
        [ 12. ,  13. ,  14. ]],

       [[ 18. ,  19. ,  20. ],
        [ 19.5,  20.5,  21.5],
        [ 21. ,  22. ,  23. ]]])
Coordinates:
  * x        (x) |S1 'a' 'b' 'c'
  * y        (y) int64 -2 0 2
  * z        (z) int64 0 1 2
```

This essentially gives me a "trailing window" of size 3, meaning the window goes from i-2 to i. This isn't explained in the docs, but it can be inferred empirically.
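The trailing window can be checked by hand on a single 1-D slice. Taking `data.sel(x='a', z=0)`, whose values along `y` are `[0, 3, 6]`, the plain-Python sketch below (the `trailing_mean` helper is hypothetical, not xarray's implementation) averages `values[i-2]` through `values[i]` at each position, clipped at the start, and only emits a number when the window holds at least `min_periods` values:

```python
import math


def trailing_mean(values, window, min_periods=1):
    """Hypothetical helper: mean of values[i - window + 1 .. i] at each i."""
    out = []
    for i in range(len(values)):
        # The window is clipped at the start of the array, so it always
        # contains at least values[i] itself.
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) >= min_periods:
            out.append(sum(chunk) / len(chunk))
        else:
            out.append(math.nan)  # too few entries in this window
    return out


print(trailing_mean([0, 3, 6], window=3, min_periods=1))  # [0.0, 1.5, 3.0]
```

The result matches the first column of the `x='a'` block in the `center=False` output above.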

On the other hand, setting center = True gives

```
>>> r = data.rolling(y=3, center=True, min_periods=1)
>>> r.mean()
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[  1.5,   2.5,   3.5],
        [  3. ,   4. ,   5. ],
        [  nan,   nan,   nan]],

       [[ 10.5,  11.5,  12.5],
        [ 12. ,  13. ,  14. ],
        [  nan,   nan,   nan]],

       [[ 19.5,  20.5,  21.5],
        [ 21. ,  22. ,  23. ],
        [  nan,   nan,   nan]]])
Coordinates:
  * x        (x) |S1 'a' 'b' 'c'
  * y        (y) int64 -2 0 2
  * z        (z) int64 0 1 2
```

In other words, it just shifts every cell up the y dimension by 1, using NaN for values coming off the edge of the universe. If you look at `_center_result()` in `xarray/core/rolling.py`, that's exactly what it does, using `.shift()`.
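A pure shift reproduces the `center=True` output exactly. The sketch below (plain Python with a hypothetical `shift_up` helper; it mimics the observed behaviour rather than copying `_center_result()`) takes the trailing result for the `x='a', z=0` slice, `[0.0, 1.5, 3.0]`, and moves it up by `window // 2 = 1` position, padding with NaN:

```python
import math


def shift_up(values, k):
    """Hypothetical helper: move each value k positions toward index 0.

    The last k slots have nothing to take from, so they are filled with NaN.
    """
    return list(values[k:]) + [math.nan] * k


window = 3
trailing = [0.0, 1.5, 3.0]  # center=False result for x='a', z=0
centered_by_shift = shift_up(trailing, window // 2)
print(centered_by_shift)  # [1.5, 3.0, nan]
```

Shifting only relabels windows that were already computed, so the last position never gets the window covering i-1 to i+1 (here the mean of 3 and 6) and ends up NaN even with `min_periods=1`.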

I would've expected center=True to change the window to go from i-1 to i+1, in which case, with min_periods=1, r.mean() would not contain any NaN values.
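For comparison, here is a sketch of the window the post expects, where position `i` covers `i - window // 2` through `i + window // 2`, clipped at both edges. `centered_mean` is a hypothetical plain-Python helper limited to odd window sizes; on the same `[0, 3, 6]` slice with `min_periods=1` it produces no NaN:

```python
import math


def centered_mean(values, window, min_periods=1):
    """Hypothetical helper: centered rolling mean for an odd window size.

    Position i averages values[i - half .. i + half], clipped at both ends.
    """
    if window % 2 == 0:
        raise ValueError("this sketch only handles odd window sizes")
    half = window // 2
    n = len(values)
    out = []
    for i in range(n):
        chunk = values[max(0, i - half): min(n, i + half + 1)]
        if len(chunk) >= min_periods:
            out.append(sum(chunk) / len(chunk))
        else:
            out.append(math.nan)  # too few entries in this window
    return out


print(centered_mean([0, 3, 6], window=3, min_periods=1))  # [1.5, 3.0, 4.5]
```

Only the last value differs from the `center=True` output above: 4.5 (the mean of 3 and 6) instead of NaN.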

Could someone explain the logical flow to me?

Much obliged,

Chun

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1046/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
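id: 221366244 · node_id: MDU6SXNzdWUyMjEzNjYyNDQ= · number: 1371 · title: Weighted quantile · user: chunweiyuan (5572303) · state: open · locked: 0 · comments: 8 · created_at: 2017-04-12T19:29:04Z · updated_at: 2019-03-20T22:34:22Z · author_association: CONTRIBUTOR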

For our work we frequently need to compute weighted quantiles. This is especially important when we need to weigh data from recent years more heavily in making predictions.

I've put together a function (called weighted_quantile) largely based on the source code of np.percentile. It allows one to input weights along a single dimension, as a dict w_dict. Below are some manual tests:

When all weights = 1, it's identical to using np.nanpercentile:

```
>>> ar0
<xarray.DataArray (x: 3, y: 4)>
array([[3, 4, 8, 1],
       [5, 3, 7, 9],
       [4, 9, 6, 2]])
Coordinates:
  * x        (x) |S1 'a' 'b' 'c'
  * y        (y) int64 0 1 2 3
>>> ar0.quantile(q=[0.25, 0.5, 0.75], dim='y')
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 2.5 ,  4.5 ,  3.5 ],
       [ 3.5 ,  6.  ,  5.  ],
       [ 5.  ,  7.5 ,  6.75]])
Coordinates:
  * x         (x) |S1 'a' 'b' 'c'
  * quantile  (quantile) float64 0.25 0.5 0.75
>>> weighted_quantile(da=ar0, q=[0.25, 0.5, 0.75], dim='y', w_dict={'y': [1, 1, 1, 1]})
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 2.5 ,  4.5 ,  3.5 ],
       [ 3.5 ,  6.  ,  5.  ],
       [ 5.  ,  7.5 ,  6.75]])
Coordinates:
  * x         (x) |S1 'a' 'b' 'c'
  * quantile  (quantile) float64 0.25 0.5 0.75
```

Now different weights:

```
>>> weighted_quantile(da=ar0, q=[0.25, 0.5, 0.75], dim='y', w_dict={'y': [1, 2, 3, 4.0]})
<xarray.DataArray (quantile: 3, x: 3)>
array([[ 3.25    ,  5.666667,  4.333333],
       [ 4.      ,  7.      ,  5.333333],
       [ 6.      ,  8.      ,  6.75    ]])
Coordinates:
  * x         (x) |S1 'a' 'b' 'c'
  * quantile  (quantile) float64 0.25 0.5 0.75
```

Also handles nan values like np.nanpercentile:

```
>>> ar
<xarray.DataArray (x: 2, y: 2, z: 2)>
array([[[ nan,   3.],
        [ nan,   5.]],

       [[  8.,   1.],
        [ nan,   0.]]])
Coordinates:
  * x        (x) |S1 'a' 'b'
  * y        (y) int64 0 1
  * z        (z) int64 8 9
>>> da_stacked = ar.stack(mi=['x', 'y'])
>>> out = weighted_quantile(da=ar, q=[0.25, 0.5, 0.75], dim=['x', 'y'], w_dict={'x': [1, 1]})
>>> out
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.  ,  0.75],
       [ 8.  ,  2.  ],
       [ 8.  ,  3.5 ]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75
>>> da_stacked.quantile(q=[0.25, 0.5, 0.75], dim='mi')
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.  ,  0.75],
       [ 8.  ,  2.  ],
       [ 8.  ,  3.5 ]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75
```

Lastly, different interpolation schemes are consistent:

```
>>> out = weighted_quantile(da=ar, q=[0.25, 0.5, 0.75], dim=['x', 'y'], w_dict={'x': [1, 1]}, interpolation='nearest')
>>> out
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.,  1.],
       [ 8.,  3.],
       [ 8.,  3.]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75
>>> da_stacked.quantile(q=[0.25, 0.5, 0.75], dim='mi', interpolation='nearest')
<xarray.DataArray (quantile: 3, z: 2)>
array([[ 8.,  1.],
       [ 8.,  3.],
       [ 8.,  3.]])
Coordinates:
  * z         (z) int64 8 9
  * quantile  (quantile) float64 0.25 0.5 0.75
```
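For comparison, one possible definition of a weighted quantile for a single 1-D slice is sketched below in plain Python. It is an assumption for illustration, not the author's `weighted_quantile` (whose source isn't included here), so it need not reproduce the unequal-weight numbers above. Each sorted value gets the plotting position `(S_k - w_k) / (S_n - w_n)`, where `w_k` is its weight and `S_k` the cumulative weight up to and including it, and quantiles are linearly interpolated between those positions. With all weights equal to 1 the positions reduce to `(k - 1) / (n - 1)`, which is NumPy's default linear interpolation, and NaNs are dropped as `np.nanpercentile` does:

```python
import bisect
import math


def weighted_quantile_1d(values, weights, q):
    """Hypothetical weighted quantile of a 1-D sequence (illustrative only).

    Sorted value k sits at position (S_k - w_k) / (S_n - w_n); the result is
    linearly interpolated between neighbouring positions. NaN values are
    dropped together with their weights.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    pairs = sorted((v, w) for v, w in zip(values, weights) if not math.isnan(v))
    if not pairs:
        return math.nan
    if any(w <= 0 for _, w in pairs):
        raise ValueError("weights must be positive")
    if len(pairs) == 1:
        return float(pairs[0][0])
    vals = [v for v, _ in pairs]
    cum = 0.0
    raw_positions = []
    for _, w in pairs:
        cum += w
        raw_positions.append(cum - w)  # S_k - w_k
    denom = cum - pairs[-1][1]  # S_n - w_n, > 0 for n >= 2 positive weights
    positions = [p / denom for p in raw_positions]  # runs from 0.0 to 1.0
    k = bisect.bisect_right(positions, q) - 1
    if k >= len(vals) - 1:
        return float(vals[-1])
    frac = (q - positions[k]) / (positions[k + 1] - positions[k])
    return vals[k] + frac * (vals[k + 1] - vals[k])


for q in (0.25, 0.5, 0.75):
    print(q, weighted_quantile_1d([3, 4, 8, 1], [1, 1, 1, 1], q))
```

With unit weights it gives, up to floating-point rounding, 2.5, 3.5 and 5.0 for `ar0`'s `x='a'` row (values `[3, 4, 8, 1]`), matching the `ar0.quantile` output, and 0.75, 2.0 and 3.5 for the non-NaN values `[3, 5, 1, 0]` of the `z=9` slice of `ar`.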

We wonder if it's ok to make this part of xarray. If so, the most logical place to implement it would seem to be in Variable.quantile(). Another option is to make it a utility function, to be called as xr.weighted_quantile().

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1371/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
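As a rough illustration of how the schema above backs the filter in the page header (state = "open", type = "issue", user = 5572303, sorted by updated_at descending), the sketch below loads a trimmed copy of the `issues` table into an in-memory SQLite database with Python's standard `sqlite3` module. The column subset and the query text are assumptions for illustration, not the SQL that Datasette actually generates:

```python
import sqlite3

# In-memory database with a trimmed-down version of the issues table above
# (only the columns the filter, sort and output need).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE [issues] ([id] INTEGER PRIMARY KEY, [number] INTEGER, "
    "[title] TEXT, [user] INTEGER, [state] TEXT, [updated_at] TEXT, [type] TEXT)"
)
conn.executemany(
    "INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (182667672, 1046, "center=True for xarray.DataArray.rolling()",
         5572303, "open", "2024-04-04T21:06:57Z", "issue"),
        (221366244, 1371, "Weighted quantile",
         5572303, "open", "2019-03-20T22:34:22Z", "issue"),
    ],
)

# Same filter and ordering as described in the page header.
rows = conn.execute(
    "SELECT [number], [title] FROM [issues] "
    "WHERE [state] = ? AND [type] = ? AND [user] = ? "
    "ORDER BY [updated_at] DESC",
    ("open", "issue", 5572303),
).fetchall()
print(rows)
```

Because `updated_at` is stored as ISO 8601 `TEXT` in a single fixed format, plain lexicographic `ORDER BY` matches chronological order, so #1046 (updated 2024) sorts before #1371 (updated 2019).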
Powered by Datasette · Queries took 3676.118ms · About: xarray-datasette