issue_comments

18 rows where user = 12862013 sorted by updated_at descending

issue 6

  • Add DataArray.pad, Dataset.pad, Variable.pad 8
  • Pad method 3
  • rolling.construct alignment 2
  • Skipna not working in DataArray.rolling.mean 2
  • doctest failure with numpy 1.20 2
  • fix da.pad example for numpy 1.20 1

user 1

  • mark-boer · 18

author_association 1

  • CONTRIBUTOR 18
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
774494996 https://github.com/pydata/xarray/pull/4865#issuecomment-774494996 https://api.github.com/repos/pydata/xarray/issues/4865 MDEyOklzc3VlQ29tbWVudDc3NDQ5NDk5Ng== mark-boer 12862013 2021-02-06T15:34:24Z 2021-02-06T15:34:24Z CONTRIBUTOR

@keewis, I agree a test would be nice, but what kind of test would you want to add? I mean, xarray just mimics the behaviour of np.pad and dask.array.pad. And the default behaviour of numpy was a bit funky, which is probably why they changed it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fix da.pad example for numpy 1.20 802400938
773647435 https://github.com/pydata/xarray/issues/4858#issuecomment-773647435 https://api.github.com/repos/pydata/xarray/issues/4858 MDEyOklzc3VlQ29tbWVudDc3MzY0NzQzNQ== mark-boer 12862013 2021-02-04T22:33:45Z 2021-02-04T22:34:30Z CONTRIBUTOR

My suggestion is:
  1. replace the example with arr.pad(x=1, constant_values=1.23456789) and mention that the float is cast to int (or would you leave the example away?)
  2. open a new issue to discuss the issue of assigning float to int

I agree, I think that would be a good solution for now; replacing the example is fine. Maybe we could even open a new issue to discuss how xarray functions handle np.nan. (A small sketch of the cast follows this row.)

Is dask.array.pad going to handle the casting the same way? It would be strange if whether the cast happens depended on the underlying array type. But that discussion should probably happen in the newly opened issue ;-)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  doctest failure with numpy 1.20 800118528
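To make the cast discussed in the comment above concrete, here is a minimal sketch of the proposed replacement example, assuming a small integer-typed DataArray; the exact output can vary with the installed numpy and xarray versions.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(3), dims="x")  # integer-typed data

# np.pad casts an explicit constant into the array's dtype, so the float
# 1.23456789 should end up truncated to 1 in the padded cells.
padded = arr.pad(x=1, constant_values=1.23456789)
print(padded.dtype)   # expected to stay an integer dtype
print(padded.values)  # expected along the lines of [1 0 1 2 1]
```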
773240852 https://github.com/pydata/xarray/issues/4858#issuecomment-773240852 https://api.github.com/repos/pydata/xarray/issues/4858 MDEyOklzc3VlQ29tbWVudDc3MzI0MDg1Mg== mark-boer 12862013 2021-02-04T11:33:32Z 2021-02-04T11:34:36Z CONTRIBUTOR

Hi, we had a similar discussion in #3596: xarray makes a distinction between np.nan and xarray.dtypes.NaN. The current behaviour is consistent with that of other xarray functions such as shift, though I am personally not a big fan of this distinction.

Check e.g. this comment: https://github.com/pydata/xarray/pull/3596#discussion_r388612638

The example I posted in this comment:

```
>>> da = xr.DataArray(np.arange(9).reshape(3, 3), dims=("x", "y"))
>>> da.shift(x=1, fill_value=np.nan)
<xarray.DataArray (x: 3, y: 3)>
array([[-9223372036854775808, -9223372036854775808, -9223372036854775808],
       [                   0,                    1,                    2],
       [                   3,                    4,                    5]])
Dimensions without coordinates: x, y

>>> da.rolling(x=3).construct("new_axis", stride=3, fill_value=np.nan)
<xarray.DataArray (x: 1, y: 3, new_axis: 3)>
array([[[-9223372036854775808, -9223372036854775808,                    0],
        [-9223372036854775808, -9223372036854775808,                    1],
        [-9223372036854775808, -9223372036854775808,                    2]]])
Dimensions without coordinates: x, y, new_axis
```

Hmm, so numpy changed its behaviour? Then this example should probably also fail in numpy 1.20.

On a side note: I am not a big fan of the example in the doctest, it displays an edge case, which is not unique to pad.

I think the nicest solution would be to make the usage of xarray.dtypes.NaN and np.nan equivalent, but this would require changes in all xarray functions that take some kind of fill_value. (A short sketch of the current difference follows this row.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  doctest failure with numpy 1.20 800118528
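As a hedged sketch of the np.nan vs. dtypes.NaN distinction described in the comment above: with the default fill value xarray promotes the integer dtype to float, while an explicit np.nan is forced into the existing integer dtype (which is where the large negative sentinel comes from). The explicit-NaN case in particular may behave differently across numpy versions.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(9).reshape(3, 3), dims=("x", "y"))

# Default fill value (xarray's NA sentinel): the integer dtype is promoted to float.
print(da.shift(x=1).dtype)
# Explicit np.nan: the value is cast into the existing integer dtype instead,
# producing a sentinel such as -9223372036854775808 rather than NaN.
print(da.shift(x=1, fill_value=np.nan).dtype)
```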
665604427 https://github.com/pydata/xarray/issues/4278#issuecomment-665604427 https://api.github.com/repos/pydata/xarray/issues/4278 MDEyOklzc3VlQ29tbWVudDY2NTYwNDQyNw== mark-boer 12862013 2020-07-29T11:17:23Z 2020-07-29T11:17:23Z CONTRIBUTOR

I'll close this issue. Thank you @dcherian. I will have another look at the docs, and if anything remains unclear, I'll look into adding a note or example.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Skipna not working in DataArray.rolling.mean  666880880
665312413 https://github.com/pydata/xarray/issues/4278#issuecomment-665312413 https://api.github.com/repos/pydata/xarray/issues/4278 MDEyOklzc3VlQ29tbWVudDY2NTMxMjQxMw== mark-boer 12862013 2020-07-28T22:17:34Z 2020-07-28T22:17:34Z CONTRIBUTOR

Oh, good point. After reading the docs, I thought that min_periods was only related to the edges of the array. (A small example follows this row.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Skipna not working in DataArray.rolling.mean  666880880
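A small sketch of the min_periods behaviour referred to above, using a made-up float array with one missing value; exact outputs depend on the installed xarray version.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, 2.0, np.nan, 4.0, 5.0], dims="x")

# With the default min_periods (equal to the window size), any window that
# contains a NaN or falls off the edge of the array yields NaN.
print(da.rolling(x=3).mean().values)
# Lowering min_periods lets the mean skip the NaN inside the array as well
# as the incomplete windows at the edges.
print(da.rolling(x=3, min_periods=1).mean().values)
```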
602033234 https://github.com/pydata/xarray/pull/3596#issuecomment-602033234 https://api.github.com/repos/pydata/xarray/issues/3596 MDEyOklzc3VlQ29tbWVudDYwMjAzMzIzNA== mark-boer 12862013 2020-03-21T11:50:23Z 2020-03-21T11:50:44Z CONTRIBUTOR

Wow, didn't look for a week. Very happy 🎉! Thank you @dcherian and @max-sixty for all your input.

Must be one of the largest first contributions...

I was inspired by the hacktoberfest last year. Took a little bit over a month though 😛

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add DataArray.pad, Dataset.pad, Variable.pad 532940062
596645749 https://github.com/pydata/xarray/pull/3596#issuecomment-596645749 https://api.github.com/repos/pydata/xarray/issues/3596 MDEyOklzc3VlQ29tbWVudDU5NjY0NTc0OQ== mark-boer 12862013 2020-03-09T16:48:11Z 2020-03-09T16:48:11Z CONTRIBUTOR

:) I think we need to extrapolate indexes by default. It seems like the most sensible option.

Sorry, obviously that is a solution ;-). But I do have some concerns:

  • In some instances, extrapolating all coords can lead to unwanted behaviour. Would you suggest we only extrapolate the indexes?
  • How would we handle unsorted indexes?
  • How would we extrapolate all the different kinds of indexes, like MultiIndex or CategoricalIndex?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add DataArray.pad, Dataset.pad, Variable.pad 532940062
596240476 https://github.com/pydata/xarray/pull/3596#issuecomment-596240476 https://api.github.com/repos/pydata/xarray/issues/3596 MDEyOklzc3VlQ29tbWVudDU5NjI0MDQ3Ng== mark-boer 12862013 2020-03-08T19:09:17Z 2020-03-08T19:09:17Z CONTRIBUTOR

Edit: But it's more awkward for indexes than non-index coords. The index becomes less useful with non-unique values, and generally indexes don't have nulls. I'm not sure what the other options would be: to some extent it's the intersection of pad with xarray's data model.

Hmm, I don't really see a solution. What do you suggest?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add DataArray.pad, Dataset.pad, Variable.pad 532940062
595498610 https://github.com/pydata/xarray/pull/3596#issuecomment-595498610 https://api.github.com/repos/pydata/xarray/issues/3596 MDEyOklzc3VlQ29tbWVudDU5NTQ5ODYxMA== mark-boer 12862013 2020-03-05T23:34:18Z 2020-03-05T23:34:18Z CONTRIBUTOR

The big outstanding issue is what to do about dimension coordinates or indexes. Currently this PR treats all variables in coords different from those in data_vars. I think this is confusing.

I do agree it can be confusing, but it is not unique in xarray: Dataset.shift only shifts data_vars, bfill and ffill only fill data_vars, etc. (A small illustration follows this row.)

Personally I think that extrapolating data coordinates without specifically setting a keyword or flag could also be confusing. I occasionally have data in my coords that makes no sense to extrapolate. I do agree that adding an extrapolate_coord option or keyword of some sort would be cool in the future.

Both defaults could be really useful; I'm still in a bit of doubt. Are there any other people that might want to weigh in?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add DataArray.pad, Dataset.pad, Variable.pad 532940062
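As a point of reference for the comparison with shift in the comment above, a minimal sketch with a made-up Dataset showing that Dataset.shift moves the data variables but leaves the coordinates in place:

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2, 3])}, coords={"x": [10, 20, 30]})
shifted = ds.shift(x=1)

print(shifted["a"].values)  # data variable shifted; the first entry is filled with NA
print(shifted["x"].values)  # coordinate left untouched: [10 20 30]
```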
585612496 https://github.com/pydata/xarray/pull/3596#issuecomment-585612496 https://api.github.com/repos/pydata/xarray/issues/3596 MDEyOklzc3VlQ29tbWVudDU4NTYxMjQ5Ng== mark-boer 12862013 2020-02-13T08:37:16Z 2020-02-13T08:37:16Z CONTRIBUTOR

Hi @dcherian

Once again an excellent code review :-)

I addressed most points in your review, except for the function signature of DataArray.pad. I think I prefer the default values of None: numpy/Dask will then throw an error when you set invalid combinations, e.g. constant_values=0 together with mode="mean" (a small illustration follows this row). The np.nan is unfortunate, but this is the difference between np.nan and xarray.dtypes.NaN. I could set the default to dtypes.NaN, but then you lose the automatic error when setting an explicit value with mode!="constant".

I can also see something like coords_mode being useful, but I think it would be wise to wait and see how pad gets used and what issues we run into along the way.

I also ran into a small issue with pint (https://github.com/hgrecco/pint/issues/1026), but I'll get to that once this PR is merged.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add DataArray.pad, Dataset.pad, Variable.pad 532940062
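A hedged illustration of the "automatic error" mentioned above: numpy rejects keyword arguments that do not apply to the chosen padding mode, so an explicit constant_values together with mode="mean" raises (the exact message depends on the numpy version).

```python
import numpy as np

a = np.arange(4)
print(np.pad(a, 1, mode="mean"))  # valid: the padded cells are filled with the array mean

try:
    # constant_values is not a valid keyword for mode="mean"
    np.pad(a, 1, mode="mean", constant_values=0)
except ValueError as err:
    print("rejected:", err)
```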
578549057 https://github.com/pydata/xarray/pull/3596#issuecomment-578549057 https://api.github.com/repos/pydata/xarray/issues/3596 MDEyOklzc3VlQ29tbWVudDU3ODU0OTA1Nw== mark-boer 12862013 2020-01-26T22:16:42Z 2020-01-26T22:16:42Z CONTRIBUTOR

Hi @dcherian and @fujiisoup, apart from the issues I raised in my last comment, I think this PR is close to done.

Thx in advance

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add DataArray.pad, Dataset.pad, Variable.pad 532940062
572931220 https://github.com/pydata/xarray/issues/3671#issuecomment-572931220 https://api.github.com/repos/pydata/xarray/issues/3671 MDEyOklzc3VlQ29tbWVudDU3MjkzMTIyMA== mark-boer 12862013 2020-01-10T08:40:35Z 2020-01-10T08:40:35Z CONTRIBUTOR

Hi @fujiisoup, thx for your response.

That is exactly what I needed and what I used to do, but I mistakenly thought that there was a performance penalty to doing this. The performance decrease turned out to be the result of the order in which I rolled and sliced.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  rolling.construct alignment 546791416
572556658 https://github.com/pydata/xarray/issues/3671#issuecomment-572556658 https://api.github.com/repos/pydata/xarray/issues/3671 MDEyOklzc3VlQ29tbWVudDU3MjU1NjY1OA== mark-boer 12862013 2020-01-09T13:17:29Z 2020-01-09T13:17:29Z CONTRIBUTOR

Small update:

I was using:

```
data = (
    data.rolling(x=window_size).construct("roll_x")
    .rolling(y=window_size).construct("roll_y")
    .isel(x=slice(window_size - 1, None, stride), y=slice(window_size - 1, None, stride))
    .stack(n=("x", "y"))
)
```

but this was performing quite badly for larger arrays. However, after having a look at DataArrayRolling and rewriting this piece of code to the following, performance was good.

```
data = (
    data.rolling(x=window_size).construct("roll_x")
    .isel(x=slice(window_size - 1, None, stride))
    .rolling(y=window_size).construct("roll_y")
    .isel(y=slice(window_size - 1, None, stride))
    .stack(n=("x", "y"))
)
```

I saw that DataArrayRolling.construct() also uses a slice (isel) to create the strided array, so there is not much of a performance penalty in first creating a rolling window with stride 1 and then slicing the data. (A runnable version of the faster pattern follows this row.)

However, if you want to add strides to the 'Strided rolling' (#3607) it would be nice to still be able to create rolling windows that start at index 0.

By the way: feel free to close this issue, if you do not see a need for an alignment option.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  rolling.construct alignment 546791416
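A self-contained version of the faster pattern from the comment above, with made-up values for window_size, stride and the input array, in case it is useful to run directly:

```python
import numpy as np
import xarray as xr

window_size, stride = 3, 3
data = xr.DataArray(np.arange(36).reshape(6, 6), dims=("x", "y"))

windows = (
    data.rolling(x=window_size).construct("roll_x")
    .isel(x=slice(window_size - 1, None, stride))  # slice right after each construct
    .rolling(y=window_size).construct("roll_y")
    .isel(y=slice(window_size - 1, None, stride))
    .stack(n=("x", "y"))
)
print(windows.sizes)  # expected along the lines of {'roll_x': 3, 'roll_y': 3, 'n': 4}
```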
570057684 https://github.com/pydata/xarray/pull/3596#issuecomment-570057684 https://api.github.com/repos/pydata/xarray/issues/3596 MDEyOklzc3VlQ29tbWVudDU3MDA1NzY4NA== mark-boer 12862013 2020-01-01T14:29:15Z 2020-01-01T14:29:15Z CONTRIBUTOR

Hi everyone, happy new year :tada:

I feel like this PR is slowly getting into a decent shape. But I could use your input on a couple of subjects:

How should we handle the default padding? I currently pad with dtypes.NaN; this is similar to the behaviour of the replaced `Variable.pad_with_fill_value` and is consistent with the `shift` method.

We could choose to pad with 0s as numpy does, or we could force the user to choose a constant_values, but I don't think I would prefer that. (A small sketch of the default behaviour follows this row.)

How should the coordinates of a DataArray/Dataset be padded? I chose default padding except for the modes "edge", "reflect", "symmetric" and "wrap". @dcherian noted that this could make sense in some situations, but that it would require some further discussion.

Personally I think it makes sense more often than not, but padding with NaNs should also work fine.

dask_array_compat: I also created a workaround for the dtype change caused by dask.array.pad with mode="mean", and added an additional check to validate the output shape of dask.pad. Let me know what you think about this; I'm not 100% convinced...

I used a couple of `# type: ignore` comments, as any workaround for the mypy errors led to uglier code. But if any of you have suggestions, I'm happy to hear them.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add DataArray.pad, Dataset.pad, Variable.pad 532940062
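A small sketch of the default-padding question raised above, assuming an integer DataArray; the exact dtype-promotion rules may differ between xarray versions.

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x")

# Default: pad with the NA sentinel, which promotes the integer dtype to float.
print(da.pad(x=1).values)                     # expected: [nan  1.  2.  3. nan]
# Explicit constant: passed through to np.pad, keeping the integer dtype.
print(da.pad(x=1, constant_values=0).values)  # expected: [0 1 2 3 0]
```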
564904765 https://github.com/pydata/xarray/pull/3596#issuecomment-564904765 https://api.github.com/repos/pydata/xarray/issues/3596 MDEyOklzc3VlQ29tbWVudDU2NDkwNDc2NQ== mark-boer 12862013 2019-12-12T08:35:06Z 2019-12-12T08:35:06Z CONTRIBUTOR

It seems like we have some value mismatches on dask==1.2.

The reason that mean and reflect are marked as xfail is that there, too, there are value mismatches. I'll see if I can find a workaround for this failing test.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add DataArray.pad, Dataset.pad, Variable.pad 532940062
561372979 https://github.com/pydata/xarray/issues/2605#issuecomment-561372979 https://api.github.com/repos/pydata/xarray/issues/2605 MDEyOklzc3VlQ29tbWVudDU2MTM3Mjk3OQ== mark-boer 12862013 2019-12-03T21:50:53Z 2019-12-03T21:50:53Z CONTRIBUTOR

I will create a WIP pull request tomorrow.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pad method 390774883
554998976 https://github.com/pydata/xarray/issues/2605#issuecomment-554998976 https://api.github.com/repos/pydata/xarray/issues/2605 MDEyOklzc3VlQ29tbWVudDU1NDk5ODk3Ng== mark-boer 12862013 2019-11-18T12:41:33Z 2019-11-18T12:41:33Z CONTRIBUTOR

@dcherian Thx! I found the Variable.pad_with_fill_value already, but I will also have a look at rolling and coarsen.

I have some questions related to this issue; I hope this is the correct place to ask them:

Dask added the pad method in version 1.7.0 according to its documentation. What is the minimum version of Dask that should be supported? I found Dask=1.2 in the continuous integration requirements.

Also, how should I handle implementation differences between numpy and Dask? For example, with the versions of Dask and numpy I am using, mode="mean" converts an array of integers to floats in Dask, but in numpy it keeps it an array of integers. (A small check for this follows this row.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pad method 390774883
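A hedged way to check the numpy/Dask difference described above, assuming dask is installed; recent Dask releases may already match numpy here.

```python
import numpy as np
import dask.array as darr

a = np.arange(6)
print(np.pad(a, 1, mode="mean").dtype)  # numpy keeps the integer dtype

d = darr.from_array(a, chunks=3)
print(darr.pad(d, 1, mode="mean").dtype)  # some Dask versions report float64 here
```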
549058104 https://github.com/pydata/xarray/issues/2605#issuecomment-549058104 https://api.github.com/repos/pydata/xarray/issues/2605 MDEyOklzc3VlQ29tbWVudDU0OTA1ODEwNA== mark-boer 12862013 2019-11-02T16:16:27Z 2019-11-02T16:16:27Z CONTRIBUTOR

Has someone started working on this? I thought I might give it a try ;-)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pad method 390774883

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);