
issue_comments


19 rows where issue = 333248242 sorted by updated_at descending


user 4

  • fujiisoup 10
  • rpnaut 6
  • shoyer 2
  • st-bender 1

author_association 3

  • MEMBER 12
  • NONE 6
  • CONTRIBUTOR 1

issue 1

  • Refactor nanops 19
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
424700785 https://github.com/pydata/xarray/pull/2236#issuecomment-424700785 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDQyNDcwMDc4NQ== fujiisoup 6815844 2018-09-26T12:42:55Z 2018-09-26T12:42:55Z MEMBER

Thanks, @st-bender, for the bug report. I copied your comment to #2440.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Refactor nanops 333248242
424697772 https://github.com/pydata/xarray/pull/2236#issuecomment-424697772 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDQyNDY5Nzc3Mg== st-bender 28786187 2018-09-26T12:32:34Z 2018-09-26T12:35:30Z CONTRIBUTOR

Hi, just to let you know that .std() no longer accepts the ddof keyword (it worked in 0.10.8). Should I open a new bug report?

Edit: It fails with:

~/Work/miniconda3/envs/stats/lib/python3.6/site-packages/xarray/core/duck_array_ops.py in f(values, axis, skipna, **kwargs)
    234 
    235         try:
--> 236             return func(values, axis=axis, **kwargs)
    237         except AttributeError:
    238             if isinstance(values, dask_array_type):

TypeError: nanstd() got an unexpected keyword argument 'ddof'
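The regression above can be pinned down without xarray: a minimal plain-numpy sketch of what a ddof-aware nanstd should do (the function name mirrors the one in the traceback; the body is illustrative, not xarray's actual code).

```python
import numpy as np

def nanstd(values, axis=None, ddof=0):
    # NaN-ignoring standard deviation normalized by (N - ddof), where N
    # counts the non-NaN values; np.nanstd already supports ddof, which
    # is the behaviour .std(ddof=...) should forward to.
    return np.nanstd(values, axis=axis, ddof=ddof)

data = np.array([1.0, 2.0, np.nan, 4.0])
population = nanstd(data)      # divides squared deviations by N = 3
sample = nanstd(data, ddof=1)  # divides by N - 1 = 2 (sample std)
```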
413446001 https://github.com/pydata/xarray/pull/2236#issuecomment-413446001 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDQxMzQ0NjAwMQ== fujiisoup 6815844 2018-08-16T06:59:37Z 2018-08-16T06:59:37Z MEMBER

Thanks for the review. Merging.

413431477 https://github.com/pydata/xarray/pull/2236#issuecomment-413431477 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDQxMzQzMTQ3Nw== fujiisoup 6815844 2018-08-16T05:37:25Z 2018-08-16T05:37:25Z MEMBER

Thanks, @shoyer. All done.

412249461 https://github.com/pydata/xarray/pull/2236#issuecomment-412249461 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDQxMjI0OTQ2MQ== fujiisoup 6815844 2018-08-11T04:15:30Z 2018-08-11T04:15:30Z MEMBER

Can anyone give further review?

412249224 https://github.com/pydata/xarray/pull/2236#issuecomment-412249224 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDQxMjI0OTIyNA== fujiisoup 6815844 2018-08-11T04:08:59Z 2018-08-11T04:08:59Z MEMBER

I noticed that min_count also works on resampled objects. Your issue might be with the resample API.
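The min_count semantics being tested here can be demonstrated with pandas, whose grouped sum() takes the same min_count keyword: an all-NaN group yields 0.0 by default but NaN once min_count=1 is requested. A small sketch (the data is made up for illustration):

```python
import numpy as np
import pandas as pd

# May is all-NaN, June has a value every day.
idx = pd.date_range("2006-05-01", periods=61, freq="D")
s = pd.Series(np.nan, index=idx)
s[idx.month == 6] = 1.0

by_month = s.groupby(s.index.month)
default_sums = by_month.sum()            # May -> 0.0 (NaNs skipped, empty sum)
strict_sums = by_month.sum(min_count=1)  # May -> NaN (no valid values)
```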

412246496 https://github.com/pydata/xarray/pull/2236#issuecomment-412246496 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDQxMjI0NjQ5Ng== fujiisoup 6815844 2018-08-11T03:00:13Z 2018-08-11T03:00:13Z MEMBER

@rpnaut

Thanks for testing.

Your min_count argument is not allowed for type 'dataset' but only for type 'dataarray'. Starting with the dataset located here:

I don't think that is true. It works on a Dataset, but not on a resampled object. I will raise an issue for this later.

412054270 https://github.com/pydata/xarray/pull/2236#issuecomment-412054270 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDQxMjA1NDI3MA== rpnaut 30219501 2018-08-10T11:20:40Z 2018-08-10T11:24:37Z NONE

Ok. The strange thing with the spatial dimensions is that the new syntax forces the user to state exactly which dimension the resampling operator (like sum) should be applied over. The syntax is now data.resample(time="M").sum(dim="time", min_count=1).

That's weird - the dimension is given twice. However, when doing so, xarray does not sum up, e.g. for the first month, all values it finds in a specific DataArray, but only the values along the time dimension.

AND NOW, the good news is that I got the following picture with your min_count=0 and min_count=1:

411713980 https://github.com/pydata/xarray/pull/2236#issuecomment-411713980 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDQxMTcxMzk4MA== rpnaut 30219501 2018-08-09T10:34:19Z 2018-08-09T16:15:55Z NONE

To wrap it up: your implementation works for timeseries data. There is something strange with time-space data, which should be fixed. Once that is fixed, it is worth testing in my evaluation environment. Do you have a feeling for why the new syntax gives such strange behaviour? Shall we put the bug onto the issue list?

And maybe it would be interesting to have the min_count argument available for the old syntax in the future as well, not only the new one. The reason: the dimension name is no longer flexible - it cannot be a variable like dim=${dim}.
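The dim=${dim} concern has a plain-Python workaround: keyword arguments can be built from a variable with dict unpacking, so the new syntax does not actually fix the dimension name at write time. A minimal sketch with a stand-in resample function (the real call would be on a DataArray):

```python
def resample(**indexer):
    # Stand-in for DataArray.resample(time="M"): just records which
    # dimension/frequency keyword it received.
    return indexer

dim = "time"                      # dimension name held in a variable
indexer = resample(**{dim: "M"})  # equivalent to writing resample(time="M")
```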

411705656 https://github.com/pydata/xarray/pull/2236#issuecomment-411705656 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDQxMTcwNTY1Ng== rpnaut 30219501 2018-08-09T10:01:32Z 2018-08-09T10:30:15Z NONE

Thanks, @fujiisoup.

I have good news and I have bad news.

A) Your min_count argument still seems to work only when using the new resample syntax, i.e. data.resample($dim=$freq).sum(). I guess this is due to the planned removal of the old syntax. With the old syntax data.resample(dim=$dim,freq=$freq,how=$oper), your code seems to ignore the min_count argument.

B) Your min_count argument is not allowed for type 'dataset' but only for type 'dataarray'. Starting with the dataset located here: https://swiftbrowser.dkrz.de/public/dkrz_c0725fe8741c474b97f291aac57f268f/GregorMoeller/ I got the following message:

```
data.resample(time="M").sum(min_count=1)
TypeError: sum() got an unexpected keyword argument 'min_count'
```

Thus, I have tested your implementation only on DataArrays. I take the netcdf array 'TOT_PREC' and try to compute the monthly sum:

```
In [39]: data = xarray.open_dataset("eObs_gridded_0.22deg_rot_v14.0.TOT_PREC.1950-2016.nc_CutParamTimeUnitCor_FinalEvalGrid")

In [40]: datamonth = data["TOT_PREC"].resample(time="M").sum()

In [41]: datamonth
Out[41]:
<xarray.DataArray 'TOT_PREC' (time: 5)>
array([ 551833.25   ,  465640.09375,  328445.90625,  836892.1875 ,
        503601.5    ], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
```

So the xarray package is still throwing away the dimensions in x- and y-direction. It has nothing to do with any min_count argument. THIS MUST BE A BUG OF XARRAY. The afore-mentioned dimensions only survive when using the old syntax:

```
In [41]: datamonth = data["TOT_PREC"].resample(dim="time", freq="M", how="sum")
/usr/bin/ipython3:1: FutureWarning: .resample() has been modified to defer calculations. Instead of passing 'dim' and how="sum", instead consider using .resample(time="M").sum('time')
  #!/usr/bin/env python3

In [42]: datamonth
Out[42]:
<xarray.DataArray 'TOT_PREC' (time: 5, rlat: 136, rlon: 144)>
array([[[  0.      ,   0.      , ...,   0.      ,   0.      ],
        [  0.      ,   0.      , ...,   0.      ,   0.      ],
        ...,
        [  0.      ,   0.      , ...,  44.900028,  41.400024],
        [  0.      ,   0.      , ...,  49.10001 ,  46.5     ]]], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
  * rlon     (rlon) float32 -22.6 -22.38 -22.16 -21.94 -21.72 -21.5 -21.28 ...
  * rlat     (rlat) float32 -12.54 -12.32 -12.1 -11.88 -11.66 -11.44 -11.22 ...
```

Nevertheless, I have started to use your min_count argument at just one point (where the x- and y-dimensions do not matter). In that case, your implementation works fine:

```
In [46]: pointdata = data.isel(rlon=10, rlat=10)

In [47]: pointdata["TOT_PREC"]
Out[47]:
<xarray.DataArray 'TOT_PREC' (time: 153)>
array([ nan,  nan,  nan, ...,  nan,  nan,  nan], dtype=float32)
Coordinates:
    rlon     float32 -20.4
    rlat     float32 -10.34
  * time     (time) datetime64[ns] 2006-05-01T12:00:00 2006-05-02T12:00:00 ...
Attributes:
    standard_name:  precipitation_amount
    long_name:      Precipitation
    units:          kg m-2
    grid_mapping:   rotated_pole
    cell_methods:   time: sum

In [48]: pointdata["TOT_PREC"].resample(time="M").sum(min_count=1)
Out[48]:
<xarray.DataArray 'TOT_PREC' (time: 5)>
array([ nan,  nan,  nan,  nan,  nan])
Coordinates:
  * time     (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
    rlon     float32 -20.4
    rlat     float32 -10.34

In [49]: pointdata["TOT_PREC"].resample(time="M").sum(min_count=0)
Out[49]:
<xarray.DataArray 'TOT_PREC' (time: 5)>
array([ 0.,  0.,  0.,  0.,  0.], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
    rlon     float32 -20.4
    rlat     float32 -10.34
```

403272397 https://github.com/pydata/xarray/pull/2236#issuecomment-403272397 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDQwMzI3MjM5Nw== fujiisoup 6815844 2018-07-08T08:40:55Z 2018-07-08T08:40:55Z MEMBER

I think this is ready for another review.

399279344 https://github.com/pydata/xarray/pull/2236#issuecomment-399279344 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDM5OTI3OTM0NA== fujiisoup 6815844 2018-06-21T23:59:48Z 2018-06-21T23:59:48Z MEMBER

Thanks, @rpnaut. Actually, I'm changing the code around sum as well, so it looks like my change caused the bug you reported. I think we do not have good test coverage around the Dataset reductions.

I will add the test as well. Thanks again for testing :)

399278523 https://github.com/pydata/xarray/pull/2236#issuecomment-399278523 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDM5OTI3ODUyMw== fujiisoup 6815844 2018-06-21T23:54:23Z 2018-06-21T23:54:23Z MEMBER

@shoyer, thanks for the details. I think I understood your idea. This sounds like a cleaner solution. I will update the code again, but it will take some more days (or a week).

399076615 https://github.com/pydata/xarray/pull/2236#issuecomment-399076615 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDM5OTA3NjYxNQ== rpnaut 30219501 2018-06-21T11:50:19Z 2018-06-21T12:04:58Z NONE

Okay. Using the old resample nomenclature, I tried to compare the results from your modified code (min_count=2) with the tagged version 0.10.7 (no min_count argument). But I am not sure whether this works at all - do you examine the keywords passed via the old resample method?

However, in the comparison I did not see the NaNs I expected over water.

399071095 https://github.com/pydata/xarray/pull/2236#issuecomment-399071095 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDM5OTA3MTA5NQ== rpnaut 30219501 2018-06-21T11:26:45Z 2018-06-21T12:03:23Z NONE

!!!Correction!!! The resample example above also loses the lon-lat dimensions with the unmodified code. The resulting numbers are the same.

```
data_aggreg = data["TOT_PREC"].resample(time="M").sum()
data_aggreg
<xarray.DataArray 'TOT_PREC' (time: 5)>
array([ 551833.25   ,  465640.09375,  328445.90625,  836892.1875 ,
        503601.5    ], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
```

Now I am a little bit puzzled. But ...

IT SEEMS TO BE A BUG!

If I do the resample process using the old nomenclature, data_aggreg = data["TOT_PREC"].resample(dim="time", how="sum", freq="M"), it works. Do we have a bug in xarray?

399061170 https://github.com/pydata/xarray/pull/2236#issuecomment-399061170 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDM5OTA2MTE3MA== rpnaut 30219501 2018-06-21T10:55:56Z 2018-06-21T11:03:16Z NONE

Hello from me, hopefully contributing some useful things.

First, I would like to mention that I checked out your code.

I ran the following code example using a datafile uploaded under the following link: https://swiftbrowser.dkrz.de/public/dkrz_c0725fe8741c474b97f291aac57f268f/GregorMoeller/

```
import xarray
import matplotlib.pyplot as plt

data = xarray.open_dataset("eObs_gridded_0.22deg_rot_v14.0.TOT_PREC.1950-2016.nc_CutParamTimeUnitCor_FinalEvalGrid")
data
<xarray.Dataset>
Dimensions:       (rlat: 136, rlon: 144, time: 153)
Coordinates:
  * rlon          (rlon) float32 -22.6 -22.38 -22.16 -21.94 -21.72 -21.5 ...
  * rlat          (rlat) float32 -12.54 -12.32 -12.1 -11.88 -11.66 -11.44 ...
  * time          (time) datetime64[ns] 2006-05-01T12:00:00 ...
Data variables:
    rotated_pole  int32 ...
    TOT_PREC      (time, rlat, rlon) float32 ...
Attributes:
    CDI:                       Climate Data Interface version 1.8.0 (http://m...
    Conventions:               CF-1.6
    history:                   Thu Jun 14 12:34:59 2018: cdo -O -s -P 4 remap...
    CDO:                       Climate Data Operators version 1.8.0 (http://m...
    cdo_openmp_thread_number:  4

data_aggreg = data["TOT_PREC"].resample(time="M").sum(min_count=0)
data_aggreg2 = data["TOT_PREC"].resample(time="M").sum(min_count=1)
```

I have recognized that the min_count option in its current state technically only works for DataArrays and not for Datasets. However, more interesting is the fact that the dimensions are destroyed:

```
data_aggreg
<xarray.DataArray 'TOT_PREC' (time: 5)>
array([ 551833.25   ,  465640.09375,  328445.90625,  836892.1875 ,
        503601.5    ], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
```

No longitude and latitude survive your operation. If I use the sum operator on the full Dataset (where maybe the code was not modified?), I get:

```
data_aggreg = data.resample(time="M").sum()
data_aggreg
<xarray.Dataset>
Dimensions:       (rlat: 136, rlon: 144, time: 5)
Coordinates:
  * time          (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
  * rlon          (rlon) float32 -22.6 -22.38 -22.16 -21.94 -21.72 -21.5 ...
  * rlat          (rlat) float32 -12.54 -12.32 -12.1 -11.88 -11.66 -11.44 ...
Data variables:
    rotated_pole  (time) int64 1 1 1 1 1
    TOT_PREC      (time, rlat, rlon) float32 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
```
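The vanishing rlon/rlat dimensions reported here are consistent with a reduction over all axes rather than just time. In plain numpy terms (shapes borrowed from the dataset above, values made up):

```python
import numpy as np

data = np.ones((5, 136, 144))  # (time, rlat, rlon)

over_everything = data.sum()   # scalar: all three dimensions reduced away
over_time = data.sum(axis=0)   # shape (136, 144): rlat/rlon survive
```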

398931150 https://github.com/pydata/xarray/pull/2236#issuecomment-398931150 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDM5ODkzMTE1MA== shoyer 1217238 2018-06-20T23:42:04Z 2018-06-20T23:42:04Z MEMBER

A module of bottleneck/numpy functions that act on numpy arrays only. A module of functions that act on numpy or dask arrays (or these could be moved into duck_array_ops).

Could you explain this idea in more detail?

OK, let me try:

  1. On numpy arrays, we use bottleneck equivalents of numpy functions when possible, because bottleneck is faster than numpy.
  2. On dask arrays, we use dask equivalents of numpy functions.
  3. We also want to add some extra features on top of what numpy/dask/bottleneck provide, e.g., handling of min_count.

We could implement this with:
  • nputils.nansum() is equivalent to numpy.nansum() but uses bottleneck.nansum() internally when possible.
  • duck_array_ops.nansum() uses nputils.nansum() or dask.array.nansum(), based upon the type of the inputs.
  • duck_array_ops.sum() uses numpy.sum() or dask.array.sum(), based upon the type of the inputs.
  • duck_array_ops.sum_with_mincount() adds min_count and skipna support and is used in the Dataset.sum() implementation. It is written using duck_array_ops.nansum(), duck_array_ops.sum(), duck_array_ops.where() and duck_array_ops.isnull().
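The layering described in that list can be sketched in plain numpy. The names follow the comment, but the bodies are illustrative stand-ins, not xarray's actual internals (in particular, the real versions would dispatch to bottleneck or dask.array based on the input type):

```python
import numpy as np

def nansum(values, axis=None):
    # duck_array_ops layer: plain numpy here; the real version would
    # pick bottleneck.nansum or dask.array.nansum depending on the input.
    return np.nansum(values, axis=axis)

def sum_with_mincount(values, axis=None, min_count=0):
    # Feature layer: NaN-skipping sum that returns NaN wherever fewer
    # than min_count non-NaN values contributed to the reduction.
    total = nansum(values, axis=axis)
    if min_count > 0:
        valid = np.sum(~np.isnan(values), axis=axis)
        total = np.where(valid >= min_count, total, np.nan)
    return total

x = np.array([[1.0, np.nan],
              [2.0, np.nan]])
col_sums = sum_with_mincount(x, axis=0, min_count=1)  # [3.0, nan]
```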

398926990 https://github.com/pydata/xarray/pull/2236#issuecomment-398926990 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDM5ODkyNjk5MA== fujiisoup 6815844 2018-06-20T23:17:12Z 2018-06-20T23:17:12Z MEMBER

I think it would make sense to restructure this a little bit to have two well defined layers:

A module of bottleneck/numpy functions that act on numpy arrays only. A module of functions that act on numpy or dask arrays (or these could be moved into duck_array_ops).

Could you explain this idea in more detail?

398150942 https://github.com/pydata/xarray/pull/2236#issuecomment-398150942 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDM5ODE1MDk0Mg== shoyer 1217238 2018-06-18T18:28:58Z 2018-06-18T18:28:58Z MEMBER

Very nice!

In my implementation, bottleneck is not used when skipna=False. bottleneck would be advantageous when skipna=True as numpy needs to copy the entire array once, but I think numpy's method is still OK if skipna=False.

I think this is correct -- bottleneck does not speed up non-NaN skipping functions.
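That dispatch rule can be sketched as follows, with bottleneck treated as an optional dependency (names and structure are illustrative, not xarray's actual code):

```python
import numpy as np

try:
    import bottleneck as bn  # optional accelerator, as in xarray
except ImportError:
    bn = None

def std(values, axis=None, skipna=True, ddof=0):
    # bottleneck only accelerates the NaN-skipping path, so plain numpy
    # is used whenever skipna=False (or bottleneck is unavailable).
    if skipna:
        if bn is not None:
            return bn.nanstd(values, axis=axis, ddof=ddof)
        return np.nanstd(values, axis=axis, ddof=ddof)
    return np.std(values, axis=axis, ddof=ddof)

data = np.array([1.0, 2.0, np.nan, 4.0])
skipping = std(data, skipna=True)   # ignores the NaN
strict = std(data, skipna=False)    # NaN propagates into the result
```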


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette