issue_comments

13 rows where issue = 753517739 (pydata/xarray#4625: "Non lazy behavior for weighted average when using resampled data"), sorted by updated_at descending

jbusecke (CONTRIBUTOR) · 2020-12-09T14:34:45Z · comment 741811314
https://github.com/pydata/xarray/issues/4625#issuecomment-741811314

As @dcherian pointed out above, `copy(..., deep=False)` does fix this for all the cases I am testing.

jbusecke (CONTRIBUTOR) · 2020-12-09T14:30:57Z · comment 741808410
https://github.com/pydata/xarray/issues/4625#issuecomment-741808410

So I have added a test in #4668, and it confirms that this behavior only occurs when the resample interval is smaller than or equal to the chunk size. If the resample interval is larger than the chunks, everything stays completely lazy. Not sure if this is a general limitation. Does anyone have more insight into how resample handles this kind of workflow?
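
An editorial sketch of this observation (hypothetical data, not from the thread): with one timestep per chunk and three-year resample bins, the interval is larger than the chunks, so per the comment above the pipeline should stay lazy; chunking coarser than the bins (e.g. `.chunk({"time": 6})`) should instead trigger eager computes.

``` python
import numpy as np
import pandas as pd
import xarray as xr

t = pd.date_range("2000-01-01", periods=6, freq="AS")
ds = xr.Dataset(
    {"data": ("time", np.arange(6.0))},
    coords={"time": t, "weights": ("time", np.ones(6))},
).chunk({"time": 1})  # one timestep per chunk: smaller than the 3-year bins

def mean_func(ds):
    return ds.weighted(ds.weights).mean("time")

result = ds.resample(time="3AS").map(mean_func)  # expected to stay lazy here
```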

jbusecke (CONTRIBUTOR) · 2020-12-01T23:15:19Z · comment 736880359
https://github.com/pydata/xarray/issues/4625#issuecomment-736880359

Oh I remember that too, and I didn't understand it at all...

dcherian (MEMBER) · 2020-12-01T17:47:44Z · comment 736713989
https://github.com/pydata/xarray/issues/4625#issuecomment-736713989

Yes, something like what you have:

``` python
with raise_if_dask_computes():
    ds.resample(time='3AS').map(mean_func)
```

BUT something is wrong with my explanation above. The error is only triggered when the number of timesteps is not divisible by the resampling frequency. If you set `periods=3` when creating `t`, the old version works fine; if you change it to 4, it computes. But setting `deep=False` fixes it in all cases. I am v. confused!

jbusecke (CONTRIBUTOR) · 2020-12-01T13:50:21Z · comment 736563711
https://github.com/pydata/xarray/issues/4625#issuecomment-736563711

Do you have a suggestion for how to test this? Should I write a test involving resample + weighted?
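
A hypothetical pytest-style sketch of such a test (name and data invented here; this is not the test that was eventually added in #4668):

``` python
import numpy as np
import pandas as pd
import xarray as xr
from xarray.tests import raise_if_dask_computes

def test_weighted_resample_stays_lazy():
    t = pd.date_range("2000-01-01", periods=4, freq="AS")
    ds = xr.Dataset(
        {"data": ("time", np.arange(4.0))},
        coords={"time": t, "weights": ("time", np.ones(4))},
    ).chunk({"time": 4})

    def mean_func(ds):
        return ds.weighted(ds.weights).mean("time")

    # fail if building the resampled weighted mean triggers any compute
    with raise_if_dask_computes():
        ds.resample(time="3AS").map(mean_func)
```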

dcherian (MEMBER) · 2020-12-01T05:08:12Z · comment 736221445
https://github.com/pydata/xarray/issues/4625#issuecomment-736221445

Untested, but specifying `deep=False` in the call to `copy` should fix it.
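
A sketch of where that could go, assuming the `copy` in question is the one `weighted.py` uses to wrap dask-backed weights (paraphrased, not the exact source). Per the `DataArray.copy` docs, when `data=` is supplied `deep` only applies to the coords, so `deep=False` keeps the original coord objects and their dask graphs intact:

``` python
# paraphrased sketch of the relevant spot in xarray/core/weighted.py;
# a shallow copy avoids duplicating the coords of ``weights``, preserving
# their dask graph names for the later lazy equality check
weights = weights.copy(
    data=weights.data.map_blocks(_weight_check, dtype=weights.dtype),
    deep=False,
)
```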

dcherian (MEMBER) · 2020-12-01T03:34:29Z · comment 736195480
https://github.com/pydata/xarray/issues/4625#issuecomment-736195480

PRs are always welcome!

Reactions: laugh × 1

jbusecke (CONTRIBUTOR) · 2020-12-01T00:58:21Z · comment 736147406
https://github.com/pydata/xarray/issues/4625#issuecomment-736147406

Sweet. I'll try to apply this fix to my workflow now. Happy to submit a PR with the suggested changes to weighted.py too.

dcherian (MEMBER) · 2020-12-01T00:12:41Z · comment 736131299
https://github.com/pydata/xarray/issues/4625#issuecomment-736131299

Ah, this works (but we lose `weights` as a coord var):

``` python
# simple customized weighted mean function
def mean_func(ds):
    return ds.weighted(ds.weights.reset_coords(drop=True)).mean('time')
```

Adding `reset_coords` fixes this because it gets rid of the non-dim coord `weights`.

https://github.com/pydata/xarray/blob/180e76d106c697b1dd94b814a49dc2d7e58c8551/xarray/core/weighted.py#L149

`dot` compares the `weights` coord var on `ds` and `weights` to decide if it should keep it.

The new call to `.copy` ends up making a copy of the `weights` coord on the weights dataarray, so the lazy equality check fails. One solution is to avoid the call to `copy` and create the `DataArray` directly:

``` python
enc = weights.encoding
weights = DataArray(
    weights.data.map_blocks(_weight_check, dtype=weights.dtype),
    dims=weights.dims,
    coords=weights.coords,
    attrs=weights.attrs,
)
weights.encoding = enc
```

This works locally.
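
(Editorial note: constructing the `DataArray` directly reuses the existing coordinate objects instead of copying them, so the `weights` coordinate keeps its original dask graph name and the later lazy equality check can short-circuit without computing anything.)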

dcherian (MEMBER) · 2020-11-30T22:46:27Z (edited 2020-11-30T22:51:34Z) · comment 736101365
https://github.com/pydata/xarray/issues/4625#issuecomment-736101365

The weighted fix in #4559 is correct; that's why

``` python
with ProgressBar():
    mean_func(ds)
```

does not compute.

This is more instructive:

``` python
from xarray.tests import raise_if_dask_computes

with raise_if_dask_computes():
    ds.resample(time='3AS').map(mean_func)
```

``` python
....
    150
    151     def _sum_of_weights(

~/work/python/xarray/xarray/core/computation.py in dot(*arrays, dims, **kwargs)
   1483         output_core_dims=output_core_dims,
   1484         join=join,
-> 1485         dask="allowed",
   1486     )
   1487     return result.transpose(*[d for d in all_dims if d in result.dims])

~/work/python/xarray/xarray/core/computation.py in apply_ufunc(func, *args, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs)
   1132         join=join,
   1133         exclude_dims=exclude_dims,
-> 1134         keep_attrs=keep_attrs,
   1135     )
   1136     # feed Variables directly through apply_variable_ufunc

~/work/python/xarray/xarray/core/computation.py in apply_dataarray_vfunc(func, *args, signature, join, exclude_dims, keep_attrs)
    266     else:
    267         name = result_name(args)
--> 268     result_coords = build_output_coords(args, signature, exclude_dims)
    269
    270     data_vars = [getattr(a, "variable", a) for a in args]

~/work/python/xarray/xarray/core/computation.py in build_output_coords(args, signature, exclude_dims)
    231     # TODO: save these merged indexes, instead of re-computing them later
    232     merged_vars, unused_indexes = merge_coordinates_without_align(
--> 233         coords_list, exclude_dims=exclude_dims
    234     )
    235

~/work/python/xarray/xarray/core/merge.py in merge_coordinates_without_align(objects, prioritized, exclude_dims)
    327         filtered = collected
    328
--> 329     return merge_collected(filtered, prioritized)
    330
    331

~/work/python/xarray/xarray/core/merge.py in merge_collected(grouped, prioritized, compat)
    227             variables = [variable for variable, _ in elements_list]
    228             try:
--> 229                 merged_vars[name] = unique_variable(name, variables, compat)
    230             except MergeError:
    231                 if compat != "minimal":

~/work/python/xarray/xarray/core/merge.py in unique_variable(name, variables, compat, equals)
    132     if equals is None:
    133         # now compare values with minimum number of computes
--> 134         out = out.compute()
    135         for var in variables[1:]:
    136             equals = getattr(out, compat)(var)

~/work/python/xarray/xarray/core/variable.py in compute(self, **kwargs)
    459         """
    460         new = self.copy(deep=False)
--> 461         return new.load(**kwargs)
    462
    463     def __dask_tokenize__(self):

~/work/python/xarray/xarray/core/variable.py in load(self, **kwargs)
    435         """
    436         if is_duck_dask_array(self._data):
--> 437             self._data = as_compatible_data(self._data.compute(**kwargs))
    438         elif not is_duck_array(self._data):
    439             self._data = np.asarray(self._data)

~/miniconda3/envs/dcpy/lib/python3.7/site-packages/dask/base.py in compute(self, **kwargs)
    165         dask.base.compute
    166         """
--> 167         (result,) = compute(self, traverse=False, **kwargs)
    168         return result
    169

~/miniconda3/envs/dcpy/lib/python3.7/site-packages/dask/base.py in compute(*args, **kwargs)
    450         postcomputes.append(x.__dask_postcompute__())
    451
--> 452     results = schedule(dsk, keys, **kwargs)
    453     return repack([f(r, a) for r, (f, a) in zip(results, postcomputes)])
    454

~/work/python/xarray/xarray/tests/__init__.py in __call__(self, dsk, keys, **kwargs)
    112             raise RuntimeError(
    113                 "Too many computes. Total: %d > max: %d."
--> 114                 % (self.total_computes, self.max_computes)
    115             )
    116         return dask.get(dsk, keys, **kwargs)

RuntimeError: Too many computes. Total: 1 > max: 0.
```

It looks like we're repeatedly checking `weights` for equality (if you navigate to `merge_collected` in the stack, `name = "weights"`). The `lazy_array_equiv` check is failing because a copy is made somewhere:

``` python
ipdb> up
> /home/deepak/work/python/xarray/xarray/core/merge.py(229)merge_collected()
    227             variables = [variable for variable, _ in elements_list]
    228             try:
--> 229                 merged_vars[name] = unique_variable(name, variables, compat)
    230             except MergeError:
    231                 if compat != "minimal":

ipdb> name
'weights'

ipdb> variables
[<xarray.Variable (time: 1)> dask.array<getitem, shape=(1,), dtype=float64, chunksize=(1,), chunktype=numpy.ndarray>,
 <xarray.Variable (time: 1)> dask.array<copy, shape=(1,), dtype=float64, chunksize=(1,), chunktype=numpy.ndarray>]

ipdb> variables[0].data.name
'getitem-2a74b8ca20ae20100597e397404ba17b'

ipdb> variables[1].data.name
'copy-fff901a87f4a2293c750766c554aa68d'
```
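
To illustrate the failure mode (an editorial sketch, not part of the original comment): xarray's cheap lazy equality short-circuits when two dask arrays share the same graph name, and any per-block copy produces a new name even though the values are identical.

``` python
import dask.array as da
import numpy as np

a = da.ones((3,), chunks=1)
b = a.map_blocks(np.copy)  # identical values, but a new graph layer and name

print(a.name == b.name)                # False: name-based check cannot prove equality
print(bool((a == b).all().compute()))  # True, but proving it requires a compute
```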

keewis (MEMBER) · 2020-11-30T22:34:04Z · comment 736096615
https://github.com/pydata/xarray/issues/4625#issuecomment-736096615

the issue seems to be just this:

https://github.com/pydata/xarray/blob/180e76d106c697b1dd94b814a49dc2d7e58c8551/xarray/core/weighted.py#L116-L118

Also, the computation is still triggered even if we remove the `map_blocks` call:

``` python
weights = weights.copy(data=weights.data)
```

not sure why, though.

jbusecke (CONTRIBUTOR) · 2020-11-30T22:00:38Z · comment 736082255
https://github.com/pydata/xarray/issues/4625#issuecomment-736082255

Oh nooo. So would you suggest that in addition to #4559, we should have a kwarg to completely skip this?

mathause (MEMBER) · 2020-11-30T15:58:51Z · comment 735875504
https://github.com/pydata/xarray/issues/4625#issuecomment-735875504

I fear it's the weight check :facepalm:; try commenting out lines 105 to 121:

https://github.com/pydata/xarray/blob/255bc8ee9cbe8b212e3262b0d4b2e32088a08064/xarray/core/weighted.py#L105
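
For context, a runnable editorial sketch of the kind of check being discussed (a paraphrase; `_weight_check` appears elsewhere in this thread, but the body and message here are assumptions, not the verbatim source at that commit). For dask-backed weights the check is mapped over every block, which is the extra graph layer at issue:

``` python
import dask.array as da
import numpy as np

def _weight_check(w):
    # paraphrase: weighted() rejects weights containing missing values
    if np.isnan(w).any():
        raise ValueError("`weights` cannot contain missing values.")
    return w

weights_data = da.ones((4,), chunks=1)
# deferred variant for dask arrays: run the check lazily on each block
checked = weights_data.map_blocks(_weight_check, dtype=weights_data.dtype)
checked.compute()  # the check actually runs here
```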

