issue_comments


16 rows where user = 941907 sorted by updated_at descending


issue (7 values)

  • apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 5
  • acccessor extending approach limits functional programming approach, make direct monkey-patching also possible 4
  • DataArray.apply is missing 2
  • sharing dimensions across dataarrays in a dataset 2
  • DataArray.diff dim argument should be optional as is in docstring 1
  • Issue a warning when overwriting attributes with accessors instead of erroring 1
  • pipe, apply should call maybe_wrap_array, possibly resolve dim->axis 1

user (1 value)

  • smartass101 · 16

author_association (1 value)

  • NONE 16

Columns (each row below): id, html_url, issue_url, node_id, user, created_at, updated_at (sorted descending), author_association, body, reactions, performed_via_github_app, issue
605468155 https://github.com/pydata/xarray/issues/1040#issuecomment-605468155 https://api.github.com/repos/pydata/xarray/issues/1040 MDEyOklzc3VlQ29tbWVudDYwNTQ2ODE1NQ== smartass101 941907 2020-03-28T16:12:38Z 2020-03-28T16:12:38Z NONE

These days I mostly use .differentiate('coord') anyways, so I'm also for updating the docstring.
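For illustration, a minimal sketch of that usage (the data here is invented):

```python
import numpy as np
import xarray as xr

# Differentiate along a named coordinate instead of taking positional diffs.
x = np.linspace(0, 1, 5)
da = xr.DataArray(np.sin(x), dims='x', coords={'x': x})
print(da.differentiate('x'))
```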

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.diff dim argument should be optional as is in docstring 181340410
567082163 https://github.com/pydata/xarray/issues/3574#issuecomment-567082163 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NzA4MjE2Mw== smartass101 941907 2019-12-18T15:32:38Z 2019-12-18T15:32:38Z NONE

meta = np.ndarray if vectorize is True else None if the user doesn't explicitly provide meta.

Yes, sorry, written this way I now see what you meant and that will likely work indeed.
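For illustration, the rule quoted above as a minimal sketch (a hypothetical helper, not the actual apply_ufunc internals):

```python
import numpy as np

def choose_meta(vectorize, user_meta=None):
    # Respect an explicitly provided meta; otherwise fall back to np.ndarray
    # when vectorize=True, so dask need not call the (possibly
    # np.vectorize-wrapped) function on empty inputs to infer the output type.
    if user_meta is not None:
        return user_meta
    return np.ndarray if vectorize else None
```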

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
566938638 https://github.com/pydata/xarray/issues/3574#issuecomment-566938638 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NjkzODYzOA== smartass101 941907 2019-12-18T08:55:29Z 2019-12-18T08:55:29Z NONE

meta should be passed to blockwise through _apply_blockwise with default None (I think) and np.ndarray if vectorize is True. You'll have to pass the vectorize kwarg down to this level I think.

I'm afraid that passing meta=None will not help, as explained in https://github.com/dask/dask/issues/5642 and seen around this line, because in that case compute_meta will be called, which might fail with a np.vectorize-wrapped function. I believe a better solution would be to address https://github.com/dask/dask/issues/5642 so that meta isn't computed when we already provide an output dtype.
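To see why compute_meta can fail here, note that np.vectorize refuses size-0 inputs unless otypes is given, and calling the function on size-0 inputs is exactly how dask computes meta:

```python
import numpy as np

vf = np.vectorize(lambda a: a + 1)  # note: no otypes given
try:
    vf(np.empty((0,)))
except ValueError as err:
    print(err)  # cannot call `vectorize` on size 0 inputs unless `otypes` is set
```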

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
565186199 https://github.com/pydata/xarray/issues/3574#issuecomment-565186199 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NTE4NjE5OQ== smartass101 941907 2019-12-12T21:04:33Z 2019-12-12T21:04:33Z NONE

The problem is that Dask, as of version 2.0, calls functions applied to dask arrays with size zero inputs, to figure out the output array type, e.g., is the output a dense numpy.ndarray or a sparse array?

Yes, now I recall that this was the issue. It doesn't even really depend on your actual data.

Possible option 3. is to address https://github.com/dask/dask/issues/5642 directly (haven't found time to do a PR yet). Essentially from the code described in that issue I have the feeling that if a dtype is passed (as apply_ufunc does), then meta should not need to be calculated.
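A sketch of that idea (hypothetical; not dask's actual code path): when a dtype is already known, meta could be constructed directly instead of being computed by calling the function on empty inputs:

```python
import numpy as np

def meta_from_dtype(dtype, ndim):
    # Build an empty array of the right dtype and dimensionality to serve
    # as meta, avoiding the call to the user function entirely.
    return np.empty((0,) * ndim, dtype=dtype)
```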

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
564934693 https://github.com/pydata/xarray/issues/3574#issuecomment-564934693 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU2NDkzNDY5Mw== smartass101 941907 2019-12-12T09:57:18Z 2019-12-12T09:57:28Z NONE

Sounds similar. But I'm not sure why you get the 0-d issue, since (from a quick reading) your chunks don't seem to have size 0 in any of the dimensions. Could you please show us the resulting chunk setup?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
558616375 https://github.com/pydata/xarray/issues/3574#issuecomment-558616375 https://api.github.com/repos/pydata/xarray/issues/3574 MDEyOklzc3VlQ29tbWVudDU1ODYxNjM3NQ== smartass101 941907 2019-11-26T12:56:47Z 2019-11-26T12:56:47Z NONE

Another approach would be to bypass compute_meta in dask.blockwise if dtype is provided, which seems to be hinted at here:

https://github.com/dask/dask/blob/3960c6518318f2417658c2fc47cd5b5ece726f8b/dask/array/blockwise.py#L234

Perhaps this is an oversight in dask; what do you think?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc with dask='parallelized' and vectorize=True fails on compute_meta 528701910
430946620 https://github.com/pydata/xarray/issues/1471#issuecomment-430946620 https://api.github.com/repos/pydata/xarray/issues/1471 MDEyOklzc3VlQ29tbWVudDQzMDk0NjYyMA== smartass101 941907 2018-10-18T09:48:20Z 2018-10-18T09:48:20Z NONE

I indeed often resort to using a pandas.MultiIndex, but especially the dropping of the selected coordinate value (#1408) makes it quite inconvenient.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sharing dimensions across dataarrays in a dataset 241290234
430324391 https://github.com/pydata/xarray/issues/1471#issuecomment-430324391 https://api.github.com/repos/pydata/xarray/issues/1471 MDEyOklzc3VlQ29tbWVudDQzMDMyNDM5MQ== smartass101 941907 2018-10-16T17:24:42Z 2018-10-16T17:46:17Z NONE

I've hit this design limitation quite often as well, with several use-cases, both in experiment and simulation. It detracts from xarray's power of conveniently and transparently handling coordinate meta-data. From the Why xarray? page:

with xarray, you don’t need to keep track of the order of arrays' dimensions or insert dummy dimensions

Adding effectively dummy dimensions or coordinates is essentially what this alignment design is forcing us to do.

A possible solution would be something like having (some) coordinate arrays in an (Unaligned)Dataset be a "reducible" MultiIndex (it would reduce to an Index for each DataArray). A workaround is to use MultiIndex coordinates directly, but then alignment cannot be done easily, as levels do not behave like real dimensions.

Use-cases examples:

1. coordinate "metadata"

I often have measurements on related axes, but also with additional coordinates (different positions, etc.). Consider:

```python
import numpy as np
import xarray as xr

n1 = np.arange(1, 22)
m1 = xr.DataArray(n1 * 0.5, coords={'num': n1, 'B': 'r', 'ar': 'A'},
                  dims=['num'], name='MA')
n2 = np.arange(2, 21)
m2 = xr.DataArray(n2 * 0.5, coords={'num': n2, 'B': 'r', 'ar': 'B'},
                  dims=['num'], name='MB')
ds = xr.merge([m1, m2])
print(ds)
```

What I would like to get (pseudocode):

```
<xarray.Dataset>
Dimensions:  (num: 21, ar: 2)  # <-- note that MB is still of dims {'num': 19} only
Coordinates:  # <-- mostly unions as done by concat
  * num      (num) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
    B        <U1 'r'
  * ar       <U1 'A' 'B'  # <-- this is now a dim of the dataset, but not of MA or MB
Data variables:
    MA       (num) float64 0.5 1.0 1.5 2.0 2.5 3.0 ... 8.0 8.5 9.0 9.5 10.0 10.5
    MB       (num) float64 1.0 1.5 2.0 2.5 3.0 3.5 ... 7.5 8.0 8.5 9.0 9.5 10.0
```

Instead I get:

```
MergeError: conflicting values for variable 'ar' on objects to be combined:
first value: <xarray.Variable ()>
array('A', dtype='<U1')
second value: <xarray.Variable ()>
array('B', dtype='<U1')
```

While it is possible to concat into something with dimensions (num, ar, B), it often results in huge arrays where most values are nan. I could also store the "position" metadata as attrs, but that pretty much defeats the point of using xarray, namely having such metadata transparently be part of the coordinates. Also, sometimes I would like to select arrays from the dataset at a given location, e.g. Dataset.sel(ar='B').

2. unaligned time domains

This is a large problem especially when different time bases are involved. A difference in sampling intervals will blow up the storage with a huge number of nan values, which of course greatly complicates further calculations, e.g. filtering in the time domain. Even just non-overlapping time intervals will require at least double the storage.

I often find myself resorting to pandas.MultiIndex instead, which gladly manages such non-aligned coordinates while still enabling slicing and selection on the various levels. So it can be done, and the pandas code and functionality already exist.
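For illustration, the MultiIndex workaround could look like this (invented data; the 'ar' and 'num' levels mirror the example above):

```python
import numpy as np
import pandas as pd

# Two non-aligned signals share one index with an extra 'ar' level, so
# per-level selection works without nan padding.
idx = pd.MultiIndex.from_tuples(
    [('A', n) for n in range(1, 22)] + [('B', n) for n in range(2, 21)],
    names=['ar', 'num'])
s = pd.Series(np.arange(len(idx), dtype=float), index=idx)
print(s.xs('B', level='ar').head())  # select one signal by level
```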

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sharing dimensions across dataarrays in a dataset 241290234
261496603 https://github.com/pydata/xarray/issues/1130#issuecomment-261496603 https://api.github.com/repos/pydata/xarray/issues/1130 MDEyOklzc3VlQ29tbWVudDI2MTQ5NjYwMw== smartass101 941907 2016-11-18T10:16:17Z 2016-11-18T10:16:17Z NONE

... rather not adjust DataArray.pipe, which is intentionally very simple.

It would be just one extra call to a function which is itself very simple. As I commented in #1074, I think it makes more sense to have pipe wrap arrays, because otherwise the pipe chain might get broken, whereas with apply I'd be fine with it behaving like the Python apply function, which simply applies a function and nothing more. But maybe_wrap_array is simple and does not break anything; it wraps the result only when it's safe.
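A rough sketch of that "wrap only when safe" behaviour (not xarray's actual maybe_wrap_array source; the helper name here is illustrative):

```python
import numpy as np
import xarray as xr

def maybe_wrap(original: xr.DataArray, result):
    # Re-wrap only when it is safe, i.e. the result still matches the
    # original's shape; otherwise return the raw result unchanged.
    if isinstance(result, np.ndarray) and result.shape == original.shape:
        return original.copy(data=result)
    return result
```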

In the unlikely event your function takes a dim argument

I think that could be quite likely as one might want to apply a DataArray-compatible function. This would force users to remember which type of "function applier" to use for a given function and might be confusing.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  pipe, apply should call maybe_wrap_array, possibly resolve dim->axis 189998469
261495282 https://github.com/pydata/xarray/issues/1074#issuecomment-261495282 https://api.github.com/repos/pydata/xarray/issues/1074 MDEyOklzc3VlQ29tbWVudDI2MTQ5NTI4Mg== smartass101 941907 2016-11-18T10:09:48Z 2016-11-18T10:09:48Z NONE

Actually, I think that pipe should default to raw=False, if there is even to be such a parameter. The reason is that one usually uses pipe to chain together functions, each of which usually expects a DataArray, and "downcasting" to ndarray often breaks such a chain. If you insist on having one method behave as if raw=True, then I think it should be apply, in order to be consistent with the Python apply function, which simply applies the function and nothing more.
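For illustration, a pipe chain stays in DataArray-land as long as each step returns a DataArray (ufuncs preserve the wrapper):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(3.0), dims='x')
out = da.pipe(np.square).pipe(lambda a: a + 1)
print(type(out))  # still an xarray.DataArray, so the chain is not broken
```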

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.apply is missing 186868181
261223890 https://github.com/pydata/xarray/issues/1074#issuecomment-261223890 https://api.github.com/repos/pydata/xarray/issues/1074 MDEyOklzc3VlQ29tbWVudDI2MTIyMzg5MA== smartass101 941907 2016-11-17T11:29:38Z 2016-11-17T11:29:38Z NONE

I think #1130 is related. I also think that apply is somewhat synonymous with pipe and is a lot more understandable for people without a pandas background. It would also be more consistent to have them both named the same on both Dataset and DataArray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.apply is missing 186868181
261200107 https://github.com/pydata/xarray/issues/1080#issuecomment-261200107 https://api.github.com/repos/pydata/xarray/issues/1080 MDEyOklzc3VlQ29tbWVudDI2MTIwMDEwNw== smartass101 941907 2016-11-17T09:41:18Z 2016-11-17T09:41:18Z NONE

Thank you for continuing this discussion even though you didn't agree with the initial proposal. I have accepted and embraced option 3) as it is indeed about the cleanest and most readable option.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  acccessor extending approach limits functional programming approach, make direct monkey-patching also possible 187373423
260116620 https://github.com/pydata/xarray/issues/1080#issuecomment-260116620 https://api.github.com/repos/pydata/xarray/issues/1080 MDEyOklzc3VlQ29tbWVudDI2MDExNjYyMA== smartass101 941907 2016-11-12T11:28:02Z 2016-11-12T11:28:02Z NONE

Code is read in text form more often than it is interactively explored.

Good point, in that case explicit namespacing indeed helps.

At Google, our Python style guide actually prohibits writing import like from xarray import Dataset. You have to write import xarray or import xarray as xr and always use the namespace.

A module-level namespace has nothing to do with the class namespace, but I see you are trying to tie them together, which makes sense in relation to the argument about reading code in text form. However, that may not be obvious to Python programmers, since those namespaces are not actually tied in reality, so it would be better to mention this in the docs. BTW, if you are enforcing some specific style guide, please note that in the docs as well. And I hope you strike the right balance between style complacency and universality.

xarray objects are already non-threadsafe, in the same way that the built-in list and dict are not threadsafe. I don't see how caching attributes changes this. You can choose whether or not to save state on the accessor (and of course, generally it would be better not to).

My problem with accessors lies less in thread safety and more in the apply, map, ... paradigms, which accessors don't fit into.

Finally, I'll note that we also have the .pipe method (e.g., array.pipe(square)), so if you just want functions that you can call with method chaining syntax, you don't even need to write an accessor at all.

That is indeed a good alternative, just not sure my colleagues will like the transition from sig.lowpass(0.2).multiply(3) to sig.pipe(xdsp.lowpass, 0.2).pipe(np.multiply, 3). A benefit of pipe is that methods can be tab-completed from namespaces (useful for interactive usage) and that any compatible function can be used, not just registered methods. Perhaps I will suggest DataArray.__call__ = DataArray.pipe (maybe that could be added in xarray ? should I make an issue for that?) which would make it quite convenient to write only sig(xdsp.lowpass, 0.2)(np.multiply, 3) which is almost the same in terms of chars written and has quite clear syntax (calling a signal with a function argument applies the function to it).
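The suggested monkey-patch in runnable form (it works, though patching a library class like this is exactly what the accessor design tries to avoid):

```python
import numpy as np
import xarray as xr

xr.DataArray.__call__ = xr.DataArray.pipe  # calling now means piping

sig = xr.DataArray(np.arange(4.0), dims='t')
print(sig(np.multiply, 3))  # equivalent to sig.pipe(np.multiply, 3)
```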

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  acccessor extending approach limits functional programming approach, make direct monkey-patching also possible 187373423
258702758 https://github.com/pydata/xarray/issues/1080#issuecomment-258702758 https://api.github.com/repos/pydata/xarray/issues/1080 MDEyOklzc3VlQ29tbWVudDI1ODcwMjc1OA== smartass101 941907 2016-11-06T19:09:43Z 2016-11-06T19:09:43Z NONE

The namespace argument doesn't seem very convincing, since you already implement many methods which may shadow variables (mean, diff). By limiting control over the namespace you make some uses somewhat inconvenient. If you want users to use DataArray as a general, universal, and also extensible container, limiting its namespace goes against that. If they shadow variables with their methods, that's their decision to make.

While it may seem cleaner to have a stricter API, in real use cases users care more about convenient code access than where it came from. And when they look at the method object it will clearly tell them where it was defined. Python's introspection capabilities are powerful enough that users can find out such information.

What I meant by point 2 was that in many cases one just needs a simple method, and with the accessor approach one has to write extra lines of code, like the ones you suggested earlier, that may later seem cryptic. Caching of the accessor can indeed be useful, just not always. If you want people to develop plugins, make it as simple as possible, and yet also advanced for those who require it. And then there's also the problem of accessors not being usable in functional programming paradigms.

Tl;dr: accessors have benefits (namespace containment, caching) but also limitations (not functional paradigm, overkill sometimes). Give users more control over methods and you'll get more plugins.

On November 6, 2016 2:22:44 PM GMT+01:00, Stephan Hoyer notifications@github.com wrote:

Is it because of namespace growth/conflicts? There are already many methods like diff, any which don't seem particularly more important than others. For instance, ndarray has no diff method yet you implement it.

Indeed. My thinking was that xarray.Dataset and xarray.DataArray are in the "xarray" namespace. We allow you to register an extension namespace, but want to keep it well contained and under one attribute, so it's clear(er) to users and developers what is going on, and where the code comes from.

A stricter approach would have been to put everything under an attribute just for extensions, e.g., Dataset.x.namespace instead of Dataset.namespace, but this gets even more cumbersome -- and also conflicts with variables named x!

Could you please give some clear arguments why you discourage the use of normal methods? The two arguments listed in the docs don't really make a compelling case against method monkey-patching, because 1. name clashes can easily be checked for in either approach (in either case you just check for the existence of a class attribute)

I'll add a note about the value of namespaces to the docs.

  2. caching on the dataset sometimes makes no sense and just adds redundancy, complicating the design and registration of extra functionality

We could certainly turn this off (optionally) if there are cases where it does the wrong thing. Could you go into this in a little more detail, perhaps with a concrete example? My expectation was that this should have minimal design or performance downsides.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  acccessor extending approach limits functional programming approach, make direct monkey-patching also possible 187373423
258690368 https://github.com/pydata/xarray/issues/1082#issuecomment-258690368 https://api.github.com/repos/pydata/xarray/issues/1082 MDEyOklzc3VlQ29tbWVudDI1ODY5MDM2OA== smartass101 941907 2016-11-06T16:02:36Z 2016-11-06T16:02:36Z NONE

I vote for warning by default. Raising an error brings more inconvenience than it's worth. Remember to warn each time, not just on the first code run.

On November 6, 2016 2:11:54 PM GMT+01:00, Stephan Hoyer notifications@github.com wrote:

On the mailing list, @rabernat wrote:

Also, how can I interactively develop an accessor? If I try to re-register under the same name, I get the error AccessorRegistrationError: cannot register accessor <class '__main__.ExchAccessor'> under name 'exch' for type <class 'xarray.core.dataset.Dataset'> because an attribute with that name already exists.

In #1080, @smartass101 suggests:

Btw, perhaps it might be better to (perhaps optionally) issue a warning when overriding an existing class attribute during registering instead of completely refusing to do so.

I think this is a good idea, and would nicely solve @rabernat's problem (which might be your problem, too). We could add a new keyword argument (e.g., allow_override=True or warn=True to register_*_accessor) which switches to this new mode.

Should it be the default behavior? It is also possible that warnings instead of errors are enough in general.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Issue a warning when overwriting attributes with accessors instead of erroring 187560717
258623314 https://github.com/pydata/xarray/issues/1080#issuecomment-258623314 https://api.github.com/repos/pydata/xarray/issues/1080 MDEyOklzc3VlQ29tbWVudDI1ODYyMzMxNA== smartass101 941907 2016-11-05T16:41:07Z 2016-11-05T16:41:07Z NONE

Thank you for your response.

I still don't understand why you are pushing accessors in place of methods to such an extent. Is it because of namespace growth/conflicts? There are already many methods like diff, any which don't seem particularly more important than others. For instance, ndarray has no diff method yet you implement it.

While the solutions you presented are usable, they seem like workarounds: somewhat redundant, or adding extra overhead (in terms of writing code). Registering extra dataset accessors where a DataArray method application would do seems, again, redundant.

I would definitely discourage writing too many of such methods, though.

Could you please give some clear arguments why you discourage the use of normal methods? The two arguments listed in the docs don't really make a compelling case against method monkey-patching, because:

1. name clashes can easily be checked for in either approach (in either case you just check for the existence of a class attribute);
2. caching on the dataset sometimes makes no sense and just adds redundancy, complicating the design and registration of extra functionality.

I'm not trying to say that the accessor approach is wrong; I'm sure it makes sense for certain plugins. I'm just trying to share my experience with a very similar case where the simpler method approach turned out to be satisfactory, and I think enabling it would increase the chances of more xarray plugins (which may not need accessor logic) coming to life.

Btw, perhaps it might be better to (perhaps optionally) issue a warning when overriding an existing class attribute during registering instead of completely refusing to do so.
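A sketch of such warn-instead-of-raise registration (a hypothetical decorator; xarray's register_*_accessor machinery differs, e.g. it also caches the accessor instance):

```python
import warnings

def register_accessor(name, cls):
    def decorator(accessor):
        if hasattr(cls, name):
            # Warn on every re-registration instead of refusing outright.
            warnings.warn(
                f"registering accessor {accessor!r} under name {name!r} "
                f"overrides an existing attribute", UserWarning)
        setattr(cls, name, property(lambda self: accessor(self)))
        return accessor
    return decorator
```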

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  acccessor extending approach limits functional programming approach, make direct monkey-patching also possible 187373423

Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);