issue_comments

9 rows where issue = 168470276 ("align() and broadcast() before concat()"), sorted by updated_at descending

Suggested facets: created_at (date), updated_at (date)

user (2): crusaderky (5), shoyer (4)
issue (1): align() and broadcast() before concat() (9)
author_association (1): MEMBER (9)
shoyer · 2016-08-21T01:00:27Z
https://github.com/pydata/xarray/issues/927#issuecomment-241232663

Fixed by #963

crusaderky · 2016-08-11T20:57:00Z
https://github.com/pydata/xarray/issues/927#issuecomment-239289914

Finished first complete part, ready for merging: https://github.com/pydata/xarray/pull/963

Note: there is a failing unit test, TestDataset.test_broadcast_nocopy, which shows that broadcast on a Dataset performs a data copy when it shouldn't. Could you look into it?

I'm now moving to auto-calling broadcast inside concat()...
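The failing no-copy check can be illustrated with a small sketch (the real test lives in xarray's test suite; the names and shapes here are illustrative, not the actual test code):

```python
import numpy as np
import xarray as xr

# 'a' already carries both dimensions, so broadcasting it should be a
# no-op that does not need to copy the underlying data.
a = xr.DataArray(np.arange(4.0).reshape(2, 2), dims=["x", "y"])
b = xr.DataArray(np.arange(2.0), dims=["x"])

a2, b2 = xr.broadcast(a, b)

# b gains the missing 'y' dimension; a keeps its original shape.
print(dict(b2.sizes))
# The property under discussion: whether a2 still shares memory with a.
shares = np.shares_memory(a2.values, a.values)
```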

shoyer · 2016-08-07T23:14:44Z
https://github.com/pydata/xarray/issues/927#issuecomment-238114831

First of all -- thanks for diving into this! You are really getting into the messy guts here, so thanks for sticking with it.

> I can't put any sense in the skip_single_target hack or in the whole special treatment inside align() for when there is only one arg. What's the benefit of the whole thing? What would happen if we simply removed the special logic?

I actually just added this hack in a few days ago to fix a regression in the v0.8.0 release (https://github.com/pydata/xarray/issues/943). The problem is that __setitem__ assignment in xarray does automatic alignment, but arrays with non-unique indexes cannot be aligned. This meant that it was impossible to assign to an array with a non-unique index, even if the new array did not have any labels.

I agree that the way I fixed this is really messy. It would be better simply not to align as part of merge if there is only a single labeled argument, but it wasn't obvious how to do last week when I was rushing to get the bug fix out (the logic is currently buried in deep_align and needs to be extracted).

I'll take a look now and see if I can come up with a cleaner fix, but for now I would stick with a separate internal only private_align that allows the extra argument.

> DataArray.reindex(copy=False) still performs a copy, even if there's nothing to do. I'm a bit afraid to go and fix it right now as I don't want to trigger domino effects

One issue here is that the purpose and functionality of the copy argument is currently poorly documented. It should say something like:

If `copy=True`, the return value is independent of the input. If `copy=False` and reindexing is unnecessary, or can be performed with only slice operations, then the data may include references to arrays also found in the input.

(I'll update this soon.)

We should not be making any promises about returning exactly the same input object; in fact, for consistency we should (as we currently do) always return a new Dataset or DataArray, because constructing these objects is pretty cheap. Rather, the issue is whether we need to copy the numpy.ndarray input, which can be very expensive if the array is large.
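A minimal sketch of the proposed semantics, assuming reindexing onto labels the array already has (the target labels here are illustrative):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4.0), dims="x", coords={"x": [0, 1, 2, 3]})

# Reindexing onto matching labels: a new object is always returned...
out = da.reindex(x=[0, 1, 2, 3], copy=False)
assert out is not da

# ...but with copy=False the result's data may reference the input's
# arrays rather than an independent copy, per the docstring above.
```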

> I'm experiencing a lot of grief because assertDatasetIdentical expects both the coords and the data vars to have the same order, which in some situations it's simply impossible to control without touching vast areas of your code.

This shouldn't be true. assertDatasetIdentical calls the Dataset.identical method, which doesn't care about order:

```
In [29]: from collections import OrderedDict

In [30]: ds1 = xr.Dataset(OrderedDict([('x', 0), ('y', 1)]))

In [31]: ds2 = xr.Dataset(OrderedDict([('y', 1), ('x', 0)]))

In [32]: ds1
Out[32]:
<xarray.Dataset>
Dimensions:  ()
Coordinates:
    empty
Data variables:
    x        int64 0
    y        int64 1

In [33]: ds2
Out[33]:
<xarray.Dataset>
Dimensions:  ()
Coordinates:
    empty
Data variables:
    y        int64 1
    x        int64 0

In [34]: ds1.identical(ds2)
Out[34]: True
```

> As a more general and fundamental point, I cannot understand what's the benefit of using OrderedDict instead of a plain dict for coords, attrs, and Dataset.data_vars?

This is something we considered before (https://github.com/pydata/xarray/issues/75). I think the main advantage is that OrderedDict is more intuitive (for people) and makes it easier to see at a glance how data changes.


One other thing to note: concat already handles broadcasting for each variable (independently) with ensure_common_dims. This would definitely be handled more cleanly with support for an exclude argument to broadcast_variables. It wouldn't work to use the main broadcast function for this because it would broadcast unnecessary variables when concatenating datasets.

crusaderky · 2016-08-07T13:19:12Z
https://github.com/pydata/xarray/issues/927#issuecomment-238081972

Started working on it. Still a lot of cleanup to be done. https://github.com/crusaderky/xarray/commit/8830437865c43e472101ca91a12f714ee43546cc

Observations:

- I can't put any sense in the skip_single_target hack or in the whole special treatment inside align() for when there is only one arg. What's the benefit of the whole thing? What would happen if we simply removed the special logic?
- DataArray.reindex(copy=False) still performs a copy, even if there's nothing to do. I'm a bit afraid to go and fix it right now as I don't want to trigger domino effects
- I'm experiencing a lot of grief because assertDatasetIdentical expects both the coords and the data vars to have the same order, which in some situations it's simply impossible to control without touching vast areas of your code. As a more general and fundamental point, I cannot understand what's the benefit of using OrderedDict instead of a plain dict for coords, attrs, and Dataset.data_vars?

shoyer · 2016-08-02T14:21:06Z
https://github.com/pydata/xarray/issues/927#issuecomment-236919314

Awesome! We usually stick to outer joins by default so we don't inadvertently drop data. If datasets are already aligned, calling align with copy=False should have minimal performance costs.

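A minimal sketch of that point: when the inputs already share an index, an outer join changes nothing, and copy=False lets align just wrap the existing data (the arrays below are illustrative):

```python
import numpy as np
import xarray as xr

x = xr.DataArray(np.arange(3.0), dims="t", coords={"t": [0, 1, 2]})
y = xr.DataArray(np.arange(3.0) * 10, dims="t", coords={"t": [0, 1, 2]})

# Already aligned: the outer join is a no-op, and copy=False avoids
# duplicating the underlying arrays.
x2, y2 = xr.align(x, y, join="outer", copy=False)
assert x2.identical(x) and y2.identical(y)
```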

crusaderky · 2016-08-02T07:20:13Z
https://github.com/pydata/xarray/issues/927#issuecomment-236822408

I can work on it. Should we go for a default outer join + broadcast inside concat? If the input already arrives aligned, or if the user wants a different type of join, this will slow things down with unnecessary work, though. What's your policy on this?

shoyer · 2016-08-02T01:43:45Z
https://github.com/pydata/xarray/issues/927#issuecomment-236767844

Indeed, we could probably replace align with partial_align, and use this inside concat (see #930 for an example of that). This is probably worth doing.

I didn't add exclude to align before mostly because I wasn't sure if the functionality would be useful to users, and I wanted to avoid making the mistake of expanding the API prematurely (it's harder to remove features than add them). Also, I didn't write tests or a good docstring for partial_align :).

crusaderky · 2016-08-01T10:26:41Z (edited 2016-08-01T10:40:27Z)
https://github.com/pydata/xarray/issues/927#issuecomment-236544958

I'm now facing the equivalent problem with broadcast(). I need to concat 3 arrays:

```python
a = xarray.DataArray([[1, 2], [3, 4]], dims=['time', 'scenario'],
                     coords={'time': ['t1', 't2']})
b = xarray.DataArray([5, 6], dims=['time'], coords={'time': ['t3', 't4']})
c = xarray.DataArray(7, coords={'time': 't5'})
```

I want to broadcast dimension 'scenario' and concatenate on dimension 'time'. However, I can't invoke broadcast() because:

1. a and b have misaligned time
2. c has a time coord, but not a time dim; if you broadcast a and c, you will get incorrect results

Again the solution would be to have an optional parameter exclude=['time']
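For reference, this is essentially the API that later landed: current xarray's broadcast() accepts an exclude argument (the sketch below assumes a reasonably recent xarray):

```python
import xarray as xr

a = xr.DataArray([[1, 2], [3, 4]], dims=["time", "scenario"],
                 coords={"time": ["t1", "t2"]})
b = xr.DataArray([5, 6], dims=["time"], coords={"time": ["t3", "t4"]})

# Excluding 'time' skips alignment and broadcasting on that dimension:
# the misaligned time coordinates are left untouched instead of being
# outer-joined and NaN-filled, while 'scenario' is still broadcast.
a2, b2 = xr.broadcast(a, b, exclude=["time"])
assert "scenario" in b2.dims and b2.sizes["scenario"] == 2
```

After this, each piece can be concatenated along 'time' as intended.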

crusaderky · 2016-08-01T10:10:53Z
https://github.com/pydata/xarray/issues/927#issuecomment-236541952

Just found xarray.core.alignment.partial_align(), which does exactly that. Not sure why the functionality is not part of the public API? Looks like the solution is to simply merge partial_align() into align()?
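That is indeed roughly how things ended up: the public align() today accepts an exclude argument (a sketch assuming a recent xarray; the arrays are illustrative):

```python
import xarray as xr

a = xr.DataArray([1, 2], dims="time", coords={"time": ["t1", "t2"]})
b = xr.DataArray([3, 4], dims="time", coords={"time": ["t3", "t4"]})

# With the default join='inner', the disjoint time labels would align
# to zero-length arrays; excluding 'time' leaves both inputs untouched.
a2, b2 = xr.align(a, b, exclude=["time"])
assert a2.identical(a) and b2.identical(b)
```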


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette