issue_comments: 238114831

html_url: https://github.com/pydata/xarray/issues/927#issuecomment-238114831
issue_url: https://api.github.com/repos/pydata/xarray/issues/927
id: 238114831
node_id: MDEyOklzc3VlQ29tbWVudDIzODExNDgzMQ==
user: 1217238
created_at: 2016-08-07T23:14:44Z
updated_at: 2016-08-07T23:14:44Z
author_association: MEMBER

First of all -- thanks for diving into this! You are really getting into the messy guts here, so thanks for sticking with it.

> I can't make any sense of the `skip_single_target` hack, or of the whole special treatment inside `align()` for when there is only one arg. What's the benefit of the whole thing? What would happen if we simply removed the special logic?

I actually just added this hack a few days ago to fix a regression in the v0.8.0 release (https://github.com/pydata/xarray/issues/943). The problem is that `__setitem__` assignment in xarray does automatic alignment, but arrays with non-unique indexes cannot be aligned. This meant that it was impossible to assign to an array with a non-unique index, even if the new array did not have any labels.
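The underlying limitation can be sketched in plain Python. This is a hypothetical helper (`align_positions`, not xarray's actual implementation): alignment maps each target label to a unique position in the source index, which is impossible when the source labels are duplicated.

```python
def align_positions(source_labels, target_labels):
    # Hypothetical sketch, not xarray's code: alignment needs each source
    # label to occur exactly once so that every target label maps to a
    # single, unambiguous position.
    positions = {}
    for i, label in enumerate(source_labels):
        if label in positions:
            raise ValueError("cannot align with a non-unique index: %r" % (label,))
        positions[label] = i
    return [positions[label] for label in target_labels]
```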

I agree that the way I fixed this is really messy. It would be better simply not to align as part of merge if there is only a single labeled argument, but it wasn't obvious how to do that last week when I was rushing to get the bug fix out (the logic is currently buried in `deep_align` and needs to be extracted).
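The cleaner fix might look something like this sketch (hypothetical `maybe_align` helper with plain dicts standing in for xarray objects; not the actual `deep_align` code): skip alignment entirely when at most one argument carries index labels, since there is nothing to align against.

```python
def maybe_align(objects, align):
    # Hypothetical sketch: with a single labeled argument there is nothing
    # to align against, so pass the objects through untouched. Only invoke
    # the (potentially failing) alignment when two or more objects carry
    # index labels.
    labeled = [obj for obj in objects if obj.get('indexes')]
    if len(labeled) <= 1:
        return list(objects)
    return align(objects)
```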

I'll take a look now and see if I can come up with a cleaner fix, but for now I would stick with a separate, internal-only `private_align` that allows the extra argument.

> `DataArray.reindex(copy=False)` still performs a copy, even if there's nothing to do. I'm a bit afraid to go and fix it right now, as I don't want to trigger domino effects.

One issue here is that the purpose and functionality of the `copy` argument are currently poorly documented. The docstring should say something like:

If `copy=True`, the return value is independent of the input. If `copy=False` and reindexing is unnecessary, or can be performed with only slice operations, then the data may include references to arrays also found in the input.

(I'll update this soon.)

We should not be making any promises about returning exactly the same input object, and in fact for consistency we should (as we currently do) always return a new Dataset or DataArray, because constructing these objects is pretty cheap. Rather, the issue is whether we need to copy `numpy.ndarray` input, which can be very expensive if the array is large.
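The documented contract can be illustrated with a hypothetical helper (`reindex_data` is made up for illustration; plain lists stand in for `numpy.ndarray`): the container is always new, but whether the data is shared depends on `copy` and on whether reindexing actually had to happen.

```python
def reindex_data(values, copy, needs_reindex):
    # Hypothetical sketch of the copy semantics described above. A new
    # wrapper object is always cheap to build; the expensive question is
    # whether the underlying *data* gets duplicated.
    if needs_reindex:
        # Reindexing builds new data regardless of the copy flag.
        return [values[i] for i in range(len(values))]
    if copy:
        return list(values)  # copy=True: result independent of the input
    return values            # copy=False: result may reference the input data
```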

> I'm experiencing a lot of grief because `assertDatasetIdentical` expects both the coords and the data vars to have the same order, which in some situations is simply impossible to control without touching vast areas of your code.

This shouldn't be true. `assertDatasetIdentical` calls the `Dataset.identical` method, which doesn't care about order:

```
In [29]: from collections import OrderedDict

In [30]: ds1 = xr.Dataset(OrderedDict([('x', 0), ('y', 1)]))

In [31]: ds2 = xr.Dataset(OrderedDict([('y', 1), ('x', 0)]))

In [32]: ds1
Out[32]:
<xarray.Dataset>
Dimensions:  ()
Coordinates:
    *empty*
Data variables:
    x        int64 0
    y        int64 1

In [33]: ds2
Out[33]:
<xarray.Dataset>
Dimensions:  ()
Coordinates:
    *empty*
Data variables:
    y        int64 1
    x        int64 0

In [34]: ds1.identical(ds2)
Out[34]: True
```

> As a more general and fundamental point, I cannot understand what the benefit is of using `OrderedDict` instead of a plain `dict` for coords, attrs, and `Dataset.data_vars`.

This is something we considered before (https://github.com/pydata/xarray/issues/75). I think the main advantage is that `OrderedDict` is more intuitive (for people) and makes it easier to see at a glance how data changes.
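This also explains the session above: the two Datasets print their variables in different orders because `OrderedDict` preserves insertion order, yet they still compare as identical. A standard-library check (plain Python behavior, not xarray code) makes the distinction concrete:

```python
from collections import OrderedDict

a = OrderedDict([('x', 0), ('y', 1)])
b = OrderedDict([('y', 1), ('x', 0)])

# Comparing two OrderedDicts is order-sensitive, which is why the
# two reprs above differ...
print(a == b)        # False

# ...but comparing the same mappings as plain dicts ignores order,
# matching the name-based comparison that identical() performs.
print(dict(a) == dict(b))  # True
```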


One other thing to note: `concat` already handles broadcasting for each variable (independently) with `ensure_common_dims`. This would definitely be handled more cleanly with support for an `exclude` argument to `broadcast_variables`. It wouldn't work to use the main `broadcast` function for this because it would broadcast unnecessary variables when concatenating datasets.
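What an `exclude` argument might look like, as a sketch (hypothetical `union_dims` helper operating on tuples of dimension names; not the real `ensure_common_dims` or `broadcast_variables`): collect the ordered union of dimensions across variables while leaving the excluded dimension alone, e.g. the concat dimension, along which each variable should keep its own length.

```python
def union_dims(variables_dims, exclude=()):
    # Hypothetical sketch: build the ordered union of dimension names
    # across variables, skipping excluded dims (such as the dimension
    # being concatenated along, which must not be broadcast).
    dims = []
    for var_dims in variables_dims:
        for d in var_dims:
            if d not in exclude and d not in dims:
                dims.append(d)
    return dims
```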
