html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/25#issuecomment-54416883,https://api.github.com/repos/pydata/xarray/issues/25,54416883,MDEyOklzc3VlQ29tbWVudDU0NDE2ODgz,1217238,2014-09-04T06:50:49Z,2014-09-04T06:50:49Z,MEMBER,"I'm going to close this issue as fixed, but feel free to complain if you feel otherwise (particularly if you have ideas for how we should improve this).
The rule that we seem to have settled on is that xray will either drop _all_ attributes if the result could be ambiguous, or, if there is a clear priority, it will only keep around attributes from the first object. The one firm rule is that xray does not do any checking of attributes for conflicts.
Unless `compat == 'identical'`, there's no checking for conflicts: operations either keep all attributes (mostly just subsetting/indexing) or drop them all. For some unary operations like `mean`, a `keep_attrs` option switches the default from ""drop"" to ""keep"". Binary mathematical operations like `*` always ""drop"".
In cases where there are two objects to combine but where the priority is clearer (e.g., in `concat` and `merge`), we'll preserve attributes from the first object and ignore the second. We use the same rule for in-place binary operations.
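A minimal pure-Python sketch of the rule above, modelling attributes as plain dicts. The helpers `index_attrs`, `reduce_attrs`, `binary_op_attrs` and `concat_attrs` are hypothetical and are not xray functions; the `compat` default shown is an assumption, and the real `compat='identical'` also compares the data, not just the attributes:
```python
# Attributes are modelled as plain dicts; all helper names are hypothetical.


def index_attrs(attrs: dict) -> dict:
    # Subsetting/indexing keeps every attribute unchanged.
    return dict(attrs)


def reduce_attrs(attrs: dict, keep_attrs: bool = False) -> dict:
    # Unary reductions such as mean drop attributes unless keep_attrs=True.
    return dict(attrs) if keep_attrs else {}


def binary_op_attrs(left: dict, right: dict) -> dict:
    # Binary math such as x * y always drops attributes; nothing is compared.
    return {}


def concat_attrs(first: dict, *others: dict, compat: str = 'equals') -> dict:
    # concat/merge (and in-place binary ops) keep the first object's
    # attributes and ignore the rest; only compat='identical' compares them.
    if compat == 'identical':
        for other in others:
            if other != first:
                raise ValueError('attributes differ under compat=identical')
    return dict(first)


t1 = {'units': 'K', 'title': 'run 1'}
t2 = {'units': 'degC', 'title': 'run 2'}
print(reduce_attrs(t1))                   # {}
print(reduce_attrs(t1, keep_attrs=True))  # {'units': 'K', 'title': 'run 1'}
print(binary_op_attrs(t1, t2))            # {}
print(concat_attrs(t1, t2))               # {'units': 'K', 'title': 'run 1'}
```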
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28376794
https://github.com/pydata/xarray/issues/25#issuecomment-42349463,https://api.github.com/repos/pydata/xarray/issues/25,42349463,MDEyOklzc3VlQ29tbWVudDQyMzQ5NDYz,1217238,2014-05-06T19:44:27Z,2014-05-06T19:44:27Z,MEMBER,"I think this has been _mostly_ resolved by the `identical` and `equals` methods and the corresponding `compat` option for `Dataset.merge` and `Dataset.concat`.
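Roughly, `equals` compares dimensions and values while `identical` also requires matching attributes. A minimal sketch of that distinction, using a hypothetical `Var` class rather than the real xray objects (which also compare coordinates):
```python
from dataclasses import dataclass, field


@dataclass
class Var:
    # Hypothetical stand-in for a labeled array: dims, values and attrs.
    dims: tuple
    values: list
    attrs: dict = field(default_factory=dict)

    def equals(self, other: 'Var') -> bool:
        # Same dimensions and values; attributes are ignored.
        return self.dims == other.dims and self.values == other.values

    def identical(self, other: 'Var') -> bool:
        # Like equals, but the attributes must match as well.
        return self.equals(other) and self.attrs == other.attrs


a = Var(('x',), [1, 2, 3], {'units': 'K'})
b = Var(('x',), [1, 2, 3], {'units': 'degC'})
print(a.equals(b))     # True
print(a.identical(b))  # False
```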
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28376794
https://github.com/pydata/xarray/issues/25#issuecomment-36194729,https://api.github.com/repos/pydata/xarray/issues/25,36194729,MDEyOklzc3VlQ29tbWVudDM2MTk0NzI5,1217238,2014-02-27T00:07:29Z,2014-02-27T00:07:29Z,MEMBER,"Stern warnings about conflicting attributes like units may be an appropriate compromise. But if we go that way, I would advocate for making units drop upon doing any mathematical operation. We could try to update units automatically (e.g., kg \* kg = kg^2), but that is tricky to always get right.
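A sketch of automatic unit updating, assuming units are represented as a hypothetical `{base_unit: exponent}` dict: multiplication adds exponents, and adding mismatched units emits a stern warning and drops the units. It deliberately ignores offset units such as Celsius, one reason this is hard to always get right:
```python
import warnings
from collections import Counter


def multiply_units(u1: dict, u2: dict) -> dict:
    # Multiplying adds exponents, so kg * kg -> {'kg': 2}.
    out = Counter(u1)
    out.update(u2)  # Counter.update adds counts instead of replacing them
    return {k: v for k, v in out.items() if v != 0}


def add_units(u1: dict, u2: dict):
    # Addition only makes sense for matching units; otherwise warn
    # loudly and drop the units rather than guess.
    if u1 != u2:
        warnings.warn(f'conflicting units {u1} and {u2}; dropping units')
        return None
    return dict(u1)


print(multiply_units({'kg': 1}, {'kg': 1}))  # {'kg': 2}
print(multiply_units({'m': 1}, {'m': -1}))   # {}
```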
Celsius, for example, is a pretty weird physical unit because of how it can take on negative values, so it actually makes a lot of sense to mostly use Kelvin instead (to which I can sensibly apply any math operation like times or minus). That doesn't mean that I want to store all my raw data in degrees K, though...
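The Celsius point in numbers: because 0 degC sits at 273.15 K, differences come out the same in both scales, but ratios only make physical sense in Kelvin. A small sketch with hypothetical helper names:
```python
KELVIN_OFFSET = 273.15  # 0 degC expressed in kelvin


def celsius_to_kelvin(t_c: float) -> float:
    return t_c + KELVIN_OFFSET


def kelvin_to_celsius(t_k: float) -> float:
    return t_k - KELVIN_OFFSET


# Raw data can stay in degC; convert only for the arithmetic.
t1_c, t2_c = 10.0, 20.0
# Differences agree in both scales because the offset cancels.
diff = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)
# Ratios do not: 20 degC is not twice as hot as 10 degC.
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)
print(t2_c / t1_c)        # 2.0
print(round(ratio_k, 4))  # 1.0353
```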
On Wed, Feb 26, 2014 at 3:45 PM, ebrevdo notifications@github.com wrote:
> err, which _attributes_ conflict.
>
> On Wed, Feb 26, 2014 at 3:45 PM, Eugene Brevdo ebrevdo@gmail.com wrote:
>
> > I don't think that example has your intended effect. I don't know why
> > anyone would add something in units of kelvin to something in celsius. I
> > understand what you're saying, so maybe we should just throw a stern
> > warning listing which units conflict and how, every single time.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28376794
https://github.com/pydata/xarray/issues/25#issuecomment-36192859,https://api.github.com/repos/pydata/xarray/issues/25,36192859,MDEyOklzc3VlQ29tbWVudDM2MTkyODU5,1217238,2014-02-26T23:42:27Z,2014-02-26T23:42:27Z,MEMBER,"I see your point, but I favor a more pragmatic approach by default. See my fourth bullet under ""Design Goals"" in the README and bullet ii under Iris in ""Prior Art"".
My vision here is a more powerful ndarray enhanced rather than limited by metadata. This is closer to what pandas does, which even allows for conflicting indices resulting in NaN values (a feature I would love to copy).
I think that both use cases can be covered as long as the merge/conflict logic is clearly documented and it is possible to write stricter logic for library code (which by necessity will be more verbose). If it is essential for units to agree before doing `x + y`, you can add `assert x.attributes.get('units') == y.attributes.get('units')`. Otherwise, we will end up prohibiting operations like `x + y` when x has units of Celsius and y has units of Kelvin.
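A sketch of the stricter library-code pattern, using a hypothetical `Labeled` class in place of a `DatasetArray`. It raises `ValueError` instead of using `assert`, since assertions are skipped when Python runs with `-O`:
```python
from dataclasses import dataclass, field


@dataclass
class Labeled:
    # Hypothetical stand-in for a DatasetArray: values plus an attributes dict.
    values: list
    attributes: dict = field(default_factory=dict)


def strict_add(x: Labeled, y: Labeled) -> Labeled:
    # Library code that needs matching units checks them explicitly
    # instead of relying on the permissive default.
    ux = x.attributes.get('units')
    uy = y.attributes.get('units')
    if ux != uy:
        raise ValueError(f'units differ: {ux!r} vs {uy!r}')
    values = [a + b for a, b in zip(x.values, y.values)]
    # Keeping the (now verified) units on the result is a choice of this sketch.
    return Labeled(values, {'units': ux} if ux is not None else {})


x = Labeled([1.0, 2.0], {'units': 'K'})
y = Labeled([3.0, 4.0], {'units': 'K'})
print(strict_add(x, y).values)  # [4.0, 6.0]
```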
On Wed, Feb 26, 2014 at 3:23 PM, ebrevdo notifications@github.com wrote:
> Also, there are plenty of other bits where you _don't_ want conflicts.
> Imagine that you have variables indexed on different basemap projections.
> Creating exceptions to the rule seems like a bit of a rabbit hole.
>
> On Wed, Feb 26, 2014 at 3:13 PM, Eugene Brevdo ebrevdo@gmail.com wrote:
>
> > This is an option, but these lists will break if we try to express other
> > data formats using these conventions. For example, GRIB likely has other
> > conventions. We would have to overload attribute or variable depending on
> > what the underlying datastore is.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28376794
https://github.com/pydata/xarray/issues/25#issuecomment-36189171,https://api.github.com/repos/pydata/xarray/issues/25,36189171,MDEyOklzc3VlQ29tbWVudDM2MTg5MTcx,1217238,2014-02-26T23:03:06Z,2014-02-26T23:03:06Z,MEMBER,"`x + y` could indeed check variable attributes before trying to do the merge. I don't know if it does in the current implementation.
My concern is more that metadata like ""title"" or ""source"" should not be required to match, because that metadata will almost always be conflicting. Perhaps ""units"", ""_FillValue"", ""scale_factor"" and ""add_offset"" (if values were not automatically masked/scaled) should be specifically blacklisted to prohibit conflicts.
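A sketch of the proposed blacklist, assuming attributes are plain dicts: only the four named attributes are compared, and only when both sides define them, while keys such as title or source may differ freely. `check_critical_attrs` is a hypothetical name:
```python
# Attribute names taken from the comment above; treating exactly these four
# as must-match is an assumption of this sketch.
MUST_MATCH = {'units', '_FillValue', 'scale_factor', 'add_offset'}


def check_critical_attrs(a: dict, b: dict) -> None:
    # Only keys present on both sides can conflict; descriptive metadata
    # such as title or source is never compared.
    conflicts = sorted(k for k in MUST_MATCH if k in a and k in b and a[k] != b[k])
    if conflicts:
        raise ValueError(f'conflicting attributes: {conflicts}')


check_critical_attrs({'units': 'K', 'title': 'a'}, {'units': 'K', 'title': 'b'})
try:
    check_critical_attrs({'units': 'K'}, {'units': 'degC'})
except ValueError as err:
    print(err)  # conflicting attributes: ['units']
```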
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28376794
https://github.com/pydata/xarray/issues/25#issuecomment-36187723,https://api.github.com/repos/pydata/xarray/issues/25,36187723,MDEyOklzc3VlQ29tbWVudDM2MTg3NzIz,1217238,2014-02-26T22:47:54Z,2014-02-26T22:47:54Z,MEMBER,"`Dataset.merge` is also triggered by assigning a `DatasetArray` to a dataset or by doing a mathematical operation on two `DatasetArray`s (e.g., `x + y`). The latter is how I encountered this issue today.
For merge itself, I would agree that we may want to default to stricter behavior, but for these other versions of merge we should default to something more flexible.
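One possible reading of this proposal, as a hypothetical `merge_attrs` helper whose `strict` flag would be on for an explicit `Dataset.merge` and off for assignment or arithmetic. These defaults are assumptions for illustration, and they differ from the drop-all/keep-first rule the thread eventually settled on:
```python
def merge_attrs(first: dict, second: dict, strict: bool) -> dict:
    # Hypothetical helper: keys unique to either side are kept; keys whose
    # values disagree either raise (strict) or are dropped (flexible).
    merged = dict(first)
    for key, value in second.items():
        if key not in merged:
            merged[key] = value
        elif merged[key] != value:
            if strict:
                raise ValueError(f'conflicting values for attribute {key!r}')
            del merged[key]
    return merged


a = {'units': 'K', 'title': 'a'}
b = {'units': 'K', 'title': 'b'}
# Explicit Dataset.merge: strict by default (assumed).
# Assignment or x + y: flexible by default (assumed).
print(merge_attrs(a, b, strict=False))  # {'units': 'K'}
```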
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28376794