html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/25#issuecomment-36196139,https://api.github.com/repos/pydata/xarray/issues/25,36196139,MDEyOklzc3VlQ29tbWVudDM2MTk2MTM5,1794715,2014-02-27T00:28:13Z,2014-02-27T00:28:13Z,CONTRIBUTOR,"Agreed. I would avoid that kind of thing too. Maybe a stern warning for all conflicting attributes, and saying that they will be dropped from the new variable. For units specifically, Python has a variety of unit libraries that wrap numpy arrays and can probably do some magic. Not sure if we really want to do that, though. On Wed, Feb 26, 2014 at 4:07 PM, Stephan Hoyer notifications@github.com wrote: > Stern warnings about conflicting attributes like units may be an > appropriate compromise. But if we go that way, I would advocate for making > units drop upon doing any mathematical operation. We could try to update > units automatically (e.g., kg \* kg = kg^2), but that is tricky to always get > right. > > Celsius, for example, is a pretty weird physical unit because of how it can > take on negative values, so it actually makes a lot of sense to mostly > use Kelvin instead (for which I can sensibly apply any math operation like > times or minus). That doesn't mean that I want to store all my raw data in > degrees K, though... > > On Wed, Feb 26, 2014 at 3:45 PM, ebrevdo notifications@github.com wrote: > > > err, which _attributes_ conflict. > > > > On Wed, Feb 26, 2014 at 3:45 PM, Eugene Brevdo ebrevdo@gmail.com > > wrote: > > > > > I don't think that example has your intended effect. I don't know why > > > anyone would add something of units kelvin with those of celsius. I > > > understand what you're saying, so maybe we should just throw a stern > > > warning listing which units conflict and how, every single time. 
> > > > > > On Wed, Feb 26, 2014 at 3:42 PM, Stephan Hoyer < > > > notifications@github.com > > > wrote: > > > > > > > I see your point, but I favor a more pragmatic approach by default. > > > > See > > > > my > > > > fourth bullet under ""Design Goals"" in the README and bullet ii under > > > > Iris > > > > in ""Prior Art"". > > > > > > > > My vision here is a more powerful ndarray enhanced rather than limited > > > > by > > > > metadata. This is closer to what pandas does, which even allows for > > > > conflicting indices resulting in NaN values (a feature I would love to > > > > copy). > > > > > > > > I think that both use cases can be covered as long as the > > > > merge/conflict > > > > logic is clearly documented and it is possible to write stricter logic > > > > for > > > > library code (which by necessity will be more verbose). If it is > > > > essential > > > > for units to agree before doing x + y, you can add `assert > > > > x.attributes.get('units') == y.attributes.get('units')`. Otherwise, we > > > > will > > > > end up prohibiting operations like that when x has units of Celsius > > > > and > > > > y > > > > has units of Kelvin. > > > > > > > > On Wed, Feb 26, 2014 at 3:23 PM, ebrevdo notifications@github.com > > > > wrote: > > > > > > > > > Also, there are plenty of other bits where you _don't_ want > > > > > conflicts. > > > > > Imagine that you have variables indexed on different basemap > > > > > projections. > > > > > Creating exceptions to the rule seems like a bit of a rabbit hole. > > > > > > > > > > On Wed, Feb 26, 2014 at 3:13 PM, Eugene Brevdo ebrevdo@gmail.com > > > > > wrote: > > > > > > > > > > > This is an option, but these lists will break if we try to express > > > > > > other > > > > > > data formats using these conventions. For example, grib likely has > > > > > > other > > > > > > conventions. We would have to overload attribute or variable > > > > > > depending on > > > > > > what the underlying datastore is. 
> > > > > > > > > > > > On Wed, Feb 26, 2014 at 3:03 PM, Stephan Hoyer < > > > > > > notifications@github.com > > > > > > wrote: > > > > > > > > > > > > > x + y could indeed check variable attributes before trying to do > > > > > > > the > > > > > > > merge. I don't know if it does in the current implementation. > > > > > > > > > > > > > > My concern is more that metadata like ""title"" or ""source"" should > > > > > > > not > > > > > > > be > > > > > > > required to match, because that metadata will almost always be > > > > > > > conflicting. > > > > > > > Perhaps ""units"", ""_FillValue"", ""scale_factor"" and ""add_offset"" > > > > > > > (if > > > > > > > > > > > > > > values were not automatically masked/scaled) should be > > > > > > > specifically > > > > > > > blacklisted to prohibit conflicts. > > > > > > > > > > > > > > ## > > > > > > > > > > > > > > Reply to this email directly or view it on GitHub< > > > > > > > https://github.com/akleeman/xray/issues/25#issuecomment-36189171> > > > > > > > . > > > > > > > > > > ## > > > > > > > > > > Reply to this email directly or view it on GitHub< > > > > > https://github.com/akleeman/xray/issues/25#issuecomment-36190935> > > > > > > > > > > . > > > > > > > > ## > > > > > > > > Reply to this email directly or view it on GitHub< > > > > https://github.com/akleeman/xray/issues/25#issuecomment-36192859> > > > > . > > > > ## > > > > Reply to this email directly or view it on GitHub< > > https://github.com/akleeman/xray/issues/25#issuecomment-36193148> > > > > . > > ## > > Reply to this email directly or view it on GitHub https://github.com/akleeman/xray/issues/25#issuecomment-36194729 > . 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28376794
https://github.com/pydata/xarray/issues/25#issuecomment-36193148,https://api.github.com/repos/pydata/xarray/issues/25,36193148,MDEyOklzc3VlQ29tbWVudDM2MTkzMTQ4,1794715,2014-02-26T23:45:57Z,2014-02-26T23:45:57Z,CONTRIBUTOR,"err, which _attributes_ conflict. On Wed, Feb 26, 2014 at 3:45 PM, Eugene Brevdo ebrevdo@gmail.com wrote: > I don't think that example has your intended effect. I don't know why > anyone would add something of units kelvin with those of celsius. I > understand what you're saying, so maybe we should just throw a stern > warning listing which units conflict and how, every single time. > > On Wed, Feb 26, 2014 at 3:42 PM, Stephan Hoyer notifications@github.com wrote: > > > I see your point, but I favor a more pragmatic approach by default. See my > > fourth bullet under ""Design Goals"" in the README and bullet ii under Iris > > in ""Prior Art"". > > > > My vision here is a more powerful ndarray enhanced rather than limited by > > metadata. This is closer to what pandas does, which even allows for > > conflicting indices resulting in NaN values (a feature I would love to > > copy). > > > > I think that both use cases can be covered as long as the merge/conflict > > logic is clearly documented and it is possible to write stricter logic for > > library code (which by necessity will be more verbose). If it is essential > > for units to agree before doing x + y, you can add `assert > > x.attributes.get('units') == y.attributes.get('units')`. Otherwise, we > > will > > end up prohibiting operations like that when x has units of Celsius and y > > has units of Kelvin. > > > > On Wed, Feb 26, 2014 at 3:23 PM, ebrevdo notifications@github.com > > wrote: > > > > > Also, there are plenty of other bits where you _don't_ want conflicts. > > > Imagine that you have variables indexed on different basemap > > > projections. 
> > > Creating exceptions to the rule seems like a bit of a rabbit hole. > > > > > > On Wed, Feb 26, 2014 at 3:13 PM, Eugene Brevdo ebrevdo@gmail.com > > > wrote: > > > > > > > This is an option, but these lists will break if we try to express > > > > other > > > > data formats using these conventions. For example, grib likely has > > > > other > > > > conventions. We would have to overload attribute or variable > > > > depending on > > > > what the underlying datastore is. > > > > > > > > On Wed, Feb 26, 2014 at 3:03 PM, Stephan Hoyer < > > > > notifications@github.com > > > > wrote: > > > > > > > > > x + y could indeed check variable attributes before trying to do the > > > > > merge. I don't know if it does in the current implementation. > > > > > > > > > > My concern is more that metadata like ""title"" or ""source"" should not > > > > > be > > > > > required to match, because that metadata will almost always be > > > > > conflicting. > > > > > Perhaps ""units"", ""_FillValue"", ""scale_factor"" and ""add_offset"" (if > > > > > > > > > > values were not automatically masked/scaled) should be specifically > > > > > blacklisted to prohibit conflicts. > > > > > > > > > > ## > > > > > > > > > > Reply to this email directly or view it on GitHub< > > > > > https://github.com/akleeman/xray/issues/25#issuecomment-36189171> > > > > > . > > > > > > ## > > > > > > Reply to this email directly or view it on GitHub< > > > https://github.com/akleeman/xray/issues/25#issuecomment-36190935> > > > > > > . > > > > ## > > > > Reply to this email directly or view it on GitHub https://github.com/akleeman/xray/issues/25#issuecomment-36192859 > > . 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28376794
https://github.com/pydata/xarray/issues/25#issuecomment-36193126,https://api.github.com/repos/pydata/xarray/issues/25,36193126,MDEyOklzc3VlQ29tbWVudDM2MTkzMTI2,1794715,2014-02-26T23:45:42Z,2014-02-26T23:45:42Z,CONTRIBUTOR,"I don't think that example has your intended effect. I don't know why anyone would add something of units kelvin with those of celsius. I understand what you're saying, so maybe we should just throw a stern warning listing which units conflict and how, every single time. On Wed, Feb 26, 2014 at 3:42 PM, Stephan Hoyer notifications@github.com wrote: > I see your point, but I favor a more pragmatic approach by default. See my > fourth bullet under ""Design Goals"" in the README and bullet ii under Iris > in ""Prior Art"". > > My vision here is a more powerful ndarray enhanced rather than limited by > metadata. This is closer to what pandas does, which even allows for > conflicting indices resulting in NaN values (a feature I would love to > copy). > > I think that both use cases can be covered as long as the merge/conflict > logic is clearly documented and it is possible to write stricter logic for > library code (which by necessity will be more verbose). If it is essential > for units to agree before doing x + y, you can add `assert > x.attributes.get('units') == y.attributes.get('units')`. Otherwise, we will > end up prohibiting operations like that when x has units of Celsius and y > has units of Kelvin. > > On Wed, Feb 26, 2014 at 3:23 PM, ebrevdo notifications@github.com wrote: > > > Also, there are plenty of other bits where you _don't_ want conflicts. > > Imagine that you have variables indexed on different basemap projections. > > Creating exceptions to the rule seems like a bit of a rabbit hole. 
> > > > On Wed, Feb 26, 2014 at 3:13 PM, Eugene Brevdo ebrevdo@gmail.com > > wrote: > > > > > This is an option, but these lists will break if we try to express > > > other > > > data formats using these conventions. For example, grib likely has > > > other > > > conventions. We would have to overload attribute or variable depending > > > on > > > what the underlying datastore is. > > > > > > On Wed, Feb 26, 2014 at 3:03 PM, Stephan Hoyer < > > > notifications@github.com > > > wrote: > > > > > > > x + y could indeed check variable attributes before trying to do the > > > > merge. I don't know if it does in the current implementation. > > > > > > > > My concern is more that metadata like ""title"" or ""source"" should not > > > > be > > > > required to match, because that metadata will almost always be > > > > conflicting. > > > > Perhaps ""units"", ""_FillValue"", ""scale_factor"" and ""add_offset"" (if > > > > > > > > values were not automatically masked/scaled) should be specifically > > > > blacklisted to prohibit conflicts. > > > > > > > > ## > > > > > > > > Reply to this email directly or view it on GitHub< > > > > https://github.com/akleeman/xray/issues/25#issuecomment-36189171> > > > > . > > > > ## > > > > Reply to this email directly or view it on GitHub< > > https://github.com/akleeman/xray/issues/25#issuecomment-36190935> > > > > . > > ## > > Reply to this email directly or view it on GitHub https://github.com/akleeman/xray/issues/25#issuecomment-36192859 > . ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28376794
https://github.com/pydata/xarray/issues/25#issuecomment-36190935,https://api.github.com/repos/pydata/xarray/issues/25,36190935,MDEyOklzc3VlQ29tbWVudDM2MTkwOTM1,1794715,2014-02-26T23:23:39Z,2014-02-26T23:23:39Z,CONTRIBUTOR,"Also, there are plenty of other bits where you _don't_ want conflicts. 
Imagine that you have variables indexed on different basemap projections. Creating exceptions to the rule seems like a bit of a rabbit hole. On Wed, Feb 26, 2014 at 3:13 PM, Eugene Brevdo ebrevdo@gmail.com wrote: > This is an option, but these lists will break if we try to express other > data formats using these conventions. For example, grib likely has other > conventions. We would have to overload attribute or variable depending on > what the underlying datastore is. > > On Wed, Feb 26, 2014 at 3:03 PM, Stephan Hoyer notifications@github.com wrote: > > > x + y could indeed check variable attributes before trying to do the > > merge. I don't know if it does in the current implementation. > > > > My concern is more that metadata like ""title"" or ""source"" should not be > > required to match, because that metadata will almost always be conflicting. > > Perhaps ""units"", ""_FillValue"", ""scale_factor"" and ""add_offset"" (if > > values were not automatically masked/scaled) should be specifically > > blacklisted to prohibit conflicts. > > > > ## > > > > Reply to this email directly or view it on GitHub https://github.com/akleeman/xray/issues/25#issuecomment-36189171 > > . ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28376794
https://github.com/pydata/xarray/issues/25#issuecomment-36190079,https://api.github.com/repos/pydata/xarray/issues/25,36190079,MDEyOklzc3VlQ29tbWVudDM2MTkwMDc5,1794715,2014-02-26T23:13:42Z,2014-02-26T23:13:42Z,CONTRIBUTOR,"This is an option, but these lists will break if we try to express other data formats using these conventions. For example, grib likely has other conventions. We would have to overload attribute or variable depending on what the underlying datastore is. On Wed, Feb 26, 2014 at 3:03 PM, Stephan Hoyer notifications@github.com wrote: > x + y could indeed check variable attributes before trying to do the > merge. 
I don't know if it does in the current implementation. > > My concern is more that metadata like ""title"" or ""source"" should not be > required to match, because that metadata will almost always be conflicting. > Perhaps ""units"", ""_FillValue"", ""scale_factor"" and ""add_offset"" (if values > were not automatically masked/scaled) should be specifically blacklisted to > prohibit conflicts. > > ## > > Reply to this email directly or view it on GitHub https://github.com/akleeman/xray/issues/25#issuecomment-36189171 > . ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28376794
https://github.com/pydata/xarray/issues/25#issuecomment-36188397,https://api.github.com/repos/pydata/xarray/issues/25,36188397,MDEyOklzc3VlQ29tbWVudDM2MTg4Mzk3,1794715,2014-02-26T22:55:10Z,2014-02-26T22:55:10Z,CONTRIBUTOR,"It depends on whether x+y does attribute checking before performing the merge. Again, if units don't match then maybe you shouldn't add. I always favor the strictest approach so you don't get strange surprises. On Wed, Feb 26, 2014 at 2:47 PM, Stephan Hoyer notifications@github.com wrote: > Dataset.merge is also triggered by assigning a DatasetArray to a dataset > or by doing a mathematical operation on two DatasetArrays (e.g., x + y). > The latter is how I encountered this issue today. > > For merge itself, I would agree that we may want to default to stricter > behavior, but for these other versions of merge we should default to > something more flexible. > > ## > > Reply to this email directly or view it on GitHub https://github.com/akleeman/xray/issues/25#issuecomment-36187723 > . 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28376794
https://github.com/pydata/xarray/issues/25#issuecomment-36186918,https://api.github.com/repos/pydata/xarray/issues/25,36186918,MDEyOklzc3VlQ29tbWVudDM2MTg2OTE4,1794715,2014-02-26T22:39:30Z,2014-02-26T22:39:30Z,CONTRIBUTOR,"I would default to 3, and in the exception suggest using a different merge option. Imagine merging two datasets with different _FillValue, unit, or compression attributes. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,28376794