issue_comments

6 rows where author_association = "MEMBER" and issue = 28376794 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
54416883 https://github.com/pydata/xarray/issues/25#issuecomment-54416883 https://api.github.com/repos/pydata/xarray/issues/25 MDEyOklzc3VlQ29tbWVudDU0NDE2ODgz shoyer 1217238 2014-09-04T06:50:49Z 2014-09-04T06:50:49Z MEMBER

I'm going to close this issue as fixed, but feel free to complain if you feel otherwise (particularly if you have ideas for how we should improve this).

The rule that we seem to have settled on is that xray will either drop all attributes if the result could be ambiguous, or, if there is a clear priority, it will only keep around attributes from the first object. The one firm rule is that xray does not do any checking of attributes for conflicts.

Unless compat == 'identical', there's no checking for conflicts: operations either keep all attributes (mostly just subsetting/indexing) or drop them all. For some unary operations like mean, the keep_attrs option switches the default from "drop" to "keep". Binary mathematical operations like * always drop attributes.

In cases where there are two objects to combine but where the priority is clearer (e.g., in concat and merge), we'll preserve attributes from the first object and ignore the second. We use the same rule for in-place binary operations.
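
The rules described above can be sketched with plain dictionaries. This is a minimal illustration, not xray's implementation: the function name `attrs_after` and the operation labels ("index", "unary", "binary", "combine", "inplace") are hypothetical and chosen only for this example.

```python
from typing import Mapping, Optional


def attrs_after(kind: str,
                first: Mapping,
                second: Optional[Mapping] = None,
                keep_attrs: bool = False) -> dict:
    """Return the attributes of a result under the rules in the comment.

    Hypothetical helper: `kind` labels the sort of operation performed.
    `second` is accepted but never inspected, because no conflict
    checking is done.
    """
    if kind == "index":
        # Subsetting/indexing: keep all attributes.
        return dict(first)
    if kind == "unary":
        # Reductions such as mean: drop by default, keep on request.
        return dict(first) if keep_attrs else {}
    if kind == "binary":
        # Binary math such as x * y: always drop.
        return {}
    if kind in ("combine", "inplace"):
        # concat/merge and in-place binary operations: first object wins.
        return dict(first)
    raise ValueError(f"unknown operation kind: {kind!r}")


kelvin = {"units": "K", "title": "model run"}
celsius = {"units": "degC", "title": "observations"}

print(attrs_after("binary", kelvin, celsius))
print(attrs_after("unary", kelvin, keep_attrs=True))
print(attrs_after("combine", kelvin, celsius))
```

Note that the second object's attributes are ignored entirely, which matches the "no checking of attributes for conflicts" rule.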

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Consistent rules for handling merges between variables with different attributes 28376794
42349463 https://github.com/pydata/xarray/issues/25#issuecomment-42349463 https://api.github.com/repos/pydata/xarray/issues/25 MDEyOklzc3VlQ29tbWVudDQyMzQ5NDYz shoyer 1217238 2014-05-06T19:44:27Z 2014-05-06T19:44:27Z MEMBER

I think this has been mostly resolved by the identical and equals methods and the corresponding compat option for Dataset.merge and Dataset.concat.
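
As a rough illustration of the distinction between the two methods, here is a toy model in plain Python. It assumes that equals compares data only while identical also compares attributes. `ToyVariable` is a hypothetical class made up for this sketch and is not part of the library.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVariable:
    """Hypothetical stand-in for a labeled array: values plus attributes."""
    values: list
    attrs: dict = field(default_factory=dict)

    def equals(self, other: "ToyVariable") -> bool:
        # Data must match; attributes are ignored.
        return self.values == other.values

    def identical(self, other: "ToyVariable") -> bool:
        # Data and attributes must both match.
        return self.equals(other) and self.attrs == other.attrs


a = ToyVariable([1, 2, 3], {"units": "K"})
b = ToyVariable([1, 2, 3], {"units": "degC"})

print(a.equals(b))
print(a.identical(b))
```

Under this model, a merge with a strict compat setting would reject `a` and `b`, while a looser setting would accept them.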

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Consistent rules for handling merges between variables with different attributes 28376794
36194729 https://github.com/pydata/xarray/issues/25#issuecomment-36194729 https://api.github.com/repos/pydata/xarray/issues/25 MDEyOklzc3VlQ29tbWVudDM2MTk0NzI5 shoyer 1217238 2014-02-27T00:07:29Z 2014-02-27T00:07:29Z MEMBER

Stern warnings about conflicting attributes like units may be an appropriate compromise. But if we go that way, I would advocate for making units drop upon doing any mathematical operation. We could try to update units automatically (e.g., kg * kg = kg^2), but that is tricky to always get right.

Celsius, for example, is a pretty weird physical unit because of how it can take on negative values, so it actually makes a lot of sense to mostly use Kelvin instead (for which I can sensibly apply any math operation like times or minus). That doesn't mean that I want to store all my raw data in degrees K, though...

On Wed, Feb 26, 2014 at 3:45 PM, ebrevdo notifications@github.com wrote:

err, which attributes conflict.

On Wed, Feb 26, 2014 at 3:45 PM, Eugene Brevdo ebrevdo@gmail.com wrote:

I don't think that example has your intended effect. I don't know why anyone would add something of units kelvin with those of celsius. I understand what you're saying, so maybe we should just throw a stern warning listing which units conflict and how, every single time.

On Wed, Feb 26, 2014 at 3:42 PM, Stephan Hoyer notifications@github.com wrote:

I see your point, but I favor a more pragmatic approach by default. See my fourth bullet under "Design Goals" in the README and bullet ii under Iris in "Prior Art".

My vision here is a more powerful ndarray enhanced rather than limited by metadata. This is closer to what pandas does, which even allows for conflicting indices resulting in NaN values (a feature I would love to copy).

I think that both use cases can be covered as long as the merge/conflict logic is clearly documented and it is possible to write stricter logic for library code (which by necessity will be more verbose). If it is essential for units to agree before doing x + y, you can add assert x.attributes.get('units') == y.attributes.get('units'). Otherwise, we will end up prohibiting operations like that when x has units of Celsius and y has units of Kelvin.

On Wed, Feb 26, 2014 at 3:23 PM, ebrevdo notifications@github.com wrote:

Also, there are plenty of other bits where you don't want conflicts. Imagine that you have variables indexed on different basemap projections. Creating exceptions to the rule seems like a bit of a rabbit hole.

On Wed, Feb 26, 2014 at 3:13 PM, Eugene Brevdo ebrevdo@gmail.com wrote:

This is an option, but these lists will break if we try to express other data formats using these conventions. For example, grib likely has other conventions. We would have to overload attribute or variable depending on what the underlying datastore is.

On Wed, Feb 26, 2014 at 3:03 PM, Stephan Hoyer notifications@github.com wrote:

x + y could indeed check variable attributes before trying to do the merge. I don't know if it does in the current implementation.

My concern is more that metadata like "title" or "source" should not be required to match, because that metadata will almost always be conflicting. Perhaps "units", "_FillValue", "scale_factor" and "add_offset" (if values were not automatically masked/scaled) should be specifically blacklisted to prohibit conflicts.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Consistent rules for handling merges between variables with different attributes 28376794
36192859 https://github.com/pydata/xarray/issues/25#issuecomment-36192859 https://api.github.com/repos/pydata/xarray/issues/25 MDEyOklzc3VlQ29tbWVudDM2MTkyODU5 shoyer 1217238 2014-02-26T23:42:27Z 2014-02-26T23:42:27Z MEMBER

I see your point, but I favor a more pragmatic approach by default. See my fourth bullet under "Design Goals" in the README and bullet ii under Iris in "Prior Art".

My vision here is a more powerful ndarray enhanced rather than limited by metadata. This is closer to what pandas does, which even allows for conflicting indices resulting in NaN values (a feature I would love to copy).

I think that both use cases can be covered as long as the merge/conflict logic is clearly documented and it is possible to write stricter logic for library code (which by necessity will be more verbose). If it is essential for units to agree before doing x + y, you can add assert x.attributes.get('units') == y.attributes.get('units'). Otherwise, we will end up prohibiting operations like that when x has units of Celsius and y has units of Kelvin.
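
For library code that wants the stricter behavior, the assertion above could be wrapped in a small helper. This is only a sketch operating on plain attribute dictionaries. The name `require_same_units` is hypothetical, and a ValueError is used instead of assert so the check still runs when Python is started with -O.

```python
from typing import Mapping


def require_same_units(x_attrs: Mapping, y_attrs: Mapping) -> None:
    """Raise ValueError if the 'units' attributes disagree.

    Hypothetical helper. As with the assert in the comment, two objects
    that both lack 'units' compare equal (None == None) and pass.
    """
    x_units = x_attrs.get("units")
    y_units = y_attrs.get("units")
    if x_units != y_units:
        raise ValueError(
            f"conflicting units: {x_units!r} versus {y_units!r}"
        )


# Matching units pass silently.
require_same_units({"units": "K"}, {"units": "K"})

# Mismatched units are rejected before any arithmetic happens.
try:
    require_same_units({"units": "degC"}, {"units": "K"})
except ValueError as err:
    print(err)
```

Callers that do not need the check simply skip it, which keeps the default behavior permissive.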

On Wed, Feb 26, 2014 at 3:23 PM, ebrevdo notifications@github.com wrote:

Also, there are plenty of other bits where you don't want conflicts. Imagine that you have variables indexed on different basemap projections. Creating exceptions to the rule seems like a bit of a rabbit hole.

On Wed, Feb 26, 2014 at 3:13 PM, Eugene Brevdo ebrevdo@gmail.com wrote:

This is an option, but these lists will break if we try to express other data formats using these conventions. For example, grib likely has other conventions. We would have to overload attribute or variable depending on what the underlying datastore is.

On Wed, Feb 26, 2014 at 3:03 PM, Stephan Hoyer notifications@github.com wrote:

x + y could indeed check variable attributes before trying to do the merge. I don't know if it does in the current implementation.

My concern is more that metadata like "title" or "source" should not be required to match, because that metadata will almost always be conflicting. Perhaps "units", "_FillValue", "scale_factor" and "add_offset" (if values were not automatically masked/scaled) should be specifically blacklisted to prohibit conflicts.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Consistent rules for handling merges between variables with different attributes 28376794
36189171 https://github.com/pydata/xarray/issues/25#issuecomment-36189171 https://api.github.com/repos/pydata/xarray/issues/25 MDEyOklzc3VlQ29tbWVudDM2MTg5MTcx shoyer 1217238 2014-02-26T23:03:06Z 2014-02-26T23:03:06Z MEMBER

x + y could indeed check variable attributes before trying to do the merge. I don't know if it does in the current implementation.

My concern is more that metadata like "title" or "source" should not be required to match, because that metadata will almost always be conflicting. Perhaps "units", "_FillValue", "scale_factor" and "add_offset" (if values were not automatically masked/scaled) should be specifically blacklisted to prohibit conflicts.
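
The blacklist idea could look roughly like the sketch below, which checks only a fixed set of attribute names and ignores everything else. The names `STRICT_ATTRS` and `conflicting_attrs` are hypothetical, and the sketch uses plain dictionaries in place of variable attributes.

```python
from typing import Iterable, List, Mapping

# Attribute names that must not conflict, as proposed in the comment.
STRICT_ATTRS = ("units", "_FillValue", "scale_factor", "add_offset")


def conflicting_attrs(a: Mapping,
                      b: Mapping,
                      strict: Iterable[str] = STRICT_ATTRS) -> List[str]:
    """List the strict attribute names present in both mappings with different values.

    Hypothetical helper: attributes outside `strict` (such as "title" or
    "source") are never compared, so they may differ freely.
    """
    return [key for key in strict
            if key in a and key in b and a[key] != b[key]]


x = {"title": "run A", "units": "K", "scale_factor": 0.1}
y = {"title": "run B", "units": "degC", "scale_factor": 0.1}

print(conflicting_attrs(x, y))
```

An operation such as x + y could raise or warn when this list is non-empty. An attribute present on only one side is treated here as no conflict, which is one possible choice among several.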

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Consistent rules for handling merges between variables with different attributes 28376794
36187723 https://github.com/pydata/xarray/issues/25#issuecomment-36187723 https://api.github.com/repos/pydata/xarray/issues/25 MDEyOklzc3VlQ29tbWVudDM2MTg3NzIz shoyer 1217238 2014-02-26T22:47:54Z 2014-02-26T22:47:54Z MEMBER

Dataset.merge is also triggered by assigning a DatasetArray to a dataset or by doing a mathematical operation on two DatasetArrays (e.g., x + y). The latter is how I encountered this issue today.

For merge itself, I would agree that we may want to default to stricter behavior, but for these other versions of merge we should default to something more flexible.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Consistent rules for handling merges between variables with different attributes 28376794

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 10.942ms · About: xarray-datasette