home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

11 rows where issue = 160505403 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 4

  • shoyer 6
  • max-sixty 2
  • fujiisoup 2
  • fmaussion 1

issue 1

  • Iterating over a Dataset iterates only over its data_vars · 11 ✖

author_association 1

  • MEMBER 11
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
391964427 https://github.com/pydata/xarray/issues/884#issuecomment-391964427 https://api.github.com/repos/pydata/xarray/issues/884 MDEyOklzc3VlQ29tbWVudDM5MTk2NDQyNw== fujiisoup 6815844 2018-05-25T07:17:33Z 2018-05-25T07:17:33Z MEMBER

So, the behavior of dict(dataset) will change if we changed the behavior of __iter__. Can we issue a warning if dict(dataset) is called (or is it impossible)?

In #2162, you changed python for k, v in dataset.items(): to python dataset = OrderedDict(dataset) for k, v in dataset.items(): to avoid the warning, but I am afraid it will cause the unexpected behavior when we stop supporting iteration over coordinates.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Iterating over a Dataset iterates only over its data_vars 160505403
391951914 https://github.com/pydata/xarray/issues/884#issuecomment-391951914 https://api.github.com/repos/pydata/xarray/issues/884 MDEyOklzc3VlQ29tbWVudDM5MTk1MTkxNA== shoyer 1217238 2018-05-25T06:09:07Z 2018-05-25T06:09:07Z MEMBER

Do we need to change the behavior of dict(dataset) so that dict(dataset).keys() and dataset.keys() become consistent?

No, I think these are guaranteed to be consistent because we inherit from collections.Mapping to implement dict methods like keys(), values() and items() (via __iter__ and __getitem__).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Iterating over a Dataset iterates only over its data_vars 160505403
391943527 https://github.com/pydata/xarray/issues/884#issuecomment-391943527 https://api.github.com/repos/pydata/xarray/issues/884 MDEyOklzc3VlQ29tbWVudDM5MTk0MzUyNw== fujiisoup 6815844 2018-05-25T05:15:41Z 2018-05-25T05:15:41Z MEMBER

Do we need to change the behavior of dict(dataset) so that dict(dataset).keys() and dataset.keys() become consistent?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Iterating over a Dataset iterates only over its data_vars 160505403
338446509 https://github.com/pydata/xarray/issues/884#issuecomment-338446509 https://api.github.com/repos/pydata/xarray/issues/884 MDEyOklzc3VlQ29tbWVudDMzODQ0NjUwOQ== shoyer 1217238 2017-10-22T02:36:39Z 2017-10-22T02:36:39Z MEMBER

Another question is what to do with x in ds (i.e.,Dataset.__contains__). Currently it checks data variables and coordinates, but not dimensions (as @fujiisoup pointed out in https://github.com/pydata/xarray/pull/1632#issuecomment-336637245), which already feels somewhat inconsistent.

If we were starting over, I might change how __contains__ works for Dataset to only include data variables, but I don't think it's worth it at this time:

  • Unlike the current version of __iter__, __contains__ is actually useful in its current state and I suspect is widely used. Issuing a deprecation warning for 'variable' in ds would annoy lots of users, for no particularly good reason. In many cases (checking data variables), the behavior would not actually change.
  • There is also the expectation that k in ds is equivalent to ds[k] for mapping types, which is currently mostly true for xarray.Dataset.
  • The main advantage I see to changing the behavior of k in ds is that it would remove most of the remaining use cases for Dataset.data_vars, which would let us eventually remove data_vars. But given that ds[k] for coordinates k will continue to be supported, I think I would still recommend explicitly writing k in ds.data_vars for cases where the data/coordinates distinction matters.

Related: over in https://github.com/pydata/xarray/pull/1645, I am deprecating the current behavior of DataArray.__contains__, so that we can make it check array values instead of DataArray.coords in the future.

If we aren't going to fundamentally change the behavior of k in ds to exclude coordinates, then we should probably update it, at @fujiisoup suggests, to also include MultiIndex levels and dimension names (i.e., the stuff checked in xarray.core.dataset._get_virtual_variable()).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Iterating over a Dataset iterates only over its data_vars 160505403
338445322 https://github.com/pydata/xarray/issues/884#issuecomment-338445322 https://api.github.com/repos/pydata/xarray/issues/884 MDEyOklzc3VlQ29tbWVudDMzODQ0NTMyMg== shoyer 1217238 2017-10-22T02:06:27Z 2017-10-22T02:28:30Z MEMBER

I'm pretty sure now that we should (at least) change iter(ds) and ds.keys() to only include data variables. This is a repeated source of annoyance.

For the most recent example, consider the API for argmin() suggested by @fujiisoup in https://github.com/pydata/xarray/issues/1388#issuecomment-338397633, where argmin() would return a Dataset. We'd like make arr.isel(**arr.argmin(dim)) work, but this requires no additional members (beyond data variables) in arr.argmin(dim).keys().

The question is how to do it: is it worth a deprecation cycle? My proposal: - In v0.10, start issuing FutureWarning when calling ds.__iter__. Suggest that users switch to iterating over .variables instead if they really want everything. (Many cases where every data variable and coordinate is being iterated over should probably already be using the low-level API.) - In v0.11, switch to the new behavior.

Note: __len__ should also be changed in lock-step, to ensure the invariant len(list(ds)) == len(ds).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Iterating over a Dataset iterates only over its data_vars 160505403
237858903 https://github.com/pydata/xarray/issues/884#issuecomment-237858903 https://api.github.com/repos/pydata/xarray/issues/884 MDEyOklzc3VlQ29tbWVudDIzNzg1ODkwMw== max-sixty 5635139 2016-08-05T14:04:22Z 2016-08-05T14:04:22Z MEMBER

From those, the latter three are trivial to keep. The first (iter(ds)) would potentially break some contracts with the Mapping abc

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Iterating over a Dataset iterates only over its data_vars 160505403
237783362 https://github.com/pydata/xarray/issues/884#issuecomment-237783362 https://api.github.com/repos/pydata/xarray/issues/884 MDEyOklzc3VlQ29tbWVudDIzNzc4MzM2Mg== shoyer 1217238 2016-08-05T08:06:32Z 2016-08-05T08:07:22Z MEMBER

What would happen to the actual variables? Would ds['longitude'] also return a KeyError? This would be very far from the NetCDF model and would brake many many things....

There is no way we would change this -- ds['longitude'] will always return the longitude array.

What are your thoughts now?

I like this change for the reasons you outline above, and those I mentioned in the previous issue. I'm somewhat concerned about breaking a lot of user code, and also am not sure yet what the right solution here looks, because of concerns about breaking duck-type compatibility with dictionaries.

Which of these should no longer include coordinates? Can we make things consistent without changing all of them? - iter(ds) (currently we actually define Dataset.keys() implicitly via __iter__) - 'x' in ds - del ds['x'] - ds['x'] (this would be quite a compatibility break, as @fmaussion notes above)

We would also need to encourage users to use the low level Dataset.variables property to check whether something is either a coordinate or data variables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Iterating over a Dataset iterates only over its data_vars 160505403
226366273 https://github.com/pydata/xarray/issues/884#issuecomment-226366273 https://api.github.com/repos/pydata/xarray/issues/884 MDEyOklzc3VlQ29tbWVudDIyNjM2NjI3Mw== max-sixty 5635139 2016-06-16T01:34:43Z 2016-06-16T01:34:43Z MEMBER

Then I merged the functionality you are describing, but at some point switched it back....

Ha!

What are your thoughts now?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Iterating over a Dataset iterates only over its data_vars 160505403
226332618 https://github.com/pydata/xarray/issues/884#issuecomment-226332618 https://api.github.com/repos/pydata/xarray/issues/884 MDEyOklzc3VlQ29tbWVudDIyNjMzMjYxOA== fmaussion 10050469 2016-06-15T21:58:06Z 2016-06-15T21:58:06Z MEMBER

What would happen to the actual variables? Would ds['longitude'] also return a KeyError? This would be very far from the NetCDF model and would brake many many things....

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Iterating over a Dataset iterates only over its data_vars 160505403
226329448 https://github.com/pydata/xarray/issues/884#issuecomment-226329448 https://api.github.com/repos/pydata/xarray/issues/884 MDEyOklzc3VlQ29tbWVudDIyNjMyOTQ0OA== shoyer 1217238 2016-06-15T21:44:13Z 2016-06-15T21:44:13Z MEMBER

Here is one place where I discussed this with myself! https://github.com/pydata/xarray/issues/211

Then I merged the functionality you are describing, but at some point switched it back....

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Iterating over a Dataset iterates only over its data_vars 160505403
226328663 https://github.com/pydata/xarray/issues/884#issuecomment-226328663 https://api.github.com/repos/pydata/xarray/issues/884 MDEyOklzc3VlQ29tbWVudDIyNjMyODY2Mw== shoyer 1217238 2016-06-15T21:40:53Z 2016-06-15T21:40:53Z MEMBER

An early version of xarray actually worked exactly like this, but we switched it to the current functionality. Let me see if I can dig up the relevant issues....

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Iterating over a Dataset iterates only over its data_vars 160505403

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 13.975ms · About: xarray-datasette