
issue_comments


10 rows where issue = 379415229 (Feature: N-dimensional auto_combine) and user = 1217238 (shoyer), sorted by updated_at descending


Comments:
447048979 · shoyer (MEMBER) · 2018-12-13T17:16:16Z
https://github.com/pydata/xarray/pull/2553#issuecomment-447048979

thanks @TomNicholas! I'm looking forward to the follow-ups :)

Reactions: none
Issue: Feature: N-dimensional auto_combine (379415229)
445907024 · shoyer (MEMBER) · 2018-12-10T17:48:21Z
https://github.com/pydata/xarray/pull/2553#issuecomment-445907024

> Okay good. So with this API then auto_combine() (and therefore the default behaviour of open_mfdataset()) would be to throw a warning if there isn't enough info in the dataset coords to order them?

Yes, but throwing a warning should probably only be a temporary solution. Long term we should pick a default value (e.g., combine='auto') and require explicitly opting in to different behavior (e.g., combine='manual').
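A minimal sketch of the dispatch being discussed here, using toy dict "datasets" with a 1-D coordinate "x". The combine='auto'/'manual' names, the helper functions, and the fallback behavior are all hypothetical illustrations of this proposal, not a released xarray API:

```python
import warnings

# Toy stand-in: each "dataset" is a dict holding a 1-D coordinate "x".
def _concat(datasets):
    return {"x": [v for ds in datasets for v in ds["x"]]}

def auto_combine(datasets):
    # The "magical" path: order datasets by their first coordinate value.
    return _concat(sorted(datasets, key=lambda ds: ds["x"][0]))

def manual_combine(datasets):
    # The explicit path: trust the order the caller supplied.
    return _concat(datasets)

def combine(datasets, how="auto"):
    if how == "auto":
        if any(not ds["x"] for ds in datasets):
            # Not enough coordinate info to order the inputs.
            warnings.warn("cannot infer dataset order from coordinates; "
                          "falling back to the given order")
            return manual_combine(datasets)
        return auto_combine(datasets)
    if how == "manual":
        return manual_combine(datasets)
    raise ValueError(f"unrecognized how={how!r}")
```

(In later xarray releases this distinction eventually shipped as combine='by_coords' versus combine='nested' in open_mfdataset.)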

Reactions: none
445892746 · shoyer (MEMBER) · 2018-12-10T17:06:59Z
https://github.com/pydata/xarray/pull/2553#issuecomment-445892746

> The only problem with that idea is that both of these functions should be options for open_mfdataset(), so you still have multiple possible argument structures for that function. I think that's okay though - it would be a lot easier to explain if you say "if the keyword argument combine='auto' then the function auto_combine() is used, if combine='manual' then nested_concat_and_merge() is used, and the filepaths must be structured accordingly."

This sounds pretty good to me.

> > If you like, we could also merge this work (which is excellent progress towards user-facing APIs) but keep the changes internal to xarray for now until we figure out the public APIs.

> That would be great. Then I could start using the master branch of xarray again in my code, while we redo the public API. If I set concat_dims back to concat_dim then it should be completely backwards-compatible.

OK, do you want to go ahead and revert the documentation/public API for now?

I would even be OK supporting nested lists temporarily in xarray via APIs like open_mfdataset and auto_concat without official documentation/support -- using this feature would be "at your own risk". As long as we don't break any existing code, that is totally fine. We just shouldn't advertise an interface until we're happy with it.
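As a hypothetical illustration of "the filepaths must be structured accordingly": a nested list of paths encodes each file's position in the grid, one nesting level per combined dimension. The filenames and the grid_positions helper below are made up for this sketch:

```python
# Hypothetical filenames: outer level = y index, inner level = x index.
paths = [["sim_y0_x0.nc", "sim_y0_x1.nc"],
         ["sim_y1_x0.nc", "sim_y1_x1.nc"]]

def grid_positions(nested, index=()):
    """Yield (grid_index, leaf) pairs for an arbitrarily nested list."""
    if isinstance(nested, list):
        for i, item in enumerate(nested):
            yield from grid_positions(item, index + (i,))
    else:
        yield index, nested
```

Flattening the structure this way makes the file-to-grid-position mapping explicit, which is what the manual/nested code path has to rely on in place of coordinate inspection.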

Reactions: none
445419876 · shoyer (MEMBER) · created 2018-12-08T01:49:28Z · updated 2018-12-10T17:00:04Z
https://github.com/pydata/xarray/pull/2553#issuecomment-445419876

> I think that would mean there might be some situations that 1D auto_combine() could deal with but nested_concat_and_merge() could not. Situations where you can only merge the variables once they have been individually concatenated. (actually I'm not sure - I would need to think about that)

I think a multi-level nested_concat_and_merge() could always replace auto_combine() as long as merging happens last, at the outermost level. That's basically what auto_combine() already does.

There are probably some edge cases where the existing auto_combine would let you be a little more sloppy about how you put together your lists of variables. But I don't think that's really a good idea for us to support in xarray, since it removes most of the freedom from the implementation.

> I personally think a), but I expect users who haven't read the source code for auto_combine() might disagree.

Yes, this is my concern. The way this API handles nested lists makes sense from an implementation perspective, but not really from a user perspective. For users, I think it makes sense to have:

1. A precisely defined nested_concat_and_merge() for cases where you already have your data sorted. This is pretty common (e.g., with hierarchical directory structures).
2. A magical auto_combine() that inspects coordinates to figure out how to put everything together. We might even rename this to combine() (if we think it's comprehensive enough).

The original version of auto_combine() that I wrote was aiming towards this second use-case, but was obviously fragile and incomplete.

If you like, we could also merge this work (which is excellent progress towards user-facing APIs) but keep the changes internal to xarray for now until we figure out the public APIs.
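The "merging happens last, at the outermost level" idea can be sketched as a toy recursion over nested lists (the names nested_combine and concat are illustrative stand-ins, and list leaves play the role of datasets): one combine function is supplied per nesting level, and the outermost one runs last, which is where a merge step would go.

```python
def nested_combine(nested, combiners):
    """Apply one combine function per nesting level; combiners[0]
    acts at the outermost level (i.e., it runs last)."""
    if len(combiners) == 1:
        return combiners[0](nested)
    inner = [nested_combine(sub, combiners[1:]) for sub in nested]
    return combiners[0](inner)

def concat(parts):
    # Toy 1-D concatenation of list "datasets".
    return [v for p in parts for v in p]

# A 2-level grid: inner lists are combined first, then the outer level.
grid = [[[1, 2], [3, 4]],
        [[5, 6], [7, 8]]]
```

Swapping combiners[0] for a merge function while the inner levels concatenate reproduces the ordering constraint described above.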

Reactions: none
444713331 · shoyer (MEMBER) · created 2018-12-06T01:22:08Z · updated 2018-12-06T01:22:17Z
https://github.com/pydata/xarray/pull/2553#issuecomment-444713331

Do you think it would make sense to try to get rid of "both merging and concatenating" in favor of requiring another dimension in the grid? I guess we do sort of need this for open_mfdataset, but it feels more complex than desired.

Reactions: none
444712974 · shoyer (MEMBER) · 2018-12-06T01:20:18Z
https://github.com/pydata/xarray/pull/2553#issuecomment-444712974

> Specifying dim=None will specify (1) explicitly, but otherwise yes, that's exactly what it does.

Yes, this was a typo on my part :).

Reactions: none
444712053 · shoyer (MEMBER) · created 2018-12-06T01:15:23Z · updated 2018-12-06T01:19:54Z
https://github.com/pydata/xarray/pull/2553#issuecomment-444712053

> if the grid of results is auto-combinable along all its dimensions separately, then the new auto_combine() will auto-magically combine it all

OK, so the rule is that a grid dimension can correspond to any of:

  1. merging data variables
  2. concatenating along the dimension
  3. both merging and concatenating

And to specify case (1) explicitly, you can write dim=None along that dimension?

Reactions: none
444580848 · shoyer (MEMBER) · 2018-12-05T17:56:26Z
https://github.com/pydata/xarray/pull/2553#issuecomment-444580848

Generally I like the look of this.

What happens if you have a nested list of Dataset objects with different data variables? e.g., a 2D grid of results, with different variables saved in different files. Is this supported at all?

I ask because the current version of auto_combine() was (for better or worse) designed to handle this automagically, but I'm not sure this makes sense for higher dimensional data.

It might be better to have a separate nested_concat() function rather than to squeeze this all into auto_combine().

Reactions: none
443446419 · shoyer (MEMBER) · 2018-12-01T18:15:30Z
https://github.com/pydata/xarray/pull/2553#issuecomment-443446419

Itertoolz is a dask.array dependency that we didn't explicitly declare in xarray (yes, this was sloppy). We should probably just copy the function we use into some utils file inside xarray -- it has a compatible open source license. Or we could rewrite to not need it.

On Sat, Dec 1, 2018 at 8:18 AM Tom Nicholas notifications@github.com wrote:

> I've fixed the code so it works with python 3.5 & 2.7 (tested on my local machine), but the tests on python 2.7 are still failing because for some reason it can't find the itertoolz module. I tried to fix this by adding toolz as an explicit dependency in setup.py but that didn't work either.

> However, given that xarray is supposed to be dropping python 2 support at the end of this year (), does this particularly matter?


Reactions: none
437614965 · shoyer (MEMBER) · 2018-11-10T19:47:22Z
https://github.com/pydata/xarray/pull/2553#issuecomment-437614965

I should have suggested this before, but the internal implementation of numpy.block() is a nice example of this sort of logic: https://github.com/numpy/numpy/blob/e726c045c72833d5c826e8a13f889746ee0bfdf4/numpy/core/shape_base.py#L661
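In the same spirit, here is a toy pure-Python analogue of numpy.block's assembly step (not numpy's actual code) for a 2-D nested list whose leaves are 2-D lists: concatenate the leaf blocks column-wise within each block row, then stack the rows.

```python
def block2d(grid):
    """Assemble a 2-D nested list of 2-D list "blocks", in the spirit
    of numpy.block: concatenate horizontally within each block row,
    then vertically across block rows."""
    rows = []
    for block_row in grid:
        # zip pairs up corresponding rows of each leaf block; summing
        # each tuple of lists with [] concatenates them horizontally.
        rows.extend(sum(pieces, []) for pieces in zip(*block_row))
    return rows
```

The real numpy.block implementation additionally validates that all sub-lists have consistent nesting depth before concatenating, which is the part most relevant to checking user-supplied nested dataset lists.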

Reactions: +1 (1)

Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 191.146ms · About: xarray-datasette