issue_comments


13 rows where author_association = "MEMBER", issue = 379415229 and user = 35968931 sorted by updated_at descending



id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
446420988 https://github.com/pydata/xarray/pull/2553#issuecomment-446420988 https://api.github.com/repos/pydata/xarray/issues/2553 MDEyOklzc3VlQ29tbWVudDQ0NjQyMDk4OA== TomNicholas 35968931 2018-12-12T00:57:56Z 2018-12-12T00:57:56Z MEMBER

Removed the unnecessary argument.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature: N-dimensional auto_combine 379415229
446223544 https://github.com/pydata/xarray/pull/2553#issuecomment-446223544 https://api.github.com/repos/pydata/xarray/issues/2553 MDEyOklzc3VlQ29tbWVudDQ0NjIyMzU0NA== TomNicholas 35968931 2018-12-11T14:35:12Z 2018-12-11T14:35:12Z MEMBER

Okay, I've reverted the API, so it basically converts a concat_dim input to its multidimensional version before continuing.
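A minimal sketch of the kind of conversion described above (the helper name is hypothetical, not xarray's actual code):

```python
# Hypothetical sketch of the normalization described above: a scalar
# concat_dim (or None) is wrapped into the list-of-dims form that the
# N-dimensional code path works with. Not xarray's actual implementation.
def normalize_concat_dims(concat_dim):
    """Convert a 1D-style concat_dim into the multidimensional list form."""
    if not isinstance(concat_dim, list):
        return [concat_dim]
    return concat_dim
```

With something like this in place, the old 1D call signature keeps working unchanged, which is what makes the revert backwards-compatible.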

If we merge this then should I start a separate Pull Request for further discussion about the API?

One of the Travis CI builds failed but again I don't think that was me.

445903380 https://github.com/pydata/xarray/pull/2553#issuecomment-445903380 https://api.github.com/repos/pydata/xarray/issues/2553 MDEyOklzc3VlQ29tbWVudDQ0NTkwMzM4MA== TomNicholas 35968931 2018-12-10T17:37:25Z 2018-12-10T17:37:25Z MEMBER

> This sounds pretty good to me.

Okay good. So with this API then auto_combine() (and therefore the default behaviour of open_mfdataset()) would be to throw a warning if there isn't enough info in the dataset coords to order them?

> OK, do you want to go ahead and revert the documentation/public API for now?

Yes, I'll do that.

> I would even be OK supporting nested lists temporarily ... "at your own risk"

If I revert the documentation and you merge this PR then that's exactly what we will have, which would be useful for me until we do the public API.

(Also it seems that whatever was wrong with cftime has been fixed now, as the CI tests are passing.)

445845065 https://github.com/pydata/xarray/pull/2553#issuecomment-445845065 https://api.github.com/repos/pydata/xarray/issues/2553 MDEyOklzc3VlQ29tbWVudDQ0NTg0NTA2NQ== TomNicholas 35968931 2018-12-10T14:59:30Z 2018-12-10T15:21:05Z MEMBER

> There are probably some edge cases where the existing auto_combine would let you be a little more sloppy about how you put together your lists of variables.

I think that's probably the case, but I also think that those edge cases will be so specific that maybe we don't have to explicitly support them. We could just say that anyone who has a combination of datasets that is that funky can just concatenate them themselves?

> For users, I think it makes sense to have...

I agree, two separate functions is a lot more intuitive than having auto_combine() behave either auto-magically or totally manually depending on the value of an optional keyword argument.

The only problem with that idea is that both of these functions should be options for open_mfdataset(), so you still have multiple possible argument structures for that function. I think that's okay though - it would be a lot easier to explain if you say "if the keyword argument combine='auto' then the function auto_combine() is used, if combine='manual' then nested_concat_and_merge() is used, and the filepaths must be structured accordingly".
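A sketch of that dispatch (nested_concat_and_merge is only a name proposed in this discussion; both combine functions are stubbed here so the sketch runs on its own):

```python
# Sketch of the combine= dispatch described above. The two combine
# functions are stubs standing in for xarray's auto_combine and the
# proposed nested_concat_and_merge; only the dispatch logic is the point.
def auto_combine(datasets):
    return ('auto', datasets)           # stub

def nested_concat_and_merge(datasets):
    return ('manual', datasets)         # stub

def open_mfdataset_sketch(datasets, combine='auto'):
    """Choose the combining strategy from the combine keyword."""
    if combine == 'auto':
        return auto_combine(datasets)
    if combine == 'manual':
        return nested_concat_and_merge(datasets)
    raise ValueError("combine must be 'auto' or 'manual', got %r" % (combine,))
```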

> If you like, we could also merge this work (which is excellent progress towards user-facing APIs) but keep the changes internal to xarray for now until we figure out the public APIs.

That would be great. Then I could start using the master branch of xarray again in my code, while we redo the public API. If I set concat_dims back to concat_dim then it should be completely backwards-compatible.

444712699 https://github.com/pydata/xarray/pull/2553#issuecomment-444712699 https://api.github.com/repos/pydata/xarray/issues/2553 MDEyOklzc3VlQ29tbWVudDQ0NDcxMjY5OQ== TomNicholas 35968931 2018-12-06T01:18:41Z 2018-12-06T10:05:40Z MEMBER

Specifying dim=None will specify (1) explicitly, but otherwise yes, that's exactly what it does.

That is how the current auto_combine behaves too, just for 1D obviously.

444712936 https://github.com/pydata/xarray/pull/2553#issuecomment-444712936 https://api.github.com/repos/pydata/xarray/issues/2553 MDEyOklzc3VlQ29tbWVudDQ0NDcxMjkzNg== TomNicholas 35968931 2018-12-06T01:20:05Z 2018-12-06T10:03:43Z MEMBER

Also if you specify dim='something' when that dimension doesn't already exist, you are explicitly telling it to add a new dimension.

444715777 https://github.com/pydata/xarray/pull/2553#issuecomment-444715777 https://api.github.com/repos/pydata/xarray/issues/2553 MDEyOklzc3VlQ29tbWVudDQ0NDcxNTc3Nw== TomNicholas 35968931 2018-12-06T01:34:18Z 2018-12-06T10:03:26Z MEMBER

I think that would mean there might be some situations that 1D auto_combine() could deal with but nested_concat_and_merge() could not: situations where you can only merge the variables once they have been individually concatenated. (Actually I'm not sure - I would need to think about that.)

I think this is a question of whether you think that:

a) auto_combine() is a well-defined operation that is still well-defined in N-D

or

b) auto_combine()'s purpose is to do its best to combine whatever is thrown at it, and more surgical N-D operations should be the job of another function

I personally think a), but I expect users who haven't read the source code for auto_combine() might disagree.

444713178 https://github.com/pydata/xarray/pull/2553#issuecomment-444713178 https://api.github.com/repos/pydata/xarray/issues/2553 MDEyOklzc3VlQ29tbWVudDQ0NDcxMzE3OA== TomNicholas 35968931 2018-12-06T01:21:20Z 2018-12-06T01:21:20Z MEMBER

I appreciate that this is pretty complicated, perhaps it should have its own section in the docs (I can do another PR?)

444708274 https://github.com/pydata/xarray/pull/2553#issuecomment-444708274 https://api.github.com/repos/pydata/xarray/issues/2553 MDEyOklzc3VlQ29tbWVudDQ0NDcwODI3NA== TomNicholas 35968931 2018-12-06T00:56:01Z 2018-12-06T01:01:49Z MEMBER

Thanks for the comments.


> What happens if you have a nested list of Dataset objects with different data variables?

This is supported. This new auto_combine() simply applies the old auto_combine() N times along N dimensions, so if the grid of results is auto-combinable along all its dimensions separately, then the new auto_combine() will auto-magically combine it all, e.g.:

```python
objs = [[Dataset({'foo': ('x', [0, 1])}), Dataset({'bar': ('x', [10, 20])})],
        [Dataset({'foo': ('x', [2, 3])}), Dataset({'bar': ('x', [30, 40])})]]
expected = Dataset({'foo': ('x', [0, 1, 2, 3]),
                    'bar': ('x', [10, 20, 30, 40])})

# This works
actual = auto_combine(objs, concat_dims=['x', None])
assert_identical(expected, actual)

# Also works auto-magically
actual = auto_combine(objs)
assert_identical(expected, actual)

# Proving it works symmetrically
objs = [[Dataset({'foo': ('x', [0, 1])}), Dataset({'foo': ('x', [2, 3])})],
        [Dataset({'bar': ('x', [10, 20])}), Dataset({'bar': ('x', [30, 40])})]]
actual = auto_combine(objs, concat_dims=[None, 'x'])
assert_identical(expected, actual)
```

(I'll add this example as another unit test)


I should point out that there is one way in which this function is not exactly as general as auto_combine() applied N times, though. The options compat, data_vars, and coords are specified once for the combining along all dimensions, so you can't currently tell it to use compat='identical' along dim1 and compat='no_conflicts' along dim2 - it has to be the same for both. I thought about making these kwargs also accept lists, but although that would be easy to do, it seemed like it would complicate the API for a very specific use case.
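If those kwargs did accept lists, the expansion might look something like this (purely illustrative - this is the rejected idea, not xarray's API):

```python
# Illustrative sketch of the rejected alternative: allowing compat (and
# likewise data_vars and coords) to be given either once for all
# dimensions or as a list with one entry per concatenation dimension.
def expand_option(option, concat_dims):
    """Return one option value per concatenation dimension."""
    if isinstance(option, list):
        if len(option) != len(concat_dims):
            raise ValueError("need one value per concat dimension")
        return option
    return [option] * len(concat_dims)
```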


> It might be better to have a separate nested_concat() function rather than to squeeze this all into auto_combine().

That was basically what I tried to do in my first attempt, but nested concatenation without merging along every dimension misses some common use cases. For example, if you wanted to auto_combine (or open_mfdataset()) files structured like

```bash
root
├── time1
│   ├── density.nc
│   └── temperature.nc
└── time2
    ├── density.nc
    └── temperature.nc
```

then you would want to merge density.nc and temperature.nc, then concat along 'time'. I suppose you could have nested_auto_combine() as a separate function, but I don't think that's necessary, as apart from the substitution concat_dim -> concat_dims this is fully backwards-compatible.
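The merge-then-concat order for that layout can be illustrated without xarray at all; here files are faked as (path, variables) pairs, with merge reduced to a dict update and concat to appending along the time axis:

```python
# Illustrative sketch (not xarray code) of "merge within each time
# directory, then concatenate along time". Paths and values are made up.
from collections import defaultdict

files = [
    ("root/time1/density.nc", {"density": [1.0]}),
    ("root/time1/temperature.nc", {"temperature": [300.0]}),
    ("root/time2/density.nc", {"density": [2.0]}),
    ("root/time2/temperature.nc", {"temperature": [310.0]}),
]

# Step 1: merge the variables found within each time directory.
per_time = defaultdict(dict)
for path, variables in files:
    time_dir = path.split("/")[1]
    per_time[time_dir].update(variables)

# Step 2: concatenate each merged variable along 'time'.
combined = {
    var: [per_time[t][var][0] for t in sorted(per_time)]
    for var in ("density", "temperature")
}
# combined == {'density': [1.0, 2.0], 'temperature': [300.0, 310.0]}
```

Concatenating first instead would fail here, because no single directory holds both variables for any one time.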


You might also find it interesting to see how I've used this fork in my own code: I create the grid of datasets here, so that I can combine them here.


I have a question actually - currently if the concat or merge fails, the error message won't clearly tell you which dimension it was trying to combine along when it failed. Is there a way to do that easily with try... except... statements? Something like:

```python
for dim in concat_dims:
    try:
        _auto_combine_along_first_dim(...)
    except (MergeError, ValueError) as err:
        raise ValueError(f"Encountered {err} while trying to combine "
                         f"along dimension {dim}")
```


(Also something else is breaking in cftime on the python 2.7 builds on AppVeyor now...)

443536378 https://github.com/pydata/xarray/pull/2553#issuecomment-443536378 https://api.github.com/repos/pydata/xarray/issues/2553 MDEyOklzc3VlQ29tbWVudDQ0MzUzNjM3OA== TomNicholas 35968931 2018-12-02T19:50:59Z 2018-12-02T20:08:28Z MEMBER

Now that all the tests are passing for all python versions, and I've added an example of multidimensional concatenation to the docstring of auto_combine, I think this is ready to be merged.

However, the AppVeyor builds on python 2.7 are now failing on setup when they try to install cftime, which I don't think was caused by anything I did.

443456199 https://github.com/pydata/xarray/pull/2553#issuecomment-443456199 https://api.github.com/repos/pydata/xarray/issues/2553 MDEyOklzc3VlQ29tbWVudDQ0MzQ1NjE5OQ== TomNicholas 35968931 2018-12-01T20:30:30Z 2018-12-01T20:31:11Z MEMBER

Okay, I've rewritten it using itertools.groupby instead of toolz.itertoolz.groupby so that there should no longer be a dependence on toolz. The itertools groupby assumes the input is sorted, which makes the code slightly more complex, but I think that's still neater than including the functions and tests required for toolz.itertoolz.groupby.
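The sorted-input requirement mentioned above is easy to demonstrate: itertools.groupby only groups consecutive equal keys, whereas toolz.itertoolz.groupby handles unsorted input.

```python
from itertools import groupby

# groupby only merges *consecutive* items with equal keys, so unsorted
# input silently produces split groups.
pairs = [("y0", "b"), ("y1", "a"), ("y0", "c")]
key = lambda p: p[0]

unsorted_keys = [k for k, _ in groupby(pairs, key=key)]
# -> ['y0', 'y1', 'y0']: 'y0' shows up twice

# Sorting by the same key first gives one group per key, matching what
# toolz.itertoolz.groupby would return for the unsorted input.
grouped = {k: [v for _, v in g]
           for k, g in groupby(sorted(pairs, key=key), key=key)}
# -> {'y0': ['b', 'c'], 'y1': ['a']}
```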

443437580 https://github.com/pydata/xarray/pull/2553#issuecomment-443437580 https://api.github.com/repos/pydata/xarray/issues/2553 MDEyOklzc3VlQ29tbWVudDQ0MzQzNzU4MA== TomNicholas 35968931 2018-12-01T16:18:54Z 2018-12-01T16:43:05Z MEMBER

I've fixed the code so it works with python 3.5 & 2.7 (tested on my local machine), but the tests on python 2.7 are still failing because for some reason it can't find the itertoolz module. I tried to fix this by adding toolz as an explicit dependency in setup.py, but that didn't work either.

However, given that xarray is supposed to be dropping python 2 support at the end of this year (https://github.com/pydata/xarray/issues/1830), does this particularly matter?

441639494 https://github.com/pydata/xarray/pull/2553#issuecomment-441639494 https://api.github.com/repos/pydata/xarray/issues/2553 MDEyOklzc3VlQ29tbWVudDQ0MTYzOTQ5NA== TomNicholas 35968931 2018-11-26T13:31:11Z 2018-12-01T15:56:36Z MEMBER

This is basically done now - I've implemented everything I wanted to, and included unit tests for the new functionality. I'm also successfully using it in my personal code now.

@shoyer I haven't changed the way the combine function is applied repeatedly to match the implementation in numpy.block(). I think the way I've done it is relatively clear, but let me know if you think it should be refactored.

~~I don't understand why some of the CI tests are failing - all test which run on my machine pass, and the errors in the log seem to come from dask arrays not being loaded, i.e:~~

```
AssertionError: <xarray.Dataset>
Dimensions:  (x: 10, y: 8)
Dimensions without coordinates: x, y
Data variables:
    foo      (x, y) float64 -1.19 0.4476 0.6283 ... -0.09699 -0.2311 -0.6482
<xarray.Dataset>
Dimensions:  (x: 10, y: 8)
Dimensions without coordinates: x, y
Data variables:
    foo      (x, y) float64 dask.array<shape=(10, 8), chunksize=(5, 4)>
```


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);