Comment 444708274 on pydata/xarray pull request #2553
https://github.com/pydata/xarray/pull/2553#issuecomment-444708274
Posted by user 35968931 (MEMBER), created 2018-12-06T00:56:01Z, updated 2018-12-06T01:01:49Z

Thanks for the comments.


> What happens if you have a nested list of Dataset objects with different data variables?

This is supported. The new auto_combine() simply applies the old auto_combine() N times along N dimensions, so if the grid of results is auto-combinable along each of its dimensions separately, then the new auto_combine() will auto-magically combine it all, e.g.:

```python
from xarray import Dataset, auto_combine  # this branch's auto_combine, which accepts concat_dims
from xarray.testing import assert_identical

objs = [[Dataset({'foo': ('x', [0, 1])}), Dataset({'bar': ('x', [10, 20])})],
        [Dataset({'foo': ('x', [2, 3])}), Dataset({'bar': ('x', [30, 40])})]]
expected = Dataset({'foo': ('x', [0, 1, 2, 3]), 'bar': ('x', [10, 20, 30, 40])})

# This works
actual = auto_combine(objs, concat_dims=['x', None])
assert_identical(expected, actual)

# Also works auto-magically
actual = auto_combine(objs)
assert_identical(expected, actual)

# Proving it works symmetrically
objs = [[Dataset({'foo': ('x', [0, 1])}), Dataset({'foo': ('x', [2, 3])})],
        [Dataset({'bar': ('x', [10, 20])}), Dataset({'bar': ('x', [30, 40])})]]
actual = auto_combine(objs, concat_dims=[None, 'x'])
assert_identical(expected, actual)
```
(I'll add this example as another unit test.)


I should point out that there is one way in which this function is not exactly as general as auto_combine() applied N times: the options compat, data_vars, and coords are specified once and apply to the combining along every dimension, so you can't currently tell it to use compat='identical' along dim1 and compat='no_conflicts' along dim2; it has to be the same for both. I thought about making these kwargs accept lists too, but although that would be easy to do, it seemed like it would complicate the API for a very specific use case.
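To make that concrete, here is a minimal sketch using the first `objs` from the example above; the first call is how this branch behaves, while the commented-out list form is purely hypothetical and not implemented:

```python
# Current behaviour on this branch: a single compat value applies to the
# combining along every dimension.
actual = auto_combine(objs, concat_dims=['x', None], compat='no_conflicts')

# Hypothetical list form (NOT implemented): one compat value per entry in
# concat_dims, e.g. 'identical' along the outer dimension and 'no_conflicts'
# for the inner merge.
# actual = auto_combine(objs, concat_dims=['x', None],
#                       compat=['identical', 'no_conflicts'])
```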


> It might be better to have a separate nested_concat() function rather than to squeeze this all into auto_combine().

That was basically what I tried to do in my first attempt, but nested concatenation without merging along every dimension misses some common use cases. For example, if you wanted to auto_combine() (or open_mfdataset()) files structured like

```bash
root
├── time1
│   ├── density.nc
│   └── temperature.nc
└── time2
    ├── density.nc
    └── temperature.nc
```

then you would want to merge density.nc and temperature.nc within each directory, then concat the results along 'time'. I suppose you could have nested_auto_combine() as a separate function, but I don't think that's really necessary: apart from the substitution concat_dim -> concat_dims, this is fully backwards-compatible.
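For illustration, here is a rough sketch of how that file layout could be combined with this branch's auto_combine; the paths are placeholders for the tree above, and I'm using plain open_dataset calls rather than open_mfdataset() to keep the nested structure explicit:

```python
import xarray as xr

# Build a 2D grid of datasets: the outer list runs over the time directories,
# the inner list over the variable files that should be merged together.
datasets = [
    [xr.open_dataset('root/time1/density.nc'),
     xr.open_dataset('root/time1/temperature.nc')],
    [xr.open_dataset('root/time2/density.nc'),
     xr.open_dataset('root/time2/temperature.nc')],
]

# Merge along the inner axis (None), then concatenate the merged results
# along 'time' (this branch's auto_combine accepts a list of concat_dims).
combined = xr.auto_combine(datasets, concat_dims=['time', None])
```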


You might also find it interesting to see how I've used this fork in my own code: I create the grid of datasets here, so that I can combine them here.


I have a question actually: currently, if the concat or merge fails, the error message won't clearly tell you which dimension it was trying to combine along when it failed. Is there a way to do that easily with try... except... statements? Something like

```python
for dim in concat_dims:
    try:
        _auto_combine_along_first_dim(...)
    except (MergeError, ValueError) as err:
        raise ValueError(f"Encountered {err} while trying to combine along dimension {dim}")
```


(Also, something else is now breaking in cftime on the Python 2.7 builds on AppVeyor...)
