
issue_comments: 435336049


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/2159#issuecomment-435336049 https://api.github.com/repos/pydata/xarray/issues/2159 435336049 MDEyOklzc3VlQ29tbWVudDQzNTMzNjA0OQ== 35968931 2018-11-02T10:29:24Z 2018-11-02T11:07:17Z MEMBER

I was thinking about the general solution to this problem again and wanted to clarify some things.

Currently concat() will concatenate datasets in the order they are supplied, and will not check that the resulting dimension indexes are monotonic. This behaviour violates CF conventions (as mentioned by @aluhamaa) but currently passes silently.

I think that any general multi-dimensional version of the auto_combine() function (and therefore open_mfdataset()) should:

1) If possible use the values in the dimension indexes to arrange the datasets so that the indexes are monotonic,

2) Else issue a warning that some of the indexes supplied are not monotonic,

3) Then fall back to concatenating the datasets in the order supplied (for some N-dimensional definition of "order"). The warning should tell the user that this is what it is doing.
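
To make those three steps concrete, here is a rough 1D sketch using plain Python in place of real datasets. The `order_datasets` helper and the dict-with-`"index"` representation are hypothetical stand-ins for illustration only, not xarray API, and each index is assumed to be non-empty:

```python
import warnings


def is_monotonic_increasing(values):
    """True if each value is >= the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def order_datasets(datasets):
    """Decide the concatenation order for stand-in 1D "datasets".

    Each dataset is a dict with a non-empty "index" list (a hypothetical
    stand-in for a dimension index). Returns the datasets in the order
    they should be concatenated.
    """
    # Step 1: try to arrange the datasets by the first value of each index.
    candidate = sorted(datasets, key=lambda ds: ds["index"][0])
    combined = [v for ds in candidate for v in ds["index"]]
    if is_monotonic_increasing(combined):
        return candidate

    # Steps 2 and 3: warn, then fall back to the order supplied.
    warnings.warn(
        "Indexes cannot be arranged monotonically; "
        "concatenating datasets in the order supplied."
    )
    return list(datasets)
```

A real implementation would presumably need to do this per dimension and decide how to treat decreasing indexes, which this sketch ignores.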

This approach would then be backwards-compatible and would accommodate users whose data does not have monotonic indexes (they would just have to arrange their datasets into the correct order themselves first), while still doing the obviously correct thing in unambiguous cases.

However, this would mean that users wanting to do a multi-dimensional auto_combine on data without monotonic indexes would have to supply their datasets in some way that specifies their desired N-dimensional ordering. This could be done as a list of lists, combining the innermost dimensions first, e.g. [[x1y1, x2y1], [x1y2, x2y2]], concat_dims=['y', 'x']. But auto_combine would then have to be able to handle this type of input, and quickly distinguish between the two cases of monotonic and non-monotonic indexes. Is this the behaviour we want?
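
As an illustration of what "combining the innermost dimensions first" could look like, here is a minimal sketch in which each dataset is replaced by a 2D tile stored as a list of rows (rows run along y, columns along x). `concat_along` and `combine_nested` are hypothetical names, not existing xarray functions:

```python
def concat_along(tiles, dim):
    """Concatenate 2D tiles (lists of rows) along "y" (rows) or "x" (columns)."""
    if dim == "y":
        # Stack the tiles on top of each other.
        return [row for tile in tiles for row in tile]
    if dim == "x":
        # Join the i-th row of every tile end to end.
        return [sum(rows, []) for rows in zip(*tiles)]
    raise ValueError(f"unknown dimension: {dim!r}")


def combine_nested(nested, concat_dims):
    """Combine a nested list of tiles, innermost dimension first."""
    if not concat_dims:
        return nested  # a single tile, nothing left to combine
    first, *rest = concat_dims
    # Recurse so that the innermost lists are combined before the outer ones.
    combined = [combine_nested(sub, rest) for sub in nested]
    return concat_along(combined, first)


x1y1, x2y1 = [[1, 2]], [[3]]
x1y2, x2y2 = [[4, 5]], [[6]]
grid = combine_nested([[x1y1, x2y1], [x1y2, x2y2]], concat_dims=["y", "x"])
```

Here the inner lists are joined along 'x' and the results are then stacked along 'y', so the order of the nesting alone fixes where each tile ends up.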

Also I'm assuming we are not going to provide functionality to handle uneven sub-lists, e.g. [[t1x1, t1x2], [t2x1, t2x2, t2x3]]?

Edit:

I've just realised that there is a lot of related discussion in #2039, #1385, & #1823. I suppose what I'm suggesting here is essentially the N-D generalisation of the approach discussed in those issues, namely an extra argument prealigned for open_mfdataset(), which defaults to False. Then with prealigned=True, the required input would be a nested list of (paths to) datasets, which is nested the same number of times as there are dimensions in concat_dims. Then to recreate the current behaviour for an ordered 1D list of datasets with non-monotonic indexes you would only have to pass prealigned=True.
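
A sketch of the input check such a prealigned=True path might need, assuming (as above) that uneven sub-lists are rejected rather than supported. `check_nesting` is a hypothetical helper, and plain strings stand in for paths to datasets:

```python
def check_nesting(paths, concat_dims):
    """Return the shape of a nested list of paths, or raise ValueError.

    The nesting depth must equal len(concat_dims), and the sub-lists at
    each level must all have the same length.
    """
    def shape_of(item):
        if not isinstance(item, list):
            return ()  # a leaf, e.g. a path string
        shapes = {shape_of(sub) for sub in item}
        if len(shapes) > 1:
            raise ValueError("uneven or inconsistently nested sub-lists")
        inner = shapes.pop() if shapes else ()
        return (len(item),) + inner

    shape = shape_of(paths)
    if len(shape) != len(concat_dims):
        raise ValueError(
            f"expected nesting depth {len(concat_dims)}, got {len(shape)}"
        )
    return shape
```

With this check, a flat ordered list passes for a single concat dimension, while something like [[t1x1, t1x2], [t2x1, t2x2, t2x3]] raises an error up front.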

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  324350248