html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/1413#issuecomment-315896334,https://api.github.com/repos/pydata/xarray/issues/1413,315896334,MDEyOklzc3VlQ29tbWVudDMxNTg5NjMzNA==,2443309,2017-07-17T21:53:40Z,2017-07-17T21:53:40Z,MEMBER,"Okay thanks, closing now. We can always reopen this if necessary.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229474101
https://github.com/pydata/xarray/pull/1413#issuecomment-315354054,https://api.github.com/repos/pydata/xarray/issues/1413,315354054,MDEyOklzc3VlQ29tbWVudDMxNTM1NDA1NA==,1197350,2017-07-14T13:01:45Z,2017-07-14T13:02:20Z,MEMBER,"Yes, I think it should be closed. There are better ways to accomplish the desired goals.
Specifically, allowing the user to pass kwargs to concat via open_mfdataset would be useful.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229474101
https://github.com/pydata/xarray/pull/1413#issuecomment-315205652,https://api.github.com/repos/pydata/xarray/issues/1413,315205652,MDEyOklzc3VlQ29tbWVudDMxNTIwNTY1Mg==,2443309,2017-07-13T21:20:41Z,2017-07-13T21:20:41Z,MEMBER,@rabernat - I'm just catching up on this issue. Is your last comment indicating that we should close this PR?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229474101
https://github.com/pydata/xarray/pull/1413#issuecomment-302881933,https://api.github.com/repos/pydata/xarray/issues/1413,302881933,MDEyOklzc3VlQ29tbWVudDMwMjg4MTkzMw==,1217238,2017-05-20T16:00:15Z,2017-07-13T21:20:10Z,MEMBER,Sounds good to me!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229474101
https://github.com/pydata/xarray/pull/1413#issuecomment-302843502,https://api.github.com/repos/pydata/xarray/issues/1413,302843502,MDEyOklzc3VlQ29tbWVudDMwMjg0MzUwMg==,1197350,2017-05-20T01:51:03Z,2017-05-20T01:51:03Z,MEMBER,"Since the expensive part (for me) is actually reading all the coordinates, I'm not sure that this PR makes sense any more.
The same thing I am going for here could probably be accomplished by allowing the user to pass `join='exact'` via `open_mfdataset`. A related optimization would be to allow the user to pass `coords='minimal'` (or other `concat` coords options) via `open_mfdataset`.
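The workaround described above can be sketched with current xarray by calling `concat` directly with the desired options (the in-memory datasets here are purely illustrative; `open_mfdataset` did not forward these kwargs at the time of this discussion):

```python
import numpy as np
import xarray as xr

# Build two single-timestep datasets standing in for per-file datasets.
parts = [
    xr.Dataset(
        {'tas': (('time', 'x'), np.random.rand(1, 3))},
        coords={'time': [t], 'x': [0, 1, 2]},
    )
    for t in range(2)
]

# coords='minimal' concatenates only coordinates that vary along the
# concat dimension; join='exact' raises instead of reindexing when the
# non-concatenated indexes disagree.
combined = xr.concat(parts, dim='time', coords='minimal', join='exact')
```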
For really big datasets, I think we will want to go the NCML approach, generating the xarray metadata as a pre-processing step. Then we could add a function like `open_ncml_dataset` to xarray which would parse this metadata and construct the dataset in a more efficient way (i.e. not reading redundant coordinates).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229474101
https://github.com/pydata/xarray/pull/1413#issuecomment-302804510,https://api.github.com/repos/pydata/xarray/issues/1413,302804510,MDEyOklzc3VlQ29tbWVudDMwMjgwNDUxMA==,1217238,2017-05-19T20:32:57Z,2017-05-19T20:32:57Z,MEMBER,"Well, we could potentially write a fast path constructor for loading multiple netCDF files that avoids open_dataset. We just need another way to specify the schema, e.g., using NCML.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229474101
https://github.com/pydata/xarray/pull/1413#issuecomment-302724756,https://api.github.com/repos/pydata/xarray/issues/1413,302724756,MDEyOklzc3VlQ29tbWVudDMwMjcyNDc1Ng==,1197350,2017-05-19T14:53:49Z,2017-05-19T14:53:49Z,MEMBER,"As I think about this further, I realize it might be futile to avoid reading the dimensions from all the files. This is a basic part of how `open_dataset` works.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229474101
https://github.com/pydata/xarray/pull/1413#issuecomment-302711547,https://api.github.com/repos/pydata/xarray/issues/1413,302711547,MDEyOklzc3VlQ29tbWVudDMwMjcxMTU0Nw==,1217238,2017-05-19T14:04:05Z,2017-05-19T14:04:05Z,MEMBER,"> What is `xr.align(..., join='exact')` supposed to do?
It verifies that all dimensions have the same length, and coordinates along all dimensions (used for indexing) also match. Unlike the normal version of `align`, it doesn't do any indexing -- the outputs are always the same as the inputs.
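The behavior described above can be sketched as follows (a minimal in-memory example of `join='exact'`; `AlignmentError` is a `ValueError` subclass in current xarray):

```python
import xarray as xr

a = xr.Dataset(coords={'x': [0, 1, 2]})
b = xr.Dataset(coords={'x': [0, 1, 2]})

# Indexes already match: the outputs are the inputs, unchanged.
a2, b2 = xr.align(a, b, join='exact')

# Mismatched indexes raise instead of being reindexed.
c = xr.Dataset(coords={'x': [0, 1, 3]})
try:
    xr.align(a, c, join='exact')
except ValueError as err:
    print('alignment failed:', err)
```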
It *does not* check that the necessary dimensions and variables exist in all datasets. But we should do that as part of the logic in `concat` anyways, since the xarray data model always requires knowing variables and their dimensions.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229474101
https://github.com/pydata/xarray/pull/1413#issuecomment-302576832,https://api.github.com/repos/pydata/xarray/issues/1413,302576832,MDEyOklzc3VlQ29tbWVudDMwMjU3NjgzMg==,1197350,2017-05-19T00:30:13Z,2017-05-19T00:30:28Z,MEMBER,"> Given a collection of datasets, how do I know if setting prealigned=True will work?
I guess we would want to check that (a) the necessary variables and dimensions exist in all datasets and (b) the dimensions have the same _length_. We would want to bypass the actual reading of the indices. I agree it would be nicer to subsume this logic into `align`.
What is `xr.align(..., join='exact')` supposed to do?
> What happens if things go wrong?
I can add more careful checks once we sort out the align question.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229474101
https://github.com/pydata/xarray/pull/1413#issuecomment-302511481,https://api.github.com/repos/pydata/xarray/issues/1413,302511481,MDEyOklzc3VlQ29tbWVudDMwMjUxMTQ4MQ==,1217238,2017-05-18T19:04:18Z,2017-05-18T19:04:18Z,MEMBER,"This enhancement makes a lot of sense to me.
Two things worth considering:
1. Given a collection of datasets, how do I know if setting `prealigned=True` will work? This is where my PR adding `xr.align(..., join='exact')` could help (I can finish that up). Maybe it's worth adding `xr.is_aligned` or something similar.
2. What happens if things go wrong? It's okay if the behavior is undefined (or could give wrong results) but we should document that. Ideally we should raise sensible errors at some later time, e.g., when the dask arrays are computed. This might or might not be possible to do efficiently with dask, if the results of all the equality checks are consolidated and added into the dask graphs of the results.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229474101
https://github.com/pydata/xarray/pull/1413#issuecomment-302496987,https://api.github.com/repos/pydata/xarray/issues/1413,302496987,MDEyOklzc3VlQ29tbWVudDMwMjQ5Njk4Nw==,1197350,2017-05-18T18:14:56Z,2017-05-18T18:15:34Z,MEMBER,"Let me expand on what this does.
Many netCDF datasets consist of multiple files with identical coordinates, except for one (e.g. time). With xarray we can open these datasets with `open_mfdataset`, which calls `concat` on the list of individual dataset objects. `concat` calls `align`, which loads all of the dimension indices (and, optionally, non-dimension coordinates) from each file and checks them for consistency / alignment.
This `align` step is potentially quite expensive for big collections of files with large indices. For example, an unstructured grid or particle-based dataset would just have a single dimension coordinate, with the same length as the data variables. If the user knows that the datasets are already aligned, this PR enables the alignment step to be skipped by passing the argument `prealigned=True` to `concat`. My goal is to avoid touching the disk as much as possible.
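(Editorial note: the `prealigned=True` option proposed here never landed under that name, but current xarray offers a comparable escape hatch, `join='override'`, which copies indexes from the first object instead of comparing them per file. A minimal in-memory sketch:)

```python
import numpy as np
import xarray as xr

# Per-file datasets sharing one large non-concatenated dimension,
# standing in for an unstructured-grid or particle dataset.
parts = [
    xr.Dataset(
        {'u': (('time', 'cell'), np.random.rand(1, 4))},
        coords={'time': [t], 'cell': np.arange(4)},
    )
    for t in range(3)
]

# join='override' trusts that the 'cell' indexes already match, so the
# per-file index comparison (and the disk reads it implies) is skipped.
combined = xr.concat(parts, dim='time', join='override')
```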
This PR is a draft in progress. I still need to propagate the `prealigned` argument up to `auto_combine` and `open_mfdataset`.
An alternative API would be to add another option to the `coords` keyword, i.e. `coords='prealigned'`.
Feedback welcome. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,229474101