issue_comments

11 rows where issue = 229474101, sorted by updated_at descending

user 3

  • rabernat 5
  • shoyer 4
  • jhamman 2

issue 1

  • concat prealigned objects 11

author_association 1

  • MEMBER 11
315896334 · jhamman (MEMBER) · created 2017-07-17T21:53:40Z
https://github.com/pydata/xarray/pull/1413#issuecomment-315896334

Okay thanks, closing now. We can always reopen this if necessary.

315354054 · rabernat (MEMBER) · created 2017-07-14T13:01:45Z · updated 2017-07-14T13:02:20Z
https://github.com/pydata/xarray/pull/1413#issuecomment-315354054

Yes, I think it should be closed. There are better ways to accomplish the desired goals.

Specifically, allowing the user to pass kwargs to concat via open_mfdataset would be useful.

315205652 · jhamman (MEMBER) · created 2017-07-13T21:20:41Z
https://github.com/pydata/xarray/pull/1413#issuecomment-315205652

@rabernat - I'm just catching up on this issue. Is your last comment indicating that we should close this PR?

302881933 · shoyer (MEMBER) · created 2017-05-20T16:00:15Z · updated 2017-07-13T21:20:10Z
https://github.com/pydata/xarray/pull/1413#issuecomment-302881933

Sounds good to me!

302843502 · rabernat (MEMBER) · created 2017-05-20T01:51:03Z
https://github.com/pydata/xarray/pull/1413#issuecomment-302843502

Since the expensive part (for me) is actually reading all the coordinates, I'm not sure that this PR makes sense any more.

The same thing I am going for here could probably be accomplished by allowing the user to pass join='exact' via open_mfdataset. A related optimization would be to allow the user to pass coords='minimal' (or other concat coords options) via open_mfdataset.
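
A rough sketch of what such a call could look like (the file glob is hypothetical, and the combine/concat_dim spelling follows the later xarray API that eventually grew these options; treat it as illustrative rather than the exact proposal here):

import xarray as xr

# Pass concat/align options straight through open_mfdataset instead of
# paying for a full alignment after every file is opened.
ds = xr.open_mfdataset(
    "output/run_*.nc",        # hypothetical file glob
    combine="nested",         # concatenate files in the given order
    concat_dim="time",        # dimension to concatenate along
    join="exact",             # raise instead of reindexing on any mismatch
    coords="minimal",         # only concatenate coords that actually vary
)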

For really big datasets, I think we will want to go with the NCML approach, generating the xarray metadata as a pre-processing step. Then we could add a function like open_ncml_dataset to xarray, which would parse this metadata and construct the dataset in a more efficient way (i.e. not reading redundant coordinates).

302804510 · shoyer (MEMBER) · created 2017-05-19T20:32:57Z
https://github.com/pydata/xarray/pull/1413#issuecomment-302804510

Well, we could potentially write a fast path constructor for loading multiple netcdf files that avoids open_dataset. We just need another way to specify the schema, e.g., using NCML.

On Fri, May 19, 2017 at 10:53 AM Ryan Abernathey notifications@github.com wrote:

As I think about this further, I realize it might be futile to avoid reading the dimensions from all the files. This is a basic part of how open_dataset works.


302724756 · rabernat (MEMBER) · created 2017-05-19T14:53:49Z
https://github.com/pydata/xarray/pull/1413#issuecomment-302724756

As I think about this further, I realize it might be futile to avoid reading the dimensions from all the files. This is a basic part of how open_dataset works.

302711547 · shoyer (MEMBER) · created 2017-05-19T14:04:05Z
https://github.com/pydata/xarray/pull/1413#issuecomment-302711547

What is xr.align(..., join='exact') supposed to do?

It verifies that all dimensions have the same length, and coordinates along all dimensions (used for indexing) also match. Unlike the normal version of align, it doesn't do any indexing -- the outputs are always the same as the inputs.

It does not check that the necessary dimensions and variables exist in all datasets. But we should do that as part of the logic in concat anyways, since the xarray data model always requires knowing variables and their dimensions.
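
A minimal illustration of this behavior, using join='exact' as it later shipped in xarray:

import numpy as np
import xarray as xr

a = xr.DataArray(np.zeros(3), dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray(np.ones(3), dims="x", coords={"x": [0, 1, 2]})
c = xr.DataArray(np.ones(3), dims="x", coords={"x": [0, 1, 3]})

# Matching indexes: the outputs are the inputs, no reindexing occurs.
xr.align(a, b, join="exact")

# Mismatched indexes: raises instead of computing a union/intersection.
try:
    xr.align(a, c, join="exact")
except ValueError as err:
    print(err)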

302576832 · rabernat (MEMBER) · created 2017-05-19T00:30:13Z · updated 2017-05-19T00:30:28Z
https://github.com/pydata/xarray/pull/1413#issuecomment-302576832

Given a collection of datasets, how do I know if setting prealigned=True will work?

I guess we would want to check that (a) the necessary variables and dimensions exist in all datasets and (b) the dimensions have the same length. We would want to bypass the actual reading of the indices. I agree it would be nicer to subsume this logic into align.

What is xr.align(..., join='exact') supposed to do?

What happens if things go wrong?

I can add more careful checks once we sort out the align question.
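
A hypothetical helper (not part of xarray) sketching checks (a) and (b) above, without loading any index values from disk:

def looks_prealigned(datasets, concat_dim):
    # (a) the same variables and dimensions must exist in every dataset
    first = datasets[0]
    for ds in datasets[1:]:
        if set(ds.variables) != set(first.variables):
            return False
        # (b) every dimension except the concat dimension must have the
        # same length; ds.sizes comes from metadata, not coordinate values
        for dim, size in ds.sizes.items():
            if dim != concat_dim and first.sizes.get(dim) != size:
                return False
    return True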

302511481 · shoyer (MEMBER) · created 2017-05-18T19:04:18Z
https://github.com/pydata/xarray/pull/1413#issuecomment-302511481

This enhancement makes a lot of sense to me.

Two things worth considering:

  1. Given a collection of datasets, how do I know if setting prealigned=True will work? This is where my PR adding xr.align(..., join='exact') could help (I can finish that up). Maybe it's worth adding xr.is_aligned or something similar.
  2. What happens if things go wrong? It's okay if the behavior is undefined (or could give wrong results), but we should document that. Ideally we should raise sensible errors at some later time, e.g., when the dask arrays are computed. This might or might not be possible to do efficiently with dask, if the results of all the equality checks are consolidated and added into the dask graphs of the results (see the sketch below).
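
A rough sketch of that deferred-check idea using plain dask.array (an assumption about how it might work, not xarray's actual machinery):

import dask.array as da
import numpy as np

def _check_block(block, expected=None, actual=None):
    # Runs per block at compute time, so misalignment surfaces late,
    # when the dask arrays are actually computed.
    if not np.array_equal(expected, actual):
        raise ValueError("datasets were not actually aligned")
    return block

data = da.ones((6, 3), chunks=(3, 3))
coord_a = np.array([0.0, 1.0, 2.0])
coord_b = np.array([0.0, 1.0, 2.0])

checked = da.map_blocks(_check_block, data,
                        expected=coord_a, actual=coord_b,
                        dtype=data.dtype)
checked.compute()  # the equality check runs here, not at graph build time
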
302496987 · rabernat (MEMBER) · created 2017-05-18T18:14:56Z · updated 2017-05-18T18:15:34Z
https://github.com/pydata/xarray/pull/1413#issuecomment-302496987

Let me expand on what this does.

Many netCDF datasets consist of multiple files with identical coordinates, except for one (e.g. time). With xarray we can open these datasets with open_mfdataset, which calls concat on the list of individual dataset objects. concat calls align, which loads all of the dimension indices (and, optionally, non-dimension coordinates) from each file and checks them for consistency / alignment.

This align step is potentially quite expensive for big collections of files with large indices. For example, an unstructured grid or particle-based dataset would just have a single dimension coordinate, with the same length as the data variables. If the user knows that the datasets are already aligned, this PR enables the alignment step to be skipped by passing the argument prealigned=True to concat. My goal is to avoid touching the disk as much as possible.
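
For concreteness, the proposed usage would look roughly like this (prealigned=True is this PR's proposal and was never merged into xarray; the file names are made up):

import xarray as xr

paths = ["run_000.nc", "run_001.nc", "run_002.nc"]   # hypothetical files
datasets = [xr.open_dataset(p) for p in paths]

# The caller asserts alignment up front, so concat can skip align()
# and never load the potentially huge dimension indexes.
combined = xr.concat(datasets, dim="time", prealigned=True)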

This PR is a work in progress. I still need to propagate the prealigned argument up to auto_combine and open_mfdataset.

An alternative API would be to add another option to the coords keyword, i.e. coords='prealigned'.

Feedback welcome.


Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);