issue_comments
6 rows where author_association = "NONE" and issue = 324350248 sorted by updated_at descending


ewquon (user 18267059) · created 2019-09-16T18:54:39Z · author_association: NONE
https://github.com/pydata/xarray/issues/2159#issuecomment-531910110

Thanks @dcherian. As you suggested, I ended up using v0.12.3 and xr.combine_by_coords() to get the expected behavior.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  issue: Concatenate across multiple dimensions with open_mfdataset (324350248)
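The fix this commenter landed on can be sketched with a minimal, self-contained example (hypothetical toy data; assumes xarray >= 0.12.3 and numpy are installed): `combine_by_coords` uses the coordinate values themselves to place each part along multiple dimensions, so the combined coordinates come out monotonic rather than repeated.

```python
import numpy as np
import xarray as xr

def make_part(x0, y0):
    # Hypothetical helper for illustration: one 2x3 tile of a 4x6 grid.
    x = np.arange(x0, x0 + 2)
    y = np.arange(y0, y0 + 3)
    data = np.full((2, 3), float(x0 + y0))
    return xr.Dataset({"v": (("x", "y"), data)}, coords={"x": x, "y": y})

# Four tiles, deliberately given out of order; combine_by_coords
# orders them by their coordinate values, not by list position.
parts = [make_part(2, 3), make_part(0, 0), make_part(2, 0), make_part(0, 3)]

combined = xr.combine_by_coords(parts)
print(dict(combined.sizes))  # sizes: x=4, y=6 -- monotonic coords, no repeats
```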
ewquon (user 18267059) · created 2019-09-05T21:29:14Z · author_association: NONE
https://github.com/pydata/xarray/issues/2159#issuecomment-528596388

I can confirm that this issue persists in v0.12.3 as well.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  issue: Concatenate across multiple dimensions with open_mfdataset (324350248)
ewquon (user 18267059) · created 2019-09-05T21:24:34Z · author_association: NONE
https://github.com/pydata/xarray/issues/2159#issuecomment-528594727

I'm running xarray v0.12.1, released on June 5 of this year, which should include @TomNicholas's fix merged back in December of last year. However, the original MWE still gives the unwanted result with the repeated coordinates.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  issue: Concatenate across multiple dimensions with open_mfdataset (324350248)
jnhansen (user 2622379) · created 2018-08-03T20:03:21Z · updated 2018-08-03T20:14:17Z · author_association: NONE
https://github.com/pydata/xarray/issues/2159#issuecomment-410361639

Yes, xarray should support that very easily -- assuming you have dask installed:

    ds = auto_merge('*.nc')
    ds.to_netcdf('larger_than_memory.nc')

auto_merge conserves the chunk sizes resulting from the individual files. If the single files are still too large to fit into memory individually, you can rechunk to smaller chunk sizes. The same goes, of course, for the original xarray.open_mfdataset.

I tested it on a ~25 GB dataset (on a machine with less memory than that).

Note: ds = auto_merge('*.nc') actually runs in a matter of milliseconds, as it merely provides a view of the merged dataset. Only once you call ds.to_netcdf('larger_than_memory.nc') does all the disk I/O happen.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  issue: Concatenate across multiple dimensions with open_mfdataset (324350248)
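The "view of the merged dataset" behaviour described in this comment comes from dask: combining chunked datasets only builds a task graph, and no data is read or computed until you write the result or call .compute(). A minimal sketch of the same idea with in-memory toy data (assumes xarray and dask are installed; auto_merge itself lives in the gist above and is not reproduced here):

```python
import numpy as np
import xarray as xr

# Two chunked (dask-backed) parts of a 1-D series, standing in for
# two netCDF files.
parts = [
    xr.Dataset({"v": ("t", np.arange(5.0))},
               coords={"t": np.arange(5)}).chunk({"t": 5}),
    xr.Dataset({"v": ("t", np.arange(5.0, 10.0))},
               coords={"t": np.arange(5, 10)}).chunk({"t": 5}),
]

lazy = xr.combine_by_coords(parts)    # builds a graph; no data is touched
print(type(lazy["v"].data).__name__)  # a dask array, not a numpy ndarray

result = lazy["v"].compute()          # only now does the work happen
print(result.values)
```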
jnhansen (user 2622379) · created 2018-08-03T19:07:10Z · updated 2018-08-03T19:12:21Z · author_association: NONE
https://github.com/pydata/xarray/issues/2159#issuecomment-410348249

I just had the exact same problem, and while I haven't yet had time to dig into the source code of xarray.open_mfdataset, I wrote my own function to achieve this:

https://gist.github.com/jnhansen/fa474a536201561653f60ea33045f4e2

Maybe it's helpful to some of you.

Note that I make the following assumptions (which are reasonable for my use case):

  • the data variables in each part are identical
  • equality of the first element of two coordinate arrays is sufficient to assume equality of the two coordinate arrays

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  issue: Concatenate across multiple dimensions with open_mfdataset (324350248)
aluhamaa (user 10668114) · created 2018-05-31T09:57:01Z · author_association: NONE
https://github.com/pydata/xarray/issues/2159#issuecomment-393479854

Just wanted to add the same request ;)

  • But I want to add that the current behavior, where the combined y is 10 20 30 40 50 60 10 20 30 40 50 60, is incorrect in the context of the CF conventions, which state that coordinates must be monotonic; I cannot see many other use cases where such a result would work.
  • Also, the documentation is currently not clear about what will happen if you have multiple dimensions with different ranges.

I also do not understand the real complexity of implementing it. As I understand the problem, the initial full dataset is some sort of N-d hypercube that has been split into parts along any number of dimensions. When reading multiple files, which are just parts of this hypercube, it should be enough to find the possible dimension values, form a hypercube, and place each file's content into the correct slot. What am I missing here?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  issue: Concatenate across multiple dimensions with open_mfdataset (324350248)
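The hypercube idea sketched in this comment is essentially what xarray's combine_nested does: you describe the grid of parts as a nested list whose structure mirrors the hypercube, and each part is placed into its slot. A toy sketch (hypothetical data; assumes a reasonably recent xarray, where combine_nested is available):

```python
import numpy as np
import xarray as xr

def tile(x0, y0):
    # Hypothetical helper for illustration: one 2x2 part of a 4x4 grid.
    return xr.Dataset(
        {"v": (("x", "y"), np.full((2, 2), float(10 * x0 + y0)))},
        coords={"x": np.arange(x0, x0 + 2), "y": np.arange(y0, y0 + 2)},
    )

# The nested-list structure mirrors the 2x2 grid of parts:
# the outer list is concatenated along "x", the inner lists along "y".
grid = [[tile(0, 0), tile(0, 2)],
        [tile(2, 0), tile(2, 2)]]

combined = xr.combine_nested(grid, concat_dim=["x", "y"])
print(dict(combined.sizes))  # sizes: x=4, y=4
```

Unlike combine_by_coords, combine_nested trusts the list structure rather than the coordinate values, so it works even when coordinates are missing, at the cost of requiring the caller to order the parts correctly.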

Table schema:
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette