
issue_comments

8 rows where issue = 970245117 sorted by updated_at descending

user (5 values)

  • Illviljan 4
  • shoyer 1
  • raybellwaves 1
  • TomNicholas 1
  • github-actions[bot] 1

author_association (2 values)

  • MEMBER 6
  • CONTRIBUTOR 2

issue (1 value)

  • Allow in-memory arrays with open_mfdataset · 8
Columns: id · html_url · issue_url · node_id · user · created_at · updated_at (sorted descending) · author_association · body · reactions · performed_via_github_app · issue
1528693660 · https://github.com/pydata/xarray/pull/5704#issuecomment-1528693660 · https://api.github.com/repos/pydata/xarray/issues/5704 · IC_kwDOAMm_X85bHgOc · Illviljan (14371165) · created 2023-04-29T06:56:37Z · updated 2023-04-29T06:58:26Z · MEMBER

Those issues would indeed have to be fixed if opening files lazily is the only option for xarray.

But xarray could also accept that chunks=None will (for now) load all the files into memory. If that's OK, I believe we can merge this now; I suspect there are a few in-memory users out there who could make use of this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow in-memory arrays with open_mfdataset 970245117
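
A minimal sketch of the behaviour described in the comment above, assuming the PR's proposed semantics: an explicit chunks=None would return in-memory NumPy-backed arrays, while chunks={} keeps the dask-backed behaviour. The file names are placeholders.

import xarray as xr

# Default / chunks={}: the combined dataset is dask-backed and lazy.
lazy = xr.open_mfdataset(["part1.nc", "part2.nc"], chunks={})

# Under the proposal, an explicit chunks=None would instead read every file
# eagerly, producing plain NumPy arrays in memory.
eager = xr.open_mfdataset(["part1.nc", "part2.nc"], chunks=None)
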
1527695510 · https://github.com/pydata/xarray/pull/5704#issuecomment-1527695510 · https://api.github.com/repos/pydata/xarray/issues/5704 · IC_kwDOAMm_X85bDsiW · TomNicholas (35968931) · created 2023-04-28T14:57:54Z · updated 2023-04-28T14:57:54Z · MEMBER

For the benefit of anyone else reading this after coming from https://github.com/pydata/xarray/issues/7792 or similar questions: see https://github.com/pydata/xarray/issues/4628 and https://github.com/pydata/xarray/issues/5081 for what needs to be done. Also see the discussion in https://github.com/pydata/xarray/issues/6807 for non-dask lazy backends.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow in-memory arrays with open_mfdataset 970245117
898353772 · https://github.com/pydata/xarray/pull/5704#issuecomment-898353772 · https://api.github.com/repos/pydata/xarray/issues/5704 · IC_kwDOAMm_X841i8ps · github-actions[bot] (41898282) · created 2021-08-13T10:18:22Z · updated 2021-10-29T23:04:58Z · CONTRIBUTOR

Unit Test Results

6 files · 6 suites · 55m 1s :stopwatch:
16 325 tests: 14 581 :heavy_check_mark: · 1 744 :zzz: · 0 :x:
91 146 runs: 82 854 :heavy_check_mark: · 8 292 :zzz: · 0 :x:

Results for commit 34442811.

:recycle: This comment has been updated with latest results.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow in-memory arrays with open_mfdataset 970245117
903442749 · https://github.com/pydata/xarray/pull/5704#issuecomment-903442749 · https://api.github.com/repos/pydata/xarray/issues/5704 · IC_kwDOAMm_X8412XE9 · Illviljan (14371165) · created 2021-08-23T04:55:06Z · updated 2021-08-23T04:55:06Z · MEMBER

Loading the arrays into memory is exactly what you would expect if a user insists on using chunks=None, right?

I just changed the default value to {}, so it now behaves as it did previously, but with the possibility of loading into memory for whatever reason you might have with small files.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow in-memory arrays with open_mfdataset 970245117
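
As a hedged illustration of the default described in the comment above, one can check which backing the result has; the variable name t2m and the file names are made up.

import dask.array as da
import xarray as xr

ds = xr.open_mfdataset(["a.nc", "b.nc"])            # default chunks={}: behaves as before
print(isinstance(ds["t2m"].data, da.Array))         # True -> still dask-backed

ds_mem = xr.open_mfdataset(["a.nc", "b.nc"], chunks=None)
print(isinstance(ds_mem["t2m"].data, da.Array))     # False under the proposal: NumPy in memory
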
903347552 · https://github.com/pydata/xarray/pull/5704#issuecomment-903347552 · https://api.github.com/repos/pydata/xarray/issues/5704 · IC_kwDOAMm_X8411_1g · shoyer (1217238) · created 2021-08-22T23:27:45Z · updated 2021-08-22T23:27:45Z · MEMBER

The reason open_mfdataset always uses dask is that otherwise it would not be lazy: the netCDF files would be immediately read into memory as NumPy arrays. open_dataset uses Xarray's own internal lazy indexing machinery, but that machinery doesn't (yet) support lazy concatenation or broadcasting, so it doesn't suffice for open_mfdataset.

We certainly could make a change along these lines, but I would not do so by default. Alternatively, I would add support for lazy concatenation to xarray's lazy indexing, and then we could slowly roll out a breaking change (with an appropriate FutureWarning, etc.).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow in-memory arrays with open_mfdataset 970245117
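
An illustrative sketch of the distinction drawn in the comment above: open_dataset stays lazy through xarray's internal indexing wrappers even without dask, whereas open_mfdataset has to concatenate across files and therefore relies on dask for laziness. File names are placeholders.

import xarray as xr

# A single file opened without dask is still lazy: xarray wraps the netCDF
# variables in its own lazy-indexing classes and reads nothing until values
# are actually accessed.
single = xr.open_dataset("part1.nc")

# Combining several files requires concatenation, which the internal lazy
# machinery does not support, so open_mfdataset uses dask chunked arrays to
# stay lazy.
combined = xr.open_mfdataset(["part1.nc", "part2.nc"])
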
903176445 · https://github.com/pydata/xarray/pull/5704#issuecomment-903176445 · https://api.github.com/repos/pydata/xarray/issues/5704 · IC_kwDOAMm_X8411WD9 · Illviljan (14371165) · created 2021-08-21T21:06:38Z · updated 2021-08-21T21:06:38Z · MEMBER

One way of making this less controversial is to also change the default value of chunks from None to {} (see https://github.com/pydata/xarray/blob/48a9dbe7d8dc2361bc985dd9fb1193a26135b310/xarray/backends/api.py#L696). Then the default settings will behave the same as before, although this is still not consistent with xr.open_dataset's default parameters, which open_mfdataset is just a thin wrapper around.

It is indeed bad practice to use a dict as a default value, but it is not completely uncommon; see for example: https://github.com/pydata/xarray/blob/48a9dbe7d8dc2361bc985dd9fb1193a26135b310/xarray/core/dataset.py#L2111

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow in-memory arrays with open_mfdataset 970245117
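
A minimal sketch of the usual way to sidestep the mutable-default concern raised above, using a module-level sentinel; the function name is hypothetical and this is not the actual xarray signature.

# Sentinel object so the signature carries no mutable default.
_DEFAULT_CHUNKS = object()

def open_mfdataset_sketch(paths, chunks=_DEFAULT_CHUNKS, **kwargs):
    if chunks is _DEFAULT_CHUNKS:
        chunks = {}  # fresh dict per call: same effect as a {} default, without sharing it
    # chunks=None is then left free to mean "load everything into memory".
    ...
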
901583625 · https://github.com/pydata/xarray/pull/5704#issuecomment-901583625 · https://api.github.com/repos/pydata/xarray/issues/5704 · IC_kwDOAMm_X841vRMJ · raybellwaves (17162724) · created 2021-08-19T03:37:09Z · updated 2021-08-19T03:37:19Z · CONTRIBUTOR

See https://github.com/pydata/xarray/discussions/5689 for a reference to this PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow in-memory arrays with open_mfdataset 970245117
898356703 · https://github.com/pydata/xarray/pull/5704#issuecomment-898356703 · https://api.github.com/repos/pydata/xarray/issues/5704 · IC_kwDOAMm_X841i9Xf · Illviljan (14371165) · created 2021-08-13T10:23:23Z · updated 2021-08-13T10:27:30Z · MEMBER

There are a lot of failing tests, but they seem to just assume that open_mfdataset always returns dask arrays by default. Fixing them is probably as simple as adding chunks={} in all these tests, but this is quite a breaking change.

Do you know the reason why chunks=chunks or {} is used in open_mfdataset, @aurghs?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow in-memory arrays with open_mfdataset 970245117
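
A hedged sketch of the kind of test assumption described in the comment above (not an actual test from the xarray suite): the assertion only holds when the result is dask-backed, so passing chunks={} explicitly keeps it valid if the default ever changes.

import dask.array as da
import xarray as xr

def test_open_mfdataset_returns_dask(tmp_netcdf_paths):  # the fixture name is hypothetical
    ds = xr.open_mfdataset(tmp_netcdf_paths, chunks={})  # explicit chunks keeps dask backing
    assert all(isinstance(var.data, da.Array) for var in ds.data_vars.values())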

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
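
For reference, a minimal sketch of reproducing the view above (8 rows where issue = 970245117, sorted by updated_at descending) against a local copy of this database; the file name github.db is an assumption.

import sqlite3

con = sqlite3.connect("github.db")
rows = con.execute(
    "SELECT id, [user], created_at, updated_at, author_association, body "
    "FROM issue_comments WHERE issue = ? ORDER BY updated_at DESC",
    (970245117,),
).fetchall()
print(len(rows))  # expected: 8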