
issue_comments


10 rows where issue = 1465047346 sorted by updated_at descending





Issue: (Issue #7324) added functions that return data values in memory efficient manner · 10 comments
dcherian (MEMBER) · 2023-02-07T18:25:17Z · https://github.com/pydata/xarray/pull/7323#issuecomment-1421254445

Thanks @adanb13

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
adanb13 (NONE) · 2023-02-06T23:10:39Z · https://github.com/pydata/xarray/pull/7323#issuecomment-1419917480

@jhamman yes, I think it's alright to close. The issue seems to arise from the use of .tolist(): converting every value to a Python float, so that it can be returned in a JSON-serializable dictionary, is what causes the memory spike.

I will try @Illviljan's suggestion (thanks!). to_json calls .compute, but I'm not sure whether it calls .tolist(); if it avoids it, that should result in greater memory efficiency.
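A minimal sketch (not from the thread, only illustrating the effect described above) of why .tolist() spikes memory: NumPy stores the array in one contiguous buffer, while the list version boxes every element as a separate Python float object.

```python
import sys
import numpy as np

arr = np.arange(1_000_000, dtype=np.float64)
print(arr.nbytes)  # 8_000_000 bytes: one contiguous buffer

as_list = arr.tolist()  # one boxed Python float per element
# list overhead (one pointer per element) plus per-object float overhead
total = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
print(total)  # several times larger than arr.nbytes
```

On CPython each boxed float costs roughly 24 bytes on top of the list's pointer, so the list representation is typically 3-4x the size of the raw buffer before any JSON encoding even starts.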

jhamman (MEMBER) · 2023-01-31T23:41:29Z · https://github.com/pydata/xarray/pull/7323#issuecomment-1411223051

@adanb13 - do you have plans to revisit this PR? If not, do you mind if we close it for now? Based on the comments above, I think an issue discussing the use case and potential solutions would be a good next step.

Illviljan (MEMBER) · 2022-11-27T20:15:53Z (edited 2022-11-27T20:16:24Z) · https://github.com/pydata/xarray/pull/7323#issuecomment-1328331087

How about converting the dataset to a dask dataframe?

```python
ddf = ds.to_dask_dataframe()
ddf.to_json(filename)
```

shoyer (MEMBER) · 2022-11-27T02:31:51Z · https://github.com/pydata/xarray/pull/7323#issuecomment-1328156723

> Use cases would be in any web service that would like to provide the final data values back to a user in JSON.

For what it's worth, I think your users will have a poor experience with encoded JSON data for very large arrays. It will be slow to compress and transfer this data.

In the long term, you would probably do better to transmit the data in some binary form (e.g., by calling tobytes() on the underlying np.ndarray objects, or by using Xarray's to_netcdf).
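As a sketch of the binary route suggested here (the round-trip via np.frombuffer is my illustration, not something the thread spells out), tobytes() ships the raw buffer without creating any per-element Python objects:

```python
import numpy as np

arr = np.arange(4, dtype=np.float64)
payload = arr.tobytes()  # raw bytes of the underlying buffer
print(len(payload))  # 32 bytes: 4 elements * 8 bytes each

# receiving side: reinterpret the bytes using the same dtype
restored = np.frombuffer(payload, dtype=np.float64)
print(np.array_equal(arr, restored))  # True
```

The receiver does need to know the dtype and shape out of band, which is exactly what a self-describing container like netCDF (via to_netcdf) handles for you.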

shoyer (MEMBER) · 2022-11-27T02:27:07Z · https://github.com/pydata/xarray/pull/7323#issuecomment-1328156304

Thanks for the report and the PR!

This really needs a "minimal complete verifiable" example (e.g., creating and loading a Zarr array with random data) so others can verify your reported performance gains:
https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
https://stackoverflow.com/help/minimal-reproducible-example

To be honest, this fix looks a little funny to me, because NumPy's own implementation of tolist() is so similar. I would love to understand what is going on.

If you can reproduce the issue using only NumPy, it could also make more sense to file this as an upstream bug report to NumPy. The NumPy maintainers are in a better position to debug tricky memory allocation issues involving NumPy.

adanb13 (NONE) · 2022-11-27T00:48:34Z (edited 2022-11-27T02:00:10Z) · https://github.com/pydata/xarray/pull/7323#issuecomment-1328142597

> I'm not sure if this breaks the data model of xarray, leaving inconsistent sizes?
>
> Also, this seems like a very corner use case; I don't think it is intended to write DataArrays in ASCII.
>
> But I let some more senior devs of xarray be the judge here :)

Made these for work (big data, government). It's useful when trying to provide data values back to the end user after all data manipulation has been done (i.e., the initial xarray.DataArray is no longer needed).

The best native solution that exists (from what I see) is .to_dict(), which is memory inefficient (I had memory errors at work when trying it, hence all the tests provided in the gist). Basically, a user can call to_dict with data=False and then add the data values to the resulting dictionary using the above two functions in a more memory-efficient way.

Use cases would be in any web service that would like to provide the final data values back to a user in JSON.
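The "add the data values in a more memory-efficient way" idea could, for illustration, look like streaming the values into a JSON array chunk by chunk instead of materializing one big list. This is only a stdlib sketch of the approach, not the PR's actual code; iter_json_array is a hypothetical helper:

```python
import json

def iter_json_array(values, chunk_size=1000):
    # Hypothetical helper: stream a JSON array piece by piece so that
    # only chunk_size values are boxed as Python objects at any one time.
    yield "["
    first = True
    buf = []
    for v in values:
        buf.append(v)
        if len(buf) >= chunk_size:
            piece = json.dumps(buf)[1:-1]  # strip the enclosing brackets
            yield piece if first else ", " + piece
            first = False
            buf = []
    if buf:
        piece = json.dumps(buf)[1:-1]
        yield piece if first else ", " + piece
    yield "]"

# Usage: write each piece to a file or response body incrementally
# instead of calling json.dump on one huge list.
text = "".join(iter_json_array(range(5), chunk_size=2))
print(text)  # [0, 1, 2, 3, 4]
```

In a web service, the generator's pieces can be fed straight into a chunked HTTP response, so peak memory is bounded by chunk_size rather than by the array's length.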

headtr1ck (COLLABORATOR) · 2022-11-26T21:47:34Z · https://github.com/pydata/xarray/pull/7323#issuecomment-1328119511

The failing doctest is unrelated, you can ignore it.

headtr1ck (COLLABORATOR) · 2022-11-26T21:46:20Z · https://github.com/pydata/xarray/pull/7323#issuecomment-1328119375

I'm not sure if this breaks the data model of xarray, leaving inconsistent sizes?

Also, this seems like a very corner use case; I don't think it is intended to write DataArrays in ASCII.

But I let some more senior devs of xarray be the judge here :)

adanb13 (NONE) · 2022-11-26T05:09:04Z · https://github.com/pydata/xarray/pull/7323#issuecomment-1327982136

Ran python -m pip uninstall urllib3-secure-extra for the doctests failure as suggested. Got the following message: WARNING: Skipping urllib3-secure-extra as it is not installed.



CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette