
issue_comments


10 rows where issue = 184722754 sorted by updated_at descending

user 3

  • shoyer 5
  • crusaderky 3
  • max-sixty 2

issue 1

  • shallow copies become deep copies when pickling 10

author_association 1

  • MEMBER 10
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
277549915 https://github.com/pydata/xarray/issues/1058#issuecomment-277549915 https://api.github.com/repos/pydata/xarray/issues/1058 MDEyOklzc3VlQ29tbWVudDI3NzU0OTkxNQ== shoyer 1217238 2017-02-05T21:13:41Z 2017-02-05T21:13:41Z MEMBER

Alternatively, it could make sense to change pickle upstream in NumPy to special-case arrays with a stride of 0 along some dimension.
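
A minimal sketch of the stride-0 situation being discussed (illustrative only, not code from the thread): np.broadcast_to creates a view with a stride of 0 along the broadcast dimension, and pickling that view serializes the full expanded data.

# A 0-stride broadcast view costs almost no memory, but pickle
# materializes the whole logical array.
import pickle
import numpy as np

scalar = np.array(1.0)                      # shape (), 8 bytes of data
view = np.broadcast_to(scalar, (2**19,))    # shape (524288,), strides (0,)

print(view.strides)                         # (0,)
print(len(pickle.dumps(scalar)))            # a few hundred bytes
print(len(pickle.dumps(view)))              # ~4 MB: the view is expanded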

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  shallow copies become deep copies when pickling 184722754
277549355 https://github.com/pydata/xarray/issues/1058#issuecomment-277549355 https://api.github.com/repos/pydata/xarray/issues/1058 MDEyOklzc3VlQ29tbWVudDI3NzU0OTM1NQ== shoyer 1217238 2017-02-05T21:06:19Z 2017-02-05T21:06:19Z MEMBER

@crusaderky Yes, I think it could be reasonable to unify array types when you call broadcast() or align(), either as optional behavior or by changing the default.

If your scalar array is the result of an expensive dask calculation, this also might be a good use case for dask's new .persist() method (https://github.com/dask/dask/issues/1908), which we could add to xarray as an alternative to .compute().
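
A rough usage sketch of that idea (.persist() was only a proposal when this was written, though it has since been added to both dask and xarray; the shapes here are arbitrary):

# Keep an expensive intermediate result dask-backed with .persist(),
# instead of pulling it into memory as a numpy-backed object with .compute().
import dask.array as da
import xarray as xr

arr = xr.DataArray(da.ones(2**19, chunks=2**15), dims="x")
expensive = (arr * 2).sum()

as_numpy = expensive.compute()    # numpy-backed result, fully in memory
kept_lazy = expensive.persist()   # computation runs, result stays dask-backed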

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  shallow copies become deep copies when pickling 184722754
277543644 https://github.com/pydata/xarray/issues/1058#issuecomment-277543644 https://api.github.com/repos/pydata/xarray/issues/1058 MDEyOklzc3VlQ29tbWVudDI3NzU0MzY0NA== crusaderky 6213168 2017-02-05T19:44:33Z 2017-02-05T19:44:33Z MEMBER

Actually, I am very much still facing the problem. The biggest issue now is when I need to invoke xarray.broadcast. In my use case, I'm broadcasting together:

  • a scalar array with numpy backend, shape=(), chunks=None
  • a 1D array with dask backend, shape=(2**19,), chunks=(2**15,)

What broadcast does is transform the scalar array into a numpy array of 2**19 elements. This is actually a view on the original 0D array, so it has negligible RAM requirements. But after pickling and unpickling, it becomes a real 2**19-element array. Add up a few hundred of them, and I am facing GBs of wasted RAM.

A solution would be to change broadcast() to convert to dask before broadcasting, and then broadcast directly to the proper chunk size.
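
A rough sketch of that workaround (illustrative only; .chunk() on the scalar is used here as a stand-in for broadcast() doing the conversion itself, and the shapes mirror the ones above):

# Broadcasting a numpy-backed scalar against a dask-backed array yields a
# large numpy view, which pickles as a full copy. Chunking the scalar first
# keeps the broadcast result dask-backed and lazy.
import dask.array as da
import numpy as np
import xarray as xr

scalar = xr.DataArray(np.float64(1.0))                          # numpy backend, shape ()
vector = xr.DataArray(da.zeros(2**19, chunks=2**15), dims="x")  # dask backend

b_numpy, _ = xr.broadcast(scalar, vector)          # numpy-backed (2**19,) view
b_dask, _ = xr.broadcast(scalar.chunk(), vector)   # dask-backed, stays lazy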

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  shallow copies become deep copies when pickling 184722754
273001734 https://github.com/pydata/xarray/issues/1058#issuecomment-273001734 https://api.github.com/repos/pydata/xarray/issues/1058 MDEyOklzc3VlQ29tbWVudDI3MzAwMTczNA== shoyer 1217238 2017-01-17T01:53:18Z 2017-01-17T01:53:18Z MEMBER

I think https://github.com/pydata/xarray/pull/1128 fixes this about as well as we can hope, given how pickle works for NumPy.

So I'm closing this now, but feel free to open another issue for any follow-up concerns.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  shallow copies become deep copies when pickling 184722754
260773846 https://github.com/pydata/xarray/issues/1058#issuecomment-260773846 https://api.github.com/repos/pydata/xarray/issues/1058 MDEyOklzc3VlQ29tbWVudDI2MDc3Mzg0Ng== crusaderky 6213168 2016-11-15T21:26:52Z 2016-11-15T21:26:52Z MEMBER

Confirmed that #1017 fixes my specific issue, thanks! Leaving the ticket open, as other people (particularly those who work on large arrays without dask) will still be affected.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  shallow copies become deep copies when pickling 184722754
256144009 https://github.com/pydata/xarray/issues/1058#issuecomment-256144009 https://api.github.com/repos/pydata/xarray/issues/1058 MDEyOklzc3VlQ29tbWVudDI1NjE0NDAwOQ== shoyer 1217238 2016-10-25T19:05:01Z 2016-10-25T19:05:01Z MEMBER

I answered the StackOverflow question: https://stackoverflow.com/questions/13746601/preserving-numpy-view-when-pickling/40247761#40247761

This was a tricky puzzle to figure out!

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  shallow copies become deep copies when pickling 184722754
256123953 https://github.com/pydata/xarray/issues/1058#issuecomment-256123953 https://api.github.com/repos/pydata/xarray/issues/1058 MDEyOklzc3VlQ29tbWVudDI1NjEyMzk1Mw== max-sixty 5635139 2016-10-25T18:23:35Z 2016-10-25T18:23:35Z MEMBER

@crusaderky right, I see. All those views are in the same pickle object, and so shouldn't be duplicated. That is frustrating.

As per @shoyer, the easiest way is to just not have the data in the first place. So not needing indexes at all should solve your case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  shallow copies become deep copies when pickling 184722754
255952251 https://github.com/pydata/xarray/issues/1058#issuecomment-255952251 https://api.github.com/repos/pydata/xarray/issues/1058 MDEyOklzc3VlQ29tbWVudDI1NTk1MjI1MQ== crusaderky 6213168 2016-10-25T06:54:02Z 2016-10-25T06:54:02Z MEMBER

@maximilianr, if you pickle two plain Python objects A and B together, and one of the attributes of B is a reference to A, A does not get duplicated.

In this case there must be some specific __getstate__ code to prevent this, and/or something in the C implementation of the class.
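
For illustration (not from the thread), the contrast can be reproduced in a few lines:

# Shared references between plain Python objects survive pickling thanks to
# pickle's memo; a numpy view and its base array do not share data afterwards.
import pickle
import numpy as np

a = [1, 2, 3]
b = {"ref": a}
a2, b2 = pickle.loads(pickle.dumps((a, b)))
print(b2["ref"] is a2)          # True: the shared reference is preserved

base = np.zeros(1000)
view = base[::2]
base2, view2 = pickle.loads(pickle.dumps((base, view)))
print(view2.base is base2)      # False: the view was serialized as its own copy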

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  shallow copies become deep copies when pickling 184722754
255801010 https://github.com/pydata/xarray/issues/1058#issuecomment-255801010 https://api.github.com/repos/pydata/xarray/issues/1058 MDEyOklzc3VlQ29tbWVudDI1NTgwMTAxMA== max-sixty 5635139 2016-10-24T17:03:08Z 2016-10-24T17:03:08Z MEMBER

If I'm understanding you correctly @crusaderky, I think this is a tough problem, and one much broader than xarray. When pickling something with a reference, do you want to save the object, or the reference? If you pickle the reference, how can you guarantee to have the object available when unpickling? How would you codify the reference (memory location?)?

Is that right? Or am I misunderstanding your problem?

In this narrow case, I think not having indexes at all should solve this, though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  shallow copies become deep copies when pickling 184722754
255622303 https://github.com/pydata/xarray/issues/1058#issuecomment-255622303 https://api.github.com/repos/pydata/xarray/issues/1058 MDEyOklzc3VlQ29tbWVudDI1NTYyMjMwMw== shoyer 1217238 2016-10-23T23:27:09Z 2016-10-23T23:27:09Z MEMBER

The plan is to stop making default indexes with np.arange. See https://github.com/pydata/xarray/pull/1017, which is my top priority for the next major release.

I'm not confident that your workaround will work properly. At the very least, you should check strides as well; otherwise get_base(array[::-1]) would return array.

If it would really help, I'm open to making Variable(dims, array) reuse the same numpy array instead of creating a view (see as_compatible_data).
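
A sketch of the pitfall being pointed out (get_base here is a hypothetical stand-in for the workaround under discussion, not part of xarray):

# Walking .base alone cannot tell a reversed view apart from the original
# array; the strides (and data offset) differ and would need to be checked.
import numpy as np

def get_base(array):
    """Follow .base links down to the array that owns the memory."""
    while array.base is not None:
        array = array.base
    return array

a = np.arange(10)
rev = a[::-1]

print(get_base(rev) is a)       # True, yet rev is not interchangeable with a
print(a.strides, rev.strides)   # e.g. (8,) vs (-8,): strides differ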

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  shallow copies become deep copies when pickling 184722754

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
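
For reference, the row selection shown at the top of this page corresponds to a query like the following (a sketch using Python's sqlite3 module; the database filename is hypothetical):

import sqlite3

# Hypothetical local copy of the database behind this page.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, author_association, body
    FROM issue_comments
    WHERE issue = 184722754
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # 10 rows for this issue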