
issue_comments


16 rows where author_association = "MEMBER" and issue = 908971901 sorted by updated_at descending


user 3

  • mrocklin 9
  • shoyer 5
  • TomAugspurger 2

issue 1

  • Implement dask.sizeof for xarray.core.indexing.ImplicitToExplicitIndexingAdapter · 16

author_association 1

  • MEMBER · 16
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
856124510 https://github.com/pydata/xarray/issues/5426#issuecomment-856124510 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1NjEyNDUxMA== mrocklin 306380 2021-06-07T17:31:00Z 2021-06-07T17:31:00Z MEMBER

Also cc'ing @gjoseph92

852685733 https://github.com/pydata/xarray/issues/5426#issuecomment-852685733 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY4NTczMw== mrocklin 306380 2021-06-02T03:23:35Z 2021-06-02T03:23:35Z MEMBER

I think that the next thing to do here is to try to replicate this locally and watch the stealing logic to figure out why these tasks aren't moving. At this point we're just guessing. @jrbourbeau can I ask you to add this to the stack of issues to have folks look into?

Reactions: 👍 1
852684328 https://github.com/pydata/xarray/issues/5426#issuecomment-852684328 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY4NDMyOA== shoyer 1217238 2021-06-02T03:19:43Z 2021-06-02T03:19:43Z MEMBER

When I pickle the adapter object from this example with cloudpickle, it looks like it's 6536 bytes.

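shoyer's measurement above is easy to reproduce. The sketch below uses the stdlib pickle module (the comment used cloudpickle) and a hypothetical `LazyArrayWrapper` class standing in for `ImplicitToExplicitIndexingAdapter` — a thin object that references backing storage rather than holding array data:

```python
import pickle


class LazyArrayWrapper:
    """Hypothetical stand-in for an xarray indexing adapter:
    a thin object holding a reference to backing storage, not the data."""

    def __init__(self, store_path, shape):
        self.store_path = store_path
        self.shape = shape


adapter = LazyArrayWrapper("s3://bucket/data.zarr", (1000, 1000))

# The pickled payload is small: the wrapper serializes its references,
# not the (potentially multi-TB) array it lazily points at.
n_bytes = len(pickle.dumps(adapter))
print(f"pickled size: {n_bytes} bytes")
```

For the real adapter object in this thread, the same measurement came out to a few KB — small enough that moving it between workers should be cheap.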
852683916 https://github.com/pydata/xarray/issues/5426#issuecomment-852683916 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY4MzkxNg== mrocklin 306380 2021-06-02T03:18:37Z 2021-06-02T03:18:37Z MEMBER

Yeah, that size being very small shouldn't be a problem

852681951 https://github.com/pydata/xarray/issues/5426#issuecomment-852681951 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY4MTk1MQ== shoyer 1217238 2021-06-02T03:13:18Z 2021-06-02T03:13:18Z MEMBER

> Hrm, the root dependency does appear to be of type
>
> xarray.core.indexing.ImplicitToExplicitIndexingAdapter with size 48 B
>
> I'm not sure what's going on with it

Well, sys.getsizeof() is certainly an under-estimate here, but I suspect the true size (e.g., if you pickle it) is measured in a handful of KB. I would be surprised if Dask is reluctant to serialize such objects.

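The gap shoyer describes — `sys.getsizeof()` reporting 48 B while the serialized object is KBs — comes from `getsizeof` counting only the shallow instance struct, not anything it references. A minimal stdlib demonstration, with a hypothetical `Wrapper` class:

```python
import pickle
import sys


class Wrapper:
    """Hypothetical thin wrapper: getsizeof sees only the outer object."""

    def __init__(self, payload):
        self.payload = payload


obj = Wrapper(payload=list(range(10_000)))

# Shallow view: just the instance header, typically a few dozen bytes.
shallow = sys.getsizeof(obj)

# Serialized view: includes the referenced payload.
deep = len(pickle.dumps(obj))

print(f"getsizeof: {shallow} B, pickled: {deep} B")
```

Either number can be the "right" size depending on the question: `getsizeof`-style estimates describe memory held on a worker, while the pickled size describes the cost of moving the object.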
852668929 https://github.com/pydata/xarray/issues/5426#issuecomment-852668929 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY2ODkyOQ== shoyer 1217238 2021-06-02T02:40:25Z 2021-06-02T03:09:13Z MEMBER

> The only thing that comes to mind is everything being assigned to one worker when the entire task graph has a single node at the base of the task graph. But then work stealing kicks in and things level out (that was a while ago though).

Right, so it might help to pipe an option for inline=True into Variable.chunk() (which is indirectly called via open_zarr when chunks are provided).

852675828 https://github.com/pydata/xarray/issues/5426#issuecomment-852675828 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY3NTgyOA== mrocklin 306380 2021-06-02T02:58:13Z 2021-06-02T02:58:13Z MEMBER

Hrm, the root dependency does appear to be of type

xarray.core.indexing.ImplicitToExplicitIndexingAdapter with size 48 B

I'm not sure what's going on with it

852672930 https://github.com/pydata/xarray/issues/5426#issuecomment-852672930 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY3MjkzMA== mrocklin 306380 2021-06-02T02:50:28Z 2021-06-02T02:50:28Z MEMBER

This is what it looks like in practice for me FWIW

852671075 https://github.com/pydata/xarray/issues/5426#issuecomment-852671075 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY3MTA3NQ== mrocklin 306380 2021-06-02T02:45:48Z 2021-06-02T02:45:48Z MEMBER

Ideally Dask would be able to be robust to this kind of mis-assignment of object size, but it's particularly hard in this situation. We can't try to serialize these things because if we're wrong and the size actually is massive then we blow out the worker.

852670723 https://github.com/pydata/xarray/issues/5426#issuecomment-852670723 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY3MDcyMw== mrocklin 306380 2021-06-02T02:44:55Z 2021-06-02T02:44:55Z MEMBER

It may also be that we don't want to inline zarr objects (The graph is likely to be cheaper to move if we don't inline them). However we may want Zarr objects to report themselves as easy to move by defining their approximate size with sizeof. The ideal behavior here is that Dask treats zarr stores (or whatever is at the bottom of this graph) as separate tasks, but also as movable tasks.

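The fix mrocklin proposes — have objects "report themselves as easy to move by defining their approximate size with sizeof" — works by registering a per-type size estimator. In dask itself the hook is `dask.sizeof.sizeof.register`; the sketch below models the same dispatch mechanism with stdlib `functools.singledispatch`, and `ZarrStoreProxy` is a hypothetical stand-in for a lazy store handle:

```python
import sys
from functools import singledispatch


@singledispatch
def sizeof(obj) -> int:
    """Fallback estimator: shallow size, like dask's default behavior."""
    return sys.getsizeof(obj)


class ZarrStoreProxy:
    """Hypothetical lazy handle to a remote store: cheap to serialize and move."""

    def __init__(self, url):
        self.url = url


@sizeof.register(ZarrStoreProxy)
def _(obj) -> int:
    # Report the small serialized footprint, not the data the store points at,
    # so the scheduler treats tasks holding it as cheap to steal or move.
    return 100 + len(obj.url)


print(sizeof(ZarrStoreProxy("s3://bucket/data.zarr")))
```

With an estimator like this in place, a task whose output is a store proxy no longer looks multi-TB to the scheduler, so work stealing is free to rebalance it.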
852668060 https://github.com/pydata/xarray/issues/5426#issuecomment-852668060 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY2ODA2MA== shoyer 1217238 2021-06-02T02:38:20Z 2021-06-02T02:38:20Z MEMBER

> dask/dask#6203 and dask/dask#6773 are the maybe relevant issues. I actually don't know if that could have an effect here. I don't know (and a brief search couldn't confirm) whether or not xarray uses dask.array.from_zarr.

Xarray uses dask.array.from_array but not from_zarr: https://github.com/pydata/xarray/blob/83eda1a8542a9dbd81bf0e08c8564c044df64c0a/xarray/core/variable.py#L1046-L1068

852667695 https://github.com/pydata/xarray/issues/5426#issuecomment-852667695 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY2NzY5NQ== TomAugspurger 1312546 2021-06-02T02:37:18Z 2021-06-02T02:37:18Z MEMBER

> Do you run into poor load balancing as well when using Zarr with Xarray?

The only thing that comes to mind is everything being assigned to one worker when the entire task graph has a single node at the base of the task graph. But then work stealing kicks in and things level out (that was a while ago though).

I haven't noticed any kind of systemic load balancing problem, but I can take a look at that notebook later.

852666904 https://github.com/pydata/xarray/issues/5426#issuecomment-852666904 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY2NjkwNA== shoyer 1217238 2021-06-02T02:35:11Z 2021-06-02T02:35:54Z MEMBER

> What is sizeof supposed to estimate? The size of the computed array or the size of the pickled lazy object?

Typically this object would end up in Dask graphs when something is read from an xarray storage backend, e.g., netCDF or Zarr. If the underlying files are accessible everywhere (e.g., as is the case for Zarr backed by a cloud object store), then a small size for the serialized object would be appropriate.

852666752 https://github.com/pydata/xarray/issues/5426#issuecomment-852666752 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY2Njc1Mg== mrocklin 306380 2021-06-02T02:34:48Z 2021-06-02T02:34:48Z MEMBER

Do you run into poor load balancing as well when using Zarr with Xarray? My guess here is that there are a few tasks in the graph that report multi-TB sizes and so are highly resistant to being moved around. I haven't verified that, though.

852666211 https://github.com/pydata/xarray/issues/5426#issuecomment-852666211 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY2NjIxMQ== TomAugspurger 1312546 2021-06-02T02:33:28Z 2021-06-02T02:33:28Z MEMBER

https://github.com/dask/dask/pull/6203 and https://github.com/dask/dask/pull/6773/ are the maybe relevant issues. I actually don't know if that could have an effect here. I don't know (and a brief search couldn't confirm) whether or not xarray uses dask.array.from_zarr.

852656740 https://github.com/pydata/xarray/issues/5426#issuecomment-852656740 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY1Njc0MA== mrocklin 306380 2021-06-02T02:09:50Z 2021-06-02T02:09:50Z MEMBER

Thinking about this some more, it might be some other object, like a Zarr store, that is on only a couple of these machines. I recall that recently we switched Zarr from being in every task to being in only a few tasks. The problem here might be reversed, that we actually want to view Zarr stores in this case as quite cheap.

cc @TomAugspurger who I think was actively making decisions around that time.


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 15.83ms · About: xarray-datasette