
issue_comments: 707238146


html_url: https://github.com/pydata/xarray/issues/4482#issuecomment-707238146
issue_url: https://api.github.com/repos/pydata/xarray/issues/4482
id: 707238146
node_id: MDEyOklzc3VlQ29tbWVudDcwNzIzODE0Ng==
user: 2560426
created_at: 2020-10-12T17:01:54Z
updated_at: 2020-10-12T17:16:07Z
author_association: NONE

Adding on here: even if fillna were to create a memory copy, we'd only expect memory usage to double. However, in my case with dask-based chunking (via parallel=True in open_mfdataset) I'm seeing memory usage blow up to many times that (10x+), until all available memory is consumed.

This happens with x.fillna(0).dot(y), as well as with x.notnull().dot(y) and x.weighted(y).sum(skipna=True); x is the chunked array. This suggests that dask-based chunking isn't propagating into the fillna and notnull operations, and that the full, un-chunked arrays are being computed.
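For reference, a minimal in-memory sketch of the first expression above, using tiny NumPy-backed arrays (names are hypothetical, not from the report). This only shows what x.fillna(0).dot(y) computes; the memory blowup described here would appear once x is a large dask-chunked array:

```python
import numpy as np
import xarray as xr

# Small 2-D array with a missing value, and a 1-D weight vector
# sharing the "b" dimension.
x = xr.DataArray(np.array([[1.0, np.nan],
                           [3.0, 4.0]]), dims=("a", "b"))
y = xr.DataArray(np.array([10.0, 20.0]), dims=("b",))

# Replace NaN with 0, then contract over the shared "b" dimension.
# With NaN treated as 0: row 0 -> 1*10 + 0*20 = 10,
#                        row 1 -> 3*10 + 4*20 = 110.
masked_dot = x.fillna(0).dot(y)
```

In the report, x would instead come from open_mfdataset(..., parallel=True), so x.fillna(0) should stay lazy and chunked rather than materializing the full array.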

More evidence in favor: if I do (x*y).sum(skipna=True) I get the following error:

MemoryError: Unable to allocate [xxx] GiB for an array with shape [un-chunked array shape] and data type float64

I'm happy to live with a memory copy from fillna and notnull for now, but allocating the full, un-chunked array in memory is a showstopper. Is there a different workaround that I can use in the meantime?
