issue_comments

5 rows where author_association = "NONE", issue = 713834297 ("Allow skipna in .dot()") and user = 2560426 (heerad), sorted by updated_at descending

713172015 · heerad (2560426) · created 2020-10-20T22:17:08Z · updated 2020-10-20T22:21:14Z · author_association: NONE
https://github.com/pydata/xarray/issues/4482#issuecomment-713172015

On the topic of fillna(), I'm seeing an odd unrelated issue that I don't have an explanation for.

I have a DataArray x that I'm able to call x.compute() on.

When I do x.fillna(0).compute(), I get the following error:

KeyError: ('where-3a3[...long hex string]', 100, 0, 0, 4)

The stack trace shows it failing in a get_dependencies(dsk, key, task, as_list) call made from cull(dsk, keys) in dask/optimization.py; get_dependencies itself is defined in dask/core.py.

I have no idea how to reproduce this simply... If it helps narrow things down, x is a dask array, one of its dimensions is a datetime64, and all the others are strings. I've tried both the default engine and netcdf4 when loading with open_mfdataset.
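
For reference, the failing pattern looks roughly like this (the path and variable name here are hypothetical, and this sketch does not deterministically reproduce the KeyError):

```
import xarray as xr

# hypothetical dask-backed DataArray: one datetime64 dimension, all others strings
x = xr.open_mfdataset('data/*.nc', parallel=True)['vals']

x.compute()            # completes fine
x.fillna(0).compute()  # raises the KeyError inside dask's cull()/get_dependencies()
```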

708474940 · heerad (2560426) · created 2020-10-14T15:21:29Z · updated 2020-10-14T15:21:55Z · author_association: NONE
https://github.com/pydata/xarray/issues/4482#issuecomment-708474940

Adding on: whatever solution ends up avoiding the memory blow-up, especially when used with construct, it would be useful to have it implemented for both fillna(0) and notnull(). One common use-case is taking a weighted mean that normalizes by the sum of weights corresponding only to non-null entries, as here: https://github.com/pydata/xarray/blob/333e8dba55f0165ccadf18f2aaaee9257a4d716b/xarray/core/weighted.py#L169
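
To make that use-case concrete, here's a hand-rolled sketch of the same normalization (a simplified illustration, not the actual weighted.py implementation):

```
import numpy as np
import xarray as xr

x = xr.DataArray([1.0, 2.0, np.nan, 4.0], dims='t')
w = xr.DataArray([1.0, 2.0, 3.0, 4.0], dims='t')

# numerator: nan entries contribute nothing to the weighted sum
num = x.fillna(0).dot(w)
# denominator: sum of weights restricted to the non-null entries of x
den = x.notnull().dot(w)

print(num / den)  # (1*1 + 2*2 + 4*4) / (1 + 2 + 4) = 3.0
```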

707331260 · heerad (2560426) · created 2020-10-12T20:31:26Z · updated 2020-10-12T21:05:24Z · author_association: NONE
https://github.com/pydata/xarray/issues/4482#issuecomment-707331260

See below. I temporarily write some files to netCDF, then recombine them lazily using open_mfdataset.

The issue presents itself more consistently when x is a constructed rolling window, and especially when it's a rolling window over a stacked dimension, as in the example below.

I used the memory_profiler package and its associated notebook extension (the %%memit cell magic) for the memory profiling.

```
import numpy as np
import xarray as xr
import os

N = 1000
N_per_file = 10
M = 100
K = 10
window_size = 150

tmp_dir = 'tmp'
os.mkdir(tmp_dir)

# save many netcdf files, later to be concatted into a dask.delayed dataset
for i in range(0, N, N_per_file):
    # 3 dimensions:
    # d1 is the dim we're splitting our files/chunking along
    # d2 is a common dim among all files/chunks
    # d3 is a common dim among all files/chunks, where the first half is 0
    #    and the second half is nan
    x_i = xr.DataArray([[[0]*(K//2) + [np.nan]*(K//2)]*M]*N_per_file,
        [('d1', [x for x in range(i, i+N_per_file)]),
         ('d2', [x for x in range(M)]),
         ('d3', [x for x in range(K)])])

    x_i.to_dataset(name='vals').to_netcdf('{}/file_{}.nc'.format(tmp_dir, i))

# open lazily
x = xr.open_mfdataset('{}/*.nc'.format(tmp_dir), parallel=True, concat_dim='d1').vals

# a rolling window along a stacked dimension
x_windows = x.stack(d13=['d1', 'd3']).rolling(d13=window_size).construct('window')

# we'll dot x_windows with y along the window dimension
y = xr.DataArray([1]*window_size, dims='window')

# incremental memory: 1.94 MiB
x_windows.dot(y).compute()

# incremental memory: 20.00 MiB
x_windows.notnull().dot(y).compute()

# incremental memory: 182.13 MiB
x_windows.fillna(0.).dot(y).compute()

# incremental memory: 211.52 MiB
x_windows.weighted(y).mean('window', skipna=True).compute()
```
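
Before calling .compute(), one cheap diagnostic is to inspect the underlying dask array's chunk layout and task count (a sketch, assuming the variables from the script above):

```
dx = x_windows.data   # the underlying dask array
print(dx.chunks)      # chunk layout after stack + rolling construct
print(len(dx.dask))   # number of tasks in the lazy graph before the dot
```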

707238146 · heerad (2560426) · created 2020-10-12T17:01:54Z · updated 2020-10-12T17:16:07Z · author_association: NONE
https://github.com/pydata/xarray/issues/4482#issuecomment-707238146

Adding on here: even if fillna were to create a memory copy, we'd only expect memory usage to double. However, in my case with dask-based chunking (via parallel=True in open_mfdataset), I'm seeing memory blow up to many times that (10x+), until all available memory is eaten up.

This is happening with x.fillna(0).dot(y), as well as x.notnull().dot(y) and x.weighted(y).sum(skipna=True); x is the chunked array. This suggests that the dask-based chunking isn't carrying through into the fillna and notnull ops, and that the entire un-chunked arrays are being computed.

More evidence in favor: if I do (x*y).sum(skipna=True) I get the following error:

MemoryError: Unable to allocate [xxx] GiB for an array with shape [un-chunked array shape] and data type float64

I'm happy to live with a memory copy for now with fillna and notnull, but allocating the full, un-chunked array into memory is a showstopper. Is there a different workaround that I can use in the meantime?
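
A quick way to test that suspicion (a sketch, with x and y as in the example above) is to check whether the results of fillna() and notnull() are still lazy and chunked:

```
# if chunking carries through, each of these should still be dask-backed and
# report the same chunk structure as x
print(x.chunks)
print(x.fillna(0).chunks)
print(x.notnull().chunks)

# building the expression should allocate nothing; memory should only grow
# once .compute() is called
lazy = x.fillna(0).dot(y)
print(type(lazy.data))  # dask.array.core.Array if still lazy
```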

702939943 · heerad (2560426) · created 2020-10-02T20:20:53Z · updated 2020-10-02T20:32:32Z · author_association: NONE
https://github.com/pydata/xarray/issues/4482#issuecomment-702939943

Great, looks like I missed that option. Thanks.

For reference, x.fillna(0).dot(y) takes 18 seconds in that same example, so a little better.

