
issue_comments


14 rows where user = 2560426 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
778841149 https://github.com/pydata/xarray/issues/3961#issuecomment-778841149 https://api.github.com/repos/pydata/xarray/issues/3961 MDEyOklzc3VlQ29tbWVudDc3ODg0MTE0OQ== heerad 2560426 2021-02-14T21:01:21Z 2021-02-14T21:01:21Z NONE

> Or alternatively you can try to set sleep between openings.

To clarify, do you mean adding a sleep of e.g. 1 second prior to your preprocess function (and setting preprocess to just sleep then return ds if you're not doing any preprocessing)? Or, are you instead sleeping before the entire open_mfdataset call?

Is this solution only addressing the issue of opening the same ds multiple times within a python process, or would it also address multiple processes opening the same ds?
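
To make sure I understand the first interpretation, here's a minimal sketch of what I'd try (the `sleepy_preprocess` name and the 1-second delay are my own guesses, not anything from the thread):

```python
import time

def sleepy_preprocess(ds):
    # Stagger file opens by sleeping briefly, then return the
    # dataset unchanged (no actual preprocessing).
    time.sleep(1)
    return ds

# Hypothetical usage; 'files/*.nc' is a placeholder path:
# ds = xr.open_mfdataset('files/*.nc', preprocess=sleepy_preprocess)
```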

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hangs while saving netcdf file opened using xr.open_mfdataset with lock=None 597657663
778838527 https://github.com/pydata/xarray/issues/3961#issuecomment-778838527 https://api.github.com/repos/pydata/xarray/issues/3961 MDEyOklzc3VlQ29tbWVudDc3ODgzODUyNw== heerad 2560426 2021-02-14T20:40:38Z 2021-02-14T20:40:38Z NONE

Also seeing this as of version 0.16.1.

In some cases, I need lock=False otherwise I'll run into hung processes a certain percentage of the time. ds.load() prior to to_netcdf() does not solve the problem.

In other cases, I need lock=None otherwise I'll consistently get RuntimeError: NetCDF: Not a valid ID.

Is the current recommended solution to set lock=False and retry until success? Or, is it to keep lock=None and use zarr instead? @dcherian
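
If "retry until success" is indeed the recommendation, I'd wrap the write in something like the sketch below (the attempt count and delay are arbitrary placeholders):

```python
import time

def retry(fn, attempts=5, delay=1.0):
    # Call fn until it succeeds; re-raise the last error if
    # every attempt fails.
    last_err = None
    for _ in range(attempts):
        try:
            return fn()
        except RuntimeError as err:
            last_err = err
            time.sleep(delay)
    raise last_err

# Hypothetical usage:
# retry(lambda: ds.to_netcdf('out.nc'))
```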

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hangs while saving netcdf file opened using xr.open_mfdataset with lock=None 597657663
713172015 https://github.com/pydata/xarray/issues/4482#issuecomment-713172015 https://api.github.com/repos/pydata/xarray/issues/4482 MDEyOklzc3VlQ29tbWVudDcxMzE3MjAxNQ== heerad 2560426 2020-10-20T22:17:08Z 2020-10-20T22:21:14Z NONE

On the topic of fillna(), I'm seeing an odd unrelated issue that I don't have an explanation for.

I have a dataarray x that I'm able to call x.compute() on.

When I do x.fillna(0).compute(), I get the following error:

KeyError: ('where-3a3[...long hex string]', 100, 0, 0, 4)

Stack trace shows it's failing on a get_dependencies(dsk, key, task, as_list) call from a cull(dsk, keys) call in dask/optimization.py. get_dependencies itself is defined in dask/core.py.

I have no idea how to reproduce this simply... If it helps narrow things down, x is a dask array, one of its dimensions is a datetime64, and all others are strings. I've tried both the default engine and netcdf4 when loading with open_mfdataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow skipna in .dot() 713834297
708474940 https://github.com/pydata/xarray/issues/4482#issuecomment-708474940 https://api.github.com/repos/pydata/xarray/issues/4482 MDEyOklzc3VlQ29tbWVudDcwODQ3NDk0MA== heerad 2560426 2020-10-14T15:21:29Z 2020-10-14T15:21:55Z NONE

Adding on: whatever the solution that avoids blowing up memory turns out to be, especially when used with construct, it would be useful to implement it for both fillna(0) and notnull(). One common use-case is taking a weighted mean that normalizes by the sum of weights corresponding only to non-null entries, as in here: https://github.com/pydata/xarray/blob/333e8dba55f0165ccadf18f2aaaee9257a4d716b/xarray/core/weighted.py#L169
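
To spell out that use-case, here's the normalization in plain Python (just a sketch of the idea, not xarray's actual implementation):

```python
import math

def weighted_mean_skipna(values, weights):
    # Normalize by the sum of weights corresponding only to
    # non-null (non-NaN) entries, mirroring a weighted mean
    # with skipna=True.
    num = sum(v * w for v, w in zip(values, weights) if not math.isnan(v))
    den = sum(w for v, w in zip(values, weights) if not math.isnan(v))
    return num / den

# e.g. weighted_mean_skipna([1.0, 2.0, float('nan'), 4.0], [1, 1, 1, 1])
# divides by 3 (the weight mass of the non-null entries), not 4.
```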

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow skipna in .dot() 713834297
707331260 https://github.com/pydata/xarray/issues/4482#issuecomment-707331260 https://api.github.com/repos/pydata/xarray/issues/4482 MDEyOklzc3VlQ29tbWVudDcwNzMzMTI2MA== heerad 2560426 2020-10-12T20:31:26Z 2020-10-12T21:05:24Z NONE

See below. I temporarily write some files to netcdf then recombine them lazily using open_mfdataset.

The issue seems to present itself more consistently when my x is a constructed rolling window, and especially when it's a rolling window of a stacked dimension as in below.

I used the memory_profiler package and associated notebook extension (%%memit cell magic) to do memory profiling.

```
import numpy as np
import xarray as xr
import os

N = 1000
N_per_file = 10
M = 100
K = 10
window_size = 150

tmp_dir = 'tmp'
os.mkdir(tmp_dir)

# save many netcdf files, later to be concatted into a dask.delayed dataset
for i in range(0, N, N_per_file):
    # 3 dimensions:
    # d1 is the dim we're splitting our files/chunking along
    # d2 is a common dim among all files/chunks
    # d3 is a common dim among all files/chunks, where the first half is 0 and the second half is nan
    x_i = xr.DataArray([[[0]*(K//2) + [np.nan]*(K//2)]*M]*N_per_file,
        [('d1', [x for x in range(i, i+N_per_file)]),
         ('d2', [x for x in range(M)]),
         ('d3', [x for x in range(K)])])
    x_i.to_dataset(name='vals').to_netcdf('{}/file_{}.nc'.format(tmp_dir, i))

# open lazily
x = xr.open_mfdataset('{}/*.nc'.format(tmp_dir), parallel=True, concat_dim='d1').vals

# a rolling window along a stacked dimension
x_windows = x.stack(d13=['d1', 'd3']).rolling(d13=window_size).construct('window')

# we'll dot x_windows with y along the window dimension
y = xr.DataArray([1]*window_size, dims='window')

# incremental memory: 1.94 MiB
x_windows.dot(y).compute()

# incremental memory: 20.00 MiB
x_windows.notnull().dot(y).compute()

# incremental memory: 182.13 MiB
x_windows.fillna(0.).dot(y).compute()

# incremental memory: 211.52 MiB
x_windows.weighted(y).mean('window', skipna=True).compute()
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow skipna in .dot() 713834297
707238146 https://github.com/pydata/xarray/issues/4482#issuecomment-707238146 https://api.github.com/repos/pydata/xarray/issues/4482 MDEyOklzc3VlQ29tbWVudDcwNzIzODE0Ng== heerad 2560426 2020-10-12T17:01:54Z 2020-10-12T17:16:07Z NONE

Adding on here, even if fillna were to create a memory copy, we'd only expect memory usage to double. However, in my case with dask-based chunking (via parallel=True in open_mfdataset) I'm seeing the memory blow up multiple times that (10x+) until all available memory is eaten up.

This is happening with x.fillna(0).dot(y) as well as x.notnull().dot(y) and x.weighted(y).sum(skipna=True). x is the array that's chunked. This suggests that dask-based chunking isn't following through into the fillna and notnull ops, and the entire non-chunked arrays are being computed.

More evidence in favor: if I do (x*y).sum(skipna=True) I get the following error:

MemoryError: Unable to allocate [xxx] GiB for an array with shape [un-chunked array shape] and data type float64

I'm happy to live with a memory copy for now with fillna and notnull, but allocating the full, un-chunked array into memory is a showstopper. Is there a different workaround that I can use in the meantime?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow skipna in .dot() 713834297
702939943 https://github.com/pydata/xarray/issues/4482#issuecomment-702939943 https://api.github.com/repos/pydata/xarray/issues/4482 MDEyOklzc3VlQ29tbWVudDcwMjkzOTk0Mw== heerad 2560426 2020-10-02T20:20:53Z 2020-10-02T20:32:32Z NONE

Great, looks like I missed that option. Thanks.

For reference, x.fillna(0).dot(y) takes 18 seconds in that same example, so a little better.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow skipna in .dot() 713834297
702346076 https://github.com/pydata/xarray/issues/4474#issuecomment-702346076 https://api.github.com/repos/pydata/xarray/issues/4474 MDEyOklzc3VlQ29tbWVudDcwMjM0NjA3Ng== heerad 2560426 2020-10-01T19:20:50Z 2020-10-01T19:23:31Z NONE

Looks like it's all in here: https://github.com/pydata/xarray/blob/6d8ac11ca0a785a6fe176eeca9b735c321a35527/xarray/core/dask_array_ops.py

And it's used here: https://github.com/pydata/xarray/blob/6d8ac11ca0a785a6fe176eeca9b735c321a35527/xarray/core/rolling.py#L299

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement rolling_exp for dask arrays 712052219
702331156 https://github.com/pydata/xarray/issues/4474#issuecomment-702331156 https://api.github.com/repos/pydata/xarray/issues/4474 MDEyOklzc3VlQ29tbWVudDcwMjMzMTE1Ng== heerad 2560426 2020-10-01T18:52:18Z 2020-10-01T18:52:18Z NONE

Yes, see http://xarray.pydata.org/en/stable/computation.html#rolling-window-operations.

rolling works with dask, but rolling_exp does not.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement rolling_exp for dask arrays 712052219
702307334 https://github.com/pydata/xarray/issues/4475#issuecomment-702307334 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMjMwNzMzNA== heerad 2560426 2020-10-01T18:07:55Z 2020-10-01T18:07:55Z NONE

Sounds good, I'll do this in the meantime. Still quite interested in save_mfdataset handling these lower-level details, if possible. The ideal case would be loading with open_mfdataset, defining some ops lazily, then piping that directly to save_mfdataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
702265883 https://github.com/pydata/xarray/issues/4475#issuecomment-702265883 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMjI2NTg4Mw== heerad 2560426 2020-10-01T16:52:59Z 2020-10-01T16:52:59Z NONE

Multiple threads (the default), because it's recommended "for numeric code that releases the GIL (like NumPy, Pandas, Scikit-Learn, Numba, …)" according to the dask docs.

I guess I could do multi-threaded for the compute part (everything up to the definition of ds), then multi-process for the write part, but doesn't that then require me to load everything into memory before writing?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
702181324 https://github.com/pydata/xarray/issues/4474#issuecomment-702181324 https://api.github.com/repos/pydata/xarray/issues/4474 MDEyOklzc3VlQ29tbWVudDcwMjE4MTMyNA== heerad 2560426 2020-10-01T14:39:01Z 2020-10-01T14:39:01Z NONE

Great! This will be a common use-case for me, and I imagine others who are doing any sort of time series computation on large datasets.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement rolling_exp for dask arrays 712052219
702178407 https://github.com/pydata/xarray/issues/4475#issuecomment-702178407 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMjE3ODQwNw== heerad 2560426 2020-10-01T14:34:28Z 2020-10-01T14:34:28Z NONE

Thank you, this works for me. However, it's quite slow and seems to scale faster than linearly as the length of datasets increases (the number of groups in the groupby).

Could it be connected to https://github.com/pydata/xarray/issues/2912#issuecomment-485497398 where they suggest to use save_mfdataset instead of to_netcdf? If so, there's a stronger case for supporting delayed objects in save_mfdataset as you said.

Appreciate the help!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206
701676076 https://github.com/pydata/xarray/issues/4475#issuecomment-701676076 https://api.github.com/repos/pydata/xarray/issues/4475 MDEyOklzc3VlQ29tbWVudDcwMTY3NjA3Ng== heerad 2560426 2020-09-30T22:17:24Z 2020-09-30T22:17:24Z NONE

Unfortunately that doesn't work:

TypeError: save_mfdataset only supports writing Dataset objects, received type <class 'dask.delayed.Delayed'>

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preprocess function for save_mfdataset 712189206


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 322.885ms · About: xarray-datasette