
issue_comments


5 rows where author_association = "NONE" and issue = 435535284 (Writing a netCDF file is unexpectedly slow), sorted by updated_at descending

Comment by pinshuai · 2021-05-05T17:12:19Z · https://github.com/pydata/xarray/issues/2912#issuecomment-832864415

I had a similar issue. I am trying to save a big xarray dataset (~2 GB) using to_netcdf().

I tried the following three approaches:

  1. Directly save using dset.to_netcdf()
  2. Load before save using dset.load().to_netcdf()
  3. Chunk data and save using dset.chunk({'time': 19968}).to_netcdf()

All three approaches failed to write the file, causing the Python kernel to hang indefinitely or die.

Any suggestions?
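
For concreteness, here is a minimal sketch of the three approaches on a small synthetic dataset (the variable names, shapes, and output paths are hypothetical; dask and a netCDF backend are assumed to be installed):

import numpy as np
import xarray as xr

# Small synthetic stand-in for the ~2 GB dataset described above.
dset = xr.Dataset(
    {"var": (("time", "x"), np.random.rand(1000, 100))},
    coords={"time": np.arange(1000), "x": np.arange(100)},
)

# 1. Direct write: a dask-backed dataset is computed chunk by chunk during the write.
dset.to_netcdf("direct.nc")

# 2. Load everything into memory first, then write in one pass.
dset.load().to_netcdf("loaded.nc")

# 3. Rechunk along time before writing (chunk size taken from the comment above).
dset.chunk({"time": 19968}).to_netcdf("chunked.nc")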

Comment by bhanu-magotra · 2021-02-05T06:20:40Z (edited 2021-02-05T06:56:05Z) · https://github.com/pydata/xarray/issues/2912#issuecomment-773820054

I am trying to perform a fairly simple operation on individual netCDF files of 3.5 GB each: editing variable and global attributes. The files load instantly using xr.open_dataset, but dataset.to_netcdf() is too slow to export after the modifications. I have tried:

  1. Without rechunking or dask invocations.
  2. Varying chunk sizes, followed by:
  3. Using load() before to_netcdf().
  4. Using persist() or compute() before to_netcdf().

I am working on an HPC with 10 distributed workers. In all cases, the time taken is more than 15 minutes per file. Is this expected? What else can I try to speed up this process, apart from further parallelizing the single-file operations using dask.delayed?
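
For context, a minimal sketch of the edit-and-export pattern described here, assuming a netCDF backend is installed (file, variable, and attribute names are hypothetical):

import xarray as xr

# Opening is lazy, so even a 3.5 GB file returns almost immediately.
ds = xr.open_dataset("input.nc")

# Edit variable and global attributes (names and values are hypothetical).
ds["precip"].attrs["units"] = "mm/day"
ds.attrs["history"] = "attributes edited"

# Loading into memory before writing trades RAM for write speed,
# one of the variants tried above.
ds.load().to_netcdf("output.nc")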

Comment by fsteinmetz · 2019-10-15T19:32:50Z · https://github.com/pydata/xarray/issues/2912#issuecomment-542369777

Thanks for the explanations @jhamman and @shoyer :) Actually it turns out that I was not using particularly small chunks, but the filesystem for /tmp was faulty... After trying on a reliable filesystem, the results are much more reasonable.

Comment by fsteinmetz · 2019-09-21T14:21:17Z · https://github.com/pydata/xarray/issues/2912#issuecomment-533801682

"There are ways to side step some of these challenges (save_mfdataset and the distributed dask scheduler)"

@jhamman Could you elaborate on these ways?

I am having severe slowdowns when writing Datasets by blocks (backed by dask). I have also noticed that the slowdowns do not occur when writing to a ramdisk. Here are the timings of to_netcdf with the default engine and encoding (the nc file is 4.3 GB):

  • When writing to ramdisk (/dev/shm/) : 2min 1s
  • When writing to /tmp/ : 27min 28s
  • When writing to /tmp/ after .load(), as suggested here: 34s (.load takes 1min 43s)

The workaround suggested here works, but the datasets may not always fit in memory, and it defeats the essential purpose of dask...

Note: I am using dask 2.3.0 and xarray 0.12.3
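
For reference, a sketch of the save_mfdataset route mentioned in the quote, along the lines of the xarray documentation; the input path, chunking, and the presence of a datetime "time" coordinate are assumptions:

import xarray as xr

# Open lazily with dask (chunk size is illustrative).
ds = xr.open_dataset("big_input.nc", chunks={"time": 1000})

# Split the dataset along time into one piece per year.
years, datasets = zip(*ds.groupby("time.year"))
paths = [f"out_{year}.nc" for year in years]

# save_mfdataset writes the pieces to separate files; combined with the
# dask distributed scheduler, the writes can proceed in parallel.
xr.save_mfdataset(datasets, paths)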

Comment by msaharia · 2019-04-22T18:32:30Z (edited 2019-04-22T18:36:38Z) · https://github.com/pydata/xarray/issues/2912#issuecomment-485505651

Diagnosis

Thank you very much! I found this. For now, I will use the load() option.

Loading netCDFs

In [8]: time ncdat = reformat_LIS_outputs(outlist)
CPU times: user 7.78 s, sys: 220 ms, total: 8 s
Wall time: 8.02 s

Slower export

In [6]: time ncdat.to_netcdf('test_slow')
CPU times: user 12min, sys: 8.19 s, total: 12min 9s
Wall time: 12min 14s

Faster export

In [9]: time ncdat.load().to_netcdf('test_faster.nc')
CPU times: user 42.6 s, sys: 2.82 s, total: 45.4 s
Wall time: 54.6 s

Reactions: +1 × 5, laugh × 1, hooray × 1, heart × 1, rocket × 1 (9 total)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
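
For anyone working from a local copy of this database, the page's filter and sort can be reproduced with Python's sqlite3 module (the database file name is an assumption):

import sqlite3

# Hypothetical local copy of the database behind this page.
conn = sqlite3.connect("github.db")

# Same filter and ordering as the view above.
rows = conn.execute(
    """
    SELECT user, created_at, body
    FROM issue_comments
    WHERE author_association = 'NONE' AND issue = 435535284
    ORDER BY updated_at DESC
    """
).fetchall()

# Print the user id, timestamp, and a short preview of each comment.
for user, created_at, body in rows:
    print(user, created_at, body[:60])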