issue_comments
11 rows where issue = 435535284 (Writing a netCDF file is unexpectedly slow), sorted by updated_at descending
832864415 · pinshuai (34693887) · NONE · created 2021-05-05T17:12:19Z · updated 2021-05-05T17:12:19Z
https://github.com/pydata/xarray/issues/2912#issuecomment-832864415
I had a similar issue. I am trying to save a big xarray dataset (~2 GB) using Dataset: …
I tried the following three approaches: …
All three approaches failed to write the file, causing the Python kernel to hang indefinitely or die. Any suggestion?
Reactions: none
773820054 · bhanu-magotra (60338532) · NONE · created 2021-02-05T06:20:40Z · updated 2021-02-05T06:56:05Z
https://github.com/pydata/xarray/issues/2912#issuecomment-773820054
I am trying to perform a fairly simple operation on a dataset: editing variable and global attributes on individual netCDF files of 3.5 GB each. The files load instantly using …
Reactions: none
542369777 · fsteinmetz (668201) · NONE · created 2019-10-15T19:32:50Z · updated 2019-10-15T19:32:50Z
https://github.com/pydata/xarray/issues/2912#issuecomment-542369777
Thanks for the explanations @jhamman and @shoyer :) Actually it turns out that I was not using particularly small chunks, but the filesystem for /tmp was faulty... After trying on a reliable filesystem, the results are much more reasonable.
Reactions: none
534869060 · shoyer (1217238) · MEMBER · created 2019-09-25T06:08:43Z · updated 2019-09-25T06:08:43Z
https://github.com/pydata/xarray/issues/2912#issuecomment-534869060
I suspect it could work pretty well to explicitly rechunk your dataset into larger chunks (e.g., with the …
Reactions: +1 ×1
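The tail of this comment is cut off in the export, but the suggestion is to consolidate many small dask chunks into larger ones before writing. A minimal sketch of that idea, assuming a dask-backed dataset; the variable name, dimensions, and chunk sizes are illustrative, not taken from the thread:

```python
import numpy as np
import xarray as xr

# Hypothetical dataset split into 365 tiny chunks along "time".
ds = xr.Dataset(
    {"t2m": (("time", "lat", "lon"), np.random.rand(365, 90, 180))}
).chunk({"time": 1})

# Consolidate into larger chunks before writing, so the netCDF backend
# handles a few big writes instead of hundreds of small ones.
ds.chunk({"time": 100}).to_netcdf("rechunked.nc")
```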
534855337 · jhamman (2443309) · MEMBER · created 2019-09-25T05:12:32Z · updated 2019-09-25T05:12:32Z
https://github.com/pydata/xarray/issues/2912#issuecomment-534855337
@fsteinmetz - in my experience, the main thing to consider here is how and when xarray's backends lock/block for certain operations. The HDF5 library is not thread safe, so we implement a global lock around all HDF5 read/write operations. In most cases, this means we can only do one read or one write at a time per process.
We have found that using Dask's distributed (or multiprocessing) scheduler allows us to bypass the thread locks required by HDF5 by using multiple processes. We also need a per-file lock when writing, so using multiple output datasets theoretically allows for concurrent writes (provided your filesystem and OS support this).
Finally, it's best not to jump to the complicated explanations first. If you have many small dask chunks in your dataset, both reading and writing will be quite inefficient, simply because there is some non-trivial overhead when accessing partial datasets. This is even worse when the dataset is chunked/compressed. Hope that helps.
Reactions: none
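One hedged reading of the two points above (process-based scheduling to bypass the HDF5 thread lock, and one lock per output file) is sketched below; the Client settings, dimensions, and file names are assumptions for illustration, not recommendations from the thread:

```python
import numpy as np
import pandas as pd
import xarray as xr
from dask.distributed import Client

if __name__ == "__main__":
    # Process-based workers sidestep the global HDF5 thread lock
    # (worker counts are illustrative).
    client = Client(n_workers=4, threads_per_worker=1)

    time = pd.date_range("2000-01-01", periods=365)
    ds = xr.Dataset(
        {"t2m": (("time", "lat", "lon"), np.random.rand(365, 90, 180))},
        coords={"time": time},
    ).chunk({"time": 30})

    # One output file per month: each file has its own write lock,
    # so the writes can proceed concurrently across processes.
    months, groups = zip(*ds.groupby("time.month"))
    paths = [f"t2m_month{m:02d}.nc" for m in months]
    xr.save_mfdataset(groups, paths)
```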
533801682 · fsteinmetz (668201) · NONE · created 2019-09-21T14:21:17Z · updated 2019-09-21T14:21:17Z
https://github.com/pydata/xarray/issues/2912#issuecomment-533801682
@jhamman Could you elaborate on these ways? I am having severe slow-downs when writing Datasets by blocks (backed by dask). I have also noticed that the slowdowns do not occur when writing to ramdisk. Here are the timings of …
The workaround suggested here works, but the datasets may not always fit in memory, and it defeats the essential purpose of dask... Note: I am using dask 2.3.0 and xarray 0.12.3
Reactions: none
485505651 · msaharia (2014301) · NONE · created 2019-04-22T18:32:30Z · updated 2019-04-22T18:36:38Z
https://github.com/pydata/xarray/issues/2912#issuecomment-485505651
Diagnosis: Thank you very much! I found this. For now, I will use the load() option.
Loading netCDFs: …
Slower export: …
Faster export: …
Reactions: +1 ×5, laugh ×1, hooray ×1, heart ×1, rocket ×1
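The "Slower export" and "Faster export" code blocks did not survive the export of this table. Based on jhamman's suggestion in the next comment, the contrast was presumably between writing the lazy, dask-backed dataset directly and loading it into memory first; a sketch of that contrast with placeholder file names:

```python
import xarray as xr

# Lazily opened, dask-backed dataset (placeholder file pattern).
ds = xr.open_mfdataset("forcing_*.nc")

# Slower export: every small dask chunk is computed and pushed through
# the netCDF4/HDF5 layer one at a time, under a global lock.
ds.to_netcdf("combined_lazy.nc")

# Faster export: force computation first, then write the in-memory
# result in one pass (only viable if it fits in memory).
ds.load().to_netcdf("combined_loaded.nc")
```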
485497398 · jhamman (2443309) · MEMBER · created 2019-04-22T18:06:56Z · updated 2019-04-22T18:06:56Z
https://github.com/pydata/xarray/issues/2912#issuecomment-485497398
Since the final dataset size is quite manageable, I would start by forcing computation before the write step: …
While writing xarray datasets backed by dask is possible, it's a poorly optimized operation. Most of this comes from constraints in netCDF4/HDF5. There are ways to sidestep some of these challenges (…
Reactions: +1 ×2
485465687 · shoyer (1217238) · MEMBER · created 2019-04-22T16:23:44Z · updated 2019-04-22T16:23:44Z
https://github.com/pydata/xarray/issues/2912#issuecomment-485465687
It really depends on the underlying cause. In most cases, writing a file to disk is not the slow part, only the place where the slow-down is manifested.
Reactions: none
485464872 · dcherian (2448579) · MEMBER · created 2019-04-22T16:21:00Z · updated 2019-04-22T16:21:20Z
https://github.com/pydata/xarray/issues/2912#issuecomment-485464872
Are there "best practices" for a situation like this? Parallel writes? ping @jhamman @rabernat
Reactions: none
485460901 · shoyer (1217238) · MEMBER · created 2019-04-22T16:06:50Z · updated 2019-04-22T16:06:50Z
https://github.com/pydata/xarray/issues/2912#issuecomment-485460901
You're using dask, so the Dataset is being lazily computed. If one part of your pipeline is very expensive (perhaps reading the original data from disk?) then the process of saving can be very slow. I would suggest doing some profiling, e.g., as shown in this example: http://docs.dask.org/en/latest/diagnostics-local.html#example Once we know what the slow part is, that will hopefully make opportunities for improvement more obvious.
Reactions: +1 ×1
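The linked dask page covers the local diagnostics tools; a short sketch of how they might be applied to a slow to_netcdf call (placeholder file names; the plot requires bokeh):

```python
import xarray as xr
from dask.diagnostics import Profiler, ResourceProfiler, visualize

ds = xr.open_mfdataset("forcing_*.nc")

# Record per-task timings plus CPU/memory usage while writing.
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof:
    ds.to_netcdf("combined.nc")

# Renders an interactive timeline; slow reads vs. slow writes show up
# as long-running tasks of the corresponding type.
visualize([prof, rprof])
```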
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);