issue_comments

5 rows where issue = 327064908 sorted by updated_at descending

user (3 values)

  • shoyer 3
  • max-sixty 1
  • Karel-van-de-Plassche 1

author_association (2 values)

  • MEMBER 4
  • CONTRIBUTOR 1

issue (1 value)

  • Parallel non-locked read using dask.Client crashes 5
Columns: id · html_url · issue_url · node_id · user · created_at · updated_at (sorted descending) · author_association · body · reactions · performed_via_github_app · issue
id: 454162108 · node_id: MDEyOklzc3VlQ29tbWVudDQ1NDE2MjEwOA== · user: max-sixty (5635139) · created_at: 2019-01-14T21:09:03Z · updated_at: 2019-01-14T21:09:03Z · author_association: MEMBER
html_url: https://github.com/pydata/xarray/issues/2190#issuecomment-454162108
issue_url: https://api.github.com/repos/pydata/xarray/issues/2190

In an effort to reduce the issue backlog, I'll close this, but please reopen if you disagree

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Parallel non-locked read using dask.Client crashes (327064908)
id: 392672562 · node_id: MDEyOklzc3VlQ29tbWVudDM5MjY3MjU2Mg== · user: shoyer (1217238) · created_at: 2018-05-29T06:59:32Z · updated_at: 2018-05-29T06:59:32Z · author_association: MEMBER
html_url: https://github.com/pydata/xarray/issues/2190#issuecomment-392672562
issue_url: https://api.github.com/repos/pydata/xarray/issues/2190

Indeed, HDF5 supports parallel IO, but only with MPI. Unfortunately that doesn't work with Dask, at least not yet.

Zarr is certainly worth a try for performance. The motivation for zarr (rather than HDF5) was performance with distributed reads/writes, especially with cloud storage.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Parallel non-locked read using dask.Client crashes (327064908)
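
For concreteness, the Zarr suggestion above might look roughly like the sketch below; the file names and chunk size are illustrative placeholders, and it assumes an xarray version with Zarr support (to_zarr / open_zarr).

import xarray as xr

# Open the existing netCDF file lazily with dask chunks
# ("data.nc" and the chunk size are placeholders for illustration).
ds = xr.open_dataset("data.nc", chunks={"time": 1000})

# Write a Zarr store; Zarr compresses chunks by default, so the
# "need compression, so no netCDF3" constraint is still satisfied.
ds.to_zarr("data.zarr", mode="w")

# Reads of separate Zarr chunks don't go through HDF5, so they
# aren't serialized by a global library lock.
ds_zarr = xr.open_zarr("data.zarr")
print(ds_zarr)
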
id: 392666250 · node_id: MDEyOklzc3VlQ29tbWVudDM5MjY2NjI1MA== · user: Karel-van-de-Plassche (6404167) · created_at: 2018-05-29T06:27:52Z · updated_at: 2018-05-29T06:35:02Z · author_association: CONTRIBUTOR
html_url: https://github.com/pydata/xarray/issues/2190#issuecomment-392666250
issue_url: https://api.github.com/repos/pydata/xarray/issues/2190

@shoyer Thanks for your answer. Too bad. Maybe this could be documented in the 'dask' chapter? Or maybe even raise a warning when using open_dataset with lock=False on a netCDF4 file?

Unfortunately there seems to be some conflicting information floating around, which is hard to spot for a non-expert like me. It might of course just be that xarray doesn't support it (yet). I think MPI-style opening is a whole different beast, right? For example:

  • python-netcdf4 supports parallel read in threads: https://github.com/Unidata/netcdf4-python/issues/536
  • python-netcdf4 MPI parallel write/read: https://github.com/Unidata/netcdf4-python/blob/master/examples/mpi_example.py and http://unidata.github.io/netcdf4-python/#section13
  • Using h5py directly (not supported by xarray I think): http://docs.h5py.org/en/latest/mpi.html
  • Seems to suggest multiple reads are fine: https://github.com/dask/dask/issues/3074#issuecomment-359030028

> You might have better luck using dask-distributed with multiple processes, but then you'll encounter other bottlenecks with data transfer.

I'll do some more experiments, thanks for this suggestion. I am not bound to netCDF4 (although I need the compression, so no netCDF3 unfortunately), so would moving to Zarr help improve IO performance? I'd really like to keep using xarray, thanks for this awesome library! Even with the disk IO performance hit, it's still more than worth it to use it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Parallel non-locked read using dask.Client crashes (327064908)
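
The netcdf4-python MPI route linked in the comment above (mpi_example.py) is independent of xarray; a minimal sketch, assuming netCDF4/HDF5 were built with parallel (MPI) support, looks roughly like this:

# Run with something like: mpirun -np 4 python write_parallel.py
from mpi4py import MPI
import numpy as np
from netCDF4 import Dataset

rank = MPI.COMM_WORLD.rank

# parallel=True only works if netCDF4/HDF5 were built against MPI.
nc = Dataset("parallel_test.nc", "w", parallel=True,
             comm=MPI.COMM_WORLD, info=MPI.Info())
nc.createDimension("dim", MPI.COMM_WORLD.size)
var = nc.createVariable("var", np.int64, ("dim",))

# Each MPI rank writes its own element of the variable.
var[rank] = rank
nc.close()
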
id: 392649160 · node_id: MDEyOklzc3VlQ29tbWVudDM5MjY0OTE2MA== · user: shoyer (1217238) · created_at: 2018-05-29T04:24:58Z · updated_at: 2018-05-29T04:24:58Z · author_association: MEMBER
html_url: https://github.com/pydata/xarray/issues/2190#issuecomment-392649160
issue_url: https://api.github.com/repos/pydata/xarray/issues/2190

Maybe there's some place we could document this more clearly?

lock=False would still be useful if you're reading/writing netCDF3 files.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Parallel non-locked read using dask.Client crashes (327064908)
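
As a rough sketch of the netCDF3 case above, assuming the 2018-era xarray API in which open_dataset still accepted a lock argument (newer releases handle locking inside the backends), with a placeholder file name and chunking:

import xarray as xr

# netCDF3 files go through the scipy backend rather than HDF5, so
# disabling xarray's global lock doesn't hit the HDF5 limitation.
# NOTE: lock= reflects the open_dataset signature at the time of this
# thread (2018); it is an assumption that your version still accepts it.
ds = xr.open_dataset("output_netcdf3.nc", engine="scipy",
                     chunks={"time": 100}, lock=False)

# dask can now read different chunks in parallel threads.
print(ds.mean().compute())
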
id: 392647556 · node_id: MDEyOklzc3VlQ29tbWVudDM5MjY0NzU1Ng== · user: shoyer (1217238) · created_at: 2018-05-29T04:11:55Z · updated_at: 2018-05-29T04:11:55Z · author_association: MEMBER
html_url: https://github.com/pydata/xarray/issues/2190#issuecomment-392647556
issue_url: https://api.github.com/repos/pydata/xarray/issues/2190

Unfortunately HDF5 doesn't support reading or writing files (even different files) in parallel from the same process, which is why xarray by default adds a lock around all read/write operations on netCDF4/HDF5 files. So I'm afraid this is expected behavior.

You might have better luck using dask-distributed with multiple processes, but then you'll encounter other bottlenecks with data transfer.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Parallel non-locked read using dask.Client crashes (327064908)
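
A minimal sketch of the "multiple processes" workaround from the comment above, using dask.distributed worker processes; the file name and chunk size are placeholders, and as noted, data transfer between processes becomes the new bottleneck.

import xarray as xr
from dask.distributed import Client

if __name__ == "__main__":
    # Single-threaded worker processes: each process has its own HDF5
    # state, so reads in different workers are not serialized the way
    # threads sharing one process's lock are.
    client = Client(n_workers=4, threads_per_worker=1)

    # Placeholder file and chunking; each chunk becomes a dask task.
    ds = xr.open_dataset("big_file.nc", chunks={"time": 500})

    # The reduction runs on the workers; chunk results are shipped
    # between processes, which is the transfer cost mentioned above.
    result = ds.mean("time").compute()
    print(result)

    client.close()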

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);