
issue_comments

5 rows where issue = 289342234 sorted by updated_at descending

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sort key, descending), author_association, body, reactions, performed_via_github_app, issue
id: 1161850924 · user: smartlixx (16891009) · author_association: CONTRIBUTOR
created_at: 2022-06-21T14:50:02Z · updated_at: 2022-06-21T14:50:02Z
html_url: https://github.com/pydata/xarray/issues/1836#issuecomment-1161850924
issue_url: https://api.github.com/repos/pydata/xarray/issues/1836
node_id: IC_kwDOAMm_X85FQHAs

Any update on this? I got an HDF5 error with both the multiprocessing and the distributed scheduler.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: HDF5 error when working with compressed NetCDF files and the dask multiprocessing scheduler (289342234)
id: 361532119 · user: cchwala (102827) · author_association: CONTRIBUTOR
created_at: 2018-01-30T09:32:26Z · updated_at: 2018-01-30T09:32:26Z
html_url: https://github.com/pydata/xarray/issues/1836#issuecomment-361532119
issue_url: https://api.github.com/repos/pydata/xarray/issues/1836
node_id: MDEyOklzc3VlQ29tbWVudDM2MTUzMjExOQ==

Thanks @jhamman for looking into this.

Currently I am fine with using persist(), since I can break my analysis workflow down into time periods whose data fits into RAM on a large machine. As I have written, the distributed scheduler failed for me because of #1464, but I would like to use it in the future. From other discussions of the dask schedulers (here or on SO), using the distributed scheduler seems to be the general recommendation anyway.

In summary, I am fine with my current workaround. I do not think that solving this issue is a high priority, in particular as the distributed scheduler is improved further. The main annoyance was tracking down the problem described in my first post. Hence, maybe the limitations of the schedulers could be described a bit better in the documentation. Would you like a PR for this?

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: HDF5 error when working with compressed NetCDF files and the dask multiprocessing scheduler (289342234)
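
A minimal sketch of the persist()-based workaround described in the comment above, assuming the dummy compressed file from the example further down on this page and the default threaded scheduler; the time slice is illustrative only.

import xarray as xr

# Open the compressed file lazily, chunked along time.
ds = xr.open_dataset('dummy_data_3d_with_compression.nc', chunks={'time': 1})

# Work on one time period at a time so that the decompressed data fits into
# RAM, load that period into memory once with persist(), then compute on it.
period = ds.isel(time=slice(0, 5)).persist()
result = period.foo.mean(dim=['x', 'y']).compute()
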
id: 361466652 · user: jhamman (2443309) · author_association: MEMBER
created_at: 2018-01-30T03:35:07Z · updated_at: 2018-01-30T03:35:07Z
html_url: https://github.com/pydata/xarray/issues/1836#issuecomment-361466652
issue_url: https://api.github.com/repos/pydata/xarray/issues/1836
node_id: MDEyOklzc3VlQ29tbWVudDM2MTQ2NjY1Mg==

I tried the above example with the multiprocessing and distributed schedulers. With the multiprocessing scheduler, I can reproduce the error described above. With the distributed scheduler, no error is encountered.

import xarray as xr
import numpy as np
import dask.multiprocessing

from dask.distributed import Client

client = Client()
print(client)

# Generate dummy data and build xarray dataset
mat = np.random.rand(10, 90, 90)
ds = xr.Dataset(data_vars={'foo': (('time', 'x', 'y'), mat)})

# Write dataset to netcdf without compression
ds.to_netcdf('dummy_data_3d.nc')
# Write with zlib compression
ds.to_netcdf('dummy_data_3d_with_compression.nc',
             encoding={'foo': {'zlib': True}})
# Write data as int16 with scale factor applied
ds.to_netcdf('dummy_data_3d_with_scale_factor.nc',
             encoding={'foo': {'dtype': 'int16',
                               'scale_factor': 0.01,
                               '_FillValue': -9999}})

# Load data from netCDF files
ds_vanilla = xr.open_dataset('dummy_data_3d.nc', chunks={'time': 1})
ds_scaled = xr.open_dataset('dummy_data_3d_with_scale_factor.nc', chunks={'time': 1})
ds_compressed = xr.open_dataset('dummy_data_3d_with_compression.nc', chunks={'time': 1})

# Do computation using dask's distributed scheduler (via the Client created above)
foo = ds_vanilla.foo.mean(dim=['x', 'y']).compute()
foo = ds_scaled.foo.mean(dim=['x', 'y']).compute()
foo = ds_compressed.foo.mean(dim=['x', 'y']).compute()


I personally don't have any use cases that would prefer the multiprocessing scheduler over the distributed scheduler, but I have been working on improving I/O performance and stability with xarray and dask lately. If anyone would like to work on this, I'd gladly help get it cleaned up, or help put a more definitive "no" on whether or not this can/should work.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: HDF5 error when working with compressed NetCDF files and the dask multiprocessing scheduler (289342234)
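
A sketch of the multiprocessing run that, per the comment above, reproduces the error on the compressed file. Selecting the scheduler via dask.config.set is an assumption here, not taken from the original report.

import dask
import xarray as xr

if __name__ == '__main__':
    ds_compressed = xr.open_dataset('dummy_data_3d_with_compression.nc',
                                    chunks={'time': 1})

    # Force dask's multiprocessing scheduler for this computation. Per the
    # comments above, this is expected to fail with an HDF5 read error,
    # while the same call under the distributed scheduler succeeds.
    with dask.config.set(scheduler='processes'):
        foo = ds_compressed.foo.mean(dim=['x', 'y']).compute()
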
id: 358445479 · user: cchwala (102827) · author_association: CONTRIBUTOR
created_at: 2018-01-17T21:07:43Z · updated_at: 2018-01-17T21:07:43Z
html_url: https://github.com/pydata/xarray/issues/1836#issuecomment-358445479
issue_url: https://api.github.com/repos/pydata/xarray/issues/1836
node_id: MDEyOklzc3VlQ29tbWVudDM1ODQ0NTQ3OQ==

Thanks for the quick answer.

The problem is that my actual use case also involves writing back an xarray.Dataset via to_netcdf(). I left this out of the example above to isolate the problem. With the distributed scheduler and to_netcdf(), I ran into issue #1464. As far as I can see, this might be fixed "soon" (#1793).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: HDF5 error when working with compressed NetCDF files and the dask multiprocessing scheduler (289342234)
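
A sketch of the write-back step mentioned in the comment above: computing under the distributed scheduler and saving the result with to_netcdf(), the combination that ran into #1464 at the time. File names are illustrative and the code follows the current dask.distributed API rather than what was used then.

import xarray as xr
from dask.distributed import Client

if __name__ == '__main__':
    # A local distributed "cluster" on a single machine; once created, it
    # becomes the default scheduler for dask computations.
    client = Client()

    ds = xr.open_dataset('dummy_data_3d_with_compression.nc',
                         chunks={'time': 1})
    result = ds.foo.mean(dim=['x', 'y'])

    # Writing the dask-backed result back to disk is the step that was
    # affected by the issue referenced in the comment.
    result.to_netcdf('foo_mean.nc')
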
id: 358395845 · user: shoyer (1217238) · author_association: MEMBER
created_at: 2018-01-17T18:22:20Z · updated_at: 2018-01-17T18:22:20Z
html_url: https://github.com/pydata/xarray/issues/1836#issuecomment-358395845
issue_url: https://api.github.com/repos/pydata/xarray/issues/1836
node_id: MDEyOklzc3VlQ29tbWVudDM1ODM5NTg0NQ==

This may be a limitation of multiprocessing with netCDF4. Can you try using dask's distributed scheduler? That might work better, even on a single machine.

reactions:
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: HDF5 error when working with compressed NetCDF files and the dask multiprocessing scheduler (289342234)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
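
The query behind this page ("5 rows where issue = 289342234 sorted by updated_at descending") can be reproduced directly against the underlying SQLite database, for example from Python; the database file name below is an assumption.

import sqlite3

# Hypothetical file name for the SQLite database that backs this table.
conn = sqlite3.connect('github.db')
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, author_association, body
    FROM issue_comments
    WHERE issue = 289342234
    ORDER BY updated_at DESC
    """
).fetchall()

for comment_id, user, created_at, updated_at, association, body in rows:
    print(comment_id, user, updated_at, association)

conn.close()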