issue_comments
18 rows where issue = 304589831 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
382487555 | https://github.com/pydata/xarray/pull/1983#issuecomment-382487555 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MjQ4NzU1NQ== | jhamman 2443309 | 2018-04-18T18:38:47Z | 2018-04-18T18:38:47Z | MEMBER | With my last commits here, this feature is completely optional and defaults to the current behavior. I cleaned up the tests a bit further and am now ready to merge this. Barring any objections, I'll merge this on Friday. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
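For context, the opt-in behavior described in the comment above comes down to a single keyword argument. A minimal usage sketch, assuming `files` is a list of NetCDF paths (the filenames here are illustrative, not from the thread):

```python
import xarray as xr

# Hypothetical input files; parallel=False (the default) keeps the old
# serial-open behavior, so parallel opening is strictly opt-in.
files = ['data_2017.nc', 'data_2018.nc']
ds = xr.open_mfdataset(files, concat_dim='time', parallel=True)
```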
382157273 | https://github.com/pydata/xarray/pull/1983#issuecomment-382157273 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MjE1NzI3Mw== | jhamman 2443309 | 2018-04-17T21:41:03Z | 2018-04-17T21:41:03Z | MEMBER | I think that makes sense for now. We need to experiment with this a bit more but I don't see a problem merging the basic workflow we have now (with a minor change to the default behavior). |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
382154051 | https://github.com/pydata/xarray/pull/1983#issuecomment-382154051 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MjE1NDA1MQ== | shoyer 1217238 | 2018-04-17T21:30:53Z | 2018-04-17T21:30:53Z | MEMBER | It sounds like the right resolution for now would be to leave the default as |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
382146851 | https://github.com/pydata/xarray/pull/1983#issuecomment-382146851 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MjE0Njg1MQ== | jhamman 2443309 | 2018-04-17T21:08:29Z | 2018-04-17T21:08:29Z | MEMBER | @NicWayand - Thanks for giving this a go. Some thoughts on your problem... I have been using this feature for the past few days and have been seeing a speedup on datasets with many files, along the lines of what I showed above. I am applying my tests on perhaps the perfect test architecture (parallel shared fs, fast interconnect, etc.). I think there are many reasons/cases where this won't work as well. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
382071801 | https://github.com/pydata/xarray/pull/1983#issuecomment-382071801 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MjA3MTgwMQ== | NicWayand 1117224 | 2018-04-17T17:14:33Z | 2018-04-17T17:38:42Z | NONE | Thanks @jhamman for working on this! I did a test on my real-world data (1202 ~3 MB files) on my local computer and am not getting the results I expected: 1) no speedup with parallel=True, and 2) a slowdown when using distributed (processes=16 cores=16). Am I missing something?

```python
nc_files = glob.glob(E.obs['NSIDC_0081']['sipn_nc']+'/*.nc')
print(len(nc_files))
1202

# Parallel False
%time ds = xr.open_mfdataset(nc_files, concat_dim='time', parallel=False, autoclose=True)
CPU times: user 57.8 s, sys: 3.2 s, total: 1min 1s
Wall time: 1min

# Parallel True with default scheduler
%time ds = xr.open_mfdataset(nc_files, concat_dim='time', parallel=True, autoclose=True)
CPU times: user 1min 16s, sys: 9.82 s, total: 1min 26s
Wall time: 1min 16s

# Parallel True with distributed
from dask.distributed import Client
client = Client()
print(client)
<Client: scheduler='tcp://127.0.0.1:43291' processes=16 cores=16>
%time ds = xr.open_mfdataset(nc_files, concat_dim='time', parallel=True, autoclose=True)
CPU times: user 2min 17s, sys: 12.3 s, total: 2min 29s
Wall time: 3min 48s
```

On feature/parallel_open_netcdf commit 280a46f13426a462fb3e983cfd5ac7a0565d1826 |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
381277673 | https://github.com/pydata/xarray/pull/1983#issuecomment-381277673 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MTI3NzY3Mw== | jhamman 2443309 | 2018-04-13T22:42:59Z | 2018-04-13T22:42:59Z | MEMBER | @rabernat - I got the tests passing here again. If you can make the time to try your example/test again, it would be great to figure out what wasn't working before. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
380257320 | https://github.com/pydata/xarray/pull/1983#issuecomment-380257320 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MDI1NzMyMA== | jhamman 2443309 | 2018-04-10T21:44:28Z | 2018-04-10T21:45:02Z | MEMBER | @rabernat - I just pushed a few more commits here. Can I ask two questions?
- When using the distributed scheduler, what configuration are you using? Can you try:
- If this turns out to be a corner case with the distributed scheduler, I can add an integration test for that specific use case. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
380150362 | https://github.com/pydata/xarray/pull/1983#issuecomment-380150362 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MDE1MDM2Mg== | jhamman 2443309 | 2018-04-10T15:49:06Z | 2018-04-10T15:49:06Z | MEMBER | @rabernat - my last commit(s) seem to have broken the CI so I'll need to revisit this. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
380121937 | https://github.com/pydata/xarray/pull/1983#issuecomment-380121937 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM4MDEyMTkzNw== | rabernat 1197350 | 2018-04-10T14:32:25Z | 2018-04-10T14:32:25Z | MEMBER | I recently tried this branch with my data server and got an error. I opened a dataset this way

```python
# works fine with parallel=False
ds = xr.open_mfdataset(os.path.join(ddir, 'V1_1.204.nc'), decode_cf=False, parallel=True)
```

and got the following error.
Without the distributed scheduler (but with

Any idea what could be going on? (Sorry for the non-reproducible bug report...I figured some trials "in the field" might be useful.) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
379323343 | https://github.com/pydata/xarray/pull/1983#issuecomment-379323343 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3OTMyMzM0Mw== | jhamman 2443309 | 2018-04-06T17:33:45Z | 2018-04-06T17:33:45Z | MEMBER | All the tests are passing here? Any final objectors? |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
379306351 | https://github.com/pydata/xarray/pull/1983#issuecomment-379306351 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3OTMwNjM1MQ== | jhamman 2443309 | 2018-04-06T16:29:15Z | 2018-04-06T16:29:15Z | MEMBER | I imagine there will be a small performance cost when the number of files is small. That cost is probably lost in the noise in most I/O operations. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
379305062 | https://github.com/pydata/xarray/pull/1983#issuecomment-379305062 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3OTMwNTA2Mg== | rabernat 1197350 | 2018-04-06T16:24:22Z | 2018-04-06T16:24:22Z | MEMBER | Can we imagine cases where it might actually degrade performance? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
379304351 | https://github.com/pydata/xarray/pull/1983#issuecomment-379304351 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3OTMwNDM1MQ== | shoyer 1217238 | 2018-04-06T16:21:51Z | 2018-04-06T16:21:51Z | MEMBER | My reason for suggesting default |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
379303753 | https://github.com/pydata/xarray/pull/1983#issuecomment-379303753 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3OTMwMzc1Mw== | jhamman 2443309 | 2018-04-06T16:19:35Z | 2018-04-06T16:19:35Z | MEMBER |
I'm not tied to the behavior. It was suggested by @shoyer a while back. Perhaps we try this and evaluate how it works in the wild? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
376689828 | https://github.com/pydata/xarray/pull/1983#issuecomment-376689828 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3NjY4OTgyOA== | jhamman 2443309 | 2018-03-27T21:59:35Z | 2018-03-27T21:59:35Z | MEMBER |
I have. See below for a simple example using this feature on Cheyenne.

```python
In [1]: import xarray as xr
   ...: import glob

In [2]: pattern = '/glade/u/home/jhamman/workdir/LOCA_daily/met_data/CESM1-BGC/16th/rcp45/r1i1p1/*/*.nc'

In [3]: len(glob.glob(pattern))
Out[3]: 285

In [4]: %time ds = xr.open_mfdataset(pattern)
CPU times: user 15.5 s, sys: 2.62 s, total: 18.1 s
Wall time: 42.4 s

In [5]: ds.close()

In [6]: %time ds = xr.open_mfdataset(pattern, parallel=True)
CPU times: user 18.4 s, sys: 5.28 s, total: 23.6 s
Wall time: 30.7 s

In [7]: ds.close()

In [8]: from dask.distributed import Client

In [9]: client = Client()

In [10]: client
Out[10]: <Client: scheduler='tcp://127.0.0.1:39853' processes=72 cores=72>

In [11]: %time ds = xr.open_mfdataset(pattern, parallel=True, autoclose=True)
CPU times: user 10.8 s, sys: 808 ms, total: 11.6 s
Wall time: 12.4 s
``` |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
375799794 | https://github.com/pydata/xarray/pull/1983#issuecomment-375799794 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3NTc5OTc5NA== | jhamman 2443309 | 2018-03-23T21:12:33Z | 2018-03-23T21:12:33Z | MEMBER |
I've skipped the offending test on AppVeyor for now. Objectors, speak up please. I don't have a Windows machine to test on, and iterating via AppVeyor is not something a sane person does 😉. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
373245814 | https://github.com/pydata/xarray/pull/1983#issuecomment-373245814 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3MzI0NTgxNA== | jhamman 2443309 | 2018-03-15T03:05:08Z | 2018-03-15T03:05:08Z | MEMBER | If anyone understands Windows file handling with Python, I'm all ears as to why this is failing on AppVeyor. I'm tempted to just skip this test there but thought I should ask for help first... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 | |
372807932 | https://github.com/pydata/xarray/pull/1983#issuecomment-372807932 | https://api.github.com/repos/pydata/xarray/issues/1983 | MDEyOklzc3VlQ29tbWVudDM3MjgwNzkzMg== | jhamman 2443309 | 2018-03-13T20:30:49Z | 2018-03-13T20:30:49Z | MEMBER | @shoyer - I updated this to use dask.delayed. I actually like it more because I only have to call compute once. Thanks for the suggestion. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Parallel open_mfdataset 304589831 |
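The dask.delayed pattern jhamman describes above (each file open wrapped in a delayed call, everything triggered by a single compute) might look roughly like this. It is a minimal sketch of that description, not the PR's actual code; the function name `open_parallel` and its arguments are illustrative:

```python
import dask
import xarray as xr

def open_parallel(paths, concat_dim='time', **open_kwargs):
    # Each open_dataset call becomes a lazy dask.delayed task...
    delayed_open = dask.delayed(xr.open_dataset)
    tasks = [delayed_open(path, **open_kwargs) for path in paths]
    # ...and a single compute() runs all the opens through whichever
    # scheduler is active (threaded by default, distributed if a
    # Client has been created).
    datasets = dask.compute(*tasks)
    return xr.concat(datasets, dim=concat_dim)
```

Calling compute once, rather than once per file, is the simplification the comment refers to.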
```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```
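Given that schema, the query behind this page ("18 rows where issue = 304589831 sorted by updated_at descending") can be reproduced against a local copy of the database. A sketch using Python's sqlite3 module; the filename `github.db` is an assumption:

```python
import sqlite3

# 'github.db' stands in for a local copy of this Datasette database.
conn = sqlite3.connect('github.db')
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, author_association, body
    FROM issue_comments
    WHERE issue = 304589831
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # expect 18, per the page header
```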