github: issue_comments: 12 rows where issue = 225774140 sorted by updated

12 rows where issue = 225774140 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
453806119	https://github.com/pydata/xarray/issues/1396#issuecomment-453806119	https://api.github.com/repos/pydata/xarray/issues/1396	MDEyOklzc3VlQ29tbWVudDQ1MzgwNjExOQ==	jhamman 2443309	2019-01-13T06:32:45Z	2019-01-13T06:32:45Z	MEMBER	closed via https://github.com/dask/dask/pull/2364 (a long time ago)	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	selecting a point from an mfdataset 225774140
383477976	https://github.com/pydata/xarray/issues/1396#issuecomment-383477976	https://api.github.com/repos/pydata/xarray/issues/1396	MDEyOklzc3VlQ29tbWVudDM4MzQ3Nzk3Ng==	rohitshukla0104 35775735	2018-04-23T07:18:23Z	2019-01-13T06:32:12Z	NONE	I am using MITgcm and want to incorporate my latitude and longitude information from my grid file to state file. Could you please help me in this regard?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	selecting a point from an mfdataset 225774140
306850272	https://github.com/pydata/xarray/issues/1396#issuecomment-306850272	https://api.github.com/repos/pydata/xarray/issues/1396	MDEyOklzc3VlQ29tbWVudDMwNjg1MDI3Mg==	JanisGailis 9655353	2017-06-07T16:30:04Z	2017-06-07T16:50:40Z	NONE	That's great to know! I think there's no need to try my 'solution' then, maybe only out of pure interest. It would of course be interesting to know why a 'custom' chunked dataset was apparently not affected by the bug. And if it was indeed the case. EDIT: I read the discussion on dask github and the xarray mailinglist. It's probably because when explicit chunking is used, the chunks are not aliased and fusing works as expected.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	selecting a point from an mfdataset 225774140
306843285	https://github.com/pydata/xarray/issues/1396#issuecomment-306843285	https://api.github.com/repos/pydata/xarray/issues/1396	MDEyOklzc3VlQ29tbWVudDMwNjg0MzI4NQ==	rabernat 1197350	2017-06-07T16:07:03Z	2017-06-07T16:07:03Z	MEMBER	Hi @JanisGailis. Thanks for looking into this issue! I will give your solution a try as soon as I get some free time. However, I would like to point out that the issue is completely resolved by dask/dask#2364. So this can probably be closed after the next dask release.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	selecting a point from an mfdataset 225774140
306838587	https://github.com/pydata/xarray/issues/1396#issuecomment-306838587	https://api.github.com/repos/pydata/xarray/issues/1396	MDEyOklzc3VlQ29tbWVudDMwNjgzODU4Nw==	JanisGailis 9655353	2017-06-07T15:51:34Z	2017-06-07T15:53:06Z	NONE	We had similar performance issues with xarray+dask, which we solved by using a chunking heuristic when opening a dataset. You can read about it in #1440. Now, in our case the data really wouldn't fit in memory, which is clearly not the case in your gist. Anyway, I thought I'd play around with your gist and see if chunking can make a difference.
I couldn't use your example directly, as the data it generates in memory is too large for the dev VM I'm on with this. So I changed the generated file size to (12, 1000, 2000), the essence of your gist remained though, it would take ~25 seconds to do the time series extraction, whereas ~800 ms using `extract_point_xarray()`.
So, I thought I'd try our 'chunking heuristic' on the generated test datasets. Simply split the dataset in 2x2 chunks along spatial dimensions. So:
```python
ds = xr.open_mfdataset(all_files, decode_cf=False, chunks={'time':12, 'x':1000, 'y':500})
```
To my surprise:
```python
# time extracting a timeseries of a single point
y, x = 200, 300
with ProgressBar():
    %time ts = ds.data[:, y, x].load()
```
results in
```
[########################################] | 100% Completed | 0.7s
CPU times: user 124 ms, sys: 268 ms, total: 392 ms
Wall time: 826 ms
```
I'm not entirely sure what's happening, as the file obviously fits in memory just fine because the looping thing works well. Maybe it's fine when you loop through them one by one, but the single file chunk turns out to be too large when dask wants to parallelize the whole thing. I really have no idea. I'd be very intrigued to see if you can get a similar result by doing a simple 2x2xtime chunking. By the way, `chunks={'x':1000, 'y':500, 'time':1}` produces similar results with some overhead. Extraction took ~1.5 seconds.
EDIT: `print(xr.__version__)` and `print(dask.__version__)` print `0.9.5` and `0.14.1`.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	selecting a point from an mfdataset 225774140
303254497	https://github.com/pydata/xarray/issues/1396#issuecomment-303254497	https://api.github.com/repos/pydata/xarray/issues/1396	MDEyOklzc3VlQ29tbWVudDMwMzI1NDQ5Nw==	rabernat 1197350	2017-05-23T00:16:58Z	2017-05-23T00:16:58Z	MEMBER	This dask bug also explains why it is so slow to generate the `repr` for these big datasets.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	selecting a point from an mfdataset 225774140
301837577	https://github.com/pydata/xarray/issues/1396#issuecomment-301837577	https://api.github.com/repos/pydata/xarray/issues/1396	MDEyOklzc3VlQ29tbWVudDMwMTgzNzU3Nw==	rabernat 1197350	2017-05-16T16:27:52Z	2017-05-16T16:27:52Z	MEMBER	I have created a self-contained, reproducible example of this serious performance problem. https://gist.github.com/rabernat/7175328ee04a3167fa4dede1821964c6 This issue is becoming a big problem for me. I imagine other people must be experiencing it too. I am happy to try to dig in and fix it, but I think some of @shoyer's backend insight would be valuable first.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	selecting a point from an mfdataset 225774140
298755789	https://github.com/pydata/xarray/issues/1396#issuecomment-298755789	https://api.github.com/repos/pydata/xarray/issues/1396	MDEyOklzc3VlQ29tbWVudDI5ODc1NTc4OQ==	rabernat 1197350	2017-05-02T20:45:29Z	2017-05-02T20:45:41Z	MEMBER	> dask may be loading full arrays to do this computation
This is definitely what I suspect is happening. The problem with adding more chunks is that I quickly hit my system ulimit (see #1394), since, for some reason, all the 1754 files are opened as soon as I call `.load()`. Putting more chunks in just multiplies that number.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	selecting a point from an mfdataset 225774140
298755027	https://github.com/pydata/xarray/issues/1396#issuecomment-298755027	https://api.github.com/repos/pydata/xarray/issues/1396	MDEyOklzc3VlQ29tbWVudDI5ODc1NTAyNw==	shoyer 1217238	2017-05-02T20:42:37Z	2017-05-02T20:42:37Z	MEMBER	One thing worth trying is specifying `chunks` manually in `open_mfdataset`. Point-wise indexing should not really require chunks specified ahead of time, but the optimizations dask uses to make these operations efficient are somewhat fragile, so dask may be loading full arrays to do this computation.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	selecting a point from an mfdataset 225774140
298754480	https://github.com/pydata/xarray/issues/1396#issuecomment-298754480	https://api.github.com/repos/pydata/xarray/issues/1396	MDEyOklzc3VlQ29tbWVudDI5ODc1NDQ4MA==	shoyer 1217238	2017-05-02T20:40:35Z	2017-05-02T20:40:35Z	MEMBER	OK, so that isn't terribly useful -- the slow-down is somewhere in dask-land. If it was an issue with alignment, that would come up when building the dask graph, not computing it.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	selecting a point from an mfdataset 225774140
298745833	https://github.com/pydata/xarray/issues/1396#issuecomment-298745833	https://api.github.com/repos/pydata/xarray/issues/1396	MDEyOklzc3VlQ29tbWVudDI5ODc0NTgzMw==	rabernat 1197350	2017-05-02T20:06:10Z	2017-05-02T20:06:10Z	MEMBER	The relevant part of the stack trace is
```
/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/xarray/core/dataarray.py in load(self)
    571 working with many file objects on disk.
    572 """
--> 573 ds = self._to_temp_dataset().load()
    574 new = self._from_temp_dataset(ds)
    575 self._variable = new._variable

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/xarray/core/dataset.py in load(self)
    467
    468 # evaluate all the dask arrays simultaneously
--> 469 evaluated_data = da.compute(*lazy_data.values())
    470
    471 for k, data in zip(lazy_data, evaluated_data):

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/dask/base.py in compute(*args, **kwargs)
    200 dsk = collections_to_dsk(variables, optimize_graph, **kwargs)
    201 keys = [var._keys() for var in variables]
--> 202 results = get(dsk, keys, **kwargs)
    203
    204 results_iter = iter(results)

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, **kwargs)
   1523 return sync(self.loop, self._get, dsk, keys, restrictions=restrictions,
   1524 loose_restrictions=loose_restrictions,
-> 1525 resources=resources)
   1526
   1527 def _optimize_insert_futures(self, dsk, keys):

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/distributed/utils.py in sync(loop, func, *args, **kwargs)
    200 loop.add_callback(f)
    201 while not e.is_set():
--> 202 e.wait(1000000)
    203 if error[0]:
    204 six.reraise(error[0])

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/threading.py in wait(self, timeout)
    547 signaled = self._flag
    548 if not signaled:
--> 549 signaled = self._cond.wait(timeout)
    550 return signaled
    551

/home/rpa/.conda/envs/dask_distributed/lib/python3.5/threading.py in wait(self, timeout)
    295 else:
    296 if timeout > 0:
--> 297 gotit = waiter.acquire(True, timeout)
    298 else:
    299 gotit = waiter.acquire(False)

KeyboardInterrupt:
```
I think the issue you are referring to is also mine (#1385).	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	selecting a point from an mfdataset 225774140
298738745	https://github.com/pydata/xarray/issues/1396#issuecomment-298738745	https://api.github.com/repos/pydata/xarray/issues/1396	MDEyOklzc3VlQ29tbWVudDI5ODczODc0NQ==	shoyer 1217238	2017-05-02T19:38:36Z	2017-05-02T19:38:36Z	MEMBER	Can you try using `Ctrl + C` to interrupt things and report the stack-trace? This might be an issue with xarray verifying aligned coordinates, which we should have an option to disable. (This came up somewhere else recently, but I couldn't find the issue with a quick search.)	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	selecting a point from an mfdataset 225774140
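
The explicit-chunking workaround described in the comments (split each spatial dimension in two, keep `time` whole) can be sketched roughly as follows. This is an illustration only, not code from the thread: the helper names `two_by_two_chunks` and `extract_point_series`, the dimension names `time`/`y`/`x`, and the `open_kwargs` passthrough are assumptions. The thread also reports that the underlying slowdown was fixed in dask itself (dask/dask#2364), so this may be unnecessary on later dask releases.

```python
import math


def two_by_two_chunks(sizes, spatial_dims=("y", "x"), splits=2):
    """Hypothetical helper: split each spatial dimension into `splits`
    pieces and leave every other dimension as a single chunk."""
    return {
        dim: math.ceil(n / splits) if dim in spatial_dims else n
        for dim, n in sizes.items()
    }


def extract_point_series(paths, var, y, x, sizes, **open_kwargs):
    """Hypothetical helper: open many files with explicit chunks and load
    the time series at one (y, x) point.

    Assumes the variable has dimensions named "y" and "x"; extra keyword
    arguments are forwarded to xarray.open_mfdataset unchanged.
    """
    import xarray as xr  # imported lazily; requires xarray and dask

    chunks = two_by_two_chunks(sizes)
    ds = xr.open_mfdataset(paths, decode_cf=False, chunks=chunks, **open_kwargs)
    return ds[var].isel(y=y, x=x).load()


# Per-file shape from the comment: (time, y, x) = (12, 1000, 2000)
chunks = two_by_two_chunks({"time": 12, "y": 1000, "x": 2000})
print(chunks)
```

For the file shape quoted in the comment, the helper yields the same chunk sizes that were passed to `open_mfdataset` there.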

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
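
The listing above corresponds to filtering `issue_comments` on `issue` and ordering by `updated_at` descending. A minimal sketch of that query with Python's built-in `sqlite3` module is given below; it uses an in-memory database with a reduced, assumed subset of the columns, three `id`/`updated_at` pairs copied from the rows above, and one clearly hypothetical row for a different issue. The function name `comments_for_issue` is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced schema for illustration: only the columns the query needs.
conn.execute(
    """
    CREATE TABLE issue_comments (
        id INTEGER PRIMARY KEY,
        updated_at TEXT,
        issue INTEGER
    )
    """
)
conn.execute("CREATE INDEX idx_issue_comments_issue ON issue_comments (issue)")

conn.executemany(
    "INSERT INTO issue_comments (id, updated_at, issue) VALUES (?, ?, ?)",
    [
        (306850272, "2017-06-07T16:50:40Z", 225774140),
        (453806119, "2019-01-13T06:32:45Z", 225774140),
        (383477976, "2019-01-13T06:32:12Z", 225774140),
        (1, "2020-01-01T00:00:00Z", 999),  # hypothetical row for another issue
    ],
)


def comments_for_issue(connection, issue_id):
    """Return comment ids for one issue, most recently updated first.

    ISO 8601 timestamps in a single timezone sort correctly as plain text,
    so ordering on the TEXT column is sufficient here.
    """
    cursor = connection.execute(
        "SELECT id FROM issue_comments WHERE issue = ? ORDER BY updated_at DESC",
        (issue_id,),
    )
    return [row[0] for row in cursor]


print(comments_for_issue(conn, 225774140))
```

The parameterized `?` placeholder keeps the issue id out of the SQL string, and the index on `issue` is the same one the schema defines for this lookup.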
