html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1396#issuecomment-306843285,https://api.github.com/repos/pydata/xarray/issues/1396,306843285,MDEyOklzc3VlQ29tbWVudDMwNjg0MzI4NQ==,1197350,2017-06-07T16:07:03Z,2017-06-07T16:07:03Z,MEMBER,"Hi @JanisGailis. Thanks for looking into this issue! I will give your solution a try as soon as I get some free time.
However, I would like to point out that the issue is completely resolved by dask/dask#2364. So this can probably be closed after the next dask release.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,225774140
https://github.com/pydata/xarray/issues/1396#issuecomment-303254497,https://api.github.com/repos/pydata/xarray/issues/1396,303254497,MDEyOklzc3VlQ29tbWVudDMwMzI1NDQ5Nw==,1197350,2017-05-23T00:16:58Z,2017-05-23T00:16:58Z,MEMBER,This dask bug also explains why it is so slow to generate the `repr` for these big datasets.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,225774140
https://github.com/pydata/xarray/issues/1396#issuecomment-301837577,https://api.github.com/repos/pydata/xarray/issues/1396,301837577,MDEyOklzc3VlQ29tbWVudDMwMTgzNzU3Nw==,1197350,2017-05-16T16:27:52Z,2017-05-16T16:27:52Z,MEMBER,"I have created a self-contained, reproducible example of this serious performance problem.
https://gist.github.com/rabernat/7175328ee04a3167fa4dede1821964c6
This issue is becoming a big problem for me. I imagine other people must be experiencing it too.
I am happy to try to dig in and fix it, but I think some of @shoyer's backend insight would be valuable first.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,225774140
https://github.com/pydata/xarray/issues/1396#issuecomment-298755789,https://api.github.com/repos/pydata/xarray/issues/1396,298755789,MDEyOklzc3VlQ29tbWVudDI5ODc1NTc4OQ==,1197350,2017-05-02T20:45:29Z,2017-05-02T20:45:41Z,MEMBER,"> dask may be loading full arrays to do this computation
This is definitely what I suspect is happening. The problem with adding more chunks is that I quickly hit my system ulimit (see #1394) because, for some reason, all 1754 files are opened as soon as I call `.load()`. Adding more chunks just multiplies that number.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,225774140
https://github.com/pydata/xarray/issues/1396#issuecomment-298745833,https://api.github.com/repos/pydata/xarray/issues/1396,298745833,MDEyOklzc3VlQ29tbWVudDI5ODc0NTgzMw==,1197350,2017-05-02T20:06:10Z,2017-05-02T20:06:10Z,MEMBER,"The relevant part of the stack trace is
```
/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/xarray/core/dataarray.py in load(self)
571 working with many file objects on disk.
572 """"""
--> 573 ds = self._to_temp_dataset().load()
574 new = self._from_temp_dataset(ds)
575 self._variable = new._variable
/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/xarray/core/dataset.py in load(self)
467
468 # evaluate all the dask arrays simultaneously
--> 469 evaluated_data = da.compute(*lazy_data.values())
470
471 for k, data in zip(lazy_data, evaluated_data):
/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/dask/base.py in compute(*args, **kwargs)
200 dsk = collections_to_dsk(variables, optimize_graph, **kwargs)
201 keys = [var._keys() for var in variables]
--> 202 results = get(dsk, keys, **kwargs)
203
204 results_iter = iter(results)
/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, **kwargs)
1523 return sync(self.loop, self._get, dsk, keys, restrictions=restrictions,
1524 loose_restrictions=loose_restrictions,
-> 1525 resources=resources)
1526
1527 def _optimize_insert_futures(self, dsk, keys):
/home/rpa/.conda/envs/dask_distributed/lib/python3.5/site-packages/distributed/utils.py in sync(loop, func, *args, **kwargs)
200 loop.add_callback(f)
201 while not e.is_set():
--> 202 e.wait(1000000)
203 if error[0]:
204 six.reraise(*error[0])
/home/rpa/.conda/envs/dask_distributed/lib/python3.5/threading.py in wait(self, timeout)
547 signaled = self._flag
548 if not signaled:
--> 549 signaled = self._cond.wait(timeout)
550 return signaled
551
/home/rpa/.conda/envs/dask_distributed/lib/python3.5/threading.py in wait(self, timeout)
295 else:
296 if timeout > 0:
--> 297 gotit = waiter.acquire(True, timeout)
298 else:
299 gotit = waiter.acquire(False)
KeyboardInterrupt:
```
I think the issue you are referring to is also mine (#1385).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,225774140