html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/pull/1128#issuecomment-265966887,https://api.github.com/repos/pydata/xarray/issues/1128,265966887,MDEyOklzc3VlQ29tbWVudDI2NTk2Njg4Nw==,743508,2016-12-09T09:08:48Z,2016-12-09T09:08:48Z,CONTRIBUTOR,"@shoyer thanks, with a little testing it seems `lock=False` is fine (so don't automatically need dask dev for `lock=dask.utils.SerializableLock()`). Using spawning pool is necessary, just doesn't work without. Also looks like using dask distributed ipython backend works fine (works similar to spawn pool in that the worker engines aren't forked but kinda live in their own little world) - this is really nice because ipython in turn has good support for HPC systems (SGE batch scheduling + MPI for process handling).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-265878280,https://api.github.com/repos/pydata/xarray/issues/1128,265878280,MDEyOklzc3VlQ29tbWVudDI2NTg3ODI4MA==,1217238,2016-12-08T22:44:12Z,2016-12-08T22:44:12Z,MEMBER,"@mangecoeur You still need to use `lock=False` (or `lock=dask.utils.SerializableLock()` with the dev version of dask) and use a spawning process pool (https://github.com/pydata/xarray/pull/1128#issuecomment-261936849). 
The former should be updated internally, and the latter should be a documentation note.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-265875012,https://api.github.com/repos/pydata/xarray/issues/1128,265875012,MDEyOklzc3VlQ29tbWVudDI2NTg3NTAxMg==,743508,2016-12-08T22:28:25Z,2016-12-08T22:28:25Z,CONTRIBUTOR,I'm trying out the latest code to subset a set of netcdf4 files with dask.multiprocessing using `set_options(get=dask.multiprocessing.get)` but I'm still getting `TypeError: can't pickle _thread.lock objects` - is this expected or is there something specific I need to do to make it work?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-264033283,https://api.github.com/repos/pydata/xarray/issues/1128,264033283,MDEyOklzc3VlQ29tbWVudDI2NDAzMzI4Mw==,1217238,2016-11-30T23:44:54Z,2016-11-30T23:44:54Z,MEMBER,"OK, in it goes!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-264032725,https://api.github.com/repos/pydata/xarray/issues/1128,264032725,MDEyOklzc3VlQ29tbWVudDI2NDAzMjcyNQ==,346079,2016-11-30T23:42:10Z,2016-11-30T23:42:10Z,NONE,"No objections, go ahead!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-263968185,https://api.github.com/repos/pydata/xarray/issues/1128,263968185,MDEyOklzc3VlQ29tbWVudDI2Mzk2ODE4NQ==,6213168,2016-11-30T19:21:32Z,2016-11-30T19:21:32Z,MEMBER,"All looks good, go on","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-263927223,https://api.github.com/repos/pydata/xarray/issues/1128,263927223,MDEyOklzc3VlQ29tbWVudDI2MzkyNzIyMw==,1217238,2016-11-30T16:50:48Z,2016-11-30T16:50:48Z,MEMBER,@kynan @crusaderky Do you have concerns about merging this in the current state?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-263926969,https://api.github.com/repos/pydata/xarray/issues/1128,263926969,MDEyOklzc3VlQ29tbWVudDI2MzkyNjk2OQ==,1217238,2016-11-30T16:49:53Z,2016-11-30T16:49:53Z,MEMBER,"I decided that between the choices of not running these tests on Windows and leaking a few temp files, I would rather leak some temp files. So that's exactly what I've done in the latest commit, for explicitly whitelisted tests.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-263346169,https://api.github.com/repos/pydata/xarray/issues/1128,263346169,MDEyOklzc3VlQ29tbWVudDI2MzM0NjE2OQ==,306380,2016-11-28T18:05:54Z,2016-11-28T18:05:54Z,MEMBER,I agree that it's not great. This was more a show of solidarity that we've also run into this same issue and had to resort to similar hacks. 
,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-263345757,https://api.github.com/repos/pydata/xarray/issues/1128,263345757,MDEyOklzc3VlQ29tbWVudDI2MzM0NTc1Nw==,1217238,2016-11-28T18:04:17Z,2016-11-28T18:04:17Z,MEMBER,"@mrocklin OK, so one option is to just ignore the permission errors and not remove the files on Windows. But is it really better to make the test suite leak temp files?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-263345054,https://api.github.com/repos/pydata/xarray/issues/1128,263345054,MDEyOklzc3VlQ29tbWVudDI2MzM0NTA1NA==,306380,2016-11-28T18:01:41Z,2016-11-28T18:01:41Z,MEMBER,@shoyer https://github.com/dask/dask/blob/master/dask/utils.py#L68-L84,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-263344764,https://api.github.com/repos/pydata/xarray/issues/1128,263344764,MDEyOklzc3VlQ29tbWVudDI2MzM0NDc2NA==,1217238,2016-11-28T18:00:38Z,2016-11-28T18:00:38Z,MEMBER,"OK, I'm ready to give up on the remaining test failures and merge this anyways (marking them as expected failures). They are specific to our test suite and for Windows only, due to the inability to delete files that are not closed. If these manifest themselves as issues for real users, I am happy to revisit, especially if someone who uses Windows can help debug. 
The 5 minute feedback cycle of pushing a commit and then seeing what Appveyor says is too painful.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-261989094,https://api.github.com/repos/pydata/xarray/issues/1128,261989094,MDEyOklzc3VlQ29tbWVudDI2MTk4OTA5NA==,306380,2016-11-21T16:29:25Z,2016-11-21T16:29:25Z,MEMBER,"> Why, yes it does -- and it shows a nice speedup, as well! What was I missing here? Spawn is only available in Python 3, so it's not a full solution. Something isn't fork-safe, possibly something within the HDF5 library? You might also want to try `forkserver` and look at this semi-related PR https://github.com/dask/distributed/pull/687","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-261980869,https://api.github.com/repos/pydata/xarray/issues/1128,261980869,MDEyOklzc3VlQ29tbWVudDI2MTk4MDg2OQ==,1217238,2016-11-21T16:04:14Z,2016-11-21T16:04:14Z,MEMBER,"> Does your failure work with the following spawning pool in Python 3? Why, yes it does -- and it shows a nice speedup, as well! What was I missing here?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-261936849,https://api.github.com/repos/pydata/xarray/issues/1128,261936849,MDEyOklzc3VlQ29tbWVudDI2MTkzNjg0OQ==,306380,2016-11-21T13:21:21Z,2016-11-21T13:21:21Z,MEMBER,"Does your failure work with the following spawning pool in Python 3? 
```python In [1]: import multiprocessing In [2]: ctx = multiprocessing.get_context('spawn') In [3]: ctx.Pool(4) Out[3]: ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-261841025,https://api.github.com/repos/pydata/xarray/issues/1128,261841025,MDEyOklzc3VlQ29tbWVudDI2MTg0MTAyNQ==,1217238,2016-11-21T04:36:02Z,2016-11-21T04:36:02Z,MEMBER,"This isn't yet working with dask multiprocessing for reading a netCDF4 file with in-memory compression. I'm not quite sure why: ``` In [5]: from multiprocessing.pool import Pool In [7]: ds = xr.open_dataset('big-random.nc', lock=False, chunks={'x': 2500}) In [8]: dask.set_options(pool=Pool(4)) Out[8]: In [9]: %time ds.sum().compute() --------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) in () ----> 1 get_ipython().magic('time ds.sum().compute()') /Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/IPython/core/interactiveshell.py in magic(self, arg_s) 2156 magic_name, _, magic_arg_s = arg_s.partition(' ') 2157 magic_name = magic_name.lstrip(prefilter.ESC_MAGIC) -> 2158 return self.run_line_magic(magic_name, magic_arg_s) 2159 2160 #------------------------------------------------------------------------- /Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line) 2077 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals 2078 with self.builtin_trap: -> 2079 result = fn(*args,**kwargs) 2080 return result 2081 in time(self, line, cell, local_ns) /Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/IPython/core/magic.py in (f, *a, **k) 186 # but it's overkill for just that one bit of state. 
187 def magic_deco(arg): --> 188 call = lambda f, *a, **k: f(*a, **k) 189 190 if callable(arg): /Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns) 1174 if mode=='eval': 1175 st = clock2() -> 1176 out = eval(code, glob, local_ns) 1177 end = clock2() 1178 else: in () /Users/shoyer/dev/xarray/xarray/core/dataset.py in compute(self) 348 """""" 349 new = self.copy(deep=False) --> 350 return new.load() 351 352 @classmethod /Users/shoyer/dev/xarray/xarray/core/dataset.py in load(self) 325 326 # evaluate all the dask arrays simultaneously --> 327 evaluated_data = da.compute(*lazy_data.values()) 328 329 for k, data in zip(lazy_data, evaluated_data): /Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/base.py in compute(*args, **kwargs) 176 dsk = merge(var.dask for var in variables) 177 keys = [var._keys() for var in variables] --> 178 results = get(dsk, keys, **kwargs) 179 180 results_iter = iter(results) /Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, **kwargs) 67 results = get_async(pool.apply_async, len(pool._pool), dsk, result, 68 cache=cache, get_id=_thread_get_id, ---> 69 **kwargs) 70 71 # Cleanup pools associated to dead threads /Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, raise_on_exception, rerun_exceptions_locally, callbacks, dumps, loads, **kwargs) 500 _execute_task(task, data) # Re-execute locally 501 else: --> 502 raise(remote_exception(res, tb)) 503 state['cache'][key] = res 504 finish_task(dsk, key, state, results, keyorder.get) RuntimeError: NetCDF: HDF error Traceback --------- File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py"", line 268, in execute_task result = _execute_task(task, data) File 
""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py"", line 248, in _execute_task args2 = [_execute_task(a, cache) for a in args] File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py"", line 248, in args2 = [_execute_task(a, cache) for a in args] File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py"", line 245, in _execute_task return [_execute_task(a, cache) for a in arg] File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py"", line 245, in return [_execute_task(a, cache) for a in arg] File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py"", line 249, in _execute_task return func(*args2) File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/array/core.py"", line 51, in getarray c = np.asarray(c) File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/numpy/core/numeric.py"", line 482, in asarray return array(a, dtype, copy=False, order=order) File ""/Users/shoyer/dev/xarray/xarray/core/indexing.py"", line 417, in __array__ return np.asarray(self.array, dtype=dtype) File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/numpy/core/numeric.py"", line 482, in asarray return array(a, dtype, copy=False, order=order) File ""/Users/shoyer/dev/xarray/xarray/core/indexing.py"", line 392, in __array__ return np.asarray(array[self.key], dtype=None) File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/numpy/core/numeric.py"", line 482, in asarray return array(a, dtype, copy=False, order=order) File ""/Users/shoyer/dev/xarray/xarray/core/indexing.py"", line 392, in __array__ return np.asarray(array[self.key], dtype=None) File ""/Users/shoyer/dev/xarray/xarray/backends/netCDF4_.py"", line 56, in __getitem__ data = getitem(self.array, key) File ""netCDF4/_netCDF4.pyx"", line 3695, in netCDF4._netCDF4.Variable.__getitem__ (netCDF4/_netCDF4.c:37914) File 
""netCDF4/_netCDF4.pyx"", line 4376, in netCDF4._netCDF4.Variable._get (netCDF4/_netCDF4.c:47134) ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-261837981,https://api.github.com/repos/pydata/xarray/issues/1128,261837981,MDEyOklzc3VlQ29tbWVudDI2MTgzNzk4MQ==,1217238,2016-11-21T04:08:22Z,2016-11-21T04:11:30Z,MEMBER,"I added pickle support to DataStores. This *should* solve the basic serialization issue for dask.distributed (#798), but does not yet resolve the ""too many open files"" issue. @mrocklin this could use your review.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-261755551,https://api.github.com/repos/pydata/xarray/issues/1128,261755551,MDEyOklzc3VlQ29tbWVudDI2MTc1NTU1MQ==,1217238,2016-11-20T03:13:30Z,2016-11-20T03:13:30Z,MEMBER,"I removed the custom pickle override on `Dataset`/`DataArray` -- the issue I was working around was actually a indirect manifestation of bug on `IndexVariable.load()` (introduced in this PR). ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033 https://github.com/pydata/xarray/pull/1128#issuecomment-261433336,https://api.github.com/repos/pydata/xarray/issues/1128,261433336,MDEyOklzc3VlQ29tbWVudDI2MTQzMzMzNg==,1217238,2016-11-18T02:36:21Z,2016-11-18T02:36:21Z,MEMBER,"> In the long run I think it would be more robust to check for attributes (duck type style) rather than types in the various places. Indeed, in particular I'm not very happy with the `isinstance` check for `indexing.MemoryCachedArray` in `Variable.copy()` -- it's rather poor separation of concerns. 
It exists so that `variable.compute()` does not cache data in-memory on `variable` but only on the computed variable. Otherwise, there's basically no point to the separate compute method: if you use `cache=True`, you are stuck with caching on the original object. Likewise, it ensures that `.copy()` creates an array with a new cache, which is consistent with the current behavior of `.copy()`. As for type checking for dask arrays in `.data`: yes, it would be nice to have a well defined array interface layer that other array types could plug into. That would entail a significant amount of further work, however. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033