html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/1128#issuecomment-265966887,https://api.github.com/repos/pydata/xarray/issues/1128,265966887,MDEyOklzc3VlQ29tbWVudDI2NTk2Njg4Nw==,743508,2016-12-09T09:08:48Z,2016-12-09T09:08:48Z,CONTRIBUTOR,"@shoyer thanks, with a little testing it seems `lock=False` is fine (so we don't automatically need dask dev for `lock=dask.utils.SerializableLock()`). Using a spawning pool is necessary; it just doesn't work without one. It also looks like the dask distributed IPython backend works fine (similar to a spawn pool in that the worker engines aren't forked but kinda live in their own little world) - this is really nice because IPython in turn has good support for HPC systems (SGE batch scheduling + MPI for process handling).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-265878280,https://api.github.com/repos/pydata/xarray/issues/1128,265878280,MDEyOklzc3VlQ29tbWVudDI2NTg3ODI4MA==,1217238,2016-12-08T22:44:12Z,2016-12-08T22:44:12Z,MEMBER,"@mangecoeur You still need to use `lock=False` (or `lock=dask.utils.SerializableLock()` with the dev version of dask) and use a spawning process pool (https://github.com/pydata/xarray/pull/1128#issuecomment-261936849).
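For reference, a minimal sketch of that combination (the file name and chunk sizes are placeholders, mirroring the snippets elsewhere in this thread):

```python
import multiprocessing

import dask
import xarray as xr

# placeholder file and chunking; lock=False avoids handing an
# unpicklable thread lock to the worker processes
ds = xr.open_dataset('my-data.nc', lock=False, chunks={'x': 2500})

# a spawn-based pool, so workers don't inherit forked HDF5 state
pool = multiprocessing.get_context('spawn').Pool(4)
dask.set_options(pool=pool)

result = ds.sum().compute()
```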
The former should be updated internally, and the latter should be a documentation note.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-265875012,https://api.github.com/repos/pydata/xarray/issues/1128,265875012,MDEyOklzc3VlQ29tbWVudDI2NTg3NTAxMg==,743508,2016-12-08T22:28:25Z,2016-12-08T22:28:25Z,CONTRIBUTOR,I'm trying out the latest code to subset a set of netCDF4 files with dask.multiprocessing using `set_options(get=dask.multiprocessing.get)` but I'm still getting `TypeError: can't pickle _thread.lock objects` - is this expected or is there something specific I need to do to make it work?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-264033283,https://api.github.com/repos/pydata/xarray/issues/1128,264033283,MDEyOklzc3VlQ29tbWVudDI2NDAzMzI4Mw==,1217238,2016-11-30T23:44:54Z,2016-11-30T23:44:54Z,MEMBER,"OK, in it goes!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-264032725,https://api.github.com/repos/pydata/xarray/issues/1128,264032725,MDEyOklzc3VlQ29tbWVudDI2NDAzMjcyNQ==,346079,2016-11-30T23:42:10Z,2016-11-30T23:42:10Z,NONE,"No objections, go ahead!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-263968185,https://api.github.com/repos/pydata/xarray/issues/1128,263968185,MDEyOklzc3VlQ29tbWVudDI2Mzk2ODE4NQ==,6213168,2016-11-30T19:21:32Z,2016-11-30T19:21:32Z,MEMBER,"All looks good, go on
On 30 Nov 2016 16:50, ""Stephan Hoyer"" wrote:
> @kynan @crusaderky Do you have concerns about merging this in the current state?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-263927223,https://api.github.com/repos/pydata/xarray/issues/1128,263927223,MDEyOklzc3VlQ29tbWVudDI2MzkyNzIyMw==,1217238,2016-11-30T16:50:48Z,2016-11-30T16:50:48Z,MEMBER,@kynan @crusaderky Do you have concerns about merging this in the current state?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-263926969,https://api.github.com/repos/pydata/xarray/issues/1128,263926969,MDEyOklzc3VlQ29tbWVudDI2MzkyNjk2OQ==,1217238,2016-11-30T16:49:53Z,2016-11-30T16:49:53Z,MEMBER,"I decided that between the choices of not running these tests on Windows and leaking a few temp files, I would rather leak some temp files. So that's exactly what I've done in the latest commit, for explicitly whitelisted tests.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-263346169,https://api.github.com/repos/pydata/xarray/issues/1128,263346169,MDEyOklzc3VlQ29tbWVudDI2MzM0NjE2OQ==,306380,2016-11-28T18:05:54Z,2016-11-28T18:05:54Z,MEMBER,I agree that it's not great. This was more a show of solidarity: we've also run into this same issue and had to resort to similar hacks.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-263345757,https://api.github.com/repos/pydata/xarray/issues/1128,263345757,MDEyOklzc3VlQ29tbWVudDI2MzM0NTc1Nw==,1217238,2016-11-28T18:04:17Z,2016-11-28T18:04:17Z,MEMBER,"@mrocklin OK, so one option is to just ignore the permission errors and not remove the files on Windows. But is it really better to make the test suite leak temp files?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-263345054,https://api.github.com/repos/pydata/xarray/issues/1128,263345054,MDEyOklzc3VlQ29tbWVudDI2MzM0NTA1NA==,306380,2016-11-28T18:01:41Z,2016-11-28T18:01:41Z,MEMBER,@shoyer https://github.com/dask/dask/blob/master/dask/utils.py#L68-L84,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-263344764,https://api.github.com/repos/pydata/xarray/issues/1128,263344764,MDEyOklzc3VlQ29tbWVudDI2MzM0NDc2NA==,1217238,2016-11-28T18:00:38Z,2016-11-28T18:00:38Z,MEMBER,"OK, I'm ready to give up on the remaining test failures and merge this anyway (marking them as expected failures). They are specific to our test suite and occur on Windows only, due to the inability to delete files that are still open.
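A sketch of the kind of tolerant cleanup this implies (not the exact test-suite code):

```python
import os
import sys

def best_effort_remove(path):
    # On Windows an open file cannot be deleted; prefer leaking the
    # temp file over failing the test run.
    try:
        os.remove(path)
    except OSError:
        if sys.platform != 'win32':
            raise
```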
If these manifest themselves as issues for real users, I am happy to revisit, especially if someone who uses Windows can help debug. The 5 minute feedback cycle of pushing a commit and then seeing what Appveyor says is too painful.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-261989094,https://api.github.com/repos/pydata/xarray/issues/1128,261989094,MDEyOklzc3VlQ29tbWVudDI2MTk4OTA5NA==,306380,2016-11-21T16:29:25Z,2016-11-21T16:29:25Z,MEMBER,"> Why, yes it does -- and it shows a nice speedup, as well! What was I missing here?
Spawn is only available in Python 3, so it's not a full solution. Something isn't fork-safe, possibly something within the HDF5 library?
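For reference, a minimal sketch of the non-forking start methods (Python 3 only; `forkserver` is Unix-only):

```python
import multiprocessing

ctx = multiprocessing.get_context('spawn')  # fresh interpreter per worker
# ctx = multiprocessing.get_context('forkserver')  # forks from a clean server process
pool = ctx.Pool(4)
```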
You might also want to try `forkserver` and look at this semi-related PR https://github.com/dask/distributed/pull/687","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-261980869,https://api.github.com/repos/pydata/xarray/issues/1128,261980869,MDEyOklzc3VlQ29tbWVudDI2MTk4MDg2OQ==,1217238,2016-11-21T16:04:14Z,2016-11-21T16:04:14Z,MEMBER,"> Does your failure work with the following spawning pool in Python 3?
Why, yes it does -- and it shows a nice speedup, as well! What was I missing here?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-261936849,https://api.github.com/repos/pydata/xarray/issues/1128,261936849,MDEyOklzc3VlQ29tbWVudDI2MTkzNjg0OQ==,306380,2016-11-21T13:21:21Z,2016-11-21T13:21:21Z,MEMBER,"Does your failure work with the following spawning pool in Python 3?
```python
In [1]: import multiprocessing
In [2]: ctx = multiprocessing.get_context('spawn')
In [3]: ctx.Pool(4)
Out[3]: <multiprocessing.pool.Pool object at 0x...>
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-261841025,https://api.github.com/repos/pydata/xarray/issues/1128,261841025,MDEyOklzc3VlQ29tbWVudDI2MTg0MTAyNQ==,1217238,2016-11-21T04:36:02Z,2016-11-21T04:36:02Z,MEMBER,"This isn't yet working with dask multiprocessing for reading a netCDF4 file with in-memory compression. I'm not quite sure why:
```
In [5]: from multiprocessing.pool import Pool
In [7]: ds = xr.open_dataset('big-random.nc', lock=False, chunks={'x': 2500})
In [8]: dask.set_options(pool=Pool(4))
Out[8]: <dask.context.set_options object at 0x...>
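# (annotation) Pool(4) above is a fork-based pool; a spawn-based pool
# avoids the NetCDF/HDF error below, per the spawn-pool snippet elsewhere
# in this thread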
In [9]: %time ds.sum().compute()
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-9-...> in <module>()
----> 1 get_ipython().magic('time ds.sum().compute()')
/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/IPython/core/interactiveshell.py in magic(self, arg_s)
2156 magic_name, _, magic_arg_s = arg_s.partition(' ')
2157 magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2158 return self.run_line_magic(magic_name, magic_arg_s)
2159
2160 #-------------------------------------------------------------------------
/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line)
2077 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2078 with self.builtin_trap:
-> 2079 result = fn(*args,**kwargs)
2080 return result
2081
<decorator-gen-...> in time(self, line, cell, local_ns)
/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
186 # but it's overkill for just that one bit of state.
187 def magic_deco(arg):
--> 188 call = lambda f, *a, **k: f(*a, **k)
189
190 if callable(arg):
/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns)
1174 if mode=='eval':
1175 st = clock2()
-> 1176 out = eval(code, glob, local_ns)
1177 end = clock2()
1178 else:
<timed eval> in <module>()
/Users/shoyer/dev/xarray/xarray/core/dataset.py in compute(self)
348 """"""
349 new = self.copy(deep=False)
--> 350 return new.load()
351
352 @classmethod
/Users/shoyer/dev/xarray/xarray/core/dataset.py in load(self)
325
326 # evaluate all the dask arrays simultaneously
--> 327 evaluated_data = da.compute(*lazy_data.values())
328
329 for k, data in zip(lazy_data, evaluated_data):
/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/base.py in compute(*args, **kwargs)
176 dsk = merge(var.dask for var in variables)
177 keys = [var._keys() for var in variables]
--> 178 results = get(dsk, keys, **kwargs)
179
180 results_iter = iter(results)
/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, **kwargs)
67 results = get_async(pool.apply_async, len(pool._pool), dsk, result,
68 cache=cache, get_id=_thread_get_id,
---> 69 **kwargs)
70
71 # Cleanup pools associated to dead threads
/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, raise_on_exception, rerun_exceptions_locally, callbacks, dumps, loads, **kwargs)
500 _execute_task(task, data) # Re-execute locally
501 else:
--> 502 raise(remote_exception(res, tb))
503 state['cache'][key] = res
504 finish_task(dsk, key, state, results, keyorder.get)
RuntimeError: NetCDF: HDF error
Traceback
---------
File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py"", line 268, in execute_task
result = _execute_task(task, data)
File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py"", line 248, in _execute_task
args2 = [_execute_task(a, cache) for a in args]
File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py"", line 248, in
args2 = [_execute_task(a, cache) for a in args]
File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py"", line 245, in _execute_task
return [_execute_task(a, cache) for a in arg]
File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py"", line 245, in
return [_execute_task(a, cache) for a in arg]
File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py"", line 249, in _execute_task
return func(*args2)
File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/array/core.py"", line 51, in getarray
c = np.asarray(c)
File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/numpy/core/numeric.py"", line 482, in asarray
return array(a, dtype, copy=False, order=order)
File ""/Users/shoyer/dev/xarray/xarray/core/indexing.py"", line 417, in __array__
return np.asarray(self.array, dtype=dtype)
File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/numpy/core/numeric.py"", line 482, in asarray
return array(a, dtype, copy=False, order=order)
File ""/Users/shoyer/dev/xarray/xarray/core/indexing.py"", line 392, in __array__
return np.asarray(array[self.key], dtype=None)
File ""/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/numpy/core/numeric.py"", line 482, in asarray
return array(a, dtype, copy=False, order=order)
File ""/Users/shoyer/dev/xarray/xarray/core/indexing.py"", line 392, in __array__
return np.asarray(array[self.key], dtype=None)
File ""/Users/shoyer/dev/xarray/xarray/backends/netCDF4_.py"", line 56, in __getitem__
data = getitem(self.array, key)
File ""netCDF4/_netCDF4.pyx"", line 3695, in netCDF4._netCDF4.Variable.__getitem__ (netCDF4/_netCDF4.c:37914)
File ""netCDF4/_netCDF4.pyx"", line 4376, in netCDF4._netCDF4.Variable._get (netCDF4/_netCDF4.c:47134)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-261837981,https://api.github.com/repos/pydata/xarray/issues/1128,261837981,MDEyOklzc3VlQ29tbWVudDI2MTgzNzk4MQ==,1217238,2016-11-21T04:08:22Z,2016-11-21T04:11:30Z,MEMBER,"I added pickle support to DataStores. This *should* solve the basic serialization issue for dask.distributed (#798), but does not yet resolve the ""too many open files"" issue.
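A minimal sketch of the round trip this enables (the file name and chunks are placeholders):

```python
import pickle

import xarray as xr

ds = xr.open_dataset('example.nc', chunks={'x': 1000})

# With picklable DataStores, the file-backed dataset should survive a
# round trip -- which is what dask.distributed needs to ship tasks to workers.
restored = pickle.loads(pickle.dumps(ds))
```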
@mrocklin this could use your review.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-261755551,https://api.github.com/repos/pydata/xarray/issues/1128,261755551,MDEyOklzc3VlQ29tbWVudDI2MTc1NTU1MQ==,1217238,2016-11-20T03:13:30Z,2016-11-20T03:13:30Z,MEMBER,"I removed the custom pickle override on `Dataset`/`DataArray` -- the issue I was working around was actually an indirect manifestation of a bug in `IndexVariable.load()` (introduced in this PR).
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-261433336,https://api.github.com/repos/pydata/xarray/issues/1128,261433336,MDEyOklzc3VlQ29tbWVudDI2MTQzMzMzNg==,1217238,2016-11-18T02:36:21Z,2016-11-18T02:36:21Z,MEMBER,"> In the long run I think it would be more robust to check for attributes (duck type style) rather than types in the various places.
Indeed, in particular I'm not very happy with the `isinstance` check for `indexing.MemoryCachedArray` in `Variable.copy()` -- it's rather poor separation of concerns.
It exists so that `variable.compute()` does not cache data in memory on `variable` but only on the computed variable. Otherwise, there's basically no point to the separate compute method: if you use `cache=True`, you are stuck with caching on the original object. Likewise, it ensures that `.copy()` creates an array with a new cache, which is consistent with the current behavior of `.copy()`.
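A sketch of the intended semantics, at the Dataset level (placeholder file name):

```python
import xarray as xr

ds = xr.open_dataset('data.nc', chunks={'x': 1000})

computed = ds.compute()  # returns a new object holding the loaded data
# ds itself stays lazy here; nothing is cached on it

ds.load()  # loads and caches in place, on ds itself
```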
As for type checking for dask arrays in `.data`: yes, it would be nice to have a well defined array interface layer that other array types could plug into. That would entail a significant amount of further work, however.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033