
issue_comments


19 rows where issue = 189817033 sorted by updated_at descending




user 5

  • shoyer 11
  • mrocklin 4
  • mangecoeur 2
  • kynan 1
  • crusaderky 1

author_association 3

  • MEMBER 16
  • CONTRIBUTOR 2
  • NONE 1

issue 1

  • Remove caching logic from xarray.Variable · 19
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
265966887 https://github.com/pydata/xarray/pull/1128#issuecomment-265966887 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2NTk2Njg4Nw== mangecoeur 743508 2016-12-09T09:08:48Z 2016-12-09T09:08:48Z CONTRIBUTOR

@shoyer thanks, with a little testing it seems lock=False is fine (so you don't automatically need the dask dev version for lock=dask.utils.SerializableLock()). Using a spawning pool is necessary; it just doesn't work without one. It also looks like the dask distributed IPython backend works fine (similar to a spawn pool in that the worker engines aren't forked but live in their own little world). This is really nice because IPython in turn has good support for HPC systems (SGE batch scheduling + MPI for process handling).

265878280 https://github.com/pydata/xarray/pull/1128#issuecomment-265878280 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2NTg3ODI4MA== shoyer 1217238 2016-12-08T22:44:12Z 2016-12-08T22:44:12Z MEMBER

@mangecoeur You still need to use lock=False (or lock=dask.utils.SerializableLock() with the dev version of dask) and use a spawning process pool (https://github.com/pydata/xarray/pull/1128#issuecomment-261936849).

The former should be updated internally, and the latter should be a documentation note.

265875012 https://github.com/pydata/xarray/pull/1128#issuecomment-265875012 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2NTg3NTAxMg== mangecoeur 743508 2016-12-08T22:28:25Z 2016-12-08T22:28:25Z CONTRIBUTOR

I'm trying out the latest code to subset a set of netCDF4 files with dask.multiprocessing, using set_options(get=dask.multiprocessing.get), but I'm still getting TypeError: can't pickle _thread.lock objects. Is this expected, or is there something specific I need to do to make it work?
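The error above can be reproduced with nothing but the standard library; a plain `threading.Lock` simply refuses to pickle, which is what happens when the multiprocessing scheduler tries to ship a task graph that still holds a default thread-based lock:

```python
import pickle
import threading

# Pickling a bare threading.Lock fails: the underlying _thread.lock has no
# meaning in another process, so pickle rejects it outright.
try:
    pickle.dumps(threading.Lock())
    picklable = True
except TypeError as err:
    picklable = False
    print(err)

print("picklable:", picklable)  # picklable: False
```

Passing lock=False (or a lock designed to be serializable) keeps such an object out of the pickled graph entirely.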

264033283 https://github.com/pydata/xarray/pull/1128#issuecomment-264033283 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2NDAzMzI4Mw== shoyer 1217238 2016-11-30T23:44:54Z 2016-11-30T23:44:54Z MEMBER

OK, in it goes!

264032725 https://github.com/pydata/xarray/pull/1128#issuecomment-264032725 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2NDAzMjcyNQ== kynan 346079 2016-11-30T23:42:10Z 2016-11-30T23:42:10Z NONE

No objections, go ahead!

263968185 https://github.com/pydata/xarray/pull/1128#issuecomment-263968185 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2Mzk2ODE4NQ== crusaderky 6213168 2016-11-30T19:21:32Z 2016-11-30T19:21:32Z MEMBER

All looks good, go on

On 30 Nov 2016 16:50, "Stephan Hoyer" notifications@github.com wrote:

> @kynan https://github.com/kynan @crusaderky https://github.com/crusaderky Do you have concerns about merging this in the current state?


263927223 https://github.com/pydata/xarray/pull/1128#issuecomment-263927223 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2MzkyNzIyMw== shoyer 1217238 2016-11-30T16:50:48Z 2016-11-30T16:50:48Z MEMBER

@kynan @crusaderky Do you have concerns about merging this in the current state?

263926969 https://github.com/pydata/xarray/pull/1128#issuecomment-263926969 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2MzkyNjk2OQ== shoyer 1217238 2016-11-30T16:49:53Z 2016-11-30T16:49:53Z MEMBER

I decided that between the choices of not running these tests on Windows and leaking a few temp files, I would rather leak some temp files. So that's exactly what I've done in the latest commit, for explicitly whitelisted tests.
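The whitelist approach can be sketched with a small helper (hypothetical code, not xarray's actual test suite): attempt the deletion, and on the PermissionError that Windows raises for files that are still open, deliberately leak the file instead of failing the test:

```python
import os
import tempfile

def remove_or_leak(path):
    """Try to delete a temp file. On Windows, deleting a file that is
    still open raises PermissionError; in that case we deliberately leak
    the file rather than fail (a sketch of the whitelist idea)."""
    try:
        os.remove(path)
        return True
    except PermissionError:
        return False  # leaked on purpose

# With a closed file the removal simply succeeds:
fd, path = tempfile.mkstemp()
os.close(fd)
assert remove_or_leak(path)
```

On POSIX systems the except branch is never taken, which is why the leak only shows up on the Windows (Appveyor) builds.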

263346169 https://github.com/pydata/xarray/pull/1128#issuecomment-263346169 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2MzM0NjE2OQ== mrocklin 306380 2016-11-28T18:05:54Z 2016-11-28T18:05:54Z MEMBER

I agree that it's not great. This was more a show of solidarity that we've also run into this same issue and had to resort to similar hacks.

263345757 https://github.com/pydata/xarray/pull/1128#issuecomment-263345757 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2MzM0NTc1Nw== shoyer 1217238 2016-11-28T18:04:17Z 2016-11-28T18:04:17Z MEMBER

@mrocklin OK, so one option is to just ignore the permission errors and not remove the files on Windows. But is it really better to make the test suite leak temp files?

263345054 https://github.com/pydata/xarray/pull/1128#issuecomment-263345054 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2MzM0NTA1NA== mrocklin 306380 2016-11-28T18:01:41Z 2016-11-28T18:01:41Z MEMBER

@shoyer https://github.com/dask/dask/blob/master/dask/utils.py#L68-L84
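The linked dask utility can be approximated in a few lines. This is a simplified sketch of the idea only, not dask's exact implementation: real locks live in a process-local registry keyed by a token, and only the token crosses the pickle boundary, so unpickling in another process yields a fresh, working lock:

```python
import pickle
import threading
import uuid

class SerializableLock:
    """Sketch modeled on dask.utils.SerializableLock (simplified)."""

    _locks = {}  # process-local registry: token -> threading.Lock

    def __init__(self, token=None):
        self.token = token or str(uuid.uuid4())
        if self.token not in SerializableLock._locks:
            SerializableLock._locks[self.token] = threading.Lock()
        self.lock = SerializableLock._locks[self.token]

    def __enter__(self):
        return self.lock.__enter__()

    def __exit__(self, *args):
        return self.lock.__exit__(*args)

    def __getstate__(self):
        return self.token  # the lock itself never crosses the wire

    def __setstate__(self, token):
        self.__init__(token)

lock = SerializableLock()
with lock:
    pass
clone = pickle.loads(pickle.dumps(lock))  # round-trips, unlike a raw Lock
```

Within a single process, unpickling finds the same registry entry, so `clone` and `lock` guard the same underlying `threading.Lock`.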

263344764 https://github.com/pydata/xarray/pull/1128#issuecomment-263344764 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2MzM0NDc2NA== shoyer 1217238 2016-11-28T18:00:38Z 2016-11-28T18:00:38Z MEMBER

OK, I'm ready to give up on the remaining test failures and merge this anyway (marking them as expected failures). They are specific to our test suite and occur only on Windows, due to the inability to delete files that are still open.

If these manifest themselves as issues for real users, I am happy to revisit, especially if someone who uses Windows can help debug. The 5 minute feedback cycle of pushing a commit and then seeing what Appveyor says is too painful.

261989094 https://github.com/pydata/xarray/pull/1128#issuecomment-261989094 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2MTk4OTA5NA== mrocklin 306380 2016-11-21T16:29:25Z 2016-11-21T16:29:25Z MEMBER

> Why, yes it does -- and it shows a nice speedup, as well! What was I missing here?

Spawn is only available in Python 3, so it's not a full solution. Something isn't fork-safe, possibly something within the HDF5 library?

You might also want to try forkserver and look at this semi-related PR https://github.com/dask/distributed/pull/687
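For reference, the available start methods can be inspected from the standard library. 'fork' copies the parent process wholesale into workers, including any library state (such as HDF5 internals) that is not fork-safe, while 'spawn' and 'forkserver' start from a clean interpreter:

```python
import multiprocessing

# 'fork' inherits the parent's state; 'spawn' (all platforms on Python 3)
# and 'forkserver' (Unix only) start fresh interpreters, which sidesteps
# inherited state that is not fork-safe.
methods = multiprocessing.get_all_start_methods()
print(methods)

ctx = multiprocessing.get_context('spawn')  # available everywhere on Python 3
```

Note that spawned workers re-import the main module, so scripts using a spawn pool need the usual `if __name__ == '__main__':` guard.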

261980869 https://github.com/pydata/xarray/pull/1128#issuecomment-261980869 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2MTk4MDg2OQ== shoyer 1217238 2016-11-21T16:04:14Z 2016-11-21T16:04:14Z MEMBER

> Does your failure work with the following spawning pool in Python 3?

Why, yes it does -- and it shows a nice speedup, as well! What was I missing here?

261936849 https://github.com/pydata/xarray/pull/1128#issuecomment-261936849 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2MTkzNjg0OQ== mrocklin 306380 2016-11-21T13:21:21Z 2016-11-21T13:21:21Z MEMBER

Does your failure work with the following spawning pool in Python 3?

```python
In [1]: import multiprocessing

In [2]: ctx = multiprocessing.get_context('spawn')

In [3]: ctx.Pool(4)
Out[3]: <multiprocessing.pool.Pool at 0x7fec70afca20>
```

261841025 https://github.com/pydata/xarray/pull/1128#issuecomment-261841025 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2MTg0MTAyNQ== shoyer 1217238 2016-11-21T04:36:02Z 2016-11-21T04:36:02Z MEMBER

This isn't yet working with dask multiprocessing for reading a netCDF4 file with in-memory compression. I'm not quite sure why:

```
In [5]: from multiprocessing.pool import Pool

In [7]: ds = xr.open_dataset('big-random.nc', lock=False, chunks={'x': 2500})

In [8]: dask.set_options(pool=Pool(4))
Out[8]: <dask.context.set_options at 0x1087c3898>

In [9]: %time ds.sum().compute()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-9-4c43356c48db> in <module>()
----> 1 get_ipython().magic('time ds.sum().compute()')

/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/IPython/core/interactiveshell.py in magic(self, arg_s)
   2156         magic_name, _, magic_arg_s = arg_s.partition(' ')
   2157         magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2158         return self.run_line_magic(magic_name, magic_arg_s)
   2159
   2160     #-------------------------------------------------------------------------

/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line)
   2077             kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2078         with self.builtin_trap:
-> 2079             result = fn(*args, **kwargs)
   2080         return result
   2081

<decorator-gen-59> in time(self, line, cell, local_ns)

/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    186     # but it's overkill for just that one bit of state.
    187     def magic_deco(arg):
--> 188         call = lambda f, *a, **k: f(*a, **k)
    189
    190         if callable(arg):

/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns)
   1174         if mode == 'eval':
   1175             st = clock2()
-> 1176             out = eval(code, glob, local_ns)
   1177             end = clock2()
   1178         else:

<timed eval> in <module>()

/Users/shoyer/dev/xarray/xarray/core/dataset.py in compute(self)
    348         """
    349         new = self.copy(deep=False)
--> 350         return new.load()
    351
    352     @classmethod

/Users/shoyer/dev/xarray/xarray/core/dataset.py in load(self)
    325
    326         # evaluate all the dask arrays simultaneously
--> 327         evaluated_data = da.compute(*lazy_data.values())
    328
    329         for k, data in zip(lazy_data, evaluated_data):

/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/base.py in compute(*args, **kwargs)
    176     dsk = merge(var.dask for var in variables)
    177     keys = [var._keys() for var in variables]
--> 178     results = get(dsk, keys, **kwargs)
    179
    180     results_iter = iter(results)

/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, **kwargs)
     67     results = get_async(pool.apply_async, len(pool._pool), dsk, result,
     68                         cache=cache, get_id=_thread_get_id,
---> 69                         **kwargs)
     70
     71     # Cleanup pools associated to dead threads

/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, raise_on_exception, rerun_exceptions_locally, callbacks, dumps, loads, **kwargs)
    500                     _execute_task(task, data)  # Re-execute locally
    501                 else:
--> 502                     raise(remote_exception(res, tb))
    503             state['cache'][key] = res
    504             finish_task(dsk, key, state, results, keyorder.get)

RuntimeError: NetCDF: HDF error

Traceback
  File "/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py", line 268, in execute_task
    result = execute_task(task, data)
  File "/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py", line 248, in _execute_task
    args2 = [_execute_task(a, cache) for a in args]
  File "/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py", line 248, in <listcomp>
    args2 = [_execute_task(a, cache) for a in args]
  File "/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py", line 245, in _execute_task
    return [_execute_task(a, cache) for a in arg]
  File "/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py", line 245, in <listcomp>
    return [_execute_task(a, cache) for a in arg]
  File "/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/async.py", line 249, in _execute_task
    return func(*args2)
  File "/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/dask/array/core.py", line 51, in getarray
    c = np.asarray(c)
  File "/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/numpy/core/numeric.py", line 482, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/Users/shoyer/dev/xarray/xarray/core/indexing.py", line 417, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/numpy/core/numeric.py", line 482, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/Users/shoyer/dev/xarray/xarray/core/indexing.py", line 392, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/numpy/core/numeric.py", line 482, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/Users/shoyer/dev/xarray/xarray/core/indexing.py", line 392, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "/Users/shoyer/dev/xarray/xarray/backends/netCDF4.py", line 56, in __getitem__
    data = getitem(self.array, key)
  File "netCDF4/_netCDF4.pyx", line 3695, in netCDF4._netCDF4.Variable.__getitem__ (netCDF4/_netCDF4.c:37914)
  File "netCDF4/_netCDF4.pyx", line 4376, in netCDF4._netCDF4.Variable._get (netCDF4/_netCDF4.c:47134)
```

261837981 https://github.com/pydata/xarray/pull/1128#issuecomment-261837981 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2MTgzNzk4MQ== shoyer 1217238 2016-11-21T04:08:22Z 2016-11-21T04:11:30Z MEMBER

I added pickle support to DataStores. This should solve the basic serialization issue for dask.distributed (#798), but does not yet resolve the "too many open files" issue.

@mrocklin this could use your review.
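The idea can be illustrated with a toy store class (a sketch of the pattern only, not xarray's actual DataStore code): drop the open file handle from the pickled state and reopen it lazily after unpickling, so each process ends up with its own handle:

```python
import os
import pickle
import tempfile

class PicklableStore:
    """Sketch: a store whose file handle is excluded from pickling and
    reopened on demand after unpickling (hypothetical class)."""

    def __init__(self, path):
        self.path = path
        self._file = None

    @property
    def file(self):
        if self._file is None:           # lazy (re)open
            self._file = open(self.path, 'rb')
        return self._file

    def read(self):
        self.file.seek(0)
        return self.file.read()

    def __getstate__(self):
        state = self.__dict__.copy()
        state['_file'] = None            # file handles don't pickle
        return state

fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'data')

store = PicklableStore(path)
assert store.read() == b'data'
clone = pickle.loads(pickle.dumps(store))
assert clone.read() == b'data'           # clone reopened its own handle
```

The same pattern works for dask.distributed: workers unpickle the store and each opens the file locally.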

261755551 https://github.com/pydata/xarray/pull/1128#issuecomment-261755551 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2MTc1NTU1MQ== shoyer 1217238 2016-11-20T03:13:30Z 2016-11-20T03:13:30Z MEMBER

I removed the custom pickle override on Dataset/DataArray -- the issue I was working around was actually an indirect manifestation of a bug in IndexVariable.load() (introduced in this PR).

261433336 https://github.com/pydata/xarray/pull/1128#issuecomment-261433336 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2MTQzMzMzNg== shoyer 1217238 2016-11-18T02:36:21Z 2016-11-18T02:36:21Z MEMBER

> In the long run I think it would be more robust to check for attributes (duck type style) rather than types in the various places.

Indeed, in particular I'm not very happy with the isinstance check for indexing.MemoryCachedArray in Variable.copy() -- it's rather poor separation of concerns.

It exists so that variable.compute() does not cache data in-memory on variable but only on the computed variable. Otherwise, there's basically no point to the separate compute method: if you use cache=True, you are stuck with caching on the original object. Likewise, it ensures that .copy() creates an array with a new cache, which is consistent with the current behavior of .copy().

As for type checking for dask arrays in .data: yes, it would be nice to have a well defined array interface layer that other array types could plug into. That would entail a significant amount of further work, however.
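The copy() semantics described above can be illustrated with a toy wrapper (a sketch of the behavior, not xarray's actual indexing.MemoryCachedArray): the copy shares the loader but gets a fresh, empty cache, so computing the copy never back-fills the original:

```python
class MemoryCachedArray:
    """Toy sketch: wraps a lazy loader and caches loaded values;
    copy() returns a wrapper with a fresh cache (hypothetical class)."""

    def __init__(self, load):
        self._load = load
        self._cache = None

    def values(self):
        if self._cache is None:
            self._cache = self._load()
        return self._cache

    def copy(self):
        return MemoryCachedArray(self._load)  # new object, new empty cache

def load():
    return [1, 2, 3]

a = MemoryCachedArray(load)
b = a.copy()
b.values()                    # caches on b only
print(a._cache)               # None
```

This mirrors the point above: without the fresh-cache behavior, compute() on a copy would silently cache data on the original object as well.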


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
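The schema above can be exercised directly with Python's sqlite3 module; this sketch recreates the table (with the foreign-key clauses to the unshown users and issues tables omitted) and runs the query behind this page:

```python
import sqlite3

# Table and index names taken verbatim from the schema above;
# REFERENCES clauses omitted since those tables are not shown here.
schema = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY,
   [node_id] TEXT, [user] INTEGER, [created_at] TEXT, [updated_at] TEXT,
   [author_association] TEXT, [body] TEXT, [reactions] TEXT,
   [performed_via_github_app] TEXT, [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
"""

conn = sqlite3.connect(':memory:')
conn.executescript(schema)
conn.execute(
    "INSERT INTO issue_comments (id, user, issue, body) VALUES (?, ?, ?, ?)",
    (265966887, 743508, 189817033, 'example comment'),
)
# The query behind this page: comments for one issue, newest first.
rows = conn.execute(
    "SELECT id FROM issue_comments WHERE issue = 189817033 "
    "ORDER BY updated_at DESC"
).fetchall()
print(rows)  # [(265966887,)]
```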
Powered by Datasette · Queries took 15.22ms · About: xarray-datasette