

issue_comments


5 rows where issue = 550355524 (dask.optimize on xarray objects), sorted by updated_at descending

TomAugspurger (MEMBER) · commented 2020-09-10T15:42:54Z
https://github.com/pydata/xarray/issues/3698#issuecomment-690378323

Thanks for confirming. I'll take another look at this today then.

On Thu, Sep 10, 2020 at 10:30 AM Deepak Cherian notifications@github.com wrote:

> Reopened #3698 https://github.com/pydata/xarray/issues/3698.


dcherian (MEMBER) · commented 2020-09-10T15:30:01Z
https://github.com/pydata/xarray/issues/3698#issuecomment-690367604

The numpy example is fixed but the dask rechunked example is still broken.

```python
a = dask.array.ones((10, 5), chunks=(1, 3))
dask.optimize(xr.DataArray(a))[0].compute()           # works
dask.optimize(xr.DataArray(a).chunk(5))[0].compute()  # error
```

```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-8-5663bc8bc82a> in <module>
----> 1 dask.optimize(xr.DataArray(a).chunk(5))[0].compute()

~/miniconda3/envs/dcpy/lib/python3.8/site-packages/xarray/core/dataarray.py in compute(self, **kwargs)
--> 840         return new.load(**kwargs)

~/miniconda3/envs/dcpy/lib/python3.8/site-packages/xarray/core/dataarray.py in load(self, **kwargs)
--> 814         ds = self._to_temp_dataset().load(**kwargs)

~/miniconda3/envs/dcpy/lib/python3.8/site-packages/xarray/core/dataset.py in load(self, **kwargs)
--> 658         evaluated_data = da.compute(*lazy_data.values(), **kwargs)

~/miniconda3/envs/dcpy/lib/python3.8/site-packages/dask/base.py in compute(*args, **kwargs)
--> 447     results = schedule(dsk, keys, **kwargs)

~/miniconda3/envs/dcpy/lib/python3.8/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
---> 76     results = get_async(
     77         pool.apply_async,
     78         len(pool._pool),

~/miniconda3/envs/dcpy/lib/python3.8/site-packages/dask/local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs)
--> 486                         raise_exception(exc, tb)

~/miniconda3/envs/dcpy/lib/python3.8/site-packages/dask/local.py in reraise(exc, tb)
--> 316     raise exc

~/miniconda3/envs/dcpy/lib/python3.8/site-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
--> 222         result = _execute_task(task, data)

~/miniconda3/envs/dcpy/lib/python3.8/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
--> 121         return func(*(_execute_task(a, cache) for a in args))

~/miniconda3/envs/dcpy/lib/python3.8/site-packages/dask/array/core.py in concatenate3(arrays)
-> 4409     chunks = chunks_from_arrays(arrays)

~/miniconda3/envs/dcpy/lib/python3.8/site-packages/dask/array/core.py in chunks_from_arrays(arrays)
-> 4180         result.append(tuple([shape(deepfirst(a))[dim] for a in arrays]))

~/miniconda3/envs/dcpy/lib/python3.8/site-packages/dask/array/core.py in <listcomp>(.0)
-> 4180         result.append(tuple([shape(deepfirst(a))[dim] for a in arrays]))

IndexError: tuple index out of range
```
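For context on what `dask.optimize` is expected to do here: it returns a tuple of rewritten collections that must compute to the same values as the inputs. A quick sanity check on a bare dask array (a sketch, assuming a current dask/numpy install rather than the versions in this thread):

```python
import dask
import dask.array as da

a = da.ones((10, 5), chunks=(1, 3))
(opt,) = dask.optimize(a)  # note: optimize always returns a tuple

# Optimization may fuse or cull tasks, but must not change results:
assert (opt.compute() == a.compute()).all()
assert float(opt.sum().compute()) == 50.0
```

The bug above is exactly a violation of that invariant once the array is wrapped in an xarray object and rechunked.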

dcherian (MEMBER) · commented 2020-09-09T21:14:16Z
https://github.com/pydata/xarray/issues/3698#issuecomment-689825648

I guess I can see that. Thanks Tom.

> it even when the chunk size exceeds dask.config['array']['chunk-size']

FYI the slicing behaviour is independent of chunk-size (Matt's recommendation).

TomAugspurger (MEMBER) · commented 2020-09-09T20:38:39Z
https://github.com/pydata/xarray/issues/3698#issuecomment-689808725

FYI, @dcherian your recent PR to dask fixed this example. Playing around with chunk sizes, it seems to have fixed it even when the chunk size exceeds dask.config['array']['chunk-size'].

TomAugspurger (MEMBER) · commented 2020-02-27T18:13:28Z
https://github.com/pydata/xarray/issues/3698#issuecomment-592101136

It looks like xarray is getting a bad task graph after the optimize.

```python
In [1]: import xarray as xr

In [2]: import dask

In [3]: a = dask.array.ones((10, 5), chunks=(1, 3))
   ...: a = dask.optimize(a)[0]

In [4]: da = xr.DataArray(a.compute()).chunk({"dim_0": 5})
   ...: da = dask.optimize(da)[0]

In [5]: dict(da.__dask_graph__())
Out[5]:
{('xarray-<this-array>-e2865aa10d476e027154771611541f99', 1, 0):
     (<function _operator.getitem(a, b, /)>,
      'xarray-<this-array>-e2865aa10d476e027154771611541f99',
      (slice(5, 10, None), slice(0, 5, None))),
 ('xarray-<this-array>-e2865aa10d476e027154771611541f99', 0, 0):
     (<function _operator.getitem(a, b, /)>,
      'xarray-<this-array>-e2865aa10d476e027154771611541f99',
      (slice(0, 5, None), slice(0, 5, None)))}
```

Notice that there are references to `xarray-<this-array>-e2865aa10d476e027154771611541f99` (just the string, not a tuple representing a chunk), but that key isn't in the graph.

If we manually insert that key, things work:

```python
In [9]: dsk['xarray-<this-array>-e2865aa10d476e027154771611541f99'] = da._to_temp_dataset()[xr.core.dataarray._THIS_ARRAY]

In [11]: dask.get(dsk, keys=[('xarray-<this-array>-e2865aa10d476e027154771611541f99', 1, 0)])
Out[11]:
(<xarray.DataArray <this-array> (dim_0: 5, dim_1: 5)>
 dask.array<getitem, shape=(5, 5), dtype=float64, chunksize=(5, 5), chunktype=numpy.ndarray>
 Dimensions without coordinates: dim_0, dim_1,)
```
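The dangling-reference failure mode can be sketched with a toy task graph and a minimal recursive executor (hypothetical names; unlike real dask, this sketch treats every str/tuple argument as a key that must exist in the graph):

```python
import operator

# A dask-style graph: keys map to data or to tasks, where a task is a
# tuple of (callable, *args) and args may themselves be graph keys.
good = {
    "x-base": list(range(10)),
    ("x", 0): (operator.getitem, "x-base", slice(0, 5)),
    ("x", 1): (operator.getitem, "x-base", slice(5, 10)),
}

def execute(dsk, key):
    task = dsk[key]  # KeyError here is the dangling-reference failure
    if isinstance(task, tuple) and callable(task[0]):
        func, *args = task
        return func(*(execute(dsk, a) if isinstance(a, (str, tuple)) else a
                      for a in args))
    return task

print(execute(good, ("x", 1)))  # [5, 6, 7, 8, 9]

# Dropping the base key -- as the bad optimize pass effectively did --
# leaves tasks referencing a key the graph no longer contains:
bad = {k: v for k, v in good.items() if k != "x-base"}
try:
    execute(bad, ("x", 1))
except KeyError as e:
    print("missing dependency:", e)
```

Manually re-inserting `'x-base'` into `bad` makes it executable again, which is exactly the workaround shown above.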


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 12.806ms · About: xarray-datasette