issue_comments


6 rows where issue = 684930038 and user = 5821660 sorted by updated_at descending


Issue: Set `allow_rechunk=True` in `apply_ufunc` (684930038) · User: kmuehlbauer (5821660) · author_association: MEMBER
Comment 683621673 · kmuehlbauer (MEMBER) · 2020-08-31T07:43:34Z · https://github.com/pydata/xarray/issues/4372#issuecomment-683621673

@dcherian @shoyer

In #4392 I've tried to get around this bug. I found it easier to just catch the dask ValueErrors rather than add more code checks. I'll add more information in that PR.

Comment 682337819 · kmuehlbauer (MEMBER) · 2020-08-28T05:45:25Z · https://github.com/pydata/xarray/issues/4372#issuecomment-682337819

Another question is: why does this kwarg exist in dask, and why doesn't dask rechunk by default?

Trying to answer this by looking at the dask code:

  • allow_rechunk=False: chunking problems in core and non-core dimensions are caught and raise an error. This helps prevent users from unknowingly loading huge dask arrays into memory.
  • allow_rechunk=True: blockwise is called with align_arrays=True by default, which means automatic rechunking for all arrays (core and non-core dimensions). Users can use this if they are sure the system can handle the possibly large amounts of data.
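The core-dimension part of the allow_rechunk=False behaviour can be sketched in plain Python (a simplified illustration, not dask's actual internals; the function name `validate_chunks` and its arguments are hypothetical):

```python
def validate_chunks(chunks, core_dims, dim_names, allow_rechunk=False):
    """Simplified sketch of dask's allow_rechunk=False validation.

    chunks: dict mapping dimension name -> tuple of chunk sizes
    core_dims: set of core dimension names
    """
    if allow_rechunk:
        # dask rechunks automatically via blockwise(align_arrays=True)
        return
    for dim in dim_names:
        sizes = chunks[dim]
        # Core dimensions must consist of exactly one chunk
        if dim in core_dims and len(sizes) > 1:
            raise ValueError(
                f"Core dimension '{dim}' consists of multiple chunks; "
                f"rechunk into a single chunk or set allow_rechunk=True."
            )

# A core dimension split into two chunks triggers the error:
try:
    validate_chunks({"x": (5, 5)}, core_dims={"x"}, dim_names=["x"])
except ValueError as exc:
    print("raised:", exc)

# A single-chunk core dimension passes:
validate_chunks({"x": (10,)}, core_dims={"x"}, dim_names=["x"])
```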
Comment 682335024 · kmuehlbauer (MEMBER) · 2020-08-28T05:35:28Z · https://github.com/pydata/xarray/issues/4372#issuecomment-682335024

From the dask apply_gufunc docstring:

```python
"""
allow_rechunk: Optional, bool, keyword only

    Allows rechunking, otherwise chunk sizes need to match and core
    dimensions are to consist only of one chunk.
    Warning: enabling this can increase memory usage significantly.
    Defaults to ``False``.
"""
```

Current code handling in dask:

https://github.com/dask/dask/blob/42873f27ce11ce35652dda344dae5c47b742bef2/dask/array/gufunc.py#L398-L417

```python
if not allow_rechunk:
    chunksizes = chunksizess[dim]
    #### Check if core dimensions consist of only one chunk
    if (dim in core_shapes) and (chunksizes[0][0] < core_shapes[dim]):
        raise ValueError(
            "Core dimension `'{}'` consists of multiple chunks. To fix, rechunk into a single \
chunk along this dimension or set `allow_rechunk=True`, but beware that this may increase memory usage \
significantly.".format(
                dim
            )
        )
    #### Check if loop dimensions consist of same chunksizes, when they have sizes > 1
    relevant_chunksizes = list(
        unique(c for s, c in zip(sizes, chunksizes) if s > 1)
    )
    if len(relevant_chunksizes) > 1:
        raise ValueError(
            "Dimension `'{}'` with different chunksize present".format(dim)
        )
```
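The second branch of the dask check — the loop-dimension chunksize comparison — can be exercised in isolation (a simplified stand-in; the helper name `check_loop_dim` is hypothetical, and a plain set replaces dask's `unique`):

```python
def check_loop_dim(sizes, chunksizes, dim):
    """Sketch of dask's loop-dimension check: all arrays sharing a loop
    dimension of size > 1 must agree on its chunk sizes.

    sizes: the dimension's total size in each array
    chunksizes: each array's chunk-size tuple for that dimension
    """
    # Only dimensions of size > 1 are relevant; size-1 dims broadcast freely.
    relevant = {c for s, c in zip(sizes, chunksizes) if s > 1}
    if len(relevant) > 1:
        raise ValueError(
            "Dimension `'{}'` with different chunksize present".format(dim)
        )

# Two arrays chunk a size-10 dimension differently -> error:
try:
    check_loop_dim([10, 10], [(5, 5), (10,)], "y")
except ValueError as exc:
    print(exc)

# Matching chunking passes:
check_loop_dim([10, 10], [(5, 5), (5, 5)], "y")
```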

IIUTC, this not only rechunks non-core dimensions but also fixes core dimensions with more than one chunk. Would this be intended from the xarray side? Before #4060, core dimensions with more than one chunk were caught and raised an error:

```python
# core dimensions cannot span multiple chunks
for axis, dim in enumerate(core_dims, start=-len(core_dims)):
    if len(data.chunks[axis]) != 1:
        raise ValueError(
            "dimension {!r} on {}th function argument to "
            "apply_ufunc with dask='parallelized' consists of "
            "multiple chunks, but is also a core dimension. To "
            "fix, rechunk into a single dask array chunk along "
            "this dimension, i.e., ``.chunk({})``, but beware "
            "that this may significantly increase memory usage.".format(
                dim, n, {dim: -1}
            )
        )
```

An explicit rechunk was recommended to the user, though.

That means setting allow_rechunk=True by default alone will not give us the same behaviour as before #4060. I'm unsure how to proceed.

Comment 682327998 · kmuehlbauer (MEMBER) · 2020-08-28T05:09:19Z · https://github.com/pydata/xarray/issues/4372#issuecomment-682327998

@shoyer In this case: should we warn the user that data might be loaded into memory?

Another question is: why does this kwarg exist in dask, and why doesn't dask rechunk by default?

Comment 681605223 · kmuehlbauer (MEMBER) · 2020-08-27T06:13:03Z · https://github.com/pydata/xarray/issues/4372#issuecomment-681605223

One solution would be to catch this ValueError, issue a FutureWarning, and add allow_rechunk=True to dask_gufunc_kwargs here:

https://github.com/pydata/xarray/blob/9c85dd5f792805bea319f01f08ee51b83bde0f3b/xarray/core/computation.py#L646-L657

```python
def func(*arrays):
    import dask.array as da

    gufunc = functools.partial(
        da.apply_gufunc,
        numpy_func,
        signature.to_gufunc_string(exclude_dims),
        *arrays,
        vectorize=vectorize,
        output_dtypes=output_dtypes,
    )

    try:
        res = gufunc(**dask_gufunc_kwargs)
    except ValueError as exc:
        if "with different chunksize present" in str(exc):
            warnings.warn(
                f"``allow_rechunk=True`` needs to be explicitly set in the "
                f"``dask_gufunc_kwargs`` parameter. Not setting it will raise a "
                f"dask ValueError ``{str(exc)}`` in a future version.",
                FutureWarning,
                stacklevel=2,
            )
            dask_gufunc_kwargs["allow_rechunk"] = True
            res = gufunc(**dask_gufunc_kwargs)
        else:
            raise
```

I could make a PR out of this. The message wording can surely be improved. WDYT @dcherian and @shoyer?
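The catch-and-retry pattern proposed here can be tried in isolation (a generic sketch, not the actual xarray code; `call_with_rechunk_fallback` and `fake_gufunc` are hypothetical stand-ins):

```python
import warnings

def call_with_rechunk_fallback(gufunc, dask_gufunc_kwargs):
    """Sketch of the proposed fallback: on the specific dask ValueError,
    warn and retry with allow_rechunk=True; re-raise anything else."""
    try:
        return gufunc(**dask_gufunc_kwargs)
    except ValueError as exc:
        if "with different chunksize present" in str(exc):
            warnings.warn(
                "``allow_rechunk=True`` needs to be explicitly set in the "
                "``dask_gufunc_kwargs`` parameter. Not setting it will raise "
                f"a dask ValueError ``{exc}`` in a future version.",
                FutureWarning,
                stacklevel=2,
            )
            dask_gufunc_kwargs["allow_rechunk"] = True
            return gufunc(**dask_gufunc_kwargs)
        raise

# Stand-in for da.apply_gufunc: fails unless allow_rechunk is set.
def fake_gufunc(allow_rechunk=False):
    if not allow_rechunk:
        raise ValueError("Dimension `'x'` with different chunksize present")
    return "ok"

with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    print(call_with_rechunk_fallback(fake_gufunc, {}))
```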

Comment 679691483 · kmuehlbauer (MEMBER) · 2020-08-25T05:20:32Z · https://github.com/pydata/xarray/issues/4372#issuecomment-679691483

The behaviour changed in #4060 (commit https://github.com/pydata/xarray/commit/a7fb5a9fa1a2b829181ea9e4986b959f315350dd). Please see the discussion regarding allow_rechunk over there. The reason for not handling/setting allow_rechunk=True was @shoyer's comment.



CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
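The schema above can be exercised with Python's built-in sqlite3 module (a minimal sketch; the REFERENCES clauses are dropped since the users and issues tables aren't created here, and the inserted values mirror the first row of this page):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
""")

conn.execute(
    "INSERT INTO issue_comments (id, [user], issue, updated_at, body) "
    "VALUES (?, ?, ?, ?, ?)",
    (683621673, 5821660, 684930038, "2020-08-31T07:43:34Z", "example"),
)

# The query behind this page: comments on one issue by one user,
# newest update first.
rows = conn.execute(
    "SELECT id FROM issue_comments "
    "WHERE issue = 684930038 AND [user] = 5821660 "
    "ORDER BY updated_at DESC"
).fetchall()
print(rows)
```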
Powered by Datasette · Queries took 32.651ms · About: xarray-datasette