issue_comments

18 rows where issue = 758606082 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1404008708 https://github.com/pydata/xarray/pull/4659#issuecomment-1404008708 https://api.github.com/repos/pydata/xarray/issues/4659 IC_kwDOAMm_X85Tr3kE dcherian 2448579 2023-01-25T17:52:58Z 2023-01-25T17:52:58Z MEMBER

indexes have come a long way since this PR was last touched.

We still don't have a lazy / out-of-core index unfortunately.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
1398534837 https://github.com/pydata/xarray/pull/4659#issuecomment-1398534837 https://api.github.com/repos/pydata/xarray/issues/4659 IC_kwDOAMm_X85TW_K1 jsignell 4806877 2023-01-20T15:11:13Z 2023-01-20T15:11:13Z CONTRIBUTOR

My understanding is that indexes have come a long way since this PR was last touched. Maybe now is the right time to rewrite this in a way that is more performant?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
1383512313 https://github.com/pydata/xarray/pull/4659#issuecomment-1383512313 https://api.github.com/repos/pydata/xarray/issues/4659 IC_kwDOAMm_X85Sdrj5 sxwebster 57381773 2023-01-16T05:26:03Z 2023-01-16T05:26:57Z NONE

I'm quite supportive of this effort, as it would make raster calculation operations a whole lot more straightforward, not to mention doing things like joins on the dataframe, which don't necessarily need to live with the xarray object if selected columns are pushed back to rioxarray as bands.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
824167488 https://github.com/pydata/xarray/pull/4659#issuecomment-824167488 https://api.github.com/repos/pydata/xarray/issues/4659 MDEyOklzc3VlQ29tbWVudDgyNDE2NzQ4OA== shoyer 1217238 2021-04-21T15:47:56Z 2021-04-21T15:47:56Z MEMBER

My main concern is really just whether anybody will find this function useful in its current state, with all of its serious performance limitations. I expect conversion from dask dataframes to xarray will be much more useful when we support out-of-core indexing, or can unstack multiple columns into multidimensional arrays.
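
For context on the unstacking point: with an in-memory pandas MultiIndex, xarray already turns index levels into dimensions; the missing piece is doing the same lazily for a dask dataframe. A minimal illustration using existing public API (not code from this PR):

```python
import numpy as np
import pandas as pd
import xarray as xr

# A tidy frame whose MultiIndex levels become dimensions when converted.
df = pd.DataFrame(
    {"temperature": np.arange(6.0)},
    index=pd.MultiIndex.from_product([[10, 20], ["a", "b", "c"]], names=["x", "y"]),
)

ds = xr.Dataset.from_dataframe(df)  # unstacks the index levels into dims
print(ds["temperature"].dims)       # ('x', 'y')
```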

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
818269258 https://github.com/pydata/xarray/pull/4659#issuecomment-818269258 https://api.github.com/repos/pydata/xarray/issues/4659 MDEyOklzc3VlQ29tbWVudDgxODI2OTI1OA== keewis 14808389 2021-04-12T21:56:59Z 2021-04-12T21:56:59Z MEMBER

this should be ready for review

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
818182598 https://github.com/pydata/xarray/pull/4659#issuecomment-818182598 https://api.github.com/repos/pydata/xarray/issues/4659 MDEyOklzc3VlQ29tbWVudDgxODE4MjU5OA== keewis 14808389 2021-04-12T20:30:23Z 2021-04-12T21:56:33Z MEMBER

@AyrtonB, I took the liberty of pushing the changes I had in mind to your branch, using an adapted version of your docstring. The only thing still missing is to figure out whether it's possible to reduce the number of computes to 2 instead of n_columns + 1.
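
A hedged sketch of the batching idea behind reducing the number of computes: instead of one graph evaluation per column plus one for the index, gather the lazy lengths and resolve them in a single dask.compute call (illustrative only; not the code on the branch):

```python
import dask
import dask.dataframe as dd
import pandas as pd

ddf = dd.from_pandas(pd.DataFrame({"a": range(10), "b": range(10)}), npartitions=3)

# n_columns + 1 graph evaluations:
#   index_len = ddf.index.size.compute()
#   col_lens = [ddf[col].size.compute() for col in ddf.columns]

# a single graph evaluation for everything:
index_len, *col_lens = dask.compute(ddf.index.size, *(ddf[col].size for col in ddf.columns))
print(index_len, col_lens)  # 10 [10, 10]
```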

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
740012453 https://github.com/pydata/xarray/pull/4659#issuecomment-740012453 https://api.github.com/repos/pydata/xarray/issues/4659 MDEyOklzc3VlQ29tbWVudDc0MDAxMjQ1Mw== pep8speaks 24736507 2020-12-07T16:05:03Z 2021-04-12T21:29:30Z NONE

Hello @AyrtonB! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2021-04-12 21:29:29 UTC
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
811388476 https://github.com/pydata/xarray/pull/4659#issuecomment-811388476 https://api.github.com/repos/pydata/xarray/issues/4659 MDEyOklzc3VlQ29tbWVudDgxMTM4ODQ3Ng== keewis 14808389 2021-03-31T19:40:51Z 2021-03-31T19:40:51Z MEMBER

@pydata/xarray, any opinion on the API design?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
798989229 https://github.com/pydata/xarray/pull/4659#issuecomment-798989229 https://api.github.com/repos/pydata/xarray/issues/4659 MDEyOklzc3VlQ29tbWVudDc5ODk4OTIyOQ== keewis 14808389 2021-03-14T22:10:00Z 2021-03-14T22:10:00Z MEMBER

I don't think there is a lot left to decide: we want to keep the conversion logic in from_dask_dataframe and maybe some helper functions, and I think we should mirror the pandas integration as closely as possible (which means we need Dataset.from_dask_dataframe and DataArray.from_dask_series class methods).

The only thing left to figure out, I think, is how best to compute the chunk sizes with as few dask computations (as counted by raise_if_dask_computes) as possible.

cc @dcherian
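
To make the proposed API concrete, it would roughly mirror the existing pandas entry points like this (a hypothetical sketch of the signatures only, not merged xarray API):

```python
import dask.dataframe as dd
import xarray as xr

class Dataset(xr.Dataset):
    __slots__ = ()

    @classmethod
    def from_dask_dataframe(cls, ddf: dd.DataFrame) -> "Dataset":
        """Mirror of Dataset.from_dataframe, keeping variables dask-backed."""
        raise NotImplementedError

class DataArray(xr.DataArray):
    __slots__ = ()

    @classmethod
    def from_dask_series(cls, series: dd.Series) -> "DataArray":
        """Mirror of DataArray.from_series for a single dask Series."""
        raise NotImplementedError
```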

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
797555413 https://github.com/pydata/xarray/pull/4659#issuecomment-797555413 https://api.github.com/repos/pydata/xarray/issues/4659 MDEyOklzc3VlQ29tbWVudDc5NzU1NTQxMw== AyrtonB 29051639 2021-03-12T15:17:16Z 2021-03-12T15:17:16Z CONTRIBUTOR

From what I can gather, there are more serious back-end considerations needed before this can progress.

Personally, I've been monkey-patching in the code below, which has solved my particular use-case; hopefully it's helpful for yours.

```python
import xarray as xr
import pandas as pd
import numpy as np

import dask.dataframe as dd
from dask.distributed import Client

import numcodecs
from types import ModuleType
from datetime import timedelta

from dask.dataframe.core import DataFrame as ddf
from numbers import Number
from typing import Any, Union, Sequence, Tuple, Mapping, Hashable, Dict, Optional, Set

from xarray.core import dtypes, groupby, rolling, resample, weighted, utils
from xarray.core.accessor_dt import CombinedDatetimelikeAccessor
from xarray.core.variable import Variable, IndexVariable
from xarray.core.merge import PANDAS_TYPES
from xarray.core.variable import NON_NUMPY_SUPPORTED_ARRAY_TYPES, IS_NEP18_ACTIVE, _maybe_wrap_data, _possibly_convert_objects
from xarray.core.dataarray import _check_data_shape, _infer_coords_and_dims, _extract_indexes_from_coords
from xarray.core.common import ImplementsDatasetReduce, DataWithCoords


def as_compatible_data(data, fastpath=False):
    """Prepare and wrap data to put in a Variable.

    - If data does not have the necessary attributes, convert it to ndarray.
    - If data has dtype=datetime64, ensure that it has ns precision. If it's a
      pandas.Timestamp, convert it to datetime64.
    - If data is already a pandas or xarray object (other than an Index), just
      use the values.

    Finally, wrap it up with an adapter if necessary.
    """
    if fastpath and getattr(data, "ndim", 0) > 0:
        # can't use fastpath (yet) for scalars
        return _maybe_wrap_data(data)

    # *** Start of monkey-patch changes ***
    # dask DataFrames are converted to dask arrays up front; lengths=True
    # computes the partition sizes so the resulting array has known chunks.
    if isinstance(data, (ddf,)):
        return data.to_dask_array(lengths=True)
    # *** End of monkey-patch changes ***

    if isinstance(data, Variable):
        return data.data

    if isinstance(data, NON_NUMPY_SUPPORTED_ARRAY_TYPES):
        return _maybe_wrap_data(data)
    if isinstance(data, tuple):
        data = utils.to_0d_object_array(data)
    if isinstance(data, pd.Timestamp):
        # TODO: convert, handle datetime objects, too
        data = np.datetime64(data.value, "ns")
    if isinstance(data, timedelta):
        data = np.timedelta64(getattr(data, "value", data), "ns")
    # we don't want nested self-described arrays
    data = getattr(data, "values", data)
    if isinstance(data, np.ma.MaskedArray):
        mask = np.ma.getmaskarray(data)
        if mask.any():
            dtype, fill_value = dtypes.maybe_promote(data.dtype)
            data = np.asarray(data, dtype=dtype)
            data[mask] = fill_value
        else:
            data = np.asarray(data)
    if not isinstance(data, np.ndarray):
        if hasattr(data, "__array_function__"):
            if IS_NEP18_ACTIVE:
                return data
            else:
                raise TypeError(
                    "Got an NumPy-like array type providing the "
                    "__array_function__ protocol but NEP18 is not enabled. "
                    "Check that numpy >= v1.16 and that the environment "
                    'variable "NUMPY_EXPERIMENTAL_ARRAY_FUNCTION" is set to '
                    '"1"'
                )
    # validate whether the data is valid data types.
    data = np.asarray(data)
    if isinstance(data, np.ndarray):
        if data.dtype.kind == "O":
            data = _possibly_convert_objects(data)
        elif data.dtype.kind == "M":
            data = _possibly_convert_objects(data)
        elif data.dtype.kind == "m":
            data = _possibly_convert_objects(data)

    return _maybe_wrap_data(data)

xr.core.variable.as_compatible_data = as_compatible_data

class DataArray(xr.core.dataarray.DataArray):

_cache: Dict[str, Any]
_coords: Dict[Any, Variable]
_indexes: Optional[Dict[Hashable, pd.Index]]
_name: Optional[Hashable]
_variable: Variable

__slots__ = (
    "_cache",
    "_coords",
    "_file_obj",
    "_indexes",
    "_name",
    "_variable"
)

_groupby_cls = groupby.DataArrayGroupBy
_rolling_cls = rolling.DataArrayRolling
_coarsen_cls = rolling.DataArrayCoarsen
_resample_cls = resample.DataArrayResample
_weighted_cls = weighted.DataArrayWeighted

dt = utils.UncachedAccessor(CombinedDatetimelikeAccessor)

def __init__(
    self,
    data: Any = dtypes.NA,
    coords: Union[Sequence[Tuple], Mapping[Hashable, Any], None] = None,
    dims: Union[Hashable, Sequence[Hashable], None] = None,
    name: Hashable = None,
    attrs: Mapping = None,
    # internal parameters
    indexes: Dict[Hashable, pd.Index] = None,
    fastpath: bool = False,
):
    if fastpath:
        variable = data
        assert dims is None
        assert attrs is None
    else:
        # try to fill in arguments from data if they weren't supplied
        if coords is None:

            if isinstance(data, DataArray):
                coords = data.coords
            elif isinstance(data, pd.Series):
                coords = [data.index]
            elif isinstance(data, pd.DataFrame):
                coords = [data.index, data.columns]
            elif isinstance(data, (pd.Index, IndexVariable)):
                coords = [data]
            elif isinstance(data, pdcompat.Panel):
                coords = [data.items, data.major_axis, data.minor_axis]

        if dims is None:
            dims = getattr(data, "dims", getattr(coords, "dims", None))
        if name is None:
            name = getattr(data, "name", None)
        if attrs is None and not isinstance(data, PANDAS_TYPES):
            attrs = getattr(data, "attrs", None)

        # *** Start of monkey-patch changes ***
        def compute_delayed_tuple_elements(tuple_):
            tuple_ = tuple(
                [
                    elem.compute() if hasattr(elem, "compute") else elem
                    for elem in tuple_
                ]
            )

            return tuple_

        shape = compute_delayed_tuple_elements(data.shape)
        coords = compute_delayed_tuple_elements(coords)

        data = _check_data_shape(data, coords, dims)
        data = as_compatible_data(data)
        coords, dims = _infer_coords_and_dims(shape, coords, dims)
        # *** End of monkey-patch changes ***

        variable = Variable(dims, data, attrs, fastpath=True)
        indexes = dict(
            _extract_indexes_from_coords(coords)
        )  # needed for to_dataset

    # These fully describe a DataArray
    self._variable = variable
    assert isinstance(coords, dict)
    self._coords = coords
    self._name = name

    # TODO(shoyer): document this argument, once it becomes part of the
    # public interface.
    self._indexes = indexes

    self._file_obj = None

@classmethod
def from_dask_dataframe(cls, ddf, index_name: str = "", columns_name: str = ""):
    """Convert a pandas.DataFrame into an xarray.DataArray
    This method will produce a DataArray from a Dask DataFrame.
    Dimensions are loaded into memory but the data itself remains
    a Dask Array. The dataframe you pass can contain only one data-type.
    Parameters
    ----------
    ddf: DataFrame
        Dask DataFrame from which to copy data and indices.
    index_name: str
        Name of the dimension that will be created from the index
    columns_name: str
        Name of the dimension that will be created from the columns
    Returns
    -------
    New DataArray.
    See also
    --------
    xarray.Dataset.from_dataframe
    xarray.DataArray.from_series
    pandas.DataFrame.to_xarray
    """
    assert len(set(ddf.dtypes)) == 1, "Each variable can include only one data-type"

    def extract_dim_name(df, dim="index"):
        # use the axis name if one is set, otherwise fall back to "index" / "columns"
        if getattr(df, dim).name is None:
            getattr(df, dim).name = dim

        dim_name = getattr(df, dim).name

        return dim_name

    if index_name == "":
        index_name = extract_dim_name(ddf, "index")
    if columns_name == "":
        columns_name = extract_dim_name(ddf, "columns")

    dims = dict.fromkeys([index_name, columns_name], ddf.shape)
    da = cls(ddf, coords=[ddf.index, ddf.columns], dims=dims)

    return da

xr.core.dataarray.DataArray = DataArray
xr.DataArray = DataArray

def _maybe_chunk(
    name,
    var,
    chunks=None,
    token=None,
    lock=None,
    name_prefix="xarray-",
    overwrite_encoded_chunks=False,
):
    from dask.base import tokenize

    if chunks is not None:
        chunks = {dim: chunks[dim] for dim in var.dims if dim in chunks}
    if var.ndim:
        # when rechunking by different amounts, make sure dask names change
        # by providing chunks as an input to tokenize.
        # subtle bugs result otherwise. see GH3350
        token2 = tokenize(name, token if token else var._data, chunks)
        name2 = f"{name_prefix}{name}-{token2}"
        var = var.chunk(chunks, name=name2, lock=lock)

        if overwrite_encoded_chunks and var.chunks is not None:
            var.encoding["chunks"] = tuple(x[0] for x in var.chunks)
        return var
    else:
        return var

class Dataset(xr.Dataset):
    """A multi-dimensional, in memory, array database.

A dataset resembles an in-memory representation of a NetCDF file,
and consists of variables, coordinates and attributes which
together form a self describing dataset.

Dataset implements the mapping interface with keys given by variable
names and values given by DataArray objects for each variable name.

One dimensional variables with name equal to their dimension are
index coordinates used for label based indexing.

To load data from a file or file-like object, use the `open_dataset`
function.

Parameters
----------
data_vars : dict-like, optional
    A mapping from variable names to :py:class:`~xarray.DataArray`
    objects, :py:class:`~xarray.Variable` objects or to tuples of
    the form ``(dims, data[, attrs])`` which can be used as
    arguments to create a new ``Variable``. Each dimension must
    have the same length in all variables in which it appears.

    The following notations are accepted:

    - mapping {var name: DataArray}
    - mapping {var name: Variable}
    - mapping {var name: (dimension name, array-like)}
    - mapping {var name: (tuple of dimension names, array-like)}
    - mapping {dimension name: array-like}
      (it will be automatically moved to coords, see below)

    Each dimension must have the same length in all variables in
    which it appears.
coords : dict-like, optional
    Another mapping in similar form as the `data_vars` argument,
    except that each item is saved on the dataset as a "coordinate".
    These variables have an associated meaning: they describe
    constant/fixed/independent quantities, unlike the
    varying/measured/dependent quantities that belong in
    `variables`. Coordinates values may be given by 1-dimensional
    arrays or scalars, in which case `dims` do not need to be
    supplied: 1D arrays will be assumed to give index values along
    the dimension with the same name.

    The following notations are accepted:

    - mapping {coord name: DataArray}
    - mapping {coord name: Variable}
    - mapping {coord name: (dimension name, array-like)}
    - mapping {coord name: (tuple of dimension names, array-like)}
    - mapping {dimension name: array-like}
      (the dimension name is implicitly set to be the same as the
      coord name)

    The last notation implies that the coord name is the same as
    the dimension name.

attrs : dict-like, optional
    Global attributes to save on this dataset.

Examples
--------
Create data:

>>> np.random.seed(0)
>>> temperature = 15 + 8 * np.random.randn(2, 2, 3)
>>> precipitation = 10 * np.random.rand(2, 2, 3)
>>> lon = [[-99.83, -99.32], [-99.79, -99.23]]
>>> lat = [[42.25, 42.21], [42.63, 42.59]]
>>> time = pd.date_range("2014-09-06", periods=3)
>>> reference_time = pd.Timestamp("2014-09-05")

Initialize a dataset with multiple dimensions:

>>> ds = xr.Dataset(
...     data_vars=dict(
...         temperature=(["x", "y", "time"], temperature),
...         precipitation=(["x", "y", "time"], precipitation),
...     ),
...     coords=dict(
...         lon=(["x", "y"], lon),
...         lat=(["x", "y"], lat),
...         time=time,
...         reference_time=reference_time,
...     ),
...     attrs=dict(description="Weather related data."),
... )
>>> ds
<xarray.Dataset>
Dimensions:         (time: 3, x: 2, y: 2)
Coordinates:
    lon             (x, y) float64 -99.83 -99.32 -99.79 -99.23
    lat             (x, y) float64 42.25 42.21 42.63 42.59
  * time            (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08
    reference_time  datetime64[ns] 2014-09-05
Dimensions without coordinates: x, y
Data variables:
    temperature     (x, y, time) float64 29.11 18.2 22.83 ... 18.28 16.15 26.63
    precipitation   (x, y, time) float64 5.68 9.256 0.7104 ... 7.992 4.615 7.805
Attributes:
    description:  Weather related data.

Find out where the coldest temperature was and what values the
other variables had:

>>> ds.isel(ds.temperature.argmin(...))
<xarray.Dataset>
Dimensions:         ()
Coordinates:
    lon             float64 -99.32
    lat             float64 42.21
    time            datetime64[ns] 2014-09-08
    reference_time  datetime64[ns] 2014-09-05
Data variables:
    temperature     float64 7.182
    precipitation   float64 8.326
Attributes:
    description:  Weather related data.
"""

__slots__ = ['foo']

def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)

def chunk(
    self,
    chunks: Union[None, Number, str, Mapping[Hashable, Union[None, Number, str, Tuple[Number, ...]]],] = None,
    name_prefix: str = "xarray-",
    token: str = None,
    lock: bool = False,
) -> "Dataset":
    """Coerce all arrays in this dataset into dask arrays with the given
    chunks.

    Non-dask arrays in this dataset will be converted to dask arrays. Dask
    arrays will be rechunked to the given chunk sizes.

    If chunks are not provided for one or more dimensions, the chunk
    sizes along those dimensions will not be updated; non-dask arrays will be
    converted into dask arrays with a single block.

    Parameters
    ----------
    chunks : int, 'auto' or mapping, optional
        Chunk sizes along each dimension, e.g., ``5`` or
        ``{"x": 5, "y": 5}``.
    name_prefix : str, optional
        Prefix for the name of any new dask arrays.
    token : str, optional
        Token uniquely identifying this dataset.
    lock : optional
        Passed on to :py:func:`dask.array.from_array`, if the array is not
        already a dask array.

    Returns
    -------
    chunked : xarray.Dataset
    """

    if isinstance(chunks, (Number, str)):
        chunks = dict.fromkeys(self.dims, chunks)

    if isinstance(chunks, (tuple, list)):
        chunks = dict(zip(self.dims, chunks))

    if chunks is not None:
        bad_dims = chunks.keys() - self.dims.keys()
        if bad_dims:
            raise ValueError("some chunks keys are not dimensions on this " "object: %s" % bad_dims)

    variables = {k: _maybe_chunk(k, v, chunks, token, lock, name_prefix) for k, v in self.variables.items()}
    return self._replace(variables)

xr.core.dataarray.Dataset = Dataset
xr.Dataset = Dataset
```
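
For anyone trying the monkey-patch, a small usage sketch under its stated assumptions (a single dtype; a named index and columns help); the expected output comments are indicative only and this snippet is not part of the PR:

```python
import dask.dataframe as dd
import numpy as np
import pandas as pd
import xarray as xr  # with the monkey-patch above already applied

pdf = pd.DataFrame(
    np.random.rand(6, 3),
    index=pd.Index(range(6), name="time"),
    columns=pd.Index(["a", "b", "c"], name="variable"),
)
ddf = dd.from_pandas(pdf, npartitions=2)

da = xr.DataArray.from_dask_dataframe(ddf)
print(da.dims)        # expected: ('time', 'variable')
print(type(da.data))  # expected: a dask array; the values stay lazy
```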

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
797547241 https://github.com/pydata/xarray/pull/4659#issuecomment-797547241 https://api.github.com/repos/pydata/xarray/issues/4659 MDEyOklzc3VlQ29tbWVudDc5NzU0NzI0MQ== martindurant 6042212 2021-03-12T15:04:34Z 2021-03-12T15:04:34Z CONTRIBUTOR

Ping, can I please ask what the current status is here?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
740041249 https://github.com/pydata/xarray/pull/4659#issuecomment-740041249 https://api.github.com/repos/pydata/xarray/issues/4659 MDEyOklzc3VlQ29tbWVudDc0MDA0MTI0OQ== keewis 14808389 2020-12-07T16:50:03Z 2020-12-07T16:51:34Z MEMBER

there are a few things to fix in pycompat for this to work: first of all, import dask.dataframe before accessing dask.dataframe.core.DataFrame. We should also move the assignment to dask_dataframe_type to its own try / except block, since it's possible to have dask.array but not dask.dataframe installed. And the reason for the ImportError you got is that we need a value for dask_dataframe_type if there was an ImportError. I'm thinking of something like this:

```python
try:
    import dask.dataframe

    dask_dataframe_type = (dask.dataframe.core.DataFrame,)
except ImportError:
    dask_dataframe_type = ()
```
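
For context, a tuple built this way drops straight into isinstance checks, just like the existing pycompat.dask_array_type; a short illustrative sketch of why the empty-tuple fallback works (not code from the branch):

```python
try:
    import dask.dataframe
    dask_dataframe_type = (dask.dataframe.core.DataFrame,)
except ImportError:
    dask_dataframe_type = ()

def is_dask_dataframe(obj):
    # with dask.dataframe installed this checks against the real class;
    # without it, dask_dataframe_type is () and isinstance() always returns
    # False, so callers never need a separate "is dask.dataframe available?" flag.
    return isinstance(obj, dask_dataframe_type)

print(is_dask_dataframe(42))  # False either way
```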

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
740032261 https://github.com/pydata/xarray/pull/4659#issuecomment-740032261 https://api.github.com/repos/pydata/xarray/issues/4659 MDEyOklzc3VlQ29tbWVudDc0MDAzMjI2MQ== AyrtonB 29051639 2020-12-07T16:36:36Z 2020-12-07T16:36:36Z CONTRIBUTOR

I've added dask_dataframe_type = (dask.dataframe.core.DataFrame,) to pycompat but now see: ImportError: cannot import name 'dask_dataframe_type' despite it being in there

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
740020080 https://github.com/pydata/xarray/pull/4659#issuecomment-740020080 https://api.github.com/repos/pydata/xarray/issues/4659 MDEyOklzc3VlQ29tbWVudDc0MDAyMDA4MA== AyrtonB 29051639 2020-12-07T16:17:25Z 2020-12-07T16:17:25Z CONTRIBUTOR

That makes sense, thanks @keewis

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
740006353 https://github.com/pydata/xarray/pull/4659#issuecomment-740006353 https://api.github.com/repos/pydata/xarray/issues/4659 MDEyOklzc3VlQ29tbWVudDc0MDAwNjM1Mw== keewis 14808389 2020-12-07T15:55:12Z 2020-12-07T15:55:12Z MEMBER

sorry, it is indeed called dask_array_compat. Looking closer, you probably won't be able to use that. Instead, I'd advise doing a local import (for an example, see Dataset.to_dask_dataframe). For the change in variable.py I would use the same pattern as for pycompat.dask_array_type, so if dask.dataframe is not available, dask_dataframe_type should be ().
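
A minimal sketch of the local-import pattern being recommended here (the class and method below are only illustrative stand-ins; Dataset.to_dask_dataframe in xarray is the real example to follow):

```python
class Dataset:
    def to_dask_dataframe(self):  # schematic stand-in for the real method
        # Importing inside the method keeps dask.dataframe an optional
        # dependency: importing xarray itself never needs it, and users only
        # need it installed when they actually call this method.
        import dask.dataframe as dd

        ...  # build and return the dask DataFrame here
```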

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
740002632 https://github.com/pydata/xarray/pull/4659#issuecomment-740002632 https://api.github.com/repos/pydata/xarray/issues/4659 MDEyOklzc3VlQ29tbWVudDc0MDAwMjYzMg== AyrtonB 29051639 2020-12-07T15:49:00Z 2020-12-07T15:49:00Z CONTRIBUTOR

Thanks, yes I need to load the library for type-hinting and type checks.

When you say dask_compat, is that the same as dask_array_compat? How would I use it instead of dask — could I use, say, from dask_compat.dataframe.core import DataFrame as ddf instead of from dask.dataframe.core import DataFrame as ddf?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
739994871 https://github.com/pydata/xarray/pull/4659#issuecomment-739994871 https://api.github.com/repos/pydata/xarray/issues/4659 MDEyOklzc3VlQ29tbWVudDczOTk5NDg3MQ== keewis 14808389 2020-12-07T15:36:57Z 2020-12-07T15:42:22Z MEMBER

you can just decorate tests that require dask with requires_dask and they will be skipped automatically if dask is not installed

Edit: actually, you seem to import dask in some modules, which is not what we want. We usually use either the dask_compat module or pycompat.dask_array_type to work around that.
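
For reference, the requires_dask pattern mentioned above looks roughly like this in a test module (the test name and body are hypothetical):

```python
import pandas as pd

from xarray.tests import requires_dask

@requires_dask
def test_from_dask_dataframe_basic():
    # skipped automatically when dask is not installed
    import dask.dataframe as dd

    ddf = dd.from_pandas(pd.DataFrame({"a": [1.0, 2.0, 3.0]}), npartitions=1)
    ...  # construct the DataArray / Dataset here and assert on dims and laziness
```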

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082
739988806 https://github.com/pydata/xarray/pull/4659#issuecomment-739988806 https://api.github.com/repos/pydata/xarray/issues/4659 MDEyOklzc3VlQ29tbWVudDczOTk4ODgwNg== AyrtonB 29051639 2020-12-07T15:27:10Z 2020-12-07T15:27:10Z CONTRIBUTOR

During testing I'm currently encountering the issue: ModuleNotFoundError: No module named 'dask'

How should testing of dask DataArrays be approached?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.from_dask_dataframe feature 758606082


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);