html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1068#issuecomment-966198058,https://api.github.com/repos/pydata/xarray/issues/1068,966198058,IC_kwDOAMm_X845lwMq,29051639,2021-11-11T10:46:16Z,2021-11-11T10:46:16Z,CONTRIBUTOR,Unfortunately not @zjans ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,186169975
https://github.com/pydata/xarray/issues/1068#issuecomment-864477138,https://api.github.com/repos/pydata/xarray/issues/1068,864477138,MDEyOklzc3VlQ29tbWVudDg2NDQ3NzEzOA==,29051639,2021-06-19T23:51:09Z,2021-06-19T23:51:09Z,CONTRIBUTOR,"I'm also getting the same error when running `xr.open_dataset(store)` even though I have accepted the EULA. Has anyone had success solving this?

I'm using pydap==3.2.2 and xarray==0.18.0; any help would be much appreciated!

```python
import xarray as xr
from pydap.client import open_url
from pydap.cas.urs import setup_session

username = ""my_username""
password = ""my_password""

url = 'https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2/M2T1NXSLV.5.12.4/2016/06/MERRA2_400.tavg1_2d_slv_Nx.20160601.nc4'

session = setup_session(username, password, check_url=url)
pydap_ds = open_url(url, session=session)

store = xr.backends.PydapDataStore(pydap_ds)
ds = xr.open_dataset(store)
```

```html
HTTPError: 302 Found
<!DOCTYPE HTML PUBLIC ""-//IETF//DTD HTML 2.0//EN"">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href=""https://urs.earthdata.nasa.gov/oauth/authorize/?scope=uid&amp;app_type=401&amp;client_id=e2WVk8Pw6weeLUKZYOxvTQ&amp;response_type=code&amp;redirect_uri=http%3A%2F%2Fgoldsmr4.gesdisc.eosdis.nasa.gov%2Fdata-redirect&amp;state=aHR0cHM6Ly9nb2xkc21yNC5nZXNkaXNjLmVvc2Rpcy5uYXNhLmdvdi9vcGVuZGFwL01FUlJBMi9NMlQxTlhTTFYuNS4xMi40LzIwMTYvMDYvTUVSUkEyXzQwMC50YXZnMV8yZF9zbHZfTnguMjAxNjA2MDEubmM0LmRvZHM%2FdGltZSU1QjA6MTowJTVE"">here</a>.</p>
</body></html>
```
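
In case anyone else lands here: a variant I haven't verified against this exact file is letting xarray construct the pydap dataset itself via `PydapDataStore.open`, which the xarray docs show with an authenticated URS session:

```python
import xarray as xr
from pydap.cas.urs import setup_session

url = 'https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2/M2T1NXSLV.5.12.4/2016/06/MERRA2_400.tavg1_2d_slv_Nx.20160601.nc4'

session = setup_session(""my_username"", ""my_password"", check_url=url)

# let xarray open the remote dataset through pydap, reusing the session
store = xr.backends.PydapDataStore.open(url, session=session)
ds = xr.open_dataset(store)
```
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,186169975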
https://github.com/pydata/xarray/pull/4659#issuecomment-797555413,https://api.github.com/repos/pydata/xarray/issues/4659,797555413,MDEyOklzc3VlQ29tbWVudDc5NzU1NTQxMw==,29051639,2021-03-12T15:17:16Z,2021-03-12T15:17:16Z,CONTRIBUTOR,"From what I can gather there are more serious back-end considerations needed before this can be progressed.

Personally, I've been monkey-patching in the code below, which has solved my particular use case; hopefully it's helpful for yours.

```python
import xarray as xr
import pandas as pd
import numpy as np

import dask.dataframe as dd
from dask.distributed import Client

import numcodecs
from types import ModuleType
from datetime import timedelta

from dask.dataframe.core import DataFrame as ddf
from numbers import Number
from typing import Any, Union, Sequence, Tuple, Mapping, Hashable, Dict, Optional, Set

# pdcompat is needed for the Panel check in DataArray.__init__ below
from xarray.core import dtypes, groupby, pdcompat, rolling, resample, weighted, utils
from xarray.core.accessor_dt import CombinedDatetimelikeAccessor
from xarray.core.variable import Variable, IndexVariable
from xarray.core.merge import PANDAS_TYPES
from xarray.core.variable import NON_NUMPY_SUPPORTED_ARRAY_TYPES, IS_NEP18_ACTIVE, _maybe_wrap_data, _possibly_convert_objects
from xarray.core.dataarray import _check_data_shape, _infer_coords_and_dims, _extract_indexes_from_coords
from xarray.core.common import ImplementsDatasetReduce, DataWithCoords

def as_compatible_data(data, fastpath=False):
    """"""Prepare and wrap data to put in a Variable.
    - If data does not have the necessary attributes, convert it to ndarray.
    - If data has dtype=datetime64, ensure that it has ns precision. If it's a
      pandas.Timestamp, convert it to datetime64.
    - If data is already a pandas or xarray object (other than an Index), just
      use the values.
    Finally, wrap it up with an adapter if necessary.
    """"""
    if fastpath and getattr(data, ""ndim"", 0) > 0:
        # can't use fastpath (yet) for scalars
        return _maybe_wrap_data(data)

    # *** Start of monkey-patch changes ***
    if isinstance(data, (ddf,)):
        return data.to_dask_array(lengths=True)
    # *** End of monkey-patch changes ***

    if isinstance(data, Variable):
        return data.data

    if isinstance(data, NON_NUMPY_SUPPORTED_ARRAY_TYPES):
        return _maybe_wrap_data(data)
    if isinstance(data, tuple):
        data = utils.to_0d_object_array(data)
    if isinstance(data, pd.Timestamp):
        # TODO: convert, handle datetime objects, too
        data = np.datetime64(data.value, ""ns"")
    if isinstance(data, timedelta):
        data = np.timedelta64(getattr(data, ""value"", data), ""ns"")
    # we don't want nested self-described arrays
    data = getattr(data, ""values"", data)
    if isinstance(data, np.ma.MaskedArray):
        mask = np.ma.getmaskarray(data)
        if mask.any():
            dtype, fill_value = dtypes.maybe_promote(data.dtype)
            data = np.asarray(data, dtype=dtype)
            data[mask] = fill_value
        else:
            data = np.asarray(data)
    if not isinstance(data, np.ndarray):
        if hasattr(data, ""__array_function__""):
            if IS_NEP18_ACTIVE:
                return data
            else:
                raise TypeError(
                    ""Got an NumPy-like array type providing the ""
                    ""__array_function__ protocol but NEP18 is not enabled. ""
                    ""Check that numpy >= v1.16 and that the environment ""
                    'variable ""NUMPY_EXPERIMENTAL_ARRAY_FUNCTION"" is set to '
                    '""1""'
                )
    # validate that the data has a supported dtype
    data = np.asarray(data)
    # object (O), datetime64 (M) and timedelta64 (m) arrays all go through
    # the same conversion
    if data.dtype.kind in ""OMm"":
        data = _possibly_convert_objects(data)

    return _maybe_wrap_data(data)

xr.core.variable.as_compatible_data = as_compatible_data

class DataArray(xr.core.dataarray.DataArray):

    _cache: Dict[str, Any]
    _coords: Dict[Any, Variable]
    _indexes: Optional[Dict[Hashable, pd.Index]]
    _name: Optional[Hashable]
    _variable: Variable

    __slots__ = (
        ""_cache"",
        ""_coords"",
        ""_file_obj"",
        ""_indexes"",
        ""_name"",
        ""_variable""
    )

    _groupby_cls = groupby.DataArrayGroupBy
    _rolling_cls = rolling.DataArrayRolling
    _coarsen_cls = rolling.DataArrayCoarsen
    _resample_cls = resample.DataArrayResample
    _weighted_cls = weighted.DataArrayWeighted

    dt = utils.UncachedAccessor(CombinedDatetimelikeAccessor)

    def __init__(
        self,
        data: Any = dtypes.NA,
        coords: Union[Sequence[Tuple], Mapping[Hashable, Any], None] = None,
        dims: Union[Hashable, Sequence[Hashable], None] = None,
        name: Hashable = None,
        attrs: Mapping = None,
        # internal parameters
        indexes: Dict[Hashable, pd.Index] = None,
        fastpath: bool = False,
    ):
        if fastpath:
            variable = data
            assert dims is None
            assert attrs is None
        else:
            # try to fill in arguments from data if they weren't supplied
            if coords is None:

                if isinstance(data, DataArray):
                    coords = data.coords
                elif isinstance(data, pd.Series):
                    coords = [data.index]
                elif isinstance(data, pd.DataFrame):
                    coords = [data.index, data.columns]
                elif isinstance(data, (pd.Index, IndexVariable)):
                    coords = [data]
                elif isinstance(data, pdcompat.Panel):
                    coords = [data.items, data.major_axis, data.minor_axis]

            if dims is None:
                dims = getattr(data, ""dims"", getattr(coords, ""dims"", None))
            if name is None:
                name = getattr(data, ""name"", None)
            if attrs is None and not isinstance(data, PANDAS_TYPES):
                attrs = getattr(data, ""attrs"", None)

            # *** Start of monkey-patch changes ***
            def compute_delayed_tuple_elements(tuple_):
                tuple_ = tuple(
                    [
                        elem.compute() if hasattr(elem, ""compute"") else elem
                        for elem in tuple_
                    ]
                )

                return tuple_

            shape = compute_delayed_tuple_elements(data.shape)
            coords = compute_delayed_tuple_elements(coords)

            data = _check_data_shape(data, coords, dims)
            data = as_compatible_data(data)
            coords, dims = _infer_coords_and_dims(shape, coords, dims)
            # *** End of monkey-patch changes ***

            variable = Variable(dims, data, attrs, fastpath=True)
            indexes = dict(
                _extract_indexes_from_coords(coords)
            )  # needed for to_dataset

        # These fully describe a DataArray
        self._variable = variable
        assert isinstance(coords, dict)
        self._coords = coords
        self._name = name

        # TODO(shoyer): document this argument, once it becomes part of the
        # public interface.
        self._indexes = indexes

        self._file_obj = None

    @classmethod
    def from_dask_dataframe(cls, ddf, index_name: str = """", columns_name: str = """"):
        """"""Convert a dask.dataframe.DataFrame into an xarray.DataArray.

        This method will produce a DataArray from a Dask DataFrame.
        Dimensions are loaded into memory but the data itself remains
        a Dask Array. The DataFrame you pass may contain only a single
        data-type.

        Parameters
        ----------
        ddf : DataFrame
            Dask DataFrame from which to copy data and indices.
        index_name : str, optional
            Name of the dimension that will be created from the index.
        columns_name : str, optional
            Name of the dimension that will be created from the columns.

        Returns
        -------
        New DataArray.

        See also
        --------
        xarray.Dataset.from_dataframe
        xarray.DataArray.from_series
        pandas.DataFrame.to_xarray
        """"""
        assert len(set(ddf.dtypes)) == 1, ""Each variable can include only one data-type""

        def extract_dim_name(df, dim=""index""):
            # fall back to the axis name (""index""/""columns"") when none is set
            if getattr(df, dim).name is None:
                getattr(df, dim).name = dim

            return getattr(df, dim).name

        if index_name == """":
            index_name = extract_dim_name(ddf, ""index"")
        if columns_name == """":
            columns_name = extract_dim_name(ddf, ""columns"")

        dims = dict.fromkeys([index_name, columns_name], ddf.shape)
        da = cls(ddf, coords=[ddf.index, ddf.columns], dims=dims)

        return da

xr.core.dataarray.DataArray = DataArray
xr.DataArray = DataArray

def _maybe_chunk(
    name, var, chunks=None, token=None, lock=None, name_prefix=""xarray-"", overwrite_encoded_chunks=False,
):
    from dask.base import tokenize

    if chunks is not None:
        chunks = {dim: chunks[dim] for dim in var.dims if dim in chunks}
    if var.ndim:
        # when rechunking by different amounts, make sure dask names change
        # by providing chunks as an input to tokenize.
        # subtle bugs result otherwise. see GH3350
        token2 = tokenize(name, token if token else var._data, chunks)
        name2 = f""{name_prefix}{name}-{token2}""
        var = var.chunk(chunks, name=name2, lock=lock)

        if overwrite_encoded_chunks and var.chunks is not None:
            var.encoding[""chunks""] = tuple(x[0] for x in var.chunks)
        return var
    else:
        return var

class Dataset(xr.Dataset):
    """"""A multi-dimensional, in memory, array database.

    A dataset resembles an in-memory representation of a NetCDF file,
    and consists of variables, coordinates and attributes which
    together form a self describing dataset.

    Dataset implements the mapping interface with keys given by variable
    names and values given by DataArray objects for each variable name.

    One dimensional variables with name equal to their dimension are
    index coordinates used for label based indexing.

    To load data from a file or file-like object, use the `open_dataset`
    function.

    Parameters
    ----------
    data_vars : dict-like, optional
        A mapping from variable names to :py:class:`~xarray.DataArray`
        objects, :py:class:`~xarray.Variable` objects or to tuples of
        the form ``(dims, data[, attrs])`` which can be used as
        arguments to create a new ``Variable``. Each dimension must
        have the same length in all variables in which it appears.

        The following notations are accepted:

        - mapping {var name: DataArray}
        - mapping {var name: Variable}
        - mapping {var name: (dimension name, array-like)}
        - mapping {var name: (tuple of dimension names, array-like)}
        - mapping {dimension name: array-like}
          (it will be automatically moved to coords, see below)

        Each dimension must have the same length in all variables in
        which it appears.
    coords : dict-like, optional
        Another mapping in similar form as the `data_vars` argument,
        except that each item is saved on the dataset as a ""coordinate"".
        These variables have an associated meaning: they describe
        constant/fixed/independent quantities, unlike the
        varying/measured/dependent quantities that belong in
        `variables`. Coordinates values may be given by 1-dimensional
        arrays or scalars, in which case `dims` do not need to be
        supplied: 1D arrays will be assumed to give index values along
        the dimension with the same name.

        The following notations are accepted:

        - mapping {coord name: DataArray}
        - mapping {coord name: Variable}
        - mapping {coord name: (dimension name, array-like)}
        - mapping {coord name: (tuple of dimension names, array-like)}
        - mapping {dimension name: array-like}
          (the dimension name is implicitly set to be the same as the
          coord name)

        The last notation implies that the coord name is the same as
        the dimension name.

    attrs : dict-like, optional
        Global attributes to save on this dataset.

    Examples
    --------
    Create data:

    >>> np.random.seed(0)
    >>> temperature = 15 + 8 * np.random.randn(2, 2, 3)
    >>> precipitation = 10 * np.random.rand(2, 2, 3)
    >>> lon = [[-99.83, -99.32], [-99.79, -99.23]]
    >>> lat = [[42.25, 42.21], [42.63, 42.59]]
    >>> time = pd.date_range(""2014-09-06"", periods=3)
    >>> reference_time = pd.Timestamp(""2014-09-05"")

    Initialize a dataset with multiple dimensions:

    >>> ds = xr.Dataset(
    ...     data_vars=dict(
    ...         temperature=([""x"", ""y"", ""time""], temperature),
    ...         precipitation=([""x"", ""y"", ""time""], precipitation),
    ...     ),
    ...     coords=dict(
    ...         lon=([""x"", ""y""], lon),
    ...         lat=([""x"", ""y""], lat),
    ...         time=time,
    ...         reference_time=reference_time,
    ...     ),
    ...     attrs=dict(description=""Weather related data.""),
    ... )
    >>> ds
    <xarray.Dataset>
    Dimensions:         (time: 3, x: 2, y: 2)
    Coordinates:
        lon             (x, y) float64 -99.83 -99.32 -99.79 -99.23
        lat             (x, y) float64 42.25 42.21 42.63 42.59
      * time            (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08
        reference_time  datetime64[ns] 2014-09-05
    Dimensions without coordinates: x, y
    Data variables:
        temperature     (x, y, time) float64 29.11 18.2 22.83 ... 18.28 16.15 26.63
        precipitation   (x, y, time) float64 5.68 9.256 0.7104 ... 7.992 4.615 7.805
    Attributes:
        description:  Weather related data.

    Find out where the coldest temperature was and what values the
    other variables had:

    >>> ds.isel(ds.temperature.argmin(...))
    <xarray.Dataset>
    Dimensions:         ()
    Coordinates:
        lon             float64 -99.32
        lat             float64 42.21
        time            datetime64[ns] 2014-09-08
        reference_time  datetime64[ns] 2014-09-05
    Data variables:
        temperature     float64 7.182
        precipitation   float64 8.326
    Attributes:
        description:  Weather related data.
    """"""

    __slots__ = ['foo']  # placeholder entry: xarray warns if a subclass defines no __slots__

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def chunk(
        self,
        chunks: Union[None, Number, str, Mapping[Hashable, Union[None, Number, str, Tuple[Number, ...]]],] = None,
        name_prefix: str = ""xarray-"",
        token: str = None,
        lock: bool = False,
    ) -> ""Dataset"":
        """"""Coerce all arrays in this dataset into dask arrays with the given
        chunks.

        Non-dask arrays in this dataset will be converted to dask arrays. Dask
        arrays will be rechunked to the given chunk sizes.

        If chunks are not provided for one or more dimensions, chunk
        sizes along those dimensions will not be updated; non-dask arrays will be
        converted into dask arrays with a single block.

        Parameters
        ----------
        chunks : int, 'auto' or mapping, optional
            Chunk sizes along each dimension, e.g., ``5`` or
            ``{""x"": 5, ""y"": 5}``.
        name_prefix : str, optional
            Prefix for the name of any new dask arrays.
        token : str, optional
            Token uniquely identifying this dataset.
        lock : optional
            Passed on to :py:func:`dask.array.from_array`, if the array is not
            already a dask array.

        Returns
        -------
        chunked : xarray.Dataset
        """"""

        if isinstance(chunks, (Number, str)):
            chunks = dict.fromkeys(self.dims, chunks)

        if isinstance(chunks, (tuple, list)):
            chunks = dict(zip(self.dims, chunks))

        if chunks is not None:
            bad_dims = chunks.keys() - self.dims.keys()
            if bad_dims:
                raise ValueError(f""some chunks keys are not dimensions on this object: {bad_dims}"")

        variables = {k: _maybe_chunk(k, v, chunks, token, lock, name_prefix) for k, v in self.variables.items()}
        return self._replace(variables)

xr.core.dataarray.Dataset = Dataset
xr.Dataset = Dataset
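
# --- Illustrative usage sketch (not part of the patch; `ddf_in` and the
# column names are made-up examples). Uncomment to try it out. ---
# client = Client()
# ddf_in = dd.from_pandas(
#     pd.DataFrame(np.random.rand(10, 3), columns=[""a"", ""b"", ""c""]),
#     npartitions=2,
# )
# da = xr.DataArray.from_dask_dataframe(ddf_in)  # dims taken from index/columns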
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,758606082
https://github.com/pydata/xarray/pull/4659#issuecomment-740032261,https://api.github.com/repos/pydata/xarray/issues/4659,740032261,MDEyOklzc3VlQ29tbWVudDc0MDAzMjI2MQ==,29051639,2020-12-07T16:36:36Z,2020-12-07T16:36:36Z,CONTRIBUTOR,"I've added `dask_dataframe_type = (dask.dataframe.core.DataFrame,)` to pycompat but now see: `ImportError: cannot import name 'dask_dataframe_type'` despite it being in there","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,758606082
https://github.com/pydata/xarray/pull/4659#issuecomment-740020080,https://api.github.com/repos/pydata/xarray/issues/4659,740020080,MDEyOklzc3VlQ29tbWVudDc0MDAyMDA4MA==,29051639,2020-12-07T16:17:25Z,2020-12-07T16:17:25Z,CONTRIBUTOR,"That makes sense, thanks @keewis ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,758606082
https://github.com/pydata/xarray/pull/4659#issuecomment-740002632,https://api.github.com/repos/pydata/xarray/issues/4659,740002632,MDEyOklzc3VlQ29tbWVudDc0MDAwMjYzMg==,29051639,2020-12-07T15:49:00Z,2020-12-07T15:49:00Z,CONTRIBUTOR,"Thanks, yes I need to load the library for type-hinting and type checks.

When you say `dask_compat`, is that the same as `dask_array_compat`? How would I use it instead of importing Dask directly? Could I use, say, `from dask_compat.dataframe.core import DataFrame as ddf` instead of `from dask.dataframe.core import DataFrame as ddf`?
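
For reference, this is roughly the guarded-import pattern `pycompat.py` already uses for `dask_array_type`, so I'm assuming a `dask_dataframe_type` would follow the same shape (a sketch, not the actual diff):

```python
# guard the dask import so xarray still works when dask is not installed
try:
    import dask.dataframe

    dask_dataframe_type = (dask.dataframe.core.DataFrame,)
except ImportError:
    dask_dataframe_type = ()
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,758606082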
https://github.com/pydata/xarray/issues/3929#issuecomment-739991914,https://api.github.com/repos/pydata/xarray/issues/3929,739991914,MDEyOklzc3VlQ29tbWVudDczOTk5MTkxNA==,29051639,2020-12-07T15:32:01Z,2020-12-07T15:32:01Z,CONTRIBUTOR,I've added a [PR](https://github.com/pydata/xarray/pull/4659) for the new feature but it's currently failing tests as the test-suite doesn't seem to have Dask installed. Any advice on how to get this PR prepared for merging would be appreciated.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,593029940
https://github.com/pydata/xarray/pull/4659#issuecomment-739988806,https://api.github.com/repos/pydata/xarray/issues/4659,739988806,MDEyOklzc3VlQ29tbWVudDczOTk4ODgwNg==,29051639,2020-12-07T15:27:10Z,2020-12-07T15:27:10Z,CONTRIBUTOR,"During testing I'm currently encountering the issue: `ModuleNotFoundError: No module named 'dask'`

How should testing of dask-backed DataArrays be approached?
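
In the meantime, my working assumption is to follow the usual pattern of skipping when dask is unavailable, e.g. with `pytest.importorskip` (a sketch; the test name and body are placeholders):

```python
import pytest

# skip this module entirely when dask is not installed
dd = pytest.importorskip(""dask.dataframe"")


def test_from_dask_dataframe_dims():
    import pandas as pd

    ddf = dd.from_pandas(pd.DataFrame({""a"": [1, 2, 3]}), npartitions=1)
    # placeholder check; a real test would build the DataArray and
    # assert on its dims, coords and values
    assert ddf.npartitions == 1
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,758606082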
https://github.com/pydata/xarray/issues/3929#issuecomment-739904265,https://api.github.com/repos/pydata/xarray/issues/3929,739904265,MDEyOklzc3VlQ29tbWVudDczOTkwNDI2NQ==,29051639,2020-12-07T13:01:57Z,2020-12-07T13:02:20Z,CONTRIBUTOR,"One of the things I was hoping to include in my approach is the preservation of the column dimension names; however, if I were to use `Dataset.to_array` the dimension would just be called `variable`. This is pretty minor, though, and a wrapper could be used to get around it.

Thanks for the advice @shoyer, I reached a similar opinion and so have been working on the dim compute route.

The issue is that a Dask array's shape uses `np.nan` for uncomputed dimensions, rather than leaving a delayed object the way a Dask DataFrame's shape does. I looked into returning the dask DataFrame rather than a dask array, but that didn't feel like it fit with the rest of the code and raised another issue, as dask DataFrames don't have a `dtype` attribute. I'll continue to look into alternatives.
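
To illustrate the shape difference with the public `to_dask_array` API (a minimal sketch):

```python
import numpy as np
import pandas as pd
import dask.dataframe as dd

ddf = dd.from_pandas(pd.DataFrame(np.random.rand(6, 2)), npartitions=3)

print(ddf.to_dask_array().shape)              # (nan, 2) -- row count unknown
print(ddf.to_dask_array(lengths=True).shape)  # (6, 2) -- lengths computed eagerly
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,593029940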
https://github.com/pydata/xarray/pull/4653#issuecomment-739338154,https://api.github.com/repos/pydata/xarray/issues/4653,739338154,MDEyOklzc3VlQ29tbWVudDczOTMzODE1NA==,29051639,2020-12-05T19:18:10Z,2020-12-05T19:18:10Z,CONTRIBUTOR,Nothing like a transient error to keep everyone on their toes. Thanks again!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,757751542
https://github.com/pydata/xarray/pull/4653#issuecomment-739336219,https://api.github.com/repos/pydata/xarray/issues/4653,739336219,MDEyOklzc3VlQ29tbWVudDczOTMzNjIxOQ==,29051639,2020-12-05T19:10:27Z,2020-12-05T19:10:27Z,CONTRIBUTOR,"Thanks @dcherian, out of interest what would I have had to have done to remove that test failure?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,757751542
https://github.com/pydata/xarray/issues/3929#issuecomment-739334281,https://api.github.com/repos/pydata/xarray/issues/3929,739334281,MDEyOklzc3VlQ29tbWVudDczOTMzNDI4MQ==,29051639,2020-12-05T18:52:49Z,2020-12-05T18:52:49Z,CONTRIBUTOR,"For context this is the function I'm using to convert the Dask DataFrame to a DataArray.

```python
def from_dask_dataframe(df, index_name=None, columns_name=None):
    def extract_dim_name(df, dim='index'):
        if getattr(df, dim).name is None:
            getattr(df, dim).name = dim

        dim_name = getattr(df, dim).name

        return dim_name
    
    if index_name is None:
        index_name = extract_dim_name(df, 'index')
    if columns_name is None:
        columns_name = extract_dim_name(df, 'columns')
        
    da = xr.DataArray(df, coords=[df.index, df.columns], dims=[index_name, columns_name])
    
    return da

df.index.name = 'datetime'
df.columns.name = 'fueltypes'

da = from_dask_dataframe(df)
```

I'm also conscious that my question is different to @raybellwaves' as they were asking about Dataset creation and I'm interested in creating a DataArray which requires different functionality. I'm assuming this is the correct place to post though as @keewis closed my issue and linked to this one.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,593029940
https://github.com/pydata/xarray/issues/4650#issuecomment-739330830,https://api.github.com/repos/pydata/xarray/issues/4650,739330830,MDEyOklzc3VlQ29tbWVudDczOTMzMDgzMA==,29051639,2020-12-05T18:23:10Z,2020-12-05T18:23:10Z,CONTRIBUTOR,Have started to implement this but will continue the discussion in [3929](https://github.com/pydata/xarray/issues/3929#issuecomment-739330558),"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,757660307
https://github.com/pydata/xarray/issues/3929#issuecomment-739330558,https://api.github.com/repos/pydata/xarray/issues/3929,739330558,MDEyOklzc3VlQ29tbWVudDczOTMzMDU1OA==,29051639,2020-12-05T18:20:33Z,2020-12-05T18:20:33Z,CONTRIBUTOR,"I've been trying to implement this and have managed to create an `xarray.core.dataarray.DataArray` object from a dask dataframe. The issue I'm encountering is that, whilst I've enabled it to pass the coords and dims checks (by computing any elements in the shape or coords tuples with `.compute`), the variable assigned to `self._variable` still has a NaN in its shape.

The modifications I've made so far are: adding the following above line 400 in [dataarray.py](https://github.com/pydata/xarray/blob/master/xarray/core/dataarray.py):
```python
shape = tuple(
    dim_size.compute() if hasattr(dim_size, 'compute') else dim_size
    for dim_size in data.shape
)

coords = tuple(
    coord.compute() if hasattr(coord, 'compute') else coord
    for coord in coords
)
```

and, on line 403, replacing `data.shape` with the `shape` created in the previous step.

The issue I have is that when I then want to use the DataArray and do something like `da.sel(datetime='2020')`, I get the error:
```python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-5d739a721388> in <module>
----> 1 da.sel(datetime='2020')

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataarray.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   1219 
   1220         """"""
-> 1221         ds = self._to_temp_dataset().sel(
   1222             indexers=indexers,
   1223             drop=drop,

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataarray.py in _to_temp_dataset(self)
    499 
    500     def _to_temp_dataset(self) -> Dataset:
--> 501         return self._to_dataset_whole(name=_THIS_ARRAY, shallow_copy=False)
    502 
    503     def _from_temp_dataset(

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataarray.py in _to_dataset_whole(self, name, shallow_copy)
    551 
    552         coord_names = set(self._coords)
--> 553         dataset = Dataset._construct_direct(variables, coord_names, indexes=indexes)
    554         return dataset
    555 

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataset.py in _construct_direct(cls, variables, coord_names, dims, attrs, indexes, encoding, file_obj)
    959         """"""
    960         if dims is None:
--> 961             dims = calculate_dimensions(variables)
    962         obj = object.__new__(cls)
    963         obj._variables = variables

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataset.py in calculate_dimensions(variables)
    207                     ""conflicting sizes for dimension %r: ""
    208                     ""length %s on %r and length %s on %r""
--> 209                     % (dim, size, k, dims[dim], last_used[dim])
    210                 )
    211     return dims

ValueError: conflicting sizes for dimension 'datetime': length nan on <this-array> and length 90386 on 'datetime'
```

This occurs due to the construction of `Variable(dims, data, attrs, fastpath=True)` on line 404, which converts the data to a numpy array on line 244 of [variable.py](https://github.com/pydata/xarray/blob/master/xarray/core/variable.py).

I'm assuming there's an alternative way to construct `Variable` that is dask-friendly, but I couldn't find anything searching around, including in areas that already use dask, like `open_dataset` with `chunks`. Any advice on how to get around this would be much appreciated!
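
For what it's worth, `Variable` does seem happy to wrap a dask array directly when the chunk sizes are fully known, which makes me think the problem is the unknown (NaN) chunk sizes rather than dask support itself. A quick sketch I used to check this:

```python
import dask.array as darr
import xarray as xr

# a dask array with fully-known chunks wraps fine and stays lazy
v = xr.Variable((""x"", ""y""), darr.ones((4, 3), chunks=(2, 3)))
print(v.shape)       # (4, 3)
print(type(v.data))  # <class 'dask.array.core.Array'>
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,593029940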
https://github.com/pydata/xarray/issues/4650#issuecomment-739322106,https://api.github.com/repos/pydata/xarray/issues/4650,739322106,MDEyOklzc3VlQ29tbWVudDczOTMyMjEwNg==,29051639,2020-12-05T17:09:23Z,2020-12-05T17:09:23Z,CONTRIBUTOR,"Thanks, I saw [dask/dask#6058](https://github.com/dask/dask/issues/6058) but missed [#3929](https://github.com/pydata/xarray/issues/3929).

If I'm understanding you correctly, there should be no problem passing a dask array for the data parameter; it's just the dims/coords. If the `_infer_coords_and_dims` method on DataArrays were adapted to check for any dask.delayed elements and compute them, would that enable this functionality, or are there additional blockers? Thanks for your help with this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,757660307