html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3929#issuecomment-843725376,https://api.github.com/repos/pydata/xarray/issues/3929,843725376,MDEyOklzc3VlQ29tbWVudDg0MzcyNTM3Ng==,35295509,2021-05-19T03:52:00Z,2021-05-19T03:52:00Z,NONE,"I create this function which works pretty good,
idk if it is of any help:
[](see: https://stackoverflow.com/a/67595345/13592469)

```
import xarray as xr
import dask.dataframe as dd
        
def dask_2_xarray(ddf, indexname='index'):
     ds = xr.Dataset()
     ds[indexname] = ddf.index
     for key in ddf.columns:
         ds[key] = (indexname, ddf[key].to_dask_array().compute_chunk_sizes())
     return ds
            
# use:
ds = dask_2_xarray(ddf)
```

","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,593029940
https://github.com/pydata/xarray/issues/3929#issuecomment-739991914,https://api.github.com/repos/pydata/xarray/issues/3929,739991914,MDEyOklzc3VlQ29tbWVudDczOTk5MTkxNA==,29051639,2020-12-07T15:32:01Z,2020-12-07T15:32:01Z,CONTRIBUTOR,I've added a [PR](https://github.com/pydata/xarray/pull/4659) for the new feature but it's currently failing tests as the test-suite doesn't seem to have Dask installed. Any advice on how to get this PR prepared for merging would be appreciated.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,593029940
https://github.com/pydata/xarray/issues/3929#issuecomment-739904265,https://api.github.com/repos/pydata/xarray/issues/3929,739904265,MDEyOklzc3VlQ29tbWVudDczOTkwNDI2NQ==,29051639,2020-12-07T13:01:57Z,2020-12-07T13:02:20Z,CONTRIBUTOR,"One of the things I was hoping to include in my approach is the preservation of the column dimension names, however if I was to use `Dataset.to_array` it would just be called variable. This is pretty minor though and a wrapper could be used to get around it.

Thanks for the advice @shoyer, I reached a similar opinion and so have been working on the dim compute route.

The issue is that a Dask array's shape uses np.nan for uncomputed dimensions, rather than leaving a delayed object like the Dask dataframe's shape. I looked into returning the dask dataframe rather than dask array but this didn't feel like it fit with the rest of the code and produced another issue as dask dataframes don't have a dtype attribute. I'll continue to look into alternatives.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,593029940
https://github.com/pydata/xarray/issues/3929#issuecomment-739576721,https://api.github.com/repos/pydata/xarray/issues/3929,739576721,MDEyOklzc3VlQ29tbWVudDczOTU3NjcyMQ==,1217238,2020-12-06T22:36:32Z,2020-12-06T22:36:32Z,MEMBER,"It sounds like making this work well would require xarray to support ""unknown"" dimension sizes throughout the codebase. This would be a nice feature to have, but indeed would likely require pervasive changes.

The other option would be to explicitly compute the shape when converting from dask dataframes, by calling `dask_dataframe.shape[0].compute()`. This would probably be more straightforward to implement but could potentially be pretty expensive in speed/memory.

(xref https://github.com/data-apis/array-api/issues/97)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,593029940
https://github.com/pydata/xarray/issues/3929#issuecomment-739395190,https://api.github.com/repos/pydata/xarray/issues/3929,739395190,MDEyOklzc3VlQ29tbWVudDczOTM5NTE5MA==,14808389,2020-12-05T20:13:58Z,2020-12-05T20:44:42Z,MEMBER,"Thanks for investigating and working on this, @AyrtonB.

I indeed think this is the correct place to discuss this: your use case can probably be implemented by converting to a `Dataset` and then calling [`Dataset.to_array`](https://xarray.pydata.org/en/latest/generated/xarray.Dataset.to_array.html#xarray.Dataset.to_array). Actually, we currently implement most methods on `DataArray` objects by converting to a temporary single-variable `Dataset`, calling the equivalent `Dataset` method and then converting the result back to a `DataArray`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,593029940
https://github.com/pydata/xarray/issues/3929#issuecomment-739334281,https://api.github.com/repos/pydata/xarray/issues/3929,739334281,MDEyOklzc3VlQ29tbWVudDczOTMzNDI4MQ==,29051639,2020-12-05T18:52:49Z,2020-12-05T18:52:49Z,CONTRIBUTOR,"For context this is the function I'm using to convert the Dask DataFrame to a DataArray.

```python
def from_dask_dataframe(df, index_name=None, columns_name=None):
    def extract_dim_name(df, dim='index'):
        if getattr(df, dim).name is None:
            getattr(df, dim).name = dim

        dim_name = getattr(df, dim).name

        return dim_name
    
    if index_name is None:
        index_name = extract_dim_name(df, 'index')
    if columns_name is None:
        columns_name = extract_dim_name(df, 'columns')
        
    da = xr.DataArray(df, coords=[df.index, df.columns], dims=[index_name, columns_name])
    
    return da

df.index.name = 'datetime'
df.columns.name = 'fueltypes'

da = from_dask_dataframe(df)
```

I'm also conscious that my question is different to @raybellwaves' as they were asking about Dataset creation and I'm interested in creating a DataArray which requires different functionality. I'm assuming this is the correct place to post though as @keewis closed my issue and linked to this one.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,593029940
https://github.com/pydata/xarray/issues/3929#issuecomment-739330558,https://api.github.com/repos/pydata/xarray/issues/3929,739330558,MDEyOklzc3VlQ29tbWVudDczOTMzMDU1OA==,29051639,2020-12-05T18:20:33Z,2020-12-05T18:20:33Z,CONTRIBUTOR,"I've been trying to implement this and have managed to create a `xarray.core.dataarray.DataArray` object from a dask dataframe. The issue I'm encountering is that whilst I've enabled it to pass the coords and dims checks (by computing any elements in the shape or coords tuples with `.compute`), the variable that is assigned to `self._variable` still has an NaN in the shape.

The modifications I've made so far are adding the following above line 400 in [dataarray.py](https://github.com/pydata/xarray/blob/master/xarray/core/dataarray.py):
```python
shape = tuple([
    dim_size.compute() 
    if hasattr(dim_size, 'compute') 
    else dim_size 
    for dim_size 
    in data.shape
    ])

coords = tuple([
    coord.compute() 
    if hasattr(coord, 'compute') 
    else coord 
    for coord 
    in coords
    ])
```

and on line 403 by replacing `data.shape` with `shape` that was created in the previous step.

The issue I have is that when I then want to use the DataArray and do something like `da.sel(datetime='2020-01-01')` I get the error:
```python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-5d739a721388> in <module>
----> 1 da.sel(datetime='2020')

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataarray.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   1219 
   1220         """"""
-> 1221         ds = self._to_temp_dataset().sel(
   1222             indexers=indexers,
   1223             drop=drop,

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataarray.py in _to_temp_dataset(self)
    499 
    500     def _to_temp_dataset(self) -> Dataset:
--> 501         return self._to_dataset_whole(name=_THIS_ARRAY, shallow_copy=False)
    502 
    503     def _from_temp_dataset(

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataarray.py in _to_dataset_whole(self, name, shallow_copy)
    551 
    552         coord_names = set(self._coords)
--> 553         dataset = Dataset._construct_direct(variables, coord_names, indexes=indexes)
    554         return dataset
    555 

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataset.py in _construct_direct(cls, variables, coord_names, dims, attrs, indexes, encoding, file_obj)
    959         """"""
    960         if dims is None:
--> 961             dims = calculate_dimensions(variables)
    962         obj = object.__new__(cls)
    963         obj._variables = variables

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataset.py in calculate_dimensions(variables)
    207                     ""conflicting sizes for dimension %r: ""
    208                     ""length %s on %r and length %s on %r""
--> 209                     % (dim, size, k, dims[dim], last_used[dim])
    210                 )
    211     return dims

ValueError: conflicting sizes for dimension 'datetime': length nan on <this-array> and length 90386 on 'datetime'
```

This occurs due to the construction of `Variable(dims, data, attrs, fastpath=True)` on line 404, which converts the data to a numpy array on line 244 of [variable.py](https://github.com/pydata/xarray/blob/master/xarray/core/variable.py).

I'm assuming there's an alternative way to construct `Variable` that is dask friendly but I couldn't find anything searching around, including areas that are using dask like open_dataset with chunks. Any advice on how to get around this would be much appreciated!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,593029940