issue_comments: 739330558

html_url: https://github.com/pydata/xarray/issues/3929#issuecomment-739330558
issue_url: https://api.github.com/repos/pydata/xarray/issues/3929
id: 739330558
node_id: MDEyOklzc3VlQ29tbWVudDczOTMzMDU1OA==
user: 29051639
created_at: 2020-12-05T18:20:33Z
updated_at: 2020-12-05T18:20:33Z
author_association: CONTRIBUTOR

I've been trying to implement this and have managed to create an `xarray.core.dataarray.DataArray` object from a dask dataframe. The issue I'm encountering is that whilst I've enabled it to pass the coords and dims checks (by calling `.compute()` on any lazy elements in the shape or coords tuples), the variable that is assigned to `self._variable` still has a NaN in its shape.

The modifications I've made so far are adding the following above line 400 in `dataarray.py`:

```python
# Materialise any lazy (dask) dimension sizes and coordinates, so the
# coords/dims validation sees concrete values rather than NaN placeholders.
shape = tuple(
    dim_size.compute() if hasattr(dim_size, "compute") else dim_size
    for dim_size in data.shape
)

coords = tuple(
    coord.compute() if hasattr(coord, "compute") else coord
    for coord in coords
)
```

and, on line 403, replacing `data.shape` with the `shape` created in the previous step.
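The `hasattr(obj, "compute")` guard above can be exercised without dask at all; here is a minimal sketch using a stand-in lazy object (`LazySize` and `materialise` are hypothetical names, purely for illustration):

```python
class LazySize:
    """Stand-in for a lazy dask scalar: holds a value until .compute() is called."""

    def __init__(self, value):
        self._value = value

    def compute(self):
        return self._value


def materialise(values):
    # Same pattern as the dataarray.py modification: compute anything lazy,
    # pass concrete values through unchanged.
    return tuple(v.compute() if hasattr(v, "compute") else v for v in values)


print(materialise((LazySize(90386), 3)))  # → (90386, 3)
print(materialise((10, 20)))              # → (10, 20)
```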

The issue I have is that when I then want to use the DataArray and do something like `da.sel(datetime='2020-01-01')` I get the error:

```python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-5d739a721388> in <module>
----> 1 da.sel(datetime='2020')

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataarray.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   1219
   1220         """
-> 1221         ds = self._to_temp_dataset().sel(
   1222             indexers=indexers,
   1223             drop=drop,

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataarray.py in _to_temp_dataset(self)
    499
    500     def _to_temp_dataset(self) -> Dataset:
--> 501         return self._to_dataset_whole(name=_THIS_ARRAY, shallow_copy=False)
    502
    503     def _from_temp_dataset(

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataarray.py in _to_dataset_whole(self, name, shallow_copy)
    551
    552         coord_names = set(self._coords)
--> 553         dataset = Dataset._construct_direct(variables, coord_names, indexes=indexes)
    554         return dataset
    555

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataset.py in _construct_direct(cls, variables, coord_names, dims, attrs, indexes, encoding, file_obj)
    959         """
    960         if dims is None:
--> 961             dims = calculate_dimensions(variables)
    962         obj = object.__new__(cls)
    963         obj._variables = variables

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataset.py in calculate_dimensions(variables)
    207                 "conflicting sizes for dimension %r: "
    208                 "length %s on %r and length %s on %r"
--> 209                 % (dim, size, k, dims[dim], last_used[dim])
    210             )
    211     return dims

ValueError: conflicting sizes for dimension 'datetime': length nan on <this-array> and length 90386 on 'datetime'
```

This occurs due to the construction of `Variable(dims, data, attrs, fastpath=True)` on line 404, which converts the data to a numpy array on line 244 of `variable.py`.
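The "conflicting sizes" error is ultimately down to NaN never comparing equal to anything: once a dimension's size is recorded as NaN, it can never match the concrete length recorded for the corresponding coordinate. A minimal illustration (the variable names are hypothetical, mirroring the values in the traceback above):

```python
import math

# A frame with unknown partition lengths reports its row count as NaN, so the
# dimension-size bookkeeping ends up comparing NaN against a real length.
array_dim_size = float("nan")   # size reported by <this-array>
coord_dim_size = 90386          # size reported by the 'datetime' coordinate

# NaN != 90386, and even NaN != NaN — the sizes can never be reconciled,
# which is exactly the check that raises the ValueError.
print(array_dim_size == coord_dim_size)   # → False
print(array_dim_size == array_dim_size)   # → False
print(math.isnan(array_dim_size))         # → True
```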

I'm assuming there's an alternative way to construct `Variable` that is dask-friendly, but I couldn't find anything searching around, including in areas that already use dask, such as `open_dataset` with `chunks`. Any advice on how to get around this would be much appreciated!
