issue_comments


4 rows where author_association = "CONTRIBUTOR", issue = 593029940, and user = 29051639, sorted by updated_at descending

Comment 739991914 · AyrtonB (29051639) · CONTRIBUTOR · created 2020-12-07T15:32:01Z · updated 2020-12-07T15:32:01Z
https://github.com/pydata/xarray/issues/3929#issuecomment-739991914

I've added a PR for the new feature, but it's currently failing tests because the test suite doesn't seem to have Dask installed. Any advice on how to get this PR ready for merging would be appreciated.
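A minimal sketch of one way to make dask-dependent tests skip cleanly when Dask isn't installed (the test body below is a hypothetical placeholder; xarray's own suite uses a similar `requires_dask` marker):

```python
# Sketch: guard dask-dependent tests so they skip, not fail, without dask.
# pytest.importorskip skips the rest of the module at collection time.
import pytest

pytest.importorskip("dask")

import dask.dataframe as dd
import pandas as pd


def test_from_dask_dataframe_smoke():  # hypothetical test name
    ddf = dd.from_pandas(pd.DataFrame({"a": [1, 2]}), npartitions=1)
    assert ddf.npartitions == 1
```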

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Feature request xarray.Dataset.from_dask_dataframe (593029940)
Comment 739904265 · AyrtonB (29051639) · CONTRIBUTOR · created 2020-12-07T13:01:57Z · updated 2020-12-07T13:02:20Z
https://github.com/pydata/xarray/issues/3929#issuecomment-739904265

One of the things I was hoping to include in my approach is the preservation of the column dimension name; however, if I were to use `Dataset.to_array` it would just be called `variable`. This is pretty minor, though, and a wrapper could be used to get around it.

Thanks for the advice @shoyer, I reached a similar conclusion and so have been working on the dim-compute route.

The issue is that a dask array's shape uses `np.nan` for uncomputed dimensions, rather than a delayed object like the dask dataframe's shape. I looked into returning the dask dataframe rather than a dask array, but this didn't feel like it fit the rest of the code and raised another issue, as dask dataframes don't have a `dtype` attribute. I'll continue to look into alternatives.
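A quick illustration of that `np.nan` shape behaviour with toy data (`to_dask_array` is dask's public conversion API; the `lengths=True` option computes partition sizes eagerly):

```python
# Sketch: a dask dataframe's array form reports nan for row counts
# until the partition lengths are computed.
import dask.dataframe as dd
import pandas as pd

ddf = dd.from_pandas(pd.DataFrame({"a": range(10)}), npartitions=2)

print(ddf.to_dask_array().shape)              # (nan, 1) -- lengths unknown
print(ddf.to_dask_array(lengths=True).shape)  # (10, 1)  -- lengths computed
```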

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Feature request xarray.Dataset.from_dask_dataframe (593029940)
Comment 739334281 · AyrtonB (29051639) · CONTRIBUTOR · created 2020-12-05T18:52:49Z · updated 2020-12-05T18:52:49Z
https://github.com/pydata/xarray/issues/3929#issuecomment-739334281

For context, this is the function I'm using to convert the dask DataFrame to a DataArray.

```python
import xarray as xr


def from_dask_dataframe(df, index_name=None, columns_name=None):
    def extract_dim_name(df, dim='index'):
        if getattr(df, dim).name is None:
            getattr(df, dim).name = dim

        dim_name = getattr(df, dim).name

        return dim_name

    if index_name is None:
        index_name = extract_dim_name(df, 'index')
    if columns_name is None:
        columns_name = extract_dim_name(df, 'columns')

    da = xr.DataArray(df, coords=[df.index, df.columns],
                      dims=[index_name, columns_name])

    return da


df.index.name = 'datetime'
df.columns.name = 'fueltypes'

da = from_dask_dataframe(df)
```

I'm also conscious that my question is different from @raybellwaves', as they were asking about Dataset creation and I'm interested in creating a DataArray, which requires different functionality. I'm assuming this is the correct place to post, though, as @keewis closed my issue and linked to this one.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Feature request xarray.Dataset.from_dask_dataframe (593029940)
Comment 739330558 · AyrtonB (29051639) · CONTRIBUTOR · created 2020-12-05T18:20:33Z · updated 2020-12-05T18:20:33Z
https://github.com/pydata/xarray/issues/3929#issuecomment-739330558

I've been trying to implement this and have managed to create an `xarray.core.dataarray.DataArray` object from a dask dataframe. The issue I'm encountering is that, while I've enabled it to pass the coords and dims checks (by computing any elements in the shape or coords tuples with `.compute`), the variable assigned to `self._variable` still has NaN in its shape.

The modifications I've made so far are adding the following above line 400 in `dataarray.py`:

```python
shape = tuple([
    dim_size.compute() if hasattr(dim_size, 'compute') else dim_size
    for dim_size in data.shape
])

coords = tuple([
    coord.compute() if hasattr(coord, 'compute') else coord
    for coord in coords
])
```

and, on line 403, replacing `data.shape` with the `shape` created in the previous step.

The issue I have is that when I then want to use the DataArray and do something like `da.sel(datetime='2020-01-01')` I get the error:

```python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-5d739a721388> in <module>
----> 1 da.sel(datetime='2020')

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataarray.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   1219
   1220         """
-> 1221         ds = self._to_temp_dataset().sel(
   1222             indexers=indexers,
   1223             drop=drop,

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataarray.py in _to_temp_dataset(self)
    499
    500     def _to_temp_dataset(self) -> Dataset:
--> 501         return self._to_dataset_whole(name=_THIS_ARRAY, shallow_copy=False)
    502
    503     def _from_temp_dataset(

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataarray.py in _to_dataset_whole(self, name, shallow_copy)
    551
    552         coord_names = set(self._coords)
--> 553         dataset = Dataset._construct_direct(variables, coord_names, indexes=indexes)
    554         return dataset
    555

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataset.py in _construct_direct(cls, variables, coord_names, dims, attrs, indexes, encoding, file_obj)
    959         """
    960         if dims is None:
--> 961             dims = calculate_dimensions(variables)
    962         obj = object.__new__(cls)
    963         obj._variables = variables

~\anaconda3\envs\DataHub\lib\site-packages\xarray\core\dataset.py in calculate_dimensions(variables)
    207                 "conflicting sizes for dimension %r: "
    208                 "length %s on %r and length %s on %r"
--> 209                 % (dim, size, k, dims[dim], last_used[dim])
    210             )
    211     return dims

ValueError: conflicting sizes for dimension 'datetime': length nan on <this-array> and length 90386 on 'datetime'
```

This occurs due to the construction of `Variable(dims, data, attrs, fastpath=True)` on line 404, which converts the data to a numpy array on line 244 of `variable.py`.

I'm assuming there's an alternative, dask-friendly way to construct `Variable`, but I couldn't find anything searching around, including in areas that already use dask, such as `open_dataset` with `chunks`. Any advice on how to get around this would be much appreciated!
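For reference, a minimal sketch (not the PR's code) of a route that sidesteps the `nan` shape: compute the partition lengths up front, then let `Variable` wrap the resulting dask array lazily:

```python
# Sketch: with concrete chunk sizes, xarray keeps the dask array lazy,
# so Variable never needs to materialise a numpy array.
import dask.dataframe as dd
import pandas as pd
import xarray as xr

pdf = pd.DataFrame(
    {"wind": [1.0, 2.0, 3.0, 4.0]},
    index=pd.date_range("2020-01-01", periods=4, name="datetime"),
)
ddf = dd.from_pandas(pdf, npartitions=2)

arr = ddf.to_dask_array(lengths=True)  # shape (4, 1), no nan
var = xr.Variable(("datetime", "fueltypes"), arr)
print(var.shape)  # (4, 1); var.data is still a lazy dask array
```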

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Feature request xarray.Dataset.from_dask_dataframe (593029940)

Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
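
The filter at the top of this page corresponds to a straightforward query against this schema; a sketch using Python's built-in sqlite3 module (the database file name is hypothetical):

```python
# Sketch: reproduce the page's filter with Python's sqlite3 module.
import sqlite3

con = sqlite3.connect("github.db")  # hypothetical file name
rows = con.execute(
    """
    SELECT id, user, updated_at
    FROM issue_comments
    WHERE author_association = 'CONTRIBUTOR'
      AND issue = 593029940
      AND user = 29051639
    ORDER BY updated_at DESC
    """
).fetchall()
print(rows)
```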