
issue_comments


1 row where issue = 1517575123 and user = 85181086 sorted by updated_at descending


id: 1494027739
html_url: https://github.com/pydata/xarray/issues/7409#issuecomment-1494027739
issue_url: https://api.github.com/repos/pydata/xarray/issues/7409
node_id: IC_kwDOAMm_X85ZDQ3b
user: akanshajais (85181086)
created_at: 2023-04-03T09:58:10Z
updated_at: 2023-04-03T09:58:10Z
author_association: NONE
performed_via_github_app:
body:

@gcaria, your solution looks like a reasonable approach for converting a 1D or 2D chunked DataArray to a dask DataFrame or Series, respectively. Note that it only works if the DataArray is chunked along one or both dimensions; if the DataArray is not chunked, calling `to_dask()` will return an equivalent in-memory pandas DataFrame or Series.

One potential improvement you could make is to add a check to ensure that the chunking is valid for conversion to a dask DataFrame or Series. For example, if the chunk sizes are too small, the overhead of parallelism may outweigh the benefits.

Here's an updated version of your code that includes this check:

```
import dask.array as dka
import dask.dataframe as dkd
import xarray as xr
from typing import Union


def to_dask(da: xr.DataArray) -> Union[dkd.Series, dkd.DataFrame]:

    if da.data.ndim > 2:
        raise ValueError(f"Can only convert 1D and 2D DataArrays, found {da.data.ndim} dimensions")

    # Check that the chunk sizes are not too small
    min_chunk_size = 100_000  # Adjust as needed
    if any(cs < min_chunk_size for dim_chunks in da.data.chunks for cs in dim_chunks):
        raise ValueError("Chunk sizes are too small for conversion to dask DataFrame/Series")

    # Build a dask array for the index values; dask.array.from_array accepts an
    # explicit chunks= tuple, matching the DataArray's chunking along the first dimension.
    indexes = [da.get_index(dim) for dim in da.dims]
    darr_index = dka.from_array(indexes[0], chunks=da.data.chunks[0])

    # A 1D array becomes a single named column; a 2D array uses the second index as columns
    columns = [da.name] if da.data.ndim == 1 else indexes[1]
    ddf = dkd.from_dask_array(da.data, columns=columns)
    ddf[indexes[0].name] = darr_index
    return ddf.set_index(indexes[0].name).squeeze()
```

This code adds a check to ensure that the chunk sizes are not too small (in this case, the minimum chunk size is set to 100,000). If any chunk is smaller than the minimum, the function raises a ValueError. You can adjust the minimum chunk size as needed for your specific use case.
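As a rough usage sketch (the array length, chunk size, and dimension name below are illustrative assumptions, chosen so every chunk passes the 100,000-element check):

```
import numpy as np
import xarray as xr

# 400,000 values split into four 100,000-element chunks along "time"
da = xr.DataArray(
    np.random.default_rng(0).normal(size=400_000),
    dims="time",
    coords={"time": np.arange(400_000)},
    name="signal",
).chunk({"time": 100_000})

result = to_dask(da)   # a 1D input should squeeze down to a dask Series
print(type(result))
print(result.head())   # inspect the first few rows
```

A 2D input would instead produce a dask DataFrame with one column per value of the second index.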

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Implement `DataArray.to_dask_dataframe()` (1517575123)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
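
For context, a minimal sketch of reproducing the row selection shown at the top of this page against a local SQLite copy of this table (the `github.db` filename is an assumption):

```
import sqlite3

# Hypothetical local copy of the database; adjust the path as needed.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT [id], [user], [created_at], [updated_at], [author_association]
    FROM [issue_comments]
    WHERE [issue] = 1517575123 AND [user] = 85181086
    ORDER BY [updated_at] DESC
    """
).fetchall()
for row in rows:
    print(row)
conn.close()
```

The `idx_issue_comments_issue` and `idx_issue_comments_user` indexes above are what keep this kind of per-issue, per-user filter fast.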