
issue_comments


4 rows where user = 85181086 sorted by updated_at descending




Comment 1494018319 · akanshajais (user 85181086) · author_association: NONE · node_id: IC_kwDOAMm_X85ZDOkP
created_at: 2023-04-03T09:51:16Z · updated_at: 2023-04-08T11:37:40Z
https://github.com/pydata/xarray/issues/7552#issuecomment-1494018319
issue_url: https://api.github.com/repos/pydata/xarray/issues/7552

You are getting a KeyError in your first example when plotting with col='y' because, when y is a dimension of length 1, xarray automatically drops the dimension rather than keeping a 1-D coordinate for it.

In your example the y dimension has length 1, so it is dropped and da.coords['y'] yields a scalar value rather than a 1-D coordinate array. When you then plot with col='y', xarray looks for a 'y' entry in the coordinates, which doesn't exist, hence the KeyError.
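A minimal sketch reproducing the failure described above (the array construction mirrors the workaround snippet below):

```
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(3, 1), dims=("x", "y"))
da.plot(col="y")  # raises KeyError: 'y' (no 1-D coordinate to facet over)
```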

To work around this, you can manually attach an explicit 1-D y coordinate, for example with the assign_coords method:

```
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(3, 1), dims=("x", "y"))
# Give the singleton dimension an explicit 1-D coordinate; until we do,
# da.coords["y"] does not exist.
da = da.assign_coords(y=("y", [0]))
da.plot(col="y")
```

This ensures that the y coordinate is a 1-D array even when the dimension it represents has length 1.

Reactions: none
Issue: plotting facet grid with singleton dimension should create a facet grid with size 1 (1596890616)
Comment 1494051039 · akanshajais (user 85181086) · author_association: NONE · node_id: IC_kwDOAMm_X85ZDWjf
created_at: 2023-04-03T10:12:40Z · updated_at: 2023-04-03T10:12:40Z
https://github.com/pydata/xarray/issues/7333#issuecomment-1494051039
issue_url: https://api.github.com/repos/pydata/xarray/issues/7333

The difference in behavior between the two examples is that the first has named coordinates ('A', 'B', 'C', 'X', 'Y') while the second has default dimension names ('dim_0', 'dim_1', 'dim_2', 'dim_3', 'dim_4').

In the first example, when calling p.map_dataarray, the x and y arguments are passed as coordinate names ('B' and 'C') instead of dimension names, which causes the error. The correct approach is to pass the dimension names as strings, for example by looking them up in da.dims:

p.map_dataarray(xr.plot.pcolormesh, y=da.dims[1], x=da.dims[2])

In the second example the error message is more informative because dimension names are already in use: it reports that the valid options for the x argument are None, 'dim_3', and 'dim_4', i.e. the dimensions at indices 3 and 4 of the data array. The correct call would be:

p.map_dataarray(xr.plot.pcolormesh, y='dim_1', x='dim_2')

This maps the pcolormesh plot onto the dimensions 'dim_1' and 'dim_2', which correspond to the columns and rows of each panel.
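For reference, a self-contained sketch of the pattern with an invented 3-D array (the dimension names and sizes here are illustrative, not the ones from the issue):

```
import numpy as np
import xarray as xr

# Facet over "A"; draw "C" (x) against "B" (y) in each panel.
da = xr.DataArray(np.random.rand(2, 3, 4), dims=("A", "B", "C"))
p = xr.plot.FacetGrid(da, col="A")
# Pass dimension names, not coordinate labels:
p.map_dataarray(xr.plot.pcolormesh, x=da.dims[2], y=da.dims[1])
```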

Reactions: none
Issue: FacetGrid with coords error (1468534020)
Comment 1494033789 · akanshajais (user 85181086) · author_association: NONE · node_id: IC_kwDOAMm_X85ZDSV9
created_at: 2023-04-03T10:01:52Z · updated_at: 2023-04-03T10:01:52Z
https://github.com/pydata/xarray/issues/7239#issuecomment-1494033789
issue_url: https://api.github.com/repos/pydata/xarray/issues/7239

A workaround for achieving this is to use the apply method of Dataset together with expand_dims. Here's an example:

```
import xarray as xr

dataset = xr.Dataset(data_vars={'foo': 1, 'bar': 2})

# Define a function that expands the given variable along a new dimension
def expand_variable(da):
    if da.name == 'foo':
        return da.expand_dims('zar')
    else:
        return da

# Use apply to run the function over every data variable
expanded_dataset = dataset.apply(expand_variable, keep_attrs=True)

print(expanded_dataset)
```

Here, expand_variable only expands the variable named 'foo' along a new dimension. The apply method applies it to every data variable in the dataset, but only 'foo' is actually modified.

Reactions: none
Issue: include/exclude lists in Dataset.expand_dims (1429172192)
Comment 1494027739 · akanshajais (user 85181086) · author_association: NONE · node_id: IC_kwDOAMm_X85ZDQ3b
created_at: 2023-04-03T09:58:10Z · updated_at: 2023-04-03T09:58:10Z
https://github.com/pydata/xarray/issues/7409#issuecomment-1494027739
issue_url: https://api.github.com/repos/pydata/xarray/issues/7409

@gcaria, your solution looks like a reasonable approach for converting a 1-D or 2-D chunked DataArray to a dask DataFrame or Series, respectively. Note that it only works if the DataArray is actually backed by a dask array: for an unchunked DataArray, da.data is a plain numpy array with no .chunks attribute, so you would need to call .chunk() first (or fall back to the in-memory to_dataframe()/to_series() methods).

One potential improvement would be to check that the chunking is actually worthwhile for conversion to a dask DataFrame or Series: if the chunks are too small, the scheduling overhead of parallelism may outweigh its benefits.

Here's an updated version of your code that includes this check:

```
import dask.dataframe as dkd
import xarray as xr
from typing import Union

def to_dask(da: xr.DataArray) -> Union[dkd.Series, dkd.DataFrame]:
    if da.data.ndim > 2:
        raise ValueError(
            f"Can only convert 1D and 2D DataArrays, found {da.data.ndim} dimensions"
        )

    # Check that the chunks along the first (row) dimension are not too
    # small; tiny partitions make the overhead outweigh the parallelism.
    min_chunk_size = 100_000  # adjust as needed
    if any(size < min_chunk_size for size in da.data.chunks[0]):
        raise ValueError("Chunk sizes are too small for conversion to dask DataFrame/Series")

    indexes = [da.get_index(dim) for dim in da.dims]
    # dkd.from_array takes a single chunksize, so this assumes roughly
    # uniform chunking along the first dimension.
    darr_index = dkd.from_array(indexes[0].to_numpy(), chunksize=da.data.chunks[0][0])
    columns = [da.name] if da.data.ndim == 1 else indexes[1]
    ddf = dkd.from_dask_array(da.data, columns=columns)
    ddf[indexes[0].name] = darr_index
    return ddf.set_index(indexes[0].name).squeeze()
```

This adds a check that the chunks along the leading dimension are not too small (here the minimum chunk size is 100,000). If any chunk is smaller than that, the function raises a ValueError; adjust min_chunk_size as needed for your use case.
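A hypothetical usage sketch (the dimension names, sizes, and chunking below are invented for illustration):

```
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.random.rand(1_000_000, 3),
    dims=("time", "col"),
    coords={"col": ["a", "b", "c"]},
    name="example",
).chunk({"time": 250_000})

ddf = to_dask(da)  # dask DataFrame indexed by "time" with columns "a", "b", "c"
print(ddf.head())
```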

Reactions: none
Issue: Implement `DataArray.to_dask_dataframe()` (1517575123)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);