
issue_comments


2 rows where issue = 190683531 and user = 3404817 sorted by updated_at descending

comment 262557502 · https://github.com/pydata/xarray/issues/1132#issuecomment-262557502
user: j08lue (3404817) · author_association: CONTRIBUTOR
created_at: 2016-11-23T16:06:46Z · updated_at: 2016-11-23T16:06:46Z
issue: groupby with datetime DataArray fails with `AttributeError` (190683531)

Great, safe_cast_to_index works nicely (it passes my test). I added the change to the existing PR.

Do we need to add more groupby tests to make sure the solution is safe for other cases (e.g. other data types)?
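(For readers without the PR at hand: safe_cast_to_index is an internal xarray helper. The sketch below only illustrates the behaviour being relied on here, assuming it returns a pandas Index built from the underlying values; it is not xarray's actual implementation.)

```python
import numpy as np
import pandas as pd


def safe_cast_to_index_sketch(array):
    """Rough sketch of the behaviour relied on: build a pandas.Index
    from the underlying numpy values, so datetime64[ns] data comes back
    as a DatetimeIndex rather than an opaque array-like wrapper."""
    if isinstance(array, pd.Index):
        return array
    # unwrap objects that carry their data in a .values attribute
    values = np.asarray(getattr(array, 'values', array))
    return pd.Index(values)


times = pd.date_range('2000-01-01', periods=4)
idx = safe_cast_to_index_sketch(times.values)
print(type(idx).__name__)  # DatetimeIndex
```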

comment 262233644 · https://github.com/pydata/xarray/issues/1132#issuecomment-262233644
user: j08lue (3404817) · author_association: CONTRIBUTOR
created_at: 2016-11-22T12:56:03Z · updated_at: 2016-11-22T13:21:04Z
issue: groupby with datetime DataArray fails with `AttributeError` (190683531)

OK, here is the minimal example:

```python
import xarray as xr
import pandas as pd


def test_groupby_da_datetime():
    """groupby with a DataArray of dtype datetime"""
    # create test data
    times = pd.date_range('2000-01-01', periods=4)
    foo = xr.DataArray([1, 2, 3, 4], coords=dict(time=times), dims='time')

    # create test index
    dd = times.to_datetime()
    reference_dates = [dd[0], dd[2]]
    labels = reference_dates[0:1] * 2 + reference_dates[1:2] * 2
    ind = xr.DataArray(labels, coords=dict(time=times), dims='time',
                       name='reference_date')

    # group foo by ind
    g = foo.groupby(ind)

    # check result
    actual = g.sum(dim='time')
    expected = xr.DataArray([3, 7],
                            coords=dict(reference_date=reference_dates),
                            dims='reference_date')
    assert actual.to_dataset(name='foo').equals(expected.to_dataset(name='foo'))
```

While putting that together, I found that the problem only occurs when the DataArray passed to groupby has dtype=datetime64[ns].

The problem is that we effectively feed the DataArray to pd.factorize, which goes well for most data types: pandas checks with the function needs_i8_conversion whether the values must be converted to int64 before factorizing and decides YES for our datetime64[ns]. But pd.factorize then fails because it tries to access DataArray.view to do that conversion, and DataArray has no .view attribute — hence the AttributeError.
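A minimal way to see the distinction (the DataArray construction here is my own illustration, not from the issue; the failing call is only described in a comment rather than executed, since the exact error can depend on the pandas version):

```python
import pandas as pd
import xarray as xr

times = pd.date_range('2000-01-01', periods=4)
labels = times.values[[0, 0, 2, 2]]  # datetime64[ns], two unique dates
ind = xr.DataArray(labels, dims='time', name='reference_date')

# Feeding the plain numpy values works: factorize returns integer codes
# plus the unique values.
codes, uniques = pd.factorize(ind.values)
print(list(codes))   # [0, 0, 1, 1]
print(len(uniques))  # 2

# Feeding the DataArray itself takes pandas' datetime64 path, which
# converts via .view — an attribute the DataArray wrapper does not provide.
```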

So as I see it there are three possible solutions to this:

1. make Pandas' pd.factorize handle our datetime DataArrays better,
2. add an attribute .view to DataArrays, or
3. use the solution in the above PR, which means feeding only the NumPy .values to pd.factorize.
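Solution 3 amounts to a one-line guard before calling pandas. A hypothetical helper sketching that idea (safe_factorize is my own name, not an xarray or pandas API):

```python
import numpy as np
import pandas as pd


def safe_factorize(obj):
    # Hand pandas a plain numpy array rather than the wrapper object,
    # so its datetime64 path can use ndarray.view internally.
    values = np.asarray(getattr(obj, 'values', obj))
    return pd.factorize(values)


labels = pd.to_datetime(
    ['2000-01-01', '2000-01-01', '2000-01-03', '2000-01-03']).values
codes, uniques = safe_factorize(labels)
print(list(codes))  # [0, 0, 1, 1]
```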


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);