issue_comments

7 rows where issue = 190683531 sorted by updated_at descending

user 3

  • shoyer 3
  • guziy 2
  • j08lue 2

author_association 2

  • CONTRIBUTOR 4
  • MEMBER 3

issue 1

  • groupby with datetime DataArray fails with `AttributeError` · 7 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
268021977 https://github.com/pydata/xarray/issues/1132#issuecomment-268021977 https://api.github.com/repos/pydata/xarray/issues/1132 MDEyOklzc3VlQ29tbWVudDI2ODAyMTk3Nw== guziy 900941 2016-12-19T17:15:17Z 2016-12-19T17:15:17Z CONTRIBUTOR

Thanks, I'll try to use the github version.

Cheers

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby with datetime DataArray fails with `AttributeError` 190683531
268021188 https://github.com/pydata/xarray/issues/1132#issuecomment-268021188 https://api.github.com/repos/pydata/xarray/issues/1132 MDEyOklzc3VlQ29tbWVudDI2ODAyMTE4OA== shoyer 1217238 2016-12-19T17:12:16Z 2016-12-19T17:12:16Z MEMBER

@guziy This should be fixed by #1133, which will be part of the next release.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby with datetime DataArray fails with `AttributeError` 190683531
268020632 https://github.com/pydata/xarray/issues/1132#issuecomment-268020632 https://api.github.com/repos/pydata/xarray/issues/1132 MDEyOklzc3VlQ29tbWVudDI2ODAyMDYzMg== guziy 900941 2016-12-19T17:10:09Z 2016-12-19T17:10:09Z CONTRIBUTOR

Hi:

Here is an example that worked until recently...

https://github.com/guziy/PyNotebooks/blob/master/xarray/test_grouping.ipynb

Of course there is a resample function, but sometimes you want grouping...

Are there any plans to add a function as a key to the groupby function?

Cheers

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby with datetime DataArray fails with `AttributeError` 190683531
262557502 https://github.com/pydata/xarray/issues/1132#issuecomment-262557502 https://api.github.com/repos/pydata/xarray/issues/1132 MDEyOklzc3VlQ29tbWVudDI2MjU1NzUwMg== j08lue 3404817 2016-11-23T16:06:46Z 2016-11-23T16:06:46Z CONTRIBUTOR

Great, safe_cast_to_index works nicely (it passes my test). I added the change to the existing PR.

Do we need to add more groupby tests to make sure the solution is safe for other cases (e.g. other data types)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby with datetime DataArray fails with `AttributeError` 190683531
262286432 https://github.com/pydata/xarray/issues/1132#issuecomment-262286432 https://api.github.com/repos/pydata/xarray/issues/1132 MDEyOklzc3VlQ29tbWVudDI2MjI4NjQzMg== shoyer 1217238 2016-11-22T16:17:51Z 2016-11-22T16:17:51Z MEMBER

Thanks for looking into this!

Based on how factorize works (with specialized handling for pandas dtypes), I think the most robust behavior would be to pass in a pandas.Index. For example, this will work better if someone uses a pandas.PeriodIndex. So I would suggest wrapping arrays with safe_cast_to_index before passing them to pd.factorize.
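
A minimal sketch of that suggestion, using only public pandas calls. Here `pd.Index(...)` is a stand-in (an assumption) for xarray's internal `safe_cast_to_index` helper, which is not imported, and the sample dates are purely illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative group labels: two distinct dates, each repeated twice.
labels = np.array(
    ["2000-01-01", "2000-01-01", "2000-01-03", "2000-01-03"],
    dtype="datetime64[ns]",
)

# Stand-in for safe_cast_to_index (assumption): wrap the raw array in a
# pandas.Index so that factorize receives a pandas type.
index = pd.Index(labels)
codes, uniques = pd.factorize(index)

# The same call on a PeriodIndex, the case mentioned above.
periods = pd.period_range("2000-01-01", periods=2, freq="D").repeat(2)
period_codes, period_uniques = pd.factorize(periods)
```

When the input is an Index, `uniques` should come back as an Index of the same kind, so dtype-specific information survives the round trip.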

I'm not sure if we want a .view attribute on DataArrays, but in any case it's not clear that would even fix the issue here -- pandas probably needs to coerce to NumPy arrays internally in factorize eventually anyway.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby with datetime DataArray fails with `AttributeError` 190683531
262233644 https://github.com/pydata/xarray/issues/1132#issuecomment-262233644 https://api.github.com/repos/pydata/xarray/issues/1132 MDEyOklzc3VlQ29tbWVudDI2MjIzMzY0NA== j08lue 3404817 2016-11-22T12:56:03Z 2016-11-22T13:21:04Z CONTRIBUTOR

OK, here is the minimal example:

```python
import xarray as xr
import pandas as pd


def test_groupby_da_datetime():
    """groupby with a DataArray of dtype datetime"""
    # create test data
    times = pd.date_range('2000-01-01', periods=4)
    foo = xr.DataArray([1, 2, 3, 4], coords=dict(time=times), dims='time')

    # create test index
    dd = times.to_datetime()
    reference_dates = [dd[0], dd[2]]
    labels = reference_dates[0:1]*2 + reference_dates[1:2]*2
    ind = xr.DataArray(labels, coords=dict(time=times), dims='time', name='reference_date')

    # group foo by ind
    g = foo.groupby(ind)

    # check result
    actual = g.sum(dim='time')
    expected = xr.DataArray([3, 7], coords=dict(reference_date=reference_dates), dims='reference_date')
    assert actual.to_dataset(name='foo').equals(expected.to_dataset(name='foo'))
```

While writing it, I found that the problem only occurs when the DataArray passed to groupby has dtype=datetime64[ns].

The problem is that we effectively feed the DataArray to pd.factorize and that goes well for most data types: Pandas checks with the function needs_i8_conversion whether it can factorize the DataArray and decides YES for our datetime64[ns]. But then in pd.factorize it fails because it tries to access DataArray.view to convert to int64.

So as I see it, there are three possible solutions to this:

1. Make pandas' pd.factorize handle our datetime DataArrays better.
2. Add a .view attribute to DataArrays.
3. Use the solution in the above PR, which means feeding only the NumPy .values to pd.factorize.
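
The third option can be sketched as follows, with a plain NumPy array standing in for `DataArray.values` (an assumption, so that the snippet needs only NumPy and pandas). The `np.bincount` step only illustrates what the integer codes are good for; it is not a claim about how xarray computes group sums.

```python
import numpy as np
import pandas as pd

# Data and datetime64[ns] labels mirroring the minimal example above.
data = np.array([1, 2, 3, 4])
label_values = np.array(
    ["2000-01-01", "2000-01-01", "2000-01-03", "2000-01-03"],
    dtype="datetime64[ns]",
)

# Passing the bare NumPy values: factorize never has to look for a
# .view attribute on a wrapper object.
codes, uniques = pd.factorize(label_values)

# One integer code per element; summing data by code gives per-group totals.
sums = np.bincount(codes, weights=data)
```

With these inputs the per-group totals match the `[3, 7]` expected by the test above.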

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby with datetime DataArray fails with `AttributeError` 190683531
261979024 https://github.com/pydata/xarray/issues/1132#issuecomment-261979024 https://api.github.com/repos/pydata/xarray/issues/1132 MDEyOklzc3VlQ29tbWVudDI2MTk3OTAyNA== shoyer 1217238 2016-11-21T15:58:29Z 2016-11-21T15:58:29Z MEMBER

This looks like a plausible fix, but what would be really helpful is a minimal, complete example that triggers the error. That should help clarify the issue and at the least, we will need that for a test case in your pull request.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby with datetime DataArray fails with `AttributeError` 190683531

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
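
The row selection described at the top of the page (comments for issue 190683531, newest update first) could be reproduced against this schema with Python's built-in `sqlite3` module. This is only a sketch: the in-memory database, the trimmed column list, and the choice of three sample rows (ids, user ids, and timestamps copied from the comments above) are assumptions for illustration.

```python
import sqlite3

# In-memory database with a trimmed version of the schema above (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE [issue_comments] (
       [id] INTEGER PRIMARY KEY,
       [user] INTEGER,
       [updated_at] TEXT,
       [issue] INTEGER
    )
    """
)
conn.execute(
    "CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue])"
)

# Three of the seven rows shown on this page: (id, user, updated_at, issue).
rows = [
    (261979024, 1217238, "2016-11-21T15:58:29Z", 190683531),
    (268021977, 900941, "2016-12-19T17:15:17Z", 190683531),
    (262233644, 3404817, "2016-11-22T13:21:04Z", 190683531),
]
conn.executemany(
    "INSERT INTO [issue_comments] ([id], [user], [updated_at], [issue]) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
result = conn.execute(
    "SELECT [id] FROM [issue_comments] "
    "WHERE [issue] = ? ORDER BY [updated_at] DESC",
    (190683531,),
).fetchall()
ids = [row[0] for row in result]
```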