issue_comments

6 rows where issue = 115210260 and user = 1217238 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
164635815 https://github.com/pydata/xarray/issues/645#issuecomment-164635815 https://api.github.com/repos/pydata/xarray/issues/645 MDEyOklzc3VlQ29tbWVudDE2NDYzNTgxNQ== shoyer 1217238 2015-12-15T03:39:49Z 2015-12-15T03:39:49Z MEMBER

Lazy loading, even of indices, can be pretty important -- sometimes calculating indices requires downloading a significant amount of data over a wire. I am reluctant to change it.

However, another possible way to fix the printing issue is to guarantee that index data always gets cast to a pandas.Index before accessing it, even if the next step is simply pulling out .values (a numpy array).
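A minimal sketch of that idea, assuming a hypothetical helper named index_values (it is not part of xray or pandas): every access goes through a pandas.Index first, so the element type comes from pandas rather than from a cached dtype. On recent pandas versions, a PeriodIndex yields Period scalars this way.

```python
import numpy as np
import pandas as pd


def index_values(data):
    """Hypothetical helper: route data through pandas.Index before taking .values."""
    # Reuse an existing Index; otherwise build one from the raw array.
    index = data if isinstance(data, pd.Index) else pd.Index(np.asarray(data))
    return index.values


periods = pd.period_range("2000-01", periods=3, freq="M")
period_values = index_values(periods)      # numpy array obtained via the PeriodIndex
plain_values = index_values([10, 20, 30])  # ordinary integers pass through unchanged
```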

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Display of PeriodIndex 115210260
154195290 https://github.com/pydata/xarray/issues/645#issuecomment-154195290 https://api.github.com/repos/pydata/xarray/issues/645 MDEyOklzc3VlQ29tbWVudDE1NDE5NTI5MA== shoyer 1217238 2015-11-05T21:16:32Z 2015-11-05T21:16:32Z MEMBER

yes, exactly

On Thu, Nov 5, 2015 at 1:10 PM, Maximilian Roos notifications@github.com wrote:

> OK, because we need the dtype before we've loaded the Index?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Display of PeriodIndex 115210260
154189059 https://github.com/pydata/xarray/issues/645#issuecomment-154189059 https://api.github.com/repos/pydata/xarray/issues/645 MDEyOklzc3VlQ29tbWVudDE1NDE4OTA1OQ== shoyer 1217238 2015-11-05T20:59:08Z 2015-11-05T20:59:08Z MEMBER

> Ha - maybe we'll never get there. One more push: in the comment above, can you see the differences between the two cases? One succeeds and one fails. The only difference is the length of the other coord. That's at least weird if not a bug?

Oh -- yes, I agree that is very strange. I have no idea why that is!

> What are your thoughts on making that change instead, then? Or too big a blast radius without more reflection? Currently only one test fails - setting a float32 dtype.

I think this will be a little tricky to change. The main subtlety is that currently we don't actually create the pandas.Index object until we load the entire index array into memory or need to do a lookup operation on the index. I'm not entirely sure this laziness is necessary, but it might be helpful -- it lets us defer loading some data from disk (or remote data sources) until absolutely necessary. The challenge then is ensuring that dtypes are preserved whether or not the array is cached -- we would need to figure out what the corresponding pandas type is even before we load the data necessary to create the Index object.
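The laziness described here can be sketched roughly as follows. LazyIndex, its loader argument, and the load function are all hypothetical names for illustration, not xray's implementation. The sketch assumes the dtype is declared up front, which sidesteps the hard part mentioned above (working out the pandas type before any data is loaded).

```python
import numpy as np
import pandas as pd


class LazyIndex:
    """Hypothetical wrapper that defers building a pandas.Index until needed."""

    def __init__(self, loader, dtype):
        self._loader = loader          # zero-argument callable returning array-like data
        self._dtype = np.dtype(dtype)  # declared before any data is loaded
        self._index = None             # cached pandas.Index, built on first use

    @property
    def dtype(self):
        # Answering dtype queries never triggers a load.
        return self._dtype

    @property
    def loaded(self):
        return self._index is not None

    def to_index(self):
        if self._index is None:
            self._index = pd.Index(np.asarray(self._loader()))
        return self._index

    @property
    def values(self):
        # Cast back so the reported dtype holds whether or not the index is cached.
        return np.asarray(self.to_index().values, dtype=self._dtype)


calls = []


def load():
    calls.append("load")  # record each (possibly expensive) load
    return np.arange(3, dtype="int32")


lazy = LazyIndex(load, "int32")
dtype_before_load = lazy.dtype
loaded_before = lazy.loaded
first_values = lazy.values
second_values = lazy.values  # served from the cached index
```

Here the loader runs only once, on the first access to values, and dtype stays int32 both before and after the load.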

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Display of PeriodIndex 115210260
154182414 https://github.com/pydata/xarray/issues/645#issuecomment-154182414 https://api.github.com/repos/pydata/xarray/issues/645 MDEyOklzc3VlQ29tbWVudDE1NDE4MjQxNA== shoyer 1217238 2015-11-05T20:34:53Z 2015-11-05T20:34:53Z MEMBER

When I originally wrote that code, pandas didn't have Float64Index and would use dtype=object. Now, the need for this sort of thing is definitely less pressing.

> The unanswered question is why the code accesses the items from this coord when it's repr-ing differently, depending on the length of the other coord.

Sorry, I still don't understand exactly what you're referring to! This does sound pretty bizarre, though -- possibly a bug.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Display of PeriodIndex 115210260
154149878 https://github.com/pydata/xarray/issues/645#issuecomment-154149878 https://api.github.com/repos/pydata/xarray/issues/645 MDEyOklzc3VlQ29tbWVudDE1NDE0OTg3OA== shoyer 1217238 2015-11-05T18:46:32Z 2015-11-05T18:46:32Z MEMBER

> Do you know why this line https://github.com/xray/xray/blob/master/xray/core/indexing.py#L400 isn't just value?

This line is basically there to work around cases where pandas stores an array in an index with a different dtype. For example, consider this dataset with an int32 coordinate:

In [10]: xray.Dataset({'x': np.arange(3, dtype='int32')}).x.dtype
Out[10]: dtype('int32')

Under the covers, there's an int64 index (pandas doesn't have Int32Index):

In [11]: xray.Dataset({'x': np.arange(3, dtype='int32')}).indexes['x']
Out[11]: Int64Index([0, 1, 2], dtype='int64', name=u'x')

This line ensures that we cast back to the original dtype when we get .values from the data.

In this case, I think a simple fix for PandasIndexAdapter would be to update its dtype so it reports object instead of int64 if it's holding a PeriodIndex. Then the casting should work properly.
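A rough sketch of the suggested fix, using a hypothetical stand-in class (IndexAdapterSketch is not the real PandasIndexAdapter, whose internals are not shown in this thread). It reports object when holding a PeriodIndex and otherwise casts .values back to the requested dtype. The exact dtype a PeriodIndex itself reports depends on the pandas version.

```python
import numpy as np
import pandas as pd


class IndexAdapterSketch:
    """Hypothetical stand-in for PandasIndexAdapter, for illustration only."""

    def __init__(self, index, dtype=None):
        self.index = pd.Index(index)
        if dtype is None:
            index_dtype = self.index.dtype
            # A PeriodIndex holds Period scalars, so report object for it
            # (and for any other dtype that is not a plain numpy dtype).
            if isinstance(self.index, pd.PeriodIndex) or not isinstance(index_dtype, np.dtype):
                dtype = object
            else:
                dtype = index_dtype
        self.dtype = np.dtype(dtype)

    @property
    def values(self):
        # Cast back to the dtype the adapter reports, as the line discussed above does.
        return np.asarray(self.index.values, dtype=self.dtype)


int_adapter = IndexAdapterSketch(np.arange(3, dtype="int32"), dtype="int32")
period_adapter = IndexAdapterSketch(pd.period_range("2000-01", periods=3, freq="M"))
```

With this, int_adapter.values keeps int32 regardless of how pandas stores the index, and period_adapter.dtype is object, so the cast leaves the Period scalars intact.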

> Do you know why it's trying to pull a value from the index when it prints?

I'm not entirely sure what you're referring to here -- which line(s) of code are surprising you?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Display of PeriodIndex 115210260
153980946 https://github.com/pydata/xarray/issues/645#issuecomment-153980946 https://api.github.com/repos/pydata/xarray/issues/645 MDEyOklzc3VlQ29tbWVudDE1Mzk4MDk0Ng== shoyer 1217238 2015-11-05T07:55:27Z 2015-11-05T07:55:27Z MEMBER

I have not tried using xray with pandas's PeriodIndex before. On the whole, I'm not a really big fan of PeriodIndex -- IntervalIndex (https://github.com/pydata/pandas/pull/8707) will be a more general solution that allows for arbitrary interval bounds.

The broken thing about PeriodIndex is that it lies and claims to have int64 dtype even though it consists of Period scalars:

In [3]: pd.period_range('2000', freq='Y', periods=3).dtype
Out[3]: dtype('int64')

I suppose pandas is unlikely to fix this in the immediate future (though I would argue that it really should). In the meantime, do you have any interest in working on a fix for this? I suspect this would be relatively straightforward -- you'll simply need a workaround or two to explicitly handle PeriodIndex.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Display of PeriodIndex 115210260

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
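As an illustration of how the filter on this page (issue = 115210260, user = 1217238, sorted by updated_at descending) maps onto the schema, here is a small sketch using Python's sqlite3 module. It assumes an in-memory database with a trimmed copy of the table (only four of the columns) and three of the rows shown above; the query text is a guess at the equivalent SQL, not the query Datasette actually ran.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed-down copy of the schema: only the columns the filter touches.
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [id] INTEGER PRIMARY KEY,
       [user] INTEGER,
       [updated_at] TEXT,
       [issue] INTEGER
    );
    CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
    CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
    """
)

# Three of the six rows listed on this page, inserted out of order.
rows = [
    (153980946, 1217238, "2015-11-05T07:55:27Z", 115210260),
    (164635815, 1217238, "2015-12-15T03:39:49Z", 115210260),
    (154195290, 1217238, "2015-11-05T21:16:32Z", 115210260),
]
conn.executemany(
    "INSERT INTO [issue_comments] ([id], [user], [updated_at], [issue]) VALUES (?, ?, ?, ?)",
    rows,
)

query = """
    SELECT [id] FROM [issue_comments]
    WHERE [issue] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
"""
ids = [row[0] for row in conn.execute(query, (115210260, 1217238))]
```

Because the timestamps are ISO 8601 strings in UTC, sorting them as text gives chronological order, so the most recently updated comment comes first.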
Powered by Datasette · Queries took 160.402ms · About: xarray-datasette