issue_comments

3 rows where author_association = "MEMBER", issue = 134359597 and user = 1217238 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
191826844 https://github.com/pydata/xarray/issues/767#issuecomment-191826844 https://api.github.com/repos/pydata/xarray/issues/767 MDEyOklzc3VlQ29tbWVudDE5MTgyNjg0NA== shoyer 1217238 2016-03-03T16:00:50Z 2016-03-03T16:00:50Z MEMBER

If you try doing that indexing with a pandas.Series, you actually get an error message:

```
In [71]: s.loc['bar', slice(4000, 4100.3)]

...

/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1363             # nested tuple slicing
   1364             if is_nested_tuple(key, labels):
-> 1365                 locs = labels.get_locs(key)
   1366                 indexer = [ slice(None) ] * self.ndim
   1367                 indexer[axis] = locs

/Users/shoyer/conda/envs/xarray-dev/lib/python3.5/site-packages/pandas/core/index.py in get_locs(self, tup)
   5692         if not self.is_lexsorted_for_tuple(tup):
   5693             raise KeyError('MultiIndex Slicing requires the index to be fully lexsorted'
-> 5694                            ' tuple len ({0}), lexsort depth ({1})'.format(len(tup), self.lexsort_depth))
   5695 
   5696         # indexer

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'
```

I guess it's also worth investigating get_locs as an alternative or companion to get_loc_level.
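For reference, here is a minimal sketch of those two pandas MultiIndex methods; the toy index (the 'band'/'wavenumber' level names and values) is invented for illustration. get_loc_level selects on named levels and can drop them from the result, while get_locs takes one key per level (scalars, slices, or lists) but requires a lexsorted index:

```
import pandas as pd

# Toy MultiIndex; level names and values are made up for illustration.
idx = pd.MultiIndex.from_product(
    [["bar", "foo"], [4000.0, 4100.3, 4200.0]],
    names=["band", "wavenumber"],
)
s = pd.Series(range(len(idx)), index=idx)

# get_loc_level: select one level by name, optionally dropping it from the
# result (returns a positional indexer plus the new, reduced index).
indexer, new_index = idx.get_loc_level("bar", level="band", drop_level=True)
print(pd.Series(s.values[indexer], index=new_index))

# get_locs: one key per level, so it can express the mixed label/slice query
# that raised the KeyError above -- provided the index is fully lexsorted.
locs = idx.get_locs(["bar", slice(4000.0, 4100.3)])
print(s.iloc[locs])
```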

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex and data selection 134359597
191315935 https://github.com/pydata/xarray/issues/767#issuecomment-191315935 https://api.github.com/repos/pydata/xarray/issues/767 MDEyOklzc3VlQ29tbWVudDE5MTMxNTkzNQ== shoyer 1217238 2016-03-02T16:32:43Z 2016-03-02T16:32:43Z MEMBER

The good news about writing our own custom way to select levels is that because we can avoid the stack/unstack round trip, we can simply omit unused levels without worrying about doing dropna with unstack. So as long as we are implementing this in our own method (e.g., sel or xs), we can default to drop_level=True.

I would be OK with xs, but da.xs('bar', dim='band_wavenumber', level='band') feels much more verbose to me than da.sel(band_wavenumber={'band': 'bar'}). The latter solution involves inventing no new API, and because dictionaries are not hashable there's no potential conflict with existing functionality.
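To make the comparison concrete, here is a rough sketch of how a {level_name: label} dict could be lowered onto pandas' existing .xs machinery; the helper name sel_levels and the toy index are invented, and this is not xarray code:

```
import pandas as pd

# Toy Series with a ('band', 'wavenumber') MultiIndex; names and values are
# invented for illustration.
idx = pd.MultiIndex.from_product(
    [["bar", "foo"], [4000.0, 4100.3]], names=["band", "wavenumber"]
)
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

def sel_levels(series, level_labels):
    # Hypothetical helper mirroring the proposed da.sel(dim={'band': 'bar'}):
    # translate a {level_name: label} dict into a pandas .xs call, dropping
    # the selected levels from the result (the drop_level=True behavior
    # discussed above).
    levels, labels = zip(*level_labels.items())
    return series.xs(tuple(labels), level=list(levels), drop_level=True)

print(sel_levels(s, {"band": "bar"}))  # result is indexed by 'wavenumber' only
```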

Last year at the SciPy conference sprints, @jonathanrocher was working on adding similar dictionary support to .loc in pandas (i.e., da.loc[{'band': 'bar'}]). I don't think he ever finished that PR, but he might have a branch worth looking at as a starting point.

> I think that this solution is better than, e.g., directly providing index level names as arguments of the sel method. This may be confusing and there may be conflict when different dimensions have the same index level names.

This is a fair point, but such scenarios are unlikely to appear in practice. We might be able to, for example, update our handling of MultiIndexes to guarantee that level names cannot conflict with other variables. This might be done by inserting dummy variables of some sort into the _coords dict whenever a MultiIndex is added. It would take some work to ensure this works smoothly, though.
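Purely as an illustrative sketch (the function name and the shape of the coords mapping are invented here, not xarray internals), such a guard might look like:

```
import pandas as pd

def check_level_name_conflicts(coords, new_name, index):
    # Hypothetical guard: when a MultiIndex coordinate is added, make sure
    # its level names do not collide with existing coordinates/variables,
    # so that level names can later be used safely as selection keys.
    if not isinstance(index, pd.MultiIndex):
        return
    for level in index.names:
        if level is not None and level in coords and level != new_name:
            raise ValueError(
                "MultiIndex level name {!r} conflicts with an existing "
                "coordinate or variable".format(level)
            )
```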

> Besides this, it would be nice if the drop_level=True behavior could be applied by default to any selection (i.e., also when using loc, sel, etc.), like with pandas. I don't know how pandas does this (I'll look into that), but at first glance this would imply checking, for each dimension, whether it has a multi-index and then checking the labels for each index level.

Yes, agreed. Unfortunately the pandas code that handles this is a complete mess of spaghetti code (see pandas/core/indexing.py). So you are welcome to try decoding it, but in my opinion you might be better off starting from scratch. In xarray, the function convert_label_indexer would need an updated interface that allows it to possibly return a new pandas.Index object to replace the existing index.
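A rough sketch of that interface change, assuming the dict-based selection discussed above as the input format (simplified and illustrative, not the actual xarray function):

```
import pandas as pd

def convert_label_indexer(index, label):
    # Hypothetical simplified interface: return a positional indexer plus an
    # optional replacement pandas.Index. For a dict of MultiIndex level
    # labels, the replacement index has the selected levels dropped; only
    # the scalar-label case is shown for the non-dict branch, for brevity.
    if isinstance(index, pd.MultiIndex) and isinstance(label, dict):
        levels, labels = zip(*label.items())
        indexer, new_index = index.get_loc_level(
            tuple(labels), level=list(levels), drop_level=True
        )
        return indexer, new_index  # caller would replace the dimension's index
    return index.get_loc(label), None
```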

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex and data selection 134359597
185796850 https://github.com/pydata/xarray/issues/767#issuecomment-185796850 https://api.github.com/repos/pydata/xarray/issues/767 MDEyOklzc3VlQ29tbWVudDE4NTc5Njg1MA== shoyer 1217238 2016-02-18T16:16:00Z 2016-02-18T16:16:00Z MEMBER

This is a really good point that honestly I had not thought carefully about before. I agree that it would be very nice to have this behavior, though. This will require a bit of internal refactoring to pass on the level information to the MultiIndex during indexing.

To remove unused levels after unstacking, you need to add an explicit dropna, e.g., da.unstack('band_wavenumber').dropna('band_wavenumber'). This is definitely a break from pandas, but for a good reason (IMO). See here for discussion on this point.

I raised another issue for the bug related to copying MultiIndex that you had in the earlier version of this PR (#769).

More broadly, if you care about MultiIndex support, it would be great to get some help pushing it. I'm happy to answer questions, but I'm at a new job and don't have a lot of time to work on new development.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex and data selection 134359597

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);