issue_comments


6 rows where issue = 134359597 and user = 4160723 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
247031135 https://github.com/pydata/xarray/issues/767#issuecomment-247031135 https://api.github.com/repos/pydata/xarray/issues/767 MDEyOklzc3VlQ29tbWVudDI0NzAzMTEzNQ== benbovy 4160723 2016-09-14T14:28:29Z 2016-09-14T14:28:29Z MEMBER

Fixed in #802 and #947.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex and data selection 134359597
191625512 https://github.com/pydata/xarray/issues/767#issuecomment-191625512 https://api.github.com/repos/pydata/xarray/issues/767 MDEyOklzc3VlQ29tbWVudDE5MTYyNTUxMg== benbovy 4160723 2016-03-03T07:24:34Z 2016-03-03T07:24:34Z MEMBER

From this point of view I agree that da.sel(band_wavenumber={'band': 'bar'}) is a nicer solution! I'll follow your suggestion of returning a new pandas.Index object from convert_label_indexer.

Unless I'm missing a better solution, we can use pandas.MultiIndex.get_loc_level to get both the indexer and the new pandas.Index object. However, there may still be some advanced cases where it won't behave as expected. For example, selecting both the band 'bar' and a range of wavenumber values (one that doesn't exactly match the range of that band) with

da.sel(band_wavenumber={'band': 'bar', 'wavenumber': slice(4000, 4100.3)})

will a priori return a stacked DataArray with the full multi-index:

In [32]: idx = da.band_wavenumber.to_index()

In [33]: idx.get_loc_level(('bar', slice(4000, 4100.3)), level=('band', 'wavenumber'))
Out[33]:
(array([False, False,  True,  True, False], dtype=bool),
 MultiIndex(levels=[['bar', 'foo'], [4050.2, 4050.3, 4100.1, 4100.3, 4100.5]],
            labels=[[0, 0], [2, 3]],
            names=['band', 'wavenumber']))
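For comparison, here is a minimal pandas-only sketch of the simpler single-level case, where get_loc_level returns both an indexer and an index with the selected level removed. The ordering of the index entries is reconstructed from the boolean mask shown above, and the two 'foo' data values are placeholders; both are assumptions rather than data taken from the issue.

```python
import numpy as np
import pandas as pd

# Index order reconstructed from the boolean mask above (an assumption):
# two 'foo' entries followed by three 'bar' entries.
idx = pd.MultiIndex.from_tuples(
    [("foo", 4050.2), ("foo", 4050.3),
     ("bar", 4100.1), ("bar", 4100.3), ("bar", 4100.5)],
    names=["band", "wavenumber"],
)

# Data aligned with idx; the two 'foo' values are made-up placeholders.
values = np.array([2.0e-4, 1.5e-4, 1.2e-4, 1.0e-4, 8.5e-5])

# A scalar key on a single level returns a pair:
# (indexer into the original index, index with that level dropped).
indexer, new_index = idx.get_loc_level("bar", level="band")

# The indexer can be a slice or a boolean mask depending on how the
# index is sorted; NumPy indexing accepts either form.
selected = values[indexer]
```

Here new_index is a plain pandas.Index named 'wavenumber', which is the kind of object convert_label_indexer could hand back alongside the indexer.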

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex and data selection 134359597
191303962 https://github.com/pydata/xarray/issues/767#issuecomment-191303962 https://api.github.com/repos/pydata/xarray/issues/767 MDEyOklzc3VlQ29tbWVudDE5MTMwMzk2Mg== benbovy 4160723 2016-03-02T16:05:31Z 2016-03-02T16:05:31Z MEMBER

OK, I've read the discussion you referred to more carefully, and now I understand why it is preferable to call dropna explicitly. My last suggestion above is not compatible with this.

The xs method (not sure about the name) may still provide a concise way to perform a selection with explicit unstack and dropna. Maybe it is more appropriate to use dropna instead of drop_level:

da.xs('bar', dim='band_wavenumber', level='band', dropna=True)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex and data selection 134359597
191239041 https://github.com/pydata/xarray/issues/767#issuecomment-191239041 https://api.github.com/repos/pydata/xarray/issues/767 MDEyOklzc3VlQ29tbWVudDE5MTIzOTA0MQ== benbovy 4160723 2016-03-02T13:31:36Z 2016-03-02T13:31:36Z MEMBER

Thinking about this issue, I'd like to know what you think of the suggestions below before considering any pull request.

The following line of code gives the same result as in my previous comment, but it is more explicit and shorter:

da.unstack('band_wavenumber').sel(band='bar').dropna('wavenumber', how='any')

A nice shortcut to this would be adding a new xs method to DataArray and Dataset, which would be quite similar to the xs method of Pandas but here with an additional dim keyword argument:

da.xs('bar', dim='band_wavenumber', level='band', drop_level=True)

Like Pandas, the default value of drop_level would be True. But here drop_level rather sets whether or not to apply dropna to all (unstacked) index levels of dim except the specified level.
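To make the proposal concrete, here is a rough sketch of what such a shortcut could do, written as a standalone function. The function xs below is hypothetical and is not part of xarray; it simply wraps the unstack / sel / dropna chain. The test array is built by stacking a dense array and dropping the missing pairs, and the 'foo' values are placeholders.

```python
import numpy as np
import xarray as xr


def xs(da, key, dim, level, drop_level=True):
    """Hypothetical cross-section helper (not an xarray method)."""
    # Level names of the multi-index attached to `dim`.
    level_names = list(da.indexes[dim].names)
    # Unstack into one dimension per level, then select on the chosen level.
    selected = da.unstack(dim).sel({level: key})
    if drop_level:
        # Drop labels of the remaining levels that hold missing values.
        for other in level_names:
            if other != level:
                selected = selected.dropna(other, how="any")
    return selected


nan = np.nan
dense = xr.DataArray(
    [[nan, nan, 1.2e-4, 1.0e-4, 8.5e-5],   # band 'bar'
     [2.0e-4, 1.5e-4, nan, nan, nan]],     # band 'foo' (placeholder values)
    dims=("band", "wavenumber"),
    coords={"band": ["bar", "foo"],
            "wavenumber": [4050.2, 4050.3, 4100.1, 4100.3, 4100.5]},
)
# Stack and drop the missing pairs to get a 1-D array over a multi-index.
da = dense.stack(band_wavenumber=("band", "wavenumber")).dropna("band_wavenumber")

result = xs(da, "bar", dim="band_wavenumber", level="band")
```

With drop_level=False the result keeps all five wavenumber labels, including those that are NaN for band 'bar'.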

I think that this solution is better than, e.g., directly providing index level names as arguments of the sel method. The latter may be confusing, and there may be conflicts when different dimensions have the same index level names.

Another, though less elegant, solution would be to provide dictionaries to the sel method:

da.sel(band_wavenumber={'band': 'bar'})

Besides this, it would be nice if the drop_level=True behavior could be applied by default to any selection (i.e., also when using loc, sel, etc.), like with Pandas. I don't know how Pandas does this (I'll look into that), but at first glance this would here imply checking for each dimension whether it has a multi-index and then checking the labels for each index level.
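The first half of that check (finding which dimensions carry a multi-index) could look roughly like the sketch below. The helper name multiindex_dims is made up for illustration, and the sketch does not attempt the second half, i.e. matching labels against each index level.

```python
import numpy as np
import pandas as pd
import xarray as xr


def multiindex_dims(da):
    """Hypothetical helper: map each multi-indexed dimension to its level names."""
    found = {}
    for dim in da.dims:
        # Dimensions without a coordinate have no index, hence .get().
        index = da.indexes.get(dim)
        if isinstance(index, pd.MultiIndex):
            found[dim] = list(index.names)
    return found


dense = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("band", "wavenumber"),
    coords={"band": ["bar", "foo"], "wavenumber": [4100.1, 4100.3, 4100.5]},
)
stacked = dense.stack(band_wavenumber=("band", "wavenumber"))

found_dense = multiindex_dims(dense)      # no multi-indexed dimension
found_stacked = multiindex_dims(stacked)  # one multi-indexed dimension
```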

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex and data selection 134359597
185826486 https://github.com/pydata/xarray/issues/767#issuecomment-185826486 https://api.github.com/repos/pydata/xarray/issues/767 MDEyOklzc3VlQ29tbWVudDE4NTgyNjQ4Ng== benbovy 4160723 2016-02-18T17:31:29Z 2016-02-18T17:31:29Z MEMBER

Thanks for the tip. So I finally obtain the desired result when selecting the band 'bar' by doing this:

In [21]: (da.sel(band_wavenumber='bar')
    ...:    .unstack('band_wavenumber')
    ...:    .dropna('band', how='all')
    ...:    .dropna('wavenumber', how='any')
    ...:    .sel(band='bar'))
Out[21]:
<xarray.DataArray (wavenumber: 3)>
array([  1.20000000e-04,   1.00000000e-04,   8.50000000e-05])
Coordinates:
    band        <U3 'bar'
  * wavenumber  (wavenumber) float64 4.1e+03 4.1e+03 4.1e+03

But it's still a lot of code to write for such a common operation.

I'd be happy to think more deeply about this and contribute to the development of this great package (within the limits of my skills)!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex and data selection 134359597
185677090 https://github.com/pydata/xarray/issues/767#issuecomment-185677090 https://api.github.com/repos/pydata/xarray/issues/767 MDEyOklzc3VlQ29tbWVudDE4NTY3NzA5MA== benbovy 4160723 2016-02-18T11:38:53Z 2016-02-18T11:38:53Z MEMBER

Mmm now I'm wondering if the problem I explained above isn't just related to the 3rd TODO item in #719 (make levels accessible as coordinate variables).

Sorry for the post if it is the case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex and data selection 134359597


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);