
issue_comments

14 rows where issue = 166439490 sorted by updated_at descending

user 3

  • crusaderky 7
  • shoyer 6
  • stale[bot] 1

author_association 2

  • MEMBER 13
  • NONE 1

issue 1

  • unstack() sorts data alphabetically · 14
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
457183389 https://github.com/pydata/xarray/issues/906#issuecomment-457183389 https://api.github.com/repos/pydata/xarray/issues/906 MDEyOklzc3VlQ29tbWVudDQ1NzE4MzM4OQ== stale[bot] 26384082 2019-01-24T12:43:22Z 2019-01-24T12:43:22Z NONE

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  unstack() sorts data alphabetically 166439490
269507466 https://github.com/pydata/xarray/issues/906#issuecomment-269507466 https://api.github.com/repos/pydata/xarray/issues/906 MDEyOklzc3VlQ29tbWVudDI2OTUwNzQ2Ng== shoyer 1217238 2016-12-28T17:09:23Z 2016-12-28T17:09:23Z MEMBER

@crusaderky can you raise the issue again on the pandas issue tracker (see my comment in https://github.com/pandas-dev/pandas/issues/14903#issuecomment-267779151)? If need be, we can change this separately, but all things being equal I would prefer to keep unstack() consistent between pandas and xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  unstack() sorts data alphabetically 166439490
269479071 https://github.com/pydata/xarray/issues/906#issuecomment-269479071 https://api.github.com/repos/pydata/xarray/issues/906 MDEyOklzc3VlQ29tbWVudDI2OTQ3OTA3MQ== crusaderky 6213168 2016-12-28T13:46:19Z 2016-12-28T13:46:19Z MEMBER

@shoyer, are you happy for me to go ahead and change unstack() to respect the order of the first found series?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  unstack() sorts data alphabetically 166439490
234687071 https://github.com/pydata/xarray/issues/906#issuecomment-234687071 https://api.github.com/repos/pydata/xarray/issues/906 MDEyOklzc3VlQ29tbWVudDIzNDY4NzA3MQ== crusaderky 6213168 2016-07-23T00:27:49Z 2016-07-23T00:27:49Z MEMBER

Thanks, didn't know

https://gist.github.com/crusaderky/002ba64ee270164931d32ea3366dce1f

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  unstack() sorts data alphabetically 166439490
234686759 https://github.com/pydata/xarray/issues/906#issuecomment-234686759 https://api.github.com/repos/pydata/xarray/issues/906 MDEyOklzc3VlQ29tbWVudDIzNDY4Njc1OQ== shoyer 1217238 2016-07-23T00:24:17Z 2016-07-23T00:24:17Z MEMBER

@crusaderky gist.github.com will render ipynb files, which makes them much easier to view!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  unstack() sorts data alphabetically 166439490
234686438 https://github.com/pydata/xarray/issues/906#issuecomment-234686438 https://api.github.com/repos/pydata/xarray/issues/906 MDEyOklzc3VlQ29tbWVudDIzNDY4NjQzOA== crusaderky 6213168 2016-07-23T00:20:41Z 2016-07-23T00:20:41Z MEMBER

Fixed in attachment. The code uses the first found series as the order.

proper_unstack.zip

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  unstack() sorts data alphabetically 166439490
234004910 https://github.com/pydata/xarray/issues/906#issuecomment-234004910 https://api.github.com/repos/pydata/xarray/issues/906 MDEyOklzc3VlQ29tbWVudDIzNDAwNDkxMA== crusaderky 6213168 2016-07-20T16:33:15Z 2016-07-20T16:33:15Z MEMBER

I see. I'll see if I can think of a good way to cope with your two examples. BTW, my code above is buggy, as it blindly assumes that the first dim is also the outermost.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  unstack() sorts data alphabetically 166439490
233994941 https://github.com/pydata/xarray/issues/906#issuecomment-233994941 https://api.github.com/repos/pydata/xarray/issues/906 MDEyOklzc3VlQ29tbWVudDIzMzk5NDk0MQ== shoyer 1217238 2016-07-20T15:58:15Z 2016-07-20T15:58:15Z MEMBER

Here are two examples where we would need to do pick-by-index on the data no matter what:

```python
import pandas

def demo_unstack(index):
    index = pandas.MultiIndex.from_tuples(index, names=['x', 'count'])
    s = pandas.Series(list(range(len(index))), index)
    print(s.unstack())
```

Here there is no consistent order of appearance for the count level:

```python
demo_unstack([
    ['x0', 'first' ],
    ['x0', 'second'],
    ['x0', 'third' ],
    ['x1', 'third' ],
    ['x1', 'second'],
    ['x1', 'first' ],
])
```

```
count  first  second  third
x
x0         0       1      2
x1         5       4      3
```

Even more pathological: the multi-index doesn't even fill out every value in the cartesian product:

```python
demo_unstack([
    ['x1', 'first' ],
    ['x1', 'second'],
    ['x0', 'first' ],
])
```

```
count  first  second
x
x0       2.0     NaN
x1       0.0     1.0
```
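Both cases can be handled by a pick-by-index on the data. A minimal sketch, assuming a two-level pandas Series (the helper name unstack_scatter is hypothetical, not a pandas or xarray API): each level is numbered in order of first appearance with pd.factorize(sort=False), and every value is then scattered into its (row, column) slot, leaving missing combinations as NaN.

```python
import numpy as np
import pandas as pd


def unstack_scatter(s: pd.Series) -> pd.DataFrame:
    """Unstack a two-level Series, ordering labels by first appearance.

    Hypothetical helper: each level's labels are numbered with
    pd.factorize(sort=False), then every value is scattered into its
    (row, column) slot, i.e. a pick-by-index on the data.
    """
    outer_codes, outer_labels = pd.factorize(s.index.get_level_values(0), sort=False)
    inner_codes, inner_labels = pd.factorize(s.index.get_level_values(1), sort=False)
    # Combinations missing from the cartesian product stay NaN.
    out = np.full((len(outer_labels), len(inner_labels)), np.nan)
    out[outer_codes, inner_codes] = s.to_numpy()
    return pd.DataFrame(out, index=outer_labels, columns=inner_labels)


# The second example above: ('x0', 'second') is missing.
index = pd.MultiIndex.from_tuples(
    [('x1', 'first'), ('x1', 'second'), ('x0', 'first')], names=['x', 'count'])
print(unstack_scatter(pd.Series([0, 1, 2], index)))
```

Because the output is allocated with NaN up front, the result is always float, which matches what pandas does for the incomplete case.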

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  unstack() sorts data alphabetically 166439490
233904555 https://github.com/pydata/xarray/issues/906#issuecomment-233904555 https://api.github.com/repos/pydata/xarray/issues/906 MDEyOklzc3VlQ29tbWVudDIzMzkwNDU1NQ== crusaderky 6213168 2016-07-20T09:52:42Z 2016-07-20T09:52:42Z MEMBER

This preamble should be integrated inside unstack():

```python
import operator
from functools import reduce

import pandas


def proper_unstack(array, dim):
    # Regenerate Pandas multi-index to be ordered by appearance
    # TODO: check that the stacked coords repeat periodically
    # TODO: write a faster/cleaner algorithm using numpy
    # (older pandas API: MultiIndex.labels was renamed MultiIndex.codes in pandas 0.24)
    mindex = array.coords[dim].to_pandas().index

    levels = []
    labels = []
    for dim_i, (levels_i, labels_i) in enumerate(zip(mindex.levels, mindex.labels)):
        step_inner = reduce(operator.mul, (len(lvl) for lvl in mindex.levels[dim_i + 1:]), 1)
        step_outer = reduce(operator.mul, (len(lvl) for lvl in mindex.levels[:dim_i]), 1)

        levels.append([levels_i[labels_i[j]] for j in range(0, levels_i.size * step_inner, step_inner)])
        labels.append(reduce(operator.add, ([j] * step_inner for j in range(levels_i.size))) * step_outer)

    mindex = pandas.MultiIndex(levels, labels, names=mindex.names)
    array = array.copy()
    array.coords[dim] = mindex
    return array.unstack(dim)


proper_unstack(a, 'dim_0')  # `a`: the stacked DataArray under discussion, not defined here
```

```
<xarray.DataArray (x: 2, count: 4)>
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
Coordinates:
  * x        (x) object 'x1' 'x0'
  * count    (count) object 'first' 'second' 'third' 'fourth'
```
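For comparison, a minimal pandas-only sketch of the same "order of appearance" rule, assuming a two-level Series (unstack_by_appearance is a hypothetical helper, not a pandas or xarray API). Instead of rebuilding the MultiIndex, it lets unstack() order rows and columns by the levels and then reindexes them by first appearance; Index.unique() returns labels in that order. Because reindex() picks values by label, it does not need the index to repeat periodically, at the cost of a pick-by-index on the data.

```python
import pandas as pd


def unstack_by_appearance(s: pd.Series) -> pd.DataFrame:
    """Unstack the inner level of a two-level Series, then restore the
    order in which labels first appear (hypothetical helper)."""
    # Index.unique() keeps labels in order of first appearance.
    outer = s.index.get_level_values(0).unique()
    inner = s.index.get_level_values(1).unique()
    # unstack() orders rows/columns by the MultiIndex levels (sorted when
    # the index is built with from_tuples); reindex restores appearance order.
    return s.unstack().reindex(index=outer, columns=inner)


index = pd.MultiIndex.from_tuples(
    [(x, c) for x in ['x1', 'x0'] for c in ['first', 'second', 'third', 'fourth']],
    names=['x', 'count'])
s = pd.Series(range(8), index)
print(unstack_by_appearance(s))
```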

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  unstack() sorts data alphabetically 166439490
233888081 https://github.com/pydata/xarray/issues/906#issuecomment-233888081 https://api.github.com/repos/pydata/xarray/issues/906 MDEyOklzc3VlQ29tbWVudDIzMzg4ODA4MQ== crusaderky 6213168 2016-07-20T08:42:19Z 2016-07-20T08:42:19Z MEMBER

the order of appearance should be what dictates the output.

> in the worst case (e.g., random order for the MultiIndex) we'll have this issue no matter what rule we pick for assigning unstacked coordinates.

Not true. Using the order of appearance requires you to do a pick-by-index on the index. At the moment, you're doing a pick-by-index on the data.
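The distinction can be sketched in NumPy. Assuming two levels that form a full cartesian product with the outer level varying slowest (unstack_periodic is a hypothetical helper), the labels for each new dimension can be picked from the stacked index by position, in order of appearance, while the data itself is only reshaped. When the labels do not repeat periodically, for example when one x value lists its count labels in a different order than another, the sketch raises instead, since only a pick-by-index on the data can handle that case.

```python
import numpy as np


def unstack_periodic(values, outer, inner):
    """Unstack 1-D data whose two index levels form a full cartesian
    product with `outer` varying slowest (hypothetical helper).

    Only the labels are picked by position; the data is reshaped.
    Raises ValueError when the labels do not repeat periodically.
    """
    values, outer, inner = np.asarray(values), np.asarray(outer), np.asarray(inner)
    # One period of the inner level = number of distinct inner labels.
    n_inner = len(np.unique(inner))
    if n_inner == 0 or len(values) % n_inner:
        raise ValueError("index is not a full cartesian product")
    n_outer = len(values) // n_inner
    # Pick-by-index on the index: labels in order of appearance.
    inner_labels = inner[:n_inner]
    outer_labels = outer[::n_inner]
    if (not np.array_equal(inner, np.tile(inner_labels, n_outer))
            or not np.array_equal(outer, np.repeat(outer_labels, n_inner))
            or len(np.unique(outer_labels)) != n_outer):
        raise ValueError("labels do not repeat periodically")
    # reshape() returns a view when `values` is contiguous.
    return values.reshape(n_outer, n_inner), outer_labels, inner_labels


values = np.arange(8)
data, x, count = unstack_periodic(
    values, ['x1'] * 4 + ['x0'] * 4, ['first', 'second', 'third', 'fourth'] * 2)
print(x, count)                        # ['x1' 'x0'] ['first' 'second' 'third' 'fourth']
print(np.shares_memory(values, data))  # True: reshape returned a view
```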

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  unstack() sorts data alphabetically 166439490
233797167 https://github.com/pydata/xarray/issues/906#issuecomment-233797167 https://api.github.com/repos/pydata/xarray/issues/906 MDEyOklzc3VlQ29tbWVudDIzMzc5NzE2Nw== shoyer 1217238 2016-07-19T23:29:57Z 2016-07-19T23:29:57Z MEMBER

> You're basically doing a pick-by-index rebuild of the array, which does potentially random access to the whole input array - thus nullifying the benefits of the CPU cache. This is compared to a numpy.ndarray.reshape(), which has the cost of a memcpy().

This is true, but in the worst case (e.g., random order for the MultiIndex) we'll have this issue no matter what rule we pick for assigning unstacked coordinates.

> I was going to add something about doing pick-by-index with a dask array will be even worse, when I realised that multiindex does not work at all when you chunk()... :(

MultiIndex should work with dask -- we have a few tests for this. If not, a bug report would be appreciated!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  unstack() sorts data alphabetically 166439490
233796557 https://github.com/pydata/xarray/issues/906#issuecomment-233796557 https://api.github.com/repos/pydata/xarray/issues/906 MDEyOklzc3VlQ29tbWVudDIzMzc5NjU1Nw== shoyer 1217238 2016-07-19T23:26:33Z 2016-07-19T23:26:33Z MEMBER

What behavior would you suggest as an alternative? I suppose that in principle we could assign new levels based on order of appearance (and treat levels as an implementation detail), but it's worth noting that this behavior for unstack() matches how pandas works:

```
>>> s.unstack()
count  first  fourth  second  third
x
x0         4       7       5      6
x1         0       3       1      2
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  unstack() sorts data alphabetically 166439490
233794061 https://github.com/pydata/xarray/issues/906#issuecomment-233794061 https://api.github.com/repos/pydata/xarray/issues/906 MDEyOklzc3VlQ29tbWVudDIzMzc5NDA2MQ== crusaderky 6213168 2016-07-19T23:11:57Z 2016-07-19T23:11:57Z MEMBER

This workaround works:

```python
import pandas
import xarray

# pandas >= 0.24 renamed the `labels` argument to `codes`
index2 = pandas.MultiIndex(
    levels=[['x0', 'x1'], ['first', 'second', 'third', 'fourth']],
    labels=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 0, 1, 2, 3]],
    names=['x', 'count'])
xarray.DataArray(pandas.Series(list(range(8)), index2)).unstack('dim_0')
```

```
<xarray.DataArray (x: 2, count: 4)>
array([[0, 1, 2, 3],
       [4, 5, 6, 7]], dtype=int64)
Coordinates:
  * x        (x) object 'x0' 'x1'
  * count    (count) object 'first' 'second' 'third' 'fourth'
```

However, I think the whole thing is incredibly convoluted, because everything looks right both when you display the original pandas Series/DataFrame and when you display the stacked DataArray. unstack() is letting an internal technicality of pandas produce a real change in the data.

I came across this issue because I am using pandas to load a multi-index CSV from disk and then convert it to an n-dimensional xarray. In this situation I have no control over the MultiIndex, short of manually rebuilding it after loading the CSV. The pandas DataFrame looks right, the stacked xarray looks right, but the unstacked xarray gets magically sorted :$

Also, I don't understand why you say there are no performance implications. You're basically doing a pick-by-index rebuild of the array, which does potentially random access to the whole input array, thus nullifying the benefits of the CPU cache. Compare that with numpy.ndarray.reshape(), which costs at most a memcpy().

I was going to add that doing pick-by-index on a dask array would be even worse, when I realised that MultiIndex does not work at all when you chunk()... :(
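To put the reshape-vs-pick-by-index distinction in concrete terms, a small NumPy sketch with illustrative arrays (the permutation `order` is made up, not taken from the thread). For a contiguous array, ndarray.reshape() typically returns a view over the same buffer, so no data is copied at all. Picking elements through an index array (fancy indexing), by contrast, always gathers them into a new buffer.

```python
import numpy as np

stacked = np.arange(8)  # contiguous 1-D stacked data

# reshape() on a contiguous array only changes shape/strides metadata:
# the result is a view over the same buffer, nothing is copied.
reshaped = stacked.reshape(2, 4)

# A pick-by-index rebuild gathers each element through an index array
# (fancy indexing), which always allocates a new buffer and may read
# the input in an arbitrary order.
order = np.array([4, 5, 6, 7, 0, 1, 2, 3])
picked = stacked[order].reshape(2, 4)

print(np.shares_memory(stacked, reshaped))  # True
print(np.shares_memory(stacked, picked))    # False
```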

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  unstack() sorts data alphabetically 166439490
233776163 https://github.com/pydata/xarray/issues/906#issuecomment-233776163 https://api.github.com/repos/pydata/xarray/issues/906 MDEyOklzc3VlQ29tbWVudDIzMzc3NjE2Mw== shoyer 1217238 2016-07-19T21:45:33Z 2016-07-19T21:45:33Z MEMBER

unstack sorts the data by the order of the labels in the MultiIndex's levels attribute. We don't calculate the order when calling unstack, so there shouldn't be any performance concerns on this side.

By default, pandas.MultiIndex creates each level in levels in sorted order, which is sometimes necessary to ensure indexing (especially slicing) works properly. But if you like, you can control this explicitly by using the MultiIndex constructor directly, e.g., index = pandas.MultiIndex(levels, labels). Does that solve your use case here?
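Building the index with explicit levels can also be automated. A minimal sketch, assuming pandas >= 0.24 (where the constructor takes codes= rather than labels=). The helper levels_by_appearance is hypothetical. It renumbers each level with pd.factorize(sort=False), so the levels list labels in order of first appearance while the index still holds the same tuples in the same order.

```python
import pandas as pd


def levels_by_appearance(index: pd.MultiIndex) -> pd.MultiIndex:
    """Rebuild a MultiIndex so each level lists its labels in order of
    first appearance (hypothetical helper; pandas >= 0.24 API)."""
    levels, codes = [], []
    for i in range(index.nlevels):
        # factorize(sort=False) numbers labels in the order they first occur
        level_codes, uniques = pd.factorize(index.get_level_values(i), sort=False)
        levels.append(uniques)
        codes.append(level_codes)
    # Same tuples, same order; only the levels/codes encoding changes.
    return pd.MultiIndex(levels=levels, codes=codes, names=index.names)


index = pd.MultiIndex.from_tuples(
    [('x1', 'first'), ('x1', 'second'), ('x0', 'first')], names=['x', 'count'])
print(list(index.levels[0]))                        # ['x0', 'x1'] (sorted)
print(list(levels_by_appearance(index).levels[0]))  # ['x1', 'x0']
```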

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  unstack() sorts data alphabetically 166439490

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 12.044ms · About: xarray-datasette