issue_comments
24 rows where issue = 262642978 and user = 1217238 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
523240818 | https://github.com/pydata/xarray/issues/1603#issuecomment-523240818 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDUyMzI0MDgxOA== | shoyer 1217238 | 2019-08-21T00:00:43Z | 2021-03-03T16:46:25Z | MEMBER | Explicitly propagating indexes requires going through most of xarray's source code and auditing each time we create a Dataset or DataArray object with low-level operations. We have some pretty decent testing functions for this in the form of Here's our current progress:
- [x] most of |
{ "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
549179102 | https://github.com/pydata/xarray/issues/1603#issuecomment-549179102 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDU0OTE3OTEwMg== | shoyer 1217238 | 2019-11-03T21:12:25Z | 2019-11-03T21:12:25Z | MEMBER | I'm not working on any of these right now. You might start with a few of the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
450702503 | https://github.com/pydata/xarray/issues/1603#issuecomment-450702503 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ1MDcwMjUwMw== | shoyer 1217238 | 2019-01-01T00:54:27Z | 2019-01-01T00:54:27Z | MEMBER | I'm starting to make these changes incrementally -- the first step is in https://github.com/pydata/xarray/pull/2639. |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
444204957 | https://github.com/pydata/xarray/issues/1603#issuecomment-444204957 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0NDIwNDk1Nw== | shoyer 1217238 | 2018-12-04T18:25:33Z | 2018-12-04T18:25:33Z | MEMBER |
I discussed this a little bit above in https://github.com/pydata/xarray/issues/1603#issuecomment-442661526, under "MultiIndex as part of the data schema". I agree that the default behavior should still be to create automatic indexes only for 1d coordinates matching dimension names. But we still will have (rare?) cases where "multiple single indexes" could arise from combining arguments with different indexes. For example, suppose the I guess the error is probably the best idea.
This is indeed the historical genesis, but I agree that this is confusing and we should deprecate/remove it. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
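The default rule described in the comment above (automatic indexes only for 1d coordinates whose name matches their dimension) can be sketched as follows. This is a minimal illustration, not xarray's implementation: `default_indexes` and the `(dims, values)` mapping are hypothetical stand-ins for xarray coordinates, and only `pandas.Index` is a real API.

```python
import pandas as pd


def default_indexes(coords):
    """Build default indexes for 1-D coordinates named after their own dimension.

    `coords` is a hypothetical stand-in for xarray coordinates: a mapping from
    coordinate name to a `(dims, values)` pair, where `dims` is a tuple of
    dimension names.
    """
    return {
        name: pd.Index(values, name=name)
        for name, (dims, values) in coords.items()
        if dims == (name,)  # 1-D and named after its own dimension
    }


coords = {
    "x": (("x",), [10, 20, 30]),           # dimension coordinate: indexed
    "station": (("x",), ["a", "b", "c"]),  # 1-D, but not named after its dimension
    "y": (("y",), [0.5, 1.5]),             # dimension coordinate: indexed
}
indexes = default_indexes(coords)
print(sorted(indexes))  # ['x', 'y']
```

Under this rule `station` gets no index by default, so "multiple single indexes" along `x` could only arise from indexes assigned explicitly or carried in from other arguments.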
443044579 | https://github.com/pydata/xarray/issues/1603#issuecomment-443044579 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MzA0NDU3OQ== | shoyer 1217238 | 2018-11-30T00:24:39Z | 2018-11-30T00:24:39Z | MEMBER | I wonder if we should also change the default value of the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442965602 | https://github.com/pydata/xarray/issues/1603#issuecomment-442965602 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0Mjk2NTYwMg== | shoyer 1217238 | 2018-11-29T19:38:34Z | 2018-11-29T19:38:34Z | MEMBER | It occurs to me that for the case of "multiple single indexes" along the same dimension there is no good way to use them simultaneously for indexing/reindexing. We should explicitly raise if you try to do this. I guess we have a few options for automatic alignment with multiple single indexes, too: 1. We could only support "exact" indexing 2. We could require that aligning each index separately gives the same result (2) seems least restrictive and is probably the right choice. One advantage of not having What should the default behavior of |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
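Option (2) in the comment above, requiring that aligning each index separately gives the same result, could look roughly like the sketch below. `consistent_indexer` is a hypothetical helper, not an xarray function; it assumes each source index is unique so that `pandas.Index.get_indexer` can be used.

```python
import numpy as np
import pandas as pd


def consistent_indexer(source_indexes, target_indexes):
    """Return the positions that reorder `source` to match `target`.

    Both arguments map index names to pandas.Index objects that live along the
    same dimension. Each index is aligned separately; a ValueError is raised
    if the per-index results disagree.
    """
    indexers = {
        name: source_indexes[name].get_indexer(target_indexes[name])
        for name in source_indexes
    }
    names = list(indexers)
    reference = indexers[names[0]]
    for name in names[1:]:
        if not np.array_equal(reference, indexers[name]):
            raise ValueError(
                f"aligning on {names[0]!r} and {name!r} gives different results"
            )
    return reference


source = {"x": pd.Index([10, 20, 30]), "label": pd.Index(["a", "b", "c"])}

# Both indexes agree on the new order.
agreeing = {"x": pd.Index([30, 10]), "label": pd.Index(["c", "a"])}
print(consistent_indexer(source, agreeing))  # [2 0]

# The indexes ask for different orders, so alignment is rejected.
conflicting = {"x": pd.Index([30, 10]), "label": pd.Index(["a", "c"])}
try:
    consistent_indexer(source, conflicting)
except ValueError as err:
    print("error:", err)
```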
442956167 | https://github.com/pydata/xarray/issues/1603#issuecomment-442956167 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0Mjk1NjE2Nw== | shoyer 1217238 | 2018-11-29T19:10:14Z | 2018-11-29T19:10:14Z | MEMBER |
I think the pandas.MultiIndex is a pretty solid data structure on a fundamental level, it just has some weird semantics for some indexing edge cases. Whether or not we write an xarray.MultiIndex structure, we can achieve most of what we want with a thin layer over
Yes, I like this! Generally I like @benbovy's entire proposal :). @fujiisoup can you clarify the use-cases you have for a MultiIndex as a variable?
From a data perspective, the only thing having an Index and/or MultiIndex should change is that the data is immutable. But by necessity the nature of the index will determine which indexing operations are possible/efficient. For example, if you want to do nearest-neighbor indexing with multiple coordinates you'll need a KDTree. We should not be afraid to raise errors if an indexing operation can't be done efficiently. With regards to reindexing: I don't think this needs any special handling versus normal indexing ( Another issue: how do we do automatic alignment with multiple indexes? Let me suggest a straw-man proposal: We always align indexed coordinates. If a coordinate is used in different types of indexes (e.g., a base |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442710536 | https://github.com/pydata/xarray/issues/1603#issuecomment-442710536 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MjcxMDUzNg== | shoyer 1217238 | 2018-11-29T05:23:33Z | 2018-11-29T05:25:48Z | MEMBER |
This needs an important caveat: it's only true that you use Let me make a tentative proposal: we should model a MultiIndex in xarray as exactly equivalent to a sparse multi-dimensional array, except with missing elements modeled implicitly (by omission) instead of explicitly (with NaN). If we do this, I think MultiIndex semantics could be defined to be identical to those of separable Index objects. One challenge is that we will definitely have to make some intentional deviations from the behavior of pandas, at least when dealing with array indexing of a MultiIndex level. Pandas has some strange behaviors with array indexing of a MultiIndex level, and I'm honestly not sure if they are bugs or features: - It ignores missing labels (https://github.com/pandas-dev/pandas/issues/15452) - It drops duplicate labels (https://github.com/pandas-dev/pandas/issues/19414) Fortunately, the MultiIndex data model is not that complicated, and it is quite straightforward to remap indexing results from sub-Index levels onto integer codes. I suspect we will find it easier to rewrite some of these routines than to change pandas, both because pandas may not agree with different semantics and because the pandas indexing code is an unholy mess. For example, we can reproduce the above issues:
```python
print(get_locs(index, (['a', 'a'],)))  # [0, 0]
print(get_locs(index, (['a', 'd'],)))  # [0, -1]
```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
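The idea in the comment above of remapping indexing results from a sub-Index level onto the MultiIndex's integer codes can be sketched as below. This is not the `get_locs` from the original comment (its definition is not shown here); `level_positions` is a hypothetical helper that uses only the public `MultiIndex.levels`, `MultiIndex.codes` and `Index.get_indexer`. It assumes one particular choice of semantics: duplicate labels are kept and missing labels raise.

```python
import numpy as np
import pandas as pd


def level_positions(index, level, labels):
    """Integer positions in `index` whose `level` value matches each label.

    Duplicate labels are kept (positions repeat) and missing labels raise
    KeyError, rather than being silently dropped.
    """
    labels = list(labels)
    if not labels:
        return np.array([], dtype=np.intp)
    level_index = index.levels[level]       # unique values of this level
    codes = np.asarray(index.codes[level])  # per-row integer code into level_index
    wanted = level_index.get_indexer(labels)  # -1 marks a label not in the level
    missing = [label for label, code in zip(labels, wanted) if code < 0]
    if missing:
        raise KeyError(missing)
    return np.concatenate([np.flatnonzero(codes == code) for code in wanted])


index = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["x", "y"])
print(level_positions(index, 0, ["a", "a"]))  # [0 1 0 1]
print(level_positions(index, 0, ["b", "a"]))  # [2 3 0 1]
```

Requesting a label that is absent from the level, such as `level_positions(index, 0, ["a", "d"])`, raises `KeyError` in this sketch; returning `-1` placeholders instead would be an equally defensible choice.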
442680467 | https://github.com/pydata/xarray/issues/1603#issuecomment-442680467 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MjY4MDQ2Nw== | shoyer 1217238 | 2018-11-29T02:15:48Z | 2018-11-29T02:19:06Z | MEMBER |
The answer is the It's painfully slow for large numbers of points due to a Python loop over each point, but presumably that could be optimized:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442581754 | https://github.com/pydata/xarray/issues/1603#issuecomment-442581754 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MjU4MTc1NA== | shoyer 1217238 | 2018-11-28T19:51:42Z | 2018-11-29T00:48:53Z | MEMBER | I've been thinking about this a little more in the context of starting on the implementation (in #2195). In particular, I no longer agree with this "Separate indexers without a MultiIndex should be prohibited" from my original proposal. The problem is that the semantics of a MultiIndex are not quite the same as separate indexes, and I don't think all use-cases are well solved by always using a MultiIndex. ~~For example, I don't think it's possible to do point-wise indexing along anything other than the first level of a MultiIndex.~~ (note: this is not true, see https://github.com/pydata/xarray/issues/1603#issuecomment-442662561) Instead, I think we should make the model transparent by retaining an xarray variable for the MultiIndex, and provide APIs for explicitly converting index types. e.g., for the repr with a MultiIndex:
The main way in which this could get confusing is if you explicitly mutate the Dataset to remove some but not all of the variables corresponding to the MultiIndex (e.g., The different indicator might make sense regardless but I am also partial to "Prohibit it in our data model." The main downside is that this adds a little more complexity to the logic for determining indexes resulting from an operation (namely, verifying that all MultiIndex levels still correspond to coordinates). |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442662561 | https://github.com/pydata/xarray/issues/1603#issuecomment-442662561 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MjY2MjU2MQ== | shoyer 1217238 | 2018-11-29T00:48:12Z | 2018-11-29T00:48:28Z | MEMBER |
This is clearly not true, since it works in pandas:
That said, I still don't know how to use public |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442661526 | https://github.com/pydata/xarray/issues/1603#issuecomment-442661526 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MjY2MTUyNg== | shoyer 1217238 | 2018-11-29T00:42:39Z | 2018-11-29T00:42:39Z | MEMBER | @max-sixty I like your schema vs. implementation breakdown. In general, I agree with you that it would be nice to have MultiIndex as an implementation detail rather than part of xarray's schema. But I'm not entirely sure that's feasible. Let's try to list out the pros/cons. Consider a MultiIndex 'multi' with levels 'x' and 'y':
- Advantages of MultiIndex as part of the data schema:
- There is an explicit coordinate (of tuples) corresponding to MultiIndex values, which can be returned from P.S. I haven't made much progress on this yet so there's definitely still time to figure out the right decision -- thanks for your engagement on this! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
392833478 | https://github.com/pydata/xarray/issues/1603#issuecomment-392833478 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM5MjgzMzQ3OA== | shoyer 1217238 | 2018-05-29T16:04:27Z | 2018-05-29T16:04:27Z | MEMBER | Sure, this is as good a time as any. But we'll probably need to finish this refactoring before it makes sense to implement anything. On Tue, May 29, 2018 at 8:59 AM Alistair Miles notifications@github.com wrote:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
392803210 | https://github.com/pydata/xarray/issues/1603#issuecomment-392803210 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM5MjgwMzIxMA== | shoyer 1217238 | 2018-05-29T14:45:12Z | 2018-05-29T14:45:12Z | MEMBER | Yes, the index API still needs to be determined. But I think we want to support something like that. On Tue, May 29, 2018 at 1:20 AM Alistair Miles notifications@github.com wrote:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
392649605 | https://github.com/pydata/xarray/issues/1603#issuecomment-392649605 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM5MjY0OTYwNQ== | shoyer 1217238 | 2018-05-29T04:28:45Z | 2018-05-29T04:28:45Z | MEMBER | I started thinking about how to do this incrementally, and it occurs to me that a good place to start would be to write some of the utility functions we'll need for this:
1. Normalizing and creating default I drafted up docstrings for each of these functions and did a little bit of work starting to think through implementations in https://github.com/pydata/xarray/pull/2195. So this would be a great place for others to help out. Each of these could be separate PRs. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
379905457 | https://github.com/pydata/xarray/issues/1603#issuecomment-379905457 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM3OTkwNTQ1Nw== | shoyer 1217238 | 2018-04-09T21:52:02Z | 2018-04-11T04:34:43Z | MEMBER | I've been thinking about getting started on this. Here are my current thoughts on the right design approach. Data model
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
379937531 | https://github.com/pydata/xarray/issues/1603#issuecomment-379937531 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM3OTkzNzUzMQ== | shoyer 1217238 | 2018-04-10T00:42:19Z | 2018-04-10T00:42:19Z | MEMBER | @fujiisoup Yes, we certainly could add a "N-dimensional index", even if it has no function other than a placeholder to mark a variable as an index. This would let us restore index state after selecting/concatenating along a dimension. However, I'm not sure it would be a satisfactory solution. If we keep these indexes around like coordinates, we could end up with scalar coordinates from different dimensions. Then it's still not clear how they should stack up in the final result -- we would have the same issue we currently have with concatenating coordinates. The other concern is that the existence and behavior of scalar/N-dimensional indexes could be surprising. What does it mean to index an N-dimensional index? This operation probably cannot be supported in a sensible way, or at least not without significant effort. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
340012824 | https://github.com/pydata/xarray/issues/1603#issuecomment-340012824 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM0MDAxMjgyNA== | shoyer 1217238 | 2017-10-27T15:59:51Z | 2017-10-27T15:59:51Z | MEMBER | @jjpr-mit can you explain your use case a little more? What sort of order-dependent queries do you want to do? The ones that come to mind for me are range-based queries, e.g, I think it is still relatively easy to ensure a unique ordering between levels, based on the order of coordinate variables in the xarray dataset. A bigger challenge is that for efficiency, these sorts of queries depend critically on having an actual MultiIndex. This means that if indexes for each of the levels arise from different arguments that were merged together, we might need to "merge" the separate indexes into a joint MultiIndex. This could potentially be slightly expensive. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
336496995 | https://github.com/pydata/xarray/issues/1603#issuecomment-336496995 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNjQ5Njk5NQ== | shoyer 1217238 | 2017-10-13T16:09:23Z | 2017-10-13T16:09:38Z | MEMBER |
The other advantage is that it solves many of the issues with the current
I agree, but there are probably some advantages to using a MultiIndex internally. For example, it allows for looking up on multiple levels at the same time.
I think we could get away with making For KDTree, this means we'll have to write our own wrapper |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
334229444 | https://github.com/pydata/xarray/issues/1603#issuecomment-334229444 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNDIyOTQ0NA== | shoyer 1217238 | 2017-10-04T17:27:44Z | 2017-10-04T17:27:44Z | MEMBER |
We would still assign default indexes (using a normal Another aspect to consider is how to handle alignment when you have indexes along non-dimension coordinates. Probably the most elegant rule would again be to check all indexed variables for exact matches. Directly assigning indexes rather than using this default or For performance reasons, we probably do not want to actually check the values of manually assigned indexes, although we should verify that the shape matches. (We would have a clear disclaimer that if you manually assign an index with mismatched values the behavior is not well defined.) In principle, this data model would allow for two mostly equivalent indexing schemes:
Yes, this is a little unfortunate. We could potentially make a custom wrapper for use in
Every entry in |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
334048571 | https://github.com/pydata/xarray/issues/1603#issuecomment-334048571 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNDA0ODU3MQ== | shoyer 1217238 | 2017-10-04T04:45:07Z | 2017-10-04T04:45:07Z | MEMBER | CC @benbovy @fmaussion |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
334045987 | https://github.com/pydata/xarray/issues/1603#issuecomment-334045987 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNDA0NTk4Nw== | shoyer 1217238 | 2017-10-04T04:19:55Z | 2017-10-04T04:20:25Z | MEMBER |
Yes, exactly. We actually already have an attribute that works like this, but it's currently computed lazily, from either |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
334041813 | https://github.com/pydata/xarray/issues/1603#issuecomment-334041813 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNDA0MTgxMw== | shoyer 1217238 | 2017-10-04T03:40:13Z | 2017-10-04T04:15:39Z | MEMBER | I sometimes find it helpful to think about what the right For example, we might imagine that "Indexes" are no longer coordinates, but instead their own entry in the repr:
"Indexes" might not even need to be part of the main In this model:
|
{ "total_count": 5, "+1": 5, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
334030279 | https://github.com/pydata/xarray/issues/1603#issuecomment-334030279 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNDAzMDI3OQ== | shoyer 1217238 | 2017-10-04T02:03:39Z | 2017-10-04T02:03:39Z | MEMBER | One API design challenge here is that I think we still want an explicit notation of "indexed" variables. We could possibly allow operations like |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 |
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) ); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);