issue_comments
68 rows where issue = 262642978 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1259326037 | https://github.com/pydata/xarray/issues/1603#issuecomment-1259326037 | https://api.github.com/repos/pydata/xarray/issues/1603 | IC_kwDOAMm_X85LD8pV | benbovy 4160723 | 2022-09-27T10:50:36Z | 2022-09-27T10:50:36Z | MEMBER | Should we close this issue and continue the discussion in #6293? For anyone who wants to track the progress on this topic: https://github.com/pydata/xarray/projects/1 |
{ "total_count": 2, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 2, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
949494376 | https://github.com/pydata/xarray/issues/1603#issuecomment-949494376 | https://api.github.com/repos/pydata/xarray/issues/1603 | IC_kwDOAMm_X844mCJo | benbovy 4160723 | 2021-10-22T10:27:26Z | 2021-10-22T10:27:26Z | MEMBER |
Agreed, and both are supported by xarray actually. In case we want to keep the original dimensions like ("x", "y") in the example above, it's better to use masking. This discussion is broader than the topic covered in this issue so I'd suggest you start a new discussion if you want to further discuss this with the xarray community. Thanks. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
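The two approaches mentioned in the comment above (masking to keep the original dimensions, or packing them into a new one) can be sketched with existing xarray calls. This is a minimal illustration only: the array, the 2-D coordinate `C2`, and the stacked dimension name `z` are made-up assumptions, not taken from the thread.

```python
import numpy as np
import xarray as xr

# Toy 2-D array with a 2-D (multi-dimensional) coordinate "C2" (assumed data).
labels = np.array(list("abcdefghi")).reshape(3, 3)
da = xr.DataArray(
    np.arange(9).reshape(3, 3),
    dims=("x", "y"),
    coords={"x": [1, 2, 3], "y": [10, 20, 30], "C2": (("x", "y"), labels)},
)

wanted = da["C2"].isin(["a", "e", "h"])

# Masking: dims ("x", "y") are kept; non-matching cells become NaN.
masked = da.where(wanted)

# Stacking: "x" and "y" are packed into a new dimension "z" (hypothetical
# name) and the non-matching points are dropped.
stacked = da.stack(z=("x", "y"))
picked = stacked.where(stacked["C2"].isin(["a", "e", "h"]), drop=True)
```

With masking, later code such as `masked.sel(x=[1, 2, 3])` keeps working because the original dimensions survive; with stacking, the result is a 1-D array along `z`.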
949485684 | https://github.com/pydata/xarray/issues/1603#issuecomment-949485684 | https://api.github.com/repos/pydata/xarray/issues/1603 | IC_kwDOAMm_X844mAB0 | weipeng1999 38346144 | 2021-10-22T10:15:39Z | 2021-10-22T10:15:39Z | NONE | So I think keeping the original dims would break less existing code. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
949484507 | https://github.com/pydata/xarray/issues/1603#issuecomment-949484507 | https://api.github.com/repos/pydata/xarray/issues/1603 | IC_kwDOAMm_X844l_vb | weipeng1999 38346144 | 2021-10-22T10:14:01Z | 2021-10-22T10:14:01Z | NONE |
Well, both "keep the original dims" and "generate a new one" have their benefits. If we keep the original dims, we can ensure that: - there is less difference between 1-D coordinates and multi-dimensional ones: both S1.sel(C1=["a", "e", "h"]) and S4.sel(C2=["a", "e", "h"]) work and return a new dataset with the original dims (that is why I strongly advise against the implicit one) - the returned dataset has the original dims, which means that if you change C1 to C2, later code such as S_res.sel(x=[1,2,3]) still works. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
949449312 | https://github.com/pydata/xarray/issues/1603#issuecomment-949449312 | https://api.github.com/repos/pydata/xarray/issues/1603 | IC_kwDOAMm_X844l3Jg | benbovy 4160723 | 2021-10-22T09:28:01Z | 2021-10-22T09:28:01Z | MEMBER | For such a case you could already do [code example missing] After the explicit index refactor, we could imagine a custom index that supports multi-dimension coordinates such that you would only need to do something like [code example missing]
or without explicitly providing the name of the packed dimension: [code example missing]
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
949423480 | https://github.com/pydata/xarray/issues/1603#issuecomment-949423480 | https://api.github.com/repos/pydata/xarray/issues/1603 | IC_kwDOAMm_X844lw14 | weipeng1999 38346144 | 2021-10-22T08:56:38Z | 2021-10-22T09:15:17Z | NONE | Well, here are my ideas on how to define coordinates with multiple dims. (Because of a GitHub bug, the text in the first image is rendered in white; I cannot fix it.)
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
949413144 | https://github.com/pydata/xarray/issues/1603#issuecomment-949413144 | https://api.github.com/repos/pydata/xarray/issues/1603 | IC_kwDOAMm_X844luUY | benbovy 4160723 | 2021-10-22T08:41:36Z | 2021-10-22T08:41:36Z | MEMBER | Sorry but this is confusing. To me, it still looks like you want implicit broadcasting of the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
949401881 | https://github.com/pydata/xarray/issues/1603#issuecomment-949401881 | https://api.github.com/repos/pydata/xarray/issues/1603 | IC_kwDOAMm_X844lrkZ | weipeng1999 38346144 | 2021-10-22T08:25:54Z | 2021-10-22T08:25:54Z | NONE |
Thank you for pointing out what I did wrong. It is hard to explain the idea because it is a bit complicated. The last two pictures were wrong and caused a misunderstanding; here are two images that explain what I actually mean:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
949358898 | https://github.com/pydata/xarray/issues/1603#issuecomment-949358898 | https://api.github.com/repos/pydata/xarray/issues/1603 | IC_kwDOAMm_X844lhEy | benbovy 4160723 | 2021-10-22T07:22:24Z | 2021-10-22T07:22:24Z | MEMBER | Thanks for the detailed description @weipeng1999. For the first 4 slides I don't see how this is different from how does |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
947480352 | https://github.com/pydata/xarray/issues/1603#issuecomment-947480352 | https://api.github.com/repos/pydata/xarray/issues/1603 | IC_kwDOAMm_X844eWcg | weipeng1999 38346144 | 2021-10-20T09:15:41Z | 2021-10-20T09:15:41Z | NONE |
To explain my idea, I made a PPT.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
946474674 | https://github.com/pydata/xarray/issues/1603#issuecomment-946474674 | https://api.github.com/repos/pydata/xarray/issues/1603 | IC_kwDOAMm_X844ag6y | benbovy 4160723 | 2021-10-19T08:19:54Z | 2021-10-19T08:19:54Z | MEMBER | Hi @weipeng1999, I'm not sure I fully understand your suggestion. Would you mind sharing some illustrative examples? It is useful to have two distinct It also helps to have a clear separation between the Currently in Xarray the It looks like what you suggest is some kind of implicit (co-)indexes hidden behind any dataset variable(s)? We actually took the opposite direction, trying to make everything explicit. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
946337314 | https://github.com/pydata/xarray/issues/1603#issuecomment-946337314 | https://api.github.com/repos/pydata/xarray/issues/1603 | IC_kwDOAMm_X844Z_Yi | weipeng1999 38346144 | 2021-10-19T03:32:13Z | 2021-10-19T03:33:54Z | NONE | Well, maybe we can consider coordinates in a more generic way. Let us define a coordinate as an array in a dataset that causes the other arrays to be co-indexed when we index the dataset. It means that:
Use dims to determine how the other arrays of the dataset are co-indexed.
Some compatibility issues:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
822122172 | https://github.com/pydata/xarray/issues/1603#issuecomment-822122172 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDgyMjEyMjE3Mg== | Hoeze 1200058 | 2021-04-19T02:18:58Z | 2021-04-19T02:19:24Z | NONE | Many array types do have implicit indices.
For example, sparse arrays do have their coordinates / CSR representation as primary index ( Going one step further, one could have continuous dimensions where positional indexing ( => Having explicit and implicit indices on arrays would be awesome, even if they don't support all xarray features! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
523240818 | https://github.com/pydata/xarray/issues/1603#issuecomment-523240818 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDUyMzI0MDgxOA== | shoyer 1217238 | 2019-08-21T00:00:43Z | 2021-03-03T16:46:25Z | MEMBER | Explicitly propagating indexes requires going through most of xarray's source code and auditing each time we create a Dataset or DataArray object with low-level operations. We have some pretty decent testing functions for this in the form of Here's our current progress:
- [x] most of |
{ "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
557590898 | https://github.com/pydata/xarray/issues/1603#issuecomment-557590898 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDU1NzU5MDg5OA== | max-sixty 5635139 | 2019-11-22T16:04:22Z | 2019-11-22T16:04:22Z | MEMBER |
👍 |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
557579503 | https://github.com/pydata/xarray/issues/1603#issuecomment-557579503 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDU1NzU3OTUwMw== | NowanIlfideme 2067093 | 2019-11-22T15:34:57Z | 2019-11-22T15:34:57Z | NONE |
The first example in this comment is similar to my use case: https://github.com/pydata/xarray/issues/3213#issuecomment-520741706 . There are several "core" dimensions, but some part of the coordinates may be hierarchical or cross-defined (e.g. country > province > city > building, but also country > province > voting district > building). We might have a full or nearly-full panel in the MultiIndex representation, but have a huge cross product (even if we keep strictly hierarchical dimensions out). Meanwhile using a true COO sparse representation (as I understand it) will likely end up with slower operations overall, since nearly all machine learning models (think: linear regression) require a dense array input anyways. I'll make an example of this when I find some free time, along with a contrasting one in Pandas. :) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
557567339 | https://github.com/pydata/xarray/issues/1603#issuecomment-557567339 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDU1NzU2NzMzOQ== | dcherian 2448579 | 2019-11-22T15:08:26Z | 2019-11-22T15:08:26Z | MEMBER |
We have experimental support for https://sparse.pydata.org/en/latest/index.html that may help but no documentation unfortunately. There are some details here: https://github.com/pydata/xarray/issues/3213 and https://github.com/pydata/xarray/issues/3484 |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
557566798 | https://github.com/pydata/xarray/issues/1603#issuecomment-557566798 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDU1NzU2Njc5OA== | rabernat 1197350 | 2019-11-22T15:07:14Z | 2019-11-22T15:07:14Z | MEMBER | Thanks @NowanIlfideme for your feedback. Could you perhaps share a gist of code related to your use case? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
557563566 | https://github.com/pydata/xarray/issues/1603#issuecomment-557563566 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDU1NzU2MzU2Ng== | NowanIlfideme 2067093 | 2019-11-22T14:59:29Z | 2019-11-22T14:59:29Z | NONE | I've noticed that basically all my current troubles with xarray lead to this issue (lack of MultiIndex support). I use xarray for machine learning/data science/econometrics. My current problem requires semi-hierarchical indexing on one of the dimensions, and slicing/aggregation along some levels of those dimensions. My first attempt was to just assume each dimension was orthogonal, which resulted in out-of-memory errors. I ended up using a MultiIndex for the hierarchy dimension to have a "dense" representation of a sparse subspace. Unfortunately, multidimensional groupby, especially within the MultiIndex, is a headache as it currently stands. I had to resort to making auxiliary dimensions with one-hot encoded levels (dummy variables) and doing multiply-aggregate operations by hand.
|
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
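The trade-off described in the comment above (a MultiIndex as a "dense" representation of a sparse subspace, versus fully orthogonal dimensions) can be illustrated in plain pandas. This is only a sketch under assumed data: the country/city hierarchy and the values are invented for illustration and do not come from the commenter's use case.

```python
import pandas as pd

# Hypothetical hierarchy: each city belongs to exactly one country.
rows = [
    ("FR", "Paris", 3.0),
    ("FR", "Lyon", 1.0),
    ("DE", "Berlin", 4.0),
    ("DE", "Munich", 2.0),
]
index = pd.MultiIndex.from_tuples(
    [(country, city) for country, city, _ in rows], names=["country", "city"]
)
values = pd.Series([value for *_, value in rows], index=index)

# Orthogonal dimensions would need one cell per (country, city) pair,
# although only the pairs present in the hierarchy carry data.
n_countries = index.get_level_values("country").nunique()
n_cities = index.get_level_values("city").nunique()
dense_cells = n_countries * n_cities
stored_cells = len(values)

# Aggregating along one level of the hierarchy.
per_country = values.groupby(level="country").sum()
```

Here the cross product has twice as many cells as the MultiIndex stores; with deeper hierarchies the gap grows quickly, which is consistent with the out-of-memory errors mentioned above.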
549179102 | https://github.com/pydata/xarray/issues/1603#issuecomment-549179102 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDU0OTE3OTEwMg== | shoyer 1217238 | 2019-11-03T21:12:25Z | 2019-11-03T21:12:25Z | MEMBER | I'm not working on any of these right now. You might start with a few of the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
549097800 | https://github.com/pydata/xarray/issues/1603#issuecomment-549097800 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDU0OTA5NzgwMA== | dcherian 2448579 | 2019-11-03T02:03:35Z | 2019-11-03T02:03:35Z | MEMBER | @shoyer I was thinking of starting on one of the listed files. Do you have any tips? Are you working on any of those at present? What might be the easiest one to begin? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
511126208 | https://github.com/pydata/xarray/issues/1603#issuecomment-511126208 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDUxMTEyNjIwOA== | rabernat 1197350 | 2019-07-13T14:27:32Z | 2019-07-13T14:27:32Z | MEMBER | After spending a few hours on the issue tracker yesterday, it became clear to me that this issue (more flexible indexes) is a major blocker on many high-priority features going forward. In #2639, @shoyer started to address this. In that now-merged PR, he outlined the following steps, each of which needs its own PR:
So the best way to make progress on all manner of higher-level xarray feature requests is to start working through the next three items in this list. |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
491229992 | https://github.com/pydata/xarray/issues/1603#issuecomment-491229992 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ5MTIyOTk5Mg== | aldanor 2418513 | 2019-05-10T09:47:39Z | 2019-05-10T09:47:39Z | NONE | There's now a good few dozen issues that reference this PR. Wondering if there's any particular help needed (in the form of coding, discussion, or any other fashion), so as to try and speed it up and unblock those issues? (I'm personally interested in resolving problems like #934 myself - allowing selection on non-dim coords, which seems to be a major hassle for a lot of use cases.) |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
450702503 | https://github.com/pydata/xarray/issues/1603#issuecomment-450702503 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ1MDcwMjUwMw== | shoyer 1217238 | 2019-01-01T00:54:27Z | 2019-01-01T00:54:27Z | MEMBER | I'm starting to make these changes incrementally -- the first step is in https://github.com/pydata/xarray/pull/2639. |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
444403484 | https://github.com/pydata/xarray/issues/1603#issuecomment-444403484 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0NDQwMzQ4NA== | benbovy 4160723 | 2018-12-05T08:39:35Z | 2018-12-05T08:39:35Z | MEMBER |
Agreed. It seems very strict indeed, but it will be easier to relax this later than the other way. There is also a (very rare?) case where the two indexed coordinates have the same labels but are named differently in the two datasets (e.g., |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
444204957 | https://github.com/pydata/xarray/issues/1603#issuecomment-444204957 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0NDIwNDk1Nw== | shoyer 1217238 | 2018-12-04T18:25:33Z | 2018-12-04T18:25:33Z | MEMBER |
I discussed this a little bit above in https://github.com/pydata/xarray/issues/1603#issuecomment-442661526, under "MultiIndex as part of the data schema". I agree that the default behavior should still be to create automatic indexes only for 1d coordinates matching dimension names. But we still will have (rare?) cases where "multiple single indexes" could arise from combining arguments with different indexes. For example, suppose the I guess the error is probably the best idea.
This is indeed the historical genesis, but I agree that this is confusing and we should deprecate/remove it. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
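The default mentioned in the comment above (automatic indexes only for 1-D coordinates whose name matches their dimension) can be observed with a small dataset. This is a minimal sketch with made-up coordinate names; `swap_dims` is used here as one existing way to make another coordinate the indexed one, not as the API proposed in this thread.

```python
import xarray as xr

ds = xr.Dataset(
    coords={
        "x": [10, 20, 30],                # 1-D, same name as its dimension
        "label": ("x", ["a", "b", "c"]),  # 1-D, but named differently
    }
)

# Only the dimension coordinate gets an index by default.
indexed = list(ds.indexes)

# Label-based selection works on the indexed coordinate...
by_dim = ds.sel(x=20)

# ...and on "label" once it has been made the dimension coordinate.
swapped = ds.swap_dims({"x": "label"})
by_label = swapped.sel(label="b")
```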
444187219 | https://github.com/pydata/xarray/issues/1603#issuecomment-444187219 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0NDE4NzIxOQ== | alimanfoo 703554 | 2018-12-04T17:33:34Z | 2018-12-04T17:33:34Z | CONTRIBUTOR |
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
444132393 | https://github.com/pydata/xarray/issues/1603#issuecomment-444132393 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0NDEzMjM5Mw== | benbovy 4160723 | 2018-12-04T15:06:21Z | 2018-12-04T15:19:08Z | MEMBER |
Sorry for maybe asking this again but I'm a bit confused now: is there any good reason for supporting "multiple single indexes" along the same dimension? After all, perhaps better defaults would be to set indexes ( If you want a different behavior, then you need to use
I think that one big source of confusion so far has been mixing coordinates/variables and indexes. These are really two separate concepts, and the indexes refactoring should address that IMHO. For example, I think that Take for example [code example missing]
I find it so weird being able to do this: [code example missing]
Where does come from I might be a good thing explicitly requiring |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
443239040 | https://github.com/pydata/xarray/issues/1603#issuecomment-443239040 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MzIzOTA0MA== | max-sixty 5635139 | 2018-11-30T15:29:15Z | 2018-11-30T15:29:15Z | MEMBER | How should dimension names interact with index names - i.e. the "Mapping indexes into pandas" in @shoyer 's comment I'd suggest that option (3) should be invalid, and that |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
443172604 | https://github.com/pydata/xarray/issues/1603#issuecomment-443172604 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MzE3MjYwNA== | benbovy 4160723 | 2018-11-30T11:14:24Z | 2018-11-30T11:14:24Z | MEMBER | A couple of thoughts: If nothing useful can be done in the case of "multiple single indexes", would it make sense to discourage users from explicitly creating multiple single indexes along a dimension? "Multiple single indexes" would be just a default situation when nothing specific has been defined yet or resulting from a fallback. For example, why not require that Hence, would it be possible to avoid |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
443044579 | https://github.com/pydata/xarray/issues/1603#issuecomment-443044579 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MzA0NDU3OQ== | shoyer 1217238 | 2018-11-30T00:24:39Z | 2018-11-30T00:24:39Z | MEMBER | I wonder if we should also change the default value of the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442965602 | https://github.com/pydata/xarray/issues/1603#issuecomment-442965602 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0Mjk2NTYwMg== | shoyer 1217238 | 2018-11-29T19:38:34Z | 2018-11-29T19:38:34Z | MEMBER | It occurs to me that for the case of "multiple single indexes" along the same dimension there is no good way to use them simultaneously for indexing/reindexing. We should explicitly raise if you try to do this. I guess we have a few options for automatic alignment with multiple single indexes, too: 1. We could only support "exact" indexing 2. We could require that aligning each index separately gives the same result (2) seems least restrictive and is probably the right choice. One advantage of not having What should the default behavior of |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
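Option (2) in the comment above, requiring that aligning each index separately gives the same result, can be sketched with plain pandas indexes. The helper names `inner_positions` and `consistent_alignment` are hypothetical, the example uses an inner join only, and nothing here reflects how xarray actually implements alignment.

```python
import numpy as np
import pandas as pd


def inner_positions(left: pd.Index, right: pd.Index):
    """Positions of the shared labels in each index, in the order of `left`."""
    common = left[left.isin(right)]
    return left.get_indexer(common), right.get_indexer(common)


def consistent_alignment(left_indexes, right_indexes) -> bool:
    """True if every index along the dimension selects the same positions."""
    results = [inner_positions(l, r) for l, r in zip(left_indexes, right_indexes)]
    first_left, first_right = results[0]
    return all(
        np.array_equal(l, first_left) and np.array_equal(r, first_right)
        for l, r in results[1:]
    )


# Two single indexes ("x" values and string labels) along the same dimension.
a = [pd.Index([1, 2, 3]), pd.Index(["a", "b", "c"])]
b_ok = [pd.Index([2, 3, 4]), pd.Index(["b", "c", "d"])]
b_bad = [pd.Index([2, 3, 4]), pd.Index(["c", "b", "d"])]

ok = consistent_alignment(a, b_ok)    # both indexes agree on the positions
bad = consistent_alignment(a, b_bad)  # the label index disagrees
```

Under such a rule, the second case would be the one where raising an error is appropriate, since the two indexes pick different rows of the second object.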
442956167 | https://github.com/pydata/xarray/issues/1603#issuecomment-442956167 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0Mjk1NjE2Nw== | shoyer 1217238 | 2018-11-29T19:10:14Z | 2018-11-29T19:10:14Z | MEMBER |
I think the pandas.MultiIndex is a pretty solid data structure on a fundamental level, it just has some weird semantics for some indexing edge cases. Whether or not we write xarray.MultiIndex structure, we can achieve most of what we want with a thin layer over
Yes, I like this! Generally I like @benbovy's entire proposal :). @fujiisoup can you clarify the use-cases you have for a MultiIndex as a variable?
From a data perspective, the only thing having an Index and/or MultiIndex should change is that the data is immutable. But by necessity the nature of the index will determine which indexing operations are possible/efficient. For example, if you want to do nearest-neighbor indexing with multiple coordinates you'll need a KDTree. We should not be afraid to raise errors if an indexing operation can't be done efficiently. With regards to reindexing: I don't think this needs any special handling versus normal indexing ( Another issue: how do we do automatic alignment with multiple indexes? Let me suggest a straw-man proposal: We always align indexed coordinates. If a coordinate is used in different types of indexes (e.g., a base |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442907394 | https://github.com/pydata/xarray/issues/1603#issuecomment-442907394 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MjkwNzM5NA== | benbovy 4160723 | 2018-11-29T16:49:12Z | 2018-11-29T17:18:10Z | MEMBER |
Indeed I haven't really thought about How do you currently Contrary to Wouldn't be possible to easily support
This is a good question. A related question: apart from
I agree, although whether or not we will eventually support custom indexes might influence the design choices that we have to do now, IMO. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442906486 | https://github.com/pydata/xarray/issues/1603#issuecomment-442906486 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MjkwNjQ4Ng== | max-sixty 5635139 | 2018-11-29T16:46:52Z | 2018-11-29T16:46:52Z | MEMBER | And broadening out further:
This is basically how I think of indexes - as a performant lookup data structure, rather than a feature of the schema. An RDBMS is a good analogy there. Now, maybe there's enough overlap between the data access and the data schema that we should let them couple - e.g. would you want to be able to run We probably don't need to answer this question to proceed, but I'd be interested whether others see indexes as a property of the schema / I'm missing something. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
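The view in the comment above, an index as a lookup structure rather than part of the schema, can be illustrated with a tiny example: the data and labels are identical with or without the index, and only the way a label is resolved to a position differs. The arrays below are made-up; `pandas.Index.get_loc` is the only library-specific call.

```python
import numpy as np
import pandas as pd

labels = np.array(["a", "b", "c", "d"])
data = np.array([10, 20, 30, 40])

# Without an index: scan the labels on every lookup.
scan_pos = int(np.flatnonzero(labels == "c")[0])

# With an index: pandas builds its lookup structure once and reuses it.
index = pd.Index(labels)
indexed_pos = index.get_loc("c")

# Either way the same element is selected; the index only changes the cost.
value = data[indexed_pos]
```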
442902327 | https://github.com/pydata/xarray/issues/1603#issuecomment-442902327 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MjkwMjMyNw== | max-sixty 5635139 | 2018-11-29T16:36:20Z | 2018-11-29T16:36:20Z | MEMBER | I broadly agree with @benbovy 's proposal. One question that I think is worth being clear on is what additional contracts do multiple indexes on a dimension have over individual indexes? e.g. re:
Am I right in thinking the @fujiisoup 's poses a good case of this question:
(and separately, I think we can do much of this before adding the ability to set custom indexes, which would be cool but further from where we are, I think) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442809859 | https://github.com/pydata/xarray/issues/1603#issuecomment-442809859 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MjgwOTg1OQ== | fujiisoup 6815844 | 2018-11-29T12:05:03Z | 2018-11-29T12:05:03Z | MEMBER | I am late for the party (but still only have time to write a short comment). I am a big fan of MultiIndex and like @shoyer 's idea.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442797084 | https://github.com/pydata/xarray/issues/1603#issuecomment-442797084 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0Mjc5NzA4NA== | benbovy 4160723 | 2018-11-29T11:15:17Z | 2018-11-29T11:15:17Z | MEMBER |
Looking at the reported issues related to multi-indexes in xarray, I have the same feeling. Simply reusing If we re-design indexes so that we allow 3rd-party indexes, maybe we could support both and let the user choose the one (xarray or pandas-backed) that best suits their needs? Regarding MultiIndex as part of the data schema vs an implementation detail, if we support extending indexes (and already given the different kinds of multi-coordinate indexes: MultiIndex, KDTree, etc.), then I think that it should be transparent to the user. However, I don't really see why a multi-coordinate index should have its own variable (with tuples of values). I don't want to speak for others, but IMHO If a variable for each multi-coordinate index is "just" for data schema consistency, then why not show all those indexes in a separate section of the repr? For example:
It is equally transparent, not more verbose, and it is clear that multi-indexes are not part of the coordinates (in fact there is no need of "virtual" coordinates either, nor to name the index). I don't think single indexes should be shown here as it would result in duplicated, uninformative lines. More generally, here is how I would see indexes handled in xarray (I might be missing important aspects, though):
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442725856 | https://github.com/pydata/xarray/issues/1603#issuecomment-442725856 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MjcyNTg1Ng== | max-sixty 5635139 | 2018-11-29T06:52:49Z | 2018-11-29T06:52:49Z | MEMBER |
💯- that very much resonates! And it leaves the implementation flexible if we want to iterate. I'll try to think of some dissenting cases to the proposal / helpful responses to the above. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442710536 | https://github.com/pydata/xarray/issues/1603#issuecomment-442710536 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MjcxMDUzNg== | shoyer 1217238 | 2018-11-29T05:23:33Z | 2018-11-29T05:25:48Z | MEMBER |
This needs an important caveat: it's only true that you use Let me make a tentative proposal: we should model a MultiIndex in xarray as exactly equivalent to a sparse multi-dimensional array, except with missing elements modeled implicitly (by omission) instead of explicitly (with NaN). If we do this, I think MultiIndex semantics could be defined to be identical to those of separable Index objects. One challenge is that we will definitely have to make some intentional deviations from the behavior of pandas, at least when dealing with array indexing of a MultiIndex level. Pandas has some strange behaviors with array indexing of a MultiIndex level, and I'm honestly not sure if they are bugs or features: - It ignores missing labels (https://github.com/pandas-dev/pandas/issues/15452) - It drops duplicate labels (https://github.com/pandas-dev/pandas/issues/19414) Fortunately, the MultiIndex data model is not that complicated, and it is quite straightforward to remap indexing results from sub-Index levels onto integer codes. I suspect we will find it easier to rewrite some of these routines than to change pandas, both because pandas may not agree with different semantics and because the pandas indexing code is an unholy mess. For example, we can reproduce the above issues:
```python
print(get_locs(index, (['a', 'a'],)))  # [0, 0]
print(get_locs(index, (['a', 'd'],)))  # [0, -1]
``` |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
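To illustrate the remapping idea from the comment above (look labels up in a single level's flat Index, then map them back onto the MultiIndex's integer codes), here is a minimal sketch. The three-row index and the helper `first_positions` are hypothetical and exist only for illustration; this is not the `get_locs` function referenced in the comment, whose definition did not survive in this export. It relies only on `MultiIndex.levels`, `MultiIndex.codes` and `Index.get_indexer` from pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row MultiIndex, used only for illustration.
index = pd.MultiIndex.from_arrays([["a", "b", "c"], [1, 2, 3]], names=["x", "y"])


def first_positions(index, level, labels):
    """Return the first row position for each label, or -1 if it is absent.

    Hypothetical helper; not the get_locs function from the comment above.
    """
    # Look each label up in the level's own flat Index; missing labels give -1.
    wanted = index.levels[level].get_indexer(labels)
    codes = np.asarray(index.codes[level])
    positions = []
    for code in wanted:
        # Rows of the MultiIndex whose code on this level matches the label.
        hits = np.flatnonzero(codes == code) if code != -1 else np.array([], dtype=int)
        positions.append(int(hits[0]) if hits.size else -1)
    return positions


dup = first_positions(index, 0, ["a", "a"])      # duplicate labels are kept
missing = first_positions(index, 0, ["a", "d"])  # missing labels are marked with -1
print(dup)      # [0, 0]
print(missing)  # [0, -1]
```

Unlike the pandas behaviors linked in the comment, this sketch neither drops duplicate labels nor silently ignores missing ones. It only returns the first matching row per label, so a real implementation would need to decide how to handle labels that match several rows.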
442680467 | https://github.com/pydata/xarray/issues/1603#issuecomment-442680467 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MjY4MDQ2Nw== | shoyer 1217238 | 2018-11-29T02:15:48Z | 2018-11-29T02:19:06Z | MEMBER |
The answer is the It's painfully slow for large numbers of points due to a Python loop over each point, but presumably that could be optimized:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442581754 | https://github.com/pydata/xarray/issues/1603#issuecomment-442581754 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MjU4MTc1NA== | shoyer 1217238 | 2018-11-28T19:51:42Z | 2018-11-29T00:48:53Z | MEMBER | I've been thinking about this a little more in the context of starting on the implementation (in #2195). In particular, I no longer agree with this "Separate indexers without a MultiIndex should be prohibited" from my original proposal. The problem is that the semantics of a MultiIndex are not quite the same as separate indexes, and I don't think all use-cases are well solved by always using a MultiIndex. ~~For example, I don't think it's possible to do point-wise indexing along anything other than the first level of a MultiIndex.~~ (note: this is not true, see https://github.com/pydata/xarray/issues/1603#issuecomment-442662561) Instead, I think we should make the model transparent by retaining an xarray variable for the MultiIndex, and provide APIs for explicitly converting index types. e.g., for the repr with a MultiIndex:
The main way in which this could get confusing is if you explicitly mutate the Dataset to remove some but not all of the variables corresponding to the MultiIndex (e.g., The different indicator might make sense regardless but I am also partial to "Prohibit it in our data model." The main downside is that this adds a little more complexity to the logic for determining indexes resulting from an operation (namely, verifying that all MultiIndex levels still correspond to coordinates). |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442662561 | https://github.com/pydata/xarray/issues/1603#issuecomment-442662561 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MjY2MjU2MQ== | shoyer 1217238 | 2018-11-29T00:48:12Z | 2018-11-29T00:48:28Z | MEMBER |
This is clearly not true, since it works in pandas:
That said, I still don't know how to use public |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442661526 | https://github.com/pydata/xarray/issues/1603#issuecomment-442661526 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MjY2MTUyNg== | shoyer 1217238 | 2018-11-29T00:42:39Z | 2018-11-29T00:42:39Z | MEMBER | @max-sixty I like your schema vs. implementation breakdown. In general, I agree with you that it would be nice to have MultiIndex as an implementation detail rather than part of xarray's schema. But I'm not entirely sure that's feasible. Let's try to list out the pros/cons. Consider a MultiIndex 'multi' with levels 'x' and 'y':
- Advantages of MultiIndex as part of the data schema:
- There is an explicit coordinate (of tuples) corresponding to MultiIndex values, which can be returned from P.S. I haven't made much progress on this yet so there's definitely still time to figure out the right decision -- thanks for your engagement on this! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
442636798 | https://github.com/pydata/xarray/issues/1603#issuecomment-442636798 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDQ0MjYzNjc5OA== | max-sixty 5635139 | 2018-11-28T22:54:26Z | 2018-11-28T22:54:26Z | MEMBER | Potentially this is too much 'stepping back' now we're at the implementation stage. My perception is that @shoyer is leading this without much support, so, weighing the value of some additional viewpoints, here are some questions:

Is a MultiIndex a feature of the schema or the implementation?

I had thought of an MI being an implementation detail in code, rather than in the data schema. We use it as a container for all the indexes along a dimension, rather than representing any properties about the data it contains. One exception to that would be if we wanted multiple groups of indexes along the same dimension, for example:

```
Coordinates:
  * xa         (x) MultiIndex[level_a_1, level_a_2]
  * level_a_1  (x) object 'a' 'a' 'b' 'b'
  * level_a_2  (x) int64 1 2 1 2
```

But is that common / required?

MultiIndex as an implementation detail

If it's an implementation detail, is there a benefit to investing in allowing both separate indexes and MIs? While it may not be possible to do pointwise indexing with the current implementation of MI, am I mistaken that it's not an API issue, assuming we pass in index names? e.g.:

```python
[ins] In [22]: da = xr.DataArray(np.arange(12).reshape((3, 4)), dims=['x', 'y'], coords=dict(x=list('abc'), y=pd.MultiIndex.from_product([list('ab'),[1,2]])))

[ins] In [23]: da
Out[23]:
<xarray.DataArray (x: 3, y: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x          (x) <U1 'a' 'b' 'c'
  * y          (y) MultiIndex
  - y_level_0  (y) object 'a' 'a' 'b' 'b'
  - y_level_1  (y) int64 1 2 1 2

[ins] In [26]: da.sel(x=xr.DataArray(['a','c'],dims=['z']),
                      y_level_0=xr.DataArray(['a','b'],dims=['z']),
                      y_level_1=xr.DataArray([1,1],dims=['z']))
Out[80]: # hypothetical
<xarray.DataArray (z: 2)>
array([ 0, 10])
Dimensions without coordinates: z
```

If that's the case, could we instead force all indexes along a dimension to be in a MI, tolerate the short-term constraints of the current MI implementation, and where needed build out additional features? That would (ideally) leave us uncoupled from MIs: if we built a better in-memory data structure, we could transition. The contract would be around the cases above.

...and as mentioned above, these are intended as questions rather than high-confidence views. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
392833478 | https://github.com/pydata/xarray/issues/1603#issuecomment-392833478 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM5MjgzMzQ3OA== | shoyer 1217238 | 2018-05-29T16:04:27Z | 2018-05-29T16:04:27Z | MEMBER | Sure, this is as good a time as any. But we'll probably need to finish this refactoring before it makes sense to implement anything. On Tue, May 29, 2018 at 8:59 AM Alistair Miles notifications@github.com wrote:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
392831984 | https://github.com/pydata/xarray/issues/1603#issuecomment-392831984 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM5MjgzMTk4NA== | alimanfoo 703554 | 2018-05-29T15:59:46Z | 2018-05-29T15:59:46Z | CONTRIBUTOR | Ok, cool. Was wondering if now was the right time to revisit that, alongside the work proposed in this PR. Happy to participate in that discussion, still interested in implementing some alternative index classes. On Tue, 29 May 2018, 15:45 Stephan Hoyer, notifications@github.com wrote:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
392803210 | https://github.com/pydata/xarray/issues/1603#issuecomment-392803210 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM5MjgwMzIxMA== | shoyer 1217238 | 2018-05-29T14:45:12Z | 2018-05-29T14:45:12Z | MEMBER | Yes, the index API still needs to be determined. But I think we want to support something like that. On Tue, May 29, 2018 at 1:20 AM Alistair Miles notifications@github.com wrote:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
392692996 | https://github.com/pydata/xarray/issues/1603#issuecomment-392692996 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM5MjY5Mjk5Ng== | alimanfoo 703554 | 2018-05-29T08:20:22Z | 2018-05-29T08:20:22Z | CONTRIBUTOR | I see this mentions an Index API, is that still to be decided? On Tue, 29 May 2018, 05:28 Stephan Hoyer, notifications@github.com wrote:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
392649605 | https://github.com/pydata/xarray/issues/1603#issuecomment-392649605 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM5MjY0OTYwNQ== | shoyer 1217238 | 2018-05-29T04:28:45Z | 2018-05-29T04:28:45Z | MEMBER | I started thinking about how to do this incrementally, and it occurs to me that a good place to start would be to write some of the utility functions we'll need for this:
1. Normalizing and creating default I drafted up docstrings for each of these functions and did a little bit of work to start thinking through implementations in https://github.com/pydata/xarray/pull/2195. So this would be a great place for others to help out. Each of these could be separate PRs. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
379905457 | https://github.com/pydata/xarray/issues/1603#issuecomment-379905457 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM3OTkwNTQ1Nw== | shoyer 1217238 | 2018-04-09T21:52:02Z | 2018-04-11T04:34:43Z | MEMBER | I've been thinking about getting started on this. Here are my current thoughts on the right design approach. Data model
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
380323532 | https://github.com/pydata/xarray/issues/1603#issuecomment-380323532 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM4MDMyMzUzMg== | max-sixty 5635139 | 2018-04-11T04:28:53Z | 2018-04-11T04:28:53Z | MEMBER | Overall, I agree with the proposed conclusion. And appreciate the level of thoughtfulness and clarity. I'm happy to help with some of the implementation if we can split this up. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
379937531 | https://github.com/pydata/xarray/issues/1603#issuecomment-379937531 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM3OTkzNzUzMQ== | shoyer 1217238 | 2018-04-10T00:42:19Z | 2018-04-10T00:42:19Z | MEMBER | @fujiisoup Yes, we certainly could add a "N-dimensional index", even if it has no function other than a placeholder to mark a variable as an index. This would let us restore index state after selecting/concatenating along a dimension. However, I'm not sure it would be a satisfactory solution. If we keep these indexes around like coordinates, we could end up with scalar coordinates from different dimensions. Then it's still not clear how they should stack up in the final result -- we would have the same issue we currently have with concatenating coordinates. The other concern is that the existence and behavior of scalar/N-dimensional indexes could be surprising. What does it mean to index an N-dimensional index? This operation probably cannot be supported in a sensible way, or at least not without significant effort. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
379920389 | https://github.com/pydata/xarray/issues/1603#issuecomment-379920389 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM3OTkyMDM4OQ== | fujiisoup 6815844 | 2018-04-09T23:03:03Z | 2018-04-09T23:04:01Z | MEMBER | @shoyer, thank you for detailing. I am thinking about how we can establish the following Or, we may give up on restoring the original coordinate structure during the above action, but still keep them as ordinary coordinates. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
340012824 | https://github.com/pydata/xarray/issues/1603#issuecomment-340012824 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM0MDAxMjgyNA== | shoyer 1217238 | 2017-10-27T15:59:51Z | 2017-10-27T15:59:51Z | MEMBER | @jjpr-mit can you explain your use case a little more? What sort of order-dependent queries do you want to do? The ones that come to mind for me are range-based queries, e.g., I think it is still relatively easy to ensure a unique ordering between levels, based on the order of coordinate variables in the xarray dataset. A bigger challenge is that for efficiency, these sorts of queries depend critically on having an actual MultiIndex. This means that if indexes for each of the levels arise from different arguments that were merged together, we might need to "merge" the separate indexes into a joint MultiIndex. This could potentially be slightly expensive. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
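As a rough illustration of the "merge separate indexes into a joint MultiIndex" step mentioned above, the sketch below builds a pandas MultiIndex from two per-level arrays and runs a range query on it. The `experiment` and `time` values are borrowed from another comment in this thread; everything else is an assumption for illustration and does not describe how xarray implements this.

```python
import pandas as pd

# Two separate per-level indexes along the same dimension.
experiment = pd.Index([0, 0, 0, 1, 1], name="experiment")
time = pd.Index([0.0, 0.1, 0.2, 0.0, 0.15], name="time")

# "Merging" them: the level order follows the order of the input arrays,
# which is one way an ordering between levels could be made unique.
joint = pd.MultiIndex.from_arrays([experiment, time], names=["experiment", "time"])

# Range queries with tuple bounds need the joint index to be lexsorted.
assert joint.is_monotonic_increasing

# Positions covering (experiment=0, time=0.1) through (experiment=1, time=0.0),
# with the end label included.
start, stop = joint.slice_locs((0, 0.1), (1, 0.0))
selected = joint[start:stop]
print(int(start), int(stop))  # 1 4
```

Building the joint index copies and factorizes both levels, which is where the "slightly expensive" cost mentioned in the comment would come from.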
340005903 | https://github.com/pydata/xarray/issues/1603#issuecomment-340005903 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDM0MDAwNTkwMw== | jjpr-mit 25231875 | 2017-10-27T15:34:42Z | 2017-10-27T15:34:42Z | NONE | Will the new API preserve the order of the levels? One of the features that's necessary for |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
338622746 | https://github.com/pydata/xarray/issues/1603#issuecomment-338622746 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzODYyMjc0Ng== | alimanfoo 703554 | 2017-10-23T10:56:40Z | 2017-10-23T10:56:40Z | CONTRIBUTOR | Just to say I'm interested in how MultiIndexes are handled also. In our use case, we have two variables conventionally named CHROM (chromosome) and POS (position) which together describe a location in a genome. I want to combine both variables into a multi-index so I can, e.g., select all data from some data variable for chromosome X between positions 100,000-200,000. For all our data variables, this genome location multi-index would be used to index the first dimension. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
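The genome-location query described in the comment above can be sketched with a plain pandas MultiIndex. The table contents and the `DP` column are made up for illustration; only the `CHROM` and `POS` names come from the comment, and this shows pandas behavior rather than any xarray API.

```python
import pandas as pd

# Hypothetical variant table; CHROM and POS follow the naming in the comment.
variants = (
    pd.DataFrame(
        {
            "CHROM": ["1", "X", "X", "X", "X"],
            "POS": [150_000, 50_000, 120_000, 180_000, 250_000],
            "DP": [10, 12, 9, 14, 11],  # made-up data variable
        }
    )
    .set_index(["CHROM", "POS"])
    .sort_index()  # label slicing on a MultiIndex expects a sorted index
)

# Select chromosome X between positions 100,000 and 200,000 (both inclusive).
idx = pd.IndexSlice
subset = variants.loc[idx["X", 100_000:200_000], :]
print(subset["DP"].tolist())  # [9, 14]
```

Because the lookup goes through a sorted MultiIndex, the slice bounds do not have to be positions that actually occur in the data.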
336496995 | https://github.com/pydata/xarray/issues/1603#issuecomment-336496995 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNjQ5Njk5NQ== | shoyer 1217238 | 2017-10-13T16:09:23Z | 2017-10-13T16:09:38Z | MEMBER |
The other advantage is that it solves many of the issues with the current
I agree, but there are probably some advantages to using a MultiIndex internally. For example, it allows for looking up on multiple levels at the same time.
I think we could get away with making For KDTree, this means we'll have to write our own wrapper |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
336381864 | https://github.com/pydata/xarray/issues/1603#issuecomment-336381864 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNjM4MTg2NA== | fujiisoup 6815844 | 2017-10-13T08:09:25Z | 2017-10-13T08:09:25Z | MEMBER | Thanks for the details. (Sorry for my late response. It took me a long time to understand what it looks like.) I am wondering which advantageous cases are realized with this
Are they correct?
That sounds reasonable.
I like the latter one, as it is easier to understand even for non-pandas users. What does the actual implementation look like?
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
334229444 | https://github.com/pydata/xarray/issues/1603#issuecomment-334229444 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNDIyOTQ0NA== | shoyer 1217238 | 2017-10-04T17:27:44Z | 2017-10-04T17:27:44Z | MEMBER |
We would still assign default indexes (using a normal Another aspect to consider is how to handle alignment when you have indexes along non-dimension coordinates. Probably the most elegant rule would again be to check all indexed variables for exact matches. Directly assigning indexes rather than using this default or For performance reasons, we probably do not want to actually check the values of manually assigned indexes, although we should verify that the shape matches. (We would have a clear disclaimer that if you manually assign an index with mismatched values the behavior is not well defined.) In principle, this data model would allow for two mostly equivalent indexing schemes:
Yes, this is a little unfortunate. We could potentially make a custom wrapper for use in
Every entry in |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
334125888 | https://github.com/pydata/xarray/issues/1603#issuecomment-334125888 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNDEyNTg4OA== | fujiisoup 6815844 | 2017-10-04T11:25:14Z | 2017-10-04T12:43:59Z | MEMBER | @shoyer, could you add more details of this idea?
I think I do not yet fully understand the practical difference between
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
334091075 | https://github.com/pydata/xarray/issues/1603#issuecomment-334091075 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNDA5MTA3NQ== | benbovy 4160723 | 2017-10-04T08:52:08Z | 2017-10-04T08:52:08Z | MEMBER | I think that promoting "Indexes" to a first-class concept is indeed a very good idea, at both internal and public levels, even if at the latter level it would be another concept for users (it should be already familiar for pandas users, though). IMHO the "coordinate" and "index" concepts are different enough to consider them separately. I like the proposed repr for I have to think a bit more about the details but I like the idea. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
334048571 | https://github.com/pydata/xarray/issues/1603#issuecomment-334048571 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNDA0ODU3MQ== | shoyer 1217238 | 2017-10-04T04:45:07Z | 2017-10-04T04:45:07Z | MEMBER | CC @benbovy @fmaussion |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
334045987 | https://github.com/pydata/xarray/issues/1603#issuecomment-334045987 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNDA0NTk4Nw== | shoyer 1217238 | 2017-10-04T04:19:55Z | 2017-10-04T04:20:25Z | MEMBER |
Yes, exactly. We actually already have an attribute that works like this, but it's currently computed lazily, from either |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
334041813 | https://github.com/pydata/xarray/issues/1603#issuecomment-334041813 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNDA0MTgxMw== | shoyer 1217238 | 2017-10-04T03:40:13Z | 2017-10-04T04:15:39Z | MEMBER | I sometimes find it helpful to think about what the right For example, we might imagine that "Indexes" are no longer coordinates, but instead their own entry in the repr:
"Indexes" might not even need to be part of the main In this model:
|
{ "total_count": 5, "+1": 5, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
334043044 | https://github.com/pydata/xarray/issues/1603#issuecomment-334043044 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNDA0MzA0NA== | fujiisoup 6815844 | 2017-10-04T03:51:57Z | 2017-10-04T03:51:57Z | MEMBER | I think we currently assume It sounds like a much cleaner data model. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
334030279 | https://github.com/pydata/xarray/issues/1603#issuecomment-334030279 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNDAzMDI3OQ== | shoyer 1217238 | 2017-10-04T02:03:39Z | 2017-10-04T02:03:39Z | MEMBER | One API design challenge here is that I think we still want an explicit notation of "indexed" variables. We could possibly allow operations like |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 | |
334029215 | https://github.com/pydata/xarray/issues/1603#issuecomment-334029215 | https://api.github.com/repos/pydata/xarray/issues/1603 | MDEyOklzc3VlQ29tbWVudDMzNDAyOTIxNQ== | fujiisoup 6815844 | 2017-10-04T01:55:02Z | 2017-10-04T01:55:02Z | MEMBER | I'm using Consider the following example,

```python
In [1]: import numpy as np
   ...: import xarray as xr
   ...: da = xr.DataArray(np.arange(5), dims=['x'],
   ...:                   coords={'experiment': ('x', [0, 0, 0, 1, 1]),
   ...:                           'time': ('x', [0.0, 0.1, 0.2, 0.0, 0.15])})
   ...:

In [2]: da
Out[2]:
<xarray.DataArray (x: 5)>
array([0, 1, 2, 3, 4])
Coordinates:
    experiment  (x) int64 0 0 0 1 1
    time        (x) float64 0.0 0.1 0.2 0.0 0.15
Dimensions without coordinates: x
```

I want to do something like this
If we could make a selection from a non-index coordinate,
I think there should be other important use cases of |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978 |
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) ); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);