github: issue_comments: 12 rows where author_association = "MEMBER", issue = 262642978 and user = 4160723 sorted by updated

12 rows where author_association = "MEMBER", issue = 262642978 and user = 4160723 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
1259326037	https://github.com/pydata/xarray/issues/1603#issuecomment-1259326037	https://api.github.com/repos/pydata/xarray/issues/1603	IC_kwDOAMm_X85LD8pV	benbovy 4160723	2022-09-27T10:50:36Z	2022-09-27T10:50:36Z	MEMBER	Should we close this issue and continue the discussion in #6293? For anyone who wants to track the progress on this topic: https://github.com/pydata/xarray/projects/1	{ "total_count": 2, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 2, "eyes": 0 }	Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
949494376	https://github.com/pydata/xarray/issues/1603#issuecomment-949494376	https://api.github.com/repos/pydata/xarray/issues/1603	IC_kwDOAMm_X844mCJo	benbovy 4160723	2021-10-22T10:27:26Z	2021-10-22T10:27:26Z	MEMBER	well, both "contain the origin dims" or just "generate another one" have its benefit. Agreed, and both are supported by xarray actually. In case we want to keep the original dimensions like ("x", "y") in the example above, it's better to use masking. This discussion is broader than the topic covered in this issue so I'd suggest you start a new discussion if you want to further discuss this with the xarray community. Thanks.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
949449312	https://github.com/pydata/xarray/issues/1603#issuecomment-949449312	https://api.github.com/repos/pydata/xarray/issues/1603	IC_kwDOAMm_X844l3Jg	benbovy 4160723	2021-10-22T09:28:01Z	2021-10-22T09:28:01Z	MEMBER	For such a case you could already do `ds.stack(z=("t", "x")).set_index(z="C2").sel(z=["a", "e", "h"])`. After the explicit index refactor, we could imagine a custom index that supports multi-dimension coordinates such that you would only need to do something like ```python S_res = S4.sel(C2=("z", ["a", "e", "h"])) S_res <xarray.Dataset> Dimensions: (z: 3) Coordinates: * C2 (z) <U1 'a' 'e' 'h' Data variables: A1 (z) float64 4 3 3 ``` or without explicitly providing the name of the packed dimension: ```python S_res = S4.sel(C2=["a", "e", "h"]) S_res <xarray.Dataset> Dimensions: (C2: 3) Coordinates: * C2 (C2) <U1 'a' 'e' 'h' Data variables: A1 (C2) float64 4 3 3 ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
949413144	https://github.com/pydata/xarray/issues/1603#issuecomment-949413144	https://api.github.com/repos/pydata/xarray/issues/1603	IC_kwDOAMm_X844luUY	benbovy 4160723	2021-10-22T08:41:36Z	2021-10-22T08:41:36Z	MEMBER	Sorry but this is confusing. To me it still looks like you want implicit broadcasting of the `A3` variable along the `x` dimension. In your last comment you depict `A3` inconsistently, with a 2-d shape but with only the `t` dimension. I'm also not sure how your suggestion relates to the issue here.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
949358898	https://github.com/pydata/xarray/issues/1603#issuecomment-949358898	https://api.github.com/repos/pydata/xarray/issues/1603	IC_kwDOAMm_X844lhEy	benbovy 4160723	2021-10-22T07:22:24Z	2021-10-22T07:22:24Z	MEMBER	Thanks for the detailed description @weipeng1999. For the first 4 slides, I don't see how this is different from how `S_res = S1.sel(C1=['a', 'b'])` and `S_res = S2.sel(C1=['a', 'b'])` currently behave. And for the last 2 slides, I don't think that we always want such implicit broadcasting for dimensions that are not involved in the indexed coordinates.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
946474674	https://github.com/pydata/xarray/issues/1603#issuecomment-946474674	https://api.github.com/repos/pydata/xarray/issues/1603	IC_kwDOAMm_X844ag6y	benbovy 4160723	2021-10-19T08:19:54Z	2021-10-19T08:19:54Z	MEMBER	Hi @weipeng1999, I'm not sure I fully understand your suggestion, would you mind sharing some illustrative examples? It is useful to have two distinct `coordinate variable` vs `data variable` concepts. Although both are data arrays, the former is used to locate data in the dimensional space(s) defined by all dimensions in the dataset while the latter is used to store field data. It also helps to have a clear separation between the `coordinate variable` and `index` concepts. An index is a specific data structure or object that allows efficient data extraction or alignment based on one or more coordinate labels. Sometimes an index object may be handled like a data array (like pandas indexes) but this is not always the case (e.g., a KD-Tree). Currently in Xarray the `index` concept is hidden behind "dimension" coordinate variables. The goal of the explicit index refactor is to bring it to light and make it available to any coordinate (and also open it to custom index structures, not only pandas indexes). It looks like what you suggest is some kind of implicit (co-)indexes hidden behind any dataset variable(s)? We actually took the opposite direction, trying to make everything explicit.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
444403484	https://github.com/pydata/xarray/issues/1603#issuecomment-444403484	https://api.github.com/repos/pydata/xarray/issues/1603	MDEyOklzc3VlQ29tbWVudDQ0NDQwMzQ4NA==	benbovy 4160723	2018-12-05T08:39:35Z	2018-12-05T08:39:35Z	MEMBER	I guess the error is probably the best idea. Agreed. It seems very strict indeed, but it will be easier to relax this later than the other way. There is also a (very rare?) case where the two indexed coordinates have the same labels but are named differently in the two datasets (e.g., `station_name` and `sname`). In that case an error is probably better too. It would be a sort of indication that the most useful thing to do for future operations is to rename one of those coordinates first.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
444132393	https://github.com/pydata/xarray/issues/1603#issuecomment-444132393	https://api.github.com/repos/pydata/xarray/issues/1603	MDEyOklzc3VlQ29tbWVudDQ0NDEzMjM5Mw==	benbovy 4160723	2018-12-04T15:06:21Z	2018-12-04T15:19:08Z	MEMBER	It occurs to me that for the case of "multiple single indexes" along the same dimension there is no good way to use them simultaneously for indexing/reindexing at the same time. Sorry for maybe asking this again but I'm a bit confused now: is there any good reason of supporting "multiple single indexes" along the same dimension? After all, perhaps better defaults would be to set indexes (`pandas.Index`) only for 1-d coordinates matching dimension names, like it is the case now. If you want a different behavior, then you need to use `.set_index()`, which would raise if it results in multiple single indexes along a dimension. We could also add a new `indexes` argument to the `Dataset` / `DataArray` constructors to save some typing (and avoid the creation of in-memory `pandas.Index` for very long coordinates if an out-of-core alternative is later supported). da[dim_name] should return all the indexes on that dimension I think that one big source of confusion has been so far mixing coordinates/variables and indexes. These are really two separate concepts, and the indexes refactoring should address that IMHO. For example, I think that `da[some_name]` should never return indexes but only coordinates (and/or data variables for Dataset). That would be much simpler. Take for example ```python da = xr.DataArray(np.random.rand(2, 2), ... dims=('one', 'two'), ... 
coords={'one_labels': ('one', ['a', 'b'])}) da <xarray.DataArray (one: 2, two: 2)> array([[ 0.536028, 0.291895], [ 0.682108, 0.926003]]) Coordinates: one_labels (one) <U1 'a' 'b' Dimensions without coordinates: one, two ``` I find it so weird being able to do this: ```python da['one'] <xarray.DataArray 'one' (one: 2)> array([0, 1]) Coordinates: one_labels (one) <U1 'a' 'b' Dimensions without coordinates: one ``` Where does `array([0, 1])` come from? I wouldn't have been surprised if a `KeyError` was raised instead. Perhaps this specific case was initially for backward compatibility when the "dimensions without indexes" feature was introduced, but it was a long time ago and I'm not sure this is still necessary. It might be a good thing to explicitly require `da.set_index('one_labels')` to enable indexing/alignment (edit: label indexing/alignment) along dimension `one` in the example above.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
443172604	https://github.com/pydata/xarray/issues/1603#issuecomment-443172604	https://api.github.com/repos/pydata/xarray/issues/1603	MDEyOklzc3VlQ29tbWVudDQ0MzE3MjYwNA==	benbovy 4160723	2018-11-30T11:14:24Z	2018-11-30T11:14:24Z	MEMBER	A couple of thoughts: If nothing useful can be done in the case of "multiple single indexes", would it make sense to discourage users from explicitly creating multiple single indexes along a dimension? "Multiple single indexes" would be just a default situation when nothing specific has been defined yet or resulting from a fallback. For example, why not require that `set_index(['x', 'y'])` (with a list as argument) should always result in a multi-index regardless of the `kind` argument, i.e., raise if a single index is given? This is close to the current behavior, I think. This would require calling `set_index` for each single index that we want to (re)define, but I don't think setting a lot of single indexes at the same time is something that often happens. Hence, would it be possible to avoid `append=None` and instead change the default to `append=True`?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
442907394	https://github.com/pydata/xarray/issues/1603#issuecomment-442907394	https://api.github.com/repos/pydata/xarray/issues/1603	MDEyOklzc3VlQ29tbWVudDQ0MjkwNzM5NA==	benbovy 4160723	2018-11-29T16:49:12Z	2018-11-29T17:18:10Z	MEMBER	ds.sel(multi=list_of_pairs) can probably be replaced by ds.sel(x=..., y=...), but how about reindex along MultiIndex? Indeed I haven't really thought about `reindex` and alignment in my suggestion above. How do you currently `reindex` along a multi-index dimension? Contrary to `.sel`, `ds.reindex(multi=list_of_pairs)` doesn't seem to work (the list of n-length tuples being interpreted as a ~~n-dim~~ 2-d array). The only way I've found to make it work is to pass another `pandas.MultiIndex`. Wouldn't it be rather confusing if we choose to go with our own implementation of MultiIndex for xarray instead of `pandas.MultiIndex`? Wouldn't it be possible to easily support `ds.reindex(x=..., y=...)` within the new data model proposed here? Am I right in thinking the Multi-indexes is only a helpful note to users, rather than conveying anything about how data is accessed? This is a good question. A related question: apart from the `ds.sel(multi=list_of_pairs)` and `ds.reindex(multi=list_of_pairs)` use cases discussed so far, are there other reasons for having a variable for a multi-index? I think we can do much of this before adding the ability to set custom indexes, which would be cool but further from where we are, I think. I agree, although whether or not we will eventually support custom indexes might influence the design choices that we have to make now, IMO.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
442797084	https://github.com/pydata/xarray/issues/1603#issuecomment-442797084	https://api.github.com/repos/pydata/xarray/issues/1603	MDEyOklzc3VlQ29tbWVudDQ0Mjc5NzA4NA==	benbovy 4160723	2018-11-29T11:15:17Z	2018-11-29T11:15:17Z	MEMBER	we will definitely have to make some intentional deviations from the behavior of pandas Looking at the reported issues related to multi-indexes in xarray, I have the same feeling. Simply reusing `pandas.MultiIndex` in xarray where slightly different semantics are generally expected has shown to be painful. It seems easier to have our own baked solution and deal with differences during xarray<-> pandas conversion if needed. If we re-design indexes so that we allow 3rd-party indexes, maybe we could support both and let the user choose the one (xarray or pandas baked) that best suits his needs? Regarding MultiIndex as part of the data schema vs an implementation detail, if we support extending indexes (and already given the different kinds of multi-coordinate indexes: MultiIndex, KDTree, etc.), then I think that it should be transparent to the user. However, I don't really see why a multi-coordinate index should have its own variable (with tuples of values). I don't want to speak for others, but IMHO `ds.sel(multi=list_of_pairs)` is rather a edge case and I'm not sure if we really need to support it. Using `ds.sel(x=..., y=...)` with DataArray objects is certainly more code to write, but this form of indexing is very powerful and it might not be a bad idea to encourage it. If a variable for each multi-coordinate index is "just" for data schema consistency, then why not showing all those indexes in a separate section of the repr? 
For example: `Coordinates: * level_1 (x) object 'a' 'a' 'b' 'b' * level_2 (x) int64 1 2 1 2 Multi-indexes: pandas.MultiIndex [level_1, level_2]` It is equally transparent, not more verbose, and it is clear that multi-indexes are not part of the coordinates (in fact there is no need for "virtual" coordinates either, nor to name the index). I don't think single indexes should be shown here as it would result in duplicated, uninformative lines. More generally, here is how I would see indexes handled in xarray (I might be missing important aspects, though): Default behavior: all 1-dimensional coordinates each have their own, single index (`pandas.Index`), unless explicitly stated. Explicit API is used for setting new, possibly multi-coordinate indexes. Note the absence of a keyword argument below to specify the variables: This is actually more consistent with the pandas API but this would be a breaking change and I don't know what a smooth transition could look like. `set_index(['x', 'y'], kind='multiindex') # xarray built-in index` `set_index(['x', 'y'], kind='kdtree') # xarray built-in index` `set_index('x', kind=ASingleIndexWrapperClass) # 3rd-party index` If a coordinate is removed from the Dataset or if its index is reset or changed: If the coordinate had a single index, no problem. If the coordinate was part of a multi-coordinate index: a new index is built from all remaining coordinates that were also part of the original index, if it is supported. Otherwise, the original index is removed and the default behavior (single `pandas.Index`) is reset for all those remaining coordinates.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
334091075	https://github.com/pydata/xarray/issues/1603#issuecomment-334091075	https://api.github.com/repos/pydata/xarray/issues/1603	MDEyOklzc3VlQ29tbWVudDMzNDA5MTA3NQ==	benbovy 4160723	2017-10-04T08:52:08Z	2017-10-04T08:52:08Z	MEMBER	I think that promoting "Indexes" to a first-class concept is indeed a very good idea, at both internal and public levels, even if at the latter level it would be another concept for users (it should be already familiar for pandas users, though). IMHO the "coordinate" and "index" concepts are different enough to consider them separately. I like the proposed repr for `Dataset.indexes`. I wouldn't mind if it is not included in `Dataset.__repr__`, considering that multi-indexes, kdtree, etc. only represent a few use cases. In too many cases it could result in a long, uninformative list of simple `pandas.Index`. I have to think a bit more about the details but I like the idea.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
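
The filter described at the top of this page (comments where author_association = "MEMBER", issue = 262642978 and user = 4160723, sorted by updated_at descending) can be reproduced against the schema above. What follows is a minimal sketch using Python's standard `sqlite3` module, not the export tool's own query. It makes several assumptions: the database is in-memory, the `REFERENCES` clauses are dropped because the `users` and `issues` tables are not shown here, the helper name `member_comments` is hypothetical, and the sample rows are abbreviated (bodies omitted, reactions reduced to the non-zero keys, plus one made-up non-matching row to show the filter working).

```python
import json
import sqlite3

# In-memory database with the issue_comments schema shown above.
# Assumption: REFERENCES clauses are omitted since users/issues are not defined here.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [html_url] TEXT,
       [issue_url] TEXT,
       [id] INTEGER PRIMARY KEY,
       [node_id] TEXT,
       [user] INTEGER,
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [body] TEXT,
       [reactions] TEXT,
       [performed_via_github_app] TEXT,
       [issue] INTEGER
    );
    CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
    CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
    """
)

# Abbreviated sample rows: (id, user, updated_at, author_association, reactions, issue).
# The first two ids and timestamps come from the table above; the third row is hypothetical.
sample_rows = [
    (334091075, 4160723, "2017-10-04T08:52:08Z", "MEMBER",
     '{"total_count": 1, "+1": 1}', 262642978),
    (1259326037, 4160723, "2022-09-27T10:50:36Z", "MEMBER",
     '{"total_count": 2, "rocket": 2}', 262642978),
    (1, 99, "2020-01-01T00:00:00Z", "NONE",
     '{"total_count": 0}', 262642978),
]
conn.executemany(
    "INSERT INTO issue_comments "
    "([id], [user], [updated_at], [author_association], [reactions], [issue]) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    sample_rows,
)


def member_comments(connection, issue_id, user_id):
    """Return (id, updated_at, reactions dict) tuples, newest update first."""
    cursor = connection.execute(
        "SELECT [id], [updated_at], [reactions] FROM [issue_comments] "
        "WHERE [author_association] = ? AND [issue] = ? AND [user] = ? "
        "ORDER BY [updated_at] DESC",
        ("MEMBER", issue_id, user_id),
    )
    # The reactions column stores JSON text, so decode it per row.
    return [(cid, updated, json.loads(reactions)) for cid, updated, reactions in cursor]


rows = member_comments(conn, 262642978, 4160723)
for comment_id, updated_at, reactions in rows:
    print(comment_id, updated_at, reactions["total_count"])
```

Because the timestamps are ISO 8601 strings in UTC, ordering the `updated_at` TEXT column lexicographically also orders it chronologically, so no date parsing is needed for the sort.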