issue_comments


9 rows where author_association = "NONE" and user = 38346144 sorted by updated_at descending


issue 3

  • Explicit indexes in xarray's data-model (Future of MultiIndex) 6
  • Support .reindex with DataArrays and Dataset as indexers 2
  • Linear algebra support 1

user 1

  • weipeng1999 · 9 ✖

author_association 1

  • NONE · 9 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1287593498 https://github.com/pydata/xarray/issues/7193#issuecomment-1287593498 https://api.github.com/repos/pydata/xarray/issues/7193 IC_kwDOAMm_X85Mvx4a weipeng1999 38346144 2022-10-22T02:51:50Z 2022-10-22T09:51:43Z NONE

Thanks @weipeng1999. Can you please provide a minimal example showing the syntax and expected output?

Can I just copy from the docs and use comments to mark the change?

``` python
In [100]: da = xr.DataArray(
   ....:     np.random.rand(4, 2),
   ....:     [
   ....:         ("time", pd.date_range("2000-01-01", periods=4)),
   ....:         ("space", ["IA", "IL"]),  # does not have the "IN" label
   ....:     ],
   ....: )

In [101]: times = xr.DataArray(
   ....:     pd.to_datetime(["2000-01-03", "2000-01-02", "2000-01-01"]), dims="new_time"
   ....: )

In [102]: # use .reindex instead of .sel
   ....: # and give the parameter "fill_value"
   ....: da.reindex(space=xr.DataArray(["IA", "IL", "IN"], dims=["new_time"]), time=times, fill_value=np.nan)
Out[102]:
<xarray.DataArray (new_time: 3)>
array([0.92, 0.34,  NaN])  # so the missing value is filled with np.nan
Coordinates:
    time      (new_time) datetime64[ns] 2000-01-03 2000-01-02 2000-01-01
    space     (new_time) <U2 'IA' 'IL' 'IN'
  * new_time  (new_time) datetime64[ns] 2000-01-03 2000-01-02 2000-01-01
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support .reindex with DataArrays and Dataset as indexers 1417641930
1287594393 https://github.com/pydata/xarray/issues/7193#issuecomment-1287594393 https://api.github.com/repos/pydata/xarray/issues/7193 IC_kwDOAMm_X85MvyGZ weipeng1999 38346144 2022-10-22T02:57:50Z 2022-10-22T02:57:50Z NONE

So we can guarantee that:

  • .reindex: sets missing values to fill_value (the default is NaN), so the result may contain NaN.
  • .fillna: handles the NaN values in the data, so the result will not contain NaN.
  • .sel: does not change whether the data contains NaN.
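
A minimal sketch of these three guarantees, using only the reindex, fillna and sel calls that xarray already provides for plain label lists. It does not use the DataArray-indexer form of .reindex proposed in this issue, and the array values and labels are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative data: the "space" coordinate has no "IN" label.
da = xr.DataArray(
    np.arange(8.0).reshape(4, 2),
    coords=[
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL"]),
    ],
)

# .reindex: the missing "IN" label is filled with fill_value, so NaN may appear.
reindexed = da.reindex(space=["IA", "IL", "IN"], fill_value=np.nan)

# .fillna: replaces the NaN values, so the result contains none.
filled = reindexed.fillna(0.0)

# .sel: only selects existing labels, so it neither adds nor removes NaN.
selected = da.sel(space=["IA"])
```

Here reindexed is the only result that can contain NaN; filled and selected contain none for this input.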
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support .reindex with DataArrays and Dataset as indexers 1417641930
949485684 https://github.com/pydata/xarray/issues/1603#issuecomment-949485684 https://api.github.com/repos/pydata/xarray/issues/1603 IC_kwDOAMm_X844mAB0 weipeng1999 38346144 2021-10-22T10:15:39Z 2021-10-22T10:15:39Z NONE

So I think keeping the original dims would break less existing code.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
949484507 https://github.com/pydata/xarray/issues/1603#issuecomment-949484507 https://api.github.com/repos/pydata/xarray/issues/1603 IC_kwDOAMm_X844l_vb weipeng1999 38346144 2021-10-22T10:14:01Z 2021-10-22T10:14:01Z NONE

For such case you could already do ds.stack(z=("t", "x")).set_index(z="C2").sel(z=["a", "e", "h"]).

After the explicit index refactor, we could imagine a custom index that supports multi-dimension coordinates such that you would only need to do something like

```python
S_res = S4.sel(C2=("z", ["a", "e", "h"]))
S_res
<xarray.Dataset>
Dimensions:  (z: 3)
Coordinates:
  * C2       (z) <U1 'a' 'e' 'h'
Data variables:
    A1       (z) float64 4 3 3
```

or without explicitly providing the name of the packed dimension:

```python
S_res = S4.sel(C2=["a", "e", "h"])
S_res
<xarray.Dataset>
Dimensions:  (C2: 3)
Coordinates:
  * C2       (C2) <U1 'a' 'e' 'h'
Data variables:
    A1       (C2) float64 4 3 3
```

Well, both "keep the original dims" and "generate a new one" have their benefits. If we keep the original dims, we can ensure that:

  • There is less difference between 1-D coordinates and multi-dimensional ones: both S1.sel(C1=["a", "e", "h"]) and S4.sel(C2=["a", "e", "h"]) work and return a new dataset with the original dims (that is why I strongly advise against the implicit one).
  • The returned dataset has the original dims, which means that if you change C1 to C2, later code such as S_res.sel(x=[1,2,3]) still works.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
949423480 https://github.com/pydata/xarray/issues/1603#issuecomment-949423480 https://api.github.com/repos/pydata/xarray/issues/1603 IC_kwDOAMm_X844lw14 weipeng1999 38346144 2021-10-22T08:56:38Z 2021-10-22T09:15:17Z NONE

Well, here are my ideas on how to define coordinates with multiple dims. (Because of a GitHub bug, the characters in the first image are white; I cannot fix it.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
949401881 https://github.com/pydata/xarray/issues/1603#issuecomment-949401881 https://api.github.com/repos/pydata/xarray/issues/1603 IC_kwDOAMm_X844lrkZ weipeng1999 38346144 2021-10-22T08:25:54Z 2021-10-22T08:25:54Z NONE

Thanks for the detailed description @weipeng1999. For the first 4 slides I don't see how this is different from what S_res = S1.sel(C1=['a', 'b']) and S_res = S2.sel(C1=['a', 'b']) currently do. And for the last 2 slides, I don't think that we always want such implicit broadcasting for dimensions that are not involved in the indexed coordinates.

Thank you for pointing out what I got wrong. The idea is hard to explain because it is a bit complicated. The last two pictures are wrong and cause misunderstanding; here are two images that explain what I actually mean:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
947480352 https://github.com/pydata/xarray/issues/1603#issuecomment-947480352 https://api.github.com/repos/pydata/xarray/issues/1603 IC_kwDOAMm_X844eWcg weipeng1999 38346144 2021-10-20T09:15:41Z 2021-10-20T09:15:41Z NONE

Hi @weipeng1999,

I'm not sure I fully understand your suggestion; would you mind sharing some illustrative examples?

It is useful to have two distinct coordinate variable vs data variable concepts. Although both are data arrays, the former is used to locate data in the dimensional space(s) defined by all dimensions in the dataset while the latter is used to store field data.

It also helps to have a clear separation between the coordinate variable and index concepts. An index is a specific data structure or object that allows efficient data extraction or alignment based on one or more coordinate labels. Sometimes an index object may be handled like a data array (like pandas indexes) but this is not always the case (e.g., a KD-Tree).

Currently in Xarray the index concept is hidden behind "dimension" coordinate variables. The goal of the explicit index refactor is to bring it to the light and make it available to any coordinate (and also open it to custom index structures, not only pandas indexes).

It looks like what you suggest is some kind of implicit (co-)indexes hidden behind any dataset variable(s)? We actually took the opposite direction, trying to make everything explicit.
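
The distinction drawn above between data variables, coordinate variables and indexes can be seen on a small dataset. This is only an illustrative sketch: the names A1 and C2 are borrowed from the examples in this thread, the values are made up, and it shows the default behavior in which only "dimension" coordinates are backed by an index.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    # Data variable: stores field data.
    data_vars={"A1": (("t", "x"), np.arange(6.0).reshape(2, 3))},
    coords={
        # Dimension coordinates: 1-D, named after their dimension, indexed by default.
        "t": [0, 1],
        "x": [10, 20, 30],
        # Non-dimension 2-D coordinate: labels data but has no index by default.
        "C2": (("t", "x"), [["a", "b", "c"], ["d", "e", "f"]]),
    },
)

coord_names = set(ds.coords)      # all coordinate variables
data_names = set(ds.data_vars)    # all data variables
indexed_names = set(ds.indexes)   # coordinates that currently carry an index
```

With this setup C2 is a coordinate without an index, so label-based selection on it is not available by default, which is the gap the explicit index refactor aims to address.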

To try to explain my idea, I made a PPT.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
946337314 https://github.com/pydata/xarray/issues/1603#issuecomment-946337314 https://api.github.com/repos/pydata/xarray/issues/1603 IC_kwDOAMm_X844Z_Yi weipeng1999 38346144 2021-10-19T03:32:13Z 2021-10-19T03:33:54Z NONE

Well, maybe we can consider the coordinates in a more generic way.

Let us define a coordinate as an array in a dataset that causes co-indexing when we index that dataset. It means that:

  • If A1, A2, A3 are in the same dataset S, indexing S[ {'A1':I} ] returns a new dataset in which not only A1 is indexed: A2 and A3 are indexed as well, where they share dims with A1. I call this behavior co-indexing.

The dims determine how the other arrays of the dataset are co-indexed.

  • If all dims of A1 (as the coordinate) are also in A2 (as a regular, co-indexed array), the behavior can simply follow the old behavior: change A2 only along the shared dims and keep the others.
  • If A1 has a dim that is not in A2, we should broadcast A2 along that dim, because the existing behavior treats a missing dim as broadcastable in other operations, so co-indexing should follow it.
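
The "old behavior" referred to in the first bullet can be sketched with xarray's existing Dataset.sel along a dimension coordinate. The dataset below is made up for illustration, and the sketch does not implement the broadcasting proposed in the second bullet.

```python
import numpy as np
import xarray as xr

S = xr.Dataset(
    {
        "A1": ("x", np.arange(3.0)),                       # dims: x
        "A2": (("x", "y"), np.arange(6.0).reshape(3, 2)),  # dims: x, y (shares x)
        "A3": ("y", np.arange(2.0)),                       # dims: y (does not share x)
    },
    coords={"x": [10, 20, 30]},
)

# Indexing the dataset along x indexes every variable that has the x dim
# and leaves variables without it untouched.
S_res = S.sel(x=[10, 30])
```

A1 and A2 are both reduced along x, while A3 keeps its original shape.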

Some compatibility issues:

  • We may need a new type like DataArray that has only dims instead of both dims and coordinates.
  • This only defines how Dataset deals with indexing; DataArray may be similar.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
533563714 https://github.com/pydata/xarray/issues/3322#issuecomment-533563714 https://api.github.com/repos/pydata/xarray/issues/3322 MDEyOklzc3VlQ29tbWVudDUzMzU2MzcxNA== weipeng1999 38346144 2019-09-20T13:54:40Z 2019-09-20T14:02:46Z NONE

Hi @weipeng1999, could you link the reference implementation in numpy/scipy?

I think this would be niche-ish. I would personally try to keep xarray free of functionality that only a tiny fraction of the users actually use - particularly when such functionality can be implemented with a trivial wrapper by the users themselves. e.g. at the moment we have exactly one scipy function being wrapped, and that's linear interpolation which is useful to a lot of people.

I think this falls into a more general discussion on how niche a function must be in order to be excluded from the library - @shoyer what's your opinion?

Regardless, I would like to point you to https://xarray-extras.readthedocs.io which is a module that I created exactly for this kind of cases (PRs are welcome).
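
The kind of user-side wrapper mentioned above might look like the following sketch, which wraps numpy.linalg.qr with xarray.apply_ufunc. The helper name qr, its parameters and the new dimension name "k" are hypothetical choices for illustration; this is not the code from the attached qr.txt, nor an xarray or xarray-extras API.

```python
import numpy as np
import xarray as xr


def qr(da, row_dim, col_dim, new_dim="k"):
    """Hypothetical helper: reduced QR decomposition over two named dims."""
    q, r = xr.apply_ufunc(
        np.linalg.qr,
        da,
        # The matrix dims are moved to the end before calling numpy.
        input_core_dims=[[row_dim, col_dim]],
        # Q has dims (row, k) and R has dims (k, col).
        output_core_dims=[[row_dim, new_dim], [new_dim, col_dim]],
    )
    return q, r


rng = np.random.default_rng(0)
a = xr.DataArray(rng.random((4, 3)), dims=("row", "col"))
q, r = qr(a, "row", "col")

# Multiplying Q and R back together over "k" should recover the input.
reconstructed = (q * r).sum("k")
```

Keeping the dims as explicit arguments avoids any assumption about axis order in the input array.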

I realize that I am a complete beginner. Here is my trial implementation; I think I have a long way to go before it is committable.

qr.txt

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Linear algebra support 495799492


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);