
issue_comments


6 rows where issue = 1175329407 sorted by updated_at descending




user 3

  • benbovy 3
  • max-sixty 2
  • keewis 1

issue 1

  • Pass indexes to the Dataset and DataArray constructors · 6 ✖

author_association 1

  • MEMBER 6
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1290454937 https://github.com/pydata/xarray/issues/6392#issuecomment-1290454937 https://api.github.com/repos/pydata/xarray/issues/6392 IC_kwDOAMm_X85M6seZ benbovy 4160723 2022-10-25T12:19:52Z 2022-10-25T12:19:52Z MEMBER

I'm thinking of only accepting one or more instances of Indexes as the indexes argument in the Dataset and DataArray constructors. The only exception is when fastpath=True, in which case a mapping can be given directly.

  • It is much easier to handle: just check that the keys returned by Indexes.variables do not conflict with the coordinate names in the coords argument
  • It is slightly safer: it requires the user to explicitly create an Indexes object, so there is less chance of accidentally providing coordinate variables and index objects that do not relate to each other (we could probably add some safeguards in the Indexes class itself)
  • It is more convenient: an Xarray Index may provide a factory method that returns an instance of Indexes that we just need to pass as indexes
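The check described in the first bullet can be sketched with plain Python stand-ins. ToyIndexes and validate_indexes are hypothetical names, not xarray API; the sketch assumes an Indexes-like object exposes its coordinate variables as a name-to-values mapping called variables, and that a raw mapping is accepted only when fastpath=True.

```python
from collections.abc import Mapping
from typing import Any


class ToyIndexes:
    """Hypothetical stand-in for xarray's Indexes: index objects plus their variables."""

    def __init__(self, indexes: Mapping[str, Any], variables: Mapping[str, Any]):
        self.indexes = dict(indexes)
        self.variables = dict(variables)


def validate_indexes(indexes: Any, coords: Mapping[str, Any], fastpath: bool = False) -> dict:
    """Return merged coordinates, or raise if `indexes` is not acceptable (hypothetical)."""
    if isinstance(indexes, ToyIndexes):
        # the only check needed: index variable names must not clash with `coords`
        conflicts = sorted(set(indexes.variables) & set(coords))
        if conflicts:
            raise ValueError(f"coordinate(s) {conflicts} conflict with index variables")
        return {**coords, **indexes.variables}
    if fastpath and isinstance(indexes, Mapping):
        # fastpath=True: a raw mapping of index objects is trusted and not checked
        return dict(coords)
    raise TypeError("indexes must be an Indexes instance unless fastpath=True")
```

Requiring the container type is what keeps the validation down to a single set intersection: the index objects and their variables arrive already paired.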
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes to the Dataset and DataArray constructors 1175329407
1260618693 https://github.com/pydata/xarray/issues/6392#issuecomment-1260618693 https://api.github.com/repos/pydata/xarray/issues/6392 IC_kwDOAMm_X85LI4PF benbovy 4160723 2022-09-28T09:13:00Z 2022-09-28T12:52:01Z MEMBER

How would we handle creating xarray objects from pandas objects where they have a multiindex?

For pandas.Series / pandas.DataFrame objects, DataArray.from_series() / Dataset.from_dataframe() already expand multi-index levels as dimensions.

For a pandas.MultiIndex, we could do like below but it is a bit tedious:

```python
import pandas as pd
import xarray as xr
from xarray.indexes import PandasMultiIndex

pd_idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("foo", "bar"))
idx = PandasMultiIndex(pd_idx, "x")

indexes = {"x": idx, "foo": idx, "bar": idx}
coords = idx.create_variables()

ds = xr.Dataset(coords=coords, indexes=indexes)
```

For more convenience, we could add a class method to PandasMultiIndex, e.g.,

```python
# this calls PandasMultiIndex.__init__() and PandasMultiIndex.create_variables() internally
indexes, coords = PandasMultiIndex.from_pandas_index(pd_idx, "x")

ds = xr.Dataset(coords=coords, indexes=indexes)
```
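To make the factory idea concrete without depending on pandas, here is a toy class whose from_tuples classmethod plays the role of the proposed PandasMultiIndex.from_pandas_index: it builds the index once and returns the (indexes, coords) pair. ToyMultiIndex and from_tuples are hypothetical; a real implementation would wrap a pandas.MultiIndex instead of a list of tuples.

```python
from typing import Any, Sequence


class ToyMultiIndex:
    """Hypothetical multi-index along `dim` whose levels are named `names`."""

    def __init__(self, tuples: Sequence[tuple], names: Sequence[str], dim: str):
        self.tuples = list(tuples)
        self.names = list(names)
        self.dim = dim

    def create_variables(self) -> dict[str, list[Any]]:
        # one variable for the dimension (the tuples) plus one per level
        variables: dict[str, list[Any]] = {self.dim: list(self.tuples)}
        for i, name in enumerate(self.names):
            variables[name] = [t[i] for t in self.tuples]
        return variables

    @classmethod
    def from_tuples(cls, tuples, names, dim):
        """Factory: build the index, then return (indexes, coords) in one call."""
        idx = cls(tuples, names, dim)
        coords = idx.create_variables()
        # every coordinate name (dimension and levels) maps to the same index object
        indexes = {name: idx for name in coords}
        return indexes, coords
```

The caller never writes {"x": idx, "foo": idx, "bar": idx} by hand, which is the tedious step the factory is meant to remove.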

Instead of returning raw indexes and coords dictionaries, we could return an instance of the Indexes class (also returned by Dataset.xindexes), which encapsulates the coordinate variables:

```python
xmidx = PandasMultiIndex.from_pandas_index(pd_idx, "x")

ds = xr.Dataset(coords=xmidx.variables, indexes=xmidx)
```

For even more convenience, I think it might be reasonable to support special handling of Indexes instances given in Dataset / DataArray constructors and in .update(), i.e.,

```python
# both cases below will implicitly add the coordinates found in xmidx
# (if there's no conflict with other coordinates)
ds = xr.Dataset(indexes=xmidx)

ds2 = xr.Dataset()
ds2.update(xmidx)
```
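The implicit-coordinate behavior can be sketched with a toy container and a toy dataset. ToyIndexes, ToyDataset, and this update logic are hypothetical stand-ins, assuming that an Indexes object exposes its coordinate variables via .variables and that a name clash with an existing coordinate should raise rather than overwrite.

```python
from typing import Any, Mapping, Optional


class ToyIndexes:
    """Hypothetical Indexes-like container: index objects plus the variables they own."""

    def __init__(self, indexes: Mapping[str, Any], variables: Mapping[str, Any]):
        self.indexes = dict(indexes)
        self.variables = dict(variables)


class ToyDataset:
    """Hypothetical dataset holding only coordinates and indexes."""

    def __init__(self, indexes: Optional[ToyIndexes] = None):
        self.coords: dict[str, Any] = {}
        self.xindexes: dict[str, Any] = {}
        if indexes is not None:
            # the constructor reuses the same path as update()
            self.update(indexes)

    def update(self, other: Any) -> None:
        if isinstance(other, ToyIndexes):
            # assumption: a clash with an existing coordinate is an error
            clash = sorted(set(other.variables) & set(self.coords))
            if clash:
                raise ValueError(f"conflicting coordinate(s): {clash}")
            # coordinates come along implicitly with their indexes
            self.coords.update(other.variables)
            self.xindexes.update(other.indexes)
        else:
            # simplification: plain mappings are merged as coordinates
            self.coords.update(other)
```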

The same approach could be used for pandas.IntervalIndex (as discussed in #4579).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes to the Dataset and DataArray constructors 1175329407
1082497324 https://github.com/pydata/xarray/issues/6392#issuecomment-1082497324 https://api.github.com/repos/pydata/xarray/issues/6392 IC_kwDOAMm_X85AhZks max-sixty 5635139 2022-03-30T00:32:48Z 2022-03-30T00:32:48Z MEMBER

Thanks for the thoughtful reply @benbovy

(This is a level down and you can make a decision later, so fine if you prefer to push the discussion.)

How would we handle creating xarray objects from pandas objects where they have a multiindex?

To what extent do you think this is the "standard case" and we could default to it?

```python
idx = xr.PandasMultiIndex(pd_idx, "x")
indexes = {"x": idx, "foo": idx, "bar": idx}
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes to the Dataset and DataArray constructors 1175329407
1080738079 https://github.com/pydata/xarray/issues/6392#issuecomment-1080738079 https://api.github.com/repos/pydata/xarray/issues/6392 IC_kwDOAMm_X85AasEf benbovy 4160723 2022-03-28T14:38:13Z 2022-03-28T14:38:13Z MEMBER

What's the rationale for deprecating this? I think my experience with users of xarray is mostly those coming from pandas; for them interop is quite important.

Yes, I agree that interoperability with pandas is important. Providing pandas (multi-)indexes via coords is convenient and has worked pretty well so far because (1) indexes and dimension coordinates were not clearly distinct concepts and (2) multi-index levels were not "real" coordinates. However, this is no longer the case.

Now that indexes are really distinct from coordinates, I'd rather expect the following behavior for the case of pandas multi-index:

```python
import numpy as np
import pandas as pd
import xarray as xr

pd_idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("foo", "bar"))

# converting a pandas multi-index to a numpy array returns level values as tuples
np.array(pd_idx)
# array([('a', 1), ('a', 2), ('b', 1), ('b', 2)], dtype=object)

# simply passing the index as a coordinate would treat it as an array-like, i.e., like numpy does
xr.Dataset(coords={"x": pd_idx})
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) object ('a', 1) ('a', 2) ('b', 1) ('b', 2)
# Data variables:
#     *empty*
```
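The difference between the two representations can be shown with the standard library alone: the Cartesian product of the level values yields the same tuples that np.array(pd_idx) returns, while unzipping those tuples recovers one sequence per level, which is what separate foo and bar coordinates would hold.

```python
from itertools import product

levels = {"foo": ["a", "b"], "bar": [1, 2]}

# same element order as pd.MultiIndex.from_product: the last level varies fastest
tuples = list(product(*levels.values()))
print(tuples)  # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]

# unzip the tuples into one sequence per level
level_values = dict(zip(levels, zip(*tuples)))
print(level_values)  # {'foo': ('a', 'a', 'b', 'b'), 'bar': (1, 2, 1, 2)}
```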

In this specific case, I'd favor consistency with how Numpy handles Pandas indexes over more convenient interoperability with Pandas. The array of tuple elements is not very useful, though. There should be ways to create Xarray objects with Pandas indexes, but I think it's better if we eventually pass them via indexes instead of via coords, or via both indexes and coords even if that's slightly less convenient.

More generally, I don't know how the ecosystem will evolve in the future (how many custom Xarray indexes?). I wonder to what extent Xarray's API should support special cases for Pandas (multi-)indexes compared to other kinds of indexes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes to the Dataset and DataArray constructors 1175329407
1080007416 https://github.com/pydata/xarray/issues/6392#issuecomment-1080007416 https://api.github.com/repos/pydata/xarray/issues/6392 IC_kwDOAMm_X85AX5r4 max-sixty 5635139 2022-03-27T19:54:44Z 2022-03-27T19:54:44Z MEMBER

I realize there's a lot here and I've been out of this thread for a bit, so please forgive any naive questions!

I would suggest deprecating this behavior in favor of a more explicit (although more verbose) way to pass an existing pandas multi-index:

What's the rationale for deprecating this? I think my experience with users of xarray is mostly those coming from pandas; for them interop is quite important. If there's a canonical way of transforming the index, it would be friendlier to do that automatically.

```python
import pandas as pd
import xarray as xr

pd_idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("foo", "bar"))
idx = pd_idx

ds = xr.Dataset(coords={"x": idx})
```

i.e.

```
ds = xr.Dataset(coords=coords)

# ValueError: missing index(es) for coordinate(s): 'x', 'foo', 'bar'

# or

# create unindexed coordinates 'foo' and 'bar' and a 'x' coordinate with a single pandas index
```

I would have expected the latter, both for coords=coords and for coords=pd_idx (again, with the disclaimer that I may be missing crucial parts of the puzzle here).

Should we silently reorder the coordinates and/or indexes when the levels are not passed in the right order? It seems odd to require mapping elements to be passed in a given order.

👍
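Silent reordering is cheap as long as the index knows its own level order. A minimal sketch, assuming the expected order is the dimension name followed by the level names; reorder_like is a hypothetical helper, not part of xarray.

```python
from typing import Any, Mapping, Sequence


def reorder_like(mapping: Mapping[str, Any], order: Sequence[str]) -> dict[str, Any]:
    """Return `mapping` as a dict whose keys follow `order` (hypothetical helper)."""
    missing = [name for name in order if name not in mapping]
    if missing:
        raise KeyError(f"missing entries for: {missing}")
    # names listed in `order` come first, in that order
    ordered = {name: mapping[name] for name in order}
    # any other entries (e.g. unrelated coordinates) keep their relative order
    extra = {name: value for name, value in mapping.items() if name not in ordered}
    return {**ordered, **extra}
```

With this, a user could pass {"bar": ..., "x": ..., "foo": ...} and the constructor would still see the entries in index order.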

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes to the Dataset and DataArray constructors 1175329407
1079981685 https://github.com/pydata/xarray/issues/6392#issuecomment-1079981685 https://api.github.com/repos/pydata/xarray/issues/6392 IC_kwDOAMm_X85AXzZ1 keewis 14808389 2022-03-27T17:39:59Z 2022-03-27T17:39:59Z MEMBER

I wonder if it would help to have a custom type that, unlike tuple, is invalid for coordinates / data variables but allows reducing the redundancy? E.g.

```python
indexes = {xr.combined("lat", "lon"): idx, xr.combined("z", "x", "y"): multi_index}
```

This would be immediately normalized to:

```python
indexes = {"lat": idx, "lon": idx, "z": multi_index, "x": multi_index, "y": multi_index}
```
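The normalization step can be sketched as a small key type plus an expansion function. Here combined is a hypothetical local class (spelled in lowercase to mirror the proposed xr.combined, which does not exist in xarray), and normalize_indexes is likewise illustrative. Using a dedicated type rather than a plain tuple is what lets the expansion tell grouped keys apart from ordinary hashable names.

```python
from typing import Any, Mapping


class combined:
    """Hypothetical grouping key: several coordinate names sharing one index."""

    def __init__(self, *names: str):
        self.names = names

    def __hash__(self) -> int:
        return hash(("combined", self.names))

    def __eq__(self, other: object) -> bool:
        return isinstance(other, combined) and self.names == other.names


def normalize_indexes(indexes: Mapping[Any, Any]) -> dict[str, Any]:
    """Expand `combined` keys so that every coordinate name maps to its index."""
    result: dict[str, Any] = {}
    for key, index in indexes.items():
        names = key.names if isinstance(key, combined) else (key,)
        for name in names:
            if name in result:
                raise ValueError(f"index given twice for coordinate {name!r}")
            result[name] = index
    return result
```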

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes to the Dataset and DataArray constructors 1175329407


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
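The listing at the top of this page (rows where issue = 1175329407, sorted by updated_at descending) can be reproduced against the schema above with Python's built-in sqlite3 module. This is a sketch using an in-memory table with a subset of the columns and the six rows shown on this page; it makes no assumptions about how the hosted database is served.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# subset of the issue_comments schema above (foreign keys omitted)
conn.execute(
    "CREATE TABLE issue_comments ("
    " id INTEGER PRIMARY KEY, user INTEGER, updated_at TEXT, issue INTEGER)"
)
rows = [
    (1290454937, 4160723, "2022-10-25T12:19:52Z", 1175329407),
    (1260618693, 4160723, "2022-09-28T12:52:01Z", 1175329407),
    (1082497324, 5635139, "2022-03-30T00:32:48Z", 1175329407),
    (1080738079, 4160723, "2022-03-28T14:38:13Z", 1175329407),
    (1080007416, 5635139, "2022-03-27T19:54:44Z", 1175329407),
    (1079981685, 14808389, "2022-03-27T17:39:59Z", 1175329407),
]
conn.executemany("INSERT INTO issue_comments VALUES (?, ?, ?, ?)", rows)

# ISO 8601 timestamps in one fixed format sort correctly as plain text
query = (
    "SELECT id, updated_at FROM issue_comments"
    " WHERE issue = ? ORDER BY updated_at DESC"
)
result = conn.execute(query, (1175329407,)).fetchall()
```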