

issues


12 rows where state = "open", type = "pull" and user = 4160723 sorted by updated_at descending



id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2227413822 PR_kwDOAMm_X85rz7ZX 8911 Refactor swap dims benbovy 4160723 open 0     5 2024-04-05T08:45:49Z 2024-04-17T16:46:34Z   MEMBER   1 pydata/xarray/pulls/8911
  • [ ] Attempt at fixing #8646
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

I've tried re-implementing swap_dims here using rename_dims, drop_indexes and set_xindex. This fixes the example in #8646, but unfortunately it fails to handle the pandas multi-index special case (i.e., a single non-dimension coordinate wrapping a pd.MultiIndex that, when promoted to a dimension coordinate in swap_dims, auto-magically results in a PandasMultiIndex with both dimension and level coordinates).
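For reference, a minimal sketch of the current swap_dims behavior being re-implemented here (toy data, not the #8646 example):

```python
import xarray as xr

# toy dataset: "y" is a non-dimension coordinate along dim "x"
ds = xr.Dataset(coords={"x": [1, 2, 3], "y": ("x", ["a", "b", "c"])})

# swap_dims renames the dimension and promotes "y" to an indexed
# dimension coordinate (with a default pandas index)
swapped = ds.swap_dims({"x": "y"})
```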

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8911/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2215059449 PR_kwDOAMm_X85rJr7c 8888 to_base_variable: coerce multiindex data to numpy array benbovy 4160723 open 0     3 2024-03-29T10:10:42Z 2024-03-29T15:54:19Z   MEMBER   0 pydata/xarray/pulls/8888
  • [x] Closes #8887, and probably supersedes #8809
  • [x] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • ~~New functions/methods are listed in api.rst~~

@slevang this should also make your test case added in #8809 work. I haven't added it here; instead, I added a basic check that should be enough.

I don't really understand why the serialization backends (zarr?) do not seem to work with the PandasMultiIndexingAdapter.__array__() implementation, which should normally coerce the multi-index levels into numpy arrays as needed. Anyway, I guess that coercing it early like in this PR doesn't hurt and may avoid the confusion of a non-indexed, isolated coordinate variable that still wraps a pandas.MultiIndex.
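A small illustration of the adapter-level coercion discussed here, using the public API (the internals of to_base_variable are not shown):

```python
import numpy as np
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_arrays([["a", "a", "b"], [0, 1, 0]], names=("one", "two"))
ds = xr.Dataset(coords=xr.Coordinates.from_pandas_multiindex(midx, "x"))

# The level coordinate wraps the pandas multi-index; np.asarray() goes
# through __array__() and coerces the level values to a plain numpy
# array, which is the coercion this PR applies earlier on
arr = np.asarray(ds["one"].variable)
```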

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8888/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1839199929 PR_kwDOAMm_X85XUl4W 8051 Allow setting (or skipping) new indexes in open_dataset benbovy 4160723 open 0     9 2023-08-07T10:53:46Z 2024-02-03T19:12:48Z   MEMBER   0 pydata/xarray/pulls/8051
  • [x] Closes #6633
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

This PR introduces a new boolean parameter set_indexes=True to xr.open_dataset(); setting it to False skips the creation of default (pandas) indexes when opening a dataset.

Currently works with the Zarr backend:

```python
import numpy as np
import xarray as xr

# example dataset (real dataset may be much larger)
arr = np.random.random(size=1_000_000)
xr.Dataset({"x": arr}).to_zarr("dataset.zarr")

xr.open_dataset("dataset.zarr", set_indexes=False, engine="zarr")
# <xarray.Dataset>
# Dimensions:  (x: 1000000)
# Coordinates:
#     x        (x) float64 ...
# Data variables:
#     *empty*

xr.open_zarr("dataset.zarr", set_indexes=False)
# <xarray.Dataset>
# Dimensions:  (x: 1000000)
# Coordinates:
#     x        (x) float64 ...
# Data variables:
#     *empty*
```

I'll add it to the other Xarray backends as well, but I'd like to get your thoughts about the API first.

  1. Do we want to add yet another keyword parameter to xr.open_dataset()? There are already many...
  2. Do we want to add this parameter to the BackendEntrypoint.open_dataset() API?
       • I'm afraid we must do it if we want this parameter in xr.open_dataset()
       • this would also make it possible to skip the creation of custom indexes (if any) in custom IO backends
       • con: if we require set_indexes in the signature in addition to the drop_variables parameter, this is a breaking change for all existing 3rd-party backends. Or should we group set_indexes with the other xarray decoder kwargs? This would feel a bit odd to me, as setting indexes is different from decoding data.
  3. Or should we leave this up to the backends?
       • pros: no breaking change; more flexible (3rd-party backends may want to offer more control, like choosing between custom indexes and default pandas indexes, or skipping the creation of indexes by default)
       • cons: less discoverable; consistency is not enforced across 3rd-party backends (although for such an advanced case this is probably OK); not available by default in every backend.

Currently 1 and 2 are implemented in this PR, although as I write this comment I think that I would prefer 3. I guess this depends on whether we prefer open_*** vs. xr.open_dataset(engine="***") and unless I missed something there is still no real consensus about that? (e.g., #7496).
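A rough sketch of what the backend-side option could look like; SketchBackend and its body are purely illustrative (a stand-in for real decoding), not part of this PR:

```python
import numpy as np
import xarray as xr

# Hypothetical 3rd-party backend exposing `set_indexes` through its own
# open_dataset() signature (option 3 above)
class SketchBackend(xr.backends.BackendEntrypoint):
    def open_dataset(self, filename_or_obj, *, drop_variables=None, set_indexes=True):
        data = np.arange(4)  # stand-in for data actually read from the store
        if set_indexes:
            return xr.Dataset(coords={"x": data})  # default pandas index
        # build the coordinate without creating any index
        coords = xr.Coordinates(coords={"x": data}, indexes={})
        return xr.Dataset(coords=coords)

ds = SketchBackend().open_dataset("unused", set_indexes=False)
```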

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8051/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1879109770 PR_kwDOAMm_X85ZbILy 8140 Deprecate passing pd.MultiIndex implicitly benbovy 4160723 open 0     23 2023-09-03T14:01:18Z 2023-11-15T20:15:00Z   MEMBER   0 pydata/xarray/pulls/8140
  • Follow-up #8094
  • [x] Closes #6481
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

This PR should normally raise a warning whenever indexed coordinates are created implicitly from a pd.MultiIndex object.

I updated the tests to create coordinates explicitly using Coordinates.from_pandas_multiindex().

I also refactored some parts where a pd.MultiIndex could still be passed and promoted internally, with the exception of:

  • swap_dims(): it should raise a warning! Right now the warning message is a bit confusing for this case, but instead of adding a special case we should probably deprecate the whole method, as suggested in a TODO comment... This method was introduced to circumvent the limitations of dimension coordinates, a workaround that isn't needed anymore (rename_dims and/or set_xindex is equivalent and less confusing).
  • xr.DataArray(pandas_obj_with_multiindex, dims=...): I guess it should raise a warning too?
  • da.stack(z=...).groupby("z"): it shouldn't raise a warning, but this requires a (heavy?) refactoring of groupby. While building the "grouper" objects, grouper.group1d or grouper.unique_coord may still be built by extracting only the multi-index dimension coordinate. I'd greatly appreciate it if anyone familiar with the groupby implementation could help me with this! @dcherian ?
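The explicit construction that this deprecation steers users toward:

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("one", "two"))

# explicit: no implicit promotion of the pd.MultiIndex, hence no warning
coords = xr.Coordinates.from_pandas_multiindex(midx, "x")
ds = xr.Dataset(coords=coords)
```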
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8140/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1865494976 PR_kwDOAMm_X85Ytlq0 8111 Alignment: allow flexible index coordinate order benbovy 4160723 open 0     3 2023-08-24T16:18:49Z 2023-09-28T15:58:38Z   MEMBER   0 pydata/xarray/pulls/8111
  • [x] Closes #7002
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

This PR relaxes some of the rules used in alignment for finding the indexes to compare or join together. Those indexes must still be of the same type and must relate to the same set of coordinates (and dimensions), but the order of coordinates is now ignored.

It is up to the index to implement the equal / join logic if it needs to care about that order.

Regarding pandas.MultiIndex, it seems that the level names are ignored when comparing indexes:

```python
import pandas as pd

midx = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("one", "two"))
midx2 = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("two", "one"))

midx.equals(midx2)  # True
```

However, in Xarray the names of the multi-index levels (and their order) matter since each level has its own xarray coordinate. In this PR, PandasMultiIndex.equals() and PandasMultiIndex.join() thus check that the level names match.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8111/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1869879398 PR_kwDOAMm_X85Y8P4c 8118 Add Coordinates `set_xindex()` and `drop_indexes()` methods benbovy 4160723 open 0     0 2023-08-28T14:28:24Z 2023-09-19T01:53:18Z   MEMBER   0 pydata/xarray/pulls/8118
  • Complements #8102
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

I don't think that we need to copy most API from Dataset / DataArray to Coordinates, but I find it convenient to have some relevant methods there too. For example, building Coordinates from scratch (with custom indexes) before passing the whole coords + indexes bundle around:

```python
import dask.array as da
import numpy as np
import xarray as xr

coords = (
    xr.Coordinates(
        coords={"x": da.arange(100_000_000), "y": np.arange(100)},
        indexes={},
    )
    .set_xindex("x", DaskIndex)
    .set_xindex("y", xr.indexes.PandasIndex)
)

ds = xr.Dataset(coords=coords)
# <xarray.Dataset>
# Dimensions:  (x: 100000000, y: 100)
# Coordinates:
#   * x        (x) int64 dask.array<chunksize=(16777216,), meta=np.ndarray>
#   * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 ... 90 91 92 93 94 95 96 97 98 99
# Data variables:
#     *empty*
# Indexes:
#     x        DaskIndex
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8118/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1889751633 PR_kwDOAMm_X85Z-5v1 8170 Dataset.from_dataframe: optionally keep multi-index unexpanded benbovy 4160723 open 0     0 2023-09-11T06:20:17Z 2023-09-11T06:20:17Z   MEMBER   1 pydata/xarray/pulls/8170
  • [x] Closes #8166
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

I added both the unstack and dim arguments but we can change that.

  • [ ] update DataArray.from_series()
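For context, the current (unconditional) unstacking behavior that this PR makes optional; the new unstack and dim arguments are not shown here since their final names are still under discussion:

```python
import pandas as pd
import xarray as xr

df = pd.DataFrame(
    {"v": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("one", "two")),
)

# today the multi-index is always expanded into two dimensions
ds = xr.Dataset.from_dataframe(df)
```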
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8170/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1880184915 PR_kwDOAMm_X85ZespA 8143 Deprecate the multi-index dimension coordinate benbovy 4160723 open 0     0 2023-09-04T12:32:36Z 2023-09-04T12:32:48Z   MEMBER   0 pydata/xarray/pulls/8143
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst

This PR adds a future_no_mindex_dim_coord=False option that, if set to True, enables the future behavior of PandasMultiIndex (i.e., no added dimension coordinate with tuple values):

```python
import xarray as xr

ds = xr.Dataset(coords={"x": ["a", "b"], "y": [1, 2]})

ds.stack(z=["x", "y"])
# <xarray.Dataset>
# Dimensions:  (z: 4)
# Coordinates:
#   * z        (z) object MultiIndex
#   * x        (z) <U1 'a' 'a' 'b' 'b'
#   * y        (z) int64 1 2 1 2
# Data variables:
#     *empty*

with xr.set_options(future_no_mindex_dim_coord=True):
    ds.stack(z=["x", "y"])
# <xarray.Dataset>
# Dimensions:  (z: 4)
# Coordinates:
#   * x        (z) <U1 'a' 'a' 'b' 'b'
#   * y        (z) int64 1 2 1 2
# Dimensions without coordinates: z
# Data variables:
#     *empty*
```

There are a few other things that we'll need to adapt or deprecate:

  • Dropping the multi-index dimension coordinate de facto allows having several multi-indexes along the same dimension. Normally stack should already take this into account, but there may be other places where this is not yet supported or where we should raise an explicit error.
  • Deprecate Dataset.reorder_levels: its API is not compatible with the absence of a dimension coordinate and with several multi-indexes along the same dimension. I think it is OK to deprecate such an edge case, which alternatively could be handled by extracting the pandas index, updating it and then re-assigning it to the dataset with assign_coords(xr.Coordinates.from_pandas_multiindex(...))
  • The text-based repr: in the example above, Dimensions without coordinates: z doesn't make much sense
  • ... ?
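The reorder_levels alternative mentioned above, spelled out with the current public API (toy data):

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("one", "two"))
ds = xr.Dataset(coords=xr.Coordinates.from_pandas_multiindex(midx, "z"))

# extract the pandas index, update it, then re-assign it as fresh
# coordinates instead of calling Dataset.reorder_levels
new_midx = ds.indexes["z"].reorder_levels(["two", "one"])
ds2 = ds.drop_vars(["z", "one", "two"]).assign_coords(
    xr.Coordinates.from_pandas_multiindex(new_midx, "z")
)
```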

I started updating the tests, although this will be much easier once #8140 is merged. This is something that we could also easily split into multiple PRs. It is probably OK if some features (temporarily) break badly when setting future_no_mindex_dim_coord=True.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8143/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1874412700 PR_kwDOAMm_X85ZLe24 8124 More flexible index variables benbovy 4160723 open 0     0 2023-08-30T21:45:12Z 2023-08-31T16:02:20Z   MEMBER   1 pydata/xarray/pulls/8124
  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

The goal of this PR is to provide a more general solution to indexed coordinate variables, i.e., support arbitrary dimensions and/or duck arrays for those variables while at the same time preventing them from being updated in a way that would invalidate their index.

This would solve problems like the one mentioned here: https://github.com/pydata/xarray/issues/1650#issuecomment-1697237429

@shoyer I've tried to implement what you have suggested in https://github.com/pydata/xarray/pull/4979#discussion_r589798510. It would be nice indeed if eventually we could get rid of IndexVariable. It won't be easy to deprecate it until we finish the index refactor (i.e., all methods listed in #6293), though. Also, I didn't find an easy way to refactor that class as it has been designed too closely around a 1-d variable backed by a pandas.Index.

So the approach implemented in this PR is to keep using IndexVariable for PandasIndex until we can deprecate / remove it later, and for the other cases use Variable with data wrapped in a custom IndexedCoordinateArray object.

The latter solution (wrapper) doesn't always work nicely, though. For example, several methods of Variable expect that self._data directly returns a duck array (e.g., a dask array or a chunked duck array). A wrapped duck array will result in unexpected behavior there. We could probably add some checks / indirection or extend the wrapper API... But I wonder if there wouldn't be a more elegant approach?

More generally, which operations should we allow / forbid / skip for an indexed coordinate variable?

  • Set array items in-place? Do not allow.
  • Replace data? Do not allow.
  • (Re)Chunk?
  • Load lazy data?
  • ... ?

(Note: we could add Index.chunk() and Index.load() methods in order to allow an Xarray index to implement custom logic for the latter two cases, e.g., convert a DaskIndex to a PandasIndex during load; see #8128).

cc @andersy005 (some changes made here may conflict with what you are refactoring in #8075).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8124/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1875631817 PR_kwDOAMm_X85ZPnjq 8128 Add Index.load() and Index.chunk() methods benbovy 4160723 open 0     0 2023-08-31T14:16:27Z 2023-08-31T15:49:06Z   MEMBER   1 pydata/xarray/pulls/8128
  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

As mentioned in #8124, this gives custom Xarray indexes more control over what best to do when the Dataset / DataArray load() and chunk() counterpart methods are called.

PandasIndex.load() and PandasIndex.chunk() always return self (no action required).

For a DaskIndex, we might want to return a PandasIndex (or another non-lazy index) from load() and rebuild a DaskIndex object from chunk() (rechunk).
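A rough sketch of the proposed protocol; the load()/chunk() hooks below are what this PR adds (they are not part of released xarray), and LazyIndexSketch is a hypothetical name:

```python
import xarray as xr

class LazyIndexSketch(xr.Index):
    # proposed hook called from Dataset.load(): a lazy index could
    # return a non-lazy index (e.g. a PandasIndex) here instead
    def load(self):
        return self

    # proposed hook called from Dataset.chunk(): a lazy index could
    # rebuild itself with the new chunking instead
    def chunk(self, chunks):
        return self

idx = LazyIndexSketch()
```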

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8128/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1412901282 PR_kwDOAMm_X85A_96j 7182 add MultiPandasIndex helper class benbovy 4160723 open 0     2 2022-10-18T09:42:58Z 2023-08-23T16:30:28Z   MEMBER   1 pydata/xarray/pulls/7182
  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

This PR adds an xarray.indexes.MultiPandasIndex helper class for building custom meta-indexes that encapsulate multiple PandasIndex instances. Unlike PandasMultiIndex, the meta-index classes inheriting from this helper class may encapsulate loosely coupled (pandas) indexes, with coordinates of arbitrary dimensions (each coordinate must be 1-dimensional, but an Xarray index may be created from coordinates with differing dimensions).

Early prototype in this notebook

TODO / TO FIX:

  • How can custom __init__ options in subclasses be passed to all the type(self)(new_indexes) calls inside the MultiPandasIndex "base" class? This could be done via **kwargs passed through; however, mypy will certainly complain (Liskov Substitution Principle).
  • Is MultiPandasIndex a good name for this helper class?
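The __init__-options question in pure-Python form (hypothetical names, independent of xarray):

```python
# A base class rebuilding instances via type(self)(...) while forwarding
# subclass-specific options captured as **kwargs
class MetaIndexBase:
    def __init__(self, indexes, **options):
        self.indexes = indexes
        self._options = options  # remembered so rebuilds can forward them

    def _replace(self, new_indexes):
        # same subclass, same extra options; mypy may flag subclasses
        # whose __init__ signatures diverge (Liskov substitution)
        return type(self)(new_indexes, **self._options)

class TolerantIndex(MetaIndexBase):
    def __init__(self, indexes, tol=0.0):
        super().__init__(indexes, tol=tol)
        self.tol = tol

rebuilt = TolerantIndex([], tol=0.5)._replace(["i1", "i2"])
```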
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7182/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1364798843 PR_kwDOAMm_X84-hLRI 7004 Rework PandasMultiIndex.sel internals benbovy 4160723 open 0     2 2022-09-07T14:57:29Z 2022-09-22T20:38:41Z   MEMBER   0 pydata/xarray/pulls/7004
  • [x] Closes #6838
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst

This PR hopefully improves how the labels provided for multi-index level coordinates are handled in .sel().

More specifically, slices are handled in a cleaner way, and it is now possible to provide array-like labels.

PandasMultiIndex.sel() relies on the underlying pandas.MultiIndex methods like this:

  • use get_loc when all levels are provided, each with a scalar label (no slice, no array)
       • always drops the index and returns scalar coordinates for each multi-index level
  • use get_loc_level when only a subset of levels is provided, with scalar labels only
       • may collapse one or more levels of the multi-index (dropped levels result in scalar coordinates)
       • if only one level remains: renames the dimension and the corresponding dimension coordinate
  • use get_locs for all other cases
       • always keeps the multi-index and its coordinates (even if only one item or one level is selected)

This yields a predictable behavior: as soon as one of the provided labels is a slice or array-like, the multi-index and all its level coordinates are kept in the result.

Some cases illustrated below (I compare this PR with an older release due to the errors reported in #6838):

```python
import xarray as xr
import pandas as pd

midx = pd.MultiIndex.from_product([list("abc"), range(4)], names=("one", "two"))
ds = xr.Dataset(coords={"x": midx})
# <xarray.Dataset>
# Dimensions:  (x: 12)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'a' 'a' 'b' 'b' 'b' 'b' 'c' 'c' 'c' 'c'
#   * two      (x) int64 0 1 2 3 0 1 2 3 0 1 2 3
# Data variables:
#     *empty*
```

```python
ds.sel(one="a", two=0)

# this PR
# <xarray.Dataset>
# Dimensions:  ()
# Coordinates:
#     x        object ('a', 0)
#     one      <U1 'a'
#     two      int64 0
# Data variables:
#     *empty*

# v2022.3.0
# <xarray.Dataset>
# Dimensions:  ()
# Coordinates:
#     x        object ('a', 0)
# Data variables:
#     *empty*
```

```python
ds.sel(one="a")

# this PR
# <xarray.Dataset>
# Dimensions:  (two: 4)
# Coordinates:
#   * two      (two) int64 0 1 2 3
#     one      <U1 'a'
# Data variables:
#     *empty*

# v2022.3.0
# <xarray.Dataset>
# Dimensions:  (two: 4)
# Coordinates:
#   * two      (two) int64 0 1 2 3
# Data variables:
#     *empty*
```

```python
ds.sel(one=slice("a", "b"))

# this PR
# <xarray.Dataset>
# Dimensions:  (x: 8)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'a' 'a' 'b' 'b' 'b' 'b'
#   * two      (x) int64 0 1 2 3 0 1 2 3
# Data variables:
#     *empty*

# v2022.3.0
# <xarray.Dataset>
# Dimensions:  (two: 8)
# Coordinates:
#   * two      (two) int64 0 1 2 3 0 1 2 3
# Data variables:
#     *empty*
```

```python
ds.sel(one="a", two=slice(1, 1))

# this PR
# <xarray.Dataset>
# Dimensions:  (x: 1)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a'
#   * two      (x) int64 1
# Data variables:
#     *empty*

# v2022.3.0
# <xarray.Dataset>
# Dimensions:  (x: 1)
# Coordinates:
#   * x        (x) MultiIndex
#   - one      (x) object 'a'
#   - two      (x) int64 1
# Data variables:
#     *empty*
```

```python
ds.sel(one=["b", "c"], two=[0, 2])

# this PR
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'b' 'b' 'c' 'c'
#   * two      (x) int64 0 2 0 2
# Data variables:
#     *empty*

# v2022.3.0
# ValueError: Vectorized selection is not available along coordinate 'one' (multi-index level)
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7004/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);