issues

1 row where comments = 11, "updated_at" is on date 2023-08-29 and user = 20629530 sorted by updated_at descending

id: 1831975171
node_id: I_kwDOAMm_X85tMbkD
number: 8039
title: Update assign_coords with a MultiIndex to match new Coordinates API
user: aulemahal (20629530)
state: closed
locked: 0
comments: 11
created_at: 2023-08-01T20:22:41Z
updated_at: 2023-08-29T14:23:30Z
closed_at: 2023-08-29T14:23:30Z
author_association: CONTRIBUTOR

What is your issue?

A pattern we used in xclim (and elsewhere) seems to be broken on the master.

See MWE:

```python3
import pandas as pd
import xarray as xr

da = xr.DataArray(
    [1] * 730,
    coords={"time": xr.date_range('1900-01-01', periods=730, freq='D', calendar='noleap')},
)
mulind = pd.MultiIndex.from_arrays(
    (da.time.dt.year.values, da.time.dt.dayofyear.values), names=('year', 'doy')
)

# Override previous time axis with new MultiIndex
da.assign_coords(time=mulind).unstack('time')
```

Now this works ok with both the current master and the latest release. However, if we chunk `da`, the last line now fails:

    da.chunk(time=50).assign_coords(time=mulind).unstack('time')

On the master, this gives:

    ValueError: unmatched keys found in indexes and variables: {'year', 'doy'}

Full traceback:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Cell In[44], line 1
    ----> 1 da.chunk(time=50).assign_coords(time=mulind).unstack("time")

    File ~/Projets/xarray/xarray/core/dataarray.py:2868, in DataArray.unstack(self, dim, fill_value, sparse)
       2808 def unstack(
       2809     self,
       2810     dim: Dims = None,
       2811     fill_value: Any = dtypes.NA,
       2812     sparse: bool = False,
       2813 ) -> DataArray:
       2814     """
       2815     Unstack existing dimensions corresponding to MultiIndexes into
       2816     multiple new dimensions.
       (...)
       2866     DataArray.stack
       2867     """
    -> 2868     ds = self._to_temp_dataset().unstack(dim, fill_value, sparse)
       2869     return self._from_temp_dataset(ds)

    File ~/Projets/xarray/xarray/core/dataset.py:5481, in Dataset.unstack(self, dim, fill_value, sparse)
       5479 for d in dims:
       5480     if needs_full_reindex:
    -> 5481         result = result._unstack_full_reindex(
       5482             d, stacked_indexes[d], fill_value, sparse
       5483         )
       5484     else:
       5485         result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse)

    File ~/Projets/xarray/xarray/core/dataset.py:5365, in Dataset._unstack_full_reindex(self, dim, index_and_vars, fill_value, sparse)
       5362 else:
       5363     # TODO: we may depreciate implicit re-indexing with a pandas.MultiIndex
       5364     xr_full_idx = PandasMultiIndex(full_idx, dim)
    -> 5365     indexers = Indexes(
       5366         {k: xr_full_idx for k in index_vars},
       5367         xr_full_idx.create_variables(index_vars),
       5368     )
       5369     obj = self._reindex(
       5370         indexers, copy=False, fill_value=fill_value, sparse=sparse
       5371     )
       5373 for name, var in obj.variables.items():

    File ~/Projets/xarray/xarray/core/indexes.py:1435, in Indexes.__init__(self, indexes, variables, index_type)
       1433 unmatched_keys = set(indexes) ^ set(variables)
       1434 if unmatched_keys:
    -> 1435     raise ValueError(
       1436         f"unmatched keys found in indexes and variables: {unmatched_keys}"
       1437     )
       1439 if any(not isinstance(idx, index_type) for idx in indexes.values()):
       1440     index_type_str = f"{index_type.__module__}.{index_type.__name__}"

    ValueError: unmatched keys found in indexes and variables: {'year', 'doy'}
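The last frame of the traceback shows the check that raises: `Indexes.__init__` takes the symmetric difference of the keys of `indexes` and `variables` and fails when it is not empty. The sketch below reproduces only that set logic in plain Python. The two key sets are an assumption inferred from the error message (an index registered under `time` only, against variables for `time`, `year` and `doy`), and `check_matching_keys` is a hypothetical helper, not an xarray function.

```python
def check_matching_keys(indexes, variables):
    """Hypothetical stand-in for the key check seen in the traceback."""
    # Symmetric difference: keys present in exactly one of the two mappings.
    unmatched_keys = set(indexes) ^ set(variables)
    if unmatched_keys:
        raise ValueError(
            f"unmatched keys found in indexes and variables: {unmatched_keys}"
        )


# Assumed inputs: the index is only registered under the dimension name ...
indexes = {"time": "multi-index placeholder"}
# ... while the MultiIndex yields variables for the dimension and both levels.
variables = {"time": None, "year": None, "doy": None}

try:
    check_matching_keys(indexes, variables)
    error_message = None
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

If this reading is right, the level names `year` and `doy` are exactly the keys that exist on one side only, which would match the reported message.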

This seems related to PR #7368.

The reason for the title of this issue is that in both versions, I now realize that `da.assign_coords(time=mulind)` prints as:

    <xarray.DataArray (time: 730)>
    dask.array<xarray-<this-array>, shape=(730,), dtype=int64, chunksize=(50,), chunktype=numpy.ndarray>
    Coordinates:
      * time     (time) object MultiIndex

Something's fishy, because the two "sub" indexes are not showing.

And indeed, with the current master, I can get this to work by doing (again changing the last line):

    da2 = xr.DataArray(da.data, coords=xr.Coordinates.from_pandas_multiindex(mulind, 'time'))
    da2.chunk(time=50).unstack('time')

But it seems a bit odd to me that we need to reconstruct the DataArray to replace its coordinate with a "MultiIndex" one.
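For reference, here is a minimal, self-contained sketch of that reconstruction, assuming a recent xarray that provides `xr.Coordinates.from_pandas_multiindex`. To keep the dependencies to numpy, pandas and xarray, the `year` and `doy` arrays are computed by hand as a stand-in for the noleap time axis, `dims` is passed explicitly, and chunking is left out, so this does not exercise the dask code path from the traceback.

```python
import numpy as np
import pandas as pd
import xarray as xr

n_days = 730  # two 365-day ("noleap") years, as in the MWE

# Stand-in for da.time.dt.year / da.time.dt.dayofyear (assumption: no leap days).
years = 1900 + np.arange(n_days) // 365
doys = np.arange(n_days) % 365 + 1
mulind = pd.MultiIndex.from_arrays((years, doys), names=("year", "doy"))

# Build the coordinates from the MultiIndex, then the DataArray from those.
coords = xr.Coordinates.from_pandas_multiindex(mulind, "time")
da2 = xr.DataArray(np.ones(n_days, dtype=int), dims="time", coords=coords)

# The level coordinates are now listed alongside the dimension coordinate.
print(sorted(da2.coords))

unstacked = da2.unstack("time")
print(dict(unstacked.sizes))
```

Built this way, the `year` and `doy` level coordinates appear next to `time`, unlike in the repr shown above.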

Thus, my questions are:

  1. How does one properly override a coordinate with a MultiIndex? Is there a way to use `assign_coords`? If not, then this issue would become a feature request.
  2. Is this a regression? Or was I just "lucky" before?
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8039/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: xarray (13221727)
type: issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
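As a rough illustration of how the row filter stated at the top of this page (comments = 11, updated_at on 2023-08-29, user = 20629530) maps onto this schema, the sketch below runs an equivalent query with Python's built-in `sqlite3` module. The in-memory table is a trimmed copy of `issues` (an assumption made for brevity: only the columns the query touches, without the foreign keys), loaded with the single row shown on this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed-down version of the [issues] table: only the columns used below.
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [state] TEXT,
       [comments] INTEGER,
       [updated_at] TEXT
    )
    """
)
conn.execute(
    "INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)",
    (
        1831975171,
        8039,
        "Update assign_coords with a MultiIndex to match new Coordinates API",
        20629530,
        "closed",
        11,
        "2023-08-29T14:23:30Z",
    ),
)

# date() reduces the ISO 8601 timestamp to its calendar day for the comparison.
rows = conn.execute(
    """
    SELECT [number], [title]
    FROM [issues]
    WHERE [comments] = ? AND date([updated_at]) = ? AND [user] = ?
    ORDER BY [updated_at] DESC
    """,
    (11, "2023-08-29", 20629530),
).fetchall()
print(rows)
conn.close()
```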