home / github / issues

Menu
  • GraphQL API
  • Search all tables

issues: 860418546

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
860418546 MDU6SXNzdWU4NjA0MTg1NDY= 5179 N-dimensional boolean indexing 1200058 open 0     6 2021-04-17T14:07:48Z 2021-07-16T17:30:45Z   NONE      

Currently, the docs state that boolean indexing is only possible with 1-dimensional arrays: http://xarray.pydata.org/en/stable/indexing.html

However, I often have the case where I'd like to convert a subset of an xarray to a dataframe. Usually, I would call e.g.: python data = xrds.stack(observations=["dim1", "dim2", "dim3"]) data = data.isel(~ data.missing) df = data.to_dataframe()

However, this approach is incredibly slow and memory-demanding, since it creates a MultiIndex of every possible coordinate in the array.

Describe the solution you'd like A better approach would be to directly allow index selection with the boolean array: python data = xrds.isel(~ xrds.missing, dim="observations") df = data.to_dataframe() This way, it is possible to 1) Identify the resulting coordinates with np.argwhere() 2) Directly use the underlying array for fancy indexing: variable.data[mask]

Additional context I created a proof-of-concept that works for my projects: https://gist.github.com/Hoeze/c746ea1e5fef40d99997f765c48d3c0d Some important lines are those: ```python def core_dim_locs_from_cond(cond, new_dim_name, core_dims=None) -> List[Tuple[str, xr.DataArray]]: [...] core_dim_locs = np.argwhere(cond.data) if isinstance(core_dim_locs, dask.array.core.Array): core_dim_locs = core_dim_locs.persist().compute_chunk_sizes()

def subset_variable(variable, core_dim_locs, new_dim_name, mask=None): [...] subset = dask.array.asanyarray(variable.data)[mask] # force-set chunk size from known chunks chunk_sizes = core_dim_locs[0][1].chunks[0] subset._chunks = (chunk_sizes, *subset._chunks[1:]) ```

As a result, I would expect something like this:

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5179/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    13221727 issue

Links from other tables

  • 2 rows from issues_id in issues_labels
  • 6 rows from issue in issue_comments
Powered by Datasette · Queries took 0.697ms · About: xarray-datasette