issues

62 rows where repo = 13221727, state = "open" and user = 35968931 sorted by updated_at descending

type 2

  • issue 44
  • pull 18

state 1

  • open · 62 ✖

repo 1

  • xarray · 62 ✖
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2276408691 I_kwDOAMm_X86Hrz1z 8995 Why does xr.apply_ufunc support numpy/dask.arrays? TomNicholas 35968931 open 0     0 2024-05-02T20:18:41Z 2024-05-03T22:03:43Z   MEMBER      

What is your issue?

@keewis pointed out that it's weird that xarray.apply_ufunc supports passing numpy/dask arrays directly, and I'm inclined to agree. I don't understand why we do, and think we should consider removing that feature.

Two arguments in favour of removing it:

1) It exposes users to transposition errors

Consider this example:

```python
In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: arr = np.arange(12).reshape(3, 4)

In [4]: def mean(obj, dim):
   ...:     # note: apply always moves core dimensions to the end
   ...:     return xr.apply_ufunc(
   ...:         np.mean, obj, input_core_dims=[[dim]], kwargs={"axis": -1}
   ...:     )
   ...:

In [5]: mean(arr, dim='time')
Out[5]: array([1.5, 5.5, 9.5])

In [6]: mean(arr.T, dim='time')
Out[6]: array([4., 5., 6., 7.])
```

Transposing the input leads to a different result, with the value of the dim kwarg effectively ignored. This kind of error is what xarray code is supposed to prevent by design.

2) There is an alternative input pattern that doesn't require accepting bare arrays

Instead, any numpy/dask array can just be wrapped up into an xarray Variable/NamedArray before passing it to apply_ufunc.

```python
In [7]: from xarray.core.variable import Variable

In [8]: var = Variable(data=arr, dims=['time', 'space'])

In [9]: mean(var, dim='time')
Out[9]:
<xarray.Variable (space: 4)> Size: 32B
array([4., 5., 6., 7.])

In [10]: mean(var.T, dim='time')
Out[10]:
<xarray.Variable (space: 4)> Size: 32B
array([4., 5., 6., 7.])
```

This now guards against the transposition error, and puts the onus on the user to be clear about which axes of their array correspond to which dimension.

With Variable/NamedArray as public API, this latter pattern can handle every case that passing bare arrays in could.

I suggest we deprecate accepting bare arrays in favour of having users wrap them in Variable/NamedArray/DataArray objects instead.
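
For illustration, the same wrapping pattern should work with a DataArray as well. This is a minimal sketch that redefines the mean helper from the example above; the dimension names 'time' and 'space' are arbitrary choices for the example.

```python
import numpy as np
import xarray as xr


def mean(obj, dim):
    # apply_ufunc moves the core dimension to the end, so reduce over axis -1
    return xr.apply_ufunc(
        np.mean, obj, input_core_dims=[[dim]], kwargs={"axis": -1}
    )


arr = np.arange(12).reshape(3, 4)

# Wrapping the bare array names its axes explicitly
da = xr.DataArray(arr, dims=["time", "space"])

result = mean(da, dim="time")
result_transposed = mean(da.T, dim="time")

# Both calls reduce over 'time', regardless of axis order
print(result.values)
print(result_transposed.values)
```

Because the reduction is looked up by dimension name, transposing the wrapped object does not change which axis is averaged.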

(Note 1: We also accept raw scalars, but this doesn't expose anyone to transposition errors.)

(Note 2: In a quick scan of the apply_ufunc docstring, the docs on it in computation.rst, and the extensive guide that @dcherian wrote in the xarray tutorial repository, I can't see any examples that actually pass bare arrays to apply_ufunc.)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8995/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2276352251 I_kwDOAMm_X86HrmD7 8994 Improving performance of open_datatree TomNicholas 35968931 open 0     4 2024-05-02T19:43:17Z 2024-05-03T15:25:33Z   MEMBER      

What is your issue?

The implementation of open_datatree works, but is inefficient, because it calls open_dataset once for every group in the file. We should refactor this to improve the performance, which would fix issues like https://github.com/xarray-contrib/datatree/issues/330.

We discussed this in the datatree meeting, and my understanding is that concretely we need to:

  • [ ] Create an asv benchmark for open_datatree, probably involving first writing then benchmarking the opening of a special netCDF file that has no data but lots of groups.
  • [ ] Refactor the NetCDFDatastore class to only create one CachingFileManager object per file, not one per group, see https://github.com/pydata/xarray/blob/748bb3a328a65416022ec44ced8d461f143081b5/xarray/backends/netCDF4_.py#L406.
  • [ ] Refactor NetCDF4BackendEntrypoint.open_datatree to use an implementation that goes through NetCDFDatastore without calling the top-level xr.open_dataset again.
  • [ ] Check the performance of calling xr.open_datatree on a netCDF file has actually improved.

It would be great to get this done soon as part of the datatree integration project. @kmuehlbauer I know you were interested - are you willing / do you have time to take this task on?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8994/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2054280736 I_kwDOAMm_X856cdYg 8572 Track merging datatree into xarray TomNicholas 35968931 open 0     27 2023-12-22T17:37:20Z 2024-05-02T19:44:29Z   MEMBER      

What is your issue?

Master issue to track progress of merging xarray-datatree into xarray main. Would close https://github.com/pydata/xarray/issues/4118 (and many similar issues), as well as one of the goals of our development roadmap.

Also see the project board for DataTree integration.


On calls in the last few dev meetings, we decided to forget about a temporary cross-repo from xarray import datatree (so this issue supersedes #7418), and just begin merging datatree into xarray main directly.

Weekly meeting

See https://github.com/pydata/xarray/issues/8747

Task list:

To happen in order:

  • [x] open_datatree in xarray. This doesn't need to be performant initially, and ~~it would initially return a datatree.DataTree object.~~ EDIT: We decided it should return an xarray.DataTree object, or even xarray.core.datatree.DataTree object. So we can start by just copying the basic version in datatree/io.py right now which just calls open_dataset many times. #8697
  • [x] Triage and fix issues: figure out which of the issues on xarray-contrib/datatree need to be fixed before the merge (if any).
  • [ ] Merge in code for DataTree class. I suggest we do this by making one PR for each module, and ideally discussing and merging each before opening a PR for the next module. (Open to other workflow suggestions though.) The main aim here being lowering the bus factor on the code, confirming high-level design decisions, and improving details of the implementation as it goes in.

    Suggested order of modules to merge:

      • [x] datatree/treenode.py - defines the tree structure, without any dimensions/data attached, #8757
      • [x] datatree/datatree.py - adds data to the tree structure, #8789
      • [x] datatree/iterators.py - iterates over a single tree in various ways, currently copied from anytree, #8879
      • [x] datatree/mapping.py - implements map_over_subtree by iterating over N trees at once https://github.com/pydata/xarray/pull/8948,
      • [ ] datatree/ops.py - uses map_over_subtree to map methods like .mean over whole trees (https://github.com/pydata/xarray/pull/8976),
      • [x] datatree/formatting_html.py - HTML repr, works but could do with some optimization https://github.com/pydata/xarray/pull/8930,
      • [x] datatree/{extensions/common}.py - miscellaneous other features e.g. attribute-like access (#8967).

  • [ ] Expose datatree API publicly. Actually expose open_datatree and DataTree in xarray's public API as top-level imports. The full list of things to expose is:

  • [ ] open_datatree
  • [ ] DataTree
  • [ ] map_over_subtree
  • [ ] assert_isomorphic
  • [ ] register_datatree_accessor

  • [ ] Refactor class inheritance - Dataset/DataArray share some mixin classes (e.g. DataWithCoords), and we could probably refactor DataTree to use these too. This is low-priority but would reduce code duplication.

Can happen basically at any time or maybe in parallel with other efforts:

  • [ ] Generalize backends to support groups. Once a basic version of xr.open_datatree exists, we can start refactoring xarray's backend classes to support a general Backend.open_datatree method for any backend that can open multiple groups. Then we can make sure this is more performant than the naive implementation, i.e. only opening the file once. See also #8994.
  • [ ] Support backends other than netCDF and Zarr. - e.g. grib, see https://github.com/pydata/xarray/pull/7437,
  • [ ] Support dask properly - Issue https://github.com/xarray-contrib/datatree/pull/97 and the (stale) PR https://github.com/xarray-contrib/datatree/pull/196 are about dask parallelization over separate nodes in the tree.
  • [ ] Add other new high-level API methods - Things like .reorder_nodes and ideas we've only discussed like https://github.com/xarray-contrib/datatree/issues/79 and https://github.com/xarray-contrib/datatree/issues/254 (cc @dcherian who has had useful ideas here)
  • [ ] Copy xarray-contrib/datatree issues over to xarray's main repository. I think this is quite important and worth doing as a record of why decisions were made. (@jhamman and @TomNicholas)
  • [ ] Copy over any recent bug fixes from original datatree repository
  • [x] Look into merging commit history of xarray-contrib/datatree. I think this would be cool but is less important than keeping the issues. (@jhamman suggested we could do this using some git wizardry that I hadn't heard of before)
  • [ ] xarray.tutorial.open_datatree - I've been meaning to make a tutorial datatree object for ages. There's an issue about it, but actually now I think something close to the CMIP6 ensemble data that @jbusecke and I used in our pangeo blog post would already be pretty good. Once we have this it becomes much easier to write docs about some advanced features.
  • [ ] Merge Docs - I've tried to write these pages so that they should slot neatly into xarray's existing docs structure. Careful reading, additions and improvements would be great though. Summary of what docs exist on this issue https://github.com/xarray-contrib/datatree/issues/61
  • [ ] Write a blog post on the xarray blog highlighting xarray's new functionality, and explicitly thanking the NASA team for their work. Doesn't have to be long, it can just point to the documentation.

Anyone is welcome to help with any of this, including but not limited to @owenlittlejohns , @eni-awowale, @flamingbear (@etienneschalk maybe?).

cc also @shoyer @keewis for any thoughts as to the process.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8572/reactions",
    "total_count": 7,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
    xarray 13221727 issue
2204914380 PR_kwDOAMm_X85qnPSf 8872 Avoid auto creation of indexes in concat TomNicholas 35968931 open 0     15 2024-03-25T05:16:33Z 2024-05-01T19:07:01Z   MEMBER   0 pydata/xarray/pulls/8872

If we create a Coordinates object using the concatenated result_indexes, and pass that to the Dataset constructor, we can explicitly set the correct indexes from the start, instead of auto-creating the wrong ones and then trying to overwrite them with the correct indexes later (which is what the current implementation does).
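
To illustrate the idea outside of concat, here is a minimal sketch of passing an explicit Coordinates object to the Dataset constructor. It assumes a recent xarray version that exports Coordinates at the top level; the variable names are arbitrary, and this is not the code from the PR itself.

```python
import numpy as np
import xarray as xr

# Default behaviour: a 1D coordinate named after its dimension
# gets a pandas index created for it automatically.
ds_default = xr.Dataset(coords={"x": ("x", np.array([1, 2, 3]))})

# Building the Coordinates object with indexes={} states explicitly
# that there are no indexes, so none are auto-created.
coords = xr.Coordinates({"x": ("x", np.array([1, 2, 3]))}, indexes={})
ds_no_index = xr.Dataset(coords=coords)

print(list(ds_default.xindexes))
print(list(ds_no_index.xindexes))
```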

  • [x] Possible fix for #8871
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8872/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2267803218 PR_kwDOAMm_X85t8pSN 8980 Complete deprecation of Dataset.dims returning dict TomNicholas 35968931 open 0     6 2024-04-28T20:32:29Z 2024-05-01T15:40:44Z   MEMBER   0 pydata/xarray/pulls/8980
  • [x] Completes deprecation cycle described in #8496, and started in #8500
  • [ ] ~~Tests added~~
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8980/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2267780811 PR_kwDOAMm_X85t8kgX 8979 Warn on automatic coercion to coordinate variables in Dataset constructor TomNicholas 35968931 open 0     2 2024-04-28T19:44:20Z 2024-04-29T21:13:00Z   MEMBER   0 pydata/xarray/pulls/8979
  • [x] Starts the deprecation cycle for #8959
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
  • [ ] Change existing code + examples so as not to emit this new warning everywhere.
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8979/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2253567622 I_kwDOAMm_X86GUraG 8959 Dataset constructor always coerces 1D data variables with same name as dim to coordinates TomNicholas 35968931 open 0     10 2024-04-19T17:54:28Z 2024-04-28T19:57:31Z   MEMBER      

What is your issue?

Whilst xarray's data model appears to allow 1D data variables that have the same name as their dimension, it seems to be impossible to actually create this using the Dataset constructor, as they will always be converted to coordinate variables instead.

We can create a 1D data variable with the same name as its dimension like this:

```python
In [9]: ds = xr.Dataset({'x': 0})

In [10]: ds
Out[10]:
<xarray.Dataset> Size: 8B
Dimensions:  ()
Data variables:
    x        int64 8B 0

In [11]: ds.expand_dims('x')
Out[11]:
<xarray.Dataset> Size: 8B
Dimensions:  (x: 1)
Dimensions without coordinates: x
Data variables:
    x        (x) int64 8B 0
```

so it seems to be a valid part of the data model.

But I can't get to that situation from the Dataset constructor. This should create the same dataset:

```python
In [15]: ds = xr.Dataset(data_vars={'x': ('x', [0])})

In [16]: ds
Out[16]:
<xarray.Dataset> Size: 8B
Dimensions:  (x: 1)
Coordinates:
  * x        (x) int64 8B 0
Data variables:
    *empty*
```

But actually it makes x a coordinate variable (and implicitly creates a pandas Index for it). This means that in this case there is no difference between using the data_vars and coords kwargs to the constructor:

```python
ds = xr.Dataset(coords={'x': ('x', [0])})

In [18]: ds
Out[18]:
<xarray.Dataset> Size: 8B
Dimensions:  (x: 1)
Coordinates:
  * x        (x) int64 8B 0
Data variables:
    *empty*
```

This all seems weird to me. I would have thought that if a 1D data variable is allowed, we shouldn't coerce to making it a coordinate variable in the constructor. If anything that's actively misleading.

Note that whilst this came up in the context of trying to avoid auto-creation of 1D indexes for coordinate variables, this issue is actually separate. (xref https://github.com/pydata/xarray/pull/8872#issuecomment-2027571714)

cc @benbovy who probably has thoughts

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8959/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2204768593 I_kwDOAMm_X86DahlR 8871 Concatenation automatically creates indexes where none existed TomNicholas 35968931 open 0     1 2024-03-25T02:43:31Z 2024-04-27T16:50:56Z   MEMBER      

What happened?

Currently concatenation will automatically create indexes for any dimension coordinates in the output, even if there were no indexes on the input.

What did you expect to happen?

Indexes not to be created for variables which did not already have them.

Minimal Complete Verifiable Example

```Python
import numpy as np
from xarray import Coordinates, DataArray, concat

# TODO once passing indexes={} directly to DataArray constructor is allowed then no need to create coords object separately first
coords = Coordinates(
    {"x": np.array([1, 2, 3])}, indexes={}
)
arrays = [
    DataArray(
        np.zeros((3, 3)),
        dims=["x", "y"],
        coords=coords,
    )
    for _ in range(2)
]

combined = concat(arrays, dim="x")
assert combined.shape == (6, 3)
assert combined.dims == ("x", "y")

# should not have auto-created any indexes
assert combined.indexes == {}  # this fails

combined = concat(arrays, dim="z")
assert combined.shape == (2, 3, 3)
assert combined.dims == ("z", "x", "y")

# should not have auto-created any indexes
assert combined.indexes == {}  # this also fails
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

```Python
    # nor have auto-created any indexes
    assert combined.indexes == {}
E   AssertionError: assert Indexes:\n    x        Index([1, 2, 3, 1, 2, 3], dtype='int64', name='x') == {}
E     Full diff:
E     - {
E     -  ,
E     - }
E     + Indexes:
E     +     x        Index([1, 2, 3, 1, 2, 3], dtype='int64', name='x',
E     + )
```

Anything else we need to know?

The culprit is the call to core.indexes.create_default_index_implicit inside merge.py. If I comment out this call my concat test passes, but basic tests in test_merge.py start failing.

I would like to know how to avoid the internal call to create_default_index_implicit. I tried passing compat='override' but that made no difference, so I think we would have to change merge.collect_variables_and_indexes somehow.

Conceptually, I would have thought we should be examining what indexes exist on the objects to be concatenated, and not creating new indexes for any variable that doesn't already have one. Presumably we should therefore be making use of the indexes argument to merge.collect_variables_and_indexes, but currently that just seems to be empty.

Environment

I've been experimenting running this test on a branch that includes both #8711 and #8714, but actually this example will fail in the same way on main.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8871/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2259850888 I_kwDOAMm_X86GspaI 8966 HTML repr for chunked variables with high dimensionality TomNicholas 35968931 open 0     1 2024-04-23T22:00:40Z 2024-04-24T13:27:05Z   MEMBER      

What is your issue?

The graphical representation of dask arrays with many dimensions can end up off the page in the HTML repr.

Ideally dask would worry about this for us, and we just use their _inline_repr, as mentioned here https://github.com/pydata/xarray/issues/4376#issuecomment-680296332

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8966/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1692909704 PR_kwDOAMm_X85PnMF6 7811 Generalize delayed TomNicholas 35968931 open 0     0 2023-05-02T18:34:26Z 2024-04-23T17:41:55Z   MEMBER   0 pydata/xarray/pulls/7811

A small follow-on to #7019 to allow using non-dask implementations of delayed.

(Builds off of #7019)

  • [x] Closes #7810
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7811/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1692904446 I_kwDOAMm_X85k56v- 7810 Generalize dask.delayed calls to go through ChunkManager TomNicholas 35968931 open 0     0 2023-05-02T18:30:32Z 2024-04-23T17:38:58Z   MEMBER      

[Deepak: Should we add chunked_array_type and from_array_kwargs to open_mfdataset?

I actually don't think we need to - from_array_kwargs is only going to get directly passed down to open_dataset, and hence could be considered part of **kwargs.

This should actually just work, except in the case of parallel=True. For that we could add delayed to the ChunkManager ABC, so that if cubed does implement cubed.delayed it could be added, else a NotImplementedError would be raised. I think all of this wouldn't be necessary if we had lazy concatenation in xarray though (xref https://github.com/pydata/xarray/issues/4628). That suggestion would mean we should also replace other instances of dask.delayed in other parts of the codebase though... I think I will split this into a separate issue in the interests of getting this one merged.

Originally posted by @TomNicholas in https://github.com/pydata/xarray/pull/7019#discussion_r1182904134
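
The "optional method on the ABC" idea could be sketched roughly as follows. All class names here (ChunkManagerSketch, NoDelayedManager, ThunkManager) are hypothetical stand-ins invented for illustration, not xarray's actual ChunkManager classes, and the toy delayed implementation is only there to show the dispatch.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class ChunkManagerSketch(ABC):
    """Hypothetical stand-in for the ChunkManager ABC (not xarray's class)."""

    @abstractmethod
    def compute(self, *data: Any) -> tuple:
        """Materialise lazy objects; every manager must implement this."""

    def delayed(self, func: Callable) -> Callable:
        """Optional hook: managers without a delayed implementation raise."""
        raise NotImplementedError(
            f"{type(self).__name__} does not implement delayed"
        )


class NoDelayedManager(ChunkManagerSketch):
    """A manager that only implements the required methods."""

    def compute(self, *data: Any) -> tuple:
        return data


class ThunkManager(ChunkManagerSketch):
    """Toy manager whose 'delayed' objects are zero-argument callables."""

    def compute(self, *data: Any) -> tuple:
        return tuple(d() if callable(d) else d for d in data)

    def delayed(self, func: Callable) -> Callable:
        def wrapper(*args: Any, **kwargs: Any) -> Callable:
            # Defer the call until compute() is invoked
            return lambda: func(*args, **kwargs)

        return wrapper
```

With this shape, code such as a parallel=True path would call the manager's delayed method and let the NotImplementedError surface for managers that do not provide one.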

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7810/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2134951079 I_kwDOAMm_X85_QMSn 8747 Datatree design discussions - weekly meeting TomNicholas 35968931 open 0     10 2024-02-14T18:39:16Z 2024-04-18T22:09:16Z   MEMBER      

What is your issue?

In the bi-weekly dev meeting today we agreed that deliberate higher-level discussions of datatree's design would be useful. (i.e. we're not worried about our ability to write high-quality code, so let's focus review time more explicitly on the high-level design questions.)

This could take the form of me just talking through what I did in a certain part of the code and why, or a targeted discussion on specific design questions that I was never quite sure about. Some examples of the latter, as food for thought:

  • [ ] Inheritance of dimension coordinates from parent nodes? https://github.com/xarray-contrib/datatree/issues/297
  • [x] ~~Symbolic links? https://github.com/xarray-contrib/datatree/issues/5~~ (we decided this was overkill)
  • [ ] Is dt.ds ugly? See also the difference between dt.ds and dt.to_dataset() https://github.com/xarray-contrib/datatree/issues/303#issuecomment-1917798769
  • [ ] Which methods should map over the subtree and which shouldn't? (can't find the issue for this one)
  • [ ] Ignore missing dims when mapping over subtree? https://github.com/xarray-contrib/datatree/issues/67
  • [ ] API for sub-tree selection https://github.com/xarray-contrib/datatree/issues/254
  • [ ] API for merging leaves https://github.com/xarray-contrib/datatree/issues/192
  • [ ] Dict-like interface ambiguities https://github.com/xarray-contrib/datatree/issues/240
  • [ ] The tree broadcasting rabbit hole https://github.com/xarray-contrib/datatree/issues/199
  • [ ] Relationship between datatree and catalogs https://github.com/xarray-contrib/datatree/issues/134
  • [ ] Should xr.concat/xr.merge accept DataTree objects? (and map over them by default?) Would help with https://github.com/TomNicholas/VirtualiZarr/issues/84#issuecomment-2065410549

There was also this design doc I wrote at one point

@flamingbear are you free at 11:30am EST on Tuesday each week? @shoyer, @keewis and I are all free then. Others also welcome (e.g. @owenlittlejohns , @eni-awowale, @etienneschalk), but not required :)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8747/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2247043809 I_kwDOAMm_X86F7yrh 8949 Mapping DataTree methods over nodes with variables for which the args are invalid TomNicholas 35968931 open 0     0 2024-04-16T23:45:26Z 2024-04-17T14:58:14Z   MEMBER      

What is your issue?

In the datatree call today we narrowed down an issue with how datatree maps methods over many variables in many nodes. This issue is essentially https://github.com/xarray-contrib/datatree/issues/67, but I'll attempt to discuss the problem and solution in more general terms.

Context in xarray

xarray.Dataset is essentially a mapping of variable names to Variable objects, and most Dataset methods implicitly map a method defined on Variable over all these variables (e.g. .mean()). Sometimes the mapped method can be naively applied to every variable in the dataset, but sometimes it doesn't make sense to apply it to some of the variables. For example .mean(dim='time') only makes sense for the variables in the dataset that actually have a time dimension.

xarray.Dataset handles this for the user by either working out what version of the method does make sense for that variable (e.g. only trying to take the mean along the reduction dimensions actually present on that variable), or just passing the variable through unaltered. There are some weird subtleties lurking here, e.g. with statistical reductions like std and var.

https://github.com/pydata/xarray/blob/239309f881ba0d7e02280147bc443e6e286e6a63/xarray/core/dataset.py#L6853
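
As a small illustration of the pass-through behaviour described above, the sketch below reduces a toy dataset over 'time' where only one variable has that dimension. The variable and dimension names are made up for the example.

```python
import xarray as xr

ds = xr.Dataset(
    {
        "a": ("time", [1.0, 2.0, 3.0]),  # has the reduction dimension
        "b": ("x", [10.0, 20.0]),  # does not have a 'time' dimension
    }
)

# Dataset.mean only reduces each variable over the requested
# dimensions that the variable actually has.
reduced = ds.mean(dim="time")

print(reduced)
```

Variable a is reduced to a scalar, while b keeps its x dimension and its values.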

There is therefore a difference between

ds.map(Variable.{REDUCTION}, dim='time') and ds.{REDUCTION}(dim='time')

For example:

```python
In [13]: ds = xr.Dataset({'a': ('x', [1, 2]), 'b': 0})

In [14]: ds.isel(x=0)
Out[14]:
<xarray.Dataset> Size: 16B
Dimensions:  ()
Data variables:
    a        int64 8B 1
    b        int64 8B 0

In [15]: ds.map(Variable.isel, x=0)
ValueError                                Traceback (most recent call last)
Cell In[15], line 1
----> 1 ds.map(Variable.isel, x=0)

...

ValueError: Dimensions {'x'} do not exist. Expected one or more of ()
```

(Aside: It would be nice for Dataset.map to include information about which variable it raised an exception on in the error message.)

Clearly Dataset.isel does more than just applying Variable.isel using Dataset.map.

Issue in DataTree

In datatree we have to map methods over different variables in the same node, but also over different variables in different nodes. Currently the implementation of a method naively maps the Dataset method over every node using map_over_subtree, but if there is a node containing a variable for which the method args are invalid, it will raise an exception.

This causes problems for users, for example in https://github.com/xarray-contrib/datatree/issues/67. A minimal example of this problem would be

```python
In [18]: ds1 = xr.Dataset({'a': ('x', [1, 2])})

In [19]: ds2 = xr.Dataset({'b': 0})

In [20]: dt = DataTree.from_dict({'node1': ds1, 'node2': ds2})

In [21]: dt
Out[21]:
DataTree('None', parent=None)
├── DataTree('node1')
│       Dimensions:  (x: 2)
│       Dimensions without coordinates: x
│       Data variables:
│           a        (x) int64 16B 1 2
└── DataTree('node2')
        Dimensions:  ()
        Data variables:
            b        int64 8B 0

In [22]: dt.isel(x=0)
ValueError: Dimensions {'x'} do not exist. Expected one or more of FrozenMappingWarningOnValuesAccess({})
Raised whilst mapping function over node with path /node2
```

(The slightly weird error message here is related to the deprecation cycle in #8500)

We would have preferred that variable b in node2 survived unchanged, like it does in the pure Dataset example.

Desired behaviour

We can kind of think of the desired behaviour like a hypothesis property we want (xref https://github.com/pydata/xarray/issues/1846), but not quite. It would be something like

    dt.{REDUCTION}().flatten_into_dataset() == dt.flatten_into_dataset().{REDUCTION}()

except that .flatten_into_dataset() can't really exist for all cases otherwise we wouldn't need datatree.

Proposed Solution

There are two ways I can imagine implementing this:

1) Use map_over_subtree to apply the method as-is and try to catch known possible KeyErrors for missing dimensions. This would be fragile.

2) Do some kind of pre-checking of the data in the tree, potentially adjusting the method before applying it using map_over_subtree.

I think @shoyer and I concluded that we should go with (2), in the form of some kind of new primitive, i.e. DataTree.reduce. (Actually DataTree.reduce already exists, but should be changed so that it does not just map_over_subtree Dataset.reduce.) Taking after Dataset.reduce, it would look something like this:

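```python
class DataTree:
    def reduce(self, func: Callable, dim: Dims = None, **kwargs) -> DataTree:
        # collect every dimension name that appears anywhere in the tree
        all_dims_in_tree = {d for node in self.subtree for d in node.dims}

        # normalise `dim` to a tuple of names (None means every dimension in the tree)
        if dim is None:
            dims = tuple(all_dims_in_tree)
        elif isinstance(dim, str):
            dims = (dim,)
        else:
            dims = tuple(dim)

        missing_dims = tuple(d for d in dims if d not in all_dims_in_tree)
        if missing_dims:
            raise ValueError(f"Dimensions {missing_dims} not found in any node of the tree")

        # TODO this could probably be refactored to call `map_over_subtree`
        for node in self.subtree:
            # using only the reduction dims that are actually present here would fix datatree GH issue #67
            reduce_dims = [d for d in node.dims if d in dims]
            result = node.ds.reduce(func, dim=reduce_dims, **kwargs)

        # TODO build the result and return it
```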

Then every method that has this pattern of acting over one or more dims should be mapped over the tree using DataTree.reduce, not map_over_subtree.
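
The per-node logic can be tried out without DataTree at all. The following is a minimal sketch under the assumption that a plain dict of path to Dataset is an acceptable stand-in for a tree; reduce_nodes is a hypothetical helper written for this illustration, not part of xarray or datatree.

```python
import numpy as np
import xarray as xr


def reduce_nodes(nodes, func, dim):
    """Reduce each Dataset only over the requested dims it actually has.

    `nodes` is a plain dict of path -> Dataset standing in for a tree.
    """
    dims = (dim,) if isinstance(dim, str) else tuple(dim)

    # A dimension must exist somewhere in the "tree", otherwise it is an error
    all_dims = {d for ds in nodes.values() for d in ds.sizes}
    missing = [d for d in dims if d not in all_dims]
    if missing:
        raise ValueError(f"Dimensions {missing} not found in any node")

    result = {}
    for path, ds in nodes.items():
        present = [d for d in dims if d in ds.sizes]
        # Nodes with none of the requested dims pass through unchanged
        result[path] = ds.reduce(func, dim=present) if present else ds
    return result


ds1 = xr.Dataset({"a": ("x", [1, 2])})
ds2 = xr.Dataset({"b": 0})

out = reduce_nodes({"node1": ds1, "node2": ds2}, np.mean, "x")
print(out["node1"])
print(out["node2"])
```

Here node2 has no x dimension, so its variable b survives unchanged instead of raising, which is the behaviour asked for in the DataTree case above.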

cc @shoyer, @flamingbear, @owenlittlejohns

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8949/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1336119080 PR_kwDOAMm_X849CQ7A 6908 Hypothesis strategies in xarray.testing.strategies TomNicholas 35968931 open 0     15 2022-08-11T15:20:56Z 2024-04-01T16:01:21Z   MEMBER   0 pydata/xarray/pulls/6908

Adds a whole suite of hypothesis strategies for generating xarray objects, inspired by and separated out from the new hypothesis strategies in #4972. They are placed into the namespace xarray.testing.strategies, and publicly mentioned in the API docs, but with a big warning message. There is also a new testing page in the user guide documenting how to use these strategies.

  • [x] Closes #6911
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst

EDIT: A variables strategy and user-facing documentation were shipped in https://github.com/pydata/xarray/pull/8404

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6908/reactions",
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2120340151 PR_kwDOAMm_X85mHqI0 8714 Avoid coercing to numpy in `as_shared_dtypes` TomNicholas 35968931 open 0     3 2024-02-06T09:35:22Z 2024-03-28T18:31:50Z   MEMBER   0 pydata/xarray/pulls/8714
  • [x] Solves the problem in https://github.com/pydata/xarray/pull/8712#issuecomment-1929037299
  • [ ] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8714/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1247010680 I_kwDOAMm_X85KU994 6633 Opening dataset without loading any indexes? TomNicholas 35968931 open 0     10 2022-05-24T19:06:09Z 2024-02-23T05:36:53Z   MEMBER      

Is your feature request related to a problem?

Within pangeo-forge's internals we would like to call open_dataset, then to_dict(), and end up with a schema-like representation of the contents of the dataset. This works, but it also has the side-effect of loading all indexes into memory, even if we are loading the data values "lazily".

Describe the solution you'd like

@benbovy do you think it would be possible to (perhaps optionally) also avoid loading indexes upon opening a dataset, so that we actually don't load anything? The end result would act a bit like ncdump does.

Describe alternatives you've considered

Otherwise we might have to try using xarray-schema or something but the suggestion here would be much neater and more flexible.

xref: https://github.com/pangeo-forge/pangeo-forge-recipes/issues/256

cc @rabernat @jhamman @cisaacstern

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6633/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1912094632 I_kwDOAMm_X85x-D-o 8231 xr.concat concatenates along dimensions that it wasn't asked to TomNicholas 35968931 open 0     4 2023-09-25T18:50:29Z 2024-02-14T20:30:26Z   MEMBER      

What happened?

Here are two toy datasets designed to represent sections of a dataset that has variables living on a staggered grid. This type of dataset is common in fluid modelling (it's why xGCM exists).

```python
import xarray as xr

ds1 = xr.Dataset(
    coords={
        'x_center': ('x_center', [1, 2, 3]),
        'x_outer': ('x_outer', [0.5, 1.5, 2.5, 3.5]),
    },
)

ds2 = xr.Dataset(
    coords={
        'x_center': ('x_center', [4, 5, 6]),
        'x_outer': ('x_outer', [4.5, 5.5, 6.5]),
    },
)
```

Calling xr.concat on these with dim='x_center' happily concatenates them

```python
xr.concat([ds1, ds2], dim='x_center')
```

```
<xarray.Dataset>
Dimensions:   (x_outer: 7, x_center: 6)
Coordinates:
  * x_outer   (x_outer) float64 0.5 1.5 2.5 3.5 4.5 5.5 6.5
  * x_center  (x_center) int64 1 2 3 4 5 6
Data variables:
    *empty*
```

but notice that the returned result has been concatenated along both x_center and x_outer.

What did you expect to happen?

I did not expect this to work. I definitely didn't expect the datasets to be concatenated along a dimension I didn't ask them to be concatenated along (i.e. x_outer).

What I expected to happen was that (as by default coords='different') both variables would be attempted to be concatenated along the x_center dimension, which would have succeeded for the x_center variable but failed for the x_outer variable. Indeed, if I name the variables differently so that they are no longer coordinate variables then that is what happens:

```python
import xarray as xr

ds1 = xr.Dataset(
    data_vars={
        'a': ('x_center', [1, 2, 3]),
        'b': ('x_outer', [0.5, 1.5, 2.5, 3.5]),
    },
)

ds2 = xr.Dataset(
    data_vars={
        'a': ('x_center', [4, 5, 6]),
        'b': ('x_outer', [4.5, 5.5, 6.5]),
    },
)
```

```python
xr.concat([ds1, ds2], dim='x_center', data_vars='different')
```

```
ValueError: cannot reindex or align along dimension 'x_outer' because of conflicting dimension sizes: {3, 4}
```

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I was trying to create an example for which you would need the automatic combined concat/merge that happens within xr.combine_by_coords.

Environment

xarray 2023.8.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8231/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2120030667 PR_kwDOAMm_X85mGm4g 8712 Only use CopyOnWriteArray wrapper on BackendArrays TomNicholas 35968931 open 0     6 2024-02-06T06:05:53Z 2024-02-07T17:09:56Z   MEMBER   0 pydata/xarray/pulls/8712

This makes sure we only use the CopyOnWriteArray wrapper on arrays that have been explicitly marked to be lazily-loaded (through being subclasses of BackendArray). Without this change we are implicitly assuming that any array type obtained through the BackendEntrypoint system should be treated as if it points to an on-disk array.

Motivated by https://github.com/pydata/xarray/issues/8699, which is a counterexample to that assumption.

  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8712/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2118876352 PR_kwDOAMm_X85mCobE 8708 Try pydata-sphinx-theme in docs TomNicholas 35968931 open 0     1 2024-02-05T15:50:01Z 2024-02-05T16:57:33Z   MEMBER   0 pydata/xarray/pulls/8708
  • [x] Closes #8701
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst

How might we want to move headers/sections around to take advantage of now having a navigation bar at the top? Adding an explicit link to the tutorial.xarray.dev site would be good.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8708/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2116695961 I_kwDOAMm_X85-KjeZ 8699 Wrapping a `kerchunk.Array` object directly with xarray TomNicholas 35968931 open 0     3 2024-02-03T22:15:07Z 2024-02-04T21:15:14Z   MEMBER      

What is your issue?

In https://github.com/fsspec/kerchunk/issues/377 the idea came up of using the xarray API to concatenate arrays which represent parts of a zarr store - i.e. using xarray to kerchunk a large set of netCDF files instead of using kerchunk.combine.MultiZarrToZarr.

The idea is to make something like this work for kerchunking sets of netCDF files into zarr stores

```python
ds = xr.open_mfdataset(
    '/my/files*.nc',
    engine='kerchunk',  # kerchunk registers an xarray IO backend that returns zarr.Array objects
    combine='nested',  # 'by_coords' would require actually reading coordinate data
    parallel=True,  # would use dask.delayed to generate reference dicts for each file in parallel
)

ds  # now wraps a bunch of zarr.Array / kerchunk.Array objects, no need for dask arrays

# kerchunk defines an xarray accessor that extracts the zarr arrays and serializes them
# (which could also be done in parallel if writing to parquet)
ds.kerchunk.to_zarr(store='out.zarr')
```

I had a go at doing this in this notebook, and in doing so discovered a few potential issues with xarray's internals.

For this to work xarray has to:

  • Wrap a kerchunk.Array object which barely defines any array API methods, including basically not supporting indexing at all,
  • Store all the information present in a kerchunked Zarr store but without ever loading any data,
  • Not create any indexes by default during dataset construction or during xr.concat,
  • Not try to do anything else that can't be defined for a kerchunk.Array.
  • Possibly we need the Lazy Indexing classes to support concatenation https://github.com/pydata/xarray/issues/4628

It's an interesting exercise in using xarray as an abstraction, with no access to real numerical values at all.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8699/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
    xarray 13221727 issue
2099591300 I_kwDOAMm_X859JTiE 8667 Error using vectorized indexing with array API compliant class TomNicholas 35968931 open 0     0 2024-01-25T05:20:31Z 2024-01-25T16:07:12Z   MEMBER      

What happened?

Vectorized indexing can fail for array types that strictly follow the array API standard.

What did you expect to happen?

I expected all of these vectorized indexing examples to work.

Minimal Complete Verifiable Example

```Python
import numpy.array_api as nxp
import xarray as xr

da = xr.DataArray(
    nxp.reshape(nxp.arange(12), (3, 4)),
    dims=["x", "y"],
    coords={"x": [0, 1, 2], "y": ["a", "b", "c", "d"]},
)

da[[0, 2, 2], [1, 3]]  # works

ind_x = xr.DataArray([0, 1], dims=["x"])
ind_y = xr.DataArray([0, 1], dims=["y"])

da[ind_x, ind_y]  # works

da[[0, 1], ind_x]  # doesn't work


TypeError                                 Traceback (most recent call last)
Cell In[157], line 1
----> 1 da[[0, 1], ind_x]

File ~/Documents/Work/Code/xarray/xarray/core/dataarray.py:859, in DataArray.__getitem__(self, key)
    856     return self._getitem_coord(key)
    857 else:
    858     # xarray-style array indexing
--> 859     return self.isel(indexers=self._item_key_to_dict(key))

File ~/Documents/Work/Code/xarray/xarray/core/dataarray.py:1472, in DataArray.isel(self, indexers, drop, missing_dims, **indexers_kwargs)
   1469 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "isel")
   1471 if any(is_fancy_indexer(idx) for idx in indexers.values()):
-> 1472     ds = self._to_temp_dataset()._isel_fancy(
   1473         indexers, drop=drop, missing_dims=missing_dims
   1474     )
   1475     return self._from_temp_dataset(ds)
   1477 # Much faster algorithm for when all indexers are ints, slices, one-dimensional
   1478 # lists, or zero or one-dimensional np.ndarray's

File ~/Documents/Work/Code/xarray/xarray/core/dataset.py:3001, in Dataset._isel_fancy(self, indexers, drop, missing_dims)
   2997 var_indexers = {
   2998     k: v for k, v in valid_indexers.items() if k in var.dims
   2999 }
   3000 if var_indexers:
-> 3001     new_var = var.isel(indexers=var_indexers)
   3002     # drop scalar coordinates
   3003     # https://github.com/pydata/xarray/issues/6554
   3004     if name in self.coords and drop and new_var.ndim == 0:

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:1130, in Variable.isel(self, indexers, missing_dims, **indexers_kwargs)
   1127 indexers = drop_dims_from_indexers(indexers, self.dims, missing_dims)
   1129 key = tuple(indexers.get(dim, slice(None)) for dim in self.dims)
-> 1130 return self[key]

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:812, in Variable.__getitem__(self, key)
    799 """Return a new Variable object whose contents are consistent with
    800 getting the provided key from the underlying data.
    801
   (...)
    809 array x.values directly.
    810 """
    811 dims, indexer, new_order = self._broadcast_indexes(key)
--> 812 data = as_indexable(self._data)[indexer]
    813 if new_order:
    814     data = np.moveaxis(data, range(len(new_order)), new_order)

File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:1390, in ArrayApiIndexingAdapter.__getitem__(self, key)
   1388 else:
   1389     if isinstance(key, VectorizedIndexer):
-> 1390         raise TypeError("Vectorized indexing is not supported")
   1391     else:
   1392         raise TypeError(f"Unrecognized indexer: {key}")

TypeError: Vectorized indexing is not supported
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

I don't really understand why the first two examples work but the last one doesn't...
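
One possible reading, offered as an assumption rather than a confirmed diagnosis: in the failing case both indexers end up labelled with the same dimension (x), so the request becomes pointwise (vectorized) indexing rather than outer indexing. The plain-Python sketch below only illustrates the difference between those two modes on a nested list. The helper names `outer_index` and `pointwise_index` are hypothetical and nothing here reproduces xarray's internals.

```python
# Same values as arange(12) reshaped to (3, 4), stored as a nested list.
data = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]


def outer_index(array, rows, cols):
    """Outer (orthogonal) indexing: every row index is paired with every column index."""
    return [[array[i][j] for j in cols] for i in rows]


def pointwise_index(array, rows, cols):
    """Vectorized (pointwise) indexing: row and column indices are zipped together."""
    return [array[i][j] for i, j in zip(rows, cols)]


outer = outer_index(data, [0, 2, 2], [1, 3])      # 3 x 2 result
pointwise = pointwise_index(data, [0, 1], [0, 1])  # 2 elements
```

The outer result has one axis per indexer, while the pointwise result has a single shared axis, which is the case an indexing adapter has to support explicitly.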

Environment

main branch of xarray, numpy 1.26.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8667/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1332231863 I_kwDOAMm_X85PaD63 6894 Public testing framework for duck array integration TomNicholas 35968931 open 0     8 2022-08-08T18:23:49Z 2024-01-25T04:04:11Z   MEMBER      

What is your issue?

In #4972 @keewis started writing a public framework for testing the integration of any duck array class in xarray, inspired by the testing framework pandas has for ExtensionArrays. This is a meta-issue for what our version of that framework for wrapping numpy-like duck arrays should look like.

(Feel free to edit / add to this)

What behaviour should we test?

We have a lot of xarray methods to test with any type of duck array. Each of these bullets should correspond to one or more testing base classes which the duck array library author would inherit from. In rough order of increasing complexity:

  • [x] Constructors - Including for Variable #6903
  • [x] Properties - checking that .shape, .dtype etc. exist on the wrapped array, see #4285 for example #6903
  • [x] Reductions - #4972 also uses parameters to automatically test many methods, and hypothesis to test each method for many different array instances.
  • [ ] Unary ops
  • [ ] Binary ops
  • [ ] Selection
  • [ ] Computation
  • [ ] Combining
  • [ ] Groupby
  • [ ] Rolling
  • [ ] Coarsen
  • [ ] Weighted

We don't need to test that the array class obeys everything else in the Array API Standard. (For instance .device is probably never going to be used by xarray directly.) We instead assume that if the array class doesn't implement something in the API standard but all the generated tests pass, then all is well.

How extensible does our testing framework need to be?

To be able to test any type of wrapped array our testing framework needs to itself be quite flexible.

  • User-defined checking - For some arrays np.testing.assert_equal is not enough to guarantee correctness, so the user creating tests needs to specify additional checks. #4972 shows how to do this for checking the units of resulting pint arrays.
  • User-created data? - Some array libraries might need to test array data that is invalid for numpy arrays. I'm thinking specifically of testing wrapping ragged arrays. #4285
  • Parallel computing frameworks? - Related to the last point is chunked arrays. Here the strategy requires an extra chunks argument when the array is created, and any results need to first call .compute(). Testing parallel-executed arrays might also require pretty complicated SetUps and TearDowns in fixtures too. (see also #6807)
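
As a rough illustration of the inheritance pattern and the user-defined checking described above, here is a minimal plain-Python sketch. Every name in it (`ToyUnitArray`, `ReductionTestsBase`, `create`, `expected_sum`, `check`) is hypothetical and chosen for this example only; it is not the API from #4972 or #6903.

```python
class ToyUnitArray:
    """Hypothetical stand-in for a duck array that carries units."""

    def __init__(self, values, units):
        self.values = list(values)
        self.units = units

    def sum(self):
        # A reduction that returns the same array type, preserving units.
        return ToyUnitArray([sum(self.values)], self.units)


class ReductionTestsBase:
    """Hypothetical base class that a duck array library would inherit from."""

    def create(self):
        raise NotImplementedError

    def expected_sum(self):
        raise NotImplementedError

    def check(self, actual, expected):
        # Default check: compare values only.
        assert actual.values == expected.values

    def test_sum(self):
        arr = self.create()
        self.check(arr.sum(), self.expected_sum())


class TestToyUnitArrayReductions(ReductionTestsBase):
    def create(self):
        return ToyUnitArray([1, 2, 3], units="m")

    def expected_sum(self):
        return ToyUnitArray([6], units="m")

    def check(self, actual, expected):
        super().check(actual, expected)
        # User-defined extra check: units must survive the reduction.
        assert actual.units == expected.units


TestToyUnitArrayReductions().test_sum()
```

The library author only supplies array creation and any extra checks; the test bodies themselves live in the shared base class.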

What documentation / examples do we need?

All of this content should really go on a dedicated page in the docs, perhaps grouped alongside other ways of extending xarray.

  • [ ] Motivation
  • [ ] What subset of the Array API standard we expect duck array classes to define (could point to a typing protocol?)
  • [ ] Explanation that the array type needs to return the same type for any numpy-like function which xarray might call upon that type (i.e. the set of duckarray instances is closed under numpy operations)
  • [ ] Explanation of the different base classes
  • [ ] Simple demo of testing a toy numpy-like array class
  • [ ] Point to code testing more advanced examples we actually use (e.g. sparse, pint)
  • [ ] Which advanced behaviours are optional (e.g. Constructors and Properties have to work, but Groupby is optional)

Where should duck array compatibility testing eventually live?

Right now the tests for sparse & pint are going into the xarray repo, but presumably we don't want tests for every duck array type living in this repository. I suggest that we want to work towards eventually having no array library-specific tests in this repository at all. (Except numpy I guess.) Thanks @crusaderky for the original suggestion.

Instead all tests involving pint could live in pint-xarray, all involving sparse could live in the sparse repository (or a new sparse-xarray repo), etc. etc. We would set those test jobs to re-run when xarray is released, and then xref any issues revealed here if needs be.

We should probably also move some of our existing tests https://github.com/pydata/xarray/pull/7023#pullrequestreview-1104932752

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6894/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1716228662 I_kwDOAMm_X85mS5I2 7848 Compatibility with the Array API standard TomNicholas 35968931 open 0     4 2023-05-18T20:34:43Z 2024-01-25T04:03:42Z   MEMBER      

What is your issue?

Meta-issue to track all the smaller issues around making xarray and the array API standard compatible with each other.

We've already had

  • #6804
  • #7067
  • #7847

and there will likely be many others.


I suspect this might require changes to the standard as well as to xarray - in particular see this list of common numpy functions which are not currently in the array API standard. Of these xarray currently uses (FYI @ralfgommers ):

  • np.clip
  • np.diff
  • np.pad
  • np.repeat
  • ~np.take~
  • ~np.tile~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7848/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2088695240 I_kwDOAMm_X858fvXI 8619 Docs sidebar is squished TomNicholas 35968931 open 0     9 2024-01-18T16:54:55Z 2024-01-23T18:38:38Z   MEMBER      

What happened?

Since the v2024.01.0 release yesterday, there seems to be a rendering error in the website - the sidebar is squished up to the left:

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8619/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  reopened xarray 13221727 issue
1940536602 I_kwDOAMm_X85zqj0a 8298 cftime.DatetimeNoLeap incorrectly decoded from netCDF file TomNicholas 35968931 open 0     14 2023-10-12T18:13:53Z 2024-01-08T01:01:53Z   MEMBER      

What happened?

I have been given a netCDF file (I think it's netCDF3) which when I open it does not decode the time variable in the way I expected it to. The time coordinate created is a numpy object array

What did you expect to happen?

I expected it to automatically create a coordinate backed by a CFTimeIndex object, not a CFTimeIndex object wrapped inside another array type.

Minimal Complete Verifiable Example

The original problematic file is 455MB (I can share it if necessary), but I can create a small netCDF file that displays the same issue.

```python
import cftime
import xarray as xr

time_values = [cftime.DatetimeNoLeap(347, 2, 1, 0, 0, 0, 0, has_year_zero=True)]
time_ds = xr.Dataset(coords={'time': (['time'], time_values)})
print(time_ds)
time_ds.to_netcdf('time_mwe.nc')
```

```
<xarray.Dataset>
Dimensions:  (time: 1)
Coordinates:
  * time     (time) object 0347-02-01 00:00:00
Data variables:
    *empty*
```

```python
ds = xr.open_dataset('time_mwe.nc', engine='netcdf4', decode_times=True, use_cftime=True)
print(ds)
```

```
<xarray.Dataset>
Dimensions:  (time: 1)
Coordinates:
  * time     (time) object 0347-02-01 00:00:00
Data variables:
    *empty*
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

cftime 1.6.2 netcdf4 1.6.4 xarray 2023.8.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8298/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1333644214 PR_kwDOAMm_X8486DyE 6903 Duckarray tests for constructors and properties TomNicholas 35968931 open 0     5 2022-08-09T18:36:56Z 2024-01-01T13:33:22Z   MEMBER   0 pydata/xarray/pulls/6903

Builds on top of #4972 to add tests for Variable/DataArray/Dataset constructors and properties when wrapping duck arrays.

Adds a file xarray/tests/duckarrays/base/constructors.py which contains new test base classes.

Also uses those new base classes to test Sparse array integration (not yet tried for pint integration).

  • [x] Closes part of #6894
  • [ ] Tests added (tests for tests?? Maybe...)
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6903/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2038153739 I_kwDOAMm_X855e8IL 8545 map_blocks should dispatch to ChunkManager TomNicholas 35968931 open 0     5 2023-12-12T16:34:13Z 2023-12-22T16:47:27Z   MEMBER      

Is your feature request related to a problem?

#7019 generalized most of xarray's internals to be able to use any chunked array type that we can create a ChunkManagerEntrypoint for. Most functions now go through this (e.g. apply_ufunc), but I did not redirect xarray.map_blocks to go through ChunkManagerEntrypoint.

This redirection works by dispatching to high-level dask.array primitives such as dask.array.apply_gufunc, dask.array.blockwise, and dask.array.map_blocks. However the current implementation of xarray.map_blocks is much lower-level, building a custom HLG, so it was not obvious how to swap it out.

Describe the solution you'd like

I would like to either:

1) Replace the current internals of xarray.map_blocks with a simple call to ChunkManagerEntrypoint.map_blocks. This would be the cleanest separation of concerns we could do here. Presumably there is some obvious reason why this cannot or should not be done, but I have yet to understand what that reason is. (either @dcherian or @tomwhite can you enlighten me perhaps? 🙏)

2) (More likely) refactor so that the existing guts of xarray.map_blocks are only called from the ChunkManagerEntrypoint, and a non-dask chunked array (i.e. cubed, but in theory other types too) would be able to specify how it wants to perform the map_blocks.
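
To make option 2 more concrete, here is a minimal sketch of the dispatch idea in plain Python. The classes `ToyChunkManager` and `ListOfBlocksManager`, the `MANAGERS` registry and the top-level `map_blocks` function are hypothetical stand-ins invented for this illustration; they are not xarray's ChunkManagerEntrypoint and say nothing about how the existing HLG-based implementation would be moved.

```python
from abc import ABC, abstractmethod


class ToyChunkManager(ABC):
    """Hypothetical interface: each chunked array type supplies its own map_blocks."""

    @abstractmethod
    def is_chunked_array(self, data):
        ...

    @abstractmethod
    def map_blocks(self, func, data):
        ...


class ListOfBlocksManager(ToyChunkManager):
    """Toy manager that treats a list of lists as a chunked array."""

    def is_chunked_array(self, data):
        return isinstance(data, list) and all(isinstance(b, list) for b in data)

    def map_blocks(self, func, data):
        # The manager, not the caller, decides how blocks are processed.
        return [func(block) for block in data]


MANAGERS = [ListOfBlocksManager()]


def map_blocks(func, data):
    """Top-level entry point that only dispatches to the matching manager."""
    for manager in MANAGERS:
        if manager.is_chunked_array(data):
            return manager.map_blocks(func, data)
    raise TypeError(f"no chunk manager recognises {type(data).__name__}")


result = map_blocks(lambda block: [2 * v for v in block], [[1, 2], [3]])
```

In this arrangement a non-dask chunked array type would only need to register a manager with its own `map_blocks`, and the top-level function would contain no array-specific logic.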

Describe alternatives you've considered

Leaving it as the status quo breaks the nice abstraction and separation of concerns that #7019 introduced.

Additional context

Split off from https://github.com/pydata/xarray/issues/8414

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8545/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
2027231531 I_kwDOAMm_X8541Rkr 8524 PR labeler bot broken and possibly dead TomNicholas 35968931 open 0     2 2023-12-05T22:23:44Z 2023-12-06T15:33:42Z   MEMBER      

What is your issue?

The PR labeler bot seems to be broken

https://github.com/pydata/xarray/actions/runs/7107212418/job/19348227101?pr=8404

and even worse the repository has been archived!

https://github.com/andymckay/labeler

I actually like this bot, but unless a similar bot exists somewhere else I guess we should just delete this action 😞

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8524/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  reopened xarray 13221727 issue
2019594436 I_kwDOAMm_X854YJDE 8496 Dataset.dims should return a set, not a dict of sizes TomNicholas 35968931 open 0     8 2023-11-30T22:12:37Z 2023-12-02T03:10:14Z   MEMBER      

What is your issue?

This is inconsistent:

```python
In [25]: ds
Out[25]:
<xarray.Dataset>
Dimensions:  (x: 1, y: 2)
Dimensions without coordinates: x, y
Data variables:
    a        (x, y) int64 0 1

In [26]: ds['a'].dims
Out[26]: ('x', 'y')

In [27]: ds['a'].sizes
Out[27]: Frozen({'x': 1, 'y': 2})

In [28]: ds.dims
Out[28]: Frozen({'x': 1, 'y': 2})

In [29]: ds.sizes
Out[29]: Frozen({'x': 1, 'y': 2})
```

Surely ds.dims should return something like a Frozenset({'x', 'y'})? (because dimension order is meaningless when you have multiple arrays underneath - see https://github.com/pydata/xarray/issues/8498)
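
The point about ordering can be illustrated without xarray. In the sketch below a dataset is modelled as a plain mapping from variable names to dimension tuples, which is a hypothetical stand-in rather than xarray's data model. Two variables use the same dimensions in different orders, so only an unordered collection describes the dataset as a whole.

```python
# Hypothetical toy model: variable name -> tuple of dimension names.
variables = {"a": ("x", "y"), "b": ("y", "x")}
sizes = {"x": 1, "y": 2}

# Per-variable dims are ordered tuples, and the two orders disagree.
orders_disagree = variables["a"] != variables["b"]

# Dataset-level dims as an unordered, immutable set of names.
dataset_dims = frozenset(dim for dims in variables.values() for dim in dims)
```

Under this model `sizes` keeps the name-to-length mapping, while `dataset_dims` carries only the names.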

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8496/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1790161818 PR_kwDOAMm_X85UvI4i 7963 Suggest installing dask when not discovered by ChunkManager TomNicholas 35968931 open 0     2 2023-07-05T19:34:06Z 2023-10-16T13:31:44Z   MEMBER   0 pydata/xarray/pulls/7963
  • [x] Closes #7962
  • [ ] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7963/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1812811751 I_kwDOAMm_X85sDU_n 8008 "Deep linking" disparate documentation resources together TomNicholas 35968931 open 0     3 2023-07-19T22:18:55Z 2023-10-12T18:36:52Z   MEMBER      

What is your issue?

Our docs have a general issue with having lots of related resources that are not necessarily linked together in a useful way. This results in users (including myself!) getting "stuck" in one part of the docs and being unaware of material that would help them solve their specific issue.

To give a concrete example, if a user wants to know about coarsen, there is relevant material:

  • In the coarsen class docstring
  • On the reshaping page
  • On the computations page
  • On the "how do I?" page
  • On the tutorial repository

Different types of material are great, but only some of these resources are linked to others. Coarsen is actually pretty well covered overall, but for other functions there might be no useful linking at all, or no examples in the docstrings.


The biggest missed opportunity here is the way all the great content on the tutorial.xarray.dev repository is not linked from anywhere on the main documentation site (I believe). To address that we could either (a) integrate the tutorial.xarray.dev material into the main site or (b) add a lot more cross-linking between the two sites.

Identifying sections that could be linked and adding links would be a great task for new contributors.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8008/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
602218021 MDU6SXNzdWU2MDIyMTgwMjE= 3980 Make subclassing easier? TomNicholas 35968931 open 0     9 2020-04-17T20:33:13Z 2023-10-04T16:27:28Z   MEMBER      

Suggestion

We relatively regularly have users asking about subclassing DataArray and Dataset, and I know of at least a few cases where people have gone through with it. However we currently explicitly discourage doing this, on the basis that basically all operations will return a bare xarray object instead of the subclassed version, it's full of trip hazards, and we have the accessor interface to point people to instead.

However, while useful, the accessors aren't enough for some users, and I think we could probably do better. If we refactored internally we might be able to make it much easier to subclass.

Example to follow in Pandas

Pandas takes an interesting approach: while they also explicitly discourage subclassing, they still try to make it easier, and show you what you need to do in order for it to work.

They ask you to override some constructor properties with your own, and allow you to define your own original properties.

Potential complications

  • .construct_dataarray and DataArray.__init__ are used a lot internally to reconstruct a DataArray from dims, coords, data etc. before returning the result of a method call. We would probably need to standardise this, before allowing users to override it.

  • Pandas actually has multiple constructor properties you need to override: _constructor, _constructor_sliced, and _constructor_expanddim. What's the minimum set of similar constructors we would need?

  • Blocking access to attributes - we currently stop people from adding their own attributes quite aggressively, so that we can have attributes as an alias for variables and attrs. We would need to either relax this or allow users to register a list of their own _properties, similar to pandas.

  • __slots__ - I think something funky can happen if you inherit from a class that defines __slots__?
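
The constructor-property idea can be sketched in plain Python. The classes below are hypothetical toys, not xarray or pandas code; only the `_constructor` name is borrowed from the pandas convention mentioned above. The sketch contrasts a method that hard-codes its return class with one that goes through an overridable hook.

```python
class HardCodedArray:
    """Toy container whose methods hard-code the return class."""

    def __init__(self, data):
        self.data = list(data)

    def doubled(self):
        # Always returns the base class, even when called on a subclass.
        return HardCodedArray(2 * v for v in self.data)


class ConstructorArray:
    """Toy container that builds results via an overridable constructor hook."""

    def __init__(self, data):
        self.data = list(data)

    @property
    def _constructor(self):
        return ConstructorArray

    def doubled(self):
        return self._constructor(2 * v for v in self.data)


class NaiveSubclass(HardCodedArray):
    pass


class HookedSubclass(ConstructorArray):
    @property
    def _constructor(self):
        # The subclass tells inherited methods which class to build.
        return HookedSubclass


naive_result = NaiveSubclass([1, 2]).doubled()    # falls back to HardCodedArray
hooked_result = HookedSubclass([1, 2]).doubled()  # stays a HookedSubclass
```

The first case is the "bare object" behaviour described in the issue; the second shows why a standardised internal constructor would have to exist before users could override it.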

Documentation

I think if we do this we should also slightly refactor the relevant docs to make clear the distinction between 3 groups of people:

  • Users - People who import and use xarray at the top-level with (ideally) no particular concern as to how it works. This is who the vast majority of the documentation is for.
  • Developers - People who are actually improving and developing xarray upstream. This is who the Contributing to xarray page is for.
  • Extenders - People who want to subclass, accessorize or wrap xarray objects, in order to do something more complicated. These people are probably writing a domain-specific library which will then bring in a new set of users. There maybe aren't as many of these people, but they are really important IMO. This is implicitly who the xarray internals page is aimed at, but it would be nice to make that distinction much more clear. It might also be nice to give them a guide as to "I want to achieve X, should I use wrapping/subclassing/accessors?"

@max-sixty you had some ideas about what would need to be done for this to work?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3980/reactions",
    "total_count": 11,
    "+1": 11,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1801393806 PR_kwDOAMm_X85VVV4q 7981 Document that Coarsen accepts coord func as callable TomNicholas 35968931 open 0     0 2023-07-12T17:01:31Z 2023-09-19T01:18:49Z   MEMBER   0 pydata/xarray/pulls/7981

Documents a hidden feature I noticed yesterday, corrects incorrect docstrings, and tidies up some of the typing internally.

  • [ ] ~~Closes #xxxx~~
  • [ ] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7981/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1812188730 I_kwDOAMm_X85sA846 8004 Rotation Functional Index example TomNicholas 35968931 open 0     2 2023-07-19T15:23:20Z 2023-08-24T13:26:56Z   MEMBER      

Is your feature request related to a problem?

I'm trying to think of an example that would demonstrate the "functional index" pattern discussed in https://github.com/pydata/xarray/issues/3620.

I think a 2D rotation is the simplest example of an analytically-expressible, non-trivial, domain-agnostic case where you might want to back a set of multiple coordinates with a single functional index. It's also nice because there is additional information that must be passed and stored (the angle of the rotation), but that part is very simple, and domain-agnostic. I'm proposing we make this example work and put it in the custom index docs.

I had a go at making that example (notebook here) @benbovy, but I'm confused about a couple of things:

1) How do I implement .sel in such a way that it supports indexing with slices (i.e. to crop my image)?
2) How can I make this lazy?
3) Should the implementation be a "MetaIndex" (i.e. wrapping some pandas indexes)?

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

This example is inspired by @jni's use case in napari, where (IIUC) they want to do a lazy functional affine transformation from pixel to physical coordinates, where the simplest example of such a transform might be a linear shear (caused by the imaging focal plane being at an angle to the physical sample).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8004/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1742035781 I_kwDOAMm_X85n1VtF 7894 Can a "skipna" argument be added for Dataset.integrate() and DataArray.integrate()? TomNicholas 35968931 open 0     2 2023-06-05T15:32:35Z 2023-06-05T21:59:45Z   MEMBER      

Discussed in https://github.com/pydata/xarray/discussions/5283

Originally posted by chfite on May 9, 2021:

I am using the Dataset.integrate() function and noticed that, because one of my variables has a NaN in it, the function returns a NaN for the integrated value of that variable. I know that, based on the trapezoidal rule, one could not get an integrated value at the location of the NaN, but is it not possible to calculate the integrated values where there were regular values? Assuming 0 for NaNs does not work, because it would still integrate between the values before and after the 0 and add additional area I do not want. Using DataArray.dropna() is also not sufficient, because it would assume the value before the NaN is connected to the value after the NaN and again add additional area that I would not want included. If a "skipna" functionality or something similar cannot be added to the integrate function, does anyone have a suggestion for another way to calculate my integrated area while excluding the NaNs?
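
One possible reading of the requested behaviour, sketched in plain Python under the assumption that any trapezoid touching a NaN is simply dropped (the helper `trapz_skipna` is hypothetical and is not an existing xarray or NumPy function):

```python
import math


def trapz_skipna(y, x):
    """Trapezoidal integral of y over x, skipping intervals with a NaN endpoint."""
    total = 0.0
    for i in range(len(y) - 1):
        y0, y1 = y[i], y[i + 1]
        if math.isnan(y0) or math.isnan(y1):
            # Neither bridge the gap (as dropna would) nor treat the NaN as 0.
            continue
        total += 0.5 * (y0 + y1) * (x[i + 1] - x[i])
    return total


area = trapz_skipna(
    [0.0, 1.0, float("nan"), 1.0, 0.0],
    [0.0, 1.0, 2.0, 3.0, 4.0],
)
```

Only the first and last intervals contribute here, so no area is added across the gap on either side of the NaN.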
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7894/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1694956396 I_kwDOAMm_X85lBvts 7813 Task naming for general chunkmanagers TomNicholas 35968931 open 0     3 2023-05-03T22:56:46Z 2023-05-05T10:30:39Z   MEMBER      

What is your issue?

(Follow-up to #7019)

When you create a dask graph of xarray operations, the tasks in the graph get useful names according to the name of the DataArray they operate on, or whether they represent an open_dataset call.

Currently for cubed this doesn't work, for example this graph from https://github.com/pangeo-data/distributed-array-examples/issues/2#issuecomment-1533852877:

cc @tomwhite @dcherian

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7813/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1468534020 I_kwDOAMm_X85XiA0E 7333 FacetGrid with coords error TomNicholas 35968931 open 0     1 2022-11-29T18:42:48Z 2023-04-03T10:12:40Z   MEMBER      

There may perhaps be a small bug anyway, as DataArrays with and without coords are handled differently. Contrast:

```
da=xr.DataArray(data=np.random.randn(2,2,2,10,10),coords={'A':['a1','a2'],'B':[0,1],'C':[0,1],'X':range(10),'Y':range(10)})

p=da.sel(A='a1').plot.contour(col='B',row='C')
try:
    p.map_dataarray(xr.plot.pcolormesh, y="B", x="C");
except Exception as e:
    print('An uninformative error:')
    print(e)

An uninformative error:
tuple index out of range
```

with:

```
da=xr.DataArray(data=np.random.randn(2,2,2,10,10))

p=da.sel(dim_0=0).plot.contour(col='dim_1',row='dim_2')
try:
    p.map_dataarray(xr.plot.pcolormesh, y="dim_1", x="dim_2");
except Exception as e:
    print('A more informative error:')
    print(e)
```

```
A more informative error:
x must be one of None, 'dim_3', 'dim_4'
```

Originally posted by @joshdorrington in https://github.com/pydata/xarray/discussions/7310#discussioncomment-4257643

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7333/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1188523721 I_kwDOAMm_X85G127J 6431 Bug when padding coordinates with NaNs TomNicholas 35968931 open 0     2 2022-03-31T18:57:16Z 2023-03-30T13:33:10Z   MEMBER      

What happened?

```python
da = xr.DataArray(np.arange(9), dims='x')
da.pad({'x': (0, 1)}, 'constant', constant_values=np.NAN)
```


```
ValueError                                Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 da.pad({'x': 1}, 'constant', constant_values=np.NAN)

File ~/Documents/Work/Code/xarray/xarray/core/dataarray.py:4158, in DataArray.pad(self, pad_width, mode, stat_length, constant_values, end_values, reflect_type, **pad_width_kwargs)
   4000 def pad(
   4001     self,
   4002     pad_width: Mapping[Any, int | tuple[int, int]] | None = None,
   (...)
   4012     **pad_width_kwargs: Any,
   4013 ) -> DataArray:
   4014     """Pad this array along one or more dimensions.
   4015
   4016     .. warning::
   (...)
   4156         z (x) float64 nan 100.0 200.0 nan
   4157     """
-> 4158     ds = self._to_temp_dataset().pad(
   4159         pad_width=pad_width,
   4160         mode=mode,
   4161         stat_length=stat_length,
   4162         constant_values=constant_values,
   4163         end_values=end_values,
   4164         reflect_type=reflect_type,
   4165         **pad_width_kwargs,
   4166     )
   4167     return self._from_temp_dataset(ds)

File ~/Documents/Work/Code/xarray/xarray/core/dataset.py:7368, in Dataset.pad(self, pad_width, mode, stat_length, constant_values, end_values, reflect_type, **pad_width_kwargs)
   7366         variables[name] = var
   7367     elif name in self.data_vars:
-> 7368         variables[name] = var.pad(
   7369             pad_width=var_pad_width,
   7370             mode=mode,
   7371             stat_length=stat_length,
   7372             constant_values=constant_values,
   7373             end_values=end_values,
   7374             reflect_type=reflect_type,
   7375         )
   7376     else:
   7377         variables[name] = var.pad(
   7378             pad_width=var_pad_width,
   7379             mode=coord_pad_mode,
   7380             **coord_pad_options,  # type: ignore[arg-type]
   7381         )

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:1360, in Variable.pad(self, pad_width, mode, stat_length, constant_values, end_values, reflect_type, **pad_width_kwargs)
   1357 if reflect_type is not None:
   1358     pad_option_kwargs["reflect_type"] = reflect_type  # type: ignore[assignment]
-> 1360 array = np.pad(  # type: ignore[call-overload]
   1361     self.data.astype(dtype, copy=False),
   1362     pad_width_by_index,
   1363     mode=mode,
   1364     **pad_option_kwargs,
   1365 )
   1367 return type(self)(self.dims, array)

File <__array_function__ internals>:5, in pad(*args, **kwargs)

File ~/miniconda3/envs/py39/lib/python3.9/site-packages/numpy/lib/arraypad.py:803, in pad(array, pad_width, mode, **kwargs)
    801     for axis, width_pair, value_pair in zip(axes, pad_width, values):
    802         roi = _view_roi(padded, original_area_slice, axis)
--> 803         _set_pad_area(roi, axis, width_pair, value_pair)
    805 elif mode == "empty":
    806     pass  # Do nothing as _pad_simple already returned the correct result

File ~/miniconda3/envs/py39/lib/python3.9/site-packages/numpy/lib/arraypad.py:147, in _set_pad_area(padded, axis, width_pair, value_pair)
    130 """
    131 Set empty-padded area in given dimension.
    132
   (...)
    144     broadcastable to the shape of arr.
    145 """
    146 left_slice = _slice_at_axis(slice(None, width_pair[0]), axis)
--> 147 padded[left_slice] = value_pair[0]
    149 right_slice = _slice_at_axis(
    150     slice(padded.shape[axis] - width_pair[1], None), axis)
    151 padded[right_slice] = value_pair[1]

ValueError: cannot convert float NaN to integer
```

What did you expect to happen?

It should have successfully padded with a NaN, same as it does if you don't specify constant_values:

```python
In [14]: da.pad({'x': (0, 1)}, 'constant')
Out[14]:
<xarray.DataArray (x: 3)>
array([ 0.,  1., nan])
Dimensions without coordinates: x
```
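
The error message suggests that the NaN fill value is being written into integer storage. A minimal pure-Python sketch of the expected behaviour, assuming the fix amounts to promoting the data to float before padding (the helper `pad_constant` is hypothetical and is not how xarray or NumPy implement padding):

```python
import math


def pad_constant(values, before, after, constant):
    """Pad a 1D list with `constant`, promoting ints to floats for a NaN fill."""
    if isinstance(constant, float) and math.isnan(constant):
        # Integers cannot represent NaN, so promote the data first.
        values = [float(v) for v in values]
    return [constant] * before + list(values) + [constant] * after


# Converting NaN to an integer fails in plain Python with the same message.
try:
    int(float("nan"))
except ValueError as err:
    message = str(err)

padded = pad_constant([0, 1], 0, 1, float("nan"))
```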

Minimal Complete Verifiable Example

No response

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.11.0-7620-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.20.3.dev4+gdbc02d4e
pandas: 1.4.0
numpy: 1.21.4
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.10.3
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.01.1
distributed: 2022.01.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.6.0
pip: 21.3.1
conda: 4.11.0
pytest: 6.2.5
IPython: 8.2.0
sphinx: 4.4.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6431/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1588461863 I_kwDOAMm_X85ergEn 7539 Concat doesn't concatenate dimension coordinates along new dims TomNicholas 35968931 open 0     4 2023-02-16T22:32:33Z 2023-02-21T19:07:48Z   MEMBER      

What is your issue?

xr.concat doesn't concatenate dimension coordinates along new dimensions, which leads to pretty unintuitive behavior.

Take this example (motivated by https://github.com/pydata/xarray/discussions/7532#discussioncomment-4988792)

```python
segments = []
for i in range(2):
    time = np.sort(np.random.random(4))
    da = xr.DataArray(
        np.random.randn(4,2),
        dims=["time", "cols"],
        coords=dict(time=('time', time), cols=["col1", "col2"]),
    )
    segments.append(da)
```

```python
In [86]: segments
Out[86]:
[<xarray.DataArray (time: 4, cols: 2)>
 array([[-0.61199576, -0.9012078 ],
        [-0.54187577,  1.30509994],
        [-3.53720471,  0.97607797],
        [ 0.2593455 ,  0.95920031]])
 Coordinates:
   * time     (time) float64 0.1048 0.168 0.869 0.9432
   * cols     (cols) <U4 'col1' 'col2',
 <xarray.DataArray (time: 4, cols: 2)>
 array([[ 0.90266408, -0.54294821],
        [-1.09087103, -0.17484417],
        [-0.21679558, -0.57377412],
        [ 0.07570151,  0.27433728]])
 Coordinates:
   * time     (time) float64 0.03627 0.09754 0.2434 0.592
   * cols     (cols) <U4 'col1' 'col2']
```

```python
In [85]: xr.concat(segments, dim='new')
Out[85]:
<xarray.DataArray (new: 2, time: 8, cols: 2)>
array([[[        nan,         nan],
        [        nan,         nan],
        [-0.61199576, -0.9012078 ],
        [-0.54187577,  1.30509994],
        [        nan,         nan],
        [        nan,         nan],
        [-3.53720471,  0.97607797],
        [ 0.2593455 ,  0.95920031]],

       [[ 0.90266408, -0.54294821],
        [-1.09087103, -0.17484417],
        [        nan,         nan],
        [        nan,         nan],
        [-0.21679558, -0.57377412],
        [ 0.07570151,  0.27433728],
        [        nan,         nan],
        [        nan,         nan]]])
Coordinates:
  * time     (time) float64 0.03627 0.09754 0.1048 0.168 ... 0.592 0.869 0.9432
  * cols     (cols) <U4 'col1' 'col2'
Dimensions without coordinates: new
```

I would have expected to get a result of size {new: 2, time: 4, cols: 2}. That would be intuitive, because the default is coords='different', and that would be the result of concatenating each time coordinate (which have different values) and just propagating the cols coordinate (as they have the same values).

Instead what happened is that xr.concat treats the dimension coordinates as indexes to align, and defaults to an outer join. This auto-alignment behaviour has been discussed at length before; I'm just trying to point out another place in which it's problematic.
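
The outer join can be sketched in plain Python to show where the length-8 time axis and the NaN pattern come from. This is only an illustration of the alignment rule, not xarray's implementation; `outer_join` and `reindex` are hypothetical helpers, and the coordinate values are the rounded ones printed in the example:

```python
import math


def outer_join(*coords):
    """Sorted union of all coordinate labels."""
    return sorted(set().union(*coords))


def reindex(values, coord, new_coord):
    """Place values on new_coord, filling labels that are absent with NaN."""
    lookup = dict(zip(coord, values))
    return [lookup.get(label, math.nan) for label in new_coord]


time_a = [0.1048, 0.168, 0.869, 0.9432]
time_b = [0.03627, 0.09754, 0.2434, 0.592]

joined = outer_join(time_a, time_b)                 # 8 distinct labels
a_on_joined = reindex([1.0, 2.0, 3.0, 4.0], time_a, joined)
b_on_joined = reindex([5.0, 6.0, 7.0, 8.0], time_b, joined)
```

Because the two time coordinates share no values, each segment fills only four of the eight joined positions, and the rest become NaN.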

This is briefly mentioned in the concat docstring under coords='all' (“all”: All coordinate variables will be concatenated, except those corresponding to other dimensions.), but it's not even mentioned under coords='different'.

I don't really know what I would prefer to happen with the coordinates. I guess to have created a time coordinate of size {new: 2, time: 4, cols: 2}, but then I don't know what that implies for the underlying index. @benbovy do you have any thoughts?

At the very least we should make this a lot clearer in the docs.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7539/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1586144997 PR_kwDOAMm_X85KDKDY 7534 Docs page on numpy to xarray TomNicholas 35968931 open 0     0 2023-02-15T16:16:53Z 2023-02-15T16:16:53Z   MEMBER   0 pydata/xarray/pulls/7534
  • [x] Closes #7533
  • [ ] ~~Tests added~~
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7534/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1585231355 I_kwDOAMm_X85efLX7 7533 Numpy to xarray docs TomNicholas 35968931 open 0     0 2023-02-15T05:13:50Z 2023-02-15T06:28:05Z   MEMBER      

We should make a docs page specifically to ease the transition from pure-numpy to xarray.

A lot of new xarray users come from already using numpy as their primary data structure. We relatively often get questions about "what's the xarray equivalent of X numpy function" but we don't have a dedicated place to collect those answers, or explain key conceptual differences.

I think this deserves its own dedicated docs page, with:

  • [ ] High-level conceptual differences (e.g. transpose invariance)
  • [ ] Arguments for the benefits of using xarray over pure numpy
  • [ ] Table of numpy <-> xarray function equivalents (similar to the existing "How do I..." page)
  • [ ] Other common recommendations for numpy users (e.g. use netCDF / Zarr instead of .npz or pickle to store data on disk)

For the table I thought of a few already, but I know there will be a lot more:

  • np.concatenate/np.vstack/np.hstack/np.stack → xr.concat
  • np.block → xr.combine_nested
  • np.apply_along_axis → xr.apply_ufunc
  • np.polynomial → DataArray.polyfit / Dataset.polyfit
  • np.reshape → DataArray.coarsen().construct()
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7533/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1366751031 PR_kwDOAMm_X84-n1xC 7011 Add sphinx-codeautolink extension to docs build TomNicholas 35968931 open 0     15 2022-09-08T17:43:47Z 2023-02-06T17:55:52Z   MEMBER   1 pydata/xarray/pulls/7011

I think that sphinx-codeautolink is different from sphinx.ext.linkcode...

  • [x] Closes #7010
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7011/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1565458372 PR_kwDOAMm_X85I-VC2 7497 Enable datatree * dataset commutativity TomNicholas 35968931 open 0     0 2023-02-01T05:24:53Z 2023-02-03T17:32:20Z   MEMBER   0 pydata/xarray/pulls/7497

Change binary operations involving DataTree objects and Dataset objects to be handled by the DataTree class. Necessary to enable ds * dt to return the same type as dt * ds.
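
The dispatch pattern this relies on can be sketched with two stand-in classes. This is a minimal illustration of Python's `NotImplemented` protocol, not the code in this PR; `Flat` and `Tree` are hypothetical stand-ins for Dataset and DataTree:

```python
class Flat:
    """Hypothetical stand-in for a Dataset-like object holding one value."""

    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        if isinstance(other, Tree):
            # Defer, so that Python falls back to Tree.__rmul__.
            return NotImplemented
        if isinstance(other, Flat):
            return Flat(self.value * other.value)
        return NotImplemented


class Tree:
    """Hypothetical stand-in for a DataTree-like mapping of named values."""

    def __init__(self, values):
        self.values = dict(values)

    def __mul__(self, other):
        if isinstance(other, Flat):
            return Tree({k: v * other.value for k, v in self.values.items()})
        return NotImplemented

    def __rmul__(self, other):
        # Element-wise multiplication commutes, so reuse __mul__.
        return self.__mul__(other)


dt = Tree({"a": 2, "b": 3})
ds = Flat(10)
left = dt * ds    # handled by Tree.__mul__
right = ds * dt   # Flat defers, then Tree.__rmul__ handles it
```

With the flat class deferring, both operand orders return the tree type.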

Builds on top of #7418.

  • [x] Closes https://github.com/xarray-contrib/datatree/issues/146
  • [x] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7497/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1549861293 I_kwDOAMm_X85cYQGt 7459 Error when broadcast given int TomNicholas 35968931 open 0     0 2023-01-19T19:59:31Z 2023-01-19T21:11:12Z   MEMBER      

What happened?

Unhelpful error raised by xr.broadcast when supplied with an int.

What did you expect to happen?

The broadcast to succeed I think?

Minimal Complete Verifiable Example

```Python
In [1]: import xarray as xr

In [2]: da = xr.DataArray([5, 4], dims='x')

In [3]: xr.broadcast(da, 1)

AttributeError                            Traceback (most recent call last)
Cell In[3], line 1
----> 1 xr.broadcast(da, 1)

File ~/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/core/alignment.py:1049, in broadcast(exclude, *args)
   1047 if exclude is None:
   1048     exclude = set()
-> 1049 args = align(*args, join="outer", copy=False, exclude=exclude)
   1051 dims_map, common_coords = _get_broadcast_dims_map_common_coords(args, exclude)
   1052 result = [_broadcast_helper(arg, exclude, dims_map, common_coords) for arg in args]

File ~/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/core/alignment.py:772, in align(join, copy, indexes, exclude, fill_value, *objects)
    576 """
    577 Given any number of Dataset and/or DataArray objects, returns new
    578 objects with aligned indexes and dimension sizes.
   (...)
    762
    763 """
    764 aligner = Aligner(
    765     objects,
    766     join=join,
   (...)
    770     fill_value=fill_value,
    771 )
--> 772 aligner.align()
    773 return aligner.results

File ~/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/core/alignment.py:556, in Aligner.align(self)
    553     self.results = (obj.copy(deep=self.copy),)
    554     return
--> 556 self.find_matching_indexes()
    557 self.find_matching_unindexed_dims()
    558 self.assert_no_index_conflict()

File ~/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/core/alignment.py:262, in Aligner.find_matching_indexes(self)
    259 objects_matching_indexes = []
    261 for obj in self.objects:
--> 262     obj_indexes, obj_index_vars = self._normalize_indexes(obj.xindexes)
    263     objects_matching_indexes.append(obj_indexes)
    264     for key, idx in obj_indexes.items():

AttributeError: 'int' object has no attribute 'xindexes'
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

This clearly has something to do with a change in the flexible indexes refactor, as it complains about .xindexes not being present. @benbovy

Environment

The main branch

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7459/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1536556849 I_kwDOAMm_X85blf8x 7447 Add Align to terminology page TomNicholas 35968931 open 0     0 2023-01-17T15:15:16Z 2023-01-17T15:15:16Z   MEMBER      

Is your feature request related to a problem?

The terminology docs page mostly contains explanation of available classes. It should also contain explanation of words we use to describe relationships between those classes.

For example the docstring on xr.align just says "Given any number of Dataset and/or DataArray objects, returns new objects with aligned indexes and dimension sizes.", but there is no link given to a definition of what we mean by "aligned".

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7447/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1365266461 PR_kwDOAMm_X84-it_s 7006 Fix decorators in ipython code blocks in docs TomNicholas 35968931 open 0     0 2022-09-07T22:38:07Z 2023-01-15T18:11:17Z   MEMBER   0 pydata/xarray/pulls/7006

There was a bug in ipython's sphinx extension causing decorators to be skipped when evaluating code blocks. I assume that's why there is this weird workaround in the docs page on defining accessors (which uses decorators).

I fixed that bug, and the fix is in the most recent release of ipython, so this PR bumps our ipython version for the docs, and removes the workaround.

  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7006/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1512290017 I_kwDOAMm_X85aI7bh 7403 Zarr error when trying to overwrite part of existing store TomNicholas 35968931 open 0     3 2022-12-28T00:40:16Z 2023-01-11T21:26:10Z   MEMBER      

What happened?

to_zarr threw an error when I tried to overwrite part of an existing zarr store.

What did you expect to happen?

With mode w I was expecting it to overwrite part of the store with no complaints.

I expected that because that's what the docstring of to_zarr says:

mode ({"w", "w-", "a", "r+", None}, optional) – Persistence mode: “w” means create (overwrite if exists); “w-” means create (fail if exists); “a” means override existing variables (create if does not exist);

The default mode is "w", so I was expecting it to overwrite.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np
np.random.seed(0)

ds = xr.Dataset()
ds["data"] = (['x', 'y'], np.random.random((100,100)))
ds.to_zarr("test.zarr")
print(ds["data"].mean().compute())
# returns array(0.49645889) as expected

ds = xr.open_dataset("test.zarr", engine='zarr', chunks={})
ds["data"].mean().compute()
print(ds["data"].mean().compute())
# still returns array(0.49645889) as expected

ds.to_zarr("test.zarr", mode="a")
```

```python
<xarray.DataArray 'data' ()>
array(0.49645889)
<xarray.DataArray 'data' ()>
array(0.49645889)
Traceback (most recent call last):
  File "/home/tom/Documents/Work/Code/experimentation/bugs/datatree_nans/mwe_xarray.py", line 16, in <module>
    ds.to_zarr("test.zarr")
  File "/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/core/dataset.py", line 2091, in to_zarr
    return to_zarr(  # type: ignore
  File "/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/backends/api.py", line 1628, in to_zarr
    zstore = backends.ZarrStore.open_group(
  File "/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/backends/zarr.py", line 420, in open_group
    zarr_group = zarr.open_group(store, **open_kwargs)
  File "/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/zarr/hierarchy.py", line 1389, in open_group
    raise ContainsGroupError(path)
zarr.errors.ContainsGroupError: path '' contains a group
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I would like to know what the intended result is supposed to be here, so that I can make sure datatree behaves the same way, see https://github.com/xarray-contrib/datatree/issues/168.

Environment

Main branch of xarray, zarr v2.13.3

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7403/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
906023492 MDExOlB1bGxSZXF1ZXN0NjU3MDYxODI5 5400 Multidimensional histogram TomNicholas 35968931 open 0     3 2021-05-28T20:38:53Z 2022-11-21T22:41:01Z   MEMBER   0 pydata/xarray/pulls/5400

Initial work on integrating the multi-dimensional dask-powered histogram functionality from xhistogram into xarray. Just working on the skeleton to fit around the histogram algorithm for now, to be filled in later.

  • [x] Closes #4610
  • [x] API skeleton
  • [x] Input checking
  • [ ] Internal blockwise algorithm from https://github.com/xgcm/xhistogram/pull/49
  • [x] Redirect plot.hist
  • [x] da.weighted().hist()
  • [ ] Tests added for results
  • [x] Hypothesis tests for different chunking patterns
  • [ ] Examples in documentation
  • [ ] Examples in docstrings
  • [x] Type hints (first time trying these so might be wrong)
  • [ ] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
  • [x] Range argument
  • [ ] Handle multidimensional bins (for a future PR? - See https://github.com/xgcm/xhistogram/pull/59)
  • [ ] Handle np.datetime64 dtypes by refactoring to use np.searchsorted (for a future PR? See discussion)
  • [ ] Fast path for uniform bin widths (for a future PR? See suggestion)

Question: da.hist() or da.histogram()?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5400/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
925098557 MDExOlB1bGxSZXF1ZXN0NjczNjM4NDQy 5493 Fix bug when querying unnamed dataarray TomNicholas 35968931 open 0     0 2021-06-18T17:51:01Z 2022-11-21T22:32:37Z   MEMBER   0 pydata/xarray/pulls/5493

There might be a slightly neater way to do this, but this works.

  • [x] Closes #5492
  • [x] Tests added
  • [ ] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5493/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1424215477 I_kwDOAMm_X85U4821 7227 Typing with Variadic Generics in python 3.11 (PEP 646) TomNicholas 35968931 open 0     5 2022-10-26T15:03:01Z 2022-10-26T21:50:02Z   MEMBER      

What is your issue?

I just saw this new typing feature in python 3.11, and I'm wondering whether / where we could usefully use this? The feature is parametrizing Generics with arbitrary numbers of TypeVars, which allows you to have Array types whose static typing behaviour is a function of their shape. (But we could possibly use it for a tuple of dims too...) We might use it to do things like:

  • Specify that a function expects an array of a certain dimensionality
  • Overload methods based on the array dimensionality (e.g. .plot for 1D vs 2D arrays)
  • (If they implement Shape Arithmetic) Type hint how certain methods will change the output shape?

@headtr1ck @max-sixty any thoughts?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7227/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1372035441 I_kwDOAMm_X85Rx5lx 7031 Periodic Boundary Index TomNicholas 35968931 open 0     14 2022-09-13T21:39:40Z 2022-09-16T10:50:10Z   MEMBER      

What is your issue?

I would like to create a PeriodicBoundaryIndex using the Explicit Indexes refactor. I want to do it first in 1D, then 2D, then maybe ND.

I'm thinking this would be useful for:

1) Geoscientists with periodic longitudes
2) Any scientists with periodic domains
3) Road-testing the refactor + how easy the documentation is to follow.

Eventually I think perhaps this index should live in xarray itself? As it's domain-agnostic, doesn't introduce extra dependencies, and could be a conceptually simple example of a custom index.

I had a first go, using the benbovy:add-set-xindex-and-drop-indexes branch, and reading the in-progress docs page. I got a bit stuck early on though.

@benbovy here's what I have so far:

```python
import numpy as np
import pandas as pd
import xarray as xr
from xarray.core.variable import Variable
from xarray.core.indexes import PandasIndex, is_scalar

from typing import Union, Mapping, Any


class PeriodicBoundaryIndex(PandasIndex):
    """
    An index representing any 1D periodic numberline.

    Implementation subclasses a normal xarray PandasIndex object but intercepts indexer queries.
    """

    def _periodic_subset(self, indxr: Union[int, slice, np.ndarray]) -> pd.Index:
        """Equivalent of __getitem__ for a pd.Index, but respects periodicity."""

        # length of the wrapped pandas index
        length = len(self.index)

        if isinstance(indxr, int):
            return self.index[indxr % length]
        elif isinstance(indxr, slice):
            raise NotImplementedError()
        elif isinstance(indxr, np.ndarray):
            raise NotImplementedError()
        else:
            raise TypeError

    def isel(
        self, indexers: Mapping[Any, Union[int, slice, np.ndarray, Variable]]
    ) -> Union["PeriodicBoundaryIndex", None]:

        print("isel called")

        indxr = indexers[self.dim]
        if isinstance(indxr, Variable):
            if indxr.dims != (self.dim,):
                # can't preserve a index if result has new dimensions
                return None
            else:
                indxr = indxr.data
        if not isinstance(indxr, slice) and is_scalar(indxr):
            # scalar indexer: drop index
            return None

        # _periodic_subset is a method, so it is called rather than subscripted
        subsetted_index = self._periodic_subset(indxr)
        return self._replace(subsetted_index)
```

```python
airtemps = xr.tutorial.open_dataset("air_temperature")['air']

da = airtemps.drop_indexes("lon")

world = da.set_xindex("lon", index_cls=PeriodicBoundaryIndex)
```

Now selecting a value with isel inside the range works fine, giving the same result as without my custom index. (The length of the example dataset along lon is 53.)

```python
world.isel(lon=45)
```

```
isel called
<xarray.DataArray 'air' (time: 2920, lat: 25)>
...
```

But indexing with a lon value outside the range of the index data gives an IndexError, seemingly without consulting my new index object. It didn't even print "isel called" :confused: What should I have implemented that I didn't implement?

python world.isel(lon=55)

```python
IndexError                                Traceback (most recent call last)
Input In [35], in <cell line: 1>()
----> 1 world.isel(lon=55)

File ~/Documents/Work/Code/xarray/xarray/core/dataarray.py:1297, in DataArray.isel(self, indexers, drop, missing_dims, **indexers_kwargs)
   1292     return self._from_temp_dataset(ds)
   1294 # Much faster algorithm for when all indexers are ints, slices, one-dimensional
   1295 # lists, or zero or one-dimensional np.ndarray's
-> 1297 variable = self._variable.isel(indexers, missing_dims=missing_dims)
   1298 indexes, index_variables = isel_indexes(self.xindexes, indexers)
   1300 coords = {}

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:1233, in Variable.isel(self, indexers, missing_dims, **indexers_kwargs)
   1230 indexers = drop_dims_from_indexers(indexers, self.dims, missing_dims)
   1232 key = tuple(indexers.get(dim, slice(None)) for dim in self.dims)
-> 1233 return self[key]

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:793, in Variable.__getitem__(self, key)
    780 """Return a new Variable object whose contents are consistent with
    781 getting the provided key from the underlying data.
    782
   (...)
    790 array x.values directly.
    791 """
    792 dims, indexer, new_order = self._broadcast_indexes(key)
--> 793 data = as_indexable(self._data)[indexer]
    794 if new_order:
    795     data = np.moveaxis(data, range(len(new_order)), new_order)

File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:657, in MemoryCachedArray.__getitem__(self, key)
    656 def __getitem__(self, key):
--> 657     return type(self)(_wrap_numpy_scalars(self.array[key]))

File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:626, in CopyOnWriteArray.__getitem__(self, key)
    625 def __getitem__(self, key):
--> 626     return type(self)(_wrap_numpy_scalars(self.array[key]))

File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:533, in LazilyIndexedArray.__getitem__(self, indexer)
    531     array = LazilyVectorizedIndexedArray(self.array, self.key)
    532     return array[indexer]
--> 533 return type(self)(self.array, self._updated_key(indexer))

File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:505, in LazilyIndexedArray._updated_key(self, new_key)
    503         full_key.append(k)
    504     else:
--> 505         full_key.append(_index_indexer_1d(k, next(iter_new_key), size))
    506 full_key = tuple(full_key)
    508 if all(isinstance(k, integer_types + (slice,)) for k in full_key):

File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:278, in _index_indexer_1d(old_indexer, applied_indexer, size)
    276         indexer = slice_slice(old_indexer, applied_indexer, size)
    277     else:
--> 278         indexer = _expand_slice(old_indexer, size)[applied_indexer]
    279 else:
    280     indexer = old_indexer[applied_indexer]

IndexError: index 55 is out of bounds for axis 0 with size 53
```
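For reference, the traceback shows the error being raised inside Variable.isel (line 1297 of dataarray.py), one line before isel_indexes is called, so the positional indexer reaches the data unmodified. The wrap-around behaviour being asked for can be sketched independently of xarray as plain modular arithmetic on the positional indexer. The helper name `wrap_positional_indexer` below is hypothetical and is not part of xarray's API; it only illustrates the arithmetic, not where such a hook would live.

```python
def wrap_positional_indexer(indexer, size):
    """Map integer positions onto the range [0, size) by wrapping around.

    Hypothetical helper for illustration only; not an xarray function.
    """
    if isinstance(indexer, slice):
        # Slices are passed through unchanged in this sketch.
        return indexer
    if isinstance(indexer, int):
        # Python's % always returns a value in [0, size) for positive size.
        return indexer % size
    # Any other iterable of integer positions is wrapped element by element.
    return [int(i) % size for i in indexer]


lon_size = 53  # length of the example dataset along lon
print(wrap_positional_indexer(55, lon_size))
print(wrap_positional_indexer([-1, 52, 53], lon_size))
```

With this arithmetic, position 55 along a dimension of length 53 would refer to position 2.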

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7031/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1366657155 I_kwDOAMm_X85RdYiD 7010 Use sphinx-codeautolink in docs? TomNicholas 35968931 open 0     4 2022-09-08T16:35:52Z 2022-09-14T20:20:08Z   MEMBER      

I'm a big fan of sphinx-codeautolink 🙂

Originally posted by @Zac-HD in https://github.com/pydata/xarray/pull/6908#discussion_r963290657

This looks cool, let's add it!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7010/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1307212158 I_kwDOAMm_X85N6nl- 6801 Use Papyri to explore documentation TomNicholas 35968931 open 0     0 2022-07-17T21:21:21Z 2022-09-12T18:35:21Z   MEMBER      

What is your issue?

At SciPy, @Carreau demoed a new docs engine: Papyri. (You can find the talk slides here).

In short it looks awesome, and we should use it to improve our docs!

You should watch the talk, but Papyri allows:

  • bidirectional crosslinking across libraries,
  • navigation,
  • proper reflow of user docstrings text,
  • proper reflow of inline images (when rendered to html),
  • proper math rendering (both in terminal and html), and more.

There is also a jupyter-lab extension in the works.

One of the examples in the talk uses xarray docs, as papyri builds from our .rst files.

Here I have "ingested" both xarray and numpy docs, which papyri's explorer dynamically links together in both directions.

I think this is super cool, and we should think about using it. However, the project is at an extremely early stage: it currently has many bugs and no unified way to ship it (the example was made locally).

I encourage other xarray devs to have a look and a think about how we can use it / benefit / test it out though!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6801/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1337337135 I_kwDOAMm_X85PtiUv 6911 Public hypothesis strategies for generating xarray data TomNicholas 35968931 open 0     0 2022-08-12T15:17:40Z 2022-08-12T17:46:48Z   MEMBER      

Proposal

We should expose a public set of hypothesis strategies for use in testing xarray code. It could be useful for downstream users, but also for our own internal test suite. It should live in xarray.testing.strategies. Specifically perhaps

  • xarray.testing.strategies.variables
  • xarray.testing.strategies.dataarrays
  • xarray.testing.strategies.datasets
  • (xarray.testing.strategies.datatrees ?)
  • xarray.testing.strategies.indexes
  • xarray.testing.strategies.chunksizes following dask.array.testing.strategies.chunks

This issue is different from #1846 because that issue describes how we could use such strategies in our own testing code, whereas this issue is for how we create general strategies that we could use in many places (including exposing publicly).

I've become interested in this as part of wanting to see #6894 happen. #6908 would effectively close this issue, but itself is just a pulled out section of all the work @keewis did in #4972.

(Also xref https://github.com/pydata/xarray/issues/2686. Also also @max-sixty didn't you have an issue somewhere about creating better and public test fixtures?)


Previous work

I was pretty surprised to see this comment by @Zac-HD in #1846

@rdturnermtl wrote a Hypothesis extension for Xarray, which is at least a nice demo of what's possible.

given that we might have just used that instead of writing new ones in #4972! (@keewis had you already seen that extension?)

We could literally just include that extension in xarray and call this issue solved...


Shrinking performance of strategies

However, yesterday I was also reading about strategies that shrink, and I think that we should make some effort to come up with strategies for producing xarray objects that shrink in a performant and well-motivated manner. In particular, by pooling the knowledge of the @xarray-dev core team we could try to create strategies that search for many of the edge cases that we are collectively aware of.

My understanding of that guide is that our strategies ideally should:

1) Quickly include or exclude complexity

For instance `if draw(booleans()): # then add coordinates to generated dataset`.

It might also be nice to have strategy constructors which allow passing other strategies in, so the user can choose how much complexity they want their strategy to generate. e.g. I think a signature like this should be possible

```python
from hypothesis import strategies as st

@st.composite
def dataarrays(
    data: xr.Variable | st.SearchStrategy[xr.Variable] | duckarray | st.SearchStrategy[duckarray] | None ..., 
    coords: ...,
    dims: ...,
    attrs: ...,
    name: ...,
) -> st.SearchStrategy[xr.DataArray]:
    """
    Hypothesis strategy for generating arbitrary DataArray objects.

    Parameters
    ----------
    data
        Can pass an absolute value of an appropriate type (i.e. `Variable`, `np.ndarray` etc.), 
        or pass a strategy which generates such types.
        Default is that the generated DataArray could contain any possible data.
    ...
    (similar flexibility for other constructor arguments)
    """
    ...
```

2) Deliberately generate known edge cases

For instance deliberately create:
  - dimension coordinates, 
  - names which are Hashable but not strings, 
  - multi-indexes,
  - weird dtypes,
  - NaNs,
  - duckarrays instead of `np.ndarray`,
  - inconsistent chunking between different variables,
  - (any other ideas?)

3) Be very modular internally, to help with "keeping things local"

Each sub-strategy should be in its own function, so that hypothesis' decision tree can cut branches off as soon as possible.

4) Avoid obvious inefficiencies

e.g. avoid .filter(...) or assume(...) if we can help it, and if we do need them then keep them in the same function that generates that data. Plus just keep all sizes small by default.

Perhaps the solutions implemented in #6894 or this hypothesis xarray extension already meet these criteria - I'm not sure. I just wanted a dedicated place to discuss building the strategies specifically, without it getting mixed in with complicated discussions about whatever we're trying to use the strategies for!
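To make points 1) and 4) above concrete, here is a minimal sketch of a composite strategy that decides with a single boolean draw whether to add a coordinate, and keeps sizes small without any filtering. The names `small_1d_dataarrays` and `check_generated_dataarrays` are hypothetical and chosen only for this illustration; this is not the strategy from #6908 or from the hypothesis extension mentioned above.

```python
import numpy as np
import xarray as xr
from hypothesis import given, settings, strategies as st


@st.composite
def small_1d_dataarrays(draw) -> xr.DataArray:
    """Hypothetical strategy: small 1D float DataArrays, optionally with a coordinate."""
    # Keep sizes small by construction rather than by filtering.
    size = draw(st.integers(min_value=1, max_value=5))
    values = draw(
        st.lists(
            st.floats(allow_nan=True, allow_infinity=False),
            min_size=size,
            max_size=size,
        )
    )
    da = xr.DataArray(np.array(values, dtype=float), dims="x")
    # One boolean draw includes or excludes the extra complexity.
    if draw(st.booleans()):
        da = da.assign_coords(x=np.arange(size))
    return da


@settings(deadline=None, max_examples=25)
@given(small_1d_dataarrays())
def check_generated_dataarrays(da):
    # Every generated object is 1D along "x" and survives a deep copy unchanged.
    assert da.dims == ("x",)
    xr.testing.assert_identical(da, da.copy(deep=True))


check_generated_dataarrays()
```

Because the coordinate is controlled by one boolean, a failing example should be able to shrink towards the coordinate-free case when the coordinate is irrelevant to the failure.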

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6911/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1012713019 PR_kwDOAMm_X84siCtw 5835 combine_nested dataarrays TomNicholas 35968931 open 0     1 2021-09-30T23:19:03Z 2022-06-09T14:50:16Z   MEMBER   0 pydata/xarray/pulls/5835

The spiritual successor to #4696, this attempts to generalise combine_nested to handle both named and unnamed DataArrays in the same way that combine_by_coords does.

Unfortunately it doesn't actually work yet - I think the problem is a bit more subtle than I originally thought.

Ideally I would implement this using the same logical structure as in #5834, but my attempt to do that was thwarted by how tricky it is to iterate over a nested list-of-lists of arbitrary depth and modify the stored objects in place...

  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5835/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1230247677 I_kwDOAMm_X85JVBb9 6585 Add example of apply_ufunc + dask.array.map_blocks to docs? TomNicholas 35968931 open 0     1 2022-05-09T21:02:43Z 2022-05-09T21:10:23Z   MEMBER      

What is your issue?

A pattern I use fairly often is apply_ufunc(..., dask="allowed") calling a function wrapped with dask.array.map_blocks. This is necessary to use apply_ufunc with chunked core dimensions.

AFAIK this currently isn't discussed anywhere in the docs. A sensible place to add a recipe explaining this would be just after this section in your notebook @dcherian ?

@rabernat @jbusecke this is the pattern we used in xGCM FYI
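A minimal sketch of the pattern described above, assuming a function that only accepts NumPy arrays and that gives the correct answer when applied to each block independently. The names `numpy_only_double` and `blockwise_double` are hypothetical stand-ins, and this is not the xGCM implementation.

```python
import dask.array
import numpy as np
import xarray as xr


def numpy_only_double(block):
    """Hypothetical stand-in for a function that only works on NumPy arrays."""
    return np.asarray(block) * 2.0


def blockwise_double(arr):
    # With dask="allowed", apply_ufunc passes the dask array straight through,
    # so we are responsible for mapping the NumPy function over its blocks.
    return dask.array.map_blocks(numpy_only_double, arr, dtype=float)


# The core dimension "x" is split into two chunks.
da = xr.DataArray(np.arange(12.0).reshape(3, 4), dims=("y", "x")).chunk({"x": 2})

result = xr.apply_ufunc(
    blockwise_double,
    da,
    input_core_dims=[["x"]],
    output_core_dims=[["x"]],
    dask="allowed",
)
print(result.compute())
```

The caveat is that map_blocks sees one block at a time, so this is only valid when the wrapped function does not need values from neighbouring blocks along the core dimension.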

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6585/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
400289716 MDU6SXNzdWU0MDAyODk3MTY= 2686 Is `create_test_data()` public API? TomNicholas 35968931 open 0     3 2019-01-17T14:00:20Z 2022-04-09T01:48:14Z   MEMBER      

We want to encourage people to use and extend xarray, and we already provide testing functions as public API to help with this.

One function I keep using when writing code which uses xarray is xarray.tests.test_dataset.create_test_data(). This is very useful for quickly writing tests for the same reasons that it's useful in xarray's internal tests, but it's not explicitly public API. This means that there's no guarantee it won't change/disappear, which is not ideal if you're trying to write a test suite for separate software. But so many tests in xarray rely on it that presumably it's not going to get changed.

Is there any reason why it shouldn't be public API? Is there something I should use instead?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2686/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1042652334 I_kwDOAMm_X84-JZyu 5927 Release frequency TomNicholas 35968931 open 0     11 2021-11-02T17:53:57Z 2021-11-05T17:12:42Z   MEMBER      

In issuing the last 2 xarray releases, I've noticed a pattern that goes something like this:

1) We don't have a release for 3+ months, for no particular reason.
2) Someone realises they want a release, to fix a bug or make a new feature available.
3) That person announces that they would like a release.
4) Lots of people (myself especially) suggest all sorts of unfinished issues that they think could or should go into that next release.
5) The dev team end up spending the better part of a week trying to finish up all of these miscellaneous PRs.
6) Finally it is deemed "ready" in some fairly arbitrary way.
7) The release is made manually using the "16 easy steps".
8) No-one wants to think about releasing again for another 3 months...

Frequency

I mentioned this to @rabernat and he suggested that we should be releasing much more frequently.

If we released more regularly then we wouldn't have this effect of "oh and we should try to squeeze XYZ into this release".

I think the majority of the time xarray's CI is passing, and even when it's not it's only 1 tiny fix away from passing. That means that we in theory could release the main branch at practically any time, and it would be perfectly stable for users. (I personally exclusively use the most recent version of main.)

I also don't know of any downside to releasing very regularly (other than that someone has to issue the release).

How about we try to release after each of the bi-weekly dev calls? We could make it an official part of the call to end by saying:

  • "any reason why we can't release right now?"
  • "no, CI is passing"
  • "okay [person] volunteers to click the button right after this meeting"

That would immediately increase our release frequency by up to 6x.

Automation

Can we automate any more steps of our release process? As far as I can tell the only steps that really need human intervention are

  • "write the release summary" and
  • "check that all the automated stuff went as expected".

We could potentially still automate

  • "add new section to the whats-new.rst",
  • "update the stable branch",
  • "update the active version of the docs" (maybe?), and
  • "email various mailing lists".

@pydata/xarray thoughts?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5927/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
939072049 MDU6SXNzdWU5MzkwNzIwNDk= 5587 Tolerance argument for `da.isin()`? TomNicholas 35968931 open 0     1 2021-07-07T16:39:42Z 2021-10-13T06:28:11Z   MEMBER      

Is your feature request related to a problem? Please describe.

Sometimes you want to check that data values are present in another array, but only up to a certain tolerance.

Describe the solution you'd like

da.isin(test_values, tolerance=1e-6), where the tolerance argument is optional.

Not sure what the implementation should be but there are two vectorized suggestions here.

Describe alternatives you've considered

This is different from np.isclose, because isin compares all values against a flattened array, whereas isclose compares individual values elementwise.

Additional context

@jbusecke requested it.
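As an illustration of the requested semantics, here is one possible NumPy sketch that uses broadcasting to compare every element against every test value. The function name `isin_with_tolerance` is hypothetical, and this is not necessarily either of the vectorized suggestions linked above.

```python
import numpy as np


def isin_with_tolerance(values, test_values, tolerance=0.0):
    """Hypothetical helper: True where a value is within `tolerance` of any test value."""
    values = np.asarray(values, dtype=float)
    # Like isin, the test values are treated as a flat collection.
    test_values = np.asarray(test_values, dtype=float).ravel()
    # Broadcast to shape values.shape + (len(test_values),) and compare all pairs.
    close = np.abs(values[..., np.newaxis] - test_values) <= tolerance
    return close.any(axis=-1)


data = np.array([[0.0, 1.0000001], [2.5, 3.0]])
mask = isin_with_tolerance(data, [1.0, 3.0], tolerance=1e-6)
print(mask)
```

The intermediate array holds one boolean per (value, test value) pair, so memory use grows with the product of the two sizes; a sorted search would scale better for large inputs.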

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5587/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
956103236 MDU6SXNzdWU5NTYxMDMyMzY= 5648 Duck array compatibility meeting TomNicholas 35968931 open 0     31 2021-07-29T18:31:52Z 2021-10-12T18:26:17Z   MEMBER      

Proposal: hold a high-level inter-library meeting to sort out roadblocks in the duck-array wrapping efforts.

Whilst trying to get dask, pint and xarray all working nicely together, I couldn't help but notice that there are important issues which conclude with a shared sentiment that "we just need to make a decision as to what wraps what", but which have since seen essentially no codified consensus, and hence no progress, for the past year. Multiply-nested duck-array wrapping is complicated and involves a lot of separate libraries (as this graph of potential wrappings shows), but could be an amazingly powerful feature!

I suggest that as asynchronous discussion hasn't moved this forward, we should instead hold a (hopefully one-off) meeting to make these high-level design decisions.

I'm happy to arrange the meeting, but for this to work we ideally need attendees who understand the issues from the perspective of each of the main libraries involved - some suggestions:

  • xarray (@shoyer and @keewis)
  • dask (@mrocklin?)
  • pint (@jthielen)
  • cupy? (@jacobtomlinson?)
  • sparse? (@crusaderky?)
  • pytorch?? (@rgommers??)

Possible Agenda (please suggest additions!):

  • Which libraries should wrap which other libraries
  • Repo/NEP/etc. for standardizing wrapping order and other future decisions
  • Outstanding issues to tackle first

Background reading

  • Basic idea of the numpy dispatch mechanism explained in a blog post
  • @jthielen's excellent overview comment, with links to relevant NEPs
  • Pint's technical commentary on array type support

Some related issues (there are many more - please add)

  • https://github.com/pydata/xarray/issues/5559
  • https://github.com/pydata/xarray/issues/3950
  • dask/dask#5329
  • dask/dask#6637
  • dask/dask#6636
  • dask/dask#6635
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5648/reactions",
    "total_count": 9,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 5,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
446054247 MDU6SXNzdWU0NDYwNTQyNDc= 2975 Inconsistent/confusing behaviour when concatenating dimension coords TomNicholas 35968931 open 0     2 2019-05-20T11:01:37Z 2021-07-08T17:42:52Z   MEMBER      

I noticed that, with multiple conflicting dimension coords, concat can give pretty weird/counterintuitive results, at least compared to what the documentation suggests it should give:

```python
# Create two datasets with conflicting coordinates
objs = [Dataset({'x': [0], 'y': [1]}), Dataset({'y': [0], 'x': [1]})]

[<xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Coordinates:
   * x        (x) int64 0
   * y        (y) int64 1
 Data variables:
     *empty*,
 <xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Coordinates:
   * y        (y) int64 0
   * x        (x) int64 1
 Data variables:
     *empty*]
```

```python
# Try to join along only 'x',
# coords='minimal' so concatenate "Only coordinates in which the dimension already appears"
concat(objs, dim='x', coords='minimal')

<xarray.Dataset>
Dimensions:  (x: 2, y: 2)
Coordinates:
  * y        (y) int64 0 1
  * x        (x) int64 0 1
Data variables:
    *empty*

# It's joined along x and y!
```

Based on my reading of the docstring for concat, I would have expected this to not attempt to concatenate y, because coords='minimal', and instead to throw an error because 'y' is a "non-concatenated variable" whose values are not the same across datasets.

Now let's try to get concat to broadcast 'y' across 'x':

```python
# Try to join along only 'x' by setting coords='different'
concat(objs, dim='x', coords='different')
```

Now as "Data variables which are not equal (ignoring attributes) across all datasets are also concatenated" then I would have expected 'y' to be concatenated across 'x', i.e. to add the 'x' dimension to the 'y' coord, i.e:

    <xarray.Dataset>
    Dimensions:  (x: 2, y: 1)
    Coordinates:
      * y        (y, x) int64 1 0
      * x        (x) int64 0 1
    Data variables:
        *empty*

But that's not what we get!:

    <xarray.Dataset>
    Dimensions:  (x: 2, y: 2)
    Coordinates:
      * y        (y) int64 0 1
      * x        (x) int64 0 1
    Data variables:
        *empty*

Same again but without dimension coords

If we create the same sort of objects but the variables are data vars not coords, then everything behaves exactly as expected:

```python
objs2 = [Dataset({'a': ('x', [0]), 'b': ('y', [1])}), Dataset({'a': ('x', [1]), 'b': ('y', [0])})]

[<xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Dimensions without coordinates: x, y
 Data variables:
     a        (x) int64 0
     b        (y) int64 1,
 <xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Dimensions without coordinates: x, y
 Data variables:
     a        (x) int64 1
     b        (y) int64 0]

concat(objs2, dim='x', data_vars='minimal')

ValueError: variable b not equal across datasets

concat(objs2, dim='x', data_vars='different')

<xarray.Dataset>
Dimensions:  (x: 2, y: 1)
Dimensions without coordinates: x, y
Data variables:
    a        (x) int64 0 1
    b        (x, y) int64 1 0
```

Also if you do the same again but with coordinates which are not dimension coords, i.e:

```python
objs3 = [Dataset(coords={'a': ('x', [0]), 'b': ('y', [1])}), Dataset(coords={'a': ('x', [1]), 'b': ('y', [0])})]

[<xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Coordinates:
     a        (x) int64 0
     b        (y) int64 1
 Dimensions without coordinates: x, y
 Data variables:
     *empty*,
 <xarray.Dataset>
 Dimensions:  (x: 1, y: 1)
 Coordinates:
     a        (x) int64 1
     b        (y) int64 0
 Dimensions without coordinates: x, y
 Data variables:
     *empty*]
```

then this again gives the expected concatenation behaviour.

So this implies that the compatibility checks that are being done on the data vars are not being done on the coords, but only if they are dimension coordinates!

Either this is not the desired behaviour or the concat docstring needs to be a lot clearer. If we agree that this is not the desired behaviour then I will have a look inside concat to work out why it's happening.

EDIT: Presumably this has something to do with the ToDo in the code for concat: # TODO: support concatenating scalar coordinates even if the concatenated dimension already exists...

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2975/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
936305081 MDU6SXNzdWU5MzYzMDUwODE= 5570 assert_equal does not handle wrapped duck arrays well TomNicholas 35968931 open 0     0 2021-07-03T18:27:11Z 2021-07-03T18:49:57Z   MEMBER      

Whilst trying to fix #5559 I noticed that xarray.testing.assert_equal (and xarray.testing.assert_equal) don't behave well with wrapped duck-typed arrays.

Firstly, they can give unhelpful AssertionError messages:

```python
In [5]: a = np.array([1,2,3])

In [6]: q = pint.Quantity([1,2,3], units='m')

In [7]: da_np = xr.DataArray(a, dims='x')

In [8]: da_p = xr.DataArray(q, dims='x')

In [9]: da_np
Out[9]:
<xarray.DataArray (x: 3)>
array([1, 2, 3])
Dimensions without coordinates: x

In [10]: da_p
Out[10]:
<xarray.DataArray (x: 3)>
<Quantity([1 2 3], 'meter')>
Dimensions without coordinates: x

In [11]: from xarray.testing import assert_equal

In [12]: assert_equal(da_np, da_p)
/home/tegn500/miniconda3/envs/py38-mamba/lib/python3.8/site-packages/xarray/core/duck_array_ops.py:265: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  flag_array = (arr1 == arr2) | (isnull(arr1) & isnull(arr2))
/home/tegn500/miniconda3/envs/py38-mamba/lib/python3.8/site-packages/xarray/core/duck_array_ops.py:265: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
  flag_array = (arr1 == arr2) | (isnull(arr1) & isnull(arr2))
/home/tegn500/miniconda3/envs/py38-mamba/lib/python3.8/site-packages/xarray/core/duck_array_ops.py:265: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  flag_array = (arr1 == arr2) | (isnull(arr1) & isnull(arr2))
/home/tegn500/miniconda3/envs/py38-mamba/lib/python3.8/site-packages/xarray/core/duck_array_ops.py:265: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
  flag_array = (arr1 == arr2) | (isnull(arr1) & isnull(arr2))
/home/tegn500/miniconda3/envs/py38-mamba/lib/python3.8/site-packages/numpy/core/_asarray.py:102: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  return array(a, dtype, copy=False, order=order)

AssertionError                            Traceback (most recent call last)
<ipython-input-12-33b16d6b79ed> in <module>
----> 1 assert_equal(da_np, da_p)

    [... skipping hidden 1 frame]

~/miniconda3/envs/py38-mamba/lib/python3.8/site-packages/xarray/testing.py in assert_equal(a, b)
     79     assert type(a) == type(b)
     80     if isinstance(a, (Variable, DataArray)):
---> 81         assert a.equals(b), formatting.diff_array_repr(a, b, "equals")
     82     elif isinstance(a, Dataset):
     83         assert a.equals(b), formatting.diff_dataset_repr(a, b, "equals")

AssertionError: Left and right DataArray objects are not equal

Differing values:
L
    array([1, 2, 3])
R
    array([1, 2, 3])
```

These are different, but not because the array values are different. At the moment .values is converting the wrapped array type by stripping the units too - it might be better to check the type of the wrapped array first, then use .values to compare. Or could we even do duck-typed testing by delegating via expected.data.equals(actual.data)? (EDIT: I don't think a .equals() method exists in the numpy API, but you could do the equivalent of assert all(expected.data == actual.data))

Secondly, given that we coerce before comparison, I think it's possible that assert_equal could say two different wrapped duck-type arrays are equal when they are not, just because np.asarray() coerces them to the same values.

EDIT2: Looks like there is some discussion here
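The "check the type of the wrapped array first, then compare values" idea can be sketched as follows. This is only an illustration under stated assumptions: `TaggedArray` is a hypothetical ndarray subclass standing in for a wrapped duck array such as a pint Quantity, and `assert_duck_equal` is a hypothetical helper, not an existing xarray function.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical stand-in for a wrapped duck array type."""


def assert_duck_equal(expected, actual):
    """Hypothetical helper: compare array types before comparing values."""
    # Fail early, with a message naming the types, if the wrapped types differ.
    assert type(expected) is type(actual), (
        f"array types differ: {type(expected).__name__} vs {type(actual).__name__}"
    )
    assert np.shape(expected) == np.shape(actual), "shapes differ"
    # Only coerce to plain NumPy once the types are known to match.
    # Note: unlike xarray's own comparison, NaNs are not treated as equal here.
    assert bool(np.all(np.asarray(expected) == np.asarray(actual))), "values differ"


plain = np.array([1, 2, 3])
tagged = plain.view(TaggedArray)

assert_duck_equal(plain, plain.copy())  # same type, same values: passes

try:
    assert_duck_equal(plain, tagged)
except AssertionError as err:
    print(err)
```

Here the failure message points at the differing array types instead of printing two identical-looking value reprs.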

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5570/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);