
issues


4 rows where comments = 8, repo = 13221727 and user = 35968931 sorted by updated_at descending




#6894 Public testing framework for duck array integration
issue · open · TomNicholas (35968931) · MEMBER · 8 comments · locked: 0 · created 2022-08-08T18:23:49Z · updated 2024-01-25T04:04:11Z · repo xarray (13221727) · id 1332231863 · node_id I_kwDOAMm_X85PaD63

What is your issue?

In #4972 @keewis started writing a public framework for testing the integration of any duck array class in xarray, inspired by the testing framework pandas has for ExtensionArrays. This is a meta-issue for what our version of that framework for wrapping numpy-like duck arrays should look like.

(Feel free to edit / add to this)

What behaviour should we test?

We have a lot of xarray methods to test with any type of duck array. Each of these bullets should correspond to one or more testing base classes which the duck array library author would inherit from (a minimal sketch of one such class follows the list). In rough order of increasing complexity:

  • [x] Constructors - Including for Variable #6903
  • [x] Properties - checking that .shape, .dtype etc. exist on the wrapped array, see #4285 for example #6903
  • [x] Reductions - #4972 also uses parameters to automatically test many methods, and hypothesis to test each method for many different array instances.
  • [ ] Unary ops
  • [ ] Binary ops
  • [ ] Selection
  • [ ] Computation
  • [ ] Combining
  • [ ] Groupby
  • [ ] Rolling
  • [ ] Coarsen
  • [ ] Weighted
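
As a very rough sketch of what one of these base classes could look like, loosely following the parametrize-over-methods pattern of #4972 — note this is hypothetical: the class name `DuckArrayReduceTests` and the `create` hook are made up for illustration, not existing xarray API:

```python
# Hypothetical sketch only: `DuckArrayReduceTests` and the `create` hook are
# illustrative names, not existing xarray API.
import numpy as np
import pytest
import xarray as xr


class DuckArrayReduceTests:
    """Library authors subclass this and override `create`."""

    @staticmethod
    def create(data: np.ndarray):
        """Wrap a bare numpy array in the duck array type under test."""
        raise NotImplementedError

    @pytest.mark.parametrize("method", ["mean", "sum", "min", "max"])
    def test_reduce(self, method):
        data = np.arange(12, dtype="float64").reshape(3, 4)
        var = xr.Variable(("x", "y"), self.create(data))
        actual = getattr(var, method)(dim="x")
        expected = getattr(xr.Variable(("x", "y"), data), method)(dim="x")
        # Compare values only; subclasses could add extra checks (units, chunks, ...).
        np.testing.assert_allclose(np.asarray(actual.data), np.asarray(expected.data))


class TestNumpyReductions(DuckArrayReduceTests):
    """Trivial instantiation: numpy itself is a valid duck array."""

    @staticmethod
    def create(data):
        return data
```

A library author would then only override `create` (plus any extra checks) to inherit the whole battery of tests.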

We don't need to test that the array class obeys everything else in the Array API Standard (for instance, .device is probably never going to be used by xarray directly). Instead we assume that if the array class doesn't implement something in the API standard but all the generated tests pass, then all is well.

How extensible does our testing framework need to be?

To be able to test any type of wrapped array, the testing framework itself needs to be quite flexible.

  • User-defined checking - For some arrays np.testing.assert_equal is not enough to guarantee correctness, so the user writing the tests needs to be able to specify additional checks; #4972 shows how to do this for checking the units of resulting pint arrays (a sketch of such a hook follows this list).
  • User-created data? - Some array libraries might need to test array data that is invalid for numpy arrays; I'm thinking specifically of wrapping ragged arrays. #4285
  • Parallel computing frameworks? - Related to the last point are chunked arrays. Here the strategy requires an extra chunks argument when the array is created, and any results need to first call .compute(). Testing arrays that are executed in parallel might also require fairly complicated set-up and tear-down fixtures. (see also #6807)
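
To illustrate the user-defined-checking point, here is a minimal sketch of an overridable check hook; the `check` name and both class names are hypothetical, and the pint override just mirrors the idea from #4972:

```python
# Hypothetical sketch: `check` is an illustrative hook name, not existing API.
import numpy as np


class DuckArrayTestsBase:
    @staticmethod
    def check(actual, expected):
        # Default check: plain value comparison is enough for most arrays.
        np.testing.assert_allclose(np.asarray(actual), np.asarray(expected))


class PintTestsBase(DuckArrayTestsBase):
    @staticmethod
    def check(actual, expected):
        # For pint, correctness also means the units survived the operation.
        assert actual.units == expected.units
        np.testing.assert_allclose(actual.magnitude, expected.magnitude)
```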

What documentation / examples do we need?

All of this content should really go on a dedicated page in the docs, perhaps grouped alongside other ways of extending xarray.

  • [ ] Motivation
  • [ ] What subset of the Array API standard we expect duck array classes to define (could point to a typing protocol?)
  • [ ] Explanation that the array type needs to return the same type for any numpy-like function which xarray might call on it (i.e. the set of duck array instances is closed under numpy operations - see the example after this list)
  • [ ] Explanation of the different base classes
  • [ ] Simple demo of testing a toy numpy-like array class
  • [ ] Point to code testing more advanced examples we actually use (e.g. sparse, pint)
  • [ ] Which advanced behaviours are optional (e.g. Constructors and Properties have to work, but Groupby is optional)
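
To make the closure-under-numpy-operations point concrete, a small illustration using pint (assuming pint is installed; this is exactly the property such a docs page would need to explain):

```python
# Illustration of the closure property, using pint as the duck array.
import numpy as np
import pint

ureg = pint.UnitRegistry()
q = np.linspace(0.0, 1.0, 5) * ureg.m

result = np.multiply(q, 2)  # a numpy-like function applied to the duck array
# The result is still the same duck array type, not a plain ndarray:
assert isinstance(result, type(q))
assert result.units == ureg.m
```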

Where should duck array compatibility testing eventually live?

Right now the tests for sparse & pint are going into the xarray repo, but presumably we don't want tests for every duck array type living in this repository. I suggest that we want to work towards eventually having no array library-specific tests in this repository at all. (Except numpy I guess.) Thanks @crusaderky for the original suggestion.

Instead, all tests involving pint could live in pint-xarray, all tests involving sparse could live in the sparse repository (or a new sparse-xarray repo), and so on. We would set those test jobs to re-run whenever xarray is released, and then cross-reference any issues they reveal back here if need be.

We should probably also move some of our existing tests over as part of this (see https://github.com/pydata/xarray/pull/7023#pullrequestreview-1104932752).

```json
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6894/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
```
#8496 Dataset.dims should return a set, not a dict of sizes
issue · open · TomNicholas (35968931) · MEMBER · 8 comments · locked: 0 · created 2023-11-30T22:12:37Z · updated 2023-12-02T03:10:14Z · repo xarray (13221727) · id 2019594436 · node_id I_kwDOAMm_X854YJDE

What is your issue?

This is inconsistent:

```python
In [25]: ds
Out[25]:
<xarray.Dataset>
Dimensions:  (x: 1, y: 2)
Dimensions without coordinates: x, y
Data variables:
    a        (x, y) int64 0 1

In [26]: ds['a'].dims
Out[26]: ('x', 'y')

In [27]: ds['a'].sizes
Out[27]: Frozen({'x': 1, 'y': 2})

In [28]: ds.dims
Out[28]: Frozen({'x': 1, 'y': 2})

In [29]: ds.sizes
Out[29]: Frozen({'x': 1, 'y': 2})
```

Surely ds.dims should return something like Frozenset({'x', 'y'})? (Dimension order is meaningless when you have multiple arrays underneath, as the example below illustrates - see https://github.com/pydata/xarray/issues/8498.)
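
A constructed example (not from the issue thread) of why order carries no information at the Dataset level:

```python
# Two variables can use the same dimensions in different orders, so a
# Dataset-level dimension *order* carries no information.
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "a": (("x", "y"), np.zeros((1, 2))),
        "b": (("y", "x"), np.zeros((2, 1))),
    }
)
print(ds["a"].dims)  # ('x', 'y')
print(ds["b"].dims)  # ('y', 'x')
print(set(ds.dims))  # {'x', 'y'} - the only order-free summary that makes sense
```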

```json
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8496/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
```
#3926 Remove old auto combine
pull (pydata/xarray/pulls/3926) · closed · TomNicholas (35968931) · MEMBER · 8 comments · locked: 0 · draft: 0 · created 2020-04-02T03:25:54Z · updated 2020-06-24T18:22:55Z · closed 2020-06-24T18:22:55Z · repo xarray (13221727) · id 592331420 · node_id MDExOlB1bGxSZXF1ZXN0Mzk3MzM1NTY3
  • [x] Finishes deprecation cycle started in #2616 (was supposed to have been done in 0.15)
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

I've set combine='by_coords' as the default argument to open_mfdataset. Technically we could go for either, as the deprecation warning just told users to make it explicit from now on, but going for by_coords rather than nested means that:

  • The concat_dim argument is not needed by default,
  • The default behaviour of the function is the "magic" one - users have to opt in to the more explicit behaviour.
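
For illustration, the resulting call patterns look like this (the file paths here are hypothetical):

```python
# Illustrative only; the file paths are hypothetical.
import xarray as xr

# With this change, combine="by_coords" is the default, so no concat_dim is needed:
ds = xr.open_mfdataset("data/*.nc")

# The explicit nested behaviour is opt-in and takes a concat_dim:
ds_nested = xr.open_mfdataset(
    ["data/t1.nc", "data/t2.nc"], combine="nested", concat_dim="time"
)
```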

```json
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3926/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
```
#2482 Global option to always keep/discard attrs on operations
pull (pydata/xarray/pulls/2482) · closed · TomNicholas (35968931) · MEMBER · 8 comments · locked: 0 · draft: 0 · created 2018-10-12T19:01:12Z · updated 2020-04-05T03:53:53Z · closed 2018-10-30T01:01:08Z · repo xarray (13221727) · id 369673042 · node_id MDExOlB1bGxSZXF1ZXN0MjIyNTU3NzU5
  • [x] Resolves wishes of some users and relevant for discussion in #138, #442, #688, #828, #988, #1009, #1271, #2288, #2473
  • [x] Tests added, both of setting the option and of attributes propagating in the expected way
  • [x] Tests passed
  • [x] Documented

Adds a global option to either always keep or always discard attrs in method and function calls.

The behaviour is backwards-compatible, as the logic is:

  • if keep_attrs is supplied as a keyword argument, use that;
  • else if the global option (xarray.set_options(keep_attrs=True)) is set, use that;
  • else use the default value of the keep_attrs argument for that particular function/method (kept the same as before, for backwards compatibility).
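
A short sketch of the resulting behaviour, using a reduction that respects the option (illustrative example, not taken from the PR itself):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(3), dims="x", attrs={"units": "m"})

print(da.mean().attrs)  # {} - default behaviour is unchanged

with xr.set_options(keep_attrs=True):
    print(da.mean().attrs)  # {'units': 'm'} - attrs now propagate
```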

Main use cases include users who want to store the units of their data in the attrs, users who want to always keep information about the source or history of their data, and users who want to store objects in their attributes which are needed to supplement the xarray objects (e.g. an xgcm.grid). It should eventually be superseded by hooks for custom attribute handling (#988), but will be useful until then.

I have left the top-level functions like concat and merge alone. Currently concat keeps the attributes of the first object passed to it, and merge returns a dataset with no attributes. It's not clear how this should be treated though, so I left it to users to extend those functions if they need to.

```json
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2482/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
```

Table schema:

```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
```