issue_comments


where issue = 1376109308 and user = 1217238 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1249910951 https://github.com/pydata/xarray/issues/7045#issuecomment-1249910951 https://api.github.com/repos/pydata/xarray/issues/7045 IC_kwDOAMm_X85KgCCn shoyer 1217238 2022-09-16T22:26:36Z 2022-09-16T22:26:36Z MEMBER

As a concrete example, suppose we have two datasets:

1. Hourly predictions for 10 days
2. Daily observations for a month.

```python
import numpy as np
import pandas as pd
import xarray

# Note: pandas >= 1.4 spells the half-open range as inclusive='left';
# older releases used closed='left', which pandas 2.0 removed.
predictions = xarray.DataArray(
    np.random.RandomState(0).randn(24*10),
    {'time': pd.date_range('2022-01-01', '2022-01-11', freq='1h', inclusive='left')},
)
observations = xarray.DataArray(
    np.random.RandomState(1).randn(31),
    {'time': pd.date_range('2022-01-01', '2022-01-31', freq='24h')},
)
```

Today, if you compare these datasets, they automatically align:
```
>>> predictions - observations
<xarray.DataArray (time: 10)>
array([ 0.13970698,  2.88151104, -1.0857261 ,  2.21236931, -0.85490761,
        2.67796423,  0.63833301,  1.94923669, -0.35832191,  0.23234996])
Coordinates:
  * time     (time) datetime64[ns] 2022-01-01 2022-01-02 ... 2022-01-10
```

With this proposed change, you would get an error, e.g., something like:
```
>>> predictions - observations
ValueError: xarray objects are not aligned along dimension 'time':
array(['2022-01-01T00:00:00.000000000', '2022-01-02T00:00:00.000000000',
       '2022-01-03T00:00:00.000000000', '2022-01-04T00:00:00.000000000',
       '2022-01-05T00:00:00.000000000', '2022-01-06T00:00:00.000000000',
       '2022-01-07T00:00:00.000000000', '2022-01-08T00:00:00.000000000',
       '2022-01-09T00:00:00.000000000', '2022-01-10T00:00:00.000000000',
       '2022-01-11T00:00:00.000000000', '2022-01-12T00:00:00.000000000',
       '2022-01-13T00:00:00.000000000', '2022-01-14T00:00:00.000000000',
       '2022-01-15T00:00:00.000000000', '2022-01-16T00:00:00.000000000',
       '2022-01-17T00:00:00.000000000', '2022-01-18T00:00:00.000000000',
       '2022-01-19T00:00:00.000000000', '2022-01-20T00:00:00.000000000',
       '2022-01-21T00:00:00.000000000', '2022-01-22T00:00:00.000000000',
       '2022-01-23T00:00:00.000000000', '2022-01-24T00:00:00.000000000',
       '2022-01-25T00:00:00.000000000', '2022-01-26T00:00:00.000000000',
       '2022-01-27T00:00:00.000000000', '2022-01-28T00:00:00.000000000',
       '2022-01-29T00:00:00.000000000', '2022-01-30T00:00:00.000000000',
       '2022-01-31T00:00:00.000000000'], dtype='datetime64[ns]')
vs
array(['2022-01-01T00:00:00.000000000', '2022-01-01T01:00:00.000000000',
       '2022-01-01T02:00:00.000000000', ..., '2022-01-10T21:00:00.000000000',
       '2022-01-10T22:00:00.000000000', '2022-01-10T23:00:00.000000000'],
      dtype='datetime64[ns]')
```

Instead, you would need to manually align these objects, e.g., with `xarray.align`, `reindex_like()` or `interp_like()`:
```
predictions, observations = xarray.align(predictions, observations)
# or
observations = observations.reindex_like(predictions)
# or
predictions = predictions.interp_like(observations)
```

To (partially) simulate the effect of this change on a codebase today, you could write `xarray.set_options(arithmetic_join='exact')` -- but presumably it would also make sense to change Xarray's other alignment code (e.g., in `concat` and `merge`).
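As a rough illustration of that simulation, the sketch below runs the same subtraction under the default setting and under `arithmetic_join='exact'`. The small `hourly` and `daily` arrays are made up for this example and stand in for the predictions and observations above; the exact wording of the error message depends on the installed xarray version, so it is printed rather than assumed.

```python
import numpy as np
import pandas as pd
import xarray

# Made-up stand-ins for the predictions/observations example (illustrative only).
hourly = xarray.DataArray(
    np.zeros(48),
    coords={"time": pd.date_range("2022-01-01", periods=48, freq="h")},
    dims="time",
)
daily = xarray.DataArray(
    np.ones(3),
    coords={"time": pd.date_range("2022-01-01", periods=3, freq="D")},
    dims="time",
)

# Default behaviour: arithmetic uses an inner join, silently keeping
# only the timestamps present in both arrays.
default_result = hourly - daily

# With arithmetic_join='exact', mismatched indexes raise instead of aligning.
raised = False
with xarray.set_options(arithmetic_join="exact"):
    try:
        hourly - daily
    except ValueError as err:
        raised = True
        print(f"exact join refused to align: {err}")
```

Using `set_options` as a context manager keeps the stricter behaviour scoped to one block, which may be handy for auditing a codebase one module at a time.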

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should Xarray stop doing automatic index-based alignment? 1376109308
1249601076 https://github.com/pydata/xarray/issues/7045#issuecomment-1249601076 https://api.github.com/repos/pydata/xarray/issues/7045 IC_kwDOAMm_X85Ke2Y0 shoyer 1217238 2022-09-16T17:16:52Z 2022-09-16T17:18:38Z MEMBER

> IMO we could first align (hah) these choices to be the same:
>
> > the exact mode of automatic alignment (outer vs inner vs left join) depends on the specific operation.

The problem is that user expectations are actually rather different for different operations:

  • With data movement operations like xarray.merge, you expect to keep around all existing data -- so you want an outer join.
  • With in-place operations that modify an existing Dataset, e.g., by adding new variables, you don't expect the existing coordinates to change -- so you want a left join.
  • With computation-based operations (like arithmetic), you don't have an expectation that all existing data is unmodified, so keeping around a bunch of NaN values felt very wasteful -- hence the inner join.
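To make the three join modes in the list above concrete, here is a minimal sketch that calls `xarray.align` with each `join` value explicitly. The arrays `a` and `b` are made up for illustration; which operation defaults to which join is taken from the comment above rather than checked here, apart from the arithmetic case.

```python
import numpy as np
import xarray

# Two made-up arrays whose "x" labels only partly overlap.
a = xarray.DataArray(np.arange(3.0), coords={"x": [0, 1, 2]}, dims="x")
b = xarray.DataArray(np.arange(3.0), coords={"x": [1, 2, 3]}, dims="x")

# Outer join: union of labels, missing entries filled with NaN
# (the expectation described for data movement such as merge).
outer_a, outer_b = xarray.align(a, b, join="outer")

# Left join: the labels of the first object are kept unchanged
# (the expectation described for in-place updates).
left_a, left_b = xarray.align(a, b, join="left")

# Inner join: intersection of labels, no NaN padding
# (the expectation described for arithmetic).
inner_a, inner_b = xarray.align(a, b, join="inner")

# Arithmetic with default options gives the same labels as the inner join.
total = a + b

for name, arr in [("outer", outer_a), ("left", left_a), ("inner", inner_a)]:
    print(name, arr["x"].values.tolist())
```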

> What do you think of making the default FloatIndex use a reasonable (hard to define!) rtol for comparisons?

This would definitely be a step forward! However, it's a tricky nut to crack. We would need both a heuristic for defining rtol (some fraction of coordinate spacing?) and a method for deciding what the resulting coordinates should be (use values from the first object?).
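One way to picture the two decisions above is the sketch below, which emulates tolerance-based alignment with today's API rather than a FloatIndex. The helper `reindex_within_rtol` and its `fraction` parameter are hypothetical, the tolerance rule (a fraction of the smallest coordinate spacing) is just one of the heuristics floated above, and the resulting coordinates are taken from the first object. It assumes a monotonic coordinate with at least two labels.

```python
import numpy as np
import xarray


def reindex_within_rtol(first, second, dim, fraction=0.01):
    """Hypothetical helper: snap `second` onto the labels of `first` along `dim`."""
    labels = first[dim].values
    # Assumed heuristic: tolerance is a fraction of the smallest coordinate spacing.
    tolerance = fraction * float(np.min(np.abs(np.diff(labels))))
    # Nearest-label lookup; labels farther away than `tolerance` become NaN.
    return second.reindex({dim: labels}, method="nearest", tolerance=tolerance)


x = np.array([0.0, 0.1, 0.2, 0.3])
first = xarray.DataArray(np.arange(4.0), coords={"x": x}, dims="x")
# Same grid up to a tiny floating-point offset, e.g. after a round trip.
second = xarray.DataArray(10 * np.arange(4.0), coords={"x": x + 1e-12}, dims="x")

# Exact label matching finds no common labels, so the inner join is empty.
exact_sum = first + second

# After snapping, the labels match and the arithmetic keeps every point.
snapped = reindex_within_rtol(first, second, "x")
tolerant_sum = first + snapped
```

Taking labels from the first object makes the result depend on operand order, which is part of why the choice of resulting coordinates is hard to settle.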

Even then, automatic alignment is often problematic, e.g., imagine cases where a coordinate is defined in different units.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should Xarray stop doing automatic index-based alignment? 1376109308


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);