issue_comments

4 rows where author_association = "MEMBER" and issue = 1381955373 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1255073449 https://github.com/pydata/xarray/issues/7065#issuecomment-1255073449 https://api.github.com/repos/pydata/xarray/issues/7065 IC_kwDOAMm_X85Kzuap benbovy 4160723 2022-09-22T14:04:22Z 2022-09-22T14:05:56Z MEMBER

Actually there's another conversion when you reuse an xarray dimension coordinate in array-like computations:

```python
ds = xr.Dataset(coords={"x": np.array([1.2, 1.3, 1.4], dtype=np.float16)})

# coordinate data is a wrapper around a pandas.Index object
# (it keeps track of the original array dtype)
ds.variables["x"]._data
# PandasIndexingAdapter(array=Float64Index([1.2001953125, 1.2998046875, 1.400390625], dtype='float64', name='x'), dtype=dtype('float16'))

# This coerces the pandas.Index back as a numpy array
np.asarray(ds.x)
# array([1.2, 1.3, 1.4], dtype=float16)

# which is equivalent to
ds.variables["x"]._data.__array__()
# array([1.2, 1.3, 1.4], dtype=float16)
```

The round-trip conversion preserves the original dtype, so different execution times may be expected.

I can't say much about why the results are different (how different are they?), but I wouldn't be surprised if it's caused by rounding errors accumulating through the computation of a complex formula like the haversine.
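
As a rough illustration (the haversine implementation and sample points below are assumptions for the sketch, not taken from the issue), carrying the same computation in float16 vs float64 already shifts the result through rounding alone:

```python
import numpy as np

def haversine(lat1, lon1, lat2, lon2):
    # great-circle distance in km, assuming a spherical Earth with R = 6371 km
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

pts = np.array([45.12, 10.34, 46.78, 11.56])      # lat1, lon1, lat2, lon2 in degrees
d64 = haversine(*pts)                             # every intermediate kept in float64
d16 = haversine(*pts.astype(np.float16))          # every intermediate rounded to float16
print(d64, float(d16))  # the two distances differ purely from accumulated rounding
```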

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Merge wrongfully creating NaN 1381955373
1255014363 https://github.com/pydata/xarray/issues/7065#issuecomment-1255014363 https://api.github.com/repos/pydata/xarray/issues/7065 IC_kwDOAMm_X85Kzf_b benbovy 4160723 2022-09-22T13:19:23Z 2022-09-22T13:19:23Z MEMBER

> As my latitude and longitude arrays in both datasets have a resolution of 0.1 degrees, wouldn't it make sense to use np.float16 for both arrays?

I don't think so (at least not currently). The numpy arrays are converted by default to pandas.Index objects for each dimension coordinate, and for floats there's only pandas.Float64Index. It looks like it will be deprecated in favor of pandas.NumericIndex, which supports more dtypes, but I still don't see support for 16-bit floats.
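
To illustrate what that means in practice (a minimal sketch with made-up values): a float16 dimension coordinate ends up backed by a 64-bit pandas index, while the coordinate variable itself keeps track of the original dtype.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={"lat": np.array([0.1, 0.2, 0.3], dtype=np.float16)})

print(ds.indexes["lat"].dtype)  # float64 -- the backing pandas.Index has no float16 variant
print(ds.lat.dtype)             # float16 -- the original dtype is still tracked on the coordinate
```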

Regarding your nearest lat/lon point data selection problem, this is something that could probably be better solved using more specific (custom) indexes like the ones available in xoak. Xoak only supports point-wise selection at the moment, though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Merge wrongfully creating NaN 1381955373
1254983291 https://github.com/pydata/xarray/issues/7065#issuecomment-1254983291 https://api.github.com/repos/pydata/xarray/issues/7065 IC_kwDOAMm_X85KzYZ7 benbovy 4160723 2022-09-22T12:54:43Z 2022-09-22T12:54:43Z MEMBER

> The problem is that I tried to merge with join='override', but it was still taking a long time. Probably I wasn't using the right order.

I'm not 100% sure, but xr.merge may load all the data from your datasets and perform some equality checks. Perhaps you could see how much time it takes after loading all the data, or try different xr.merge(compat=...) values?
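
For example, a minimal sketch (with stand-in datasets, not the ones from the issue) of skipping both the label comparison and the variable equality checks:

```python
import numpy as np
import xarray as xr

lat = np.random.uniform(0, 40, size=100)
ds1 = xr.Dataset({"a": ("lat", np.ones(100))}, coords={"lat": lat.astype(np.float32)})
ds2 = xr.Dataset({"a": ("lat", np.zeros(100))}, coords={"lat": lat})

# join="override" reuses ds1's coordinate labels without comparing them to ds2's;
# compat="override" picks conflicting variables from the first dataset instead of
# loading and comparing their values
merged = xr.merge([ds1, ds2], join="override", compat="override")
print(merged.a.values[:3], merged.lat.dtype)  # ones (from ds1), float32
```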

> Before closing, just out of curiosity: in this corner case, shouldn't xarray automatically cast the lat/lon coordinate arrays to the same dtype, or is that a dangerous assumption?

We already do this for label indexers that are passed to .sel(). However, for alignment I think it would require re-building an index for every cast coordinate, which may be expensive and is probably not ideal if done automatically.
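
A minimal sketch of that .sel() behaviour (illustrative values only; it assumes the float64 label gets cast to the coordinate's float16 dtype before the lookup, so the exact match succeeds):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={"x": np.array([1.2, 1.3, 1.4], dtype=np.float16)})

# the float64 label 1.2 is cast to the float16 dtype of the "x" coordinate,
# so it matches the stored value despite the precision difference
print(ds.sel(x=1.2).x.values)
```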

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Merge wrongfully creating NaN 1381955373
1254862548 https://github.com/pydata/xarray/issues/7065#issuecomment-1254862548 https://api.github.com/repos/pydata/xarray/issues/7065 IC_kwDOAMm_X85Ky67U benbovy 4160723 2022-09-22T10:58:10Z 2022-09-22T10:58:36Z MEMBER

Hi @guidocioni.

I see that the longitude and latitude coordinates have different dtypes in the two input datasets, which likely explains why you get so many NaNs and larger sizes (almost 2x) for the lat and lon dimensions in the resulting dataset.

Here's a small reproducible example:

```python
import numpy as np
import xarray as xr

lat = np.random.uniform(0, 40, size=100)
lon = np.random.uniform(0, 180, size=100)

ds1 = xr.Dataset(
    coords={"lon": lon.astype(np.float32), "lat": lat.astype(np.float32)}
)
ds2 = xr.Dataset(
    coords={"lon": lon, "lat": lat}
)

ds1.indexes["lat"].equals(ds2.indexes["lat"])
# False

xr.merge([ds1, ds2], join="exact")
# ValueError: cannot align objects with join='exact' where index/labels/sizes
# are not equal along these coordinates (dimensions): 'lon' ('lon',)
```

If the coordinate labels differ only in their encoding (dtype), you could use `xr.merge([ds1, ds2], join="override")`, which will take the coordinates from the first object.
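
Continuing the sketch above, roughly:

```python
# reusing ds1 and ds2 (and the imports) from the example above
merged = xr.merge([ds1, ds2], join="override")

# both inputs are relabelled with ds1's (float32) lat/lon coordinates, so the
# dimensions keep their original length and no reindexing NaNs are introduced
print(merged.lon.dtype)  # float32
print(merged.sizes)      # lon: 100, lat: 100
```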

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Merge wrongfully creating NaN 1381955373

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);