issue_comments


5 rows where issue = 1381955373 and user = 12760310 sorted by updated_at descending


Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sort column), author_association, body, reactions, performed_via_github_app, issue
1260899163 https://github.com/pydata/xarray/issues/7065#issuecomment-1260899163 https://api.github.com/repos/pydata/xarray/issues/7065 IC_kwDOAMm_X85LJ8tb guidocioni 12760310 2022-09-28T13:16:13Z 2022-09-28T13:16:13Z NONE

Hey @benbovy, sorry for resurrecting this post again, but today I'm seeing the same issue and for the life of me I cannot understand what difference in this dataset is causing the latitude and longitude arrays to be duplicated...

If I try to merge these two datasets I get one with lat and lon doubled in size.
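A minimal sketch of that behaviour (hypothetical stand-in datasets, not the ones attached to the issue): when two datasets sit on the same nominal grid but store their coordinates with different float dtypes, the labels no longer match exactly, so xr.merge falls back to an outer join, doubling the coordinate and padding the variables with NaN.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-ins for the two datasets: the same nominal 0.1-degree grid,
# but the coordinates are stored with different float dtypes, so the values are
# not bit-identical and the default outer join treats them as distinct labels.
lat64 = np.arange(45.0, 46.0, 0.1)                    # float64 coordinate
lat32 = np.arange(45.0, 46.0, 0.1, dtype=np.float32)  # float32 coordinate

ds1 = xr.Dataset({"a": ("lat", np.ones(lat64.size))}, coords={"lat": lat64})
ds2 = xr.Dataset({"b": ("lat", np.ones(lat32.size))}, coords={"lat": lat32})

merged = xr.merge([ds1, ds2])
print(merged.sizes["lat"])               # roughly doubled: the labels don't match
print(int(merged["a"].isnull().sum()))   # NaN filled in for the non-matching labels
```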

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Merge wrongfully creating NaN 1381955373
1255092548 https://github.com/pydata/xarray/issues/7065#issuecomment-1255092548 https://api.github.com/repos/pydata/xarray/issues/7065 IC_kwDOAMm_X85KzzFE guidocioni 12760310 2022-09-22T14:17:07Z 2022-09-22T14:17:17Z NONE

Actually there's another conversion when you reuse an xarray dimension coordinate in array-like computations:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={"x": np.array([1.2, 1.3, 1.4], dtype=np.float16)})

# coordinate data is a wrapper around a pandas.Index object
# (it keeps track of the original array dtype)
ds.variables["x"]._data
# PandasIndexingAdapter(array=Float64Index([1.2001953125, 1.2998046875, 1.400390625], dtype='float64', name='x'), dtype=dtype('float16'))

# This coerces the pandas.Index back to a numpy array
np.asarray(ds.x)
# array([1.2, 1.3, 1.4], dtype=float16)

# which is equivalent to
ds.variables["x"]._data.__array__()
# array([1.2, 1.3, 1.4], dtype=float16)
```

The round-trip conversion preserves the original dtype so different execution times may be expected.

I can't say much about why the results are different (how different are they?), but I wouldn't be surprised if it's caused by rounding errors accumulated through the computation of a complex formula like the haversine.

The differences are larger than I would expect (on the order of 0.1 in some variables), but they could be related to the fact that, when using different precisions, the closest grid points to the target point could change. This would eventually lead to a different value of the variable extracted from the original dataset.

Unfortunately I didn't have time to verify whether that was the case, but I think this is the only valid explanation, because the variables of the dataset are untouched.

It is still puzzling because, since the target points have a precision of e.g. (45.820497820, 13.003510004), I would expect the cast of the dataset coordinates from e.g. (45.8, 13.0) to preserve the trailing zeros (45.800000000, 13.000000000), so that the closest point should not change.
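As a side note (not from the thread), a quick check of what np.float16 actually stores for these values shows that the cast does not preserve 45.8 exactly, and the float16 step near 45 is a sizeable fraction of the 0.1-degree grid spacing, which helps explain how the selected nearest point can differ between the float16 and float64 runs:

```python
import numpy as np

print(float(np.float16(45.8)))              # 45.8125: float16 cannot represent 45.8 exactly
print(float(np.float16(45.8)) - 45.8)       # ~0.0125 degrees of representation error
print(float(np.spacing(np.float16(45.8))))  # 0.03125: float16 step size near 45
```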

Anyway, I think we're getting off-topic, thanks for the help :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Merge wrongfully creating NaN 1381955373
1255026304 https://github.com/pydata/xarray/issues/7065#issuecomment-1255026304 https://api.github.com/repos/pydata/xarray/issues/7065 IC_kwDOAMm_X85Kzi6A guidocioni 12760310 2022-09-22T13:28:17Z 2022-09-22T13:28:31Z NONE

Mmmm that's weird, because the execution time is really different, and that would be hard to explain if all the arrays were cast to the same dtype.

Yeah, for the nearest lookup I already implemented "my version" of BallTree, but I thought the sel method was already using that under the hood... no?
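For reference, a sketch of the kind of BallTree-based nearest lookup the comment refers to, using scikit-learn's haversine metric (the grid and station below are hypothetical, not taken from the datasets in the issue):

```python
import numpy as np
from sklearn.neighbors import BallTree

# Hypothetical 0.1-degree grid, flattened to (n_points, 2) as [lat, lon] in radians
lats = np.arange(45.0, 46.0, 0.1)
lons = np.arange(13.0, 14.0, 0.1)
lat2d, lon2d = np.meshgrid(lats, lons, indexing="ij")
points = np.radians(np.column_stack([lat2d.ravel(), lon2d.ravel()]))

tree = BallTree(points, metric="haversine")

# Query the closest grid point to a station; haversine distances come back in radians
station = np.radians([[45.820497820, 13.003510004]])
dist, idx = tree.query(station, k=1)
print(dist[0, 0] * 6371.0)              # great-circle distance in km (Earth radius ~6371 km)
print(np.degrees(points[idx[0, 0]]))    # lat/lon of the nearest grid point
```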

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Merge wrongfully creating NaN 1381955373
1254985357 https://github.com/pydata/xarray/issues/7065#issuecomment-1254985357 https://api.github.com/repos/pydata/xarray/issues/7065 IC_kwDOAMm_X85KzY6N guidocioni 12760310 2022-09-22T12:56:35Z 2022-09-22T12:56:35Z NONE

Sorry, that brings me to another question that I never even considered.

As my latitude and longitude arrays in both datasets have a resolution of 0.1 degrees, wouldn't it make sense to use np.float16 for both arrays?

From this dataset I'm extracting the closest points to a station inside a user-defined radius, doing something similar to:

```python
ds['distances'] = haversine(station['lon'], station['lat'], ds.lon, ds.lat)  # haversine is the haversine distance
nearest = ds.where(ds['distances'] < 20, drop=True).copy()
```

In theory, using 16-bit precision for the longitude and latitude arrays shouldn't change much, as the original coordinates are not supposed to have more than 0.1-degree precision, but the final results are still quite different...

The thing is, if I use float16 I can bring the computation time from 6-7 seconds to 2 seconds.
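The haversine function itself is not shown in the thread; a typical vectorized NumPy version, together with a rough (hypothetical) check of how far the distances move when the inputs are cast to float16, might look like this:

```python
import numpy as np

def haversine(lon1, lat1, lon2, lat2, radius=6371.0):
    """Great-circle distance (km) between two points, vectorized over arrays."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * np.arcsin(np.sqrt(a))

lats64 = np.arange(45.0, 46.0, 0.1)   # hypothetical 0.1-degree grid
lats16 = lats64.astype(np.float16)

d64 = haversine(13.0, 45.82, 13.0, lats64)
d16 = haversine(np.float16(13.0), np.float16(45.82), np.float16(13.0), lats16)
print(float(np.max(np.abs(d64 - d16))))  # on the order of a km, from float16 rounding alone
```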

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Merge wrongfully creating NaN 1381955373
1254941693 https://github.com/pydata/xarray/issues/7065#issuecomment-1254941693 https://api.github.com/repos/pydata/xarray/issues/7065 IC_kwDOAMm_X85KzOP9 guidocioni 12760310 2022-09-22T12:17:10Z 2022-09-22T12:17:10Z NONE

@benbovy you have no idea how much time I spent trying to understand what the difference between the two datasets was... and I completely missed the dtype difference. That could definitely explain the problem.

The problem is that I tried to merge with join='override' but it was still taking a long time. Probably I wasn't using the right order.

Before closing, just out of curiosity: in this corner case, shouldn't xarray automatically cast the lat/lon coordinate arrays to the same dtype, or is that a dangerous assumption?
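A sketch of the two workarounds touched on here, with hypothetical stand-in datasets: join="override" keeps the indexes of the first object only, so argument order matters, and alternatively the coordinates can be copied across explicitly before merging:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-ins: the same 0.1-degree grid stored with different dtypes
lat = np.arange(45.0, 46.0, 0.1)
ds1 = xr.Dataset({"a": ("lat", np.ones(lat.size))}, coords={"lat": lat})
ds2 = xr.Dataset({"b": ("lat", np.ones(lat.size))},
                 coords={"lat": lat.astype(np.float32)})

# join="override" rewrites the indexes to those of the *first* object,
# so the argument order decides which coordinates survive:
merged = xr.merge([ds1, ds2], join="override")

# An explicit alternative: copy ds1's coordinates onto ds2 before merging,
# so plain alignment finds exact matches instead of doing an outer join.
merged2 = xr.merge([ds1, ds2.assign_coords(lat=ds1.lat.values)])

print(merged.sizes["lat"], merged2.sizes["lat"])  # both keep the original size
```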

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Merge wrongfully creating NaN 1381955373


```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```
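Given the schema above, the selection shown on this page (comments where issue = 1381955373 and user = 12760310, newest first) can be reproduced against a local copy of the database; the filename below is a guess:

```python
import sqlite3

# Connect to a local SQLite copy of this Datasette database (filename assumed).
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, body
    FROM issue_comments
    WHERE issue = 1381955373 AND user = 12760310
    ORDER BY updated_at DESC
    """
).fetchall()
for comment_id, user, created_at, updated_at, body in rows:
    print(comment_id, updated_at)
```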