issue_comments

12 rows where user = 5637662 sorted by updated_at descending


issue 10

  • Slow performance of isel 2
  • xarray 2022.10.0 much slower then 2022.6.0 2
  • Feature request: vector cross product 1
  • writing sparse to netCDF 1
  • Dataset.mean changes variables without specified dimension 1
  • scatter plot with row or col gets hue wrong 1
  • Skip mean over empty axis 1
  • Plotting of labelled data fails 1
  • remove _ensure_plottable 1
  • Optimize some copying 1

user 1

  • dschwoerer · 12

author_association 1

  • CONTRIBUTOR 12
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1467929278 https://github.com/pydata/xarray/issues/2227#issuecomment-1467929278 https://api.github.com/repos/pydata/xarray/issues/2227 IC_kwDOAMm_X85XftK- dschwoerer 5637662 2023-03-14T11:32:10Z 2023-03-14T11:32:10Z CONTRIBUTOR

I see, they are not the same: the slow one is still a dask array, the other one is not:

`Sn (r, theta, phi, sampling) float64 dask.array<chunksize=(14, 52, 2, 10), meta=np.ndarray>`

`Sn (r, theta, phi, sampling) float64 nan nan nan nan ... nan nan nan`

Otherwise they are the same, so this might be dask-related.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of isel 331668890
1463894170 https://github.com/pydata/xarray/issues/2227#issuecomment-1463894170 https://api.github.com/repos/pydata/xarray/issues/2227 IC_kwDOAMm_X85XQUCa dschwoerer 5637662 2023-03-10T14:36:43Z 2023-03-10T14:36:43Z CONTRIBUTOR

I just changed `theisel = ds[k].isel(**slc, missing_dims="ignore")` to `slcp = [slc[d] if d in slc else slice(None) for d in ds[k].dims]` followed by `theisel = ds[k].values[tuple(slcp)]`, and that changed the runtime of my code from unknown (still running after 3 hours) to around 10 seconds.

`ds[k]` is a 3-dimensional array; the `slc[d]` are 7-d NumPy arrays of integers.
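For readers comparing the two code paths: a minimal sketch, assuming the integer indexers share their extra dimensions so that `isel` performs vectorized (pointwise) indexing, showing that the NumPy workaround returns the same values as `isel`. The array `da`, the dimension names `"a"`/`"b"` and the 2-D indexer shape are hypothetical stand-ins for the real `ds[k]` and 7-d `slc[d]`; this does not reproduce the timing difference.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for ds[k]: a small 3-D DataArray
da = xr.DataArray(
    np.arange(4 * 5 * 6, dtype=float).reshape(4, 5, 6),
    dims=("r", "theta", "phi"),
)

# Hypothetical integer indexers, all with the same 2-D shape
rng = np.random.default_rng(0)
slc = {
    "r": rng.integers(0, 4, size=(2, 3)),
    "theta": rng.integers(0, 5, size=(2, 3)),
}

# xarray route: wrap each indexer in a DataArray with shared dims ("a", "b")
# so that isel performs vectorized (pointwise) indexing
labelled = {d: xr.DataArray(v, dims=("a", "b")) for d, v in slc.items()}
via_isel = da.isel(labelled).transpose("a", "b", "phi")

# NumPy route from the comment: slice(None) for dimensions that are not indexed
slcp = [slc[d] if d in slc else slice(None) for d in da.dims]
via_numpy = da.values[tuple(slcp)]

print(via_isel.shape, via_numpy.shape)  # (2, 3, 6) (2, 3, 6)
```

Because the two indexed dimensions are adjacent, NumPy places the broadcast index dimensions where `r` and `theta` were, which matches the `("a", "b", "phi")` order used for the xarray result.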

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of isel 331668890
1300201799 https://github.com/pydata/xarray/pull/7209#issuecomment-1300201799 https://api.github.com/repos/pydata/xarray/issues/7209 IC_kwDOAMm_X85Nf4FH dschwoerer 5637662 2022-11-02T11:51:50Z 2022-11-02T11:51:50Z CONTRIBUTOR

The change does matter, but deep copies are still much more expensive than they used to be (as is to be expected, I guess).
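As background on why deep copies cost more: a minimal sketch showing that `copy(deep=False)` returns a new object that shares the underlying NumPy buffer, whereas `copy(deep=True)` duplicates the data, so its cost grows with the array size. The array itself is an arbitrary example.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros((1000, 1000)), dims=("x", "y"))

shallow = da.copy(deep=False)  # new DataArray, same data buffer
deep = da.copy(deep=True)      # new DataArray and a full copy of the data

print(np.shares_memory(da.values, shallow.values))  # True
print(np.shares_memory(da.values, deep.values))     # False
```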

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Optimize some copying 1421441672
1290161405 https://github.com/pydata/xarray/issues/7181#issuecomment-1290161405 https://api.github.com/repos/pydata/xarray/issues/7181 IC_kwDOAMm_X85M5kz9 dschwoerer 5637662 2022-10-25T08:12:05Z 2022-10-25T08:12:05Z CONTRIBUTOR

Indeed, it does help. In 6 hours the CI completed 50% of the tests, compared to 17% without the change.

However, this is still much slower than before, when we finished in around half an hour, so around 24x slower.
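The 24x figure follows from extrapolating the partial CI run. A small sketch of that arithmetic, assuming test progress is roughly linear in time:

```python
hours_elapsed = 6.0
fraction_done_with_change = 0.50
fraction_done_without_change = 0.17
baseline_hours = 0.5  # previous full CI run

# Extrapolated full-run durations, assuming linear progress
est_with_change = hours_elapsed / fraction_done_with_change        # 12 h
est_without_change = hours_elapsed / fraction_done_without_change  # about 35 h

slowdown = est_with_change / baseline_hours
print(f"{est_with_change:.1f} h, {est_without_change:.1f} h, {slowdown:.0f}x")
# 12.0 h, 35.3 h, 24x
```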

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray 2022.10.0 much slower then 2022.6.0 1412895383
1283740282 https://github.com/pydata/xarray/issues/7181#issuecomment-1283740282 https://api.github.com/repos/pydata/xarray/issues/7181 IC_kwDOAMm_X85MhFJ6 dschwoerer 5637662 2022-10-19T09:57:47Z 2022-10-19T09:57:47Z CONTRIBUTOR

A call graph was posted in the referenced thread:

https://github.com/boutproject/xBOUT/pull/252#issuecomment-1282222985 https://user-images.githubusercontent.com/1486942/196415148-ca7ea730-34f6-4622-8f0c-1e98d8b06e26.svg
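The thread does not say how the call graph was produced. One common approach (an assumption here) is to collect a profile with the standard-library `cProfile` and render the saved stats with a third-party tool such as gprof2dot. A minimal sketch of the profiling half, with hypothetical `slow_step`/`workload` functions standing in for the slow code:

```python
import cProfile
import io
import pstats


def slow_step(n):
    # Hypothetical hot spot
    return sum(i * i for i in range(n))


def workload():
    # Hypothetical stand-in for the slow test suite or script
    total = 0
    for _ in range(50):
        total += slow_step(10_000)
    return total


profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Text report of the 10 entries with the largest cumulative time
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
stats.print_stats(10)
report = buf.getvalue()
print(report)

# stats.dump_stats("profile.out") would save the raw data for a call-graph tool
```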

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray 2022.10.0 much slower then 2022.6.0 1412895383
912636489 https://github.com/pydata/xarray/issues/5762#issuecomment-912636489 https://api.github.com/repos/pydata/xarray/issues/5762 IC_kwDOAMm_X842ZbpJ dschwoerer 5637662 2021-09-03T15:49:40Z 2021-09-03T15:49:40Z CONTRIBUTOR

I tried it with master, and it failed. Trying with main worked :-D

I think there will be cases where the data is not suitable for plotting and the new error will be less clear than the old one, but I still think this would be an overall improvement.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Plotting of labelled data fails 987551524
912630631 https://github.com/pydata/xarray/pull/5763#issuecomment-912630631 https://api.github.com/repos/pydata/xarray/issues/5763 IC_kwDOAMm_X842ZaNn dschwoerer 5637662 2021-09-03T15:40:59Z 2021-09-03T15:40:59Z CONTRIBUTOR

I think there are several options:

  1. Remove the offending tests (simple).
  2. Reintroduce _ensure_plottable, but only check for MultiIndex. This still has the issue that plotting such data might become possible at some point, in which case both the tests and _ensure_plottable would need to be changed.
  3. Change the test to check for any error. This basically just checks that it doesn't work, which I see only limited value in.

Which of these options would you prefer?
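To make option 2 concrete: a minimal sketch of a check that only rejects dimensions backed by a `pandas.MultiIndex`. The name `ensure_no_multiindex` and the error message are hypothetical; this is not the removed `_ensure_plottable`.

```python
import numpy as np
import pandas as pd
import xarray as xr


def ensure_no_multiindex(*arrays):
    """Hypothetical check: raise if any dimension is backed by a MultiIndex."""
    for arr in arrays:
        for dim in arr.dims:
            if dim in arr.indexes and isinstance(arr.indexes[dim], pd.MultiIndex):
                raise TypeError(
                    f"Plotting over MultiIndex dimension {dim!r} is not supported"
                )


da = xr.DataArray(
    np.zeros((2, 3)), dims=("x", "y"), coords={"x": [0, 1], "y": [10, 20, 30]}
)
ensure_no_multiindex(da)  # passes: plain indexes only

stacked = da.stack(z=("x", "y"))  # "z" is backed by a MultiIndex
try:
    ensure_no_multiindex(stacked)
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```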

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  remove _ensure_plottable 987559143
843971807 https://github.com/pydata/xarray/issues/4156#issuecomment-843971807 https://api.github.com/repos/pydata/xarray/issues/4156 MDEyOklzc3VlQ29tbWVudDg0Mzk3MTgwNw== dschwoerer 5637662 2021-05-19T10:33:08Z 2021-05-19T10:33:08Z CONTRIBUTOR

I have hacked together something that supports reading and writing sparse arrays to a netCDF file, but I didn't know how and where to put this within xarray.

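```python
def ds_to_netcdf(ds, fn):
    """Write ds to netCDF, storing sparse variables as (flat index, value) pairs."""
    dsorg = ds
    ds = dsorg.copy()
    # Iterate over a snapshot: new variables are added to ds inside the loop
    for v in list(ds.data_vars):
        if hasattr(ds[v].data, "nnz") and (
            hasattr(ds[v].data, "to_coo") or hasattr(ds[v].data, "linear_loc")
        ):
            # Index coordinate holding the flat (C-order) positions of stored values;
            # the name pattern matches what xr_open_dataset parses
            coord = f"_{v}_xarray_index_"
            assert coord not in ds
            data = ds[v].data
            if hasattr(data, "to_coo"):
                data = data.to_coo()
            ds[coord] = coord, data.linear_loc()
            dims = ds[v].dims
            ds[coord].attrs["compress"] = " ".join(dims)
            at = ds[v].attrs
            # Replace the variable by its stored values only
            ds[v] = coord, data.data
            ds[v].attrs = at
            ds[v].attrs["_fill_value"] = str(data.fill_value)
            # Remember lengths of dimensions that would otherwise be lost
            for d in dims:
                if d not in ds:
                    ds[f"_len_{d}"] = len(dsorg[d])

    print(ds)
    ds.to_netcdf(fn)
```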

```python
import numpy as np
import sparse
import xarray as xr


def xr_open_dataset(fn):
    """Open a file written by ds_to_netcdf and rebuild the sparse variables."""
    ds = xr.open_dataset(fn)

    def fromflat(shape, i):
        # Flat C-order indices -> one index array per dimension
        index = []
        for fac in shape[::-1]:
            index.append(i % fac)
            i = i // fac  # not in-place, so the array read from the file is untouched
        return tuple(index[::-1])

    for c in list(ds.coords):
        if "compress" in ds[c].attrs:
            # Expect coordinate names of the form "_{v}_xarray_index_"
            vs = c.split("_")
            if len(vs) < 5:
                continue
            if vs[-1] != "" or vs[-2] != "index" or vs[-3] != "xarray":
                continue
            v = "_".join(vs[1:-3])
            dat = ds[v].data
            fill = ds[v].attrs.pop("_fill_value", None)
            if fill:
                knownfails = {"nan": np.nan, "False": False, "True": True}
                if fill in knownfails:
                    fill = knownfails[fill]
                else:
                    # Parse the stored string with the variable's dtype
                    fill = dat.dtype.type(fill)
            dims = ds[c].attrs["compress"].split()
            shape = []
            for d in dims:
                try:
                    shape.append(len(ds[d]))
                except KeyError:
                    shape.append(int(ds[f"_len_{d}"].data))
                    ds = ds.drop_vars(f"_len_{d}")

            locs = fromflat(shape, ds[c].data)
            data = sparse.COO(locs, dat, shape, fill_value=fill)
            ds[v] = dims, data, ds[v].attrs, ds[v].encoding
    print(ds)
    return ds
```

Has there been any progress since last year?
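The storage scheme above boils down to keeping only the flat positions and values of the non-fill entries. A NumPy-only sketch of that round trip, where `np.flatnonzero` stands in for `COO.linear_loc()` from the sparse package and a dense array stands in for the sparse one:

```python
import numpy as np


def fromflat(shape, i):
    """Flat C-order indices -> one index array per dimension."""
    index = []
    for fac in shape[::-1]:
        index.append(i % fac)
        i = i // fac
    return tuple(index[::-1])


dense = np.zeros((3, 4, 5))
dense[0, 1, 2] = 1.5
dense[1, 0, 0] = 7.0
dense[2, 3, 4] = -2.0

# Encode: flat positions (C order) and values of the non-fill entries
flat = np.flatnonzero(dense)
values = dense.ravel()[flat]

# Decode: unflatten the positions and scatter the values back
coords = fromflat(dense.shape, flat)
restored = np.zeros(dense.shape)
restored[coords] = values

print(flat.tolist())  # [7, 20, 59]
```

`fromflat` gives the same result as `np.unravel_index(flat, dense.shape)`, which could replace it.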

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  writing sparse to netCDF 638947370
825664534 https://github.com/pydata/xarray/pull/5207#issuecomment-825664534 https://api.github.com/repos/pydata/xarray/issues/5207 MDEyOklzc3VlQ29tbWVudDgyNTY2NDUzNA== dschwoerer 5637662 2021-04-23T13:37:53Z 2021-04-23T13:37:53Z CONTRIBUTOR

I have now changed it so that several wrapped functions preserve the data.

This is more generic, and hopefully still readable. The flag might also be called `identity_0d`, because the function is the identity function for 0-d data, while `invariant_0d` means that any 0-d data is invariant under this function.
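To illustrate what such a flag means: a minimal sketch of a hypothetical factory `make_reduction` that wraps a NumPy reduction and, when `invariant_0d=True`, returns 0-d input unchanged. This is not the code from the PR; it only shows the behaviour the flag name describes.

```python
import numpy as np


def make_reduction(func, invariant_0d=False):
    """Hypothetical factory wrapping a NumPy reduction.

    With invariant_0d=True, 0-d input is returned unchanged instead of being
    passed to func (np.mean, for example, would promote integers to float64).
    """

    def wrapped(values, axis=None, **kwargs):
        values = np.asarray(values)
        if invariant_0d and values.ndim == 0:
            return values
        return func(values, axis=axis, **kwargs)

    return wrapped


mean_invariant = make_reduction(np.mean, invariant_0d=True)
mean_plain = make_reduction(np.mean)

x = np.asarray(3, dtype=np.int64)
print(mean_invariant(x).dtype)  # int64
print(mean_plain(x).dtype)      # float64
```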

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Skip mean over empty axis 865002281
788123029 https://github.com/pydata/xarray/issues/4975#issuecomment-788123029 https://api.github.com/repos/pydata/xarray/issues/4975 MDEyOklzc3VlQ29tbWVudDc4ODEyMzAyOQ== dschwoerer 5637662 2021-03-01T17:20:27Z 2021-03-01T17:20:27Z CONTRIBUTOR

Thanks, it is indeed fixed in 070d815 :+1: Should I close #4978?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  scatter plot with row or col gets hue wrong 818944970
788052130 https://github.com/pydata/xarray/issues/4885#issuecomment-788052130 https://api.github.com/repos/pydata/xarray/issues/4885 MDEyOklzc3VlQ29tbWVudDc4ODA1MjEzMA== dschwoerer 5637662 2021-03-01T15:47:52Z 2021-03-01T15:47:52Z CONTRIBUTOR

I tried this:

```diff
--- a/xarray/core/dataset.py
+++ b/xarray/core/dataset.py
@@ -4701,7 +4701,9 @@ class Dataset(Mapping, ImplementsDatasetReduce, DataWithCoords):
                 if not reduce_dims:
                     variables[name] = var
             else:
-                if (
+                if not reduce_dims:
+                    variables[name] = var
+                elif (
                     not numeric_only
                     or np.issubdtype(var.dtype, np.number)
                     or (var.dtype == np.bool_)
```

This works great for `mean`: "var" stays an integer, as expected.

However, it breaks `ds.std`, which should be zero for "var" but isn't. I guess that is acceptable for coordinates, since the assumption is that the calculation is not done on coordinates, but for data variables it is probably not acceptable.

```diff
--- a/xarray/core/duck_array_ops.py
+++ b/xarray/core/duck_array_ops.py
@@ -537,6 +537,11 @@ def mean(array, axis=None, skipna=None, **kwargs):
     dtypes"""
     from .common import _contains_cftime_datetimes
 
+    # The mean over an empty axis shouldn't change the data
+    # See https://github.com/pydata/xarray/issues/4885
+    if not axis:
+        return array
+
     array = asarray(array)
     if array.dtype.kind in "Mm":
         offset = _datetime_nanmin(array)
```

I think it is best to change `mean`, which would also work for DataArrays. This implies that `mean` does not convert to float64 as the NumPy version does, but I guess that should be fine.

Should I open a PR?
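A minimal sketch of the "skip the mean over an empty axis" idea, using the hypothetical name `mean_skip_empty_axis`. It tests for an empty tuple or list explicitly, because in Python `not axis` is also true for `axis=None` (reduce over everything) and `axis=0` (reduce over the first axis). It also shows NumPy promoting integers to float64 even when reducing over no axes.

```python
import numpy as np


def mean_skip_empty_axis(array, axis=None):
    """Hypothetical sketch: a mean that leaves the data untouched for an empty axis."""
    array = np.asarray(array)
    # Explicit check: `not axis` would also catch axis=None and axis=0
    if isinstance(axis, (tuple, list)) and len(axis) == 0:
        return array
    return array.mean(axis=axis)


a = np.array([1, 2, 3])
print(mean_skip_empty_axis(a, axis=()))  # [1 2 3], dtype unchanged
print(np.mean(a, axis=()))               # [1. 2. 3.], promoted to float64
print(mean_skip_empty_axis(a, axis=0))   # 2.0
print(mean_skip_empty_axis(a))           # 2.0
```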

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset.mean changes variables without specified dimension 805389572
689508449 https://github.com/pydata/xarray/issues/3279#issuecomment-689508449 https://api.github.com/repos/pydata/xarray/issues/3279 MDEyOklzc3VlQ29tbWVudDY4OTUwODQ0OQ== dschwoerer 5637662 2020-09-09T11:46:09Z 2020-09-09T11:46:09Z CONTRIBUTOR

Very useful :+1: I would add

```python
try:
    c.attrs["units"] = a.attrs["units"] + "*" + b.attrs["units"]
except KeyError:
    pass
```

to preserve units, but I am not sure that is in scope for xarray.
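A sketch of where that units snippet would sit in a cross-product helper built on `xr.apply_ufunc` and `np.cross`. The function name `cross_with_units` and the dimension name `comp` are hypothetical; this is not the helper from the issue or the later `xarray.cross`.

```python
import numpy as np
import xarray as xr


def cross_with_units(a, b, spatial_dim):
    """Hypothetical helper: cross product along spatial_dim, combining units."""
    c = xr.apply_ufunc(
        np.cross,  # operates on the last axis, where apply_ufunc puts core dims
        a,
        b,
        input_core_dims=[[spatial_dim], [spatial_dim]],
        output_core_dims=[[spatial_dim]],
    )
    try:
        c.attrs["units"] = a.attrs["units"] + "*" + b.attrs["units"]
    except KeyError:
        pass  # at least one input has no units
    return c


a = xr.DataArray([1.0, 0.0, 0.0], dims="comp", attrs={"units": "m"})
b = xr.DataArray([0.0, 1.0, 0.0], dims="comp", attrs={"units": "N"})
c = cross_with_units(a, b, "comp")
print(c.values, c.attrs["units"])  # [0. 0. 1.] m*N
```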

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: vector cross product 489034521

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 14.98ms · About: xarray-datasette