home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

4 rows where author_association = "MEMBER" and issue = 1216517115 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 3

  • max-sixty 2
  • shoyer 1
  • kmuehlbauer 1

issue 1

  • Loading from NetCDF creates unnecessary numpy.ndarray-views that clears the OWNDATA-flag · 4 ✖

author_association 1

  • MEMBER · 4 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1116397246 https://github.com/pydata/xarray/issues/6517#issuecomment-1116397246 https://api.github.com/repos/pydata/xarray/issues/6517 IC_kwDOAMm_X85Cit6- shoyer 1217238 2022-05-03T18:09:42Z 2022-05-03T18:09:42Z MEMBER

I'm a little skeptical that it makes sense to add special case logic into Xarray in an attempt to keep NumPy's "OWNDATA" flag up to date. There are lots of places where we create views of data from existing arrays inside Xarray operations.

There are definitely cases where Xarray's internal operations do memory copies followed by views, which would also result in datasets with misleading "OWNDATA" flags if you look only at resulting datasets, e.g., DataArray.interp() which definitely does internal memory copies: ```

y = xarray.DataArray([1, 2, 3], dims='x', coords={'x': [0, 1, 2]}) y.interp(x=0.5).data.flags C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : False WRITEABLE : True ALIGNED : True WRITEBACKIFCOPY : False UPDATEIFCOPY : False ```

Overall, I just don't think this is a reliable way to trace memory allocation with NumPy. Maybe you could do better by also tracing back to source arrays with .base?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Loading from NetCDF creates unnecessary numpy.ndarray-views that clears the OWNDATA-flag 1216517115
1110703065 https://github.com/pydata/xarray/issues/6517#issuecomment-1110703065 https://api.github.com/repos/pydata/xarray/issues/6517 IC_kwDOAMm_X85CM_vZ kmuehlbauer 5821660 2022-04-27T08:24:56Z 2022-04-27T08:24:56Z MEMBER

FYI: Since h5netcdf recently moved to version 1.0, I've checked with latest xarray (2022.3.0) and latest h5netcdf (1.0.0). The OP example with the OP fix reproduces nicely as well with the updated fix.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Loading from NetCDF creates unnecessary numpy.ndarray-views that clears the OWNDATA-flag 1216517115
1110354387 https://github.com/pydata/xarray/issues/6517#issuecomment-1110354387 https://api.github.com/repos/pydata/xarray/issues/6517 IC_kwDOAMm_X85CLqnT max-sixty 5635139 2022-04-26T23:48:44Z 2022-04-26T23:48:44Z MEMBER

I don't know this well — maybe others can comment — but the example checks out.

Would we take this as a PR? Is there a simpler way to express that logic?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Loading from NetCDF creates unnecessary numpy.ndarray-views that clears the OWNDATA-flag 1216517115
1110353767 https://github.com/pydata/xarray/issues/6517#issuecomment-1110353767 https://api.github.com/repos/pydata/xarray/issues/6517 IC_kwDOAMm_X85CLqdn max-sixty 5635139 2022-04-26T23:47:14Z 2022-04-26T23:47:14Z MEMBER

For others, here's the diff:

```diff diff --git "indexing.original.py" "indexing.patched.py" --- "indexing.original.py" +++ "indexing.patched.py" @@ -709,8 +709,12 @@ def explicit_indexing_adapter( """ raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support) result = raw_indexing_method(raw_key.tuple) - if numpy_indices.tuple: - result = NumpyIndexingAdapter(np.asarray(result))[numpy_indices] + if numpy_indices.tuple and (not isinstance(result, np.ndarray) + or not all(i == slice(None, None, None) for i in numpy_indices.tuple)): + # The conditions within parentehses are to avoid unnecessary array slice/view-creation + # that would set flags['OWNDATA'] to False for no reason. + # Index the loaded np.ndarray. + result = NumpyIndexingAdapter(np.asarray(result))[numpy_indices] return result

@@ -1156,6 +1160,11 @@ class NumpyIndexingAdapter(ExplicitlyIndexedNDArrayMixin):

 def __getitem__(self, key):
     array, key = self._indexing_array_and_key(key)
  • if ((len(key) == len(array.shape) or (len(key) == len(array.shape) + 1 and key[-1] is ...))
  • and all(isinstance(i, slice) and i == slice(None, None, None) for i in key[:len(array.shape)])
  • and isinstance(array, np.ndarray)): # (The isinstance-check is because nputils.NumpyVIndexAdapter() has not been tested.)
  • Avoid unnecessary array slice/view-creation that would set flags['OWNDATA'] to False for no reason.

  • return array return array[key]

    def setitem(self, key, value):

```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Loading from NetCDF creates unnecessary numpy.ndarray-views that clears the OWNDATA-flag 1216517115

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 35.834ms · About: xarray-datasette