github: issue_comments: 6 rows where issue = 1216517115 sorted by updated

6 rows where issue = 1216517115 sorted by updated_at descending

Search:

descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
1117786528	https://github.com/pydata/xarray/issues/6517#issuecomment-1117786528	https://api.github.com/repos/pydata/xarray/issues/6517	IC_kwDOAMm_X85CoBGg	erik-mansson 16100116	2022-05-04T19:46:09Z	2022-05-04T19:46:09Z	NONE	Overall, I just don't think this is a reliable way to trace memory allocation with NumPy. Maybe you could do better by also tracing back to source arrays with `.base`? You may be right that the OWNDATA-flag is more of an internal numpy thing for its memory management, and that there is no general requirement or guarantee that higher-level libraries should avoid creating "unnecessary" layers of views. I had just gotten used to nice behaviour form the other xarray's operations I was using (isel() and []-slicing created views as expected, while e.g. sel() and mean() which create array copies did not create any unnecessary view on top of those). While not creating extra view-objects for viewing the entire array could also be seen as an optimization, the net benefit is not obvious since the extra checks in the if-cases of my patch add some work too. (And of course a risk that a change deep down in the indexing methods has unintended consequences.) I would thus be OK with closing this issue as "won't fix", which I supposed you were heading towards unless a demand from others would appear. I followed your suggestion and changed my memory_size()-function to not just care about whether the OWNDATA is True/False (or probably equivalently whether ndarray.base is None or not), but recursively following the ndarray.base.base... and tracking the id() of objects to avoid counting the same more than once. The new version behaves differently: When called on a single DataArray whose data was defined by slicing something else, it counts the size of the full base array instead of 0 (or about 100 bytes overhead) as before, but within a Dataset (or optionally a set of multiple Datasets) any other reference to the same base array won't be counted again. I can live with this new more "relative" than "absolute" definition of were memory is considered "shared".	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Loading from NetCDF creates unnecessary numpy.ndarray-views that clears the OWNDATA-flag 1216517115
1116397246	https://github.com/pydata/xarray/issues/6517#issuecomment-1116397246	https://api.github.com/repos/pydata/xarray/issues/6517	IC_kwDOAMm_X85Cit6-	shoyer 1217238	2022-05-03T18:09:42Z	2022-05-03T18:09:42Z	MEMBER	I'm a little skeptical that it makes sense to add special case logic into Xarray in an attempt to keep NumPy's "OWNDATA" flag up to date. There are lots of places where we create views of data from existing arrays inside Xarray operations. There are definitely cases where Xarray's internal operations do memory copies followed by views, which would also result in datasets with misleading "OWNDATA" flags if you look only at resulting datasets, e.g., `DataArray.interp()` which definitely does internal memory copies: ``` y = xarray.DataArray([1, 2, 3], dims='x', coords={'x': [0, 1, 2]}) y.interp(x=0.5).data.flags C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : False WRITEABLE : True ALIGNED : True WRITEBACKIFCOPY : False UPDATEIFCOPY : False ``` Overall, I just don't think this is a reliable way to trace memory allocation with NumPy. Maybe you could do better by also tracing back to source arrays with `.base`?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Loading from NetCDF creates unnecessary numpy.ndarray-views that clears the OWNDATA-flag 1216517115
1110703065	https://github.com/pydata/xarray/issues/6517#issuecomment-1110703065	https://api.github.com/repos/pydata/xarray/issues/6517	IC_kwDOAMm_X85CM_vZ	kmuehlbauer 5821660	2022-04-27T08:24:56Z	2022-04-27T08:24:56Z	MEMBER	FYI: Since h5netcdf recently moved to version 1.0, I've checked with latest xarray (2022.3.0) and latest h5netcdf (1.0.0). The OP example with the OP fix reproduces nicely as well with the updated fix.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Loading from NetCDF creates unnecessary numpy.ndarray-views that clears the OWNDATA-flag 1216517115
1110681951	https://github.com/pydata/xarray/issues/6517#issuecomment-1110681951	https://api.github.com/repos/pydata/xarray/issues/6517	IC_kwDOAMm_X85CM6lf	erik-mansson 16100116	2022-04-27T08:04:45Z	2022-04-27T08:22:28Z	NONE	Would we take this as a PR? Is there a simpler way to express that logic? One small simplification I realized later is that slice(None) can be used in place of slice(None, None, None). The `isinstance(i, slice) and` condition seemed necessary to avoid some case where `i` was array-like and thus gave an array of booleans with the comparison operator. The `len(key) == len(array.shape) + 1 and key[-1] is ...` is to handle the case where (at least) NumpyIndexingAdapter._indexing_array_and_key() appends an extra Ellipsis at the end of a tuple that may already have one slice per array dimension. This is actually the only case I noticed in the debugging, and a test now shows that I get the desired outcome even if the `len(key) == len(array.shape)` alternative is skipped (although that would look intuitive to allow). Thus an alternative patch could be ```diff --git "indexing.original.py" "indexing.patched.py" --- "indexing.original.py" +++ "indexing.patched.py" @@ -709,9 +709,12 @@ def explicit_indexing_adapter( """ raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support) result = raw_indexing_method(raw_key.tuple) - if numpy_indices.tuple: - # index the loaded np.ndarray - result = NumpyIndexingAdapter(np.asarray(result))[numpy_indices] + if numpy_indices.tuple and (not isinstance(result, np.ndarray) + or not all(i == slice(None) for i in numpy_indices.tuple)): + # The conditions within parentehses are to avoid unnecessary array slice/view-creation + # that would set flags['OWNDATA'] to False for no reason. + # Index the loaded np.ndarray. + result = NumpyIndexingAdapter(np.asarray(result))[numpy_indices] return result @@ -1156,6 +1160,11 @@ class NumpyIndexingAdapter(ExplicitlyIndexedNDArrayMixin): `def __getitem__(self, key): array, key = self._indexing_array_and_key(key)` if (len(key) == len(array.shape) + 1 and key[-1] is ... and all(isinstance(i, slice) and i == slice(None) for i in key[:len(array.shape)]) and isinstance(array, np.ndarray)): # (This isinstance-check is because nputils.NumpyVIndexAdapter() has not been tested.) Avoid unnecessary array slice/view-creation that would set flags['OWNDATA'] to False for no reason. return array return array[key] def setitem(self, key, value): `` Here I also corrected the fact that my old diff was made against an "original" file missing the line# index the loaded np.ndarray`.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Loading from NetCDF creates unnecessary numpy.ndarray-views that clears the OWNDATA-flag 1216517115
1110354387	https://github.com/pydata/xarray/issues/6517#issuecomment-1110354387	https://api.github.com/repos/pydata/xarray/issues/6517	IC_kwDOAMm_X85CLqnT	max-sixty 5635139	2022-04-26T23:48:44Z	2022-04-26T23:48:44Z	MEMBER	I don't know this well — maybe others can comment — but the example checks out. Would we take this as a PR? Is there a simpler way to express that logic?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Loading from NetCDF creates unnecessary numpy.ndarray-views that clears the OWNDATA-flag 1216517115
1110353767	https://github.com/pydata/xarray/issues/6517#issuecomment-1110353767	https://api.github.com/repos/pydata/xarray/issues/6517	IC_kwDOAMm_X85CLqdn	max-sixty 5635139	2022-04-26T23:47:14Z	2022-04-26T23:47:14Z	MEMBER	For others, here's the diff: ```diff diff --git "indexing.original.py" "indexing.patched.py" --- "indexing.original.py" +++ "indexing.patched.py" @@ -709,8 +709,12 @@ def explicit_indexing_adapter( """ raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support) result = raw_indexing_method(raw_key.tuple) - if numpy_indices.tuple: - result = NumpyIndexingAdapter(np.asarray(result))[numpy_indices] + if numpy_indices.tuple and (not isinstance(result, np.ndarray) + or not all(i == slice(None, None, None) for i in numpy_indices.tuple)): + # The conditions within parentehses are to avoid unnecessary array slice/view-creation + # that would set flags['OWNDATA'] to False for no reason. + # Index the loaded np.ndarray. + result = NumpyIndexingAdapter(np.asarray(result))[numpy_indices] return result @@ -1156,6 +1160,11 @@ class NumpyIndexingAdapter(ExplicitlyIndexedNDArrayMixin): `def __getitem__(self, key): array, key = self._indexing_array_and_key(key)` if ((len(key) == len(array.shape) or (len(key) == len(array.shape) + 1 and key[-1] is ...)) and all(isinstance(i, slice) and i == slice(None, None, None) for i in key[:len(array.shape)]) and isinstance(array, np.ndarray)): # (The isinstance-check is because nputils.NumpyVIndexAdapter() has not been tested.) Avoid unnecessary array slice/view-creation that would set flags['OWNDATA'] to False for no reason. return array return array[key] def setitem(self, key, value): ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Loading from NetCDF creates unnecessary numpy.ndarray-views that clears the OWNDATA-flag 1216517115

Advanced export

JSON shape: default, array, newline-delimited, object

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);

issue_comments

6 rows where issue = 1216517115 sorted by updated_at descending

Avoid unnecessary array slice/view-creation that would set flags['OWNDATA'] to False for no reason.

Avoid unnecessary array slice/view-creation that would set flags['OWNDATA'] to False for no reason.

Advanced export