
Issue #8728 (pydata/xarray): Lingering memory connections when extracting underlying `np.arrays` from datasets
ID: 2127671156 · Author (user ID): 16925278 · Author association: CONTRIBUTOR · State: open · Comments: 6
Created: 2024-02-09T18:39:34Z · Updated: 2024-02-26T06:02:15Z

What is your issue?

I know that, in general, `ds2 = ds` does not create a new object: both names refer to the same Dataset in memory, so changes made through one also show up in the other.

However, I generally assume that certain operations should break this connection, for example:
- extracting the underlying `np.array` from a dataset (which changes its type and discards much of the xarray-specific information: index, dimensions, etc.)
- using the underlying `np.array` in a new dataset

In other words, I would expect `ds['var'].values` to behave like `copy.deepcopy(ds['var'].values)`, i.e. to return an independent copy.
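A minimal sketch of what actually happens, assuming a recent xarray and an in-memory, NumPy-backed data variable (the name `var` is only for illustration). For such a variable, `.values` appears to hand back the array the Dataset already holds rather than a copy, which `np.shares_memory` makes visible. A data variable is used on purpose: index coordinates such as `lon` are backed by a pandas index, so whether their `.values` shares memory may depend on the xarray/pandas versions.

```python
import copy

import numpy as np
import xarray as xr

# A small Dataset with one NumPy-backed data variable (not an index coordinate).
ds = xr.Dataset({"var": ("x", np.array([1.0, 2.0, 3.0]))})

linked = ds["var"].values                   # the array ds already holds
detached = copy.deepcopy(ds["var"].values)  # a separate buffer

print(np.shares_memory(linked, ds["var"].values))    # True
print(np.shares_memory(detached, ds["var"].values))  # False

# Writing into the extracted array writes into ds as well.
linked[0] = -1.0
print(ds["var"].values[0])  # -1.0
print(detached[0])          # 1.0
```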

Here's an example showing that, in these cases, the objects are still linked in memory:

(apologies for the somewhat hokey example)

```
import xarray as xr
import numpy as np

# Create a dataset
ds = xr.Dataset(coords = {'lon':(['lon'],np.array([178.2,179.2,-179.8, -178.8,-177.8,-176.8]))})
print('\nds: ')
print(ds)

# Create a new dataset that uses the values of the first dataset
ds2 = xr.Dataset({'lon1':(['lon'],ds.lon.values)}, coords = {'lon':(['lon'],ds.lon.values)})
print('\nds2: ')
print(ds2)

# Change ds2's 'lon1' variable
ds2['lon1'][ds2['lon1']<0] = 360 + ds2['lon1'][ds2['lon1']<0]

# ds2 is changed as expected
print('\nds2 (should be modified): ')
print(ds2)

# ds is changed, which is not expected
print('\nds (should not be modified): ')
print(ds)
```
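One way to keep `ds` untouched in the example above is to copy the NumPy array before reusing it. This is a sketch under the assumption that an explicit `.copy()` is acceptable; it uses separate copies for `lon1` and the `lon` coordinate so that editing `lon1` in place cannot reach ds2's own index either, and it does the negative-longitude wrap directly on the NumPy array.

```python
import numpy as np
import xarray as xr

# Same longitudes as in the example, stored as the "lon" dimension coordinate.
ds = xr.Dataset(
    coords={"lon": ("lon", np.array([178.2, 179.2, -179.8, -178.8, -177.8, -176.8]))}
)

# Copy before reusing: each variable in ds2 gets its own buffer, so in-place
# edits to ds2 cannot reach ds (or ds2's own "lon" index).
ds2 = xr.Dataset(
    {"lon1": ("lon", ds.lon.values.copy())},
    coords={"lon": ("lon", ds.lon.values.copy())},
)

# Wrap negative longitudes into [0, 360) in place, on ds2's data only.
lon1 = ds2["lon1"].values
lon1[lon1 < 0] += 360

print(ds2["lon1"].values)  # all values now >= 0
print(ds["lon"].values)    # unchanged, still contains negative values
```

An alternative that avoids in-place mutation altogether is to build a new array, e.g. `ds2['lon1'] = ds2['lon1'].where(ds2['lon1'] >= 0, ds2['lon1'] + 360)`: `where` returns a new DataArray, so nothing is written back into a shared buffer.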

The question is: am I right (from a UX perspective) to expect these kinds of operations to disconnect the objects in memory? If so, I might try to update the docs to make this clearer. (Alternatively, if these operations should disconnect the objects in memory, maybe it would be better for `.values` to return `.copy(deep=True).values`.)
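The alternative in the parenthesis can be sketched as a small helper. `detached_values` is a hypothetical name used only for illustration, not part of xarray's API; it simply returns `.copy(deep=True).values`, assuming in-memory NumPy-backed data.

```python
import numpy as np
import xarray as xr


def detached_values(da: xr.DataArray) -> np.ndarray:
    """Hypothetical helper, not part of xarray: return the data as a NumPy
    array that shares no memory with `da` (the behaviour proposed above)."""
    return da.copy(deep=True).values


da = xr.DataArray(np.array([178.2, 179.2, -179.8]), dims="x")

lon = detached_values(da)
lon[lon < 0] += 360  # in-place edit on the detached copy only

print(lon)        # wrapped longitudes
print(da.values)  # still contains -179.8
```

The trade-off is that every call allocates and copies the whole array, while `.values` on in-memory data currently avoids any copy; making the deep copy the default for `.values` would trade memory and time on every access for safety against accidental aliasing.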

Appreciate y'all's thoughts on this!

