issues: 1175517164

id: 1175517164
node_id: I_kwDOAMm_X85GEPfs
number: 6395
title: two Dataset objects reference the same numpy array memory block upon creation.
user: 44142765
state: closed
locked: 0
assignee:
milestone:
comments: 3
created_at: 2022-03-21T15:03:20Z
updated_at: 2022-03-22T11:52:31Z
closed_at: 2022-03-21T15:55:08Z
author_association: NONE
active_lock_reason:
draft:
pull_request:
body, reactions, performed_via_github_app, state_reason, repo, type: values follow below

What happened?

I tried creating two new Dataset objects using empty numpy arrays that I would later populate with values using integer indexing. To my surprise, the output was identical for both datasets. After digging around, I discovered that the underlying numpy arrays of the two Dataset objects were the same object. This confused me because I did not make a copy of one to create the other.

What did you expect to happen?

Getting two separate objects with non-identical memory addresses for the numpy arrays they contain.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import pandas as pd
import numpy as np


def xarray_dataset():
    rng = np.random.default_rng(0)
    data_map = {
        "Tmin": rng.uniform(-1, 1, size=(3, 2, 2)),
        "Tmax": rng.uniform(-1, 1, size=(3, 2, 2)),
    }
    lon = [-99.83, -99.32]
    lat = [42.25, 42.21]
    time = pd.date_range("2014-09-06", "2016-09-06", periods=3)
    var_map = {"time": time, "lat": lat, "lon": lon}
    out_map = {name: (tuple(var_map), data_map[name]) for name in data_map}
    return xr.Dataset(data_vars=out_map, coords=var_map)


base = xarray_dataset()
dims = tuple(base.dims)
base_shape = (base.time.size, base.lat.size, base.lon.size)
var_map = {var: (dims, np.empty(base_shape)) for var in ("var_a", "var_b")}
coord_map = {
    "time": (("time",), base.time.values),
    "lon": (("lon",), base.lon.values),
    "lat": (("lat",), base.lat.values),
}
out1 = xr.Dataset(var_map, coords=coord_map)
out2 = xr.Dataset(var_map, coords=coord_map)

print(out1 is out2)  # False
print(out1.var_a.values is out2.var_a.values)  # True......but HOW?!
```

Relevant log output

    print(out1 is out2)  # False
    print(out1.var_a.values is out2.var_a.values)  # True......but HOW?!

Anything else we need to know?

It seems as though changing the lines

    out1 = xr.Dataset(var_map, coords=coord_map)
    out2 = xr.Dataset(var_map, coords=coord_map)

to

    out1 = xr.Dataset(var_map, coords=coord_map)
    out2 = out1.copy(deep=True)

fixes the issue.
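A likely explanation, offered as a sketch rather than a maintainer statement: both constructor calls receive the same `var_map`, so both Datasets wrap the very same numpy array objects, and `xr.Dataset` does not appear to copy numpy input on construction. The example below reproduces the sharing with a single array and then shows one way to avoid it by allocating fresh arrays per Dataset. The names `shared`, `make_vars`, `shares_buffer`, and `independent` are hypothetical and only serve the illustration.

```python
import numpy as np
import xarray as xr

shape = (3, 2, 2)
dims = ("time", "lat", "lon")

# One numpy array handed to two Dataset constructors:
# both Datasets end up backed by the same memory block.
shared = np.empty(shape)
ds1 = xr.Dataset({"var_a": (dims, shared)})
ds2 = xr.Dataset({"var_a": (dims, shared)})
shares_buffer = np.shares_memory(ds1.var_a.values, ds2.var_a.values)


def make_vars():
    """Hypothetical helper: allocate new arrays on every call."""
    return {name: (dims, np.empty(shape)) for name in ("var_a", "var_b")}


# Each Dataset now gets its own freshly allocated arrays.
out1 = xr.Dataset(make_vars())
out2 = xr.Dataset(make_vars())
independent = not np.shares_memory(out1.var_a.values, out2.var_a.values)

print(shares_buffer, independent)
```

Under this reading, `out1.copy(deep=True)` helps for the same reason: it duplicates the underlying data instead of reusing the arrays stored in `var_map`.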

Environment

```
INSTALLED VERSIONS

commit: None
python: 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.15.25-1-lts
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.21.1
pandas: 1.1.5
numpy: 1.22.0
scipy: 1.6.2
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: 3.6.0
Nio: None
zarr: 2.10.3
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 60.0.4
pip: 21.3.1
conda: 4.11.0
pytest: 6.2.5
IPython: 8.0.1
sphinx: 4.4.0
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6395/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
performed_via_github_app:
state_reason: completed
repo: 13221727
type: issue
