
issues: 793245791


| field | value |
|---|---|
| id | 793245791 |
| node_id | MDU6SXNzdWU3OTMyNDU3OTE= |
| number | 4842 |
| title | nbytes does not return the true size for sparse variables |
| user | 13662783 |
| state | closed |
| locked | 0 |
| comments | 2 |
| created_at | 2021-01-25T10:17:56Z |
| updated_at | 2022-07-22T17:25:33Z |
| closed_at | 2022-07-22T17:25:33Z |
| author_association | CONTRIBUTOR |

This wasn't entirely surprising to me, but `nbytes` currently doesn't return the right value for sparse data (at least, I assume `nbytes` is meant to report the actual size in memory).

That's because it is computed from `size` (as `self.size * self.dtype.itemsize`) here:

https://github.com/pydata/xarray/blob/a0c71c1508f34345ad7eef244cdbbe224e031c1b/xarray/core/variable.py#L349

Rather than from something like `data.nnz`, which of course only exists for sparse arrays. I'm not sure whether there's a sparse flag of some sort, or whether you'd have to do a type check.
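A minimal sketch of the duck-typed alternative to an explicit type check, under stated assumptions: `estimate_nbytes` is a hypothetical helper, not part of xarray's API, and it assumes the wrapped array exposes `size` and `dtype.itemsize`, optionally `nnz`, or its own `nbytes` (NumPy arrays and `sparse.COO` both provide `nbytes`). The `SimpleNamespace` objects are stand-ins for real arrays.

```python
from types import SimpleNamespace


def estimate_nbytes(data) -> int:
    """Hypothetical helper (not part of xarray): bytes held by an array-like.

    Uses duck typing (hasattr) instead of an isinstance check on a sparse type.
    """
    # Prefer the array's own accounting if it provides one.
    if hasattr(data, "nbytes"):
        return data.nbytes
    itemsize = data.dtype.itemsize
    # Sparse-like objects expose ``nnz``: count only the stored values.
    # This ignores any index/coordinate storage, so it is a lower bound.
    if hasattr(data, "nnz"):
        return data.nnz * itemsize
    # Dense fallback: the same ``size * itemsize`` estimate discussed above.
    return data.size * itemsize


# Stand-in objects (assumptions for illustration, not real arrays).
dense = SimpleNamespace(size=1_000, dtype=SimpleNamespace(itemsize=8))
sparse_like = SimpleNamespace(size=1_000, nnz=10, dtype=SimpleNamespace(itemsize=8))
reports_own = SimpleNamespace(size=1_000, nbytes=123, dtype=SimpleNamespace(itemsize=8))

print(estimate_nbytes(dense))        # 8000
print(estimate_nbytes(sparse_like))  # 80
print(estimate_nbytes(reports_own))  # 123
```

Checking attributes with `hasattr` keeps such a helper free of a hard dependency on `sparse`. The `nnz * itemsize` branch counts only stored values, so it undercounts formats such as COO that also keep per-element coordinates; deferring to an array's own `nbytes` avoids that.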

Minimal Complete Verifiable Example:

```python
import pandas as pd
import numpy as np
import xarray as xr

# 100_000 rows: 10_000 (y, x) points, each observed at 10 weekly timestamps
df = pd.DataFrame()
df["x"] = np.repeat(np.random.rand(10_000), 10)
df["y"] = np.repeat(np.random.rand(10_000), 10)
df["time"] = np.tile(pd.date_range("2000-01-01", "2000-03-10", freq="W"), 10_000)
df["rate"] = 10.0
df = df.set_index(["time", "y", "x"])

# sparse=True requires the `sparse` package; the result is backed by sparse.COO
sparse_ds = xr.Dataset.from_dataframe(df, sparse=True)
print(sparse_ds["rate"].nbytes)
```

```python
8000000000
```

Anything else we need to know?:
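As a rough check, the reported figure matches the dense estimate for the example's dimensions, while the values actually stored are far fewer. This sketch assumes 10 weekly timestamps (the Sundays between 2000-01-01 and 2000-03-10), 10_000 distinct `y` and `x` values, and one stored float64 value per DataFrame row; it counts values only and leaves out the COO coordinate arrays.

```python
# Assumed dimensions of the "rate" DataArray built in the example above.
n_time, n_y, n_x = 10, 10_000, 10_000
itemsize = 8        # float64 "rate" values
n_rows = 100_000    # DataFrame rows = stored (non-fill) elements

# What nbytes reports today: every element of the dense shape.
dense_nbytes = n_time * n_y * n_x * itemsize
# Lower bound for the sparse data: stored values only, no coordinates.
values_nbytes = n_rows * itemsize

print(dense_nbytes)   # 8000000000
print(values_nbytes)  # 800000
```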

Environment:

Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.9 (default, Aug 31 2020, 17:10:11) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
libhdf5: 1.10.5
libnetcdf: 4.7.3

xarray: 0.16.1
pandas: 1.1.2
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.27.0
distributed: 2.30.1
matplotlib: 3.3.1
cartopy: None
seaborn: 0.11.0
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.3.3
conda: None
pytest: 6.1.0
IPython: 7.19.0
sphinx: 3.2.1
```
reactions: 0 · state_reason: completed · repo: 13221727 · type: issue
