issues

4 rows where repo = 13221727, type = "issue" and user = 49512274 sorted by updated_at descending

#5995 · High memory usage of xarray vs netCDF4 function · opened by ludwigVonKoopa · state: closed · 3 comments · created 2021-11-17 · updated 2023-09-12 · closed 2023-09-12 · author_association: NONE

Hi,

I would like to open a netCDF file, change some variable attributes, apply zlib compression, and sometimes change global attributes. I used to do this with netCDF4, and it worked.

Recently, I tried using xarray to perform the same job. The results are the same, but xarray always loads the entire file into memory instead of writing it variable by variable.

Minimal example below.

Creation of the example file:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset()
obs = 4835680
n = 20

basic_encoding = dict(zlib=True, shuffle=True, complevel=1)

# some variables with a scale factor
for i in range(3):
    vname = f"scale{i:02d}"
    ds[vname] = (["obs"], np.random.rand(obs).astype(np.float32) / 1e3)
    ds[vname].encoding.update(basic_encoding)
    ds[vname].encoding.update({"dtype": np.uint16, "scale_factor": 0.0001,
                               "add_offset": 0, "chunksizes": (1611894,)})

# some variables without a scale factor
for i in range(3):
    vname = f"float{i:02d}"
    ds[vname] = (["obs"], np.random.rand(obs).astype(np.float32))
    ds[vname].encoding.update(basic_encoding)
    ds[vname].encoding.update({"chunksizes": (967136,)})

# some variables with 2 dimensions, which use more memory
for i in range(3):
    vname = f"matrix{i:02d}"
    ds[vname] = (["obs", "n"], np.random.rand(obs, n).astype(np.float32) * 10)
    ds[vname].encoding.update(basic_encoding)
    ds[vname].encoding.update({"dtype": np.int16, "scale_factor": 0.01,
                               "add_offset": 0, "chunksizes": (20000, 20)})

ds.to_netcdf("/tmp/test_original.nc")
```

Here is my old function to copy/rewrite my netCDF file, and the new function (I deleted irrelevant changes in both functions to keep only the important parts):

```python
import netCDF4
import xarray as xr

def old_copy(f_in, f_out):
    with netCDF4.Dataset(f_out, 'w') as h_out:
        with netCDF4.Dataset(f_in, 'r') as h_in:
            for dimension, size in h_in.dimensions.items():
                h_out.createDimension(dimension, len(size))

            for varname, var_in in h_in.variables.items():
                var_out = h_out.createVariable(
                    varname, var_in.dtype, var_in.dimensions,
                    zlib=True, complevel=2
                )
                for key in var_in.ncattrs():
                    if key != '_FillValue':
                        setattr(var_out, key, getattr(var_in, key))
                var_in.set_auto_maskandscale(False)
                var_out.set_auto_maskandscale(False)
                var_out[:] = var_in[:]

            for attr in h_in.ncattrs():
                setattr(h_out, attr, getattr(h_in, attr))

def new_copy(f_in, f_out):
    with xr.open_dataset(f_in) as d_in:
        d_in.to_netcdf(f_out)
```

Here I compare both functions in terms of memory usage:

```python
import holoviews as hv
from dask.diagnostics import ResourceProfiler, visualize
hv.extension("bokeh")

F_IN = "/tmp/test_original.nc"
F_OUT = "/tmp/test.nc"

!rm -rfv {F_OUT}
with ResourceProfiler(dt=0.1) as rprof_old:
    old_copy(F_IN, F_OUT)
rprof_old.visualize()

!rm -rfv {F_OUT}
with ResourceProfiler(dt=0.1) as rprof_new:
    new_copy(F_IN, F_OUT)

visualize([rprof_old, rprof_new])
```

What happened:

xarray seems to load the entire file into memory before writing it out.

What you expected to happen:

How can I tell xarray to load and write variable by variable, without loading the entire file?
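Not part of the original report, but one likely answer, sketched under the assumption that dask is installed: opening with `chunks={}` backs each variable with a lazy dask array (using the engine's preferred chunking), so `to_netcdf` can write piece by piece instead of materializing everything:

```python
import xarray as xr

def chunked_copy(f_in, f_out):
    # chunks={} asks xarray to back each variable with a lazy dask
    # array (using the on-disk chunking), so to_netcdf streams the
    # data chunk by chunk instead of loading the whole file.
    with xr.open_dataset(f_in, chunks={}) as d_in:
        d_in.to_netcdf(f_out)
```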

Thank you.

Environment:

Output of <tt>xr.show_versions()</tt>:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.6 (default, Jul 30 2021, 16:35:19) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-142-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.19.0
pandas: 1.3.2
numpy: 1.20.3
scipy: 1.6.2
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.8.1
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.07.2
distributed: 2021.07.2
matplotlib: 3.4.3
cartopy: 0.19.0
seaborn: None
numbagg: None
pint: 0.17
setuptools: 52.0.0.post20210125
pip: 21.2.2
conda: 4.10.3
pytest: 6.2.5
IPython: 7.26.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5995/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Closed as completed.
#5001 · .min() doesn't work on np.datetime64 with a chunked Dataset · opened by ludwigVonKoopa · state: open · 2 comments · created 2021-03-05 · updated 2022-05-01 · author_association: NONE

Hi all,

If an xr.Dataset is chunked, I cannot do ds.time.min(); I get an error: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]'). I don't know if this is expected. Moreover, ds2.time.mean() works.

Thanks

What happened:

An UFuncTypeError was raised: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')

What you expected to happen:

The min & max should be computed on a chunked datetime64 xarray.DataArray.

Minimal Complete Verifiable Example:

```python
import numpy as np
import xarray as xr

obs = 200
t0 = np.datetime64("2010-01-01T00:00:00")
tn = t0 + np.timedelta64(123 * 4, "D")

ds2 = xr.Dataset(
    {"time": (["obs"], np.arange(t0, tn, (tn - t0) / obs))},
    coords={"obs": (["obs"], np.arange(obs))},
).chunk({"obs": 100})

ds2.time.min()
```

Anything else we need to know?:

ds2.time.mean() works; max & min raise an exception.
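A stopgap of my own, not from the thread: if the time variable itself fits in memory, computing it eagerly first sidesteps the failing dask reduction and lets plain numpy do the min/max:

```python
import numpy as np
import xarray as xr

obs = 200
t0 = np.datetime64("2010-01-01T00:00:00")
tn = t0 + np.timedelta64(123 * 4, "D")

ds2 = xr.Dataset(
    {"time": (["obs"], np.arange(t0, tn, (tn - t0) / obs))},
).chunk({"obs": 100})

# Load the (comparatively small) time variable into memory, then
# reduce with plain numpy instead of dask.
tmin = ds2.time.compute().min()
tmax = ds2.time.compute().max()
```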

Environment:

Output of <tt>xr.show_versions()</tt>:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-133-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.16.2
pandas: 1.2.1
numpy: 1.19.5
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.6.1
cftime: 1.3.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.01.1
distributed: 2021.01.1
matplotlib: 3.3.4
cartopy: None
seaborn: None
numbagg: None
pint: 0.16.1
setuptools: 52.0.0.post20210125
pip: 20.3.3
conda: None
pytest: 6.2.2
IPython: 7.20.0
sphinx: 3.5.0
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5001/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#4311 · uint32 variable in zarr, but float64 when loading with xarray · opened by ludwigVonKoopa · state: closed · 1 comment · created 2020-08-05 · closed 2021-04-19 · author_association: NONE

Hi all,

I started to play with xarray and zarr and came across something curious: I create a zarr store with a variable in uint32, but when I load this dataset with xarray, it loads as float64. I don't know if this is expected.

```python
import numpy as np
import zarr

fichier1 = "/tmp/test.zarr"

zh = zarr.open(fichier1, "w")

example = np.zeros(10, dtype=np.uint32)
myvar = zh.create_dataset("myvar", shape=example.shape, dtype=example.dtype)

# without this, the zarr dataset will not be readable by xarray
myvar.attrs["_ARRAY_DIMENSIONS"] = ["obs"]
myvar[:] = example

# dtype is uint32
zh.myvar.dtype
# dtype('uint32')
```

When reloading with zarr:

```python
# dtype is still uint32
zh = zarr.open(fichier1, 'r')
zh.myvar.dtype
# dtype('uint32')
```

But when loading with xarray:

```python
# dtype is float64
ds = xr.open_zarr(fichier1)
ds.myvar.dtype
# dtype('float64')
```

Is this expected? Am I missing something?

Link to the notebook created: bad_dtype_zarr_xarray

Environment:

Output of <tt>xr.show_versions()</tt>:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-106-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1
xarray: 0.15.1
pandas: 1.0.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.3.2
cftime: 1.1.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.13.0
distributed: 2.13.0
matplotlib: 3.1.3
cartopy: None
seaborn: None
numbagg: None
setuptools: 46.1.1.post20200323
pip: 20.0.2
conda: None
pytest: 5.4.1
IPython: 7.13.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4311/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Closed as completed.
#4763 · Keep attributes across operations · opened by ludwigVonKoopa · state: closed · 1 comment · created 2021-01-04 · closed 2021-01-04 · author_association: NONE

Hi,

I stumbled on issue #2582, about arithmetic operations not keeping the attributes of a DataArray.

Has this fix not been merged yet? I just installed a fresh conda env with python 3.8 & xarray 0.16.2 and the problem still persists:

```python
ds = xr.Dataset({"a": (("x",), np.array([1, 2, 3]))})
ds["a"].attrs["units"] = "m"
ds.a

# Out[1]:
# <xarray.DataArray 'a' (x: 3)>
# array([1, 2, 3])
# Dimensions without coordinates: x
# Attributes:
#     units: m
```

```python
ds["b"] = ds.a * 2
ds.b

# Out[2]:
# <xarray.DataArray 'b' (x: 3)>
# array([2, 4, 6])
# Dimensions without coordinates: x
```

Output of <tt>xr.show_versions()</tt>:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-128-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.16.2
pandas: 1.1.5
numpy: 1.19.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 51.0.0.post20201207
pip: 20.3.3
conda: None
pytest: None
IPython: 7.19.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4763/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Closed as completed.

```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
```