issues: 506205450

id: 506205450
node_id: MDU6SXNzdWU1MDYyMDU0NTA=
number: 3394
title: Update a small slice of a large netcdf file without overwriting the entire file.
user: 15239248
state: open
locked: 0
assignee:
milestone:
comments: 1
created_at: 2019-10-12T16:06:18Z
updated_at: 2021-07-04T03:32:00Z
closed_at:
author_association: NONE
active_lock_reason:
draft:
pull_request:
body:

MCVE Code Sample

```python
import numpy as np
import xarray as xr

orig = '/tmp/orig.h5'
ii = 100000

data = xr.Dataset(
    {'x': ('t', np.random.randn(ii)), 'y': ('t', np.random.randn(ii))},
    coords={'t': range(ii)},
)


# Function to save the large file using chunksizes
def save(ds, path, **kwargs):
    dvars = ds.variables
    chunksize = 100
    var_dic = {}
    for var in dvars:
        var_dic[var] = {'chunksizes': (chunksize,)}
    delayed = ds.to_netcdf(path, encoding=var_dic, **kwargs)


save(data, orig)
data.close()

# Open the file, using dask
data_1 = xr.open_mfdataset([orig], chunks={'t': 100})

# Change variable x
data_1['x'] = data_1['x'] + 20
data_1.close()

# Update only variable x. This works!
data_1['x'].to_netcdf(orig, mode='a')

# Try the same, but now update only a slice of the x variable.
# Open the file, using dask
data_1 = xr.open_mfdataset(orig, chunks={'t': 100})

# Change variable x
data_1['x'] = data_1['x'] + 20
data_1.close()

# Update only a slice of variable x. This doesn't work!
data_1['x'][{'t': slice(0, 10)}].to_netcdf(orig, mode='a')
```

Expected Output

Problem Description

Hi, I have a large dataset that does not fit in memory. Let's say I only want to update a small portion of it. Is there any way to update this small portion without having to rewrite the entire file?

I was fiddling around and found a way to update one variable at a time, but I want to be able to update only a subsection of that variable.

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:07:37) [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.2

xarray: 0.12.3
pandas: 0.25.1
numpy: 1.17.1
scipy: 1.3.1
netCDF4: 1.5.1.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.3.0
distributed: None
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: 4.7.11
pytest: 4.5.0
IPython: 7.8.0
sphinx: 2.2.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3394/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
