issues: 334633212


id: 334633212
node_id: MDU6SXNzdWUzMzQ2MzMyMTI=
number: 2242
title: to_netcdf(compute=False) can be slow
user: 1554921
state: closed
locked: 0
assignee:
milestone:
comments: 5
created_at: 2018-06-21T19:50:36Z
updated_at: 2019-01-13T21:13:28Z
closed_at: 2019-01-13T21:13:28Z
author_association: CONTRIBUTOR
active_lock_reason:
draft:
pull_request:
body:

Code Sample

```python
import xarray as xr
from dask.array import ones
import dask
from dask.diagnostics import ProgressBar
ProgressBar().register()

# Define a mock DataSet
dset = {}
for i in range(5):
    name = 'var'+str(i)
    data = i*ones((8,79,200,401),dtype='f4',chunks=(1,1,200,401))
    var = xr.DataArray(data=data, dims=('time','level','lat','lon'), name=name)
    dset[name] = var
dset = xr.Dataset(dset)

# Single thread to facilitate debugging.
# (may require dask < 0.18)
with dask.set_options(get=dask.get):

    # This works fine.
    print("Testing immediate netCDF4 writing")
    dset.to_netcdf("test1.nc")

    # This can be twice as slow as the version above.
    # Can be even slower (like 10x slower) on a shared filesystem.
    print("Testing delayed netCDF4 writing")
    dset.to_netcdf("test2.nc",compute=False).compute()
```

Problem description

Using the delayed version of to_netcdf can slow down writing the file. Running the script through cProfile, I see that _open_netcdf4_group is called many times, which suggests the file is opened and closed for each chunk written. In my scripts (which write to an NFS filesystem), writes can take 10 times longer than they should.

Is there a reason for the repeated open/close cycles (e.g. #1198?), or can this behaviour be fixed so the file stays open for the duration of the compute() call?
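
For anyone who wants to reproduce the call count rather than read it off the full cProfile report, here is a minimal sketch. The helper `count_calls` and the stand-in functions `_open_group` and `write_chunks` are hypothetical names made up for illustration; they are not part of xarray or dask, and the stand-in workload only mimics "one open per chunk".

```python
import cProfile
import pstats


def count_calls(func, name_fragment):
    """Run func() under cProfile and return the total number of calls to
    functions whose name contains name_fragment."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    stats = pstats.Stats(profiler)
    total = 0
    # stats.stats maps (filename, lineno, funcname) to
    # (primitive_calls, total_calls, tottime, cumtime, callers).
    for (_filename, _lineno, funcname), entry in stats.stats.items():
        if name_fragment in funcname:
            total += entry[1]
    return total


# Hypothetical stand-in for a writer that reopens the file for every chunk.
def _open_group():
    pass


def write_chunks(n_chunks=10):
    for _ in range(n_chunks):
        _open_group()


print(count_calls(write_chunks, "_open_group"))
```

With the code sample above, the same helper could presumably be pointed at the real workload, e.g. `count_calls(lambda: dset.to_netcdf("test2.nc", compute=False).compute(), "_open_netcdf4_group")`, and compared against the immediate `dset.to_netcdf("test1.nc")` call.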

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-135-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

xarray: 0.10.7
pandas: 0.23.0
numpy: 1.14.4
scipy: None
netCDF4: 1.4.0
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.17.5
distributed: None
matplotlib: 1.3.1
cartopy: None
seaborn: None
setuptools: 39.2.0
pip: None
conda: None
pytest: None
IPython: None
sphinx: None

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2242/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
performed_via_github_app:
state_reason: completed
repo: 13221727
type: issue
