issue_comments: 539907822

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/3386#issuecomment-539907822 https://api.github.com/repos/pydata/xarray/issues/3386 539907822 MDEyOklzc3VlQ29tbWVudDUzOTkwNzgyMg== 6213168 2019-10-09T08:58:21Z 2019-10-09T08:58:21Z MEMBER

@sipposip xarray doesn't use netCDF4.MFDataset but netCDF4.Dataset, which is wrapped in dask arrays that are then concatenated.

> Opening each file separately with open_dataset, and then concatenating them with xr.concat does not work, as this loads the data into memory.

This is by design, for the reason above. NetCDF/HDF5 lazy loading means that data is loaded into a numpy.ndarray on the first operation performed on it, and concatenation counts as such an operation.

I'm aware that threads within threads, threads within processes, and processes within threads cause a world of pain in the form of random deadlocks - I've been there myself. You can completely disable dask threads process-wide with

```python
dask.config.set(scheduler="synchronous")
...
ds.load()
```

or as a context manager

```python
with dask.config.set(scheduler="synchronous"):
    ds.load()
```

or for the single operation:

```python
ds.load(scheduler="synchronous")
```

Does this address your issue?
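One way to check that the synchronous scheduler really keeps work off dask's thread pool is to record which thread each task runs on. The following is only a minimal sketch: it uses `dask.delayed` instead of an xarray dataset, and the names `current_thread_name`, `tasks`, `names_single` and `names_ctx` are made up for this illustration.

```python
import threading

import dask


@dask.delayed
def current_thread_name(_):
    # Report the name of the thread that executes this task.
    return threading.current_thread().name


# Hypothetical workload: four independent tasks.
tasks = [current_thread_name(i) for i in range(4)]
caller = threading.current_thread().name

# Single operation: pass the scheduler to the compute call itself.
names_single = dask.compute(*tasks, scheduler="synchronous")

# Context manager: the setting applies only inside the block.
with dask.config.set(scheduler="synchronous"):
    names_ctx = dask.compute(*tasks)

# With the synchronous scheduler every task runs in the calling thread.
print(set(names_single) == {caller}, set(names_ctx) == {caller})
```

If both checks hold, no dask worker threads were involved, which is the property that matters when the computation is already running inside another thread or process pool.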

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  504497403