
issue_comments: 707331260


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/4482#issuecomment-707331260 https://api.github.com/repos/pydata/xarray/issues/4482 707331260 MDEyOklzc3VlQ29tbWVudDcwNzMzMTI2MA== 2560426 2020-10-12T20:31:26Z 2020-10-12T21:05:24Z NONE

See below. I temporarily write some files to netCDF, then recombine them lazily using open_mfdataset.

The issue seems to show up more consistently when my x is a constructed rolling window, and especially when it's a rolling window over a stacked dimension, as in the example below.

I used the memory_profiler package and associated notebook extension (%%memit cell magic) to do memory profiling.

```
import os

import numpy as np
import xarray as xr

N = 1000
N_per_file = 10
M = 100
K = 10
window_size = 150

tmp_dir = 'tmp'

os.mkdir(tmp_dir)

# save many netcdf files, later to be concatted into a dask.delayed dataset
for i in range(0, N, N_per_file):

    # 3 dimensions:
    # d1 is the dim we're splitting our files/chunking along
    # d2 is a common dim among all files/chunks
    # d3 is a common dim among all files/chunks, where the first half is 0 and the second half is nan
    x_i = xr.DataArray([[[0]*(K//2) + [np.nan]*(K//2)]*M]*N_per_file,
        [('d1', [x for x in range(i, i+N_per_file)]),
         ('d2', [x for x in range(M)]),
         ('d3', [x for x in range(K)])])

    x_i.to_dataset(name='vals').to_netcdf('{}/file_{}.nc'.format(tmp_dir, i))

# open lazily
# (newer xarray versions require combine='nested' when concat_dim is given)
x = xr.open_mfdataset('{}/*.nc'.format(tmp_dir), parallel=True,
                      combine='nested', concat_dim='d1').vals

# a rolling window along a stacked dimension
x_windows = x.stack(d13=['d1', 'd3']).rolling(d13=window_size).construct('window')

# we'll dot x_windows with y along the window dimension
y = xr.DataArray([1]*window_size, dims='window')

# incremental memory: 1.94 MiB
x_windows.dot(y).compute()

# incremental memory: 20.00 MiB
x_windows.notnull().dot(y).compute()

# incremental memory: 182.13 MiB
x_windows.fillna(0.).dot(y).compute()

# incremental memory: 211.52 MiB
x_windows.weighted(y).mean('window', skipna=True).compute()
```
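
For anyone who wants to reproduce the measurements outside a notebook, here is a rough sketch of a stand-in for `%%memit` using only the standard library's `tracemalloc`. This is not what produced the numbers above, and the helper name `incremental_peak_mib` is made up for illustration. It only counts allocations made through Python's allocators, whereas `memory_profiler` looks at the memory of the whole process, so the figures should not be expected to match exactly.

```python
import tracemalloc


def incremental_peak_mib(func, *args, **kwargs):
    """Run func and return (result, peak MiB allocated during the call).

    Hypothetical helper: an approximation of an "incremental memory" figure,
    limited to allocations that tracemalloc can see.
    """
    tracemalloc.start()
    try:
        # Memory already traced right after start(); normally close to zero.
        baseline, _ = tracemalloc.get_traced_memory()
        result = func(*args, **kwargs)
        # Peak traced memory reached at any point since start().
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, (peak - baseline) / 2**20


# Sanity check with a known allocation of 8 MiB.
buf, mib = incremental_peak_mib(lambda: bytearray(8 * 2**20))
print('incremental memory: {:.2f} MiB'.format(mib))
```

With the example above, each measurement would be something like `incremental_peak_mib(lambda: x_windows.fillna(0.).dot(y).compute())`, assuming `x_windows` and `y` are defined as in the script.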

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  713834297