html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1385#issuecomment-464100720,https://api.github.com/repos/pydata/xarray/issues/1385,464100720,MDEyOklzc3VlQ29tbWVudDQ2NDEwMDcyMA==,30007270,2019-02-15T15:57:01Z,2019-02-15T18:33:31Z,NONE,"In that case, the speedup disappears. It seems the slowdown arises because the entire time array is loaded into memory at once.

EDIT: I subsequently realized that using drop_variables='time' caused all the data values to become NaN, so that is not a valid option.

```
%prun ds = xr.open_mfdataset(fname, decode_times=False)

         8025 function calls (7856 primitive calls) in 29.662 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4   29.608    7.402   29.608    7.402 {built-in method _operator.getitem}
        1    0.032    0.032    0.032    0.032 netCDF4_.py:244(_open_netcdf4_group)
        1    0.015    0.015    0.015    0.015 {built-in method posix.lstat}
  126/114    0.000    0.000    0.001    0.000 indexing.py:504(shape)
     1196    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
       81    0.000    0.000    0.001    0.000 variable.py:239(__init__)
```

See the rest of the prun output under the Details for more information:
30 0.000 0.000 0.000 0.000 {method 'getncattr' of 'netCDF4._netCDF4.Variable' objects} 81 0.000 0.000 0.000 0.000 variable.py:709(attrs) 736/672 0.000 0.000 0.000 0.000 {built-in method builtins.len} 157 0.000 0.000 0.001 0.000 utils.py:450(ndim) 81 0.000 0.000 0.001 0.000 variable.py:417(_parse_dimensions) 7 0.000 0.000 0.001 0.000 netCDF4_.py:361(open_store_variable) 4 0.000 0.000 0.000 0.000 base.py:253(__new__) 1 0.000 0.000 29.662 29.662 :1() 7 0.000 0.000 0.001 0.000 conventions.py:245(decode_cf_variable) 39/19 0.000 0.000 29.609 1.558 {built-in method numpy.core.multiarray.array} 9 0.000 0.000 0.000 0.000 core.py:1776(normalize_chunks) 104 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr} 143 0.000 0.000 0.001 0.000 variable.py:272(shape) 4 0.000 0.000 0.000 0.000 utils.py:88(_StartCountStride) 8 0.000 0.000 0.000 0.000 core.py:747(blockdims_from_blockshape) 23 0.000 0.000 0.032 0.001 file_manager.py:150(acquire) 8 0.000 0.000 0.000 0.000 base.py:590(tokenize) 84 0.000 0.000 0.000 0.000 variable.py:137(as_compatible_data) 268 0.000 0.000 0.000 0.000 {method 'indices' of 'slice' objects} 14 0.000 0.000 29.610 2.115 variable.py:41(as_variable) 35 0.000 0.000 0.000 0.000 variables.py:102(unpack_for_decoding) 81 0.000 0.000 0.000 0.000 variable.py:721(encoding) 192 0.000 0.000 0.000 0.000 {built-in method builtins.getattr} 2 0.000 0.000 0.000 0.000 merge.py:109(merge_variables) 2 0.000 0.000 29.610 14.805 merge.py:392(merge_core) 7 0.000 0.000 0.000 0.000 variables.py:161() 103 0.000 0.000 0.000 0.000 {built-in method _abc._abc_instancecheck} 1 0.000 0.000 0.001 0.001 conventions.py:351(decode_cf_variables) 3 0.000 0.000 0.000 0.000 dataset.py:90(calculate_dimensions) 1 0.000 0.000 0.000 0.000 {built-in method posix.stat} 361 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects} 20 0.000 0.000 0.000 0.000 variable.py:728(copy) 23 0.000 0.000 0.000 0.000 lru_cache.py:40(__getitem__) 12 0.000 0.000 0.000 0.000 base.py:504(_simple_new) 2 0.000 0.000 
0.000 0.000 variable.py:1985(assert_unique_multiindex_level_names) 2 0.000 0.000 0.000 0.000 alignment.py:172(deep_align) 14 0.000 0.000 0.000 0.000 indexing.py:469(__init__) 16 0.000 0.000 29.609 1.851 variable.py:1710(__init__) 1 0.000 0.000 29.662 29.662 {built-in method builtins.exec} 25 0.000 0.000 0.000 0.000 contextlib.py:81(__init__) 7 0.000 0.000 0.000 0.000 {method 'getncattr' of 'netCDF4._netCDF4.Dataset' objects} 24 0.000 0.000 0.000 0.000 indexing.py:331(as_integer_slice) 50/46 0.000 0.000 0.000 0.000 common.py:181(__setattr__) 7 0.000 0.000 0.000 0.000 variables.py:155(decode) 4 0.000 0.000 29.609 7.402 indexing.py:760(explicit_indexing_adapter) 48 0.000 0.000 0.000 0.000 :416(parent) 103 0.000 0.000 0.000 0.000 abc.py:137(__instancecheck__) 48 0.000 0.000 0.000 0.000 _collections_abc.py:742(__iter__) 180 0.000 0.000 0.000 0.000 variable.py:411(dims) 4 0.000 0.000 0.000 0.000 locks.py:158(__exit__) 3 0.000 0.000 0.001 0.000 core.py:2048(from_array) 1 0.000 0.000 29.612 29.612 conventions.py:412(decode_cf) 4 0.000 0.000 0.000 0.000 utils.py:50(_maybe_cast_to_cftimeindex) 77/59 0.000 0.000 0.000 0.000 utils.py:473(dtype) 84 0.000 0.000 0.000 0.000 generic.py:7(_check) 146 0.000 0.000 0.000 0.000 indexing.py:319(tuple) 7 0.000 0.000 0.000 0.000 netCDF4_.py:34(__init__) 1 0.000 0.000 29.614 29.614 api.py:270(maybe_decode_store) 1 0.000 0.000 29.662 29.662 api.py:487(open_mfdataset) 20 0.000 0.000 0.000 0.000 common.py:1845(_is_dtype_type) 33 0.000 0.000 0.000 0.000 core.py:1911() 84 0.000 0.000 0.000 0.000 variable.py:117(_maybe_wrap_data) 3 0.000 0.000 0.001 0.000 variable.py:830(chunk) 25 0.000 0.000 0.000 0.000 contextlib.py:237(helper) 36/25 0.000 0.000 0.000 0.000 utils.py:477(shape) 8 0.000 0.000 0.000 0.000 base.py:566(_shallow_copy) 8 0.000 0.000 0.000 0.000 indexing.py:346(__init__) 26/25 0.000 0.000 0.000 0.000 utils.py:408(__call__) 4 0.000 0.000 0.000 0.000 indexing.py:886(_decompose_outer_indexer) 2 0.000 0.000 29.610 14.805 
merge.py:172(expand_variable_dicts) 4 0.000 0.000 29.608 7.402 netCDF4_.py:67(_getitem) 2 0.000 0.000 0.000 0.000 dataset.py:722(copy) 7 0.000 0.000 0.001 0.000 dataset.py:1383(maybe_chunk) 16 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.empty} 14 0.000 0.000 0.000 0.000 fromnumeric.py:1471(ravel) 60 0.000 0.000 0.000 0.000 base.py:652(__len__) 3 0.000 0.000 0.000 0.000 core.py:141(getem) 25 0.000 0.000 0.000 0.000 contextlib.py:116(__exit__) 4 0.000 0.000 29.609 7.402 utils.py:62(safe_cast_to_index) 18 0.000 0.000 0.000 0.000 core.py:891(shape) 25 0.000 0.000 0.000 0.000 contextlib.py:107(__enter__) 4 0.000 0.000 0.001 0.000 utils.py:332(FrozenOrderedDict) 8 0.000 0.000 0.000 0.000 base.py:1271(set_names) 4 0.000 0.000 0.000 0.000 numeric.py:34(__new__) 24 0.000 0.000 0.000 0.000 inference.py:253(is_list_like) 3 0.000 0.000 0.000 0.000 core.py:820(__new__) 12 0.000 0.000 0.000 0.000 variable.py:1785(copy) 36 0.000 0.000 0.000 0.000 {method 'copy' of 'collections.OrderedDict' objects} 8/7 0.000 0.000 0.000 0.000 {built-in method builtins.sorted} 2 0.000 0.000 0.000 0.000 merge.py:220(determine_coords) 46 0.000 0.000 0.000 0.000 file_manager.py:141(_optional_lock) 60 0.000 0.000 0.000 0.000 indexing.py:1252(shape) 50 0.000 0.000 0.000 0.000 {built-in method builtins.next} 59 0.000 0.000 0.000 0.000 {built-in method builtins.iter} 54 0.000 0.000 0.000 0.000 :1009(_handle_fromlist) 1 0.000 0.000 0.000 0.000 api.py:146(_protect_dataset_variables_inplace) 1 0.000 0.000 29.646 29.646 api.py:162(open_dataset) 4 0.000 0.000 0.000 0.000 utils.py:424(_out_array_shape) 4 0.000 0.000 29.609 7.402 indexing.py:1224(__init__) 24 0.000 0.000 0.000 0.000 function_base.py:241(iterable) 4 0.000 0.000 0.000 0.000 dtypes.py:968(is_dtype) 2 0.000 0.000 0.000 0.000 merge.py:257(coerce_pandas_values) 14 0.000 0.000 0.000 0.000 missing.py:105(_isna_new) 8 0.000 0.000 0.000 0.000 variable.py:1840(to_index) 7 0.000 0.000 0.000 0.000 {method 'search' of 're.Pattern' objects} 
48 0.000 0.000 0.000 0.000 {method 'rpartition' of 'str' objects} 7 0.000 0.000 0.000 0.000 strings.py:66(decode) 7 0.000 0.000 0.000 0.000 netCDF4_.py:257(_disable_auto_decode_variable) 14 0.000 0.000 0.000 0.000 numerictypes.py:619(issubclass_) 24/4 0.000 0.000 29.609 7.402 numeric.py:433(asarray) 7 0.000 0.000 0.000 0.000 {method 'ncattrs' of 'netCDF4._netCDF4.Variable' objects} 8 0.000 0.000 0.000 0.000 numeric.py:67(_shallow_copy) 8 0.000 0.000 0.000 0.000 indexing.py:373(__init__) 3 0.000 0.000 0.000 0.000 core.py:134() 14 0.000 0.000 0.000 0.000 merge.py:154() 16 0.000 0.000 0.000 0.000 dataset.py:816() 11 0.000 0.000 0.000 0.000 netCDF4_.py:56(get_array) 40 0.000 0.000 0.000 0.000 utils.py:40(_find_dim) 22 0.000 0.000 0.000 0.000 core.py:1893() 27 0.000 0.000 0.000 0.000 {built-in method builtins.all} 26/10 0.000 0.000 0.000 0.000 {built-in method builtins.sum} 2 0.000 0.000 0.000 0.000 dataset.py:424(attrs) 7 0.000 0.000 0.000 0.000 variables.py:231(decode) 1 0.000 0.000 0.000 0.000 file_manager.py:66(__init__) 67 0.000 0.000 0.000 0.000 utils.py:316(__getitem__) 22 0.000 0.000 0.000 0.000 {method 'move_to_end' of 'collections.OrderedDict' objects} 53 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass} 1 0.000 0.000 0.000 0.000 combine.py:374(_infer_concat_order_from_positions) 7 0.000 0.000 0.000 0.000 dataset.py:1378(selkeys) 1 0.000 0.000 0.001 0.001 dataset.py:1333(chunk) 4 0.000 0.000 29.609 7.402 netCDF4_.py:62(__getitem__) 37 0.000 0.000 0.000 0.000 netCDF4_.py:365() 18 0.000 0.000 0.000 0.000 {method 'ravel' of 'numpy.ndarray' objects} 2 0.000 0.000 0.000 0.000 alignment.py:37(align) 14 0.000 0.000 0.000 0.000 {pandas._libs.lib.is_scalar} 8 0.000 0.000 0.000 0.000 base.py:1239(_set_names) 16 0.000 0.000 0.000 0.000 indexing.py:314(__init__) 3 0.000 0.000 0.000 0.000 config.py:414(get) 7 0.000 0.000 0.000 0.000 dtypes.py:68(maybe_promote) 8 0.000 0.000 0.000 0.000 variable.py:1856(level_names) 37 0.000 0.000 0.000 0.000 {method 'copy' of 
'dict' objects} 6 0.000 0.000 0.000 0.000 re.py:180(search) 6 0.000 0.000 0.000 0.000 re.py:271(_compile) 8 0.000 0.000 0.000 0.000 {built-in method _hashlib.openssl_md5} 1 0.000 0.000 0.000 0.000 merge.py:463(merge) 7 0.000 0.000 0.000 0.000 variables.py:158() 7 0.000 0.000 0.000 0.000 numerictypes.py:687(issubdtype) 6 0.000 0.000 0.000 0.000 utils.py:510(is_remote_uri) 8 0.000 0.000 0.000 0.000 common.py:1702(is_extension_array_dtype) 25 0.000 0.000 0.000 0.000 indexing.py:645(as_indexable) 21 0.000 0.000 0.000 0.000 {method 'pop' of 'collections.OrderedDict' objects} 19 0.000 0.000 0.000 0.000 {built-in method __new__ of type object at 0x2b324a13e3c0} 1 0.000 0.000 0.001 0.001 dataset.py:1394() 21 0.000 0.000 0.000 0.000 variables.py:117(pop_to) 1 0.000 0.000 0.032 0.032 netCDF4_.py:320(open) 8 0.000 0.000 0.000 0.000 netCDF4_.py:399() 12 0.000 0.000 0.000 0.000 __init__.py:221(iteritems) 4 0.000 0.000 0.000 0.000 common.py:403(is_datetime64_dtype) 8 0.000 0.000 0.000 0.000 common.py:1809(_get_dtype) 8 0.000 0.000 0.000 0.000 dtypes.py:68(find) 8 0.000 0.000 0.000 0.000 base.py:3607(values) 22 0.000 0.000 0.000 0.000 pycompat.py:32(move_to_end) 8 0.000 0.000 0.000 0.000 utils.py:792(__exit__) 3 0.000 0.000 0.000 0.000 highlevelgraph.py:84(from_collections) 22 0.000 0.000 0.000 0.000 core.py:1906() 16 0.000 0.000 0.000 0.000 abc.py:141(__subclasscheck__) 1 0.000 0.000 0.000 0.000 posixpath.py:104(split) 1 0.000 0.000 0.001 0.001 combine.py:479(_auto_combine_all_along_first_dim) 1 0.000 0.000 29.610 29.610 dataset.py:321(__init__) 4 0.000 0.000 0.000 0.000 dataset.py:643(_construct_direct) 7 0.000 0.000 0.000 0.000 variables.py:266(decode) 1 0.000 0.000 0.032 0.032 netCDF4_.py:306(__init__) 14 0.000 0.000 0.000 0.000 numeric.py:504(asanyarray) 4 0.000 0.000 0.000 0.000 common.py:503(is_period_dtype) 8 0.000 0.000 0.000 0.000 common.py:1981(pandas_dtype) 12 0.000 0.000 0.000 0.000 base.py:633(_reset_identity) 11 0.000 0.000 0.000 0.000 pycompat.py:18(iteritems) 16 
0.000 0.000 0.000 0.000 utils.py:279(is_integer) 14 0.000 0.000 0.000 0.000 variable.py:268(dtype) 4 0.000 0.000 0.000 0.000 indexing.py:698(_outer_to_numpy_indexer) 42 0.000 0.000 0.000 0.000 variable.py:701(attrs) 9 0.000 0.000 0.000 0.000 {built-in method builtins.any} 1 0.000 0.000 0.000 0.000 posixpath.py:338(normpath) 6 0.000 0.000 0.000 0.000 _collections_abc.py:676(items) 24 0.000 0.000 0.000 0.000 {built-in method math.isnan} 1 0.000 0.000 29.610 29.610 merge.py:360(merge_data_and_coords) 1 0.000 0.000 0.000 0.000 dataset.py:1084(set_coords) 1 0.000 0.000 0.001 0.001 common.py:99(load) 1 0.000 0.000 0.000 0.000 file_manager.py:250(decrement) 4 0.000 0.000 0.000 0.000 locks.py:154(__enter__) 7 0.000 0.000 0.000 0.000 netCDF4_.py:160(_ensure_fill_value_valid) 8 0.000 0.000 0.001 0.000 netCDF4_.py:393() 8 0.000 0.000 0.000 0.000 common.py:572(is_categorical_dtype) 16 0.000 0.000 0.000 0.000 base.py:75(is_dtype) 72 0.000 0.000 0.000 0.000 indexing.py:327(as_integer_or_none) 26 0.000 0.000 0.000 0.000 utils.py:382(dispatch) 3 0.000 0.000 0.000 0.000 core.py:123(slices_from_chunks) 16 0.000 0.000 0.000 0.000 core.py:768() 4 0.000 0.000 29.609 7.402 indexing.py:514(__array__) 4 0.000 0.000 0.000 0.000 indexing.py:1146(__init__) 4 0.000 0.000 0.000 0.000 indexing.py:1153(_indexing_array_and_key) 4 0.000 0.000 29.609 7.402 variable.py:400(to_index_variable) 30 0.000 0.000 0.000 0.000 {method 'items' of 'collections.OrderedDict' objects} 16 0.000 0.000 0.000 0.000 {built-in method _abc._abc_subclasscheck} 19 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects} 1 0.000 0.000 0.000 0.000 combine.py:423(_check_shape_tile_ids) 4 0.000 0.000 0.000 0.000 merge.py:91(_assert_compat_valid) 12 0.000 0.000 0.000 0.000 dataset.py:263() 1 0.000 0.000 29.610 29.610 dataset.py:372(_set_init_vars_and_dims) 3 0.000 0.000 0.000 0.000 dataset.py:413(_attrs_copy) 8 0.000 0.000 0.000 0.000 common.py:120() 14 0.000 0.000 0.000 0.000 {built-in method pandas._libs.missing.checknull} 
4 0.000 0.000 0.000 0.000 common.py:746(is_dtype_equal) 4 0.000 0.000 0.000 0.000 common.py:923(is_signed_integer_dtype) 4 0.000 0.000 0.000 0.000 common.py:1545(is_float_dtype) 14 0.000 0.000 0.000 0.000 missing.py:25(isna) 3 0.000 0.000 0.000 0.000 highlevelgraph.py:71(__init__) 3 0.000 0.000 0.000 0.000 core.py:137() 33 0.000 0.000 0.000 0.000 core.py:1883() 35 0.000 0.000 0.000 0.000 variable.py:713(encoding) 2 0.000 0.000 0.000 0.000 {built-in method builtins.min} 16 0.000 0.000 0.000 0.000 _collections_abc.py:719(__iter__) 8 0.000 0.000 0.000 0.000 _collections_abc.py:760(__iter__) 1 0.000 0.000 0.015 0.015 glob.py:9(glob) 2 0.000 0.000 0.015 0.008 glob.py:39(_iglob) 8 0.000 0.000 0.000 0.000 {method 'hexdigest' of '_hashlib.HASH' objects} 1 0.000 0.000 0.000 0.000 combine.py:500(_auto_combine_1d) 14 0.000 0.000 0.000 0.000 merge.py:104(__missing__) 1 0.000 0.000 0.000 0.000 coordinates.py:167(variables) 3 0.000 0.000 0.000 0.000 dataset.py:98() 4 0.000 0.000 0.000 0.000 dataset.py:402(variables) 1 0.000 0.000 0.000 0.000 netCDF4_.py:269(_disable_auto_decode_group) 12 0.000 0.000 0.032 0.003 netCDF4_.py:357(ds) 1 0.000 0.000 29.646 29.646 api.py:637() 9 0.000 0.000 0.000 0.000 utils.py:313(__init__) 7 0.000 0.000 0.000 0.000 {method 'filters' of 'netCDF4._netCDF4.Variable' objects} 12 0.000 0.000 0.000 0.000 common.py:117(classes) 8 0.000 0.000 0.000 0.000 common.py:536(is_interval_dtype) 4 0.000 0.000 0.000 0.000 common.py:1078(is_datetime64_any_dtype) 4 0.000 0.000 0.000 0.000 dtypes.py:827(is_dtype) 8 0.000 0.000 0.000 0.000 base.py:551() 8 0.000 0.000 0.000 0.000 base.py:547(_get_attributes_dict) 8 0.000 0.000 0.000 0.000 utils.py:789(__enter__) 18 0.000 0.000 0.000 0.000 core.py:903(_get_chunks) 33 0.000 0.000 0.000 0.000 core.py:1885() 22 0.000 0.000 0.000 0.000 core.py:1889() 4 0.000 0.000 0.000 0.000 indexing.py:799(_decompose_slice) 4 0.000 0.000 0.000 0.000 indexing.py:1174(__getitem__) 3 0.000 0.000 0.000 0.000 variable.py:294(data) 8 0.000 0.000 
0.000 0.000 {method '__enter__' of '_thread.lock' objects} 9 0.000 0.000 0.000 0.000 {built-in method builtins.hash} 4 0.000 0.000 0.000 0.000 {built-in method builtins.max} 4 0.000 0.000 0.000 0.000 {method 'update' of 'set' objects} 7 0.000 0.000 0.000 0.000 {method 'values' of 'dict' objects} 8 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects} 1 0.000 0.000 0.000 0.000 posixpath.py:376(abspath) 1 0.000 0.000 0.000 0.000 genericpath.py:53(getmtime) 4 0.000 0.000 0.000 0.000 _collections_abc.py:657(get) 1 0.000 0.000 0.000 0.000 __init__.py:548(__init__) 1 0.000 0.000 0.000 0.000 __init__.py:617(update) 4/2 0.000 0.000 0.000 0.000 combine.py:392(_infer_tile_ids_from_nested_list) 1 0.000 0.000 0.001 0.001 combine.py:522(_auto_combine) 2 0.000 0.000 0.000 0.000 merge.py:100(__init__) 5 0.000 0.000 0.000 0.000 coordinates.py:38(__iter__) 5 0.000 0.000 0.000 0.000 coordinates.py:169() 1 0.000 0.000 0.000 0.000 dataset.py:666(_replace_vars_and_dims) 5 0.000 0.000 0.000 0.000 dataset.py:1078(data_vars) 1 0.000 0.000 0.000 0.000 file_manager.py:133(_make_key) 1 0.000 0.000 0.000 0.000 file_manager.py:245(increment) 1 0.000 0.000 0.000 0.000 lru_cache.py:54(__setitem__) 1 0.000 0.000 0.000 0.000 netCDF4_.py:398(get_attrs) 1 0.000 0.000 0.000 0.000 api.py:80(_get_default_engine) 1 0.000 0.000 0.000 0.000 api.py:92(_normalize_path) 8 0.000 0.000 0.000 0.000 {method 'view' of 'numpy.ndarray' objects} 8 0.000 0.000 0.000 0.000 utils.py:187(is_dict_like) 4 0.000 0.000 0.000 0.000 utils.py:219(is_valid_numpy_dtype) 10 0.000 0.000 0.000 0.000 utils.py:319(__iter__) 1 0.000 0.000 0.000 0.000 {method 'filepath' of 'netCDF4._netCDF4.Dataset' objects} 4 0.000 0.000 0.000 0.000 common.py:434(is_datetime64tz_dtype) 3 0.000 0.000 0.000 0.000 config.py:107(normalize_key) 3 0.000 0.000 0.000 0.000 core.py:160() 6 0.000 0.000 0.000 0.000 core.py:966(ndim) 4 0.000 0.000 0.000 0.000 indexing.py:791(decompose_indexer) 8 0.000 0.000 0.000 0.000 {method '__exit__' of '_thread.lock' 
objects} 3 0.000 0.000 0.000 0.000 {method 'replace' of 'str' objects} 4 0.000 0.000 0.000 0.000 {method 'split' of 'str' objects} 1 0.000 0.000 0.000 0.000 posixpath.py:121(splitext) 1 0.000 0.000 0.000 0.000 genericpath.py:117(_splitext) 1 0.000 0.000 0.001 0.001 combine.py:443(_combine_nd) 1 0.000 0.000 0.000 0.000 combine.py:508() 14 0.000 0.000 0.000 0.000 merge.py:41(unique_variable) 11 0.000 0.000 0.000 0.000 coordinates.py:163(_names) 1 0.000 0.000 0.000 0.000 dataset.py:2593(_assert_all_in_dataset) 1 0.000 0.000 0.000 0.000 variables.py:55(__init__) 1 0.000 0.000 0.000 0.000 file_manager.py:269(__init__) 29 0.000 0.000 0.000 0.000 file_manager.py:273(__hash__) 1 0.000 0.000 0.001 0.001 netCDF4_.py:392(get_variables) 1 0.000 0.000 0.000 0.000 netCDF4_.py:410() 7 0.000 0.000 0.000 0.000 {method 'set_auto_chartostring' of 'netCDF4._netCDF4.Variable' objects} 1 0.000 0.000 0.000 0.000 {method 'ncattrs' of 'netCDF4._netCDF4.Dataset' objects} 4 0.000 0.000 0.000 0.000 common.py:472(is_timedelta64_dtype) 4 0.000 0.000 0.000 0.000 common.py:980(is_unsigned_integer_dtype) 4 0.000 0.000 0.000 0.000 base.py:3805(_coerce_to_ndarray) 3 0.000 0.000 0.000 0.000 itertoolz.py:241(unique) 11 0.000 0.000 0.000 0.000 core.py:137() 3 0.000 0.000 0.000 0.000 indexing.py:600(__init__) 2 0.000 0.000 0.000 0.000 {method 'keys' of 'collections.OrderedDict' objects} 2 0.000 0.000 0.000 0.000 {built-in method _thread.allocate_lock} 1 0.000 0.000 0.000 0.000 {built-in method _collections._count_elements} 8 0.000 0.000 0.000 0.000 {method 'encode' of 'str' objects} 3 0.000 0.000 0.000 0.000 {method 'rfind' of 'str' objects} 8 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects} 3 0.000 0.000 0.000 0.000 {method 'intersection' of 'set' objects} 7 0.000 0.000 0.000 0.000 {method 'setdefault' of 'dict' objects} 13 0.000 0.000 0.000 0.000 {method 'pop' of 'dict' objects} 1 0.000 0.000 0.000 0.000 posixpath.py:64(isabs) 1 0.000 0.000 0.015 0.015 posixpath.py:178(lexists) 1 0.000 0.000 
0.000 0.000 posixpath.py:232(expanduser) 2 0.000 0.000 0.000 0.000 _collections_abc.py:672(keys) 7 0.000 0.000 0.000 0.000 contextlib.py:352(__init__) 7 0.000 0.000 0.000 0.000 contextlib.py:355(__enter__) 2 0.000 0.000 0.000 0.000 combine.py:496(vars_as_keys) 2 0.000 0.000 0.000 0.000 combine.py:517(_new_tile_id) 7 0.000 0.000 0.000 0.000 common.py:29(_decode_variable_name) 1 0.000 0.000 0.000 0.000 coordinates.py:160(__init__) 3 0.000 0.000 0.000 0.000 dataset.py:262(__iter__) 2 0.000 0.000 0.000 0.000 dataset.py:266(__len__) 2 0.000 0.000 0.000 0.000 dataset.py:940(__iter__) 1 0.000 0.000 0.000 0.000 dataset.py:1071(coords) 7 0.000 0.000 0.000 0.000 dataset.py:1381() 4 0.000 0.000 0.000 0.000 variables.py:61(dtype) 1 0.000 0.000 0.000 0.000 file_manager.py:189(__del__) 1 0.000 0.000 0.000 0.000 lru_cache.py:47(_enforce_size_limit) 1 0.000 0.000 0.000 0.000 netCDF4_.py:138(_nc4_require_group) 1 0.000 0.000 0.000 0.000 netCDF4_.py:408(get_encoding) 1 0.000 0.000 0.000 0.000 api.py:66(_get_default_engine_netcdf) 4 0.000 0.000 0.000 0.000 utils.py:197() 1 0.000 0.000 0.000 0.000 alignment.py:17(_get_joiner) 10 0.000 0.000 0.000 0.000 alignment.py:184(is_alignable) 5 0.000 0.000 0.000 0.000 alignment.py:226() 5 0.000 0.000 0.000 0.000 utils.py:325(__contains__) 5 0.000 0.000 0.000 0.000 {method 'isunlimited' of 'netCDF4._netCDF4.Dimension' objects} 8 0.000 0.000 0.000 0.000 inference.py:435(is_hashable) 12 0.000 0.000 0.000 0.000 common.py:119() 8 0.000 0.000 0.000 0.000 common.py:127() 8 0.000 0.000 0.000 0.000 common.py:122(classes_and_not_datetimelike) 4 0.000 0.000 0.000 0.000 base.py:675(dtype) 8 0.000 0.000 0.000 0.000 base.py:1395(nlevels) 24 0.000 0.000 0.000 0.000 functoolz.py:15(identity) 1 0.000 0.000 0.000 0.000 base.py:610(normalize_dict) 1 0.000 0.000 0.000 0.000 base.py:625(normalize_seq) 3 0.000 0.000 0.000 0.000 indexing.py:453(__init__) 4 0.000 0.000 0.000 0.000 indexing.py:713() 3 0.000 0.000 0.000 0.000 variable.py:821(chunks) 4 0.000 0.000 0.000 
0.000 variable.py:1731(chunk) 8 0.000 0.000 0.000 0.000 variable.py:1874(name) 3 0.000 0.000 0.000 0.000 {method 'values' of 'collections.OrderedDict' objects} 6 0.000 0.000 0.000 0.000 {built-in method posix.fspath} 1 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects} 4 0.000 0.000 0.000 0.000 {method 'startswith' of 'str' objects} 3 0.000 0.000 0.000 0.000 {method 'copy' of 'set' objects} 1 0.000 0.000 0.000 0.000 {method 'union' of 'set' objects} 1 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects} 2 0.000 0.000 0.000 0.000 posixpath.py:41(_get_sep) 1 0.000 0.000 0.000 0.000 _collections_abc.py:680(values) 9 0.000 0.000 0.000 0.000 _collections_abc.py:698(__init__) 7 0.000 0.000 0.000 0.000 contextlib.py:358(__exit__) 1 0.000 0.000 0.000 0.000 glob.py:145(has_magic) 1 0.000 0.000 0.000 0.000 combine.py:428() 2 0.000 0.000 0.000 0.000 merge.py:301(_get_priority_vars) 1 0.000 0.000 0.000 0.000 merge.py:370(extract_indexes) 1 0.000 0.000 0.000 0.000 merge.py:378(assert_valid_explicit_coords) 5 0.000 0.000 0.000 0.000 dataset.py:259(__init__) 1 0.000 0.000 0.000 0.000 dataset.py:375() 2 0.000 0.000 0.000 0.000 dataset.py:416(attrs) 5 0.000 0.000 0.000 0.000 dataset.py:428(encoding) 1 0.000 0.000 0.000 0.000 dataset.py:436(encoding) 1 0.000 0.000 0.000 0.000 dataset.py:1373() 1 0.000 0.000 0.000 0.000 variables.py:76(lazy_elemwise_func) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 7 0.000 0.000 0.000 0.000 strings.py:39(__init__) 1 0.000 0.000 0.000 0.000 file_manager.py:241(__init__) 1 0.000 0.000 0.000 0.000 locks.py:206(ensure_lock) 1 0.000 0.000 0.000 0.000 netCDF4_.py:236(__init__) 1 0.000 0.000 0.000 0.000 api.py:638() 1 0.000 0.000 0.000 0.000 utils.py:452(_tostr) 7 0.000 0.000 0.000 0.000 {method 'set_auto_maskandscale' of 'netCDF4._netCDF4.Variable' objects} 1 0.000 0.000 0.000 0.000 utils.py:514(is_grib_path) 3 0.000 0.000 0.000 0.000 core.py:989(name) 8 0.000 0.000 0.000 0.000 variable.py:1834(to_index_variable) 1 
0.000 0.000 0.000 0.000 {method 'rstrip' of 'str' objects} 1 0.000 0.000 0.000 0.000 {method 'endswith' of 'str' objects} 1 0.000 0.000 0.000 0.000 {method 'keys' of 'dict' objects} 1 0.000 0.000 0.000 0.000 glob.py:22(iglob) 2 0.000 0.000 0.000 0.000 variable.py:2007() 1 0.000 0.000 0.000 0.000 combine.py:345(_auto_concat) 1 0.000 0.000 0.000 0.000 combine.py:435() 1 0.000 0.000 0.000 0.000 merge.py:519() 2 0.000 0.000 0.000 0.000 dataset.py:934(__len__) 2 0.000 0.000 0.000 0.000 variables.py:106(safe_setitem) 1 0.000 0.000 0.000 0.000 api.py:479(__init__) 1 0.000 0.000 0.000 0.000 utils.py:20(_check_inplace) 7 0.000 0.000 0.000 0.000 {method 'chunking' of 'netCDF4._netCDF4.Variable' objects} 4 0.000 0.000 0.000 0.000 utils.py:498(close_on_error) 1 0.000 0.000 0.000 0.000 numeric.py:101(_assert_safe_casting) 3 0.000 0.000 0.000 0.000 core.py:167()
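As an aside, tables like the ones above can also be produced outside IPython, where `%prun` is unavailable, with the stdlib profiler. This is only a sketch: the `workload` function is a cheap stand-in for the `xr.open_mfdataset(fname, decode_times=False)` call, since the data files above aren't portable; substitute the real call when profiling against actual files.

```python
import cProfile
import io
import pstats

def workload():
    # stand-in for xr.open_mfdataset(fname, decode_times=False)
    return sum(i * i for i in range(100000))

prof = cProfile.Profile()
prof.enable()
workload()
prof.disable()

out = io.StringIO()
# same ordering as the %prun reports above: internal (tot) time
pstats.Stats(prof, stream=out).sort_stats('tottime').print_stats(5)
print(out.getvalue())
```

The report has the same `ncalls / tottime / percall / cumtime` columns as the `%prun` output quoted above.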
Output of ds:

```
Dimensions:   (bnds: 2, lat: 360, level: 23, lon: 576, time: 1827)
Coordinates:
  * lat       (lat) float64 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
  * level     (level) float32 1000.0 925.0 850.0 775.0 700.0 ... 5.0 3.0 2.0 1.0
  * lon       (lon) float64 0.3125 0.9375 1.562 2.188 ... 358.4 359.1 359.7
  * time      (time) float64 7.671e+03 7.672e+03 ... 9.496e+03 9.497e+03
Dimensions without coordinates: bnds
Data variables:
    lat_bnds  (lat, bnds) float64 dask.array
    lon_bnds  (lon, bnds) float64 dask.array
    sphum     (time, level, lat, lon) float32 dask.array
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/issues/1385#issuecomment-464113917,https://api.github.com/repos/pydata/xarray/issues/1385,464113917,MDEyOklzc3VlQ29tbWVudDQ2NDExMzkxNw==,30007270,2019-02-15T16:34:02Z,2019-02-15T16:34:35Z,NONE,"On a related note, is it possible to clear out the memory used by an xarray dataset after it is no longer needed?
Here's an example:

```python
fname = '/work/xrc/AM4_xrc/c192L33_am4p0_cmip6Diag/daily/5yr/atmos.19800101-19841231.ucomp.nc'
```

```python
import xarray as xr
```

```python
with xr.set_options(file_cache_maxsize=1):
    %time ds = xr.open_mfdataset(fname)
```

    CPU times: user 48 ms, sys: 124 ms, total: 172 ms
    Wall time: 29.7 s

```python
fname2 = '/work/xrc/AM4_xrc/c192L33_am4p0_cmip6Diag/daily/5yr/atmos.20100101-20141231.ucomp.nc'
```

```python
with xr.set_options(file_cache_maxsize=1):
    %time ds = xr.open_mfdataset(fname2)  # would like this to free up the memory used by fname
```

    CPU times: user 39 ms, sys: 124 ms, total: 163 ms
    Wall time: 28.8 s

```python
import gc
gc.collect()
```

```python
with xr.set_options(file_cache_maxsize=1):
    # expected to take the same time as the first call
    %time ds = xr.open_mfdataset(fname)
```

    CPU times: user 28 ms, sys: 10 ms, total: 38 ms
    Wall time: 37.9 ms
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/issues/1385#issuecomment-463367754,https://api.github.com/repos/pydata/xarray/issues/1385,463367754,MDEyOklzc3VlQ29tbWVudDQ2MzM2Nzc1NA==,30007270,2019-02-13T20:58:52Z,2019-02-13T20:59:06Z,NONE,"It seems my issue has to do with the time coordinate:

```
fname = '/work/xrc/AM4_xrc/c192L33_am4p0_cmip6Diag/daily/5yr/atmos.20100101-20141231.sphum.nc'
%prun ds = xr.open_mfdataset(fname, drop_variables='time')

         7510 function calls (7366 primitive calls) in 0.068 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.039    0.039    0.039    0.039 netCDF4_.py:244(_open_netcdf4_group)
        3    0.022    0.007    0.022    0.007 {built-in method _operator.getitem}
        1    0.001    0.001    0.001    0.001 {built-in method posix.lstat}
  125/113    0.000    0.000    0.001    0.000 indexing.py:504(shape)
       11    0.000    0.000    0.000    0.000 core.py:137()

fname = '/work/xrc/AM4_xrc/c192L33_am4p0_cmip6Diag/daily/5yr/atmos.20000101-20041231.sphum.nc'
%prun ds = xr.open_mfdataset(fname)

         13143 function calls (12936 primitive calls) in 23.853 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        6   23.791    3.965   23.791    3.965 {built-in method _operator.getitem}
        1    0.029    0.029    0.029    0.029 netCDF4_.py:244(_open_netcdf4_group)
        2    0.023    0.012    0.023    0.012 {cftime._cftime.num2date}
        1    0.001    0.001    0.001    0.001 {built-in method posix.lstat}
  158/139    0.000    0.000    0.001    0.000 indexing.py:504(shape)
```

Both files are 33 GB. This is using xarray 0.11.3. I can also confirm that nc.MFDataset is much faster (<1 s). Is any speed-up of the time-coordinate handling possible, given that my data follows a standard calendar? (Short of using drop_variables='time' and then manually adding the time coordinate...)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/issues/1385#issuecomment-439478904,https://api.github.com/repos/pydata/xarray/issues/1385,439478904,MDEyOklzc3VlQ29tbWVudDQzOTQ3ODkwNA==,30007270,2018-11-16T18:10:53Z,2018-11-16T18:10:53Z,NONE,"h5netcdf fails with the following error (presumably the file is not compatible):

```
/nbhome/xrc/anaconda2/envs/py361/lib/python3.6/site-packages/h5py/_hl/files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
     97         if swmr and swmr_support:
     98             flags |= h5f.ACC_SWMR_READ
---> 99         fid = h5f.open(name, flags, fapl=fapl)
    100     elif mode == 'r+':
    101         fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5f.pyx in h5py.h5f.open()

OSError: Unable to open file (file signature not found)
```

Using scipy:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    65/42   80.448    1.238   80.489    1.916 {built-in method numpy.core.multiarray.array}
   764838    0.548    0.000    0.548    0.000 core.py:169()
        3    0.169    0.056    0.717    0.239 core.py:169()
        2    0.041    0.021    0.041    0.021 {cftime._cftime.num2date}
        3    0.038    0.013    0.775    0.258 core.py:173(getem)
        1    0.024    0.024   81.313   81.313 :1()
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/issues/1385#issuecomment-439445695,https://api.github.com/repos/pydata/xarray/issues/1385,439445695,MDEyOklzc3VlQ29tbWVudDQzOTQ0NTY5NQ==,30007270,2018-11-16T16:20:25Z,2018-11-16T16:20:25Z,NONE,"Sorry, I think the speedup had to do with accessing a file that had previously been loaded, rather than with `decode_cf`. Here's the output of `prun` using two different files of approximately the same size (~75 GB), run from a notebook without using distributed (which doesn't lead to any speedup):

Output of `%prun ds = xr.open_mfdataset('/work/xrc/AM4_skc/atmos_level.1999010100-2000123123.sphum.nc', chunks={'lat':20,'time':50,'lon':12,'pfull':11})`:

```
         780980 function calls (780741 primitive calls) in 55.374 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        7   54.448    7.778   54.448    7.778 {built-in method _operator.getitem}
   764838    0.473    0.000    0.473    0.000 core.py:169()
        3    0.285    0.095    0.758    0.253 core.py:169()
        2    0.041    0.020    0.041    0.020 {cftime._cftime.num2date}
        3    0.040    0.013    0.821    0.274 core.py:173(getem)
        1    0.027    0.027   55.374   55.374 :1()
```

Output of `%prun ds = xr.open_mfdataset('/work/xrc/AM4_skc/atmos_level.2001010100-2002123123.temp.nc', chunks={'lat':20,'time':50,'lon':12,'pfull':11}, decode_cf=False)`:

```
         772212 function calls (772026 primitive calls) in 56.000 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5   55.213   11.043   55.214   11.043 {built-in method _operator.getitem}
   764838    0.486    0.000    0.486    0.000 core.py:169()
        3    0.185    0.062    0.671    0.224 core.py:169()
        3    0.041    0.014    0.735    0.245 core.py:173(getem)
        1    0.027    0.027   56.001   56.001 :1()
```

/work isn't a remote archive, so it surprises me that this should happen.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/issues/1385#issuecomment-439042364,https://api.github.com/repos/pydata/xarray/issues/1385,439042364,MDEyOklzc3VlQ29tbWVudDQzOTA0MjM2NA==,30007270,2018-11-15T13:37:16Z,2018-11-15T14:06:04Z,NONE,"Yes, I'm on 0.11. Nothing displays on the task stream/progress bar when using `open_mfdataset`, although I can monitor progress when, say, computing the mean. The output of `%time` with `decode_cf=False` is

```
CPU times: user 4.42 s, sys: 392 ms, total: 4.82 s
Wall time: 4.74 s
```

and with `decode_cf=True`:

```
CPU times: user 11.6 s, sys: 1.61 s, total: 13.2 s
Wall time: 3min 28s
```

Using `xr.set_options(file_cache_maxsize=1)` doesn't make any noticeable difference. If I repeat the `open_mfdataset` call for another 5 files (after opening the first 5), I occasionally get this warning:

`distributed.utils_perf - WARNING - full garbage collections took 24% CPU time recently (threshold: 10%)`

I only began using the dashboard recently; please let me know if there's something basic I'm missing.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135
https://github.com/pydata/xarray/issues/1385#issuecomment-438870575,https://api.github.com/repos/pydata/xarray/issues/1385,438870575,MDEyOklzc3VlQ29tbWVudDQzODg3MDU3NQ==,30007270,2018-11-15T00:32:42Z,2018-11-15T00:32:42Z,NONE,"I can confirm that

```
ds = xr.open_mfdataset(data_fnames, chunks={'lat':20,'time':50,'lon':24,'pfull':11}, decode_cf=False)
ds = xr.decode_cf(ds)
```

is much faster (seconds vs. minutes) than

```
ds = xr.open_mfdataset(data_fnames, chunks={'lat':20,'time':50,'lon':24,'pfull':11})
```
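For context on what the deferred `decode_cf` step does to the time coordinate: CF-convention times are stored as numeric offsets from an epoch given in the `units` attribute, and decoding converts every value, which is why it touches the whole time array. Below is a stdlib-only sketch of that conversion for the standard calendar; `decode_cf_times` is an illustrative helper, not xarray's implementation (xarray uses pandas/cftime internally).

```python
from datetime import datetime, timedelta

def decode_cf_times(values, units):
    # parse units like 'days since 2010-01-01' (standard calendar only)
    unit, _, epoch = units.partition(' since ')
    origin = datetime.fromisoformat(epoch.strip())
    seconds_per_unit = {'days': 86400.0, 'hours': 3600.0,
                        'minutes': 60.0, 'seconds': 1.0}[unit]
    # every stored offset must be converted, one by one
    return [origin + timedelta(seconds=v * seconds_per_unit) for v in values]

times = decode_cf_times([0.0, 0.5, 1.0], 'days since 2010-01-01')
print(times[1])  # -> 2010-01-01 12:00:00
```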
For reference, data_fnames is a list of 5 files, each of which is ~75 GB.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,224553135