
issue_comments


7 rows where author_association = "NONE", issue = 224553135 and user = 30007270 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
464100720 https://github.com/pydata/xarray/issues/1385#issuecomment-464100720 https://api.github.com/repos/pydata/xarray/issues/1385 MDEyOklzc3VlQ29tbWVudDQ2NDEwMDcyMA== chuaxr 30007270 2019-02-15T15:57:01Z 2019-02-15T18:33:31Z NONE

In that case, the speedup disappears. It seems that the slowdown arises from the entire time array being loaded into memory at once.

EDIT: I subsequently realized that using `drop_variables='time'` caused all the data values to become NaN, which makes that an invalid option.

```
%prun ds = xr.open_mfdataset(fname, decode_times=False)

         8025 function calls (7856 primitive calls) in 29.662 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4   29.608    7.402   29.608    7.402 {built-in method operator.getitem}
        1    0.032    0.032    0.032    0.032 netCDF4_.py:244(_open_netcdf4_group)
        1    0.015    0.015    0.015    0.015 {built-in method posix.lstat}
  126/114    0.000    0.000    0.001    0.000 indexing.py:504(shape)
     1196    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
       81    0.000    0.000    0.001    0.000 variable.py:239(__init__)
```

See the rest of the prun output under the Details for more information:

The rest of the prun output (collapsed under a Details block in the original comment) consists of roughly 250 further entries, all with 0.000 tottime; only the call-chain entries with significant cumulative time are reproduced here:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   29.662   29.662 {built-in method builtins.exec}
        1    0.000    0.000   29.662   29.662 <string>:1(<module>)
        1    0.000    0.000   29.662   29.662 api.py:487(open_mfdataset)
        1    0.000    0.000   29.646   29.646 api.py:162(open_dataset)
        1    0.000    0.000   29.614   29.614 api.py:270(maybe_decode_store)
        1    0.000    0.000   29.612   29.612 conventions.py:412(decode_cf)
        1    0.000    0.000   29.610   29.610 dataset.py:321(__init__)
        2    0.000    0.000   29.610   14.805 merge.py:392(merge_core)
        2    0.000    0.000   29.610   14.805 merge.py:172(expand_variable_dicts)
       14    0.000    0.000   29.610    2.115 variable.py:41(as_variable)
    39/19    0.000    0.000   29.609    1.558 {built-in method numpy.core.multiarray.array}
        4    0.000    0.000   29.609    7.402 indexing.py:760(explicit_indexing_adapter)
        4    0.000    0.000   29.608    7.402 netCDF4_.py:67(_getitem)
```
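The `%prun` magic above is IPython-only; outside a notebook, the same measurement can be sketched with the stdlib profiler. This is a hypothetical helper (`profile_call` is not part of xarray or IPython), and the commented `xr.open_mfdataset` line is illustrative only:

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, **kwargs):
    """Run fn under cProfile; return (result, report of top entries by tottime)."""
    prof = cProfile.Profile()
    result = prof.runcall(fn, *args, **kwargs)
    buf = io.StringIO()
    pstats.Stats(prof, stream=buf).sort_stats("tottime").print_stats(5)
    return result, buf.getvalue()


# e.g.:
# result, report = profile_call(xr.open_mfdataset, fname, decode_times=False)
# print(report)
```

The report has the same `ncalls/tottime/cumtime` columns as the `%prun` output quoted above.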

Output of ds:

```
<xarray.Dataset>
Dimensions:   (bnds: 2, lat: 360, level: 23, lon: 576, time: 1827)
Coordinates:
  * lat       (lat) float64 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
  * level     (level) float32 1000.0 925.0 850.0 775.0 700.0 ... 5.0 3.0 2.0 1.0
  * lon       (lon) float64 0.3125 0.9375 1.562 2.188 ... 358.4 359.1 359.7
  * time      (time) float64 7.671e+03 7.672e+03 ... 9.496e+03 9.497e+03
Dimensions without coordinates: bnds
Data variables:
    lat_bnds  (lat, bnds) float64 dask.array<shape=(360, 2), chunksize=(360, 2)>
    lon_bnds  (lon, bnds) float64 dask.array<shape=(576, 2), chunksize=(576, 2)>
    sphum     (time, level, lat, lon) float32 dask.array<shape=(1827, 23, 360, 576), chunksize=(1827, 23, 360, 576)>
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  slow performance with open_mfdataset 224553135
464113917 https://github.com/pydata/xarray/issues/1385#issuecomment-464113917 https://api.github.com/repos/pydata/xarray/issues/1385 MDEyOklzc3VlQ29tbWVudDQ2NDExMzkxNw== chuaxr 30007270 2019-02-15T16:34:02Z 2019-02-15T16:34:35Z NONE

On a related note, is it possible to clear out the memory used by the xarray dataset after it is no longer needed?

Here's an example:

```python
fname = '/work/xrc/AM4_xrc/c192L33_am4p0_cmip6Diag/daily/5yr/atmos.19800101-19841231.ucomp.nc'

import xarray as xr

with xr.set_options(file_cache_maxsize=1):
    %time ds = xr.open_mfdataset(fname)
```

CPU times: user 48 ms, sys: 124 ms, total: 172 ms
Wall time: 29.7 s

```python
fname2 = '/work/xrc/AM4_xrc/c192L33_am4p0_cmip6Diag/daily/5yr/atmos.20100101-20141231.ucomp.nc'

with xr.set_options(file_cache_maxsize=1):
    %time ds = xr.open_mfdataset(fname2)  # would like this to free up memory used by fname
```

CPU times: user 39 ms, sys: 124 ms, total: 163 ms
Wall time: 28.8 s

```python
import gc
gc.collect()

with xr.set_options(file_cache_maxsize=1):
    # expected to take the same time as the first call
    %time ds = xr.open_mfdataset(fname)
```

CPU times: user 28 ms, sys: 10 ms, total: 38 ms
Wall time: 37.9 ms
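One pattern that should release what the cache holds, sketched under the assumption that closing a dataset drops its underlying file handles (`reopen` is a hypothetical helper, not xarray API; the test stands in a plain object for the dataset):

```python
import gc


def reopen(old_ds, path, open_fn):
    """Close and drop `old_ds`, then open `path` with `open_fn`.

    Closing releases the underlying file handles; dropping the reference
    and collecting lets Python reclaim any arrays still held by `old_ds`.
    """
    if old_ds is not None:
        old_ds.close()  # release file handles held by the old dataset
    del old_ds          # drop the last strong reference
    gc.collect()        # encourage prompt reclamation of cached arrays
    return open_fn(path)


# e.g. ds = reopen(ds, fname2, xr.open_mfdataset)
```

With `file_cache_maxsize=1`, opening the new file also evicts the cached handle for the old one, but data already materialized stays alive as long as `ds` still references it.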
463367754 https://github.com/pydata/xarray/issues/1385#issuecomment-463367754 https://api.github.com/repos/pydata/xarray/issues/1385 MDEyOklzc3VlQ29tbWVudDQ2MzM2Nzc1NA== chuaxr 30007270 2019-02-13T20:58:52Z 2019-02-13T20:59:06Z NONE

It seems my issue has to do with the time coordinate:
```

fname = '/work/xrc/AM4_xrc/c192L33_am4p0_cmip6Diag/daily/5yr/atmos.20100101-20141231.sphum.nc'
%prun ds = xr.open_mfdataset(fname, drop_variables='time')

         7510 function calls (7366 primitive calls) in 0.068 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.039    0.039    0.039    0.039 netCDF4_.py:244(_open_netcdf4_group)
        3    0.022    0.007    0.022    0.007 {built-in method _operator.getitem}
        1    0.001    0.001    0.001    0.001 {built-in method posix.lstat}
  125/113    0.000    0.000    0.001    0.000 indexing.py:504(shape)
       11    0.000    0.000    0.000    0.000 core.py:137(<genexpr>)

fname = '/work/xrc/AM4_xrc/c192L33_am4p0_cmip6Diag/daily/5yr/atmos.20000101-20041231.sphum.nc'
%prun ds = xr.open_mfdataset(fname)

        13143 function calls (12936 primitive calls) in 23.853 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        6   23.791    3.965   23.791    3.965 {built-in method operator.getitem}
        1    0.029    0.029    0.029    0.029 netCDF4_.py:244(_open_netcdf4_group)
        2    0.023    0.012    0.023    0.012 {cftime._cftime.num2date}
        1    0.001    0.001    0.001    0.001 {built-in method posix.lstat}
  158/139    0.000    0.000    0.001    0.000 indexing.py:504(shape)
```

Both files are 33 GB. This is using xarray 0.11.3.

I also confirm that netCDF4's MFDataset (nc.MFDataset) is much faster (<1 s).

Is there any speed-up for the time coordinates possible, given that my data follows a standard calendar? (Short of using drop_variables='time' and then manually adding the time coordinate...)
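For a standard (proleptic Gregorian) calendar, the manual route hinted at above can be sketched with the stdlib alone. `decode_standard_times` is a hypothetical helper, not xarray API, and assumes a CF-style units string such as `'days since 2000-01-01'`:

```python
from datetime import datetime, timedelta

# Seconds per CF time unit, for standard-calendar data only.
_UNIT_SECONDS = {"days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}


def decode_standard_times(offsets, units):
    """Decode numeric offsets like [0, 1.5] with units 'days since 2000-01-01'."""
    unit, _, origin = units.partition(" since ")
    base = datetime.fromisoformat(origin)
    step = _UNIT_SECONDS[unit]
    return [base + timedelta(seconds=o * step) for o in offsets]


# After opening with the slow decoding skipped, one could then do e.g.:
# ds = xr.open_mfdataset(fname, decode_times=False)
# ds = ds.assign_coords(
#     time=decode_standard_times(ds.time.values, ds.time.attrs["units"]))
```

This trades cftime's calendar generality for a single vector pass over the offsets.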

439478904 https://github.com/pydata/xarray/issues/1385#issuecomment-439478904 https://api.github.com/repos/pydata/xarray/issues/1385 MDEyOklzc3VlQ29tbWVudDQzOTQ3ODkwNA== chuaxr 30007270 2018-11-16T18:10:53Z 2018-11-16T18:10:53Z NONE

h5netcdf fails with the following error (presumably the file is not compatible):

```
/nbhome/xrc/anaconda2/envs/py361/lib/python3.6/site-packages/h5py/_hl/files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
     97         if swmr and swmr_support:
     98             flags |= h5f.ACC_SWMR_READ
---> 99         fid = h5f.open(name, flags, fapl=fapl)
    100     elif mode == 'r+':
    101         fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5f.pyx in h5py.h5f.open()

OSError: Unable to open file (file signature not found)
```

Using scipy:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    65/42   80.448    1.238   80.489    1.916 {built-in method numpy.core.multiarray.array}
   764838    0.548    0.000    0.548    0.000 core.py:169(<genexpr>)
        3    0.169    0.056    0.717    0.239 core.py:169(<listcomp>)
        2    0.041    0.021    0.041    0.021 {cftime._cftime.num2date}
        3    0.038    0.013    0.775    0.258 core.py:173(getem)
        1    0.024    0.024   81.313   81.313 <string>:1(<module>)
```

439445695 https://github.com/pydata/xarray/issues/1385#issuecomment-439445695 https://api.github.com/repos/pydata/xarray/issues/1385 MDEyOklzc3VlQ29tbWVudDQzOTQ0NTY5NQ== chuaxr 30007270 2018-11-16T16:20:25Z 2018-11-16T16:20:25Z NONE

Sorry, I think the speedup had to do with accessing a file that had previously been loaded, rather than with decode_cf. Here's the output of prun using two different files of approximately the same size (~75 GB), run from a notebook without using distributed (which doesn't lead to any speedup):

Output of `%prun ds = xr.open_mfdataset('/work/xrc/AM4_skc/atmos_level.1999010100-2000123123.sphum.nc', chunks={'lat':20,'time':50,'lon':12,'pfull':11})`:

```
      780980 function calls (780741 primitive calls) in 55.374 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     7   54.448    7.778   54.448    7.778 {built-in method _operator.getitem}
764838    0.473    0.000    0.473    0.000 core.py:169(<genexpr>)
     3    0.285    0.095    0.758    0.253 core.py:169(<listcomp>)
     2    0.041    0.020    0.041    0.020 {cftime._cftime.num2date}
     3    0.040    0.013    0.821    0.274 core.py:173(getem)
     1    0.027    0.027   55.374   55.374 <string>:1(<module>)
```

Output of `%prun ds = xr.open_mfdataset('/work/xrc/AM4_skc/atmos_level.2001010100-2002123123.temp.nc', chunks={'lat':20,'time':50,'lon':12,'pfull':11}, decode_cf=False)`:

```
      772212 function calls (772026 primitive calls) in 56.000 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     5   55.213   11.043   55.214   11.043 {built-in method _operator.getitem}
764838    0.486    0.000    0.486    0.000 core.py:169(<genexpr>)
     3    0.185    0.062    0.671    0.224 core.py:169(<listcomp>)
     3    0.041    0.014    0.735    0.245 core.py:173(getem)
     1    0.027    0.027   56.001   56.001 <string>:1(<module>)
```

/work isn't a remote archive, so it surprises me that this should happen.
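As a quick sanity check on the chunk dict used above, the per-chunk memory footprint is just the product of the chunk sizes times the element size. `chunk_nbytes` is a hypothetical helper; `itemsize=4` assumes float32 data as in the datasets above:

```python
def chunk_nbytes(chunks, itemsize=4):
    """Approximate bytes per chunk: product of chunk sizes times itemsize."""
    n = 1
    for size in chunks.values():
        n *= size
    return n * itemsize


# Chunk dict from the %prun calls above:
print(chunk_nbytes({'lat': 20, 'time': 50, 'lon': 12, 'pfull': 11}))  # 528000
```

At ~0.5 MB per chunk, a 75 GB file is split into a very large number of tiny reads, which by itself can dominate open/read time.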

439042364 https://github.com/pydata/xarray/issues/1385#issuecomment-439042364 https://api.github.com/repos/pydata/xarray/issues/1385 MDEyOklzc3VlQ29tbWVudDQzOTA0MjM2NA== chuaxr 30007270 2018-11-15T13:37:16Z 2018-11-15T14:06:04Z NONE

Yes, I'm on 0.11.

Nothing displays on the task stream/ progress bar when using open_mfdataset, although I can monitor progress when, say, computing the mean.

The output from %time using decode_cf=False is:

```
CPU times: user 4.42 s, sys: 392 ms, total: 4.82 s
Wall time: 4.74 s
```

and for decode_cf=True:

```
CPU times: user 11.6 s, sys: 1.61 s, total: 13.2 s
Wall time: 3min 28s
```

Using xr.set_options(file_cache_maxsize=1) doesn't make any noticeable difference.

If I repeat the open_mfdataset call for another 5 files (after opening the first 5), I occasionally get this warning:

```
distributed.utils_perf - WARNING - full garbage collections took 24% CPU time recently (threshold: 10%)
```

I only began using the dashboard recently; please let me know if there's something basic I'm missing.

438870575 https://github.com/pydata/xarray/issues/1385#issuecomment-438870575 https://api.github.com/repos/pydata/xarray/issues/1385 MDEyOklzc3VlQ29tbWVudDQzODg3MDU3NQ== chuaxr 30007270 2018-11-15T00:32:42Z 2018-11-15T00:32:42Z NONE

I can confirm that

```python
ds = xr.open_mfdataset(data_fnames, chunks={'lat':20,'time':50,'lon':24,'pfull':11}, decode_cf=False)
ds = xr.decode_cf(ds)
```

is much faster (seconds vs. minutes) than

```python
ds = xr.open_mfdataset(data_fnames, chunks={'lat':20,'time':50,'lon':24,'pfull':11})
```

For reference, data_fnames is a list of 5 files, each of which is ~75 GB.


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);