issue_comments
7 rows where author_association = "NONE", issue = 224553135 and user = 30007270 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
464100720 | https://github.com/pydata/xarray/issues/1385#issuecomment-464100720 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDQ2NDEwMDcyMA== | chuaxr 30007270 | 2019-02-15T15:57:01Z | 2019-02-15T18:33:31Z | NONE | In that case, the speedup disappears. It seems that the slowdown arises from the entire time array being loaded into memory at once.

EDIT: I subsequently realized that using drop_variables = 'time' caused all the data values to become nan, which makes that an invalid option.

```
%prun ds = xr.open_mfdataset(fname,decode_times=False)

         8025 function calls (7856 primitive calls) in 29.662 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4   29.608    7.402   29.608    7.402 {built-in method _operator.getitem}
        1    0.032    0.032    0.032    0.032 netCDF4_.py:244(_open_netcdf4_group)
        1    0.015    0.015    0.015    0.015 {built-in method posix.lstat}
  126/114    0.000    0.000    0.001    0.000 indexing.py:504(shape)
     1196    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
       81    0.000    0.000    0.001    0.000 variable.py:239(__init__)
```

See the rest of the prun output under Details for more information:

```
30 0.000 0.000 0.000 0.000 {method 'getncattr' of 'netCDF4._netCDF4.Variable' objects}
81 0.000 0.000 0.000 0.000 variable.py:709(attrs)
736/672 0.000 0.000 0.000 0.000 {built-in method builtins.len}
157 0.000 0.000 0.001 0.000 utils.py:450(ndim)
81 0.000 0.000 0.001 0.000 variable.py:417(_parse_dimensions)
7 0.000 0.000 0.001 0.000 netCDF4_.py:361(open_store_variable)
4 0.000 0.000 0.000 0.000 base.py:253(__new__)
1 0.000 0.000 29.662 29.662 <string>:1(<module>)
7 0.000 0.000 0.001 0.000 conventions.py:245(decode_cf_variable)
39/19 0.000 0.000 29.609 1.558 {built-in method numpy.core.multiarray.array}
9 0.000 0.000 0.000 0.000 core.py:1776(normalize_chunks)
104 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
143 0.000 0.000 0.001 0.000 variable.py:272(shape)
4 0.000 0.000 0.000 0.000 utils.py:88(_StartCountStride)
8 0.000 0.000 0.000 0.000 core.py:747(blockdims_from_blockshape)
23 0.000 0.000 0.032 0.001 file_manager.py:150(acquire)
8 0.000 0.000 0.000 0.000 base.py:590(tokenize)
84 0.000 0.000 0.000 0.000 variable.py:137(as_compatible_data)
268 0.000 0.000 0.000 0.000 {method 'indices' of 'slice' objects}
14 0.000 0.000 29.610 2.115 variable.py:41(as_variable)
35 0.000 0.000 0.000 0.000 variables.py:102(unpack_for_decoding)
81 0.000 0.000 0.000 0.000 variable.py:721(encoding)
192 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}
2 0.000 0.000 0.000 0.000 merge.py:109(merge_variables)
2 0.000 0.000 29.610 14.805 merge.py:392(merge_core)
7 0.000 0.000 0.000 0.000 variables.py:161(<setcomp>)
103 0.000 0.000 0.000 0.000 {built-in method _abc._abc_instancecheck}
1 0.000 0.000 0.001 0.001 conventions.py:351(decode_cf_variables)
3 0.000 0.000 0.000 0.000 dataset.py:90(calculate_dimensions)
1 0.000 0.000 0.000 0.000 {built-in method posix.stat}
361 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
20 0.000 0.000 0.000 0.000 variable.py:728(copy)
23 0.000 0.000 0.000 0.000 lru_cache.py:40(__getitem__)
12 0.000 0.000 0.000 0.000 base.py:504(_simple_new)
2 0.000 0.000 0.000 0.000 variable.py:1985(assert_unique_multiindex_level_names)
2 0.000 0.000 0.000 0.000 alignment.py:172(deep_align)
14 0.000 0.000 0.000 0.000 indexing.py:469(__init__)
16 0.000 0.000 29.609 1.851 variable.py:1710(__init__)
1 0.000 0.000 29.662 29.662 {built-in method builtins.exec}
25 0.000 0.000 0.000 0.000 contextlib.py:81(__init__)
7 0.000 0.000 0.000 0.000 {method 'getncattr' of 'netCDF4._netCDF4.Dataset' objects}
24 0.000 0.000 0.000 0.000 indexing.py:331(as_integer_slice)
50/46 0.000 0.000 0.000 0.000 common.py:181(__setattr__)
7 0.000 0.000 0.000 0.000 variables.py:155(decode)
4 0.000 0.000 29.609 7.402 indexing.py:760(explicit_indexing_adapter)
48 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:416(parent)
103 0.000 0.000 0.000 0.000 abc.py:137(__instancecheck__)
48 0.000 0.000 0.000 0.000 _collections_abc.py:742(__iter__)
180 0.000 0.000 0.000 0.000 variable.py:411(dims)
4 0.000 0.000 0.000 0.000 locks.py:158(__exit__)
3 0.000 0.000 0.001 0.000 core.py:2048(from_array)
1 0.000 0.000 29.612 29.612 conventions.py:412(decode_cf)
4 0.000 0.000 0.000 0.000 utils.py:50(_maybe_cast_to_cftimeindex)
77/59 0.000 0.000 0.000 0.000 utils.py:473(dtype)
84 0.000 0.000 0.000 0.000 generic.py:7(_check)
146 0.000 0.000 0.000 0.000 indexing.py:319(tuple)
7 0.000 0.000 0.000 0.000 netCDF4_.py:34(__init__)
1 0.000 0.000 29.614 29.614 api.py:270(maybe_decode_store)
1 0.000 0.000 29.662 29.662 api.py:487(open_mfdataset)
20 0.000 0.000 0.000 0.000 common.py:1845(_is_dtype_type)
33 0.000 0.000 0.000 0.000 core.py:1911(<genexpr>)
84 0.000 0.000 0.000 0.000 variable.py:117(_maybe_wrap_data)
3 0.000 0.000 0.001 0.000 variable.py:830(chunk)
25 0.000 0.000 0.000 0.000 contextlib.py:237(helper)
36/25 0.000 0.000 0.000 0.000 utils.py:477(shape)
8 0.000 0.000 0.000 0.000 base.py:566(_shallow_copy)
8 0.000 0.000 0.000 0.000 indexing.py:346(__init__)
26/25 0.000 0.000 0.000 0.000 utils.py:408(__call__)
4 0.000 0.000 0.000 0.000 indexing.py:886(_decompose_outer_indexer)
2 0.000 0.000 29.610 14.805 merge.py:172(expand_variable_dicts)
4 0.000 0.000 29.608 7.402 netCDF4_.py:67(_getitem)
2 0.000 0.000 0.000 0.000 dataset.py:722(copy)
7 0.000 0.000 0.001 0.000 dataset.py:1383(maybe_chunk)
16 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.empty}
14 0.000 0.000 0.000 0.000 fromnumeric.py:1471(ravel)
60 0.000 0.000 0.000 0.000 base.py:652(__len__)
3 0.000 0.000 0.000 0.000 core.py:141(getem)
25 0.000 0.000 0.000 0.000 contextlib.py:116(__exit__)
4 0.000 0.000 29.609 7.402 utils.py:62(safe_cast_to_index)
18 0.000 0.000 0.000 0.000 core.py:891(shape)
25 0.000 0.000 0.000 0.000 contextlib.py:107(__enter__)
4 0.000 0.000 0.001 0.000 utils.py:332(FrozenOrderedDict)
8 0.000 0.000 0.000 0.000 base.py:1271(set_names)
4 0.000 0.000 0.000 0.000 numeric.py:34(__new__)
24 0.000 0.000 0.000 0.000 inference.py:253(is_list_like)
3 0.000 0.000 0.000 0.000 core.py:820(__new__)
12 0.000 0.000 0.000 0.000 variable.py:1785(copy)
36 0.000 0.000 0.000 0.000 {method 'copy' of 'collections.OrderedDict' objects}
8/7 0.000 0.000 0.000 0.000 {built-in method builtins.sorted}
2 0.000 0.000 0.000 0.000 merge.py:220(determine_coords)
46 0.000 0.000 0.000 0.000 file_manager.py:141(_optional_lock)
60 0.000 0.000 0.000 0.000 indexing.py:1252(shape)
50 0.000 0.000 0.000 0.000 {built-in method builtins.next}
59 0.000 0.000 0.000 0.000 {built-in method builtins.iter}
54 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:1009(_handle_fromlist)
1 0.000 0.000 0.000 0.000 api.py:146(_protect_dataset_variables_inplace)
1 0.000 0.000 29.646 29.646 api.py:162(open_dataset)
4 0.000 0.000 0.000 0.000 utils.py:424(_out_array_shape)
4 0.000 0.000 29.609 7.402 indexing.py:1224(__init__)
24 0.000 0.000 0.000 0.000 function_base.py:241(iterable)
4 0.000 0.000 0.000 0.000 dtypes.py:968(is_dtype)
2 0.000 0.000 0.000 0.000 merge.py:257(coerce_pandas_values)
14 0.000 0.000 0.000 0.000 missing.py:105(_isna_new)
8 0.000 0.000 0.000 0.000 variable.py:1840(to_index)
7 0.000 0.000 0.000 0.000 {method 'search' of 're.Pattern' objects}
48 0.000 0.000 0.000 0.000 {method 'rpartition' of 'str' objects}
7 0.000 0.000 0.000 0.000 strings.py:66(decode)
7 0.000 0.000 0.000 0.000 netCDF4_.py:257(_disable_auto_decode_variable)
14 0.000 0.000 0.000 0.000 numerictypes.py:619(issubclass_)
24/4 0.000 0.000 29.609 7.402 numeric.py:433(asarray)
7 0.000 0.000 0.000 0.000 {method 'ncattrs' of 'netCDF4._netCDF4.Variable' objects}
8 0.000 0.000 0.000 0.000 numeric.py:67(_shallow_copy)
8 0.000 0.000 0.000 0.000 indexing.py:373(__init__)
3 0.000 0.000 0.000 0.000 core.py:134(<listcomp>)
14 0.000 0.000 0.000 0.000 merge.py:154(<listcomp>)
16 0.000 0.000 0.000 0.000 dataset.py:816(<genexpr>)
11 0.000 0.000 0.000 0.000 netCDF4_.py:56(get_array)
40 0.000 0.000 0.000 0.000 utils.py:40(_find_dim)
22 0.000 0.000 0.000 0.000 core.py:1893(<genexpr>)
27 0.000 0.000 0.000 0.000 {built-in method builtins.all}
26/10 0.000 0.000 0.000 0.000 {built-in method builtins.sum}
2 0.000 0.000 0.000 0.000 dataset.py:424(attrs)
7 0.000 0.000 0.000 0.000 variables.py:231(decode)
1 0.000 0.000 0.000 0.000 file_manager.py:66(__init__)
67 0.000 0.000 0.000 0.000 utils.py:316(__getitem__)
22 0.000 0.000 0.000 0.000 {method 'move_to_end' of 'collections.OrderedDict' objects}
53 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass}
1 0.000 0.000 0.000 0.000 combine.py:374(_infer_concat_order_from_positions)
7 0.000 0.000 0.000 0.000 dataset.py:1378(selkeys)
1 0.000 0.000 0.001 0.001 dataset.py:1333(chunk)
4 0.000 0.000 29.609 7.402 netCDF4_.py:62(__getitem__)
37 0.000 0.000 0.000 0.000 netCDF4_.py:365(<genexpr>)
18 0.000 0.000 0.000 0.000 {method 'ravel' of 'numpy.ndarray' objects}
2 0.000 0.000 0.000 0.000 alignment.py:37(align)
14 0.000 0.000 0.000 0.000 {pandas._libs.lib.is_scalar}
8 0.000 0.000 0.000 0.000 base.py:1239(_set_names)
16 0.000 0.000 0.000 0.000 indexing.py:314(__init__)
3 0.000 0.000 0.000 0.000 config.py:414(get)
7 0.000 0.000 0.000 0.000 dtypes.py:68(maybe_promote)
8 0.000 0.000 0.000 0.000 variable.py:1856(level_names)
37 0.000 0.000 0.000 0.000 {method 'copy' of 'dict' objects}
6 0.000 0.000 0.000 0.000 re.py:180(search)
6 0.000 0.000 0.000 0.000 re.py:271(_compile)
8 0.000 0.000 0.000 0.000 {built-in method _hashlib.openssl_md5}
1 0.000 0.000 0.000 0.000 merge.py:463(merge)
7 0.000 0.000 0.000 0.000 variables.py:158(<listcomp>)
7 0.000 0.000 0.000 0.000 numerictypes.py:687(issubdtype)
6 0.000 0.000 0.000 0.000 utils.py:510(is_remote_uri)
8 0.000 0.000 0.000 0.000 common.py:1702(is_extension_array_dtype)
25 0.000 0.000 0.000 0.000 indexing.py:645(as_indexable)
21 0.000 0.000 0.000 0.000 {method 'pop' of 'collections.OrderedDict' objects}
19 0.000 0.000 0.000 0.000 {built-in method __new__ of type object at 0x2b324a13e3c0}
1 0.000 0.000 0.001 0.001 dataset.py:1394(<listcomp>)
21 0.000 0.000 0.000 0.000 variables.py:117(pop_to)
1 0.000 0.000 0.032 0.032 netCDF4_.py:320(open)
8 0.000 0.000 0.000 0.000 netCDF4_.py:399(<genexpr>)
12 0.000 0.000 0.000 0.000 __init__.py:221(iteritems)
4 0.000 0.000 0.000 0.000 common.py:403(is_datetime64_dtype)
8 0.000 0.000 0.000 0.000 common.py:1809(_get_dtype)
8 0.000 0.000 0.000 0.000 dtypes.py:68(find)
8 0.000 0.000 0.000 0.000 base.py:3607(values)
22 0.000 0.000 0.000 0.000 pycompat.py:32(move_to_end)
8 0.000 0.000 0.000 0.000 utils.py:792(__exit__)
3 0.000 0.000 0.000 0.000 highlevelgraph.py:84(from_collections)
22 0.000 0.000 0.000 0.000 core.py:1906(<genexpr>)
16 0.000 0.000 0.000 0.000 abc.py:141(__subclasscheck__)
1 0.000 0.000 0.000 0.000 posixpath.py:104(split)
1 0.000 0.000 0.001 0.001 combine.py:479(_auto_combine_all_along_first_dim)
1 0.000 0.000 29.610 29.610 dataset.py:321(__init__)
4 0.000 0.000 0.000 0.000 dataset.py:643(_construct_direct)
7 0.000 0.000 0.000 0.000 variables.py:266(decode)
1 0.000 0.000 0.032 0.032 netCDF4_.py:306(__init__)
14 0.000 0.000 0.000 0.000 numeric.py:504(asanyarray)
4 0.000 0.000 0.000 0.000 common.py:503(is_period_dtype)
8 0.000 0.000 0.000 0.000 common.py:1981(pandas_dtype)
12 0.000 0.000 0.000 0.000 base.py:633(_reset_identity)
11 0.000 0.000 0.000 0.000 pycompat.py:18(iteritems)
16 0.000 0.000 0.000 0.000 utils.py:279(is_integer)
14 0.000 0.000 0.000 0.000 variable.py:268(dtype)
4 0.000 0.000 0.000 0.000 indexing.py:698(_outer_to_numpy_indexer)
42 0.000 0.000 0.000 0.000 variable.py:701(attrs)
9 0.000 0.000 0.000 0.000 {built-in method builtins.any}
1 0.000 0.000 0.000 0.000 posixpath.py:338(normpath)
6 0.000 0.000 0.000 0.000 _collections_abc.py:676(items)
24 0.000 0.000 0.000 0.000 {built-in method math.isnan}
1 0.000 0.000 29.610 29.610 merge.py:360(merge_data_and_coords)
1 0.000 0.000 0.000 0.000 dataset.py:1084(set_coords)
1 0.000 0.000 0.001 0.001 common.py:99(load)
1 0.000 0.000 0.000 0.000 file_manager.py:250(decrement)
4 0.000 0.000 0.000 0.000 locks.py:154(__enter__)
7 0.000 0.000 0.000 0.000 netCDF4_.py:160(_ensure_fill_value_valid)
8 0.000 0.000 0.001 0.000 netCDF4_.py:393(<genexpr>)
8 0.000 0.000 0.000 0.000 common.py:572(is_categorical_dtype)
16 0.000 0.000 0.000 0.000 base.py:75(is_dtype)
72 0.000 0.000 0.000 0.000 indexing.py:327(as_integer_or_none)
26 0.000 0.000 0.000 0.000 utils.py:382(dispatch)
3 0.000 0.000 0.000 0.000 core.py:123(slices_from_chunks)
16 0.000 0.000 0.000 0.000 core.py:768(<genexpr>)
4 0.000 0.000 29.609 7.402 indexing.py:514(__array__)
4 0.000 0.000 0.000 0.000 indexing.py:1146(__init__)
4 0.000 0.000 0.000 0.000 indexing.py:1153(_indexing_array_and_key)
4 0.000 0.000 29.609 7.402 variable.py:400(to_index_variable)
30 0.000 0.000 0.000 0.000 {method 'items' of 'collections.OrderedDict' objects}
16 0.000 0.000 0.000 0.000 {built-in method _abc._abc_subclasscheck}
19 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}
1 0.000 0.000 0.000 0.000 combine.py:423(_check_shape_tile_ids)
4 0.000 0.000 0.000 0.000 merge.py:91(_assert_compat_valid)
12 0.000 0.000 0.000 0.000 dataset.py:263(<genexpr>)
1 0.000 0.000 29.610 29.610 dataset.py:372(_set_init_vars_and_dims)
3 0.000 0.000 0.000 0.000 dataset.py:413(_attrs_copy)
8 0.000 0.000 0.000 0.000 common.py:120(<genexpr>)
14 0.000 0.000 0.000 0.000 {built-in method pandas._libs.missing.checknull}
4 0.000 0.000 0.000 0.000 common.py:746(is_dtype_equal)
4 0.000 0.000 0.000 0.000 common.py:923(is_signed_integer_dtype)
4 0.000 0.000 0.000 0.000 common.py:1545(is_float_dtype)
14 0.000 0.000 0.000 0.000 missing.py:25(isna)
3 0.000 0.000 0.000 0.000 highlevelgraph.py:71(__init__)
3 0.000 0.000 0.000 0.000 core.py:137(<listcomp>)
33 0.000 0.000 0.000 0.000 core.py:1883(<genexpr>)
35 0.000 0.000 0.000 0.000 variable.py:713(encoding)
2 0.000 0.000 0.000 0.000 {built-in method builtins.min}
16 0.000 0.000 0.000 0.000 _collections_abc.py:719(__iter__)
8 0.000 0.000 0.000 0.000 _collections_abc.py:760(__iter__)
1 0.000 0.000 0.015 0.015 glob.py:9(glob)
2 0.000 0.000 0.015 0.008 glob.py:39(_iglob)
8 0.000 0.000 0.000 0.000 {method 'hexdigest' of '_hashlib.HASH' objects}
1 0.000 0.000 0.000 0.000 combine.py:500(_auto_combine_1d)
14 0.000 0.000 0.000 0.000 merge.py:104(__missing__)
1 0.000 0.000 0.000 0.000 coordinates.py:167(variables)
3 0.000 0.000 0.000 0.000 dataset.py:98(<genexpr>)
4 0.000 0.000 0.000 0.000 dataset.py:402(variables)
1 0.000 0.000 0.000 0.000 netCDF4_.py:269(_disable_auto_decode_group)
12 0.000 0.000 0.032 0.003 netCDF4_.py:357(ds)
1 0.000 0.000 29.646 29.646 api.py:637(<listcomp>)
9 0.000 0.000 0.000 0.000 utils.py:313(__init__)
7 0.000 0.000 0.000 0.000 {method 'filters' of 'netCDF4._netCDF4.Variable' objects}
12 0.000 0.000 0.000 0.000 common.py:117(classes)
8 0.000 0.000 0.000 0.000 common.py:536(is_interval_dtype)
4 0.000 0.000 0.000 0.000 common.py:1078(is_datetime64_any_dtype)
4 0.000 0.000 0.000 0.000 dtypes.py:827(is_dtype)
8 0.000 0.000 0.000 0.000 base.py:551(<dictcomp>)
8 0.000 0.000 0.000 0.000 base.py:547(_get_attributes_dict)
8 0.000 0.000 0.000 0.000 utils.py:789(__enter__)
18 0.000 0.000 0.000 0.000 core.py:903(_get_chunks)
33 0.000 0.000 0.000 0.000 core.py:1885(<genexpr>)
22 0.000 0.000 0.000 0.000 core.py:1889(<genexpr>)
4 0.000 0.000 0.000 0.000 indexing.py:799(_decompose_slice)
4 0.000 0.000 0.000 0.000 indexing.py:1174(__getitem__)
3 0.000 0.000 0.000 0.000 variable.py:294(data)
8 0.000 0.000 0.000 0.000 {method '__enter__' of '_thread.lock' objects}
9 0.000 0.000 0.000 0.000 {built-in method builtins.hash}
4 0.000 0.000 0.000 0.000 {built-in method builtins.max}
4 0.000 0.000 0.000 0.000 {method 'update' of 'set' objects}
7 0.000 0.000 0.000 0.000 {method 'values' of 'dict' objects}
8 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}
1 0.000 0.000 0.000 0.000 posixpath.py:376(abspath)
1 0.000 0.000 0.000 0.000 genericpath.py:53(getmtime)
4 0.000 0.000 0.000 0.000 _collections_abc.py:657(get)
1 0.000 0.000 0.000 0.000 __init__.py:548(__init__)
1 0.000 0.000 0.000 0.000 __init__.py:617(update)
4/2 0.000 0.000 0.000 0.000 combine.py:392(_infer_tile_ids_from_nested_list)
1 0.000 0.000 0.001 0.001 combine.py:522(_auto_combine)
2 0.000 0.000 0.000 0.000 merge.py:100(__init__)
5 0.000 0.000 0.000 0.000 coordinates.py:38(__iter__)
5 0.000 0.000 0.000 0.000 coordinates.py:169(<genexpr>)
1 0.000 0.000 0.000 0.000 dataset.py:666(_replace_vars_and_dims)
5 0.000 0.000 0.000 0.000 dataset.py:1078(data_vars)
1 0.000 0.000 0.000 0.000 file_manager.py:133(_make_key)
1 0.000 0.000 0.000 0.000 file_manager.py:245(increment)
1 0.000 0.000 0.000 0.000 lru_cache.py:54(__setitem__)
1 0.000 0.000 0.000 0.000 netCDF4_.py:398(get_attrs)
1 0.000 0.000 0.000 0.000 api.py:80(_get_default_engine)
1 0.000 0.000 0.000 0.000 api.py:92(_normalize_path)
8 0.000 0.000 0.000 0.000 {method 'view' of 'numpy.ndarray' objects}
8 0.000 0.000 0.000 0.000 utils.py:187(is_dict_like)
4 0.000 0.000 0.000 0.000 utils.py:219(is_valid_numpy_dtype)
10 0.000 0.000 0.000 0.000 utils.py:319(__iter__)
1 0.000 0.000 0.000 0.000 {method 'filepath' of 'netCDF4._netCDF4.Dataset' objects}
4 0.000 0.000 0.000 0.000 common.py:434(is_datetime64tz_dtype)
3 0.000 0.000 0.000 0.000 config.py:107(normalize_key)
3 0.000 0.000 0.000 0.000 core.py:160(<listcomp>)
6 0.000 0.000 0.000 0.000 core.py:966(ndim)
4 0.000 0.000 0.000 0.000 indexing.py:791(decompose_indexer)
8 0.000 0.000 0.000 0.000 {method '__exit__' of '_thread.lock' objects}
3 0.000 0.000 0.000 0.000 {method 'replace' of 'str' objects}
4 0.000 0.000 0.000 0.000 {method 'split' of 'str' objects}
1 0.000 0.000 0.000 0.000 posixpath.py:121(splitext)
1 0.000 0.000 0.000 0.000 genericpath.py:117(_splitext)
1 0.000 0.000 0.001 0.001 combine.py:443(_combine_nd)
1 0.000 0.000 0.000 0.000 combine.py:508(<listcomp>)
14 0.000 0.000 0.000 0.000 merge.py:41(unique_variable)
11 0.000 0.000 0.000 0.000 coordinates.py:163(_names)
1 0.000 0.000 0.000 0.000 dataset.py:2593(_assert_all_in_dataset)
1 0.000 0.000 0.000 0.000 variables.py:55(__init__)
1 0.000 0.000 0.000 0.000 file_manager.py:269(__init__)
29 0.000 0.000 0.000 0.000 file_manager.py:273(__hash__)
1 0.000 0.000 0.001 0.001 netCDF4_.py:392(get_variables)
1 0.000 0.000 0.000 0.000 netCDF4_.py:410(<setcomp>)
7 0.000 0.000 0.000 0.000 {method 'set_auto_chartostring' of 'netCDF4._netCDF4.Variable' objects}
1 0.000 0.000 0.000 0.000 {method 'ncattrs' of 'netCDF4._netCDF4.Dataset' objects}
4 0.000 0.000 0.000 0.000 common.py:472(is_timedelta64_dtype)
4 0.000 0.000 0.000 0.000 common.py:980(is_unsigned_integer_dtype)
4 0.000 0.000 0.000 0.000 base.py:3805(_coerce_to_ndarray)
3 0.000 0.000 0.000 0.000 itertoolz.py:241(unique)
11 0.000 0.000 0.000 0.000 core.py:137(<genexpr>)
3 0.000 0.000 0.000 0.000 indexing.py:600(__init__)
2 0.000 0.000 0.000 0.000 {method 'keys' of 'collections.OrderedDict' objects}
2 0.000 0.000 0.000 0.000 {built-in method _thread.allocate_lock}
1 0.000 0.000 0.000 0.000 {built-in method _collections._count_elements}
8 0.000 0.000 0.000 0.000 {method 'encode' of 'str' objects}
3 0.000 0.000 0.000 0.000 {method 'rfind' of 'str' objects}
8 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects}
3 0.000 0.000 0.000 0.000 {method 'intersection' of 'set' objects}
7 0.000 0.000 0.000 0.000 {method 'setdefault' of 'dict' objects}
13 0.000 0.000 0.000 0.000 {method 'pop' of 'dict' objects}
1 0.000 0.000 0.000 0.000 posixpath.py:64(isabs)
1 0.000 0.000 0.015 0.015 posixpath.py:178(lexists)
1 0.000 0.000 0.000 0.000 posixpath.py:232(expanduser)
2 0.000 0.000 0.000 0.000 _collections_abc.py:672(keys)
7 0.000 0.000 0.000 0.000 contextlib.py:352(__init__)
7 0.000 0.000 0.000 0.000 contextlib.py:355(__enter__)
2 0.000 0.000 0.000 0.000 combine.py:496(vars_as_keys)
2 0.000 0.000 0.000 0.000 combine.py:517(_new_tile_id)
7 0.000 0.000 0.000 0.000 common.py:29(_decode_variable_name)
1 0.000 0.000 0.000 0.000 coordinates.py:160(__init__)
3 0.000 0.000 0.000 0.000 dataset.py:262(__iter__)
2 0.000 0.000 0.000 0.000 dataset.py:266(__len__)
2 0.000 0.000 0.000 0.000 dataset.py:940(__iter__)
1 0.000 0.000 0.000 0.000 dataset.py:1071(coords)
7 0.000 0.000 0.000 0.000 dataset.py:1381(<genexpr>)
4 0.000 0.000 0.000 0.000 variables.py:61(dtype)
1 0.000 0.000 0.000 0.000 file_manager.py:189(__del__)
1 0.000 0.000 0.000 0.000 lru_cache.py:47(_enforce_size_limit)
1 0.000 0.000 0.000 0.000 netCDF4_.py:138(_nc4_require_group)
1 0.000 0.000 0.000 0.000 netCDF4_.py:408(get_encoding)
1 0.000 0.000 0.000 0.000 api.py:66(_get_default_engine_netcdf)
4 0.000 0.000 0.000 0.000 utils.py:197(<genexpr>)
1 0.000 0.000 0.000 0.000 alignment.py:17(_get_joiner)
10 0.000 0.000 0.000 0.000 alignment.py:184(is_alignable)
5 0.000 0.000 0.000 0.000 alignment.py:226(<genexpr>)
5 0.000 0.000 0.000 0.000 utils.py:325(__contains__)
5 0.000 0.000 0.000 0.000 {method 'isunlimited' of 'netCDF4._netCDF4.Dimension' objects}
8 0.000 0.000 0.000 0.000 inference.py:435(is_hashable)
12 0.000 0.000 0.000 0.000 common.py:119(<lambda>)
8 0.000 0.000 0.000 0.000 common.py:127(<lambda>)
8 0.000 0.000 0.000 0.000 common.py:122(classes_and_not_datetimelike)
4 0.000 0.000 0.000 0.000 base.py:675(dtype)
8 0.000 0.000 0.000 0.000 base.py:1395(nlevels)
24 0.000 0.000 0.000 0.000 functoolz.py:15(identity)
1 0.000 0.000 0.000 0.000 base.py:610(normalize_dict)
1 0.000 0.000 0.000 0.000 base.py:625(normalize_seq)
3 0.000 0.000 0.000 0.000 indexing.py:453(__init__)
4 0.000 0.000 0.000 0.000 indexing.py:713(<listcomp>)
3 0.000 0.000 0.000 0.000 variable.py:821(chunks)
4 0.000 0.000 0.000 0.000 variable.py:1731(chunk)
8 0.000 0.000 0.000 0.000 variable.py:1874(name)
3 0.000 0.000 0.000 0.000 {method 'values' of 'collections.OrderedDict' objects}
6 0.000 0.000 0.000 0.000 {built-in method posix.fspath}
1 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
4 0.000 0.000 0.000 0.000 {method 'startswith' of 'str' objects}
3 0.000 0.000 0.000 0.000 {method 'copy' of 'set' objects}
1 0.000 0.000 0.000 0.000 {method 'union' of 'set' objects}
1 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
2 0.000 0.000 0.000 0.000 posixpath.py:41(_get_sep)
1 0.000 0.000 0.000 0.000 _collections_abc.py:680(values)
9 0.000 0.000 0.000 0.000 _collections_abc.py:698(__init__)
7 0.000 0.000 0.000 0.000 contextlib.py:358(__exit__)
1 0.000 0.000 0.000 0.000 glob.py:145(has_magic)
1 0.000 0.000 0.000 0.000 combine.py:428(<listcomp>)
2 0.000 0.000 0.000 0.000 merge.py:301(_get_priority_vars)
1 0.000 0.000 0.000 0.000 merge.py:370(extract_indexes)
1 0.000 0.000 0.000 0.000 merge.py:378(assert_valid_explicit_coords)
5 0.000 0.000 0.000 0.000 dataset.py:259(__init__)
1 0.000 0.000 0.000 0.000 dataset.py:375(<listcomp>)
2 0.000 0.000 0.000 0.000 dataset.py:416(attrs)
5 0.000 0.000 0.000 0.000 dataset.py:428(encoding)
1 0.000 0.000 0.000 0.000 dataset.py:436(encoding)
1 0.000 0.000 0.000 0.000 dataset.py:1373(<listcomp>)
1 0.000 0.000 0.000 0.000 variables.py:76(lazy_elemwise_func)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
7 0.000 0.000 0.000 0.000 strings.py:39(__init__)
1 0.000 0.000 0.000 0.000 file_manager.py:241(__init__)
1 0.000 0.000 0.000 0.000 locks.py:206(ensure_lock)
1 0.000 0.000 0.000 0.000 netCDF4_.py:236(__init__)
1 0.000 0.000 0.000 0.000 api.py:638(<listcomp>)
1 0.000 0.000 0.000 0.000 utils.py:452(_tostr)
7 0.000 0.000 0.000 0.000 {method 'set_auto_maskandscale' of 'netCDF4._netCDF4.Variable' objects}
1 0.000 0.000 0.000 0.000 utils.py:514(is_grib_path)
3 0.000 0.000 0.000 0.000 core.py:989(name)
8 0.000 0.000 0.000 0.000 variable.py:1834(to_index_variable)
1 0.000 0.000 0.000 0.000 {method 'rstrip' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'endswith' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'keys' of 'dict' objects}
1 0.000 0.000 0.000 0.000 glob.py:22(iglob)
2 0.000 0.000 0.000 0.000 variable.py:2007(<listcomp>)
1 0.000 0.000 0.000 0.000 combine.py:345(_auto_concat)
1 0.000 0.000 0.000 0.000 combine.py:435(<listcomp>)
1 0.000 0.000 0.000 0.000 merge.py:519(<listcomp>)
2 0.000 0.000 0.000 0.000 dataset.py:934(__len__)
2 0.000 0.000 0.000 0.000 variables.py:106(safe_setitem)
1 0.000 0.000 0.000 0.000 api.py:479(__init__)
1 0.000 0.000 0.000 0.000 utils.py:20(_check_inplace)
7 0.000 0.000 0.000 0.000 {method 'chunking' of 'netCDF4._netCDF4.Variable' objects}
4 0.000 0.000 0.000 0.000 utils.py:498(close_on_error)
1 0.000 0.000 0.000 0.000 numeric.py:101(_assert_safe_casting)
3 0.000 0.000 0.000 0.000 core.py:167(<listcomp>)
```

Output of ds:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
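For reference, a roughly equivalent way to produce this kind of profile outside IPython is the standard-library profiler; this is only a minimal sketch, and the file path is hypothetical:

```python
import cProfile
import pstats

import xarray as xr

fname = "/path/to/atmos.20100101-20141231.sphum.nc"  # hypothetical path

# Profile the open_mfdataset call, mirroring the %prun runs shown above.
profiler = cProfile.Profile()
profiler.enable()
ds = xr.open_mfdataset(fname, decode_times=False)
profiler.disable()

# Print the ten entries with the largest internal time ("Ordered by: internal time").
pstats.Stats(profiler).sort_stats("tottime").print_stats(10)
```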
464113917 | https://github.com/pydata/xarray/issues/1385#issuecomment-464113917 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDQ2NDExMzkxNw== | chuaxr 30007270 | 2019-02-15T16:34:02Z | 2019-02-15T16:34:35Z | NONE | On a related note, is it possible to clear out the memory used by the xarray dataset after it is no longer needed? Here's an example:
```python
fname2 = '/work/xrc/AM4_xrc/c192L33_am4p0_cmip6Diag/daily/5yr/atmos.20100101-20141231.ucomp.nc'
```

```python
with xr.set_options(file_cache_maxsize=1):
    %time ds = xr.open_mfdataset(fname2)  # would like this to free up memory used by fname
```

```python
with xr.set_options(file_cache_maxsize=1):
    # expected to take same time as first call
    %time ds = xr.open_mfdataset(fname)
```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
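The usual way to release the file handles behind an open_mfdataset result is to close the dataset explicitly or open it in a `with` block; whether that also returns the memory observed above is not settled in this thread. A minimal sketch, with a hypothetical path:

```python
import xarray as xr

fname = "/path/to/atmos.20100101-20141231.ucomp.nc"  # hypothetical path

# Closing the dataset releases the underlying netCDF file handles it holds open.
ds = xr.open_mfdataset(fname)
# ... work with ds ...
ds.close()

# Equivalent scoped form; the files are closed when the block exits.
with xr.open_mfdataset(fname) as ds:
    climatology = ds.mean("time")  # still lazy; nothing is computed here
```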
463367754 | https://github.com/pydata/xarray/issues/1385#issuecomment-463367754 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDQ2MzM2Nzc1NA== | chuaxr 30007270 | 2019-02-13T20:58:52Z | 2019-02-13T20:59:06Z | NONE | It seems my issue has to do with the time coordinate:

```
fname = '/work/xrc/AM4_xrc/c192L33_am4p0_cmip6Diag/daily/5yr/atmos.20100101-20141231.sphum.nc'

%prun ds = xr.open_mfdataset(fname,drop_variables='time')

         7510 function calls (7366 primitive calls) in 0.068 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.039    0.039    0.039    0.039 netCDF4_.py:244(_open_netcdf4_group)
        3    0.022    0.007    0.022    0.007 {built-in method _operator.getitem}
        1    0.001    0.001    0.001    0.001 {built-in method posix.lstat}
  125/113    0.000    0.000    0.001    0.000 indexing.py:504(shape)
       11    0.000    0.000    0.000    0.000 core.py:137(<genexpr>)
```

```
fname = '/work/xrc/AM4_xrc/c192L33_am4p0_cmip6Diag/daily/5yr/atmos.20000101-20041231.sphum.nc'

%prun ds = xr.open_mfdataset(fname)

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        6   23.791    3.965   23.791    3.965 {built-in method _operator.getitem}
        1    0.029    0.029    0.029    0.029 netCDF4_.py:244(_open_netcdf4_group)
        2    0.023    0.012    0.023    0.012 {cftime._cftime.num2date}
        1    0.001    0.001    0.001    0.001 {built-in method posix.lstat}
  158/139    0.000    0.000    0.001    0.000 indexing.py:504(shape)
```

Both files are 33 GB. This is using xarray 0.11.3. I also confirm that nc.MFDataset is much faster (<1 s). Is there any speed-up possible for the time coordinates, given that my data follows a standard calendar? (Short of using drop_variables='time' and then manually adding the time coordinate...) |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
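One pattern sometimes used to separate the cost of decoding from the cost of reading is to open with decode_times=False and apply CF decoding as a second step; whether this avoids the slow getitem seen in the profiles above is not established in this thread. A minimal sketch, with a hypothetical path:

```python
import xarray as xr

fname = "/path/to/atmos.20000101-20041231.sphum.nc"  # hypothetical path

# Open without decoding times, then decode CF metadata (including the
# time coordinate) in a separate, explicit step.
ds_raw = xr.open_mfdataset(fname, decode_times=False)
ds = xr.decode_cf(ds_raw)
```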
439478904 | https://github.com/pydata/xarray/issues/1385#issuecomment-439478904 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDQzOTQ3ODkwNA== | chuaxr 30007270 | 2018-11-16T18:10:53Z | 2018-11-16T18:10:53Z | NONE | h5netcdf fails with the following error (presumably the file is not compatible):

```
/nbhome/xrc/anaconda2/envs/py361/lib/python3.6/site-packages/h5py/_hl/files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
     97         if swmr and swmr_support:
     98             flags |= h5f.ACC_SWMR_READ
---> 99         fid = h5f.open(name, flags, fapl=fapl)
    100     elif mode == 'r+':
    101         fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5f.pyx in h5py.h5f.open()

OSError: Unable to open file (file signature not found)
```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
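A plausible reading of that traceback is that the file is in classic netCDF-3 format, which h5netcdf cannot open because it only reads HDF5-based files. A minimal sketch of checking the on-disk format before choosing an engine, assuming a hypothetical path:

```python
import netCDF4
import xarray as xr

fname = "/path/to/atmos_level.1999010100-2000123123.sphum.nc"  # hypothetical path

# "file signature not found" from h5py usually means the file is not HDF5-based.
with netCDF4.Dataset(fname) as nc:
    data_model = nc.data_model  # e.g. 'NETCDF4' or 'NETCDF3_64BIT_OFFSET'

# Only NETCDF4 / NETCDF4_CLASSIC files are HDF5-based and readable by h5netcdf.
engine = "h5netcdf" if data_model.startswith("NETCDF4") else "netcdf4"
ds = xr.open_mfdataset(fname, engine=engine)
```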
439445695 | https://github.com/pydata/xarray/issues/1385#issuecomment-439445695 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDQzOTQ0NTY5NQ== | chuaxr 30007270 | 2018-11-16T16:20:25Z | 2018-11-16T16:20:25Z | NONE | Sorry, I think the speedup had to do with accessing a file that had previously been loaded rather than due to

Output of %prun ds = xr.open_mfdataset('/work/xrc/AM4_skc/atmos_level.1999010100-2000123123.sphum.nc',chunks={'lat':20,'time':50,'lon':12,'pfull':11})

```
```

/work isn't a remote archive, so it surprises me that this should happen. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
439042364 | https://github.com/pydata/xarray/issues/1385#issuecomment-439042364 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDQzOTA0MjM2NA== | chuaxr 30007270 | 2018-11-15T13:37:16Z | 2018-11-15T14:06:04Z | NONE | Yes, I'm on 0.11. Nothing displays on the task stream/progress bar when using The output from and for decode_cf = True:
Using If I repeat the open_mfdataset for another 5 files (after opening the first 5), I occasionally get this warning:
I only began using the dashboard recently; please let me know if there's something basic I'm missing. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 | |
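For the dashboard to show a task stream at all, a dask.distributed Client must be running; otherwise the default threaded scheduler does the work and nothing is reported there. This is a minimal sketch of that setup, not something confirmed in the thread, and the glob pattern and chunk sizes are hypothetical:

```python
import xarray as xr
from dask.distributed import Client

# Start a local distributed cluster; its dashboard URL is printed below.
client = Client()
print(client.dashboard_link)

ds = xr.open_mfdataset(
    "/path/to/atmos.*.nc",  # hypothetical glob pattern
    chunks={"time": 50, "pfull": 11, "lat": 20, "lon": 24},
)
result = ds.mean("time").compute()  # tasks should now appear on the task stream
```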
438870575 | https://github.com/pydata/xarray/issues/1385#issuecomment-438870575 | https://api.github.com/repos/pydata/xarray/issues/1385 | MDEyOklzc3VlQ29tbWVudDQzODg3MDU3NQ== | chuaxr 30007270 | 2018-11-15T00:32:42Z | 2018-11-15T00:32:42Z | NONE | I can confirm that
```
ds = xr.open_mfdataset(data_fnames,chunks={'lat':20,'time':50,'lon':24,'pfull':11})
```

For reference, data_fnames is a list of 5 files, each of which is ~75 GB. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
slow performance with open_mfdataset 224553135 |
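A minimal sketch of how such a data_fnames list might be built and opened with those chunks; the glob pattern is hypothetical:

```python
import glob

import xarray as xr

# Hypothetical reconstruction of data_fnames: a sorted list of the five ~75 GB files.
data_fnames = sorted(glob.glob("/path/to/atmos.*.sphum.nc"))

ds = xr.open_mfdataset(
    data_fnames,
    chunks={"lat": 20, "time": 50, "lon": 24, "pfull": 11},
)
```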
```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```