id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
561921094,MDU6SXNzdWU1NjE5MjEwOTQ=,3762,xarray groupby/map fails to parallelize,6491058,closed,1,,,4,2020-02-07T23:20:59Z,2023-09-15T15:52:42Z,2023-09-15T15:52:41Z,NONE,,,,"#### MCVE Code Sample

<!-- In order for the maintainers to efficiently understand and prioritize issues, we ask you post a ""Minimal, Complete and Verifiable Example"" (MCVE): http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports -->

```python
import sys
import math
import logging

import dask
import dask.distributed  # needed for dask.distributed.Client; `import dask` alone does not pull this in
import xarray
import numpy

logger = logging.getLogger('main')

if __name__ == '__main__':
    logging.basicConfig(
        stream=sys.stdout,
        format='%(asctime)s %(levelname)-8s %(message)s',
        level=logging.INFO,
        datefmt='%Y-%m-%d %H:%M:%S')

    logger.info('Starting dask client')
    client = dask.distributed.Client()

    SIZE = 100000
    SONAR_BINS = 2000

    time = range(0, SIZE)
    upper_limit = numpy.random.randint(0, 10, (SIZE))
    lower_limit = numpy.random.randint(20, 30, (SIZE))
    sonar_data = numpy.random.randint(0, 255, (SIZE, SONAR_BINS))

    channel = xarray.Dataset(
        {
            'upper_limit': (['time'], upper_limit, {'units': 'depth meters'}),
            'lower_limit': (['time'], lower_limit, {'units': 'depth meters'}),
            'data': (['time', 'depth_bin'], sonar_data, {'units': 'amplitude'}),
        },
        coords={
            'depth_bin': (['depth_bin'], range(0, SONAR_BINS)),
            'time': (['time'], time)
        })

    logger.info('get overall min/max radar range we want to normalize to, called the adjusted range')
    adjusted_min, adjusted_max = channel.upper_limit.min().values.item(), channel.lower_limit.max().values.item()
    adjusted_min = math.floor(adjusted_min)
    adjusted_max = math.ceil(adjusted_max)
    logger.info('adjusted_min: %s, adjusted_max: %s', adjusted_min, adjusted_max)

    bin_count = len(channel.depth_bin)
    logger.info('bin_count: %s', bin_count)
    adjusted_depth_per_bin = (adjusted_max - adjusted_min) / bin_count
    logger.info('adjusted_depth_per_bin: %s', adjusted_depth_per_bin)

    adjusted_bin_depths = [adjusted_min + (j * adjusted_depth_per_bin) for j in range(0, bin_count)]
    logger.info('adjusted_bin_depths[0]: %s ... [-1]: %s', adjusted_bin_depths[0], adjusted_bin_depths[-1])

    def Interp(ds):
        # Ideally instead of using interp we would use some kind of downsampling and shift.
        # That doesn't exist in xarray though, and interp is good enough for the moment.

        # I just added this to debug
        t = ds.time.values.item()
        if (t % 100) == 0:
            total = len(channel.time)
            perc = 100.0 * t / total
            logger.info('%s : %s of %s', perc, t, total)

        unadjusted_depth_amplitudes = ds.data
        unadjusted_min = ds.upper_limit.values.item()
        unadjusted_max = ds.lower_limit.values.item()
        unadjusted_depth_per_bin = (unadjusted_max - unadjusted_min) / bin_count

        index_mapping = [((adjusted_min + (bin * adjusted_depth_per_bin)) - unadjusted_min) / unadjusted_depth_per_bin
                         for bin in range(0, bin_count)]
        adjusted_depth_amplitudes = unadjusted_depth_amplitudes.interp(coords={'depth_bin': index_mapping}, method='linear', assume_sorted=True)
        adjusted_depth_amplitudes = adjusted_depth_amplitudes.rename({'depth_bin': 'depth'}).assign_coords({'depth': adjusted_bin_depths})
        #logger.info('%s, \n\tunadjusted_depth_amplitudes.values:%s\n\tunadjusted_min:%s\n\tunadjusted_max:%s\n\tunadjusted_depth_per_bin:%s\n\tindex_mapping:%s\n\tadjusted_depth_amplitudes:%s\n\tadjusted_depth_amplitudes.values:%s\n\n', ds, unadjusted_depth_amplitudes.values, unadjusted_min, unadjusted_max, unadjusted_depth_per_bin, index_mapping, adjusted_depth_amplitudes, adjusted_depth_amplitudes.values)
        return adjusted_depth_amplitudes

    # Let's split into chunks so the work could be performed in parallel.
    # This doesn't parallelize anything and only slows it down a lot.
    #logger.info('chunk')
    #channel = channel.chunk({'time': 100})

    logger.info('groupby')
    g = channel.groupby('time')

    logger.info('do interp')
    normalized_depth_data = g.map(Interp)
    logger.info('done')
```

#### Expected Output

I am fairly new to xarray, but I expected this example to run better than it currently does. As far as I can tell, each call of the custom function passed to `map` should be parallelizable; I imagined that xarray would chunk the data behind the scenes and run the calls in parallel on dask. Instead it is very slow even in the single-threaded case, and it doesn't parallelize at all. On my hardware each map call takes roughly 5 ms without chunking and 70 ms with the `chunk` call that is commented out in the code above.

#### Problem Description

<!-- this should explain why the current behavior is a problem and why the expected output is a better solution -->

The single-threaded performance is very slow, and on top of that the computation fails to parallelize across the cores on my machine. For more background on what I am trying to do, I also asked a Stack Overflow question (linked below) about how to reorganize the code to improve performance. I believe the current behavior is a performance bug (assuming I didn't do something completely wrong in the code).
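As an aside, I suspect the whole normalization could be expressed as a few vectorized numpy operations instead of 100,000 tiny per-timestep `map` calls. Here is a rough sketch of what I mean, with smaller array sizes so it runs quickly; `interp_rows` is a helper I wrote just for this illustration, and unlike `interp` it clamps out-of-range indices to the edge bins instead of filling with NaN:

```python
import numpy as np

def interp_rows(data, idx):
    # Linearly interpolate each row of `data` (shape (T, B)) at the
    # per-row fractional bin indices `idx` (also shape (T, B)).
    # Indices outside [0, B-1] are clamped to the edge bins.
    T, B = data.shape
    base = np.floor(idx)
    frac = idx - base                          # fractional part, in [0, 1)
    lo = np.clip(base.astype(int), 0, B - 1)   # lower neighbour bin
    hi = np.clip(lo + 1, 0, B - 1)             # upper neighbour bin
    rows = np.arange(T)[:, None]               # broadcast row selector
    return data[rows, lo] * (1 - frac) + data[rows, hi] * frac

# Same setup as the MCVE, but smaller.
T, B = 1000, 200
rng = np.random.default_rng(0)
upper = rng.integers(0, 10, T).astype(float)     # per-ping upper limit
lower = rng.integers(20, 30, T).astype(float)    # per-ping lower limit
data = rng.integers(0, 255, (T, B)).astype(float)

# Global ("adjusted") depth grid, as in the MCVE.
adj_min, adj_max = np.floor(upper.min()), np.ceil(lower.max())
adjusted_grid = adj_min + np.arange(B) * (adj_max - adj_min) / B    # (B,)

# The MCVE's per-timestep `index_mapping`, computed for every ping at
# once as a single (T, B) array of fractional source indices.
per_bin = (lower - upper) / B                                       # (T,)
idx = (adjusted_grid[None, :] - upper[:, None]) / per_bin[:, None]  # (T, B)

normalized = interp_rows(data, idx)
print(normalized.shape)  # (1000, 200)
```

This replaces the groupby entirely, so there is no per-group overhead left to worry about; it also chunks naturally along `time` if the arrays are dask-backed.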
https://stackoverflow.com/questions/60103317/can-the-performance-of-using-xarray-groupby-map-be-improved

#### Output of ``xr.show_versions()``

<details>

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 21:48:41) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.14.1
pandas: 0.25.3
numpy: 1.17.3
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: None
dask: 2.9.1
distributed: 2.9.1
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 44.0.0.post20200102
pip: 19.3.1
conda: None
pytest: None
IPython: 7.11.1
sphinx: None

</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3762/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue