html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2662#issuecomment-454439392,https://api.github.com/repos/pydata/xarray/issues/2662,454439392,MDEyOklzc3VlQ29tbWVudDQ1NDQzOTM5Mg==,22245117,2019-01-15T15:45:03Z,2019-01-15T15:45:03Z,CONTRIBUTOR,I checked PR #2678 against the data that originated the issue and it fixes the problem! ,"{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,397063221
https://github.com/pydata/xarray/issues/2662#issuecomment-454086847,https://api.github.com/repos/pydata/xarray/issues/2662,454086847,MDEyOklzc3VlQ29tbWVudDQ1NDA4Njg0Nw==,22245117,2019-01-14T17:20:03Z,2019-01-14T17:20:03Z,CONTRIBUTOR,"I've written a small script to reproduce the problem.
@TomNicholas it looks like the datasets are opened correctly. The problem arises when `open_mfdataset` calls `_auto_combine`. Note that `_auto_combine` was introduced in v0.11.1.

```python
import os

import numpy as np
import xarray as xr

Tsize = 100; T = np.arange(Tsize)
Xsize = 900; X = np.arange(Xsize)
Ysize = 800; Y = np.arange(Ysize)
data = np.random.randn(Tsize, Xsize, Ysize)

for i in range(2):
    # Create 2 datasets with different variables
    dsA = xr.Dataset({'A': xr.DataArray(data, coords={'T': T + i*Tsize}, dims=('T', 'X', 'Y'))})
    dsB = xr.Dataset({'B': xr.DataArray(data, coords={'T': T + i*Tsize}, dims=('T', 'X', 'Y'))})

    # Save datasets in one folder
    dsA.to_netcdf('dsA' + str(i) + '.nc')
    dsB.to_netcdf('dsB' + str(i) + '.nc')

    # Save datasets in two folders (exist_ok lets the script be re-run)
    dirname = 'rep' + str(i)
    os.makedirs(dirname, exist_ok=True)
    dsA.to_netcdf(os.path.join(dirname, 'dsA' + str(i) + '.nc'))
    dsB.to_netcdf(os.path.join(dirname, 'dsB' + str(i) + '.nc'))
```

### Fast if netCDFs are stored in one folder:
```python
%%time
ds_1folder = xr.open_mfdataset('*.nc', concat_dim='T')
```
CPU times: user 49.9 ms, sys: 5.06 ms, total: 55 ms
Wall time: 59.7 ms

### Slow if netCDFs are stored in several folders:
```python
%%time
ds_2folders = xr.open_mfdataset('rep*/*.nc', concat_dim='T')
```
CPU times: user 8.6 s, sys: 5.95 s, total: 14.6 s
Wall time: 10.3 s

### Fast if files containing different variables are opened separately, then merged:
```python
%%time
ds_A = xr.open_mfdataset('rep*/dsA*.nc', concat_dim='T')
ds_B = xr.open_mfdataset('rep*/dsB*.nc', concat_dim='T')
ds_merged = xr.merge([ds_A, ds_B])
```
CPU times: user 33.8 ms, sys: 3.7 ms, total: 37.5 ms
Wall time: 34.5 ms
","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,397063221