html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2313#issuecomment-778554202,https://api.github.com/repos/pydata/xarray/issues/2313,778554202,MDEyOklzc3VlQ29tbWVudDc3ODU1NDIwMg==,17162724,2021-02-13T03:20:58Z,2021-02-13T03:20:58Z,CONTRIBUTOR,"Edit: Copied and pasted from a duplicate issue I opened. Closing that and moving convo here.

@jhamman's SO answer circa 2018 helped me this week https://stackoverflow.com/a/51714004/6046019

I wonder if it's worth (not sure where) providing an example of how to use `preprocess` with `open_mfdataset`?

Add an Examples entry to the docstring? (http://xarray.pydata.org/en/latest/generated/xarray.open_mfdataset.html / https://github.com/pydata/xarray/blob/5296ed18272a856d478fbbb3d3253205508d1c2d/xarray/backends/api.py#L895)

While not a small example (the remote files are large), this is how I used it:

```
import xarray as xr
import s3fs


def preprocess(ds):
    return ds.expand_dims('time')

fs = s3fs.S3FileSystem(anon=True)
f1 = fs.open('s3://fmi-opendata-rcrhirlam-surface-grib/2021/02/03/00/numerical-hirlam74-forecast-MaximumWind-20210203T000000Z.grb2')
f2 = fs.open('s3://fmi-opendata-rcrhirlam-surface-grib/2021/02/03/06/numerical-hirlam74-forecast-MaximumWind-20210203T060000Z.grb2')

ds = xr.open_mfdataset([f1, f2], engine=""cfgrib"", preprocess=preprocess, parallel=True)
```

with one file looking like:
```
xr.open_dataset(""LOCAL_numerical-hirlam74-forecast-MaximumWind-20210203T000000Z.grb2"", engine=""cfgrib"")
<xarray.Dataset>
Dimensions:            (latitude: 947, longitude: 5294, step: 55)
Coordinates:
    time               datetime64[ns] ...
  * step               (step) timedelta64[ns] 01:00:00 ... 2 days 07:00:00
    heightAboveGround  int64 ...
  * latitude           (latitude) float64 25.65 25.72 25.78 ... 89.86 89.93 90.0
  * longitude          (longitude) float64 -180.0 -179.9 -179.9 ... 179.9 180.0
    valid_time         (step) datetime64[ns] ...
Data variables:
    fg10               (step, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2021-02-12T18:06:52 GRIB to CDM+CF via cfgrib-0....
```
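
In the output above, `time` is a scalar coordinate (no `*` marker) rather than a dimension, which is why `preprocess` calls `expand_dims('time')` before the files are combined. Below is a minimal in-memory sketch of that step. It uses small synthetic datasets (the `make_file` helper and its values are made up for illustration) and `xr.concat` as a stand-in for the combine step that `open_mfdataset` runs after `preprocess`:

```python
import numpy as np
import xarray as xr


def make_file(time):
    # Hypothetical stand-in for one opened GRIB file: `time` is a scalar coordinate.
    return xr.Dataset(
        {'fg10': (('step', 'latitude'), np.zeros((3, 2), dtype='float32'))},
        coords={
            'time': np.datetime64(time, 'ns'),
            'step': np.arange(3),
            'latitude': [25.65, 25.72],
        },
    )


def preprocess(ds):
    # Promote the scalar `time` coordinate to a length-1 dimension.
    return ds.expand_dims('time')


files = [make_file('2021-02-03T00:00'), make_file('2021-02-03T06:00')]

# Stand-in for what open_mfdataset does after calling preprocess on each file.
combined = xr.concat([preprocess(ds) for ds in files], dim='time')
print(combined.sizes)
```

With `open_mfdataset`, the same function is passed as `preprocess=preprocess` and is applied to each file's dataset before they are combined.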

A smaller example could be the following (WIP: I was hoping `ds` would be concatenated along `t`, but that is not what happens):
```
import numpy as np
import xarray as xr

f1 = xr.DataArray(np.arange(2), coords=[np.arange(2)], dims=[""a""], name=""f1"")
f1 = f1.assign_coords(t=0)
f1.to_dataset().to_zarr(""f1.zarr"")  # What's the best way to store small files to reopen with open_mfdataset? CSV via xarray objects? Can open_mfdataset read pickled objects?

f2 = xr.DataArray(np.arange(2), coords=[np.arange(2)], dims=[""a""], name=""f2"")
f2 = f2.assign_coords(t=1)
f2.to_dataset().to_zarr(""f2.zarr"")

# Concat along t
def preprocess(ds):
    return ds.expand_dims('t')
ds = xr.open_mfdataset([""f1.zarr"", ""f2.zarr""], engine=""zarr"", concat_dim=""t"", preprocess=preprocess)
>>> ds
<xarray.Dataset>
Dimensions:  (a: 2, t: 1)
Coordinates:
  * t        (t) int64 0
  * a        (a) int64 0 1
Data variables:
    f1       (t, a) int64 dask.array<chunksize=(1, 2), meta=np.ndarray>
    f2       (t, a) int64 dask.array<chunksize=(1, 2), meta=np.ndarray>
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,344614881