html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2313#issuecomment-1468024753,https://api.github.com/repos/pydata/xarray/issues/2313,1468024753,IC_kwDOAMm_X85XgEex,61923007,2023-03-14T12:35:00Z,2023-03-14T12:35:00Z,NONE,"I'll like to work on this @TomNicholas, where do I start from?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,344614881
https://github.com/pydata/xarray/issues/2313#issuecomment-1142381856,https://api.github.com/repos/pydata/xarray/issues/2313,1142381856,IC_kwDOAMm_X85EF10g,2448579,2022-05-31T16:51:57Z,2022-05-31T16:51:57Z,MEMBER,"I bet you need to `ds.precip.max(""time"", keepdims=True)` so that the returned value has the `time` dimension. Please ask usage questions at https://github.com/pydata/xarray/discussions next time","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,344614881
https://github.com/pydata/xarray/issues/2313#issuecomment-1135302642,https://api.github.com/repos/pydata/xarray/issues/2313,1135302642,IC_kwDOAMm_X85Dq1fy,54370222,2022-05-24T01:31:22Z,2022-05-24T01:31:22Z,NONE,"Hello:
I have to find maximum precipitation of each year (for example: 2007 and 2008, Dataset link are: [2007](https://downloads.psl.noaa.gov/Datasets/cpc_us_precip/RT/precip.V1.0.2007.nc) and [2008](https://downloads.psl.noaa.gov/Datasets/cpc_us_precip/RT/precip.V1.0.2008.nc)). I have done this using resample method (i.e. `.resample(time='Y').max()`) after concatenating it along time dimension.
Following along [SO](https://stackoverflow.com/questions/51709266/using-xarray-to-open-a-multi-file-dataset-when-both-the-files-and-dataset-have-a), I am wondering if I can use preprocess to find maximum (or minimum or average) for each file first and then concatenate it using time dimension. I tried the following code and was not successful. Can someone help me with this?
```import dask.array as da
import numpy as np
import xarray as xr
from dask.distributed import Client
client = Client()
client
def preprocess_func(ds):
'''Get maximum (or minimum or average) from each file and concatenate along time'''
return ds.precip.max('time')
prec_ds=xr.open_mfdataset([prec_2007,prec_2008],
chunks={""lat"": 25,""lon"": 25,""time"": -1,},
preprocess=preprocess_func,
concat_dim='time')```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,344614881
https://github.com/pydata/xarray/issues/2313#issuecomment-1062761948,https://api.github.com/repos/pydata/xarray/issues/2313,1062761948,IC_kwDOAMm_X84_WHXc,30007270,2022-03-09T10:13:09Z,2022-03-09T10:13:09Z,NONE,"Seconding @dcherian's comment in #4901 on an example for `.encoding['source']`. Working off @raybellwaves' example, something like this would have been useful to me:
```
>>> import xarray as xr
>>> import numpy as np
>>> model1 = xr.DataArray(np.arange(2), coords=[np.arange(2)], name=""f"")
>>> model1.to_dataset().to_netcdf(""model1.nc"")
>>> model2 = xr.DataArray(np.arange(2), coords=[np.arange(2)], name=""f"")
>>> model2.to_dataset().to_netcdf(""model2.nc"")
>>> ds = xr.open_mfdataset(
... [""model1.nc"", ""model2.nc""],
... preprocess=lambda ds: ds.expand_dims(
... {""model_name"": [ds.encoding[""source""].split(""/"")[-1].split(""."")[0]]}
... ),
... )
>>> ds
Dimensions: (dim_0: 2, model_name: 2)
Coordinates:
* dim_0 (dim_0) int64 0 1
* model_name (model_name) object 'model1' 'model2'
Data variables:
f (model_name, dim_0) int64 dask.array
```
On that note, the example above seems to work with some slight changes:
```
>>> import numpy as np
>>> import xarray as xr
>>>
>>> f1 = xr.DataArray(np.arange(2), coords=[np.arange(2)], dims=[""a""], name=""f1"")
>>> f1 = f1.assign_coords(t='t0')
>>> f1.to_dataset().to_netcdf(""f1.nc"")
>>>
>>> f2 = xr.DataArray(np.arange(2), coords=[np.arange(2)], dims=[""a""], name=""f2"")
>>> f2 = f2.assign_coords(t='t1')
>>> f2.to_dataset().to_netcdf(""f2.nc"")
>>>
>>> # Concat along t
>>> def preprocess(ds):
... return ds.expand_dims(""t"")
...
>>>
>>> ds = xr.open_mfdataset([""f1.nc"", ""f2.nc""], concat_dim=""t"", preprocess=preprocess)
>>> ds
Dimensions: (a: 2, t: 2)
Coordinates:
* t (t) object 't0' 't1'
* a (a) int64 0 1
Data variables:
f1 (t, a) float64 dask.array
f2 (t, a) float64 dask.array
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,344614881
https://github.com/pydata/xarray/issues/2313#issuecomment-778554202,https://api.github.com/repos/pydata/xarray/issues/2313,778554202,MDEyOklzc3VlQ29tbWVudDc3ODU1NDIwMg==,17162724,2021-02-13T03:20:58Z,2021-02-13T03:20:58Z,CONTRIBUTOR,"Edit: Copied and pasted from a duplicate issue I opened. Closing that and moving convo here.
@jhamman's SO answer circa 2018 helped me this week https://stackoverflow.com/a/51714004/6046019
I wonder if it's worth (not sure where) providing an example of how to use `preprocesses` with `open_mfdataset`?
Add an Examples entry to the doc string? (http://xarray.pydata.org/en/latest/generated/xarray.open_mfdataset.html / https://github.com/pydata/xarray/blob/5296ed18272a856d478fbbb3d3253205508d1c2d/xarray/backends/api.py#L895)
While not a small example (as the remote files are large) this is how I used it:
```
import xarray as xr
import s3fs
def preprocess(ds):
return ds.expand_dims('time')
fs = s3fs.S3FileSystem(anon=True)
f1 = fs.open('s3://fmi-opendata-rcrhirlam-surface-grib/2021/02/03/00/numerical-hirlam74-forecast-MaximumWind-20210203T000000Z.grb2')
f2 = fs.open('s3://fmi-opendata-rcrhirlam-surface-grib/2021/02/03/06/numerical-hirlam74-forecast-MaximumWind-20210203T060000Z.grb2')
ds = xr.open_mfdataset([f1, f2], engine=""cfgrib"", preprocess=preprocess, parallel=True)
```
with one file looking like:
```
xr.open_dataset(""LOCAL_numerical-hirlam74-forecast-MaximumWind-20210203T000000Z.grb2"", engine=""cfgrib"")
Dimensions: (latitude: 947, longitude: 5294, step: 55)
Coordinates:
time datetime64[ns] ...
* step (step) timedelta64[ns] 01:00:00 ... 2 days 07:00:00
heightAboveGround int64 ...
* latitude (latitude) float64 25.65 25.72 25.78 ... 89.86 89.93 90.0
* longitude (longitude) float64 -180.0 -179.9 -179.9 ... 179.9 180.0
valid_time (step) datetime64[ns] ...
Data variables:
fg10 (step, latitude, longitude) float32 ...
Attributes:
GRIB_edition: 2
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_subCentre: 0
Conventions: CF-1.7
institution: European Centre for Medium-Range Weather Forecasts
history: 2021-02-12T18:06:52 GRIB to CDM+CF via cfgrib-0....
```
A smaller example could be (WIP; note I was hoping ds would concat along t but it doesn't do what I expect)
```
import numpy as np
import xarray as xr
f1 = xr.DataArray(np.arange(2), coords=[np.arange(2)], dims=[""a""], name=""f1"")
f1 = f1.assign_coords(t=0)
f1.to_dataset().to_zarr(""f1.zarr"") # What's the best way to store small files to open again with mf_dataset? csv via xarray objects? can you use open_mfdataset on pkl objects?
f2 = xr.DataArray(np.arange(2), coords=[np.arange(2)], dims=[""a""], name=""f2"")
f2 = f2.assign_coords(t=1)
f2.to_dataset().to_zarr(""f2.zarr"")
# Concat along t
def preprocess(ds):
return ds.expand_dims('t')
ds = xr.open_mfdataset([""f1.zarr"", ""f2.zarr""], engine=""zarr"", concat_dim=""t"", preprocess=preprocess)
>>> ds
Dimensions: (a: 2, t: 1)
Coordinates:
* t (t) int64 0
* a (a) int64 0 1
Data variables:
f1 (t, a) int64 dask.array
f2 (t, a) int64 dask.array
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,344614881
https://github.com/pydata/xarray/issues/2313#issuecomment-410892776,https://api.github.com/repos/pydata/xarray/issues/2313,410892776,MDEyOklzc3VlQ29tbWVudDQxMDg5Mjc3Ng==,6815844,2018-08-07T00:18:02Z,2018-08-07T00:18:02Z,MEMBER,"There is a related question on [SO](https://stackoverflow.com/questions/51709266/using-xarray-to-open-a-multi-file-dataset-when-both-the-files-and-dataset-have-a).
I think it is a good idea to add an example to our doc.
Agreed for the question 1.
I did not yet examine the question 2, but I think simple examples are generally nice.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,344614881