home / github / issues

Menu
  • Search all tables
  • GraphQL API

issues: 807614965

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
807614965 MDU6SXNzdWU4MDc2MTQ5NjU= 4901 document example of preprocess with open_mfdataset 17162724 closed 0     1 2021-02-12T23:37:04Z 2021-02-13T03:21:08Z 2021-02-13T03:21:08Z CONTRIBUTOR      

@jhamman's SO answer circa 2018 helped me this week https://stackoverflow.com/a/51714004/6046019

I wonder if it's worth (not sure where) providing an example of how to use preprocesses with open_mfdataset?

Add an Examples entry to the doc string? (http://xarray.pydata.org/en/latest/generated/xarray.open_mfdataset.html / https://github.com/pydata/xarray/blob/5296ed18272a856d478fbbb3d3253205508d1c2d/xarray/backends/api.py#L895)

While not a small example (as the remote files are large) this is how I used it:

``` import xarray as xr import s3fs

def preprocess(ds): return ds.expand_dims('time')

fs = s3fs.S3FileSystem(anon=True) f1 = fs.open('s3://fmi-opendata-rcrhirlam-surface-grib/2021/02/03/00/numerical-hirlam74-forecast-MaximumWind-20210203T000000Z.grb2') f2 = fs.open('s3://fmi-opendata-rcrhirlam-surface-grib/2021/02/03/06/numerical-hirlam74-forecast-MaximumWind-20210203T060000Z.grb2')

ds = xr.open_mfdataset([f1, f2], engine="cfgrib", preprocess=preprocess, parallel=True) ```

with one file looking like: xr.open_dataset("LOCAL_numerical-hirlam74-forecast-MaximumWind-20210203T000000Z.grb2", engine="cfgrib") <xarray.Dataset> Dimensions: (latitude: 947, longitude: 5294, step: 55) Coordinates: time datetime64[ns] ... * step (step) timedelta64[ns] 01:00:00 ... 2 days 07:00:00 heightAboveGround int64 ... * latitude (latitude) float64 25.65 25.72 25.78 ... 89.86 89.93 90.0 * longitude (longitude) float64 -180.0 -179.9 -179.9 ... 179.9 180.0 valid_time (step) datetime64[ns] ... Data variables: fg10 (step, latitude, longitude) float32 ... Attributes: GRIB_edition: 2 GRIB_centre: ecmf GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts GRIB_subCentre: 0 Conventions: CF-1.7 institution: European Centre for Medium-Range Weather Forecasts history: 2021-02-12T18:06:52 GRIB to CDM+CF via cfgrib-0....

A smaller example could be (WIP; note I was hoping ds would concat along t but it doesn't do what I expect) ``` import numpy as np import xarray as xr

f1 = xr.DataArray(np.arange(2), coords=[np.arange(2)], dims=["a"], name="f1") f1 = f1.assign_coords(t=0) f1.to_dataset().to_zarr("f1.zarr") # What's the best way to store small files to open again with mf_dataset? csv via xarray objects? can you use open_mfdataset on pkl objects?

f2 = xr.DataArray(np.arange(2), coords=[np.arange(2)], dims=["a"], name="f2") f2 = f2.assign_coords(t=1) f2.to_dataset().to_zarr("f2.zarr")

Concat along t

def preprocess(ds): return ds.expand_dims('t') ds = xr.open_mfdataset(["f1.zarr", "f2.zarr"], engine="zarr", concat_dim="t", preprocess=preprocess)

ds <xarray.Dataset> Dimensions: (a: 2, t: 1) Coordinates: * t (t) int64 0 * a (a) int64 0 1 Data variables: f1 (t, a) int64 dask.array<chunksize=(1, 2), meta=np.ndarray> f2 (t, a) int64 dask.array<chunksize=(1, 2), meta=np.ndarray> ```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4901/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue

Links from other tables

  • 1 row from issues_id in issues_labels
  • 1 row from issue in issue_comments
Powered by Datasette · Queries took 0.572ms · About: xarray-datasette