
issue_comments


6 rows where issue = 344614881 (Example on using `preprocess` with `mfdataset`), sorted by updated_at descending


Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sort: descending), author_association, body, reactions, performed_via_github_app, issue
1468024753 https://github.com/pydata/xarray/issues/2313#issuecomment-1468024753 https://api.github.com/repos/pydata/xarray/issues/2313 IC_kwDOAMm_X85XgEex husainridwan 61923007 2023-03-14T12:35:00Z 2023-03-14T12:35:00Z NONE

I'd like to work on this @TomNicholas, where do I start?

  Example on using `preprocess` with `mfdataset` 344614881
1142381856 https://github.com/pydata/xarray/issues/2313#issuecomment-1142381856 https://api.github.com/repos/pydata/xarray/issues/2313 IC_kwDOAMm_X85EF10g dcherian 2448579 2022-05-31T16:51:57Z 2022-05-31T16:51:57Z MEMBER

I bet you need to use `ds.precip.max("time", keepdims=True)` so that the returned value has the time dimension. Please ask usage questions at https://github.com/pydata/xarray/discussions next time.
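
A minimal sketch of that suggestion applied to the question below, assuming a `precip` variable and placeholder file names; selecting `ds[["precip"]]` (so the function returns a Dataset) and passing `combine="nested"` are additional assumptions, not part of the original comment:

```
import xarray as xr

def preprocess_func(ds):
    # keepdims=True keeps a length-1 "time" dimension on the per-file
    # maximum, so the results can still be concatenated along "time"
    return ds[["precip"]].max("time", keepdims=True)

prec_ds = xr.open_mfdataset(
    ["prec_2007.nc", "prec_2008.nc"],  # placeholder file names
    preprocess=preprocess_func,
    combine="nested",                  # required in recent xarray when passing concat_dim
    concat_dim="time",
)
```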

  Example on using `preprocess` with `mfdataset` 344614881
1135302642 https://github.com/pydata/xarray/issues/2313#issuecomment-1135302642 https://api.github.com/repos/pydata/xarray/issues/2313 IC_kwDOAMm_X85Dq1fy jibcar 54370222 2022-05-24T01:31:22Z 2022-05-24T01:31:22Z NONE

Hello:

I have to find the maximum precipitation of each year (for example, 2007 and 2008; dataset links: 2007 and 2008). I have done this using the resample method (i.e. `.resample(time='Y').max()`) after concatenating along the time dimension.
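
A minimal sketch of that resample approach (file names are placeholders and the variable is assumed to be named `precip`):

```
import xarray as xr

# open both yearly files and combine them along their time coordinate
ds = xr.open_mfdataset(["prec_2007.nc", "prec_2008.nc"], combine="by_coords")

# annual maximum precipitation: one value per year
annual_max = ds.precip.resample(time="Y").max()
```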

Following the SO answer, I am wondering if I can use preprocess to find the maximum (or minimum, or average) for each file first and then concatenate along the time dimension. I tried the following code and was not successful. Can someone help me with this?

```
import dask.array as da
import numpy as np
import xarray as xr

from dask.distributed import Client
client = Client()
client

def preprocess_func(ds):
    '''Get maximum (or minimum or average) from each file and concatenate along time'''
    return ds.precip.max('time')

prec_ds = xr.open_mfdataset(
    [prec_2007, prec_2008],
    chunks={"lat": 25, "lon": 25, "time": -1},
    preprocess=preprocess_func,
    concat_dim='time',
)
```

  Example on using `preprocess` with `mfdataset` 344614881
1062761948 https://github.com/pydata/xarray/issues/2313#issuecomment-1062761948 https://api.github.com/repos/pydata/xarray/issues/2313 IC_kwDOAMm_X84_WHXc chuaxr 30007270 2022-03-09T10:13:09Z 2022-03-09T10:13:09Z NONE

Seconding @dcherian's comment in #4901 on an example for .encoding['source']. Working off @raybellwaves' example, something like this would have been useful to me:

```
import xarray as xr
import numpy as np

model1 = xr.DataArray(np.arange(2), coords=[np.arange(2)], name="f")
model1.to_dataset().to_netcdf("model1.nc")
model2 = xr.DataArray(np.arange(2), coords=[np.arange(2)], name="f")
model2.to_dataset().to_netcdf("model2.nc")

ds = xr.open_mfdataset(
    ["model1.nc", "model2.nc"],
    preprocess=lambda ds: ds.expand_dims(
        {"model_name": [ds.encoding["source"].split("/")[-1].split(".")[0]]}
    ),
)
ds
<xarray.Dataset>
Dimensions:     (dim_0: 2, model_name: 2)
Coordinates:
  * dim_0       (dim_0) int64 0 1
  * model_name  (model_name) object 'model1' 'model2'
Data variables:
    f           (model_name, dim_0) int64 dask.array<chunksize=(1, 2), meta=np.ndarray>
```

On that note, the example above seems to work with some slight changes:

```
import numpy as np
import xarray as xr

f1 = xr.DataArray(np.arange(2), coords=[np.arange(2)], dims=["a"], name="f1")
f1 = f1.assign_coords(t='t0')
f1.to_dataset().to_netcdf("f1.nc")

f2 = xr.DataArray(np.arange(2), coords=[np.arange(2)], dims=["a"], name="f2")
f2 = f2.assign_coords(t='t1')
f2.to_dataset().to_netcdf("f2.nc")

# Concat along t
def preprocess(ds):
    return ds.expand_dims("t")

ds = xr.open_mfdataset(["f1.nc", "f2.nc"], concat_dim="t", preprocess=preprocess)
ds
<xarray.Dataset>
Dimensions:  (a: 2, t: 2)
Coordinates:
  * t        (t) object 't0' 't1'
  * a        (a) int64 0 1
Data variables:
    f1       (t, a) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
    f2       (t, a) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
```

  Example on using `preprocess` with `mfdataset` 344614881
778554202 https://github.com/pydata/xarray/issues/2313#issuecomment-778554202 https://api.github.com/repos/pydata/xarray/issues/2313 MDEyOklzc3VlQ29tbWVudDc3ODU1NDIwMg== raybellwaves 17162724 2021-02-13T03:20:58Z 2021-02-13T03:20:58Z CONTRIBUTOR

Edit: Copied and pasted from a duplicate issue I opened. Closing that and moving convo here.

@jhamman's SO answer circa 2018 helped me this week https://stackoverflow.com/a/51714004/6046019

I wonder if it's worth (not sure where) providing an example of how to use preprocess with open_mfdataset?

Add an Examples entry to the doc string? (http://xarray.pydata.org/en/latest/generated/xarray.open_mfdataset.html / https://github.com/pydata/xarray/blob/5296ed18272a856d478fbbb3d3253205508d1c2d/xarray/backends/api.py#L895)

While not a small example (as the remote files are large), this is how I used it:

```
import xarray as xr
import s3fs

def preprocess(ds):
    return ds.expand_dims('time')

fs = s3fs.S3FileSystem(anon=True)
f1 = fs.open('s3://fmi-opendata-rcrhirlam-surface-grib/2021/02/03/00/numerical-hirlam74-forecast-MaximumWind-20210203T000000Z.grb2')
f2 = fs.open('s3://fmi-opendata-rcrhirlam-surface-grib/2021/02/03/06/numerical-hirlam74-forecast-MaximumWind-20210203T060000Z.grb2')

ds = xr.open_mfdataset([f1, f2], engine="cfgrib", preprocess=preprocess, parallel=True)
```

with one file looking like:

```
xr.open_dataset("LOCAL_numerical-hirlam74-forecast-MaximumWind-20210203T000000Z.grb2", engine="cfgrib")
<xarray.Dataset>
Dimensions:            (latitude: 947, longitude: 5294, step: 55)
Coordinates:
    time               datetime64[ns] ...
  * step               (step) timedelta64[ns] 01:00:00 ... 2 days 07:00:00
    heightAboveGround  int64 ...
  * latitude           (latitude) float64 25.65 25.72 25.78 ... 89.86 89.93 90.0
  * longitude          (longitude) float64 -180.0 -179.9 -179.9 ... 179.9 180.0
    valid_time         (step) datetime64[ns] ...
Data variables:
    fg10               (step, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2021-02-12T18:06:52 GRIB to CDM+CF via cfgrib-0....
```

A smaller example could be (WIP; note I was hoping ds would concat along t but it doesn't do what I expect):

```
import numpy as np
import xarray as xr

f1 = xr.DataArray(np.arange(2), coords=[np.arange(2)], dims=["a"], name="f1")
f1 = f1.assign_coords(t=0)
f1.to_dataset().to_zarr("f1.zarr")
# What's the best way to store small files to open again with mf_dataset?
# csv via xarray objects? can you use open_mfdataset on pkl objects?

f2 = xr.DataArray(np.arange(2), coords=[np.arange(2)], dims=["a"], name="f2")
f2 = f2.assign_coords(t=1)
f2.to_dataset().to_zarr("f2.zarr")

# Concat along t
def preprocess(ds):
    return ds.expand_dims('t')

ds = xr.open_mfdataset(["f1.zarr", "f2.zarr"], engine="zarr", concat_dim="t", preprocess=preprocess)
ds
<xarray.Dataset>
Dimensions:  (a: 2, t: 1)
Coordinates:
  * t        (t) int64 0
  * a        (a) int64 0 1
Data variables:
    f1       (t, a) int64 dask.array<chunksize=(1, 2), meta=np.ndarray>
    f2       (t, a) int64 dask.array<chunksize=(1, 2), meta=np.ndarray>
```

  Example on using `preprocess` with `mfdataset` 344614881
410892776 https://github.com/pydata/xarray/issues/2313#issuecomment-410892776 https://api.github.com/repos/pydata/xarray/issues/2313 MDEyOklzc3VlQ29tbWVudDQxMDg5Mjc3Ng== fujiisoup 6815844 2018-08-07T00:18:02Z 2018-08-07T00:18:02Z MEMBER

There is a related question on SO.

I think it is a good idea to add an example to our docs. Agreed on question 1. I have not yet examined question 2, but I think simple examples are generally nice.

  Example on using `preprocess` with `mfdataset` 344614881


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);