issue_comments


5 rows where issue = 829426650 sorted by updated_at descending


Issue: Unable to load multiple WRF NetCDF files into Dask array on pangeo (5 comments)
dcherian (MEMBER) commented at 2021-04-28T19:16:41Z
https://github.com/pydata/xarray/issues/5023#issuecomment-828713852

Great. Thanks for following up @porterdf

porterdf (NONE) commented at 2021-04-28T18:30:46Z
https://github.com/pydata/xarray/issues/5023#issuecomment-828683287

Thanks @dcherian

```python
ds = xr.open_mfdataset(NCs_urls, engine='netcdf4', parallel=True, concat_dim='XTIME')
```

```
ValueError: Could not find any dimension coordinates to use to order the datasets for concatenation
```

So it doesn't work, but perhaps that's not surprising given that 'XTIME' is a coordinate while 'Time' is the dimension (one of WRF's quirks related to staggered grids and moving nests).

```
print(ds.coords)

Coordinates:
    XLAT     (Time, south_north, west_east)       float32 dask.array<chunksize=(8, 1035, 675), meta=np.ndarray>
    XLONG    (Time, south_north, west_east)       float32 dask.array<chunksize=(8, 1035, 675), meta=np.ndarray>
    XTIME    (Time)                               datetime64[ns] dask.array<chunksize=(8,), meta=np.ndarray>
    XLAT_U   (Time, south_north, west_east_stag)  float32 dask.array<chunksize=(8, 1035, 676), meta=np.ndarray>
    XLONG_U  (Time, south_north, west_east_stag)  float32 dask.array<chunksize=(8, 1035, 676), meta=np.ndarray>
    XLAT_V   (Time, south_north_stag, west_east)  float32 dask.array<chunksize=(8, 1036, 675), meta=np.ndarray>
    XLONG_V  (Time, south_north_stag, west_east)  float32 dask.array<chunksize=(8, 1036, 675), meta=np.ndarray>
```

As such, I'm following the documentation and adding a preprocessor `ds.swap_dims({'Time': 'XTIME'})`, which works as expected.
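A minimal sketch of that fix on synthetic in-memory datasets (illustrative names, not the actual WRF files): promoting `XTIME` from an auxiliary coordinate to a dimension coordinate with `swap_dims` gives xarray something to order and concatenate along.

```python
import numpy as np
import xarray as xr

def make_ds(start_hour):
    # Stand-in for one WRF file: dimension 'Time' carries no coordinate of its
    # own; the datetimes live on the auxiliary coordinate 'XTIME'.
    times = (np.array(['2019-01-01T03', '2019-01-01T06'], dtype='datetime64[ns]')
             + np.timedelta64(start_hour, 'h'))
    return xr.Dataset(
        {'T2': (('Time',), np.zeros(2))},
        coords={'XTIME': (('Time',), times)},
    )

# swap_dims makes XTIME a dimension coordinate, so concat (and open_mfdataset
# with a preprocess= callable doing the same swap) can order the pieces by time.
parts = [make_ds(h).swap_dims({'Time': 'XTIME'}) for h in (0, 6)]
combined = xr.concat(parts, dim='XTIME')
print(combined.sizes['XTIME'])  # -> 4
```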

Thanks for everyone's help! Shall I close this, as it was never actually an issue?

dcherian (MEMBER) commented at 2021-04-19T02:37:25Z
https://github.com/pydata/xarray/issues/5023#issuecomment-822127102

Does it work if you pass `concat_dim="XTIME"`?

porterdf (NONE) commented at 2021-04-02T02:14:19Z
https://github.com/pydata/xarray/issues/5023#issuecomment-812278389

Thanks for the great suggestion @shoyer - looping through the netCDF files is working well in Dask using the following code:

```python
import xarray as xr
import gcsfs
from tqdm.autonotebook import tqdm

xr.set_options(display_style="html")

fs = gcsfs.GCSFileSystem(project='ldeo-glaciology', mode='r', cache_timeout=0)
NCs = fs.glob('gs://ldeo-glaciology/AMPS/WRF_24/domain_02/*.nc')

url = 'gs://' + NCs[0]
openfile = fs.open(url, mode='rb')
ds = xr.open_dataset(openfile, engine='h5netcdf', chunks={'Time': -1})

for i in tqdm(range(1, 8)):
    url = 'gs://' + NCs[i]
    openfile = fs.open(url, mode='rb')
    temp = xr.open_dataset(openfile, engine='h5netcdf', chunks={'Time': -1})
    ds = xr.concat([ds, temp], 'Time')
```

However, I am still confused why `open_mfdataset` was not parsing the Time dimension - the concatenated Dataset produced by the looping method above has a time coordinate with dtype datetime64[ns].
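As far as I can tell, the difference is that `open_mfdataset`'s coordinate-based combining (`combine="by_coords"`, the default in recent xarray) needs a *dimension* coordinate to infer file order, whereas an explicit `concat` takes its order from the list. A sketch with synthetic stand-ins for two files (illustrative names, not the WRF data):

```python
import numpy as np
import xarray as xr

def make_part(hour):
    # As in the WRF files: 'XTIME' is attached to the 'Time' dimension but is
    # not itself a dimension coordinate.
    times = np.array([f'2019-01-01T{hour:02d}'], dtype='datetime64[ns]')
    return xr.Dataset({'T2': (('Time',), [float(hour)])},
                      coords={'XTIME': (('Time',), times)})

a, b = make_part(3), make_part(6)

# Coordinate-based combining has no dimension coordinate to order by, so it fails:
try:
    xr.combine_by_coords([a, b])
except ValueError as err:
    print(err)

# An explicit list-ordered concat, like the loop above, sidesteps the problem:
ds = xr.concat([a, b], dim='Time')
print(list(ds['T2'].values))  # [3.0, 6.0]
```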

```
ds.coords['XTIME'].compute()

<xarray.DataArray 'XTIME' (Time: 8)>
array(['2019-01-01T03:00:00.000000000', '2019-01-01T06:00:00.000000000',
       '2019-01-01T09:00:00.000000000', '2019-01-01T12:00:00.000000000',
       '2019-01-01T15:00:00.000000000', '2019-01-01T18:00:00.000000000',
       '2019-01-01T21:00:00.000000000', '2019-01-02T00:00:00.000000000'],
      dtype='datetime64[ns]')
```

shoyer (MEMBER) commented at 2021-03-12T03:32:49Z
https://github.com/pydata/xarray/issues/5023#issuecomment-797211009

I suspect there is at least one netCDF file with inconsistent metadata, e.g., without a Time dimension. If you can find and fix that file (or otherwise handle it in whatever special way is required), that should resolve the issue. In my experience, looping through files (rather than using `open_mfdataset`) is helpful here precisely because you can verify that each file has the expected metadata.
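A sketch of that file-by-file verification (a hypothetical helper, using synthetic in-memory datasets in place of the opened files): compare each dataset's dimension names against the first and report mismatches before attempting to concatenate.

```python
import numpy as np
import xarray as xr

def find_inconsistent(datasets):
    """Return indices of datasets whose dimension names differ from the first."""
    reference = set(datasets[0].sizes)
    return [i for i, ds in enumerate(datasets) if set(ds.sizes) != reference]

# Synthetic example: the second dataset lacks the expected 'Time' dimension.
good = xr.Dataset({'T2': (('Time',), np.zeros(3))})
bad = xr.Dataset({'T2': ((), np.float64(0.0))})
print(find_inconsistent([good, bad, good]))  # -> [1]
```

In practice each dataset would come from `xr.open_dataset(...)` on one file, so the offending file can be fixed or excluded before the combined open.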

The only reason I can imagine this behavior might differ between GCP and your workstation would be if you are using inconsistent package versions.

Note: In general, for multi-file netCDF -> Zarr workflows you might check out pangeo-forge: https://github.com/pangeo-forge/pangeo-forge

