issue_comments: 506475819

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/2501#issuecomment-506475819 https://api.github.com/repos/pydata/xarray/issues/2501 506475819 MDEyOklzc3VlQ29tbWVudDUwNjQ3NTgxOQ== 1872600 2019-06-27T19:16:28Z 2019-06-27T19:24:31Z NONE

I tried this, and either I didn't apply it right, or it didn't work. The memory use kept growing until the process died. My code to process the 8760 netcdf files with open_mfdataset looks like this:

```python
import xarray as xr
from dask.distributed import Client, progress, LocalCluster

cluster = LocalCluster()
client = Client(cluster)

import pandas as pd

dates = pd.date_range(start='2009-01-01 00:00',end='2009-12-31 23:00', freq='1h')
files = ['./nc/{}/{}.CHRTOUT_DOMAIN1.comp'.format(date.strftime('%Y'),date.strftime('%Y%m%d%H%M')) for date in dates]

def drop_coords(ds):
    return ds.reset_coords(drop=True)

ds = xr.open_mfdataset(files, preprocess=drop_coords, autoclose=True, parallel=True)
ds1 = ds.chunk(chunks={'time':168, 'feature_id':209929})

import numcodecs
numcodecs.blosc.use_threads = False
ds1.to_zarr('zarr/2009', mode='w', consolidated=True)
```
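For anyone trying to reproduce this, a quick sanity check of the file list and the `time` chunking may help. This is only a sketch using the standard library instead of pandas. It assumes each file holds exactly one hourly time step, which I have not verified here. The names `n_hours`, `time_chunk`, `n_time_chunks` and `last_chunk_len` are just for illustration.

```python
from datetime import datetime, timedelta
import math

# Hourly timestamps for 2009, same start/end as the date_range call above.
start = datetime(2009, 1, 1, 0, 0)
end = datetime(2009, 12, 31, 23, 0)
n_hours = int((end - start) / timedelta(hours=1)) + 1

# Same path template as the list comprehension in the snippet above.
files = [
    './nc/{}/{}.CHRTOUT_DOMAIN1.comp'.format(d.strftime('%Y'), d.strftime('%Y%m%d%H%M'))
    for d in (start + timedelta(hours=i) for i in range(n_hours))
]

# ASSUMPTION: one time step per file, so len(files) equals the length of 'time'.
time_chunk = 168  # one week of hourly steps, as passed to ds.chunk(...)
n_time_chunks = math.ceil(len(files) / time_chunk)
last_chunk_len = len(files) - (n_time_chunks - 1) * time_chunk

print(len(files), n_time_chunks, last_chunk_len)
```

Under that assumption, the 8760 files split into 53 chunks along `time`, the last one holding only 24 steps.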

I transferred the netcdf files from AWS S3 to my local disk to run this, using this command:

    rclone sync --include '*.CHRTOUT_DOMAIN1.comp' aws-east:nwm-archive/2009 . --checksum --fast-list --transfers 16

@TomAugspurger, if you could take a look, that would be great, and if you have any ideas of how to make this example simpler/more easily reproducible, please let me know.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  372848074