issues: 493058488
| field | value |
|---|---|
| id | 493058488 |
| node_id | MDU6SXNzdWU0OTMwNTg0ODg= |
| number | 3306 |
| title | `ds.load()` with local files stalls and fails, and `to_zarr` does not include `store` in the dask graph |
| user | 15016780 |
| state | closed |
| locked | 0 |
| assignee | |
| milestone | |
| comments | 7 |
| created_at | 2019-09-12T22:29:04Z |
| updated_at | 2019-09-16T01:22:09Z |
| closed_at | 2019-09-16T01:22:09Z |
| author_association | NONE |
| active_lock_reason | |
| draft | |
| pull_request | |
| reactions | { "url": "https://api.github.com/repos/pydata/xarray/issues/3306/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
| performed_via_github_app | |
| state_reason | completed |
| repo | 13221727 |
| type | issue |
#### MCVE Code Sample

Below details a scenario where reading local netcdf files (shared via EFS) to create a zarr store is not calling `store` in the dask graph. I include a commented option where I try using the same files over https, and this works (it does store data on S3), but of course the open-dataset calls are slower.
```python
#!/usr/bin/env python
# coding: utf-8

# In[1]:
import datetime

import xarray as xr
from dask.distributed import Client, progress
import s3fs
import zarr

# In[16]:
chunks = {'lat': 1000, 'lon': 1000}
year = 2018
ending = '090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc'
days_of_year = list(range(152, 154))
file_urls = []
for doy in days_of_year:
    date = datetime.datetime(year, 1, 1) + datetime.timedelta(doy - 1)
    date = date.strftime('%Y%m%d')
    file_urls.append('./{}/{}/{}{}'.format(year, doy, date, ending))
print(file_urls)
ds = xr.open_mfdataset(file_urls, chunks=chunks, combine='by_coords', parallel=True)
ds

# In[21]:
# This works fine:
# base_url = 'https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/'
# url_ending = '090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc?time[0:1:0],lat[0:1:17998],lon[0:1:35999],analysed_sst[0:1:0][0:1:17998][0:1:35999]'
# year = 2018
# days_of_year = list(range(152, 154))
# file_urls = []
# for doy in days_of_year:
#     date = datetime.datetime(year, 1, 1) + datetime.timedelta(doy - 1)
#     date = date.strftime('%Y%m%d')
#     file_urls.append('{}/{}/{}/{}{}'.format(base_url, year, doy, date, url_ending))
# ds = xr.open_mfdataset(file_urls, chunks=chunks, parallel=True, combine='by_coords')
# ds

# In[ ]:
# Write zarr to S3
myS3fs = s3fs.S3FileSystem(anon=False)
zarr_s3 = 'aimeeb-datasets-private/mur_sst_zarr14'
d = s3fs.S3Map(zarr_s3, s3=myS3fs)
compressor = zarr.Blosc(cname='zstd', clevel=5, shuffle=zarr.Blosc.AUTOSHUFFLE)
encoding = {v: {'compressor': compressor} for v in ds.data_vars}
ds.to_zarr(d, mode='w', encoding=encoding)
```
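A hedged aside, not part of the original report: `to_zarr` can return the write as a `dask.delayed.Delayed` via `compute=False`, which makes it possible to look for the store tasks in the graph before anything runs. A minimal sketch, assuming the `ds`, `d`, and `encoding` objects built above and an xarray version that supports `compute=False`:

```python
# Get a delayed write instead of an eager one, then inspect its task graph.
delayed = ds.to_zarr(d, mode='w', encoding=encoding, compute=False)

# The graph should contain tasks that push chunks into the zarr store;
# if the store step is missing, compute() finishes without writing data.
graph = dict(delayed.__dask_graph__())
print(len(graph), list(graph)[:5])

delayed.compute()
```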
#### Expected Output

Expect the call to `ds.to_zarr(d, mode='w', encoding=encoding)` to write the zarr store to S3.

#### Problem Description

The end result should be a zarr store on S3. Instead, with the local files, `ds.load()` stalls and fails, and the `to_zarr` call does not include `store` in the dask graph, so nothing is written.
#### Output of `xr.show_versions()`