issues: 493058488


id: 493058488
node_id: MDU6SXNzdWU0OTMwNTg0ODg=
number: 3306
title: `ds.load()` with local files stalls and fails, and `to_zarr` does not include `store` in the dask graph
user: 15016780
state: closed
locked: 0
comments: 7
created_at: 2019-09-12T22:29:04Z
updated_at: 2019-09-16T01:22:09Z
closed_at: 2019-09-16T01:22:09Z
author_association: NONE

MCVE Code Sample

The scenario below reads local netCDF files (shared via EFS) to create a zarr store, but `store` is never called as part of the dask graph. After further digging, it looks like the problem may actually be related to `concatenate`.

I also include a commented-out variant that opens the same files over HTTPS. That variant works (it does store data on S3), but of course opening the datasets is slower.

`ds.to_zarr` and `ds.load()` both stall and eventually return many instances of:

distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://192.168.62.40:37233'], ('concatenate-2babafa03313bcf979ae6ca3a8e16aad', 1, 10, 13)
NoneType: None
distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://192.168.62.40:37233'], ('concatenate-2babafa03313bcf979ae6ca3a8e16aad', 0, 6, 30)
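When many of these messages scroll past, it can help to tally which worker addresses and task keys they mention. The following is a minimal sketch using only the standard library; the helper name `parse_missing_key` and the regular expression are illustrative assumptions written against the two messages quoted above, not part of `distributed`.

```python
import ast
import re
from collections import Counter

# Pattern written against the two messages quoted in this report (an assumption,
# not an official log format): a list of worker addresses, then the task key.
PATTERN = re.compile(r"Workers don't have promised key: (\[.*?\]), (.*)$")


def parse_missing_key(line):
    """Return (workers, key) for a matching scheduler error line, else None."""
    match = PATTERN.search(line.strip())
    if match is None:
        return None
    workers = ast.literal_eval(match.group(1))
    try:
        key = ast.literal_eval(match.group(2))
    except (ValueError, SyntaxError):
        # Keys that are not Python literals are kept as raw text.
        key = match.group(2)
    return workers, key


log = """\
distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://192.168.62.40:37233'], ('concatenate-2babafa03313bcf979ae6ca3a8e16aad', 1, 10, 13)
NoneType: None
distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://192.168.62.40:37233'], ('concatenate-2babafa03313bcf979ae6ca3a8e16aad', 0, 6, 30)
"""

parsed = [p for p in map(parse_missing_key, log.splitlines()) if p is not None]
workers_seen = Counter(w for workers, _ in parsed for w in workers)
keys_seen = [key for _, key in parsed]

print(workers_seen)
print(keys_seen)
```

For the two lines above, both messages point at the same worker address and the same `concatenate` task name, differing only in the chunk index.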

```python
#!/usr/bin/env python
# coding: utf-8

# In[1]:

import xarray as xr
from dask.distributed import Client, progress
import s3fs
import zarr
import datetime

# In[16]:

chunks = {'lat': 1000, 'lon': 1000}
base = 2018
year = base
ending = '090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc'
days_of_year = list(range(152, 154))
file_urls = []

# Build one local path per day of year: ./<year>/<doy>/<YYYYMMDD><ending>
for doy in days_of_year:
    date = datetime.datetime(year, 1, 1) + datetime.timedelta(doy - 1)
    date = date.strftime('%Y%m%d')
    file_urls.append('./{}/{}/{}{}'.format(year, doy, date, ending))

print(file_urls)
ds = xr.open_mfdataset(file_urls, chunks=chunks, combine='by_coords', parallel=True)
ds

# In[21]:

# This works fine
# base_url = 'https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/'
# url_ending = '090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc?time[0:1:0],lat[0:1:17998],lon[0:1:35999],analysed_sst[0:1:0][0:1:17998][0:1:35999]'
# year = 2018
# days_of_year = list(range(152, 154))
# file_urls = []
# for doy in days_of_year:
#     date = datetime.datetime(year, 1, 1) + datetime.timedelta(doy - 1)
#     date = date.strftime('%Y%m%d')
#     file_urls.append('{}/{}/{}/{}{}'.format(base_url, year, doy, date, url_ending))
# #file_urls
# ds = xr.open_mfdataset(file_urls, chunks=chunks, parallel=True, combine='by_coords')
# ds

# In[ ]:

# Write zarr to s3
myS3fs = s3fs.S3FileSystem(anon=False)
zarr_s3 = 'aimeeb-datasets-private/mur_sst_zarr14'
d = s3fs.S3Map(zarr_s3, s3=myS3fs)
compressor = zarr.Blosc(cname='zstd', clevel=5, shuffle=zarr.Blosc.AUTOSHUFFLE)
encoding = {v: {'compressor': compressor} for v in ds.data_vars}
ds.to_zarr(d, mode='w', encoding=encoding)
```
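Since only the local-file variant fails, one thing that may be worth ruling out is whether every generated path actually resolves to a file. The sketch below rebuilds the paths the same way as the sample and lists any that are missing; the helper names `local_paths` and `missing_paths` and the `root` parameter are hypothetical, introduced only for illustration.

```python
import datetime
import os

ENDING = '090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc'


def local_paths(year, days_of_year, root='.'):
    """Build <root>/<year>/<doy>/<YYYYMMDD><ENDING> for each day of year."""
    paths = []
    for doy in days_of_year:
        # Day of year 1 is January 1, so offset by doy - 1 days.
        date = datetime.date(year, 1, 1) + datetime.timedelta(days=doy - 1)
        paths.append('{}/{}/{}/{}{}'.format(root, year, doy, date.strftime('%Y%m%d'), ENDING))
    return paths


def missing_paths(paths):
    """Return the subset of paths that are not regular files on this machine."""
    return [p for p in paths if not os.path.isfile(p)]


paths = local_paths(2018, range(152, 154))
print(paths)
print('missing here:', missing_paths(paths))
```

This check only covers the machine it runs on. With a distributed cluster, each worker opens the files itself, so the same relative paths would also need to resolve from every worker's working directory.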

Expected Output

The call to `ds.to_zarr` should produce a dask graph that includes `store` tasks.
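One way to check this expectation without running the computation is to build the write lazily and count task-name prefixes in the resulting graph. This is a sketch under stated assumptions: `task_prefix`, `count_task_prefixes`, and `to_zarr_task_counts` are hypothetical helpers, the hash-stripping rule is a simplification of how dask names tasks, and `to_zarr(..., compute=False)` is assumed to return a dask `Delayed` as in recent xarray releases.

```python
import re
from collections import Counter

# Simplified assumption: dask task names look like "<prefix>-<hex token>".
_HASH_SUFFIX = re.compile(r"-[0-9a-f]{8,}$")


def task_prefix(key):
    """Strip the chunk index and trailing hex token from a dask key."""
    name = key[0] if isinstance(key, tuple) else key
    return _HASH_SUFFIX.sub("", str(name))


def count_task_prefixes(keys):
    """Count how many graph keys fall under each task-name prefix."""
    return Counter(task_prefix(k) for k in keys)


def to_zarr_task_counts(ds, store):
    """Build the zarr write lazily and summarise its graph (not run here).

    Note: even with compute=False, to_zarr writes metadata to the store,
    and mode='w' overwrites an existing store.
    """
    delayed = ds.to_zarr(store, mode="w", compute=False)
    return count_task_prefixes(dict(delayed.__dask_graph__()))


# Demonstration on hand-written keys shaped like the ones in the error log.
sample_keys = [
    ("concatenate-2babafa03313bcf979ae6ca3a8e16aad", 1, 10, 13),
    ("concatenate-2babafa03313bcf979ae6ca3a8e16aad", 0, 6, 30),
    "store-0123456789abcdef0123456789abcdef",
]
counts = count_task_prefixes(sample_keys)
print(counts)
```

If the counts returned by `to_zarr_task_counts(ds, d)` contain no `store` entry, that would match the behaviour described in this report.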

Problem Description

The end result should be a zarr store on S3

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.14.128-112.105.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.3
pandas: 0.25.0
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.5.1.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.24
cfgrib: None
iris: None
bottleneck: None
dask: 2.2.0
distributed: 2.2.0
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 41.0.1
pip: 19.2.1
conda: None
pytest: None
IPython: 7.7.0
sphinx: None
```

state_reason: completed
repo: 13221727
type: issue
