
issues: 178359375


  id: 178359375
  node_id: MDU6SXNzdWUxNzgzNTkzNzU=
  number: 1014
  title: dask tokenize error with chunking
  user: 1197350
  state: closed
  locked: 0
  comments: 1
  created_at: 2016-09-21T14:14:10Z
  updated_at: 2016-09-22T02:38:08Z
  closed_at: 2016-09-22T02:38:08Z
  author_association: MEMBER

I have hit a problem with my custom xarray store: https://github.com/xgcm/xgcm/blob/master/xgcm/models/mitgcm/mds_store.py

Unfortunately it is hard for me to create a reproducible example, since the error only comes up when I try to read a large binary dataset stored on my server. Nevertheless, I am opening an issue in the hope that someone can help me.

I create an xarray dataset via a custom function

```python
ds = xgcm.open_mdsdataset(ddir, iters, delta_t=deltaT,
                          prefix=['DiagLAYERS-diapycnal', 'DiagLAYERS-transport'])
```

This function creates a dataset object successfully and then calls ds.chunk(). Dask is unable to tokenize the variables and fails. I don't really understand why, but the failure seems ultimately to depend on the presence and value of the `filename` attribute on the data being passed to dask.
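As far as I can tell from the trace, the failure is in the branch of dask.base.normalize_array that keys file-backed arrays by filename and modification time. A minimal stdlib-only sketch of that branch (`FakeMemmap` and `normalize_like_dask` are hypothetical names for illustration, not the real xgcm or dask objects) reproduces the crash when `filename` is None:

```python
import os

# Hypothetical stand-in (not the real xgcm class) for a memmap-like
# object whose ``filename`` attribute ends up as None.
class FakeMemmap(object):
    mode = 'r'
    filename = None
    dtype = 'float64'
    shape = (10,)

# Mimic the branch of dask.base.normalize_array shown in the traceback:
# objects exposing both ``mode`` and ``filename`` are tokenized by their
# file metadata, so a None filename reaches os.path.getmtime.
def normalize_like_dask(x):
    if hasattr(x, 'mode') and hasattr(x, 'filename'):
        return x.filename, os.path.getmtime(x.filename), x.dtype, x.shape
    return str(x)

try:
    normalize_like_dask(FakeMemmap())
except TypeError as exc:
    print('TypeError:', exc)  # os.stat() refuses a None path
```

If the store really does hand dask a memmap-like object with `filename` set to None, that would explain the TypeError raised from genericpath.getmtime.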

Any advice would be appreciated. The relevant stack trace is:

```python
/home/rpa/xgcm/xgcm/models/mitgcm/mds_store.pyc in open_mdsdataset(dirname, iters, prefix, read_grid, delta_t, ref_date, calendar, geometry, grid_vars_to_coords, swap_dims, endian, chunks, ignore_unknown_vars)
    154     # do we need more fancy logic (like open_dataset), or is this enough
    155     if chunks is not None:
--> 156         ds = ds.chunk(chunks)
    157
    158     return ds

/home/rpa/xarray/xarray/core/dataset.py in chunk(self, chunks, name_prefix, token, lock)
    863
    864         variables = OrderedDict([(k, maybe_chunk(k, v, chunks))
--> 865                                  for k, v in self.variables.items()])
    866         return self._replace_vars_and_dims(variables)
    867

/home/rpa/xarray/xarray/core/dataset.py in maybe_chunk(name, var, chunks)
    856                 chunks = None
    857             if var.ndim > 0:
--> 858                 token2 = tokenize(name, token if token else var._data)
    859                 name2 = '%s%s-%s' % (name_prefix, name, token2)
    860                 return var.chunk(chunks, name=name2, lock=lock)

/home/rpa/dask/dask/base.pyc in tokenize(*args, **kwargs)
    355     if kwargs:
    356         args = args + (kwargs,)
--> 357     return md5(str(tuple(map(normalize_token, args))).encode()).hexdigest()

/home/rpa/dask/dask/utils.pyc in __call__(self, arg)
    510         for cls in inspect.getmro(typ)[1:]:
    511             if cls in lk:
--> 512                 return lk[cls]
    513         raise TypeError("No dispatch for {0} type".format(typ))
    514

/home/rpa/dask/dask/base.pyc in normalize_array(x)
    320         return (str(x), x.dtype)
    321     if hasattr(x, 'mode') and hasattr(x, 'filename'):
--> 322         return x.filename, os.path.getmtime(x.filename), x.dtype, x.shape
    323     if x.dtype.hasobject:
    324         try:

/usr/local/anaconda/lib/python2.7/genericpath.pyc in getmtime(filename)
     60 def getmtime(filename):
     61     """Return the last modification time of a file, reported by os.stat()."""
---> 62     return os.stat(filename).st_mtime
     63
     64

TypeError: coercing to Unicode: need string or buffer, NoneType found
```
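For reference, the dask/utils frame in the trace is dask's MRO-based dispatch: it looks the argument's type up in a registry of normalizers, falling back to each base class. A rough stdlib-only sketch of that mechanism (`registry`, `register`, `normalize`, and `FakeMemmap` are all hypothetical names, not dask's actual API) suggests one possible workaround: register a normalizer for the store's array class that never touches `filename`:

```python
import inspect

# Rough sketch of the MRO-based dispatch seen in the dask/utils frame:
# look up the argument's exact type, then each base class, in a registry.
registry = {}

def register(typ, func):
    registry[typ] = func

def normalize(arg):
    typ = type(arg)
    if typ in registry:
        return registry[typ](arg)
    for cls in inspect.getmro(typ)[1:]:
        if cls in registry:
            return registry[cls](arg)
    raise TypeError("No dispatch for {0} type".format(typ))

# Stand-in for a memmap-like wrapper whose ``filename`` is None.
class FakeMemmap(object):
    filename = None
    dtype, shape = 'float64', (2, 3)

# Possible workaround: a normalizer for the problematic class that keys
# on dtype and shape and never reads ``filename``.
register(FakeMemmap, lambda x: ('FakeMemmap', x.dtype, x.shape))
print(normalize(FakeMemmap()))   # ('FakeMemmap', 'float64', (2, 3))
```

Dask exposes a registration hook of this shape for custom types, so a store-specific normalizer along these lines might sidestep the crash without changing dask itself.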

  state_reason: completed
  repo: 13221727
  type: issue
