issue_comments: 351339156

html_url: https://github.com/pydata/xarray/issues/1745#issuecomment-351339156
issue_url: https://api.github.com/repos/pydata/xarray/issues/1745
id: 351339156
node_id: MDEyOklzc3VlQ29tbWVudDM1MTMzOTE1Ng==
user: 10512793
created_at: 2017-12-13T09:49:13Z
updated_at: 2017-12-13T09:49:13Z
author_association: CONTRIBUTOR
issue: 277538485

I'm getting a similar error. The file size is very small (a few KB), so I don't think it's the size issue above. Instead, the error I get comes from something strange happening in core.utils.is_remote_uri(path). It occurs when I'm reading netcdf3 files with the default netcdf4 engine (which should, of course, be able to handle netcdf3).
There is a workaround: I can read the netcdf3 files with the scipy engine with no problems. Note that whenever I refer to the "error" below, I mean the crash that produces the following output rather than a Python exception.
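
For concreteness, this is the shape of the call that crashes for me; the file name below is just a placeholder:

    import xarray as xr

    # Any of my small netcdf3 (classic-format) files behaves the same way; the name is hypothetical.
    path = 'some_netcdf3_file.nc'

    # Default engine (netcdf4) -> aborts with the glibc error quoted below while opening the file.
    ds = xr.open_dataset(path)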

The error message is:

    *** Error in `/path/anaconda2/envs/base3/bin/python': corrupted size vs. prev_size: 0x0000000001814930 ***
    Aborted (core dumped)

The function where the problem arises is:

    def is_remote_uri(path):
        return bool(re.search('^https?\://', path))

The function is called a few times during open_dataset (and open_mfdataset; I get the same error with both). On the third or fourth call it triggers the error. As I'm not using remote datasets, I can hard-code the function to return False, and the file then reads with no problems.
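
Rather than editing the installed source, a runtime patch of the same kind would look roughly like the sketch below. It is only an illustration of what I mean by hard-coding the result to False; it assumes the netCDF4 backend imports is_remote_uri into its own namespace (which is why both names are patched), and it of course breaks detection of genuine remote (http/https) URLs:

    import xarray as xr
    import xarray.backends.netCDF4_
    import xarray.core.utils

    # Sketch only: make is_remote_uri() report "not remote" everywhere it is looked up.
    # Don't do this if you actually open remote (OPeNDAP) datasets.
    xarray.core.utils.is_remote_uri = lambda path: False
    xarray.backends.netCDF4_.is_remote_uri = lambda path: False

    ds = xr.open_dataset('some_netcdf3_file.nc')  # placeholder name; reads cleanly for me after the patch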

The is_remote_uri(path) call is made from a few places, but it's only the call on line 233 of netCDF4_.py, is_remote_uri(self._filename), that triggers the error.

I've printed the argument to is_remote_uri() each time it's called. On the first call the argument is the bare filename; on the second call it's the filename with the absolute path; and on the third (and fatal) call it's again the filename with the absolute path.

I can't see any difference between the arguments on the second and third calls. When I copy them, assign them to variables, and check equality in Python, it evaluates to True.
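
For what it's worth, the instrumentation was nothing more than recording and printing the argument from inside the function, roughly along these lines (a debugging sketch, not the real xarray code):

    import re

    _seen = []

    def is_remote_uri(path):
        # Debug sketch: record and print exactly what the function receives on each call.
        _seen.append(path)
        print(len(_seen), repr(path))
        if len(_seen) >= 2:
            # The second and third arguments compare equal, yet only the third call crashes.
            print('same as previous call:', _seen[-1] == _seen[-2])
        return bool(re.search('^https?\://', path))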

I've added a simpler call to re.search inside the function:

    def is_remote_uri(path):
        print((re.search('.nc','.nc')))
        return bool(re.search('^https?\://', path))

This also triggers the error on the third call to the function, so we can rule out anything to do with the path name itself.

I've played around with the print((re.search('.nc','.nc'))) line I added. It only triggers the error on the third call when the first argument (the pattern) of re.search contains a dot: re.search('.nc','.nc') causes the error, but re.search('nc','.nc') doesn't. The error isn't tied to .nc in any way; '.AAA' in the arguments causes the same crash. The error doesn't reproduce if I simply import re in IPython and run the same calls.
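
To summarise the variations I tried inside the instrumented function (none of these crash on their own in a plain interpreter; the abort only happens on that third call inside open_dataset):

    import re

    re.search('.nc', '.nc')    # pattern contains a dot  -> triggers the abort on the third call
    re.search('nc', '.nc')     # pattern has no dot      -> no crash
    re.search('.AAA', '.AAA')  # not specific to .nc; any dotted pattern behaves the same way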

The error does not occur in xarray 0.9.6: the same function is called in a similar way, and it returns False each time.

I'm not really sure what to do next, though. The obvious workaround is to set engine='scipy' if you're working with netcdf3 files.
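
That is, something along these lines (the file names are placeholders), which reads the same files cleanly for me:

    import xarray as xr

    # The scipy backend reads netcdf3 files and avoids the code path above.
    ds = xr.open_dataset('some_netcdf3_file.nc', engine='scipy')

    # open_mfdataset accepts the same keyword:
    # ds = xr.open_mfdataset('some_netcdf3_*.nc', engine='scipy')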

Can anyone replicate this error?
