html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/444#issuecomment-119698728,https://api.github.com/repos/pydata/xarray/issues/444,119698728,MDEyOklzc3VlQ29tbWVudDExOTY5ODcyOA==,1177508,2015-07-08T19:07:41Z,2015-07-08T19:07:41Z,NONE,"I think this issue can be closed. After some digging and playing with different `netcdf4` modules, I'm pretty certain that it was a linkage and compilation issue between the system `hdf5` and `netcdf` libraries. You see, the computer I got this error on is one of those ""module load""-managed supercomputers... and somewhere along the way things got messed up while compiling the Python modules...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-118373477,https://api.github.com/repos/pydata/xarray/issues/444,118373477,MDEyOklzc3VlQ29tbWVudDExODM3MzQ3Nw==,1177508,2015-07-03T15:28:16Z,2015-07-03T15:28:16Z,NONE,"On a per-file basis (`open_dataset`) there's no problem... but again, if I try the `h5netcdf` engine, `open_mfdataset` doesn't throw a segmentation fault; instead I run into the string unicode/ascii problem. I guess `h5netcdf` and `netcdf4` use the same netcdf/hdf5 libraries, don't they? So if it works for `h5netcdf` then it should work for `netcdf4` as well...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-118091969,https://api.github.com/repos/pydata/xarray/issues/444,118091969,MDEyOklzc3VlQ29tbWVudDExODA5MTk2OQ==,1177508,2015-07-02T16:55:02Z,2015-07-02T16:55:02Z,NONE,"Yes, I'm using the same files that I [once uploaded on Dropbox](https://www.dropbox.com/sh/wi7s59wu1soh5f3/AABxHjZ1ssXzIeAMwLG5cKAea?dl=0) for you to play with for #443. I'm not doing anything special, just passing the glob pattern to `open_mfdataset` with no `engine` option (which I guess defaults to `netcdf4`).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-117993960,https://api.github.com/repos/pydata/xarray/issues/444,117993960,MDEyOklzc3VlQ29tbWVudDExNzk5Mzk2MA==,1177508,2015-07-02T10:36:06Z,2015-07-02T12:18:09Z,NONE,"OK... as a follow-up, I did some tests: with `netcdf4` I got this error again, but using `open_mfdataset` with the latest versions of `h5py` & `h5netcdf` I don't. However, there are some decodings that aren't happening now... for whatever reason (maybe `h5netcdf`?).
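
For reference, here's a minimal sketch of the two calls I'm comparing (the file pattern is made up; the `concat_dim='time'` part matches my actual script, see the traceback below):

```python
import xray

# default engine ('netcdf4'): this is the call that segfaults for me
ds = xray.open_mfdataset('mydata_*.nc', concat_dim='time')

# 'h5netcdf' engine: no segfault, but string attributes come back as bytes
ds = xray.open_mfdataset('mydata_*.nc', concat_dim='time', engine='h5netcdf')
```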
Anyway, my netcdf files store the attributes as `'ascii'`, that is, `byte`s in Python, so when the time units get checked I get:

```
Traceback (most recent call last):
  File ""segfault.py"", line 62, in <module>
    concat_dim='time', engine='h5netcdf'))
  File ""/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/backends/api.py"", line 202, in open_mfdataset
    datasets = [open_dataset(p, **kwargs) for p in paths]
  File ""/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/backends/api.py"", line 202, in <listcomp>
    datasets = [open_dataset(p, **kwargs) for p in paths]
  File ""/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/backends/api.py"", line 145, in open_dataset
    return maybe_decode_store(store)
  File ""/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/backends/api.py"", line 101, in maybe_decode_store
    concat_characters=concat_characters, decode_coords=decode_coords)
  File ""/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/conventions.py"", line 850, in decode_cf
    decode_coords)
  File ""/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/conventions.py"", line 791, in decode_cf_variables
    decode_times=decode_times)
  File ""/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/conventions.py"", line 735, in decode_cf_variable
    if 'since' in attributes['units']:
TypeError: Type str doesn't support the buffer API
```

This is simple to solve... just have every `byte` attribute decoded to `'utf8'` when first reading in the variables. I'll have some more time to look at this later today.

_edit_: boy... there are some differences between these packages (`netcdf4` & `h5netcdf`). When trying to `open_mfdataset` with `netcdf4` I get the segmentation fault; when I open it with `h5netcdf` I don't, but the attributes are in `byte`s, so `xray` then gives some errors when trying to get the date/time. `netcdf4` doesn't produce this error; it probably converts the `byte`s to `str`ings internally. So I went in and patched some `.decode('utf8')` calls here and there in `xray`, and it works... when using `h5netcdf`, but then I get another error from `h5netcdf`:

```
  File ""/ichec/home/users/razvan/.local/lib/python3.4/site-packages/h5py/_hl/attrs.py"", line 55, in __getitem__
    raise IOError(""Empty attributes cannot be read"")
OSError: Empty attributes cannot be read
```

I didn't include the full error because I don't think it's relevant. Anyway, needless to say... `netcdf4` doesn't give this error, so these things need to be brought into accordance somehow :)

_edit2_: so I was going through the posts here and now I saw you addressed this issue using that `lock` argument, which is set to `True` by default in `open_dataset`, right? Well, I don't know exactly what it is supposed to do, but I'm still getting a segmentation fault, though as stated before, only when using `netcdf4`, not `h5netcdf`; with `h5netcdf` I run into that `ascii` vs `utf8` inconsistency instead... maybe I should open a separate issue about this string problem? I don't know if it's an upstream issue or not; I guess `h5netcdf` just decides not to convert `ascii` to `utf8`, whereas `netcdf4` goes with the more contemporary approach of returning `utf8`... or is this handled internally by `xray`?
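
To make the `.decode('utf8')` idea from the first edit concrete, here is roughly what I mean, as a user-side sketch rather than the actual patch inside `xray` (`decode_bytes_attrs` is just a hypothetical helper name):

```python
def decode_bytes_attrs(ds):
    # assumption: every bytes attribute is just ascii/utf8-encoded text
    for var in ds.variables.values():
        var.attrs = {k: (v.decode('utf8') if isinstance(v, bytes) else v)
                     for k, v in var.attrs.items()}
    return ds
```

With the attributes decoded to `str` like this, the `'since' in attributes['units']` check from the traceback above would stop failing.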
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107 https://github.com/pydata/xarray/issues/444#issuecomment-117217039,https://api.github.com/repos/pydata/xarray/issues/444,117217039,MDEyOklzc3VlQ29tbWVudDExNzIxNzAzOQ==,1177508,2015-06-30T14:55:58Z,2015-06-30T14:55:58Z,NONE,"Well... I have a couple of remarks to make. After some more thought about this it might have been all along my fault. Let me explain. I have this machine at work where I don't have administrative privileges so I decided to give `linuxbrew` a try. Now there are some system `hdf5` libraries (but in custom locations) and they have this `module` command to load different versions of packages and set up proper environment variables. Before I had this issue, I did have `xray` installed with `dask` and everything compiled against the system libraries (and I had no problems with it). Then, with `linuxbrew` I started getting this weird behavior, using the latest version of `hdf5` (1.8.14), but then I tried with version (1.8.13) and I had the same issue. Then I read somewhere on the net that... because of this mixture of local - system install with `linuxbrew` there might be issues when compiling, that is, the compiler uses versions of some header files that don't necessarily match local installed libraries. I can't confirm this any more though cause I reconfigured everything and removed `linuxbrew` cause it was producing more problems than solving... but I'll be happy to give the current installation a try and see if I can reproduce the error... can't do more than this though... sorry. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107 https://github.com/pydata/xarray/issues/444#issuecomment-116146897,https://api.github.com/repos/pydata/xarray/issues/444,116146897,MDEyOklzc3VlQ29tbWVudDExNjE0Njg5Nw==,1177508,2015-06-27T21:33:30Z,2015-06-27T21:33:30Z,NONE,"So I just tried @mrocklin's idea with using single-threaded stuff. This seems to fix the segmentation fault, but I am very curious as to why there's a problem with working in parallel. I tried two different hdf5 libraries (I think version 1.8.13 and 1.8.14) but I got the same segmentation fault. Anyway, working on a single thread is not a big deal, I'll just do that for the time being... I already tried `gdb` on python but I'm not experienced enough to make heads or tails of it... I have the `gdb` backtrace [here](https://gist.github.com/razvanc87/0986c4f7a591772e1778) but I don't know what to do with it... @shoyer, the files are not the issue here, they're the same ones I provided in #443. Question: does the hdf5 library need to be built with parallel support (mpi or something) maybe?... thanks guys ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107 https://github.com/pydata/xarray/issues/444#issuecomment-115906191,https://api.github.com/repos/pydata/xarray/issues/444,115906191,MDEyOklzc3VlQ29tbWVudDExNTkwNjE5MQ==,1177508,2015-06-26T22:10:46Z,2015-06-26T22:22:11Z,NONE,"Just tried `engine='h5netcdf'`. Still get the segfault. It looks to me that something doesn't properly initialize the hdf5 library and calling that `isnull` function like this somehow triggers some initialization for the both arrays. It might also be the `&` operator... 
because if I do `isnull(arr1) & isnull(arr2)` I still get the segmentation fault; only when using `isnull(arr1 & arr2)` does it seem to work... strange things. _edit:_ I was right... it's actually the `&` operator: I just need to call `arr1 & arr2` before the return statement and I don't get the segmentation fault...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-115900337,https://api.github.com/repos/pydata/xarray/issues/444,115900337,MDEyOklzc3VlQ29tbWVudDExNTkwMDMzNw==,1177508,2015-06-26T21:50:01Z,2015-06-26T21:53:50Z,NONE,"Unfortunately I can't use `engine='scipy'` because they're not netcdf3 files, so it defaults to `'netcdf4'`. On the other hand, [here](https://gist.github.com/razvanc87/0986c4f7a591772e1778) you can find the backtrace from `gdb`... if that helps in any way...

```
print(arr1.dtype, arr2.dtype)
print((arr1 == arr2))
print((arr1 == arr2) | (isnull(arr1) & isnull(arr2)))

# gives:
float64 float64
dask.array
dask.array
```

The funny thing is that when I add these print statements I (sometimes) get an actual traceback from Python; without them I would only get a segmentation fault with no additional information. For example, just now, after introducing these `print`s, I got [this](https://gist.github.com/razvanc87/82dc9635f89b55ffaf46) traceback. This doesn't seem to be an `xray` bug, I mean it can't be since it's just Python code... but any help is appreciated. Thanks!

_edit:_ oh yeah... this is a funny thing. If I do `print(((arr1 == arr2) | (isnull(arr1) & isnull(arr2))).all())`, I get `dask.array`, which I guess is a problem... so calling that `all` method kind of screws things up, or at least calls other stuff that screws it up, but I have no idea why calling `isnull(arr1 & arr2)` before all this makes it run without a segfault.
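
_edit2:_ for anyone trying to reproduce the lazy-evaluation part without my netcdf files, here's a self-contained sketch with plain `dask` arrays (made-up data; `da.isnan` stands in for `xray`'s `isnull`, which amounts to the same thing for float data):

```python
import dask.array as da
import numpy as np

arr1 = da.from_array(np.arange(4.0), chunks=2)
arr2 = da.from_array(np.arange(4.0), chunks=2)

# building the comparison graph is cheap and doesn't touch any data yet
equal = ((arr1 == arr2) | (da.isnan(arr1) & da.isnan(arr2))).all()
print(equal)            # still a lazy dask.array, just like above
print(equal.compute())  # True; the actual work only happens here
```

so printing the expression only shows the graph; the reads (and whatever triggers the segfault) run only once something forces a compute...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107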