html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/444#issuecomment-119698728,https://api.github.com/repos/pydata/xarray/issues/444,119698728,MDEyOklzc3VlQ29tbWVudDExOTY5ODcyOA==,1177508,2015-07-08T19:07:41Z,2015-07-08T19:07:41Z,NONE,"I think this issue can be closed. After some digging and playing with different `netcdf4` modules, I'm fairly certain it was a linkage and compilation mismatch between the system `hdf5` and `netcdf` libraries. You see, the computer I got this error on is one of those ""module load"" managed supercomputers... and somewhere along the way things got mixed up while compiling the Python modules...
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-118436430,https://api.github.com/repos/pydata/xarray/issues/444,118436430,MDEyOklzc3VlQ29tbWVudDExODQzNjQzMA==,3101370,2015-07-03T23:02:52Z,2015-07-03T23:02:52Z,NONE,"@shoyer, there are basically two levels of thread safety for HDF5/h5py. First, the HDF5 library has an optional compile-time ""threadsafe"" build option that wraps all API access in a lock. This is all-or-nothing; I'm not aware of any per-file effects.
Second, h5py uses its own global lock on the Python side to serialize access, which is only disabled in MPI mode. For added protection, h5py also does not presently release the GIL around reads/writes.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-118373477,https://api.github.com/repos/pydata/xarray/issues/444,118373477,MDEyOklzc3VlQ29tbWVudDExODM3MzQ3Nw==,1177508,2015-07-03T15:28:16Z,2015-07-03T15:28:16Z,NONE,"On a per-file basis (`open_dataset`) there's no problem... but again, with the `h5netcdf` engine `open_mfdataset` doesn't throw a segmentation fault; instead I run into the string unicode/ascii problem. I assume `h5netcdf` and `netcdf4` use the same netcdf/hdf5 libraries, don't they? So if it works for `h5netcdf`, it should work for `netcdf4` as well...
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-118091969,https://api.github.com/repos/pydata/xarray/issues/444,118091969,MDEyOklzc3VlQ29tbWVudDExODA5MTk2OQ==,1177508,2015-07-02T16:55:02Z,2015-07-02T16:55:02Z,NONE,"Yes, I'm using the same files that I [once uploaded on Dropbox](https://www.dropbox.com/sh/wi7s59wu1soh5f3/AABxHjZ1ssXzIeAMwLG5cKAea?dl=0) for you to play with for #443. I'm not doing anything special, just passing the glob pattern to `open_mfdataset` with no `engine` option (which I guess defaults to `netcdf4`).
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-117993960,https://api.github.com/repos/pydata/xarray/issues/444,117993960,MDEyOklzc3VlQ29tbWVudDExNzk5Mzk2MA==,1177508,2015-07-02T10:36:06Z,2015-07-02T12:18:09Z,NONE,"OK... as a follow-up, I did some tests: with `netcdf4` I got this error again, but using `open_mfdataset` with the latest versions of `h5py` & `h5netcdf` I don't. However, some decodings aren't happening now... for whatever reason (maybe `h5netcdf`?). Anyway, my netcdf files store the attributes as `'ascii'`, that is, as `byte`s in Python, so when trying to check for the time I get:
```
Traceback (most recent call last):
  File ""segfault.py"", line 62, in <module>
    concat_dim='time', engine='h5netcdf'))
  File ""/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/backends/api.py"", line 202, in open_mfdataset
    datasets = [open_dataset(p, **kwargs) for p in paths]
  File ""/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/backends/api.py"", line 202, in <listcomp>
    datasets = [open_dataset(p, **kwargs) for p in paths]
  File ""/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/backends/api.py"", line 145, in open_dataset
    return maybe_decode_store(store)
  File ""/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/backends/api.py"", line 101, in maybe_decode_store
    concat_characters=concat_characters, decode_coords=decode_coords)
  File ""/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/conventions.py"", line 850, in decode_cf
    decode_coords)
  File ""/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/conventions.py"", line 791, in decode_cf_variables
    decode_times=decode_times)
  File ""/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/conventions.py"", line 735, in decode_cf_variable
    if 'since' in attributes['units']:
TypeError: Type str doesn't support the buffer API
```
This is simple to solve: just have every `byte` attribute decode to `'utf8'` when first reading in the variables... I'll have some more time to look at this later today.
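A minimal sketch of that fix; `decode_attrs` is a hypothetical helper, not actual `xray` code:

```python
def decode_attrs(attrs):
    # Hypothetical helper: decode every bytes-valued attribute to str so
    # Python 3 membership checks like 'since' in attrs['units'] stop
    # raising TypeError when the file stores attributes as ascii bytes.
    return {key: (val.decode('utf8') if isinstance(val, bytes) else val)
            for key, val in attrs.items()}
```

With this, `decode_attrs({'units': b'days since 1950-01-01'})['units']` comes back as the plain string `'days since 1950-01-01'`, and the time-decoding check works again.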
_edit_: boy... there are some differences between these packages (`netcdf4` & `h5netcdf`). When I try to `open_mfdataset` with `netcdf4` I get the segmentation fault; when I open with `h5netcdf` I don't, but the attributes are in `byte`s, so `xray` gives some errors when trying to get the date/time. `netcdf4` doesn't produce this error, so it probably converts the `byte`s to `str`ings internally. I went in and patched some `.decode('utf8')` here and there in `xray` and it works... when using `h5netcdf`, but then I get another error from `h5netcdf`:
```
  File ""/ichec/home/users/razvan/.local/lib/python3.4/site-packages/h5py/_hl/attrs.py"", line 55, in __getitem__
    raise IOError(""Empty attributes cannot be read"")
OSError: Empty attributes cannot be read
```
I didn't include the full error because I don't think it's relevant. Needless to say, `netcdf4` doesn't give this error... so these two engines need to be brought into agreement somehow :)
_edit2_: going back through the posts here, I see you addressed this with that `lock` argument, which is set to `True` by default in `open_dataset`, right? I don't know exactly what it's supposed to do, but I'm still getting a segmentation fault, though as stated before only with `netcdf4`, not `h5netcdf`; with `h5netcdf` I instead run into that `ascii` vs `utf8` inconsistency. Maybe I should open a separate issue about the string problem? I don't know if it's an upstream issue: I guess `h5netcdf` just decides not to convert the `ascii` to `utf8`, whereas `netcdf4` goes with the more contemporary approach of returning `utf8`... or is this handled internally by `xray`?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-117217039,https://api.github.com/repos/pydata/xarray/issues/444,117217039,MDEyOklzc3VlQ29tbWVudDExNzIxNzAzOQ==,1177508,2015-06-30T14:55:58Z,2015-06-30T14:55:58Z,NONE,"Well... I have a couple of remarks. After some more thought, this might have been my fault all along. Let me explain. I have this machine at work where I don't have administrative privileges, so I decided to give `linuxbrew` a try. There are some system `hdf5` libraries (in custom locations), and they have this `module` command to load different versions of packages and set up the proper environment variables. Before I hit this issue, I had `xray` installed with `dask` and everything compiled against the system libraries, and I had no problems with it. Then, with `linuxbrew`, I started getting this weird behavior with the latest version of `hdf5` (1.8.14); I also tried version 1.8.13 and had the same issue. I then read somewhere that, because of this mixture of local and system installs with `linuxbrew`, compilation can go wrong: the compiler picks up header files that don't necessarily match the locally installed libraries. I can't confirm this any more, though, because I reconfigured everything and removed `linuxbrew`, since it was producing more problems than it solved... but I'll be happy to give the current installation a try and see if I can reproduce the error. Can't do more than that, though... sorry.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-116146897,https://api.github.com/repos/pydata/xarray/issues/444,116146897,MDEyOklzc3VlQ29tbWVudDExNjE0Njg5Nw==,1177508,2015-06-27T21:33:30Z,2015-06-27T21:33:30Z,NONE,"So I just tried @mrocklin's idea of using a single-threaded scheduler. This seems to fix the segmentation fault, but I am very curious why there's a problem with working in parallel. I tried two different hdf5 libraries (I think versions 1.8.13 and 1.8.14) but got the same segmentation fault. Anyway, working on a single thread is not a big deal; I'll just do that for the time being. I already tried `gdb` on Python, but I'm not experienced enough to make heads or tails of it... I have the `gdb` backtrace [here](https://gist.github.com/razvanc87/0986c4f7a591772e1778) but I don't know what to do with it...
@shoyer, the files are not the issue here, they're the same ones I provided in #443.
Question: does the hdf5 library maybe need to be built with parallel support (MPI or something)?... thanks guys
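For reference, forcing dask onto a single thread looks something like this today (using the modern `dask.config` API; the `dask.set_options(get=...)` spelling from 2015 has since been replaced):

```python
import dask
import dask.array as da

# Force the synchronous (single-threaded) scheduler globally, so no two
# tasks can be inside the non-threadsafe HDF5 library at the same time.
dask.config.set(scheduler='synchronous')

# Any subsequent compute() now runs entirely in the calling thread.
x = da.ones((4, 4), chunks=2)
total = float(x.sum().compute())
```

The trade-off is obvious: no parallelism, but also no concurrent access into HDF5, which is what seems to trigger the segfault here.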
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-115906191,https://api.github.com/repos/pydata/xarray/issues/444,115906191,MDEyOklzc3VlQ29tbWVudDExNTkwNjE5MQ==,1177508,2015-06-26T22:10:46Z,2015-06-26T22:22:11Z,NONE,"Just tried `engine='h5netcdf'`. Still get the segfault. It looks to me like something doesn't properly initialize the hdf5 library, and calling that `isnull` function like this somehow triggers some initialization for both arrays. It might also be the `&` operator... because if I do `isnull(arr1) & isnull(arr2)` I still get the segmentation fault; only `isnull(arr1 & arr2)` seems to work... strange things.
_edit:_ I was right... it's actually the `&` operator: I just need to call `arr1 & arr2` before the return statement and I don't get the segmentation fault...
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107
https://github.com/pydata/xarray/issues/444#issuecomment-115900337,https://api.github.com/repos/pydata/xarray/issues/444,115900337,MDEyOklzc3VlQ29tbWVudDExNTkwMDMzNw==,1177508,2015-06-26T21:50:01Z,2015-06-26T21:53:50Z,NONE,"Unfortunately I can't use `engine='scipy'` because they're not netcdf3 files, so it defaults to `'netcdf4'`. On the other hand, [here](https://gist.github.com/razvanc87/0986c4f7a591772e1778) you can find the backtrace from `gdb`... if that helps in any way...
```
print(arr1.dtype, arr2.dtype)
print((arr1 == arr2))
print((arr1 == arr2) | (isnull(arr1) & isnull(arr2)))
# gives:
float64 float64
dask.array
dask.array
```
The funny thing is, when I add these print statements I sometimes get a traceback from Python; without them I would only get a segmentation fault with no additional information. For example, just now, after introducing these `print`s I got [this](https://gist.github.com/razvanc87/82dc9635f89b55ffaf46) traceback. This doesn't seem to be an `xray` bug, I mean it can't be, since it's just Python code... but any help is appreciated. Thanks!
_edit:_ oh yeah... one more funny thing: if I do `print(((arr1 == arr2) | (isnull(arr1) & isnull(arr2))).all())`, I get `dask.array`, which I guess is the problem... so calling that `all` method kind of screws things up, or at least calls other stuff that screws it up, but I have no idea why calling `isnull(arr1 & arr2)` before all this makes it run without a segfault.
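For what it's worth, the `dask.array` print-out is expected lazy behaviour: comparisons and reductions on dask arrays build a task graph instead of computing, so `.all()` returns another lazy array until something forces evaluation. A small sketch of that (assuming a recent `dask.array`; `isnull` on floats is essentially `isnan`):

```python
import numpy as np
import dask.array as da

arr1 = da.from_array(np.array([1.0, np.nan]), chunks=1)
arr2 = da.from_array(np.array([1.0, np.nan]), chunks=1)

# The elementwise expression and the .all() reduction are both lazy:
# each returns a dask array, which is why print() shows a dask.array repr.
lazy = ((arr1 == arr2) | (da.isnan(arr1) & da.isnan(arr2))).all()

# Only compute() (or coercion like bool()) runs the actual work.
equal = bool(lazy.compute())
```

Here `1.0 == 1.0` covers the first element and `isnan & isnan` covers the second (since `nan != nan`), so the computed result is a single boolean.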
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,91184107