issue_comments

26 rows where issue = 91184107 sorted by updated_at descending

user (4 distinct values)
  • shoyer 12
  • razcore-rad 8
  • mrocklin 5
  • andrewcollette 1

author_association (2 distinct values)
  • MEMBER 17
  • NONE 9

issue (1 distinct value)
  • segmentation fault with `open_mfdataset` · 26
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
120447670 https://github.com/pydata/xarray/issues/444#issuecomment-120447670 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDEyMDQ0NzY3MA== shoyer 1217238 2015-07-10T16:11:19Z 2015-07-10T16:11:19Z MEMBER

@razvanc87 I've gotten a few other reports of issues with multithreading (not just you), so I think we do definitely need to add our own lock when accessing these files. Misconfigured hdf5 installs may not be so uncommon.
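A minimal sketch of the kind of global lock being discussed (the names and structure are illustrative, not xray's actual internals):

```python
import threading

# One lock shared by every file handle, because the HDF5 library itself is
# not thread safe: no two threads may be inside it at the same time.
GLOBAL_HDF5_LOCK = threading.Lock()

def read_slice(nc_variable, key):
    # Serialize every read that goes through the netCDF/HDF5 C library.
    with GLOBAL_HDF5_LOCK:
        return nc_variable[key]
```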

119698728 https://github.com/pydata/xarray/issues/444#issuecomment-119698728 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExOTY5ODcyOA== razcore-rad 1177508 2015-07-08T19:07:41Z 2015-07-08T19:07:41Z NONE

I think this issue can be closed. After some digging and playing with different netcdf4 modules, I'm pretty certain it was a linkage and compilation issue between the system hdf5 and netcdf libraries. You see, the computer I got this error on is one of those "module load" managed supercomputers... and somewhere along the way things got messed up while compiling python modules...

118436430 https://github.com/pydata/xarray/issues/444#issuecomment-118436430 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExODQzNjQzMA== andrewcollette 3101370 2015-07-03T23:02:52Z 2015-07-03T23:02:52Z NONE

@shoyer, there are basically two levels of thread safety for HDF5/h5py. First, the HDF5 library has an optional compile-time "threadsafe" build option that wraps all API access in a lock. This is all-or-nothing; I'm not aware of any per-file effects.

Second, h5py uses its own global lock on the Python side to serialize access, which is only disabled in MPI mode. For added protection, h5py also does not presently release the GIL around reads/writes.

118435615 https://github.com/pydata/xarray/issues/444#issuecomment-118435615 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExODQzNTYxNQ== shoyer 1217238 2015-07-03T22:43:41Z 2015-07-03T22:43:41Z MEMBER

@razvanc87 netcdf4 and h5py use the same HDF5 libraries, but have different bindings from Python. H5py likely does a more careful job of using its own locks to ensure thread safety, which likely explains the difference you are seeing (the attribute encoding is a separate issue).

118435484 https://github.com/pydata/xarray/issues/444#issuecomment-118435484 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExODQzNTQ4NA== shoyer 1217238 2015-07-03T22:40:57Z 2015-07-03T22:40:57Z MEMBER

> The library itself is not threadsafe? What about on a per-file basis?

@andrewcollette could you comment on this for h5py/hdf5?

@mrocklin based on my reading of Andrew's comment in the h5py issue, this is indeed the case.

118373477 https://github.com/pydata/xarray/issues/444#issuecomment-118373477 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExODM3MzQ3Nw== razcore-rad 1177508 2015-07-03T15:28:16Z 2015-07-03T15:28:16Z NONE

On a per-file basis (open_dataset) there's no problem... but again, if I try the h5netcdf engine, open_mfdataset doesn't throw a segmentation fault, but then I run into the string unicode/ascii problem. So I guess h5netcdf and netcdf4 use the same netcdf/hdf5 libraries, don't they? So if it works for h5netcdf then it should work for netcdf4 as well...

118373188 https://github.com/pydata/xarray/issues/444#issuecomment-118373188 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExODM3MzE4OA== mrocklin 306380 2015-07-03T15:26:18Z 2015-07-03T15:26:18Z MEMBER

The library itself is not threadsafe? What about on a per-file basis?

118195247 https://github.com/pydata/xarray/issues/444#issuecomment-118195247 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExODE5NTI0Nw== shoyer 1217238 2015-07-02T23:45:01Z 2015-07-02T23:45:01Z MEMBER

Ah, I think I know why the seg faults are still occurring. By default, dask.array.from_array uses a thread lock that is specific to each array variable. We need a global thread lock, because the HDF5 library is not thread safe.

@mrocklin maybe da.from_array should use a global thread lock if lock=True? Alternatively, I could just change this in xray -- but I suspect that other dask users who want a lock also probably want a global lock.
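A sketch of the distinction being made, assuming da.from_array accepts either lock=True or an explicit lock object for its lock argument (true of current dask; the arrays here are plain numpy stand-ins for netCDF variables):

```python
import threading

import dask.array as da
import numpy as np

x = np.zeros((50, 39, 59))  # stand-ins for netCDF variables
y = np.zeros((50, 39, 59))

# Per-variable lock: lock=True gives each array its own lock, so reads of
# two different variables can still hit the HDF5 library concurrently.
a = da.from_array(x, chunks=(50, 39, 59), lock=True)

# Global lock: share one Lock object across all variables, so no two reads
# ever enter the (thread-unsafe) HDF5 library at the same time.
hdf5_lock = threading.Lock()
b = da.from_array(x, chunks=(50, 39, 59), lock=hdf5_lock)
c = da.from_array(y, chunks=(50, 39, 59), lock=hdf5_lock)
```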

118091969 https://github.com/pydata/xarray/issues/444#issuecomment-118091969 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExODA5MTk2OQ== razcore-rad 1177508 2015-07-02T16:55:02Z 2015-07-02T16:55:02Z NONE

Yes, I'm using the same files that I once uploaded on Dropbox for you to play with for #443. I'm not doing anything special, just passing in the glob pattern to open_mfdataset with no option for engine (which I guess goes for netcdf4 by default).
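For concreteness, the call pattern being described looks like this (the glob is a placeholder; concat_dim='time' matches the traceback quoted later in the thread):

```python
import xray  # the project later renamed to xarray

# No engine is passed, so the netcdf4 backend is used by default.
ds = xray.open_mfdataset('dropbox_files_*.nc', concat_dim='time')
```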

118090209 https://github.com/pydata/xarray/issues/444#issuecomment-118090209 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExODA5MDIwOQ== shoyer 1217238 2015-07-02T16:46:57Z 2015-07-02T16:46:57Z MEMBER

Thanks for your help debugging!

I made a new issue for ascii attributes handling: https://github.com/xray/xray/issues/451

This is one case where Python 3's insistence that bytes and strings are different is annoying. I'll probably have to decode all bytes type attributes read from h5netcdf.

How do you trigger the seg-fault with netcdf4-python? Just using open_mfdataset as before? I'm a little surprised that still happens with the thread lock.

117993960 https://github.com/pydata/xarray/issues/444#issuecomment-117993960 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExNzk5Mzk2MA== razcore-rad 1177508 2015-07-02T10:36:06Z 2015-07-02T12:18:09Z NONE

OK... as a follow-up, I did some tests and with netcdf4 I got this error again, but using open_mfdataset with the latest versions of h5py & h5netcdf I don't. But there are some decodings that aren't happening now... for whatever reason (maybe h5netcdf?). Anyway, my netcdf files store the attributes in 'ascii', that is, bytes in python so when trying to check for the time I get:

```
Traceback (most recent call last):
  File "segfault.py", line 62, in <module>
    concat_dim='time', engine='h5netcdf'))
  File "/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/backends/api.py", line 202, in open_mfdataset
    datasets = [open_dataset(p, **kwargs) for p in paths]
  File "/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/backends/api.py", line 202, in <listcomp>
    datasets = [open_dataset(p, **kwargs) for p in paths]
  File "/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/backends/api.py", line 145, in open_dataset
    return maybe_decode_store(store)
  File "/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/backends/api.py", line 101, in maybe_decode_store
    concat_characters=concat_characters, decode_coords=decode_coords)
  File "/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/conventions.py", line 850, in decode_cf
    decode_coords)
  File "/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/conventions.py", line 791, in decode_cf_variables
    decode_times=decode_times)
  File "/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/conventions.py", line 735, in decode_cf_variable
    if 'since' in attributes['units']:
TypeError: Type str doesn't support the buffer API
```

This is simple to solve... just have every byte attribute decode to 'utf8' when first reading in the variables... I'll have some more time to look at this later today.
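A sketch of that workaround, assuming attributes arrive as a plain dict (the helper name is hypothetical, not part of xray):

```python
def decode_bytes_attrs(attrs):
    # Decode bytes-valued attributes (e.g. b'hours since 2000-01-01') to str,
    # so checks like "'since' in attrs['units']" work on Python 3.
    return {k: v.decode('utf-8') if isinstance(v, bytes) else v
            for k, v in attrs.items()}

attrs = {'units': b'hours since 2000-01-01', 'long_name': b'time'}
print(decode_bytes_attrs(attrs)['units'])  # -> 'hours since 2000-01-01'
```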

edit: boy... there are some differences between these packages (netcdf4 & h5netcdf)... so, when trying open_mfdataset with netcdf4 I get the segmentation fault... when I open it with h5netcdf I don't, but the attributes are in bytes so then xray gives some errors when trying to get the date/time... but netcdf4 doesn't produce this error, it probably converts the bytes to strings internally... so I went in and tried to patch some .decode('utf8') here and there in xray and it works... when using h5netcdf, but then I get another error from h5netcdf:

```
  File "/ichec/home/users/razvan/.local/lib/python3.4/site-packages/h5py/_hl/attrs.py", line 55, in __getitem__
    raise IOError("Empty attributes cannot be read")
OSError: Empty attributes cannot be read
```

I didn't include the full error because I don't think it's relevant. Anyway, needless to say... netcdf4 doesn't give this error... so these things need to be reconciled somehow :)

edit2: so I was going through the posts here and now I saw you addressed this issue using that lock thing, which is set to True by default in open_dataset, right? Well, I don't know exactly what this thing is supposed to do, but I'm still getting a segmentation fault, but as stated before, only when using netcdf4, not h5netcdf, and then I run into that inconsistency with the ascii vs utf8 issue if I use h5netcdf... maybe I should open an issue about this string problem? I don't know if this is an upstream issue or not, I mean, I guess h5netcdf just decides not to convert the ascii to utf8, whereas netcdf4 goes with the more contemporary approach of returning utf8... or is this handled internally by xray?

117217039 https://github.com/pydata/xarray/issues/444#issuecomment-117217039 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExNzIxNzAzOQ== razcore-rad 1177508 2015-06-30T14:55:58Z 2015-06-30T14:55:58Z NONE

Well... I have a couple of remarks to make. After some more thought, this might have been my fault all along. Let me explain. I have this machine at work where I don't have administrative privileges, so I decided to give linuxbrew a try. There are some system hdf5 libraries (in custom locations), and they have this module command to load different versions of packages and set up the proper environment variables. Before I had this issue, I had xray installed with dask and everything compiled against the system libraries (and I had no problems with it). Then, with linuxbrew, I started getting this weird behavior, using the latest version of hdf5 (1.8.14), but then I tried version 1.8.13 and had the same issue. Then I read somewhere on the net that, because of this mixture of local and system installs with linuxbrew, there might be issues when compiling, that is, the compiler uses versions of some header files that don't necessarily match the locally installed libraries. I can't confirm this any more though, because I reconfigured everything and removed linuxbrew since it was producing more problems than it was solving... but I'll be happy to give the current installation a try and see if I can reproduce the error... can't do more than this though... sorry.

116787098 https://github.com/pydata/xarray/issues/444#issuecomment-116787098 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExNjc4NzA5OA== shoyer 1217238 2015-06-29T18:30:48Z 2015-06-29T18:30:48Z MEMBER

@razvanc87 What version of h5py were you using with h5netcdf? @andrewcollette suggests (https://github.com/h5py/h5py/issues/591#issuecomment-116785660) that h5py should already have the lock that fixes this issue if you were using h5py 2.4.0 or later.

116779716 https://github.com/pydata/xarray/issues/444#issuecomment-116779716 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExNjc3OTcxNg== shoyer 1217238 2015-06-29T18:07:52Z 2015-06-29T18:07:52Z MEMBER

Just merged the fix to master.

@razvanc87 if you could try installing the development version, I would love to hear if this resolves your issues.

116189535 https://github.com/pydata/xarray/issues/444#issuecomment-116189535 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExNjE4OTUzNQ== shoyer 1217238 2015-06-28T03:34:30Z 2015-06-28T03:34:30Z MEMBER

I have a tentative fix (adding the threading lock) in https://github.com/xray/xray/pull/446

Still wondering why multi-threading can't use more than one CPU -- hopefully my h5py issue (referenced above) will get us some answers.

116182511 https://github.com/pydata/xarray/issues/444#issuecomment-116182511 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExNjE4MjUxMQ== mrocklin 306380 2015-06-28T01:55:39Z 2015-06-28T01:55:39Z MEMBER

Oh, I didn't realize that that was built in already. Sounds like you could handle this easily on the xray side.

On Jun 27, 2015 4:40 PM, "Stephan Hoyer" notifications@github.com wrote:

> Of course, concurrent access to HDF5 files works fine on my laptop, using Anaconda's build of HDF5 (version 1.8.14). I have no idea what special flags they invoked when building it :).
>
> That said, I have been unable to produce any benchmarks that show improved performance when simply doing multithreaded reads without doing any computation (e.g., %time xray.open_dataset(..., chunks=...).load()). Even when I'm reading multiple independent chunks compressed on disk, CPU seems to be pegged at 100%, when using either netCDF4-python or h5py (via h5netcdf) to read the data. For non-compressed data, reads seem to be limited by disk speed, so CPU is also not relevant.
>
> Given these considerations, it seems like we should use a lock when reading data into xray with dask. @mrocklin, we could just use lock=True with da.from_array, right? If we can find use cases for multi-threaded reads, we could also add an optional lock argument to open_dataset/open_mfdataset.

116165986 https://github.com/pydata/xarray/issues/444#issuecomment-116165986 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExNjE2NTk4Ng== shoyer 1217238 2015-06-27T23:40:29Z 2015-06-27T23:40:29Z MEMBER

Of course, concurrent access to HDF5 files works fine on my laptop, using Anaconda's build of HDF5 (version 1.8.14). I have no idea what special flags they invoked when building it :).

That said, I have been unable to produce any benchmarks that show improved performance when simply doing multithreaded reads without doing any computation (e.g., %time xray.open_dataset(..., chunks=...).load()). Even when I'm reading multiple independent chunks compressed on disk, CPU seems to be pegged at 100%, when using either netCDF4-python or h5py (via h5netcdf) to read the data. For non-compressed data, reads seem to be limited by disk speed, so CPU is also not relevant.

Given these considerations, it seems like we should use a lock when reading data into xray with dask. @mrocklin we could just use lock=True with da.from_array, right? If we can find use cases for multi-threaded reads, we could also add an optional lock argument to open_dataset/open_mfdataset.
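For reference, the shape of that benchmark in an IPython session (the file name and chunk sizes are placeholders, not the ones actually used):

```python
# IPython session; %time is an IPython magic rather than plain Python
import xray

%time xray.open_dataset('example.nc', chunks={'time': 10}).load()
```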

116162351 https://github.com/pydata/xarray/issues/444#issuecomment-116162351 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExNjE2MjM1MQ== mrocklin 306380 2015-06-27T22:12:37Z 2015-06-27T22:12:37Z MEMBER

There was a similar problem with PyTables, which didn't support concurrency well. This resulted in the from-hdf5 function in dask array which uses explicit locks to avoid concurrent access.

We could repeat this treatment more generally without much trouble, to force single-threaded data access but still allow parallelism otherwise.

On Jun 27, 2015 2:33 PM, "Răzvan Rădulescu" notifications@github.com wrote:

> So I just tried @mrocklin's idea with using single-threaded stuff. This seems to fix the segmentation fault, but I am very curious as to why there's a problem with working in parallel. I tried two different hdf5 libraries (I think versions 1.8.13 and 1.8.14) but I got the same segmentation fault. Anyway, working on a single thread is not a big deal, I'll just do that for the time being... I already tried gdb on python but I'm not experienced enough to make heads or tails of it... I have the gdb backtrace here (https://gist.github.com/razvanc87/0986c4f7a591772e1778) but I don't know what to do with it...
>
> @shoyer, the files are not the issue here, they're the same ones I provided in #443 (https://github.com/xray/xray/issues/443).
>
> Question: does the hdf5 library need to be built with parallel support (mpi or something) maybe?... thanks guys

116146897 https://github.com/pydata/xarray/issues/444#issuecomment-116146897 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExNjE0Njg5Nw== razcore-rad 1177508 2015-06-27T21:33:30Z 2015-06-27T21:33:30Z NONE

So I just tried @mrocklin's idea with using single-threaded stuff. This seems to fix the segmentation fault, but I am very curious as to why there's a problem with working in parallel. I tried two different hdf5 libraries (I think version 1.8.13 and 1.8.14) but I got the same segmentation fault. Anyway, working on a single thread is not a big deal, I'll just do that for the time being... I already tried gdb on python but I'm not experienced enough to make heads or tails of it... I have the gdb backtrace here but I don't know what to do with it...

@shoyer, the files are not the issue here, they're the same ones I provided in #443.

Question: does the hdf5 library need to be built with parallel support (mpi or something) maybe?... thanks guys

115930797 https://github.com/pydata/xarray/issues/444#issuecomment-115930797 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExNTkzMDc5Nw== mrocklin 306380 2015-06-27T01:09:44Z 2015-06-27T01:09:44Z MEMBER

Alternatively can we try doing the operations that xray would do manually and see if one of them triggers something?

One could also try

$ gdb python
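A typical way to use that to capture the crash (the script name is a placeholder; bt prints the C-level backtrace once the segfault happens):

```
$ gdb python
(gdb) run segfault.py
(gdb) bt
```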

115930685 https://github.com/pydata/xarray/issues/444#issuecomment-115930685 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExNTkzMDY4NQ== mrocklin 306380 2015-06-27T01:08:13Z 2015-06-27T01:08:13Z MEMBER

@shoyer asked me to chime in in case this is an issue with dask. One thing to try would be to remove multi-threading from the equation. I'm not sure how this would affect things but it's worth a shot.

```python
import dask
from dask.async import get_sync

dask.set_options(get=get_sync)  # use single-threaded scheduler by default

# ... do work as normal
```
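A usage sketch under the assumption that nothing else changes: with the synchronous scheduler set, the same open_mfdataset call reads every chunk on a single thread, which takes dask's multithreading out of the picture (the glob is a placeholder):

```python
import dask
from dask.async import get_sync
import xray

dask.set_options(get=get_sync)      # single-threaded scheduler, as above
ds = xray.open_mfdataset('*.nc')    # otherwise the same call as before
ds.load()                           # chunks are now read one at a time
```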

115925776 https://github.com/pydata/xarray/issues/444#issuecomment-115925776 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExNTkyNTc3Ng== shoyer 1217238 2015-06-27T00:49:19Z 2015-06-27T00:49:19Z MEMBER

do you have an example file? this might also be your HDF5 install....

115906191 https://github.com/pydata/xarray/issues/444#issuecomment-115906191 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExNTkwNjE5MQ== razcore-rad 1177508 2015-06-26T22:10:46Z 2015-06-26T22:22:11Z NONE

Just tried engine='h5netcdf'. Still get the segfault. It looks to me like something doesn't properly initialize the hdf5 library, and calling that isnull function like this somehow triggers some initialization for both arrays. It might also be the & operator... because if I do isnull(arr1) & isnull(arr2) I still get the segmentation fault. Only when using isnull(arr1 & arr2) does it seem to work... strange things.

edit: I was right... it's actually the & operator, I just need to call arr1 & arr2 before the return statement and I don't get the segmentation fault...

115902800 https://github.com/pydata/xarray/issues/444#issuecomment-115902800 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExNTkwMjgwMA== shoyer 1217238 2015-06-26T22:01:41Z 2015-06-26T22:01:41Z MEMBER

Another backend to try would be engine='h5netcdf': https://github.com/shoyer/h5netcdf

That might help us identify if this is a netCDF4-python bug.

I am also baffled by how inserting isnull(arr1 & arr2) avoids the seg fault. This is a lazy computation created with dask that is immediately thrown away without accessing any of the values.

115900337 https://github.com/pydata/xarray/issues/444#issuecomment-115900337 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExNTkwMDMzNw== razcore-rad 1177508 2015-06-26T21:50:01Z 2015-06-26T21:53:50Z NONE

Unfortunately I can't use engine='scipy' because they're not netcdf3 files, so it defaults to 'netcdf4'. On the other hand, here you can find the back trace from gdb... if that helps in any way...

```python
print(arr1.dtype, arr2.dtype)
print((arr1 == arr2))
print((arr1 == arr2) | (isnull(arr1) & isnull(arr2)))
```

gives:

```
float64 float64
dask.array<x_1, shape=(50, 39, 59), chunks=((50,), (39,), (59,)), dtype=bool>
dask.array<x_6, shape=(50, 39, 59), chunks=((50,), (39,), (59,)), dtype=bool>
```

Funny thing is, when I'm adding these print statements and so on I get some traceback from Python (sometimes). Without them I would only get a segmentation fault with no additional information. For example, just now, after introducing these prints I got this traceback. This doesn't seem to be an xray bug, I mean it can't be since it's just Python code... but any help is appreciated. Thanks!

edit: oh yeah... this is a funny thing. If I do print(((arr1 == arr2) | (isnull(arr1) & isnull(arr2))).all()), I get dask.array<x_13, shape=(), chunks=(), dtype=bool>, which I guess is a problem... so calling that all method kind of screws things up, or at least calls other stuff that screws it up, but I have no idea why calling isnull(arr1 & arr2) before all this... makes it run without a segfault.

115887568 https://github.com/pydata/xarray/issues/444#issuecomment-115887568 https://api.github.com/repos/pydata/xarray/issues/444 MDEyOklzc3VlQ29tbWVudDExNTg4NzU2OA== shoyer 1217238 2015-06-26T21:25:50Z 2015-06-26T21:25:50Z MEMBER

Oh my, that's bad!

Can you experiment with the engine argument to open_mfdataset and see if that changes things? For example, try engine='scipy' (if these are netcdf3 files) and engine='netcdf4'.

It would also be helpful to report the dtypes of the arrays that trigger the failure in array_equiv.
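Concretely, the kind of experiment being asked for might look like this (the file pattern is a placeholder, and a hard segfault will still kill the process before the except clause runs):

```python
import xray

for engine in ['netcdf4', 'scipy', 'h5netcdf']:
    try:
        ds = xray.open_mfdataset('*.nc', engine=engine)
        # report the dtypes of the variables being compared
        print(engine, {name: str(v.dtype) for name, v in ds.variables.items()})
    except Exception as exc:
        print(engine, 'failed:', exc)
```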



CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);