html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/7698#issuecomment-1489744942,https://api.github.com/repos/pydata/xarray/issues/7698,1489744942,IC_kwDOAMm_X85Yy7Qu,10678620,2023-03-30T06:02:41Z,2023-03-30T06:03:03Z,NONE,"Agreed, and a reference to a pretty authoritative source: https://github.com/python/cpython/blob/3.11/Modules/_io/bufferedio.c#L915

It's confusing that the method has a parameter called `filename_or_obj` but doesn't actually handle filenames. One workaround is to use `os.read` when passed a filename and `.read()` when passed a file object. Something similar to:

```python
import io
import os


def get_magic_number(filename_or_obj, count=8):
    if isinstance(filename_or_obj, (str, os.PathLike)):
        fd = os.open(filename_or_obj, os.O_RDONLY)  # Append os.O_BINARY on Windows
        magic_number = os.read(fd, count)
        if len(magic_number) != count:
            raise TypeError(""Error reading magic number"")
        os.close(fd)
    elif isinstance(filename_or_obj, io.BufferedIOBase):
        if filename_or_obj.seekable():
            pos = filename_or_obj.tell()
            filename_or_obj.seek(0)
            magic_number = filename_or_obj.read(count)
            filename_or_obj.seek(pos)
        else:
            raise TypeError(""File not seekable."")
    else:
        raise TypeError(""Cannot read magic number."")
    return magic_number
```

On my laptop (w/ SSD), using `os.read` is about 2x faster than using `.read()`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1646350377
https://github.com/pydata/xarray/issues/7697#issuecomment-1489312337,https://api.github.com/repos/pydata/xarray/issues/7697,1489312337,IC_kwDOAMm_X85YxRpR,10678620,2023-03-29T20:59:24Z,2023-03-29T20:59:24Z,NONE,"@dcherian I'll look at that. I thought the `compat='override'` option bypassed most of the consistency checking. In my case, it is typically safe to assume the set of files is consistent (each file represents one timestep; the structure of each file is otherwise identical).

@headtr1ck I was just informed that the underlying filesystem is actually a networked filesystem. The PR might still be useful, but the latest profile seems more reasonable in light of this new information.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1646267547
https://github.com/pydata/xarray/issues/7697#issuecomment-1489267595,https://api.github.com/repos/pydata/xarray/issues/7697,1489267595,IC_kwDOAMm_X85YxGuL,10678620,2023-03-29T20:30:49Z,2023-03-29T20:33:28Z,NONE,"> It seems that this problematic code is mostly used to determine the engine that is used to finally open it. Did you try specifying the correct engine directly?

I tried setting the engine to 'netcdf4', and while it did help a little, it still seems slow on my system. Here is my profile with `engine='netcdf4'`:

![slowmfdataset](https://user-images.githubusercontent.com/10678620/228648325-24128f1a-fb53-486a-8739-38f06a7d3375.png)

I'm not sure what to make of this profile. I don't see anything in the file_manager that would be especially slow. Perhaps it is a filesystem bottleneck at this point (given that the CPU time is 132 s of the total 288 s duration).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1646267547
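
As a quick, hedged illustration of the `os.read` vs. buffered `.read()` comparison in the first comment, the sketch below times reading the first 8 bytes of a file both ways. The file path is a hypothetical placeholder, and the "about 2x" gap will vary with the operating system and filesystem.

```python
import os
import timeit

PATH = "example.nc"  # hypothetical placeholder; point this at any local file


def magic_via_os_read(count=8):
    # Read the first `count` bytes through a raw file descriptor.
    fd = os.open(PATH, os.O_RDONLY)  # add os.O_BINARY on Windows
    try:
        return os.read(fd, count)
    finally:
        os.close(fd)


def magic_via_buffered_read(count=8):
    # Read the first `count` bytes through a buffered file object.
    with open(PATH, "rb") as f:
        return f.read(count)


if __name__ == "__main__":
    n = 10_000
    print("os.read:    ", timeit.timeit(magic_via_os_read, number=n))
    print("file.read():", timeit.timeit(magic_via_buffered_read, number=n))
```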
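
For the `engine` and `compat` options discussed in the later comments, here is a minimal sketch of how they can be passed to `xarray.open_mfdataset` to avoid engine guessing and reduce consistency checks. The glob pattern and the `concat_dim` name are assumptions for illustration, and `parallel=True` additionally requires dask.

```python
import xarray as xr

# Assumed layout: one timestep per file, files otherwise structurally identical.
ds = xr.open_mfdataset(
    "data/*.nc",           # hypothetical glob pattern
    engine="netcdf4",      # skip magic-number-based engine detection
    combine="nested",
    concat_dim="time",     # assumed concatenation dimension
    compat="override",     # take conflicting variables from the first file
    coords="minimal",      # only concatenate coords that carry the concat dim
    data_vars="minimal",   # only concatenate data vars that carry the concat dim
    parallel=True,         # open files in parallel (requires dask)
)
```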