issues
2 rows where state = "open" and user = 10678620 sorted by updated_at descending
This data as json, CSV (advanced)
Suggested facets: created_at (date), updated_at (date)
| id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at ▲ | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1646350377 | PR_kwDOAMm_X85NMgVL | 7698 | Use read1 instead of read to get magic number | groutr 10678620 | open | 0 | 5 | 2023-03-29T18:57:23Z | 2023-04-18T22:36:30Z | FIRST_TIME_CONTRIBUTOR | 0 | pydata/xarray/pulls/7698 | Addresses #7697. I changed the isinstance check because neither  I think that there is little benefit to using  | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/7698/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | xarray 13221727 | pull | ||||||
| 1646267547 | I_kwDOAMm_X85iIAyb | 7697 | open_mfdataset very slow | groutr 10678620 | open | 0 | 6 | 2023-03-29T17:55:45Z | 2023-03-29T21:21:24Z | NONE | What happened?I am trying to open an mfdataset consisting of over 4400 files. The call completes in 342.735s on my machine. After running a profiler, I discovered that most of that time is spent reading the first 8 bytes of the file. However, on my filesystem, looking at my system resource monitor, it looks like the entire file is being read (with a sustained 40-50MB of read IO most of that time). I traced the bottleneck down to https://github.com/pydata/xarray/blob/96030d4825159c6259736155084f38ee8db5c9bb/xarray/core/utils.py#L662 According to my profile, 264.381s (77%) of the execution time is spent on this line. I isolated the essence of this code, by reading the first 8 bytes of each file.
 What did you expect to happen?I expected open_mfdataset to be quicker. Minimal Complete Verifiable Example```Python import xarray as xr import pathlib files = [... <list of 4400 filenames> ...] This takes almost 6 minutes to finish.D = xr.open_mfdataset(files, compat='override', coords='minimal', data_vars='minimal') ``` MVCE confirmation
 Relevant log outputNo response Anything else we need to know?I cannot share the netcdf files. I believe this issue to isolated, and possibly triggered by the shared filesystems found on supercomputers. Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:27:40) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.80.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, None)
libhdf5: 1.12.2
libnetcdf: 4.9.1
xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.1
netCDF4: 1.6.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.6
cfgrib: None
iris: None
bottleneck: None
dask: 2023.3.1
distributed: None
matplotlib: 3.7.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.5.1
pip: 23.0.1
conda: None
pytest: None
mypy: None
IPython: 8.11.0
sphinx: None
 | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/7697/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | xarray 13221727 | issue | 
Advanced export
JSON shape: default, array, newline-delimited, object
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);