issues: 290084668


| field | value |
| --- | --- |
| id | 290084668 |
| node_id | MDU6SXNzdWUyOTAwODQ2Njg= |
| number | 1845 |
| title | speed up opening multiple files with changing data variables |
| user | 14314623 |
| state | closed |
| locked | 0 |
| assignee | |
| milestone | |
| comments | 1 |
| created_at | 2018-01-19T19:38:14Z |
| updated_at | 2020-09-23T16:47:37Z |
| closed_at | 2020-09-23T16:47:37Z |
| author_association | CONTRIBUTOR |
| active_lock_reason | |
| draft | |
| pull_request | |
| body | (issue text below) |
| reactions | (JSON below) |
| performed_via_github_app | |
| state_reason | completed |
| repo | 13221727 |
| type | issue |

Code Sample, a copy-pastable example if possible

I am trying to open several ocean model data files. During the model run, additional variables were written to the files. So, for instance, the first file will look like this:

```
<xarray.Dataset>
Dimensions:         (st_edges_ocean: 51, st_ocean: 50, time: 1, xt_ocean: 3600, yt_ocean: 2700)
Coordinates:
  * xt_ocean        (xt_ocean) float64 -279.9 -279.8 -279.7 -279.6 -279.5 ...
  * yt_ocean        (yt_ocean) float64 -81.11 -81.07 -81.02 -80.98 -80.94 ...
  * time            (time) float64 4.401e+04
  * st_ocean        (st_ocean) float64 5.034 15.1 25.22 35.36 45.58 55.85 ...
  * st_edges_ocean  (st_edges_ocean) float64 0.0 10.07 20.16 30.29 40.47 ...
Data variables:
    jp_recycle      (time, st_ocean, yt_ocean, xt_ocean) float64 dask.array<shape=(1, 50, 2700, 3600), chunksize=(1, 1, 2700, 3600)>
    jp_reminp       (time, st_ocean, yt_ocean, xt_ocean) float64 dask.array<shape=(1, 50, 2700, 3600), chunksize=(1, 1, 2700, 3600)>
    jp_uptake       (time, st_ocean, yt_ocean, xt_ocean) float64 dask.array<shape=(1, 50, 2700, 3600), chunksize=(1, 1, 2700, 3600)>
    jo2             (time, st_ocean, yt_ocean, xt_ocean) float64 dask.array<shape=(1, 50, 2700, 3600), chunksize=(1, 1, 2700, 3600)>
    dic_stf         (time, yt_ocean, xt_ocean) float64 dask.array<shape=(1, 2700, 3600), chunksize=(1, 2700, 3600)>
    o2_stf          (time, yt_ocean, xt_ocean) float64 dask.array<shape=(1, 2700, 3600), chunksize=(1, 2700, 3600)>
Attributes:
    filename:   01210101.ocean_minibling_term_src.nc
    title:      CM2.6_miniBling
    grid_type:  mosaic
    grid_tile:  1
```

and the last file will look like this (with additional data variables `o2_btf`, `dic_btf`, and `po4_btf`):

```
<xarray.Dataset>
Dimensions:         (st_edges_ocean: 51, st_ocean: 50, time: 1, xt_ocean: 3600, yt_ocean: 2700)
Coordinates:
  * xt_ocean        (xt_ocean) float64 -279.9 -279.8 -279.7 -279.6 -279.5 ...
  * yt_ocean        (yt_ocean) float64 -81.11 -81.07 -81.02 -80.98 -80.94 ...
  * st_ocean        (st_ocean) float64 5.034 15.1 25.22 35.36 45.58 55.85 ...
  * st_edges_ocean  (st_edges_ocean) float64 0.0 10.07 20.16 30.29 40.47 ...
  * time            (time) float64 7.25e+04
Data variables:
    jp_recycle      (time, st_ocean, yt_ocean, xt_ocean) float64 dask.array<shape=(1, 50, 2700, 3600), chunksize=(1, 1, 2700, 3600)>
    jp_reminp       (time, st_ocean, yt_ocean, xt_ocean) float64 dask.array<shape=(1, 50, 2700, 3600), chunksize=(1, 1, 2700, 3600)>
    jp_uptake       (time, st_ocean, yt_ocean, xt_ocean) float64 dask.array<shape=(1, 50, 2700, 3600), chunksize=(1, 1, 2700, 3600)>
    jo2             (time, st_ocean, yt_ocean, xt_ocean) float64 dask.array<shape=(1, 50, 2700, 3600), chunksize=(1, 1, 2700, 3600)>
    dic_stf         (time, yt_ocean, xt_ocean) float64 dask.array<shape=(1, 2700, 3600), chunksize=(1, 2700, 3600)>
    dic_btf         (time, yt_ocean, xt_ocean) float64 dask.array<shape=(1, 2700, 3600), chunksize=(1, 2700, 3600)>
    o2_stf          (time, yt_ocean, xt_ocean) float64 dask.array<shape=(1, 2700, 3600), chunksize=(1, 2700, 3600)>
    o2_btf          (time, yt_ocean, xt_ocean) float64 dask.array<shape=(1, 2700, 3600), chunksize=(1, 2700, 3600)>
    po4_btf         (time, yt_ocean, xt_ocean) float64 dask.array<shape=(1, 2700, 3600), chunksize=(1, 2700, 3600)>
Attributes:
    date:       created 2014-01-08
    program:    time_average_netcdf.rb
    history:    Perform time-means on all variables in 01990101.ocean_minibli...
    filename:   01990101.ocean_minibling_term_src.nc
    title:      CM2.6_miniBling
    grid_type:  mosaic
    grid_tile:  1
```

If I specify the additional variables to be dropped, reading all files with `xarray.open_mfdataset` works like a charm. But without specifying the variables to be dropped, it takes an excruciating amount of time to load.
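For reference, the workaround looks roughly like this (a minimal sketch; the glob pattern is an assumption based on the `filename` attributes above, and `drop_variables` is simply forwarded by `open_mfdataset` to each per-file `open_dataset` call):

```python
import xarray as xr

# Slow: the variable sets differ between files, so combining them
# requires reconciling the mismatched variables.
# ds = xr.open_mfdataset("*.ocean_minibling_term_src.nc")

# Fast: drop the variables that only appear in the later files, so
# every file contributes an identical set of data variables.
ds = xr.open_mfdataset(
    "*.ocean_minibling_term_src.nc",
    drop_variables=["o2_btf", "dic_btf", "po4_btf"],
)
```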

First of all, I was wondering whether a warning could be displayed when this situation occurs, suggesting that the offending variables be passed via the `drop_variables` keyword. That would have saved me a ton of digging time.
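A check along these lines could produce such a warning in user code today (a hypothetical helper of my own, not existing xarray API; it only opens each file's metadata, so it is cheap compared to the full combine step):

```python
import warnings
import xarray as xr

def check_consistent_vars(paths):
    """Warn if the files do not all contain the same data variables.

    Hypothetical helper: returns the set of variables missing from
    some files, which can then be passed to drop_variables.
    """
    var_sets = []
    for path in paths:
        with xr.open_dataset(path) as ds:
            var_sets.append(set(ds.data_vars))
    common = set.intersection(*var_sets)
    extra = set.union(*var_sets) - common
    if extra:
        warnings.warn(
            "Files do not share the same data variables; consider "
            f"passing drop_variables={sorted(extra)!r} to open_mfdataset."
        )
    return extra
```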

Even better would be some way to read such datasets quickly. If we could specify a fastpath option (as suggested in #1823), perhaps that could speed this task up (given that all dimensions stay the same)?
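For what it is worth, later xarray releases grew keywords that approximate such a fastpath. A sketch, assuming all files share identical dimensions and coordinates (note these options postdate the 0.10.x version used here):

```python
import xarray as xr

# Take coordinates from the first file and skip the per-variable
# equality checks that otherwise dominate the open time.
ds = xr.open_mfdataset(
    "*.ocean_minibling_term_src.nc",
    data_vars="minimal",  # only concat variables that already have the concat dim
    coords="minimal",     # do not broadcast/compare other coordinates
    compat="override",    # pick conflicting variables from the first file
)
```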

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-642.15.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US
LOCALE: en_US.ISO8859-1

xarray: 0.10.0rc2-2-g1a01208
pandas: 0.20.3
numpy: 1.13.3
scipy: 0.19.1
netCDF4: 1.3.0
h5netcdf: 0.4.2
Nio: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.0
matplotlib: 2.1.0
cartopy: 0.15.1
seaborn: 0.8.1
setuptools: 36.3.0
pip: 9.0.1
conda: None
pytest: 3.2.3
IPython: 6.2.1
sphinx: None
```
Reactions:

```json
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1845/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
```

Links from other tables

  • 2 rows from issues_id in issues_labels
  • 1 row from issue in issue_comments