github: issues: 1 row where repo = 13221727, state = "closed" and user = 20794996 sorted by updated

1 row where repo = 13221727, state = "closed" and user = 20794996 sorted by updated_at descending


id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at ▲	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
1384226112	I_kwDOAMm_X85SgZ1A	7075	Convert xarray dataset to pandas dataframe is much slower in newest xarray version	rilllydi 20794996	closed	0			4	2022-09-23T19:36:28Z	2023-10-14T20:37:40Z	2023-10-14T20:37:40Z	NONE				What is your issue?

Converting an xarray dataset to a pandas dataframe has become much slower in the newest xarray version. I want to read in very large netcdf files, extract a slice, and convert the slice to a pandas dataframe. For an input size of 2 GB, xarray version 0.21.0 takes 3 seconds, whereas xarray version 2022.6.0 takes 44 seconds. See the table below for more tests with increasing size of the xarray dataset (times in m:ss).

| Number of NetCDF Input Files in Xarray Dataset (~1GB per file): | 2 | 5 | 10 | 15 | 20 | 30 | 40 |
| -- | -- | -- | -- | -- | -- | -- | -- |
| Older Xarray Version 0.21.0 | 0:03 | 0:02 | 0:04 | 0:06 | 0:09 | 0:13 | 0:17 |
| Newer Xarray Version 2022.6.0 | 0:44 | 1:30 | 2:46 | 4:01 | 5:23 | 7:56 | 10:29 |

Here is my code:

```
import xarray as xr

# Read in a list of netcdf files and combine into a single dataset.
with xr.open_mfdataset(infile_list, combine='by_coords') as ds:
    # Extract the data for a single location (the nearest grid point) using the provided coordinates (lat/lon).
    ds_slice = ds.sel(lon=-84.725, lat=42.3583, method='nearest')
    # Convert xarray dataset to a pandas dataframe.
    # This is now the slow part since the xarray library was updated.
    df = ds_slice.to_dataframe()
```

The netcdf files I am reading in are about 1 GB each, containing daily weather data for the entire CONUS. There is 1 file per year, so if I read in 2 files, the dimensions are (lon: 1386, lat: 585, day: 731, crs: 1) with coordinates of lon, lat, day, and crs. They include 8 float data variables.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/7075/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		not_planned	xarray 13221727	issue
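
To put the reported timings in perspective, the slowdown per test can be worked out directly from the table in the issue body. The following is a minimal sketch, assuming the table entries are in m:ss (consistent with the "3 seconds" and "44 seconds" quoted in the text); the helper names `to_seconds` and `slowdown_factors` are illustrative and do not come from the issue.

```python
# Timings copied from the table in the issue body, assumed to be m:ss.
file_counts = [2, 5, 10, 15, 20, 30, 40]
old_times = ["0:03", "0:02", "0:04", "0:06", "0:09", "0:13", "0:17"]   # xarray 0.21.0
new_times = ["0:44", "1:30", "2:46", "4:01", "5:23", "7:56", "10:29"]  # xarray 2022.6.0


def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def slowdown_factors(old, new):
    """Return new/old runtime ratios, one per test case."""
    return [to_seconds(n) / to_seconds(o) for o, n in zip(old, new)]


factors = slowdown_factors(old_times, new_times)
for count, factor in zip(file_counts, factors):
    print(f"{count:>2} files: {factor:.1f}x slower")
```

On these numbers the newer version is slower by a factor of roughly 15 for the smallest case and between about 35 and 45 for the others, so the regression appears to grow with input size rather than being a fixed overhead.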


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
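
The filter shown at the top of this page (repo = 13221727, state = "closed", user = 20794996, sorted by updated_at descending) can be reproduced against this schema with an ordinary SQL query. Below is a minimal sketch using Python's standard `sqlite3` module and an in-memory database; it keeps only a subset of the columns and omits the foreign-key references for brevity, which is a simplification assumed here and not how the real database is laid out.

```python
import sqlite3

# In-memory database with a reduced version of the [issues] table.
# Column subset and missing REFERENCES clauses are simplifications.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [state] TEXT,
       [updated_at] TEXT,
       [repo] INTEGER
    )
    """
)
conn.execute("CREATE INDEX [idx_issues_repo] ON [issues] ([repo])")
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

# The single row displayed on this page.
conn.execute(
    "INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)",
    (
        1384226112,
        7075,
        "Convert xarray dataset to pandas dataframe is much slower in newest xarray version",
        20794996,
        "closed",
        "2023-10-14T20:37:40Z",
        13221727,
    ),
)

# Same filter and ordering as the page heading, using named parameters.
rows = conn.execute(
    """
    SELECT [id], [number], [state], [updated_at]
    FROM [issues]
    WHERE [repo] = :repo AND [state] = :state AND [user] = :user
    ORDER BY [updated_at] DESC
    """,
    {"repo": 13221727, "state": "closed", "user": 20794996},
).fetchall()

for row in rows:
    print(row)
```

Because the timestamps are stored as ISO 8601 text, sorting the `updated_at` column as a string gives chronological order, which is why a plain `ORDER BY` is enough here.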
