issue_comments
20 rows where issue = 187608079 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1100969648 | https://github.com/pydata/xarray/issues/1086#issuecomment-1100969648 | https://api.github.com/repos/pydata/xarray/issues/1086 | IC_kwDOAMm_X85Bn3aw | stale[bot] 26384082 | 2022-04-17T23:43:46Z | 2022-04-17T23:43:46Z | NONE | In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the stale label. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
661972749 | https://github.com/pydata/xarray/issues/1086#issuecomment-661972749 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDY2MTk3Mjc0OQ== | andreall 25382032 | 2020-07-21T16:41:52Z | 2020-07-21T16:41:52Z | NONE | Hi @darothen, Thanks a lot. I hadn't thought of processing each file and then merging. Will give it a try. Thanks, |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
661953980 | https://github.com/pydata/xarray/issues/1086#issuecomment-661953980 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDY2MTk1Mzk4MA== | darothen 4992424 | 2020-07-21T16:09:25Z | 2020-07-21T16:09:52Z | NONE | Hi @andreall, I'll leave @dcherian or another maintainer to comment on the internals here, but one alternative is to process each file to CSV independently (in parallel), then merge the results:

```python
import pandas as pd
import xarray as xr
from pathlib import Path
from joblib import delayed, Parallel

dir_input = Path('.')
fns = list(sorted(dir_input.glob('*/' + 'WW3_EUR-11_CCCma-CanESM2_r1i1p1_CLMcom-CCLM4-8-17_v1_6hr_.nc')))

# Helper function to convert NetCDF to CSV with our processing
def _nc_to_csv(fn):
    data_ww3 = xr.open_dataset(fn)
    data_ww3 = data_ww3.isel(latitude=74, longitude=18)
    df_ww3 = data_ww3[['hs', 't02', 't0m1', 't01', 'fp', 'dir', 'spr', 'dp']].to_dataframe()
    out_fn = fn.with_suffix('.csv')  # output path (assumed; the original comment is cut off here)
    df_ww3.to_csv(out_fn)
    return out_fn

# Use joblib.Parallel to distribute my work across whatever resources I have
out_fns = Parallel(n_jobs=-1)(  # use all cores available here
    delayed(_nc_to_csv)(fn) for fn in fns
)

# Read the CSV files and merge them
dfs = [pd.read_csv(fn) for fn in out_fns]
df_ww3_all = pd.concat(dfs, ignore_index=True)
```

YMMV, but this pattern often works for many types of processing applications. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
661940009 | https://github.com/pydata/xarray/issues/1086#issuecomment-661940009 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDY2MTk0MDAwOQ== | andreall 25382032 | 2020-07-21T15:44:54Z | 2020-07-21T15:46:06Z | NONE | Hi,

```python
import xarray as xr
from pathlib import Path

dir_input = Path('.')
data_ww3 = xr.open_mfdataset(dir_input.glob('*/' + 'WW3_EUR-11_CCCma-CanESM2_r1i1p1_CLMcom-CCLM4-8-17_v1_6hr_.nc'))
data_ww3 = data_ww3.isel(latitude=74, longitude=18)
df_ww3 = data_ww3[['hs', 't02', 't0m1', 't01', 'fp', 'dir', 'spr', 'dp']].to_dataframe()
```

You can download one file here: https://nasgdfa.ugr.es:5001/d/f/566168344466602780 (3.5 GB). I ran a profiler when opening 2 .nc files and it said the to_dataframe() call was the one taking most of the time. I'm just wondering if there's a way to reduce computing time. I need to open 95 files and it takes about 1.5 hours. Thanks, |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
661919828 | https://github.com/pydata/xarray/issues/1086#issuecomment-661919828 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDY2MTkxOTgyOA== | dcherian 2448579 | 2020-07-21T15:10:02Z | 2020-07-21T15:10:02Z | MEMBER | Can you make a reproducible example, @andreall? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
661775197 | https://github.com/pydata/xarray/issues/1086#issuecomment-661775197 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDY2MTc3NTE5Nw== | andreall 25382032 | 2020-07-21T10:29:48Z | 2020-07-21T10:29:48Z | NONE | I am running into the same problem. This might be a long shot, but @naught101, do you remember if you managed to convert to a dataframe in a more efficient way? Thanks, |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
259044958 | https://github.com/pydata/xarray/issues/1086#issuecomment-259044958 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTA0NDk1OA== | naught101 167164 | 2016-11-08T04:47:56Z | 2016-11-08T04:47:56Z | NONE | Ok, no worries. I'll try it if it gets desperate :) Thanks for your help, shoyer! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
259044805 | https://github.com/pydata/xarray/issues/1086#issuecomment-259044805 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTA0NDgwNQ== | shoyer 1217238 | 2016-11-08T04:46:23Z | 2016-11-08T04:46:23Z | MEMBER |
Maybe? I'm not confident enough to advise you to go to that trouble. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
259041491 | https://github.com/pydata/xarray/issues/1086#issuecomment-259041491 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTA0MTQ5MQ== | naught101 167164 | 2016-11-08T04:16:26Z | 2016-11-08T04:16:26Z | NONE | So it would be more efficient to concat all of the datasets (subset for the relevant variables), and then just use a single .to_dataframe() call on the entire dataset? If so, that would require quite a bit of refactoring on my part, but it could be worth it. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
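The refactoring naught101 describes above (subset each dataset to the relevant variables, concatenate, then make a single to_dataframe() call) can be sketched as follows; the toy datasets and variable names here are illustrative, not from the thread:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two small stand-in datasets, mimicking per-file model output
# split along the time dimension.
datasets = [
    xr.Dataset(
        {"hs": ("time", np.random.rand(3)),
         "fp": ("time", np.random.rand(3)),
         "unused": ("time", np.random.rand(3))},
        coords={"time": pd.date_range("2000-01-01", periods=3)
                + pd.Timedelta(days=3 * i)},
    )
    for i in range(2)
]

# Subset each dataset first, concatenate along time, and convert
# to a dataframe once on the combined result.
keep = ["hs", "fp"]
combined = xr.concat([ds[keep] for ds in datasets], dim="time")
df = combined.to_dataframe()
```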
259035428 | https://github.com/pydata/xarray/issues/1086#issuecomment-259035428 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTAzNTQyOA== | shoyer 1217238 | 2016-11-08T03:25:58Z | 2016-11-08T03:25:58Z | MEMBER | Under the covers open_mfdataset just uses open_dataset and merge/concat. So this would be similar either way.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
259033970 | https://github.com/pydata/xarray/issues/1086#issuecomment-259033970 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTAzMzk3MA== | naught101 167164 | 2016-11-08T03:14:50Z | 2016-11-08T03:14:50Z | NONE | Yeah, I'm loading each file separately with |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
259028693 | https://github.com/pydata/xarray/issues/1086#issuecomment-259028693 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTAyODY5Mw== | shoyer 1217238 | 2016-11-08T02:36:16Z | 2016-11-08T02:36:16Z | MEMBER | One thing that might hurt is that xarray (lazily) decodes times from each file separately, rather than decoding times all at once. But this hasn't been much of an issue before even with hundreds of times, so I'm not sure what's going on here. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
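One way to test shoyer's hypothesis about per-file time decoding is to open files with decode_times=False and decode CF conventions once at the end. A minimal sketch, using an in-memory dataset with raw CF time values in place of a real file (the variable names are made up):

```python
import numpy as np
import xarray as xr

# Raw, undecoded time values plus a CF "units" attribute, mimicking
# what open_dataset(..., decode_times=False) returns for each file.
raw = xr.Dataset(
    {"hs": ("time", np.random.rand(3))},
    coords={"time": ("time", np.array([0.0, 6.0, 12.0]),
                     {"units": "hours since 2000-01-01"})},
)

# Decode CF metadata (including times) in one pass, e.g. after
# concatenating all the raw per-file datasets.
decoded = xr.decode_cf(raw)
```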
259026069 | https://github.com/pydata/xarray/issues/1086#issuecomment-259026069 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTAyNjA2OQ== | naught101 167164 | 2016-11-08T02:19:01Z | 2016-11-08T02:19:01Z | NONE | Not easily - most scripts require many of these datasets (up to 200; the linked one is among the smallest, and some are up to 10 MB) in a specific directory structure, and rely on a couple of private python modules. I was just asking because I thought I might have been missing something obvious, but now I guess that isn't the case. Probably not worth spending too much time on this - if it starts becoming a real problem for me, I will try to generate something self-contained that shows the problem. Until then, maybe it's best to assume that xarray/pandas are doing the best they can given the requirements, and close this for now. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258884141 | https://github.com/pydata/xarray/issues/1086#issuecomment-258884141 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODg4NDE0MQ== | shoyer 1217238 | 2016-11-07T16:27:21Z | 2016-11-07T16:27:21Z | MEMBER | can you give me a copy/pastable script that has the slowness issue with that file? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258774196 | https://github.com/pydata/xarray/issues/1086#issuecomment-258774196 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODc3NDE5Ng== | naught101 167164 | 2016-11-07T08:30:25Z | 2016-11-07T08:30:25Z | NONE | I loaded it from a netcdf file. There's an example you can play with at https://dl.dropboxusercontent.com/u/50684199/MitraEFluxnet.1.4_flux.nc |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258755912 | https://github.com/pydata/xarray/issues/1086#issuecomment-258755912 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODc1NTkxMg== | shoyer 1217238 | 2016-11-07T06:20:18Z | 2016-11-07T06:20:18Z | MEMBER | How did you construct this dataset? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258755061 | https://github.com/pydata/xarray/issues/1086#issuecomment-258755061 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODc1NTA2MQ== | naught101 167164 | 2016-11-07T06:12:27Z | 2016-11-07T06:12:27Z | NONE | Slightly slower (using |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258754037 | https://github.com/pydata/xarray/issues/1086#issuecomment-258754037 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODc1NDAzNw== | shoyer 1217238 | 2016-11-07T06:02:56Z | 2016-11-07T06:02:56Z | MEMBER | Try calling |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258753366 | https://github.com/pydata/xarray/issues/1086#issuecomment-258753366 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODc1MzM2Ng== | naught101 167164 | 2016-11-07T05:56:26Z | 2016-11-07T05:56:26Z | NONE | Squeeze is pretty much identical in efficiency. Seems very slightly better (2-5%) on smaller datasets. (I still need to add the final I'm not calling |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258748969 | https://github.com/pydata/xarray/issues/1086#issuecomment-258748969 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODc0ODk2OQ== | shoyer 1217238 | 2016-11-07T05:14:11Z | 2016-11-07T05:14:24Z | MEMBER | The simplest thing to try is making use of I'm not sure why |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 |
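The squeeze idea tried in the thread amounts to dropping length-1 dimensions before the conversion, so that to_dataframe() does not build a MultiIndex over degenerate dimensions. A minimal sketch with made-up data:

```python
import numpy as np
import xarray as xr

# Length-1 spatial dims, as left behind by a list-style selection
# such as isel(latitude=[74], longitude=[18]).
ds = xr.Dataset(
    {"hs": (("time", "latitude", "longitude"), np.random.rand(4, 1, 1))},
    coords={"time": np.arange(4), "latitude": [46.0], "longitude": [-5.0]},
)

# Squeeze out the degenerate dims (drop=True also removes the
# now-scalar coords) before the single to_dataframe() call.
df = ds.squeeze(drop=True).to_dataframe()
```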
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);