issue_comments
12 rows where author_association = "NONE" and issue = 187608079 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1100969648 | https://github.com/pydata/xarray/issues/1086#issuecomment-1100969648 | https://api.github.com/repos/pydata/xarray/issues/1086 | IC_kwDOAMm_X85Bn3aw | stale[bot] 26384082 | 2022-04-17T23:43:46Z | 2022-04-17T23:43:46Z | NONE | In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
661972749 | https://github.com/pydata/xarray/issues/1086#issuecomment-661972749 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDY2MTk3Mjc0OQ== | andreall 25382032 | 2020-07-21T16:41:52Z | 2020-07-21T16:41:52Z | NONE | Hi @darothen, thanks a lot. I hadn't thought of processing each file and then merging. I'll give it a try. Thanks, |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
661953980 | https://github.com/pydata/xarray/issues/1086#issuecomment-661953980 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDY2MTk1Mzk4MA== | darothen 4992424 | 2020-07-21T16:09:25Z | 2020-07-21T16:09:52Z | NONE | Hi @andreall, I'll leave @dcherian or another maintainer to comment on internals of
``` python
import pandas as pd
import xarray as xr
from pathlib import Path
from joblib import delayed, Parallel

dir_input = Path('.')
fns = list(sorted(dir_input.glob('*/' + 'WW3_EUR-11_CCCma-CanESM2_r1i1p1_CLMcom-CCLM4-8-17_v1_6hr_.nc')))

# Helper function to convert NetCDF to CSV with our processing
def _nc_to_csv(fn):
    data_ww3 = xr.open_dataset(fn)
    data_ww3 = data_ww3.isel(latitude=74, longitude=18)
    df_ww3 = data_ww3[['hs', 't02', 't0m1', 't01', 'fp', 'dir', 'spr', 'dp']].to_dataframe()
    # Write the CSV next to the input file (output naming is an assumption)
    out_fn = fn.with_suffix('.csv')
    df_ww3.to_csv(out_fn)
    return out_fn

# Use joblib.Parallel to distribute the work across whatever resources I have
out_fns = Parallel(n_jobs=-1)(  # n_jobs=-1: use all cores available here
    delayed(_nc_to_csv)(fn) for fn in fns
)

# Read the CSV files and merge them
dfs = [pd.read_csv(fn) for fn in out_fns]
df_ww3_all = pd.concat(dfs, ignore_index=True)
```
YMMV, but this pattern often works for many types of processing applications. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
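A variant of darothen's per-file pattern that skips the intermediate CSV files: each worker returns its DataFrame and the pieces are concatenated in memory. This is a sketch, not darothen's code. It swaps joblib for the standard library's `concurrent.futures.ThreadPoolExecutor` so it runs without extra packages, and `make_fake_file` plus the variable list are hypothetical stand-ins for the WW3 files. `joblib.Parallel` likewise returns the workers' results as a list, so the CSV round trip is optional. For real NetCDF files, process-based workers (joblib's default backend) are likely the safer choice, since the underlying HDF5 library may not be thread-safe.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd
import xarray as xr

VARIABLES = ['hs', 't02', 'dir']  # subset of the variables used in the thread


def make_fake_file(start):
    """Hypothetical stand-in for xr.open_dataset(fn): a small synthetic dataset."""
    time = pd.date_range(start, periods=4, freq='D')
    shape = (time.size, 3, 2)  # (time, latitude, longitude)
    data = {name: (('time', 'latitude', 'longitude'), np.random.rand(*shape))
            for name in VARIABLES + ['unused']}
    return xr.Dataset(data, coords={'time': time})


def to_frame(ds):
    # Select the point and the variables *before* converting,
    # so only the needed data becomes a DataFrame.
    point = ds.isel(latitude=1, longitude=0)[VARIABLES]
    return point.to_dataframe()


datasets = [make_fake_file(s) for s in ['2000-01-01', '2000-01-05', '2000-01-09']]

# Each worker returns a DataFrame; no intermediate files are written.
with ThreadPoolExecutor() as pool:
    frames = list(pool.map(to_frame, datasets))

# Concatenate in memory and order by time.
df_all = pd.concat(frames).sort_index()
print(df_all.shape)
```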
661940009 | https://github.com/pydata/xarray/issues/1086#issuecomment-661940009 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDY2MTk0MDAwOQ== | andreall 25382032 | 2020-07-21T15:44:54Z | 2020-07-21T15:46:06Z | NONE | Hi,
```
import xarray as xr
from pathlib import Path

dir_input = Path('.')
data_ww3 = xr.open_mfdataset(dir_input.glob('*/' + 'WW3_EUR-11_CCCma-CanESM2_r1i1p1_CLMcom-CCLM4-8-17_v1_6hr_.nc'))
data_ww3 = data_ww3.isel(latitude=74, longitude=18)
df_ww3 = data_ww3[['hs', 't02', 't0m1', 't01', 'fp', 'dir', 'spr', 'dp']].to_dataframe()
```
You can download one file here: https://nasgdfa.ugr.es:5001/d/f/566168344466602780 (3.5 GB). I ran a profiler when opening 2 .nc files and it showed that the to_dataframe() call was taking most of the time. I'm just wondering if there's a way to reduce the computing time. I need to open 95 files and it takes about 1.5 hours. Thanks, |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
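For the single-point case in andreall's comment, one thing that may be worth trying: as far as I can tell from xarray's implementation, `to_dataframe()` pulls each column out through its `.values`, so on dask-backed data from `open_mfdataset` each selected variable may be computed separately. Calling `.compute()` on the small subset first evaluates them together. The sketch below uses a synthetic in-memory dataset (an assumption standing in for the 3.5 GB files) so it runs anywhere, and also shows building the frame directly from the 1-D arrays. Whether either route is faster on the real files is untested here, so profile before relying on it.

```python
import numpy as np
import pandas as pd
import xarray as xr

VARIABLES = ['hs', 't02', 'dir']  # subset of the variables used in the thread

# Synthetic stand-in for xr.open_mfdataset(...); dimension names follow the thread.
rng = np.random.default_rng(0)
time = pd.date_range('2000-01-01', periods=8, freq='D')
ds = xr.Dataset(
    {name: (('time', 'latitude', 'longitude'), rng.random((8, 100, 30)))
     for name in VARIABLES + ['unused']},
    coords={'time': time},
)

# Subset first, then materialise everything with one .compute() call.
# On numpy-backed data this is just a copy; on dask-backed data it
# evaluates all selected variables in a single pass.
point = ds.isel(latitude=74, longitude=18)[VARIABLES].compute()
df = point.to_dataframe()

# Alternative: build the frame directly from the 1-D arrays.
df_direct = pd.DataFrame(
    {name: point[name].values for name in VARIABLES},
    index=point.indexes['time'],
)
```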
661775197 | https://github.com/pydata/xarray/issues/1086#issuecomment-661775197 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDY2MTc3NTE5Nw== | andreall 25382032 | 2020-07-21T10:29:48Z | 2020-07-21T10:29:48Z | NONE | I am running into the same problem. This might be a long shot, but @naught101, do you remember whether you managed to convert to a dataframe in a more efficient way? Thanks, |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
259044958 | https://github.com/pydata/xarray/issues/1086#issuecomment-259044958 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTA0NDk1OA== | naught101 167164 | 2016-11-08T04:47:56Z | 2016-11-08T04:47:56Z | NONE | Ok, no worries. I'll try it if it gets desperate :) Thanks for your help, shoyer! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
259041491 | https://github.com/pydata/xarray/issues/1086#issuecomment-259041491 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTA0MTQ5MQ== | naught101 167164 | 2016-11-08T04:16:26Z | 2016-11-08T04:16:26Z | NONE | So it would be more efficient to concat all of the datasets (subset to the relevant variables) and then use a single .to_dataframe() call on the entire dataset? If so, that would require quite a bit of refactoring on my part, but it could be worth it. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
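A quick way to test naught101's question on one's own data: the sketch below builds the same frame both ways (one `to_dataframe()` per dataset plus `pd.concat`, versus `xr.concat` followed by a single `to_dataframe()`) and checks that they agree. The datasets and the variable names `var1`/`var2` are hypothetical. Which route is faster depends on the data, so time both (for example with `timeit`) rather than assuming.

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)


def make_dataset(start, n=5):
    """Hypothetical stand-in for one file opened with xr.open_dataset."""
    time = pd.date_range(start, periods=n, freq='D')
    return xr.Dataset(
        {name: ('time', rng.random(n)) for name in ['var1', 'var2', 'unused']},
        coords={'time': time},
    )


datasets = [make_dataset(s) for s in ['2001-01-01', '2001-01-06', '2001-01-11']]
subset = ['var1', 'var2']

# Option 1: one to_dataframe() call per dataset, then a pandas concat.
df_each = pd.concat([ds[subset].to_dataframe() for ds in datasets])

# Option 2: subset, concatenate along time in xarray, convert once.
combined = xr.concat([ds[subset] for ds in datasets], dim='time')
df_once = combined.to_dataframe()
```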
259033970 | https://github.com/pydata/xarray/issues/1086#issuecomment-259033970 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTAzMzk3MA== | naught101 167164 | 2016-11-08T03:14:50Z | 2016-11-08T03:14:50Z | NONE | Yeah, I'm loading each file separately with |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
259026069 | https://github.com/pydata/xarray/issues/1086#issuecomment-259026069 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTAyNjA2OQ== | naught101 167164 | 2016-11-08T02:19:01Z | 2016-11-08T02:19:01Z | NONE | Not easily: most scripts require multiple of these datasets (up to 200; the linked one is one of the smallest, and some are up to 10 MB) in a specific directory structure, and rely on a couple of private Python modules. I was just asking because I thought I might have been missing something obvious, but I guess that isn't the case. It's probably not worth spending too much time on this. If it starts becoming a real problem for me, I will try to put together a self-contained example that shows the problem. Until then, maybe it's best to assume that xarray/pandas are doing the best they can given the requirements, and close this for now. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258774196 | https://github.com/pydata/xarray/issues/1086#issuecomment-258774196 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODc3NDE5Ng== | naught101 167164 | 2016-11-07T08:30:25Z | 2016-11-07T08:30:25Z | NONE | I loaded it from a netcdf file. There's an example you can play with at https://dl.dropboxusercontent.com/u/50684199/MitraEFluxnet.1.4_flux.nc |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258755061 | https://github.com/pydata/xarray/issues/1086#issuecomment-258755061 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODc1NTA2MQ== | naught101 167164 | 2016-11-07T06:12:27Z | 2016-11-07T06:12:27Z | NONE | Slightly slower (using |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258753366 | https://github.com/pydata/xarray/issues/1086#issuecomment-258753366 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODc1MzM2Ng== | naught101 167164 | 2016-11-07T05:56:26Z | 2016-11-07T05:56:26Z | NONE | Squeeze is pretty much identical in efficiency. Seems very slightly better (2-5%) on smaller datasets. (I still need to add the final I'm not calling |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 |
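Assuming the datasets carry length-1 spatial dimensions (the names `x` and `y` below are hypothetical), this sketch shows what `squeeze()` does before `to_dataframe()`: it drops every length-1 dimension, so the result has a plain time index instead of a three-level MultiIndex. An explicit `isel(x=0, y=0)` gives the same frame.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range('2002-01-01', periods=6, freq='D')
ds = xr.Dataset(
    {'var1': (('time', 'y', 'x'), np.arange(6.0).reshape(6, 1, 1))},
    coords={'time': time},
)

# Without squeezing, the index is a MultiIndex over time, x and y.
df_full = ds.to_dataframe()

# squeeze() drops every length-1 dimension, leaving a plain time index.
df_squeezed = ds.squeeze().to_dataframe()

# Explicit positional selection gives the same frame.
df_isel = ds.isel(x=0, y=0).to_dataframe()
```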
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
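The listing filter at the top of this page (author_association = "NONE", issue = 187608079, sorted by updated_at descending) can be reproduced against the issue_comments schema with Python's built-in `sqlite3` module. A minimal sketch: an in-memory database holding three of the rows shown here, filling only the columns the query needs, plus one hypothetical non-matching row to show the filter at work.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT,
   [author_association] TEXT, [body] TEXT, [reactions] TEXT,
   [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
"""

conn = sqlite3.connect(':memory:')
conn.executescript(SCHEMA)  # foreign keys are not enforced by default

rows = [
    (1100969648, '2022-04-17T23:43:46Z', 'NONE', 187608079),
    (661972749, '2020-07-21T16:41:52Z', 'NONE', 187608079),
    (259044958, '2016-11-08T04:47:56Z', 'NONE', 187608079),
    (1, '2016-11-08T00:00:00Z', 'MEMBER', 187608079),  # hypothetical, filtered out
]
conn.executemany(
    'INSERT INTO issue_comments (id, updated_at, author_association, issue) '
    'VALUES (?, ?, ?, ?)',
    rows,
)

# ISO 8601 timestamps stored as TEXT sort correctly as strings.
result = conn.execute(
    'SELECT id FROM issue_comments '
    'WHERE author_association = ? AND issue = ? ORDER BY updated_at DESC',
    ('NONE', 187608079),
).fetchall()
```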