issue_comments
20 rows where issue = 187608079 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1100969648 | https://github.com/pydata/xarray/issues/1086#issuecomment-1100969648 | https://api.github.com/repos/pydata/xarray/issues/1086 | IC_kwDOAMm_X85Bn3aw | stale[bot] 26384082 | 2022-04-17T23:43:46Z | 2022-04-17T23:43:46Z | NONE | In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the stale label. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
661972749 | https://github.com/pydata/xarray/issues/1086#issuecomment-661972749 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDY2MTk3Mjc0OQ== | andreall 25382032 | 2020-07-21T16:41:52Z | 2020-07-21T16:41:52Z | NONE | Hi @darothen, Thanks a lot. I hadn't thought of processing each file and then merging. Will give it a try. Thanks, |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
661953980 | https://github.com/pydata/xarray/issues/1086#issuecomment-661953980 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDY2MTk1Mzk4MA== | darothen 4992424 | 2020-07-21T16:09:25Z | 2020-07-21T16:09:52Z | NONE | Hi @andreall, I'll leave @dcherian or another maintainer to comment on the internals here, but one alternative is to process each file to CSV independently (in parallel), then merge the results:

```python
import pandas as pd
import xarray as xr
from pathlib import Path
from joblib import delayed, Parallel

dir_input = Path('.')
fns = list(sorted(dir_input.glob('*/' + 'WW3_EUR-11_CCCma-CanESM2_r1i1p1_CLMcom-CCLM4-8-17_v1_6hr_.nc')))

# Helper function to convert NetCDF to CSV with our processing
def _nc_to_csv(fn):
    data_ww3 = xr.open_dataset(fn)
    data_ww3 = data_ww3.isel(latitude=74, longitude=18)
    df_ww3 = data_ww3[['hs', 't02', 't0m1', 't01', 'fp', 'dir', 'spr', 'dp']].to_dataframe()
    out_fn = fn.with_suffix('.csv')  # output path (assumed; the original comment is cut off here)
    df_ww3.to_csv(out_fn)
    return out_fn

# Use joblib.Parallel to distribute my work across whatever resources I have
out_fns = Parallel(n_jobs=-1)(  # use all cores available here
    delayed(_nc_to_csv)(fn) for fn in fns
)

# Read the CSV files and merge them
dfs = [pd.read_csv(fn) for fn in out_fns]
df_ww3_all = pd.concat(dfs, ignore_index=True)
```

YMMV, but this pattern often works for many types of processing applications. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
661940009 | https://github.com/pydata/xarray/issues/1086#issuecomment-661940009 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDY2MTk0MDAwOQ== | andreall 25382032 | 2020-07-21T15:44:54Z | 2020-07-21T15:46:06Z | NONE | Hi,

```python
import xarray as xr
from pathlib import Path

dir_input = Path('.')
data_ww3 = xr.open_mfdataset(dir_input.glob('*/' + 'WW3_EUR-11_CCCma-CanESM2_r1i1p1_CLMcom-CCLM4-8-17_v1_6hr_.nc'))
data_ww3 = data_ww3.isel(latitude=74, longitude=18)
df_ww3 = data_ww3[['hs', 't02', 't0m1', 't01', 'fp', 'dir', 'spr', 'dp']].to_dataframe()
```

You can download one file here: https://nasgdfa.ugr.es:5001/d/f/566168344466602780 (3.5 GB). I ran a profiler when opening 2 .nc files and it said the to_dataframe() call was the one taking most of the time. I'm just wondering if there's a way to reduce computing time. I need to open 95 files and it takes about 1.5 hours. Thanks, |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
661919828 | https://github.com/pydata/xarray/issues/1086#issuecomment-661919828 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDY2MTkxOTgyOA== | dcherian 2448579 | 2020-07-21T15:10:02Z | 2020-07-21T15:10:02Z | MEMBER | Can you make a reproducible example, @andreall? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
661775197 | https://github.com/pydata/xarray/issues/1086#issuecomment-661775197 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDY2MTc3NTE5Nw== | andreall 25382032 | 2020-07-21T10:29:48Z | 2020-07-21T10:29:48Z | NONE | I am running into the same problem. This might be a long shot, but @naught101, do you remember if you managed to convert to a dataframe in a more efficient way? Thanks, |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
259044958 | https://github.com/pydata/xarray/issues/1086#issuecomment-259044958 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTA0NDk1OA== | naught101 167164 | 2016-11-08T04:47:56Z | 2016-11-08T04:47:56Z | NONE | Ok, no worries. I'll try it if it gets desperate :) Thanks for your help, shoyer! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
259044805 | https://github.com/pydata/xarray/issues/1086#issuecomment-259044805 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTA0NDgwNQ== | shoyer 1217238 | 2016-11-08T04:46:23Z | 2016-11-08T04:46:23Z | MEMBER |
Maybe? I'm not confident enough to advise you to go to that trouble. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
259041491 | https://github.com/pydata/xarray/issues/1086#issuecomment-259041491 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTA0MTQ5MQ== | naught101 167164 | 2016-11-08T04:16:26Z | 2016-11-08T04:16:26Z | NONE | So it would be more efficient to concat all of the datasets (subset for the relevant variables), and then just use a single .to_dataframe() call on the entire dataset? If so, that would require quite a bit of refactoring on my part, but it could be worth it. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
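The refactoring naught101 describes above (subset each dataset to the relevant variables, concatenate, then make a single to_dataframe() call) can be sketched as follows; the toy datasets and variable names here are illustrative, not from the thread:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two small stand-in datasets, mimicking per-file model output
# split along the time dimension.
datasets = [
    xr.Dataset(
        {"hs": ("time", np.random.rand(3)),
         "fp": ("time", np.random.rand(3)),
         "unused": ("time", np.random.rand(3))},
        coords={"time": pd.date_range("2000-01-01", periods=3)
                + pd.Timedelta(days=3 * i)},
    )
    for i in range(2)
]

# Subset each dataset first, concatenate along time, and convert
# to a dataframe once on the combined result.
keep = ["hs", "fp"]
combined = xr.concat([ds[keep] for ds in datasets], dim="time")
df = combined.to_dataframe()
```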
259035428 | https://github.com/pydata/xarray/issues/1086#issuecomment-259035428 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTAzNTQyOA== | shoyer 1217238 | 2016-11-08T03:25:58Z | 2016-11-08T03:25:58Z | MEMBER | Under the covers open_mfdataset just uses open_dataset and merge/concat. So this would be similar either way.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
259033970 | https://github.com/pydata/xarray/issues/1086#issuecomment-259033970 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTAzMzk3MA== | naught101 167164 | 2016-11-08T03:14:50Z | 2016-11-08T03:14:50Z | NONE | Yeah, I'm loading each file separately with |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
259028693 | https://github.com/pydata/xarray/issues/1086#issuecomment-259028693 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTAyODY5Mw== | shoyer 1217238 | 2016-11-08T02:36:16Z | 2016-11-08T02:36:16Z | MEMBER | One thing that might hurt is that xarray (lazily) decodes times from each file separately, rather than decoding times all at once. But this hasn't been much of an issue before even with hundreds of times, so I'm not sure what's going on here. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
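One way to test shoyer's hypothesis about per-file time decoding is to open files with decode_times=False and decode CF conventions once at the end. A minimal sketch, using an in-memory dataset with raw CF time values in place of a real file (the variable names are made up):

```python
import numpy as np
import xarray as xr

# Raw, undecoded time values plus a CF "units" attribute, mimicking
# what open_dataset(..., decode_times=False) returns for each file.
raw = xr.Dataset(
    {"hs": ("time", np.random.rand(3))},
    coords={"time": ("time", np.array([0.0, 6.0, 12.0]),
                     {"units": "hours since 2000-01-01"})},
)

# Decode CF metadata (including times) in one pass, e.g. after
# concatenating all the raw per-file datasets.
decoded = xr.decode_cf(raw)
```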
259026069 | https://github.com/pydata/xarray/issues/1086#issuecomment-259026069 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1OTAyNjA2OQ== | naught101 167164 | 2016-11-08T02:19:01Z | 2016-11-08T02:19:01Z | NONE | Not easily - most scripts require many of these datasets (up to 200; the linked one is among the smallest, and some are up to 10 MB) in a specific directory structure, and rely on a couple of private python modules. I was just asking because I thought I might have been missing something obvious, but now I guess that isn't the case. Probably not worth spending too much time on this - if it starts becoming a real problem for me, I will try to generate something self-contained that shows the problem. Until then, maybe it's best to assume that xarray/pandas are doing the best they can given the requirements, and close this for now. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258884141 | https://github.com/pydata/xarray/issues/1086#issuecomment-258884141 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODg4NDE0MQ== | shoyer 1217238 | 2016-11-07T16:27:21Z | 2016-11-07T16:27:21Z | MEMBER | can you give me a copy/pastable script that has the slowness issue with that file? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258774196 | https://github.com/pydata/xarray/issues/1086#issuecomment-258774196 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODc3NDE5Ng== | naught101 167164 | 2016-11-07T08:30:25Z | 2016-11-07T08:30:25Z | NONE | I loaded it from a netcdf file. There's an example you can play with at https://dl.dropboxusercontent.com/u/50684199/MitraEFluxnet.1.4_flux.nc |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258755912 | https://github.com/pydata/xarray/issues/1086#issuecomment-258755912 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODc1NTkxMg== | shoyer 1217238 | 2016-11-07T06:20:18Z | 2016-11-07T06:20:18Z | MEMBER | How did you construct this dataset? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258755061 | https://github.com/pydata/xarray/issues/1086#issuecomment-258755061 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODc1NTA2MQ== | naught101 167164 | 2016-11-07T06:12:27Z | 2016-11-07T06:12:27Z | NONE | Slightly slower (using |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258754037 | https://github.com/pydata/xarray/issues/1086#issuecomment-258754037 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODc1NDAzNw== | shoyer 1217238 | 2016-11-07T06:02:56Z | 2016-11-07T06:02:56Z | MEMBER | Try calling |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258753366 | https://github.com/pydata/xarray/issues/1086#issuecomment-258753366 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODc1MzM2Ng== | naught101 167164 | 2016-11-07T05:56:26Z | 2016-11-07T05:56:26Z | NONE | Squeeze is pretty much identical in efficiency. Seems very slightly better (2-5%) on smaller datasets. (I still need to add the final I'm not calling |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 | |
258748969 | https://github.com/pydata/xarray/issues/1086#issuecomment-258748969 | https://api.github.com/repos/pydata/xarray/issues/1086 | MDEyOklzc3VlQ29tbWVudDI1ODc0ODk2OQ== | shoyer 1217238 | 2016-11-07T05:14:11Z | 2016-11-07T05:14:24Z | MEMBER | The simplest thing to try is making use of I'm not sure why |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Is there a more efficient way to convert a subset of variables to a dataframe? 187608079 |
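The squeeze idea tried in the thread amounts to dropping length-1 dimensions before the conversion, so that to_dataframe() does not build a MultiIndex over degenerate dimensions. A minimal sketch with made-up data:

```python
import numpy as np
import xarray as xr

# Length-1 spatial dims, as left behind by a list-style selection
# such as isel(latitude=[74], longitude=[18]).
ds = xr.Dataset(
    {"hs": (("time", "latitude", "longitude"), np.random.rand(4, 1, 1))},
    coords={"time": np.arange(4), "latitude": [46.0], "longitude": [-5.0]},
)

# Squeeze out the degenerate dims (drop=True also removes the
# now-scalar coords) before the single to_dataframe() call.
df = ds.squeeze(drop=True).to_dataframe()
```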
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);