issue_comments
6 rows where issue = 253407851 (to_dataframe (pandas) usage question), sorted by updated_at descending
327721325 | jhamman 2443309 | MEMBER | created 2017-09-07T08:00:41Z
https://github.com/pydata/xarray/issues/1534#issuecomment-327721325

@mmartini-usgs - Thanks for the questions. I'm going to close this now, as it seems like you're up and running. In the future, we try to keep our usage questions on the xarray users Google group or Stack Overflow. Cheers!
326068119 | mmartini-usgs 23199378 | NONE | created 2017-08-30T17:50:11Z
https://github.com/pydata/xarray/issues/1534#issuecomment-326068119

Many thanks! I will try this.

OK, since you asked for more details: I have used xarray's resample successfully on a file with ~3 million single-ping ADCP ensembles, 17 variables with 1D and 2D data. Three lines of code and a handful of minutes to reduce that, on a middling laptop. Amazing. There are unintended behaviors from resample that I still need to figure out.

On the menu next to learn/use:
-- Multi-indexing and data reshaping
-- Slicing
-- Separating an uneven time base into separate time series
-- Calculations involving several of the variables at the same time (e.g. using xarray to perform the ADCP beam-to-earth rotations)

Where is all this going? Ultimately, to produce mean current profile time series for data release (think hourly time-depth shaped dataframes) and bursts of hourly data (think hourly time-depth-sample shaped dataframes) on which to perform a variety of wave analyses. This is my learn-python project, so apologies for the non-pythonic approach. I also need to preserve backwards compatibility with existing code and conventions (EPIC, historically; CF and THREDDS, going forward). The project is here: https://github.com/mmartini-usgs/ADCPy
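The three-line resample reduction this comment describes is not shown; a minimal sketch of what it plausibly looks like with the current resample API (the file name, dimension name, and chunk size here are hypothetical):

```python
import xarray as xr

# Hypothetical file/dimension names; any time-indexed ADCP dataset with
# 1D and 2D variables works the same way.
ds = xr.open_dataset("adcp_pings.nc", chunks={"time": 200_000})

# Reduce ~3 million single-ping ensembles to hourly mean profiles --
# roughly the three-line reduction described above.
hourly = ds.resample(time="1H").mean()
hourly.to_netcdf("adcp_hourly.nc")
```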
325777712 | darothen 4992424 | NONE | created 2017-08-29T19:42:24Z
https://github.com/pydata/xarray/issues/1534#issuecomment-325777712

@mmartini-usgs, an entire netCDF file (as long as it only has 1 group, which it most likely does if we're talking about standard atmospheric/oceanic data) would be the equivalent of an xarray Dataset. To start with, you should read in your data using the chunks keyword to open_dataset. You'd have to choose chunks based on the dimensions of your data. Like @rabernat previously mentioned, it's very likely you can perform your entire workflow within xarray without ever having to drop down to pandas; let us know if you can share more details.
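The code sample that originally followed this comment did not survive extraction. A minimal sketch of chunked reading, assuming a hypothetical chunk size and reusing the file name from the comment below:

```python
import xarray as xr

# chunks= makes xarray back each variable with a dask array instead of
# reading it into memory all at once.
ds = xr.open_dataset("reallybignetCDF4file.nc", chunks={"time": 1_000_000})
print(ds)  # lazy: data is only read when a computation requests it
```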
325773402 | mmartini-usgs 23199378 | NONE | created 2017-08-29T19:31:00Z
https://github.com/pydata/xarray/issues/1534#issuecomment-325773402

Hello Ryan,

I have read a bit about dask. Am I missing the pandas Panel analog in dask? My data is in netCDF4, and the files can have as many as 17 variables or more. It's not clear how to get this easily into dask. In pandas, I think the entire netCDF file equates to a Panel, and a single variable would be a DataFrame. Rather than wandering around in the weeds, I could use a hint here. Do I really need to open the netCDF4 file, then iterate over my variables and deal them into a series of dask dataframes? That seems very un-pythonic.

I tried this: since netCDF4 files can be opened as HDF5 (per http://www.unidata.ucar.edu/software/netcdf/docs/interoperability_hdf5.html), presumably I can open a netCDF4 file as "HDF5" using dask. Let's try a dask example (http://dask.pydata.org/en/latest/examples/dataframe-hdf5.html) with one of my netCDF files:

df = dd.read_hdf('reallybignetCDF4file.nc', key='/c')  # this does not work

Thanks,
Marinna
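For contrast with the failed read_hdf attempt above: dd.read_hdf expects a pandas-style HDF5 store, not a netCDF4 file, while opening the same file through xarray gives dask-backed variables with no manual dealing-out. A sketch, reusing the file name from the comment (the chunk size is hypothetical):

```python
import xarray as xr

# Opening through xarray (rather than dd.read_hdf) handles the netCDF4
# layout; chunks= makes every variable dask-backed.
ds = xr.open_dataset("reallybignetCDF4file.nc", chunks={"time": 500_000})

# All 17+ variables are already lazy dask arrays -- no per-variable
# dealing into separate dataframes is needed.
for name, var in ds.data_vars.items():
    print(name, var.dims, type(var.data))
```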
325458780 | mmartini-usgs 23199378 | NONE | created 2017-08-28T19:45:53Z
https://github.com/pydata/xarray/issues/1534#issuecomment-325458780

Many thanks, I will go learn about dask dataframes.

Marinna
325447523 | rabernat 1197350 | MEMBER | created 2017-08-28T19:03:09Z
https://github.com/pydata/xarray/issues/1534#issuecomment-325447523

Marinna,

You are correct. In the present release of Xarray, converting to a pandas dataframe loads all of the data eagerly into memory as a regular pandas object, giving up dask's parallel capabilities and potentially consuming lots of memory. With chunked Xarray data, it would be preferable to convert to a dask.dataframe rather than a regular pandas dataframe, which would carry over some of the performance benefits.

This is a known issue: https://github.com/pydata/xarray/issues/1462
With a solution in the works: https://github.com/pydata/xarray/pull/1489

So hopefully a release of Xarray in the near future will have the feature you seek. Alternatively, if you describe the filtering, masking, and other QA/QC that you need to do in more detail, we may be able to help you accomplish this entirely within Xarray.

Good luck!
Ryan
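If the linked pull request is the one that added Dataset.to_dask_dataframe (the method that exists in later xarray releases), usage looks roughly like the sketch below; the file name and QC column are hypothetical:

```python
import xarray as xr

# Open lazily, then convert to a dask DataFrame instead of a pandas one,
# keeping the data chunked and out of core.
ds = xr.open_dataset("reallybignetCDF4file.nc", chunks={"time": 500_000})
ddf = ds.to_dask_dataframe()

# Filtering and masking stay lazy until .compute() materializes a result.
good = ddf[ddf["qc_flag"] == 0].compute()  # "qc_flag" is hypothetical
```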
```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```