issue_comments


3 rows where issue = 253407851 and user = 23199378 sorted by updated_at descending


id: 326068119
html_url: https://github.com/pydata/xarray/issues/1534#issuecomment-326068119
issue_url: https://api.github.com/repos/pydata/xarray/issues/1534
node_id: MDEyOklzc3VlQ29tbWVudDMyNjA2ODExOQ==
user: mmartini-usgs (23199378)
created_at: 2017-08-30T17:50:11Z
updated_at: 2017-08-30T17:50:11Z
author_association: NONE

Many thanks! I will try this. OK, since you asked for more details:

I have used xarray's resample successfully on a file with ~3 million single-ping ADCP ensembles, 17 variables with 1D and 2D data. Three lines of code reduced all that in a handful of minutes, on a middling laptop. Amazing.

Unintended behaviors from resample that I need to figure out (see the sketch after this list):
-- A time offset is introduced. Hourly mean data is offset by 16 hours in one use case. 16 hours is odd; I may have my time set up wrong, or it may be a start transient from a filter. I've been reading code trying to figure out exactly what algorithm pandas uses in resample (as best I can tell, this is where xarray gets its resample method).
-- NaNs are introduced. I think I need to learn how to set mask_and_scale to prevent this.
-- When outputting to netCDF, the "time" dimension got added to a variable that didn't and shouldn't have it.
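A minimal sketch of the resample behavior in question, using fabricated data and a hypothetical "vel" variable: by default each bin is labelled by its left (start) edge, and empty bins are filled with NaN, which may account for part of the apparent offset and some of the NaNs:

import numpy as np
import pandas as pd
import xarray as xr

# Fabricated ADCP-like series: one sample per minute for two days.
time = pd.date_range("2017-08-01", periods=48 * 60, freq="1min")
ds = xr.Dataset({"vel": ("time", np.random.rand(time.size))},
                coords={"time": time})

# Hourly means. Each bin is labelled by its left (start) edge by default,
# which can read as a time offset if centred labels are expected.
hourly = ds.resample(time="1H").mean()

# Bins that contain no samples are filled with NaN rather than dropped.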

Next on the menu to learn/use:
-- multi-indexing and data reshaping
-- slicing
-- separating an uneven time base into separate time series
-- calculations involving several of the variables at the same time (e.g. using xarray to perform the ADCP beam-to-earth rotations)

Where is all this going? Ultimately, to produce mean current profile time series for data release (think hourly time-depth shaped dataframes) and bursts of hourly data (think hourly time-depth-sample shaped dataframes) on which to perform a variety of wave analyses.
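A hedged sketch of one way to get those hourly time-depth shaped dataframes, assuming a fabricated dataset with a hypothetical "vel" variable dimensioned (time, depth):

import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2017-08-01", periods=24 * 60, freq="1min")
depth = np.arange(10.0)
ds = xr.Dataset(
    {"vel": (("time", "depth"), np.random.rand(time.size, depth.size))},
    coords={"time": time, "depth": depth},
)

# Hourly mean profiles, then a DataFrame with rows indexed by time and
# one column per depth bin.
hourly = ds["vel"].resample(time="1H").mean()
df = hourly.to_dataframe().unstack("depth")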

This is my learn-python project, so apologies for the non-pythonic approach. I also need to preserve backwards compatibility with existing code and conventions (EPIC historically; CF and THREDDS going forward). The project is here: https://github.com/mmartini-usgs/ADCPy

reactions: none (total_count 0)
issue: to_dataframe (pandas) usage question (253407851)
id: 325773402
html_url: https://github.com/pydata/xarray/issues/1534#issuecomment-325773402
issue_url: https://api.github.com/repos/pydata/xarray/issues/1534
node_id: MDEyOklzc3VlQ29tbWVudDMyNTc3MzQwMg==
user: mmartini-usgs (23199378)
created_at: 2017-08-29T19:31:00Z
updated_at: 2017-08-29T19:31:00Z
author_association: NONE

Hello Ryan,

I have read a bit about dask. Am I missing the Pandas Panel analog in Dask?

My data is in netCDF4, and the files can have 17 or more variables. It's not clear how to get this easily into dask. In pandas, I think the entire netCDF file equates to a Panel; a single variable would be a DataFrame.

Rather than wandering around in the weeds, I could use a hint here. Do I really need to open the netCDF4 file, then iterate over my variables and deal them into a series of dask data frames? That seems very un-pythonic.
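A hedged sketch of the usual xarray route, assuming a "time" dimension (the file name is the one from the attempt below): opening the file with chunks= turns every variable into a lazily evaluated dask array, with no manual iteration:

import xarray as xr

# chunks= makes every variable in the file a dask-backed array.
ds = xr.open_dataset("reallybignetCDF4file.nc", chunks={"time": 100000})
print(ds)  # data variables are now computed on demand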

I tried this. Presumably, per http://www.unidata.ucar.edu/software/netcdf/docs/interoperability_hdf5.html, I can open a netCDF4 file as HDF5 using dask. Let's try a dask example (http://dask.pydata.org/en/latest/examples/dataframe-hdf5.html) with one of my netCDF files:

import dask.dataframe as dd

df = dd.read_hdf('reallybignetCDF4file.nc', key='/c')  # this does not work
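One likely reason the line above fails, shown with a small contrasting sketch (file and key names are illustrative): dd.read_hdf expects HDF5 files laid out by pandas' HDFStore (PyTables), so a key like '/c' from the dask example only exists in files written that way, not in an arbitrary netCDF4/HDF5 file:

import pandas as pd
import dask.dataframe as dd

# Write a file in the HDFStore layout that read_hdf understands...
pd.DataFrame({"x": range(10)}).to_hdf("pandas_style.h5", key="/c", format="table")

# ...and read it back lazily with dask.
df = dd.read_hdf("pandas_style.h5", key="/c")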

Thanks, Marinna

reactions: none (total_count 0)
issue: to_dataframe (pandas) usage question (253407851)
id: 325458780
html_url: https://github.com/pydata/xarray/issues/1534#issuecomment-325458780
issue_url: https://api.github.com/repos/pydata/xarray/issues/1534
node_id: MDEyOklzc3VlQ29tbWVudDMyNTQ1ODc4MA==
user: mmartini-usgs (23199378)
created_at: 2017-08-28T19:45:53Z
updated_at: 2017-08-28T19:45:53Z
author_association: NONE

Many thanks, I will go learn about dask dataframes. Marinna

reactions: none (total_count 0)
issue: to_dataframe (pandas) usage question (253407851)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
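A hedged sketch of the query behind this page ("3 rows where issue = 253407851 and user = 23199378 sorted by updated_at descending"), run with Python's sqlite3 module against a local copy of the database; the file name github.db is an assumption:

import sqlite3

con = sqlite3.connect("github.db")  # database file name assumed
rows = con.execute(
    """
    SELECT id, created_at, updated_at, body
    FROM issue_comments
    WHERE [issue] = ? AND [user] = ?
    ORDER BY updated_at DESC
    """,
    (253407851, 23199378),
).fetchall()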