
issue_comments


1 row where author_association = "CONTRIBUTOR" and issue = 454073421 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
602508864 https://github.com/pydata/xarray/issues/3007#issuecomment-602508864 https://api.github.com/repos/pydata/xarray/issues/3007 MDEyOklzc3VlQ29tbWVudDYwMjUwODg2NA== sjvrijn 8833517 2020-03-23T10:27:27Z 2020-03-23T10:27:27Z CONTRIBUTOR

I recently had a similar issue and found out the cause: when converting a dataframe to an xarray object, xarray allocates space for every possible combination of the index coordinates. In this particular case, your five rows contain 5 unique values each for latitude and longitude, which means there are 5*5=25 possible lat/long combinations. Every combination that does not occur in the dataframe is then filled in with NaN.
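The 5*5=25 count can be checked without pandas or xarray. The following is only a plain-Python sketch of the idea, not how xarray is implemented; the names `rows`, `grid_cells` and `missing` are made up for illustration, and the coordinate pairs are the ones from the five rows in the issue.

```python
from itertools import product

# (lat, long) pairs taken from the five rows in the issue
rows = [
    (34.511383, 16.467664),
    (34.515558, 16.723973),
    (34.517359, 16.852138),
    (34.518970, 16.980310),
    (34.520391, 17.108487),
]

# Unique coordinate values along each axis
lats = sorted({lat for lat, _ in rows})
longs = sorted({lon for _, lon in rows})

# A dense grid has one cell per (lat, long) combination
grid_cells = list(product(lats, longs))

# Cells with no matching row are the ones that would hold NaN
observed = set(rows)
missing = [cell for cell in grid_cells if cell not in observed]

print(len(grid_cells), len(observed), len(missing))  # 25 5 20
```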

Let me illustrate by recreating just your data on latitude, longitude, wind_surface and hurs:

```python
In [3]: data = [
   ...:     [34.511383, 16.467664, 29.658546, 70.481293],
   ...:     [34.515558, 16.723973, 30.896049, 71.356644],
   ...:     [34.517359, 16.852138, 31.514799, 71.708603],
   ...:     [34.518970, 16.980310, 32.105423, 72.023773],
   ...:     [34.520391, 17.108487, 32.724174, 72.106110],
   ...: ]

In [4]: df = pd.DataFrame(data=data, columns=['lat', 'long', 'wind_surface', 'hurs']).set_index(['lat', 'long'])

In [5]: df
Out[5]:
                     wind_surface       hurs
lat       long
34.511383 16.467664     29.658546  70.481293
34.515558 16.723973     30.896049  71.356644
34.517359 16.852138     31.514799  71.708603
34.518970 16.980310     32.105423  72.023773
34.520391 17.108487     32.724174  72.106110
```

But for the xarray, this means it will end up creating a 5x5 array, of which only 5 values are given along the diagonal. This is very clearly visible when showing just the DataArray for a single column:

```python
In [6]: df.to_xarray()['wind_surface']
Out[6]:
<xarray.DataArray 'wind_surface' (lat: 5, long: 5)>
array([[29.658546,       nan,       nan,       nan,       nan],
       [      nan, 30.896049,       nan,       nan,       nan],
       [      nan,       nan, 31.514799,       nan,       nan],
       [      nan,       nan,       nan, 32.105423,       nan],
       [      nan,       nan,       nan,       nan, 32.724174]])
Coordinates:
  * lat      (lat) float64 34.51 34.52 34.52 34.52 34.52
  * long     (long) float64 16.47 16.72 16.85 16.98 17.11
```

However, as to_xarray() outputs a Dataset, each DataArray, i.e. each column from the dataframe, is summarized on a single line as a flattened, truncated list of values, which makes it seem like a lot of data is just 'missing':

```python
In [7]: df.to_xarray()
Out[7]:
<xarray.Dataset>
Dimensions:       (lat: 5, long: 5)
Coordinates:
  * lat           (lat) float64 34.51 34.52 34.52 34.52 34.52
  * long          (long) float64 16.47 16.72 16.85 16.98 17.11
Data variables:
    wind_surface  (lat, long) float64 29.66 nan nan nan ... nan nan nan 32.72
    hurs          (lat, long) float64 70.48 nan nan nan ... nan nan nan 72.11
```

So it works as intended, but can throw you for a loop if you don't realize it's creating an array the size of all possible index combinations.
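If the dense grid is not what you want, two approaches may help. This is a sketch under assumptions rather than a recommendation from the xarray maintainers: it assumes pandas and xarray are installed, and the variable names (`grid`, `points`, `restored`) are made up for illustration. The first option leaves `lat` and `long` as ordinary columns so the rows stay along a single dimension (named `index` because the default RangeIndex has no name); the second converts the padded grid back to a table and drops the NaN rows.

```python
import pandas as pd  # DataFrame.to_xarray() also needs xarray to be installed

data = [
    [34.511383, 16.467664, 29.658546, 70.481293],
    [34.515558, 16.723973, 30.896049, 71.356644],
    [34.517359, 16.852138, 31.514799, 71.708603],
    [34.518970, 16.980310, 32.105423, 72.023773],
    [34.520391, 17.108487, 32.724174, 72.106110],
]
df = pd.DataFrame(data, columns=['lat', 'long', 'wind_surface', 'hurs'])

# A (lat, long) MultiIndex becomes one dimension per level: a 5 x 5 grid
grid = df.set_index(['lat', 'long']).to_xarray()
wind = grid['wind_surface']
n_cells = wind.size                    # total cells in the grid
n_filled = int(wind.notnull().sum())   # cells that actually hold data
print(n_cells, n_filled)  # 25 5

# Option 1: keep the default index, so each row is one point along 'index';
# lat and long are attached as non-index coordinates of that dimension
points = df.to_xarray().set_coords(['lat', 'long'])
print(dict(points.sizes))  # {'index': 5}

# Option 2: convert the grid back to a table and drop the NaN padding
restored = grid.to_dataframe().dropna()
print(len(restored))  # 5
```

Option 1 gives up label-based selection on `lat` and `long` as separate dimensions, which is the price for not materializing the 20 empty cells.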

@shoyer can you close this issue?

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  NaN values for variables when converting from a pandas dataframe to xarray.DataSet 454073421


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);