
issue_comments


5 rows where issue = 454073421 sorted by updated_at descending


user 4

  • shoyer 2
  • ghost 1
  • dcherian 1
  • sjvrijn 1

author_association 3

  • MEMBER 3
  • CONTRIBUTOR 1
  • NONE 1

issue 1

  • NaN values for variables when converting from a pandas dataframe to xarray.DataSet 5
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
602581904 https://github.com/pydata/xarray/issues/3007#issuecomment-602581904 https://api.github.com/repos/pydata/xarray/issues/3007 MDEyOklzc3VlQ29tbWVudDYwMjU4MTkwNA== dcherian 2448579 2020-03-23T13:15:15Z 2020-03-23T13:15:15Z MEMBER

Thanks @sjvrijn

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  NaN values for variables when converting from a pandas dataframe to xarray.DataSet 454073421
602508864 https://github.com/pydata/xarray/issues/3007#issuecomment-602508864 https://api.github.com/repos/pydata/xarray/issues/3007 MDEyOklzc3VlQ29tbWVudDYwMjUwODg2NA== sjvrijn 8833517 2020-03-23T10:27:27Z 2020-03-23T10:27:27Z CONTRIBUTOR

I recently had a similar issue and found out the cause: When transforming from a dataframe to an xarray, the xarray allocates memory for all possible combinations of the coordinates. In this particular case, you have 5 unique values for latitude and longitude in your five rows, which means there are 5*5=25 possible combinations of lat/long values. All missing values are then filled in as NaN.

Let me illustrate by recreating just your data on latitude, longitude, wind_surface and hurs:

    In [3]: data = [
       ...:     [34.511383, 16.467664, 29.658546, 70.481293],
       ...:     [34.515558, 16.723973, 30.896049, 71.356644],
       ...:     [34.517359, 16.852138, 31.514799, 71.708603],
       ...:     [34.518970, 16.980310, 32.105423, 72.023773],
       ...:     [34.520391, 17.108487, 32.724174, 72.106110],
       ...: ]

    In [4]: df = pd.DataFrame(data=data, columns=['lat', 'long', 'wind_surface', 'hurs']).set_index(['lat', 'long'])

    In [5]: df
    Out[5]:
                         wind_surface       hurs
    lat       long
    34.511383 16.467664     29.658546  70.481293
    34.515558 16.723973     30.896049  71.356644
    34.517359 16.852138     31.514799  71.708603
    34.518970 16.980310     32.105423  72.023773
    34.520391 17.108487     32.724174  72.106110

But for the xarray, this means it will end up creating a 5x5 array, of which only 5 values are given along the diagonal. This is very clearly visible when showing just the DataArray for a single column:

    In [6]: df.to_xarray()['wind_surface']
    Out[6]:
    <xarray.DataArray 'wind_surface' (lat: 5, long: 5)>
    array([[29.658546,       nan,       nan,       nan,       nan],
           [      nan, 30.896049,       nan,       nan,       nan],
           [      nan,       nan, 31.514799,       nan,       nan],
           [      nan,       nan,       nan, 32.105423,       nan],
           [      nan,       nan,       nan,       nan, 32.724174]])
    Coordinates:
      * lat      (lat) float64 34.51 34.52 34.52 34.52 34.52
      * long     (long) float64 16.47 16.72 16.85 16.98 17.11

However, as to_xarray() outputs a DataSet, each DataArray, i.e. column from the dataframe, is summarized as a 1D array, which makes it seem like a lot of data is just 'missing':

    In [7]: df.to_xarray()
    Out[7]:
    <xarray.Dataset>
    Dimensions:       (lat: 5, long: 5)
    Coordinates:
      * lat           (lat) float64 34.51 34.52 34.52 34.52 34.52
      * long          (long) float64 16.47 16.72 16.85 16.98 17.11
    Data variables:
        wind_surface  (lat, long) float64 29.66 nan nan nan ... nan nan nan 32.72
        hurs          (lat, long) float64 70.48 nan nan nan ... nan nan nan 72.11

So it works as intended, but can throw you for a loop if you don't realize it's creating an array the size of all possible index combinations.
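As a minimal sketch of the behaviour described above (assuming pandas and xarray are installed, and reusing the five rows from this comment), the snippet below counts the NaN cells in the 5x5 grid. It then shows two possible ways to avoid or undo the expansion: keep the default integer index so the result has a single dimension, or convert back to a dataframe and drop the filled-in rows. The variable names `gridded`, `flat` and `recovered` are illustrative only.

```python
import pandas as pd
import xarray as xr  # not called directly; needed for DataFrame.to_xarray()

data = [
    [34.511383, 16.467664, 29.658546, 70.481293],
    [34.515558, 16.723973, 30.896049, 71.356644],
    [34.517359, 16.852138, 31.514799, 71.708603],
    [34.518970, 16.980310, 32.105423, 72.023773],
    [34.520391, 17.108487, 32.724174, 72.106110],
]
df = pd.DataFrame(data, columns=["lat", "long", "wind_surface", "hurs"])

# Indexing by (lat, long) makes xarray build the full 5 x 5 grid;
# only the 5 diagonal cells hold data, the other 20 are NaN.
gridded = df.set_index(["lat", "long"]).to_xarray()
n_missing = int(gridded["wind_surface"].isnull().sum())

# Keeping the default integer index gives one dimension of length 5,
# with lat and long stored as ordinary data variables.
flat = df.to_xarray()

# Going back to a dataframe and dropping the NaN rows recovers the 5 originals.
recovered = gridded.to_dataframe().dropna()

print(dict(gridded.sizes), n_missing)
print(dict(flat.sizes))
print(len(recovered))
```

Which form is appropriate depends on whether the data really lies on a regular lat/long grid; for scattered points like these, the single-dimension layout avoids allocating the empty cells.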

@shoyer can you close this issue?

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  NaN values for variables when converting from a pandas dataframe to xarray.DataSet 454073421
501313101 https://github.com/pydata/xarray/issues/3007#issuecomment-501313101 https://api.github.com/repos/pydata/xarray/issues/3007 MDEyOklzc3VlQ29tbWVudDUwMTMxMzEwMQ== shoyer 1217238 2019-06-12T14:58:53Z 2019-06-12T14:58:53Z MEMBER

You will need to share a full example in order for me to explain why it works that way.

On Wed, Jun 12, 2019 at 7:36 AM santianmen notifications@github.com wrote:

I know what "NaN" means. I was hoping that by transforming the dataset into a dataframe and then returning back, the dataset variables would recover its original shape.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  NaN values for variables when converting from a pandas dataframe to xarray.DataSet 454073421
501302890 https://github.com/pydata/xarray/issues/3007#issuecomment-501302890 https://api.github.com/repos/pydata/xarray/issues/3007 MDEyOklzc3VlQ29tbWVudDUwMTMwMjg5MA== ghost 10137 2019-06-12T14:36:44Z 2019-06-12T14:36:44Z NONE

I know what "NaN" means. I was hoping that by transforming the dataset into a dataframe and then returning back, the dataset variables would recover its original shape.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  NaN values for variables when converting from a pandas dataframe to xarray.DataSet 454073421
501121539 https://github.com/pydata/xarray/issues/3007#issuecomment-501121539 https://api.github.com/repos/pydata/xarray/issues/3007 MDEyOklzc3VlQ29tbWVudDUwMTEyMTUzOQ== shoyer 1217238 2019-06-12T05:02:56Z 2019-06-12T05:02:56Z MEMBER

In xarray and pandas, "NaN" just means "missing value". This typically happens if your DataFrame does not contain a row for every combination of Dataset coordinates, for whatever reason.
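One way to check this condition before converting, sketched here with pandas only, is to compare the dataframe's index against the full Cartesian product of its index levels. The small dataframe and the names `full_index` and `missing` are made up for illustration and are not from the original issue.

```python
import pandas as pd

# A toy dataframe with a two-level index that lacks the ("b", 1) combination.
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.MultiIndex.from_tuples(
        [("a", 0), ("a", 1), ("b", 0)], names=["x", "y"]
    ),
)

# Build every combination of the index levels actually in use.
levels = df.index.remove_unused_levels().levels
full_index = pd.MultiIndex.from_product(levels, names=df.index.names)

# Combinations with no row would become NaN after conversion to xarray.
missing = full_index.difference(df.index)

print(len(df), len(full_index), list(missing))
```

If `missing` is empty, the dataframe already covers the whole grid and the conversion should not introduce new NaN values.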

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  NaN values for variables when converting from a pandas dataframe to xarray.DataSet 454073421


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
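
The schema above can be exercised with Python's built-in `sqlite3` module. The following is a rough sketch, not the site's actual query: it creates the table in an in-memory database, inserts two of the rows shown on this page (only some columns are filled in), and repeats the "rows where issue = 454073421 sorted by updated_at descending" filter. The referenced `users` and `issues` tables are not created here; this relies on SQLite leaving foreign-key enforcement off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [html_url] TEXT,
       [issue_url] TEXT,
       [id] INTEGER PRIMARY KEY,
       [node_id] TEXT,
       [user] INTEGER REFERENCES [users]([id]),
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [body] TEXT,
       [reactions] TEXT,
       [performed_via_github_app] TEXT,
       [issue] INTEGER REFERENCES [issues]([id])
    );
    CREATE INDEX [idx_issue_comments_issue]
        ON [issue_comments] ([issue]);
    """
)

# Two rows taken from the listing above, inserted oldest first.
rows = [
    (501121539, 1217238, "2019-06-12T05:02:56Z", "MEMBER", 454073421),
    (602581904, 2448579, "2020-03-23T13:15:15Z", "MEMBER", 454073421),
]
conn.executemany(
    "INSERT INTO [issue_comments] "
    "([id], [user], [updated_at], [author_association], [issue]) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
result = conn.execute(
    "SELECT [id], [updated_at] FROM [issue_comments] "
    "WHERE [issue] = ? ORDER BY [updated_at] DESC",
    (454073421,),
).fetchall()

print(result)
```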