github: issue_comments: 1 row where author_association = "CONTRIBUTOR" and issue = 454073421 sorted by updated

1 row where author_association = "CONTRIBUTOR" and issue = 454073421 sorted by updated_at descending


id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
602508864	https://github.com/pydata/xarray/issues/3007#issuecomment-602508864	https://api.github.com/repos/pydata/xarray/issues/3007	MDEyOklzc3VlQ29tbWVudDYwMjUwODg2NA==	sjvrijn 8833517	2020-03-23T10:27:27Z	2020-03-23T10:27:27Z	CONTRIBUTOR	I recently had a similar issue and found out the cause: When transforming from a dataframe to an xarray, the xarray allocates memory for all possible combinations of the coordinates. In this particular case, you have 5 unique values each for latitude and longitude in your five rows, which means there are 5 × 5 = 25 possible combinations of lat/long values. All missing values are then filled in as `NaN`. Let me illustrate by recreating just your data on latitude, longitude, `wind_surface` and `hurs`: `python In [3]: data = [ ...: [34.511383, 16.467664, 29.658546, 70.481293], ...: [34.515558, 16.723973, 30.896049, 71.356644], ...: [34.517359, 16.852138, 31.514799, 71.708603], ...: [34.518970, 16.980310, 32.105423, 72.023773], ...: [34.520391, 17.108487, 32.724174, 72.106110], ...: ] In [4]: df = pd.DataFrame(data=data, columns=['lat', 'long', 'wind_surface', 'hurs']).set_index(['lat', 'long']) In [5]: df Out[5]: wind_surface hurs lat long 34.511383 16.467664 29.658546 70.481293 34.515558 16.723973 30.896049 71.356644 34.517359 16.852138 31.514799 71.708603 34.518970 16.980310 32.105423 72.023773 34.520391 17.108487 32.724174 72.106110` But for the xarray, this means it will end up creating a 5x5 array, of which only 5 values are given along the diagonal. 
This is very clearly visible when showing just the `DataArray` for a single column: `python In [6]: df.to_xarray()['wind_surface'] Out[6]: <xarray.DataArray 'wind_surface' (lat: 5, long: 5)> array([[29.658546, nan, nan, nan, nan], [ nan, 30.896049, nan, nan, nan], [ nan, nan, 31.514799, nan, nan], [ nan, nan, nan, 32.105423, nan], [ nan, nan, nan, nan, 32.724174]]) Coordinates: lat (lat) float64 34.51 34.52 34.52 34.52 34.52 * long (long) float64 16.47 16.72 16.85 16.98 17.11` However, as `to_xarray()` outputs a `Dataset`, each `DataArray`, i.e. column from the dataframe, is summarized as a 1D array, which makes it seem like a lot of data is just 'missing': `python In [7]: df.to_xarray() Out[7]: <xarray.Dataset> Dimensions: (lat: 5, long: 5) Coordinates: * lat (lat) float64 34.51 34.52 34.52 34.52 34.52 * long (long) float64 16.47 16.72 16.85 16.98 17.11 Data variables: wind_surface (lat, long) float64 29.66 nan nan nan ... nan nan nan 32.72 hurs (lat, long) float64 70.48 nan nan nan ... nan nan nan 72.11` So it works as intended, but can throw you for a loop if you don't realize it's creating an array the size of all possible index combinations. @shoyer can you close this issue?	{ "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		NaN values for variables when converting from a pandas dataframe to xarray.DataSet 454073421
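
The grid expansion described in the comment can be sketched without pandas or xarray. This is only an illustration of the idea, not xarray's actual implementation: the helper `to_dense_grid` is a hypothetical name, and the rows reuse the `lat`, `long` and `wind_surface` values quoted in the comment.

```python
import math

# (lat, long, wind_surface) rows taken from the comment above.
rows = [
    (34.511383, 16.467664, 29.658546),
    (34.515558, 16.723973, 30.896049),
    (34.517359, 16.852138, 31.514799),
    (34.518970, 16.980310, 32.105423),
    (34.520391, 17.108487, 32.724174),
]


def to_dense_grid(rows):
    """Hypothetical helper: place each row on the full lat x long grid.

    Every (lat, long) combination gets a cell; cells that no row
    refers to stay NaN.
    """
    lats = sorted({lat for lat, _, _ in rows})
    longs = sorted({lon for _, lon, _ in rows})
    lat_pos = {lat: i for i, lat in enumerate(lats)}
    long_pos = {lon: j for j, lon in enumerate(longs)}

    # Start from an all-NaN grid, then fill in the values we actually have.
    grid = [[math.nan] * len(longs) for _ in lats]
    for lat, lon, value in rows:
        grid[lat_pos[lat]][long_pos[lon]] = value
    return lats, longs, grid


lats, longs, grid = to_dense_grid(rows)
n_cells = len(lats) * len(longs)
n_missing = sum(math.isnan(v) for row in grid for v in row)
print(f"{len(rows)} rows -> {n_cells} cells, {n_missing} of them NaN")
# prints "5 rows -> 25 cells, 20 of them NaN"
```

Because each row has its own latitude and its own longitude, the five values land on the diagonal and the other twenty cells are padding, which matches the `DataArray` output shown in the comment.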


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
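
As a rough sketch, the filter in the page title can be run against this schema with Python's built-in `sqlite3` module. The in-memory database and the single inserted row (with the long text columns left empty) are assumptions made for the example; the `users` and `issues` tables are not created, so foreign-key enforcement is switched off explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The referenced [users] and [issues] tables are not part of this sketch,
# so make sure foreign-key enforcement is off.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [html_url] TEXT,
       [issue_url] TEXT,
       [id] INTEGER PRIMARY KEY,
       [node_id] TEXT,
       [user] INTEGER REFERENCES [users]([id]),
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [body] TEXT,
       [reactions] TEXT,
       [performed_via_github_app] TEXT,
       [issue] INTEGER REFERENCES [issues]([id])
    );
    CREATE INDEX [idx_issue_comments_issue]
        ON [issue_comments] ([issue]);
    CREATE INDEX [idx_issue_comments_user]
        ON [issue_comments] ([user]);
    """
)

# Insert the one comment shown on this page (text columns omitted).
conn.execute(
    "INSERT INTO [issue_comments] "
    "([id], [user], [created_at], [updated_at], [author_association], [issue]) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (602508864, 8833517, "2020-03-23T10:27:27Z", "2020-03-23T10:27:27Z",
     "CONTRIBUTOR", 454073421),
)

# Same filter and ordering as the page title.
rows = conn.execute(
    "SELECT [id], [updated_at] FROM [issue_comments] "
    "WHERE [author_association] = ? AND [issue] = ? "
    "ORDER BY [updated_at] DESC",
    ("CONTRIBUTOR", 454073421),
).fetchall()
print(rows)
```

The `idx_issue_comments_issue` index is the one that can serve the `issue = ?` part of this filter.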