html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3007#issuecomment-602581904,https://api.github.com/repos/pydata/xarray/issues/3007,602581904,MDEyOklzc3VlQ29tbWVudDYwMjU4MTkwNA==,2448579,2020-03-23T13:15:15Z,2020-03-23T13:15:15Z,MEMBER,Thanks @sjvrijn,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,454073421
https://github.com/pydata/xarray/issues/3007#issuecomment-602508864,https://api.github.com/repos/pydata/xarray/issues/3007,602508864,MDEyOklzc3VlQ29tbWVudDYwMjUwODg2NA==,8833517,2020-03-23T10:27:27Z,2020-03-23T10:27:27Z,CONTRIBUTOR,"I recently had a similar issue and found out the cause: when transforming from a dataframe to an xarray object, xarray allocates memory for all possible combinations of the coordinates. In this particular case, you have 5 unique values each for latitude and longitude in your five rows, which means there are 5*5=25 possible combinations of lat/long values. All missing values are then filled in as `NaN`.
Let me illustrate by recreating just your data on latitude, longitude, `wind_surface` and `hurs`:

```python
In [3]: data = [
   ...:     [34.511383, 16.467664, 29.658546, 70.481293],
   ...:     [34.515558, 16.723973, 30.896049, 71.356644],
   ...:     [34.517359, 16.852138, 31.514799, 71.708603],
   ...:     [34.518970, 16.980310, 32.105423, 72.023773],
   ...:     [34.520391, 17.108487, 32.724174, 72.106110],
   ...: ]

In [4]: df = pd.DataFrame(data=data, columns=['lat', 'long', 'wind_surface', 'hurs']).set_index(['lat', 'long'])

In [5]: df
Out[5]:
                     wind_surface       hurs
lat       long
34.511383 16.467664     29.658546  70.481293
34.515558 16.723973     30.896049  71.356644
34.517359 16.852138     31.514799  71.708603
34.518970 16.980310     32.105423  72.023773
34.520391 17.108487     32.724174  72.106110
```

But for xarray, this means it will end up creating a 5x5 array, of which only 5 values are given, along the diagonal. This is very clearly visible when showing just the `DataArray` for a single column:

```python
In [6]: df.to_xarray()['wind_surface']
Out[6]:
array([[29.658546,       nan,       nan,       nan,       nan],
       [      nan, 30.896049,       nan,       nan,       nan],
       [      nan,       nan, 31.514799,       nan,       nan],
       [      nan,       nan,       nan, 32.105423,       nan],
       [      nan,       nan,       nan,       nan, 32.724174]])
Coordinates:
  * lat      (lat) float64 34.51 34.52 34.52 34.52 34.52
  * long     (long) float64 16.47 16.72 16.85 16.98 17.11
```

However, as `to_xarray()` outputs a `Dataset`, each `DataArray` (i.e. each column from the dataframe) is summarized on a single line as a flattened preview, which makes it seem like a lot of data is just 'missing':

```python
In [7]: df.to_xarray()
Out[7]:
Dimensions:       (lat: 5, long: 5)
Coordinates:
  * lat           (lat) float64 34.51 34.52 34.52 34.52 34.52
  * long          (long) float64 16.47 16.72 16.85 16.98 17.11
Data variables:
    wind_surface  (lat, long) float64 29.66 nan nan nan ... nan nan nan 32.72
    hurs          (lat, long) float64 70.48 nan nan nan ... nan nan nan 72.11
```

So it works as intended, but can throw you for a loop if you don't realize it's creating an array the size of all possible index combinations.
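
For anyone who wants to see where the 25 cells come from without going through xarray, here is a minimal pandas-only sketch. It assumes the conversion is roughly equivalent to reindexing the dataframe onto the full product of its index levels; that is an approximation for illustration, not a claim about xarray's actual implementation. The names `full_index`, `dense` and `grid` are made up for this example.

```python
import pandas as pd

data = [
    [34.511383, 16.467664, 29.658546, 70.481293],
    [34.515558, 16.723973, 30.896049, 71.356644],
    [34.517359, 16.852138, 31.514799, 71.708603],
    [34.518970, 16.980310, 32.105423, 72.023773],
    [34.520391, 17.108487, 32.724174, 72.106110],
]
df = pd.DataFrame(data, columns=['lat', 'long', 'wind_surface', 'hurs']).set_index(['lat', 'long'])

# Every (lat, long) pair that can be formed from the unique index values:
# 5 latitudes * 5 longitudes = 25 pairs. (Assumed stand-in for the dense grid.)
full_index = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)

# Reindexing onto the full grid pads the 20 pairs that have no row with NaN.
dense = df.reindex(full_index)

print(len(df), len(dense))                        # 5 25
print(int(dense['wind_surface'].notna().sum()))   # 5

# Pivoting one column gives the same 5x5 layout with values on the diagonal.
grid = dense['wind_surface'].unstack('long')
print(grid.shape)                                 # (5, 5)
```

Only 5 of the 25 cells carry data, which matches the diagonal pattern in the `DataArray` output above.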
@shoyer can you close this issue?","{""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,454073421
https://github.com/pydata/xarray/issues/3007#issuecomment-501313101,https://api.github.com/repos/pydata/xarray/issues/3007,501313101,MDEyOklzc3VlQ29tbWVudDUwMTMxMzEwMQ==,1217238,2019-06-12T14:58:53Z,2019-06-12T14:58:53Z,MEMBER,"You will need to share a full example in order for me to explain why it works that way.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,454073421
https://github.com/pydata/xarray/issues/3007#issuecomment-501302890,https://api.github.com/repos/pydata/xarray/issues/3007,501302890,MDEyOklzc3VlQ29tbWVudDUwMTMwMjg5MA==,10137,2019-06-12T14:36:44Z,2019-06-12T14:36:44Z,NONE,"I know what ""NaN"" means. I was hoping that by transforming the dataset into a dataframe and then back again, the dataset variables would recover their original shape.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,454073421
https://github.com/pydata/xarray/issues/3007#issuecomment-501121539,https://api.github.com/repos/pydata/xarray/issues/3007,501121539,MDEyOklzc3VlQ29tbWVudDUwMTEyMTUzOQ==,1217238,2019-06-12T05:02:56Z,2019-06-12T05:02:56Z,MEMBER,"In xarray and pandas, ""NaN"" just means ""missing value"".
This typically happens if your DataFrame does not contain a row for every combination of Dataset coordinates, for whatever reason.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,454073421