html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3007#issuecomment-602508864,https://api.github.com/repos/pydata/xarray/issues/3007,602508864,MDEyOklzc3VlQ29tbWVudDYwMjUwODg2NA==,8833517,2020-03-23T10:27:27Z,2020-03-23T10:27:27Z,CONTRIBUTOR,"I recently had a similar issue and found out the cause: when converting a dataframe to an xarray object, xarray allocates memory for every possible combination of the index values. In this particular case, your five rows have 5 unique values each for latitude and longitude, which means there are 5*5=25 possible combinations of lat/long values. Every combination that has no row in the dataframe is then filled in as `NaN`.

Let me illustrate by recreating just your data on latitude, longitude, `wind_surface` and `hurs`:

```python
In [2]: import pandas as pd
In [3]: data = [
    ...:     [34.511383, 16.467664, 29.658546, 70.481293],
    ...:     [34.515558, 16.723973, 30.896049, 71.356644],
    ...:     [34.517359, 16.852138, 31.514799, 71.708603],
    ...:     [34.518970, 16.980310, 32.105423, 72.023773],
    ...:     [34.520391, 17.108487, 32.724174, 72.106110],
    ...: ]
In [4]: df = pd.DataFrame(data=data, columns=['lat', 'long', 'wind_surface', 'hurs']).set_index(['lat', 'long'])
In [5]: df
Out[5]:
                     wind_surface       hurs
lat       long
34.511383 16.467664     29.658546  70.481293
34.515558 16.723973     30.896049  71.356644
34.517359 16.852138     31.514799  71.708603
34.518970 16.980310     32.105423  72.023773
34.520391 17.108487     32.724174  72.106110
```

For xarray, however, this means it ends up creating a 5x5 array in which only the 5 values along the diagonal are populated. This is clearly visible when showing just the `DataArray` for a single column:
```python
In [6]: df.to_xarray()['wind_surface']
Out[6]:
<xarray.DataArray 'wind_surface' (lat: 5, long: 5)>
array([[29.658546,       nan,       nan,       nan,       nan],
       [      nan, 30.896049,       nan,       nan,       nan],
       [      nan,       nan, 31.514799,       nan,       nan],
       [      nan,       nan,       nan, 32.105423,       nan],
       [      nan,       nan,       nan,       nan, 32.724174]])
Coordinates:
  * lat      (lat) float64 34.51 34.52 34.52 34.52 34.52
  * long     (long) float64 16.47 16.72 16.85 16.98 17.11
```
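
As a quick sanity check, the sparsity can also be counted directly. The following is a minimal sketch that rebuilds the same five rows; the variable names `n_cells` and `n_filled` are my own and not part of the original example.

```python
import pandas as pd  # to_xarray() also needs xarray to be installed

# Same five rows as above, indexed by (lat, long).
data = [
    [34.511383, 16.467664, 29.658546, 70.481293],
    [34.515558, 16.723973, 30.896049, 71.356644],
    [34.517359, 16.852138, 31.514799, 71.708603],
    [34.518970, 16.980310, 32.105423, 72.023773],
    [34.520391, 17.108487, 32.724174, 72.106110],
]
df = pd.DataFrame(data, columns=['lat', 'long', 'wind_surface', 'hurs']).set_index(['lat', 'long'])
da = df.to_xarray()['wind_surface']

n_cells = da.size                    # every lat/long combination: 5 * 5
n_filled = int(da.notnull().sum())   # cells that actually came from a dataframe row
print(n_cells, n_filled, da.nbytes)  # total cells, non-NaN cells, bytes allocated
```

With two index levels and N rows whose coordinates are all distinct, the dense array has N*N cells, so the memory needed grows quadratically with the number of rows.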

However, `to_xarray()` returns a `Dataset`, and its repr summarizes each `DataArray` (i.e. each column of the dataframe) as a single flattened, truncated row of values, which makes it seem like a lot of data is just 'missing':
```python
In [7]: df.to_xarray()
Out[7]:
<xarray.Dataset>
Dimensions:       (lat: 5, long: 5)
Coordinates:
  * lat           (lat) float64 34.51 34.52 34.52 34.52 34.52
  * long          (long) float64 16.47 16.72 16.85 16.98 17.11
Data variables:
    wind_surface  (lat, long) float64 29.66 nan nan nan ... nan nan nan 32.72
    hurs          (lat, long) float64 70.48 nan nan nan ... nan nan nan 72.11
```

So it works as intended, but can throw you for a loop if you don't realize it's creating an array the size of all possible index combinations.
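
If the rows are really scattered points rather than a regular grid, one possible workaround is to keep the result one-dimensional. This is only a sketch under that assumption, not necessarily what the xarray maintainers would recommend, and the names `points` and `recovered` are just illustrative.

```python
import pandas as pd  # to_xarray() also needs xarray to be installed

data = [
    [34.511383, 16.467664, 29.658546, 70.481293],
    [34.515558, 16.723973, 30.896049, 71.356644],
    [34.517359, 16.852138, 31.514799, 71.708603],
    [34.518970, 16.980310, 32.105423, 72.023773],
    [34.520391, 17.108487, 32.724174, 72.106110],
]
df = pd.DataFrame(data, columns=['lat', 'long', 'wind_surface', 'hurs']).set_index(['lat', 'long'])

# Option 1: move lat/long out of the index before converting, so the only
# dimension is the row number ('index') and lat/long become coordinates on it.
points = df.reset_index().to_xarray().set_coords(['lat', 'long'])
print(dict(points.sizes))

# Option 2: start from the dense 5x5 grid, flatten it into one 'point'
# dimension, and drop the combinations that contain NaN.
recovered = df.to_xarray().stack(point=('lat', 'long')).dropna('point')
print(dict(recovered.sizes))
```

Both variants end up with five entries along a single dimension instead of a 5x5 grid. Option 1 avoids allocating the dense array in the first place, while option 2 still builds it temporarily.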

@shoyer can you close this issue?","{""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,454073421