html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3007#issuecomment-602581904,https://api.github.com/repos/pydata/xarray/issues/3007,602581904,MDEyOklzc3VlQ29tbWVudDYwMjU4MTkwNA==,2448579,2020-03-23T13:15:15Z,2020-03-23T13:15:15Z,MEMBER,Thanks @sjvrijn ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,454073421
https://github.com/pydata/xarray/issues/3007#issuecomment-602508864,https://api.github.com/repos/pydata/xarray/issues/3007,602508864,MDEyOklzc3VlQ29tbWVudDYwMjUwODg2NA==,8833517,2020-03-23T10:27:27Z,2020-03-23T10:27:27Z,CONTRIBUTOR,"I recently had a similar issue and found the cause: when converting a pandas DataFrame to xarray, xarray allocates memory for every possible combination of the index values. In this particular case, your five rows contain 5 unique latitudes and 5 unique longitudes, which gives 5*5=25 possible lat/long combinations. Every combination that has no row in the DataFrame is filled in as `NaN`.
Let me illustrate by recreating just your data on latitude, longitude, `wind_surface` and `hurs`:
```python
In [3]: data = [
...: [34.511383, 16.467664, 29.658546, 70.481293],
...: [34.515558, 16.723973, 30.896049, 71.356644],
...: [34.517359, 16.852138, 31.514799, 71.708603],
...: [34.518970, 16.980310, 32.105423, 72.023773],
...: [34.520391, 17.108487, 32.724174, 72.106110],
...: ]
In [4]: df = pd.DataFrame(data=data, columns=['lat', 'long', 'wind_surface', 'hurs']).set_index(['lat', 'long'])
In [5]: df
Out[5]:
                     wind_surface       hurs
lat       long
34.511383 16.467664     29.658546  70.481293
34.515558 16.723973     30.896049  71.356644
34.517359 16.852138     31.514799  71.708603
34.518970 16.980310     32.105423  72.023773
34.520391 17.108487     32.724174  72.106110
```
For xarray, this means creating a 5x5 array in which only the 5 values along the diagonal are actually given. This is clearly visible when you look at the `DataArray` for a single column:
```python
In [6]: df.to_xarray()['wind_surface']
Out[6]:
<xarray.DataArray 'wind_surface' (lat: 5, long: 5)>
array([[29.658546,       nan,       nan,       nan,       nan],
       [      nan, 30.896049,       nan,       nan,       nan],
       [      nan,       nan, 31.514799,       nan,       nan],
       [      nan,       nan,       nan, 32.105423,       nan],
       [      nan,       nan,       nan,       nan, 32.724174]])
Coordinates:
  * lat      (lat) float64 34.51 34.52 34.52 34.52 34.52
  * long     (long) float64 16.47 16.72 16.85 16.98 17.11
```
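As a rough sketch of where the 25 cells come from, the same expansion can be reproduced in pandas alone by reindexing onto the full product of the two index levels. This is only my own illustration, not necessarily what xarray does internally, and `full_index` and `dense` are names I made up for it:
```python
import pandas as pd

data = [
    [34.511383, 16.467664, 29.658546, 70.481293],
    [34.515558, 16.723973, 30.896049, 71.356644],
    [34.517359, 16.852138, 31.514799, 71.708603],
    [34.518970, 16.980310, 32.105423, 72.023773],
    [34.520391, 17.108487, 32.724174, 72.106110],
]
df = pd.DataFrame(data, columns=['lat', 'long', 'wind_surface', 'hurs']).set_index(['lat', 'long'])

# Every (lat, long) pair the unique index values can form: 5 * 5 = 25
full_index = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)

# Reindexing onto that product keeps the 5 real rows and fills the rest with NaN
dense = df.reindex(full_index)

print(len(df), len(dense))                  # 5 25
print(dense['wind_surface'].isna().sum())   # 20
```
So 20 of the 25 cells never had a value in the first place.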
The `Dataset` repr makes this harder to see: `to_xarray()` returns a `Dataset`, and each of its variables (one per DataFrame column) is summarized as a flattened, truncated list of values, which makes it look as if a lot of data is simply missing:
```python
In [7]: df.to_xarray()
Out[7]:
<xarray.Dataset>
Dimensions:       (lat: 5, long: 5)
Coordinates:
  * lat           (lat) float64 34.51 34.52 34.52 34.52 34.52
  * long          (long) float64 16.47 16.72 16.85 16.98 17.11
Data variables:
    wind_surface  (lat, long) float64 29.66 nan nan nan ... nan nan nan 32.72
    hurs          (lat, long) float64 70.48 nan nan nan ... nan nan nan 72.11
```
So it works as intended, but can throw you for a loop if you don't realize it's creating an array the size of all possible index combinations.
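If the goal is just to get the original five rows back after the round trip, one option is to drop the all-NaN rows again after `to_dataframe()`. This is a minimal sketch and assumes the original DataFrame had no rows that were entirely NaN to begin with (those would be dropped too); `back` is just a name I picked:
```python
import pandas as pd  # xarray must be installed for DataFrame.to_xarray()

data = [
    [34.511383, 16.467664, 29.658546, 70.481293],
    [34.515558, 16.723973, 30.896049, 71.356644],
    [34.517359, 16.852138, 31.514799, 71.708603],
    [34.518970, 16.980310, 32.105423, 72.023773],
    [34.520391, 17.108487, 32.724174, 72.106110],
]
df = pd.DataFrame(data, columns=['lat', 'long', 'wind_surface', 'hurs']).set_index(['lat', 'long'])

# DataFrame -> Dataset: expands to the full 5x5 lat/long grid
ds = df.to_xarray()
print(ds['wind_surface'].shape)   # (5, 5)

# Dataset -> DataFrame: 25 rows, then drop the rows where every column is NaN
back = ds.to_dataframe().dropna(how='all')
print(back.shape)                 # (5, 2)
```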
@shoyer can you close this issue?","{""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,454073421
https://github.com/pydata/xarray/issues/3007#issuecomment-501313101,https://api.github.com/repos/pydata/xarray/issues/3007,501313101,MDEyOklzc3VlQ29tbWVudDUwMTMxMzEwMQ==,1217238,2019-06-12T14:58:53Z,2019-06-12T14:58:53Z,MEMBER,"You will need to share a full example in order for me to explain why it works that way.
On Wed, Jun 12, 2019 at 7:36 AM santianmen wrote:
> I know what ""NaN"" means. I was hoping that by transforming the dataset
> into a dataframe and then returning back, the dataset variables would
> recover their original shape.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,454073421
https://github.com/pydata/xarray/issues/3007#issuecomment-501302890,https://api.github.com/repos/pydata/xarray/issues/3007,501302890,MDEyOklzc3VlQ29tbWVudDUwMTMwMjg5MA==,10137,2019-06-12T14:36:44Z,2019-06-12T14:36:44Z,NONE,"I know what ""NaN"" means. I was hoping that by transforming the dataset into a dataframe and then returning back, the dataset variables would recover their original shape.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,454073421
https://github.com/pydata/xarray/issues/3007#issuecomment-501121539,https://api.github.com/repos/pydata/xarray/issues/3007,501121539,MDEyOklzc3VlQ29tbWVudDUwMTEyMTUzOQ==,1217238,2019-06-12T05:02:56Z,2019-06-12T05:02:56Z,MEMBER,"In xarray and pandas, ""NaN"" just means ""missing value"". This typically happens if your DataFrame does not contain a row for every combination of Dataset coordinates, for whatever reason.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,454073421