html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3873#issuecomment-601948472,https://api.github.com/repos/pydata/xarray/issues/3873,601948472,MDEyOklzc3VlQ29tbWVudDYwMTk0ODQ3Mg==,14808389,2020-03-20T23:12:11Z,2020-03-20T23:13:25Z,MEMBER,"I agree, we should add something like that to the documentation. I have just been trying to do the same with pure xarray, but didn't find a way to do so without falling back to either numpy or pandas or doing something really complicated (I haven't figured out how to use `where` to put `data` into `reshaped` yet):
```python
In [150]: df = pd.DataFrame({
     ...:     ""lat"": np.linspace(-90, 90, 10),
     ...:     ""lon"": np.linspace(0, 360, 10),
     ...:     ""data"": np.arange(10),
     ...: })
     ...: ds = df.to_xarray()
     ...: ds
Out[150]:
<xarray.Dataset>
Dimensions:  (index: 10)
Coordinates:
  * index    (index) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    lat      (index) float64 -90.0 -70.0 -50.0 -30.0 ... 30.0 50.0 70.0 90.0
    lon      (index) float64 0.0 40.0 80.0 120.0 ... 240.0 280.0 320.0 360.0
    data     (index) int64 0 1 2 3 4 5 6 7 8 9

In [151]: reshaped = (
     ...:     ds.set_index(coordinates=[""lat"", ""lon""])
     ...:     .unstack()
     ...:     .stack(coordinates=[""lat"", ""lon""])
     ...:     .coordinates
     ...:     .unstack()
     ...:     .where(False)
     ...: )
     ...: reshaped
Out[151]:
<xarray.DataArray 'coordinates' (lat: 10, lon: 10)>
array([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]], dtype=object)
Coordinates:
  * lat      (lat) float64 -90.0 -70.0 -50.0 -30.0 -10.0 ... 30.0 50.0 70.0 90.0
  * lon      (lon) float64 0.0 40.0 80.0 120.0 160.0 ... 240.0 280.0 320.0 360.0
```
which is not something we would want to put into the documentation","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,585323675
https://github.com/pydata/xarray/issues/3873#issuecomment-601932932,https://api.github.com/repos/pydata/xarray/issues/3873,601932932,MDEyOklzc3VlQ29tbWVudDYwMTkzMjkzMg==,3274,2020-03-20T22:14:37Z,2020-03-20T22:14:37Z,CONTRIBUTOR,"@keewis -- Yes, that is what I ended up with: making a multi-indexed pandas data array first. But I still think it would be helpful to have this information in one of the xarray tutorials. Also, one thing that can happen, and that could be addressed in this process, is a non-unique multi-index. I did this with a set of experimental data, and then `.to_xarray()` errored out because the multi-index was non-unique, which is apparently OK for pandas, but not for xarray.
Note that this isn't necessarily pathological: this happened to me because some data points in my data set were oversampled. So it would be very helpful for the tutorial to address this -- if you have multiple samples at the same point in the condition space, how do you add arbitrary indexing so that you can successfully translate from a data frame with non-unique indexing to an xarray object? Is there some way to do this automatically, so we have the equivalent of `lat x lon x 0`, `lat x lon x 1`, and so on? Even the diagnostic process is a bit of a nuisance:
```
# `filtered` is the data frame whose index contains duplicates
duplicates = filtered.index[filtered.index.duplicated(keep=False)]
duplicates.sort_values()
```
so it would be nice to have a step-by-step translation guide.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,585323675
https://github.com/pydata/xarray/issues/3873#issuecomment-601927475,https://api.github.com/repos/pydata/xarray/issues/3873,601927475,MDEyOklzc3VlQ29tbWVudDYwMTkyNzQ3NQ==,14808389,2020-03-20T21:57:21Z,2020-03-20T21:57:21Z,MEMBER,"to answer your question, at least partially:
```python
In [3]: df = pd.DataFrame({
   ...:     ""lat"": np.linspace(-90, 90, 10),
   ...:     ""lon"": np.linspace(0, 360, 10),
   ...:     ""data"": np.arange(10),
   ...: })
   ...: df.set_index([""lat"", ""lon""]).to_xarray()
Out[3]:
<xarray.Dataset>
Dimensions:  (lat: 10, lon: 10)
Coordinates:
  * lat      (lat) float64 -90.0 -70.0 -50.0 -30.0 -10.0 ... 30.0 50.0 70.0 90.0
  * lon      (lon) float64 0.0 40.0 80.0 120.0 160.0 ... 240.0 280.0 320.0 360.0
Data variables:
    data     (lat, lon) float64 0.0 nan nan nan nan nan ... nan nan nan nan 9.0
```
so the levels of the multi-index of the `DataFrame` are the dimension coordinates of the `Dataset`.
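
For the non-unique case raised in this thread, one possible approach is sketched below. It is an assumption, not an established xarray recipe: number the repeats of each (`lat`, `lon`) pair with `groupby(...).cumcount()` and add that counter as an extra index level, so that the multi-index is unique before `to_xarray()` is called. The level name `sample` and the toy values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data with one repeated (lat, lon) pair, standing in for oversampled measurements.
df = pd.DataFrame({
    'lat': [0.0, 0.0, 10.0, 10.0],
    'lon': [5.0, 5.0, 5.0, 15.0],
    'data': np.arange(4),
})

# 'sample' is a hypothetical level name: it numbers the repeats
# of each (lat, lon) pair as 0, 1, 2, ...
df['sample'] = df.groupby(['lat', 'lon']).cumcount()

# With the counter included, every index entry is unique.
indexed = df.set_index(['lat', 'lon', 'sample'])
assert indexed.index.is_unique

# Each index level becomes a dimension; combinations that were never
# observed are filled with NaN.
ds = indexed.to_xarray()
```

The price of this layout is a dense `lat x lon x sample` grid, so sparsely repeated points leave many NaN cells along `sample`.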
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,585323675
https://github.com/pydata/xarray/issues/3873#issuecomment-601919879,https://api.github.com/repos/pydata/xarray/issues/3873,601919879,MDEyOklzc3VlQ29tbWVudDYwMTkxOTg3OQ==,5635139,2020-03-20T21:33:19Z,2020-03-20T21:33:19Z,MEMBER,"(very much agree, and also on going the other way; there's at least one additional recent issue on this)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,585323675