html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/1391#issuecomment-481786968,https://api.github.com/repos/pydata/xarray/issues/1391,481786968,MDEyOklzc3VlQ29tbWVudDQ4MTc4Njk2OA==,26384082,2019-04-10T17:31:48Z,2019-04-10T17:31:48Z,NONE,"In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the `stale` label; otherwise it will be marked as closed automatically.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,225536793
https://github.com/pydata/xarray/issues/1391#issuecomment-300545110,https://api.github.com/repos/pydata/xarray/issues/1391,300545110,MDEyOklzc3VlQ29tbWVudDMwMDU0NTExMA==,6980561,2017-05-10T16:53:25Z,2017-05-10T16:53:25Z,NONE,"@darothen That sounds great!

I think we should be clearer. The issue that @NicWayand and I are highlighting is coercing observational data, which often comes with some fairly heinous formatting issues, into an xarray format. Stacking these data along a new dimension is usually the last step in this process, and one that can be frustrating.

An example of this in practice can be found in this notebook (please be forgiving, it is one of the first things I ever wrote in Python).

https://github.com/klapo/CalRad/blob/master/CR.SurfObs.DataIngest.xray.ipynb

The data flow looks like this:
- read the csv summarizing each station
- read data from one set of stations using pandas
- clean the data
- assign the data in a pandas DataFrame to a dictionary of DataFrames
- rinse and repeat for the other set of data
- concat the dictionary of DataFrames into a single DataFrame
- convert to an xarray Dataset

This example is a little ludicrous because I didn't know what I was doing, but I think that's the point.
There is a lot of ambiguity about which tools to use at what point. Concatenating a dictionary of DataFrames into a single DataFrame and then converting to a Dataset was the only solution I could get to work, after a lot of trial and error, for putting these data in an xarray Dataset.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,225536793
https://github.com/pydata/xarray/issues/1391#issuecomment-300462962,https://api.github.com/repos/pydata/xarray/issues/1391,300462962,MDEyOklzc3VlQ29tbWVudDMwMDQ2Mjk2Mg==,4992424,2017-05-10T12:11:56Z,2017-05-10T12:11:56Z,NONE,"@klapo! Great to see you here! Happy to iterate with you on documenting this functionality.

For reference, I wrote [a package](https://github.com/darothen/experiment) for my dissertation work to help automate the task of constructing multi-dimensional Datasets which include dimensions corresponding to experimental/ensemble factors. One of my ongoing projects is to fully abstract this (I have a not-uploaded branch of the project which tries to build the notion of an ""EnsembleDataset"", which has the same relationship to a Dataset that a pandas Panel used to have to a DataFrame).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,225536793
https://github.com/pydata/xarray/issues/1391#issuecomment-300383088,https://api.github.com/repos/pydata/xarray/issues/1391,300383088,MDEyOklzc3VlQ29tbWVudDMwMDM4MzA4OA==,6980561,2017-05-10T06:03:20Z,2017-05-10T06:03:20Z,NONE,"Also, just a small thing in the docs for `concat`.

The example includes this snippet:

`xr.concat([arr[0], arr[1]], pd.Index([-90, -100], name='new_dim'))`

but as far as I can tell, `name` is not an argument accepted by `concat`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,225536793
https://github.com/pydata/xarray/issues/1391#issuecomment-300381278,https://api.github.com/repos/pydata/xarray/issues/1391,300381278,MDEyOklzc3VlQ29tbWVudDMwMDM4MTI3OA==,6980561,2017-05-10T05:52:21Z,2017-05-10T05:56:15Z,NONE,"I have an example that I just struggled through that might be relevant to this idea. I'm running a point model using some arbitrary number of experiments (for the example below there are 28 experiments). Each experiment is opened and then stored in a dictionary `resultsDict`. The excerpt below extracts all of my scalar variables, concatenates them along an experiment dimension, and finally combines all scalar variables into a Dataset. I often find myself struggling to combine data (for instance meteorological stations) into a Dataset and I can never remember how to use `merge` and/or `concat`.

```
resultsDataSet = xr.Dataset()
for k in scalar_data_vars:
    # Skip anything that is not a scalar variable
    if 'scalar' not in k:
        continue

    # Concatenate variable k from every experiment along a new expID dimension
    darray = xr.concat([resultsDict[scen][k] for scen in resultsDict], dim='expID')
    # Remove hru dimension, as it is unused
    darray = darray.squeeze('hru')
    resultsDataSet[k] = darray
print(resultsDataSet)
```

which yields

```
Dimensions:                (expID: 28, time: 8041)
Coordinates:
  * time                   (time) datetime64[ns] 2008-10-01 ...
    hru                    int32 1
Dimensions without coordinates: expID
Data variables:
    scalarRainPlusMelt     (expID, time) float64 -9.999e+03 -9.999e+03 ...
    scalarSWE              (expID, time) float64 -9.999e+03 -9.999e+03 ...
    scalarSnowSublimation  (expID, time) float64 -9.999e+03 -9.999e+03 ...
    scalarInfiltration     (expID, time) float64 -9.999e+03 -9.999e+03 ...
    scalarSurfaceRunoff    (expID, time) float64 -9.999e+03 -9.999e+03 ...
    scalarSurfaceTemp      (expID, time) float64 -9.999e+03 -9.999e+03 ...
    scalarSenHeatTotal     (expID, time) float64 -9.999e+03 -9.999e+03 ...
    scalarLatHeatTotal     (expID, time) float64 -9.999e+03 -9.999e+03 ...
    scalarSnowDepth        (expID, time) float64 -9.999e+03 -9.999e+03 ...
```

And here is a helper function that can do this more generally, which I wrote a while back.

```
import xarray as xr

def combinevars(ds_in, dat_vars, new_dim_name='new_dim', combinevarname='new_var'):
    # Stack the selected variables along a new dimension named new_dim_name
    ds_out = xr.concat([ds_in[dv] for dv in dat_vars], dim=new_dim_name)
    # Label each position along the new dimension with its source variable name
    ds_out.coords[new_dim_name] = dat_vars
    ds_out.name = combinevarname
    return ds_out
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,225536793