Train model for prediction of one vote — train_prediction_model

This function trains a model that predicts the result of one vote from the results of a number of past votes. It uses the machine learning models available in the caret package. To make results reproducible, call set.seed() before calling this function.

Usage

train_prediction_model(
  x,
  traindata,
  method = "svmRadial",
  trControl = NULL,
  to_exclude_vars = NULL,
  geovars = c("gemeinde", "v_gemwkid"),
  training_prop = NA,
  ...
)

Arguments

x: Column name of the dependent variable.
traindata: Data used to train the model containing the dependent variable and the predictor columns.
method: A string specifying which classification or regression model to use. Possible values are found using names(getModelInfo()). See http://topepo.github.io/caret/train-models-by-tag.html. A list of functions can also be passed for a custom model function. See http://topepo.github.io/caret/using-your-own-model-in-train.html for details.
trControl: A list of values that define how this function acts. See trainControl and http://topepo.github.io/caret/using-your-own-model-in-train.html. (NOTE: If given, this argument must be named.)
to_exclude_vars: Variables that should be excluded from the model. It makes sense to exclude other votes held on the same voting Sunday: these can contain many NAs, and because all rows containing NAs are dropped from the training data, they reduce the quality of the model.
geovars: Variables containing labels and IDs of the spatial units.
training_prop: Optional share of observations to keep in the training data. The rows to keep are selected at random, and the remaining share is excluded from the training data.
...: Optional parameters that can be passed to the caret::train() function.
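
The effect of to_exclude_vars and training_prop described above can be illustrated with base R. This is a minimal sketch on made-up data, not the package's internal implementation: the data frame `toy`, its column `Past1` and the value 0.8 are assumptions chosen for illustration only.

```r
# Hypothetical toy data: one label column, the target vote, one past vote,
# and one same-day vote (Kant1) that is still missing for half the units.
toy <- data.frame(
  gemeinde = paste0("G", 1:10),
  Eidg1 = c(51, 48, 60, 55, 43, 58, 49, 62, 47, 53),
  Past1 = c(50, 47, 61, 54, 44, 57, 50, 63, 46, 52),
  Kant1 = c(40, NA, 45, NA, 38, NA, 41, NA, 39, NA)
)

# If Kant1 stays in the data, dropping incomplete rows loses half of them.
rows_with_kant1 <- nrow(na.omit(toy))

# Excluding Kant1 (and the label column) before dropping NAs keeps every row.
reduced <- toy[, setdiff(names(toy), c("Kant1", "gemeinde"))]
complete <- na.omit(reduced)

# Keep a random share of the remaining rows; the rest is excluded.
set.seed(42)
training_prop <- 0.8
keep <- sample(nrow(complete), size = round(training_prop * nrow(complete)))
train_subset <- complete[keep, ]

c(with_kant1 = rows_with_kant1,
  without_kant1 = nrow(complete),
  after_sampling = nrow(train_subset))
```

The three counts show why a sparsely populated vote column is worth excluding: it shrinks the training data far more than the random subsampling does.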

Value

A train object, as returned by caret::train().
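
As a rough illustration of what can be done with a train object, the sketch below calls caret::train() directly on the built-in mtcars data, standing in for train_prediction_model(). The formula, the "lm" method and the 5-fold cross-validation are assumptions picked to keep the example small; they are not the defaults of this function.

```r
library(caret)

set.seed(42)

# Resampling scheme of the kind that could be passed as `trControl`.
ctrl <- trainControl(method = "cv", number = 5)

# Stand-in for the model training step: a linear model on mtcars.
fit <- train(mpg ~ wt + hp, data = mtcars, method = "lm", trControl = ctrl)

# Resampling metrics (RMSE, Rsquared, MAE) per tuning setting.
fit$results

# Predictions for new observations with the same predictor columns.
preds <- predict(fit, newdata = head(mtcars))
preds
```

The same accessors and predict() should apply to the object returned by train_prediction_model(), provided the new data contains the predictor columns used in training.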

Examples


# Set seed for reproducibility
set.seed(42)

train_prediction_model("Eidg1", vote_data, to_exclude_vars = "Kant1")
#> Support Vector Machines with Radial Basis Function Kernel 
#> 
#> 170 samples
#>  75 predictor
#> 
#> No pre-processing
#> Resampling: Cross-Validated (10 fold) 
#> Summary of sample sizes: 153, 152, 154, 152, 152, 153, ... 
#> Resampling results across tuning parameters:
#> 
#>   C     RMSE      Rsquared   MAE     
#>   0.25  5.111902  0.7522515  3.165523
#>   0.50  4.109593  0.8299735  2.578325
#>   1.00  3.491039  0.8702774  2.256367
#> 
#> Tuning parameter 'sigma' was held constant at a value of 0.01242253
#> RMSE was used to select the optimal model using the smallest value.
#> The final values used for the model were sigma = 0.01242253 and C = 1.