Trains a model that predicts the result of one vote from the results of a number of past votes. It uses the machine learning models available in the caret package. For reproducible results, call set.seed() before calling this function.

Usage

train_prediction_model(
  x,
  traindata,
  method = "svmRadial",
  trControl = NULL,
  to_exclude_vars = NULL,
  geovars = c("gemeinde", "v_gemwkid"),
  training_prop = NA,
  ...
)

Arguments

x

Name of the column containing the dependent variable, given as a string (for example "Eidg1").

traindata

Data used to train the model containing the dependent variable and the predictor columns.

method

A string specifying which classification or regression model to use. Possible values are found using names(getModelInfo()). See http://topepo.github.io/caret/train-models-by-tag.html. A list of functions can also be passed for a custom model function. See http://topepo.github.io/caret/using-your-own-model-in-train.html for details.
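As a rough sketch of how the available method strings can be looked up, the snippet below queries caret directly. It assumes only that caret is installed; the variable names available and info are illustrative.

```r
library(caret)

# All method strings known to the installed caret version
available <- names(getModelInfo())
"svmRadial" %in% available

# Details for a single method; regex = FALSE requests an exact name match
info <- getModelInfo("svmRadial", regex = FALSE)[["svmRadial"]]
info$type        # problem types the method supports
info$parameters  # tuning parameters of the method (sigma and C for svmRadial)
```

Checking info$type before training is a cheap way to confirm that a method supports the kind of outcome stored in x.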

trControl

A list of values that define how this function acts. See trainControl and http://topepo.github.io/caret/using-your-own-model-in-train.html. (NOTE: If given, this argument must be named.)
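A minimal sketch of a custom resampling setup follows. The choice of repeated 5-fold cross-validation is only an example, and the guarded call assumes that the package providing train_prediction_model() and the vote_data dataset from the Examples section is attached.

```r
library(caret)

# Repeated cross-validation: 5 folds, repeated 3 times (example values)
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 3)

# Guarded so the snippet also runs where the package and data are not loaded
if (exists("train_prediction_model") && exists("vote_data")) {
  set.seed(42)
  model <- train_prediction_model(
    "Eidg1",
    vote_data,
    trControl = ctrl,            # must be passed by name
    to_exclude_vars = "Kant1"
  )
  print(model)
}
```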

to_exclude_vars

Variables that should be excluded from the model. It usually makes sense to exclude the other votes held on the same Sunday: their results are often still incomplete and contain many NAs, and because every row containing an NA is dropped from the training data, keeping them shrinks the training set and lowers the quality of the model.
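The effect of a sparsely filled column on the number of usable rows can be illustrated with base R. The toy data frame and the use of complete.cases() are assumptions made for illustration; the function's internal handling of NAs is not shown here.

```r
# Toy data with hypothetical values; Kant1 stands in for a same-day vote
# for which most spatial units have not reported yet
toy <- data.frame(
  Eidg1 = c(55.2, 48.1, 61.0, 52.3),
  Eidg2 = c(50.4, 47.9, 58.8, 49.5),
  Kant1 = c(NA, NA, 44.0, NA)
)

# Rows without any NA when Kant1 is kept
rows_with_kant1 <- sum(complete.cases(toy))

# Rows without any NA when Kant1 is excluded
rows_without_kant1 <- sum(complete.cases(toy[, c("Eidg1", "Eidg2")]))

rows_with_kant1     # 1
rows_without_kant1  # 4
```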

geovars

Variables containing labels and IDs of the spatial units.

training_prop

Optional share of observations to keep in the training data, selected at random. The training dataset is built by dropping the complementary share (1 - training_prop) of the observations. The default NA leaves the training data unchanged.
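One way such a random share could be drawn is sketched below with base R. This is an assumption about the general technique, not the function's actual implementation; toy, keep and train_subset are illustrative names.

```r
set.seed(42)

# Toy data standing in for the training data
toy <- data.frame(id = 1:10, y = rnorm(10))

training_prop <- 0.8

# Randomly pick the row indices to keep; the remaining 20 % are dropped
keep <- sample(nrow(toy), size = round(training_prop * nrow(toy)))
train_subset <- toy[keep, ]

nrow(train_subset)  # 8
```

Because the rows are chosen at random, the selection only repeats across runs if set.seed() is called first.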

...

Optional parameters that can be passed to the caret::train() function.

Value

A train object (see caret::train()).
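Since the return value is a caret train object, the usual caret accessors should apply to it. The sketch below shows them on a stand-in model: it uses the built-in mtcars data and method = "lm" so that it runs with caret alone, not the package's vote data or its default svmRadial method.

```r
library(caret)

set.seed(42)

# Stand-in model: linear regression with 5-fold cross-validation
fit <- train(
  mpg ~ .,
  data = mtcars,
  method = "lm",
  trControl = trainControl(method = "cv", number = 5)
)

inherits(fit, "train")  # TRUE
fit$results             # resampling metrics, including RMSE, Rsquared and MAE
fit$bestTune            # tuning parameter values of the final model

# Predictions for new observations
preds <- predict(fit, newdata = head(mtcars))
length(preds)           # 6
```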

Examples


# Set seed for reproducibility
set.seed(42)

train_prediction_model("Eidg1", vote_data, to_exclude_vars = "Kant1")
#> Support Vector Machines with Radial Basis Function Kernel 
#> 
#> 170 samples
#>  75 predictor
#> 
#> No pre-processing
#> Resampling: Cross-Validated (10 fold) 
#> Summary of sample sizes: 153, 152, 154, 152, 152, 153, ... 
#> Resampling results across tuning parameters:
#> 
#>   C     RMSE      Rsquared   MAE     
#>   0.25  5.111902  0.7522515  3.165523
#>   0.50  4.109593  0.8299735  2.578325
#>   1.00  3.491039  0.8702774  2.256367
#> 
#> Tuning parameter 'sigma' was held constant at a value of 0.01242253
#> RMSE was used to select the optimal model using the smallest value.
#> The final values used for the model were sigma = 0.01242253 and C = 1.