Train model for prediction of one vote
Trains a model that predicts the result of one vote from the results of a number of past votes. It uses the machine learning models available in the caret package. For reproducible results, call set.seed() before calling this function.
Usage
train_prediction_model(
  x,
  traindata,
  method = "svmRadial",
  trControl = NULL,
  to_exclude_vars = NULL,
  geovars = c("gemeinde", "v_gemwkid"),
  training_prop = NA,
  ...
)
Arguments
- x
Column name of the dependent variable.
- traindata
Data used to train the model containing the dependent variable and the predictor columns.
- method
A string specifying which classification or regression model to use. Possible values are found using names(getModelInfo()). See http://topepo.github.io/caret/train-models-by-tag.html. A list of functions can also be passed for a custom model function. See http://topepo.github.io/caret/using-your-own-model-in-train.html for details.
- trControl
A list of values that define how this function acts. See trainControl and http://topepo.github.io/caret/using-your-own-model-in-train.html. (NOTE: If given, this argument must be named.)
- to_exclude_vars
Variables to exclude from the model. It makes sense to exclude other votes held on the same voting Sunday: these can contain many NAs, and because all rows containing NAs are dropped from the training data, they reduce the quality of the model.
- geovars
Variables containing labels and IDs of the spatial units.
- training_prop
Optional share of observations to keep, chosen at random, in the training data. The training dataset is generated by excluding the complementary share (1 - training_prop) of the observations.
- ...
Optional parameters that can be passed to the caret::train() function.
Examples
# Set seed for reproducibility
set.seed(42)
train_prediction_model("Eidg1", vote_data, to_exclude_vars = "Kant1")
#> Support Vector Machines with Radial Basis Function Kernel
#>
#> 170 samples
#> 75 predictor
#>
#> No pre-processing
#> Resampling: Cross-Validated (10 fold)
#> Summary of sample sizes: 153, 152, 154, 152, 152, 153, ...
#> Resampling results across tuning parameters:
#>
#>   C     RMSE      Rsquared   MAE
#>   0.25  5.111902  0.7522515  3.165523
#>   0.50  4.109593  0.8299735  2.578325
#>   1.00  3.491039  0.8702774  2.256367
#>
#> Tuning parameter 'sigma' was held constant at a value of 0.01242253
#> RMSE was used to select the optimal model using the smallest value.
#> The final values used for the model were sigma = 0.01242253 and C = 1.
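
The effect of to_exclude_vars and training_prop on the training data can be pictured with a small base R sketch. It only mirrors the behaviour described in the argument list (rows containing NAs are dropped, and a random share of the rows is kept). It is not the function's actual implementation. The data frame toy and all its values are made up for illustration, and na.omit() and sample() are assumed stand-ins for whatever the function does internally.

```r
# Hypothetical toy data (not the package's vote_data): two complete
# votes and one vote (Kant1) with missing results in half of the rows.
set.seed(42)
toy <- data.frame(
  gemeinde = paste0("G", 1:10),
  Eidg1 = c(48.2, 51.0, 55.3, 60.1, 45.7, 52.4, 58.8, 49.9, 53.6, 57.2),
  Eidg2 = c(47.0, 50.5, 54.1, 61.3, 44.9, 53.0, 57.5, 50.2, 52.8, 56.0),
  Kant1 = c(NA, 49.5, NA, 58.0, NA, 51.1, NA, 48.7, NA, 55.4)
)

# With Kant1 kept, every row that has an NA in any column is dropped.
with_kant1 <- na.omit(toy)
nrow(with_kant1)     # 5 of 10 rows remain

# Excluding Kant1 first (the role of to_exclude_vars) keeps all rows.
without_kant1 <- na.omit(toy[, setdiff(names(toy), "Kant1")])
nrow(without_kant1)  # 10 rows remain

# Assumed equivalent of training_prop = 0.8: keep a random 80% of the
# rows and exclude the remaining 20%.
training_prop <- 0.8
n_keep <- round(training_prop * nrow(without_kant1))
keep <- sample(nrow(without_kant1), size = n_keep)
train_subset <- without_kant1[keep, ]
nrow(train_subset)   # 8 rows
```

Which rows end up in train_subset depends on the seed, which is why calling set.seed() first matters when results need to be reproducible.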