Run predictions for multiple votes — predict

This function predicts the outcome of multiple votes from the results of past votes, using the machine learning models available in the caret package. To make the results reproducible, call set.seed() before running the function.
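The approach described above can be illustrated with a minimal base-R sketch. It is not the package's implementation. The function name sketch_predict_votes() and the toy data are hypothetical, and lm() stands in for caret::train(). The sketch also assumes two things: each vote is trained only on the spatial units where it has already been counted, and all votes in x are excluded from each other's predictors.

```r
# Hypothetical sketch (not the package code): lm() stands in for caret::train()
sketch_predict_votes <- function(x, traindata, testdata = traindata,
                                 geovars = c("gemeinde", "v_gemwkid")) {
  results <- lapply(x, function(vote) {
    # Assumed: predictors are all columns except the spatial IDs and the votes in x
    predictors <- setdiff(names(traindata), c(geovars, x))
    # Assumed: train only on units where the current vote has already been counted
    train_rows <- traindata[!is.na(traindata[[vote]]), c(vote, predictors)]
    fit <- lm(reformulate(predictors, response = vote), data = train_rows)
    # One row per spatial unit: predicted value, observed value and vote name
    data.frame(
      testdata[geovars],
      pred = unname(predict(fit, newdata = testdata)),
      real = testdata[[vote]],
      vorlage = vote
    )
  })
  # Stack the results of all votes into one data.frame
  do.call(rbind, results)
}

# Hypothetical toy data: two past votes as predictors, two votes to predict
d <- data.frame(gemeinde = LETTERS[1:6], v_gemwkid = 1:6,
                past1 = c(10, 20, 30, 40, 50, 60), past2 = c(5, 3, 8, 1, 9, 2))
d$Eidg1 <- 2 * d$past1 + d$past2
d$Kant1 <- d$past1 - d$past2
sketch_predict_votes(c("Eidg1", "Kant1"), d)
```

Because the toy votes are exact linear combinations of the past votes, pred and real coincide here. With real data, the choice of method in caret determines how closely they agree.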

Usage

predict_votes(
  x,
  traindata,
  testdata = traindata,
  method = "svmRadial",
  trControl = NULL,
  exclude_votes = TRUE,
  geovars = c("gemeinde", "v_gemwkid"),
  training_prop = NA,
  ...
)

Arguments

x: Character vector with the column names of the dependent variables, i.e. the votes to be predicted.
traindata: Data used to train the model, containing the dependent variables and the predictor columns.
testdata: Dataset on which the prediction is run. It must contain all columns of the data the model was trained on (model$trainingData).
method: A string specifying which classification or regression model to use. Possible values are found using names(getModelInfo()). See http://topepo.github.io/caret/train-models-by-tag.html. A list of functions can also be passed for a custom model function. See http://topepo.github.io/caret/using-your-own-model-in-train.html for details.
trControl: A list of values that define how this function acts. See trainControl and http://topepo.github.io/caret/using-your-own-model-in-train.html. (NOTE: If given, this argument must be named.)
exclude_votes: If TRUE, the variables to be predicted are excluded from each other's models. This is useful on a vote Sunday: because the counting processes differ between votes, many of the votes in the data may still contain NAs and should therefore be excluded. Defaults to TRUE.
geovars: Variables containing labels and IDs of the spatial units.
training_prop: Optional share of observations (between 0 and 1) to keep, chosen at random, in the training data. The complementary share (1 - training_prop) is excluded from training. If NA (the default), all observations are used.
...: Further arguments passed on to caret::train().
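Before predicting, a quick base-R check can reveal whether testdata lacks any column of traindata. The helper missing_test_columns() and the toy data frames are hypothetical and not part of the package; the check compares column names only.

```r
# Hypothetical helper (not part of the package): training columns missing from the test data
missing_test_columns <- function(traindata, testdata) {
  setdiff(names(traindata), names(testdata))
}

train <- data.frame(gemeinde = "A", v_gemwkid = 1, past1 = 10, Eidg1 = 20)
test  <- data.frame(gemeinde = "A", v_gemwkid = 1, past1 = 10)
missing_test_columns(train, test)  # "Eidg1"
```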
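The effect of exclude_votes on the predictor set of a single vote might look like the following base-R sketch. The helper select_predictors() and the toy columns are hypothetical. The sketch also assumes that the geovars columns are never used as predictors, which the argument descriptions above do not state explicitly.

```r
# Hypothetical sketch (not the package code) of how exclude_votes shapes the predictors
select_predictors <- function(vote, x, data, geovars = c("gemeinde", "v_gemwkid"),
                              exclude_votes = TRUE) {
  # Spatial labels/IDs are dropped (assumption), and a vote never predicts itself
  dropped <- c(geovars, vote)
  # With exclude_votes = TRUE, the other votes being predicted are dropped as well
  if (exclude_votes) dropped <- c(dropped, x)
  setdiff(names(data), dropped)
}

d <- data.frame(gemeinde = "A", v_gemwkid = 1, past1 = 10, past2 = 5,
                Eidg1 = 20, Kant1 = 7)
select_predictors("Eidg1", c("Eidg1", "Kant1"), d)                         # "past1" "past2"
select_predictors("Eidg1", c("Eidg1", "Kant1"), d, exclude_votes = FALSE)  # "past1" "past2" "Kant1"
```

With exclude_votes = FALSE, Kant1 becomes a predictor for Eidg1. On a vote Sunday this would pass the NAs of not-yet-counted units into the Eidg1 model.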
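The training_prop argument can be pictured as random row subsampling of the training data. The sketch below uses a hypothetical base-R helper, subsample_training(), and is not the package's code. It assumes training_prop is a proportion between 0 and 1 and that the number of kept rows is rounded; the package may round or sample differently. Because the helper relies on sample(), the subsample is only reproducible after set.seed().

```r
# Hypothetical helper illustrating training_prop (not the package code)
subsample_training <- function(traindata, training_prop = NA) {
  # Default (NA): keep the complete training data
  if (is.na(training_prop)) return(traindata)
  # Keep round(training_prop * n) randomly chosen rows; the rest is excluded
  n_keep <- round(training_prop * nrow(traindata))
  keep <- sample(nrow(traindata), size = n_keep)
  traindata[sort(keep), , drop = FALSE]
}

set.seed(42)
d <- data.frame(id = 1:10, y = rnorm(10))
nrow(subsample_training(d, 0.7))  # 7
nrow(subsample_training(d))       # 10
```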

Value

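A data.frame (a tibble in the example below) with the geovars columns, the predicted value (pred), the observed value (real) and the name of the vote (vorlage), stacked for all votes in x.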

Examples


# Set seed for reproducibility
set.seed(42)

predict_votes(c("Eidg1", "Kant1"), vote_data)
#> # A tibble: 342 × 5
#>    gemeinde           v_gemwkid  pred  real vorlage
#>    <chr>                  <dbl> <dbl> <dbl> <chr>  
#>  1 Adlikon                   21  23.4  21.5 Eidg1  
#>  2 Adliswil                 131  47.4  48.3 Eidg1  
#>  3 Aesch                    241  30.0  30.9 Eidg1  
#>  4 Aeugst am Albis            1  33.2  31.5 Eidg1  
#>  5 Affoltern am Albis         2  40.7  39.8 Eidg1  
#>  6 Altikon                  211  28.9  29.8 Eidg1  
#>  7 Andelfingen               30  33.0  32.1 Eidg1  
#>  8 Bachenbülach              51  38.3  39.9 Eidg1  
#>  9 Bachs                     81  31.4  30.5 Eidg1  
#> 10 Bäretswil                111  32.6  33.3 Eidg1  
#> # ℹ 332 more rows