
This function predicts the outcome of multiple votes based on a number of past vote results. It uses the machine learning models available in the caret package. For reproducible results, call set.seed() before running the function.

Usage

predict_votes(
  x,
  traindata,
  testdata = traindata,
  method = "svmRadial",
  trControl = NULL,
  exclude_votes = TRUE,
  geovars = c("gemeinde", "v_gemwkid"),
  training_prop = NA,
  ...
)

Arguments

x

Column names of the dependent variables.

traindata

Data used to train the model containing the dependent variable and the predictor columns.

testdata

Dataset on which the prediction should be run. The data must contain all columns of the model's training data (model$trainingData).

method

A string specifying which classification or regression model to use. Possible values are found using names(getModelInfo()). See http://topepo.github.io/caret/train-models-by-tag.html. A list of functions can also be passed for a custom model function. See http://topepo.github.io/caret/using-your-own-model-in-train.html for details.
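As a minimal sketch of how to look up valid method values, the snippet below lists the model codes known to caret and checks that the default, "svmRadial", is among them. It assumes the caret package is installed; the variable name available is only illustrative.

```r
if (requireNamespace("caret", quietly = TRUE)) {
  # Model codes that can be passed as `method`
  available <- names(caret::getModelInfo())

  # Check that the default used by predict_votes() is a known model code
  print("svmRadial" %in% available)

  # Show the first few available model codes
  print(head(available))
}
```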

trControl

A list of values that define how this function acts. See trainControl and http://topepo.github.io/caret/using-your-own-model-in-train.html. (NOTE: If given, this argument must be named.)
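A resampling scheme can be defined with caret::trainControl() and handed over by name. The following is a sketch rather than a recommended setting: the choice of repeated 5-fold cross-validation is an assumption for illustration, and the commented call assumes that predict_votes() and vote_data from the Examples section are available.

```r
if (requireNamespace("caret", quietly = TRUE)) {
  # 5-fold cross-validation, repeated 3 times (illustrative values)
  ctrl <- caret::trainControl(method = "repeatedcv", number = 5, repeats = 3)

  # Hypothetical call; the argument must be passed by name
  # set.seed(42)
  # predict_votes(c("Eidg1", "Kant1"), vote_data, trControl = ctrl)
}
```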

exclude_votes

If set to TRUE, the variables to be predicted are excluded from each other's models. This makes sense on a vote Sunday: because the counting processes differ, many of the votes in the data can still contain NAs and should therefore not be used as predictors. Defaults to TRUE.
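The effect of exclude_votes on the predictor set can be illustrated in base R. This is not the package's implementation: the toy data, the helper predictors_for() and the assumption that the geovars columns are never used as predictors are all hypothetical.

```r
# Hypothetical toy data: two votes to predict and one past vote
toy <- data.frame(
  gemeinde  = c("A", "B", "C"),
  v_gemwkid = c(1, 2, 3),
  Eidg1     = c(40.1, NA, 35.2),
  Kant1     = c(51.3, 48.9, NA),
  past_vote = c(45.0, 50.2, 38.7)
)

x       <- c("Eidg1", "Kant1")
geovars <- c("gemeinde", "v_gemwkid")

# Illustrative helper: predictor columns for one target variable.
# With exclude_votes = TRUE, all variables in `x` are dropped, not just the target.
predictors_for <- function(target, data, x, geovars, exclude_votes = TRUE) {
  dropped <- c(target, geovars, if (exclude_votes) x)
  setdiff(names(data), dropped)
}

with_exclusion    <- predictors_for("Eidg1", toy, x, geovars)
without_exclusion <- predictors_for("Eidg1", toy, x, geovars, exclude_votes = FALSE)

print(with_exclusion)     # "past_vote"
print(without_exclusion)  # "Kant1" "past_vote"
```

In the second case Kant1, which still contains an NA, would enter the model for Eidg1. That is the situation the default setting avoids.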

geovars

Variables containing labels and IDs of the spatial units.

training_prop

Optional share of observations to be randomly kept in the training data. The training dataset is generated by dropping the remaining share of observations from traindata. Defaults to NA.
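The kind of random subsetting described here can be sketched in base R. The snippet assumes that training_prop is a value between 0 and 1 and uses a hypothetical toy data frame; how predict_votes() draws the sample internally is not shown in this documentation.

```r
set.seed(42)

# Hypothetical training data with 10 observations
toy <- data.frame(id = 1:10, share = seq(30, 48, by = 2))

training_prop <- 0.8

# Randomly keep 80 % of the rows; the remaining 20 % are dropped
n_keep <- round(training_prop * nrow(toy))
keep   <- sample(nrow(toy), size = n_keep)
train_subset <- toy[keep, ]

print(nrow(train_subset))  # 8
```

Because the rows are drawn at random, calling set.seed() beforehand is what makes the subset, and hence the predictions, reproducible.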

...

Optional parameters that can be passed to the caret::train() function.

Value

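A data.frame. In the example below it is printed as a tibble with the geovars columns (gemeinde, v_gemwkid), the predicted value (pred), the real value (real) and the name of the predicted vote (vorlage).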

Examples


# Set seed for reproducibility
set.seed(42)

predict_votes(c("Eidg1", "Kant1"), vote_data)
#> # A tibble: 342 × 5
#>    gemeinde           v_gemwkid  pred  real vorlage
#>    <chr>                  <dbl> <dbl> <dbl> <chr>  
#>  1 Adlikon                   21  23.4  21.5 Eidg1  
#>  2 Adliswil                 131  47.4  48.3 Eidg1  
#>  3 Aesch                    241  30.0  30.9 Eidg1  
#>  4 Aeugst am Albis            1  33.2  31.5 Eidg1  
#>  5 Affoltern am Albis         2  40.7  39.8 Eidg1  
#>  6 Altikon                  211  28.9  29.8 Eidg1  
#>  7 Andelfingen               30  33.0  32.1 Eidg1  
#>  8 Bachenbülach              51  38.3  39.9 Eidg1  
#>  9 Bachs                     81  31.4  30.5 Eidg1  
#> 10 Bäretswil                111  32.6  33.3 Eidg1  
#> # ℹ 332 more rows