This function prepares the cross-validation by splitting the data into num.folds training and test folds, repeated num.resample times.

create.data.split(siamcat, num.folds = 2, num.resample = 1,
    stratify = TRUE, inseparable = NULL, verbose = 1)

Arguments

siamcat

object of class siamcat-class

num.folds

integer number of cross-validation folds (needs to be >=2), defaults to 2

num.resample

integer, resampling rounds (values <= 1 deactivate resampling), defaults to 1

stratify

boolean, whether the splits should be stratified so that an equal proportion of each class is present in each fold; ignored for regression tasks, defaults to TRUE

inseparable

string, name of the metadata variable whose groups should be kept together within a fold, defaults to NULL, see Details below

verbose

integer, control output: 0 for no output at all, 1 for only information about progress and success, 2 for normal level of information and 3 for full debug information, defaults to 1

Value

object of class siamcat-class with the data_split-slot filled

Details

This function splits the labels within a siamcat-class object and prepares the internal cross-validation for the model training (see train.model).

The function saves the training and test instances for the different cross-validation folds in the data_split-slot of the siamcat-class object, which is a list with four entries:

  • num.folds - the number of cross-validation folds

  • num.resample - the number of repetitions for the cross-validation

  • training.folds - a list containing the indices for the training instances

  • test.folds - a list containing the indices for the test instances
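
The four entries listed above can be illustrated with a minimal base-R sketch. It does not reproduce the package's internal code: the toy labels, the helper assign.folds, the object split.sketch, and the nesting of folds inside resampling rounds are all assumptions made for illustration.

```r
# Toy labels: 12 samples, two classes (hypothetical data)
set.seed(42)
labels <- c(rep(-1, 8), rep(1, 4))
num.folds <- 2
num.resample <- 3

# Hypothetical helper: assign a fold to every sample, class by class,
# so that each fold receives the same proportion of each class
assign.folds <- function(labels, num.folds) {
    fold.id <- integer(length(labels))
    for (cl in unique(labels)) {
        idx <- which(labels == cl)
        idx <- idx[sample.int(length(idx))]  # shuffle within the class
        fold.id[idx] <- rep_len(seq_len(num.folds), length(idx))
    }
    fold.id
}

training.folds <- vector("list", num.resample)
test.folds <- vector("list", num.resample)
for (r in seq_len(num.resample)) {
    fold.id <- assign.folds(labels, num.folds)
    # test instances of fold f are the samples assigned to f,
    # training instances are all remaining samples
    test.folds[[r]] <- lapply(seq_len(num.folds),
        function(f) which(fold.id == f))
    training.folds[[r]] <- lapply(seq_len(num.folds),
        function(f) which(fold.id != f))
}

# A list with the same four entry names as described above
split.sketch <- list(num.folds = num.folds, num.resample = num.resample,
    training.folds = training.folds, test.folds = test.folds)
str(split.sketch, max.level = 2)
```

Within each resampling round of this sketch, every sample appears in exactly one test fold, and the training instances of a fold are the complement of its test instances.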

If the inseparable argument is provided, the data split takes the named metadata variable into account. For example, if the data contain several samples from the same individual, it makes sense to keep all samples from that individual within the same fold.

If inseparable is given, the stratify argument will be ignored.
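
The idea behind inseparable can be sketched in base R by assigning folds to individuals instead of samples. This is an illustration only, not the package's implementation; the metadata table meta, its individual column, and all variable names are hypothetical. As in the description above, class balance is not taken into account here.

```r
set.seed(1)
# Hypothetical metadata: 10 samples from 5 individuals, 2 samples each
meta <- data.frame(
    individual = rep(paste0("ind_", 1:5), each = 2),
    row.names = paste0("sample_", 1:10),
    stringsAsFactors = FALSE
)
num.folds <- 2

# Assign folds to individuals rather than to single samples
individuals <- unique(meta$individual)
shuffled <- individuals[sample.int(length(individuals))]
ind.fold <- setNames(rep_len(seq_len(num.folds), length(shuffled)), shuffled)

# Every sample inherits the fold of its individual
sample.fold <- unname(ind.fold[meta$individual])
test.folds <- lapply(seq_len(num.folds),
    function(f) which(sample.fold == f))
training.folds <- lapply(seq_len(num.folds),
    function(f) which(sample.fold != f))

# Check: all samples of an individual fall into a single fold
folds.per.individual <- tapply(sample.fold, meta$individual,
    function(x) length(unique(x)))
stopifnot(all(folds.per.individual == 1))
```

Because whole individuals are moved between folds, fold sizes can differ when the number of individuals is not a multiple of num.folds.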

Examples

data(siamcat_example)

# simple working example
siamcat_split <- create.data.split(siamcat_example, num.folds=10,
    num.resample=5, stratify=TRUE)
#> Features splitted for cross-validation successfully.