balancedSplit {CrossValidate} | R Documentation |
When performing cross-validation on a dataset, it often becomes necessary to split the data into training and test sets that are balanced for a factor. This function implements such a balanced split.
balancedSplit(fac, size)
fac | A factor that should be balanced between the two subsets. |
size | A number between 0 and 1 indicating the fraction of the dataset to be used for training. |
This function randomly samples the same fraction of items from each level of a factor to include in a training set. In most cases, this will be a binary factor (and might even be the outcome that one wants to predict). However, the implementation works for factors with an arbitrary number of levels.
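The per-level sampling described above can be sketched roughly as follows. This is an illustrative re-implementation, not the package source: the name sketchBalancedSplit and the use of round() to turn the fraction into a count are assumptions made for the example.

```r
# Hypothetical sketch of a balanced split; not the CrossValidate source code.
sketchBalancedSplit <- function(fac, size) {
  fac <- as.factor(fac)
  train <- logical(length(fac))            # start with every item in the test set
  for (lev in levels(fac)) {
    idx <- which(fac == lev)               # positions of the items at this level
    nTrain <- round(size * length(idx))    # assumed rounding rule
    # Sample positions within idx, so a level with a single item is handled safely.
    picked <- idx[sample.int(length(idx), nTrain)]
    train[picked] <- TRUE
  }
  train
}

set.seed(1)
groups <- factor(rep(c("A", "B"), times = c(10, 30)))
train <- sketchBalancedSplit(groups, 0.5)
table(groups, train)                       # cross-tabulate level by training flag
```

Because the sampling is done separately within each level, unequal group sizes (10 and 30 here) keep the same proportions in both subsets, which a single call to sample() over all items would not guarantee.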
Returns a logical vector with length equal to the length of fac. TRUE values designate samples selected for the training set.
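A logical vector of this kind can be used directly as an index. The sketch below assumes samples are stored as columns of a matrix and builds a stand-in vector, isTrain, in place of an actual call to balancedSplit so that it runs without the package installed; all variable names are illustrative.

```r
set.seed(42)
groups <- factor(rep(c("A", "B"), each = 10))
dataset <- matrix(rnorm(20 * 40), ncol = 20)   # 40 features x 20 samples

# Stand-in for the value returned by balancedSplit(groups, 0.5):
# a logical vector, TRUE for training samples (5 per group here).
isTrain <- seq_along(groups) %in% c(sample(1:10, 5), sample(11:20, 5))

# Subset columns (samples); drop = FALSE keeps the result a matrix.
trainData <- dataset[, isTrain, drop = FALSE]
testData <- dataset[, !isTrain, drop = FALSE]

# The same vector splits the factor, so labels stay aligned with the columns.
trainGroups <- groups[isTrain]
testGroups <- groups[!isTrain]
```

Negating the vector yields the test set, so no second index needs to be stored.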
Kevin R. Coombes <krc@silicovore.com>
CrossValidate, CrossValidate-class.
nFeatures <- 40
nSamples <- 2*10
dataset <- matrix(rnorm(nSamples*nFeatures), ncol=nSamples)
groups <- factor(rep(c("A", "B"), each=10))
# balancedSplit takes the factor first, then the training fraction;
# 0.5 is an illustrative value.
balancedSplit(groups, 0.5)