Predicting stroke (II): Model selection
Second part of the series Predicting stroke.
Description
So far we’ve seen different models and approaches to the problem of classifying people into the “stroke” and “no-stroke” groups. However, we haven’t compared them yet. To make a sound decision about which model is best, they must be compared under conditions that are as similar as possible. That is the purpose of this project: each model is fitted and then a common set of performance metrics is computed.
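As a rough illustration of that workflow, here is a minimal sketch in Python with scikit-learn: fit the three candidates on the same train/test split and compute the same metrics. The original analysis may well have used different tooling; the file name, target column, hyperparameters, and preprocessing assumptions are hypothetical placeholders.

```python
# Sketch: fit QDA, a decision tree and an SVM on the same split and compute
# accuracy, F1, AUC, sensitivity and specificity for each.
# Assumes predictors are already numeric/encoded; names are placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, recall_score

df = pd.read_csv("stroke.csv")                # hypothetical file name
X = df.drop(columns="stroke")                 # hypothetical target column
y = df["stroke"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

models = {
    "QDA": QuadraticDiscriminantAnalysis(),
    "Decision tree": DecisionTreeClassifier(max_depth=4),
    "SVM": SVC(probability=True),             # probabilities needed for AUC
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_test, y_pred),
        "F1": f1_score(y_test, y_pred),
        "AUC": roc_auc_score(y_test, y_prob),
        "sensitivity": recall_score(y_test, y_pred),              # true positive rate
        "specificity": recall_score(y_test, y_pred, pos_label=0), # true negative rate
    })
print(pd.DataFrame(rows))
```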
Results
Model | Accuracy | $F_1$ | AUC | Sensitivity | Specificity |
---|---|---|---|---|---|
QDA | 0.6331 | 0.0803 | 0.8396 | **0.8876** | 0.6284 |
Decision tree | **0.7482** | **0.1050** | 0.8118 | 0.8186 | **0.7469** |
SVM | 0.7218 | 0.0974 | **0.8507** | 0.8314 | 0.7198 |
(The highest score for each metric is shown in bold.)
A graphical depiction of the ROC curves is also very useful.
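As an illustration of how such a plot can be produced, here is a hedged sketch using scikit-learn’s `RocCurveDisplay`. It reuses the hypothetical `models`, `X_test` and `y_test` from the previous snippet and is not how the original figure was made.

```python
# Sketch: overlay the ROC curves of the already-fitted models on one axis.
import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay

fig, ax = plt.subplots()
for name, model in models.items():
    RocCurveDisplay.from_estimator(model, X_test, y_test, name=name, ax=ax)
ax.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance diagonal
ax.set_title("ROC curves of the candidate models")
plt.show()
```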
The best model in terms of AUC, a commonly used performance measure, is the SVM. Even so, for several reasons I ultimately chose the decision tree as the final model.
The full report can be found here, and the two-page “executive report” here.
Link to the data set.
Comments
- That was my first time seriously doing model comparison and model selection, so I probably made some mistakes. I know, for instance, that the comparison is not entirely fair because the models are not fully equivalent (they use different numbers of predictors, and not the same ones); evaluating all candidates on a shared predictor set and identical resampling splits would make the comparison cleaner (see the sketch below).
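As a hedged illustration of that idea, the sketch below scores every candidate on the same predictor set and the same cross-validation folds, so the resulting numbers are directly comparable. It reuses the hypothetical `models`, `X` and `y` from the first snippet and is not how the original comparison was carried out.

```python
# Sketch: compare all models on identical cross-validation splits and a
# single shared metric (AUC), using the same predictor matrix for each.
from sklearn.model_selection import StratifiedKFold, cross_val_score

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # shared folds
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f} (+/- {scores.std():.3f})")
```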