Binary classification performance measures cheat sheet

Damien François – v1.1 – 2009
Confusion matrix for two possible outcomes, p (positive) and n (negative):

                     Actual p              Actual n              Total
  Predicted p'       true positive (TP)    false positive (FP)   P
  Predicted n'       false negative (FN)   true negative (TN)    N
  Total              P'                    N'

Classification accuracy: (TP + TN) / (TP + TN + FP + FN)
Error rate: (FP + FN) / (TP + TN + FP + FN)
Precision (or positive predictive value): proportion of predicted positives which are actual positives: TP / (TP + FP)
Recall: proportion of actual positives which are predicted positive: TP / (TP + FN)
Sensitivity: proportion of actual positives which are predicted positive: TP / (TP + FN)
Specificity: proportion of actual negatives which are predicted negative: TN / (TN + FP)
True positive rate: proportion of actual positives which are predicted positive: TP / (TP + FN)
True negative rate: proportion of actual negatives which are predicted negative: TN / (TN + FP)
Positive likelihood: likelihood that a predicted positive is an actual positive: sensitivity / (1 − specificity)
Negative likelihood: likelihood that a predicted negative is an actual negative: (1 − sensitivity) / specificity

Paired criteria
sensitivity = recall = true positive rate
specificity = true negative rate
accuracy = 1 − error rate
Youden's index = 2 · BCR − 1
F-measure = F1-measure

Combined criteria
Youden's index: sensitivity − (1 − specificity), i.e. sensitivity + specificity − 1, a rescaling of the arithmetic mean of sensitivity and specificity
Matthews correlation: correlation between the actual and predicted labels: (TP · TN − FP · FN) / ((TP + FP)(TP + FN)(TN + FP)(TN + FN))^(1/2), comprised between −1 and 1
Discriminant power: normalised likelihood index: sqrt(3)/π · (log(sensitivity / (1 − specificity)) + log(specificity / (1 − sensitivity))); < 1 = poor, > 3 = good, fair otherwise
BCR, Balanced Classification Rate: ½ · (TP / (TP + FN) + TN / (TN + FP))
BER, Balanced Error Rate, or HTER, Half Total Error Rate: 1 − BCR
F-measure: harmonic mean between precision and recall: 2 · (precision · recall) / (precision + recall)
Fβ-measure: weighted harmonic mean between precision and recall: (1 + β²) · TP / ((1 + β²) · TP + β² · FN + FP)
The harmonic mean between specificity and sensitivity is also often used and sometimes referred to as F-measure.

Graphical tools
ROC curve, receiver operating characteristic curve: 2-D curve parametrized by one parameter of the classification algorithm (e.g. some threshold) in the "true positive rate / false positive rate" space
AUC: the area under the ROC curve, between 0 and 1
(Cumulative) lift chart: plot of the true positive rate as a function of the proportion of the population being predicted positive, controlled by some classifier parameter (e.g. a threshold)

References
Sokolova, M. and Lapalme, G.: A systematic analysis of performance measures for classification tasks. Inf. Process. Manage. 45, 4 (Jul. 2009)
Demsar, J.: Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7 (2006) 1–30
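The count-based measures above can be cross-checked with a short script. Below is a minimal Python sketch using only the standard library; the function name and the example counts are illustrative, not part of the cheat sheet:

```python
import math

def binary_metrics(tp, fp, fn, tn, beta=1.0):
    """Count-based measures computed from a confusion matrix."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)          # = recall = true positive rate
    specificity = tn / (tn + fp)          # = true negative rate
    precision = tp / (tp + fp)            # = positive predictive value
    accuracy = (tp + tn) / total
    bcr = 0.5 * (sensitivity + specificity)   # balanced classification rate
    youden = sensitivity + specificity - 1    # = 2 * BCR - 1
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    fbeta = (1 + beta**2) * tp / ((1 + beta**2) * tp + beta**2 * fn + fp)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "error rate": 1 - accuracy,
            "precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "BCR": bcr, "BER": 1 - bcr,
            "Youden": youden, "F1": f1, "Fbeta": fbeta, "MCC": mcc}

m = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(m)
```

Note that the Matthews correlation is undefined when any row or column total of the matrix is zero; a production implementation would guard those divisions.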
Regression performance measures cheat sheet

Damien François – v0.9 – 2009
Let (x_i, y_i), i = 1 … n, be a set of input/output pairs and f a model such that f(x_i) = ŷ_i; the residuals are e_i = y_i − ŷ_i.

Absolute error
MAD, Mean Absolute Deviation: (1/n) Σ_i |e_i|
MAPE, Mean Absolute Percentage Error: (1/n) Σ_i |e_i / y_i|

Squared error
SSE, Sum of Squared Errors, or RSS, Residual Sum of Squares: Σ_i e_i²
MSE, Mean Squared Error: (1/n) Σ_i e_i²
RMSE, Root Mean Squared Error: sqrt(MSE)
NMSE, Normalised Mean Squared Error: MSE / var(y), where var is the empirical variance in the sample
R-squared: R² = 1 − MSE / var(y) = 1 − NMSE, where var is the empirical variance in the sample

Predicted error
PRESS, Predicted REsidual Sum of Squares: Σ_i (y_i − ŷ_(i))², where ŷ_(i) is the prediction for x_i of the model built without the i-th pair. For linear models, PRESS = Σ_i (e_i / (1 − h_i))², where h_i is the i-th diagonal element of the hat matrix H = X (XᵀX)⁻¹ Xᵀ, X is the matrix built by stacking the x_i in rows, and y is the vector of the y_i.
GCV, Generalised Cross Validation: n · SSE / (n − tr(H))², with H, X and y as above.

Robust error measures
Median squared error: median_i (e_i²)
α-trimmed MSE: mean of the squared residuals over I_α, the set of residuals from which the α percent largest values are discarded
M-estimators: (1/n) Σ_i ρ(e_i), where ρ is a non-negative function with a minimum in 0, like the parabola, the Huber function, or the bisquare function

Information criteria
AIC, Akaike Information Criterion: n · log(MSE) + 2k, where k is the number of parameters in the model
BIC, Bayesian Information Criterion: n · log(MSE) + k · log(n), where k is the number of parameters in the model

Resampling methods
LOO – Leave-one-out: build the model on n − 1 data elements and test on the remaining one. Iterate n times to collect all residuals and compute the mean error.
X-Val – Cross validation: randomly split the data in two parts, use the first one to build the model and the second one to test it. Iterate to get a distribution of the test error of the model.
K-Fold – Cut the data into K parts. Build the model on the K − 1 first parts and test on the K-th one. Iterate from 1 to K to get a distribution of the test error of the model.
Bootstrap – Draw a random subsample of the data with replacement. Compute the error on the whole dataset minus the training error of the model, and iterate to get a distribution of such values. The mean of the distribution is the optimism. The bootstrap error estimate is the training error on the whole dataset plus the optimism.

Graphical tool
Plot of predicted value against actual value. A perfect model places all dots on the diagonal.
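The point measures above translate directly into code. Below is a minimal standard-library Python sketch; the function name and the sample values are illustrative, and PRESS, GCV, and the resampling schemes are left out because they require refitting a model:

```python
import math
import statistics

def regression_metrics(y, yhat):
    """Error measures computed from actual values y and predictions yhat."""
    n = len(y)
    e = [yi - fi for yi, fi in zip(y, yhat)]   # residuals e_i = y_i - yhat_i
    sse = sum(ei**2 for ei in e)               # SSE / RSS
    mse = sse / n
    var = statistics.pvariance(y)              # empirical variance of the sample
    return {
        "MAD": sum(abs(ei) for ei in e) / n,
        "MAPE": sum(abs(ei / yi) for ei, yi in zip(e, y)) / n,
        "SSE": sse,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "NMSE": mse / var,
        "R2": 1 - mse / var,
        "MedSE": statistics.median(ei**2 for ei in e),  # median squared error
    }

m = regression_metrics(y=[1.0, 2.0, 3.0, 4.0], yhat=[1.1, 1.9, 3.2, 3.8])
print(m)
```

MAPE is undefined when some y_i is zero, and NMSE and R² assume a non-constant sample; a robust implementation would check both.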