Abstract: Binary classification is one of the most frequent studies in applied machine learning problems in various domains, from medicine to biology to meteorology to malware analysis. Many researchers use some performance metrics in their classification studies to report their success. However, the literature has shown widespread confusion about the terminology and ignorance of the fundamental aspects behind metrics. This paper clarifies the confusing terminology, suggests formal rules to distinguish between measures and metrics for the first time, and proposes a new comprehensive visualized roadmap in a leveled structure for 22 measures and 22 metrics for exploring binary classification performance. Additionally, we introduce novel concepts such as canonical notation, duality, and complementation for measures/metrics, and suggest two new canonical base measures that simplify equations. It is expected that the study will guide other studies toward a standardized approach to performance metrics for machine learning based solutions.

Keywords: binary classification; classification performance; metrics; measures; machine learning; visualization; ontology

I. INTRODUCTION

Machine learning classification performance, an important subject in several domains, states how well a classifier that implements a specific machine learning algorithm or model distinguishes between classes. The most basic and most studied classification type is binary classification, or two-class classification, which separates a given input into two opposite classes such as 'presence' vs. 'absence' of a disease or a condition, 'response' vs. 'no response' to a treatment [1], 'spam' vs. 'non-spam' for an e-mail, and 'malign' vs. 'benign' for software.

Stating or comparing classification performance with only the 4 base measures is neither suitable nor easily understandable. Therefore, several metrics have been proposed for evaluating classification performance. Area Under (ROC) Curve (AUC), which has its origins in signal detection theory in the 1970s, is considered one of the best metrics for stating performance [2], but there are other combined metrics that are useful for indicating the successful and unsuccessful aspects of a classification model. Some represent performance from one specific point of view while ignoring the others. Many researchers who design a classification model report only narrow metrics, causing misperceptions.

In this study, we approach performance metrics from a holistic perspective covering the wide range of the subject. This is important for some emerging domains, such as malware classification or other new machine learning classification applications, that focus on implementation details and are acquainted with only a few, potentially misleading metrics such as Accuracy (ACC), True Positive Rate (TPR), or F-measures to claim their success. Researchers who want to improve their machine learning algorithms on different domain problems and compare their test results with others have difficulty understanding performance metrics and selecting the most proper ones from the wide set of possibilities. For this reason, an originally developed, visually enhanced performance metrics roadmap is designed as a chart based on the confusion matrix to help these researchers.

The proposed comprehensive roadmap shows the complete set of primary metrics: not only the common ones such as ACC, TPR, True Negative Rate (TNR), False Positive Rate (FPR), Positive Predictive Value (PPV), False Negative Rate (FNR), and F1 score, but also others such as Prevalence, (Label) Bias, INFORM (informedness), MARK (markedness), MCC (Matthews Correlation Coefficient), BACC (balanced accuracy, also known as strength), Gm (G-mean), and Cohen's Kappa (CK).

The roadmap is domain independent and useful in all data mining, machine learning, and statistics studies. We also intend this study to serve as a reference covering all the primary measures and metrics, with their equations specifically arranged for the binary classification context. We also reviewed the metric names used interchangeably in academic and online resources and included them here in order to suit different naming conventions. The corresponding terminology in other domains such as meteorology, medicine, or statistics is provided to show the synonyms of the measures and metrics.
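
As a concrete illustration of how the metrics named above derive from the confusion matrix, the following minimal Python sketch computes several of them from the four base counts. It is not the roadmap's notation: the function name `binary_metrics`, the dictionary keys, and the example counts are assumptions made for illustration, and the formulas are the standard textbook definitions. The sketch assumes every denominator is non-zero and does no zero-division handling.

```python
from math import sqrt


def binary_metrics(tp, fn, fp, tn):
    """Compute common binary classification metrics from the four base counts.

    Hypothetical helper for illustration. Assumes both classes occur and both
    outcomes are predicted at least once, so no denominator is zero.
    """
    n = tp + fn + fp + tn
    tpr = tp / (tp + fn)   # true positive rate (sensitivity, recall)
    tnr = tn / (tn + fp)   # true negative rate (specificity)
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    npv = tn / (tn + fn)   # negative predictive value
    acc = (tp + tn) / n    # accuracy (observed agreement)
    # Agreement expected by chance, used by Cohen's kappa.
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "ACC": acc,
        "TPR": tpr,
        "TNR": tnr,
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
        "PPV": ppv,
        "F1": 2 * tp / (2 * tp + fp + fn),
        "Prevalence": (tp + fn) / n,   # share of actual positives
        "Bias": (tp + fp) / n,         # share of predicted positives
        "INFORM": tpr + tnr - 1,       # informedness
        "MARK": ppv + npv - 1,         # markedness
        "BACC": (tpr + tnr) / 2,       # balanced accuracy
        "Gm": sqrt(tpr * tnr),         # G-mean
        "CK": (acc - chance) / (1 - chance),
        "MCC": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


if __name__ == "__main__":
    # Made-up, highly imbalanced example: 10 positives among 1000 samples.
    for name, value in binary_metrics(tp=1, fn=9, fp=1, tn=989).items():
        print(f"{name:>10}: {value:.4f}")
```

With these made-up counts, ACC is 0.99 while TPR is only 0.1, BACC is about 0.55, and MCC is about 0.22, which illustrates why reporting a single narrow metric can mislead.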