
Software Quality Observatory for Open Source Software

Project Number: IST-2005-33331

D2 - Overview of the state of the art


Deliverable Report

Work Package Number: 1


Work Package Title: Requirements Definition and Analysis
Deliverable Number: 2
Coordinator: AUTH
Contributors: AUTH, AUEB, KDE
Due Date: 22nd January 2007
Delivery Date: 22nd January 2007
Availability: Restricted
Document Id: SQO-OSS_D_2

Executive Summary
Chapter one presents the most important and widely used metrics in software engineering for quality evaluation. The area of software engineering metrics is still under active study, and researchers continue to validate the metrics. The metrics presented were selected after studying the software engineering literature, keeping only those metrics that are widely accepted. We must stress that we have not presented any models for evaluating quality, only metrics that can be used for quality evaluation. Quality evaluation models will be presented in the appropriate deliverable.
The metrics presented are categorised, according to a taxonomy accepted among researchers, into three sections: process metrics, product metrics and resource metrics. We have also included a section for metrics specific to Open Source software development. The presentation of the metrics is brief, allowing for straightforward application and tool development. We have included both metrics that are considered classic (e.g. program length and McCabe’s cyclomatic complexity) and modern metrics (e.g. the Chidamber and Kemerer metrics suite and object oriented design heuristics). While we present some metrics for Open Source software development, this topic will be treated at length elsewhere.
Chapter two presents tools for acquiring the metrics presented in chapter one. The tools presented are both Open Source and proprietary. Many metrics tools are available, and we have tried to present a representative sample of them. Specifically, we present those tools that are likely to be useful for our own system and that could potentially be integrated into it (especially the Open Source ones). We tried to install and test each tool ourselves. For each tool we present its functionality and include some screenshots. Although we tried to include all tools that might be helpful to our project, future work will accommodate such tools as they become available.
Chapter three introduces empirical Open Source Software studies from several viewpoints. The first part details historical perspectives on the evolution of five popular Open Source Software systems (Linux, Apache, Mozilla, GNOME, and FreeBSD). This is followed by horizontal studies, in which researchers examine several projects collectively. A model for the simulation of the evolution of Open Source Software projects, together with results from early studies, is also presented. The evolution of Open Source Software projects is directly linked with the evolution of the code and of the communities around the project. Thus, the fourth viewpoint in this chapter considers code quality studies of Open Source Software, applying evolution laws of Open Source software development to study how code evolves and how this evolution affects the quality of the software. The chapter concludes with community studies of mailing lists, in which a research methodology for the extraction and analysis of community activities in mailing lists is proposed.
Chapter four introduces the concept of data mining and its significance in the context of software engineering. A large amount of data is produced in software development, which software organizations collect in the hope of better understanding their processes and products. Specifically, the data in software development can refer to versions of programs, execution traces, error or bug reports and Open Source packages. In addition, mailing lists, discussion forums and newsletters can provide useful information about software. This data is widely believed to hide significant knowledge about software projects’ performance and quality. Data mining provides the techniques (clustering, classification and association rules) to analyze and extract novel, interesting patterns from software engineering databases. In this chapter we review the data mining approaches that have been proposed so far, aiming to assist with some of the main software engineering tasks.
Since software engineering repositories consist of text documents (e.g. mailing lists, bug reports, execution logs), mining textual artifacts is a prerequisite for many important activities in software engineering: tracing of requirements, retrieval of components from a repository, identification and prediction of software failures, etc. We present the state of the art of the text mining techniques applied in software engineering, and provide a comparative study of them. We conclude by briefly discussing directions for further work on Data/Text Mining in software engineering.


Document Information
Deliverable Number: 2
Due Date: 22nd January 2007
Deliverable Date: 22nd January 2007

Approvals
Coordinator: Georgios Gousios (AUEB/SENSE), 10/09/2006
Technical Coordinator: Ioannis Samoladas (AUTH/PLaSE)
WP leader: Ioannis Antoniades (AUTH/PLaSE)
Quality Reviewer 1:
Quality Reviewer 2:
Quality Reviewer 3:

Revisions
Revision Date Modification Authors
0.1 05/10/2006 Initial version AUTH


Contents

1 Software Metrics and Measurement
  1.1 Software Metrics Taxonomy
  1.2 Process Metrics
    1.2.1 Structure Metrics
    1.2.2 Design Metrics
    1.2.3 Product Quality Metrics
  1.3 Productivity Metrics
  1.4 Open Source Development Metrics
  1.5 Software Metrics Validation
    1.5.1 Validation of prediction measurement
    1.5.2 Validation of measures

2 Tools
  2.1 Process Analysis Tools
    2.1.1 CVSAnalY
    2.1.2 GlueTheos
    2.1.3 MailingListStats
  2.2 Metrics Collection Tools
    2.2.1 ckjm
    2.2.2 The Byte Code Metric Library
    2.2.3 C and C++ Code Counter
    2.2.4 Software Metrics Plug-In for the Eclipse IDE
  2.3 Static Analysis Tools
    2.3.1 FindBugs
    2.3.2 PMD
    2.3.3 QJ-Pro
    2.3.4 Bugle
  2.4 Hybrid Tools
    2.4.1 The Empirical Project Monitor
    2.4.2 HackyStat
    2.4.3 QSOS
  2.5 Commercial Metrics Tools
  2.6 Process metrics tools
    2.6.1 MetriFlame
    2.6.2 Estimate Professional
    2.6.3 CostXpert
    2.6.4 ProjectConsole
    2.6.5 CA-Estimacs
    2.6.6 Discussion
  2.7 Product metrics tools
    2.7.1 CTC++ - CMT++ - CTB
    2.7.2 Cantata++
    2.7.3 TAU/Logiscope
    2.7.4 McCabe IQ
    2.7.5 Rational Functional Tester (RFT)
    2.7.6 Safire
    2.7.7 Metrics 4C
    2.7.8 Resource Standard Metrics
    2.7.9 Discussion

3 Empirical OSS Studies
  3.1 Evolutionary Studies
    3.1.1 Historical Perspectives
    3.1.2 Linux
    3.1.3 Apache
    3.1.4 Mozilla
    3.1.5 GNOME
    3.1.6 FreeBSD
    3.1.7 Other Studies
    3.1.8 Simulation of the temporal evolution of OSS projects
  3.2 Code Quality Studies
  3.3 F/OSS Community Studies in Mailing Lists
    3.3.1 Introduction
    3.3.2 Mailing Lists
    3.3.3 Studying Community Participation in Mailing Lists: Research methodology

4 Data Mining in Software Engineering
  4.1 Introduction to Data Mining and Knowledge Discovery
    4.1.1 Data Mining Process
  4.2 Data mining application in software engineering: Overview
    4.2.1 Using Data mining in software maintenance
    4.2.2 A Data Mining approach to automated software testing
  4.3 Text Mining and Software Engineering
    4.3.1 Text Mining - The State of the Art
    4.3.2 Text Mining Approaches in Software Engineering
  4.4 Future Directions of Data/Text Mining Applications in Software Engineering

5 Related IST Projects
  5.1 CALIBRE
  5.2 EDOS
  5.3 FLOSSMETRICS
  5.4 FLOSSWORLD
  5.5 PYPY
  5.6 QUALIPSO
  5.7 QUALOSS
  5.8 SELF
  5.9 TOSSAD

1 Software Metrics and Measurement


As stated in the Description of Work, SQO-OSS aims to provide a holistic approach to software assessment, initially targeted at open source software development. Specifically, the main goals of the project are:

1. Evaluate the quality of Open Source software.

2. Evaluate the health of an Open Source software project.

These two main goals will be delivered through a plug-in based quality assessment platform. In order to achieve these goals, the project’s consortium has to answer specific questions derived from those goals. Thus, for the goals presented, the following questions have to be answered:

1. How can the quality of Open Source software be evaluated and improved?

• How is quality evaluated?

2. How can the health of an Open Source software project be evaluated?

• How is the health of a project evaluated?

These questions can be answered if we examine and measure both the process of creating Open Source software and the product itself, i.e. the code. Both entities can be measured with the help of software metrics. This section presents software metrics and an overview of how useful they are for software evaluation.

1.1 Software Metrics Taxonomy


In this section we describe the various software metrics that exist in the area of
software engineering and are going to be useful for our research. Furthermore,
we refer to metrics specific to open source software development. These metrics
are divided into categories. The chosen classification is widely used in the software
metrics literature [FP97].

• Process metrics are metrics that refer to the software development activities
and processes. Measuring defects per testing hour, time, number of people,
etc. falls under this category.

• Product metrics are metrics that refer to the products of the software development process (e.g. code, but also documents etc.).

• Resource metrics are metrics that refer to any input to the development process (e.g. people and methods).


Each one of these categories contains metrics that are further distinguished as
either internal or external metrics.

• Internal metrics of a product, process or resource are those that can be measured purely by examining the product, process or resource on its own.

• External metrics of a product, process or resource are those that can be measured only with respect to how the product, process or resource relates to its environment (i.e. its behaviour).

Apart from the formal categories presented, we shall also include some metrics derived directly from the Open Source development process.
In the following sections, the most important (from our own perspective) metrics are presented for each of the categories above. Note that the metrics presented have been studied and used extensively in traditional closed source software development. At the end, we present metrics for Open Source software that have appeared in recent years, as researchers started studying Open Source software. Although these metrics can be classified according to the above taxonomy, we prefer to present them separately.

1.2 Process Metrics


Defect Density: One of the most widely accepted metrics for software quality is
Defect Density. This metric is expressed as the number of defects found per certain
amount of software. This amount is usually counted as the number of lines of code of
the delivered product (specific metrics regarding size are presented in the following
sections). Defect Density can simply be expressed thus:

Defect Density = Number Of Known Defects / LOC
Many researchers split defects into two categories: known defects, which are the defects that have been discovered during testing (before the release of the product), and latent defects, which are the defects discovered after the release of the product [FP97]. For each of these two categories, there is a separate defect density metric.
Defect density is considered a product metric and thus should have been presented in the next section. However, it is directly derived from the development process [FP97] (defect discovery through testing), so it is presented in this section. It also acts as a product metric, as it reflects the quality of the product, particularly through the defects found after product release.


Defect Removal Effectiveness: Defect Removal Effectiveness is a process metric


and reflects the ability of the development team to remove defects [Kan03]. The
metric is defined as:
Defect Removal Effectiveness = (Defects Removed in Development) / (Defects Removed + Defects Found Later) ∗ 100%

This is a very useful metric and can be applied at any phase of the software develop-
ment process.
One other metric which can be derived from defect density is system spoilage
[FP97], a metric rather useful for the effectiveness of the development team. This
metric is defined as
System Spoilage = Time To Fix Post Release Defects / Total System Development Time
As mentioned, this metric reflects the ability of the development team to respond to
defects found.
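To make these definitions concrete, the following minimal Python sketch computes the three process metrics above directly from their formulas (the function and variable names are ours, chosen for illustration, and do not belong to any tool presented later):

```python
def defect_density(known_defects: int, loc: int) -> float:
    """Defects per line of code; often reported per thousand lines (KLOC)."""
    return known_defects / loc

def defect_removal_effectiveness(removed_in_dev: int, found_later: int) -> float:
    """Percentage of all defects that were caught before release."""
    return removed_in_dev / (removed_in_dev + found_later) * 100.0

def system_spoilage(post_release_fix_time: float, total_dev_time: float) -> float:
    """Share of total development time spent fixing post-release defects."""
    return post_release_fix_time / total_dev_time

# Example: 42 known defects in 10,000 LOC gives 4.2 defects per KLOC.
print(defect_density(42, 10_000) * 1000)
```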

LOC: Code can be measured in several ways. The first and most common metric in the area of software engineering is the number of lines of code (LOC). Although it may seem easy to measure the lines of code of a computer program, there is controversy about what we mean by LOC. Most researchers refer to LOC as Source Lines Of Code (SLOC), which can be either physical SLOCs or logical SLOCs. Specific definitions of these two measures vary in the sense that what is actually measured is not explicit. One needs to consider whether what is measured includes any of the following:

• Blank lines.

• Comment lines.

• Data declarations.

• Lines that contain several separate declarations.

Logical SLOC measures attempt to measure the number of "statements". Their definition varies with the programming language: since each language has its own syntax, the logical SLOC definition differs from language to language. One simple logical SLOC measure for C-like languages is the number of statement-terminating semicolons. It is much easier to create tools that measure physical SLOC, and physical SLOC definitions are easier to explain. However, physical SLOC measures are sensitive to logically irrelevant formatting and style conventions, while logical SLOC is less sensitive to them. Unfortunately, SLOC measures are often stated without providing their


definition, and logical SLOC can often be significantly different from physical SLOC.
For the purpose of our research, a physical source line of code (SLOC) will be defined
as:

... a line ending in a newline or end-of-file marker, and which contains at


least one non-whitespace non-comment character. Comment delimiters
(characters other than newlines starting and ending a comment) are con-
sidered comment characters. Data lines only including whitespace (e.g.,
lines with only tabs and spaces in multiline strings) are not included.

Using the definition above, we have to stress that this size metric does not represent the actual size of the source code of the program, since it excludes the comment lines. Thus the total length of the program is represented as

Total Length (LOC) = SLOC + Number of comment lines

The number of comment lines is also a useful metric when we refer to other aspects of software, e.g. documentation.
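As an illustration of the physical SLOC definition above, the following Python sketch counts physical SLOC and comment-only lines for a C-like language. It is a deliberate simplification of what the real counting tools presented in Chapter 2 do; in particular, it ignores comment delimiters that appear inside string literals:

```python
def count_sloc(path: str) -> tuple[int, int]:
    """Return (physical SLOC, comment-only lines) for a C-like source file.

    A physical SLOC contains at least one non-whitespace, non-comment
    character; "//" and "/* ... */" delimiters count as comment characters.
    """
    sloc = comment_only = 0
    in_block = False  # currently inside a /* ... */ comment
    with open(path, encoding="utf-8", errors="replace") as src:
        for line in src:
            code_chars = []
            saw_comment = in_block
            i = 0
            while i < len(line):
                if in_block:
                    end = line.find("*/", i)
                    if end == -1:
                        break             # comment continues on next line
                    in_block = False
                    i = end + 2
                elif line.startswith("//", i):
                    saw_comment = True
                    break                 # rest of line is a comment
                elif line.startswith("/*", i):
                    saw_comment = True
                    in_block = True
                    i += 2
                else:
                    code_chars.append(line[i])
                    i += 1
            if "".join(code_chars).strip():
                sloc += 1                 # at least one real code character
            elif saw_comment:
                comment_only += 1         # only comments and whitespace
    return sloc, comment_only
```

The total length in the sense of the formula above is then the sum of the two returned values.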

Halstead Software Science: Apart from counting lines of code, there are also other kinds of metrics that try to measure the length of a computer program. One of the earliest was introduced by Halstead [Hal77] in the late ’70s. Halstead’s measures are based on four quantities that are directly derived from the source code:

• µ1 the number of distinct operators,

• µ2 the number of distinct operands,

• N1 the total number of operators,

• N2 the total number of operands.

Halstead further introduced some metrics based upon the previous measures. These
metrics are:

• The length N of a program N = N1 + N2 ,

• The vocabulary µ of a program µ = µ1 + µ2 ,

• The volume V of a program V = N ∗ log2 µ,


• The difficulty D of a program D = (µ1 / 2) ∗ (N2 / µ2 ).


In order for these metrics to be measured, one has to decide how to identify the
operators and operands. Halstead also used his metrics to estimate the length and
the effort for a given program. For more on Halstead estimations see [Hal77].
Halstead Software Science metrics have been criticised a lot over the years and there are controversial opinions regarding them, especially concerning the volume, difficulty and the other estimation metrics; these opinions vary from “no corresponding consensus” [FP97] to “strongest measures of maintainability” [OH94]. However, the value of N as a program length, as well as the volume of a program as proposed by Halstead, does not contradict any relations we expect between a program and its length. Thus, we choose to include the Halstead metrics in our research [FP97].
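Once the tokens of a program have been classified into operators and operands (the language-specific, and contentious, part of the exercise), the basic Halstead measures reduce to a few lines of Python. The sketch below is ours and assumes µ > 1 and µ2 > 0:

```python
import math

def halstead(operators: list[str], operands: list[str]) -> dict[str, float]:
    """Basic Halstead measures from pre-classified token lists."""
    mu1, mu2 = len(set(operators)), len(set(operands))  # distinct counts
    N1, N2 = len(operators), len(operands)              # total counts
    N = N1 + N2                  # program length
    mu = mu1 + mu2               # vocabulary
    V = N * math.log2(mu)        # volume
    D = (mu1 / 2) * (N2 / mu2)   # difficulty
    return {"length": N, "vocabulary": mu, "volume": V, "difficulty": D}
```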

Function Points: The previous size measures count physical size: lines, operators and operands. Many researchers argue that this kind of measurement can be misleading since it does not capture the notion of functionality, i.e. the amount of function inside the source code of a given program. Thus, they propose the use of functionality metrics.
One of the first such metrics was proposed by Albrecht in 1977 and it was called
Function Point Analysis (FPA) [Alb79] as a means of measuring size and productivity
(and later on also complexity). It uses functional, logical entities such as inputs,
outputs, and inquiries that tend to relate more closely to the functions performed by
the software as compared to other measures, such as lines of code. Function point
definition and measurement have evolved substantially; the International Function
Point User Group or IFPUG1 , formed in 1986, actively exchanges information on
function point analysis (FPA).
In order to compute Function Points (FP), one first needs to compute the Unadjusted Function Point Count (UFC). To calculate this, one further needs to count the following:

• External inputs: Every input provided from the user (data and UI interactions)
but not inquiries.

• External outputs: Every output to the user (i.e. reports and messages).

• External inquiries: Interactive inputs requiring a response.

• External files: Interfaces to other systems.

• Internal files: Files that the system uses for its purposes.

Next, each item is assigned a subjective “complexity” rating on a 3-point ordinal scale:

• Simple.
1
http://www.ifpug.com/


• Average.

• Complex.

Then a weight is assigned to the item according to standard tables (e.g. for a simple external input this is 3 and for a complex external inquiry this is 6; the total number of weights equals 15). So, the UFC is calculated as

UFC = Σ (i = 1..15) (Number Of Items Of Variety i) ∗ weighti

Then we compute a technical complexity factor (TCF). In order to do this, we rate 14 factors Fi (such as Reusability and Performance) from 0 to 5 (0 means irrelevant, 3 average and 5 essential to the system built) and then combine them in the following formula:

TCF = 0.65 + 0.01 Σ (i = 1..14) Fi

The final calculation of the total FP of the system is

FP = UFC ∗ TCF
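The following Python sketch puts the UFC, TCF and FP formulas together. The weight table shown holds the commonly published Albrecht/IFPUG values (e.g. 3 for a simple external input and 6 for a complex external inquiry) and is included only to make the example self-contained:

```python
# (simple, average, complex) weights per item type; the commonly
# published Albrecht/IFPUG values, reproduced here for illustration.
WEIGHTS = {
    "external_input":   (3, 4, 6),
    "external_output":  (4, 5, 7),
    "external_inquiry": (3, 4, 6),
    "external_file":    (5, 7, 10),
    "internal_file":    (7, 10, 15),
}

def unadjusted_fp(counts: dict[str, tuple[int, int, int]]) -> int:
    """counts maps item type -> (number simple, average, complex)."""
    return sum(n * w
               for kind, per_rating in counts.items()
               for n, w in zip(per_rating, WEIGHTS[kind]))

def function_points(counts, factors: list[int]) -> float:
    """factors holds the 14 technical complexity ratings, each 0..5."""
    assert len(factors) == 14 and all(0 <= f <= 5 for f in factors)
    tcf = 0.65 + 0.01 * sum(factors)
    return unadjusted_fp(counts) * tcf
```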

There is a very large user community for function points; IFPUG has more than 1200 member companies, and they offer assistance in establishing an FPA program. The standard practices for counting and using function points can be found in the IFPUG Counting Practices Manual. Without some standardisation of how the function points are enumerated and interpreted, consistent results can be difficult to obtain. Successful application seems to depend on establishing a consistent method of counting function points and keeping records to establish baseline productivity figures for specific systems. Function measures tend to be independent of language, coding style, and software architecture, but environmental factors such as the ratio of function points to source lines of code will vary, although there have been some attempts to map LOCs to FPs [Jon95]. Limitations of function points include the subjectivity of the TCF, the weights and the other subjective measures used. Also, their application is rather time consuming and demands well trained staff. Taking its limitations into account, the method can be rather useful as an estimator of size and of other metrics that take size into account.

Object Oriented Size Metrics: In object oriented development, classes and methods are the basic constructs. Thus, apart from the metrics presented above, in object oriented technology we can use the number of classes and methods as an aspect of size. These metrics are straightforward:

• Number of classes.


• Number of methods per class.

• LOC per class.

• LOC per method.

It is obvious that metrics from other sections also apply to object oriented develop-
ment, but in relation to classes and objects (for example, for the complexity metrics
presented later in this document, we have average complexity per class or method).

Reuse: By reuse we mean the amount of code which is reused in a future release of the software. Although it may sound simple, reuse cannot be counted in a straightforward manner, because it is difficult to define what we mean by code reuse. There are different notions of reuse that take into account the extent of reuse [FP97]: we have straight reuse (copy and paste of the code) and modified reuse (taking a module and changing the appropriate lines in order to implement new features). In addition, in object oriented programming, reuse extends to the reuse or inheritance of certain classes.
Reuse also affects the size measurement of successive releases: if the present release of a piece of software contains a large amount of code identical to the previous release, what is its actual size? For example, IBM uses a metric called shipped source instructions (SSI) [Kan03], which is expressed as

SSI(current) = SSI(previous) + CSI(new and changed code for the current release) − deleted code − changed code

The final term adjusts for changed code, which would otherwise be counted twice. This metric encapsulates reuse in its definition and is rather useful.

1.2.1 Structure Metrics

Apart from size, there are other internal product attributes that are useful in software engineering measurement practice. Since the early days of software metrics, researchers have pointed out a link between the structure of the product (i.e. the code) and certain quality aspects. The metrics that capture this link are called structural metrics, and we present them here because we believe they will be useful for our research.

McCabe’s Complexity Metrics: One of the first and most widely used complexity metrics is McCabe’s Cyclomatic Complexity [McC76]. McCabe proposed that a program’s cyclomatic complexity can be measured by applying principles of graph theory. He represented the program structure as a graph G. So for a program with a

graph G, the cyclomatic complexity is

v(G) = e − n + 1

where e is the number of edges of G and n the number of nodes. In addition, McCabe gave some other definitions, such as the cyclomatic number, which is

v(G) = e − n + 2

and the essential cyclomatic complexity, which is

ev(G) = v(G) − m

where m is the number of subflowgraphs, i.e. the number of connected components of the graph. In the literature there is also the definition

v(G) = e − n + p

(where e is the number of edges, n the number of nodes and p the number of nodes that are exit points: last instruction, exit, return etc.). So for the graph in Figure 1², the cyclomatic complexity is v(G) = 3.
Although the cyclomatic complexity metric was developed in the mid ’70s, it has evolved and been calibrated over the years, and it has become a mature, objective and useful metric for measuring a program’s complexity. It is also considered to be a good maintainability metric.
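As a worked example of the v(G) = e − n + 2 form, consider a module consisting of one if/else followed by one while loop. A minimal Python sketch over an explicit edge list (our own illustrative encoding of the flowgraph) yields the expected value of 3:

```python
def cyclomatic_complexity(edges: list[tuple[str, str]]) -> int:
    """v(G) = e - n + 2 for a single connected flowgraph."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2

# Flowgraph of: if (a) {...} else {...}; while (b) {...}
edges = [("entry", "then"), ("entry", "else"),  # if/else branch
         ("then", "join"), ("else", "join"),
         ("join", "body"), ("body", "join"),    # while loop
         ("join", "exit")]
print(cyclomatic_complexity(edges))  # 7 edges - 6 nodes + 2 = 3
```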
The above metrics (LOC, McCabe’s Cyclomatic Complexity and Halstead’s Soft-
ware Science) treat each module separately. The metrics below try to take into
account the interaction between the modules and quantify this interaction.

Coupling: The notion of coupling was introduced by three IBM researchers in 1974: Stevens, Myers and Constantine proposed a metric that measures the quality of a program’s design [SMC74]. Coupling between two modules of a piece of software is the degree of interaction between them. By combining the coupling between all the system’s modules, one can compute the whole system’s global coupling. There are no standard measures of coupling. However, there are six basic types of coupling, expressed as a relation between two modules x and y [FP97] (the relations are listed from the least dependent to the most):

• No coupling relation: x and y have no communication and are totally independent of each other.

• Data coupling relation: x and y communicate by parameters. This type of coupling is necessary for the communication of x and y.
2
Courtesy of: http://www.dacs.dtic.mil/techs/baselines/complexity.html


Figure 1: A program’s flowchart. The cyclomatic complexity of this program is v(G) = 3.

• Stamp coupling relation: x and y accept the same record type (i.e. in database systems) as a parameter, which may cause interdependency between otherwise unrelated modules.

• Control coupling relation: x passes a parameter to y with the intention of controlling its behaviour.

• Common coupling relation: x and y refer to the same global data. This is the kind of coupling we do not want to have.

• Content coupling relation: x refers to the inside of y (i.e. it branches into, changes data in, or alters a statement in y).

Of the above coupling relations, common coupling has been explored in the case of the Linux kernel in order to assess its maintainability [YSCO04].

Henry and Kafura’s Information Flow Complexity: Another complexity metric


that is common in software engineering measurement is Henry and Kafura’s Infor-
mation Flow Complexity [HK76]. This metric is based on the information passing
between the modules (or functions) of a program and particularly the fan in and
fan out of a module. With the term fan in of a module m, we mean the number of


modules that call module m plus the number of data structures that are retrieved
by m. With fan out we mean the number of modules that are called from m plus the
number of data structures that are updated by m. The definition of the metric for a
module m is:
Information Flow Complexity(m) = length(m) · (Fan In(m) · Fan Out(m))²

Other researchers have proposed omitting the length factor, thus simplifying the metric.
Since its introduction, Henry and Kafura’s metric has been validated and connected with maintainability [FP97], [Kan03]. Modules with high information flow complexity tend to be error prone while, on the other hand, low values of the metric correlate with fewer errors.
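The metric is a direct transcription into code; in the following illustrative Python sketch, length would typically be the module’s LOC:

```python
def information_flow_complexity(length: int, fan_in: int, fan_out: int) -> int:
    """Henry and Kafura: length(m) * (fan_in(m) * fan_out(m))^2.
    Variants that omit the length factor correspond to length=1."""
    return length * (fan_in * fan_out) ** 2

# A 40-line module called by 3 modules and calling 4 others:
print(information_flow_complexity(40, 3, 4))  # 40 * (3*4)^2 = 5760
```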

Object Oriented Complexity Metrics: With the rise of object oriented programming, software metrics researchers tried to figure out how to measure the complexity of such applications. One of the most widely used complexity metric suites for object oriented systems is Chidamber and Kemerer’s metrics suite [CK76]:

• Metric 1: Weighted Methods per Class (WMC). WMC is the sum of the complexities of the methods, where complexity is measured by cyclomatic complexity:

WMC = Σ (i = 1..n) ci

where n is the number of methods and ci is the complexity of the i-th method. We have to stress here that measuring the complexity is difficult to implement because, due to inheritance, not all methods are assessable in the class hierarchy. Therefore, in empirical studies, WMC is often just the number of methods in a class, and the average WMC is the average number of methods per class [Kan03].

• Metric 2: Depth of Inheritance Tree (DIT). This metric represents the length of the maximum path from the node to the root of the inheritance tree.

• Metric 3: Number of Children (NOC). This is the number of immediate successors (subclasses) of a class in the inheritance tree.

• Metric 4: Coupling between Object Classes (CBO). An object class is coupled to another if it invokes the other’s member functions or instance variables. CBO is the number of such other classes.

• Metric 5: Response for Class (RFC). This metric represents the number of methods that can be executed in response to a message received by an object of that class. It equals the number of local methods plus the number of methods called by local methods.


• Metric 6: Lack of Cohesion Metric (LCOM). The cohesion of a class is indicated by how closely the local methods are related to the local instance variables of the class. LCOM equals the number of disjoint sets of local methods.

Several studies show that the CK metrics suite assists in measuring and predicting an object oriented system’s maintainability [FP97], [Kan03]. In particular, studies show that certain CK metrics are linked to faulty classes and help predict them [Kan03].
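Three of the CK metrics can be illustrated on live Python classes through the language’s introspection facilities. This is only a rough sketch of the idea; production collectors such as ckjm (presented in Chapter 2) work on compiled Java bytecode instead:

```python
def wmc(cls) -> int:
    """Simplified WMC: methods defined directly in the class, each
    weighted 1, as is common in empirical studies."""
    return sum(callable(v) for v in vars(cls).values())

def dit(cls) -> int:
    """Depth of Inheritance Tree: longest path from cls to the root."""
    return 0 if cls is object else 1 + max(dit(b) for b in cls.__bases__)

def noc(cls) -> int:
    """Number of Children: immediate subclasses currently defined."""
    return len(cls.__subclasses__())

class Base:
    def a(self): pass

class Derived(Base):
    def b(self): pass
    def c(self): pass

print(wmc(Derived), dit(Derived), noc(Base))  # 2 2 1
```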

1.2.2 Design Metrics

Along with object oriented programming, the notion of object oriented design was introduced too. The programmer has to use some kind of modelling with classes and objects in order to design the application first; after the design is completed, the programmer goes on with coding. One of the questions programmers ask themselves is whether or not their design is of good quality. An experienced programmer can answer that question by applying to the design a number of rules based on experience, looking for bad choices that may have been made or violations of his or her own intuitive rules. If the design passes these checks, then it is of good quality and coding can continue. Of course, with big applications, inspection of a design by a person is rather difficult, so a tool is needed.
These intuitive rules are called “design heuristics” checks. They are based on experience. They are like design patterns but, rather than proposing a certain design for certain problems, heuristics are rules that help designers check the validity of their design. Design heuristics are validations for object oriented design and advise the programmer of possible design mistakes. This advice should be taken into account by the programmer, who has to investigate further in order to correct things. Of course, a heuristic violation does not always mean a design mistake, but it is a point for further investigation by the development team. A well known set of such object oriented design heuristics was first introduced by Arthur Riel, who in his seminal work [Rie96] defined a set of more than 60 design heuristics resulting from his experience. His work has helped many people to improve their designs and the way they program. Before Riel, other researchers had addressed similar issues, including Coad and Yourdon [YC91]. Additionally, there is ongoing research in the field of design heuristics: researchers are investigating the impact of applying object oriented design heuristics and the evaluation and validation of these heuristics [DSRS03, DSA+ 04]. As an example, consider the object oriented design heuristics listed below, taken from Riel [Rie96].

1. The inheritance hierarchy should not be deeper than six.

2. Do not use global data. Class variables or methods should be used instead.

3. All data should be hidden within its class.


4. All data in a class should be private.

5. All methods in a class should have no more than six parameters.

6. A class should not have zero methods.

7. A class should not have one or two methods.

8. A class should not be converted to an object of another class.

9. A class should not contain more than six objects.

10. The number of public methods of a class should be no more than seven.

11. The number of classes with which a class collaborates should not be more than four.

12. Classes with too much information should be avoided. We consider that a class fits this description when it associates with more than four classes, has more than seven methods and more than seven attributes.

13. The fan out of a class should be minimised. The fan out is the product of the number of methods defined by the class and the number of messages they send. This number should be no more than nineteen.

14. All abstract classes must be base classes.

15. All base classes should be abstract classes.

16. Do not use multiple inheritance.

17. A class should not have only methods with names like set, get and print.

18. If a class contains objects from another class, then the containing class should
be sending messages to the contained objects. If this does not happen then we
have a violation of the heuristic.

19. In case that a class contains objects from other classes, these objects should
not be associated with each other.

20. If a class has only one method apart from set, get and print, then there is a violation.

21. The number of messages between a class and its collaborator should be minimal. If this number is more than fifteen, we have a violation.

One should note here that the above heuristics can be validated with the use of a
tool.
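To illustrate what such a tool would do, the following Python sketch checks source code against two of the heuristics listed above (numbers 5 and 10). The thresholds come from the list; everything else is our own illustrative scaffolding:

```python
import ast

MAX_PARAMETERS = 6      # heuristic 5
MAX_PUBLIC_METHODS = 7  # heuristic 10

def check_heuristics(source: str) -> list[str]:
    """Return warnings for heuristic violations found in Python source."""
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            public = [m for m in node.body
                      if isinstance(m, ast.FunctionDef)
                      and not m.name.startswith("_")]
            if len(public) > MAX_PUBLIC_METHODS:
                warnings.append(f"class {node.name}: "
                                f"{len(public)} public methods")
        elif isinstance(node, ast.FunctionDef):
            # Counts positional parameters only; includes self for methods.
            params = len(node.args.args)
            if params > MAX_PARAMETERS:
                warnings.append(f"function {node.name}: {params} parameters")
    return warnings
```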
Before Riel, Lorenz [LK94] had proposed similar rules derived from industrial experience (including metrics for the development process):


1. Average method size should be less than 24 LOC for C++.

2. Average number of methods per class should be less than 20.

3. Average number of instance variables per class should be less than 6.

4. Class hierarchy nesting level (or DIT of CK metrics) should be less than 6.

5. Number of subsystem-subsystem relationships should be less than the number of class-class relationships (see metric 6).

6. Number of class-class relationships in each subsystem should be relatively high.

7. Instance variable usage: if groups of methods in a class use different sets of instance variables, look closely to see if the class should be split into multiple classes along those “service” lines.

8. Average number of comment lines per method should be greater than 1.

9. Number of problem reports should be low.

10. Number of times a class is reused (a class should be reused in other projects, otherwise it might need redesign).

11. Number of classes and methods thrown away (this should occur at a steady rate).

As mentioned before, all these “rules of thumb” are derived from the experience gained during multiple development processes and reflect practical knowledge. For example, a large average method size may indicate poor OO design and function oriented coding [Kan03]. A class containing too much responsibility (too many methods) indicates that some of the methods should be moved to a separate class. The list goes on, reflecting the practical knowledge mentioned.

1.2.3 Product Quality Metrics

The previous sections discussed development and design quality. Those are quality metrics that can be applied to a software product early in its lifecycle: they may already be calculated before the product is released. The following metrics are post-release metrics and apply to a finished software product.


Maintainability: When a software product is complete and released, it enters the maintenance phase. During this phase, defects are corrected, re-engineering occurs and new features are added. Here we look at four types of software maintenance:

• Corrective maintenance, which is the main maintenance task and involves correcting defects that are reported by users.

• Adaptive maintenance, which has to do with adding new functionality to the system.

• Preventive maintenance, which is the defect fixing done by the development team to prevent defects from being delivered to the user.

• Perfective maintenance, which mainly involves re-engineering and redesigning tasks.

For the maintenance process, we mainly have four maintainability metrics.

Average Code Lines per Module: This is a very simple metric: the average number of lines of code per module (e.g. function or class). This metric shows how easily the code can be maintained, or how easily someone can understand part of the code and correct it. With this metric there are some considerations regarding comment lines, considerations that also apply later to the Maintainability Index metric. For instance, one needs to consider how well the comment lines reflect the code (are there useless comment lines?) and whether the comment lines contain copyright notices and other legal notices, etc.

Mean Time To Repair: Mean Time To Repair (MTTR) is an external measure: it concerns the delivered product from the user’s point of view, not the code. MTTR is the average time to fix a defect, from the time it was reported to the moment the development team corrected it. Sometimes MTTR is referred to as “fix response time.”

Backlog Management Index: Backlog Management Index (BMI) is also an external measure of maintainability and has to do with both defect fixing and defect arrival [Kan03]. The BMI is expressed as

BMI = (Number Of Problems Closed / Number Of Problem Arrivals) · 100%
The numbers of problems that arrive or are closed are counted over some fixed time period, usually a month. Of course, the time period can vary from a week to any fixed number of days. If BMI is greater than 100%, it means that the development team is efficient and closes bugs faster than their arrival rate. If it is less than 100%, it means that the development team has efficiency problems and falls behind with the defect fixing process.
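Both external maintainability measures are simple ratios; a minimal Python sketch with illustrative names (inputs are in whatever time unit the team records):

```python
def mean_time_to_repair(reported, fixed) -> float:
    """Average report-to-fix interval; inputs are parallel sequences of
    report and fix timestamps in a common unit (e.g. days)."""
    return sum(f - r for r, f in zip(reported, fixed)) / len(reported)

def backlog_management_index(closed: int, arrived: int) -> float:
    """BMI over a fixed period, as a percentage."""
    return closed / arrived * 100.0

# 45 problems closed against 50 arrivals this month: BMI = 90%,
# i.e. the team is falling behind on the backlog.
print(backlog_management_index(45, 50))
```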

Maintainability Index: Several metrics have been proposed for the internal measurement of maintainability [FP97]. Most of them try to correlate the structural metrics presented before with maintainability, and in certain cases a link between a particular metric and maintainability has been established. For example, McCabe categorised programs into maintenance risk categories, stating that any program with a McCabe metric larger than 20 has a high risk of causing problems.
One interesting model, derived from regression analysis and based on metrics presented before, is the Maintainability Index (MI) proposed by Welker and Oman [WO95]. The MI combines Halstead Software Science metrics, McCabe’s cyclomatic complexity, LOC, and the number of comments in the code. There are two expressions of MI, one using three of the previous metrics and another using all four:
Three-Metric MI equation

MI = 171 − 5.2 ln(aveV ) − 0.23aveV (g) − 16.2 ln(aveLOC)

where aveV is the average Halstead Volume per module, aveV (g) is the average
extended cyclomatic complexity per module, and aveLOC is the average lines
of code per module.
Four-Metric MI equation
MI = 171 − 5.2 ln(aveV ) − 0.23 aveV (g) − 16.2 ln(aveLOC) + 50 sin(√(2.4 · perCM ))

where aveV (g) and aveLOC are as before and perCM is the average percentage of comment lines per module.
In their article, Welker and Oman proposed three criteria for choosing which equation (3- or 4-metric) is appropriate to use [WO95]. If one of the criteria is true, then it is better to use the 3-metric equation; otherwise, use the 4-metric one. The criteria are:
• The comments do not accurately match the code. Unless considerable attention
is paid to comments, they can become out of synchronisation with the code and
thereby make the code less maintainable. The comments could be so far off as
to be of dubious value.

• There are large, company-standard comment header blocks, copyrights, and disclaimers. These types of comments provide minimal benefit to software maintainability. As such, the 4-metric MI will be skewed and will provide an overly optimistic maintainability picture.


• There are large sections of code that have been commented out. Code that has
been commented out creates maintenance difficulties.

Calculating MI is simple because there are tools (we examine such tools in Section 2) that measure the metrics it uses. As the authors suggest, MI is useful for periodic assessment of the code in order to track its maintainability.
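Both MI equations are easy to reproduce once the underlying per-module averages are available from such tools; a small Python sketch of our own (the example numbers are purely illustrative):

```python
import math

def maintainability_index(ave_v: float, ave_vg: float, ave_loc: float,
                          per_cm: float | None = None) -> float:
    """Welker/Oman MI. Pass per_cm (average percentage of comment lines
    per module) to obtain the 4-metric variant; omit it for 3-metric."""
    mi = (171 - 5.2 * math.log(ave_v) - 0.23 * ave_vg
          - 16.2 * math.log(ave_loc))
    if per_cm is not None:
        mi += 50 * math.sin(math.sqrt(2.4 * per_cm))
    return mi

# Example with hypothetical per-module averages:
print(maintainability_index(ave_v=250.0, ave_vg=5.0, ave_loc=40.0,
                            per_cm=12.0))
```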

1.3 Productivity Metrics


Software productivity is a very complex metric which is mainly used in effort and cost estimation. However, we shall use productivity as a quality metric in order to evaluate the health of a software project. Generally, productivity is expressed as

Productivity = Number Of Things Implemented / Person Months
The term “things” refers to size measurements, which can be expressed as lines of code, function points or, in the case of object oriented development, number of classes. Similarly, person months can be computed over any fixed time period. We must note here that the metric proposed is a very simple one. Of course, more complex metrics exist (such as metrics derived from regression techniques) but they are beyond the scope of our own research.

1.4 Open Source Development Metrics


Apart from the metrics presented in the previous sections, there are metrics that can be applied directly to the Open Source development process; these have been used in the past to perform open source software evaluation and success measurement [lD05], [CHA06]. Additionally, we present some metrics used by Open Source hosting and indexing sites such as Freshmeat.net.

Number of releases in the past 12 months: This measures the activity of a software project, particularly its productivity and also its reliability (and defect density). Because of this bi-directional nature (productivity and maintainability), the metric has no nominal scale. Small values of the metric may indicate low productivity, but they may also simply reflect a period of minor improvements and bug fixes. Thus, this metric has to be measured along with others, like the number of contributors and/or the number of downloads. Furthermore, the metric can be refined into the number of major releases and the number of minor releases or patches. With this distinction, the previous problem of the metric can be overcome: a high number of minor releases is an indicator of problematic software (but also of fast fix response time). The number of minor releases or patches can be used along with the defect removal effectiveness metric.


Volume of mailing lists: This metric is rather useful for evaluating the health of a project and the support it provides [SSA06]. It is a direct measurement of the number of messages sent to a project’s list in a month (or another fixed time period). A healthy project has an active mailing list, while a soon to be abandoned one has lower activity. The volume of the users’ mailing list is also an indicator of how well the project is supported and documented.

Volume of available documentation: Along with the previous metric, this one
is an indicator of the available support. When we refer to the volume of available
documentation, we mean the available documents, like the installation guide or the
administrator’s guide.

Number of contributors: A direct repository measurement, which represents how big the community of a project is. A high number of contributors means fast bug fixing and availability of support, and of course it is a prerequisite for a project to evolve. We have to stress here that many projects, like Apache, have a small core group that produces the majority of the code and a larger group that contributes less [MFH02]. Thus, this metric has to be further evaluated and used along with other metrics.

Repository Checkouts: From a project’s repository, one can directly extract some other interesting metrics, particularly productivity metrics. These are the number of commits per committer, the number of commits of a specific committer over a fixed period (for example, a month) and the total number of commits over a fixed period. All of these are productivity metrics and can also be indicators of defect removal effectiveness. Of course, as these metrics measure activity, they reflect the health of the project.

Number of downloads: A direct measurement of the number of downloads of an Open Source project. This metric can show us a project’s popularity; thus, it is an indicator of its health and end user quality. However, one must keep in mind that downloading a piece of software does not mean the downloader actually used it nor, if they did, that they were satisfied with it.

Freshmeat User Rating: The Freshmeat.net hosting service uses a user rating metric which works as follows, according to its website3 : every registered user of Freshmeat may rate a project featured on the website. Based on these ratings, Freshmeat builds a top 20 list, and users may sort their search results by rating as well. Note that unless a project has received 20 or more ratings it will not be
3
http://freshmeat.net/faq/view/31/


considered for inclusion in the top 20. The formula gives a true Bayesian estimate, the weighted rank (WR):

WR = (v / (v + m)) · R + (m / (v + m)) · C
where:
R = average rating for the project
v = number of votes for the project
m = minimum votes required to be listed in the top 20 (currently 20)
C = the mean vote across the whole report

Freshmeat Vitality: The second metric that Freshmeat uses is the project’s vitality. Again, according to Freshmeat4 , the vitality score for a project is calculated as

vitality = (announcements ∗ age) / (days since last announcement)

that is, the number of announcements multiplied by the number of days the application has existed, divided by the number of days passed since the last release. This way, applications with lots of announcements that have been around for a long time and have recently come out with a new release earn a high vitality score; old applications that have only been announced once get a low vitality score. The vitality score is available through the project page and can be used as a sort key for the search results (definable in the user preferences).

Freshmeat Popularity: From the Freshmeat site5 : the popularity score superseded the old counters for record hits, URL hits and subscriptions. Popularity is calculated as

popularity = √((record hits + URL hits) ∗ (subscriptions + 1))

Again, we have to stress that these metrics are used by Freshmeat and of course they need further investigation and validation.
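For completeness, the three Freshmeat formulas transcribed into Python; the parameter names follow the FAQ text quoted above, and the sketch is ours:

```python
import math

def weighted_rank(R: float, v: int, C: float, m: int = 20) -> float:
    """Bayesian weighted rank: R is the project's average rating,
    v its number of votes, C the mean vote across the whole report,
    m the minimum votes required for the top 20 (currently 20)."""
    return (v / (v + m)) * R + (m / (v + m)) * C

def vitality(announcements: int, age_days: int, days_since_last: int) -> float:
    return announcements * age_days / days_since_last

def popularity(record_hits: int, url_hits: int, subscriptions: int) -> float:
    return math.sqrt((record_hits + url_hits) * (subscriptions + 1))
```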

1.5 Software Metrics Validation


The metrics presented in this chapter try to measure a wide range of software attributes. For each attribute of a piece of software (e.g. length) there are various kinds of metrics to measure it. This availability of various metrics for each attribute raises the question of whether a particular metric is suitable for measuring an attribute. The suitability of a specific metric is a very important area in
4
http://freshmeat.net/faq/view/27/
5
http://freshmeat.net/faq/view/30/


software engineering research, and it is the reason why metrics are questioned by researchers and widely discussed.
According to Fenton [FP97], the way we validate metrics depends on whether we just want to measure an attribute or we want to measure in order to predict. Prediction of an attribute of a system (e.g. code quality or cost) is a core issue in software engineering. So, in order to perform metrics validation, we must distinguish between these two types:

• Measurement that is performed in order to assess an existing entity by numerically characterising one or more of its attributes, for example size.

• Measurement that is performed in order to predict some attribute of a future entity, like the quality of the code.

The validation procedure can also be distinguished into two types [BEM95]:

• Theoretical validation. This kind of validation employs a mathematical formalism and a model. It is usually done by setting out mathematical relations for each attribute and trying to validate these relations.

• Empirical validation. As Briand et al. state [BEM95], empirical validation is the answer to the question “is the measure useful in a particular development environment, considering a given purpose in defining the measure and a modeler’s viewpoint?”.

Of these two approaches, empirical validation is the one most widely used. In practice, it tries to correlate a measure with some external attribute of the software, for example complexity with defects.

1.5.1 Validation of prediction measurement

As Fenton states, validating a measurement conducted for prediction is the process of establishing the accuracy of the prediction by empirical means, that is, by comparing model performance with known data. In other words, a prediction measurement is valid if it makes accurate predictions. This kind of validation is widely used in software engineering research for cost estimation purposes and for quality and reliability prediction. With this method, researchers form and test hypotheses in order to predict certain attributes of software, or conduct formal experiments. They then use mathematical (statistical) techniques to test their results, for example whether a particular metric such as size is an accurate cost estimator. Other kinds of predictions are quality and fault proneness detection. The mathematical techniques used are regression analysis, logistic regression and also more sophisticated methods such as decision trees and neural networks. Examples of such metric validation are [GFS05] and [BBM96].


1.5.2 Validation of measures

Again, according to Fenton [FP97], validating a software measure is the process of ensuring that the measure is a proper numerical characterisation of the claimed attribute, by showing that the representation condition is satisfied. As implied, this kind of validation involves theoretical validation. For example, for a metric that measures size, we form a model to represent a program and a relation of that model to the notion of size. Let us call the program P and the relation m(P ). In order to validate the length measure we can use the following: if a program P1 is of length m(P1 ) and a program P2 of length m(P2 ), then the equation

m(P1 + P2 ) = m(P1 ) + m(P2 )

should hold. If, in addition,

P1 < P2 ⇒ m(P1 ) < m(P2 )

holds, then our relation, our metric, is valid.


Although we are not going to discuss metrics validation in depth here, we are going to perform validation throughout our project, especially when we present new metrics for Open Source software development. Good places to start studying metrics validation are [BEM95] and [Sch92]; both papers provide a lot of insight about metrics validation and also present mathematical techniques for both theoretical and empirical validation. A more recent study that discusses metrics is that of Kaner and Bond [KB04b]. Another interesting paper, which discusses how empirical research in software engineering should be conducted and contains a lot about validation, is that of Kitchenham et al. [KPP+ 02]. Two good examples of the application of metrics validation are those of Briand et al. [BDPW98] and Basili et al. [BBM96]. A rather complete publication list on software metrics validation can be found at http://irb.cs.uni-magdeburg.de/sw-eng/us/bibliography/bib_10.shtml


2 Tools
Many publications mention measurement tool support and automation as important success factors for software measurement efforts and quality assurance [KSPR01], providing frameworks and general approaches [KRSZ00], or giving more specific solution architectures [JL99]. There is a great variety of research tools to support software metric creation, handling, and analysis; an overview of the different types of software metrics tools is given in [Irb]. Wasserman [A.I89] introduces the concept of tools with vertical and horizontal architecture, with the former supporting activities in a single life cycle phase, such as UML design tools or change request databases, and the latter supporting activities over several life cycle phases, such as project management and version control tools. Fuggetta [Fug93], on the other hand, classifies tools as single tools, workbenches supporting a few development activities, or environments supporting a great part of the development process. These ideas about different kinds of metrics tools have certainly affected the functionality that commercial tools offer, but the most popular categorisation still classifies metrics tools as either product metrics tools or process metrics tools. Product metrics tools measure the software product at any stage of its development, from requirements to installed system. They may measure the complexity of the software design, the size of the final program (either source or object code), or the number of pages of documentation produced. Process metrics tools, on the other hand, measure the software development process, with attributes such as overall development time, type of methodology used, or the average level of experience of the programming staff. In this chapter we present tools, both Open Source and commercial, that support and automate the measurement process.

2.1 Process Analysis Tools


The process of Open Source software development depends heavily on a repository responsible for version control. The majority of projects use one of two version control systems for their repositories, CVS6 and Subversion7 . Much of the information needed in order to extract the various metrics is contained in these repositories. This information includes:

• The code itself, along with historical data (changes, additions, etc.).

• Information regarding the programmers (committers), such as their number, usernames, etc.

• Historical data about the productivity of committers (number of commits, which part of the code is committed by whom, etc.).
6 http://www.nongnu.org/CVS/
7 http://subversion.tigris.org/


Figure 2: CVSAnalY Web Interface, Main Page

All these data are stored in the repository, and tools are available as Open Source software to extract the useful information from it.

2.1.1 CVSAnalY

CVSAnalY8 (CVS Analysis) is one of the first tools that accesses a repository in order to find information regarding an open source project. It has been developed by the Libresoft group at the Universidad Rey Juan Carlos in Spain and has already produced results used in open source software research [RKGB04]. The tool is licensed under the GNU General Public Licence.
Specifically, CVSAnalY is a tool that extracts statistical information from CVS and Subversion repository logs and transforms it into an SQL database format. The main tool is a command line tool. The presentation of the results is done with a web interface - called CVSAnalYweb - where the results can be retrieved and analysed in an easy way (after the main command line tool, CVSAnalY, has been run). The tool produces various results and statistics regarding the evolution of a project over time. A general view of the tool is shown in Figure 2. The tool stores historical data such as:

• First commit logged in the versioning system.

• Last commit (until the date we want to examine).

• Number of days examined.


8 http://cvsanaly.tigris.org/


• Total number of modules in the versioning system.

• Committers.

• Commits.

• Files.

• Aggregated Lines.

• Removed Lines.

• Changed Lines.

• Final Lines.

It also stores file type statistics for all modules:

• File type.

• Modules.

• Commits.

• Files.

• Lines Changed.

• Lines Added.

• Lines Removed.

• Removed files.

• External.

• CVS flag.

• First commit.

• Last commit.

The tool also logs the inactivity rate for modules and committers, committers per module, and the Herfindahl-Hirschman index for modules; as mentioned before, it also produces helpful graphs. Examples of the graphs produced are:

• Evolution of the number of modules.

• Modules by Commiters (log-log).


Figure 3: CVSAnalY Web Interface, Evolution of the number of modules

• Modules by Commits (log-log).

• Modules by Files (log-log).

• Commiter by Changes (log-log).

Examples of the graphs are shown in Figure 3 and Figure 4.


The CVSAnalY tool is a rather useful one: it helps to gather data about the process of Open Source software development, as well as data essential for measuring other metrics, especially process metrics. Another very important feature of CVSAnalY is the reconstruction of the state of the repository at specific points in time.
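The core idea behind such tools, turning version control logs into SQL tables, can be illustrated in a few lines of Python. The sketch below is not CVSAnalY's actual code; it merely shows the flavour of the transformation, assuming a log file produced with svn log --xml:

    # A minimal sketch of the log-to-SQL idea behind CVSAnalY (not its actual
    # implementation): parse the XML output of `svn log --xml` and store one
    # row per commit in an SQLite table for later querying.
    import sqlite3
    import xml.etree.ElementTree as ET

    db = sqlite3.connect("commits.db")
    db.execute("CREATE TABLE IF NOT EXISTS commits "
               "(revision INTEGER, committer TEXT, date TEXT, message TEXT)")

    tree = ET.parse("svn-log.xml")   # produced by: svn log --xml > svn-log.xml
    for entry in tree.findall("logentry"):
        db.execute("INSERT INTO commits VALUES (?, ?, ?, ?)",
                   (int(entry.get("revision")),
                    entry.findtext("author"),
                    entry.findtext("date"),
                    entry.findtext("msg")))
    db.commit()

    # Example query: commits per committer, the raw material for many of the
    # activity statistics listed above.
    for committer, n in db.execute(
            "SELECT committer, COUNT(*) FROM commits "
            "GROUP BY committer ORDER BY COUNT(*) DESC"):
        print(committer, n)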

2.1.2 GlueTheos

GlueTheos [RGBG04] has been developed to coordinate other tools used to analyse Open Source repositories. The tool is a set of scripts used to download data (source code) from Open Source repositories, analyse it with external tools (developed by third parties) and store the results in a database for further investigation. The parts that comprise GlueTheos are:

• The core scripts, which act as a user interface, interacting with the user and handling details like repository configuration, periods of analysis (the periodic snapshots from a repository), storage details, and third-party tool details and parameters.


Figure 4: CVSAnalY Web Interface, Commiter by Changes

• The downloading module, which is responsible for downloading source code snapshots at specific dates and storing them locally.

• The analysis module. Here the user describes further details of the external tools used for source code analysis. These details include instructions on how to invoke each tool, its parameters, and the details of its output. The module is also responsible for running these external tools.

• The storage module. This module is responsible for the storage of the results created by the previous module. It takes the output of an analysis tool and formats it into an appropriate SQL command, suitable for storing the result in a database.

Generally, the tool runs as follows:

1. The user chooses which project to analyse (e.g. GNOME) and which periods to
analyse (e.g. every month from December 2003 until September 2005).

2. The user then chooses an analysis tool (e.g. sloccount, which counts physical source lines of code9 ). The integration of the tool with the main set of scripts includes a description of how to call the tool, how parameters are passed, and the format of its output.
9 http://www.dwheeler.com/sloccount/


Figure 5: GlueTheos, Table that contains analysis of a project

3. The program retrieves the code of the project analysed for the configured
dates, then it analyses the code with the external tool and stores the output
in a database.

The database table that contains the analysis results has the output of the external tool as a column. Figure 5 shows a table created by GlueTheos, which contains the output of sloccount (SLOC, source lines of code, and language type) for the files of the GNOME core project at a specific date. GlueTheos is released under the GNU General Public Licence.
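The download-analyse-store cycle is easy to picture with a small sketch. The following is not GlueTheos itself, only an illustration of the idea; it assumes cvs and sloccount are installed, that the repository location is a placeholder, and that sloccount prints per-language totals as lines such as "ansic: 12345 (...)":

    # An illustrative download-analyse-store cycle in the style of GlueTheos.
    # The CVS root below is a placeholder, not a real repository.
    import re
    import sqlite3
    import subprocess

    db = sqlite3.connect("results.db")
    db.execute("CREATE TABLE IF NOT EXISTS sloc "
               "(snapshot TEXT, language TEXT, lines INTEGER)")

    for date in ["2004-01-01", "2004-02-01"]:      # configured snapshot dates
        # 1. Download the source as it was on that date.
        subprocess.run(["cvs", "-d", ":pserver:anonymous@cvs.example.org:/cvsroot",
                        "checkout", "-D", date, "-d", "snapshot", "module"],
                       check=True)
        # 2. Run the external analysis tool on the snapshot.
        out = subprocess.run(["sloccount", "snapshot"],
                             capture_output=True, text=True).stdout
        # 3. Store the tool's output in the database.
        for lang, lines in re.findall(r"^(\w+):\s+(\d+)", out, re.MULTILINE):
            db.execute("INSERT INTO sloc VALUES (?, ?, ?)",
                       (date, lang, int(lines)))
    db.commit()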

2.1.3 MailingListStats

MailingListStats10 analyses Mailman archives (and, in the future, those of other mailing list manager software) in order to get statistical data out of them. The statistical data is transformed into XML and SQL to allow further analysis and research. This tool also includes a web interface.
10 http://libresoft.urjc.es/Tools/MLStats
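The kind of statistics such a tool extracts can be approximated in a few lines with Python's standard mailbox module. This is only an illustration of the idea, not MailingListStats' own code, and it assumes the archive has already been downloaded as a local mbox file (the file name below is hypothetical):

    # An illustration of mining a mailing list archive: count messages per
    # sender in a locally downloaded mbox file.
    import mailbox
    from collections import Counter

    senders = Counter(msg["from"] for msg in mailbox.mbox("list-archive.mbox"))
    for sender, count in senders.most_common(10):
        print(count, sender)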


2.2 Metrics Collection Tools


2.2.1 ckjm

ckjm11 calculates the Chidamber and Kemerer object-oriented metrics by processing the bytecode of compiled Java files. For each class, the program calculates the six metrics proposed by Chidamber and Kemerer, as well as afferent couplings and the number of public methods. This application was developed by Professor Diomidis Spinellis, the coordinator of the SQO-OSS project.
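By way of illustration, ckjm can be driven from a small script. The sketch below assumes the jar is invoked with class file paths as arguments and that each output line has the form "<class> WMC DIT NOC CBO RFC LCOM Ca NPM"; the exact invocation and column order should be checked against the tool's documentation:

    # A sketch of driving ckjm from Python; the jar location, class file path
    # and output format are assumptions to be checked against the ckjm docs.
    import subprocess

    result = subprocess.run(
        ["java", "-jar", "ckjm.jar", "build/classes/Foo.class"],
        capture_output=True, text=True, check=True)

    names = ["WMC", "DIT", "NOC", "CBO", "RFC", "LCOM", "Ca", "NPM"]
    for line in result.stdout.splitlines():
        fields = line.split()
        print(fields[0], dict(zip(names, map(int, fields[1:]))))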

2.2.2 The Byte Code Metric Library

The Byte Code Metric Library12 (BCML) is a collection of tools to calculate metrics for Java byte code classes or JAR files in directories, output the results into XML files, and report the results in HTML format.

2.2.3 C and C++ Code Counter

CCCC is a tool which analyses C/C++ files and generates a report on various metrics. The tool13 was developed as an MSc thesis project by Tim Littlefair, who holds its copyright. It is a command line tool that analyses a list of input files and generates HTML and XML reports containing the results. The metrics measured are the most common ones; specifically:

• Summary table of high level metrics summed over all files processed in the
current run.

• Table of procedural metrics (i.e. lines of code, lines of comment, McCabe’s


cyclomatic complexity summed over each module.

• Table of four of the six metrics proposed by Chidamber and Kemerer.

• Structural metrics based on the relationships of each module with others. These include fan-out (i.e. the number of other modules the current module uses), fan-in (the number of other modules which use the current module), and the Information Flow measure suggested by Henry and Kafura, which combines these to give a measure of coupling for the module; a small illustrative computation of this measure follows the list below.

• Lexical counts for parts of submitted source files which the analyser was unable
to assign to a module. Each record in this table relates to either a part of the
code which triggered a parse failure, or to the residual lexical counts relating
to parts of a file not associated with a specific module.
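The Henry and Kafura measure mentioned in the structural metrics entry is commonly formulated as length × (fan-in × fan-out)²; a direct transcription of that formulation (our own, with made-up module figures) looks like this:

    # Henry and Kafura's information flow measure in its commonly cited form:
    # IF(M) = length(M) * (fan_in(M) * fan_out(M)) ** 2.
    # The module figures below are made up for illustration.
    def information_flow(length, fan_in, fan_out):
        return length * (fan_in * fan_out) ** 2

    modules = {"parser": (320, 4, 7), "lexer": (150, 2, 3), "main": (60, 0, 9)}
    for name, (length, fan_in, fan_out) in modules.items():
        print(name, information_flow(length, fan_in, fan_out))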
11 http://www.spinellis.gr/sw/ckjm
12 http://csdl.ics.hawaii.edu/Tools/BCML
13 http://cccc.sourceforge.net/


Figure 6: CCCC, Report for Procedural Metrics

Figure 6 shows the report for procedural metrics for an Open Source project, while
Figure 7 shows the report for Object Oriented Metrics of the same project.

2.2.4 Software Metrics Plug-In for the Eclipse IDE

The Software Metrics14 plug-in for the Eclipse IDE is a powerful add-on for the popular Open Source IDE Eclipse. It is installed, as its name denotes, as a plug-in to Eclipse and is distributed under the same licence as the Eclipse IDE itself. The tool measures Java code against a long list of metrics:

• Lines of Code (LOC): Total lines of code in the selected scope. Only counts
non-blank and non-comment lines inside method bodies.

• Number of Static Methods (NSM): Total number of static methods in the se-
lected scope.

• Afferent Coupling (CA): The number of classes outside a package that depend on classes inside the package.

• Normalised Distance (RMD): RMA + RMI − 1; this number should be small, close to zero, for good packaging design.

• Number of Classes (NOC): Total number of classes in the selected scope.

• Specialisation Index (SIX): Average of the specialisation index, defined as NORM * DIT / NOM. This is a class level metric.
14 http://metrics.sourceforge.net/


Figure 7: CCCC, Report for Object Oriented Metrics

• Instability (RMI): CE / (CA + CE).

• Number of Attributes (NOF): Total number of attributes in the selected scope.

• Number of Packages (NOP): Total number of packages in the selected scope.

• Method Lines of Code (MLOC): Total number of lines of code inside method
bodies, excluding blank lines and comments.

• Weighted Methods per Class (WMC): Sum of the McCabe Cyclomatic Complex-
ity for all methods in a class.

• Number of Overridden Methods (NORM): Total number of methods in the selected scope that are overridden from an ancestor class.

• Number of Static Attributes (NSF): Total number of static attributes in the selected scope.

• Nested Block Depth (NBD): The depth of nested blocks of code.

• Number of Methods (NOM): Total number of methods defined in the selected scope.

• Lack of Cohesion of Methods (LCOM): A measure of the cohesiveness of a class, calculated with the Henderson-Sellers method: if m(A) is the number of methods accessing an attribute A, calculate the average of m(A) over all attributes, subtract the number of methods m and divide the result by (1 − m). A low value indicates a class with a high degree of cohesion; a value close to 1 indicates a lack of cohesion and suggests the class might better be split into a number of (sub)classes. (A sketch of this calculation follows the list below.)

• McCabe Cyclomatic Complexity (VG): Counts the number of flows through a piece of code. Each time a branch occurs (if, for, while, do, case, catch and the ?: ternary operator, as well as the && and || conditional logic operators in expressions) this metric is incremented by one. Calculated for methods only. For a full treatment of this metric see McCabe [McC76].

• Number of Parameters (PAR): Total number of parameters in the selected scope.

• Abstractness (RMA): The number of abstract classes (and interfaces) divided by the total number of types in a package.

• Number of Interfaces (NOI): Total number of interfaces in the selected scope.

• Efferent Coupling (CE): The number of classes inside a package that depend on classes outside the package.

• Number of Children (NSC): Total number of direct subclasses of a class.

• Depth of Inheritance Tree (DIT): Distance from class Object in the inheritance hierarchy.
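The Henderson-Sellers formula quoted for LCOM above translates directly into code. The sketch below is our own transcription; it represents a class simply as a mapping from each attribute to the set of methods that access it:

    # Henderson-Sellers LCOM as described above: average m(A) over all
    # attributes, subtract the number of methods m, and divide by (1 - m).
    def lcom(attribute_access, num_methods):
        """attribute_access maps each attribute to the methods using it."""
        avg = (sum(len(methods) for methods in attribute_access.values())
               / len(attribute_access))
        return (avg - num_methods) / (1 - num_methods)

    # A cohesive class: both methods touch both attributes -> LCOM = 0.
    print(lcom({"x": {"get", "set"}, "y": {"get", "set"}}, num_methods=2))
    # No shared attributes: each method touches only its own -> LCOM = 1.
    print(lcom({"x": {"get_x"}, "y": {"get_y"}}, num_methods=2))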

The user can also set ranges and thresholds for each metric in order to track code quality. Examples of these ranges, with a small sketch of automated threshold checking after the list, are:

• Lines of Code (Method Level): Max 50 - If a method is over 50 lines of code, it is suggested that the method be broken up for readability and maintainability.

• Nested Block Depth (Method Level): Max 5 - If a method contains blocks nested more than 5 levels deep, break up the method.

• Lines of Code (Class Level): Max 750 - If a class has over 750 lines of code, split up the class and delegate its responsibilities.

• McCabe Cyclomatic Complexity (Method Level): Max 10 - If a method's cyclomatic complexity exceeds 10, break up the method.

• Number of Parameters (Method Level): Max 5 - A method should have no more than 5 parameters. If it does, create an object and pass the object to the method.
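The sketch promised above: checking measured values against these maxima outside the IDE. The thresholds are taken from the list; the metric values are placeholders that a collection tool would normally supply:

    # Checking method-level metrics against the maxima listed above.
    # The measured values below are illustrative placeholders.
    THRESHOLDS = {"MLOC": 50, "NBD": 5, "VG": 10, "PAR": 5}

    methods = {
        "Parser.parse":    {"MLOC": 180, "NBD": 7, "VG": 23, "PAR": 2},
        "Lexer.nextToken": {"MLOC": 35,  "NBD": 3, "VG": 8,  "PAR": 1},
    }

    for method, metrics in methods.items():
        for name, value in metrics.items():
            if value > THRESHOLDS[name]:
                print(f"{method}: {name} = {value} exceeds max {THRESHOLDS[name]}")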


Figure 8: Metrics, List of metrics

As can be seen from this list, the tool is rather extensive and the set of metrics measured is exhaustive. A view of the plugin displaying the results of a measurement is shown in Figure 8.
The tool also displays the dependency connections among the various packages
and classes of a project analysed as a connected graph. An example of this graph is
shown in Figure 9.

2.3 Static Analysis Tools


These tools analyse a program’s source code and locate bugs and problematic constructions. If a tool simply collects metrics, it is listed under metrics collection tools. We limit this section to tools that are Open Source and candidates for SQO-OSS data generation. Wikipedia maintains an exhaustive list of such tools.

2.3.1 FindBugs

FindBugs15 looks for bugs in Java programs. It is based on the concept of bug pat-
terns.
15 http://findbugs.sourceforge.net


Figure 9: Metrics, Dependency Graph

2.3.2 PMD

PMD16 scans source code and looks for potential problems: possible bugs, unused and suboptimal code, over-complicated expressions and duplicate code.

2.3.3 QJ-Pro

QJ-Pro17 is a tool-set for static analysis of Java source code: a combination of auto-
matic code review and automatic coding standards enforcement.

2.3.4 Bugle

Bugle18 uses Google code search queries to locate security vulnerabilities.

2.4 Hybrid Tools


Hybrid tools analyse both process and project data.
16 http://pmd.sourceforge.net
17 http://qjpro.sourceforge.net
18 http://www.cipher.org.uk/index.html?p=projects/bugle.project


2.4.1 The Empirical Project Monitor

The Empirical Project Monitor19 (EPM) provides a tool for automated collection and
analysis of project data. The current version uses CVS, GNATS, and Mailman as data
sources.

2.4.2 HackyStat

Hackystat20 is a framework for automated collection and analysis of software engineering product and process data. Hackystat uses sensors to unobtrusively collect data from development environment tools; there is no chronic overhead on developers to collect product and process data. Hackystat does not tie you to a particular tool, environment, process, or application. It is intended to provide in-process project management support.

2.4.3 QSOS

QSOS21 is a method designed to qualify, select and compare free and Open Source software in an objective, traceable and argued way. It is publicly available under the terms of the GNU Free Documentation License.

2.5 Commercial Metrics Tools


This section aims to document several popular commercial software metrics tools. Where possible, we attempted to assess the properties of commercial metrics tools in a highly heterogeneous and ever-changing software development environment. The chosen tools are therefore able to support the generation and storage of metric data consistently and in a structured way, and provide some degree of customisation with development-specific parameters. Based on the most common categorisation of metrics tools mentioned earlier, both product and process metrics tools will be documented.

2.6 Process metrics tools


This section documents several software process metrics tools. Apart from the presentation of the tools, an assessment of their capabilities is performed where possible. The evaluation is based on three basic criteria indicated by other studies [A.I89]: platform independence, input/output functions and automation.
19 http://www.empirical.jp
20 http://www.hackystat.org
21 http://www.qsos.org


Platform The first step in utilising any tool is to install it on an operating system. In the worst case, a tool's platform requirements cannot be fulfilled by an existing environment, which means a new OS would have to be added, i.e., bought, installed and maintained. Another platform issue is database support: some tools are based on a metric repository and have to rely on some sort of relational database. The range of supported databases affects a tool's platform interoperability. As some of the tools have both server and client components (for data storage and collection/reporting purposes, respectively), one has to consider these components' platform interoperability separately.

Input/output Software project quality tracking and estimation tools heavily rely on
data from external sources such as UML modelling tools, source code anal-
ysers, work effort or change request databases etc. The ease of connecting
to these applications through interfaces or file input substantially influences
a metric tool’s efficiency and error-proneness. On the other hand, data often
has to be exported for further processing in spread-sheets, project manage-
ment tools or slide presentations. Reports and graphs have to be created and
possibly viewed, posted on the Web, or printed.

Automation A key aspect of metric data processing is automatic data collection. This can range from simple alerts sent to project managers under certain conditions, through periodic extraction of metric information from external tools, to advanced scripting and programming capabilities. Missing automation usually requires tedious and expensive manual data input, and makes measurement inconsistencies more likely, as measurements are performed by different persons.

2.6.1 MetriFlame

MetriFlame22, a tool for managing software measurement data, is strongly based on the GQM approach [BCR94]. A goal is defined, then corresponding questions and metrics are determined to assess whether the goal has been reached. Metrics can only be accessed through such a GQM structure; it is not possible to simply collect metrics without having to formulate goals and questions. The main elements of the MetriFlame tool environment are: the actual MetriFlame tool, data collectors and converters, and components for viewing the results. MetriFlame does not feature a project database; it stores all its data in different files with proprietary formats. The functionality that the tool offers is summarised in Figure 10. MetriFlame supports 32-bit Microsoft Windows environments (Windows 95 and later versions). The database converter requires the Borland Database Engine (BDE) in order to access the different types of databases. BDE is installed during the MetriFlame installation procedure. Data can be imported to MetriFlame by using the so-called data converters,
22 http://www.virtual.vtt.fi


Figure 10: Metriflame functionality

which are not part of the MetriFlame tool, but separate programs. These programs
convert the data and generate structured text files, which can then be imported into
MetriFlame. New data can also be entered manually. The process of data collection
cannot be automated. Project data can only be saved in a MetriFlame project file;
no other file format is available. Reports (graphs) can be saved as WMF, EMF, BMP,
JPEG or structured text. MetriFlame does not feature an estimation model.

2.6.2 Estimate Professional

Estimate Professional23 is a tool to control project data, create project estimates based on different models or historical project data, and visualise these estimates.
Different scenarios can be created by changing project factors. Estimate Profes-
sional is an extended and improved version of “Estimate”, a freely available pro-
gram, which can perform only basic size-based estimates, does not feature reporting
and does not consider risk factors. Estimate Professional does not feature a project
database; it stores all project information in a single file. Initially, project data is
entered by creating a new project and starting the estimate wizard. After specifying
project related information like type of project, current phase of project, maximum
schedule, priority of a short schedule, one has to choose between size-based estima-
tion, which focuses on artifact metrics (LOC, number of classes, function points), and
effort-based estimation, which focuses on effort metrics (staff-months). Estimates in
Estimate Professional are based on three models: Putnam Methodology, COCOMO
II and Monte Carlo Simulation. Estimates can be calibrated in three ways: Using the
outcome of historical projects from the project database, altering the project type
23 http://www.workflowdownload.com/


Figure 11: Estimate Professional.

by choosing subtypes for parts of the project or tuning the estimation by changing
productivity drivers like database size or programmer capability. A screenshot of the
tool is presented in Figure 11. Estimate Professional supports MS Windows 95/98,
NT 4.0 and 2000. For installation on NT systems, administrator rights are required.
Project data can be imported from a Microsoft Project file or from a CSV file. The
process of data collection cannot be automated. Project data can be exported to a
Microsoft Project file; project metrics can be exported into a CSV file.

2.6.3 CostXpert

The software cost estimation tool CostXpert24 produces estimates of project dura-
tion, costs, staff effort, labour costs etc. using software size, labour costs, risk fac-
tors and other input variables. The tool features mappings of source lines of code
equivalents for more than 600 different programming languages. The main menu
of the tool is presented in Figure 12. Import of project data is limited to manual
entry. Data connectors to tools processing software artifacts do not exist. Data can
be exchanged between different copies of CostXpert via CostXpert project files. The
process of data collection cannot be automated. Regarding the estimation process, CostXpert integrates multiple software sizing methods and is compliant with COCOMO and more than 32 lifecycles and standards. CostXpert is designed to aid project control, facilitate process improvement and achieve a greater return on investment (ROI). Especially for COTS products, the tool is able to estimate the portion of the package that needs no modification but should be configured and parameterised, the portion of the package that needs to be modified, and the amount of functionality
24 http://www.costxpert.com/


Figure 12: Cost expert main menu

that should be added to the system. Project data in a work breakdown structure
can be exported to Microsoft Project or Primavera TeamPlay. The expected labour
distribution can be exported to a CSV file. Customised project types, standards and
lifecycles can be exported to so-called customised data files. Reports can be printed
or exported as PDF, RTF or HTML files. Graphs can be exported as BMP, WMF or
JPEG files. CostXpert integrates more than 40 different estimation models based on data from over 25,000 software projects. CostXpert supports MS Windows 95 and all later versions. CostXpert does not feature a project database; project data is stored in a project file in a proprietary format.

2.6.4 ProjectConsole

ProjectConsole25 is a Web-based tool for project control that offers project reporting capabilities to software development teams. Project information can be extracted from Rational tools or other third-party tools, is stored in a database, and can be accessed through a Web site. Rational ProjectConsole makes it easy to monitor the status of development projects and to utilise objective metrics to improve project predictability. Rational ProjectConsole greatly simplifies the process of gathering metrics and reporting project status by creating a project metrics Web site based on data collected from the development environment. This Web site, which Rational ProjectConsole updates on demand or on schedule, gives all team members a complete, up-to-date view of the project environment. Rational ProjectConsole collects metrics from the Rational Suite development platform and from third-party products, and presents the results graphically in a customisable format to help the assessment
25 http://www-128.ibm.com


Figure 13: Rational ProjectConsole.

of the progress and quality. Rational ProjectConsole supports MS Windows XP; Win-
dows NT 4.0 Server or Workstation, SP6a or later and Windows 2000 Server or
Professional, SP1 or later. All the data is stored in a database, the so-called metric
data warehouse. Supported databases include SQL Server, Oracle and IBM DB2.
ProjectConsole needs a Web server (IIS or Apache Tomcat) to publish its data over
a network (local network or the Internet). The project Web site can be viewed with
any browser. ProjectConsole can extract metrics directly from Rational ClearQuest, RequisitePro, Rose, and Microsoft Project repositories. In addition, ProjectConsole provides so-called collection agents that can parse Rational Purify, Quantify, Coverage, and ClearCase data files. Automatic collection tasks can be scheduled to run daily, weekly or monthly at a specified date and time. The data is extracted from the source programs and stored in the metric data warehouse. The project Web site is automatically updated. Graphs are stored in PNG files. Data can be published in tables and exported into HTML format. MS Excel 2000 or later can be used to import the HTML table format. ProjectConsole does not feature an estimation model. Figure 13 depicts the multi-chart display of ProjectConsole.

2.6.5 CA-Estimacs

Rubin has developed a proprietary software estimating model 26 that utilises gross
business specifications for its calculations. The model provides estimates of total
development effort, staff requirements, cost, risk involved, and portfolio effects.
The ESTIMACS model addresses three important aspects of software management: estimation, planning, and control. The ESTIMACS system includes five modules.
26 http://www.ca.com/products/estimacs.htm


The first module is the system development effort estimator. This module requires responses to 25 questions regarding the system to be developed, the development environment, etc. It uses a database of previous project data to calculate an estimate of the development effort. The staffing and cost estimator is a second module. Inputs required for this module are the effort estimate from above, data on employee productivity, and the salary for each skill level. Again, a database of project information is used to compute the estimate of project duration, cost, and staffing required. The hardware configuration estimator requires as input information on the operating environment for the software product, the total expected transaction volume, the generic application type, etc. Its output is an estimate of the required hardware configuration. The risk estimator module calculates risk using answers to some 60 questions on project size, structure, and technology. Some of the answers are computed automatically from other information already available. Finally, the portfolio analyser provides information on the effect of this project on the total operations of the development organisation. It provides the user with some understanding of the total resource demands of the projects.

2.6.6 Discussion

The tools evaluated provide a broad variety of analysis capabilities and different degrees of explicit estimation support. However, they all allow storing and comparing project measures in a structured way. Certain conclusions can be drawn on whether the tools can integrate seamlessly into an existing and heterogeneous software development environment. All of the evaluated tools are only available on one operating system (MS Windows). This is particularly problematic for server components, as in many cases a dedicated server would have to be added to an otherwise Unix-based server farm. Some tools only work with particular database engines, for example ProjectConsole. In addition to manual data entry, the tools are generally restricted to a few input file formats (e.g. Estimate Professional only reads Microsoft Project and CSV files). While communication with spreadsheet applications is usually supported, few tools can access development tools like integrated development environments (IDEs) or requirement databases directly. Tools with advanced metric data collection capability (like MetricCenter) offer only a limited set of connectors to specific development tools, which have to be purchased separately, and their communication protocol is undisclosed. Automation support is either not available (MetriFlame, Estimate Professional, CostXpert) or limited to pull operations (MetricCenter). The degree of flexibility with respect to defining new metrics and changing reports differs greatly; however, all tools provide only basic reporting flexibility. This would not be a problem in itself if the tools allowed unrestricted data access for online analytical processing (OLAP) reporting tools, but this is not possible with most of the tools either. Data output for further processing is sometimes limited to CSV files and a proprietary file format (MetriFlame). Tools often do not support common reporting file formats like PDF. Output automation is supported by few of the evaluated tools (MetricCenter, ProjectConsole). Some tools, instead of supporting integration, seem to duplicate features which are normally already available in medium and large-scale IT environments: some introduce a proprietary file format (MetriFlame), or are limited to a particular database system instead of accessing the company's existing, reliable database infrastructure. Some basic graphical reporting and Web-publishing features are provided, instead of feeding advanced OLAP reporting tools, whose use would also automatically eliminate the need to duplicate features for the handling of user access rights. Finally, the difficulties in getting access to some tools create an additional cost barrier to integrating them into existing IT environments, and seem to indicate that at least some of these tools do not provide user interfaces with a low learning curve. Altogether, process engineers and portfolio managers operating in highly dynamic environments must still expect substantial costs when evaluating, integrating, customising, operating and continuously adapting planning and monitoring tools. Even tools with advanced architectures like MetricCenter offer a limited set of supported development tools, restricted customisation capabilities due to undisclosed data protocols, and platform restrictions. Proprietary approaches to security and user access concerns further complicate integration. Much work needs to be done to lower the technological barrier for collecting software metrics in a varying and changing environment. Possible approaches to some of the current problems are likely to embrace the support of modern file formats like XML, and lightweight data communication using, for example, the SOAP protocol.

2.7 Product metrics tools


The initial target of product metrics tools was the assessment of objective measures
of software source code regarding size and complexity. As experience has been
gained with metrics and models, it has become increasingly apparent that metric
information available earlier in the development cycle can be of greater value in
controlling the process and results. Along with the calculation of several metrics values, the tools attempt to support testing procedures as well, taking into consideration the information coming from the metrics values. In this section a number of
wide use or because they represent a particularly interesting point of view. The tools
presented reflect the areas where most work on product metrics has been done. Ref-
erences have been provided for readers who are interested in further examining a
tool.


2.7.1 CT C++ -CMT++-CTB

CT C++, CMT++ and CTB27 are all tools developed by the Finnish company Testwell and available from Verifysoft for Microsoft Windows, Solaris, HP-UX and Linux. They focus on test coverage (CT C++), metric analysis (CMT++) and unit testing (CTB) for C/C++ source code. CT C++ is a coverage tool supporting testing and tuning of programs written in C and C++. This coverage analyser supports function, decision, statement, condition and multi-condition coverage, presenting the result in a text or HTML report. The analyser is available for coverage measurement on the host, for operating systems, as well as for embedded systems. The tool is integrated with Microsoft Visual C++, the Borland compiler and WindRiver Tornado. CMT++ is a tool for assessing code complexity. Code complexity has an effect on how difficult it is to test and maintain an application, and complex code is likely to contain errors. Metrics like McCabe cyclomatic complexity, Halstead's software metrics and lines-of-code measures are supported by the tool. The tool can be customised by the user for company coding standards. CMT++ identifies complex and error-prone code. As there is usually too little time to inspect all the code carefully, it is an important step to select the most error-prone modules. CMT++ also gives an estimate of the number of test cases needed to test all paths of a function and gives an idea of how many bugs one should expect to find to have "clean" code. CTB is a module testing tool for the C programming language that allows testing code at a very early development stage, helping to prevent bugs. As soon as a module compiles, a test bed can be generated for it without any additional programming. The tool supports a specification-based (black-box) testing approach, from "ad-hoc" trials to systematic script-based regression tests. Tests can run in an interactive mode with a C-like command interface, as well as script- or file-based and automated. Script-based test execution behaves as if the test driver read the test main program and immediately executed it command by command, showing what happens. CTB works together with coverage analysis tools, such as CT C++.

2.7.2 Cantata++

Cantata++28 is a commercial tool for unit and integration testing, coverage and static analysis. The tool is built on the Eclipse v3.2 Open Source development platform, including the C Development Tools (CDT). The unit and integration testing capabilities of the environment support automated test script generation by parsing source code to derive parameter and data information, with stubs and wrappers automatically generated into the test script. Stubs provide programmable dummy versions of external software, while wrappers are used for establishing programmable interceptions of calls to the real external software. The building and the execution of tests, black and white
27 http://www.verifysoft.com
28 http://www.ipl.com/products/tools/pt400.uk.php


Figure 14: Cantata++ V5 - a fully integrated Test Development and Analysis Environment

box, is supported both by the tool and via the developer’s build system. Verification of the code is also supported by providing sequential execution of test cases based on wrappers and stubs. The test cases defined in verification can be reused for inherited classes and template instantiations. Figures 14 and 15 present the environment of the tool.
Coverage analysis provides a measurement of how effective testing has been in executing the source code. Configurable coverage requirements are defined in rule sets that are integrated into dynamic tests, resulting in a Pass/Fail verdict for the coverage requirements. The coverage metrics used by the tool are the following:

• Entry points

• Call Returns

• Statements

• Basic Blocks

• Decisions (branches)

• Conditions MC/DC (for DO-178B)

Cantata has certain features that support coverage especially for applications developed in Java, such as reuse of JUnit tests with coverage by test case, and builds with ANT. Static analysis generates over 300 source code metrics. The results of these metrics are stored in reports that can be used to help enforce code quality standards. The metrics defined are both procedural and product metrics. Procedural


Figure 15: Automated Test Script, Stub and Wrapper generation

metrics involve code lines, comments, functions and counts of code constructs. Product metrics calculate the Myers, MOOSE, MOOD, QMOOD, McCabe, Halstead, Hansen, Robert Martin, Object Oriented, and Bansiya's Class Entropy metrics. Cantata++ can be integrated with many development tools, including debuggers, simulators/emulators, UML modelling, project management and code execution profilers.

2.7.3 TAU/Logiscope

Logiscope29 supports automated error-prone module detection and code reviews for bug detection. This is enabled by the use of quality metrics and coding rules to identify the modules that are most likely to contain bugs. Finally, the tool provides direct connection to the faulty constructs and improvement recommendations.
There is a set of predefined coding and naming rules or quality metrics, which can
be customised to comply with specific types of project and organisational guidelines
along with reuse industry standards. The main aspect of the tool is the establish-
ment of best coding practices that are used both to test the existing code and to
train developers. Logiscope supports three basic functions: RuleChecker, Audit and TestChecker. RuleChecker checks code against a set of programming rules, preventing language traps and code misunderstandings. There are over 220 coding and naming rules initially in the tool, with the potential for further rules to be added. Logiscope Audit locates error-prone modules and produces quantitative information, based on software metrics and graphs, that is used for the analysis of problems and for making corrective decisions. The decision may involve either rewriting the module or testing it more thoroughly. Software metrics templates used to evalu-
29 http://www.telelogic.com/products/logiscope/index.cfm


Figure 16: Results presented in Logiscope

ate the code are ISO 9126 compliant. Templates, as mentioned, can be customised to fit project-specific requirements. Logiscope TestChecker measures structural code coverage and shows uncovered source code paths, helping to discover bugs hidden in untested source code. TestChecker is based on a source code instrumentation technique that is adaptable to test environment constraints. Figure 16 shows the way results are depicted by Logiscope. All three functions of the tool are based on internationally recognised standards and models such as SEI/CMM, DO-178B and ISO/IEC 9126 and 9001. Several techniques are supported that methodically track software quality for organisations at SEI/CMM Level 2 (repeatable) that want to reach Level 3 (defined) and above. “Reviews and Analysis of the Source Code” and “Structural Coverage Analysis”, as required by the avionics standard DO-178B for software systems from Levels E to A, are partially supported by Logiscope, as are the “Quality Characteristics” defined by ISO/IEC 9126. The Logiscope product line is available for both UNIX and Windows.

2.7.4 McCabe IQ

McCabe30 IQ manages software quality through advanced static analysis based on McCabe's research in software quality measurement, and tracks the system's metric values over time to document the progress made in improving the overall stability and quality of the project. The tool identifies error-prone code by using several metrics:

• McCabe Cyclomatic Complexity


30 http://www.mccabe.com/iq.htm


Figure 17: Battlemap in McCabe IQ

• McCabe Essential Complexity

• Module Design Complexity

• Integration Complexity

• Lines of Code

• Halstead

By using the above metrics complex code is identified. Figure 17 shows an example
of how complex code identification is presented to the user. The Battlemap uses
colour coding to show which sections of code are simple (green), somewhat complex
(yellow), and very complex (red). Figure 18 presents the metric statistics that the
tool calculates. Another supported function is the tracking of redundant code using a module comparison tool. This tool allows the selection of predefined search criteria or the establishment of new criteria for finding similar modules. After the selection of the search criteria, the process is as follows: selection of the modules to be used for matching, specification of the programs or repositories that will be searched, and finally localisation of the modules that are similar to the ones used for matching, based on the search criteria selected. It is then determined whether there is any redundant code; if redundant code is found, it is evaluated and, if needed, reengineered. The tool provides a series of data metrics. The parser analyses the data
declarations and parameters in the code. The result of this analysis is the produc-
tion of metrics based on data. There are two kinds of data-related metrics: global
data and specified data. Global data refers to those data variables that are declared
as global in the code. Based on the result of the parser’s data analysis reports are


Figure 18: Presentation of the metrics statistics

produced that show how global data variables are tied to the cyclomatic complexity
of each module in code. As cyclomatic complexity and global data complexity in-
crease, so does the likelihood that the code contains errors. Specified data refers
to the data variables that are specified in what is called a specified data set in the data dictionary. In general, a data set is specified in the data dictionary when one or more variables have to be located in the code in order to analyse their association with the complexity of the modules in which they appear. The tool includes a host of
tools and reports for locating, tracking, and testing code containing specified data,
as well as for enforcing naming conventions. The tool is platform independent and
supports Ada, ASM86, C, C++ .NET, C++, COBOL, FORTRAN, Java, JSP, Perl, PL1, VB, and VB.NET.
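The battlemap's colour coding can be reproduced with a trivial classifier. The thresholds below follow commonly cited McCabe guidance (a cyclomatic complexity of up to 10 is usually considered low risk) and are our own assumption, not necessarily the tool's exact boundaries:

    # A toy version of the battlemap colour coding; thresholds are assumed,
    # and the module names and complexity values are made up for illustration.
    def battlemap_colour(cyclomatic_complexity):
        if cyclomatic_complexity <= 10:
            return "green"    # simple
        if cyclomatic_complexity <= 20:
            return "yellow"   # somewhat complex
        return "red"          # very complex

    for module, vg in {"init": 4, "parse": 17, "dispatch": 33}.items():
        print(module, vg, battlemap_colour(vg))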

2.7.5 Rational Functional Tester (RFT)

Rational Functional Tester31 is an automated functional and regression testing tool for Java, Visual Studio .NET and Web-based applications. It provides automated ca-
pabilities for activities such as data-driven testing and it includes pattern-matching
capabilities for test script resiliency in the face of frequent application user inter-
face changes. RFT incorporates support for version control to enable parallel de-
velopment of test scripts and concurrent usage by geographically distributed teams.
The tool includes several components. IBM Rational Functional Tester Extension
for Siebel Test Automation provides automated functional and regression testing for
Siebel 7.7 applications. Combining advanced test development techniques with the
simplification and automation of basic test needs, Rational Functional Tester Exten-
31 http://www-306.ibm.com/software/awdtools/tester/functional/features/index.html


sion for Siebel Test Automation accelerates the process of system test creation, exe-
cution and analysis to ensure the early capture and repair of application errors. IBM
Rational Manual Tester is a manual test authoring and execution tool for testers and
business analysts. The tool enables test step reuse to reduce the impact of software
change on manual test maintenance activities and supports data entry and verifi-
cation during test execution to reduce human error. IBM Rational TestManager is a tool for managing all aspects of manual and automated testing from iteration to iteration. It is the central console for test activity management, execution and reporting, supporting manual test approaches and various automated paradigms including unit testing, functional regression testing, and performance testing. Rational Test-
Manager is meant to be accessed by all members of a project team, ensuring the
high visibility of test coverage information, defect trends, and application readiness.
IBM Rational Functional Tester Extension for Terminal-based Applications allows
the testers to apply their expertise to the mainframe environment while continuing
to use the same testing tool used for Java, VS.NET and Web applications.

2.7.6 Safire

SAFIRE32 Professional is a fully integrated development and run-time environment optimised for the implementation, validation and observation of signalling systems.
It is used for a wide range of applications, such as gateways, signalling testers and
protocol analysers. The tool is based on international standards, such as UML, SDL,
MSC, ASN.1 and TTCN (ITU-T, ETSI, ANSI, ISO). SAFIRE supports testing features
for signalling systems that can be validated to various levels of confidence, from top-
level tests to detailed conformance tests according to international standards. The
tests generated are automated, deterministic, reproducible and documented. The
tool has a modular architecture that involves the following components:

• SAFIRE Designer - graphical editor, viewer, compiler

• SAFIRE Campaigner - test execution and report generator

• SAFIRE Animator - slow motion replay (actions, events, behaviour)

• SAFIRE Tracer - protocol analyser

• SAFIRE Organiser - version control and project management

• SAFIRE VM Virtual Machine - high performance virtual machine

The component that is most involved in quality assurance is the Campaigner that
supports automated execution of tests. This component creates, edits, manages and
executes test campaigns allowing the configuration of parameters. Campaigner also
32 http://www.safire-team.com/products/index.htm


produces test reports in the form of pass or fail verdicts for modules. The tool also allows automated repetition of certain tests. The quality rules used during the design and testing of the code are the following:

• System structure

• Naming conventions-existence

• Naming conventions-properties

• SDL simplicity

• Uniqueness

• Modularity

• Proper-functionality

• Comments

• Communication

• Events

• Behaviour

2.7.7 Metrics 4C

Metrics4C33 calculates software metrics for individual modules or for the entire
project. These tools run interactively or in the background on a daily, weekly, or
monthly basis. The software metrics calculated for an individual module include:

• Lines of code

• Number of embedded SQL lines

• Number of blank lines

• Number of comment lines

• Total number of lines

• Number of decision elements

• Cyclomatic complexity

• Fan out
33 http://www.plus-one.com


The above values are then summed to provide their respective project metrics. In
addition, other project metrics calculated include:

• Average project cyclomatic complexity

• Project fan out metric (with and without leaf nodes)

• Total number of procedures and functions

• Total number of source code and header files

• Lines of code in source code and header files

• Total number of source code files unit tested

• Number of embedded SQL statements

• Lines of code unit tested

• Percent of files unit tested

• Integration Test Percentage

The Integration Test Percentage (ITP) provides a numeric value indicating how much
of the project’s source code has been tested and can be used to better prepare for
Formal Qualification Testing (FQT). Output from Metrics4C can easily be imported
into a spreadsheet program to graphically display the data. Metrics4C can also flag
warnings if the lines of code or the cyclomatic complexity value exceeds a specified
maximum.

2.7.8 Resource Standard Metrics

Resource Standard Metrics34 generates source code metrics for C/C++ and Java on any operating system. The tool measures source code quality metrics and complexity from the written source code, with the goal of evaluating the project's performance. Source code metric differentials can be determined between baselines using RSM code differential work files. Source code metrics (SLOC, KSLOC, LLOC) from this tool can provide line-of-code-derived function point metrics. RSM is compliant with ISO 9001, CMMI and AS9100. Typical functionality of RSM enables:

• Determination of source code LOC, SLOC and KSLOC for C, C++ and Java.

• Measurement of software metrics for each baseline and determination of metrics differentials between baselines.
34 http://msquaredtechnologies.com/


• Capturing of baseline code metrics, independent of metrics differentials, in order to preserve history.

• Reporting of CMMI and ISO metrics for code compliance audits.

• Performance of source code static analysis, best used for code peer reviews.

• Removal of tabs and conversion from DOS to UNIX format.

• Measurement and analysis of source code for outsourced or subcontracted code.

• Measurement of cyclomatic code complexity and analysis of interface complexity for maintenance.

• Creation of user-defined code quality notices with regular expressions, or utilisation of the 40 predefined code quality notices.

2.7.9 Discussion

Most of the testing and product metrics tools provide the online capability to record
defect information including severity, class, origin, phase of detection, and phase
introduced. Several tools automate the testing procedure by providing estimation
of error prone code and automatically generating results and reports. Metrics tools
provide a variety of metrics reports or transport data into spreadsheets or report
generators. Query and search capabilities are also provided. Users have the ca-
pability to customise tools to meet their organisation’s unique requirements. For
example, users can customise quality rules, workflow, queries, reports, and access
controls. Other common features of the tools studied include:
• Graphical user interface.

• Integration with databases, spreadsheets, version control tools, configuration management systems, test tools, and e-mail systems.

• Support for ad hoc queries and reports.

• Support for standards, e.g. CMMI, DoD-STD-2167A and ISO 9000.

• Support for distributed development.

• Ability to link defects and track duplicate defect reports.


Metrics capabilities of tools in most cases involve:
• Data gathering.

• Measurement analysis.

• Data reporting.


3 Empirical OSS Studies

3.1 Evolutionary Studies


3.1.1 Historical Perspectives

Back in 1971, in his book “The Psychology of Computer Programming,” Gerald M. Weinberg was probably the first to analyse so-called “egoless programming,” meaning non-selfish, altruistic programming. The term was used to describe the function of a software development environment in which volunteers participate actively by discovering and fixing bugs, contributing new code, expressing ideas, and so on, without any direct material reward. Weinberg observed that when developers are not territorial about their code and encourage other people to look for bugs and potential improvements, improvement happens much faster [Wei71].
Several years later, Frederick P. Brooks, in his classic “The Mythical Man-Month: Essays on Software Engineering,” predicted that OSS developers would play a significant role in software engineering in the future. In addition, he claimed that maintaining a widely used program typically accounts for 40% or more of the cost of developing it. This cost is strongly affected by the number of users or developers of the specific project: as more people find more bugs and other flaws, the overall cost of the software is reduced. Brooks concluded [Bro75] that this is why OSS can be competitive with, and sometimes even better than, conventionally built software.
In his influential article “The Cathedral and the Bazaar,” Eric Steven Raymond gathered and presented the main features of OSS development. Starting with the analysis of his own OSS project, Fetchmail, he distinguished the classical “Cathedral-like” way of developing commercial software from the new, “Bazaar-like” world of Linux and other FOSS projects. Eventually, he came up with a series of lessons to be learned, which can very well serve as principles that make a FOSS project successful [Ray99].
According to the OSS history written by Peter H. Salus [Sal], there are indications
that OSS development has its roots in the 1980s or even earlier. But Raymond’s arti-
cle was actually the first attempt at a systematic approach to OSS and its methods.
His work, though, has met a lot of opposition, both in the FOSS community [DOS99]
and in academic circles [Beza, Bezb], as being too simplistic and shallow. No mat-
ter how controversial Raymond’s article is, its main contribution is that it raised
widespread interest in OSS empirical studies. Since the dawn of the new millen-
nium, a substantial number of research essays on this subject have been published.
Some findings of these essays are described below, in order to let us gain a deeper
understanding of the evolution of several famous OSS projects.


3.1.2 Linux

The Linux operating system kernel is the best-known FOSS project worldwide; there-
fore it is a case worthy of closer study. The Linux project started in 1991 as a private
research project by a 22-year-old Finnish student named Linus Torvalds. Being dis-
satisfied with the existing operating systems, he started programming a kernel him-
self, based on code and ideas from Minix, a tiny Unix-like operating system. Linux’s
first official release, version 1.0, appeared in March 1994.
Today, Linux is one of the dominant computer operating systems, enjoying world-
wide acceptance. It is a large system: it contains over four million lines of code and
new versions are released very often. It has engaged hundreds of developers, who
have willingly dedicated a lot of their time to fix bugs, develop new code and report
their ideas for its evolution. According to the relevant Wikipedia article, it is estimated
that Linus Torvalds himself has contributed only about 2 per cent of Linux’s code,
but he remains the ultimate authority on what new code is incorporated into the
Linux kernel. Such a case is definitely a fine example of how a FOSS community
can work successfully by gathering the efforts of a large, geographically distributed
community of software specialists.
The growth of the Linux operating system began by following two parallel paths:
the stable and the development releases. The stable release contains features that
have already been tested, showing proven stability, ease of use and lack of bugs.
The development release contains more features that are still in an experimental
phase; therefore it lacks stability and contains more bugs. As one would expect,
there are more development releases than stable ones. Also, the features of develop-
ment releases that have been adequately tested are incorporated into the next stable
release. This development concept has played a big part in the project’s success, as
it provides conventional users with a reliable operating system (the stable release)
and at the same time gives software developers the freedom to experiment and try
new features (the development release).
Following Raymond’s analysis of the development method of the Linux operating sys-
tem, Godfrey and Tu presented a study of Linux’s evolution over the years from
1994 to 1999 [GT00]. As they say, most might think that as Linux got bigger and
more complex, its growth rate should slow down. This is also what the well-known
Lehman’s laws of software evolution suggest: “as systems grow in size and com-
plexity, it becomes increasingly difficult to insert new code” [LRW+ 97]. In the same
context, Turski analysed several large software systems that were all created and
maintained by small, predefined teams of developers using traditional management
techniques. From his study, Turski posits that system growth is usually sub-linear;
that is, a software system’s growth slows down as the system grows in volume and
complexity [Tur96]. Parnas also referred to this subject, comparing software aging
with human aging [Par94].
Figure 19: Growth of the compressed tar file for the full Linux kernel source release,
([GT00], p.135).

But the findings of Godfrey and Tu after studying the evolution of Linux indicated
a different trend. The methodology that they employed was to examine Linux both
at the overall system level and at each one of the major subsystems. In this way,
they were able to study not just the whole system’s evolution in size, but each major
subsystem’s volume as well. This approach can provide us with more information, as
it is not guaranteed that each and every subsystem follows the same evolution patterns
as the overall system. A sample of 96 kernel versions was selected, including 34
stable releases and 62 development releases. Two main metrics were used in this
research: the size of tar files and the number of lines of code (LOC). A tar file
includes all the source artifacts of the kernel, such as documentation, scripts and
other files, but no binary files. LOC were counted in two ways: with the Unix command
wc -l (which included blank lines and comments) and with an awk script (which ignored
blank lines and comments).
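As a rough illustration of these two counting conventions (a sketch of our own, not
the scripts used in [GT00]; real C sources need fuller handling of block comments),
the two counts could be obtained as follows:

```python
def count_loc(path):
    # Total physical lines (the wc -l convention) and lines that are neither
    # blank nor obvious full-line comments (the awk-script convention).
    # Multi-line /* ... */ comments are not tracked; this is a simplification.
    total = code = 0
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            total += 1
            stripped = line.strip()
            if stripped and not stripped.startswith(("/*", "*", "//")):
                code += 1
    return total, code

# Hypothetical path into a kernel source tree.
total, code = count_loc("drivers/net/ne2000.c")
print(f"wc -l style: {total}; without blanks/comments: {code}")
```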
Regarding the overall system’s growth, the results of this research show that the
development releases grew at a super-linear rate over time, while the stable releases
grew at a much slower rate (Figures 19 and 20). These tendencies are common for
both metrics that were used. It is therefore clear that Linux’s development releases
follow an evolution pattern that differs from Lehman’s laws of software evolu-
tion. We can support the view that this happens due to the way development releases
are built: they attract capable developers who are willing to contribute to the sys-
tem’s growth. As the project’s popularity rises, more developers are attracted to it
and more code is contributed. The stable releases, which follow a more conservative
development path and do not accept new contributions too easily, show a slower rate
of size growth.
Figure 20: Growth in the number of lines of code measured using two methods: the
Unix command wc -l, and an awk script that removes comments and blank lines,
([GT00], p.135).

As for the growth of major subsystems, Godfrey and Tu selected the following 10
subsystems:

• drivers: contains the drivers for various hardware devices

• arch: contains the kernel code that is specific to particular hardware architec-
tures/CPU’s

• include: contains most of the system’s include (header) source files

• net: contains the main networking code

• fs: contains support for various kinds of file systems

• init: contains the initialisation code for the kernel

• ipc: contains the code for inter-process communications

• kernel: contains the main kernel code that is architecture independent

• lib: contains the library code

• mm: contains the memory management code

Figure 21: Growth of SLOC of the major subsystems of Linux (development
releases), ([GT00], p.138).

Figure 21 shows the evolution of each one of these subsystems in terms of LOC.
We notice that the drivers subsystem is both the biggest subsystem and the one with
the fastest growth. In Figure 22, a comparative analysis of each subsystem’s LOC
versus the overall system’s LOC is presented. We can see that drivers occupy more than 60
per cent of the total system’s size and this percentage is continuously growing. This
fact can be explained as a result of Linux’s rising popularity: more users wish to run
it with many different types of devices, therefore the respective drivers have to be
included in the system.
A recent study of Linux’s evolution was published by Robles [Rob05]. He employed
a methodology similar to that of Godfrey and Tu, but examined all the available re-
leases of Linux (both stable and development) till December 2004, instead of picking
a sample. The measurements in this research were made with the SLOCCount tool,
which counts source lines of code in identified source code files. The kernel had
grown a lot in comparison to the previous survey: the number of SLOC and the size
of the tar file had more than doubled. This trend is visible in Figures 23 and 24: the
super-linearity of Linux’s evolution is even more remarkable during recent years.
Like Godfrey and Tu, Robles also examined the evolution of Linux’s major sub-
systems, as we can see in Figures 25 and 26. The results were similar: drivers is
still the biggest subsystem, though its share of the total Linux kernel has decreased,
mainly due to the removal of the sound subsystem in early 2002.
All in all, we conclude that the power of OSS communities can push a project to
super-linear growth, in contrast to the typical software evolution rules. Voluntary
participation in a software project’s development ensures that the participants are
really interested in it both as developers and as users. In this case, software is not
treated merely as a commercial product, but as a means of improving people’s lives.
Linux is a very good example of such a case.


Figure 22: Percentage of SLOC for each major subsystem of Linux (development
releases), ([GT00], p.138).

Figure 23: Growth of SLOC of Linux for all the stable and development releases,
([Rob05], p.89).


Figure 24: Growth of the tar file (right) and the number of files (left) for the full
Linux kernel source release, ([Rob05], p.90).

Figure 25: Growth of SLOC of the major subsystems of Linux (development re-
leases), ([Rob05], p.91).


Figure 26: Percentage of SLOC for each major subsystem of Linux (development
releases), ([Rob05], p.93).

3.1.3 Apache

Another famous OSS project is the Apache web server. It was begun in early 1995 by
Rob McCool, a software developer and architect who was 22 years old at that time.
Apache was initially an effort to coordinate improvements to the NCSA (National
Center for Supercomputing Applications) HTTPd program, by creating patches and
adding new features. Actually, this was the initial explanation of the project’s name:
it was “a patchy” server. Later though, the project’s official website claimed that
the Apache name was given as a sign of respect to the Native American Apache tribe.
Apache quickly attracted the attention of an initial core team of developers, who
formed the “Apache Group,” and it was first launched in early 1996, as Apache HTTP
version 1.0. At that time, it was actually the only workable Open Source alternative to
the Netscape web server. Since April 1996, it has reportedly been the most popular
HTTP server on the internet, hosting over half of all websites globally.
One of the most comprehensive studies of the Apache server was conducted by Au-
dris Mockus, Roy T. Fielding and James Herbsleb in 2002 [MFH02]. In this study,
they discuss the way Apache development occurred and present some quantitative
results on the evolution of Apache’s development. The following information is
based on this article.
As we mentioned earlier, the “Apache Group” was formed at the initial stage of
the project and was charged with the project’s coordination. It was an informal
organisation of people, consisting entirely of volunteers, who all had other full-time
jobs. Therefore they decided to employ a decentralised development concept that
supported asynchronous communication.


Figure 27: The cumulative distribution of contributions to the code base, ([MFH02],
p.321)

This was achieved through the use of mailing lists, newsgroups and the problem
reporting system (BUGDB). Any developer may take part in the project and submit
contributions, and the “Apache Group” then decides on the inclusion of any code
change. Apache core developers are free to choose the project area that most attracts
them and leave it when they are no longer interested in it.
Mockus, Fielding and Herbsleb studied several aspects of Apache’s development.
Firstly, they examined the participation of the project’s development community,
which numbers almost 400 individuals, in the two main parts of the software’s devel-
opment: code generation and bug fixes. In Figure 27, we can see the cumulative
proportion of code changes (on the vertical axis) versus the top N contributors to
the code base (on the horizontal axis), ordered by the number of Modification
Requests (MRs) from largest to smallest. Code contribution is measured by
four factors: MRs, Delta, Lines Added and Lines Deleted. The figure shows that the
top 15 developers contributed more than 83 per cent of MRs and deltas, 88 per cent
of lines added, and 91 per cent of deleted lines. Similarly, Figure 28 shows the cu-
mulative proportion of bug fixes (vertical axis) versus the top N contributors to bug
fixing. This time, the core of 15 developers produced only 66 per cent of the fixes.
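Curves like those in Figures 27 and 28 are easy to reproduce from raw per-developer
counts; the sketch below (with invented counts, not the actual Apache data) computes
the cumulative share contributed by the top N developers:

```python
def cumulative_share(counts):
    # Sort per-developer contribution counts (e.g. MRs) from largest to
    # smallest and accumulate the proportion covered by the top 1..N.
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    shares, running = [], 0
    for c in ordered:
        running += c
        shares.append(running / total)
    return shares

mrs = [400, 350, 300, 120, 80, 40, 20, 10, 5, 5]  # invented per-developer MRs
for n, share in enumerate(cumulative_share(mrs), start=1):
    print(f"top {n:2d} developers: {share:.0%} of MRs")
```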
These two figures show that the participation of a wide development community
is more important in defect repair than in new code submission. We notice that,
despite the broad overall participation in the project, almost all new functionality
is created by the core developers. A broad developer community, though, is es-
sential for bug fixing. Mockus, Fielding and Herbsleb made a comparative analysis
of these findings against several commercial projects’ data.

Figure 28: The cumulative distribution of fixes, ([MFH02], p.322)

This study’s outcome was that in commercial projects, the core developers’ contribution
to the project’s evolution was significantly lower than in Apache. As an attempt to
interpret these findings, we can argue that Apache core developers seem to be very
productive compared to commercial software developers. This conclusion is
strengthened by the fact that participation in Apache’s development is a voluntary,
part-time activity.

3.1.4 Mozilla

Mockus, Fielding and Herbsleb [MFH02] present an analysis of another OSS project,
the Mozilla web browser. Mozilla was initially created as a commercial project by
Netscape Corporation, which (in January 1998) decided to distribute its Communi-
cator free of charge, and give free access to the source code as well - therefore
turning it into an OSS project. Netscape was actually so impressed by Linux’s evo-
lution that they were attracted by the idea of developing an Open Source web
browser. The project’s management was assigned to the “Mozilla Organisation,”
now named the “Mozilla Foundation.” Nowadays, the foundation coordinates and
maintains the Mozilla Firefox browser and the Mozilla Thunderbird e-mail application,
among others.
Mockus, Fielding and Herbsleb investigated the size of Mozilla’s development
community. By examining the project’s repository, they found 486 code contribu-
tors and 412 bug-fix contributors. In Figure 29, we can see the project’s external
participation over time. The vertical axis represents the fraction of external devel-
opers and the horizontal axis represents time. It is clear that participation gradually
increases over time, as a result of widespread interest and improved documentation.
As an example, it is mentioned that 95 per cent of the people who created problem
reports were external, and they submitted 53 per cent of the total number of problem
reports.

Figure 29: Trends of external participation in the Mozilla project, ([MFH02], p.335)

Figure 30 shows the cumulative distribution of code contribution for
seven Mozilla modules. In this case, the developer contribution does not seem to
vary as much as in the Apache project.
Mozilla represents a way in which commercial and Open Source development
approaches can be combined. The interdependence among Mozilla modules is
high and the effort dedicated to code inspections is considerable. Therefore, Mozilla’s
core teams are bigger than Apache’s, employing more formal means of coordinating
the project. But the fact is that, despite its commercial development roots, Mozilla
managed to leverage the OSS community, achieve high participation and produce a
high-quality product.

3.1.5 GNOME

GNOME is also one of the biggest and most famous OSS projects. It is a desktop
environment for Unix systems and its name was formed as an acronym of the words
“GNU Network Object Model Environment.” In 2004, Daniel M. German published a
study of GNOME, in order to examine how global software development can lead
to success [Ger04b]. The discussion below is based on that article.
The GNOME project was started by Miguel de Icaza, a Mexican software pro-
grammer. Its first version was released in 1997 and contained one simple appli-
cation and a set of libraries. Today, GNOME has turned into a large project, with
more than two million LOC and hundreds of developers worldwide. In 2000, the
GNOME Foundation (similar to the Apache Software Foundation) was established. It is
composed of four entities: the Board of Directors, the Advisory Board, the Executive
Director and the members.

Figure 30: The cumulative distribution of contributions to the code base for seven
Mozilla modules, ([MFH02], p.336).

Many of the participants in the Board of Directors are fully
employed by private companies. The Advisory Board is composed of corporate
and non-profit organisations. Membership can be granted to any of the current con-
tributors to the project, who can be non-programmers as well. By October 2003,
the Foundation counted 320 members. The GNOME Foundation is also responsi-
ble for organising sub-committees that run some of the project’s administrative
tasks, like the Foundation Membership Committee, the Fund-raising Committee, the
Sysadmin Committee, the Release Team, etc.
German reaches several interesting conclusions by examining the contributions
and the overall project’s evolution. First of all, an important factor in GNOME’s suc-
cess is the wide participation in the decision-making process. Developers are treated
as equal partners in the project and are inspired by its goals, which explains their
motivation to work. Secondly, an essential feature of GNOME is the use of multiple
types of communication, like mailing lists, IRC and reports on the project’s current
state of development. There are scheduled meetings about GNOME’s evolution that
boost collaboration and team spirit among contributors. Moreover, the creation
of task forces makes their members accountable and committed to GNOME’s
improvement. Finally, there are clear procedures and policies for conflict manage-
ment, as well as a strong culture of creating documentation, so that contributors
know what others are working on.

3.1.6 FreeBSD

FreeBSD is an open-source operating system derived from BSD, the version of
Unix developed at the University of California.


Figure 31: FreeBSD stable release growth by release number, ([IB06], p.207)

The project started in 1993 and its current (end of 2006) stable version is 6.1. It is
run by the FreeBSD developers who have commit access to the project’s CVS. As it is
considered a successful OSS project, it has attracted scientific interest in its evolution-
ary process. The most recent publication on FreeBSD’s evolution is by Clemente
Izurieta and James Bieman [IB06]. Based on an earlier study by Trung Dinh-Trong
and James Bieman [DTB04] that praised the system’s organisational structure, Izuri-
eta and Bieman focused on examining the growth rate of FreeBSD stable releases
since its inception, employing metrics such as LOC, number of directories, total
size in Kbytes, average and median LOC for header (dot-h) and source (dot-c) files,
and number of modules for each sub-system and for the system as a whole.
This study indicates that FreeBSD follows a linear (and sometimes sub-linear)
rate of growth, as demonstrated in Figures 31 to 35. We observe that dot-c and
dot-h files (Figure 34) show a very slight growth in size, which is due to the fact
that the system does not evolve in an uncontrolled manner, as Izurieta and Bieman
explain. It also has to be clarified that in Figure 35, the contrib subsystem contains
software contributed by users, and the sys subsystem is the system’s kernel. As one
would expect, sys is smaller in size and grows at a slower pace than contrib, because
its content goes through a stricter validation process before inclusion in the system.

3.1.7 Other Studies

In recent years, some horizontal studies of OSS projects have been published,
in which several projects are examined collectively. One such example is an article
by Andrea Capiluppi, Patricia Lago and Maurizio Morisio [CLM04], in which they
picked 12 projects from the Freshmeat Open Source portal.

Figure 32: FreeBSD cumulative growth rate, ([IB06], p.207)

Figure 33: FreeBSD release sizes by development branch, ([IB06], p.208)


Figure 34: FreeBSD average and median values of dot-c and dot-h files, ([IB06],
p.209)

Figure 35: FreeBSD contrib and sys sub-systems, ([IB06], p.210)


These projects were all “alive,” meaning that they had shown significant growth over
time and there were still developers working on them at the date of the research.
Actually, the authors report that during their research on the Freshmeat portal, they
discovered that a significant percentage of the hundreds of accessible OSS projects
were not evolving anymore, having no developers and no growth for a considerable
amount of time. The authors concluded that the mortality of OSS projects is quite high.
After an initial observation of the sample, they clustered the 12 projects into
three categories, as follows:

• Large projects: Mutt, ARLA

• Medium projects: GNUPARTED, Weasel, Disc-cover, XAutolock, Motion, Bubblemon

• Small projects: Dailystrips, Calamaris, Edna, Rblcheck
The authors analysed some basic attributes of these projects, such as size, mod-
ules and number of developers. According to their findings, all projects had grown
at a linear rate over time, both in terms of size and in terms of the number of de-
velopers. Some periodic fluctuations in the code’s size were noticed, mainly caused
by internal redesigns of the software, but the long-term trend has been upward in
all cases. In large and medium projects, the core teams had grown as well, but in a
limited way, which suggests that there is always a ceiling to the expansion of core
project teams. The same patterns of linear or sub-linear growth were discovered
for the number of modules, too. In a later study, Andrea Capiluppi, Maurizio Morisio
and Juan Ramil proceeded to a further examination of the ARLA project, reaching
similar conclusions [CMR04].
Finally, another interesting study has been carried out by James W. Paulson,
Giancarlo Succi and Armin Eberlein [PSE04]. In order to test the effectiveness of
the OSS development process, they investigated the evolutionary patterns of three ma-
jor OSS projects (Linux, GCC and Apache) in comparison to three closed-source
software projects, the names of which were kept confidential. According to their
findings, the OSS development structure fosters creativity and constructive communica-
tion among developers more effectively than traditional ways of software devel-
opment, because the new functions and features added to the OSS projects
were greater in number and in volume than the ones added to the closed-source
software projects. In addition, OSS projects fix bugs and other defects faster,
because of the greater number of developers and testers that contribute to them.
However, the evidence presented in this research does not support the arguments
that OSS systems are more modular and grow faster than closed-source competitors.

3.1.8 Simulation of the temporal evolution of OSS projects

A generic structure for F/OSS simulation modeling

The authors in [ASAB02] and later on in [ASS+ 05] described a general framework
for F/OSS dynamical simulation models and the extra difficulties that have to be
confronted relative to analogous models of the closed-source process. It is actually
a framework for discrete-event simulation models, which the authors presented as
follows:

1. Much unlike closed source projects, in F/OSS projects the number of contrib-
utors varies greatly over time and depends on the interest that the specific F/OSS
project attracts. It cannot be directly controlled and cannot be predetermined
by project coordinators. Therefore, an F/OSS model should a) contain an ex-
plicit mechanism for determining the flow of new contributors as a function of
time and b) relate this mechanism to specific project-dependent factors that
affect the overall “interest” in the project.

2. In any F/OSS project, any particular task at any particular moment in time
can be performed either by a new contributor or an old one. In addition, al-
most all F/OSS projects have a dedicated team of “core” programmers that
perform most of the contributions, while their interest in the project stays ap-
proximately the same. Therefore, the F/OSS simulation model must contain
a mechanism that determines the number of contributions that will be under-
taken per category of contributors (e.g. new, old or core contributors) at each
time interval.

3. In F/OSS projects, there is also no direct central control over the number of
contributions per task type or per project module. Anyone may choose any
task (e.g. code writing, defect correction, etc.) and any project module to work
on. The allocation of contributions per task type and per project module depends
on the following sets of factors:

(a) Programmer profile (e.g. some programmers may prefer code testing to
defect correction). These factors can be further categorised as follows:
i. constant in time (e.g. a programmer’s preference for code writing)
and
ii. variable with time (e.g. a programmer’s interest in contributing
to any task or module may vary based on the frequency of past contribu-
tions).
(b) Project-specific factors (e.g. a contributor may wish to write code for a
specific module, but there may be nothing interesting left to write for that
module).

Therefore, the F/OSS model should (a) identify and parameterise the depen-
dence of programmer interest in contributing to a specific task/module on (i)
programmer profile and (ii) project evolution, and (b) contain a quantitative mech-
anism to allocate contributions per task type and per project module.


4. In F/OSS projects, because there is no strict plan or task assignment mecha-
nism, the total number of Lines of Code (LOC) written by each contributor
varies significantly per contributor and per time period, again in an uncon-
trolled manner. Therefore, project outputs such as LOC added, number of
defects or number of reported defects are expected to have much larger sta-
tistical variance than in closed source projects. The F/OSS simulation model
should determine the delivered results of particular contributions in a stochastic
manner, i.e. drawing from probability distributions. This is a similar practice
to what is used in closed source simulation models, with the difference being
that the probability distributions here are expected to have a much larger variance.

5. In F/OSS projects there is no specific time plan for project deliverables. There-
fore, the number of calendar days for the completion of a task varies greatly.
Also, delivery times should depend on project-specific factors such as the amount
of work needed to complete the task. Therefore, task delivery times should be
determined in a stochastic manner on the one hand, while average delivery
times should follow certain deterministic rules on the other.

The authors concluded that the core of any F/OSS simulation model should be
based upon a specific behavioural model that must be properly quantified in order to
model the behaviour of project contributors in deciding a) whether to contribute to
the project or not, b) which task to perform, c) which module to contribute to and d)
how often to contribute. The behavioural model should then define the way that the
above four aspects depend on a) programmer profile and b) project-specific factors.
The formulation of a behavioural model must be based on a set of qualitative
rules. Fortunately, previous case studies have already pinpointed such rules either
by questioning a large sample of F/OSS contributors or by analysing publicly avail-
able data in F/OSS project repositories. As previous case studies identified many
common features across several F/OSS project types, one certainly can devise a be-
havioural model general enough to describe at least a large class of F/OSS projects.
Selecting a suitable equation that describes a specific qualitative rule is largely
an arbitrary task in the beginning; however, a particular choice may be subsequently
justified by the model’s demonstrated ability to fit actual results. Once the be-
havioural model equations and intrinsic parameters are validated, the model
may be applied to other F/OSS projects.
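To make the above concrete, the toy sketch below (entirely our own illustration: the
distributions, parameters and interest function are assumptions, not the calibrated
model of [ASAB02] or [ASS+ 05]) draws contributor arrivals and per-contribution
LOC output stochastically at each time step:

```python
import random

def simulate(weeks=100, base_interest=5.0, seed=42):
    # Toy discrete-event sketch: arrivals of new contributors depend on a
    # project "interest" that grows with accumulated size, and each
    # contribution delivers LOC drawn from a high-variance log-normal
    # distribution, as points 1 and 4 of the framework suggest.
    random.seed(seed)
    loc, contributors = 0, 10              # initial conditions
    history = []
    for week in range(weeks):
        interest = base_interest * (1 + loc / 50_000)
        arrivals = sum(random.random() < interest / 100 for _ in range(100))
        contributors += arrivals
        for _ in range(contributors):
            if random.random() < 0.3:      # decision to contribute this week
                loc += int(random.lognormvariate(4.0, 1.0))
        history.append((week, contributors, loc))
    return history

for week, devs, loc in simulate()[::20]:
    print(f"week {week:3d}: {devs:4d} contributors, {loc:8d} LOC")
```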

Application of an F/OSS simulation model


General procedure
Figure 36 shows the structure of a generic F/OSS dynamic simulation model. As
in any simulation model of a dynamical system, the user must specify as input a) val-
ues for project-specific time-constant parameters and b) initial conditions for the
project’s dynamic variables. These values are not precisely known from the project’s start.


[Diagram: project-specific inputs (probability distribution parameters, behavioural
model parameters, initial conditions of dynamic variables) feed the behavioural
model and the probability distributions, which generate task delivery times and
task deliverables for the OSS simulation; the output is the time evolution of the
dynamic variables.]

Figure 36: Structure of a generic F/OSS dynamic simulation model. Figure was
reproduced from [ASS+ 05].

One may attempt to provide rough estimates for these values based on results of other
(similar) real-world F/OSS projects. However, these values may be readjusted in the
course of the evolution of the simulated project as real data becomes available. If
the simulation does not become more accurate in predicting the future evolution of the
project despite this continuous re-adjustment of parameters, it means that either a)
some of the behavioural model’s qualitative rules are based on wrong assumptions
for the specific type of project studied, or b) the project-independent values of the
behavioural model must be re-adjusted.

Calibration of the model


The adjustment of the behavioural model’s intrinsic parameters is the calibration pro-
cedure of the model. According to this procedure, one may introduce arbitrary val-
ues for these parameters as reasonable ‘initial guesses’. Then, one would run the
simulation model, re-adjusting parameter values until the simulation results satisfac-
torily fit the results of a real-world F/OSS project in each time-window of project
evolution. More than one F/OSS project of a similar type may be used in the calibration
process.

Validation of the model


Once the project-independent parameters of the behavioural model are properly
calibrated, the model may be used to simulate other F/OSS projects.
Practical use of F/OSS simulation models


• Prediction of F/OSS project evolution. Project coordinators may obtain a pic-
ture of plausible evolution scenarios of the project they are about to initiate.
Software users may also be interested in such predictions, as they would indicate
when the software will most likely be available for use. This also applies to
organisations, especially if they are interested in pursuing a specific business
model that is based on this software.

• F/OSS project risk management. F/OSS projects are risky, in the sense that
many not easily anticipated factors may negatively affect their evolution. Sim-
ulation models may help in quantifying the impact of such factors, taking into
account their probability of occurrence and the effect they may have in case
they occur.

• What-if analysis. F/OSS coordinators may try different development processes,
coordination schemes (e.g. core programming team), tool usage, etc. to iden-
tify the best possible approach for initiating and managing their project.

• F/OSS process evaluation. The nature of F/OSS guarantees that in the fu-
ture we will observe new types of project organisation and evolution patterns.
Researchers may be particularly interested in understanding the dynamics of
F/OSS development and simulation models may provide a suitable tool for that
purpose.

Simulation studies and results


Based on the general framework described earlier, the authors of [ASAB02] pre-
sented a formal mathematical model based on findings of F/OSS case studies. The
simulation model was applied to the Apache project and simulation outputs were com-
pared to real data. The model was further refined in [ASS+ 05] and similarly applied
to the gtk+ module of the GNOME project. Simulation outputs included the temporal
evolution of LOC, active programmers, residual defect density, number of reported
defects etc.
Figures 37 and 38 compare simulation results and real data for LOC vs. time for
the Apache project and gtk+ respectively.
Conclusions

In conclusion, the authors in both [ASAB02] and [ASS+ 05] claimed that existing
case studies do not contain the complete set of data necessary for a full-scale cali-
bration and validation of their simulation model. Despite this fact, qualitatively, the
simulation results demonstrated the super-linear project growth at the initial stages,
the saturation of project growth at later stages where a project reached a level of
functional completion (Apache) and the effective defect correction, facts that agree
with known studies.


Figure 37: Simulation results for the Apache project: Cumulative LOC difference
vs. time for the Apache project. The bold line is the average of the 100 runs. The
gray lines are one standard deviation above and below the average. The dashed
vertical line shows the end of the time period for which data was collected in the
Apache case study [MFH02]. Figure was reproduced from [ASS+ 05].

Figure 38: LOC evolution in gtk+ module of GNOME project: Cumulative LOC dif-
ference vs. time. The bold line is the expectation (average) value of LOC evolution.
The gray lines are one standard deviation above and below the average. The dashed
vertical line shows approximately the end of the time period for which data was
collected in the GNOME case study. Figure was reproduced from [ASS+ 05].


One of the most evident intrinsic limitations of the F/OSS simulation models, the
authors claimed, comes from the very large variances of the probability distribu-
tions used. On output, this leads to large variances in the evolution of key project
variables, a fact that naturally limits the predictive power of the model.
Finally, the authors concluded that despite the aforementioned intrinsic and ex-
trinsic limitations, their “first attempt” simulation runs fairly demonstrated the
model’s ability to capture reported qualitative and quantitative features of F/OSS
evolution.

3.2 Code Quality Studies


There are few studies regarding the code quality of Open Source software. Many
early studies focus on evolutionary aspects of Open Source software and study the
evolution laws of Open Source software development. It was not until recently that
Open Source code quality studies appeared in highly ranked journals (not white
papers by consulting firms or subjective articles, but peer-reviewed research), which
explains the small number of available Open Source code quality studies.
One of the first studies that examined code quality in Open Source software
was conducted by Stamelos et al [SAOB02]. In this study the authors tried to measure
the modularity and the structural quality of the code of 100 Open Source applica-
tions, and tried to correlate the size of the application components with user
satisfaction. The measurement of the applications was conducted with a commercial
tool (Telelogic Logiscope) and the quality was assessed against a quality standard
very similar to that of ISO/IEC 9126. The standard was proposed by the tool itself
and, as the authors indicate, is used by more than 70 multinational companies in
various areas. The model employed metrics that are a mixture of size metrics,
structural metrics and complexity metrics, which can be found in the metrics section
of this document. The paper also grounds its findings on statistical foundations.
The tool measured each module of all applications and evaluated it against the
built-in model. For each criterion the tool outputs a recommendation level, namely
ACCEPT, COMMENT, INSPECT, TEST or REWRITE. The result of the measurement
is depicted in Table 1. As the authors notice, the table shows that the mean value of
the acceptable components is about 50%, a value that is neither good nor bad, and
can be interpreted both ways. It suggests that either the code quality of the Open
Source applications is higher than one might expect, taking into account the nature
of Open Source software development and the time of the study, or the quality is
lower than the industrial code standard implied by the tool.
Table 1: Percentage of modules at each recommendation level, as studied by Stamelos
et al.

% Minimum Maximum Mean SD Median

ACCEPTED 0 100 50.18 18.65 48.37
COMMENT 0 66.66 30.95 14.09 31.83
INSPECT 0 50 8.55 8.5 7.65
TEST 0 16 4.31 4.14 3.55
REWRITE 0 100 5.57 10.73 3.2
UNDEFINED 0 7.69 0.42 1.29 0

Regarding the second part of their study, relating component size and other metrics
to user satisfaction, the authors did not find any relationship between the majority of
the metrics considered and user satisfaction. However, they detected an indication of
a relationship between the size of a component and user satisfaction (or the “external
quality” of a project). The two size metrics that relate to satisfaction are the “Number
of statements” and “Program Length”. This relation is negative, i.e. the bigger a
component is, the worse the performance of its “external quality”.
The authors at the end suggest that Open Source performs no worse than a standard
implied by an industrial tool and they emphasise the need for more empirical studies
in order to clarify Open Source quality performance. The authors suggest (in 2002)
that in an Open Source project programmers should follow a programming standard
and have a quality assurance plan, leading to high quality code. This suggestion
has recently been adopted by large Open Source projects like KDE35 .

35
http://www.englishbreakfastnetwork.org/
Another study from the same group that assesses the maintainability of open
source software is that of Samoladas et al [SSAO04]. In this paper, the authors studied
the maintainability of five Open Source software projects and one closed source, a
comparison that is not frequent in the Open Source literature. The measurement was
conducted on successive versions, allowing the study of the evolution of maintain-
ability and how it behaves over time. Maintainability was measured using the
Maintainability Index described in section 1.3.3 and the measurement was done with
the help of a metrics package found in the Debian r3.0 distribution, which contains
tools from Chris Lott’s page, and a set of Perl scripts to coordinate the whole process.
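For reference, a widely cited formulation of the Maintainability Index is sketched
below; the coefficients are the commonly quoted Oman and Hagemeister ones and
should be taken as an assumption here, since the exact variant used in the study is
the one described in section 1.3.3:

```python
import math

def maintainability_index(avg_halstead_volume, avg_cyclomatic, avg_loc,
                          avg_percent_comments=None):
    # Classic three-metric MI; lower values indicate poorer maintainability.
    mi = (171
          - 5.2 * math.log(avg_halstead_volume)
          - 0.23 * avg_cyclomatic
          - 16.2 * math.log(avg_loc))
    if avg_percent_comments is not None:
        # Optional four-metric variant adds a bonus term for comments.
        mi += 50 * math.sin(math.sqrt(2.4 * avg_percent_comments))
    return mi

# Invented module-level averages, for illustration only.
print(maintainability_index(avg_halstead_volume=1000,
                            avg_cyclomatic=10, avg_loc=80))
```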
The projects under study had certain characteristics: two of them were pure
open source projects (initiated as Open Source and continuing to evolve as such),
another was an academic project that gave birth to an Open Source project, the
fourth was a closed source project that opened its code and continued as open
source, the fifth was an Open Source project that was forked into a commercial one,
while itself continuing as Open Source, and the last one was the latter closed source
fork, whose code is available under a commercial, non-modifiable licence. The result
of the study was that in all cases the maintainability of all projects deteriorated over
time.


Figure 39: Maintainability Index evolution for an Open Source project and its closed
source “fork” (Samoladas et al.).

When they compared the evolution of the maintainability of the closed source fork
versus its Open Source counterpart, the closed source one performed worse than the
Open Source project. The authors conclude that open source code quality, as expressed
by maintainability, suffers from the same problems that have been observed in closed
source software studies. They also point out that further empirical studies are needed
in order to produce safe results about Open Source code quality.
Another study of the maintainability of Open Source projects, and particularly the
maintainability of the Linux kernel, was conducted by Yu et al. [YSCO04]. Here, the
authors studied the number of instances of common coupling between the 26 kernel
modules and all the other non-kernel modules. By coupling they mean the degree
of interaction between modules and thus the dependency between them. In
this document, coupling was also explained in section 1.3.1. Additionally, for kernel-
based software, they also consider couplings between the kernel and non-kernel
modules. The reason they studied coupling as a measure for maintainability is that,
as the authors explain, common coupling is connected to fault proneness and
thus to maintainability.
The specific study is a follow-up to previous ones conducted by the same team. In
these previous studies, they examined 400 successive versions of the
Linux kernel and tried to find relations between the size, as expressed by
lines of code, and the number of instances of coupling. Their findings showed that
the number of lines of code in each kernel module increases linearly with the version
number, but that the number of instances of common coupling between kernel mod-
ules and all others shows exponential growth. In this new study they perform an
in-depth analysis of the notion of coupling in the Linux kernel.

Figure 40: Maintainability Index evolution for three Open Source projects (Samoladas
et al.).

In order to perform their new study, the authors first refined the definition of coupling
and defined different expressions of it (e.g. global variables inside the Linux kernel,
global variables outside the kernel, etc.), separating coupling into five categories and
characterising them as “safe” and “unsafe”. Then, they constructed an analysis tech-
nique and metric for evaluating coupling and applied it to analyse the maintainability
of the Linux kernel.
The application of this classification of coupling to the Linux 2.4.20 kernel showed
that for a total of 99 global variables (the common expression of coupling) there are
15,110 instances of them, of which 1,908 are characterised as “unsafe”. Along with
the results from their previous study (the exponential growth of instances) they
conclude that the maintainability of the Linux kernel will face serious problems in
the long term.
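A minimal sketch of counting such instances is given below; it is a simplification of
our own (the global variable names and module sources are invented, and the real
analysis further classifies every instance into the five safe/unsafe categories):

```python
import re
from collections import defaultdict

# Invented examples of kernel global variables and module sources; a real
# study would extract both from the kernel source tree.
GLOBALS = {"jiffies", "current", "nr_threads"}

def count_common_coupling(modules):
    # Each textual use of a shared global variable in a module counts as
    # one instance of common coupling.
    counts = defaultdict(int)
    for name, source in modules.items():
        for var in GLOBALS:
            counts[var] += len(re.findall(rf"\b{re.escape(var)}\b", source))
    return dict(counts)

modules = {
    "sched.c": "if (nr_threads > 1) { t = jiffies; }",
    "fork.c": "nr_threads++; parent = current;",
}
print(count_common_coupling(modules))
```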
A more recent paper from the same group compares the maintainability, as
expressed by coupling, of the Linux kernel with that of the FreeBSD, OpenBSD and
NetBSD kernels [YSC+ 06]. They applied a similar analysis to that in [YSCO04] and
compared the performance of Linux against the BSD family (as formal statistical
hypotheses). Results showed that Linux contains considerably more instances of
common coupling than the BSD family kernels, making it more difficult to maintain
and more fault prone to changes. The authors suggest that the big difference between
Linux and the BSD family kernels indicates that it is possible to design a kernel
without having a lot of global variables and, thus, that the Linux kernel development
team does not take maintainability into account so much.
A more recent study is that of Güneş Koru and Jeff Tian [KT05]. Here the two
authors try to correlate change proneness and structural metrics, like size, coupling,
cohesion and inheritance metrics. They suggest, based on previous studies, that
change-prone modules are also defect prone and that these modules can be spotted
by measuring their structural characteristics. In short, the authors measured two
large Open Source projects, namely Mozilla and OpenOffice, using a large set of
structural measures which fit into the categories mentioned before. The measurement
was done with the Columbus36 tool. In addition, with the help of custom-made
Perl scripts, they counted the differences for each application from its immediately
preceding revision. As the smallest software unit they considered the class.
This measurement involved 800 KLOC and 51 measures for Mozilla and 2,700 KLOC
and 46 measures for OpenOffice.

36
http://www.frontendart.com
With the results obtained they questioned whether high-change mod-
ules were the same as modules with the highest measurement values, considering
each metric individually. They also tried to compare the results with an older similar
study of their own, a study that was conducted for six large-scale projects in industry
(IBM and Nortel). In order to answer these questions they created appropriate
formal statistical hypotheses and tests. The results showed that there is strong
evidence that the modules which had the most changes did not have the highest
measurement values, a fact that also held for the previous industrial study. The
authors also performed a similar analysis, but with clustering techniques. The second
analysis resulted in the same statement, but it also pointed out that the high-change
modules were not the modules with the highest measurement values but those with
fairly high measurement values.
The latter was the main outcome of the paper and, as the authors indicate, the same
is true for the six industrial applications. Trying to explain this, the authors suggest
that this holds because expert programmers in Open Source take on the difficult
tasks and novices the easier ones. This might result in the modules with the highest
structural measures, which solve complex tasks, not being the most problematic
ones. Of course, as they suggest, this needs further investigation and is a central
issue in their future studies.
A very interesting paper, although not directly an Open Source code quality study,
is that of Gyimóthy, Ferenc and Siket [GFS05]. The study has as its main goal the
validation of the Object Oriented Metrics Suite of Chidamber and Kemerer (the CK
suite, as described in section 1.3.1) with the help of open source software, not the
assessment of the quality of Open Source software per se. In particular, they validated
the CK metrics suite with the help of a framework-metrics collection tool named
Columbus, which was mentioned previously, on an Open Source project, Mozilla. In
order to perform their analysis, apart from using Columbus to extract the metrics,
they also collected information about bugs in Mozilla from the Bugzilla database,
the system that Mozilla uses for bug reporting and tracking.


Figure 41: Changes in the mean value of CK metrics for 7 versions of Mozilla (Gy-
imóthy et al.)

The validation of the metrics was done with statistical methods such as logistic and linear regression, but
also with machine learning techniques, like decision trees and neural networks. The
latter techniques were used to predict fault proneness of the code.
The methodology followed can be summarised as follows:

1. Analysis and calculation of metrics from the Mozilla code

2. Application of the four techniques (logistic and linear regression, decision trees
and neural networks) to predict the fault-proneness of the code

3. Analysis of the changes in the fault-proneness of Mozilla through seven ver-
sions using the results

The methodology is well described in the paper. As the authors admit, the challenge
of the whole process was to associate the bugs from the Bugzilla database with the
classes found in the source code. This association was complicated and demanded a
lot of iterative work, which is also described in the paper.
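As an illustration of step 2 of this methodology (a sketch under our own assumptions:
the data below is invented and scikit-learn stands in for whatever statistical tooling
the authors actually used), predicting class fault-proneness from CK metrics with
logistic regression might look like this:

```python
from sklearn.linear_model import LogisticRegression

# Invented toy data: one row per class with CK metrics
# [WMC, DIT, RFC, NOC, CBO, LCOM]; label 1 = at least one associated bug.
X = [
    [12, 2, 30, 0, 14, 5],
    [ 3, 1,  8, 1,  2, 0],
    [25, 4, 60, 0, 22, 9],
    [ 5, 2, 12, 2,  3, 1],
    [18, 3, 45, 0, 17, 7],
    [ 4, 1, 10, 0,  2, 0],
]
y = [1, 0, 1, 0, 1, 0]

model = LogisticRegression().fit(X, y)

# Coefficient magnitudes hint at each metric's predictive weight; in [GFS05]
# CBO turned out to be the best single predictor of fault-proneness.
for name, coef in zip(["WMC", "DIT", "RFC", "NOC", "CBO", "LCOM"],
                      model.coef_[0]):
    print(f"{name}: {coef:+.3f}")
print("P(faulty):", model.predict_proba([[20, 3, 50, 0, 19, 8]])[0][1])
```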
From the “pure” software engineering part of the study, the validation of the
metrics and the models’ predictiveness, the most interesting result is that the CBO
metric (Coupling Between Object classes) seems to be the best at predicting the
fault-proneness of classes. It is easy to notice that again the notion of coupling
is strongly related to bugs and, thus, to maintainability. This fact demands
further investigation and has to be on our project’s research agenda. Regarding
the “Open Source” part of the study, the authors observed a significant growth in the
values of 5 out of the 7 metrics studied (the seventh being LCOMN - Lack of Cohesion
on Methods allowing Negative values, a metric not included in the CK suite itself).
The authors assume that this happened because of a big reorganisation of the Mozilla
source code with version 1.2, causing this growth. Of course this justification needs
further investigation. Figure 41 shows the changes of the metrics for the seven
versions of Mozilla. To conclude, we could say that, although this study does not
directly assess Open Source quality, it is a very good example of applying empirical
software engineering research.

3.3 F/OSS Community Studies in Mailing Lists


3.3.1 Introduction

Free and Open Source Software (F/OSS) development not only exemplifies a vi-
able software development approach, but is also a model for the creation of self-
learning and self-organising communities in which geographically distributed indi-
viduals contribute to build a particular piece of software. The Bazaar model [Ray99],
as opposed to the Cathedral model of developing F/OSS, has produced a number of
successful applications (e.g. Linux, Apache, Mozilla, MySQL, etc.). However, the ini-
tial phase of most F/OSS projects does not operate at the Bazaar level and only
successful projects make the transition from the Cathedral to the Bazaar style of
software development [Mar04].
Participants who are motivated by a combination of intrinsic and extrinsic mo-
tives congregate in projects to develop software on-line, relying on extensive peer
collaboration. Some project participants often augment their knowledge on coding
techniques by having access to a large code base. In many projects epistemic com-
munities of volunteers provide support services [BR03], act as distributing agents
and help newcomers or users. The F/OSS windfall is such that there is increased
motivation to understand the nature of community participation in F/OSS projects.
Substantial research on Open Source software projects has focused on software repos-
itories such as mailing lists to study developer communities, with the ultimate aim of
informing our understanding of core software development activities. Mundane project
activities which are not explicit in most developer lists have also received attention
[SSA06], [LK03a]. Many researchers focus on mailing lists in conjunction with other
software repositories [KSL03], [Gho04], [LK03a], [HM05]. These studies have provided
great insight into the collaborative software development process that characterises
F/OSS projects. F/OSS community studies in mailing lists are important because, on
the one hand, mailing lists are one major piece of technical infrastructure that F/OSS
projects require.
On the other hand, F/OSS projects are symbiotic cognitive systems where ongo-
ing interactions among project participants generate valuable software knowledge
- a collection of shared and publicly reusable knowledge - that is worth archiving
[SSA06]. One form of knowledge repository where archiving of public knowledge
takes place is the project’s mailing list.

3.3.2 Mailing Lists

Lists are active and complex living repositories of public discussions among F/OSS
participants on issues relating to project development and software use. They con-


tain ’software trails’ - pieces left behind by the contributors of a software project -
and are very important in educating future developers [GM03b] and non-developers
[SSA06] on the characteristics and evolution of the project and software. Generally,
a project will host many lists, each addressing a specific area of need. For ex-
ample, software developers will consult developer lists, participants needing help
on documentation will seek links from lists associated with project documentation,
beginners or newbies will confer with mentors’ lists, etc. Fundamentally, two forms
of activities are addressed in lists:

• core activities, typified by developing, debugging, and improving software.
Developer mailing lists are usually the avenues for such activities.

• mundane activities [KSL03], [MFH02], [LK03a]. Documentation, testing, lo-
calisation, and field support exemplify these activities and they take place
predominantly in non-developer lists [SSA06].

However, expert software developers and project and package maintainers also take
part in mundane activities in non-developer mailing lists. They interact with partici-
pants and help answer questions that others have posted.
which help them to further plan and improve code or overall software quality and
functionality. In addition, although mundane activities display a low level of innova-
tiveness, they are fundamental for the adoption of F/OSS [BR03].

3.3.3 Studying Community Participation in Mailing Lists: Research Methodology

Compared to the traditional way of developing proprietary software, F/OSS devel-
opment has provided researchers with an unprecedented abundance of easily ac-
cessible data for research and analysis. It is now possible for researchers to obtain
large sets of data for analysis or to carry out what [Gho04] referred to as ’Inter-
net archaeology’ in F/OSS development. However, [Con06] remarked that collecting
and analysing F/OSS data has become a problem of abundance and reliability in
terms of storage, sharing, aggregation, and filtering of the data. F/OSS projects
employ different kinds of repositories for software development and collaboration.
From these repositories community activities can be analysed and studied. The fig-
ure below shows a methodology by which community participation in mailing lists
may be studied. The methodology shows F/OSS project selection, choice of software
repository and lists to analyse, data extraction schema, and data cleaning proce-
dure used to extract results for analysing community participation in developer and
non-developer mailing lists.
Mailing list participants interact by exchanging email messages. A participant
posts a message to a list and may get a reply from another participant.


Figure 42: Methodological Outline to Extract Data from Mailing Lists Archives. Mod-
ified from [SSA06] (p.1027).


This kind of interaction represents a cycle where posters are continuously internal-
ising and externalising knowledge into the mailing lists. In any project’s mailing list,
these posters could assume the role of knowledge seekers and/or knowledge providers
[SSA06]. The posting and replying activities of the participants are two variables
that can be compared, measured and quantified. The affiliation an individual partic-
ipant has with others as a result of the email messages they exchange within the
same list or across lists in different projects could be mapped and visualised using
Social Network Analysis (SNA). For the construction of such an affiliation network
or ’mailing list network’ see ([SSA06], pp. 130-131).


4 Data Mining in Software Engineering

4.1 Introduction to Data Mining and Knowledge Discovery


The recent explosive growth of our ability to generate and store data has created
a need for new, scalable and efficient tools for data analysis. The main focus of
the discipline of knowledge discovery in databases is to address this need. Knowledge
discovery in databases is the fusion of many areas that are concerned with
different aspects of data handling and data analysis, including databases, machine
learning, statistics, and algorithms. The term Data Mining is also used as a synonym
for Knowledge Discovery in Databases, as well as to refer to the techniques used for
the analysis and the extraction of knowledge from large data repositories. Formally,
data mining has been defined as the process of inducing previously unknown and
potentially useful information from databases.

4.1.1 Data Mining Process

The two main goals of data mining are prediction and description. Prediction
aims at estimating the future value or predicting the behaviour of some
interesting variables based on the behaviour of other variables. Description
concentrates on the discovery of patterns that represent the data of a complicated
database in a comprehensible and exploitable way. A good description could suggest
a good explanation of the data behaviour. The relative importance of prediction
and description varies for different data mining applications. In knowledge discovery,
however, description tends to be more important than prediction, contrary to
pattern recognition and machine learning applications, for which prediction is more
important. A number of data mining methods have been proposed to satisfy the
requirements of different applications. However, all of them accomplish a set of data
mining tasks to identify and describe interesting patterns of knowledge extracted
from a data set. The main data mining tasks are as follows:

• Unsupervised learning (Clustering). Clustering is one of the most useful tasks
in the data mining process for discovering groups and identifying interesting
distributions and patterns in the underlying data. The clustering problem is
about partitioning a given data set into groups (clusters) such that the data
points in a cluster are more similar to each other than to points in different
clusters [JD88, KR90]. In the clustering process, there are no predefined classes
and no examples that would show what kind of desirable relations should be
valid among the data. That is why it is perceived as an unsupervised process
[BL96].

• Supervised learning (Classification). The classification problem has been studied
extensively in the statistics, pattern recognition and machine learning communities
as a possible solution to the knowledge acquisition or knowledge extraction
problem [DH73] [WK91]. It is one of the main tasks in the data mining
procedure for assigning a data item to one of a predefined set of classes. According
to [FPSSR96], classification can be described as a function that maps (classifies)
a data item into one of several predefined classes. A well-defined set
of classes and a training set of pre-classified examples characterise classification.
On the contrary, the clustering process does not rely on predefined
classes or examples [BL96]. The goal of the classification process is to induce
a model that can be used to classify future data items whose classification is
unknown.

• Association rules extraction. Mining association rules is one of the main tasks
in the data mining process. It has attracted considerable interest because the
rules provide a concise way to state potentially useful information that is easily
understood by end-users. Association rules reveal underlying "correlations"
between the attributes in the data set. These correlations are presented
in the form A → B, where A and B refer to sets of attributes in the underlying
data.

• Visualisation of Data. It is the task of describing complex information through
visual data displays. Generally, visualisation is based on the premise that a
good description of an entity (data resource, process, patterns) will improve a
domain expert's understanding of this entity and its behaviour.

4.2 Data mining applications in software engineering: Overview

A large amount of data is produced in software development, which software organisations
collect in the hope of extracting useful information and thus better understanding
their processes and products. However, it is widely believed that a large
amount of useful information remains hidden in software engineering databases.
Specifically, the data in software development can refer to versions of programs,
execution traces, error/bug reports, and Open Source packages. Mailing lists,
discussion forums and newsletters can also provide useful information about software.
Data mining provides the techniques to analyse such data and extract novel, interesting
patterns from it. It assists with software engineering tasks by supporting a better
understanding of software artifacts and processes. Using data mining techniques, we
can extract relations among software projects and the knowledge extracted from them.
We can then exploit the extracted information to evaluate software projects and/or
predict software behaviour. Below we briefly describe the main tasks of data mining
and how they can be used in the context of software engineering [MN99].

• Clustering in software engineering
Clustering produces a view of the data distribution. It can also be used to
automatically identify data outliers. An example of using clustering in software
engineering is to define groups of similar modules based on the number
of modifications and the cyclomatic number metric (the number of linearly
independent paths through a program's source code), as the sketch below illustrates.
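A minimal sketch of such module clustering, assuming the scikit-learn library is available; the metric values and the choice of two clusters are illustrative, not taken from the text above.

    # Cluster modules on two metrics: (number of modifications,
    # cyclomatic complexity). Data are invented for illustration.
    import numpy as np
    from sklearn.cluster import KMeans

    modules = np.array([
        [2, 3], [3, 4], [1, 2],        # rarely changed, simple modules
        [25, 18], [30, 22], [27, 20],  # frequently changed, complex modules
    ])

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(modules)
    print(kmeans.labels_)            # cluster assignment per module
    print(kmeans.cluster_centers_)   # centroids in metric space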

• Classification
Classification is a function that maps (classifies) a data item into one of several
predefined classes. Decision trees are among the most widely used classification
techniques. They can be used to discover classification rules for a chosen
attribute of a dataset by systematically subdividing the information contained
in the data set. Decision trees have been one of the tools chosen for building
classification models in the software engineering field. Figure 43 shows a
classification tree that has been built to provide a mechanism for identifying
risky software modules based on attributes of the module and its system.
Based on the given decision tree we can extract the following rule, which assists
with deciding whether a module is likely to contain errors (a small training
sketch follows below):

IF (# of data bindings > 10) AND (it is part of a non-real-time system)
THEN the module is unlikely to have errors
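A hedged sketch of training such a tree, assuming scikit-learn; the features mirror Figure 43, but the training data and labels are invented.

    # Train a small decision tree to flag risky modules.
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Features per module: (number of data bindings, is_real_time [0/1])
    X = [[3, 0], [12, 0], [15, 1], [4, 1], [20, 1], [11, 0]]
    # Label: 1 if the module turned out to contain errors, 0 otherwise
    y = [0, 0, 1, 0, 1, 0]

    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
    print(export_text(tree, feature_names=["data_bindings", "is_real_time"]))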

• Association rules in software engineering
Association discovery techniques discover correlations or co-occurrences of
events in a given environment. They can thus be used to extract information from
co-occurrences in a dataset. Analysing, for instance, the logs of errors discovered
in the software modules of a system, we can extract relations between module
features and error categories. Such a rule could be the following (a mining
sketch follows below):
(large/small size, large/small complexity, number of revisions) → (interface error,
missing or wrong functionality, algorithm or data structure error, etc.)
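A sketch of mining such rules, assuming the mlxtend library; the module records below are invented.

    # Mine association rules between module features and error categories.
    import pandas as pd
    from mlxtend.frequent_patterns import apriori, association_rules

    # One row per module; boolean columns mark features and observed errors.
    df = pd.DataFrame({
        "large_size":      [True, True, False, True, False],
        "many_revisions":  [True, True, False, True, False],
        "interface_error": [True, True, False, True, False],
    })

    itemsets = apriori(df, min_support=0.4, use_colnames=True)
    rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)
    print(rules[["antecedents", "consequents", "support", "confidence"]])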

A number of approaches based on the above data mining techniques have been proposed
in the literature, aiming to assist with some of the main software engineering
tasks, namely software maintenance and testing. We provide an overview of these
approaches in the following sections; Table 2 summarises their main features.

4.2.1 Using Data mining in software maintenance

Data mining, owing to its capability to deal with large volumes of data and its efficiency
in identifying hidden patterns of knowledge, has been proposed in a number of research
works as a means to support industrial-scale software maintenance.


Figure 43: Classification tree for identifying risky software modules [MN99]

Analysing source code repositories


Data mining approaches have been used extensively to analyse source code version
repositories and thus assist with software maintenance and enhancement. Many of
these repositories are examined and managed by tools such as CVS (Concurrent
Versions System). These tools store difference information across document versions
and identify and express changes in terms of physical attributes, i.e., file and
line numbers. However, CVS does not identify, maintain or provide any change-control
information, such as grouping several changes in multiple files into a single
logical change. Moreover, it does not provide high-level semantics of the nature
of corrective maintenance (e.g. bug fixes). Recently, the interest of researchers
has focused on techniques that aim to identify relationships and trends at a
syntactic level of granularity and further associate high-level semantics with the
information available in repositories. Thus a wide array of approaches that perform
mining of software repositories (MSR) has emerged. They are based on data
mining techniques and aim to extract relevant information from the repositories,
analyse it and derive conclusions within the context of a particular interest. Following
[KCM05], these approaches can be classified by:

• The entity type and granularity they use (e.g. file, function, statement, etc.).

• The expression and definition of software changes (e.g. modification, addition,
deletion, etc.).

• The type of question (e.g. market-basket, frequency of a type of change, etc.).


Data mining
• Classification [FLMP04]. Input: execution profiles and results (success/failure). Output: decision tree of failed executions.
• Clustering [KDTM06]. Input: source code behavioural or structural entities, attributes, metrics. Output: significant patterns extracted from the system source code; groups of similar classes, methods, data.
• Association rules [ZWDZ04]. Input: software entities (e.g. functions). Output: prediction of failures; correlations between entities; identification of additions, modifications and deletions of syntactic entities.
• Neural networks [LFK05]. Input: input and output variables of the software system. Output: a network producing sets for function testing.

Differencing
• Pattern extraction [WH05]. Input: source code, change history. Output: track of bugs.
• Analysis of semantic graph [RRP04]. Input: source code repositories. Output: syntactic and semantic changes.

CVS annotations
• Semantic analysis [GHJ98]. Input: version history of source code, classes. Output: syntactic and semantic hidden dependencies.
• Semantic analysis [GM03a]. Input: files and comments. Output: syntactic and semantic file coupling.

Heuristics
• [HH04]. Input: CVS annotations, heuristics. Output: candidate entities for change.

Table 2: Mining approaches in software engineering

In the sequel, we introduce the main concepts used in MSR and then briefly
present some of the best-known MSR approaches proposed in the literature.

Fundamental Concepts in MSR. The basic concepts with respect to MSR involve
the level of granularity of the type of software entity investigated, the changes,
and the underlying nature of a change. The most widely used concepts can be
summarised as follows:

• An entity, e, is a physical, textual or syntactic element in software: for example,
a file, line, function, class, comment, if-statement, etc.

• A change is a modification, addition or deletion, to or of an entity. A change
describes which entities are changed and where the change occurs.

• The syntax of a change is a concise and specific description of the syntactic
changes to the entity. This description is based on the grammar of the entity's
language. For instance, a condition was added to an if-statement; a parameter
was renamed; an assignment statement was added inside a loop; etc.

• The semantics of a change is a high-level, yet concise, description of the change
in the entity's semantics or feature space. For instance, a class interface
change, a bug fix, or a new feature added to the GUI.

MSR via CVS annotations. One approach is to utilise CVS annotation information.
Gall et al. [GHJ98] propose an approach for detecting common semantic (logical
and hidden) dependencies between classes on account of the addition or modification
of a particular class. This approach is based on the version history of the source
code, where a sequence of release numbers is kept for each class, recording its
changes. Classes that have been changed in the same release are compared in
order to identify common change patterns, based on the author name and time stamp
from the CVS annotations. Classes that are changed with the same time stamp are
inferred to have dependencies.
Specifically, this approach can assist with answering questions such as: Which
classes change together? How many times was a particular class changed? How many
class changes occurred in a subsystem (files in a particular directory)? An approach
that studies file-level changes in software is presented in [Ger04a]. The CVS
annotations are utilised to group subsequent changes into what is termed a modification
request (MR). Specifically, this approach focuses on studying bug-MRs and comment-MRs
to address issues regarding the new functionality that may be added or the
bugs that may be fixed by MRs, the different stages of evolution to which MRs
correspond, and the relation between developers and the modification of files.

MSR via Data Mining. Data mining provides a variety of techniques with potential
application to MSR. One of these techniques is association rules. The work
proposed by Zimmermann et al. [ZWDZ04] exploits the association rule extraction
technique to identify co-occurring changes in a software system. For instance, we
may want to discover relations between modifications of software entities, that is,
to answer the question: when a particular source-code entity (e.g. a function A) is
modified, what other entities are also modified (e.g. the functions with names B and
C)? Specifically, a tool is proposed that parses the source code and maps the line
numbers to syntactic or physical-level entities. These entities are represented
as a triple (filename, type, id). Subsequent entity changes in the repository are
grouped as a transaction. An association rule mining technique is then applied to
determine rules of the form B, C → A; a toy sketch of this idea follows below. This
technique has been applied to open-source projects with the goal of utilising earlier
versions to predict changes in later versions. In general terms, this technique enables
the identification of additions, modifications and deletions of syntactic entities
without utilising any other external information. It can handle various programming
languages and assists with detecting hidden dependencies that cannot be identified
by source code analysis.
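A toy sketch of transaction-based co-change mining in this spirit, in plain Python; the commit transactions are invented, and only the confidence of each rule is computed.

    # Group entities changed in the same commit into transactions, then
    # estimate the confidence of rules of the form {B, C} -> A.
    from itertools import combinations
    from collections import Counter

    transactions = [                  # entities co-changed per commit
        {"A", "B", "C"}, {"A", "B", "C"}, {"B", "C"}, {"A", "D"},
    ]

    pair_counts, rule_counts = Counter(), Counter()
    for tx in transactions:
        for b, c in combinations(sorted(tx), 2):
            pair_counts[(b, c)] += 1
            for a in tx - {b, c}:
                rule_counts[(b, c, a)] += 1

    for (b, c, a), n in rule_counts.items():
        conf = n / pair_counts[(b, c)]
        print(f"{{{b}, {c}}} -> {a}: confidence {conf:.2f}")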

MSR via Heuristics. CVS annotation analysis can be extended by applying heuristics
that include information from the source code or source code models. Hassan et
al. [HH04] proposed a variety of heuristics (developer-based, history-based, and
code-layout-based (file-based)), which are then used to predict the entities that are
candidates for a change on account of a given entity being changed. CVS annotations are
lexically analysed to derive the set of changed entities from the source-code repositories.
Both [ZWDZ04] and [HH04] use source-code version history to identify and predict
software changes, and the questions they answer are quite interesting with respect to
testing and impact analysis.

MSR via Differencing. Source-code repositories contain the differences between versions
of source code, so MSR can be performed by analysing the actual source-code
differences. Such an approach, which aims to detect syntactic and semantic changes
from a version history of C code, is presented by Raghavan et al. [RRP04]. According
to this approach, each version is converted to an abstract semantic graph (ASG)
representation. This graph is a data structure used to represent or derive the semantics
of an expression in a programming language. A top-down or bottom-up heuristics-based
differencing algorithm is applied to each pair of in-memory ASGs. The differencing
algorithm produces an edit script describing the nodes that are added, deleted,
modified or moved in order to obtain one ASG from another. The edit scripts produced
for each pair of ASGs are analysed to answer questions ranging from entity-level changes,
such as how many functions and function calls are inserted, added or modified, to
specific changes, such as how many if-statement conditions are changed.
In [CH04], a syntactic-differencing approach called meta-differencing is introduced.
It allows us to ask syntax-specific questions about differences. According to this
approach, abstract syntax tree (AST) information is directly encoded into the source
code in XML format. The added, deleted or modified syntactic elements are then
computed based on the encoded AST. The types and prevalence of
syntactic changes can be easily computed. Specifically, the approach supports the
following questions:

i Are new methods added to an existing class?

ii Are there changes to pre-processor directives?

iii Was the condition in an if-statement modified?

Following the above discussion of MSR, we can conclude that the types of questions
MSR can answer fall into two categories:

• Market-basket questions. These are formulated as: IF A happens, then what
ELSE happens on a regular basis? The answer to such a question is a set of rules
or guidelines describing situations of trends or relationships, expressed as follows:
if A happens, then B and C happen X amount of the time.

• Questions dealing with the prevalence or lack of a particular type of change.

These types of questions often address finding hidden dependencies or relationships,
which can be very important for impact analysis. MSR aims to identify the actual
impact set after an actual change. However, MSR techniques often give a "best
guess" for the change: the change may not be explicitly documented, and thus it
sometimes must be inferred.

A clustering approach for semi-automated software maintenance


[KDTM06] presents a framework for knowledge acquisition from source code
in order to comprehend an object-oriented system and evaluate its maintainability.
Specifically, clustering techniques are used to assist engineers with understanding
the structure of source code and assessing its maintainability. The proposed
approach is applied to a set of elements collected from source code, including:

• Entities that belong either to the behavioural domain (classes, member methods)
or the structural domain (member data).

• Attributes that describe the entities (such as class name, superclass, method name,
etc.).

• Metrics used as additional attributes that help the software maintainer
comprehend the system under maintenance more thoroughly.

The above elements specify the data input model of the framework. Another
part of the framework is an extraction process, which aims to extract elements and
metrics from source code. The extracted information is then stored in a relational
database so that the data mining techniques can be applied. In this approach,
clustering techniques are used to analyse the input data and provide the maintenance
engineer with a rough grasp of the software system. Clustering produces overviews
of systems by creating mutually exclusive groups of classes, member data and methods
based on their similarities. Moreover, it can assist with discovering programming
patterns and outlier (unusual) cases which may require attention.

Another problem that we have to tackle in software engineering is the corrective
maintenance of software. It would be desirable to identify software defects
before they cause failures. It is likely that many failures fall into small groups,
each consisting of failures caused by the same software defect. Recent research
has focused on data mining techniques which can simplify the problem of classifying
failures according to their causes. These approaches require that three
types of information about executions are recorded and analysed: i) execution
profiles reflecting the causes of the failures, ii) auditing information that can be used to
confirm reported failures, and iii) diagnostic information that can be used in
determining their causes.

Classification of software failures


A semi-automated strategy for classifying software failures is presented in [PMM+03].
This approach is based on the idea that if m failures are observed over some period
during which the software is executed, it is likely that these failures are due to a
substantially smaller number of distinct defects. Assume that F = {f_1, f_2, ..., f_m} is
the set of reported failures and that each failure is caused by just one defect. Then
F can be partitioned into k < m subsets F_1, F_2, ..., F_k such that all of the failures in
F_i are caused by the same defect d_i, 1 ≤ i ≤ k. This partitioning is called the true
failure classification. In the sequel, we describe the main phases of the strategy for
approximating the true failure classification:

1. The software is instrumented to collect and transmit to the development organisation
either execution profiles or captured executions, and then it is deployed.

2. Execution profiles corresponding to reported failures are combined with a random
sample of profiles of operational executions for which no failures were reported.
This set of profiles is analysed to select a subset of all profile features to
use in grouping related failures. A feature of an execution profile corresponds
to an attribute or element of it. For instance, a function call profile contains an
execution count for each function in a program, and each count is a feature of
the profile. The feature selection strategy is as follows:

• Generate candidate feature sets and use each one to create and train a
pattern classifier to distinguish failures from successful executions.
• Select the features of the classifier that give the best results.


Figure 44: A cluster hierarchy

3. The profiles of reported failures are analysed using cluster analysis, in order to
group together failures whose profiles are similar with respect to the features
selected in phase 2.

4. The resulting classification of failures into groups is explored in order to
confirm or refine it.

The strategy described above provides an initial classification of software failures.
Depending on the application and the user requirements, these initial classes
can be merged or split so that the software failures are identified in an appropriate
fashion. In [FLMP04], two tree-based techniques for refining an initial classification
of failures are proposed; they are discussed below.

Refinement using dendrograms. A dendrogram is a tree-like diagram used to represent
the results of a hierarchical clustering algorithm. One of the strategies proposed
in the literature for refining an initial failure clustering relies on dendrograms.
Specifically, it uses them to decide how non-homogeneous clusters should be
split into two or more sub-clusters, and to decide which clusters should be considered
for merging. A cluster in a dendrogram corresponds to a subtree that represents
the relationships among the cluster's sub-clusters. The more similar two clusters are to
each other, the farther away from the dendrogram root their nearest common ancestor
is. For instance, based on the dendrogram presented in Figure 44, we can
observe that clusters A and B are more similar than clusters C and D. A cluster's
largest homogeneous subtree is the largest subtree consisting of failures with
the same cause. If a clustering is too coarse, some clusters may have two or more
large homogeneous subtrees containing failures with different causes. Such a cluster
should be split at the level where its large homogeneous subtrees are connected,
so that these subtrees become siblings, as Figure 46 shows. If the clustering is too
fine, siblings may be clusters containing failures with the same causes. Such sibling
clusters should be merged at the level of their parent, as Figure 45 depicts.

Figure 45: Merging two clusters. The new cluster A contains the clusters represented
by the two homogeneous subtrees A1 and A2.
Based on these definitions, the strategy proposed for refining an initial
classification of failures using dendrograms has three phases:

1. Select the number of clusters into which the dendrogram will be divided.

2. Examine the individual clusters for homogeneity by choosing the two executions
in the cluster with maximally dissimilar profiles. If the selected executions have
the same or related causes, it is likely that all of the other failures in the cluster
do as well. If the selected executions do not have the same or related causes, the
cluster is not homogeneous and should be split.

3. If neither the cluster nor its sibling is split by step 2, and the failures examined
have the same cause, the two are merged.

Clusters that have been generated by merging or splitting should be analysed
in the same way, which allows for recursive splitting or merging.
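A minimal sketch of building such a failure-profile hierarchy, assuming SciPy; the execution profiles below are invented.

    # Hierarchically cluster failure profiles and cut the dendrogram.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Each row is an execution profile (e.g. per-function call counts)
    # of a failed execution.
    profiles = np.array([
        [10, 0, 3], [11, 0, 2],   # failures likely sharing one defect
        [0, 9, 7], [1, 8, 8],     # failures likely sharing another defect
    ])

    Z = linkage(profiles, method="average")          # linkage matrix (dendrogram)
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut into two clusters
    print(labels)                                    # e.g. [1 1 2 2]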

Refinement using classification trees. The second technique proposed by Francis et
al. relies on building a classification tree to recognise failed executions. A classification
tree is a type of pattern classifier that takes the form of a binary decision tree.
Each internal node in the tree is labelled with a relational expression that compares
a numeric feature of the object being classified to a constant splitting value. Each
leaf of the tree is labelled with a predicted value indicating which class of interest
the leaf represents.
Given the classification tree, an object is classified by traversing the tree from
the root to a leaf. At each step of the traversal prior to reaching a leaf, we evaluate
the expression at the current node. When the object reaches a leaf, the predicted
value of that leaf is taken as the predicted class for that object.


Figure 46: Splitting a cluster: The two new clusters (subtrees with roots A11 and
A12) correspond to the large homogeneous subtrees in the old cluster.

In the case of the software failure classification problem, we consider two classes:
success and failure. The Classification And Regression Tree (CART) algorithm
was used to build the classification tree corresponding to software failures.
Assume a training set of execution profiles

L = \{(x_1, j_1), \ldots, (x_N, j_N)\}

where each x_i represents an execution profile and j_i is the result (success/failure)
associated with it. The steps of building the classification tree based on L are as
follows (a small sketch of the split selection follows the list):

• The deviance of a node t ⊆ L is defined as

d(t) = \frac{1}{N_t} \sum_{(x_i, j_i) \in t} \left( j_i - \bar{j}(t) \right)^2

where N_t is the size of t and \bar{j}(t) is the average value of j in t.

• Each node t is split into two children t_L and t_R. The split is chosen so that it
maximises the reduction in deviance. That is, from the set of possible splits S, the
optimal split is found by

s^{*} = \arg\max_{s \in S} \left[ d(t) - \frac{N_{t_L}}{N_t} d(t_L) - \frac{N_{t_R}}{N_t} d(t_R) \right]

• A node is declared a leaf node if d(t) ≤ β, for some threshold β.

• The predicted value for a leaf is the average value of j among the executions in
that leaf.
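A small sketch of this split selection in plain Python: for each candidate split of a node, compute the reduction in deviance and keep the split that maximises it. The data are invented.

    # CART-style split selection on one numeric feature.
    def deviance(j):
        mean = sum(j) / len(j)
        return sum((x - mean) ** 2 for x in j) / len(j)

    def best_split(xs, js):
        # xs: one numeric profile feature; js: 0/1 result (success/failure)
        n, best = len(js), None
        for th in sorted(set(xs))[:-1]:        # candidate thresholds
            left = [j for x, j in zip(xs, js) if x <= th]
            right = [j for x, j in zip(xs, js) if x > th]
            gain = deviance(js) - len(left) / n * deviance(left) \
                                - len(right) / n * deviance(right)
            if best is None or gain > best[1]:
                best = (th, gain)
        return best

    print(best_split([1, 2, 8, 9], [0, 0, 1, 1]))  # -> (2, 0.25)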


Analysing Bug Repositories


Source code repositories store a wealth of information that is not only useful for
managing and building source code, but also constitutes a detailed log of how the
source code has evolved during development. Evidence of source code refactoring
is stored in the repository; as bugs are fixed, the changes made to correct the
problems are recorded; and as new APIs are added to the source code, the proper
way to use them is implicitly explained in the source code. One of the challenges,
then, is to develop tools and techniques to automatically extract and use this
information.
In [WH05], a method is proposed which uses data describing bug fixes, mined
from the source code repository, to improve the static analysis techniques used to find
bugs. It is a two-step approach in which the source code change history of a software
project helps to refine the search for bugs.
The first step in the process is to identify the types of bugs that are being fixed in
the software. The goal is to review the historical data stored for the software project,
in order to gain an understanding of what data exists and how useful it may be for the
task of bug finding. Many of the bugs found in the CVS history are good candidates
for detection by static analysis, such as NULL pointer checks and function return
value checks.
The second step is to build a bug detector driven by these findings. The idea is to
develop a function return value checker based on the knowledge that a specific type
of bug has been fixed many times in the past. Briefly, this checker looks for instances
where the return value from a function is used in the source code before being
tested. Using a return value could mean passing it as an argument to a function,
using it as part of a calculation, dereferencing the value if it is a pointer, or overwriting
the value before it is tested. Cases where return values are never stored by the
calling function are also checked. Testing a return value means that some control flow
decision relies on the value.
The checker does a data-flow analysis on the variable holding the returned value,
only to the point of determining whether the value is used before being tested. It simply
identifies the original variable the returned value is stored into and determines the
next use of that variable. If the variable, at its next use, is an operand of a
comparison in a control flow decision, the return value is deemed to be tested before
being used. If the variable is used in any other way before being used in a control flow
decision, the value is deemed to be used before being tested. A small amount
of inter-procedural analysis is also performed in order to improve the results: it is often
the case that a return value will be immediately used as an argument in a call to a
function, and in these cases the checker determines whether that argument is tested
before being used in the called function.
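A grossly simplified illustration of the 'used before tested' heuristic, operating on a toy line-based model of C code; a real checker performs data-flow analysis on a proper program representation, and the code fragment is invented.

    # Flag return values that are used before being tested.
    import re

    code = [
        "buf = malloc(size);",
        "memcpy(buf, src, size);",     # buf dereferenced before the NULL test
        "if (buf == NULL) return;",
    ]

    assign = re.compile(r"(\w+)\s*=\s*\w+\(")   # var = some_function(...)
    for i, line in enumerate(code):
        m = assign.match(line)
        if not m:
            continue
        var = m.group(1)
        for later in code[i + 1:]:
            if var not in later:
                continue
            if not later.lstrip().startswith(("if", "while")):
                print(f"warning: {var} used before being tested: {later.strip()}")
            break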
Moreover, the checker categorises the warnings it finds into one of the following
categories:


• Warnings are flagged for return values that are completely ignored or if the
return value is stored but never used.

• Warnings are also flagged for return values that are used in a calculation before
being tested in a control flow statement.

Any return value passed as an argument to a function before being tested is flagged,
as is any pointer return value that is dereferenced without being tested.
However, there are types of functions that lead the static analysis procedure to
produce false positive warnings. Without previous knowledge, it is difficult to
tell which functions do not need their return values checked. Mining techniques
applied to the source code repository can help improve the static analysis results:
the data mined from the source code repository and from the current version of
the software is used to determine the actual usage pattern for each function.
In general terms, it has been observed that the bugs catalogued in bug databases
and those found by inspecting source code change histories differ in type and level
of abstraction. Software repositories record all the bugs fixed, from every step in the
development process, and thus provide much useful information. A bug finding
system therefore proves more effective when it automatically mines data from
source code repositories.

Mining the Source Code Repository


Williams et al. [WH05] propose the use of an analysis tool to automatically mine data
from the source code repository by inspecting every source code change in the
repository. Specifically, they try to determine when a bug of the type they are concerned
with is fixed. A source code checker is developed (as described above) which is used
to determine when a potential bug has been fixed by a source code change. The
checker is run over both versions of the source code. If, for a particular function
called in the changed file, the number of calls remains the same and the number of
warnings produced by the tool decreases, the change is said to fix a likely bug. If
it is determined that a check has been added to the code, the function that
produces the return value is flagged as being involved in a potential bug fix in a CVS
commit. The result of the mining is a list of functions that are involved in a potential
bug fix in a CVS commit.
The output of the function return value checker is a list of warnings denoting
instances in the code where a return value from a function is used before being
tested, each with a full description including the source file, line number and
category of the warning. Since there are many reasons that could lead a static
analysis to produce a large number of false positive warnings, the proposed tool
provides a ranking of the warnings, from least likely to most likely to be a false
positive. The ranking is done in two parts. First, the functions are divided into those
that are involved in a potential bug fix in a CVS commit and those that are not. Next,
within each group, the functions are ranked by how often their return values are
tested before being used in the current version of the software.

4.2.2 A Data Mining approach to automated software testing

The evaluation of software is based on tests that are designed by software testers.
The evaluation of test outputs is thus associated with considerable effort by human
testers, who often have imperfect knowledge of the requirements specification. This
manual approach to testing software results in heavy losses to the world's economy.
The interest of researchers has therefore focused on the development of automated
techniques that induce functional requirements from execution data. Data mining
approaches can be used to extract useful information from the tested software
which can assist with software testing. Specifically, the induced data mining models
of tested software can be used for recovering missing and incomplete specifications,
designing a set of regression tests, and evaluating the correctness of software
outputs when testing new releases of the system.
In developing a large system, testing of the entire application (system testing)
follows the stages of unit testing and integration testing. The activities of
system testing include function testing, performance testing, acceptance testing
and installation testing. Function testing aims to verify that the system performs
its functions as specified in the requirements and that there are no undiscovered errors
left. A test set is considered adequate if it causes all incorrect versions of the
program to fail, so the selection of tests and the evaluation of their outputs are
crucial for improving the quality of the tested software at lower cost. Assuming that
requirements can be re-stated as logical relationships between inputs and outputs,
test cases can be generated automatically by techniques such as cause-effect graphs
[Pfl01] and decision tables [LK03b]. In order to stay useful, a software system has
to undergo continual change. The most common maintenance activities in the software
life-cycle include bug fixes, minor modifications, improvements of basic functionality
and the addition of brand new features.
The purpose of regression testing is to identify new faults that may have been
introduced into the basic features as a result of enhancing software functionality or
correcting existing faults. A regression test library is a set of test cases that run
automatically whenever a new version of the software is submitted for testing. Such
a library should include a minimal number of tests that cover all possible aspects
of system functionality. A standard way to design a regression test library is to
identify equivalence classes for every input and then use only one value from each edge
(boundary) of every class; a toy sketch of this follows below. One of the main problems
is the generation of a minimal test suite which covers as many cases as possible.
Ideally, such a test suite can be generated from a complete and up-to-date specification
of the functional requirements. However, frequent changes make the original
requirements specification hardly relevant to new versions of the software. To ensure
the effective design of new regression test cases, one has to recover the actual
requirements of the existing system. Thus, a tester can analyse system specifications,
perform structural analysis of the system's source code, and observe the results of
system execution in order to define input-output relationships in the tested software.
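A toy sketch of boundary-value test generation from equivalence classes, in plain Python; the input range is invented.

    # Generate candidate test inputs at and around the edges of a
    # valid equivalence class [lo, hi].
    def boundary_values(lo, hi):
        return [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1]

    # Equivalence classes for an input 'age' of a hypothetical function:
    # invalid (< 0), valid (0..120), invalid (> 120)
    tests = sorted(set(boundary_values(0, 120)))
    print(tests)  # [-1, 0, 1, 119, 120, 121]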
An approach that aims to automate the input-output analysis of execution data
based on a data mining methodology is proposed in [LFK05]. This methodology
relies on the info-fuzzy network (IFN), which has an 'oblivious' tree-like structure.
The network components include the root node, a changeable number of hidden
layers (one layer for each selected input), and the target (output) layer representing
the possible output values. The same input attribute is used across all nodes of a
given layer (level), while each target node is associated with a value (class) in the
domain of the target attribute. If the IFN model is aimed at predicting the values of
a continuous target attribute, the target nodes represent disjoint intervals in the
attribute range.
A hidden layer l consists of nodes representing conjunctions of values of the first
l input attributes, similar to the definition of an internal node in a standard
decision tree. The final (terminal) nodes of the network represent non-redundant
conjunctions of input values that produce distinct outputs. Considering that the
network is induced from execution data of a software system, each interconnection
between a terminal node and a target node represents a possible output of a test case.

Figure 47: An example of Info-Fuzzy Network structure [LFK05]

Figure 47 presents an IFN structure where the internal nodes include the nodes
(1,1), (1,2), 2, (3,1) and (3,2), and the connection (1,1) → 1 implies that the expected
output value for a test case where both input variables are equal to 1 is also 1. The
connectionist nature of the IFN resembles the structure of a multi-layer neural network;
therefore, the IFN model is characterised as a network and not as a tree.
A separate info-fuzzy network is constructed to represent each output variable.
Below we present the algorithm for building an info-fuzzy network for a single
output variable.

Network Induction Algorithm. The induction procedure starts by defining the target
layer (one node for each target interval or class) and the "root" node. The root
node represents an empty set of input attributes; inputs are selected incrementally
to maximise a global decrease in the conditional entropy of the target attribute.
Unlike decision tree algorithms such as CART and C4.5, the IFN algorithm is based
on a pre-pruning approach: when no attribute causes a statistically significant
decrease in the entropy, the network construction is stopped. The algorithm performs
discretisation of continuous input attributes "on the fly" by recursively finding a
binary partition of an input attribute that minimises the conditional entropy of the
target attribute [FI93]. The search for the best partition of an attribute is dynamic
and is performed each time a candidate input attribute is evaluated. Each hidden
node in the network is associated with an interval of a discretised input attribute.
The estimated conditional mutual information between the partition of the interval S
at the threshold Th and the target attribute T, given the node z, is defined as follows:

MI(Th; T/S, z) = \sum_{t=0}^{M_T - 1} \sum_{y=1,2} P(S_y; C_t; z) \cdot \log \frac{P(S_y; C_t/S, z)}{P(S_y/S, z) \cdot P(C_t/S, z)}

where

• P(S_y/S, z) is an estimated conditional probability of a sub-interval S_y, given
the interval S and the node z.

• P(C_t/S, z) is an estimated conditional probability of a value C_t of the target
attribute T, given the interval S and the node z.

• P(S_y; C_t; z) is an estimated joint probability of a value C_t of the target attribute
T, a sub-interval S_y and the node z.

The statistical significance of splitting the interval S by the threshold Th at
the node z is evaluated using the likelihood-ratio statistic. A new input attribute
is selected to maximise the total significant decrease in the conditional entropy
resulting from splitting the nodes of the last layer. The nodes of the new hidden layer
are defined as the Cartesian product of the split nodes of the previous layer and the
discretised intervals of the new input variable. If no input variable decreases
the conditional entropy of the output variable, the network construction stops.
The IFN induction procedure is a greedy algorithm which is not guaranteed to find
the optimal ordering of input attributes. Though some functions are highly sensitive
to this ordering, alternative orderings will still produce acceptable results in most
cases.


An IFN-based environment for automated input-output analysis is presented in [LFK05].
The main modules of this environment are:

• Legacy system (LS). This module represents a program, a component or a system
to be tested in subsequent versions of the software.

• Specification of Application Inputs and Outputs (SAIO). Basic data on each input
and output variable in the Legacy System.

• Random test generator (RTG). This module generates random combinations of
values in the range of each input variable.

• Test bed (TB). This module feeds training cases generated by the RTG module
to the LS.

The IFN algorithm is trained on inputs provided by the RTG and outputs obtained
from the legacy system by means of the Test Bed module. A separate IFN model is
built for each output variable. The information derived from each IFN model can be
summarised as follows:

• A set of input attributes relevant to the corresponding output.

• Logical (if ... then ...) rules expressing the relationships between the selected
input attributes and the corresponding output. The set of rules appearing at
each terminal node represents the distribution of output values at that node.

• Discretisation intervals of each continuous input attribute included in the network.
Each interval represents an "equivalence" class, since for all values of a
given interval the output values conform to the same distribution.

• A set of test cases. The terminal nodes in the network are converted into test
cases, each representing a non-redundant conjunction of input values / equivalence
classes and the corresponding distribution of output values.

The IFN algorithm takes as input the training cases that are randomly generated
by the RTG module and the outputs produced by the LS for each test case. The IFN
algorithm runs repeatedly to find a subset of input variables relevant to each output
and the corresponding set of non-redundant test cases. Actual test cases are
generated from the automatically detected equivalence classes by using an existing
testing policy. A sketch of this pipeline is given below.
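A sketch of the RTG, legacy system and Test Bed pipeline under stated assumptions: since IFN implementations are not generally available, a decision tree (scikit-learn) stands in for the IFN to induce if-then input-output rules from random tests, and the 'legacy' function is invented.

    # Random test generation and rule induction for a toy legacy system.
    import random
    from sklearn.tree import DecisionTreeClassifier, export_text

    def legacy_system(a, b):              # the system under test (invented)
        return 1 if a > 5 and b <= 3 else 0

    random.seed(0)
    X = [[random.randint(0, 10), random.randint(0, 10)] for _ in range(200)]
    y = [legacy_system(a, b) for a, b in X]   # Test Bed feeds RTG cases to LS

    model = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(export_text(model, feature_names=["a", "b"]))  # induced rules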

4.3 Text Mining and Software Engineering


Software engineering repositories consist of text documents containing source code,
mailing lists, bug reports and execution logs. The mining of textual artifacts is
therefore requisite for many important activities in software engineering: tracing of
requirements; retrieval of components from a repository; identifying and predicting
software failures; software maintenance; testing; etc.
This section describes the state of the art in text mining and the application of text
mining techniques in software engineering. Furthermore, a comparative analysis of
the text mining techniques applied in software engineering is provided, and future
directions are discussed.

4.3.1 Text Mining - The State of the Art

Text mining is the process of extracting knowledge and patterns from unstructured
document text. It is a young interdisciplinary research field under the wider area of
data mining, drawing on information retrieval, machine learning and computational
linguistics. The methods deployed in text mining, depending on the application,
usually require the transformation of the texts into an intermediate structured
representation, which can be, for example, the storage of the texts in a database
management system according to a specific schema. In many approaches, though, there
is benefit in also keeping a semi-structured intermediate form of the texts, for example
the representation of documents as a graph, on which social analysis
and graph techniques can be applied.
Independently of the task objective, text mining requires preprocessing techniques,
usually involving qualitative and quantitative analysis of the documents' features.
Figure 48 depicts the most important phases of the preprocessing analysis, as well as
the most important text mining techniques.
Preprocessing assumes a preselected document representation model, usually
the vector space model, though the boolean and probabilistic models are other options.
According to the representation model, documents are parsed, and text terms are
weighted according to weighting schemes like TF-IDF (Term Frequency - Inverse
Document Frequency), which is based on the frequency of occurrence of terms in
the text. Several other options are described in [Cha02, BYRN99]. Natural language
processing techniques are also applied, the state of the art of which is well
described in [Mit05, MS99]. Often, stop-word removal and stemming are applied.
In favour of the use of natural language processing techniques in text mining, it
has been shown in the past that the use of semantic linguistic features, mainly derived
from a language knowledge base like the WordNet word thesaurus [Fel98], can
help text retrieval [Voo93] and text classification [MTV+05]. Furthermore, the use
of word sense disambiguation (WSD) techniques [IV98] is important in several natural
language processing and text mining tasks, like machine translation,
speech processing and information retrieval. Lately, state-of-the-art approaches in
unsupervised WSD [TVA07, MTF04] have pointed the way towards the use of semantic
networks generated from texts, enhanced with semantic information derived
from word thesauri.

Revision: final 106


D2 / IST-2005-33331 SQO-OSS 22nd January 2007

Preprocessing
Storage and Indexing
Feature Extraction
Structured Representation
Term Weighting
Boolean
Dimensionality Reduction
Vector
Text Keyword Characterization
Probabilistic
Natural Language Processing
Non-Overlapping Lists
Part of Speech Tagging
Proximal Nodes
Document Word Sense Disambiguation

Collection Summarization
Semi-Structured Representation

Graph
Phrase Detection
Link Analysis
Entity Recognition
Meta-data
Word Thesauri/Domain Ontologies

Text Mining
Processing
Clustering
Structured and/or
Classification semi-structured data

Retrieval Models/
Patterns/
Social Analysis Answers

Domain Ontology Evolution

Figure 48: Preprocessing, Storage and Processing of Texts in Text Mining

These approaches are to be launched in the text retrieval task, where it is expected
that, under certain circumstances, the representation of texts as semantic networks
can improve retrieval performance.
Another important factor when dealing with unstructured text is the curse of
dimensionality. When handling millions or even billions of documents, the respective
term space is huge and often prohibits applying any type of analysis or feature
extraction. In this direction, techniques based on singular value decomposition, like
latent semantic indexing, or the removal of features with low scores on statistical
weighting measures, are employed. Several examples of such techniques can be found
in [DZ07].
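A minimal sketch of TF-IDF weighting followed by latent semantic indexing via truncated SVD, assuming scikit-learn; the documents are invented.

    # Reduce a TF-IDF term space to a small number of latent dimensions.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = [
        "null pointer dereference fixed in parser",
        "parser crash on malformed input fixed",
        "documentation updated for install guide",
    ]

    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
    print(lsi.shape)  # (3, 2): each document mapped to 2 latent dimensions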
Once feature extraction and natural language processing techniques have been
applied to the document collection, storage takes place with the use of techniques
like inverted indexing. Depending on the application of the text mining methods,
a semi-structured representation of documents, as in [TVA07, MTF04], might be
needed. In such cases, indexing of the respective information (i.e. node types, edge
types, edge weights) is useful.
The text mining techniques mentioned in Figure 48 are representative
and frequently used in many applications. For example, clustering has already been
used in information retrieval and is applied in popular web search engines,
like Vivisimo (publicly available at http://vivisimo.com/). Text classification is widely
used in spam filtering. Text retrieval is a core task with an unrestricted range of
applications, varying from search engines to desktop search. Social analysis can be
applied when any type of link between documents is available, for example publications
and references, or posts in forums and their replies, and is widely used for authority
and hub detection (i.e. finding the most important people in the graph). Finally,
domain ontology evolution is a task where, through the use of other text mining
techniques like clustering or classification, an ontology describing a specific domain
can be evolved and enhanced with term features of new documents pertaining to the
domain. This is really important in cases where the respective domain evolves fast,
making manual updates of the ontology with new concepts and instances prohibitive.

4.3.2 Text Mining Approaches in Software Engineering

Applying text mining techniques in software engineering is a real challenge, mostly
because of the complex nature of the unstructured text involved. Text mining in
software engineering has employed a wide range of text repositories, like document
hierarchies, code repositories, bug report databases, concurrent versioning system
log repositories, newsgroups, mailing lists and several others. Since the aim is to define
metrics which can lead to software assessment and evaluation, while the input data
is unstructured and unrestricted text, the text mining processes in software engineering
are hard to design and, moreover, to apply. The most challenging part is the
selection and preprocessing of the input text sources, along with the design of a
metric that uses one or more text mining techniques applied to these sources
while complying with the existing standards for software engineering metrics.
A discussion of some of the most recent approaches within this scope follows;
Figure 49 summarises the methods and their use.
In [BGD+06], the Apache developer mailing list was used as text input. Entity
resolution was essential, since many individuals used more than one alias. After
constructing the social graph arising from the interconnections between posters and
repliers, the authors performed a social network analysis and arrived at important
findings, like the strong relationship between email activity and source-code-level
activity. Furthermore, social network analysis at that level revealed the important
nodes (individuals) in the discussions. Though graph and link analysis were employed
in the method, node ranking techniques, like PageRank, or other graph processing
techniques, like Spreading Activation, were not used.
In [CC05], another text source has been used, with the aim of predicting the parts of
the source code that will be influenced by fixing future bugs. More precisely, for each


source file the authors used the set of fixed-bug data and the respective CVS commit
notes as descriptors. With the use of a probabilistic text retrieval model, they measure
the similarity between the descriptors of each source file and a new bug description.
In this way they predict the parts of the code likely to be affected by future bug
fixing. Still, the same method could have been viewed from a supervised learning
perspective, and classification along with predictive modelling techniques would have
been a good baseline for their predictions.

• [BGD+06]. Input: e-mail archives of OSS software. Techniques: entity resolution, social network analysis. Output: weighting of OSS participants; relationship of e-mail activity and commit activity.
• [VT06]. Input: CVS repositories. Technique: text clustering. Output: patterns in the development of large software projects (history analysis, major contributions).
• [CC05]. Input: CVS commit notes, set of fixed bugs. Technique: text retrieval. Output: similarity between new bug reports and source code files (prediction).
• [WH05]. Input: CVS repositories, source code. Techniques: text analysis, retrieval, classification. Output: predictions of source code bugs.
• [JS04]. Input: OSSD Web repositories (Web pages, mailing lists, process entity taxonomy). Techniques: text extraction, entity resolution, social network analysis. Output: transformation of data into process events; ordering of processing events.
• [GM03a]. Input: mailing lists, CVS logs, Change Log files. Technique: text summarisation and validation. Output: statistical measures for code changes and developers.

Figure 49: Summary of Recent Text Mining Approaches in Software Engineering
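A toy illustration of descriptor-based retrieval in this spirit: rank source files by textual similarity between their past fix descriptors and a new bug report. It assumes scikit-learn and uses a TF-IDF vector space model rather than the probabilistic model used in [CC05]; file names and texts are invented.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    descriptors = {                   # per-file past fix notes (invented)
        "parser.c": "fix null pointer crash when parsing malformed header",
        "ui.c": "fix button layout and redraw flicker in settings dialog",
    }
    new_bug = "segfault on corrupted header during parse"

    vec = TfidfVectorizer().fit(list(descriptors.values()) + [new_bug])
    sims = cosine_similarity(vec.transform([new_bug]),
                             vec.transform(list(descriptors.values())))[0]
    for name, score in sorted(zip(descriptors, sims), key=lambda p: -p[1]):
        print(name, round(score, 2))  # parser.c should rank first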
Following the same goal, in [WH05] the CVS repositories were mined to obtain
categories of bug fixes. Using a static analysis tool, the authors inspected every source
code change in the software repository and predicted whether a potential bug
in the code had been fixed. These predictions are then ranked through the analysis of
contemporary context information in the source code (i.e. checking the percentage
of invocations of a particular function in which the return value is tested before
being used). The whole mining procedure is based on text analysis of the CVS commit
changes. Experiments conducted on the Apache Web server source code and the Wine
source code showed that the data mined from the software repositories produced good
precision, certainly better than a naive baseline technique.
From another perspective, text mining has been used in software engineering to
validate the data from mailing lists, CVS logs, and change log files of Open Source
software. In [GM03a], a set of tools named SoftChange (publicly available at
http://sourcechange.sourceforge.net/) was created that implements data validation
over the aforementioned text sources of Open Source software. These tools retrieve,
summarise and validate such data for Open Source projects. Part of the analysis can
mark out the most active developers of an Open Source project. The statistics and
knowledge gathered by the SoftChange analysis have not been fully exploited, though,
since further predictive methods could be applied with regard to fragments of code
that may change in the future, or associative analysis between the changes'
importance and the individuals (i.e. were all the changes committed by the most
active developer as important as the rest, in scale and in practice?).
Text mining has also been applied in software engineering for discovering devel-
opment processes. Software processes are composed of events, such as relations of
agents, tools, resources, and activities, organised by control flow structures dictating
that sets of events execute serially, in parallel, iteratively, or that one of the set
is selectively performed. Software process discovery takes as input artifacts of de-
velopment (e.g. source code, communication transcripts, etc.) and aims to elicit the
sequence of events characterising the tasks that led to their development. In [JS04]
an innovative method of discovering software processes from open source software
Web repositories is presented. Their method combines text extraction techniques,
entity resolution and social network analysis, and it relies on process entity
taxonomies for entity resolution. Automatic means of evolving the taxonomy using
text mining could have been employed, so that the method would not depend strictly
on the taxonomy's predefined actions, tools, resources and agents. An example could
be text clustering on the open software text resources and extraction of new candidate
items for the taxonomy from the clusters' labels.
Text clustering has also been used in software engineering to discover
patterns in the history and the development process of large software projects. In
[VT06] they used CVSgrab to analyse the ArgoUML and PostgreSQL reposito-
ries. By clustering the related resources, they generated the evolution of the projects
based on the clustered file types. Useful conclusions can be drawn by careful man-
ual analysis of the generated visualised project development histories. For example,
they discovered that in both projects there was only one author for each major ini-
tial contribution. Furthermore, they came to the conclusion that PostgreSQL did not
start from scratch, but was built on top of some previous project. An interesting evo-
lution of this work could be a more automated way of drawing conclusions from the
development history, for example extracting cluster labels, mapping them to a tax-
onomy of development processes, and automatically extracting the development phases,
annotated with concepts from the taxonomy.
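A minimal sketch of such automation follows; it clusters repository files by the
vocabulary of their commit messages, a rough stand-in for the CVSgrab-based
analysis, with hypothetical file paths and messages.

    # Cluster files by the kind of activity recorded in their commit messages.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    # Hypothetical data: concatenated commit-message history per file.
    histories = {
        "src/parser.c": "fix crash in parser fix memory leak refactor parser",
        "doc/manual.txt": "update docs fix typo rewrite documentation chapter",
        "src/net.c": "fix socket bug refactor networking fix timeout",
    }

    X = TfidfVectorizer().fit_transform(histories.values())
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

    # The cluster labels separate code-maintenance files from documentation.
    for path, label in zip(histories, labels):
        print(path, "-> cluster", label)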

4.4 Future Directions of Data/Text Mining Applications in Software Engineering
Defining software engineering metrics with the use of text mining need not differ
from following the existing standards for defining direct or indirect metrics
for evaluating software using any background knowledge. The IEEE Standard 1061
[IEE98] defines a methodology for developing metrics for software quality attributes.
A framework for evaluating proposed metrics in software engineering, according to
the IEEE 1061 Standard, is discussed in [KB04a]. The latter refers to ten questions
that need to be answered when defining software evaluation measures.
Though any design and implementation of a method using text mining for soft-
ware evaluation must follow the aforementioned and/or related standards, there is
common ground in how text mining can be used in future directions, alongside or
on top of the described techniques. A short description of issues that would be
interesting to address in the context of this project follows.

• Social network analysis, for the purpose of discovering the important cluster
of individuals in a software project, using more sophisticated graph
processing techniques, like PageRank or Spreading Activation.
Social network analysis is in fact a set of long-established algorithms that have
been applied in other contexts. The ‘future direction’ is to extend and apply
them in the context of SQO-OSS, aiming at ranking relevant entities appearing
in software development (a toy sketch appears after this list).

• Supervised Learning approaches, like text classification, based on predictive
modelling techniques, for the purpose of predicting future bugs and/or possi-
bly affected parts of code. A measure of the future influence of bugs in the source
code, associated with a weight and a prediction ranking, can reveal a lot about
the software quality.

• Text clustering of the bug reports, and cluster labelling, can be used to auto-
matically create a taxonomy of bugs in the software. Metrics on that taxonomy
can be defined to show the influence of bugs belonging to one category on other
categories. This can also be translated into a metric of bug influence across the
software project.

• Graph mining techniques to detect hidden structures in an OSS (Open Source
Software) project. A complex graph can be created based on function rela-
tions as defined by the function calls in a project. A program execution is then
a path in this graph. Using graph mining techniques (link analysis algorithms,
min-cut algorithms), we could derive correlations of paths leading to errors,
predict software behaviour given the first k steps, and statistically analyse large
numbers of paths to support decisions.


We can also consider graphs created from the existing OSS software and the
communication data. This implies a graph G(V, E), where each node in V
represents a user and each edge in E represents an interaction, e.g. an e-mail
exchange. Applying mining techniques we can extract useful information from such
a graph, predict individual actions (i.e. what and when the next action of a user
will be) and calculate aggregate measures regarding the software quality.
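The toy sketch below illustrates the first direction above: developers are nodes,
e-mail exchanges are weighted edges, and PageRank ranks the participants. The
names and message counts are hypothetical.

    # Rank developers in a communication graph with PageRank.
    import networkx as nx

    # Each triple (a, b, w) means "a sent w messages to b" (hypothetical data).
    edges = [
        ("alice", "bob", 12), ("bob", "alice", 8),
        ("carol", "alice", 5), ("dave", "carol", 2),
        ("bob", "carol", 3),
    ]
    G = nx.DiGraph()
    G.add_weighted_edges_from(edges)

    # A higher PageRank score indicates a more central discussion participant.
    for dev, score in sorted(nx.pagerank(G, weight="weight").items(),
                             key=lambda kv: -kv[1]):
        print(f"{dev}: {score:.3f}")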


5 Related IST Projects


This section contains information about related IST projects. The following list was
taken from the draft agenda of the Software Technologies Concertation Meeting, 25
September 2006, Brussels. The projects are presented in alphabetical order.

5.1 CALIBRE
CALIBRE was an EU FP6 Co-ordination Action project that involved the leading au-
thorities on libre/Open Source software. CALIBRE brought together an interdis-
ciplinary consortium of 12 academic and industrial research teams from France,
Ireland, Italy, the Netherlands, Poland, Spain, Sweden, the UK and China.
The two-year project managed to:

• Establish a European industry Open Source software research policy forum

• Foster the effective transfer of Open Source best practice to European industry

• Integrate and coordinate European Open Source software research and prac-
tice

CALIBRE aimed to coordinate the study of the characteristics of open source soft-
ware projects, products and processes; distributed development; and agile meth-
ods. This project integrated and coordinated these research activities to address
key objectives for open platforms, such as transferring lessons derived from open
source software development to conventional development and agile methods, and
vice versa.
CALIBRE also examined hybrid models and best practices to enable innovative
reorganisation of both SMEs and large institutions, and aimed to construct a com-
prehensive research road-map to guide future Open Source software research. To
secure long-term impact, an important goal of CALIBRE was to establish a Euro-
pean Open Source Industry Forum, CALIBRATION, to coordinate policy making into
the future. The CALIBRATION Forum and the results of the CALIBRE project were
disseminated through a series of workshops and international conferences in the
various partner countries.
The first public deliverable of CALIBRE was to present an initial gap-analysis of
the academic body of knowledge of Libre Software, as represented by 155 peer-
reviewed research artifacts. The purpose of this work was to support the wider
CALIBRE project goal of articulating a road map for Libre Software research and
exploitation in a European context. For the gap-analysis, a representative collection
of 155 peer-reviewed Libre Software research artifacts was examined, attempting
to answer three broad questions about each:

• Who are we (the academic research community) looking at?


• What questions are we asking?

• How are we trying to find the answers?

The artifacts were predominantly research papers published in international jour-
nals or peer-reviewed anthologies, and/or presented at international conferences.
The papers were discovered through citation indices (e.g. EBSCO, Science-Direct,
ACM Portal) and through recursion using the references cited within papers. Peer
review was the key criterion for inclusion, as this represented the official body of
knowledge; however, two particularly influential non-reviewed books [DOS99, Ray01]
were also included.
In the second publicly available report of CALIBRE, the development model of Li-
bre software was addressed. This report described what the research community has
learnt about those models, and the implications for future research lines that those
lessons have. Among the different research approaches applied to understanding
Libre software, there was a focus on the empirical study of Libre software develop-
ment, based on quantitative data, usually available from the public repositories of
the studied projects. In the report, the peculiarities of Libre software development
from a research perspective were also studied, concluding that it is quite an inter-
esting field in which to apply the traditional scientific methodology, thanks to this
wealth of public data, covering large parts of the development activities and results.
From this standpoint, the early and current research was reviewed, offering a sam-
ple of the most interesting and promising results, the tools, approaches and method-
ologies used to reach those, and the current trends in the research community. The
report ended with two chapters summarising the most important implications of the
current research for the main actors of Libre software development (Libre software
developers themselves, companies interested in Libre software development, and
the software industry in general), and a road-map for the future in this field. This
report was not considered a set of proven recommendations and forecasts. On the
contrary, it was intended as a starting point for discussion, trying to highlight those
aspects most relevant to its authors, while certainly missing many others of equal (or
greater) interest.
In the third deliverable of CALIBRE there was a focus on complexity as a major
driver of software quality and costs, both in the traditional sense of software com-
plexity and in the sense of complexity theory. The analysis of a benchmark database
of 10 large Libre & open-source projects, suggested that:

• Risk evaluations could adequately supplement cost estimations of Libre software
products

• Maintenance teamwork seems to be generally correlated with complexity metrics
in large Libre software projects


• Libre software projects can be categorised, first, into small (I-Mode) and
large (C-Mode) projects in the context of an entrepreneurial analysis of Li-
bre software, and, second, by means of a dynamic and open meta-maintenance fo-
rum which would provide a standard quality assessment model to all software-
enabled industries, and especially to the secondary software sector

Another deliverable of CALIBRE was to present an overview of the field of
distributed development of software systems and applications (DD). Based on an anal-
ysis of the published literature, including its use in different industrial contexts,
the document provided a preliminary analysis which established the basic charac-
teristics of DD in practice. The analysis resulted in a framework that structured
existing DD knowledge by focusing on threats to communication, coordination and
control caused by temporal distance, geographical distance, and socio-cultural dis-
tance. The purpose of this work was to support the wider CALIBRE project goal of
articulating a road-map for DD in relation to Libre Software research and exploita-
tion in a European context. Ultimately, this road-map would form a partial basis
for the development of the next generation software development paradigm, which
would integrate DD, Libre software and agile methods.
The next deliverable of this project provided an analysis of the process dimension
for distributed software development. This included an investigation of a number of
company case studies in various contexts, and presented a reference model for suc-
cessful distributed development. This model was tailored for distributed scenarios
in which time differences are low, as is the case in intra-EU collaborations. The
study was broadened to consider strategies for successful Libre (Free/Open Source)
software development, and then consider the technology dimension of distributed
development. This deliverable was positioned with respect to a road-map for re-
search in the domain of Libre software development.
The establishment of this research road-map was the objective of the next deliver-
able. This started with a discussion of some of the tensions and paradoxes inherent in
FOSS generally, which served as the engine driving the phenomenon. Then the
emergent OSS 2.0 was characterised in terms of the tensions and paradoxes that are
associated with it. Furthermore, a number of business strategies that underpin OSS
2.0 were identified. To exemplify the industrial impact of the phenomenon six inter-
views with leading industrial partners using Libre/OSS in different vertical domains
were presented, forming a series of industrial viewpoints. Following this the discus-
sion of the impact of OSS 2.0 was presented for the IS development process, and its
wider implications for organisational and societal processes more generally. Finally,
this document concluded with a road-map for European research on Libre/OSS,
summarising and highlighting the history of Free/Libre/OSS, the current status and the
areas where more research is needed.
Agile Methods (AMs) were the focus of another public deliverable of CALIBRE.
AMs have grown very popular in the last few years and so has Libre Software. Both


AMs and Libre Software push for a less formal and hierarchical, and more human-
centric, development, with a major emphasis on the ultimate goal of development:
producing the running system with the correct amount of functionality.
This deliverable presented an attempt to deepen the understanding of the analogies
between the two approaches and to identify how such analogies may help in getting
a deeper understanding of both. The relationships were analysed theoretically and
experimentally, with a final, concrete case study of a company adopting both the XP
development process and Libre Software tools.
Other deliverables of CALIBRE reported on the groundwork for future research
within the CALIBRE project, leading towards the overall project goal of articulating
a road-map for Libre Software in the European context. The research was shaped by
the concerns expressed by the CALIBRE industry partners in the various CALIBRE
events to date. Specifically, industry partners, notably Paul Everett of the Zope
Europe Association (ZEA), have identified that the primary challenge for Libre software
count of, and in fact leverages, the unique business model dynamics associated with
Libre software licensing and processes. The document described a framework for
analysing Libre software business models, an initial taxonomy of model categories,
and a discussion of organisational and network agility based on ongoing research
within the ZEA membership.
Another deliverable of the CALIBRE project presented a selection of product and
process metrics defined in various suites, frameworks and categorisations to date.
Each metric was analysed for citations and applications to both agile and Libre devel-
opment approaches. Opportunities for migration and knowledge transfer between
these areas were stressed and outlined. The document also summarised product
maturity models available for Open Source software and emphasised the need for
alternative approaches to shaping Open Source process maturity models.
The CALIBRE project also produced the CALIBRE Working Environment (CWE),
and a deliverable described its first version. The requirements for the system were
described, and the way in which the CWE addresses these requirements was
identified. The CWE require-
ments were identified collaboratively, in consultation with its users, and the system
as it stands largely meets the needs of the users. The software and hardware used to
implement the CWE was described, and areas for further work were identified. The
current CWE is located at http://hemswell.lincoln.ac.uk/calibre/ and allows
registered members to prepare content, with varying levels of dissemination (pub-
lic, restricted to registered members and private), upload documents and files, add
events to a shared calendar and archive mailing list information.
The last publicly available deliverable of CALIBRE focused on Education and
training on Libre (Free, Open Source) software. In this report, a scenario which
could be considered as the second generation in Libre software training was pre-


sented: the compendium of knowledge and experiences needed to deal with the
many facets of the Libre software phenomenon. For this goal, higher education was
considered as the best possible framework. The main guidelines of such a program
on Libre software were proposed. In summary, the studies designed in this report
were aimed at providing students with the knowledge and expertise that would make
them experts in Libre software. The programme provided capabilities and enhanced
skills to the point that students can deal with problems ranging from the legal or
economic areas to the more technically oriented ones. It did not (intentionally) focus
on a set of technologies, but approached the Libre software phenomenon from a
holistic point of view. However, it was also designed to provide practical and real
world knowledge. It could be offered jointly by several universities across Europe,
within the framework of the ESHE, or adapted to the specific needs of a single one.
In addition, it could also be adapted for non-formal training.

5.2 EDOS
EDOS stands for Environment for the Development and Distribution of Open Source
software. It is a research project funded by the European Commission as a STREP
project under the IST activities of the 6th Framework Programme. The project in-
volves universities (Paris 7, Tel Aviv, Zurich and Geneva), research institutes (INRIA)
and private companies (Caixa Magica, Nexedi, Nuxeo, Edge-IT and CSP Torino).
The project aims to study and solve problems associated with the production,
management and distribution of Open Source software packages. Software pack-
ages are files in the RPM or Debian packaging format that contain executable pro-
grams or libraries, their files, along with metadata describing what’s in the package
and what conditions are needed to use it.
There are several problems associated with software packages.

• Dependencies: Software packages may need other software packages to run,
and often they do not state exactly which other packages they need but leave
large room for choice. Also, some software packages cannot be installed at the
same time. This makes the job of tools that automatically download required
software packages difficult. Distribution maintainers want to make sure that
there is always a way of selecting available packages to correctly install ev-
ery piece of software they include, and that users can upgrade their systems
without losing functionality. Work package 2 handled these issues (a toy sketch
of the underlying installability check appears after this list).
The stated goal of EDOS Work package 2 was:
The stated goal of EDOS Work package 2 was:

To build new generation tools for managing large sets of software packages,
like those found in Free software distributions, using formal methods


The focus was mainly on the issues related to dependency management for
large sets of software packages, with particular attention to what must be
done to maintain consistency of a software distribution on the repository side,
as opposed to maintaining a set of packages on a client machine. This choice
is justified by the fact that maintaining the consistency of a distribution of soft-
ware packages is essential to make sure the current distributions will scale
up, yet it is also an invisible task, as the smooth working it ensures on the
end user side will tend to be considered as normal and obvious as the smooth
working of routing on the Internet. In other words, the project was tackling
an essential infrastructure problem, which was perfectly suited for a Euro-
pean Community funded action. Over the first year and a half of its existence,
the Work Package 2 team of the EDOS project carried out an extensive analysis of
the whole set of problems in its focus, ranging from upstream tracking to
thinning, rebuilding, and dependency management for F/OSS distributions.

• Downloading: Users need to download software packages from somewhere.
This requires a lot of bandwidth and puts strains on the mirrors that host those
packages. This problem would be better solved with peer-to-peer methods.
Work package 4 handles these issues.
The goal of this work package is to investigate scalable and secure solutions to
improving the process of distributing data (source code, binaries, documenta-
tion and meta-data) to end-users. The key issue in the code distribution process
is the ability to transfer a large code base to a large number of people. In
the case of Mandrake Linux, for instance, this entails copying a code base of 20
Gigabytes to a community of up to 4 million users (i.e. the number of
installed versions of Mandrake Linux). This community is growing, so the prob-
lems have to be addressed. Currently the process is quite slow, as it takes 48
hours to copy from the master server to all mirror servers. This creates a latency
problem that leads to inconsistencies on the user and developer side. This in
turn can create awkward dependencies at the module level in future releases.
This work package will test and evaluate two alternative architectures for data
distribution that address the issues of latency and consistency.

• Quality assurance: The complexity of the quality assurance process increases
exponentially with the number of packages and the number of platforms. To
keep the workload manageable, Linux distribution developers are forced to
reduce system quality, reduce the number of packages, or accept long delays
before final releases of a high quality system. Work package 3 handles these
issues.
The goal of the work package is to research and experiment with solutions which
will ultimately allow a dramatic reduction in the costs and delays of quality as-
surance in the process of building an industry grade custom GNU/Linux distri-
bution or custom application comprising several. It will design, implement and
experiment with an integrated quality assurance framework based on code analysis
and runtime tests, which operates at the system level.

• Metrics: Following the “release early, release often” philosophy, Free and Open
Source software is always in constant development, and any serious project has
many versions floating around: older but stable versions, and newer versions
with new features but more bugs. Free software can be of wildly varying
quality. Quality metrics are defined, their relevance is assessed and they are
implemented. Work package 5 handles these issues.
The goal of work package 5 is to develop technology and products that will
improve the efficiency of two key processes and one system. The two processes
are the generation of a new version of a distribution from the previous version
and the production of a customised distribution from an existing one. The
system is the current inefficient mechanism of mirroring the Cooker data, which
needs to be replaced by a more efficient system. In the end, a demonstration
will take place showing that the processes and the system have indeed been
improved. Thus, the goal is to define a set of metrics to measure the efficiency
of the processes in question. These metrics will include man power as measured
in man months, and elapsed time.
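The toy sketch below makes the installability notion from the Dependencies item
concrete by brute-force searching for a conflict-free package selection. The package
data is hypothetical, and real tools (such as the EDOS checkers mentioned later)
encode the problem far more efficiently, e.g. as a satisfiability problem.

    # Brute-force installability check over a toy package repository.
    from itertools import combinations

    PACKAGES = {  # name: (dependencies, conflicts) -- hypothetical data
        "editor": ({"gui", "spell"}, set()),
        "gui": (set(), {"gui-legacy"}),
        "gui-legacy": (set(), {"gui"}),
        "spell": ({"dict"}, set()),
        "dict": (set(), set()),
    }

    def installable(target):
        # Search for a set containing `target` whose members have all their
        # dependencies inside the set and no conflicts with it (exponential,
        # fine for a toy repository; real checkers use SAT solvers).
        names = list(PACKAGES)
        for r in range(1, len(names) + 1):
            for subset in map(set, combinations(names, r)):
                if target in subset and all(
                        PACKAGES[p][0] <= subset and
                        not (PACKAGES[p][1] & subset) for p in subset):
                    return True
        return False

    print("editor installable:", installable("editor"))  # True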

The EDOS project attempts to solve those problems by using formal methods
coming from the academic research groups in the project, to address in a novel way
three outstanding problems:

• Dependency management among large, heterogeneous collections of software
packages.

• Testing and QA for large, complex software systems.

• The efficient distribution of large software systems, using peer-to-peer and dis-
tributed data-base technology.

These problems were studied and various technical reports were produced ex-
plaining their importance and giving ways of mathematically expressing them, algo-
rithms for solving associated problems, and real-world statistics. A certain amount
of software was also produced, which is, of course, Free and Open Source:

• Java software for the peer-to-peer distribution of software packages.

• debcheck/rpmcheck is a very efficient piece of OCaml software for verifying
that a Debian or RPM collection of packages does not contain non-installable
packages.


• The day-to-day evolution of the Debian packages, that is, their detailed history,
can be browsed using anla. This also gives, for every day, reports on instal-
lable software packages and a global installability index for every day (the Debian
weather).

• That history can be queried in the EDOS-designed Debian Query Language
using the command-line tool history or the AJAX-based EDOS Console.

• Ara is a search engine for Debian packages that allows arbitrary boolean com-
binations of field-limited regular expressions, and that ranks results by popu-
larity (again in OCaml).

5.3 FLOSSMETRICS
FLOSSMetrics stands for Free/Libre Open Source Software Metrics.
Industry, SMEs, public administrations and individuals are increasingly relying
on Libre (Free, Open Source) software as a competitive advantage in the globalis-
ing, service-oriented software economy. But they need detailed, reliable and com-
plete information about Libre software, specifically about its development process,
its productivity and the quality of its results. They need to know how to benchmark
individual projects against the general level. And they need to know how to learn
from, and adapt, the methods of collaborative, distributed, agile development found
in Libre software to their own development processes, especially within industry.
FLOSSMETRICS addresses those needs by analysing a large quantity (thousands)
of Libre software projects, using already proven techniques and tools. This analy-
sis will provide detailed quantitative data about the development process, develop-
ment actors, and developed artifacts of those projects, their evolution over time, and
benchmarking parameters to compare projects. Several aspects of Libre software
development (software evolution, human resources coordination, effort estimation,
productivity, quality, etc.) will be studied in detail. The main objective of FLOSS-
METRICS is to construct, publish and analyse a large scale database with informa-
tion and metrics about Libre software development coming from several thousands
of software projects, using existing methodologies, and tools already developed. The
project will also provide a public platform for validation and industrial exploitation
of results.
The FLOSSMetrics targets are to:

• Identify and evaluate sources of data and develop a comprehensive database
structure, built upon the results of CALIBRE (WP1, WP2).

• Integrate already available tools to extract and process such data into a com-
plete platform (WP2).


• Build and maintain an updated empirical database applying extraction tools to
thousands of open source projects (WP3).

• Develop visualisation methods and analytical studies, especially relating to
benchmarking, identification of best practices, measuring and predicting suc-
cess and failure of projects, productivity measurement, simulation and cost/effort
estimation (WP4, WP5, WP6, WP11).

• Disseminate the results, including data, methods and software (WP7).

• Provide for exploitation of the results by producing an exploitation plan, vali-
dated with the project participants from industry, especially from an SME per-
spective (WP8, WP9, WP10).

The main results of FLOSSMETRICS will be: a huge database with factual details
about all the studied projects; some higher level analysis and studies which will help
to understand how Libre software is actually developed; and a sustainable platform
for continued, publicly available benchmarking and analysis beyond the lifetime of
this project. With these results, European industry, SMEs, as well as public adminis-
trations and individuals will be able to make informed decisions about how to benefit
from the competitive advantage of Libre software, either as a development process
or in the evaluation and choice of individual software applications. The project
methodologies and findings go well beyond Libre software with implications for evo-
lution, productivity and development processes in software and services in general.
FLOSSMETRICS is scheduled in three main phases (running partially in parallel).
The first one will set up the infrastructure for the project, and the first version of the
database with factual data. During the second phase most of the studies and analysis
will be performed, and the contents of the database will be enlarged and improved.
During the third phase the results of the project will be validated and adapted to the
needs of the target communities.
The usability of the results of the project (datasets and studies) will be targeted
to several different users: SMEs developing or using Libre software (or even in-
terested in it), industrial players developing Libre software, and the Libre software
community at large. Based on the feedback obtained in these contexts, a complete
exploitation strategy will also be designed.
Dissemination to these communities will be performed using the project website,
specific presentations at conferences, and by organising a series of workshops. Wide
impact of the results will be supported by using open access licenses for all output
documents.
The data is also expected to be useful for the scientific community, which could
use it for their research lines, thus helping to improve the general understanding of
Libre software development.


The impact of the project is expected to be large in the Libre software develop-
ment realm (and in the whole software development landscape). FLOSSMETRICS
will produce the most complete and detailed view of the current landscape of Libre
software, providing not only a static snapshot of how projects are performing now,
but also historical information about the last ten years of Libre software develop-
ment.

5.4 FLOSSWORLD
FLOSSWorld stands for Free, Libre and Open Source Software: Worldwide Impact
Study. The FLOSSWorld project aims to strengthen Europe’s leadership in research
into FLOSS and open standards, building a global constituency with partners from
Argentina, Brazil, Bulgaria, China, Croatia, India, Malaysia and South Africa. So far,
FLOSSWorld is a European Union funded project involving 17 institutions from 12
countries spanning Europe, Africa, Latin America and Asia, undertaking a worldwide
study on the impact of selected issues in the context of Free/Libre Open Source
Software (FLOSS).

Context Free/Libre/Open Source Software (FLOSS) is arguably one of the best
examples of open, collaborative, internationally distributed production and develop-
ment that exists today, resulting in tremendous interest from around the world, from
government, policy, business, academic research and developer communities.

The problem Empirical data on the impact of FLOSS, its use and development
is still quite limited. The FP5 FLOSS project and FP6 FLOSSPOLS project have
helped fill in the gaps in knowledge about why and how FLOSS is developed and
used, but have necessarily been focused on Europe. FLOSS is a global phenomenon,
particularly relevant in developing countries, and thus more knowledge on FLOSS
outside Europe is needed.

Project objectives FLOSSWorld primarily aims to strengthen Europe’s leadership
in international research in FLOSS and open standards, and to exploit research and
policy complementarities to improve international cooperation, by building a global
constituency of policy-makers and researchers. It is expected that FLOSSWorld will
enhance Europe’s leading role in research in the area of FLOSS and strongly em-
bed Europe in a global network of researchers and policy makers, and the business,
higher education and developer communities. FLOSSWorld will enhance the level
of global awareness related to FLOSS development and industry, human capacity
building, standards and interoperability and e-government issues in the geographi-
cal regions covered by the consortium. The project will result in a stronger, sustain-
able research community in these regions. Broad constituency-building exercises
risk losing momentum after initial workshops and meetings without specific actions
to sustain a focus. FLOSSWorld will perform three global empirical studies of proven
relevance to Europe and third countries, which will provide a foundation for FLOSS-
World’s regional and international workshops. The studies will cover topics such
as impact of being in a FLOSS community on career growth and prospects, motiva-
tional factors in choice of FLOSS, perspectives from user community towards FLOSS,
inter-regional differences in FLOSS development methodology, etc.

A four track approach FLOSSWorld is designed around three research tracks,
each providing insights and gathering empirical evidence on important aspects of
FLOSS usage and development:

1. Human capacity building: investigating FLOSS communities as informal skills
development environments, with economic value for employment

2. Software development: spotting the regional and international differences
(technical, organisational, business) between FLOSS projects across countries

3. e-Government policy: reporting adopted policies and behaviour of governments
around the world towards FLOSS, open standards and interoperability

4. Workshops and working group activities to build an international research and
policy development constituency: following and in parallel with the research
tracks will be a fourth track, for regional and international workshops and fo-
cused working groups from the represented target regions for building further
collaboration.

The first phase focuses on actual collaboration by implementing tasks 1 to 3,
while the second phase focuses on analysis and building concrete future collabora-
tions. Global dissemination is part of the second track, as is the engagement of
organisations outside the FLOSSWorld consortium.
Schedule FLOSSWorld is funded by the 6th Framework Programme and is a 2-year
project. The following table shows the schedule of the project.

Goals of Workshops During workshops all consortium partners (17 in all) are
brought together with additional participants from their countries, and observers
from the organisations listed as having provided letters of support to the FLOSS-
World project. Workshop participants are experts representing the interests of the
Open Source community, government, businesses, researchers and higher education
institutes, as appropriate for the workshop questions. Some participants will take a
more active role as specific questions are addressed, but in principle all three
research tracks will be treated in each workshop.


Date                     Action                      Subject                     Place

1/05/2005                Start
Nov 05 - Mar 2006        1st regional workshops      Discuss research            Buenos Aires, Beijing,
                                                     questions, interact         Mumbai, Sofia (Bulgaria),
                                                                                 Nairobi (Kenya)
26/04/2006 - 28/04/2006  1st International Workshop                              Brussels, Belgium
Nov 2005 - Jul 2006      On-going survey and study
Aug 2006 - Sep 2006      Analysis
Oct 2006 - Feb 2007      2nd Regional and            Discuss survey results,
                         International Workshops     policy issues
Feb - Apr 2007           Finalise Recommendations
30/04/2007               End

On-going survey FLOSSWorld is conducting worldwide surveys among the follow-
ing target groups:

1. Private sector

2. Government sector

3. Open Source community participants

4. Higher Education Institutes - Administrators

5. Higher Education Institutes - IT Managers

Furthermore, the questions are adapted from country to country to ensure
international comparability, i.e. using local currencies in the questionnaire and lo-
calised scales when asking about income or expenditure levels, and introducing addi-
tional questions that are unique to each country’s context. The FLOSSWorld survey
is intended, at the least, to become an indicator of local OSS perception, usage and
adoption as compared to other countries in the world.


5.5 PYPY
The PyPy project has been an ongoing Open Source Python language implementation
since 2003. In December 2004 PyPy received EU-funding within the Framework
Programme 6, second call for proposals ("Open development platforms and services"
IST).
PyPy is an implementation of the Python programming language written in Python
itself, flexible and easy to experiment with. The long-term goals of this project are
to target a large variety of platforms, small and large, by providing a compiler tool
suite that can produce custom Python versions. Platform, memory and threading
models are to become aspects of the translation process - as opposed to encoding
low level details into the language implementation itself. Eventually, dynamic opti-
misation techniques - implemented as another translation aspect - should become
robust against language changes.
A consortium of 8 (12) partners in Germany, France and Sweden is working to
achieve the goal of an open run-time environment for the Open Source Program-
ming Language Python. The scientific aspect of the project is to investigate novel
techniques (based on aspect-oriented programming, code generation and abstract
interpretation) for the implementation of practical dynamic languages.
A methodological goal of the project is also to showcase a novel software engi-
neering process, Sprint Driven Development. This is an Agile methodology, provid-
ing a dynamic and adaptive environment, suitable for co-operative and distributed
development.
The project is divided into three major phases: phase 1 focuses on developing
the actual research tool (the self-contained compiler), phase 2 focuses on optimisa-
tions (core, translation and dynamic), and phase 3 on the actual integration of efforts
and dissemination of the results. The project has an expected deadline in November
2006.
PyPy remains, though EU-funded, heavily integrated in the Open Source community
of Python. The methodology of choice is the key strategy to make sure that the com-
munity of skilled and enthusiastic developers can contribute in ways that would not
have been possible without EU funding.

5.6 QUALIPSO
Goals The Integrated Project (QualiPSo) aims at making a major contribution to
the state of the art and practice of Open Source Software. The goal of the QualiPSo
integrated project is:

to define and implement technologies, procedures and policies to leverage
current Open Source Software development practices into sound, well
recognised and established industrial operations.


The project brings together software companies, application solution developers
and research institutions, and is driven by the need to give OSS the appropriate
level of trust that makes OSS development an industrial and widely accepted
practice. To reach this goal the QualiPSo project will define, deploy and
launch the QualiPSo Competence Centres in Europe (4), Brazil (1) and China (1),
all of them making use of the QualiPSo Factory.
Exploitation of results will be achieved through different routes, but with the
common theme of partners incorporating these results in current or planned prod-
ucts. Through its founding partners, the QualiPSo project will be closely related to
important OSS communities such as ObjectWeb and Morfeo.
With the economy moving towards new open models, the potential impact of
QualiPSo will be across the entire chain of software system development, proposing
an integrated approach along many dimensions:

• technically, through a focus on complementary problem areas addressed by
strong research teams,

• industrially, through application partners from different sectors who share a
common vision for the potential of services,

• managerially, through the creation of a strong management structure based on
an entrepreneurial company,

• internationally, with partners from different countries coming from different
continents,

• individually, through strong existing working relationships between partners.

The need to sustain and advance the QualiPSo solutions in the future requires an
open sustainability approach. QualiPSo is open in the following ways:

• its use of open standards and the Open Source software development approach

• it is based on an open community to enlarge and enforce its resources and
input from researchers, scientists, art professionals and users

• it is open to expansion, by inserting new application scenarios and other project
results in a "plug and play" manner.

The project will be structured into the following classes of activities:

• Problem activities: These activities provide the foundation and technological
content upon which the project is built.

• Legal Issues: This activity addresses the need for a clear legal context in which
OSS will be able to evolve within the European Union.


• Business Models: This activity addresses the need to incorporate new software
development models that can cope with the OSS peculiarities.

• Interoperability: This activity addresses the needs of the software industry for
standards based interoperable software.

• Trustworthy Results: This activity addresses the need for the definition of
clearly identified and tested quality factors in OSS products.

• Trustworthy Processes: This activity addresses the need for the definition of an
OSS-aware standard software development methodology.

• Project activities. The project activities are cross-cutting activities that take
the results generated by the problem activities, integrate them in a coherent
framework and assess and improve their applicability using the selected ap-
plication scenarios. Project activities also include all issues related to indus-
trialisation, dissemination, standardisation, and exploitation of the resulting
framework. These activities are the following:

• QualiPSo Factory: This activity integrates the results achieved in the prototyp-
ing phase of the problem activities to create the QualiPSo environment.

• QualiPSo Competence Centre: this activity aims to develop the means for con-
tinuous and sustainable (beyond the scope of the project) centralisation of ref-
erence information concerning quality OSS development.

• Promotion and support: this activity aims to develop awareness of the QualiPSo
results within the global OSS community.

• Demonstration

• Training: This activity will focus on providing training services both in class-
room and through the internet in order to evangelise the results of QualiPSo.

Coordination To achieve its ambitious goal QualiPSo will pursue the following ob-
jectives:

• Define methods, development processes, and business models for the imple-
mentation and deployment of Open Source Software systems to insure inten-
sive software consumers that Open Source projects conform to the standards
required to provide industry level software.

• Design and implement a specific environment where different tools are inte-
grated to facilitate and support the development of viable industrial OSS sys-
tems. This environment will include a secure collaborative platform able to
guarantee that there is no facetious intrusion in the development of code. This
also implies support for the necessary audit of the liability of the software, so
that the IT players are able to indemnify their users in case of problems caused
by the software.

• Implement specific tools for benchmarking to check the expected quality of
OSS, proving non-functional properties, such as robustness and scalabil-
ity, for supporting major critical applications. The evaluation of these qualities
will be carried out in a rigorous, yet practical way which will encompass both
static (i.e. related to the structure of OSS) and dynamic (i.e. related to the
execution and use of OSS) aspects.

• Implement and support better practices with respect to the management of infor-
mation (including source code, documentation and information exchanged between
actors involved in a project) in order to improve the productivity of development
and evolution of OSS systems.

• Demonstrate interoperability, which is at the centre of Open Standards com-
monly implemented in OSS, by providing test suites and qualified integration
stacks.

• Understand the legal conditions by which OSS products are protected and
recognised, without violating the OSS spirit.

• Develop a long-lasting network of professionals concerned with the quality of
Open Source Software for enterprise computing.

5.7 QUALOSS
The strategic objective of this project is to enhance the competitive position of the
European software industry by providing methodologies and tools for improving
their productivity and the quality of their software products.
To achieve this goal, this project aims to build a high level methodology to
benchmark the quality of Open Source software in order to ease the strategic deci-
sion of integrating adequate F/OSS components into software systems. The results
of the QUALOSS project directly address the strategic objective 2.5.5 of providing
methodologies to use Open Source software in industrial development, to enable
its benchmarking, and to support its development and evolution.
Two main outcomes of the QUALOSS project achieve the strategic objectives: an
assessment methodology for gauging the evolvability and robustness of Open Source
software, and a tool that largely automates the application of the methodology.
Unlike current assessment techniques, ours combines data from software products
(source code, documentation, etc.) with data about the developer


community supporting the software products, in order to estimate the evolvability
and robustness of the evaluated software products.
In fact, QUALOSS takes advantage of information widely available in F/OSS repos-
itories, which often contain both kinds of information, that is, software product data
and data produced by the developer community while developing and maintaining
the software product. Although tools aim to automate most of the procedure of apply-
ing quality models, it is unlikely that every aspect can be computed; hence pointers
from the user will be needed. This is why the tools will be accompanied by a user
manual specifying, first, the manual activities to perform when applying quality
models and, second, how to use the outcomes of the manual activities in combination
with the tools to finally estimate the evolvability and robustness of the selected
F/OSS component. In the end, the tools and the user manual provide the user with an
integrated assessment methodology to gauge the quality of F/OSS components.
Ultimately, the tooled methodology reaches the strategic objectives stated above.
By integrating more evolvable and robust F/OSS components in their solutions, organ-
isations will spend less time fighting with F/OSS components and hence will be more
productive. This proposition will be studied through case studies.
This instrumented method will allow productivity to be increased and software
quality to be improved by integrating evolvable and robust Open Source software. In
more quantifiable terms, the targets of the QUALOSS project are:

• to increase the productivity of software companies by 30%

• to decrease the average number of defects by 10%

• to decrease the effort needed to modify software by 20%

The QUALOSS consortium is composed of leading research organisations in the
field of measurement, software quality and Open Source, as well as a panel of indus-
try representatives (including SMEs) involved in Open Source projects.

5.8 SELF
SELF will be a web-based, multi-language, free content knowledge base written
collaboratively by experts and interested users. The SELF Platform aims to be the
central platform with high quality educational and training materials about Free
Software and Open Standards. It is based on world-class Free Software technologies
that permit both reading and publishing free materials, and is driven by a worldwide
community.
The SELF Platform is a repository with free educational and training materials
on Free Software and Open Standards and an environment for the collaborative
creation of new materials. Inspired by Wikipedia, the SELF Platform provides the
materials in different languages and forms. The SELF Platform is also an instrument
for evaluation, adaptation, creation and translation of these materials. Most impor-
tantly, the SELF Platform is a tool to unite community and professional efforts for
public benefit.
The general strategic objectives of the SELF project are:

• Bring together universities, training centres, Free Software communities, soft-
ware companies, publishing houses and government bodies to facilitate mutual
support and exchange of educational and training materials on Free Software
and Open Standards.

• Centralise, transmit and enlarge the available knowledge on Free Software and
Open Standards by creating a platform for the development, distribution and
use of information, educational and training programmes about Free Software
and its main applications.

• Raise awareness and contribute to the building of critical mass for the use of
Free Software and Open Standards.

The concrete project objectives of the SELF project are:

• Research the state of the art of currently available Free Software educational
and training programmes and detect the potential gaps.

• Create an open platform for the development, distribution and use of informa-
tion, educational and training programmes on Free Software and Open Stan-
dards.

• Develop educational and training materials concerning Free Software and Open
Standards. The project aims to include information on at least 50 software
applications in the initial period.

• Make the SELF platform self-sustainable by creating an active community of in-
dividuals and institutions (universities, training centres, Free Software commu-
nities, software companies, publishing houses and government bodies) around
it. The SELF project aims to involve at least 150 members in the SELF
community by the end of the project.

While the SELF platform will be started by the members of the consortium, its fi-
nal goal is to become a community of different interested parties (from governments
and educational institutes to companies) that can not only exploit the SELF materials
but also participate in their production. The commercial and educational interests in
exploiting the SELF materials will assure the self-sustainable character of the SELF
Platform beyond the EC funding period. The SELF project aims to involve at least
150 members in the SELF community by the end of the project.
This project starts from three main assumptions:


1. Free Software and Open Standards are crucial to support the competitive po-
sition of the European software industry.

2. The real and long-term technological change from proprietary to Free Software
can only come by investing in education and training.

3. The production of educational and training materials on Free Software and
Open Standards should be done collaboratively by all the parties involved.

That is why the SELF platform will have two main functions. It will be simulta-
neously a knowledge base and a collaborative production facility. On the one hand,
it will provide information, educational and training materials that can be presented
in different languages and forms: from course texts, presentations, e-learning pro-
grammes and platforms to tutor software, e-books, instructional and educational
videos and manuals. On the other hand, it will offer a platform for the evaluation,
adaptation, creation and translation of these materials. The production process of
such materials will be based on the organisational model of Wikipedia. In short,
SELF will be a web-based, multi-language, free content knowledge base written col-
laboratively by experts and interested users.

5.9 TOSSAD
Europe, as a whole, has a stake in improving the usage of F/OSS in all branches of
IT and public life, in general. F/OSS communities throughout Europe can achieve
better results through co-ordination of their research activities/programmes that
reflect the current state-of-the-art.
The main objective of the tOSSad project is to start integrating and exploiting
already formed methodologies, strategies, skills and technologies in F/OSS domain
in order to help governmental bodies, educational institutions and SMEs to share
research results, establish synergies, build partnerships and innovate in an enlarged
Europe.
More precisely, the tOSSad project aims at improving the outcomes of the F/OSS
communities throughout Europe by supporting the coordination and network-
ing of these communities by means of state-of-the-art studies, national programme
initiations, usability cases, curriculum development and the implementation of a
collaborative information portal and web-based groupware.
Main tOSSad coordination activities are:

• F/OSS study (Work package 1)

• F/OSS national programs (Work package 2)

• F/OSS usability study (Work package 3)


• F/OSS curriculum development (Work package 4)

• Dissemination and exploitation (Work package 5)

Work package 1 has the intention of producing a report detailing both the current
status of F/OSS adoption in European countries, and the barriers that such future
adoption might face. The main goal is to give a clear picture of the current
status (usage, implementation, adoption, penetration, government policies, etc.) of
F/OSS related to the following topics:
• The technical barriers that hinder F/OSS usage on a larger scale

• Infrastructural weaknesses in some European countries

• Usability and accessibility

• Operating-system-specific technical problems

• The social barriers that hinder F/OSS usage on a larger scale

• Educational weaknesses

• Cultural readiness

• Political and financial problems

• Market problems (existing monopolies of any sort)

• Current and future trends and opportunities

The main deliverable of WP1 is a report entitled “F/OSS Study”.
Work package 2 aims to start up national programmes for improved usage of
F/OSS in some (at least one) of the target countries and to develop guidelines for
F/OSS adoption in the public sector. As part of this Work package, an expert group
(containing individuals from the partners, as well as policy makers from governmental
bodies) will be established at the kick-off meeting, which all participants will attend.
This expert group will also help national and regional government institutions
understand the benefits of F/OSS and Open Source components where possible. A
main goal of Work package 2 is to produce a road-map for F/OSS adoption, and the
deliverables of the work package are designed according to this goal.

The Work package 2 tasks are:
• Organising one workshop to determine the requirements for national programmes,
with a special focus on best practices and success stories, F/OSS in the public
sector, and migration strategies.

• Preparing research documents that can be proposed for addition to the National
ICT Programmes. These documents should focus on the following items:

• Usability centres, F/OSS R&D and solution centres

• Making use of F/OSS for e-learning

• F/OSS training and certification solutions for IT people, developers and users,
making use of existing or new training institutions

• Catalysing the formation of Open Source communities and participation in the
development of Open Source software as part of global projects

• Collaborative models of joint development between F/OSS target countries
and Member States with superior F/OSS adoption.

• Building partnerships within the public and private sectors and civil society, as
well as regionally within Europe.

• Preparing not only high-level case histories, but also all the details needed to
copy and implement F/OSS solutions locally.

• Lobbying national strategy decision makers in the public sector by putting
forward reports on the economic and social benefits of F/OSS usage. These
reports can include success stories from Europe and worldwide.

• Developing guidelines towards F/OSS adoption and dissemination in public
bodies.
The major objectives of the tOSSad usability work package (Work package 3) are
to tackle the obstacles to usability in F/OSS and to bring about a breakthrough in it,
by ensuring that usability will be paid more attention in F/OSS in the future. To reach
these objectives, besides the intensive spreading of awareness, the following major
areas will be addressed within Work package 3:
• A state-of-the-art study of usability in F/OSS, based both on in-depth desk
research and on an empirical survey. If appropriate, the survey will be integrated
into the empirical investigations conducted in Work package 1.

• Usability testing of selected F/OSS components, with a specific focus on desktop
applications, personal information management (PIM) and office applications.

• Detection of F/OSS usability gaps, based on the test results and on research
into tomorrow’s usability requirements (mobile end devices, voice interaction,
wearables). From these gaps, recommendations for future research directions
will be derived.

• A guideline that takes into account both attention to usability aspects during
F/OSS development and the conduct of usability testing. It will focus on
recurrent user involvement for usability assurance during shared development,
via mock-ups for inclusion in F/OSS development environments.
Work package 4 gathers partners with deep and complementary knowledge in
software engineering, university curriculum development, e-learning and collaborative
learning, and the application of Open Source methodology and business models to
real-world problems. The WP4 partners shall work together in order to define one or
more broadly accepted, detailed curricula for F/OSS. The focus will be in particular
on items 2 and 3 below (courses and curricula about the F/OSS operating system
Linux and related system applications, and about F/OSS software development
tools), without excluding the study of, and suggestions on, items 1, 4 and 5.

The Work package 4 curriculum development items are as follows:
1. Courses and curricula about using the most popular F/OSS desktop applications
(F/OSS office automation software, mail applications, Web browsers, wikis,
etc.), even on proprietary operating systems.

2. Courses and curricula about F/OSS server applications and management: the
Linux operating system, application servers (Tomcat), Web servers (Apache),
databases, middleware, and related system applications.

3. Courses and curricula about F/OSS software development tools: IDEs (Eclipse),
versioning systems and related tools.

4. Courses and curricula about how to develop and take advantage of F/OSS
software, and about the software engineering of F/OSS. These are related to
ongoing research on methodologies and tools for F/OSS development, and aim
to train software developers who are able to build, customise and consult on
F/OSS applications as active members of the F/OSS development community.

5. Use of F/OSS software in computer science courses and curricula, as a cheap
and powerful means of helping students understand computer science concepts.
Abbreviations

XML Extensible Markup Language

SQL Structured Query Language

HTML Hypertext Markup Language

IDE Integrated Development Environment

UML Unified Modeling Language

CSV Comma Separated Values

COTS Commercial off-the-shelf
