Bachmann Christian 201111 MASc Thesis

Multi-Sensor Data Fusion for Traffic Speed and Travel Time Estimation
by
Christian Bachmann
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Department of Civil Engineering
University of Toronto
© Copyright by Christian Bachmann 2011

Multi-Sensor Data Fusion for Traffic Speed and Travel Time Estimation
Christian Bachmann
Master of Applied Science
Department of Civil Engineering
University of Toronto
2011
Abstract
In this thesis, seven estimation techniques based on multi-sensor data fusion are investigated. All
methods are compared in terms of their ability to fuse data from loop detectors and
Bluetooth-tracked probe vehicles to accurately estimate freeway traffic speed. In the first case
study, data generated from a microsimulation model are used to assess how data fusion might
perform under present-day conditions, with few probe vehicles, and what improvement might
result from a greater proportion of vehicles carrying Bluetooth-enabled devices in the future. In
the second case study, data collected from the real-world Bluetooth traffic monitoring system are
fused with corresponding loop detector data, and the results are compared against GPS-collected
probe vehicle data, demonstrating the feasibility of implementing data fusion for real-time traffic
monitoring today. This research constitutes the most comprehensive evaluation of data fusion
techniques for traffic speed estimation known to the author.

Acknowledgments
Writing acknowledgments is always an enjoyable task, not only because it allows for reflection
on all those who have been helpful but also because you can be sure it’s one part that someone
will actually read. Being the final task probably makes it somewhat of a relief as well.
First and foremost, I am deeply indebted to my two supervisors, Professor Matthew Roorda and
Professor Baher Abdulhai. I cannot express my gratitude enough for all that both of you have
done for me over these past few years. Baher: Thank you for taking a chance on me in my
undergraduate years and allowing me to get an early start in research; you were the inspiration.
Matt: Thank you for your constant presence and enthusiasm since day one; you are a role model
for all of us. I hope we are all proud of our joint accomplishments.
Thank you also to Professor Behzad Moshiri from the University of Tehran, who was visiting us
during some of this work. Your knowledge of data fusion pointed me in the right direction and
got things moving along quickly.
I gratefully acknowledge Professor Bruce Hellinga, Pedram Izadpanah, and the team of students
from the University of Waterloo who undertook the probe vehicle data collection effort used in
the sixth chapter of this thesis.
I feel as though none of this work would have been possible without all of my fellow
transportation graduate students. I am very thankful for all of the help I received in the ITS lab
throughout the progression of this work. I will never forget that many of you acted as surrogate
supervisors to me. I am equally thankful for all of the laughs and lunches we have shared
together. I will always look back fondly on my time spent in the ITS lab.
Very special thanks go to all of the students and staff at Chestnut Residence. Being a don
throughout my Master’s has been an educational experience all in itself. The friends and
memories I have made at Chestnut have undoubtedly changed my life in ways I’m not yet sure I
fully understand. Thank you to the students on my floor for motivating me and inspiring me with
your hard work and dedication. Thank you to my fellow dons for making my life exciting when
school was not. The Chestnut Residence community has helped me greatly; I hope I have helped
it too.
Of course, I would like to acknowledge my family and friends for their constant love and
support. My life has always been blessed with company far better than I deserve, and I am always
grateful for that.
Funding for this Master’s was provided in part by the Social Sciences and Humanities Research
Council of Canada (SSHRC) in the form of a Canada Graduate Scholarship (CGS) and by the
Ontario Ministry of Training, Colleges, and Universities in the form of an Ontario Graduate
Scholarship (OGS).
Table of Contents
Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Appendices
List of Acronyms
Chapter 1 Introduction
1.1 Bluetooth Traffic Monitoring
1.2 Loop Detectors
1.3 Research Questions and Objectives
1.4 Thesis Structure
Chapter 2 Background
2.1 What is Data Fusion?
2.2 Importance of Data Fusion – Why Fuse Data?
2.3 On the Use of Multiple Sensors
2.3.1 Complementary
2.3.2 Competitive
2.3.3 Cooperative
2.4 Fusion System Architectures
2.4.1 Centralized
2.4.2 Decentralized
2.4.3 Distributed
Chapter 3 Literature Review
3.1 Data Fusion in Transportation Engineering
3.2 Data Fusion for Traffic Speed and Travel Time Estimation
3.2.1 Statistical Approaches
3.2.2 Kalman Filter Applications
3.2.3 Neural Network Models
3.2.4 Evidence Theory (Dempster–Shafer theory)
3.2.5 Other Contributions
3.3 Findings from the Literature Review
Chapter 4 Data Fusion Techniques
4.1 Simple Convex Combination
4.2 Bar-Shalom/Campo Combination
4.3 Measurement Fusion
4.3.1 The Kalman Filter
4.3.2 Multi-Sensor Multi-Temporal Data Fusion
4.4 Single-Constraint-At-A-Time (SCAAT) Kalman filter
4.5 Ordered Weighted Averaging (OWA)
4.5.1 Orness
4.5.2 Dispersion
4.5.3 Learning OWA Operator Weights from Data
4.6 Fuzzy Integrals
4.6.1 The Sugeno Fuzzy Integral
4.6.2 The Choquet Fuzzy Integral
4.6.3 Fuzzy Integrals as Aggregation Operators
4.6.4 Identification of Fuzzy Measures based on Learning Data
4.7 Artificial Neural Networks
4.7.1 Neuron Architecture
4.7.2 Layer Architecture
4.7.3 Network Architecture
4.7.4 Neural Network Training – Backpropagation Algorithm
4.8 Fusion Architectures
4.8.1 A Competitive Distributed Data Fusion Architecture
4.8.2 A Cooperative and Competitive Distributed Data Fusion Architecture
4.9 Measures of Effectiveness
Chapter 5 Highway 400 Simulation Case Study
5.1 Highway 400
5.2 Traffic Microsimulation in Paramics
5.2.1 Monitoring of Bluetooth Devices in Paramics
5.2.2 Installation of Loop Detectors in Paramics
5.3 5 x 2 Cross Validation
5.4 Data Fusion Results
5.4.1 North of Steeles Ave W to North of Finch Ave W
5.4.2 North of Finch Ave W to Finch Ave W
5.4.3 Finch Ave W to North of Sheppard Ave W
5.4.4 North of Sheppard Ave W to North of Hwy 401
5.5 Summary of Key Findings
Chapter 6 Highway 401 Real-World Case Study
6.1 From Microsimulation to the Real World
6.2 Highway 401 Real-World Data Collection
6.3 k-fold Cross Validation
6.4 Data Fusion Results
6.4.1 Highway 400 to West of Bathurst St
6.4.2 West of Bathurst St to East of Kennedy Rd
6.4.3 East of Kennedy Rd to West of Bathurst St
6.4.4 West of Bathurst St to Highway 400
6.5 Summary of Key Findings
Chapter 7 Conclusion
7.1 On Data Fusion Techniques
7.2 On Fusing Data from Loop Detectors and Probe Vehicles
7.3 Recommendations for Future Work
References
List of Tables
Table 3-1: Summary of fusion techniques applied to ITS (Dailey, 1996)
Table 3-2: The relative merits of level 1 data fusion techniques (Keever et al., 2003)
Table 4-1: Conventional measures of effectiveness for the evaluation of estimation error
Table 5-1: Highway 400 sensor details
Table 6-1: A comparison of merits between microsimulation and real-world data
Table 6-2: Eastbound Highway 401 sensor details
Table 6-3: Westbound Highway 401 sensor details
Table A-1: Average Root of Mean Squared Error – Hwy 400 Link 1, Architecture 1
Table A-2: Average Root of Mean Squared Error – Hwy 400 Link 1, Architecture 2
Table B-1: Average Root of Mean Squared Error – Hwy 401 Link 1
List of Figures
Figure 1-1: Bluetooth station installation (Roorda et al., 2009)
Figure 1-2: Bluetooth traffic monitoring operation concept (Young, 2008)
Figure 2-1: (Con)fusion of terminology (Hall & Llinas, 2001)
Figure 2-2: A complementary sensor network may consist of several thermometers, each covering a different geographical region (note there is no overlap in coverage)
Figure 2-3: Competitive thermometers would all return information regarding the same region (note the overlap in coverage)
Figure 2-4: Thermometers separated by equal distance along a line provide information about temperature. They can also be used cooperatively to find the rate of change of temperature
Figure 2-5: Centralized architecture with a central processor – adapted from (Ng, 2003)
Figure 2-6: Decentralized fusion architecture – adapted from (Ng, 2003)
Figure 2-7: Distributed fusion architecture – adapted from (Ng, 2003)
Figure 3-1: Frame of the FEFM (Kong & Liu, 2007)
Figure 3-2: Frame of the improved FEFM (Kong et al., 2007)
Figure 3-3: Flowchart of the proposed evidential fusion algorithm (Kong et al., 2009)
Figure 3-4: A data fusion algorithm for link travel time (Choi & Chung, 2002)
Figure 3-5: Time–space diagram plots: (a) congested regime based on signal system measurements; (b) manually revised estimate of congested regime based on bus probe and signal system data (Berkow et al., 2009)
Figure 4-1: The ongoing discrete Kalman filter cycle (Welch & Bishop, 2006)
Figure 4-2: The measurement fusion process for two measurement sequences. The individual measurement sequences are placed in an augmented measurement sequence. The augmented vector is then fused using a single KF (Mitchell, 2007)
Figure 4-3: Illustration of various sources of traffic monitoring (Byon et al., 2010)
Figure 4-4: Set relations between various aggregation operators and fuzzy integrals (Grabisch, 1996)
Figure 4-5: Typically, neural networks are adjusted, or trained, so that a particular input leads to a specific target output (Beale et al., 2010)
Figure 4-6: A neuron with a single scalar input and a scalar bias (Beale et al., 2010)
Figure 4-7: Three of the most commonly used functions: a) hard-limit transfer function, b) linear transfer function, c) sigmoid transfer function (Beale et al., 2010)
Figure 4-8: A one-layer network with R input elements and S neurons (Beale et al., 2010)
Figure 4-9: A network can have several layers. Each layer has a weight matrix W, a bias vector b, and an output vector a (Beale et al., 2010)
Figure 4-10: Competitive data fusion architecture (“Architecture 1”)
Figure 4-11: Cooperative and competitive data fusion architecture (“Architecture 2”)
Figure 5-1: Highway 400 sensor schematic (distances shown in meters – drawn to scale)
Figure 5-2: Bluetooth detection coverage projected onto a road lane
Figure 5-3: Theoretical Bluetooth device discovery times
Figure 5-4: A typical simulation of Highway 400 – Link 1
Figure 5-5: Error as a function of probe vehicle sample size (Link 1, Architecture 1)
Figure 5-7: A typical simulation of Highway 400 – Link 2
Figure 5-10: A typical simulation of Highway 400 – Link 3
Figure 5-11: Error as a function of probe vehicle sample size (Link 3, Architecture 1)
Figure 5-13: A typical simulation of Highway 400 – Link 4
Figure 6-1: Highway 401 sensor schematic (distances shown in kilometers – drawn to scale)
Figure 6-2: Data collected on Highway 401 – Link 1
Figure 6-3: Comparison of loop detector, Bluetooth, and GPS estimates on Link 1
Figure 6-4: Error of data fusion techniques on Hwy 401 – Link 1
Figure 6-7: Error of data fusion techniques on Hwy 401 – Link 2
Figure 6-10: Error of data fusion techniques on Hwy 401 – Link 3
Figure 6-11: Data collected on Highway 401 – Link 4
Figure 6-12: Comparison of loop detector, Bluetooth, and GPS estimates on Link 4
Figure 6-13: Error of data fusion techniques on Hwy 401 – Link 4
List of Appendices
Appendix A Highway 400 Statistical Significance Tests
Appendix B Highway 401 Statistical Significance Tests
List of Acronyms
ADAS = Advanced Driver Assistance Systems
ADVANCE = Advanced Driver and Vehicle Advisory Navigation Concept
AGV = Autonomous Guided Vehicles
AID = Automatic Incident Detection
ATIS = Advanced Traveler Information Systems
AVL = Automatic Vehicle Location
DSER = Dempster-Shafer Evidential Reasoning
EOBR = Electronic On Board Recorder
FEFM = Federated Evidence Fusion Model
GEP = Generalized Evidence Processing
GPS = Global Positioning System
ILD = Inductive Loop Detector
ITS = Intelligent Transportation Systems
KF = Kalman Filter
LS = Least Squares
MAC = Media Access Control
MAE = Mean Absolute Error
MAPE = Mean Absolute Percentage Error
MARE = Mean Absolute Relative Error
ME = Mean Error
ML = Maximum Likelihood
MRE = Mean Relative Error
MSDE = Mean State Decision Error

MSE = Mean Squared Error
MTO = Ministry of Transportation Ontario
OWA = Ordered Weighted Averaging
RME = Relative Mean Error
RMSE = Root Mean Squared Error
SCAAT = Single-Constraint-At-A-Time
TMC = Traffic Management Center
VDS = Vehicle Detector Station

Chapter 1
Introduction
“The beginning is the most important part of the work.”
– Plato
1.1 Bluetooth Traffic Monitoring

Real-time travel time measurement is a major function of Intelligent Transportation Systems (ITS).
Recently, there has been interest in developing an anonymous probe vehicle monitoring
system to measure travel times on highways and arterials based on wireless signals available
from technologies such as Bluetooth. The majority of consumer electronic devices produced
today come equipped with Bluetooth wireless capability to communicate with other devices in
close proximity. For example, it is the primary means to enable hands-free use of cell phones.
Bluetooth-enabled devices can communicate with other Bluetooth-enabled devices at ranges of
roughly 1 meter (class 3) to 100 meters (class 1), depending on the power class of the
Bluetooth radio in each device. The Bluetooth protocol uses an electronic identifier in each
device called a Media Access Control (MAC) address. By mounting a simple antenna adjacent to
the roadway (Figure 1-1), MAC addresses of visible devices can be easily logged and
time-stamped. If the same MAC address is logged at consecutive stations, the detections can be
matched, and the difference in time stamps can be used to estimate travel times (Figure 1-2).
Studies in Maryland (Young, 2008) and Indiana (Wasson, Sturdevant, & Bullock, 2008) have
demonstrated the feasibility of using such a system to estimate travel time.
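The matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by the Toronto system: it assumes each station log has already been reduced to one time stamp per MAC address and that the distance between stations is known, and it omits the outlier filtering a deployed system would need (for example, for vehicles that stop between stations). The function names, MAC addresses, times, and the 1,500 m link length are all hypothetical.

```python
from datetime import datetime


def match_travel_times(upstream, downstream):
    """Match MAC addresses seen at an upstream and a downstream station.

    upstream, downstream: dict mapping MAC address -> detection time (datetime).
    Returns a dict mapping MAC address -> travel time in seconds, keeping only
    devices detected at both stations in upstream-then-downstream order.
    """
    travel_times = {}
    for mac, t_up in upstream.items():
        t_down = downstream.get(mac)
        if t_down is not None and t_down > t_up:
            travel_times[mac] = (t_down - t_up).total_seconds()
    return travel_times


def space_mean_speed_kmh(travel_times, link_length_m):
    """Space-mean speed: link length divided by the average travel time."""
    if not travel_times:
        raise ValueError("no matched devices")
    mean_tt_s = sum(travel_times.values()) / len(travel_times)
    return (link_length_m / mean_tt_s) * 3.6  # convert m/s to km/h


# Hypothetical detections (made-up MAC addresses and time stamps).
UPSTREAM = {
    "AA:AA:AA:00:00:01": datetime(2011, 1, 10, 8, 0, 0),
    "AA:AA:AA:00:00:02": datetime(2011, 1, 10, 8, 0, 30),
    "AA:AA:AA:00:00:03": datetime(2011, 1, 10, 8, 0, 45),  # never seen downstream
}
DOWNSTREAM = {
    "AA:AA:AA:00:00:01": datetime(2011, 1, 10, 8, 1, 0),
    "AA:AA:AA:00:00:02": datetime(2011, 1, 10, 8, 1, 50),
}

if __name__ == "__main__":
    tt = match_travel_times(UPSTREAM, DOWNSTREAM)
    print(tt)
    print(round(space_mean_speed_kmh(tt, 1500.0), 1))
```

Dividing the link length by the mean travel time gives a space-mean speed, the quantity a link-level speed estimate normally refers to; averaging the individual probe speeds instead would overstate it whenever travel times differ.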
A real-time traffic monitoring system is currently under development in Toronto, Canada, which
detects Bluetooth-enabled devices travelling past roadside receivers, allowing for the
aforementioned method of travel time estimation. The system also makes use of RouteTrackers,
which are currently installed in over 20,000 trucks operated by over 250 firms. A RouteTracker is a Global
Positioning System (GPS) tracking device that is connected directly to the vehicle’s engine
computer. Engine diagnostic reporting combined with GPS data is gathered to help fleets
optimize their operations and automate regulatory compliance (Xata Turnpike, 2010). These
RouteTrackers transmit GPS data wirelessly to roadside receivers in pseudo-real time along
freeways throughout Toronto. It is noted that the GPS data obtained by this system are only
“real-time” in the sense that closely spaced stations provide very frequent updates of recent truck
location and speed data. A preliminary analysis of this system by Roorda et al. (2009) showed
that travel time estimates can be obtained by observing either RouteTracker-enabled trucks or
other vehicles carrying a Bluetooth device at consecutive locations on the highway.
Figure 1-1: Bluetooth station installation (Roorda et al., 2009)
Figure 1-2: Bluetooth traffic monitoring operation concept (Young, 2008)

1.2 Loop Detectors

Traditionally, vehicle detector stations (VDS) have been the major elements of a freeway traffic
management system. Inductance loops are the most widely used detectors in freeway traffic
management systems because of their reliability in data measurements and flexibility in design
(Ministry of Transportation, 2010).
As the name suggests, the main function of loop detectors is to detect the passage and presence
of vehicles on the freeway. Data collected at vehicle detector stations are initially processed by a
microprocessor located at the side of the highway. The processed data contain traffic volumes,
vehicle speeds, road occupancy, and vehicle length information. This information is then
transmitted at regular intervals to a Traffic Management Center (TMC) via a communications
system. The TMC computer system uses the data to monitor traffic patterns and also attempts to
identify traffic incidents as they occur.
An inductance loop detector system basically consists of three components: a loop embedded in
the pavement consisting of multiple turns of wire; a lead-in cable which connects the loop wire
to the input of the loop detector amplifier; and a detector amplifier that intensifies the electrical
energy produced by the detector loop (Ministry of Transportation, 2010).
Many vehicle detector stations in the Greater Toronto Area have a double-loop arrangement and
are specifically designed to measure vehicle speeds and lengths in addition to the traffic volumes
and occupancy information. Stations with one loop per lane can directly measure only traffic
volumes and occupancy, from which speed must be estimated (Ministry of Transportation, 2010).
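The two ways of obtaining speed described above can be illustrated with a small Python sketch. For a double-loop station, speed follows from the spacing between the two loops in a lane and the difference in their actuation times. For a single-loop station, one common textbook approximation (often called the g-factor method) infers speed from flow and occupancy by assuming an average effective vehicle length. This is not a description of the Ministry's own processing; the 5 m loop spacing, the 6.5 m effective vehicle length, and the function names are assumptions chosen for illustration.

```python
def dual_loop_speed_kmh(t_upstream_s, t_downstream_s, trap_length_m):
    """Speed of one vehicle from the actuation times of two loops in the same lane.

    trap_length_m is the (assumed) distance between the leading edges of the loops.
    """
    dt = t_downstream_s - t_upstream_s
    if dt <= 0:
        raise ValueError("downstream actuation must follow upstream actuation")
    return trap_length_m / dt * 3.6  # m/s -> km/h


def single_loop_speed_kmh(volume_veh, occupancy, interval_s, effective_length_m):
    """g-factor approximation: speed ~ flow * effective vehicle length / occupancy.

    volume_veh: vehicles counted in the interval.
    occupancy: fraction of the interval the loop was occupied (0 to 1).
    effective_length_m: assumed mean vehicle length plus loop length.
    """
    if occupancy <= 0:
        raise ValueError("occupancy must be positive")
    flow_veh_per_s = volume_veh / interval_s
    return flow_veh_per_s * effective_length_m / occupancy * 3.6  # m/s -> km/h


if __name__ == "__main__":
    # Hypothetical values: 5 m trap crossed in 0.18 s.
    print(round(dual_loop_speed_kmh(0.0, 0.18, 5.0), 1))
    # Hypothetical values: 15 vehicles in 30 s at 15% occupancy.
    print(round(single_loop_speed_kmh(15, 0.15, 30.0, 6.5), 1))
```

The single-loop estimate is only as good as the assumed effective length: a higher share of trucks in an interval inflates occupancy and biases the estimated speed downward, which is one reason directly measured double-loop speeds tend to be more reliable.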
1.3 Research Questions and Objectives

Clearly, both Bluetooth device tracking and loop detectors provide a means of traffic speed and
travel time estimation. The co-existence of these systems raises a number of research questions.
Most obviously, how do these systems compare in terms of their ability to estimate freeway
traffic speeds and travel times? Furthermore, is one system superior to the other? Bluetooth
device monitoring is a relatively new method for traffic monitoring, and accordingly, its
performance is not well known. Moreover, there has been no attempt thus far to compare it with
monitoring via loop detectors.
An even more interesting question is how to fuse data from these systems together. The rationale
for fusion is strong when one considers how the sensors complement one another. The data from
loop detectors cover almost all the vehicles that have travelled on the road section, resulting in
excellent temporal sampling and resolution. However, these measurements can be imprecise and
the spatial sampling depends on the sensor spacing. Moreover, such measurements typically only
represent traffic speed at the location of the sensor and not over the entire link. On the other
hand, probe vehicle measurements can be more accurate, although their quality varies, and they
offer good spatial coverage. They describe the state of traffic over the entire road link, but they
are not exhaustive, since probes make up only a small portion of all vehicles in the traffic stream. With this
complementarity in mind, one can imagine how fusing data from these sources together might
enhance a traffic monitoring system.
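To make the complementarity argument concrete, one of the simplest ways to combine two speed estimates is a convex combination in which each source is weighted by the inverse of its error variance. The Python sketch below is a generic illustration under strong assumptions (both estimates refer to the same link and time interval, their errors are unbiased and independent, and their variances are known); it is not presented as the specific formulation used in this thesis. The function name and the example numbers are hypothetical.

```python
def fuse_inverse_variance(x_loop, var_loop, x_probe, var_probe):
    """Fuse two estimates of the same quantity by inverse-variance weighting.

    Assumes independent, unbiased errors with known variances.
    Returns (fused estimate, variance of the fused estimate).
    """
    if var_loop <= 0 or var_probe <= 0:
        raise ValueError("variances must be positive")
    info_loop = 1.0 / var_loop    # "information" carried by each source
    info_probe = 1.0 / var_probe
    w_loop = info_loop / (info_loop + info_probe)  # weights sum to 1 (convex)
    fused = w_loop * x_loop + (1.0 - w_loop) * x_probe
    fused_var = 1.0 / (info_loop + info_probe)
    return fused, fused_var


if __name__ == "__main__":
    # Hypothetical: loop speed 80 km/h (variance 25), probe speed 70 km/h (variance 100).
    speed, var = fuse_inverse_variance(80.0, 25.0, 70.0, 100.0)
    print(round(speed, 2), round(var, 2))
```

With these hypothetical inputs the sketch gives 78 km/h with a variance of 20 (km/h)², lower than either input. In practice the unbiasedness assumption is often violated, since loop detectors measure spot speeds at the sensor rather than speeds over the whole link.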
However, considering data fusion raises further questions. First, is there a best way of fusing data
from these systems together? Once the data have been fused, is there an improvement in
accuracy? If there is no improvement in accuracy, should we bother with data fusion? Another
interesting point lies in the number of probe vehicles captured by the Bluetooth traffic
monitoring system. More specifically, how does the number of probe vehicles captured affect the
accuracy of the system and of the fusion result? Finally, with all of the other questions answered,
how feasible is implementing a data fusion based system today?
With these questions in mind, the research objectives of this thesis are as follows:
• Compare the accuracies of loop detectors and Bluetooth traffic monitoring.
• Identify all of the data fusion techniques that could be used to fuse loop detector and
probe vehicle estimates.
• Compare the applicable data fusion methods in their ability to fuse loop detector data and
probe vehicle data to accurately estimate freeway traffic speeds.
• Using microsimulation scenarios, investigate how the number of probe vehicles captured
by the Bluetooth traffic monitoring system affects its accuracy and the subsequent fusion
estimate.
• Fuse real-world data coming from the Bluetooth traffic monitoring system and
corresponding loop detector data and compare against GPS-collected data to determine whether
these techniques are “practice-ready”.
This thesis is the first attempt to fuse data from a Bluetooth traffic monitoring system with loop detector data.
Furthermore, it constitutes the most comprehensive evaluation of data fusion techniques for
traffic speed estimation known to the author.
1.4 Thesis Structure

Chapter 1 introduced the underlying motivations and research objectives. Chapter 2 provides the
relevant background information to familiarize the reader with the field of multi-sensor data
fusion with the intent of making the remainder of the thesis comprehensible. Chapter 3 provides
an introduction to data fusion applications in transportation engineering and a comprehensive
literature review of data fusion research conducted in traffic speed and travel time estimation.
Chapter 4 presents the mathematical details of the data fusion techniques utilized in this research;
every effort was made to make this chapter as self-contained as possible, but the reader may find
they require additional texts to aid in understanding the complex portions of this material (which
can be found in the cited references). Chapter 5 provides the results of a microsimulation case
study of Highway 400 in Toronto, Canada. Chapter 6 provides the results of fusing real-world
data from the Bluetooth traffic monitoring system with corresponding loop detector data on
Highway 401, also in Toronto. Chapter 7 concludes with a summary of key findings and
directions for future work.
Chapter 2
Background
“Nature provides the main inspiration in designing intelligent systems.”
– G.W. Ng
2.1 What is Data Fusion?

The general concept of multi-sensor data fusion is analogous to the manner in which humans and
animals use a combination of multiple senses, experience, and the ability to reason to improve
their chances of survival (Mitchell, 2007). In particular, the brain fuses information from our
surrounding environment and attempts to derive knowledge, draw conclusions or inferences from
the fused information (Ng, 2003). For example, consider how many sensors are used by a human
being when eating. Assessing the quality of an edible substance may not be possible using only
the sense of vision; the combination of sight, touch, smell, and taste is far more effective (Hall &
Llinas, 2001).
While there is not one commonly referenced definition of data fusion, there is a general
consensus of what fusing data means. Mitchell (2007) suggests that multi-sensor data fusion is
“the theory, techniques and tools which are used for combining sensor data, or data derived from
sensory data, into a common representational format…in performing sensor fusion our aim is to
improve the quality of the information, so that it is, in some sense, better than would be possible
if the data sources were used individually.” Hall & Llinas (2001) propose “data fusion
techniques combine data from multiple sensors and related information to achieve more specific
inferences than could be achieved by using a single, independent sensor.” Ng (2003) provides the
simplest definition, stating that “fusion involves the combination of data and information from
more than one source.” As can be seen from these three definitions, there is a common
understanding that data fusion encompasses a wide variety of activities that involve using
multiple data sources. Unfortunately, the universality of data fusion has engendered a profusion
of overlapping research and development in many applications. A jumble of confusing
terminology (Figure 2-1) and ad hoc methods in a variety of scientific, engineering,
management, and educational disciplines obscures the fact that the same ground has been
covered repeatedly (Hall & Llinas, 2001).
[Figure: overlapping terms including sensor management, resource management, control, planning, sensor fusion, correlation, estimation, information fusion, tracking, data mining and data fusion]
Figure 2-1: (Con)fusion of terminology – adapted from (Hall & Llinas, 2001)
2.2 Importance of Data Fusion – Why Fuse Data?

There are many reasons why we need data and information fusion systems as noted by Mitchell
(2007), Ng (2003), Hall & Llinas (2001), Brooks & Iyengar (1998) and Luo & Kay (1989):
• Reliability/Robustness/Redundancy: A system that depends on a single source of input
is not robust, in the sense that if the single source fails to function properly, the whole
system fails. In contrast, a system fusing several sources of data has higher fault
tolerance, since multiple sensors providing redundant information increase reliability in
the case of sensor error or failure.
• Accuracy/Certainty: Combining readings from several different kinds of sensors can
give a system more accurate information. Combining several readings from the same
sensor makes a system less sensitive to noise and temporary glitches. Therefore, multiple
independent sources of data can not only help improve accuracy, but can also add
certainty by removing ambiguity in the data.
• Completeness/Coverage/Complementarity: More data sources will provide extended
coverage of information on an observed object or state. Extended coverage is particularly
relevant in spatial and temporal environments for the sake of completeness. Sometimes
information from multiple sensors is complementary and allows features in the
environment to be perceived that are impossible to perceive using just the information
from each individual sensor operating separately (see section 2.3.3).
• Cost effectiveness: Building a single sensor that can perform multiple functions is often
more expensive than integrating several simple and cheap sensors with specific
functions.
• Representation: Another problem that sensor fusion attempts to address is information
overload. The time required to make a decision increases rapidly as the amount of
available information increases. Sensor fusion is needed to combine information and
clearly present the best interpretation of the sensor data, allowing for a well-informed
and timely decision.
• Timeliness: More timely information may be provided by multiple sensors due to either
the actual speed of operation of each sensor, or the processing parallelism that may be
possible, as compared to the speed at which it could be provided by a single sensor.
Of course, all of these benefits hinge on the assumption that there is no single perfect source of
information. This assumption is reasonable because all sensors share certain limitations: every
sensor device has limited accuracy and limited coverage, is subject to some type of noise, and
will function incorrectly under some conditions.
2.3 On the Use of Multiple Sensors

Durrant-Whyte (1988) first classified multi-sensor data fusion systems according to their sensor
configuration. This typology gained popularity and is now widely used in the data fusion
research community. The three basic types of configuration are complementary, competitive,
and cooperative. While these categories are defined by the functionality of the sensor network,
they are not necessarily mutually exclusive.
2.3.1 Complementary
A sensor configuration is called complementary if the sensors do not directly depend on each
other, but can be combined in order to give a more complete image of the phenomenon under
study. Complementary sensors help resolve the problem of incompleteness. As a simple
example, Figure 2-2 shows a temperature monitoring system that consists of several
thermometers each covering a different region. This configuration is complementary because
each thermometer provides the same type of data but for a different geographic region. In
general, fusing complementary data is intuitive and easy.
Figure 2-2: A complementary sensor network may consist of several thermometers, each
covering a different geographical region (note there is no overlap in coverage)
2.3.2 Competitive
A sensor configuration is competitive if each sensor delivers an independent measurement of the
same property. Since they provide what should be identical data, the sensors are in competition
as to which reading should be believed by the system in the case of discrepancies. Competing
sensors can be identical or they can use different methods of measuring the same attribute. The
aim of competitive fusion is reduce the effect of uncertain and erroneous measurements, provide
greater reliability, and/or add fault tolerance to a system. Figure 2-3 shows three thermometers
partially surveying the same region (shaded darker). Note that this type of configuration would
still be able to function for the joined region if one of the thermometers were to cease
functioning.
Figure 2-3: Competitive thermometers would all return information regarding the same
region (note the overlap in coverage)
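A minimal sketch of competitive fusion for the thermometers in Figure 2-3, assuming each thermometer reports a reading with a known noise variance and that a failed thermometer simply reports nothing (`None`). Inverse-variance weighting is used here as one common rule for combining competing measurements; it is an illustrative choice rather than a method prescribed by the sources above, and the function name and readings are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def fuse_competitive(readings: Sequence[Optional[float]],
                     variances: Sequence[float]) -> Tuple[float, float]:
    """Fuse redundant readings of the same quantity by inverse-variance weighting.

    Sensors that have failed report None and are skipped, so the fused estimate
    degrades gracefully instead of failing outright.
    Returns (fused_value, fused_variance).
    """
    weights = []
    values = []
    for reading, var in zip(readings, variances):
        if reading is None:  # sensor failure: ignore this source
            continue
        weights.append(1.0 / var)
        values.append(reading)
    if not weights:
        raise ValueError("all sensors have failed")
    total = sum(weights)
    fused = sum(w * v for w, v in zip(weights, values)) / total
    return fused, 1.0 / total


# Three thermometers covering the same region; the third has failed.
print(fuse_competitive([21.0, 22.0, None], [1.0, 1.0, 0.5]))  # (21.5, 0.5)
```

The fused variance is smaller than that of any individual working sensor, which is the formal sense in which competitive fusion reduces uncertainty.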
2.3.3 Cooperative
A cooperative sensor configuration uses the information provided by two or more independent
sensors to derive information that would not be available from the single sensors alone. Figure
2-4 shows four thermometers that measure the temperature at different points along a line. Not
only can they be used as complementary sensors to provide temperature information over a
combined area (as in section 2.3.1), but they can also be used cooperatively to determine the rate
of change of temperature along the line: the rate of change can be estimated as the difference
between two readings divided by the distance between the two thermometers. Note that the
temperature change along the line could never be determined by using only one sensor. Thus, the
aim of cooperative sensor networks is to derive new information through the use of several
sensors.
[Figure axes: Temperature vs. Position]
Figure 2-4: Thermometers separated by equal distances along a line provide information
about temperature. They can also be used cooperatively to find the rate of change of
temperature.
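The cooperative calculation described in section 2.3.3 can be sketched in a few lines: each gradient is the difference between two neighbouring readings divided by the distance between those two thermometers. The function name, positions, and readings are hypothetical.

```python
from typing import List, Sequence


def temperature_gradients(positions: Sequence[float],
                          temperatures: Sequence[float]) -> List[float]:
    """Estimate the rate of change of temperature between adjacent thermometers.

    Each value is (T[i+1] - T[i]) / (x[i+1] - x[i]), a forward finite difference.
    No single thermometer can provide this quantity on its own.
    """
    if len(positions) != len(temperatures) or len(positions) < 2:
        raise ValueError("need at least two paired positions and readings")
    gradients = []
    for i in range(len(positions) - 1):
        dx = positions[i + 1] - positions[i]
        d_temp = temperatures[i + 1] - temperatures[i]
        gradients.append(d_temp / dx)
    return gradients


# Four thermometers spaced 10 m apart along a line.
print(temperature_gradients([0.0, 10.0, 20.0, 30.0], [20.0, 21.0, 23.0, 26.0]))
# [0.1, 0.2, 0.3]
```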
2.4 Fusion System Architectures

There are several ways of classifying data and information fusion system architectures but they
are most commonly divided into: centralized, decentralized, and distributed architectures.
Occasionally, there is reference to other types, such as hierarchical or hybrid architectures, which
are simply combinations of the three aforementioned architectures.
2.4.1 Centralized
In centralized fusion architectures, the fusion unit is located at a central processor that collects all
of the raw data from the various sensors as shown in Figure 2-5. All processing and decisions are
made at this node and instructions or task assignments are given out to the respective sensors.
[Figure: input sensor measurements are sent to a central processor, which performs association, filtering and tracking]
Figure 2-5: Centralized architecture with a central processor – adapted from (Ng, 2003)
2.4.2 Decentralized
Decentralized fusion architectures consist of a network of nodes, where each node has its own
processor. There is no central fusion or central communication center. Fusion occurs at each
node on the basis of local information and information from neighboring nodes. Additionally,
nodes have no knowledge of the global network architecture of which they are a part.
Decentralized fusion architectures could be further categorized as fully connected (as shown in
Figure 2-6) or partially connected (not shown).
[Figure: a fully connected network of nodes, each receiving input sensor measurements and performing association, filtering and fusion]
Figure 2-6: Decentralized fusion architecture – adapted from (Ng, 2003)
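Decentralized fusion can be illustrated with average consensus, one well-known technique in which each node repeatedly nudges its own estimate toward those of its immediate neighbours. This is an assumption for illustration: the cited sources do not prescribe this algorithm, and the ring topology, step size `epsilon`, and readings below are hypothetical. The sketch models a partially connected network in which no node uses the global topology, only its own neighbour list.

```python
from typing import Dict, List


def average_consensus(values: Dict[int, float],
                      neighbours: Dict[int, List[int]],
                      epsilon: float,
                      iterations: int) -> Dict[int, float]:
    """Run synchronous average-consensus iterations on an undirected graph.

    Update rule: x_i <- x_i + epsilon * sum over neighbours j of (x_j - x_i).
    For a connected undirected graph and 0 < epsilon < 1 / max_degree,
    every node converges to the average of the initial values.
    """
    x = dict(values)
    for _ in range(iterations):
        # Each node uses only its own value and its neighbours' values.
        x = {i: x[i] + epsilon * sum(x[j] - x[i] for j in neighbours[i])
             for i in x}
    return x


# Five nodes arranged in a ring (partially connected): each sees two neighbours.
readings = {0: 50.0, 1: 55.0, 2: 60.0, 3: 65.0, 4: 70.0}
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
result = average_consensus(readings, ring, epsilon=0.3, iterations=200)
print({i: round(v, 3) for i, v in result.items()})  # every node converges to 60.0
```

Because every exchange is symmetric, the network-wide sum is preserved at each step, which is why all nodes settle on the global average even though none of them ever sees all of the readings.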

2.4.3 Distributed
Distributed fusion architectures are an extension of the centralized fusion architecture, where
each sensor’s measurements are processed independently before sending the estimate (often
referred to as a “track”) to a central processor for fusion with other distributed sources of input.
A distributed fusion architecture is shown in Figure 2-7.
[Figure: single-sensor tracking is performed on each sensor's input measurements, and the resulting tracks are sent to a central processor for state vector fusion]
Figure 2-7: Distributed fusion architecture – adapted from (Ng, 2003)
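The contrast between the centralized and distributed architectures can be sketched for a single static quantity. Assuming independent, unbiased sensors with known noise variances (an assumption for illustration; the sensor names, variances, and readings are hypothetical), the central processor either weights every raw measurement directly, or receives one local estimate ("track") and its variance per sensor and fuses those. Under these assumptions the two routes give the same answer, while the distributed route transmits only two numbers per sensor.

```python
from typing import Dict, List, Tuple

Measurements = Dict[str, List[float]]


def centralized(raw: Measurements, noise_var: Dict[str, float]) -> Tuple[float, float]:
    """The central processor receives every raw measurement and weights each
    one by the inverse of its sensor's noise variance."""
    num = den = 0.0
    for sensor, readings in raw.items():
        for z in readings:
            num += z / noise_var[sensor]
            den += 1.0 / noise_var[sensor]
    return num / den, 1.0 / den


def local_track(readings: List[float], var: float) -> Tuple[float, float]:
    """Single-sensor processing: the sample mean and the variance of that mean."""
    return sum(readings) / len(readings), var / len(readings)


def distributed(raw: Measurements, noise_var: Dict[str, float]) -> Tuple[float, float]:
    """Each sensor forms a local track; the central processor only performs
    state vector (track) fusion by inverse-variance weighting."""
    tracks = [local_track(raw[s], noise_var[s]) for s in raw]
    den = sum(1.0 / p for _, p in tracks)
    num = sum(x / p for x, p in tracks)
    return num / den, 1.0 / den


raw = {"loop": [61.0, 59.0, 63.0, 57.0], "probe": [66.0, 64.0]}
noise_var = {"loop": 16.0, "probe": 4.0}
print(centralized(raw, noise_var))
print(distributed(raw, noise_var))
```

The equivalence is fragile: if local tracks share common process noise or prior information, their errors become correlated and simple track fusion is no longer equivalent to fusing the raw data.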
The next chapter provides an introduction to data fusion applications in transportation
engineering and a comprehensive literature review on data fusion research conducted in traffic
speed and travel time estimation.
Chapter 3
Literature Review
“Use only that which works, and take it from any place you can find it.”
— Bruce Lee
3.1 Data Fusion in Transportation Engineering

Transportation management centers are continuously motivated to obtain reliable information for
traffic monitoring and control operations. Basic traffic data is typically obtained from sensors
embedded in the road pavement, namely loop detectors. These fixed sensors are very useful, but
they cannot measure the spatial characteristics of traffic; that is, loop detectors are only
representative of traffic conditions at their specific location. A proliferation of new measurement
devices (cameras, cell phones, Bluetooth, GPS, etc.) means that other sources of data are
becoming increasingly available to complement the information provided by conventional loop
detectors. Although these technologies vary, they share a common trend: probe vehicle data
collection. In this sense, cars on the road act as moving sensors, continuously providing
information about traffic conditions. Therefore, a wide spectrum of data and heterogeneous
sources of information are now becoming available for traffic monitoring. As a result, many
applications in transportation engineering involve a data fusion problem (El Faouzi, 2004a).
Though various reviews of data fusion have been conducted, Dailey (1996) was the first to
specifically examine data fusion technology with an eye to its application in Intelligent
Transportation Systems (ITS). Table 3-1 provides a synopsis of the data fusion techniques and
key ITS projects reviewed by Dailey (1996). Note that the year given in column two of Table 3-1
represents the date the article was published and not necessarily the date the ITS project was
completed. Therefore, one must keep in mind that some of the data fusion techniques listed in
Table 3-1 may not have actually been implemented in the final version of the ITS project cited.
Table 3-1: Summary of fusion techniques applied to ITS – adapted from (Dailey, 1996)
| Project (Author) | Year | Technique(s) | Purpose |
|---|---|---|---|
| ADVANCE (Kirson et al.) | 1992 | Kalman filter | Forecasts future traffic conditions |
| | | Neural network | Pattern-matches current traffic situations with historical situations |
| | | Expert system | Identifies abnormal traffic conditions |
| | | Fuzzy logic | Permits traffic conditions to be described with qualitative measures rather than simple “yes-no” responses |
| PROMETHEUS (Behringer et al.) | 1992 | Kalman filter | Constructs 4-D position estimates from autonomous driving |
| (Martinez et al.) | 1990 | Expert system | Decomposes a driving task into independent subtasks |
| | | Neural network | Allocates one neural net for each driving subtask |
| Brainmaker (Change) | 1992 | Neural network | Pattern-matches current traffic situations with historical situations |
| IGHLC (Niehaus, Stengel) | 1991 | Kalman filter | Determines vehicle position |
| | | Bayesian | Deals with traffic uncertainty |
| | | Expert system | Models concept of Worst-Case Decision Making |
| Pathfinder (Summer) | 1991 | Fuzzy logic | Permits traffic conditions to be described with qualitative measures rather than simple “yes-no” responses |
| TravTek (Summer) | 1991 | Fuzzy logic | Permits traffic conditions to be described with qualitative measures rather than simple “yes-no” responses |
| DRIVE (Martinez et al.) | 1990 | Expert system | Decomposes a driving task into independent subtasks |
| | | Neural network | Allocates one neural net for each driving subtask |
| PRODYN (Kessaci et al.) | 1989 | Kalman filter | Estimates traffic-turning movements |
| | | Bayesian | Estimates traffic-state variables, e.g., queues and saturation |
| Application to AGVs: Autonomous Guided Vehicles (Harris & Read) | 1989 | DSER (Dempster-Shafer Evidential Reasoning) | Determines state of AGV (Autonomous Guided Vehicles) and outside world |
| (Harris) | 1988 | Fuzzy logic | Effectively controls AGV’s lateral motions in real time |
Like the previous study, Keever, Shimizu, & Seplow (2003) investigated data fusion for ITS, but
with a particular focus on delivering Advanced Traveler Information Services (ATIS).
The purpose of ATIS is to provide practical and timely information to aid travelers in an
integrated, multi-modal environment. Their study includes a literature review of ATIS data
fusion practices, the development of an appropriate ATIS data fusion model, and general
guidelines on the development of an ATIS data fusion system. Table 3-2 shows some of the data
fusion techniques assessed by Keever et al. (2003), the applicability of which depends on the
specific task for which the data fusion model is being developed. In other words, no single
method can be applied to all data fusion problems; rather, each method solves particular
problems. For example, Dempster-Shafer theory is suited to some data fusion problems but not
to all of them.
Table 3-2: The relative merits of level 1 data fusion techniques – adapted from (Keever et al., 2003)
| Technique | Relative Performance | Scalable | Computational Complexity (Time) | Maintenance | Cost to Implement |
|---|---|---|---|---|---|
| Parametric Based | | | | | |
| Classical Inference | Excellent | Excellent | Excellent | Excellent | Excellent |
| Bayesian Inference | Good | Poor | Good | Poor | Poor |
| Dempster-Shafer | Good | Poor | Good | Poor | Poor |
| GEP (Generalized Evidence Processing) | Poor | Poor | Poor | Poor | Poor |
| Non-Parametric Based | | | | | |
| Parametric Templates | Poor | Good | Good | Poor | Poor |
| Neural Nets | Good | Good | Poor | Poor | Poor |
| Clustering | Good | Excellent | Good | Good | Good |
| Voting | Good | Excellent | Excellent | Good | Excellent |
| Figure of Merit | Good | Good | Good | Good | Good |
| Correlation Measures | Excellent | Excellent | Good | Good | Excellent |
| Pattern Recognition | Good | Poor | Poor | Poor | Poor |
| Cognitive Based | | | | | |
| Logical Templates | Poor | Good | Poor | Poor | Good |
| Knowledge-Based | Poor | Poor | Poor | Good | Poor |
| Fuzzy Set Techniques | Good | Good | Good | Good | Good |
A more recent and comprehensive overview of data fusion in road traffic engineering is provided
by El Faouzi (2004a). This paper acquaints the reader with the most significant applications of
data fusion techniques in various road traffic engineering areas: Intelligent Transportation
Systems (ATIS, Automatic Incident Detection (AID), and Advanced Driver Assistance Systems
(ADAS)), network control, accident analysis and prevention, traffic demand estimation, traffic
forecasting and traffic monitoring. In keeping with the focus of this research, the remainder of
this literature review focuses on data fusion for traffic speed and travel time estimation.
3.2 Data Fusion for Traffic Speed and Travel Time Estimation
The purpose of this section is to provide a comprehensive overview of data fusion research for
traffic speed and travel time estimation. For the sake of readability, the research projects are
presented in chronological order under one of the following subheadings: statistical approaches,
Kalman filter applications, neural network models, evidence theory (Dempster–Shafer theory),
and other contributions.
3.2.1 Statistical Approaches

One of the earliest data fusion applications to travel time estimation was proposed by Tarko &
Rouphail (1993) for the ADVANCE (Advanced Driver and Vehicle Advisory Navigation
Concept) project. The ADVANCE project in the Chicago metropolitan area was one of several
ATIS operational tests that were underway in the USA and abroad in the early nineties. The
basic data fusion concept involved the following steps:
1. Estimate the expected link travel time from detector data (EDTT) using a regression
model developed off-line (these detectors did not measure speed),
2. Calculate the mean probe travel time (EPTT) from probe reports received during the last
interval,
3. Fuse EDTT with EPTT in order to obtain the on-line link travel time estimate (EOTT),
4. Obtain the final link travel time (EFTT) by fusing EOTT with a historical travel time
estimate (ESTT).
In step 1, the regression model chosen for detector data conversion establishes a relationship
between detector occupancy and expected travel time, since loop detectors measuring speed were
not available. In the following steps, the estimates are fused using weights based on the squared
estimation error, similar to the simple convex combination described in section 4.1, except that
the weights are also influenced by the sample size of each estimator.
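The four ADVANCE steps can be sketched as two successive weighted combinations. The exact weighting formula of Tarko & Rouphail (1993) is not reproduced here, so this sketch assumes inverse-variance (squared-error) weights in which the variance of the mean probe travel time shrinks with the number of probe reports n, i.e. σ²/n, and that probe reports are independent. All numbers and function names are hypothetical.

```python
from typing import List, Tuple

Estimate = Tuple[float, float]  # (travel time in s, error variance in s^2)


def combine(a: Estimate, b: Estimate) -> Estimate:
    """Convex combination with weights inversely proportional to error variance."""
    (ta, va), (tb, vb) = a, b
    wa = vb / (va + vb)  # the less uncertain estimate receives the larger weight
    return wa * ta + (1.0 - wa) * tb, va * vb / (va + vb)


def probe_estimate(probe_times: List[float], per_probe_var: float) -> Estimate:
    """Step 2: mean probe travel time; its variance falls with the sample size."""
    n = len(probe_times)
    return sum(probe_times) / n, per_probe_var / n


edtt: Estimate = (120.0, 400.0)                              # step 1: detector regression
eptt = probe_estimate([110.0, 104.0, 112.0, 106.0], 400.0)   # step 2: (108.0, 100.0)
eott = combine(edtt, eptt)                                   # step 3: on-line estimate
estt: Estimate = (115.0, 900.0)                              # historical estimate
eftt = combine(eott, estt)                                   # step 4: final estimate
print(eott, eftt)
```

With four probe reports the probe estimate is four times less uncertain than a single report, so it dominates the on-line estimate; with only one report the detector estimate would carry equal weight.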
El Faouzi (2004b) explores the use of aggregative strategies in which all competing estimators
are aggregated to form a single estimate. The author adopts the intuitive strategy of using a
weighted average of the individual estimators that minimizes the overall estimation error. In other
words, each estimator's optimal weight reflects its quality relative to the other estimators. Although the
terminology is not explicitly used by the author, this research is essentially an investigation of
the simple convex combination and the Bar-Shalom/Campo combination (sections 4.1 and 4.2).
In either case, the research shows that combining two estimators consistently outperforms the
individual estimators and provides a substantial improvement under the root mean squared error
(RMSE) criterion. El Faouzi (2004b) notes that the main advantage of aggregative strategies is that
errors from different estimators may cancel one another out, but it is not easy to determine the
expected improvement from fusion for a given configuration. Moreover, it is not straightforward
to determine which combination of estimators can improve the overall estimation reliability.
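For two scalar estimators, the difference between the two aggregation rules named above can be sketched as follows. The simple convex combination assumes the estimators' errors are uncorrelated; the Bar-Shalom/Campo combination additionally uses their error cross-covariance `p12`. The formulas are the standard scalar forms, shown here as a sketch; the speeds and variances are hypothetical.

```python
from typing import Tuple


def simple_convex(x1: float, p1: float, x2: float, p2: float) -> Tuple[float, float]:
    """Inverse-variance weighting; assumes uncorrelated estimation errors."""
    x = (p2 * x1 + p1 * x2) / (p1 + p2)
    p = p1 * p2 / (p1 + p2)
    return x, p


def bar_shalom_campo(x1: float, p1: float, x2: float, p2: float,
                     p12: float) -> Tuple[float, float]:
    """Combination that accounts for the cross-covariance p12 of the two errors."""
    denom = p1 + p2 - 2.0 * p12
    if denom <= 0.0:
        raise ValueError("covariances must satisfy p1 + p2 - 2*p12 > 0")
    gain = (p1 - p12) / denom
    x = x1 + gain * (x2 - x1)
    p = p1 - (p1 - p12) ** 2 / denom
    return x, p


# Loop-based and probe-based speed estimates (km/h) with their error variances.
print(simple_convex(80.0, 25.0, 70.0, 16.0))
print(bar_shalom_campo(80.0, 25.0, 70.0, 16.0, p12=0.0))   # reduces to the line above
print(bar_shalom_campo(80.0, 25.0, 70.0, 16.0, p12=10.0))  # positively correlated errors
```

With `p12 = 0` the two rules coincide. With positively correlated errors, the fused variance is larger than the simple convex combination would report, so ignoring the correlation would overstate the benefit of fusion.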
3.2.2 Kalman Filter Applications

Guo et al. (2009) sought to develop an accurate speed estimation methodology for single loop
detectors under congested conditions. To meet this requirement, their research team developed a
method using the Kalman filter technique (section 4.3), based on an empirical investigation into
the relationship among single loop measurements. In their study, the unknown speed is treated as
a hidden state, and the common practice of assuming a random walk model for the state
transition is adopted (Guo et al., 2009). In order to evaluate the proposed methodology, the
estimated speeds (determined only using the flow rate and occupancy data from the stations)
were compared to the dual loop measured speeds that served as the ground truth. The proposed
algorithm was tested using data from two urban regions in Northern Virginia and Northern
California. The empirical evaluation showed that their proposed algorithm can produce
acceptable speed estimates under congested traffic conditions. The comparison of the proposed
approach with the g-factor approach (adopted for comparison in their study) shows that the
proposed method consistently outperformed the traditional g-factor approach. Overall, their
study shows the Kalman filter is capable of estimating speed accurately in an online fashion.
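A minimal sketch of a scalar Kalman filter with the random-walk state transition mentioned above. This is not the measurement model of Guo et al. (2009), which rests on their own empirical analysis of single loop measurements; to keep the sketch self-contained, the filter is fed pseudo-measurements from the traditional g-factor relation v = q·g/o (the baseline they compared against). The g value, noise variances, and station data are hypothetical.

```python
from typing import Iterable, List, Tuple


def g_factor_speed(flow_vph: float, occupancy: float, g_km: float) -> float:
    """Traditional single-loop speed estimate: v = q * g / o (km/h).

    flow_vph: flow rate (veh/h); occupancy: fraction of time occupied (0-1);
    g_km: assumed effective vehicle length in km (the g-factor).
    """
    return flow_vph * g_km / occupancy


def random_walk_kalman(measurements: Iterable[float], x0: float, p0: float,
                       q: float, r: float) -> List[Tuple[float, float]]:
    """Scalar Kalman filter with a random-walk state model.

    State:       v_k = v_{k-1} + w_k,  w_k ~ N(0, q)
    Measurement: z_k = v_k + e_k,      e_k ~ N(0, r)
    Returns the (estimate, variance) pair after each update.
    """
    x, p = x0, p0
    history = []
    for z in measurements:
        p = p + q              # predict: a random walk leaves the mean unchanged
        k = p / (p + r)        # Kalman gain
        x = x + k * (z - x)    # correct with the innovation
        p = (1.0 - k) * p
        history.append((x, p))
    return history


# Hypothetical 30 s observations from one single loop station.
flows = [1800.0, 1750.0, 1900.0, 1850.0]
occs = [0.20, 0.21, 0.22, 0.20]
z = [g_factor_speed(flow, occ, g_km=0.0065) for flow, occ in zip(flows, occs)]
for est, var in random_walk_kalman(z, x0=60.0, p0=100.0, q=4.0, r=25.0):
    print(f"{est:.1f} km/h (variance {var:.1f})")
```

The ratio of `q` to `r` controls how quickly the estimate tracks changes: a larger process-noise variance lets the filter follow sudden congestion faster at the cost of passing through more measurement noise.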
Peng et al. (2009) present another Kalman filter based traffic measurement fusion method to
solve the problem of monitoring road sections without GPS traffic data by using neighbouring
roads that are monitored by GPS devices. The Kalman filter algorithm is verified with the real-
world GPS traffic data from the city of Hangzhou in China, and the fusion results are compared
to the results obtained by using historical speed data to replace the current estimated speed. In
the first experiment, the relative mean error (RME) of the fused results is 19.61%, while the
relative mean error of using historical speed data to replace the current speed is 44.44%. In the
second experiment, the RME of the fused results is 19.84%, while the RME of using historical
speed data to replace the current speed is 64.75%. Thus, for road sections without GPS samples,
fusing the speed information of associated road sections produces better results than using
historical speed data to estimate the current speed. The authors note that while the Kalman filter
algorithm is simple, easy to implement, and computationally inexpensive, calibrating the model
requires optimization or identification methods to determine its parameters.
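The comparison above is expressed as a relative mean error (RME). The exact definition used by Peng et al. (2009) is not given here; a common form, assumed for this sketch, is the mean of |estimate − observed| / observed, expressed as a percentage. The speeds below are hypothetical.

```python
from typing import Sequence


def relative_mean_error(estimated: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute error relative to the observed value, in percent (assumed definition)."""
    if len(estimated) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for est, obs in zip(estimated, observed):
        if obs == 0:
            raise ValueError("observed speeds must be non-zero")
        total += abs(est - obs) / obs
    return 100.0 * total / len(observed)


observed = [40.0, 25.0, 50.0]   # km/h
fused = [44.0, 20.0, 50.0]      # km/h
print(relative_mean_error(fused, observed))
```

Because each error is divided by the observed speed, the same absolute error weighs more heavily at low (congested) speeds than at free-flow speeds.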
Byon et al. (2010) use a single-constraint-at-a-time (SCAAT) Kalman filter (see section 4.4) to
combine multiple data sources, estimating the current traffic condition as soon as any single
sensor becomes active. This method addresses the need for a versatile technique that can fuse data
from not only loop detectors and probe vehicles, but also other available data sources, which
may not necessarily have the same frequency or accuracy. Moreover, some information may be
qualitative rather than numeric, being based fully or partially on human judgment, for instance
the information provided by traffic department websites and radio stations (e.g. “moving
well”, “moving slowly”, “extremely slow” or “not moving”). For the initial application and
evaluation of the proposed SCAAT Kalman filter, data are collected from four
different sources: a floating car survey using a GPS unit, 40 loop detectors along Highway 401
in Toronto, radio broadcasting from AM 640 Chopper Traffic channel and the on-line freeway
management system of the Ontario Ministry of Transportation (MTO). The results indicate the
SCAAT estimated travel times are reasonably close to what a road user actually experiences.
Furthermore, a microsimulation package is used in order to have access to the true traffic
conditions of a simulated environment that has been calibrated for a particular road section in
Toronto. The performance of the developed SCAAT filters is then compared with the true
traffic conditions under different sampling strategies, with varying numbers of probes and varying
sensor sampling frequencies. The conclusion from the microsimulation study is that the data
should be combined only if there are data gaps from the most accurate sensor: fusing the
lower-quality loop detector data in the presence of better sensors would only increase the error
measures. Overall, the major advantage of adopting the SCAAT data fusion method for traffic
monitoring is that any change in the sampling rate or addition/removal of any new/old sensor can
be handled with no additional major modifications to the filtering framework. Thus, the flexible
nature of SCAAT filtering can enable robust and easy-to-implement traffic monitoring systems.
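The defining feature of SCAAT filtering, incorporating each observation individually at the moment it arrives, can be sketched with a scalar Kalman filter whose process noise grows with the time elapsed since the previous observation. This is a simplified illustration under assumed parameters, not the filter of Byon et al. (2010): the sensor noise variances, the process-noise rate `q_rate`, and especially the numeric speeds attached to qualitative radio reports are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical mapping from qualitative reports to (speed in km/h, variance).
QUALITATIVE: Dict[str, Tuple[float, float]] = {
    "moving well": (90.0, 225.0),
    "moving slowly": (50.0, 225.0),
    "extremely slow": (20.0, 100.0),
    "not moving": (5.0, 25.0),
}


@dataclass
class Observation:
    time_s: float
    source: str
    value: Union[float, str]  # a speed in km/h, or a qualitative report


def scaat_filter(observations: List[Observation], sensor_var: Dict[str, float],
                 x0: float, p0: float,
                 q_rate: float) -> List[Tuple[float, float, float]]:
    """Scalar random-walk Kalman filter updated one observation at a time.

    Observations may arrive from any source at any time; each is applied on
    its own as soon as it is available. q_rate is the process-noise variance
    added per second since the previous observation.
    """
    ordered = sorted(observations, key=lambda o: o.time_s)
    x, p = x0, p0
    t_last = ordered[0].time_s if ordered else 0.0
    history = []
    for obs in ordered:
        p += q_rate * (obs.time_s - t_last)   # predict across the elapsed gap
        t_last = obs.time_s
        if isinstance(obs.value, str):        # qualitative report
            z, r = QUALITATIVE[obs.value]
        else:                                 # numeric sensor reading
            z, r = float(obs.value), sensor_var[obs.source]
        k = p / (p + r)                       # single-measurement Kalman gain
        x += k * (z - x)
        p *= 1.0 - k
        history.append((obs.time_s, x, p))
    return history


stream = [
    Observation(0.0, "loop", 72.0),
    Observation(20.0, "probe", 64.0),
    Observation(30.0, "radio", "moving slowly"),
    Observation(60.0, "loop", 58.0),
]
for t, speed, var in scaat_filter(stream, {"loop": 100.0, "probe": 16.0},
                                  x0=70.0, p0=400.0, q_rate=0.5):
    print(f"t={t:4.0f} s  speed={speed:5.1f} km/h  variance={var:6.1f}")
```

Adding or removing a source only means adding or removing an entry in `sensor_var` (or `QUALITATIVE`); the update loop itself is unchanged, which mirrors the flexibility described above.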
3.2.3 Neural Network Models

Nelson & Palacharla (1993) introduced the idea of using neural networks (see section 4.7) for
solving the travel time estimation problem. Their work describes the ability of a
counterpropagation neural network model to classify input traffic flow patterns and output
current travel time estimates. Neural networks are capable of learning how to classify and
associate input/output patterns, making them suitable for solving problems like estimating
current travel times from traffic flow patterns received from several different sources. The
outputs of the counterpropagation neural network developed in this study are statistical averages
of the travel time outputs that are best representative of each traffic flow pattern class. The
authors experimented using the model for a group of ten consecutive links where some links
have loop detectors in all lanes, other links have detectors in a few lanes, and some links have no
detectors at all. Altogether, the ten links have 27 input sources (loop detectors). The network was
trained using 4 input traffic flow patterns (at four different times of the day), and following
completion of training, the counterpropagation neural network functions as a lookup table. Thus,
during normal operation of the neural network, traffic flow patterns are collected during each
interval from all sources, and the Kohonen weight vector that is closest to the input traffic flow
pattern is selected (i.e. the closest traffic pattern stored in the neural network), and the
corresponding Grossberg weight vector is given as the current travel time output (i.e. the
associated travel time of that traffic pattern stored in the neural network).
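Once training is complete, the recall step described above behaves like a nearest-neighbour lookup. The sketch below shows only that recall step with hypothetical pre-trained weights; training the Kohonen and Grossberg layers is omitted, and three input sources stand in for the 27 used by Nelson & Palacharla (1993).

```python
import math
from typing import List, Sequence


def counterprop_lookup(pattern: Sequence[float],
                       kohonen: List[Sequence[float]],
                       grossberg: List[float]) -> float:
    """Return the travel time associated with the closest stored traffic pattern.

    kohonen[i]   : stored traffic flow pattern (one value per input source)
    grossberg[i] : travel time associated with stored pattern i
    """
    if len(kohonen) != len(grossberg) or not kohonen:
        raise ValueError("need one travel time per stored pattern")
    # Winner-take-all: the Kohonen vector nearest (Euclidean) to the input wins.
    best = min(range(len(kohonen)), key=lambda i: math.dist(pattern, kohonen[i]))
    return grossberg[best]


# Four stored patterns (e.g. four times of day) over three detectors; hypothetical values.
kohonen = [[0.05, 0.06, 0.05], [0.25, 0.30, 0.28], [0.15, 0.18, 0.16], [0.08, 0.09, 0.07]]
grossberg = [300.0, 720.0, 480.0, 330.0]  # travel times in seconds
print(counterprop_lookup([0.09, 0.10, 0.08], kohonen, grossberg))  # 330.0
```

Because the output is always one of the stored travel times, the network cannot interpolate between patterns, which is one reason later studies moved toward networks with continuous outputs.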
Cheu et al. (2001) present a more advanced neural network based arterial speed estimation model
using data from mobile probe vehicles and inductive loop detectors. The input layer has three
neurons receiving: (1) average speed estimated by the loop detector modules; (2) average speed
estimated by the probe vehicle module; and (3) probe vehicle sample size. The latter gives an
indication of the accuracy of the average speed calculated from probe vehicle data. The authors
decided to base their model development and testing on simulated data from a microscopic
traffic simulation model capable of simulating probe vehicles and loop detectors. To evaluate the
performance of the data fusion model, the error between the fused speed estimates and all vehicle
link speeds is computed from a subset of data reserved for this purpose. The root mean squared
error (RMSE) of the speed estimates is reduced to 1.08 km/h, an improvement over using
either probe vehicle estimates or detector estimates alone.
Park & Lee (2004) developed a neural network with one input layer, one hidden layer, and one
output node. The inputs to the neural network in this study were four outputs obtained from a
dual-loop detector: average speed, average occupancy, flow, and maximum occupancy. The
evaluation of their model is based on probe data collected on Whasan-ro in Jeonju, South Korea,
on the 23rd and 24th of April, 2003, and dual-loop detector data collected in the same period.
Whasan-ro is an arterial road consisting of 6 links. Probe vehicle data are collected by recording
the last four digits of the license plates of all passing vehicles at the 7 intersections on
Whasan-ro. To simulate the effect of probe vehicles as learning data, two vehicles are randomly
sampled in a given link during each time interval and taken as probe vehicles. Furthermore, two
versions of the neural network were developed: one uses the average speed of all observed
vehicles as the target value for training, and the second uses only the probe vehicles (the
average speed of the two sampled cars)
as target values. The performance of the two neural networks is almost the same, except the
neural network trained on probe vehicle data can show some biases generated by the
characteristics of probe vehicle type (an intuitive result).
3.2.4 Evidence Theory (Dempster–Shafer theory)

El Faouzi (2000) investigated the use of evidence theory for the problem of travel time
estimation with heterogeneous sources of data. This theory is based on the work of Dempster,
which was formalized by Shafer in 1976, and can be presented as an extension of probability
theory to deal with ignorance. In order to perform fusion using evidence theory, the available
data must be subjected to a certain amount of preliminary processing. Travel time measurements
are therefore broken down into a number of classes, each of which is assigned a credibility
measure on the basis of belief functions. In this work, the percentage of correct classifications
was below 40%. Nonetheless, the proposed fusion method clearly outperforms, in all cases, the
methods based on a single data source. The improvement in the quality of estimation, in terms
of percentage of correct classifications, varied between 6% and 13.6%.
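Dempster's rule of combination, the core operation of evidence theory, can be sketched for two sources assigning belief mass to travel time classes. Mass given to the whole frame (here `THETA`) represents ignorance, which is what distinguishes evidence theory from ordinary probability. The class labels and mass values are hypothetical and do not reproduce the classes used by El Faouzi (2000).

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]

THETA = frozenset({"short", "medium", "long"})  # frame of discernment (hypothetical classes)


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions with Dempster's rule.

    The product mass of each pair of focal sets goes to their intersection;
    mass falling on the empty set is the conflict K, and the result is
    renormalized by 1 - K.
    """
    combined: Mass = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


loop = {frozenset({"medium"}): 0.6, frozenset({"medium", "long"}): 0.2, THETA: 0.2}
probe = {frozenset({"long"}): 0.5, frozenset({"medium"}): 0.3, THETA: 0.2}
for focal, mass in dempster_combine(loop, probe).items():
    print(sorted(focal), round(mass, 3))
```

The loop source's partial ignorance (mass on {"medium", "long"} and on `THETA`) lets it agree with part of the probe evidence instead of being forced into a single class, and only the directly contradictory pairs contribute to the conflict.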
El Faouzi & Lefevre (2006) use the evidence theory framework to present two different
approaches for fusing probe vehicle reports with measurements from conventional traffic loop
detectors. The first approach is a fusion of classifiers in which each source of information is
considered as a classifier. In this case, the measures obtained by the available sensors are
considered as classes. The second approach is a distance-based strategy that derives mass
functions from evidence theory. In this case, the authors calculate dissimilarities between a new
pair of measurements and the measurements in the learning sample. These dissimilarities allow
them to build belief functions and thereby assign a travel time class to the new pair of
measurements. In their study, loop detectors had a correct classification rate of 26.6% and probe
vehicles had a correct classification rate of 27.3%. The results of the distance-based method for
designing belief functions vary according to its parameters (either the number of prototypes or
the number of nearest neighbours), and thus cannot be generalized. The results of the classifier
fusion approach range from 27.27% to 32.87% correct classifications; although these rates are
low, both approaches proved more effective than mono-sensor approaches within the framework
of their application.
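The distance-based strategy can be illustrated with the evidential k-nearest-neighbour rule of Denœux, a standard way of deriving mass functions from distances. Whether El Faouzi & Lefevre (2006) use exactly this form is not stated here, so treat it as an assumption; the parameters `alpha` and `gamma`, the learning sample, and the class labels are hypothetical. Each nearby learning pair supports its own travel time class with a mass that decays with distance, the remainder stays on the whole frame as ignorance, and the neighbours' masses are combined with Dempster's rule.

```python
import math
from itertools import product
from typing import Dict, FrozenSet, List, Sequence, Tuple

Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination with normalization by 1 - conflict."""
    out: Mass = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        if a & b:
            out[a & b] = out.get(a & b, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: m / (1.0 - conflict) for s, m in out.items()}


def evidential_knn(x: Sequence[float], sample: List[Tuple[Sequence[float], str]],
                   classes: FrozenSet[str], k: int = 3,
                   alpha: float = 0.9, gamma: float = 0.5) -> Mass:
    """Build one mass function per nearest neighbour and combine them.

    x: new (loop, probe) travel time pair; sample: labelled learning pairs.
    """
    nearest = sorted(sample, key=lambda s: math.dist(x, s[0]))[:k]
    result: Mass = {classes: 1.0}  # start from total ignorance
    for features, label in nearest:
        support = alpha * math.exp(-gamma * math.dist(x, features) ** 2)
        result = combine(result, {frozenset({label}): support, classes: 1.0 - support})
    return result


CLASSES = frozenset({"short", "medium", "long"})
sample = [((5.0, 5.5), "short"), ((5.5, 5.0), "short"), ((8.0, 8.5), "medium"),
          ((8.5, 7.5), "medium"), ((12.0, 13.0), "long")]  # travel times in minutes
masses = evidential_knn((5.2, 5.4), sample, CLASSES)
for focal, m in sorted(masses.items(), key=lambda kv: -kv[1]):
    print(sorted(focal), round(m, 3))
```

As the text notes for the original study, the outcome depends on the parameters: changing `k` or `gamma` changes how much distant learning pairs contribute.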
Kong & Liu (2007) introduced a fusion model, based on evidence theory and inspired by the
federated Kalman filter, to meet the requirement of real-time fusion; hence they call it the
Federated Evidence Fusion Model (FEFM). The model uses belief functions from evidence theory to
classify traffic states into five classes: very congested, congested, medium, smooth or very
smooth. The proposed model is illustrated in Figure 3-1. For the sub-fusion systems, the evidence
result of each subsystem at the current time step is fused with the fusion result of the main
system at the previous time step using Dempster’s combination rule. A parameter is introduced to
weaken the feedback from the previous time step (to prevent the feedback from dominating the
fusion result at the current time step). The value of this parameter can be determined by requiring
that the fusion result match the real state at all times in the training set. The main fusion system
also uses Dempster’s combination rule. The model shows advantages over conventional evidence
theory in simulation tests and good accuracy in tests with real-world data. The authors also
propose two modifications to their model: distributed feedback fusion and no feedback fusion. In
distributed feedback fusion, the feedback information to every subsystem no longer comes from
the main fusion system, but from the subsystem's own decision at the previous time step. In the
no feedback case, there is no feedback information from either the main fusion system or the
sub-fusion systems.
Figure 3-1: Frame of the FEFM (Kong & Liu, 2007)
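Dempster's rule of combination, which the FEFM applies in both the sub-fusion and the main fusion systems, can be sketched in a few lines of Python. This is a minimal illustration that assumes mass functions are stored as dictionaries keyed by frozensets of traffic-state labels. The masses in the example are hypothetical, and the feedback-weakening parameter of Kong & Liu (2007) is not reproduced.

```python
from itertools import product

# Frame of discernment: the five traffic-state classes used by the FEFM.
STATES = frozenset({"very congested", "congested", "medium", "smooth", "very smooth"})

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of states (a focal element) to a mass,
    and the masses of each function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass given to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; Dempster's rule is undefined")
    # Normalize by 1 - K, where K is the total conflict.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}

# Hypothetical evidence from a loop detector and from probe vehicles.
loop = {frozenset({"congested"}): 0.6,
        frozenset({"congested", "medium"}): 0.3,
        STATES: 0.1}
probe = {frozenset({"medium"}): 0.5,
         frozenset({"congested"}): 0.3,
         STATES: 0.2}

fused = dempster_combine(loop, probe)
for focal, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(focal), round(mass, 3))
```

In this example, 30% of the joint mass falls on contradictory pairs. Renormalizing by the remaining 70% leaves "congested" as the dominant state.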

Kong, Chen, & Liu (2007) improved the FEFM by adding a mechanism to evaluate the dynamic
reliability of sensors using a group of Kalman filters. Sensor reliability can be classified as
static or dynamic. Static reliability can be obtained before the sensor is applied, whereas dynamic
reliability changes with the varying environment, for example with unexpected disturbances,
environmental noise, and meteorological conditions. In the original FEFM model, only static
reliability was considered. In order to account for dynamic reliability, data smoothing is
performed at the pre-processing level: a set of Kalman filters is built to remove some of the noise
in the sensor data. At the same time, the Kalman filters estimate the variances of the sensors,
which are used to calculate the dynamic reliability in the fusion systems. The revised model is
shown in Figure 3-2.
Figure 3-2: Frame of the improved FEFM (Kong et al., 2007)
Kong et al. (2009a) and Kong et al. (2009b) present further work on the FEFM. In these papers, a
method based on traffic wave theory is proposed to better process the raw traffic data obtained
by loop detectors. Their method can estimate the spatiotemporal mean speed over a link using only
one loop detector buried at the end of the link. Their study also proposes a method to obtain
spatiotemporal mean speeds from GPS probe vehicle data in three processing steps: 1) coordinate
transformation; 2) map matching; and 3) curve approximation. The final flowchart of the proposed
evidential fusion algorithm is shown in Figure 3-3. Perhaps the most interesting part of these
papers is that the new FEFM model is validated using real-world traffic data in four aspects:
1) accuracy; 2) conflict resistance (i.e. resistance to conflicting evidence from sensors);
3) robustness; and 4) operation speed. For arterials (larger roads with more detectors), the mean
state decision error (MSDE) of the proposed method (2.5%) is slightly lower than that of
conventional evidence theory (4.2%). For branches (smaller roads with fewer detectors), the
improvement is more pronounced: the proposed approach achieves an MSDE of 6.7%, compared to 14.7%
for evidence theory.
Figure 3-3: Flowchart of the proposed evidential fusion algorithm (Kong et al., 2009)
The drawback of evidence theory, however, is that it can only classify traffic states (i.e. very
congested, congested, medium, smooth or very smooth); it cannot estimate a precise numerical
traffic speed. Hence, evidence theory is more accurately described as a traffic speed
classification method than as a traffic speed estimation method.
3.2.5 Other Contributions

Choi & Chung (2002) developed a fusion algorithm based on a voting technique, fuzzy
regression, and a Bayesian pooling technique for estimating link travel time in congested urban
road networks. Their proposed model is shown in Figure 3-4. The voting technique computes a
weighted sum of one-minute link travel times from loop detectors, weighted according to the sample
standard deviation and sample size, to produce a weighted average five-minute link travel time.
The Bayesian pooling technique is again a weighted sum, where the weights are output from the fuzzy
regression and represent the degree of membership of the given estimate to the link. The fuzzy
regression model relates the historical link profile to the length of the link (an odd
relationship to consider). The proposed algorithm is shown to have a lower mean absolute
percentage error (MAPE) than the arithmetic mean.
Figure 3-4: A data fusion algorithm for link travel time (Choi & Chung, 2002)
El-Geneidy & Bertini (2004) examine the optimal temporal resolution for detector data reporting
using a combination of loop detector and automatic vehicle location (AVL) data from a bus fleet.
Specifically, the authors examine the differences between average segment speeds measured by the
bus AVL system and point speeds reported by the inductive loop detector (ILD) system. Three
scenarios are investigated: 1) comparing measured average segment speed with the ILD reported
speed for the nearest 20-second interval; 2) comparing measured average segment speed with the
5-minute average ILD reported speed; and 3) comparing measured average segment speed with the
5-minute median ILD reported speed. Based on this analysis, the speed reported by loop detectors
every 20 seconds can be misleading when extrapolated over a long non-homogeneous segment. Of the
cases examined, the five-minute median speed appeared to be the most representative of measured
speed along a segment, and the five-minute average was also within the acceptable range for
reporting speed.
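The difference between a 5-minute average and a 5-minute median can be illustrated with a short Python sketch. It assumes a hypothetical series of 20-second loop detector speeds (15 readings per 5-minute window) that contains one brief low reading. The numbers are invented for illustration and are not El-Geneidy & Bertini's data.

```python
import statistics

def aggregate_speeds(speeds_20s, readings_per_window=15):
    """Aggregate 20-second speeds into consecutive 5-minute (15 x 20 s) windows.

    Returns a list of (mean, median) pairs, one per complete window.
    """
    windows = []
    for start in range(0, len(speeds_20s) - readings_per_window + 1, readings_per_window):
        block = speeds_20s[start:start + readings_per_window]
        windows.append((statistics.mean(block), statistics.median(block)))
    return windows

# One 5-minute window: steady 90 km/h traffic with a single 20 km/h reading.
speeds = [90] * 7 + [20] + [90] * 7
print(aggregate_speeds(speeds))  # the mean is pulled down; the median stays at 90
```

The median ignores the single outlying reading, while the mean is pulled towards it. This is consistent with, though a single example does not prove, the finding that the median was the more representative statistic.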
Berkow et al. (2009) developed an interesting graphical technique to trace the boundaries of the
congested regime in time and space along an arterial corridor by combining data from traffic
signal system detectors and from buses acting as probe vehicles. Their paper describes the results
of a case study from Portland, Oregon. To characterize traffic flow dynamics, it is possible to
reconstruct a map of traffic conditions by producing a color contour plot of speed as measured by
the detectors, overlaid with trajectories of the buses as constructed from the archived AVL data.
This situation is demonstrated in Figure 3-5a. In this figure, note that the detector prediction
overestimates travel times as compared with the dashed bus trajectory because the congested
conditions are present over a shorter segment than what is interpolated by the traffic detectors.
Figure 3-5b shows manual adjustments made by the authors to the time and space boundaries
generated by the signal system data to more accurately represent travel conditions. The authors
suggest this method for identifying congestion intervals.
Figure 3-5: Time–space diagram plots: (a) congested regime based on signal system
measurements; (b) manually revised estimate of congested regime based on bus probe and
signal system data (Berkow et al., 2009)
3.3 Findings from the Literature Review

As demonstrated by the literature review, some of the methods and techniques applied in traffic
speed and travel time estimation draw heavily from other disciplines and fields of study; this is a
reminder of Figure 2-1, which shows how a number of disciplines, areas of study, and techniques
contribute to the field of data fusion in general.
The studies reviewed all share the common objective of fusing data from multiple traffic sensors,
with a particular focus on data from conventional loop detectors and a small sample of travel time
measurements from probe vehicles obtained during the same time period. Although traffic
speed estimation has been tackled from various perspectives, the preferred fusion method is still
undetermined because of some important gaps in the literature. First, none of the researchers
attempted to compare their proposed technique with competing techniques. Instead, each
researcher proposed a fusion method and then compared it with mono-sensor
approaches. Also, these individual studies cannot be accurately compared due to the lack of a
common measure of effectiveness. Many studies did not even use quantitative measures to
evaluate their proposed technique, and when they did, the same quantitative measure was not
used among researchers. Even if a common measure of effectiveness could be found between
studies, the methods could not be compared because they were tested under different
circumstances, on different road networks, precluding a valid comparison.
Furthermore, the number of probe vehicle measurements is inconsistent; some case studies had
an abundance of probe vehicles while others had very few (as a percentage of traffic flow).
Lastly, aggregation techniques for fusing imperfect information such as the Ordered Weighted
Averaging (OWA) operator and Fuzzy Integral operators have not been explored as possible
solutions to this fusion problem, even though they are known in the data fusion literature.
Mitchell (2007) notes “Of course, the basic problem of multi-sensor data fusion is one of
determining the best procedure for combining the multi-sensor data inputs.” Hence, this thesis
implements and compares numerous data fusion techniques to observe their relative
performances.
Chapter 4
Data Fusion Techniques
“Data fusion is deceptively simple in concept but enormously complex in implementation.”
– US Department of Defense.
4.1 Simple Convex Combination

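The generalized Millman's formula deals with the optimal linear combination of an arbitrary
number of correlated estimates (Shin, Lee, & Choi, 2006). Suppose there are N local estimates,
$\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_N$, of an unknown n-dimensional random vector $x$. Then, the optimal linear estimate
of $x$ is

$$\hat{x} = \sum_{i=1}^{N} c_i \hat{x}_i, \tag{1}$$

where

$$\sum_{i=1}^{N} c_i = I_n = \mathrm{diag}(1, 1, \ldots, 1), \tag{2}$$

and the $c_i$ are $n \times n$ constant weighting matrices determined from the least squares criterion

$$(c_1, c_2, \ldots, c_N) = \arg\min_{\sum_i c_i = I_n} E\left[\left\| x - \sum_{i=1}^{N} c_i \hat{x}_i \right\|^2\right]. \tag{3}$$

Given two uncorrelated estimates $\hat{x}_1$ and $\hat{x}_2$, the generalized Millman formula reduces to

$$\hat{x} = P_2 (P_1 + P_2)^{-1} \hat{x}_1 + P_1 (P_1 + P_2)^{-1} \hat{x}_2, \tag{4}$$

where $\hat{x}$ is the optimal linear estimate and $P_k \equiv P_{kk} = E\left[(x - \hat{x}_k)(x - \hat{x}_k)^T\right]$ is the covariance
matrix of estimator k.
For the fusion of N uncorrelated estimates, the weighting matrices $(c_1, c_2, \ldots, c_N)$ take the
form

$$c_i = \left[ P_{ii} \sum_{j=1}^{N} P_{jj}^{-1} \right]^{-1}, \quad i = 1, \ldots, N, \tag{5}$$

and the general equations for the error covariance $P$ and the fused estimate $\hat{x}$ of N state estimates are:

$$P = \left( \sum_{i=1}^{N} P_i^{-1} \right)^{-1}, \tag{6}$$

$$\hat{x} = P \left( \sum_{i=1}^{N} P_i^{-1} \hat{x}_i \right). \tag{7}$$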
This solution is simple to implement, hence it is commonly referred to as the “simple convex
combination” technique in data fusion literature (Chong & Mori, 2001), (Ng, 2003).
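Equations (6) and (7) translate directly into a few lines of NumPy. The sketch below is a minimal illustration that assumes each estimate comes with a known error covariance (scalars are treated as 1 × 1 matrices). The speeds and variances in the example are hypothetical.

```python
import numpy as np

def simple_convex_combination(estimates, covariances):
    """Fuse N uncorrelated estimates using Eqs. (6) and (7).

    estimates:   list of state estimates x_i (scalars or length-n vectors)
    covariances: list of matching error covariances P_i (scalars or n x n)
    Returns the fused estimate and its error covariance.
    """
    inv_covs = [np.linalg.inv(np.atleast_2d(P)) for P in covariances]  # P_i^{-1}
    P_fused = np.linalg.inv(sum(inv_covs))                              # Eq. (6)
    info_sum = sum(Pi_inv @ np.atleast_1d(x) for Pi_inv, x in zip(inv_covs, estimates))
    x_fused = P_fused @ info_sum                                        # Eq. (7)
    return x_fused, P_fused

# Hypothetical loop detector (variance 25) and Bluetooth (variance 100) speeds in km/h.
x, P = simple_convex_combination([80.0, 70.0], [25.0, 100.0])
print(x, P)  # fused speed of about 78 km/h with a variance of about 20
```

The fused estimate lies closer to the lower-variance sensor, and its variance is smaller than that of either input.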
4.2 Bar-Shalom/Campo Combination

If the measurement noises are correlated, then the simple convex combination is not optimal;
rather the Bar-Shalom/Campo combination is the optimal solution to the generalized Millman
formula in this case.
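Given two correlated estimates $\hat{x}_1$ and $\hat{x}_2$, the generalized Millman formula reduces to

$$\hat{x} = (P_{22} - P_{21})(P_{11} + P_{22} - P_{12} - P_{21})^{-1} \hat{x}_1 + (P_{11} - P_{12})(P_{11} + P_{22} - P_{12} - P_{21})^{-1} \hat{x}_2, \tag{8}$$

where $\hat{x}$ is the optimal linear estimate, $P_k \equiv P_{kk} = E\left[(x - \hat{x}_k)(x - \hat{x}_k)^T\right]$ is the covariance
matrix of estimator k, and $P_{kl} = E\left[(x - \hat{x}_k)(x - \hat{x}_l)^T\right]$, $k \neq l$, is the cross-covariance matrix of
estimators k and l (Mitchell, 2007).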
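Chang, Saha, & Bar-Shalom (1997) expressed the Bar-Shalom/Campo combination for two
correlated estimates $\hat{x}_1$ and $\hat{x}_2$ more conveniently in terms of the error covariance and the fused estimate:

$$P = P_{11} - (P_{11} - P_{12})(P_{11} + P_{22} - P_{12} - P_{21})^{-1}(P_{11} - P_{21}), \tag{9}$$

$$\hat{x} = \hat{x}_1 + (P_{11} - P_{12})(P_{11} + P_{22} - P_{12} - P_{21})^{-1}(\hat{x}_2 - \hat{x}_1). \tag{10}$$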
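More recently, Huimin, Kirubarajan, & Bar-Shalom (2003) extended the Bar-Shalom/Campo
equations to the case of more than two estimators. For N correlated estimates, the error covariance
and the fused estimate are:

$$P = \left(\mathbf{I}^T \mathbf{P}^{-1} \mathbf{I}\right)^{-1}, \tag{11}$$

$$\hat{x} = \left(\mathbf{I}^T \mathbf{P}^{-1} \mathbf{I}\right)^{-1} \mathbf{I}^T \mathbf{P}^{-1} \hat{\mathbf{X}}, \tag{12}$$

where $I$ is an $n \times n$ identity matrix ($n$ is the length of each state vector), $\mathbf{I} = [I, I, \ldots, I]^T$ is an
$Nn \times n$ matrix, $\hat{\mathbf{X}} = [\hat{x}_1^T, \hat{x}_2^T, \ldots, \hat{x}_N^T]^T$, and $\mathbf{P}$ is the $Nn \times Nn$ covariance matrix taking the form:

$$\mathbf{P} = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1N} \\ P_{21} & P_{22} & \cdots & P_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ P_{N1} & P_{N2} & \cdots & P_{NN} \end{bmatrix}. \tag{13}$$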
The application of the Bar-Shalom/Campo formula is often referred to as “track-to-track” fusion,
where a track traditionally refers to the state vector of a target. The Bar-Shalom/Campo formula
is the optimal track-to-track fusion technique because it is a maximum likelihood (ML) estimator of
the state. It is also the least squares (LS) estimate, since LS and ML are equivalent in the
Gaussian case (Huimin et al., 2003). For calibration, the simple convex combination and the
Bar-Shalom/Campo combination require only the covariance matrix of the estimators, which can
easily be obtained from a training set of data that includes measurements and true values (i.e.
sensor measurements and “ground truth” knowledge).
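The NumPy sketch below implements Equations (11) and (12). It also shows how the joint covariance matrix of Equation (13) might be estimated from training errors (estimate minus ground truth). It is a minimal illustration that assumes zero-mean errors and enough training samples for the sample covariance to be well conditioned. All numbers, including the correlation between the two sensors' errors, are hypothetical.

```python
import numpy as np

def bar_shalom_campo(estimates, joint_cov):
    """Fuse N possibly correlated estimates of an n-vector (Eqs. 11-13).

    estimates: list of N estimates (each a scalar or length-n vector)
    joint_cov: (N*n x N*n) joint error covariance with blocks P_kl
    """
    X = np.concatenate([np.atleast_1d(np.asarray(x, dtype=float)) for x in estimates])
    N = len(estimates)
    n = X.size // N
    I_stack = np.vstack([np.eye(n)] * N)                   # [I, I, ..., I]^T, (N*n, n)
    P_inv = np.linalg.inv(np.atleast_2d(joint_cov))
    P_fused = np.linalg.inv(I_stack.T @ P_inv @ I_stack)   # Eq. (11)
    x_fused = P_fused @ I_stack.T @ P_inv @ X              # Eq. (12)
    return x_fused, P_fused

# Calibration on a hypothetical training set with ground-truth speeds.
rng = np.random.default_rng(0)
truth = rng.uniform(40.0, 100.0, size=500)
loop_est = truth + rng.normal(0.0, 5.0, size=500)
bt_est = truth + 0.4 * (loop_est - truth) + rng.normal(0.0, 9.0, size=500)  # correlated errors
errors = np.column_stack([loop_est - truth, bt_est - truth])
joint_cov = np.cov(errors, rowvar=False)   # sample estimate of Eq. (13)

x, P = bar_shalom_campo([80.0, 70.0], joint_cov)
print(joint_cov.round(1), x, P)
```

With two estimators and zero cross-covariance, this reduces to the simple convex combination. With two correlated estimators, it reproduces Equations (9) and (10).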
4.3 Measurement Fusion

Sequential Bayesian inference is the process of using Bayesian estimation for a dynamic system
which is changing in time. For most applications, the equations of sequential Bayesian filtering
are analytically intractable and approximate solutions must be used. For the special case of linear
Gaussian systems, the equations are tractable and a closed-form recursive solution for the
sequential Bayesian filter is available. This is the Kalman filter and because of its computational
efficiency, it quickly established itself as the favorite algorithm for sequential Bayesian inference
(Mitchell, 2007). The Kalman filter is a set of mathematical equations that provides an efficient
computational recursive means to estimate the state of a process, in a way that minimizes the
mean of the squared error (Welch & Bishop, 2006).
4.3.1 The Kalman Filter

Mathematically, the Kalman filter assumes a linear Gaussian process model:

$$x_k = A x_{k-1} + B u_{k-1} + w_{k-1}, \tag{14}$$

where $x_k$ is the system state, $A$ relates the state at the previous time step to the state at the current
time step, $B$ relates the optional control input $u$ at the previous time step to the state at the
current time step, and $w \sim N(0, Q)$ represents the process noise (where $Q$ is the variance of $w$).
The Kalman filter also assumes a linear Gaussian measurement model:

$$z_k = H x_k + v_k, \tag{15}$$

where $z_k$ is a measurement, $H$ relates the state to the measurement, and $v \sim N(0, R)$ represents
the measurement noise (where $R$ is the variance of $v$).
The Kalman filter operates in an ongoing discrete cycle as shown in Figure 4-1: The time update
projects the current state estimate ahead in time. The measurement update adjusts the projected
estimate by an actual measurement at that time.
Time Update (“Predict”) ⇄ Measurement Update (“Correct”)
Figure 4-1: The ongoing discrete Kalman filter cycle (Welch & Bishop, 2006)
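The equations for the time update (prediction) are:

$$\hat{x}_{k|k-1} = A \hat{x}_{k-1} + B u_{k-1}, \tag{16}$$

$$P_{k|k-1} = A P_{k-1} A^T + Q_{k-1}. \tag{17}$$

And the equations for the measurement update (correction) are:

$$\hat{x}_k = \hat{x}_{k|k-1} + K_k \left(z_k - H \hat{x}_{k|k-1}\right), \tag{18}$$

$$P_k = P_{k|k-1} - K_k H P_{k|k-1}, \tag{19}$$

where $K_k$ is the Kalman gain matrix:

$$K_k = P_{k|k-1} H^T \left(H P_{k|k-1} H^T + R\right)^{-1}. \tag{20}$$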
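A minimal NumPy sketch of one predict/correct cycle, following Equations (16) to (20), is given below. It assumes matrix-valued inputs throughout. The random walk example at the end ($A = H = 1$) and its noise values and readings are hypothetical.

```python
import numpy as np

def kf_predict(x, P, A, Q, B=None, u=None):
    """Time update (prediction), Eqs. (16) and (17)."""
    x_pred = A @ x
    if B is not None and u is not None:
        x_pred = x_pred + B @ u                  # optional control input
    P_pred = A @ P @ A.T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, z, H, R):
    """Measurement update (correction), Eqs. (18) to (20)."""
    S = H @ P_pred @ H.T + R                    # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)         # Kalman gain, Eq. (20)
    x = x_pred + K @ (z - H @ x_pred)           # Eq. (18)
    P = P_pred - K @ H @ P_pred                 # Eq. (19)
    return x, P

# Hypothetical random-walk speed model: A = H = 1, Q = 4, R = 25 (km/h)^2.
A = H = np.eye(1)
Q, R = np.array([[4.0]]), np.array([[25.0]])
x, P = np.array([80.0]), np.array([[100.0]])
for z in [78.0, 75.0, 70.0]:
    x, P = kf_update(*kf_predict(x, P, A, Q), np.array([z]), H, R)
    print(x, P)
```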
4.3.2 Multi-Sensor Multi-Temporal Data Fusion

The traditional Kalman filter as defined above is suitable for multi-temporal data fusion in which
a recursive filter is used to fuse together a sequence of measurements made using a single sensor
describing a system evolving over time. In order to use a Kalman filter for multi-sensor multi-
temporal data fusion, the algorithm must be extended to consider several sequences of
measurements being made by numerous sensors. Measurement fusion is a theoretically optimum
multi-sensor multi-temporal data fusion method (Mitchell, 2007).
The corresponding process and measurement equations (for two sensors $z^{(1)}$ and $z^{(2)}$) are:

$$x_k = A x_{k-1} + B u_{k-1} + w_{k-1}, \tag{21}$$

$$z_k^{(1)} = H^{(1)} x_k + v_k^{(1)}, \tag{22}$$

$$z_k^{(2)} = H^{(2)} x_k + v_k^{(2)}, \tag{23}$$

where $w \sim N(0, Q)$, $v^{(1)} \sim N(0, R^{(1)})$ and $v^{(2)} \sim N(0, R^{(2)})$ as before.
In measurement fusion, we place all the measurements $z_k^{(m)}$, $m \in \{1, 2, \ldots, M\}$, obtained at any
time step $k$ into a single augmented measurement vector:

$$\mathbf{z}_k = \left[ \left(z_k^{(1)}\right)^T, \left(z_k^{(2)}\right)^T, \ldots, \left(z_k^{(M)}\right)^T \right]^T. \tag{24}$$
We then estimate the state of the system using the Kalman filter equations as before, with only
the following modifications (shown here for two sensors):

$$\mathbf{H} = \begin{bmatrix} H^{(1)} \\ H^{(2)} \end{bmatrix}, \tag{25}$$

$$\mathbf{v}_k = \begin{bmatrix} v_k^{(1)} \\ v_k^{(2)} \end{bmatrix}, \tag{26}$$

$$\mathbf{R} = \begin{bmatrix} R^{(1)} & 0 \\ 0 & R^{(2)} \end{bmatrix}. \tag{27}$$

This process is illustrated in Figure 4-2. The formulation extends naturally to more than two
sensors.
Figure 4-2: The measurement fusion process for two measurement sequences. The
individual measurement sequences are placed in an augmented measurement sequence.
The augmented vector is then fused using a single KF (Mitchell, 2007)
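The augmentation in Equations (24) to (27) can be sketched as a single Kalman measurement update on stacked vectors. The example assumes two scalar speed sensors that measure the state directly ($H^{(1)} = H^{(2)} = 1$). The predicted state, variances, and readings are hypothetical.

```python
import numpy as np

def measurement_fusion_update(x_pred, P_pred, measurements, H_list, R_list):
    """One measurement-fusion update: stack all sensors, then apply Eqs. (18)-(20).

    measurements:   list of per-sensor measurement vectors z_k^(m)
    H_list, R_list: matching per-sensor observation and noise matrices
    """
    z = np.concatenate([np.atleast_1d(zm) for zm in measurements])   # Eq. (24)
    H = np.vstack(H_list)                                            # Eq. (25)
    # Block-diagonal augmented noise covariance, Eq. (27).
    sizes = [Rm.shape[0] for Rm in R_list]
    R = np.zeros((sum(sizes), sum(sizes)))
    offset = 0
    for Rm, size in zip(R_list, sizes):
        R[offset:offset + size, offset:offset + size] = Rm
        offset += size
    # Standard Kalman measurement update on the augmented quantities.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x = x_pred + K @ (z - H @ x_pred)
    P = P_pred - K @ H @ P_pred
    return x, P

# Hypothetical: predicted speed 80 km/h (variance 104), loop reads 78 (R = 25),
# Bluetooth reads 70 (R = 100).
x, P = measurement_fusion_update(np.array([80.0]), np.array([[104.0]]),
                                 [78.0, 70.0],
                                 [np.eye(1), np.eye(1)],
                                 [np.array([[25.0]]), np.array([[100.0]])])
print(x, P)
```

For this linear Gaussian case, the result equals an inverse-variance weighting of the prediction and both measurements. That is why the whole stack can be absorbed in one update.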
4.4 Single-Constraint-At-A-Time (SCAAT) Kalman filter

The measurement fusion application of the Kalman filter presented above involves collecting a
group of sensor measurements and solving a system of equations that work together to produce a
theoretically optimum solution. This method has one major disadvantage: it requires
measurements from all sensors to produce a fused estimate. In traffic monitoring, different
sensors provide measurements at different frequencies, and occasionally measurements are
unavailable from a certain sensor (see Figure 4-3). Therefore, a modified version of the Kalman
filter known as the Single-Constraint-At-A-Time (SCAAT) tracking method is useful for fusing traffic
data (Byon et al., 2010). This method uses the single most recent measurement from any available
sensor to update the state estimate based on the characteristics of the observed sensor (i.e. the
variance associated with that sensor) and the accumulated state estimate from the previous time
step. The formulation is identical to that of a regular multi-temporal Kalman filter as presented
earlier, except that the SCAAT Kalman filter reads the single most recent sensor measurement in the
measurement update step. In other words, the SCAAT Kalman filter acts as though only one sensor is
reporting measurements (i.e. a regular Kalman filter), but adjusts the measurement noise (i.e. the
variance associated with the sensor) to fuse different sensors. As a result, a complete matrix of
simultaneous speed measurements from all sensors is no longer required.
The measurement fusion Kalman filter and SCAAT Kalman filter require the definition of the
process and measurement models (Equations 14 and 15). As is commonly done, speed is treated
as a hidden state and the random walk process model ($A = 1$, with no control input) is used in this
research. Furthermore, since the sensors measure speed directly, no special measurement model is
required ($H = 1$). This leaves only the process noise variance ($Q$) to be determined, which can be
found by iteratively applying the filter to a training set until the optimal process noise is found.
Figure 4-3: Illustration of various sources of traffic monitoring (Byon et al., 2010)
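The SCAAT idea can be sketched for a scalar random-walk speed model. Each incoming measurement, from whichever sensor reported most recently, triggers one predict/correct cycle that uses that sensor's own noise variance. The sketch makes two assumptions. First, the process noise grows linearly with the time elapsed since the previous update (q per second), which is one common choice. Second, a simple grid search stands in for the iterative calibration of $Q$ described above. The sensor names, variances, readings, and the `tune_process_noise` helper are hypothetical.

```python
import numpy as np

def scaat_random_walk(observations, sensor_var, q, x0, p0, t0=0.0):
    """SCAAT-style scalar Kalman filter with a random-walk speed model.

    observations: time-ordered (time_s, sensor_name, speed) tuples
    sensor_var:   sensor_name -> measurement noise variance R
    q:            process noise variance per second (assumed time-scaled)
    Returns a list of (time_s, speed_estimate, estimate_variance).
    """
    x, p, t_prev = x0, p0, t0
    history = []
    for t, sensor, z in observations:
        p = p + q * (t - t_prev)          # time update: A = 1, no control input
        r = sensor_var[sensor]            # noise of the sensor that just reported
        k = p / (p + r)                   # scalar Kalman gain (H = 1)
        x = x + k * (z - x)               # measurement update
        p = (1.0 - k) * p
        t_prev = t
        history.append((t, x, p))
    return history

def tune_process_noise(observations, truth, sensor_var, x0, p0, candidates):
    """Hypothetical helper: pick the q with the lowest RMSE against ground truth."""
    truth = np.asarray(truth, dtype=float)

    def rmse(q):
        est = np.array([x for _, x, _ in scaat_random_walk(observations, sensor_var, q, x0, p0)])
        return float(np.sqrt(np.mean((est - truth) ** 2)))

    return min(candidates, key=rmse)

obs = [(20, "loop", 82.0), (40, "loop", 79.0), (55, "bluetooth", 71.0), (60, "loop", 76.0)]
variances = {"loop": 25.0, "bluetooth": 100.0}
truth = [81.0, 78.0, 75.0, 74.0]
q_best = tune_process_noise(obs, truth, variances, 80.0, 100.0, [0.01, 0.1, 1.0, 10.0])
for t, x, p in scaat_random_walk(obs, variances, q_best, 80.0, 100.0):
    print(t, round(x, 2), round(p, 2))
```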
4.5 Ordered Weighted Averaging (OWA)

The OWA operator is generally composed of the following three steps (Xu, 2005):
1. Reorder the input arguments in descending order.
2. Determine the weights associated with the OWA operator by using a proper method.
3. Utilize the OWA weights to aggregate these reordered arguments.
Formally, an OWA operator of dimension $n$ is a mapping, $OWA: \mathbb{R}^n \to \mathbb{R}$, that has an associated
$n$-vector $w = (w_1, w_2, \ldots, w_n)$, such that $w_j \in [0,1]$ and $\sum_{j=1}^{n} w_j = 1$:

$$OWA_w(a_1, a_2, \ldots, a_n) = \sum_{j=1}^{n} w_j b_j, \tag{28}$$

where $b_j$ is the $j$th largest element of the collection of the aggregated objects $a_1, a_2, \ldots, a_n$.
Central to this operator is the reordering of the arguments based upon their values. Note that an
argument $a_i$ is not associated with a particular weight $w_i$; rather, a weight $w_i$ is associated
with a particular ordered position of the arguments.
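Equation (28) and the rank-based assignment of weights can be illustrated with a short NumPy function. The sketch assumes the weights are supplied directly. The three argument values (hypothetical speed estimates) and the weights are invented for illustration.

```python
import numpy as np

def owa(arguments, weights):
    """Ordered weighted average, Eq. (28)."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or np.any(w > 1) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    b = np.sort(np.asarray(arguments, dtype=float))[::-1]  # b_j = j-th largest argument
    return float(b @ w)

# Hypothetical speed arguments: weights attach to ranks, not to particular sensors.
print(owa([70.0, 85.0, 78.0], [0.5, 0.3, 0.2]))
print(owa([85.0, 78.0, 70.0], [0.5, 0.3, 0.2]))  # same value: input order is irrelevant
```

Setting the weights to $[1, 0, 0]$ or $[0, 0, 1]$ recovers the Max and Min operators respectively.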
4.5.1 Orness
Since the OWA aggregation operator runs between Max (or) and Min (and), a measure called
“orness” has been defined to characterize the type of aggregation being performed for a
particular value of the weighting vector $W$ (Filev & Yager, 1994). The measure is defined as:

$$\mathrm{orness}(W) = \frac{1}{n-1} \sum_{j=1}^{n} (n - j)\, w_j. \tag{29}$$

This measure, which lies in the unit interval, characterizes the degree to which the aggregation is
like a Max (or) operation. It can be shown that $\mathrm{orness}([1\ 0\ \ldots\ 0]^T) = 1$,
$\mathrm{orness}([0\ 0\ \ldots\ 1]^T) = 0$ and $\mathrm{orness}([1/n\ 1/n\ \ldots\ 1/n]^T) = 0.5$. Therefore, the Max, Min, and
arithmetic mean operators can be regarded as OWA operators with degree of orness 1, 0 and 0.5
respectively.
4.5.2 Dispersion
A second measure known as “dispersion” is suggested for use in calculating how much of the
information in the arguments is used during an aggregation based on a weighting vector $W$
(Filev & Yager, 1994). The measure is defined as:

$$\mathrm{disp}(W) = -\sum_{j=1}^{n} w_j \ln w_j. \tag{30}$$
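Equations (29) and (30) can be checked numerically with the short NumPy sketch below. It assumes the usual entropy convention $0 \ln 0 = 0$ for zero weights. The example weighting vectors are the ones discussed above plus one hypothetical vector.

```python
import numpy as np

def orness(weights):
    """Degree of 'or-ness' of an OWA weighting vector, Eq. (29)."""
    w = np.asarray(weights, dtype=float)
    n = w.size
    j = np.arange(1, n + 1)                 # rank positions 1..n
    return float(np.sum((n - j) * w) / (n - 1))

def dispersion(weights):
    """Entropy-style dispersion of an OWA weighting vector, Eq. (30)."""
    w = np.asarray(weights, dtype=float)
    nonzero = w[w > 0]                      # convention: 0 * ln 0 = 0
    return float(-np.sum(nonzero * np.log(nonzero)))

for w in ([1, 0, 0], [0, 0, 1], [1 / 3, 1 / 3, 1 / 3], [0.5, 0.3, 0.2]):
    print(w, round(orness(w), 3), round(dispersion(w), 3))
```

Max and Min use only one argument, so their dispersion is zero. The arithmetic mean uses all arguments equally, so its dispersion reaches the maximum of $\ln n$.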
4.5.3 Learning OWA Operator Weights from Data

One important issue in the OWA operator is determining its associated weights. Filev & Yager (1994)
developed an algorithm that allows OWA operator weights to be learned from data, as described
briefly below. This algorithm was implemented for this research.
Assume there is a collection of $m$ observations, each comprising a tuple of $n$ arguments
$(a_{k1}, a_{k2}, \ldots, a_{kn})$ and an associated aggregated value, $d_k$. We denote the reordered objects of
the $k$th sample by $b_{k1}, b_{k2}, \ldots, b_{kn}$, where $b_{kj}$ is the $j$th largest element of the argument collection
$a_{k1}, a_{k2}, \ldots, a_{kn}$. Using these ordered arguments, we need to find a vector of OWA weights
$w = (w_1, w_2, \ldots, w_n)$ that satisfies the following condition as faithfully as possible:

$$b_{k1} w_1 + b_{k2} w_2 + \cdots + b_{kn} w_n = d_k, \quad k = 1, 2, \ldots, m. \tag{31}$$

The above condition is relaxed by looking for a vector of OWA weights that approximates the
aggregation operator by minimizing the instantaneous errors $e_k$ ($k = 1, 2, \ldots, m$):

$$e_k = \frac{1}{2}\left(b_{k1} w_1 + b_{k2} w_2 + \cdots + b_{kn} w_n - d_k\right)^2, \quad k = 1, 2, \ldots, m, \tag{32}$$

with the constraints that $w_i \in [0,1]$ and $\sum_{i=1}^{n} w_i = 1$.
To circumvent the constraints on $w_i$, an iterative learning procedure can be used. Let $\lambda_i$
($i = 1, 2, \ldots, n$) be $n$ parameters, and set the initial values $\lambda_i(0) = 0$ ($i = 1, 2, \ldots, n$). The
procedure at each iteration $l$ is then as follows:
1. Observe a new sample $k$ and compute the ordered arguments $b_{k1}, b_{k2}, \ldots, b_{kn}$.
2. Use the $\lambda_i(l)$ ($i = 1, 2, \ldots, n$) to provide a current estimate of the weights:
$$w_i(l) = \frac{\exp\left(\lambda_i(l)\right)}{\sum_{j=1}^{n} \exp\left(\lambda_j(l)\right)}, \quad i = 1, 2, \ldots, n. \tag{33}$$
3. Use the estimated weights along with the ordered arguments to get a calculated aggregated
value:
$$\hat{d}_k = b_{k1} w_1(l) + b_{k2} w_2(l) + \cdots + b_{kn} w_n(l). \tag{34}$$
4. Update the estimate of the $\lambda_i$:
$$\lambda_i(l+1) = \lambda_i(l) - \beta\, w_i(l)\left(b_{ki} - \hat{d}_k\right)\left(\hat{d}_k - d_k\right), \quad i = 1, 2, \ldots, n, \tag{35}$$
where $\beta$ denotes the learning rate ($0 \le \beta \le 1$).
The $\lambda_i$ parameters determining the OWA weights are thus updated by propagation of the
error $(\hat{d}_k - d_k)$ between the current estimated aggregated value and the actual aggregated value,
scaled by two factors: the current OWA weight $w_i$, and the difference $(b_{ki} - \hat{d}_k)$ between the
$i$th ordered argument $b_{ki}$ and the current estimated aggregated value $\hat{d}_k$.
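The four-step procedure can be sketched in NumPy as follows. The sketch makes three assumptions. First, it makes repeated passes (epochs) over a fixed training set. Second, arguments are scaled to [0, 1] so that a learning rate in [0, 1] stays stable. Third, the softmax in Equation (33) is computed after subtracting the largest $\lambda_i$, which leaves the weights unchanged but avoids overflow. The training data are generated from hypothetical known weights.

```python
import numpy as np

def learn_owa_weights(samples, targets, beta=0.5, epochs=100):
    """Learn OWA weights with the iterative procedure of Eqs. (33)-(35).

    samples: (m, n) array of argument tuples a_k; targets: length-m values d_k.
    """
    samples = np.asarray(samples, dtype=float)
    targets = np.asarray(targets, dtype=float)
    lam = np.zeros(samples.shape[1])                       # lambda_i(0) = 0
    for _ in range(epochs):                                # repeated passes (assumption)
        for a_k, d_k in zip(samples, targets):
            b = np.sort(a_k)[::-1]                         # step 1: ordered arguments
            w = np.exp(lam - lam.max())
            w /= w.sum()                                   # step 2: Eq. (33)
            d_hat = b @ w                                  # step 3: Eq. (34)
            lam -= beta * w * (b - d_hat) * (d_hat - d_k)  # step 4: Eq. (35)
    w = np.exp(lam - lam.max())
    return w / w.sum()

# Hypothetical training data generated from known weights, arguments in [0, 1].
rng = np.random.default_rng(1)
true_w = np.array([0.5, 0.3, 0.2])
X = rng.uniform(0.0, 1.0, size=(200, 3))
d = np.sort(X, axis=1)[:, ::-1] @ true_w
learned = learn_owa_weights(X, d)
print(learned.round(3))
```

The softmax parameterization keeps every weight in [0, 1] with a unit sum at every iteration. This is how the procedure avoids handling the constraints explicitly.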
4.6 Fuzzy Integrals

Let $X = \{x_1, \ldots, x_n\}$ represent the set of criteria being aggregated, and $\mathcal{P}(X)$ represent the power
set of $X$, i.e. the set of all subsets of $X$. A fuzzy measure on the set of criteria $X$ is a set function
$\mu: \mathcal{P}(X) \to [0,1]$ satisfying the following axioms:
i. $\mu(\emptyset) = 0$, $\mu(X) = 1$
ii. $A \subseteq B \subseteq X$ implies $\mu(A) \le \mu(B)$
$\mu(A)$ can be viewed as the weight of importance of the set $A$. Therefore, in addition to the
weights defined on each criterion, weights for each combination of criteria are also defined.
A fuzzy measure is said to be additive if $\mu(A \cup B) = \mu(A) + \mu(B)$ whenever $A \cap B = \emptyset$,
super-additive if $\mu(A \cup B) \ge \mu(A) + \mu(B)$ whenever $A \cap B = \emptyset$, and sub-additive if
$\mu(A \cup B) \le \mu(A) + \mu(B)$ whenever $A \cap B = \emptyset$. Note that if a fuzzy measure is additive, then it suffices to
define the $n$ coefficients (weights) $\mu(\{x_1\}), \ldots, \mu(\{x_n\})$ to entirely define the measure.
Otherwise, one needs to define the $2^n$ coefficients corresponding to the power set of $X$
(Bouchon-Meunier, 1998).
• A measure which is sub-additive for two criteria i and j expresses a weakening
dependency between these criteria, in the sense that i and j are redundant. That is, the
satisfaction of one criterion more or less entails the satisfaction of the other.
Consequently, the weight of importance attached to {i, j} is lower than the sum of the
individual weights (Grabisch et al., 1995).
• A measure which is super-additive for two criteria i and j expresses a strengthening
dependency between them, in the sense that criteria i and j support each other. That is,
although satisfaction of i or j taken individually is not considered very important, the
simultaneous satisfaction of these criteria is considered very important. Consequently, the
weight of importance attached to {i, j} is greater than the sum of the individual weights
(Grabisch et al., 1995).
Below, the discrete fuzzy integrals are introduced in the view of aggregation operators, and are
therefore defined using a connective-like notation instead of the usual integral form.
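4.6.1 The Sugeno Fuzzy Integral
Let $\mu$ be a fuzzy measure on $X$ (as before). The discrete Sugeno integral of $a_1, \ldots, a_n$ with respect
to $\mu$ is defined by ($\wedge$ and $\vee$ denote min and max respectively):

$$S_\mu(a_1, \ldots, a_n) = \bigvee_{i=1}^{n} \left( a_{(i)} \wedge \mu\left(A_{(i)}\right) \right), \tag{36}$$

where the subscript $(i)$, $i = 1, \ldots, n$, indicates that the indices have been permuted so that
$a_{(1)} \le a_{(2)} \le \cdots \le a_{(n)}$, and $A_{(i)} = \{x_{(i)}, \ldots, x_{(n)}\}$.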
4.6.2 The Choquet Fuzzy Integral
Let $\mu$ be a fuzzy measure on $X$ (as before). The discrete Choquet integral of $a_1, \ldots, a_n$ with
respect to $\mu$ is defined by:

$$C_\mu(a_1, \ldots, a_n) = \sum_{i=1}^{n} \left( a_{(i)} - a_{(i-1)} \right) \mu\left(A_{(i)}\right), \tag{37}$$

with the same notation as above, and $a_{(0)} = 0$.
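Both integrals can be sketched in plain Python. The sketch assumes a fuzzy measure stored as a dictionary that maps frozensets of criterion indices to values in [0, 1]. The two-criterion measure below is hypothetical.

```python
def _ascending_order(scores):
    """Indices permuted so that a_(1) <= a_(2) <= ... <= a_(n)."""
    return sorted(range(len(scores)), key=lambda i: scores[i])

def choquet(scores, mu):
    """Discrete Choquet integral, Eq. (37)."""
    order = _ascending_order(scores)
    total, previous = 0.0, 0.0                   # a_(0) = 0
    for pos, i in enumerate(order):
        A_i = frozenset(order[pos:])             # A_(i) = {x_(i), ..., x_(n)}
        total += (scores[i] - previous) * mu[A_i]
        previous = scores[i]
    return total

def sugeno(scores, mu):
    """Discrete Sugeno integral, Eq. (36): max over i of min(a_(i), mu(A_(i)))."""
    order = _ascending_order(scores)
    return max(min(scores[i], mu[frozenset(order[pos:])]) for pos, i in enumerate(order))

# Hypothetical fuzzy measure on two criteria: 0 = loop detector, 1 = Bluetooth.
mu = {frozenset(): 0.0, frozenset({0}): 0.3, frozenset({1}): 0.5, frozenset({0, 1}): 1.0}
scores = [0.6, 0.8]
print(choquet(scores, mu), sugeno(scores, mu))   # approximately 0.7 and 0.6
```

Because $\mu(\{0\}) + \mu(\{1\}) = 0.8 < \mu(\{0, 1\}) = 1$, this hypothetical measure is super-additive, meaning the two criteria support each other. With an additive measure, the Choquet integral reduces to a weighted arithmetic mean.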
4.6.3 Fuzzy Integrals as Aggregation Operators

Discrete fuzzy integrals can be viewed as aggregation operators or functions mapping an input
space to an output space. Neural networks can also be viewed as mappings from an input space
to an output space, but there are at least two important differences between a neural net and a
fuzzy integral as noted by Grabisch et al. (1995):
i. The number of coefficients defining a neural net is undetermined a priori, since the
number of hidden layers and hidden neurons can be adjusted freely. For fuzzy integrals,
the number of coefficients is at most $2^n - 2$.
ii. In the present state of the art, the meaning of synaptic weights cannot be determined,
nor can the properties of the weights be related to the properties of the network;
neural nets are black boxes. Fuzzy integrals, however, offer more transparency, since
meaning can be attached to the coefficients of a fuzzy measure. Some of these measures
include: importance index (Shapley value), interaction index, overlap coefficient, degree
of overlap, and the necessity coefficient (see Bouchon-Meunier (1998) and Grabisch et
al. (1995) for formal definitions).
Both the Sugeno and the Choquet integrals compute a kind of distorted average. However, they
are essentially different in nature, since the former is based on nonlinear operators (min and max)
and the latter on usual linear operators. Together, the Sugeno and the Choquet integral contain all
of the order statistics, including the min, max and median (Figure 4-4). Furthermore, the Choquet
integral encompasses both the weighted arithmetic sum and the OWA operators, which have
been said to be “orthogonal” in an intuitive sense; thus, the Choquet integral has strong
expressive power since it can arbitrarily mix these two kinds of operators. It has been said that
the Choquet integral is suitable for cardinal aggregation (where numbers have a real meaning),
while the Sugeno integral seems to be more suitable for ordinal aggregation (where only order
makes sense) (Bouchon-Meunier, 1998).
Figure 4-4: Set relations between various aggregation operators and fuzzy integrals
(Grabisch, 1996)
The main interest of fuzzy integrals lies in the fact that they can represent interaction between
criteria. This is due to the fact that a weight of importance is attributed to every subset of criteria.
A simple example illustrating what is meant by interaction, and how it can be modeled by fuzzy
integrals, is found in Grabisch (1996). However, the richness of fuzzy integrals comes at the cost
of model complexity, since the number of coefficients involved in the fuzzy integral
model grows exponentially with the number of criteria to be aggregated. For example, when the
number of criteria to be aggregated is 3, the number of fuzzy coefficients required is 8 ($2^3$).
Adding one additional criterion ($n = 4$) raises the number of fuzzy coefficients required to 16 ($2^4$).
The main difficulty is identifying all these coefficients, either from learning data, from
questionnaires, or both.
4.6.4 Identification of Fuzzy Measures based on Learning Data
There are $2^n$ coefficients describing the fuzzy measure required in a fuzzy integral. Thus, the
problem of identifying these coefficients is far more complex than for the OWA operators.
However, similar to the OWA operator, it is possible to identify the best fuzzy measure to use for
fuzzy integral aggregation based on learning data.
Suppose that $(a_k, y_k)$, $k = 1, \ldots, l$, are learning data, where $a_k = [a_{k1}, \ldots, a_{kn}]^T$ is an $n$-dimensional
input vector containing the partial scores of object $k$ with respect to criteria $1, 2, \ldots, n$, and $y_k$ is the
fused score of object $k$. Then, one can try to identify the best fuzzy measure $\mu$ so that the squared
error criterion of the Choquet fuzzy integral ($C_\mu$) is minimized:

$$E^2 = \sum_{k=1}^{l} \left( C_\mu(a_{k1}, a_{k2}, \ldots, a_{kn}) - y_k \right)^2. \tag{38}$$

It can be shown (Grabisch et al., 1995) that the above equation can be put in the form of a
quadratic program:

minimize: $u^T D u + c^T u$
under the constraint: $A u + b \ge 0$

where $u$ is a $(2^n - 2)$-dimensional vector containing all the coefficients of the fuzzy measure $\mu$
(except $\mu(\emptyset)$ and $\mu(X)$, which are fixed), $D$ is a $(2^n - 2)$-dimensional square matrix, $c$ is a
$(2^n - 2)$-dimensional vector, $A$ is an $n(2^{n-1} - 1) \times (2^n - 2)$ matrix, and $b$ is an
$n(2^{n-1} - 1)$-dimensional vector. The general form of these vectors and matrices is not easily
expressed; interested readers are referred to “Fundamentals of Uncertainty Calculi with
Applications to Fuzzy Inference” (Grabisch et al., 1995). This program has a unique solution but
is subject to a few flaws. If there are too few learning data, the matrices may be ill-conditioned.
Furthermore, the matrix $A$ is sparse and becomes sparser as $n$ grows, causing poor
behaviour in optimization routines. For all these reasons, including memory problems and time
of convergence, Grabisch (1996) suggests that the solution given by the quadratic program is not
always reliable in practical situations. Nonetheless, this algorithm is popular and was
implemented for this research.
4.7 Artificial Neural Networks

Neural networks are composed of simple elements operating in parallel. These elements are
inspired by biological nervous systems. As in nature, the connections between elements largely
determine the network function. You can train a neural network to perform a particular function
by adjusting the values of the connections (weights) between elements, as shown in Figure 4-5.
Neural networks have been trained to perform complex functions in various fields, including
pattern recognition, identification, classification, speech, vision, and control systems (Beale,
Hagan, & Demuth, 2010).
Figure 4-5: Typically, neural networks are adjusted, or trained, so that a particular input
leads to a specific target output (Beale et al., 2010)
4.7.1 Neuron Architecture

A neural network is composed of individual neurons which transform an input to an output as
shown in Figure 4-6.
Figure 4-6: A neuron with a single scalar input and a scalar bias (Beale et al., 2010)
Referring to Figure 4-6, the scalar input p is transmitted through a connection that multiplies its
strength by the scalar weight w to form the product wp, again a scalar. Then, the bias b is added to
the product wp as shown by the summing junction, and the result n = wp + b becomes the argument of
the transfer function f, which produces the scalar output a. Here f is a transfer function, typically
a step function or a sigmoid function, that takes the argument n and produces the output a. Examples
of various transfer functions are shown in Figure 4-7. Note that w and b are both adjustable scalar
parameters of the neuron. The central idea of neural networks is that such parameters can be
adjusted so that the network exhibits some desired behavior. Thus, you can train the network to
do a particular job by adjusting the weight or bias parameters (Beale et al., 2010).
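The single-input neuron of Figure 4-6 computes $a = f(wp + b)$. The plain-Python sketch below evaluates it with the three transfer functions of Figure 4-7. The function names mirror those used in MATLAB's Neural Network Toolbox, but this is independent Python code. The hard-limit function follows the usual convention of returning 1 when $n \ge 0$. The input, weight, and bias values are hypothetical.

```python
import math

def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0."""
    return 1.0 if n >= 0 else 0.0

def purelin(n):
    """Linear transfer function."""
    return n

def logsig(n):
    """Log-sigmoid transfer function, squashing n into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def neuron(p, w, b, f):
    """Single-input neuron: a = f(w*p + b)."""
    n = w * p + b          # weighted input plus bias (net input)
    return f(n)

for f in (hardlim, purelin, logsig):
    print(f.__name__, neuron(p=2.0, w=0.5, b=-1.0, f=f))
```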
Figure 4-7: Three of the most commonly used functions: a) hard-limit transfer function, b)
linear transfer function, c) sigmoid transfer function (Beale et al., 2010)
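To make the neuron of Figure 4-6 concrete, the following Python sketch computes a = f(wp + b) for the three transfer functions of Figure 4-7. The names hardlim, purelin, and logsig follow the MATLAB toolbox naming convention, but here they are small reimplementations written for illustration, not toolbox calls, and the numeric inputs are arbitrary.

```python
import math


def hardlim(n: float) -> float:
    """Hard-limit transfer function: 1 if n >= 0, otherwise 0."""
    return 1.0 if n >= 0 else 0.0


def purelin(n: float) -> float:
    """Linear transfer function: the output equals the net input."""
    return n


def logsig(n: float) -> float:
    """Log-sigmoid transfer function, squashing n into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))


def neuron(p: float, w: float, b: float, f) -> float:
    """Single-input neuron: weighted input plus bias, passed through f."""
    n = w * p + b  # net input from the summing junction
    return f(n)


if __name__ == "__main__":
    p, w, b = 2.0, 0.5, -1.5  # illustrative values, giving n = -0.5
    for f in (hardlim, purelin, logsig):
        print(f.__name__, neuron(p, w, b, f))
```

Training, discussed below, amounts to adjusting w and b so that outputs like these move toward target values.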
4.7.2 Layer Architecture

Two or more of the neurons shown earlier can be combined in a layer as shown in Figure 4-8.
Figure 4-8: A one-layer network with R input elements and S neurons (Beale et al., 2010)
4.7.3 Network Architecture

Finally, a network can have several layers as shown in Figure 4-9.
Figure 4-9: A network can have several layers. Each layer has a weight matrix W, a bias
vector b, and an output vector a (Beale et al., 2010)
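A minimal Python/NumPy sketch of the layered computation in Figure 4-9 follows: each layer applies its weight matrix W and bias vector b to the previous layer's output and passes the result through its transfer function. The tanh hidden layer, the linear output layer, and all weights shown are illustrative assumptions, not the trained networks used in this research.

```python
import numpy as np


def tansig(n):
    """Hyperbolic tangent sigmoid transfer function."""
    return np.tanh(n)


def purelin(n):
    """Linear transfer function."""
    return n


def forward(p, layers):
    """Propagate input vector p through a list of (W, b, f) layers.

    Each layer computes a = f(W @ a_prev + b), so the output vector of one
    layer becomes the input vector of the next.
    """
    a = np.asarray(p, dtype=float)
    for W, b, f in layers:
        a = f(W @ a + b)
    return a


# Hypothetical 3-input, 2-hidden-neuron, 1-output network; weights are arbitrary.
W1 = np.array([[0.2, -0.1, 0.4],
               [0.0, 0.3, -0.2]])
b1 = np.array([0.1, -0.1])
W2 = np.array([[1.0, -1.0]])
b2 = np.array([0.5])
layers = [(W1, b1, tansig), (W2, b2, purelin)]

if __name__ == "__main__":
    print(forward([1.0, 2.0, 3.0], layers))
```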
4.7.4 Neural Network Training – Backpropagation Algorithm

As in the case of the OWA operator and Fuzzy Integrals, artificial neural networks can be trained
so that a particular input leads to a specific output. The training process requires a set of
examples of proper network behavior (inputs and target outputs). The general process is to
minimize the mean squared error by iteratively adjusting the weights and biases of the network
during training. This technique is called backpropagation, which involves performing
computations backward through the network, using the gradient of the mean squared error to
adjust the weights to minimize mean squared error. The backpropagation computation is derived
using the chain rule of calculus. While there are many variations of the backpropagation
algorithm, the simplest implementation updates the network weights and biases in the direction
in which the mean squared error decreases the most rapidly. A general iteration of the algorithm
can be written as:
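$$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k \mathbf{g}_k \qquad (39)$$

where $\mathbf{x}_k$ is a vector of current weights and biases, $\mathbf{g}_k$ is the current gradient (the slope of the mean squared error with respect to the weights and biases), and $\alpha_k$ is the learning rate.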
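The update rule in Eq. (39) can be illustrated with a short steepest-descent backpropagation sketch in Python/NumPy for a network with one tanh hidden layer and a linear output. This is a sketch under stated assumptions: the learning rate, initialization, epoch count, and synthetic data are hypothetical, and it uses plain gradient descent with a fixed learning rate, whereas the MATLAB newfit defaults used in this research rely on a more sophisticated training function (Levenberg-Marquardt, trainlm).

```python
import numpy as np


def train_steepest_descent(P, T, n_hidden, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer network by steepest-descent backpropagation.

    P: inputs of shape (R, Q), R input elements by Q training examples.
    T: targets of shape (1, Q).
    Hidden layer: tanh; output layer: linear. Returns the parameters and the
    mean squared error recorded at each epoch (before that epoch's update).
    """
    rng = np.random.default_rng(seed)
    R, Q = P.shape
    W1 = rng.normal(scale=0.5, size=(n_hidden, R))
    b1 = np.zeros((n_hidden, 1))
    W2 = rng.normal(scale=0.5, size=(1, n_hidden))
    b2 = np.zeros((1, 1))
    history = []
    for _ in range(epochs):
        # Forward pass
        A1 = np.tanh(W1 @ P + b1)
        A2 = W2 @ A1 + b2
        E = A2 - T
        history.append(float(np.mean(E ** 2)))
        # Backward pass: chain rule from the MSE back to each parameter
        dA2 = 2.0 * E / Q
        gW2 = dA2 @ A1.T
        gb2 = dA2.sum(axis=1, keepdims=True)
        dN1 = (W2.T @ dA2) * (1.0 - A1 ** 2)  # derivative of tanh
        gW1 = dN1 @ P.T
        gb1 = dN1.sum(axis=1, keepdims=True)
        # Eq. (39): x_{k+1} = x_k - alpha * g_k, with a fixed learning rate
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    P = rng.uniform(-1.0, 1.0, size=(2, 50))      # synthetic inputs
    T = (0.5 * P[0] - 0.3 * P[1]).reshape(1, -1)  # synthetic targets
    _, mse = train_steepest_descent(P, T, n_hidden=3)
    print(f"MSE: {mse[0]:.4f} -> {mse[-1]:.4f}")
```

Because every gradient is computed before any parameter is changed, the hidden-layer gradient uses the same W2 as the forward pass, which is what the chain rule requires.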
This research applied a multi-layer feedforward neural network with backpropagation training,
with one input layer, one hidden layer, and one output layer. The number of neurons in the input
layer is fixed by the number of inputs (the number of loop detectors plus one neuron for the
Bluetooth estimate in the first architecture, and two neurons in the second architecture). The
output layer has one neuron: the fused estimate of all sensors. The number of neurons in the
hidden layer is variable. The optimal number of hidden neurons was determined in each instance
by applying the network to a training set and varying the number of neurons in the hidden layer,
over the range of 1 to 10, until the best network was found. The neural network was implemented
and trained using the default settings of the MATLAB function “newfit”; see Beale et al. (2010)
for details.
4.8 Fusion Architectures

Any one of the fusion techniques described earlier can then be implemented in a distributed
fusion architecture. Two variations of a distributed architecture (Section 2.4.3) are implemented in
MATLAB to fuse probe vehicle estimates and loop detector estimates for traffic speed
estimation. The fundamental difference lies in how loop detectors are treated, and should provide
some insight into whether or not loop detectors should be fused together before the main fusion
occurs, or if each loop detector should be sent to the central fusion processor separately.
4.8.1 A Competitive Distributed Data Fusion Architecture

In the first architecture, each sensor is treated as an independent sensor assumed to be measuring
the same quantity on the link, namely average traffic speed. There are a few notable
consequences of this architecture. First, since each loop detector enters the fusion node as a
separate estimator, the loop detector data input into the fusion node far outweigh the Bluetooth
data since the Bluetooth stations bounding the link are treated as only one sensor. This could be
advantageous or disadvantageous to the fusion node. On one hand, the fusion node receives more
raw data and is given a greater opportunity to find the relationship between these estimators. On
the other hand, the fusion node is bombarded with a far larger volume of less accurate data, as probe vehicle
estimates are generally more accurate than loop detector readings but less numerous. Figure 4-10
shows this architecture, where the competitive fusion node is one of the seven techniques
described in the previous sections.
[Figure 4-10 diagram: Loop Detector 1 through Loop Detector n and Probe Vehicles (Bluetooth), each passing through pre-processing, feed a single Competitive Fusion Algorithm]
Figure 4-10: Competitive data fusion architecture (“Architecture 1”)
4.8.2 A Cooperative and Competitive Distributed Data Fusion Architecture

In the second architecture, it is assumed that loop detectors are only measuring the average speed
at their location, and can therefore be fused together in a cooperative fashion before being
competitively fused with probe vehicle data. As is commonly done in freeway estimation
(Berkow et al., 2009), the midpoint method is used to aggregate loop detector measurements at
the cooperative fusion node. In the midpoint method, each detector speed measurement is
extrapolated in space halfway to the upstream detector and halfway to the downstream detector
to determine the weight each detector should receive in a weighted sum. In this way, each
detector is given a weight linearly proportional to the amount of freeway it covers relative to the
other loop detectors. Then, the central competitive fusion node uses one of the seven algorithms
discussed earlier to fuse the joint loop detector estimate with the Bluetooth estimate. This fusion
architecture is shown in Figure 4-11. Clearly, one of the disadvantages of this architecture is that
in the absence of Bluetooth probe vehicle estimates, many of the algorithms at the final fusion
node will simply output the midpoint average of loop detectors that was sent into the node. That
is, many of the aforementioned data fusion techniques require more than one estimate to perform
any sort of processing. On the other hand, when Bluetooth probe vehicle estimates are available,
the central fusion node receives one Bluetooth estimate and one joint loop detector estimate. This
might prove beneficial since the central fusion node is not flooded with individual loop detector
estimates. For example, if there are five loop detectors between two consecutive Bluetooth
stations on a stretch of freeway, the estimates enter the central fusion node of the first
architecture in a 5:1 (loop detectors : Bluetooth) ratio but retain a 1:1 ratio in the second
architecture. Note the input ratio of the second architecture is fixed at 1:1 regardless of the
number of loop detectors. Lastly, the SCAAT Kalman filter is not compatible with this
architecture since the cooperative fusion node needs estimates from all sensors, thereby requiring
a predefined interval at which to aggregate its readings, defeating the purpose of the SCAAT
Kalman filter in the first place. Therefore, this combination is not considered in the analysis.
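The midpoint weighting and the two input layouts can be sketched in Python as follows. The detector spacing in the example is Link 2 as drawn in Figure 5-1 (364.7 m, 560.9 m, and 69.2 m between consecutive sensors); the speeds are hypothetical, and letting the first and last detectors extend to the Bluetooth stations at the link ends is an assumption, since the text only specifies the halfway rule between neighboring detectors.

```python
def midpoint_weights(positions, link_start, link_end):
    """Midpoint-method weights for the loop detectors on one link.

    positions: detector locations (m) along the link, in travel order.
    Each detector covers the road halfway to its neighbors; the first and
    last detectors are assumed to extend to the link boundaries.
    Weights are coverage lengths divided by the link length.
    """
    if any(b <= a for a, b in zip(positions, positions[1:])):
        raise ValueError("positions must be strictly increasing")
    if not positions or positions[0] < link_start or positions[-1] > link_end:
        raise ValueError("detectors must lie on the link")
    cuts = [(a + b) / 2.0 for a, b in zip(positions, positions[1:])]
    bounds = [link_start] + cuts + [link_end]
    length = link_end - link_start
    return [(hi - lo) / length for lo, hi in zip(bounds, bounds[1:])]


def architecture_inputs(loop_speeds, bt_speed, weights):
    """Inputs to the central fusion node for the two architectures.

    Architecture 1: every loop detector plus the Bluetooth estimate.
    Architecture 2: one midpoint-averaged loop estimate plus Bluetooth.
    """
    joint = sum(w * v for w, v in zip(weights, loop_speeds))
    return list(loop_speeds) + [bt_speed], [joint, bt_speed]


if __name__ == "__main__":
    # Link 2: detectors at 364.7 m and 925.6 m from station 108000; link 994.8 m.
    weights = midpoint_weights([364.7, 925.6], 0.0, 994.8)
    arch1, arch2 = architecture_inputs([28.0, 20.0], 24.0, weights)  # m/s
    print(weights, arch1, arch2)
```

With five detectors the first list would have six entries while the second always has two, which is the 5:1 versus 1:1 input ratio described above.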

[Figure 4-11 diagram: Loop Detector 1 through Loop Detector n, each pre-processed, feed a Cooperative Fusion Algorithm; its joint estimate and the pre-processed Probe Vehicles (Bluetooth) estimate feed a Competitive Fusion Algorithm]
Figure 4-11: Cooperative and competitive data fusion architecture (“Architecture 2”)
4.9 Measures of Effectiveness

Various measures have been used in the literature to evaluate data fusion algorithms, as shown in
Table 4-1. Perhaps the most intuitive measure of effectiveness is mean absolute error (MAE).
The MAE measures the average magnitude of the errors in a set of estimates, without
considering their direction. The MAE is a linear score which means that all the individual
differences are weighted equally in the average. A more common measure is the root of mean
squared error (RMSE). The RMSE is a quadratic scoring rule which measures the average
magnitude of the error. Since the errors are squared before they are averaged, the RMSE gives a
relatively high weight to large errors. This means the RMSE is most useful when large errors are
particularly undesirable. The other measures listed in Table 4-1 are useful for more specific
purposes. For example, ME can be used to determine whether or not the estimator has a bias.
Relative measures such as MRE and MARE can be used to determine if the estimator’s accuracy
is affected by different operating domains. Overall, the RMSE is widely used because of its
desirable properties and will be used in this study as the measure of comparison between data
fusion techniques.
Table 4-1: Conventional measures of effectiveness for the evaluation of estimation error
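Name | Formula
Mean Error (ME) | $\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{v}_i - v_i\right)$ (40)
Mean Absolute Error (MAE) | $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{v}_i - v_i\right|$ (41)
Mean Relative Error (MRE) | $\mathrm{MRE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\hat{v}_i - v_i}{v_i}$ (42)
Mean Absolute Relative Error (MARE) | $\mathrm{MARE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{\hat{v}_i - v_i}{v_i}\right|$ (43)
Mean Squared Error (MSE) | $\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{v}_i - v_i\right)^2$ (44)
Root of Mean Squared Error (RMSE) | $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{v}_i - v_i\right)^2}$ (45)

where $N$ is the number of measurements, $\hat{v}_i$ is the estimated link average speed, and $v_i$ is the true link average speed.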
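For reference, the measures in Table 4-1 translate directly into code. This Python sketch (the function name is hypothetical) assumes non-zero true speeds for the relative measures; the example values are made up to show how ME can stay small while MAE and RMSE do not, because positive and negative errors cancel in ME.

```python
import math


def error_measures(est, actual):
    """Measures of effectiveness from Table 4-1.

    est: estimated link average speeds; actual: true link average speeds.
    """
    if len(est) != len(actual) or not est:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(est)
    diff = [e - a for e, a in zip(est, actual)]
    rel = [d / a for d, a in zip(diff, actual)]  # assumes non-zero true speeds
    mse = sum(d * d for d in diff) / n
    return {
        "ME": sum(diff) / n,                      # Eq. (40)
        "MAE": sum(abs(d) for d in diff) / n,     # Eq. (41)
        "MRE": sum(rel) / n,                      # Eq. (42)
        "MARE": sum(abs(r) for r in rel) / n,     # Eq. (43)
        "MSE": mse,                               # Eq. (44)
        "RMSE": math.sqrt(mse),                   # Eq. (45)
    }


if __name__ == "__main__":
    # Hypothetical speeds (m/s): errors of +2, -2 and +1
    print(error_measures([22.0, 18.0, 31.0], [20.0, 20.0, 30.0]))
```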
Chapter 5
Highway 400 Simulation Case Study
“Essentially, all models are wrong, but some are useful.”
– George E. P. Box
5.1 Highway 400

A 5 km stretch of Highway 400 was chosen as a suitable test bed for an investigation of data
fusion techniques. The details of the 17 sensors located along this section are provided in Table
5-1. Note that loop detectors are denoted VDS (Vehicle Detector Station). Figure 5-1 provides a
scaled drawing of the 4 consecutive test links that can be constructed from this group of sensors.
Table 5-1: Highway 400 sensor details
Type | Name | Address | Latitude | Longitude
Bluetooth Reader | 108020 | N. of Steeles Ave W | 43.7732 | -79.5347
VDS | 400DN0100DSS | S. of Steeles Ave W | 43.77163 | -79.5349
VDS | 400DN0090DSS | S. of Steeles Ave W | 43.76541 | -79.5335
Bluetooth Reader | 108000 | N. of Finch Ave W | 43.76408 | -79.5325
VDS | 400DN0080DSS | N. of Finch Ave W | 43.7608 | -79.5324
VDS | 400DN0070DSS | At Finch Ave W | 43.75583 | -79.5311
Bluetooth Reader | 108160 | At Finch Ave W | 43.75527 | -79.5304
VDS | 400DN0061DSS | S. of Finch Ave W | 43.75344 | -79.5306
Bluetooth Reader | 108140 | N. of Sheppard Ave W | 43.74667 | -79.5284
VDS | 400DN0040DSS | N. of Sheppard Ave W | 43.73863 | -79.5271
VDS | 400DN0030DSS | S. of Sheppard Ave W | 43.73349 | -79.5259
Bluetooth Reader | 108240 | N. of Hwy 401 | 43.7276 | -79.5239
VDS | 400DN0020DSS | S. of Sheppard Ave W | 43.72741 | -79.5245
[Figure 5-1 schematic: Link 1, North of Steeles Ave W to North of Finch Ave W (Bluetooth 108020 to 108000; VDS 400DN0100DSS, 400DN0090DSS); Link 2, North of Finch Ave W to Finch Ave W (108000 to 108160; 400DN0080DSS, 400DN0070DSS); Link 3, Finch Ave W to North of Sheppard Ave W (108160 to 108140; 400DN0061DSS, 400DN0060DSS, 400DN0059DSS, 400DN0058DSS); Link 4, North of Sheppard Ave W to North of Hwy 401 (108140 to 108240; 400DN0050DS, 400DN0040DSS, 400DN0030DSS, 400DN0020DSS)]
Figure 5-1: Highway 400 sensor schematic (distances shown in meters – drawn to scale)
As can be seen from the schematic above, this stretch provides closely spaced Bluetooth
stations (approximately 1 km apart for the first three links, and approximately 2 km for the fourth link)
and at least two loop detectors between each pair of Bluetooth stations. This study uses data
from the southbound AM peak hour, as traffic is expected to break down during this time since
Highway 400 is a major link for commuters entering the city of Toronto in the morning. This
allows the analysis to cover both uncongested (closer to link 1) and congested (closer to link 4)
freeway traffic conditions.
5.2 Traffic Microsimulation in Paramics

Traffic microsimulation is the process of creating a virtual model of a city's transportation
infrastructure in order to simulate the interactions of road traffic, and other forms of
transportation, in microscopic detail. Traffic microsimulation computer models capture the
interactions of real world road traffic through a series of simple algorithms describing car
following, lane changing, gap acceptance, and spatial collision detection (Quadstone Paramics,
2010).
Microsimulation allows for robust experimentation for traffic studies in general, and for data
fusion research in particular. By keeping track of all vehicle speeds throughout the
simulation horizon, it is as if we have a perfectly accurate GPS unit in each and every vehicle in
the simulation. For the case of data fusion experimentation, this allows for the calculation of
“ground truth” traffic conditions with absolute certainty: the ground truth average link
speed includes every vehicle which has traversed a road section during the given time
interval. Furthermore, we can deploy sensors into the microsimulation and use data fusion to
estimate the “ground truth” conditions and compare the estimate to the actual traffic conditions.
Conducting this sort of experimentation in the real world would be nearly impossible, as every
vehicle on the freeway would need to have a highly accurate GPS unit to establish ground truth
conditions.
5.2.1 Monitoring of Bluetooth Devices in Paramics

To simulate the effect of probe vehicles being captured by the Bluetooth traffic monitoring
system described earlier, the Bluetooth locations need to be marked in the microsimulation
model. As vehicles with a Bluetooth device pass the markers, a simulated MAC address is
stored, and once the MAC address is matched at a successive Bluetooth station, the travel time
and average speed can be computed for data fusion purposes. Unfortunately, detections do not
necessarily occur directly at the location where the vehicle passes by the Bluetooth station.
Rather, there is a range in the roadway section where the detection might take place, based on the
expected coverage range of a Bluetooth station. This concept is illustrated in Figure 5-2. Based
on Figure 5-2, the detection range (DR) for a given lane l can be computed as follows:

$$\mathrm{DR}_l = \pm\sqrt{100^2 - d_l^2} \qquad (46)$$

where $d_l$ is the perpendicular distance (m) from the centerline of lane l to the Bluetooth station, and 100 m is the expected coverage range of the station.
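[Figure 5-2 diagram: 100 m coverage radius around the station; for a lane at d_l = 50 m, the detection range runs from R- = -86.6 m to R+ = 86.6 m]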
Figure 5-2: Bluetooth detection coverage projected onto a road lane
This suggests that detections may occur before or after the actual Bluetooth station, with a
maximum range of nearly 100 meters if the Bluetooth station was directly adjacent to the lane.
The distribution of detection locations is unknown and for the purposes of this study is assumed
to be triangularly distributed with a minimum and maximum value defined by the equation
above, and the modal value at the Bluetooth station itself. The triangular distribution is typically
used when there is limited or no data and is based on knowledge of the minimum and maximum
values and an "inspired guess" as to the modal value. Given these qualities, this situation is well
suited for the application of the triangular distribution. It should be noted that even if the
distribution were not centered on the Bluetooth station itself, any bias would largely cancel out,
since each probe vehicle measurement is based on the time difference between two successive
detections, and a similar offset at both stations drops out of that difference.
Once a location is generated, it can be used to calculate an adjusted timestamp for the MAC
address, as if it had been detected at that location. Let $t_s$ denote the time at which a vehicle
actually passes a Bluetooth station and let $t_d$ denote the time the Bluetooth device was detected.
Then

$$t_d = t_s + \frac{w}{v} \qquad (47)$$

where $w$ is the distance (m) from the Bluetooth station to the detection location (positive when the
detection location is ahead of the station, negative when the detection location is behind), and $v$
is the vehicle speed (m/s). The offset $w$ can be generated by transforming a uniformly distributed random variable
to a triangularly distributed random variable:

$$
w =
\begin{cases}
a + \sqrt{U (b - a)(c - a)}, & 0 < U < F(c) \\
b - \sqrt{(1 - U)(b - a)(b - c)}, & F(c) \le U < 1
\end{cases}
\qquad (48)
$$

where $a$ is the minimum distance ($\min \mathrm{DR}_l$ of Eqn. 46), $b$ is the maximum distance
($\max \mathrm{DR}_l$ of Eqn. 46), $c$ is the modal value (zero), $F(c) = (c - a)/(b - a)$ is the cumulative
distribution function evaluated at the mode, and $U$ is a random variable drawn from the
uniform distribution on the interval (0, 1).
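The simulated detection process of Eqs. (46) to (48) can be sketched in Python as below: compute the detection range for the lane, draw the offset w with the inverse-transform rule of Eq. (48), and shift the station passage time by w/v as in Eq. (47). The function names are hypothetical; the 100 m radius and zero mode follow the text, and speed is assumed to be in m/s. Python's standard library also provides random.triangular(low, high, mode), which could serve as a cross-check of the explicit transform.

```python
import math
import random


def detection_range(d_lane, radius=100.0):
    """Eq. (46): half-length (m) of the lane inside the coverage circle."""
    if not 0.0 <= d_lane < radius:
        raise ValueError("lane must lie inside the coverage radius")
    return math.sqrt(radius ** 2 - d_lane ** 2)


def triangular_offset(u, a, b, c=0.0):
    """Eq. (48): map u ~ U(0, 1) to a triangular variate on [a, b] with mode c."""
    fc = (c - a) / (b - a)  # CDF value at the mode
    if u < fc:
        return a + math.sqrt(u * (b - a) * (c - a))
    return b - math.sqrt((1.0 - u) * (b - a) * (b - c))


def detection_timestamp(t_station, d_lane, speed_ms, rng=random):
    """Eq. (47): shift the station passage time by the sampled offset w."""
    r = detection_range(d_lane)
    w = triangular_offset(rng.random(), -r, r)  # mode at the station (c = 0)
    return t_station + w / speed_ms, w


if __name__ == "__main__":
    rng = random.Random(42)
    # Lane 50 m from the station, vehicle at 25 m/s, passes the station at t = 100 s
    print(detection_timestamp(100.0, 50.0, 25.0, rng=rng))
```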
The device discovery time is the amount of time a vehicle spends in the detection coverage area,
which depends on the speed and lane of the vehicle. It can be calculated as follows:

$$\mathrm{DDT}_l = \frac{2\sqrt{100^2 - d_l^2}}{v_{\mathrm{km/h}} \left(\frac{1000\ \mathrm{m}}{1\ \mathrm{km}}\right)\left(\frac{1\ \mathrm{hr}}{3600\ \mathrm{s}}\right)} \qquad (49)$$

where $d_l$ is the perpendicular distance (m) from the centerline of lane l to the Bluetooth station,
and $v_{\mathrm{km/h}}$ is the speed (km/hr) of the vehicle. Theoretical device discovery times are shown in
Figure 5-3.
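Eq. (49) is easy to tabulate; the Python sketch below (function name hypothetical, 100 m coverage radius as in the text) reproduces the kind of grid plotted in Figure 5-3.

```python
import math


def discovery_time(d_lane, speed_kmh, radius=100.0):
    """Eq. (49): seconds a vehicle spends inside the station's coverage circle.

    d_lane: perpendicular distance (m) from lane centerline to the station.
    speed_kmh: vehicle speed (km/hr). radius: coverage range (m).
    """
    if not 0.0 <= d_lane < radius or speed_kmh <= 0.0:
        raise ValueError("lane must be inside coverage and speed positive")
    chord = 2.0 * math.sqrt(radius ** 2 - d_lane ** 2)  # lane length covered (m)
    speed_ms = speed_kmh * 1000.0 / 3600.0              # km/hr -> m/s
    return chord / speed_ms


if __name__ == "__main__":
    for d in (0, 20, 40, 60):
        row = [discovery_time(d, v) for v in (60, 80, 100, 120)]
        print(d, " ".join(f"{t:5.1f}" for t in row))
```

The corner values are 12 s (lane at the station, 60 km/hr) and 4.8 s (lane 60 m away, 120 km/hr), consistent with the roughly 5 to 12 second range read from the surface.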
[Surface plot: time for device discovery (s), roughly 5 to 12 s, versus vehicle speed (60 to 120 km/hr) and distance from Bluetooth station (0 to 60 m)]
Figure 5-3: Theoretical Bluetooth device discovery times

As can be seen from the surface above, the time window for device discovery on a freeway could
theoretically range from 5 to 12 seconds. In the best case scenario, the lane is close to the Bluetooth
station and the vehicle is moving slowly. In the worst case scenario, the lane is approximately 60
meters away (12 lanes of traffic) from the Bluetooth station and the vehicle is moving very fast.
An analysis by Roorda et al. (2009) showed that devices are often missed at stations, as
determined by devices that were detected at two stations but missed by a station in
between. There could be a number of reasons for this phenomenon, such as router
saturation or errors in the communication between the devices. Yet in theory, Bluetooth
technology is capable of capturing all vehicles within these device discovery windows.
For example, research by Peterson, Baldwin, & Kharoufeh (2004) and Peterson,
Baldwin, & Kharoufeh (2006) shows that a single inquirer will locate 99 percent of all scanning
devices within transmission range in 5.12 seconds, assuming both devices are available to
communicate. Using version 1.2 of the Bluetooth specification, they show that the inquiry time can be reduced to 3.84
seconds and 1.28 seconds using the standard and interlaced inquiry scan modes, respectively. It
should be noted that these findings are for ideal conditions that included a number of restrictive
assumptions. Nonetheless, as technology progresses, the communication protocols should evolve
and the performance of the system is expected to improve. For these reasons, all Bluetooth
devices passing stations were assumed to have been captured at a randomly generated location
around the Bluetooth station as previously discussed. While this assumption may be considered
optimistic given current technological capabilities, it is likely to become a reality with better
wireless protocols in the future.
The Bluetooth traffic monitoring system’s ability to accurately estimate freeway traffic speeds
will obviously depend on the number of vehicles it captures, which is a function of the number of
vehicles that have a Bluetooth enabled device, and the probability that the device will be detected
at two consecutive Bluetooth stations. By assuming a fixed 100% detection rate (i.e. all vehicles
with Bluetooth are detected), the proportion of vehicles which carry a Bluetooth-enabled device
is the proportion of vehicles captured. For example, if 1% of vehicles in the microsimulation
model are generated with a Bluetooth device, then 1% of traffic is being captured as probe
vehicles. Since this variable can be adjusted, the microsimulation framework can be used to
assess how data fusion might perform with present day conditions (few probe vehicles) and what
sort of improvement might result from an increased proportion of vehicles carrying Bluetooth
devices.
It should be noted that this research could be generalized beyond fusing loop detectors with the
Bluetooth traffic monitoring system. The analysis is fundamentally fusing probe vehicles and
loop detectors, where the probe vehicle measurements happen to be acquired by the Bluetooth system.
Therefore, the results of fusing loop detectors with Bluetooth probe vehicles can generally be
extended to fusing loop detectors with any system for probe vehicle data collection (traffic
cameras, cellular telephones, other wireless technologies, etc.).
5.2.2 Installation of Loop Detectors in Paramics

Installing loop detectors in Paramics is relatively straightforward. Unlike Bluetooth stations, these sensors
are common in transportation networks and are therefore built into the software suite. A user can
simply identify a location on a link where a loop detector is desired and use the built-in functions
to create the object at that location. The loop detector then simulates a real-world loop detector
by capturing traditional measures such as speed, flow, and occupancy.
5.3 5 x 2 Cross Validation

The 5 x 2 cross validation technique was first suggested by Dietterich (1998). In this test, five
replications of twofold cross-validation are performed. In each replication, the available data are
partitioned into two equal sized sets, S1 and S2. Each learning algorithm is trained on each set
and tested on the other set. More formally, 5 x 2 cross validation can be described as:
$$
\begin{aligned}
T_1 &= S_1^{(1)}, & V_1 &= S_1^{(2)} \\
T_2 &= S_1^{(2)}, & V_2 &= S_1^{(1)} \\
T_3 &= S_2^{(1)}, & V_3 &= S_2^{(2)} \\
T_4 &= S_2^{(2)}, & V_4 &= S_2^{(1)} \\
& \;\;\vdots \\
T_9 &= S_5^{(1)}, & V_9 &= S_5^{(2)} \\
T_{10} &= S_5^{(2)}, & V_{10} &= S_5^{(1)}
\end{aligned}
$$

where $V_i$ is the validation set, $T_i$ is the training set, and $S_j$ is the data set in the $j$-th
replication, divided into two equal sets $S_j^{(1)}$ and $S_j^{(2)}$. Once 5 x 2 cross validation is performed,
Dietterich (1998) suggested performing a t test. Alpaydın (1999) later proposed a more robust combined F
test that has lower type I error and higher power than the t test.
We denote $p_i^{(j)}$ as the difference between the error rates of two algorithms on fold $j = 1, 2$ of
replication $i = 1, \ldots, 5$. The average on replication $i$ is $\bar{p}_i = (p_i^{(1)} + p_i^{(2)})/2$ and the estimated
variance is $s_i^2 = (p_i^{(1)} - \bar{p}_i)^2 + (p_i^{(2)} - \bar{p}_i)^2$. Then it can be shown (Alpaydın, 1999) that

$$f = \frac{\sum_{i=1}^{5}\sum_{j=1}^{2}\left(p_i^{(j)}\right)^2}{2\sum_{i=1}^{5} s_i^2} \qquad (50)$$

is approximately F distributed with 10 and 5 degrees of freedom. Therefore, for example, we
reject the null hypothesis that the algorithms have the same error rate at the 95% confidence
level if the statistic f is greater than 4.74. In this research, each of the ten folds is one
separate AM peak hour simulation. Therefore, two AM peak hour simulations constitute one
replication. In other words, each method is first trained on the output of one simulation, and then
tested on another. This process is reversed, and conducted over five different pairs of two
different AM peak hour simulations (ten different AM peak hour simulations in total).
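The fold structure and the combined F test of Eq. (50) can be sketched in Python as follows. The function names are hypothetical, the per-fold error differences (for example, the RMSE of one method minus that of another) are made-up inputs, and the 4.74 threshold is the F(10, 5) critical value quoted above.

```python
def five_by_two_splits(sim_ids):
    """Pair ten simulations into five replications of twofold CV.

    Replication i uses simulations 2i and 2i+1; each fold is (train, test).
    """
    if len(sim_ids) != 10:
        raise ValueError("expected ten simulations")
    splits = []
    for i in range(5):
        a, b = sim_ids[2 * i], sim_ids[2 * i + 1]
        splits.append([(a, b), (b, a)])
    return splits


def combined_5x2cv_f(p):
    """Combined 5x2 cv F statistic of Eq. (50).

    p[i][j]: difference in error between two methods on fold j of replication i.
    """
    if len(p) != 5 or any(len(row) != 2 for row in p):
        raise ValueError("expected a 5 x 2 table of differences")
    numerator = sum(x * x for row in p for x in row)
    s2_total = 0.0
    for p1, p2 in p:
        mean = (p1 + p2) / 2.0
        s2_total += (p1 - mean) ** 2 + (p2 - mean) ** 2
    if s2_total == 0.0:
        raise ValueError("zero variance: statistic undefined")
    return numerator / (2.0 * s2_total)


F_CRIT_95 = 4.74  # F(10, 5) critical value at the 95% level, as quoted in the text

if __name__ == "__main__":
    print(five_by_two_splits(list(range(10)))[0])
    diffs = [(0.5, 0.3)] * 5  # hypothetical RMSE differences (m/s)
    f = combined_5x2cv_f(diffs)
    print(f"f = {f:.2f}, reject H0: {f > F_CRIT_95}")
```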
5.4 Data Fusion Results

Each of the four links depicted earlier was simulated ten times for ten different Bluetooth
percentages ranging from 0 to 40%, leading to a total of 100 simulation runs. Then, each of the
seven data fusion techniques described earlier was evaluated for each Bluetooth scenario using
the ten respective simulations in the 5 x 2 cross validation scheme. The next subsections present
these results graphically, while Appendix A presents them in a tabular format with the results of
the statistical significance tests.
5.4.1 North of Steeles Ave W to North of Finch Ave W

Figure 5-4 depicts the average traffic speed of the first link. The legend displays the sensors,
where “LD” denotes a loop detector with a 12-character identification code, “BT” denotes the
two Bluetooth stations found at the start and end of the link, each having a six-digit identification
number, and “GPS” denotes the average speed of all vehicles travelling on the link during the
given time interval (i.e. the ground truth conditions). This graph only represents one simulation,
or one “day in the life” of this highway link and shows Bluetooth estimates when the percent of
Bluetooth equipped vehicles is 5%. Note that the upstream detector (400DN0090DSS) clearly
shows bias when compared with the average link speed as it measures speed directly after the on
ramp to the link where the road ahead is clear. The extent to which this detector is useful for data
fusion is questionable, as it neither indicates the proper magnitude of the quantity of interest, nor
does it follow the underlying temporal variation of the traffic pattern over the simulation horizon.
[Plot: Hwy 400 - North of Steeles Ave W to North of Finch Ave W; speed (m/s) versus simulation time (7:30 AM to 9:00 AM); series: LD: 400DN0100DSS, LD: 400DN0090DSS, BT: 108020-108000, GPS (Ground Truth)]
Figure 5-4: A typical simulation of Highway 400 – Link 1
Figure 5-5 shows average root mean squared error (RMSE) for each method under each
Bluetooth percentage scenario for the first architecture. The dotted line (red) denotes the error if
only loop detectors are used with the midpoint method, where each detector speed measurement
is extrapolated in space halfway to the upstream detector and halfway to the downstream detector
(commonly used in freeway travel time estimation). The dashed line (blue) denotes the error if
only Bluetooth probe vehicle estimates are used. Moving away from the y-axis represents an
increase in probe vehicle reports coming from the Bluetooth traffic monitoring system, and
accordingly a decrease in error from Bluetooth estimates. Note that in this case, all of the fusion
methods perform well, particularly in the range of 0 to 10% probe vehicles, where significant
improvements in accuracy over using either loop detectors or Bluetooth independently are
obtained for most methods. Fusing loop detectors by themselves (i.e., at the y-axis of
Figure 5-5, where no probe vehicles are available) using any data fusion method outperforms the
traditional midpoint method by a large margin. This suggests that data fusion is beneficial even on conventional freeways
equipped with only loop detectors. Essentially, fusion techniques can find the relationship
between loop detector speeds and actual average link speeds more accurately than a midpoint
average. Unfortunately, the OWA operator and associated learning algorithm perform poorly,
with an accuracy that is statistically worse than using Bluetooth independently of loop detectors.
Overall, all other methods provide statistically significant improvements with lower quantities of
probe vehicles (0 to 10%), and observable differences with higher quantities (>10%), though
there is not enough evidence to reject the null hypothesis in these cases.
[Plot: RMSE (m/s) versus probe vehicles (%), 0 to 40; series: Probe Vehicles, Loop Detectors, Simple Convex, Bar-Shalom/Campo, Measurement Fusion, SCAAT Kalman Filter, OWA, Fuzzy Integral, Neural Network]
Figure 5-5: Error as a function of probe vehicle sample size (Link 1, Architecture 1)
Similar to Figure 5-5, Figure 5-6 shows the average root mean squared error (RMSE) for each
method under each Bluetooth percentage scenario for the second architecture. Note one of the
major drawbacks of the second architecture by observing the y-axis: when no probe vehicle
estimates enter the central fusion node (i.e., 0% probe vehicles), the only estimate entering the
central fusion node is the midpoint average of loop detectors. For many algorithms, this does not
allow the central fusion node to do any further processing. In fact, only neural networks and the
measurement fusion Kalman filter can actually further modify this single input. All other
algorithms require at least two estimates to do any sort of processing. Nonetheless, improvements
similar to those of the first architecture, though mostly smaller, are realized in the second
architecture, only this time the OWA operator does not perform worse than using Bluetooth estimates independently.
Note that neural networks perform significantly better with small probe vehicle sample sizes. In
general, the first architecture performs better on this link.
[Figure 5-6: RMSE (m/s) versus probe vehicles (%), 0 to 40, for Link 1, Architecture 2; series: Probe Vehicles, Loop Detectors, Simple Convex, Bar-Shalom/Campo, Measurement Fusion, OWA, Fuzzy Integral, Neural Network]
5.4.2 North of Finch Ave W to Finch Ave W

Figure 5-7 depicts the average traffic speed of the second link in the same fashion as the first link
was presented earlier. In this case, one of the loop detectors tends to overestimate average link
speed, while the other tends to underestimate it. On balance, these loop detector biases should
cancel out and provide reasonable estimates of traffic conditions on this link. Indeed, one of the
most obvious benefits of data fusion in general is that the errors of different sensors might cancel
one another out. Note that the Bluetooth probe vehicle measurements show no bias as they are made
from traversing the entire road link. However, the Bluetooth measurements are still not precise,
because of measurement error associated with Bluetooth (as discussed earlier) and because the
reports collected come from only a fraction (5% in the case shown) of all vehicles travelling on the
freeway. More accurate Bluetooth estimates would be seen in a scenario with a larger percentage
of Bluetooth-equipped vehicles.
[Figure 5-7: Hwy 400 - North of Finch Ave W to Finch Ave W; speed (m/s) versus simulation time (7:30 AM to 9:00 AM); series: LD: 400DN0080DSS, LD: 400DN0070DSS, BT: 108000-108160, GPS (Ground Truth)]
Figure 5-8 and Figure 5-9 show the results of the data fusion methods on the second link for the
first and second architectures, respectively. In at least some of the scenarios, every method
realizes improvements that are statistically significant compared to using either loop detectors or
Bluetooth data alone. While all of the fusion methods perform well, they again show the greatest
improvement in accuracy when small numbers of probe vehicles are available. However, several
methods (simple convex combination, Bar-Shalom/Campo combination, and the measurement fusion
Kalman filter) have small improvements that remain statistically significant all the way up to the
scenario where 40% of traffic is used as probe vehicles. These improvements are marginal, but
give confidence that the methods perform well even in conditions where one estimator is
significantly more accurate than its competitors. In the first architecture, every result is either
statistically better than both loop detectors and Bluetooth used independently, or at least better
than one of the two. Not a single result is statistically less accurate than the single most accurate
sensor used independently (i.e., there is never a loss of accuracy).
In the second architecture, fuzzy integrals and the OWA operator fail to capture the underlying
relationship between sensors, and perform poorly throughout the scenarios where large numbers
of probe vehicles are available.
[Figure 5-8: RMSE (m/s) versus probe vehicles (%), 0 to 40, for Link 2, Architecture 1; series: Probe Vehicles, Loop Detectors, Simple Convex, Bar-Shalom/Campo, Measurement Fusion, SCAAT Kalman Filter, OWA, Fuzzy Integral, Neural Network]
[Figure 5-9: RMSE (m/s) versus probe vehicles (%), 0 to 40, for Link 2, Architecture 2; series: Probe Vehicles, Loop Detectors, Simple Convex, Bar-Shalom/Campo, Measurement Fusion, OWA, Fuzzy Integral, Neural Network]
5.4.3 Finch Ave W to North of Sheppard Ave W

Figure 5-10 depicts the average traffic speed on the third link. Unlike the previous two cases, this
link has four loop detectors. Although these detectors seem to provide reasonably good
estimates, they are bunched together in one region of the link (refer back
to Figure 5-1). It is therefore questionable whether these estimates should really be
treated as different measurements of traffic speed on the link. This case motivates
the second architecture proposed earlier, in which loop detectors are aggregated
first by the midpoint method before being sent to the central fusion node. The underlying
idea is that it might be more beneficial to treat these four measurements as one
aggregated estimate and to fuse it with the Bluetooth data afterwards at a one-to-one ratio. On the
other hand, the data fusion algorithm at the central node of the first architecture might
automatically calibrate itself to account for the inherent redundancy in these four measurements,
making the four-to-one input ratio into the central fusion node irrelevant. Fuzzy integrals, for
example, can explicitly account for this sort of redundancy.
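The midpoint method referred to above is defined earlier in the thesis; the following is a minimal Python sketch assuming the common formulation in which each detector's spot speed is applied to the segment bounded by the midpoints to its neighboring detectors (the first and last detectors extending to the link ends), and the link speed is the link length divided by the summed segment travel times. The function name and arguments are hypothetical. Under this assumption, detectors bunched at one end of a link, as on this third link, leave the outermost detector responsible for most of the link length.

```python
from typing import Sequence


def midpoint_link_speed(link_start_m: float, link_end_m: float,
                        detector_pos_m: Sequence[float],
                        detector_speed_mps: Sequence[float]) -> float:
    """Average link speed from point detectors under the midpoint assumption."""
    if not detector_pos_m or len(detector_pos_m) != len(detector_speed_mps):
        raise ValueError("need one speed per detector position")
    pairs = sorted(zip(detector_pos_m, detector_speed_mps))
    positions = [pos for pos, _ in pairs]
    travel_time_s = 0.0
    for i, (pos, speed) in enumerate(pairs):
        # Segment governed by this detector: midpoint to midpoint (or the link ends).
        seg_start = link_start_m if i == 0 else (positions[i - 1] + pos) / 2
        seg_end = link_end_m if i == len(pairs) - 1 else (pos + positions[i + 1]) / 2
        travel_time_s += (seg_end - seg_start) / speed
    # Space-mean speed over the link: total length divided by total travel time.
    return (link_end_m - link_start_m) / travel_time_s


print(midpoint_link_speed(0, 2000, [500, 1500], [30.0, 10.0]))
```

For example, with detectors at 500 m and 1,500 m on a 2,000 m link reporting 30 m/s and 10 m/s, the sketch returns 15 m/s (a length-weighted harmonic mean), not the 20 m/s arithmetic mean.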
[Figure 5-10: Hwy 400 - Finch Ave W to North of Sheppard Ave W. Speed (m/s) vs. simulation time (7:30 AM to 9:00 AM) for loop detectors 400DN0058DSS to 400DN0061DSS, Bluetooth pair 108160-108140, and GPS (ground truth)]
Figure 5-11 and Figure 5-12 show the results of the data fusion methods on the third link for the
first and second architectures, respectively. The results on this link reflect several patterns
seen previously. As before, the simple convex combination, the Bar-Shalom/Campo
combination and both Kalman filters perform best, and almost all of their improvements are
statistically significant across all scenarios. The neural network also performs quite well in this
case, although its results are not statistically different from those of the loop detectors at low
levels of probe vehicles, which indicates a higher variance of its estimates across the ten folds. This
suggests that its average accuracy is on par with the other methods but that it may be less reliable,
deviating substantially from this average accuracy at certain times. Lastly, the fuzzy integral and the
OWA operator perform badly, although their results are not statistically different from those of the loop
detectors or Bluetooth in many cases; this suggests that some folds went well while others did not,
producing a large variance in the average root mean squared error. Once again, the second
architecture yields smaller improvements than the first. Note the high average
RMSE (4.99 m/s) of the neural network with 1% probe vehicles in the second
architecture: a single fold produced an RMSE of 42.6 m/s, while all the others were below 1.02
m/s. Hence the average RMSE of the neural network is not statistically different from that of either
single-source estimate, owing to the high variance across the folds.
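The effect of the single outlying fold can be made explicit. Assuming the reported average is the arithmetic mean of the ten per-fold RMSE values, and labeling the outlying fold as fold 1 for convenience, the remaining nine folds must average roughly:

$$
\bar{e} = \frac{1}{10}\left(42.6 + \sum_{i=2}^{10} e_i\right) = 4.99
\;\Rightarrow\;
\frac{1}{9}\sum_{i=2}^{10} e_i = \frac{49.9 - 42.6}{9} \approx 0.81\ \text{m/s}
$$

This is consistent with all other folds lying below 1.02 m/s: one failed fold accounts for most of the average and greatly inflates the fold-to-fold variance used in the significance tests.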
[Figure 5-11: RMSE (m/s) of the data fusion methods vs. percentage of probe vehicles (0 to 40%) on the third link, first architecture]
[Figure 5-12: RMSE (m/s) of the data fusion methods vs. percentage of probe vehicles (0 to 40%) on the third link, second architecture]
5.4.4 North of Sheppard Ave W to North of Hwy 401

Figure 5-13 depicts the average traffic speed on the fourth link. Like the third link, this link
has four loop detectors. Unlike the preceding freeway stretch, however, these loop detectors are
spaced more evenly over the entire link, suggesting that their estimates are less redundant and could
provide more meaningful data. As expected, traffic breaks down significantly over this stretch as
the morning rush hour progresses. Note that the two loop detectors upstream of the congested
region do not register the declining average link speed. Instead, they continue to indicate free-flow
traffic speeds throughout the AM peak hour. The other two loop detectors capture the
deteriorating traffic conditions reasonably well. This situation best illustrates the fact that loop
detectors are truly representative only of traffic conditions at their own location. Furthermore, if
traffic conditions do not change linearly between successive loop detectors, loop detectors might be
unsuitable for estimating freeway traffic speeds in turbulent traffic conditions.
[Figure 5-13: Hwy 400 - North of Sheppard Ave W to North of Hwy 401. Speed (m/s) vs. simulation time (7:30 AM to 9:00 AM) for loop detectors 400DN0020DSS to 400DN0050DSS, Bluetooth pair 108140-108240, and GPS (ground truth)]
Figure 5-14 and Figure 5-15 show the results of the data fusion methods on the fourth link for the
first and second architectures, respectively. These results are quite different from those seen
before, likely because of the traffic conditions on this link. In the first architecture,
loop detectors prove to be essentially the worst method for determining average traffic speed, as
their error curve remains at the top of Figure 5-14. Bluetooth probe vehicles follow their usual error curve,
suggesting that their accuracy is relatively insensitive to traffic conditions. The data fusion methods
yield some significant improvements, again at lower levels of probe vehicles.
More interesting is that the neural networks perform irregularly and inconsistently. Although their
average root mean squared error across all ten folds is higher than that of Bluetooth probe vehicles,
none of these differences is significant. This is because the neural networks continue to perform well
on certain folds and not on others, resulting in high variation in the average. In reality,
traffic breaks down in some simulations and not in others. This suggests that neural networks can
capture a traffic pattern well, as long as that pattern has been observed in training. For
example, if a neural network is trained on a simulation in which traffic is not congested, it becomes
confused when tested on another simulation in which traffic is congested. Overall, the second
architecture yields no significant improvements and performs worse than the first in most cases.
[Figure 5-14: RMSE (m/s) of the data fusion methods vs. percentage of probe vehicles (0 to 40%) on the fourth link, first architecture]
[Figure 5-15: RMSE (m/s) of the data fusion methods vs. percentage of probe vehicles (0 to 40%) on the fourth link, second architecture]
5.5 Summary of Key Findings

Across all four links studied on Highway 400, there are some common recurring patterns:
• Loop detectors aggregated by the midpoint average are usually the worst estimator of
freeway traffic speeds.
• Probe vehicle estimates become more accurate as the probe vehicle sample size increases.
Even with small sample sizes (1 to 5%), probe vehicles are usually more accurate than
loop detectors. Additionally, their error curve appears insensitive to traffic conditions.
• Most fusion techniques produce statistically significant improvements over using loop
detectors or Bluetooth data independently, particularly when there are few probe vehicle
measurements.
• Fusion techniques can be used without probe vehicle data to simply fuse loop detectors
together. Fusion methods will usually outperform the conventional midpoint method.
• The three most consistently well performing algorithms are the simple convex
combination, the Bar-Shalom/Campo combination and the measurement fusion
formulation of the Kalman filter. The difference between these is marginal.
• The OWA operator performed poorly in multiple instances. This is not surprising,
considering that its weights are associated with a particular rank of a measurement and
not with a particular estimator. In sensor fusion, the fusion algorithm needs to
understand sensor-specific properties, which is difficult if the measurements are
always rearranged in descending order. As a result, the OWA operator often cannot find a relationship
between the order of measurements and their significance, as there is no such inherent
relationship.
• Architecture 1 outperforms Architecture 2. This suggests that fusion algorithms benefit from
having more raw data available, so that they can learn the complex relationships
between sensors. It also suggests that although the central fusion node receives between
two and four times more data from loop detectors, this does not cloud the fusion node's
judgment, as it is able to learn that Bluetooth measurements should remain heavily
weighted. This means that architecture 1 can easily be expanded in the future by adding other
estimators, without concern for the ratio of input data sources.
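The two static combination rules named in the list above can be sketched for a single pair of scalar speed estimates. The formulas below are the standard textbook forms: the simple convex combination weights each estimate by the other's error variance, and the Bar-Shalom/Campo combination additionally accounts for the cross-covariance between the two estimation errors. How the variances and covariance are obtained (for instance, from residuals on the training folds) is an assumption of this sketch, as are the function names and example values.

```python
from typing import Tuple


def simple_convex(x1: float, var1: float, x2: float, var2: float) -> Tuple[float, float]:
    """Inverse-variance combination; assumes uncorrelated estimation errors."""
    w1 = var2 / (var1 + var2)            # the less noisy estimate receives more weight
    fused = w1 * x1 + (1.0 - w1) * x2
    fused_var = var1 * var2 / (var1 + var2)
    return fused, fused_var


def bar_shalom_campo(x1: float, var1: float, x2: float, var2: float,
                     cov12: float) -> Tuple[float, float]:
    """Two-estimate combination that accounts for the error cross-covariance cov12."""
    denom = var1 + var2 - 2.0 * cov12
    w1 = (var2 - cov12) / denom
    fused = w1 * x1 + (1.0 - w1) * x2
    fused_var = var1 - (var1 - cov12) ** 2 / denom
    return fused, fused_var


# Hypothetical loop-detector (x1) and Bluetooth (x2) speeds in m/s with error variances.
print(simple_convex(20.0, 4.0, 25.0, 1.0))
print(bar_shalom_campo(20.0, 4.0, 25.0, 1.0, 0.5))
```

With a loop-detector estimate of 20 m/s (variance 4) and a Bluetooth estimate of 25 m/s (variance 1), the convex rule gives 24.0 m/s with variance 0.8. With zero cross-covariance the Bar-Shalom/Campo rule reduces to the same result; with a cross-covariance of 0.5 it gives 24.375 m/s with variance 0.9375.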
Chapter 6
Highway 401 Real-World Case Study
“In theory, theory and practice are the same. In practice, they are not.”
– Lawrence Peter "Yogi" Berra
6.1 From Microsimulation to the Real World

Using microsimulation to investigate data fusion techniques has many advantages as discussed in
the previous chapter. The largest advantage of microsimulation is that ground truth conditions
are known with absolute certainty. Microsimulation also allows for experimentation with
different numbers of probe vehicles. Furthermore, it allows for theoretically limitless amounts of
data to be generated with relative ease. However, there are shortcomings too. The obvious
disadvantage is that the data are simulated, and while they should represent real world
conditions, this is not guaranteed. On the other hand, real world data have the advantage of
representing the actual Bluetooth traffic monitoring system and loop detectors. The relative
merits of microsimulation vs. real world data are summarized in Table 6-1 below.
Table 6-1: A comparison of merits between microsimulation and real world data
Microsimulation
Advantages:
• Ground truth is known with certainty
• Number of Bluetooth devices is variable
• Large amounts of data “easily” acquired
Disadvantages:
• Data are simulated
• Theoretical accuracy of Bluetooth: assumed distribution of detections
Real World
Advantages:
• Real data
• Bluetooth measurements reflect the accuracy and frequency of the implemented system
Disadvantages:
• GPS probe vehicles assumed to represent ground truth traffic conditions
• Number of Bluetooth measurements is fixed by current conditions
• Harder to acquire large amounts of data
6.2 Highway 401 Real-World Data Collection

On May 13, 2009, probe vehicles with GPS units traversed Highway 401 from Highway 400 to
Kennedy Rd. GPS data loggers recorded position, time and speed every two seconds. From
2:00 PM to 7:30 PM, 47 probe vehicle reports were made for the eastbound collector lanes and
46 probe vehicle reports were made for the westbound collector lanes.
Data from the Bluetooth traffic monitoring system were obtained for the same period of time.
Three stations correspond to this section of highway: (1) one at Highway 400, (2) a
second at Bathurst St., and (3) a third at Kennedy Rd. The numbers of Bluetooth detections are as
follows:
• Station 1 only: 669
• Station 2 only: 840
• Station 3 only: 846
• Stations 1 and 2: 60
• Stations 2 and 3: 45
• Stations 1 and 3: 33
• Stations 1, 2 and 3: 12
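The counts above distinguish devices seen at a single station from devices re-identified at two or more stations; only re-identified devices yield a travel time, and hence a speed, for the link between two stations. The following is a minimal Python sketch of that matching step. It assumes each station reports a mapping from an anonymized device identifier to a single detection time in seconds; the data layout, function name and example values are hypothetical, and no outlier filtering is applied. The 7,200 m length is simply the sum of the sensor spacings shown for Link 1 in Figure 6-1.

```python
from typing import Dict, List


def matched_link_speeds(upstream: Dict[str, float], downstream: Dict[str, float],
                        link_length_m: float) -> List[float]:
    """Speeds (m/s) of devices detected at both an upstream and a downstream station."""
    speeds = []
    for device_id, t_up in upstream.items():
        t_down = downstream.get(device_id)
        # Keep only devices re-identified downstream after their upstream detection.
        if t_down is not None and t_down > t_up:
            speeds.append(link_length_m / (t_down - t_up))
    return speeds


station_1 = {"dev-a": 0.0, "dev-b": 10.0, "dev-c": 50.0}
station_2 = {"dev-a": 300.0, "dev-b": 370.0, "dev-d": 400.0}
print(matched_link_speeds(station_1, station_2, 7200.0))  # [24.0, 20.0]
```

Devices "dev-c" and "dev-d" are each seen at only one station and therefore contribute no travel time, which is why the single-station counts above far exceed the usable matches.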
Lastly, data from the corresponding loop detectors along this stretch were acquired. There are five
detectors in each of the westbound and eastbound collector lanes that correspond to this stretch of
freeway. Each detector reports measurements such as speed at 20-second intervals.
The details of the 16 sensors located in this section are provided in Table 6-2 and Table 6-3.
Figure 6-1 provides a scaled drawing of the four consecutive test links that can be constructed from
these sensors. Note that the Bluetooth stations here are more widely spaced, with several
kilometers between successive stations, quite different from those on Highway 400,
where each link spanned only one or two kilometers.
Table 6-2: Eastbound Highway 401 sensor details
Type Name Address Latitude Longitude
Bluetooth 105140 At Hwy 400 43.715737 -79.5209
VDS 401DW0020DEC E. of Jane 43.71783 -79.5
VDS 401DE0030DEC E. of Keele 43.72435 -79.4731
Bluetooth 105660 W. of Bathurst 43.734283 -79.4358
VDS 401DE0140DEC E. of Yonge 43.75917 -79.3988
VDS 401DE0170DEC E. of Bayview 43.76387 -79.3828
VDS 401DE0280DEC E. of Birchmount 43.77251 -79.2937
Bluetooth 105680 E. of Kennedy 43.774653 -79.2839
Table 6-3: Westbound Highway 401 sensor details
Type Name Address Latitude Longitude
VDS 401DE0300DWC At Midland 43.77782 -79.2733
Bluetooth 105680 E. of Kennedy 43.774653 -79.2839
VDS 401DE0210DWC W. of Hwy 404/DVP 43.76769 -79.3434
VDS 401DE0140DWC E. of Yonge 43.76068 -79.3972
Bluetooth 105660 W. of Bathurst 43.734283 -79.4358
VDS 401DE0030DWC E. of Dufferin 43.72578 -79.4695
VDS 401DW0020DWC E. of Jane 43.71968 -79.4993
Bluetooth 105140 At Hwy 400 43.715737 -79.5209

Link 1 – Highway 400 to West of Bathurst St
105140 → 1.7 km → 401DW0020DEC → 2.3 km → 401DE0030DEC → 3.2 km → 105660
Link 2 – West of Bathurst St to East of Kennedy Rd
105660 → 4.1 km → 401DE0140DEC → 1.4 km → 401DE0170DEC → 7.3 km → 401DE0280DEC → 0.8 km → 105680
Link 3 – East of Kennedy Rd to West of Bathurst St
401DE0300DWC → 0.9 km → 105680 → 4.9 km → 401DE0210DWC → 4.4 km → 401DE0140DWC → 4.3 km → 105660
Link 4 – West of Bathurst St to Highway 400
105660 → 2.9 km → 401DE0030DWC → 2.5 km → 401DW0020DWC → 1.8 km → 105140
Figure 6-1: Highway 401 sensor schematic (distances shown in kilometers – drawn to scale)
6.3 k-fold Cross Validation

In k-fold cross-validation, the data are first partitioned into k equally (or nearly equally) sized
segments, or folds. Subsequently, k iterations of training and validation are performed such that
within each iteration a different fold of the data is held out for validation while the remaining k - 1
folds are used for learning. A large k is seemingly desirable, since a larger k yields
more performance estimates and a training set size closer to the full data size,
which increases the likelihood that any conclusion drawn about the data fusion algorithms under test
will generalize to the case where all the data are used to train the learning model. As k increases,
however, the overlap between training sets also increases. Weighing these competing factors, the
general consensus in the data mining community seems to be that k = 10 is a
good compromise. This value of k is particularly attractive because each model is trained on
90% of the data, making its results more likely to generalize to the full data (Refaeilzadeh, Tang, &
Liu, 2009). More formally, k-fold cross-validation can be described as:
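$$
\begin{aligned}
V_1 &= D_1, & T_1 &= D_2 \cup D_3 \cup \dots \cup D_k \\
V_2 &= D_2, & T_2 &= D_1 \cup D_3 \cup \dots \cup D_k \\
&\;\,\vdots & &\;\,\vdots \\
V_k &= D_k, & T_k &= D_1 \cup D_2 \cup \dots \cup D_{k-1}
\end{aligned}
$$
where $V_i$ is the validation set, $T_i$ is the training set, and the data set is divided into k subsets $D_i$.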
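The partitioning described by these equations can be sketched in Python as follows. This is a minimal illustration assuming contiguous folds without shuffling; how records were actually assigned to folds is not stated here, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

Item = TypeVar("Item")


def k_fold_splits(data: Sequence[Item], k: int = 10) -> List[Tuple[List[Item], List[Item]]]:
    """Return k (training, validation) pairs, i.e. (T_i, V_i) for i = 1..k.

    Folds are contiguous blocks of `data` whose sizes differ by at most one.
    """
    n = len(data)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(data)")
    # The first n % k folds receive one extra element ("nearly equally sized").
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds: List[List[Item]] = []
    start = 0
    for size in sizes:
        folds.append(list(data[start:start + size]))
        start += size
    splits = []
    for i in range(k):
        validation = folds[i]  # V_i = D_i
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]  # T_i
        splits.append((training, validation))
    return splits


for training, validation in k_fold_splits(list(range(23)), k=10)[:2]:
    print(len(training), validation)
```

Each record appears in exactly one validation fold, so every record contributes to exactly one of the k performance estimates.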
Once k-fold cross-validation has been performed, a paired t test can be used to determine whether
algorithms A and B perform differently (Dietterich, 1998). More formally, if we compare the performance of
two algorithms, A and B, we consider two hypotheses:
• H0: The null hypothesis that A and B have the same performance.
• Ha: The alternative hypothesis that A and B have different performance.
If the difference between their empirical accuracies exceeds a threshold, we reject H0 and adopt
Ha. Otherwise, we retain H0: this does not mean that we believe H0, but simply that we do not
have enough evidence to say otherwise. If we denote the difference in accuracies between the
algorithms on fold i as $p^{(i)} = p_A^{(i)} - p_B^{(i)}$, we can apply Student's t test by computing the statistic
$$
t = \frac{\bar{p}\,\sqrt{n}}{\sqrt{\dfrac{\sum_{i=1}^{n}\left(p^{(i)} - \bar{p}\right)^{2}}{n-1}}},
$$
where $\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p^{(i)}$ and n is the number of folds. Under the null hypothesis, t has a
t distribution with n - 1 degrees of freedom. We can therefore look up the two-sided 5% threshold in a table;
when |t| exceeds this threshold, we reject H0 and conclude that A and B perform differently at the
5% significance level (i.e., with 95% confidence).
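The test statistic above can be computed with the Python standard library alone. In this sketch the per-fold scores are taken to be RMSE values, and the critical value 2.262 is the two-sided 5% point of Student's t distribution with 9 degrees of freedom, which applies only when k = 10 folds are compared. The function names and fold values are hypothetical.

```python
import math
import statistics
from typing import Sequence

# Two-sided 5% critical value of Student's t with 9 degrees of freedom (k = 10 folds).
T_CRIT_DF9 = 2.262


def paired_t(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Paired t statistic for per-fold scores (e.g. RMSE) of algorithms A and B."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]  # p^(i) = p_A^(i) - p_B^(i)
    n = len(diffs)
    p_bar = statistics.fmean(diffs)
    s = statistics.stdev(diffs)  # sample standard deviation (divisor n - 1)
    if s == 0:
        raise ValueError("identical fold differences: t statistic undefined")
    return p_bar * math.sqrt(n) / s


def significantly_different(scores_a: Sequence[float], scores_b: Sequence[float],
                            t_crit: float = T_CRIT_DF9) -> bool:
    return abs(paired_t(scores_a, scores_b)) > t_crit


a = [1.2, 1.1, 1.3, 1.0, 1.2, 1.1, 1.4, 1.2, 1.1, 1.3]
b = [1.5, 1.4, 1.6, 1.2, 1.6, 1.3, 1.7, 1.5, 1.4, 1.5]
print(round(paired_t(a, b), 2))                                     # -14.0
print(significantly_different([42.6] + [0.8] * 9, [2.0] * 10))      # False
```

The second call mirrors the situation described for the neural network in Section 5.4.3: a single extreme fold raises the mean difference but inflates the fold-to-fold standard deviation even more, so |t| stays below the threshold.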
6.4 Data Fusion Results

Each of the four links depicted earlier was evaluated using ten-fold cross-validation on the
collected data. The next subsections present these results graphically, while Appendix B presents
them in tabular format together with the results of the statistical significance tests.
6.4.1 Highway 400 to West of Bathurst St

Figure 6-2 depicts the traffic data collected on the first link. This link has two loop detectors, one
of which follows the GPS data very closely, while the other seems to consistently
overestimate the link speed. Although the second loop detector overestimates traffic speeds, it
still captures the temporal variation of traffic speeds over the time horizon. This
suggests that its data might still be useful for data fusion, in the sense that the fusion algorithm
simply needs to learn this tendency and adjust accordingly. The data from the Bluetooth traffic
monitoring system follow the GPS data very closely and show no apparent bias. Figure 6-3
compares loop detector, Bluetooth and GPS data as estimates of freeway traffic speed at any given
time over the sample period. Loop detectors are aggregated using the midpoint method. If no reading
is available from the Bluetooth traffic monitoring system in an aggregation interval (5 minutes), the
reading from the previous time step is used; hence there are periods in which the Bluetooth estimate
flat-lines because no new data are available. Even once aggregated, the loop detectors clearly show a
bias toward overestimating link speed, whereas the Bluetooth estimate is on par with the GPS estimate
at nearly all points throughout the time horizon.
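The carry-forward rule for Bluetooth estimates described above can be sketched as follows. This assumes that each 5-minute interval's estimate is the mean of the probe speeds observed in it (the thesis may aggregate differently, for example by median); the function and argument names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def interval_estimates(obs: Sequence[Tuple[float, float]], t0: float, t1: float,
                       step_s: float = 300.0) -> List[Optional[float]]:
    """Mean speed per interval of [t0, t1); empty intervals repeat the previous value.

    obs holds (timestamp_s, speed_mps) pairs. Intervals before the first
    observation are None, since there is nothing to carry forward yet.
    """
    n_bins = math.ceil((t1 - t0) / step_s)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, speed in obs:
        if t0 <= t < t1:
            b = int((t - t0) // step_s)
            sums[b] += speed
            counts[b] += 1
    estimates: List[Optional[float]] = []
    last: Optional[float] = None
    for total, count in zip(sums, counts):
        if count:
            last = total / count  # new probe data in this interval
        estimates.append(last)    # otherwise hold the previous estimate (flat-lining)
    return estimates


print(interval_estimates([(10, 20.0), (100, 22.0), (700, 18.0)], 0, 1500))
# [21.0, 21.0, 18.0, 18.0, 18.0]
```

The repeated values in the output are exactly the flat-lining segments visible in the Bluetooth estimate whenever no new detections arrive.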
[Plot: Hwy 401 East Bound – Highway 400 to West of Bathurst St. Speed (m/s) vs. time (12:00 PM to 9:00 PM) for LD 401DW0020DEC, LD 401DE0030DEC, BT 105140-105660, and GPS (ground truth)]
Figure 6-2: Data collected on Highway 401 – Link 1
[Plot: Hwy 401 East Bound – Highway 400 to West of Bathurst St. Speed (m/s) vs. time (12:00 PM to 9:00 PM) for aggregated loop detectors, Bluetooth, and GPS (ground truth)]
Figure 6-3: Comparison of loop detector, Bluetooth, and GPS estimates on Link 1
Figure 6-4 shows the results of the data fusion estimation techniques on this link. Error is
measured by the average root mean squared error (RMSE) in meters per second (m/s) relative
to the GPS mean speed. Note the following conventions: “A1 Fusion - LD” denotes architecture 1 used
with only loop detectors, “A1 Fusion - LD & BT” denotes architecture 1 used with loop detectors
and Bluetooth data, “A2 Fusion - LD” denotes architecture 2 used with only loop detectors, and “A2
Fusion - LD & BT” denotes architecture 2 used with loop detectors and Bluetooth data. When using
only loop detectors, the first architecture is clearly superior, because it gives the central fusion
node more raw data and therefore allows for more intelligent inferences. This is evident for the OWA
operator and the Choquet fuzzy integral, whose performance in the first architecture is significantly
better than a midpoint average of loop detectors. Neural networks also perform much better than a
midpoint average in both the first and second architectures. Once the Bluetooth data are added, each
fusion method reduces its average root mean squared error to roughly the accuracy of the
Bluetooth data. There is no real improvement over using Bluetooth data independently of loop
detector data, but this is not entirely surprising given the very accurate and precise estimates
coming from the Bluetooth traffic monitoring system on this link. Only one case is statistically
worse than using the best sensor independently: the SCAAT Kalman filter when using
Bluetooth and loop detector data. This is clearly a result of the SCAAT Kalman filter taking each
measurement as it occurs, since far more loop detector measurements than Bluetooth estimates
enter the filter. In other words, the filter is flooded with low-quality loop detector estimates and
therefore has no choice but to use these measurements, given the infrequent Bluetooth data. Note
that the other methods wait for a predefined time step (every 5 minutes) to update their estimate,
and use the previous Bluetooth estimate if none was made during their update interval, which
clearly gives them an advantage in this case.
Overall, this link shows that data fusion may not improve accuracy if one of
the estimators is much more accurate than the other. At the same time, the less accurate
estimator does not degrade the fused result either (except in one case). In other words, the fused
estimate generally has accuracy equal to or greater than that of the most accurate single estimator.
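To illustrate the flooding effect, the following is a minimal sketch of a SCAAT-style (single-constraint-at-a-time) scalar Kalman filter that processes each measurement as it arrives. It assumes a random-walk model for link speed; the process noise, measurement variances and example values are hypothetical and are not the parameters used in the thesis. In the example, one Bluetooth measurement of 25 m/s is followed by fifteen loop-detector reports of 30 m/s at 20-second intervals, each assigned four times the Bluetooth variance.

```python
from typing import Iterable, Tuple


def scaat_filter(events: Iterable[Tuple[float, float, float]],
                 x0: float, p0: float, q_per_s: float) -> float:
    """Return the final speed estimate after processing every event in order.

    events: (time_s, measured_speed_mps, measurement_variance), sorted by time.
    Link speed is modeled as a random walk whose variance grows by q_per_s
    per second between measurements.
    """
    x, p = x0, p0
    t_prev = None
    for t, z, r in events:
        if t_prev is not None:
            p += q_per_s * (t - t_prev)  # predict: uncertainty grows with elapsed time
        k = p / (p + r)                  # gain for this single measurement
        x += k * (z - x)                 # correct the estimate toward the measurement
        p *= 1.0 - k                     # updated error variance
        t_prev = t
    return x


# One Bluetooth report, then 15 loop-detector reports every 20 s (hypothetical values).
events = [(0.0, 25.0, 1.0)] + [(20.0 * i, 30.0, 4.0) for i in range(1, 16)]
print(round(scaat_filter(events, x0=25.0, p0=10.0, q_per_s=0.01), 2))
```

Although each loop-detector report is individually trusted less than the Bluetooth report, the repeated updates pull the estimate to within about 0.2 m/s of 30 m/s, so a consistent bias in those reports passes almost directly into the fused estimate.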
[Plot: RMSE (m/s) by fusion technique for A1 Fusion - LD, A2 Fusion - LD, A1 Fusion - LD & BT and A2 Fusion - LD & BT, with Bluetooth and loop detector baselines]
Figure 6-4: Error of data fusion techniques on Hwy 401 - Link 1
6.4.2 West of Bathurst St to East of Kennedy Rd

Figure 6-5 depicts the traffic data collected on the second link. This link has three loop detectors,
all of which follow the GPS data to some extent. One of the loop detectors appears to do
this better than the others, but all of them show some deterioration of traffic speed during the
middle of the PM peak period. The data from the Bluetooth traffic monitoring system show the
same sort of temporal variation as the other sensors, but the data are sparse and sometimes differ
substantially from the GPS conditions. When Bluetooth data alone are used to estimate the
average link speed, as shown in Figure 6-6, the estimate fits the conditions described by the GPS
units very closely but tends to produce more sudden variations and, accordingly, a less
smooth estimate of time-varying link speed. On the other hand, loop detectors produce a
smoother estimate which is close to GPS conditions except at the lowest valley of the curve,
where the unrepresentative measurements of upstream loop detectors taint the aggregated loop
detector estimate, a common problem in congested freeway conditions.
[Figure 6-5: Hwy 401 East Bound – West of Bathurst St to East of Kennedy Rd. Speed (m/s) vs. time (12:00 PM to 9:00 PM) for LD 401DE0140DEC, LD 401DE0170DEC, LD 401DE0280DEC, BT 105660-105680, and GPS (ground truth)]
[Figure 6-6: Hwy 401 East Bound – West of Bathurst St to East of Kennedy Rd. Speed (m/s) vs. time (12:00 PM to 9:00 PM) for aggregated loop detectors, Bluetooth, and GPS (ground truth)]
Figure 6-7 shows the error of the data fusion techniques on this link. Unlike on the first link, loop
detectors appear to provide a more accurate estimate of traffic speed than the
Bluetooth system, though this difference is not statistically significant according to Student's
t test. When fusing only loop detectors, the measurement fusion Kalman filter and the SCAAT
Kalman filter provide significant improvements over a midpoint average when employed
in the first architecture. Again, there is little opportunity to realize an accuracy benefit from data
fusion when using only loop detectors in the second architecture. When loop detector data are fused
with the Bluetooth data, the loop detectors help the Bluetooth estimator, making all
of the fusion methods statistically better than using only Bluetooth estimates. Note, though, that
adding the less accurate Bluetooth estimates to the loop detector estimates improves accuracy
over fusing only loop detector estimates in all cases. However, only the SCAAT Kalman filter is
statistically better than both the Bluetooth estimates and the loop detector estimates used
independently. Other methods achieve a similar average root mean squared error across the ten
folds but do not provide enough evidence to reject the null hypothesis. Recall that on the first
link, the fusion methods obtain accuracy similar to that of the Bluetooth estimator, which is the most
accurate sensor on that link. In this case, the fusion methods obtain results similar to those of the loop
detectors, which are more accurate on this link. This suggests that regardless of which
system provides the more accurate result, most data fusion techniques learn which
measurement to trust and calibrate the fused estimate accordingly.
[Figure 6-7: RMSE (m/s) of the data fusion techniques on Hwy 401 - Link 2 for A1 Fusion - LD, A2 Fusion - LD and A1 Fusion - LD & BT, with Bluetooth and loop detector baselines]
6.4.3 East of Kennedy Rd to West of Bathurst St

As shown in Figure 6-8, the third link has three loop detectors, two of which fail to capture the
breakdown of traffic that occurs shortly before 4:00 PM. The third loop detector clearly shows
that traffic conditions are deteriorating between 3:00 PM and 4:00 PM, even more so than the
GPS estimates indicate. This suggests that this loop detector is likely in the heart of the
congested region of the link. Aggregating these loop detectors together by the midpoint method
(Figure 6-9) produces a very representative estimate of traffic conditions although the speeds
during the most heavily congested time are overestimated due to the fact that two of the loop
detectors are measuring uncongested speeds. The Bluetooth estimate tends to capture this valley
better, but the infrequent measurements leave the estimate stale for successive time steps as the
system waits for a new probe vehicle measurement. Nonetheless, even the few Bluetooth measurements
available capture the variation of traffic conditions well over the time horizon.
[Figure 6-8: Hwy 401 West Bound – East of Kennedy Rd to West of Bathurst St. Speed (m/s) vs. time (12:00 PM to 9:00 PM) for LD 401DE0300DWC, LD 401DE0210DWC, LD 401DE0140DWC, BT 105680-105660, and GPS (ground truth)]
[Figure 6-9: Hwy 401 West Bound – East of Kennedy Rd to West of Bathurst St. Speed (m/s) vs. time (12:00 PM to 9:00 PM) for aggregated loop detectors, Bluetooth, and GPS (ground truth)]
The error of the various data fusion techniques for this link is shown in Figure 6-10. Although
there is some positive and negative variation of various fusion methods when fusing only loop
detectors in both architectures, none of these changes are statistically significant. In other words,
no method does better or worse than the midpoint method when given only raw loop detector
data. However, once Bluetooth data is added, many significant changes are obtained, particularly
by the simple convex combination, the Bar-Shalom/Campo combination, and the measurement
fusion Kalman filter. Note that on this link, the Bluetooth data is more accurate than the loop
detector data and this difference is statistically significant. Note also that in this case, the second
architecture generally outperforms the first, possibly because two of the loop detectors
(401DE0300DWC and 401DE0210DWC) provide nearly redundant information which creates a
bias toward their readings in the first architecture.
[Figure 6-10: RMSE (m/s) of the data fusion techniques on Hwy 401 - Link 3 for A1 Fusion - LD, A2 Fusion - LD and A2 Fusion - LD & BT, with Bluetooth and loop detector baselines]
6.4.4 West of Bathurst St to Highway 400

The last link has only two loop detectors and once again suffers from congestion between 3:00
PM and 6:00 PM (Figure 6-11). Both loop detectors measure deteriorating traffic speeds
during the peak of the peak, as does the Bluetooth traffic monitoring system, which provides
few but accurate estimates of traffic conditions over the time horizon. When the loop detector
and Bluetooth estimates are overlaid, as shown in Figure 6-12, both show a reasonably good
fit to the GPS collected data. Observe that the loop detectors at first underestimate traffic
speed and later overestimate it, whereas Bluetooth shows the opposite pattern.
This sort of situation is promising for data fusion, as the two estimators' biases might
cancel each other out once fused. The main disadvantage of the Bluetooth technology at
this point is the low number of successive detections, which leaves the estimate flat-lining for short
periods of time while waiting for new data.
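The bias-cancellation argument can be checked on a toy example. The sketch below assumes hypothetical speeds in which the loop detectors read 2 m/s low and then 2 m/s high, while Bluetooth reads 1.5 m/s high and then 1.5 m/s low, and fuses them with fixed equal weights (a trained fusion method would instead learn its weights). Error is measured as RMSE against the GPS values, as in the figures of this chapter.

```python
import math
from typing import Sequence


def rmse(estimates: Sequence[float], truth: Sequence[float]) -> float:
    """Root mean squared error of estimates against ground truth (m/s)."""
    return math.sqrt(sum((e - g) ** 2 for e, g in zip(estimates, truth)) / len(truth))


gps = [27.0, 26.0, 20.0, 18.0, 21.0, 26.0]                      # hypothetical ground truth
loop = [g - 2.0 for g in gps[:3]] + [g + 2.0 for g in gps[3:]]  # low, then high
bt = [g + 1.5 for g in gps[:3]] + [g - 1.5 for g in gps[3:]]    # high, then low
fused = [0.5 * (a + b) for a, b in zip(loop, bt)]               # fixed equal weights

print(rmse(loop, gps), rmse(bt, gps), rmse(fused, gps))  # 2.0 1.5 0.25
```

The fused estimate is more accurate than either source because the opposing biases largely cancel; with biases of the same sign, averaging would offer no such benefit.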
[Figure 6-11: Hwy 401 West Bound – West of Bathurst St to Highway 400. Speed (m/s) vs. time (12:00 PM to 9:00 PM) for LD 401DE0030DWC, LD 401DW0020DWC, BT 105660-105140, and GPS (ground truth)]
[Figure 6-12: Hwy 401 West Bound – West of Bathurst St to Highway 400. Speed (m/s) vs. time (12:00 PM to 9:00 PM) for aggregated loop detectors, Bluetooth, and GPS (ground truth)]
Saving the best for last, Figure 6-13 presents the results of the data fusion techniques on this link.
Fusing loop detectors alone yields little benefit compared to the midpoint method,
with the exception of an artificial neural network in the second architecture, which achieves a
significant reduction in average root mean squared error. More
interesting is that nearly all methods in both architectures obtain a significant improvement in
accuracy over both Bluetooth data and loop detector data used independently. This is, of course,
the most desirable result: multiple sensors are combined to achieve accuracy greater than
that of any one of them used individually. The measurement fusion Kalman filter provides the most
significant improvement, but all methods follow closely, making this difference somewhat
marginal.
[Figure 6-13: RMSE (m/s) of the data fusion techniques on Hwy 401 - Link 4 for A1 Fusion - LD, A2 Fusion - LD, A1 Fusion - LD & BT and A2 Fusion - LD & BT, with Bluetooth and loop detector baselines]
6.5 Summary of Key Findings

There are some notable findings from the real-world case study:
• Bluetooth estimates of traffic speed tend to be very close to what GPS probe vehicles
experience, but the measurements from the studied Bluetooth traffic monitoring
system are too infrequent. This is the only notable shortcoming of the system, which is likely
due in part to the preliminary technology deployed at this point.
• Regardless of which estimator provides the more accurate estimate, the data fusion
techniques tune themselves accordingly and, in all but one case, achieve an error no worse than
that of the best estimator. For example, on some links Bluetooth probe vehicles were more accurate, and
on others the loop detectors were more accurate. In either case, the data fusion algorithms
learn this property.
• All of the data fusion techniques can be trained using a sample set of field collected data.
This suggests the algorithms are “practice-ready” and that a real-time data fusion based traffic
monitoring system is entirely feasible.
Chapter 7
Conclusion
“A man should look for what is, and not for what he thinks should be.”
– Albert Einstein
In this thesis, seven multi-sensor data fusion based estimation techniques are investigated. All
methods are compared in terms of their ability to fuse data from loop detectors and Bluetooth
tracked probe vehicles to accurately estimate freeway traffic speed. In the first case study, data
generated from a microsimulation model are used to assess how data fusion might perform with
present day conditions, having few probe vehicles, and what sort of improvement might result
from an increased proportion of vehicles carrying Bluetooth-enabled devices in the future. In the
second case study, data collected from the real-world Bluetooth traffic monitoring system are
fused with corresponding loop detector data and the results are compared against GPS collected
probe vehicle data, demonstrating the feasibility of implementing data fusion for real-time traffic
monitoring today. This research constitutes the most comprehensive evaluation of data fusion
techniques for traffic speed estimation known to the author.
7.1 On Data Fusion Techniques

Each of the data fusion techniques investigated performs reasonably well and decreases
estimation error in nearly all cases. The OWA operator is an exception, as it often cannot find a
relationship between the order of measurements and their significance, as there is no such
inherent relationship. Thus, while the OWA operator might be extremely useful with a group of
sensors that have the same properties, it has difficulty in establishing the inherent relationships
between sensors with different properties. The Choquet fuzzy integral also suffers from
problematic behavior when it cannot fully capture the relationship between criteria in the training
stage. The simple convex combination, Bar-Shalom/Campo combination and the measurement
fusion Kalman filter consistently perform well and often improve accuracy. The difference
between these three methods is marginal. Neural networks can perform on par with these three
techniques, often even surpassing their accuracies, but the fact that these improvements are
sometimes not statistically significant indicates a higher variance in the individual performances,
translating to decreased overall reliability. In a practical sense, many of the techniques perform
similarly, and no single method substantially outperforms all others in all
cases. This is an important conclusion, as many of the algorithms are quite complex, and this
complexity is hard to justify when it provides no added accuracy. For
example, compare the simple convex combination to the Choquet fuzzy integral; not only does
the simple convex combination perform better for traffic speed estimation, but it is much simpler
to understand, implement, and compute. Therefore, it might be difficult to justify implementing a
complex algorithm if it does not outperform its competition substantially. In the end, other
considerations might factor into the decision of which technique is preferred (such as the
background knowledge of the system operator or performance of the algorithm in a local field
test). Overall, in the presence of multiple traffic data sources, data fusion should be utilized so
that the aforementioned benefits of a data fusion based system can be realized.
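The difference between the simple convex combination and the OWA operator can be made concrete with a short example. The Python sketch below is illustrative only: the function names, weights, and speeds are hypothetical, and fixed weights stand in for whatever training procedure is used in practice. A convex combination attaches each weight to a particular sensor, whereas an OWA operator attaches each weight to a rank after the measurements are sorted, so it cannot express a preference such as “trust the probe vehicles more” once the probe estimate is sometimes the larger value and sometimes the smaller one.

```python
def convex_combination(estimates, weights):
    """Weighted average with one fixed weight per sensor (weights sum to 1)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, estimates))


def owa(estimates, weights):
    """Ordered weighted average: weights apply to positions after sorting
    the estimates in descending order, not to particular sensors."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(estimates, reverse=True)
    return sum(w * x for w, x in zip(weights, ordered))


# Hypothetical speeds in km/h, ordered as (loop detector, probe vehicles).
interval_a = (95.0, 80.0)   # loop detector reads higher
interval_b = (60.0, 75.0)   # probe vehicles read higher

sensor_weights = (0.3, 0.7)  # convex: favour the probe vehicle sensor
rank_weights = (0.3, 0.7)    # OWA: favour whichever value is smaller

for loop_speed, probe_speed in (interval_a, interval_b):
    pair = (loop_speed, probe_speed)
    print(convex_combination(pair, sensor_weights), owa(pair, rank_weights))
```

In the first interval both operators return the same value, but in the second interval the OWA operator gives the larger weight to the loop detector simply because its reading is now the smaller of the two.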
7.2 On Fusing Data from Loop Detectors and Probe Vehicles

Based on the simulation case study, probe vehicle sample sizes between one and five percent
produce relatively large estimation errors when used without fusion. Combining loop detectors with
the conventional midpoint method can also yield relatively high estimation error, because traffic
conditions can change drastically from one detector to the next. Fusing loop detector data alone
using any data fusion method generally outperforms the traditional midpoint method, sometimes by a
very large margin. This suggests that data fusion is beneficial even on conventional freeways
equipped only with loop detectors: fusion techniques can learn the relationship
between loop detector speeds and actual average link speeds more accurately than a midpoint
average can. As probe vehicle estimates are added, and as their sample size increases, all
data fusion techniques perform better. This suggests that acquiring data from more probe
vehicles on the freeway is worthwhile, because the extra data are meaningful. That said,
data fusion shows the greatest benefit when probe vehicle sample sizes are small (1-15%),
generally improving accuracy. As the probe sample size increases, data fusion becomes less
necessary in terms of accuracy, because the probe sample becomes increasingly representative of the
platoon’s average traffic speed. This implies that we might eventually rely on probe vehicle
observations alone for travel time and traffic speed estimation. However, other generic benefits of
data fusion, such as reliability, robustness, redundancy, certainty, and coverage, still make
a case for using data fusion in the traffic monitoring context.
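The observation that fusion matters most at low probe penetration follows from the sampling error of a mean. As a rough sketch, assuming that probe vehicles are a simple random sample of the traffic stream and that their speeds are independent, the standard error of the interval mean falls with the square root of the number of probes observed. The flow, speed dispersion, and interval length below are hypothetical values chosen for illustration, not figures from the case studies.

```python
import math


def expected_probe_count(flow_veh_per_hour, penetration, interval_min):
    """Expected number of probe vehicles observed in one aggregation interval."""
    return flow_veh_per_hour * penetration * interval_min / 60.0


def standard_error(speed_sd_kmh, n_probes):
    """Standard error of the mean probe speed, assuming independent samples."""
    if n_probes <= 0:
        return math.inf  # no probes in the interval: no probe estimate at all
    return speed_sd_kmh / math.sqrt(n_probes)


# Hypothetical link: 4000 veh/h, speed standard deviation 10 km/h, 5-minute intervals.
for pct in (1, 5, 15, 40):
    n = expected_probe_count(4000, pct / 100.0, 5)
    print(f"{pct:>2}% probes: ~{n:.0f} per interval, "
          f"SE = {standard_error(10.0, n):.2f} km/h")
```

Under these assumptions, moving from 1% to 15% penetration cuts the standard error by a factor of roughly four, while moving from 15% to 40% gains comparatively little, which is consistent with the diminishing accuracy benefit of fusion at larger sample sizes. Real probe samples are not perfectly random, so this only bounds the sampling component of the error.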
One of the more encouraging results comes from the case study using real-world data collected on
Highway 401. Sometimes the Bluetooth traffic monitoring system provides more accurate
information, and sometimes the loop detector data are more accurate. Sometimes the difference
between these estimators is statistically significant, and sometimes it is not. Across all of
these cases, it is quite rare for the fused estimate to be statistically worse than the better of
the two data sources used independently. That is, the realistic worst case is that multiple
sensors are combined and the fused estimate is no more accurate than the most accurate of those
sensors, while the best case is that the fused estimate is statistically more accurate than the
best estimator used independently. Therefore, when implementing a data fusion based
system, there is generally little risk of making the estimate “worse”. In sum, a data
fusion based system will realize many benefits, which may or may not include an increase in
accuracy, but will likely not result in a loss of accuracy.
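Part of this result has a simple theoretical basis. For two unbiased speed estimates with independent errors and known error variances, the minimum-variance convex combination can never be less precise than the better input. The derivation below is a sketch under those idealized assumptions; the symbols v̂₁ and v̂₂ (for example, loop detector and probe vehicle estimates) and σ₁², σ₂² are notation introduced here for illustration.

```latex
\begin{aligned}
\hat{v} &= w\,\hat{v}_1 + (1-w)\,\hat{v}_2, \qquad 0 \le w \le 1,\\
\operatorname{Var}(\hat{v}) &= w^2\sigma_1^2 + (1-w)^2\sigma_2^2,\\
\frac{d}{dw}\operatorname{Var}(\hat{v}) &= 2w\sigma_1^2 - 2(1-w)\sigma_2^2 = 0
\;\Longrightarrow\; w^{*} = \frac{\sigma_2^2}{\sigma_1^2+\sigma_2^2},\\
\operatorname{Var}(\hat{v})\big|_{w=w^{*}} &= \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2}
= \left(\frac{1}{\sigma_1^2}+\frac{1}{\sigma_2^2}\right)^{-1}
\le \min\left(\sigma_1^2,\,\sigma_2^2\right).
\end{aligned}
```

Each source is weighted in inverse proportion to its error variance, the same idea that underlies the simple convex combination in the track fusion literature. When the weights are estimated from a finite training sample, or when the sensor errors are correlated, the guarantee no longer holds exactly, which is consistent with a statistically worse fused estimate being rare rather than impossible.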
7.3 Recommendations for Future Work

In terms of refining this research, there are a few points to consider. The information gathered in
this thesis comes from various fields, including statistics, target tracking, multi-criteria decision
making, artificial intelligence, and machine learning. Although every effort was made to
make the best use of the information drawn from these fields, a researcher interested in a particular
technique should draw more deeply on its originating field of research. This might lead to
further improvements in performance over those seen here. For example, although speed has been
treated as a hidden state in the Kalman filter models here, a more refined process model might
further improve the accuracy of the Kalman filter based techniques. In the same vein, a different
learning algorithm for the Choquet fuzzy integral or the OWA operator might yield
different results, and adding further inputs to the neural network model might improve its
performance as well. Furthermore, architectures other than those proposed here could be
considered. For example, in the pre-processing stage, a Kalman filter could filter each sensor’s
measurements in real time before the average is taken for each aggregation interval. This might
remove some of the error in the pre-processing stage and lead to cleaner data entering the central
fusion node. On the other hand, the central fusion node might benefit more from the raw data when
deriving the most intelligent inference possible. Indeed, there are almost innumerable combinations of
techniques and architectures, and there are no established guidelines for designing or implementing
such a system. Overall, every effort was made to create intuitive architectures and to use
reasonably well calibrated techniques that would likely show the most promising results.
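The pre-processing alternative described above, in which each sensor’s measurements are filtered in real time before the interval average is taken, could be prototyped with a scalar Kalman filter. This Python sketch assumes a random-walk process model for speed with hypothetical process-noise variance q and measurement-noise variance r; it is not the process model used in this thesis, and q and r would need to be calibrated for each sensor before such a filter could be evaluated.

```python
from statistics import mean


def kalman_smooth(measurements, q=1.0, r=25.0, x0=None, p0=100.0):
    """Scalar Kalman filter with a random-walk speed model.

    State:        x_k = x_{k-1} + w_k,  w_k ~ N(0, q)
    Measurement:  z_k = x_k + v_k,      v_k ~ N(0, r)
    Returns the filtered speed after each measurement.
    """
    x = measurements[0] if x0 is None else x0
    p = p0
    filtered = []
    for z in measurements:
        # Predict: the random-walk model keeps the estimate, inflates its variance.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        filtered.append(x)
    return filtered


def interval_average(values, samples_per_interval):
    """Average the values within consecutive aggregation intervals."""
    return [mean(values[i:i + samples_per_interval])
            for i in range(0, len(values), samples_per_interval)]


# Hypothetical 20-second loop detector speeds (km/h), three per one-minute
# interval, including a single spurious low reading.
raw = [98.0, 91.0, 104.0, 97.0, 60.0, 95.0, 99.0, 93.0, 101.0]
print(interval_average(raw, 3))
print(interval_average(kalman_smooth(raw), 3))
```

Raising q makes the filter follow the raw measurements more closely; raising r smooths more aggressively, at the cost of a slower response to genuine speed drops such as the onset of congestion.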
Perhaps the more rewarding work lies in practice. Over time, the number of data sources reporting
traffic speed and travel time appears to be growing. These sources
should be considered for inclusion in a data fusion based system along with conventional
traffic measurement instruments such as loop detectors. For example, a traffic management center
should no longer rely on a simple midpoint average of loop detectors to estimate traffic
speeds. Where probe vehicle data are available, such as from a Bluetooth traffic
monitoring system or from a cellular telephone company, data from these sources should be
fused together. The techniques investigated and implemented here are “practice ready” in the
sense that each method can fuse an arbitrary number of estimators. For example, a
data fusion based traffic estimation system may first rely only on loop detectors and then
later add Bluetooth device monitoring. Later still, an algorithm might be implemented to deduce
traffic speeds from traffic cameras. These new data can easily be incorporated into the distributed
fusion architectures proposed earlier, simply by sending each new estimate to the central fusion
node. The caveat is that the central fusion node will need to be re-trained; although
this requires collecting a small sample of data from all sensors simultaneously, it only
needs to be done once. As demonstrated in Chapter 5, this data collection process is entirely
feasible. Furthermore, as sources of traffic monitoring data become increasingly
available, data fusion needs to be applied in practice not only to realize its
benefits, but also to use the data efficiently. For example, a traffic management center would not
benefit from several different technologies reporting what should be the same information.
In fact, the abundance of data would only overwhelm decision makers and slow their ability
to make timely decisions. Rather, a single fused inference drawn from all of these
data sources would be preferred. Data fusion therefore provides numerous benefits while also
addressing the issue of information overload. For all of these reasons, it is time to apply
data fusion more widely in the traffic monitoring context.
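The point that a new sensor only requires re-training the central fusion node once can be illustrated with the simple convex combination. The sketch below assumes a short calibration period in which a reference speed (for example, from GPS probe runs) is available alongside every sensor’s estimate, and sets each weight inversely proportional to that sensor’s mean squared error. The inverse-MSE rule, the sensor names, and the function names are illustrative assumptions, not necessarily the training procedure used in this thesis.

```python
def train_weights(calibration, truth):
    """Return convex-combination weights from a calibration sample.

    calibration: dict mapping sensor name -> list of speed estimates
    truth:       list of reference speeds for the same intervals
    Each weight is proportional to 1 / MSE of that sensor.
    """
    inverse_mse = {}
    for name, estimates in calibration.items():
        if len(estimates) != len(truth):
            raise ValueError(f"{name}: length mismatch with reference data")
        mse = sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)
        inverse_mse[name] = 1.0 / max(mse, 1e-9)  # guard against a perfect sensor
    total = sum(inverse_mse.values())
    return {name: value / total for name, value in inverse_mse.items()}


def fuse(estimates, weights):
    """Fuse one interval's estimates, renormalizing over sensors that reported."""
    used = {n: w for n, w in weights.items() if n in estimates}
    if not used:
        raise ValueError("no trained sensor reported in this interval")
    total = sum(used.values())
    return sum(w * estimates[n] for n, w in used.items()) / total


# Hypothetical calibration data (km/h). Adding a camera-based estimator later
# only means adding another key here and calling train_weights once more.
truth = [100.0, 90.0, 70.0, 85.0]
calibration = {
    "loop": [104.0, 95.0, 80.0, 88.0],
    "bluetooth": [99.0, 92.0, 68.0, 86.0],
}
weights = train_weights(calibration, truth)
print(weights)
print(fuse({"loop": 82.0, "bluetooth": 77.0}, weights))
```

Because fuse renormalizes over whichever sensors reported in an interval, an interval with no Bluetooth matches falls back to the loop detector estimate instead of failing.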
References
Alpaydin, E. (1999). Combined 5×2 cv F test for comparing supervised classification learning
algorithms. Neural Computation, 11(8), 1885-1892.
Beale, M. H., Hagan, M. T., & Demuth, H. B. (2010). Neural network toolbox 7 User’s guide.
Natick, MA: The MathWorks, Inc.
Berkow, M., Monsere, C. M., Koonce, P., Bertini, R. L., & Wolfe, M. (2009). Prototype for data
fusion using stationary and mobile data. Transportation Research Record, 2099, 102-112.
Bouchon-Meunier, B. (Ed.). (1998). Aggregation and fusion of imperfect information. New
York: Physica-Verlag.
Brooks, R. R., & Iyengar, S. S. (1998). Multi-sensor fusion: Fundamentals and applications
with software. Upper Saddle River, N.J.: Prentice Hall PTR.
Byon, Y., Shalaby, A., Abdulhai, B., & Elshafiey, S. (2010). Traffic data fusion using SCAAT
Kalman filters. TRB 89th Annual Meeting Compendium of Papers DVD, Washington, DC.
Chang, K. C., Saha, R. K., & Bar-Shalom, Y. (1997). On optimal track-to-track fusion. IEEE
Transactions on Aerospace and Electronic Systems, 33(4), 1271-1276.
Cheu, R. L., Lee, D., & Xie, C. (2001). An arterial speed estimation model fusing data from
stationary and mobile sensors. 2001 IEEE Intelligent Transportation Systems Conference
Proceedings, Oakland, CA. 573-578.
Choi, K., & Chung, Y. (2002). A data fusion algorithm for estimating link travel time. Journal of
Intelligent Transportation Systems: Technology, Planning, and Operations, 7(3-4), 235-
260.
Chong, C., & Mori, S. (2001). Convex combination and covariance intersection algorithms in
distributed fusion. FUSION 2001 Proceedings, Montréal, Quebec, Canada.
Dailey, D. J. (1996). ITS data fusion (Research Project T9903, Task 9: ATIS/ATMS Regional
IVHS Demonstration). Washington: Washington State Transportation Commission.
Dietterich, T. G. (1998). Approximate statistical tests for comparing supervised classification
learning algorithms. Neural Computation, 10(7), 1895-1923.
Durrant-Whyte, H. F. (1988). Sensor models and multisensor integration. The International
Journal of Robotics Research, 7(6), 97-113.
El Faouzi, N. (2000). Travel time estimation by evidential data fusion. Recherche Transports
Securite, 68, 15-30.
El Faouzi, N. (2004a). Data fusion in road traffic engineering: An overview. Proceedings of
SPIE, 5434, 360-371.
El Faouzi, N. (2004b). Data-driven aggregative schemes for multisource estimation fusion: A
road travel time application. Proceedings of SPIE, 5434, 351-359.
El Faouzi, N., & Lefevre, E. (2006). Classifiers and distance-based evidential fusion for road
travel time estimation. Proceedings of SPIE, 6242, 92-107.
El-Geneidy, A. M., & Bertini, R. L. (2004). Toward validation of freeway loop detector speed
measurements using transit probe data. Proceedings of the 7th International IEEE
Conference on Intelligent Transportation Systems, 2004, 779-784.
Filev, D., & Yager, R. R. (1994). Learning OWA operator weights from data. Proceedings of the
1994 IEEE 3rd International Fuzzy Systems Conference, Orlando, FL, USA, 1, 468-473.
Grabisch, M. (1996). The application of fuzzy integrals in multicriteria decision making.
European Journal of Operational Research, 89(3), 445-456.
Grabisch, M., Nguyen, H. T., & Walker, E. A. (1995). Fundamentals of uncertainty calculi with
applications to fuzzy inference. Boston: Kluwer Academic Publishers.
Guo, J., Xia, J., & Smith, B. L. (2009). Kalman filter approach to speed estimation using single
loop detector measurements under congested conditions. Journal of Transportation
Engineering, 135(12), 927
Hall, D. L., & Llinas, J. (2001). Handbook of multisensor data fusion. Boca Raton, FL: CRC
Press.
Chen, H., Kirubarajan, T., & Bar-Shalom, Y. (2003). Performance limits of track-to-track
fusion versus centralized estimation: Theory and application. IEEE Transactions on
Aerospace and Electronic Systems, 39(2), 386-400.
Keever, D., Shimizu, M., & Seplow, J. (2003). Data fusion for delivering advanced traveler
information systems (Report No. FHWA-OP-03-119). Washington, D.C.: FHWA.
Kong, Q., Chen, Y., & Liu, Y. (2007). An improved evidential fusion approach for real-time
urban link speed estimation. 2007 IEEE Intelligent Transportation Systems Conference,
Seattle, WA. 562-567.
Kong, Q., Chen, Y., & Liu, Y. (2009a). A fusion-based system for road-network traffic state
surveillance: A case study of Shanghai. IEEE Intelligent Transportation Systems
Magazine, 1(1), 37-42.
Kong, Q., Li, Z., Chen, Y., & Liu, Y. (2009b). An approach to urban traffic state estimation by
fusing multisource information. IEEE Transactions on Intelligent Transportation
Systems, 10(3), 499-511.
Kong, Q., & Liu, Y. (2007). A model of federated evidence fusion for real-time urban traffic
state estimation. Journal of Shanghai Jiaotong University, E12(6), 793-798.
Luo, R. C., & Kay, M. G. (1989). Multisensor integration and fusion in intelligent systems. IEEE
Transactions on Systems, 19(5), 901-931.
Ministry of Transportation. (2010). Freeway traffic management systems. Retrieved from
http://www.mto.gov.on.ca/english/traveller/trip/compass-ftms.shtml
Mitchell, H. B. (2007). Multi-sensor data fusion: An introduction. New York: Springer.
Nelson, P., & Palacharla, P. (1993). Neural network model for data fusion in ADVANCE.
Proceedings of the Pacific Rim Trans Tech Conference, 237-293.
Ng, G. W. (2003). Intelligent systems - fusion, tracking and control. Philadelphia: Research
Studies Press Ltd.
Park, T., & Lee, S. (2004). A Bayesian approach for estimating link travel time on urban arterial
road network. Lecture notes in computer science (pp. 1017-1025). Berlin/Heidelberg: Springer.
Peng, D., Zuo, X., Wu, J., Wang, C., & Zhang, T. (2009). A Kalman filter based information
fusion method for traffic speed estimation. 2009 2nd Conference on Power Electronics
and Intelligent Transportation System.
Peterson, B. S., Baldwin, R. O., & Kharoufeh, J. P. (2004). A specification-compatible Bluetooth
inquiry simplification. Proceedings of the 37th Annual Hawaii International Conference on
System Sciences, 9.
Peterson, B. S., Baldwin, R. O., & Kharoufeh, J. P. (2006). Bluetooth inquiry time
characterization and selection. IEEE Transactions on Mobile Computing, 5(9), 1173-1187.
Quadstone Paramics. (2010). Traffic microsimulation. Retrieved 2010, from
http://www.paramics-online.com/what-is-microsimulation.php
Refaeilzadeh, P., Tang, L., & Liu, H. (2009). In L. Liu & M. T. Özsu (Eds.), Encyclopedia of
database systems. New York: Springer.
Roorda, M., Sharman, B., Sekula, C., & Masters, P. (2009). Preliminary analysis of a system for
real-time monitoring of Bluetooth device data on an urban freeway. CD Proceedings of
the TRANSLOG 2009 Conference, Hamilton, ON.
Shin, V., Lee, Y., & Choi, T. (2006). Generalized Millman's formula and its application for
estimation problems. Signal Processing, 86(2), 257-266.
Tarko, A., & Rouphail, N. (1993). Travel time data fusion in ADVANCE. Proceedings of the
Pacific Rim Trans Tech Conference, 36-42.
Wasson, J. S., Sturdevant, J. R., & Bullock, D. M. (2008). Real-time travel time estimates using
media access control address matching. ITE Journal, 78(6), 20-23.
Welch, G., & Bishop, G. (2006). An introduction to the Kalman filter. Unpublished manuscript.
Retrieved from http://www.cs.unc.edu/~welch/media/pdf/kalman_intro.pdf
Xata Turnpike. (2010). Routetracker. Retrieved 2010, from
http://www.turnpikeglobal.com/products_services/routetracker.php
Xu, Z. (2005). An overview of methods for determining OWA weights. International Journal of
Intelligent Systems, 20(8), 843-865.
Young, S. (2008). Bluetooth traffic monitoring technology - concept of operation & deployment
guidelines. College Park, Maryland: University of Maryland, Center for Advanced
Transportation Technology.
Appendix A
Highway 400 Statistical Significance Tests
(*) denotes that the error of the algorithm is statistically different from that obtained using only
Bluetooth probe vehicles, according to the 5×2 cross-validation F test at the 5% level of
significance. (^) denotes that the error of the algorithm is statistically different from that obtained
using only loop detectors aggregated by the midpoint method, according to the 5×2 cross-validation
F test at the 5% level of significance. Cells shaded in green (medium grey if viewed in black and
white) are fusion results that are statistically better than using either loop detectors or probe
vehicle measurements independently. Cells shaded in yellow (light grey) are statistically better than
using one of loop detectors or Bluetooth but are not statistically different from using the other of
the two. Cells shaded in red (dark grey) show a result that is statistically worse than using the best
sensor independently.
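For readers unfamiliar with the combined 5×2 cross-validation F test (Alpaydin, 1999), the following Python sketch computes the statistic from five replications of two-fold cross-validation: f = (sum over i, j of p_i(j)²) / (2 × sum over i of s_i²), where s_i² = (p_i(1) − p̄_i)² + (p_i(2) − p̄_i)². Each p value is assumed here to be the difference in RMSE between two estimators on one fold; Alpaydin formulated the test for classification error rates, so this is an assumption about how it was applied to speed estimation. Under the null hypothesis the statistic approximately follows an F distribution with 10 and 5 degrees of freedom, whose upper 5% point is about 4.74. The function name and data are hypothetical.

```python
def five_by_two_cv_f(differences):
    """Combined 5x2 cv F statistic (Alpaydin, 1999).

    differences: five (p1, p2) pairs, one per replication of 2-fold CV, where
    p_j is the performance difference between two methods on fold j.
    """
    if len(differences) != 5:
        raise ValueError("expected five replications")
    numerator = sum(p1 ** 2 + p2 ** 2 for p1, p2 in differences)
    variance_sum = 0.0
    for p1, p2 in differences:
        p_bar = (p1 + p2) / 2.0
        variance_sum += (p1 - p_bar) ** 2 + (p2 - p_bar) ** 2
    if variance_sum == 0.0:
        raise ValueError("zero variance: statistic undefined")
    return numerator / (2.0 * variance_sum)


F_CRIT_10_5_ALPHA_05 = 4.74  # upper 5% point of F(10, 5), from standard tables

# Hypothetical RMSE differences (km/h): fusion method minus probe-only, per fold.
diffs = [(-0.40, -0.55), (-0.48, -0.39), (-0.52, -0.45), (-0.36, -0.50), (-0.44, -0.47)]
f_stat = five_by_two_cv_f(diffs)
print(f"f = {f_stat:.1f}, significant at 5%: {f_stat > F_CRIT_10_5_ALPHA_05}")
```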
Table A-1: Average Root of Mean Squared Error – Hwy 400 Link 1, Architecture 1
                    Probe Vehicles (%)
Method              0       1       5       10      15      20      25      30      35      40
Probe Vehicles      –       3.44^   1.80^   1.23^   0.96^   0.80^   0.68^   0.60^   0.53^   0.49^
Loop Detectors      7.11*   7.11*   7.11*   7.11*   7.11*   7.11*   7.11*   7.11*   7.11*   7.11*
Simple Convex       3.56*^  2.48*^  1.65*^  1.18^   0.94^   0.78^   0.68^   0.61^   0.53^   0.49^
Bar-Shalom/Campo    3.57*^  2.48*^  1.66^   1.18*^  0.94^   0.79^   0.68^   0.62^   0.54^   0.51^
Measurement Fusion  3.07*^  2.01*^  1.36*^  1.03*^  0.83^   0.73^   0.63^   0.59^   0.51^   0.47^
SCAAT Kalman Filter 3.10*^  2.27*^  1.30*^  1.09^   0.92^   0.76^   0.67^   0.62^   0.53^   0.49^
OWA                 3.53*^  3.05^   2.08^   1.87*^  1.82*^  1.76*^  1.77*^  1.77*^  1.74*^  1.74*^
Fuzzy Integral      3.54*^  2.68*^  1.82^   1.40^   1.06^   0.95^   0.82^   0.76^   0.69^   0.67^
Neural Network      1.49*^  1.35*^  1.32^   1.12^   0.85^   0.85^   0.70^   0.68^   0.55^   0.54^
Table A-2: Average Root of Mean Squared Error – Hwy 400 Link 1, Architecture 2
                    Probe Vehicles (%)
Method              0       1       5       10      15      20      25      30      35      40
Probe Vehicles      –       3.44^   1.80^   1.23^   0.96^   0.80^   0.68^   0.60^   0.53^   0.49^
Loop Detectors      7.11*   7.11*   7.11*   7.11*   7.11*   7.11*   7.11*   7.11*   7.11*   7.11*
Simple Convex       7.11*   3.17^   1.76^   1.22^   0.96^   0.80^   0.68^   0.61^   0.53^   0.49^
Bar-Shalom/Campo    7.11*   3.17*^  1.77^   1.23^   0.96^   0.80^   0.68^   0.61^   0.53^   0.50*^
Measurement Fusion  7.10*   4.60*^  1.50^   1.04^   0.84*^  0.74^   0.63*^  0.59^   0.51^   0.47^
OWA                 7.11*   3.02*^  1.75^   1.22^   0.97^   0.82^   0.70^   0.63^   0.54^   0.53^
Fuzzy Integral      7.11*   3.35^   1.92^   1.33^   1.09^   0.96^   0.83^   0.77^   0.68^   0.67^
Neural Network      1.45*^  1.54*^  1.19*^  0.99*^  0.86^   0.81^   0.65^   0.68^   0.56^   0.51^

                    Probe Vehicles (%)
Method              0       1       5       10      15      20      25      30      35      40
Probe Vehicles      –       4.42^   2.18^   1.36    1.08^   0.92^   0.80^   0.74^   0.63^   0.59^
Loop Detectors      1.71*   1.71*   1.71*   1.71    1.71*   1.71*   1.71*   1.71*   1.71*   1.71*
Simple Convex       1.69*   1.73*   1.45*   1.10*^  0.95*^  0.82*^  0.75*^  0.69*^  0.60*^  0.57*^
Bar-Shalom/Campo    1.66*   1.58*   1.37*^  1.07*^  0.93*^  0.81^   0.75^   0.69^   0.60^   0.56*^
Measurement Fusion  1.32*^  1.53*   1.26*   1.02*^  0.91*^  0.78*^  0.73*^  0.67*^  0.58*^  0.55*^
SCAAT Kalman Filter 1.40*   1.36*   1.44*   1.19^   0.99*^  0.85*^  0.77^   0.72^   0.62^   0.58^
OWA                 1.66*^  2.20*   1.44*   1.13*^  1.01^   0.89^   0.87^   0.82^   0.76^   0.74^
Fuzzy Integral      1.65*^  1.56*   1.34*^  1.08*^  0.93*^  1.08    0.74^   0.67*^  0.57*^  0.54*^
Neural Network      1.65*   1.74*   1.69    1.13*^  1.01^   0.93^   0.96    0.76^   0.65^   0.60^

                    Probe Vehicles (%)
Method              0       1       5       10      15      20      25      30      35      40
Probe Vehicles      –       4.42^   2.18^   1.36    1.08^   0.92^   0.80^   0.74^   0.63^   0.59^
Loop Detectors      1.71*   1.71*   1.71*   1.71    1.71*   1.71*   1.71*   1.71*   1.71*   1.71*
Simple Convex       1.71*   1.59*   1.37*^  1.06*^  0.93*^  0.81*^  0.75^   0.68*^  0.59^   0.56^
Bar-Shalom/Campo    1.71*   1.60*   1.37*^  1.06*^  0.93*^  0.81^   0.75^   0.68*^  0.59*^  0.56*^
Measurement Fusion  1.29*^  2.13*   1.27*   0.98*^  0.88*^  0.77*^  0.73^   0.67*^  0.58^   0.55*^
OWA                 1.71*   2.78*   1.69*   1.38^   1.29^   1.23^   1.24^   1.21^   1.19*^  1.19*^
Fuzzy Integral      1.71*   1.63*^  1.55*   1.40^   1.32^   1.28^   1.24^   1.21^   1.18^   1.18*^
Neural Network      1.78*   1.48*   1.37*   1.25    1.17    0.82^   0.79^   0.73^   0.69^   0.71^

                    Probe Vehicles (%)
Method              0       1       5       10      15      20      25      30      35      40
Probe Vehicles      –       3.18^   1.74^   1.16    0.93    0.79^   0.67^   0.63^   0.58^   0.50^
Loop Detectors      1.09*   1.09*   1.09*   1.09    1.09    1.09*   1.09*   1.09*   1.09*   1.09*
Simple Convex       1.08*   1.03*^  0.95*^  0.87*^  0.75^   0.68^   0.63^   0.59^   0.54^   0.47^
Bar-Shalom/Campo    0.76*^  0.76*^  0.69*^  0.66*^  0.58*^  0.54*^  0.52*^  0.49*^  0.45*^  0.41*^
Measurement Fusion  0.95*^  0.92*^  0.85*^  0.80*^  0.71^   0.65*^  0.61^   0.56^   0.53^   0.46^
SCAAT Kalman Filter 0.96*^  0.90*^  0.78*^  0.80*^  0.72*^  0.68*^  0.60*^  0.57*^  0.52*^  0.47^
OWA                 2.26*   1.50*   1.54    1.56    1.54    1.53    1.53    1.53    1.53    1.52
Fuzzy Integral      0.99*^  1.40*   1.18    0.99*   0.89^   1.10    0.84    0.80^   0.80    0.80
Neural Network      0.89*   0.90*   0.81*^  0.72*^  0.62*^  0.61*^  0.57^   0.53^   0.48*^  0.45^

                    Probe Vehicles (%)
Method              0       1       5       10      15      20      25      30      35      40
Probe Vehicles      –       3.18^   1.74^   1.16    0.93    0.79^   0.67^   0.63^   0.58^   0.50^
Loop Detectors      1.09*   1.09*   1.09*   1.09    1.09    1.09*   1.09*   1.09*   1.09*   1.09*
Simple Convex       1.09*   1.01*^  0.88*^  0.80*^  0.66*^  0.60*^  0.55*^  0.52*^  0.48*^  0.43*^
Bar-Shalom/Campo    1.09*   1.01*   0.88*^  0.80*^  0.66*^  0.60*^  0.55*^  0.52*^  0.48*^  0.43^
Measurement Fusion  0.98*^  2.33*^  0.93*^  0.75*^  0.64*^  0.58*^  0.53*^  0.50*^  0.47*^  0.41*^
OWA                 1.09*   2.20*^  1.24*   1.08    1.00^   0.98*   0.97*   0.96*   0.92    1.00*^
Fuzzy Integral      1.09*   1.09*   1.03*   1.04    1.01^   1.01*^  1.00*^  1.00*^  1.00*^  0.99*^
Neural Network      0.84*^  4.99    0.75*^  0.71*^  0.62*^  0.56*^  0.56^   0.54^   0.48^   0.42^

                    Probe Vehicles (%)
Method              0       1       5       10      15      20      25      30      35      40
Probe Vehicles      –       4.13    2.10    1.43^   1.26^   1.19    0.98^   0.91^   0.87^   0.79^
Loop Detectors      4.46*   4.46    4.46    4.46*   4.46*   4.46    4.46*   4.46*   4.46*   4.46*
Simple Convex       3.77*^  3.17    2.14^   1.50^   1.32^   1.35^   1.09^   1.01^   0.98^   0.88^
Bar-Shalom/Campo    2.35*^  2.09*^  1.66^   1.34^   1.16^   1.27^   0.94^   0.95^   0.93^   0.83^
Measurement Fusion  3.77*^  3.16    2.06^   1.47^   1.29^   1.36^   1.09^   1.00^   0.98^   0.88^
SCAAT Kalman Filter 3.80*^  2.87^   1.68^   1.31^   2.30*^  1.36    1.42*^  1.57    1.18^   0.79^
OWA                 2.62*   2.21*   1.56^   1.37^   1.34^   1.22^   1.20^   1.18^   1.09^   1.08^
Fuzzy Integral      2.10*^  1.98*^  1.55^   1.86    1.73*^  1.35^   1.25^   1.13^   1.47    1.37^
Neural Network      2.45*^  3.38    2.30    3.26    2.35    2.48    1.68    2.11    3.18    2.09

                    Probe Vehicles (%)
Method              0       1       5       10      15      20      25      30      35      40
Probe Vehicles      –       4.13    2.10    1.43^   1.26^   1.19    0.98^   0.91^   0.87^   0.79^
Loop Detectors      4.46*   4.46    4.46    4.46*   4.46*   4.46    4.46*   4.46*   4.46*   4.46*
Simple Convex       4.46*   3.56    2.13^   1.48^   1.31^   1.34^   1.07^   0.97^   0.95^   0.85^
Bar-Shalom/Campo    4.46*   3.65    2.14^   1.47^   1.29^   1.28    1.00^   0.93^   0.94^   0.83^
Measurement Fusion  4.49*   6.95    2.83    1.51^   1.65    1.51    1.33*^  1.75^   1.32^   0.85^
OWA                 4.46*   3.71^   3.01    2.08^   2.09^   1.94^   2.01^   1.85^   1.60^   1.72^
Fuzzy Integral      4.46*   3.73^   2.90^   2.44^   2.29^   2.25^   2.19^   1.93^   1.83^   1.52^
Neural Network      10.98*  2.90    4.85    2.08^   2.61    2.91    2.62    1.24^   2.43    1.27^
Appendix B
Highway 401 Statistical Significance Tests
(*) denotes that the error of the algorithm is statistically different from that obtained using only
Bluetooth probe vehicles, according to Student’s t test at the 5% level of significance. (^) denotes
that the error of the algorithm is statistically different from that obtained using only loop detectors
aggregated by the midpoint method, according to Student’s t test at the 5% level of significance.
Note the following conventions: “A1 LD” denotes that architecture 1 is used with only loop
detectors, “A1 LD+BT” denotes that architecture 1 is used with loop detectors and Bluetooth data,
and the same notation applies to architecture 2 (“A2 LD” and “A2 LD+BT”).
For cases when only loop detectors are used (“A1 LD” and “A2 LD”): cells shaded in green are
fusion results that are statistically better than using only loop detectors (midpoint method). Cells
shaded in red are fusion results that are statistically worse than using only loop detectors
(midpoint method).
For cases when loop detectors and Bluetooth data are used (“A1 LD+BT” and “A2 LD+BT”):
cells shaded in green are fusion results that are statistically better than using either loop detectors
or probe vehicle measurements independently. Cells shaded in yellow are statistically better than
using one of loop detectors or Bluetooth but are not statistically different from using the other of
the two. Cells shaded in red show a result that is statistically worse than using the best sensor
independently.
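The shading rules for the combined loop detector and Bluetooth columns, which follow the same logic as those in Appendix A, can be written as a small decision function. This is a sketch: the function name, argument names, and the "none" label are hypothetical, and the two significance flags are assumed to come from the test named above, marked in the tables by (*) and (^). The loop-detector-only columns use a simpler rule that is not covered here.

```python
def shade(fused, bluetooth, loop, diff_from_bt, diff_from_loop):
    """Classify one fused RMSE cell following the appendix shading rules.

    fused, bluetooth, loop: RMSE values (lower is better)
    diff_from_bt:   True if the fused error differs significantly from
                    the Bluetooth-only error (marker *)
    diff_from_loop: True if it differs significantly from the loop-only
                    midpoint error (marker ^)
    """
    better_than_bt = diff_from_bt and fused < bluetooth
    better_than_loop = diff_from_loop and fused < loop
    if better_than_bt and better_than_loop:
        return "green"
    # Red: significantly worse than the better of the two single sensors.
    if bluetooth <= loop:
        worse_than_best = diff_from_bt and fused > bluetooth
    else:
        worse_than_best = diff_from_loop and fused > loop
    if worse_than_best:
        return "red"
    # Yellow: better than one sensor, indistinguishable from the other.
    if (better_than_bt and not diff_from_loop) or (better_than_loop and not diff_from_bt):
        return "yellow"
    return "none"


# OWA entries from Table B-1 (Bluetooth only 1.98673, loop detectors only 4.39975).
print(shade(1.81296, 1.98673, 4.39975, True, True))   # A2 LD+BT, marked *^
print(shade(1.82378, 1.98673, 4.39975, False, True))  # A1 LD+BT, marked ^
```

Applied to the OWA entries of Table B-1, the rule classifies the A2 LD+BT value as green and the A1 LD+BT value as yellow.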
Table B-1: Average Root of Mean Squared Error – Hwy 401 Link 1
Method              No Fusion  A1 LD      A2 LD      A1 LD+BT   A2 LD+BT
Bluetooth Only      1.98673^   –          –          –          –
Loop Detectors Only 4.39975*   –          –          –          –
Simple Convex       –          4.0719*    4.39975*   1.85796^   1.8818^
Bar-Shalom/Campo    –          4.03966*   4.39975*   1.8838^    1.89323^
Measurement Fusion  –          3.79486*   4.20585*   1.85256^   1.86824^
SCAAT KF            –          3.87638*   n/a        3.60843*   n/a
OWA                 –          2.97015*^  4.39975*   1.82378^   1.81296*^
Fuzzy Integral      –          2.91006*^  4.39975*   1.73943^   1.87357^
Neural Network      –          2.89404    2.70712^   2.43537^   1.92409^

Method              No Fusion  A1 LD      A2 LD      A1 LD+BT   A2 LD+BT
Bluetooth Only      4.04698    –          –          –          –
Loop Detectors Only 3.14415    –          –          –          –
Simple Convex       –          2.92728    3.14415    2.59538*   3.00884*
Bar-Shalom/Campo    –          2.69261*   3.14415    2.34069*   3.06904
Measurement Fusion  –          2.52819^   2.85365    2.35143*   2.86851*
SCAAT KF            –          2.45003*^  n/a        2.44254*^  n/a
OWA                 –          3.16662    3.14415    2.4638*    2.70951*
Fuzzy Integral      –          2.70886*   3.14415    2.42011*   2.41339*
Neural Network      –          3.34506    3.2136     2.21504*   2.60549*

Method              No Fusion  A1 LD      A2 LD      A1 LD+BT   A2 LD+BT
Bluetooth Only      1.99677^   –          –          –          –
Loop Detectors Only 2.63665*   –          –          –          –
Simple Convex       –          2.86825*   2.63665*   1.68115^   1.57889*^
Bar-Shalom/Campo    –          3.16197    2.63665*   1.86442    1.5941*^
Measurement Fusion  –          2.29225    2.26661    1.52741*^  1.46968*^
SCAAT KF            –          2.31614    n/a        2.26261    n/a
OWA                 –          2.66507    2.63665*   1.8907^    1.6202^
Fuzzy Integral      –          2.87524    2.63665*   2.38624    1.70776^
Neural Network      –          2.48191    2.35177    2.10491    2.28601

Method              No Fusion  A1 LD      A2 LD      A1 LD+BT   A2 LD+BT
Bluetooth Only      3.4912     –          –          –          –
Loop Detectors Only 2.46275    –          –          –          –
Simple Convex       –          2.55694    2.46275    1.65456*^  1.62609*^
Bar-Shalom/Campo    –          2.51644    2.46275    1.61319*^  1.60541*^
Measurement Fusion  –          2.28082    2.1999     1.55103*^  1.51266*^
SCAAT KF            –          2.56259    n/a        2.53905    n/a
OWA                 –          2.41351    2.46275    1.78738*^  1.65018*^
Fuzzy Integral      –          2.64966    2.46275    1.58807*^  1.63632*^
Neural Network      –          2.07812    1.32559*^  2.02439    1.74791*^


