IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 12, NO. 3, MARCH 2017
Abstract: Session initiation protocol (SIP) is an application layer protocol used for signaling purposes to manage voice over IP connections. Being a text-based protocol, SIP is vulnerable to a range of denial of service (DoS) attacks. These DoS attacks can render SIP servers and SIP proxy servers unusable by depleting memory and CPU time. In this paper, we consider two types of DoS attacks for detection, namely flooding attacks and coordinated attacks. Flooding attacks affect both stateless and stateful SIP servers, while coordinated attacks affect stateful SIP servers. We model the SIP operation as a discrete event system (DES) and design a new state transition machine, which we name probabilistic counting deterministic timed automata (PCDTA), to describe the behavior of SIP operations. We also identify different types of anomalies that can occur in a DES model, which appear in the form of illegal transitions, violated timing constraints, and events appearing in numbers not otherwise seen. Subsequently, we map various DoS attacks in SIP to a type of anomaly in DES. PCDTA can learn the probabilities of various transitions and timing delays from a set of nonmalicious training sequences. A trained PCDTA can detect anomalies, and hence various DoS attacks in SIP. We perform a thorough experiment with computer simulated SIP traffic and report the detection performance of PCDTA on various attacks generated through custom scripts.

Index Terms: Communication system security, computer security, network security.

I. INTRODUCTION

Voice over IP (VoIP) is an economical alternative for telephone communication compared to traditional Public Switched Telephone Network (PSTN) communication. In VoIP communication the voice conversation data is sent using IP packets over the Internet. A typical voice call involves two phases: signaling and data transmission. Signaling is used to establish and maintain the end-to-end VoIP call; the actual data transmission usually happens in a different session. VoIP can use a range of protocols (H.323, SIP) for signaling purposes. Session Initiation Protocol (SIP) is an application layer signaling protocol for VoIP communication. It is used to establish, modify and terminate multimedia sessions between two VoIP clients, also called user agents. It is also used to request and deliver clients' presence, and to send and receive instant messages between clients. SIP server(s) and/or SIP proxy servers mediate the session initiation and termination between two clients. Being a text-based protocol designed to work with both UDP and TCP, SIP is vulnerable to a range of security attacks [1], particularly Denial of Service (DoS) attacks. DoS attacks include various SIP flooding attacks such as BYE flooding, INVITE flooding, multi-attribute flooding, etc. Recently it has been discovered that a number of coordinated attacks [2] can also be mounted on SIP (proxy) servers and user agents, creating DoS. These flooding and coordinated attacks can completely cripple the communication between VoIP servers, rendering them unusable. Hence detecting these attacks is important.

There are mitigation and detection techniques [3] proposed in the literature for protecting SIP operation from various types of DoS attacks. Many of these methods propose cryptographic extensions to secure SIP, use a rule based engine to detect malformed SIP messages (which sometimes cause DoS), or propose a machine learning algorithm to predict different DoS attacks. Most of the prior works focus on detecting specific types of DoS attack. In this paper we propose a generic formal framework which is custom designed to describe SIP's operational behavior and use it to detect different types of DoS attacks. In particular we make the following specific contributions in this paper.

1) We consider the SIP operation sequence as a Discrete Event System (DES). Subsequently we develop a probabilistic timed transition model (PCDTA) to characterize SIP event sequences and their timings.
2) We also propose to learn the transition and delay probabilities of various events of the state transition diagram from a set of known nonmalicious SIP event sequences, thus making learning automatic, which is otherwise done manually.
3) We identify a range of anomalies that can occur in any timed DES model and map these anomalies to various DoS attacks in SIP.
4) We use the timed transition model as an anomaly detection system to detect anomalies arising as a consequence of illegal transitions, abnormal timings, and abnormal numbers of a particular message type, which help detect different SIP attacks.

II. SIP OVERVIEW

SIP has a distributed architecture. It includes the following entities.

User agent: These are VoIP phones with a valid URI (user name used by a user). Multimedia sessions are set up and terminated between user agents.

Registrar server: User agents register with a registrar server when they connect to the network and also update their registrations periodically.

Manuscript received February 1, 2016; revised June 27, 2016, September 8, 2016, and November 14, 2016; accepted November 14, 2016. Date of publication November 23, 2016; date of current version January 18, 2017. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Wanlei Zhou.
D. Golait was with IIT Indore, Indore 453552, India. She is now with R&D, Microsoft India, City - Hyderabad (Telangana) 500032, India (e-mail: digola@microsoft.com).
N. Hubballi is with IIT Indore, Indore 453552, India (e-mail: neminath@iiti.ac.in).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIFS.2016.2632071
1556-6013 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
GOLAIT AND HUBBALLI: DETECTING ANOMALOUS BEHAVIOR IN VoIP SYSTEMS: A DES MODELING 731
4) Multi-Attribute Flooding: In this attack a large number of messages using all four SIP message types (INVITE, BYE, RINGING, OK) are sent to overwhelm the SIP server. An attacker can cleverly craft the attack to balance the ratio of each type of message, rendering ineffective anomaly detection methods like [5] and [6], which use the imbalance between different message types to detect DoS flooding attacks.

B. Coordinated SIP Attacks

Coordinated SIP attacks are carried out by colluding users who have registered in the VoIP service. The following are the two prominent attacks.

1) Ringing Attacks: These attacks attempt to create incomplete transactions at the server with host cooperation [2]. The attacker sends INVITE requests to known peers, which reply with provisional (1xx) messages like RINGING messages but do not accept the call. Instead the callee just repeatedly sends these provisional RINGING messages. This can prolong the lifetime of a transaction to several minutes. A large number of such incomplete transactions will cause DoS. The attacker needs to send fewer requests (compared to flooding cases) to deplete the memory of the SIP server. Ringing attacks affect both types of stateful proxy servers.

2) Prolonged Calls: In these attacks, the attackers exchange with the SIP server the same initiation messages as in a normal SIP session. But once the call is established, the attackers stay in the call for as long as they can or until a server interrupt occurs. Prolonged call attacks affect only call stateful proxy servers.

IV. DISCRETE EVENT SYSTEMS

A discrete event system is characterized by the following three properties.

1) Discrete States: The system can be in any one of a finite number of states. The state in which the system currently is indicates the status of the system. The most basic status used to detect anomalies indicates either a normal or an anomalous state.
2) Dynamic: The new state in which the system stays is dependent on the current state.
3) Event-driven: The change in the system state is completely driven by events occurring at certain times.

V. TYPES OF ANOMALIES

In this section we adopt and describe a few types of anomalies seen in any DES [7], with a few of our own additions. We also establish a mapping between these anomaly types and the VoIP based SIP threat vectors described previously.

1) Anomalous Event: An anomalous event is an event which causes the system to move from its current state to a state which is not a regular next state of the current state.
For example, a BYE message must appear only after a successful INVITE initiated dialogue. However, any event with a BYE message without corresponding INVITE, RINGING and OK messages, with an arbitrary call-ID, is an anomalous event. These types of messages appear in case of a BYE flooding attack.

2) Anomalous Path: It may be the case that an individual event is normal, but a sequence of events taken together signifies anomalous behavior.
For example, the SIP system can be in the state RINGING received, after which it accepts an OK message. However, occasionally messages do get lost in the network, or due to errors the end hosts retransmit such messages. Thus it is possible, while in the RINGING received state, to receive another RINGING message with the same call-ID. However, in case of a coordinated RINGING attack, the system may repeatedly receive RINGING messages while in the RINGING received state, unnecessarily prolonging the call setup process.

3) Non Occurrence of Events/Stalled Progress: As the system changes its state or makes progress only after receiving events, it is anomalous not to receive messages to further the progress. This can happen for various reasons, for instance when connectivity goes down for a node responsible for sending that message. Nevertheless, the frequency of such cases may raise suspicion and needs to be detected.
For example, in a VoIP system a peer may deliberately stop sending the next expected OK message after receiving an INVITE message. If such events occur in large numbers, the proxy server or SIP server may experience a resource crunch and become a victim of DoS.

4) Anomalous Timing of Events: Normally the next expected event should occur within a time period after the current event's timing. If the next event takes unusually longer than the expected timing, it is considered an anomalous timing event.
For example, in RINGING attacks, after receiving the initial INVITE message the peer entity deliberately delays sending the RINGING message.

5) Anomalous Sample Timing Path: The timing of a path may be anomalous even if every event's timing is normal, if the aggregate timing of the sequence of events is not within a previously identified range.
For example, in a coordinated SIP attack, calls are prolonged unnecessarily by delaying messages and repeatedly sending some of the messages.

In Section VIII we show that all the SIP DoS attacks described previously can be detected if these anomalies are identified in a DES model of SIP.

VI. PROPOSED DES MODEL

In this section we formally define a DES model which we use to describe the behavior of VoIP based SIP communication. We treat SIP events as timed sequences, indicating that every event occurs in the system at a discrete time. Figure 2 and Figure 3 show the timed events appearing in the INVITE dialogue and the REGISTER operation of a user agent, as observed at the SIP server and Registrar server respectively.

In order to characterize these timed events we propose a state transition machine as a DES model. One of the motives for this novel state transition model is that it should detect all types
$q_i$: Current state
$q_j$: Next state
$c$: Counter value at the state $q_i$
$\sigma$: Input symbol
(t): Boolean conjunction of constraints on a subset of clock variables $t \in T$ to be satisfied
Reset(t): A subset of clock variables $t \in T$ to be reset on this transition
Inc(c): A function which maps the current state of the counter to a new value

Fig. 2. Timing diagram showing basic path of a SIP call sequence.

TABLE I
TRANSITION TABLE FOR INVITE DIALOGUE PCDTA

TABLE II
TRANSITION TABLE FOR REGISTER EVENT PCDTA
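The transition components listed above (current state, input symbol, timing constraint, clock reset, counter update, next state) can be pictured as a small record type. The following is only a minimal sketch under stated assumptions, not the paper's implementation: the names `Transition`, `step` and `increment`, the clock name `t1`, and the 5-second guard are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple


def increment(counter: int) -> int:
    """Assumed counter update: add one visit."""
    return counter + 1


@dataclass(frozen=True)
class Transition:
    source: str                                  # q_i, current state
    symbol: str                                  # input symbol, e.g. "RINGING"
    target: str                                  # q_j, next state
    guard: Callable[[Dict[str, float]], bool]    # constraint on clock values
    reset: FrozenSet[str] = frozenset()          # clocks reset when firing
    inc: Callable[[int], int] = increment        # counter update function


def step(
    transitions: List[Transition],
    state: str,
    symbol: str,
    clocks: Dict[str, float],
    counter: int,
) -> Optional[Tuple[str, Dict[str, float], int]]:
    """Fire the first matching transition; None means no legal transition."""
    for tr in transitions:
        if tr.source == state and tr.symbol == symbol and tr.guard(clocks):
            new_clocks = {
                name: (0.0 if name in tr.reset else value)
                for name, value in clocks.items()
            }
            return tr.target, new_clocks, tr.inc(counter)
    return None


# Hypothetical example: q1 --RINGING--> q2, allowed only within 5 seconds.
example = [
    Transition("q1", "RINGING", "q2", lambda ck: ck["t1"] <= 5.0, frozenset({"t1"}))
]
```

In this sketch a `None` result covers both an illegal symbol and a violated timing constraint; a fuller model would distinguish the two, since the text treats them as different anomaly types.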
as INVITE received state. After receiving the first INVITE the proxy server may send some redirect response. However, for the sake of brevity we have ignored these cases in the modeling, as this does not affect the ultimate detection ability of the machine. Further, if authentication is enabled then 401 or 407 responses may be sent to the UAC. While in the INVITE received state ($q_1$), if the proxy server receives any INVITE retransmission or 401 or 407 responses, the machine remains in the same state.² If any error message appears (other 4xx, 5xx or 6xx), the machine goes to the Error (E) state from state $q_1$, terminating the existing dialog.³ If the proxy server successfully accepts the INVITE request, it forwards the INVITE request to the corresponding User Agent Server (UAS), or the callee user agent, and sends a provisional TRYING message to the UAC. When the UAS completely understands the INVITE message it sends a RINGING message to the UAC. When the proxy receives a RINGING message from the UAS, the machine makes a transition to the state $q_2$, which denotes RINGING message received for that dialogue. If the UAS wishes to reject the request it sends an error response to the proxy server (one of 4xx, 5xx, 6xx) and the state machine moves to the Error (E) state. If the UAS wishes to accept the request it sends a 200 OK response to the proxy, and the machine reaches state $q_3$. In either case, the proxy forwards the response to the UAC. If the UAC receives a 200 OK response, it sends an ACK to the proxy and the machine proceeds to state $q_4$.

² Similar retransmission provisions are there at all the states except the start and final states.
³ A similar exit option is available from every state other than the start and final states.

After the previous message sequence the media session begins. The media session can last for an arbitrary time; moreover, it happens in a separate session which we do not model here. Hence, the proxy waits for a BYE event from either the UAC or the UAS, which indicates the end of the media session. When it receives a BYE message, the machine goes to the state $q_5$. The other party then sends a corresponding 200 OK to the proxy, and the machine reaches the final state $q_6$. The INVITE initiated call event is hence terminated.

b) REGISTER operation: Similar to the previous case, we define a PCDTA for the REGISTER operation as below.

$Q = \{q_7, q_8, q_9, E, S\}$
$q_7$ is the start state
$F = \{q_9\}$ is the final state
$\Sigma = \{REGISTER, OK, 2xx, 3xx, 4xx, 5xx, 6xx\} \cup \{\epsilon\}$
$T = \{t_1\}$
$C = \{c_7, c_8, c_9, c_E, c_S\}$
$\delta$ is the transition set, defined as shown in Table II.

The machine at the registrar is initially at state $q_7$. When the registrar receives a REGISTER request from a UAC, the machine goes to state $q_8$. If any retransmissions, authentication requests or redirect requests occur, the machine stays in the same state. If the request is accepted the registrar responds with a 200 OK response, and the machine goes to the final
state $q_9$. If any error occurs, the machine goes to the Error (E) state.

In both Figure 4 and Figure 5 there is a transition from all the states, except the start and final states, to a state labeled S. This is an epsilon ($\epsilon$) transition triggered when the delay between the previous event and the next event exceeds a threshold set on each of these transitions. How to learn and set these delay constraints is discussed subsequently. It should be noted that even the $\epsilon$ transition happens deterministically and hence does not violate the definition. Also, for brevity we do not show transitions from each state for each input symbol, as all such missing transitions should be treated as illegal transitions.

B. Learning Transition Probability

As defined in the previous section, each transition probability indicates the likelihood of the system changing state from $q_i$ to $q_j$ upon occurrence of a symbol $\sigma$. This probability has two components: $p(\sigma/c)$, the prior probability of input $\sigma$ at the present counter value $c$, and $p(c)$, the probability of the counter taking a certain value at a state. We learn these two probabilities by treating them as independent. The counter indicates the number of times the state $q_i$ has been visited by the repeated occurrence of events in the system. Thus the probability of a transition is the product of $p(\sigma)$ and $p(c)$. The counter here indicates how anomalous it is to revisit a state with repeated occurrence of an event. Usually, the higher the counter value, the lower its probability, indicating a rare event.

We derive the value of $p(c)$ (the probability value of the counter) by treating it as a discrete random variable and assigning a probability distribution to it. In particular we use the Poisson distribution [13], calculated using the mean value of the number of times a state has been visited. The probability mass function of a Poisson random variable $c$ taking value $n \in \mathbb{N}$ is given by Equation 4, where $\lambda$ represents the mean value of counter $c$ and $e$ is Euler's number ($e = 2.71828$).

$$p(c = n) = \frac{e^{-\lambda}\,\lambda^{n}}{n!} \tag{4}$$

To learn the transition probability $p(\sigma)$ we use a sequence of strings of language $L$. Let $S_1, S_2, \ldots, S_m$ (with $m \geq 1$) be a set of sequences from the training set (strings of $L$), with each string of the form $S_i = \langle \sigma_1, t_1 \rangle, \langle \sigma_2, t_2 \rangle, \ldots, \langle \sigma_n, t_n \rangle$ being a sequence of timed events (input symbols with their timing of occurrence). To learn the probability of event $\sigma$, we use only the event sequences (ignoring timing) of transitions. Each such sequence of events represents a path from the start state to a final state, where each edge label is a symbol name. Thus each internal state of the PCDTA represents a prefix of a string. For each edge $q_i$ to $q_j$, we count the number of such transitions in all strings. Then the probability of a transition is the ratio of the number of edges going to state $q_j$ to the total number of edges from $q_i$ to any other state $q_k$ on any symbol $\sigma \in \Sigma$. This is given by Equation 5.

$$p_{ij}(\sigma) = \frac{\sum_{t=1}^{M} \delta(q_i, q_j, \sigma)}{\sum_{t=1}^{M} \sum_{y=1}^{|\Sigma|} \delta(q_i, q_k, \sigma_y)} \tag{5}$$

C. Learning Transition Timing Probability

After the previous step of learning, we have a transition diagram with various events and their respective probabilities. In the second step we learn the timing probability on transitions. In order to do this we use the timing of the various events of the sequences. For a particular event of a sequence $S_k$ we deduce the time difference between the previous event occurrence and the current event timing. An example sequence is $S_k = \langle \sigma_1, NA \rangle, \langle \sigma_2, et_2 \rangle, \ldots, \langle \sigma_n, et_n \rangle$. In this converted sequence the first input symbol does not have a time delay constraint, as it can occur at any time. From the second symbol onward, $et_r$ indicates the observed time delay of symbol $\sigma_r$ from the previous occurrence of event $\sigma_{r-1}$. After converting all the sequences into this format we find the mean time delay of a particular event among all the $M$ sequences of the training set. Using this mean value of the time difference we subsequently fit a probability distribution function (Poisson distribution) which best describes these sequences of time differences. The probability distribution function has its maximum probability around the mean and decreases as the delay increases, which indicates that a longer delay is more anomalous. The choice of the Poisson distribution is justified by the factors affecting the timing of events. Timings of different events depend on the round trip delay between two machines. This includes many parameters like network congestion, number of hops traversed and processing delay at a node. These parameters show randomness without very significant variation. Hence the time differences will also be random with not much variation and are best described with a Poisson distribution.

VIII. SIP ATTACK DETECTION

In this section we describe how the PCDTA model is used to detect the various SIP DoS attacks described earlier. As mentioned in Section III, we mainly deal with two types of SIP attacks, namely flooding and coordinated attacks. The next two subsections outline how these two attacks can be detected.

A. Detecting SIP Flooding Attacks

In case of a flooding DoS attack, a SIP server or proxy server is targeted with many messages of a particular type. For example, in case of INVITE flooding a large number of INVITE messages are sent either from a single source or from multiple sources. Each such message creates an instance of the machine; however, these transactions fail to make progress after the initial transition, as the source is not interested in completing the INVITE dialogue. Hence all of these transactions will time out (with a transition to state S). Thus, in order to detect this flooding attack, it is sufficient to count the number of transactions terminated from the INVITE received state to the timeout state and compare this count with a threshold value (how to select this threshold is described in Section IX-C). A similar observation can be made for the REGISTER transaction (with valid URI and authentication enabled), where a large number of timeouts are also seen, as the malicious user is not interested in completing the transaction. Algorithm 1 describes the method for detecting INVITE and REGISTER flooding (with random URI) attacks.
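The learning steps of Equations 4 and 5 and the timeout counting idea of Section VIII-A can be illustrated with a short sketch. It is not the paper's Algorithm 1: the function names, the representation of a path as `(source, symbol, target)` triples, and the choice to flag a window only when its count strictly exceeds the threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def poisson_pmf(n, mean):
    """Poisson probability (Equation 4 form) that a counter equals n."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def learn_transition_probabilities(paths):
    """Estimate edge probabilities as count(edge) / count(edges leaving source).

    Each path is a list of (source_state, symbol, target_state) triples
    taken from a nonmalicious training sequence, timing ignored.
    """
    edge_counts = Counter()
    outgoing_counts = Counter()
    for path in paths:
        for source, symbol, target in path:
            edge_counts[(source, symbol, target)] += 1
            outgoing_counts[source] += 1
    return {
        edge: count / outgoing_counts[edge[0]]
        for edge, count in edge_counts.items()
    }


def flooding_windows(timeout_times, window, threshold):
    """Return indices of time windows whose timeout count exceeds threshold.

    timeout_times holds the timestamps (in seconds) at which a dialogue
    moved to the timeout state S; window is the window length in seconds.
    """
    per_window = Counter(int(t // window) for t in timeout_times)
    return sorted(w for w, count in per_window.items() if count > threshold)


# Hypothetical training paths for the INVITE dialogue.
training_paths = [
    [("q0", "INVITE", "q1"), ("q1", "RINGING", "q2")],
    [("q0", "INVITE", "q1"), ("q1", "4xx", "E")],
]
probabilities = learn_transition_probabilities(training_paths)
```

With the two hypothetical paths above, the edge from `q1` on RINGING gets probability 0.5, because half of the edges leaving `q1` in the training data take it.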
TABLE III
MEAN COUNTER VALUES. (a) CALL PCDTA. (b) REGISTRATION PCDTA

Fig. 7. Poisson distribution graphs for counters at states from $q_1$ to $q_6$ of INVITE dialogue. (a) Distribution C1. (b) Distribution C2. (c) Distribution C3. (d) Distribution C4. (e) Distribution C5. (f) Distribution C6.

2) Learning Transition Probability: As mentioned in Section VI, the transition probability is the ratio of the number of transitions to a particular next state to the number of transitions to all possible next states of a particular state. Using this method we derive the transition probability of all the transitions for both the INVITE dialogue and the REGISTER transaction. Table IV shows the transition probabilities for the various transitions of the two PCDTAs, learnt from the transaction sequences. All illegal transitions (not shown here) have probability 0.

3) Learning Transition Delay Probability: In order to learn the timing delay probabilities for the various transitions, we calculated the mean delay between successive events of the sequences. Using these mean values we again fit a Poisson distribution for every transition delay probability. Tables V(a) and V(b) show the average delay observed between successive events in the sequences of the training set. For the sake of brevity we do not show the corresponding probability distribution graphs here (they can be drawn by plugging the mean delay value into the Poisson distribution equation).

TABLE V
MEAN TIMING DELAY VALUES. (a) TRANSITION DELAY FOR INVITE PCDTA. (b) TRANSITION DELAY FOR REGISTER PCDTA

4) Setting Threshold on Timing Constraints for Transitions: In PCDTA, transitions are constrained by timing of occurrence, and the machine will wait for the next event in a particular state only until this timer expires, upon which an epsilon transition is activated which terminates the dialogue with a time expired message. An appropriate threshold for the time guard is also learned from the event training sequences. It is worth noting that there is no timing constraint from the state $q_4$ (which represents that media transmission is on and the machine is waiting for a BYE event), as media transmission can last an arbitrary amount of time and it would be wrong to put a limit on how long the users can stay in a call. These thresholds are calculated from the probability distributions of transition timing delays using Equation 6. In this equation, $\sigma_j$ is the standard deviation, $t_j^{mean}$ is the mean of distribution $j$, and the multiplier $k$ is an experimental value. We chose $k$ to be 100 for our experiment. This value is derived experimentally so as to minimize false stalled state detection. We do a sensitivity analysis for different values of $k$ in subsection IX-E.

$$f(\delta_i) = t_j^{mean} + k\,\sigma_j \tag{6}$$

We get these threshold values from the mean values calculated in Table V. We declare a state $q_i$ stalled when the maximum timing delay threshold amongst all the transitions from $q_i$ is crossed.

C. Setting Threshold for Attack Detection

We generated and used one more day's SIP transaction sequences to derive and set various thresholds for attack detection. The rationale behind this is to use an independently generated dataset to set appropriate thresholds. As explained in Section VIII, INVITE and REGISTER flooding attacks are detected by counting the number of timeout transitions in a time interval T; however, in the training dataset there were no timeout or illegal transitions. In order to derive thresholds for these cases we generated traffic by randomly deleting a random number of relevant transitions so that timeout
TABLE VIII
FLOODING ATTACK DETECTION RESULTS

TABLE IX
COORDINATED ATTACK DETECTION RESULTS

Prolonged call attacks: The bots could now stay in a call for unusually long times. The call times allowed were between 20 and 70 minutes, which is exceptionally long for normal users. The average call time for the training set was 10 minutes in our experiments (generated by transmitting recorded media files of user conversations).

For coordinated attack detection we selected 500 normal sequences (from the testing dataset) and mixed them with sequences of coordinated attacks. Table IX shows the number of sequences, Recall and Accuracy of PCDTA. Unlike the flooding cases, here the true positives, true negatives, false positives and false negatives are counted per sequence rather than per interval, as a coordinated attack is detected for every sequence.

E. Threshold Parameter Sensitivity Analysis

As mentioned previously, we identify timeout transitions by multiplying the standard deviation of transition timings by a multiplier and adding the result to the mean of the timing values of events. Hence the value of this multiplier affects the detected number of stalled states. It is thus essential to choose its value such that we minimize the number of falsely identified stalled states. We performed a sensitivity analysis on different values of the multiplier, from 10 to 200, using the average number of stalled state counts in a window period from the training sequences. We used 3 types of dataset to study the sensitivity of false detection to the multiplier value: 950 normal INVITE sequences (newly generated), the entire dataset used for the flooding attack case (Table VIII, which has both normal and different flooding scenarios), and the coordinated attack dataset (Table IX). The percentages of falsely identified stalled states⁶ when tested with these 3 types of datasets are shown in Table X. We can notice that the number of falsely identified stalled states lessens as the multiplier approaches 100. Beyond 100 it showed no improvement, hence we used a multiplier value of 100.

⁶ These are not false alarms. False alarms are generated when an interval has the threshold number of timeouts.

As mentioned previously, in order to detect flooding attacks we count the number of timeout transitions or illegal transitions in a window period, and to detect coordinated attacks we use the path transition probability (per transition sequence). It is worth noting that the detection performance of PCDTA is governed by the thresholds on these two values.

1) For flooding detection, a threshold value which is too conservative may detect all flooding attack instances, but it may generate too many false detection cases. On the other hand, a threshold which is too large may miss many genuine flooding intervals. We experimented with different threshold multipliers using the flooding dataset (the entire dataset used for Table VIII) and chose a threshold multiplier of 2, balancing the recall and false alarm rate as shown in Table XI.

2) For detecting coordinated attacks, the threshold multiplier is set for the transition sequence probability directly, which keeps decreasing as more repeated transitions are observed. Thus any sequence probability which is lower than the threshold probability is detected as an attack (since values less than 1 get multiplied). In order to set an appropriate value which minimizes the chances of false alarms, we experimented with different values for the threshold multiplier, ranging from 1 to 3 in steps of 0.5. Table XII shows the detection performance and false alarm rate generated by PCDTA with different threshold values (each is a multiplier of the average number of such transitions in normal intervals) when tested with 1000 instances of normal sequences and 425 instances of each coordinated attack type (as in Table IX). We can notice that for a threshold multiplier of 2 we get 100% recall and an acceptable false alarm rate of (4 + 9)/1000 × 100 = 1.3%.

F. Comparison of PCDTA With Other Flooding Detection Method

In this subsection we report the comparison of PCDTA with one recent work on flooding attack detection in
TABLE XI
P ERFORMANCE OF PCDTA W ITH D IFFERENT T HRESHOLDS FOR
T IMEOUT /I LLEGAL T RANSITION C OUNT
false alarm rate for our method and that of [20]. False attacks
were detected using this method [20] mainly in two cases, first
TABLE XII when the legitimate call could not be completed because the
P ERFORMANCE OF PCDTA W ITH D IFFERENT T HRESHOLDS FOR PATH other user was busy and other, when the number of packets
P ROBABILITY in the normal sequences exceeded the maximum threshold,
which may occur in case of retransmissions.
In literature there are other works based on Machine
learning and anomaly detection techniques to detect flooding
attacks. An advantage of PCDTA over other techniques is, it
is generic and can detect both flooding and coordinated types
of DoS attacks, whereas the methods proposed in literature are
TABLE XIII
for specific attack types. Machine learning technique require
F LOODING ATTACK D ETECTION C OMPARISON W ITH [20]
large number of features to be extracted and training with
each attack type which is often difficult. Hellinger distance
based technique [6] use the correlation between different SIP
messages for detecting attack, while this may be evaded by
balancing the number of these messages cleverly. PCDTA
doesnt need training with any attack type and can detect
all cases of DoS. A comparison on detection capability of
different methods is shown in Table XIV.
prior art. For comparison purposes, we implemented the method proposed in [20] and evaluated its performance on our dataset. The method described in [20] generates statistics of various SIP messages grouped on the basis of call-ID, ToUri and source IP address. The statistics generated in a block period are compared with two thresholds. Any interval whose traffic statistics do not lie within this range is declared a flooding attempt. The thresholds are derived from the normal sequences data: the maximum and minimum values of the number of packets in a group in a particular interval become the two thresholds. These thresholds are updated after every normal traffic interval is observed. We set the interval here to 10 minutes, the same as the interval we chose for our case. For most of the cases, the two thresholds came out to be 2 and 5. Two thresholds are set because, if an attacker attempts flooding using different SIP parameters for different attack instances (attack strategy 1), the number of packets sent from her side with a unique combination of parameters will be less than that sent by a normal caller. Alternatively, the attacker can use the same fields in the SIP packets to generate the attack (attack strategy 2), for example repeating the same IP, call-ID and ToUri; in this case the number of packets in the same group can exceed that of a normal user. Table XIII shows a comparison of recall and

G. Comparison of PCDTA With Random Early Termination Method
We also compare our method to a technique called Random Early Termination (RET) proposed in [2]. Random Early Termination is a technique to terminate connections which are in the RINGING stage to avoid a resource crunch at the proxy server. It selects active sessions probabilistically for termination based on the age of the session in the RINGING phase.
All the active sessions or transactions which are in the RINGING stage are sorted based on their starting time, and two thresholds T1 and T2 are used for identifying calls which are to be terminated. The most recent T1 calls are not considered for termination (white area in Figure 8). This is done to avoid terminating any call which has just been initiated. All calls that are older than the most recent T1 calls but fall within the most recent T2 calls are dropped with probability prob, which is given by Equation 9 (grey area in Figure 8). If the number of active RINGING transactions is greater than T2, all transactions whose sequence number is older than T2 (black area in Figure 8) are definitely terminated.

$$\mathit{prob} = 1 - e^{-\frac{\mathit{age} - \mathit{MRTT}}{\mathit{MRTT}}} \tag{9}$$
GOLAIT AND HUBBALLI: DETECTING ANOMALOUS BEHAVIOR IN VoIP SYSTEMS: A DES MODELING 743
In Equation 9, MRTT is the Minimum Ringing Time Threshold and age is the duration of RINGING time. MRTT is the threshold on the age of a RINGING call below which it is not considered for termination (this holds regardless of the number of active transactions). We conducted experiments similar to the one described in [2]. We generated 900 normal calls and also Ringing attacks in a span of 25 minutes (a higher call rate than in our previous experiments). Normal calls and Ringing attacks are generated using bots as described previously. Table XV shows the performance comparison of PCDTA with the RET method. In the case of RET, the number of calls terminated by it is used to calculate Recall and Accuracy. For these experiments we set the threshold values T1 = 18 and T2 = 25 calls and MRTT to 18 seconds for RET, and used the same path probability threshold (15.2737 in log scale) for PCDTA as in our previous experiment. These thresholds are computed after observing the performance of the RET method for different values. The value of MRTT is set to 18 seconds because normal calls in the dataset we generated had Ringing durations ranging from 5 seconds to slightly more than 18 seconds. We can notice that PCDTA outperforms RET in all cases. The lower Recall and Accuracy of RET arise because it does not terminate any call, even if its age is greater than MRTT, when the number of concurrent transactions does not cross T1, and because it may also terminate some normal calls whose ringing time is greater than MRTT.
In order to assess the performance of RET against different threshold values and MRTT values, we performed a sensitivity analysis. Table XVI shows the detection performance of RET with MRTT set at 18 seconds and for different threshold values of T1 and T2. We can see that as the threshold values are decreased, more ringing sequences are detected. Similarly, in the second experiment we set the threshold values T1 and T2 at 18 and 25 respectively (as these thresholds detected the maximum number of ringing sequences in the previous case) and varied the MRTT value between 12 and 18 seconds in steps of 3 seconds. Table XVII shows the performance of RET with these values. We can again observe that a lower MRTT value detects more attacks, as in this case more transactions qualify to be counted for the thresholds T1 and T2. However, many normal sequences also have a ringing time greater than MRTT and are likely to be terminated by probabilistic selection, which increases the false alarm rate.

X. PRIOR WORK
In this section we describe prior work related to SIP flooding and anomaly detection. A very brief discussion of the most closely related work in discrete event systems used to detect anomalies in other domains is also given here.

A. Voice Over IP Denial of Service
To protect against SIP flooding attacks, there are prior works which can be classified as preventive methods (which try to secure SIP itself) and detection methods (which identify flooding cases by monitoring network traffic).
1) Preventive Techniques: The SIP specification does not suggest any particular security mechanism; instead it allows the use of other security mechanisms like TLS and S/MIME for securing SIP messages [26]. Preventive methods are mainly in the form of cryptographic extensions to secure SIP [27]. These techniques authenticate messages exchanged between user agents and SIP servers, preventing malicious users from generating spoofed requests.
2) Detection Techniques: Detection methods broadly fall into four categories: signature based, statistical, machine learning based and formal models.
a) Signature based detection: Signature based approaches have encoded patterns for different types of attacks, and the detection system systematically scans the incoming traffic for these patterns. Works in [28]–[30] propose signatures for detecting DoS caused by maliciously formed SIP messages. These patterns are based on the SIP grammar as defined in RFC 3261 [4]. However, these approaches can only detect the previously encoded attacks. A framework to describe known vulnerabilities in SIP and prevent their exploitation is described in [31].
b) Statistical methods: Statistical deviation detection approaches attempt to detect abnormalities in the incoming messages by observing significant deviation from the normal behavior. Many researchers [6], [24], [25] have proposed to use the Hellinger distance (HD) to measure the aberration between normal and attack traffic distributions. Since these approaches identify the difference in the occurrence count of events of various types, they may sometimes generate false alarms or fail to detect some attacks. For example, an attacker can craft a flooding attack balancing the different types of messages (INVITE, REGISTER, BYE, OK, ACK) such that there are no differences in distributions compared to the normal scenario (Multi-Attribute flooding), rendering some detection methods ineffective. Reynolds and Ghoshal [5] proposed to measure the difference between the number of attempted connection establishments and the number of completed connections. This is motivated by the fact that, in flooding cases, there would be many call setup requests which are not completed. A signal processing technique which observes the change in the energy level of a wavelet to detect slow rate SIP floods is described in [32].
c) Machine learning methods: There are several attempts to use machine learning methods as anomaly detectors for VoIP flooding. Nassar et al. [23] proposed a machine learning approach to classify SIP traffic and detect different flooding attacks. They used 38 features extracted as statistics of various message types and intervals to train an SVM. Akbar and Farooq [21] evaluated two machine learning algorithms (Naive Bayes and Decision Tree) using a set of features extracted from a window period to detect flooding attacks. These features are extracted as statistics directly from the first line of SIP packets. Tsiatsikas et al. [22] proposed to generate a hash value from the first line of each SIP packet and count the number of unique hash values in a time window as a feature. Similar to [21], they used different machine learning algorithms (SMO, Naive Bayes, Neural Network, Decision Tree and Random Forest classifiers) to detect attacks of different rates. Mehta et al. [33]
744 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 12, NO. 3, MARCH 2017
TABLE XV
COMPARISON OF PCDTA WITH RANDOM EARLY TERMINATION METHOD
TABLE XVI
SENSITIVITY ANALYSIS OF RANDOM EARLY TERMINATION METHOD WITH THRESHOLDS
TABLE XVII
SENSITIVITY ANALYSIS OF RANDOM EARLY TERMINATION METHOD WITH MRTT
have pointed out various inefficiencies of machine learning techniques in detecting attacks and a high degree of false alarms, particularly for methods using Euclidean distance.
d) Formal models: Many prior works have used formal methods to study and analyze the behavior of the SIP protocol. Works described in [34] and [35] model the SIP INVITE transactions using Coloured Petri Nets (CPNs) over reliable and unreliable transport media. They examine the effect of various losses in events and the corresponding state space models to find defects in SIP's implementation. Another work [36] utilizes CPN state space analysis to identify the states in the SIP INVITE transaction which are vulnerable to Denial of Service attacks. The standard INVITE server transition model described in RFC 3261 is annotated with state constraints in [37] to detect flooding attacks. Work described in [38] proposed an extended CPN model known as the timed Hierarchical CPN (HCPN) as an intrusion detection system for detecting SIP post-session and pre-session flooding attacks. One major limitation of these techniques is that they either verify the SIP protocol behavior or mainly focus on detecting only INVITE flooding attacks.
Some recent surveys on various SIP vulnerabilities and their detection methods can be found in [3] and [39].

B. Discrete Event Systems
Discrete Event System modeling has been extensively used to formally describe various types of failures in control systems [40]. Klerx et al. [7] described a probabilistic automaton to identify anomalies in discrete event systems. In particular, they consider ATM operation as a discrete event system and describe a range of anomalies found in discrete event systems. A probabilistic learning automaton is used to characterize the behavior of the system and detect these anomalies. A discrete event system based model is proposed for detecting faults in powerline networks [41]. The X10 powerline network system is used as a case study and, based on the commands exchanged, an automaton is generated which represents the normal behavior profile and is subsequently used to detect anomalies.

XI. CONCLUSION
Session Initiation Protocol has become the de facto standard for session management in Voice over IP implementations. SIP is vulnerable to a range of Denial of Service attacks, including flooding and coordinated attacks. In this paper we modeled different SIP dialogues and transactions as discrete event systems and proposed a probabilistic state transition machine to describe these dialogues and transactions. Further, we identified a range of anomalies generated in a DES. We described algorithms to detect various DoS attacks using the proposed state transition model. We designed and experimented with a range of DoS attacks generated through custom programs and report that the proposed DES model can detect these attacks with high accuracy and detection rate.

REFERENCES
[1] G. Ormazabal, S. Nagpal, E. Yardeni, and H. Schulzrinne, "Secure SIP: A scalable prevention mechanism for DoS attacks on SIP based VoIP systems," in Proc. Principles, Syst. Appl. IP Telecommun. Serv. Secur. Next Generat. Netw., 2008, pp. 107–132.
[2] W. Conner and K. Nahrstedt, "Protecting SIP proxy servers from ringing-based denial-of-service attacks," in Proc. 10th IEEE Int. Symp. Multimedia (ISM), Dec. 2008, pp. 340–347.
[3] A. D. Keromytis, "A comprehensive survey of voice over IP security research," IEEE Commun. Surveys Tut., vol. 14, no. 2, pp. 514–537, 2nd Quart. 2012.
[4] J. Rosenberg et al., SIP: Session Initiation Protocol, document RFC 3261, 2002.
[5] B. Reynolds and D. Ghoshal, "Secure IP telephony using multi-layered protection," in Proc. 10th Annu. Netw. Distrib. Syst. Secur. Symp. (NDSS), 2003, pp. 1–13.
[6] H. Sengar, H. Wang, D. Wijesekera, and S. Jajodia, "Detecting VoIP floods using the Hellinger distance," IEEE Trans. Parallel Distrib. Syst., vol. 19, no. 6, pp. 794–805, Jun. 2008.
[7] T. Klerx, M. Anderka, H. K. Büning, and S. Priesterjahn, "Model-based anomaly detection for discrete event systems," in Proc. IEEE 26th Int. Conf. Tools Artif. Intell. (ICTAI), Nov. 2014, pp. 665–672.
[8] R. Alur and D. L. Dill, "A theory of timed automata," Theor. Comput. Sci., vol. 126, no. 2, pp. 183–235, 1994.
[9] C. Meiners, E. Norige, A. X. Liu, and E. Torng, "FlowSifter: A counting automata approach to layer 7 field extraction for deep flow inspection," in Proc. IEEE INFOCOM, Mar. 2012, pp. 1746–1754.
[10] R. C. Carrasco and J. Oncina, "Learning stochastic regular grammars by means of a state merging method," in Proc. Int. Colloq. Grammatical Inference Appl., 1994, pp. 139–152.
[11] D. D. Ron, Y. Singer, and N. Tishby, "On the learnability and usage of acyclic probabilistic finite automata," in Proc. 8th Annu. Conf. Comput. Learn. Theory, 1995, pp. 31–40.
[12] C. De La Higuera, Grammatical Inference: Learning Automata and Grammars. New York, NY, USA: Cambridge Univ. Press, 2010.
[13] S. M. Ross, A First Course in Probability, 8th ed. New York, NY, USA: Prentice-Hall, 2010.
[14] [Online]. Available: http://www.asterisk.org/
[15] Netem. [Online]. Available: https://wiki.linuxfoundation.org/networking/netem
[16] M. Nassar, R. State, and O. Festor, "Labeled VoIP data-set for intrusion detection evaluation," in Proc. 16th EUNICE/IFIP Conf. Netw. Serv. Appl. Eng. Control Manage., 2010, pp. 97–106.
[17] Tcpdump. [Online]. Available: http://www.tcpdump.org/
[18] jNetPcap. [Online]. Available: http://jnetpcap.com/
[19] Scapy. [Online]. Available: http://www.secdev.org/projects/scapy/
[20] J. Lee, K. Cho, C. Lee, and S. Kim, "VoIP-aware network attack detection based on statistics and behavior of SIP traffic," Peer-to-Peer Netw. Appl., vol. 8, no. 5, pp. 872–880, 2015.
[21] M. A. Akbar and M. Farooq, "Securing SIP-based VoIP infrastructure against flooding attacks and spam over IP telephony," J. Knowl. Inf. Syst., vol. 38, no. 2, pp. 491–510, 2014.
[22] Z. Tsiatsikas, A. Fakis, D. Papamartzivanos, D. Geneiatakis, G. Kambourakis, and C. Kolias, "Battling against DDoS in SIP. Is machine learning-based detection an effective weapon?" in Proc. 12th Int. Conf. Secur. Cryptogr. (SECRYPT), 2015, pp. 301–308.
[23] M. Nassar, R. State, and O. Festor, "Monitoring SIP traffic using support vector machines," in Proc. 11th Int. Symp. Recent Adv. Intrusion Detection (RAID), 2008, pp. 311–330.
[24] J. Tang, Y. Cheng, Y. Hao, and W. Song, "SIP flooding attack detection with a multi-dimensional sketch design," IEEE Trans. Depend. Sec. Comput., vol. 11, no. 6, pp. 582–595, Nov./Dec. 2014.
[25] J. Tang, Y. Cheng, and C. Zhou, "Sketch-based SIP flooding detection using Hellinger distance," in Proc. 28th IEEE Conf. Global Telecommun. (GLOBECOM), Nov. 2009, pp. 1–6.
[26] D. Geneiatakis, G. Kambourakis, T. Dagiuklas, C. Lambrinoudakis, and S. Gritzalis, "SIP security mechanisms: A state-of-the-art review," in Proc. 5th Int. Netw. Conf. (INC), 2005, pp. 147–155.
[27] R. Farley and X. Wang, "VoIP Shield: A transparent protection of deployed VoIP systems from SIP-based exploits," in Proc. IEEE Netw. Oper. Manage. Symp. (NOMS), Apr. 2012, pp. 486–489.
[28] D. Geneiatakis, G. Kambourakis, C. Lambrinoudakis, T. Dagiuklas, and S. Gritzalis, "A framework for protecting a SIP-based infrastructure against malformed message attacks," Comput. Netw., vol. 51, no. 10, pp. 2580–2593, 2006.
[29] S. Ehlert et al., "Two layer denial of service prevention on SIP VoIP infrastructures," Comput. Commun., vol. 31, no. 10, pp. 2443–2456, 2008.
[30] A. Lahmadi and O. Festor, "VeTo: An exploit prevention language from known vulnerabilities in SIP services," in Proc. Netw. Oper. Manage. Symp. (NOMS), 2010, pp. 216–223.
[31] A. Lahmadi and O. Festor, "A framework for automated exploit prevention from known vulnerabilities in voice over IP services," IEEE Trans. Netw. Service Manage., vol. 9, no. 2, pp. 114–127, Jun. 2012.
[32] J. Tang and Y. Cheng, "Quick detection of stealthy SIP flooding attacks in VoIP networks," in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2011, pp. 1–5.
[33] A. Mehta, N. Hantehzadeh, V. K. Gurbani, T. K. Ho, J. Koshiko, and R. Viswanathan, "On the inefficacy of Euclidean classifiers for detecting self-similar session initiation protocol (SIP) messages," in Proc. 12th IFIP/IEEE Int. Symp. Integr. Netw. Manage. (IM), May 2011, pp. 329–336.
[34] L. Ding and L. Liu, "Modelling and analysis of the INVITE transaction of the session initiation protocol using coloured Petri Nets," in Proc. 29th Int. Conf. Appl. Theory Petri Nets, 2008, pp. 132–151.
[35] L. Liu, "Verification of the SIP transaction using coloured Petri Nets," in Proc. 32nd Austral. Comput. Sci. Conf., 2009, pp. 63–72.
[36] L. Liu, "Uncovering SIP vulnerabilities to DoS attacks using coloured Petri Nets," in Proc. 10th IEEE Int. Conf. Trust, Secur. Privacy Comput. Commun., Nov. 2011, pp. 29–36.
[37] D. Seo, H. Lee, and E. Nuwere, "SIPAD: SIP-VoIP anomaly detection using a stateful rule tree," Comput. Commun., vol. 36, no. 3, pp. 562–574, Mar. 2013.
[38] Y. Ding and G. Su, "Intrusion detection system for signal based SIP attacks through timed HCPN," in Proc. 2nd Int. Conf. Availability, Rel. Secur. (ARES), 2007, pp. 190–197.
[39] S. Ehlert, D. Geneiatakis, and T. Magedanz, "Survey of network security systems to counter SIP-based denial-of-service attacks," Comput. Secur., vol. 29, no. 1, pp. 225–243, 2010.
[40] C. G. Cassandras and S. Lafortune, Introduction to Discrete Event Systems. New York, NY, USA: Springer, 2008.
[41] A. Arora, R. Jagannathan, and Y.-M. Wang, "Model-based fault detection in powerline networking," in Proc. 16th Int. Parallel Distrib. Process. Symp. (IPDPS), 2002, pp. 1–8.

Diksha Golait was born in Bhopal, India. She received the B.Tech. degree in computer science engineering from IIT Indore, India, in 2016. In July 2016, she joined Microsoft India (Research and Development), where she currently pursues a career in software development. Her research interests include network and system security.

Neminath Hubballi received the Ph.D. degree from the Department of Computer Science and Engineering, IIT Guwahati, India. He was with corporate Research and Development Centers of Samsung, Infosys Lab and Hewlett-Packard. He is currently an Assistant Professor in computer science with IIT Indore, India. He has authored or coauthored in the area of security. He is also a regular reviewer in many security journals and conferences and also served as a TPC member of several conferences. His areas of interest include network and system security.