
730 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 12, NO. 3, MARCH 2017

Detecting Anomalous Behavior in VoIP Systems: A Discrete Event System Modeling
Diksha Golait and Neminath Hubballi

Abstract: Session initiation protocol (SIP) is an application layer protocol used for signaling purposes to manage voice over IP connections. Being a text-based protocol, SIP is vulnerable to a range of denial of service (DoS) attacks. These DoS attacks can render the SIP servers/SIP proxy servers unusable by depleting memory and CPU time. In this paper, we consider two types of DoS attacks for detection, namely, flooding attacks and coordinated attacks. Flooding attacks affect both stateless and stateful SIP servers while coordinated attacks affect stateful SIP servers. We model the SIP operation as a discrete event system (DES) and design a new state transition machine, which we name probabilistic counting deterministic timed automata (PCDTA), to describe the behavior of SIP operations. We also identify different types of anomalies that can occur in a DES model, which appear in the form of illegal transitions, violated timing constraints, and events appearing in numbers not otherwise seen. Subsequently, we map various DoS attacks in SIP to a type of anomaly in DES. PCDTA can learn probabilities of various transitions and timing delays from a set of nonmalicious training sequences. A trained PCDTA can detect anomalies, and hence various DoS attacks in SIP. We perform a thorough experiment with computer simulated SIP traffic and report the detection performance of PCDTA on various attacks generated through custom scripts.

Index Terms: Communication system security, computer security, network security.

Manuscript received February 1, 2016; revised June 27, 2016, September 8, 2016, and November 14, 2016; accepted November 14, 2016. Date of publication November 23, 2016; date of current version January 18, 2017. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Wanlei Zhou.
D. Golait was with IIT Indore, Indore 453552, India. She is now with R&D, Microsoft India, City - Hyderabad (Telangana) 500032, India (e-mail: digola@microsoft.com).
N. Hubballi is with IIT Indore, Indore 453552, India (e-mail: neminath@iiti.ac.in).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIFS.2016.2632071
1556-6013 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

Voice over IP (VoIP) is an economical alternative for telephone communication compared to traditional Public Switched Telephone Network (PSTN) communication. In VoIP communication the voice conversation data is sent using IP packets over the Internet. A typical voice call involves two phases: signaling and data transmission. Signaling is used to establish and maintain the end-to-end VoIP call; the actual data transmission usually happens in a different session. VoIP can use a range of protocols (H.323, SIP) for signaling purposes. Session Initiation Protocol (SIP) is an application layer signaling protocol for VoIP communication. It is used to establish, modify and terminate multimedia sessions between two VoIP clients, also called user agents. It is also used to request and deliver clients' presence, and to send and receive instant messages between clients. SIP server(s) and/or SIP proxy servers mediate the session initiation and termination between two clients. SIP, being a text-based protocol designed to work with both UDP and TCP, is vulnerable to a range of security attacks [1], particularly Denial of Service (DoS) attacks. DoS attacks include various SIP flooding attacks like BYE flooding, INVITE flooding, multi-attribute flooding, etc. Recently it has been discovered that a number of coordinated attacks [2] can also be mounted on SIP (proxy) servers and user agents, creating DoS. These flooding and coordinated attacks can completely cripple the communication between VoIP servers, rendering them unusable. Hence detecting these attacks is important.

There are mitigation and detection techniques [3] proposed in the literature for protecting SIP operation from various types of DoS attacks. Many of these methods propose cryptographic extensions to secure SIP, use a rule-based engine to detect malformed SIP messages (which sometimes cause DoS), or propose a machine learning algorithm to predict different DoS attacks. Most of the prior works focused on detecting specific types of DoS attack. In this paper we propose a generic formal framework which is custom designed to describe SIP's operational behavior and use it to detect different types of DoS attacks. In particular we make the following specific contributions in this paper.

1) We consider the SIP operation sequence as a Discrete Event System (DES). Subsequently we develop a probabilistic timed transition model (PCDTA) to characterize SIP event sequences and their timings.
2) We also propose to learn transition and delay probabilities of various events of the state transition diagram from a set of known non-malicious SIP event sequences, thus making automatic the learning which is otherwise done manually.
3) We identify a range of anomalies that can occur in any timed DES model and map these anomalies to various DoS attacks in SIP.
4) We use the timed transition model as an anomaly detection system to detect anomalies arising as a consequence of illegal transitions, anomalous timings and anomalous numbers of a particular message type, which help detect different SIP attacks.

II. SIP OVERVIEW

SIP has a distributed architecture. It includes the following entities.
User agent: These are VoIP phones with a valid URI (user name used by a user). Multimedia sessions are set up and terminated between user agents.
Registrar server: User agents register with a registrar server when they connect to the network and also update this registration periodically.
GOLAIT AND HUBBALLI: DETECTING ANOMALOUS BEHAVIOR IN VoIP SYSTEMS: A DES MODELING 731

Location server: Stores different locations of user agents, which are identified by their IP addresses. There can be more than one location associated with a user.
Proxy server: Forwards the connection requests to the intended recipients on behalf of user agents.
Redirect server: If a user has more than one location, the redirect server helps to fork a connection request to different addresses.

Figure 1 shows a typical call setup using SIP. A typical SIP session begins with an INVITE from the caller. The SIP proxy server receives the INVITE and immediately sends a 100 (TRYING) message to the caller. When the request reaches the callee, its phone starts ringing and indicates the RINGING status in a 180 response. If the callee accepts the call, the callee's phone sends a 200 (OK) response; otherwise an error response is sent. The caller's phone acknowledges it by sending an ACK. This completes the three-way handshake INVITE/200/ACK used to establish SIP sessions. After this, the media session starts. At the end of the call, either of the two user agents can send a BYE request to end the call. The other user agent responds to it with a 200 (OK) response.

Fig. 1. SIP call setup.

III. SIP THREAT VECTORS

In this section we describe different types of DoS attacks to which SIP is vulnerable. These attacks can be grouped as flooding and coordinated attacks; details of these two are given in the next two subsections. It is worth differentiating between stateful and stateless SIP proxy servers at this point, as a few types of attacks affect only a particular kind of SIP proxy server. A stateful proxy server maintains the state information about various incomplete and ongoing transactions, whereas a stateless proxy server just forwards the SIP messages without maintaining any state information. A stateful proxy server can be of type transaction stateful or call stateful.
Transaction stateful: A proxy is transaction stateful if it maintains the client and server transaction state machines as defined in the specification of RFC 3261 [4].
Call stateful: A proxy is call stateful if it retains state for a dialog from the initiating INVITE to the terminating BYE request. A call stateful proxy is always transaction stateful.

A. SIP Flooding Attacks

In flooding attacks a large number of messages of a particular type are sent to overwhelm the SIP server or SIP proxy server. These attacks affect registrar servers, and both stateful and stateless proxy servers. There are mainly four types of flooding attacks possible in SIP, as detailed below.

1) Invite Flooding: In this attack, a large number of INVITE messages are sent to a SIP server by one or many malicious clients which never complete the handshake with other user agents, leaving these transactions incomplete. Each such request has a bearing on the resources available at the server. Further, if the intermediate proxy servers are stateful, they also risk running out of resources if the rate of INVITE messages is high.

2) BYE Flooding: In this attack a large number of BYE messages are sent to a SIP server. There are two variations of BYE flooding.
   1) With random call-ID: Here the attacker sends a large number of crafted BYE messages to the SIP server with randomly generated call-IDs. This can terminate arbitrary ongoing connections if the call-ID of such a message matches the call-ID of any ongoing call for which the server has maintained a state.
   2) With a known call-ID: This is a targeted attack where the attacker is able to sniff the traffic exchanged between a SIP user agent and the SIP server and get hold of the call-ID. The attacker can then generate a BYE message using that call-ID, which will terminate the connection.

3) REGISTER Flooding: Here a large number of user agent REGISTER requests are sent to the SIP registrar. The server spends a lot of computing and memory resources in addressing these requests.
SIP mandates every user agent to first REGISTER with a registrar server. This registration process can be secured with added authentication or can be unsecured without any security. SIP uses a digest authentication algorithm to negotiate credentials with the user agents. In this, the user initially sends a request to the server that requires authentication but does not provide credentials; the server responds with the 401 Unauthorized response code, providing the authentication realm and a randomly generated value called nonce; the client then re-sends the same REGISTER request with an authentication header that includes the response code; if the credentials are correct the SIP server answers this request with a 200 OK message. There are two variations of REGISTER flooding attacks that can be generated here, depending on whether security is used or not.
   1) With Random URI: In this case the attacker can use a random user agent name and try to REGISTER with a SIP server. Depending on whether the user has an account or not, the SIP server will respond with either an OK or a 404 user not found message.
   2) With a valid URI: In this case known user agent names are used to REGISTER with random IP addresses. If authentication is enabled the SIP server sends a validate authentication message; the attacker just ignores the challenge and starts a new request with another user agent name. However the server keeps the allocated memory for a certain period of time, in addition to the CPU resources wasted on calculating the nonce. This can cause DoS if the number of such requests is very high.
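The challenge-response exchange described for REGISTER can be sketched as follows. This is a minimal illustration, assuming the basic digest form of RFC 2617 (MD5, no qop), which is one of the forms SIP digest authentication can take. The `ToyRegistrar` class, its `pending` table, and the realm and URI strings are hypothetical names introduced only for this sketch; they are not from the paper or from any SIP library. The point it tries to show is that every issued challenge leaves a nonce stored at the server, which is the per-request state that REGISTER flooding with a valid URI exploits.

```python
import hashlib
import hmac
import secrets


def md5_hex(text: str) -> str:
    """Hex-encoded MD5 of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Digest response in the basic RFC 2617 form (no qop)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


class ToyRegistrar:
    """Hypothetical registrar that stores one pending nonce per challenge."""

    def __init__(self, realm, accounts):
        self.realm = realm
        self.accounts = accounts  # username -> password
        self.pending = {}         # nonce -> username (state kept per challenge)

    def challenge(self, username):
        """REGISTER without credentials: answer 401 with a fresh nonce."""
        nonce = secrets.token_hex(16)
        self.pending[nonce] = username
        return 401, nonce

    def verify(self, username, uri, nonce, response):
        """REGISTER with credentials: 200 if the response matches, else 401/404."""
        if self.pending.pop(nonce, None) != username:
            return 401
        password = self.accounts.get(username)
        if password is None:
            return 404
        expected = digest_response(
            username, self.realm, password, "REGISTER", uri, nonce
        )
        return 200 if hmac.compare_digest(expected, response) else 401


if __name__ == "__main__":
    registrar = ToyRegistrar("example.invalid", {"alice": "secret"})
    _, nonce = registrar.challenge("alice")
    answer = digest_response(
        "alice", "example.invalid", "secret",
        "REGISTER", "sip:example.invalid", nonce,
    )
    print(registrar.verify("alice", "sip:example.invalid", nonce, answer))
    # An attacker that ignores every challenge leaves the nonces behind.
    for i in range(1000):
        registrar.challenge(f"user{i}")
    print(len(registrar.pending))
```

In this sketch a completed registration removes its nonce from `pending`, while ignored challenges accumulate until some expiry policy (not modeled here) clears them.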

4) Multi-Attribute Flooding: In this attack a large number of messages using all four SIP message types (INVITE, BYE, RINGING, OK) are sent to overwhelm the SIP server. An attacker can cleverly craft the attack to balance the ratio of each type of message, rendering ineffective the anomaly detection methods like [5] and [6], which use the imbalance between different message types to detect DoS flooding attacks.

B. Coordinated SIP Attacks

Coordinated SIP attacks are carried out by colluding users who have registered in the VoIP service. The following are the two prominent attacks.

1) Ringing Attacks: These attacks attempt to create incomplete transactions at the server with host cooperation [2]. The attacker sends INVITE requests to known peers, which reply with provisional (1xx) messages like RINGING but don't accept the call. Instead the callee just repeatedly sends these provisional RINGING messages. This can prolong the lifetime of a transaction to several minutes. A large number of such incomplete transactions will cause DoS. The attacker needs to send fewer requests (compared to flooding cases) to deplete the memory of the SIP server. Ringing attacks affect both types of stateful proxy servers.

2) Prolonged Calls: In these attacks, the attackers exchange with the SIP server the same initiation messages as in a normal SIP session. But once the call is established, the attackers stay in the call for as long as they can or until a server interrupt occurs. Prolonged call attacks affect only call stateful proxy servers.

IV. DISCRETE EVENT SYSTEMS

A discrete event system is characterized by the following three properties.
1) Discrete States: The system can be in any one of a finite number of states. The state the system is currently in indicates the status of the system. The most basic status of the system used to detect anomalies indicates either a normal or an anomalous state.
2) Dynamic: The new state in which the system stays is dependent on the current state.
3) Event-driven: The change in the system state is completely driven by events occurring at certain times.

V. TYPES OF ANOMALIES

In this section we adopt and describe a few types of anomalies seen in any DES [7], with a few of our own additions. We also establish a mapping between these anomaly types and the VoIP based SIP threat vectors described previously.

1) Anomalous Event: An anomalous event is an event which causes the system to move from its current state to a state which is not a regular next state of the current state.
For example, a BYE message must appear only after a successful INVITE initiated dialogue. Any event with a BYE message without corresponding INVITE, RINGING and OK messages, with an arbitrary call-ID, is an anomalous event. These types of messages appear in the case of a BYE flooding attack.

2) Anomalous Path: It may be the case that each individual event is normal, yet a sequence of events taken together signifies anomalous behavior.
For example, the SIP system can be in the state RINGING received. Next it accepts an OK message. However, occasionally messages do get lost in the network, or due to errors the end hosts retransmit such messages. Thus it is possible, being in the RINGING received state, to receive another RINGING message with the same call-ID. However in the case of a coordinated RINGING attack, the system may repeatedly receive RINGING messages while it is in the RINGING received state, to unnecessarily prolong the call setup process.

3) Non Occurrence of Events/Stalled Progress: As the system changes its state or makes progress only after receiving events, it is anomalous not to receive messages to further the progress. This can happen for various reasons, for example when connectivity goes down for the node responsible for sending that message. Nevertheless a high frequency of such cases may raise suspicion and needs to be detected.
For example, in a VoIP system a peer may deliberately stop sending the next expected OK message after receiving an INVITE message. If such events occur in large number, the proxy server or SIP server may experience a resource crunch and become a victim of DoS.

4) Anomalous Timing of Events: Normally the next expected event should occur within a time period after the current event's timing. If the next event takes unusually longer than the expected timing, it is considered an anomalous timing event.
For example, in RINGING attacks, after receiving the initial INVITE message the peer entity deliberately delays sending the RINGING message.

5) Anomalous Sample Timing Path: The timing of a path may be anomalous even if every event's timing is normal, if the aggregate timing of the sequence of events is not within a previously identified range.
For example, in a coordinated SIP attack, calls are prolonged unnecessarily by delaying messages and repeatedly sending some of the messages.

In Section VIII we show that all the SIP DoS attacks described previously can be detected if these anomalies are identified in a DES model of SIP.

VI. PROPOSED DES MODEL

In this section we formally define a DES model which we use to describe the behavior of VoIP based SIP communication. We treat SIP events as timed sequences, indicating that every event occurs in the system at a discrete time. Figure 2 and Figure 3 show the timed events appearing in the INVITE dialogue and the REGISTER operation of a user agent, as observed at the SIP server and the Registrar server respectively.

In order to characterize these timed events we propose a state transition machine as a DES model. One of the motives for this novel state transition model is that it should detect all types

of anomalies described previously. As the different DoS attacks on a VoIP system are each mapped to one of these anomaly types, this model in effect should be able to detect all DoS attacks.

Fig. 2. Timing diagram showing basic path of a SIP call sequence.

Fig. 3. Timing diagram showing basic path of a SIP REGISTER sequence.

Our proposed DES model is a probabilistic timed transition state machine with the ability to count the number of times a particular state has been visited. Counting the number of times a state has been visited is required for identifying repeated transmissions of the same message type. Similarly timing constraints help in identifying unusually delayed transitions. To be precise, we describe a Probabilistic Counting Deterministic Timed Automaton (PCDTA). The state machine PCDTA is denoted as

$M = (Q, \Sigma, C, T, \delta, q_0, \Pi, \Omega, \pi, \omega, F)$ where

$Q$: Is a finite set of states
$q_0 \in Q$: Is an initial state
$\Sigma$: Is a finite set of input symbols
$\sigma \in \Sigma$: Is an input symbol
$C$: Is a finite set of Counters with each $c_i \in C$ taking values in $\mathbb{N}$
$T$: Is a finite set of Clock Variables where each $t_i \in T$ takes non negative values in $\mathbb{R}$
$\delta$: Is a set of transitions defined as $\delta : Q \times \Sigma \times C \times 2^{|T|} \to \Phi(T) \times Q \times 2^{|T|} \times (C \to C)$
$\Pi$: Is a set of transition probability distributions
$\Omega$: Is a set of timing delay probability distributions
$\pi : \delta_i \to \Pi$ is a transition probability function, which assigns a probability distribution from $\Pi$ to a collection of transitions $\delta_i$, where $\delta_i$ represents the transitions to next state $q_j$ from current state $q_i$. Every such distribution maps a set of counter values of state $q_i$ (a subset of $\mathbb{N}$) to $[0, 1]$.
$\omega : \delta \to \Omega$ is a transition time probability function, which assigns a probability distribution from $\Omega$ to every transition. Every such distribution maps a set of time values (a subset of $\mathbb{R}$) to $[0, 1]$.
$F$: A subset of states ($F \subseteq Q$) which are final states

Each transition $\delta_i$ is a seven tuple $\langle q_i, q_j, c, \sigma, \Phi(t), Reset(t), Inc(c) \rangle$ with the elements representing

$q_i$: Current state
$q_j$: Next state
$c$: Counter value at the state $q_i$
$\sigma$: Input symbol
$\Phi(t)$: Boolean conjunction of constraints on a subset of clock variables $t \subseteq T$ to be satisfied
$Reset(t)$: Is a subset of clock variables $t \subseteq T$ to be reset on this transition
$Inc(c)$: Is a function which maps the current state of counter to a new value

PCDTA is similar to a Probabilistic Timed Automata (PTA) and has constraints on timings of various events similar to Timed Automata [8], with added counting ability similar to [9] and with a subtle difference in the way probability values are assigned to each transition. In PTA the following probability constraint holds

$$\sum_{j=1}^{s} p_{ij}(a) = 1 \quad \forall a \in \Sigma \qquad (1)$$

In Equation 1, $p_{ij}(a)$ denotes the probability of the automaton moving from state $q_i$ to state $q_j$ on input symbol $a$. The value $s$ is the number of such next states of $q_i$. This ensures that on input $a$ the probabilities of all outgoing edges from $q_i$ add to 1. In PCDTA the probabilities of transitions satisfy the constraint of Equation 2.

$$\sum_{c \in \mathbb{N}} \sum_{j=1}^{s} p_{ij}(\sigma/c)\, p(c) = 1 \quad \forall \sigma \in \Sigma \qquad (2)$$

Equation 2 enforces that the sum of probabilities of all transitions from state $q_i$ to $q_j$, over all possible values of a counter value $c$ at $q_i$ and for all possible input symbols $\sigma \in \Sigma$, adds to 1.

The PCDTA is a machine which generates a set of strings starting from the start state to an accepting state. The set of strings generated by the PCDTA is the language generated by it and is denoted as $L \subseteq \Sigma^{*}$. The string generation can be defined as a recursive process. For example the string $S = \sigma_1, \sigma_2, \ldots, \sigma_n$ can be generated as $\hat{\delta}(q_0, \sigma_1, \sigma_2, \ldots, \sigma_n)$, where $\hat{\delta}$ denotes the extended transition function, which is defined recursively as $\delta(\hat{\delta}(q_0, \sigma_1, \sigma_2, \ldots, \sigma_{n-1}), \sigma_n)$. While generating a string, if $q_i$ is the current state and $\sigma_k$ is the input symbol, the next state is chosen according to the probability of transitions defined. Thus any string $S \in L$ is generated with a probability given by Equation 3. Later we use this probability of a string to detect anomalous paths. Similarly we also define the timing probability of a path as a product of the respective event timing delays.

$$P(S) = \prod_{i=0}^{n-1} p(q_i, \sigma_{i+1}) \qquad (3)$$

VII. LEARNING THE MODEL

In this section we describe how to learn the PCDTA model from a set of observations of normal VoIP SIP call sequence behavior. Normally in discrete event systems the transitions are identified using a set of observation sequences. There

are algorithms like ALERGIA [10] and Acyclic Probabilistic Finite Automata Learning [11] to learn both transitions and probabilities from input sequences. A good coverage of different types of learning methods for transitions and the probabilities is in [12]. However in our case the progress of system (transition) behavior is very well defined and is unambiguously described (as shown in Figures 2 and 3) in the SIP protocol specification, hence we create the transition diagram of the machine from this specification [4]. However some of the SIP messages may be retransmitted due to errors and losses. These retransmissions need to be handled to avoid them being treated as anomalous events. These messages carry the same call-ID as the original message. These events are handled by having suitable additional transitions in the machine. Further, the delays between successive messages are governed by timing constraints. Thus we wish to learn the probabilities of transitions and timings from a sequence of observations. Generating the event sequence diagram from the SIP specification, learning the probabilities of transitions, and learning the probabilities of timings of transitions in the model are described in the next three subsections.

A. Generating Event Sequence Diagram

We model two SIP use case scenarios as shown in Figure 4 and Figure 5 using Probabilistic Counting Deterministic Timed Automata (PCDTA). The first is the INVITE initiated dialog which is terminated with a BYE (and its corresponding OK response). The second is the REGISTER operation. The two operations are modelled as observed on a proxy server and a registrar server respectively.

Fig. 4. Simplified PCDTA for SIP INVITE dialogue.

Fig. 5. Simplified PCDTA for SIP REGISTER operation.

Assumptions: We make the following two assumptions while modeling the SIP event sequences.
i. If there are any redirection messages, the User Agent Client (UAC) or the caller reuses the same To, From, and call-ID used in the original redirected request so that it follows the same automata instance.
ii. Any 4xx except 401 (Unauthorized) or 407 (Proxy Authentication Required), 5xx and 6xx response is considered to lead to an error state that terminates the existing dialog.

Separate machine for each operation: The machine for each session is identified by its call-ID. Hence there will be many machines running in parallel when our system is put into use for validating SIP traffic. If the machine successfully follows all the transitions and reaches the final state from the start state with the path probability less than threshold probability (to be discussed), the SIP operation is considered to be normal. If the machine terminates by reaching the error state, if it reaches a deadlock (stalls somewhere) or if the machine gets involved in a live lock, the SIP operation is considered to be anomalous. The deadlock condition is handled by the timer constraint associated with the concerned transition. In order to measure the time difference between events, system time is captured and associated with the timing of each event. If the timing delay at any instance is greater than the threshold, the machine reaches the stalled state. The live lock condition is checked by the path transition probability.

1) Complexity: It is worth noting that maintaining state information of the transition diagram involves some amount of CPU and memory overhead. As the transition diagrams are created for every new VoIP call and are destroyed either because of error, timeout (in case of DoS attack) or successful completion and graceful termination of the call, at any time there are only a handful of active transition diagrams. Moreover many of the operations involved in the training and testing phases are not computationally (asymptotically) expensive. Assuming each transition has fixed memory requirements, the overall complexity is linear in terms of the average number of VoIP calls.

a) INVITE initiated dialog: We define a PCDTA for modeling the INVITE initiated SIP dialogue as follows
$Q = \{q_0, \ldots, q_6, E, S\}$
$q_0$ is the start state
$F = \{q_6\}$ is the final state
$\Sigma = \{INVITE, OK, ACK, 2xx, 3xx, 4xx, 5xx, 6xx, BYE\} \cup \{\epsilon\}$
$T = \{t_1\}$
$C = \{c_0, \ldots, c_6, c_E, c_S\}$
$\delta$ is the transition set, defined as shown in Table I.1

1 Transitions with − indicate either there is no constraint on timing or resetting clock or updating counter value is not required.

A simplified representation of the state transitions corresponding to a single INVITE initiated dialogue is shown in Figure 4. The machine is initially in state $q_0$ waiting for an INVITE event to appear with a different call-ID than all the existing dialogues. When a User Agent Client (UAC) or the caller user agent sends a new INVITE request (new call-ID), a new instance of the machine is created and the machine moves to state $q_1$, labeled

as INVITE received state. After receiving the first INVITE the proxy server may send some redirect response. However for the sake of brevity we have ignored these cases from modeling, as this does not affect the ultimate detection ability of the machine. Further, if authentication is enabled then 401 or 407 responses may be sent to the UAC. Being in the INVITE received state ($q_1$), if the proxy server receives any INVITE retransmission or 401 or 407 responses the machine remains in the same state.2 If any error message appears (other 4xx, 5xx or 6xx), the machine goes to the Error ($E$) state from state $q_1$, terminating the existing dialog.3 If the proxy server successfully accepts the INVITE request, it forwards the INVITE request to the corresponding User Agent Server (UAS) or the callee user agent and sends a provisional TRYING message to the UAC. When the UAS completely understands the INVITE message it sends a RINGING message to the UAC. When the proxy receives a RINGING message from the UAS, the machine makes a transition to the state $q_2$, which denotes RINGING message received for that dialogue. If the UAS wishes to reject the request it sends an error response to the proxy server (one of 4xx, 5xx, 6xx) and the state machine moves to the Error ($E$) state. If the UAS wishes to accept the request it sends a 200 OK response to the proxy, and the machine reaches state $q_3$. In either case, the proxy forwards the response to the UAC. If the UAC receives a 200 OK response, it sends an ACK to the proxy and the machine proceeds to state $q_4$.

2 Similar retransmission provisions are there at all the states except start and final state.
3 Similar exit option is available from every state other than start and final state.

After the previous message sequence the media session begins. The media session can last for an arbitrary time; moreover it happens in a separate session which we do not model here. Hence, the proxy waits for a BYE event to occur from either the UAC or the UAS, which indicates the end of the media session. When it receives a BYE message, the machine goes to the state $q_5$. The other party then sends a corresponding 200 OK to the proxy, and the machine reaches the final state $q_6$. The INVITE initiated call event is hence terminated.

TABLE I
TRANSITION TABLE FOR INVITE DIALOGUE PCDTA

TABLE II
TRANSITION TABLE FOR REGISTER EVENT PCDTA

b) REGISTER operation: Similar to the previous case we define a PCDTA for the REGISTER operation also, as below
$Q = \{q_7, q_8, q_9, E, S\}$
$q_7$ is the start state
$F = \{q_9\}$ is the final state
$\Sigma = \{REGISTER, OK, 2xx, 3xx, 4xx, 5xx, 6xx\} \cup \{\epsilon\}$
$T = \{t_1\}$
$C = \{c_7, c_8, c_9, c_E, c_S\}$
$\delta$ is the transition set, defined as shown in Table II.

The machine at the registrar is initially at state $q_7$. As the registrar receives a REGISTER request from a UAC, the machine goes to state $q_8$. If any retransmissions, authentication requests or redirect requests occur, the machine stays in the same state. If the request is accepted the registrar responds with a 200 OK response, and the machine goes to the final

state $q_9$. If any error occurs, the machine goes to the Error ($E$) state.

In both Figure 4 and Figure 5 there is a transition from all the states, except the start and final states, to a state with label $S$. This transition is an epsilon ($\epsilon$) transition triggered when the delay between the previous event and the next event exceeds a threshold set on each of these transitions. How to learn and set these delay constraints is discussed subsequently. It should be noted that even the $\epsilon$ transition happens deterministically, hence it does not violate the definition. Also for brevity we do not show transitions from each state for each input symbol, as all such missing transitions should be treated as illegal transitions.

B. Learning Transition Probability

As defined in the previous section, each transition probability indicates the likelihood of the system changing state from $q_i$ to $q_j$ upon occurrence of a symbol $\sigma \in \Sigma$. This probability has two components: the first is $p(\sigma/c)$, the prior probability of input $\sigma$ at the present counter value $c$, and the second is the probability of the counter taking a certain value at a state, denoted $p(c)$. We learn these two probabilities by treating them as independent. The counter indicates the number of times the state $q_i$ has been visited by the repeated occurrence of events in the system. Thus the probability of a transition is the product of $p(\sigma)$ and $p(c)$. The counter here indicates how anomalous it is to revisit a state with repeated occurrence of an event. Usually, the higher the counter value, the lower its probability, indicating a rare event.

We derive the value of $p(c)$ (the probability value of the counter) by treating it as a discrete random variable and assigning a probability distribution to it. In particular we use the Poisson distribution [13], calculated using the mean value of the number of times a state has been visited. The probability mass function of a Poisson random variable $c$ taking value $n \in \mathbb{N}$ is given by Equation 4. In Equation 4, $\lambda$ represents the mean value of counter $c$ and $e$ is Euler's number ($e = 2.71828$).

$$p(c = n) = \frac{e^{-\lambda} \lambda^{n}}{n!} \qquad (4)$$

To learn the transition probability $p(\sigma)$ we use a sequence

C. Learning Transition Timing Probability

After the previous step of learning, we have a transition diagram with various events and their respective probabilities. In the second step we learn the timing probability on transitions. In order to do this we use the timing of various events of sequences. For a particular event $\sigma_r$ of a sequence $s_k$ we deduce the time difference between the previous event occurrence and the current event timing. An example sequence is shown here: $S_k = \langle \sigma_1, NA \rangle, \langle \sigma_2, et_2 \rangle, \ldots, \langle \sigma_n, et_n \rangle$. In this converted sequence the first input symbol does not have a time delay constraint as it can occur at any time. From the second symbol onward $et_r$ indicates the observed time delay of symbol $\sigma_r$ from the previous occurrence of event $\sigma_{r-1}$. After converting all the sequences into this format we find the mean time delay of a particular event among all the $M$ sequences of the training set. Using this mean value of time difference we subsequently fit a probability distribution function (Poisson distribution) which best describes these sequences of time differences. The probability distribution function will have its maximum probability around the mean and decrease as the delay increases, which indicates that the event is more anomalous. The choice of Poisson distribution is justified by the factors affecting the timing of events. Timings of different events depend on the round trip delay between two machines. This includes many parameters like network congestion, number of hops traversed and processing delay at each node. These parameters will show randomness without very significant variation. Hence the time differences will also be random with not much variation and are best described with a Poisson distribution.

VIII. SIP ATTACK DETECTION

In this section we describe how the PCDTA model is used to detect the various SIP DoS attacks described earlier. As mentioned in Section III we mainly deal with two types of SIP attacks, namely flooding and coordinated attacks. The next two subsections outline how these two attacks can be detected.

A. Detecting SIP Flooding Attacks

In case of flooding DoS attack, a SIP server or proxy
of strings of language L  . Let S1 , S2 , , Sm server is targeted with many messages of a particular type. For
(with m 1) be a set of sequences from training set (strings of example, in case of INVITE flooding large number of INVITE
L) with each string of form Si = 1 , t1 , 2 , t2 , , n , tn  messages are sent either from a single source or from multiple
be a sequence of timed events (indicating input symbols with sources. Each such message creates an instance of machine,
their timing of occurrence). To learn the probability of event , however these transactions fail to make progress after initial
we use only event sequences (ignoring timing) of transitions. transition as source is not interested in completing INVITE
Each such sequence of events represent a path from start state dialogue; hence all of these transitions will time out (with a
to a final state where each edge label is a symbol name. Thus transition to state S). Thus in order to detect this flooding
each internal state of PCDTA represents a prefix of string. For attack it is sufficient to count the number of such transactions
each edge qi to q j , count the number of such transitions in terminated from INVITE received state to the time out state
all strings. Then the probability of a transition is the ratio of and can be compared with a threshold value (How to select this
number of edges going to state q j to the total number of edges threshold is described in Section IX-C). Similar observation
from qi to any other state qk on any symbol . This is can be made for REGISTER transaction (with valid URI and
given by the Equation 5. authentication enabled), where also large number of timeouts
M are seen as malicious user is not interested in completing the
t =1 (qi , q j , )
pi j ( ) =  M  ||
(5) transaction. Algorithm 1 describes the method for detecting
t =1 y=1 (qi , qk , y ) INVITE and REGISTER flooding (with random URI) attacks.
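The counting idea behind flooding detection can be illustrated with a minimal Python sketch. It assumes that the timeout events reported by the PCDTA for one event type are already available as a list of timestamps in seconds; the function name `flooding_intervals`, its parameters and the example values are hypothetical and are not taken from the original work.

```python
from collections import Counter


def flooding_intervals(timeout_times, window, threshold, start=0.0):
    """Return the start times of windows holding at least `threshold` timeouts.

    timeout_times: timestamps (seconds) of timeout events of one event type.
    window: interval length T in seconds.
    threshold: count threshold Th (assumed inclusive here).
    """
    # Map each event to the index of the window it falls into.
    per_window = Counter(
        int((t - start) // window) for t in timeout_times if t >= start
    )
    # Flag every window whose count reaches the threshold.
    return [
        start + k * window
        for k, count in sorted(per_window.items())
        if count >= threshold
    ]


# Hypothetical example: 10-minute windows and a threshold of 4 timeouts.
events = [5, 10, 20, 650, 700, 710, 720]
flagged = flooding_intervals(events, window=600, threshold=4)
print(flagged)
```

Grouping events by window index keeps the sketch independent of how events are delivered; an online detector would instead increment a counter as each timeout arrives and reset it at the end of every window.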
Algorithm 1 Detecting INVITE/REGISTER Flooding Attacks
Input: PCDTA - DES model for INVITE and REGISTER
Input: T - Interval Period
Input: Event_Type - Type of Event to be monitored
Output: Flooding Attack Detection of Event_Type
while Not interrupted do
    Event_TimeOut = 0
    for t = t_teststart to t + T do
        Event ← New timeout Event of type Event_Type Detected from PCDTA
        Event_TimeOut = Event_TimeOut + 1
    end for
    if Event_TimeOut ≥ Th then
        Flooding Attack of Type Event Detected in interval t_teststart to t + T
    end if
    t_teststart ← t + T
end while

Algorithm 2 Detecting Coordinated Attacks
Input: Trained Model PCDTA
Output: Coordinated Attack Detection for each Observation Sequence
while Not interrupted do
    if New call-ID then
        S_i ← Create a new Sequence
    end if
    for Every Observation Sequence S_i do
        P_trans ← getSequenceTransitionProbability(S_i)
        P_time ← getSequenceTimingProbability(S_i)
        if P_trans ≤ Th_trans or P_time ≤ Th_time then
            Coordinated Attack Detected for S_i
        end if
    end for
end while
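The per-sequence check of Algorithm 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the per-step probabilities are assumed to have been looked up from the trained model beforehand, the natural logarithm is used to avoid underflow (the base is an assumption), the comparison with the thresholds is taken to be inclusive, and the names `sequence_log_probability` and `is_coordinated_attack` are hypothetical.

```python
import math


def sequence_log_probability(step_probs):
    """Log of the product of step probabilities; -inf if any step is impossible."""
    total = 0.0
    for p in step_probs:
        if p > 0.0:
            total += math.log(p)
        else:
            # An illegal transition has probability 0, so the path is impossible.
            return float("-inf")
    return total


def is_coordinated_attack(trans_probs, timing_probs, th_trans, th_time):
    """Flag a sequence whose path or timing log-probability reaches a threshold.

    trans_probs: per-step transition probabilities (transition times counter).
    timing_probs: per-step timing delay probabilities.
    th_trans, th_time: thresholds in the same (natural log) scale.
    """
    p_trans = sequence_log_probability(trans_probs)
    p_time = sequence_log_probability(timing_probs)
    return th_trans >= p_trans or th_time >= p_time


# Hypothetical example with both thresholds set to log(0.1).
th = math.log(0.1)
print(is_coordinated_attack([0.9, 0.8], [0.5, 0.5], th, th))
print(is_coordinated_attack([0.9, 0.01], [0.5, 0.5], th, th))
```

Summing logarithms instead of multiplying raw probabilities keeps long sequences numerically stable, which is why the thresholds are also expressed in log scale.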
Algorithm 1 counts the number of such timeout events in a particular time window T.
In case of BYE flooding, the PCDTA observes illegal transitions from the start state to the BYE received state, or from any other state to the BYE received state (if the call-ID of the BYE matches that of any other ongoing call). This attack can be detected by counting the number of such illegal transitions in an interval T. In case of REGISTER with random URI, a large number of 404 error messages are generated, as the server does not allow users who do not have accounts at the registrar to REGISTER. It is easy to see that Algorithm 1 can be modified to count such illegal transitions and error transitions. Similarly, in case of multi-attribute flooding all types of messages appear (with their ratio balanced); hence a few messages will time out, a few others will cause illegal state changes, and a few may cause transitions with error. To detect these attacks, Algorithm 1 can be modified to count the respective type of events (transitions) in a time period T.

B. Detecting Coordinated Attacks
During RINGING attacks, the server continues to receive provisional responses (180 RINGING) but no final response (200 OK). The repeated occurrence of any event is taken care of by the counter value associated with the state where the livelock is made. This attack can be detected by computing the probability of the path taken by the event sequence. If RINGING events are retransmitted, the counter at state q2 takes a high value, which results in a lower probability for p(c). Since the probability of an event is the product of the counter and transition probabilities, this decreases the overall probability rapidly. It might also happen that, rather than repeating the provisional responses, the attacker waits for a long time (or stops responding) after sending the RINGING response before sending the OK message. This can also be detected by calculating the path timing probability.
During prolonged call attacks, the system stalls (almost) after it has successfully established the session. The call lengths are usually long and a call is mostly ended by server timeouts. We try to capture this anomaly as an anomalous path timing delay probability. Thus in order to detect these attacks, we monitor the path transition probability and the timing delay probability. If either of these is lower than the threshold set, we identify the sequence as a coordinated attack attempt. Algorithm 2 describes these steps.

IX. EXPERIMENTAL EVALUATION
In this section we describe our experiments to validate the proposed detection techniques. In the next seven subsections we elaborate on the testbed setup, data collection for training, how to set different thresholds, detection performance of PCDTA on various attacks, sensitivity analysis of PCDTA with different threshold values, comparison with recent work on SIP flooding detection, and comparison with another coordinated attack detection method, respectively.

A. Testbed Setup
In order to study the detection performance of PCDTA, we designed and created a testbed similar to the one described in [6] and shown in Figure 6. This testbed simulates two enterprise networks named A and B. We simulated the two enterprise networks with two PCs and used another machine as a server. All three machines run the Ubuntu 12.04 64-bit operating system on an Intel Core i5-4590 CPU running at 3.30 GHz with 8 GB RAM. We installed Asterisk 11.18.0 [14] on the server machine, which acts as a VoIP PBX.⁴ We also installed the network emulator software netem [15] on the proxy server to emulate Internet behavior. We set up a simulation of 15 user agents on both networks A and B. These user agents can make calls to user agents in the other network and to agents in the same network. We used a modified version of VoIP Bot [16] to represent user agents. It uses the JAIN-SIP API for handling SIP messages and the Java Media Framework (JMF) stack for generating RTP media flows. The original version uses a communication agent based on Internet Relay Chat (IRC) for the exchange of commands between the bots and the bot manager, but our version allows the bots to make calls without the help of the manager. All user agents are registered at the Asterisk server. Each agent runs as a separate process on a different UDP port. The Asterisk server runs on port 5060, which is the default port for a SIP server.

⁴ It stands for VoIP Private Branch Exchange, an equivalent of a telephone exchange.

Fig. 6. Testbed setup.

TABLE III
MEAN COUNTER VALUES. (a) CALL PCDTA. (b) REGISTRATION PCDTA

TABLE IV
PROBABILITY FOR TRANSITIONS. (a) TRANSITION PROBABILITY FOR INVITE INITIATED PCDTA. (b) TRANSITION PROBABILITY FOR REGISTER INITIATED PCDTA

B. Traffic Generation for Training
As described earlier, the PCDTA transition model is created from the SIP specification, while the transition probability, counter probability and timing probability need to be learned from a set of successful SIP transactions. To facilitate this learning we collected successful⁵ SIP transaction sequences by scheduling these agents to make calls. While generating this traffic, we set the transmission delay to 50 ms and the packet loss rate to 0.42 percent through the emulator. The emulator randomly drops packets while maintaining this loss ratio and thereby emulates Internet behavior. The user agents were scheduled to make calls at random intervals to randomly chosen bots, while closely mimicking real behavior. Each user agent made calls with a minimum and maximum span of 40 and 120 minutes respectively between successive calls. The network traffic was collected using the open source traffic collector tcpdump [17]. We wrote a Java program using the jNetPcap [18] library to extract the details of SIP messages. In total there were 2055 SIP INVITE initiated transactions and 1800 REGISTER initiated dialogues in 5 days of simulation data. We analyzed these SIP sequences and collected statistics for various parameters to learn the various probabilities.

⁵ Successful SIP calls are those which successfully establish a media session and gracefully terminate.

1) Learning Probability of Counter at Various States: Each of the 2055 INVITE transactions is of the form $s_1, s_2, s_3, \ldots, s_n$ with occasional retransmissions. Each of these events moves the machine to the next state of the PCDTA if it is distinct from the previous event, or otherwise results in a traversal of the self loop at the respective state. An example sequence with retransmissions looks like $s_1, s_1, s_2, s_3, \ldots, s_n$, where the first event occurs two times. We treat these retransmissions as part of the same transaction since they carry the same call-ID. From each sequence and for each transaction we count the number of such retransmissions and derive the average of these values over the period of these 5 days. The mean number of retransmissions for every state is shown in Table III. As we can see from the table, the number of retransmissions is not more than three for any state, indicating that retransmissions are rare. The state q1 has a slightly higher rate of retransmission because, with authentication enabled, it is visited twice: once without credentials and a second time with credentials.
In order to derive the probability of revisiting a particular state we fit a Poisson distribution for every state. Figure 7 shows the distribution graphs for various counter values at various states.
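As a small illustration of this fitting step, the Python sketch below evaluates the Poisson probability mass function for a counter given the mean number of visits to a state. The mean value used in the example is made up for illustration and is not one of the measured means; the function names are hypothetical.

```python
import math


def poisson_pmf(n, mean):
    """Probability that a Poisson counter with the given mean takes value n."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def counter_probabilities(mean, max_count):
    """Counter probabilities p(c = n) for n = 0 .. max_count at one state."""
    return [poisson_pmf(n, mean) for n in range(max_count + 1)]


# Hypothetical mean of 1.2 visits per state: larger counter values quickly
# become improbable, which is what makes repeated revisits look anomalous.
for n, p in enumerate(counter_probabilities(1.2, 6)):
    print(n, round(p, 4))
```

Because a Poisson distribution is fully determined by its mean, storing one mean per state is enough to recover the whole counter distribution at detection time.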
Fig. 7. Poisson distribution graphs for counters at states from q1 to q6 of INVITE dialogue. (a) Distribution C1. (b) Distribution C2. (c) Distribution C3. (d) Distribution C4. (e) Distribution C5. (f) Distribution C6.

2) Learning Transition Probability: As mentioned in Section VI, the transition probability is the ratio of the number of transitions to a particular next state to the number of transitions to all possible next states of a particular state. Using this method we derive the transition probability of all the transitions for both the INVITE dialogue and the REGISTER transaction. Table IV(a) and Table IV(b) show the transition probabilities for the various transitions of the two PCDTAs, learnt from the transaction sequences. All illegal transitions (not shown here) have probability 0.
3) Learning Transition Delay Probability: In order to learn the timing delay probabilities for the various transitions, we calculated the mean delay between successive events of the sequences. Using these mean values we again fit a Poisson distribution for every transition delay probability. Table V(a) and Table V(b) show the average delay observed between successive events in the sequences of the training set. For the sake of brevity we do not show the corresponding probability distribution graphs here (they can be drawn by plugging the mean delay value into the Poisson distribution equation).

TABLE V
MEAN TIMING DELAY VALUES. (a) TRANSITION DELAY FOR INVITE PCDTA. (b) TRANSITION DELAY FOR REGISTER PCDTA

4) Setting Threshold on Timing Constraints for Transitions: In PCDTA, transitions are constrained by their timing of occurrence, and the machine waits for the next event in a particular state only until this timer expires, upon which an epsilon transition is activated that terminates the dialogue with a time expired message. The appropriate threshold for the time guard is also learned from the event training sequences. It is worth noting that there is no timing constraint from the state q4 (which represents that media transmission is on and the machine is waiting for a BYE event), as media transmission can last an arbitrary amount of time and it would be wrong to put a limit on how long the users can stay in a call. These thresholds are calculated from the probability distributions of transition timing delays using Equation 6. In this equation, $\sigma_j$ is the standard deviation, $t_j^{mean}$ is the mean of distribution $j$, and $\alpha$ is an experimental value. We chose $\alpha$ to be 100 for our experiment. This value is derived experimentally so as to minimize false stalled state detection. We do a sensitivity analysis for different values of $\alpha$ in subsection IX-E.

$$f(\delta_i) = t_j^{mean} + \alpha\,\sigma_j \tag{6}$$

We get these threshold values from the mean values calculated in Table V. We declare a state $q_i$ stalled when the maximum timing delay threshold amongst all the transitions from $q_i$ is crossed.

C. Setting Threshold for Attack Detection
We generated and used one more day's SIP transaction sequences to derive and set the various thresholds for attack detection. The rationale behind this is to use an independently generated dataset to set appropriate thresholds. As explained in Section VIII, INVITE and REGISTER flooding attacks are detected by counting the number of timeout transitions in an interval of time T; however, in the training dataset there were no timeout or illegal transitions. In order to derive thresholds for these cases we generated traffic by randomly deleting a random number of relevant transitions so that timeout events are generated at some states. Similarly we also injected random illegal transitions into this dataset. Using the dataset so generated we derived the mean number of timeouts and the mean number of illegal transitions, from which the thresholds (Th) for detecting attacks are set. We used a time period T of 10 minutes. For multi-attribute flooding detection we need a threshold on both types of transitions combined. In each case we set the threshold to 2 times the average number of such transitions appearing in a 10 minute interval, rounded up to the next integer. Table VI shows the values of these thresholds. In subsection IX-E we show the attack detection and false alarm rates obtained with different threshold values.

TABLE VI
FLOODING THRESHOLDS

In order to detect the coordinated attacks we need a threshold on the transition sequence probability and on the transition timing probability. These two are set at 2 times the average of all transition probabilities and timing probabilities, respectively, in this day's sequences. Since the probability of a sequence is calculated as the product of the probabilities of individual events/transitions, there is a chance of underflow in an implementation; hence we converted these calculations into log values. Table VII shows the thresholds on transition and timing probabilities in log scale.

TABLE VII
PATH PROBABILITY THRESHOLD VALUES

D. Traffic Generation for Testing
We generated two sets of testing data to test the detection performance of PCDTA. The first set is normal SIP operation data and was recorded using the same setup as in the training case. In total there were 150 normal intervals (with the call rate as in the training case). The second set is created with different attacks generated in separate intervals. The rate of flooding attacks was increased in steps of 10 samples per interval, from 10 to 100, totaling 10 intervals. For example, the INVITE flooding attack was generated with the first interval having 10 INVITE calls, the second interval 20 INVITE calls, and so on up to 100 INVITEs in a period of 10 minutes, which adds up to 550 INVITE requests. This helps in assessing the detection capability of PCDTA at different attack rates. Similar experiments were repeated for all other cases too. We generated these flooding attacks using custom scripts as detailed below.
1) Flooding Attacks: The flooding attacks were generated through Python scripts which used custom text of INVITE, REGISTER and BYE SIP messages and invoked Scapy 2.3.1 [19] to send the messages to the SIP server using UDP as the transport protocol.
• INVITE flooding: The script sends any required number of INVITE messages to valid/invalid users, but ignores any responses received from the server or the other user.
• REGISTER flooding: We generated these attacks taking into consideration whether authentication is enabled or disabled at the registrar.
a. When authentication is enabled, the script sends REGISTER requests as invalid users, which results in 403 (Forbidden) / 404 (Not Found) error responses at the server.
b. When authentication is disabled, the script sends REGISTER requests to the registrar, but ignores any challenge (called nonce) issued by the server to verify his identity. This results in stalled sequences.
• BYE flooding: These attacks were generated in two ways as follows.
a. The script sends BYE messages with random call-IDs to the server, which result in 481 (Call/Transaction Does Not Exist) error responses.
b. The script sends BYE messages for ongoing communications, hence terminating legitimate calls. The ongoing call information was written to a text file by a packet sniffer cum analyzer written using the jNetPcap library. As the user agent does not send any BYE request, the reception of a corresponding OK is detected as an illegal transition sequence at the proxy server on the other user agent's side (UAS or callee's proxy server).
Using the testing dataset sequences we again calculated the number of timeout and illegal transitions, setting the time window T to 10 minutes, to detect flooding attacks. The details of the detection performance of PCDTA are shown in Table VIII. We use Recall and Accuracy, two commonly used metrics, to assess the performance of PCDTA. Recall is the effectiveness of an algorithm in identifying positive labels. Accuracy is the overall effectiveness of an algorithm. Recall and Accuracy are given by Equation 7 and Equation 8. In these equations the terms carry the following meaning: $tp$ = number of true flooding intervals detected, $tn$ = number of normal intervals (correctly labeled as normal), $fp$ = number of false flooding attack intervals detected, $fn$ = number of attack intervals labeled as normal.

$$Recall = \frac{tp}{tp + fn} \times 100 \tag{7}$$

$$Accuracy = \frac{tp + tn}{tp + tn + fn + fp} \times 100 \tag{8}$$

2) Coordinated Attacks: The VoIP Bot [16] program itself was modified to generate these attacks.
• RINGING attacks: The program now allows bots to send repetitive RINGING responses (minimum number being 3 and maximum being 10), and to send delayed OK responses (with a delay of 4 to 5 minutes) after sending RINGING responses.
• Prolonged call attacks: The bots could now stay in a call for unusually long times. The call times allowed were between 20 and 70 minutes, which is exceptionally long for normal users. The average call time for the training set was 10 minutes in our experiments (generated by transmitting recorded media files of user conversations).
For coordinated attack detection we selected 500 normal sequences (from the testing dataset) and mixed them with sequences of coordinated attacks. Table IX shows the number of sequences, Recall and Accuracy of PCDTA. Unlike the flooding cases, here the true positives, true negatives, false positives and false negatives are counted over sequences rather than intervals, as a coordinated attack is detected for every sequence.

TABLE VIII
FLOODING ATTACK DETECTION RESULTS

TABLE IX
COORDINATED ATTACK DETECTION RESULTS

E. Threshold Parameter Sensitivity Analysis
As mentioned previously, we identify timeout transitions by multiplying the standard deviation of transition timings by the multiplier $\alpha$ and adding the result to the mean of the timing values of events. Hence the value of $\alpha$ affects the number of stalled states detected, and it is essential to choose its value such that we minimize the number of falsely identified stalled states. We performed sensitivity analysis on different values of $\alpha$ from 10 to 200 using the average number of stalled state counts in a window period from training sequences. We used 3 types of dataset to study the sensitivity of false detection to $\alpha$ values: 950 normal INVITE sequences (newly generated), the entire dataset used for the flooding attack case (Table VIII, which has both normal and different flooding scenarios) and the coordinated attack dataset (Table IX). The percentages of falsely identified stalled states⁶ when tested with these 3 types of datasets are shown in Table X. We can notice that the number of falsely identified stalled states lessens as $\alpha$ approaches 100. Beyond 100 it showed no improvement; hence we used 100 as the value of $\alpha$.

⁶ These are not false alarms. False alarms are generated when the number of timeouts in an interval reaches the threshold.

TABLE X
SENSITIVITY OF FALSE STALLED STATE DETECTION TO $\alpha$ VALUES

As mentioned previously, in order to detect flooding attacks we count the number of timeout transitions or illegal transitions in a window period, and to detect coordinated attacks we use the path transition probability (per transition sequence). It is worth noting that the detection performance of PCDTA is governed by the thresholds on these two values.
1) For flooding detection, a threshold value which is too conservative may detect all flooding attack instances, but it may generate too many false detections. On the other hand, a threshold which is too large may miss many genuine flooding intervals. We experimented with different threshold multipliers using the flooding dataset (the entire dataset used for Table VIII) and chose a threshold multiplier of 2, balancing the recall and false alarm rate as shown in Table XI.
2) For detecting coordinated attacks the threshold multiplier is set for the transition sequence probability directly, which keeps on decreasing as more repeated transitions are observed. Thus any sequence probability which is lower than the threshold probability is detected as an attack (since values less than 1 get multiplied). In order to set an appropriate value which minimizes the chances of false alarms we experimented with different values for the threshold multiplier, ranging from 1 to 3 in steps of 0.5. Table XII shows the detection performance and false alarm rate generated by PCDTA with different threshold values (the value is a multiplier of the average number of such transitions in normal intervals) when tested with 1000 instances of normal sequences and 425 instances of each coordinated attack type (as in Table IX). We can notice that for a threshold multiplier of 2 we get 100% recall and an acceptable false alarm rate of $(4 + 9)/1000 \times 100 = 1.3\%$.

TABLE XI
PERFORMANCE OF PCDTA WITH DIFFERENT THRESHOLDS FOR TIMEOUT/ILLEGAL TRANSITION COUNT

TABLE XII
PERFORMANCE OF PCDTA WITH DIFFERENT THRESHOLDS FOR PATH PROBABILITY

F. Comparison of PCDTA With Other Flooding Detection Method
In this subsection we report the comparison of PCDTA with one recent flooding attack detection method from prior art. For comparison purposes we implemented the method proposed in [20] and evaluated its performance on our dataset. The method described in [20] generates statistics of various SIP messages grouped on the basis of call-ID, ToUri and source IP address. The statistics so generated in a block period are compared with two thresholds. Any interval whose traffic statistics do not lie within this range is declared a flooding attempt. The thresholds are derived from the normal sequences data: the maximum and minimum values of the number of packets in a group in a particular interval become the two thresholds. These thresholds get updated after every normal traffic interval is observed. We set the interval here to be 10 minutes, the same as the interval we chose for our case. For most of the cases, the two thresholds came out to be 2 and 5. Two thresholds are set because, if an attacker attempts flooding, the number of packets sent from her side with a unique combination of parameters will be less than that sent by a normal caller, provided she uses different SIP parameters for different attack instances (attack strategy 1). In a second method the attacker can use the same fields in the SIP packets to generate the attacks (attack strategy 2), for example repeating the same IP, call-ID and ToUri; this time, the number of packets in the same group can exceed that of a normal user. Table XIII shows a comparison of recall and false alarm rate for our method and that of [20]. False attacks were detected by the method of [20] mainly in two cases: first, when a legitimate call could not be completed because the other user was busy, and second, when the number of packets in the normal sequences exceeded the maximum threshold, which may occur in case of retransmissions.

TABLE XIII
FLOODING ATTACK DETECTION COMPARISON WITH [20]

In the literature there are other works based on machine learning and anomaly detection techniques to detect flooding attacks. An advantage of PCDTA over other techniques is that it is generic and can detect both flooding and coordinated types of DoS attacks, whereas the methods proposed in the literature are for specific attack types. Machine learning techniques require a large number of features to be extracted and training with each attack type, which is often difficult. The Hellinger distance based technique [6] uses the correlation between different SIP messages for detecting attacks, which may be evaded by cleverly balancing the number of these messages. PCDTA does not need training with any attack type and can detect all the DoS cases considered here. A comparison of the detection capability of different methods is shown in Table XIV.

TABLE XIV
DETECTION FEATURE COMPARISON

G. Comparison of PCDTA With Random Early Termination Method
We also compare our method to a technique called Random Early Termination (RET) proposed in [2]. Random Early Termination is a technique to terminate connections which are in the RINGING stage to avoid a resource crunch at the proxy server. It selects active sessions probabilistically for termination based on the age of the session in the RINGING phase.
All the active sessions or transactions which are in the RINGING stage are sorted based on their starting time, and two thresholds T1 and T2 are used for identifying calls which are to be terminated. The most recent T1 calls are not considered for termination (white area in Figure 8). This is done to avoid terminating any call which has just been initiated. All calls which are older than T1 but less than T2 are dropped with probability $prob$, which is given by Equation 9 (grey area in Figure 8). If the number of active RINGING transactions is greater than T2, all transactions whose sequence number is older than T2 (black area in Figure 8) are definitely terminated.

Fig. 8. Random early termination.

$$prob = 1 - e^{-\frac{age - MRTT}{MRTT}} \tag{9}$$
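To illustrate how the drop probability of Equation 9 behaves, the Python sketch below evaluates it for a few ringing ages. It assumes the form $prob = 1 - e^{-(age - MRTT)/MRTT}$, reconstructed from the garbled equation, and it assumes that calls whose age does not exceed MRTT are never dropped; the function name is hypothetical.

```python
import math


def ret_drop_probability(age, mrtt):
    """Drop probability for a call that has been ringing for `age` seconds.

    mrtt is the Minimum Ringing Time Threshold in seconds. Calls that have
    not exceeded it are never selected (assumption stated in the lead-in).
    """
    if age > mrtt:
        return 1.0 - math.exp(-(age - mrtt) / mrtt)
    return 0.0


# With MRTT = 18 seconds, the probability grows from 0 toward 1 with age.
for age in (10, 18, 27, 36, 90):
    print(age, round(ret_drop_probability(age, 18), 3))
```

Under this form a call that has been ringing for twice the MRTT is dropped with probability $1 - e^{-1}$, roughly 0.63, so older ringing sessions are increasingly likely to be terminated.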
In Equation 9, MRTT is the Minimum Ringing Time Threshold and age is the duration of RINGING time. MRTT is the threshold on the age of a RINGING call below which it is not considered for termination (regardless of the number of active transactions). We conducted experiments similar to the one described in [2]. We generated 900 normal calls and also Ringing attacks in a span of 25 minutes (which is a higher rate of calls compared to our previous experiments). Normal calls and Ringing attacks are generated using bots as described previously. Table XV shows the performance comparison of PCDTA with the RET method. In case of RET, the number of calls terminated by it is used to calculate Recall and Accuracy. For these experiments we set threshold values T1 = 18 and T2 = 25 calls and MRTT to 18 seconds in case of RET, and used the same path probability threshold (15.2737 in log scale) for PCDTA as in our previous experiment. These thresholds are chosen after observing the performance of the RET method for different values. The value of MRTT is set to 18 seconds because normal calls in the dataset we generated had Ringing durations ranging from 5 seconds to slightly over 18 seconds. We can notice that PCDTA outperforms RET in all cases. The lower Recall and Accuracy of RET arise because it does not terminate any call, even if its age is greater than MRTT, when the number of concurrent transactions does not cross T1, and because it may also terminate some normal calls if any normal call's ringing time is greater than MRTT.
In order to assess the performance of RET against different threshold values and MRTT values we performed a sensitivity analysis using different values. Table XVI shows the detection performance of RET with MRTT set at 18 seconds and for different threshold values of T1 and T2. We can see that as the threshold values are decreased, more ringing sequences are detected. Similarly, in the second experiment we set the threshold values T1 and T2 at 18 and 25 respectively (as these thresholds detected the maximum number of ringing sequences in the previous case) and varied the MRTT value between 12 and 18 seconds in steps of 3 seconds. Table XVII

1) Preventive Techniques: The SIP specification does not suggest any particular security mechanism; instead it allows the use of other security mechanisms like TLS and S/MIME for securing SIP messages [26]. Preventive methods are mainly in the form of cryptographic extensions to secure SIP [27]. These techniques authenticate messages exchanged between user agents and SIP servers, preventing malicious users from generating spoofed requests.
2) Detection Techniques: Detection methods broadly fall under four categories: signature based, statistical methods, machine learning based and formal models.
a) Signature based detection: Signature based approaches have encoded patterns for different types of attacks, and the detection system systematically scans the incoming traffic for these patterns. Works in [28]–[30] propose signatures for detecting DoS caused by maliciously formed SIP messages. These patterns are based on the SIP grammar as defined in RFC 3261 [4]. However, these approaches can only detect the previously encoded attacks. A framework to describe known vulnerabilities in SIP and to prevent such vulnerabilities is described in [31].
b) Statistical methods: Statistical deviation detection approaches attempt to detect abnormalities in the incoming messages by observing significant deviation from normal behavior. Many researchers [6], [24], [25] have proposed using the Hellinger distance (HD) to measure the aberration between normal and attack traffic distributions. Since these approaches identify differences in the occurrence counts of events of various types, they may sometimes generate false alarms or fail to detect some attacks. For example, an attacker can craft a flooding attack balancing the different types of messages (INVITE, REGISTER, BYE, OK, ACK) such that there are no differences in distributions (multi-attribute flooding) compared to the normal scenario, rendering some detection methods ineffective. Reynolds and Ghoshal [5] proposed to measure the difference between the number of attempted connection establishments and the number of completed connections. This
shows the performance of RET with these values. We can is motivated by the fact that, in flooding cases there would be
again observe that lower MRTT value can detect many attacks many call setup requests which are not completed. A signal
as in this case more number of transactions qualify to be processing technique which observes the change in energy
counted for the thresholds T1 and T2. However as many level of a wavelet to detect slow rate SIP floods is described
normal sequences also have ringing time greater than MRTT in [32].
and are likely to be terminated by probabilistic selection which c) Machine learning methods: There are several attempts
increases the false alarm rate. to use machine learning methods as anomaly detector for
VoIP flooding. Nassar et al. [23] proposed a machine learning
approach to classify and detect SIP traffic and also different
X. P RIOR W ORK
flooding attacks. They used 38 features extracted as statistics
In this section we describe prior work related to SIP flooding of various message types and intervals to train a SVM. Akbar
and anomaly detection. A very brief discussion about most and Farooq [21] evaluated two machine learning algorithms
closely related work in discrete event systems used to detect (Naive Bayes and Decision Tree) using a set of features
anomalies in other domains is also given here. extracted from a window period to detect flooding attacks.
These features are extracted as statistics directly from first
line of SIP packets. Tsiatsikas et al. [22] proposed to generate
A. Voice Over IP Denial of Service a hash value from first line of SIP packet and count number of
To protect against SIP flooding attacks, there are prior works unique hash values in a time window as feature. Similar to [21]
which can be classified as preventive methods (which try it used different machine learning algorithms (SMO, Naive
to secure SIP itself) and detection methods which identify Bayes, Neural Network, Decision Tree and Random Forest
flooding cases by monitoring network traffic. classifiers) to detect attacks of different rates. Mehta et al. [33]

TABLE XV
COMPARISON OF PCDTA WITH RANDOM EARLY TERMINATION METHOD

TABLE XVI
SENSITIVITY ANALYSIS OF RANDOM EARLY TERMINATION METHOD WITH THRESHOLDS

TABLE XVII
SENSITIVITY ANALYSIS OF RANDOM EARLY TERMINATION METHOD WITH MRTT
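
The RET behavior evaluated in Tables XV-XVII can be illustrated with a minimal sketch. It is not the implementation of [2]: the `RingingCall` type, the `ret_select` function and the linear ramp of the termination probability between T1 and T2 are assumptions introduced here for illustration. Only the roles of T1, T2 and MRTT follow the description in the text.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class RingingCall:
    call_id: str   # hypothetical identifier of the transaction
    age: float     # seconds the call has spent in the RINGING state


def ret_select(calls: List[RingingCall], t1: int, t2: int, mrtt: float,
               rng: random.Random) -> List[str]:
    """Return the ids of RINGING calls chosen for termination."""
    if t2 <= t1:
        raise ValueError("t2 must be greater than t1")
    n = len(calls)
    # At or below T1 nothing is terminated, however old a call is.
    if n <= t1:
        return []
    # Assumption: probability grows linearly from 0 at T1 to 1 at T2.
    p = min(1.0, (n - t1) / (t2 - t1))
    # Calls younger than MRTT are never considered; the rest are
    # picked by probabilistic selection.
    return [c.call_id for c in calls if c.age > mrtt and rng.random() < p]


# Two long-ringing calls under light load: none is terminated.
light_load = [RingingCall("a", 30.0), RingingCall("b", 40.0)]
print(ret_select(light_load, t1=18, t2=25, mrtt=18.0, rng=random.Random(0)))  # []
```

Under these assumptions the sketch shows both effects discussed for RET: attack calls older than MRTT survive while the load stays at or below T1, and once the load exceeds T1 any normal call ringing longer than MRTT becomes a candidate for termination.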

have pointed out various inefficiencies of machine learning techniques in detecting attacks and their high degree of false alarms, particularly for methods using Euclidean distance.

d) Formal models: Many prior works have used formal methods to study and analyze the behavior of the SIP protocol. The works described in [34] and [35] model SIP INVITE transactions using Coloured Petri Nets (CPNs) over reliable and unreliable transport media. They examine the effect of various event losses and the corresponding state space models to find defects in SIP's implementation. Another work [36] uses CPN state space analysis to identify the states in the SIP INVITE transaction which are vulnerable to Denial of Service attacks. The standard INVITE server transition model described in RFC 3261 is annotated with state constraints in [37] to detect flooding attacks. The work described in [38] proposed an extended CPN model, known as the timed Hierarchical CPN (HCPN), as an intrusion detection system for detecting SIP post-session and pre-session flooding attacks. One major limitation of these techniques is that they either verify SIP protocol behavior or mainly focus on detecting only INVITE flooding attacks.

Some recent surveys on various SIP vulnerabilities and their detection methods can be found in [3] and [39].

B. Discrete Event Systems

Discrete Event System modeling has been extensively used to formally describe various types of failures in control systems [40]. Klerx et al. [7] described a probabilistic automaton to identify anomalies in discrete event systems. In particular, they consider ATM operation as a discrete event system and describe a range of anomalies found in discrete event systems. A probabilistic learning automaton is used to characterize the behavior of the system and detect these anomalies. A discrete event system based model is proposed for detecting faults in powerline networks [41]. The X10 powerline network system is used as a case study and, based on the commands exchanged, an automaton is generated which represents the normal behavior profile and is subsequently used to detect anomalies.

XI. CONCLUSION

Session Initiation Protocol has become the de facto standard for session management in Voice over IP implementations. SIP is vulnerable to a range of Denial of Service attacks, including flooding and coordinated attacks. In this paper we modeled different SIP dialogues and transactions as discrete event systems and proposed a probabilistic state transition machine to describe these dialogues and transactions. Further, we identified a range of anomalies generated in a DES. We described algorithms to detect various DoS attacks using the proposed state transition model. We designed and experimented with a range of DoS attacks generated through custom programs and report that the proposed DES model can detect these attacks with high accuracy and detection rate.

REFERENCES

[1] G. Ormazabal, S. Nagpal, E. Yardeni, and H. Schulzrinne, "Secure SIP: A scalable prevention mechanism for DoS attacks on SIP based VoIP systems," in Proc. Principles, Syst. Appl. IP Telecommun. Serv. Secur. Next Generat. Netw., 2008, pp. 107-132.
[2] W. Conner and K. Nahrstedt, "Protecting SIP proxy servers from ringing-based denial-of-service attacks," in Proc. 10th IEEE Int. Symp. Multimedia (ISM), Dec. 2008, pp. 340-347.
[3] A. D. Keromytis, "A comprehensive survey of voice over IP security research," IEEE Commun. Surveys Tut., vol. 14, no. 2, pp. 514-537, 2nd Quart. 2012.
[4] J. Rosenberg et al., "SIP: Session Initiation Protocol," document RFC 3261, 2002.

[5] B. Reynolds and D. Ghoshal, "Secure IP telephony using multi-layered protection," in Proc. 10th Annu. Netw. Distrib. Syst. Secur. Symp. (NDSS), 2003, pp. 1-13.
[6] H. Sengar, H. Wang, D. Wijesekera, and S. Jajodia, "Detecting VoIP floods using the Hellinger distance," IEEE Trans. Parallel Distrib. Syst., vol. 19, no. 6, pp. 794-805, Jun. 2008.
[7] T. Klerx, M. Anderka, H. K. Büning, and S. Priesterjahn, "Model-based anomaly detection for discrete event systems," in Proc. IEEE 26th Int. Conf. Tools Artif. Intell. (ICTAI), Nov. 2014, pp. 665-672.
[8] R. Alur and D. L. Dill, "A theory of timed automata," Theor. Comput. Sci., vol. 126, no. 2, pp. 183-235, 1994.
[9] C. Meiners, E. Norige, A. X. Liu, and E. Torng, "FlowSifter: A counting automata approach to layer 7 field extraction for deep flow inspection," in Proc. IEEE INFOCOM, Mar. 2012, pp. 1746-1754.
[10] R. C. Carrasco and J. Oncina, "Learning stochastic regular grammars by means of a state merging method," in Proc. Int. Colloq. Grammatical Inference Appl., 1994, pp. 139-152.
[11] D. D. Ron, Y. Singer, and N. Tishby, "On the learnability and usage of acyclic probabilistic finite automata," in Proc. 8th Annu. Conf. Comput. Learn. Theory, 1995, pp. 31-40.
[12] C. De La Higuera, Grammatical Inference: Learning Automata and Grammars. New York, NY, USA: Cambridge Univ. Press, 2010.
[13] S. M. Ross, A First Course in Probability, 8th ed. New York, NY, USA: Prentice-Hall, 2010.
[14] [Online]. Available: http://www.asterisk.org/
[15] Netem. [Online]. Available: https://wiki.linuxfoundation.org/networking/netem
[16] M. Nassar, R. State, and O. Festor, "Labeled VoIP data-set for intrusion detection evaluation," in Proc. 16th EUNICE/IFIP Conf. Netw. Serv. Appl. Eng. Control Manage., 2010, pp. 97-106.
[17] Tcpdump. [Online]. Available: http://www.tcpdump.org/
[18] jNetPcap. [Online]. Available: http://jnetpcap.com/
[19] Scapy. [Online]. Available: http://www.secdev.org/projects/scapy/
[20] J. Lee, K. Cho, C. Lee, and S. Kim, "VoIP-aware network attack detection based on statistics and behavior of SIP traffic," Peer-to-Peer Netw. Appl., vol. 8, no. 5, pp. 872-880, 2015.
[21] M. A. Akbar and M. Farooq, "Securing SIP-based VoIP infrastructure against flooding attacks and spam over IP telephony," J. Knowl. Inf. Syst., vol. 38, no. 2, pp. 491-510, 2014.
[22] Z. Tsiatsikas, A. Fakis, D. Papamartzivanos, D. Geneiatakis, G. Kambourakis, and C. Kolias, "Battling against DDoS in SIP. is machine learning-based detection an effective weapon?" in Proc. 12th Int. Conf. Secur. Cryptogr. (SECRYPT), 2015, pp. 301-308.
[23] M. Nassar, R. State, and O. Festor, "Monitoring SIP traffic using support vector machines," in Proc. 11th Int. Symp. Recent Adv. Intrusion Detection (RAID), 2008, pp. 311-330.
[24] J. Tang, Y. Cheng, Y. Hao, and W. Song, "SIP flooding attack detection with a multi-dimensional sketch design," IEEE Trans. Depend. Sec. Comput., vol. 11, no. 6, pp. 582-595, Nov./Dec. 2014.
[25] J. Tang, Y. Cheng, and C. Zhou, "Sketch-based SIP flooding detection using Hellinger distance," in Proc. 28th IEEE Conf. Global Telecommun. (GLOBECOM), Nov. 2009, pp. 1-6.
[26] D. Geneiatakis, G. Kambourakis, T. Dagiuklas, C. Lambrinoudakis, and S. Gritzalis, "SIP security mechanisms: A state-of-the-art review," in Proc. 5th Int. Netw. Conf. (INC), 2005, pp. 147-155.
[27] R. Farley and X. Wang, "VoIP Shield: A transparent protection of deployed VoIP systems from SIP-based exploits," in Proc. IEEE Netw. Oper. Manage. Symp. (NOMS), Apr. 2012, pp. 486-489.
[28] D. Geneiatakis, G. Kambourakis, C. Lambrinoudakis, T. Dagiuklas, and S. Gritzalis, "A framework for protecting a SIP-based infrastructure against malformed message attacks," Comput. Netw., vol. 51, no. 10, pp. 2580-2593, 2006.
[29] S. Ehlert et al., "Two layer denial of service prevention on SIP VoIP infrastructures," Comput. Commun., vol. 31, no. 10, pp. 2443-2456, 2008.
[30] A. Lahmadi and O. Festor, "VeTo: An exploit prevention language from known vulnerabilities in SIP services," in Proc. Netw. Oper. Manage. Symp. (NOMS), 2010, pp. 216-223.
[31] A. Lahmadi and O. Festor, "A framework for automated exploit prevention from known vulnerabilities in voice over IP services," IEEE Trans. Netw. Service Manage., vol. 9, no. 2, pp. 114-127, Jun. 2012.
[32] J. Tang and Y. Cheng, "Quick detection of stealthy SIP flooding attacks in VoIP networks," in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2011, pp. 1-5.
[33] A. Mehta, N. Hantehzadeh, V. K. Gurbani, T. K. Ho, J. Koshiko, and R. Viswanathan, "On the inefficacy of Euclidean classifiers for detecting self-similar session initiation protocol (SIP) messages," in Proc. 12th IFIP/IEEE Int. Symp. Integr. Netw. Manage. (IM), May 2011, pp. 329-336.
[34] L. Ding and L. Liu, "Modelling and analysis of the INVITE transaction of the session initiation protocol using coloured Petri Nets," in Proc. 29th Int. Conf. Appl. Theory Petri Nets, 2008, pp. 132-151.
[35] L. Liu, "Verification of the SIP transaction using coloured Petri Nets," in Proc. 32nd Austral. Comput. Sci. Conf., 2009, pp. 63-72.
[36] L. Liu, "Uncovering SIP vulnerabilities to DoS attacks using coloured Petri Nets," in Proc. 10th IEEE Int. Conf. Trust, Secur. Privacy Comput. Commun., Nov. 2011, pp. 29-36.
[37] D. Seo, H. Lee, and E. Nuwere, "SIPAD: SIP-VoIP anomaly detection using a stateful rule tree," Comput. Commun., vol. 36, no. 3, pp. 562-574, Mar. 2013.
[38] Y. Ding and G. Su, "Intrusion detection system for signal based SIP attacks through timed HCPN," in Proc. 2nd Int. Conf. Availability, Rel. Secur. (ARES), 2007, pp. 190-197.
[39] S. Ehlert, D. Geneiatakis, and T. Magedanz, "Survey of network security systems to counter SIP-based denial-of-service attacks," Comput. Secur., vol. 29, no. 1, pp. 225-243, 2010.
[40] C. G. Cassandras and S. Lafortune, Introduction to Discrete Event Systems. New York, NY, USA: Springer, 2008.
[41] A. Arora, R. Jagannathan, and Y.-M. Wang, "Model-based fault detection in powerline networking," in Proc. 16th Int. Parallel Distrib. Process. Symp. (IPDPS), 2002, pp. 1-8.

Diksha Golait was born in Bhopal, India. She received the B. Tech. degree in computer science engineering from IIT Indore, India, in 2016. In July 2016, she joined Microsoft India (Research and Development), where she currently pursues a career in software development. Her research interests include network and system security.

Neminath Hubballi received the Ph.D. degree from the Department of Computer Science and Engineering, IIT Guwahati, India. He was with corporate Research and Development Centers of Samsung, Infosys Lab and Hewlett-Packard. He is currently an Assistant Professor in computer science with IIT Indore, India. He has authored or coauthored in the area of security. He is also a regular reviewer in many security journals and conferences and also served as a TPC member of several conferences. His areas of interest include network and system security.
