Está en la página 1de 6

2015 Fourth International Conference on Cyber Security, Cyber Warfare, and Digital Forensic

MLDED:
Multi-Layer Data Exfiltration Detection System
Mohammad Ahmad Abu Allawi, Ali Hadi, Arafat Awajan
Computer Science Dept.
Princess Sumaya University for Technology (PSUT)
Amman, Jordan
mohammad.allawi@gmail.com, {a.hadi, awajan}@psut.edu.jo

Abstract— Due to the growing advancement of crime ware sophisticated one. Each level deals with a specific complexity degree of data
services, the computer and network security becomes a crucial issue. exfiltration attack. MLDED is designed to be for forensic purposes in
Detecting sensitive data exfiltration is a principal component of each addition to its high capability in data leakage detection.
information protection strategy. In this research, a Multi-Level Data
Exfiltration Detection (MLDED) system that can handle different II. LITERATURE REVIEW.
types of insider data leakage threats with staircase difficulty levels and Data-exfiltration systems are designed in many techniques and using
their implications for the organization environment has been different algorithms depending on the scenario and application under study.
proposed, implemented and tested. The proposed system detects
exfiltration of data outside an organization information system, where Shu et al. suggested Data Leakage Detection DLD Algorithm called
the main goal is to use the detection results of a MLDED system for fuzzy fingerprint, which can be used to detect the unauthorized data leakage
digital forensic purposes. MLDED system consists of three major or extraction due to haphazard human errors or application imperfections
levels Hashing, Keywords Extraction and Labeling. However, it is with emphasis on the Privacy-Preserving feature of sensitive data.
considered only for certain type of documents such as plain ASCII text Kemerlis, et.al. 2010 built a system called iLeak that composed of three
and PDF files. In response to the challenging issue of identifying major components: Uaudites, Inspectors, and Trail Gateway. Where
insider threats, a forensic readiness data exfiltration system is designed Uaudites is an agent installed on client’s computer that monitors the
that is capable of detecting and identifying sensitive information leaks. activities, these components are OS tools that could be a limitation of the
The results show that the proposed system has an overall detection system.
accuracy of 98.93%.
Liu, et al. (2009) developed a systematic multi-level algorithm called
Sensitive Information Dissemination Detection (SIDD) system. SIDD
Keywords— 
 
, Data consists of three major building blocks: Identification of network level
Hiding, Data Theft, Data Loss application, Content signature generation and detection, and Covert
communication detection. SIDD system depended on the idea of con-tent
I. INTRODUCTION signature matching. This technique is fruit yielding just in cases when the
whole monitored content is identical to the media sensitive data
The increasing growth of the network systems information exchange
along with the high advances of information technologies in everyday life, Al-Bataineh and White (2012) proposed a novel technique to analyze
and the increasing in the complexity of attacks regardless of the enhanced and detect the malicious data exfiltration in web traffic. The pro-posed
security measures make the need for network security of crucial essence [1]. technique analyze the data leakage exfiltration behaviors of one of the most
well-known  
 
         
Data Exfiltration can be defined as the “Unauthorized extraction of data stealing is identified by a classification algorithm.
information or data from a host“[2]. The essential importance of data as
enterprise assets is leading to an increasing concern and a serious growing Although most of proposed data exfiltration techniques de-pend on the
need for systems that detect, prevent, and mitigate such violations. Since a idea of whole content matching or fuzzy hashing, they exhibits a shortage
boost in insider threats has been recorded specifically in incidents of data in detection of the partial sensitive data exfiltration or the data leakage
exfiltration, data exfiltration becomes a principal part of each network attacks that designed to steal data from a sensitive database of considerable
security system. sizes
Data Exfiltration happened in most cases as a sequence of an intrusion III. PROPOSED SYSTEM
on a victim machine. The intrusion can range from a complex snippet of
code written by the attacker or by an organized team of intruders to guiltless MLDED System consists of two main phases: Pre-processing phase and
unauthorized login by a close friend [3] Leakage detection phase where each phase works in cascading manner,
namely, one cannot proceed to the next phase unless the duty of the previous
Cybercrime nature has been transformed to be more robust, automated one is performed. Each phase consists of building blocks that may work in
and designed with high difficulty levels. A variety of security models and both modes: parallel or sequentially depending on the involved phase.
algorithms are proposed, developed and implemented, but they do not reach
to a level they are fit to detect this complicated level of attacks. Furthermore,   
         
 
these crime ware services represented by organization sensitive data leakage Hashing, Keyword Extraction and labeling levels. Each detection level is
and exfiltration are recently sold as a principal part of a growing designed to handle a specific difficulty level of data leakage attack. Fig. 1.
underground economy. This underground economy has provided an depicts the MLDED system prototype.
organized market where more complicated crime wares raised and created
[4].
Therefore, in this research a Multi-Layer Data Exfiltration Detection
(MLDED) system has been proposed based on multi-level detection
framework that can handle the security breaches from a plain level to the

978-1-4673-8499-5/15 $31.00 © 2015 IEEE 108


107
DOI 10.1109/CyberSec.2015.29

   
    
          
7
3
  ( 
 
  
   

!"!$
     
  ), 
"
 
    9  "'
 2 '* 

 
 9 '
, 
 
, 
"
      
"
 
  /
$
 
 

"
 

 
  
  
   
 
  
  
 2*
& 




"
 
  25(+:* 

 4;6& 

      



 
  


 
$

 



  '  
 2+*  
              3 



      
7
   


    
   
            


<0 
9 1 < '
 
 



$<0 
9 1 )+5+=><
    
   &  ' 

   /
  

0   9209* $   
$ 
 

&
   2+* 9 "'
 
 ' 2  (  '*     /
   
  !"!#$% !    

$ 


 
4?6
9 '
  2+*
   


A. 
   '(  
2$ * 


  
&  ' 
 ( 
   !"! $   2*
 
   )* "'
"

+*!
"


 ,
 
  -
 % (-!#    ' & '
  
   
 
.& 
'  
   % (-!#    
   ) #  9 ' 
 !$
  9 ' #   
  
        "$   !"!   
  '
  

  $  
 
  '
 !"! $ & !"! $ /     
7
    $
    '
    

), #
   

 

$       
7




    
   
    
  

B.  
     
7
 

& !"! $ 
      
  &  
 
0    
1     $
     2* 

 
  
   
      
   2+* 
0      $

        
     /
$
 


 

    
"
 
 &  


  
   

 
2
*  
&


      





    

 
 
'
   
2+*
 
 
&  


  $ 

&

   
  '




&

 


'



1) 
    
  $ 


  
&  
  

  !"!$ 
 &  $    $  
 


     )
 
 
       & 
  
       
  


   


   !"! $    

  (
 
  &

 
   
  



    
   
     
 
   
3  

 $ 
  

     
    $ 
  
      
     
  
   

  

    '    8        
456  


 
 
 


    2=*&   $
 

 $       
  &     
   
  
/
)




   
 
   
  
 

-
         
   
 
  
 &  
' 

   
  
/   &              
    
 
 

 ' 

    

  
       $ 

 
     
7
     
  

8     

    2*
  
 
'

$2+*-
    
#.3     $
  !    ! 
2)  

&          
    

  "

 !   ! 


#
   
  $   
  ' 



   $&
    

'   2* 

  

108
109
   

 #   
#
   / -  * 

 

   

 
$% (/ (
0
&-  &
 &  &' 
#   
0  


 
 , $

 #
%
 ( (' 
#  
 
 
)   *  #
  
  $  
 

 



    
 &
  
&) & $   
      
+  
  !"!$
 

 
 


&   
 
  



$
/2  *
  $  

+  


 
 , $

 #
%


$  
  $  
   



 &  
  
  
3) !"
$  
   &
  
  

3   

   $  ( 
  
   $  
  

 $ 
   $ 
   
  $  

   
  
2+* 2=*
   2=*  
   $  1   
   $   
 


$ 

  
7


  $  


    
    2
* 
   $     

  
  

$     
 
    $  
$ 
         

 
       
 &   
    $
 
    +  
 
$         ( 
           
  $  


  

     " $      
7

 $ 
@ 

  
  
7


  '
  $    $


  
   
  $  &  
     
   $  
 

  
 
   
 
   
7
 

&     < 
A         2
  
$ *  
   
  
 
  ( 



 
 
-
  $    
7
 
 
  
  
 
   $
 
         

   
$ 

     

  $ 
$ 
&     
         



  $ 
 &     7

 

   

(
  
 

    
 -
  


 
  $       $
 

 
@   
8
   
 
       
  
7
 $ 


    
3 
   $  $

 '$
    
  $

  
7
-

 
   $    
  )
  


(
     

     $       

   

& 
$
2=*  
  $ 

     
 $ 
  
8
  
   $     
           +#
  B$  

 
  $ 
     
2+* 
&   $  
  
  $ 
 
  
  $ ' 

   
IV. &"#&-1C31!"D0"9- "1&39"#E&#
 
     
&   $ 



  
/
) !"!$  
 
0$
+; 


 
 0$, , 
$ " 
 =F    
 

- -.
   
  
 
  

 



'E 
+:F  -
, 
5 +! ,0E += C.   93  ::C% !"!  
   $
 
    
 
 
  

  
$ 
G 
-77  
C 
 &  &' 
#    & 
 !"!$ 



  
 
 


  
 ( (' 
#   !"! $  
 &  
      

109
110
conducted in Izzat Marji Group by using 160 sensitive files of the target Table 3 Leakage Threshold Percentage Meaning of Level (2)
organization. Then, capturing the network traffic for a whole day, and Leakage Threshold
Meaning
creating “fake” sensitive leaked files where four of the employees shared in
the process of MLDED system simulation. Data traffic on Monday is the
largest in Izzat Marji Group among four working days (Sunday, Monday, 30% 30% of the captured file contained in the
Tuesday, and Wednesday), thus it was selected to be the test sample for corresponding sensitive file
MLDED system. Monday has the highest network traffic, where on that day 60% 60% of the captured file contained in the
4299 files were extracted from the network. Therefore, those 4299 files corresponding sensitive file
extracted on Monday will be used as the dataset of the MLDED system- 75% 75% of the captured file contained in the
corresponding sensitive file
tuning phase. In addition, this dataset will be used to test the operation of
90% 90% of the captured file contained in the
MLDED system. corresponding sensitive file
Evaluation Criteria
The implemented MLDED system level (2) was run on these thresholds
The performance of MLDED system is evaluated using true positive separately where each threshold shows different experimental results and
(TP), true negative (TN), false negative (FN), false positive (FP), Accuracy, different performance in terms of both standard measures (TN, TP, FP, and
False Positive Rate (FPR), Detection Rate (DR), False Negative Rate FN) and its derivatives.
(FNR), and Precision.
In this experiment, nine sensitive files have intentionally been leaked in
True positive indicates the number of leaked sensitive files that are sake of evaluating the performance of this level of MLDED system where
correctly classified by the data-exfiltration system as leaked sensitive files the intrusion files (leaked files) were designed to suit the complexity of the
and it is considered a sign of proper detection of data leakage whereas the level (2). In case of the 30% threshold, level (2) successfully captured 12
True negative indicates the number of legal files that are correctly passed as leaked files out of 12 fake leaked files. Table 4 and Table 5 list the
legal ones by the data-exfiltration system. performance of level (3) at threshold of 30%.
False positive indicates the files that were incorrectly classified as Table 4 The Standard performance measures of Level (2) at 30%
leaked sensitive files, whereas they are non-leaked insensitive files and it Standard performance Measure No. of files
represents the accuracy of the detection system. False negative indicates TP 13
files that were incorrectly detected as legal files, whereas they are leaked TN 3632
sensitive files (intrusion activities). A false negative is a direct sign of the FP 654
inability of the data-exfiltration system to detect the intrusion. FN 0

Experimental Results of Level (1) Tuning Phase Table 5 The performance evaluation metrics of Level (2) at 30%
This section present and discuss the experimental results obtained in
case of running the level (1) of the MLDED system as a separate unit to MEASURE TNR DR FPR FNR ACCURACY PRECISION
ensure its performance. Ten sensitive files have intentionally been leaked in
sake of evaluating the performance of this level of the MLDED system VALUE 84.741 100% 15.25% 0 84.78% 1.949%
where the fake leaked files was designed to suit the complexity of the level %
(1). Table 1 and Table 2 shows the experimental results achieved at this
level.
As shown in Table 4 and Table 5, Level (2) of the implemented MLDED
Table 1 The Standard performance measures of Level (1) system has achieved a high detection rate (DR) of 100 % if the leakage
Standard performance Measure No. of files threshold set to 30%. However, the precision decreased severely to less than
TP 10 2% due to higher FP in comparison with the attained TP values.
TN 4289 On the other hand, true negative rate (TNR) of this level is lower than
FP 0 that achieved by level (1) due to the high false positive rate of this level,
FN 0 which affect the overall performance of MLDED system. In case of the 60%
threshold, level (2) successfully captured 13 leaked files out of 13 fake
Table 2 The performance evaluation metrics Level (1) /Tuning Phase leaked files. Table 6 and Table 7 list the performance of the level (2) if the
threshold set to 60%.
MEASURE TNR DR FPR FNR ACCURACY PRECISION
Table 6 The Standard performance measures of Level (2) at 60%
VALUE 100 % 100% 0 0 100% 100% Standard performance Measure No. of files
TP 13
TN 4078
Since this level checks for file hashes and uses the hash values to FP 208
compare the files, it achieved high detection rate of 100% and with high FN 0
accuracy and sensitivity of 100%. These results for this level were expected,
since no file can bypass if it is completely leaked, because the leaked and Table 7 The performance evaluation metrics of Level (2) at 60%
sensitive file share the exact hash value. The high precision of this level will
enhance the performance of the other subsequent levels (level (2) and level MEASURE TNR DR FPR FNR ACCURACY PRECISION
(3)) by decreasing the number of leaked data samples to be checked.
VALUE 95.14% 100% 4.85% 0 95.1617% 5.88%
Experimental Results of Level (2) / Tuning Phase
This section present and discusses the experimental results achieved
when running level (2) of the MLDED system as a separate unit to ensure As shown in Table 6 and Table 7, Level (2) of the implemented MLDED
its performance. Level (2) has three leakage thresholds that measure the system has achieved the same high detection rate of 100 % as achieved in
percentage of the captured file that contained in the corresponding sensitive case of the 30% threshold since the FN is zero.
file. After doing different experiments, three leakage percentages have been
chosen for the tests as shown in Table 3: However, the accuracy that achieved at 60% threshold is more than that
achieved in case of 30% due to decreasing in the FP value. To increase the
performance of the level (2) and in an attempt to achieve a better detection

110
111
   

 ;5H& ?
I  
   
3 '

     2=*  
 

    ;5H   
  2 *    
   

  $     3 
 ' 

& ?&#
 
 2+*;5H  
  
   

& +)
#
 
  1   
&0 = & + 0
  

 2=*
&1 F:F+   


0 += & 
1 : =:H =:H $  



   
>:H =:H $  



   
& I& 

  2+*;5H I:H =:H $  



   

"3#E9" &19 !9 09 19 3,,E93,J 09",-#-81 & 


 !"! $  2=*  

 
  $        
 ' 

K3E" IIFH ::H :::5> : IIFFH =>H 
 
 

  
2&1
&00
1*
   
&


  I:H2+* -
 ' 



   


$ 

$  +      =            
   
    $ 2 



    
    !"!$ 
 !"!$ 
* 
 $
    
   


$ 

&0
    
 
 
     

$ -
 =:H 2=*+= &


  & :
&   
   
  $2&0*
   
2=*  I:H 
20*& =
& F  
 
2=*=:H  
& :&#
 
 2+*I:H
#
 
  1    & =&#
 
 2=*=:H
&0 + #
 
  1   
&1 F+;= &0 ?
0 F &1 F;>
1  0 5
1 
& & 

  2+*I:
& F& 

  2=*=:H
"3#E9" &19 !9 09 19 3,,E93,J 09",-#-81
"3#E9" &19 !9 09 19 3,,E93,J 09",-#-81
K3E" II>H I+=H :::=+ ::;> II>H F>5=H
K3E" II>;H I+=H :::=+ ::;> II>H F>5=H
3   
  
 

    
 
 
 
   2+*        
   3 

& =
F  !"!$ 
L    =    
    
      
 ::H    =:H. 
  2+*  
 $  0
  
 
&0&  0 
 
  
 

 
   

$ 

=:H 
    $ 
  
 M   
 $   


&   >:H
I:H  

& 5
& > ' 
 




2=*  !"!$   >:H
& 5&#
 
 2=*>:H
#
 
  1   
&0 ?
&1 F;>
0 5
1 

 =C   


 2+*
& >& 

  2=*>:H
3 

  =;5H       
 "3#E9" &19 !9 09 19 3,,E93,J 09",-#-81
        $ &      
   
 
 
  !"!$
K3E" II?H ???H ::: : II?>H >5=H
8
 
   I:H  

    

=:H    

 &  
& ;
? ' 
 
$>:H    
 ;5H  



2=*  !"!$   I:H
     
 &        2=:H >:H 

I:H*  
         

 & ;&#
 
 2=*I:H
   
  !"!$ #
 
  1   
&0 =
  !"# $%"&'  &1 F+I:
& 


 ' 
  0 :

 


   2=*   !"! $    
   1 >

111
112
& ?& 

  2=*I:H V. ,81,E#-8131!E&E9"M89B
"3#E9" &19 !9 09 19 3,,E93,J 09",-#-81 -
 

   

)
1. 3 ($(' 

$ 

K3E" ::H ===H : >>>H II?>H ::H



2. & !"!$ 
 
$ 
I?I=H
09 H    

9
  F  >:H 
  3. &' 


  !"! 

???IH   
 >>>H     
( 
 

 
    !"!$

$ 
       

)
1. 3$



/ 
 
  
  
2. 3    
  )!8,
!8,DD D#
D#D

9""9"1,"#
[1] ,&

%$3  2+:F *K.!

  -

 !
 
 0

 #$ 2-!0#*3
11 3 -


G 
 , &


&
 $2-G,&&*N ?
 

 FC   
 2=*  [2] #02+:=*3 $  ' 

2!    
 E1-K"9#-&J 8 39J31!
%3&- 89",8E1&J*

(()* %&
& 


 ' 
  [3] # (B  
K-  ,8 9C   ,BO

 


  !"! $     


   #  ! 2+:+ # * -
  
   

 
 

 

  
 
 

)   $      

 -


& I&
 



  0 
  %
,


-
  2+;(

  F+II 
   
  >: &  +;F*3, 
  !"!$=+  

  [4] E (3
!2+:F*,$ 0 )-

"' 


%
#   $-


G 
 %
.
 
& I%
 
 
&
 $
 +,#&" 
[5] # D 
 GJ  ! ! O 
 M , 2+:5* 9  

2* & 21  *
2+* ;5H
0 ,

 #

   !
 &
  !
2=* >:H "'  9   $ +: +:5  
)@@ @

@@% # $(J (+:5
& ' 
     !"! $ 
  [6] B  K 0 0 K 0     C O B $  3 !
 

& +:
+ 2+:: 8 * ) 3    $   




 
-
0 
  '" 

& +:&#
 
  !"!$ ,


, 1 !
2",+1!*2+(+?*
#
 
  1    [7]   J ,   , , 
 B3  9  % O
&0 =+
C  ! 2+::I G
$* #-!!) 3     

&1 F++

    ' 
 $ 

   -
 #$ # 

0 F>
1 :
+::I.-,##P:IF+
. -


,

-"""2
(:*
& +& 

   !"!$ [8] 3(%
3OM C2+:+8 *3
$ 


     ' 

     -
    

"3#E9" &19 !9 09 19 3,,E93,J 09",-#-81 E

 #  2 3M39"* +:+ ; -



,

-"""2+>(=*
K3E" I?I+H ::H H :H I?I=H F:+H [9] , 
G#!  J 
,O0
$02+::5G
$*
(! Q   ) .    
   

 -

3

 ,$  $N,9J0&8 +::5 #
 %


(()% "& " .   2F=:(FF?*


3   !"! $  
      
    [10] , 2+:=*8'    
"
 2F*EB)8' 

   
$       !"!   
 $0
 '   F+II  
 !"!
 
[11] C $ G 2+::>* 9  "'
)& , &  
 

E#30

E
 $)0

112
113

También podría gustarte