
Advanced Methods for Detecting Unusual Behaviors on Networks in Real-Time

Xu Gang, xugang@chinaweal.com, China Weal Business Machinery CO., LTD
Zhang Hui, hzhang@cernet.edu.cn, CERNET Center, Tsinghua University
Abstract -- This paper introduces a new approach to detecting unusual behaviors on a network through real-time analysis of packet appearing patterns. The analysis methods are based on a cross-queuing model, in which the queue length and the threshold on the number of repeated appearances are the key parameters for filtering unusual behaviors. Practical implementation and analysis have verified that the methods can detect not only attacks on the network but also other unusual behaviors, such as unintentional operating errors and incorrect configurations. The methods provide an efficient real-time way to analyse traffic on high-speed networks and can help to increase the valid usage rate of network resources.
Index terms -- Traffic Analysis, Detection of Attacks, Behaviors on Network, Packet Appearing Pattern, Real-Time Analysis

A. INTRODUCTION

The rapid development of networks over the past ten years has changed human life dramatically through numerous network-based applications and services. However, it has also brought many serious security problems: the weaknesses of current network security can be exploited by anyone who intends to attack a network maliciously[1]. In most cases, hostile users can repeatedly send millions of packets to all or some particular ports of a host to make it stop working; they can also intensively scan millions of different IP addresses, driving the load on the routers involved to a dangerously high level. Some sophisticated attackers even do this with IP spoofing in order to evade tracing. The popularity and accessibility of today's networks make normal applications and services potentially vulnerable. Important information services may be stopped because a host crashes under attacks whose origin we cannot even determine; limited connection bandwidth may be filled with garbage attack packets so that normal customers can hardly get adequate quality of service. Moreover, the most serious attacks can even bring down all or part of a large-scale network. According to our statistics, on the CERNET (China Education and Research Network) backbones, an average of 30% of the bandwidth in one day is taken up by such attacks or unusual mass traffic behaviors, and the peak waste of bandwidth is up to 60%. This is a great loss of limited network resources.

0-7803-6394-9/00/$10.00 © 2000 IEEE.

Recently, much work has been done on Internet measurement, and many traffic analysis tools have been implemented[2][3], such as OCXmon, Coral, netperf, tcpdump, etc. But most of this work and these tools focus on overall performance measurement or on traffic data acquisition and statistics[2]. Little work has addressed the real-time analysis of one behavior, or one kind of behavior, on a network. So, facing the serious problems caused by attacks, which are one kind of unusual behavior on a network, we have to find new solutions. One effective way is to observe packet appearing frequency, because all such attacks share the same characteristic: sending millions of packets in a short period. If someone wants to attack a host or network, he must send packets at high frequency; otherwise his attacks will have little effect on their targets. We noticed that such attacks therefore share the following weakness: packets with the same source IP address appear on the network with abnormally high frequency; or, if the attacker hides his source IP, the addresses under attack stay the same, so packets with the same destination IP appear with abnormally high frequency. Therefore, our solution is to detect an abnormally high appearing frequency of a source or destination IP on the network. This is illustrated by figure 1, which also shows that different thresholds can lead to different detection results.

Figure 1. Illustration of detecting abnormally high frequency (axis label: IP addresses)

B. DETAILED APPROACH

As is well known, every packet broadcast on an Ethernet can be captured[4], so it is easy to get packet information from the IP layer. In other cases, such information can also be gathered from special devices or interfaces[5], such as Cisco's Netflow, SNMP MIB, etc. Generally, the information is presented in the format: protocol type, source IP, source port, destination IP, destination port and traffic[4]. However, because of the mass traffic on a network, especially on a backbone or on important gateways with OC3, OC12 or even higher bandwidth, it is a great challenge to analyse detailed traffic characteristics in real time[2][3][5]. We must not only capture every packet at a rate of millions of packets per second but also, at the same time, try to extract trends or other useful information from this mass of packets. With real-time analysis, network managers can monitor current attacks or other unusual behaviors and learn the current valid usage rate of network resources, which lets them respond correctly at once rather than merely receive loss reports after the event.
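The record format listed above can be pictured as a small typed record. The following is only an illustrative sketch: the names `PacketInfo` and `tracked_ip` are hypothetical, and the `traffic` field is assumed here to be a byte count, which the text does not specify.

```python
from typing import NamedTuple


class PacketInfo(NamedTuple):
    """One captured packet summary, following the field list in the text."""
    protocol: str   # protocol type, e.g. "tcp" or "udp"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    traffic: int    # assumption: number of bytes carried


def tracked_ip(pkt: PacketInfo, by: str = "src") -> str:
    """Pick the address whose appearing frequency is to be tracked."""
    if by == "src":
        return pkt.src_ip
    if by == "dst":
        return pkt.dst_ip
    raise ValueError("by must be 'src' or 'dst'")


pkt = PacketInfo("tcp", "202.112.0.1", 1025, "202.112.5.100", 21, 512)
```

Tracking by source address targets a sender that floods or scans; tracking by destination address targets a victim whose attacker hides the source IP.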

1st. Methods Design

Now we present our methods to analyze such traffic flow efficiently and obtain useful information concerning attacks on a network. As shown in figure 2, a buffer is allocated to store data records, and one record may include many fields, such as the IP address (source IP or destination IP), the hit number and any other fields needed. The buffer is fixed in length and its records are sorted by IP address, so a given IP address can be located quickly in the record array using binary search. Of course, there is some additional cost to keep the records sorted when a new record is added to the buffer or replaces an existing one. The hit number field stores the number of times the corresponding IP address has appeared recently. Figure 2 also shows a queue composed of doubly linked nodes; each node represents one record in the buffer. The queue plays a very important role in our methods. So that the queue node can be reached easily from its buffer record, every record has an additional field holding a pointer to its node.

Figure 2. Data structures of the detecting methods (a sorted IP address array whose records each carry a hit number and a pointer to a node in the queue)

Next, an analysis strategy is illustrated by figure 3, which focuses on the appearing frequency of either the source IP or the destination IP. When a packet is captured on the network, its IP (source IP or destination IP) is first looked up in the buffer. If it is found, the hit number of the corresponding record is increased by 1 and the corresponding node is moved to the head of the queue. If it is not found and the buffer is not full, a new record is appended to the buffer, and a corresponding new node is created and placed at the head of the queue. However, if the IP is not found and the buffer is full, one existing record in the buffer has to be replaced with the new one. Which one should be replaced? The record corresponding to the tail node of the queue. After the replacement, the tail node, which now holds the new record, is moved to the head of the queue.

If the hit number of a record becomes bigger than the threshold we set, the IP in the record is placed in a special array and its corresponding node is moved to the tail of the queue, so that it will be replaced preferentially. Because every IP acquired from a packet is first looked up in the special array, which is short, its old record in the buffer will not be hit any more and will soon be replaced by the next new IP. For every IP in the special array, detailed information is recorded, including some necessary samplings. All the stored data will be carefully analyzed in the next step.
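The strategy described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it swaps the sorted record array and the doubly linked queue for Python's `collections.OrderedDict`, which combines lookup with a recency order; the names `CrossQueueDetector`, `observe` and `flagged` are hypothetical; and a new record is assumed to start with a hit number of 1.

```python
from collections import OrderedDict


class CrossQueueDetector:
    """Fixed-length buffer of hit counts plus a recency queue (illustrative)."""

    def __init__(self, buffer_len, hit_threshold):
        self.n = buffer_len        # N: number of records the buffer can hold
        self.m = hit_threshold     # M: hit number threshold
        # First item is the queue tail (replaced next); last item is the head.
        self.records = OrderedDict()
        # The "special array": flagged IP -> packets seen since it was flagged.
        self.flagged = {}

    def observe(self, ip):
        """Process one packet's IP; return True if the IP is flagged."""
        if ip in self.flagged:                   # special array is checked first
            self.flagged[ip] += 1
            return True
        if ip in self.records:
            self.records[ip] += 1
            if self.records[ip] > self.m:        # hit number exceeds threshold
                self.flagged[ip] = 0
                self.records.move_to_end(ip, last=False)  # move to tail
                return True
            self.records.move_to_end(ip)         # move to head
            return False
        if len(self.records) >= self.n:
            self.records.popitem(last=False)     # replace the tail record
        self.records[ip] = 1                     # new record goes to the head
        return False


det = CrossQueueDetector(buffer_len=4, hit_threshold=3)
stream = ["a", "b", "a", "c", "a", "d", "a", "e", "a"]
flags = [ip for ip in stream if det.observe(ip)]
print(flags)  # ['a', 'a']
```

In this run, "a" is flagged on its fourth appearance, its stale record is the first to be replaced when "e" arrives, and later packets from "a" are caught by the special array without touching the buffer.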

Based on the above, let M stand for the hit number threshold and N for the length of the buffer. Then an IP is selected as a potential attacker or victim if and only if the IP appears M times in succession and there are no more than N different other IPs between each pair of consecutive appearances. This condition ensures that the selected IP addresses appear with high frequency.
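The selection condition can also be restated as a direct check over a recorded sequence of IPs. The sketch below is illustrative only: the function name `meets_condition` is hypothetical, and it reads the condition literally as a run of M appearances in which every gap contains at most N distinct other IPs. It is an offline restatement of the condition, not a replacement for the queue-based method.

```python
def meets_condition(ips, target, m, n):
    """True if `target` has a run of m appearances whose gaps each
    contain at most n distinct other IPs."""
    run = 0            # length of the current qualifying run
    others = set()     # distinct other IPs seen since the last appearance
    for ip in ips:
        if ip == target:
            if run > 0 and len(others) > n:
                run = 0            # gap too wide: earlier appearances are dropped
            run += 1
            others.clear()
            if run >= m:
                return True
        elif run > 0:
            others.add(ip)
    return False
```

For example, with the sequence a, b, a, c, a the address "a" qualifies for M=3 and N=1, but not for N=0, because each gap holds one other address.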

2nd. Analysis of the methods

Figure 3. Analysis Strategy (queue states when record y is hit, when a new record is added to the queue, and when y's hit number is bigger than the threshold)

As is well known, in real-time traffic analysis, the more historical information is combined with current information, the more valuable and accurate the results. For example, we could count the total number of packets with the same source or destination IP in one minute in order to compute the IP appearing frequency, and the result might be more accurate. However, this is impossible on a network backbone, because such implementations need huge amounts of memory and CPU time, which inevitably leads to packet loss while running. The real-time analysis methods, in contrast, can get accurate results at less cost in both memory and CPU time. At first glance, little historical information is used in the methods, and we do not even use interval times. But the queue and the strategy together provide much historical information. From the above description, apart from the IPs with high appearing frequency, which are moved to the special array, each record replaced is the one that was hit least recently. The time complexity of the methods is O(buffer length) and the space complexity is O(buffer length).

In the methods, there are two important parameters: the buffer length N and the hit number threshold M. If N increases, the number of different IPs allowed between each pair of consecutive appearances of the same IP increases, so the selection condition becomes weaker. If M increases, however, the selection condition becomes stronger. We have derived the following formulas to describe the relationship between the parameters M and N. Suppose B is the network bandwidth (bps), P is the biggest PNPS (Packet Number Per Second) acceptable by a victim, Q is the average PNPS of normal behaviors, T is the average time duration of normal behaviors, and L is the average packet length (bytes). Then the average PNPS on the network is $B/(8L)$, and the smallest average interval packet number between two consecutive attack packets is $(B/(8L))/P$. Therefore:

$$N > \frac{B}{8LP} \tag{1}$$

The average interval packet number between two consecutive packets of normal behavior is $(B/(8L))/Q$. Therefore:

$$N < \frac{B}{8LQ} \tag{2}$$

The biggest interval time between two consecutive attack packets is $N/(B/(8L))$. Therefore:

$$M \cdot \frac{N}{B/(8L)} > T \;\Rightarrow\; MN > \frac{TB}{8L} \tag{3}$$

3rd. Cases study

Using the real-time traffic analysis methods, we have implemented a probe to detect attacks or unusual behaviors on a network. First, the probe was placed on the gateway that connects CERNET to the Internet. The gateway has a bandwidth capacity of 7 Mbps, and about 1000-4000 packets pass through it per second.
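As a rough illustration of how formulas (1) to (3) constrain N and M on a link such as this gateway, the sketch below plugs in the 7 Mbps bandwidth together with values for L, P, Q and T that are purely assumed for the example; they are not measurements from this paper. The function name `parameter_bounds` is hypothetical.

```python
def parameter_bounds(b_bps, l_bytes, p, q, t):
    """Return (n_min, n_max, mn_min) from formulas (1) to (3)."""
    pnps = b_bps / (8 * l_bytes)   # average packets per second on the link
    n_min = pnps / p               # (1): N > B / (8 L P)
    n_max = pnps / q               # (2): N < B / (8 L Q)
    mn_min = t * pnps              # (3): M N > T B / (8 L)
    return n_min, n_max, mn_min


# Assumed values: 500-byte average packets, P = 100 pps, Q = 5 pps, T = 60 s.
n_min, n_max, mn_min = parameter_bounds(
    b_bps=7_000_000, l_bytes=500, p=100, q=5, t=60
)
print(n_min, n_max, mn_min)  # 17.5 350.0 105000.0
```

Under these assumed values the link carries 1750 packets per second on average, N must lie between 17.5 and 350, and the product MN must exceed 105000; choosing N = 100, for instance, would require M > 1050.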
Next, we downloaded a big file from the Internet via FTP and monitored the packet appearing pattern of its traffic. We repeated the experiment over a few days and gathered data under different conditions, especially at different bandwidths. Two figures (Figure 5 and Figure 6) illustrate the packet frequencies of normal behaviors at different current network bandwidths. We also did an experiment to simulate attack behavior: 200,000 packets were sent continually to the same IP in 17 minutes, and figure 4 illustrates the resulting packet frequency. Figure 7 compares the PNPS (Packet Number Per Second) of the two experiments (FTP and attack). From the comparison, it can be concluded that although time information is not used in the methods, the results actually include time information and do reflect the real situation. We also found that although normal packet frequencies differ at different levels of current bandwidth usage, they are all clearly distinct from those of unusual behaviors. Because of this great difference, it is easy to find a threshold that distinguishes unusual behaviors from normal ones. For example, we can set the parameters N=1000 and M=40000. Then, based on the methods, we can easily detect many unusual behaviors, such as attacking, being attacked or other aberrations. In fact, our probe did detect our simulated attack and ignored our FTP transfer. It should be emphasized that the selection of the parameters N and M is flexible, because a series of factors affects them, including not only the variation in bandwidth but also the service types and even how unusual the behaviors are.

Figure 4. Attack packet frequency (The Interval Packet Number is the packet number between two consecutive packets of a same flow.)

Figure 5. Normal packet frequency (FTP, bandwidth: 5.2Mbps)

Figure 6. Normal packet frequency (FTP, bandwidth: 4.5Mbps)

Figure 7. Comparison of PNPS between attack and normal behavior (time axis: seconds, beginning on 6/26/99 10:18)

Besides our experiments, the probe has also been detecting other unusual behaviors on the network in real time. It has detected and recorded many attacks and other unusual behaviors. For example, at 15:45 on 6/26/99, the probe found unusual behavior from the address 212.33.70.18, and recorded and sampled its behavior at once. Between 15:45 and 19:00, the probe recorded that the address sent 1,096,837 packets, with a peak PNPS of 136.55. Over the same period, the average total PNPS at the gateway was 1868.02, so the unusual behavior took 7.3% of the total connection bandwidth. From the sampling results, we found that the IP address was scanning millions of addresses in CERNET, so we regarded its behavior as an attack. This attack is not the most serious one: the probe once recorded an attack whose PNPS was up to 800, one fourth of our network PNPS capacity. Of course, if the threshold parameters N and M are set less strictly, the probe can also detect unusual behaviors other than attacks, such as big file FTP transfers or many people visiting the WWW through one proxy at the same time. The records of such behaviors are also valuable; they can be applied to many other applications.

C. APPLICATIONS OF THE METHODS

It has been mentioned that attacks on networks are becoming a more and more serious threat to modern network applications and services. The detecting methods can guard important hosts or networks, detect any potential attack, find the attackers, and support the right responses. Anti-attack is only one application of the methods. The detecting methods can also find unusual resource monopolization caused by either wrong configurations or unintentional operations. This will be very helpful for ISPs in increasing the valid usage rate of their network resources. Furthermore, the methods can also find unusual mass traffic. Because the length of an IP packet has an upper limit (65535 octets), mass traffic in a short period must cause a high appearing frequency of the same IP address. Sometimes traffic is the basis of IP accounting, and authorized customers are unwilling to pay for masses of meaningless traffic. So the detecting probe will alert such customers and lead to improvement of ISPs' services.

Another important benefit of the methods is that they will improve quality of service for customers. With the development of networks, more and more customers understand that since they pay for network services, they should get high quality from them[2]. But if the network is full of useless packets caused by attacks or other faults, how can the quality of service that customers need be ensured? How can an ISP provide users with a QoS report? Therefore, we think the methods can provide efficient solutions to these problems.

D. CONCLUSION & FUTURE WORK

We have pointed out some realistic problems and threats concerning attacks on networks. To solve these problems and overcome the great challenges in implementation, we described our advanced methods and provided some cases showing that they can be implemented for real-time network detection with satisfactory detection results. In the implementation and analysis, we found that the methods can be used not only to detect attacks on a network but also to analyze behaviors on a network, especially Mass Traffic Network Applications such as attacks, big file FTP transfers, multimedia on the network, etc. We found that such Mass Traffic Network Applications play important roles in overall network traffic and have great influence on overall network performance, because our data show that the traffic they cause often takes up more than one third of total traffic. We also found that their packet appearing patterns are related to network congestion, and that the relationship follows regular patterns. We will pursue our research further in this area. Moreover, we will further improve the detecting methods and try to build a more accurate model of the relationship between the parameters N and M and the bandwidth and behaviors.

REFERENCES
[1] Chris Herringshaw, Detecting Attacks on Networks, IEEE Computer, December 1997
[2] Tracie Monk, k claffy, Cooperation in Internet Data Acquisition and Analysis, http://www.caida.org/Papers/Cooperation/
[3] Tracie Monk, k claffy, Internet data acquisition & analysis: status & next steps, http://www.caida.org/Papers/data-inet97.html
[4] k claffy, Greg Miller, Kevin Thompson, The nature of the beast: recent traffic measurements from an Internet backbone, http://www.caida.org/Papers/Inet98/index.html
[5] Kevin Thompson, Gregory J. Miller, Rick Wilder, Wide-Area Internet Traffic Patterns and Characteristics, IEEE Network, Nov 1997
[6] Joel Apisdorf, k claffy, Kevin Thompson, OC3mon: Flexible, Affordable, High-Performance Statistics Collection, http://www.isoc.org/isoc/whatis/conferences/inet/97/proceedings/F1/F1_2.HTM
