
2013 IEEE Wireless Communications and Networking Conference (WCNC): NETWORKS

A Dynamic Affinity Propagation Clustering Algorithm for Cell Outage Detection in Self-Healing Networks
Yu Ma, Mugen Peng, Wenqian Xue and Xiaodong Ji
Key Laboratory of Universal Wireless Communication, Ministry of Education
Beijing University of Posts and Telecommunications, Beijing, 100876, China
Email: myycandy@gmail.com

Abstract: With the rapid development of mobile wireless systems, operators face unprecedented challenges in service maintenance and operational expenditure, which drives the demand for automation in current networks. Cell outage detection is considered an effective way to detect network faults automatically. This work presents an automated cell outage detection mechanism built on a clustering technique called the Dynamic Affinity Propagation (DAP) clustering algorithm. Performance metrics are collected from the network during its regular operation and then fed into the algorithm to produce optimal clusters for subsequent anomaly detection. The proposed mechanism has been implemented in an LTE-Advanced simulation environment, in which we have successfully detected the configured cell outages and located their specific outage areas.

I. INTRODUCTION

Modern wireless communication systems are becoming increasingly complex because of their requirements for high throughput, seamless coverage and satisfactory quality of service [1]. Keeping such an intricate network in a properly working state with traditional approaches is time-consuming and energy-consuming [2]. To reduce capital and operational expenditure while improving system performance, self-organizing networks (SON), which are able to configure, optimize, and heal themselves automatically, have been introduced and promoted by the Third Generation Partnership Project (3GPP) and Next Generation Mobile Networks (NGMN) [3].

As a main component of SON, the self-healing technique is a promising way to automatically detect and mitigate outages caused by unexpected failures, which can significantly affect customer satisfaction and system reliability. The functionality consists of two parts: cell outage detection and cell outage compensation. A cell is considered to be in outage when its UEs can establish and maintain either no radio bearers or only a limited set of them [4]. A network fault can have multiple causes, e.g., hardware and software failures, external failures or even erroneous configuration. Some outage cases can be detected easily by the operation, administration and maintenance (OAM) system, while others require long-term manual performance analysis and often cause subscriber complaints [4]. The capability of discovering potential failures in radio networks is particularly important for triggering the appropriate compensation methods in time. With the self-healing technique, a cell with degraded performance can be brought back into operation quickly.

Typical methods for detecting outage cells are based on the deviation of the actual cell performance from an automatically determined expected cell performance [5]. The automated diagnosis of anomalies has previously been studied via knowledge-based approaches. A quantitative model was first built in [6] by combining the knowledge of GSM experts with performance metrics from a real network. An automated diagnosis model based on a Bayesian network was described in [7] for UMTS networks. In [8], a classifier was constructed from a given training set of patterns. These model-driven methods require prior real measurements and assumptions about outage causes and symptoms. In addition, the profile built from the faultless behavior of the network, which serves as the basis of comparison for identifying significant deviations, may not be applicable to all situations. Alternatively, exploratory data-driven approaches that need no prior profiles are also used to detect network outages. In [9], diffusion maps are used for dimensionality reduction, and a density function and K-means clustering are employed to distinguish normal from abnormal data samples. However, the K of K-means clustering is not easy to define, because the number of faulty cells to be classified during the anomaly detection stage cannot be predicted beforehand. To provide a simple and fast solution, clustering methods, especially adaptive and dynamic clustering algorithms that search for optimal clusters, are now becoming widely used for processing data points without any prior experimental information.

In this paper, a Dynamic Affinity Propagation (DAP) clustering algorithm is proposed to produce optimal clusters of UE performance metrics based on a clustering quality evaluation criterion, the Silhouette index, and the algorithm is then employed to detect and locate the outage cells via UE position information. In Section II, the system model for outage detection is briefly discussed. Section III gives a detailed description of the DAP clustering algorithm. The results of the proposed method for a given simulation scenario are provided in Section IV. Finally, the paper is concluded in Section V.

978-1-4673-5939-9/13/$31.00 ©2013 IEEE



II. SYSTEM MODEL

A modern mobile wireless communication system is managed and maintained as a cellular network consisting of several cells. Equipment with monitoring and measuring capabilities, such as UEs, eNodeBs and the OAM system, is activated periodically to collect available measurements (related radio parameters, counters, KPIs) and feed them into an analysis module such as the OAM. In this way, the self-healing functionality integrated into the radio network can process the gathered measurements and automatically decide whether any abnormal condition has occurred in the current monitoring period. When a failure is detected by a proper cell outage detection algorithm, the corresponding cell outage compensation action is triggered in time to minimize network performance degradation by adjusting control parameters of neighboring cells.

A. Scenario Modeling

To simulate an actual mobile wireless communication system, we consider a monitoring area in an LTE-Advanced simulation environment that is composed of 19 regular hexagonal cells with 3 sectors per cell. The scenario is shown in Fig. 1. To simplify the experimental analysis, some assumptions are made. Firstly, the whole network is assumed to be divided into several monitoring areas, each of which comprises cells with similar performance, so that regular cells can be taken as references for comparison with non-regular cells. In addition, a reduction in antenna gain is used to represent a hardware breakdown and to distinguish normal cells from outage cells. Different antenna gain reductions represent different outage degrees.

Fig. 1. Simulation scenario with two different kinds of anomaly cells in a cellular network.

As can be seen in Fig. 1, the antenna gains of most cells are set to the normal value, so that users served by these eNodeBs obtain a satisfactory quality of service; these cells serve as reference cells for distinguishing good samples from bad ones. The antenna gain of sector 0(1) is 50 dBi lower than the normal value; as a result, most of its users cannot maintain radio bearers and experience serious quality deterioration. Sector 4(1) reduces its antenna gain by 100 dBi, which means that the serving eNodeB antenna has broken down completely and all the provided services are interrupted at almost the same time, leading to the worst outage problem. In other words, the two sectors 0(1) and 4(1) are treated as the problematic cells to be detected during the simulation.

B. Collected Measurements

Once a cell outage emerges in the monitored network, the performance measurements collected at that moment become abnormal. Since a UE can report measurements such as RSRP and RSRQ for its serving cell and its neighboring cells, we select only some useful and available parameters to simplify data collection and processing. The performance metrics are collected in an event-triggered manner instead of periodically, which yields fewer reference points and therefore lower computational complexity. The specific triggering event we employ is the A3 event, which is triggered when a neighbor cell becomes better than the serving cell by a configured offset.

When something goes wrong with a particular cell in the radio network, both the signal level received from the serving cell and the inter-cell interference are affected at the same time. The values of Reference Signal Received Power (RSRP) and Reference Signal Received Quality (RSRQ), which largely reflect the system performance, may then change significantly when certain network elements fail. Four measurements are used as input parameters of the proposed cell outage detection algorithm: serving RSRP, maximum neighbor RSRP, serving RSRQ, and maximum neighbor RSRQ. Moreover, position information is gathered to locate the specific outage cells when an anomaly happens.

III. ALGORITHM DESCRIPTION

Let $X = \{x_1, x_2, \dots, x_N\}$ be the set of performance measurements reported by N UEs in the A3-triggered manner, where $x_i$ ($i = 1, 2, \dots, N$) is a four-dimensional data vector of key performance metrics that represents one user, that is, $x_i = \{RSRP_i^{s}, RSRP_i^{n}, RSRQ_i^{s}, RSRQ_i^{n}\}$, where the superscripts s and n denote the serving cell and the maximum neighbor, respectively. The data set is first normalized and then taken as the input of the cell outage detection algorithm described below. The algorithm, named Dynamic Affinity Propagation (DAP), is built on a traditional clustering technique called the Affinity Propagation (AP) clustering algorithm.

AP is a clustering method that considers all data points as potential exemplars simultaneously. To obtain high-quality clusters, real-valued messages are exchanged iteratively in a factor graph without pre-specifying the number of clusters. AP takes as input a collection of similarities $\{s(i,k)\}$ between pairs of data points, where, here and hereafter, i and k represent the data points $x_i$ and $x_k$, respectively. The self-similarities $p_k = s(k,k)$, also referred to as input preferences, are set to a common value $p_k = p$. A low preference value leads to a small number of clusters, while a high one results in a large number of clusters [10]. That is to say, the number of clusters is directly influenced by the preference value, which is generally hard to determine. To make the original AP algorithm
adaptive and dynamic, a modified method called the Dynamic Affinity Propagation (DAP) clustering algorithm is proposed [11]. It automatically produces optimal clusters through a step-by-step search in the preference space and a clustering criterion for evaluating cluster quality.

In order to produce high-quality clustering results without increasing time complexity, an effective adjustment mechanism is needed. First, a reasonable search space $[p_{min}, p_{max}]$ of the preference p is determined, where $p_{min}$ leads to clustering the N data points into two clusters, while $p_{max}$ leads to clustering the N data points into $\sqrt{N}$ clusters, which is a suitable upper limit for the optimal cluster number. Further, there are two kinds of preference steps, $p_{step1}$ and $p_{step2}$, which serve convergence and the variation of the cluster number, respectively. More specifically, $p_{step1}$ is a fine adjustment used for escaping numerical oscillations, while $p_{step2}$ is a coarse adjustment that gradually reduces the number of clusters. Similar to AP, DAP takes the similarities between data points as input parameters, with the self-similarity p initialized to $p_{max}$. Then the traditional AP algorithm is executed. During the message-passing process, two kinds of competing messages, called responsibility $r(i,k)$ and availability $a(i,k)$, are updated iteratively. They are computed as follows [10]:

$$r(i,k) = s(i,k) - \max_{k' \neq k} \{ a(i,k') + s(i,k') \}. \tag{1}$$

If $i \neq k$,

$$a(i,k) = \min\Big\{ 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0, r(i',k)\} \Big\}, \tag{2}$$

else,

$$a(k,k) = \sum_{i' \neq k} \max\{0, r(i',k)\}. \tag{3}$$

In the first iteration, the availabilities are set to zero. In addition, to avoid numerical oscillations, AP introduces a damping factor $\lambda \in (0,1)$. Each message is then set to $\lambda$ times its value from the previous iteration plus $(1-\lambda)$ times its prescribed updated value:

$$r(i,k)^{t} = \lambda\, r(i,k)^{t-1} + (1-\lambda)\, r(i,k)^{t}, \tag{4}$$

$$a(i,k)^{t} = \lambda\, a(i,k)^{t-1} + (1-\lambda)\, a(i,k)^{t}, \tag{5}$$

where t denotes the iteration index and the terms with superscript t on the right-hand side are the prescribed updates given by (1)-(3).

During affinity propagation, availabilities and responsibilities can be combined to identify exemplars, which are representative of the data points in a cluster. The exemplar for data point i is determined as follows:

$$examp_i = \arg\max_{k} \{ a(i,k) + r(i,k) \}, \tag{6}$$

where $examp_i$ is the exemplar of data point i.

When the AP algorithm with the pre-specified value p does not meet the convergence condition, the preference is decreased by the step $p_{step1}$, i.e., $p = p - p_{step1}$, until the algorithm converges. Afterwards, p is decreased by the step $p_{step2}$, i.e., $p = p - p_{step2}$, so that the number of clusters falls, and the search terminates once two clusters are found.

Consider a clustering result $c^{K} = \{c_1, c_2, \dots, c_K\}$ with K clusters, where $c_i$, $i = 1, 2, \dots, K$, denotes each cluster. The Silhouette index, which reflects the compactness and separation of clusters, is introduced to evaluate clustering quality [12]. The Silhouette value of a data point i, $Sil(i)$, is defined as follows:

$$Sil(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}, \tag{7}$$

where $a(i)$ denotes the average distance from data point i to the other data points within the cluster to which i belongs, and $b(i)$ is the minimum, over all clusters to which i does not belong, of the average distance from data point i to the data points in that cluster. Then the average of $Sil(\cdot)$ in cluster $c_i$ is computed:

$$S_{av}(c_i) = \frac{1}{m} \sum_{j=1}^{m} Sil(j), \tag{8}$$

where m is the number of data points in cluster $c_i$ and j runs over those data points. Among all $S_{av}(c_i)$ values, the worst cluster in $\{c_1, c_2, \dots, c_K\}$ is selected by the following formula:

$$S_{avmin}(K) = \min\{ S_{av}(c_i) \}. \tag{9}$$

The cluster number K used above is actually a variable with $K \in \{2, 3, \dots, \sqrt{N}\}$. Through repeated execution in the DAP algorithm, $\sqrt{N} - 1$ clustering results for all data points can be generated. Accordingly, $\sqrt{N} - 1$ values of $S_{avmin}(K)$ are produced, and they form a numerical set $\{S_{avmin}(K)\}$ from which the best clustering result for all data points is determined. That is,

$$S_{ind} = \max\{ S_{avmin}(K) \}, \quad K \in \{2, 3, \dots, \sqrt{N}\}, \tag{10}$$

where $S_{ind}$ is the largest value, which indicates the best clustering quality. Finally, the corresponding optimal cluster number is found as follows:

$$opti\_cluster = \arg\max_{K} \{ S_{avmin}(K) \}, \quad K \in \{2, 3, \dots, \sqrt{N}\}. \tag{11}$$

The basic procedure of the DAP clustering algorithm is described below:

Algorithm 1 Dynamic Affinity Propagation Algorithm
1: Input: s(i,k), i ≠ k.
2: Output: Optimal clustering result.
3: Compute the range of preference p: p in [pmin, pmax];
4: Initialize the preference p: p = pmax;
5: Execute AP to generate K clusters;
6: If not converged, p = p - pstep1, return to step 5, until convergence;
7: p = p - pstep2, return to step 5, until p = pmin;
8: Produce the optimal clustering result based on the Silhouette index; the algorithm terminates.

If $S_{ind}$ is lower than a certain threshold (generally suggested to be 0.5), neither the compactness within clusters nor the separation between clusters is good, and there is little difference among the classified categories,
which should be merged into one group [12]. In our work, $S_{ind}$ can be applied to evaluate the probability of a cell outage: the smaller $S_{ind}$ is, the lower the probability.

In this paper, we adopt the DAP algorithm to process the performance measurements collected from the monitoring area described in Section II and to produce optimal clusters, which are then used to locate outages based on user location, as shown in Section IV.

IV. SIMULATION AND RESULTS ANALYSIS

To simulate an actual wireless communication system, a fully dynamic system simulation tool is employed in this research. The simulation parameters are based on the 3GPP specifications [13-14]. Different faults that lead to cell outages can be detected separately, as described in Section II. In our simulation, different antenna gain reductions are configured to represent different degrees of failure, as shown in Fig. 1. The antenna gains are set before the UEs access any cell in the experiment. In a real network, however, cell outages may happen abruptly while users in the coverage area are still connected to the potentially faulty eNodeBs. To make sure that users can access the problematic cells at the beginning of the simulation and later report key performance indicators when the A3 event is triggered, the initial access is based on location information instead of power information. The detailed simulation parameters are listed in Table I.

TABLE I
SIMULATION CONFIGURATION PARAMETERS

Parameters                            Value
Cell type                             Macro cell
Cellular layout                       19 cells, each with 3 sectors
Inter-site distance                   500 m
UE distribution                       Uniformly distributed
UE numbers                            40 users per sector
Path loss model                       PL [dB] = 128.1 + 37.6 log10(R)
Shadowing standard deviation          8 dB
Regular antenna gain                  14 dBi
Non-regular cell 0(1) antenna gain    14 dBi + (-50 dBi)
Non-regular cell 4(1) antenna gain    14 dBi + (-100 dBi)
eNodeB max TX power                   46 dBm
Initial user access criterion         Access by location
Traffic model                         Full buffer
Handover margin                       0.5 dB
Handover time to trigger              80 ms

A. Optimal Clustering

As detailed in Section III, the pre-processed UE measurements are fed as a data set into the proposed DAP algorithm for detection of cell outages. By searching the preference space, the algorithm produces clusterings whose cluster number varies from $\sqrt{N}$ down to two. According to the Silhouette criterion for clustering quality, the quality index of each clustering is computed with the defined formulas, and the largest one indicates the optimal clustering quality. The variation of the evaluation values with the number of clusters is presented in Fig. 2.

[Figure: the minimum of the Silhouette values (vertical axis) versus the cluster number in DAP (horizontal axis)]
Fig. 2. The Silhouette index changes with different number of cluster.

It can be observed from the figure that the highest Silhouette value occurs when the cluster number is three, which indicates that the collected UE performance measurements fall into three different categories. The Silhouette value of 0.62, larger than 0.5 (the recommended threshold mentioned in Section III), means that there are significant differences among the three categories; that is to say, poorly performing users exist and outages have occurred. The corresponding optimal clustering result, depicted in a two-dimensional coordinate system, is shown in Fig. 3. In terms of serving RSRP, the values of most data points (marked in khaki) on the right side of the figure are all larger than -100 dBm, while the average serving RSRP of the data in the middle area of the figure is smaller than -130 dBm, and that of the data (marked in green) on the left side is even smaller than -160 dBm, indicating the worst faulty condition. In the simulation scenario, three kinds of cells are configured, so the clustering result agrees with the pre-specified configuration.

Fig. 3. Optimal clustering results for performance measurements based on the DAP algorithm.

B. Localization of Different Cell Outage

To localize the different cell outages in the final stage of the detection framework, UE position information is applied. The clustered data points are all mapped onto the network topology using this location information. The result is shown in Fig. 4, in which the three distinct categories of Fig. 3 are located in
the actual geographic regions, respectively. The khaki-marked user data (hollow circles) are distributed uniformly over most of the coverage area and only around the edges of the serving cells. Since in cells that provide satisfactory service the A3 event can only be triggered by edge users, the khaki-marked data can be considered normal and their serving cells perform well. The red-marked data (solid triangles) are concentrated only in sector 0(1), where the antenna gain has dropped by 50 dBi, and sector 4(1), whose BS antenna gain has decreased by 100 dBi, is filled only with green-marked data (solid circles). The fact that the users of sectors 0(1) and 4(1) appear both in the center and around the edge of the coverage area demonstrates that a network breakdown has occurred and that the measurements reported in these regions are abnormal. The same conclusion can be drawn more visually from Fig. 5, which combines Fig. 1 and Fig. 4.

[Figure: user position Y (vertical axis) versus user position X (horizontal axis)]
Fig. 4. Localization of anomaly based on UEs position information.

Fig. 5. A combination of the scenario model and localization map.

The detailed analysis of the clustering result is consistent with the simulation configuration. For this reason, it can be stated that, on the basis of the DAP detection approach and UE location information, the cell outage can be detected successfully. Furthermore, it is important to note that the anomalies caused by the different hardware failures of sectors 0(1) and 4(1) are classified into two clusters because of their different degrees of interruption, which could be helpful for determining an appropriate cell outage compensation method.

V. CONCLUSION

In this paper, we present a cell outage detection mechanism for discovering network failures automatically in future mobile wireless systems. On the basis of network measurements from UEs, eNodeBs and the OAM, the proposed approach, named the Dynamic Affinity Propagation algorithm, has been implemented successfully in an LTE-Advanced simulation environment. In the test, two different hardware failures, represented by different antenna gain reductions, are configured beforehand, and the gathered key performance metrics are then classified into optimal clusters by the detection algorithm. Finally, the user position information is employed to map the clustering results onto the network topology, from which we can clearly conclude that there are two outages during the current monitoring period and locate the specific outage areas at the same time. The proposed mechanism can be further extended to heterogeneous networks for fault detection based on collected key performance measurements.

ACKNOWLEDGEMENT

This work was supported in part by the State Major Science and Technology Special Projects (Grant No. 2011ZX03003-002-01), the National Natural Science Foundation of China (Grant No. 61222103), and the Beijing Natural Science Foundation (Grant No. 4131003).

REFERENCES

[1] M. Peng and W. Wang, "Technologies and standards for TD-SCDMA evolutions to IMT-Advanced," IEEE Commun. Mag., vol. 47, no. 12, pp. 50-58, Dec. 2009.
[2] M. Peng and W. Wang, "An Adaptive Energy Saving Mechanism in the Wireless Packet Access Network," IEEE WCNC 2008, Apr. 2008.
[3] M. M. S. Marwangi, N. Fisal, S. K. S. Yusof, et al., "Challenges and Practical Implementation of Self-Organizing Networks in LTE/LTE-Advanced Systems," IEEE ICIM 2011, Nov. 2011.
[4] M. Amirijoo, L. Jorguseski, T. Kurner, et al., "Cell Outage Management in LTE Networks," IEEE ISWCS 2009, Sept. 2009.
[5] B. Cheung, S. Fishkin, G. Kumar, et al., "Method of Monitoring Wireless Network Performance," Patent US 2006/0063521 A1, Mar. 2006.
[6] R. Barco, V. Wille and L. Diez, "System for Automated Diagnosis in Cellular Network Based on Performance Indicators," Euro. Trans. Telecomms., vol. 16, no. 5, pp. 399-409, Sept. 2005.
[7] R. Khanafer, B. Solana, J. Triola, et al., "Automated Diagnosis for UMTS Networks using Bayesian Network Approach," IEEE Trans. Veh. Technol., vol. 57, no. 4, pp. 2451-2461, Jul. 2008.
[8] C. Mueller, M. Kaschub, C. Blankenhorn, et al., "A Cell Outage Detection Algorithm Using Neighbor Cell List Reports," Lecture Notes in Computer Science, Nov. 2008.
[9] F. Chernogorov, J. Turkka, T. Ristaniemi, et al., "Detection of Sleeping Cells in LTE Networks Using Diffusion Maps," IEEE VTC 2011, May 2011.
[10] B. J. Frey and D. Dueck, "Clustering by Passing Messages between Data Points," Science, vol. 315, no. 5814, pp. 972-976, Feb. 2007.
[11] J. Zhang, X. Tuo, Z. Yuan, et al., "Analysis of fMRI Data Using an Integrated Principal Component Analysis and Supervised Affinity Propagation Clustering Approach," IEEE Trans. Biomed. Eng., vol. 58, no. 11, pp. 3184-3196, Nov. 2011.
[12] P. K. Velamuru, R. A. Renaut, H. Guo, et al., "Robust clustering of positron emission tomography data," Joint Interface CSNA, Jun. 2005.
[13] 3GPP TR 36.814 V9.0.0, "Evolved Universal Terrestrial Radio Access (E-UTRA); Further Advancements for E-UTRA Physical Layer Aspects," Mar. 2010.
[14] 3GPP TR 36.839 V0.5.0, "Evolved Universal Terrestrial Radio Access (E-UTRA); Mobility Enhancements in Heterogeneous Networks," Feb. 2012.

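
The Silhouette-based selection of Eqs. (7)-(11) in Section III can likewise be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: Euclidean distance, the convention that a point in a singleton cluster gets a Silhouette value of 0, the helper names, and the hand-written candidate labelings (which stand in for the clusterings that DAP would produce at different preference values) are all hypothetical.

```python
import math

# Sketch of the worst-cluster Silhouette criterion, Eqs. (7)-(11).
# Assumed here: Euclidean distance, Sil(i) = 0 for singleton clusters,
# at least two clusters per labeling, and no duplicate points.

def silhouette_values(points, labels):
    """Return Sil(i) of Eq. (7) for every point, plus the cluster membership."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    sil = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: assumed convention Sil(i) = 0
            sil.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(sum(math.dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        sil.append((b - a) / max(a, b))
    return sil, clusters


def worst_cluster_silhouette(points, labels):
    """S_avmin(K): the smallest per-cluster average Silhouette, Eqs. (8)-(9)."""
    sil, clusters = silhouette_values(points, labels)
    return min(sum(sil[i] for i in members) / len(members)
               for members in clusters.values())


def best_clustering(points, candidates):
    """Eqs. (10)-(11): return (S_ind, optimal K, labels) over the candidates."""
    scored = [(worst_cluster_silhouette(points, labels), len(set(labels)), labels)
              for labels in candidates]
    return max(scored, key=lambda item: item[0])


# Toy data and hand-written candidate labelings (hypothetical).
points = [(0, 0), (1, 0), (2, 0), (20, 5), (21, 5), (22, 5)]
candidates = [
    [0, 0, 0, 1, 1, 1],  # K = 2
    [0, 0, 2, 1, 1, 1],  # K = 3
    [0, 0, 2, 1, 1, 3],  # K = 4
]
s_ind, k_opt, labels = best_clustering(points, candidates)
print(k_opt, round(s_ind, 2))
```

Scoring each clustering by its worst cluster, rather than by the overall mean, penalizes any labeling that contains even one poorly separated group. The resulting value of s_ind can then be compared with the 0.5 threshold mentioned in Section III.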
