
992 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 59, NO. 2, FEBRUARY 2010

optimizes the link-layer PDR and the physical-layer PER to improve the overall PLR and, hence, the system performance. Furthermore, it distributes the excess packet losses according to predefined weights to more effectively and flexibly maintain fairness. The simulation results show that the CEPS algorithm can greatly decrease the PLR experienced by users, particularly when the traffic demand is heavy, and support more users in the system while meeting a specified PLR objective. The simulation results also show that CEPS is also effective and more flexible in maintaining fairness among different users.

REFERENCES
[1] I. F. Akyildiz, D. A. Levine, and I. Joe, “A slotted CDMA protocol with BER scheduling for wireless multimedia networks,” IEEE/ACM Trans. Netw., vol. 7, no. 2, pp. 146–158, Apr. 1999.
[2] P. Kong, K. Chua, and B. Bensaou, “A novel scheduling scheme to share dropping ratio while guaranteeing a delay bound in a multicode-CDMA network,” IEEE/ACM Trans. Netw., vol. 11, no. 6, pp. 994–1006, Dec. 2003.
[3] V. Huang and W. Zhuang, “QoS-oriented packet scheduling for wireless multimedia CDMA communications,” IEEE Trans. Mobile Comput., vol. 3, no. 1, pp. 73–85, Jan./Feb. 2004.
[4] Q. Liu, S. Zhou, and G. B. Giannakis, “Cross-layer scheduling with prescribed QoS guarantee in adaptive wireless networks,” IEEE J. Sel. Areas Commun., vol. 23, no. 5, pp. 1056–1066, May 2005.
[5] D. Zhao, X. Shen, and J. W. Mark, “Radio resource management for cellular CDMA systems supporting heterogeneous service,” IEEE Trans. Mobile Comput., vol. 2, no. 2, pp. 147–160, Apr.–Jun. 2003.
[6] Q. Liu, S. Zhou, and G. B. Giannakis, “Queuing with adaptive modulation and coding over wireless links: Cross-layer analysis and design,” IEEE Trans. Wireless Commun., vol. 4, no. 3, pp. 1142–1153, May 2005.
[7] Q. Liu, S. Zhou, and G. B. Giannakis, “Cross-layer combining of adaptive modulation and coding with truncated ARQ over wireless links,” IEEE Trans. Wireless Commun., vol. 3, no. 5, pp. 1746–1755, Sep. 2004.
[8] C. Lin and R. D. Gitlin, “Multi-code CDMA wireless personal communication networks,” in Proc. IEEE Commun., Jun. 1995, pp. 1060–1064.
[9] B. S. Thian, Y. Wang, T. T. Tjhung, and L. W. C. Wong, “A hybrid receiver scheme for multiuser multicode CDMA systems in multipath fading channels,” IEEE Trans. Veh. Technol., vol. 56, no. 5, pp. 3014–3023, Sep. 2007.
[10] C. S. Chang and K. C. Chen, “Medium access protocol design for delay-guaranteed multicode CDMA multimedia networks,” IEEE Trans. Wireless Commun., vol. 2, no. 6, pp. 1159–1167, Nov. 2003.
[11] P. Y. Kong, K. C. Chua, and B. Bensaou, “Multicode-DRR: A packet-scheduling algorithm for delay guarantee in a multicode-CDMA network,” IEEE Trans. Wireless Commun., vol. 4, no. 6, pp. 2694–2704, Nov. 2005.
[12] L. Lenzini, M. Luise, and R. Reggiannini, “CRDA: A collision resolution and dynamic allocation MAC protocol to integrate data and voice in wireless networks,” IEEE J. Sel. Areas Commun., vol. 19, no. 6, pp. 1153–1163, Jun. 2001.
[13] F. Yu, V. Krishnamurthy, and V. C. M. Leung, “Cross-layer optimal connection admission control for variable bit rate multimedia traffic in packet wireless CDMA networks,” IEEE Trans. Signal Process., vol. 54, no. 2, pp. 542–555, Feb. 2006.

Adaptive Sensing Technique to Maximize Spectrum Utilization in Cognitive Radio

Kae Won Choi

Abstract: A cognitive radio (CR) system exploits spectrum bands that primary users (PUs) are licensed to use. The CR performs channel sensing to find spectrum opportunities. Conventional periodic sensing schemes require a long sensing time to detect a weak signal from the PU with fast channel-usage variation. Since the CR network should be quiet during a sensing period, a long sensing time results in low spectrum utilization. To improve spectrum utilization, we propose a novel sensing scheme that adaptively decides whether to sense the channel or to transmit the user data based on previous sensing results. The simulation results show that the proposed scheme significantly outperforms the conventional scheme.

Index Terms: Cognitive radio (CR), energy detection, opportunistic spectrum access, partially observable Markov decision process (POMDP).

I. INTRODUCTION

The cognitive radio (CR) opportunistically accesses spectrum bands that primary users (PUs) are licensed to use under the condition that the CR does not interfere with the PU [1]. To avoid interference between the CR and the PU, the CR exploits the spectrum opportunity, which is defined as the frequency channel that is temporarily not used by the PU. The CR performs channel sensing to find spectrum opportunities. Generally, the CR adopts the periodic sensing strategy, as depicted in Fig. 1(a) (e.g., [2]–[8]). Using this strategy, the CR periodically senses the current operating channel to monitor the PU activity. If the CR detects the PU on the operating channel, then it switches the operating channel to find another spectrum opportunity.

The periodic sensing scheme may work well in a favorable environment; however, its performance significantly degrades under certain adverse conditions. For example, if the CR is required to sense very weak PU signals to provide strong protection to the PUs, then the sensing time must be prolonged to guarantee sufficiently low detection error probability. Since the CR is not allowed to transmit any signal during a sensing period, a long sensing time results in low spectrum utilization.

Moreover, the CR may intend to access the very short spectrum opportunity caused by the bursty arrival of PU application traffic. Recently, there have been studies on the exploitation of temporal spectrum opportunities that last on the order of milliseconds (e.g., [6]–[9]). In this environment, the CR should very frequently perform sensing to catch up with the variations in the PU state. Therefore, fast PU state variation, combined with weak PU signals, can cause the CR to waste a large portion of time on channel sensing.
In this paper, we propose an “adaptive sensing CR,” as depicted
in Fig. 1(b), which significantly reduces the sensing overhead in
the adverse environment. In contrast with the “periodic sensing CR”
shown in Fig. 1(a), the adaptive sensing CR determines whether to
sense a channel or to transmit data at consecutive decision epochs. By

Manuscript received March 15, 2009; revised July 4, 2009. First published
November 13, 2009; current version published February 19, 2010. The review
of this paper was coordinated by Dr. S. Wei.
The author was with the Telecommunication Business, Samsung Electronics,
Suwon 443-742, Korea. He is now with the Department of Electrical and
Computer Engineering, University of Manitoba, Winnipeg, MB R3T 5V6,
Canada (e-mail: kaewon.choi@gmail.com).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVT.2009.2036631

0018-9545/$26.00 © 2010 IEEE



means of this adaptive decision process, the CR can perform channel sensing only when it is needed, and therefore, unnecessary sensing can be avoided. The adaptive sensing CR makes each decision based on the previous sensing results. At a decision epoch after sensing, the CR can immediately stop sensing and transmit data if the sensing results strongly indicate that the channel is vacant. More sensing is required only when the sensing results are not conclusive.

Fig. 1. Periodic sensing CR and adaptive sensing CR. (a) Periodic sensing CR. (b) Adaptive sensing CR.

We consider the energy detection [10] as a channel-sensing method. After each sensing period, the energy detector produces a real-valued test statistic, which is the total estimated energy during the sensing period. Typically, the periodic sensing CR makes a hard decision on whether the PU is active or not by comparing a test statistic with a certain threshold. Therefore, the sensing period of the periodic sensing CR should be sufficiently long so that the decision regarding the PU activity is not erroneous.

However, we set a sensing period of the adaptive sensing CR to be much shorter than that of the periodic sensing CR, since a shorter sensing period enhances the adaptability of the CR by increasing the frequency of decision epochs. Because of the short sensing period, the adaptive sensing CR cannot accurately decide the PU activity from a single test statistic. Therefore, the adaptive sensing CR does not make a hard decision from a test statistic but instead just takes the test statistic as a soft “sensing result.” Since such a sensing result is noisy, the adaptive sensing CR simultaneously takes into account the multiple sensing results and combines them to generate reliable information regarding the PU activity.

At each decision epoch, a “decision-making algorithm” decides whether to sense the channel or to transmit data. The aim of the decision-making algorithm is to maximize spectrum utilization while restricting interference to the PU. To design the optimal algorithm that achieves this goal, we use the partially observable Markov decision process (POMDP) framework [11], [12].

In summary, the proposed CR has the following distinctive features in comparison with the conventional periodic sensing CR:

1) Adopting the adaptive sensing structure, the proposed CR can avoid unnecessary sensing.
2) The proposed CR combines multiple soft sensing results from short sensing periods to enhance adaptability.
3) Adaptive decisions are made by the optimal decision-making algorithm, which was designed by using the POMDP framework.

We will present the simulation results that show how the proposed CR significantly outperforms the periodic sensing CR in terms of channel utilization. This performance gain is attributable to the previously listed features of the proposed CR.

The rest of this paper is organized as follows: Some related works are briefly described in Section II. Section III introduces the adaptive sensing CR. In Section IV, we design the decision-making algorithm by using the POMDP framework. Section V presents some numerical results, and this paper is concluded with Section VI.

II. RELATED WORK

There have been some studies on the investigation of the adaptive spectrum sensing and the medium access control protocol for CR networks (e.g., [3]–[8], [13], and [14]). In [3], the authors proposed the algorithm that finds the optimal sensing parameters for each frequency channel and chooses the best frequency channels for maximizing capacity. Zhao and Chen [4] developed the frequency channel-selection strategy by formulating the CR as a POMDP. In [5], they also showed that a simple round-robin channel-selection strategy is close to optimal. The authors of [6]–[8] suggested a statistical model of the PU activity based on empirical data and proposed channel-access strategies that maximize the channel utilization while limiting interference to the PU. In [13] and [14], the authors proposed the adaptive sensing methods that select the frequency channels based on the historical information about the PU channel usage.

All these studies deal with the frequency channel-selection problem. These existing works and our work intend to solve very different problems arising in designing the CR. While the existing schemes aim to find the frequency channels most likely to be vacant, the proposed scheme focuses on efficiently utilizing the sensing results and adaptively allocating the sensing periods to minimize the sensing overhead. The existing schemes can only select the frequency channel; they cannot adaptively choose whether to perform channel sensing or to transmit data. Therefore, the existing schemes constitute the periodic sensing CR, and they cannot reduce the sensing overhead. Among the existing works, [4] and [5] also make use of the POMDP framework. Although these works take the same mathematical approach as our work does, the focus is quite different, as previously noted.

In addition to the energy detector, the cyclostationary feature detector [15] is also considered as a candidate for the sensing method of the CR network. Although the cyclostationary feature detector has a long detection time and high computational complexity, it can detect a much weaker PU signal than the energy detector can, owing to its robustness to noise uncertainty. In [16], we applied the sequential detection framework [17] to the cyclostationary feature detector to reduce the required detection time. It is noted that the adaptive sensing CR can also be implemented on top of the cyclostationary feature detector by using some techniques proposed in [16].

III. ADAPTIVE SENSING COGNITIVE RADIO

A. System Model

Consider N frequency channels that the PU is licensed to use. The CR network is allowed to access the channel when it is not occupied by the PU. A collision occurs when the CR network transmits a signal on a channel currently used by the PU. The collision probability should be restricted to a certain level.

For the CR, we consider a small-scale network such as the wireless personal area network. The CR is a star topology network where a “master node (MN)” is located at the center of the network, and “slave nodes (SNs)” are attached to the MN. Only the MN performs channel sensing, whereas the SNs do not. Based on the sensing results, the MN makes the decision on the next action at each decision epoch, and it orders the SNs to follow its decision by sending a control signal. Upon receiving the control signal, the SNs follow the order in it.

All the CR nodes are tuned to the same frequency channel, which is designated as the “operating channel,” among N frequency channels.

The CR network utilizes the operating channel until it is reclaimed


by the PU. On the operating channel, the CR nodes exchange user
data. In addition, the MN senses the operating channel to monitor the
PU activity. If the PU is detected, then the MN switches the operating
channel to another channel and directs the SNs to move to the new
operating channel.

B. Adaptive Sensing Structure


As illustrated in Fig. 1(b), the MN chooses its next action at each
decision epoch, which occurs at the end of each action. The decision
epoch is indexed by t(= 1, . . .). At each decision epoch, the MN
selects the next action among data transmission, sensing, and channel
switching. This decision is made by the decision-making algorithm,
which is described in Section IV. From now on, we explain the
operation of the CR network when each action is selected.
1) Data Transmission: If the MN is convinced that the operating channel is vacant, then it selects data transmission. In this case, the CR nodes exchange user data by using the time-division multiple-access scheme during the data transmission period of which the length is $T_D$. At the beginning of the data transmission period, the MN sends the control signal, which contains the time allocation, to the SNs. The SNs transmit or receive user data according to this time allocation.

Fig. 2. Example pmf's of the quantized sensing result $\zeta_t$ when the PU is inactive and active. The bandwidth of a frequency channel $W$ is 1 MHz, the length of a sensing period $T_S$ is 0.1 ms, and the SNR of the PU signal is −10 dB.

2) Sensing: When the MN is not sure whether the PU exists or not on the operating channel, it chooses sensing to perform energy detection on the operating channel during a sensing period of which the length is $T_S$. No control signal is sent by the MN in this case. The SNs should be quiet if they do not receive a control signal.

During the sensing period immediately after the tth decision epoch, the energy detector generates a soft sensing result $\xi_t$ from the input signal. If the bandwidth of a frequency channel is denoted by $W$, the energy detector takes $W \cdot T_S$ baseband complex signal samples during a sensing period. Let $y_{t,i}$ denote the ith signal sample in the sensing period immediately after the tth decision epoch. It is assumed that the MN is aware of the noise spectral density $N_o$. The energy detector estimates the energy in the signal samples and normalizes it by $N_o/2$ to derive the sensing result $\xi_t$ as follows:

$$\xi_t = \frac{1}{N_o/2} \sum_{i=1}^{W \cdot T_S} |y_{t,i}|^2. \tag{1}$$

To efficiently process the sensing result, the proposed scheme quantizes $\xi_t$ to produce the quantized sensing result $\zeta_t$. Let $M$ denote the number of quantization levels. We define $\tau_1 < \cdots < \tau_m < \cdots < \tau_{M+1}$ as the thresholds for quantization. For quantization, the MN finds $m$ such that $\tau_m \le \xi_t < \tau_{m+1}$ and sets $\zeta_t$ to such an $m$. To closely approximate the real-valued space, the number of quantization levels $M$ should be sufficiently large. Fig. 2 shows examples of probability mass functions (pmf's) of the quantized sensing result when the PU is inactive and when it is active. In this figure, the number of quantization levels is 20, and the thresholds are $\tau_1 = 0$, $\tau_m = 120 + 10 \cdot (m - 2)$ for $m = 2, \ldots, 20$, and $\tau_{21} = \infty$.

3) Channel Switching: If it is highly probable that a PU exists in the operating channel, then the MN selects channel switching and sends a control signal that orders the SNs to switch the operating channel. We assume that it takes $T_C$ to complete the channel-switching process and be ready to choose another action, since the CR nodes should tune their frequency band and perform a synchronization process. When channel switching is selected, the CR network simply moves to the next adjacent frequency channel. If the current operating channel is the last channel, then the CR network moves to the first frequency channel. Although this channel-selection strategy seems very simple, it corresponds to the suboptimal channel-selection strategy proposed in [5].

C. Operation of Adaptive Sensing CR

Now, we describe the operation of the adaptive sensing CR and explain how it can reduce the sensing overhead. In Fig. 2, we can see that the variance of the sensing result is very large because of the short sensing period. Therefore, even when the PU activity does not change, the CR observes random sensing results that are different for each sensing period, as shown in the example of quantized sensing results over time in Fig. 3. In this figure, the PU is inactive throughout time. The lengths of a data transmission period and a sensing period are both 0.1 ms. A time duration with no sensing result corresponds to a data-transmission period.

Fig. 3. Example of the quantized sensing results over time.

In the example shown in Fig. 3, the CR performs channel sensing from 0.1 to 0.2 ms and obtains the quantized sensing result of 12. In Fig. 2, we can see that the quantized sensing result of 12 indicates that the channel is more likely to be occupied than empty; therefore, the CR cannot be certain that the PU is inactive, and it performs more sensing. The CR starts to transmit data at 0.6 ms after two consecutive low sensing results obtained from 0. ms to 0.6 ms. Therefore, it takes five sensing periods, which are from 0.1 to 0.6 ms, to start data transmission. After one data transmission period, the CR decides that more sensing is needed at 0.7 ms. From 0.7 to 0.8 ms, the CR fortunately obtains a low

value of the sensing result, i.e., 9. Thus, the CR quickly determines that the PU is inactive only after one sensing period, and it resumes data transmission at 0.8 ms. After 0.8 ms, we can also see that a variable number of sensing periods are required to resume data transmission, depending on the sensing results.

This example shows that the proposed CR finds out the appropriate sensing time on the basis of the sensing results, whereas the periodic sensing CR senses the channel for a fixed time period. Therefore, the proposed CR can avoid unnecessary sensing and reduce the sensing overhead. In fact, this gain is similar to the gain achieved by using the sequential detection algorithm [17]. The sequential detection algorithm is known to outperform the conventional fixed-time detection algorithms by a very wide margin, and it generally requires one-half to one-third of the sensing time on average. A similar amount of performance gain is also expected for the proposed CR.

Since the sequential detection algorithm is a sort of detection method, it has no particular consideration for the CR. For example, the sequential detection algorithm does not take into account possible changes in the PU activity during the sensing periods, which the proposed CR carefully considers. Moreover, the proposed CR can adaptively decide when to stop data transmission for more sensing and when to switch the operating channel, as shown in the example in Fig. 3. Therefore, the proposed CR integrates all the CR functionalities into one unified decision-theoretic framework, thereby combining the known advantages of the sequential detection algorithm with the gains from the adaptive decisions of data transmission and channel switching.

IV. DECISION-MAKING ALGORITHM

A. POMDP Formulation

The decision-making algorithm decides the next action among data transmission, sensing, and channel switching at each decision epoch. In this section, we model the adaptive sensing CR as a POMDP to design the optimal decision-making algorithm. In [6] and [7], it is shown that the PU activity can be modeled as a Markov process with two states: The PU is active in one state and is inactive in the other state. In addition, the CR does not have knowledge of the true state (i.e., PU activity) but only infers it from the noisy sensing results. This setting matches very well with the POMDP framework. A POMDP is defined by states, actions, state transition probabilities, observations, and rewards. From now on, we will define them one by one. See [11] and [12] for more information about the POMDP framework.

1) State and Action: Let $s_t$ be the state at the tth decision epoch. The state $s_t$ indicates the PU activity on the current operating channel. The state $s_t$ is 0 if the operating channel is vacant at the tth decision epoch and is 1 if the operating channel is occupied by the PU at the tth decision epoch. We do not consider the PU activities on the frequency channels other than the operating channel because they are needed only for frequency channel selection, to which we do not pay attention in this paper. The action selected at the tth decision epoch is denoted by $a_t$. The value of $a_t$ can be selected among D, S, and C, which stand for data transmission, sensing, and channel switching, respectively.

2) State Transition Probability: It is assumed that the PU is activated on a channel according to the Markov process with the rate of $\lambda$, and the PU sojourns on the channel for an exponentially distributed time with the average of $1/\mu$. We also assume that the MN is aware of $\lambda$ and $\mu$. From these parameters, the state transition probability can be calculated as follows. Let $p^a_{i,j}$ be the state transition probability from state i to state j when the action a is taken. That is, $p^a_{i,j} := \Pr\{s_{t+1} = j \mid s_t = i, a_t = a\}$. We assume that the state of the PU can change only once during a data-transmission period or during a sensing period, since the interarrival and sojourn times of the PU are very large compared with $T_D$ and $T_S$. Based on this assumption, the state transition probability when the action is data transmission or sensing (i.e., a = D or S) can be calculated as $p^a_{0,1} = 1 - e^{-\lambda T_a}$, $p^a_{1,0} = 1 - e^{-\mu T_a}$, $p^a_{0,0} = e^{-\lambda T_a}$, and $p^a_{1,1} = e^{-\mu T_a}$.

If the selected action is channel switching, then the CR moves to the next frequency channel. We assume that there are so many frequency channels that it takes a very long time to visit the same channel again. When the CR revisits a channel that was visited a long time ago, the probability distribution of the states on the channel has already converged to the stationary probability distribution. Therefore, if the action is channel switching, the state transition probability is given as $p^C_{i,0} = \mu/(\lambda + \mu)$ and $p^C_{i,1} = \lambda/(\lambda + \mu)$ for i = 0 and 1.

3) Observation Model: Let $r_t$ denote the observation that the MN receives after the tth decision epoch. When the MN chooses sensing at the tth decision epoch (i.e., $a_t = S$), it performs energy detection on the operating channel, calculates the sensing result $\xi_t$ from (1), and quantizes $\xi_t$ to generate $\zeta_t$. In this case, the MN takes the quantized sensing result $\zeta_t$ as the observation. That is, if $a_t = S$, then we have $r_t = \zeta_t$. If the MN selects data transmission or channel switching, then it obtains no sensing result. In this case, the MN receives a null observation $\emptyset$, and therefore, $r_t = \emptyset$ for t's such that $a_t = D$ or C.

We define $q^a_{i,r}$ as the probability that the MN receives r as the observation when the action a is taken in state i. That is, $q^a_{i,r} := \Pr\{r_t = r \mid s_{t+1} = i, a_t = a\}$. If the selected action at the tth decision epoch is data transmission or channel switching, then the observation $r_t$ is always $\emptyset$. Therefore, if a = D or C, then we simply have $q^a_{i,r} = 1$ for $r = \emptyset$ and $q^a_{i,r} = 0$ for $r = 1, \ldots, M$, regardless of the state i. On the other hand, the observation $r_t$ is equal to the quantized sensing result when sensing is selected at the tth decision epoch. Therefore, it holds that $q^S_{i,r} = 0$ for $r = \emptyset$ and $q^S_{i,r} = \Pr\{\tau_r \le \xi_t < \tau_{r+1} \mid s_{t+1} = i, a_t = S\}$ for $r = 1, \ldots, M$. We can calculate $q^S_{0,r}$ and $q^S_{1,r}$ for $r = 1, \ldots, M$ from the probability density functions of the sensing result $\xi_t$ under the condition that the PU is inactive and active, respectively. According to [10], $\xi_t$ follows the chi-square distribution with $2 \cdot W \cdot T_S$ degrees of freedom if the PU is inactive. If the PU is active, then $\xi_t$ follows the noncentral chi-square distribution with $2 \cdot W \cdot T_S$ degrees of freedom and the noncentrality parameter of $2 \cdot P \cdot T_S/N_o$, where P denotes the received PU signal power.

4) Objective Function and Reward Model: In the POMDP framework, the objective function is the total expected discounted reward

$$E\left[\sum_{t=1}^{\infty} \beta^t R(s_t, a_t)\right] \tag{2}$$

where $0 < \beta < 1$ is a discount factor, and R(s, a) is a reward when the action a is selected in state s.

The decision-making algorithm tries to maximize this objective function. Thus, the reward R(s, a) should be given in such a way that the channel utilization can be maximized while the collision probability is minimized. The reward R(0, D) should be a positive value, since user data are successfully transmitted without collision when data transmission is selected in state 0. On the other hand, if data transmission is selected in state 1, then a collision occurs, and therefore, R(1, D) should be a negative value. If sensing or channel switching is chosen, then the time is consumed without transmitting any data, and therefore, R(i, S) and R(i, C) for i = 0, 1 should be less than or equal to zero.

Under these constraints, the values of the rewards can be varied to control the tradeoff between the channel utilization and the collision probability. For example, we can reduce the collision probability at the expense of channel utilization by decreasing the value of R(0, D).

B. Solution to POMDP
To choose an appropriate action, the decision-making algorithm
calculates the “belief vector” at each decision epoch. In [11], it is
shown that the belief vector contains all the necessary information
for making an optimal decision. Thus, the decision-making algorithm
selects the next action based on the belief vector. The “policy” is a
function that maps the belief vector to the next action. Among the
policies, we aim to find the optimal policy that maximizes the objective
function. From now on, we will explain how to calculate the belief
vector and how to find the optimal policy.
1) Belief Vector: The belief vector at the tth decision epoch is denoted by $\pi_t = (\pi_t^0, \pi_t^1)$, where $\pi_t^i$ is the probability of the state i at the tth decision epoch, which is inferred by the CR on the basis of the previous actions and observations. Since the CR does not have any knowledge of the channel when it is first switched on, the initial probability $\pi_1$ is given as the stationary probability vector $\gamma := (\mu/(\lambda + \mu), \lambda/(\lambda + \mu))$. After the tth decision epoch, the decision-making algorithm updates $\pi_t$ to $\pi_{t+1}$ on the basis of the action $a_t$ and the observation $r_t$.
If the action at is D, then the decision-making algorithm receives Fig. 4. Example of the optimal value function for each action Ua∗ (π) and
a null observation ∅, regardless of the true state. This means that the the optimal policy as a function of the probability that the channel is occupied
decision-making algorithm obtains no information about the true state. π 1 . The bandwidth of a frequency channel W is 1 MHz, and the SNR of the
In this case, the belief vector evolves according to the state transition PU signal is −10 dB. The other parameters are as follows: β = 0.99, TD =
0.1 ms, TS = 0.1 ms, TC = 1 ms, 1/λ = 10 ms, 1/μ = 10 ms, R0D = 1,
probability. That is, the algorithm updates the belief vector as π t+1 =
R1D = −1, R0S = R1S = 0, and R0C = R1C = −0.1.
η(π t ), where
 1 
 
1 adopted. According to [11], the optimal value function of our POMDP
η(π) := pD j
j,0 π , pD
j,1 π
j
. (3) model satisfies V ∗ (π) = maxa∈{D,S,C} Ua∗ (π), where
j=0 j=0

1

UD (π) = πj R(j, D) + βV ∗ (η(π)) (6)
In this equation, we define π := (π 0 , π 1 ).
j=0
If the action at is S, then the MN performs energy detection, and
the decision-making algorithm receives a quantized sensing result as
an observation. In addition to the state transition, the quantized sensing

1 
M

US∗ (π) = πj R(j, S) + β σ(r, π)V ∗ (θ(r, π)) (7)


result is also taken into account by using Bayes’ theorem. The belief vector is calculated as $\pi_{t+1} = \theta(r_t, \pi_t)$, where

\[
\theta(r, \pi) := \left( \frac{q_{0,r} \sum_{j=0}^{1} p^{S}_{j,0} \pi^{j}}{\sigma(r, \pi)},\; \frac{q_{1,r} \sum_{j=0}^{1} p^{S}_{j,1} \pi^{j}}{\sigma(r, \pi)} \right), \tag{4}
\]

\[
\sigma(r, \pi) := q_{0,r} \sum_{j=0}^{1} p^{S}_{j,0} \pi^{j} + q_{1,r} \sum_{j=0}^{1} p^{S}_{j,1} \pi^{j} \tag{5}
\]

for $r = 1, \ldots, M$. From (4) and (5) and Fig. 2, we can see that the belief that the state is 1 (i.e., $\pi^{1}_{t+1}$) increases as the quantized value of the sensing result increases. This corresponds to the fact that a high value of the sensing result indicates a high probability that the channel is occupied. Thus, the soft sensing result is well taken into account in updating the belief vector.

If the action $a_t$ is C, then the CR network moves to the next channel. In this case, the belief vector becomes the stationary probability vector, that is, $\pi_{t+1} = \gamma$.

2) Optimal Policy: The proposed CR makes a decision on the basis of the belief vector. The policy $\delta: \Pi \to \{D, S, C\}$ is a function mapping a belief vector to an action, where $\Pi := \{(\pi^{0}, \pi^{1}) \mid \pi^{0} + \pi^{1} = 1,\ \pi^{0} \ge 0,\ \pi^{1} \ge 0\}$. At the $t$th decision epoch, the decision-making algorithm selects $\delta(\pi_t)$ as an action. Among all the possible policies, the optimal policy $\delta^{*}$ is the one that maximizes the objective function (2). Our target is to find the optimal policy.

To this end, we define the optimal value function. The optimal value function $V^{*}: \Pi \to \mathbb{R}$ maps the current belief vector to the total expected reward that will be earned when the optimal policy is

\[
U_{C}^{*}(\pi) = \sum_{j=0}^{1} \pi^{j} R(j, C) + \beta V^{*}(\gamma). \tag{8}
\]

From the optimal value function, we can derive the optimal policy as

\[
\delta^{*}(\pi) = \operatorname*{argmax}_{a \in \{D, S, C\}} U_{a}^{*}(\pi). \tag{9}
\]

We can calculate the optimal value function and the optimal policy by using dynamic programming, specifically the fixed-grid method in [12]. The optimal policy $\delta^{*}$ should be calculated and stored in the MN prior to real-time operation, during which the MN uses $\delta^{*}$ to select actions.

Although the optimal policy can easily be calculated by using dynamic programming, we can further simplify the calculation by exploiting the threshold structure of the optimal policy. In Fig. 4, we show an example of the optimal value function and the corresponding optimal policy. In this figure, we can see that the optimal policy can be represented by the lower threshold L and the upper threshold H. We found that the optimal policies with different parameters also have similar structures to this example. Therefore, from these observations, we can empirically reduce the optimal policy to

\[
\delta(\pi) =
\begin{cases}
D, & \text{if } \pi^{1} < L \\
S, & \text{if } L \le \pi^{1} < H \\
C, & \text{if } \pi^{1} \ge H.
\end{cases} \tag{10}
\]
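As a minimal sketch, the threshold rule in (10) can be written as a small Python function. The name `threshold_policy` and its argument names are illustrative and do not come from the paper; the example thresholds are the values L = 0.02 and H = 0.6 quoted for Fig. 6.

```python
def threshold_policy(pi1, low, high):
    """Map the belief that the channel is occupied (pi1) to an action, as in (10).

    Returns "D" (transmit data), "S" (sense), or "C" (switch channel).
    `low` and `high` play the roles of the thresholds L and H.
    """
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= L <= H <= 1")
    if pi1 < low:
        return "D"   # channel is likely vacant
    if pi1 < high:
        return "S"   # uncertain: sense the channel
    return "C"       # channel is likely occupied: switch


# Illustrative use with the thresholds quoted for Fig. 6.
L_THRESHOLD, H_THRESHOLD = 0.02, 0.6
actions = [threshold_policy(p, L_THRESHOLD, H_THRESHOLD) for p in (0.01, 0.3, 0.9)]
```

Because the rule needs only two comparisons per decision epoch, no value-function table has to be consulted at run time.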
Fig. 5. Algorithm for the adaptive sensing CR.

Fig. 6. Example variation of the probability that the channel is occupied by the PU, $\pi_t^{1}$, over time. The parameters are as follows: L = 0.02, H = 0.6, 1/λ = 10 ms, and 1/μ = 10 ms.

From (10), it is possible to derive the optimal policy by just deciding
two thresholds L and H instead of calculating the optimal policy by
using dynamic programming. These thresholds affect the performance
of the CR network, such as the channel utilization and the collision
probability. We will explain how to choose the thresholds in Section V.
We summarize the algorithm for the adaptive sensing CR in Fig. 5.
In this algorithm, the CR selects actions by comparing $\pi_t^{1}$ with the
thresholds L and H. This can intuitively be explained as follows. To
minimize collision, the CR allows data transmission only if the channel
is likely to be vacant (i.e., if $\pi_t^{1} < L$). If the CR is uncertain about
the channel (i.e., if $L \le \pi_t^{1} < H$), then it performs channel sensing
to learn the channel state. The CR switches the operating channel if
it is highly probable that the channel is occupied by the PU (i.e., if
$\pi_t^{1} \ge H$).
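
The belief bookkeeping behind this comparison can be sketched as follows, under stated assumptions. The code implements the Bayes update of (4) and (5) after a sensing action and the reset to the stationary vector γ after a channel switch. Here `q[i][r - 1]` is read as the probability that the quantized sensing result equals r when the channel state is i, and `p_s[j][i]` as the probability of moving from state j to state i during one sensing period; these readings, the function names, and all numerical values are assumptions made for illustration. The update after a data transmission (action D) is not covered by this sketch.

```python
def sensing_update(belief, r, q, p_s):
    """Belief update after action S with quantized sensing result r (1..M)."""
    pi0, pi1 = belief
    # Predicted state probabilities: sum_j p^S_{j,i} * pi^j for i = 0, 1.
    pred0 = p_s[0][0] * pi0 + p_s[1][0] * pi1
    pred1 = p_s[0][1] * pi0 + p_s[1][1] * pi1
    # Numerators of the two components of (4).
    num0 = q[0][r - 1] * pred0
    num1 = q[1][r - 1] * pred1
    sigma = num0 + num1  # normalizing term (5)
    return (num0 / sigma, num1 / sigma)


def switch_update(gamma):
    """Belief update after action C: the belief becomes the stationary vector."""
    return tuple(gamma)


# Hypothetical parameters (assumed, not taken from the paper): M = 3 levels.
q = [[0.7, 0.2, 0.1],   # result distribution when the channel is vacant
     [0.1, 0.3, 0.6]]   # result distribution when the channel is occupied
p_s = [[0.99, 0.01],
       [0.01, 0.99]]
prior = (0.9, 0.1)

# Belief that the channel is occupied after each possible sensing result.
occupied_beliefs = [sensing_update(prior, r, q, p_s)[1] for r in (1, 2, 3)]
```

With these assumed values, the occupied-state belief grows with the quantized sensing result, which is the behavior the text attributes to (4) and (5).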

V. NUMERICAL RESULTS
In this section, we present some simulation results on the performance of the proposed CR. There are ten frequency channels, each of which has a 1-MHz bandwidth. Unless otherwise noted, the SNR of the PU signal is −10 dB. For the adaptive sensing CR, we set $T_D = T_S = 0.1$ ms. The time required to complete a channel-switching process (i.e., $T_C$) is set to 1 ms.

Fig. 6 shows an example of variation of $\pi_t^{1}$ over time. In this example, we set L = 0.02 and H = 0.6. At the beginning, the operating channel is free of the PU. Whenever $\pi_t^{1}$ exceeds L = 0.02, the CR performs energy detection. It can be seen that data transmission resumes when $\pi_t^{1}$ goes below L. This means that only the necessary amount of sensing is conducted by adaptive decision. The proposed CR avoids unnecessary sensing by this mechanism, and it can thus outperform the periodic sensing CR. At 5.2 ms, the PU is activated. At 5.8 ms, $\pi_t^{1}$ exceeds H = 0.6, and the CR changes the operating channel.

In Fig. 7, we show the channel utilization, the collision probability, and the channel-switching time proportion of the adaptive sensing CR according to the lower and upper thresholds. The “channel utilization” is defined as the proportion of time in which the CR nodes successfully exchange data without interrupting the PU. The “collision probability” is defined as the proportion of time in which the CR nodes transmit data when the operating channel is occupied by the PU. The “channel-switching time proportion” is defined as the proportion of time that the CR nodes spend performing channel-switching processes.

Fig. 7. Channel utilization, collision probability, and channel-switching time proportion of the adaptive sensing CR according to the lower and upper thresholds.

In Fig. 7, it can be seen that the lower threshold L controls two important performance measures, i.e., the channel utilization and the collision probability. By increasing L, we can enhance the channel utilization at the cost of the collision probability. We can also see that the channel switching time proportion is affected by the upper threshold H. From the simulation results, we can determine the lower and upper thresholds that allow the CR network to achieve a given target performance. For example, suppose that the objective is to maximize the channel utilization while maintaining the collision probability below 0.01. We can achieve a channel utilization of 0.75 and a collision probability of 0.01 by choosing 0.015 as the value of L, based on Fig. 7. In addition, we can choose 0.9 as the value of H if less-frequent channel switching is preferred for stability.

In Figs. 8 and 9, we compare the adaptive sensing CR with the periodic sensing CR in terms of channel utilization and collision probability. The periodic sensing CR is described in Fig. 1(a). It is noted that the periodic sensing CR is virtually the same as the myopic
sensing policy proposed in [5], since it also switches the channel to the next in a circular order. In the case of the periodic sensing CR, the time required for channel switching is also set to 1 ms. The periodic sensing CR requires the following three parameters for operation: 1) the length of a data transmission period; 2) the length of a sensing period; and 3) the detection threshold. The graphs of the performance of the periodic sensing CR are plotted by varying these parameters.

On the other hand, we plot the graphs of the adaptive sensing CR by varying the thresholds L and H. In addition to these thresholds, the adaptive sensing CR requires information about the PU, including λ, μ, and the SNR of the PU. To obtain the simulation results in Fig. 8, we assume that λ, μ, and the SNR are fixed and known to the adaptive sensing CR. In this figure, we can see that the proposed CR has much higher channel utilization than the periodic sensing CR for the same collision probability, and the performance gap is greater under adverse conditions, i.e., fast PU state variation and a low collision probability requirement.

Fig. 8. Tradeoff between channel utilization and collision probability of the periodic and the adaptive sensing CRs when λ, μ, and the SNR of the PU signal are fixed and known to the adaptive sensing CR.

To show that the adaptive sensing CR operates well even when it does not know exact information about the PU, we present the simulation results in Fig. 9 on the assumption that λ and μ are unknown to the CR and the SNR is random. The adaptive sensing CR estimates that both 1/λ and 1/μ are 50 ms, whereas the real values are 10 or 100 ms. The SNR of the PU signal is different for each channel occupation, and it is determined according to the uniform distribution on [−10 dB, −7 dB]. The estimate of the SNR by the adaptive sensing CR is −10 dB. In Fig. 9, we can see that the adaptive sensing CR still outperforms the periodic sensing CR, even when the estimates are incorrect.

Fig. 9. Tradeoff between channel utilization and collision probability of the periodic and the adaptive sensing CRs when λ and μ are unknown to the CR and the SNR of the PU signal is random.

VI. CONCLUSION
In this paper, we have proposed the adaptive sensing CR. The performance results show that the proposed CR is robust to fast PU state variation. Thus, the proposed CR provides an efficient way to exploit the temporary spectrum opportunities caused by bursty PU data traffic.

ACKNOWLEDGMENT
The author would like to thank the associate editor and the anonymous reviewers for their valuable comments, which greatly improved this paper.

REFERENCES
[1] S. Haykin, “Cognitive radio: Brain-empowered wireless communications,” IEEE J. Sel. Areas Commun., vol. 23, no. 2, pp. 201–220, Feb. 2005.
[2] Y. C. Liang, Y. Zeng, E. Peh, and T. Hoang, “Sensing-throughput tradeoff for cognitive radio networks,” IEEE Trans. Commun., vol. 7, no. 4, pp. 1326–1337, Apr. 2008.
[3] W. Y. Lee and I. F. Akyildiz, “Optimal spectrum sensing framework for cognitive radio networks,” IEEE Trans. Wireless Commun., vol. 7, no. 10, pp. 3845–3857, Oct. 2008.
[4] Q. Zhao and Y. Chen, “Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework,” IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 589–600, Apr. 2007.
[5] Q. Zhao, B. Krishnamachari, and K. Liu, “On myopic sensing for multi-channel opportunistic access: Structure, optimality, and performance,” IEEE Trans. Wireless Commun., vol. 7, no. 12, pp. 5431–5440, Dec. 2008.
[6] S. Geirhofer, L. Tong, and B. M. Sadler, “Dynamic spectrum access in the time domain: Modeling and exploiting white space,” IEEE Commun. Mag., vol. 45, no. 5, pp. 66–72, May 2007.
[7] S. Geirhofer, L. Tong, and B. M. Sadler, “Cognitive medium access: Constraining interference based on experimental models,” IEEE J. Sel. Areas Commun., vol. 26, no. 1, pp. 95–105, Jan. 2008.
[8] Q. Zhao, S. Geirhofer, L. Tong, and B. M. Sadler, “Opportunistic spectrum access via periodic channel sensing,” IEEE Trans. Signal Process., vol. 56, no. 2, pp. 785–796, Feb. 2008.
[9] S. D. Jones, N. Merheb, and I. J. Wang, “An experiment for sensing-based opportunistic spectrum access in CSMA/CA networks,” in Proc. DySPAN, Baltimore, MD, Nov. 2005, pp. 593–596.
[10] H. Urkowitz, “Energy detection of unknown deterministic signals,” Proc. IEEE, vol. 55, no. 4, pp. 523–531, Apr. 1967.
[11] G. E. Monahan, “A survey of partially observable Markov decision processes: Theory, models, and algorithms,” Manage. Sci., vol. 28, no. 1, pp. 1–16, Jan. 1982.
[12] W. S. Lovejoy, “A survey of algorithmic methods for partially observable Markov decision processes,” Ann. Oper. Res., vol. 28, no. 1, pp. 47–66, Dec. 1991.
[13] M. Wellens, A. de Baynast, and P. Mahonen, “Exploiting historical spectrum occupancy information for adaptive spectrum sensing,” in Proc. IEEE WCNC, Las Vegas, NV, Mar. 2008, pp. 717–722.
[14] D. Datla, R. Rajbanshi, A. M. Wyglinski, and G. J. Minden, “Parametric adaptive spectrum sensing framework for dynamic spectrum access network,” in Proc. DySPAN, Dublin, Ireland, Apr. 2007, pp. 482–485.
[15] W. A. Gardner, “Signal interception: A unifying theoretical framework for feature detection,” IEEE Trans. Commun., vol. 36, no. 8, pp. 897–906, Aug. 1988.
[16] K. W. Choi, W. S. Jeon, and D. G. Jeong, “Sequential detection of cyclostationary signal for cognitive radio systems,” IEEE Trans. Wireless Commun., vol. 8, no. 9, pp. 4480–4485, Sep. 2009.
[17] T. Kailath and H. V. Poor, “Detection of stochastic process,” IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2230–2259, Nov. 1998.
