
Chapter 8: Image Compression

Wen-Nung Lie, CCU, Taiwan

Applications of image compression

- Televideo conferencing
- Remote sensing (satellite imagery)
- Document and medical imaging
- Facsimile transmission (FAX)
- Control of remotely piloted vehicles in military, space, and hazardous-waste management applications

Fundamentals

What are data and what is information?
- Data are the means by which information is conveyed.
- Various amounts of data may be used to represent the same amount of information.

Data redundancy
- Coding redundancy
- Interpixel redundancy
- Psychovisual redundancy

Coding redundancy

The graylevel histogram of an image can provide a great deal of insight into the construction of codes.

The average number of bits required to represent each pixel is

  L_{avg} = \sum_{k=0}^{L-1} l(r_k)\, p_r(r_k)

where l(r_k) is the number of bits used to represent graylevel r_k and p_r(r_k) is its probability.

Variable-length coding (VLC): higher probability, shorter bit length.

Example: L_avg = 3.0 bits (code 1) vs. 2.7 bits (code 2).

Interpixel redundancy

Autocorrelation coefficients along a line of the image:

  \gamma(\Delta n) = \frac{A(\Delta n)}{A(0)}, \qquad
  A(\Delta n) = \frac{1}{N - \Delta n} \sum_{y=0}^{N-1-\Delta n} f(x,y)\, f(x, y+\Delta n)

Interpixel redundancy is also referred to as spatial redundancy, geometric redundancy, or interframe redundancy.

The larger the autocorrelation, the more interpixel redundancy there is.

Run-length coding to remove spatial redundancy

Example: compression ratio C_R = 2.63, giving a relative data redundancy of

  R_D = 1 - \frac{1}{C_R} = 1 - \frac{1}{2.63} = 0.62
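A minimal sketch of 1-D run-length coding for one binary row, assuming the common convention that runs of 0s and 1s alternate and the row starts with a (possibly empty) run of 0s; function names are illustrative.

```python
def rle_encode(row):
    """Encode a binary row as alternating run lengths, starting with a run of 0s."""
    runs, current, count = [], 0, 0
    for bit in row:
        if bit == current:
            count += 1
        else:
            runs.append(count)          # close the current run
            current, count = bit, 1
    runs.append(count)
    return runs

def rle_decode(runs):
    """Rebuild the binary row from the alternating run lengths."""
    row, value = [], 0
    for length in runs:
        row.extend([value] * length)
        value ^= 1                      # runs alternate between 0 and 1
    return row

row = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1]
runs = rle_encode(row)                  # [3, 4, 2, 1]
assert rle_decode(runs) == row
```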

Psychovisual redundancy

Certain information simply has less relative importance than other information in normal visual processing; this is psychovisual redundancy.

The elimination of psychovisually redundant data results in a loss of quantitative information, called quantization: lossy data compression.

Examples:
- Quantization in graylevels (may cause false contouring)
- Line interlacing in TV (reduced video scanning rate)
- Improved graylevel quantization via dithering

Fidelity criteria

Objective fidelity criteria

- Root-mean-square error:

  e_{rms} = \left[ \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\hat{f}(x,y) - f(x,y)]^2 \right]^{1/2}

- Mean-square signal-to-noise ratio:

  SNR_{ms} = \frac{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \hat{f}(x,y)^2}
                  {\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\hat{f}(x,y) - f(x,y)]^2}

Subjective fidelity criterion

- Rating scale: much worse, worse, slightly worse, the same, slightly better, better, much better

Image compression model

- Source encoder: removes input redundancies.
- Channel encoder: increases the noise immunity of the source encoder's output.

Source encoder model

- Mapper: transforms the input data into a format designed to reduce interpixel redundancies in the image (reversible).
- Quantizer: reduces the accuracy of the mapper's output (reduces psychovisual redundancies); irreversible, and must be omitted in error-free compression.
- Symbol encoder: fixed- or variable-length coder (assigns the shortest code words to the most frequently occurring outputs to reduce coding redundancies); reversible.

Channel encoder and decoder

Designed to reduce the impact of channel noise by inserting a controlled form of redundancy into the source-encoded data.

Hamming code as a channel code: the 7-bit Hamming (7,4) code has minimum distance 3 and can correct one bit error.

For a 4-bit message b_3 b_2 b_1 b_0, the code word h_1 ... h_7 is

  h_1 = b_3 \oplus b_2 \oplus b_0          h_3 = b_3
  h_2 = b_3 \oplus b_1 \oplus b_0          h_5 = b_2
  h_4 = b_2 \oplus b_1 \oplus b_0          h_6 = b_1
                                           h_7 = b_0

where h_1, h_2, h_4 are even-parity bits.

A single-bit error is indicated by a nonzero parity word c_4 c_2 c_1:

  c_1 = h_1 \oplus h_3 \oplus h_5 \oplus h_7
  c_2 = h_2 \oplus h_3 \oplus h_6 \oplus h_7
  c_4 = h_4 \oplus h_5 \oplus h_6 \oplus h_7

Information theory

Explores the minimum amount of data that is sufficient to describe an image completely without loss of information.

Self-information of an event E with probability P(E):

  I(E) = \log \frac{1}{P(E)} = -\log P(E)

- P(E) = 1 gives I(E) = 0
- P(E) = 1/2 gives I(E) = 1 bit

The base of the logarithm determines the information units (base 2 gives bits).

Information channel

Source alphabet A = {a_1, a_2, ..., a_J}
Symbol probabilities z = [P(a_1), P(a_2), ..., P(a_J)]^T with \sum_{j=1}^{J} P(a_j) = 1
The ensemble (A, z) describes the information source completely.

Average information per source output:

  H(z) = -\sum_{j=1}^{J} P(a_j) \log P(a_j)

H(z) is the uncertainty or entropy of the source, i.e., the average amount of information (in m-ary units) obtained by observing a single source output.

For equally probable source symbols, the entropy is maximized.
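A short sketch of the self-information and entropy formulas above, using base-2 logarithms so the units are bits; the probability vectors are illustrative.

```python
import math

def self_information(p):
    """I(E) = -log2 P(E), in bits."""
    return -math.log2(p)

def entropy(probs):
    """H(z) = -sum_j P(a_j) log2 P(a_j); zero-probability symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(self_information(0.5))                   # 1.0 bit
print(entropy([0.25, 0.25, 0.25, 0.25]))       # 2.0 bits: maximal for 4 equally likely symbols
print(entropy([0.5, 0.25, 0.125, 0.125]))      # 1.75 bits
```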

Channel alphabet B = {b_1, b_2, ..., b_K}
Output probabilities v = [P(b_1), P(b_2), ..., P(b_K)]^T; the pair (B, v) describes the channel output.

  P(b_k) = \sum_{j=1}^{J} P(b_k \mid a_j)\, P(a_j)

In matrix form, v = Q z, where

  Q = \begin{bmatrix}
        P(b_1 \mid a_1) & P(b_1 \mid a_2) & \cdots & P(b_1 \mid a_J) \\
        P(b_2 \mid a_1) & \cdots          &        & \vdots          \\
        \vdots          &                 &        & \vdots          \\
        P(b_K \mid a_1) & P(b_K \mid a_2) & \cdots & P(b_K \mid a_J)
      \end{bmatrix}

Q is the forward channel transition matrix, or simply the channel matrix.

Conditional entropy function

  H(z \mid b_k) = -\sum_{j=1}^{J} P(a_j \mid b_k) \log P(a_j \mid b_k)

  H(z \mid v) = \sum_{k=1}^{K} H(z \mid b_k)\, P(b_k)
              = -\sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j, b_k) \log P(a_j \mid b_k)

H(z | v) is called the equivocation of z with respect to v.

The difference between H(z) and H(z | v) is the average information received upon observing a single output symbol, called the mutual information I(z, v):

  I(z, v) = H(z) - H(z \mid v)

Mutual information

  I(z, v) = \sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j, b_k) \log \frac{P(a_j, b_k)}{P(a_j)\, P(b_k)}
          = \sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j)\, q_{kj} \log \frac{q_{kj}}{\sum_{i=1}^{J} P(a_i)\, q_{ki}}

where Q = {q_{kj}} with q_{kj} = P(b_k | a_j).

- The minimum possible value of I(z, v) is zero; it occurs when the input and output symbols are statistically independent (i.e., P(a_j, b_k) = P(a_j) P(b_k)).
- The maximum value of I(z, v) over all possible z is the capacity C of the channel described by the channel matrix Q:

  C = \max_{z} [\, I(z, v) \,]

Channel capacity

- Capacity: the maximum rate at which information can be transmitted reliably through the channel.
- Channel capacity does not depend on the input probabilities of the source (i.e., on how the channel is used); it is a function of the conditional probabilities defining the channel alone (i.e., of Q).

Binary entropy function

To estimate the source entropy: for a binary source z = [p_{bs}, 1 - p_{bs}]^T,

  H(z) = -p_{bs} \log_2 p_{bs} - (1 - p_{bs}) \log_2 (1 - p_{bs}) \equiv H_{bs}(p_{bs})

with maximum H_bs = 1 bit when p_bs = 0.5.   [Plot 1: binary entropy function]

Binary symmetric channel (BSC)

To estimate the channel capacity: with error probability p_e and \bar{p}_e = 1 - p_e,

  channel matrix:   Q = \begin{bmatrix} \bar{p}_e & p_e \\ p_e & \bar{p}_e \end{bmatrix}

  probability of received symbols:   v = Qz = \begin{bmatrix} \bar{p}_e p_{bs} + p_e \bar{p}_{bs} \\ p_e p_{bs} + \bar{p}_e \bar{p}_{bs} \end{bmatrix}

  mutual information:   I(z, v) = H_{bs}(p_{bs}\bar{p}_e + \bar{p}_{bs} p_e) - H_{bs}(p_e)

[Plot 2: I(z, v) versus p_bs]

- I(z, v) = 0 when p_bs = 0 or 1
- I(z, v) = 1 - H_bs(p_e) = C when p_bs = 0.5, for any p_e
- When p_e = 0 or 1, C = 1 bit
- When p_e = 0.5, C = 0 bits (no information transmitted)
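A small sketch that evaluates the BSC expressions above: the binary entropy function H_bs, the mutual information for a given source probability p_bs, and the capacity C = 1 - H_bs(p_e); the numerical values are illustrative.

```python
import math

def Hbs(p):
    """Binary entropy function, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information(p_bs, p_e):
    """I(z, v) = H_bs(p_bs*(1-p_e) + (1-p_bs)*p_e) - H_bs(p_e) for the BSC."""
    q = p_bs * (1 - p_e) + (1 - p_bs) * p_e    # mixed received-symbol probability
    return Hbs(q) - Hbs(p_e)

p_e = 0.1
print(mutual_information(0.5, p_e))   # capacity C = 1 - H_bs(0.1), about 0.531 bits
print(mutual_information(0.0, p_e))   # 0.0: a constant source conveys no information
```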

Shannon's first theorem for noiseless coding

Defines the minimum average code length per source symbol that can be achieved.

For a zero-memory source (with finite ensemble (A, z) and statistically independent source symbols):

- Single symbols: A = {a_1, a_2, ..., a_J}, with ensemble (A, z) and entropy H(z)
- Block symbols (n-tuples of symbols): A' = {\alpha_1, \alpha_2, ..., \alpha_{J^n}}, with
  P(\alpha_i) = P(a_{j1}) P(a_{j2}) \cdots P(a_{jn}) and ensemble (A', z')

Entropy of the zero-memory source with block symbols:

  H(z') = -\sum_{i=1}^{J^n} P(\alpha_i) \log P(\alpha_i) = n\, H(z)

The n-extended source

Use n symbols as one block symbol. The average word length of the n-extended code is

  L'_{avg} = \sum_{i=1}^{J^n} P(\alpha_i)\, l(\alpha_i)

and it satisfies

  H(z') \le L'_{avg} < H(z') + 1

so that, dividing by n,

  H(z) \le \frac{L'_{avg}}{n} < H(z) + \frac{1}{n}
  \qquad\Rightarrow\qquad
  \lim_{n \to \infty} \frac{L'_{avg}}{n} = H(z)

It is therefore possible to make L'_avg / n arbitrarily close to H(z) by coding infinitely long extensions of the source.

Coding efficiency

  \eta = \frac{n\, H(z)}{L'_{avg}}

Example:

- Original source: P(a_1) = 2/3, P(a_2) = 1/3
  H(z) = 0.918 bits/symbol, L_avg = 1 bit/symbol (n = 1)
  \eta = 0.918 / 1 = 0.918

- Second extension of the source: z' = {4/9, 2/9, 2/9, 1/9}
  H(z') = 2 H(z) = 1.83 bits/symbol, L'_avg = 17/9 = 1.89 bits/symbol
  \eta = 1.83 / 1.89 = 0.97
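A quick numerical check of the example above, assuming the natural code assignments: 1 bit per symbol for the original source, and code lengths {1, 2, 3, 3} for the four second-extension symbols (e.g., codes 0, 10, 110, 111).

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs)

# Original source
p = [2 / 3, 1 / 3]
H = entropy(p)                                  # about 0.918 bits/symbol
print(H, H / 1.0)                               # L_avg = 1 bit/symbol -> efficiency 0.918

# Second extension: symbol pairs with probabilities 4/9, 2/9, 2/9, 1/9
p2 = [4 / 9, 2 / 9, 2 / 9, 1 / 9]
lengths = [1, 2, 3, 3]                          # assumed code lengths for the four pairs
L_avg2 = sum(pi * li for pi, li in zip(p2, lengths))     # 17/9, about 1.89 bits per pair
print(entropy(p2), L_avg2, 2 * H / L_avg2)               # 1.84, 1.89, efficiency about 0.97
```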

Shannon's second theorem for noisy coding

For any R < C (the capacity of the zero-memory channel with matrix Q), there exist codes of block length r and rate R such that the block decoding error probability can be made arbitrarily small.

Rate-distortion function

Given a distortion D, R(D) is the minimum rate at which information about the source can be conveyed to the user:

  R(D) = \min_{Q \in Q_D} [\, I(z, v) \,], \qquad Q_D = \{ q_{kj} : d(Q) \le D \}

where Q is the probability mapping from source symbols to output symbols induced by the compression, d(Q) is the average distortion, and D is the distortion threshold.

Example of a rate-distortion function

Consider a zero-memory binary source with equally probable source symbols {0, 1}:

  R(D) = 1 - H_{bs}(D)

- R(D) is monotonically decreasing and convex in the interval (0, D_max).
- The shape of the R-D function reflects the coding efficiency of a coder.

Estimating the information content (entropy) of an image

Construct a source model based on the relative frequency of occurrence of the graylevels:

- First-order estimate (histogram of single pixels): 1.81 bits/pixel
- Second-order estimate (histogram of pairs of adjacent pixels): 1.25 bits/pixel
- Third-order, fourth-order, ...: approaching the true source entropy

Example image (8 x 4):

  21 21 21 95 169 243 243 243
  21 21 21 95 169 243 243 243
  21 21 21 95 169 243 243 243
  21 21 21 95 169 243 243 243
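A small sketch reproducing the first- and second-order estimates quoted above for the 8x4 example image. For the second-order estimate, pixel pairs are formed between each pixel and its right neighbor, wrapping around to the next pixel in row-major order; that pairing convention is an assumption, chosen because it reproduces the 1.25 bits/pixel figure.

```python
import math
from collections import Counter

img = [[21, 21, 21, 95, 169, 243, 243, 243]] * 4    # the 8x4 example image
pixels = [p for row in img for p in row]

def entropy(counts):
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# First-order estimate: histogram of single pixels
print(entropy(Counter(pixels)))                      # about 1.81 bits/pixel

# Second-order estimate: histogram of adjacent pixel pairs (row-major, with wraparound)
pairs = Counter(zip(pixels, pixels[1:] + pixels[:1]))
print(entropy(pairs) / 2)                            # about 1.25 bits/pixel
```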

- The difference between the higher-order estimates of entropy and the first-order estimate indicates the presence of interpixel redundancy.
- If the pixels were statistically independent, the higher-order estimates would equal the first-order estimate, and variable-length coding (VLC) alone would provide optimal compression (i.e., no further mapping would be necessary).
- First-order estimate of the difference-mapped source: 1.41 bits/pixel. Since 1.41 > 1.25, an even better mapping exists.

Error-free compression: variable-length coding (VLC)

Huffman coding

- The most popular method; it yields the smallest possible number of code symbols per source symbol.
- Construct the Huffman tree according to the source symbol probabilities.
- Code the Huffman tree.
- Compute the source entropy, average code length, and coding efficiency.

The Huffman code is:

- a block code (each source symbol is mapped to a fixed sequence of bits)
- instantaneous (each code word can be decoded without referencing succeeding symbols)
- uniquely decodable (no code word is a prefix of another)

Example: 010100111100 decodes to a3 a1 a2 a2 a6.

L_avg = 2.2 bits/symbol, H(z) = 2.14 bits/symbol, \eta = 0.973
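A compact heap-based Huffman construction, as a sketch of how code lengths behind a figure like L_avg = 2.2 bits/symbol can be produced. The probabilities below are an assumed six-symbol distribution of the textbook kind; the resulting code is one of several equally optimal assignments.

```python
import heapq

def huffman_code(probs):
    """Return {symbol: bitstring} for a dict of symbol probabilities."""
    heap = [[p, i, {s: ""}] for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)          # two least probable nodes
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, [p1 + p2, counter, merged])
        counter += 1
    return heap[0][2]

probs = {"a1": 0.4, "a2": 0.3, "a3": 0.1, "a4": 0.1, "a5": 0.06, "a6": 0.04}
code = huffman_code(probs)
L_avg = sum(probs[s] * len(code[s]) for s in probs)
print(code, L_avg)   # a valid prefix code; L_avg = 2.2 bits/symbol for these probabilities
```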

Variations of VLC


Variations of VLC (cont.)

Used when a large number of source symbols must be handled.

Truncated Huffman coding
- Only some of the symbols (those with the highest probabilities, here the top 12) are encoded with a Huffman code.
- A prefix code (whose probability is the sum of the probabilities of the remaining symbols, here "10"), followed by a fixed-length code, is used to represent all the other symbols.

B-code
- Each code word is made up of continuation bits (C) and information bits (natural binary numbers).
- C can be 1 or 0, but it alternates between successive code words.
- E.g., a11 a2 a7 is coded as 001 010 101 000 010 or 101 110 001 100 110.

Variations of VLC (cont.)

Binary shift code
- Divide the symbols into blocks of equal size.
- Code the individual elements within all blocks identically.
- Add shift-up or shift-down symbols to identify each block (here, 111).

Huffman shift code
- Select one reference block.
- Sum the probabilities of all other blocks and use this sum to determine the shift symbol by the Huffman method (here, 00).

Arithmetic coding

- AC generates non-block codes (there is no look-up table as in Huffman coding).
- An entire sequence of source symbols is assigned a single code word.
- The code word itself defines an interval of real numbers between 0 and 1. As the number of symbols in the message increases, the interval becomes smaller (more information bits are needed to represent this real number).
- Multiple symbols are coded jointly with one set of bits, so the number of bits per symbol is effectively fractional.

Arithmetic coding (cont.)

Example: coding of the sequence a1 a2 a3 a3 a4

- Any real number in the interval [0.06752, 0.0688) can represent the source sequence (e.g., 0.000100011 in binary = 0.068359375).
- Theoretically, 0.068 is enough to represent the sequence: 3 decimal digits for 5 symbols, i.e., 0.6 decimal digits per symbol.
- In practice, 0.068359375 requires 9 bits for 5 symbols, i.e., 1.8 bits per symbol.
- As the length of the source symbol sequence increases, the resulting arithmetic code approaches Shannon's bound.

Two factors limit the coding performance:
- An end-of-message indicator is necessary to separate one message sequence from another.
- Finite-precision arithmetic.
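A sketch of the interval-narrowing step for the example above, assuming the commonly used model P(a1) = 0.2, P(a2) = 0.2, P(a3) = 0.4, P(a4) = 0.2; these probabilities are an assumption, but they reproduce the interval [0.06752, 0.0688) quoted above.

```python
# Cumulative model: each symbol owns a sub-interval of [0, 1)
model = {"a1": (0.0, 0.2), "a2": (0.2, 0.4), "a3": (0.4, 0.8), "a4": (0.8, 1.0)}

def arithmetic_interval(message):
    low, high = 0.0, 1.0
    for sym in message:
        lo, hi = model[sym]
        width = high - low
        low, high = low + width * lo, low + width * hi   # narrow the interval
    return low, high

print(arithmetic_interval(["a1", "a2", "a3", "a3", "a4"]))
# about (0.06752, 0.0688); any number in this interval, e.g. 0.068359375, codes the message
```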

LZW (Lempel-Ziv-Welch) coding

- Assigns fixed-length code words to variable-length sequences of source symbols, and requires no a priori knowledge of the probabilities of occurrence of the symbols.
- LZW compression has been integrated into the GIF, TIFF, and PDF formats.
- For a 9-bit (512-word) dictionary, the latter half of the entries are constructed during the encoding process. An entry may cover two or more pixels (and is still assigned a 9-bit code).
- If the dictionary is too small, the detection of matching graylevel sequences is less likely; if it is too large, the size of the code words adversely affects compression performance.

LZW coding (cont.)

- The LZW decoder builds an identical decompression dictionary as it decodes the bit stream.
- Care must be taken to handle dictionary overflow.
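A minimal LZW encoder sketch over graylevel values, with the dictionary initialized to the 256 single-pixel entries as described above; dictionary-overflow handling is omitted, and the test sequence is an assumed 39/126 pattern of the textbook kind.

```python
def lzw_encode(pixels):
    """Return the list of dictionary indices emitted for a 1-D pixel sequence."""
    dictionary = {(g,): g for g in range(256)}   # 0..255: single-pixel entries
    next_code = 256                              # new multi-pixel entries start here
    s, out = (), []
    for p in pixels:
        candidate = s + (p,)
        if candidate in dictionary:
            s = candidate                        # keep growing the recognized sequence
        else:
            out.append(dictionary[s])
            dictionary[candidate] = next_code    # add the new sequence to the dictionary
            next_code += 1
            s = (p,)
    if s:
        out.append(dictionary[s])
    return out

print(lzw_encode([39, 39, 126, 126, 39, 39, 126, 126, 39, 39, 126, 126]))
# -> [39, 39, 126, 126, 256, 258, 260, 126]: repeated pairs are replaced by dictionary codes
```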

Bit-plane coding

Bit-plane decomposition

- Binary code: the bit change between two adjacent codes may be significant (e.g., 127 = 01111111 and 128 = 10000000 differ in every bit).
- Gray code: only 1 bit changes between any two adjacent codes.

For an m-bit value a_{m-1} 2^{m-1} + a_{m-2} 2^{m-2} + ... + a_1 2^1 + a_0 2^0, the Gray code is

  g_i = a_i \oplus a_{i+1}, \quad 0 \le i \le m-2
  g_{m-1} = a_{m-1}

Gray-coded bit planes are less complex than the corresponding binary bit planes.
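A small sketch of the binary-to-Gray mapping g_i = a_i xor a_{i+1}, showing the 127/128 case mentioned above.

```python
def binary_to_gray(value):
    """g = a xor (a >> 1) implements g_i = a_i ^ a_(i+1), with g_(m-1) = a_(m-1)."""
    return value ^ (value >> 1)

for v in (127, 128):
    g = binary_to_gray(v)
    print(v, format(v, "08b"), format(g, "08b"))
# 127 -> binary 01111111, Gray 01000000
# 128 -> binary 10000000, Gray 11000000
# Adjacent graylevels differ in all 8 binary bit planes but in only 1 Gray-coded plane.
```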

[Figure: binary-coded vs. Gray-coded bit planes of an example image]

Lossless compression for binary images

1-D run-length coding
- RLC + VLC, with the VLC designed according to the run-length statistics

2-D run-length coding
- Used for FAX image compression

Relative address coding (RAC)
- Based on the principle of tracking the binary transitions that begin and end each black and white run
- Combined with VLC

Other methods for lossless compression of binary images

White block skipping
- Code solid white lines as 0s and all other lines with a 1 followed by the original bit pattern

Direct contour tracing
- Represent each contour by a single boundary point and a set of directionals

Predictive differential quantizing (PDQ)
- A scan-line-oriented contour tracing procedure

Comparisons of the various algorithms
- Only entropies after pixel mapping are computed, instead of performing real encoding (see the next slide)

Comparison of various methods (for binary images)

- Run-length coding proved to be the best coding method for bit-plane coded images.
- 2-D techniques (PDQ, DDC, RAC) perform better when compressing binary images or the higher-order bit planes.
- Gray-coded images proved to gain an additional 1 bit/pixel of compression efficiency relative to binary-coded images.

Lossless predictive coding

Eliminates the interpixel redundancies of closely spaced pixels by extracting and coding only the new information in each pixel (the difference between the actual and predicted value).

System architecture:
- The same predictor is used on the encoder and decoder sides: e_n = f_n - \hat{f}_n
- The predicted value is rounded
- RLC symbol encoder

Methods of prediction

Use a local, global, or adaptive predictor to generate \hat{f}_n.

Linear prediction: a linear combination of the m previous pixels,

  \hat{f}_n = \mathrm{round}\!\left[ \sum_{i=1}^{m} \alpha_i f_{n-i} \right]

where \alpha_i are the prediction coefficients and m is the order of prediction.
Special case: the previous-pixel predictor.

For 2-D images, use a local neighborhood (e.g., pixels 1, 2, 3, 4 around pixel X) for the prediction.
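A sketch of first-order (previous-pixel) lossless predictive coding along image rows, showing that the residual image has a much lower first-order entropy than the original; the image here is a smooth synthetic ramp with mild noise, so the exact numbers are illustrative rather than the 6.81/3.96 figures on the next slide.

```python
import numpy as np

def first_order_entropy(values):
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# A smooth synthetic image: a slow horizontal ramp plus mild noise
rng = np.random.default_rng(0)
x = np.arange(256)
img = (np.tile(x, (256, 1)) // 4 + rng.integers(0, 3, (256, 256))).astype(np.int16)

# Previous-pixel predictor along each row: e_n = f_n - f_(n-1)
residual = img.copy()
residual[:, 1:] = img[:, 1:] - img[:, :-1]   # first column is transmitted as-is

print(first_order_entropy(img))        # entropy of the original image (several bits/pixel)
print(first_order_entropy(residual))   # much lower: prediction removed interpixel redundancy
```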

Entropy reduction by difference mapping

- Due to the removal of interpixel redundancies by prediction, the first-order entropy of the difference-mapped image is lower than that of the original image (3.96 bits/pixel vs. 6.81 bits/pixel).
- The probability density function of the prediction errors is highly peaked at zero and characterized by a relatively small variance; it is commonly modeled by the zero-mean, uncorrelated Laplacian pdf

  p_e(e) = \frac{1}{\sqrt{2}\,\sigma_e} \exp\!\left( -\frac{\sqrt{2}\,|e|}{\sigma_e} \right)

Lossy predictive coding

- Error-free encoding of images seldom results in more than a 3:1 reduction in data.
- In a lossy predictive coding model, the predictions in the encoder and decoder must be equivalent (the same); this is achieved by placing the encoder's predictor within a feedback loop, so that

  \dot{f}_n = \dot{e}_n + \hat{f}_n

  where \dot{e}_n is the quantized prediction error.
- This closed-loop configuration prevents error buildup at the decoder's output.

Delta modulation (DM)

  DM predictor:   \hat{f}_n = \alpha \dot{f}_{n-1}
  DM quantizer:   \dot{e}_n = +\zeta  if e_n > 0,  -\zeta  otherwise

The resulting bit rate is 1 bit/pixel.

Slope overload effect
- \zeta is too small to represent the input's large changes, leading to blurred object edges.

Granular noise effect
- \zeta is too large to represent the input's small changes, leading to grainy or noisy surfaces.

Delta modulation (cont.)

[Figure: DM example waveform and reconstruction]

Optimal predictors

Minimize the encoder's mean-square prediction error

  E\{e_n^2\} = E\{[f_n - \hat{f}_n]^2\}

subject to the DPCM constraints

  \hat{f}_n = \sum_{i=1}^{m} \alpha_i f_{n-i}, \qquad
  \dot{f}_n = \dot{e}_n + \hat{f}_n \approx e_n + \hat{f}_n = f_n

Select the m prediction coefficients to minimize

  E\{e_n^2\} = E\left\{ \left[ f_n - \sum_{i=1}^{m} \alpha_i f_{n-i} \right]^2 \right\}

The solution is \alpha = R^{-1} r, where

  R = \begin{bmatrix}
        E\{f_{n-1} f_{n-1}\} & E\{f_{n-1} f_{n-2}\} & \cdots & E\{f_{n-1} f_{n-m}\} \\
        E\{f_{n-2} f_{n-1}\} & E\{f_{n-2} f_{n-2}\} & \cdots & \vdots               \\
        \vdots               &                      &        & \vdots               \\
        E\{f_{n-m} f_{n-1}\} & \cdots               & \cdots & E\{f_{n-m} f_{n-m}\}
      \end{bmatrix},
  \quad
  r = \begin{bmatrix} E\{f_n f_{n-1}\} \\ \vdots \\ E\{f_n f_{n-m}\} \end{bmatrix},
  \quad
  \alpha = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_m \end{bmatrix}
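A sketch of solving the normal equations alpha = R^-1 r for a 1-D m-th order predictor, estimating the expectations E{f_n f_(n-i)} empirically; the test signal is a synthetic first-order Markov-like sequence, so the recovered first coefficient should be close to its generating value.

```python
import numpy as np

def optimal_predictor(signal, m):
    """Solve alpha = R^-1 r with R_ij = E{f_(n-i) f_(n-j)} and r_i = E{f_n f_(n-i)}."""
    f = np.asarray(signal, dtype=float)
    n = len(f)
    def corr(lag):                      # empirical autocorrelation E{f_n f_(n-lag)}
        return np.mean(f[lag:] * f[:n - lag]) if lag else np.mean(f * f)
    R = np.array([[corr(abs(i - j)) for j in range(m)] for i in range(m)])
    r = np.array([corr(i + 1) for i in range(m)])
    return np.linalg.solve(R, r)

# A strongly correlated test signal: f_n = 0.95 f_(n-1) + noise
rng = np.random.default_rng(1)
sig = np.zeros(5000)
for k in range(1, 5000):
    sig[k] = 0.95 * sig[k - 1] + rng.normal()

print(optimal_predictor(sig, 3))   # first coefficient close to 0.95, the others near zero
```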

Optimal predictors (cont.)

Variance of the prediction error:

  \sigma_e^2 = \sigma^2 - \alpha^T r = \sigma^2 - \sum_{i=1}^{m} E\{f_n f_{n-i}\}\, \alpha_i

For a 2-D Markov source with separable autocorrelation function

  E\{f(x,y)\, f(x-i, y-j)\} = \sigma^2 \rho_v^{\,i} \rho_h^{\,j}

and a fourth-order linear predictor

  \hat{f}(x,y) = \alpha_1 f(x, y-1) + \alpha_2 f(x-1, y-1) + \alpha_3 f(x-1, y) + \alpha_4 f(x-1, y+1)

the optimal coefficients are

  \alpha_1 = \rho_h, \quad \alpha_2 = -\rho_v \rho_h, \quad \alpha_3 = \rho_v, \quad \alpha_4 = 0

We generally require \sum_{i=1}^{m} \alpha_i \le 1 to reduce the DPCM decoder's susceptibility to transmission noise; this confines the impact of an input error to a small number of outputs.

Examples of predictors

Four example predictors:

  \hat{f}(x,y) = 0.97 f(x, y-1)
  \hat{f}(x,y) = 0.5 f(x, y-1) + 0.5 f(x-1, y)
  \hat{f}(x,y) = 0.75 f(x, y-1) + 0.75 f(x-1, y) - 0.5 f(x-1, y-1)
  \hat{f}(x,y) = 0.97 f(x, y-1)  if \Delta h \le \Delta v,  0.97 f(x-1, y)  otherwise

The last predictor is adaptive, using a local measure of the directional property of the image.

The visually perceptible error decreases as the order of the predictor increases.

Optimal quantization

- The staircase quantization function is an odd function that can be described by its decision levels (s_i) and reconstruction levels (t_i).
- Quantizer design consists of selecting the best s_i and t_i for a particular optimization criterion and input probability density function p(s).

L-level Lloyd-Max quantizer

Optimal in the mean-square error sense (a non-uniform quantizer). The decision and reconstruction levels satisfy

  \int_{s_{i-1}}^{s_i} (s - t_i)\, p(s)\, ds = 0, \qquad i = 1, 2, ..., L/2

  s_i = \begin{cases}
          0                        & i = 0 \\
          (t_i + t_{i+1}) / 2      & i = 1, 2, ..., L/2 - 1 \\
          \infty                   & i = L/2
        \end{cases}

  s_{-i} = -s_i, \qquad t_{-i} = -t_i

- t_i is the centroid of each decision interval.
- The decision levels are halfway between the reconstruction levels.

How about an optimal uniform quantizer?
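A sketch of the Lloyd iteration that alternates the two optimality conditions above (reconstruction levels at the centroids of the decision intervals, decision levels halfway between reconstruction levels), run on samples drawn from an assumed Laplacian-like source.

```python
import numpy as np

def lloyd_max(samples, levels, iters=100):
    """Design an L-level MSE-optimal (non-uniform) quantizer from training samples."""
    # Initial reconstruction levels: spread over the sample quantiles
    t = np.quantile(samples, np.linspace(0, 1, levels + 2)[1:-1])
    for _ in range(iters):
        s = (t[:-1] + t[1:]) / 2              # decision levels: halfway between t_i and t_(i+1)
        idx = np.digitize(samples, s)         # assign each sample to a decision interval
        t = np.array([samples[idx == i].mean() for i in range(levels)])   # centroids
    return s, t

rng = np.random.default_rng(0)
samples = rng.laplace(0.0, 1.0, 100_000)      # Laplacian-like prediction-error samples
s, t = lloyd_max(samples, 4)
print(np.round(s, 3), np.round(t, 3))         # non-uniform: levels crowd around zero
```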

Transform coding

- Subimage decomposition
- Transformation: decorrelates the pixels of each subimage (energy packing)
- Quantization: eliminates the coefficients that carry the least information
- Coding

Transform selection

Requirements
- Orthonormal or unitary forward and inverse transformation kernels (basis functions or basis images)
- Separable and symmetric kernels

Types
- DFT
- DCT
- WHT (Walsh-Hadamard transform)

Compression is achieved during the quantization of the transformed coefficients, not during the transformation step.

WHT

For N = 2^m,

  T(u,v) = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\, g(x,y,u,v), \qquad
  f(x,y) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} T(u,v)\, h(x,y,u,v)

with identical forward and inverse kernels

  g(x,y,u,v) = h(x,y,u,v) = \frac{1}{N} (-1)^{\sum_{i=0}^{m-1} [\, b_i(x) p_i(u) + b_i(y) p_i(v) \,]}

- The summation in the exponent is performed in modulo-2 arithmetic.
- b_k(z) is the kth bit (from right to left) in the binary representation of z.

DCT

  g(x,y,u,v) = h(x,y,u,v) = \alpha(u)\,\alpha(v)\,
    \cos\!\left[\frac{(2x+1)u\pi}{2N}\right] \cos\!\left[\frac{(2y+1)v\pi}{2N}\right]

  \alpha(u) = \begin{cases}
                \sqrt{1/N} & u = 0 \\
                \sqrt{2/N} & u = 1, 2, ..., N-1
              \end{cases}
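A direct numpy implementation of the DCT kernel above for an N x N block, as a sketch; a production codec would use a fast transform instead.

```python
import numpy as np

def dct_matrix(N):
    """Rows are the 1-D DCT basis vectors: alpha(u) * cos((2x+1) u pi / 2N)."""
    x = np.arange(N)
    C = np.array([np.cos((2 * x + 1) * u * np.pi / (2 * N)) for u in range(N)])
    C[0] *= np.sqrt(1 / N)
    C[1:] *= np.sqrt(2 / N)
    return C

def dct2(block):
    """Separable 2-D DCT: T = C f C^T (and f = C^T T C is the inverse)."""
    C = dct_matrix(block.shape[0])
    return C @ block @ C.T

block = np.arange(64, dtype=float).reshape(8, 8)
T = dct2(block)
C = dct_matrix(8)
print(np.allclose(C.T @ T @ C, block))   # True: the kernel is orthonormal and invertible
```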

Approximations using DFT, WHT, and DCT

- Truncate 50% of the transformed coefficients, retaining those of maximum magnitude, and reconstruct.
- The resulting RMS errors are 1.28 (DFT), 0.86 (WHT), and 0.68 (DCT).

Basis images

An n x n subimage F can be expressed as a linear combination of n^2 basis images H_{uv}:

  F = \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} T(u,v)\, H_{uv}

where

  H_{uv} = \begin{bmatrix}
             h(0,0,u,v)   & h(0,1,u,v)   & \cdots & h(0,n-1,u,v)   \\
             h(1,0,u,v)   & \vdots       &        & \vdots         \\
             \vdots       &              &        & \vdots         \\
             h(n-1,0,u,v) & h(n-1,1,u,v) & \cdots & h(n-1,n-1,u,v)
           \end{bmatrix}

Transform coefficient masking

Masking function \chi(u,v):

  \hat{F} = \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} \chi(u,v)\, T(u,v)\, H_{uv}

The optimal masking function minimizes the mean-square approximation error

  e_{ms} = E\{\| F - \hat{F} \|^2\}
         = E\left\{ \left\| \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} T(u,v) H_{uv}
             - \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} \chi(u,v) T(u,v) H_{uv} \right\|^2 \right\}
         = \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} \sigma^2_{T(u,v)} \, [\, 1 - \chi(u,v) \,]

- The total mean-square approximation error is the sum of the variances of the discarded transform coefficients.
- Transformations that pack the most information into the fewest coefficients provide the best approximation: the KLT, or in practice the DCT.

Subimage size selection

- The most popular subimage sizes are 8x8 and 16x16.
- Comparing n x n subimages for n = 2, 4, 8, 16, truncating 75% of the resulting coefficients:
  - The error curves of the WHT and DCT flatten as the subimage size becomes greater than 8x8, while that of the FT does not.
  - The blocking effect decreases as the subimage size increases.

[Figure: original image and reconstructions using 2x2, 4x4, and 8x8 subimages]

Zonal vs. threshold coding

The coefficients are retained based on:
- maximum variance: zonal coding
- maximum magnitude: threshold coding

Zonal coding
- Each DCT coefficient is considered a random variable whose distribution can be computed over the ensemble of all transformed subimages.
- The variances can also be based on an assumed image model (e.g., a Markov autocorrelation function).
- The coefficients of maximum variance are usually located around the origin.

Threshold coding
- Select the coefficients with the largest magnitudes.
- Causes far less error than the zonal coding result.

Bit allocation for zonal coding

The retained coefficients are allocated bits according to their statistics:
- The DC coefficient is modeled by a Rayleigh density function.
- The AC coefficients are often modeled by a Laplacian or Gaussian density.
- The number of bits is made proportional to \log_2 \sigma^2_{T(u,v)}; the information content of a Gaussian random variable with distortion D is proportional to \log_2(\sigma^2 / D).

[Figure: zonal-coding and threshold-coding masks with their bit allocations]

Bit allocation in threshold coding

- The transform coefficients of largest magnitude make the most significant contribution to reconstructed subimage quality.
- Inherently adaptive, in the sense that the locations of the retained transform coefficients vary from one subimage to another.
- The retained coefficients are reordered in a 1-D zigzag scanning pattern.
- The mask pattern is run-length coded.

[Figure: zonal mask, zonal bit allocation, thresholded-and-retained coefficients, and the zigzag ordering pattern]

Bit allocation in threshold coding (cont.)

Thresholding the transform coefficients: the threshold can be varied as a function of the location of each coefficient within the subimage.

This results in a variable code rate, but offers the advantage that thresholding and quantization can be combined, by replacing the masking \chi(u,v) T(u,v) with

  \hat{T}(u,v) = \mathrm{round}\!\left[ \frac{T(u,v)}{Z(u,v)} \right]

where Z(u,v) is the transform normalization array.

The recovered (denormalized) value is

  \dot{T}(u,v) = \hat{T}(u,v)\, Z(u,v)

The elements Z(u,v) can be scaled to achieve a variety of compression levels.

Normalization (quantization) matrix

The normalization matrix Z(u,v) used in the JPEG standard.
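A sketch of combined thresholding and quantization with a normalization array: quantize T(u,v) by Z(u,v), reorder with the zigzag pattern, and denormalize on reconstruction. The Z values below are the widely published JPEG luminance table; treat them as an illustrative choice of normalization array, and Z can be scaled to trade quality for rate.

```python
import numpy as np

Z = np.array([                       # widely published JPEG luminance normalization array
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99]])

def zigzag_order(n=8):
    """(u, v) index pairs ordered along anti-diagonals, alternating direction."""
    idx = [(u, v) for u in range(n) for v in range(n)]
    return sorted(idx, key=lambda p: (p[0] + p[1],
                                      p[0] if (p[0] + p[1]) % 2 else p[1]))

def quantize(T, scale=1.0):
    That = np.round(T / (Z * scale)).astype(int)        # T^(u,v) = round[T(u,v)/Z(u,v)]
    return [That[u, v] for u, v in zigzag_order()]       # 1-D zigzag sequence

def dequantize(seq, scale=1.0):
    Tdot = np.zeros((8, 8))
    for (u, v), q in zip(zigzag_order(), seq):
        Tdot[u, v] = q * Z[u, v] * scale                  # denormalized value
    return Tdot

# Usage: seq = quantize(dct_coefficients); approx = dequantize(seq)
```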

Wavelet coding

Requirements in encoding an image:
- An analyzing wavelet
- The number of decomposition levels

The discrete wavelet transform (DWT) converts a large portion of the original image to horizontal, vertical, and diagonal decomposition coefficients with zero mean and Laplacian-like distributions.

Wavelet coding (cont.)

- Subdivision of the original image is unnecessary.
- Many computed coefficients can be quantized and coded to minimize intercoefficient and coding redundancy.
- The quantization can be adapted to exploit any positional correlation across the different decomposition levels.
- This eliminates the blocking artifact that characterizes DCT-based approximations at high compression ratios.

1-D subband decomposition

Analysis filters:
- Low-pass (approximation): h_0(n)
- High-pass (detail): h_1(n)

Synthesis filters:
- Low-pass: g_0(n)
- High-pass: g_1(n)

2-D subband image coding

A four-band filter bank is applied along the rows and columns of the image.

An example of Daubechies orthonormal filters

[Figure: Daubechies orthonormal analysis/synthesis filter coefficients]

Comparison between DCT-based and DWT-based coding

A noticeable decrease of error in the wavelet coding results (based on compression ratios of 34:1 and 67:1):
- DWT-based: RMS errors 2.29 and 2.96
- DCT-based: RMS errors 3.42 and 6.33

At compression ratios of 108:1 and 167:1 (DWT-based RMS errors: 3.72 and 4.73):
- Even the 167:1 DWT result (RMS = 4.73) is better than the 67:1 DCT result (RMS = 6.33).
- At more than twice the compression ratio, the wavelet-based reconstruction has only 75% of the error of the DCT-based result.

Wavelet selection

The wavelet selection affects all aspects of wavelet coding system design and performance:
- The computational complexity of the transform
- The system's ability to compress and reconstruct images with acceptable error
- It includes both the decomposition filters and the reconstruction filters

The most widely used expansion functions:
- Daubechies wavelets
- Biorthogonal wavelets
  - Useful analysis property: the number of zero (vanishing) moments
  - Important synthesis property: the smoothness of the reconstruction
  - Allow filters with binary coefficients (numbers of the form k/2^a)

Comparison between different wavelets

1. Haar wavelets: the simplest
2. Daubechies wavelets: the most popular imaging wavelets
3. Symlets: an extension of Daubechies wavelets with increased symmetry
4. Cohen-Daubechies-Feauveau wavelets: biorthogonal wavelets

Comparison between different wavelets (cont.)

- Comparison of the number of operations required (for the analysis and reconstruction filters, respectively).
- As the computational complexity increases, the information-packing ability increases as well.

Other issues

The number of decomposition levels required
- The initial decompositions are responsible for the majority of the data compression (3 levels are usually enough).

Quantizer design
- Introduce an enlarged quantization interval around zero (the "dead zone").
- Adapt the size of the quantization interval from scale to scale.
- Quantization of coefficients at deeper decomposition levels impacts larger areas of the reconstructed image.

Binary image compression standards

Most of the standards are issued by:
- The International Standardization Organization (ISO)
- The Consultative Committee of the International Telephone and Telegraph (CCITT)

CCITT Group 3 and Group 4 are for binary image compression:
- Originally designed as facsimile (FAX) coding methods
- G3: nonadaptive, 1-D run-length coding
- G4: a simplified or streamlined version of G3, with 2-D coding only
- The 2-D coding approach is quite similar to the RAC method

Joint Bilevel Imaging Group (JBIG): a joint committee of CCITT and ISO
- JBIG1: an adaptive arithmetic compression technique (the best average and worst-case performance available)
- JBIG2: achieves compressions 2 to 4 times greater than JBIG1

Continuous-tone image compression: JPEG

Standards: JPEG, wavelet-based JPEG 2000, JPEG-LS

JPEG defines three coding systems:
- Lossy baseline coding system
- Extended coding system for greater compression, higher precision, or progressive reconstruction applications
- Lossless independent coding system

A product or system must include support for the baseline system.

Baseline system:
- Input and output: 8 bits; quantized DCT values: 11 bits
- The image is subdivided into blocks of 8x8 pixels for encoding
- Pixels are level-shifted by subtracting 128 graylevels

JPEG (cont.)

- Each 8x8 block is DCT-transformed.
- The coefficients are quantized using the quantization (normalization) matrix; the user is free to construct custom tables and/or arrays adapted to the characteristics of the image being compressed.
- The DC DCT coefficient is difference-coded relative to the DC coefficient of the previous block.
- The nonzero AC coefficients are coded using a VLC that encodes the coefficient's value and the number of preceding zeros.
- A special EOB (end-of-block) code is appended after the last nonzero AC coefficient.

Huffman coding for the DC and AC coefficients

VLC = base code (category code) + value code

Example: encoding DC = -9
- Category 4 covers -15,...,-8 and 8,...,15 (base code = 101); the total code length is 7, so the value code length is 4.
- For category K, K additional bits are needed as the value code, computed as either the K LSBs of the value (positive values) or the K LSBs of the one's complement of its absolute value (negative values). Here K = 4 and the value code = 0110.
- Complete code: 101 0110

Example: encoding AC = (0, -3)
- Length of the zero run: 0; AC coefficient value: -3
- AC = -3 gives category 2, so run/category = 0/2 (Table 8.19), base code = 01
- Value code = 00
- Complete code: 0100
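A sketch of the category and value-code computation described above: the category is the number of bits needed for |amplitude|, and the value code is the K LSBs of the amplitude (positive) or of the one's complement of |amplitude| (negative). The base codes come from the Huffman tables (e.g., Table 8.19) and are not reproduced here.

```python
def category(amplitude):
    """Smallest K with |amplitude| <= 2^K - 1 (e.g., -9 -> 4, -3 -> 2)."""
    return abs(amplitude).bit_length()

def value_code(amplitude):
    """K-bit value code: K LSBs of the value, or of ~|value| if negative."""
    k = category(amplitude)
    v = amplitude if amplitude > 0 else (~abs(amplitude)) & ((1 << k) - 1)
    return format(v, f"0{k}b")

print(category(-9), value_code(-9))   # 4 '0110' -> with base code 101: 1010110
print(category(-3), value_code(-3))   # 2 '00'   -> with base code 01 : 0100
print(category(5), value_code(5))     # 3 '101'
```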

A JPEG coding example

Original 8x8 block:

  52  55  61  66  70  61  64  73
  63  59  66  90 109  85  69  72
  62  59  68 113 144 104  66  73
  63  58  71 122 154 106  70  69
  67  61  68 104 126  88  68  70
  79  65  60  70  77  68  58  75
  85  71  64  59  55  61  65  83
  87  79  69  68  65  76  78  94

Level-shifted (each pixel minus 128):

  -76 -73 -67 -62 -58 -67 -64 -55
  -65 -69 -62 -38 -19 -43 -59 -56
  -66 -69 -60 -15  16 -24 -62 -55
  -65 -70 -57  -6  26 -22 -58 -59
  -61 -67 -60 -24  -2 -40 -60 -58
  -49 -63 -68 -58 -51 -65 -70 -53
  -43 -57 -64 -69 -73 -67 -63 -45
  -41 -49 -59 -60 -63 -52 -50 -34

DCT-transformed: DC coefficient = -415, with the remaining coefficients decaying toward zero at the higher frequencies (full matrix in the original slide).

Quantized (divided by the normalization matrix and rounded): DC coefficient = -26, with only a few nonzero AC coefficients (full matrix in the original slide).

1-D zigzag scanning of the quantized coefficients:

  [-26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 -1 2 0 0 0 0 0 -1 -1 EOB]

Symbols to be coded:
  (DPCM = -9, assumed), (0,-3), (0,1), (0,-3), (0,-2), (0,-6), (0,2), (0,-4), (0,1), (0,-4),
  (0,1), (0,1), (0,5), (1,2), (2,-1), (0,2), (5,-1), (0,-1), EOB

Final codes:
  1010110, 0100, 001, 0100, 0101, 100001, 0110, 100011, 001, 100011, 001, 001, 100101,
  11100110, 110110, 0110, 11110100, 000, 1010

Reconstructed after VLD and requantization (denormalization)

Reconstructed 8x8 block:

  58  64  67  64  59  62  70  78
  56  55  67  89  98  88  74  69
  60  50  70 119 141 116  80  64
  69  51  71 128 149 115  77  68
  74  53  64 105 115  84  65  72
  76  57  56  74  75  57  57  74
  83  69  59  60  61  61  67  78
  93  81  67  62  69  80  84  84

Residual (original minus reconstructed): small values, roughly in the range [-14, 11] (full matrix in the original slide).


JPEG-2000

Increased flexibility in the access of compressed data
- Portions of a JPEG 2000 compressed image can be extracted for re-transmission, storage, display, and editing.

Procedures
- DC level shift by subtracting 128.
- The components are optionally decorrelated using a reversible or non-reversible linear combination (e.g., the RGB to YCrCb component transformation); i.e., both RGB-based and YCrCb-based JPEG-2000 coding are allowed:

  Y_0(x,y) =  0.299   I_0(x,y) + 0.587   I_1(x,y) + 0.114   I_2(x,y)
  Y_1(x,y) = -0.16875 I_0(x,y) - 0.33126 I_1(x,y) + 0.5     I_2(x,y)
  Y_2(x,y) =  0.5     I_0(x,y) - 0.41869 I_1(x,y) - 0.08131 I_2(x,y)

- Y_1 and Y_2 are difference (chrominance) images, highly peaked around zero.

JPEG-2000 (cont.)

- The image is optionally divided into tiles (rectangular arrays of pixels that can be extracted and reconstructed independently), providing a mechanism for accessing a limited region of a coded image.
- A 2-D DWT using a biorthogonal 5-3 coefficient filter (lossless) or a 9-7 coefficient filter (lossy) produces a low-resolution approximation and horizontal, vertical, and diagonal frequency components (the LL, HL, LH, HH bands).

9-7 filter for lossy DWT: [Table: 9-7 analysis filter coefficients]

5-3 filter for lossless DWT:

  n     H0 (low-pass)   H1 (high-pass)
  0         6/8              1
  +/-1      2/8             -1/2
  +/-2     -1/8

JPEG-2000 (cont.)

- A fast DWT algorithm or a lifting-based approach is often used.
- The approximation part is iteratively decomposed to obtain successive scales (this differs from a uniform subband decomposition).
- The standard does not specify the number of scales (i.e., the number of decomposition levels) to be computed.
- Important visual information is concentrated in a few coefficients.


JPEG-2000 (cont.)

Coefficient quantization is adapted to individual scales and subbands:

  q_b(u,v) = \mathrm{sign}[a_b(u,v)] \cdot \mathrm{floor}\!\left[ \frac{|a_b(u,v)|}{\Delta_b} \right]

with quantization step size

  \Delta_b = 2^{R_b - \varepsilon_b} \left( 1 + \frac{\mu_b}{2^{11}} \right)

- R_b is the nominal dynamic range (in bits) of subband b; \varepsilon_b and \mu_b are the numbers of bits allocated to the exponent and mantissa of the subband's coefficients.
- For error-free compression, \mu_b = 0, R_b = \varepsilon_b, and \Delta_b = 1.
- The numbers of exponent and mantissa bits must be provided to the decoder on a subband basis (explicit quantization), or for the LL subband only (implicit quantization; the remaining subbands are then quantized using extrapolated LL subband parameters).

JPEG-2000 (cont.)

Quantized coefficients are arithmetically coded on a bit-plane basis:
- The coefficients are arranged into rectangular blocks called code blocks, which are individually coded one bit plane at a time.
- Coding starts from the most significant bit plane with nonzero elements and proceeds to the least significant bit plane.
- Encoding uses a context-based arithmetic coding method.

The coding outputs from the code blocks are grouped to form layers, and the resulting layers are finally partitioned into packets.

The encoder can encode M_b bit planes for a particular subband while the decoder decodes only N_b of them, thanks to the embedded nature of the code stream.

[Figure: code-block bit-streams (code-blocks 1-7) from the N-1, N, and N+1 decomposition levels organized into quality layers (Layer 1 to Layer 5); the horizontal axis is resolution and the vertical axis is rate-distortion (quality). Some code-block contributions to a layer may be empty.]

Video compression: motion-compensated transform coding

Standards
- Video teleconferencing standards (H.261, H.263, H.263+, H.320, H.264)
- Multimedia standards (MPEG-1/2/4)

- A combination of predictive coding (in the temporal domain) and transform coding (in the spatial domain).
- Estimate (predict) the current image by propagating pixels of the previous frame along their motion trajectories: motion compensation.
- The success depends on the accuracy, speed, and robustness of the displacement estimator: motion estimation.
- Motion estimation is performed on each image block (16x16 or 8x8 pixels).

Motion-compensated transform coding: encoder

[Figure: encoder block diagram]

Motion-compensated transform coding: decoder

Compressed bit stream -> BUF -> VLD -> IT -> reconstructed frame, with an MC predictor driven by the received motion vectors feeding back into the reconstruction.

IT: inverse DCT transform; VLD: variable-length decoding; MC: motion compensation.

2-D motion estimation techniques

Assumptions:
- Rigid translational movements only
- Uniform illumination in time and space
- Causal operation (forward prediction; no future frames are available at the decoder)

Methods:
- Optical flow approach
- Pel-recursive approach
- Block-matching approach

Different from motion estimation in target tracking:
- The goal is to reproduce pictures with minimum distortion, not necessarily to estimate accurate motion parameters.

Block-matching (BM) motion estimation

Find the best match (mv_x, mv_y) between the current image block and candidate blocks in the previous frame by minimizing the cost

  \sum_{(x,y) \in \text{block}} \left| x_n(x,y) - x_{n-1}(x + mv_x,\, y + mv_y) \right|

One motion vector (MV) is obtained for each block between successive frames; the candidates are taken from a search window in frame k-1 around the block position in frame k.
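A sketch of full-search block matching with a SAD cost over a +/- search-range window, following the cost expression above; the frames, block position, and parameter names are illustrative.

```python
import numpy as np

def full_search(cur, prev, bx, by, block=16, search=7):
    """Return the (mvx, mvy) minimizing the SAD between the current block and the previous frame."""
    h, w = prev.shape
    target = cur[by:by + block, bx:bx + block].astype(np.int32)
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + block > w or y + block > h:
                continue                                  # candidate falls outside the frame
            cand = prev[y:y + block, x:x + block].astype(np.int32)
            sad = np.abs(target - cand).sum()             # sum of absolute differences
            if best is None or sad < best:
                best, best_mv = sad, (dx, dy)
    return best_mv, best

# Toy frames: the current frame is the previous frame shifted down 2 rows and right 3 columns
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(prev, shift=(2, 3), axis=(0, 1))
print(full_search(cur, prev, bx=24, by=24))
# -> ((-3, -2), 0): the best match lies 3 pixels left and 2 pixels up in the previous frame
```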

BM motion estimation techniques

Full-search method

Fast search (reduced search)
- Usually a multistage search procedure that evaluates fewer search points
- Locally (or sub-)optimal in general
- Evaluation criteria: number of search points, noise immunity

Existing algorithms:
- Three-step method
- Four-step method
- Log-search method
- Hierarchical matching method
- Prediction search method
- Kalman prediction method

Fast full search

- Strategy: early termination, quitting to the next search point while computing partial errors, as soon as the partial sum exceeds the current minimum error.
- Re-ordering the error computation:
  - Conventional: raster-scanning order
  - Improvement: row or column sorting by gradient magnitudes in decreasing order
- Gives roughly a 3x to 4x speedup with respect to the traditional full search.

[Numerical example: a 4x4 template matched against a search area; rows are processed in decreasing gradient-magnitude order and the accumulation stops once the partial error exceeds the current minimum error, assumed to be 100.]

Three-step search

The full search examines 15x15 = 225 positions, while the three-step search examines 9 + 8 + 8 = 25 positions (assuming the MV range is -7 to +7).

Predictive search

- Exploits the similarity of MVs between adjacent macroblocks.
- Obtain an initial predicted MV from the neighboring macroblocks by averaging: MV_predict = (MV1 + MV2 + MV3) / 3.
- Prediction is followed by a small-area detailed search.
- Significantly reduces the CPU time, by roughly 4x to 10x.

Fractional-pixel motion estimation

- Question: what happens if the motion vector (mv_x, mv_y) has real-valued rather than integer components?
- Answer: find the integer motion vector first, and then interpolate from the surrounding pixels for a refined search. The most popular case is half-pixel accuracy (i.e., 1 out of 8 surrounding half-pixel positions for refinement). 1/3-pixel accuracy is proposed for the H.263L standard.
- Half-pixel accuracy often leads to a match with a smaller residue (higher reconstruction quality), and hence decreases the bit rate required in the subsequent encoding.

Legend: integer MV position vs. candidate half-pixel MV positions.

Evaluation of BM motion estimation

Advantages:
- Straightforward, regular operation, promising for VLSI chip design
- Robust (no differential operations as in the optical-flow approach)

Disadvantages:
- Cannot handle several moving objects within one block (e.g., around occlusion boundaries)

Improvements:
- Post-processing of the images to eliminate the blocking effect
- Interpolation of the images, or averaging of the motion vectors, to reach subpixel accuracy

Image prediction structure

I/P frames only (H.261, H.263):
- An intra-frame (I) followed by predictive frames (P) using forward prediction: I -> P -> P -> ...

I/P/B frames (MPEG-x):
- Forward prediction for P frames, plus bi-directional frames (B) that use bidirectional prediction from the surrounding I/P frames.
