1. INTRODUCTION
Communication using automatic speech-to-text conversion has gained a lot of attention lately, especially on mobile devices. However, handwritten character recognition remains a vital research area, mainly because of its application to human-machine and machine-machine communication. Handwritten character recognition remains a challenge even though great improvements have been achieved using digital pens and touch screens.
Another area of character recognition involves virtual scenes. In such scenes, the characters are written in the air by hand and captured using a cheap USB camera placed in front of the subject. Such characters are termed Air Characters in this work. Fortunately, many useful technologies for automatic detection and recognition have already been proposed for recognizing characters. Furthermore, recognition of air characters will open new areas in human-machine interfaces, especially in replacing TV remote control devices and enabling non-verbal communication. Three steps are necessary in such systems: size- and orientation-invariant segmentation of the characters, normalization of other factors such as brightness, contrast and illumination, and recognition of the characters themselves. Today there are many OCR devices in use, based on a plethora of different algorithms [1].
Examples include a wavelet-transform-based method for extracting license plates from cluttered images, achieving 92.4% accuracy [2], and a morphology-based method for detecting license plates from cluttered images with a detection accuracy of 98% [3]. The Hough transform combined with other preprocessing methods is used in [4], [5]. In [6], an efficient object detection method is proposed. More recently, license plate recognition from low-quality videos using morphological operations and the AdaBoost algorithm was proposed in [7]. It uses the Haar-like features proposed in [8] for face detection.
Fig. 2 Basic histogram showing the 8 features represented by their normalized lengths.
The final feature data consist of the character identifier, the number of features and the eight feature values. By dividing these data by the longest line segment, data normalization can be achieved; this normalization deals with differences in character size. Each character can then be represented by a ten-value feature vector that can be visualized as a normal histogram (Fig. 2).
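As a minimal sketch of this vector construction (the function and variable names below are illustrative, not taken from the paper), the normalization step could look like:

```python
import numpy as np

def build_feature_vector(label, hit_distances):
    """Assemble the ten-value vector: character identifier, feature
    count, and the eight distances normalized by the longest one."""
    hits = np.asarray(hit_distances, dtype=float)   # eight raw line lengths
    longest = hits.max()
    normalized = hits / longest if longest > 0 else hits
    return np.concatenate(([float(label), len(hits)], normalized))

# Example: digit '3' with eight raw first-hit distances (in pixels)
vec = build_feature_vector(3, [12, 7, 15, 9, 11, 6, 14, 10])
print(vec)   # [3., 8., 0.8, 0.47, 1.0, ...]
```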
2. PRE-PROCESSING
The within-class variance ($\sigma_W^2$) is calculated as:

$\sigma_W^2 = \frac{\omega_1 \sigma_1^2 + \omega_2 \sigma_2^2}{\omega_1 + \omega_2}$    (1)
Fig. 6 Thinning results using the Zhang-Suen Thinning
Algorithm.
The between-class variance ($\sigma_B^2$) can be calculated using:

$\sigma_B^2 = \frac{\omega_1 \omega_2 (M_1 - M_2)^2}{(\omega_1 + \omega_2)^2}$    (2)
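The paper does not list code for this step, but the threshold search implied by the discriminant method can be sketched as follows; the histogram-based loop and the names used are our own illustration of choosing the threshold that maximizes Eq. (2):

```python
import numpy as np

def discriminant_threshold(gray):
    """Pick the global threshold that maximizes the between-class
    variance of Eq. (2) over a 256-bin grey-level histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 255):
        w1, w2 = hist[:t].sum(), hist[t:].sum()      # class pixel counts
        if w1 == 0 or w2 == 0:
            continue
        m1 = (levels[:t] * hist[:t]).sum() / w1      # class means M1, M2
        m2 = (levels[t:] * hist[t:]).sum() / w2
        var_b = w1 * w2 * (m1 - m2) ** 2 / (w1 + w2) ** 2   # Eq. (2)
        if var_b > best_var:
            best_t, best_var = t, var_b
    return best_t
```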
However, the thinning results were not perfect for all characters. Therefore, a pruning algorithm is necessary to remove the remaining noise.
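The exact pruning rule is not specified in the text; one common approach, shown here only as an assumed sketch, is to repeatedly delete skeleton end-points so that short spurs left by thinning disappear:

```python
import numpy as np

def prune_skeleton(skel, iterations=3):
    """Iteratively remove end-points (pixels with at most one
    8-connected neighbour) to erase short spurs after thinning."""
    skel = skel.astype(bool).copy()
    for _ in range(iterations):
        padded = np.pad(skel, 1, constant_values=False)
        # count the 8-neighbours of every pixel
        neighbours = sum(
            padded[1 + dy:padded.shape[0] - 1 + dy,
                   1 + dx:padded.shape[1] - 1 + dx].astype(int)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0)
        )
        endpoints = skel & (neighbours <= 1)
        if not endpoints.any():
            break
        skel[endpoints] = False
    return skel
```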
The total variance is the sum of the within-class and between-class variances:

$\sigma^2 = \sigma_W^2 + \sigma_B^2$    (3)

3. NEURAL NETWORKS
4. EXPERIMENTS
4.1 Database
The MNIST database [15] of handwritten digits, used
in this work, has a training set of 60,000 examples, and
a test set of 10,000 examples. The digits have been
size-normalized and centered in a fixed-size image. The
MNIST database was constructed from NIST's Special Database 3 and Special Database 1, which contain binary images of handwritten digits. NIST originally designated SD-3 as their training set and SD-1 as their test set. SD-1 contains 58,527 digit images written by 500 different
writers. The original black and white (bilevel) images
from NIST were size normalized to fit in a 20x20 pixel
box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing
technique used by the normalization algorithm.
The neural network structure developed for this work is a four-layer network. The first layer has 8, 16 or 24 nodes, representing the normal, two-layered and three-layered cases respectively. The two hidden layers were each fixed at 100 nodes after several trials. The output layer consists of 10 nodes, one for each of the 10 handwritten numerals.
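A minimal sketch of this structure, written here with Keras purely for illustration (the paper does not state the framework, activation functions, or optimizer that were used), might look like:

```python
import tensorflow as tf

def build_star_mlp(n_features=8):
    """Four-layer net: input of 8, 16 or 24 star features, two hidden
    layers of 100 nodes, and a 10-node output for the digits 0-9."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(100, activation="sigmoid"),
        tf.keras.layers.Dense(100, activation="sigmoid"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

model = build_star_mlp(16)   # the two-layered star model
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```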
From the database, 5,000 samples per character are selected at random for learning. These 5,000 characters are further subdivided into 5 groups of 1,000 characters each to enable 5-fold cross-validation. Moreover, the neural network was trained for two cases: feeding the training samples in order or at random.
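The split described above could be realized as in the following sketch; the per-class count of 5,000 and the group size of 1,000 come from the text, while the shuffling details are our assumption:

```python
import numpy as np

def five_fold_indices(n_per_class=5000, n_folds=5, shuffle=True, seed=0):
    """Split the 5,000 samples of one digit class into 5 groups of
    1,000 for cross-validation; optionally shuffle the order first."""
    idx = np.arange(n_per_class)
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    folds = np.array_split(idx, n_folds)
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, val

for train_idx, val_idx in five_fold_indices():
    pass   # train on 4,000 samples, validate on the remaining 1,000
```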
The final testing set consists of 10,010 characters. The final results, given as percentages, are calculated from the number of correctly recognized characters in the test set.
4.2 Normal
The results in this section represent the case where the basic star model consisting of 8 features was used. This means that, from the character's center of gravity, the first hit along each line is taken as the feature for that line. Therefore, the number of features per character used for training is 8. The results are shown in Table 1.
Table 1 Normal star model results

Training order    Training (%)    Validate (%)    Testing (%)
Normal            97.0            92.2            96.2
Random            96.7            92.8            96.9

Multi-layer neural networks are applied in about 18 of the reported cases; a 6-layer network is also used. The reported results show error rates between 0.35% and 4.7% for the 6-layered and 2-layered networks respectively. The proposed system produces an error rate of 2.7% using only the star-layered features and a 3-layered neural network. This is within the range (1.53 to 3.05%) of similarly structured neural nets.
Table 2 Two-layered star model results

Training order    Training (%)    Validate (%)    Testing (%)
Normal            98.0            93.7            96.6
Random            98.8            94.1            97.3
Table 3 Three-layered star model results

Training order    Training (%)    Validate (%)    Testing (%)
Normal            95.0            91.2            93.4
Random            96.1            92.5            93.5
4.6 Discussions
There are many errors carried over from the pre-processing steps in this work that affect the final accuracy. These include binarization and thinning. A universal threshold determined by the discriminant method is used; this caused some breaks in the numerals. We must consider using a variable threshold method in the future to solve this problem, where in each window the threshold is still determined using the discriminant method. The thinning algorithm used requires pruning, and the thinned image directly determines the values of the features. Therefore, a more accurate method should be applied in the future.
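As an illustrative sketch of that variable-threshold idea (the window size and the use of scikit-image's threshold_otsu are our assumptions, not the paper's implementation):

```python
import numpy as np
from skimage.filters import threshold_otsu

def variable_threshold(gray, window=32):
    """Binarize with a per-window discriminant (Otsu) threshold instead
    of a single global one, to reduce breaks in the numerals."""
    out = np.zeros(gray.shape, dtype=bool)
    h, w = gray.shape
    for y in range(0, h, window):
        for x in range(0, w, window):
            block = gray[y:y + window, x:x + window]
            # fall back gracefully on flat blocks with a single grey level
            t = threshold_otsu(block) if block.max() > block.min() else block.max()
            out[y:y + window, x:x + window] = block > t
    return out
```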
Re-sampling the image enables the capture of features displaced due to alignment. Note that we are dealing with a binary image and extracting features along straight lines. Therefore, it is possible for some pixels to fall off the line at a given sampling rate. The two-layered model offered the best result because most of the features were accurately captured. It turns out that re-sampling 3 times deletes some of these useful features. We must find the optimum sampling rate in the future.
5. CONCLUSION
After the character region is extracted, its contour is used to determine the center of gravity (COG), which serves as the origin for a histogram built from equally spaced lines extending from it. The first point at which a line touches the character represents the first layer of the histogram. If the line extension has not reached the region boundary, the next hit represents the second layer of the histogram. This process is repeated until the line touches the boundary of the character's region. After normalization, these features are used to train a neural network to evaluate their effectiveness in numeral classification. This method achieves an accuracy of about 97.1% on the MNIST database of handwritten digits.
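The ray-casting procedure summarized above might be sketched as follows; the stepping scheme, the layer limit and the function names are our own simplification rather than the paper's implementation:

```python
import numpy as np

def star_layered_features(binary, cog, n_rays=8, max_layers=3):
    """Cast equally spaced rays from the centre of gravity and record
    the distance of each foreground hit (layer 1, 2, ...) per ray."""
    h, w = binary.shape
    cy, cx = cog
    feats = np.zeros((n_rays, max_layers))
    for i, theta in enumerate(np.linspace(0, 2 * np.pi, n_rays, endpoint=False)):
        dy, dx = np.sin(theta), np.cos(theta)
        layer, inside_stroke = 0, False
        for r in range(1, max(h, w)):
            y, x = int(round(cy + r * dy)), int(round(cx + r * dx))
            if not (0 <= y < h and 0 <= x < w):
                break                      # reached the region boundary
            if binary[y, x] and not inside_stroke:
                feats[i, layer] = r        # first pixel of a new hit
                layer += 1
                inside_stroke = True
                if layer == max_layers:
                    break
            elif not binary[y, x]:
                inside_stroke = False
    return feats / feats.max() if feats.max() > 0 else feats
```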
In the future, we must analyze the method to determine the most effective way of representing the features, because most of them are zero, especially in the 24-feature vector. Moreover, other types of features, including bifurcation points, area, edge gradient, etc., must be considered to improve the recognition accuracy. Other classification methods and databases also need to be considered.
REFERENCES
[1] Eric W. Brown, Character Recognition by Feature Point Extraction, http://www.ccs.neu.edu/home/feneric/charrec.html, 2010.
[2] Ching-Tang Hsieh, Yu-Shan Juan and Kuo-Ming
Hung, Multiple License Plate Detection for Complex
Background, Advanced Information Networking and
Applications, pp. 389-392, 2005.
[3] Jun-Wei Hsieh, Shih-Hao Yu, Yung-Sheng Chen,
Morphology-Based License Plate Detection from
Complex Scenes, Proc. of International Conference
on Pattern Recognition, pp. 176-179, 2002.
[4] Yanamura Y., Goto M., Nishiyama D., Soga M., Nakatani H. and Saji H., Extraction and Tracking of the License Plate Using Hough Transform and Voted Block Matching, Proc. of IEEE Intelligent Vehicles Symposium, pp. 243-246, 2003.
[5] Kamat V. and Ganesan S., An Efficient Implementation of the Hough Transform for Detecting Vehicle License Plates Using DSPs, Proc. of Real-Time Technology and Applications Symposium, pp. 58-59, 1995.
[6] Viola P. and Jones M., Rapid Object Detection Using a Boosted Cascade of Simple Features, Proc. of Computer Vision and Pattern Recognition, vol. 1, pp. 511-518, 2001.
[7] Chih-Chiang Chen and Jun-Wei Hsieh, License Plate
Recognition from Low-Quality Videos, Proc. of the
IAPR Conference on Machine Vision Applications,
pp. 122-125, 2007.
[8] P. Viola and M. J. Jones, Robust real-time face detection, International Journal of Computer Vision, vol.
57, no. 2, pp. 137-154, 2004.
[9] Y. Abe, M. Konishi and J. Imai, Neural network
based diagnosis system for looper height controller
of hot strip mills, International Journal Innovative