
PROJECT REPORT

Optical Character Recognition using Artificial Neural Network

Department of Computer Science & Engineering

24 September 2012

BY

Contents

1. Introduction
2. Pre-processing
3. Segmentation
4. Feature Extraction
5. Classification
6. Application
7. Conclusion

1. Introduction
The goal of our project is to create an application interface for Optical Character Recognition (OCR) that uses an Artificial Neural Network (ANN) as the backend to solve the classification problem. The input to the OCR problem is pages of scanned text. To perform character recognition, our application has to go through three important steps. The first is segmentation: given a binary input image, identify the individual glyphs (basic units representing one or more characters, usually contiguous). The second is feature extraction: compute from each glyph a vector of numbers that will serve as input features for the ANN. This step is the most difficult, in the sense that there is no obvious way to obtain these features. The final task is classification, which in our approach has two parts. In the training phase, we manually identify the correct class of several glyphs; the features extracted from these serve as the data to train the neural network. Once the network is trained, new glyphs can be classified by extracting their features and using the trained network to predict their class.
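
As a rough illustration, the overall flow of such an application might look like the MATLAB sketch below. The function names preprocess, segment and extract_features are hypothetical placeholders for the components described in the following sections, not actual project code.

    page   = preprocess(imread('scan.png'));    % hypothetical binarization step
    glyphs = segment(page);                     % cell array of glyph matrices
    feats  = cellfun(@extract_features, glyphs, 'UniformOutput', false);
    X      = [feats{:}];                        % one feature vector per column
    scores = net(X);                            % "net" is a previously trained ANN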

Software Choices
We intend to implement this using MATLAB, a numerical and graphical programming environment. MATLAB can be extended with toolboxes, which implement various algorithms commonly used in science and engineering. Of particular interest to us is the Neural Network Toolbox, one of the most comprehensive neural network packages currently available, which we will use to create and run neural networks. Further modifications, if necessary, will be done in the C programming language, which can be integrated with MATLAB (e.g., through MEX files).

2. Pre-processing
Image digitization
When a document is submitted for visual recognition, it is expected to consist of printed (or handwritten) characters pertaining to one or more scripts or fonts. The document, however, may contain information besides optical characters alone. For example, it may contain pictures and colours that provide no useful information for character recognition itself. In addition, characters that need to be analysed individually may exist as word clusters or may be located at various points in the document. Such an image is usually processed for noise reduction and separation of individual characters from the document. For ease of comprehension, we assume that the submitted image has been freed from noise and that individual characters have already been located (using, for example, a suitable clustering algorithm). This situation is equivalent to one in which a single noise-free character has been submitted to the system for recognition.

The process of digitization is important for the neural network used in the system. In this process, the input image is sampled into a binary window which forms the input to the recognition system. In the figure above, the letter A has been digitized into 6 × 8 = 48 digital cells, each having a single colour, either black or white. It becomes important to encode this information in a form meaningful to a computer. For this, we assign the value +1 to each black pixel and 0 to each white pixel, creating the binary image matrix I shown in Figure (c). This much conversion is sufficient for the neural network processing described next. Digitization of an image into a binary matrix of specified dimensions makes the input invariant to the actual dimensions of the image. Hence an image of any size is transformed into a binary matrix of fixed, pre-determined dimensions. This establishes uniformity in the dimensions of the input and stored patterns as they move through the recognition system.
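
A minimal MATLAB sketch of this digitization step, assuming the Image Processing Toolbox; the file name is hypothetical and the 8-row by 6-column grid is just the example size from the figure:

    img = imread('glyph.png');                 % hypothetical input image
    if size(img, 3) == 3, img = rgb2gray(img); end
    bw  = im2bw(img, graythresh(img));         % Otsu threshold: 1 = white
    I   = ~bw;                                 % +1 for black ink, 0 for white
    I   = imresize(double(I), [8 6]) > 0.5;    % fixed 8 x 6 binary matrix
    x   = double(I(:));                        % 48-element input vector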

3. Segmentation
The most basic step in OCR is to segment the input image into individual glyphs. In our approach, this is needed in two different phases, with slightly different requirements. The first is during the training stage, where segmented glyphs are presented to the human supervisor for manual classification. The other is after the network is trained and we want to recognize a new image. In this case, we need to identify each glyph in the correct sequence before extracting features from it and classifying it.

The current implementation is not very sophisticated, and sometimes fails when there are very short lines at the end of a paragraph. However, the calculation that identifies line gaps given the mean row intensities is implemented as a separate function and can easily be improved later without affecting the rest of the procedure. A similar procedure can be used to split lines into words. The segmentation into lines is a useful pre-processing step. The actual segmentation code accepts a matrix with entries 0 and 1 and returns a matrix of the same dimensions with entries 0, 1, 2, 3, ..., N, where N − 1 is the number of identified segments. The elements of the matrix marked i, for i = 2, ..., N, correspond to the ith segment. This part is computationally intensive and is implemented internally in C code called from within R. Subsequently, another small R function extracts the individual segments as binary matrices. As mentioned above, one important use of segmentation is for training the classifier: in the training stage, we need to manually identify the correct class of several glyphs.
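
The actual implementation is C called from R, but the same ideas can be sketched in MATLAB: line gaps are rows whose mean intensity is near zero, and bwlabel from the Image Processing Toolbox plays the role of the segment-labelling code. The 0.01 gap threshold is an arbitrary illustrative value.

    rowInk  = mean(page, 2);                   % mean row intensities
    isGap   = rowInk < 0.01;                   % hypothetical gap threshold
    edges   = diff([1; isGap; 1]);             % -1 where a line starts, +1 past its end
    lineTop = find(edges == -1);
    lineBot = find(edges == +1) - 1;
    for k = 1:numel(lineTop)
        lineImg = page(lineTop(k):lineBot(k), :);
        [labels, n] = bwlabel(lineImg);        % label connected glyph segments
        for g = 1:n
            glyph = (labels == g);             % binary matrix for one glyph
            % ... extract features and classify this glyph ...
        end
    end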

4. Feature Extraction
The glyphs identified by segmentation are binary matrices and, as such, are not suitable for direct use in a neural network. We therefore have to extract features from each glyph that we can subsequently use for classification. This is the most important design decision in the procedure, since without a good feature set we cannot expect good results. There is no single obvious choice of features. We decided to base our features on identifiable regular parabolic curves in the image. A brief description of the feature extraction steps follows (a sketch of steps 1, 2 and 5 appears after this list):

1. The first step is to convert the binary glyph matrix to a set of points roughly corresponding to the boundary of the image. This is defined as the collection of background pixels (0) in the image which have at least one neighbour in the foreground (1). See Figure 2 for an example.

2. The next step is to loop through each of these boundary points and find the `best' parabola passing through that point, fitting the boundary locally. For each point, this involves the following: decide the `orientation' of the boundary at that point by fitting a straight line through points in a small neighbourhood of the point; rotate the image by an angle that makes this line horizontal, and fit a quadratic regression line to the previously identified neighbouring points; determine points in the boundary that are close to this fitted quadratic curve (using a predetermined threshold); update the quadratic curve by refitting it using all the points thus identified; and repeat this update using the `close' points two more times. The hope is that the curve thus identified closely approximates the curvature of the boundary at that point. Note that it is perfectly all right if this does not work as expected for all points. We are interested only in the `strongest' curves, that is, those that fit the largest proportion of boundary points. Many points lie on such curves, and it is likely that at least some of those points will identify the curve correctly.

3. After this, we order the points by the `strength' of the best fitting curve for each point (measured as the number of other boundary points `close' to the final curve). For the best fitting curve, we record the angle of rotation and the quadratic coefficient of the curve as features (the linear coefficient is close to 0 because of the rotation).

4. One curve alone may not be enough to identify a glyph, so we try to identify the second and third best curves as well. Since we do not want the same curve to be identified again, we leave out all points identified as being `close' to the first curve, and re-evaluate the `strengths' of the remaining points based on the remaining points. Thus we finally have three pairs of angles and quadratic coefficients as features.

5. Finally, we add one more feature, namely the aspect ratio (width / height) of the glyph. Thus we have a feature vector of length 7. Note that all these measurements are theoretically independent of scale (although the actual resolution of the image may make a difference).
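
A minimal MATLAB sketch of steps 1, 2 and 5 follows. This is not the project's actual code: the neighbourhood radius (5) and closeness threshold (1) are arbitrary illustrative values, and step 2 is shown for a single point without the repeated refitting described above.

    % Step 1: boundary points are background pixels (0) that have at least
    % one 4-connected foreground (1) neighbour ("glyph" is a binary matrix)
    pad = padarray(glyph, [1 1], 0);
    nbr = pad(1:end-2, 2:end-1) | pad(3:end, 2:end-1) ...
        | pad(2:end-1, 1:end-2) | pad(2:end-1, 3:end);
    [r, c]   = find(~glyph & nbr);
    boundary = [c, r];                           % (x, y) coordinates

    % Step 2, shown for one boundary point; in practice, loop over all points
    pt    = boundary(1, :);
    d     = sqrt(sum(bsxfun(@minus, boundary, pt).^2, 2));
    nb    = boundary(d < 5, :);                  % local neighbourhood
    coef  = polyfit(nb(:, 1), nb(:, 2), 1);      % orientation line
    theta = atan(coef(1));                       % angle of rotation
    R     = [cos(theta) sin(theta); -sin(theta) cos(theta)];
    rot   = bsxfun(@minus, boundary, pt) * R';   % rotated boundary points
    q     = polyfit(rot(d < 5, 1), rot(d < 5, 2), 2);        % quadratic fit
    isClose  = abs(polyval(q, rot(:, 1)) - rot(:, 2)) < 1;   % points near curve
    strength = sum(isClose);                     % curve `strength' (step 3)
    % recorded features per curve: theta and the quadratic coefficient q(1)

    % Step 5: aspect ratio (width / height) of the glyph
    [rows, cols] = find(glyph);
    aspect = (max(cols) - min(cols) + 1) / (max(rows) - min(rows) + 1);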

5. Classification
An artificial neural network (ANN), usually called simply a neural network (NN), is a mathematical or computational model that tries to simulate the structure and/or functional aspects of biological neural networks. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase. In more practical terms, neural networks are non-linear statistical data modelling tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data. Neural networks have seen an explosion of interest over the last few years, and are being successfully applied across an extraordinary range of problem domains, in areas as diverse as finance, medicine, engineering, geology and physics. Indeed, anywhere there are problems of prediction, classification or control, neural networks are being introduced.

This sweeping success can be attributed to a few key factors:

Power: Neural networks are very sophisticated modelling techniques capable of modelling extremely complex functions. In particular, neural networks are nonlinear.

Ease of use: Neural networks learn by example. The neural network user gathers representative data, and then invokes training algorithms to automatically learn the structure of the data. Although the user does need some heuristic knowledge of how to select and prepare data, how to select an appropriate neural network, and how to interpret the results, the level of user knowledge needed to successfully apply neural networks is much lower than would be needed with (for example) more traditional nonlinear statistical methods.

To capture the essence of biological neural systems, an artificial neuron is defined as follows:

It receives a number of inputs (either from original data, or from the output of other neurons in the neural network). Each input comes via a connection that has a strength (or weight); these weights correspond to synaptic efficacy in a biological neuron. Each neuron also has a single threshold value. The weighted sum of the inputs is formed, and the threshold subtracted, to compose the activation of the neuron (also known as the post-synaptic potential, or PSP, of the neuron). The activation signal is passed through an activation function (also known as a transfer function) to produce the output of the neuron.
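
In MATLAB terms, a single such neuron amounts to a few lines. The input, weight and threshold values here are arbitrary examples, and the logistic function is one common choice of transfer function:

    x      = [0.2; 0.7; 0.1];          % inputs (from data or other neurons)
    w      = [0.5; -0.3; 0.8];         % connection weights (synaptic efficacies)
    theta  = 0.1;                      % threshold
    a      = w' * x - theta;           % activation (post-synaptic potential)
    output = 1 / (1 + exp(-a));        % logistic transfer function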

The Multi-Layer Perceptron (MLP) neural network is perhaps the most popular network architecture in use today. Each unit performs a biased weighted sum of its inputs and passes this activation level through an activation function to produce its output, and the units are arranged in a layered feedforward topology. The network thus has a simple interpretation as a form of input-output model, with the weights and thresholds (biases) as the free parameters of the model. Such networks can model functions of almost arbitrary complexity, with the number of layers, and the number of units in each layer, determining the function complexity.
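
With the Neural Network Toolbox mentioned earlier, such an MLP classifier might be set up as follows. This is a sketch: the hidden layer size of 10 is illustrative, and features/targets stand for the 7 x n feature matrix of Section 4 and a corresponding 0/1 class-indicator matrix.

    net = patternnet(10);                  % MLP with one 10-unit hidden layer
    net = train(net, features, targets);   % supervised training on known glyphs
    scores = net(newFeatures);             % class scores for unseen glyphs
    [~, class] = max(scores, [], 1);       % predicted class index per glyph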

Backpropagation Learning Algorithm

The backpropagation algorithm trains a given feed-forward multilayer neural network on a given set of input patterns with known classifications. When each entry of the sample set is presented to the network, the network examines its output response to the sample input pattern. The output response is then compared to the known, desired output, and the error value is calculated. Based on this error, the connection weights are adjusted. The backpropagation algorithm is based on the Widrow-Hoff delta learning rule, in which the weight adjustment is done through the mean square error of the output response to the sample input. The set of sample patterns is repeatedly presented to the network until the error value is minimized.

The figure below illustrates a backpropagation multilayer network with $M$ layers, where $N_j$ represents the number of neurons in the $j$th layer. The network is presented with the $p$th pattern of the training sample set, which has the $N_0$-dimensional input $x_p$ and the $N_M$-dimensional known output response $t_p$; the actual response of the network to this input pattern is denoted $o_p$. Let $y_{ji}$ be the output from the $i$th neuron in layer $j$ for the $p$th pattern, $w_{jik}$ be the connection weight from the $k$th neuron in layer $j-1$ to the $i$th neuron in layer $j$, and $\delta_{ji}$ be the error value associated with the $i$th neuron in layer $j$.

Figure: Backpropagation Neural Network

The following steps are repeated until the error is suitably small:

Step 1: Input the training vector.
Step 2: Hidden nodes calculate their outputs.
Step 3: Output nodes calculate their outputs on the basis of Step 2.
Step 4: Calculate the differences between the results of Step 3 and the targets.
Step 5: Apply the first part of the training rule using the results of Step 4.
Step 6: For each hidden node $n$, calculate $\delta(n)$.
Step 7: Apply the second part of the training rule using the results of Step 6.

Steps 1 through 3 are often called the forward pass, and steps 4 through 7 are often called the backward pass; hence the name back-propagation.
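
A minimal sketch of one such training epoch for a single-hidden-layer network with logistic activations follows. The function name, variable names and the learning rate eta are illustrative, not taken from the report.

    function [W1, b1, W2, b2] = backprop_epoch(X, T, W1, b1, W2, b2, eta)
    % One pass over all patterns: X is nIn x nPatterns, T is nOut x nPatterns;
    % W1/b1 and W2/b2 are the hidden- and output-layer weights and biases.
    sigm = @(z) 1 ./ (1 + exp(-z));
    for p = 1:size(X, 2)
        x = X(:, p);  t = T(:, p);
        h = sigm(W1 * x + b1);                % Steps 1-2: hidden outputs
        o = sigm(W2 * h + b2);                % Step 3: network outputs
        dOut = (t - o) .* o .* (1 - o);       % Step 4: output error terms
        dHid = (W2' * dOut) .* h .* (1 - h);  % Step 6: hidden error terms
        W2 = W2 + eta * dOut * h';   b2 = b2 + eta * dOut;   % Step 5
        W1 = W1 + eta * dHid * x';   b1 = b1 + eta * dHid;   % Step 7
    end
    end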

6. Application
Depending on the time left for the project, the OCR system thus implemented will be applied in one of the following fields:

1. Currency scanning
2. Real-time Sudoku solving
3. Text detection in streaming video for car number plate identification
4. Font detection
5. Product expiry date scanning
6. Video-based handwritten character recognition

7. Conclusion
Our preliminary research in noisy character recognition clearly indicates that neural networks can be a highly effective tool for implementing a character recognition system.
