
Case Study on Machine Vision 2011

CHAPTER 1
INTRODUCTION
1.1 What is machine vision?

Machine vision (MV) is a branch of engineering that uses computer vision in the
context of manufacturing. While the scope of MV is broad and a comprehensive
definition is difficult to distil, a generally accepted definition of machine vision is "the
analysis of images to extract data for controlling a process or activity." Put another
way, MV processes are targeted at recognizing the actual objects in an image and
assigning properties to those objects, that is, understanding what they mean.

The first step in the MV process is acquisition of an image, typically using cameras,
lenses, and lighting that has been designed to provide the differentiation required by
subsequent processing. MV software packages then employ various digital image
processing techniques to allow the hardware to recognize what it is looking at.

Techniques used in MV include thresholding (converting an image with gray tones to
black and white), segmentation, blob extraction, pattern recognition, barcode reading,
optical character recognition, gauging (measuring object dimensions), edge detection,
and template matching (finding, matching, and/or counting specific patterns).

Fig 1.1 Machine Vision Camera Used in Robots


CHAPTER 2
THRESHOLDING
2.1 What is thresholding?

Thresholding is the simplest method of image segmentation. From a grayscale image,
thresholding can be used to create binary images.

2.2 Method

During the thresholding process, individual pixels in an image are marked as “object”
pixels if their value is greater than some threshold value (assuming an object to be
brighter than the background) and as “background” pixels otherwise. This convention
is known as threshold above. Variants include threshold below, which is opposite of
threshold above; threshold inside, where a pixel is labeled "object" if its value is
between two thresholds; and threshold outside, which is the opposite of threshold
inside. Typically, an object pixel is given a value of “1” while a background pixel is
given a value of “0.” Finally, a binary image is created by coloring each pixel white or
black, depending on that pixel's label.

2.3 Threshold selection

The key parameter in the thresholding process is the choice of the threshold value (or
values, as mentioned earlier). Several different methods for choosing a threshold
exist; users can manually choose a threshold value, or a thresholding algorithm can
compute a value automatically, which is known as automatic thresholding. A simple
method would be to choose the mean or median value, the rationale being that if the
object pixels are brighter than the background, they should also be brighter than the
average. In a noiseless image with uniform background and object values, the mean or
median will work well as the threshold; however, this will generally not be the case. A
more sophisticated approach might be to create a histogram of the image pixel
intensities and use the valley point as the threshold. The histogram approach assumes
that there is some average value for the background and object pixels, but that the
actual pixel values have some variation around these average values. However, this
may be computationally expensive, and image histograms may not have clearly
defined valley points, often making the selection of an accurate threshold difficult.
One method that is relatively simple, does not require much specific knowledge of the
image, and is robust against image noise, is the following iterative method:

1. An initial threshold (T) is chosen; this can be done randomly or according to
any other method desired.
2. The image is segmented into object and background pixels as described above,
creating two sets:
1. G1 = {f(m,n) : f(m,n) > T} (object pixels)
2. G2 = {f(m,n) : f(m,n) ≤ T} (background pixels) (note: f(m,n) is the value
of the pixel located in the mth column, nth row)
3. The average of each set is computed.
1. m1 = average value of G1


2. m2 = average value of G2
4. A new threshold is created that is the average of m1 and m2
1. T’ = (m1 + m2)/2
5. Go back to step two, now using the new threshold computed in step four; keep
repeating until the new threshold matches the one before it (i.e. until
convergence has been reached).

This iterative algorithm is a special one-dimensional case of the k-means clustering
algorithm, which has been proven to converge at a local minimum, meaning that a
different initial threshold may give a different final result.
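As an illustration, the iterative procedure above can be written in a few lines of NumPy. This is a minimal sketch, assuming an 8-bit grayscale image already loaded as a 2-D array; the function name and the convergence tolerance are illustrative choices, not part of any particular library.

```python
import numpy as np

def iterative_threshold(image, initial_t=128.0, tol=0.5):
    """Iteratively refine a global threshold until it converges (Section 2.3)."""
    t = float(initial_t)
    while True:
        object_pixels = image[image > t]        # G1: pixels brighter than T
        background_pixels = image[image <= t]   # G2: the remaining pixels
        # Guard against an empty set if T falls outside the image's value range.
        m1 = object_pixels.mean() if object_pixels.size else t
        m2 = background_pixels.mean() if background_pixels.size else t
        new_t = (m1 + m2) / 2.0                 # T' = (m1 + m2) / 2
        if abs(new_t - t) < tol:                # stop once the threshold settles
            return new_t
        t = new_t

# Usage: binary = (image > iterative_threshold(image)).astype(np.uint8)
```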

2.4 Adaptive thresholding

Thresholding is called adaptive thresholding when a different threshold is used for
different regions in the image. This may also be known as local or dynamic
thresholding.
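One common local rule, sketched below, marks a pixel as object when it exceeds the mean of its own neighbourhood minus a small offset. This is just one of many possible adaptive schemes, and the window size and offset used here are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_threshold(image, window=15, offset=5):
    """Mark a pixel as object when it exceeds its local mean minus an offset."""
    local_mean = uniform_filter(image.astype(float), size=window)  # per-pixel neighbourhood mean
    return (image > local_mean - offset).astype(np.uint8)
```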

2.5 Categorizing thresholding methods

Sezgin and Sankur (2004) categorize thresholding methods into the following six
groups based on the information the algorithm manipulates:

1. "histogram shape-based methods, where, for example, the peaks, valleys and
curvatures of the smoothed histogram are analyzed
2. clustering-based methods, where the gray-level samples are clustered in two
parts as background and foreground (object), or alternately are modeled as a
mixture of two Gaussians
3. Entropy-based methods result in algorithms that use the entropy of the
foreground and background regions, the cross-entropy between the original
and binarized image, etc.
4. Object attribute-based methods search a measure of similarity between the
gray-level and the binarized images, such as fuzzy shape similarity, edge
coincidence, etc.
5. spatial methods [that] use higher-order probability distribution and/or
correlation between pixels
6. Local methods adapt the threshold value on each pixel to the local image
characteristics."

2.6 Multiband thresholding

Colour images can also be thresholded. One approach is to designate a separate
threshold for each of the RGB components of the image and then combine them with
an AND operation. This reflects the way the camera works and how the data is stored
in the computer, but it does not correspond to the way that people recognize colour.
Therefore, the HSL and HSV colour models are more often used. It is also possible to
use the CMYK colour model.
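The per-channel AND combination described above might be sketched as follows; the three channel thresholds are arbitrary values chosen only for illustration.

```python
import numpy as np

def multiband_threshold(rgb_image, t_r=100, t_g=80, t_b=120):
    """Threshold each RGB channel separately and combine the results with AND."""
    r, g, b = rgb_image[..., 0], rgb_image[..., 1], rgb_image[..., 2]
    mask = (r > t_r) & (g > t_g) & (b > t_b)   # a pixel is "object" only if all three channels pass
    return mask.astype(np.uint8)
```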


CHAPTER 3
SEGMENTATION
3.1 What is segmentation?

In computer vision, segmentation refers to the process of partitioning a digital image
into multiple segments (sets of pixels, also known as superpixels). The goal of
segmentation is to simplify or change the representation of an image into something
that is more meaningful and easier to analyze. Image segmentation is typically used to
locate objects and boundaries (lines, curves, etc.) in images. More precisely, image
segmentation is the process of assigning a label to every pixel in an image such that
pixels with the same label share certain visual characteristics.

The result of image segmentation is a set of segments that collectively cover the entire
image, or a set of contours extracted from the image (see edge detection). Each of the
pixels in a region are similar with respect to some characteristic or computed
property, such as colour, intensity, or texture. Adjacent regions are significantly
different with respect to the same characteristic(s).[1] When applied to a stack of
images, typical in Medical imaging, the resulting contours after image segmentation
can be used to create 3D reconstructions with the help of interpolation algorithms like
marching cubes.

Some of the practical applications of image segmentation are:

1. Medical imaging

• Locate tumors and other pathologies


• Measure tissue volumes
• Computer-guided surgery
• Diagnosis
• Treatment planning
• Study of anatomical structure

2. Locate objects in satellite images (roads, forests, etc.)


3. Face recognition
4. Fingerprint recognition
5. Traffic control systems
6. Brake light detection
7. Machine vision
8. Agricultural imaging – crop disease detection

3.2 Thresholding

The simplest method of image segmentation is called the thresholding method. This
method is based on a clip-level (or a threshold value) to turn a gray-scale image into a
binary image. The key of this method is to select the threshold value (or values when
multiple-levels are selected). Several popular methods are used in industry including


the maximum entropy method and Otsu's method (maximum variance), among others.
k-means clustering can also be used.
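For reference, a compact NumPy sketch of Otsu's method, which picks the threshold maximizing the between-class variance, is given below. It assumes an 8-bit grayscale image and is meant as an illustration rather than an optimized implementation.

```python
import numpy as np

def otsu_threshold(image):
    """Return the threshold that maximizes the between-class variance (Otsu's method)."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    prob = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()          # class probabilities (weights)
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0        # mean of the background class
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1   # mean of the object class
        between_var = w0 * w1 * (mu0 - mu1) ** 2          # between-class variance
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t
```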

3.3 Histogram-based methods

Histogram-based methods are very efficient when compared to other image
segmentation methods because they typically require only one pass through the pixels.
In this technique, a histogram is computed from all of the pixels in the image, and the
peaks and valleys in the histogram are used to locate the clusters in the image; colour
or intensity can be used as the measure.

A refinement of this technique is to recursively apply the histogram-seeking method
to clusters in the image in order to divide them into smaller clusters. This is repeated
with smaller and smaller clusters until no more clusters are formed.

One disadvantage of the histogram-seeking method is that it may be difficult to
identify significant peaks and valleys in the image. In this approach to image
classification, distance metrics and integrated region matching are commonly used.

Histogram-based approaches can also be quickly adapted to occur over multiple
frames, while maintaining their single-pass efficiency. The histogram can be computed
in multiple fashions when multiple frames are considered. The same approach that is
taken with one frame can be applied to multiple frames, and after the results are
merged, peaks and valleys that were previously difficult to identify are more likely to
be distinguishable. The histogram can also be applied on a per-pixel basis where the
resulting information is used to determine the most frequent colour for the pixel
location. This approach segments based on active objects and a static environment,
resulting in a different type of segmentation useful in video tracking.

3.5 Edge Detection

Edge detection is a well-developed field on its own within image processing. Region
boundaries and edges are closely related, since there is often a sharp adjustment in
intensity at the region boundaries. Edge detection techniques have therefore been used
as the base of another segmentation technique.

The edges identified by edge detection are often disconnected. To segment an object
from an image however, one needs closed region boundaries.

3.6 Connected Component Labeling

Connected component labeling (alternatively connected component analysis, blob
extraction, region labeling, blob discovery, or region extraction) is an algorithmic
application of graph theory, where subsets of connected components are uniquely
labeled based on a given heuristic. Connected component labeling is not to be confused
with segmentation.

Connected component labeling is used in computer vision to detect connected regions
in binary digital images, although colour images and data with higher dimensionality
can also be processed. When integrated into an image recognition system or human-
computer interaction interface, connected component labeling can operate on a variety
of information. Blob extraction is generally performed on the resulting binary image
from a thresholding step. Blobs may be counted, filtered, and tracked.

Overview:

(Figure: 4-connectivity and 8-connectivity pixel neighbourhoods)

A graph, containing vertices and connecting edges, is constructed from relevant input
data. The vertices contain information required by the comparison heuristic, while the
edges indicate connected 'neighbours'. An algorithm traverses the graph, labeling the
vertices based on the connectivity and relative values of their neighbours.
Connectivity is determined by the medium; image graphs, for example, can be 4-
connected or 8-connected.

Following the labeling stage, the graph may be partitioned into subsets, after which
the original information can be recovered and processed.

3.7 Algorithms

The algorithms discussed can be generalized to arbitrary dimensions, albeit with
increased time and space complexity.

Two-pass

Relatively simple to implement and understand, the two-pass algorithm iterates
through 2-dimensional binary data. The algorithm makes two passes over the image:
one pass to record equivalences and assign temporary labels, and a second to replace
each temporary label with the label of its equivalence class.

The input data can be modified in situ (which carries the risk of data corruption), or
labeling information can be maintained in an additional data structure.

Connectivity checks are carried out by checking the labels of pixels that are North-
East, North, North-West and West of the current pixel (assuming 8-connectivity). 4-
connectivity uses only North and West neighbours of the current pixel. The following
conditions are checked to determine the value of the label to be assigned to the current
pixel (4-connectivity is assumed):

Conditions to check:

1. Does the pixel to the left (West) have the same value?
1. Yes - We are in the same region. Assign the same label to the current
pixel
2. No - Check next condition
2. Do the pixels to the North and West of the current pixel have the same value
but not the same label?
1. Yes - We know that the North and West pixels belong to the same
region and must be merged. Assign the current pixel the minimum of
the North and West labels, and record their equivalence relationship
2. No - Check next condition
3. Does the pixel to the left (West) have a different value and the one to the
North the same value?
1. Yes - Assign the label of the North pixel to the current pixel
2. No - Check next condition
4. Do the pixel's North and West neighbours have different pixel values?
1. Yes - Create a new label id and assign it to the current pixel

The algorithm continues this way, and creates new region labels whenever necessary.
The key to a fast algorithm, however, is how this merging is done. This algorithm
uses the union-find data structure which provides excellent performance for keeping
track of equivalence relationships.[7] Union-find essentially stores labels which
correspond to the same blob in a disjoint-set data structure, making it easy to
remember the equivalence of two labels by the use of an interface method, e.g.
findSet(l), which returns the minimum label value that is equivalent to the function
argument 'l'.

Once the initial labeling and equivalence recording is completed, the second pass
merely replaces each pixel label with its equivalent disjoint-set representative
element.

A raster scanning algorithm for connected region extraction is presented below.

On the first pass:

1. Iterate through each element of the data by column, then by row (Raster
Scanning)
2. If the element is not the background
1. Get the neighbouring elements of the current element
2. If there are no neighbours, uniquely label the current element and
continue
3. Otherwise, find the neighbour with the smallest label and assign it to
the current element
4. Store the equivalence between neighbouring labels

On the second pass:

1. Iterate through each element of the data by column, then by row
2. If the element is not the background
1. Relabel the element with the lowest equivalent label

Here, the background is a classification, specific to the data, used to distinguish
salient elements from the foreground. If the background variable is omitted, then the
two-pass algorithm will treat the background as another region.
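A compact sketch of the two-pass procedure, using 4-connectivity and a small union-find table, is given below. It assumes a binary NumPy array in which zero marks the background; the helper names are illustrative and not taken from any specific library.

```python
import numpy as np

def two_pass_label(binary):
    """Two-pass connected component labeling with 4-connectivity."""
    labels = np.zeros(binary.shape, dtype=int)
    parent = {}                                   # union-find table: label -> parent label

    def find(x):                                  # follow parents to the set representative
        while parent[x] != x:
            x = parent[x]
        return x

    def union(a, b):                              # record that labels a and b are equivalent
        ra, rb = find(a), find(b)
        parent[max(ra, rb)] = min(ra, rb)

    next_label = 1
    rows, cols = binary.shape
    # First pass: assign provisional labels and record equivalences.
    for i in range(rows):
        for j in range(cols):
            if not binary[i, j]:
                continue                          # background pixel
            north = labels[i - 1, j] if i > 0 else 0
            west = labels[i, j - 1] if j > 0 else 0
            neighbours = [l for l in (north, west) if l > 0]
            if not neighbours:
                labels[i, j] = next_label         # start a new region
                parent[next_label] = next_label
                next_label += 1
            else:
                labels[i, j] = min(neighbours)
                if len(neighbours) == 2 and north != west:
                    union(north, west)            # the North and West regions must merge
    # Second pass: replace each provisional label with its representative.
    for i in range(rows):
        for j in range(cols):
            if labels[i, j]:
                labels[i, j] = find(labels[i, j])
    return labels
```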

Graphical Example of Two-pass Algorithm

1. The array from which connected regions are to be extracted is given below

2. After the first pass, the following labels are generated. Note that a total of 7 labels
are generated in accordance with the conditions highlighted above.


The label equivalence relationships generated are,

Set ID    Equivalent Labels
1         1,2
2         1,2
3         3,4,5,6,7
4         3,4,5,6,7
5         3,4,5,6,7
6         3,4,5,6,7
7         3,4,5,6,7

3. Array generated after the merging of labels is carried out. Here, the label value that
was the smallest for a given region "floods" throughout the connected region, giving
two distinct labels, and hence two distinct regions.


4. Final result in colour to clearly see two different regions that have been found in
the array.


CHAPTER 4
PATTERN RECOGNITION
4.1 What is pattern recognition?

In machine learning, pattern recognition is the assignment of some sort of output
value (or label) to a given input value (or instance), according to some specific
algorithm. An example of pattern recognition is classification, which attempts to
assign each input value to one of a given set of classes (for example, determine
whether a given email is "spam" or "non-spam"). However, pattern recognition is a
more general problem that encompasses other types of output as well. Other examples
are regression, which assigns a real-valued output to each input; sequence labeling,
which assigns a class to each member of a sequence of values (for example, part of
speech tagging, which assigns a part of speech to each word in an input sentence); and
parsing, which assigns a parse tree to an input sentence, describing the syntactic
structure of the sentence.

Pattern recognition algorithms generally aim to provide a reasonable answer for all
possible inputs and to do "fuzzy" matching of inputs. This is opposed to pattern
matching algorithms, which look for exact matches in the input with pre-existing
patterns. A common example of a pattern-matching algorithm is regular expression
matching, which looks for patterns of a given sort in textual data and is included in
the search capabilities of many text editors and word processors. In contrast to pattern
recognition, pattern matching is generally not considered a type of machine learning,
although pattern-matching algorithms (especially with fairly general, carefully
tailored patterns) can sometimes succeed in providing similar-quality output to the
sort provided by pattern-recognition algorithms.

Pattern recognition is studied in many fields, including psychology, psychiatry,
ethology, cognitive science and computer science.

4.2 Overview

Pattern recognition is generally categorized according to the type of learning
procedure used to generate the output value. Supervised learning assumes that a set of
training data has been provided, consisting of a set of instances that have been
properly labelled by hand with the correct output. A learning procedure then generates
a model that attempts to meet two sometimes conflicting objectives: Perform as well
as possible on the training data, and generalize as well as possible to new data
(usually, this means being as simple as possible, for some technical definition of
"simple", in accordance with Occam's Razor). Unsupervised learning, on the other
hand, assumes training data that has not been hand-labelled, and attempts to find
inherent patterns in the data that can then be used to determine the correct output
value for new data instances. A combination of the two that has recently been
explored is semi-supervised learning, which uses a combination of labelled and
unlabeled data (typically a small set of labelled data combined with a large amount of


unlabeled data). Note that in cases of unsupervised learning, there may be no training
data at all to speak of; in other words, the data to be labelled is the training data.

Note that sometimes different terms are used to describe the corresponding supervised
and unsupervised learning procedures for the same type of output. For example, the
unsupervised equivalent of classification is normally known as clustering, based on
the common perception of the task as involving no training data to speak of, and of
grouping the input data into clusters based on some inherent similarity measure (e.g.
the distance between instances, considered as vectors in a multi-dimensional vector
space), rather than assigning each input instance into one of a set of pre-defined
classes. Note also that in some fields, the terminology is different: For example, in
community ecology, the term "classification" is used to refer to what is commonly
known as "clustering".

The piece of input data for which an output value is generated is formally termed an
instance. The instance is formally described by a vector of features, which together
constitute a description of all known characteristics of the instance. (These feature
vectors can be seen as defining points in an appropriate multidimensional space, and
methods for manipulating vectors in vector spaces can be correspondingly applied to
them, such as computing the dot product or the angle between two vectors.) Typically,
features are either categorical (also known as nominal, i.e. consisting of one of a set of
unordered items, such as a gender of "male" or "female", or a blood type of "A", "B",
"AB" or "O"), ordinal (consisting of one of a set of ordered items, e.g. "large",
"medium" or "small"), integer-valued (e.g. a count of the number of occurrences of a
particular word in an email) or real-valued (e.g. a measurement of blood pressure).
Often, categorical and ordinal data are grouped together; likewise for integer-valued
and real-valued data. Furthermore, many algorithms work only in terms of categorical
data and require that real-valued or integer-valued data be discretized into groups (e.g.
less than 5, between 5 and 10, or greater than 10).
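A small sketch of such discretization, using the bins mentioned above (less than 5, between 5 and 10, greater than 10), might look like this; the feature values are invented for illustration.

```python
import numpy as np

word_counts = np.array([0, 3, 7, 12, 5])         # e.g. occurrences of a particular word per email
categories = np.digitize(word_counts, [5, 10])   # 0: x < 5, 1: 5 <= x < 10, 2: x >= 10
# categories == array([0, 0, 1, 2, 1]), a categorical encoding of the counts
```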

Many common pattern recognition algorithms are probabilistic in nature, in that they
use statistical inference to find the best label for a given instance. Unlike other
algorithms, which simply output a "best" label, probabilistic algorithms often also
output a probability of the instance being described by the given label. In
addition, many probabilistic algorithms output a list of the N-best labels with
associated probabilities, for some value of N, instead of simply a single best label.
When the number of possible labels is fairly small (e.g. in the case of classification),
N may be set so that the probability of all possible labels is output. Probabilistic
algorithms have many advantages over non-probabilistic algorithms:

1. They output a confidence value associated with their choice. (Note that some
other algorithms may also output confidence values, but in general, only for
probabilistic algorithms is this value mathematically grounded in probability
theory. Non-probabilistic confidence values can in general not be given any
specific meaning, and can only be used to compare against other confidence
values output by the same algorithm.)
2. Correspondingly, they can abstain when the confidence of choosing any
particular output is too low.
3. Because of the probabilities output, probabilistic pattern-recognition
algorithms can be more effectively incorporated into larger machine-learning
tasks, in a way that partially or completely avoids the problem of error
propagation.

Techniques to transform the raw feature vectors are sometimes used prior to
application of the pattern-matching algorithm. For example, feature extraction
algorithms attempt to reduce a large-dimensionality feature vector into a smaller-
dimensionality vector that is easier to work with and encodes less redundancy, using
mathematical techniques such as principal components analysis (PCA). Feature
selection algorithms attempt to directly prune out redundant or irrelevant features.
The distinction between the two is that the resulting features after feature extraction
has taken place are of a different sort than the original features and may not easily be
interpretable, while the features left after feature selection are simply a subset of the
original features.
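As a sketch of feature extraction by PCA, the projection onto the leading principal components can be computed directly with NumPy; the number of components kept here is an arbitrary illustration.

```python
import numpy as np

def pca_reduce(features, n_components=2):
    """Project feature vectors onto their top principal components (feature extraction)."""
    centered = features - features.mean(axis=0)              # remove the mean of each feature
    _, _, vt = np.linalg.svd(centered, full_matrices=False)  # rows of vt are principal directions
    return centered @ vt[:n_components].T                    # coordinates in the reduced space
```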


CHAPTER 5
BARCODE

5.1 What is barcode?

A barcode is an optical machine-readable representation of data, which shows data
about the object to which it attaches. Originally, barcodes represented data by varying
the widths and spacings of parallel lines, and may be referred to as linear or one-
dimensional (1D). Later they evolved into rectangles, dots, hexagons and other
geometric patterns in two dimensions (2D). Although 2D systems use a variety of
symbols, they are generally referred to as barcodes as well. Barcodes originally were
scanned by special optical scanners called barcode readers; later, scanners and
interpretive software became available on devices including desktop printers and
smartphones.

The first use of barcodes was to label railroad cars, but they were not commercially
successful until they were used to automate supermarket checkout systems, a task for
which they have become almost universal. Their use has spread to many other tasks
that are generically referred to as Auto ID Data Capture (AIDC). The very first
scanning of the now ubiquitous Universal Product Code (UPC) barcode was on a pack
of Wrigley Company chewing gum in June 1974.

5.2 Scanners (barcode readers)

The earliest, and still the cheapest, barcode scanners are built from a fixed light and a
single photosensor that is manually "scrubbed" across the barcode.

Barcode scanners can be classified into three categories based on their connection to
the computer. The older type is the RS-232 barcode scanner. This type requires
special programming for transferring the input data to the application program.

"Keyboard interface scanners" connect to a computer using a PS/2 or AT keyboard–


compatible adaptor cable. The barcode's data is sent to the computer as if it had been
typed on the keyboard.

Like the keyboard interface scanner, USB scanners are easy to install and do not need
custom code for transferring input data to the application program.


CHAPTER 6
EDGE DETECTION
6.1 What is edge detection?

Edge detection is a fundamental tool in image processing and computer vision,
particularly in the areas of feature detection and feature extraction, which aim at
identifying points in a digital image at which the image brightness changes sharply or,
more formally, has discontinuities.

6.2 Edge properties

The edges extracted from a two-dimensional image of a three-dimensional scene can
be classified as either viewpoint dependent or viewpoint independent. A viewpoint
independent edge typically reflects inherent properties of the three-dimensional
objects, such as surface markings and surface shape. A viewpoint dependent edge
may change as the viewpoint changes, and typically reflects the geometry of the
scene, such as objects occluding one another.

A typical edge might for instance be the border between a block of red colour and a
block of yellow. In contrast a line (as can be extracted by a ridge detector) can be a
small number of pixels of a different colour on an otherwise unchanging background.
For a line, there may therefore usually be one edge on each side of the line.

6.3 A simple edge model

Although certain literature has considered the detection of ideal step edges, the edges
obtained from natural images are usually not at all ideal step edges. Instead they are
normally affected by one or several of the following effects:

1. Focal blur caused by a finite depth-of-field and finite point spread function.
2. Penumbral blur caused by shadows created by light sources of non-zero
radius.
3. Shading at a smooth object

A number of researchers have used a Gaussian smoothed step edge (an error function)
as the simplest extension of the ideal step edge model for modeling the effects of edge
blur in practical applications.[3][5] Thus, a one-dimensional image f which has exactly
one edge placed at x = 0 may be modeled as:

f(x) = (I_r − I_l)/2 · (erf(x / (√2 σ)) + 1) + I_l

At the left side of the edge, the intensity is I_l = lim(x→−∞) f(x), and right of the edge
it is I_r = lim(x→+∞) f(x). The scale parameter σ is called the blur scale of the edge.


6.4 Why is edge detection a non-trivial task?

To illustrate why edge detection is not a trivial task, consider the problem of detecting
edges in the following one-dimensional signal. Here, we may intuitively say that there
should be an edge between the 4th and 5th pixels.

5 7 6 4 152 148 149

If the intensity difference were smaller between the 4th and the 5th pixels and if the
intensity differences between the adjacent neighbouring pixels were higher, it would
not be as easy to say that there should be an edge in the corresponding region.
Moreover, one could argue that this case is one in which there are several edges.

5 7 6 41 113 148 149

Hence, to firmly state a specific threshold on how large the intensity change between
two neighbouring pixels must be for us to say that there should be an edge between
these pixels is not always simple.[3] Indeed, this is one of the reasons why edge
detection may be a non-trivial problem unless the objects in the scene are particularly
simple and the illumination conditions can be well controlled.
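A quick way to see the difficulty is to look at the first differences of the two signals above: the first signal has one dominant jump, while the second spreads the change over several pixels. A minimal sketch:

```python
import numpy as np

sharp = np.array([5, 7, 6, 4, 152, 148, 149])
gradual = np.array([5, 7, 6, 41, 113, 148, 149])

print(np.diff(sharp))    # [  2  -1  -2 148  -4   1] -> one clear edge between the 4th and 5th pixels
print(np.diff(gradual))  # [  2  -1  35  72  35   1] -> the change is spread over several pixels
```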

6.5 Approaches

There are many methods for edge detection, but most of them can be grouped into two
categories, search-based and zero-crossing based. The search-based methods detect
edges by first computing a measure of edge strength, usually a first-order derivative
expression such as the gradient magnitude, and then searching for local directional
maxima of the gradient magnitude using a computed estimate of the local orientation
of the edge, usually the gradient direction. The zero-crossing based methods search
for zero crossings in a second-order derivative expression computed from the image
in order to find edges, usually the zero-crossings of the Laplacian or the zero-
crossings of a non-linear differential expression. As a pre-processing step to edge
detection, a smoothing stage, typically Gaussian smoothing, is almost always applied
(see also noise reduction).

The edge detection methods that have been published mainly differ in the types of
smoothing filters that are applied and the way the measures of edge strength are
computed. As many edge detection methods rely on the computation of image
gradients, they also differ in the types of filters used for computing gradient estimates
in the x- and y-directions.

6.6 Canny edge detection

John Canny considered the mathematical problem of deriving an optimal smoothing
filter given the criteria of detection, localization and minimizing multiple responses to
a single edge. He showed that the optimal filter given these assumptions is a sum of
four exponential terms. He also showed that this filter can be well approximated by
first-order derivatives of Gaussians. Canny also introduced the notion of non-
maximum suppression, which means that given the presmoothing filters, edge points
are defined as points where the gradient magnitude assumes a local maximum in the
gradient direction. Looking for the zero crossing of the 2nd derivative along the
gradient direction was first proposed by Haralick. It took less than two decades to find
a modern geometric variational meaning for that operator that links it to the Marr-
Hildreth (zero crossing of the Laplacian) edge detector. That observation was
presented by Ron Kimmel and Alfred Bruckstein.

Although his work was done in the early days of computer vision, the Canny edge
detector (including its variations) is still a state-of-the-art edge detector. Unless the
preconditions are particularly suitable, it is hard to find an edge detector that performs
significantly better than the Canny edge detector.

The Canny-Deriche detector was derived from similar mathematical criteria as the
Canny edge detector, although starting from a discrete viewpoint and then leading to a
set of recursive filters for image smoothing instead of exponential filters or Gaussian
filters.

The differential edge detector described below can be seen as a reformulation of
Canny's method from the viewpoint of differential invariants computed from a scale-
space representation, leading to a number of advantages in terms of both theoretical
analysis and sub-pixel implementation.

Other first-order methods

For estimating image gradients from the input image or a smoothed version of it,
different gradient operators can be applied. The simplest approach is to use central
differences:

L_x(x, y) ≈ [L(x+1, y) − L(x−1, y)] / 2
L_y(x, y) ≈ [L(x, y+1) − L(x, y−1)] / 2

corresponding to the application of the following filter masks to the image data:
[−1/2, 0, +1/2] in the x-direction and its transpose in the y-direction.

The well-known and earlier Sobel operator is based on the following filters:

Gx = [[−1, 0, +1], [−2, 0, +2], [−1, 0, +1]]    and    Gy = [[+1, +2, +1], [0, 0, 0], [−1, −2, −1]]


Given such estimates of first-order derivatives, the gradient magnitude is then
computed as:

|∇L| = sqrt(L_x^2 + L_y^2)

while the gradient orientation can be estimated as

θ = atan2(L_y, L_x)

Other first-order difference operators for estimating image gradients have been
proposed, such as the Prewitt operator and the Roberts cross.
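The Sobel estimates, together with the gradient magnitude and orientation, can be computed with a few SciPy calls. This is a minimal sketch assuming a 2-D grayscale array; treating rows as the y-direction is an assumption made for illustration.

```python
import numpy as np
from scipy import ndimage

def sobel_gradient(image):
    """Estimate Sobel gradients and derive gradient magnitude and orientation."""
    img = image.astype(float)
    gx = ndimage.sobel(img, axis=1)      # derivative along columns (x-direction)
    gy = ndimage.sobel(img, axis=0)      # derivative along rows (y-direction)
    magnitude = np.hypot(gx, gy)         # sqrt(gx**2 + gy**2)
    orientation = np.arctan2(gy, gx)     # gradient direction in radians
    return magnitude, orientation
```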

6.7 Thresholding and Linking

Once we have computed a measure of edge strength (typically the gradient
magnitude), the next stage is to apply a threshold to decide whether edges are present
or not at an image point. The lower the threshold, the more edges will be detected,
or not at an image point. The lower the threshold, the more edges will be detected,
and the result will be increasingly susceptible to noise and detecting edges of
irrelevant features in the image. Conversely a high threshold may miss subtle edges,
or result in fragmented edges.

If the edge thresholding is applied to just the gradient magnitude image, the resulting
edges will in general be thick and some type of edge thinning post-processing is
necessary. For edges detected with non-maximum suppression, however, the edge
curves are thin by definition and the edge pixels can be linked into edge polygons by
an edge linking (edge tracking) procedure. On a discrete grid, the non-maximum
suppression stage can be implemented by estimating the gradient direction using first-
order derivatives, then rounding off the gradient direction to multiples of 45 degrees,
and finally comparing the values of the gradient magnitude in the estimated gradient
direction.

A commonly used approach to handle the problem of appropriate thresholds for
thresholding is to use thresholding with hysteresis. This method uses multiple
thresholds to find edges. We begin by using the upper threshold to find the start of an
edge. Once we have a start point, we then trace the path of the edge through the image
pixel by pixel, marking an edge whenever we are above the lower threshold. We stop
marking our edge only when the value falls below our lower threshold. This approach
makes the assumption that edges are likely to be in continuous curves, and allows us
to follow a faint section of an edge we have previously seen, without meaning that
every noisy pixel in the image is marked down as an edge. Still, however, we have the
problem of choosing appropriate thresholding parameters, and suitable thresholding
values may vary over the image.
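One way to sketch hysteresis thresholding is to keep every weak edge pixel that lies in the same connected component as at least one strong edge pixel. The code below, built on SciPy's connected-component labeling, is an illustrative approximation of the tracing procedure described above rather than a full Canny implementation; the two threshold values are assumptions.

```python
import numpy as np
from scipy import ndimage

def hysteresis_threshold(edge_strength, low=20.0, high=60.0):
    """Keep weak edge pixels only where they connect to strong edge pixels."""
    strong = edge_strength >= high              # pixels that are definitely edges
    weak = edge_strength >= low                 # all candidate edge pixels
    labels, _ = ndimage.label(weak)             # connected components of the candidates
    keep = np.unique(labels[strong])            # components that contain a strong pixel
    keep = keep[keep != 0]                      # drop the background label
    return np.isin(labels, keep)
```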

6.8 Edge thinning

Edge thinning is a technique used to remove the unwanted spurious points on the edges
of an image. This technique is employed after the image has been filtered for noise
(using a median or Gaussian filter, etc.), the edge operator has been applied (like the
ones described above) to detect the edges, and the edges have been smoothed using an
appropriate threshold value. This removes all the unwanted points and, if applied
carefully, results in one-pixel-thick edge elements.

Advantages:

1. Sharp and thin edges lead to greater efficiency in object recognition.
2. If Hough transforms are used to detect lines and ellipses, then thinning could
give much better results.
3. If the edge happens to be the boundary of a region, then thinning could easily give
image parameters like perimeter without much algebra.

There are many popular algorithms used to do this; one such algorithm is described below:

1) Choose a type of connectivity, like 8, 6 or 4.

2) 8-connectivity is preferred, where all the immediate pixels surrounding a particular
pixel are considered.

3) Remove points from the north, south, east and west.

4) Do this in multiple passes, i.e. after the north pass, use the same semi-processed
image in the other passes, and so on.

5) Remove a point if:

• The point has no neighbours in the north.
• The point is not the end of a line.
• The point is isolated.
• Removing the point will not disconnect its neighbours in any way.

6) Else keep the point. The number of passes across each direction should be chosen
according to the level of accuracy desired.

Second-order approaches to edge detection

Some edge-detection operators are instead based upon second-order derivatives of the
intensity. This essentially captures the rate of change in the intensity gradient. Thus,
in the ideal continuous case, detection of zero-crossings in the second derivative
captures local maxima in the gradient.

The early Marr-Hildreth operator is based on the detection of zero-crossings of the
Laplacian operator applied to a Gaussian-smoothed image. It can be shown, however,
that this operator will also return false edges corresponding to local minima of the
gradient magnitude. Moreover, this operator will give poor localization at curved
edges. Hence, this operator is today mainly of historical interest.


6.9 Differential edge detection

A more refined second-order edge detection approach which automatically detects
edges with sub-pixel accuracy uses the following differential approach of detecting
zero-crossings of the second-order directional derivative in the gradient direction:

Following the differential geometric way of expressing the requirement of non-
maximum suppression proposed by Lindeberg, let us introduce at every image point a
local coordinate system (u, v), with the v-direction parallel to the gradient direction.
Assuming that the image has been pre-smoothed by Gaussian smoothing and a scale-
space representation L(x, y; t) at scale t has been computed, we can require that the
gradient magnitude of the scale-space representation, which is equal to the first-order
directional derivative in the v-direction L_v, should have its first-order directional
derivative in the v-direction equal to zero:

L_vv = ∂_v(L_v) = 0

while the second-order directional derivative in the v-direction of L_v should be
negative, i.e.

L_vvv = ∂_vv(L_v) < 0

Written out as an explicit expression in terms of local partial derivatives L_x, L_y, ..., L_yyy,
this edge definition can be expressed as the zero-crossing curves of the differential
invariant

L_vv · L_v^2 = L_x^2 L_xx + 2 L_x L_y L_xy + L_y^2 L_yy = 0

that satisfy a sign-condition on the following differential invariant

L_vvv · L_v^3 = L_x^3 L_xxx + 3 L_x^2 L_y L_xxy + 3 L_x L_y^2 L_xyy + L_y^3 L_yyy < 0

where L_x, L_y, ..., L_yyy denote partial derivatives computed from a scale-space
representation L obtained by smoothing the original image with a Gaussian kernel. In
this way, the edges will be automatically obtained as continuous curves with sub-pixel
accuracy. Hysteresis thresholding can also be applied to these differential and
subpixel edge segments.

In practice, first-order derivative approximations can be computed by central
differences as described above, while second-order derivatives can be computed from
the scale-space representation L according to:

L_xx(x, y) ≈ L(x−1, y) − 2 L(x, y) + L(x+1, y)
L_yy(x, y) ≈ L(x, y−1) − 2 L(x, y) + L(x, y+1)
L_xy(x, y) ≈ [L(x−1, y−1) − L(x−1, y+1) − L(x+1, y−1) + L(x+1, y+1)] / 4


corresponding to the filter masks [1, −2, 1] for L_xx (its transpose for L_yy) and a 3×3
cross-difference mask for L_xy. Higher-order derivatives for the third-order sign
condition can be obtained in an analogous fashion.

6.10 Phase congruency-based edge detection

A recent development in edge detection techniques takes a frequency-domain
approach to finding edge locations. Phase congruency (also known as phase
coherence) methods attempt to find locations in an image where all sinusoids in the
frequency domain are in phase. These locations will generally correspond to the
location of a perceived edge, regardless of whether the edge is represented by a large
change in intensity in the spatial domain. A key benefit of this technique is that it
responds strongly to Mach bands, and avoids false positives typically found around
roof edges. A roof edge is a discontinuity in the first-order derivative of a grey-level
profile.

CHAPTER 7
TEMPLATE MATCHING


7.1 What is template matching?

Template matching is a technique in digital image processing for finding small parts
of an image which match a template image. It can be used in manufacturing as a part
of quality control, a way to navigate a mobile robot, or as a way to detect edges in
images.

Template matching can be subdivided between two approaches: feature-based and
template-based matching. The feature-based approach uses the features of the search
and template image, such as edges or corners, as the primary match-measuring
metrics to find the best matching location of the template in the source image. The
template-based, or global, approach uses the entire template, with generally a sum-
comparing metric (using SAD, SSD, cross-correlation, etc.) that determines the best
location by testing all or a sample of the viable test locations within the search image
that the template image may match up to.

7.2 Feature-based approach

If the template image has strong features, a feature-based approach may be
considered; the approach may prove further useful if the match in the search image
might be transformed in some fashion. Since this approach does not consider the
entirety of the template image, it can be more computationally efficient when working
with source images of larger resolution, as the alternative template-based approach
may require searching a potentially large number of points in order to determine the
best matching location.

7.3 Template-based approach

For templates without strong features, or for when the bulk of the template image
constitutes the matching image, a template-based approach may be effective. As
mentioned above, since template-based matching may potentially require sampling a
large number of points, it is possible to reduce the number of sampling points by
reducing the resolution of the search and template images by the same factor and
performing the operation on the resulting downsized images (multiresolution, or
pyramid, image processing), by providing a search window of data points within the
search image so that the template does not have to search every viable data point, or
by a combination of both.

7.4 Motion tracking and occlusion handling

In instances where the template may not provide a direct match, it may be useful to
implement the use of eigenspaces – templates that detail the matching object under a
number of different conditions, such as varying perspectives, illuminations, colour
contrasts, or acceptable matching object “poses”. For example, if the user was looking
for a face, the eigenspaces may consist of images (templates) of faces in different
positions to the camera, in different lighting conditions, or with different expressions.


It is also possible for the matching image to be obscured, or occluded, by an object; in
these cases, it is unreasonable to provide a multitude of templates to cover each
possible occlusion. For example, the search image may be a playing card, and in some
of the search images, the card is obscured by the fingers of someone holding the card,
or by another card on top of it, or any object in front of the camera for that matter. In
cases where the object is malleable or poseable, motion also becomes a problem, and
problems involving both motion and occlusion become ambiguous. In these cases,
one possible solution is to divide the template image into multiple sub-images and
perform matching on each subdivision.

7.5 Template-based matching and convolution

A basic method of template matching uses a convolution mask (template), tailored to
a specific feature of the search image, which we want to detect. This technique can be
easily performed on grey images or edge images. The convolution output will be
highest at places where the image structure matches the mask structure, where large
image values get multiplied by large mask values.

This method is normally implemented by first picking out a part of the search image
to use as a template: we will call the search image S(x, y), where (x, y) represent the
coordinates of each pixel in the search image. We will call the template T(x_t, y_t),
where (x_t, y_t) represent the coordinates of each pixel in the template. We then simply
move the center (or the origin) of the template T(x_t, y_t) over each (x, y) point in the
search image and calculate the sum of products between the coefficients in S(x, y) and
T(x_t, y_t) over the whole area spanned by the template. As all possible positions of the
template with respect to the search image are considered, the position with the highest
score is the best position. This method is sometimes referred to as 'Linear Spatial
Filtering' and the template is called a filter mask.

For example, one way to handle translation problems on images, using template
matching is to compare the intensities of the pixels, using the SAD (Sum of absolute
differences) measure.

A pixel in the search image with coordinates (x_s, y_s) has intensity I_s(x_s, y_s) and a pixel
in the template with coordinates (x_t, y_t) has intensity I_t(x_t, y_t). Thus the absolute
difference in the pixel intensities is defined as:

Diff(x_s, y_s, x_t, y_t) = | I_s(x_s, y_s) − I_t(x_t, y_t) |

The mathematical representation of the idea of looping through the pixels in the
search image as we translate the origin of the template to every pixel and take the
SAD measure is the following:

SAD(x, y) = Σ_{i=0..T_rows−1} Σ_{j=0..T_cols−1} Diff(x + i, y + j, i, j)


S_rows and S_cols denote the rows and the columns of the search image, and T_rows and
T_cols denote the rows and the columns of the template image, respectively. In this method
the lowest SAD score gives the estimate for the best position of template within the
search image. The method is simple to implement and understand, but it is one of the
slowest methods.
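A direct NumPy sketch of this exhaustive SAD search is given below. It assumes grayscale images stored as 2-D arrays and returns the top-left corner of the window with the lowest SAD score; as the text notes, this brute-force form is simple but slow.

```python
import numpy as np

def sad_match(search, template):
    """Slide the template over the search image and return the best (row, col) by SAD."""
    s_rows, s_cols = search.shape
    t_rows, t_cols = template.shape
    best_pos, best_score = (0, 0), np.inf
    for y in range(s_rows - t_rows + 1):
        for x in range(s_cols - t_cols + 1):
            window = search[y:y + t_rows, x:x + t_cols].astype(float)
            score = np.abs(window - template).sum()   # sum of absolute differences
            if score < best_score:
                best_pos, best_score = (y, x), score
    return best_pos
```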


REFERENCES:
1. Introduction to Robotics by Saeed B. Niku


2. Industrial Robotics by Mikell P. Groover
