
International Journal of Advanced Computer Science, Vol. 2, No. 5, Pp. 193-196, May, 2012.

Object Recognition Using SIFT Feature Over Fastly Extracted Salient Map
Ali Zia
Abstract Recognizing and classifying objects the way the human mind does has long been one of the chief goals of computer vision; in this paper we discuss one such approach, inspired by human vision. The paper presents a novel approach to object recognition that extracts SIFT features over a salient map, yielding only those feature points in an image that are important to the human eye. These points provide a description of the particular object(s) in an image, and by matching these descriptions we identify the type or class of a particular object. We demonstrate our approach on a real-world database, and the results show the promise of the proposed method.

Manuscript
Received: 18 Jun. 2011; Revised: 7 Sep. 2011; Accepted: 25 Mar. 2012; Published: 15 Jun. 2012

Keywords
Object Recognition; Salient Map; SIFT

1. Introduction
Object identification and classification has been an active topic for researchers in the fields of computer vision, robotics and pattern recognition. Although a number of commercial systems such as QBIC (Query by Image Content) [1], Four Eyes [2] and SQUID (Shape Queries Using Image Databases) [3] can perform object identification or content-based retrieval, the questions of compact object description and of fast matching and retrieval remain open problems. In general, object recognition and classification techniques [4]-[7] summarize the objects in an image by exploiting unique visual hints [8]-[10]. The representation of these visual hints, or features, is known as a description: the image or object is represented in a form that can easily be compared, for example as vectors or groups of numbers. This description is later matched against a dataset that already contains descriptions of other images to find the best match. Our method uses a salient map [9], [11]-[12] to identify the parts of an image that appear important. Although a number of saliency detection methods are available, most take a long time to compute the salient areas; a key feature of the saliency detector used in our approach is that it is quite fast. After determining the salient parts, we extract the SIFT [13]-[15] feature points from them. The advantage of determining SIFT features of only the salient parts is that the focus remains on the foreground objects in an image rather than on the relatively less important background. Since SIFT points are computed only for the important parts, they are fewer in number than the SIFT points of the whole image, which yields a shorter and more efficient descriptor. The descriptors are later matched to obtain a classification. The remainder of the paper is organized as follows: Section 2 describes the proposed method, Section 3 reports experimental results on the Oxford Flower dataset, and Section 4 presents conclusions and future work.

Ali Zia is with the Center for Intelligent Machine and Robotics, Department of Computer Science, COMSATS University, Lahore, Pakistan.

2. Object Recognition
The basic idea of this research is to extract the important, or salient, regions from the training images, compute SIFT feature points from those regions, and store the description and class label of those feature points in a database. This process is shown in Fig. 1. The same process is performed on a query image to obtain a description of the object of interest; this description is then matched against the database descriptions by computing distances and establishing scores between them. The query is classified as belonging to the same class as the best-scoring description. The classification process is shown in Fig. 2. The first phase of the process is to extract a saliency map from the image. Saliency refers to the areas that are most emphasized by human perception, so saliency detection essentially unclutters the image by separating the highly perceptive areas (foreground) from the background. A clear distinction between background and foreground is then achieved by binarization. The binarized image is overlaid on the original image so that only the foreground parts remain visible. SIFT features are then extracted from the overlaid image using Lowe's method [13]-[15]. The gradient descriptions of these points are computed, and the feature vectors and descriptor vectors are stored in the database along with their class labels. To classify a query, the distances between the query image description and the stored descriptions are measured to obtain match scores, and the query image is labeled with the class of the highest-scoring database image. The rest of this section is organized as follows: Section 2.A explains the saliency process in detail, Section 2.B describes the SIFT features and how an appropriate object description is obtained from the salient map and SIFT features, and Section 2.C explains how an image is matched and classified.
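As a minimal illustration, the train-then-classify flow outlined in Section 2 could be organised as follows. This is a hedged sketch, not the authors' implementation: `extract_description` is a hypothetical placeholder (a coarse intensity histogram) standing in for the saliency + SIFT description stage.

```python
import numpy as np

def extract_description(image):
    # Placeholder descriptor: coarse intensity histogram, normalised to sum to 1.
    # In the paper this stage is saliency extraction followed by SIFT.
    hist, _ = np.histogram(image, bins=4, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def train(labelled_images):
    # "Database": a list of (description, class label) pairs
    return [(extract_description(img), label) for img, label in labelled_images]

def classify(query_image, database):
    # Score the query against every stored description; smallest distance wins
    q = extract_description(query_image)
    _, best_label = min((np.linalg.norm(q - d), label) for d, label in database)
    return best_label
```

The key design point this sketch captures is that training only populates a database of (description, label) pairs, while all class decisions are deferred to query time.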

International Journal Publishers Group (IJPG)



A. Saliency Map Extraction

Fig. 1. Flow chart of the proposed training method.

Fig. 2. Flow chart of the classification method.

The saliency detection method used in this paper is a slight variation of the method proposed by Huang et al. [16]. The method theorizes that an image has some important or perceptive parts, whereas the others are clutter that must be separated from them. In Fig. 3(a) the flower is the main focus of the image, whereas the blurred grassy foliage is rather less important. From this observation the method concludes that the conspicuous areas in an image are generally those with high frequency content, i.e. edges and a large amount of contrast information, whereas flat areas, like the grassy foliage in Fig. 3(a), generally contain less meaningful information. On this basis, the method relies on a filter that can segregate the high-frequency parts of the image, which carry the contrast and edge information, from the low-frequency parts, i.e. the plain or blurred regions. The saliency method takes the "cornerness" of the pixels in the image as a measurement of their high-frequency information content. The cornerness, as presented in [17], for a pixel with coordinates u on the image lattice is defined as

R(u) = \det(M) - k \, \mathrm{trace}(M)^2 \qquad (1)

where k is a constant and M is a 2x2 matrix calculated from the derivatives of the image such that

M = \sum_{v \in N_u} g_\sigma(v - u) \begin{pmatrix} I_x^2(v) & I_x(v)\,I_y(v) \\ I_x(v)\,I_y(v) & I_y^2(v) \end{pmatrix}

where I_x and I_y are the derivatives of the image in the x and y directions on the image lattice, i.e. row- and column-wise, N_u is the set of pixels in a neighborhood centered at u, and g_\sigma is a Gaussian kernel function with variance \sigma^2.
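As a computational illustration, the cornerness of Equation 1 can be evaluated with a few array operations. The sketch below is not the authors' implementation: it uses plain NumPy, a box window stands in for the Gaussian weighting g_sigma, and the parameter values (k, radius) are illustrative.

```python
import numpy as np

def cornerness_saliency(image, k=0.04, radius=1):
    # Harris-style cornerness map: R(u) = det(M) - k * trace(M)^2
    img = image.astype(np.float64)
    # Image derivatives along columns (y, axis 0) and rows (x, axis 1)
    iy, ix = np.gradient(img)

    def window_sum(a):
        # Sum over a (2*radius+1)^2 neighborhood; a box window stands in
        # for the Gaussian weighting g_sigma of the definition of M
        out = np.zeros_like(a)
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return out

    # Entries of the 2x2 matrix M, accumulated over the neighborhood
    mxx = window_sum(ix * ix)
    myy = window_sum(iy * iy)
    mxy = window_sum(ix * iy)
    return mxx * myy - mxy * mxy - k * (mxx + myy) ** 2
```

Because everything reduces to derivatives plus windowed sums, the whole map is computed in a handful of filtering passes, which is the source of the detector's speed noted in the text.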

Consequently, we can consider the matrix M as a weighted linear combination of the variations in the image, as captured by the partial derivatives, across a neighborhood centered at the pixel of interest. In Fig. 3(b) we present an example of the R(u) values of the input image in Fig. 3(a), where pixels in the edge and high-contrast regions of the image have higher cornerness values. It is worth emphasizing that both the image derivatives and the matrix M can be computed efficiently using convolution, which makes equation 1 a computationally inexpensive saliency detector. Therefore, in our technique, we take the values produced by equation 1 as a saliency map that can then be binarized in order to separate the foreground features from the background clutter. We chose this saliency detector because of its speed; other saliency detectors, such as that of Itti et al. [18], take more time.

B. Object Description
To draw a clear boundary between foreground and background, the salient image is binarized. In the binarization



Fig. 3. From left to right: (a) original image; (b) binarized salient map; (c) saliency map overlaid on the original image; (d) SIFT feature detection on the whole image; (e) SIFT feature detection over the salient areas.
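The binarization and overlay steps illustrated in Fig. 3 can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the saliency map is normalised to [0, 1] before thresholding, and the 0.03 threshold is the manually supplied value reported in the text.

```python
import numpy as np

def binarize_and_overlay(image, saliency, threshold=0.03):
    # Normalise the saliency map to [0, 1] so a fixed threshold is meaningful
    s = saliency.astype(np.float64)
    s = (s - s.min()) / (s.max() - s.min() + 1e-12)
    # Manual threshold (automatic thresholding was less accurate per the text)
    mask = s > threshold
    # Overlay: keep original intensities on salient pixels, zero elsewhere
    return mask, np.where(mask, image, 0)
```

With OpenCV available, the resulting mask could then restrict feature detection to the salient region, e.g. `cv2.SIFT_create().detectAndCompute(image_u8, mask.astype(np.uint8))`, so that keypoints are only found on the foreground.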

process, the threshold value can be determined automatically by one of a number of algorithms, or it can be supplied manually. In our case automatic thresholding did not give accurate results, so we supplied a manual value of 0.03. The result of the binarization is shown in Fig. 3(b): the "white" part of the image is the salient part, whereas the black part is the background. The binarized salient image is overlaid on the original image to recover the original intensities, as shown in Fig. 3(c). Local features can now be extracted from the salient parts. The reason for choosing SIFT as the local feature for our proposed method is that it is essentially invariant to image translation, scaling and rotation, partially invariant to illumination changes, and robust to local geometric distortion, which in turn gives it a high accuracy ratio. The basic idea behind identifying SIFT key points is to take the maxima and minima of a difference-of-Gaussians function applied to a scale-space series of smoothed and resampled images. Dominant orientations of the gradient responses are assigned to the localized key points, while low-contrast candidate points and responses along edges are discarded. A SIFT descriptor, in the form of a vector, is then obtained by considering the pixels within a radius of the key location. Although SIFT is very accurate compared with most other simple local features, two points should be noted when SIFT is applied to the whole image instead of to the object. First, as shown in Fig. 3(d), when SIFT is applied to the whole image it often identifies points that belong to the background and thus have no relation to the object itself, which renders the information useless with respect to the object. The problem of object focusing can be avoided by using a static background and removing it later to train the system, but that sort of training can only be done in a simulated or artificially maintained environment and is not applicable to real-world training problems; our saliency method can be used on real-world images and videos. Second, when SIFT is applied to the whole image there are more feature points, and hence more descriptors, so a single image takes more time in the feature extraction and descriptor matching stages. A comparison of SIFT applied to the whole image and SIFT applied to the salient parts is shown in Fig. 3(d) and Fig. 3(e), respectively. After applying SIFT to the salient parts, the result is two vectors: a feature vector holding the values and positions of the feature points, and a descriptor vector holding their respective descriptors. The descriptor vectors give a relatively good description of the objects detected in an image. This description, along with the name label of the object's class, is saved in the database so that it can later be used in the classification process; for example, if a flower belongs to the rose class, the label "rose" is saved along with the salient-SIFT object description.

C. Classification
The query image description is obtained by the method described above and is then matched against the description of each image already stored in the database. The matching between descriptors is achieved with the method proposed by D. Lowe [13], which rejects matches that are too ambiguous. The matching algorithm returns the feature-point descriptor matches together with the squared Euclidean distances between them; the mean score of the matches is then calculated and associated with the particular image.
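The ambiguity-rejecting match step and the mean-score computation described above can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: it assumes each image is represented by an N x D array of descriptors, and applies Lowe-style nearest/second-nearest ratio rejection before averaging the accepted distances.

```python
import numpy as np

def match_descriptors(query, stored, ratio=0.8):
    # query, stored: (N, D) and (M, D) descriptor arrays, M >= 2.
    # A query descriptor is matched only when its nearest stored descriptor
    # is clearly closer than the second nearest (rejects ambiguous matches).
    dists = []
    for q in query:
        d2 = np.sum((stored - q) ** 2, axis=1)  # squared Euclidean distances
        order = np.argsort(d2)
        best, second = d2[order[0]], d2[order[1]]
        # Ratio test; squared because d2 holds squared distances
        if best < (ratio ** 2) * second:
            dists.append(best)
    # Mean score for this database image; lower is a better match
    return np.mean(dists) if dists else np.inf
```

Repeating this against every stored image and taking the lowest mean score yields the best-matching database image, whose label is assigned to the query.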
The process is repeated for all the images in the database. The best match is determined on the basis of the scores associated with each individual match. The label of the best-matched image is retrieved and the query image is given the same label; hence the query image is classified as belonging to the same class as the best-matched image.

3. Experiments
This section reports the performance of the proposed algorithm. For this purpose we used the Oxford Flower dataset [10], which has a training set of 680 images and a testing set of 340 images; both sets cover 17 different classes, or species, of flowers. Each species has some common attributes, such as shape, color and texture, but the dataset is tricky in the sense that in some classes two flowers of the same species may still differ significantly in shape and color. We compared our method with the results obtained by Huang et al. [16]. As can be seen from Table 1, the accuracy of the current method is better than the previously proposed ones, and in terms of training and matching time it is also more efficient. The algorithm can further be tuned with morphological opening and closing operations to remove possible outliers in the saliency map, and the threshold value can be adjusted accordingly. A further possibility is to experiment with this algorithm on video data.

TABLE 1. PERFORMANCE COMPARISON
Method | Classification Accuracy
SIFT + Saliency | 56.3%
Saliency + Freq. Hist. | 28.8%
Itti et al. + Freq. Hist. | 32.4%
Itti et al. + Codebook | 35.3%

4. Conclusions
This paper provides a method for object classification in images. The strength of the algorithm is that it combines the accuracy of the SIFT method with the speed of object detection from salient maps. Experiments on a real-world database show promising results compared with other methods.

References
[1] W. N. et al., "The QBIC project: Querying images by content using color, texture and shape," Proc. SPIE Conf. on Storage and Retrieval of Image and Video Databases, vol. 2, pp. 173-187, 1993.
[2] R. W. Picard, "Light-years from Lena: Video and image libraries and the future," Int. Conf. on Image Processing, vol. 1, pp. 310-313, 1995.
[3] S. A. M. Farzin & J. Kittler, "Robust and efficient shape indexing through curvature scale space," British Machine Vision Conference, vol. 1, pp. 53-62, 1996.
[4] A. Robles-Kelly, "A quasi-random sampling approach to image retrieval," pp. 1-8, 2008.
[5] D. Nister & H. Stewenius, "Scalable recognition with a vocabulary tree," Computer Vision and Pattern Recognition, pp. 2161-2168, 2006.
[6] J. Sivic & A. Zisserman, "Video Google: A text retrieval approach to object matching in videos," Int. Conf. on Computer Vision, pp. 1470-1477, 2003.
[7] O. Chum, J. Philbin, J. Sivic, M. Isard & A. Zisserman, "Total recall: Automatic query expansion with a generative feature model for object retrieval," Int. Conf. on Computer Vision, 2007.
[8] L. Fei-Fei & P. Perona, "A Bayesian hierarchical model for learning natural scene categories," Computer Vision and Pattern Recognition, pp. 524-531, 2005.
[9] P. Quelhas, F. Monay, J.-M. Odobez, D. Gatica-Perez, T. Tuytelaars & L. Van Gool, "Modelling scenes with local descriptors and latent aspects," Int. Conf. on Computer Vision, pp. 883-890, 2005.
[10] M.-E. Nilsback & A. Zisserman, "A visual vocabulary for flower classification," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, vol. 2, pp. 1447-1454, 2006.
[11] R. B. Y. Cohen, "Inferring region salience from binary and gray-level images," Pattern Recognition, pp. 2349-2362, 2003.
[12] C. Harris & M. Stephens, "On measuring low-level saliency in photographic images," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, vol. 1, pp. 84-89, 2000.
[13] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," pp. 91-110, 2004.
[14] D. G. Lowe, "Object recognition from local scale-invariant features," pp. 1150-1157, 1999.
[15] D. G. Lowe, "Local feature view clustering for 3D object recognition," pp. 682-688, 2001.
[16] Jyun-Hao Huang, Ali Zia, Jun Zhou & A. Robles-Kelly, "Content-based image retrieval via subspace-projected salient features," Digital Image Computing: Techniques and Applications, pp. 593-599, 2008.
[17] C. J. Harris & M. Stephens, "A combined corner and edge detector," 4th Alvey Vision Conference, pp. 147-151, 1988.
[18] L. Itti, C. Koch & E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1254-1259, 1998.
