IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 5, NO. 5, OCTOBER 2012
Abstract—The accuracy of a conventional supervised classification is in part a function of the training set used, notably impacted
by the quantity and quality of the training cases. Since it can be
costly to acquire a large number of high quality training cases, recent research has focused on methods that allow accurate classification from small training sets. Previous work has shown the potential of support vector machine (SVM) based classifiers. Here,
the potential of the relevance vector machine (RVM) and sparse
multinomial logistic regression (SMLR) approaches is evaluated
relative to SVM classification. With both airborne and spaceborne
multispectral data sets, the RVM and SMLR were able to derive
classifications of similar accuracy to the SVM but required considerably fewer training cases. For example, from a training set comprising 600 cases acquired with a conventional stratified random
sampling design from an airborne thematic mapper (ATM) data
set, the RVM produced the most accurate classification, 93.75%,
and needed only 7.33% of the available training cases. In comparison, the SVM yielded a classification that had an accuracy of
92.50% and needed 4.5 times more useful training cases. Similarly,
with a Landsat ETM+ (Littleport, Cambridgeshire, UK) data set,
the SVM required 4.0 times more useful training cases than the
RVM. For each data set, however, the accuracies of the classifications derived by the classifiers were of similar magnitude, differing by no more than 1.25%. Finally, for both the ATM and ETM+ (Littleport) data sets, the training cases used by the SVM and RVM had distinct and potentially predictable characteristics. Support vectors were generally atypical cases that lay in the boundary region between classes in feature space, while the relevance vectors were atypical but anti-boundary in nature. The SMLR also tended mostly, but not always, to use extreme cases that lay away from the class boundaries.
The results, therefore, suggest a potential to design classifier-specific intelligent training data acquisition activities for accurate classification from small training sets, especially with the SVM and
RVM.
Index Terms—Ground truth, relevance vector machines, sparse
multinomial logistic regression, support vector machines, training
data, typicality.
I. INTRODUCTION
Land cover mapping is one of the most common applications of remote sensing. Land cover maps are produced to
meet the needs of a diverse array of users and are typically derived via some form of image classification analysis, which is,
Manuscript received September 30, 2011; revised February 12, 2012; accepted August 02, 2012. Date of publication October 16, 2012; date of current
version November 14, 2012. This work was supported in part by the Association of Commonwealth Universities (ACU), London, through a fellowship to
M. Pal.
M. Pal is with the Department of Civil Engineering, NIT Kurukshetra,
Haryana, 136119 India (e-mail: mpce_pal@yahoo.co.uk).
G. M. Foody is with the School of Geography, University of Nottingham,
Nottingham, NG7 2RD, U.K.
Digital Object Identifier 10.1109/JSTARS.2012.2215310
PAL AND FOODY: EVALUATION OF SVM, RVM AND SMLR FOR ACCURATE IMAGE CLASSIFICATION WITH LIMITED GROUND DATA
If the two classes are not linearly separable, the SVM tries
to find the hyperplane that maximises the margin while, at the
same time, minimising a quantity proportional to the number
of misclassification errors. The restriction that all training cases of a given class lie on the same side of the optimal hyperplane can be relaxed by the introduction of a slack variable $\xi_i \geq 0$, and the trade-off between margin and misclassification error is controlled by a positive user-defined constant $C > 0$ [38]. Thus, for non-separable data, (2) can be written as:

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \;\; \frac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{N}\xi_{i} \quad \text{subject to} \quad y_{i}(\mathbf{w}\cdot\mathbf{x}_{i} + b) \geq 1 - \xi_{i}, \;\; \xi_{i} \geq 0 \qquad (3)$$

where $\mathbf{w}$ is the vector of adjustable weights.
The SVM can also be extended to handle non-linear decision surfaces. [39] proposed projecting the input data onto a high-dimensional feature space through some nonlinear mapping $\Phi$ and formulating a linear classification problem in that feature space. Kernel functions are used to reduce the computational cost of dealing with the high-dimensional feature space [37]. A kernel function is defined as

$$K(\mathbf{x}_{i}, \mathbf{x}_{j}) = \Phi(\mathbf{x}_{i})\cdot\Phi(\mathbf{x}_{j})$$

and with the use of a kernel function (1) becomes:

$$f(\mathbf{x}) = \operatorname{sgn}\left(\sum_{i=1}^{N} \alpha_{i}\, y_{i}\, K(\mathbf{x}_{i}, \mathbf{x}) + b\right) \qquad (4)$$

where $\alpha_{i}$ is a Lagrange multiplier. Further and more detailed discussion of the SVM can be found in [37], [40], [41].
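The decision function in (4) can be sketched numerically. The following is a minimal numpy illustration of the radial basis function kernel and the kernelised decision function; the support vectors, multipliers, bias and kernel width are made-up toy values, not ones fitted to any of the paper's data sets.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2).

    Computes the full kernel matrix between the rows of X and Z, which lets
    the SVM work in a high-dimensional feature space without ever forming
    the mapping explicitly (the 'kernel trick')."""
    # Squared distances via ||x - z||^2 = ||x||^2 + ||z||^2 - 2 x.z
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.clip(sq, 0.0, None))

def svm_decision(x, support_vectors, alphas, labels, b, gamma=0.5):
    """Evaluate f(x) = sum_i alpha_i y_i K(s_i, x) + b for a single input x,
    as in (4); the sign of f(x) gives the predicted class."""
    k = rbf_kernel(support_vectors, x[None, :], gamma).ravel()
    return float(np.dot(alphas * labels, k) + b)

# Toy usage with hypothetical support vectors and Lagrange multipliers
sv = np.array([[0.0, 0.0], [1.0, 1.0]])
alphas = np.array([1.0, 1.0])
labels = np.array([-1.0, 1.0])
f = svm_decision(np.array([0.9, 0.9]), sv, alphas, labels, b=0.0)
```

With these toy values the test case sits close to the positive support vector, so f comes out positive and the case would be allocated to the positive class.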
B. RVM

The RVM is a recent development in kernel-based machine learning and can be used as an alternative to the SVM for image classification. The RVM is a probabilistic counterpart to the SVM, based on a Bayesian formulation of a linear model with an appropriate prior that results in a sparser representation than that achieved by the SVM. The RVM is based on a hierarchical prior, in which an independent Gaussian prior is defined on the weight parameters at the first level and an independent Gamma hyperprior is used for the variance parameters at the second level, which leads to model sparseness [32]. An algorithm produces sparse results when, among all the coefficients defining the model, only a few are non-zero. This property allows fast model evaluation and provides a potential for accurate classification from small training sets. Key advantages of the RVM over the SVM include a reduced sensitivity to the hyperparameter settings, an ability to use non-Mercer kernels, the provision of a probabilistic output, no need to define the parameter $C$, and often a requirement for fewer relevance vectors than support vectors for a particular analysis [31], [32].

In a two-class classification by RVM, the aim is, essentially, to predict the posterior probability of membership of one of the classes for a given input. A case may then be allocated to the class with which it has the greatest likelihood of membership. Using a Bernoulli distribution, the likelihood function for the analysis would be:

$$P(\mathbf{t}\,|\,\mathbf{w}) = \prod_{i=1}^{N} \sigma\{y(\mathbf{x}_{i}; \mathbf{w})\}^{t_{i}}\,[1 - \sigma\{y(\mathbf{x}_{i}; \mathbf{w})\}]^{1-t_{i}} \qquad (5)$$

where $\sigma(y) = 1/(1 + e^{-y})$ is the logistic sigmoid function and $t_{i} \in \{0, 1\}$ is the class label of case $\mathbf{x}_{i}$. The hierarchical prior places an independent zero-mean Gaussian on each weight,

$$p(\mathbf{w}\,|\,\boldsymbol{\alpha}) = \prod_{j} N(w_{j}\,|\,0, \alpha_{j}^{-1}) \qquad (6)$$

and the weights are estimated by maximising the penalised log-likelihood

$$\log p(\mathbf{t}\,|\,\mathbf{w}) + \log p(\mathbf{w}\,|\,\boldsymbol{\alpha}) \qquad (8)$$

where the first summation term corresponds to the likelihood of the class labels and the second term corresponds to the prior on the parameters $\mathbf{w}$. In the resulting solution, the gradient of (8) with respect to $\mathbf{w}$ is calculated and only those training cases having non-zero coefficients $w_{i}$, which are called relevance vectors, will contribute to the generation of the decision function. The posterior is approximated around the most probable weights $\mathbf{w}_{MP}$ by a Gaussian approximation.
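The Bernoulli likelihood in (5) and the penalised objective that RVM training maximises can be sketched in a few lines of numpy; the design matrix, labels, weights and hyperparameters below are arbitrary toy values for illustration only, not a full RVM training procedure.

```python
import numpy as np

def sigmoid(y):
    """Logistic sigmoid, sigma(y) = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + np.exp(-y))

def penalized_log_likelihood(w, Phi, t, alpha):
    """Bernoulli log-likelihood of (5) plus the Gaussian log-prior on the
    weights (up to constants); RVM training maximises this over w for
    fixed hyperparameters alpha. Phi is the (kernel) design matrix and
    t holds the 0/1 class labels."""
    p = sigmoid(Phi @ w)
    log_lik = np.sum(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))
    log_prior = -0.5 * np.sum(alpha * w**2)  # independent zero-mean Gaussians
    return log_lik + log_prior

# Toy usage with hypothetical values
Phi = np.array([[1.0, 0.5], [1.0, -0.5]])  # design matrix (e.g. kernel columns)
t = np.array([1.0, 0.0])                   # binary class labels
alpha = np.ones(2)                         # per-weight precision hyperparameters
w = np.array([0.2, 1.0])                   # candidate weight vector
obj = penalized_log_likelihood(w, Phi, t, alpha)
```

In a full RVM, maximising this objective while re-estimating each alpha drives many alphas to infinity, pruning the corresponding weights to zero and leaving only the relevance vectors.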
TABLE I
THE MEAN AND STANDARD DEVIATION VALUES OF THE SYNTHETIC DATA
If $\mathbf{w}_{k}$ is the weight vector associated with class $k$, then the probability that a given training case $\mathbf{x}$ belongs to class $k$ is given by

$$P(y = k\,|\,\mathbf{x}) = \frac{\exp(\mathbf{w}_{k}^{T}\mathbf{x})}{\sum_{j}\exp(\mathbf{w}_{j}^{T}\mathbf{x})} \qquad (9)$$
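As a numerical sketch of (9), the class-membership probabilities of the multinomial logistic model reduce to a softmax over the per-class scores; the weight matrix and input case below are hypothetical values, and the sparsity that defines SMLR (from a Laplacian prior driving many weights to zero) is not shown.

```python
import numpy as np

def smlr_probabilities(W, x):
    """P(class k | x) = exp(w_k . x) / sum_j exp(w_j . x), the multinomial
    logistic model of (9). W has one row of weights per class."""
    scores = W @ x
    scores = scores - scores.max()  # stabilise the exponentials
    e = np.exp(scores)
    return e / e.sum()

# Toy usage: three classes, two features, hypothetical weights
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
x = np.array([2.0, 1.0])
p = smlr_probabilities(W, x)
```

A case is then allocated to the class with the largest probability, here the first class since its score (2.0) exceeds the others.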
simulated data to a real data set. As with the simulated data set,
ten-fold cross-validation was used with the ETM+ (Boston) data set.
More extensive analyses were undertaken with the remaining
two data sets with the accuracy of the resulting classifications
evaluated against ground data.
The third data set was obtained by a Daedalus 1268 airborne
thematic mapper (ATM) for an agricultural test site near
Feltwell, UK. The ATM data were acquired in 3 spectral wavebands, with a spatial resolution of 5 m [46]. The ATM data
were used to classify six different crop types: sugar beet, wheat,
barley, carrot, potato and grass. A map depicting the crop type
planted in each field produced near the time of the ATM data
acquisition was used as ground data to inform the training and
testing of the classifications. The training sets comprised
100 randomly selected pixels of each class for the analyses of
the ATM data set. The testing set comprised 320 pixels drawn
at random from the test site.
The fourth and final data set used was acquired by the
Landsat ETM+ for an agricultural area near Littleport in
Cambridgeshire, UK. The data in the six-non-thermal spectral
wavebands with a 30 m spatial resolution were used to classify
seven agriculture land cover types: wheat, sugar beet, potato,
onion, peas, lettuce and beans [47]. A map depicting the crop
type planted in each field produced near the time of the ETM+
(Littleport) data acquisitions was used as ground data. For
each class, 100 randomly selected pixels were used to train the
classifiers. The accuracy of the classifications was evaluated
using an independent testing set that comprised 1,400 randomly
selected pixels.
For each classification undertaken with the ATM and ETM+
(Littleport) data sets, accuracy was assessed with the aid of a
confusion matrix and expressed as the percentage of the testing
cases correctly allocated. As the potential for accurate classification by the SVM from small training sets has been demonstrated, the aim was to determine whether the RVM and SMLR approaches were at least as accurate as the SVM classification,
which may be assessed by a test of non-inferiority. For both the
RVM and SMLR methods, this was evaluated by using the confidence interval of the difference in accuracy obtained from that
observed with the SVM in a test of non-inferiority, which focuses on the lower limit of the defined confidence interval [48],
[49]. In this evaluation it was assumed that the zone of indifference was 2.00%; this value was selected arbitrarily but ensures
that small differences in accuracy are inconsequential. For all
experiments, a personal computer with a Pentium IV processor
and 3 GB of RAM was used.
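The non-inferiority check described above can be illustrated with a short sketch based on a normal approximation to the confidence interval of a difference between two proportions. Note that this treats the two accuracy estimates as independent, a simplification of the paired design used in the paper, and the accuracies and sample size below are hypothetical values, not results from the study.

```python
import math

def noninferiority(acc_new, acc_ref, n, delta=0.02, z=1.96):
    """Normal-approximation 95% confidence interval on the difference in
    overall accuracy between a candidate classifier and the reference,
    both assessed on n testing cases. Non-inferiority is declared when
    the lower confidence limit exceeds -delta (the zone of indifference)."""
    d = acc_new - acc_ref
    se = math.sqrt(acc_new * (1.0 - acc_new) / n
                   + acc_ref * (1.0 - acc_ref) / n)
    lower = d - z * se
    return d, lower, lower > -delta

# Hypothetical example: candidate 93% vs. reference 92% on 1400 test cases
d, lower, ok = noninferiority(0.93, 0.92, 1400)
```

With these illustrative numbers the lower confidence limit lies above the -2% zone of indifference, so the candidate would be judged non-inferior; a paired design, as used in the paper, yields a narrower interval and hence a more sensitive test.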
SVMs were initially designed for binary classification problems, and a range of methods has been suggested for multi-class classification [20], [50], [51]. Here, the one-against-rest approach was used with the ATM data set [17], [24] and the one-against-one approach with the simulated and ETM+ data sets [51]. Throughout, a radial basis function kernel with a kernel-specific parameter was used with the SVM, RVM and SMLR algorithms. The LIBSVM and BSVM packages [50], [52] were used to implement the SVM, whereas the SMLR software [33] was used to implement the sparse multinomial logistic regression classifier. A multiclass implementation of the original RVM code [32], [53] was used to implement the RVM classifier. Similar to the parameter required
TABLE II
USER DEFINED PARAMETERS WITH ALL FOUR DATASETS USED IN THIS STUDY
TABLE III
MEAN MAHALANOBIS DISTANCE MEASURES COMPUTED OVER ALL USEFUL TRAINING CASES FOR A CLASS BASED ON ANALYSES OF THE SIMULATED DATA
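The Mahalanobis distance used in tables such as this to characterise how typical the useful training cases are of their class can be computed as below; a large distance flags an atypical case, such as the support and relevance vectors discussed in the text. The case, class mean and covariance here are illustrative values only.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of a case x from a class described by its mean
    vector and covariance matrix: sqrt((x - m)^T C^{-1} (x - m)). Using
    linalg.solve avoids forming the explicit matrix inverse."""
    d = x - mean
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

# Toy usage: with an identity covariance the measure reduces to the
# Euclidean distance from the class mean
dist = mahalanobis(np.array([3.0, 4.0]), np.zeros(2), np.eye(2))
```

In practice the class mean and covariance would be estimated from the training cases of each class, and the distance averaged over the useful training cases as in Tables III-VI.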
Fig. 1. Location of the useful training cases for classifications of the simulated
data by (a) SVM, (b) RVM and (c) SMLR.
useful training cases were distributed in feature space in a relatively systematic fashion (Fig. 1). The location of the useful
training cases, however, varied between the three classifiers.
The trends were visually most apparent for class 2. For this
class, the support vectors were a set of extreme cases that lay at
the edge of the class distribution and between the distributions
of the other classes (Fig. 1(a)). As expected, the support vectors, therefore, lay in a region close to where a classification decision
Fig. 2. Location of the useful training cases for classifications of the ETM+
(Boston) data by (a) SVM, (b) RVM, (c) SMLR.
TABLE IV
MEAN MAHALANOBIS DISTANCE MEASURES COMPUTED OVER ALL USEFUL TRAINING CASES
FOR A CLASS BASED ON ANALYSES OF THE ETM+ (BOSTON) DATA SET
TABLE V
MEAN MAHALANOBIS DISTANCE MEASURES COMPUTED OVER ALL USEFUL TRAINING CASES FOR A CLASS BASED ON ANALYSES OF THE ATM DATA
An intelligent training scheme requires moving between feature and geographical space. For example, the approach used in [13] was
based on using fundamental knowledge of the variables that influence the spectral response to aid the selection of training sites
on the ground that would be expected to lie at extreme positions in feature space. For example, with a crop, extreme cases
might be expected to occur in regions of differing growth stage
and cover as well as with differing soil backgrounds. Moreover,
different extremities can be defined. For example, sites of extremely high and low plant cover would be expected to lie in different locations in feature space. Similarly, crops grown on different soil types or perhaps growing on wet and dry soils would
be expected to lie in different, potentially predictable, locations
of feature space [13], [57]. The precise approach will depend
on the specific data sets used but provided the useful training
cases have a potentially predictable nature an intelligent training
scheme should be feasible. Finally, it is apparent that the results
also highlight that training data collection programmes should
be designed in a classifier-specific manner. Note, for example
with both the ATM and ETM+ (Littleport) data sets, that few of
the training cases selected as useful by one classifier were also
selected as useful by another classifier (Table VII).
The results above indicate that all three classifiers use mostly
different training cases and so point to a desire for classifier-specific training data acquisition programmes. The importance of
this can be seen in the results of classifications of the ATM data
TABLE VI
MEAN MAHALANOBIS DISTANCE MEASURES COMPUTED OVER ALL USEFUL TRAINING CASES
FOR A CLASS BASED ON ANALYSES OF THE ETM+(LITTLEPORT) DATA
TABLE VII
NUMBER OF COMMON USEFUL TRAINING CASES
TABLE VIII
CONFUSION MATRICES FOR THE CLASSIFICATIONS OF THE ATM DATA (A) SVM, (B) RVM AND (C) SMLR. THE OVERALL ACCURACY OF THE CLASSIFICATIONS
WAS 93.75% FOR RVM, 92.50% FOR SVM AND 92.81% FOR SMLR. PER-CLASS ACCURACY (%) SHOWN FROM USER'S AND PRODUCER'S PERSPECTIVES
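The overall, user's and producer's accuracies reported from confusion matrices such as these can be derived as below. The 2-by-2 matrix is a made-up example, and the layout (rows as the classified label, columns as the reference label) is an assumption made for illustration.

```python
import numpy as np

def accuracy_summary(cm):
    """Summarise a confusion matrix (rows = classified label, columns =
    reference label). Returns the overall accuracy, the per-class user's
    accuracy (correct cases among those allocated to the class) and the
    per-class producer's accuracy (correct cases among those that truly
    belong to the class)."""
    cm = np.asarray(cm, dtype=float)
    diag = np.diag(cm)
    overall = diag.sum() / cm.sum()
    users = diag / cm.sum(axis=1)      # row totals: cases allocated per class
    producers = diag / cm.sum(axis=0)  # column totals: reference cases per class
    return overall, users, producers

# Toy two-class matrix: 92 of 100 testing cases correctly allocated
cm = [[45, 5],
      [3, 47]]
overall, users, producers = accuracy_summary(cm)
```

Overall accuracy here is 0.92, matching the paper's convention of expressing accuracy as the percentage of testing cases correctly allocated.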
TABLE IX
NON-INFERIORITY TEST RESULTS RELATIVE TO SVM BASED ON 95%
CONFIDENCE INTERVAL ON THE ESTIMATED DIFFERENCE IN ACCURACY. NOTE
THAT THE DIFFERENCES IN ACCURACY WERE ALL VERY SMALL ( ) AND INSIDE THE DEFINED ZONE OF INDIFFERENCE.
TABLE X
VARIATION OF CLASSIFICATION ACCURACY AND NUMBER OF RELEVANCE VECTORS WITH THE KERNEL PARAMETER, USING THE ATM DATASET.
TABLE XI
COMPUTATIONAL COST AND THE NUMBER OF USEFUL TRAINING CASES USED
BY THE CLASSIFIERS.
REFERENCES
[1] G. M. Foody and A. Mathur, "Toward intelligent training of supervised image classifications: Directing training data acquisition for SVM classification," Remote Sens. Environ., vol. 93, no. 1-2, pp. 107-117, Oct. 2004.
[2] M. Chi and L. Bruzzone, "A semilabeled-sample-driven bagging technique for ill-posed classification problems," IEEE Geosci. Remote Sens. Lett., vol. 2, no. 1, pp. 69-73, Jan. 2005.
[3] P. Mantero, G. Moser, and S. B. Serpico, "Partially supervised classification of remote sensing images through SVM-based probability density estimation," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 559-570, Mar. 2005.
[4] G. M. Foody, "Assessing the accuracy of land cover change with imperfect ground reference data," Remote Sens. Environ., vol. 114, no. 10, pp. 2271-2285, Oct. 2010.
[5] L. Bruzzone, M. Chi, and M. Marconcini, "A novel transductive SVM for the semisupervised classification of remote-sensing images," IEEE Trans. Geosci. Remote Sens., vol. 44, no. 11, pp. 3363-3373, Nov. 2006.
[6] M. Marconcini, G. Camps-Valls, and L. Bruzzone, "A composite semisupervised SVM for classification of hyperspectral images," IEEE Geosci. Remote Sens. Lett., vol. 6, no. 2, pp. 234-238, Apr. 2009.
[7] L. Bruzzone and C. Persello, "A novel context-sensitive semisupervised SVM classifier robust to mislabeled training samples," IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2142-2154, Jul. 2009.
[8] S. Rajan, J. Ghosh, and M. M. Crawford, "An active learning approach to hyperspectral data classification," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 4, pp. 1231-1242, Apr. 2008.
[9] D. Tuia, F. Ratle, F. Pacifici, M. F. Kanevski, and W. J. Emery, "Active learning methods for remote sensing image classification," IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2218-2232, Jul. 2009.
[10] P. Zhong, P. Zhang, and R. Wang, "Dynamic learning of SMLR for feature selection and classification of hyperspectral data," IEEE Geosci. Remote Sens. Lett., vol. 5, no. 2, pp. 280-284, Apr. 2008.
[11] M. Pal and G. M. Foody, "Feature selection for classification of hyperspectral data by SVM," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 5, pp. 2297-2307, May 2010.
[12] G. M. Foody and A. Mathur, "The use of small training sets containing mixed pixels for accurate hard image classification: Training on mixed spectral responses for classification by a SVM," Remote Sens. Environ., vol. 103, no. 2, pp. 179-189, Jul. 2006.
[13] A. Mathur and G. M. Foody, "Crop classification by support vector machine with intelligently selected training data for an operational application," Int. J. Remote Sens., vol. 29, no. 8, pp. 2227-2240, Apr. 2008.
[14] C. Sanchez-Hernandez, D. S. Boyd, and G. M. Foody, "One-class classification for mapping a specific land cover class: SVDD classification of fenland," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 4, pp. 1061-1073, Apr. 2007.
[15] W. Li, Q. Guo, and C. Elkan, "A positive and unlabeled learning algorithm for one-class classification of remote-sensing data," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 2, pp. 717-725, Feb. 2011.
[16] J. A. Gualtieri and R. F. Cromp, "Support vector machines for hyperspectral remote sensing classification," in Proc. 27th AIPR Workshop: Advances in Computer Assisted Recognition, Washington, DC, Oct. 27, 1998, pp. 221-232.
[17] C. Huang, L. S. Davis, and J. R. G. Townshend, "An assessment of support vector machines for land cover classification," Int. J. Remote Sens., vol. 23, no. 4, pp. 725-749, Feb. 2002.
[18] G. Zhu and D. G. Blumberg, "Classification using ASTER data and SVM algorithms; The case study of Beer Sheva, Israel," Remote Sens. Environ., vol. 80, no. 5, pp. 233-240, May 2002.
[19] M. Pal and P. M. Mather, "Assessment of the effectiveness of support vector machines for hyperspectral data," Future Gen. Comput. Syst., vol. 20, no. 7, pp. 1215-1225, Oct. 2004.
[20] F. Melgani and L. Bruzzone, "Classification of hyperspectral remote sensing images with support vector machines," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778-1790, Aug. 2004.
[21] D. Lu and Q. Weng, "A survey of image classification methods and techniques for improving classification performance," Int. J. Remote Sens., vol. 28, no. 5, pp. 823-870, Mar. 2007.
Mahesh Pal received the Ph.D. degree from the University of Nottingham, U.K., in 2002.
He is presently an Associate Professor in the
Department of Civil Engineering, NIT Kurukshetra,
Haryana, India. His major research areas are land cover classification, feature selection and the application of artificial intelligence techniques to various civil engineering applications.
Dr. Pal is on the editorial board of Remote Sensing Letters. Part of the research work reported in this paper was carried out while Dr. Pal held a Commonwealth Fellowship at the University of Nottingham during October 2008-March 2009.