
JOURNAL OF CHEMOMETRICS

J. Chemometrics 2006; 20: 221–229


Published online 21 November 2006 in Wiley InterScience
(www.interscience.wiley.com) DOI: 10.1002/cem.994

Application of PLS-DA in multivariate image analysis


Sylvie Chevallier1*, Dominique Bertrand2, Achim Kohler2,3 and Philippe Courcoux2

1 GEPEA UMR-CNRS 6144, ENITIAA, BP 82225, 44322 NANTES CEDEX 3, France
2 Unité de Sensométrie et de Chimiométrie, ENITIAA-INRA, BP 82225, 44322 NANTES CEDEX 3, France
3 Center for Biospectroscopy and data modelling, Norwegian Food Research Institute, MATFORSK, 1430 Ås, Norway

Received 14 October 2005; Revised 17 February 2006; Accepted 28 February 2006

A simple imaging system has been developed for acquiring multivariate images in order to characterise the heterogeneity of food materials. The objective of the present work is, first, to demonstrate the capability of this acquisition system to discriminate food products of different natures. Secondly, our goal is to apply Partial Least Squares regression to these multivariate images and to evaluate the merits of various classification strategies. A data set containing 24 images (702 × 524) acquired at different wavelengths for four food products is analysed. After the establishment of the PLS2 models employed for predicting the indicator variables, four strategies for classifying observations are tested. The first classification is done by selecting the largest component of the indicator variables. The others are based on the measurement of distances to the barycentres of the qualitative groups. The distances calculated can be either Euclidian distances or Mahalanobis distances. Except for the strategy based on the Euclidian distance on scores, the strategies are rather equivalent, with a slight advantage to the Euclidian distance on predicted indicators. Another possibility offered by the use of linear discriminant analysis (LDA) on multivariate images is to represent the qualitative groups as artificial images. The largest confusion appears between the two cereal products, while the others are well classified.
Copyright © 2006 John Wiley & Sons, Ltd.
KEYWORDS: partial least squares discriminant analysis; PLS-DA; multivariate images; classification; segmentation

1. INTRODUCTION
Nowadays, powerful analytical techniques such as spectroscopic methods make it possible to obtain high-dimensional data sets from which we may extract valuable information using multivariate analyses. For example, automatic grouping of data having a similar feature is an important problem in a variety of research areas such as biology, chemistry and medicine [1]. Classifying data in different groups (clusters) can be done in an unsupervised way if no information is known about the classes, or in a supervised way. Several techniques of clustering are available and are thoroughly described in the literature [2]. In recent years, considerable effort has been expended on exploiting the data compression or reduction methods of principal component analysis (PCA) prior to applying a range of discriminant analysis techniques to solve classification problems [3]. This reduction is achieved by a linear transformation to a new set of variables, the principal component (PC) scores, which are uncorrelated and ranked such that the first few retain most of the variation present in all of the original variables. Afterwards, a subset of PC scores is used as variables in a linear discriminant analysis (LDA) [4].

*Correspondence to: S. Chevallier, ENITIAA-GPA, BP 82225, 44322 NANTES CEDEX 3, France. E-mail: sylvie.chevallier@enitiaa-nantes.fr

Clustering techniques can also be applied to multivariate images. Multivariate imaging appears in all experimental fields of science and technology. It consists of constructing images of a material for different radiation energies or frequencies using various techniques. The images can be macroscopic or microscopic. For example, different imaging systems based on multispectral fluorescence [5] or on near-infrared microscopy [6] have been developed for the identification of food products. The combination of images recorded at different wavelengths, frequencies or energies gives a multivariate image as defined by [7]. Besides pixel information, generally recorded as intensity, images also provide spatial information in the form of X and Y coordinates. To extract information from this large amount of data, multivariate analysis is required. A widely used approach is to apply an unsupervised classifier to a small sample of the image data for the estimation of the classes and then classify the entire multivariate image using a supervised classifier trained with the estimated model [8,9]. A procedure for selecting a representative training set and test set from multivariate images is based on PCA [7]. Recent works [1,10] were carried out using other procedures for the estimation of classes in multivariate images.
An interesting alternative, when dimension reduction is needed and discrimination is sought, may be the use of Partial Least Squares Discriminant Analysis (PLS-DA) [11]. PLS-DA consists first in the application of a PLS regression model to variables which are indicators of the groups. The link between this regression and other discriminant methods such as LDA has been shown [11]. The second step of PLS-DA is to classify observations from the results of the PLS regression on indicator variables. The most commonly used method is simply to classify each observation in the group giving the largest predicted indicator variable. Some authors [12] have shown that this strategy is not the best, and often performs less well than more common methods such as simple LDA. The objective of the present work is to apply PLS regression to a discrimination problem, with a particular focus on multivariate images, and to evaluate the merits of various classification strategies.

2. BACKGROUND
2.1. Multivariate images
A multivariate image, as acquired by spectrometers or multichannel cameras, is represented by a three-way data structure (a cube), noted $\underline{\mathbf{A}}$. An element $a_{lmn}$ of $\underline{\mathbf{A}}$ is a positive number representing some intensity value measured at location $\{l, m\}$ for the image channel labelled by the index $n$. The locations where the measurements are carried out are regularly spaced. The indices $l$ and $m$ range from 1 to $L$ and 1 to $M$, respectively. When working in spectroscopy, the index $n$ (ranging from 1 to $N$) often corresponds to the wavelengths of the illuminating light. In the illustrative example that will be presented here, $n$ simply indicates a condition of acquisition. In a given cube image $\underline{\mathbf{A}}$, the vector $[a_{lm1}, a_{lm2}, \ldots, a_{lmN}]$, noted $\mathbf{a}_{lm}^T$ and referred to as the pixel vector in the following, thus represents the $N$ intensity values measured at location $\{l, m\}$ of a given studied sample. For processing such a cube image, it is often useful to reorganise the intensity values as a new data matrix, $\mathbf{A} = \{a_{in}\}$, including $LM$ rows and $N$ columns, by a procedure of unfolding the images. The row indices of $\mathbf{A}$ are chosen so as to allow a simple one-to-one correspondence between the elements of $\underline{\mathbf{A}}$ and $\mathbf{A}$.
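
As an illustration of the unfolding step, the following minimal sketch uses NumPy, which the article does not mention. The function names `unfold` and `refold`, the 0-based row index `i = l * M + m` and the small cube sizes are assumptions chosen for the example; any other one-to-one ordering of the rows would serve equally well.

```python
import numpy as np

def unfold(cube):
    """Reshape an (L, M, N) cube into an (L*M, N) matrix of pixel vectors."""
    L, M, N = cube.shape
    return cube.reshape(L * M, N)

def refold(values, L, M):
    """Reshape one value per pixel (length L*M) back into an (L, M) image."""
    return np.asarray(values).reshape(L, M)

# Small synthetic cube: 3 x 4 locations and 5 channels (arbitrary demo sizes)
L, M, N = 3, 4, 5
cube = np.arange(L * M * N, dtype=float).reshape(L, M, N)
A = unfold(cube)

# With this ordering, row i = l * M + m (0-based) holds the pixel vector at {l, m}
l, m = 2, 1
print(np.array_equal(A[l * M + m], cube[l, m]))
```

Because the same ordering is used in both directions, any vector computed row by row on the unfolded matrix (for instance a predicted group per pixel) can be passed to `refold` to recover an image.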

2.2. PLS-DA in the context of multivariate images
2.2.1. Principle of PLS-DA
The purpose of the acquisition of multivariate images is often to identify, on the surface of the studied samples, the group membership of each pixel vector. The final results of such studies could be artificial images of the sample, in which each predicted group is represented by an arbitrary symbolic colour. Such an objective can lead to unsupervised approaches, when it is impossible to create a learning set of pixel vectors, or to supervised ones, when the creation of a learning set is possible. In the second case, which is addressed in the present article, a linear discriminant approach can be carried out via a regression analysis with an indicator matrix reflecting the classes of the training set observations. This remark leads naturally to the use of PLS (Partial Least Squares) regression for discriminant purposes [13]. As PLS handles rank-deficient matrices well, it is well suited to multiway images in which the number of spectral conditions $N$ may be large and in which the matrix of predicting variables is rank-deficient. As classical PLS regression is very well known, we will not describe it in the present article. Only the points related to PLS discriminant analysis (PLS-DA) will be presented.
Let $\mathbf{X}$ (dimensioned $I \times J$) be a (centred) matrix of predictive variables of the calibration set. Let $\mathbf{g}$ be a vector of $I$ integer values coding the qualitative groups, such that $g_i$ gives the group number associated with observation $i$. $G$ will denote the total number of qualitative groups. From $\mathbf{g}$, it is possible to build an indicator matrix $\mathbf{Y}$, dimensioned $I \times G$, such that $y_{ig}$ is equal to 1 if the observation of index $i$ belongs to group $g$, and 0 otherwise [13]. The classical PLS2 regression model can then be applied to $\mathbf{X}$ and $\mathbf{Y}$ by varying the PLS dimensions. For a given number of dimensions $K$, the application of PLS2 regression basically gives several output matrices. The PLS scores, denoted $\mathbf{T}$ and dimensioned $I \times K$, represent a set of latent variables, which are linear combinations of the original variables in $\mathbf{X}$. The coefficients of the linear combinations are gathered in the matrix of loadings $\mathbf{P}$, such that $\mathbf{T} = \mathbf{X}\mathbf{P}$. The regression model associated with $K$ dimensions gives a prediction of $\mathbf{Y}$, gathered in the matrix $\hat{\mathbf{Y}}$, and a matrix of estimated regression coefficients $\mathbf{B}$ such that $\hat{\mathbf{Y}} = \mathbf{X}\mathbf{B}$. The PLS2 models can be applied to unknown data using the same matrix calculations.
It must be noted that applying PLS2 to indicator variables is not, in principle, entirely logical. If the number of qualitative groups $G$ is greater than 2 and the matrix $\mathbf{X}$ has a low rank, situations such as the one depicted in Figure 1 may often occur. Even if the groups are actually easily separable in the multidimensional space, some of the indicator variables may be impossible to predict accurately using a linear model. For example, in Figure 1, the indicator variable associated with group A, which is surrounded by other groups, is not linearly dependent on the variables $x_1$ and $x_2$. This problem may be related to the conclusions of Indahl et al. [12] and to our own [14], which found, on real data sets, that PLS-DA often performs less well than other discriminant methods such as LDA. However, provided that the data are projected in a relevant space (such as the space of the PLS scores), it seems possible to find other strategies of discrimination based on the knowledge of $\mathbf{T}$ and $\hat{\mathbf{Y}}$.

Figure 1. Imaginary example showing the impossibility of linearly predicting some indicator variables in a vector space having a small number of dimensions. Codes (0 or 1): value of the indicator variable associated with group A.
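
The construction of the indicator matrix and the PLS2 outputs described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses NumPy, codes the groups from 0 to G-1, centres X and Y internally, and takes each weight vector as the dominant left singular vector of the deflated cross-product matrix (the vector to which the NIPALS inner loop converges). The matrix called `R` here plays the role of the matrix the text writes as P in T = XP; the names `indicator_matrix`, `pls2_fit` and the synthetic data are hypothetical.

```python
import numpy as np

def indicator_matrix(g, G):
    """Build the I x G indicator matrix Y from integer group codes 0..G-1."""
    Y = np.zeros((len(g), G))
    Y[np.arange(len(g)), g] = 1.0
    return Y

def pls2_fit(X, Y, K):
    """PLS2 with K dimensions. Returns centring vectors, R (T = Xc R) and B."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xk, Yk = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(K):
        # Weight vector: dominant left singular vector of Xk'Yk
        w = np.linalg.svd(Xk.T @ Yk, full_matrices=False)[0][:, 0]
        t = Xk @ w                      # score vector of this dimension
        p = Xk.T @ t / (t @ t)          # X loading
        q = Yk.T @ t / (t @ t)          # Y loading
        Xk = Xk - np.outer(t, p)        # deflation of X
        Yk = Yk - np.outer(t, q)        # deflation of Y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    R = W @ np.linalg.inv(P.T @ W)      # so that T = Xc R
    B = R @ Q.T                         # so that Y_hat = Xc B + y_mean
    return x_mean, y_mean, R, B

# Synthetic demo data: 3 groups of 50 observations described by 6 variables
rng = np.random.default_rng(0)
G, n, J = 3, 50, 6
g = np.repeat(np.arange(G), n)
centres = 3.0 * rng.normal(size=(G, J))
X = centres[g] + rng.normal(size=(G * n, J))
Y = indicator_matrix(g, G)

x_mean, y_mean, R, B = pls2_fit(X, Y, K=2)
T = (X - x_mean) @ R                    # PLS scores (I x K)
Y_hat = (X - x_mean) @ B + y_mean       # predicted indicators (I x G)
print(T.shape, Y_hat.shape)
```

Unknown observations are handled with the same two matrix products, after subtracting the calibration mean `x_mean`.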

2.2.2. Strategies for classification
In the particular case of PLS-DA, $\hat{\mathbf{Y}}$ is a prediction of the indicator variables, and is thus not directly a prediction of the qualitative nature of the observations. Four strategies for classifying observations are tested:
(i) The first and most usual method for classifying an anonymous observation $\mathbf{x}^T$ ($1 \times J$) is to first compute the vector of predicted indicator variables $\hat{\mathbf{y}}$ ($G \times 1$) using the PLS2 regression model. The classification of $\mathbf{x}$ is done by selecting the group corresponding to the largest component of $\hat{\mathbf{y}}$. This strategy will be referred to as the Max indicators strategy. As mentioned previously, the possible drawback of this strategy is that it will give poor results if a group is not linearly separable in the projection space defined by $\mathbf{T}$.
(ii) For the other strategies, it appears interesting to involve the measurement of distances to the barycentres of the qualitative groups. Following the remarks of Indahl et al. [12], we have tested the use of the PLS2 scores $\mathbf{T}$ as a new data matrix for linear discrimination, in replacement of $\mathbf{X}$. Let $\mathbf{t}_i^T$ be the PLS2 scores of the $i$th observation of the calibration set. For $r = 1 \ldots G$, we can define the barycentres as
$$\mathbf{m}_r = \frac{1}{n_r} \sum_{i=1}^{n_r} \mathbf{t}_i$$
for the $n_r$ observations belonging to group $r$. For classifying an anonymous observation $\mathbf{x}^T$, we first project it into the PLS2 space, leading to a vector $\mathbf{t}_x^T$. Distances $d(\mathbf{t}_x, \mathbf{m}_r)$ with $r = 1 \ldots G$ can then be computed, and the anonymous observation is classified in the group for which the distance takes the smallest value. Following this general strategy, it is possible to vary the way of calculating the distance. The first possibility, corresponding to our second strategy and referred to as Euclidian distance on scores, is to use the Euclidian distance $d_e$ such that
$$d_e^2(\mathbf{t}_x, \mathbf{m}_r) = (\mathbf{t}_x - \mathbf{m}_r)^T (\mathbf{t}_x - \mathbf{m}_r)$$
(iii) The third strategy is a variant of the second one, suggested by another study [12]. It consists in performing a classical LDA using $\mathbf{T}$ as the set of predictive variables. This leads to the use of the Mahalanobis distance:
$$d_m^2(\mathbf{t}_x, \mathbf{m}_r) = (\mathbf{t}_x - \mathbf{m}_r)^T \mathbf{V}^{-1} (\mathbf{t}_x - \mathbf{m}_r), \quad \text{with } \mathbf{V} = \mathbf{T}^T \mathbf{T}.$$
This strategy will be denoted in the following as Mahalanobis distance on scores.
(iv) A last strategy, which follows the logic of PLS-DA, is to consider again the relationship $\hat{\mathbf{Y}} = \mathbf{X}\mathbf{B}$. The predicted indicators are actually linear combinations of the variables in $\mathbf{X}$. Moreover, it is interesting to remark that the information on group membership brought by $\hat{\mathbf{y}}$ may lie not only in its largest component, but also in the other ones: the fact that a given observation is far from some groups, and thus has low values of the corresponding predicted indicators, can be relevant information. For this reason, we have tested the possibility of making use of a Euclidian distance on the predicted indicators (strategy Distance on indicators). Let $\hat{\mathbf{y}}_i$ be the vector of predicted indicator variables of the $i$th observation of the calibration set. For $r = 1 \ldots G$, it is possible to compute the barycentres of the predicted indicators as
$$\mathbf{p}_r = \frac{1}{n_r} \sum_{i=1}^{n_r} \hat{\mathbf{y}}_i$$
for the $n_r$ observations belonging to group $r$. For classifying an anonymous observation $\mathbf{x}^T$, appropriately centred, its predicted indicator variables are first calculated using $\hat{\mathbf{y}}_x^T = \mathbf{x}^T \mathbf{B}$. The Euclidian distance on predicted indicators $d_p$ is given by:
$$d_p^2(\hat{\mathbf{y}}_x, \mathbf{p}_r) = (\hat{\mathbf{y}}_x - \mathbf{p}_r)^T (\hat{\mathbf{y}}_x - \mathbf{p}_r)$$
As previously, the anonymous observation is classified in the group for which the distance takes the smallest value.

2.2.3. Application on multivariate images
Let $\underline{\mathbf{A}}_1, \underline{\mathbf{A}}_2, \underline{\mathbf{A}}_3, \ldots, \underline{\mathbf{A}}_G$ be multivariate calibration images ($L \times M \times N$), each of which is representative of a single qualitative group. In each of these images, $H$ pixel vectors are randomly selected and gathered in a calibration matrix $\mathbf{X}_{cal}$, dimensioned ($GH \times N$). The PLS-DA models are established on $\mathbf{X}_{cal}$. For testing the accuracy of the models, pixel vectors of the images of the verification set are sampled in a similar way and gathered in a matrix $\mathbf{X}_{val}$. The four proposed strategies are applied to this validation set, varying the dimensions $K$ of the PLS-DA models. It is also possible to build synthetic images showing the predicted qualitative groups. For this purpose, each of the studied anonymous multivariate images is first unfolded, and the predicted groups $\hat{\mathbf{g}}$ ($LM \times 1$) are computed using the chosen PLS-DA model and strategy. $\hat{\mathbf{g}}$ can then be refolded in order to form a single-channel image ($L \times M$) showing the predicted group of each pixel. Such a group image can be displayed using arbitrary symbolic colours for representing each group. In this way, it is easy to emphasise the spatial organisation in images in which several predicted groups are represented.

3. MATERIALS AND METHODS

3.1. Multiway imaging system


3.1.1. Description of the system
The acquisition system, shown schematically in Figure 2, includes a camera, light sources and a computer. Digitised images, coded on 12 bits, are acquired by a digital colour camera (DX20, KAPPA, Germany) with a zoom lens (focal lengths of 5.632 mm, COMPUTAR, Bioblock, France). This model of camera is equipped with double-stage Peltier air cooling, which makes it suitable for fluorescence imaging. A wide range of adjustable integration times, from a few milliseconds to several seconds, allows photon accumulation and the correct display of scenes ranging from extremely dark to bright. The light sources consist of eight sets of readily available LEDs of different wavelengths ranging from 400 to 950 nm (Table I). Each set comprises 12 LEDs distributed at the four corners of a square in order to light the sample evenly.

Figure 2. Diagram of the image acquisition device.


Table I. Wavelengths of the different LEDs and integration time of the camera chosen for the acquisition of multivariate images

LED         Wavelength (nm)    Integration time (TIC in s)
Near-IR1    950                0.250
Near-IR2    875                0.080
Red         626                0.040
Amber       592                0.110
Green       524                0.250
Blue        470                0.110
UV          400                3.500
White       N/A                1.850

The device is housed in a dark enclosure, which avoids light interference from the laboratory. The system is connected to a PC via a PCI interface board. Dedicated software has been developed to define and control the different parameters of the image acquisition: integration time of the camera and type of LEDs switched on.

3.1.2. Experimental procedure


The samples are placed in the cell of the dark enclosure under the camera, as shown in Figure 2, and the zoom lens is focused once for all experiments. Integration time can vary from 0.1 ms to about 100 min, and eight different light sources are available, from near-IR to UV plus a white LED (see Table I). A procedure consists of the acquisition of a series of defined images. An example of a procedure is given in Figure 3. As can be read in this procedure, the image named image1.dat is acquired with the following chosen parameters:
TIC 0.110: integration time of the camera (s)
LED 00100000: binary value for red lighting

Figure 3. Example of batch procedure for image acquisition.

Each acquisition for a given illumination condition results in a colour image that has a spatial resolution of 702 × 524 pixels with three camera channels (red, green and blue). From the successive acquisition of individual images of the same sample, it is thus possible to build a cube image by merging all the RGB images associated with the same sample. As there are eight illumination conditions (defined by the LEDs) and three camera channels, the resulting cube images are thus dimensioned 702 × 524 × 24. Experimental designs with the two acquisition parameters (integration time of the camera and type of LEDs switched on) have been applied to reference samples (different pure organic or mineral compounds) to study the physical response of the system. The statistical analysis of the multivariate images collected in this way showed that the system was sufficiently repeatable for practical applications. No interaction was found between the two parameters: integration time and nature of the LED. This preliminary study also made it possible to optimise the lighting conditions for each LED (see Table I) and to confirm the linearity of the response with regard to the integration time.
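
The merging of the eight RGB images into a 24-channel cube can be sketched as follows with NumPy. Several points are assumptions made for the illustration: the bit order of the LED code is inferred from the single example given in the text (00100000 for red) together with the LED order of Table I, the resulting channel order (three camera channels per LED, LED by LED) is one possible convention, and the small random images merely stand in for real 12-bit acquisitions. The names `led_code` and `build_cube` are hypothetical.

```python
import numpy as np

# LED order as listed in Table I (assumed to match the bit order of the LED code)
LED_NAMES = ["Near-IR1", "Near-IR2", "Red", "Amber", "Green", "Blue", "UV", "White"]

def led_code(name):
    """Eight-character binary code with a single '1' at the LED's position."""
    i = LED_NAMES.index(name)
    return "".join("1" if j == i else "0" for j in range(len(LED_NAMES)))

def build_cube(rgb_images):
    """Concatenate RGB images of shape (rows, cols, 3) along the channel axis."""
    return np.concatenate(rgb_images, axis=2)

# Small random stand-ins for the eight 12-bit RGB acquisitions of one sample
rows, cols = 6, 8
rng = np.random.default_rng(0)
rgb_images = [rng.integers(0, 4096, size=(rows, cols, 3)) for _ in LED_NAMES]
cube = build_cube(rgb_images)

# Channel index 3 * k + c holds camera channel c under LED number k
print(led_code("Red"), cube.shape)
```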

3.2. Sample collection and discrimination


As an illustrative example, four powdered raw materials (maize, pea, soya bean meal, wheat) are studied. The raw materials are placed in the sample cell. The surface of the sample is compressed and levelled using a flat cylinder, and multivariate images are acquired in random order. For each studied raw material, four multivariate images are acquired. In order to build the PLS-DA models, one multivariate image of each raw material is selected to form the calibration set. In each calibration multivariate image, 400 pixel vectors are randomly selected (Figure 4). As there are four raw materials, the pixel vectors selected in this way are gathered in the matrix of the calibration set $\mathbf{X}_{cal}$, dimensioned 1600 (400 pixel vectors × 4 raw materials) × 24 (8 LEDs × 3 channels). The validation set is built in a similar way using the 12 remaining multivariate images, and selecting 400 pixel vectors in each image. The validation matrix $\mathbf{X}_{val}$ is thus dimensioned 4800 (400 pixel vectors × 4 raw materials × 3 validation multivariate images) × 24. The PLS-DA models are tested with dimensions ranging from 1 to 20, making use of the four defined strategies of discrimination. After the selection of the most relevant PLS-DA model, the group images of the validation set can be built and examined.

Figure 4. Example of the construction of calibration and validation matrices for one raw material. This sampling is repeated for each of the raw materials.
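
The four classification strategies of Section 2.2.2 can be sketched as follows, once scores and predicted indicators are available. This is an illustrative sketch under assumptions, not the code used in the study: the scores are synthetic, well-separated stand-ins rather than PLS2 scores of real pixel vectors, the predicted indicators are obtained by least-squares regression of the centred indicators on those scores, groups are coded from 0 to G-1, and the function names are hypothetical.

```python
import numpy as np

def barycentres(Z, g, G):
    """Mean row of Z for each group r = 0..G-1."""
    return np.vstack([Z[g == r].mean(axis=0) for r in range(G)])

def nearest(Z, centres, V=None):
    """Index of the closest barycentre: Euclidian if V is None, else Mahalanobis."""
    diff = Z[:, None, :] - centres[None, :, :]           # shape (n, G, dim)
    if V is None:
        d2 = (diff ** 2).sum(axis=2)
    else:
        d2 = np.einsum("ngk,kl,ngl->ng", diff, np.linalg.inv(V), diff)
    return d2.argmin(axis=1)

def classify_all(T_cal, Yh_cal, g_cal, T_new, Yh_new, G):
    """Apply the four strategies to new observations; returns group codes."""
    m = barycentres(T_cal, g_cal, G)                     # barycentres of scores
    p = barycentres(Yh_cal, g_cal, G)                    # barycentres of indicators
    return {
        "Max indicators": Yh_new.argmax(axis=1),
        "Euclidian distance on scores": nearest(T_new, m),
        "Mahalanobis distance on scores": nearest(T_new, m, V=T_cal.T @ T_cal),
        "Distance on indicators": nearest(Yh_new, p),
    }

# Synthetic stand-in scores: 4 groups, 400 observations each, 3 dimensions
rng = np.random.default_rng(0)
G, H = 4, 400
corners = 10.0 * np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]])
g_cal = np.repeat(np.arange(G), H)
g_val = np.repeat(np.arange(G), H)
T_cal = corners[g_cal] + rng.normal(size=(G * H, 3))
T_val = corners[g_val] + rng.normal(size=(G * H, 3))
t_mean = T_cal.mean(axis=0)
T_cal, T_val = T_cal - t_mean, T_val - t_mean            # centre on calibration

# Predicted indicators from a least-squares regression on the scores
Y = np.eye(G)[g_cal]
y_mean = Y.mean(axis=0)
C, *_ = np.linalg.lstsq(T_cal, Y - y_mean, rcond=None)
Yh_cal, Yh_val = T_cal @ C + y_mean, T_val @ C + y_mean

pred = classify_all(T_cal, Yh_cal, g_cal, T_val, Yh_val, G)
n_correct = {name: int((p == g_val).sum()) for name, p in pred.items()}
print(n_correct)
```

Repeating the last steps for each number of dimensions and storing `n_correct` gives curves of the number of correctly classified observations as a function of the PLS dimensions.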

4. RESULTS
Images of the four reference products (maize, pea, soya bean meal and wheat) are shown in Figure 5. When observing images acquired with the white light (Table I), a certain heterogeneity appears within each product because of the different nature of the grain constituents (pericarp, endosperm, aleurone), which are ground and mixed during the sample preparation. When compared with images acquired using near-infrared LEDs, it can be seen that these fractions appear differently under different light sources. The complex nature of these raw materials will make the classification more difficult.
PLS-DA is first applied to the sampled pixel vectors. Figure 6 shows the plot of the first two PLS scores obtained using the indicator variables as dependent variables. For the sake of clarity, only 1/5 of the observations, randomly chosen, are shown on this graph, but the examination of the whole set has given the same kind of information. The four raw materials are partly separated on this graph, with the observations associated with soya bean meal (S) on the left and the wheat (W) on the right. The qualitative groups overlap with respect to the first two scores, but are logically positioned according to the biological nature of the samples. The protein-rich raw materials (soya bean meal and pea) are close together and rather well separated from the two cereals (wheat and maize), which partly overlap.
Figure 5. Multivariate images of four reference food products recorded under white LEDs (left-hand side column) and near infra-red LEDs at 875 nm (right-hand side column).

Figure 6. PLS score plot of the discrimination. Scores #1 and #2. M: maize; P: pea; S: soya bean meal; W: wheat. For the sake of clarity, only 1/5 of the points are shown.

The PLS-DA model is tested by varying the PLS dimensions from 1 to 20. The numbers of correct classifications in both the calibration and the validation sets are taken as a criterion of the accuracy of the tested model. Figure 7 shows the evolution of the number of observations in the calibration set correctly classified (maximum: 1600) as a function of the PLS dimensions of the model. As expected, the number of correct classifications increases as a function of the number of dimensions. The strategy Euclidian distance on scores gives very poor results in comparison with the other strategies, with fewer than 1100 observations correctly classified. The other strategies give higher results, with a slight advantage for the strategy Euclidian distance on predicted indicators. The strategies Mahalanobis distance on scores and Max indicators are very similar. Figure 8 shows the same kind of evolution on the validation set, including 4800 observations. The models seem to show a plateau from 8 to 17 components and thus appear to be rather stable. The examination of this graph leads to the same conclusions as for the calibration set. The Euclidian distance on scores again gives poor results, with a success rate of about 65%. The other strategies are more efficient, again with an advantage to the Euclidian distance on predicted indicators. Taking the models with eight dimensions, the success rates are about 84% for Mahalanobis distance on scores and Max indicators. This figure reaches 86% for the Euclidian distance on predicted indicators, which corresponds to an increase of about 100 in the number of observations correctly identified. Table II shows the confusion matrix computed on the 4800 observations of the validation set, from the PLS-DA model with eight components and using this last strategy. This table is in accordance with the examination of Figure 6. The largest confusion is between the pixel vectors of maize (M) and wheat (W), which are both cereals. The two protein-rich raw materials (pea: P; soya bean meal: S) are also partly confused with each other. The pixel vectors of soya bean meal (S) form the most clearly separated group; indeed, this material results from a complex industrial process, whereas the other materials are simply ground seeds or grains.
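
A confusion matrix expressed as percentages, of the kind reported in Table II, can be computed as in the following sketch. It assumes that groups are coded from 0 to G-1 and that each percentage is taken relative to the number of observations of the actual group (so that every row sums to 100); the function name `confusion_percent` and the tiny example vectors are hypothetical.

```python
import numpy as np

def confusion_percent(g_true, g_pred, G):
    """Rows: actual groups; columns: predicted groups; values in % of each row."""
    counts = np.zeros((G, G))
    np.add.at(counts, (g_true, g_pred), 1)      # count each (actual, predicted) pair
    return 100.0 * counts / counts.sum(axis=1, keepdims=True)

# Tiny example with two groups of four observations
g_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
g_pred = np.array([0, 0, 0, 1, 1, 1, 0, 1])
C = confusion_percent(g_true, g_pred, G=2)
success_rate = 100.0 * np.mean(g_true == g_pred)
print(C)
print(success_rate)
```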
Figure 7. PLS-DA discrimination. Number of correctly classified observations of the calibration set as a function of the PLS dimensions. The four strategies are described in paragraph 2.2. Total number of observations: 1600.

Figure 8. PLS-DA discrimination. Number of correctly classified observations of the validation set as a function of the PLS dimensions. The four strategies are described in paragraph 2.2. Total number of observations: 4800.

Table II. PLS-DA discrimination. Confusion matrix of the validation set (4800 observations). Results expressed as percentages. Codes (nature of the raw material): M: maize; P: pea; S: soya bean meal; W: wheat.

                  Predicted groups
Actual groups     M       P       S       W
M                 74.8    10.7    0.5     14.1
P                 4.3     90.3    1.0     4.5
S                 0.2     12.1    87.2    0.6
W                 11.0    1.6     0.8     86.6

Figure 9. Histograms of distances of individuals to the barycentres of the observed groups. Euclidian distances based on the predicted indicator variable. M: maize; P: pea; S: soya bean meal; W: wheat. The darker colour indicates that the observation belongs to the actual group.

Figure 10. Images of predicted groups in validation. The correctly classified pixels are shown in dark colour.

Figure 9 shows the histograms of the distances of the observations to each of the four barycentres of the observed groups. On this graph, the dark points correspond to observations belonging to the depicted group. The histograms associated with soya bean meal and pea clearly show a bimodal distribution associated with group separation. In contrast, the histograms for maize and wheat show a larger confusion between the groups. The exploitation of such histograms obtained on the calibration set may logically lead to the development of confidence tests. From these histograms, empirical distribution functions can be built, which will make it possible to evaluate the probability that an unknown observation belongs to a given qualitative group.
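
One possible reading of this suggestion is sketched below with NumPy: the calibration distances of the members of a group to their own barycentre define an empirical distribution function F, and for an unknown observation at distance d the quantity 1 - F(d) gives the fraction of calibration members lying farther away than d. The article does not specify the test, so this construction, the function name `ecdf` and the example distances are assumptions.

```python
import numpy as np

def ecdf(sample):
    """Return a function F with F(d) = fraction of sample values <= d."""
    s = np.sort(np.asarray(sample, dtype=float))
    return lambda d: np.searchsorted(s, d, side="right") / s.size

# Hypothetical calibration distances of group members to their own barycentre
calibration_distances = [0.1, 0.2, 0.3, 0.4, 0.5]
F = ecdf(calibration_distances)

# Fraction of calibration members farther from the barycentre than d
d = 0.3
tail_fraction = 1.0 - F(d)
print(F(d), tail_fraction)
```

A small tail fraction indicates that the unknown observation lies unusually far from the barycentre compared with the calibration members of that group.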
An important aspect of applying discriminant analysis to multivariate images is the possible representation of the qualitative groups of the pixel vectors (observations) as artificial images. For this purpose, the chosen PLS-DA model is applied to anonymous unfolded images. The predicted groups are then refolded in order to form a single-channel image of the groups, which can be examined. As an example, some group images of the four raw materials in the validation set are shown in Figure 10. In this example, the observations of the validation set include 702 × 524, that is 367 848, pixels for each raw material. In Figure 10, the correctly classified pixels are shown in black, whereas the others are coloured in white. On this particular set of images, the proportion of correctly identified observations was slightly different from the one observed on the validation set (Table II), with 75%, 83%, 98% and 89% of correct classification for M, P, S and W, respectively. This slight variation may be due to some uncontrolled systematic error related to the lighting conditions.

5. CONCLUSION
The use of PLS-DA on multivariate images acquired with a simple experimental device may offer a large range of applications in which the spatial organisation of heterogeneous materials needs to be characterised. In many situations, the data are collinear or quasi-collinear, and PLS-DA is therefore appropriate. After the establishment of the PLS2 models employed for predicting the indicator variables, four strategies for classifying observations have been tested. Except for the strategy based on the Euclidian distance on scores, the strategies are overall rather equivalent, with a slight advantage to the Euclidian distance on predicted indicators. This is in accordance with previous studies [14] showing that the more commonly used strategy, Max indicators, was never the best on artificial or real data sets. The good results obtained using the Max indicators strategy seem to show that, in the multidimensional space of PLS, each group can be individually separated from all the others, contrary to the situation illustrated in Figure 1. Indeed, the number of qualitative groups was smaller than the number of optimal dimensions (about 8) in the PLS model.
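
The contrast with the situation of Figure 1 can be illustrated with a toy case. The sketch below is an assumption-laden illustration rather than a reproduction of Figure 1: it uses a single predictive variable, three synthetic groups with the middle one surrounded by the two others, and ordinary least squares on the indicator variables instead of PLS2. Under these conditions the largest predicted indicator never designates the surrounded group, whereas the Euclidian distance to the barycentres of the predicted indicators recovers it.

```python
import numpy as np

# Three groups along a single variable x; group 1 lies between groups 0 and 2
n = 10
x = np.concatenate([np.linspace(-1.3, -0.7, n),
                    np.linspace(-0.3, 0.3, n),
                    np.linspace(0.7, 1.3, n)])
g = np.repeat([0, 1, 2], n)
Y = np.eye(3)[g]                                   # indicator matrix

# Least-squares prediction of the indicators from an intercept and x
D = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
Y_hat = D @ coef

# Strategy "Max indicators": largest predicted indicator
max_pred = Y_hat.argmax(axis=1)

# Strategy "Distance on indicators": nearest barycentre of predicted indicators
centres = np.vstack([Y_hat[g == r].mean(axis=0) for r in range(3)])
d2 = ((Y_hat[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
dist_pred = d2.argmin(axis=1)

print("Max indicators, correct:", int((max_pred == g).sum()), "of", g.size)
print("Distance on indicators, correct:", int((dist_pred == g).sum()), "of", g.size)
```

With more latent dimensions than groups, as in the present study, the indicators can be predicted far better, which is consistent with the good behaviour of Max indicators reported above.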
The possibility of unfolding and refolding images in order to build synthetic images showing the location of qualitative groups is particularly useful for the characterisation of heterogeneous samples such as plant or animal tissues. Moreover, this processing can be a first step in the extraction of features from the shapes corresponding to each qualitative group. On the computed group images, suitable methods such as mathematical morphology [15] will provide a way to efficiently summarise the spatial and spectral information of multivariate images.

Acknowledgments
The authors thank Région des Pays de la Loire for its financial support and P. Papineau and A. Sire for their technical help.

REFERENCES
1. Tran TN, Wehrens R, Buydens LMC. Clustering multispectral images: a tutorial. Chemometrics Intell. Lab. Syst. 2005; 77: 3–17.
2. Massart DL, Vandeginste BGM, Buydens LMC, De Jong S, Lewi PJ, Smeyers-Verbeke J. Handbook of Chemometrics and Qualimetrics: part B, Data Handling in Science and Technology, vol 20B, Elsevier: Amsterdam, 1998.
3. Kemsley EK, Ruault S, Wilson RH. Discrimination between Coffea arabica and Coffea canephora variant robusta beans using infrared spectroscopy. Food Chem. 1995; 54: 321–326.
4. Kemsley EK. Discriminant analysis of high-dimensional data: a comparison of principal components analysis and partial least squares data reduction methods. Chemometrics Intell. Lab. Syst. 1996; 33: 47–61.
5. Novales B, Bertrand D, Devaux M-F, Robert P, Sire A. Multispectral fluorescence imaging for the identification of food products. J. Sci. Food Agric. 1996; 71: 376–382.
6. Baeten V, Michotte Renier A, Vermeulen P, Dardenne P. Use of Near-infrared microscopy (NIRM) and IR Camera to detect and quantify meat and bone meal (MBM), Compound feed, AAFCO & FDA, Tucson, January 14, 2003.
7. Geladi P, Grahn H. In Multivariate Image Analysis. John Wiley and Sons: Chichester, UK, 1996; 34–44.
8. Banfield J, Raftery A. Model-based Gaussian and non-Gaussian clustering. Biometrics 1993; 49: 803–821.
9. Geladi P, Isaksson H, Lindqvist L, Wold S, Esbensen K. Principal component analysis of multivariate images. Chemometrics Intell. Lab. Syst. 1989; 5: 209–220.
10. Noordam JC, van den Broek WHAM, Geladi P, Buydens LMC. A new procedure for the modelling and representation of classes in multivariate images. Chemometrics Intell. Lab. Syst. 2005; 75: 115–126.
11. Barker M, Rayens W. Partial least squares for discrimination. J. Chemometrics 2003; 17: 166–173.
12. Indahl UG, Sahni NS, Kirkhus B, Næs T. Multivariate strategies for classification based on NIR-spectra, with application to mayonnaise. Chemometrics Intell. Lab. Syst. 1999; 49: 19–31.
13. Sjöström M, Wold S, Söderström B. Feature extraction, classification, mapping. In Pattern Recognition in Practice II, Gelsema ES, Kanal LN (eds). Elsevier: Amsterdam, 1989; 486.
14. Bertrand D, Courcoux P, Camps C. La discrimination par régression PLS sur indicatrices. Congrès Chimiométrie: Paris, 2004; 108–111.
15. Serra J. Image Analysis and Mathematical Morphology. Academic Press: London, 1982.