and STATA.
2.3 Approximations
Given that the tetrachoric coefficient cannot be expressed in analytical form (see expression
[2.2] above), Pearson (1900) and other authors presented some suggestions to approximate
the tetrachoric correlation in a less time-consuming way (see Castellan Jr., 1966, for a
summary, and Digby, 1983). However, it is important to note that these estimates are equal to
the real tetrachoric correlation only when the response frequencies form median divisions of
the distributions (that is, using the notation above, when a + b = c + d and a + c = b + d).
3. ASSET CORRELATIONS AND THE BASEL ACCORDS
3.1 The estimation of extreme credit losses
In Basel, extreme credit losses (ECL) are estimated by means of the following formula (see
BCBS, 2006):
$$ ECL = \Phi\left( \frac{\Phi^{-1}(PD) + \sqrt{\rho}\,\Phi^{-1}(0.999)}{\sqrt{1-\rho}} \right) \qquad \text{[3.1]} $$
where $\Phi$ and $\Phi^{-1}$ represent the standard normal distribution function and its inverse,
respectively. $\Phi^{-1}(0.999)$ gives the confidence level (99.9%) of the losses in an extreme
economic scenario. PD is the mean (historical) probability of default in the portfolio
considered and $\rho$ is the linear correlation between latent variables (e.g. returns on obligors'
assets).
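As an illustration of formula [3.1], a minimal Python sketch follows. It assumes the standard reading of the expression (the inverse normal of PD plus the square root of the correlation times the 99.9% quantile, divided by the square root of one minus the correlation); the function name `extreme_credit_loss` and its argument names are hypothetical and not part of the Basel text.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def extreme_credit_loss(pd: float, rho: float, confidence: float = 0.999) -> float:
    """Evaluate expression [3.1] for a mean default probability and a correlation.

    pd         -- mean (historical) probability of default, strictly between 0 and 1
    rho        -- linear correlation between latent variables, in [0, 1)
    confidence -- confidence level of the extreme scenario (99.9% in Basel)
    """
    if not 0.0 < pd < 1.0:
        raise ValueError("pd must lie strictly between 0 and 1")
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    numerator = _STD_NORMAL.inv_cdf(pd) + rho ** 0.5 * _STD_NORMAL.inv_cdf(confidence)
    return _STD_NORMAL.cdf(numerator / (1.0 - rho) ** 0.5)


if __name__ == "__main__":
    # With no correlation the extreme loss collapses to PD itself;
    # a larger correlation pushes the estimate upwards.
    for rho in (0.0, 0.04, 0.15):
        print(rho, extreme_credit_loss(0.02, rho))
```

The negative-correlation case is rejected on purpose, since the square root in [3.1] is not defined for it.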
3.2 Calibration of asset correlations in Basel
The correlations used in [3.1] are stipulated according to the credit segments. Such
correlations are estimated as a function of the probability of default for corporate, sovereign
and bank exposures (between 0.12 and 0.24; see BCBS, 2006, paragraph 272), for small and
medium enterprises (between 0.12 and 0.24 adjusted by the firm size; see BCBS, 2006,
paragraph 273) and for high-volatility commercial real estate (between 0.12 and 0.30; see
BCBS, 2006, paragraph 283). As for residential mortgages and revolving retail credit, the
correlation was set equal to 0.15 and 0.04, respectively (see BCBS, 2006, paragraphs 328 and
329). The correlation pertaining to other retail exposures goes from 0.03 to 0.16 as a function
of the probability of default (see BCBS, 2006, paragraph 330). The method and the data used
to empirically estimate these asset correlations were not disclosed. Finally, recall that all
these correlations refer to the latent variables that drive defaults (obligors' asset returns, for
example).
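The paragraphs cited above give only the bounds of each range. As a hedged illustration, the sketch below assumes the exponential interpolation between the two bounds that the Basel II text prescribes for PD-dependent segments. The decay constants (50 for corporate exposures, 35 for other retail) are taken from that framework rather than from this paper, the firm-size adjustment for small and medium enterprises is omitted, and all function names are hypothetical.

```python
from math import exp


def basel_correlation(pd: float, rho_low: float, rho_high: float, decay: float) -> float:
    """Interpolate between rho_high (PD close to 0) and rho_low (PD close to 1).

    The weight on rho_low grows exponentially fast with PD; `decay` controls the speed.
    """
    if not 0.0 <= pd <= 1.0:
        raise ValueError("pd must lie in [0, 1]")
    weight = (1.0 - exp(-decay * pd)) / (1.0 - exp(-decay))
    return rho_low * weight + rho_high * (1.0 - weight)


def corporate_correlation(pd: float) -> float:
    """Corporate, sovereign and bank exposures: between 0.12 and 0.24."""
    return basel_correlation(pd, 0.12, 0.24, 50.0)


def other_retail_correlation(pd: float) -> float:
    """Other retail exposures: between 0.03 and 0.16."""
    return basel_correlation(pd, 0.03, 0.16, 35.0)


if __name__ == "__main__":
    for pd in (0.001, 0.01, 0.05, 0.20):
        print(pd, corporate_correlation(pd), other_retail_correlation(pd))
```

Under this form the stipulated correlation decreases monotonically as the probability of default rises.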
3.3 Other empirical estimations
As for corporate debt, some approaches (such as CreditMetrics and KMV) have traditionally
approximated asset return correlations by equity return correlations, which requires the
availability of data on the borrowers considered. Nonetheless, a few studies have used
different techniques. Rösch (2003), for instance, estimated asset correlations of German
corporate obligors via Maximum Likelihood with adaptive Gaussian quadrature. Based on
equations similar to [3.1], he found correlations equal to 0.00052 (0.00857) when
macroeconomic variables were (not) included in the calculations. Following the same
method, Hamerle et al. (2003) examined data on the G7 countries and found correlations
between 0.00113 and 0.02274 (0.00044 and 0.01646) without (with) macroeconomic
variables. Rösch (2005) estimated corporate asset correlations for different rating classes and
found values between 0.0023 and 0.0086 (0.0441 and 0.0655) when macroeconomic factors
were (not) considered. In Hamerle and Rösch (2006), the estimates were even smaller for
German firms in manufacturing (0.000557) and commerce (0.000462). These results are
considerably below the values specified in Basel (between 0.12 and 0.24).
Lopez (2004) studied business loans and calibrated asset correlations by searching for values
that minimized the absolute difference between the credit losses calculated according to two
approaches: the Portfolio Manager™ software (developed by KMV) and the Asymptotic Risk
Factor model (the core theory behind the Basel method). The author found values between
0.1000 and 0.3250, a range wider than the interval stipulated in Basel (see the
previous section).
A number of other studies have focused on retail credit. Rösch and Scheule (2004), for
instance, estimated asset correlations for residential loans, credit cards and other consumer
loans from US commercial banks by using linear regressions of logit and probit
transformations of default rates. Their results were 0.0098, 0.0102 and 0.0073, respectively,
when lagged macroeconomic risk drivers were not included in the regressions and even
smaller (0.0028, 0.0066 and 0.0044, respectively) when macroeconomic variables were
included. Note that these estimates are far below the correlations in Basel: 0.15, 0.04 and
between 0.03 and 0.16, respectively. De Andrade and Thomas (2007) studied retail loans and
interpreted the latent variable as the consumers' creditworthiness. The authors estimated the
correlation across these underlying variables by calculating the mean pairwise correlation
among behavioral scores of 1000 Brazilian consumers and found 0.00095 (for monthly data)
and 0.00047 (for six-monthly time periods), which are much lower than the values in Basel.
Crook and Bellotti (2012) used maximum likelihood to estimate the asset correlation in credit
card loans and found 0.00396 for account-level data (from a particular financial institution)
and 0.018 for aggregate UK data, while the value determined in Basel, 0.04, is substantially
larger.
Das et al. (2007) estimated the pairwise correlation for underlying variables in corporate
credit portfolios by means of simulations and found correlations between 0.01 and 0.04
for time horizons between 2 and 10 years. Although these values are higher than the estimates
mentioned above, they are still much lower than the Basel calibration for corporate debt (see
Section 3.2). Moreover, Das et al. (2007) cannot be compared directly to the previously mentioned
findings since it reports cumulative defaults over several years whilst the other studies consider
defaults in one year.
It is worth noting that Hamdan (1970) showed the equivalence between estimates of the
tetrachoric correlation and estimates of the linear correlation coefficient (ρ) via maximum
likelihood, but this equality is valid only for estimations regarding 2 x 2 contingency tables.
The studies cited above that used maximum likelihood to estimate the linear correlation
across latent variables were based on expression [3.1], which has no relationship with the
formulas employed by Hamdan (1970).
Thus, none of the studies reported in the literature is based on the tetrachoric coefficient and
the aforementioned results show that, in most of the cases, the portfolios tested presented
asset correlations considerably smaller than those specified in Basel.
4. LATENT VARIABLE CORRELATIONS INFERRED FROM DEFAULT RATES
4.1 Using default rates to estimate the tetrachoric coefficient
In the Basel approach, what we observe are continuous latent variables (asset returns of
obligors) dichotomized into the statuses "default" and "non-default". Since these variables
are assumed to be normally distributed (both univariate and bivariate) and continuous, we can
use the tetrachoric coefficient for estimating the linear correlation between them. That is,
since we do not have access to the latent variables themselves we can use the outcome
observed (default/non-default) to estimate their correlation.
To do so, we should define the components of the 2x2 contingency table (a, b, c, and d) as
the number of pairs of loans (joint occurrences) regarding their statuses (default and
non-default). Drawing an analogy with the example given in Section 2.1, each pair of loans
corresponds to a machine: we associate two latent variables (returns on obligors' assets)
with each pair of loans (i.e., one related to loan i and another one related to loan j), just as
we associate two underlying characteristics (operational condition and hours of operation)
with each machine. So, the total number of pairs, N, in a portfolio is the number of all
possible combinations of loans taken two at a time.
Note that the number of loans in the portfolio and its default rate (proportion of defaulted
loans) are sufficient to set the contingency table. Consider a portfolio composed of L loans.
Using a contingency table similar to the one illustrated in Figure 1, the numbers of pairs in
which both loans (i and j) default (a), one loan defaults while the other one does not default
(b + c) and neither of the loans defaults (d) are respectively given by:
a = comb(dr*L, 2)
b + c = (L - (dr*L)) * (dr*L)
d = comb((L - (dr*L)), 2)
where dr is the default rate (proportion of defaulted loans in the portfolio) and L is the
number of loans in the portfolio. comb(u,v) is the combination of u elements taken v at a time,
which implies that this technique is limited to L*dr >= 2 and (L - (L*dr)) >= 2 since u >= v.
These limitations become less restrictive as L increases. That is, the range of unfeasible
default rates shrinks to zero as L approaches infinity.
The number of pairs in which one loan defaults while the other one does not default (b and c)
is split into two equal values because we assume homogeneous behavior (probability of
default) of loans i and j and therefore the frequency of their defaults can be assumed to be the
same. Hence:
b = c = [(L - (dr*L)) * (dr*L)]/2
The contingency table related to this case is displayed in Figure 2.
[Insert Figure 2 here]
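The construction of the table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `pair_contingency_table` is hypothetical, and the number of defaulted loans (dr*L) is passed directly as an integer to avoid rounding a product of floats.

```python
from math import comb


def pair_contingency_table(n_loans: int, n_defaults: int):
    """Return (a, b, c, d), the pair counts of the 2x2 table in Figure 2.

    n_loans    -- L, the number of loans in the portfolio
    n_defaults -- dr*L, the number of defaulted loans
    """
    if not 0 <= n_defaults <= n_loans:
        raise ValueError("n_defaults must lie between 0 and n_loans")
    survivors = n_loans - n_defaults
    a = comb(n_defaults, 2)           # pairs in which both loans default
    mixed = survivors * n_defaults    # pairs with exactly one default (b + c)
    b = c = mixed / 2                 # split equally under homogeneous loans
    d = comb(survivors, 2)            # pairs in which neither loan defaults
    return a, b, c, d


if __name__ == "__main__":
    a, b, c, d = pair_contingency_table(100, 5)   # L = 100, dr = 0.05
    print(a, b, c, d, a + b + c + d)              # the total equals comb(100, 2)
```

The four cells always add up to comb(L, 2), the number of distinct pairs in the portfolio, which is a convenient sanity check on any table built this way.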
Using [2.2], we found the asset correlations inferred from all possible default rates for
portfolios composed of 100, 1000 and 10000 loans. Although it is well known in the
literature (see Duffie et al., 2009, among many others) that default correlations tend to increase with
default rates, we emphasize that, in accordance with the foundation of the Basel model, we
are estimating the correlation across latent variables (assumed to drive defaults) and not
correlations across default rates (or probabilities of default). In this case, it is expected that
both low and high default rates are associated with high levels of correlation across latent
variables and intermediate levels of default rates should be related to the lowest correlations.
The high correlation across latent variables regarding low (high) default rates is explained by
the fact that many loans simultaneously have favorable (unfavorable) latent variables such
as high (low) asset returns which means that these unobserved variables are highly
correlated.
The correlations estimated by means of the tetrachoric coefficient are broadly consistent
with the behavior described above. The values calculated are not a monotonic function of the
default rates: the correlations increase up to a maximum point and then fall to a
particular level. Next, they go up and reach the same maximum point again before decreasing.
This behavior is represented in Figure 3 for portfolios composed of 100 (Panel A), 1000
(Panel B), and 10000 (Panel C) loans.
[Insert Figure 3 here]
We see that a portfolio containing 100 loans has its minimum correlation
(0.0159) when the default rate is 0.50, while the maximum correlation (0.0304) is related to a
default rate equal to 0.05 or 0.95.
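The magnitude reported for the 100-loan portfolio at a default rate of 0.50 can be checked with a short numerical sketch. Note the assumptions: the code does not use the series in [2.2]; it solves the defining bivariate normal condition directly (the joint upper-tail probability must equal a/N), integrating the bivariate density over the correlation with Simpson's rule and locating the root by bisection. The thresholds h and k are taken as the normal quantiles of the table marginals, all function names are hypothetical, and values at other default rates need not coincide with those obtained from [2.2].

```python
from math import exp, pi, sqrt
from statistics import NormalDist

_N01 = NormalDist()  # standard normal distribution


def _bvn_density(h: float, k: float, t: float) -> float:
    """Standard bivariate normal density at (h, k) with correlation t."""
    one_minus_t2 = 1.0 - t * t
    quad = (h * h - 2.0 * t * h * k + k * k) / (2.0 * one_minus_t2)
    return exp(-quad) / (2.0 * pi * sqrt(one_minus_t2))


def _upper_orthant(h: float, k: float, rho: float, steps: int = 200) -> float:
    """P(X > h, Y > k) under correlation rho.

    Uses the identity P = P(rho = 0) + integral of the density over t in [0, rho],
    evaluated with Simpson's rule (steps must be even; a negative rho gives a
    negative step width and hence a negative integral).
    """
    width = rho / steps
    total = _bvn_density(h, k, 0.0) + _bvn_density(h, k, rho)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * _bvn_density(h, k, i * width)
    independent = (1.0 - _N01.cdf(h)) * (1.0 - _N01.cdf(k))
    return independent + total * width / 3.0


def tetrachoric(a: float, b: float, c: float, d: float, iterations: int = 60) -> float:
    """Correlation that reproduces cell a of a 2x2 table (marginals must be in (0, 1))."""
    n = a + b + c + d
    h = _N01.inv_cdf(1.0 - (a + b) / n)   # threshold implied by the row marginal
    k = _N01.inv_cdf(1.0 - (a + c) / n)   # threshold implied by the column marginal
    target = a / n
    lo, hi = -0.999, 0.999
    for _ in range(iterations):           # the joint probability increases with rho
        mid = (lo + hi) / 2.0
        if _upper_orthant(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # L = 100, dr = 0.50: a = d = comb(50, 2) = 1225 and b = c = 50 * 50 / 2 = 1250.
    print(abs(tetrachoric(1225, 1250, 1250, 1225)))
```

The sketch compares magnitudes only, because the sign of the raw estimate depends on how the default and non-default statuses are arranged in the table (see Section 4.2).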
As for the portfolios composed of 1000 and 10000 loans, the lowest correlations (0.000412
and 0.0000044, respectively) are associated with the lowest default rates tested (0.002 and
0.0002, respectively). On the other hand, each of the highest correlations (0.003113 and
0.000312, respectively) is related to the same default rates concerning the case of 100-loan
portfolios: 0.05 and 0.95 (if we consider only two decimal places). Thus, in general, default
rates equal to 0.05 or 0.95 are related to the highest asset correlations and, when the size of
the portfolio increases, the default rate associated with the lowest correlation tends to zero.
These results are lower than the values specified in Basel (see Section 3.2) and our results are
more compatible with the findings in the studies cited in Section 3.3.
We challenge the notion (in Basel) that, for some credit classes (e.g. corporate loans), the
probability of default (or, in our interpretation, default rates) has a monotonic relationship
with the correlation across underlying assets. Either low or high default rates indicate high
correlation across assets since both situations show that the latent variables (returns on
debtors' assets) are at similar levels (either high or low, respectively), which contributes to
(though does not guarantee) an increase in their correlations.
Moreover, a default rate close to one does not imply that the correlation among latent
variables is close to one as well because even though the latent variables are below a
threshold that characterizes default they still have a range of possible values and may behave
in diverse ways in that interval (which reduces their correlation).
Note that the tetrachoric correlation in the examples mentioned above (machines and credit
portfolio) is based on cross-sectional data (instead of time-series data) and therefore we are
using a different approach to dependence. Even so, we believe our approach is more
adequate (than the method currently employed) to capture the real (underlying) association
across loans in portfolios since the tetrachoric model focuses on the most important aspect in
this context: the default rates. Regulators and financial institutions can use fixed or rolling
windows to calculate average default rates and then estimate the related (inferred)
correlation. So, the correlation can be constantly updated (time-varying) to reflect the
dynamics of credit portfolios (more recent default rates).
4.2 A simplified formula
Unfortunately, the approximations mentioned in Section 2.3 typically lead to biases in the
context of default rates. To see this, recall that the estimates (approximations) are equal to the
correlation coefficient only when the contingency tables present the equalities a + b = c + d
and a + c = b + d. Hence, we would get unbiased correlation coefficients only if the number
of pairs of joint default is equal to the number of pairs of simultaneous non-default (using the
notation in Section 4.1, a + b = c + d, which implies a = d since b = c).
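To see how restrictive this condition is, the following short derivation may help. It is a sketch that uses the pair counts of Section 4.1, with m = dr*L introduced here as shorthand for the number of defaulted loans, and it shows that a = d holds only at a default rate of one half:

```latex
\begin{aligned}
a = d \;&\Longleftrightarrow\; \binom{m}{2} = \binom{L-m}{2}, \qquad m = dr \cdot L \\
&\Longleftrightarrow\; m(m-1) = (L-m)(L-m-1) \\
&\Longleftrightarrow\; (L-1)(L-2m) = 0 \\
&\Longleftrightarrow\; dr = \tfrac{1}{2} \quad (L > 1).
\end{aligned}
```

For every other default rate the median-division condition fails, so the approximations carry some bias.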
For the sake of simplicity, some of these approximations can be used at the expense of some
bias. This is an alternative for regulators and practitioners to facilitate the implementation of
this approach. As an example, we test the case where all the terms except for the first term in
equation [2.2] are excluded (assumed to be close to zero) as in Cheng and Popovich (2002):
$$ r \approx \frac{ad - bc}{N^{2}\,\phi(h)\,\phi(k)} \qquad \text{[4.1]} $$
where r is the tetrachoric coefficient and the definition of the other terms follows [2.1]. We
then check the difference between the coefficients estimated from the extended and the
simplified equations, [2.2] and [4.1] respectively.
The shapes of the plots relating the asset correlation to the default rates are basically the
same as those displayed in Figure 3. Nevertheless, on average, the correlations estimated
from the simplified formula are smaller than those derived from the extended formula: 0.0198
against 0.0204 for 100 loans, 0.00193 against 0.00202 for 1000 loans and 0.000192 against
0.000202 for 10000 loans. According to these values, the asset correlation appears to be
close to inversely proportional to the number of loans in the portfolio in
both the simplified and the extended formula: in this example, when the number of loans is
multiplied by 10 the respective asset correlation is roughly divided by 10.
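A minimal Python sketch of the simplified formula [4.1] follows. It assumes that h and k are the standard normal quantiles of the row and column marginals of the table and that the denominator uses the standard normal density at those points (the definitions in [2.1] are not repeated here, so this reading is an assumption); the function names are hypothetical.

```python
from math import comb
from statistics import NormalDist

_N01 = NormalDist()  # standard normal distribution


def simplified_tetrachoric(a: float, b: float, c: float, d: float) -> float:
    """First-term approximation: (ad - bc) / (N^2 * phi(h) * phi(k))."""
    n = a + b + c + d
    h = _N01.inv_cdf((a + b) / n)   # quantile of the row marginal
    k = _N01.inv_cdf((a + c) / n)   # quantile of the column marginal
    return (a * d - b * c) / (n * n * _N01.pdf(h) * _N01.pdf(k))


def simplified_from_defaults(n_loans: int, n_defaults: int) -> float:
    """Build the pair table of Section 4.1 and apply the simplified formula."""
    survivors = n_loans - n_defaults
    a = comb(n_defaults, 2)
    b = c = survivors * n_defaults / 2
    d = comb(survivors, 2)
    return simplified_tetrachoric(a, b, c, d)


if __name__ == "__main__":
    # L = 100 with 50 defaults (dr = 0.50); compare magnitudes only.
    print(abs(simplified_from_defaults(100, 50)))
```

Because the normal density is symmetric, using the quantile of a marginal or of its complement gives the same denominator; only the numerator determines the sign.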
As in the results obtained from the extended formula [2.2], the lowest asset correlations
estimated via the simplified formula [4.1] were associated with the default rate equal to 0.50
(for the 100-loan portfolio) and the smallest default rate tested (for the other two portfolios).
However, the default rates concerning the highest asset correlations were 0.07 and 0.93
(instead of 0.05 and 0.95 in the case of the extended formula).¹
Bear in mind that the position of the frequencies in the contingency table may result in a
tetrachoric coefficient of the same magnitude but with the opposite sign of the coefficient
calculated from [4.1] and a contingency table like the one displayed in Figure 4. If we have a
contingency table in which the position of the default and non-default statuses for one of the
loans (i, for example) are different from the table in Figure 4 (see Figure 5), the tetrachoric
correlation would have the same magnitude but the inverse sign. This can be easily confirmed
if we compare the numerator of [4.1] to the numerator of the following equation related to the
contingency table shown in Figure 5:
$$ r \approx \frac{bc - ad}{N^{2}\,\phi(h)\,\phi(k)} $$
Hence, in this context, we should opt for the contingency table that results in a positive
coefficient since the formula used to estimate extreme losses [3.1] is not compatible with
negative correlations.
[Insert Figures 4 and 5 here]
4.3 An adjustment
The general quadratic function found in Sections 4.1 and 4.2 (as shown in Figure 3) was
expected given that very low or very high default rates must be associated with a high
connection across the latent variables whilst intermediate default rates tend to be related to a
weak relationship across the latent variables. However, the results concerning the extreme
default rates (less than 0.05 and higher than 0.95 for the extended formula in Section 4.1 or
less than 0.07 and higher than 0.93 for the simplified formula in Section 4.2) are inconsistent
with the values intuitively expected.
¹ Note that, in both cases, the summation of these default rates is 1, which reveals the symmetry of the
two highest asset correlations around an intermediate default rate. Figure 3 confirms this.
In order to adjust our findings to plausible values for extreme default rates we suggest the use
of the respective maximum tetrachoric correlation for all default rates below 0.05 and above
0.95 for the extended formula (as illustrated in Figure 6) or below 0.07 and above 0.93 for the
simplified formula. For example, with regard to the 1000-loan portfolio considered in Section
4.1 (extended formula), we should use the correlation 0.003113 for any default rate equal to
or lower than 0.05 and equal to or higher than 0.95.
[Insert Figure 6 here]
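The adjustment amounts to flattening the curve at its maximum beyond the two cut-off default rates. A minimal sketch is given below; the function name `apply_plateau` is hypothetical and the numbers in the usage example are illustrative placeholders, not results from this study.

```python
def apply_plateau(default_rates, correlations, lower=0.05, upper=0.95):
    """Replace the correlation by the curve's maximum at extreme default rates.

    default_rates -- default rates at which the correlation was estimated
    correlations  -- estimated correlations, in the same order
    lower, upper  -- cut-offs (0.05 and 0.95 for the extended formula,
                     0.07 and 0.93 for the simplified one)
    """
    if len(default_rates) != len(correlations):
        raise ValueError("inputs must have the same length")
    ceiling = max(correlations)
    return [
        ceiling if (dr <= lower or dr >= upper) else rho
        for dr, rho in zip(default_rates, correlations)
    ]


if __name__ == "__main__":
    # Illustrative (hypothetical) curve with the two-peak shape described above.
    rates = [0.002, 0.05, 0.50, 0.95, 0.998]
    curve = [0.0004, 0.0031, 0.0016, 0.0031, 0.0004]
    print(apply_plateau(rates, curve))
```

The cut-offs are inclusive, following the wording "equal to or lower than 0.05 and equal to or higher than 0.95".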
5. CONCLUSIONS
Given that the correlation typically applied in the calculation of extreme credit losses refers to
a relationship across dichotomized continuous latent variables, the tetrachoric coefficient is
more appropriate than the product-moment correlation coefficient to measure such a relationship.
Our approach admittedly relies on an unusual type of data in these circumstances
(cross-sectional rather than time-series) but, given the rationale behind the credit risk models
considered in this study (notably the Basel Accord method), the tetrachoric correlation is an
appropriate dependence measure in this context and seems to be the most adequate one.
We show that, irrespective of the credit segment, it suffices to know the default rate and the
number of loans in the portfolio to estimate the tetrachoric coefficient. Therefore, this
measure reflects the reality of the portfolios since their default rates are closely related to the
association across the underlying variables assumed to drive defaults. Our results confirm the
findings in other empirical studies according to which asset correlations in Basel are
overestimated.
Although the estimation of the tetrachoric coefficient has no analytical form, some closed-
form approximations are available and yield relatively satisfactory estimates. So, due to the
advantages pointed out in this paper, the tetrachoric coefficient is an option for regulators and
financial institutions to evaluate correlations in credit risk models.
Some possibilities of future research are the application of the approach suggested here to
loan portfolios of financial institutions and the assessment of the evolution of asset
correlations over time (as a function of the oscillation of the default rates).
REFERENCES
Basel Committee on Banking Supervision (BCBS) (2006), International Convergence of
Capital Measurement and Capital Standards: A Revised Framework. Bank for
International Settlements.
Castellan Jr., N. John (1966), On the estimation of the tetrachoric correlation coefficient.
Psychometrika, Vol. 31, No. 1, pp. 67-73.
Cheng, P. Y., Popovich, P. M. (2002), Correlation: Parametric and Nonparametric
Measures. Sage University Press Series on Quantitative Applications in the Social
Sciences, 07-139. Thousand Oaks, US: Sage.
Crook, J., Bellotti, T. (2012), Asset correlations for credit card defaults. Applied Financial
Economics, Vol. 22, Issue 2, pp. 87-95.
Das, S., Duffie, D., Kapadia, N., Saita, L. (2007), Common Failings: How Corporate Defaults
Are Correlated. The Journal of Finance, Vol. LXII, No. 1, pp. 93-118.
De Andrade, F., Thomas, L. (2007), Structural models in consumer credit. European Journal
of Operational Research, 183, pp. 1569-1581.
Digby, P. G. N. (1983), Approximating the Tetrachoric Correlation Coefficient. Biometrics,
Vol. 39, No. 3, pp. 753-757.
Divgi, D. V. (1979), Calculation of the tetrachoric correlation coefficient. Psychometrika,
Vol. 44, No. 2, pp. 169-172.
Duffie, D., Eckner, A., Horel, G., Saita, L. (2009), Frailty Correlated Default. The Journal of
Finance, Vol. LXIV, No. 5, pp. 2089-2123.
Hamdan, M. A. (1970), The Equivalence of Tetrachoric and Maximum Likelihood Estimates
of ρ in 2 x 2 Tables. Biometrika, Vol. 57, No. 1, pp. 212-215.
Hamerle, A., Liebig, T., Rösch, D. (2003), Benchmarking Asset Correlations. Deutsche
Bundesbank Discussion Paper.
Hamerle, A., Rösch, D. (2006), Parameterizing Credit Risk Models. Journal of Credit Risk,
Vol. 2, pp. 102-122.
Harris, B. (1988), Tetrachoric correlation coefficient. In: Kotz, L., N. Johnson (Eds.),
Encyclopedia of Statistical Sciences, Vol. 9, New York: Wiley, pp. 223-225.
Juras, J., Pasarić, Z. (2006), Application of tetrachoric and polychoric correlation coefficients
to forecast verification. Geofizika, Vol. 23, No. 1, pp. 59-82.
Lopez, Jose A. (2004), The empirical relationship between average asset correlation, firm
probability of default, and asset size. Journal of Financial Intermediation, Vol. 13, Issue
2, pp. 265-283.
Merton, Robert C. (1974), On the Pricing of Corporate Debt: The Risk Structure of Interest
Rates. Journal of Finance, 29, pp. 449-470.
Pearson, Karl (1900), Mathematical Contributions to the Theory of Evolution. VII. On the
Correlation of Characters not Quantitatively Measurable. Philosophical Transactions of
the Royal Society of London, Series A, Vol. 195, pp. 1-47.
Rösch, Daniel (2003), Correlations and business cycles of credit risk: evidence from
bankruptcies in Germany. Financial Markets and Portfolio Management, Vol. 17, No. 3,
pp. 309-331.
Rösch, Daniel (2005), An empirical comparison of default risk forecasts from alternative
credit rating philosophies. International Journal of Forecasting, 21, pp. 37-51.
Rösch, D., Scheule, H. (2004), Forecasting Retail Portfolio Credit Risk. The Journal of Risk
Finance, Winter/Spring, pp. 16-32.
                          Hours of operation
Operational condition     Short (Y=0)    Long (Y=1)    Total
Good (X=1)                a              b             a+b
Poor (X=0)                c              d             c+d
Total                     a+c            b+d           a+b+c+d = N
Figure 1: A 2x2 contingency table associated with the example presented in Section 2.1; a, b,
c and d stand for the number of machines with the respective characteristics.
                              Default status (loan i)
Default status (loan j)       default                         non-default
default                       comb(dr*L,2)                    [(L - (dr*L)) * (dr*L)]/2
non-default                   [(L - (dr*L)) * (dr*L)]/2       comb((L - (dr*L)),2)
Figure 2: Default statuses represented in a 2x2 contingency table in terms of the default rate
(dr) and the number of loans in the portfolio (L). comb(u,v) represents the combination of u
elements taken v at a time. For convenience, the row and the column regarding the total
observations for loans i and j were omitted.
[Figure 3: Asset correlation as a function of default rate. Horizontal axis: default rate (0 to 1);
vertical axis: tetrachoric correlation. Panel A: portfolio composed of 100 loans. Panel B:
portfolio composed of 1000 loans. Panel C: portfolio composed of 10000 loans.]
                              Default status (loan i)
Default status (loan j)       default    non-default    Total
default                       a          b              a+b
non-default                   c          d              c+d
Total                         a+c        b+d            a+b+c+d = N
Figure 4: Default statuses represented in a 2x2 contingency table; a, b, c and d denote the
number of pairs of loans with the respective statuses.
                              Default status (loan i)
Default status (loan j)       non-default    default    Total
default                       b              a          a+b
non-default                   d              c          c+d
Total                         b+d            a+c        a+b+c+d = N
Figure 5: An alternative representation of default statuses in a 2x2 contingency table; a, b, c
and d denote the number of pairs of loans with the respective statuses.
[Figure 6: Asset correlation as a function of default rate (plot adjusted at extreme points,
default rates lower than 0.05 and higher than 0.95). Horizontal axis: default rate (0 to 1);
vertical axis: tetrachoric correlation. Panel A: portfolio composed of 100 loans. Panel B:
portfolio composed of 1000 loans. Panel C: portfolio composed of 10000 loans.]