Forecasting Volatility
Learning Outcome
discuss methods of forecasting volatility
In some applications, the analyst is concerned with forecasting the variance for only a single asset. More often, however, the analyst needs to forecast the variance–covariance matrix for several, perhaps many, assets in order to analyze the risk of portfolios. Estimating a single variance that is believed to be constant is straightforward: The familiar sample variance is unbiased and its precision can be enhanced by using higher-frequency data. The analyst’s task becomes more complicated if the variance is not believed to be constant or the analyst needs to forecast a variance–covariance (VCV) matrix. These issues are addressed in this section. In addition, we elaborate on de-smoothing real estate and other returns.
Estimating a Constant VCV Matrix with Sample Statistics
The simplest and most heavily used method for estimating constant variances and covariances is to use the corresponding sample statistic—variance or covariance—computed from historical return data. These elements are then assembled into a VCV matrix. There are two main problems with this method, both related to sample size. First, given the short to intermediate sample periods typical in finance, the method cannot be used to estimate the VCV matrix for large numbers of assets. If the number of assets exceeds the number of historical observations, then some portfolios will erroneously appear to be riskless. Second, given typical sample sizes, this method is subject to substantial sampling error. A useful rule of thumb that addresses both of these issues is that the number of observations should be at least 10 times the number of assets in order for the sample VCV matrix to be deemed reliable. In addition, since each element is estimated without regard to any of the others, this method does not address the issue of imposing cross-sectional consistency.
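As a concrete sketch of the mechanics, the snippet below computes a sample VCV matrix from simulated monthly returns and checks both sample-size conditions discussed above. The dimensions and return-generating assumptions are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical data: T monthly returns for N assets (values are simulated,
# purely for illustration).
rng = np.random.default_rng(seed=42)
T, N = 120, 20                               # 10 years of monthly data, 20 assets
returns = rng.normal(0.005, 0.04, size=(T, N))

# Sample VCV matrix: each element is the sample variance or covariance.
sample_vcv = np.cov(returns, rowvar=False)   # N x N matrix

# Two practical checks mentioned in the text:
# 1) If N > T, the sample VCV matrix is rank-deficient and some portfolios
#    will erroneously appear to be riskless.
rank_ok = N <= T
# 2) Rule of thumb: at least 10 observations per asset for reliable estimates.
rule_of_thumb_ok = T >= 10 * N

print(f"Sample VCV shape: {sample_vcv.shape}")
print(f"N <= T (no spurious riskless portfolios): {rank_ok}")
print(f"T >= 10 * N (rule-of-thumb sample size):  {rule_of_thumb_ok}")
```

With 120 observations and 20 assets, the first check passes but the 10-to-1 rule of thumb does not, which is exactly the situation that motivates the structured approaches discussed next.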
VCV Matrices from Multi-Factor Models
Factor models have become the standard method of imposing structure on the VCV matrix of asset returns. From this perspective, their main advantage is that the number of assets can be very large relative to the number of observations. The key to making this work is that the covariances are fully determined by exposures to a small number of common factors whereas each variance includes an asset-specific component.
In a model with K common factors, the return on the ith asset is given by
Ri = αi + βi1F1 + βi2F2 + … + βiKFK + εi   (10)
where αi is a constant intercept, βik is the asset’s sensitivity to the kth factor, Fk is the kth common factor return, and εi is a stochastic term with a mean of zero that is unique to the ith asset. In general, the factors will be correlated. Given the model, the variance of the ith asset is

σi² = Σm Σn βim βin ρmn + νi²   (11)

where the sums run over the K factors (m, n = 1, …, K), ρmn is the covariance between the mth and nth factors, and νi² is the variance of the unique component of the ith asset’s return. The covariance between the ith and jth assets is

σij = Σm Σn βim βjn ρmn   (12)

As long as none of the factors are redundant and none of the asset returns are completely determined by the factors (so νi² ≠ 0), there will not be any portfolios that erroneously appear to be riskless. That is, we will not encounter the first problem mentioned earlier with respect to using sample statistics.

Imposing structure with a factor model makes the VCV matrix much simpler. With N assets, there are [N(N − 1)/2] distinct covariance elements in the VCV matrix. For example, if N = 100, there are 4,950 distinct covariances to be estimated. The factor model reduces this problem to estimating [N × K] factor sensitivities plus [K(K + 1)/2] elements of the factor VCV matrix, Ω. With N = 100 and K = 5, this would mean “only” 500 sensitivities and 15 elements of the factor VCV matrix—almost a 90% reduction in items to estimate. (Of course, we also need to estimate the asset-specific variance terms, νi², in order to get the N variances, σi².) If the factors are chosen well, the factor-based VCV matrix will contain substantially less estimation error than the sample VCV matrix does.
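In matrix form, Equations 11 and 12 say that the factor-based VCV matrix equals BΩBᵀ plus a diagonal matrix of asset-specific variances, where B is the N × K matrix of sensitivities. The sketch below, using simulated sensitivities and factor returns (all values are hypothetical), assembles such a matrix and reproduces the parameter-count comparison just discussed.

```python
import numpy as np

# Hypothetical dimensions: N assets, K common factors (illustrative values only).
rng = np.random.default_rng(seed=7)
N, K = 100, 5

B = rng.normal(1.0, 0.3, size=(N, K))          # factor sensitivities (betas)
factor_returns = rng.normal(0.0, 0.02, size=(600, K))
omega = np.cov(factor_returns, rowvar=False)   # K x K factor VCV matrix
nu2 = rng.uniform(0.0001, 0.0004, size=N)      # asset-specific variances

# Factor-based VCV matrix: covariances come from the factors alone;
# variances add the asset-specific component on the diagonal.
factor_based_vcv = B @ omega @ B.T + np.diag(nu2)

# Parameter count: N*K sensitivities + K(K+1)/2 factor terms (plus N nu2 terms),
# versus N(N-1)/2 distinct sample covariances.
n_factor_params = N * K + K * (K + 1) // 2
print(f"Distinct sample covariances:       {N * (N - 1) // 2}")
print(f"Factor-model parameters (ex. nu2): {n_factor_params}")
```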
A well-specified factor model can also improve cross-sectional consistency. To illustrate, suppose we somehow know that the true covariance of any asset i with any asset j is proportional to asset i’s covariance with any third asset, k, so

σij = φ σik

for any assets i, j, and k, where the proportionality factor φ depends on j and k but not on i. We would want our estimates to come as close as possible to satisfying this relationship. Sample covariances computed from any given sample of returns will not, in general, do so. However, using Equation 12 with only one factor (i.e., K = 1) shows that the covariances from a single-factor model will satisfy

σij/σik = (βi1 βj1 ρ11)/(βi1 βk1 ρ11) = βj1/βk1
for all assets i, j, and k. Thus, in this simple example, a single-factor model imposes exactly the right cross-sectional structure.
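To see this concretely, a few lines of Python (with made-up betas and factor variance; asset-specific variance terms are omitted because only covariances between distinct assets are needed) confirm that the ratio does not depend on which asset i is used:

```python
import numpy as np

# Single-factor case (K = 1): sigma_ij = beta_i * beta_j * rho_11 for i != j,
# so sigma_ij / sigma_ik reduces to beta_j / beta_k regardless of i.
betas = np.array([0.8, 1.0, 1.3, 0.6])   # beta_i1 for four hypothetical assets
rho_11 = 0.02 ** 2                       # variance of the single factor
cov = np.outer(betas, betas) * rho_11    # off-diagonal elements are the covariances

i, j, k = 0, 1, 2
print(cov[i, j] / cov[i, k], betas[j] / betas[k])   # identical ratios
i = 3
print(cov[i, j] / cov[i, k], betas[j] / betas[k])   # same ratio for a different i
```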
The benefits obtained by imposing a factor structure—handling large numbers of assets, a reduced number of parameters to be estimated, imposition of cross-sectional structure, and a potentially substantial reduction of estimation error—come at a cost. In contrast to the simple example just discussed, in general, the factor model will almost certainly be mis-specified. The structure it imposes will not be exactly right. As a result, the factor-based VCV matrix is biased; that is, the expected value is not equal to the true (unobservable) VCV matrix of the returns. To put it differently, the matrix is not correct even “on average.” The matrix is also inconsistent; that is, it does not converge to the true matrix as the sample size gets arbitrarily large. In contrast, the sample VCV matrix is unbiased and consistent. Thus, when we use a factor-based matrix instead of the sample VCV matrix, we are choosing to estimate something that is “not quite right” with relative precision rather than the “right thing” with a lot of noise. The point is that although factor models are very useful, they are not a panacea.
Shrinkage Estimation of VCV Matrices
As with shrinkage estimation in general, the idea here is to combine the information in the sample data, the sample VCV matrix, with an alternative estimate, the target VCV matrix—which reflects assumed “prior” knowledge of the structure of the true VCV matrix—and thereby mitigate the impact of estimation error on the final matrix. Each element (variance or covariance) of the final shrinkage estimate of the VCV matrix is simply a weighted average of the corresponding elements of the sample VCV matrix and the target VCV matrix. The same weights are used for all elements of the matrix. The analyst must determine how much weight to put on the target matrix (the “prior” knowledge) and how much weight to put on the sample data (the sample VCV matrix).
Aside from a technical condition that rules out the appearance of riskless portfolios, virtually any choice of target VCV matrix will increase (or at least not decrease) the efficiency of the estimates versus the sample VCV matrix. “Efficiency” in this context means a smaller mean-squared error (MSE), which is equal to an estimator’s variance plus the square of its bias. Although the shrinkage estimator is biased, its MSE will in general be smaller than the MSE of the (unbiased) sample VCV matrix. The more plausible (and presumably less biased) the selected target matrix, the greater the improvement will be. A factor-model-based VCV matrix would be a reasonable candidate for the target.
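A minimal sketch of the calculation follows; the helper name shrink_vcv and the 2 × 2 matrices are invented purely for illustration.

```python
import numpy as np

def shrink_vcv(sample_vcv, target_vcv, w):
    """Element-by-element weighted average of a sample VCV matrix and a target
    VCV matrix. The weight w on the target ("prior") matrix is the same for
    every element, as described in the text."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("weight on the target matrix must lie in [0, 1]")
    return w * np.asarray(target_vcv) + (1.0 - w) * np.asarray(sample_vcv)

# Toy illustration (numbers are made up): a noisy sample matrix is pulled
# toward a smoother factor-model-based target.
sample = np.array([[0.0400, 0.0150],
                   [0.0150, 0.0900]])
target = np.array([[0.0380, 0.0210],
                   [0.0210, 0.0850]])
print(shrink_vcv(sample, target, w=0.5))
```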
Solution:
The VCV matrix based on sample statistics is correct on average (it is unbiased) and converges to the true VCV matrix as the sample size gets arbitrarily large (it is “consistent”). The sample VCV method cannot be used if the number of assets exceeds the number of observations, which is not an issue in this case. However, it is subject to large sampling errors unless the number of observations is large relative to the number of assets. A 10-to-1 rule of thumb would suggest that Berkitz needs more than 250 observations (20+ years of monthly data) in order for the sample VCV matrix to give her reliable estimates, but she has at most 120 observations. In addition, the sample VCV matrix does not impose any cross-sectional consistency on the estimates. A factor-model-based VCV matrix can be used even if the number of assets exceeds the number of observations. It can substantially reduce the number of unique parameters to be estimated, it imposes cross-sectional structure, and it can substantially reduce estimation errors. However, unless the structure imposed by the factor model is exactly correct, the VCV matrix will not be correct on average (it will be biased). Shrinkage estimation—a weighted average of the sample VCV and factor-based VCV matrices—will increase (or at least not decrease) the efficiency of the estimates. In effect, the shrinkage estimator captures the benefits of each underlying methodology and mitigates their respective limitations.
Estimating Volatility from Smoothed Returns
The available return data for such asset classes as private real estate, private equity, and hedge funds generally reflect smoothing of unobservable underlying “true” returns. The smoothing dampens the volatility of the observed data and distorts correlations with other assets. Thus, the raw data tend to understate the risk and overstate the diversification benefits of these asset classes. Failure to adjust for the impact of smoothing will almost certainly lead to distorted portfolio analysis and hence poor asset allocation decisions.
The basic idea is that the observed returns are a weighted average of current and past true, unobservable returns. One of the simplest and most widely used models implies that the current observed return, Rt, is a weighted average of the current true return, rt, and the previous observed return:

Rt = (1 − λ)rt + λRt−1, where 0 < λ < 1

Assuming the true returns are serially uncorrelated, this smoothing implies var(r) = [(1 + λ)/(1 − λ)]var(R), so the observed data understate the true variance. As an example, if λ = 0.8, then the true variance, var(r), of the asset is 9 times the variance of the observed data. Equivalently, the standard deviation is 3 times larger.
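A quick simulation (made-up parameters, simulated “true” returns, and λ treated as known purely for the illustration) confirms the variance relationship and shows how observed returns can be de-smoothed by inverting the smoothing equation:

```python
import numpy as np

# Simulate "true" returns, smooth them with lambda = 0.8, and verify that the
# observed variance is roughly one-ninth of the true variance.
rng = np.random.default_rng(seed=1)
lam = 0.8
true_r = rng.normal(0.01, 0.06, size=100_000)

observed = np.empty_like(true_r)
observed[0] = true_r[0]
for t in range(1, len(true_r)):
    observed[t] = (1 - lam) * true_r[t] + lam * observed[t - 1]

print(np.var(true_r) / np.var(observed))          # approximately 9

# De-smooth by inverting the smoothing equation:
#   r_t = (R_t - lam * R_{t-1}) / (1 - lam)
desmoothed = (observed[1:] - lam * observed[:-1]) / (1 - lam)
print(np.std(desmoothed) / np.std(observed[1:]))  # approximately 3
```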
This model cannot be estimated directly because the true return, rt, is not observable. To get around this problem, the analyst assumes a relationship between the unobservable return and one or more observable variables. For private real estate, a natural choice might be a REIT index, whereas for private equity, an index of similar publicly traded equities could be used.
Time-Varying Volatility: ARCH Models
The discussion up to this point has focused on estimating variances and covariances under the assumption that their true values do not change over time. It is well known, however, that financial asset returns tend to exhibit volatility clustering, evidenced by periods of high and low volatility. A class of models known collectively as autoregressive conditional heteroskedasticity (ARCH) models has been developed to address these time-varying volatilities.
One of the simplest and most heavily used forms of this broad class of models specifies that the variance in period t is given by

σt² = γ + ασt−1² + βηt²
    = γ + (α + β)σt−1² + β(ηt² − σt−1²)

where α, β, and γ are non-negative parameters such that (α + β) < 1. The term ηt is the unexpected component of return in period t; that is, it is a random variable with a mean of zero conditional on information at time (t − 1). Rearranging the equation as in the second line shows that

(ηt² − σt−1²)

can be interpreted as the “shock” to the variance in period t. Thus, the variance in period t depends on the variance in period (t − 1) plus a shock. The parameter β controls how much of the current “shock” feeds into the variance. In the extreme, if β = 0, then variance would be deterministic. The quantity (α + β) determines the extent to which the variance in future periods is influenced by the current level of volatility. The higher (α + β) is, the more the variance “remembers” what happened in the past and the more it “clusters” at high or low levels. The unconditional expected value of the variance is [γ/(1 − α − β)].
As an example, assume that γ = 0.000002, α = 0.9, and β = 0.08 and that we are estimating daily equity volatility. Given these parameters, the unconditional expected value of the variance is 0.0001, implying that the daily standard deviation is 1% (0.01). Suppose the estimated variance at time (t − 1) was 0.0004 (= 0.02²) and the return in period t was 3% above expectations (ηt = 0.03). Then the variance in period t would be

σt² = 0.000002 + 0.9(0.0004) + 0.08(0.03)² = 0.000434

which is equivalent to a daily standard deviation of approximately 2.08%.
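The arithmetic can be checked with a few lines of Python; the variable names are chosen for readability and are not taken from any particular library.

```python
# Variance update of the form described in the text:
#   sigma2_t = gamma + alpha * sigma2_prev + beta * eta_t**2
gamma, alpha, beta = 0.000002, 0.9, 0.08

# Unconditional expected variance: gamma / (1 - alpha - beta)
uncond_var = gamma / (1 - alpha - beta)
print(uncond_var, uncond_var ** 0.5)      # roughly 0.0001 and 0.01 (1% daily)

# Update given the prior variance and the return surprise from the example
sigma2_prev = 0.0004                      # (0.02)**2
eta_t = 0.03                              # return 3% above expectations
sigma2_t = gamma + alpha * sigma2_prev + beta * eta_t ** 2
print(sigma2_t, sigma2_t ** 0.5)          # 0.000434 and roughly 0.0208
```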
The ARCH methodology can be extended to multiple assets—that is, to estimation of a VCV matrix. The most straightforward extensions tend to be limited to only a few assets since the number of parameters rises very rapidly. However, Engle (2002) developed a class of models with the potential to handle large matrices with relatively few parameters.