Counseling Center
University of Maryland
College Park, Maryland
Using Ridge Regression with Non-cognitive Variables by Race in Admissions
Terence J. Tracey and William E. Sedlacek
Research Report # 1-83
A version of this paper was presented at the annual meeting of the American
Educational Research Association, Montreal, April, 1983.
Terence J. Tracey is now an Assistant Professor in Educational Psychology at the
University of Illinois, Champaign/Urbana. William E. Sedlacek is Assistant
Director of the Counseling Center at the University of Maryland.
The relative predictive validity of two regression
methods, ordinary least squares (OLS) and ridge regression, was compared using
SAT scores and the eight non-cognitive measures posited by Tracey and Sedlacek
(in press) as predictor variables. It was hypothesized that, given the high
degree of multicollinearity of the predictor set, ridge regression would
provide less cross-validated shrinkage than would the OLS method. Sub-samples,
a different one for each race, were drawn and prediction equations were
generated for each regression method. These equations were then applied to the
entire racial sample and the shrinkage was examined. The results demonstrated
that the two methods were equally valid.
Ridge regression has been developed as an alternative
prediction method to ordinary least squares regression (OLS) in cases where
there is a high degree of multicollinearity among the predictor variables.
Predicting academic success is an area where there is a high degree of overlap
among the predictor variables, and it should thus be an excellent area in which to apply
ridge regression instead of the usual OLS regression. Obtaining valid
prediction equations in this area is often difficult because the high degree of
multicollinearity tends to create very different prediction equations from year
to year. Thus, the process of validating these equations typically involves
collecting very large samples over many years. Ridge regression has been found
to be most effective in exactly these cases. Ridge regression should result in
more stable equations with highly multicollinear data and thus should be more
valid using smaller samples than typically required by least squares methods.
Therefore, it would be quite valuable to colleges and universities to use ridge
regression if it resulted in stable, valid prediction equations based on much
smaller samples. The purpose of this paper is to describe a study conducted to
examine the effectiveness of applying ridge regression procedures rather than OLS regression
procedures to admissions data.
When least squares regressions are used on highly
interrelated data (high in multicollinearity), the resulting beta weights tend
to be very unstable, and when prediction equations are cross-validated,
the shrinkage in prediction tends to be great
(Darlington, 1978; Faden, 1978). Ridge regression was
developed to be used in exactly these situations of high multicollinearity
(Darlington, 1978; Dempster, Schatzoff, & Wermuth, 1977; Hoerl &
Kennard, 1970; Price, 1977). Overall, the two procedures
are identical except that in ridge regression, a small constant is added to the
main diagonal of the variance-covariance matrix prior to the
determination of the regression equation. Adding this constant creates a
"ridge" on the main diagonal, hence the name. Adding this ridge is an
artificial means of decreasing the relative amount of collinearity in
the data. The specific constant (delta) that is added to
the matrix is determined iteratively, by selecting the delta
value that results in the lowest total mean square error for the prediction
equation.
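The mechanics of adding a constant to the main diagonal can be illustrated with a minimal sketch. It is not the procedure used in the study: it assumes standardized variables (so the matrix is the predictor correlation matrix rather than the raw variance-covariance matrix), it uses synthetic data, and the function name ridge_weights is hypothetical. The iterative choice of delta is not reproduced; the value 0.8 is simply supplied by hand.

```python
import numpy as np

def ridge_weights(X, y, delta):
    """Standardized regression weights with `delta` added to the main
    diagonal of the predictor correlation matrix (delta = 0 gives OLS)."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors
    yz = (y - y.mean()) / y.std()               # standardize criterion
    n, p = Xz.shape
    R = Xz.T @ Xz / n                           # predictor correlation matrix
    r = Xz.T @ yz / n                           # predictor-criterion correlations
    return np.linalg.solve(R + delta * np.eye(p), r)

# Synthetic, highly collinear predictors (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 50
shared = rng.normal(size=n)
X = np.column_stack([shared + 0.1 * rng.normal(size=n) for _ in range(3)])
y = X.sum(axis=1) + rng.normal(size=n)

b_ols = ridge_weights(X, y, 0.0)    # no ridge: ordinary least squares
b_ridge = ridge_weights(X, y, 0.8)  # ridge constant chosen by hand
print("OLS weights:  ", b_ols)
print("Ridge weights:", b_ridge)
```

With collinear predictors such as these, the ridge weights are pulled toward zero and toward one another, which is the stabilizing effect the method is intended to provide.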
Tracey, Sedlacek, and Miars (1983) compared ridge
regression and least squares regression in predicting freshman year
cumulative grade point average (GPA) based on SAT scores and high school GPA.
They found that ridge regression resulted in cross-validated correlations
similar to those found by using ordinary least squares regression. The failure
of ridge regression to yield less shrinkage than OLS regression was postulated
to have been due to a relatively low ratio of the number of predictors (p) to
the number of subjects in the sample (n) used in the study. Faden (1978) found
that the key dimension where ridge regression proved superior to OLS regression
was where the p/n ratio was high. So it
was decided to re-examine the efficacy of ridge
regression with admissions data by using data with more than three predictors.
Sedlacek and Brooks (1976) postulated non-cognitive
variables that are predictive of minority student academic success. Tracey and
Sedlacek (in press) developed a brief questionnaire, the Non-Cognitive
Questionnaire (NCQ), to assess these variables and found eight non-cognitive
factors to be highly predictive of grades and enrollment status for both whites
and blacks above and beyond using SAT scores alone. But, it was also found that
these variables shared a high degree of variance with the SAT scores, so there
was a fairly high degree of multicollinearity. It was felt that these ten
variables (SATV, SATM, and the eight non-cognitive factors) would be an
ideal application of ridge regression.
The purpose of this study was to examine the efficacy of
ridge regression over ordinary least squares regression as applied to admissions
data (SAT & NCQ scores). Because it has been demonstrated that separate
regression equations for each race are desirable in selecting students (Farver,
Sedlacek, & Brooks, 1975), it was decided to use separate equations for
each race. This examination of ridge and OLS regressions was conducted using
separate race (black and white) equations in addition to a general equation
without regard to race. Also, the criterion of interest in this study was
cumulative grade point average (GPA) after three semesters rather than the more
typical, short-term lengths of one or two semesters. It was hypothesized
that using ridge regression
would result in lower cross-validated shrinkage of
prediction than would using OLS regression in each of the analyses conducted
(white, black, and general).
Sample
A random sample of 825 incoming freshmen attending summer
orientation at a large eastern university was administered the Non-Cognitive
Questionnaire. This sample of 825 consisted of 571 white students, 176 black
students, and 78 students of other (mostly Asian-American) or unknown
racial backgrounds.
Measures
The Non-Cognitive Questionnaire (NCQ)
consisted of 22 Likert-type items and three open-ended items, all
relating to expectations of college. The individual items were designed to
measure the non-cognitive variables postulated to be related to minority
student academic success (Sedlacek & Brooks, 1976). The eight factors (as
developed by Tracey & Sedlacek, in press) were: leadership, preference for
long range goals over short, self-confidence, realistic self-appraisal,
an understanding of and ability to deal with racism, demonstrated community
service, having strong support for college plans, and academic familiarity.
There is evidence for the content and predictive validity of these factors as
measured by the NCQ (Tracey & Sedlacek, in press).
Analyses
As ridge regression has been demonstrated to be most
effective over OLS in cases where the ratio of the number of predictor
variables (p) to the sample size (n) was large (Faden, 1978), small sub-samples
were drawn from the original sample to generate the prediction equations. Sub-samples
of 50 were drawn from the whole sample and from the sample of white students.
Given that the sample of black students was significantly smaller, a small sub-sample
of 30 was drawn. Each random sub-sample was used to generate OLS and ridge
regression prediction equations, using SAT Verbal, SAT Quantitative, and the
eight non-cognitive factors from the NCQ as predictors, and three-semester
GPA as the criterion. The resulting predictive equations were then
used to generate predicted three-semester GPA for the larger sample. These
predicted GPAs were then correlated with the actual three-semester GPA. This
cross-validation of the predictive equations yields information on the
difference in shrinkage in prediction between the OLS and ridge methods. It
was expected that the ridge regression technique would result in higher
correlations between the predicted and the actual grades in the cross-validation
step than would be found for the ordinary least squares regression technique.
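The cross-validation design described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the data are synthetic stand-ins for the ten predictors and three-semester GPA, the ridge constant is fixed at 0.8 rather than chosen iteratively, the ridge is applied to the standardized (correlation) matrix, and the helper names fit and multiple_r are hypothetical. With ten predictors, a sub-sample of 50 gives a p/n ratio of .20.

```python
import numpy as np

def fit(X, y, delta=0.0):
    """Return (intercept, raw-score weights). delta > 0 adds a ridge to the
    predictor correlation matrix; delta = 0 is ordinary least squares."""
    mx, sx, my, sy = X.mean(axis=0), X.std(axis=0), y.mean(), y.std()
    Xz, yz = (X - mx) / sx, (y - my) / sy
    n, p = X.shape
    beta = np.linalg.solve(Xz.T @ Xz / n + delta * np.eye(p), Xz.T @ yz / n)
    w = beta * sy / sx                  # convert back to raw-score weights
    return my - mx @ w, w

def multiple_r(X, y, intercept, w):
    """Correlation between predicted and actual criterion scores."""
    return np.corrcoef(intercept + X @ w, y)[0, 1]

# Synthetic stand-in for the admissions file (assumed, not the study data).
rng = np.random.default_rng(1)
N, p, n_sub = 571, 10, 50
common = rng.normal(size=(N, 1))                  # shared factor -> collinearity
X = 0.7 * common + 0.7 * rng.normal(size=(N, p))  # ten overlapping predictors
gpa = 0.5 * common[:, 0] + rng.normal(size=N)     # criterion

# Draw one small random sub-sample and derive both equations from it.
sub = rng.choice(N, size=n_sub, replace=False)
results = {}
for name, delta in [("OLS", 0.0), ("ridge", 0.8)]:
    b0, w = fit(X[sub], gpa[sub], delta)
    results[name] = (
        multiple_r(X[sub], gpa[sub], b0, w),  # R in the derivation sub-sample
        multiple_r(X, gpa, b0, w),            # R when applied to the full sample
    )
    print(name, "sub-sample R = %.2f, cross-validated R = %.2f" % results[name])
```

As in the study, the equations are applied to the full sample, which includes the cases used to derive them. The difference between the two correlations for a given method is the shrinkage of interest.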
A summary of the correlations of the predictive
equations with three-semester GPA is presented in Table 1.
Insert Table 1 about here
As evidenced by Table 1, fairly high levels of prediction
were attained in each of the sub-samples. But, in each sample, the shrinkage in
prediction of each equation when applied to the full sample was substantial.
This shrinkage was particularly dramatic for the black sample, which had the
most predictive equation based on the sub-sample and a non-significant
cross-validated prediction. The hypothesized result that ridge regression
would yield less shrinkage in prediction than OLS regression was not
demonstrated. In each case, the application of ridge regression yielded
shrinkage results similar to the application of OLS.
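The size of the shrinkage can be made explicit from the correlations reported in Table 1. The short calculation below takes shrinkage to be the simple drop in R from the sub-sample to the cross-validation; that definition is an assumption made here for illustration, since the report does not state a formula.

```python
# (sub-sample R, cross-validated R) as reported in Table 1
table1 = {
    "Whites": {"OLS": (0.63, 0.39), "Ridge": (0.62, 0.37)},
    "Blacks": {"OLS": (0.67, 0.03), "Ridge": (0.66, 0.06)},
    "Whole sample": {"OLS": (0.61, 0.37), "Ridge": (0.60, 0.36)},
}

# Shrinkage taken as the drop in R (an assumed, simple definition).
shrinkage = {
    sample: {m: round(r_sub - r_cv, 2) for m, (r_sub, r_cv) in methods.items()}
    for sample, methods in table1.items()
}

for sample, s in shrinkage.items():
    print(f"{sample}: OLS {s['OLS']:.2f}, Ridge {s['Ridge']:.2f}")
```

On this definition the two methods never differ by more than .04 in shrinkage within any sample, while both lose roughly .24 to .25 in the white and whole samples and .60 or more in the black sample.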
The results of this study mirrored those found earlier by
Tracey, Sedlacek, and Miars (1983). Applying ridge regression to admissions data
did not yield predictive equations that improved on those equations derived
from applying ordinary least squares regression. This study was designed to
represent those conditions under which ridge regression has been found superior to OLS
regression in Monte Carlo studies. These conditions are: a) when there is a
high degree of multicollinearity, which there was in the predictors used, b)
when the samples on which the equations are based are small, and c) when the
ratio of p/n is relatively large (Darlington, 1978; Dempster, Schatzoff, &
Wermuth, 1977; Faden,
1978). The failure of ridge regression to yield less
shrinkage than OLS regression is perplexing. One possible reason for
this lack of expected results is the relative lack of sophistication of the
measures used. The NCQ factors may not have the well-developed psychometric
properties typically associated with measures like SAT scores, although Tracey
and Sedlacek (in press) provided very good reliability and validity data. In
the Monte Carlo studies examining ridge regression, there was little
concern about the error variance associated with the predictors, given that the
population parameters were known. In real applications, the population
parameters are unknown and the error variance associated with the predictors
takes on more importance. Thus, less well-developed measures such as the NCQ
may introduce a good deal of error variance unassociated with
multicollinearity, and the very precise procedures of ridge regression may not
be able to reduce this. Perhaps ridge regression is useful only in the
ideal conditions used in Monte Carlo studies, and not in actual applications
where the typical social science variables used tend to be less precise. Rozeboom
(1979) demonstrated that ridge regression may enhance prediction if the
conditions are right, but if not, decreased accuracy would result. He went on
to state that how to diagnose when the conditions are right remains obscure. In
application to admissions data, ridge regression does not significantly enhance
or detract from the prediction possible from OLS and thus may have limited
utility in applied admissions situations.
Darlington, R. B. (1978). Reduced variance regression. Psychological Bulletin, 85, 1238-1255.
Dempster, A. P., Schatzoff, M., & Wermuth, N. (1977). A simulation study of alternatives to ordinary least squares. Journal of the American Statistical Association, 72, 77-91.
Faden, V. B. (1978). Shrinkage in ridge regression and ordinary least squares multiple regression estimators. Unpublished doctoral dissertation, University of Maryland.
Farver, A. S., Sedlacek, W. E., & Brooks, Jr., G. C. (1975). Longitudinal predictions of university grades for blacks and whites. Measurement and Evaluation in Guidance, 7, 243-250.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12, 69-82.
Price, B. (1977). Ridge regression: Applications to nonexperimental data. Psychological Bulletin, 84, 759-766.
Rozeboom, W. W. (1979). Ridge regression: Bonanza or beguilement? Psychological Bulletin, 86, 242-249.
Sedlacek, W. E., & Brooks, Jr., G. C. (1976). Racism in American education: A model for change. Chicago: Nelson-Hall.
Tracey, T. J., & Sedlacek, W. E. (in press). Noncognitive variables in predicting academic success by race. Measurement and Evaluation in Guidance.
Tracey, T. J., Sedlacek, W. E., & Miars, R. D. (1983). Applying ridge regression to admissions data by race and sex. College and University, 58, 313-318.
Table 1: Summary of Multiple Coefficients Using Ordinary Least Squares (OLS) and
Ridge Regression Equations to Predict College Cumulative Average After Three
Semesters
| Sample | Sub-sample n | Sub-sample OLS R | Sub-sample Ridge R | Ridge constant (1) | Cross-validation (2) n | Cross-validation OLS R | Cross-validation Ridge R |
|---|---|---|---|---|---|---|---|
| Whites | 50 | 0.63 | 0.62 | 0.80 | 571 | 0.39 | 0.37 |
| Blacks | 30 | 0.67 | 0.66 | 0.35 | 176 | 0.03 | 0.06 |
| Whole Sample | 50 | 0.61 | 0.60 | 0.80 | 825 | 0.37 | 0.36 |
(1) Refers to the constant that is added to the main diagonal of the variance-covariance matrix, creating the "ridge."
(2) Cross-validation performed on full sample.