Applying Ridge Regression to Admissions Data by Race and Sex


Terence J. Tracey, William E. Sedlacek and Russell D. Miars

Given these times of diminishing enrollments, it is increasingly important for colleges and universities to select students who have the greatest likelihood of succeeding. The standard method of predicting success at any institution is a regression equation based on past academic achievement, typically high school grades, and present academic ability, usually measured by a standardized test such as the SAT or the ACT. The problem with this procedure is that because these measures are so highly intercorrelated (high multicollinearity), the resulting least squares regression equations suffer from a relatively high degree of error variance (Darlington, 1978). Cross-validations of these equations typically yield low levels of prediction.

A recent alternative to least squares regression is ridge regression (Darlington, 1978; Dempster, Schatzoff & Wermuth, 1977; Hoerl & Kennard, 1970; Price, 1977). Ridge regression was developed expressly for the purpose of circumventing the weakness of least squares regression with regard to highly overlapping predictors. The typical measures used in collegiate admissions are highly interrelated, and as such, applying ridge regression would appear to be very appropriate.

Ridge regression is similar to least squares regression except that a small constant value is added to the main diagonal of the variance-covariance matrix prior to the determination of the regression equation. Following this addition, the means of determining the regression equations are identical in the two procedures. In effect, this alteration yields a new set of data with a lower degree of multicollinearity, and a better fit of the regression equation to the actual data is achieved as the mean square error is reduced.
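The addition to the main diagonal can be illustrated with a minimal sketch. It is not the computation used in this study: the function name, the synthetic data and the delta of 0.15 are hypothetical, and the predictors are assumed to be standardized, so that their variance-covariance matrix is the matrix of intercorrelations.

```python
import numpy as np

def ridge_weights(X, y, delta):
    """Standardized regression weights with `delta` added to the main
    diagonal of the predictor matrix; delta = 0 is least squares."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors (assumption)
    yz = (y - y.mean()) / y.std()               # standardize criterion
    n, p = Xz.shape
    R_xx = Xz.T @ Xz / n                        # predictor intercorrelations
    r_xy = Xz.T @ yz / n                        # predictor-criterion correlations
    return np.linalg.solve(R_xx + delta * np.eye(p), r_xy)

# Synthetic, highly intercorrelated predictors (illustrative, not the study data).
rng = np.random.default_rng(0)
common = rng.normal(size=200)
X = np.column_stack([common + 0.3 * rng.normal(size=200) for _ in range(3)])
y = common + rng.normal(size=200)

ols_weights = ridge_weights(X, y, delta=0.0)    # ordinary least squares
ridge_w = ridge_weights(X, y, delta=0.15)       # hypothetical delta value
print(ols_weights, ridge_w)
```

With a delta of zero the function reproduces the least squares weights. Any positive delta pulls the weights toward zero, which is the source of both the bias and the reduced variance of the ridge estimates.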

Obviously, a key aspect of ridge regression is determining which value of the constant added to the main diagonal of the variance-covariance matrix maximizes prediction. The typical way of determining this optimal constant value (delta) is an iterative procedure: a series of possible delta values is tried and the effect of each on the regression equation is examined. The best delta value is the one associated with the lowest mean square error for the equation. Successively higher delta levels typically decrease the mean square error up to a certain point, beyond which higher delta levels increase it. The delta value that yields the lowest mean square error is the one to be used, and the regression equation associated with this value yields the maximal predictive power.
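The iterative search over delta values might be sketched as follows. How the mean square error is estimated is not specified above, so this sketch assumes it is computed on a separate set of check cases. The function names, the grid of 21 delta values and the synthetic data are likewise hypothetical.

```python
import numpy as np

def ridge_fit(X, y, delta):
    """Fit on standardized predictors; return a prediction function."""
    mu, sd, y_mean = X.mean(axis=0), X.std(axis=0), y.mean()
    Xz = (X - mu) / sd
    n, p = Xz.shape
    w = np.linalg.solve(Xz.T @ Xz / n + delta * np.eye(p), Xz.T @ (y - y_mean) / n)
    return lambda X_new: y_mean + ((X_new - mu) / sd) @ w

def search_delta(X_fit, y_fit, X_check, y_check, deltas):
    """Try each delta and keep the one with the lowest mean square error
    on the check cases (an assumed way of estimating that error)."""
    mse = [float(np.mean((y_check - ridge_fit(X_fit, y_fit, d)(X_check)) ** 2))
           for d in deltas]
    return float(deltas[int(np.argmin(mse))]), mse

def simulate(n, rng):
    """Synthetic intercorrelated predictors and a criterion (not study data)."""
    common = rng.normal(size=n)
    X = np.column_stack([common + 0.3 * rng.normal(size=n) for _ in range(3)])
    return X, common + rng.normal(size=n)

rng = np.random.default_rng(1)
X_fit, y_fit = simulate(25, rng)          # small fitting sample
X_check, y_check = simulate(100, rng)     # cases used to estimate error
deltas = np.linspace(0.0, 1.0, 21)        # series of candidate delta values
chosen_delta, mse_by_delta = search_delta(X_fit, y_fit, X_check, y_check, deltas)
print(chosen_delta)
```

Plotting `mse_by_delta` against `deltas` would show whether the error falls and then rises as described, or whether the minimum sits at the edge of the grid.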

It is important to realize that the resulting ridge regression equation is a biased estimate and not reflective of population parameters. As such, ridge regression is of little use in theoretical modeling (Darlington, 1978). The main advantage of ridge regression is in prediction, and as this is the specific purpose of using regression equations in selection, ridge regression would appear to be a particularly useful tool in admissions.

The purpose of this study was to examine the improvement in prediction of the ridge regression procedure over the least squares regression procedure when applied to admissions data. Improvement of prediction was defined in terms of the shrinkage of the multiple correlation coefficients obtained when the regression equations were cross-validated on another sample. Further, as it has been demonstrated that separate regression equations for each race and sex are desirable in selecting students (Farver, Sedlacek & Brooks, 1975), the above research question was examined separately for each race/sex group. It was expected that the ridge regression procedure would result in less coefficient shrinkage than the least squares procedure when cross-validated.

Method

Sample

The sample and data used in this study were the same as those used by Farver, Sedlacek and Brooks (1975). All black freshman students entering a large state university in the fall of 1968 (N = 126) and the fall of 1969 (N = 133) who had complete data (high school GPA and SAT scores) and who completed the freshman year were included in this study. Samples of white students were randomly drawn as a comparison group.

Analyses

It was decided to examine the predictability of freshman grades from high school GPA, SAT Math and SAT Verbal scores separately for each of the race/sex groups (black males, black females, white males and white females). As ridge regression is purported to be most useful given small sample sizes, random subsamples of 25 were drawn from each of the above four groups for each of the sample years (1968 and 1969). Thus a total of eight subsamples of 25 were drawn, and least squares and ridge regression equations were computed for each. The resulting regression equations were then cross-validated on the full, corresponding sex/race sample from the other sample year to get a predicted freshman cumulative GPA for each individual. These predicted cumulative GPAs were then correlated with the actual freshman year cumulative GPAs. The difference between the correlation of the original equation and the correlation of the cross-validation was the shrinkage examined.
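The cross-validation design described above can be sketched for a single race/sex group. The sketch uses synthetic data in place of the admissions records. The function names, the delta of 0.15 and the sample sizes of 60 and 70 are hypothetical, and standardizing the predictors is an assumption of the sketch.

```python
import numpy as np

def fit(X, y, delta=0.0):
    """delta = 0 is least squares; delta > 0 is ridge on standardized predictors."""
    mu, sd, y_mean = X.mean(axis=0), X.std(axis=0), y.mean()
    Xz = (X - mu) / sd
    n, p = Xz.shape
    w = np.linalg.solve(Xz.T @ Xz / n + delta * np.eye(p), Xz.T @ (y - y_mean) / n)
    return lambda X_new: y_mean + ((X_new - mu) / sd) @ w

def shrinkage(X_sub, y_sub, X_other, y_other, delta):
    """Original R, cross-validated R, and their difference (the shrinkage)."""
    predict = fit(X_sub, y_sub, delta)
    r_original = np.corrcoef(predict(X_sub), y_sub)[0, 1]
    r_cross = np.corrcoef(predict(X_other), y_other)[0, 1]
    return r_original, r_cross, r_original - r_cross

def simulate(n, rng):
    """Three intercorrelated predictors and a criterion (not study data)."""
    common = rng.normal(size=n)
    X = np.column_stack([common + 0.3 * rng.normal(size=n) for _ in range(3)])
    return X, common + rng.normal(size=n)

rng = np.random.default_rng(2)
X_year1, y_year1 = simulate(60, rng)              # one group, first sample year
X_year2, y_year2 = simulate(70, rng)              # same group, other sample year
rows = rng.choice(60, size=25, replace=False)     # random subsample of 25

ls_result = shrinkage(X_year1[rows], y_year1[rows], X_year2, y_year2, delta=0.0)
ridge_result = shrinkage(X_year1[rows], y_year1[rows], X_year2, y_year2, delta=0.15)
print(ls_result, ridge_result)
```

Repeating this for each of the four groups in each of the two years would give the eight comparisons examined in the study.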

It was hypothesized that the cross-validation shrinkage for the ridge regression would be less than the shrinkage for the least squares regression.

Results

The summary of the multiple correlations based on the subsample (N = 25) regressions (least squares and ridge) and the cross-validated correlations is presented in Table 1. As can be seen from Table 1, ridge regression was as good as, if not better than, least squares regression in reducing shrinkage. The shrinkage associated with the least squares equations was virtually identical to the shrinkage associated with the ridge equations in six of the eight validations. The two exceptions to this were the use of the 1968 white male equations on the 1969 white males and the 1969 black male equations on the 1968 black males. In each of these cases there was less shrinkage associated with the ridge regression equations.
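The shrinkage values themselves are not tabled, but they can be recomputed from the correlations in Table 1. The sketch below does this for the seven rows that appear in the table. The variable names and the cutoff of .02 for "virtually identical" are assumptions made for illustration.

```python
# Correlations copied from the seven rows of Table 1:
# (original LS R, original ridge R, cross-validated LS R, cross-validated ridge R)
table_1 = {
    "1968 Black Males":   (0.74, 0.75, 0.42, 0.41),
    "1968 Black Females": (0.50, 0.50, 0.55, 0.56),
    "1968 White Males":   (0.71, 0.70, 0.45, 0.51),
    "1968 White Females": (0.84, 0.84, 0.49, 0.49),
    "1969 Black Males":   (0.56, 0.56, 0.48, 0.53),
    "1969 White Males":   (0.59, 0.59, 0.62, 0.61),
    "1969 White Females": (0.68, 0.67, 0.68, 0.68),
}

shrinkage = {}
for sample, (ls_r, ridge_r, ls_cv, ridge_cv) in table_1.items():
    ls_shrink = round(ls_r - ls_cv, 2)            # original R minus cross-validated R
    ridge_shrink = round(ridge_r - ridge_cv, 2)
    shrinkage[sample] = (ls_shrink, ridge_shrink)
    print(f"{sample:20s} least squares {ls_shrink:+.2f}   ridge {ridge_shrink:+.2f}")

# Rows where ridge shrinkage is lower by more than .02 (an assumed cutoff).
ridge_advantage = [s for s, (ls, rd) in shrinkage.items() if round(ls - rd, 2) > 0.02]
print(ridge_advantage)
```

Under that cutoff the rows flagged are the 1968 white male and 1969 black male equations, the two exceptions named in the text. A negative shrinkage value means the cross-validated correlation was higher than the original one.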

Discussion

The results of this study were not as strong as hypothesized. There was relatively little difference in the prediction of college grades between the least squares and ridge regression techniques. Ridge regression yielded similar or slightly better results compared to the least squares regression. Given the greater effort required to apply ridge regression, its usefulness in helping to select students may be of limited value if it is applied only to the predictors used in this study, high school GPA and SAT scores.

A possible reason for the lack of results in this study could have been the relatively small size of the ratio of the number of predictors (p) to the sample (n) used in obtaining the regression equations. In this study, the p/n ratio was 3/25. It has been found that when the p/n ratio is too small, there really is no difference between the least squares and ridge regression equations. But where this p/n ratio is large, i.e., many predictors with a small sample, ridge regression has been demonstrated to be more accurate than least squares regression (Darlington, 1978; Dempster, Schatzoff & Wermuth, 1977; Faden, 1978). So ridge regression might be a valuable tool if more predictors of collegiate success were included in the regression equations. Some particularly valuable predictors to include in predicting collegiate success are non-cognitive predictors, such as those suggested by Sedlacek and Brooks (1976). These seven non-cognitive dimensions have some variance overlap with the above cognitive predictors, but also share some unique variance with collegiate success (Tracey and Sedlacek, 1982). If these variables were used along with HS GPA and SAT scores, ridge regression should yield less cross-validated shrinkage than least squares solutions.

If schools use only the traditional cognitive predictors of success, as used here, ridge regression is only a minimal improvement over the usual least squares regression in terms of predicting success. Non-cognitive measures have also been proposed as predictors of success (Sedlacek & Brooks, 1976; Tracey & Sedlacek, 1980), and if these measures are included as predictors, ridge regression appears to be a viable alternative to least squares regression. At worst, ridge regression yields similar results; at best, it is a vast improvement in prediction power over least squares regression.

Finally, ridge regression may be valuable to use as it typically does not require the large sample sizes that least squares regression does. The process of collecting data for a large sample is time consuming. Ridge regression may enable schools to gather a smaller sample of data without sacrificing predictive power.

References

Darlington, R.B. “Reduced-Variance Regression,” Psychological Bulletin, 1978, 85, 1238-1255.

Dempster, A.P., Schatzoff, M., & Wermuth, N. “A Simulation Study of Alternatives to Ordinary Least Squares,” Journal of the American Statistical Association, 1977, 72, 77-91.

Faden, V.B. Shrinkage in Ridge Regression and Ordinary Least Squares Multiple Regression Eliminators. Unpublished doctoral dissertation, University of Maryland, 1978.

Farver, A.S., Sedlacek, W.E., & Brooks, G.C., Jr. “Longitudinal Predictions of University Grades for Blacks and Whites,” Measurement and Evaluation in Guidance, 1975, 7, 243-250.

Hoerl, A.E. & Kennard, R.W. “Ridge Regression: Biased Estimation for Nonorthogonal Problems,” Technometrics, 1970, 12, 69-82.

Price, B. “Ridge Regression: Application to Non-Experimental Data,” Psychological Bulletin, 1977, 84, 759-766.

Sedlacek, W.E. & Brooks, G.C., Jr. Racism in American Education: A Model For Change. Chicago: Nelson-Hall, 1976.

Tracey, T.J. & Sedlacek, W.E. “Conducting Student Retention Research,” NASPA (National Association of Student Personnel Administrators) Journal Field Report, 1981, 5, 5-6.

Tracey, T.J. & Sedlacek, W.E. Noncognitive Variables in Predicting Academic Success by Race. Counseling Center Research Report # 1-82. University of Maryland, College Park, 1982.

Table 1

Multiple Correlation Coefficients Obtained Using Least Squares and Ridge Regression by Race and Sex Equations

                      Original Equations [1]                  Cross-Validation [2]
Sample                Least Squares R  Ridge R  Ridge Change  Least Squares R  Ridge R  Cross-Validation N
1968 Black Males      .74              .75      .15           .42              .41      58
1968 Black Females    .50              .50      .75           .55              .56      75
1968 White Males      .71              .70      .10           .45              .51      70
1968 White Females    .84              .84      .19           .49              .49      52
1969 Black Males      .56              .56      .25           .48              .53      64
1969 White Males      .59              .59      .50           .62              .61      78
1969 White Females    .68              .67      .30           .68              .68      66

[1] Original equation based on subsamples of n = 25, and predictors of H.S. GPA and SAT Verbal and Math.

[2] Cross-validation used appropriate subgroup from each year, i.e., 1968 black male equation used on 1969 black male sample.

 



©2005 William Sedlacek