Basham, A., & Sedlacek, W. E. (2009). Validity. In American Counseling Association (Ed.), The American Counseling Association encyclopedia of counseling (pp. 557–558).
Validity
Validity is an important concept in the field of assessment. Validity is determined by estimating whether scores from a measure assess what they are supposed to assess. There are a number of ways that the validity of the scores on a measure can be established, and there are many good discussions of techniques for determining validity (e.g., Anastasi & Urbina, 1997; Linn & Gronlund, 2000). Validity should always be stated in terms of a specific purpose for a specific group, and it is a characteristic of the results, or scores, on a measure rather than of the measure itself (AERA/APA/NCME, 1999). For example, do the scores on a measure show validity and reliability in predicting grades for Asian American students at a certain secondary school? Validity is not a general characteristic of an assessment method; it is a characteristic of the scores from a particular sample in a specific context.
There are several ways in which validity can be estimated, depending on the type of test, its intended purpose, and how it is constructed. The scores on a measure have face validity if they appear to measure the topics of interest, with no further evidence. Consider an example in which a group is evaluating a questionnaire or other measure by looking over the items. With little more than personal hunches, the group picks the measure that looks like it will best measure the concept. Content validity requires more documentation than face validity. The logic here is that the content of the items on a test, the questions in an interview, the themes in a focus group, and so on should contain the content that one is seeking to measure or evaluate. As opposed to face validity, content validity requires the collection of some empirical information or expert judgments and should have a scholarly foundation. In construct validity, a number of items are written to cover certain abstract constructs or dimensions of interest, or the items are drawn from the literature on a topic. Often, use is made of statistical techniques that group items together empirically, such as factor analysis or cluster analysis (Merenda, 1997).
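A minimal sketch of how factor analysis can group items empirically is given below. The data, item structure, and construct names are all hypothetical, simulated only to illustrate the idea of items clustering around underlying constructs.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_respondents = 500

# Two simulated latent constructs, each driving three of six items.
construct_a = rng.normal(size=n_respondents)
construct_b = rng.normal(size=n_respondents)
noise = rng.normal(scale=0.5, size=(n_respondents, 6))
items = np.column_stack([
    construct_a, construct_a, construct_a,  # items 1-3 reflect construct A
    construct_b, construct_b, construct_b,  # items 4-6 reflect construct B
]) + noise

# Extract two factors; the loadings show which items group together empirically.
fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(items)
print(np.round(fa.components_, 2))  # rows = factors, columns = item loadings
```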
As the name suggests, in predictive validity the test developer is trying to predict scores on some future criterion measure. A common example in higher education is admissions. Scores from a given test are said to have validity in admissions if they can predict future student success (e.g., grades, retention, graduation) for certain groups in certain contexts (e.g., Latino students at college X). In the pure form of demonstrating predictive validity, all applicants would be admitted, criterion measures on each student would be obtained, and the most accurate prediction equations possible would be developed. A variety of statistical techniques can be used, including multiple regression, multiple discriminant analysis, logistic regression, and LISREL (Cizek & Fitzgerald, 1999). However, in practice the test developer rarely, if ever, has the opportunity to obtain criterion scores on an unselected sample, even though this is required for the best estimate of predictive validity. If the range of possible scores is restricted on the predictors or the criteria, the size of the statistic representing the extent of the relationship (e.g., a correlation coefficient) is artificially reduced, as sketched in the example below.
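The following sketch illustrates range restriction with simulated predictor and criterion scores; the selection cutoff at the 60th percentile and the variable names are arbitrary assumptions made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_applicants = 10_000

predictor = rng.normal(size=n_applicants)  # e.g., an admission test score
criterion = 0.6 * predictor + rng.normal(scale=0.8, size=n_applicants)  # e.g., first-year grades

full_r = np.corrcoef(predictor, criterion)[0, 1]  # correlation in the full applicant pool

# Suppose only the top 40% of applicants are admitted, so criterion data exist only for them.
cutoff = np.quantile(predictor, 0.60)
admitted = predictor >= cutoff
restricted_r = np.corrcoef(predictor[admitted], criterion[admitted])[0, 1]

print(f"Full applicant pool:               r = {full_r:.2f}")
print(f"Admitted (range-restricted) group: r = {restricted_r:.2f}")
```

The restricted correlation comes out smaller than the full-pool correlation, which is the attenuation described above.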
In concurrent validity, the test developer identifies those who are successful (or unsuccessful) on a criterion measure, and a measure is developed that reflects the characteristics of the successful group, ideally contrasted with the unsuccessful group. In developing the Minnesota Multiphasic Personality Inventory (MMPI; Hathaway & McKinley, 1943), people with certain clinical symptoms were compared with a “normal” group on their responses to many items. Items that differentiated between the two groups were retained in the instrument. Congruent validity is estimated by correlating scores from a new measure with those from an existing measure against a specific criterion (Fuertes, Miville, Mohr, Sedlacek, & Gretchen, 2000). It is an easy way to check the validity of scores on a new measure.
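A minimal sketch of a congruent-validity check is shown below, assuming simulated total scores on a hypothetical new measure and an established measure given to the same respondents; in practice the arrays would hold actual scored data.

```python
import numpy as np

rng = np.random.default_rng(2)
n_respondents = 200

# Simulated total scores on an established measure for a group of respondents.
established_scores = rng.normal(loc=50, scale=10, size=n_respondents)
# Assume the new measure taps largely the same construct, plus measurement error.
new_scores = 0.8 * established_scores + rng.normal(scale=6, size=n_respondents)

# The congruent validity estimate is the correlation between the two sets of scores.
r = np.corrcoef(new_scores, established_scores)[0, 1]
print(f"Congruent validity coefficient (Pearson r): {r:.2f}")
```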
Convergent validity is demonstrated when several assessments are shown to reach the same result using different measures, whereas discriminant validity is achieved when one measure is differentiated from another. For example, if two personality assessments yield the same profile, there is evidence for convergent validity; if the two measures show different results, there is evidence for discriminant validity.
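A minimal sketch of this correlation pattern follows, using simulated scores for two hypothetical instruments measuring the same trait and one unrelated measure; the names and values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)
n_respondents = 300

# One simulated trait measured by two different instruments, plus an unrelated measure.
trait = rng.normal(size=n_respondents)
measure_1 = trait + rng.normal(scale=0.5, size=n_respondents)  # trait, instrument 1
measure_2 = trait + rng.normal(scale=0.5, size=n_respondents)  # trait, instrument 2
unrelated = rng.normal(size=n_respondents)                     # different construct

convergent_r = np.corrcoef(measure_1, measure_2)[0, 1]
discriminant_r = np.corrcoef(measure_1, unrelated)[0, 1]

print(f"Convergent evidence (same trait, different measures): r = {convergent_r:.2f}")
print(f"Discriminant evidence (different constructs):         r = {discriminant_r:.2f}")
```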
Contributed by Alan Basham, Eastern Washington University, and William Sedlacek, University of Maryland
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall.
Cizek, G. J., & Fitzgerald, S. M. (1999). An introduction to logistic regression. Measurement and Evaluation in Counseling and Development, 31, 223–241.
Fuertes, J. N., Miville, M. L., Mohr, J. J., Sedlacek, W. E., & Gretchen, D. (2000). Factor structure and short form of the Miville-Guzman Universality-Diversity Scale. Measurement and Evaluation in Counseling and Development, 33, 157–169.
Hathaway, S. R., & McKinley, J. C. (1943). The Minnesota Multiphasic Personality Inventory (Rev. ed.).
Linn, R. L., & Gronlund, N. E. (2000). Measurement and assessment in teaching. Upper Saddle River, NJ: Prentice Hall.
Merenda, P. F. (1997). A guide to the proper use of factor analysis in the conduct and reporting of research: Pitfalls to avoid. Measurement and Evaluation in Counseling and Development, 30, 156–164.