EDU 510 - Assessment page - main page

Definitions of Reliability and Validity

Reliability

Reliability concerns how well a measurement can be reproduced and how consistent repeated measurements are.  The statistical reliability of a test describes how much of the measurement error is attributable to the test itself.  The forms of reliability that come up most often in day-to-day educational assessment practice include:

Inter-rater reliability: how consistently two or more raters would score the same thing.  For example, if Teacher A and Teacher B both grade a paper, how closely would their grades agree?

Intra-rater reliability: how consistently a single rater scores similar things on different occasions.  For example, if you watched a performance at one time of day, would you rate it the same as if you had watched it at another time of day?

Performance reliability: how consistent is a certain type of performance over time?  For example, if a student scores a certain way today, how much is that score likely to change tomorrow?
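
The questions above can be put into numbers.  The sketch below is one possible illustration, not a method prescribed by this page: it assumes inter-rater agreement is summarized with simple percent agreement and Cohen's kappa, and that consistency of performance over time is summarized with a Pearson correlation between two sets of scores.  The function names and all of the grades and scores are hypothetical.

```python
from collections import Counter
from math import sqrt


def percent_agreement(ratings_a, ratings_b):
    """Share of items on which two raters gave the identical score."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: probability both raters pick the same category at random,
    # given how often each rater used each category.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1 - expected)


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical grades that Teacher A and Teacher B gave to the same six papers.
teacher_a = ["A", "B", "B", "C", "A", "B"]
teacher_b = ["A", "B", "C", "C", "A", "A"]
print(round(percent_agreement(teacher_a, teacher_b), 2))  # 0.67
print(round(cohens_kappa(teacher_a, teacher_b), 2))       # 0.52

# Hypothetical scores for the same five students on two consecutive days.
scores_today = [70, 82, 91, 65, 77]
scores_tomorrow = [72, 80, 93, 68, 75]
print(round(pearson_r(scores_today, scores_tomorrow), 2))
```

Kappa comes out lower than raw agreement because some matches would occur even if both teachers graded at random.  The same comparison applies to intra-rater reliability if the two lists hold one rater's scores from two different occasions.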
 

Validity

Validity deals with how well an instrument measures what it is supposed to measure.  Is the measurement an authentic representation of what was intended to be measured?  The forms of validity that are most often salient to day-to-day assessment practice include:

Content validity: How well does the measure relate to what was taught or intended to be learned?  Also called instructional validity.

Construct validity: How sound is the basis of the measure, and how strong are the theoretical constructs of the test?

Predictive validity: If the measure is supposed to predict future abilities, is it effective in this prediction?  For example, aptitude tests are supposed to predict future achievement.

Consequential validity: What happens as a result of using the measure, including benefits, long-term effects, and intended and unintended future consequences?  If the test is accurate but has negative consequences for learning, is it wholly valid?

Relevant validity: Is this test relevant to the measurement sought, is it an appropriate instrument, and what is the purpose of using it?

External validity: Does the measurement generalize to other situations?  For example, if a few students did not like the movie, how sure can you be that most students would not like the movie?
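
The movie example can be made concrete with a rough calculation.  The sketch below is only an illustration under stated assumptions: it uses a Wilson score interval (a standard technique, not one named on this page) to estimate what share of all students might dislike the movie, it assumes the few students asked were a random sample, and the counts (4 of 5 disliking the movie) are hypothetical.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical: 4 of the 5 students asked did not like the movie.
low, high = wilson_interval(4, 5)
print(round(low, 2), round(high, 2))  # 0.38 0.96
```

Because the lower bound falls below one half, a sample this small cannot show that most students would dislike the movie.  If the students asked were not chosen at random, the result may not generalize at all, whatever the interval says.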
