Category: Comparison of assessments

Item response theory
In psychometrics, item response theory (IRT) (also known as latent trait theory, strong true score theory, or modern mental test theory) is a paradigm for the design, analysis, and scoring of tests, q
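A common way IRT models the chance of a correct answer is with a logistic item response function. The sketch below assumes the standard three-parameter logistic (3PL) form. Here `a` is discrimination, `b` is difficulty, `c` is a pseudo-guessing floor, and setting `c = 0` gives the two-parameter (2PL) model. The function name `irt_probability` is hypothetical.

```python
import math


def irt_probability(theta: float, a: float = 1.0, b: float = 0.0, c: float = 0.0) -> float:
    """Probability of a correct response under a logistic IRT model (hypothetical helper).

    theta: latent trait (ability) of the test-taker
    a: item discrimination (slope of the curve at theta = b)
    b: item difficulty (location)
    c: pseudo-guessing lower asymptote; c = 0 reduces this to the 2PL model
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# With c = 0, a test-taker whose ability equals the item difficulty
# answers correctly with probability 0.5.
print(irt_probability(theta=1.0, a=1.5, b=1.0))  # 0.5
```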
Inter-rater reliability
In statistics, inter-rater reliability (also called by various similar names, such as inter-rater agreement, inter-rater concordance, inter-observer reliability, inter-coder reliability, and so on) is
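One widely used inter-rater agreement statistic for two raters is Cohen's kappa. It corrects observed agreement p_o for the agreement p_e expected by chance from each rater's marginal category frequencies: κ = (p_o − p_e) / (1 − p_e). The sketch below is a minimal pure-Python version. The function name `cohens_kappa` is hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of each rater's marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


a = ["yes", "yes", "no", "no"]
b = ["yes", "no", "no", "no"]
print(cohens_kappa(a, b))  # 0.5 (p_o = 0.75, p_e = 0.5)
```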
Anchor test
In psychometrics, an anchor test is a common set of test items administered in combination with two or more alternative forms of the test with the aim of establishing the equivalence of the test score
Bangdiwala's B
Bangdiwala's B statistic was created by Bangdiwala in 1985 and is a measure of inter-rater agreement. While not as commonly used as the kappa statistic, the B test has been used by various researchers. While it is pr
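The sketch below assumes the usual definition of B computed from a square k × k agreement table. In that table, n_ii are the diagonal (agreement) counts and n_i+ and n_+i are the row and column totals. The statistic is B = Σ n_ii² / Σ (n_i+ · n_+i), so it equals 1 only under perfect agreement. The function name `bangdiwala_b` is hypothetical.

```python
def bangdiwala_b(table):
    """Bangdiwala's B from a square agreement table (hypothetical helper).

    table[i][j] counts items that rater A put in category i and rater B in category j.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("agreement table must be square")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Sum of squared agreement counts (areas of the agreement squares).
    numerator = sum(table[i][i] ** 2 for i in range(k))
    # Sum of row-total x column-total products (areas of the marginal rectangles).
    denominator = sum(row_totals[i] * col_totals[i] for i in range(k))
    return numerator / denominator


print(bangdiwala_b([[20, 5], [10, 15]]))  # 0.5 (625 / 1250)
```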
Internal consistency
In statistics and research, internal consistency is typically a measure based on the correlations between different items on the same test (or the same subscale on a larger test). It measures whether
Intra-rater reliability
In statistics, intra-rater reliability is the degree of agreement among repeated administrations of a diagnostic test performed by a single rater. Intra-rater reliability and inter-rater reliability a
Cronbach's alpha
Cronbach's alpha (Cronbach's α), also known as tau-equivalent reliability or coefficient alpha (coefficient α), is a reliability coefficient that provides a method of measuring internal consistency of t
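Coefficient alpha for k items is commonly computed as α = k/(k − 1) · (1 − Σ σ²_i / σ²_X). Here σ²_i are the item variances and σ²_X is the variance of the total score. The minimal sketch below takes a respondents × items score matrix and uses the sample variance throughout; the ratio is the same with any consistent variance convention. The function name `cronbach_alpha` is hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores[r][i] is respondent r's score on item i (hypothetical helper)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose to per-item columns
    sum_item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Two items with variance 1 each; total scores 5, 8, 8 have variance 3.
print(round(cronbach_alpha([[2, 3], [4, 4], [3, 5]]), 4))  # 0.6667
```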
Classical test theory
Classical test theory (CTT) is a body of related psychometric theory that predicts outcomes of psychological testing such as the difficulty of items or the ability of test-takers. It is a theory of te
Kuder–Richardson formulas
In psychometrics, the Kuder–Richardson formulas, first published in 1937, are a measure of internal consistency reliability for measures with dichotomous choices. They were developed by Kuder and Rich
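For dichotomously scored (0/1) items, KR-20 is usually written KR-20 = k/(k − 1) · (1 − Σ p_j q_j / σ²_X). In this formula, p_j is the proportion answering item j correctly, q_j = 1 − p_j, and σ²_X is the total-score variance. The sketch below uses the population variance for σ²_X so that it matches the p_j q_j item variances. Under that convention it gives the same value as Cronbach's alpha on 0/1 data. The function name `kr20` is hypothetical.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for 0/1 item responses; responses[r][j] is 1 if respondent r got item j right."""
    n, k = len(responses), len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # item difficulty (proportion correct)
        sum_pq += p * (1 - p)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_pq / total_var)


data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(data))  # 0.75 (sum of pq = 0.625, total variance = 1.25)
```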
Consensus-based assessment
Consensus-based assessment expands on the common practice of consensus decision-making and the theoretical observation that expertise can be closely approximated by large numbers of novices or journey
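One way to operationalize "the crowd approximates the expert", assumed here rather than taken from the text above, is proportion-based consensus scoring. In this approach, each answer is credited with the fraction of the whole group that chose the same option, and a respondent's score is the mean credit across items. The function name `consensus_scores` is hypothetical.

```python
from collections import Counter


def consensus_scores(responses):
    """Mean consensus credit per respondent; responses[r][j] is respondent r's answer to item j."""
    n, k = len(responses), len(responses[0])
    # For each item, the share of the group that picked each option.
    shares = []
    for j in range(k):
        counts = Counter(row[j] for row in responses)
        shares.append({option: c / n for option, c in counts.items()})
    # Each answer earns the share of respondents who agreed with it.
    return [sum(shares[j][row[j]] for j in range(k)) / k for row in responses]


answers = [["a", "x"], ["a", "x"], ["b", "x"], ["a", "y"]]
print(consensus_scores(answers))  # [0.75, 0.75, 0.5, 0.5]
```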
Policy capturing
Policy capturing or "the PC technique" is a statistical method used in social psychology to quantify the relationship between a person's judgement and the information that was used to make that judgem
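Policy capturing is typically carried out by regressing a judge's ratings on the cues shown in each case. The fitted weights then summarize how much each cue drove the judgement. The sketch below assumes ordinary least squares via the normal equations, solved with Gauss–Jordan elimination in pure Python. The function name `fit_policy` and the cue/judgement layout are hypothetical.

```python
def fit_policy(cues, judgments):
    """OLS weights [b0, b1, ..., bk] for judgment ~ b0 + b1*cue1 + ... + bk*cuek (hypothetical helper)."""
    X = [[1.0] + list(row) for row in cues]  # prepend an intercept column
    p = len(X[0])
    # Normal equations: (X^T X) b = X^T y, stored as an augmented matrix.
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, judgments)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("cues are collinear; weights are not identifiable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Judgements generated exactly as 2 + 3*cue1 - 1*cue2.
cases = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
ratings = [2, 5, 1, 4, 7]
print([round(w, 6) for w in fit_policy(cases, ratings)])
```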
Proportional reduction in loss
Proportional reduction in loss (PRL) is a general framework for developing and evaluating measures of the reliability of particular ways of making observations which are possibly subject to errors of
Congeneric reliability
In statistical models applied to psychometrics, congeneric reliability (ρC, "rho C") is a single-administration test score reliability (i.e., the reliability of persons over items holding occasion fixed[2])
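Given a fitted one-factor model, congeneric reliability is commonly computed as ρC = (Σ λ_i)² / ((Σ λ_i)² + Σ θ_i). Here λ_i are the factor loadings and θ_i the unique (error) variances. This assumes the factor variance is fixed at 1. The sketch takes those estimates as inputs rather than fitting the model. The function name `congeneric_reliability` is hypothetical.

```python
def congeneric_reliability(loadings, error_variances):
    """rho_C from one-factor loadings and unique variances (hypothetical helper).

    Assumes the latent factor's variance is fixed at 1 in the fitted model.
    """
    if len(loadings) != len(error_variances):
        raise ValueError("need one error variance per loading")
    true_part = sum(loadings) ** 2
    return true_part / (true_part + sum(error_variances))


# Four standardized items, each with loading 0.7 and error variance 0.51.
print(round(congeneric_reliability([0.7] * 4, [0.51] * 4), 4))
```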
Reliability (statistics)
In statistics and psychometrics, reliability is the overall consistency of a measure. A measure is said to have a high reliability if it produces similar results under consistent conditions: "It is th
Kendall tau distance
The Kendall tau rank distance is a metric (distance function) that counts the number of pairwise disagreements between two ranking lists. The larger the distance, the more dissimilar the two lists are
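Counting pairwise disagreements can be done directly. A pair of items is discordant when the two lists order it differently. The O(n²) sketch below assumes both lists rank the same set of distinct items. The function name `kendall_tau_distance` is hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(list1, list2):
    """Number of item pairs ordered differently by two rankings (hypothetical helper)."""
    if len(list1) != len(list2) or set(list1) != set(list2):
        raise ValueError("both lists must rank the same distinct items")
    pos1 = {item: i for i, item in enumerate(list1)}
    pos2 = {item: i for i, item in enumerate(list2)}
    # A pair is discordant when its relative order differs between the lists.
    return sum(
        1
        for x, y in combinations(list1, 2)
        if (pos1[x] - pos1[y]) * (pos2[x] - pos2[y]) < 0
    )


print(kendall_tau_distance([1, 2, 3, 4, 5], [3, 4, 1, 2, 5]))  # 4
```

Reversing a list of n items gives the maximum distance n(n − 1)/2.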
Stanine
Stanine (STAndard NINE) is a method of scaling test scores on a nine-point standard scale with a mean of five and a standard deviation of two. Some web sources attribute stanines to the U.S. Army Air
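Because the scale has mean 5 and standard deviation 2, one way to assign a stanine is to take 2z + 5 and round it to the nearest integer. Here z is the score's standard score. The result is then clamped to the range 1 to 9. The sketch assumes the raw-score mean and SD are known and rounds halves upward. The function name `stanine` is hypothetical.

```python
import math


def stanine(raw, mean, sd):
    """Stanine (1..9) from a raw score via z-scores (hypothetical helper)."""
    z = (raw - mean) / sd
    value = math.floor(2 * z + 5 + 0.5)  # round half up onto the mean-5, SD-2 scale
    return max(1, min(9, value))         # clamp the open-ended tails to 1 and 9


print(stanine(115, 100, 15))  # 7 (z = 1.0)
```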
Spearman–Brown prediction formula
The Spearman–Brown prediction formula, also known as the Spearman–Brown prophecy formula, is a formula relating psychometric reliability to test length and used by psychometricians to predict the reli
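The formula predicts the reliability of a test lengthened by a factor n as ρ* = nρ / (1 + (n − 1)ρ). Solving for n gives the length factor needed to reach a target reliability: n = ρ*(1 − ρ) / (ρ(1 − ρ*)). Both are sketched below. The function names are hypothetical.

```python
def spearman_brown(reliability, n):
    """Predicted reliability after changing test length by factor n (hypothetical helper)."""
    return n * reliability / (1 + (n - 1) * reliability)


def required_length_factor(reliability, target):
    """Length factor needed to move from `reliability` to `target` (hypothetical helper)."""
    return target * (1 - reliability) / (reliability * (1 - target))


# Doubling a test with reliability 0.6 is predicted to give 0.75.
print(round(spearman_brown(0.6, 2), 4))                 # 0.75
print(round(required_length_factor(0.6, 0.75), 4))      # 2.0
```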
Item-total correlation
The item-total correlation test arises in psychometrics in contexts where a number of tests or questions are given to an individual and where the problem is to construct a useful single quantity for e
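A common variant correlates each item with the total of the remaining items, often called the corrected item-total or item-rest correlation. This keeps the item from inflating its own correlation. The pure-Python sketch below computes either form from a respondents × items matrix. The function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def item_total_correlations(scores, corrected=True):
    """Correlation of each item with the total (or rest-of-test) score (hypothetical helper)."""
    totals = [sum(row) for row in scores]
    result = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        # Corrected form: subtract the item from the total before correlating.
        other = [t - x for t, x in zip(totals, item)] if corrected else totals
        result.append(pearson(item, other))
    return result


# The last item is reverse-keyed relative to the others, so its correlation is negative.
data = [[1, 1, 1, 4], [2, 2, 2, 3], [3, 3, 3, 2], [4, 4, 4, 1]]
print([round(r, 3) for r in item_total_correlations(data)])
```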