Category: Comparison of assessments

Item response theory

In psychometrics, item response theory (IRT) (also known as latent trait theory, strong true score theory, or modern mental test theory) is a paradigm for the design, analysis, and scoring of tests, q

Inter-rater reliability

In statistics, inter-rater reliability (also called by various similar names, such as inter-rater agreement, inter-rater concordance, inter-observer reliability, inter-coder reliability, and so on) is

Anchor test

In psychometrics, an anchor test is a common set of test items administered in combination with two or more alternative forms of the test with the aim of establishing the equivalence of the test score

Bangdiwala's B

Bangdiwala's B statistic was created by in 1985 and is a measure of inter-rater agreement. While not as commonly used as the kappa statistic the B test has been used by various workers. While it is pr

Internal consistency

In statistics and research, internal consistency is typically a measure based on the correlations between different items on the same test (or the same subscale on a larger test). It measures whether

Intra-rater reliability

In statistics, intra-rater reliability is the degree of agreement among repeated administrations of a diagnostic test performed by a single rater. Intra-rater reliability and inter-rater reliability a

Cronbach's alpha

Cronbach's alpha (Cronbach's ), also known as tau-equivalent reliability or coefficient alpha (coefficient ), is a reliability coefficient that provides a method of measuring internal consistency of t

Classical test theory

Classical test theory (CTT) is a body of related psychometric theory that predicts outcomes of psychological testing such as the difficulty of items or the ability of test-takers. It is a theory of te

Kuder–Richardson formulas

In psychometrics, the Kuder–Richardson formulas, first published in 1937, are a measure of internal consistency reliability for measures with dichotomous choices. They were developed by Kuder and Rich

Consensus-based assessment

Consensus-based assessment expands on the common practice of consensus decision-making and the theoretical observation that expertise can be closely approximated by large numbers of novices or journey

Policy capturing

Policy capturing or "the PC technique" is a statistical method used in social psychology to quantify the relationship between a person's judgement and the information that was used to make that judgem

Proportional reduction in loss

Proportional reduction in loss (PRL) is a general framework for developing and evaluating measures of the reliability of particular ways of making observations which are possibly subject to errors of

Congeneric reliability

In statistical models applied to psychometrics, congeneric reliability ("rho C") a single-administration test score reliability (i.e., the reliability of persons over items holding occasion fixed[2])

Reliability (statistics)

In statistics and psychometrics, reliability is the overall consistency of a measure. A measure is said to have a high reliability if it produces similar results under consistent conditions: "It is th

Kendall tau distance

The Kendall tau rank distance is a metric (distance function) that counts the number of pairwise disagreements between two ranking lists. The larger the distance, the more dissimilar the two lists are

Stanine

Stanine (STAndard NINE) is a method of scaling test scores on a nine-point standard scale with a mean of five and a standard deviation of two. Some web sources attribute stanines to the U.S. Army Air

Spearman–Brown prediction formula

The Spearman–Brown prediction formula, also known as the Spearman–Brown prophecy formula, is a formula relating psychometric reliability to test length and used by psychometricians to predict the reli

Item-total correlation

The item-total correlation test arises in psychometrics in contexts where a number of tests or questions are given to an individual and where the problem is to construct a useful single quantity for e