Evaluating the performance of common items using item parameter drift, model-data misfit and response time.

Hess BJ, Zhu R, Grosso LJ, Fortna GS, Lipner RS. — American Board of Internal Medicine

Presented: American Educational Research Association Meeting, April 2011

Abstract: In high-stakes testing, it is often important to administer multiple forms to maintain test security. The common-item nonequivalent groups design is frequently used to equate forms of a test. Selecting effective methods and appropriate statistical criteria to identify problematic items in the common-item set is a critical step in the equating process. Using data from 10 forms of a medical certification examination administered at different time points over an eight-week period, the present study examined the performance of common items using item parameter drift, model-data misfit and item response times. Four items with differential performance across administrations were identified; one item was flagged by all three methods (it became easier and its response times decreased across administrations). Removing items with aberrant performance from the common-item set could enhance the quality of test equating and improve the validity of the scores, provided that the aberrance is due to construct-irrelevant factors and that the remaining common-item set is still proportionally representative of the total test. Our study also has implications for data forensics: combining response time information with other statistical procedures can help clarify item exposure problems that result from possible collusion or cheating.

For more information about this presentation, please contact Research@abim.org.