Baranowski R, Jones A, Grosso L. — American Board of Internal Medicine
Presented: Association of Test Publishers Meeting, March 2016
An essential element of any testing program is ensuring the relevance and appropriateness of examination content. The starting point is a well-defined blueprint, or set of test specifications, aligned with the requirements of the credential offered. To meet that goal, the presenters of this session developed a structured exam blueprint review. The purpose of this session was to illustrate how the structured blueprint review informs the many components of test development (e.g., item selection and writing new content) and provides validity evidence for a medical certification exam program.
The presenters gathered information on content relevance by surveying practitioners and asking them to rate blueprint content areas and associated tasks in terms of importance and frequency seen in practice. For example, respondents rated the overall frequency of a particular disease state (content area) and independently rated the importance of tasks related to that disease state (e.g., diagnosis, treatment, risk assessment). The exam committee used the resulting data to adjust the blueprint at a high level and to identify content areas of low relevance that should not be tested. These data were then used to inform the Automated Test Assembly for the next exam. Items in content areas with tasks that had both low importance and low frequency ratings (i.e., low relevance) were not selected for the exam, and strong emphasis was placed on selecting items that had both high frequency and high importance ratings (i.e., high relevance). Some items covering content seen less frequently, but still important to know, were also selected.
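The selection rule described above can be sketched in code. Note that the thresholds, the 1-to-5 rating scale, and the field names below are illustrative assumptions for exposition; they are not the actual configuration of the presenters' Automated Test Assembly system.

```python
# Hypothetical sketch of relevance-based item filtering as described above.
# Cutoffs and the 1-5 rating scale are assumed, not taken from the source.

HIGH, LOW = 4.0, 2.0  # assumed cutoffs on a 1-5 rating scale

def relevance(item):
    """Classify an item by its content area's mean frequency rating
    and the mean importance rating of its associated task."""
    freq, imp = item["frequency"], item["importance"]
    if freq <= LOW and imp <= LOW:
        return "exclude"        # low relevance: not selected for the exam
    if freq >= HIGH and imp >= HIGH:
        return "emphasize"      # high relevance: strongly preferred
    if imp >= HIGH:
        return "include_some"   # seen less frequently but important to know
    return "neutral"            # selected only as the blueprint requires

pool = [
    {"id": 1, "frequency": 4.6, "importance": 4.8},  # common and important
    {"id": 2, "frequency": 1.5, "importance": 1.8},  # rare and unimportant
    {"id": 3, "frequency": 2.5, "importance": 4.2},  # rare but important
]

for item in pool:
    print(item["id"], relevance(item))
```

In practice such a classification would feed the assembly engine as a constraint or objective weight rather than a hard filter, except for the excluded low-relevance category.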
Critically, the presenters were able to avoid emphasizing exam content that practitioners deemed less relevant. They subsequently published the updated blueprint, including the associated task relevance data, to increase transparency and better inform test takers about the exam. Going forward, exam committees will be able to evaluate items against both statistical performance and relevance rating criteria. By using ratings from practitioners, the presenters were able to align the content of the exam more closely with what examinees do in practice. This process sets a foundation for future validation arguments, since the basis for validity in credentialing must start with identifying the knowledge and skills that are important for examinees to use in practice.
For more information about this presentation, please contact Research@abim.org.