How Exams are Developed
- Board and Test Committees
- Examination Blueprint
- Question Format
- Question-Development Process
- Pretesting
- Question Review/Editing Process
- Test Construction – Automated Test Assembly (ATA)
- Standard Setting
- Answer Key Validation
Board and Test Committees
A committee begins the examination development process by writing new questions. Committee members are nationally recognized specialists who possess a breadth of knowledge in the specialty area. Academicians and practitioners are included to ensure adequate content coverage and to include the perspectives of both training and practice environments.
Examination Blueprint
An examination blueprint is a table of specifications that determines the content of each examination. It is developed by the board/test committee and reviewed annually. The examination blueprint is based on analyses of current practices and understanding of the relative importance of the clinical problems in the specialty area.
Question Format
Exams feature multiple-choice questions (MCQs) with a single best answer. Research on MCQs has shown that scores obtained with MCQs are related to quality training and later performance; moreover, MCQs are particularly suitable for simulating clinical decision-making. The overwhelming majority of ABIM examination questions use a clinical stem (patient-based case scenario) format that assesses the higher-order cognitive abilities required for clinical decision-making. A small number of questions address specific knowledge points without the use of a clinical stem.
Question-Development Process
As a part of the review of the examination question pool, underrepresented areas are identified, practice changes are considered, and writing assignments are given. Authors supply specific testing points that will be addressed with new questions. The testing points address areas that the qualified candidate is expected to know without consulting medical resources or references. The level of difficulty for each testing point should reflect the measurement goal of the examination: to differentiate candidates who have the expertise required for certification from those candidates who do not.
Testing points are the basis for new questions. For each question, the author selects a task (e.g., diagnosis, treatment) and a cognitive ability (e.g., judgment, synthesis). The author generally writes a clinical stem, a lead line (question), and response options. The correct answer is based on clinical evidence; the distractors provide plausible choices to differentiate between qualified and less-qualified candidates. The author supplies a question rationale, which relates the testing point to the specific information in the question, and cites any applicable references.
Pretesting
Pretesting is a standard practice in the testing industry that allows testing of new, unproven questions without risk to the candidate. These questions are not counted in the candidate's score and are not identifiable to the candidate. Each pretest question is assessed according to statistical performance criteria before being accepted into the live pool of questions.
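As a rough illustration of what statistical performance criteria for a pretest question can look like, the sketch below computes two classical item statistics: difficulty (proportion of candidates answering correctly) and point-biserial discrimination against each candidate's score on the scored questions. The function names, the sample data, and the acceptance thresholds are all hypothetical; the source does not state which statistics or cutoffs ABIM applies.

```python
from statistics import mean, pstdev


def item_statistics(item_correct, live_scores):
    """Return (difficulty, point-biserial discrimination) for one pretest item.

    item_correct: 1 if the candidate answered the pretest item correctly, else 0.
    live_scores:  each candidate's score on the scored (live) questions.
    """
    if len(item_correct) != len(live_scores) or not item_correct:
        raise ValueError("need one response and one score per candidate")
    p = mean(item_correct)
    sd = pstdev(live_scores)
    if p in (0, 1) or sd == 0:
        # Discrimination is undefined when everyone answers alike.
        return float(p), 0.0
    mean_right = mean(s for s, c in zip(live_scores, item_correct) if c)
    mean_wrong = mean(s for s, c in zip(live_scores, item_correct) if not c)
    r_pb = (mean_right - mean_wrong) / sd * (p * (1 - p)) ** 0.5
    return float(p), r_pb


def meets_criteria(p, r_pb, min_p=0.30, max_p=0.95, min_r_pb=0.10):
    """Hypothetical acceptance rule; the thresholds are illustrative only."""
    return min_p <= p <= max_p and r_pb >= min_r_pb


# Made-up responses from eight candidates.
item_correct = [1, 1, 1, 0, 1, 0, 0, 1]
live_scores = [52, 48, 45, 30, 41, 28, 35, 50]
p, r_pb = item_statistics(item_correct, live_scores)
print(f"difficulty={p:.3f} discrimination={r_pb:.3f} accept={meets_criteria(p, r_pb)}")
```

A positive discrimination value means that candidates who did well on the scored questions also tended to answer the pretest question correctly, which is the usual sign of a question that separates stronger from weaker candidates.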
Question Review/Editing Process
Newly written questions are rigorously reviewed at two separate meetings. At the first meeting, the committee decides which questions to retain; these questions are either accepted with no changes, revised at the meeting, or assigned to individual members for more extensive revision.
The questions that are accepted with no changes or with revisions from the first meeting undergo the copyediting process at ABIM. Media (e.g., illustrations) are also processed in preparation for the second review by the committee before they are produced for the examination.
At the second review meeting, the committee assesses the edited new questions and the revisions to older questions. From these, the committee selects the final set of pretest questions. The selected questions are then proofread by the ABIM editorial staff and prepared for examination production. Questions in the live item pool are also evaluated at this meeting for content and statistical performance for possible use on a future examination.
Test Construction – Automated Test Assembly (ATA)
ABIM uses an Automated Test Assembly (ATA) program to build its examinations. ATA ensures that each examination form has a fair balance of content, reflecting the distribution of items specified in the blueprint as well as other specific content criteria. ATA also applies statistical criteria to ensure that examination forms are comparably constructed with regard to difficulty, discrimination, relevance, and other statistical constraints. Each examination form is composed of the items that best meet the content and statistical criteria for computer selection. The committee chair then reviews the ATA-constructed examination and either approves the form or identifies questions to be removed and replaced by the ATA program, which then produces a second iteration for the chair's approval.
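The general idea of assembling a form against blueprint counts and statistical targets can be sketched with a simple greedy selection. This is not ABIM's ATA program, whose algorithm the source does not describe; production assembly systems typically solve a constrained optimization problem. The `Item` fields, the content area names, the target difficulty, and the `excluded` parameter (standing in for questions the chair asks to have replaced) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    area: str              # blueprint content area (hypothetical labels)
    difficulty: float      # proportion of candidates answering correctly
    discrimination: float  # e.g., point-biserial correlation


def assemble_form(pool, blueprint, target_difficulty, excluded=frozenset()):
    """Greedy sketch: fill each blueprint area with the best-fitting items.

    blueprint maps content area -> number of items required.
    excluded holds item ids that must not appear on the form.
    """
    form = []
    for area, count in blueprint.items():
        candidates = [
            it for it in pool if it.area == area and it.item_id not in excluded
        ]
        if len(candidates) < count:
            raise ValueError(f"not enough items available for area {area!r}")
        # Prefer items near the target difficulty, then higher discrimination.
        candidates.sort(
            key=lambda it: (abs(it.difficulty - target_difficulty), -it.discrimination)
        )
        form.extend(candidates[:count])
    return form


pool = [
    Item("C1", "cardiology", 0.70, 0.30),
    Item("C2", "cardiology", 0.55, 0.25),
    Item("C3", "cardiology", 0.72, 0.40),
    Item("P1", "pulmonology", 0.65, 0.35),
    Item("P2", "pulmonology", 0.90, 0.10),
]
blueprint = {"cardiology": 2, "pulmonology": 1}

first_form = assemble_form(pool, blueprint, target_difficulty=0.70)
# Simulated chair review: question C3 is rejected, so the form is rebuilt without it.
second_form = assemble_form(pool, blueprint, target_difficulty=0.70, excluded={"C3"})
print([it.item_id for it in first_form], [it.item_id for it in second_form])
```

Rebuilding with an exclusion set mirrors the review loop described above: the rejected question is dropped and the next-best item from the same content area takes its place, so the blueprint counts stay intact.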
Standard Setting
The passing standard for an exam is established using standard-setting techniques that follow best practices in the testing industry. The standard for each certification exam is set by the designated ABIM Subspecialty Board or Test Committee, whose members are nationally recognized specialists who together encompass the breadth of clinical knowledge in the specialty area. Members include both clinical educators and practitioners in order to incorporate the perspectives of both the training and practice environments. In setting the passing standard, the committee considers several factors, including changes to the knowledge base of the field as well as changes in the characteristics of minimally qualified candidates for certification.
The passing standard is based on a specified level of mastery of content in the specialty area and therefore no predetermined percentage of examinees will pass or fail the exam. The committee uses the modified-Angoff method to set a content-based standard. This evidence-based method asks raters to conceptualize and estimate what a specialist, who is just barely qualified to merit certification, would be able to do. For each question, the rater is asked, “What is the probability that this type of physician will correctly answer it?” The raters’ judgments are systematically combined to derive the passing standard.
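To make the rating step concrete, the following sketch shows one common way modified-Angoff judgments are combined: average the raters' probability estimates for each question, then sum those averages to get the expected score of a just barely qualified specialist. The source says only that judgments are "systematically combined", so this combination rule, the function name, and the sample ratings are assumptions rather than ABIM's documented procedure.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Combine modified-Angoff ratings into a cut score.

    ratings[r][i] is rater r's estimated probability that a just barely
    qualified specialist answers question i correctly.
    Returns (expected raw score, expected proportion correct).
    """
    if not ratings or not ratings[0]:
        raise ValueError("ratings must contain at least one rater and one question")
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every rater must rate every question")
    # Assumed rule: average across raters, then sum across questions.
    item_means = [mean(row[i] for row in ratings) for i in range(n_items)]
    raw_cut = sum(item_means)
    return raw_cut, raw_cut / n_items


# Made-up ratings from three raters on a four-question exam.
ratings = [
    [0.60, 0.80, 0.50, 0.90],
    [0.70, 0.70, 0.40, 0.80],
    [0.50, 0.90, 0.60, 1.00],
]
raw_cut, proportion_cut = angoff_cut_score(ratings)
print(f"cut score: {raw_cut:.2f} of {len(ratings[0])} ({proportion_cut:.0%})")
```

Because the cut score comes entirely from judgments about question content, it is fixed before any candidate results are examined, which is why no predetermined percentage of examinees passes or fails.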
Following best practices in the testing industry, standards are reviewed periodically for appropriateness and may be adjusted. If the committee determines that the current standard is no longer appropriate based on its judgment of the cognitive expertise essential for certification, it will set a new standard using the process described above, and this new standard will be reviewed periodically by the same methods.
For more information on standard setting, we recommend reviewing the following resources:
- Cizek GJ. Setting performance standards: Concepts, methods, and perspectives. Mahwah, NJ: Lawrence Erlbaum Associates. 2001.
- Livingston SA, Zieky MJ. Passing scores: A manual for setting standards of performance on educational and occupational tests. Educational Testing Service. 1982.
- Mills CN. Establishing passing standards. In J.C. Impara (Ed.), Licensure testing: Purposes, procedures, and practices. University of Nebraska-Lincoln: Buros Institute of Mental Measurements. 1995.
- Berk RA. A consumer's guide to setting performance standards on criterion-referenced tests. Review of Educational Research. 1986;56:137-172.
Frequently Asked Questions about Standard Setting
Answer Key Validation
After the examination is administered, staff psychometricians complete the performance analyses of the pretest and live questions. Before final scores are released, a key validation process is conducted to determine whether any answers may be miskeyed because the medical knowledge in the area has changed since the committee last reviewed the question pool. This process consists of a review of questions that were overly difficult, were nondiscriminating, or performed differently from previous use. Also included in the process are questions that received critical comments from candidates and questions addressing topics for which new information has emerged that may affect their correct answers. The possible key validation actions are to (1) leave the answer as originally keyed, (2) change the keyed answer to another answer in the option list, (3) make more than one answer correct, or (4) make all answers correct so that all candidates receive credit for the question. These questions are removed from the live question pool and go back into the Question Review/Editing Process as described within this section.
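The four key validation actions can be modeled by storing each question's key as a set of accepted answers, as in the sketch below. The action names, the five-option list, and the sample data are hypothetical, and the code illustrates only the rescoring effect of each action, not ABIM's scoring system.

```python
ALL_OPTIONS = frozenset("ABCDE")  # assumed option list, for illustration only


def apply_key_action(keys, item_id, action, accepted=None):
    """Return a new key table after one key validation action.

    keys maps question id -> set of accepted answers.
    action: "keep"       leave the answer as originally keyed
            "rekey"      replace the key with one or more accepted answers
            "credit_all" accept every option so all candidates receive credit
    """
    updated = {qid: set(answers) for qid, answers in keys.items()}
    if action == "keep":
        return updated
    if action == "rekey":
        if not accepted or not set(accepted) <= ALL_OPTIONS:
            raise ValueError("rekey needs at least one valid accepted answer")
        updated[item_id] = set(accepted)
    elif action == "credit_all":
        updated[item_id] = set(ALL_OPTIONS)
    else:
        raise ValueError(f"unknown action: {action!r}")
    return updated


def score(responses, keys):
    """Count the questions whose response is among the accepted answers."""
    return sum(1 for qid, answer in responses.items() if answer in keys[qid])


keys = {"Q1": {"A"}, "Q2": {"C"}, "Q3": {"B"}}
responses = {"Q1": "A", "Q2": "D", "Q3": "E"}

original_score = score(responses, keys)
keys_v2 = apply_key_action(keys, "Q2", "rekey", accepted={"C", "D"})  # two answers correct
keys_v3 = apply_key_action(keys_v2, "Q3", "credit_all")               # everyone gets credit
print(original_score, score(responses, keys_v2), score(responses, keys_v3))
```

Changing the keyed answer and making more than one answer correct both map to the same "rekey" operation here, differing only in how many answers the accepted set contains.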







