All Exam Committee members must hold valid ABIM certification in the committee's discipline, and clinical expertise in that discipline is an essential requirement. To maintain the validity of the examinations, ABIM strives for broad representation from across the discipline on each committee.
All Exam Committee members are involved in direct patient care; representation from full-time practitioners outside academic institutions is important to ensure a job-relevant understanding of practice issues across the discipline.
Inclusion of diverse perspectives on the Exam Committee is strongly encouraged. Committee membership should reflect diversity in race, age, gender, geographic location and institutional setting.
An examination blueprint is a table of specifications that determines the content of each exam. The Exam Committee for each discipline develops the blueprint and reviews it annually; the discipline's Board must also approve it. The blueprint is based on analyses of current practice and on the relative importance of the clinical problems in the specialty area. Other content criteria, such as question task and patient demographics, are used to ensure an even distribution of content characteristics across different forms of the exam. The blueprint also includes statistical performance criteria that ensure exams are fair and equivalent across administrations. Select an exam blueprint for your specialty.
Exams feature multiple-choice questions (MCQs) with a single best answer. Research has shown that scores obtained with MCQs are correlated with quality of training and with superior clinical performance; moreover, MCQs are particularly suitable for simulating clinical decision-making. The overwhelming majority of ABIM exam questions use a clinical stem (patient-based case scenario) format that assesses the higher-order cognitive abilities required for clinical decision-making. A small number of questions address specific knowledge points without the use of a clinical stem. ABIM examination questions include both Système International (SI) and imperial units for height (cm/in), weight (kg/lb), and temperature (C/F). View Examples in Exam Tutorials.
As a part of the review of the examination question pool, underrepresented areas are identified, practice changes are considered, and writing assignments are given. Authors supply specific testing points that will be addressed with new questions. The testing points address areas that the qualified candidate is expected to know without consulting medical resources or references. The level of difficulty for each testing point should reflect the measurement goal of the examination: to differentiate candidates who have the expertise required for certification from those candidates who do not.
The ABIM standard for writing multiple-choice questions for the secure exam is a process called prototyping. This method is designed to focus content experts on the viability of the testing point, the appropriateness of the task that the question poses (diagnosis, treatment), an evidence-based single-best answer, and plausible distractors. After these question elements are created, the authors add the key elements that are needed in a patient-based scenario for that question.
Question Review/Editing Process
Newly created questions are rigorously reviewed at two separate meetings. At the first meeting, the Exam Committee chooses the prototypes to retain and further develops them into questions; these questions are then sent to ABIM for processing, which includes copyediting and preparation of any illustrations that accompany the prototypes.
At the second review meeting, the Exam Committee reads and assesses the edited new questions, along with any revisions that might be needed to older questions.
From these, the Exam Committee approves a final set of pretest questions. These questions are then proofread by the ABIM editorial staff and prepared for exam production. Questions in the live item pool are also evaluated at this meeting for content and statistical performance, for possible use on future exams.
Pretesting is a standard practice in the testing industry that allows testing of new, unproven questions without risk to the candidate. These questions are not counted in the candidate's score and are not identifiable to the candidate. Each pretest question is assessed according to statistical performance criteria before being accepted into the live pool of questions.
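Two statistics commonly used in this kind of screening are item difficulty (the proportion of candidates answering correctly) and item discrimination (e.g., the point-biserial correlation between an item and the total score). The sketch below illustrates the idea with hypothetical response data and hypothetical acceptance cutoffs; ABIM's actual criteria are not stated in this document.

```python
# Illustrative pretest item screening (hypothetical data and thresholds).
from statistics import mean, pstdev

# Each row is one candidate; each column is 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
]

def item_stats(responses, item):
    """Difficulty (p-value) and point-biserial discrimination for one item."""
    scores = [sum(row) for row in responses]      # candidate total scores
    item_col = [row[item] for row in responses]
    p = mean(item_col)                            # proportion answering correctly
    sx, sy = pstdev(item_col), pstdev(scores)
    if sx == 0 or sy == 0:
        return p, 0.0
    # Point-biserial = correlation between item responses and total scores.
    cov = mean(x * y for x, y in zip(item_col, scores)) - mean(item_col) * mean(scores)
    return p, cov / (sx * sy)

for i in range(4):
    p, rpb = item_stats(responses, i)
    keep = 0.2 <= p <= 0.9 and rpb > 0.1          # hypothetical cutoffs
    print(f"item {i}: p={p:.2f}, r_pb={rpb:.2f}, accept={keep}")
```

Items falling outside the cutoffs would stay out of the live pool or return for revision rather than being scored.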
Test Assembly and Administration
ABIM uses an Automated Test Assembly (ATA) program to build its exams, which ensures a fair balance of content on each examination form that reflects the distribution of the items according to the blueprint as well as other specific content criteria. ATA also utilizes statistical criteria to ensure that examination forms are comparably constructed with regard to difficulty, discrimination, relevance and other statistical constraints. The examination forms are composed of items that best meet the content and statistical criteria for computer selection. The Exam Committee chair then reviews the ATA-constructed examination for approval.
To protect the integrity of certification, ABIM examinations are administered on a fixed schedule using different but equivalent forms of an examination from one administration to the next. Most Certification examinations are administered annually. Maintenance of Certification (MOC) examinations are administered twice annually to allow diplomates multiple opportunities to successfully complete the examination requirement and renew their certification.
When exams are used to make classification decisions such as pass/fail, content experts must determine how many questions need to be answered correctly to pass the exam. This score is known as the “passing score”. The process of determining the passing score is known as standard setting. Standard setting procedures provide a systematic and thorough method for eliciting a passing score from a diverse set of content experts.
How does ABIM Determine the Passing Score for an Exam?
ABIM uses the standard setting procedure first described by William Angoff in 1971. The Angoff method is well supported in the literature and is the most popular item-centered standard setting procedure in use on credentialing exams today. In the Angoff method, panels of content experts:
- Discuss and internalize the ability of the borderline examinee.
- Review test questions and evaluate the difficulty of each.
- Estimate the proportion of borderline examinees who will answer each question correctly.
The logic behind this method is that the sum of these proportions is the expected score for the borderline examinee, that is, the passing score for the exam.
In the fictitious example above, four content experts (A-D) provide ratings for 10 items. The average of these 40 ratings is the percent of items needed to pass the exam; in this case, 65.3%. In a real standard setting activity at ABIM, dozens of content experts review and rate hundreds of test questions.
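The steps above reduce to a simple computation: average every judge's rating for every item. The ratings below are hypothetical (they are not the values from the example), but the calculation is the same.

```python
# Angoff passing-score sketch: 4 hypothetical judges rating 10 items.
# Each rating is a judge's estimate of the proportion of borderline
# examinees who would answer that item correctly.
ratings = {
    "A": [0.70, 0.60, 0.55, 0.80, 0.65, 0.50, 0.75, 0.60, 0.70, 0.55],
    "B": [0.65, 0.55, 0.60, 0.75, 0.70, 0.45, 0.80, 0.65, 0.60, 0.50],
    "C": [0.75, 0.65, 0.50, 0.85, 0.60, 0.55, 0.70, 0.55, 0.65, 0.60],
    "D": [0.60, 0.70, 0.65, 0.70, 0.55, 0.60, 0.65, 0.70, 0.75, 0.45],
}

# The passing score is the mean of all ratings: the expected
# percent-correct score of the borderline examinee.
all_ratings = [r for judge in ratings.values() for r in judge]
passing_score = sum(all_ratings) / len(all_ratings)
print(f"passing score: {passing_score:.1%}")   # prints "passing score: 63.5%"
```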
Who are the Content Experts?
Since 2015, ABIM has invited a representative group of content experts, selected from the diplomate population, to participate in the standard setting meetings. This group is augmented with select members of the ABIM Exam Committee, who are nationally recognized experts in the discipline. Together, these diverse groups of content experts determine the passing score for their discipline.
What is the Difference Between the Passing Score and the Pass Rates?
As noted above, the goal of standard setting is to determine a passing score. The passing score is the number of items an examinee has to answer correctly to pass the exam. Once the passing score is determined, it is held constant over time so that all examinees are held to the same standard. ABIM revisits passing scores approximately every five years.
The pass rate is the percent of examinees that pass the exam in a given year. The pass rate can change each year due to differences in examinee performance.
The example above shows three years of testing. The passing score, represented by the black line, is constant across all three years. The pass rates, however, change with each cohort of examinees.
- In year 1 the cohort is medium performing and the pass rate is about 65%.
- In year 2 the cohort is low performing and the pass rate is about 45%.
- In year 3 the cohort is high performing and the pass rate is about 80%.
In summary, ABIM does NOT set pass rates; ABIM sets the passing score. Once the passing score is established, pass rates can vary depending on the performance of the examinees in a given cohort. Get more information on pass rates.
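The distinction can be made concrete with three hypothetical cohorts scored against one fixed standard; the score lists and cutoff below are invented, with pass rates roughly mirroring the example above.

```python
# Sketch: the passing score stays fixed while pass rates vary with cohort
# performance (all values are hypothetical percent-correct scores).
passing_score = 65  # hypothetical fixed standard: percent correct to pass

cohorts = {
    "year 1 (medium)": [60, 70, 68, 55, 80, 66, 72, 50, 75, 65],
    "year 2 (low)":    [55, 67, 66, 50, 70, 58, 62, 45, 68, 52],
    "year 3 (high)":   [70, 75, 68, 80, 66, 72, 60, 78, 69, 62],
}

def pass_rate(scores, cutoff):
    """Percent of examinees at or above the fixed passing score."""
    return 100 * sum(s >= cutoff for s in scores) / len(scores)

for year, scores in cohorts.items():
    print(f"{year}: pass rate = {pass_rate(scores, passing_score):.0f}%")
```

The cutoff never moves; only the distribution of examinee scores changes from year to year, so the pass rate changes with it.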
Angoff WH. Scales, norms, and equivalent scores. In: Thorndike RL, ed. Educational Measurement. 2nd ed. Washington, DC: American Council on Education; 1971:508-600.
Cizek GJ. Setting Performance Standards: Concepts, Methods, and Perspectives. Mahwah, NJ: Lawrence Erlbaum Associates; 2001.
Livingston SA, Zieky MJ. Passing Scores: A Manual for Setting Standards of Performance on Educational and Occupational Tests. Educational Testing Service; 1982.
Mills CN. Establishing passing standards. In: Impara JC, ed. Licensure Testing: Purposes, Procedures, and Practices. University of Nebraska-Lincoln: Buros Institute of Mental Measurements; 1995.
Berk RA. A consumer's guide to setting performance standards on criterion-referenced tests. Review of Educational Research. 1986;56:137-172.
Answer Key Validation
After the examination is administered, staff psychometricians complete performance analyses of the pretest and live questions. Before final scores are released, a key validation process is conducted to determine whether any answers may be miskeyed because medical knowledge in the area has changed since the committee last reviewed the question pool. This process covers questions that were overly difficult, nondiscriminating, or performed differently from previous use, as well as questions that received critical comments from candidates and questions addressing topics for which new information has emerged that may affect their correct answers. Key validation actions can be to:
- leave the answer as originally keyed;
- change the keyed answer to another answer in the option list;
- make more than one answer correct; or
- make all answers correct so that all candidates receive credit for the question.
These questions are removed from the live question pool and returned to the Question Review/Editing Process described above.