Physician performance assessment: Setting a credible standard.

Lipner RS, Hess BJ, Arnold GK, Weng W. — American Board of Internal Medicine and American Board of Surgery

Presented: AcademyHealth Annual Research Meeting, June 2009

Research Objectives: Challenges in assessing physician practice performance, specifically in achieving measures that are evidence-based and statistically sound, have received considerable attention over the last decade. Less research has focused on the standard-setting techniques used to determine the standards to which physicians are held accountable for public reporting or pay-for-performance. This study sought to create a structured process that would lead to a credible and defensible absolute passing standard.

Study Design: The ABIM Diabetes Practice Improvement Module (PIM) was used to collect medical record and patient experience data for 10 clinical measures (intermediate outcomes and processes) and two patient experience measures. The Angoff standard-setting method, frequently used for multiple-choice examinations, was used to judge how minimally competent physicians would perform on individual measures. This required an expert panel to review the 12 measures and establish performance criteria for delivering minimally competent diabetes care. For example, each expert answered the question “What percentage of patients seen by a borderline physician would receive an ophthalmologic exam?” Experts then rated the relative importance of each measure, and these ratings were scaled using the Dunn-Rankin method. An overall pass/fail standard was derived using a continuous scoring approach.
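The Angoff step described above can be sketched in a few lines. This is a minimal illustration, not the study's actual procedure: it assumes each measure's criterion is the mean of the experts' estimates and that the overall standard is the importance-weighted sum of those criteria. The measure names, panel estimates, and weights are all hypothetical, and the Dunn-Rankin scaling of importance ratings is not reproduced here (the weights are simply taken as given).

```python
from statistics import mean


def angoff_criteria(judgments):
    """Average the experts' estimates for each measure.

    judgments maps a measure name to a list of proportions (0-1), one per
    expert, each answering: what share of a borderline physician's patients
    would meet this measure?
    """
    return {measure: mean(estimates) for measure, estimates in judgments.items()}


def passing_standard(criteria, weights):
    """Importance-weighted sum of the per-measure criteria (assumed form)."""
    return sum(criteria[m] * weights[m] for m in weights)


# Hypothetical estimates from a three-person panel on two measures.
judgments = {
    "eye_exam": [0.40, 0.50, 0.60],
    "hba1c_control": [0.30, 0.40, 0.50],
}
# Hypothetical importance weights (points) that sum to 100.
weights = {"eye_exam": 40.0, "hba1c_control": 60.0}

criteria = angoff_criteria(judgments)
standard = passing_standard(criteria, weights)
print(criteria, round(standard, 1))
```

Under these assumptions, a physician whose weighted score falls below `standard` would be classified as not meeting the passing standard.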

Population Studied: Performance data were from 957 physicians who completed the Diabetes PIM with at least 10 patients aged 18 to 75. This yielded data from 20,131 charts and 18,706 patient surveys. The expert panel comprised four general internists, two nephrologists, one endocrinologist and one geriatrician.

Principal Findings: The expert panel successfully used the Angoff method and importance rating scale to establish reasonable criteria and weightings. The scoring approach contrasts with those used by recognition programs, in which physicians who meet the minimum performance criterion for a measure are awarded all of the points allocated to it while physicians who do not meet it receive zero points. Instead, we multiplied each physician’s actual performance rate on each measure by the derived importance value and summed these products. Continuous scoring uses more information from a physician’s performance, making it more sensitive to differences among physicians. The approach yielded a normal distribution of scores and a passing standard of 48.002 (almost half of the possible 100 points), which classified 4.3% of physicians in the sample as incompetent. Clinical outcomes were weighted highest for importance (54%), followed by process measures (30%) and patient experience (16%). Decision consistency was high (0.95). Validity was supported because physicians classified as incompetent were more likely to have lower cognitive ability (as measured by standardized internal medicine certification scores and program director ratings of overall clinical competence) and more likely to be in solo practice.
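The contrast between continuous scoring and the all-or-nothing scoring used by recognition programs can be illustrated with a short sketch. The three category-level weights below reuse the 54/30/16 split reported above purely for illustration (the study weighted 12 individual measures); the per-measure criteria and both physicians' performance rates are hypothetical.

```python
def continuous_score(rates, weights):
    """Sum of (performance rate x importance weight) over all measures."""
    return sum(rates[m] * weights[m] for m in weights)


def all_or_nothing_score(rates, weights, criteria):
    """Award a measure's full weight only when its criterion is met."""
    return sum(weights[m] if rates[m] >= criteria[m] else 0.0 for m in weights)


# Illustrative weights (points, summing to 100), collapsed to three categories.
weights = {"outcomes": 54.0, "processes": 30.0, "experience": 16.0}
# Hypothetical minimum performance criteria for the all-or-nothing scheme.
criteria = {"outcomes": 0.60, "processes": 0.70, "experience": 0.70}

# Two hypothetical physicians who both meet every criterion.
physician_a = {"outcomes": 0.90, "processes": 0.90, "experience": 0.90}
physician_b = {"outcomes": 0.60, "processes": 0.70, "experience": 0.70}

for name, rates in [("A", physician_a), ("B", physician_b)]:
    print(
        name,
        round(continuous_score(rates, weights), 1),
        round(all_or_nothing_score(rates, weights, criteria), 1),
    )
```

In this example both physicians receive the full 100 points under all-or-nothing scoring, whereas continuous scoring separates them (about 90 versus about 65 points), which is the sense in which it retains more information about performance.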

Conclusion: The standard-setting approach yielded a credible and defensible absolute passing standard whose outcome was reasonable and reproducible, and was based on informed judgment, due diligence, performance data and research.

Implications for Policy, Delivery or Practice: In making high-stakes decisions about physician performance that affect not only physicians but also the patients they treat, one must ensure not only that the measures used in the assessment are evidence-based and statistically sound, but also that the method used to determine a standard of competence is rigorous and credible.

For more information about this presentation, please contact Research@abim.org.