Psychometric evaluation of quality indicators for potential use in quality composite measures of physician performance.

Arnold GK, Weng Weifeng, Didura H, Holmboe ES, Lipner RS. — American Board of Internal Medicine and American Board of Surgery

Presented: AcademyHealth Annual Research Meeting, June 2007

Research Objectives: To consider the development of composite measures of physician performance across different medical conditions and practice characteristics, using a subset of the ambulatory care measures endorsed by the National Quality Forum (NQF). We present a psychometric analysis of measures from chart audits of patients with three chronic conditions: diabetes, cardiovascular disease, and hypertension.

Study Design: The American Board of Internal Medicine (ABIM) requires physicians to evaluate their practices as part of the Maintenance of Certification program. One evaluation tool is the ABIM’s Practice Improvement Module (PIM), which includes a chart audit component. Three commonly used PIMs are Diabetes, Preventive Cardiology (PC), and Hypertension. Reliability estimates based on intraclass correlation coefficients (ICCs) of process measures and clinical outcomes (e.g., lipids, blood pressure) were compared (Fisher Z and Feldt tests) across medical conditions and practice characteristics. Additional analyses include plotting distributional characteristics, factor analyses, subgroup contrasts via mixed models, and assessment of case-mix adjustments.
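To make the reliability estimate concrete, the following is a minimal sketch of a one-way random-effects ICC, assuming each physician contributes a list of patient-level scores (for example, 1 if a chart meets a measure and 0 otherwise). The abstract does not state which ICC estimator was used, so the ANOVA-based estimator, the function name `icc_oneway`, and the toy data are illustrative assumptions rather than the authors' method.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way random-effects ICC from an ANOVA decomposition.

    groups: list of lists; each inner list holds one physician's
    patient-level scores (hypothetical layout, not from the abstract).
    """
    g = len(groups)                      # number of physicians
    sizes = [len(x) for x in groups]     # charts per physician
    n_total = sum(sizes)
    grand = sum(sum(x) for x in groups) / n_total

    # Between-physician and within-physician sums of squares
    ss_between = sum(n * (mean(x) - grand) ** 2 for n, x in zip(sizes, groups))
    ss_within = sum((v - mean(x)) ** 2 for x in groups for v in x)

    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (n_total - g)

    # Effective cluster size; equals the common size when groups are balanced
    k0 = (n_total - sum(n * n for n in sizes) / n_total) / (g - 1)

    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


# Toy data: three hypothetical physicians, four charts each
toy = [[1, 1, 1, 0], [0, 0, 0, 1], [1, 1, 0, 0]]
print(icc_oneway(toy))
```

An ICC near 0 means that most variation lies between patients of the same physician, while an ICC near 1 means that physicians differ consistently from one another. With sparse data, as in the toy example, this estimator can also return a small negative value.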

Population Studied: 31,555 patient charts were audited by 1,320 physicians, with 638 completing Diabetes, 368 completing PC and 314 completing Hypertension PIMs.

Principal Findings: Process measures tended to have higher reliabilities than clinical outcome measures. For example, in the PC PIM the reliability of annual lipid profiles [ICC = 0.25, 95% CI (0.22 to 0.28)] is higher (p<0.01) than the reliability of the proportion of patients with LDL < 100 mg/dl [ICC = 0.08, (0.07 to 0.09)]. This finding is consistent across the three chronic conditions. The degree of ICC similarity for the same measure among PIMs seems sensitive to the goal criterion. For example, for the proportion of patients having triglycerides < 150 mg/dl, the ICCs are not significantly different (p>0.05): Diabetes [ICC = 0.05, (0.04 to 0.06)], PC [ICC = 0.06, (0.05 to 0.08)], and Hypertension [ICC = 0.06, (0.05 to 0.08)]. When the criterion is shifted to triglycerides < 200 mg/dl, however, the reliability of the measure from the Diabetes PIM differs significantly from the PC and Hypertension reliabilities: Diabetes [ICC = 0.04, (0.03 to 0.05)], PC [ICC = 0.05, (0.04 to 0.07)], and Hypertension [ICC = 0.06, (0.04 to 0.07)]. ICC values with and without case-mix adjustments (gender, age, behavior, and other co-morbidities) tend to be similar. For example, the ICC for the proportion of diabetic patients (Diabetes PIM) with annual lipid profiles is [ICC = 0.23, (0.21 to 0.25)] without case-mix adjustment versus [ICC = 0.23, (0.20 to 0.25)] with case-mix adjustment. Factor analyses and subgroup comparisons are in progress.
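One way to read these ICC values is to project them to the reliability of a physician-level average using the Spearman-Brown formula. The sketch below is an illustration only: it assumes the reported ICCs are single-chart reliabilities, and it uses roughly 24 charts per physician (31,555 charts divided by 1,320 physicians) as an assumed average. Neither assumption is stated in the abstract, and the function name is hypothetical.

```python
def physician_level_reliability(icc, n_charts):
    """Spearman-Brown projection: reliability of a mean over n_charts charts.

    Assumes `icc` is the reliability of a single chart-level score.
    """
    return n_charts * icc / (1 + (n_charts - 1) * icc)


# Assumed average audit size: about 24 charts per physician
N_CHARTS = 24

# ICCs reported for the PC PIM
process_icc = 0.25   # annual lipid profile (process measure)
outcome_icc = 0.08   # LDL < 100 mg/dl (clinical outcome measure)

print(round(physician_level_reliability(process_icc, N_CHARTS), 2))
print(round(physician_level_reliability(outcome_icc, N_CHARTS), 2))
```

Under these assumptions, the gap between process and outcome ICCs carries over to the physician-level averages, which is one reason the choice of indicators matters when building a composite.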

Conclusion: The psychometric properties of the measures common to the PIMs depend on goal criterion values and patient case-mix. Therefore, the use of these measures in developing a composite for any specific condition must take these dependencies into account.

Implications for Policy, Delivery or Practice: In assessing the quality of physician health care via composite measures, we must consider the reliability profiles of process and clinical outcome indicators collected within and across medical practice settings and chronic conditions. The context of practice appears to be an important factor in the construction of composite measures.

Primary Funding Source: ABIM, NQF

For more information about this presentation, please contact Research@abim.org.