- Received January 26, 2018
- Revision received February 22, 2018
- Accepted February 23, 2018
- Published online April 18, 2018.
- Lohendran Baskaran, MDa,b,
- Ibrahim Danad, MDa,
- Heidi Gransar, MSa,
- Bríain Ó Hartaigh, PhDa,
- Joshua Schulman-Marcus, MDa,
- Fay Y. Lin, MDa,
- Jessica M. Peña, MD, MPHa,
- Amanda Hunter, MDc,
- David E. Newby, MDc,
- Philip D. Adamson, MDc and
- James K. Min, MDa,∗
- aDepartment of Radiology, New York–Presbyterian Hospital and the Weill Cornell Medical College, New York, New York
- bNational Heart Centre, Singapore
- cUniversity of Edinburgh/BHF Centre for Cardiovascular Science, Edinburgh, United Kingdom
- ∗Address for correspondence: Dr. James K. Min, Dalio Institute of Cardiovascular Imaging, Weill Cornell Medical College, 413 East 69th Street, New York, New York 10021.
Objectives This study sought to compare the performance of history-based risk scores in predicting obstructive coronary artery disease (CAD) among patients with stable chest pain from the SCOT-HEART study.
Background Risk scores for estimating pre-test probability of CAD are derived from referral-based populations with a high prevalence of disease. The generalizability of these scores to lower prevalence populations in the initial patient encounter for chest pain is uncertain.
Methods We compared 3 scores among patients with suspected CAD in the coronary computed tomographic angiography (CTA) randomized arm of the SCOT-HEART study for the outcome of obstructive CAD by coronary CTA: the updated Diamond-Forrester score (UDF), CAD Consortium clinical score (CAD2), and CONFIRM risk score (CRS). We tested calibration with goodness-of-fit, discrimination with area under the receiver-operating curve (AUC), and reclassification with net reclassification improvement (NRI) to identify low-risk patients.
Results In 1,738 patients (58 ± 10 years and 44.0% women), overall calibration was best for UDF, with underestimation by CRS and CAD2. Discrimination by AUC was higher for CAD2, at 0.79 (95% confidence interval [CI]: 0.77 to 0.81), than for UDF (0.77 [95% CI: 0.74 to 0.79]) or CRS (0.75 [95% CI: 0.73 to 0.77]) (p < 0.001 for both comparisons). Reclassification of low-risk patients at the 10% probability threshold was best for CAD2 (NRI 0.31, 95% CI: 0.27 to 0.35) followed by CRS (NRI 0.21, 95% CI: 0.17 to 0.25) compared with UDF (p < 0.001 for all comparisons), with a consistent trend at the 15% threshold.
Conclusions In this multicenter clinic-based cohort of patients with suspected CAD and uniform CAD evaluation by coronary CTA, CAD2 provided the best discrimination and classification, despite overestimation of obstructive CAD as evaluated by coronary CTA. CRS exhibited intermediate performance followed by UDF for discrimination and reclassification.
- coronary artery disease
- coronary computed tomography angiography
- pre-test probability
- risk score
Coronary artery disease (CAD) is a leading global health burden and a common cause of chest pain. In patients with chest pain, guidelines recommend initial diagnostic evaluation by assessment of an individual’s pre-test probability (PTP) of CAD to make decisions regarding further diagnostic testing (1–4). Accurate assessment of PTP affects the accuracy, yield, and cost-effectiveness of downstream diagnostic testing (5). Multiple risk scores have been developed to systematize risk assessment based on clinical history, including the updated Diamond-Forrester (UDF), CAD Consortium 2 (CAD2), and the CONFIRM registry scores (CRS) (6–8). However, previous evaluations and comparisons have been performed in referral cohorts for diagnostic testing, with disparate cutoffs for low-, intermediate-, and high-risk categories, and score performance may deteriorate if applied to more general chest pain populations (9–11).
We sought to compare the performance of the UDF, CAD2, and CRS scores in predicting the probability of obstructive CAD in the coronary computed tomography angiography (CTA)-randomized arm of the Scottish COmputed Tomography of the HEART (SCOT-HEART) trial, a multicenter cohort of patients with chest pain enrolled in the clinic who underwent coronary CTA. We also evaluated score performance within important age- and sex-based subgroups.
Details of the SCOT-HEART study have been described elsewhere (12,13). Briefly, between 2010 and 2014, patients with suspected stable angina were recruited from chest-pain clinics and randomized to usual care plus coronary CTA versus usual care alone. Participants were excluded if they presented with acute chest pain, renal failure, and/or acute coronary syndrome within 3 months of recruitment. Patients gave written informed consent, and the study was approved by the research ethics committee. We included only patients in the coronary CTA randomized arm who had no previous CAD and who had information on all variables needed for the 3 scores, as only the coronary CTA arm underwent uniform evaluation of CAD. Of 2,073 patients in the coronary CTA arm of the SCOT-HEART trial, 1,738 patients were ultimately included for analysis (Figure 1).
Medical history was obtained at the time of enrollment. Cardiovascular risk factors were ascertained by review of patients’ medical records (14–16). The National Institute for Health and Care Excellence (NICE) clinical guideline on chest pain was used to categorize chest pain as typical, atypical, or nonanginal (14).
Cardiac CT scan protocol
All patients underwent coronary CTA using a 64-slice scanner (Brilliance 64, Philips Medical Systems, Best, the Netherlands, or Biograph mCT, Siemens, Erlangen, Germany) or a 320-slice scanner (Aquilion ONE, Toshiba Medical Systems, Tokyo, Japan) at 3 imaging sites. The scans were graded by at least 2 accredited assessors (a cardiologist and a radiologist). Obstructive CAD was defined as a ≥50% diameter stenosis on coronary CTA. Intraobserver agreement was 95%, and interobserver agreement was 91% (17).
Prediction Risk Scores of Significant CAD
Risk scores were calculated for each patient. The UDF requires only age, sex, and symptom typicality (6). The CAD2 score additionally requires smoking status, diabetes, hypercholesterolemia, hypertension, and body mass index (BMI) (7), whereas the CRS additionally requires diabetes, hypertension, family history of CAD, and current smoking status (Online Appendix Table 1) (8).
The clinical outcome of interest was obstructive CAD. Continuous variables were described as mean ± SD, or as medians with interquartile ranges when appropriate; categorical variables were displayed as frequencies and percentages. Variables were compared by chi-square statistic for categorical variables and by Student’s unpaired t test or Mann-Whitney nonparametric test when appropriate for continuous variables. To assess the calibration of each score in the SCOT-HEART population, observed and predicted risks were computed within categories defined by quintiles of predicted risk. The Hosmer-Lemeshow goodness-of-fit (GOF) chi-square statistic across the quintiles was then calculated to measure the agreement between observed and predicted events. The discrimination of each score was assessed via the area under the receiver-operating characteristics (ROC) curve (AUC), and AUCs were compared using the Hanley and McNeil method for paired data (18).
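As an illustration of the calibration step described above, the following minimal Python sketch computes a Hosmer-Lemeshow-type chi-square statistic over groups defined by ranked predicted risk (quintiles by default). It is not the study's Stata or SAS code: the function name `hosmer_lemeshow`, its arguments, and the toy data are assumptions made for illustration, and no p-value is computed.

```python
def hosmer_lemeshow(predicted, outcomes, groups=5):
    """Return (chi_square, table) for a Hosmer-Lemeshow-type calibration check.

    predicted: predicted probabilities of obstructive CAD (0..1), one per patient
    outcomes:  1 if obstructive CAD was observed, else 0
    groups:    number of equal-sized risk groups (5 = quintiles)
    """
    # Rank patients by predicted risk so that groups are risk strata.
    pairs = sorted(zip(predicted, outcomes))
    n = len(pairs)
    chi_square = 0.0
    table = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        if n_g == 0:
            continue  # skip empty groups in very small samples
        observed = sum(y for _, y in chunk)   # observed events in the group
        expected = sum(p for p, _ in chunk)   # sum of predicted probabilities
        mean_p = expected / n_g
        table.append((n_g, observed, expected))
        if 0.0 < mean_p < 1.0:
            chi_square += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return chi_square, table


# Hypothetical toy data: two risk groups of five patients each.
toy_pred = [0.2] * 5 + [0.6] * 5
toy_obs = [1, 1, 1, 0, 0] + [1, 1, 1, 0, 0]  # 3 events observed in each group
toy_chi2, toy_table = hosmer_lemeshow(toy_pred, toy_obs, groups=2)
```

In the toy data the low-risk group is predicted to have 1 event but has 3, so the score underestimates risk there and the statistic is driven almost entirely by that group. A larger statistic indicates poorer agreement between observed and predicted events.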
To assess the potential difference in decision-making impact of one score over another, reclassification was evaluated by use of net reclassification improvement (NRI), which assesses the improvement of an added model in comparison with a base model (19). For clinical relevance, NRI was assessed in a binary manner to identify low-risk patients, defined as those not requiring further testing as recommended by contemporary guidelines. The cutoffs chosen were at <10% and <15% PTP of disease, per guidelines (1–4). A score is penalized for incorrectly classified subjects (20,21). For this study, the reclassification performance of CAD2 and CRS was compared with UDF. Subsequently, CAD2 was used as a base model. Analyses were conducted using Stata version 14 (StataCorp LP, College Station, Texas) and SAS version 9.2 (SAS Institute Inc., Cary, North Carolina). Subgroups based on sex and an age cutoff of 65 years were evaluated for score calibration and discrimination, but there was insufficient sample size to calculate score NRIs based on binary categories. A 2-tailed p value <0.05 was considered statistically significant for all analyses.
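The binary reclassification analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `binary_nri`, its arguments, and the toy data are hypothetical. A patient is "low risk" when the predicted probability is below the threshold; moving a patient without obstructive CAD into the low-risk category counts as correct, and moving a patient with obstructive CAD into it counts as incorrect.

```python
def binary_nri(base_prob, new_prob, has_disease, threshold=0.10):
    """Net reclassification improvement of a new score over a base score
    for one binary cutoff (low risk = probability below `threshold`).

    Returns (nri, event_net, nonevent_net).
    """
    up_event = down_event = n_event = 0
    up_nonevent = down_nonevent = n_nonevent = 0
    for base, new, diseased in zip(base_prob, new_prob, has_disease):
        base_low = base < threshold
        new_low = new < threshold
        moved_up = base_low and not new_low    # low risk -> not low risk
        moved_down = new_low and not base_low  # not low risk -> low risk
        if diseased:
            n_event += 1
            up_event += moved_up
            down_event += moved_down
        else:
            n_nonevent += 1
            up_nonevent += moved_up
            down_nonevent += moved_down
    # Events: moving up is correct, moving down is penalized.
    event_net = (up_event - down_event) / n_event
    # Non-events: moving down is correct, moving up is penalized.
    nonevent_net = (down_nonevent - up_nonevent) / n_nonevent
    return event_net + nonevent_net, event_net, nonevent_net


# Hypothetical toy cohort: three patients without and two with obstructive CAD.
base = [0.20, 0.20, 0.30, 0.05, 0.20]
new = [0.05, 0.20, 0.08, 0.05, 0.05]
disease = [False, False, False, True, True]
nri, event_net, nonevent_net = binary_nri(base, new, disease, threshold=0.10)
```

Here two of three patients without disease are correctly moved to low risk, while one of two patients with disease is incorrectly moved to low risk, so the gain among non-events is partly offset by the penalty among events.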
The mean age was 58 ± 10 years, 764 (44.0%) were women, and 655 (37.7%) had obstructive CAD (Table 1). In addition to being older, patients with obstructive CAD were more likely to be male; to have diabetes, hypertension, or dyslipidemia; and to exhibit typical angina.
The calibration of the overall cohort demonstrated a good fit for UDF, with no significant deviation between observed and expected cases (chi-square 4.13; p = 0.53, Figures 2 and 3). CAD2 underestimated the likelihood of obstructive CAD across the risk spectrum (chi-square 35.17; p < 0.001), as did CRS to a lesser extent (chi-square 17.22; p = 0.004).
Continuous ROC analysis revealed the AUC for UDF to be 0.767 (95% confidence interval [CI]: 0.744 to 0.790), CAD2 to be 0.790 (95% CI: 0.768 to 0.811), and CRS to be 0.749 (95% CI: 0.726 to 0.771) (Figure 4). CAD2 exhibited significantly higher AUC compared with UDF and CRS (p < 0.001 for both), with a nonsignificant difference between UDF and CRS scores (p = 0.14).
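To make the AUC values above more concrete, the sketch below computes an AUC as the proportion of diseased/non-diseased patient pairs in which the diseased patient has the higher score, and then approximates a 95% CI with the Hanley-McNeil (1982) standard-error formula for a single AUC. The choice of that formula is an assumption for illustration and may not be how the reported intervals or the paired comparisons (18) were computed; the function names and toy scores are hypothetical. The group sizes (655 with and 1,738 − 655 = 1,083 without obstructive CAD) come from the cohort description.

```python
import math


def auc_pairwise(scores_pos, scores_neg):
    """AUC as the probability that a diseased patient scores higher than a
    non-diseased patient (ties count as one half)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of a single AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def auc_ci(auc, n_pos, n_neg, z=1.96):
    """Normal-approximation confidence interval around an AUC."""
    se = hanley_mcneil_se(auc, n_pos, n_neg)
    return auc - z * se, auc + z * se


# Hypothetical toy scores: 8 of the 9 pairs are correctly ordered.
toy_auc = auc_pairwise([0.8, 0.6, 0.4], [0.5, 0.3, 0.2])

# Reported UDF AUC with the cohort's group sizes.
udf_low, udf_high = auc_ci(0.767, n_pos=655, n_neg=1083)
```

Under these assumptions the interval for UDF comes out at roughly 0.74 to 0.79, close to the reported 0.744 to 0.790.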
At a probability cutoff of 10% to identify low-risk patients before further diagnostic testing, CAD2 compared with UDF correctly reclassified 45.7% of patients with nonobstructive CAD at the cost of incorrectly reclassifying 14.8% of patients with obstructive CAD, resulting in NRI of 0.31 (95% CI: 0.27 to 0.35; p < 0.001) (Table 2). CRS improved reclassification compared with UDF with an NRI of 0.21 (95% CI: 0.17 to 0.25; p < 0.001) but incorrectly classified more patients than CAD2, with NRI of –0.10 (95% CI: –0.14 to –0.06; p < 0.001). A consistent pattern of superior reclassification for CAD2 was observed at a probability cutoff of 15%, with NRI of 0.23 (95% CI: 0.19 to 0.27; p < 0.001) compared with UDF, followed by CRS with NRI of 0.20 (95% CI: 0.15 to 0.24; p < 0.001) compared with UDF. Similar to the 10% threshold, CRS compared with CAD2 trended toward worse reclassification, with NRI of –0.04 (95% CI: –0.07 to 0.00; p = 0.08).
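The NRI for CAD2 versus UDF at the 10% cutoff can be checked with simple arithmetic. The sketch below assumes that the two reported percentages are the net reclassification proportions among patients without and with obstructive CAD, respectively; the variable names are illustrative only.

```python
# Net proportion of patients without obstructive CAD correctly moved to low risk.
nonevent_net = 0.457
# Net proportion of patients with obstructive CAD incorrectly moved to low risk
# (entered as a penalty, hence the negative sign).
event_net = -0.148

nri_cad2_vs_udf = nonevent_net + event_net
nri_rounded = round(nri_cad2_vs_udf, 2)
```

Under that assumption the sum rounds to the reported NRI of 0.31.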
Sex- and age-based subgroups
For male patients, UDF fit well (chi-square 6.90; p = 0.23), whereas both CAD2 and CRS underestimated risk (chi-square 20.3 and 27.6, respectively; p < 0.001 for both). There was no significant difference in AUC between UDF and CRS scores (p for difference = 0.19) for men. For female patients, both UDF and CRS fit well (chi-square 8.98, p = 0.11 and chi-square 1.30, p = 0.86, respectively), whereas CAD2 underestimated risk (chi-square 17.07; p = 0.004). For women, UDF had the lowest AUC of 0.686 (95% CI: 0.642 to 0.730), CAD2 had the highest AUC of 0.722 (95% CI: 0.680 to 0.763), and CRS AUC was 0.689 (95% CI: 0.646 to 0.732). There was a difference between the AUC of UDF and CAD2 (p < 0.001) but no difference in AUC between CRS and UDF (p = 1.00) or CAD2 (p = 0.08).
A similar pattern was observed for the cohort age <65 years, with good calibration for UDF (chi-square 7.62; p = 0.60) and CRS (chi-square 6.90; p = 0.11) and poor calibration for CAD2 (chi-square 19.43; p = 0.002). For the cohort age <65 years, CAD2 had the highest AUC (0.786, 95% CI: 0.759 to 0.812) (p < 0.001 when compared with UDF or CRS), followed by UDF (0.759, 95% CI: 0.730 to 0.787), then CRS (0.742, 95% CI: 0.714 to 0.770). There was no difference in AUC between UDF and CRS (p = 0.345). For age ≥65 years, all 3 models had good calibration (p = NS for all). For the cohort ≥65 years, the AUC for CAD2 (0.738, 95% CI: 0.692 to 0.783) was higher than that for CRS (0.688, 95% CI: 0.641 to 0.734) (p = 0.01). AUC for UDF (0.730, 95% CI: 0.685 to 0.775) was no different from CAD2 or CRS (p = NS for both).
In this population of contemporary stable outpatients with chest pain who underwent uniform study-mandated coronary CTA, CAD2 provided the best discrimination and reclassification for the overall patient population as well as among women, although it underestimated PTP of obstructive CAD as determined by coronary CTA. UDF displayed good calibration but lesser discrimination and reclassification. CRS, which has not been externally validated except in the original manuscript, had equivalent performance to UDF in some but not all age- and sex-based subgroups for calibration, discrimination, and reclassification.
Candidate scores were chosen to reflect the information available to clinicians during an initial patient encounter for chest pain and applicability to symptomatic patients. UDF, validated against an invasive coronary angiography (ICA)-based referral cohort, replaces the original ubiquitous Diamond-Forrester score recommended in American College of Cardiology/American Heart Association guidelines, relies on only demographics and chest pain typicality, and is currently recommended in European Society of Cardiology guidelines (1,2,4,6,7). CAD2 includes traditional CAD risk factors and basic laboratory evaluation of lipid profile and has been validated against both ICA-based and coronary CTA-based referral cohorts (7,22,23). CRS has been validated in a coronary CTA-based and myocardial perfusion stress-based cohort and requires only patient medical history, which may make it suitable for point-of-care decisions in an initial patient encounter (8).
In the current study, we sought to examine the performance of risk scores in a population that uniformly underwent evaluation for CAD with minimal post-test referral bias. This contrasts with previous studies that have derived and validated scores in patients who had already been referred for further investigations (6,7,9–11,18,24). Thus, the population studied herein is representative of patients where pre-test assessment with scores may have optimal utility (2). It also bears mentioning that baseline characteristics of the current study sample were broadly similar to other cohorts enrolled in previous large cardiac CT registries (25–28).
A coronary CTA-based definition of obstructive CAD was chosen rather than an ICA- or ischemia-based definition because there are no large-scale ICA- or ischemia-based cohorts without work-up bias of sufficient size to compare risk scores. SCOT-HEART is one of the very few patient populations, along with other coronary CTA intervention trials such as the PROMISE (Prospective Multicenter Imaging Study for Evaluation of Chest Pain) trial, in which risk scores can be tested uniformly. Because 85% of the patients in the overall SCOT-HEART cohort had exercise treadmill testing (ETT) prior to enrollment, there may have been referral bias that would affect the performance characteristics for ETT but not for coronary CTA (13). However, because coronary CTA has a high negative predictive value for obstructive CAD by ICA but a lower positive predictive value, it is likely that the prevalence of obstructive CAD is overestimated in our study. Thus, our study is more useful for evaluation of discrimination and reclassification and less useful for evaluation of calibration.
Overall, CAD2 provided the best discrimination. This may be because its derivation included a range of prevalences, giving it broad applicability (7). The AUC value in this study is similar to the value obtained by Bittencourt et al. (22). However, in a higher (59%) prevalence group, the AUC obtained was lower (23). Notably, that study enrolled patients presenting with typical angina, who are not the intended population for a diagnostic risk score. A more parsimonious version of the CAD2 score has been applied to the SCOT-HEART population, with comparison to the PROMISE minimal-risk tool (29). However, this comparison was for the identification of low-risk patients without CAD. The AUC for UDF obtained in the current study was 0.767, comparable to its previous validation (6), although higher than the value obtained by Jensen et al. (9). In the original paper, CRS was validated in 2 cohorts: 1 undergoing coronary CTA and the other undergoing myocardial perfusion scintigraphy (MPS). AUCs of 0.710 and 0.770 were obtained, respectively (8). The AUC for CRS in the current study was between both values, suggesting translatability across cohorts.
The binary NRI analysis using guideline-recommended cutoffs was intended to have real-world applicability by streamlining risk-score selection and to help avoid further unnecessary testing in patients with low PTP. At a threshold of 10% and 15%, CAD2 demonstrated consistently improved reclassification of low-risk patients, identifying patients among whom further testing would not be warranted. CRS was slightly inferior at a threshold of 10%, whereas UDF exhibited the worst performance to identify low-risk patients. This may be a reflection of the increased accuracy of CAD2 and CRS derived from the greater number of variables required when compared with UDF.
Score calibration is easily affected by the prevalence of disease in the validation population compared with the prevalence in the derivation population. Because the UDF derivation and validation cohorts had high prevalences of obstructive CAD, UDF exhibited the best calibration against a coronary CTA gold standard in this cohort (6). Conversely, CAD2 and CRS, which demonstrated good calibration in coronary CTA studies with a lower prevalence, underestimated the probability of obstructive CAD by coronary CTA (7,8,23).
On the basis of this study, we find that, in a general chest-pain population with availability of lipid profiles, CAD2 has the best discrimination and reclassification across all groups. UDF, although parsimonious, has the poorest discriminatory and reclassification performance. CRS may be more useful in preference to UDF for point-of-care estimation of CAD probability by medical history at the first clinic visit, before lipid-profile results become available.
First, the use of coronary CTA as a gold standard for obstructive CAD limits evaluation for calibration, as mentioned earlier, but allows for a generalizable comparison of these scores in initial evaluation of chest pain with minimal work-up bias, which has not been previously performed. Second, we evaluated NRI for each score at thresholds that differ from the original manuscripts but allow comparison of scores in a uniform manner (21). However, to guide clinicians, we chose a priori to compare reclassification qualities of these different scores in a large, external, generalizable cohort. Third, we chose a limited number of scores based on the information that would be available during an initial patient encounter. We cannot comment on the performance of CAD2, UDF, and CRS compared with scores incorporating coronary calcium, exercise treadmill, or biomarker results, which would only be available at a follow-up patient encounter and thus cannot aid clinicians who need to make immediate decisions about referral to diagnostic work-up for evaluation of chest pain. Finally, although this comparison of scores assesses relationships that may suggest clinical utility, the actual effect of each score in guiding processes of patient care is not directly evaluated.
At the first clinic encounter for patients with chest pain, estimation of the pre-test probability of obstructive CAD by coronary CTA with history-based scores may be best performed with CAD2, which had the highest discrimination and accurate reclassification of low-risk patients with <10% probability of obstructive CAD. When less information is available initially, CRS exhibits intermediate performance followed by UDF for discrimination and reclassification. This comparison is robust in that CAD evaluation was uniformly performed in the coronary CTA-randomized arm of the SCOT-HEART study. However, the scores generally underestimate the probability of obstructive CAD when coronary CTA is used as the reference standard. Consistent trends are observed in sex- and age-based subgroups, most notably in women.
COMPETENCY IN MEDICAL KNOWLEDGE: In the SCOT-HEART coronary CTA-randomized cohort, the pre-test probability of obstructive CAD in stable patients with chest pain is best estimated by the CAD2 score, followed by the CRS, and then the UDF scores, with consistency across sex- and age-based subgroups, as evaluated by discrimination and classification. Our results may be useful for point-of-care estimation of CAD probability by medical history at the first clinic visit.
TRANSLATIONAL OUTLOOK: The downstream effect of different estimates of pretest probability of obstructive CAD by coronary CTA should be assessed for clinical outcomes including obstructive CAD by ICA, cath normalcy, and major adverse cardiac events.
Dr. Newby thanks the British Heart Foundation (CH/09/002) and the Wellcome Trust Senior Investigator Award (WT103782AIA) donors for their support.
This work is supported in part by the Dalio Institute of Cardiovascular Imaging and the Michael Wolk Foundation. Dr. Min has served as a consultant to HeartFlow and Abbott Vascular; on the medical advisory boards of GE Healthcare and Arineta; as a consultant for HeartFlow, NeoGraft Technologies, MyoKardia, and CardioDx; and holds ownership in MDDX and AutoPlaq. Dr. Newby has received honoraria and consultancy from Toshiba Medical Systems; and is also supported by the British Heart Foundation (CH/09/002) and a Wellcome Trust Senior Investigator Award (WT103782AIA). All other authors have reported that they have no relationships relevant to the contents of this paper to disclose. Drs. Adamson and Min contributed equally to this work and are joint senior authors. Pamela Douglas, MD, served as the Guest Editor for this paper.
- Abbreviations and Acronyms
- AUC = area under the curve
- CAD = coronary artery disease
- CAD2 = CAD Consortium clinical score
- CRS = CONFIRM risk score
- CTA = computed tomography angiography
- ICA = invasive coronary angiography
- NRI = net reclassification improvement
- PTP = pre-test probability
- ROC = receiver-operating characteristics
- UDF = updated Diamond-Forrester
- Fihn S.D., Gardin J.M., Abrams J., et al.
- Fihn S.D., Blankenship J.C., Alexander K.P., et al.
- Skinner J.S., Smeeth L., Kendall J.M., et al.
- Genders T.S.S., Steyerberg E.W., Hunink M.G.M., et al.
- Newby D.E., Williams M.C., Flapan A.D., et al.
- American Diabetes Association
- Expert Panel on Detection, Evaluation and Treatment of High Blood Cholesterol in Adults
- Williams M.C., Golay S.K., Hunter A., et al.
- Bittencourt M.S., Hulten E., Polonsky T.S., et al.
- Almeida J., Fonseca P., Dias T., et al.
- Hadamitzky M., Achenbach S., Al-Mallah M., et al.
- Cho I., Chang H.J., Ó Hartaigh B., et al.
- Yeboah J., Sillau S., Delaney J.C., et al.
- Adamson P.D., Fordyce C.B., McAllister D.A., Udelson J.E., Douglas P.S., Newby D.E.