- Received March 27, 2017
- Revision received July 5, 2017
- Accepted July 5, 2017
- Published online October 18, 2017.
- Julian Betancur, PhD (a)
- Yuka Otaki, MD (a)
- Manish Motwani, MB, ChB, PhD (a)
- Mathews B. Fish, MD (b)
- Mark Lemley, CNMT (b)
- Damini Dey, PhD (a)
- Heidi Gransar, MS (a)
- Balaji Tamarappoo, MD, PhD (a)
- Guido Germano, PhD (a)
- Tali Sharir, MD (c)
- Daniel S. Berman, MD (a)
- Piotr J. Slomka, PhD (a, ∗)
- (a) Departments of Imaging, Medicine, and Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, California
- (b) Oregon Heart and Vascular Institute, Sacred Heart Medical Center, Springfield, Oregon
- (c) Department of Nuclear Cardiology, Assuta Medical Centers, Tel Aviv, Israel
- (∗) Address for correspondence:
Dr. Piotr J. Slomka, Artificial Intelligence in Medicine Program, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Suite A047N, Los Angeles, California 90048.
Objectives This study evaluated the added predictive value of combining clinical information and myocardial perfusion single-photon emission computed tomography (SPECT) imaging (MPI) data using machine learning (ML) to predict major adverse cardiac events (MACE).
Background Traditionally, prognostication by MPI has relied on visual or quantitative analysis of images without objective consideration of the clinical data. ML permits a large number of variables to be considered in combination and at a level of complexity beyond the human clinical reader.
Methods A total of 2,619 consecutive patients (48% men; 62 ± 13 years of age) who underwent exercise (38%) or pharmacological stress (62%) with high-speed SPECT MPI were monitored for MACE. Twenty-eight clinical variables, 17 stress test variables, and 25 imaging variables (including total perfusion deficit [TPD]) were recorded. Areas under the receiver-operating characteristic curve (AUC) for MACE prediction were compared among: 1) ML with all available data (ML-combined); 2) ML with only imaging data (ML-imaging); 3) 5-point scale visual diagnosis (physician [MD] diagnosis); and 4) automated quantitative imaging analysis (stress TPD and ischemic TPD). ML involved automated variable selection by information gain ranking, model building with a boosted ensemble algorithm, and 10-fold stratified cross validation.
Results During follow-up (3.2 ± 0.6 years), 239 patients (9.1%) had MACE. MACE prediction was significantly higher for ML-combined than ML-imaging (AUC: 0.81 vs. 0.78; p < 0.01). ML-combined also had higher predictive accuracy compared with MD diagnosis, automated stress TPD, and automated ischemic TPD (AUC: 0.81 vs. 0.65 vs. 0.73 vs. 0.71, respectively; p < 0.01 for all). Risk reclassification for ML-combined compared with visual MD diagnosis was 26% (p < 0.001).
Conclusions ML combined with both clinical and imaging data variables was found to have high predictive accuracy for 3-year risk of MACE and was superior to existing visual or automated perfusion assessments. ML could allow integration of clinical and imaging data for personalized MACE risk computations in patients undergoing SPECT MPI.
Traditionally, the prognostic value of myocardial perfusion single-photon emission computed tomography (SPECT) imaging (MPI) has been studied with semiquantitative visual and quantitative analysis of image data (1–3). A number of previous studies have shown that clinical demographics, functional parameters, and hemodynamic and stress results all affect the evaluation of MPI (4–7). This integration of clinical information and imaging data into a final impression is currently performed subjectively by physicians when they assess the MPI test, often in a nonstandardized manner.
Machine learning (ML) is a field of computer science that uses computer algorithms to identify patterns in large multivariable datasets and can be used to predict outcomes. In recent years, ML has been used for prediction and decision-making in a multitude of disciplines, including internet search engines, customized advertising, natural language processing, finance trending, and robotics (8–10). For MPI, a large number of parameters, including clinical variables, stress test results, and imaging data variables, could be considered by ML for outcome prediction. We evaluated the benefits of combining all of these variables using an ML algorithm to predict major adverse cardiac events (MACE) (8). ML prediction using combined data was also compared with physician (MD) diagnosis (based on a visual read with awareness of clinical data) and with automated perfusion quantification indexes (stress and ischemic total perfusion deficit [TPD]).
A total of 2,689 consecutive patients who were referred for clinically indicated exercise or pharmacological stress MPI at Sacred Heart Medical Center between January 2010 and December 2011 were included. The study was approved by the institutional review board, including a waiver for informed consent. After excluding 70 patients with early revascularization within 90 days, 2,619 patients were included for further analysis.
Clinical data were derived from patients’ medical records and included age, sex, and risk factors. Recorded risk factors were hypertension, diabetes mellitus, dyslipidemia, smoking (defined as current smoking or cessation within 3 months of testing), and family history of premature clinical coronary artery disease (CAD). The presence and type of chest pain and the presence of shortness of breath were assessed by the stress testing MD.
MPI and stress protocols
Resting and/or stress 1-day technetium-99m sestamibi imaging was performed using a high-efficiency, solid-state SPECT scanner (D-SPECT, Spectrum-Dynamics, Haifa, Israel) (11). Weight-adjusted doses of 353 ± 151 MBq (9.5 ± 4.1 mCi) for rest and 1,252 ± 196 MBq (34 ± 5.3 mCi) for stress (recommended by vendor) were used (12), equivalent to a total average effective dose of 10.7 mSv based on the latest International Commission on Radiological Protection 103 estimates (13). Patients underwent symptom-limited Bruce protocol exercise testing (38%) or pharmacological stress (62%; regadenoson 0.4 mg) with injection at peak stress. Resting image acquisition was performed supine with 6- to 10-min acquisition time, based on patient body mass index. Upright and supine stress imaging (4 to 6 min) began 15 to 30 min after stress.
Transaxial images were generated from list mode data by maximum likelihood expectation maximization reconstruction (11). No attenuation or scatter correction was applied. Images were automatically re-oriented into short-axis, and vertical and horizontal long-axis slices with Quantitative Perfusion SPECT (QPS)/Quantitative Gated SPECT (QGS) software (Cedars-Sinai Medical Center, Los Angeles, California).
Visual perfusion analysis
The visual analysis was done by multiple MDs who were aware of patient clinical information and quantitative assessment at the time of the study. Reader scan interpretation (MD diagnosis) was scored as 0 = normal, 1 = equivocal, 2 = probably abnormal, 3 = abnormal, or 4 = definitely abnormal. A 3-step scale probability of CAD was also reported (0 = low, 1 = intermediate, 2 = high).
All image datasets were de-identified, transferred to Cedars-Sinai Medical Center, and quality control was checked by a single experienced core laboratory technologist without knowledge of clinical data. Automatically generated myocardial contours by QPS/QGS software were evaluated, and when necessary, contours were adjusted to correspond to the myocardium. Upright and supine images were quantified as previously described (14). We used automatic TPD, a quantitative perfusion variable that reflects a combination of defect extent and severity, and produces stress, rest, and ischemic (stress – rest) TPD values. Ejection fraction, and systolic and diastolic volumes at stress and rest were quantified separately for each acquisition using standard QGS software with 8 frames per cardiac cycle. Transient ischemic dilation (TID) was computed as previously described (15). Counts in the left ventricle were obtained by planar projections of the left ventricular region defined during the first step of data reconstruction (16).
Outcome and follow-up data collection
The endpoint was MACE, which consisted of all-cause mortality, nonfatal myocardial infarction, unstable angina, or late coronary revascularization (percutaneous coronary intervention or coronary artery bypass grafting). All-cause mortality was determined from the Social Security Death Index and combined with MACE obtained from the hospital electronic medical records, including all clinics, as well as cardiology group and hospital visits. Nonfatal myocardial infarction was defined based on the criteria of hospital admission for chest pain, elevated cardiac enzyme levels, and typical changes on the electrocardiogram (17). The first event in each patient was used as the outcome. Patients with early revascularization ≤90 days after MPI were excluded.
Figure 1 illustrates the ML pathway, which involved automated variable selection by information gain ratio ranking and model building with a boosted ensemble algorithm, both worked into a stratified 10-fold cross validation procedure, as reported in our previous work (8). ML techniques were implemented in the open-source Waikato Environment for Knowledge Analysis (WEKA) platform 3.8.0 (University of Waikato, Hamilton, New Zealand) (18).
Twenty-five imaging data variables, 17 stress test variables, and 28 clinical variables were available for variable selection by the information gain ratio (18). Information gain ratio offers a measure of the effectiveness of a variable in classifying the training data. Only variables that resulted in an information gain ratio >0 were subsequently used in model building (Figure 2B).
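As a rough illustration of the selection criterion, the sketch below computes an information gain ratio for a variable that is already discrete. It is a minimal stdlib example, not the WEKA implementation used in the study; the function names and the toy cohort are hypothetical, and continuous variables would first need to be discretized.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(variable, outcome):
    """Information gain of `outcome` from knowing `variable`, divided by the entropy of `variable`."""
    n = len(variable)
    groups = {}
    for x, y in zip(variable, outcome):
        groups.setdefault(x, []).append(y)
    # Expected outcome entropy after splitting patients by the variable's value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(variable)
    if split_info == 0:  # a constant variable carries no information
        return 0.0
    return (entropy(outcome) - conditional) / split_info

# Hypothetical toy cohort of 8 patients (not study data).
mace          = [1, 1, 0, 0, 0, 0, 0, 0]
informative   = [1, 1, 0, 0, 0, 0, 0, 0]  # identical to the outcome
uninformative = [1, 0, 1, 1, 1, 0, 0, 0]  # same event rate in both groups

ratio_informative = gain_ratio(informative, mace)
ratio_uninformative = gain_ratio(uninformative, mace)
```

In this toy case the informative variable reaches the maximum ratio of 1 and the uninformative one gets 0, so a ">0" filter of the kind described above would keep the first and drop the second.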
Predictive classifiers for MACE scoring were developed by an ensemble (“boosting”) LogitBoost algorithm. The principle behind ML ensemble boosting is to combine the prediction of simple classifiers with weak performances to create a single strong classifier (19). These weak predictions are then combined in an ensemble (weighted majority voting) to derive an overall classifier, the ML score.
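The boosting idea can be sketched with a two-class LogitBoost loop that uses one-split decision stumps as the weak learners, following the additive logistic regression formulation of Friedman et al. This is an illustrative sketch only: the study used the WEKA implementation, and the helper names, the number of rounds, and the toy data (stress TPD and age) are assumptions made for this example. Here the final score is the probability from the summed stump outputs.

```python
import math

def weighted_mean(values, weights, idx):
    return sum(weights[i] * values[i] for i in idx) / sum(weights[i] for i in idx)

def fit_stump(X, z, w):
    """Weighted least-squares regression stump: (feature, threshold, left_value, right_value)."""
    best = None
    for j in range(len(X[0])):
        # Skip the largest value: it would leave the right-hand side empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            lv, rv = weighted_mean(z, w, left), weighted_mean(z, w, right)
            err = (sum(w[i] * (z[i] - lv) ** 2 for i in left)
                   + sum(w[i] * (z[i] - rv) ** 2 for i in right))
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]

def logitboost(X, y, rounds=10):
    """Fit `rounds` stumps; each one is trained on the working response of the current model."""
    F = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        p = [1.0 / (1.0 + math.exp(-2.0 * f)) for f in F]
        w = [max(pi * (1.0 - pi), 1e-6) for pi in p]          # floor avoids division by zero
        z = [(yi - pi) / wi for yi, pi, wi in zip(y, p, w)]   # working response
        j, t, lv, rv = fit_stump(X, z, w)
        stumps.append((j, t, lv, rv))
        F = [f + 0.5 * (lv if row[j] <= t else rv) for f, row in zip(F, X)]
    return stumps

def ml_score(stumps, row):
    """Probability-like score from the summed stump outputs."""
    F = sum(0.5 * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)
    return 1.0 / (1.0 + math.exp(-2.0 * F))

# Hypothetical toy data: [stress TPD (%), age (years)] and MACE outcome.
X = [[2, 55], [3, 70], [4, 62], [12, 58], [15, 66], [20, 71]]
y = [0, 0, 0, 1, 1, 1]
stumps = logitboost(X, y)
high_risk = ml_score(stumps, [18, 60])
low_risk = ml_score(stumps, [1, 60])
```

Each stump on its own is a weak predictor; the score improves because every round fits the cases that the current sum still gets wrong.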
The performance and general error estimation of the entire ML process (variable selection and LogitBoost) were assessed using stratified 10-fold cross validation (Figure 1), which is currently the preferred validation technique in machine learning (18). The main advantages of this technique, compared with the conventional split-sample approach, are: 1) it reduces the variance in prediction error; 2) it maximizes the use of data for both training and validation, without overfitting or overlap between the training and validation data; and 3) it guards against testing hypotheses suggested by arbitrarily split data (20).
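A minimal sketch of the fold assignment is given below, assuming a simple round-robin deal within each outcome class; WEKA's own stratification may differ in detail. The function name and the seed are hypothetical, and the outcome vector is reconstructed only from the reported counts (239 events among 2,619 patients).

```python
import random

def stratified_folds(outcomes, k=10, seed=0):
    """Assign each patient index to one of k folds, keeping the event rate similar per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(outcomes)):
        idx = [i for i, y in enumerate(outcomes) if y == label]
        rng.shuffle(idx)
        # Deal the shuffled patients of this class across the folds in turn.
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds

outcomes = [1] * 239 + [0] * 2380
folds = stratified_folds(outcomes)

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(outcomes)) if i not in held_out]
    # Variable selection and model fitting would use train_idx only;
    # the held-out fold is scored once and never used for fitting.
```

The design point is that variable selection sits inside the loop, so the held-out fold never influences which variables enter the model.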
Using receiver-operating characteristic analysis and pairwise comparisons according to DeLong et al. (21), the predictive accuracy for MACE was compared among: 1) ML with all available data (ML-combined); 2) ML with only imaging data (ML-imaging); 3) a 5-point scale visual diagnosis (MD diagnosis); and 4) automated quantitative imaging analysis (stress TPD and ischemic TPD). Brier score and Pearson correlation were computed between predicted and observed MACE (22). For all analyses, MACE-free patients were censored to their follow-up date. To define the low-risk limit for MACE prediction by ML-combined, we used clinical diagnosis = 0, which is considered as definitely normal scans, as a well-established, low-risk limit. Then, low-risk cutoffs for ML-combined and TPD were calculated for approximately the same population percentile as for the MD diagnosis = 0 (87th percentile). Subsequently, improvement in risk classification using ML-combined compared with the MD diagnosis was assessed with a 5-category reclassification. Statistical calculations were performed using R software version 3.3.1 (R Foundation, Vienna, Austria) and PredictABEL package (R Foundation) for the reclassification.
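For orientation, the AUC can be computed directly as the probability that a randomly chosen patient with MACE has a higher score than a randomly chosen MACE-free patient, with ties counted as one half. The sketch below shows only this rank-based estimate on hypothetical scores; it does not implement the DeLong comparison of correlated AUCs.

```python
def auc(scores, outcomes):
    """Rank-based AUC: fraction of (event, non-event) pairs ordered correctly, ties count 0.5."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(1.0 if e > n else 0.5 if e == n else 0.0
               for e in events for n in non_events)
    return wins / (len(events) * len(non_events))

# Hypothetical risk scores for 6 patients (not study data).
scores   = [0.9, 0.4, 0.35, 0.8, 0.1, 0.2]
outcomes = [1,   1,   0,    0,   0,   0]
toy_auc = auc(scores, outcomes)  # 7 of 8 pairs are ordered correctly
```

The pairwise form is quadratic in cohort size, which is acceptable for a cohort of this size but would usually be replaced by a sort-based version for larger data.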
Study population and outcome
Table 1 shows the baseline clinical characteristics of the studied population. When the first event per patient was considered, there were 239 (9.1%) 3-year MACE, with 150 (5.7%) all-cause deaths, 11 (0.4%) nonfatal MIs, 24 (0.9%) unstable anginas, and 54 (2.1%) late target revascularizations. The observed annual MACE rate was 3%.
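The reported counts can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the annualized figure assumes a crude estimate (cumulative rate divided by mean follow-up), which may not be how the authors derived it.

```python
enrolled = 2689
early_revascularization = 70
analyzed = enrolled - early_revascularization  # 2,619 patients

first_events = {
    "all-cause death": 150,
    "nonfatal myocardial infarction": 11,
    "unstable angina": 24,
    "late revascularization": 54,
}
total_mace = sum(first_events.values())  # 239

cumulative_rate = total_mace / analyzed
mean_follow_up_years = 3.2
# Assumption: crude annualized rate, i.e. cumulative rate divided by mean follow-up.
annual_rate = cumulative_rate / mean_follow_up_years

print(f"{analyzed} analyzed, {total_mace} MACE ({cumulative_rate:.1%}), "
      f"about {annual_rate:.1%} per year")
```

Under that assumption the components sum to 239 events (9.1%) and the crude annual rate is about 2.9%, consistent with the reported 3%.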
Hemodynamic and MPI results
Table 2 shows hemodynamic and stress results separately for pharmacological stress and for exercise stress. The frequency of exercise stress was lower among patients with MACE compared with those without MACE (9% with MACE vs. 41% without MACE; p < 0.0001). Table 3 shows quantitative and visual MPI results. For the quantitative evaluation of perfusion and function, 9.8% of myocardial contours were corrected by the core laboratory technologist.
Figure 2A shows the average information gain ratio within 10-fold cross validation. On average, 22 imaging data, 8 stress tests, and 17 clinical variables were selected. All perfusion and functional variables from MPI had an information gain ratio >0, including left ventricular counts and injected dose. Top 9 selected variables were all imaging data variables.
MACE prediction by individual variables
Figure 2B shows the area under the receiver-operating characteristic curve (AUC) for the prediction of MACE by each individual variable. Stress TPD, stress heart rate, ischemic TPD, stress systolic blood pressure, resting TPD, and age were the best individual predictors. Compared with the information gain ratio in Figure 2A, there were some variables for which individual AUCs were predictive, yet they did not offer incremental information gain for predicting MACE (white bars). Furthermore, the variables with highest AUCs did not always have the highest information gain ratio.
MACE prediction by combined variables
MACE prediction was significantly higher for ML-combined than ML-imaging (AUC: 0.81, 95% confidence interval [CI]: 0.78 to 0.83 vs. AUC: 0.78, 95% CI: 0.75 to 0.81; p < 0.01). ML-combined also had a higher AUC compared with the AUCs of automated stress TPD and automated ischemic TPD (Figure 3), and compared with the AUCs for probability of CAD (0.64; 95% CI: 0.61 to 0.66) or MD diagnosis (0.65; 95% CI: 0.62 to 0.68), as reported by the MD (all p < 0.001). When stress test variables were added to image variables for ML integration, AUC did not change significantly (AUC: 0.79, 95% CI: 0.76 to 0.82 vs. AUC: 0.78, 95% CI: 0.75 to 0.81; p = 0.4).
The Brier score for ML-combined prediction of MACE was 0.07, which indicated good calibration between ML scores (estimated predicted risk) and observed 3-year risk. The plot of observed MACE versus predicted MACE over percentiles of ML-combined risk is shown in Figure 4. High correlation of ML-combined predicted MACE versus observed MACE was found (r = 0.97; p < 0.0001).
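The Brier score is the mean squared difference between the predicted probability and the observed 0/1 outcome, so lower values are better. As a point of reference, the sketch below computes the score of a hypothetical uninformative model that assigns every patient the cohort event rate; this reference is derived here from the reported counts and does not appear in the study.

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)

# Outcome vector rebuilt from the reported counts: 239 events among 2,619 patients.
observed = [1] * 239 + [0] * 2380
base_rate = 239 / 2619

# Hypothetical reference model: the same predicted risk for everyone.
reference = brier_score([base_rate] * len(observed), observed)  # about 0.083
```

Against this reference of roughly 0.083, the reported value of 0.07 indicates that the ML-combined score carries information beyond the overall event rate.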
To allow categorical comparison, a low-risk, ML-combined score (<0.15) was determined as the cutoff that defined the same percentile as visual MD diagnosis = 0 (87th percentile). This percentile also approximately corresponded to the stress TPD threshold of <5% (14). For patients within the 95th to 100th percentile of the ML-combined score, 19% (25 of 131) of patients had a normal MD diagnosis and 10% (13 of 131) had stress TPD of <5% (Figure 5). Finally, a 5-category risk reclassification was 26% for ML-combined scores compared with a 5-category MD diagnosis (p < 0.001) (Table 4), with a 30.5% net improvement in the classification of patients with MACE and a 5% net worsening in the classification of MACE-free patients (all p < 0.001).
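Assuming the reported figure is the standard categorical net reclassification improvement (NRI), it is the sum of two components: the net proportion of patients with MACE moved to a higher category, and the net proportion of MACE-free patients moved to a lower category. The sketch below shows the calculation on hypothetical category assignments; the function name and toy data are illustrative and are not taken from the PredictABEL package.

```python
def net_reclassification(old_category, new_category, outcomes):
    """Return (event component, non-event component, NRI) for categorical risk scores."""
    def net_upward(idx):
        up = sum(new_category[i] > old_category[i] for i in idx)
        down = sum(new_category[i] < old_category[i] for i in idx)
        return (up - down) / len(idx)

    events = [i for i, y in enumerate(outcomes) if y == 1]
    non_events = [i for i, y in enumerate(outcomes) if y == 0]
    event_part = net_upward(events)            # moving up is correct for events
    non_event_part = -net_upward(non_events)   # moving up is incorrect for non-events
    return event_part, non_event_part, event_part + non_event_part

# Hypothetical 5-category assignments (0 to 4) for 10 patients, not study data.
outcomes     = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
md_diagnosis = [0, 0, 1, 2, 0, 0, 1, 1, 2, 0]
ml_category  = [1, 2, 1, 2, 0, 0, 1, 1, 3, 0]
event_part, non_event_part, nri = net_reclassification(md_diagnosis, ml_category, outcomes)

# With the rounded components reported in the text: 0.305 + (-0.05) = 0.255,
# close to the reported 26%; the small gap is presumably due to rounding.
reported_sum = 0.305 - 0.05
```

In the toy data, 2 of 4 patients with MACE move up (+0.5) and 1 of 6 MACE-free patients moves up (about -0.17), giving an NRI of about 0.33.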
We developed and validated a highly accurate, personalized method for post-MPI risk computation that used ML. This approach allowed the combination of all available clinical, stress test, and automatically derived imaging data variables without a priori assumptions about the influence or weighting of individual factors, or how they may interact. The method was used to evaluate the added value of clinical and stress test information for the prediction of MACE after MPI. The observed 3% annual MACE rate was similar to previous studies that assessed the prognostic value of SPECT MPI (4). The only human input required for the derivation of the ML-combined MACE risk score was the collation of clinical data from health records (conceivably a task fulfilled by advanced text mining in the future) and the adjustment of contours by the technologists in a minority (<10%) of the cases. Figure 6 illustrates how the proposed ML model would allow prediction of the risk of MACE for an individual unknown case by automatically integrating the clinical data with the imaging data.
The performance of the ML-combined score was superior to image risk metrics that are traditionally used to study prognostic outcomes after MPI (1–7). The AUC estimate, derived in a rigorous manner with test and training data separated within 10-fold cross validation (preventing overfitting) was substantially higher than that for ML-imaging, as well as visual or automated MPI assessment. Furthermore, risk reclassification analysis demonstrated that the ML-combined risk allowed better classification of high-risk patients than visual clinical diagnosis. Risk reclassification revealed that the ML-combined score could increase the risk score for >30% of patients with MACE incidence, but also increased the risk score for 5% of MACE-free patients. At the same time, we found that 19% of the patients in the highest ML-combined risk category (top 5%), with a MACE incidence of 38%, were still read as normal scans with a MD diagnosis = 0. These observations highlight the difficulty in finding the appropriate thresholds for the multicategory risk scores. The low-risk threshold in this study was derived for the same population percentile as “normal” visual scans, and subsequent higher risk thresholds were defined at 5% increments of increasing ML risk score. Furthermore, we found that automatically derived stress and/or ischemic TPD had better predictive value for MACE than a clinical diagnosis, which was in line with our previous reports (9,23), but has not been previously reported in prognostic studies.
To our knowledge, this was the first study that applied ML to predict MACE in patients who underwent MPI. Recently, our group assessed the feasibility and accuracy of ML to predict 5-year all-cause mortality in 10,030 patients who underwent coronary computed tomography (CT) angiography (8). In this analysis, ML exhibited a higher AUC compared with the Framingham risk score or visual CT severity scores alone (8). Automated processing of CT images was not used. In contrast, the present study capitalized on established automated processing software tools that were validated in nuclear cardiology to provide multiple imaging data variables with limited manual interaction. The intent was to demonstrate the feasibility of edging us closer to a completely automated computer-powered imaging analysis and risk assessment. A future direction and potential next step will be to develop tools that are also capable of automatically extracting clinical variables, for example, by text mining electronic health records.
The ML approach provides a computational integration of all available information that is not feasible for subjective analysis by the reporting physician. As part of the clinical decision-making, physicians take into account clinical and stress testing data; however, this is done subjectively without a systematic way of integrating information. Furthermore, although including these variables as part of the MPI report is recommended by guidelines, integration of these findings in the report is not yet part of standardized reporting guidelines (24,25). Intuitive patient-specific weighting of all individual clinical and imaging factors for assessing risk could not be expected to be precise, or consistent among different medical centers, whether performed by the interpreting physician or the physician managing the patient.
Although the average patient radiation dose (10.7 mSv) used in this study was higher than those specified in current guideline recommendations (26), the data were collected before the latest guidelines were adopted, using the same day rest−first protocol optimized for the acquisition speed rather than for the radiation dose. Furthermore, a weight-based protocol was used, and most of the patients were obese (body mass index ≥30 kg/m2). It is likely that at least a 50% lower effective radiation dose could be achieved with longer acquisition times without any effect on image quality, as previously studied (16). Further dose reductions could be achieved with stress-first and/or stress-only protocols.
The ability to optimally assess risk in individual patients remains a major challenge in cardiology. With MPI, visual image analysis itself is subjective, and the overall risk assessment that incorporates clinical, stress test, and imaging results is highly variable, based on physician knowledge and experience, and limited by the complexity of appropriately assigning weight to individual factors. The presented ML score provides an automated, precise, and objective risk estimate that combines imaging, clinical, and stress testing variables. The same optimal method for risk computation would be readily available to all imaging centers, including less experienced centers. The practical implementation will depend on the ability to interface the MPI reporting workstation with electronic patient records to access the clinical variables. Such a tool could perhaps be interfaced with large registry data (e.g., the ImageGuide registry of the American Society of Nuclear Cardiology), which could collect clinical variables similar to those used in this study.
This was a single-center study, and further multicenter and external validation of the derived risk score will be required. Future work should include the definition of the optimal ML threshold, to validate prospective practical clinical implementation. The sample size was modest and follow-up was only 3 years; however, all results were significant. Although training data were always separated from test data within the 10-fold cross validation, it is not yet known how well such an ML score can extrapolate among different centers, patient populations, and follow-up time. Although we included key perfusion and function imaging variables in this study, the list was not exhaustive. The derived ML score was generic and could be applied to both pharmacological and exercise stress protocols, because the ML technique uses the information about the type of test internally. However, further evaluation of ML risk stratification for MACE prediction in specific subpopulations, for example, in patients with suspected disease, patients with early revascularization, or patients undergoing adenosine protocols, may be appropriate in multicenter studies. Risk reclassification metrics have limitations such as dependence on the choice of cutoff values of the continuous probability risk score. It is likely that more appropriate threshold selection in future studies may optimize the reclassification patterns for specific clinical risks. Alternatively, the MACE risk score without any categories could also be used clinically to indicate the probability of events for a given patient. Finally, we selected a LogitBoost approach for automatic ML variable integration, as in our previous work (8), but the LogitBoost approach we used is only one of many possible ML approaches to combine multiple variables for prediction. It is possible that different approaches such as deep learning may provide better risk score derivation. However, a larger multicenter data set is required to evaluate possible advantages of other ML approaches.
ML combining both clinical and imaging data variables was found to have high predictive accuracy for the 3-year risk of MACE, and was superior to existing visual or automated perfusion assessments in isolation. This computational method could allow integrating the clinical data with imaging results for the optimal evaluation of MACE risk in patients undergoing MPI.
COMPETENCY IN MEDICAL KNOWLEDGE: Combining clinical and imaging information by an ML algorithm exhibited significantly better MACE prediction than using only imaging information or performing visual and automated perfusion assessment alone in SPECT MPI.
TRANSLATIONAL OUTLOOK: Adding clinical information to imaging data by ML will aid comprehensive MPI assessment to improve clinical patient management.
This research was supported in part by grant R01HL089765 from the National Heart, Lung, and Blood Institute/National Institutes of Health (PI: Piotr Slomka). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Drs. Betancur and Otaki contributed equally to this work.
Drs. Berman, Germano, and Slomka have received royalties from Cedars-Sinai Medical Center. All other authors have reported that they have no relationships relevant to the contents of this paper to disclose.
- Abbreviations and Acronyms
- CAD = coronary artery disease
- CT = computed tomography
- MACE = major adverse cardiac events
- ML = machine learning
- MPI = myocardial perfusion imaging
- SPECT = single-photon emission computed tomography
- TID = transient ischemic dilation
- TPD = total perfusion deficit
- 2017 American College of Cardiology Foundation
- Gimelli A., Rossi G., Landi P., et al.
- Hachamovitch R., Kang X., Amanullah A.M., et al.
- Shaw L.J., Berman D.S., Maron D.J., et al.
- Hachamovitch R., Berman D.S., Kiat H., et al.
- Sharir T., Germano G., Kang X., et al.
- Motwani M., Dey D., Berman D.S., et al.
- Betancur J., Rubeaux M., Fuchs T., et al.
- Gambhir S.S., Berman D.S., Ziffer J., et al.
- Sharir T., Slomka P.J., Hayes S.W., et al.
- Andersson M., Johansson L., Minarik D., Leide-Svegborn S., Mattsson S.
- Nakazato R., Tamarappoo B.K., Kang X., et al.
- Nakazato R., Berman D.S., Hayes S.W., et al.
- Thygesen K., Alpert J.S., White H.D.
- Friedman J., Hastie T., Tibshirani R.
- Arsanjani R., Xu Y., Dey D., et al.
- Tilkemeier P.L., Mahmarian J.J., Wolinsky D.G., Denton E.A.
- Henzlova M.J., Duvall W.L., Einstein A.J., Travin M.I., Verberne H.J.