- Published online December 3, 2018.
- Ahmed S. Fahmy, PhD,
- Johannes Rausch, MS,
- Ulf Neisius, MD, PhD,
- Raymond H. Chan, MD,
- Martin S. Maron, MD,
- Evan Appelbaum, MD,
- Bjoern Menze, PhD and
- Reza Nezafat, PhD∗
- ∗Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, Massachusetts 02215
Scar volume quantified by cardiovascular magnetic resonance (CMR) with late gadolinium enhancement (LGE) is a novel imaging biomarker for risk stratification in patients with hypertrophic cardiomyopathy (HCM) (1). In current practice, scar quantification often relies on manual delineation of the myocardial borders and the hyperenhanced regions on LGE images, which is subjective, laborious, and time-consuming. Variations among different CMR centers and core laboratories reduce the reproducibility of scar quantification (2). In addition, variable gadolinium kinetics and the patchy, multifocal appearance of hyperenhancement in patients with HCM make automatic quantification more challenging than in other CMR applications (2).
In this study, we present an initial proof-of-concept for using deep convolutional neural networks (DCN) to automatically quantify left ventricle (LV) mass and scar volume on LGE in patients with HCM. We used a U-net DCN architecture with 150 operational layers, including batch normalization, convolutional, rectified-linear, and dropout layers (3). The network parameters were initially set to random values sampled from a standard Gaussian distribution. LGE images of 1,041 patients with HCM (7,775 images), acquired and manually segmented by an expert reader as part of a multicenter/multivendor study (1), were split into training (80%) and testing (20%) subsets. Stratified random sampling was used to achieve a balanced number of cases with scar and of cases from each vendor. Data augmentation with elastic deformation, translation, and mirroring of the training images was used to increase the dataset size synthetically and to incorporate prior knowledge (e.g., image segmentation should be invariant to image translation). After an off-line training phase, the DCN was used to automatically segment the scar and LV in the testing dataset. The segmented scar volume and LV mass were assessed relative to the manual segmentation on per-slice and per-patient bases. The spatial overlap between the automatically and manually segmented regions was assessed by the Dice similarity coefficient (DSC). The testing dataset was assessed by a second expert to grade the image quality (low, medium, and high), and the DCN performance was compared among the different quality levels. To evaluate the performance of scar segmentation for different training/testing subsets, we repeated the dataset splitting, training, and testing of our DCN 4 times and compared the performance of each evaluation.
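For readers unfamiliar with the DSC, the overlap measure can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `dice_coefficient`, the nested-list mask representation, and the `empty_value` convention for two empty masks are all assumptions made for illustration, and the example masks are made up.

```python
def dice_coefficient(auto_mask, manual_mask, empty_value=1.0):
    """Dice similarity coefficient, 2*|A and B| / (|A| + |B|), for two binary masks.

    Masks are nested lists (rows of 0/1 pixels); this layout is an
    illustrative assumption, not the format used in the study.
    """
    auto = [bool(p) for row in auto_mask for p in row]
    manual = [bool(p) for row in manual_mask for p in row]
    if len(auto) != len(manual):
        raise ValueError("masks must contain the same number of pixels")

    intersection = sum(a and m for a, m in zip(auto, manual))
    total = sum(auto) + sum(manual)
    if total == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        # The source does not state how this case was handled.
        return empty_value
    return 2.0 * intersection / total


# Hypothetical 3x3 masks: 3 pixels each, 2 of them shared.
auto = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
manual = [[0, 1, 0],
          [0, 1, 1],
          [0, 0, 0]]
print(round(dice_coefficient(auto, manual), 3))  # 2*2 / (3+3) -> 0.667
```

A DSC of 1 indicates identical regions and 0 indicates no overlap. For small, patchy regions, a few mismatched pixels can lower the score noticeably.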
Automatic segmentation was achieved in 0.26 s/image (Figure 1A). The DCN detected scar in 52 patients with scar volume 6.1 ± 7.4 cm³ compared with manual detection of 60 patients with volume 7.5 ± 8.5 cm³. The automatically and manually segmented scar volumes (over all testing images) were strongly correlated in per-patient (rs = 0.84, r = 0.90; p < 0.001) (Figure 1B) and per-slice (rs = 0.81, r = 0.84; p < 0.001) analyses. A strong correlation was also observed between the manually and automatically estimated LV mass in per-patient (rs = 0.95, r = 0.96; p < 0.001) and per-slice (rs = 0.93, r = 0.93; p < 0.001) analyses. The segmentation accuracy (measured by DSC) between automatic and manual segmentations was 0.57 ± 0.23 (per-patient) and 0.58 ± 0.28 (per-slice) for the scar, and 0.82 ± 0.08 (per-patient) and 0.81 ± 0.11 (per-slice) for the LV. The DSC of LV segmentation was lower in the apical slices compared with other slice locations (0.70 ± 0.2 vs. 0.83 ± 0.10; p < 0.001). No significant differences were observed in the scar DSC among slices with different image quality levels in per-patient (p = 0.86) or per-slice (p = 0.65) analyses. Repeating the training/testing of the DCN showed no significant effect on scar DSC in per-patient (p = 0.64) or per-slice (p = 0.23) analyses.
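The two correlation statistics reported above, presumably Spearman's rank correlation (rs) and Pearson's correlation (r), can be sketched as follows. This is a minimal standard-library illustration, not the authors' analysis code: the function names are hypothetical, the volumes are made-up values rather than study data, and the significance tests behind the reported p values are not reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical per-patient scar volumes (cm^3), for illustration only.
manual_volumes = [0.0, 2.0, 5.0, 9.0]
auto_volumes = [0.0, 1.0, 6.0, 20.0]
print(spearman_rs(manual_volumes, auto_volumes))  # monotonic pairs -> 1.0
print(pearson_r(manual_volumes, auto_volumes))    # below 1: relation is not linear
```

Reporting both statistics is informative because rs reflects only whether the automatic volumes preserve the ordering of the manual ones, whereas r is also sensitive to departures from a linear relationship.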
The results of this study show the potential of deep learning as a tool for automatic segmentation of the LV and scar volume in patients with HCM, with strong agreement between the automatic and manual segmentations. Limitations of the study include the lack of testing on an independent dataset and the absence of an interobserver evaluation of the manual segmentation. Further improvements in the network architecture, as well as a larger and more diverse training dataset, are needed to improve the relatively low DSC for scar segmentation.
Please note: This project was supported in part by National Institutes of Health 1R01HL129185 (Bethesda, Maryland) and 1R21HL127650 (Bethesda, Maryland); and American Heart Association 15EIA22710040 (Dallas, Texas).
The authors have reported that they have no relationships relevant to the contents of this paper to disclose.
- 2018 The Authors
- (1) Chan R.H., Maron B.J., Olivotto I., et al.
- (2) Engblom H., Tufvesson J., Jablonowski R., et al.
- (3) Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015. 18th International Conference, Munich, Germany, Proceedings, Part III. New York, NY: Cham Springer International Publishing; 2015: 234–41.