Author information
- Received October 21, 2016
- Revision received January 24, 2017
- Accepted January 26, 2017
- Published online May 17, 2017.
- Oana Mirea, MD, PhDa,
- Efstathios D. Pagourelias, MD, PhDa,
- Jurgen Duchenne, MSca,
- Jan Bogaert, MD, PhDb,
- James D. Thomas, MDc,
- Luigi P. Badano, MD, PhDd,
- Jens-Uwe Voigt, MD, PhDa,∗
- EACVI-ASE-Industry Standardization Task Force
- aDepartment of Cardiovascular Diseases, University Hospital Leuven, Leuven, Belgium
- bDepartment of Radiology, University Hospital Leuven, Leuven, Belgium
- cBluhm Cardiovascular Institute, Northwestern University, Chicago, Illinois
- dCardiac, Thoracic and Vascular Sciences, University Padua, Padua, Italy
- ↵∗Address for correspondence:
Prof. Dr. Jens-Uwe Voigt, Department of Cardiovascular Diseases, University Hospital Gasthuisberg, Herestraat 49, 3000 Leuven, Belgium.
Objectives In this study, we compared left ventricular (LV) segmental strain measurements obtained with different ultrasound machines and post-processing software packages.
Background Global longitudinal strain (GLS) has proven to be a reproducible and valuable tool in clinical practice. Data about the reproducibility and intervendor differences of segmental strain measurements, however, are missing.
Methods We included 63 volunteers with cardiac magnetic resonance–proven infarct scar with segmental LV function ranging from normal to severely impaired. Each subject was examined within 2 h by a single expert sonographer with machines from multiple vendors. All 3 apical views were acquired twice to determine the test-retest and the intervendor variability. Segmental longitudinal peak systolic, end-systolic, and post-systolic strain were measured using 7 vendor-specific systems (Hitachi, Tokyo, Japan; Esaote, Florence, Italy; GE Vingmed Ultrasound, Horten, Norway; Philips, Andover, Massachusetts; Samsung, Seoul, South Korea; Siemens, Mountain View, California; and Toshiba, Otawara, Japan) and 2 independent software packages (Epsilon, Ann Arbor, Michigan, and TOMTEC, Unterschleissheim, Germany) and compared among vendors.
Results Image quality and tracking feasibility differed among vendors (analysis of variance, p < 0.05). The absolute test-retest difference ranged from 2.5% to 4.9% for peak systolic, 2.6% to 5.0% for end-systolic, and 2.5% to 5.0% for post-systolic strain. The average segmental strain values varied significantly between vendors (up to 4.5%). Segmental strain parameters from each vendor correlated well with the mean of all vendors (r² range 0.58 to 0.81) but showed very different ranges of values. Bias and limits of agreement were up to −4.6 ± 7.5%.
Conclusions In contrast to GLS, LV segmental longitudinal strain measurements have a higher variability on top of the known intervendor bias. The fidelity of different software to follow segmental function varies considerably. We conclude that single segmental strain values should be used with caution in the clinic. Segmental strain pattern analysis might be a more robust alternative.
Two-dimensional speckle tracking echocardiography has been proposed for improving the echocardiographic quantification of left ventricular (LV) segmental and global function. Longitudinal strain (LS) appears to be the most robust among the various myocardial strain components, and global longitudinal strain (GLS) has demonstrated added diagnostic and prognostic value in a wide range of conditions such as heart failure (1), valvular heart disease (2), and others. In a previous study by this task force, we showed good reproducibility of GLS measurements (3), suggesting that this technique can be safely used in the clinic, in particular for repeated measurements in the same patient (4,5).
Previous studies have demonstrated that the assessment of segmental LS provides added information in a wide range of pathologies (6–9). However, data about the reproducibility of segmental strain are conflicting (10,11), and intervendor differences remain to be assessed.
In the context of the ongoing work of the task force on strain standardization, which was initiated by the European Association of Cardiovascular Imaging (EACVI) and the American Society of Echocardiography in collaboration with industry (12,13), we have set up a study to investigate the robustness, reproducibility, and intervendor variability of segmental speckle tracking–based strain measurements.
The study population comprised patients with prior myocardial infarction and healthy volunteers. A short list of 63 potential patients was created from hospital records on the basis of the following criteria: 1) age >18 years and ability to consent, walk, and lie in supine position for 2 hours; 2) good acoustic window and regular heart rhythm; 3) a documented myocardial infarction within a maximum of 2 years before the study; and 4) the existence of a late gadolinium enhancement (LGE) cardiac magnetic resonance (CMR) study performed after the myocardial infarction (without other ischemic events or cardiac interventions before the image acquisitions for this study). Patients were then contacted by telephone and invited to participate in the study. Care was taken to cover a wide range of segmental and functional abnormalities. In case not all invited patients presented for the study, healthy volunteers were recruited as “gap fillers in stand-by” from the coworkers of our imaging laboratory. They were all in good condition, in sinus rhythm, and had no evidence of cardiac disease in their history, resting electrocardiography (ECG), or baseline echocardiogram.
The study was approved by the ethical commission of the University Hospitals Leuven and all subjects gave written informed consent before inclusion.
Industry partner recruitment
All industry partners within the task force were invited to participate in the study by an open letter. Seven ultrasound machine manufacturers provided an ultrasound machine, speckle tracking software, and an application specialist to optimize data acquisition for the study. Additionally, 2 manufacturers of generic software solutions for speckle tracking analysis participated in the comparison. One company (Philips) withdrew later from the study for technical reasons. A list of participants is provided in Table 1.
The echocardiographic image acquisitions were completed in 5 days during 9 sessions of 2 to 3 h each. Seven subjects were simultaneously scanned with the 7 different ultrasound machines. To minimize differences in image acquisition, 1 experienced examiner (with at least 2 years of experience in routine echocardiography, including the use of speckle tracking) was assigned to each subject and both rotated through all machines. Examiners were responsible for the acquisition of high-quality standard echocardiographic images. In addition, application specialists ensured optimal machine settings and image acquisition according to the respective manufacturers’ recommendations.
Blood pressure was measured at the beginning and at the end of the echocardiography session. Patients were examined in left lateral decubitus position. LV 4-, 3-, and 2-chamber views were acquired during breath hold. Pulsed wave Doppler recordings of the mitral inflow and aortic outflow were obtained for timing measurements. Patients then left the examination bed for at least 1 min and walked around. After that, a second set of apical views was acquired for the assessment of test-retest variability.
A minimum of 3 consecutive cycles was recorded per view. All image data were stored as raw data in a proprietary company format if available. In addition, all data were also stored in standard Digital Imaging and Communications in Medicine format to allow post-processing with the independent software packages.
All CMR studies were performed on a 1.5-T Philips Intera-CV (Philips, Best, the Netherlands). Cine images were taken in horizontal, vertical, and short-axis views. Ten minutes after intravenous bolus of 0.2 mmol/kg gadolinium-tetraazacyclododecane-tetraacetic acid (Dotarem, Guerbet, Villepinte, France), LGE images were acquired in the same views. LGE images were used to verify the existence of scar and to determine the segmental scar burden per patient using an 18-segment model (14). Healthy volunteers were assumed to have no scar without CMR examination.
Conventional echocardiographic parameters
Image quality was scored per vendor and per segment (Online Appendix). Biplane volumes and ejection fraction were calculated using the modified Simpson rule (4). The measurements were performed on the images of 1 vendor (GE).
All strain measurements were performed by a single observer (O.M.). This observer had a solid background in tissue Doppler and speckle tracking analysis (>2,000 analyses) before starting the data analysis of this study. Images were analyzed using the vendor-specific speckle tracking software. Before the analysis, the observer was trained by application specialists from each company in the use of their software packages. There was no specific order of the vendor analysis. For the 2 independent software providers, Digital Imaging and Communications in Medicine images acquired with the GE system were used.
With each software, patient data were analyzed in the order of the study identification number. In each view, the cardiac cycle with the best image quality was selected. End-diastole was manually set to the R peak of the ECG. If the software did not allow that, the automatic settings of the software were used instead. The aortic valve closure (AVC) was measured from the pulsed wave Doppler recording of the LV outflow tract and AVC was manually set to this time in all software packages.
Next, the region of interest was created either by manually tracing the endocardium or by automated recognition according to the requirements of the software. All post-processing settings were maintained as recommended by the vendor. The quality of the tracking was assessed for each segment by visually comparing the tracking result with the underlying myocardial motion. Segments were excluded from further analysis when the tracking did not accurately follow the myocardial motion after at least 2 attempts at readjusting the region of interest.
We used an 18-segment model (3 segments/wall), according to the recommendations for segmental function analysis (4,13). In each segmental strain curve, peak systolic (PS), end-systolic (ES), and post-systolic strain (PSS) peak were measured. The peaks were defined as follows: PS, maximum (positive or negative) strain value before AVC; ES, strain value at AVC; and PSS, the maximum negative deformation after AVC, if more negative than ES (Figure 1). All strain values are reported as measured (i.e., more negative values represent more shortening). Because it is common usage, we refer in the discussion to the measured absolute amount of strain (i.e., “higher strain values” indicates that measured LS values were more negative).
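The three peak definitions can be illustrated with a minimal Python sketch. It assumes a segmental strain curve sampled at known times and reads "maximum (positive or negative)" as the sample with the largest magnitude before AVC. The function name `strain_peaks` and the linear interpolation at AVC are illustrative assumptions, not the behavior of any vendor software.

```python
from typing import Optional, Sequence, Tuple

import numpy as np


def strain_peaks(
    times: Sequence[float], strain: Sequence[float], t_avc: float
) -> Tuple[float, float, Optional[float]]:
    """Return (PS, ES, PSS) in % for one segmental strain curve.

    Hypothetical helper: times are in seconds from end-diastole,
    strain is in % (negative = shortening), t_avc is the AVC time.
    """
    t = np.asarray(times, dtype=float)
    s = np.asarray(strain, dtype=float)

    # PS: strain value of largest magnitude (positive or negative) up to AVC.
    systolic = s[t <= t_avc]
    ps = float(systolic[np.argmax(np.abs(systolic))])

    # ES: strain value at AVC (linear interpolation between samples).
    es = float(np.interp(t_avc, t, s))

    # PSS: most negative value after AVC, reported only if more negative than ES.
    after = s[t > t_avc]
    pss = float(after.min()) if after.size and after.min() < es else None

    return ps, es, pss
```

For a curve that keeps shortening after AVC, the sketch returns a PSS value; for a curve that relaxes after AVC, PSS is `None`.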
Test-retest variability was assessed by intraclass correlation (ICC) (2-way mixed model, absolute agreement between single measurements) and as absolute error between repeated measurements. The bias between software packages in segmental strain was compared by repeated measures analysis of variance (ANOVA). Further, segmental measurements of each vendor were compared to the average of all vendors for the same segment. Additionally, ICC coefficients were calculated among vendors (Online Appendix).
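As a sketch of the two test-retest measures, the following Python code computes a single-measurement, absolute-agreement ICC from a subjects-by-repeats table, together with the mean absolute difference between repeats. It assumes the commonly used two-way ANOVA formula for this ICC type (often labeled ICC(A,1)). The function names are hypothetical, and the statistics package used in the study may differ in detail.

```python
import numpy as np


def icc_absolute_single(x):
    """Two-way ICC, absolute agreement, single measurement.

    x: array of shape (n_segments, k_repeats), e.g. column 0 = test,
    column 1 = retest. Assumes no missing values.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Sums of squares of the two-way layout without replication.
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between segments
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between repeats
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))


def mean_absolute_difference(test, retest):
    """Average absolute test-retest difference, in strain % points."""
    diff = np.asarray(test, dtype=float) - np.asarray(retest, dtype=float)
    return float(np.mean(np.abs(diff)))
```

Because absolute agreement is requested, a systematic offset between the first and second acquisition lowers the ICC even when the two series are perfectly correlated.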
Of 63 patients initially invited to the study, 5 dropped out (3 no-shows, 1 atrial fibrillation, 1 physical inability to complete all echocardiographic examinations) and had to be replaced by healthy volunteers. Patient characteristics are summarized in Online Table 1. As planned, a total of 882 echocardiographic examinations (2 examinations on 7 machines/subject) could be performed. Systolic arterial blood pressure increased slightly during the scanning session (128 ± 20 mm Hg to 135 ± 17 mm Hg; p < 0.05), whereas diastolic blood pressure remained unchanged (73 ± 13 to 74 ± 9; p = 0.6). The ejection fraction in our study population ranged from 28% to 73% (average 52.4 ± 9.9%).
Segmental scar burden could be defined in all 1,134 segments. Of these, 748 (66%) had no evidence of scar, 129 (11.4%) had a nontransmural scar, 241 (21.3%) a transmural scar (>75% of wall thickness), and only 16 (1.4%) were partially scarred (20% to 80% of segment length).
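The segment counts can be cross-checked with a few lines of Python. The sketch uses only numbers stated in the text (63 subjects, an 18-segment model, and the four scar categories); the variable names are illustrative.

```python
SUBJECTS = 63
SEGMENTS_PER_SUBJECT = 18

# Segment counts per scar category as reported in the text.
scar_counts = {
    "no scar": 748,
    "nontransmural scar": 129,
    "transmural scar": 241,
    "partially scarred": 16,
}

total_segments = SUBJECTS * SEGMENTS_PER_SUBJECT

# Percentage of all segments in each category, rounded to 1 decimal.
scar_percent = {
    name: round(100 * count / total_segments, 1)
    for name, count in scar_counts.items()
}
```

The four categories add up to the full set of segments, and the rounded percentages match the reported values.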
The number of segments that could not be tracked with acceptable quality differed significantly between vendors, ranging from 7.1% to 22.9% (ANOVA p < 0.05) (Figure 2). Wall-by-wall analysis revealed that the interventricular septum and the inferior wall had the highest tracking feasibility while the anterior wall segments were most difficult to track. Online Figure 1 shows the exclusion per segment.
Test-retest variability of segmental strain measurements
The test-retest agreement ranged from moderate to excellent (ICC coefficients: 0.67 to 0.90) (Table 2) and showed significant differences between vendors (ANOVA p < 0.05).
The average absolute difference between LS values from the same segment in the first and second image acquisition ranged from 2.6% to 4.9% for PS, 2.6% to 5.0% for ES, and 2.5% to 5.0% for PSS (Figure 3). Interestingly, the test/retest variability of PS, ES, and PSS was not significantly different within a given vendor, except for Samsung, where it was higher for ES (ANOVA p > 0.05 and p < 0.01, respectively).
Online Table 2 shows the absolute difference per level and per segment for PS. In general, midwall segments had the lowest test-retest variability in all vendors.
The average values of PS, ES, and PSS LS of all segments of our study cohort are displayed in Figure 4. The maximum absolute difference between the vendors with the highest and lowest values was 4.5% for all 3 parameters. In more than one-half of the post hoc comparisons, the bias between vendors reached statistical significance (ANOVA p < 0.05) (Figure 4).
The intervendor agreement of LV segmental PS values ranged from poor to good (ICC between 0.52 and 0.79, Table 3). The intervendor agreement of LV segmental ES was similar (ICC between 0.52 and 0.79, Online Table 3) and slightly lower for PSS (ICC between 0.45 and 0.77, Online Table 4).
Table 4 shows the Pearson correlation coefficients for the pairwise vendor vs. vendor comparisons of PS values. Data for ES and PSS are provided in Online Tables 5 and 6. The Bland-Altman analysis of the same comparisons is provided in Table 5 and Online Tables 7 and 8.
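For readers who want to reproduce a pairwise comparison of this kind on their own data, a minimal Bland-Altman sketch in Python follows. It assumes the conventional definition of the limits of agreement as bias ± 1.96 standard deviations of the paired differences. Whether exactly this convention underlies the reported tables is an assumption, and the function name is hypothetical.

```python
import numpy as np


def bland_altman(vendor_a, vendor_b):
    """Return (bias, half_width) so that limits of agreement are bias ± half_width.

    vendor_a, vendor_b: paired segmental strain values (%) for the same
    segments, measured with two different systems.
    """
    diff = np.asarray(vendor_a, dtype=float) - np.asarray(vendor_b, dtype=float)
    bias = float(diff.mean())
    # Sample standard deviation of the paired differences.
    half_width = float(1.96 * diff.std(ddof=1))
    return bias, half_width
```

A negative bias means that the first system reports more negative strain (more shortening) on average than the second.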
The correlation of segmental PS strain measurements from a given vendor with the segmental mean of all vendors ranged from r² = 0.58 to r² = 0.81 (Figure 5). The slopes of the regression lines reveal that the range of measured strain values differs among vendors with GE having the highest range (slope: 1.29) and Esaote having the lowest (slope: 0.82) (Figure 5). The analysis per ventricular level (apical, mid, basal) revealed that the basal segments showed the lowest correlation with the mean of all vendors (r² from 0.50 to 0.69), whereas the apical segments showed the highest (r² from 0.76 to 0.85) (Online Figure 2). The same analysis reveals that the differences in the slope of the regression line are most pronounced in the apex.
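The comparison against the mean of all vendors can be sketched in Python as follows. The sketch assumes a complete segments-by-vendors table without excluded segments and regresses each vendor's values (y) on the per-segment mean of all vendors (x). The axis assignment and the function name are assumptions made for illustration.

```python
import numpy as np


def vendor_vs_mean(strain):
    """Regress each vendor's segmental strain on the mean of all vendors.

    strain: array of shape (n_segments, n_vendors), values in %.
    Returns one (slope, intercept, r_squared) tuple per vendor.
    """
    strain = np.asarray(strain, dtype=float)
    segment_mean = strain.mean(axis=1)  # reference: mean of all vendors

    results = []
    for v in range(strain.shape[1]):
        slope, intercept = np.polyfit(segment_mean, strain[:, v], 1)
        r = np.corrcoef(segment_mean, strain[:, v])[0, 1]
        results.append((float(slope), float(intercept), float(r ** 2)))
    return results
```

Under this convention, a slope above 1 indicates that a vendor spreads its values over a wider range than the vendor average, and a slope below 1 indicates a narrower range.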
Main findings of the study
In this study, we directly compared the speckle tracking–based segmental strain measurements from 6 ultrasound machine vendors and 2 software-only vendors in a group of volunteers with myocardial segmental function ranging from normal to severely impaired. We found that: 1) the feasibility of assessing segmental strain differs significantly among vendors; 2) the test-retest variability is relatively high but has also a considerable range among vendors; 3) the intervendor bias is relevant; and 4) the range of measured segmental strain values differs between vendors.
Measurement variability: selected aspects
A more extensive discussion of potential sources of measurement variability is provided in the Online Appendix.
The specific algorithms used by speckle tracking software solutions from different vendors may have a significant impact on strain results. We have therefore aimed particularly at finding direct or indirect evidence for differences in the processing of the data.
Most of the speckle tracking algorithms apply noise reduction by temporal and spatial smoothing. It can be expected that extensive smoothing improves the robustness of GLS assessments but may lead to a lower sensitivity towards small segmental or temporal abnormalities. In our comparisons, we found a considerable difference in the range of measured segmental strain values among different vendors. A high range of values could theoretically be due to high noise levels; however, because the correlation between vendors was acceptable, it must be assumed that the range of measured strain values from a given software rather reflects how rigid its assumptions about the homogeneity of myocardial function are (i.e., how much spatial smoothing is applied to the data). A higher range of values would then reflect the better fidelity of a software in following myocardial motion locally.
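The effect of spatial smoothing on the range of values can be illustrated with a toy Python example. It applies a simple 3-point weighted average along neighboring segments. This kernel is purely an assumption for illustration and does not represent the processing of any vendor.

```python
import numpy as np


def smooth_segments(strain, kernel=(0.25, 0.5, 0.25)):
    """Toy spatial smoothing along a chain of neighboring segments."""
    s = np.asarray(strain, dtype=float)
    pad = len(kernel) // 2
    # Repeat the edge values so the output has one value per segment.
    padded = np.pad(s, pad, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def value_range(strain):
    """Difference between the largest and smallest strain value (% points)."""
    s = np.asarray(strain, dtype=float)
    return float(s.max() - s.min())


# Six segments of one apical view: two dysfunctional segments (-5%)
# surrounded by normally contracting segments (-20%).
raw = [-20, -20, -5, -5, -20, -20]
smoothed = smooth_segments(raw)
```

In this example the range of values shrinks from 15 to 11.25 percentage points: the abnormal segments appear less abnormal and their neighbors less normal, which is the loss of local fidelity described above.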
Strain measurements rely strongly on the definition of cardiac time events (15). An accurate identification of end-diastole and ES is of particular importance in segmental disease when the timing of strain peaks becomes as important as the amplitude. In the noncommercial Samsung software, where options for manual setting of AVC were limited because of a preliminary user interface, we consequently found a clearly higher variability in ES strain measurements which depend most on the accurate definition of AVC (Figures 1 and 3).
Longitudinal deformation is highest in the endocardium and lowest in the epicardium (16). It is therefore important to consider where LS is measured. So far, there is no sufficient evidence to decide if endocardial, midwall, or full wall strain is the best choice for clinical use. In this study, endocardial strain was analyzed for purely practical reasons because it was the only LS parameter that could be provided by all vendors.
Feasibility of assessing segmental strain
The feasibility of assessing segmental strain was different among vendors, which is likely due to both differences in image quality and tracking algorithm. The differences observed among GE, Epsilon, and TOMTEC can be solely attributed to the applied software package because the same image datasets were used for analysis.
We also found that the anterior and the lateral walls have the highest rate of exclusions. This is in agreement with previous reports (10) and may be related to the high burden of artifacts and noise in this region leading to a poorer recognition of speckles.
Variability of repeated strain measurements (test-retest variability)
A number of studies evaluated the intraobserver and interobserver variability of segmental strain with conflicting results. Mavinkurve-Groothuis et al. (17) assessed the reproducibility of segmental strain in 1 vendor in a small number of normal volunteers and found that it was good in 4-chamber views but poor in the 2-chamber views. In the HUNT (Nord-Trøndelag Health) study, the segmental strain analysis was performed on a larger number of segments, and the results showed poor intraobserver and interobserver reproducibility (11). A more recent study, which reported reproducibility data of PS strain, showed very good test-retest agreement (ICC ranged from 0.88 to 0.97) (10). It is not clear, however, which exact settings of the ICC test were used.
In the present study, we found that the average absolute difference between repeated measurements of different LS parameters ranged from 2.5% to 5.0% (disregarding the 1 outlier of 6.4%, which can be attributed to timing issues). Although the lower end of this range might still be considered acceptable under certain conditions, the higher end constitutes an average relative error in the range of 25%, which jeopardizes the clinical use of a single segmental strain measurement. Our analysis revealed that the segmental strain reproducibility can differ between apical, mid, and basal segments, which again likely reflects different underlying algorithms in the respective software packages.
To our knowledge, this is the first study to investigate the intervendor differences of segmental strain in a clinical setting using 8 different software packages. The maximal absolute difference between the vendor with the highest and the lowest values was 4.5% for all 3 measured strain parameters, which is slightly higher than the 3.7% earlier reported for GLS (3).
The higher reproducibility of GLS could be related to the averaging algorithms over larger regions of the myocardium and inclusion of models of LV behavior. In contrast, segmental strain has to rely entirely on the accuracy of the local tracking results and the quality of the artifact detection algorithms.
Several ANOVA post hoc tests showed significant differences among vendors (Figure 4). Different definitions of endocardial strain may partially explain these differences. Although some vendors calculate strain values at a virtual endocardial border, others track subendocardially (up to one-third of myocardial thickness), which might result in lower values.
Lacking a reliable “ground truth” in this clinical setting, we compared segmental measurements of each vendor with the average of all vendors for the same segment (Figure 5). It is reassuring that we found a moderate to good linear correlation for all vendors, which is in contrast to earlier studies (18). Analysis per LV level revealed that the basal segments had the lowest agreement, which is consistent with previous studies (19).
Although the correlation coefficients indicated overall good intervendor agreement, the absolute segmental differences and limits of agreement were relatively large, indicating that a considerable amount of noise is superimposed on the overall bias in measurements between vendors (Table 5).
Good local tracking versus susceptibility for noise
If we assume from the above that the test-retest variability reflects to a large extent the robustness of the tracking algorithm and that a wide range of values is an indicator of good fidelity to segmental abnormalities, then a good software should combine both. We have therefore combined both measures in a comprehensive graph (Figure 6).
Although the present study was set up to mimic clinical routine, several parameters were controlled to prepare an optimal environment for a fair comparison of different machines and software packages: patients were selected for better than average image quality; repeated scans were performed by the same expert examiner; except for Esaote (portable device), only high-end ultrasound machines were used; and a company representative ensured technically optimal acquisitions. Moreover, the analysis was performed by an expert observer (with solid background in tissue Doppler and speckle tracking analysis). We must therefore assume that the variability of segmental strain measurements will be even larger in a real-world clinical setting. It must be further assumed that involving different software versions of the same vendor would add to the disagreement of measurements. The interobserver variability was not tested. It is, however, expected that it is even higher.
In this study, we have tested the accuracy and reproducibility of different peak strain parameters only. We did not investigate how reliably the shape of a strain curve is reproduced by a software or how well relative differences between regions are reflected independent from their absolute strain values. Furthermore, we have not tested the reproducibility of the timing of strain peaks, which would be relevant for any dyssynchrony related function assessment. All this remains a task for the further analysis of this dataset.
For this paper, cardiac magnetic resonance was used solely to characterize patients with an infarct because it was our intention to include patients with a wide range of abnormalities. A detailed investigation of the relation between strain values and scar extent or scar transmurality would reach far beyond the scope of this reproducibility study.
In contrast to GLS measurements, which showed excellent reproducibility and only a moderate, yet significant bias between vendors (3), segmental LS measurements have a higher degree of measurement variability. Therefore, single segmental strain measurements should be used for clinical decision-making, monitoring, and research only with caution. The extent to which other means of segmental function assessment, such as strain curve shape analysis or relative comparison between regions, could compensate for the relatively poor segmental reproducibility remains to be determined.
COMPETENCY IN MEDICAL KNOWLEDGE: This study deals with an essential topic, the intervendor reproducibility of segmental LS, and provides a comprehensive comparison between different software packages. Our findings show that segmental strain measurements have a considerable test-retest variability and intervendor bias, suggesting that such measurements should be used with prudence in research and clinical practice. Whether other characteristics of strain, such as curve shape, are more robust and specific markers of disease remains to be determined.
TRANSLATIONAL OUTLOOK: GLS has demonstrated clinical relevance and is now implemented in daily practice. Segmental LS parameters could also provide a broad spectrum of diagnostic options; however, additional improvements from companies are still required to increase the reproducibility of segmental strain measurements.
The authors thank all industry partners for their active support and constructive contribution to this project. The authors also thank Sarah Magits for her excellent logistic support and help with patient recruitment; our technicians Sarah Fabré, Ibn Tielens, Monique Tillekaerts, Anita Tuteleers, and Jolien Vissers; and our colleagues, doctoral students, research fellows, and assistants Claire Bouleti, Guido Claessen, Charlien Gabriels, Kaatje Goetschalckx, Peter Haemers, Thibault Petit, Frédéric Schnell, Daisy Thijs, and Katrien De Vadder for their help with patient scanning and data processing.
For supplemental material, figures, and tables, please see the online version of this article.
Dr. Mirea is permanently affiliated to the Department of Cardiology, University Hospital of Craiova, Romania. This study was supported by a dedicated grant from the American Society of Echocardiography. Dr. Mirea has received a research grant from the European Association of Cardiovascular Imaging. Dr. Pagourelias holds a research grant from the European Association of Cardiovascular Imaging. Dr. Thomas has received honoraria and consulting fees from Edwards, Abbott, and GE. Dr. Voigt holds a personal research mandate from the Flemish Research Foundation; and has received a research grant from the University Hospital Gasthuisberg. All other authors have reported that they have no relationships relevant to the contents of this paper to disclose.
- Abbreviations and Acronyms
- ANOVA = analysis of variance
- AVC = aortic valve closure
- CMR = cardiac magnetic resonance
- GLS = global longitudinal strain
- ICC = intraclass correlation
- LGE = late gadolinium enhancement
- LS = longitudinal strain
- LV = left ventricular
- PS = peak systolic
- PSS = post-systolic strain
- 2017 American College of Cardiology Foundation
- Motoki H., Borowski A.G., Shrestha K., et al.
- Lang R.M., Badano L.P., Mor-Avi V., et al.
- Plana J.C., Galderisi M., Barac A., et al.
- Phelan D., Collier P., Thavendiranathan P., et al.
- Bertini M., Ng A.C., Antoni M.L., et al.
- Barbier P., Mirea O., Cefalù C., Maltagliati A., Savioli G., Guglielmo M.
- Thomas J.D., Badano L.P.
- Voigt J.U., Pedrizzetti G., Lysyansky P., et al.
- Mada R.O., Lysyansky P., Daraban A.M., Duchenne J., Voigt J.U.