- Received September 30, 2010
- Revision received June 3, 2011
- Accepted June 8, 2011
- Published online August 1, 2011.
- Amer M. Johri, MD,
- Michael H. Picard, MD,
- John Newell, BA,
- Jane E. Marshall, RDCS,
- Mary Etta E. King, MD and
- Judy Hung, MD⁎
- ⁎Reprint requests and correspondence:
Dr. Judy Hung, Blake 256, Massachusetts General Hospital, 55 Fruit Street, Boston, Massachusetts 02114
Objectives This study sought to determine whether a formalized teaching intervention could reduce the interobserver variability (IOV) in visual estimation of left ventricular ejection fraction (LVEF) within a group of sonographers and physicians with a spectrum of experience.
Background Precise and reliable echocardiographic assessment of LVEF is necessary for clinical decision-making and minimizing duplicative testing. Skill in the visual estimation of LVEF varies depending on experience and is critical for corroborating EF quantification. IOV may also lead to inconsistency if multiple readers are assessing the EF on serial exams.
Methods Fourteen cases of 2-dimensional echocardiograms were shown to 25 participants who estimated the EF based on a complete assessment of LV wall motion including parasternal, short-axis, apical, and subcostal views. The cases represented a spectrum of EF range, image quality, and clinical context. Following the initial interpretations, participants underwent a teaching intervention involving tutorial review of reference cases and group discussion of each case with determination of the EF guided by quantitative measure (biplane Simpson method). Three months after the teaching intervention, 14 new cases were shown to the 25 participants following the same methodology.
Results IOV was quantified before and after the teaching intervention with the use of a 3-factor, nested analysis of variance. The factors were: observer, patient, and pre- and post-intervention (time). The analysis of variance showed that the intervention reduced the IOV for the 25 readers between the pre- and post-intervention assessments (F = 2.8, p = 0.007). The IOV decreased from ±14% EF prior to intervention to ±8.4% EF following intervention (a 40% reduction in IOV).
Conclusions In a large echocardiography laboratory with a wide range of training levels and experience, a simple, formalized teaching intervention can successfully diminish IOV of LVEF assessment. This intervention provides not only discrete quality measures, but also serves as a practical tool to document and improve quality of reporting, potentially reducing clinical inefficiencies and repeat testing.
In this era of diminishing resources for imaging and increased attention to efficiency and appropriateness of imaging, methods to measure and improve quality in the echocardiography laboratory have become particularly relevant. High-quality echocardiography, meaning accuracy and consistency in interpretation, is essential when guiding clinical practice and answering questions such as the assessment of left ventricular (LV) function. It is also recognized by a number of national cardiology societies that quality can only be maintained through a pursuit of continuous improvement in echocardiographic performance and interpretation, and that this task is an important responsibility of all echocardiography labs (1–4) and is our professional imperative (5). In this study, we present a means by which our laboratory assessed the quality of echocardiographic interpretation of LV function and tested a teaching program to improve accuracy and consistency of interpretation.
We chose to assess the quality of LV function assessment in our lab not only because of the clinical importance of this question, but also because LV wall motion and function analysis are among the most complex interpretive echocardiography skills to master (6). The estimation of ejection fraction (EF) is an integral component of determining LV function and 1 of the most common reasons for echocardiography referral. Furthermore, precise and reliable echocardiographic assessment of LVEF is necessary for clinical decision-making and minimizing duplicative testing by echocardiography or other imaging modalities. It has been recommended that EF be quantified by the biplane Simpson method (7) and then the measurement corroborated by visual estimation. Thus, the visual estimation of EF is an essential skill relevant to all echo labs, laying the foundation for quantification.
Cardiac ultrasound laboratories can have many participants with widely varying levels of training and clinical practice and therein lies the challenge of implementing a program to assess and improve quality among a large group of interpreters. In this report, we present a stepwise approach to assessing and improving quality of LV assessment in such a group. Our objective was to first determine among a population of students, sonographers, and physicians with varying years of experience the interobserver variability (IOV) associated with both the visual estimation and quantification of LVEF. We then developed a case-based teaching intervention and tested its effectiveness by quantifying the change in IOV following the intervention.
Assessment of the IOV in the estimate of LVEF
The quality assessment and teaching intervention consisted of 6 1-h sessions performed over a 6-month period (Fig. 1). The first session involved a baseline evaluation of an LVEF estimate on a series of cases presented simultaneously to the group (pre-intervention assessment). Following the baseline assessment, 3 additional 1-h sessions were devoted to the teaching intervention over a 12-week period. This teaching intervention involved reviewing the cases in a group, facilitator-led format and making new cases available for further self-directed review. The final 2 sessions comprised the re-evaluation phase with discussions and were held 3 months after the teaching intervention to allow an interval to elapse (post-intervention assessment). The re-evaluation phase involved repeating the evaluation of an LVEF estimate on a new set of cases with feedback and group discussion of the results. A late follow-up session was conducted approximately a year later in a subgroup of participants.
Baseline assessment pre-teaching intervention
Fourteen 2-dimensional transthoracic echocardiograms were shown to 25 participants of various training levels and expertise. Participants consisted of sonography students, sonographers, cardiologists in an echocardiography fellowship, and staff cardiologists who were level III certified in echocardiography. All participated in the exercise simultaneously and were blinded to each other's interpretation.
Participants viewed edited versions of selected clinical echocardiograms, each of which consisted of multiple views of the LV, including multiple parasternal, short-axis, apical, and subcostal views. Participants had an equal amount of time (5 min per case) to review the case and provide a visual estimate of the LVEF as a single integer. Participants were assigned an answer sheet with a numerical identifier so that their answers remained anonymous but could be tracked.
To avoid bias in the visual estimation of LVEF, no information regarding the reason for referral, previous echocardiogram result, or officially reported EF was provided. In addition, all cases were new to the participants and had not been previously reviewed by them. The cases selected represented a spectrum of EF range, image quality, and clinical context and reflected a full spectrum of studies seen at our hospital. Image quality was graded based on the acoustic detail of the LV walls and endocardium visualized (quality: 1 = excellent acoustic detail; 2 = good acoustic detail; 3 = adequate acoustic detail; 4 = technically difficult study with poor or inadequate visualization of LV walls and/or endocardium).
The teaching intervention
After collection of the responses, a teaching intervention was conducted consisting of 3 1-h long facilitator-led tutorials. These sessions occurred during a regularly scheduled conference time to optimize attendance and were facilitated by senior echocardiography staff.
The first teaching session consisted of presentation of 10 selected reference cases, which were examples of LV function across a spectrum of EF ranges (LVEFs between 0.10 and 0.19, 0.20 and 0.29, 0.30 and 0.39, 0.40 and 0.49, 0.50 and 0.59, and LVEF >0.60). These cases were different from those displayed during the baseline assessment and had an EF measured using the biplane Simpson method by 2 experienced readers, each of whom had over 20 years of experience interpreting echocardiograms, including LVEF quantitation methods. These reference cases allowed for presentation of a common standard of EF ranges, with each case discussed by senior staff. Specific discussion points involved the effect of wall motion abnormalities, abnormal septal motion, dropout of LV endocardium, and arrhythmias on LVEF assessment. In addition, the biplane Simpson method was displayed and reviewed for each case, with specific discussion points involving choosing the end-diastolic volume and end-systolic volume frames for tracing, choosing the endocardial line, and excluding papillary muscles. In addition to these sessions, all cases were made available on a common server as a digital presentation to the participants for further self-directed review.
Following the presentation of the reference cases, in the remaining 2 1-h sessions, each of the 14 baseline cases was presented again to the group, followed by a graph plotting the single-integer EF visual estimates of each participant. This graph also displayed the group mean of the visual estimates and the “true” LVEF measured by the biplane Simpson method of discs, performed by a single experienced reader. Each EF data point or visual estimate was identified by the anonymous numerical code, allowing participants to appreciate their individual number relative to the group mean as well as the variability (Fig. 2). Importantly, this graph provided a springboard for further discussion. The images were then presented again, and discussion centered on factors that influenced the participants' LVEF estimates. Learning objectives derived from the discussion included: 1) technical aspects of image acquisition affecting LVEF assessment (foreshortening, poor image quality); 2) impact of LV remodeling (wall motion abnormality, aneurysm formation) on assessment of EF; 3) impact of arrhythmias (bradycardia, tachycardia, irregular rhythm); and 4) impact of abnormal septal motion (pacing, conduction delays) on LVEF assessment.
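The per-case feedback described above (each reader's estimate set against the group mean and the quantified reference EF) can be tabulated with a few lines of Python. This is only an illustrative sketch: the function name `case_feedback`, the reader codes, and all EF values are hypothetical and are not taken from the study.

```python
from statistics import mean


def case_feedback(estimates_by_reader, reference_ef):
    """Summarize one case for group feedback.

    estimates_by_reader: dict mapping an anonymous reader code to that
        reader's visual EF estimate (integer percent).
    reference_ef: quantified EF for the case (percent), e.g. by biplane Simpson.
    Returns the group mean and, per reader, the signed difference from
    the group mean and from the reference.
    """
    group_mean = mean(estimates_by_reader.values())
    rows = {
        reader: {
            "estimate": estimate,
            "vs_group_mean": estimate - group_mean,
            "vs_reference": estimate - reference_ef,
        }
        for reader, estimate in estimates_by_reader.items()
    }
    return group_mean, rows


# Hypothetical readings from three readers for a case with a reference EF of 38%.
example_estimates = {101: 30, 102: 40, 103: 50}
example_mean, example_rows = case_feedback(example_estimates, 38)
```

Plotting `example_rows` by reader code, with horizontal lines at the group mean and the reference EF, would give a display of the kind described for Figure 2.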
Post-teaching intervention assessment
Three months following the end of the 12-week teaching intervention, using the same methodology as described in the previous section, 14 new cases were shown to the 25 participants. These cases were selected to have similar pathologies and image quality as those of the baseline assessment. Again, all participants were blinded to each other's assessments and retained the numerical code assigned to them prior to the teaching intervention. Assessment of IOV was calculated post-intervention and compared with pre-intervention performance.
Assessment of Misclassification (Accuracy)
For each case, individual visual estimates of LVEF were compared with the expert-derived EF quantified using the Simpson biplane method of discs. A visual estimate was termed a misclassification if it differed from the quantified EF by more than ±0.05. A single expert (J.H.) derived the EF for all cases using a consistent methodology. This expert is a level III trained echocardiographer at Massachusetts General Hospital with >20 years of experience, a high volume of echocardiograms interpreted per year, and extensive experience in employing the biplane method of discs for both clinical and research purposes. This expert was blinded to both the reported EF and the visual estimates collected from the participants of this exercise.
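The misclassification rule can be written as a short Python sketch. EF values are expressed here in percentage points (so the ±0.05 threshold becomes 5 points), which is an assumption made to avoid floating-point edge cases; the function names and the example readings are hypothetical.

```python
def is_misclassified(visual_ef_pct, quantified_ef_pct, tolerance_pct=5):
    """True if the visual estimate differs from the quantified EF
    by more than the tolerance (all values in EF percentage points)."""
    return abs(visual_ef_pct - quantified_ef_pct) > tolerance_pct


def misclassification_rate(visual_estimates_pct, quantified_ef_pct, tolerance_pct=5):
    """Fraction of visual estimates for one case that are misclassified."""
    flags = [
        is_misclassified(v, quantified_ef_pct, tolerance_pct)
        for v in visual_estimates_pct
    ]
    return sum(flags) / len(flags)


# Hypothetical readings for a single case with a quantified EF of 40%.
example_readings = [35, 40, 45, 46, 30, 55]
example_rate = misclassification_rate(example_readings, 40)
```

Under this rule, an estimate exactly 5 points from the reference (35 or 45 in the example) still counts as agreement, because the threshold is "more than" ±0.05.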
In addition, to assess whether the effect of the teaching intervention remained for a longer period than 3 months, a subgroup of 8 participants underwent a repeat evaluation 12 months following the teaching intervention by reviewing 14 similar cases.
Assessment of the IOV in the quantitative measurement of LV volume and EF
We assessed IOV in the quantitative measurement of LVEF in a smaller exercise involving 5 entry-level sonographers and students. For this exercise, 5 new cases of various EF ranges (different from those used for the visual estimation exercise) were selected and shown to 5 participants. The participants were asked to work individually at a workstation to determine the LVEF by Simpson biplane method of discs. The final EF determination, as well as the end-systolic and end-diastolic volumes, from 4- and 2-chamber views were recorded. All 5 cases were then reviewed in a small group format facilitated by an expert cardiologist with greater than 15 years of experience in echocardiographic quantification. During this teaching intervention, participants' answers were reviewed and all cases were remeasured by the expert to demonstrate the proper approach to the Simpson biplane method of discs (per American Society of Echocardiography recommendations). Following this teaching intervention, 5 new cases were selected and provided to the same 5 participants for quantification of EF and determination of the IOV post-intervention.
The IOV in the visual estimation of LVEF was analyzed pre- and post-intervention for each case for the 25 participants. A nested 3-factor analysis of variance (ANOVA) (with factors: observer, patient, and pre- and post-intervention, i.e., time, with patient nested in time) confirmed that the mean EF across patients changed significantly across the teaching intervention and that there was a significant observer × time interaction. Because of this interaction, IOV was quantified by 2 separate ANOVAs, 1 for pre- and 1 for post-intervention. In each of these 2 analyses, IOV was taken as the square root of the observer mean square for error (MSE) estimated in the ANOVA. The pre- and post-intervention IOVs were compared using the approximately F-distributed ratio of the 2 observer MSEs (F = pre-intervention observer MSE / post-intervention observer MSE). The IOV for the late follow-up comparison was analyzed using the same statistical methods. The IOV in the determination of LVEF using the Simpson biplane method of discs was also analyzed pre- and post-intervention. A split ANOVA was used to compare IOV of volumes and EF before and after the teaching intervention. Fisher exact test was used to determine differences in misclassification rates pre- and post-intervention. A t test for proportions was used to determine whether the change in misclassification rate for mid-range cases was different from the change in misclassification rate for cases with normal or severely impaired EF.
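To make the idea of "IOV as the square root of a mean square" concrete, the following Python sketch uses a deliberately simplified stand-in for the nested ANOVA: it pools the within-case variance of the observers' estimates (the within-group mean square of a 1-way ANOVA with case as the factor). This simplification, the function names, and the data are assumptions for illustration and are not the authors' actual computation.

```python
from statistics import variance


def within_case_mean_square(ratings_by_case):
    """Pooled within-case variance of EF estimates.

    ratings_by_case: list of lists, one inner list of observer
        estimates (EF percent) per case.
    """
    sum_of_squares = 0.0
    degrees_of_freedom = 0
    for estimates in ratings_by_case:
        n = len(estimates)
        # variance() is the sample variance, so multiply back by (n - 1).
        sum_of_squares += variance(estimates) * (n - 1)
        degrees_of_freedom += n - 1
    return sum_of_squares / degrees_of_freedom


def iov(ratings_by_case):
    """Variability expressed in EF units: square root of the mean square."""
    return within_case_mean_square(ratings_by_case) ** 0.5


def variance_ratio(pre_ratings, post_ratings):
    """Ratio of pre- to post-intervention mean squares (the F-type ratio)."""
    return within_case_mean_square(pre_ratings) / within_case_mean_square(post_ratings)


# Hypothetical estimates: 2 cases, 3 observers, before and after teaching.
pre_example = [[30, 40, 50], [55, 65, 75]]
post_example = [[35, 40, 45], [60, 65, 70]]
```

In this toy data set the spread halves after the intervention, so the IOV halves while the ratio of mean squares is 4, which illustrates that the F-type ratio scales with the square of the IOV ratio.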
Of the 25 people participating in the full exercise, 14 were sonographers and 11 were physicians. The mean years of experience was 12 (±10) years; however, the range of experience varied greatly (1 to 35 years of experience). Seven participants had >20 years of experience, and 9 had <5 years of experience. The sonographer group included students who were completing their internship in an accredited program. All physicians had cardiology training and included clinical fellows completing subspecialization in echocardiography (level III training).
Information regarding each case shown pre- and post-intervention is shown in Table 1. Initially there were 14 cases pre- and post-intervention; however, 1 case was withdrawn from the post-intervention group due to technical display difficulties leaving 13 in total. There were 350 responses pre-intervention (14 cases, 25 participants) and 325 responses (13 cases, 25 participants) post-intervention. The same 25 interpreters participated in the pre- and post-intervention exercises. The EF range and quality range of cases displayed was similar in both pre- and post-intervention groups. The majority of studies were quality 2 and 3, and no quality 4 studies were shown.
The left-hand panel in Figure 2 shows the LVEF visual estimates from 25 participants for a case of moderate LV impairment pre-intervention with the group mean and quantified EF displayed. The right-hand panel in this figure shows the LVEF visual estimates from the same 25 participants for a case of moderate LV impairment post-intervention. There was an observable decrease in LVEF IOV by the group post-intervention for all cases. Pre-intervention, the ANOVA showed that the IOV for visual estimation of LVEF considering all cases was ±0.14 EF. Following intervention, the IOV was significantly decreased (F = 2.8, p = 0.007) to ±0.08 EF, representing a 40% reduction in IOV. When categorized broadly by level of experience (<5 years and >5 years), there was no difference in the degree of improvement in IOV pre- versus post-intervention.
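The reported figures can be cross-checked with simple arithmetic. Because IOV is defined as the square root of a mean square, the ratio of mean squares should be close to the squared ratio of the IOVs. The sketch below uses the unrounded post-intervention value of 8.4% EF given in the abstract; the variable names are illustrative.

```python
iov_pre = 0.14    # pre-intervention IOV (EF units), as reported
iov_post = 0.084  # post-intervention IOV (EF units), from the abstract

# Relative reduction in IOV.
relative_reduction = (iov_pre - iov_post) / iov_pre

# IOV is the square root of a mean square, so the ratio of mean squares
# is approximately the squared ratio of the IOVs.
approx_f_ratio = (iov_pre / iov_post) ** 2

print(f"relative reduction: {relative_reduction:.0%}")
print(f"approximate F ratio: {approx_f_ratio:.1f}")
```

Both values agree with the 40% reduction and F = 2.8 stated in the text.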
We reassessed IOV in a small subgroup that underwent repeat evaluation 12 months following teaching intervention. For this group, the IOV pre-intervention was 0.15, which was significantly larger (F = 4.4, p < 0.00001) than IOV post-intervention (0.073). Most importantly, the IOV for the late follow-up (0.061) was significantly smaller than the pre-intervention IOV (F = 6.2, p < 0.00001) and slightly smaller than the post-intervention IOV (F = 1.4, p = 0.064). This suggests that the impact of the teaching intervention on reducing variability can be sustained.
Assessment of misclassification
A response was considered misclassified if the visually estimated EF differed by more than ±0.05 from the quantified EF derived by an expert for each case using the biplane method of discs. The misclassification rate was determined from the 350 responses pre-intervention and from 325 responses post-intervention. Table 2 lists the misclassification rate for each session. For both pre- and post-intervention, cases were further divided into 2 groups, the first representing mild to moderate impairment (mid-range EF 0.30 to 0.55), and the second group representing normal (EF >0.55) or severe impairment (EF <0.30). There were 8 mid-range cases pre-intervention and 7 cases post-intervention. Pre-intervention, the misclassification rate was 44% for all cases and was reduced to 29% following intervention (p < 0.00001). For the mid-range EF cases (mild to moderately impaired EF), the misclassification rate of 66% was significantly reduced to 31% post-intervention (p < 0.00001). For cases with normal or severely impaired EF, the initial misclassification rate of 17% was reduced to 1% post-intervention (p < 0.00001). The change in misclassification rates for mid-range cases was greater than the change for the cases with normal or severely impaired EF but was just short of reaching statistical significance (p = 0.058).
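For readers who want to reproduce this kind of comparison, a 2-sided Fisher exact test on a 2 × 2 table of misclassified versus correctly classified responses can be sketched in plain Python. The implementation below sums the probabilities of all tables that are no more likely than the observed one, which is 1 common definition of the 2-sided p value. The counts in the example are hypothetical and are not the study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]].

    Rows could be pre/post intervention; columns misclassified/correct.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def table_probability(x):
        # Hypergeometric probability of x in the top-left cell
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )


# Hypothetical counts: 20 of 50 responses misclassified before teaching,
# 8 of 50 misclassified afterwards.
example_p = fisher_exact_two_sided(20, 30, 8, 42)
```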
In the second quantification exercise, for the 5 sonographers who measured LV volumes, the SD for the determination of LVEF by Simpson biplane method of discs was initially 4.09%. This was reduced to 2.49% following intervention (Table 3). The IOV pre-intervention for LVEF by the biplane method was ±0.06 EF, which reduced to an IOV of ±0.03 EF post-intervention, representing a significant reduction (F = 3.56, p = 0.017). We then quantified the IOV for end-diastolic and end-systolic volumes in the 4- and 2-chamber views pre- and post-intervention. We found that there was a significant reduction in the IOV for 2-chamber end-systolic and 2-chamber end-diastolic volumes (F = 4.66, p = 0.006, and F = 7.62, p = 0.0006, respectively). The reduction in IOV for 4-chamber end-systolic and 4-chamber end-diastolic volumes did not reach significance.
Single and serial echocardiographic assessments of LV function are important to the management of many clinical conditions including the timing of operation for valvular surgery, decisions for defibrillator implantation following myocardial infarction, and dosing of certain chemotherapeutic agents. Thus, it is critical that echocardiography laboratories report consistent, accurate, and reproducible findings regardless of the number of members within a laboratory. Likewise, monitoring of accuracy and reproducibility within an echocardiography lab can identify when interventions are necessary for improvement. Our paper reports that a simple teaching exercise can reduce IOV for the qualitative assessment of LV function from a baseline of 0.14 to 0.08 after implementation. In addition to reducing variability among laboratory members, the intervention also resulted in less misclassification of LV function grading by the group, especially for the mild and moderate grades of LV dysfunction, which are the most important cutoff points for clinical decision-making. We also noted that the change in IOV of LVEF visual estimation was not different when categorized broadly by experience, suggesting that readers at all levels of experience benefited from the intervention. This suggests that even experienced readers have their own bias relative to the group mean. The importance of this intervention lies in minimizing the overall IOV within the laboratory.
Quantitative measures of LVEF and volumes using the Simpson biplane method of discs also showed a significant reduction in IOV following a teaching intervention (IOV of 0.06 reduced to 0.03 after implementation). We note that the IOV pre-intervention for visual estimation was more than twice the IOV pre-intervention for LVEF by quantification; however, this may reflect the much smaller number of participants and cases examined in the quantitative teaching exercise.
Although the visual estimate is usually not the sole method of EF determination provided by most echo labs, it is a fundamental skill necessary to corroborate EF quantified by all other means. Thus, teaching interventions such as the one we have presented here are vital to improving accuracy and consistency within an echo lab. However, there are several challenges to successfully integrating such a quality assessment program into a busy, high volume clinical setting. Some of these challenges have been highlighted by other authors implementing similar programs (8) and include whole group participation, lack of a clear teaching focus, and finding the time to implement the program. Participation was successful in this program because sessions occurred during a regularly scheduled conference time and attendance was strongly encouraged by senior members of the lab who played an active role in guiding discussion. The program focused on case-based presentations rather than didactic sessions, as these were felt to be best suited for self- and group evaluation. Finally, there was a mechanism to reassess improvements in quality compared with performance prior to intervention. Group participation provided an efficient learning environment for both self- and group evaluation. Thus, consideration of factors such as strong faculty support, timing and convenience of teaching sessions, and clear objectives would also allow other laboratories to be successful in implementing a similar quality improvement program. It is important to note that these learning objectives will be different for different laboratories and will depend on factors such as the number of participants, the volume of echocardiograms and workload, and the age and training experience of lab members. In this study, the entire process consisted of 6 1-h sessions that occurred over a 6-month period. 
However, the specific protocol of these sessions, as well as the goals of the quality assessment, can and should be tailored to each individual laboratory.
Based on our results, we feel that the factors critical to developing a successful quality improvement intervention are to: 1) define a simple quality measure; 2) provide a convenient forum for participation; 3) identify specific learning objectives of the teaching intervention; 4) provide case-based presentations (versus formal didactic sessions) as best suited for self- and group evaluation; 5) provide a mechanism for feedback in an environment that fosters group discussion and self-evaluation; and 6) perform serial assessments of quality measures (Fig. 3).
The American College of Cardiology–Duke University Medical Center Think Tank on Quality in Cardiovascular Imaging has recommended the development of Internet-based case studies to assess variation of interpretation and for comparison against a national gold standard (9). The case series and teaching intervention presented here may provide a framework for quality measurement and improvement in LVEF determination by echo labs on an expanded scale.
We determined IOV based on responses provided as single-integer EFs. Usually the visual estimate is given as a range rather than a single value; thus, our assessment of the imprecision may be overestimated. However, because we used the same methodology pre- and post-intervention, we are able to determine the reduction in IOV following the teaching intervention even if there is overestimation in both groups. Our assessment of accuracy was based on a single expert-quantified EF using the biplane method of discs. There could be IOV in the quantification of EF among the experts in our laboratory as well, although for a small group of participants in our laboratory, the IOV was small. We also note that the expert-derived EF agreed well with the group mean visual estimate for each case (Table 1). We tailored our teaching intervention program to include a wide range of participants, including new learners who are often only available on a short-term basis. Therefore, we developed a program that would maximize both participation and effectiveness based on the availability of the same group of participants throughout the entire study. Such real-world time constraints did not allow us to assess intraobserver and interobserver variability over a longer period for all participants. In future long-term studies, we will address the effect of repeated teaching intervention for LVEF assessment on interobserver and intraobserver variability.
A simple formalized teaching intervention can meet the challenge of reducing IOV in the assessment of LVEF in a large echocardiography lab with varied training levels and experience. The reduction of IOV using the intervention can be sustained. This intervention provides not only discrete quality measures, but also serves as a tool to document and improve quality of reporting, potentially reducing clinical inefficiencies and repeat testing.
All authors have reported that they have no relationships relevant to the contents of this paper to disclose.
- Abbreviations and Acronyms
- ANOVA = analysis of variance
- EF = ejection fraction
- IOV = interobserver variability
- LV = left ventricular
- American College of Cardiology Foundation
- Cheitlin M.D.,
- Armstrong W.F.,
- Aurigemma G.P.,
- et al.
- Douglas P.S.
- Kisslo J.
- Douglas P.,
- Iskandrian A.E.,
- Krumholz H.M.,
- et al.