Author information
- Received May 14, 2009
- Revision received September 22, 2009
- Accepted September 28, 2009
- Published online March 1, 2010.
- Simon Biner, MD⁎,†,
- Asim Rafique, MD†,
- Farhad Rafii, MD‡,
- Kirsten Tolstrup, MD†,
- Omid Noorani, MS§,
- Takahiro Shiota, MD†,
- Swaminatha Gurudevan, MD† and
- Robert J. Siegel, MD†,⁎
- ↵⁎Reprint requests and correspondence:
Dr. Robert J. Siegel, 8700 Beverly Boulevard, Room 5623, Los Angeles, California 90048-1804
Objectives The aim of this study was to evaluate the interobserver agreement of proximal isovelocity surface area (PISA) and vena contracta (VC) for differentiating severe from nonsevere mitral regurgitation (MR).
Background Recommendations for MR evaluation stress the importance of VC width and effective regurgitant orifice area by PISA measurements. Reliable and accurate assessment of MR is important for clinical decision making regarding corrective surgery. We hypothesized that color Doppler-based quantitative measurements for classifying MR as severe versus nonsevere may be particularly susceptible to interobserver variability.
Methods The PISA and VC measurements of 16 patients with MR were interpreted by 18 echocardiologists from 11 academic institutions. In addition, we obtained quantitative assessment of MR based on color flow Doppler jet area.
Results The overall interobserver agreement for grading MR as severe or nonsevere using qualitative and quantitative parameters was similar and suboptimal: 0.32 (95% confidence interval [CI]: 0.1 to 0.52) for jet area–based MR grade, 0.28 (95% CI: 0.11 to 0.45) for VC measurements, and 0.37 (95% CI: 0.16 to 0.58) for PISA measurements. Significant univariate predictors of substantial interobserver agreement for: 1) jet area–based MR grade was functional etiology (p = 0.039); 2) VC was central MR (p = 0.013) and identifiable effective regurgitant orifice (p = 0.049); and 3) PISA was presence of a central MR jet (p = 0.003), fixed proximal flow convergence (p = 0.025), and functional etiology (p = 0.049). Significant multivariate predictors of raw interobserver agreement ≥80% included: 1) for VC, identifiable effective regurgitant orifice (p = 0.035); and 2) for PISA, central regurgitant jet (p = 0.02).
Conclusions The VC and PISA measurements for distinction of severe versus nonsevere MR are only modestly reliable and associated with suboptimal interobserver agreement. The presence of an identifiable effective regurgitant orifice improves reproducibility of VC and a central regurgitant jet predicts substantial agreement among multiple observers of PISA assessment.
American Society of Echocardiography guideline recommendations for the evaluation of mitral regurgitation (MR) severity include assessment of color flow Doppler (CFD) regurgitant jet area and quantification of MR by vena contracta (VC) and by effective regurgitant orifice area (EROA) using proximal isovelocity surface area (PISA) (1,2). A reproducible method for assessment of MR severity is important for the management of patients with MR.
Visualization of the regurgitant jet area in the left atrium provides a fast assessment of the shape, size, and direction of the regurgitant jet and a qualitative evaluation of its severity, whereas assessment of VC and PISA requires meticulous attention to technical details both at image acquisition and during actual measurements (Figs. 1A and 1B) (2–5). Echocardiographic research laboratories have demonstrated that PISA and VC are accurate for MR assessment and have good interobserver agreement (3,6–11). However, the etiology of MR, the mitral valve morphology, and the regurgitant jet characteristics further complicate the assessment of MR (2,5,6). In addition, the dynamic nature of MR and pansystolic changes in VC and proximal flow convergence rate may also lead to considerable temporal variation of both the VC width and the 2-dimensional PISA radius throughout systole, causing uncertainty among echocardiogram readers about which particular frame to choose for measurement (7). Therefore, we hypothesized that quantitative measurements of PISA and VC for classifying MR as severe versus nonsevere would have considerable interobserver variability among clinical echocardiologists.
The primary aim of this study was to evaluate the interobserver agreement of quantitative MR assessment by PISA and VC for differentiating severe from nonsevere MR. In addition, we also evaluated the reproducibility of qualitative CFD-based MR jet area as it is frequently used in clinical settings to help identify severity of MR.
Interobserver agreement of jet area–based assessment of MR severity, PISA, and VC was evaluated using transthoracic echocardiograms of 16 consecutive patients referred to Cedars Sinai Medical Center for possible surgical correction of MR. All patients had a history of effort intolerance or were asymptomatic with left ventricular systolic dysfunction. The focus of the assessment was the classification of MR into severe versus nonsevere categories to identify appropriate candidates for mitral valve surgery. The Cedars Sinai Medical Center Institutional Review Board approved this study.
Eighteen cardiologists with mean 16 (range 2 to 40) years of post-cardiology fellowship experience as echocardiographic specialists (echocardiologists), practicing at 11 different university-based institutions in the U.S., Japan, and Israel were provided secure Web-based access to relevant echocardiographic images. The influence of institutional standards on interobserver agreement was evaluated by comparing the agreement of the 6 echocardiologists practicing in a single institution to the agreement of the echocardiologists from multiple institutions.
A single trained sonographer with clinical trial research experience performed all studies. A narrow color flow sector width and the least depth were chosen to maximize image resolution. The CFD images of the mitral valve regurgitant jet were acquired in the parasternal long axis and the apical 4- and 2-chamber views at a Nyquist limit of 50 to 60 cm/s (2). In addition, proximal flow convergence was recorded using a magnified 4-chamber view, with baseline shift of the Nyquist limit to optimize visualization of flow convergence (2) (Figs. 2A and 2B). All echocardiogram clips (moving images) were available on the website. There were no still frame images. The reader, however, was able to play the images, stop frame them, magnify them, perform frame-by-frame analysis, and take measurements using calibration markings and calipers. All imaging was performed with the iE33 ultrasound system (Philips Medical Systems, Andover, Massachusetts).
As shown in Figures 3A and 3B, VC was acquired on a magnified parasternal long axis view. To image the VC, the transducer was angulated out of the standard imaging planes in an attempt to visualize the area of proximal flow acceleration, the VC, and the downstream expansion of the jet (2).
Participants were provided CFD images and were asked to rate the following parameters: 1) MR grade from 1 to 4 based on visual qualitative assessment of the jet area; 2) manual quantitative measurement of VC; and 3) quantitative measurement of PISA radius. If the echocardiologist found a parameter to be uninterpretable, it was noted. From the reported values of PISA radius, the EROA was calculated using the following formula: EROA = 2π × RPISA² × Valiasing / Vmax, where RPISA is the PISA radius (cm), Valiasing is the aliasing velocity of the proximal flow convergence (cm/s), and Vmax is the maximal velocity of the continuous wave Doppler MR signal (cm/s).
The MR was graded as severe or nonsevere for each echocardiogram-Doppler parameter. MR was considered severe if VC was ≥7 mm, if EROA was ≥40 mm2, or the jet size–based grade was 3 or 4; MR was considered nonsevere if VC was ≤6.9 mm, EROA was ≤39 mm2, or jet size–based assessment graded it as 1 or 2.
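As a worked illustration of the EROA formula and the severity cutoffs above, the calculation can be sketched as follows (the function name and example values are hypothetical, not taken from the study data):

```python
import math

def eroa_mm2(r_pisa_cm, v_aliasing_cm_s, v_max_cm_s):
    """EROA = 2*pi*R_PISA^2 * V_aliasing / V_max, returned in mm^2."""
    eroa_cm2 = 2 * math.pi * r_pisa_cm ** 2 * v_aliasing_cm_s / v_max_cm_s
    return eroa_cm2 * 100  # 1 cm^2 = 100 mm^2

# Hypothetical measurement: 1.0 cm PISA radius, 40 cm/s aliasing velocity,
# 500 cm/s peak continuous wave MR velocity
eroa = eroa_mm2(1.0, 40.0, 500.0)  # about 50.3 mm^2
severe = eroa >= 40                # severe MR per the >=40 mm^2 cutoff
```

A reading just below the cutoff (e.g. a 0.8 cm radius at the same velocities, about 32 mm²) would instead be classified as nonsevere, which shows how sensitive the binary grading is to small differences in the measured radius.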
The MR jet was considered eccentric if it was in close contact with the mitral leaflet behind the regurgitant orifice and impinged on the medial or lateral wall of the left atrium, whereas central jets were initially directed into the center of the left atrium (2,3).
For each patient, temporal variability of PISA radius and VC width were assessed by measuring these parameters frame by frame throughout systole. We classified PISA and VC as having substantial pansystolic variation if there was greater than a 30% difference between the greatest and smallest values of any 2 different frames of the same cardiac cycle.
We classified PISA as nonspherical if the maximal and minimal radius of the proximal flow convergence demonstrated a greater than 30% variation within 1 cardiac cycle.
We used a cutoff of 30% to define substantial temporal variability of PISA radius and VC width as well as PISA sphericity to account for a 7% to 8% intraobserver variability of the small-scale measurements in our laboratory. In addition, a ≥30% difference in these parameters was readily detectable by visual assessment.
Figure 3 demonstrates identification of the effective regurgitant orifice, which was defined as visible if the transition of proximal flow convergence to the VC was clearly identifiable.
Overall raw interobserver agreement (percent agreement) of ≥80% was classified as substantial agreement, as this agreement rate was considered clinically adequate by all investigators. Values between 60% and 79% were classified as fair agreement and values below 60% as poor agreement (notably, for a binary classification the lowest achievable interobserver agreement approaches 50%, which in practice is equivalent to maximal disagreement).
Statistical analysis was performed with the SPSS 13.0 software (SPSS Inc., Chicago, Illinois). Continuous data are presented as mean ± SD. Categorical data are presented as absolute number or percentages. The significance level was set at p = 0.05.
Interobserver agreement was assessed using the overall raw agreement (percent agreement) and the Fleiss kappa statistic (12). The Fleiss multirater kappa coefficient is an agreement rate adjusted for the chance rate of agreement, and it was calculated using the Online Kappa Calculator developed by Randolph (13). For this purpose, we used Randolph's multirater variation of Brennan and Prediger's (14) free-marginal kappa. Raw agreement and multirater kappa were calculated for 3 methods (VC, PISA, and jet area–based MR severity) in 16 patients by 18 observers and were calculated for each of the individual patients to determine the range of agreement within each method across all participants. Subsequently, cumulative raw agreement and multirater kappa were calculated for the 3 study methods in patients with substantial raw agreement (≥80%) and those with suboptimal raw agreement (<80%).
Mean and 95% confidence intervals of multirater kappa were derived for 3 study methods and t test was used for comparison of multirater kappa among various study methods.
The MR characteristics thought to be clinically significant were dichotomized. Fisher exact test was used to compare the significance of these predictive parameters on raw agreement of ≥80% versus <80%. Significant variables were included in the multivariate logistic regression analysis to predict raw agreement ≥80%.
A p value <0.05 was considered significant for all analyses.
Clinical and echocardiographic characteristics of the 16 patients studied are summarized in Table 1: mean age was 69 ± 12 years and 56% were men. Eight (50%) had functional MR (6 with ischemic and 2 with dilated cardiomyopathy); 8 cases were degenerative (5 with mitral valve prolapse and 3 with flail leaflet). In 6 patients with functional MR, the jet was central. In all 8 degenerative MR patients and 2 functional MR cases, the jet was eccentric. For both the PISA radius and the VC width, 7 cases (5 degenerative and 2 functional MR) demonstrated ≥30% frame-by-frame variability, whereas in the remaining cases the variability in these parameters was not substantial. The effective regurgitant orifice in the parasternal long axis view was visualized in 7 other cases (Fig. 2A).
Prevalence of severe MR
MR severity based on jet area was deemed interpretable by readers in 273 of 288 (95%) instances, whereas VC and PISA were interpreted in 206 of 288 (72%) and 207 of 288 (72%), respectively (p < 0.001).
Table 2 shows the proportion of observers who rated MR as severe according to the various assessment methods in each patient. Using the parameter of jet area, echocardiologists graded MR as severe in 169 of 273 (62%) instances; using VC, MR was graded as severe in 74 of 206 (36%) instances; and using PISA, measurements were consistent with severe MR in 85 of 207 (41%) instances (p < 0.001).
Overall interobserver agreement on the classification of MR as severe or nonsevere was only fair for each of the study parameters. Raw agreement and kappa coefficient were, respectively: 75 ± 16% and 0.32 (95% confidence interval [CI]: 0.12 to 0.52) for jet area–based MR grade, 75 ± 15% and 0.28 (95% CI: 0.11 to 0.45) for VC measurements, and 78 ± 15% and 0.37 (95% CI: 0.16 to 0.58) for PISA measurements.
Figure 4 shows the absolute and percentage distribution of substantial (raw agreement ≥80%), fair (raw agreement 60% to 79%), and poor (raw agreement <60%) agreement for the 3 study parameters. Raw agreement ≥80% was observed in only 7 (44%) patients for jet area–based assessment; in this subgroup, the cumulative kappa coefficient was 0.72 (95% CI: 0.68 to 0.76). Substantial agreement for VC was also achieved in 7 (44%) cases, with a kappa in this subgroup of 0.61 (95% CI: 0.58 to 0.64). For PISA measurement, substantial agreement was present in 6 (38%) patients (kappa: 0.85; 95% CI: 0.84 to 0.86). The differences among the 3 study parameters were not significant. In the remaining cases, the agreement was fair or poor.
As shown in Figure 5, separate analysis of interobserver agreement among interpreters practicing in a single institution and those from multiple centers demonstrated that the proportion of raw agreement ≥80% was not significantly different for all 3 study parameters, including jet area: 56% for the single institution versus 44% for multiple institutions (p = 0.72). For VC, raw agreement ≥80% was the same for physicians practicing in a single center versus multiple centers at 44% (p = 1.0), and for EROA measurement by PISA, raw agreement ≥80% between the 2 groups of observers was 56% versus 50% of the patients (p = 0.99). Similarly, there was no significant difference in the proportion of fair or poor agreement for all 3 study methods.
Predictors of substantial interobserver agreement
To define the source of heterogeneous agreement on MR severity, etiologic and echocardiographic variables that may limit the accuracy of assessment for each study method were included in univariate and multivariate analysis. Table 3 shows the list of the variables evaluated for each study method as well as their value to predict raw agreement ≥80%.
The only significant univariate predictor of raw agreement ≥80% for the jet area–based method was functional etiology (p = 0.039). An eccentric jet had no significant influence on interobserver agreement. Multivariate analysis did not identify any significant predictor of raw agreement ≥80% (Table 3).
In univariate analysis, the statistically significant predictors of raw agreement ≥80% for VC were central MR (p = 0.013) and an identifiable effective regurgitant orifice (p = 0.049). In the multivariate analysis, however, only an identifiable effective regurgitant orifice remained a significant predictor of raw agreement ≥80% (p = 0.035) (Table 3).
For the PISA radius measurement, the presence of a central MR jet (p = 0.003), fixed proximal flow convergence (p = 0.025), and functional etiology (p = 0.049) were statistically significant univariate variables for prediction of raw agreement ≥80%. By multivariate analysis, central regurgitant jet morphology was the only significant predictor of raw interobserver agreement ≥80% (p = 0.02) (Table 3).
This is the first multicenter study to evaluate the interobserver agreement of the quantitative parameters of VC width and PISA for differentiating severe from nonsevere MR. We found that classification of MR as severe as opposed to nonsevere using the quantitative CFD parameters of VC and PISA yielded only fair interobserver agreement (kappa: 0.28 to 0.37). The interobserver agreement of qualitative assessment for differentiating severe from nonsevere MR was similar to that of the quantitative methods (kappa: 0.32). Our study group was composed of clinically experienced, practicing echocardiologists from 11 different academic institutions. Furthermore, we found that the interobserver agreement among echocardiologists practicing and instructing within the same institution was similar to the multicenter interobserver agreement and inferior to that previously reported in single-institution studies validating the use of PISA and VC (3,6–11).
The VC width and EROA calculated by PISA are both affected by valve morphology and color flow jet characteristics. Incorporation of echocardiographic and etiologic variables in univariate and multivariate analysis revealed some of the sources of disagreement in our study. Various echocardiographic parameters were found to be significant in predicting good agreement for each study method in the univariate analysis. However, the presence of a central regurgitant jet was the only independent discriminator between suboptimal (raw agreement <80%) and substantial reproducibility (raw agreement ≥80%) for PISA radius measurement. In addition, the identification of an effective regurgitant orifice was the single multivariate predictor of raw agreement ≥80% for VC measurement.
Depending on whether observers use standardized measurements in an echocardiographic core laboratory, in a rigorously controlled research setting, or whether the MR grade is assessed by multiple enrolling sites with multiple echocardiologists, the rate of agreement is markedly different. Reliability of MR classification was reported in the EVEREST I (Endovascular Valve Edge-to-Edge Repair Study) trial. Two qualitative parameters—CFD jet characteristics and pulmonary vein flow—and 4 quantitative parameters—VC width, regurgitant volume, regurgitant fraction, and EROA—were evaluated. Parameters were integrated to form a composite MR grade. Individual parameters and composite MR severity were found to be reproducible (15). However, considerable disagreement in MR severity grading among echocardiogram readers was reported in the Acorn Clinical Trial (16). All studies were interpreted to show significant MR at enrolling sites, but the core laboratory found that 41% of patients did not have significant MR: 7.4% of patients had no detectable MR, 10.6% had mild MR, and 23.3% had moderate MR. Thomas et al. (17) proposed an MR index of 6 different indicators of MR: jet penetration into the left atrium, PISA radius, continuous wave jet intensity, pulmonary artery pressure, pulmonary venous flow pattern, and left atrial size. However, there was still considerable overlap between moderate and severe MR (17).
There are several potential explanations for the limited reproducibility of MR severity assessment. Qualitative assessment of MR based on jet area varies on any given still frame and may thus be mischaracterized due to the frame-by-frame change in jet size. Whereas the eye rapidly integrates the change in jet area over time, this type of visual estimation can be highly subjective and varies among observers. The etiology of MR, the mitral valve morphology, and the regurgitant jet characteristics can further complicate the MR assessment (4,5). Sources of error in the measurement of PISA include the presence of an eccentric jet, nonhemispheric geometry of proximal flow convergence, imprecise identification of the regurgitant orifice (Fig. 2), and dynamic changes of the PISA radius throughout systole (6,7). As with PISA, inherent characteristics of MR can influence the ability to accurately determine the VC width. The limitations of VC include an eccentric MR jet, a dynamic VC, and inadequate visualization of any of the 3 components of the VC (Fig. 3): proximal flow acceleration, the VC itself, and the downstream expansion of the jet (3,4).
The American Society of Echocardiography 2003 guidelines for the evaluation of valvular regurgitation emphasize an integrated approach to the classification of MR severity (2). However, in a subsequent large retrospective study in which severity of MR was quantified by Doppler echocardiography, 198 patients with an EROA greater than 0.40 cm2 had a 4% per year risk of cardiac death during a mean follow-up period of 2.7 years (18). These findings largely contributed to the recent American College of Cardiology/American Heart Association 2008 guideline (1) recommendation to consider mitral valve surgery for asymptomatic patients with severe MR. Given the current recommendations for surgery on asymptomatic patients with severe MR, it is crucial to accurately differentiate severe from nonsevere MR. However, our study demonstrates that not only qualitative but also quantitative CFD parameters have limited reproducibility and thus may be suboptimal in the clinical setting. We believe that the findings of our study, which identify the potential difficulties in reliably differentiating severe from nonsevere MR, support the concept of Gaasch and Meyer (19), namely “to question the diagnosis of severe chronic MR when little or no left ventricular or left atrial enlargement is found.”
The relatively small number of patients studied (n = 16) is a limitation. However, there were 18 observers, and consequently each study parameter was assessed over 200 times. A small sample size may result in overfitted models. However, it would be practically difficult to have 18 cardiologists evaluate MR in a larger sample by 3 different methods, although we realize that a larger sample size would be ideal. We aimed to differentiate severe from nonsevere MR and did not assess the entire spectrum of MR severity. Had patients with mild MR been included in this study, the interobserver agreement might have been better. The absence of a true gold standard for MR assessment is a potential limitation for any study, as it precludes evaluation of the accuracy of the interpretations. In the absence of such a gold standard, substantial variation exists in the classification of MR as severe in the same patient depending on the method used. These differences limit the maximum achievable raw agreement rate to an extent that readers may not appreciate.
As color Doppler flow methods have limited interobserver agreement, the classification of MR as severe and clinical decision making in patients with MR should incorporate additional, less equivocal, and more reproducible parameters, such as the mitral inflow pattern, the continuous wave Doppler profile of the MR jet, pulmonary venous flow, and mitral valve morphology, as well as left ventricular and left atrial dimensions, left ventricular systolic function, and pulmonary artery systolic pressure, as recommended by the American Society of Echocardiography (2).
The presence of a central MR jet improves the reproducibility of EROA assessment by PISA, and the ability to identify the effective regurgitant orifice significantly enhances interobserver agreement of VC assessment. The VC, PISA, and CFD regurgitant jet–based assessments of MR grade have suboptimal interobserver agreement. Use of the VC and PISA methods in routine clinical practice, and clinical decision making based solely on EROA calculation in asymptomatic MR patients, may be problematic.
- Abbreviations and Acronyms
- CFD: color flow Doppler
- EROA: effective regurgitant orifice area
- MR: mitral regurgitation
- PISA: proximal isovelocity surface area
- VC: vena contracta
- American College of Cardiology Foundation
References (author groups as listed in the source)
- Bonow R.O., Carabello B.A., Chatterjee K., et al.
- Hall S.A., Brickner M.E., Willett D.L., Irani W.N., Afridi I., Grayburn P.A.
- Simpson I.A., Shiota T., Gharib M., Sahn D.J.
- Schwammenthal E., Chen C., Benning F., Block M., Breithardt G., Levine R.A.
- Buck T., Plicht B., Kahlert P., Schenk I.M., Hunold P., Erbel R.
- Bargiggia G.S., Tronconi L., Sahn D.J., et al.
- Enriquez-Sarano M., Seward J.B., Bailey K.R., Tajik A.J.
- Fleiss J.L.
- Randolph J.J.
- Brennan R.L., Prediger D.J.
- Thomas L., Foster E., Hoffman J.I., Schiller N.B.
- Gaasch W.G., Meyer T.E.