Increased ERCP volume improves cholangiogram interpretation: a new performance measure for ERCP training?
Abstract
Background/Aims
Cholangiogram interpretation is not currently used as a key performance indicator (KPI) of endoscopic retrograde cholangiopancreatography (ERCP) training, and national societies recommend different minimum annual procedure volumes to maintain competence. This study aimed to determine the relationship between correct ERCP cholangiogram interpretation and experience.
Methods
One hundred fifty ERCPists were surveyed on their ability to interpret ERCP cholangiographic findings. There were three groups of 50 participants each: “Trainees,” “Consultants group 1” (75–100 ERCPs per year), and “Consultants group 2” (>100 ERCPs per year).
Results
Trainees were inferior to Consultants groups 1 and 2 in identifying all findings except choledocholithiasis outside the intrahepatic duct on the initial or completion/occlusion cholangiogram. Consultants group 1 was inferior to Consultants group 2 in identifying Strasberg type A bile leaks (odds ratio [OR], 0.86; 95% confidence interval [CI], 0.77–0.96), Strasberg type B (OR, 0.84; 95% CI, 0.74–0.95), and Bismuth type 2 hilar strictures (OR, 0.81; 95% CI, 0.69–0.95).
Conclusions
This investigation supports the notion that cholangiogram interpretation improves with increased annual ERCP case volumes. Thus, a higher annual volume of procedures performed may improve the ability to correctly interpret particularly difficult findings. Cholangiogram interpretation, in addition to bile duct cannulation, could be considered as another KPI of ERCP training.
INTRODUCTION
Endoscopic retrograde cholangiopancreatography (ERCP) is a technically challenging gastrointestinal procedure that has evolved from a predominantly diagnostic into a predominantly therapeutic one.1,2 The type of intervention required often depends on interpretation of the cholangiogram. Training in ERCP is difficult to obtain within the standard fellowship timeline, which has led to an increase in advanced endoscopy fellowship programs.3
The shift in ERCP training has led to the development of guidelines that define competency for this procedure.4-7 The American Society for Gastrointestinal Endoscopy (ASGE) recommends 200 ERCPs per trainee, with a selective common bile duct cannulation rate of >80%–90% in native papillae as a surrogate marker of trainee competence.8 In contrast, the British Society of Gastroenterology (BSG) specifies successful cannulation of the target duct, common bile duct (CBD) stone clearance, and stenting/cytology of extrahepatic strictures as key performance indicators (KPIs), together with a minimum of 75 procedures per year to maintain competence and an aspirational target of more than 100.9 Table 1 summarizes the similarities and differences among the ASGE, BSG, and European Society of Gastrointestinal Endoscopy (ESGE) guidelines for achieving and maintaining competence.9-11
After successful cannulation, visualization of the biliary tree and interpretation of the images are crucial steps in ERCP. There is a paucity of data on this aspect of the procedure, and whether this skill develops with increasing case volume has not previously been described. The primary aim of this study was to determine whether annual ERCP case volume correlates with the ability to correctly interpret cholangiograms.
METHODS
Thirteen cholangiograms performed by experienced ERCP endoscopists were independently reported and verified by a consultant hepatopancreatobiliary (HPB) radiologist. The cholangiograms were presented as static images and classified based on findings as follows: Strasberg type A bile leak (n=1), Strasberg type B (n=1), Bismuth type 1 hilar stricture (n=2), Bismuth type 2 hilar stricture (n=1), distal CBD stricture (n=1), initial cholangiogram (IC) of choledocholithiasis with large stones, i.e., >10 mm (n=1), completion/occlusion cholangiogram (CC) after removal of a large stone (n=1), IC of choledocholithiasis with small stones, i.e., <10 mm (n=1), CC after removal of a small stone (n=1), IC of choledocholithiasis with intrahepatic duct calculus (n=1), CC of choledocholithiasis with intrahepatic duct calculus (n=1), and a normal cholangiogram with a mildly dilated CBD (n=1). These images are presented in Figure 1. The administered survey is included as Supplementary Figure 1.
Between June 2019 and March 2020, the images were digitized, anonymized, and distributed electronically to more than 200 gastroenterologists worldwide to limit single-institution bias. To avoid bias, the survey participants were gastroenterologists known to the study authors or to small HPB units. Surgeons and radiologists who perform ERCP were excluded from the study. A sample of 50 respondents per group was considered sufficient for satisfactory power and appropriate statistical analysis, based on a previous survey study of intraoperative cholangiogram interpretation by surgical trainees during laparoscopic cholecystectomy.12 The first 50 respondents in each category who completed the questionnaire were included in the data analysis.
The open-ended questionnaire asked participants to identify the finding shown in each of the 13 single static images. It was sent as a Word document containing a table of the images with limited clinical information and a blank space next to each image for the participant's interpretation. For the relevant images, the questionnaire clearly stated whether the image was an IC or a CC performed at the end of the procedure. Each interpretation was compared with the consultant radiologist’s report and scored as correct or incorrect.
Participants were also asked to state whether they were a trainee or an independent operator and to report the number of ERCPs they performed per year. They were grouped into three categories: “Trainees,” who had performed more than 150 ERCPs independently or with minimal assistance; “Consultants group 1,” who performed 75–100 ERCPs per year; and “Consultants group 2,” who performed more than 100 ERCPs per year.
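The grouping rule can be written out explicitly. The sketch below is a minimal illustration, not the authors' procedure: the helper `assign_group` is hypothetical, and because the text does not state how the 75 and 100 boundaries were handled, it assumes that 75–100 inclusive maps to Consultants group 1 and strictly more than 100 maps to Consultants group 2. Under these assumptions, independent operators reporting fewer than 75 ERCPs per year fall outside all three categories.

```python
from typing import Optional


def assign_group(is_trainee: bool, ercps: int) -> Optional[str]:
    """Map a survey respondent to a study group (hypothetical helper).

    For trainees, `ercps` is the cumulative number of ERCPs performed
    independently or with minimal assistance; for independent operators,
    it is the self-reported number performed per year.
    """
    if is_trainee:
        # Trainee category as described: more than 150 ERCPs performed
        return "Trainees" if ercps > 150 else None
    if ercps > 100:
        return "Consultants group 2"
    if 75 <= ercps <= 100:  # assumption: both boundaries inclusive
        return "Consultants group 1"
    return None  # fewer than 75 per year: not covered by the study's categories
```

For example, under this assumption an independent operator reporting exactly 100 ERCPs per year is placed in Consultants group 1, whereas one reporting 101 is placed in Consultants group 2.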
Statistical analysis was performed using IBM SPSS ver. 27.0 (IBM Corp., Armonk, NY, USA) with member checking. Fisher exact test was used to identify statistically significant differences between the various groups. Statistical significance was set at p<0.05. Mantel-Haenszel odds ratios (ORs) and 95% confidence intervals (CIs) were also calculated.
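The 2×2 comparisons behind these tests (two groups by correct/incorrect answers for one image) can be sketched in standard-library Python. This is a minimal illustration, not the SPSS procedure the authors ran. The function names are hypothetical. The two-sided Fisher p-value uses the common convention of summing all tables with the same margins that are no more probable than the observed one. The Mantel-Haenszel estimator pools strata, and with a single stratum it reduces to the ordinary odds ratio ad/bc. The confidence interval uses Woolf's log method, which may differ from the interval SPSS reports. The counts in the example are invented.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups being compared; columns are correct / incorrect
    interpretations of one image. With all margins fixed, the top-left cell
    follows a hypergeometric distribution.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given the margins
        return math.comb(row1, x) * math.comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum every table at most as probable as the observed one; the small
    # relative tolerance keeps floating-point ties from being dropped.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def mantel_haenszel_or(tables: list[tuple[int, int, int, int]]) -> float:
    """Pooled Mantel-Haenszel odds ratio over strata given as (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def woolf_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI for a single-table odds ratio (Woolf log method).

    Requires all four cells to be non-zero.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Invented counts: group X 40/50 correct, group Y 48/50 correct on one image
table = (40, 10, 48, 2)
p = fisher_exact_two_sided(*table)
or_mh = mantel_haenszel_or([table])  # one stratum: equals (40*2)/(10*48)
ci = woolf_ci(*table)
print(f"p={p:.3f}, OR={or_mh:.3f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}")
```

With a single stratum, as in this example, the Mantel-Haenszel estimate equals the crude odds ratio (40×2)/(10×48) ≈ 0.17. An odds ratio below 1 indicates that the first group had lower odds of a correct interpretation.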
Ethical statements
This international survey was regarded as an educational project, and approval from the Ethics or Health Research Authority was not required.
RESULTS
There were 50 members in each group (Trainees, Consultants group 1, and Consultants group 2). The survey results indicating the correct identification of the lesion are described in Table 2. Mantel-Haenszel ORs, 95% CIs, and p-values are reported in Table 3.
There were no statistically significant differences between trainees and consultants in stone identification, including IC of choledocholithiasis with large stones (p=0.056), CC after removal of a large stone (p=0.117), and CC after removal of a small stone (p=0.056).
Trainees were inferior to consultants in identifying particularly complex findings, including Strasberg type B (p=0.023, OR 0.738, 95% CI 0.576–0.946), all hilar strictures (Bismuth type 1: p<0.001, OR 0.65, 95% CI 0.51–0.83; a second Bismuth type 1 stricture: p<0.001, OR 0.61, 95% CI 0.47–0.81; and Bismuth type 2: p=0.006, OR 0.64, 95% CI 0.47–0.88), and distal CBD strictures (p<0.001, OR 0.69, 95% CI 0.57–0.84). Trainees were also inferior to consultants in identifying IC and CC images of choledocholithiasis with intrahepatic duct calculus (p=0.003, OR 0.67, 95% CI 0.52–0.88; p=0.013, OR 0.71, 95% CI 0.55–0.92, respectively).
Consultants group 1 was inferior to Consultants group 2 regarding several important findings, including the images of Strasberg type A bile leak (p=0.012, OR 0.86, 95% CI 0.77–0.96) and Strasberg type B (p=0.006, OR 0.84, 95% CI 0.74–0.95). Consultants group 1 was also less likely than Consultants group 2 to correctly identify Bismuth type 2 hilar strictures (p=0.015, OR 0.81, 95% CI 0.69–0.95).
DISCUSSION
Although it is perhaps not surprising that trainees demonstrated less diagnostic ability than independently practicing consultants, this finding reinforces the importance of cholangiogram interpretation as a key aspect of ERCP learning that should be assessed.
For the less complex findings (IC of choledocholithiasis with large stones >10 mm and CC after removal of small stones <10 mm), there were no significant differences between consultants and trainees. Consultants were superior to trainees in identifying particularly complex findings such as Strasberg type B, Bismuth type 1 and 2 hilar strictures, distal CBD strictures, and IC and CC intrahepatic duct calculi. Most interesting was the finding that low-volume independent operators fared worse than high-volume operators in identifying significant pathology despite being “trained.” Increased annual case volume correlated with an increased ability to correctly identify Strasberg type A and B bile injuries and Bismuth type 2 hilar strictures.
The relative difficulty of ERCP procedures may vary depending on patient characteristics, biliary anatomy, procedural indication, and intervention.7,13 According to one grading system, ERCP for small-to-medium-sized biliary stones is less difficult than ERCP for extrahepatic strictures.7,14 This is congruent with the findings of the present study: consultants were no better than trainees at identifying most of the stones, but for most other findings, consultants consistently identified more lesion types than trainees did. Moreover, those who performed more than 100 procedures per year outperformed those who performed 75–100 on several complex findings. These findings call into question whether current definitions of acceptable competence match achievable standards.
This study found that consultants in Consultants group 1 were inferior to those in Consultants group 2 in identifying Strasberg type A and B bile injuries and Bismuth type 2 hilar strictures. ERCP is an established minimally invasive management option for biliary leaks.15 Likewise, low-grade Bismuth lesions are increasingly under the purview of advanced endoscopists rather than surgeons because of a lower early complication rate, especially as surgery in the early postoperative phase is associated with an 80% complication rate.16,17 However, serious adverse events are more likely when therapeutic procedures are performed than during purely diagnostic ERCP (5%–10% vs. 1%–3%, respectively).18-20 ERCP is associated with recognized risks, including pancreatitis, infection, bleeding, perforation, and anesthesia-related complications.21 In addition to these procedure-related complications, studies have shown that failure to correctly interpret ERCP images may lead to overlooked common duct stones and delayed diagnosis.22,23 Indeed, this appears to be a concern for ERCP trainees; a recent survey study reported that only 26% of trainees had received formal training and 97% wanted further training.24 Therefore, improving operators' cholangiogram interpretation is imperative to maximize the therapeutic efficacy of ERCP.
A recent systematic review of ERCP training attempted to define which outcomes were being measured or overlooked.6 According to the review, the current literature identifies the cannulation rate of a native papilla as the most appropriate measure of ERCP training. However, the other outcomes measured by such studies varied widely (for example, the proportion of procedures completed without assistance or successful completion of the therapeutic maneuver). The authors suggested adding six variables to the reporting standards for future studies of ERCP training: previous trainee experience in ERCP, time and method allowed for cannulation attempts, role of supervisor intervention, selective CBD cannulation rate in native papilla cases, competence threshold and assessment, and procedure-related complication rate. Although they advocated a broader approach to ERCP training, the authors did not mention cholangiogram interpretation as a requisite skill. In contrast, a recent multicenter cohort study included three cholangiography skills among the 18 ERCP skills it assessed.4 Failure to correctly interpret intraprocedural cholangiograms may lead to incorrect treatment decisions and deleterious patient outcomes. Therefore, this skill should be included in ERCP training and assessed in future studies of advanced endoscopy training.
This study has several limitations. First, the participants were presented with static images without complete clinical context. Dynamic images, such as those viewed during the procedure, may offer additional diagnostic information, especially in clinical scenarios such as bile leaks. Additionally, clinical context (such as a brief clinical history) may alter the pretest probability of a given finding and help participants identify the correct lesion. However, these limitations applied equally to all three groups, and consultants with higher annual volumes still outperformed trainees, indicating that experience conferred an advantage even without the full clinical picture. Second, it may have been preferable to test several images of each condition rather than just one; this could also have allowed interrater reliability to be tested. Third, the survey was open-ended rather than multiple choice, which the authors felt more accurately represented real clinical circumstances. Varied free-text responses could have introduced grader subjectivity, but this was not found to be the case; for example, participants either correctly identified a bile leak or did not. Fourth, the increase in the familywise error rate across the reported statistical analyses was not controlled, which could have increased the chance of a type 1 error. There is, however, ongoing discussion in the literature about whether a Bonferroni correction for this is necessary, because it may increase the chance of type 2 errors.25,26 Overall, we consider this research to be relatively preliminary and encourage replication. Fifth, participants were separated into three categories based on their ERCP trainee status and number of cases per year.
These cutoff values were based on current European guidance, namely the BSG minimum of 75 procedures per year and its aspirational target of more than 100. Arguably, these are arbitrary numbers, and a different approach, such as considering learning curves, years of experience, or previous accreditation/training through a fellowship program, may have been preferable for categorizing the participants. Furthermore, the labels Trainees, Consultants group 1, and Consultants group 2 do not imply any particular duration since fellowship or any prior training experience. Demographics other than the number of ERCPs performed per year were not collected from the participants, and this may have influenced the results. A future study could follow up participants to investigate how correct lesion identification improves over time with increased ERCP performance; this was, however, beyond the scope of the present study.
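The familywise error concern raised in the fourth limitation can be made concrete. The sketch below assumes m independent tests, each run at α = 0.05. This is an idealization, since the study's comparisons share participants and are not independent. It computes the probability of at least one false-positive result, 1 − (1 − α)^m, alongside the Bonferroni-adjusted per-test threshold α/m. The function names, the value m = 13 (one test per image for a single group comparison), and the p-values in the example are illustrative assumptions only, not figures from the study's tables.

```python
def familywise_error_rate(m: int, alpha: float = 0.05) -> float:
    """P(at least one type 1 error) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_significant(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag p-values that remain significant at the Bonferroni threshold alpha/m."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


m = 13  # assumption: one comparison per survey image
print(f"FWER without correction: {familywise_error_rate(m):.2f}")
print(f"Bonferroni per-test threshold: {0.05 / m:.4f}")

# Illustrative p-values only (not taken from the study's tables)
example = [0.001, 0.004, 0.02] + [0.5] * 10
print(bonferroni_significant(example))
```

With 13 uncorrected tests at 0.05, the chance of at least one false-positive result is roughly 49%. Bonferroni would require p < 0.0038 per test. The cost of that stricter threshold is reduced power and more type 2 errors, which is the trade-off the cited discussion weighs.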
To our knowledge, this is the first study to investigate correct cholangiogram interpretation in relation to ERCP case volume. The findings of this study provide support for the minimum number of ERCP cases required per year per physician, as suggested by the BSG. The results also provide objective evidence for an improved ability to diagnose pathology with increased ERCP performance per year. Our findings are consistent with guidelines from American and European gastroenterology groups. In addition, this survey investigated the ability to correctly identify several clinically important lesions rather than assessing only a single pathological finding.
The findings of this study may help inform ERCP training and the monitoring of practice quality. The KPI categories and recommendations from different governing societies are presented in Table 1. Success rates of >90% have been set as KPI targets for understanding indications for ERCP, deep cannulation, stone clearance, and stent placement. The highest score in this study, achieved by Consultants group 2, was at this level. Therefore, >90% correct intraprocedural cholangiogram interpretation may be an additional KPI for use during advanced endoscopy training. Although successful cannulation is currently the target marker, the authors argue that it is only the first step in a successful ERCP and that understanding the cholangiogram afterwards is just as important. The percentages of correct answers found here for trainees versus Consultants group 1 may help guide in-training examinations for fellowship programs. These scores may provide objective benchmarks for success, especially in identifying different structural lesions, rather than relying on overall subjective progression. Future studies could assess trainees' cholangiogram interpretation at different stages of fellowship to characterize learning curves and identify barriers to progression and teaching. This is important because trainees are prone to incurring ERCP complications.27,28 Such studies could also investigate how demographic factors, such as years of experience versus annual case volume, influence cholangiogram interpretation. Overall, the authors advocate that the BSG KPI minimum of 75 ERCPs per year be increased to 100 ERCPs per year (the BSG “aspirational” goal). Practitioners who increase their case volume seem better able to recognize particularly complex lesions, specifically the types of lesion (Strasberg bile leaks and Bismuth hilar strictures) that are increasingly the purview of endoscopists rather than surgeons.
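Applying a >90% target to a fixed image set is simple arithmetic, but the granularity matters. With the 13 images used in this survey, each image is worth about 7.7 percentage points, so a score above 90% requires at least 12 correct interpretations. The sketch below is a hypothetical scoring helper for an in-training examination. It assumes one point per correctly interpreted image and a strict ">" comparison against the target. The function names and that scoring rule are assumptions, not a published KPI definition.

```python
import math


def percent_correct(correct: int, total: int) -> float:
    """Share of images interpreted correctly, as a percentage."""
    return 100 * correct / total


def meets_kpi(correct: int, total: int, target: float = 90.0) -> bool:
    """True if the score strictly exceeds the target percentage (assumed KPI rule)."""
    return percent_correct(correct, total) > target


def minimum_correct(total: int, target: float = 90.0) -> int:
    """Smallest number of correct answers whose score strictly exceeds the target."""
    return math.floor(total * target / 100) + 1


# Scores near the threshold for a 13-image examination
for score in (11, 12, 13):
    print(score, f"{percent_correct(score, 13):.1f}%", meets_kpi(score, 13))
```

On a 13-image set, 11 correct (84.6%) falls short, while 12 correct (92.3%) meets the target. A longer image set would give finer-grained scores and make the threshold less sensitive to a single missed image.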
In conclusion, this study demonstrates that operators with higher case volumes perform better at cholangiogram interpretation than those with lower case volumes. Because correct intraprocedural cholangiogram interpretation is imperative for improving ERCP efficacy, it may be included as a KPI of ERCP training and appraisal alongside cannulation and other KPIs.
Supplementary Material
Supplementary materials related to this article can be found online at https://doi.org/10.5946/ce.2021.239.
Notes
Conflicts of Interest
The authors have no potential conflicts of interest.
Funding
None.
Acknowledgments
This was an international survey project that would not have been possible without the help of colleagues around the world. Their participation in this project is appreciated.
Author Contributions
Conceptualization: SAh, SK, SKN, NT; Data curation: SAm, BM, SAh, SK, SKN, MW, NT; Formal analysis: SV, SAm, BM, MW; Investigation: SV, NT; Methodology: SV, NT; Project administration: SAh, SK, SKN, MW, NT; Resources: NT; Software: SV; Supervision: SAm, BM, SAh, SK, SKN, MW, NT; Validation: SAm, BM, MW, NT; Writing-original draft: SV; Writing-review & editing: all authors.