Resident Physician Phoenix Children's Hospital Phoenix, Arizona, United States
Background: The utility of point-of-care lung ultrasound (LUS) in neonatal intensive care units is evolving, with a few studies reporting high accuracy of LUS scores in the diagnosis of respiratory distress syndrome (RDS) and transient tachypnea of the newborn (TTN) and in the prediction of bronchopulmonary dysplasia (BPD). Many studies have relied on point-of-care LUS experts to report and score the LUS. The training, skill, and expertise needed to perform and score LUS are not standardized, and inter-rater reliability is likely to be variable, particularly among trainees and inexperienced clinicians.
Objective: To evaluate the inter-rater reliability of LUS scores assigned by investigators as part of a multicenter prospective study of point-of-care LUS in preterm infants at risk for respiratory distress.
Design/Methods: Prospective study of LUS in preterm infants (born at <32 weeks' gestation) at risk for respiratory distress. The LUS was performed by investigators after they completed an online training course and/or 10 independent ultrasound (US) procedures. Each LUS image was scored from 0 to 3, with higher scores indicating increased severity of lung pathology (Modified Brat classification, described in Figure 1). Each LUS was scored by the performing investigator and then by two blinded reviewers with expertise in LUS. Inter-rater reliability was analyzed with the intraclass correlation coefficient (ICC).
Results: A total of 780 LUS images from 130 LUS procedures performed at three participating centers were analyzed. The LUS was performed by six investigators, including four trainees with no prior experience in LUS, and scored by two blinded reviewers. Overall, LUS scores assigned by the investigators had excellent correlation with those of the reviewers (ICC=0.93, p<0.01). Among the three centers, NICU 1 and NICU 2 had excellent correlation (ICC=0.90, p<0.01 and ICC=0.98, p<0.01, respectively), and NICU 3 had very good correlation (ICC=0.85, p<0.01).
Three of the investigators with limited US experience showed good correlation (ICC>0.88, p<0.01), whereas one investigator had poor correlation (ICC=0.61, p<0.10) (Table 1).
Conclusion(s): During early enrollment of our prospective study, LUS performed and scored by investigators following online training and a small number of independent procedures had excellent correlation with expert reviewers, suggesting a short learning curve. We also describe variability among centers and individual providers, which we hope to minimize with ongoing education, training, and re-evaluation. Improving the reliability and accuracy of LUS scoring by clinicians will optimize the utility of point-of-care LUS as a real-time imaging modality.
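For readers unfamiliar with the ICC, the following is a minimal sketch of how one common form can be computed from a table of per-image scores. The abstract does not state which ICC model was used; this sketch assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), and the function name `icc2_1` and the example scores are hypothetical, not study data.

```python
def icc2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one row per image; each row holds one score
    per rater (e.g. investigator, reviewer 1, reviewer 2).
    Assumption: this ICC form is chosen for illustration only.
    """
    n = len(scores)        # number of images (targets)
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares: between images, between raters, residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical 0-3 scores for five images from three raters (not study data)
example = [
    [0, 0, 0],
    [1, 1, 2],
    [2, 2, 2],
    [3, 3, 3],
    [1, 2, 1],
]
print(round(icc2_1(example), 2))
```

Values near 1 indicate that raters assign nearly identical scores to the same image; systematic offsets between raters lower this absolute-agreement form of the ICC, which is why it suits a comparison of trainee scores against expert reviewers.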