Staff Specialist Neonatologist, The Children’s Hospital at Westmead, Kellyville, New South Wales, Australia
Background: There is substantial investment worldwide in infrastructure that enables clinicians and researchers to capture high-resolution physiological data from critically unwell infants. High-resolution data often contain many artefacts caused by patient movement, blood sampling and probe malposition, and these artefacts must be addressed before the data can be meaningfully interpreted. Manual inspection and removal of artefacts in large datasets is not sustainable and is prone to error, so a reliable machine learning algorithm to detect and remove artefacts from large datasets is needed.

Objective: To test, as a proof of concept, a machine learning algorithm for detecting artefacts in invasive blood pressure data recorded from newborn infants < 30 weeks' gestation.

Design/Methods: This is a retrospective secondary analysis of data from a subset of infants enrolled in a prospective cohort study investigating the effect of cord clamping on cerebral oxygenation in newborn infants < 30 weeks' gestation. Infants with an arterial catheter for blood pressure monitoring were included in this analysis. The blood pressure data were displayed graphically in a custom-built C# software application that allowed zooming into individual data points. One of the authors (EL) was trained to use the application and mark the anomalous data points. The mean blood pressure data were analysed with the 'anomaly' package in R, which has four hyperparameters (frequency, trend, alpha and maximum anomalies). The machine learning algorithm was used to find the best combination of these hyperparameters. Accuracy metrics (sensitivity, specificity, precision, recall and F1-score) were calculated by comparing the anomalies identified by the algorithm with those identified manually by the clinician.

Results: High-resolution blood pressure signals from 43 preterm infants, with an average duration of 19.4 hours, were analysed, giving a total of 3.4 million data points.
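The abstract does not say how the R package separates the signal into components or which search strategy the algorithm used to pick the hyperparameters. As an illustration only, the Python sketch below assumes a simple decomposition (a rolling-median trend over `trend` samples, a per-phase median seasonal component over a `frequency`-sample cycle), flags remainder values outside interquartile-range limits widened by an assumed factor of 0.15/alpha, caps the flagged fraction at `max_anoms`, and then runs an exhaustive grid search that maximises F1 against manual labels. All function names (`detect_anomalies`, `grid_search`) and the synthetic signal are hypothetical; window lengths are in samples, and converting 5 minutes or 11 hours into samples would depend on the sampling rate, which the abstract does not state.

```python
import itertools

import numpy as np


def rolling_median(x: np.ndarray, window: int) -> np.ndarray:
    """Centred rolling median; windows are truncated at the series edges."""
    half = window // 2
    n = len(x)
    return np.array([np.median(x[max(0, i - half):min(n, i + half + 1)]) for i in range(n)])


def detect_anomalies(x, frequency: int, trend: int, alpha: float, max_anoms: float) -> np.ndarray:
    """Return a boolean mask of anomalous samples (hypothetical helper, assumed method)."""
    x = np.asarray(x, dtype=float)
    # 1. Trend: rolling median over `trend` samples, then remove it.
    detrended = x - rolling_median(x, trend)
    # 2. Seasonal part: median of the detrended signal at each phase of a `frequency`-sample cycle.
    phase = np.arange(len(x)) % frequency
    seasonal = np.array([np.median(detrended[phase == p]) for p in range(frequency)])[phase]
    remainder = detrended - seasonal
    # 3. IQR limits on the remainder; smaller alpha widens the limits (assumed 0.15 / alpha scaling).
    q1, q3 = np.percentile(remainder, [25, 75])
    k = 0.15 / alpha
    iqr = q3 - q1
    flagged = (remainder < q1 - k * iqr) | (remainder > q3 + k * iqr)
    # 4. Cap: keep at most `max_anoms` (a fraction) of samples, most extreme remainders first.
    max_n = int(max_anoms * len(x))
    if flagged.sum() > max_n:
        candidates = np.flatnonzero(flagged)
        distance = np.abs(remainder[candidates] - np.median(remainder))
        keep = candidates[np.argsort(distance)[::-1][:max_n]]
        flagged = np.zeros(len(x), dtype=bool)
        flagged[keep] = True
    return flagged


def f1_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """F1 = 2TP / (2TP + FP + FN), comparing algorithm flags with manual labels."""
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def grid_search(x, truth, grid: dict):
    """Score every hyperparameter combination by F1 and return the best one."""
    best_score, best_params = -1.0, None
    keys = ["frequency", "trend", "alpha", "max_anoms"]
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = f1_score(detect_anomalies(x, **params), truth)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


# Usage on synthetic data (not study data): a cyclic signal with 20 injected spike artefacts.
rng = np.random.default_rng(0)
n = 2000
t = np.arange(n)
x = 45 + np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.5, n)
spikes = rng.choice(n, size=20, replace=False)
x[spikes] += 25
truth = np.zeros(n, dtype=bool)
truth[spikes] = True

grid = {"frequency": [25, 50], "trend": [101, 201], "alpha": [0.05, 0.1], "max_anoms": [0.01, 0.05]}
best_score, best_params = grid_search(x, truth, grid)
print(best_score, best_params)
```

An exhaustive grid is the simplest search to reason about; with four hyperparameters and millions of data points, the O(n × window) rolling median used here would likely be too slow, and a faster decomposition or a smarter search would probably be needed in practice.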
The machine learning algorithm with the hyperparameter combination alpha = 0.055, frequency = 5 minutes, maximum anomalies = 5.5% and trend = 11 hours had the best accuracy metrics when compared with the manually detected anomalies (sensitivity 98.6%, specificity 99.5%, precision 99%, recall 98.6%, F1-score 98.8%).
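Sensitivity and recall are the same quantity, TP/(TP + FN), which is why both are reported as 98.6, and the F1-score is the harmonic mean of precision and recall, so the reported values can be cross-checked against each other. A minimal Python sketch of the metric definitions follows; the helper names `metrics_from_counts` and `f1_from` are hypothetical, and no study confusion-matrix counts are assumed.

```python
def metrics_from_counts(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Agreement metrics between algorithm flags and manual labels (hypothetical helper)."""
    sensitivity = tp / (tp + fn)  # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Cross-check the reported best combination: precision 99, recall 98.6.
print(round(f1_from(99.0, 98.6), 1))  # 98.8, matching the reported F1-score
```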
Conclusion(s): We report a machine learning algorithm for detecting artefacts in invasive blood pressure data, together with the hyperparameter combination that gave the highest accuracy. The algorithm needs validation in a prospective study and could potentially be extended to other physiological signals, with significant research and clinical implications.