Graduate Student, University of California, San Francisco, School of Medicine, San Francisco, California, United States
Background: Social risk factors, such as housing instability and transportation needs, are prevalent among children and affect their health. Detecting these factors in electronic health records (EHRs) is crucial for understanding their impact and assisting families in need. However, these data are often difficult to capture within EHRs, particularly in structured fields such as ICD codes, and are instead frequently recorded in clinical notes. Using natural language processing (NLP) to extract social risk factors from pediatric clinical notes could make identifying and addressing these issues considerably more efficient.
Objective: Our primary objective is to identify financial strain, unstable housing, transportation needs, and employment problems in clinical notes by applying NLP techniques, specifically the clinical Text Analysis and Knowledge Extraction System (cTAKES). Additionally, we aim to assess how accurately cTAKES identifies these social needs.
Design/Methods: We conducted a retrospective cohort study of children admitted to a general pediatric unit at an urban tertiary hospital from January 2011 to December 2021. We collected both structured clinical data (e.g., social ICD codes) and unstructured clinical note data (free text) from the UCSF de-identified clinical data warehouse. To extract social risk factors, we developed an ontology of concept unique identifier (CUI) codes and SNOMED CT codes, extending the UCSF Social Interventions Research and Evaluation Network (SIREN) codes. We converted SNOMED CT codes to CUI codes using UMLS mappings obtained through the NIH National Library of Medicine UMLS Terminology Services. cTAKES, integrated into the UCSF Information Commons system, was used to identify and extract these factors. To evaluate cTAKES' performance, three trained annotators assessed a random sample of 200 clinical notes for each social risk factor domain.
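As an illustration of the extraction step, the following minimal Python sketch shows one way CUI codes found in a note could be matched against a domain ontology. It is not the study's pipeline: the CUI strings are placeholders rather than real UMLS identifiers, the function name `flag_notes` and the `(note_id, cui, negated)` input shape are hypothetical, and skipping negated mentions is an assumption that the abstract does not state.

```python
from collections import defaultdict

# Hypothetical ontology: social risk domain -> set of CUI codes.
# The codes below are placeholders, NOT real UMLS CUIs.
ONTOLOGY = {
    "financial_strain": {"CUI_FIN_1", "CUI_FIN_2"},
    "unstable_housing": {"CUI_HOUS_1"},
    "transportation_needs": {"CUI_TRANS_1"},
    "employment_problems": {"CUI_EMP_1"},
}


def flag_notes(mentions, ontology=ONTOLOGY):
    """Map each note to the social risk domains mentioned in it.

    mentions: iterable of (note_id, cui, negated) tuples, an assumed
    simplification of the concept mentions an NLP system might emit.
    """
    # Invert the ontology so each CUI points to its domain(s).
    cui_to_domains = defaultdict(set)
    for domain, cuis in ontology.items():
        for cui in cuis:
            cui_to_domains[cui].add(domain)

    flags = defaultdict(set)
    for note_id, cui, negated in mentions:
        if negated:
            # Assumption: negated mentions ("denies housing problems")
            # do not count as a social risk.
            continue
        for domain in cui_to_domains.get(cui, ()):
            flags[note_id].add(domain)
    return dict(flags)
```

Inverting the ontology once keeps the per-mention lookup constant-time, which matters when a decade of clinical notes is processed.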
We will complete manual annotation and evaluate cTAKES performance by February 2024.
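The planned evaluation against manual annotation could be sketched as follows for a single social risk domain. This is a minimal illustration under stated assumptions, not the study's analysis: the abstract does not say how the three annotators' labels are combined or which metrics will be reported, so the majority vote and the choice of precision, recall, and F1 are assumptions, and the function names are hypothetical.

```python
def majority_label(votes):
    """Return True if more than half of the annotators marked the note positive.

    Assumption: annotator labels are combined by simple majority vote.
    """
    return sum(votes) * 2 > len(votes)


def evaluate(predicted, annotations):
    """Compare system flags with annotator labels for one domain.

    predicted:   dict note_id -> bool (system flagged the domain)
    annotations: dict note_id -> list of 0/1 votes, one per annotator
    """
    tp = fp = fn = 0
    for note_id, votes in annotations.items():
        gold = majority_label(votes)
        pred = predicted.get(note_id, False)  # unflagged notes count as negative
        if pred and gold:
            tp += 1
        elif pred and not gold:
            fp += 1
        elif gold and not pred:
            fn += 1

    # Guard against division by zero when a class is absent from the sample.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```

Running this once per domain on its 200 annotated notes would give domain-level scores; inter-annotator agreement would need a separate calculation.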