Colonoscopy is used for Colorectal Cancer (CRC) screening. Extracting details of the colonoscopy findings from free text in Electronic Health Records (EHRs) can be used to determine patient risk for CRC and colorectal screening strategies. We developed and evaluated the accuracy of a deep learning model framework to extract information for the clinical decision support system to interpret relevant free-text reports, including indications, pathology, and findings notes. The Bio-Bi-LSTM-CRF framework was developed using Bidirectional Long Short-term Memory (Bi-LSTM) and Conditional Random Fields (CRF) to extract several clinical features from these free-text reports including indications for the colonoscopy, findings during the colonoscopy, and pathology of resected material. We trained the Bio-Bi-LSTM-CRF and existing Bi-LSTM-CRF models on 80% of 4,000 manually annotated notes from 3,867 patients. These clinical notes were from a group of patients over 40 years of age enrolled in four Veterans Affairs Medical Centres. A total of 10% of the remaining annotated notes were used to train hyper parameter and the remaining 10% were used to evaluate the accuracy of our model Bio-Bi-LSTM-CRF and compare to Bi-LSTMCRF. Our experiments show that the bidirectional encoder representations by integrating dictionary function vector from Bio-Bi-LSTM-CRF and strategies character sequence embedding approach is an effective way to identify colonoscopy features from EHR-extracted clinical notes. The Bio-Bi-LSTM-CRF model creates new opportunities to identify patients at risk for colon cancer and study their health outcomes.
Published Date: 2023-02-17; Received Date: 2023-01-16