At the annual symposium hosted by the American Medical Informatics Association (AMIA) next week, researchers from MIT will present a new natural language processing (NLP) system that is 75% accurate at deciphering words with multiple meanings in the free-text portion of physicians' notes in an electronic health record (EHR).
Word-sense disambiguation is one of the most challenging problems in natural language processing, the field that uses algorithms to let computers extract meaningful data from narrative text. A word such as “discharge” can have more than one meaning (a bodily secretion, or the release of a patient from the hospital), and determining which sense is intended in a given document is essential to interpreting it correctly.
MIT postdoc Anna Rumshisky, who led the study, says the team drew inspiration from topic modeling, a branch of research that seeks to automatically identify the topics of documents by making inferences about the relationships among prominently featured words. Topic modeling assigns a mathematical weight to each theme in a text; the new system applies the same idea to word senses, using an algorithm to infer which sense of a word is most likely meant based on the surrounding language.
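To make the scoring idea concrete, here is a minimal Python sketch. It is not MIT's system: it uses a simple mixture-of-unigrams (naive Bayes) model, a simplified relative of topic models, and the sense labels, co-occurrence counts, and names (`SENSE_COUNTS`, `sense_posteriors`, `disambiguate`) are all hypothetical. Each sense of “discharge” is treated like a topic with its own distribution over nearby words, and the words surrounding an occurrence are used to estimate which sense is most likely.

```python
import math

# Hypothetical co-occurrence counts: how often each context word appeared near
# "discharge" in each sense. A topic model would infer such statistics from
# unlabeled notes; these numbers are invented for illustration only.
SENSE_COUNTS = {
    "bodily_secretion": {"wound": 8, "purulent": 5, "drainage": 6, "yellow": 4, "home": 1},
    "hospital_release": {"home": 9, "instructions": 6, "summary": 7, "follow-up": 5, "wound": 1},
}
ALPHA = 1.0  # Laplace (add-one) smoothing constant

# Shared vocabulary across all senses; words outside it are ignored.
VOCAB = set().union(*(counts.keys() for counts in SENSE_COUNTS.values()))


def word_prob(word, sense):
    """Smoothed P(word | sense) over the shared vocabulary."""
    counts = SENSE_COUNTS[sense]
    total = sum(counts.values())
    return (counts.get(word, 0) + ALPHA) / (total + ALPHA * len(VOCAB))


def sense_posteriors(context_words, priors=None):
    """Return P(sense | context) for every sense, summing to 1."""
    senses = list(SENSE_COUNTS)
    if priors is None:
        priors = {s: 1.0 / len(senses) for s in senses}  # uniform prior
    log_scores = {}
    for s in senses:
        score = math.log(priors[s])
        for w in context_words:
            w = w.lower()
            if w in VOCAB:  # skip words the model has never seen
                score += math.log(word_prob(w, s))
        log_scores[s] = score
    # Normalize in log space (log-sum-exp) to avoid numerical underflow.
    m = max(log_scores.values())
    exp_scores = {s: math.exp(v - m) for s, v in log_scores.items()}
    z = sum(exp_scores.values())
    return {s: v / z for s, v in exp_scores.items()}


def disambiguate(context_words):
    """Pick the sense with the highest posterior probability."""
    post = sense_posteriors(context_words)
    return max(post, key=post.get)


if __name__ == "__main__":
    print(disambiguate("purulent discharge from the wound".split()))
    print(disambiguate("discharge home with follow-up instructions".split()))
```

Run as a script, this prints `bodily_secretion` and then `hospital_release`. In a real topic model the per-sense word distributions would typically be inferred from unlabeled text rather than entered by hand, which is what allows such a system to operate without manually labeled examples.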
This “fundamentally new approach” will allow much more accurate systems to run without human supervision, reducing the time and cost of manual transcription. The more data the system processes, the more accurate it becomes, as it learns where its inferences succeeded and where they failed. Rumshisky says the team plans to incorporate the Unified Medical Language System (UMLS), a thesaurus of medical terms, to widen the system’s knowledge base and improve its word-association capabilities.
Between 60 and 80 percent of the meaningful information collected by EHR systems is locked away in physicians’ narrative notes. Natural language processing, whether through MIT’s method or other techniques, is key to letting physicians use EHR systems to their fullest: they can record patient information in their own words and still produce data that is usable for automatic clinical reporting, coding, and electronic transfer.