Machine learning is being used to process medical records

AWS now offers ML-driven medical record processing that deciphers freeform text.
7 December 2018

Negotiating medical records, courtesy of machine-learning. Source: Shutterstock

Readers in the UK will be aware of the billions of pounds wasted in recent years attempting to provide the country’s state-controlled and state-funded healthcare system, the NHS, with a unifying IT system across hospitals, departments, and medical disciplines.

The source of these repeated failures is a matter of some politically charged debate, but one of the issues, and one that bedevils any IT project involving medical care, is that medical data is often stored as unstructured text. Extracting this text and presenting it in a unified, interchangeable form is very complex, and therefore expensive to achieve.

On Tuesday, Amazon announced the launch of Amazon Comprehend Medical, a cloud-hosted service that gathers patients’ details from a variety of sources, then identifies and classifies them without much in the way of human intervention.

Health and patient data come in many forms: doctors’ notes, prescription details, transcripts of interviews, radiography results in text and images, pathology reports and test results. Identifying this information takes a pair of skills: the ability to write customized code to extract the data, and the ability to recognize specific medical data for what it is (medical jargon is rife with abbreviations, acronyms, and similes).

In the past, teams have had to employ skilled medical personnel to help with any application or service that relied on medical records, given the records’ many different types and their differing content and formats.

As a further complication, medicine relies on a changing scheme of classifications for diseases, procedures, surgical operations and so forth. Any change to the classification codes requires a rewrite of (often) many dozens of routines, meaning that the technology is always one step behind.

Amazon Comprehend Medical identifies common types of medical information for developers automatically, with very high degrees of accuracy, and without having to write many different custom rules and address multiple APIs. The machine-learning powered code identifies anatomical terms, medicines, test result vernacular, procedures, treatment details and descriptions of medical conditions.

The platform offers two APIs which can be integrated into existing code with just a few simple calls: Medical Named Entity and Relationship Extraction (NERe), and Protected Health Information Data Extraction and Identification (PHId). The latter is used only to identify Protected Health Information (PHI), such as the patient’s name and address.
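As a rough illustration of what "a few simple calls" might look like, the sketch below uses the AWS SDK for Python (boto3), whose `comprehendmedical` client exposes `detect_entities` and `detect_phi` operations. Mapping these to the NERe and PHId APIs named above is our reading, not something the announcement spells out. The region name and the helper function names are assumptions made for the example, and real use needs AWS credentials.

```python
from typing import Any, Dict, List, Tuple


def make_client(region_name: str = "us-east-1") -> Any:
    """Create a Comprehend Medical client (assumed region; needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers below accept any compatible client
    return boto3.client("comprehendmedical", region_name=region_name)


def extract_entities(client: Any, text: str) -> List[Dict[str, Any]]:
    """Return the medical entities (conditions, medications, ...) found in the text."""
    response = client.detect_entities(Text=text)
    return response.get("Entities", [])


def extract_phi(client: Any, text: str) -> List[Dict[str, Any]]:
    """Return only the Protected Health Information entities found in the text."""
    response = client.detect_phi(Text=text)
    return response.get("Entities", [])


def summarize(entities: List[Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """Reduce each entity to (category, type, matched text) for quick inspection."""
    return [(e["Category"], e["Type"], e["Text"]) for e in entities]
```

With credentials configured, a call such as `summarize(extract_entities(make_client(), note_text))` would list what the service found in `note_text`, a hypothetical string holding one clinical note.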

In IT projects for the medical sector, the sensitive nature of PHI adds a layer of data security complications. Anonymized data is incredibly useful for trials, statistical analysis, and trend identification, but because most medical records are identified by patient, the sensitivity of the data, plus the legal ramifications, means that developers tread in a minefield.
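One way a PHI-detection result could feed an anonymization step is sketched below. It assumes each detected entity is a dictionary carrying `BeginOffset`, `EndOffset`, `Type`, and `Score` fields, as in the response of boto3's `detect_phi` call, and that `EndOffset` is exclusive in the manner of a Python slice. The `redact_phi` helper and the 0.5 score threshold are hypothetical choices for illustration, and overlapping spans are not handled.

```python
from typing import Any, Dict, List


def redact_phi(text: str, phi_entities: List[Dict[str, Any]], min_score: float = 0.5) -> str:
    """Replace each sufficiently confident PHI span with a tag such as [NAME]."""
    # Process spans from the end of the text backwards so that earlier
    # offsets remain valid after each replacement changes the length.
    spans = sorted(
        (e for e in phi_entities if e.get("Score", 1.0) >= min_score),
        key=lambda e: e["BeginOffset"],
        reverse=True,
    )
    redacted = text
    for entity in spans:
        tag = "[" + entity.get("Type", "PHI") + "]"
        # Assumption: EndOffset points one past the last character of the span.
        redacted = redacted[: entity["BeginOffset"]] + tag + redacted[entity["EndOffset"]:]
    return redacted
```

Automated redaction of this kind is a starting point only; whether the output counts as anonymized is a legal question that the code cannot settle.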

Amazon Comprehend Medical extracts medical information from the patient data supplied to it from Amazon S3 sources and returns structured results. For example, developers can identify individuals at risk of a particular disease by extracting diagnoses, signs, and symptoms from thousands of clinical notes.
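The at-risk example can be sketched as a filter over the structured results. The code below assumes entities shaped like those in the service's entity-detection response, where a `MEDICAL_CONDITION` entity may carry traits named `DIAGNOSIS`, `SIGN`, `SYMPTOM`, or `NEGATION`. The note identifiers, the watch list, the 0.8 score threshold, and both function names are hypothetical, and matching on lowercased text is a deliberately crude stand-in for proper clinical coding.

```python
from typing import Any, Dict, List, Set

# Traits that indicate the condition is asserted rather than ruled out.
POSITIVE_TRAITS = {"DIAGNOSIS", "SIGN", "SYMPTOM"}


def condition_mentions(entities: List[Dict[str, Any]], min_score: float = 0.8) -> Set[str]:
    """Collect lowercased condition names that are asserted with enough confidence."""
    found: Set[str] = set()
    for entity in entities:
        if entity.get("Category") != "MEDICAL_CONDITION":
            continue
        if entity.get("Score", 0.0) < min_score:
            continue
        traits = {t["Name"] for t in entity.get("Traits", [])}
        if "NEGATION" in traits:
            continue  # e.g. "patient denies chest pain"
        if traits & POSITIVE_TRAITS:
            found.add(entity["Text"].lower())
    return found


def notes_at_risk(results_by_note: Dict[str, List[Dict[str, Any]]],
                  watch_terms: Set[str]) -> List[str]:
    """Return the IDs of notes mentioning any watched condition, sorted for stable output."""
    return sorted(
        note_id
        for note_id, entities in results_by_note.items()
        if condition_mentions(entities) & watch_terms
    )
```

Skipping negated mentions matters here: a note saying a patient has no history of a condition would otherwise be flagged alongside one recording a diagnosis.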

Like many machine-learning-based systems, the Amazon platform can emulate human decision making, identifying abbreviations, misspellings, and typos in the text without having to refer back to medical staff for advice on content.

“For cancer patients and the researchers dedicated to curing them, time is the limiting resource. The process of developing clinical trials and connecting them with the right patients requires research teams to sift through and label mountains of unstructured clinical record data. Amazon Comprehend Medical […] is a vital step toward getting researchers rapid access to the information they need when they need it so they can find actionable insights to advance lifesaving therapies for patients,” said Matthew Trunnell, CIO at the Fred Hutchinson Cancer Research Center, according to Amazon.

The platform does not store data itself, and the service is HIPAA eligible, identifying PHI stored in record systems and adhering to the standards of the GDPR (General Data Protection Regulation).

The service has a free tier for trial purposes; after that, Amazon charges a few cents per block of 100 text characters.