AWS makes human review easier for machine learning
May 5, 2020
Amazon Web Services (AWS) has introduced Augmented Artificial Intelligence (A2I), a managed service that makes it easier to add human review to machine-learning predictions. The aim is to improve model and application accuracy by continuously identifying and improving low-confidence predictions.
Amazon A2I helps developers add human review for model predictions to new or existing applications using reviewers from Amazon Mechanical Turk, third-party vendors or their own employees. It makes it easier for developers to build the human review system, structure the review process and manage the human review workforce.
For example, developers could use A2I to spin up and manage a workforce of humans to review and validate machine-learning predictions for an application that extracts financial information from scanned mortgage documents, or for one that uses image recognition to identify counterfeit items online, so that the quality of results improves over time.
Today, machine learning provides highly accurate predictions, known as inferences, for a variety of use cases, including identifying objects in images, extracting text from scanned documents, or transcribing and understanding spoken language. In each case, machine-learning models provide an inference and a confidence score that expresses how certain the model is of its prediction. The higher the confidence score, the more the result can be trusted.
Typically, when developers receive a high-confidence result, they can trust that the prediction is accurate and, depending on the use case, use it to automate a process. For example, developers of a social media application that matches a user’s photos to celebrity faces might rely on an 80% confidence score to generate and return a lot of entertaining matches. However, there are other times when it is strongly recommended to have both high confidence – up to 99% – and human review, such as public safety use cases involving law enforcement.
In situations where the confidence score is lower than desired or human judgment is required, reviews can be used to validate the prediction. This interplay between machine learning and human reviewers is critical to the success of machine-learning systems, but human reviews are challenging and expensive to build and operate at scale, often involving multiple workflow steps, operating custom software to manage human review tasks and results, and recruiting and managing large groups of reviewers.
As a result, developers sometimes spend more time managing the human review process than building the intended application, or they have to forgo human reviews altogether, which leads to less confidence in deploying applications that use machine learning.
With A2I, developers can add human review to machine-learning applications without the need to build or manage expensive and cumbersome systems for human review. A2I provides over 60 pre-built human review workflows for common machine-learning tasks – such as object detection in images, transcription of speech and content moderation – that allow machine-learning predictions from Amazon Rekognition and Textract to be human-reviewed more easily.
Developers who build custom machine-learning models in Amazon SageMaker or other on-premises or cloud tools can set up human review for their specific use case in the Augmented AI console or via its API. After setting a confidence threshold for model predictions, developers can choose to have predictions below that threshold reviewed by Mechanical Turk and its global workforce of 500,000 independent contractors, by third-party organisations that specialise in business process outsourcing – such as iVision, CapeStart and iMerit – or by their own private, in-house reviewers.
Developers can specify the number of workers per review, and A2I then routes each review to exactly that number of reviewers. For example, a company building a system for processing financial loan applications using Textract can configure A2I to work with Textract outputs so that forms with a confidence score lower than 99% are routed to human reviewers from its private workforce. Human-validated results are stored in Amazon Simple Storage Service (S3), and developers can set up Amazon CloudWatch Events notifications to review metadata about inference accuracy and retrieve the results.
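The routing rule described above can be illustrated with a short, self-contained Python sketch. It is not the A2I API: the `Prediction` and `ReviewQueue` classes and the `route` function are hypothetical names invented for this example, and the in-memory queue merely stands in for a managed review workforce. It shows only the logic of a confidence threshold combined with a fixed number of reviewers per item.

```python
from dataclasses import dataclass, field


@dataclass
class Prediction:
    """A model inference with its confidence score (0.0 to 1.0)."""
    item_id: str
    label: str
    confidence: float


@dataclass
class ReviewQueue:
    """Hypothetical stand-in for a human review workforce (not an AWS class)."""
    workers_per_review: int = 1
    tasks: list = field(default_factory=list)

    def submit(self, prediction: Prediction) -> None:
        # Create one task per reviewer slot, so each item is seen by
        # exactly `workers_per_review` people.
        for reviewer_slot in range(self.workers_per_review):
            self.tasks.append((prediction.item_id, reviewer_slot))


def route(predictions, threshold, queue):
    """Return confident predictions; queue the rest for human review."""
    accepted = []
    for prediction in predictions:
        if prediction.confidence < threshold:
            queue.submit(prediction)
        else:
            accepted.append(prediction)
    return accepted


# Example: a 99% threshold with two reviewers per low-confidence form.
queue = ReviewQueue(workers_per_review=2)
forms = [
    Prediction("form-1", "income=52000", 0.995),
    Prediction("form-2", "income=5200", 0.91),
]
accepted = route(forms, threshold=0.99, queue=queue)
print([p.item_id for p in accepted])  # form-1 is accepted automatically
print(queue.tasks)                    # form-2 is queued twice, once per reviewer
```

In a real deployment the threshold and reviewer count would be set in the service's configuration rather than in application code; the sketch only makes the decision rule concrete.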
“We often hear from our customers that Amazon SageMaker helps speed training, tuning and deploying custom machine-learning models, while fully managed services like Amazon Rekognition and Amazon Textract make it easy to build applications that incorporate machine learning without requiring any machine-learning expertise,” said Swami Sivasubramanian, AWS vice president. “But even with these advancements, our customers still say there are critical use cases where human judgment is required like in law enforcement investigations, or times when human review can be used to resolve the ambiguity in predictions when confidence levels fall below a given threshold for less sensitive use cases, and the current human review process involves a lot of custom effort and cost. We’re excited to help our customers remove another obstacle to building machine-learning applications with the launch of Amazon A2I, which makes it significantly easier and faster to incorporate human judgment into machine-learning applications in order to ensure higher quality predictions over a sustained period of time.”
T-Mobile in the USA is redefining the way consumers and businesses buy wireless services through product and service innovation.
“Providing relevant information, such as account details and available discounts, in real time to our customer care agents while they are in live conversations with customers is one of the ways T-Mobile uses machine learning to improve customer experience,” said Heather Nollis, machine-learning engineer at T-Mobile. “Using A2I, we will be able to ensure that our models continuously deliver top-quality insights by having humans validate random samples of model predictions. Trust is the hardest thing to build when it comes to machine learning, and A2I will allow us to make sure that our models are making the fewest mistakes.”