Ashley Keil, IBML VP sales, EMEA/APAC, explains how BPOs and enterprises can close in on 100% data accuracy as part of their capture processes and business workflows
If you thought the missions undertaken by IMF1 agent Ethan Hunt in the popular action spy series are tough – or impossible – then spare a thought for CIOs, IT and information management professionals tasked with looking after company data. Their mission to manage data is getting considerably more difficult as data arrives from more sources, in more formats and in greater quantities than ever before.
In its latest Global DataSphere Forecast2, published in May, IDC states that more than 59 zettabytes of data will be created, captured, copied and consumed in the world this year, and that in the next three years more data will be produced than in the previous 30. That’s mind-boggling.
Yet according to Gartner3, 40% of an enterprise’s data is inaccurate, missing or incomplete. Less than half (47%) of organisations consider their data to be of high quality, while 13% describe it as poor.
The implications are significant, as data is the lifeblood of any organisation, feeding backend processes, powering decisions and fuelling profits.
Bad, inaccurate data erodes operational efficiency, slows down decision-making, stunts ROI, makes delivering SLAs tricky, adds commercial risk, impairs the customer experience and damages relationships. Ultimately, it’s bad for the bottom line, too, with data governance very much part of GDPR rules and subject to penalties for non-compliance.
Of course, not all is doom and gloom. Many organisations and their BPO service partners have made considerable headway in automating data capture processes by investing in intelligent capture technologies that extract data from printed mail, email, fax messages, scanned documents, smartphone images and other sources and integrate easily with line of business systems through open APIs.
Today, Artificial Intelligence (AI) and machine learning platforms perform complex data capture with minimal operator intervention, achieving accuracy rates of between 80% and 95%. The variation comes from having to deal with, say, crumpled or torn paper, illegible handwriting or highlighted text. It’s just more challenging for recognition engines to extract and convert this kind of information into ASCII files for ingestion into downstream business processes.
So, what are the options if accuracy rates of 80, 85, 90 or 95% aren’t good enough? How can the ‘last mile’ of data capture be improved to get to the nirvana of 100% accuracy without the considerable expense of adding more headcount to review capture results and manually rekey information?
Achieving data perfection
The answer lies in a multi-faceted approach based on a mix of four main components:
1. Best-of-breed capture technologies;
2. Rules-driven capture and validation;
3. AI-driven matching; and
4. Human- and AI-powered triple data entry.
The use of capture technologies will be familiar to many readers. What might be less well known is the speed and power of some of these solutions. High performance intelligent scanners today can process volumes of up to 730 A4 pages per minute and come with real-time, in-line intelligence that recognises different document types and extracts data early in the process to minimise errors downstream.
Importantly, business rules can be set to capture and validate field-level metadata, enabling a scanner to check whether an application form has a signature, for example, or whether exam scripts have the right number of pages in the correct order. Remedial action can be programmed in if they don’t. What’s more, this happens in real time as documents are in motion on the scanner.
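The field-level rules described above can be sketched in a few lines. This is an illustrative example only, assuming a capture system that emits one metadata record per scanned document; the document types and rule logic are hypothetical, not a specific vendor API.

```python
def validate_document(doc):
    """Apply business rules to captured field-level metadata; return a list of issues."""
    issues = []

    # Rule 1: an application form must carry a signature.
    if doc["type"] == "application_form" and not doc.get("has_signature"):
        issues.append("missing signature")

    # Rule 2: an exam script must have the expected pages, in the correct order.
    if doc["type"] == "exam_script":
        expected = list(range(1, doc["expected_pages"] + 1))
        if doc.get("page_numbers") != expected:
            issues.append("pages missing or out of order")

    return issues

# An unsigned application form is flagged for remedial action.
form = {"type": "application_form", "has_signature": False}
print(validate_document(form))  # ['missing signature']
```

In a production deployment these checks would run in-line on the scanner as documents are in motion, so exceptions can be routed before bad data reaches downstream systems.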
The addition of AI-driven matching solutions, either integrated with the scanner or independent of it, enables the cross-referencing and matching of multiple incomplete or incorrect data fields against master database sources so that errors can be flagged and dealt with immediately. Partial metadata captures that are inaccurate can be pieced together and combined to correct and validate the information being processed before it is accepted into a business system.
A simple example would be the scanning of a dirty, torn envelope where part of the address is unreadable. By assessing all the fields and the text and cross-referencing this extraction in a master database, which might hold millions of customer records, the AI solution can bring partial ‘reads’ together to get a qualified and accurate result within milliseconds.
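The envelope example can be illustrated with a toy matcher. Real systems use trained models and indexed search over millions of records; here Python’s standard-library `SequenceMatcher` stands in for the similarity scoring, and the master records and field names are invented for the sketch.

```python
from difflib import SequenceMatcher

# A hypothetical master database of customer records.
MASTER = [
    {"id": 1, "name": "A. Jones", "street": "14 High Street", "city": "Leeds"},
    {"id": 2, "name": "B. Smith", "street": "2 Mill Lane", "city": "York"},
]

def best_match(partial):
    """Combine partial field reads into one score per master record."""
    def score(record):
        # Average the similarity of each readable field; skip unreadable ones.
        ratios = [SequenceMatcher(None, value, record[field]).ratio()
                  for field, value in partial.items() if value]
        return sum(ratios) / len(ratios)
    return max(MASTER, key=score)

# A dirty, torn envelope: city unreadable, name and street only partially OCR'd.
read = {"name": "A. Jnes", "street": "14 High Str", "city": ""}
print(best_match(read)["id"])  # 1
```

The point of the sketch is the mechanism, not the scoring function: several weak partial reads, none sufficient on its own, jointly identify a single master record with high confidence.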
Help from the crowd
A fourth way to achieve clean data is to use scalable, automated crowd sourcing to do what’s called triple data entry. This all but guarantees data accuracy and is ideal for applications such as forms and loan processing, prescription management, mailroom operations and customer onboarding.
Crowd sourcing pushes snippets of the same information to online data entry clerks based globally who are connected to a management platform via the Internet. Two people check the same snippets of unmatched or poor quality data from an image before entering it into a system. If there’s a mismatch between what the two individuals input, it goes to a third person for exception handling. This is how 100% accuracy rates are achieved.
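The agreement rule behind triple data entry is simple enough to state in code. This is a minimal sketch of the logic described above, with illustrative function names; a real platform would also handle routing, queuing and operator payment.

```python
def triple_entry(entry_a, entry_b, adjudicate):
    """Accept the value if two independent operators agree;
    escalate to a third operator only on mismatch."""
    if entry_a == entry_b:
        return entry_a
    return adjudicate()  # exception handling by the third person

# Matching entries are accepted directly.
print(triple_entry("INV-4821", "INV-4821", lambda: "INV-4821"))  # INV-4821

# An OCR-style confusion ('I' read as '1') triggers the third check.
print(triple_entry("INV-4821", "1NV-4821", lambda: "INV-4821"))  # INV-4821
```

Because the third operator is only invoked on disagreement, the cost of the extra check scales with the error rate rather than with the total volume of snippets.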
Crowd-sourced data checking is ideal where intelligent character or word recognition technologies (ICR and IWR) struggle to recognise handwriting in a field and more validation is required. Working with a specialist crowd sourcing partner that pays data entry operators per keystroke or entry costs a fraction of employing your own staff once salaries, pensions, office space, desktops and so on are taken into account.
Data’s exponential growth has created opportunities to leverage it in new ways for better business outcomes. Crowd sourcing is a relatively new area in the information and document management industry. This kind of data validation approach is cost effective, fast, secure and works reliably. Your mission, should you choose to accept it, is to give it a go.
1. Impossible Missions Force
2. IDC, Worldwide Global DataSphere Forecast, 2020–2024: The COVID-19 Data Bump and the Future of Data Growth
3. Gartner, The State of Sales Operations, 2020