Take Auto Data Redaction to New Heights With Machine Learning

The correct and accurate redaction of sensitive information in call recordings and transcripts is crucial to protect customers’ personal data, ensure compliance, and more.

Until now, data redaction has relied on a manual approach that is tedious, unreliable, and labor-intensive.

We are empowering hundreds of organizations across the globe to take advantage of Machine Learning and Artificial Intelligence to make their contact centres more compliant and secure.

We have now made significant advancements to the ML-based Auto Redaction feature we first launched in January 2023.

In this article, Gennadiy Bezko at MiaRec explains the enhanced AI-driven Auto Redaction feature, which leverages Machine Learning (ML) and Named Entity Recognition (NER), and shows how you can use it in your contact centre to ensure data privacy and security. Let’s dive in.

Why Is Accurate Data Redaction Crucial?

Every contact centre processes thousands of pieces of sensitive information a day. For example, an agent needs to verify personal data to make sure they are speaking to the correct customer or authorized person before handling the reason for the call.

This leaves multiple non-contiguous segments of sensitive data in a call that need to be redacted for regulatory or compliance (government, insurance, internal) reasons.

Relying solely on the agent to pause and resume a call to “redact” the data from the call recording and transcript puts extra pressure on the agent in an already stressful environment, which increases the risk of human error.

On the other hand, failing to redact Personally Identifiable Information (PII) can result in fines or sanctions under regulations and standards such as GDPR and PCI DSS, or in a loss of trust and reputation if the data is breached or leaked.

This is why Auto Data Redaction methods have become popular in contact centres with varying degrees of success.

However, even with automated methods, the redaction process can be tedious. The Auto Redaction method must be set up, configured and customized, routinely tested, etc.

Types of Auto Data Redaction

There are three main types of Auto Redaction: Numeric-based, Pattern-matching, and Machine Learning-based.

Numeric Data Redaction

Numeric data redaction involves masking or altering numerical values, such as credit card numbers or social security numbers.

This method replaces specific digits with placeholders to protect sensitive information. Its biggest disadvantage is over-redaction: it can mistakenly redact data that is not PII, such as customer account numbers.

Over-redaction significantly reduces the value of the data, making it less suitable for analysis, insights, and decision-making.

Numeric data redaction also does not work for non-numeric entities, such as names, organizations, and brand names.
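To make the over-redaction problem concrete, here is a minimal sketch of numeric redaction in Python. The rule (mask every run of four or more digits), the function name `redact_numeric`, and the sample transcript are assumptions made for illustration, not a description of any particular product.

```python
import re

# Hypothetical rule: mask every run of 4 or more digits, whatever it means.
DIGIT_RUN = re.compile(r"\d{4,}")

def redact_numeric(text: str, mask: str = "#") -> str:
    """Replace each digit in a long digit run with a placeholder character."""
    return DIGIT_RUN.sub(lambda m: mask * len(m.group()), text)

transcript = (
    "My card is 4111111111111111 and my account number is 78901234. "
    "This is Anna Smith."
)
print(redact_numeric(transcript))
```

In this sketch the account number is masked along with the card number (over-redaction), while the caller's name passes through untouched because it contains no digits.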

Pattern-Matching Data Redaction

Pattern-matching redaction relies on predefined patterns and manually created rules to identify and mask sensitive information. Organizations can define custom rules based on their specific data protection needs.

With this method, the call transcripts are processed by automated software or tools designed for data redaction.

These tools analyze the text content of the transcripts character by character or in chunks, searching for sequences that match the predefined patterns.

When a sequence of characters in the transcript matches one of the predefined patterns, it triggers the redaction process.

For example, if the pattern corresponds to a phone number format (e.g., “(555) 555-5555”), the software identifies and flags such instances.

Although this approach can be quite effective given substantial upfront manual configuration and customization, it still requires consistent and extensive auditing of the results for over- and under-redaction.
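The pattern-matching idea described in this section can be sketched with regular expressions. The rule set, the labels, and the function name `redact_patterns` below are hypothetical. A real deployment would typically need many more rules, which is where the configuration and auditing effort comes from.

```python
import re

# Hypothetical rule set: each label maps to one hand-written pattern.
PATTERNS = {
    "PHONE": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),  # e.g. (555) 555-5555
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # e.g. 123-45-6789
}

def redact_patterns(text: str) -> str:
    """Replace every match of a predefined pattern with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

# The first format is covered by a rule, the second is not.
print(redact_patterns("Call me at (555) 555-5555."))
print(redact_patterns("Call me at 555.555.5555."))
```

The second call leaves the phone number in place because no rule covers the dotted format. This is the kind of under-redaction that auditing is meant to catch.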

Machine-Learning Based Data Redaction

The Machine Learning (ML) approach relies on models that learn from data rather than on hand-written rules. Machine Learning is a subset of Artificial Intelligence (AI).

This method takes Auto Redaction to the next level as it can be more accurate and done automatically, eliminating the need to manually create rules and craft queries.

Machine Learning models learn and improve from experience. As the ML engine analyzes calls, it can take conversation context into account to better distinguish a credit card number from an account number, and to learn new terms, slang, etc.

The more data (calls) the Machine Learning model analyzes, the more effective it becomes. Trained on thousands of conversations, this method is widely regarded as the most effective and accurate auto data redaction method available today.

Named Entity Recognition (NER), a Machine Learning-based Natural Language Processing method, is especially effective at identifying, extracting, and categorizing a wide range of Personally Identifiable Information (PII) in unstructured text without human analysis. For example:

Credit card, financial account numbers, codes etc.
General: names, organizations, addresses, phone numbers
Government-, country-, or region-specific ID numbers (e.g., passport or Social Security numbers)
Protected Health Information (PHI)
Account and/or order information
and much more.
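Once an NER model has labeled the entities in a transcript, the redaction step itself is simple. The sketch below assumes the model returns character offsets and a label for each entity. The `Entity` type, the `span` helper, the label names, and the hand-written entity list are all stand-ins for real model output, which is not shown here.

```python
from typing import NamedTuple

class Entity(NamedTuple):
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity
    label: str  # entity type, e.g. "PERSON" or "CREDIT_CARD"

def span(text: str, phrase: str, label: str) -> Entity:
    """Helper for this example only: build an Entity by locating a phrase."""
    start = text.index(phrase)
    return Entity(start, start + len(phrase), label)

def redact_entities(text: str, entities: list[Entity]) -> str:
    """Replace each (non-overlapping) entity span with a [LABEL] placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        text = text[:ent.start] + f"[{ent.label}]" + text[ent.end:]
    return text

text = "This is Anna Smith, my card number is 4111 1111 1111 1111."
# Hand-written stand-in for what an NER model might return for this sentence.
entities = [
    span(text, "Anna Smith", "PERSON"),
    span(text, "4111 1111 1111 1111", "CREDIT_CARD"),
]
print(redact_entities(text, entities))
```

Because each placeholder carries its label, the redacted transcript still shows that a name and a card number were spoken, which keeps some of its analytical value.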

Enhanced NER/ML-Based Auto Redaction

At the beginning of the year, we launched our AI-Driven Auto Redaction as part of our AI-powered Conversation Intelligence platform for both text transcripts and audio recordings.

We are now happy to announce that we have significantly improved this feature using cutting-edge Named Entity Recognition (NER) and Machine Learning technology.

Higher Level of Accuracy

Our enhanced AI-driven Auto Redaction feature is more accurate than traditional pattern-matching because it incorporates state-of-the-art NER Machine Learning technology and was trained on thousands of conversations.

This enables our Auto Redaction software to have a deeper contextual understanding of conversations to more accurately identify what is Personally Identifiable Information and what isn’t.

For example, if a sequence of digits belongs to a credit card number, it must be redacted, but if it is a customer account number, it needs to be kept in the recording.

These enhancements will significantly reduce the number of missed redactions (false negatives) and over-redactions (false positives), and the accuracy will continue to improve as more data (conversations) is analyzed.
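One way to quantify missed redactions and over-redactions is to compare the spans a redaction engine flagged against a manually labeled reference. The sketch below assumes spans are (start, end) character offsets and counts only exact matches. The function name `redaction_errors` and the sample offsets are illustrative, not measured results.

```python
Span = tuple[int, int]  # (start, end) character offsets

def redaction_errors(expected: set[Span], predicted: set[Span]) -> dict:
    """Compare predicted redaction spans with a hand-labeled reference set."""
    correct = expected & predicted
    missed = expected - predicted         # false negatives: PII left in place
    over_redacted = predicted - expected  # false positives: non-PII removed
    precision = len(correct) / len(predicted) if predicted else 1.0
    recall = len(correct) / len(expected) if expected else 1.0
    return {
        "missed": missed,
        "over_redacted": over_redacted,
        "precision": precision,
        "recall": recall,
    }

# Illustrative offsets: one correct span, one missed span, one over-redaction.
expected = {(8, 18), (38, 57)}
predicted = {(8, 18), (60, 68)}
print(redaction_errors(expected, predicted))
```

Exact-match comparison is a simplification. An audit might also give partial credit for overlapping spans, but the two error types stay the same.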

Out-of-the-Box Configuration

Unlike pattern-matching data redaction models, our solution doesn’t require any configuration. This eliminates the time-consuming and labor-intensive setup of complex queries, as well as the initial audit and fine-tuning phase.

Seamless Integration

The enhanced Auto Data Redaction feature integrates seamlessly with our comprehensive Workforce Optimization (WFO) and Voice Analytics platform, and therefore with everything our WFO platform already integrates with, e.g., Teams, Five9, etc.

As a result, businesses can enjoy a unified and user-friendly experience that simplifies data protection, retention, and analysis.

What Are the Benefits of Using NER-ML for Data Redaction?

Compliance

One of the main advantages of using Machine Learning for Auto Data Redaction is that it helps achieve compliance with privacy regulations, government regulations, and industry standards, e.g., GDPR, PCI DSS, HIPAA, etc.

Data Breach/Theft

Data breaches are common and costly. According to a KPMG study, 62% of American companies suffered a data breach in 2020, and an IBM report put the average cost of a data breach at $4.24 million in 2021.

With a high-accuracy Auto Redaction process in place, the damage from a data breach is limited, e.g., no credit card numbers are exposed. Likewise, it is harder for an employee to obtain sensitive data while reviewing a call.

Better Employee Experience

Another notable advantage of utilizing Machine Learning Auto Redaction is the reduction of stress for both agents and managers.

As the ML engine becomes more advanced, the likelihood of missing data that should be redacted diminishes significantly, alleviating stress for agents.

Additionally, managers no longer need to spend excessive time analyzing what was or wasn’t redacted, or refining the pattern-matching models. This streamlines their workflow and allows them to focus on more crucial tasks.

Conclusion

Compared with the Numeric and Pattern-matching redaction methods, the Machine Learning AI-driven Auto Data Redaction model provides more accurate results with less manual effort.

This allows agents to focus more on the caller instead of pausing/resuming the recording. At the same time, managers can spend less time auditing outputs for redactions and instead focus on improving the customer and employee experience to make their contact centre more efficient.

The Auto Data Redaction feature employing Named Entity Recognition (NER) and Machine Learning technology represents a major leap forward in data privacy and security for businesses of all sizes. It is now available to all existing and new customers.

This blog post has been re-published by kind permission of MiaRec.

About MiaRec

MiaRec is a global provider of Conversation Intelligence and Auto QA solutions, helping contact centers save time and cost through AI-based automation and customer-driven business intelligence.

Author: MiaRec

Published On: 31st Oct 2023 - Last modified: 9th Dec 2024