Proactive DLP Adds Support For Optical Character Recognition (OCR)

New Features Included In v2.3.0:

  • Support sensitive information detection and redaction in non-searchable PDFs
  • Support redacting sensitive information in Microsoft Excel (XLS, XLSX)

Proactive DLP Can Now Protect PII In Non-Searchable PDFs

Many paper documents containing confidential or sensitive information are scanned and transferred as PDF files for business purposes: for example, customers' identity cards, employee payroll records, insurance forms, invoices, customer lists, and patient records. As more e-documents are exchanged over the Internet, sensitive data can accidentally leak from these files, and because the files are not searchable, you may not even know the data is there.

With OPSWAT’s Proactive DLP, personally identifiable information (PII) can now be detected and automatically redacted not only in text files but also in non-searchable PDF files, thanks to Optical Character Recognition (OCR) technology.

What is Optical Character Recognition (OCR)?

OCR is a commonly used technology for recognizing text inside images. It examines the text in a document and converts the characters into code that can be used for data processing. Advanced systems can produce highly accurate recognition results for almost all popular fonts.

Maximize Data Leak Prevention With OCR

Proactive DLP can now use this technology to detect and redact sensitive information hidden in non-searchable English-language PDFs. This improvement addresses our customers' concerns about data loss from image-only PDF files. You can now use the OCR feature in Proactive DLP to scan PDFs attached to emails, uploaded to or downloaded from websites, or transferred in your network traffic.
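To make the detect-and-redact step more concrete, here is a minimal Python sketch of what can happen once an OCR engine has already turned a scanned page into plain text. It is an illustration only, not Proactive DLP's implementation: the patterns, the `luhn_valid` and `redact` function names, and the `X` mask character are all assumptions made for this example, and the OCR step itself is left out.

```python
import re

# Hypothetical patterns for illustration only; real DLP rules are far more thorough.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")      # US SSN-like: 123-45-6789
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")    # 13 to 19 digits, optional separators


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def redact(text: str, mask: str = "X") -> str:
    """Mask SSN-like and Luhn-valid card-like numbers, keeping the text length."""
    def mask_digits(match):
        # Replace digits only, so separators and layout stay where they were.
        return re.sub(r"\d", mask, match.group(0))

    def mask_card(match):
        # Only mask digit runs that look like real card numbers.
        return mask_digits(match) if luhn_valid(match.group(0)) else match.group(0)

    text = SSN_RE.sub(mask_digits, text)
    return CARD_RE.sub(mask_card, text)


# Text as it might come back from an OCR engine (made-up sample data).
ocr_text = (
    "Name: Jane Doe\n"
    "SSN: 123-45-6789\n"
    "Card: 4111 1111 1111 1111\n"
    "Invoice: 1234567890123"
)
print(redact(ocr_text))
```

The Luhn check is a design choice to reduce false positives: a long digit run such as an invoice number is left alone unless it also passes the checksum that payment card numbers satisfy.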

An example of sensitive data in a non-searchable PDF detected and redacted by Proactive DLP:

No More Sensitive Data Leaks From Microsoft Excel Documents (XLS, XLSX)

Another upgrade in this release is the extension of supported file formats: the sensitive data redaction feature now covers XLSX and XLS files.
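As a rough picture of cell-level redaction that leaves a spreadsheet's layout intact, the following Python sketch treats a sheet as a list of rows and replaces matches inside string cells. It is not how Proactive DLP processes Excel files: the `redact_sheet` function, the simple e-mail pattern, and the `[REDACTED]` placeholder are assumptions chosen for this example, and reading or writing actual XLS/XLSX files is out of scope.

```python
import re

# Hypothetical pattern for illustration: a simple e-mail address matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_sheet(rows, placeholder="[REDACTED]"):
    """Return a copy of the sheet with e-mail addresses replaced in string cells.

    The number of rows and columns is unchanged, and non-string cells
    (numbers, dates, None) are passed through untouched.
    """
    redacted = []
    for row in rows:
        new_row = []
        for cell in row:
            if isinstance(cell, str):
                cell = EMAIL_RE.sub(placeholder, cell)
            new_row.append(cell)
        redacted.append(new_row)
    return redacted


# Made-up sample data standing in for the cells of one worksheet.
sheet = [
    ["Name", "Email", "Balance"],
    ["Jane Doe", "jane@example.com", 1200.5],
    ["John Roe", "contact: john.roe@example.org", 87],
]

for row in redact_sheet(sheet):
    print(row)
```

Building a new grid rather than editing cells in place keeps the original data available for auditing and makes it easy to confirm that the redacted copy has the same shape.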

Proactive DLP redaction now supports Portable Document Format (PDF), Microsoft Office Word (DOC/DOCX), and Microsoft Excel (XLSX/XLS). Detected PII is covered while the structure of the document is preserved. In addition, a wide range of file types, including email, text, and media files, is supported for sensitive data detection. For a detailed list of supported file formats, please access
