
Don’t Trust What You See

“Appearances can be deceptive,” the adage goes. The same applies to digital content, where people with malicious intent can control what we see on screen. While analyzing malware samples, our team came across a case that illustrates the point: a PDF file that carried hidden sensitive information and the potential for a data breach.

An Image, or Two?

We had a file that contained one image of a deserted beach, or so it seemed.

Figure: PDF file with an image


The file looked innocuous. Nothing suspicious about it. But after running it through OPSWAT’s Deep CDR (Content Disarm and Reconstruction) engine, we discovered that there were actually two images:

<</Type/XObject/Subtype/Image/Width 1100/Height 733/ColorSpace/DeviceRGB/BitsPerComponent 8/Filter/DCTDecode/Interpolate true/Length 160490/Alternates[<</Image 14 0 R/DefaultForPrinting true>>]>> 


When we opened the file in a text reader, we found a reference to a second image that is never shown on screen:

Figure: PDF file displayed in text reader


The /Alternates entry specifies the image to use by default for printing:

<</Alternates[<</Image 14 0 R/DefaultForPrinting true>>]>> 


The hidden image is defined by this object:

14 0 obj 
<</Type/XObject/Subtype/Image/Width 1100/Height 733/ColorSpace/DeviceRGB/BitsPerComponent 8/Filter/DCTDecode/Interpolate true/Length 47955>> 
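A check for this pattern can be sketched in a few lines of Python. This is a minimal illustration, not how Deep CDR works internally: it scans the raw bytes of a file for /Alternates arrays and reports each referenced image object along with its /DefaultForPrinting flag. The function name find_alternate_images is hypothetical, and a raw byte scan will miss dictionaries stored inside compressed object streams or written with escaped names, so a real check would need a full PDF parser.

```python
import re

# Captures the contents of an /Alternates array, up to the first closing "]".
ALTERNATES_RE = re.compile(rb"/Alternates\s*\[(.*?)\]", re.DOTALL)
# Each alternate is a small dictionary: << /Image <obj> <gen> R ... >>
ENTRY_RE = re.compile(rb"<<(.*?)>>", re.DOTALL)
IMAGE_REF_RE = re.compile(rb"/Image\s+(\d+)\s+(\d+)\s+R")
PRINT_DEFAULT_RE = re.compile(rb"/DefaultForPrinting\s+true")


def find_alternate_images(pdf_bytes: bytes) -> list[dict]:
    """Return one record per alternate image reference found in the raw bytes."""
    findings = []
    for array in ALTERNATES_RE.finditer(pdf_bytes):
        for entry in ENTRY_RE.finditer(array.group(1)):
            ref = IMAGE_REF_RE.search(entry.group(1))
            if ref is None:
                continue  # not an image alternate entry
            findings.append({
                "object": int(ref.group(1)),
                "generation": int(ref.group(2)),
                "default_for_printing":
                    PRINT_DEFAULT_RE.search(entry.group(1)) is not None,
            })
    return findings


# The image dictionary from the sample file discussed above.
sample = (
    b"<</Type/XObject/Subtype/Image/Width 1100/Height 733"
    b"/ColorSpace/DeviceRGB/BitsPerComponent 8/Filter/DCTDecode"
    b"/Interpolate true/Length 160490"
    b"/Alternates[<</Image 14 0 R/DefaultForPrinting true>>]>>"
)
print(find_alternate_images(sample))
```

For the sample dictionary, the scan reports object 14, generation 0, flagged as the default for printing, which is the hidden image.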


The /Alternates entry lets a PDF author specify a different image to use when the document is printed. If the second image contains sensitive information, someone can pass the file to an external party, who can reveal that information simply by clicking the print button. After printing the file, here is what we saw on paper: three credit card numbers, which could be sensitive Personally Identifiable Information (PII).

Figure: Credit Card Number shown on paper when printed


As this example shows, the tactic can be exploited to expose confidential information, causing data breaches or regulatory compliance violations. Worse, bad actors can use this ploy to make other people pass on sensitive information without realizing it. Sneaky, isn’t it?

How Do We Handle Hidden Sensitive Information?

Sensitive information such as social security numbers, credit card numbers, IPv4 addresses, or Classless Inter-Domain Routing (CIDR) ranges can lead to data breaches and regulatory compliance violations if it is exposed.
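To make these categories concrete, here is a rough Python sketch of pattern-based detection for the data types listed above. It is an assumption-laden illustration, not the Proactive DLP detection logic: the regular expressions are simplified, the names find_sensitive and luhn_valid are hypothetical, and the social security number pattern only checks the US-style NNN-NN-NNNN layout. Credit card candidates are confirmed with the Luhn checksum, and addresses are validated with Python's standard ipaddress module.

```python
import ipaddress
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# Dotted quad with an optional /prefix; validated afterwards.
NET_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?:/\d{1,2})?\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit counted from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (category, matched_text) pairs found in the text."""
    findings = []
    for m in SSN_RE.finditer(text):
        findings.append(("ssn", m.group()))
    for m in CARD_RE.finditer(text):
        if luhn_valid(re.sub(r"\D", "", m.group())):
            findings.append(("credit_card", m.group()))
    for m in NET_RE.finditer(text):
        candidate = m.group()
        try:
            if "/" in candidate:
                ipaddress.ip_network(candidate, strict=False)
                findings.append(("cidr", candidate))
            else:
                ipaddress.ip_address(candidate)
                findings.append(("ipv4", candidate))
        except ValueError:
            continue  # looked like an address but is out of range
    return findings


# 4111 1111 1111 1111 is a widely used test card number, not a real account.
text = ("Card 4111 1111 1111 1111, SSN 123-45-6789, "
        "host 10.0.0.5 in 10.0.0.0/24, bogus 999.1.1.1")
for category, value in find_sensitive(text):
    print(category, value)
```

Validating candidates after the regular expression match (Luhn for cards, ipaddress for networks) is a simple way to cut false positives such as 999.1.1.1, which has the shape of an address but is not one.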

A good practice for preventing data loss and data exposure is to inspect the content of every file being transferred. OPSWAT Proactive DLP (Data Loss Prevention) detects and blocks sensitive and confidential data in files and emails. With Proactive DLP, every file uploaded to or downloaded from web applications, or transferred through web proxies, secure gateways, web application firewalls, and storage systems, can be thoroughly checked before use.

Figure: Proactive DLP in MetaDefender Core to protect sensitive data in file uploads

Protect Sensitive Information and Prevent Data Loss with Proactive DLP

Figure: How Proactive DLP detects, redacts and blocks sensitive data in files and emails


Proactive DLP detects and blocks sensitive data in more than 30 supported file types. The detected sensitive information in PDFs, MS Word documents, and MS Excel spreadsheets will then be automatically redacted.
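As a plain-text analogy for the redaction step, the sketch below masks every letter and digit inside spans that a detector has already flagged, leaving separators and surrounding text intact. The function redact_spans and the simple social security number pattern are hypothetical and are not part of Proactive DLP; redacting a real PDF or Office document also requires removing the underlying content from the file, not just masking what is displayed.

```python
import re


def redact_spans(text: str, spans: list[tuple[int, int]], mask: str = "X") -> str:
    """Replace each letter or digit inside the (start, end) spans with `mask`."""
    chars = list(text)
    for start, end in spans:
        for i in range(start, end):
            if chars[i].isalnum():
                chars[i] = mask
    return "".join(chars)


# Example detector: a simplified US-style social security number pattern.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

text = "Employee SSN: 123-45-6789 (verified)"
spans = [m.span() for m in SSN_RE.finditer(text)]
print(redact_spans(text, spans))  # Employee SSN: XXX-XX-XXXX (verified)
```

Keeping detection and redaction separate means the same masking routine can be reused for any detector that returns character spans.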

Proactive DLP can check for image-based sensitive information by leveraging Optical Character Recognition (OCR) to detect and redact confidential data in image-only PDF files or PDF files with embedded images. The technology also removes metadata containing potentially confidential information such as name, company, subject, GPS location, author, and more. The final redacted file will include watermarks for enhanced security, accountability, and traceability.

Proactive DLP is an OPSWAT technology in Data Loss Prevention and is one of the key solutions in MetaDefender Core, MetaDefender ICAP Server, MetaDefender Email Gateway Security, MetaDefender Kiosk, and MetaDefender Vault. To learn more about Proactive DLP and how OPSWAT can protect your organization, talk to one of our critical infrastructure cybersecurity experts.

*Special thanks to Peter Simon for discovering and handling this case. Simon is one of our talented and dedicated software engineers from the Proactive DLP team.
