Just because a document isn’t digital doesn’t mean it doesn’t contain metadata. Printed documents often have their own hidden details, and now German researchers have developed tools to help you scrub them clean.
We have known for over a decade that most colour laser printers embed unique details to trace each document back to its source. They typically use tiny patterns of yellow dots, invisible to the naked eye, containing information such as their serial number and when the document was printed.
Now, researchers have released software to strip documents of that information. This could help whistleblowers to reveal sensitive information without getting caught, they claim.
Printer manufacturers have included this feature for years. The devices add the invisible dots to the image just before it hits the paper. The information hides in plain sight as a repeating matrix nestled in the document’s white space. It is visible only under a blue LED light and a magnifier, yet it can trace every printout uniquely to your printer. Manufacturers rarely tell customers about these features, but law enforcement uses them to fight counterfeiters.
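To make the "repeating matrix" idea concrete, here is a minimal Python sketch of how a small grid of dots can carry a timestamp and serial number. The layout is entirely hypothetical and is not any real manufacturer's format: each column stores one 7-bit value, with an odd-parity dot on top so a reader can detect a misread column. The names `encode_columns` and `decode_grid` are illustrative only.

```python
# Toy model of a tracking-dot matrix. HYPOTHETICAL layout, not a real
# manufacturer's format: one 7-bit value per column, plus an odd-parity
# bit in row 0. A 1 means "print a yellow dot here", a 0 means "leave blank".

DATA_BITS = 7  # payload bits per column (an assumption for this sketch)


def encode_columns(values: list[int]) -> list[list[int]]:
    """Build the dot grid (a list of rows) for a list of small integers."""
    grid = [[0] * len(values) for _ in range(DATA_BITS + 1)]
    for col, value in enumerate(values):
        if not 0 <= value < 2 ** DATA_BITS:
            raise ValueError(f"value {value} does not fit in {DATA_BITS} bits")
        # Most significant bit first, stored in rows 1..DATA_BITS.
        bits = [(value >> (DATA_BITS - 1 - i)) & 1 for i in range(DATA_BITS)]
        # Odd parity: each column as a whole always holds an odd number of dots.
        grid[0][col] = 1 - sum(bits) % 2
        for row, bit in enumerate(bits, start=1):
            grid[row][col] = bit
    return grid


def decode_grid(grid: list[list[int]]) -> list[int]:
    """Recover the values, rejecting any column that fails its parity check."""
    values = []
    for col in range(len(grid[0])):
        column = [grid[row][col] for row in range(DATA_BITS + 1)]
        if sum(column) % 2 != 1:
            raise ValueError(f"parity error in column {col}")
        value = 0
        for bit in column[1:]:
            value = (value << 1) | bit
        values.append(value)
    return values


if __name__ == "__main__":
    # Hypothetical payload: minute, hour, day, month, two-digit year,
    # then the serial number 123456 split into two-digit chunks.
    payload = [42, 13, 5, 6, 18, 12, 34, 56]
    grid = encode_columns(payload)
    for row in grid:
        print("".join("o" if dot else "." for dot in row))
    print(decode_grid(grid))  # prints the original payload
```

In this toy model the parity dot is what lets a decoder reject a column it has misread; a printer repeating the grid across the page gives a reader several copies to choose from.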
Timo Richter and Stephan Escher, researchers at TU Dresden’s Chair of Privacy and Data Security, cited NSA whistleblower Reality Leigh Winner as an example of what happens when governments and companies use these tracking dots to invade people’s privacy.
Winner, who worked for Pluribus International Corporation, was stationed at the NSA where she printed a top-secret document detailing a cyber attack by Russian military intelligence on US election infrastructure.
She had printed the document on NSA printers, and the investigative journalism site the Intercept later published a scanned copy online. Winner’s arrest affidavit shows that she was identified following an ‘internal audit’.
Errata Security showed at the time that the published document contained a dot pattern revealing when it was printed and on which device, which may have been one of many clues leading to her arrest. Winner is set to serve at least five years in prison after reaching a plea deal last week.
Reading between the lines
The TU Dresden researchers wanted to give people the chance to manipulate these dots for themselves. They analysed 1,286 prints from 141 printers spanning 18 manufacturers to document the patterns those printers used, and found four distinct pattern formats in use across the manufacturers.
Along with colleagues Dagmar Schönfeld and Thorsten Strufe, the duo created a tool, called Dot Extraction, Decoding and Anonymisation (DEDA). They also wrote a paper detailing its inner workings.
The tool offers a range of functions in two broad groups: analysis and anonymization.
On the analysis side, DEDA ‘reads’ the dots in a scanned document to find out what pattern it uses and to extract any information it can. If the tool cannot read any information from the dot pattern, it can extract the dots for further analysis. Users who want to analyse several files forensically at once can also use the tool to spot any that were printed on a different printer.
On the anonymization side, DEDA can anonymize a scanned image by wiping all the dots from its white space. It can also anonymize a document for printing by adding more dots to the existing pattern, confusing anyone who tries to read the information. This is a more time-consuming process, involving the production of a mask which must then be aligned with the scanned document before printing the anonymized version.
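The first step of that analysis, finding the dots at all, can be sketched in a few lines. Yellow ink absorbs blue light, which is why the dots show up under a blue LED; in an RGB scan, the same property makes a yellow dot a pixel with high red and green values but a noticeably lower blue value. The thresholds (`min_rg`, `min_gap`, `min_pitch`) and the helper names below are assumptions for illustration, not DEDA's actual algorithm, which the team describes in its paper.

```python
from __future__ import annotations

from collections import Counter

Pixel = tuple[int, int, int]  # (R, G, B), each 0-255


def is_yellow_dot(pixel: Pixel, min_rg: int = 180, min_gap: int = 40) -> bool:
    """Heuristic: bright in red and green, clearly darker in blue."""
    r, g, b = pixel
    return r >= min_rg and g >= min_rg and min(r, g) - b >= min_gap


def find_dots(image: list[list[Pixel]]) -> list[tuple[int, int]]:
    """Return (x, y) coordinates of candidate dot pixels, row by row."""
    return [
        (x, y)
        for y, row in enumerate(image)
        for x, pixel in enumerate(row)
        if is_yellow_dot(pixel)
    ]


def estimate_pitch(dots: list[tuple[int, int]], min_pitch: int = 3) -> int | None:
    """Guess the grid spacing as the most common gap between dots in a row.

    Gaps smaller than min_pitch are treated as pixels of the same dot
    (a real dot usually spans several pixels) and are ignored.
    """
    rows: dict[int, list[int]] = {}
    for x, y in dots:
        rows.setdefault(y, []).append(x)
    gaps: Counter[int] = Counter()
    for xs in rows.values():
        xs.sort()
        gaps.update(b - a for a, b in zip(xs, xs[1:]) if b - a >= min_pitch)
    return gaps.most_common(1)[0][0] if gaps else None
```

Once the pitch is known, a decoder can sample the scan at regular intervals to rebuild the dot grid; in practice, scanner blur and paper texture make the thresholds far more delicate than this sketch suggests.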
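Both anonymization modes can be sketched in the same style. Wiping a scan means repainting every pixel that looks like a yellow dot as white; the classifier thresholds are assumptions, and the sketch ignores the blurring and colour shifts a real scanner introduces. For the print mode, one simple masking strategy, assumed here rather than taken from DEDA, is to put a dot at every cell of the grid, so every column reads the same and the original payload can no longer be picked out. The hard part the article mentions, aligning the mask with the existing pattern, is reduced to hypothetical `offset_x` and `offset_y` parameters.

```python
Pixel = tuple[int, int, int]  # (R, G, B), each 0-255
WHITE: Pixel = (255, 255, 255)


def is_yellow_dot(pixel: Pixel, min_rg: int = 180, min_gap: int = 40) -> bool:
    """Heuristic yellow-dot test: bright red and green, clearly lower blue."""
    r, g, b = pixel
    return r >= min_rg and g >= min_rg and min(r, g) - b >= min_gap


def wipe_dots(image: list[list[Pixel]]) -> list[list[Pixel]]:
    """Return a copy of the scan with every suspected dot pixel painted white."""
    return [[WHITE if is_yellow_dot(p) else p for p in row] for row in image]


def decoy_mask(width: int, height: int, pitch: int,
               offset_x: int = 0, offset_y: int = 0,
               dot_size: int = 1) -> list[list[bool]]:
    """Mask with a dot at every grid cell; True means 'print yellow here'.

    offset_x/offset_y stand in for aligning the mask with the dots
    already on the page (hypothetical parameters for this sketch).
    """
    mask = [[False] * width for _ in range(height)]
    for gy in range(offset_y, height, pitch):
        for gx in range(offset_x, width, pitch):
            # Fill a dot_size x dot_size square, clipped to the page edge.
            for y in range(gy, min(gy + dot_size, height)):
                for x in range(gx, min(gx + dot_size, width)):
                    mask[y][x] = True
    return mask
```

If the offsets are wrong, the decoy dots land between the real ones and a careful reader could still separate the two grids, which is why the alignment step takes the time it does.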
TU Dresden’s isn’t the only project to target these yellow dots. A year ago, CryptoAUSTRALIA researcher Gabor Szathmari submitted a pull request to an open source sanitising tool called PDF Redact Tools, produced by the Intercept’s owner, First Look Media. The changes, which were merged into the project, take a lower-tech approach by converting images to black and white, effectively removing the tracking dots.
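Why converting to black and white removes the dots comes down to arithmetic. A common way to compute a pixel's brightness is the ITU-R BT.601 luma weighting, Y = 0.299R + 0.587G + 0.114B, which gives pure yellow (255, 255, 0) a value of about 226 out of 255, much closer to white than to black. Thresholding at the midpoint therefore turns the dots white while dark text stays black. This is a generic illustration of the idea, not the code in PDF Redact Tools; the threshold of 128 is an assumption.

```python
Pixel = tuple[int, int, int]  # (R, G, B), each 0-255


def luma(pixel: Pixel) -> float:
    """Perceived brightness using ITU-R BT.601 weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def to_black_and_white(image: list[list[Pixel]], threshold: int = 128) -> list[list[int]]:
    """Map each pixel to 255 (white) or 0 (black) by comparing its luma with the threshold."""
    return [[255 if luma(p) >= threshold else 0 for p in row] for row in image]


if __name__ == "__main__":
    yellow_dot, text, paper = (255, 255, 0), (20, 20, 20), (255, 255, 255)
    print(round(luma(yellow_dot)))  # 226
    print(to_black_and_white([[yellow_dot, text, paper]]))  # [[255, 0, 255]]
```

The trade-off is that any genuinely coloured content, such as a pale highlight or a light logo, is flattened to pure white or black along with the dots.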
Does all this mean that you can safely use these tools to scrub your whistleblowing documents of any identifying data? Perhaps not.
The EFF, in its no-longer-updated list of yellow dot-producing printers, cites documents that it received from the government in FOIA requests. These suggest that all major manufacturers may have entered into an agreement to embed some kind of forensic tracking technology, it says, adding:
It appears likely that all recent commercial laser printers print some kind of forensic tracking codes, not necessarily using yellow dots. This is true whether or not those codes are visible to the eye and whether or not the printer models are listed here. This also includes the printers that are listed here as not producing yellow dots.
There are also other tracking mechanisms (which the TU Dresden team describes as ‘passive’ in their paper). These include analyzing halftone patterns in printed images and looking for slight geometrical differences in printed characters. Forensic analysts were using the latter technique to trace typewritten documents long before printers came along.
So if you’re planning to blow the lid off a scandal by scanning and reprinting the telltale documents, be careful: there may, quite literally, be more than meets the eye.
11 comments on “Tool scrubs hidden tracking data from printed documents”
I enjoyed this article, very insightful.
It would appear from this article that Naked Security condones the practice of releasing stolen documents.
I’m not saying that it’s never appropriate, but surely it’s a more complex, delicate and nuanced discussion than a simple “always right” or “always wrong”.
This is not the only tool that does this, github.com/firstlookmedia/pdf-redact-tools is another example. However, the fact that they chose to develop this tool specifically to combat the United States Government, rather than a more widely needed tool like a quality freeware metadata scrubber, shows where their interest lies: not in security, but in subverting the rule of law and order. Reality Winner should be getting 15 years not 5.
Print a local copy in black and white. Fax it to yourself using Group 3 “standard” (not fine) resolution of roughly 200 x 100 dpi. Destroy the original. Mail the fax. At that resolution, there couldn’t be any useful metadata.
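The commenter's reasoning can be illustrated with a toy model of low-resolution, bilevel transmission: average each block of high-resolution pixels down to one pixel, then force the result to pure black or white. Even a fully black dot occupying one pixel of a 6 × 6 block barely changes that block's average, so it disappears, while a text stroke filling the block survives. The reduction factor of 6 (for example, 600 dpi down to 100 dpi) and the threshold are illustrative assumptions; real fax machines scan and encode pages with their own hardware and are not guaranteed to behave like this box filter.

```python
def downsample(gray: list[list[float]], factor: int) -> list[list[float]]:
    """Box filter: average each factor x factor block into one pixel.

    Assumes the image dimensions are exact multiples of `factor`.
    """
    height, width = len(gray), len(gray[0])
    out = []
    for by in range(0, height, factor):
        row = []
        for bx in range(0, width, factor):
            block = [gray[y][x]
                     for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def binarize(gray: list[list[float]], threshold: float = 128) -> list[list[int]]:
    """Fax-style bilevel output: each pixel becomes 0 (black) or 255 (white)."""
    return [[0 if v < threshold else 255 for v in row] for row in gray]


if __name__ == "__main__":
    # 12 x 12 "high-resolution" page, white everywhere (255).
    page = [[255] * 12 for _ in range(12)]
    page[2][2] = 0              # one isolated dark dot (the tracking mark)
    for y in range(6, 12):      # a solid 6 x 6 patch standing in for a text stroke
        for x in range(0, 6):
            page[y][x] = 0
    print(binarize(downsample(page, 6)))  # [[255, 255], [0, 255]]
```

The same averaging that erases the dot also softens fine print and thin lines, which is the authenticity trade-off raised in the reply below.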
Or simply print on an old dot-matrix printer.
I thought through those possibilities too! I guess the problem then becomes provenance. In a lot of these cases, a whistleblower would need to prove that the document was legitimate, and the more degraded it is, the harder that gets. So there seems to be a trade-off between traceability and authenticity, and I am not sure where the optimal point is. I suspect it depends on the individual use case, and we might need a legal forensics expert to answer that one.
This isn’t a case of a whistle-blower. No one who leaks classified information to the press is a whistle-blower. How do you write an article like this and not know that? How do you not discuss or bring up the damages or risk that result from leaking classified government information to the press? Reality Winner should be getting 15 years and you should be getting a 15 day timeout from pretending to be a journalist, when you’re really just another opinion piece writer.
I wish I knew how formatting worked in this forum. Anyway, here’s the definition of whistleblower from Google:
a person who informs on a person or organization engaged in an illicit activity.
Releasing classified documents, as long as they are evidence of illicit activity, would count as whistle blowing.
Interesting article, does the statement from EFF suggest that black and white laser printers do something similar with black dots or other marks?
This won’t help on the newer printers that use halftones instead of yellow dots, but just print with a yellow background. If necessary, you can then scan it and print it to PDF with a white background.
Print document, exfiltrate document, retype relevant text, release that.
Less obviously authentic, a lot safer.
If you need the authenticity, photocopy the letterhead, footer, and anything else relevant, greyscale scan the copy, and paste it to your cleansed text. You may need to cycle the text part separately to get it to look right. Once done, you have an apparently original document with the original data and any tracking swept out. Assuming you caught any text metadata for it.
The source of the data can’t prove you’ve manipulated it without releasing the original, which is a win for you anyway.
Whistleblowing securely isn’t easy, and nor should it be. It’s only okay when it’s really important, and if it matters, it’s worth spending all that effort on it.
(unrelated, to the people who say this is only designed to defeat the US government, where do you think big corporations or criminals print things?)
So then this brings up another type of hack, yet to be seen maybe. Duplicating the tracking data from a printer to make fakes. Was it really from the accountant? or did someone set him up by making incriminating fake documents? Best call Perry Mason.