Robert A. Alexander

Github

This project on GitHub lets you generate two visually distinct PDFs, containing different text, that both hash to the same SHA-1 sum.

Edit

Some similar projects that demonstrate improved versions of this attack have been released. Be sure to check these out.

Background

SHA-1 is broken in practice. A joint effort by the Cryptology Group at Centrum Wiskunde & Informatica (CWI) and the Google Research Security, Privacy and Anti-abuse Group has generated a pair of PDF documents that are visually different but hash to the same SHA-1 sum. The PDFs were published on the shattered.it website. The PDF documents were carefully crafted such that an arbitrary number of additional colliding PDF documents may be created without needing additional collisions.

This means that additional pairs of PDFs that hash to the same SHA-1 sum may be generated for free, without computing another SHA-1 collision.

While the full details of the attack and how the PDF prefix was generated have not yet been made fully public, analysis of the shattered.it PDF documents shows some simple approaches.

The Attack

The pair of PDFs from shattered.it differ only in a small region near the start of the file. This region holds the computed colliding blocks.

[Image: diff of the two shattered.it PDFs]

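Because SHA-1 processes data one block at a time, carrying only its internal state from one block to the next, the data after this region may be altered. Once both files have passed through the colliding blocks their internal states are identical, so if both files are extended with the same data they will still hash to the same SHA-1 sum. This is related to the length extension attack.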
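The property can be demonstrated on a toy hash. The sketch below is not SHA-1 and not the shattered.it attack: it builds a deliberately weak block-at-a-time hash with a 16-bit state (all names and sizes here are made up for illustration), brute-forces two different blocks that leave the same state, and then checks that appending the same suffix to both keeps the hashes equal.

```python
import hashlib

BLOCK = 8   # toy block size in bytes (SHA-1 uses 64)
STATE = 2   # toy state size in bytes (SHA-1 uses 20); tiny so collisions are cheap
IV = b"\x00" * STATE


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: mix one block into the running state."""
    return hashlib.sha1(state + block).digest()[:STATE]


def toy_hash(msg: bytes) -> bytes:
    """Iterated hash with length padding, structured like SHA-1 but far weaker."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-(len(padded) + 4) % BLOCK)
    padded += len(msg).to_bytes(4, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state


def find_colliding_blocks() -> tuple[bytes, bytes]:
    """Brute-force two different single blocks that leave the same state."""
    seen = {}
    # One more candidate than there are states, so a repeat is guaranteed.
    for counter in range(2 ** (8 * STATE) + 1):
        block = counter.to_bytes(BLOCK, "big")
        state = compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block
    raise RuntimeError("unreachable: pigeonhole guarantees a collision")


block_a, block_b = find_colliding_blocks()
suffix = b"any data appended to both files"
print(block_a != block_b)
print(toy_hash(block_a + suffix) == toy_hash(block_b + suffix))
```

Both lines print True for any choice of suffix, because the two messages have the same length and reach the same internal state before the suffix is processed.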

The region from the start of the file to the end of the colliding blocks is the “PDF prefix”. If the way the data we append to the “PDF prefix” is rendered depends on the data in the colliding blocks, then we can generate additional visually unique PDFs that hash to the same SHA-1 sum.
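As a rough sketch of this step, the helpers below locate the end of the colliding blocks by comparing two colliding files and then append a common suffix to each prefix. The function names are hypothetical, and this is not necessarily how the project's script works. It assumes the two inputs already collide and differ only inside the colliding blocks. The result is only a usable PDF if the suffix is valid PDF content that continues correctly from the prefix.

```python
SHA1_BLOCK = 64  # SHA-1 processes input in 64-byte blocks


def colliding_prefix_len(file_a: bytes, file_b: bytes) -> int:
    """Return the offset of the end of the last 64-byte block that differs."""
    diffs = [i for i, (x, y) in enumerate(zip(file_a, file_b)) if x != y]
    if not diffs:
        raise ValueError("files are identical over their common length")
    # Round up to the block boundary just past the last differing byte.
    return (diffs[-1] // SHA1_BLOCK + 1) * SHA1_BLOCK


def extend_pair(file_a: bytes, file_b: bytes, suffix: bytes) -> tuple[bytes, bytes]:
    """Keep each file's prefix and append the same suffix to both."""
    n = colliding_prefix_len(file_a, file_b)
    return file_a[:n] + suffix, file_b[:n] + suffix
```

Given the bytes of the two shattered.it PDFs as `file_a` and `file_b`, `extend_pair(file_a, file_b, suffix)` would return two byte strings that differ only inside the colliding blocks and end with the same custom data.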

This attack uses the existing “PDF prefix” from the shattered.it announcement and appends some custom data. The original shattered.it PDFs have either a red or a blue background, as defined in an embedded encoded image. The rendering code, which exists outside the “PDF prefix”, can be altered to zoom in on a blank section of the background, thus hiding the text and logos of the original PDFs. If we add red text and blue text on top of the background, then text whose color matches the background will not be visible. So the PDF with the red background will appear to have only blue text, and vice versa.
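The two-color trick can be sketched as a PDF content stream that draws one message in each color. This is an illustration only: the helper names, the font resource name /F1, the coordinates, and the pure red and pure blue values are all assumptions, and the text colors would have to match the actual background colors exactly for the text to disappear.

```python
RED = (1, 0, 0)    # assumed background color of one PDF
BLUE = (0, 0, 1)   # assumed background color of the other PDF


def text_op(text: str, rgb: tuple, x: int, y: int, size: int = 24) -> str:
    """Build PDF operators that draw `text` at (x, y) in the given fill color."""
    # Backslashes and parentheses must be escaped inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    r, g, b = rgb
    return (
        f"{r:g} {g:g} {b:g} rg "                          # set fill color
        f"BT /F1 {size} Tf {x} {y} Td ({escaped}) Tj ET"  # draw the text
    )


def two_color_content(red_text: str, blue_text: str) -> str:
    """Content stream holding both messages; only one is visible per background."""
    return "\n".join([
        text_op(red_text, RED, 72, 700),    # invisible on the red background
        text_op(blue_text, BLUE, 72, 660),  # invisible on the blue background
    ])


print(two_color_content("Shown on the blue PDF", "Shown on the red PDF"))
```

Because the same content stream is appended to both prefixes, the two files stay identical outside the colliding blocks, and only the background chosen by those blocks decides which message the reader sees.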

The result, as shown in the images below, isn’t pretty. However, unless the content of the PDF file is examined or the text is highlighted, it is unlikely that the reader will notice the hidden text. If the PDFs are printed, there should be no sign of the hidden text on the printed document.

The project listed above demonstrates a simple script for generating additional colliding PDF document pairs using this approach.

When the full details of the shattered.it PDFs are released, it should be possible to create custom unique PDFs without resorting to the “text matching background” trick. These should be much prettier and customizable. It’s not immediately clear how the images in the original PDFs were generated.

SHA-1 collisions used

This attack doesn’t need to spend the estimated $110K to generate a new SHA-1 identical-prefix collision. The existing collision reported in the shattered.it announcement is all that was needed.

Mitigations and Protections

Please review the shattered.it website to learn how to stay safe from this attack and its variants.

Sample

The following two PDFs were created using the project. Both PDFs hash to the same SHA-1 sum (c51c57c8fddc8f9a43a9bc8e55cec79e37cf6591) but when viewed show different text.
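A claim like this can be checked with a few lines of Python. The sketch below is a minimal check, with hypothetical function names and file paths: it confirms that two files have different contents but the same SHA-1 sum, and also reports SHA-256 sums for comparison.

```python
import hashlib
from pathlib import Path


def digests(path) -> tuple[str, str]:
    """Return the (SHA-1, SHA-256) hex digests of a file."""
    data = Path(path).read_bytes()
    return hashlib.sha1(data).hexdigest(), hashlib.sha256(data).hexdigest()


def is_sha1_collision(path_a, path_b) -> bool:
    """True if the files differ in content but share a SHA-1 sum."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    return a != b and hashlib.sha1(a).digest() == hashlib.sha1(b).digest()


# Example usage (file names are placeholders):
# print(is_sha1_collision("first.pdf", "second.pdf"))
# print(digests("first.pdf"))
# print(digests("second.pdf"))
```

For a colliding pair, the SHA-1 digests match while the SHA-256 digests should still differ, since the files do not contain the same bytes.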

First PDF

[Image: rendering of the first PDF]

Second PDF

[Image: rendering of the second PDF]

Showing hidden text

Highlighting either of the documents shows that some text wasn’t visible.

[Image: a document with its text highlighted, revealing the hidden text]

References

Many thanks and congratulations to the shattered.it team for their success.