In this post I’m going to discuss a major problem that exists with several WordPress malware scanners: The use of weak hashing algorithms for good and bad file identification. Some malware and antivirus scanners outside of WordPress suffer from this same issue.
For brevity, I’m going to refer to this as the “weak hash scanner” issue.
This issue may allow an attacker to hide malware from scanners that rely on the MD5 hashing algorithm. Below I will explain how hashes are used in the security industry, what the problem is and how to solve it. I’ll also point you to research demonstrating this issue and to further reading, and describe how Wordfence uses a secure hashing algorithm for our malware scanner.
How we use hashes in the security industry to find bad things
In the security world we commonly run a file through a piece of logic, called an algorithm, that generates a fixed-length number. With a secure algorithm, that number is for all practical purposes unique to the file, so it can be used to identify the file. The logic is called a hashing algorithm and the number it produces is called a ‘hash’.
We use hashing algorithms for all kinds of really cool and useful stuff. We can take a piece of malware, create a hash for it and then store that hash. Later, we can create a hash of a file we’re scanning to check if it contains malware. If that hash matches the hash of the malware we created earlier, then we know the file is that malware.
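As a rough illustration of the lookup described above, here is a minimal Python sketch. The function names, the `known_bad_hashes` set and the choice of SHA-256 are assumptions made for this example. It is not Wordfence's actual scanner code.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known_malware(path, known_bad_hashes):
    """True if the file's hash matches a stored malware hash.

    known_bad_hashes is a hypothetical set of hex digests collected
    earlier from malware samples.
    """
    return sha256_of_file(path) in known_bad_hashes
```

Storing the hashes in a set keeps each lookup fast, even when millions of hashes are being tracked.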
We can also use hashes to identify “known good” files. At Wordfence we have created hashes of every file we know is safe in the WordPress universe: every file in every version of WordPress core ever released, and every version of every theme and plugin ever released.
Right now Wordfence tracks hashes for:
205,146 WordPress core files that Wordfence knows are safe.
5,967,361 WordPress theme files that Wordfence knows are safe.
23,527,261 (yes, that’s 23 million) WordPress plugin files that Wordfence knows are safe. This is every version of every file in every plugin ever released.
Hashes are a way for security companies like us to store a small piece of data that uniquely identifies known bad or good files, and then use that data to check if those files exist on a system we’re scanning. Then we can make a decision about whether to preserve the file or get rid of it.
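The decision described above can be sketched in a few lines of Python. The `classify_file` function, the two hash sets and the three labels are hypothetical names chosen for this example, and SHA-256 is assumed as the hashing algorithm.

```python
import hashlib


def classify_file(contents: bytes, known_good: set, known_bad: set) -> str:
    """Label file contents by looking up their SHA-256 hash."""
    digest = hashlib.sha256(contents).hexdigest()
    if digest in known_bad:
        return "malware"      # matches a stored malware hash
    if digest in known_good:
        return "known good"   # matches a file from an official release
    return "unknown"          # not seen before, needs a closer look


# Example hash lists built from made-up file contents.
good_hashes = {hashlib.sha256(b"<?php // official core file").hexdigest()}
bad_hashes = {hashlib.sha256(b"<?php // backdoor").hexdigest()}

print(classify_file(b"<?php // official core file", good_hashes, bad_hashes))
print(classify_file(b"<?php // backdoor", good_hashes, bad_hashes))
print(classify_file(b"<?php // custom code", good_hashes, bad_hashes))
```

Checking the known-bad list first is a design choice in this sketch: if a hash ever appeared in both lists, the file would be treated as malware.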
The diagram below illustrates how malware scanners use hashing to identify good and bad files.
Not all hashing algorithms are equal: MD5 vs SHA-2
There are various ways to create a hash. When you run a file through one of these hashing ‘algorithms’, it creates a number of a fixed length that identifies the file. MD5 is a hashing algorithm that was created in 1991 by Professor Ron Rivest at MIT. It was incredibly useful but is now quite old and has some problems.
SHA-2, a newer and much more secure family of hashing algorithms, was developed by the National Security Agency and released by the National Institute of Standards and Technology in 2001. Today SHA-2 is widely used and considered secure enough for commercial use.
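To make the “fixed length” point concrete, the short Python sketch below hashes inputs of very different sizes with both algorithms. SHA-256, a member of the SHA-2 family, is used here as the SHA-2 example, and `digest_bits` is a helper name invented for this illustration.

```python
import hashlib


def digest_bits(algorithm: str, data: bytes) -> int:
    """Return the size in bits of the hash the algorithm produces for data."""
    return hashlib.new(algorithm, data).digest_size * 8


# The output size depends only on the algorithm, never on the input size.
for data in (b"", b"a", b"a" * 1_000_000):
    print(len(data), digest_bits("md5", data), digest_bits("sha256", data))
```

Every line printed shows 128 bits for MD5 and 256 bits for SHA-256, whether the input is empty or a million bytes long.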
The problem with MD5 is something called ‘collisions’. The issue is easy to understand: with MD5, it’s possible to create two different files that have the same MD5 hash, or signature. This could be used, for example, to fool a malware scanner into thinking a malware file is actually a known-good file.
That is why we use SHA-2 in Wordfence to track known good files. It prevents an attacker from creating a bad file that has the same hash as a known good file and avoiding detection.
The weak hash scanner problem
Unfortunately not all security products do this. In the WordPress space, some malware scanners use plain old MD5 to hash files when searching for malware. Sucuri’s WordPress plugin and “Shield WordPress Security”, for example, use MD5 to detect core file changes. They do this by fetching the newest MD5 hashes from api.wordpress.org.
The API these products use was not designed to be used for malware scanning. It was originally created for the WordPress upgrade process back in 2013 to help determine which files need to be upgraded. The MD5 algorithm used by this API is not cryptographically strong enough to be used to detect malicious or safe files.
At Wordfence we use SHA-2 and this is one of the reasons we have created our own API endpoint that we use for malware scanning. Doing this allows us to use a cryptographically strong hash function to ensure that malware can’t evade detection by exploiting weak hash algorithms. We have been using SHA-2 since 2012, when the very first version of Wordfence was released as version 1.1.
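To give a feel for what a core file check built on a strong hash might look like, here is a minimal Python sketch. The manifest format (a dictionary mapping relative paths to SHA-256 hex digests) and the function names are assumptions made for this example. It does not reflect the real Wordfence API or its response format.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    # Reads the whole file into memory, which is fine for a short sketch.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(root) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_changes(root, manifest: dict) -> dict:
    """Compare the files under root against a trusted manifest."""
    current = build_manifest(root)
    return {
        # Present in both, but the contents no longer match.
        "modified": sorted(p for p in manifest
                           if p in current and current[p] != manifest[p]),
        # Listed in the manifest but absent on disk.
        "missing": sorted(p for p in manifest if p not in current),
        # On disk but not part of the trusted release.
        "unexpected": sorted(p for p in current if p not in manifest),
    }
```

In this sketch the trusted manifest would come from a clean copy of the release, and `find_changes` would then be run against the live site directory.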
Last week a security researcher demonstrated how it’s possible to create two Windows executables that both have the same MD5 hash. This allows an attacker to create one friendly executable and another malicious file that will later replace the friendly file and avoid detection.
In 2014 Nat McHugh showed how to create two different PHP files and two different image files with the same MD5 hash. This demonstrates the same concept in PHP – that an attacker can create a friendly file which becomes trusted and later replace it with a malicious file that avoids detection by MD5 scanners.
This research has been around for some time now. The weakness first came to light in a 2005 paper by Xiaoyun Wang and Hongbo Yu at Shandong University in China, which described a modular differential attack that produces MD5 collisions. The more powerful variant used to build pairs of meaningful files is known in the security industry as a ‘chosen prefix’ attack on MD5.
In 2007, Marc Stevens created an open source toolkit as part of his master’s thesis that exploited this weakness in MD5. These are the tools the researchers above used to create different files with identical MD5 hashes.
This research demonstrates that it’s already possible for an attacker to exploit MD5 to provide a safe file and later replace it with a malicious file that will avoid detection by scanners using MD5. It may soon be possible to create a malicious file that shares the same MD5 hash as a legitimate WordPress core file. For this reason it is important that malware scanners avoid MD5 and use strong cryptographic hash functions to verify file integrity.
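The effect on a scanner can be shown with a toy example in Python. Important caveats: `toy_hash` below is a deliberately weak, made-up hash (only the first 12 bits of SHA-256), not MD5, and the brute-force search it allows does not work against MD5. Real MD5 attacks rely on specially crafted pairs of files, as in the research above. The sketch only shows what happens once two different files share a hash: a lookup based on the weak hash cannot tell them apart, while a strong hash still can.

```python
import hashlib


def toy_hash(data: bytes) -> str:
    # Deliberately weak stand-in: keep only 3 hex characters (12 bits).
    return hashlib.sha256(data).hexdigest()[:3]


def find_lookalike(trusted: bytes, payload: bytes, limit: int = 200_000):
    """Append a counter comment to payload until its toy hash matches trusted."""
    wanted = toy_hash(trusted)
    for counter in range(limit):
        candidate = payload + b"\n// " + str(counter).encode()
        if toy_hash(candidate) == wanted:
            return candidate
    return None  # no match found within the limit


trusted_file = b"<?php // legitimate core file"
malicious_file = find_lookalike(trusted_file, b"<?php // malicious payload")

if malicious_file is not None:
    weak_known_good = {toy_hash(trusted_file)}
    strong_known_good = {hashlib.sha256(trusted_file).hexdigest()}
    # The weak lookup accepts the malicious file as known good.
    print(toy_hash(malicious_file) in weak_known_good)
    # The full SHA-256 lookup does not.
    print(hashlib.sha256(malicious_file).hexdigest() in strong_known_good)
```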
What to do about this
The goal of today’s blog post is to encourage two things:
- If you are a customer of a security product, make sure your product is using SHA-2 or another secure hashing algorithm for malware scanning and other checks. If a product uses MD5, it risks being fooled into thinking a file is safe when it is dangerous.
- If you are a security vendor and have not already switched to SHA-2 or another secure hashing algorithm, it’s time to do so now in the interests of your customers’ security.
As always I’m happy to respond to questions and comments below.
Mark Maunder – Wordfence Founder/CEO.
You can learn more about hashing, how it is used for passwords and how to crack passwords using consumer GPU hardware by visiting our Password Authentication and Password Cracking article in the Wordfence Learning Center.