Gravatar Advisory: How to Protect Your Email Address and Identity
Update: We’ve added comments at the end of the post pointing out that the National Institute of Standards and Technology (NIST) considers an email address to be personally identifiable information or PII.
Gravatar is a service that provides users with a profile image that can appear on many sites across the Net. It is integrated with WordPress.com (the version of WordPress hosted by Automattic) and with WordPress.org, the self-hosted version of WordPress. Gravatar is also used by many other popular services on the web, such as StackOverflow.com.
If you sign up for a website on WordPress.com and publish a blog post, a Gravatar icon appears on your site as your profile photo, indicated by the red arrow below. You can visit gravatar.com to customize that icon and upload a photo of your own.
If you use WordPress.org, Gravatars are an option you can enable for your users, and they are widely used. Your site will show either a user's profile photo, if they have gone to Gravatar.com to create one, or a default image. You can select from several kinds of default images.
Other services like StackOverflow, one of the most popular sites on the web, also use Gravatar for profile images.
In the HTML source code of your website, Gravatar loads images using a hash of your email address. If you read our post earlier this week where we discuss the problem of malware scanners using weak hashing algorithms, you will have a basic understanding of how a hashing algorithm works. In short, a hash algorithm turns some value into a long number and in theory it is difficult to turn that number back into the original value.
Even if you haven’t signed up for a custom profile image at Gravatar.com, a hash of your email address still appears in the source code of any website that integrates this service.
You can see in the screenshot below how Gravatar loads your profile image using a hash of your email address:
The value that appears after /avatar/ above is: fe967ccdc7b3caa33e0480bb95ae6588
That is a number (in hexadecimal) that is a hash of the email address that I used to create a WordPress.com website. The email I used is email@example.com.
I can run a PHP instruction to verify that. If I run the following PHP code, it produces the above hash:
<?php echo md5('firstname.lastname@example.org');
This prints the value: fe967ccdc7b3caa33e0480bb95ae6588
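The same check can be sketched in Python. This is a minimal illustration rather than Gravatar's own code: it assumes the normalization Gravatar documents for its MD5 scheme (trim surrounding whitespace, lowercase, then hash), and the function names and the address user@example.com are placeholders chosen for this example.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """Return the MD5 hex digest of a trimmed, lowercased email address."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def gravatar_url(email: str) -> str:
    """Build the kind of image URL a page embeds for this address."""
    return "https://www.gravatar.com/avatar/" + gravatar_hash(email)


# Placeholder address; differences in case and padding do not change the hash.
print(gravatar_url("  User@Example.com "))
```

Because the hash depends only on the normalized address, every site that embeds a Gravatar for the same email publishes the same 32-character value.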
Using Gravatar and GPU cracking to steal email addresses
If I want to steal a lot of email addresses, I need to turn those hashes back into email addresses somehow. If I can figure out a way to do that, I can crawl wordpress.com, all the self-hosted wordpress.org websites and a lot of other services like StackOverflow and harvest a huge number of email addresses for spamming. I may also be able to reveal the email addresses of people who want to remain anonymous.
It turns out that someone already thought of this. In 2009 a researcher showed that he could reverse engineer about 10% of Gravatar hashes into email addresses.
Then in 2013 Dominique Bongard presented a talk at PasswordsCon in Las Vegas where he demonstrated that he could reverse engineer 45% of Gravatar hashes into email addresses. He targeted a well known political forum in France which uses Gravatar for user profile pictures.
The big difference in Dominique’s approach is that he used Hashcat, which is a password cracking tool. He repurposed it so that he could reverse engineer Gravatar hashes into email addresses. The reason this is important is that Hashcat executes significantly faster because it uses consumer graphics processing units, or GPUs, which are used by gamers to accelerate game graphics performance. Cracking hashes with GPU acceleration increases performance by a factor of several thousand.
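To make the idea concrete, here is a rough CPU-only sketch of the guessing loop that a tool like Hashcat runs at vastly higher speed on GPUs. It is not Hashcat and not the method used in the talk: the function names, the username list, and the domain list are all hypothetical, and a real attack would draw candidates from far larger sources.

```python
import hashlib
from itertools import product


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def crack_email_hashes(target_hashes, usernames, domains):
    """Map each recovered hash to the candidate address that produced it."""
    targets = set(target_hashes)
    recovered = {}
    # Try every username@domain combination and compare digests.
    for user, domain in product(usernames, domains):
        candidate = f"{user}@{domain}"
        digest = md5_hex(candidate)
        if digest in targets:
            recovered[digest] = candidate
    return recovered


# Hypothetical data: one hash as it might be scraped from a page's HTML.
scraped = [md5_hex("jane.doe@example.com")]
usernames = ["john.doe", "jane.doe", "jdoe"]
domains = ["example.org", "example.com"]
print(crack_email_hashes(scraped, usernames, domains))
```

The attacker never inverts MD5. They only need to guess the address, which is why predictable addresses fall first.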
At Wordfence we have done a significant amount of experimentation with GPUs and hash cracking and we even provide a commercial service as part of Wordfence Premium that uses a GPU cluster to perform a password audit on your WordPress website. We launched this service over a year ago. The photo below is the password cracking cluster we designed for this service. Those are liquid cooled chrome GPU pipes in the photo. They look even better in real life.
When Dominique gave his talk in 2013 on using Hashcat to turn Gravatar profile hashes back into email addresses, the Nvidia GeForce GTX Titan GPU had recently been released, providing 5045 Gigaflops of processing power.
In May of this year Nvidia launched the GeForce GTX 1080, which comes with 8873 Gigaflops of processing power. In just three years the processing power available from a single card has grown by roughly 76%.
When you consider that three years ago a single researcher reverse engineered 45% of the Gravatar hashes he targeted into email addresses, it’s quite possible that a criminal group armed with a modern GPU cluster, like the one shown above, could reverse engineer a far higher percentage today. The problem will only get worse.
Email hashes may expose your identity across the Web
The use of email address hashes has a further problem. If you view the source of a website using Gravatar profile photos, extract the hash and then Google that hash in quotes, you can find other websites and services that are used by the individual you are researching.
For example: A user may be comfortable having their full name and profile photo appear on a website about skiing. But they may not want their name or identity exposed to the public on a website specializing in a medical condition. Someone researching this individual could extract their Gravatar hash from the skiing website along with their full name. They could then Google the hash and determine that the individual suffers from a medical condition they wanted to keep private.
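The linking step in that example needs no cracking at all, only a comparison of hashes. The sketch below groups hypothetical sightings by avatar hash; the site names, display names, and addresses are invented for illustration, and the search engine lookup is replaced by a hard-coded list.

```python
import hashlib
from collections import defaultdict


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def link_profiles(sightings):
    """Group (site, display_name, avatar_hash) tuples by hash and keep
    only the hashes that were seen on more than one site."""
    by_hash = defaultdict(list)
    for site, display_name, avatar_hash in sightings:
        by_hash[avatar_hash].append((site, display_name))
    return {
        avatar_hash: seen
        for avatar_hash, seen in by_hash.items()
        if len({site for site, _ in seen}) > 1
    }


shared = md5_hex("jane.doe@example.com")
sightings = [
    ("skiing-forum.example", "Jane Doe", shared),
    ("health-forum.example", "anon_4821", shared),
    ("skiing-forum.example", "John Roe", md5_hex("john.roe@example.com")),
]
for avatar_hash, seen in link_profiles(sightings).items():
    print(avatar_hash, seen)
```

The pseudonymous account on the second site is tied to the real name on the first, even though the email address itself is never recovered.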
The same approach can be used to Google the MD5 hash of anything. Try searching for the hash of your domain name or of common passwords (not passwords you actually use). Let us know what you find in the comments.
What to do to protect your email address and identity
To solve the identity and spam problem that Gravatar presents, the most effective option is to use a unique email address to register on each website you are a member of. The email address should be hard to reverse engineer.
If you use an @gmail.com address, Gmail provides a feature whereby you can append a plus sign to the part of your address before the @ and anything after the plus sign is ignored. If your email address is yourname@gmail.com, you can change it to yourname+junkGoesHere@gmail.com and you will still receive the email.
What we suggest you do is use a unique Gmail address on any Gravatar-enabled website when you register. For example, yourname@gmail.com would become: yourname+2h4J1q9ZuU9@gmail.com. Gmail has documented this feature here. The feature also works with hosted Gmail addresses where you use your own domain. Outlook.com also provides this feature.
Using this technique makes it much harder for a spammer to reverse engineer your email address from a Gravatar hash. Try to make your email address at least 20 characters long and include upper and lower-case letters and numbers in the suffix after the plus sign. If you have uploaded a custom Gravatar profile image, you should note that this has the side effect of not displaying that image on the websites where you make this change. Instead you will get a default profile image.
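If you would rather not invent suffixes by hand, a few lines of Python can generate them. This is only a sketch under stated assumptions: the function name and defaults are hypothetical, the 11-character minimum for the suffix simply mirrors the length of the example suffix above, and characters are drawn at random from upper-case letters, lower-case letters, and digits.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def plus_address(base_email: str, min_length: int = 20, min_suffix: int = 11) -> str:
    """Return base_email with a random plus-suffix added to the local part."""
    local, domain = base_email.split("@", 1)
    # Characters that are always present: local part, "+", "@", and domain.
    fixed = len(local) + 1 + 1 + len(domain)
    suffix_len = max(min_suffix, min_length - fixed)
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(suffix_len))
    return f"{local}+{suffix}@{domain}"


# Generate a fresh address for each site you register on.
print(plus_address("yourname@gmail.com"))
```

Keep a record of which address you used where, since you will need it to log in or reset a password later.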
Receiving extra spam is an inconvenience, and a minor one if you have an excellent spam filter in place. However, having your identity exposed on a website where you assumed it was private can be embarrassing at best and can have far worse consequences. We therefore suggest that you switch to using a plus-suffix on any website where it is important to maintain your personal privacy.
What should Gravatar do?
This presents a significant challenge for a service that is as widely used as Gravatar. They can’t simply upgrade their own systems. Web applications that have integrated Gravatar rely on the fact that they can request an image with an MD5 hash of a user email address and get a profile photo in return. These applications all need to be updated too, and there are thousands – quite possibly tens of thousands of them.
Even if Gravatar switch to SHA-2 or a longer and stronger hashing algorithm, they are still vulnerable to GPU accelerated email cracking attacks. The identity problem will also still exist.
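The reason is that the attack guesses addresses rather than breaking the hash function, so swapping the algorithm changes very little. The sketch below, with hypothetical names and a two-entry candidate list, runs the identical guessing loop against MD5 and SHA-256.

```python
import hashlib


def recover(target_hex: str, candidates, algorithm: str):
    """Return the first candidate whose digest matches target_hex, or None."""
    for candidate in candidates:
        digest = hashlib.new(algorithm, candidate.encode("utf-8")).hexdigest()
        if digest == target_hex:
            return candidate
    return None


candidates = ["john.doe@example.com", "jane.doe@example.com"]
for algorithm in ("md5", "sha256"):
    # Simulate a published hash of a placeholder address.
    target = hashlib.new(algorithm, b"jane.doe@example.com").hexdigest()
    print(algorithm, recover(target, candidates, algorithm))
```

Both runs recover the same address. A longer digest only helps when the input is hard to guess, and most email addresses are not.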
They could consider switching to a more computationally intensive hashing algorithm like bcrypt. That would provide significant resistance to reverse engineering. But it comes with the obvious cost that it is computationally intensive. Gravatar need to generate a lot of hashes to provide the service they do. Developers who integrate Gravatar into their products also need to generate hashes from email addresses. Both will suffer from increased resource usage if they start using bcrypt. It also doesn’t solve the identity problem.
There are other options available like using a shared secret between developers and the Gravatar servers to generate hashes. These come with their own implementation challenges and performance implications. This option may solve the identity issue because it could generate unique hashes across websites that are also hard to reverse engineer.
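One way to picture the shared-secret idea is a keyed hash (HMAC) with a different secret per website. To be clear, this is my own illustration and not an existing Gravatar API: the function name and the secrets are hypothetical, and Gravatar would also need to know each site's secret to map an identifier back to an image.

```python
import hashlib
import hmac


def site_avatar_id(site_secret: bytes, email: str) -> str:
    """Derive a per-site identifier from an email using HMAC-SHA256."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(site_secret, normalized, hashlib.sha256).hexdigest()


# Hypothetical secrets, one per integrating website.
skiing_secret = b"secret-held-by-the-skiing-site"
health_secret = b"secret-held-by-the-health-site"

email = "jane.doe@example.com"
print(site_avatar_id(skiing_secret, email))
print(site_avatar_id(health_secret, email))
```

The two identifiers differ, so they cannot be matched across sites, and without the secret an attacker cannot test guesses against them.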
A final option is to switch to locally hosted images and move away from hashes or global unique identifiers of any kind. This will introduce more complexity for developers who want to integrate Gravatar into websites, but has the benefit of doing a better job of protecting user privacy and avoids disclosing email addresses.
Further comments on privacy
This is a complex problem and there is unfortunately not an easy fix for Gravatar. In my opinion, the most important issue here is the potential exposure of user identities. I think the medical example that I provided above illustrates how much damage can be done if a user identity is exposed under certain conditions.
That is why the privacy implications of this problem cause the most concern. If you aren’t particularly technical you may simply trust a website owner who says that your full name and personal information won’t be exposed. With the current way Gravatar works, you run the risk of having that information exposed.
As always I welcome your comments below and will respond as time permits.
Update: After publication, one of our senior staff pointed out that the National Institute of Standards and Technology (NIST) considers an email address to be PII, or personally identifiable information. Please see the NIST publication 800-122, “Guide to Protecting the Confidentiality of Personally Identifiable Information (PII)”. PII has a legal meaning in many jurisdictions and is used in the definition of privacy law.