Wordfence In Depth: How Malware Becomes Scan Signatures

One of the most effective ways the Wordfence team keeps the WordPress community and customers secure is through something we call the ‘Threat Defense Feed’. This is a combination of people, software, business processes and data. It’s an incredibly effective way to keep hackers out and provide our customers with early detection.

Today I’m going to go into some depth describing how the malware detection part of the Threat Defense Feed works. This will be a fun journey into some of the internals here at Wordfence, and will give you insight into how we constantly innovate to keep you and your customers secure.

The Decision to Vertically Integrate Threat Intelligence

Some time ago the team and I realized that to give our customers the best protection available, we needed to know what attacks are occurring on the ground. We needed a constant flow of ‘footprints’ that hackers left behind that we could turn into threat intelligence and feed into our products to improve detection.

We faced an important decision about where to get the forensic data we needed. We could either rely on hosting companies and open sources of malware samples, or we could go into the incident response business ourselves, getting what we needed from recently compromised sites while helping customers recover from a hack.

The decision was obvious. So we kicked off an ambitious project to create a constant flow of new threat intelligence. We immediately entered the site cleaning market and built a highly competent team of forensic experts that can perform incident response, analyze hacked sites and get those customers back up and running with a clean site as quickly and effectively as possible.

One product of our site cleaning activity is a steady supply of malware samples, which our team feeds into a huge repository of malware that we have created. Our site cleaning customers are able to benefit from excellent customer service and an incredible forensic team at one of the lowest prices in the industry, because the malware we recover from their websites is used to help protect other Wordfence customers.

In addition, we occasionally receive huge troves of malware from customers and hosting companies. We also feed these sample sets and those from several other sources into our malware repository on a continual basis.

Removing Malware Duplicates and Creating Malware Signatures

When our analysts add a new sample to the malware repository, we have an internal tool we use to run a scan on the sample. It uses all our existing malware signatures and lets our analyst know what we already detect. Malware that we already detect is removed from the repository and not included in our workflow.
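
The filtering step described above can be sketched in a few lines of Python. This is an illustration only, not the internal Wordfence tool: the signature names, the patterns, the ticket IDs and the helper functions are all hypothetical, and Python's `re` module stands in for the PCRE engine the real scanner uses. The sketch only decides which samples go to the signature authors; it does not delete anything from the sample store.

```python
import re

# Hypothetical existing signatures (name -> regex pattern), for illustration only.
EXISTING_SIGNATURES = {
    "eval_base64": r"eval\s*\(\s*base64_decode\s*\(",
    "preg_replace_e": r"preg_replace\s*\(\s*['\"]/.*/e['\"]",
}


def compile_signatures(signatures):
    """Compile every pattern once so each sample is scanned with the same objects."""
    return {name: re.compile(pattern, re.IGNORECASE)
            for name, pattern in signatures.items()}


def split_by_detection(samples, compiled):
    """Return (already_detected, needs_signature).

    already_detected maps a sample ID to the names of the signatures that hit.
    needs_signature lists the sample IDs that no existing signature matches.
    """
    detected, undetected = {}, []
    for sample_id, content in samples.items():
        hits = [name for name, rx in compiled.items() if rx.search(content)]
        if hits:
            detected[sample_id] = hits
        else:
            undetected.append(sample_id)
    return detected, undetected


# Two made-up samples keyed by a ticket-style identifier.
samples = {
    "ticket-1001": "<?php eval(base64_decode('ZWNobyAxOw==')); ?>",
    "ticket-1002": "<?php $f = strrev('tressa'); $f($_POST['x']); ?>",
}

detected, undetected = split_by_detection(
    samples, compile_signatures(EXISTING_SIGNATURES)
)
print("already detected:", detected)
print("hand to sig authors:", undetected)
```

Only the samples in the second list would continue through the workflow. The first list tells the analyst which existing signature already covers each sample.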

Then the malware sample is handed to our malware signature authors who are experts in creating highly optimized regular expressions that are used by our scan engine to detect malware. The sig authors perform another de-duplicating step to ensure that a recently added malware signature isn’t already detecting something they’re working on.

Once our sig authors are sure what they have is 100% unique malware that we haven’t seen before, they handcraft a new malware signature to detect it. Each signature is checked for compatibility with different versions of PHP and the PCRE regular expression library.

When we have created several new malware signatures this way, the sig authors get together and combine a batch of new malware signatures which they move into a beta state.

Avoiding False Positives in Malware Detection

One of the worst things that a security product can do is to produce false positives. In malware detection, a false positive is a situation where the scanner tells you that it has discovered malware but it really hasn’t. It wastes your time and it also desensitizes you so that you don’t take notice when a true positive shows up. So we put a lot of time into making sure that our malware signatures don’t create false positives.

Once our sig authors have moved a batch of malware signatures into beta, they run their own set of tests on the batch. Each of their test machines runs a WordPress installation with over 3000 of the most popular plugins and themes installed. As part of this early beta test, they enable the ‘beta’ flag in Wordfence so that it uses the beta signatures, and then run a scan on these test machines.

If the scan runs on a clean test machine and comes back with no false positives, they move the batch to the next phase, which is a formal software quality assurance step. At this point a senior member of our quality assurance team performs a series of final tests to ensure that the batch of beta malware signatures does not produce any false positives and that each signature detects the ‘true positives’ it is designed to detect.
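
The two checks described above, no hits on clean code and a guaranteed hit on the intended malware, can be expressed as one small test routine. The following Python sketch is an assumption-laden illustration rather than the actual QA tooling: the function `evaluate_batch`, the signature names, the file paths and the sample contents are all hypothetical, and Python's `re` module stands in for PCRE.

```python
import re


def evaluate_batch(beta_signatures, clean_files, target_samples):
    """Check a beta batch against a clean corpus and its intended targets.

    beta_signatures: signature name -> regex pattern
    clean_files:     file path -> content known to be clean
    target_samples:  signature name -> malware content it is meant to match
    Returns (false_positives, missed_targets).
    """
    false_positives = []  # (signature name, clean file path) pairs
    missed_targets = []   # signatures that fail to match their own sample
    for name, pattern in beta_signatures.items():
        rx = re.compile(pattern)
        for path, content in clean_files.items():
            if rx.search(content):
                false_positives.append((name, path))
        if not rx.search(target_samples[name]):
            missed_targets.append(name)
    return false_positives, missed_targets


# Hypothetical batch: one deliberately over-broad rule and one tighter rule.
beta = {
    "too_broad": r"base64_decode",
    "obfuscated_eval": r"eval\s*\(\s*gzinflate\s*\(",
}
# Stand-ins for the clean plugin and theme files on a test machine.
clean = {
    "plugins/example/mailer.php": "<?php $body = base64_decode($encoded_attachment);",
    "themes/example/functions.php": "<?php add_action('init', 'example_setup');",
}
# The sample each signature was written to detect.
targets = {
    "too_broad": "<?php eval(base64_decode($_POST['p']));",
    "obfuscated_eval": "<?php eval(gzinflate(base64_decode('payload')));",
}

false_positives, missed_targets = evaluate_batch(beta, clean, targets)
print("false positives:", false_positives)
print("missed targets:", missed_targets)
```

In this made-up batch the over-broad rule flags a legitimate mailer file, so the batch would go back to its authors. A batch passes only when both lists come back empty.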

Moving Malware Signatures to Production

Once our malware signatures have been handcrafted, subjected to intense testing by our sig authors and have passed final QA, they are released into production. This step is as simple as turning off the beta flag on the new batch of malware signatures, at which point they become instantly available to our Premium customers. Thirty days later, those new premium malware signatures become available to our community customers at no charge.
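
The release rule above comes down to a beta flag and a date comparison. The Python sketch below shows one way to model it. The `Signature` class, the `release` and `is_available` helpers and the tier names are hypothetical and are not taken from the Wordfence codebase; only the 30-day delay comes from the description above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

COMMUNITY_DELAY = timedelta(days=30)  # delay before community users get a rule


@dataclass
class Signature:
    name: str
    beta: bool = True                   # new signatures start in beta
    released_on: Optional[date] = None  # set when the beta flag is turned off


def release(sig: Signature, today: date) -> None:
    """Move a signature to production by clearing its beta flag."""
    sig.beta = False
    sig.released_on = today


def is_available(sig: Signature, tier: str, today: date) -> bool:
    """Return True if a customer on the given tier receives this signature today."""
    if sig.beta or sig.released_on is None:
        return False
    if tier == "premium":
        return today >= sig.released_on
    if tier == "community":
        return today >= sig.released_on + COMMUNITY_DELAY
    raise ValueError(f"unknown tier: {tier}")


sig = Signature("example_rule")
release(sig, date(2017, 2, 1))
print(is_available(sig, "premium", date(2017, 2, 1)))    # released today
print(is_available(sig, "community", date(2017, 3, 2)))  # day 29
print(is_available(sig, "community", date(2017, 3, 3)))  # day 30
```

Keeping the delay in a single constant means the community feed follows the Premium feed automatically, with no second release step.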

The graph below shows the total number of malware signatures we have in production over time at one week intervals. The red graph shows the number of Premium Wordfence signatures we have in production.

As you can see, our community threat defense feed receives malware signatures 30 days later, and the number of community malware signatures has grown significantly since September as the Premium signatures have been fed into the community ruleset during the past 5 months.

During the past few weeks our team has added a significant number of new Premium rules based on new malware we are seeing that infects sites and injects links to Japanese spam websites. The malware our team found has a significant number of variants requiring a large number of new rules to be created. Those will start entering the community feed over the coming weeks.

People, Software, Processes and Data

At Wordfence we are continuously working to find new ways to efficiently deliver actionable threat intelligence to our customers in real-time so that we can better secure you, your website and your customers. We have a research project underway at present that is working on finding new ways to further improve the process I’ve described above.

The most important component of our Threat Defense Feed (or TDF) is our people. The team at Wordfence has done an incredible job of taking ownership of the TDF and continuously improving it and the products, like the Wordfence plugin, that use the threat intelligence that the TDF provides. The result is that over the past year you have seen a significant improvement in detection rates both on the Wordfence Firewall and in the malware scanner.

We’re all very proud to count you among our customers and will continue to innovate so that we can better protect you, your customers and your investment.

Mark Maunder – Wordfence Founder/CEO.

Special thanks to Wordfence team member Åsa for designing the super awesome malware characters in the diagram above.

Comments

7 Comments

Tom Fuszard

February 16, 2017
11:12 am

Mark,

Just wanted to say I really appreciate the work you and your team do. I was introduced to Wordfence several weeks ago. I'm still on the free version (will upgrade one day), but am so impressed with the protection it offers. I better understand now the threats out there--and what I'm sure the bad guys were doing before I installed Wordfence.

People don't realize what's happening in cyberspace until they see the real-time traffic. It's horrifying. But all the more reason to install Wordfence.

Keep up the good work.

- Tom Fuszard
Wisconsin

Kenneth Ervin Young

February 16, 2017
11:58 am

From an engineering perspective you should not be removing malware samples that you already detect from your test database. One could easily make a coding error that allows a prior caught malware code to slip thru in an update. Perhaps that is not what you meant in one of your paragraphs but that's how I interpreted your writing. Otherwise your team does an awesome job! Keep it up!

Mark Maunder

February 16, 2017
12:29 pm

Hi Kenneth, that section may have been unclear. We remove malware that we already detect from the set of samples that are handed to the sig authors. The samples, however, all stay in our database. We never throw away or delete a malware sample we find. Our malware DB is organized by ticket or other unique identifier that references the source. So lots of duplication but not in the data we hand our sig authors.

Hope that makes it clear.

Kenneth Ervin Young

February 16, 2017
2:37 pm

Mark, your clarification makes more sense. Thanks for the prompt reply!

Michelle

February 16, 2017
1:42 pm

I am so thankful I discovered your s-ware. Our charity site went down and had to be restored. After changing passwords and running scans, I was sent a report on Russian hackers trying to hack into that site and my 3 other wordpress sites. Love that you give the IP, what username they tried to use, time of day, etc.

I was shocked to see so many attempts on all of my sites. They used "admin" "administrator" to try and get in, along with the account of a former admin who had privileges to help with issues stemming from a template he created. I quickly deleted that account.

Thank you, again.

Ejaz

February 17, 2017
12:03 am

That's a very insightful article about TDF. I really appreciate your efforts.

Hans Fransen

February 18, 2017
8:58 am

As always, awesome job.
