Chapter 1. Introduction

PHP has grown from a set of tools for personal home page development to the world’s most popular web programming language, and it now powers many of the Web’s most frequented destinations. Along with such a transition come new concerns, such as performance, maintainability, scalability, reliability, and (most importantly) security.

Unlike language features such as conditional expressions and looping constructs, security is abstract. In fact, security is not a characteristic of a language as much as it is a characteristic of a developer. No language can prevent insecure code, although there are language features that can aid or hinder a security-conscious developer.

This book focuses on PHP and shows you how to write secure code by leveraging PHP’s unique features. The concepts in this book, however, are applicable to any web development platform.

Web application security is a young and evolving discipline. This book teaches best practices that are theoretically sound, so that you can sleep at night instead of worrying about the new attacks and techniques that are constantly being developed by those with malicious intentions. However, it is wise to keep yourself informed of new advances in the field, and there are a few resources that can help:

http://phpsecurity.org/

This book’s companion web site

http://phpsec.org/

The PHP Security Consortium

http://shiflett.org/

My personal web site and blog

This chapter provides the foundation for the rest of the book. It focuses on teaching you the principles and practices that are prerequisites for the lessons that follow.

PHP Features

PHP has many unique features that make it very well-suited for web development. Common tasks that are cumbersome in other languages are a cinch in PHP, and this has both advantages and disadvantages. One feature in particular has attracted more attention than any other, and that feature is register_globals.

Register Globals

If you remember writing CGI applications in C in your early days of web application development, you know how tedious form processing can be. With PHP’s register_globals directive enabled, the complexity of parsing raw form data is taken care of for you, and global variables are created from numerous remote sources. This makes writing PHP applications very easy and convenient, but it also poses a security risk.

In truth, register_globals is unfairly maligned. Alone, it does not create a security vulnerability—a developer must make a mistake. However, two primary reasons you should develop and deploy applications with register_globals disabled are that it:

  • Can increase the magnitude of a security vulnerability

  • Hides the origin of data, conflicting with a developer’s responsibility to keep track of data at all times

All examples in this book assume register_globals to be disabled. Instead, I use superglobal arrays such as $_GET and $_POST. Using these arrays is nearly as convenient as relying on register_globals, and the slight lack of convenience is well worth the increase in security.

Tip

If you must develop an application that might be deployed in an environment in which register_globals is enabled, it is very important that you initialize all variables and set error_reporting to E_ALL (or E_ALL | E_STRICT) to alert yourself to the use of uninitialized variables. Any use of an uninitialized variable is almost certainly a security vulnerability when register_globals is enabled.
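The risk described in this tip can be illustrated with a minimal sketch. It assumes a hypothetical is_authenticated() helper and uses extract() only to imitate what register_globals does with request data; neither is taken from this chapter.

```php
<?php

// Hypothetical stand-in for a real authentication check.
function is_authenticated()
{
  return false;
}

// $request imitates data that register_globals would turn into variables.
function authorized_without_init($request)
{
  extract($request); // imitation of register_globals, for illustration only

  if (is_authenticated())
  {
    $authorized = true;
  }

  // $authorized was never initialized, so a request value can survive here.
  return !empty($authorized);
}

function authorized_with_init($request)
{
  extract($request);

  $authorized = false; // initialization overwrites any injected value

  if (is_authenticated())
  {
    $authorized = true;
  }

  return $authorized;
}
```

With request data such as authorized=1, the first function reports the user as authorized even though authentication failed, while the second does not.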

Error Reporting

Every developer makes mistakes, and PHP’s error reporting features can help you identify and locate these mistakes. However, the detailed information that PHP provides can be displayed to a malicious attacker, and this is undesirable. It is important to make sure that this information is never shown to the general public. This is as simple as setting display_errors to Off. Of course, you want to be notified of errors, so you should set log_errors to On and indicate the desired location of the log with error_log.

Because the level of error reporting can cause some errors to be hidden, you should turn up PHP’s default error_reporting setting to at least E_ALL (E_ALL | E_STRICT is the highest setting, offering suggestions for forward compatibility, such as deprecation notices).

All error-reporting behavior can be modified at any level, so if you are on a shared host or are otherwise unable to make changes to files such as php.ini, httpd.conf, or .htaccess, you can implement these recommendations with code similar to the following:

    <?php

    ini_set('error_reporting', E_ALL | E_STRICT);
    ini_set('display_errors', 'Off');
    ini_set('log_errors', 'On');
    ini_set('error_log', '/usr/local/apache/logs/error_log');

    ?>

Tip

http://php.net/manual/ini.php is a good resource for checking where php.ini directives can be modified.

PHP also allows you to handle your own errors with the set_error_handler() function:

    <?php

    set_error_handler('my_error_handler');

    ?>

This allows you to define your own function (my_error_handler()) to handle errors; the following is an example implementation:

    <?php

    function my_error_handler($number, $string, $file, $line, $context)
    {
      $error = "=========\nPHP ERROR\n=========\n";
      $error .= "Number: [$number]\n";
      $error .= "String: [$string]\n";
      $error .= "File:   [$file]\n";
      $error .= "Line:   [$line]\n";
      $error .= "Context:\n" . print_r($context, TRUE) . "\n\n";

      error_log($error, 3, '/usr/local/apache/logs/error_log');
    }

    ?>

Tip

PHP 5 allows you to pass a second argument to set_error_handler() that restricts the errors to which your custom function applies. For example, you can create a function that handles only warnings:

    <?php
    set_error_handler('my_warning_handler', E_WARNING);
    ?>

PHP 5 also provides support for exceptions. See http://php.net/exceptions for more information.

Principles

You can adopt many principles to develop more secure applications. I have chosen a small, focused list of the principles that I consider to be most important to a PHP developer.

These principles are intentionally abstract and theoretical in nature. Their purpose is to provide a broad perspective that can guide you as you focus on the details. Consider them your road map.

Defense in Depth

Defense in Depth is a well-known principle among security professionals. It describes the fact that there is value in redundant safeguards, and history supports this.

The principle of Defense in Depth extends beyond programming. A skydiver who has ever needed to use a reserve canopy can attest to the value in having a redundant safeguard. After all, the main canopy is never meant to fail. A redundant safeguard can potentially save the day when the primary safeguard fails.

In the context of programming, adhering to Defense in Depth requires that you always have a backup plan. If a particular safeguard fails, there should be another to offer some protection. For example, it is a good practice to prompt a user to reauthenticate before performing some important action, even if there are no known flaws in your authentication logic. If an unauthenticated user is somehow impersonating another user, prompting for the user’s password can potentially prevent the unauthenticated (and therefore unauthorized) user from performing a critical action.
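One way to picture the reauthentication safeguard is the following sketch. The function name and its parameters are hypothetical, and it assumes PHP 5.5 or later so that password_hash() and password_verify() are available.

```php
<?php

// Hypothetical helper: $stored_hash is assumed to come from password_hash().
function may_perform_critical_action($is_logged_in, $submitted_password, $stored_hash)
{
  // First safeguard: the existing session-based authentication.
  if (!$is_logged_in)
  {
    return false;
  }

  // Redundant safeguard: require the password again for this action.
  return password_verify($submitted_password, $stored_hash);
}
```

Even if the first check is somehow bypassed or fooled, the second one still has to be satisfied before the critical action proceeds.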

Tip

Although Defense in Depth is a sound principle, be aware that security safeguards become more expensive and less valuable as they are accrued.

Least Privilege

I used to drive a car that had a valet key. This key worked only in the ignition, so it could not be used to unlock the console, the trunk, or even the doors—it could be used only to start the car. I could give this key to someone parking my car (or simply leave it in the ignition), and I was assured that the key could be used for no other purpose.

It makes sense to give a key to a parking attendant that cannot be used to open the console or trunk. After all, you might want to lock your valuables in these locations. What didn’t make sense to me immediately was why the valet key cannot open the doors. Of course, this is because my perspective was that of revoking privilege—I was considering why the parking attendant should be denied the privilege of opening the doors. This is not a good perspective to take when developing web applications. Instead, you should consider why a particular privilege is necessary, and provide all entities with the least amount of privilege required for them to fulfill their respective responsibilities.

One reason why the valet key cannot open the doors is that the key can be copied. Such a copy can be used to steal the car at a later date. This situation might seem unlikely (it is), but this illustrates why granting an unnecessary privilege can increase your risk, even if the increase is slight. Minimizing risk is a key component of secure application development.

It is not necessary that you be able to think of all of the ways that a particular privilege can be exploited. In fact, it is practically impossible for you to be able to predict the actions of every potential attacker. What is important is that you grant only least privilege. This minimizes risk and increases security.

Simple Is Beautiful

Complication breeds mistakes, and mistakes can create security vulnerabilities. This simple truth is why simplicity is such an important characteristic of a secure application. Unnecessary complexity is as bad as an unnecessary risk.

For example, consider the following code taken from a recent security vulnerability notice:

    <?php

    $search = (isset($_GET['search']) ? $_GET['search'] : '');

    ?>

This approach can obscure the fact that $search is tainted, particularly for inexperienced developers. Contrast this with the following:

    <?php

    $search = '';

    if (isset($_GET['search']))
    {
      $search = $_GET['search'];
    }

    ?>

The approach is identical, but one line in particular now draws much attention:

    $search = $_GET['search'];

Without altering the logic in any way, it is now more obvious whether $search is tainted and under what condition.

Minimize Exposure

PHP applications require frequent communication between PHP and remote sources. The primary remote sources are HTTP clients (browsers) and databases. If you properly track data, you should be able to identify when data is exposed. The primary source of exposure is the Internet, and you want to be particularly mindful of data that is exposed over the Internet because it is a very public network.

Data exposure isn’t always a security risk. However, the exposure of sensitive data should be minimized as much as possible. For example, if a user enters payment information, you should use SSL to protect the credit card information as it travels from the client to your server. If you display this credit card number on a verification page, you are actually sending it back to the client, so this page should also be protected with SSL.

In this particular scenario, displaying the credit card number to the user increases its exposure. SSL does mitigate the risk, but a better approach is to eliminate the exposure altogether by displaying only the last four digits (or any similar approach).
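Displaying only the last four digits might look like the following sketch. The mask_card_number() helper is hypothetical, and it assumes the number has already been filtered so that it contains only digits.

```php
<?php

// Hypothetical helper: returns a display-safe version of a card number.
// Assumes $number has already been filtered to contain only digits.
function mask_card_number($number)
{
  $visible = substr($number, -4);
  $hidden_length = max(strlen($number) - 4, 0);

  // Replace everything except the last four digits with asterisks.
  return str_repeat('*', $hidden_length) . $visible;
}
```

The full number never needs to leave the server for the verification page, so the sensitive value is no longer exposed in the response at all.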

In order to minimize the exposure of sensitive data, you must identify what data is sensitive, keep track of it, and eliminate all unnecessary exposure. In this book, I demonstrate some techniques that can help you minimize the exposure of many common types of sensitive data.

Practices

Like the principles described in the previous section, there are many practices that you can employ to develop more secure applications. This list of practices is also small and focused to highlight the ones that I consider to be most important.

Some of these practices are abstract, but each has practical applications, which are described to clarify the intended use and purpose of each.

Balance Risk and Usability

While user friendliness and security safeguards are not mutually exclusive, steps taken to increase security often decrease usability. While it’s important to consider illegitimate uses of your applications as you write your code, it’s also important to be mindful of your legitimate users. The appropriate balance can be difficult to achieve, and it’s something that you have to determine for yourself—no one else can determine the best balance for your applications.

Try to employ safeguards that are transparent to the user. If this isn’t possible, try to use safeguards that are already familiar to the user (or likely to be). For example, providing a username and password to gain access to restricted information or services is an expected procedure.

When you suspect foul play, realize that you might be mistaken and act accordingly. For example, it is a common practice to prompt users to enter their password again whenever their identity is in question. This is a minor hassle to legitimate users but a substantial obstacle to an attacker. Technically, this is almost identical to prompting users to authenticate themselves again entirely, but the user experience is much friendlier.

There is very little to gain by logging users out entirely or chiding them about an alleged attack. These approaches degrade usability substantially when you make a mistake, and mistakes happen.

In this book, I focus on providing safeguards that are either transparent or expected, and I encourage careful and sensible reactions to suspected attacks.

Track Data

The most important thing you can do as a security-conscious developer is keep track of data at all times—not only what it is and where it is, but also where it’s from and where it’s going. Sometimes this can be difficult, especially without a firm understanding of how the Web works, and this is why inexperienced web developers are prone to making mistakes that yield security vulnerabilities, even when they have experience developing applications in other environments.

Most people who use email are not easily fooled by spam with a subject of “Re: Hello”—they recognize that the subject can be forged, and therefore the email isn’t necessarily a reply to a previous email with a subject of “Hello.” In short, people know not to place much trust in the subject. Far fewer people realize that the From header can also be forged. They mistakenly believe that this reliably indicates the email’s origin.

The Web is very similar, and one of the things I want to teach you is how to distinguish between the data that you can trust and the data that you cannot. It’s not always easy, but blind paranoia certainly isn’t the answer.

PHP helps you identify the origin of most data—superglobal arrays such as $_GET, $_POST, and $_COOKIE clearly identify input from the user. A strict naming convention can help you keep up with the origin of all data throughout your code, and this is a technique that I frequently demonstrate and highly recommend.

While understanding where data enters your application is paramount, it is also very important to understand where data exits your application. When you use echo, for example, you are sending data to the client. When you use mysql_query(), you are sending data to a MySQL database (even when the purpose of the query is to retrieve data).

When I audit a PHP application for security vulnerabilities, I focus on the code that interacts with remote systems. This code is the most likely to contain security vulnerabilities, and it therefore demands the most careful attention to detail during development and during peer reviews.

Filter Input

Filtering is one of the cornerstones of web application security. It is the process by which you prove the validity of data. By ensuring that all data is properly filtered on input, you can eliminate the risk that tainted (unfiltered) data is mistakenly trusted or misused in your application. The vast majority of security vulnerabilities in popular PHP applications can be traced to a failure to filter input.

When I refer to filtering input, I am really describing three different steps:

  • Identifying input

  • Filtering input

  • Distinguishing between filtered and tainted data

The first step is to identify input because if you don’t know what it is, you can’t be sure to filter it. Input is any data that originates from a remote source. For example, anything sent by the client is input, although the client isn’t the only remote source of data—other examples include database servers and RSS feeds.

Data that originates from the client is easy to identify—PHP provides this data in superglobal arrays, such as $_GET and $_POST. Other input can be more difficult to identify—for example, $_SERVER contains many elements that can be manipulated by the client. It’s not always easy to determine which elements in $_SERVER constitute input, so a best practice is to consider this entire array to be input.

What you consider to be input is a matter of opinion in some cases. For example, session data is stored on the server, and you might not consider the session data store to be a remote source. If you take this stance, you can consider the session data store to be an integral part of your application. It is wise to be mindful of the fact that this ties the security of your application to the security of the session data store. This same perspective can be applied to a database because the database can be considered a part of the application as well.

Generally speaking, it is more secure to consider data from session data stores and databases to be input, and this is the approach that I recommend for any critical PHP application.

Once you have identified input, you’re ready to filter it. Filtering is a somewhat formal term that has many synonyms in common parlance—sanitizing, validating, cleaning, and scrubbing. Although some people differentiate slightly between these terms, they all refer to the same process—preventing invalid data from entering your application.

Various approaches are used to filter data, and some are more secure than others. The best approach is to treat filtering as an inspection process. Don’t correct invalid data in order to be accommodating—force your users to play by your rules. History has shown that attempts to correct invalid data often create vulnerabilities. For example, consider the following method intended to prevent file traversal (ascending the directory tree):

    <?php

    $filename = str_replace('..', '.', $_POST['filename']);

    ?>

Can you think of a value of $_POST['filename'] that causes $filename to be ../../etc/passwd? Consider the following:

    .../.../etc/passwd

This particular error can be corrected by continuing to replace the string until it is no longer found:

    <?php

    $filename = $_POST['filename'];

    while (strpos($filename, '..') !== FALSE)
    {
      $filename = str_replace('..', '.', $filename);
    }

    ?>

Of course, the basename() function can replace this entire technique and is a safer way to achieve the desired goal. The important point is that any attempt to correct invalid data can potentially contain an error and allow invalid data to pass through. Inspection is a much safer alternative.
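As a sketch of the inspection approach, basename() can be used to test a filename rather than to repair it. The is_plain_filename() helper is hypothetical, and the sketch assumes that a valid filename is one with no directory component.

```php
<?php

// Hypothetical helper: accepts a filename only if it has no directory part.
// Note: basename() treats "/" as a separator everywhere, but "\" only on Windows.
function is_plain_filename($filename)
{
  if (!is_string($filename) || $filename === '' || $filename === '.' || $filename === '..')
  {
    return false;
  }

  // Inspect rather than correct: reject anything basename() would change.
  return basename($filename) === $filename;
}

$clean = array();

if (isset($_POST['filename']) && is_plain_filename($_POST['filename']))
{
  $clean['filename'] = $_POST['filename'];
}
```

Invalid values are rejected outright instead of being rewritten, so there is no corrected value that could turn out to be dangerous.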

In addition to treating filtering as an inspection process, you want to use a whitelist approach whenever possible. This means that you want to assume the data that you’re inspecting to be invalid unless you can prove that it is valid. In other words, you want to err on the side of caution. Using this approach, a mistake results in your considering valid data to be invalid. Although undesirable (as any mistake is), this is a much safer alternative than considering invalid data to be valid. By mitigating the damage caused by a mistake, you increase the security of your applications. Although this idea is theoretical in nature, history has proven it to be a very worthwhile approach.

If you can accurately and reliably identify and filter input, your job is almost done. The last step is to employ a naming convention or some other practice that can help you to accurately and reliably distinguish between filtered and tainted data. I recommend a simple naming convention because this can be used in both procedural and object-oriented paradigms. The convention that I use is to store all filtered data in an array called $clean. This allows you to take two important steps that help to prevent the injection of tainted data:

  • Always initialize $clean to be an empty array.

  • Add logic to detect and prevent any variables from a remote source named clean.

In truth, only the initialization is crucial, but it’s good to adopt the habit of considering any variable named clean to be one thing—your array of filtered data. This step provides reasonable assurance that $clean contains only data that you knowingly store therein and leaves you with the responsibility of ensuring that you never store tainted data in $clean.
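These two steps could be sketched as follows. The request_uses_reserved_name() helper is hypothetical, and stopping the request is only one possible reaction.

```php
<?php

// Hypothetical helper: reports whether any request array carries a key
// with the reserved name.
function request_uses_reserved_name($sources, $reserved)
{
  foreach ($sources as $source)
  {
    if (array_key_exists($reserved, $source))
    {
      return true;
    }
  }

  return false;
}

if (request_uses_reserved_name(array($_GET, $_POST, $_COOKIE), 'clean'))
{
  // How to respond is an application decision; this sketch simply stops.
  exit('Invalid request.');
}

// Always start from a known-empty array.
$clean = array();
```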

In order to solidify these concepts, consider a simple HTML form that allows a user to select among three colors:

    <form action="process.php" method="POST">
    Please select a color:
    <select name="color">
      <option value="red">red</option>
      <option value="green">green</option>
      <option value="blue">blue</option>
    </select>
    <input type="submit" />
    </form>

In the programming logic that processes this form, it is easy to make the mistake of assuming that only one of the three choices can be provided. As you will learn in Chapter 2, the client can submit any data as the value of $_POST['color']. To properly filter this data, you can use a switch statement:

    <?php

    $clean = array();

    switch($_POST['color'])
    {
      case 'red':
      case 'green':
      case 'blue':
        $clean['color'] = $_POST['color'];
        break;
    }

    ?>

This example first initializes $clean to an empty array in order to be certain that it cannot contain tainted data. Once it is proven that the value of $_POST['color'] is one of red, green, or blue, it is stored in $clean['color']. Therefore, you can use $clean['color'] elsewhere in your code with reasonable assurance that it is valid. Of course, you could add a default case to this switch statement to take a particular action in the case of invalid data. One possibility is to display the form again while noting the error—just be careful not to output the tainted data in an attempt to be friendly.
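A default case along these lines might look like the following sketch. The filter_color() wrapper and the $errors array are hypothetical names introduced for illustration.

```php
<?php

// Hypothetical wrapper around the switch statement shown above.
// Returns the filtered data and a list of error messages.
function filter_color($post)
{
  $clean = array();
  $errors = array();
  $color = '';

  if (isset($post['color']))
  {
    $color = $post['color'];
  }

  switch ($color)
  {
    case 'red':
    case 'green':
    case 'blue':
      $clean['color'] = $color;
      break;
    default:
      // Describe the problem without echoing the tainted value back.
      $errors[] = 'Please select red, green, or blue.';
      break;
  }

  return array($clean, $errors);
}
```

The error message is a fixed string, so redisplaying the form with it does not send any part of the invalid input back to the client.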

While this particular approach is useful for filtering data against a known set of valid values, it does not help you filter data against a known set of valid characters. For example, you might want to assert that a username may contain only alphanumeric characters:

    <?php

    $clean = array();

    if (ctype_alnum($_POST['username']))
    {
      $clean['username'] = $_POST['username'];
    }

    ?>

Although a regular expression can be used for this particular purpose, a native PHP function is preferable whenever one exists. Native functions are less likely to contain errors than code that you write yourself, and an error in your filtering logic is almost certain to result in a security vulnerability.
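A small sketch shows how easily a hand-written pattern can go wrong. The pattern below is a hypothetical first attempt at matching alphanumeric usernames, not one taken from this book.

```php
<?php

$username = "admin\n"; // trailing newline, as a crafted request might send

// A plausible-looking pattern: "$" also matches just before a final newline,
// so this check accepts the value.
$regex_accepts = (preg_match('/^[a-z0-9]+$/i', $username) === 1);

// The native function inspects every character and rejects the value.
$ctype_accepts = ctype_alnum($username);
```

The pattern can be repaired (for example, with the D modifier), but the point stands: the native function leaves less room for this kind of mistake.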

Escape Output

Another cornerstone of web application security is the practice of escaping output—escaping or encoding special characters so that their original meaning is preserved. For example, O'Reilly is represented as O\'Reilly when being sent to a MySQL database. The backslash before the apostrophe is there to preserve it—the apostrophe is part of the data and not meant to be interpreted by the database.

As with filtering input, when I refer to escaping output, I am really describing three different steps:

  • Identifying output

  • Escaping output

  • Distinguishing between escaped and unescaped data

Tip

It is important to escape only filtered data. Although escaping alone can prevent many common security vulnerabilities, it should never be regarded as a substitute for filtering input. Tainted data must be first filtered and then escaped.

To escape output, you must first identify output. In general, this is much easier than identifying input because it relies on an action that you take. For example, to identify output being sent to the client, you can search for strings such as the following in your code:

  • echo

  • print

  • printf

  • <?=

As the developer of an application, you should be aware of every case in which you send data to a remote system. These cases all constitute output.

Like filtering, escaping is a process that is unique for each situation. Whereas filtering is unique according to the type of data you’re filtering, escaping is unique according to the type of system to which you’re sending data.

For most common destinations (including the client, databases, and URLs), there is a native escaping function that you can use. If you must write your own, it is important to be exhaustive. Find a reliable and complete list of every special character in the remote system and the proper way to represent each character so that it is preserved rather than interpreted.
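For URLs, the native function is urlencode(). The following is a minimal sketch; the $url array, the example.org address, and the q parameter are assumptions made for illustration.

```php
<?php

$clean = array();
$clean['query'] = 'cats & dogs'; // assumed to have passed filtering already

// Hypothetical convention: $url holds data escaped for use in a URL.
$url = array();
$url['query'] = urlencode($clean['query']);

// The ampersand and spaces are now data, not URL syntax.
$link = "http://example.org/search.php?q={$url['query']}";
```

If the resulting link is then written into an HTML page, it is being sent to a second destination and needs to be escaped for that context as well.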

The most common destination is the client, and htmlentities() is the best escaping function for escaping data to be sent to the client. Like most string functions, it takes a string and returns the modified version of the string. However, the best way to use htmlentities() is to specify the two optional arguments—the quote style (the second argument) and the character set (the third argument). The quote style should always be ENT_QUOTES in order for the escaping to be most exhaustive, and the character set should match the character set indicated in the Content-Type header that your application includes in each response.

To distinguish between escaped and unescaped data, I advocate the use of a naming convention. For data to be sent to the client, the convention I use is to store all data escaped with htmlentities() in $html, an array that is initialized to an empty array and contains only data that has been both filtered and escaped:

    <?php

    $html = array();

    $html['username'] = htmlentities($clean['username'],
      ENT_QUOTES, 'UTF-8');

    echo "<p>Welcome back, {$html['username']}.</p>";

    ?>

Tip

The htmlspecialchars() function is almost identical to htmlentities(). It accepts the same arguments, and the only difference is that it is less exhaustive.

By using $html['username'] when sending the username to the client, you can be sure that special characters are not interpreted by the browser. If the username contains only alphanumeric characters, the escaping is not actually necessary, but it is a practice that adheres to Defense in Depth. Consistently escaping all output is a good habit that dramatically increases the security of your applications.
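To make the effect of escaping concrete, the following sketch runs a value containing special characters through both functions. The comment field is hypothetical, and the value is assumed to have been accepted by whatever filtering rules apply to it.

```php
<?php

$clean = array();
$clean['comment'] = "O'Reilly <b>caf\xC3\xA9</b>"; // "\xC3\xA9" is é in UTF-8

$html = array();
$html['comment'] = htmlentities($clean['comment'], ENT_QUOTES, 'UTF-8');
// O&#039;Reilly &lt;b&gt;caf&eacute;&lt;/b&gt;

// htmlspecialchars() escapes the same markup characters but leaves é as is.
$special = htmlspecialchars($clean['comment'], ENT_QUOTES, 'UTF-8');
// O&#039;Reilly &lt;b&gt;café&lt;/b&gt;
```

In both cases the browser displays the angle brackets and the apostrophe as text instead of interpreting them as markup.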

Another popular destination is a database. When possible, you should escape data used in an SQL query with an escaping function native to your database. For MySQL users, the best escaping function is mysql_real_escape_string(). If there is no native escaping function for your database, addslashes() can be used as a last resort.

The following example demonstrates the proper escaping technique for a MySQL database:

    <?php

    $mysql = array();

    $mysql['username'] =
      mysql_real_escape_string($clean['username']);

    $sql = "SELECT *
            FROM   profile
            WHERE  username = '{$mysql['username']}'";

    $result = mysql_query($sql);

    ?>
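The mysql_* functions were removed in PHP 7.0, so the example above runs only on older versions. As a hedged sketch of the same escaping step on newer versions, PDO::quote() escapes a string for the connected database driver and adds the surrounding quotes. The build_profile_query() helper is hypothetical. The PHP manual recommends prepared statements over PDO::quote() when building queries, which is a different technique from the escaping described here.

```php
<?php

// Hypothetical helper: builds the same query as above with PDO::quote(),
// which escapes the value and adds the surrounding quotes itself.
function build_profile_query($pdo, $username)
{
  $sql_username = $pdo->quote($username);

  return "SELECT *
          FROM   profile
          WHERE  username = {$sql_username}";
}
```

Because quote() supplies the quotes, the placeholder in the SQL string is not wrapped in quotes a second time.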
