Protect Your Website from Malicious Input: Validation with Filters

For years (and years) we've done data validation in PHP "by hand", either with string functions or with regular expressions. The problem is that, well, PHP programmers aren't so great at regexes. Also, unlike Perl, PHP has no built-in data tainting feature that refuses to let unvalidated input reach sensitive operations such as shell commands or file access. (Correction: there is a tainting extension in PECL, but it isn't part of core PHP.) The upshot is that a lot of bad data gets through. PHP finally added data validation functions (the filter extension, bundled since PHP 5.2), but it seems like nobody is using them. Core PHP still lacks tainting. The first problem, unused validation functions, we can address.
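
Here is a hypothetical illustration of the kind of regex slip meant above. A digit pattern written without the ^ and $ anchors accepts any string that merely contains a digit, while FILTER_VALIDATE_INT checks the whole value. The function names are made up for this sketch.

```php
<?php
// Hypothetical hand-rolled check: the pattern has no ^ or $ anchors,
// so it matches any string containing at least one digit.
function looks_like_int_by_regex($s) {
    return preg_match('/\d+/', $s) === 1;
}

// The filter extension validates the entire value as an integer.
// It returns the int on success or false on failure, so compare with !==
// because "0" validates to int 0.
function looks_like_int_by_filter($s) {
    return filter_var($s, FILTER_VALIDATE_INT) !== false;
}
```

With input like "12; DROP TABLE users", the regex version says yes and the filter version says no.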

I suspect people don't use the built-in validation functions because they're relatively new, because they're a little cumbersome, and because old habits die hard, especially when the old habit requires less typing. Here's a function that wraps a call to filter_input, validating the query parameter 'c' as an integer from 1 to 1000.

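function get_config_id() {
  static $config_id = NULL;  // persists between calls for the rest of the request
  if ($config_id === NULL) {  // strict check, so a cached 0 isn't treated as "not yet set"
    $config_id = filter_input( INPUT_GET, 'c', FILTER_VALIDATE_INT,
      array('options'=>array('min_range'=>1, 'max_range'=>1000)));
    // filter_input returns NULL if 'c' is absent, FALSE if it fails validation
    $config_id = $config_id ? $config_id : 0;
  }
  return $config_id;
}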

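That code does a little variable caching with the static keyword, a trick borrowed from C: a static local variable keeps its value between calls (in PHP, for the rest of the request), so filter_input runs only once no matter how many times get_config_id() is called. The saving is small, but it adds up when the value is needed in many places.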
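
One practical wrinkle: filter_input reads the request data PHP originally received, not the $_GET array, so assigning to $_GET in a test or CLI script has no effect on it. A minimal sketch that applies the same rule (an integer from 1 to 1000, else 0) with filter_var to a value you pass in; the name validate_config_id is hypothetical.

```php
<?php
// Hypothetical pure version of the get_config_id() rule, easy to test.
function validate_config_id($raw) {
    $id = filter_var($raw, FILTER_VALIDATE_INT,
        array('options' => array('min_range' => 1, 'max_range' => 1000)));
    // filter_var returns false for non-integers and out-of-range values;
    // map that to the same 0 fallback the original function uses
    return $id === false ? 0 : $id;
}
```

get_config_id() could then call validate_config_id(filter_input(INPUT_GET, 'c')) and keep only the caching.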

After thinking about this a lot, I've concluded that there's no way to avoid writing a lot of validation code. For each input you have to filter it by type, using filter functions for the common kinds of data, then check it against the range of acceptable values. If that leaves you without a valid value, you may need to fall back to a default. Finally, to avoid repeating the work, you want to cache the result in a variable.
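
Those four steps (filter by type, check the range, fall back to a default, cache) can be folded into one reusable helper. This is a minimal sketch, not code from the original project: get_int_param is a hypothetical name, and it reads the $_GET array rather than calling filter_input, which makes it testable but means it trusts whatever is in $_GET.

```php
<?php
// Hypothetical helper: read an integer query parameter, check it against a
// range, fall back to a default, and cache the result for the rest of the request.
function get_int_param($name, $min, $max, $default) {
    // static: the cache survives between calls, but only for this request
    static $cache = array();

    // The cache key is the parameter name alone, so a later call with a
    // different range or default gets the first result back unchanged.
    if (!array_key_exists($name, $cache)) {
        $value = false;
        if (isset($_GET[$name])) {
            // false if not an integer or outside [$min, $max]
            $value = filter_var($_GET[$name], FILTER_VALIDATE_INT,
                array('options' => array('min_range' => $min, 'max_range' => $max)));
        }
        $cache[$name] = ($value === false) ? $default : $value;
    }
    return $cache[$name];
}
```

A call such as get_int_param('page', 1, 100, 1) then replaces a dozen lines of checks at each use.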

For more information, see the Data Filtering section of the PHP manual.

Sample Code

This is extracted from a project from 2011, a little server backing an AJAX user interface. It shows how to check that each parameter exists, import its value into your script, and fail otherwise. The code isn't complex, but it's verbose, so you really need to copy and paste it, or you'll get lazy and skip the validation.
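
If the verbosity is what puts people off, the filter extension can also check several parameters from one definition table with filter_var_array (filter_input_array does the same for live request data). A minimal sketch, assuming the same three parameters as the sample below; validate_request is a hypothetical name, and because filter_var_array simply ignores keys it wasn't told about, the excess-parameter check is still needed.

```php
<?php
// Hypothetical wrapper: validate foo, bar and baz in a single call.
function validate_request(array $params) {
    $definition = array(
        'foo' => FILTER_VALIDATE_EMAIL,   // optional e-mail address
        'bar' => FILTER_VALIDATE_INT,     // required integer
        'baz' => array(                   // required boolean
            'filter' => FILTER_VALIDATE_BOOLEAN,
            // without this flag, "off"/"no"/"0" and garbage all come back as false
            'flags'  => FILTER_NULL_ON_FAILURE,
        ),
    );
    // Each defined key maps to the filtered value, false if validation failed,
    // or NULL if the key was absent (add_empty defaults to true).
    // For baz, the flag turns a failed check into NULL too: NULL = absent or invalid.
    return filter_var_array($params, $definition);
}
```

The caller still decides what each false or NULL means (ignore, 400 response, or silent exit), which is exactly the part that can't be automated away.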

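// restore the session
session_start();
session_regenerate_id(); // a fresh ID per request limits the value of a stolen or fixated ID

$user = isset($_SESSION['user']) ? $_SESSION['user'] : NULL;
$role = isset($_SESSION['role']) ? $_SESSION['role'] : NULL;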

// validate input parameters

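// test for invalid parameter names
// in_array() takes the needle first; strict mode matters because before PHP 8
// a numeric key like 0 compares loosely equal to 'foo'
foreach( array_keys($_GET) as $key )
  if (! in_array($key, array('foo','bar','baz'), true))
    http_bad_request('Excess parameter.');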

// this parameter is optional; $foo stays NULL unless a valid e-mail address is given
$foo = NULL;
if (isset($_GET['foo']))
  if (filter_var($_GET['foo'], FILTER_VALIDATE_EMAIL) !== false)
    $foo = $_GET['foo'];

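// this parameter must exist and be an integer
if (isset($_GET['bar'])) {
  // FILTER_VALIDATE_INT returns the integer or false; compare with ===
  // because 0 is a valid result (there is no FILTER_SANITIZE_INT constant)
  $bar = filter_var($_GET['bar'], FILTER_VALIDATE_INT);
  if ($bar === false)
    http_bad_request('Invalid value for bar.');
} else {
  http_bad_request('Parameter missing.');
}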

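// this parameter must exist and be a boolean ("1", "true", "on", "yes" or "0", "false", "off", "no")
if (isset($_GET['baz'])) {
  // FILTER_NULL_ON_FAILURE returns NULL for invalid input, so a valid "false" isn't rejected
  $baz = filter_var($_GET['baz'], FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
  if ($baz === NULL)
    exit; // die silently
} else {
  exit; // die silently
}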

// access control
if (! filter_var($user, FILTER_VALIDATE_INT))
  http_unauthorized('User ID required.');

if ( $role != ROLE_ADMIN )