Protect Your Website from Malicious Input: Validation with Filters

For years (and years) we’ve done data validation in PHP “by hand”, either with string functions or with regular expressions. The problem has been that, well, PHP programmers aren’t so great at regexes. Also, unlike Perl, core PHP has no data tainting feature that forces you to validate your inputs before they’re used in expressions. (Correction: there is a tainting extension, but it isn’t part of the core.) The upshot is that a lot of bad data gets through. PHP finally added some data validation functions, the filter functions, but it seems like nobody is using them. The tainting problem remains; the validation problem, we can address.

I suspect people don’t use the built-in validation functions because they’re relatively new, because they’re a little cumbersome, and because old habits die hard, especially when the old habit requires less typing. Here’s a function that wraps a call to filter_input, validating the GET parameter ‘c’ as an integer between 1 and 1000.

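function get_config_id() {
  static $config_id = null;
  // strict comparison, so a cached 0 is not mistaken for "not looked up yet"
  if ($config_id === null) {
    // filter_input returns null when 'c' is absent and false when it is invalid
    $value = filter_input( INPUT_GET, 'c', FILTER_VALIDATE_INT,
      array('options'=>array('min_range'=>1, 'max_range'=>1000))
    );
    // fall back to 0 in both cases, and cache that too
    $config_id = $value ? $value : 0;
  }
  return $config_id;
}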

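That code does a little bit of variable caching, using the static keyword. It’s a little trick borrowed from C: a static local variable keeps its value between calls for the rest of the request, so once a valid value has been stored, later calls return it without calling filter_input again.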
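To see what the static variable buys you, here is a minimal sketch. The names expensive_lookup() and cached_lookup() are made up for illustration, and the counter only exists to show how often the lookup runs. Note the strict === null check: with a loose == comparison, a cached 0 or false would look like "not cached yet" and trigger the lookup again.

```php
<?php
// Call counter, kept in $GLOBALS so it is reachable from any scope.
$GLOBALS['lookup_calls'] = 0;

// Hypothetical stand-in for an expensive call such as filter_input().
function expensive_lookup() {
  $GLOBALS['lookup_calls']++;
  return 0; // a falsy but perfectly valid result
}

function cached_lookup() {
  static $value = null;
  // strict comparison: a cached 0 or false must not trigger a second lookup
  if ($value === null) {
    $value = expensive_lookup();
  }
  return $value;
}

cached_lookup();
cached_lookup();
echo $GLOBALS['lookup_calls'], "\n"; // prints 1
```

The one case this pattern can't cache is a lookup that legitimately returns null, which is why the wrapper above converts a missing value into 0.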

After thinking about this, a lot, I’ve concluded that it’s impossible to avoid writing a lot of validation code. Validating input means writing filter functions for the common types of data, checking each input against its acceptable range, and setting a default when no valid value is present. Additionally, to avoid repeating the process, you want to cache the result in a variable.
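One way to cut down on the typing is a small helper per common data type. The sketch below covers ranged integers; int_param() is a hypothetical name, not a PHP built-in, and it takes the input array as an argument (normally $_GET) so it can be exercised without a web request. Caching is left out here and could be layered on top with a static variable.

```php
<?php
// Hypothetical helper (not part of PHP): validate an integer within a range,
// returning $default when the key is missing or the value is invalid.
function int_param(array $input, $name, $min, $max, $default = 0) {
  if (!isset($input[$name])) {
    return $default;
  }
  $value = filter_var($input[$name], FILTER_VALIDATE_INT,
    array('options' => array('min_range' => $min, 'max_range' => $max)));
  // filter_var returns false on failure; compare strictly so a valid 0 survives
  return ($value === false) ? $default : $value;
}
```

With that in place, the body of a wrapper shrinks to something like int_param($_GET, 'c', 1, 1000).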

For more info, see the Filter section of the PHP manual.

Sample Code

Extracted from something from 2011. This is for a little server backing an AJAX user interface. It is example code that demonstrates how to check that a parameter exists, import its value into your script, and fail otherwise. The code isn’t complex, but it’s verbose, so you really need to copy-paste this stuff, or you will just get lazy and not do validation correctly.

// restore the session
session_start();
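session_regenerate_id(); // new ID on every request; mitigates session fixation

// the session values are missing if nobody has logged in
$user = isset($_SESSION['user']) ? $_SESSION['user'] : null;
$role = isset($_SESSION['role']) ? $_SESSION['role'] : null;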

// validate input parameters
require_once('../httperror.lib.php');

// test for invalid parameter names
// in_array takes the needle first; strict mode avoids loose matches on numeric keys
foreach( array_keys($_GET) as $key )
  if (! in_array($key, array('foo','bar','baz'), true))
    http_bad_request('Excess parameter.');

// this parameter is optional; default to null so $foo is always defined
$foo = null;
if (isset($_GET['foo']))
  if (filter_var($_GET['foo'], FILTER_VALIDATE_EMAIL))
    $foo = $_GET['foo'];

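// this parameter must exist
if (isset($_GET['bar'])) {
  // FILTER_VALIDATE_INT returns the integer, or false on failure;
  // compare with false explicitly so that 0 is accepted
  $bar = filter_var($_GET['bar'], FILTER_VALIDATE_INT);
  if ($bar === false)
    http_bad_request('Invalid value for bar.');
} else {
  http_bad_request('Parameter missing.');
}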

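// this parameter must exist
if (isset($_GET['baz'])) {
  // with FILTER_NULL_ON_FAILURE, "0"/"false"/"no" give false and
  // unrecognized input gives null, so a legitimate false is not rejected
  $baz = filter_var($_GET['baz'], FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
  if ($baz === null)
    exit; // die silently
} else {
  exit; // die silently
}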

// access control
if (! filter_var($user, FILTER_VALIDATE_INT))
  http_unauthorized('User ID required.');

if ( $role != ROLE_ADMIN )
  http_unauthorized();
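For comparison, the three parameter checks can also be done in one pass with filter_var_array. This is a sketch under some assumptions: validate_params() is a hypothetical name, it takes the input array as an argument (normally $_GET), and it returns either the clean values or an error message string instead of calling the http_bad_request helper directly, so it can be tried outside a web request.

```php
<?php
// Hypothetical wrapper: validate foo, bar and baz in one pass.
// Returns an array of clean values, or a string describing the first problem.
function validate_params(array $input) {
  $definition = array(
    'foo' => FILTER_VALIDATE_EMAIL,
    'bar' => FILTER_VALIDATE_INT,
    'baz' => array('filter' => FILTER_VALIDATE_BOOLEAN,
                   'flags'  => FILTER_NULL_ON_FAILURE),
  );
  // reject parameter names that are not in the definition
  if (array_diff_key($input, $definition)) {
    return 'Excess parameter.';
  }
  // bar and baz are required
  foreach (array('bar', 'baz') as $required) {
    if (!isset($input[$required])) {
      return 'Parameter missing.';
    }
  }
  // third argument false: keys missing from $input are left out of the result
  $clean = filter_var_array($input, $definition, false);
  if ($clean['bar'] === false) {
    return 'Invalid value for bar.';
  }
  if ($clean['baz'] === null) {
    return 'Invalid value for baz.';
  }
  // foo is optional: treat a missing or invalid address as null
  if (empty($clean['foo'])) {
    $clean['foo'] = null;
  }
  return $clean;
}
```

A caller would pass $_GET, send a bad-request response if the result is a string, and otherwise read the values out of the returned array. It is still verbose, but the rules for each parameter sit in one table rather than being spread across nested ifs.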