PHP Notebook

Lately, I've been doing a lot of PHP work.
This is a notebook of some things learned. It's not definitive, or even correct, but it may prove useful.

2007-2-12 - went over notebook, removed dead articles, and edited some articles for clarity. jk

A Demo of Transmitting Passwords Encrypted

This is a demo of a technique for transmitting a password in encrypted form. It's not a perfect solution yet, but it's getting there. It's based on an idea by Slaks in a Stack Overflow thread.

Paste this into a file, and execute it through your server.

<?php

$a = "foo"; // a salt used to store password in the db
$b = "bar"; // a salt generated for this session

$p = "magic"; // the password

$hashed = md5($p.$a); // this is what's in the db

$key = md5($hashed.$b);

// pretend that we are now at the client
// we've sent a and b to the client, but not p (which we don't know)

$p2 = "magic"; // the user enters this password
// change p2 to see how this works

$hashed2 = md5($p2.$a);

$key2 = md5($hashed2.$b);

$data = base64_encode( "username:$p2" ^ $key2 );

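// now back at the server: XORing with the server's key undoes the
// client's XOR only if $key2 == $key, i.e. the user typed the right password

$check = $key ^ base64_decode($data);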

echo $check;
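To make the round trip easier to experiment with, here is the same demo wrapped in two functions. This is a sketch, not a hardened protocol: encode_on_client() and decode_on_server() are hypothetical names, the salts and password are the demo's values, and it still uses md5() like the demo. Note that PHP's ^ on two strings stops at the shorter one, so the message must be no longer than the 32-character md5 key.

```php
<?php
// Sketch of the demo above, wrapped in two (hypothetical) functions.

// Client side: derive the key from the typed password, XOR the message with it.
function encode_on_client(string $password, string $dbSalt, string $sessionSalt, string $message): string
{
    $key = md5(md5($password . $dbSalt) . $sessionSalt);
    return base64_encode($message ^ $key);   // result is as long as $message
}

// Server side: derive the key from the stored hash; XOR is its own inverse.
function decode_on_server(string $storedHash, string $sessionSalt, string $data): string
{
    $key = md5($storedHash . $sessionSalt);
    return $key ^ base64_decode($data);
}

$stored = md5("magic" . "foo");              // what's in the db
$data = encode_on_client("magic", "foo", "bar", "username:magic");
echo decode_on_server($stored, "bar", $data), "\n"; // username:magic
```

With a wrong password the two keys differ, so the server gets garbage back instead of "username:magic".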

A Totally Simple Text Captcha Class

First draft of a "captcha" class that asks the user for the sum of two numbers. It helps you write code to ask a question like this: "What is the total of five plus three? (answer with a word)". The user should respond with "eight".

How it Works

A captcha needs to "remember" the question that it posed to the user. To do this, the constructor takes an argument that's used as a key, and saves the captcha data to a file in tmp/craptcha/, under a name derived from the key. I suggest using the value of $_SERVER['REMOTE_ADDR'] or the user's username; it needs to be reasonably unique.

When a captcha object is constructed, it tries to read in the saved captcha. If it cannot, it creates a new one and saves it. Once it exists, you can discard it with the clear() method. You should clear it after it's been answered.

The captcha algorithm is simple. It picks two random numbers between 1 and 5, and the answer is their sum. Everything is done in written English, mainly to make it a little harder for bots. The toString() method returns the question as a string like "four plus two". The answerMatches() method compares the user's answer, written in English, to the stored answer (a number). See test_craptcha_class.inc.php, below, for an example of how to use it.

craptcha_class.inc.php:

<?php
	// vim:set ts=4 sw=4 ai:
	/**
	 * A pseudo-captcha class.  It asks a math question and requests
	 * an answer.
	 */
	
	class Craptcha
	{
		var $a;
		var $b;
		var $answer;
		var $file;
		// PHP 8 no longer treats a method named after the class as a constructor
		function __construct( $key )
		{
			$dir = 'tmp/craptcha';
			if (!is_dir($dir))
			{
				mkdir($dir, 0777, true); // also creates tmp/ if it's missing
			}
			// hash the key so it is always a safe file name (no "/" or "..")
			$this->file = $file = "$dir/" . md5($key);
			if (file_exists($file))
			{
				list($this->a,$this->b,$this->answer) = 
					unserialize(file_get_contents($file));
			}
			else
			{
				$this->a = rand(1,5);
				$this->b = rand(1,5);
				$this->answer = $this->a + $this->b;
				$text = serialize(array($this->a,$this->b,$this->answer));
				file_put_contents($file, $text);
			}
		}
		function clear()
		{
			if (file_exists($this->file))
				unlink($this->file);
		}
		function answerMatches( $word )
		{
			// accept "Eight" or " eight " as well as "eight"
			return ($this->numToString( $this->answer) == strtolower(trim($word)));
		}
		function toString()
		{
			return $this->numToString( $this->a ) . ' plus ' .
				   $this->numToString( $this->b );
		}
		function getAnswer()
		{
			return $this->numToString( $this->answer );
		}
		function numToString( $num )
		{
			switch($num)
			{
				case 1: return 'one';
				case 2: return 'two';
				case 3: return 'three';
				case 4: return 'four';
				case 5: return 'five';
				case 6: return 'six';
				case 7: return 'seven';
				case 8: return 'eight';
				case 9: return 'nine';
				case 10: return 'ten';
			}
		}
	}
?>
The following is test_craptcha_class.inc.php

<?php
	// vim:set ts=4 sw=4 ai:
	include('craptcha_class.inc.php');

	$guess = isset($_REQUEST['guess']) ? $_REQUEST['guess'] : '';
	
	$c = new Craptcha( $_SERVER['REMOTE_ADDR'] );

	if ($guess !== '')
	{
		if ($c->answerMatches($guess))
		{
			$c->clear(); // a new question is generated on the next visit
			echo "correct<br>";
			echo "<a href='?'>again</a>";
			exit;
		}
		else
		{
			echo "incorrect<br>";
			echo "<a href='?'>again</a>";
			exit;
		}
	}
	}
?>
<form>
	What is the total of <b><?=$c->toString()?></b>?
	(answer in written English)
	<input type="text" name="guess">
	<input type="submit">
</form>

An Argument Against the Traditional Iterator

Attached are two scripts that contain code for two styles of iterator. One is the traditional Iterator, and the other is what I'll call an IterObj. The Iterator requires separating the iterator object from the objects that the iterator returns.

The IterObj combines the two parts into one. It's an object that also implements the iterator interface. It's "bad" OO design practice, because you're combining concerns, and also possibly creating repetitive code.

Test results indicate that the IterObj takes 1/18th the time to loop over the arrays. So, on a computer that's creating a bunch of iterators and objects, using the IterObj style can help improve performance.

That said, the performance cost of iteration is probably negligible compared to the cost of a database query or disk access.

Think about what the traditional iterator style costs in additional database queries: it's pretty high, because each object the iterator returns typically loads its own row, which means one extra query per item. (See the link.) An IterObj that integrates calls to the database can iterate over the results of a single query.

The main tradeoff is memory -- the database result is held in memory on the database server until the iterator finishes. What you save in CPU and memory on the web server, you pay for on the database server. If they're on the same machine, it may not be an issue.

Code follows for each:

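class Obj {
	public $id;
	function __construct($id)
	{
		$this->id = $id;
	}
}

class Iter {
	// declared up front: dynamic properties are deprecated as of PHP 8.2
	private $list;
	private $__objName;
	function __construct( $array, $objName )
	{
		$this->list = $array;
		$this->__objName = $objName;
	}
	function next()
	{
		$id = current($this->list);
		next($this->list);
		// current() returns false past the end; an id of 0 is still valid
		if ($id !== false)
			return new $this->__objName($id);
		else
			return null;
	}
}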

And the IterObj type:

class IterObj {
	public $id;
	private $list;
	function __construct( $ar )
	{
		$this->list = $ar;
		$this->id = current($this->list);
	}
	function next()
	{
		$this->id = next($this->list);
	}
	function rewind()
	{
		reset($this->list);
		$this->id = current($this->list);
	}
}

http://www.devarticles.com/c/a/PHP/Building-an-Iterator-with-PHP/1/

Attachments:
test.iter_.php.txt (579 bytes)
test.iterobj.php.txt (486 bytes)
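The text above says the IterObj "implements the iterator interface", but the class as written only has its own next() and rewind(). A sketch of the same idea implementing PHP's built-in Iterator interface, so it works with foreach, might look like this (PHP 8 syntax; IterObjIterator is a hypothetical name). As with IterObj, the loop hands back the same object every time and only its $id changes, so no per-item objects are created.

```php
<?php
// Sketch: an IterObj-style class implementing PHP's Iterator interface.
// The object itself is the "current item"; only $this->id changes.
class IterObjIterator implements Iterator
{
    private array $list;
    private int $pos = 0;
    public $id = null;

    public function __construct(array $ids)
    {
        $this->list = array_values($ids);   // normalize to 0..n-1 positions
        $this->rewind();
    }
    public function rewind(): void
    {
        $this->pos = 0;
        $this->id = $this->list[0] ?? null;
    }
    public function valid(): bool
    {
        return $this->pos < count($this->list);
    }
    public function current(): mixed
    {
        return $this;                       // hand back the iterator itself
    }
    public function key(): mixed
    {
        return $this->pos;
    }
    public function next(): void
    {
        $this->pos++;
        $this->id = $this->list[$this->pos] ?? null;
    }
}

$it = new IterObjIterator(array(10, 20, 30));
foreach ($it as $obj) {
    echo $obj->id, "\n";                    // 10, 20, 30 on separate lines
}
```

Because current() returns the iterator itself, code that keeps references to the "items" will see them all change; that's the price of not creating an object per row.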

An Idea for Managing Security Quickly

Just an idea for implementing access control quickly.

You can take existing code, bolt this code on, and then use it to write somewhat readable code with rudimentary security features.

// $GROUPS = array( 'Guest', 'Unconfirmed', 'Approval', 'Active', 'Rejected', 'Admin', 'Moderator' ); 

$ACCESS = array(
	'view_friends' => array( 'group', 'group2', 'group3' ),
	'view_comments' => array( 'group', 'group2', 'group3' ),
	'suspend_user' => array( 'group', 'group2', 'group3' ),
	'use_chatroom' => array( 'group', 'group2', 'group3' ),
	'send_message_as_mail' => array( 'group', 'group2', 'group3' ),
	'ability' => array( 'group', 'group2', 'group3' )
);

function cando( $user, $ability )
{
	// get the user's groups
	// check permissions for an ability
	// return true if their groups are in the ability's array
}
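One way to fill in cando(), as a standalone sketch. It assumes, hypothetically, that $user is an array whose 'groups' key lists the user's group names; the group names come from the commented-out $GROUPS list, and the two-entry $ACCESS array is a sample, not a recommended policy. Unknown abilities are denied.

```php
<?php
// Sketch: sample $ACCESS table (group names from the $GROUPS comment above).
$ACCESS = array(
    'view_comments' => array('Active', 'Moderator', 'Admin'),
    'suspend_user'  => array('Moderator', 'Admin'),
);

// Assumes $user['groups'] is an array of group names (hypothetical shape).
function cando(array $user, string $ability): bool
{
    global $ACCESS;
    if (!isset($ACCESS[$ability])) {
        return false;                       // unknown ability: deny
    }
    // allowed if any of the user's groups is listed for the ability
    return count(array_intersect($user['groups'], $ACCESS[$ability])) > 0;
}

$user = array('name' => 'jo', 'groups' => array('Active'));
var_dump(cando($user, 'view_comments'));    // bool(true)
var_dump(cando($user, 'suspend_user'));     // bool(false)
```

Calls then read like the feature they guard, e.g. `if (cando($user, 'suspend_user')) { ... }`.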

Another Cheap Thumb Gallery Maker

One thing about scripting is that, sometimes, a little fragment of code gets used over and over, even though it's not a full-featured app. This script looks through a directory of specially named files (thumbnails named like photo-400.jpg, each with a larger version named photo-800.jpg) and produces a "photo gallery" from them. Hundreds of scripts like this have been modified to do all kinds of cool things, but this one doesn't do any of those things. The script just gets copied into a directory, and it produces an index of images. Scripts like this won't ever die. They won't get replaced by more elaborate scripts. They've evolved to fit their niche.

<?php // vi:set ts=4 sw=4 ai:

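// collect the thumbnails: files whose names end in "-400.jpg"
$files = array();
$dh = opendir('.');
while( false !== ($d = readdir($dh)) ) // a file named "0" would stop a plain while($d = ...)
{
	if (preg_match('/-400\.jpg$/', $d))
	{
		array_push( $files, $d );
	}
}
closedir($dh);
sort($files);
// URL path of this directory, without "index.php" or a trailing slash
$path = rtrim(preg_replace('/index\.php$/', '', parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH)), '/');
foreach( $files as $d )
{
	$t = preg_replace('/-400\.jpg$/', '-800.jpg', $d); // the full-size image
	list( $width, $height ) = getimagesize( $d );
	$href = htmlspecialchars("$path/$t", ENT_QUOTES);
	$src = htmlspecialchars("$path/$d", ENT_QUOTES);
	echo "<a href='$href' target='_blank'><img src='$src' width='$width' height='$height' border='0' /></a><br /><br />";
}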
?>

Arrays and References Gotcha, Array Passed by Value Behaves Like Array Passed by Reference

In PHP 5 (and later), when you pass an array by value but the array contains references, you can get some subtly weird behavior.

$args = array();
$args['x'] =& $_REQUEST['x'];

echo "<p>$args[x]</p>";

f($args);

echo "<p>$args[x]</p>";

$x = $_REQUEST['x'];

echo "<p>$x</p>";

function f($a)
{
	$a['x'] = 100;
}
Calling it as test.php?x=5 displays: 5 100 100

Here is the same script with a plain assignment instead of =&:


$args = array();
$args['x'] = $_REQUEST['x'];

echo "<p>$args[x]</p>";

f($args);

echo "<p>$args[x]</p>";

$x = $_REQUEST['x'];

echo "<p>$x</p>";

function f($a)
{
	$a['x'] = 100;
}
That displays: 5 5 5

This second example is the "expected" behavior. The array is passed to f() by value, so we expect changes made within f() to remain local to f(). In the first example, however, $args is an array of references: it's an array, but its element 'x' refers to $_REQUEST['x']. When $args is copied as an argument to f(), the copy gets the same reference. Within f(), $a['x'] therefore refers to $_REQUEST['x'], so changes to $a['x'] affect $_REQUEST['x'].

This can cause problems if you are creating objects based on $args, like this:
$args = array();
$args['x'] =& $_REQUEST['x'];
$obj1 = new Obj($args);
$obj2 = new Obj($args);

class Obj {
	var $args;
	function __construct($args) {
		$this->args = $args;
	}
}
In this situation, changes to $obj1->args['x'] will affect $obj2->args['x'], because they refer to the same variable!
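If you need to hand an array like $args to code that might modify it, one workaround is to rebuild the array element by element, since foreach's value variable is a plain copy. A sketch, with copy_values() as a hypothetical helper and a local $request array standing in for $_REQUEST:

```php
<?php
// Sketch: copy an array so the copy holds plain values, not references.
function copy_values(array $src): array
{
    $out = array();
    foreach ($src as $k => $v) {
        $out[$k] = $v;                // $v is a by-value copy of the element
    }
    return $out;
}

$request = array('x' => 5);          // stands in for $_REQUEST
$args = array();
$args['x'] =& $request['x'];         // $args['x'] is a reference

$plain = copy_values($args);
$plain['x'] = 100;
echo $request['x'], "\n";            // 5: the original is untouched

$shared = $args;                     // an ordinary copy keeps the reference
$shared['x'] = 100;
echo $request['x'], "\n";            // 100
```

This only removes references at the top level; a nested array of references would need the same treatment recursively.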

Associative Array (Hash) Tricks in PHP

Suppose you're tallying some data, and you want to know whether a value has shown up in it. You can collect the values in an array and search it with in_array(), like this:

$tally = array();
while( $row = getData() )
    array_push($tally, $row);
if (in_array('theValueIWant', $tally)) // needle first, then the array
    echo "yes";
else
    echo "no";

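There's a faster way: use the values as array keys. in_array() has to scan the array element by element, while looking up a key is a hash-table lookup, so this should be faster on large data sets.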

$tally = array();
while( $row = getData() )
    $tally[$row] = 1; // the value becomes a key; repeats just overwrite it
if (isset($tally['theValueIWant']))
    echo "yes";
else
    echo "no";

We're taking advantage of the hashing feature. This technique is widely used in Perl.

The associative array (hash) is a sophisticated data structure. Think of it as a rapid search table. That's what it is, after all.

To make it even shorter, you can use the ternary operator:

while( $row = getData() ) $tally[$row] = 1;
echo isset($tally['theValueIWant']) ? 'yes' : 'no';

Another trick is to get a list of values, with duplicates removed:

while( $row = getData() ) $tally[$row] = 1;
$result = array_keys($tally);

There you go. No sorts, comparisons, or anything like that. You just load the data into the keys, and duplicate values are eliminated.

Here's how to get a histogram of the data, that is, the number of times each value occurs.

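$tally = array();
while( $row = getData() )
{
    if (!isset($tally[$row])) $tally[$row] = 0;
    $tally[$row]++;
}
ksort($tally); // sorts in place and returns a bool, so print afterwards
print_r($tally);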
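Here is a runnable version of the tricks above, with a plain array standing in for getData() (which the examples leave undefined). It uses isset() for the lookups, which avoids undefined-index notices for values that never appeared. Keep in mind that PHP converts integer-like string keys such as "42" to integers, so array_keys() may hand back ints.

```php
<?php
// Sample data standing in for getData().
$data = array('apple', 'pear', 'apple', 'fig', 'pear', 'apple');

// Membership: load the values into keys, then test with isset().
$seen = array();
foreach ($data as $value) {
    $seen[$value] = 1;
}
echo isset($seen['fig']) ? 'yes' : 'no', "\n";    // yes
echo isset($seen['kiwi']) ? 'yes' : 'no', "\n";   // no

// Unique values: the keys, in first-seen order (apple, pear, fig).
print_r(array_keys($seen));

// Histogram: count occurrences per value, then sort by key.
$tally = array();
foreach ($data as $value) {
    $tally[$value] = isset($tally[$value]) ? $tally[$value] + 1 : 1;
}
ksort($tally);
print_r($tally);                                   // apple => 3, fig => 1, pear => 2
```

PHP also has built-ins for the last two tricks: array_unique() and array_count_values().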

CSV Files, Comma Separated Values

This is the latest version of the CSV class, renamed to work with CakePHP.

The main change is a new feature that lets you save the headings into the object, then parse lines and return them as associative arrays rather than numerically indexed arrays.

Here's some sample code showing how to use it, followed by the class:


$csv = new csvComponent();
$fh = fopen($filename, 'r');
// First, detect the headings -- this feature allows us to put some
// information above the actual data.
while( $data = fgets( $fh ) )
{
	$line = $csv->textToArray( $data );
	if (preg_match('/name/i',$line[0])) break;
}
// tell the csv object about our headings
$csv->setHeadings($line);
// print out the data
while( $data = fgets( $fh ) )
{
	$line = $csv->textToAssoc( $data );
	print_r($line);
	echo "\n\n";
}
fclose($fh);


<?php
define('CSV_TAB',"\t");

class csvComponent {
        var $template;
        var $headings = array();

        /**
         * @param array $template
         *
         * The $template is an array used to selectively quote fields.
         * If the Nth element is 'quote', the Nth field will be quoted.
         * For example, array('','','quote') causes the 3rd field to be
         * quoted.
         */
        function __construct( $template = NULL )
        {
                $this->template = $template;
        }

        function arrayToText( $ar, $separator=',' )
        {
                $row = array();
                reset($ar);
                $count=0;
                foreach($ar as $field)
                {
                        if (isset($this->template[$count]) && $this->template[$count] == 'quote')
                        {
                                $field = '"'.$this->quote($field).'"';
                        }
                        $row[] = $field;
                        $count++;
                }
                return join($separator,$row);
        }

        /**
         * Parses one line of a csv file.
         */
        function &textToArray( $str, $separator=',' )
        {
                $out = array();
                while($str !== '') // while($str) would drop a last field of "0"
                {
                        if (preg_match('/^"/', $str))
                        {
                                if (preg_match("/\"(.+?)\"$separator(.+)$/", $str, $matches))
                                {
                                        $head = $this->dequote($matches[1]);
                                        $str = $matches[2];
                                }
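                                else // assume it's the last element
                                {
                                        // strip the enclosing quotes (and any line ending) before dequoting
                                        $head = $this->dequote(preg_replace('/^"(.*)"\s*$/s', '$1', $str));
                                        $str = '';
                                }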
                        }
                        else
                        {
                                if (preg_match("/^$separator/",$str))
                                {
                                        // this is a special case of a null field
                                        // it's exceptional, because the . metachar matches
                                        // non-whitespace, and our separator might be whitespace
                                        $head = '';
                                        $str = substr($str,1);
                                }
                                else if (preg_match("/(.+?)$separator(.+)$/", $str, $matches))
                                {
                                        $head = $matches[1];
                                        $str = $matches[2];
                                }
                                else // assume it's the last element
                                {
                                        $head = rtrim($str, "\r\n"); // drop the line ending fgets() leaves on
                                        $str = '';
                                }
                        }
                        $out[] = $head;
                }
                return $out;
        }
	function textToAssoc( $text, $separator=',' )
	{
		$arr = $this->textToArray( $text, $separator );
		$output = array();
		for($i=0; $i < count($arr); $i++)
		{
			// fall back to the column number if there are more fields than headings
			$key = isset($this->headings[$i]) ? $this->headings[$i] : $i;
			$output[$key] = $arr[$i];
		}
		return $output;
	}
	function setHeadings( $arr )
	{
		$this->headings = array_values($arr);
	}

        function quote( $s )
        {
                $s = preg_replace('/"/','""', $s);
                return $s;
        }

        function dequote( $s )
        {
                $s = preg_replace('/""/','"', $s);
                return $s;
        }
}
?>
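PHP 5.3 and later ship a CSV line parser, str_getcsv(), which handles quoted fields and doubled quotes. As an alternative to textToArray() plus textToAssoc(), here is a sketch that uses it together with array_combine() to produce associative rows. rows_to_assoc() is a hypothetical name; it takes the first line as the headings (no extra information above the data), and because it splits on line breaks first, it won't handle a quoted field that contains a newline. fgetcsv(), which reads from a file handle, does handle that case.

```php
<?php
// Sketch: CSV text -> list of associative arrays keyed by the first row.
function rows_to_assoc(string $csvText, string $sep = ','): array
{
    $lines = preg_split('/\r\n|\n|\r/', trim($csvText));
    $headings = str_getcsv(array_shift($lines), $sep, '"', '\\');
    $n = count($headings);
    $rows = array();
    foreach ($lines as $line) {
        if ($line === '') {
            continue;                                   // skip blank lines
        }
        $fields = str_getcsv($line, $sep, '"', '\\');
        // make the row exactly as long as the headings for array_combine()
        $fields = array_slice(array_pad($fields, $n, ''), 0, $n);
        $rows[] = array_combine($headings, $fields);
    }
    return $rows;
}

$text = "name,city\n\"Smith, Jo\",Oakland\nLee,\"Said \"\"hi\"\"\"\n";
print_r(rows_to_assoc($text));
```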

Caching Data for AJAX with Javascript

Here's a way to cache data on the client side, via javascript. This was tested on Firefox 3.6.3 on Ubuntu.

The idea is to convert your data into Javascript, and then load it with the SCRIPT tag. You then use the Expires HTTP header to tell the client how long to cache the data. Finally, you use some Javascript code to display the data.

I read that if you POST the page back to the server, the cache is invalidated, and code is reloaded. My browser isn't doing this. GET, on the other hand, doesn't clear the cache. So, the trick is to use AJAX to post data to the site.

This short example shows how to do it.

data.js.php:

<?php
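// serve this file as JavaScript that the browser may cache for 10 seconds;
// HTTP dates must be in GMT
header('Content-Type: application/javascript');
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 10) . ' GMT');
$randchar1 = chr(rand(65,90)); // a random capital letter, A-Z
?>
var data = '<?=$randchar1?>';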

This creates a random character, stores it as data, and sets the document to expire 10 seconds into the future.
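The Expires value has to be an HTTP date, which HTTP/1.1 specifies in GMT; date('r') uses the server's local time zone with a numeric offset instead. A sketch of a helper that builds the header in the HTTP format, plus the equivalent HTTP/1.1 Cache-Control: max-age header (which caches prefer when both are present). cache_headers() is a hypothetical name; the optional $now parameter exists only to make the output predictable.

```php
<?php
// Sketch: cache headers for a response that may be cached for $seconds.
function cache_headers(int $seconds, ?int $now = null): array
{
    $now = $now ?? time();
    return array(
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $seconds) . ' GMT',
        'Cache-Control: max-age=' . $seconds,
    );
}

// In data.js.php you would send them before any output:
// foreach (cache_headers(10) as $h) { header($h); }
print_r(cache_headers(10, 0));   // Expires: Thu, 01 Jan 1970 00:00:10 GMT, ...
```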

code.js:

function showdata()
{
  document.write(data)
}

This just displays the data.

test.html:

<html>
<head>
<script type="text/javascript" src="data.js.php"></script>
<script type="text/javascript" src="code.js"></script>
</head>
<body>
<script type="text/javascript">
showdata();
</script>
<p><a href="test.html">test</a></p>
<form method="post">
<input type="submit">
</form>
</body>
</html>


Now, load up test.html in your browser. (This must be installed on a server running PHP, and you must view the page through the server, e.g. http://localhost/test.html. Don't just open the file.)

Now, start clicking the "test" link at the bottom of the page. You'll see that the data stays the same for 10 seconds, and then changes value.

Try clicking the submit button to see if it causes the cache to expire. Mine didn't.

Note - if your server has the wrong time, the cache behavior will either take too long, or not happen at all.

CakePHP Notes

1. I can't figure out how to use a composite primary key.
2. I can't figure out how to make a table without a primary key.

This is frustrating, because I have a join table that represents attendance at an event, a joining of an event and a person. There's no need for a third "id" column in that table, but adding that extraneous column was the fix.

Character Set Conversion from Latin-1 ISO-8859-1 cp1252 to UTF-8

I swiped this code from php.net.

Character set conversion is one of those things I've avoided over the years: just use UTF-8 from the start. But IMC has thousands of articles stored as BLOBs, so the text is in various character sets. The software in front of the data was using ISO-8859-1, but PHP wasn't really mangling the data; it just passed the bytes through unchanged, until I installed the mbstring extension (or, more accurately, it was baked into PHP). That caused some problems, and it snowballed into converting everything to UTF-8.

There are five dominant character sets used to enter this data: ascii, iso-8859-1 (aka latin-1), windows-cp1252, utf-8, and, from older Macs, MacRoman.

As you probably know, ascii is a subset of the other four, so we can ignore it.

Latin-1 is nominally the charset used in the app, but most users produce data in Windows, and seem to have pasted cp1252 codes into the app. cp1252 is an extension to latin-1 that includes things like curly quotes and em-dashes. Word produces these automatically, so a lot of these characters get pasted into the app. Fortunately, cp1252 puts these extra characters in the range 0x80 to 0x9F, which latin-1 reserves for non-printing control codes, so they don't collide with any printable latin-1 character.

People also pasted UTF-8 encoded text into the app. UTF-8 can encode every latin-1 character, but every character above ascii is stored as a two-byte sequence, so the bytes differ from latin-1's.

So converting the data requires something that converts from cp1252 to UTF-8. PHP's obvious candidate, utf8_encode(), only converts from latin-1 to UTF-8, so cp1252's curly quotes and dashes come out as invisible control characters. Someone wrote fix_latin to deal with this hybrid; the code is below.

fix_latin passes valid UTF-8 sequences through unchanged. That matters, because converting already-encoded UTF-8 a second time would mangle it.

The Mac is a whole other problem. Today it uses UTF-8, so it's OK, but back in the 80s and 90s it used a different character set, MacRoman. Unlike cp1252, which agrees with latin-1 on all of latin-1's printable characters, MacRoman maps the upper 128 codes completely differently. See the mapping at madore.org.

Mac text creates a real problem: how to tell whether a text was produced on a Mac or on Windows. Conversion isn't the problem. Identification is much harder, because I'm not there to look at each article and decide whether it's MacRoman or latin-1. There's a Stack Overflow post about this.

$byte_map=array();
init_byte_map();
$nibble_good_chars = '@^([\x00-\x7F]+|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3}|[\xF8-\xFB][\x80-\xBF]{4})(.*)$@s';
function init_byte_map(){
  global $byte_map;
  for($x=128;$x<256;++$x){
    // same result as utf8_encode(), which is deprecated as of PHP 8.2
    $byte_map[chr($x)]=mb_convert_encoding(chr($x),'UTF-8','ISO-8859-1');
  }
  $cp1252_map=array(
    "\x80"=>"\xE2\x82\xAC",    // EURO SIGN
    "\x82" => "\xE2\x80\x9A",  // SINGLE LOW-9 QUOTATION MARK
    "\x83" => "\xC6\x92",      // LATIN SMALL LETTER F WITH HOOK
    "\x84" => "\xE2\x80\x9E",  // DOUBLE LOW-9 QUOTATION MARK
    "\x85" => "\xE2\x80\xA6",  // HORIZONTAL ELLIPSIS
    "\x86" => "\xE2\x80\xA0",  // DAGGER
    "\x87" => "\xE2\x80\xA1",  // DOUBLE DAGGER
    "\x88" => "\xCB\x86",      // MODIFIER LETTER CIRCUMFLEX ACCENT
    "\x89" => "\xE2\x80\xB0",  // PER MILLE SIGN
    "\x8A" => "\xC5\xA0",      // LATIN CAPITAL LETTER S WITH CARON
    "\x8B" => "\xE2\x80\xB9",  // SINGLE LEFT-POINTING ANGLE QUOTATION MARK
    "\x8C" => "\xC5\x92",      // LATIN CAPITAL LIGATURE OE
    "\x8E" => "\xC5\xBD",      // LATIN CAPITAL LETTER Z WITH CARON
    "\x91" => "\xE2\x80\x98",  // LEFT SINGLE QUOTATION MARK
    "\x92" => "\xE2\x80\x99",  // RIGHT SINGLE QUOTATION MARK
    "\x93" => "\xE2\x80\x9C",  // LEFT DOUBLE QUOTATION MARK
    "\x94" => "\xE2\x80\x9D",  // RIGHT DOUBLE QUOTATION MARK
    "\x95" => "\xE2\x80\xA2",  // BULLET
    "\x96" => "\xE2\x80\x93",  // EN DASH
    "\x97" => "\xE2\x80\x94",  // EM DASH
    "\x98" => "\xCB\x9C",      // SMALL TILDE
    "\x99" => "\xE2\x84\xA2",  // TRADE MARK SIGN
    "\x9A" => "\xC5\xA1",      // LATIN SMALL LETTER S WITH CARON
    "\x9B" => "\xE2\x80\xBA",  // SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
    "\x9C" => "\xC5\x93",      // LATIN SMALL LIGATURE OE
    "\x9E" => "\xC5\xBE",      // LATIN SMALL LETTER Z WITH CARON
    "\x9F" => "\xC5\xB8"       // LATIN CAPITAL LETTER Y WITH DIAERESIS
  );
  foreach($cp1252_map as $k=>$v){
    $byte_map[$k]=$v;
  }
}
function fix_latin($instr){
  if(mb_check_encoding($instr,'UTF-8'))return $instr; // no need for the rest if it's all valid UTF-8 already
  global $nibble_good_chars,$byte_map;
  $outstr='';
  $char='';
  $rest='';
  while((strlen($instr))>0){
    if(1==preg_match($nibble_good_chars,$instr,$match)){
      $char=$match[1];
      $rest=$match[2];
      $outstr.=$char;
    }elseif(1==preg_match('@^(.)(.*)$@s',$instr,$match)){
      $char=$match[1];
      $rest=$match[2];
      $outstr.=$byte_map[$char];
    }
    $instr=$rest;
  }
  return $outstr;
}
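If each stored string is entirely in one encoding (either already-valid UTF-8, or entirely cp1252/latin-1), mbstring can do the conversion itself, since it knows the Windows-1252 table. A sketch, assuming the mbstring extension; to_utf8() is a hypothetical name. Unlike fix_latin, it can't repair a single string that mixes UTF-8 sequences with cp1252 bytes: such a string fails the UTF-8 check, every byte gets converted, and the UTF-8 parts are mangled.

```php
<?php
// Sketch: convert a string that is either valid UTF-8 already or entirely
// Windows-1252 (which covers latin-1's printable characters) to UTF-8.
function to_utf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;                          // already valid UTF-8: leave alone
    }
    return mb_convert_encoding($s, 'UTF-8', 'Windows-1252');
}

$cp1252 = "caf\xE9 \x93quoted\x94";         // é and curly quotes as cp1252 bytes
echo to_utf8($cp1252), "\n";                // café “quoted”
```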

Update: Attached is a script that was used to convert an sf-active article database from a mix of encodings into UTF-8. The code is not quite straightforward, because shoving a bunch of updates into the queue caused an unintentional DOS. The system has a watchdog script, so the db comes back up. The script accounts for this: it logs successes and failures, and checks each field to see whether the change has already been saved. It doesn't throttle its requests, because I figured one DOS out of 160k requests was OK, since the db starts back up in a few minutes.

Attachment: convert.php_.txt (6.17 KB)

Code Generation

This is a list of some advantages to code generation over fully dynamic programming. It might be useful if you ever need to justify spending time writing code generation tools. After working on a fairly elaborate code generator that created simple SQL table editors for MySQL, I've learned a few things.

So why use a generator? Because it unrolls a loop or two, in the right places, leading to faster, less confusing code that partly documents itself. Generated code tends to expose the structure of the application as code. Each chunk of code is specialized, and not responsible for large areas of the application.

Dynamic code tends to be harder to understand, because the behavior of the software can be assessed only at runtime. In a system like PHP, where scripts start up, execute, and exit in a fraction of a second, it's hard to debug dynamic software.
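The "unrolled loop" point can be illustrated with a toy generator. This sketch assumes a hypothetical table description (column name => label) and emits the source of a PHP function with one explicit line per column, instead of a function that loops over the description at runtime. generate_row_printer() and print_person_row() are made-up names; a real generator, like the MySQL table-editor generator mentioned above, would emit much more.

```php
<?php
// Sketch: generate a specialized function from a table description.
// $table must be a valid identifier fragment; labels and column names
// are embedded safely with var_export().
function generate_row_printer(string $table, array $columns): string
{
    $lines = array();
    $lines[] = "function print_{$table}_row(array \$row)";
    $lines[] = "{";
    foreach ($columns as $col => $label) {
        // the loop runs here, at generation time; the output has one line per column
        $lines[] = "    echo " . var_export($label . ': ', true)
                 . " . htmlspecialchars(\$row[" . var_export($col, true) . "]) . \"\\n\";";
    }
    $lines[] = "}";
    return implode("\n", $lines) . "\n";
}

$source = generate_row_printer('person', array('name' => 'Name', 'email' => 'Email'));
echo $source;   // the generated function, ready to save to a .php file
```

The generated print_person_row() contains one `echo 'Name: ' . htmlspecialchars($row['name']) . "\n";` line per column, so reading it tells you exactly what the page shows, with no runtime lookups to trace.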

Code Generation, Frameworks, App Servers (not only PHP)

Here's a list of things to check out.

PHP Frameworks comparison site.

J2EE Tutorial

CakePHP

Qcodo

Ruby on Rails

Fusebox

AOL Server (TCL app server)

OK Web Server

Codeigniter

WACT

Catalyst

Python Frameworks.

Java Struts

Zend Framework

Class::DBI

Commenting Trick

This is really simple. You might comment out code like this:

/*
 $thiscode = $is + $not + $interpreted;
*/

To use that chunk of code, you'd remove the comment markers. That takes two deletions, and commenting it out again means typing four characters. Here's an alternative:

/*
 $thiscode = $is + $not + $interpreted;
// */

The double slash on the last line does nothing, because it's inside the /* */ comment block; only the */ matters. To uncomment the code, add one slash to the first line. //* is then just a line comment, and so is // */:

//*
 $thiscode = $is + $interpreted;
// */

Common MIME Types, in PHP

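This is based on a list of common MIME types that's been posted around the web, converted to PHP array format. The long list can't be used as-is because of its repeated keys: PHP doesn't complain about a repeated key in an array literal, it just keeps the last value, so most of the alternatives are silently lost. The long list is followed by a shorter sublist of popular formats you're likely to need.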
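A sketch of how either list might be used: a small stand-in map (including one repeated key, to show that PHP silently keeps the last value) and a hypothetical mime_for() helper that looks up a file name's extension, lower-cased, falling back to application/octet-stream. To detect a type from a file's contents instead of its name, the fileinfo extension's finfo_file() does that.

```php
<?php
// Stand-in for the lists below; keys include the leading dot.
$map = array(
    '.au'  => 'audio/basic',
    '.au'  => 'audio/x-au',    // repeated key: this value replaces the one above
    '.jpg' => 'image/jpeg',
    '.css' => 'text/css',
);

// Hypothetical helper: MIME type for a file name, by extension.
function mime_for(string $filename, array $map, string $default = 'application/octet-stream'): string
{
    $ext = '.' . strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return isset($map[$ext]) ? $map[$ext] : $default;
}

echo count($map), "\n";                   // 3
echo mime_for('Photo.JPG', $map), "\n";   // image/jpeg
echo mime_for('notes.xyz', $map), "\n";   // application/octet-stream
```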

$mime_list = array(
	'.3dm'=>'x-world/x-3dmf',
	'.3dmf'=>'x-world/x-3dmf',
	'.a'=>'application/octet-stream',
	'.aab'=>'application/x-authorware-bin',
	'.aam'=>'application/x-authorware-map',
	'.aas'=>'application/x-authorware-seg',
	'.abc'=>'text/vnd.abc',
	'.acgi'=>'text/html',
	'.afl'=>'video/animaflex',
	'.ai'=>'application/postscript',
	'.aif'=>'audio/aiff',
	'.aif'=>'audio/x-aiff',
	'.aifc'=>'audio/aiff',
	'.aifc'=>'audio/x-aiff',
	'.aiff'=>'audio/aiff',
	'.aiff'=>'audio/x-aiff',
	'.aim'=>'application/x-aim',
	'.aip'=>'text/x-audiosoft-intra',
	'.ani'=>'application/x-navi-animation',
	'.aos'=>'application/x-nokia-9000-communicator-add-on-software',
	'.aps'=>'application/mime',
	'.arc'=>'application/octet-stream',
	'.arj'=>'application/arj',
	'.arj'=>'application/octet-stream',
	'.art'=>'image/x-jg',
	'.asf'=>'video/x-ms-asf',
	'.asm'=>'text/x-asm',
	'.asp'=>'text/asp',
	'.asx'=>'application/x-mplayer2',
	'.asx'=>'video/x-ms-asf',
	'.asx'=>'video/x-ms-asf-plugin',
	'.au'=>'audio/basic',
	'.au'=>'audio/x-au',
	'.avi'=>'application/x-troff-msvideo',
	'.avi'=>'video/avi',
	'.avi'=>'video/msvideo',
	'.avi'=>'video/x-msvideo',
	'.avs'=>'video/avs-video',
	'.bcpio'=>'application/x-bcpio',
	'.bin'=>'application/mac-binary',
	'.bin'=>'application/macbinary',
	'.bin'=>'application/octet-stream',
	'.bin'=>'application/x-binary',
	'.bin'=>'application/x-macbinary',
	'.bm'=>'image/bmp',
	'.bmp'=>'image/bmp',
	'.bmp'=>'image/x-windows-bmp',
	'.boo'=>'application/book',
	'.book'=>'application/book',
	'.boz'=>'application/x-bzip2',
	'.bsh'=>'application/x-bsh',
	'.bz'=>'application/x-bzip',
	'.bz2'=>'application/x-bzip2',
	'.c'=>'text/plain',
	'.c'=>'text/x-c',
	'.c++'=>'text/plain',
	'.cat'=>'application/vnd.ms-pki.seccat',
	'.cc'=>'text/plain',
	'.cc'=>'text/x-c',
	'.ccad'=>'application/clariscad',
	'.cco'=>'application/x-cocoa',
	'.cdf'=>'application/cdf',
	'.cdf'=>'application/x-cdf',
	'.cdf'=>'application/x-netcdf',
	'.cer'=>'application/pkix-cert',
	'.cer'=>'application/x-x509-ca-cert',
	'.cha'=>'application/x-chat',
	'.chat'=>'application/x-chat',
	'.class'=>'application/java',
	'.class'=>'application/java-byte-code',
	'.class'=>'application/x-java-class',
	'.com'=>'application/octet-stream',
	'.com'=>'text/plain',
	'.conf'=>'text/plain',
	'.cpio'=>'application/x-cpio',
	'.cpp'=>'text/x-c',
	'.cpt'=>'application/mac-compactpro',
	'.cpt'=>'application/x-compactpro',
	'.cpt'=>'application/x-cpt',
	'.crl'=>'application/pkcs-crl',
	'.crl'=>'application/pkix-crl',
	'.crt'=>'application/pkix-cert',
	'.crt'=>'application/x-x509-ca-cert',
	'.crt'=>'application/x-x509-user-cert',
	'.csh'=>'application/x-csh',
	'.csh'=>'text/x-script.csh',
	'.css'=>'application/x-pointplus',
	'.css'=>'text/css',
	'.cxx'=>'text/plain',
	'.dcr'=>'application/x-director',
	'.deepv'=>'application/x-deepv',
	'.def'=>'text/plain',
	'.der'=>'application/x-x509-ca-cert',
	'.dif'=>'video/x-dv',
	'.dir'=>'application/x-director',
	'.dl'=>'video/dl',
	'.dl'=>'video/x-dl',
	'.doc'=>'application/msword',
	'.dot'=>'application/msword',
	'.dp'=>'application/commonground',
	'.drw'=>'application/drafting',
	'.dump'=>'application/octet-stream',
	'.dv'=>'video/x-dv',
	'.dvi'=>'application/x-dvi',
	'.dwf'=>'drawing/x-dwf', // old
	'.dwf'=>'model/vnd.dwf',
	'.dwg'=>'application/acad',
	'.dwg'=>'image/vnd.dwg',
	'.dwg'=>'image/x-dwg',
	'.dxf'=>'application/dxf',
	'.dxf'=>'image/vnd.dwg',
	'.dxf'=>'image/x-dwg',
	'.dxr'=>'application/x-director',
	'.el'=>'text/x-script.elisp',
	'.elc'=>'application/x-bytecode.elisp', // compiled elisp
	'.elc'=>'application/x-elc',
	'.env'=>'application/x-envoy',
	'.eps'=>'application/postscript',
	'.es'=>'application/x-esrehber',
	'.etx'=>'text/x-setext',
	'.evy'=>'application/envoy',
	'.evy'=>'application/x-envoy',
	'.exe'=>'application/octet-stream',
	'.f'=>'text/plain',
	'.f'=>'text/x-fortran',
	'.f77'=>'text/x-fortran',
	'.f90'=>'text/plain',
	'.f90'=>'text/x-fortran',
	'.fdf'=>'application/vnd.fdf',
	'.fif'=>'application/fractals',
	'.fif'=>'image/fif',
	'.fli'=>'video/fli',
	'.fli'=>'video/x-fli',
	'.flo'=>'image/florian',
	'.flx'=>'text/vnd.fmi.flexstor',
	'.fmf'=>'video/x-atomic3d-feature',
	'.for'=>'text/plain',
	'.for'=>'text/x-fortran',
	'.fpx'=>'image/vnd.fpx',
	'.fpx'=>'image/vnd.net-fpx',
	'.frl'=>'application/freeloader',
	'.funk'=>'audio/make',
	'.g'=>'text/plain',
	'.g3'=>'image/g3fax',
	'.gif'=>'image/gif',
	'.gl'=>'video/gl',
	'.gl'=>'video/x-gl',
	'.gsd'=>'audio/x-gsm',
	'.gsm'=>'audio/x-gsm',
	'.gsp'=>'application/x-gsp',
	'.gss'=>'application/x-gss',
	'.gtar'=>'application/x-gtar',
	'.gz'=>'application/x-compressed',
	'.gz'=>'application/x-gzip',
	'.gzip'=>'application/x-gzip',
	'.gzip'=>'multipart/x-gzip',
	'.h'=>'text/plain',
	'.h'=>'text/x-h',
	'.hdf'=>'application/x-hdf',
	'.help'=>'application/x-helpfile',
	'.hgl'=>'application/vnd.hp-hpgl',
	'.hh'=>'text/plain',
	'.hh'=>'text/x-h',
	'.hlb'=>'text/x-script',
	'.hlp'=>'application/hlp',
	'.hlp'=>'application/x-helpfile',
	'.hlp'=>'application/x-winhelp',
	'.hpg'=>'application/vnd.hp-hpgl',
	'.hpgl'=>'application/vnd.hp-hpgl',
	'.hqx'=>'application/binhex',
	'.hqx'=>'application/binhex4',
	'.hqx'=>'application/mac-binhex',
	'.hqx'=>'application/mac-binhex40',
	'.hqx'=>'application/x-binhex40',
	'.hqx'=>'application/x-mac-binhex40',
	'.hta'=>'application/hta',
	'.htc'=>'text/x-component',
	'.htm'=>'text/html',
	'.html'=>'text/html',
	'.htmls'=>'text/html',
	'.htt'=>'text/webviewhtml',
	'.htx'=>'text/html',
	'.ice'=>'x-conference/x-cooltalk',
	'.ico'=>'image/x-icon',
	'.idc'=>'text/plain',
	'.ief'=>'image/ief',
	'.iefs'=>'image/ief',
	'.iges'=>'application/iges',
	'.iges'=>'model/iges',
	'.igs'=>'application/iges',
	'.igs'=>'model/iges',
	'.ima'=>'application/x-ima',
	'.imap'=>'application/x-httpd-imap',
	'.inf'=>'application/inf',
	'.ins'=>'application/x-internett-signup',
	'.ip'=>'application/x-ip2',
	'.isu'=>'video/x-isvideo',
	'.it'=>'audio/it',
	'.iv'=>'application/x-inventor',
	'.ivr'=>'i-world/i-vrml',
	'.ivy'=>'application/x-livescreen',
	'.jam'=>'audio/x-jam',
	'.jav'=>'text/plain',
	'.jav'=>'text/x-java-source',
	'.java'=>'text/plain',
	'.java'=>'text/x-java-source',
	'.jcm'=>'application/x-java-commerce',
	'.jfif'=>'image/jpeg',
	'.jfif'=>'image/pjpeg',
	'.jfif-tbnl'=>'image/jpeg',
	'.jpe'=>'image/jpeg',
	'.jpe'=>'image/pjpeg',
	'.jpeg'=>'image/jpeg',
	'.jpeg'=>'image/pjpeg',
	'.jpg'=>'image/jpeg',
	'.jpg'=>'image/pjpeg',
	'.jps'=>'image/x-jps',
	'.js'=>'application/x-javascript',
	'.jut'=>'image/jutvision',
	'.kar'=>'audio/midi',
	'.kar'=>'music/x-karaoke',
	'.ksh'=>'application/x-ksh',
	'.ksh'=>'text/x-script.ksh',
	'.la'=>'audio/nspaudio',
	'.la'=>'audio/x-nspaudio',
	'.lam'=>'audio/x-liveaudio',
	'.latex'=>'application/x-latex',
	'.lha'=>'application/lha',
	'.lha'=>'application/octet-stream',
	'.lha'=>'application/x-lha',
	'.lhx'=>'application/octet-stream',
	'.list'=>'text/plain',
	'.lma'=>'audio/nspaudio',
	'.lma'=>'audio/x-nspaudio',
	'.log'=>'text/plain',
	'.lsp'=>'application/x-lisp',
	'.lsp'=>'text/x-script.lisp',
	'.lst'=>'text/plain',
	'.lsx'=>'text/x-la-asf',
	'.ltx'=>'application/x-latex',
	'.lzh'=>'application/octet-stream',
	'.lzh'=>'application/x-lzh',
	'.lzx'=>'application/lzx',
	'.lzx'=>'application/octet-stream',
	'.lzx'=>'application/x-lzx',
	'.m'=>'text/plain',
	'.m'=>'text/x-m',
	'.m1v'=>'video/mpeg',
	'.m2a'=>'audio/mpeg',
	'.m2v'=>'video/mpeg',
	'.m3u'=>'audio/x-mpequrl',
	'.man'=>'application/x-troff-man',
	'.map'=>'application/x-navimap',
	'.mar'=>'text/plain',
	'.mbd'=>'application/mbedlet',
	'.mc$'=>'application/x-magic-cap-package-1.0',
	'.mcd'=>'application/mcad',
	'.mcd'=>'application/x-mathcad',
	'.mcf'=>'image/vasa',
	'.mcf'=>'text/mcf',
	'.mcp'=>'application/netmc',
	'.me'=>'application/x-troff-me',
	'.mht'=>'message/rfc822',
	'.mhtml'=>'message/rfc822',
	'.mid'=>'application/x-midi',
	'.mid'=>'audio/midi',
	'.mid'=>'audio/x-mid',
	'.mid'=>'audio/x-midi',
	'.mid'=>'music/crescendo',
	'.mid'=>'x-music/x-midi',
	'.midi'=>'application/x-midi',
	'.midi'=>'audio/midi',
	'.midi'=>'audio/x-mid',
	'.midi'=>'audio/x-midi',
	'.midi'=>'music/crescendo',
	'.midi'=>'x-music/x-midi',
	'.mif'=>'application/x-frame',
	'.mif'=>'application/x-mif',
	'.mime'=>'message/rfc822',
	'.mime'=>'www/mime',
	'.mjf'=>'audio/x-vnd.audioexplosion.mjuicemediafile',
	'.mjpg'=>'video/x-motion-jpeg',
	'.mm'=>'application/base64',
	'.mm'=>'application/x-meme',
	'.mme'=>'application/base64',
	'.mod'=>'audio/mod',
	'.mod'=>'audio/x-mod',
	'.moov'=>'video/quicktime',
	'.mov'=>'video/quicktime',
	'.movie'=>'video/x-sgi-movie',
	'.mp2'=>'audio/mpeg',
	'.mp2'=>'audio/x-mpeg',
	'.mp2'=>'video/mpeg',
	'.mp2'=>'video/x-mpeg',
	'.mp2'=>'video/x-mpeq2a',
	'.mp3'=>'audio/mpeg3',
	'.mp3'=>'audio/x-mpeg-3',
	'.mp3'=>'video/mpeg',
	'.mp3'=>'video/x-mpeg',
	'.mpa'=>'audio/mpeg',
	'.mpa'=>'video/mpeg',
	'.mpc'=>'application/x-project',
	'.mpe'=>'video/mpeg',
	'.mpeg'=>'video/mpeg',
	'.mpg'=>'audio/mpeg',
	'.mpg'=>'video/mpeg',
	'.mpga'=>'audio/mpeg',
	'.mpp'=>'application/vnd.ms-project',
	'.mpt'=>'application/x-project',
	'.mpv'=>'application/x-project',
	'.mpx'=>'application/x-project',
	'.mrc'=>'application/marc',
	'.ms'=>'application/x-troff-ms',
	'.mv'=>'video/x-sgi-movie',
	'.my'=>'audio/make',
	'.mzz'=>'application/x-vnd.audioexplosion.mzz',
	'.nap'=>'image/naplps',
	'.naplps'=>'image/naplps',
	'.nc'=>'application/x-netcdf',
	'.ncm'=>'application/vnd.nokia.configuration-message',
	'.nif'=>'image/x-niff',
	'.niff'=>'image/x-niff',
	'.nix'=>'application/x-mix-transfer',
	'.nsc'=>'application/x-conference',
	'.nvd'=>'application/x-navidoc',
	'.o'=>'application/octet-stream',
	'.oda'=>'application/oda',
	'.omc'=>'application/x-omc',
	'.omcd'=>'application/x-omcdatamaker',
	'.omcr'=>'application/x-omcregerator',
	'.p'=>'text/x-pascal',
	'.p10'=>'application/pkcs10',
	'.p10'=>'application/x-pkcs10',
	'.p12'=>'application/pkcs-12',
	'.p12'=>'application/x-pkcs12',
	'.p7a'=>'application/x-pkcs7-signature',
	'.p7c'=>'application/pkcs7-mime',
	'.p7c'=>'application/x-pkcs7-mime',
	'.p7m'=>'application/pkcs7-mime',
	'.p7m'=>'application/x-pkcs7-mime',
	'.p7r'=>'application/x-pkcs7-certreqresp',
	'.p7s'=>'application/pkcs7-signature',
	'.part'=>'application/pro_eng',
	'.pas'=>'text/pascal',
	'.pbm'=>'image/x-portable-bitmap',
	'.pcl'=>'application/vnd.hp-pcl',
	'.pcl'=>'application/x-pcl',
	'.pct'=>'image/x-pict',
	'.pcx'=>'image/x-pcx',
	'.pdb'=>'chemical/x-pdb',
	'.pdf'=>'application/pdf',
	'.pfunk'=>'audio/make',
	'.pfunk'=>'audio/make.my.funk',
	'.pgm'=>'image/x-portable-graymap',
	'.pgm'=>'image/x-portable-greymap',
	'.pic'=>'image/pict',
	'.pict'=>'image/pict',
	'.pkg'=>'application/x-newton-compatible-pkg',
	'.pko'=>'application/vnd.ms-pki.pko',
	'.pl'=>'text/plain',
	'.pl'=>'text/x-script.perl',
	'.plx'=>'application/x-pixclscript',
	'.pm'=>'image/x-xpixmap',
	'.pm'=>'text/x-script.perl-module',
	'.pm4'=>'application/x-pagemaker',
	'.pm5'=>'application/x-pagemaker',
	'.png'=>'image/png',
	'.pnm'=>'application/x-portable-anymap',
	'.pnm'=>'image/x-portable-anymap',
	'.pot'=>'application/mspowerpoint',
	'.pot'=>'application/vnd.ms-powerpoint',
	'.pov'=>'model/x-pov',
	'.ppa'=>'application/vnd.ms-powerpoint',
	'.ppm'=>'image/x-portable-pixmap',
	'.pps'=>'application/mspowerpoint',
	'.pps'=>'application/vnd.ms-powerpoint',
	'.ppt'=>'application/mspowerpoint',
	'.ppt'=>'application/powerpoint',
	'.ppt'=>'application/vnd.ms-powerpoint',
	'.ppt'=>'application/x-mspowerpoint',
	'.ppz'=>'application/mspowerpoint',
	'.pre'=>'application/x-freelance',
	'.prt'=>'application/pro_eng',
	'.ps'=>'application/postscript',
	'.psd'=>'application/octet-stream',
	'.pvu'=>'paleovu/x-pv',
	'.pwz'=>'application/vnd.ms-powerpoint',
	'.py'=>'text/x-script.python',
	'.pyc'=>'application/x-bytecode.python',
	'.qcp'=>'audio/vnd.qcelp',
	'.qd3'=>'x-world/x-3dmf',
	'.qd3d'=>'x-world/x-3dmf',
	'.qif'=>'image/x-quicktime',
	'.qt'=>'video/quicktime',
	'.qtc'=>'video/x-qtc',
	'.qti'=>'image/x-quicktime',
	'.qtif'=>'image/x-quicktime',
	'.ra'=>'audio/x-pn-realaudio',
	'.ra'=>'audio/x-pn-realaudio-plugin',
	'.ra'=>'audio/x-realaudio',
	'.ram'=>'audio/x-pn-realaudio',
	'.ras'=>'application/x-cmu-raster',
	'.ras'=>'image/cmu-raster',
	'.ras'=>'image/x-cmu-raster',
	'.rast'=>'image/cmu-raster',
	'.rexx'=>'text/x-script.rexx',
	'.rf'=>'image/vnd.rn-realflash',
	'.rgb'=>'image/x-rgb',
	'.rm'=>'application/vnd.rn-realmedia',
	'.rm'=>'audio/x-pn-realaudio',
	'.rmi'=>'audio/mid',
	'.rmm'=>'audio/x-pn-realaudio',
	'.rmp'=>'audio/x-pn-realaudio',
	'.rmp'=>'audio/x-pn-realaudio-plugin',
	'.rng'=>'application/ringing-tones',
	'.rng'=>'application/vnd.nokia.ringing-tone',
	'.rnx'=>'application/vnd.rn-realplayer',
	'.roff'=>'application/x-troff',
	'.rp'=>'image/vnd.rn-realpix',
	'.rpm'=>'audio/x-pn-realaudio-plugin',
	'.rt'=>'text/richtext',
	'.rt'=>'text/vnd.rn-realtext',
	'.rtf'=>'application/rtf',
	'.rtf'=>'application/x-rtf',
	'.rtf'=>'text/richtext',
	'.rtx'=>'application/rtf',
	'.rtx'=>'text/richtext',
	'.rv'=>'video/vnd.rn-realvideo',
	'.s'=>'text/x-asm',
	'.s3m'=>'audio/s3m',
	'.saveme'=>'application/octet-stream',
	'.sbk'=>'application/x-tbook',
	'.scm'=>'application/x-lotusscreencam',
	'.scm'=>'text/x-script.guile',
	'.scm'=>'text/x-script.scheme',
	'.scm'=>'video/x-scm',
	'.sdml'=>'text/plain',
	'.sdp'=>'application/sdp',
	'.sdp'=>'application/x-sdp',
	'.sdr'=>'application/sounder',
	'.sea'=>'application/sea',
	'.sea'=>'application/x-sea',
	'.set'=>'application/set',
	'.sgm'=>'text/sgml',
	'.sgm'=>'text/x-sgml',
	'.sgml'=>'text/sgml',
	'.sgml'=>'text/x-sgml',
	'.sh'=>'application/x-bsh',
	'.sh'=>'application/x-sh',
	'.sh'=>'application/x-shar',
	'.sh'=>'text/x-script.sh',
	'.shar'=>'application/x-bsh',
	'.shar'=>'application/x-shar',
	'.shtml'=>'text/html',
	'.shtml'=>'text/x-server-parsed-html',
	'.sid'=>'audio/x-psid',
	'.sit'=>'application/x-sit',
	'.sit'=>'application/x-stuffit',
	'.skd'=>'application/x-koan',
	'.skm'=>'application/x-koan',
	'.skp'=>'application/x-koan',
	'.skt'=>'application/x-koan',
	'.sl'=>'application/x-seelogo',
	'.smi'=>'application/smil',
	'.smil'=>'application/smil',
	'.snd'=>'audio/basic',
	'.snd'=>'audio/x-adpcm',
	'.sol'=>'application/solids',
	'.spc'=>'application/x-pkcs7-certificates',
	'.spc'=>'text/x-speech',
	'.spl'=>'application/futuresplash',
	'.spr'=>'application/x-sprite',
	'.sprite'=>'application/x-sprite',
	'.src'=>'application/x-wais-source',
	'.ssi'=>'text/x-server-parsed-html',
	'.ssm'=>'application/streamingmedia',
	'.sst'=>'application/vnd.ms-pki.certstore',
	'.step'=>'application/step',
	'.stl'=>'application/sla',
	'.stl'=>'application/vnd.ms-pki.stl',
	'.stl'=>'application/x-navistyle',
	'.stp'=>'application/step',
	'.sv4cpio'=>'application/x-sv4cpio',
	'.sv4crc'=>'application/x-sv4crc',
	'.svf'=>'image/vnd.dwg',
	'.svf'=>'image/x-dwg',
	'.svr'=>'application/x-world',
	'.svr'=>'x-world/x-svr',
	'.swf'=>'application/x-shockwave-flash',
	'.t'=>'application/x-troff',
	'.talk'=>'text/x-speech',
	'.tar'=>'application/x-tar',
	'.tbk'=>'application/toolbook',
	'.tbk'=>'application/x-tbook',
	'.tcl'=>'application/x-tcl',
	'.tcl'=>'text/x-script.tcl',
	'.tcsh'=>'text/x-script.tcsh',
	'.tex'=>'application/x-tex',
	'.texi'=>'application/x-texinfo',
	'.texinfo'=>'application/x-texinfo',
	'.text'=>'application/plain',
	'.text'=>'text/plain',
	'.tgz'=>'application/gnutar',
	'.tgz'=>'application/x-compressed',
	'.tif'=>'image/tiff',
	'.tif'=>'image/x-tiff',
	'.tiff'=>'image/tiff',
	'.tiff'=>'image/x-tiff',
	'.tr'=>'application/x-troff',
	'.tsi'=>'audio/tsp-audio',
	'.tsp'=>'application/dsptype',
	'.tsp'=>'audio/tsplayer',
	'.tsv'=>'text/tab-separated-values',
	'.turbot'=>'image/florian',
	'.txt'=>'text/plain',
	'.uil'=>'text/x-uil',
	'.uni'=>'text/uri-list',
	'.unis'=>'text/uri-list',
	'.unv'=>'application/i-deas',
	'.uri'=>'text/uri-list',
	'.uris'=>'text/uri-list',
	'.ustar'=>'application/x-ustar',
	'.ustar'=>'multipart/x-ustar',
	'.uu'=>'application/octet-stream',
	'.uu'=>'text/x-uuencode',
	'.uue'=>'text/x-uuencode',
	'.vcd'=>'application/x-cdlink',
	'.vcs'=>'text/x-vcalendar',
	'.vda'=>'application/vda',
	'.vdo'=>'video/vdo',
	'.vew'=>'application/groupwise',
	'.viv'=>'video/vivo',
	'.viv'=>'video/vnd.vivo',
	'.vivo'=>'video/vivo',
	'.vivo'=>'video/vnd.vivo',
	'.vmd'=>'application/vocaltec-media-desc',
	'.vmf'=>'application/vocaltec-media-file',
	'.voc'=>'audio/voc',
	'.voc'=>'audio/x-voc',
	'.vos'=>'video/vosaic',
	'.vox'=>'audio/voxware',
	'.vqe'=>'audio/x-twinvq-plugin',
	'.vqf'=>'audio/x-twinvq',
	'.vql'=>'audio/x-twinvq-plugin',
	'.vrml'=>'application/x-vrml',
	'.vrml'=>'model/vrml',
	'.vrml'=>'x-world/x-vrml',
	'.vrt'=>'x-world/x-vrt',
	'.vsd'=>'application/x-visio',
	'.vst'=>'application/x-visio',
	'.vsw'=>'application/x-visio',
	'.w60'=>'application/wordperfect6.0',
	'.w61'=>'application/wordperfect6.1',
	'.w6w'=>'application/msword',
	'.wav'=>'audio/wav',
	'.wav'=>'audio/x-wav',
	'.wb1'=>'application/x-qpro',
	'.wbmp'=>'image/vnd.wap.wbmp',
	'.web'=>'application/vnd.xara',
	'.wiz'=>'application/msword',
	'.wk1'=>'application/x-123',
	'.wmf'=>'windows/metafile',
	'.wml'=>'text/vnd.wap.wml',
	'.wmlc'=>'application/vnd.wap.wmlc',
	'.wmls'=>'text/vnd.wap.wmlscript',
	'.wmlsc'=>'application/vnd.wap.wmlscriptc',
	'.word'=>'application/msword',
	'.wp'=>'application/wordperfect',
	'.wp5'=>'application/wordperfect',
	'.wp5'=>'application/wordperfect6.0',
	'.wp6'=>'application/wordperfect',
	'.wpd'=>'application/wordperfect',
	'.wpd'=>'application/x-wpwin',
	'.wq1'=>'application/x-lotus',
	'.wri'=>'application/mswrite',
	'.wri'=>'application/x-wri',
	'.wrl'=>'application/x-world',
	'.wrl'=>'model/vrml',
	'.wrl'=>'x-world/x-vrml',
	'.wrz'=>'model/vrml',
	'.wrz'=>'x-world/x-vrml',
	'.wsc'=>'text/scriplet',
	'.wsrc'=>'application/x-wais-source',
	'.wtk'=>'application/x-wintalk',
	'.xbm'=>'image/x-xbitmap',
	'.xbm'=>'image/x-xbm',
	'.xbm'=>'image/xbm',
	'.xdr'=>'video/x-amt-demorun',
	'.xgz'=>'xgl/drawing',
	'.xif'=>'image/vnd.xiff',
	'.xl'=>'application/excel',
	'.xla'=>'application/excel',
	'.xla'=>'application/x-excel',
	'.xla'=>'application/x-msexcel',
	'.xlb'=>'application/excel',
	'.xlb'=>'application/vnd.ms-excel',
	'.xlb'=>'application/x-excel',
	'.xlc'=>'application/excel',
	'.xlc'=>'application/vnd.ms-excel',
	'.xlc'=>'application/x-excel',
	'.xld'=>'application/excel',
	'.xld'=>'application/x-excel',
	'.xlk'=>'application/excel',
	'.xlk'=>'application/x-excel',
	'.xll'=>'application/excel',
	'.xll'=>'application/vnd.ms-excel',
	'.xll'=>'application/x-excel',
	'.xlm'=>'application/excel',
	'.xlm'=>'application/vnd.ms-excel',
	'.xlm'=>'application/x-excel',
	'.xls'=>'application/excel',
	'.xls'=>'application/vnd.ms-excel',
	'.xls'=>'application/x-excel',
	'.xls'=>'application/x-msexcel',
	'.xlt'=>'application/excel',
	'.xlt'=>'application/x-excel',
	'.xlv'=>'application/excel',
	'.xlv'=>'application/x-excel',
	'.xlw'=>'application/excel',
	'.xlw'=>'application/vnd.ms-excel',
	'.xlw'=>'application/x-excel',
	'.xlw'=>'application/x-msexcel',
	'.xm'=>'audio/xm',
	'.xml'=>'application/xml',
	'.xml'=>'text/xml',
	'.xmz'=>'xgl/movie',
	'.xpix'=>'application/x-vnd.ls-xpix',
	'.xpm'=>'image/x-xpixmap',
	'.xpm'=>'image/xpm',
	'.x-png'=>'image/png',
	'.xsr'=>'video/x-amt-showrun',
	'.xwd'=>'image/x-xwd',
	'.xwd'=>'image/x-xwindowdump',
	'.xyz'=>'chemical/x-pdb',
	'.z'=>'application/x-compress',
	'.z'=>'application/x-compressed',
	'.zip'=>'application/x-compressed',
	'.zip'=>'application/x-zip-compressed',
	'.zip'=>'application/zip',
	'.zip'=>'multipart/x-zip',
	'.zoo'=>'application/octet-stream',
	'.zsh'=>'text/x-script.zsh'
);
$mime_list = array(
	'asf'=>'video/x-ms-asf',
	'avi'=>'video/avi',
	'bz2'=>'application/x-bzip2',
	'doc'=>'application/msword',
	'gz'=>'application/x-gzip',
	'gzip'=>'application/x-gzip',
	'htm'=>'text/html',
	'html'=>'text/html',
	'jpe'=>'image/jpeg',
	'jpeg'=>'image/jpeg',
	'jpg'=>'image/jpeg',
	'js'=>'application/x-javascript',
	'mov'=>'video/quicktime',
	'mp3'=>'audio/mpeg3',
	'mpeg'=>'video/mpeg',
	'mpg'=>'video/mpeg',
	'pdf'=>'application/pdf',
	'swf'=>'application/x-shockwave-flash',
	'tgz'=>'application/x-compressed',
	'tif'=>'image/tiff',
	'tiff'=>'image/tiff',
	'txt'=>'text/plain',
	'xls'=>'application/excel',
	'zip'=>'application/x-compressed'
);
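Two practical notes on these arrays. PHP keeps only the last value when an array literal repeats a key. In the long list, each extension therefore maps to its last listed type; for example, '.zip' ends up as 'multipart/x-zip'. Also, the keys in $mime_list have no leading dot, while the keys in the long list do. The sketch below shows a lookup by file extension. mime_type_for() is a hypothetical helper, not a PHP built-in, and the 'application/octet-stream' fallback is an assumption.

```php
<?php
// Repeated keys collapse: only the last value survives.
$demo = array('.zip' => 'application/zip', '.zip' => 'multipart/x-zip');
// count($demo) is 1, and $demo['.zip'] is 'multipart/x-zip'

// Hypothetical helper: look up a MIME type by file extension.
// Expects keys without a leading dot, like $mime_list.
function mime_type_for($filename, array $types, $default = 'application/octet-stream')
{
    // 'Report.PDF' -> 'pdf'; a name with no extension gives ''
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return isset($types[$ext]) ? $types[$ext] : $default;
}
```

For example, mime_type_for('Report.PDF', $mime_list) would return 'application/pdf'. Only the last extension counts, so 'archive.tar.gz' looks up 'gz'.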

Data and Input Validation with Filters (PHP 5.2 and up)

For years (and years) we've done data validation in PHP "by hand", either with string functions or with regular expressions. The problem is that, well, PHP programmers aren't so great at regexes. Also, unlike Perl, PHP has no built-in data tainting feature that forces you to validate your inputs before they're used in expressions. (Correction: there is a tainting extension, though it isn't part of core PHP.) The upshot is that a lot of bad data gets through. PHP finally added some data validation functions, but it seems like nobody is using them. Tainting still isn't in the core, but the validation problem, at least, we can address.

I suspect people don't use the built-in validation functions for three reasons: they're relatively new, they're a little cumbersome, and old habits die hard, especially when the old habit requires less typing. Here's a function that wraps a call to filter_input and validates that the parameter 'c' is an integer from 1 to 1000.

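function get_config_id() {
  static $config_id = NULL;    // survives between calls within one request
  if ($config_id === NULL) {   // strict: a cached 0 must not trigger a re-read
    // int on success, FALSE if 'c' is not an int from 1 to 1000, NULL if 'c' is absent
    $config_id = filter_input( INPUT_GET, 'c', FILTER_VALIDATE_INT,
      array('options'=>array('min_range'=>1, 'max_range'=>1000))
    );
    $config_id = $config_id ? $config_id : 0;   // 0 means "no valid id"
  }
  return $config_id;
}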

That code does a little variable caching with the static keyword. It's a C trick that keeps filter_input from being called more than once per request. The saving is small, but it adds up if the function is called often.

After thinking about this a lot, I've concluded that it's impossible to avoid writing a lot of validation code. Validating input means writing filter functions for the common types of data, then checking each input against its acceptable range. If there's no valid value, you may then need to substitute a default. And to avoid repeating the process, you want to cache the value in a variable.
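Here is a minimal sketch of those steps for integer parameters: check the input, compare it to a range, fall back to a default, and cache the result. int_param() and config_id() are hypothetical names. The helper reads from an array passed in, such as $_GET, rather than calling filter_input, which makes it easy to exercise outside a web request.

```php
<?php
// Hypothetical helper: read one integer parameter from a source array such
// as $_GET, check it against an allowed range, and fall back to a default
// when it is missing or invalid.
function int_param(array $source, $name, $min, $max, $default)
{
    if (!isset($source[$name])) {
        return $default;                            // missing
    }
    $value = filter_var($source[$name], FILTER_VALIDATE_INT,
        array('options' => array('min_range' => $min, 'max_range' => $max)));
    return ($value === false) ? $default : $value;  // FALSE: not an int, or out of range
}

// Cache the result for the rest of the request with a static variable.
function config_id()
{
    static $id = null;
    if ($id === null) {
        $id = int_param($_GET, 'c', 1, 1000, 0);
    }
    return $id;
}
```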

For more info, see Filtering in the PHP manual.

Sample Code

Extracted from a 2011 project: a little server backing an AJAX user interface. The code demonstrates how to check that each parameter exists, validate it, import the value into your script, and fail otherwise. The code isn't complex, but it's verbose. So you really need to copy-paste this stuff, or you will get lazy and not do validation correctly.

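// restore the session
session_start();
session_regenerate_id(true); // new ID on every request, so a stolen or fixated ID soon stops working

$user = isset($_SESSION['user']) ? $_SESSION['user'] : NULL;
$role = isset($_SESSION['role']) ? $_SESSION['role'] : NULL;

// validate input parameters
// httperror.lib.php provides http_bad_request() and http_unauthorized(),
// which are assumed to send the error status and exit.
require_once('../httperror.lib.php');

// test for invalid parameter names
foreach (array_keys($_GET) as $key) {
  if (! in_array($key, array('foo', 'bar', 'baz'), true)) {
    http_bad_request('Excess parameter.');
  }
}

// this parameter is optional
$foo = NULL;
if (isset($_GET['foo'])) {
  $foo = filter_var($_GET['foo'], FILTER_VALIDATE_EMAIL); // FALSE if not an email address
}

// this parameter must exist, and must be an integer
if (! isset($_GET['bar'])) {
  http_bad_request('Parameter missing.');
}
$bar = filter_var($_GET['bar'], FILTER_VALIDATE_INT);
if ($bar === FALSE) { // compare strictly: 0 is a valid integer
  http_bad_request('Invalid value for bar.');
}

// this parameter must exist, and must be a boolean
if (! isset($_GET['baz'])) {
  exit; // die silently
}
// NULL on failure, so a valid "false", "0" or "off" isn't mistaken for an error
$baz = filter_var($_GET['baz'], FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
if ($baz === NULL) {
  exit; // die silently
}

// access control
if (filter_var($user, FILTER_VALIDATE_INT) === FALSE) {
  http_unauthorized('User ID required.');
}

if ($role != ROLE_ADMIN) { // ROLE_ADMIN presumably defined elsewhere in the app
  http_unauthorized();
}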
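Validation results should be compared strictly (=== FALSE, or === NULL when using FILTER_NULL_ON_FAILURE), not tested for truthiness. The reason is that some valid inputs come back from filter_var() as falsy values. This small sketch prints a few of those cases.

```php
<?php
// Some valid inputs come back as falsy values, so a plain
// if (filter_var(...)) would wrongly reject them.
$results = array(
    'int "0"'      => filter_var('0', FILTER_VALIDATE_INT),                                  // valid: int 0
    'int "abc"'    => filter_var('abc', FILTER_VALIDATE_INT),                                // invalid: FALSE
    'bool "false"' => filter_var('false', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE),  // valid: FALSE
    'bool "maybe"' => filter_var('maybe', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE),  // invalid: NULL
);
foreach ($results as $label => $value) {
    echo $label, ' => ', var_export($value, true), "\n";
}
```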

Decode PHP Shells Obfuscated with eval, gzinflate, base64_decode, and str_rot13

I got hacked (more than once), and they installed a backdoor PHP shell. It sucked (and I must suck for allowing it to happen... but anyway). Here's a snippet of code that you can use to decode these nasties.

To use it, paste the definition of the function "dc" below into the hack script. Then replace the first instance of "eval" with "dc", so the payload is passed to this function instead of being executed. Then run the script. The output will be the source of the script. View source to read it formatted correctly, and use Save As... to save it. You may need to edit the output to get it working.

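function dc($s) {
  // a single-quoted string literal: return what's inside the quotes
  if (preg_match('/^\'(.+)\'$/s', $s, $matches)) {
    return $matches[1];
  }

  // a function call like name(...), possibly wrapped in the stray
  // PHP close/open tags these scripts add
  if (preg_match('/^(\\?><\\? |)([a-z0-9_]+)\\((.+)\\)(; \\?><\\?|)$/is', $s, $matches)) {
    $func = strtolower($matches[2]); // PHP function names are case-insensitive
    $data = $matches[3];
    switch($func) {
      case 'eval':
        // decode the argument and print it instead of running it;
        // calling dc() on the result peels off any further layers
        $newdata = dc($data);
        echo dc($newdata);
      break;
      case 'gzinflate':
        return gzinflate(dc($data));
      case 'str_rot13':
        return str_rot13(dc($data));
      case 'base64_decode':
        return base64_decode(dc($data));
      default:
        // unknown wrapper: print it so it can be decoded by hand
        echo 'do not know ' . $func . "\n";
        echo $data;
      break;
    }
  }
  else {
    // no wrapper left: this is the decoded source
    echo $s;
  }
}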

The function performs the base64_decode, str_rot13, and gzinflate calls itself, but it prints the result instead of passing it to eval. It also handles a trick these scripts use: adding ?><? at the ends of the string.

What was surprising was that I got hacked "from the inside". The network was set up correctly, but they probably got in through an application I was using over the network: probably BitTorrent or Freenet, or perhaps a website. (I was downloading/sharing the WikiLeaks archives.) My undoing was that I didn't practice strong security within my personal computer. Main failings:

- not using the file security features
- not configuring the local software to be secure
- recording passwords in files, and in configs
- using the same password over and over
- sending passwords in cleartext over my LAN

Other problems:
- recording shell history
- huge archives of data kept online, including old emails

To improve the situation:
- study security more regularly
- install a logging firewall
- run security audits on the personal computers
- run AIDE to track file changes
- keep backups and archives offline (turn the external drive off)
- use private keys and certs to log into services
- figure out risk exposure from web apps, my own external accounts, phone, etc.

The basic idea has to be to treat the personal computer like I'd treat a server.

Update: now I'm not sure if I got hacked, or if I just left a bunch of hack files strewn around a poorly configured system.

Decoding gzinflate base64_decode

Some themes add a copyright notice using a technique also seen in hack scripts. They take the PHP code, compress it with gzdeflate, and base64-encode the result. (Hack scripts also eval the code.) Below is a snippet of code that will decode the encoded data and then save it out as 'some-script.php'.

To use it, first find the code that looks like "eval(gzinflate(base64_decode('ponbZ2smNT3Fy6.....W+ebF')));". Replace the eval with "$script = ", so the line reads "$script = (gzinflate(base64_decode(...)));". Then run this code on $script.

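// Each pass strips one eval(gzinflate(base64_decode('...'))); wrapper.
while(preg_match('/^eval\(gzinflate\(base64_decode\(/', $script)) {
    echo '*bingo*';                 // found another layer
    $s = substr($script, 30);       // drop "eval(gzinflate(base64_decode('" (30 chars)
    $s = substr($s, 0, -5);         // drop the trailing "')));" (5 chars)
    $script = gzinflate(base64_decode($s));
}
file_put_contents('some-script.php', $script);  // the decoded source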
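The fixed substr() offsets only work when the wrapper is written exactly as "eval(gzinflate(base64_decode('...')));" with no spaces. Here is a sketch of a more tolerant variant that uses a regex to capture the payload instead. peel_gzinflate_layers() is a hypothetical name. The pattern assumes a single- or double-quoted base64 string and nothing else on the line. The payload is only decoded, never executed.

```php
<?php
// Hypothetical helper: peel off eval(gzinflate(base64_decode('...'))); layers
// with a regex instead of fixed offsets, so extra whitespace or double quotes
// don't throw it off. Nothing is ever passed to eval.
function peel_gzinflate_layers($script, $maxLayers = 100)
{
    $pattern = '/^\s*eval\s*\(\s*gzinflate\s*\(\s*base64_decode\s*\(\s*([\'"])([A-Za-z0-9+\/=\s]+)\1\s*\)\s*\)\s*\)\s*;?\s*$/i';
    for ($layer = 0; $layer < $maxLayers; $layer++) {
        if (!preg_match($pattern, $script, $m)) {
            break;                                   // no wrapper left: this is the source
        }
        // non-strict base64_decode discards whitespace in the payload
        $inflated = gzinflate(base64_decode($m[2]));
        if ($inflated === false) {
            break;                                   // not deflate data; stop here
        }
        $script = $inflated;
    }
    return $script;
}
```

Usage would be the same as above: file_put_contents('some-script.php', peel_gzinflate_layers($script));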

Directory Paths

These are a few notes to myself about managing directory paths in PHP. Unlike other programming environments, PHP doesn't hide the ugliness of file paths too well. Consequently, the programmer must spend effort to organize the file hierarchy, and manage paths.

There are a few directory paths that are handy to know:

There are a few extra directories to add for an app framework:

Also, it's best to create some rules about passing paths as arguments into functions. Here are my rules:

These rules just keep me from writing code like this: $path = $path1 . '/' . $path2, and losing track of whether there's a slash at the end of the path or not. Despite my better judgment, I use the literal '/' instead of the DIRECTORY_SEPARATOR constant. This'll change in my code over time. The hardest part is writing this:

$path = rtrim($path, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;  // exactly one trailing separator
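A minimal sketch of a join helper that applies the same trimming idea to any number of path segments. It puts exactly one slash between segments and keeps a leading slash on absolute paths. path_join() is a hypothetical name, not a PHP built-in. It uses the literal '/' described above; swap in DIRECTORY_SEPARATOR if you prefer.

```php
<?php
// Hypothetical helper: join path segments with exactly one '/' between them.
// func_get_args() keeps it compatible with older PHP versions.
function path_join()
{
    $parts = func_get_args();
    $first = array_shift($parts);
    $path = rtrim($first, '/');           // '/var/www/' -> '/var/www', '/' -> ''
    foreach ($parts as $part) {
        $part = trim($part, '/');         // drop slashes on both ends
        if ($part !== '') {
            $path .= '/' . $part;
        }
    }
    // a bare '/' was trimmed to ''; give the root back
    return ($path === '' && $first !== '' && $first[0] === '/') ? '/' : $path;
}
```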

Drupal PHP Block Visibility by Taxonomy Term (Category)

Here's a snippet of PHP code that displays a block if a node has a specific term. You set up the block to display based on the result of PHP code.

<?php
// Show the block only on node pages whose node has term 99 from vocabulary 9.
if (arg(0) == 'node' && is_numeric($id = arg(1))) {
  $t = taxonomy_node_get_terms_by_vocabulary(node_load($id), 9); // keyed by term ID
  return isset($t[99]);
}
return FALSE;
?>

(I'm doing this because the click rate on the Google ads has declined, and the ads, with a few exceptions, like the timesheet software, don't match the content too well. So I'm going to experiment with doing my own placements for affiliate commissions.)

Error Handling Notes

Error messaging facility:


$errL = new ErrorListener();

global $errL;   // in any function that reports errors
// ...
$errL->add('context', 'message', $errCode);
// ...

if ($errL->hasErrors())
  $errL->showAsHtml();

$errL->getError('context');

if ($errL->hasError('context'))
  exit;

The listener would collect all error messages into an object, and provide methods to help display them.

It can also dispatch code:

$errL->addHandlerObject('context', $objRef);  // no &: objects are passed by handle, and call-time & is a fatal error since PHP 5.4

If an error is triggered with the key 'context', $objRef->errorHandler($errCode) is called.

Shortly after I wrote this note, a class at PHP Classes showed up that does something related: error_manager
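The note describes only the interface. Below is one possible minimal implementation. The method names come from the note. The storage layout, getError() returning an array of messages, and the HTML markup are all assumptions. As described, a handler object registered for a context has its errorHandler($errCode) method called whenever an error is added under that context.

```php
<?php
// A minimal sketch of the ErrorListener described above.
class ErrorListener
{
    private $errors = array();    // context => list of array(message, code)
    private $handlers = array();  // context => object with errorHandler($code)

    public function add($context, $message, $errCode = 0)
    {
        $this->errors[$context][] = array($message, $errCode);
        if (isset($this->handlers[$context])) {
            $this->handlers[$context]->errorHandler($errCode);  // dispatch
        }
    }

    public function addHandlerObject($context, $objRef)
    {
        $this->handlers[$context] = $objRef;
    }

    public function hasErrors()
    {
        return count($this->errors) > 0;
    }

    public function hasError($context)
    {
        return isset($this->errors[$context]);
    }

    // The messages recorded for one context (an empty array if none).
    public function getError($context)
    {
        $messages = array();
        if (isset($this->errors[$context])) {
            foreach ($this->errors[$context] as $entry) {
                $messages[] = $entry[0];
            }
        }
        return $messages;
    }

    public function showAsHtml()
    {
        echo "<ul class=\"errors\">\n";
        foreach ($this->errors as $context => $entries) {
            foreach ($entries as $entry) {
                echo '<li>', htmlspecialchars($context), ': ',
                     htmlspecialchars($entry[0]), "</li>\n";
            }
        }
        echo "</ul>\n";
    }
}
```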

Exception - what is it?

What's an Exception?

That wasn't ever explained to me, so I'll take a minute to explain it to you. Exceptions are basically "errors", but given a bit of OO structure so they act more like "events".

Consider what old-fashioned errors do. They interrupt program flow. They show some kind of message. They may resume program flow.

That's what an exception does, but with the added feature of wrapping the message in an object, with a type. That type can then be used to select which code is executed. Exception types are themselves classes in an OO language, so a handler written for a parent exception class also catches exceptions of its subclasses.

Exceptions are implemented with the try { ... } catch (SomeException $e) { ... } structure, which moves much of the error-handling code into the catch block.

So, overall, the entire error handling system is cleaner, because the errors have structure, and the code to handle each error is now external to the main program flow.

There's some exception-handling lingo, too. There's try and catch, of course. When an exception happens, the exception is said to have been "raised". A synonym for "raise" is "throw", and that's related to the try/catch terminology: you throw the exception, and then it's caught. (The term "raise" probably comes from electronics, where you "raise a signal", meaning you put voltage on a wire, and then some other electronics detect it and do something.) Finally, the code in the catch blocks is called an "exception handler".

Note that the code to detect anomalous situations that will cause an error in the future should not be in the catch block. That's still part of the regular flow.

For example, if you know that a missing file will cause problems, you should just detect that and handle it in the regular program flow. Don't use exceptions to handle that, because most languages don't allow you to resume the code after the exception is raised.

Working with exceptions and OO is nice. Within a try block, several different types (or classes) of exceptions may be raised. When an exception is raised, the program checks the catch blocks, in order, for one whose type matches the exception's class or one of its parent classes. If none matches, the exception propagates up the call stack to the next enclosing try block, in the calling function and so on. If nothing catches it, an "uncaught exception" error happens.
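Here is a small sketch of both ideas: a catch for a parent exception class matching a subclass, and an exception propagating out of a function that has no try block of its own. The class and function names are made up for the example.

```php
<?php
// Exception classes form a hierarchy: a catch block for a parent class
// also catches its subclasses. These names are made up for the example.
class DataException extends Exception {}
class RecordNotFoundException extends DataException {}

// Stand-in for a lookup that fails. It has no try/catch of its own,
// so the exception propagates up to whichever caller catches it.
function find_record($id)
{
    throw new RecordNotFoundException("no record $id");
}

function show_record($id)
{
    try {
        find_record($id);
        return 'found';                    // not reached in this example
    } catch (InvalidArgumentException $e) {
        return 'bad argument';             // type doesn't match; skipped
    } catch (DataException $e) {
        // Matches, because RecordNotFoundException extends DataException.
        return get_class($e) . ': ' . $e->getMessage();
    }
}

echo show_record(42), "\n";  // RecordNotFoundException: no record 42
```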

Gentoo: PHP CGI compilation problems and symptoms (phpmyadmin, gentoo, cookies)

This is just a note to be found by search engines.

I had a weird problem with a configuration of phpmyadmin on top of lighttpd on top of Gentoo Linux.

The symptom was that phpmyadmin said "cookies must be enabled". Cookies were enabled in my browser.

A test of the setcookie() function in PHP (and the header() function as well) seemed to indicate that no header lines were being sent.

Turns out that this is related to the versions of PHP being installed.

http://www.gentoo.org/proj/en/php/php-upgrading.xml

My problem was that I wasn't compiling the CGI version of PHP. I probably had the CLI version installed.

To force the CGI version, add 'cgi' to your USE flags. (These are set in /etc/make.conf.)

Rebuild it (emerge php) and wait.

Then, check that /usr/bin/php-cgi exists, and alter your lighttpd.conf file or mod_cgi.conf file to reflect this fact.

Hope this helps.

Graphical hitcounter

This is a very simplistic hit counter. You include it on your page via an <img src="http://riceball.com/hitcounter/hit.php?id=1"> tag. The only "security" feature it has is checking that we don't increment on a hit from the same IP address twice in a row.

It's useless for busy sites, because other visitors' pageviews keep replacing the last IP address, so the same visitor gets counted again and again. I wrote it specifically for an eBay listing. I didn't log all IPs or give out cookies because I suspected that those are against eBay policy.

To create a new counter, change the value of the id. The image should come up with the value "1" for starters. If not, you probably have another user's counter. Choose a different number.

Warning - this service may stop operating at any time.


<?php

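// Counter ID from the query string. It must be a positive integer: the id
// column is AUTO_INCREMENT, and MySQL treats an explicit 0 there as "generate
// a new id", so id=0 would create a new row on every hit.
$id = filter_input(INPUT_GET, 'id', FILTER_VALIDATE_INT,
  array('options' => array('min_range' => 1)));
if (! $id) {
  http_response_code(400);
  exit;
}

$lastip = $_SERVER['REMOTE_ADDR'];

// DB_USER and DB_PASSWORD are placeholders for the real credentials.
$db = new PDO('mysql:host=localhost;port=3306;dbname=riceball_com', 'DB_USER', 'DB_PASSWORD');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// The first hit creates the counter; INSERT IGNORE affects 0 rows if the id exists.
$insert = $db->prepare('INSERT IGNORE INTO hitcounter (id, count, lastip) VALUES (?, 1, ?)');
$insert->execute(array($id, $lastip));
if ($insert->rowCount() == 0) {
  // Count the hit for this counter only, unless it came from the same IP as the previous hit.
  $update = $db->prepare('UPDATE hitcounter SET count = count + 1, lastip = ? WHERE id = ? AND lastip <> ?');
  $update->execute(array($lastip, $id, $lastip));
}

$select = $db->prepare('SELECT count FROM hitcounter WHERE id = ?');
$select->execute(array($id));
$count = (int) $select->fetchColumn();

$img = imagecreate( 100, 24 );
$background = imagecolorallocate( $img, 128, 128, 128 ); // first color allocated fills the background
$text = imagecolorallocate( $img, 255, 255, 255 );

imagestring( $img, 5, 5, 3, (string) $count, $text );

header( 'Content-type: image/gif' );
header( 'Cache-Control: no-cache' ); // make browsers re-request the image on each page view
imagegif( $img );
imagedestroy( $img );
exit;

?>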

Here's the SQL def:

CREATE TABLE `hitcounter` (
`id` int(11) NOT NULL auto_increment,
`count` int(11) default NULL,
`lastip` varchar(255) default NULL,
PRIMARY KEY (`id`),
KEY `lastip` (`lastip`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

HTML Tag Closing Function

User-submitted HTML often contains small markup errors that can affect other parts of the page. The most common are unclosed tags that cause text to be bolded, italicized, or linked all the way down the page. The visual effect is catastrophic, though the error is really minor.

The html_close_tags() function scans HTML code, and generates a string that will close all the open tags. An easy way to use it is like this:


  $html = $html.html_close_tags($html);

The string analysis is done "C" style, by iterating over characters. The alternatives would be regexes (Perl style), or breaking the data into parts, parsing it, and then concatenating the output (Lisp style). C style means reading across the data a character at a time, accumulating substrings as needed. PHP lacked goto at the time (it was added in PHP 5.3), so I settled for a series of loops to implement the state machine. (Sometimes, goto is the right way to do something.)

I used scanning because I thought it would be faster than any technique that requires multiple concatenations. It's also very straightforward compared to regexes. I used a couple of extraneous variables to add some documentation, as recommended by older programming texts.

One somewhat serious deficiency is that quoted attribute strings aren't well-supported. An attribute like onClick='pop(100,300,\'bar\')' will fail to parse correctly because escaping is not supported.

This is a relatively rare situation, because user input should not allow JavaScript. Of course, another function should have been called to sanitize the input of JavaScript.

To use the code, remember to strip out the test cases.


<?php
// vim:set ts=4 sw=4:
/**
 * A function to close tags in user-supplied HTML.
 * It's written to handle code with improperly nested tags.
 * This does not sanitize the data for tricks like using html entities
 * to encode tag names.
 */
function html_close_tags($html)
{
	$ignoretags = array( 'p', 'br' );
	$tagstack = array();
	$size = strlen( $html );
	$i = 0;
	$ch = $html[$i];
	$mark = $i;
	while( $i < $size )
	{
		// outside the tag state (1)
		while( $ch != '<' )
		{
			$ch = $html[++$i]; // advance one char
		} // while not '<'
		
		// inside the tag state (2)
		// get the tag type (open or close)
		$ch = $html[++$i]; // advance one char
		if( $ch == '/' )
		{
			$closeTag = true;
		}
		else // it's an opening tag
		{
			$closeTag = false;
		}
		if ($closeTag) $i++; // advance one char
		$mark = $i; // mark start of name

		// get the tag name (2.1)
		while( $ch != ' ' and $ch != '>' )
		{
			$ch = $html[++$i]; // advance one char
		}
		$tagname = strtolower( substr( $html, $mark, $i-$mark ) );
		// Don't advance char after this state.
		
		// get the rest of the tag attributes (2.2)
		while( $ch != '>' )
		{
			$ch = $html[++$i]; // advance one char
			
			// special case within quotes
			// note that this does not handle complex quoting or escapes
			if ($ch == '"')
			{
				while( $ch != '"' )
					$ch = $html[++$i]; // advance one char
			}
			if ($ch == "'")
			{
				while( $ch != "'" )
					$ch = $html[++$i]; // advance one char
			}
		}
		
		$last = $html[$i-1];
		// If tag attribute part contains a trailing slash
		// assume it's self-closing and don't add to tag stack.
		if ( $last=='/' ) 
		{
		}
		// If the tag is an opening tag, put on tagstack.
		else if ( $closeTag == false )
		{
			// unless it's in the ignoretags array, add it
			if (!in_array( $tagname, $ignoretags ))
				$tagstack[] = $tagname;
		}
		// If the tag is a closing tag, pop a matching tag off stack
		// by searching for the first matching tag and removing 
		// that element.
		else if ( $closeTag == true )
		{
			for($c=count($tagstack)-1; $c >= 0; $c--)
			{
				if( $tagstack[$c]==$tagname )
				{
					unset($tagstack[$c]);
					break; // stop the for loop
				}
			}
		}
		$ch = $html[++$i]; // advance one char

	} // main loop
	// Scan remaining elements, building up a string of closing tags.
	// Note that string is built up "backwards" to close tags in order
	// (because the loop reads from the bottom of the stack).
	foreach( $tagstack as $tag )
		$result = '</'.$tag.'>'.$result;
	return $result;
}

echo "
";

$test = "


  • "; echo htmlspecialchars(html_close_tags($test)); echo "\n\n"; $test = "test

    test
    test

    • "; echo htmlspecialchars(html_close_tags($test)); echo "\n\n"; $test = "

"; echo htmlspecialchars(html_close_tags($test)); echo "\n\n"; $test = "


  • "; echo htmlspecialchars(html_close_tags($test)); echo "\n\n"; $test = "


    • "; echo htmlspecialchars(html_close_tags($test)); echo "\n\n"; $test = "


      "; echo htmlspecialchars(html_close_tags($test)); echo "\n\n"; $test = "


    "; echo htmlspecialchars(html_close_tags($test)); echo "\n\n"; ?>

Attachment: HtmlCloseTags.inc.php.txt (3.32 KB)
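For comparison, here's a shorter sketch of the same tag-stack idea that uses preg_match_all() instead of a character-by-character state machine. It's an alternative, not the code above: the function name and the list of void tags to ignore are assumptions, and unlike the original it doesn't skip over quoted attribute values, so a '>' inside an attribute will confuse it. Comments and CDATA aren't handled either.

```php
<?php
// Sketch: return the closing tags needed to balance a (possibly truncated) HTML fragment.
// close_open_tags_sketch() and its default $ignore list are hypothetical.
function close_open_tags_sketch($html, $ignore = array('br', 'hr', 'img', 'input', 'meta', 'link'))
{
    // group 1: "/" for closing tags, group 2: tag name, group 3: "/" for self-closing tags
    preg_match_all('#<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(/?)>#', $html, $m, PREG_SET_ORDER);
    $stack = array();
    foreach ($m as $tag) {
        $name = strtolower($tag[2]);
        if ($tag[3] === '/' || in_array($name, $ignore)) {
            continue; // self-closing or void element: nothing to close later
        }
        if ($tag[1] === '') {
            $stack[] = $name; // opening tag: push it
        } else {
            // closing tag: remove the most recent matching opening tag
            for ($i = count($stack) - 1; $i >= 0; $i--) {
                if ($stack[$i] === $name) {
                    array_splice($stack, $i, 1);
                    break;
                }
            }
        }
    }
    // close the innermost (most recently opened) tags first
    $closing = '';
    foreach (array_reverse($stack) as $name) {
        $closing .= "</$name>";
    }
    return $closing;
}

echo htmlspecialchars(close_open_tags_sketch('<div><ul><li>one</li><li>two')), "\n";
// prints: &lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;
```

The regex approach is easier to read, while the state machine above copes better with quoted attributes.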

HTML: ID naming convention

If you have an Ajax-y library that can edit divs in place:

<div id="story.1234">
    <div id="story.1234:title">the title</div>
</div>
<div id="story.1235">
    <div id="story.1235:title">the second title</div>
</div>

The naming convention is:

type.id:property

aka:

table_name.id:column_name

More or less.

You could even stack the objects.

<div id="type1.id.type2.id:property">foobar</div>

The id tells you how to traverse the object path to alter the appropriate property.

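If you need a composite key for an id, join the key fields with dashes: type1.id-id-id:property.

If one of the key fields is not numeric and may contain a dash, you could encode the value so it contains no dashes before putting it in the ID.

This syntax is legit HTML (HTML 4 allows letters, digits, hyphens, underscores, colons, and periods in an ID, as long as it starts with a letter), and I think it was intended to be used this way.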

However, a few Google searches turned up (for me) only one example of the above: "Asp.Net control ID and HTML". Custom controls in ASP.NET use a similar long naming convention to map HTML controls to the server-side code that handles them.

(The other discussion was mostly HTML coders debating whether to use presentational or semantic names.)

The trick to using these long IDs is to pass them back to the server. The server decodes the ID and uses the information to update the appropriate chunk of data.

Likewise, on the server side, when a web page exports HTML, or a web service exports objects, it should export these long IDs instead of the short IDs used in the table.

The code isn't written yet, but maybe it will be soon.
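Here's a minimal sketch of the decoding and encoding steps, assuming the convention above: a dotted path of type/id pairs, an optional :property suffix, and dash-separated composite keys. The function names parse_long_id() and build_long_id() and the array shapes they use are hypothetical.

```php
<?php
// Hypothetical helpers for the "type.id:property" ID convention.
// Handles stacked paths ("type1.id.type2.id:property") and composite keys ("type.1-2-3:property").

function parse_long_id($longId)
{
    // Split off the property after the last colon (the property part is optional).
    $property = null;
    $colon = strrpos($longId, ':');
    if ($colon !== false) {
        $property = substr($longId, $colon + 1);
        $longId = substr($longId, 0, $colon);
    }
    // The remaining path alternates type, id, type, id, ...
    $parts = explode('.', $longId);
    if (count($parts) % 2 != 0) {
        return null; // malformed: a type without an id
    }
    $path = array();
    for ($i = 0; $i < count($parts); $i += 2) {
        $path[] = array(
            'type' => $parts[$i],
            'key'  => explode('-', $parts[$i + 1]), // composite keys are dash-separated
        );
    }
    return array('path' => $path, 'property' => $property);
}

function build_long_id(array $path, $property = null)
{
    $segments = array();
    foreach ($path as $step) {
        $segments[] = $step['type'] . '.' . implode('-', $step['key']);
    }
    $id = implode('.', $segments);
    return $property === null ? $id : $id . ':' . $property;
}

$parsed = parse_long_id('story.1234:title');
echo $parsed['path'][0]['type'], ' ', $parsed['path'][0]['key'][0], ' ', $parsed['property'], "\n";
// prints: story 1234 title
echo build_long_id($parsed['path'], $parsed['property']), "\n";
// prints: story.1234:title
```

The server would then look up the "story" row with id 1234 and update its "title" column; that lookup depends on your own data layer.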

Hierarchical Menu

I've been answering questions on Experts Exchange to test myself. Here's some relevant code for one of the answers.

They're classes that have been used to generate hierarchical menus.


<?php
/**
 * @package menu
 * @author John Kawakami 
 * @link http://www.slaptech.net Developed by The Slaptech Collective
 * @copyright 2006 Public Domain
 * @version $Id: Menu.class.php 97 2006-07-27 10:43:40Z johnk $
 *
 * Library to help create hierarchical menus.
 * Use MenuRenderer to display the menus.
 *
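 *
 * Example ($acl is the ACL object; every constructor and add*() method
 * that creates an item takes it as the first argument):
 *
 * $menuBar = new Menu( $acl, 'MenuBar' );
 * $fileMenu = $menuBar->addMenu( $acl, 'File' );
 * $fileMenu->addUrl( $acl, 'Open', '/admin/foo.php' );
 * $fileMenu->addUrl( $acl, 'Save', '/admin/bar.php' );
 * $fileMenu->addDivider();
 * $fileMenu->addUrl( $acl, 'Close', '/admin/baz.php' );
 *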
 */

include_once('ACL.class.php');
 
class Menu
{
	var $acl;
	var $menu;
	var $menuHash; //fixme - combine these two structures into one
	var $name;
	var $description;
	var $icon;
	var $url;
	var $highlight;
	var $enabled = true;

	// PHP 5+ constructor; PHP 8 no longer treats a method named Menu() as the constructor
	function __construct( &$acl, $name, $description='', $icon='', $url='' )
	{
		$this->acl			=& $acl;
		$this->menu 		= array();
		$this->menuHash		= array();
		$this->name 		= $name;
		$this->description 	= $description;
		$this->icon 		= $icon;
		$this->url 			= $url;
		$this->highlight	= false;
	}
	function addObject( $obj )
	{
		// Objects are handles in PHP 5+, so no reference is needed. The old
		// array_push( $this->menu, &$obj ) call-time reference is a fatal error since PHP 5.4.
		$this->menu[] = $obj;
	}
	function addNamedObject( $name, $obj )
	{
		$this->menuHash[$name] = $obj;
		$this->addObject( $obj );
	}
	/**
	 * Utility function.
	 */
	function &addMenu( &$acl, $name, $description='', $icon='', $url='' )
	{
		$new = new Menu( $acl, $name, $description, $icon, $url );
		$this->addObject( $new );
		return $new;
	}
	function &getNamedObject( $name )
	{
		return $this->menuHash[$name];
	}
	function &addUrl( &$acl, $name, $url, $description='', $popup='' )
	{
		$new = new UrlMenuItem( $acl, $name, $url, $description, $popup );
		$this->addObject( $new );
		return $new;
	}
	function &addDivider()
	{
		$new = new DividerMenuItem();
		$this->addObject( $new );
		return $new;
	}
	function select( $name )
	{
		for( $i=0; $i < count($this->menu); $i++ )
		{
			if ($this->menu[$i]->name==$name)
			{
				return $this->menu[$i];
			}
		}
		return null;
	}
	function rewind()
	{
		reset($this->menu);
	}
	function &next()
	{
		// check the acl to see if this user can see the next item - fixme
		// each() was removed in PHP 8, so step through with current()/next()
		$obj = current( $this->menu );
		if ($obj !== false)
			next( $this->menu );
		return $obj;
	}
	function disable()
	{
		$this->enabled = false;
	}
	function enable()
	{
		$this->enabled = true;
	}
	function highlight()
	{
		$this->highlight = true;
	}
}

class UrlMenuItem extends Menu
{
	var $name;
	var $url;
	var $description;
	var $popup;

	function __construct( &$acl, $name, $url, $description='', $popup='' )
	{
		$this->acl			=& $acl;
		$this->name 		= $name;
		$this->url 			= $url;
		$this->description 	= $description;
		$this->popup 		= $popup;
	}

}
class DividerMenuItem extends Menu
{
	function __construct()
	{
		$this->name = 'divider';
	}
}


?>


The menus are rendered via another class:


<?php
/**
 * @package menu
 * @author John Kawakami 
 * @link http://www.slaptech.net Developed by The Slaptech Collective
 * @copyright 2006 Public Domain
 * @version $Id: MenuRenderer.class.php 92 2006-07-27 09:13:10Z johnk $
 *
 * Rendering classes for menus.  These are kind of like "templates"
 * for menus, except they can render hierarchies, portions of menus,
 * etc.  There's HTML code in here.  Yuk.
 */

/**
 * Rendering interface.  Each renderer must implement the
 * url, divider, and menu methods.  The menu() method is the heart
 * of the renderer.
 */
class MenuRenderer
{
	function __construct() {}
	// These signatures match HtmlMenuRenderer's; PHP 8 rejects incompatible overrides.
	function divider( &$obj, $indent=0 ) {}
	function url( &$obj, $indent=0 ) {}
	function menu( &$menu, $indent=0 ) {}
	function item( $obj, $indent=0 ) 
	{
		return $obj->name;
	}
}

/**
 * Renders a menu as an html unordered list (UL).
 */
class HtmlMenuRenderer extends MenuRenderer
{
	function __construct() {}

	function menu( &$menu, $indent=0 )
	{
		$t = str_repeat( ' ', $indent );
		$m =& $menu;
		$m->rewind();
		$output = '';
		// a menu named '@' gets no list item or label of its own, only its child list
		if ($m->name != '@')
			$output .= "$t<li>";
		if ($m->name != '@')
		{
			if ($m->url) $output .= "<a href='$m->url'>";
			if ($m->icon)
			{
				$output .= "<img src='$m->icon' border='0' />";
				$output .= "<br />";
			}
			$output .= $m->name."\n";
			if ($m->url) $output .= '</a>';
		}
		$output .= "$t<ul>\n";
		while( $obj =& $m->next() )
		{
			$class = strtolower(get_class($obj));
			// echo "$class<br />";
			switch( $class )
			{
				case 'apmenu':
				case 'menu':
					$output .= $this->menu( $obj, $indent+4 );
					break;
				case 'urlmenuitem':
					$output .= $this->url( $obj, $indent+4 );
					break;
				case 'dividermenuitem':
					$output .= $this->divider( $obj, $indent+4 );
					break;
				default:
					$output .= $this->item( $obj, $indent+4 );
					break;
			}
		}
		$output .= "$t</ul>\n";
		if ($m->name != '@')
			$output .= "$t</li>\n";
		return $output;
	}
	function divider( &$obj, $indent=0 )
	{
		$t = str_repeat( ' ', $indent );
		return "$t<li>-----------</li>\n";
	}
	function url( &$obj, $indent=0 )
	{
		$t = str_repeat( ' ', $indent );
		return "$t<li><a href='$obj->url'>$obj->name</a></li>\n";
	}
}
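The Menu and MenuRenderer classes need the ACL library to run, so here's a stripped-down, self-contained illustration of the same composite idea: a menu holds links, dividers, or sub-menus, and a recursive function renders nested lists. SketchMenu and render_sketch_menu() are hypothetical names, not part of the classes above, and the ACL checks are left out.

```php
<?php
// Simplified composite: a menu's items are link arrays, divider strings, or sub-menus.
class SketchMenu
{
    public $name;
    public $items = array();

    public function __construct($name)
    {
        $this->name = $name;
    }

    public function addMenu($name)
    {
        $sub = new SketchMenu($name);
        $this->items[] = $sub;
        return $sub; // return the sub-menu so items can be added to it
    }

    public function addUrl($name, $url)
    {
        $this->items[] = array('name' => $name, 'url' => $url);
    }

    public function addDivider()
    {
        $this->items[] = '-----------';
    }
}

function render_sketch_menu(SketchMenu $menu)
{
    $out = '<ul>';
    foreach ($menu->items as $item) {
        if ($item instanceof SketchMenu) {
            // a sub-menu becomes a list item that contains its own nested list
            $out .= '<li>' . htmlspecialchars($item->name) . render_sketch_menu($item) . '</li>';
        } elseif (is_array($item)) {
            $out .= '<li><a href="' . htmlspecialchars($item['url']) . '">'
                  . htmlspecialchars($item['name']) . '</a></li>';
        } else {
            $out .= '<li>' . $item . '</li>'; // divider
        }
    }
    return $out . '</ul>';
}

$bar = new SketchMenu('MenuBar');
$file = $bar->addMenu('File');
$file->addUrl('Open', '/admin/foo.php');
$file->addDivider();
$file->addUrl('Close', '/admin/baz.php');
echo render_sketch_menu($bar), "\n";
```

Keeping the tree and the rendering separate, as the original classes do, lets you swap in another renderer without touching the menu structure.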

    Adjacency Table to Hierarchy

    Here's a way to turn an adjacency table into a hierarchy in HTML.


    <?php
    $menu = array();
    $menu[] = array( 'id' => 1, 'parent_id' => 0, 'name' => 'a' );
    $menu[] = array( 'id' => 2, 'parent_id' => 1, 'name' => 'a.1' );
    $menu[] = array( 'id' => 3, 'parent_id' => 0, 'name' => 'b' );
    $menu[] = array( 'id' => 4, 'parent_id' => 3, 'name' => 'b.1' );
    $menu[] = array( 'id' => 5, 'parent_id' => 3, 'name' => 'b.2' );
    $menu[] = array( 'id' => 6, 'parent_id' => 5, 'name' => 'b.2.1' );

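    // acts like a SELECT statement
    function selectWhereParentIdIs( $id )
    {
        global $menu;
        $out = array();

        for( $i=0; $i < count($menu); $i++ )
        {
            if ( $menu[$i]['parent_id'] == $id )
                $out[] = $menu[$i];
        }
        return $out;
    }

    // recursively renders the children of $id as nested lists
    function menuToHtml( $id )
    {
        $ar = selectWhereParentIdIs( $id );
        if ( count($ar) > 0 )
        {
            $out = '<ol>';
            reset( $ar );
            foreach( $ar as $value )
            {
                $out .= '<li>'.$value['name'];
                $out .= menuToHtml( $value['id'] );
                $out .= '</li>';
            }
            $out .= '</ol>';
            return $out;
        }
        else
        {
            return '';
        }

    }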

    echo menuToHtml( 0 );

    ?>

    And here's a version that renders a select (drop-down) list:


    <?php
    $menu = array();
    $menu[] = array( 'id' => 1, 'parent_id' => 0, 'name' => 'a' );
    $menu[] = array( 'id' => 2, 'parent_id' => 1, 'name' => 'a.1' );
    $menu[] = array( 'id' => 3, 'parent_id' => 0, 'name' => 'b' );
    $menu[] = array( 'id' => 4, 'parent_id' => 3, 'name' => 'b.1' );
    $menu[] = array( 'id' => 5, 'parent_id' => 3, 'name' => 'b.2' );
    $menu[] = array( 'id' => 6, 'parent_id' => 5, 'name' => 'b.2.1' );

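    // acts like a SELECT statement
    function selectWhereParentIdIs( $id )
    {
        global $menu;
        $out = array();

        for( $i=0; $i < count($menu); $i++ )
        {
            if ( $menu[$i]['parent_id'] == $id )
                $out[] = $menu[$i];
        }
        return $out;
    }

    // recursively renders the children of $id as options, indented with dashes by depth
    function menuToSelect( $id, $depth )
    {
        $ar = selectWhereParentIdIs( $id );
        if ( count($ar) > 0 )
        {
            $out = '';
            reset( $ar );
            foreach( $ar as $value )
            {
                $out .= '<option value="'.$value['id'].'">';
                if ($depth>0) $out .= '-';
                $out .= str_repeat( '-', $depth ) . $value['name'];
                $out .= '</option>';
                $out .= menuToSelect( $value['id'], $depth+1 );
            }
            return $out;
        }
        else
        {
            return '';
        }

    }

    echo '<select>';
    echo menuToSelect( 0, 0 );
    echo '</select>';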

    ?>
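Both versions call selectWhereParentIdIs() once per node, and each call scans the whole table, so the work grows roughly with the square of the number of rows. A sketch that indexes the rows by parent_id once and then recurses over that index follows. indexByParent() and treeToHtml() are hypothetical names; the row format matches the $menu array above.

```php
<?php
// Group the adjacency rows by parent_id in one pass.
function indexByParent(array $rows)
{
    $children = array();
    foreach ($rows as $row) {
        $children[$row['parent_id']][] = $row;
    }
    return $children;
}

// Render the children of $parentId as nested ordered lists.
function treeToHtml(array $children, $parentId = 0)
{
    if (empty($children[$parentId])) {
        return ''; // leaf node: no nested list
    }
    $out = '<ol>';
    foreach ($children[$parentId] as $row) {
        $out .= '<li>' . htmlspecialchars($row['name'])
              . treeToHtml($children, $row['id']) . '</li>';
    }
    return $out . '</ol>';
}

$menu = array(
    array('id' => 1, 'parent_id' => 0, 'name' => 'a'),
    array('id' => 2, 'parent_id' => 1, 'name' => 'a.1'),
    array('id' => 3, 'parent_id' => 0, 'name' => 'b'),
);
echo treeToHtml(indexByParent($menu)), "\n";
// prints: <ol><li>a<ol><li>a.1</li></ol></li><li>b</li></ol>
```

When the rows come from a database, the same index can be built from one `SELECT id, parent_id, name ... ORDER BY ...` query instead of one query per node.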

    Hierarchical Report Generator for CakePHP

    This is an almost complete CakePHP component to create hierarchical reports.

    It's not really canonical Cake, because it only works with MySQL.

    It basically works, but is rough. It can be used in a non-Cake context, to some extent.

    A hierarchical report is just a report with several reports in it, arranged hierarchically. For example, the sample below reports attendance at several events on several different dates, grouped by date, then event, then organization.

    hierarchical_report.php

    
    class HierarchicalReportComponent
    {
        /*  sample report
         *  10/10/02
         *     poker-game
         *       moe's - 2
         *       joe's - 3
         *       union - 1
         *     bake-sale
         *       moe's - 1
         *       union - 2
         *  10/11/02
         *     auto-race
         *       union - 3
         *       school - 1
         */
        function report( $spec=Null, $next_selector=Null )
        {
        	switch($spec['display_as'])
    	{
    	    case 'section':
    	        return $this->section_report( $spec, $next_selector ? $next_selector:Null );
    	    case 'table':
    	        return $this->table_report( $spec, $next_selector ? $next_selector:Null );
    	}
        }
    
        var $section = 0;
        function section_report( $s, $next_selector = Null )
        {
        	$this->section++;
            $sql = $s['sql'];
            if ($next_selector) $sql = preg_replace( '/__next_selector__/', $next_selector, $sql );
            ( $r = mysql_query( $sql ) ) || die( "FAILED: $sql" );
    	$o = '';
            while( $d = mysql_fetch_array( $r ) )
            {
                if (isset($s['title'])) 
    		$o .= "<h{$this->section}>{$d[$s['title']]}</h{$this->section}>";
                if (isset($s['subreport']))
                    $o .= $this->report( $s['subreport'], $d[$s['next_selector']] );
            }
    		$this->section--;
            return $o;
        }
        function table_report( $s, $next_selector = Null )
        {
    		$hidden_columns = array();
    		if (isset($s['hidden_columns'])) $hidden_columns = $s['hidden_columns'];
    
    		$column_headings = array();
    		if (isset($s['column_headings'])) $column_headings = $s['column_headings'];
    
    		$links = array();
    		if (isset($s['links'])) $links = $s['links'];
    
    		$sorting_columns = array();
    		if (isset($s['sorting_columns'])) $sorting_columns = $s['sorting_columns'];
    
            $sql = $s['sql'];
            if ($next_selector) $sql = preg_replace( '/__next_selector__/', $next_selector, $sql );
            ( $r = mysql_query( $sql ) ) || die( "FAILED: $sql" );
            $first_row = true;
            $o = '<table>';
            while( $d = mysql_fetch_assoc( $r ) )
            {
                if ($first_row==true )
                {
                    $o .= '<tr>';
    				foreach($d as $key=>$value) 
    				{
    					if (in_array($key, $hidden_columns)) continue;
    
    					$title = $this->_subst($key,$column_headings);
    
    					if (in_array($key, $sorting_columns))
    					{
    						$url = $s['sorting_url'];
    						if ($s['sorting_on']==$key)
    						{
    							$url = $this->_merge( $url, array('order_by'=>$key, 'asc_desc'=>($s['sorting_direction']=='DESC'?'ASC':'DESC') ) );
    							$title = "<a href='$url'>$title</a>";
    						}
    						else
    						{
    							$url = $this->_merge( $url, array('order_by'=>$key, 'asc_desc'=>'ASC') );
    							$title = "<a href='$url'>$title</a>";
    						}
    					}
    					$o .= '<th>'.$title.'</th>';
                    }
    				$o .= '</tr>';
                    $first_row = false;
    				reset($d);
                }
                $o .= '<tr>';
    			foreach($d as $key=>$value)
    			{
    				if (isset($links[$key]))
    				{
    					$link = $this->_merge( $links[$key], $d );
    					$value = '<a href="'.$link.'">'.$value.'</a>';
    				}
    				if (! in_array($key, $hidden_columns)) $o .= '<td>'.$value.'</td>';
    			}
                $o .= '</tr>';
            }
            if (isset($s['summation_sql']))  /* applies only to tables */
            {
                // summation code
                // insert the sql statement right into the query
                $summation_sql = preg_replace( '/__sql__/', $sql, $s['summation_sql'] );
    			if (!preg_match('/as .+$/', $summation_sql))
    	    	$summation_sql .= ' AS abcdefghi';
                ( $res = mysql_query( $summation_sql ) ) || die("FAILED: $summation_sql");
                $row = mysql_fetch_assoc( $res );
                $o .= '<tr>';
    			foreach($row as $key=>$value) $o .= '<td>'.$value.'</td>';
    			$o .= '</tr>';
    		}
            $o .= '</table>';
            return $o;
        }
    
    	function _subst( $key, $substs )
    	{
    		if (isset($substs[$key])) return $substs[$key];
    		return $key;
    	}
    	function _merge( $template, $substs )
    	{
    		$keys = array_keys($substs);
    		for($i=count($keys)-1;$i>=0;$i--) $keys[$i]='__'.$keys[$i].'__';
    		return str_replace( $keys, array_values($substs), $template );
    	}
    
    }
    
    
    

    Then, in the controller, you add a report spec. The spec is the heart of the report. There are three here.

    
    	function report_orgs()
    	{
    		list($campaign_id,$event_id) = $this->_get_event_context();
    
    		$spec = array(
    			'display_as' => 'table',
    			'sql' => 'SELECT distinct a.ORGID,OrgName FROM tblorganizations AS org JOIN attendances AS a ON a.ORGID=org.ORGID ORDER BY OrgName ASC',
    			'title' => 'Select an organization',
    			'links' => array( 'OrgName'  => '/attendance/report_org_participation/__ORGID__' ),
    			'hidden_columns' => array( 'ORGID' ),
    			'column_headings' => array( 'OrgName' => 'Organization' ),
    		);
    	  	$hrc = new HierarchicalReportComponent();
    	  	$this->set('out',$hrc->report( $spec ));
    	}
    
    	function report_org_participation( $ORGID, $order_by='Fname', $asc_desc='ASC' )
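    	{
    		// $ORGID, $order_by and $asc_desc come from the URL, so validate them
    		// before they are pasted into the SQL below
    		$ORGID = (int)$ORGID;
    		if (!in_array($order_by, array('Fname','Lname','EventName','Date'))) $order_by = 'Fname';
    		$asc_desc = (strtoupper($asc_desc) == 'DESC') ? 'DESC' : 'ASC';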
    		$spec = array(
    			'display_as' => 'table',
    			'sql' => "SELECT act.Fname, act.Lname, e.Name AS EventName, e.Date FROM attendances AS a JOIN events AS e ON e.CEventID=a.CEventID JOIN tblactivists AS act ON a.FEDID=act.FEDID WHERE a.ORGID=$ORGID ORDER BY $order_by $asc_desc",
    			'title' => 'Participation',
    			'sorting_columns' => array('Fname','Lname','EventName','Date'),
    			'sorting_url' => "/attendance/report_org_participation/$ORGID/__order_by__/__asc_desc__",
    			'sorting_on' => $order_by,
    			'sorting_direction' => $asc_desc,
    		);
    	  	$hrc = new HierarchicalReportComponent();
    	  	$this->set('out',$hrc->report( $spec ));
    	}
    
    
    	function report()
    	{
    	  $spec=array(
    		  'display_as' => 'section',
    		  'sql' => 'SELECT distinct `Date` FROM `events` order by `Date`',
    		  'title' => 'Date',
    		  'next_selector' => 'Date',
    		  'subreport' => array(
    			  'display_as' => 'section',
    			  'sql' => "SELECT CEventID,`Name`,`Date` FROM `events` WHERE `Date`='__next_selector__'",
    			  'title' => 'Name',
    			  'next_selector' => 'CEventID',
    			  'subreport' => array(
    				  'display_as' => 'table',
    				  'sql' => 'SELECT att.ORGID,OrgName,COUNT(FEDID) AS persons FROM `attendances` AS att JOIN tblorganizations AS org ON att.ORGID=org.ORGID WHERE CEventID=__next_selector__ GROUP BY att.ORGID',
    				  'summation_sql' => "select 'TOTAL', SUM(persons) FROM ( __sql__ )",
    				  'next_selector' => 'ORGID',
    				  'links' => array( 'persons'  => '/path/to/other/report?id=__ORGID__' ),
    				  'hidden_columns' => array( 'ORGID' ),
    				  'subreport_column' => 'OrgName',
    				  'subreportx' => array(
    						  'display_as' => 'table',
    						  'indent' => 15
    					  )
    				  )
    			  )
    		  );
    		  /* feature to add - the next_selectors should accumulate, so any of them can be used in queries and urls... */
    	  $hrc = new HierarchicalReportComponent();
    	  $this->set('out',$hrc->report( $spec ));
    	}
    
    
    
    

    And last, a view. This just happens to be the one I used, but yours will differ.

    
    
    <?php echo $out ?>
    
    
    

    How to Program These Specs

    Start by creating a single report for the most specific situation. I use phpMyAdmin for this.

    Then create a report that pulls up a list of all the situations.

    Then, use the __next_selector__ feature to inject the id from the list into the first report.

    You can repeat this again for another layer of hierarchy. The example has three levels: Date, Event, and the tabular report.

    The example code also shows how to link between reports, and how the automatic sorting feature makes column headings clickable so users can sort the report.
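To see what the __next_selector__ and __column__ placeholders turn into, here's a standalone copy of the component's substitution logic (_merge() and the __next_selector__ replacement) with no database involved. merge_placeholders() is a renamed copy, and the row values are made up for illustration.

```php
<?php
// Standalone copy of the placeholder substitution: each key becomes __key__.
function merge_placeholders($template, array $substs)
{
    $keys = array();
    foreach (array_keys($substs) as $key) {
        $keys[] = '__' . $key . '__'; // ORGID becomes __ORGID__
    }
    return str_replace($keys, array_values($substs), $template);
}

// Hypothetical parent row from the event-level section report.
$parentRow = array('CEventID' => 42, 'Name' => 'bake-sale', 'Date' => '2002-10-10');

// The subreport's SQL gets the parent's next_selector column (CEventID) injected.
$subSql = 'SELECT att.ORGID,OrgName,COUNT(FEDID) AS persons FROM `attendances` AS att'
        . ' JOIN tblorganizations AS org ON att.ORGID=org.ORGID'
        . ' WHERE CEventID=__next_selector__ GROUP BY att.ORGID';
$subSql = str_replace('__next_selector__', $parentRow['CEventID'], $subSql);
echo $subSql, "\n";

// Link templates are filled from the current row, column by column.
$row = array('ORGID' => 7, 'OrgName' => "moe's", 'persons' => 2);
echo merge_placeholders('/path/to/other/report?id=__ORGID__', $row), "\n";
// prints: /path/to/other/report?id=7

// Sorting URLs are filled the same way.
echo merge_placeholders('/attendance/report_org_participation/5/__order_by__/__asc_desc__',
    array('order_by' => 'Lname', 'asc_desc' => 'DESC')), "\n";
// prints: /attendance/report_org_participation/5/Lname/DESC
```

Because the selector is pasted into the SQL as-is, this only stays safe when the value comes from your own tables (like the IDs here), not straight from user input.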

    Image Gallery: Yet Another

    Here's yet another image gallery. This one requires no database, but requires you to create a file called captions.php in each directory of images, and populate it thusly:


    <?php
    $captions = array (
    "image1.jpg" => "Caption for Image 1",
    "image2.jpg" => "Caption for Image 2",
    "image3.jpg" => "Caption for Image 3",
    "image4.jpg" => "Caption for Image 4"
    );
    ?>
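Typing captions.php by hand gets tedious for a big directory. Here's a sketch that writes a starter captions.php in the format shown above, using each file name as a placeholder caption. The function name write_caption_stub() and the list of image extensions are assumptions.

```php
<?php
// Generate a starter captions.php for $dir; edit the captions by hand afterwards.
function write_caption_stub($dir)
{
    $captions = array();
    foreach (scandir($dir) as $file) {
        if (preg_match('/\.(jpe?g|png|gif)$/i', $file)) {
            $captions[$file] = $file; // placeholder caption
        }
    }
    // var_export() produces a valid PHP array literal
    $php = "<?php\n\$captions = " . var_export($captions, true) . ";\n";
    file_put_contents("$dir/captions.php", $php);
    return $captions;
}

// Usage: write_caption_stub('img');
```

Since scandir() returns names in sorted order, the gallery's image order will be alphabetical until you rearrange the array.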

    It features a potentially annoying "parameter hiding" feature (the directory and image are encoded into the query string), a one-image-ahead preload, and very simple navigation. It was written at the request of Eternal_Student on Experts Exchange.


    <?php
    ###### user settings ################################

    $defaultDir = 'img'; // this must be set

    #####################################################

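    // The state (directory, image) is hidden in the query string as base64-encoded
    // JSON. (Calling unserialize() on user input is unsafe.)
    $dir = $image = null;
    $args = array_keys($_GET);
    if (count($args) > 0) {
        $state = json_decode(base64_decode($args[0]), true);
        if (is_array($state) && count($state) == 2) list($dir,$image) = $state;
    }

    // set a default dir
    if (!$dir) $dir = $defaultDir;

    // validate directory
    if (preg_match("#(\.|/)#", $dir)) die('dots and slashes in dir disallowed');
    if (!is_dir($dir)) die('bad directory');

    include($dir."/captions.php"); // loads in captions

    // load all the images into an array
    $images = array_keys($captions);

    // if there's no image specified, use a default
    if (!$image) $image=$images[0];

    if (preg_match("#/#", $image)) die('slashes in image disallowed');

    // only allow images listed in captions.php
    if (!in_array($image, $images)) die('unknown image');

    if (!file_exists("$dir/$image")) die("image does not exist");

    // draw the navigation
    $nav = '';
    for($i=0; $i < count($images); $i++)
    {
    $img = $images[$i];
    // urlencode() keeps base64's '+', '/' and '=' intact in the query string
    $arg = urlencode(base64_encode(json_encode(array($dir,$img))));
    if ($img != $image)
    $nav .= "<a href='?$arg'>".($i+1)."</a>";
    else
    $nav .= ($i+1).' ';
    if ($i < count($images) - 1) $nav .= ', ';
    }

    // preload the image after the current one (or the current one, at the end)
    $current = array_search($image, $images);
    $nextImage = isset($images[$current+1]) ? $images[$current+1] : $image;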
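    ?>
    <html>
    <head>
    <style type="text/css">
    body {
    background-color: #eee;
    font-size: 80%;
    font-family: "HelveticaNeue","Arial","Verdana";
    }
    .nav {
    text-align: center;
    padding: 5px;
    background-color: #ddd;
    }
    .content {
    text-align: center;
    padding: 30px;
    background-color: white;
    height: 100%;
    }
    .caption {
    padding: 10px;
    }
    </style>
    </head>
    <body>
    <div class="nav"><?=$nav?></div>
    <div class="content">
    <img src='<?=$dir?>/<?=$image?>'>
    <div class="caption"><?=$captions[$image]?></div>
    </div>
    <img src="<?=$dir?>/<?=$nextImage?>" width="1" height="1" />
    </body>
    </html>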

    Keyword Analysis or Discovery (a first try)

    I was messing around with some textual analysis, trying to figure out how to do a "related articles" feature in Drupal. The problem with most systems is that they require someone to choose tags, which is additional work on top of the writing and initial categorization.

    This script uses the simple SEO technique of counting unique words, pairs, and triplets. The output produced is not a "related stories" list, but, it's a starting point.

    <?php
    /* takes text as input (on stdin), produces a list of keywords with counts */
    $common_words = array('the','be','to','of','and','a','in','that','have','i','it','for','not','on','with',
    'he','as','you','do','at','this','but','his','by','from','they','we','say','her','she','or','an','will',
    'my','one','all','would','there','their','what','so','up','out','if','about','who','get','which','go',
    'me','when','make','can','like','time','no','just','him','know','take','people','into','year','your',
    'good','some','could','them','see','other','than','then','now','look','only','come','its','over','think',
    'also','back','after','use','two','how','our','work','first','well','way','even','new','want','because',
    'any','these','give','day','most','us','is','are',"don't",'has','was',
    'by the','of a','to the','of the','on the','and the','in the','has also','for this',
    'which is','in a','not to','but it','is that','that is');
    
    $topic_words = array('immigration'=>1,'immigrant'=>1,'action'=>1,'police'=>1,'environment'=>1,'liberation'=>1,
    'undocumented'=>1,'election'=>1,'gay'=>1,'pesticides'=>1);
    
    $connector_words = array('in'=>1,'is'=>1,'which'=>1,'that'=>1,'for'=>1,'and'=>1,'but'=>1,'to'=>1,'the'=>1,'from'=>1);
    
    $text = '';
    $fh = fopen("php://stdin","r");
    while( ($line = fgets($fh)) !== false )
    	$text .= $line;
    
    $result = calc($text);
    print_r($result);
    
    function calc( $text ) 
    {
    	global $common_words,$topic_words, $connector_words;
    	$common = array();
    	foreach($common_words as $word)
    		$common[$word] = 1;
    	$o = array(); // keyword => count
    	$t = array(); // topic word => count
    	$text = strtolower($text);
    	// the /u modifier makes the curly quotes match as whole UTF-8 characters
    	$text = preg_replace("/['’]s/u",'',$text); // no possessives
    	$text = preg_replace("/[&;:\"“”\(\),.~]/u",'',$text);
    	$text = preg_replace("/[\n\r]+/",' ',$text);
    	$words = explode(' ',$text);
    	
    	// now make an array of pairs, skipping pairs that touch a connector word
    	$pairs = array();
    	$last_word = '';
    	foreach($words as $word)
    	{
    		if ($last_word and $word and empty($connector_words[$word]) and empty($connector_words[$last_word])) array_push($pairs, $last_word . ' ' . $word);
    		$last_word = $word;
    	}
    	foreach($pairs as $pair)
    	{
    		if (empty($common[$pair]))
    		{
    			if (empty($o[$pair])) $o[$pair]=0;
    			$o[$pair]++;
    		}
    	}
    
    	// now make an array of triplets
    	$triplets = array();
    	$word1 = $word2 = '';
    	foreach($words as $word3)
    	{
    		if ($word1 and $word2 and $word3 and empty($connector_words[$word1]) and empty($connector_words[$word2]) and empty($connector_words[$word3])) array_push($triplets, $word1 . ' ' . $word2 . ' ' . $word3);
    		$word1 = $word2;
    		$word2 = $word3;
    	}
    	foreach($triplets as $word)
    	{
    		if (empty($common[$word]))
    		{
    			if (empty($o[$word])) $o[$word]=0;
    			$o[$word]++;
    		}
    	}
    
    	// count single words
    	foreach($words as $word)
    	{
    		if (empty($common[$word])) 
    		{
    			if (isset($o[$word]))
    				$o[$word]++;
    			else 
    				$o[$word] = 1;
    		}
    	}
    	unset($o['']);
    	// drop anything that appears only once, then sort by count
    	foreach($o as $key=>$value)
    		if ($value == 1)
    			unset($o[$key]);
    	asort($o);
    	// count the topic words separately
    	foreach($words as $word)
    	{
    		if (!empty($topic_words[$word]))
    		{
    			if (isset($t[$word]))
    				$t[$word]++;
    			else
    				$t[$word] = 1;
    		}
    	}
    	return array($o,$t);
    }
    

    Sample output:

    johnk@johnk-desktop:~/Sites/test$ php keyword.php < text8
    Array
    (
        [0] => Array
            (
                [environmental] => 2
                [parents] => 2
                [pesticides] => 2
                [robina suwol] => 2
                [organizations] => 2
                [internationally] => 2
                [pest] => 2
                [safety] => 2
                [6] => 2
                [more] => 2
                [heart of] => 2
                [california safe] => 2
                [heart] => 2
                [california safe schools] => 2
                [robina] => 2
                [nation] => 2
                [students] => 2
                [green] => 3
                [health] => 3
                [safe] => 3
                [suwol] => 3
                [safe schools] => 3
                [schools] => 3
                [children] => 3
                [css] => 3
                [california] => 4
                [policy] => 4
                [school] => 7
            )
    
        [1] => Array
            (
                [pesticides] => 2
            )
    )
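The pair and triplet loops above are two hand-written cases of one idea: slide a window of n words across the text and count each window. A compact sketch that generalizes it to any n follows. The function name ngram_counts() is hypothetical, it skips any n-gram that contains a stop word (a simpler rule than the connector/common-word split above), and the stop-word list in the example is a tiny illustrative subset.

```php
<?php
// Count word n-grams with a sliding window, most frequent first.
function ngram_counts($text, $n, array $stopWords = array())
{
    // lowercase, keep letters/digits/apostrophes, split on anything else
    $words = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $grams = array();
    for ($i = 0; $i + $n <= count($words); $i++) {
        $window = array_slice($words, $i, $n);
        if (array_intersect($window, $stopWords)) {
            continue; // skip n-grams that contain a stop word
        }
        $grams[] = implode(' ', $window);
    }
    $counts = array_count_values($grams);
    arsort($counts);
    return $counts;
}

$text = 'Safe schools policy. The safe schools policy protects children in safe schools.';
print_r(ngram_counts($text, 2, array('the', 'in')));
```

Calling it with n = 1, 2, and 3 and merging the results gives roughly the same kind of list as the script above.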
    

    Knowmore.org Web Scraper

    A little script that scrapes the Knowmore.org page and spits out all the pages via JavaScript.

    A newer version is up at PHPClasses, as KnowMore. [Both are obsolete.]

    Attachment: webscraper.php.txt (1.61 KB)

    Logging Very Slow SQL Queries, in PHP

    MySQL has a feature to log slow queries, and it's nice, but the problem is that a lot of the queries look alike. What you want is a backtrace, so you can find the code that created the query. This is a modification to (pretty much) any db abstraction layer: log each query, and write a backtrace if the execution time is long. If a script doesn't have any slow queries, its log file is deleted.

    The bad news is, if a query times out or the page loading stops, the code to delete the log file won't execute, and you'll be stuck with an irrelevant log file. To fix that, I run a cron job to delete small log files. It's not a perfect solution, but it helps clean up /tmp.

    (A better solution would be to use grep's return value as input to a statement that deletes a file. That would probably need to be in a script, to be safe.)

     
       function execute_statement_return_autokey($sql)
        {   
            //runs a SQL statement for an INSERT and returns the ID of the
            //row added
            global $db_debug;
            global $db_debug_save_only_backtraces;
            global $db_debug_backtrace; // set to true once any query in this process is slow
            if ($db_debug) {
                    $db_debug_filename = '/tmp/sf-active-db-log-'.getmypid();
                    $fh = fopen($db_debug_filename,'a+');   
                    $time_start = microtime(true);
                    fwrite($fh,"---start execute_statement_return_autokey \n");
                    fwrite($fh, $sql);
            }
    
            // execute query here...     
    
            if ($db_debug) { 
                    $elapsed = microtime(true) - $time_start;
                    if ($elapsed > 10) {
                        $db_debug_backtrace = true;
                        fwrite($fh, "\n--------------backtrace-------\n");
                        fwrite($fh, serialize(debug_backtrace()));
                    }
                    fwrite($fh,"\n---end, $elapsed seconds\n");
                    fclose($fh);
                    // keep the log only if some query in this process needed a backtrace
                    if ($db_debug_save_only_backtraces && !$db_debug_backtrace)
                        unlink($db_debug_filename);
            }
    
            // other code here, like error checking
        }
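The cron cleanup can also be done in PHP instead of with grep: delete any log that never got a backtrace marker, but skip files modified in the last few minutes, since their process may still be writing. The function name cleanup_query_logs() and the ten-minute default are assumptions; the file prefix and the marker string come from the logging code above.

```php
<?php
// Delete per-process query logs that contain no backtrace; returns the deleted paths.
function cleanup_query_logs($pattern, $minAgeSeconds = 600)
{
    $deleted = array();
    $files = glob($pattern);
    if ($files === false) {
        return $deleted; // glob error: nothing to do
    }
    foreach ($files as $file) {
        // skip recent files: their script may still be running
        if (time() - filemtime($file) < $minAgeSeconds) {
            continue;
        }
        $contents = file_get_contents($file);
        if ($contents !== false && strpos($contents, '--------------backtrace-------') === false) {
            unlink($file);
            $deleted[] = $file;
        }
    }
    return $deleted;
}

// Example cron usage:
// cleanup_query_logs('/tmp/sf-active-db-log-*');
```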
    

    MailGust Hacking Notes

    MailGust is a mass mailer application. It's a fairly complex program, but, that's part of what makes it interesting. It's not straightforward like most PHP scripts -- that is, it is programmed by people who like to program, and use potentially confusing techniques.

    This document describes, in some detail, the process of adding a first name field to the Subscription class. The database has been altered so there's a TINYTEXT field named "name" in the maillist_subscription table.

    The application is based around the framework in /gorum.

    index.php brings together a few things and eventually loads gorum/init.php, which runs gorumMain().

    Application-related values are set in constants.php. The defaultMethod is "showhtmllist" and the defaultClass is "maillist". This means that the default thing you see is a list of maillists.

    The application state is stored in the URL. The 'method' parameter is used to set the 'method' property of the class. The class is extracted from the 'list' parameter. (Think of the list as a table in the database.)

    Methods are switched via the allowedMethods array, defined in gorum/gorumlib.php and constants.php. The method name maps to some PHP code, and that code is "eval'd".

    list=subscription_ok means that the Subscription class is instantiated. method=showhtmllist means the showhtmllist method is used. (In allowedMethods, showhtmllist maps to the code "$ret=$base->showHtmlList($s);".)
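To make the eval()-based dispatch concrete, here's a toy version of the pattern. The class, the $classes lookup table, and the $s argument are made up; this is not gorum's actual code. The 'list' value picks the class, the 'method' value picks an allowedMethods entry, and the mapped code string is eval'd in a scope where $base and $s exist.

```php
<?php
// Toy illustration of allowedMethods-style dispatch (hypothetical names throughout).
class Subscription
{
    public function showHtmlList($s)
    {
        return "subscription list ($s)";
    }
}

$allowedMethods = array(
    'showhtmllist' => '$ret=$base->showHtmlList($s);',
);

function dispatch(array $request, array $allowedMethods)
{
    $classes = array('subscription_ok' => 'Subscription'); // 'list' value => class name
    if (!isset($classes[$request['list']], $allowedMethods[$request['method']])) {
        return null; // unknown list or method: refuse instead of eval'ing arbitrary input
    }
    $className = $classes[$request['list']];
    $base = new $className();
    $s = 'page 1';
    $ret = null;
    eval($allowedMethods[$request['method']]); // the eval'd code sets $ret
    return $ret;
}

echo dispatch(array('list' => 'subscription_ok', 'method' => 'showhtmllist'), $allowedMethods), "\n";
// prints: subscription list (page 1)
```

Only strings from the allowedMethods table are ever eval'd, never anything taken directly from the URL; that whitelist is what keeps the pattern from being an injection hole.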

    showHtmlList() (defined in Object) calls loadHtmlList, which calls base->getListQuery(). That function generates a string that's used to call base->loadObjectSQL().

    getListQuery() is defined in HtmlList.php. It calls base->getListSelect(), base->getOrderBy(), and base->getLimit(). The results of these are used to build up the query string.

    getListOrderBy() and getLimit() are defined in the HtmlList class, and automatically write code to sort and show a range of data. They use cookies to store the current page state.

    getListSelect() is defined in the base class. The query in there is altered so the "name" field is included in the query.

    The array at the top of subscription.php is subscription_typ. This array is a description of the list "subscription" and describes both the table in the database, and the Subscription class in the application. This element is added to the array: "name"=>array( "type"=>"TINYTEXT", "length"=>64, "text", "list", "details"),

    Note that there are more elements in the array than in the table. The listName element maps to the list name column in the web page this class helps generate. The value of that column is calculated within this class.

    The table is rendered by showHtmlList(). It scans through the subscription_typ array, and some elements become columns in the table. Elements with values of "no column" and "form invisible" will not be shown. Others will be shown. (Note that this _typ array does not affect the query used to populate the table. This is set in getListSelect(). The query generated should return columns named to match the elements in the _typ array, however.)

    The table headings are taken from the lang/lang_en.php file.
    Alter the lang/lang_en.php file, adding this line:
    $lll["subscription_name"]="Name";

    The key is generated by concatenating the class name, _, and the element from _typ.

    In gust, each class maps to a table in the database. The Import class maps to maillist_import in the db. This is unusual, because typical applications would not save the data fed to their import scripts. This is how gust does it.

    Forms for Import are generated via the generForm() function in form.php. It's called by the Object::generForm() method in the parent class for Import. generForm() uses the data in the _typ variable to generate the form.

    Importation modifications are made to the Import class.

    The importation process is interesting. It saves the import data to the db. Then it parses it into TempImport. Then, these are moved over to Subscriptions. This last part is done via a cron, so it can be batched. Thus, a new "name" field needs to be added to TempImport to hold our additional data.

    If you're having a hard time reading the code, processing it with phpDocumentor might help.

    Map

    Perl has a nice feature called "map". It's similar to array_walk(), except not clunky. It looks like this:

    map( EXPR, @array );

    OR

    map { ... } @array;

    The latter form is powerful. You can stack it like this:

    map { ... } ( map { ... } @array );

    PHP lacks this nice syntax. It does, however, have array_walk(), and that can work in a pinch.
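    For comparison, a short sketch of the PHP side. array_walk() hands each element to the callback by reference and itself returns only true or false, so it suits in-place updates. array_map(), also built into PHP, returns a new array, which makes it the closer match to Perl's map and lets the calls nest like the stacked form above.

```php
<?php
$nums = array(1, 2, 3, 4, 5);

// array_walk(): the callback gets each element by reference and the call
// itself returns only true/false, so it suits in-place updates.
$copy = $nums;
array_walk($copy, function (&$value, $key) {
    $value = $value * 10;
});
print_r($copy);      // 10, 20, 30, 40, 50

// array_map(): returns a new array, so calls nest the way Perl's
// map { $_ + 1 } ( map { $_ * 2 } @nums ) does.
$stacked = array_map(
    function ($x) { return $x + 1; },
    array_map(function ($x) { return $x * 2; }, $nums)
);
print_r($stacked);   // 3, 5, 7, 9, 11
```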

    Mass Convert IP Addresses to Domain Names with a Filter

    A PHP script to convert many IP addresses to domain names. You can paste text with addresses and it'll convert them.
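    <?php
    
    if (!empty($_POST['text'])) {
            $t = $_POST['text'];
            // Replace every dotted-quad address with its hostname, calling
            // get_host() once per match.
            $o = preg_replace_callback('/\\d+\\.\\d+\\.\\d+\\.\\d+/', function ($m) {
                    return get_host($m[0]);
                    # return gethostbyaddr($m[0]);
            }, $t);
            // Escape the result before echoing it into the page.
            echo '<pre>'.htmlspecialchars($o).'</pre>';
    }
    
    // Look up the PTR record for an IPv4 address; return the address
    // unchanged when there is no record.
    function get_host($ip){
            $ptr= implode(".",array_reverse(explode(".",$ip))).".in-addr.arpa";
            $host = dns_get_record($ptr,DNS_PTR);
            if (empty($host)) return $ip;
            else return $host[0]['target'];
    }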
    
    ?><html>
    <body>
    <p>This script takes text with embedded IP addresses as input, and converts the addresses
    to domain names.</p>
    <form method="post">
    <textarea name="text" rows="10" cols="80">
    </textarea>
    <br />
    <input type="submit" />
    </form>
    </body>
    </html>
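    The pattern above also matches strings that aren't addresses, such as 999.1.2.3. A small sketch of the PTR-name step with filter_var() validation added; ptr_name() is a hypothetical helper, and like the original it handles IPv4 only. get_host() could call it and skip the DNS lookup when it returns null.

```php
<?php
// Build the reverse-DNS (PTR) name for an IPv4 address, or return null
// when the string is not a valid IPv4 address.
function ptr_name(string $ip): ?string
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null;
    }
    // Reverse the four octets and append the reverse-lookup zone.
    return implode('.', array_reverse(explode('.', $ip))) . '.in-addr.arpa';
}

echo ptr_name('192.0.2.10'), "\n";   // 10.2.0.192.in-addr.arpa
var_dump(ptr_name('999.1.2.3'));     // NULL
```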
    

    Method Chaining, some opinions

    A few years back, method chaining became all the rage, and now it's common. I'm not sure it's a good thing, though: you gain only a little syntactic clarity, at the cost of losing the method's return value.

    In a functional programming style, you write code like this:

    a = f(g(h(x)));

    Or you write it Lisp style:

    (setq a (f (g (h x))))

    Pretty much the same thing: each function's return value is the input to the next function.

    The Unix shell parallel is pipes:

    cat a | sort | uniq | more

    The output of one command becomes the input of the next.

    In method chaining, there's a big difference: each method call returns a reference to the object, and thus the object's context:

    $f = new Foo();
    $f->set_a(10)->set_b(30);

    Fine. Both set_a and set_b return $f, so you can make another method call on the object.

    This seems okay when all you're doing is setting, but suppose you wanted to change the type of the object:

    $f = new Foo();
    $f->to_string()->split(',')->apply(function ($a) {echo $a;});
    

    I'm not sure I could do that in PHP, but I think it would be a better way to use that return value.
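    For what it's worth, PHP can do this, as long as each method returns an object that has the next method in the chain. A minimal sketch; Foo, Str and Collection are hypothetical classes made up for the example, not an existing library.

```php
<?php
// Each method returns an object of the type the next call needs,
// so the chain changes type as it goes: Foo -> Str -> Collection.
class Collection
{
    private $items;
    public function __construct(array $items) { $this->items = $items; }
    // Call $fn on every item; return $this so the chain could continue.
    public function apply(callable $fn)
    {
        array_walk($this->items, $fn);
        return $this;
    }
}

class Str
{
    private $s;
    public function __construct(string $s) { $this->s = $s; }
    public function split(string $sep) { return new Collection(explode($sep, $this->s)); }
}

class Foo
{
    private $parts = array('a', 'b', 'c');
    public function to_string() { return new Str(implode(',', $this->parts)); }
}

$f = new Foo();
$f->to_string()->split(',')->apply(function ($a) { echo $a; });   // prints abc
```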

    Of course, setters and getters are different beasts. The setters can be chained. The getters all return values that are not the object itself.

    What about appending an object:

    $f = new Foo();
    $f->append($x);

    The append() method should probably return $f.

    What about deleting an object:

    $f = new Foo();
    $f->delete_like('x*');

    The delete_like() method should probably return $f.
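    A minimal sketch of a class that follows these rules: every mutator (set_a(), set_b(), append(), delete_like()) returns $this, and getters return values. Foo is hypothetical, and the 'x*' argument is assumed to be a shell-style pattern, so delete_like() uses PHP's fnmatch().

```php
<?php
// Hypothetical Foo: every mutator returns $this, so calls chain;
// getters return a value instead.
class Foo
{
    private $a;
    private $b;
    private $items = array();

    public function set_a($v) { $this->a = $v; return $this; }
    public function set_b($v) { $this->b = $v; return $this; }
    public function get_a() { return $this->a; }

    public function append($x) { $this->items[] = $x; return $this; }

    // Remove every item matching a shell-style pattern such as 'x*'.
    public function delete_like($pattern)
    {
        $this->items = array_values(array_filter($this->items,
            function ($item) use ($pattern) { return !fnmatch($pattern, $item); }));
        return $this;
    }

    public function items() { return $this->items; }
}

$f = new Foo();
$f->set_a(10)->set_b(30)
  ->append('xray')->append('apple')->append('xenon')
  ->delete_like('x*');
print_r($f->items());   // only 'apple' is left
```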

    Perhaps my suspicions about method chaining are all wrong.

    Mixed Object Types in ContentIterWriter

    Josh and I were congratulating ourselves on the relative goodness of ContentIterWriter (aka ContentObject), and it's been bouncing around my brain a while.

    There was a recent situation where I thought I'd try to put multimedia objects into CIW, and had a hard time. The job that needed this never materialized, so I never worked it out. I was thinking of creating a hierarchy of tables, with a general table for all objects and a table for each specific type. (That was a bad idea.) Today I think the right way to do it isn't to create a super-type that can hold all types of media, but to create an object that takes CIW objects as input and merges their results, sorting on a specified field name for each object.

    The only constraint is that the CIWs are pre-sorted.

    On a call to ->next(), the aggregator object would compare the sort values at the head of each CIW, and pop the one that comes first.
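    That aggregator is an ordinary k-way merge. A minimal sketch, assuming each input is already sorted and has a next() method that returns the next row as an array, or null when it runs out. ArrayIter, MergedIter and the 'date'/'posted' fields are made up for the example; they are not part of ContentIterWriter.

```php
<?php
// Stand-in for a CIW: hands out pre-sorted rows one at a time.
class ArrayIter
{
    private $rows;
    public function __construct(array $rows) { $this->rows = $rows; }
    // Returns the next row, or null when there are none left.
    public function next() { return array_shift($this->rows); }
}

// Merges several pre-sorted iterators into one sorted stream (a k-way merge).
class MergedIter
{
    private $iters = array();   // the input iterators
    private $fields = array();  // the sort field for each input
    private $heads = array();   // the next unread row of each input, or null

    // $sources is a list of array(iterator, sortFieldName) pairs.
    public function __construct(array $sources)
    {
        foreach ($sources as $i => list($iter, $field)) {
            $this->iters[$i] = $iter;
            $this->fields[$i] = $field;
            $this->heads[$i] = $iter->next();
        }
    }

    public function next()
    {
        // Find the input whose head row has the smallest sort value.
        $best = null;
        foreach ($this->heads as $i => $row) {
            if ($row === null) {
                continue;
            }
            if ($best === null
                || $row[$this->fields[$i]] < $this->heads[$best][$this->fields[$best]]) {
                $best = $i;
            }
        }
        if ($best === null) {
            return null;   // every input is exhausted
        }
        // Pop that row, then refill the head from the same input.
        $row = $this->heads[$best];
        $this->heads[$best] = $this->iters[$best]->next();
        return $row;
    }
}

$articles = new ArrayIter(array(
    array('date' => '2008-01-05', 'title' => 'Article A'),
    array('date' => '2008-03-01', 'title' => 'Article C'),
));
$videos = new ArrayIter(array(
    array('posted' => '2008-02-10', 'clip' => 'Video B'),
));
$all = new MergedIter(array(array($articles, 'date'), array($videos, 'posted')));
while ($row = $all->next()) {
    echo implode(' ', $row), "\n";
}
```

    For the range problem mentioned below, the merged rows could be read into an array once (the cache) and sliced with array_slice($rows, $offset, $length).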

    The display logic can be unrolled into a big switch-like statement. (The templating system doesn't allow for that, but maybe it should.)

    The only problem is that you can't select a range of results, like in a regular query. The simplest solution to this is to cache the results, and then read from the cached results.

    NVU HTML to PHP Parser

    3-10-2008: I never really ended up using this tool.

    The simplified template system described elsewhere was working pretty well, but I was faced with making a number of web forms. I hate coding these up by hand because there are always changes, and it's a pain to edit the table layouts. (CSS isn't quite right.) So, I wrote a tool that converts the HTML code into the .PHT format. .PHT is a very small subset of PHP, consisting of echoing object properties, calling object methods, the ternary operator (? :), and while loops that call the next() method.

    I decided to edit the web forms in NVU. NVU is a visual HTML editing tool, based on Mozilla Composer, but better. My rationale for using NVU was simple: I like to use it, it's free, and it's better than editing HTML code by hand. It would speed up the process of making templates, possibly significantly. It also lets the users edit the forms first, and then the programmer builds the app behind them. Here's what I did:

    Whenever I needed to introduce a variable substitution, I typed it like this:

    $variable
    
    $object->method()
    
    $object->property
    

    To insert a literal dollar sign ($), put a backslash in front of it: \$
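    The two dollar rules can be sketched on their own. pht_dollars() is a hypothetical helper, not a method of the HtmlToPht class below: it uses a single regular expression with a negative lookbehind for the backslash, instead of the class's several passes, and it also accepts -&gt;, the way NVU's HTML spells ->.

```php
<?php
// Hypothetical helper: apply the $variable rules to one line of NVU HTML.
function pht_dollars(string $line): string
{
    // Wrap each $name, $obj->prop or $obj->method() that is not preceded
    // by a backslash. NVU writes "->" as "-&gt;", so accept both spellings.
    $line = preg_replace_callback(
        '/(?<!\\\\)\$[a-zA-Z_]\w*(?:(?:->|-&gt;)[a-zA-Z_]\w*(?:\(\))?)?/',
        function ($m) {
            return '<?php echo ' . str_replace('-&gt;', '->', $m[0]) . '; ?>';
        },
        $line
    );
    // An escaped \$ becomes a literal dollar sign.
    return str_replace('\\$', '$', $line);
}

echo pht_dollars('Hello, $name'), "\n";
echo pht_dollars('$user-&gt;email'), "\n";
echo pht_dollars('Cost: \$total'), "\n";
```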

    In some places, I needed to loop over error messages. To do this, I typed:

    $errors->message
    

    Then, I inserted a comment on the line above it, and a comment on the line below it. The comments are on their own lines. The comment above had this content:

    while $errors
    

    The comment below had this content:

    endwhile
    

    In NVU, the comment is inserted from the Insert menu. When you insert the comment, make sure it's on its own line. This will show up as a yellow question mark. In the sources, it'll show the comment followed by a BR tag; don't sweat it -- the compiler will remove the tag.

    There's a feature that lets you put more than one template into the HTML file, and have it split out into multiple files. This is done with the @file() directive. The following will save the code below it into "signin.pht".

      @file(signin.pht)
    

    I created the templates.html file (see attachment).

    Then, I created a page with this content:

    <?php
    include 'HtmlToPht.class.php';
    
    $p = new HtmlToPht( 'templates.html' ); // the name of the file I saved 
    ?>
    

    And executed it. This emitted several .pht files, each one a functioning template in the simple template language.

    Here's the code for HtmlToPht.class.php:

    
    /**
     * This script converts NVU output to PHT templates.
     * It sprinkles a little syntactic sugar over the HTML.
     * $thing becomes <?php echo $thing; ?>, unless it looks like \$thing, which becomes $thing.
     * <!--while $thing--> becomes <?php while( $thing->next() ) : //begin thing ?>
     * and <!--endwhile--> becomes <?php endwhile; //end thing ?>
     * ALSO, if the above comments are followed by a <br />, the break is stripped.
     *
     * @file(filename.pht) starts writing the data to filename.pht
     * The first lines are written to the template's filename, but with a pht extension.
     *
     * Overall, the code is pretty crude.
     */
    
    class HtmlToPht 
    {
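    	var $whileStack;   // names of the open while loops, innermost last
    	var $indent;       // current indentation depth of the output
    	var $template;     // path of the input HTML file
    	var $f;            // input file handle
    	var $newFileName;  // default output file: the template name with .pht
    	var $output;       // current output file handle
    
    	// Constructor: converts the whole template file in one pass.
    	function __construct( $template )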
    	{
    		$this->template = $template;
    		$this->f = fopen( $this->template, 'r' );
    		$this->newFileName = preg_replace( '/\\.html$/', '', $this->template ).'.pht';
    		$this->output = fopen( $this->newFileName, 'w' );
    		$this->whileStack = array();
    		$this->indent = 0;
    
    		// skip the <head>; templates start after the <body> tag
    		while( ($line = fgets( $this->f )) !== false )
    		{
    			if ( preg_match( '/^<body/i', $line ) ) break;
    		}
    		while( ($line = fgets( $this->f )) !== false )
    		{
    			$line = rtrim($line);
    			if ( preg_match( '#^@file\\((.+)\\)#', $line, $matches ) )
    			{
    				$filename = $matches[1];
    				if ($this->output)
    					fclose($this->output);
    				$this->output = fopen( $filename, 'w' );
    			}
    			else if ( preg_match( '#</body>#', $line ) )
    			{
    				fclose($this->output);
    				break;
    			}
    			else
    			{
    				$indented = false;
    				$newline = $line;
    				$newline = $this->replace_while( $newline );
    				if ($newline != $line)
    					$indented = true;
    				$newline = $this->replace_endwhile( $newline );
    				if ($newline==$line)
    					$newline = $this->replace_dollars( $newline );
    
    				if ($newline)
    					fwrite( $this->output, str_repeat( '    ', $this->indent ) . $newline . "\n" );
    
    				if ($indented)
    					$this->indent++;
    			}
    		}
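    		fclose( $this->f );
    		// close the output if no </body> line closed it already
    		if ( is_resource( $this->output ) )
    			fclose( $this->output );
    		// clean up empty file (e.g. when @file came before any content)
    		clearstatcache();
    		if ( filesize( $this->newFileName ) == 0 )
    			unlink( $this->newFileName );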
    	}
    
    	function replace_dollars( $line )
    	{
    		$out = $line;
    		$out = preg_replace( '/([^\\\\])(\\$[a-zA-Z&;()-]+)/', '\\1<?php echo \\2; ?>', $out );
    		if ($out == $line)
    			$out = preg_replace( '/^(\\$[a-zA-Z&;()-]+)/', '<?php echo \\1; ?>', $out );
    		// NVU writes -> as -&gt;; turn it back into -> inside the echo tags
    		$out = preg_replace( '/(echo \\$[a-zA-Z]+-)&gt;([a-zA-Z()]+;)/', '\\1>\\2', $out );
    		if ($out == $line)
    			$out = preg_replace( '/\\\\(\\$[a-zA-Z&;()-]+)/', '\\1', $out );
    		return $out;
    	}
    	function replace_while( $line )
    	{
    		$out = $line;
    		// remember the loop variable so the matching endwhile can name it
    		if ( preg_match( '/<!--while \\$([a-zA-Z]+)-->/', $out, $matches ) )
    			array_push($this->whileStack, $matches[1]);
    		$out = preg_replace( '/<!--while \\$([a-zA-Z]+)-->/', 
    			'<?php while( $\\1->next() ) : //begin \\1 ?>', $out );
    		if ($out != $line)
    			$out = preg_replace( '#(.+)<br />$#', '\\1', $out );
    		return $out;
    	}
    	function replace_endwhile( $line )
    	{
    		$out = $line;
    		if (preg_match( '/<!--endwhile-->/', $line ) )
    		{
    			$while = array_pop($this->whileStack);
    			$out = preg_replace( '/<!--endwhile-->/', 
    				'<?php endwhile; //end '.$while.' ?>', $out );
    			if ($out != $line)
    				$out = preg_replace( '#(.+)<br />$#', '\\1', $out );
    			$this->indent--;
    			$this->indent = max( 0, $this->indent );
    		}
    		return $out;
    	}
    
    }
    

    This is an ad-hoc little language. The parser is primitive and difficult to extend.

    Attachments: templates.html.txt (2.35 KB), screenshot_nvu.jpg (38.36 KB)

    Name Based Dispatch

    Here's a simple way to do name-based dispatch:

    $dispatchTable = array(
        'First Thing' => 'functionName',
        'Second Way' => array('ClassName', 'StaticMethod')
    );
    ...
    $data = "Your data here, probably in an array.";
    call_user_func( $dispatchTable[ $name ], $data );

    This is nice, because you can put the dispatch table at the top of your code, or in a separate file. I'm using it in a payment processing system, where different products may be dispatched to different code. You get a level of indirection here: you can swap out different methods for each product (or use the same method).

    The only "trick" here is that call_user_func() takes a pseudo-type known as a callback. Callbacks come in three forms:

    Type     Example               Calls
    string   'fooFunc'             fooFunc()
    array    array('Foo','func')   Foo::func()
    array    array($foo,'func')    $foo->func()
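    A runnable version of the dispatch table that exercises all three callback forms. The product names and handlers are made up, and dispatch() is a hypothetical wrapper that refuses unknown names instead of passing an undefined entry to call_user_func().

```php
<?php
// Made-up handlers, one per callback form.
function shipBook(array $data) { return 'book: ' . $data['sku']; }

class Downloads
{
    public static function deliver(array $data) { return 'download: ' . $data['sku']; }
}

class Subscriptions
{
    public function start(array $data) { return 'subscription: ' . $data['sku']; }
}

$dispatchTable = array(
    'Book'         => 'shipBook',                           // function name
    'Ebook'        => array('Downloads', 'deliver'),        // static method
    'Subscription' => array(new Subscriptions(), 'start'),  // object method
);

// Look up the handler for $name and call it, failing loudly on unknown names.
function dispatch(array $table, $name, $data)
{
    if (!isset($table[$name]) || !is_callable($table[$name])) {
        throw new InvalidArgumentException("No handler for '$name'");
    }
    return call_user_func($table[$name], $data);
}

foreach (array_keys($dispatchTable) as $name) {
    echo dispatch($dispatchTable, $name, array('sku' => 'X100')), "\n";
}
```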

    No Eval? Variable Interpolation on PHP Code

    We recently turned off the websites' ability to use the eval() or related functions. In a small CMS I'd written a while back, I was using eval to interpolate variable names in strings. This was a simple way to do "lazy evaluation" on strings I was using as templates. With eval, there was no need to use a special templating syntax - the syntax was PHP's.

    Now, with eval turned off, I needed a function to interpolate variables in a string. Here it is:

    
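    function interpolate( $string )
    {
        // Replace each $name with the value of the global variable $name.
        // Matching whole names in one pass keeps $a from clobbering the start
        // of $ab, which a str_replace() loop over $GLOBALS can do.
        // Unknown (or non-scalar) variables are removed.
        return preg_replace_callback( '/[$](\\w+)/', function ( $m )
        {
            $name = $m[1];
            if ( isset($GLOBALS[$name]) && is_scalar($GLOBALS[$name]) )
                return (string) $GLOBALS[$name];
            return '';
        }, $string );
    }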
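    If you'd rather not expose every global to your templates, the same substitution can take an explicit array of values. interpolate_vars() is a hypothetical variant, not part of the CMS described above; like interpolate(), it drops names it can't resolve.

```php
<?php
// Hypothetical variant: substitute $name from an explicit array instead of
// $GLOBALS. Unknown or non-scalar values are replaced with ''.
function interpolate_vars(string $string, array $vars): string
{
    return preg_replace_callback('/\$(\w+)/', function ($m) use ($vars) {
        $value = $vars[$m[1]] ?? '';
        return is_scalar($value) ? (string) $value : '';
    }, $string);
}

echo interpolate_vars('Hello, $name. You have $count new $things.', array(
    'name'  => 'Rose',
    'count' => 3,
)), "\n";   // Hello, Rose. You have 3 new .
```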
    

    Overloading Overview

    Good news - overloading in PHP works after PHP 4.3.10.

    Bad news - it's kind of not-totally-right. First off, it's not like operator overloading. Rather, it's a hook that dispatches some functions if the properties or methods don't exist in the class.
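    In PHP 5 and later, these hooks are the magic methods __get(), __set() and __call(), which run only when the requested property or method isn't accessible. A minimal sketch with a made-up Record class:

```php
<?php
// __get/__set/__call run only for properties and methods that don't exist
// (or aren't accessible); real members bypass them entirely.
class Record
{
    private $data = array();
    public $title = 'untitled';   // a real property: the hooks never see it

    public function __get($name)
    {
        return $this->data[$name] ?? null;
    }

    public function __set($name, $value)
    {
        $this->data[$name] = $value;
    }

    public function __call($method, $args)
    {
        // Treat getFoo() as a read of the 'foo' entry.
        if (strncmp($method, 'get', 3) === 0) {
            return $this->data[lcfirst(substr($method, 3))] ?? null;
        }
        throw new BadMethodCallException("No method $method");
    }
}

$r = new Record();
$r->color = 'red';          // no $color property, so __set() runs
echo $r->color, "\n";       // __get(): red
echo $r->getColor(), "\n";  // __call(): red
echo $r->title, "\n";       // ordinary property: untitled
```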

    Best practices for PHP overloading, with some bug info. Basically, it says "don't overload a parent class, and overload only the final classes."

    Comments at the php.net site.

    PHP List Hack: Shared Codebase

    This is a trick to allow you to run one copy of the PHPList software on your server, and have separate configurations for each domain. The main advantage is that you don't have a bunch of copies of the same software all over the place. (PHPList takes up around 5 megs of disk. Using a similar technique with Drupal [which takes up 6 megs], we can install Drupal and PHPList in 260k of space, not including databases.)

    At the top of index.php and admin/index.php, there's some code that looks for the config.php file. This new code snippet allowed me to create a config file named "phplist.config.php" in the user's root (or the ftp root) instead of the usual places.

    The lists/ directory, which contains all the code, was moved to /usr/local, and the directory symlinked to the document root.

    The code at the top does a little bit of guessing about the domain name. We also have a preview link (to let us work on the site before it's launched), and we can detect the domain name from the URI if they're using the preview.

    The config file should be generated from a template, for the user.

    
    // johnk - shared codebase needs some info
    $domain = $_SERVER['SERVER_NAME'];
    $domain = preg_replace('/^www\./i','',$domain);
    if ($domain=='zanon.slaptech.net') {
        if (preg_match( '#/preview/(.+?)/#', $_SERVER["REQUEST_URI"], $matches ))
            $domain = $matches[1];
    }
    
    if (isset($_SERVER["ConfigFile"]) && is_file($_SERVER["ConfigFile"])) {
      print ''."\n";
      include $_SERVER["ConfigFile"];
    } elseif (isset($cline["c"]) && is_file($cline["c"])) {
      print ''."\n";
      include $cline["c"];
    } elseif (isset($_ENV["CONFIG"]) && is_file($_ENV["CONFIG"])) {
    #  print ''."\n";
      include $_ENV["CONFIG"];
    } elseif (is_file("/www/$domain/phplist.config.php")) {
      print ''."\n";
      include "/www/$domain/phplist.config.php";
    } elseif (is_file("../config/config.php")) {
      print ''."\n";
      include "../config/config.php";
    } else {
      print "Error, cannot find config file\n";
      exit;
    }
    

    In connect.php, a little hack enables staging:

    
    function Redirect($page) {
        if ($GLOBALS['staging']) {
            Header("Location: " . $GLOBALS['staging_url'] . $GLOBALS["adminpages"] . "/?page=$page");
            exit;
        } else {
            $website = getConfig("website");
            Header("Location: " . $GLOBALS['scheme'] . "://" . $website . $GLOBALS["adminpages"] . "/?page=$page");
            exit;
        }
    }
    

    And, in the config file, there are new definitions:

    $staging = 1;
    $staging_url = "http://zanon.slaptech.net/preview/mydomain.com";
    

    PHP Namespaces with Autoloader Example

    It took a while to wrap my mind around PHP namespaces - despite the fact I've needed them for years. It's just one of those features that seems weirder in PHP than in other systems. But that's normal for PHP - quirky. Unfortunately, it's not quirky like Perl, where the quirk eventually makes you feel good. With PHP you just feel kind of odd, maybe a little inferior... like your language is slipping toward becoming the Visual Basic of the web.

    Namespaces are mainly a way for vendors to keep their class names and function names from clashing with those of other vendors. There are other uses for them - but generally, one vendor will tend not to have naming clashes within its codebase - and if there is a clash, it can usually be resolved with a few meetings.

    Namespaces add another layer of indirection to the naming system so that you can avoid clashes with code you can't control. They also provide an aliasing feature, so if there is still a clash, there are ways around that, too.

    The tradeoff, with PHP, is code ugliness. Of course, that's usually the tradeoff with namespaces, so it's okay.

    Rather than get into the syntax, let's look at how the files are organized. We have a few PHP files in the docroot, and a classes directory. Within classes, we have a folder hierarchy based on reversed domain names (com/riceball for riceball.com), the convention Java uses for package names, so we'll use it here as well. It helps to avoid namespace clashes.

    johnk@tiny:~/Sites/test$ ls -R *
    index.php  lib.php  ns.php
    
    classes:
    com
    
    classes/com:
    riceball
    
    classes/com/riceball:
    sw  Test.class.php
    

    The only class we have is Test.class.php. That's using the .class.php naming convention for PHP classes. Here's the code for Test.

    <?php
    namespace com\riceball;
    
    class Test {
      function hello() {
        echo "<p>Hello, World</p>";
      }
    }
    

    It just prints Hello World. At the top, the namespace statement declares that all the things below are defined in the com\riceball namespace. The namespace we use can be any string, but our convention will be that the namespace matches the file path. The file path is com/riceball/Test.class.php. The namespace is com\riceball.

    This correspondence between the namespace and the file path will be key in creating an autoloader for this class.

    Autoloaders

    You can skip this if you know what an autoloader is.

    PHP has a feature where you can write a function, named __autoload(), that will be used to call "include" statements to include class definition code. This way you can stop writing those "require_once" statements at the top of your files. (It turns out that require_once is slow, because it does a file stat each time it's called.)

    The way it works is simple. When you try to use the Test class before it's defined, PHP calls __autoload('Test'), passing in the class name. The __autoload function can then prepend a path, append '.class.php', and require() that file. PHP then tries to use the class again and succeeds.

    When you're using namespaces, PHP passes the fully qualified class name, namespace included, to the autoloader. Here's the code for __autoload, which is defined in lib.php:

    <?php
    define('CLASS_PATH','/home/johnk/Sites/test/classes/');
    function __autoload($cn) {
      require(CLASS_PATH.str_replace('\\','/',$cn).'.class.php');
    };

    What this does is construct an absolute path to the class' code. Absolute paths are the best because they load faster than relative paths.

    PHP will call __autoload with 'com\riceball\Test'. That gets mangled and turned into /home/johnk/Sites/test/classes/com/riceball/Test.class.php, which is then loaded.
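    __autoload() was deprecated in PHP 7.2 and removed in PHP 8.0; spl_autoload_register() does the same job and lets several autoloaders coexist. A self-contained sketch of the same path mapping: the temp-directory setup at the top exists only so the example runs on its own, and CLASS_PATH points there instead of /home/johnk/Sites/test/classes/.

```php
<?php
use com\riceball as RB;

// Demo setup only: write com/riceball/Test.class.php under a temp directory.
$root = sys_get_temp_dir() . '/autoload_demo/classes/';
if (!is_dir($root . 'com/riceball')) {
    mkdir($root . 'com/riceball', 0777, true);
}
$code = <<<'CODE'
<?php
namespace com\riceball;

class Test {
  function hello() {
    echo "<p>Hello, World</p>";
  }
}
CODE;
file_put_contents($root . 'com/riceball/Test.class.php', $code);

define('CLASS_PATH', $root);

// 'com\riceball\Test' -> CLASS_PATH . 'com/riceball/Test.class.php'
spl_autoload_register(function ($cn) {
    $file = CLASS_PATH . str_replace('\\', '/', $cn) . '.class.php';
    if (is_file($file)) {
        require $file;
    }
    // Otherwise do nothing: other registered autoloaders get a turn, and
    // PHP reports "Class not found" where the class is used.
});

$u = new RB\Test();
$u->hello();
```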

    "Use"-ing Namespaces

    Three different ways to use the namespace are demonstrated below. The first two are in index.php, and the last in ns.php. Here's the code in index.php:

    <?php
    require('lib.php');
    
    // The preferred style.
    
    use \com\riceball as RB;
    
    $u = new RB\Test();
    $u->hello();
    
    // A nicer looking style... but one that's slightly at risk
    // of name collisions.
    
    use \com\riceball\Test as Test;
    
    $t = new Test();
    $t->hello();
    

    While there are a few other ways to use classes within namespaces, I'm going to stick with these two. They are pretty explicit without being verbose. The first form is to "use" the namespace, and then alias it to "RB". Then, when we make an instance of Test, we have to call it RB\Test.

    PHP expands this to com\riceball\Test, and passes that to __autoload() which will then load the code for Test.

    The second form is to create an alias for the class itself. Again, this expands to the same name, and that's passed to __autoload().

    The third form is in ns.php:

    <?php
    // Within an application, maybe it's OK to use
    // namespace on every file.
    
    namespace com\riceball;
    
    require('lib.php');
    
    $t = new Test();
    $t->hello();
    
    echo 'global echo';
    

    This form was put into a separate file because the namespace keyword can be used only as the first statement. What this code says is "I'm also in the com\riceball namespace." (So that Test will resolve to com\riceball\Test, and again that's passed into __autoload().)

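    When your code is in a namespace, you can still call global functions, such as strlen(), and use language constructs like echo. If the namespace doesn't define a function by that name, PHP falls back to the global one.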

    I'm experimenting with putting all my code into a namespace. This way, the programming style, within the namespace, is identical to not using a namespace at all. When it comes time to re-use code that exists in another namespace, you will need to use one of the above two syntaxes to access the namespaced code.

    Some other web pages demonstrate using a hierarchy of namespaces in libraries, but I don't see value in that. Why use namespaces for their own sake? Instead, use them when you need a clean way to separate out your code from some other organization's code, without requiring any interaction with the other organizations.

    Granted, projects require lots of libraries, but libraries usually exist within a global community namespace, and they are generally managed by some organization or by a polite group consensus (people name their libraries based on what's already out there and popular). PHP has the PEAR namespace, PHPClasses is a kind of namespace, and each framework or library comes from a vendor, and within each collection of classes, there are no name clashes.

    There's no reason to add yet another organizing schema, in addition to naming conventions and classes (and I guess the way people use global arrays in PHP), when it's not necessary.

    Addendum: real world example

    I'm refactoring a small program and came across two issues. One was that the __autoload() function needs to be defined in the global namespace. That means either putting it in its own file, or using the namespace { } construct to hoist it out of whatever namespace is defined. I went with the former. Here's my actual __autoload() function:

    function __autoload($cn) {
      try {
        $f = (CLASS_PATH.str_replace('\\','/',$cn).'.class.php');
        if (!file_exists($f)) {
          throw new Exception('file not found');
        }
        require($f);
      } catch (Exception $e) {
        echo '<p>'.$e->getMessage().'</p>';
        echo 'class: '.$cn;
      }
    };
    

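    It's a bit less simple than I'd like, but it deals better with errors in your code. Instead of breaking in __autoload(), it'll give you a stack trace back to where your error is. The typical error is using a global class, such as Exception, inside a namespace without the \ prefix. (Functions and constants fall back to the global namespace, but classes don't, so PHP asks the autoloader for com\riceball\Exception.)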

    PHP OOP Links

    This is a list of links to articles about OOP in PHP. I need to read up on it. Though I know the OO features (most of them, at least), I wanted to read some code.

    http://michaelkimsal.com/blog/php-is-not-object-oriented/

    Noob:
    http://www.devarticles.com/c/a/PHP/Object-Oriented-Programming-in-PHP/
    http://www.codewalkers.com/c/a/Programming-Basics/Beginning-Object-Orien...
    http://net.tutsplus.com/tutorials/php/object-oriented-php-for-beginners/
    http://www.killerphp.com/tutorials/object-oriented-php/
    http://notan00b.com/tutorials/php-object-oriented-programming/
    http://oops.opsat.net/doc/php5/oop-basics-chap2.html

    Practitioner:
    http://www.friendsofed.com/book.html?isbn=1430210117
    Same book at Google Books
    At O'Reilly Safari
    http://php.net/manual/en/language.oop5.php
    http://objectorientedphp.com/
    http://www.massassi.com/php/articles/classes/
    http://www.developer.com/lang/php/article.php/3302171/The-Object-Oriente...
    http://www.ibm.com/developerworks/opensource/library/os-php-7oohabits/in...
    http://www.triconsole.com/php/oop.php
    http://zendgeek.blogspot.com/2009/07/php-object-oriented-programming.html

    Expert Craftsperson:
    Awesome list: http://www.gotapi.com/php/php_object_oriented_tutorials.html
    Awesome site: http://www.tonymarston.net/php-mysql/databaseobjects.html
    http://www.ibm.com/developerworks/library/os-php-designptrns/
    http://www.ibm.com/developerworks/opensource/library/os-php-designpatterns/
    (From the cool Developerworks site. PHP on DW)
    http://www.tonymarston.net/php-mysql/model-view-controller.html
    http://www.fluffycat.com/PHP-Design-Patterns/
    http://www.devarticles.com/c/a/PHP/Introduction-to-Design-Patterns-Using...
    http://www.php5dp.com/

    PHP Tutorial

    I used to have a PHP tutorial here, and it was somewhat heavily downloaded. It needed a revision that never happened. Anyway, it's lost, so here's a new one.

    Writing tutorials for newbies is fairly difficult. The subject matter is simple, but the exposition needs to be deliberate. Additionally, written instructions are linear, and follow a single path through the instruction. The problem is, we don't learn in a linear manner; we learn through repetition and by trying different things. So there's a mismatch between how we learn, and how we're taught.

    The short solution is for learners to try different tutorials, and for tutorial writers to present the material using different examples or in a different order.

    Here's another take on learning PHP.

    (Planned path: Reading Data, Defining New Functions, Reading Lots of Data and Arrays of Arrays, Split, Looping Over Arrays, Libraries of Functions, Writing Data, Data Formats (serialized, base64, CSV, tab delimited), Databases, Regular Expressions, Data Validation, Objects as Data Structures, Objects for Program Organization and Encapsulation, Looping Over Directories, A Database Abstraction Class, A Directory Abstraction Class.)

    Other Tutorials

    For starters, here are some links out to some existing tutorials.

    W3Schools PHP Tutorial

    Tizag's Tutorial

    Webmonkey's Tutorial

    Zend's tutorial

    Evaluation and Variables

    
    <p>
    <?php echo 1; ?>
    </p>
    
    
    <p>
    <?php echo 1 + 2; ?>
    </p>
    
    
    <p>
    <?php echo 2 * 3; ?>
    </p>
    
    
    <p>
    <?php echo 10 / 2; ?>
    </p>
    
    
    <p>
    <?php echo 10 / 3; ?>
    </p>
    
    
    <p>
    <?php echo 1 + 2 * 5; ?>
    </p>
    
    
    <p>
    <?php echo ( 1 + 2 ) * 5; ?>
    </p>
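    The examples above print their results without saying why. var_dump() shows the type as well as the value, which makes two rules visible: * binds tighter than +, and / returns an integer only when the division comes out even.

```php
<?php
var_dump(1 + 2 * 5);     // int(11): the multiplication happens first
var_dump((1 + 2) * 5);   // int(15): parentheses change the order
var_dump(10 / 2);        // int(5): divides evenly, so the result is an int
var_dump(10 / 3);        // a float, roughly 3.33
```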
    
    
    <p>
    <?php
        $a = 10;
        echo 2 * $a;
    ?>
    </p>
    
    
    <p>
    <?php
        $a = 10;
        $b = 2;
        echo $b * $a;
    ?>
    </p>
    
    
    <p>
    <?php
        echo "Hello, World";
    ?>
    </p>
    
    
    <p>
    <?php
        $a = "Rose";
        echo "Hello, ";
        echo $a;
    ?>
    </p>
    
    
    <p>
    <?php
        $a = "Rose";
        echo "Hello, " . $a;
    ?>
    </p>
    
    
    <p>
    <?php
        $a = "Rose";
        $b = "Hello, " . $a;
        echo $b;
    ?>
    </p>
    
    
    <p>
    <?php
        $a = "Rose";
        $b = "Hello, $a";
        echo $b;
    ?>
    </p>
    
    
    <p>
    <?php
        $a = "Rose";
        $b = "Hello, \$a";
        echo $b;
    ?>
    </p>
    
    
    <p>
    <?php
        $a = 100;
        $b = "The price is \$$a";
        echo $b;
    ?>
    </p>
    
    
    <?php
        $a = 100;
        $b = "The price is \$$a";
        echo "<p>";
        echo $b;
        echo "</p>";
    ?>
    
    
    <?php
        $a = 100;
        $b = "The price is \$$a";
        echo "<p>$b</p>";
    ?>
    

    Arrays, Passing Values, the $_GET[] Variable

    In the previous chapter, we explored evaluation, evaluation order, variables, numbers and strings.

    In this chapter, we explore how to send input to our short programs.

    The most basic way to pass data from the web browser into our program is through the URL's query mechanism.

    You've seen URLs that look like this:

    http://foo.bar.com/index.php?id=100
    

    The stuff to the right of "?" is called the query. The query is made up of name=value pairs, and the pairs are separated by "&". Each pair is called a parameter. Parameters are like variables, in that they assign values to named storage. In the above URL, the value of "id" is "100".

    In PHP, these parameters are passed to a PHP script through the $_GET[] array variable.

    Before discussing $_GET, let's quickly discuss arrays.

    An array is an ordered list of values. You can assign an array to a variable, like this:

    <?php
    $a = array( "John", "Rose" );
    ?>

    <?php
    $a = array( "John", "Rose" );
    echo $a[0];
    ?>

    <?php
    $a = array( "John", "Rose" );
    echo $a[1];
    ?>

    An associative array is a variation on the array, where, instead of referring to each element by number, you use a name.

    <?php
    $a = array( "host" => "John", "guest" => "Rose" );
    echo $a["host"];
    ?>

    <?php
    $a = array( "host" => "John", "guest" => "Rose" );
    echo $a["guest"];
    ?>

    <?php
    $a = array( "host" => "John", "guest" => "Rose" );
    echo $a["Guest"];
    ?>

    <?php
    $a = array( "host" => "John", "guest" => "Rose" );
    echo "<p>Host: $a[host]</p>";
    echo "<p>Guest: $a[guest]</p>";
    ?>

    Now, we can discuss the special variable $_GET[].

    $_GET[] is an array that contains the parameters in the URL's query. Thus, if the URL is:

    http://foo.com/test.php?id=100&filter=91770

    Then the values in $_GET[] are:

    $_GET["id"] = "100";
    $_GET["filter"] = "91770";

    (Values from the query always arrive as strings.)

    Here's a script that will add two numbers:

    <?php
    $a = $_GET["a"];
    $b = $_GET["b"];
    echo $a + $b;
    ?>

    <?php
    $a = $_GET["a"];
    $b = $_GET["b"];
    echo "$a + $b";
    ?>

    <?php
    $a = $_GET["a"];
    $b = $_GET["b"];
    echo "$a + $b = " . ( $a + $b );
    ?>
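    Because query values are strings, it's worth checking them before doing arithmetic. A sketch that fakes the request test.php?a=2&b=3 with parse_str() so it runs from the command line; the $query array stands in for $_GET, and filter_var() does the checking.

```php
<?php
// parse_str() splits a query string into an array, the way PHP fills $_GET.
parse_str('a=2&b=3', $query);
var_dump($query['a']);   // string(1) "2": query values arrive as strings

// Check that each value really is a whole number before adding.
$a = filter_var($query['a'] ?? '', FILTER_VALIDATE_INT);
$b = filter_var($query['b'] ?? '', FILTER_VALIDATE_INT);

if ($a === false || $b === false) {
    echo "Please pass two whole numbers, like ?a=2&b=3\n";
} else {
    echo "$a + $b = " . ($a + $b) . "\n";   // 2 + 3 = 5
}
```

    In a real script, replace $query with $_GET.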

    HTML Forms and $_GET[]

    As clever as it is to alter the URL to pass values to your script, the normal way to pass values is through a web form.

    Here's some HTML that will display a form with two fields.

    Here's some PHP code to do something with that form.

    Here's a script with the two parts combined.

    Here's another script, but one which displays the previously entered value in the form.
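    A minimal sketch of the combined script, with the form redisplaying the previously entered values. It assumes fields named a and b, as in the adding script above, and escapes the values with htmlspecialchars() before putting them back into the page.

```php
<?php
// One script shows the form and adds the numbers once it's submitted.
// On the first visit $_GET is empty, so default to empty strings.
$a = $_GET['a'] ?? '';
$b = $_GET['b'] ?? '';
?>
<form method="get">
  <input type="text" name="a" value="<?php echo htmlspecialchars($a); ?>" />
  +
  <input type="text" name="b" value="<?php echo htmlspecialchars($b); ?>" />
  <input type="submit" value="Add" />
</form>
<?php
// Only do the arithmetic when both fields hold numbers.
if (is_numeric($a) && is_numeric($b)) {
    echo '<p>' . htmlspecialchars("$a + $b = " . ($a + $b)) . '</p>';
}
```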

    POST is another way to send form data. It's similar to GET, but the values travel in the body of the request instead of in the URL.

    HTML Forms and $_POST, a Simple Templating Script

    This page will describe the difference between GET and POST, and the typical uses for each.

    Generally, you use GET when you want the user to be able to bookmark the page. You also use it when retrieving data from a database (we'll get to that later).

    You use POST when you're writing data to a database (again, we'll discuss this later). You don't want people bookmarking that kind of request, because it would quickly lead to duplicate data being recorded.
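    A minimal sketch of a POST handler. The comment field is made up, and save_comment() is a hypothetical placeholder for the database write; the point is that the saving code runs only when the request method is POST, so a bookmarked URL (which is always fetched with GET) never records anything.

```php
<?php
// Only a POST request can save. Opening a bookmarked URL sends a GET.
$method = $_SERVER['REQUEST_METHOD'] ?? 'GET';
$saved = false;

if ($method === 'POST') {
    $comment = trim((string) ($_POST['comment'] ?? ''));
    if ($comment !== '') {
        // save_comment($comment);   // hypothetical: write to the database here
        $saved = true;
    }
}
?>
<form method="post">
  <textarea name="comment" rows="4" cols="40"></textarea>
  <input type="submit" value="Save" />
</form>
<?php
if ($saved) {
    echo '<p>Thanks, your comment was saved.</p>';
}
```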

    Calling Functions, Writing Files

    PHP with Less Risk

    An extremely short article about how to avoid pitfalls that will get you hacked. I've been hacked, so I kind of know this from experience.

    Install Suhosin.

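    Disable eval(); hack scripts use it. Since eval() is a language construct rather than a function, the disable_functions setting can't turn it off, but Suhosin's suhosin.executor.disable_eval setting can.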

    Use the filter_var function and validate all input.

    Use filter_var to transform the input into the specific type of data you need, and within specific ranges.

    Consider validating text data against lists of valid values.

    Malware payloads may be delivered via GET or POST parameters. Check for length and keep data short.

    If tests fail, throw an exception; don't just let it fall through to the next statement.

    Below the block of filter_var statements, never touch $_GET or $_POST again; use only the filtered variables.

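    Avoid using $_REQUEST. It mixes GET, POST, and (depending on the request_order and variables_order settings) cookie data, and today people treat GET and POST differently.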

    Supply your default values up where you filter_var your user input.
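A sketch of such a filter block, assuming two made-up parameters: an integer id with a default of 1 and an allowed range, and a sort order that is length-checked and then compared against a list of valid values. Anything invalid throws instead of falling through. readInput() is a hypothetical name; it takes the array as a parameter so it can be tested outside a web server.

```php
<?php
// All user input is read and validated here, at the top of the script.
// Below this point, use only $input; never read $_GET again.
function readInput(array $get): array
{
    $id = 1; // default, supplied right where the input is filtered
    if (isset($get['id'])) {
        $id = filter_var($get['id'], FILTER_VALIDATE_INT,
            array('options' => array('min_range' => 1, 'max_range' => 100000)));
        if ($id === false) {
            throw new InvalidArgumentException('id must be an integer from 1 to 100000');
        }
    }

    $sort = 'name'; // default
    if (isset($get['sort'])) {
        $sort = (string)$get['sort'];
        // Keep it short, and only accept values from a known list.
        if (strlen($sort) > 20 || !in_array($sort, array('name', 'date', 'price'), true)) {
            throw new InvalidArgumentException('unknown sort order');
        }
    }
    return array('id' => $id, 'sort' => $sort);
}

$input = readInput($_GET);
```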

    Always use PDO. Always use parameterized queries. Never concatenate user input to strings to build your queries.

    If your framework's database abstraction layer doesn't use PDO, consider rewriting parts of it to use PDO.
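A short example of a parameterized query with PDO. It assumes the pdo_sqlite extension for an in-memory database, and the users table is made up. The name is bound to the :name placeholder, so a value like x' OR '1'='1 is compared as plain text rather than becoming part of the SQL.

```php
<?php
// Look up a user's id. The user-supplied $name is bound, never concatenated.
function findUserId(PDO $db, string $name)
{
    $stmt = $db->prepare('SELECT id FROM users WHERE name = :name');
    $stmt->execute(array(':name' => $name));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : (int)$row['id'];
}

// Demo database (in memory, needs pdo_sqlite).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

$insert = $db->prepare('INSERT INTO users (name) VALUES (:name)');
foreach (array('alice', 'bob') as $name) {
    $insert->execute(array(':name' => $name));
}

var_dump(findUserId($db, 'bob'));            // int(2)
var_dump(findUserId($db, "x' OR '1'='1"));   // NULL: treated as a plain string
```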

    Use SQLite. It's not only awesome, it'll reduce risks from parsing your bespoke data file formats.

    Do not concatenate user input to make a file path.

    Do not concatenate user input to make a file name.

    If you need to use a filename supplied by the user, validate the filename first. (That means the filename provided by $_FILES.)

    Check the extension of a filename to see if it's valid, and not something that might be malware, like a BAT or EXE file.
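A sketch of validating an uploaded file's name (for example $_FILES['upload']['name']) before using it, assuming you only need a handful of known extensions. safeUploadName() and the extension list are illustrative, not from any library. It strips directory parts, restricts the characters, and checks the extension against a list of allowed types, which is easier to get right than listing every dangerous one.

```php
<?php
// Returns a cleaned file name, or throws if the name isn't acceptable.
function safeUploadName(string $name, array $allowed = array('jpg', 'png', 'pdf', 'txt')): string
{
    // Keep only the last path component, so "../../x.png" can't climb directories.
    $name = basename(str_replace('\\', '/', $name));
    // Short, plain names only: letters, digits, dot, dash, underscore,
    // and no leading dot (so no ".htaccess").
    if (!preg_match('/^[A-Za-z0-9_-][A-Za-z0-9_.-]{0,99}$/D', $name)) {
        throw new InvalidArgumentException('bad file name');
    }
    $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
    if (!in_array($ext, $allowed, true)) {
        throw new InvalidArgumentException('file type not allowed');
    }
    return $name;
}

echo safeUploadName('report.pdf'), "\n";
```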

    Beware, some PDFs contain malware. Scan them with something like ClamAV.

    Do not hash passwords without a salt. Always add a salt. That's as important as choosing SHA1 over MD5 and choosing SHA256 over SHA1.
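If you're on PHP 5.5 or later, password_hash() and password_verify() take care of salting for you: a random salt is generated on every call and stored inside the returned hash string, so there's no separate salt column to manage. PASSWORD_DEFAULT currently selects bcrypt, a deliberately slow hash designed for passwords. This is one way to follow the "use libraries" advice; checkPassword() is just a thin illustrative wrapper.

```php
<?php
// Hash once, when the password is set. The salt is embedded in $hash.
$hash = password_hash('magic', PASSWORD_DEFAULT);

// At login, compare what the user typed against the stored hash.
function checkPassword(string $typed, string $storedHash): bool
{
    return password_verify($typed, $storedHash);
}

var_dump(checkPassword('magic', $hash));   // bool(true)
var_dump(checkPassword('tragic', $hash));  // bool(false)

// Hashing the same password again gives a different string (new salt).
var_dump($hash === password_hash('magic', PASSWORD_DEFAULT));  // bool(false)
```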

    If possible, use Digest authentication instead of relying on SSL to keep your passwords encrypted.

    Don't roll your own security. Use libraries.

    All output must be escaped. Store your data unescaped, but display it escaped. (If you store it escaped, you'll end up having blocks of output code where the data isn't escaped. Then, one time, you will forget to escape some unescaped data.)

    Escape URL values.

    Escape all textual output to the page.

    Escape data values that are going to be concatenated into HTML tag attributes.

    Consider disallowing HTML code, or at least certain kinds, like IFRAME tags.

    Build JSON with the JSON library (json_encode()). Don't construct the JSON string by hand.

    If you generate JS, escape all user or other data inserted into the JS templates.
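The escaping rules above map onto a few built-in functions: htmlspecialchars() for page text and attribute values, urlencode() for values in a URL, and json_encode() for JSON and data headed into JS. A minimal sketch with a made-up value containing the troublesome characters; the JSON_HEX_* flags are optional extras that keep the JSON safe to drop inside a script tag.

```php
<?php
// Data is stored as-is; it gets escaped only at output time.
$name  = 'Rose & "Friends" <b>';
$query = 'a&b=c d';

// Page text and tag attributes.
$html = '<p>' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</p>';
$attr = '<input value="' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '">';

// A value inside a URL's query string.
$url = 'search.php?q=' . urlencode($query);

// JSON: let json_encode() do the quoting; the flags also hex-escape < > & ' ".
$json = json_encode(array('name' => $name),
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

echo $html, "\n", $attr, "\n", $url, "\n", $json, "\n";
```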

    If you transform data with regexes, consider putting them into functions or methods, so they can be tested.

    Consider using a unit test framework like PHPUnit. It's less tedious and time-consuming than you might expect.

    Consider using the phpDocumentor tool to write docs about your code.

    Okay, this isn't so short.

    PHP with More Coolness

    A short article explaining how to improve your experience and produce slightly better code.

    URLs: it's important to define your URLs rather than exposing all your PHP files to the world.

    Use routes - study the one in ZF2 but don't copy it unless you really need it. CodeIgniter's is pretty nice too. Routes help map URLs to classes and methods.
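To get a feel for what a router does, here's a tiny one. It's a sketch, not how ZF2 or CodeIgniter implement routing: it assumes every request is already sent to this one script (by mod_rewrite, for example), and the route patterns and handlers are made up. Each route is a regular expression, and the captured groups become the handler's arguments.

```php
<?php
// Try each pattern in order; call the first handler whose pattern matches.
function route(string $path, array $routes)
{
    foreach ($routes as $pattern => $handler) {
        if (preg_match('#^' . $pattern . '$#', $path, $m)) {
            array_shift($m);                       // drop the full match
            return call_user_func_array($handler, $m);
        }
    }
    return '404 Not Found';
}

// Hypothetical route table: URL pattern => handler.
$routes = array(
    '/'                  => function () { return 'home page'; },
    '/article/(\d+)'     => function ($id) { return "article $id"; },
    '/user/([a-z0-9_]+)' => function ($name) { return "profile of $name"; },
);

$path = parse_url($_SERVER['REQUEST_URI'] ?? '/', PHP_URL_PATH);
echo route($path, $routes), "\n";
```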

    If you don't want to use routes, use Apache's mod_rewrite. It's faster, and also awesome. It transforms specific URLs into requests for specific scripts.

    Learn a little Java. The PHP dev folks seem to be copying Java, for better or worse, and you need to be ahead of that curve. Read Javadocs, especially the standard classes, and marvel at how much nicer their docs are.

    Mustache, Handlebars, and Twig are the only templating languages worth using.

    Learn the magic of autoloading... but generate the autoloader with a tool like Composer rather than writing it by hand. Brrr... this is the ugliest feature of PHP.
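Composer can generate the autoloader for you, but it helps to know roughly what the generated code does. This hand-written sketch follows the PSR-4 convention (a namespace prefix maps to a base directory, and namespace separators map to subdirectories) using spl_autoload_register(). registerAutoloader(), the App\ prefix, and the src directory are assumptions for illustration.

```php
<?php
// With prefix "App\" and base dir "src", class App\Model\User is loaded
// from src/Model/User.php the first time it's used.
function registerAutoloader(string $prefix, string $baseDir): void
{
    spl_autoload_register(function (string $class) use ($prefix, $baseDir) {
        // Ignore classes outside our namespace prefix.
        if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
            return;
        }
        $relative = substr($class, strlen($prefix));
        $file = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR
            . str_replace('\\', DIRECTORY_SEPARATOR, $relative) . '.php';
        if (is_file($file)) {
            require $file;
        }
    });
}

registerAutoloader('App\\', __DIR__ . '/src');
```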

    Always name your constructor __construct().

    Learn PHPUnit, and use it.

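    If you don't autoload, always build include paths from __DIR__ (or dirname(__FILE__) on PHP older than 5.3), so file references are relative to the script rather than to the current working directory.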

    Try not to roll your own db abstraction layer. Just use PDO.

    Try not to roll your own awesome validation utilities. Just use filter_var. It's not that this is a great solution - it's just a common one that's not too hard to read.

    Don't use the built-in URL file opening features (the URL wrappers that allow_url_fopen enables). It's too easy to mistakenly code up something that calls out to the internet based on user input. Use the cURL functions instead. Wrap them in a class to make life easier for yourself, because the existing functions are hard to use. Add validation code into these functions.
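A minimal sketch of such a wrapper. HttpClient and its allow-list of hosts are hypothetical; the fetch itself uses the standard cURL functions (curl_init(), curl_setopt_array(), curl_exec()) and needs the curl extension. The validation sits inside the class, so every call gets it: only http and https URLs on listed hosts are fetched, and redirects aren't followed, since a redirect could lead off the list.

```php
<?php
class HttpClient
{
    private $allowedHosts;

    public function __construct(array $allowedHosts)
    {
        $this->allowedHosts = array_map('strtolower', $allowedHosts);
    }

    // Only http(s) URLs whose host is on the allow-list pass.
    public function isAllowed(string $url): bool
    {
        $parts = parse_url($url);
        if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
            return false;
        }
        return in_array(strtolower($parts['scheme']), array('http', 'https'), true)
            && in_array(strtolower($parts['host']), $this->allowedHosts, true);
    }

    public function get(string $url): string
    {
        if (!$this->isAllowed($url)) {
            throw new InvalidArgumentException("URL not allowed: $url");
        }
        $ch = curl_init($url);
        curl_setopt_array($ch, array(
            CURLOPT_RETURNTRANSFER => true,    // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => false,   // a redirect could leave the allow-list
            CURLOPT_TIMEOUT        => 10,
            CURLOPT_PROTOCOLS      => CURLPROTO_HTTP | CURLPROTO_HTTPS,
        ));
        $body = curl_exec($ch);
        if ($body === false) {
            throw new RuntimeException('fetch failed: ' . curl_error($ch));
        }
        return $body; // the handle is freed when $ch goes out of scope
    }
}

$client = new HttpClient(array('api.example.com'));
var_dump($client->isAllowed('file:///etc/passwd'));  // bool(false)
```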

    Accept that PHP frameworks are kind of painful and will eventually be obsoleted by new language features. I helped write one that ran on PHP 4. Would you want to program in a framework hobbled by the limitations of PHP 4? Didn't think so.

    Accept that the newest frameworks will jump on some language feature bandwagon and try to implement something in PHP that isn't possible with regular PHP, or doesn't make sense in PHP:
    - jQuery fluent programming, a proper feature, became PHP fluent programming, a pointless feature.
    - JS and Node.js do callbacks with closures, a reasonable hack... they became Laravel's nifty routing feature... but does it really gain you anything?
    - All frameworks have routes... but isn't ZF2's system a little too elaborate?

    If you don't feel like using dependency injection... just tell the people that adding a layer of indirection will only confuse you and every other programmer.

    Terminate your pathnames with a DIRECTORY_SEPARATOR. Define "DS" to be the directory separator.

    If you're thinking of using a short prefix as a naming convention for functions, consider using the OOP features.

    If you're thinking of using a naming convention for your class names, consider using the namespace features.

    If you start to use namespaces for one class... do it for all the classes in the project. Then go and generate an autoloader.

    YAML is a wonderful file format for configuration files. Python is also a pretty good format for configuration files. The INI file reader is also really good, and supported natively. Whatever you do, use a real configuration file rather than a PHP file that you include. You will be happier if you do that.
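For the INI route, parse_ini_file() reads a file and parse_ini_string() reads a string. A sketch with made-up settings, assuming PHP 5.6.1 or later for the INI_SCANNER_TYPED mode, which turns numbers and on/off values into real ints and booleans.

```php
<?php
// parse_ini_file('config.ini', true, INI_SCANNER_TYPED) works the same way on a file.
$ini = <<<'INI'
[database]
dsn = "sqlite:/var/data/app.db"
timeout = 5

[site]
title = "My Site"
debug = off
INI;

// true keeps the [sections] as nested arrays.
$config = parse_ini_string($ini, true, INI_SCANNER_TYPED);

var_dump($config['database']['timeout']);   // int(5)
var_dump($config['site']['debug']);         // bool(false)
```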

    I'm still learning to use type hints and a zillion other features.

    PHP's Annoying Iterator

    PHP has a slightly broken function to help you loop over arrays. It's called next(), and you probably think it works like this:

    reset( $array );
    while( $x = next($array) ) { ... }

    That would be typical C or Java iteration. next(), however, advances the array's internal pointer before returning the element. So, the first call to next() returns the second element in the array!

    There's another function, current(), that returns the element at the pointer without moving it. To make current() and next() work together, you write your loop like this:

    for( reset($array); $x = current($array); next($array) ) { ... }

    Alas, there's a problem here. current() returns FALSE when it runs off the end of the array, so the loop can't tell the end of the array apart from an element whose value is 0, "", NULL, or FALSE. It quietly stops early at the first such element. That's easy to miss, but it's a bug.

    They say to use each() instead. (each() was deprecated in PHP 7.2 and removed in PHP 8.0.) To make things even more confusing, each() works a lot like next(), except that it returns the current element and then advances the pointer. That's correct iterator behavior, except that each() returns a whole array of data (the key and the value) instead of just the element. One way to use each() is:

    while( list($key,$val) = each($ar) ) { ... }

    The list() pseudofunction causes the = assignment to be done to a list of variables. In the above, if $r is the array returned by each(), it's shorthand for $key = $r[0] and $val = $r[1]. What we want is $val, but to get at it, we need to dispose of the key first. Yuck.

    Additionally, pseudofunctions annoy, because they look like functions, but they aren't. In this case, list() sets up a context where an array on the right side is assigned to a list of variables on the left side.

    What we really want is foreach(), applied like this:

    foreach( $a as $n ) { ... }

    That iterates correctly, assigning the first element to $n on the first pass, the second on the next pass, and so on. If you need the key too, write foreach( $a as $k => $n ).
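A small demonstration of the early-exit bug, comparing the current()/next() loop with foreach over the same array. The two function names are made up for the example.

```php
<?php
// current()/next(): stops at the first element that is "falsy" (here, the 0).
function sumWithCurrentNext(array $array): int
{
    $sum = 0;
    for (reset($array); $x = current($array); next($array)) {
        $sum += $x;
    }
    return $sum;
}

// foreach: visits every element. Use foreach ($array as $key => $value)
// when you need the key as well.
function sumWithForeach(array $array): int
{
    $sum = 0;
    foreach ($array as $value) {
        $sum += $value;
    }
    return $sum;
}

$numbers = array(5, 0, 7);
echo sumWithCurrentNext($numbers), "\n";  // 5: the loop quits at the 0
echo sumWithForeach($numbers), "\n";      // 12
```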

    PHPList Tweak for Plain Text Version of HTML Mail

    This small alteration to the PHPList code will produce better line breaks.

    sendemaillib.php:

      $text = preg_replace("/<\/p\s*?>/i","</p>\n\n",$text);
      $text = preg_replace("/<br>/i","<br />",$text);
      $text = preg_replace("/<br \/>\n+/i","<br />",$text);
      $text = preg_replace("/<br \/>/i","<br />\n",$text);
      $text = preg_replace("/<table/i","\n\n<table",$text);
    

    The technique is to first put two newlines after each closing p tag and normalize br to br /, then remove any newlines after br tags. Then, you re-introduce newlines, so each br has exactly one newline after it.
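Following the advice to put regexes into functions so they can be tested, here is the same clean-up wrapped in one. The function name is hypothetical and this isn't part of PHPList. It also loosens the second pattern so that <br/> and <BR /> get normalized too, which the lines above don't do.

```php
<?php
function addTextLineBreaks(string $text): string
{
    $text = preg_replace('/<\/p\s*>/i', "</p>\n\n", $text);  // blank line after each paragraph
    $text = preg_replace('/<br\s*\/?>/i', '<br />', $text);   // <br>, <br/>, <BR /> all become <br />
    $text = preg_replace('/<br \/>\n+/', '<br />', $text);    // drop newlines already after a <br />
    $text = preg_replace('/<br \/>/', "<br />\n", $text);     // then exactly one newline per <br />
    $text = preg_replace('/<table/i', "\n\n<table", $text);   // blank lines before tables
    return $text;
}

echo addTextLineBreaks("<p>Hi</p><p>One<br>\n\nTwo<BR/>Three</p>");
```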

    Paging Over a Result Set

    I describe how others page over results, and how I do it.

    I find URIs like this one fascinating:
    http://mysite.com/index.php?page=3&search=foo

    Why do programmers use a "page"? It seems odd, because that number is going to be translated to a starting record number, and a number of records (per page). Effectively, the URL could be this:
    http://mysite.com/index.php?offset=30&pagesize=10

    Now, you can show any arbitrary number of rows, starting at any arbitrary point in the results. Of course, what you lose are the "pages", because you can get at any single record in the result. (If they can type a url, that is.) To regain the page-y-ness of the interface, you need to write a loop that constructs the pager navigation. One way is this:

    
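    <?php
    for( $i=0; $i < $resultcount; $i+=$pagesize )
    {
        $title = ($i / $pagesize) + 1; // 1-based page number for the link text
        // start the link
        print '<a href="index.php?offset='.$i.'&amp;pagesize='.$pagesize.'">';
        print $title;
        // and close the link
        print '</a> ';
    }
    ?>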
    
    

    This reduces the coupling between building the page navigation and displaying the results. It also maps more cleanly onto SQL's LIMIT and OFFSET clauses.

    If you need to know what page you're on, just divide the offset by the page size (and add one if your pages are numbered from 1).
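A sketch of the pager loop as a function, plus the offset-to-page arithmetic. The names pagerLinks() and currentPage() are made up, and the link format follows the offset/pagesize URL above. The current page is shown in bold instead of as a link.

```php
<?php
function pagerLinks(int $resultCount, int $pageSize, int $offset): string
{
    $links = array();
    for ($i = 0; $i < $resultCount; $i += $pageSize) {
        $title = intdiv($i, $pageSize) + 1;             // 1-based page number
        if ($i === $offset) {
            $links[] = "<b>$title</b>";                 // current page: no link
        } else {
            $url = 'index.php?' . http_build_query(
                array('offset' => $i, 'pagesize' => $pageSize), '', '&');
            $links[] = '<a href="' . htmlspecialchars($url) . '">' . $title . '</a>';
        }
    }
    return implode(' ', $links);
}

// Offset divided by page size, plus one for 1-based page numbers.
function currentPage(int $offset, int $pageSize): int
{
    return intdiv($offset, $pageSize) + 1;
}

echo pagerLinks(25, 10, 10), "\n";
echo currentPage(30, 10), "\n";   // 4
```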

    Parsing Half.com Data with a CSV (Comma Separated Values) Class

    Here's a PHP function that reads the Half.com data file and returns it as an array.

    There is a PHP function to read CSV files, fgetcsv(), but it turns out that function doesn't read CSV files produced by Excel.

    Note: a new version of this CSV class is posted to the PHP notebook.

    The code needs some revision.

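    <?php
    include('CSV.class.php');
    
    function readHDCFile($path)
    {
            $o = array();
            $p = new CSV();
            $fh = fopen($path,'r');
            if ($fh === false)
            {
                    return $o; // couldn't open the file
            }
            fgets($fh); // get rid of the first line (the column headings)
            while (($data = fgets($fh)) !== false)
            {
                    $data = rtrim($data, "\r\n"); // fgets() keeps the line ending
                    $row = $p->textToArray( $data );
                    if (count($row) < 13)
                    {
                            continue; // blank or malformed line
                    }
                    // listings are keyed by title
                    $o[$row[5]] = array(
                                    'ItemID' => $row[0],
                                    'ISBN' => $row[4],
                                    'Title' => $row[5],
                                    'Category' => $row[8],
                                    'Price' => $row[10],
                                    'Condition' => $row[11],
                                    'Notes' => $row[12],
                                    'URL' => 'http://shops.half.ebay.com/declutter69_W0QQ'
                            );
            }
            fclose($fh);
            return $o;
    }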
    
    

    CSV.class.php:

    <?php
    define('CSV_TAB',"\t");
    
    class CSV {
            var $template;
    
            /**
             * @param array $template
             *
             * The $template is an array used to selectively quote fields.
             * If the Nth element is 'quote', the Nth field will be quoted.
             * For example, array('','','quote') causes the 3rd field to be
             * quoted.
             */
            function __construct( $template = NULL )
            {
                    $this->template = $template;
            }
    
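            function arrayToText( $ar, $separator=',' )
            {
                    $row = array();
                    $count = 0;
                    foreach($ar as $field)
                    {
                            // With no template, quote every field; otherwise quote
                            // only the fields the template marks with 'quote'.
                            if ($this->template === NULL ||
                                (isset($this->template[$count]) && $this->template[$count] == 'quote'))
                            {
                                    $field = '"'.$this->quote($field).'"';
                            }
                            $row[] = $field;
                            $count++;
                    }
                    return join($separator,$row);
            }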
    
            /**
             * Parses one line of a csv file.
             * Quoted fields may contain the separator, and "" inside quotes
             * stands for one literal quote character.
             */
            function textToArray( $str, $separator=',' )
            {
                    $out = array();
                    $sep = preg_quote($separator, '/');
                    $str = rtrim($str, "\r\n"); // fgets() leaves the line ending on
                    while (true)
                    {
                            // Take one field off the front of $str. After the field comes
                            // either the separator plus the rest of the line (captured in
                            // $matches[2]), or the end of the line.
                            if (preg_match('/^"((?:[^"]|"")*)"(?:'.$sep.'(.*))?$/s', $str, $matches))
                            {
                                    $out[] = $this->dequote($matches[1]); // quoted field
                            }
                            else
                            {
                                    // unquoted field; (.*?) also matches an empty field
                                    preg_match('/^(.*?)(?:'.$sep.'(.*))?$/s', $str, $matches);
                                    $out[] = $matches[1];
                            }
                            if (!isset($matches[2]))
                            {
                                    break; // no separator followed: that was the last field
                            }
                            $str = $matches[2];
                    }
                    return $out;
            }
    
            function quote( $s )
            {
                    $s = preg_replace('/"/','""', $s);
                    return $s;
            }
    
            function dequote( $s )
            {
                    $s = preg_replace('/""/','"', $s);
                    return $s;
            }
    }
    

    Re: Sharing cookie info across multiple domains using PHP

    Answered this question on EE.

    The asker wanted to share a session across websites. There's no PHP code here, but the question was asked in a PHP forum.

    1. When a user logs in, your code has to contact a central login server, and log that user into the "network" of sites. Don't return a page yet.

    2. The central login server will return a global session id.

    3. Your code returns a page, and, on that page, puts a bunch of images. In each image, set the SRC to a "remote log in" script on one of the network's servers, and append the global session id to each URL. Also set the user's cookie in this page.

    4. Each "remote log in" script that gets called contacts the central login server to validate the session id. If the id validates, the script sets a cookie for its own site. Each script should also return its graphic to the client, so it displays something. Maybe it's a 1-pixel GIF.

    5. The global session id should be expired shortly after this transaction happens, to avoid session hijacks. The individual sessions are managed on each of the sites, not through the central login server. There are other ways to make this more secure, too.
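Step 3 can be sketched like this. The remote_login.php path, the sid parameter, and the host names are all hypothetical, and the central-server calls from steps 1, 2, and 4 are left out. Each site in the network gets one 1-pixel image whose URL carries the global session id.

```php
<?php
// Build one tiny <img> per network site, each hitting that site's
// (hypothetical) remote_login.php with the global session id.
function remoteLoginImages(array $siteHosts, string $globalSessionId): string
{
    $html = '';
    foreach ($siteHosts as $host) {
        $src = 'https://' . $host . '/remote_login.php?'
            . http_build_query(array('sid' => $globalSessionId), '', '&');
        $html .= '<img src="' . htmlspecialchars($src) . '" width="1" height="1" alt="">' . "\n";
    }
    return $html;
}

echo remoteLoginImages(array('www.site-a.example', 'www.site-b.example'), 'abc123');
```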

    Refactoring for Growth

    These notes are not complete or edited at all... once there's code... it'll be closer to done.

    The only way to grow an application quickly without causing your system to topple from its own (code) weight is by investing some effort in creating better structures for code.

    Mostly, I'm looking at a modified Model-View-Controller design. It has to differ, because MVC was meant for point-and-click GUIs, and web apps are all done via forms.

    After mucking around with some test code, and reading articles on the web, I started to concretize something. The Controller tends to be the central component, and interface to the page, and loosely coupled with the View... making a View-Controller. The Model, on the other hand, tends to get split into the business logic and a database persistence class. The db part becomes its own class because each page load clears memory, and forces us to touch the db. I just call these the Manager and the Table-and-TableRow.

    http://www.onlamp.com/pub/a/php/2004/10/14/page_controller.html

    http://www.tonymarston.net/php-mysql/infrastructure.html

    http://www.tonymarston.net/php-mysql/oop-for-heretics.html

    According to OO gospel, you're not supposed to let the Controller access the db abstraction. This seems to be a real burden, though, because it would force a lot of code to be created in the Manager that really does nothing. My inclination is to allow reading from the db classes by the Controller, and letting the controller pass the data to the View class methods. Relegate the Manager to "management" of the contents of the data.

    This makes it harder to swap out the db, but the risks are lower if you're strict about never writing data from the Controller.

    Also, you're not supposed to do any of the fancy SQL in the db abstraction... but you have to do it somewhere. One dead-end (IMHO) was building up SQL from fragments of text, and encapsulating the idea of a JOIN in program code. This was hard to read and confusing. A nice solution was to put the SQL, complete, or more or less complete, in the Table-and-TableRows classes, and deal with these classes not as "db abstraction" but "data abstraction". Give them some liberty to join tables so the data they produce is useful.
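A sketch of what such a class might look like. OrderTable, the orders and customers tables, and their columns are made up, and the Table base class isn't shown in these notes, so this one stands alone. The complete SQL, JOIN included, sits in one method, and rows come back as plain associative arrays. The demo data uses an in-memory SQLite database and needs pdo_sqlite.

```php
<?php
class OrderTable
{
    private $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    // Orders joined with the customer's name, ready for display.
    public function ordersForCustomer(int $customerId): array
    {
        $stmt = $this->db->prepare(
            'SELECT o.id AS id, o.total AS total, c.name AS customer
               FROM orders o
               JOIN customers c ON c.id = o.customer_id
              WHERE o.customer_id = :cid
              ORDER BY o.id');
        $stmt->execute(array(':cid' => $customerId));
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}

// Demo data.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)');
$db->exec('CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER)');
$db->exec("INSERT INTO customers (id, name) VALUES (1, 'Rose')");
$db->exec('INSERT INTO orders (customer_id, total) VALUES (1, 20), (1, 35)');

$orders = new OrderTable($db);
print_r($orders->ordersForCustomer(1));
```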

    If you want to fix your SQL, it's all in the classes that extend Table and TableRow.

    If you want to change some aspect of how the program works, it's in the Controller or the Manager.

    If you want to create a test case for some high-level behavior, it's best to create the "API" for the behavior in the Manager, and then test it there.

    Lastly, for convenience, I'm putting all four classes into a single file, called a Package. You can split them into different files, but, that would make uploading the classes harder.

    --------------

    Inspired by Marston, I decided to go against my habit of turning rows in the database into full-fledged objects, with properties that mirrored the columns. Most of the quick-and-dirty code I'd been writing for work (as well as Perl scripts for system administration) used associative arrays, and they were working fine.

    Traditional OO code turns the array into an object, but typical PHP code uses the object, more or less, like an array again, almost immediately. Unlike a GUI app, where the data persists for a while, and is manipulated by the user a lot, before being saved to the database... the web app reads the data, alters it, writes it out, and then displays it. It does this over and over.

    Reading from the db returns the data in an array. Writing to the db is via an SQL statement composed from the array or object. The display classes can be designed to use arrays or objects, but, ultimately, it doesn't matter - the code is almost identical. So, it's probably preferable to keep the data in an array format.

    If we need real OO, we can implement it... but assuming data is passed around as arrays will save a lot of coding effort.

    ----------

    An internal and external debate exists about returning arrays versus returning an object that iterates over the result set. The latter is generally considered better practice.

    I recall that my original impulse to use the array instead stems from the fact that PHP has very good support for arrays, and, often, you end up iterating over the entire result set anyway. Building up the whole array takes more memory, but, web pages tend to consume memory quickly, generate the output, and then release the memory.

    Still, having the iterator would be useful.
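One lightweight way to get that iterator, assuming PHP 5.5 or later, is a generator: rows are fetched one at a time and handed to foreach, instead of building the whole array first. eachRow() is a made-up name. (A PDOStatement can also be used directly in foreach; the generator is handy when you want to transform rows on the way out.) The demo needs pdo_sqlite.

```php
<?php
// Yield one associative array per row until the result set runs out.
function eachRow(PDOStatement $stmt)
{
    while (($row = $stmt->fetch(PDO::FETCH_ASSOC)) !== false) {
        yield $row;
    }
}

$db = new PDO('sqlite::memory:');
$db->exec('CREATE TABLE t (n INTEGER)');
$db->exec('INSERT INTO t (n) VALUES (1), (2), (3)');

$stmt = $db->query('SELECT n FROM t ORDER BY n');
foreach (eachRow($stmt) as $row) {
    echo $row['n'], "\n";
}
```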

    ----------

    A good article on real-world scalability.

    http://www.onjava.com/pub/a/onjava/2003/10/15/php_scalability.html

    SQL Expression Parser and Builder

    This is a rough draft of an SQL expression parser kit. What it does is help to break apart an SQL expression, manipulate its parts, and then reconstruct it into another SQL expression. It's useful if you take some bits of SQL as input, and then insert that into a query. I'm using it to construct user-friendly labels for rows.

    This example takes an expression and prepends 'destination.' to each column name. It parses the expression into a tree. Then, it sends a Visitor into the tree to rename the names. Finally, another visitor traverses the tree to reconstruct the expression. For example, "concat(a,b)" is altered to "concat( `destination.a`, `destination.b` )". Parser code is derived partly from code in Wikipedia, and other web articles. Visitor code is from Wikipedia and slightly modified. This code has NOT been tested. (I put it on the web because I couldn't find this code anywhere, at least not in PHP.)

    
    
    <?php
    include_once 'SQLTokenizer.class.php';
    
    /**
     * SQL Parser (yet another).  This one scans the sources three times: tokenize, add types, parse.
     * This should be more OO - the AST is in arrays, not objects.
     */
    class SQLExprParser 
    {
    	var $AST;
    	var $exprStack;
    	var $listStack;
    	var $typedArray;
    	var $offset;
    
    	function __construct( $tokenizer )
    	{
    		$this->typedArray = $tokenizer->typedArray;
    	}
    
    	/**
    	 * The parser is app specific.  It only parses expressions.
    	 */
    	function parse()
    	{
    		$offset =& $this->offset;
    		$offset = 0;
    		$this->AST = $this->pSUM();
    	}
    	function acceptType( $s )
    	{
    		//report( "acceptType $s .".$this->typedArray[$this->offset][0] );
    		// past the end of the tokens, nothing matches
    		if ( isset( $this->typedArray[ $this->offset ] ) &&
    		     $this->typedArray[ $this->offset ][1] == $s )
    		{
    			$token = $this->typedArray[ $this->offset ][0];
    			$this->_next();
    			return $token;
    		}
    		return false;
    	}
    	function expectType( $s )
    	{
    		//report( "expectType $s .".$this->typedArray[$this->offset][0] );
    		// compare with false, so a token like "0" still counts as a match
    		if ( ($token = $this->acceptType( $s )) !== false )
    			return $token;
    		$next = isset( $this->typedArray[ $this->offset ] ) ? $this->typedArray[ $this->offset ][0] : 'end of input';
    		die("expectType: expected $s. Next symbol is $next");
    	}
    	function _next()
    	{
    		$this->offset++;
    	}
    	/** 
    	 * p means parse.
    	 */
    	function pSUM()
    	{
    		$expr = $this->pFACTOR();
    		list($op,$list) = $this->pSUMREST();
    		if ($op !== false)
    		{
    			return new ASTNode( 'OPERATOR', $op, array($expr, $list) );
    		}
    		else
    			return $expr;
    	}
    	function pSUMREST()
    	{
    		if ( ($token = $this->acceptType('OPERATOR')) !== false )
    		{
    			$expr = $this->pSUM();
    			return array( $token, $expr );
    		}
    		return array( false, null ); // the empty production: no operator
    	}
    	function pFACTOR()
    	{
    		// compare with false: a token such as the number 0 is "falsy" but valid
    		if ( ($token = $this->acceptType('STRING')) !== false ) 
    		{
    			return new ASTNode( 'STRING', $token );
    		} 
    		else if ( ($token = $this->acceptType('BAREWORD')) !== false )
    		{
    			// if the next token is a paren, we have an error
    			// because it looks like a function
    			return new ASTNode( 'NAME', $token );
    		}
    		else if ( ($token = $this->acceptType('NUMBER')) !== false )
    		{
    			return new ASTNode( 'NUMBER', $token );
    		}
    		else if ( ($token = $this->acceptType('NAME')) !== false )
    		{
    			return new ASTNode( 'NAME', $token );
    		} 
    		else if ( ($token = $this->acceptType('FUNCTION')) !== false )
    		{
    			$list = $this->pARGS();
    			return new ASTNode( 'FUNCTION', $token, $list );
    		} 
    		else if ( ($token = $this->acceptType('LEFTP')) !== false )
    		{
    			$expr = $this->pSUM();
    			$this->expectType('RIGHTP');
    			return new ASTNode ( 'GROUP', '',  $expr );
    		} 
    		else 
    		{
    			$expr = $this->pUNARY();
    			return $expr;
    		} 
    	}
    	function pUNARY()
    	{
    		$this->expectType('MINUS');
    		$expr = $this->pFACTOR();
    		return new ASTNode( 'UNARY', '-', $expr );
    	}
    	function pARGS()
    	{
    		//report('pARGS');
    		$this->expectType('LEFTP');
    		$list = $this->pLIST();
    		$this->expectType('RIGHTP');
    		return $list;
    	}
    	function pLIST()
    	{
    		//report('pEXPRLIST');
    		$expr = $this->pSUM();
    		$rest = $this->pLISTREST();
    		return array_merge( array($expr), $rest );
    	}
    	function pLISTREST()
    	{
    		if ($this->acceptType('COMMA'))
    		{
    			return $this->pLIST();
    		}
    		return array();
    	}
    }
    
    class ASTNode
    {
    	var $type;
    	var $token;
    	var $children;
    	function __construct( $type, $token, $children=null )
    	{
    		$this->type = $type;
    		$this->token = $token;
    		$this->children = array();
    		if ($children)
    		{
    			$this->addChildren( $children );
    		}
    	}
    	function addChildren( $children )
    	{
    		if (is_array($children))
    		{
    			$this->children = array_merge($this->children, $children);
    		}
    		else // is not array
    		{
    			array_push($this->children,$children);
    		}
    	}
    	/** 
    	 * For visitor pattern. See wikipedia.
    	 */
    	function accept( $visitor )
    	{
    		$visitor->visit($this);
    	}
    }
    
    class ASTVisitor {
    	var $autoRecursion = true;
    	function __construct()
    	{
    		$this->autoRecursion = true;
    	}
    	/**
    	 * Applies the action if the node is the type we want.
    	 * Nodes are objects, which PHP passes by handle, so no & is needed
    	 * at the call sites (call-time pass-by-reference is a parse error
    	 * since PHP 5.4).
    	 */
    	function visit( $node )
    	{
    		$method = "visit".$node->type;	
    		if ($node->type && method_exists($this,$method))
    		{
    			//print ($method . " called<br>");
    			$this->$method($node);
    		}
    		if ($this->autoRecursion)
    		{
    			$this->visitChildren( $node );
    		}
    	}
    	function visitChildren( $node )
    	{
    		if (is_array($node->children))
    		{
    			foreach($node->children as $child)
    			{
    				$child->accept($this);
    			}
    		}
    	}
    	function visitChild( $node, $index )
    	{
    		if ( isset( $node->children[$index] ) )
    		{
    			$node->children[$index]->accept($this);
    		}
    	}
    }
    class ConcretePrependerASTVisitor extends ASTVisitor
    {
    	var $prefix;
    
    	function __construct( $prefix, $node )
    	{
    		parent::__construct();
    		$this->prefix = $prefix;
    		$this->visit($node);
    	}
    	/** 
    	 * Prepends the string 'destination.' to
    	 * the name, if it doesn't already have a prefix.
    	 * @param string $string to modify
    	 */
    	function _prepend( $string )
    	{	
    		$string = rtrim($string,'`');
    		$string = ltrim($string,'`');
    		if (preg_match('/\\./', $string))
    			return '`'.$string.'`';
    		return '`'.$this->prefix.'.'.$string.'`';
    	}
    	function visitNAME(&$node)
    	{
    		$node->token = $this->_prepend( $node->token );
    		//print("setting token to ".$node->token."<br>");
    	}
    }
    
    class SQLExprBuilderASTVisitor extends ASTVisitor
    {
    	var $output;
    	var $head;
    	/**
    	 * This reconstructs the expression from the AST.
    	 * The AST is a lisp-like functional tree, without operator precedence.
    	 */
    	function __construct( $node )
    	{
    		parent::__construct();
    		$this->output = '';
    		$this->autoRecursion = false;
    		$this->head = $node;
    	}
    	function asString()
    	{
    		$this->visit($this->head);
    		return $this->output;
    	}
    
    	function visitFUNCTION(&$node) 
    	{ 
    		$this->output .= $node->token.'( ';
    		$count = 0;
    		foreach($node->children as $child)
    		{
    			if ($count) $this->output.=', ';
    			$child->accept($this);
    			$count++;
    		}
    		$this->output .= ' )';
    	}
    	function visitNAME(&$node) 
    	{ 
    		$this->output .= $node->token;
    	}
    	function visitOPERATOR(&$node) 
    	{ 
    		$this->visitChild( $node, 0 );
    		$this->output .= ' '.$node->token.' ';
    		$this->visitChild( $node, 1 );
    	}
    	function visitGROUP(&$node)
    	{
    		// a parenthesized sub-expression
    		$this->output .= '( ';
    		$this->visitChild( $node, 0 );
    		$this->output .= ' )';
    	}
    	function visitUNARY(&$node)
    	{
    		// unary minus: the token is '-'
    		$this->output .= $node->token;
    		$this->visitChild( $node, 0 );
    	}
    	function visitSTRING(&$node) 
    	{ 
    		$this->output .= $node->token;
    	}
    	function visitNUMBER(&$node) 
    	{ 
    		$this->output .= $node->token;
    	}
    

    The SQLTokenizer class is on another page. (This is all just for reading, not using. The whole package will be up elsewhere.)

    Here's a testing page.

    
    <?php
    include('SQLExprParser.class.php');
    
    if (isset($_GET['sql']))
    {
    	print "<pre>";
    	//print_r(tokenizeSQL($_GET['sql']));
    	$t = new SQLTokenizer();
    	$t->tokenize( $_GET['sql'] );
    	/*
    	print_r($t->tokenArray);
    	print "--------------\n";
    	print_r($t->typedArray);
    	print "--------------\n";
    	*/
    	$p = new SQLExprParser( $t );
    	$p->parse();
    	$v = new ConcretePrependerASTVisitor( 'destination', $p->AST );
    	// everything printed here comes from user input, so escape it
    	print htmlspecialchars(print_r($p->AST, true));
    	print "--------------\n";
    	$v = new SQLExprBuilderASTVisitor( $p->AST );
    	$output = $v->asString();
    	print htmlspecialchars($output);
    	print "</pre>";
    }

    ?>

    <form method="get">
    <textarea name="sql" rows="5" cols="60"><?= isset($_GET['sql']) ? htmlspecialchars($_GET['sql']) : '' ?></textarea>
    <input type="submit" value="Parse">
    </form>

    This is the yacc/bison compatible grammar that I used to start off. I got it, partly, from the BNF Web Club.

    
    %token NAME
    %token NUMBER
    %token STRING
    %token BAREWORD
    %token FUNCTION
    %token OPERATOR
    %%
    
    /* 
       This bison file was used to test the grammar for LALR validity.
       The parser was generated by hand from this.
    
    	To make the parser LL, remove left recursion.
    	A clean way is to break left-recursive productions
    	at terminal tokens.
     */
    
    /* sum : factor { operator factor } */
    
    SUM : FACTOR SUMREST
    
    SUMREST : OPERATOR SUM
    	|
    
    FACTOR : FUNCTION ARGS
    	| NAME 
    	| NUMBER
    	| STRING 
    	| BAREWORD
    	| '(' SUM ')'
    	| UNARY
    	|
    	
    UNARY :  '-' FACTOR
    ARGS : '(' LIST ')'
    
    /* list : sum { ',' sum } */
    
    LIST : SUM LISTREST
    
    LISTREST : ',' LIST
    	|
    
    

    SQL Tokenizer

    Here's a first-draft of some code that tokenizes SQL statements and returns an array of all the tokens.

    The logic is pretty simple. We scan the string left to right, and look for barewords, single quoted strings, double quoted strings, back quoted strings, or special symbols like "=", "(", "*", etc. These chunks of text may abut each other without intervening whitespace. Thus, "FOO(BAR)" is tokenized into four individual tokens: FOO, (, BAR, ). Quoted strings are scanned and tokenized with quotes intact; the parser must strip the quotes. The functions to scan quoted strings are buggy, and don't handle escaped quote characters.

    See bottom of page for parsing notes.

    
    <?php
    /** 
     * Tokenizes text that looks something like SQL.
     */
    function tokenizeSQL( $SQL )
    {
    	$functions = array ( 'concat', 'if' );
    	// <> must come before < and > so it is matched as one token
    	$token = '\\(|\\)|[\']|"|\140|[*]|,|<>|<|>|=|[+]|-';
    	$terminal = $token.'|;| |\\n';
    	$result = array();
    	$string = $SQL;
    	$string = ltrim($string);
    	$string = rtrim($string,';').';'; // always ends with a terminal
    	$string = preg_replace( "/[\n\r]/s", ' ', $string );
    	while( 
    		   preg_match( "/^($token)($terminal)/s", $string, $matches ) ||
    		   preg_match( "/^({$token})./s", $string, $matches ) ||
    		   preg_match( "/^([a-zA-Z0-9_.]+?)($terminal)/s", $string, $matches) 
    		  )
    	{
    		$t = $matches[1];
    		if ($t=='\'')
    		{
    			// it's a string
    			$t = tokSingleQuoteString( $string );
    			array_push($result, $t);
    		}
    		else if ($t=="\140")
    		{
    			// it's a backtick string (a name)
    			$t = tokBackQuoteString( $string );
    			array_push($result, $t);
    		}
    		else if ($t=='"')
    		{
    			// it's a double quoted string (a name in normal sql)
    			$t = tokDoubleQuoteString( $string );
    			array_push($result, $t);
    		}
    		else
    		{
    			array_push($result, $t);
    		}
    		$string = substr( $string, strlen($t) );
    		$string = ltrim($string);
    	}
    	return $result;
    }
    
    function tokSingleQuoteString( $string )
    {
    	// matches a single-quoted string in $string
    	// $string starts with a single quote
    	preg_match('/^(\'.*?\').*$/s', $string, $matches );
    	return $matches[1];
    }
    
    function tokBackQuoteString( $string )
    {
    	// matches a back-quoted string in $string
    	// $string starts with a back quote
    	preg_match('/^([\140].*?[\140]).*$/s', $string, $matches );
    	return $matches[1];
    }
    
    function tokDoubleQuoteString( $string )
    {
    	// matches a double-quoted string in $string
    	// $string starts with a double quote
    	preg_match('/^(".*?").*$/s', $string, $matches );
    	return $matches[1];
    }
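
    The quoted-string scanners above match up to the first closing quote, so they fail on escaped quotes. Here's a minimal sketch of a replacement scanner, plus an unquoting helper for the parser, assuming MySQL-style escaping: a backslash escapes the next character inside '...' and "..." strings, and a doubled quote stands for a literal quote in all three kinds. Both function names are hypothetical, and the unquoting does not translate sequences like \n into control characters.

```php
<?php
// Hypothetical replacement for tokSingleQuoteString() and friends.
// $string starts with ', " or `; returns the whole quoted token (quotes
// included), or null if the closing quote is missing.
function scanQuotedString($string)
{
	$q = $string[0];
	$len = strlen($string);
	for ($i = 1; $i < $len; $i++) {
		$c = $string[$i];
		if ($c === '\\' && $q !== '`') {
			$i++;                                  // skip the escaped character
		} elseif ($c === $q) {
			if ($i + 1 < $len && $string[$i + 1] === $q) {
				$i++;                              // doubled quote: still inside
			} else {
				return substr($string, 0, $i + 1); // closing quote found
			}
		}
	}
	return null;
}

// Strips the quotes from a token returned by scanQuotedString() and undoes
// the escapes. A backslash escape keeps the next character literally, so
// \n becomes "n", not a newline.
function unquoteToken($token)
{
	$q = $token[0];
	$end = strlen($token) - 1;                     // index of the closing quote
	$out = '';
	for ($i = 1; $i < $end; $i++) {
		$c = $token[$i];
		if ($c === '\\' && $q !== '`') {
			$out .= $token[++$i];
		} elseif ($c === $q) {
			$out .= $q;                            // first half of a doubled quote
			$i++;                                  // skip the second half
		} else {
			$out .= $c;
		}
	}
	return $out;
}
```

    In tokenizeSQL(), the three tok*String() calls could then become scanQuotedString( $string ), with a check for null to catch an unterminated string.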
    

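    Here's some code for a test page that shows the tokenizer's output. It goes after the tokenizer functions, inside the same <?php block, and it escapes everything it echoes, since the input comes straight from the request.

    if (!empty($_GET['sql']))
    {
    	print "<pre>";
    	print htmlspecialchars(print_r(tokenizeSQL($_GET['sql']), true));
    	print "</pre>";
    }
    
    ?>
    <form>
    <textarea name=sql><?=htmlspecialchars(isset($_GET['sql']) ? $_GET['sql'] : '')?></textarea>
    <input type="submit">
    </form>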
    

    The parser must be able to disambiguate the uses of parens in this string: "select concat( a, b ) from ( select * from x where y=1 )".

    The first use of parens is to delimit the parameters to concat. The second use is to delimit a subselect.

    The tokenizer is insensitive to this issue, and treats all parens equally. I think this is okay, but I may be wrong. All information about whitespace is lost.

    Sign in sheet website

    I spent way too much time tonight putting together a sign-in sheet website, called, generically enough, Sign-in Sheet.

    It took around 5 or 6 hours to make, and used a lot of canned resources, like Bootstrap and Bootswatch, Add This, Google Analytics, Google Adsense, and Mailchimp. This kind of dev is fun - lots of reward for not much work. The image is from Wikimedia Commons. The forms were done in LibreOffice Writer. It's not getting any kind of page ranking at all, but it's new.

    There's no CMS. It's a front controller, plus an .htaccess file copied from WordPress. These rewrite rules deliver the requested file or directory if it exists, and route any request that would otherwise be a file-not-found error to index.php in the root. Here's the .htaccess file:

    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>
    
    Options -Indexes 

    The index.php file acts as a "router", and also does the basic layout. Here's the code:

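    <?php
    // Use only the path part of the URI, so a query string such as
    // "/contact/?ref=x" doesn't fall through to the default page.
    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);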
    
    $page = array();
    $page['heading'] = "Sign-in Sheet";
    $page['description'] = "A free repository of common sign-in sheets and business forms for your company, organization, church, faith organization, or club.";
    $page['keywords'] = "free, sign-in sheet, business forms, openoffice, open office, libre office, pdf";
    $page['geo.placename'] = "Los Angeles, California";
    $page['geo.region'] = 'US';
    
    
    // now pick a page to load
    switch($path) {
      case '/sign-in-sheet-appointment/': $file = 'tpl/AppointmentSignIn/index.txt'; break;
      case '/sign-in-sheet-email-list/': $file = 'tpl/EmailListLetter/index.txt'; break;
      case '/sign-in-sheet-meeting/': $file = 'tpl/EventSignIn/index.txt'; break;
      case '/sign-in-sheet-timesheet/': $file = 'tpl/TimeSheet/index.txt'; break;
      case '/sign-in-sheet-visitor/': $file = 'tpl/VisitorSignInSheet/index.txt'; break;
      case '/sign-in-sheet-sports/': $file = 'tpl/YouthActivitySignInSheet/index.txt'; break;
      case '/printers/': $file = 'pages/printers.txt'; break;
      case '/contact/': $file = 'pages/contact.txt'; break;
      default: $file = 'pages/index.txt'; break;
    }
    
    $parts = explode( '/', $file );
    $templatePath = '/'.$parts[0].'/'.$parts[1];
    
    include('inc/header.php');
    
    include($file);
    
    include('inc/footer.php');
    

    Determining $templatePath is kind of ugly, unfortunately, but the rest of it is pretty clean and safe.

    The only important thing to notice is that the URI path is not concatenated into the value of $file. We don't take external input and put it into our internal variables. Doing so is a potential security hole because we include $file.

    Instead, we use the switch statement to isolate the input from the internal values.

    So it's safe to calculate $templatePath from $file, and to include $file.

    Improvements

    The main improvement to make is to break out the switch statement into another file, so it can grow to be a lot larger. Also, instead of adding the index.txt to each, set either a path or file, and guess the type of data. We have pages, or we have a directory with some templates, and index.txt, and other metadata.

    This also needs a tool to generate the menus.

    With these changes, it would be easy to make this a 100+ page site, without too many hassles.
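
    Here's a minimal sketch of the first two improvements, under some assumptions: the route table is a plain array (which could live in its own file, e.g. a routes.php that ends with return array(...); and is loaded with $routes = include 'routes.php';), each entry holds a target and a menu label, and a target ending in .txt is a page while anything else is a template directory containing index.txt. The function names are hypothetical.

```php
<?php
// Hypothetical route table: request path => array(target, menu label).
// In a real site this would be the separate file described above.
$routes = array(
	'/sign-in-sheet-appointment/' => array('tpl/AppointmentSignIn', 'Appointment Sign-in Sheet'),
	'/sign-in-sheet-visitor/'     => array('tpl/VisitorSignInSheet', 'Visitor Sign-in Sheet'),
	'/contact/'                   => array('pages/contact.txt', 'Contact'),
);

// Maps a request path to the file to include. The path is only used as an
// array key, never concatenated into a file name. Assumption: a target
// ending in ".txt" is a page; anything else is a template directory.
function resolveRoute(array $routes, $path)
{
	if (!isset($routes[$path])) {
		return 'pages/index.txt';
	}
	$target = $routes[$path][0];
	return (substr($target, -4) === '.txt') ? $target : $target.'/index.txt';
}

// Builds a simple menu from the same table, so routes and menus can't drift apart.
function buildMenu(array $routes)
{
	$html = "<ul>\n";
	foreach ($routes as $url => $route) {
		$html .= '<li><a href="'.htmlspecialchars($url).'">'
			.htmlspecialchars($route[1])."</a></li>\n";
	}
	return $html."</ul>\n";
}
```

    In index.php, the switch would shrink to $file = resolveRoute($routes, $path);, and the header include could print buildMenu($routes).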

    Would it compete with Vertex42? Probably not. They're really good.

    Simple Staging

    Here's yet another idea for setting up your computer to stage a bunch of websites. I run servers on 127.0.0.1 and name all the websites so they end in ".lo" like "test.lo" and "id.lo". I put these into /etc/hosts:

    127.0.0.1 test.lo
    127.0.0.1 id.lo

    Then, create the typical config files (in /etc/apache2/sites-enabled).

    Last, create an index.php in /var/www that reads /etc/hosts and generates a page of links to these staging sites:

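    <?php
    // Print a link for every *.lo host name in /etc/hosts.
    foreach (file("/etc/hosts") as $line) {
            $line = preg_replace('/#.*/', '', $line);      // ignore comments
            if (preg_match_all('/\b([\w.-]+\.lo)\b/', $line, $p))
                    foreach ($p[1] as $host)
                            print "<p><a href='http://$host/'>$host</a></p>\n";
    }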

    Simple Staging LAN with OpenWRT and dnsmasq

    This isn't PHP-specific, but it's a simple recipe for a small LAN for testing websites (or any client-server software). With it, you can manage a small network with minimal effort. Conceptually, it expands on the idea outlined in "Simple Staging Server", where you use the /etc/hosts file and Apache virtual hosts to stage your sites.

    The staging environment that's created isn't as complete as switching /etc/hosts, but it's a reasonable approximation. Here are the parts:

    OpenWRT uses the dnsmasq daemon to offer DHCP and DNS proxying. dnsmasq does three things. First, it's a DHCP server, handing out IP addresses to machines on your LAN. Second, it takes DNS requests, and relays the requests to the ISP's DNS servers, caching some results. Third, it will also serve up domain names in /etc/hosts, letting you share your hosts file.

    The third feature is what we'll use. But first, you have to get OpenWRT installed on the router. See openwrt.org for that info.

    Log in to the router.

    Your goal is to freeze the IP addresses of your staging servers. You do this by forcing dnsmasq to assign specific IP addresses to specific MAC addresses. Turn on all the computers on your LAN. This will cause the file /tmp/dhcp.leases to be filled with the MAC addresses, IP addresses, and host names of all the computers. Copy this info somewhere, and then use it to create the file /etc/ethers, which is a list of MAC addresses mapped to IP addresses. Mine looks something like this:

    # /etc/ethers
    # sad
    00:11:aa:88:b9:27 192.168.1.142
    # stage
    00:15:2a:11:aa:9f 192.168.1.106
    # sandwich
    00:41:aa:8c:ba:aa 192.168.1.124
    

    The numbers could have been manually sequenced, but I'm not picky about that.

    Now execute these commands in the shell:

    cd /etc
    rm hosts
    cp /rom/etc/hosts .
    rm dnsmasq.conf
    cp /rom/etc/dnsmasq.conf .
    

    You do that to get around OpenWRT's effort to save space by symlinking files in /etc to the read-only boot image in /rom. To modify a file, delete the symlink, and copy the original over.

    Then add lines to /etc/hosts for your staging server. In the example below, the staging server is on 192.168.1.106.

    #added to /etc/hosts
    192.168.1.106 conc.lo
    192.168.1.106 go.lo
    192.168.1.106 iper.lo
    

    go.lo is a staging server. On that server, in the apache configuration, we have the following. This isn't on the router; it's on the staging PC.

    <VirtualHost 192.168.1.106:80>
            ServerName go.lo
            ServerAdmin nospam@riceball.com
            DocumentRoot /home/johnk/Sites/go/
    </VirtualHost>
    

    That sets up a staging server on the external IP address. (Note that in the article about simple staging servers, we used 127.0.0.1 as the host. In this situation, we want to offer the server to the LAN.)

    Then the final edit is to modify dnsmasq.conf to include the following:

    local=/lo/
    

    local=/lo/ tells dnsmasq that .lo is a local domain: queries for names ending in .lo are answered only from /etc/hosts (or DHCP leases) and are never forwarded to the ISP's DNS servers. .lo is a fake top-level domain, short for "local" (if it were real, we'd have just prevented ourselves from accessing anything in .lo).

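    Once done, reload dnsmasq. A SIGHUP makes dnsmasq re-read /etc/hosts and /etc/ethers, but not dnsmasq.conf, so after the dnsmasq.conf edit above, restart the daemon instead (with OpenWRT's init script, typically /etc/init.d/dnsmasq restart). For later changes to /etc/hosts or /etc/ethers, the signal is enough:

    killall -HUP dnsmasq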
    

    The name "go.lo" should now resolve, and is usable as a staging server.

    Simple Staging Server

    This isn't a PHP-specific article, but it's relevant to programming in PHP.

    Sometimes, you'll find that the site you're working on has absolute URLs. Maybe they're in the HTML, or, just as likely, you're using a feature, like redirection, that requires specifying a domain name.

    There are several ways to adapt your code to operate on two different computers, so you can have one copy of the code on a development machine, and another copy on the live server. One quick trick that can work is to use /etc/hosts to direct requests for your live site over to your local server.

    /etc/hosts is a file on your computer (assuming you have a Unix box at home) that maps IP addresses to domain names. It looks like this:

    127.0.0.1 localhost localhost.localdomain
    

    When I'm working, I will add a couple entries to /etc/hosts to point my domains back to my local development machine.

    127.0.0.1 localhost localhost.localdomain
    127.0.0.1 www.riceball.com
    127.0.0.1 riceball.com
    

    Apache's configured to have virtual hosts for these two domains.

    Within the code, I do a simple detection to adjust my variables:

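    // With the /etc/hosts trick, the live domain name points at this machine,
    // so SERVER_NAME is still the live name; check the server's address instead.
    if ($_SERVER['SERVER_ADDR']=='127.0.0.1') {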
        $dbhost = '127.0.0.1';
        $root = '/home/johnk/Sites/riceball.com';
    }
    

    This code automatically switches a couple global values so that the software will work with my staging environment. Usually, you can get away with switching only two or three values.

    The main "problem" with this technique is that it can make it difficult to upload files to the server. You have to undo the changes to /etc/hosts. Then, relaunch the FTP client, and upload. This can be a pain, but most of the time, you should finish coding, and testing, and committing code to the repository, before uploading. In other words, upload infrequently.

    The following is a script to automate switching between two different /etc/hosts files. It must be run as root.

    #!/bin/sh
    
    # swaps out two /etc/hosts files, for staging purposes.
    
    cp /etc/hosts /etc/hosts.swap1
    mv /etc/hosts.swap /etc/hosts
    mv /etc/hosts.swap1 /etc/hosts.swap
    
    cat /etc/hosts
    

    This system has one significant deficiency: the staging environment is accessible only to the lone developer. A more elaborate setup can be used to stage software on a LAN; however, people outside the LAN won't be able to easily access the servers. It's not a one-size-fits-all solution.

    If you're on a LAN with a router that has been reprogrammed to use OpenWRT, you can try the technique described in Simple Staging LAN. It works like the above, but on the entire network, for all operating systems.

    Templating: A Minimalist Templating System

    A few years back, my co-worker Josh and I came to the conclusion that we didn't like the Smarty templating system. Not that there's anything wrong with Smarty - it's just that it was a whole software system that didn't seem to do anything except behave like a subset of PHP, and it required a lot of extra code.

    So we did some thinking, and thought a bit about ColdFusion, a really nice language that gets little respect because it looks like HTML. CF does a few things that make life easier for HTMLers (but that make it a lousy development language for regular programmers). Some ideas bubbled to the top.

    1. All variables are globals.
    2. Most data is in arrays.
    3. The "IF" syntax of CF sucks. I had a preference for SQL's IIF() or Excel's =IF(). That's a functional IF rather than a structural IF.
    4. It's nice to be able to call object methods.
    5. Global functions are useful at times.
    6. We also established a goal: HTML editors could be used to prototype the design, then SQL used to create the data, generators to make the code, and finally the prototype modified to create the output templates.

    From these, we defined a subset of PHP to use as a "template language".

    
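    	<?=someFunction($obj) ?>                   <!-- call a global function -->
    	<?=($obj->boolean) ? 'value' : 'value'.$obj->var.'string' ?>   <!-- functional IF -->
    	<?php while ($obj->next()) : ?>           <!-- loop over an iterator -->
    	<?php endwhile; ?>
    	<?=$obj->var; ?>                           <!-- print a property -->
    	<?=$obj->func(); ?>                        <!-- call a method -->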
    

    Believe it or not, that subset is adequate to build websites that display records from fairly complex relational databases.

    To use the template, you set up a Template object that creates an "execution environment" for the template by exporting variables and objects into it. These objects become "globals" that you can call. Behind the scenes, that's exactly what the templating system did: it created a bunch of globals. The "trick" is that once the environment is set up, you just "include" the template, and PHP parses and executes it. It's fast.

    Sample code:

    include('Template.class.php');
    $t = new Template('foo.pht');
    $t->registerObjects( $x, $y, $z );
    echo $t->toString();
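
    The original Template.class.php isn't shown, so here's a minimal sketch of the same idea under some assumptions: an assign() method that names each value explicitly (a hypothetical stand-in for registerObjects()), extract() to create the variables, and output buffering to capture the result. One deliberate difference: extract() puts the variables in the method's local scope rather than making them true globals, which keeps them from leaking into the rest of the program.

```php
<?php
// Minimal sketch of an "execution environment" template. Not the original
// Template.class.php: assign() stands in for registerObjects().
class Template
{
	private $file;
	private $vars = array();

	public function __construct($file)
	{
		$this->file = $file;
	}

	// Makes $value available inside the template as $<name>.
	public function assign($name, $value)
	{
		$this->vars[$name] = $value;
	}

	// Exports the variables, includes the template, and returns its output.
	public function toString()
	{
		extract($this->vars);
		ob_start();
		include $this->file;   // PHP parses and runs the template in this scope
		return ob_get_clean();
	}
}
```

    With $t->assign('a', $obj), a template written in the subset above can use <?=$a->name ?> directly.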
    

    The rationale for using a subset of PHP was this: most template systems give you a subset of a real programming language anyway, so we may as well use a subset of the language we're already writing in.

    The rationale for using the ternary operator (? and :) instead of "if" is this: the IF..THEN..ELSE structure looks ugly, mainly because HTML is verbose, and it can lead to big blocks of code. Instead of big IF blocks, alternatives should be used: CSS, logic in the object, and sub-templates.

    The CSS way is to emit all objects, but show or hide them using CSS style sheets. The Logic-in-the-Object way is to push the logic back into the object, so that the object sets boolean flags that are used as signals by the template. Sub-templates are template objects that are rendered within the ternary operator. (This wasn't implemented in the early template engines. It turns out that sub-templates aren't extremely important, unless you're going to render lists of different types of objects. They are a useful convenience more than a requirement.)

    It also turns out that nested ternary operators are a little easier to manage than nested IF blocks. Generally, there shouldn't be complex logic in templates. The output should be limited in the queries, not in the template layer.

    Getting back to sub-templates: this is the latest feature, where you can include a template within a template. It's just like "include", but in this system, you export the first template object into the second template.

    include('Template.class.php');
    $t = new Template('t','foo.pht');
    $t->registerObjects( $x, $y, $z );
    $u = new Template('u','bar.pht');
    $u->registerObjects( $t );
    

    The templates now get names of 't', and 'u'. These names match the variable names, in this example, but what they really do is map the object to the variable name used within the template.

    In the following example, in foo.pht, we call our object $a, so we set up our objects to have the name 'a'. Then, we pass that to the template.

    include('Template.class.php');
    $x = new Obj( 'a', 1 );
    $t = new Template('t','foo.pht');
    $t->registerObjects( $x );
    
    $y = new Obj( 'a', 2 );
    $u = new Template('u','foo.pht');
    $u->registerObjects( $y );
    
    $z = new Obj( 'a', 3 );
    $v = new Template('v','foo.pht');
    $v->registerObjects( $z );
    
    $w = new Template('w','bar.pht');
    $w->registerObjects( $t, $u, $v );
    

    Note that any objects that will be used by a template need to support naming. It seems onerous, because it kind of is.

    So, that's what we have so far. I'm thinking it's a little verbose, and it can be made to read more like this:

    $u = new ObjectTemplate( 'foo.pht', new Obj( 2 ) );
    

    That would require that the template refer to the object as "$this" or some other default name. That's a reasonable expectation if we assume each sub-template will display only one object.

    If we need to apply the template multiple times:

    $u = new ObjectTemplate( 'foo.pht' );
    $v = $u->applyToObject( 'v', new Obj(2) );
    $w = $u->applyToObject( 'w', new Obj(3) );
    $x = new Template( 'bar.pht', $v, $w );
    

    Generally, though, you don't have this situation where you have two objects of the same type with different names. Usually, you just do $obj = new Obj('obj',$x), and there's only one instance of Obj on the page.

    When you do have multiples of an object, it's because they're all in a single iterator object, and you're looping over it. So, again, you need only one name. In sum, these above examples are contrived to show off some generally useless flexibility which we rarely ever require.

    I'm starting to think Templates should use positional parameters.
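
    For the ObjectTemplate idea, here's a minimal sketch, assuming PHP 5.4 or later for Closure::bind(). The template file is included from inside a closure whose $this is rebound to the object, so the template can refer to the object as $this, as proposed above. ObjectTemplate and applyTo() are hypothetical names. Because the closure keeps ObjectTemplate's class scope, the template can only use the object's public properties and methods.

```php
<?php
// Sketch of the proposed ObjectTemplate: one template file, applied to one
// object at a time, with the object visible inside the template as $this.
class ObjectTemplate
{
	private $file;

	public function __construct($file)
	{
		$this->file = $file;
	}

	// Renders the template with $this bound to $obj and returns the output.
	public function applyTo($obj)
	{
		$file = $this->file;
		$render = function () use ($file) {
			ob_start();
			include $file;                   // the included file shares this scope
			return ob_get_clean();
		};
		$bound = Closure::bind($render, $obj);   // rebind $this to $obj
		return $bound();
	}
}
```

    Applying the same template to several objects is then just $u->applyTo($v) for each one, which is close to applyToObject() but needs no name per object: a single positional parameter.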

    ----------------

    References: A Minimalist Templating System again
    PHP Savant (not really an influence, but interesting)

    Testing for Hack Scripts, Scan Your Uploads

    This was ripped from a patch I made to ZenCart to deal with malicious uploads. It was stripped from a class, and it should probably be worked into pretty much any uploader class. (The class is in upload.php)

    It doesn't handle binary files, but it's good with scripts. It's fast, so you don't suffer the performance hit of a real virus scan.

        function looks_like_script() {
            $score = 0;
            // script-like extension (dots escaped, case-insensitive)
            if (preg_match('/\.(php|pl|cgi)$/i', $this->filename)) $score++;
    
            // peek at the first bytes of the uploaded temp file
            $fh = fopen($this->file['tmp_name'],'r');
            $line = fgets($fh,6);
            fclose($fh);
            if ('<?'==substr($line,0,2)) $score++;  // PHP open tag (also matches <?xml)
            if ('#!'==substr($line,0,2)) $score++;  // shebang line
    
            return $score > 0;
        }
    

    To use it, call it from within the class like this:

         if ($this->looks_like_script()) exit;
    

    Basically, if the file looks like a script, exit silently. This isn't as good as something that actually unwinds the current action and shows an error message; that would be harder to implement. This just satisfies the goal of preventing someone from uploading a script to your website.
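
    Here's a sketch of the same heuristic pulled out of the class, plus a variant that throws an exception instead of calling exit, so the caller can unwind and show an error message. The function and exception names are hypothetical; $name and $tmpPath would come from $_FILES['field']['name'] and $_FILES['field']['tmp_name'].

```php
<?php
class ScriptUploadException extends RuntimeException {}

// Same heuristic as looks_like_script(): a script-like extension, or a
// file that starts with a PHP open tag or a "#!" shebang line.
function looksLikeScript($name, $tmpPath)
{
	if (preg_match('/\.(php|pl|cgi)$/i', $name)) {
		return true;
	}
	$fh = fopen($tmpPath, 'rb');
	if ($fh === false) {
		return true;               // can't inspect it: treat it as suspicious
	}
	$head = (string) fread($fh, 2);
	fclose($fh);
	return $head === '<?' || $head === '#!';
}

// Throws instead of exiting, so the upload action can be unwound.
function rejectScripts($name, $tmpPath)
{
	if (looksLikeScript($name, $tmpPath)) {
		throw new ScriptUploadException("Rejected upload: $name looks like a script.");
	}
}
```

    The upload handler would call rejectScripts() inside a try block and, in the catch, print htmlspecialchars($e->getMessage()) instead of dying silently.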

    Turn URLs into Links Without Affecting Existing Links (and a gripe about collective stupidity)

    This is one of those problems that has been solved, but it's been solved in incomplete ways so many times that these not-too-useful answers outnumber the useful ones, totally messing up web searches. This, in turn, seeds the idea that this is an intractable problem! Even on Stack Overflow, people say it's really tough.

    This is the opposite of so-called "collective intelligence". It's collective stupidity.

    In fact, this problem is not that tough. The code below will work in most situations. The technique is to guard the existing links by encoding them as base64. Then, find the bare URLs and turn them into links. Then, unguard the existing links by decoding them.

    <?php
    function add_url_links($data)
    {
            // Guard existing links by replacing each one with {{base64}}.
            $data = preg_replace_callback('/(<a\s.+?<\/a>)/is','guard_url',$data);
    
            // Link bare URLs followed by whitespace, then a URL at the very end.
            $data = preg_replace_callback('/(https?:\/\/.+?)([ \n\r\t])/','link_url',$data);
            $data = preg_replace_callback('/(https?:\/\/.+?)$/','link_url',$data);
    
            // Unguard. The class must include "/", which base64 output can contain.
            $data = preg_replace_callback('/{{([a-zA-Z0-9+\/=]+?)}}/','unguard_url',$data);
    
            return $data;
    }
    
    function guard_url($arr) { return '{{'.base64_encode($arr[1]).'}}'; }
    function unguard_url($arr) { return base64_decode($arr[1]); }
    function link_url($arr)
    {
            $link = '<a href="'.$arr[1].'">'.$arr[1].'</a>';
            // $arr[2] (the trailing whitespace) exists only for the whitespace pattern
            return guard_url(array('', $link)).(isset($arr[2]) ? $arr[2] : '');
    }
    
    You may need to alter the regex in the second preg_replace_callback for your application. Note that link_url has to use guard_url: otherwise, the patterns that run after it would match the URL inside the href of the link it just created and wrap it a second time.

    Ugly XML Parser Code to Generate INSERT Statements

    Code this ugly should be illegal. It does what it says, though.

    I'd add some sample data, but this is just too far in my past. The original data was just an xml file with tags going two levels deep.

    
    <?php
    
    if (! ($xmlparser = xml_parser_create()) )
    {
         die ("Cannot create parser");
    }
    
    xml_set_element_handler($xmlparser, "start_tag", "end_tag");
    xml_set_character_data_handler($xmlparser, "tag_contents");
    $filename = "Rates.xml";
    $current = "";
    $values = array();
    // tag names arrive upper-cased (the parser's default case folding)
    $fieldsToInclude = array( 'PUT', 'TAGS', 'TO', 'PUT', 'IN', 'DB', 'HERE' );
    $endingTag = 'TTS';  // this is the tag that triggers the creation of the query
    
    if (!($fp = fopen($filename, "r"))) { die("cannot open ".$filename); }
    
    while ($data = fread($fp, 4096)){
       // drop whitespace between tags (eregi_replace() was removed in PHP 7)
       $data = preg_replace('/>\s+</', '><', $data);
       if (!xml_parse($xmlparser, $data, feof($fp))) {
          $reason = xml_error_string(xml_get_error_code($xmlparser));
          $reason .= ' at line '.xml_get_current_line_number($xmlparser);
          die($reason);
       }
    }
    fclose($fp);
    xml_parser_free($xmlparser);
    
    function start_tag($xmlparser, $name, $attribs) {
       global $current;
       $current = $name;
    }
    function end_tag($xmlparser, $name) {
       // after the last end-tag, write out the query:
       global $values;
       global $current;
       global $endingTag;
       $current = ''; // clear out the current tag
       if ($name==$endingTag)
       {
    	   $ks = $vs = '';
    	   foreach($values as $key=>$value)
    	   {
    		// addslashes() is weak escaping; prefer your database's own escaping
    		$value = addslashes($value);
    		$vs .= "'$value',";
    		$ks .= strtolower($key).',';
    	   }
    	   $vs = rtrim($vs,',');
    	   $ks = rtrim($ks,',');
    	   // "table" is a placeholder (and a reserved word): use your table name
    	   $q = 'INSERT INTO table ('.$ks.') VALUES ('.$vs.')';
    	   insert_query($q);  /////// you need to define this
    	   $values = array();
       }
    }
    function tag_contents($xmlparser, $data) {
       global $current;
       global $values;
       global $fieldsToInclude;
       if (in_array($current, $fieldsToInclude))
       {
    	   // character data can arrive in several pieces, so append
    	   if (!isset($values[$current])) $values[$current] = '';
    	   $values[$current] .= $data;
       }
    }
    
    //// redefine this to insert the row
    function insert_query($q) { echo "$q;\n"; }

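    If the whole file fits in memory, SimpleXML is a shorter alternative to the event-based parser. This is a swapped-in technique, not the original code: a sketch assuming the same two-level layout, with the record tag ($endingTag) directly containing the field tags. It returns each INSERT with ? placeholders plus its values, so the values can be bound with PDO instead of being escaped with addslashes(). The function name and the table name rates are hypothetical. Unlike xml_parser_create(), SimpleXML keeps tag names as written rather than upper-casing them.

```php
<?php
// Alternative sketch using SimpleXML instead of the xml_parser callbacks.
// Assumes each $endingTag element holds the field elements directly.
function xmlToInserts($xml, $endingTag, array $fieldsToInclude, $table = 'rates')
{
	$doc = new SimpleXMLElement($xml);
	$queries = array();
	foreach ($doc->xpath('//'.$endingTag) as $record) {
		$row = array();
		foreach ($record->children() as $child) {
			if (in_array($child->getName(), $fieldsToInclude)) {
				$row[strtolower($child->getName())] = (string) $child;
			}
		}
		// One "?" placeholder per column; the values are bound later.
		$cols = implode(',', array_keys($row));
		$marks = implode(',', array_fill(0, count($row), '?'));
		$queries[] = array("INSERT INTO $table ($cols) VALUES ($marks)", array_values($row));
	}
	return $queries;
}
```

    With a PDO connection in $pdo, each pair would be run as $pdo->prepare($q[0])->execute($q[1]).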
    ViewIterator, possible pattern

    Bleaaaah. I haven't revisited or updated this issue in a long time. I'm still of the mind this is a good way to loop over results. jk 2007

    The PHP Patterns wiki is locked, so these notes are being written here. The URL to the Iterator Pattern is:

    http://www.phppatterns.com/docs/design/the_iterator_pattern

    See the linked article for the discussion about the SimpleIterator.

    The typical use case in PHP is that the iterator loops over a result from an SQL query, and the objects returned are database rows (as arrays). The traditional Iterator pattern can return any type of object, including hierarchies of objects, or objects of different classes.

    A different kind of iterator pattern should be used for PHP. I'll call it the ViewIterator for now. Unlike the traditional iterator, $this->next() doesn't return the next object. Instead, it returns the boolean true, until it's at the end of the list, when it returns false. next() also sets the properties of the object to the next row. The typical code to use a ViewIterator looks like this:

    while( $obj->next() )
    {
        echo $obj->propA;
        echo $obj->propB;
    }
    

    The ViewIterator avoids creating a new object for every row, saving memory and time.

    The ViewIterator can be used to encapsulate iterations over delimited text files, arrays, file directories, files, and other collections. The main limitation is that it cannot iterate over a collection of mixed object types or mixed data types. For example, it's somewhat unwieldy to iterate over a directory full of files and other directories, even if there's a property that indicates the type. It's probably better to create a separate ViewIterator for each type.

    The ViewIterator lacks a previous() method. PHP doesn't commonly require code to move backwards over a list.

    The ViewIterator interface is: next(); rewind();
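
    Here's a minimal sketch of that interface over an in-memory array of rows, so it can be tried without a database. The class name is made up; a database-backed version would fetch the next row from a query result inside next() instead. As described above, there's no per-row object: the iterator itself carries the current row.

```php
<?php
// Minimal ViewIterator sketch over an in-memory array of rows.
// next() copies the next row's columns onto the object as properties and
// returns true; at the end of the list it returns false.
#[\AllowDynamicProperties]
class ArrayViewIterator
{
	private $rows;
	private $pos = 0;

	public function __construct(array $rows)
	{
		$this->rows = array_values($rows);
	}

	public function next()
	{
		if ($this->pos >= count($this->rows)) {
			return false;
		}
		// Note: a column named "rows" or "pos" would clobber the internals.
		foreach ($this->rows[$this->pos++] as $column => $value) {
			$this->$column = $value;
		}
		return true;
	}

	public function rewind()
	{
		$this->pos = 0;
	}
}
```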

    The idea of the ViewIterator was based on work by Josh from Slaptech.

    Web App Generator: database metadata

    This is an obsolete article, and left here as a placeholder until a better one can be written.

    This is a snapshot of GGDatabase.class.php, a bit of code to construct objects that hold metadata about a MySQL table.

    <?php
    /**
     * Copyright 2006 Slaptech.net
     */
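    // Legacy code: relies on the old mysql_* extension (removed in PHP 7)
    // and on the framework's Database class from config.inc.php.
    $base = '../';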
    include_once('../config/config.inc.php');
    
    /*
     * Database and table metadata section
     */
    
    class GGDatabaseMetaData {
    	var $tables = array();
    }
    class GGTableMetaData {
    	var $fields = array();
    	function __construct( $table )
    	{
    		$db = new Database();
    		$sql = "SELECT * FROM $table LIMIT 0"; // only the field metadata is needed
    		$db->query( $sql );
    		$numFields = mysql_num_fields($db->result);
    		$table  = mysql_field_table($db->result, 0);
    		for ( $i=0; $i < $numFields; $i++ ) 
    		{
    			$name  = mysql_field_name($db->result, $i);
    			$type  = mysql_field_type($db->result, $i);
    			$len   = mysql_field_len($db->result, $i);
    			$flags = mysql_field_flags($db->result, $i);
    			$this->fields[$name] = 
    				new GGFieldMetaData( $name, $type, $len, $flags );
    		}
    	}
    	function fieldNames( $excluding=array() )
    	{
    		return array_diff( array_keys( $this->fields ), $excluding );
    	}
    	/**
    	 * @returns a string like `id`,`name`
    	 */
    	function SQLFieldList( $ex=array() )
    	{
    		return '`'.join('`,`',$this->fieldNames($ex) ).'`';
    	}
    	/**
    	 * @returns a string like %d,`%s`
    	 */
    	function SprintfFormatList( $ex=array() )
    	{
    		$formats = array();
    		foreach( $this->fieldNames($ex) as $f )
    		{
    			switch( $this->fields[$f]->type )
    			{
    				case 'number': case 'int': case 'integer': case 'decimal': case 'real':
    					array_push( $formats, '%d' ); break;
    				default:	
    					array_push( $formats, "'%s'" ); break;
    			}
    		}
    		return join( ',', $formats );
    	}
    	/**
    	 * @returns a string like addslashes($this->id), addslashes($this->name)
    	 */
    	function SprintfArgumentList( $ex=array() )
    	{
    		return 'addslashes($this->'.join( '), addslashes($this->', $this->fieldNames($ex) ) . ')';
    	}
    
    
    }
    class GGFieldMetaData {
    	var $name;
    	var $type;
    	var $len;
    	var $flags;
    	var $not_null;
    	var $primary_key;
    	var $auto_increment;
    	var $multiple_key;
    	function __construct( $name, $type, $len, $flags )
    	{
    		$this->name				= $name;
    		$this->type 			= $type;
    		$this->len 				= $len;
    		$this->flags 			= $flags;
    		$this->not_null 		= is_int(strpos( $flags, 'not_null' ));
    		$this->primary_key 		= is_int(strpos( $flags, 'primary_key' ));
    		$this->auto_increment 	= is_int(strpos( $flags, 'auto_increment' ));
    		$this->multiple_key 	= is_int(strpos( $flags, 'multiple_key' ));
    	}
    }
    
    /*
    $md = new GGTableMetaData( 'sapphos_collections' );
    echo '<pre>';
    print_r( $md->SQLFieldList() );
    print '<p>';
    print_r( $md->SprintfFormatList() );
    print '<p>';
    print_r( $md->SprintfArgumentList() );
    print '<p>';
    print_r( $md );
    // */
    
    
    ?>
    

    Comments

    This is legacy code, and there are a few things I don't like about it. First is that it gets the field names by analyzing the result of a SELECT statement. Second, the Sprintf* and SQLFieldList methods are just utility functions and need to be more comprehensive (support more data types). Third, obviously, GGDatabaseMetadata isn't done. That's a class that could iterate over all the tables, constructing all the table metadata objects.

    GGDatabaseMetadata should be lazy, so it gets metadata on demand.
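
    Here's a minimal sketch of a lazy database metadata class, assuming the list of table names is known up front (it could come from SHOW TABLES) and that a factory callable builds one table's metadata, e.g. function ($t) { return new GGTableMetaData($t); }. The class name is hypothetical; each table's metadata is built on first request and then cached.

```php
<?php
// Sketch of lazy database metadata: tables are looked up on demand, and
// each table's metadata object is built only once.
class LazyDatabaseMetaData
{
	private $tableNames;
	private $factory;
	private $tables = array();

	public function __construct(array $tableNames, callable $factory)
	{
		$this->tableNames = $tableNames;
		$this->factory = $factory;
	}

	public function tableNames()
	{
		return $this->tableNames;
	}

	// Returns the metadata for $name, building it on the first call.
	public function table($name)
	{
		if (!in_array($name, $this->tableNames, true)) {
			throw new InvalidArgumentException("Unknown table: $name");
		}
		if (!isset($this->tables[$name])) {
			$this->tables[$name] = call_user_func($this->factory, $name);
		}
		return $this->tables[$name];
	}
}
```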

    GGTableMetaData needs a concept of a primary key (an array of the fields that make up the PK), and of foreign keys. This requires augmenting the information from MySQL with external information about FKs.

    GGDatabaseMetadata might be replaced with another class, GGDataSet, similar to an ADO.NET DataSet. So maybe what's needed is a general GGDataSet, with the database metadata being one kind of dataset: one that maps to a database. (An ADO.NET dataset is a set of tables and table-like objects that, together, can be queried.)

    Revisions

    I did a little code cleanup, because the original was a little verbose.

    The code was revised to move the field list exclusion feature into fieldNames(). It eliminated a temporary variable, and around 3 lines of code per affected function.

    The logic in SprintfFormatList() was altered to be a little shorter. The catenation was altered, and the code is now more obvious. The previous version looked like this:

    	function SprintfFormatList( $excluding=Null )
    	{
    		$fields = $this->fieldNames();
    		if ($excluding)
    			$fields = array_diff( $fields, $excluding );
    		foreach( $fields as $f )
    		{
    			if (in_array($this->fields[$f]->type, 
    					array('number','int','integer','decimal','real')))
    				$output .= ', %d';
    			else
    				$output .= ', \'%s\'';
    		}
    		return ltrim( $output, ',' );
    	}
    

    The changes were simple. The $excluding default was changed to array(). That's usable as a parameter to array_diff(). The first three lines of the body were eliminated, and is now done in $this->fieldNames( $ex ). The If statement became a Switch, and the constants were written just a little differently. The output isn't catenated anymore: it's accumulated into an array, and the output is stringified with a join(). Here's the new code, for comparison:

    	function SprintfFormatList( $ex=array() )
    	{
    		$formats = array();
    		foreach( $this->fieldNames($ex) as $f )
    		{
    			switch( $this->fields[$f]->type )
    			{
    				case 'number': case 'int': case 'integer': case 'decimal': case 'real':
    					array_push( $formats, '%d' ); break;
    				default:
    					array_push( $formats, "'%s'" ); break;
    			}
    		}
    		return join( ',', $formats );
    	}
    

    The line count is about the same, but the indentation is more uniform and the code is clearer.
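    To see what the format list is for, here is a standalone version of the new SprintfFormatList() fed to vsprintf(). sprintf_format_list() is a hypothetical name, and the $types array stands in for $this->fields[$f]->type.

    ```php
    <?php
    // Standalone, hypothetical version of SprintfFormatList().
    // $types maps field name => type name; $ex lists the fields to skip.
    function sprintf_format_list(array $types, array $ex = array())
    {
        $formats = array();
        foreach (array_diff(array_keys($types), $ex) as $f) {
            switch ($types[$f]) {
                case 'number': case 'int': case 'integer': case 'decimal': case 'real':
                    $formats[] = '%d';
                    break;
                default:
                    $formats[] = "'%s'";
                    break;
            }
        }
        return join(',', $formats);
    }

    $types = array('id' => 'int', 'name' => 'varchar', 'age' => 'integer');
    $fmt = sprintf_format_list($types, array('id'));   // "'%s',%d"
    $sql = vsprintf("INSERT INTO people (name, age) VALUES ($fmt)", array('Ann', 42));
    ```

    Two caveats carry over from the original. First, %d truncates decimal and real values. Second, string values must be escaped (for example with mysqli_real_escape_string()) before they reach vsprintf(), or you can switch to prepared statements.
    
    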

    What is dirname(__FILE__) about?

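    The old Slaptech framework had the classic PHP include problem: you can't reliably include() a file relative to the current file. Relative include() paths are resolved against the include_path, which is usually ".", the current working directory. That is normally the directory of the first PHP file the browser requested, not the directory of the file doing the including.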

    So, if you have a file ./index.php, and a library lib/foo.php, and in lib/ there's another file lib/sub/tool.php, you can't write "include 'sub/tool.php';" in foo.php.

    That really messes with inclusions. The solution we came up with was that the file index.php would have to define a variable $global_base (yes, lame name), and it would usually have a value like "./" or "../" that would be the path to the application's root directory.

    Then, in the classes, you'd have include statements like:

    include_once $global_base."core/Database.class.php";

    We weren't hip to the solution others were using: dirname(__FILE__). You would instead use code like this:

    include_once dirname(__FILE__)."/core/Database.class.php";

    The problem with that is that every include makes a call to dirname(), which can't be that fast. The issue is discussed at PHP 10.0 and Don't Abuse dirname(__FILE__).
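    Since PHP 5.3, the magic constant __DIR__ has the same value as dirname(__FILE__). It is substituted when the file is compiled, so it avoids the function call. The snippet below checks that equivalence. The include path is the one from the example above.

    ```php
    <?php
    // __DIR__ (PHP 5.3+) is the directory of the file it appears in,
    // the same value as dirname(__FILE__), without a runtime function call.
    $viaFunction = dirname(__FILE__);
    $viaConstant = __DIR__;

    // The include from the text would become:
    // include_once __DIR__ . '/core/Database.class.php';
    $path = __DIR__ . '/core/Database.class.php';
    ```
    
    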

    Both styles of specifying paths also break VIM's "gf" ("goto file") command, which opens the file whose name is under the cursor.

    The dirname(__FILE__) technique fails because, although the path is relative to the current file, the string starts with "/", which fouls up "gf". Fixing that is simple, but it wastes even more CPU.

    The $global_base style fails because the path is always rooted in the application's base directory. To make it work, you have to add the app's root to the VIM path. You also need to start the path with a directory name, not "/".

    The best solution would be for PHP to have a new include_once or require_once statement that performs the include from the current file's directory and forgoes searching the include_path. For speed, it should also require the argument to be a string constant, so the path can be checked at compile time.

    load 'path/to/file.php';
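    Until something like that exists, a rough userland approximation is possible. The function names load() and caller_relative_path() are hypothetical. The sketch uses debug_backtrace() to find the file that made the call. Unlike the proposed statement, it runs at runtime, so nothing is checked at compile time.

    ```php
    <?php
    // Hypothetical helper: resolve $relative against the directory of the file
    // that called us. $depth = 1 skips one extra frame (used by load() below).
    function caller_relative_path($relative, $depth = 0)
    {
        $trace = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, $depth + 1);
        // Frame N's 'file' is the file where the Nth function in the stack was called.
        return dirname($trace[$depth]['file']) . '/' . $relative;
    }

    // Hypothetical userland stand-in for the proposed `load` statement.
    function load($relative)
    {
        // An absolute path makes require_once skip the include_path search.
        return require_once caller_relative_path($relative, 1);
    }
    ```
    
    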
    

    My thinking is that the layout of your files is pretty inflexible. That's just how code is today. It's rare for a program to generate code and add it to the system, and it's even rarer for generated code to be moved around the filesystem at runtime. In fact, I've never seen it done. It *can* be done, but I've never seen it.

    Code Is Usually Not Data

    One of the big problems with PHP is the include statement. You can include a file. You can define that file at runtime, and alter it during runtime. You can include a string with an interpolated variable. You can alter files and then include them.

    These are all unusual cases.

    What PHP needs is a code loader that accepts only a string constant as an argument, and loads relative from the current file. This statement should assume that the file is relatively unchanging, and check for file modification times only occasionally.
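    The "check for modification times only occasionally" part can be sketched in a few lines. PHP's bundled OPcache (PHP 5.5+) does something similar with its opcache.revalidate_freq setting. The class below is a hypothetical illustration, not how OPcache is implemented. It remembers each file's mtime and looks at the disk again only after $interval seconds have passed. The $now argument exists only to make the example testable.

    ```php
    <?php
    // Hypothetical sketch: recheck a file's mtime at most once per $interval seconds.
    class LazyMtimeChecker
    {
        private $interval;
        private $seen = array();   // path => array(mtime, time of last check)

        public function __construct($interval)
        {
            $this->interval = $interval;
        }

        // True if $path should be (re)compiled.
        public function isStale($path, $now = null)
        {
            $now = ($now === null) ? time() : $now;
            if (!isset($this->seen[$path])) {
                clearstatcache(true, $path);
                $this->seen[$path] = array(filemtime($path), $now);
                return true;                    // never seen: compile it
            }
            list($mtime, $checked) = $this->seen[$path];
            if ($now - $checked < $this->interval) {
                return false;                   // checked recently: trust the cache
            }
            clearstatcache(true, $path);        // PHP caches stat() results too
            $current = filemtime($path);
            $this->seen[$path] = array($current, $now);
            return $current !== $mtime;
        }
    }
    ```
    
    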

    That way, the PHP compiler can map out what code is unchanging... and compile it into a giant static object.

    WordPress: Exporting Articles into WordPress Extended RSS (WXR)

    This is an example template file that exports articles from your bespoke CMS to an XML file that WordPress version 3.5 will import. There are several articles out there about this, and this is another one.

    This code is a work in progress, and just this code alone won't export from your CMS. It's just an example of a functioning template.

    The technique I used to create this template was to read up on WXR, then do an export of a single article. I copied that xml, and inserted the CMS content. Then, I exported our data, imported it into WordPress, found errors, fixed errors, and repeated the process until all the articles could be imported.

    The template is attached, and some additional support code is below. This support code fixes some data that WP won't import.

    The CMS uses TinyMCE, which inserted some _mce* attributes into the HTML. The clean() function strips them.

    The template contains some values for users and categories that will need to be altered.

    // Wrap text in a CDATA section. A literal "]]>" in the text would end the
    // section early, so it is split across two CDATA sections.
    function cdata($s) {
        return "<![CDATA[" . str_replace(']]>', ']]]]><![CDATA[>', clean($s)) . "]]>";
    }
    // Strip the _mce_* and mce_* attributes TinyMCE left in the HTML
    // (with the whitespace before them), then trim the result.
    function clean($s) {
        $o = preg_replace('/\s+_?mce_\w+="[^"]*"/', '', $s);
        return trim($o);
    }
    if( !function_exists( 'xmlentities' ) ) {
        // Escape a UTF-8 string for XML. The /u modifier makes each match one
        // whole character rather than one byte, so non-ASCII text isn't mangled.
        // Returns null if $string is not valid UTF-8.
        function xmlentities( $string ) {
            $not_in_list = "A-Z0-9a-z\s_-";
            return preg_replace_callback( "/[^{$not_in_list}]/u" , 'get_xml_entity_at_index_0' , $string );
        }
        function get_xml_entity_at_index_0( $CHAR ) {
            switch( $CHAR[0] ) {
                case "'":    case '"':    case '&':    case '<':    case '>':
                    return htmlspecialchars( $CHAR[0], ENT_QUOTES );
                default:
                    return numeric_entity_4_char($CHAR[0]);
            }
        }
        // Numeric character reference for one UTF-8 character, based on its
        // Unicode code point. mb_ord() needs PHP 7.2+ and the mbstring extension.
        function numeric_entity_4_char( $char ) {
            return "&#" . mb_ord( $char, 'UTF-8' ) . ";";
        }
    }
    
    Attachment: export.template.php_.txt (2.1 KB)

    WordPress: Setting the META Description to part of the page's content

    This code puts the first paragraph of the post into the description meta tag. It tries to strip out leading whitespace and any tags. If you insert an image, it should be stripped.

    It could probably use some work - like removing leading nonbreaking spaces, dealing with very short first paragraphs, etc.

    Why do the function names start with "myplugin"? In my install, I have a "plugin" that holds a bunch of different functions. That way, when the site upgrades, these functions are retained. They might not work, but the code isn't wiped out.

    function myplugin_meta_description() {
            $default = "The default description goes here, or pull it from some configuration value.";
            if (is_home() or is_front_page()) return $default;
            if (is_single() or is_page()) {
                    // On a single post or page, the queried object is the post itself.
                    $obj = get_queried_object();
                    $text = myplugin_first_paragraph( $obj->post_content );
                    return ($text !== '') ? $text : $default;
            }
            return $default;
    }
    
    function myplugin_first_paragraph( $s ) {
            $s = ltrim(strip_tags($s));
            // /s lets "." cross the single line breaks inside a paragraph;
            // the paragraph ends at the first blank line.
            if (preg_match('/^(.+?)(?:\r\n\r\n|\n\n|\r\r)/su', $s, $matches)) {
                    return $matches[1];
            }
            return rtrim($s);  // no blank line: the whole text is one paragraph
    }
    

    The tricky thing is that we need the page content before we reach The Loop.
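    That works because WordPress runs the main query before the theme's header is rendered. As a result, get_queried_object() already has the post inside a wp_head action, which fires before The Loop. Below is a minimal sketch of the output side. myplugin_meta_tag() is a hypothetical helper. It uses htmlspecialchars() so the block runs outside WordPress, and esc_attr() would do the same job inside it. The add_action() wiring is an assumption and is left as a comment.

    ```php
    <?php
    // Hypothetical helper: turn a description into an escaped <meta> tag.
    function myplugin_meta_tag($description)
    {
        // Collapse runs of whitespace (newlines, tabs) into single spaces.
        $description = trim(preg_replace('/\s+/u', ' ', $description));
        return '<meta name="description" content="'
            . htmlspecialchars($description, ENT_QUOTES, 'UTF-8') . '">' . "\n";
    }

    // Assumed WordPress wiring (not from the original post):
    // add_action('wp_head', function () {
    //     echo myplugin_meta_tag(myplugin_meta_description());
    // });
    ```
    
    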

    htdigest Password Changing Function in PHP

    This is a function to change a password within an htdigest password database file. htdigest is one method of user authentication in Apache HTTP Server.

    Global $htdigest contains the path to the htdigest file. The security domain (Apache calls it the realm) is passed in as $secdom.

    The htdigest formula for the hash is: md5("$username:$securitydomain:$password")
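    Applied to one user, the formula gives one line of the file in the form username:realm:hash. htdigest_line() is a hypothetical helper, and the user, realm and password are made-up values.

    ```php
    <?php
    // Build one htdigest line: user:realm:md5("user:realm:password").
    function htdigest_line($username, $realm, $password)
    {
        return "$username:$realm:" . md5("$username:$realm:$password");
    }

    $line = htdigest_line('alice', 'Private Area', 'secret');
    ```
    
    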

    htdigest is like htpasswd, except it uses the md5 hash for hiding the password, and it supports digest authentication. Digest authentication is more secure than "basic" authentication, because basic authentication sends your password in clear text. Digest authentication sends a hash. This is ever-so-slightly more secure. (Use SSL for real security.)

    For more information, read the caveat about basic authentication.

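    // Change (or add) $username's password in the htdigest file named by the
    // global $htdigest. $secdom is the realm. Returns false on a wrong old password.
    function changePass( $username, $secdom, $oldp, $p )
    {
            global $htdigest;
    
            $changed = false;
            $output = '';
            $in = fopen( $htdigest, 'r' );
            while ( ($line = fgets($in)) !== false )
            {
                    $line = rtrim( $line );
                    if ($line === '')
                            continue;                  // skip blank lines
                    $a = explode( ':', $line );
                    if (count($a) >= 3 && $a[0]===$username && $a[1]===$secdom)
                    {
                            // hash_equals() avoids PHP's loose "0e..." == "0e..." match
                            if (hash_equals($a[2], md5("$username:$secdom:$oldp")))
                            {
                                    $a[2] = md5("$username:$secdom:$p");
                                    $changed = true;
                            }
                            else
                            {
                                    fclose($in);
                                    print "Old password was wrong, or username exists.";
                                    return false;
                            }
                    }
                    $output .= implode( ':', $a )."\n";
            }
            fclose($in);
            if (! $changed) // user not found: assume it's a new user
            {
                    $hash = md5("$username:$secdom:$p");
                    $output .= "$username:$secdom:$hash\n";
            }
            // Write a new file, then rename() it over the old one: atomic on the
            // same filesystem, and no shell command for a filename to break.
            file_put_contents( "$htdigest.new", $output );
            return rename( "$htdigest.new", $htdigest );
    }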
    

    j_oocms - an excessively object-oriented cms

    They say it's bad to use extend too much. This is a *bad* cms.

    It's a simple image gallery, suitable for things like personal image hosting.

    This was written to study how extend works, and to try out some ideas. At this time, there are six (6) classes. These include text data, html data, form data, multimedia, and photos.

    One feature is that there's no relational database. The data fields are defined by the data that's POSTed to the server. This is probably a security hole, but the idea was to see how it would "feel" to hack in such an environment. It was pretty cool.

    Another feature is support for attached multimedia. At this time, that's only photos and html files. Still, it's something.

    Yet another feature is date-based versioning. Each form is associated with a directory, and every time you save data to the form, a new copy is written to that directory. So you never lose data, though you may have to dig through the files to find the older versions.
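    The storage format isn't described here, so the sketch below assumes JSON files named by a sortable UTC timestamp. version_save() and version_load_latest() are hypothetical names. Every save writes a new file, and loading reads the newest one. Two saves in the same second would collide, which is why the $time argument is explicit.

    ```php
    <?php
    // Hypothetical sketch: save a new timestamped copy of $data into $dir.
    function version_save($dir, array $data, $time = null)
    {
        $time = ($time === null) ? time() : $time;
        if (!is_dir($dir)) {
            mkdir($dir, 0777, true);
        }
        // A UTC timestamp like 19700101T001640Z sorts lexically in date order.
        $name = gmdate('Ymd\THis\Z', $time) . '.json';
        file_put_contents("$dir/$name", json_encode($data));
        return $name;
    }

    // Load the newest saved copy, or null if nothing has been saved yet.
    function version_load_latest($dir)
    {
        $files = glob("$dir/*.json");
        if (!$files) {
            return null;
        }
        sort($files);   // oldest first
        return json_decode(file_get_contents(end($files)), true);
    }
    ```
    
    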

    The system is raw. It scales, but oh what a mess it makes.

    Attachment: cms.zip (6.49 KB)