Lazy Evaluation in PHP (sorta)
This is a nice little example that will show you how to do something really useful and cool. It'll also show you how PHP kinda sucks:
<?php
function f( $x )
{
    say("function f called with $x...");
    return create_function( '', " return quote('$x'); ");
}

function quote( $s )
{
    say('quote called...');
    return '***'.$s.'***';
}

say('starting...');
$x = f( 'hello, world' );
say('$x defined as '.$x.'...');
print $x();

///// utility funcs
function say( $s ) { echo $s.'<p>'; }
The results look like this:
starting...
function f called with hello, world...
$x defined as lambda_6...
quote called...
***hello, world***
(There's an unprintable NUL character in front of lambda_6; more on that below.)
What this does is defer the call to quote() until you use $x in the print statement.
(One thing to notice is that $x is evaluated by appending (). $x() will call the function named by the value of $x. In a language that has real lambdas, you'd just use $x, and it would be evaluated. This way, a variable and a lambda would appear to be the same thing.)
Why delay evaluation? Well, in PHP-MySQL-land, you have to make a connection to the database before you can use its quote() function. So, this might allow you to set up the query strings first, perform the queries later, and, thanks to laziness, avoid connecting to the database until the query is run.
Typically, this won't gain you much, but in a loaded environment, the connection to the database can take a long time. By deferring it until you need to query, you reduce the length of time the connection is held. In the meantime, you can emit data to the client, so the page starts to draw.
It's also a good idea to delay the query until you need the first row of the result.
In a lazy scenario, we write this code:
$db = new Database();
$statement = $db->quote( $something ); // this returns a function
$result = $db->query( $statement );
while( $row = $result->next() )...
The actual connection to the database would not happen until the next() function call.
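Here's a rough sketch of how that could be wired up. It assumes PHP 5.3 or later, because it uses anonymous functions instead of create_function. FakeConnection, LazyResult, the log array and the canned rows are all made up for illustration; a real driver would go where FakeConnection is, and the statement would be a full SQL string rather than a single quoted value.

```php
<?php
// Hypothetical stand-in for a real driver; it only records what happens.
class FakeConnection
{
    private $db;

    public function __construct(Database $db)
    {
        $this->db = $db;
        $db->log[] = 'connect';
    }

    public function quote($s)
    {
        return "'" . str_replace("'", "''", $s) . "'";
    }

    public function query($sql)
    {
        $this->db->log[] = 'query: ' . $sql;
        return array(array('id' => 1), array('id' => 2)); // canned rows
    }
}

class Database
{
    public $log = array();   // illustration only: what happened, in order
    private $conn = null;

    // Connects on first use, not in the constructor.
    public function connection()
    {
        if ($this->conn === null) {
            $this->conn = new FakeConnection($this);
        }
        return $this->conn;
    }

    // Returns a function; the quoting (and the connection) waits for the call.
    public function quote($s)
    {
        $db = $this;
        return function () use ($db, $s) {
            return $db->connection()->quote($s);
        };
    }

    // Takes the deferred statement and returns a result that has not run yet.
    public function query($statement)
    {
        return new LazyResult($this, $statement);
    }
}

class LazyResult
{
    private $db;
    private $statement;
    private $rows = null;

    public function __construct(Database $db, $statement)
    {
        $this->db = $db;
        $this->statement = $statement;
    }

    // The first call evaluates the statement and runs the query.
    public function next()
    {
        if ($this->rows === null) {
            $sql = call_user_func($this->statement);
            $this->rows = $this->db->connection()->query($sql);
        }
        $row = array_shift($this->rows);
        return $row === null ? false : $row;
    }
}

$db = new Database();
$statement = $db->quote('hello, world'); // no connection yet
$result = $db->query($statement);        // still no connection
echo count($db->log), "\n";              // prints 0
while ($row = $result->next()) {
    echo $row['id'], "\n";
}
echo implode(' | ', $db->log), "\n";     // prints connect | query: 'hello, world'
```

The log stays empty until the first next(), which is the whole point.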
There's a little weirdness here, though. The query() method accepts a function as its argument -- that is, a lambda variable.
The problem is, PHP doesn't have a real function constructor. create_function defines a new function, and names it chr(0).'lambda_'.$somenumber! Then it returns the name of the function.
So, if you do $s = create_function(...), $s will contain a string. gettype($s) will return 'string'. So, to see if a string is really a lambda, you have to check whether it starts with chr(0).'lambda_'.
WTF?!
Then, if it is, evaluate it like this: $foo()
Yuck.
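For the record, the check looks something like this. The function names looks_like_lambda and force are mine, not anything built into PHP. Also note that create_function was deprecated in PHP 7.2 and removed in 8.0, so the lambda branch only does anything useful on older versions.

```php
<?php
// True when $s looks like a name produced by create_function(),
// i.e. it starts with a NUL byte followed by "lambda_".
function looks_like_lambda($s)
{
    $prefix = chr(0) . 'lambda_';
    return is_string($s) && strncmp($s, $prefix, strlen($prefix)) === 0;
}

// Evaluate $value if it names a lambda, otherwise hand it back unchanged.
function force($value)
{
    return looks_like_lambda($value) ? $value() : $value;
}

var_dump(looks_like_lambda(chr(0) . 'lambda_6')); // bool(true)
var_dump(looks_like_lambda('lambda_6'));          // bool(false)
echo force('just a string'), "\n";                // prints just a string
```

Every function that might receive a lazy value has to run it through something like force() first.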
The problem with this, aside from the load of code you have to write, is that in a lazy world, you want to delay everything possible to the last moment, and then perform all your evaluations at the very end.
Why? Because there's a nice effect, where all the function calls pile up, kind of like a big data structure. Then, when you evaluate, the data structure is turned into the result, piece by piece, in a very efficient order.
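To make the "pile up" idea concrete, here's a small sketch. It assumes PHP 5.3 or later (anonymous functions, not create_function), and defer is a hypothetical helper, not a PHP built-in. Each deferred call holds its arguments, and some of those arguments can be deferred calls themselves, so you end up with a little tree that only collapses when you finally call it.

```php
<?php
// Wrap a call so it runs only when the returned function is invoked.
function defer($fn, array $args = array())
{
    return function () use ($fn, $args) {
        // Force any arguments that are themselves deferred calls.
        foreach ($args as $i => $arg) {
            if ($arg instanceof Closure) {
                $args[$i] = $arg();
            }
        }
        return call_user_func_array($fn, $args);
    };
}

$shouted = defer('strtoupper', array('hello, world'));
$wrapped = defer('sprintf', array('***%s***', $shouted));

// Nothing has run yet; $wrapped is a small tree of pending calls.
echo $wrapped(), "\n"; // prints ***HELLO, WORLD***
```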
In our database system, it would mean that the connection is made, used immediately, and released. The connection isn't held open, the query result set isn't kept in memory for a long time, and when the last row is consumed, the result is released, and the connection is released back into the connection pool.
The trade-off is that some memory is consumed to build these temporary functions. The upside is that the memory used to hold code can be pretty small, while memory used to hold I/O handles is larger. The likely result is that memory for I/O is held for a shorter time, so the overall strain on resources declines.
Too bad PHP doesn't have real closures!
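That complaint applies to the create_function approach used here. PHP 5.3 added anonymous functions that are real Closure objects, and create_function itself was deprecated in 7.2 and removed in 8.0. As a sketch, here's the opening example rewritten that way. This is a swapped-in technique, not what the code above does, and say() prints a newline instead of a <p> so it reads better on the command line.

```php
<?php
function say($s) { echo $s . "\n"; }

function quote($s)
{
    say('quote called...');
    return '***' . $s . '***';
}

function f($x)
{
    say("function f called with $x...");
    // $x is captured as a value, not pasted into a string of source code.
    return function () use ($x) {
        return quote($x);
    };
}

say('starting...');
$x = f('hello, world');
say('$x is a ' . get_class($x) . '...'); // prints $x is a Closure...
say($x());                               // quote() finally runs here
```

Now gettype($x) says 'object', and checking for a lambda is just $x instanceof Closure. You still have to append () to evaluate it, though.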
But Does It Matter?
If you really dig into this lazy evaluation, it becomes apparent that what laziness does for you is re-order operations into efficient sequences. So, why bother? Why not just put things into optimal sequences in the first place? That's the traditional PHP style of scripting, which looks like this (in a very abbreviated pseudocode):
connect; query; loop over results; display output; query; loop over results; display output; query; loop over results; display output; disconnect;
The upside of this style of programming is that it can be very fast. In fact, it's probably faster than any other style of programming.
The downside is that SQL code, HTML code, and PHP code are mixed together.
To get around this, the querying is usually encapsulated into objects. Then, the code looks more like this:
connect; $a = new Object; $b = new Object; $c = new Object; loop over $a; display output; loop over $b; display output; loop over $c; display output; disconnect;
To some programmers, this is nicer, because the PHP migrates to the top, the HTML migrates to the bottom, and the SQL is hidden away in the object. Also, the database connection parts can be removed by moving connections into an object - the objects that need the db will connect on their own.
The application is more scalable, because you can manage the growth of the code.
There's one less line of code above. By hiding the connect and disconnect, you lose two more lines, for a total of three. Moreover, this latter pseudocode is a lot closer to the actual code, while the former is really far more abstract than the actual code. So the savings in lines-of-code is pretty large. (Well, there's no savings in the hidden-away parts. We hide the stuff to reduce complexity.)
The problem with this more-OO style of programming is that you have all these objects in memory, and some of them may be starting I/O operations, or opening db handles, when they're constructed. They get made, and then start using resources! That's bad for performance, especially if you don't take the input and release these resources until the very end of the page.
It doesn't seem like it matters. Say a page takes 1 second to render. Suppose a query or some big I/O ties up its resources for .25 seconds when optimized to reduce resource consumption, and for 1 second unoptimized (because your objects hold resources open until the page is complete). The .75 seconds might seem tolerable.
However, on an app server, you have to consider the aggregate effect of inefficiency. Assume that the server is generally fully loaded - that it's always executing PHP pages. One of the limiting constraints is the use of resources. By using and releasing the resources quickly (which is like recycling), you gain capacity. With the optimization in effect, you can (theoretically) render four times as many pages!
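The arithmetic behind that "four times" is simple enough to write down. This sketch assumes the resource pool is the only bottleneck; the pool size of 10 and the name pages_per_second are made up for illustration.

```php
<?php
// Pages per second a fixed pool can serve if each page holds
// one resource for $holdSeconds.
function pages_per_second($poolSize, $holdSeconds)
{
    return $poolSize / $holdSeconds;
}

$pool = 10; // hypothetical number of connections available

$unoptimized = pages_per_second($pool, 1.0);  // resource held for the whole page
$optimized   = pages_per_second($pool, 0.25); // resource released early

echo $optimized / $unoptimized, "\n"; // prints 4
```

The pool size cancels out, so the ratio is just 1 / .25 whatever the pool is.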
