RAID 5 Parity. What is it, and how does it work?

One morning, I started wondering how RAID 5 parity works to rebuild a disk array. It seemed “magical” to me, that you can get redundancy and still use most of your disk capacity. So I searched for it… and turned up not very much info, and one other person’s unanswered question. A few articles explained it, but in a little more detail about performance, and less detail about the actual parity function. So I wrote this. The good articles were at:
Continue reading RAID 5 Parity. What is it, and how does it work?

Help Speed Up a MySQL Query by Helping the Optimizer

Here’s a noob-to-noob optimization trick. Suppose you have a database table with, say, 200,000 records, and you regularly select on multiple criteria. The rule for selection is to put the most specific WHERE clause first, and the least specific last. The goal is to cut down the search set to something small, and then search through the smaller set. Get all the queries using this order, then create a composite index over the keys to speed up the search even more.

Here are some before and after shots, based on real queries (from sf-active):

select * from tb where display='t' and parent_id=0 and id > 198000 limit 0,30


select * from tb where id > 198000 and parent_id=0 and display='t' limit 0,30

This revision will now cause the first clause to eliminate most of the rows from the table, leaving only around 2,000 rows to scan. The second clause, parent_id, eliminates 50% of the remainder. Display=’t’ is the least selective clause.

Also, it wasn’t noted, but there are already indexes for display and parent_id. So we aren’t starting with absolutely nothing.

select * from tb where display='t' and parent_id=0 limit 0,30


select * from tb where parent_id=0 and display='t' limit 0,30

Also do this:

alter table tb add index (parent_id, display)

That looks virtually identical. Again, this is a real-world situation, where the query was built-up dynamically. The optimization here is that I created an index that will speed up the select. The index matches the order of the query, so the query optimizer will be able to find the optimization easily.

Additionally, it would be a good thing to put all the clauses in all the queries into this order, from most specific to least specific, to gain the maximum optimization. I suspect the query optimizer already does this automatically, but, being meticulous about this seems like good mental discipline.

The real-world effect of this simple optimization, which took around two hours to complete, was dramatic. The slow query had been bogging down the server, with queries taking thousands of seconds to execute (or in our situation, to time-out, and require the admin to go in and kill the thread). Now, the query barely shows up in the process list, and the real-world speed feels like it takes less than five seconds to execute through the web (meaning, it includes dns lookup, tcp connection, and page rendering). Typically, it takes one second, and feels pretty fast.

See Also

Optimizing MySQL – Database Journal
MySQL Optimization – DevShed
Query Optimization

Paginating Database Results

I describe how others page over results, and how I do it.

I find fascinating URIs that looks like this:

Why do programmers use a “page”? It seems odd, because that number is going to be translated to a starting record number, and a number of records (per page). Effectively, the URL could be this:

Now, you can show any arbitrary number of rows, starting at any arbitrary point in the results. Of course, what you lose are the “pages”, because you can get at any single record in the result. (If they can type a url, that is.) To regain the page-y-ness of the interface, you need to write a loop that constructs the pager navigation. One way is this:

for( $i=0; $i < $resultcount; $i+=$pagesize )
    // start the link
    print '<a href="/drupal/script.php';
    print "?offset=".$i."&pagesize='.$pagesize;
    print '">'.title.'</a>';
    // and close the link

This way reduces the coupling between the creation of the page navigation from the display of the results. It also creates a cleaner mapping to the SQL OFFSET statement.

If you need to know what page you’re on, just divide the offset by the page size.